Washington University in St. Louis Washington University Open Scholarship

Arts & Sciences Electronic Theses and Dissertations Arts & Sciences

Winter 12-15-2018 Mechanistic characterization of resistome and microbiome dynamics across diverse microbial habitats Andrew John Gasparrini Washington University in St. Louis


Recommended Citation Gasparrini, Andrew John, "Mechanistic characterization of resistome and microbiome dynamics across diverse microbial habitats" (2018). Arts & Sciences Electronic Theses and Dissertations. 1697. https://openscholarship.wustl.edu/art_sci_etds/1697


WASHINGTON UNIVERSITY IN ST. LOUIS Division of Biology and Biomedical Sciences Molecular Genetics and Genomics

Dissertation Examination Committee:
Gautam Dantas, Chair
Daniel E. Goldberg
Andrew L. Kau
Barbara B. Warner
Timothy A. Wencewicz

Mechanistic Characterization of Resistome and Microbiome Dynamics Across Diverse Microbial Habitats by Andrew J. Gasparrini

A dissertation presented to The Graduate School of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

December 2018 St. Louis, Missouri

© 2018, Andrew J. Gasparrini

Table of Contents

List of Figures
List of Tables
Acknowledgments
Abstract of the Dissertation
Chapter 1: Introduction
1.1 Next-generation approaches to understand and combat the resistome
1.1.1 The promise of antibiotics and the threat of resistance
1.1.2 Common mechanisms of antibiotic action and antibiotic resistance
1.1.3 The golden age of antibiotics
1.1.3 High-throughput resistance discovery
1.1.4 Next-generation approaches to resistance
1.1.5 Conclusions
1.2 Genomic and functional techniques to mine the microbiome for novel antimicrobials and antimicrobial resistance genes
1.2.1 Introduction
1.2.2 Rise in Antimicrobial Resistance and Decline in Antimicrobial Discovery
1.2.3 Brief History of Antimicrobials
1.2.4 Brief History of Antimicrobial Resistance
1.2.5 Culture-based methods underestimate functional diversity of microbiomes
1.2.6 Function based methods for resistance discovery
1.2.7 Sequence-based methods for resistance discovery
1.2.8 Function-based methods for antimicrobial discovery from microbiomes
1.2.9 Sequence-based methods for antimicrobial discovery from microbiomes
1.2.10 Future directions in antimicrobial and antimicrobial resistance discovery
1.3 Antibiotics and the preterm infant gut microbiome and resistome
1.3.1 Introduction
1.3.2 Techniques for studying development of the infant gut microbiota and the developing resistome
1.3.3 A longitudinal, sequencing based interrogation of the preterm infant gut microbiota
1.3.4 Antibiotics disrupt the preterm gut microbiota richness and composition
1.3.5 Antibiotic therapy enriches for antibiotic resistance genes and multidrug resistant bacteria
1.3.6 The predictable response of the preterm infant gut microbiota to antibiotic therapy
1.3.7 Implications for future research
1.4 Scope of the thesis
1.5 Acknowledgments
Chapter 2: Persistent metagenomic signatures of early life antibiotic treatment in the infant gut microbiota and resistome
2.1 Introduction
2.1.1 Abstract
2.1.2 Introduction
2.2 Results
2.2.1 Metagenomics based analysis of effect of antibiotics on the preterm infant gut microbiota
2.2.3 Antibiotic resistome of preterm infant gut microbiota
2.2.4 Persistence of multidrug resistant Enterobacteriaceae in the preterm infant gut microbiota
2.2.5 Persistent metagenomic signature of antibiotic treatment in premature infants
2.3 Materials and Methods
2.3.1 Sample and metadata collection
2.3.2 Metagenomic DNA extraction
2.3.3 Metagenomic sequencing library preparation
2.3.4 Rarefaction analysis
2.3.6 Construction of metagenomic libraries from infant gut samples for functional selection
2.3.7 Functional selections for antibiotic resistance
2.3.8 Amplification and sequencing of functionally selected fragments
2.3.9 Assembly and annotation of functionally selected fragments
2.3.10 Bacterial isolation from infant stools
2.3.12 Isolate sequencing library preparation
2.3.13 Assembly of isolate genomes
2.3.14 Isolate genomic analysis
2.3.15 Enterococcus biofilm formation assay
2.4 Discussion
2.5 Acknowledgments
2.6 Supplementary Figures
Chapter 3: Tetracycline resistance by inactivation across environmental, commensal, and pathogenic microbes
3.1 Introduction
3.2 Results
3.2.3 TE_7F_3 confers resistance in clinical P. aeruginosa isolate
3.3 Materials and Methods
3.4 Discussion
3.5 Acknowledgments
3.6 Supplementary Figures
4.1 Introduction
4.1.1 Abstract
4.1.2 Introduction
4.2 Results
4.2.1 Tetracycline inactivation by Legionella longbeachae
4.2.2 Tet(50,51,55,56) crystal structures reveal a conserved domain architecture and a dynamic substrate loading channel
4.2.4 Chlortetracycline binds to Tet(50) in an unprecedented mode, driving FAD conversion and substrate loading channel closure
4.2.5 Tetracycline destructases degrade chlortetracycline despite the novel binding mode
4.2.6 Anhydrotetracycline is a mechanistic and competitive inhibitor that locks the FAD in an unproductive conformation
4.2.7 Anhydrotetracycline inhibits tetracycline destructases and functionally rescues tetracycline
4.2.8 The novel inhibition mechanism restores tetracycline activity against tetracycline destructase-expressing bacteria
4.3 Materials and Methods
4.3.1 Legionella plasmid construction
4.3.2 Legionella strain construction
4.3.3 Tetracycline inactivation in Legionella
4.3.4 Cloning, expression and purification of tetracycline-inactivating enzymes
4.3.5 Tet(55) selenomethionine-labelling
4.3.6 Crystallization, data collection, and structure determination
4.3.7 In vitro tetracycline and chlortetracycline inactivation assays
4.3.8 Tetracycline inactivation in E. coli
4.3.9 Kinetic characterization of tetracycline and chlortetracycline inactivation
4.3.10 LC-MS characterization of chlortetracycline degradation products
4.3.11 In vitro characterization of anhydrotetracycline inhibition
4.3.12 Checkerboard synergy assay
4.4 Discussion
4.5 Acknowledgements
4.6 Supplementary Figures
5.1 The effects of early life antibiotic therapy on the preterm infant gut microbiota and resistome
5.2 Emergence of tetracycline resistance by inactivation across habitats
References


List of Figures

Figure 1.1: Common mechanisms of antibiotic action and resistance
Figure 1.2: Next-generation sequencing and functional metagenomic selection accelerate cataloging of known and novel resistance genes
Figure 1.3: Functional metagenomic approaches for characterizing resistomes
Figure 1.4: Recent global dissemination of plasmid-borne resistance
Figure 1.5: Synteny of antibiotic resistance genes provide historical context and foreshadow future threats
Figure 1.6: Integration of next-generation sequencing and screening technologies with drug development and resistance surveillance
Figure 1.7: Timeline of antibiotic deployment and resistance observation
Figure 1.8: Function and sequence-based methods for mining microbiomes
Figure 1.9: Prescription data in the NICU
Figure 1.10: Antibiotic treatment alters preterm infant gut microbiota development
Figure 2.1: Clinical variables including antibiotic exposure predict microbiota diversity
Figure 2.2: Partial architectural recovery of preterm infant gut microbiota following discharge from NICU
Figure 2.3: Preterm infants harbor an enriched gut resistome
Figure 2.4: Multidrug resistant Enterobacteriaceae lineages persist in preterm infant gut microbiota
Figure 2.5: Enduring damage to the preterm infant gut microbiota
Supplementary Figure 2.1: Rarefaction curve of species identified by Metaphlan by sequencing depth
Supplementary Figure 2.2: Temporal changes in relative abundance of top eight classes by group
Supplementary Figure 2.3: Timeline of samples and antibiotic treatments for infants in this study
Supplementary Figure 2.4: Functional selections of 22 metagenomic libraries constructed from infant stool
Supplementary Figure 2.5: Synteny of functionally selected ARGs with mobile genetic elements
Supplementary Figure 2.6: Microbiota with enriched resistome are low in diversity and dominated by Enterobacteriaceae
Supplementary Figure 2.7: Genomic and phenotypic heterogeneity in Enterococcus isolated from infant stool
Figure 3.1: Diversity in tetracycline inactivating enzyme sequence corresponds to microbial habitat of origin
Figure 3.2: Phenotypic profiles of heterologously expressed tetracycline inactivating enzymes correlate with habitat of origin
Figure 3.3: Biochemical basis for tetracycline inactivation by TE_7F_3
Figure 3.4: Anhydrotetracycline inhibits inactivation of next-generation tetracyclines by TE_7F_3
Supplementary Figure 3.1: TE_7F_3 inactivates next-generation tetracyclines by monooxygenation
Figure 4.1: Dose-response curve showing the effect of tetracycline on growth of Legionella strains
Figure 4.2: Crystal structures of Tet(50), Tet(51), Tet(55), and Tet(56) reveal a conserved architecture, structural changes that enable substrate loading channel accessibility, and two conformations of the FAD cofactor
Figure 4.3: Crystal structure of Tet(50) in complex with chlortetracycline reveals an unexpected mode of binding to the tetracycline-binding site that drives substrate loading channel closure and FAD conversion
Figure 4.4: Chlortetracycline is degraded by tetracycline destructases despite the unusual binding mode
Figure 4.5: Anhydrotetracycline binds to the active site of Tet(50), trapping FAD in the unproductive OUT conformation
Figure 4.6: Anhydrotetracycline prevents enzymatic tetracycline degradation, functionally rescuing tetracycline antibiotic activity
Supplementary Figure 4.1: Overall structures and FAD conformation states of Tet(50,51,55,56)
Supplementary Figure 4.2: Tetracycline compounds have a distinctive three-dimensional architecture with a significant bend between rings A and B, allowing for unambiguous modeling into the electron density
Supplementary Figure 4.3: Multiple sequence alignment of Tet(47-56)
Supplementary Figure 4.4: Anhydrotetracycline prevents enzymatic degradation of tetracycline
Supplementary Figure 4.5: Anhydrotetracycline restores tetracycline efficacy against E. coli expressing tet(50,51,55,56)
Supplementary Figure 4.6: Anhydrotetracycline synergizes with tetracycline to kill E. coli expressing tet(50,51,55,56) but not tet(X)
Supplementary Figure 4.7: Model for binding dynamics, substrate plasticity, and inhibition of tetracycline-inactivating enzymes


List of Tables

Table 2.1: Clinical characteristics of infant cohorts analyzed in this study
Supplementary Table 2.1: Assemblies used as outgroups for maximum likelihood phylogenies
Supplementary Table 2.2: Antibiotic concentrations used for functional selections
Table 3.1: Zone of clearance (mm) and phenotypic resistance determination for clinical Pseudomonas aeruginosa isolate bearing TE_7F_3
Supplementary Table 3.1: Primers used in this study
Supplementary Table 4.1: Data collection and refinement statistics
Supplementary Table 4.2: Kinetic parameters for Tet(50,55,56,X)
Supplementary Table 4.3: Relevant strains, plasmid, and primers employed in this study


Acknowledgments

I am indebted to a number of people, without whom my graduate work would not have been possible. I would like to thank members of the Dantas lab past and present. I had the privilege to work with nearly all members of the Dantas lab in some capacity over the past four years, and I am a better scientist because of it. In particular, I would like to thank those with whom I directly collaborated on the tetracycline resistance project, including Kevin Forsberg, Tayte Campbell, Jie Sun, and Luke Diorio-Toth. I would also like to thank those members of the lab who worked on the pediatric microbiota with me or beside me, including Aimee Moore, Molly Gibson, Terence Crofts, Bin Wang, Alaric D’Souza, Drew Schwartz, Eric Keen, and Robert Thänert. Finally, I must thank the members of the probiotics team, including Aura Ferreiro, Nathan Crook, Sherry Sun, and Suryang Kwak. And to all lab members past and present with whom I did not directly collaborate but from whom I most certainly benefited, including Robert Potter, Boahemaa Adu-Oppong, Winston Anthony, Kim Sukhum, Aki Kau, Amy Langdon, Manish Boolchandani, Chris Bulow, and Nick Goldner: thank you for making the Dantas lab an exceptional place to grow professionally and personally.

I would like to especially thank Gautam Dantas for being an incredible mentor. Gautam provided the perfect balance of space when I wanted it and support when I needed it, and allowed me to struggle just enough that I was able to mature as a scientist. Gautam’s enthusiasm for science, genuine care for his lab members, and passion for fun made it easy and exciting to come to lab each day. His scientific vision and rigor are something I hope to emulate in my own career.


In addition to Gautam, I would like to thank Dan Goldberg, Andy Kau, Barb Warner, and Tim Wencewicz for serving on my thesis committee and providing periodic guidance, advice, and mentorship during my PhD work.

I was the beneficiary of an incredible group of collaborators over the course of my PhD. Specifically, it is necessary to thank Barb Warner and Phil Tarr and members of their laboratories for assembling and enabling access to the incredible pediatric cohorts analyzed in Chapter 2 and providing critical clinical insights throughout my work. I’d like to thank Niraj Tolia, Jooyoung Park, and Hirdesh Kumar for serving as prolific and generous structural biology collaborators on the work described in Chapters 3 and 4. Likewise, Tim Wencewicz, Jana Markley, Mars Reck, and Chanez Symister were invaluable partners in the biochemical analysis described in these chapters.

None of the work described herein would have been possible without the incredible community in the Center for Genome Sciences and Systems Biology under the direction of Jeff Gordon. In particular, the computational support provided by Brian Koebbe and Eric Martin, sequencing support provided by Jess Hoisington-Lopez, and administrative support of Bonnie Dee and Keith Page made the CGS a delightful and productive place to spend four years.

My work in graduate school was funded in part by the NIGMS training grant through award number T32 GM007067 (Jim Skeath, Principal Investigator) and from the NIDDK Pediatric Gastroenterology Research Training Program under award number T32 DK077653 (Phillip Tarr, Principal Investigator). Additionally, the work herein was supported by funding sources noted in Chapters 2, 3, and 4.


I’m grateful for the support of my friends, those in St. Louis and those elsewhere, for buoying me throughout my PhD. To those with whom I ran, read, worked, played, ate, and drank: thank you for your immeasurable contributions to my wellbeing.

Andrew J. Gasparrini

Washington University in St. Louis

December 2018


Dedicated to my family:

To my parents,

To Kara, Matthew, Jenna, William and Ben,

And to Sarah.


ABSTRACT OF THE DISSERTATION

Mechanistic characterization of resistome and microbiome dynamics across diverse microbial habitats

by

Andrew J. Gasparrini

Doctor of Philosophy in Biology and Biomedical Sciences

Molecular Genetics and Genomics

Washington University in St. Louis, 2018

Professor Gautam Dantas, Chair

Increasing antibiotic resistance in pathogens is a serious public health challenge, with over two million antibiotic-resistant infections in the United States each year leading to at least 23,000 deaths and an estimated $55 billion in excess healthcare and societal costs. Antibiotic resistance has risen steadily in both pathogenic and benign bacteria since antibiotics’ introduction to agriculture and medicine 70 years ago. A dramatic reduction in the number of antibiotics approved for human use has accompanied this rise in antibiotic resistance, leading to the alarming prospect of a post-antibiotic era. Understanding the evolutionary origins, genetic contexts, and molecular mechanisms of antibiotic resistance in diverse environments is crucial to anticipating and mitigating the spread of antibiotic resistance. It is similarly critical to understand the effect of antibiotic exposure on the microbial communities, such as the human gut microbiota, which serve as reservoirs for antibiotic resistance genes available for horizontal transfer to pathogenic organisms. In this work, I take a multipronged biochemical, microbiologic, and genomic approach to surveil existing and emerging antibiotic resistance threats in environmental and human-associated microbial habitats, and to understand the effects of antibiotic perturbation on microbial community and resistome dynamics.

To understand the effect of antibiotic perturbation on microbial community and resistome dynamics, I examine a population of very and extremely preterm infants who received extensive antibiotic exposure during their hospitalization in the neonatal intensive care unit. I use complementary sequencing and culture-based methods to analyze 448 stools collected longitudinally over the first 20 months of life from 41 preterm infants, as well as 17 antibiotic-naïve near-term infants who served as developmental controls. Using random forests, I define a program of healthy microbiota assembly in antibiotic-naïve infants, and demonstrate acute but transient deviations from this program in antibiotic-treated preterm infants. With a combination of functional metagenomics and sequencing of isolates cultured from infant stool, I show that a consequence of early life antibiotic treatment in preterm infants is prolonged carriage of multidrug-resistant Enterobacteriaceae and an enriched gut-resident resistome.

I address the emergence of antibiotic resistance from environmental and commensal reservoirs to pathogens through the lens of the tetracycline destructases, a family of flavoenzymes capable of inactivating diverse tetracycline substrates. I conduct a bioinformatic survey of functionally selected metagenomes to show that these enzymes are widespread in diverse microbial communities, and characterize their spectrum of activity and mechanism of action to confirm that these enzymes are functionally and biochemically novel and to illuminate sequence-function relationships underlying the family. I identify an inhibitor of these enzymes that prevents tetracycline degradation in vitro and rescues tetracycline activity against resistant bacteria, and show that this inhibitor works by both competing for substrate binding and sterically restricting critical FAD cofactor dynamics. Lastly, I identify a Pseudomonas aeruginosa clinical isolate from a cystic fibrosis patient that encodes a functional tetracycline inactivator with 100% nucleotide identity to one previously identified in our bioinformatic survey, signaling the clinical arrival of this family of enzymes.


Chapter 1: Introduction

1.1 Next-generation approaches to understand and combat the antibiotic resistome

Antibiotic resistance is a natural feature of diverse microbial ecosystems. Although recent studies of the antibiotic resistome have highlighted barriers to horizontal transfer of antibiotic resistance genes between habitats, the rapid global spread of genes conferring resistance to carbapenems, colistin, and quinolones illustrates the dire clinical and societal consequences of such events. Over time, the study of antibiotic resistance has grown from a focus on single pathogenic organisms in axenic culture to a study of antibiotic resistance in pathogenic, commensal, and environmental bacteria at the level of microbial communities. As the study of antibiotic resistance advances, it is important to incorporate this comprehensive approach to inform global antibiotic resistance surveillance and antibiotic development. It is becoming increasingly apparent that although not all resistance genes are at risk of geographic and phylogenetic dissemination, the threat presented by those that are is serious and warrants an interdisciplinary research focus. Here, I highlight seminal work in the resistome field, discuss recent advances in the study of resistomes, and propose a resistome paradigm that can pave the way for improved proactive identification and mitigation of emerging antibiotic resistance threats.

1.1.1 The promise of antibiotics and the threat of resistance

Antibiotics have revolutionized modern medicine and have saved millions of lives since their discovery at the beginning of the 20th century. These crucial drugs are invaluable owing to their ability to kill or inhibit the growth of bacteria without damaging host cells and tissues [2,9].


Aside from directly curing deadly infectious diseases, antibiotics have enabled physicians to pursue new treatments such as surgery, organ transplantation, and cancer chemotherapy that were previously infeasible or unsafe owing to the high collateral risk of infection [10]. Antibiotics remain one of the most commonly prescribed classes of drugs, with over 70 billion clinical doses given globally in 2010, which represents a 36% increase in antibiotic consumption compared to 2000 [11]. Although the first antibiotics were synthetic compounds, most antibiotics are natural products or derivatives thereof [12]. Many antibiotics were recovered from cultured soil microorganisms, particularly those belonging to the Streptomyces genus [10]. As might be expected given their origins as natural products of bacterial secondary metabolism, many antibiotics can be detected at low levels in pristine environmental samples and have been found in samples predating human antibiotic use [13,14].

Because of the ubiquity of antibiotic production by environmental microorganisms, defense mechanisms against these antibiotics and thus resistance also appear to be ancient. Sequencing of ancient environmental DNA preserved in 30,000-year-old permafrost sediments has detected genes that confer resistance to several classes of antibiotics [15], and direct screening of bacteria cultured from a cave that had no anthropogenic contact for the past four million years revealed extensive resistance against fourteen antibiotics [16]. Similarly, the host-associated microbiome of previously uncontacted indigenous people in South America was found to encode several antibiotic resistance genes [17]. This indicates that resistance to antibiotics far predates appropriation of these small molecules for human use.

The collection of resistance genes in a given environment, which is termed the ‘resistome’ [18,19], encompasses both intrinsic and acquired resistance genes, as well as proto-resistance genes and silent or cryptic resistance genes. It is thought that many clinically relevant antibiotic resistance genes have their evolutionary origins in environmental microorganisms, which we define for the purposes of this article to be composed of microorganisms not generally associated with humans. This hypothesis was pioneered by Benveniste and Davies, who in 1973 noted biochemically identical resistance-conferring kanamycin acetyltransferases in environmental Streptomyces and in clinical isolates of pathogenic Escherichia coli [20]. Further evidence was provided by a recent functional screen of soil metagenomes, which revealed the presence of environmental antibiotic resistance genes with >99% nucleotide identity to resistance genes in pathogenic isolates [4]. Synteny of these genes with other resistance genes and mobility elements suggests that they are likely candidates for past or future dissemination by horizontal gene transfer (Fig. 1.1b).

The ancient origin of antibiotics and antibiotic resistance makes it unsurprising that large-scale clinical and agricultural use of these compounds has selected for increasing resistance across the bacterial domain of life. Robust evidence for mounting environmental resistance exists in decades-old soil samples, in which increasing levels of antibiotic resistance genes correlate with industrial-scale production and use of these drugs [21]. The resulting deterioration of the efficacy of our antibacterial armamentarium has come with substantial costs that are expected to rise over the next decades. Worldwide, it has been estimated that at least 700,000 people die annually due to resistant infections, a figure that is predicted to increase to 10 million annual deaths by the year 2050 if the current trajectory is not altered. Over the next decades, the cumulative global cost of antibiotic resistance is predicted to exceed 100 trillion US dollars [22].

The recent spread of mcr-1, kpc, blaNDM-1, qnr and other plasmid-borne resistance determinants underscores two themes. The first is that in the antibiotic arms race against bacteria, humanity is rarely ahead. This was illustrated with the first widely used antibiotics, the sulfa drugs, against which resistance was observed only six years following their mass introduction [1]. More recently, resistance to tigecycline was identified even before its approval for human use, let alone its introduction into the clinic [23]. It is therefore either a sign of hubris or a lack of understanding of evolutionary principles when new antibiotics are advertised as being resistance-proof. The second theme has been our relatively myopic (albeit understandable) focus on resistance in human pathogens. It is a tautology that antibiotics are most effective at treating human pathogens. They would not be antibiotics if they failed to effectively treat pathogens, but our emphasis on a few pathogen ‘trees’ initially resulted in the field of antibiotics being blind to the ‘forest’ of environmental bacteria. The targets of most antibiotics are both highly conserved and essential across bacterial taxa, and have therefore made excellent subjects of intra-kingdom communication and competition among prokaryotes throughout biological history [24]. When our view is expanded to include the vast diversity of non-pathogenic bacteria it is not surprising to see that all genera of the prokaryotic kingdom have evolved resistance mechanisms that protect these vital processes.

Here, I will examine these themes across three eras of antibiotic resistance research. The first era, which coincides with the golden age of antibiotics, is characterized by a focus on resistance in pathogens in the clinic and by the ready supply of new antibiotics and diverse antibiotic mechanisms of action. In this era, drawing inspiration from Koch, the research priority was on single resistance genes and enzymes primarily from pathogenic organisms in axenic culture. In the second era of antibiotic resistance studies, we slipped the restrictive bonds of Koch’s postulates and shifted our attention to resistomes. Antibiotic resistance of both benign and pathogenic bacteria was studied at the microbial community level, and metagenomic approaches were incorporated to discover resistance determinants from diverse habitats. As we enter the next era of antibiotic resistance research, we should acknowledge the ubiquity of antibiotic resistance and take steps to improve development of new antibiotic therapies and surveillance of emerging resistance threats, focusing on resistance genes that are the most likely candidates for horizontal transfer to pathogens. In the remainder of this introduction, I will provide brief overviews of the previous eras of resistome studies, and outline how distinct philosophies from each of these generations could be integrated to improve the translational impact of resistome research going forward.

1.1.2 Common mechanisms of antibiotic action and antibiotic resistance

Antibiotic targets are generally conserved across the bacterial domain of life and are absent from or sufficiently different in eukaryotes. Although hundreds of such targets exist in theory, our current antibiotic arsenal generally attacks the bacterial ribosome (inhibiting protein synthesis, for example, aminoglycosides), cell wall synthesis and lipid membrane integrity (compromising membrane integrity or lysing bacterial cells, for example, β-lactams), single-carbon metabolic pathways (inducing metabolic disruption, for example, sulfa drugs), and DNA maintenance (interfering with bacterial replication and genomic integrity, for example, quinolones) [1,9,25] (see Fig. 1.1, top).

Bacteria typically resort to one of a few common strategies to resist antibiotics [26] (see Fig. 1.1, bottom). One resistance mechanism involves inactivation of the antibiotic through enzymatic degradation or modification of the antibiotic scaffold, which renders it ineffective. Examples of this mechanism include chloramphenicol acetyltransferases [27], tetracycline inactivating enzymes such as Tet(X) [28,29], and, most notably, the widespread β-lactamases [30,31]. A second strategy for resistance is protection, alteration, or overexpression of the antibiotic target. This approach is used by vancomycin-resistant enterococci (VRE), which enzymatically modify peptidoglycan, thus decreasing the affinity of vancomycin for its target [32], and by methicillin-resistant Staphylococcus aureus (MRSA), which expresses a redundant, methicillin-insensitive variant of the native penicillin-binding protein [33]. Two additional resistance mechanisms rely on keeping the antibiotic out of the bacterial cell, either through efflux or altering cell membrane permeability. By expressing either a generalist multidrug efflux pump [34-37] or an antibiotic-specific exporter such as the tetracycline efflux pumps [38], bacteria keep the intracellular concentration of the antibiotic below inhibitory levels. Alternatively, some bacteria reduce cell membrane or cell wall permeability to prevent antibiotics from entering the cell by decreasing porin expression or expressing a more selective porin variant [39,40]. In some cases, bacteria will use several complementary mechanisms to achieve high levels of resistance. For example, high-level carbapenem resistance in clinical isolates of Enterobacter cloacae that lack any known carbapenemase can be achieved through a combination of porin mutations to decrease carbapenem uptake and increased expression of a chromosomal non-carbapenemase β-lactamase [41].

Figure 1.1: Common mechanisms of antibiotic action and resistance. Antibiotics generally function by one of a few common mechanisms (top). These include inhibition of protein synthesis, inhibition of cell wall or membrane synthesis or integrity, disruption of single-carbon metabolism, or disruption of DNA replication and integrity. Likewise, bacteria resort to one of several strategies to resist antibiotics (bottom). These strategies include enzymatic degradation or modification of the antibiotic, alteration or occlusion of the antibiotic target, active efflux of the antibiotic, or reduced permeability of the bacterial cell to antibiotic influx.

1.1.3 The golden age of antibiotics

Both Sir Alexander Fleming and Gerhard Domagk, the discoverers respectively of penicillin and prontosil, warned of the danger of bacterial resistance developing following improper use of their antibiotics42,43. One of the first reports of antibiotic resistance was the finding that a cell-free extract from a coliform bacterium that was impervious to penicillin was able to inactivate penicillin, which indicated enzymatic degradation of the drug44. In the following decades, extensive use of penicillin in hospitals led to widespread resistance in Staphylococcus aureus and resulted in the 1953 penicillin-resistant S. aureus phage-type 80/81 pandemic, which represented the world’s first superbug until the release of methicillin ended the threat45. Soon after, the emergence of multi-drug resistant strains of Shigella spp. following apparent horizontal gene transfer from fecal E. coli strains was reported. Remarkably, there was evidence for such transfers occurring in hospitalized patients during the course of their antibiotic treatment46.

The first era of resistance studies was concomitant with the golden age of antibiotics.

During this period, all modern major classes of antibiotic were discovered. Within years, many were rendered obsolete in at least some organisms through the acquisition and spread of resistance1. For a period, novel or previously known mechanisms of resistance that arose in new pathogens were commonly dealt with through the development and release of a new antibiotic.


Figure 1.2: Next-generation sequencing and functional metagenomic selection accelerate cataloging of known and novel resistance genes. (a) The catalog of resistance genes has increased exponentially in the second era of antibiotic resistome studies owing to improvements in next-generation sequencing. This trend is exemplified by the tetracycline efflux pump tetA, the β-lactamase family of enzymes (bla), and by chloramphenicol acetyl transferase (CAT). Data from the UniProt database5 were accessed October 12, 2016 based on searches using E.C. numbers (bla 3.5.2.6, CAT 2.3.1.28) or by family name (tetA major facilitator superfamily tcr tet family). Data for sequencing costs from National Human Genome Research Institute webpage6. (b) Functional metagenomics is a powerful tool for ascribing antibiotic resistance function to known gene products. Antibiotic resistance proteins identified by functional metagenomic selection of the preterm infant gut microbiome have high amino acid identity to proteins in NCBI, but low amino acid identity to proteins in antibiotic resistance databases. (c) Additionally, functional metagenomics can reveal entirely sequence-novel resistance determinants. β-lactamases that were identified by functional metagenomic selection of two human gut resistomes (red squares) encompassed greater phylogenetic diversity than all previously identified classes of β-lactamases (blue squares). Part b adapted from Gibson et al. 20167, part c adapted from Sommer et al. 20098.

Indeed, resistant bacteria were even welcomed by Domagk as an opportunity to increase antibiotic market share42. During this era, resistance genes and bacteria were largely identified one gene and one strain at a time using culture-based methods. This low-throughput approach, combined with incompletely exhausted reserves of natural antibiotics, disguised the otherwise apparent threat that antibiotic resistant bacteria were destined to become.

1.1.3 High-throughput resistance discovery

The second era of resistome studies has been characterized by a broader ecological view of resistance, which was enabled by several breakthrough technologies. In the past decades, the new ecological resistome philosophy that recognized the environment as the plausible original source of resistance genes and as the reservoir of resistance gene diversity has been driven by the technical advances in molecular biology and nucleic acid sequencing.

Sequencing-based resistance discovery.

The development of genomic techniques that allow the culture-independent study of diverse environmental microbial communities has been a critical driver of the paradigm shift in resistome studies6,47-49. This is illustrated by the growth of the catalog of known resistance genes, concomitant with the steep drop in sequencing costs. A survey of resistance proteins in the UniProt database5 demonstrates that since 1986 there has been an exponential increase in the number of hypothetical resistance determinants classified as β-lactamases, chloramphenicol acetyltransferases, or tetracycline efflux pumps (Fig. 1.2a). The surfeit of sequencing data exemplifies the second era of resistance studies although, in many cases, the lack of functional validation of these genes calls into question whether all are resistance determinants. Similarly, the sequence diversity that is captured using cutting-edge techniques may not translate into functional diversity. One technique that has proven useful in discovery of resistance determinants that are novel, diverse, and validated is functional metagenomics (Fig. 1.3), which is a technique developed to mine metagenomes for a variety of phenotypes without a need for culturing50. Functional metagenomics has proven essential for characterizing antibiotic resistomes. Briefly, this technique consists of cloning metagenomic DNA into expression vectors and screening for a phenotype of interest (for example, antibiotic resistance) in a facile host organism. This bypasses the bottleneck that is imposed by difficult-to-culture bacteria and explicitly selects for genes that are a threat for horizontal transfer, as only phenotypes transferable to a heterologous host are detected. However, resistance to antibiotics that results from mutation of the antibiotic target, an important contributor to resistance in many organisms such as M. tuberculosis51,52, may not always be detected in functional metagenomic screens, as a single wild-type copy of these genes in the expression host often confers a dominant susceptible phenotype. This is the case for quinolone resistance resulting from mutations in the DNA gyrase enzyme51, although analysis of evolutionary patterns in such genes may allow for antibiotic resistance risk estimates to be made53-55. Furthermore, the artificial action of extracting and cloning DNA can lead to the transfer of genes that otherwise might not be mobilizable, which possibly raises concerns over the spread of resistance in cases where actual risk may be low, but may also result in the underestimation of sequences related to mobility elements, which are not explicitly selected for in screens for antibiotic resistance.

Figure 1.3: Functional metagenomic approaches for characterizing resistomes. A graphical overview of the functional metagenomic experimental and computational workflow. Metagenomic DNA is shotgun cloned into an expression system, selected for antibiotic resistance, and resistant transformants are sequenced, assembled, and annotated in high throughput to identify resistance determinants in a sequence- and culture-unbiased manner.

Functional metagenomics is a powerful culture- and sequence-unbiased tool56. A metagenomic library is created by extracting total metagenomic DNA from a microbial community, shearing the DNA to a target size distribution, and cloning these fragments en masse into a screening vector. The metagenomic library is then transformed into a heterologous host and selected for a phenotype of interest, for example by culturing with antibiotics at a concentration that is lethal to the wild-type host49,57 (Fig. 1.3). Pairing functional metagenomic selection with next-generation sequencing and better sequence assembly and annotation algorithms4 has simultaneously decreased the cost and increased the throughput of functional metagenomic experiments, which has facilitated the characterization of resistomes from diverse habitats3,7,17,23,58. Importantly, there is no need to culture members of the target microbial community, and novel resistance genes can be discovered. Furthermore, the genes discovered by functional metagenomics are by definition threats for horizontal transfer, as they must be functional in a heterologous host.

However, there are several limitations. For a gene to be identified in a functional metagenomic screen, it must be functional in a heterologous host. Common hosts such as E. coli likely present a poor system for discovery of resistance genes that originated from Gram-positive organisms. Indeed, poor overlap has been reported between clones recovered from the same functional metagenomic library subjected to the same screen but in different hosts59,60. In addition, it is generally difficult to screen Gram-positive-specific antibiotics in E. coli or other common Gram-negative platforms owing to their innate resistance, although in some cases this may be overcome61. Thus, there is a clear need to develop and screen functional libraries in phylogenetically diverse heterologous hosts56,62. Furthermore, functional metagenomic experiments may oversample resistomes, as the method struggles to distinguish resistance genes that pose clinical threats from those that are unlikely to spread beyond their current environment.

Genes that confer antibiotic resistance when overexpressed in a heterologous host similarly may not be genuine resistance genes in their native contexts55. To reduce the risk of false positives, it is important that ‘novel’ resistance genes are validated using complementary microbiological and biochemical methods and crosschecked against existing databases. Finally, removal of an antibiotic resistance gene from its genomic context often also removes positional information that is pertinent to assessing its risk of dissemination. This can be overcome, in part, by altering insert size, as larger inserts may contain informative regions flanking resistance genes, including mobility elements or plasmid sequences that might indicate risk for spread (that is, the mobilome)3,58,63.

The coupling of functional metagenomic selections with high throughput sequencing and computational methods has facilitated an increase in our ability to catalog resistomes from diverse habitats. Importantly, functional metagenomics can both ascribe resistance function to known genes that were not previously identified as resistance determinants and uncover resistance genes with novel sequences. For example, a recent analysis of functionally discovered resistance genes found that, whereas many of the gene products had high amino acid identity to known enzymes, most had low identity to known resistance determinants7 (Fig. 1.2b). Similarly, the utility of functional metagenomics in uncovering entirely new resistance functions is highlighted by its recent application to discover novel rifamycin phosphorylases64, dihydrofolate reductases65, and tetracycline-inactivating enzymes29.
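The comparison described above rests on amino acid identity between a functionally selected protein and its closest database match. The following is a minimal sketch of how percent identity might be computed from an already-aligned pair of sequences. The function name, the toy sequences, and the choice to exclude gapped columns from the denominator are illustrative assumptions and are not taken from the cited analysis; other tools define the denominator differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes the two strings come from a pairwise alignment, so they have the
    same length and use '-' as the gap character (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns in which the residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
query_fragment = "MKT-LLVA"
database_fragment = "MKTALLIA"
identity = percent_identity(query_fragment, database_fragment)
```

Under this definition, a protein could score high against a general protein database and low against a resistance-specific database if its closest relatives were never annotated as resistance determinants.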


The resistomes of different habitats.

Functional metagenomics was first described in 2000 as a method for accessing the diverse biochemical activities encoded in the uncultured soil metagenome50. In an early application of functional metagenomics for the explicit purpose of screening for antibiotic resistance genes, a library, which contained 5.4 Gb of metagenomic DNA from a remnant oak savannah site, was selected against a panel of aminoglycoside antibiotics, as well as tetracycline and nalidixic acid66. Ten resistant clones were sequenced, and all but one were found to contain novel resistance determinants. Since these initial studies, functional metagenomics has been widely used to screen a diverse array of soil metagenomes, including remote arctic soils67-69, apple orchard soils70, alluvial soils71,72, wetland soils73, agricultural soils58,65, and urban soils74, and has identified novel antibiotic resistance genes to diverse classes of antibiotics. Agricultural resistomes are of particular interest because the selective pressure of heavy antibiotic use and extensive anthropogenic interaction make this a high-risk habitat for antibiotic resistance selection and dissemination. Indeed, the mcr-1 plasmid-mediated colistin resistance determinant was first discovered on a Chinese pig farm75, although it has since been detected in new and banked clinical and environmental samples across the globe76-84.

Global spread of plasmid-borne resistance.

In the past several decades, a variety of new antibiotic resistance threats have emerged and spread worldwide. In particular, the emergence and spread of plasmid-borne resistance genes (Fig. 1.4), which pose serious threats for horizontal gene transfer, have jeopardized many last-line antibiotic therapies, including the cephalosporin and carbapenem families of β-lactam antibiotics, quinolones such as ciprofloxacin, and an antibiotic of last resort, colistin12. Cephalosporins have been compromised by resistance owing to extended-spectrum β-lactamases (ESBLs), which were first reported in 197985 and subsequently spread phylogenetically across the Enterobacteriaceae and geographically across the globe86. As a result, many infections that were previously treated by drugs such as cefotaxime must now be treated with the carbapenem class of antibiotics. However, the spread of carbapenemases87, particularly the New Delhi metallo-β-lactamase-1 (NDM-1)88-90 and Klebsiella pneumoniae carbapenemases (KPC)91 since their respective discoveries in 2006 and 2001, has similarly rendered these antibiotics vulnerable92.

Figure 1.4: Recent global dissemination of plasmid-borne resistance. Resistance genes encoded on plasmids represent particular threats for broad dissemination. In recent years, resistance genes to quinolones, carbapenems, and polymyxins have been observed in clinical isolates worldwide not long after their first identification. Location of first report is indicated as a square, and locations of subsequent identifications are indicated as circles.

A similar development has been observed with the quinolones, which were originally touted to be at low risk of the type of transferable resistance that has beleaguered the β-lactam antibiotics. Indeed, whereas chromosomal resistance can readily develop to this class of antibiotic through DNA gyrase mutations, transferable resistance was slow to develop following clinical introduction of these antibiotics in the 1960s. Nonetheless, plasmid-mediated resistance through the Qnr proteins, first described in 1998, has emerged from quinolone overuse and subsequently spread worldwide. In searching for the origins of Qnr resistance determinants, a PCR screen of Gram-negative organisms found homologs in the aquatic bacterium Shewanella algae with almost 99% amino acid identity93. Shewanella species that lack these homologs were found to have 4- to 8-fold greater sensitivity to quinolone antibiotics, which indicates the ability of the qnr-related genes to confer some antibiotic resistance even in their native contexts, in which they likely function to bind DNA for some other purpose9,93. The hypothesized horizontal transfer of qnr homologs from environmental to pathogenic bacteria corroborates hypotheses that many resistance genes may have been coopted by pathogens from the environmental resistome4,20,93,94. Quinolone efficacy has been further reduced by the emergence in Shanghai in 2003 of the plasmid-borne bi-functional aminoglycoside and quinolone resistance gene aac(6’)-Ib-cr95. This gene, which differs from a classical aminoglycoside acetyltransferase by only two base-pairs, is notable in its ability to modify fluoroquinolones as well as aminoglycosides through acetylation. Alarmingly, aac(6’)-Ib-cr is often colocalized with other resistance genes such as ESBLs96 on multi-drug resistance plasmids.

The spread of the aforementioned resistance mechanisms leaves only a few classes of antibiotics approved for the treatment of infection by carbapenem-resistant Enterobacteriaceae. One of these, the polymyxin class of peptide antibiotics, was first used in the 1960s, but was abandoned in favor of other antibiotics owing to its toxicity. Recently, the polymyxin colistin has been revived owing to its efficacy against several multidrug resistant Gram-negative pathogens97. However, transferable resistance has now begun to threaten colistin. In 2015, an E. coli isolate from a pig farm in Shanghai with colistin resistance was found to contain mcr-1, a plasmid-borne gene that encodes a phosphoethanolamine transferase75. This finding, and the subsequent discovery of mcr-1 on plasmids worldwide76-84, foreshadows the fall of colistin as the final antibiotic of last resort without transferable plasmid-encoded resistance. With high levels of antibiotic prescription in medicine and unregulated use in agriculture around the world, it is likely only a matter of time before completely pan-resistant plasmids are found that combine the above resistance mechanisms. Indeed, the coexistence of NDM-1 and MCR-1 in Enterobacteriaceae has already been observed80,98, and a recent report documented the co-transfer of mcr-1 and ndm-5 on a mobile hybrid IncX3-X4 plasmid99.

In addition to soil habitats, human-associated resistomes have been a recent focus of functional metagenomic investigations. The first human-associated microbial community to be queried for antibiotic resistance by functional metagenomic selection was the oral microbiome.

Screening of a small functional library constructed from dental plaque and saliva samples against tetracycline resulted in the discovery of tet(37), a gene encoding a tetracycline-inactivating enzyme with low sequence identity to any previously described tetracycline resistance determinant, as well as other resistance genes100. A larger follow-up study revealed resistance against tetracycline, gentamicin and amoxicillin in the oral microbiota101. The first report of a functional metagenomic interrogation of the human gut resistome screened metagenomic DNA from the saliva and stool of two healthy humans against a panel of 13 antibiotics. Ninety-five unique inserts, which together conferred resistance to all 13 antibiotics screened, were sequenced and found to be largely dissimilar from the closest known resistance genes. For example, ten novel β-lactamases were identified in just two gut microbiomes, indicating that these microbiomes encompassed greater phylogenetic β-lactamase diversity than all previously described β-lactamases (Fig. 1.2c)8. Additional studies have examined the gut microbiomes of diverse human populations, including healthy adults102, healthy twins and their mothers23,103,104, preterm infants7, uncontacted antibiotic-naïve Amerindians17, and families, as well as livestock and pets, from developing countries in Latin America3. Importantly, these studies have made it clear that although the resident human-associated resistome is enriched after antibiotic therapy7, the gut harbors an extensive resistome even in the absence of anthropogenic antibiotic use3,17.

Limitations of resistome studies.

Although these studies are extremely valuable in cataloging the genetic diversity of the global resistome, they have been largely descriptive or philatelic exercises with substantial requirements for downstream mechanistic validation. It is important to emphasize that just because a gene confers resistance when it is overexpressed in a heterologous host, it may not be a bona fide resistance gene in the original host. Indeed, nuanced definitions of antibiotic resistance should consider the function of resistance determinants in their native contexts and risk for transmission between bacterial species55. There is extensive evidence that resistome composition is strongly correlated with the phylogenetic structure of a microbial community, and that synteny of antibiotic resistance genes with mobility elements is observed less frequently in environmental metagenomes (for example, from soil) than in sequenced pathogen genomes58,105 (Fig. 1.5b).

Nonetheless, the discovery of identical resistance genes in multiple genomic and environmental contexts suggests historical transfer and/or dissemination events. For example, a recent functional metagenomic study uncovered an identical TEM-1 β-lactamase in 25 different habitats and genetic backgrounds3 (Fig. 1.5a). This research, when viewed in light of recent emerging resistance threats, underscores that, although not all resistance genes are threats for widespread dissemination, we should pay close attention to the ones that are. A focus of future studies on antibiotic resistomes should be a quantification of risk for horizontal gene transfer to pathogens based on ‘metadata’ associated with a given resistance gene (for example, synteny with mobility elements, plasmid-borne or chromosomal, in a close relative of human pathogen) to model the risk for transmission54,55.

Figure 1.5: Synteny of antibiotic resistance genes provides historical context and foreshadows future threats. The genetic context of antibiotic resistance genes can provide important evidence of the likelihood of past or future horizontal gene transfer. (a) A single β-lactamase (TEM-1) was found in 25 different genetic contexts in a recent cross-habitat resistome study, which provided a historical record of past mobility. Five contexts are shown here. Habitats in which the gene was discovered are indicated on the left (human: human-associated microbial community, animal: animal-associated microbial community, WWTP: wastewater treatment plant, latrines: composting latrine). (b) Four resistance determinants that were discovered in functional selections of soil metagenomic libraries (bottom) have high identity with resistance genes from clinical pathogenic isolates. Shading indicates >99% nucleotide identity, which suggests recent horizontal transfer. Resistance genes in pathogens are syntenic with mobility elements and other resistance genes, which suggests that they may be present on a mobile multidrug resistance cassette and pose a threat for horizontal gene transfer. Part a adapted from Pehrsson et al 20163 and part b from Forsberg et al 20124.


1.1.4 Next-generation approaches to resistance

Strategies to combat antibiotic resistance will require substantial changes to the antibiotic pipeline, to stewardship practices, and to antibiotic resistance surveillance. The ubiquity of antibiotic resistance suggests that the drug development pipeline must include proactive screens for existing environmental resistance threats to new drugs and should explore alternative strategies for combating resistance outside of the traditional antibiotic pipeline. Additionally, it is the responsibility of the wider medical and scientific communities to apply the fruits of the first two eras of resistome studies towards efforts to surveil potential antibiotic resistance hotspots for emerging resistance threats before they spread.

Next-generation therapeutics.

A key aspect for the next-generation of antibiotic resistance studies is the proactive development of therapeutic strategies to counteract emerging antibiotic resistance. One side of this approach involves improved development of new antibiotics with a greater focus on resistance potential. Current antibiotic discovery and development pipelines only take into account resistance mechanisms that are already prevalent in the clinic. In retrospect, we know that this strategy is doomed to fail, as resistance has been observed for every antibiotic that has been implemented for human use1. Going forward, we should improve on the current reactive strategies by screening promising lead compounds for resistance against diverse functional metagenomic libraries. This will illuminate existing mechanisms of resistance that are likely to disseminate to the clinical setting upon the introduction of the antibiotic for clinical use (Fig. 1.6).

In the near future, our understanding of the resistome will be sufficiently mature to enable the development of strategies that actively combat resistance mechanisms themselves, in contrast to current strategies that side-step resistance through the expensive development of new antibiotics that immediately select for new mechanisms of resistance. This approach, in our opinion, provides a higher likelihood of success than searching for a fabled ‘silver bullet’ antibiotic to which resistance is unlikely to develop. One such strategy is the repurposing of existing antibiotics in synergistic combinations. Consider, for example, the triple β-lactam combination of meropenem, piperacillin, and tazobactam, which was found to synergistically kill methicillin-resistant Staphylococcus aureus (MRSA) N315 in vitro and in a mouse infection model. The combination of these three antibiotics was collaterally sensitive and thus suppressed the evolution of further resistance106. Treatment regimens that induce collateral sensitivity not only enable the elimination of resistant bacteria, but also subject resistant bacteria to selective pressure to discard their resistance gene107. Inverting the selective advantage of antibiotic resistance is a promising approach for slowing the rate at which resistance evolves in a population. A recent highlight in the search for selection-inverting compounds involved a screen of nearly 20,000 small molecules for compounds that select against the tetracycline efflux pump TetA. This screen identified two candidates that successfully selected for loss of the tetA gene108. Perhaps more important are drugs under development and in early phase testing that target AcrA-AcrB-TolC type efflux complexes in Gram-negative bacteria. Unlike TetA proteins, which are tetracycline-specific, AcrA-AcrB-TolC pumps have broad substrate tolerance and can confer multi-drug resistance, in particular in Gram-negative bacteria109.

A final strategy that has already proven effective in opposing widespread resistance is the inhibition of antibiotic resistance enzymes110. Small molecules that lack antibiotic activity but enhance the efficacy of antibiotic therapy are broadly termed antibiotic adjuvants111. β-lactamase inhibitors and Tet(A) efflux selection inversion108,110,112 fall in this category, as do promising examples from recent studies that paired antibiotics with repurposed drugs already on the market for other indications, such as the opioid-receptor agonist loperamide, which facilitates tetracycline uptake, and the ADP receptor antagonist ticlopidine, which inhibits teichoic acid biosynthesis10,111,113,114. Continued discovery of antibiotic adjuvants is facilitated and informed by resistome studies that comprehensively describe the universe of possible resistance genes.

Consider, for example, the recently described E. coli-hosted array of >40 mechanistically distinct and functionally validated resistance genes. This platform, which represented resistance to 16 classes of antibiotics, was used to screen a limited small molecule library for compounds that could enhance the antibiotic activity of gentamicin. Of the 27 compounds that exhibited this activity, eight were found to block aminoglycoside-2″-O-nucleotidyltransferase (ANT(2″)-Ia) mediated resistance. Two compounds selected for further validation exhibited poor antibiotic activity alone and were confirmed to inhibit ANT(2″)-Ia in vitro. By screening drug combinations against a phenotypic resistome array, the authors were able to identify inhibitors of antibiotic resistance genes, demonstrating the utility of leveraging resistome data to discover adjuvants that can prolong the efficacy of existing antibiotics115.

Next-generation surveillance.

Lastly, it is important to prioritize continued and increased surveillance of the antibiotic resistome, in particular, ‘hotspots’ in which there is a high likelihood of resistance gene evolution and transfer between organisms (Fig. 1.6). These hotspots include, but are not limited to, agricultural settings in which large quantities of antibiotics and human pathogens as well as environmental microorganisms intermingle and potentially exchange DNA75, hospitals in which selective pressure is omnipresent due to extensive use of antibiotics and antimicrobials, which may co-select for antibiotic resistance46, and waste-water and sewage systems in which the human gut microbiota is directly exposed to many pharmaceutical residues3. Surveillance requires regular testing of pathogens, commensals, and microbial communities for resistance to a broad panel of antibiotics. This can occur using culture-dependent or genomic approaches, and necessarily entails the continued development of technologies that will facilitate identification of antibiotic resistance determinants.

Recent computational and bioinformatic advancements have offered substantial improvements in our ability to identify antibiotic resistance genes from PCR amplicons and whole-genome sequencing (WGS)49. Antibiotic resistance gene databases contain nucleotide or protein sequences for known antibiotic resistance determinants, and can be queried (by BLAST116, for example) to identify the antibiotic resistance genes that are present in a given genome or metagenome.
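As an illustration of the database query described above, the sketch below parses BLAST tabular output (the standard twelve-column `-outfmt 6` layout) and keeps the best-scoring database hit per query sequence. The identity and E-value cutoffs, the function name, and the example contig and gene identifiers are assumptions chosen for illustration; they are not thresholds or data from the studies cited here.

```python
import csv
import io
from typing import Dict

# Standard column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_resistance_hits(
    blast_tsv: str,
    min_identity: float = 80.0,   # assumed cutoff, percent identity
    max_evalue: float = 1e-10,    # assumed cutoff
) -> Dict[str, dict]:
    """Return the highest-bitscore hit per query that passes both cutoffs."""
    best: Dict[str, dict] = {}
    reader = csv.DictReader(
        io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        pident = float(row["pident"])
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if pident < min_identity or evalue > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[row["qseqid"]] = {
                "subject": row["sseqid"],
                "pident": pident,
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical hits for two contigs against a resistance gene database.
example_rows = [
    ["contig_1", "blaTEM-1", "99.5", "286", "1", "0", "1", "286", "1", "286", "1e-150", "580"],
    ["contig_1", "blaSHV-1", "68.0", "280", "90", "2", "3", "282", "5", "284", "1e-90", "300"],
    ["contig_2", "tetA", "45.0", "390", "210", "6", "1", "390", "1", "395", "1e-20", "120"],
]
example_tsv = "\n".join("\t".join(fields) for fields in example_rows)
hits = best_resistance_hits(example_tsv)
```

In this toy input only `contig_1` receives an annotation, because the single `contig_2` hit falls below the assumed identity cutoff. A stricter or looser cutoff trades missed divergent genes against spurious annotations.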

Figure 1.6: Integration of next-generation sequencing and screening technologies with drug development and resistance surveillance. The wide application of functional metagenomic selections and next generation sequencing will steadily increase our knowledge of functionally annotated antibiotic resistance genes and their genetic context. This will allow for more intelligent antibiotic resistance surveillance and drug design by accounting for specific mechanisms of resistance and risk of resistance spread.


However, these functional predictions are explicitly based only on sequence data and are vulnerable to false positive predictions, particularly when functional assignments are based on annotations that are themselves removed from direct biochemical or phenotypic validation. A key component of our ability to annotate and predict antibiotic resistance genes from genomic data is the continued improvement of antibiotic resistance gene databases through functional validation117-122. In addition, improved probabilistic annotation algorithms such as profile hidden Markov models (HMMs) have been shown to perform substantially better than pairwise sequence alignment in terms of precision and recall123,124. For example, Resfams, a curated database of 166 profile HMMs that were constructed from functionally validated resistance protein families, correctly predicted 45% more β-lactamases in functionally selected metagenomic contigs compared to BLAST alignment to antibiotic resistance databases122. In another case, nucleotide sequences from metagenomic datasets were used to computationally predict qnr genes that might confer quinolone resistance. Although qnr genes display low similarity to other resistance determinants and relatively high in-family diversity, the authors were able to discover and functionally verify several novel variants125,126. As a result of these tools, we are now able to more confidently identify both transferable and intrinsic resistance determinants from genomes and metagenomes. Going forward, it is important that these databases are regularly updated with the nucleotide and amino acid sequences of functionally validated, emerging resistance determinants (Fig. 1.6).
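Precision and recall, the two metrics used above to compare annotation methods, can be computed directly from a set of predicted genes and a set of functionally validated genes. The sketch below uses invented gene identifiers and invented prediction sets purely to show the arithmetic; it does not reproduce any result from the cited benchmarks.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], validated: Set[str]) -> Tuple[float, float]:
    """Precision = TP / predicted; recall = TP / validated (0.0 if a set is empty)."""
    true_positives = len(predicted & validated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    return precision, recall


# Toy data (hypothetical): four functionally validated resistance genes.
validated_genes = {"gene_1", "gene_2", "gene_3", "gene_4"}

# Hypothetical calls from two annotation approaches.
pairwise_calls = {"gene_1", "gene_2", "gene_9"}
profile_calls = {"gene_1", "gene_2", "gene_3", "gene_9"}

pairwise_precision, pairwise_recall = precision_recall(pairwise_calls, validated_genes)
profile_precision, profile_recall = precision_recall(profile_calls, validated_genes)
```

In this toy case the second method recovers one additional validated gene without adding false calls, so both its precision and its recall are higher. Real comparisons depend entirely on the validated reference set used.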

Despite these advancements, the prediction of resistance phenotype based on genotype remains a challenge. WGS-based methods present a promising approach to improve the speed and accuracy of diagnostics127-129. An additional advantage of WGS-based diagnostics is the possibility for improved identification of antibiotic resistance by target modification, which is often not feasible by existing PCR-based or microarray-based methods129-131. A current highlight of sequencing-based diagnostics is the Mykrobe predictor, which identified antibiotic susceptibility profiles for twelve antibiotics from raw, unassembled Illumina sequencing data from Staphylococcus aureus and Mycobacterium tuberculosis isolates as well as from in silico simulated heterogeneous populations with high sensitivity and specificity132. A recently described alternative approach uses draft genome assemblies as input and a machine learning approach to accurately predict resistance profiles for multi-drug resistant Enterobacteriaceae isolates124. Although it is currently cost- and time-prohibitive to use WGS as a clinical diagnostic tool, continued reductions in sequencing costs and improvements in computational algorithms may soon make this approach feasible. Faster and more accurate diagnostics will reduce empiric and unsuitable antibiotic prescriptions, limiting further selection for resistance (Fig. 1.6).

Two recent examples from the literature highlight the importance of surveillance efforts as we attempt to arrest the spread of antibiotic resistance. These examples pertain to the current antibiotics of last resort against carbapenem-resistant Enterobacteriaceae: colistin and tigecycline. The first is the emergence of the mcr-1 gene as a transmissible threat to colistin use.

The seminal example of a plasmid-borne colistin resistance gene was found through an active surveillance programme in China monitoring antimicrobial resistance in food animals75. Soon after its discovery, several retrospective studies demonstrated that plasmid-borne mcr-1 had already spread worldwide and into the clinic133 (Box 2). In this case, surveillance provided crucial early detection. The second example of the importance of surveillance relates to tigecycline, a semi-synthetic tetracycline antibiotic134. Due to its increased affinity for the bacterial ribosome compared to other tetracyclines, and the presence of a glycyl tail, tigecycline is subject to reduced resistance through ribosomal protection or drug efflux135. However, tigecycline is vulnerable to the tetracycline-inactivating protein Tet(X)136. Although this protein was originally discovered in Bacteroides fragilis, a traditionally non-pathogenic human gut commensal, several clinical isolates have recently tested positive for tet(X)137,138, and tetracycline inactivation has been discovered to be more widespread in environmental metagenomes than previously thought29. Continued surveillance of this apparently emerging resistance mechanism is prudent, as tigecycline and other next-generation tetracycline derivatives are being designed exclusively to evade resistance through ribosomal protection and efflux139,140. Increased use of next-generation tetracyclines imposes a strong selective pressure for the acquisition and spread of tetracycline-inactivating enzymes from the environmental resistome. Foresight of this event should enable the proactive development of next-generation therapeutic strategies before the widespread dissemination of a resistance threat.

1.1.5 Conclusions

Extensive study of the antibiotic resistome over the past decades has allowed us to begin to understand and address existing antibiotic resistance threats. However, it is evident that antibiotic resistance will continue to evolve and spread in spite of our best efforts to develop new antibacterial agents. Unless current practice is changed, it will be increasingly difficult to establish and maintain even a transient lead over bacteria in this arms race. As research into antibiotic resistance expands, it is important to adopt an explicitly proactive approach to antibiotic resistance identification and surveillance, as well as antibiotic therapy development.

This proactive approach involves using a combination of functional metagenomics, next-generation sequencing, and cutting-edge computational methods to monitor evolution and dissemination of resistance before a given resistance determinant emerges in a pathogen or in the clinical setting, as well as proactively developing next-generation therapies that target these resistance determinants. The spread of plasmid-borne carbapenem, quinolone, and polymyxin resistance in recent years is a sobering reminder of our need to catch up to existing threats, and to anticipate emerging resistance mechanisms before they circulate widely in the clinical setting if we wish to alter the current grim antibiotic resistance trajectory. Recent advances in the field highlight the promise that the next generation of resistome studies holds for characterizing and countering emerging resistance threats.


1.2 Genomic and functional techniques to mine the microbiome for novel antimicrobials and antimicrobial resistance genes:

Microbial communities contain diverse bacteria that play important functional roles in every environment. Advances in sequencing and computational methodologies over the past decades have illuminated the phylogenetic and functional diversity of microbial communities from diverse habitats. Among the functions encoded in microbiomes are the abilities to synthesize and resist small molecules, yielding antimicrobial activity. These functions are of particular interest when viewed in light of the public health emergency posed by the increase in clinical antimicrobial resistance and the dwindling antimicrobial discovery and approval pipeline, and given the intimate ecological and evolutionary relationship between antimicrobial biosynthesis and resistance. Herein, I review genomic and functional methods that have been developed for accessing the antimicrobial biosynthesis and resistance capacity of microbiomes and highlight outstanding examples of their applications.

1.2.1 Introduction

Nobel Laureate Joshua Lederberg was the first to propose the term “microbiome”, which he defined as “the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space.”141 While Lederberg coined the term in reference to the microorganisms that inhabit human body sites, microbial communities can be associated with any environmental, clinical, or engineered habitat. Concurrent with the expansion in sequencing capacity and the widespread adoption of “omics” technologies, the term microbiome was redefined as the collective genetic material of a microbial community, while “microbiota” is generally used to refer to the microorganisms themselves.142 Initial studies of microbial communities attempted to characterize the phylogenetic and functional diversity by culturing the organisms and performing targeted functional assays. However, culture-based assays are inherently biased, as a large fraction of microbes from most habitats are not readily cultured using standard methods.94 Fortunately, metagenomic sequencing of microbial communities, facilitated by the revolution in genomics over the past several decades, has enabled increased investigation and appreciation of the breadth and depth of functions encoded in diverse microbiomes.

The study of antimicrobial resistance and biosynthesis in microbial communities is motivated by the ubiquity and diversity of microbes in wide-ranging habitats, the evolutionary and ecological origins of antimicrobial resistance, and the global health crisis posed by increasing antimicrobial resistance. The human commensal microbiota harbors approximately 10 trillion microbes,143,144 while the diversity of the soil microbiota has been estimated to be as high as 10,000 unique species per gram.145 Both culture-based and culture-independent analyses of these and other microbial communities have revealed biosynthetic pathways for small molecules with antimicrobial activity146,147 as well as abundant and diverse antimicrobial resistance genes.148-150 The collection of resistance genes in a given microbiome, known as the antimicrobial “resistome”, consists of acquired, intrinsic, proto, and silent resistance genes.19,151

It is important to characterize resistomes and biosynthetic pathways from varied microbial communities in order to better understand and combat the rise of clinical antimicrobial resistance.149 This is ideally achieved by pairing functional and sequence-based genomic technologies with traditional culture-based analyses.

This review is intended to provide an overview of functional and sequence-based methods that have been developed for discovery of antimicrobial biosynthesis pathways and resistance genes from microbial communities. We will critically evaluate these methods and highlight outstanding examples of their applications, with a specific focus on the advantages and disadvantages that each method provides.

1.2.2 Rise in Antimicrobial Resistance and Decline in Antimicrobial Discovery

The World Health Organization has described antimicrobial resistance as the single greatest challenge in infectious disease today, posing a serious public health threat on a global scale.152 Clinical and environmental antimicrobial resistance have been increasing since the widespread anthropogenic deployment of antimicrobials. Indeed, clinical resistance is now frequently observed within a year of deployment of a new antimicrobial (Fig. 1.7).1,2 Furthermore, PCR-based surveys of archived soils have revealed that the prevalence of environmental antimicrobial resistance genes has been on the rise since 1940.21,94 In the USA alone, 2 million people each year acquire serious infections that are resistant to the antimicrobials designed to treat those infections, resulting in an estimated 23,000 deaths.153 Incidences of multidrug resistance in human pathogens are also increasing.153 The proportion of clinical isolates that are resistant to antimicrobials, including deadly MRSA (methicillin-resistant Staphylococcus aureus), VRE (vancomycin-resistant Enterococcus), and FQRP (fluoroquinolone-resistant Pseudomonas aeruginosa) strains, has been steadily increasing since the 1980s.154 The economic burden of antimicrobial resistance is also tremendous, with an estimated 8 million additional hospital days resulting in $20 billion in direct healthcare costs and $35 billion in lost productivity in the US alone.153,155 If the current trajectory is not altered, it is estimated that worldwide deaths from antimicrobial resistance could climb to 10 million each year by 2050.22


Compounding the problem posed by antimicrobial resistance, the “golden age” of antimicrobial discovery (ca. 1940-1960) was followed by a substantial innovation gap spanning 1962-2000, in which no new classes of antimicrobials were introduced (Fig. 1.7).2,156 Only three new antimicrobials were approved for human use during the first decade of the 21st century.157

The decrease in antimicrobial discovery can be partially attributed to low incentives for research and development of new drug classes.152 While most antimicrobial products currently in the pipeline are active against at least some of the pathogens identified by the FDA as posing a ‘serious threat to public health’, barely a third (16 drugs) show significant activity against multidrug resistant Gram-negative species.152 The issue is complicated by the collateral damage from excessive antimicrobial use: when an antimicrobial is administered clinically or agriculturally, it provides selective pressure for increased antimicrobial resistance in all microbes in those habitats, not simply the targeted pathogen(s), and thus has a negative societal impact.158 To counter the withdrawal of pharmaceutical companies from the antimicrobial market, the US Congress is considering legislation to create a streamlined antimicrobial approval pathway. This would approve drugs to treat life-threatening infections based on smaller (and hence faster) clinical trials, provided that they meet the same Phase I safety and efficacy standards as other drugs.159 While these measures are being considered, it is imperative that we continue to better understand the ecological and evolutionary dimensions of antimicrobial production and resistance, to support the improved stewardship of both our current and future arsenals of antimicrobial compounds.

1.2.3 Brief History of Antimicrobials

Antimicrobials are defined as chemicals that either kill (bactericidal) or inhibit the growth of (bacteriostatic) bacteria at defined concentrations, and function by targeting systems critical to bacterial physiology. The five major modes of antimicrobial action are: (i) inhibition of cell wall synthesis; (ii) inhibition of protein synthesis via the bacterial ribosome; (iii) inhibition of DNA or RNA synthesis; (iv) inhibition of the folic acid pathway of nucleic acid synthesis; and (v) disruption of cell membrane integrity.2,26,160-162

Antimicrobials have likely been naturally produced by environmental microbes as a means for communication and defense for billions of years.163-165 Residues of antimicrobials were discovered in human skeletal remains dating from 350-550 CE, suggesting that humans have been exposed to antimicrobials for thousands of years.9,166,167 However, it was not until the 20th century that natural-product antimicrobials were deliberately repurposed for human use, revolutionizing our treatment of infectious diseases. Up to the 1920s, infectious diseases were the leading cause of death, with pneumonia, influenza, gastrointestinal infections, and tuberculosis accounting for a combined 540 deaths per 100,000 deaths in the US during the first decade of the 20th century.168 Ehrlich’s discovery of arsphenamine, an organoarsenic compound that could cure syphilis-infected rabbits,9,169 and Fleming’s subsequent serendipitous discovery of penicillin, a β-lactam,170 signaled the advent of the antimicrobial era. The “golden age” of antimicrobial discovery began in the 1940s and lasted through the 1960s, fueled by Selman Waksman’s rapid antimicrobial discovery platform.1,171 Waksman was awarded the Nobel Prize for the development of this method,172 and the drugs developed during the golden age of antimicrobial discovery substantially improved the treatment of infectious disease.

Many natural-product antimicrobials were found to have pharmacological or toxicological drawbacks, or eventually became obsolete due to evolution and spread of resistance.10 Synthetically altering natural product antimicrobials to improve pharmacology or evade resistance launched a new era of medicinal chemistry. Many antimicrobials were synthesized by modifying molecular scaffolds of antimicrobials previously discovered during the golden era. As a result, amphenicol, tetracycline, aminoglycoside, macrolide, glycopeptide, and quinolone classes were introduced or expanded, further improving our treatment of infectious diseases.1,9,156,173,174

1.2.4 Brief History of Antimicrobial Resistance

The introduction of antimicrobials to the clinic resulted in a precipitous drop in mortality worldwide from infectious disease.175 However, the selective pressure placed on bacterial systems targeted by antimicrobials due to the extensive use of antimicrobials in clinical and agricultural settings soon led to the steady evolution and increase of antimicrobial resistance,13,25,26,176 systematically compromising each of the antimicrobials in our arsenal (Fig. 1.7).

The strategies that bacteria use to resist antimicrobials can be divided into four general categories: (i) drug efflux, (ii) reducing permeability of cell wall or membrane, (iii) target overexpression, modification or protection, and (iv) enzymatic inactivation of the drug.26,177 In the first resistance mechanism of drug efflux, bacteria pump out the antimicrobial, keeping the intracellular concentration low and preventing the drug from reaching inhibitory concentrations. This mechanism encompasses both efflux pumps that are specific to single antimicrobials or classes, such as the tetracycline efflux pumps,38 and nonspecific, multidrug resistance efflux pumps.34-37 The second resistance strategy, thematically related to the first, involves reducing the permeability of the cell wall or cell membrane to the antimicrobial, thereby preventing it from reaching its target. This can occur via reducing porin expression or expressing a more selective porin variant.39,40 The third resistance strategy is overexpression, modification, or protection of the drug target in the bacterium, enabling survival. This mechanism is used by methicillin-resistant Staphylococcus aureus (MRSA), which expresses PBP2a, a redundant and methicillin-insensitive version of the native PBP2 penicillin-binding protein, encoded by the mecA gene.33 The final (fourth) resistance strategy involves enzymatic inactivation of the drug, by either degrading or modifying the antimicrobial compound such that it no longer has activity. Examples of this mechanism include β-lactamases, which degrade β-lactam antimicrobials,30,31 tetracycline destructases such as Tet(X),28,29 and chloramphenicol acetyltransferases.27

While we primarily evaluate antimicrobial resistance in terms of its effects on clinical use of antimicrobials, the capacity to resist such compounds is in fact a natural and ancient feature of all microbial communities.15,16 It is important to consider that the genes that we refer to as “resistance genes” may have had functions unrelated to clinical antimicrobial resistance in their original context, but could be repurposed for resisting antimicrobials when they are encountered in a new context. The original functions of such genes may include cell wall biosynthesis (e.g. β-lactamases), export of signaling molecules, metabolic intermediates, or plant-produced compounds (e.g. multidrug efflux pumps), or detoxifying the original host from the antimicrobial that it is producing (e.g. tetracycline efflux pump induced just-in-time by its biosynthetic precursor anhydrotetracycline112).178 The discovery of β-lactam, tetracycline, and vancomycin resistance genes in 30,000 year-old Beringian permafrost sediments confirms that genes that confer resistance to antimicrobials preceded the clinical use of antimicrobials by several thousand years.15 Furthermore, while the first enzymatic degradation of penicillin was described in 1940 when researchers noted that extracts of certain bacterial cells were capable of inactivating penicillin,44 it is estimated that environmental β-lactamases have existed for over 2 billion years.179,180 Intensive human use of antimicrobials in clinical, agricultural, and industrial settings over the past 80 years has led to the fairly recent exaptation of such genes with resistance-conferring potential for surviving in the presence of antimicrobials.181,182

Figure 1.7: Timeline of antibiotic deployment and resistance observation. The introduction of an antimicrobial to the clinic (green) is rapidly followed by the first observation of resistance to that antimicrobial (red).1,2

Since the introduction of antimicrobials to the clinic, the study of antimicrobial resistance has been largely focused on isolates of human pathogens, and has relied on information gathered from culturing these organisms.8 However, it is thought that most clinical antimicrobial resistance has its evolutionary origins in the aforementioned genes with resistance-conferring potential encoded by benign environmental and human-associated bacteria. Furthermore, a recent study identified numerous resistance genes from benign soil bacteria with 100% nucleotide identity to resistance genes that were present in diverse human pathogens, suggesting the relatively recent sharing of resistance genes via horizontal gene transfer.4 This finding highlights the need to study antimicrobial resistance in benign as well as pathogenic bacteria and in the context of microbial communities as well as isolates, as horizontal gene transfer is the mechanism by which most pathogens become resistant to antimicrobials.183


Figure 1.8: Function and sequence-based methods for mining microbiomes. DNA is isolated from source material (center), which can then be mined for antimicrobial resistance genes (left) or antimicrobial biosynthetic pathways (right). Functional metagenomic methods (top) typically entail shotgun cloning metagenomic DNA into an expression vector, and selecting for a desired phenotype (e.g. antimicrobial resistance or antimicrobial activity). Sequence-based methods (bottom) customarily involve sequencing metagenomic DNA and annotating sequences using general or function-specific databases.

1.2.5 Culture-based methods underestimate functional diversity of microbiomes

Discovering novel antimicrobials and resistance from microbial communities through culture-based methods is challenging and limited to the cultivable minority of bacteria.184

Genomic technologies offer the promise of substantial performance improvements for characterizing the functional capacity of microbiota, which has implications in both the basic science and translational realms.184,185 For example, characterizing antimicrobial resistance using current culture-based methods can take approximately 1–3 weeks, versus a proposed turnaround of less than 12 hours using genomic technologies.184 Whole genome sequencing has the potential to reduce exposure to ineffective drugs and the evolution of resistance by reducing time to diagnosis to days instead of weeks, as seen in the case of drug-resistant Mycobacterium tuberculosis.185,186

Depending on the habitat, traditional culturing efforts may capture as little as 1% of a microbial community.94 The other 99% includes species that are recalcitrant to culture by standard methods.187 This severe undersampling has motivated considerable recent advancements in culturing methods to enable improvements in the proportion of microbial communities that can be cultured. However, these methods, which may leverage microfluidic technologies,188,189 clever simulations of natural growth conditions,189,190 or exhaustive sampling of culture conditions,191 can be technically challenging and are not yet part of standard culturing workflows. Culture-based methods are further hindered by the possibly unreliable identification of the cultured organisms, which is dependent on the expertise of the researcher. Culture-based methods are also riddled with irreproducibility on non-selective media.192 It is often difficult to recapitulate the phenotype of resistance of a targeted bacterium from selective media to non-selective media.

Advancements in genomic and computational technologies and the concomitant dramatic reductions in sequencing costs over the past decades have allowed for an explosion of microbial genome and metagenome sequencing, illuminating the diversity of microbial communities and the breadth and depth of functions encoded therein.6,184,193-198 Genomic technologies can overcome culture-based limitations and facilitate the discovery of novel antimicrobials and resistance determinants. The remainder of this review details the functional and sequencing-based methods which can be employed to quantitatively interrogate microbiomes and resistomes (Fig. 1.8).


1.2.6 Function based methods for resistance discovery

Hybridization/PCR-based Methods

In addition to functional metagenomics, a powerful method described in Section 1.1 (Fig. 1.3), hybridization and PCR-based methods are promising approaches for functional identification of resistance genes. One method that integrates culturing and sequencing for resistance characterization is a new technology that utilizes molecular padlock probes.199 In this method, which shows particular clinical promise in predicting the antimicrobial susceptibility profile of patient samples, the microbial community is exposed to the antimicrobial of interest for a short period of time, after which metagenomic DNA is extracted.199 The metagenomic DNA is then mixed with complementary biotinylated oligonucleotides of targeted rRNA gene sequences and streptavidin-coated magnetic beads. This is to ensure the capture of the relevant rRNA gene sequences. The mixture is then hybridized and ligated to the designed padlock probes, usually designed to detect certain positions in the 16S rRNA gene.199 The probes are then amplified through rolling circle amplification.199 After amplification, the final product is fluorescently labeled and DNA copy number is quantified using a high-performance fluorescence detector.199 An antimicrobial susceptibility profile can be determined by comparing the DNA copy number of the sample without antimicrobials with the DNA copy number of the sample after exposure to antimicrobials.199 If there is an increase in DNA copy number after exposure to antimicrobials, this indicates resistance. If there is no significant growth after exposure to antimicrobials, this indicates susceptibility.199 This method has been used to characterize the antimicrobial susceptibility profile of uropathogens from complex samples.199 Researchers accurately predicted the antimicrobial susceptibility profile for 55 out of 56 samples, highlighting the clinical utility of this method.199 An advantage that the padlock probe method presents is short turnaround time, leveraging high throughput to give results in a matter of hours and help clinicians prescribe correct antimicrobials to patients with bacterial infections. However, the probes used in the assay have to be designed for the individual bacterial species and antimicrobials being targeted, and the method cannot characterize novel resistance genes.
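The susceptibility call described above comes down to comparing DNA copy number with and without antimicrobial exposure. The following is a minimal Python sketch of one plausible reading of that comparison, not the decision rule of the cited study: the function name, the ratio-based rule, and the cutoff value `growth_ratio_cutoff` are all assumptions made for illustration, as are the copy-number values.

```python
def call_susceptibility(copies_unexposed: float,
                        copies_exposed: float,
                        growth_ratio_cutoff: float = 0.5) -> str:
    """Classify a sample from padlock-probe DNA copy numbers.

    copies_unexposed: copy number of the sample grown without antimicrobial.
    copies_exposed:   copy number of the sample grown with antimicrobial.
    growth_ratio_cutoff: hypothetical threshold (an assumption, not a
        published value); exposed growth at or above this fraction of the
        unexposed control is treated as resistance.
    """
    if copies_unexposed <= 0:
        raise ValueError("unexposed control must show a positive copy number")
    # Growth in the presence of drug, relative to the drug-free control.
    ratio = copies_exposed / copies_unexposed
    return "resistant" if ratio >= growth_ratio_cutoff else "susceptible"


# Illustrative, made-up copy numbers for two samples.
print(call_susceptibility(1000, 900))  # grows almost as well with drug
print(call_susceptibility(1000, 100))  # growth strongly suppressed by drug
```

In practice the cutoff would have to be calibrated against phenotypic susceptibility testing for each species and antimicrobial pair.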

1.2.7 Sequence-based methods for resistance discovery

While function-based methods have been valuable in identifying novel resistance determinants from diverse microbiomes, they are limited in throughput and are ideally complemented by sequence-based methods, which we will review here. The ability to analyze bacterial communities through culture-independent shotgun metagenomic sequencing has revolutionized our ability to characterize resistomes.3,200-202 These analyses have benefited from databasing projects, including the 35,000+ public metagenomes which have been uploaded to the Metagenomics Rapid Annotations using Subsystems Technology server (MG-RAST).203

Shotgun metagenomic sequencing involves extracting and sequencing total metagenomic DNA from a microbial community. Once sequenced, a specific subset of informative marker genes from the metagenome can be used to infer phylogeny and functions that are present in the microbial community of interest.204 To predict antimicrobial resistance from shotgun sequencing, most studies have relied on pair-wise comparisons (e.g. by Basic Local Alignment Search Tool (BLAST)116) of sequences to databases of known or predicted resistance genes.205-209 Many antimicrobial resistance databases are highly biased toward human-associated organisms; therefore, environmental antimicrobial resistance is comparatively ignored.122 Prediction algorithms based on hidden Markov models (HMMs) trained on functionally-validated resistance genes from diverse environments have proven to be superior to traditional pair-wise annotators in both precision and accuracy of resistance gene annotation, since HMM profiles allow for the discovery of highly diverse and understudied antimicrobial resistance genes.122

Antimicrobial Resistance Gene Databases

A number of databases that store nucleotide and protein sequence information for antimicrobial resistance genes have been developed. Query sequences are aligned to sequences of known antimicrobial resistance determinants in the databases by BLAST,116 which performs a strict pairwise DNA or protein sequence alignment, to catalog the antimicrobial resistance gene content of a genome or metagenome. These databases have dramatically advanced the field of genomic and metagenomic analyses of antimicrobial resistance by allowing the sequence-based identification of known resistance determinants.120

The Antibiotic Resistance Gene Database (ARDB) is a manually curated antimicrobial resistance database.118 ARDB provides a comprehensive ontology which creates a resistance profile by matching specific genes and mechanisms of action. It uses BLAST to identify and annotate antimicrobial resistance genes. Many studies have used ARDB as their sole method to identify antimicrobial resistance genes210-212 and have revealed that a majority of antimicrobial resistance genes cluster based on ecology.205,211,213 Although this database was the first of its kind, it has not been updated since 2009.

The Comprehensive Antibiotic Resistance Database (CARD) is a collection of known resistance determinants and associated antimicrobials.119 It is designed to predict known antimicrobial resistance from genome sequence data by using a BLAST-based pairwise sequence alignment to known antimicrobial resistance genes in the database. The database is organized based on ontologies, linking genes to their function. This is an ideal database to use if the antimicrobial resistance mechanism of interest has been previously thoroughly researched.


However, it is not suited for the detection of point mutations in chromosomal target genes known to be associated with antimicrobial resistance.214 Many studies have annotated antimicrobial resistance using CARD alone or in combination with ARDB.208,209,215,216 Unlike ARDB, the CARD database is actively managed and periodically updated.

The Bush-Jacoby β-lactamase list is a curated database that matches β-lactamase gene sequences to resistance to β-lactam antimicrobials. This database will soon be hosted on the NCBI website. This database was originally focused heavily on TEM, SHV and OXA type β-lactamases, only a subset of β-lactam antimicrobial resistance genes. Other β-lactamases that are now included in the database are CTX-M, CMY, AmpC, CARB, IMP, VIM, KPC, GES, PER, and VEB.120

The Lactamase Engineering Database (LacED)217 is an extension of the Bush-Jacoby β-lactamase list. This database includes a way to predict TEM, SHV, and class B enzymes by merging the information from both the NCBI peptide database and the TEM mutation table. LacED builds protein families from these databases and integrates protein sequence and structure information using DWARF.218

The Repository of Antibiotic Resistance Cassettes (RAC) takes advantage of the fact that resistance genes are often syntenic with mobile genetic elements, forming resistance cassettes.

This database was the first to simultaneously automate identification of antimicrobial resistance genes and place the genes in their broader genetic context.219

ResFinder121 combines the Bush-Jacoby β-lactamase list with ARDB and other published antimicrobial resistance gene sequences. Similar to other resistance databases, query sequences are matched to the sequences in the database via BLAST. The difference between ResFinder and most other resistance gene databases is the ability for the user to tailor the identity and length coverage thresholds. This allows the user to specify settings that are suited for the depth and quality of sequencing for each project. A current limitation of ResFinder and the previously described databases is the inability to classify chromosomal mutations that can lead to antimicrobial resistance or to classify novel resistance genes; their use is thus limited to resistance genes commonly acquired through horizontal gene transfer.214
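User-adjustable identity and length coverage thresholds of the kind described above can be illustrated with a short Python sketch. This is not ResFinder's implementation: the function names, the default cutoffs, the gene names, and the example hits are hypothetical. The only external convention assumed is that the first four columns of BLAST tabular output are query ID, subject ID, percent identity, and alignment length.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pct_identity: float
    aln_length: int


def parse_tabular_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hits; only the first four columns are used."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def filter_hits(hits: Iterable[Hit],
                ref_lengths: Dict[str, int],
                min_identity: float = 90.0,   # hypothetical default
                min_coverage: float = 0.6     # hypothetical default
                ) -> List[Hit]:
    """Keep hits passing both the identity and the length-coverage cutoff.

    Coverage is approximated as alignment length / reference gene length.
    """
    kept = []
    for h in hits:
        coverage = h.aln_length / ref_lengths[h.subject]
        if h.pct_identity >= min_identity and coverage >= min_coverage:
            kept.append(h)
    return kept


# Made-up example: three contigs hitting two hypothetical reference genes.
EXAMPLE_LINES = [
    "contig1\tgeneA\t99.5\t850",   # high identity, near full length
    "contig2\tgeneA\t85.0\t860",   # full length but lower identity
    "contig3\tgeneB\t98.0\t200",   # high identity but short fragment
]
REF_LENGTHS = {"geneA": 861, "geneB": 1200}

kept = filter_hits(parse_tabular_hits(EXAMPLE_LINES), REF_LENGTHS)
print([h.query for h in kept])
```

Relaxing `min_identity` admits more divergent hits such as `contig2`, while relaxing `min_coverage` admits fragments such as `contig3`, which may suit lower-depth or more fragmented assemblies at the cost of more false positives.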

The Antibiotic Resistance Gene-Annotation (ARG-ANNOT)214 uses a BLAST-enabled search on a curated antimicrobial resistance gene database compiled from the Bush-Jacoby β-lactamase list, ResFinder121, ARDB118 and others. A main difference between this program and others is the ability to additionally predict resistance function based on chromosomal point mutations. ARG-ANNOT contains a curated database focused on mutations found in many chromosomal genes (e.g. rpoB, gyrA1, gyrA2, parC) that can confer resistance to antimicrobials.

It is difficult to identify novel resistance genes when using pairwise comparisons (e.g. by BLAST) to resistance databases, which are largely based on resistance genes from cultured bacterial isolates and are especially biased towards human-associated pathogens. Resfams implements an alternative method in order to identify known and novel resistance genes with high precision and accuracy.122 Resfams is a curated database of profile HMMs built on resistance proteins compiled from CARD119, LacED217, and Jacoby and Bush’s collection of curated β-lactamases, based on the gene ontology from the CARD database. Additionally, the Resfams profile HMMs incorporate antimicrobial resistance genes from environmental, human commensal, and pathogenic bacteria which have been discovered through culture-independent functional metagenomic selections. The HMM approach is superior to pairwise annotation approaches in both precision and accuracy. Pairwise BLAST searches against resistance gene databases were unable to identify over 60% of resistance genes identified by Resfams from a functionally-validated resistance gene test set.122 Perhaps more importantly, differences in HMM-based versus BLAST-based pairwise resistome analyses of the same data can lead to significantly different (and sometimes completely contradictory) biological conclusions. For instance, comparison of resistome predictions from over 6,000 bacterial genomes by Resfams versus predictions by pairwise alignment against resistance gene databases revealed that pairwise methods are inaccurately biased in both their phylogenetic and ecological distributions towards the relatively oversampled human-associated environments which they represent.122

These myriad resistance gene databases are clearly important tools for interrogating microbiomes for antimicrobial resistance genes, but some challenges in their use and opportunities for their improvement exist. An important limitation to both HMM- and BLAST-based searches is the implicit requirement for users to determine the accuracy of the predicted resistance gene function, especially when the exact sequence being annotated has not been previously functionally assayed. Additionally, many of these databases do not incorporate information on the host-specific functionality of some antimicrobial resistance genes.220 For example, most Gram-negative bacteria are intrinsically resistant to important classes of Gram-positive specific drugs (e.g. vancomycin, linezolid, and macrolides) because these drugs cannot cross the Gram-negative outer membrane.177 Accordingly, it is likely inappropriate to annotate homologs of genes which provide resistance to these antimicrobials in Gram-positive bacteria as resistance determinants when they are identified in Gram-negative bacteria. An important future goal in these databases is a species-dependent risk estimator for horizontal gene transfer to quantify the likelihood of resistance gene dissemination between species and habitats. Additionally, it is important that environmental or “non-clinical” resistance genes which are currently underrepresented in these databases be included.122 Because most clinical resistance originated in the environment, this will facilitate the surveillance of emerging resistance mechanisms as they disseminate beyond benign organisms. Despite the inherent limitations in ascribing resistance function based on sequence alone, antimicrobial resistance databases play a key role in curating sequences of known resistance determinants. It is crucial that these databases are regularly updated and simultaneously cross-validated to maintain accuracy and relevance. Continued maintenance and improvement of these databases, including the ongoing incorporation of novel, functionally validated resistance determinants and associated metadata from diverse habitats, will ensure that they remain valuable resources for the antimicrobial resistance community for years to come.

It is likely that the selective pressure that drove the evolution of most antimicrobial resistance genes is the incredible biosynthetic potential of microbial communities to synthesize natural products with antimicrobial activity. This motivates the need to understand antimicrobial resistance in microbial communities in the context of antimicrobial biosynthesis, and vice versa.

Paralleling recent improvements in methods to characterize resistomes, there have also been significant advances in functional and sequence-based methods for discovering novel antimicrobial biosynthesis machinery, which we will review below.

1.2.8 Function-based methods for antimicrobial discovery from microbiomes

Function-based methods for the discovery of antimicrobials from microbiomes present a clear advantage over sequence-based methods in that no a priori knowledge of the sequence of a natural product biosynthetic pathway is required. This facilitates the discovery of novel natural products that are produced by biosynthetic pathways that are divergent from previously characterized pathways and provides access to novel chemical diversity. This unbiased approach has been successfully used for the discovery of natural products with antimicrobial activity.221,222 The method depends on shotgun cloning of metagenomic DNA from a given environment into an expression vector, heterologous expression of pathways encoded in the metagenomic DNA in an appropriate host, and screening for production of bioactive natural products.223

Despite the advantages that this unbiased approach provides, there are several challenges that have hindered the discovery of natural product antimicrobials using functional metagenomics. The main disadvantage is that construction and heterologous expression of large-insert libraries are technically challenging.224 Strategies to improve the heterologous expression of biosynthetic pathways, including overexpression of alternative sigma factors in E. coli, have proven successful for heterologous production of polyketides and hold promise for the development of improved metagenomic screening hosts.225 Additionally, nontraditional heterologous hosts such as Streptomyces lividans have demonstrated utility in the discovery of bioactive natural products from metagenomic libraries.226 A final challenge inherent in functional metagenomic antimicrobial discovery is that antimicrobial production is a more complicated phenotype to select for than antimicrobial resistance, for reasons which we will detail below. This necessitates the use of clever screening strategies to identify and isolate clones of interest.224

One screening strategy that successfully enables discovery of antimicrobial biosynthesis from metagenomic libraries is to directly assay for antimicrobial activity against an indicator strain using a double-agar layer method.227 In this method, the heterologous host of the metagenomic library is allowed to grow on plates, after which top agar containing the indicator strain is overlaid. This method was used to discover two clones from a 113,500 member fosmid library representing 4 Gb constructed from a soil metagenome that were able to inhibit the growth of Bacillus subtilis. The clones encoded a biosynthetic pathway that produced indigo and indirubin, pigments that were confirmed to be responsible for the observed antibacterial activity.221 Another application of this method resulted in the identification of 65 clones with antibacterial activity from a 700,000 member soil library. A small molecule from the clone with the highest apparent bioactivity was purified and identified as a long chain N-acyl amino acid. This represented the first report of the antibacterial activity of long chain N-acyl amino acids, as well as the first identification of genes involved in long chain N-acyl amino acid biosynthesis, highlighting two key advantages of the functional metagenomic approach to antimicrobial discovery.222

An alternative approach that extends traditional culturing methods uses the innovative ‘iChip’, a multichannel, microfluidic diffusion growth chamber.189 This device was recently used to discover a novel non-ribosomal peptide with antimicrobial activity from a soil microbial community.228 The iChip requires that input microbial community samples are diluted such that each channel receives approximately one bacterial cell. The device is sealed with a semipermeable membrane and returned to the source so that nutrients and growth factors from the natural environment of the source organisms can diffuse into the chamber, increasing the likelihood of culturing fastidious organisms.189 This high-throughput method was used to culture 10,000 diverse isolates from a soil sample, which were predicted to be recalcitrant to culture using traditional methods. Screening of extracts from these isolates for antimicrobial activity revealed the presence of a non-ribosomal peptide, named teixobactin, synthesized by a previously uncultured betaproteobacterial species. Teixobactin was isolated and characterized, and found to have activity against multi-drug resistant Gram-positive pathogens in vitro, as well as in animal models for MRSA and Streptococcus pneumoniae.228 Although this method relied on classical antimicrobial screening platforms, innovative culture methods facilitated the discovery of a novel antimicrobial and revealed the biosynthetic potential of the uncultivable fraction of microbial communities.
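The dilution step can be made concrete with a short calculation. The sketch below assumes that cells are distributed among channels independently and at random, so that the number of cells per channel follows a Poisson distribution. This statistical model is an assumption introduced here for illustration and is not taken from the cited work; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean_cells: float) -> float:
    """Probability that a channel receives exactly k cells under Poisson loading."""
    return math.exp(-mean_cells) * mean_cells ** k / math.factorial(k)


def loading_fractions(mean_cells: float) -> dict:
    """Expected fractions of empty, singly loaded, and multiply loaded channels."""
    empty = poisson_pmf(0, mean_cells)
    single = poisson_pmf(1, mean_cells)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Dilution targeting an average of one cell per channel.
fractions = loading_fractions(1.0)
```

Under this assumed model, a dilution that averages one cell per channel leaves roughly 37% of channels empty, 37% with a single cell, and 26% with more than one cell, which is one reason "approximately one cell" is the practical target rather than exactly one.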

1.2.9 Sequence-based methods for antimicrobial discovery from microbiomes

Sequence-based methods for discovery of antimicrobial biosynthetic pathways from microbiomes depend on homology between the query sequence and known biosynthetic pathways. A clear advantage of this method is throughput. Advancements in annotation and prediction algorithms coupled with the abundance of microbial genome and metagenome sequences have made feasible the analysis of sequence data for identification of biosynthetic gene clusters that may encode novel bioactive natural products. A disadvantage of this strategy is that prediction of the structure of a small molecule from the sequence of a biosynthetic gene cluster is nontrivial, as is prediction of a small molecule’s activity from its structure.

One of the first attempts to identify biosynthetic pathways from an uncultured community relied on PCR amplification of ketosynthase genes from Streptomyces isolates as well as from metagenomic DNA isolated from soil.229 The amplicons were purified, cloned into an expression vector, and sequenced. Of 20 randomly selected transformants that were sequenced, only two had similarity to any known ketosynthase gene products. Four of the novel ketosynthases were shown to be functional when expressed in a Streptomyces lividans or Streptomyces glaucescens host as part of a hybrid polyketide synthase pathway (that is, the native ketosynthase was replaced with the newly discovered ketosynthase). The polyketide products of the hybrid biosynthetic pathways were purified and characterized to be novel octaketide and decaketide natural products. This highlights the utility of culture-free methods for identifying novel, sequence-divergent genes involved in biosynthetic pathways to expand access to novel natural products with diverse bioactivities, including potential antimicrobial activity.

Tools for identifying genes involved in natural product biosynthesis from sequence data represent a significant advancement over targeted PCR-based studies. One of the resources developed for mining genomic data for biosynthetic gene clusters is antiSMASH.230 antiSMASH makes use of the ClusterFinder algorithm,231 which is an HMM-based algorithm for the discovery of biosynthetic gene clusters from genomic data. The ClusterFinder algorithm consists of a four-step prediction pipeline that can identify biosynthetic gene clusters in genomic data. The use of an HMM-based, probabilistic algorithm allows identification of biosynthetic gene clusters that lack high similarity in domain structure to any known biosynthetic gene clusters, circumventing the problem that had previously been posed in identification of biosynthetic gene clusters from sequence data. antiSMASH also includes an Active Site Finder module that identifies conserved amino acid motifs in the active sites of key biosynthetic enzymes including those involved in polyketide biosynthesis. A final, key feature of antiSMASH is an algorithm that predicts the structure of a natural product synthesized by the annotated biosynthetic gene cluster. Furthermore, the development of a web-based server has democratized the process of identifying biosynthetic gene clusters, because it is available to those without access to high throughput computing facilities.230

The ClusterFinder/antiSMASH workflow was successfully used to identify a natural product with antimicrobial activity from metagenomic sequencing data generated as part of the NIH Human Microbiome Project. In this work, 2,430 genomes of members of the human microbiota were queried using ClusterFinder to identify over 14,000 putative biosynthetic gene clusters. Metagenomic reads from 752 samples representing five body sites of healthy subjects were mapped to the 14,000 predicted biosynthetic gene clusters. Among the natural products predicted was a thiopeptide from the vaginal commensal Lactobacillus gasseri. The thiopeptide, named lactocillin, was purified and characterized. Lactocillin was found to have antimicrobial activity against Gram-positive bacteria including Enterococcus faecalis, Gardnerella vaginalis, Staphylococcus aureus, and Corynebacterium aurimucosum. The discovery and characterization of the structure and bioactivity of lactocillin from human microbiome metagenomic sequencing data demonstrates the potential that diverse metagenomes hold for encoding natural products with antimicrobial activity.147

A current shortcoming of software available to identify biosynthetic gene clusters from metagenomic sequencing data is that assembly of short read data into genomes or large contigs is required. Although there have been recent advancements in metagenomic assemblies of this type,232-234 it remains a challenge. One way to circumvent the requirement for metagenomic assembly is the use of ShortBRED,235 which finds short, unique amino acid markers from reference proteins of interest and maps metagenomic sequencing reads to these markers. This tool has been successfully implemented for the identification of antimicrobial resistance genes200 and has potential for the identification of genes involved in natural product biosynthesis directly from short read metagenomic sequencing data.
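The idea of family-specific markers can be sketched in a few lines. This is a deliberately simplified toy, not ShortBRED's algorithm: it treats every exact k-mer that occurs in only one protein family as a marker and counts reads by exact substring match, whereas the actual tool's procedure is more involved and is not reproduced here. The function names, the toy sequences, and the choice of k are hypothetical.

```python
def kmers(sequence: str, k: int) -> set:
    """All length-k substrings of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_markers(families: dict, k: int) -> dict:
    """For each protein family, keep only the k-mers found in no other family."""
    family_kmers = {
        name: set().union(*(kmers(seq, k) for seq in seqs))
        for name, seqs in families.items()
    }
    markers = {}
    for name, own in family_kmers.items():
        others = set().union(
            *(found for other, found in family_kmers.items() if other != name)
        )
        markers[name] = own - others
    return markers


def count_marker_hits(translated_reads, markers) -> dict:
    """Count reads containing at least one marker of each family (exact match)."""
    counts = {name: 0 for name in markers}
    for read in translated_reads:
        for name, family_markers in markers.items():
            if any(marker in read for marker in family_markers):
                counts[name] += 1
    return counts


# Toy reference families sharing the N-terminal motif "MKTLL".
families = {
    "family_A": ["MKTLLVAGG", "MKTLLVSGG"],
    "family_B": ["MKTLLQWER"],
}
markers = unique_markers(families, k=5)

# Toy translated reads; the last one carries only the shared motif.
reads = ["TLLVAG", "LLQWER", "MKTLL"]
counts = count_marker_hits(reads, markers)
```

Because the shared motif is excluded from both marker sets, the third read is assigned to neither family. The point of restricting the search to unique markers is that reads falling in regions conserved across families do not inflate the counts for any one of them.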

An alternative method that can be used to access the biochemical diversity of microbiomes without the necessity for metagenomic assembly uses short DNA sequences that are conserved between biosynthetic gene clusters. PCR amplification of these conserved biosynthetic motifs from a metagenomic source using degenerate primers followed by amplicon sequencing can provide an overview of the biosynthetic capabilities of a given microbiome, and highlight promising leads for heterologous expression and further characterization.236 A bioinformatics platform for mapping short DNA sequence tags to known biosynthetic gene clusters, termed Environmental Surveyor of Natural Product Diversity (eSNaPD), has been developed to facilitate analysis of amplicon sequencing data in the context of natural product biosynthesis.237 This strategy has been successfully used in the identification of novel bioactive small molecules, including some with antimicrobial activity.236,238

A final, promising method for sequence-based identification of biosynthetic gene clusters from microbiomes involves assembly of complete or near-complete genomes using single-molecule, real-time (SMRT) sequencing,239 followed by annotation with antiSMASH or a similar annotation pipeline. This approach was successfully applied in tandem with short-read data for the reference-independent, de novo assembly of a high-quality Corynebacterium simulans genome from a skin microbiota sample. This previously uncharacterized genome was annotated with antiSMASH, revealing predicted nonribosomal peptide synthetases, type I polyketide synthases, terpene synthases, and bacteriocin production genes.240 Interestingly, the bacteriocin production locus was homologous to the locus for biosynthesis of lactococcin, a bacteriocin which is known to be bactericidal.241 Although the natural products whose biosynthetic gene clusters were identified in this study were not isolated or confirmed for bioactivity, the authors demonstrate the power of metagenomic assembly in the identification of biosynthetic gene clusters from microbiomes.

1.2.10 Future directions in antimicrobial and antimicrobial resistance discovery

The increasing threat to human health from multi-drug resistant pathogens underscores the need to develop tools to interrogate microbiomes for novel antimicrobials and to understand and mitigate the evolution and transmission of resistance genes. A more complete understanding of antimicrobial resistance in the context of microbial communities will advance our treatment of antimicrobial-resistant infections and help us choose therapies that will reduce selection for resistance. Additionally, novel antimicrobials discovered within microbial communities will lay the groundwork for narrow-spectrum, next generation antimicrobials that reduce collateral microbiome damage. Given that most clinical antimicrobial resistance genes originated in environmental microbes, and most antimicrobials are natural products or derivatives thereof, it is critical that these two microbial functions be studied in tandem and in the context of microbial communities in order to understand the dynamics governing the evolution and transmission of antimicrobial resistance and biosynthetic capacities.

Advances in sequencing technology will facilitate discovery of complex functions from metagenomes based on sequence alone. For example, long read sequencing technologies such as the Oxford Nanopore MinION and PacBio SMRT sequencing will help to overcome the challenges currently posed by metagenomic assembly from short read data, providing greater information about functions encoded in large loci, such as biosynthetic and multidrug resistance gene clusters.233,242,243 Furthermore, cheaper sequencing will allow for deeper sequencing, improving our ability to assemble metagenomes and profile antimicrobial resistance and biosynthetic capacities.6 Single-molecule sequencing, particularly when combined with short-read shotgun metagenomic data, may also improve metagenomic assembly and allow for better identification of biosynthetic gene clusters.240 In addition to progress resulting from improvements to DNA reading, advancements in DNA writing will likely benefit the field. Dramatic reductions in the cost of DNA synthesis, together with improvements in its efficiency, may soon make synthesis and recoding of entire biosynthetic gene clusters feasible, simplifying heterologous expression in genetically tractable hosts and downstream characterization of bioactive natural products.244,245 An integration of functional and genomic techniques, aided by these and other inevitable technical and computational developments, will help us better understand the delicate interplay between antimicrobial biosynthesis and resistance in microbiomes.

1.3: Antibiotics and the preterm infant gut microbiome and resistome

1.3.1 Introduction

The infant gut microbiota is shaped by both prenatal and postnatal factors246,247. Following birth, the infant gut microbiota undergoes a patterned maturation process248-250. This process begins with a few taxa, but this population inexorably expands in content and concentration until it approaches an adult-like microbial configuration dominated by Bacteroidetes and Firmicutes248,251-253. Perturbation of the gut microbiota during this key developmental window can have lasting effects on host physiology and disease risk254-259. One of the most common perturbations during this period, antibiotic therapy258,260, can substantially alter the gut microbiota and infant physiology255,261-266. Because preterm infants are at high risk for infection, antibiotics are the most commonly prescribed medications in neonatal intensive care units (NICUs) in the United States (Fig. 1.9)260,267,268, accounting for three of the top six medications with the greatest relative increase in use in NICUs in the United States between 2005 and 2010269. Antibiotic-induced gut microbiota dysbioses in this population have been linked to the pathogenesis of necrotizing enterocolitis270-272, late onset sepsis273,274, and other adverse health outcomes275. While these correlations exist, the underlying etiologies remain unclear, motivating the study of microbiota development in the context of antibiotic therapy in this vulnerable population that accounts for nearly 10% of all births in the United States276.

1.3.2 Techniques for studying development of the infant gut microbiota and the developing resistome

Microbial ecology studies, especially those focused on the gut microbiota, have flourished because of widespread adoption of sequencing-based studies, either using the 16S rRNA gene or metagenomic shotgun sequencing, resulting in a deeper appreciation of microbiome diversity249,265,272,277-279. Until recently, similar efforts to characterize the preterm infant gut reservoir of antibiotic resistance genes (the resistome) have been largely culture- or PCR-based. While these methods can provide a high level overview of the resistome, they underestimate diversity, are limited to previously known resistance genes, and are only semi-quantitative. In contrast, deep sequencing of microbiomes paired with functional metagenomics offers an unbiased method to identify functional resistance determinants in a stool, and can provide quantitative information on microbiome and resistome architecture23,57. Gibson et al. recently used this suite of techniques to develop a sequence-unbiased account of the effects of antibiotic therapy on the preterm infant microbiota with respect to phylogeny and antibiotic resistance gene distribution200.

Figure 1.9: Prescription data in the NICU. Data adapted from Clark et al. Pediatrics 200623. Frequency defined as the number of times a unique medication name was reported in the medications table in 220 NICUs in the United States and Puerto Rico between January 1996 and April 2005. Antibiotics are the most prescribed medications in the neonatal intensive care unit.

1.3.3 A longitudinal, sequencing based interrogation of the preterm infant gut microbiota

To assess the effects of antibiotics on the dynamics of the preterm infant gut microbiota, Gibson et al. assembled a cohort consisting of 84 preterm infants, of whom 82 received antibiotics within the first 24 to 48 hours of life, as is common practice in the NICU. However, 39% received no further antibiotic therapies after the first week of life, allowing them to serve as controls to study the effects of antibiotic treatment on preterm infants in the first weeks of life.

All infants in the cohort were low birthweight and born prematurely, with a median gestational age of 27 weeks and a median birthweight of 865 grams. This sample set was complemented by robust metadata including detailed histories of all antibiotic exposures, overall health indications including infections, delivery mode, postmenstrual age, hospital environment, enteral feeding, other medications, and maternal health.

A total of 401 stools from these infants, collected longitudinally during their stay in the NICU and ranging from day 6 to day 158 of life, were interrogated by shotgun metagenomic sequencing. In total, Gibson et al. sequenced and analyzed 551 gigabases (Gb) of preterm infant gut metagenomes (1.37 +/- 1.17 Gb per sample) and a total of 107 Gb of functionally selected preterm infant resistomes (5.1 +/- 3.6 Gb per sample). Unique clade-specific marker genes from approximately 17,000 reference genomes were used to assess phylogenetic composition, allowing both species-level resolution and accurate relative abundance estimation of the bacterial species in these communities, an advantage over traditional 16S rRNA marker gene sequencing204,280. This represents the largest metagenomic sequencing-based study of the preterm infant gut microbiota to date, providing the statistical power to address questions of microbiota and resistome development in preterm infant populations subject to heavy antibiotic exposure.

1.3.4 Antibiotics disrupt the preterm gut microbiota richness and composition

Gut microbial species richness, defined as the number of species present in a microbiota, is widely used as a measure of microbiome health. Decreased species richness in infancy has been associated with a number of host pathologies and serves as an indicator of microbiome perturbation258,270,281,282. To identify factors that significantly contribute to species richness, Gibson et al. developed a generalized linear mixed model with individual included as a random effect, leveraging available metadata including infant health (e.g. CRIB II (Clinical Risk Index for Babies) score, day of life, gestational age, birthweight, delivery mode, and presence of positive culture), medications (e.g. antibiotics, caffeine, and iron), and maternal health (e.g. preeclampsia and premature membrane rupture). While postmenstrual age and breast milk significantly contributed to increased species richness, high CRIB II score, meropenem treatment, cefotaxime treatment, and ticarcillin-clavulanate treatment significantly contributed to decreased species richness. Cumulative antibiotic exposure in infants was associated with a significant reduction in species richness and, with the exception of gentamicin, all antibiotics were associated with reduced species richness. This decrease in species richness occurred over intervals during which the gut microbiota of age-matched, antibiotic-naïve preterm infants was increasing in richness, and was accompanied by disruption of microbiota composition (Fig. 1.10). For example, one individual had a ~25% decrease in observed species following meropenem treatment, while over the same period a matched control had a 20% increase (Fig. 1.10a). Similar age-matched comparisons can be made for combined vancomycin and meropenem treatment (Fig. 1.10b) and for ticarcillin-clavulanate treatment (Fig. 1.10c). Antibiotic-induced perturbations of microbiota phylogenetic composition were observed for these infants as well (Fig. 1.10a-c).
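The richness measure and the before-and-after comparison described above can be sketched as follows. The abundance profiles are invented for illustration and are not data from Gibson et al.; the function names and the zero detection threshold are likewise assumptions.

```python
def species_richness(abundances: dict, detection_threshold: float = 0.0) -> int:
    """Number of species whose relative abundance exceeds the threshold."""
    return sum(1 for value in abundances.values() if value > detection_threshold)


def percent_change(before: int, after: int) -> float:
    """Signed percent change in richness relative to the earlier sample."""
    return 100.0 * (after - before) / before


# Hypothetical relative abundances for one infant around an antibiotic course.
pre_treatment = {
    "Escherichia coli": 0.40,
    "Klebsiella pneumoniae": 0.30,
    "Enterococcus faecalis": 0.20,
    "Staphylococcus epidermidis": 0.10,
}
post_treatment = {
    "Escherichia coli": 0.60,
    "Klebsiella pneumoniae": 0.30,
    "Enterococcus faecalis": 0.10,
    "Staphylococcus epidermidis": 0.0,
}

change = percent_change(
    species_richness(pre_treatment), species_richness(post_treatment)
)
```

In this made-up example the loss of one of four detected species corresponds to a 25% decrease in richness. Richness counts presence only, so shifts in relative abundance among the remaining species are captured by the compositional analysis rather than by this number.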

Figure 1.10: Antibiotic treatment alters preterm infant gut microbiota development. Preterm infant gut microbiota species richness (top) and composition (bottom) for the three individuals with samples analyzed directly before and after depicted antibiotic treatments for (A) meropenem, (B) vancomycin and meropenem, and (C) ticarcillin-clavulanate. Red bars represent species richness for example individual. Black bars represent average species richness for age matched preterm infants with no antibiotic treatment. Error bars depict one standard deviation.

1.3.5 Antibiotic therapy enriches for antibiotic resistance genes and multidrug resistant bacteria

For further insight into the effects of antibiotics on the resistome of the preterm infant gut microbiota, Gibson et al. constructed functional metagenomic libraries from 21 representative fecal samples. Functional metagenomics is a valuable technique for identifying functional resistance determinants in microbiomes. Because functional metagenomics circumvents the bias introduced by culture and the need for a priori knowledge of resistance function, the method provides a high throughput, sequence-unbiased characterization of reservoirs of antibiotic resistance genes in a given environment and can identify novel resistance genes57,66. These libraries, representing a total of 107 Gb of bacterial DNA, were screened against 16 antibiotics, including those routinely used in NICUs. Of the 794 functionally identified antibiotic resistance genes, 42% were encoded by E. coli, Enterobacter cloacae, and K. pneumoniae, taxa which include many pathogenic strains implicated in nosocomial infections in preterm infants279,283.

Interestingly, the sequences of the majority of resistance determinants uncovered by the functional metagenomics screens were already present in protein databases. However, they were not found in antibiotic resistance specific databases, indicating that they had not previously been ascribed a resistance function and highlighting a key advantage of using functional metagenomics.

One third of the antibiotic resistance genes identified by functional selections conferred resistance to multiple antibiotics, often across classes, or were co-selected with another antibiotic resistance gene. For example, the loci selected by piperacillin also conferred resistance to other β-lactam antibiotics. In fact, many β-lactamases identified in this study conferred resistance to multiple classes of β-lactam antibiotics. In addition to these genes, which may be extended spectrum β-lactamases (ESBLs), Gibson et al. identified multiple Ambler class A β-lactamases, and β-lactamases that conferred resistance to both penicillins and 2nd and 3rd generation cephalosporins. ESBLs are of particular concern due to their increasing abundance in community acquired infections and in clinical settings such as the NICU, often in Enterobacteriaceae family members, including those that Gibson et al. identified as dominant members of the preterm infant gut microbiota284. In addition to cross resistance among β-lactam antibiotics, Gibson et al. also observed contigs that conferred resistance across classes of antibiotic. For example, many tetracycline resistance genes also conferred resistance to both chloramphenicol and cefoxitin. Our findings suggest that many of these genes may be present in the context of a multidrug resistance cassette and are threats for dissemination via horizontal gene transfer. Multidrug resistance is one of two key features associated with spread of antimicrobial resistance in the preterm infant gut microbiota. The other feature is mobility, or the likelihood of horizontal gene transfer. In order to investigate the potential mobility of resistance genes in the preterm infant gut microbiome, Gibson et al. assembled metagenomic contigs from shotgun reads, and found widespread plasmid and chromosomally encoded multidrug resistance clusters. Alarmingly, the authors identified plasmid-associated antibiotic resistance genes encoding resistance to more than six antibiotic classes in 96% of preterm infant gut samples. Furthermore, the preterm infant resistome was found to be highly resilient. Over the course of time represented by the longitudinally sampled resistome, only about one fifth of the resistance gene classes showed significant change, implying that early life antibiotic therapy may select for a lasting expansion of the gut resistome.

1.3.6 The predictable response of the preterm infant gut microbiota to antibiotic therapy

To quantify changes to the gut microbiota before and after antibiotic treatment Gibson et al. utilized metagenomic shotgun sequencing to interrogate the phylogenetic architecture of the bacterial community. While meropenem, cefotaxime, and ticarcillin-clavulanate each significantly and immediately reduced species richness, gentamicin and vancomycin (two of the most commonly used antibiotics in preterm infant populations, and which are often co- administered) resulted in variable species richness responses. Using random forests classification, the direction of species richness response to vancomycin and gentamicin treatment can be predicted based on the relative abundance of only two species (Staphylococcus aureus and E. coli) and two antibiotic resistance genes (cpxR and cpxA) with an error rate of only 15%.

We hypothesize that this finding might be due to the innate vancomycin-resistance of Gram-negative bacteria such as E. coli and the innate vancomycin-susceptibility of S. aureus, coupled with the ability of cpxR and cpxA to confer gentamicin resistance to E. coli285. These analyses reveal significant antibiotic-specific microbial and compositional responses of the preterm infant gut microbiota. For a commonly prescribed antibiotic regimen, co-therapy with vancomycin and gentamicin, Gibson et al. identified important species and antibiotic resistance gene biomarkers that can predict with high accuracy the short-term species richness response and therefore potential for disruption of the developing preterm infant gut microbiota.

1.3.7 Implications for future research

It is evident that specific antibiotics used in the NICU, such as next generation β-lactam antibiotics, acutely (but perhaps not durably) perturb the gut microbiota, while others, such as gentamicin, are less disruptive. Going forward, it is important to understand the enduring effects of antibiotics on the preterm infant gut microbiota and resistome. Recent studies have observed that multidrug resistant clones present in preterm infants in the NICU are absent at two years of age, suggesting that the preterm infant gut may recover following discharge to home286. On the other hand, resistance genes persist for months or years after antibiotic therapy in adults287-289.

Sequence-based characterizations of gut microbiota composition and resistome of preterm infants after they enter the community are needed to fully understand the longitudinal effects of antibiotics on microbiota composition, multidrug resistant organism persistence, and resistance gene carriage. Longitudinal studies also provide the opportunity to correlate dysbioses that originate in the NICU with health outcomes that manifest later in life. Additionally, it is important to consider the effect of prenatal and intrapartum antibiotics, and prenatal exposure to microbes, on the infant gut microbiota. This is of particular note in preterm infants, one in four of whom are born to mothers with an intrauterine infection290, and in light of the increasing recognition that the uterus is a nonsterile environment. It is necessary to validate the results of this study and others outside of the preterm human infant context. Murine models of antibiotic use have been invaluable in informing microbiome research. Humanized gnotobiotic models in particular have benefited the field greatly255,261,291. Development of a humanized infant mouse model for preterm antibiotic therapy would allow for testing of clinical antibiotic protocols in a controlled environment in order to discern their relative impacts on the developing gut microbiota. Such analyses would build on the foundation laid by Gibson et al. to provide an evidence base supporting a personalized medicine approach to antibiotic treatment, wherein antibiotic use in preterm infant populations will be tailored to limit collateral microbiota disruptions and selection for resistance.

1.4 Scope of the thesis

Countering the public health crisis posed by antibiotic resistance requires, among other aspects, a comprehensive understanding of the distribution of antibiotic resistance genes across microbial communities, as well as a unified framework for comprehending the response of these microbial communities to antibiotic perturbation. This entails identifying and characterizing emerging resistance determinants that are not yet prevalent in the clinical setting, anticipating their ensuing dissemination under the strong selective pressure imposed by antibiotics. Such early identification allows for proactive development of strategies to counter the eventual clinical impact of pathogens bearing these resistance genes. An additional crucial component of addressing antibiotic resistance, in light of the pervasive use of antibiotics in clinical, agricultural, and industrial settings, is illuminating how microbial communities react to antibiotic exposure. A particularly important microbial community is the human microbiota, which mediates critical aspects of human health and disease144, is frequently exposed to therapeutic concentrations of antibiotics292, and is recognized as an important reservoir of resistance genes8. Integration of these aims could advance our understanding of the factors governing the emergence of antibiotic resistance through the lens of microbial ecology and evolution.

A crucial component of antibiotic resistance surveillance is the study of the resistome in microbial communities across diverse habitats. A habitat that is particularly interesting is the human gut, which houses a microbial community that is both critical to human health and routinely exposed to antibiotics. Of particular interest is the gut microbiota of preterm infants, who account for 12% of U.S. births and receive nearly universal and often extensive antibiotic therapy. While the acute effects of antibiotics on the preterm infant gut microbiome are well documented293, the long-term effects are not yet thoroughly understood. In Chapter 2, I seek broadly to identify the long-term effects of antibiotic perturbation on microbial communities. To this end, I conducted an integrative analysis of the gut microbiota of antibiotic treated extremely and very premature infants over the first twenty months of life using a combination of shotgun metagenomic sequencing, stool culturing and isolate sequencing, functional metagenomics, and machine learning. In total, I sequenced nearly 1.2 Tb of metagenomic DNA from 444 stools collected longitudinally from 58 infants to define a program of healthy microbiota assembly in antibiotic naïve infants, and show acute but transient deviations from this program in antibiotic treated preterm infants. By sequencing >500 cultured isolates, I demonstrate that antibiotic treatment selects for persistent gastrointestinal colonization by multidrug resistant Enterobacteriaceae in preterm infants. Likewise, I use functional metagenomics to reveal an enriched gut antibiotic resistome and disrupted patterns of resistome development in preterm infants compared to antibiotic naïve near term infants. Lastly, I develop a classifier that can distinguish with high accuracy between antibiotic treated preterm infants and antibiotic naïve near term infants based on microbiota composition later in life, thus revealing a persistent and possibly pathogenic metagenomic signature of early life antibiotic treatment in preterm infants.

A crucial component of addressing the growing clinical problem of antibiotic resistance is the discovery and improved mechanistic understanding of novel antibiotic resistance determinants in environmental bacteria at risk for acquisition by pathogens. In Chapters 3 and 4, I explore an emerging mechanism of tetracycline inactivation using complementary bioinformatic, microbiological, biochemical, and structural biology approaches. The work described in these chapters will improve fundamental knowledge of the structure/mechanism relationship of tetracycline inactivating enzymes with respect to catalysis and inhibition. Further, this work addresses critical gaps in our understanding of the origins and evolution of an emerging family of clinically relevant tetracycline inactivating enzymes. While most tetracycline resistance currently occurs by efflux or target (ribosomal) protection, we identified a family of tetracycline inactivating enzymes through tetracycline and tigecycline selections of soil and gut metagenomes29. I show that these genes are more widespread than previously thought in both gut and environmental microbial communities. Using a combination of biochemistry and structural biology, I characterize the mechanism of action of these enzymes, advancing our understanding of tetracycline resistance by inactivation and flavoenzyme biology more broadly. I identify an inhibitor of these enzymes that is sufficient to prevent tetracycline degradation and restore antibiotic efficacy in the face of these resistance enzymes294. Lastly, I sequence and characterize a Pseudomonas aeruginosa isolate from a cystic fibrosis patient in Pakistan bearing a gene with 100% nucleotide identity to a gene first identified in our functional selections. I show that this gene is responsible for tetracycline resistance in this isolate, thus signaling the arrival of tetracycline resistance by inactivation in the clinical setting.


This body of work contributes substantially to the fields of microbiology, antibiotics, and resistance. The identification of enduring effects of antibiotics on the gut microbiota has promise to inform antibiotic treatment regimens in vulnerable populations such as preterm infants, such that infections are effectively treated while lasting collateral damage to the microbiota is minimized. Additionally, consideration of adverse effects of antibiotics during infection management will improve antibiotic stewardship in intensive care unit settings. My work on emerging tetracycline resistance by inactivation has expanded our understanding of the principles governing dissemination of antibiotic resistance from environmental to clinical settings. Furthermore, the identification of environmental resistance genes prior to their clinical emergence and proactive development of strategies (e.g. inhibitor coadministration) to counter their eventual impact is a paradigm broadly applicable to the antimicrobial development space that will enable sustainable development and deployment of our antibiotic arsenal.

1.5 Acknowledgments

This chapter was synthesized from work to which I made first author contributions, done in collaboration with Terence S. Crofts, Boahemaa Adu-Oppong, Molly K. Gibson, Phillip I. Tarr, Barbara B. Warner, and Gautam Dantas. Sections of this chapter were published in Gasparrini, Crofts et al. (Gut Microbes, 2016), Adu-Oppong, Gasparrini et al. (Annals of the New York Academy of Sciences, 2016), and Crofts, Gasparrini et al. (Nature Reviews Microbiology, 2017).


Chapter 2: Persistent metagenomic signatures of early life antibiotic treatment in the infant gut microbiota and resistome

2.1 Introduction

2.1.1 Abstract

Because hospitalized preterm infants are vulnerable to infection, they receive frequent and often prolonged exposures to antibiotic therapy. It is not known if the short-term effects of antibiotics on the preterm infant gut microbiota and resistome persist after discharge from neonatal intensive care units. Here, we use complementary metagenomic, culture based, and machine learning techniques to interrogate the gut microbiota and resistome of antibiotic-exposed preterm infants during, and after, hospitalization, and compare these readouts to antibiotic-naïve healthy infants sampled synchronously. We find a persistently enriched gastrointestinal antibiotic resistome, prolonged carriage of multidrug resistant Enterobacteriaceae, and distinct antibiotic-driven patterns of microbiota and resistome assembly in extremely preterm infants who received early life antibiotics. Our results demonstrate lasting collateral damage of early life antibiotic treatment and hospitalization in preterm infants and urge alternative strategies for infection management in highly vulnerable neonatal populations.

2.1.2 Introduction

Gut microbes play important roles in host health and disease throughout life, and are particularly important in infancy295. During this period, gut microbes exclude pathogens, synthesize vitamins and amino acids, and promote immune system maturation295. Infant gut microbiota assembly accelerates in the first months of life, following inoculation by organisms from mothers and the environment296, but stabilizes by approximately three years of age248. Antibiotics in this interval may disproportionately damage the host-microbiota ecosystem248,255,282. Indeed, emerging data suggest that early-life gut microbial alterations correlate with a variety of chronic metabolic and immune disorders in childhood and later life255,263-265,295,297-304, including allergies305, psoriasis306, adiposity307, diabetes308 and inflammatory bowel disease309-311. For most of these disorders, a causal link between antibiotic-mediated microbiota disruption and onset of pathology is lacking. However, antibiotics have been linked to permanent immune alterations306,312 and inflammatory bowel disease in later childhood311, highlighting the damaging long-term potential of early life antibiotic treatment.

Over 11% of live births worldwide occur preterm313, and preterm birth and its sequelae are prominent causes of childhood morbidity and mortality worldwide314. Because bacterial infections are serious and frequent complications of preterm birth315, nearly all infants in neonatal intensive care units (NICUs) receive antibiotics269. We have shown that severe, acute perturbation of the preterm infant gut microbiota immediately following antibiotic treatment is characterized by decreased alpha diversity, increased abundance of Enterobacteriaceae, and antibiotic-specific enrichment of antibiotic resistance genes (ARGs) and multidrug resistant organisms (MDROs)7. Because microbiota perturbation during early life is hypothesized to be disproportionately damaging272,316, it is imperative to study the lasting effects of antibiotics on the preterm infant gut microbiota. Prior studies of preterm infants report gut microbiota recovery concomitant with NICU discharge286,317,318. However, a shortcoming of these studies is their reliance on exclusively culture based or amplicon sequencing (e.g. 16S rRNA) based analyses, which are limited in resolution and focus only on taxa present in the microbiota rather than the functions collectively encoded by those taxa. Here, we analyze ~1.2 terabases of metagenomic DNA, culture and sequence 500 bacterial isolates, and functionally select an additional 300 gigabases of metagenomic DNA for expressed antibiotic resistance, to investigate thoroughly and in high resolution the long-term consequences of antibiotic treatment on the preterm infant gut microbiota.

2.2 Results

2.2.1 Metagenomics based analysis of effect of antibiotics on the preterm infant gut microbiota

To understand the lasting effects of early life antibiotics on infant gut bacterial communities, we whole metagenome shotgun sequenced 444 fecal metagenome samples from 58 infants over the first 21 months of their life (1.18 terabases metagenomic DNA; Supplementary Fig. 2.3; see Methods). Our cohort included 41 preterm infants sampled while they resided in the NICU and following discharge to home. One subset of this cohort (n=9) received scant antibiotic therapy in their neonatal period or subsequently (total duration of antibiotic therapy < 7 days). The remaining 32 preterm infants received 255 total additional courses of antibiotics over the first 21 months of life (median (interquartile range (IQR)) 29.5 days (41.6, 68.3 days) antibiotic therapy). All infants in this cohort were classified as being born very or extremely preterm (median (IQR) gestational age at birth of 26 weeks (25, 27 weeks)) and had extremely or very low birth weights (median (IQR) 840 g (770, 960 g)). Additionally, 17 antibiotic-naïve, healthy early term319 or late preterm320 (median (IQR) gestational age at birth 36 weeks (36, 37 weeks); hereafter referred to as “near term”) infants of the same chronological age range, sampled synchronously with the preterm cohort, were included to serve as controls for healthy microbiota development.


Table 2.1. Clinical characteristics of infant cohorts analyzed in this study

| Characteristic | Preterm, Early Antibiotic Exposure Only (N=9) | Preterm, Early + Subsequent Antibiotic Exposure (N=32) | Near Term, Antibiotic-Naïve Infants (N=17) |
|---|---|---|---|
| Birth weight, g, median (IQR) | 1080 (880, 1270) | 830 (698.75, 897.5) | 2529 (2359.5, 2966.5) |
| Gestational age at birth, weeks, median (IQR) | 27 (26, 27) | 25 (24, 26) | 36 (36, 37) |
| Gender, M/F | 4/5 | 15/17 | 4/13 |
| Route of delivery, C-section/vaginal | 6/3 | 25/7 | 15/2 |
| Antibiotic exposure, n courses | | | |
| Gentamicin | 9 | 74 | none |
| Ampicillin | 9 | 37 | none |
| Vancomycin | none | 67 | none |
| Clindamycin | none | 16 | none |
| Meropenem | none | 14 | none |
| Cefepime | none | 11 | none |
| Cefotaxime | none | 10 | none |
| Mupirocin | none | 7 | none |
| Trimethoprim-sulfamethoxazole | none | 4 | none |
| Ticarcillin-clavulanate | none | 3 | none |
| Oxacillin | none | 3 | none |
| Cefoxitin | none | 3 | none |
| Cefazolin | none | 2 | none |
| Amoxicillin | none | 2 | none |
| Metronidazole | none | 1 | none |
| Penicillin G | none | 1 | none |
| Bacterial culture positive, n | | | |
| Blood | 0 | 22 | 0 |
| Tracheal | 0 | 30 | 0 |
| Urine | 0 | 17 | 0 |

We inferred bacterial taxonomic composition from shotgun metagenomic data using MetaPhlAn 2.0280. Across all infants, Shannon diversity increased during an apparent developmental phase before stabilizing (Fig. 2.1a). Near term infant microbiota Shannon diversity increased rapidly in the first month of life before reaching a plateau, while preterm infant microbiota diversity increased more gradually and with greater variation (Fig. 2.1a). At the family level, Enterobacteriaceae and Enterococcaceae dominated the preterm infant gut microbiota in the first months of life. In contrast, early colonization by Enterobacteriaceae in near term infants preceded robust colonization with Bifidobacteriaceae (Fig. 2.1b). Notably, there was a near absence of Enterococcaceae early in life (<4 months chronological age) in the near term infants, and a similar absence of Prevotellaceae later (>8 months chronological age) in preterm infants. In contrast, we observed predicted microbiota functional stability both over time and between groups (Fig. 2.1c), as inferred by HUMAnN2278.
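The Shannon diversity reported above can be illustrated with a minimal sketch. It assumes the natural-log form of the index, H = -sum(p_i ln p_i), applied to a per-sample relative abundance profile; the log base used in the original analysis is not stated here, and the two profiles below are hypothetical values, not data from this study.

```python
import math

def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero abundance.

    Natural log is an assumption of this sketch; other bases rescale H.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical profiles (percent relative abundance), for illustration only.
dominated = [97.0, 1.0, 1.0, 1.0]   # one taxon dominates the community
even = [25.0, 25.0, 25.0, 25.0]     # four equally abundant taxa

print(shannon_diversity(dominated) < shannon_diversity(even))
```

A community dominated by a single taxon scores lower than an even community with the same number of taxa, which is why the index responds to both richness and evenness.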

Low gut microbiota diversity is associated with adverse health outcomes in infancy321-323, childhood282, and adulthood324. To identify clinical features of our cohort that were associated with microbiota diversity, we regressed Shannon diversity on clinical variables including antibiotic treatment (Methods) using a generalized linear mixed model with subject defined as a random effect to control for repeated samplings. All variables were included in initial modeling, and a final model was fit by backwards elimination of variables. Following correction for multiple comparisons, day of life was significantly associated with increased Shannon diversity (p<0.001), while a recent (within 30 days of sample collection) course of vancomycin (p<0.001), ampicillin (p<0.001), meropenem (p=0.009), or cefepime (p=0.012) was significantly associated with decreased species richness (Fig. 2.1d). In this sparse model, 57% of the variance in Shannon diversity was explained by the fixed effects (day of life and antibiotic treatments) alone, while an additional 12% of the variance was explained by subject. Across all infants in our cohort through the first 110 days of life, Shannon diversity of the microbiota was significantly lower in infants who received >1 course of antibiotics in the prior month (Fig. 2.1e). Thus, recent antibiotic treatment appears to be a major driver of microbiota diversity early in life.


Figure 2.1: Clinical variables including antibiotic exposure predict microbiota diversity. (a) Shannon diversity of all near term (n=17) and preterm (n=41) infants in this study, by month of life. (b,c) Microbiota species and functional composition of all near term (n=17) and preterm (n=41) infants in this study, by month. (d) Day of life is significantly associated with an increased microbial Shannon diversity, while vancomycin, ampicillin, meropenem, or cefepime treatment within the month prior to sampling is associated with significantly decreased species richness (*** p<0.001, ** p<0.01, * p<0.05; GLMM with subject as random effect). Oxacillin was included in the model but its association with diversity was not significant after correction for multiple comparisons. Error bars indicate s.e. (e) Shannon diversity is significantly lower in infants who have received >1 course of antibiotic treatment in the past month compared to infants who had not received antibiotic treatment during that time span (**** p<0.0001, *** p<0.001, ** p<0.01, * p<0.05; Wilcoxon with Benjamini-Hochberg correction).
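The Benjamini-Hochberg correction named in the caption can be sketched as follows. This is a generic implementation of the standard step-up procedure, not the code used for the analyses in this chapter, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-month Wilcoxon p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a larger one.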

2.2.2 Partial microbiota recovery following NICU discharge

While the taxonomic composition of the preterm infant microbiota clustered by both gestational age at birth and antibiotic treatment status (Adonis, p<0.001, Bray–Curtis), chronological age was a major driver of microbiota composition across all infants248,250,286 (Fig. 2.2a). We hypothesized that after early-life antibiotic-induced perturbation7, the gross composition of the preterm microbiota will converge towards the profile of the microbiota of age-matched healthy, antibiotic naïve, near term infants within the first 21 months of life, but that potentially pathological microbiota ‘scars’ from this early-life disruption (e.g., enriched ARGs and MDROs) persist.
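The Bray–Curtis dissimilarity underlying the clustering analysis above can be sketched in a few lines. This shows only the pairwise distance between two abundance profiles; the Adonis (permutational) test applied to the full distance matrix is not reproduced. The function name and the example profiles are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two profiles (dict: taxon -> abundance).

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(u) | set(v)
    numerator = sum(abs(u.get(t, 0.0) - v.get(t, 0.0)) for t in taxa)
    denominator = sum(u.get(t, 0.0) + v.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator

# Hypothetical percent relative abundances, for illustration only.
sample_a = {"Escherichia coli": 60.0, "Enterococcus faecalis": 40.0}
sample_b = {"Escherichia coli": 20.0, "Enterococcus faecalis": 30.0,
            "Bifidobacterium breve": 50.0}
print(bray_curtis(sample_a, sample_b))
```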

To identify and then quantify the extent of this antibiotic induced perturbation, we used a random forests algorithm to regress the relative abundances of species present in the microbiota of infants against their chronological age as previously described251. Because of the nonparametric assumptions of random forests325, the algorithm detects both linear and nonlinear relationships between species in the microbiota and age, thus identifying taxa that distinguish different developmental periods in infancy. By assessing model performance with sequentially fewer predictors, we determined that 50 was the minimum number of variables required for accurate prediction of age (Fig. 2.2b). Hence, we trained a sparse model composed of the 50 most informative predictors (taxa) on the antibiotic naïve, near term infant subset in order to model healthy microbiota development, and subsequently refined and validated the model (see Methods). The top age discriminatory taxa in the gut microbiota of antibiotic naïve, near term infants were Faecalibacterium prausnitzii, Subdoligranulum sp., Ruminococcus gnavus, and Oscillibacter sp. (Fig. 2.2c). We used this sparse model to predict the chronological age of an infant using only the relative abundance of these 50 species. This prediction, or ‘microbiota age,’ provides an approximation of the relative maturity of a microbial community251. We observed a linear relationship between the chronological and microbiota ages of antibiotic naïve, near term infants, suggesting that the model accurately predicts near term infant microbiota age. For preterm infants, however, our model predicted microbiota ages that were younger than chronological age across several stages of development, indicating that disruption of microbiota development occurs in these infants. To better quantify the extent of disruption, we computed a microbiota for age Z-score (MAZ) for each metagenome, as previously described251. Preterm infants who receive early-only or early-plus-subsequent antibiotic treatment have significantly lower MAZs than do infants born at near term gestation, as measured in the first months of life (Fig. 2.2e). However, by months 12-15 of life following discharge from the NICU, the MAZs of hospitalized preterm infants closely resemble those of healthy, antibiotic naïve, near term infants (Fig. 2.2f). Thus, despite apparent transient early life delays in the development of the preterm infant gut microbiota, the bacterial taxonomic composition converges on common structures with those of healthy, antibiotic naïve infants within the first 21 months of life (Fig. 2.2d).

Figure 2.2: Partial architectural recovery of preterm infant gut microbiota following discharge from NICU. (a) Microbiota composition is distinct between near term infants, preterm infants with early only antibiotic treatment, and preterm infants with early and subsequent antibiotic treatment (Bray-Curtis, p<0.001, Adonis), but chronological age is a major driver of microbiota composition. (b) Fivefold cross-validation indicates that 50 variables are sufficient for random forests prediction of chronological age of near term infants based on microbiota composition. Error bars indicate s.e. (c) The 50 most informative predictors to the random forests model, determined over 100 iterations of the model. These species were included in a sparse model. Error bars indicate s.e. (d) The sparse random forests model is a good predictor of near term infant chronological age, but preterm infant chronological age is predicted to be less than actual age across numerous stages of development. Curves are loess regression fit to each group. (e) Preterm infant microbiota for age Z-score (MAZ) is significantly lower than that of near term infants in the first month of life, indicating early microbiota immaturity (** p<0.01, **** p<0.0001). (f) Preterm infant MAZ is statistically indistinguishable from near term infant by 12-15 months of life, indicating resolution of microbiota immaturity by this time point.
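The microbiota for age Z-score can be sketched under one stated assumption: that it follows the commonly used definition, in which a sample's predicted microbiota age is centered on the median predicted age of healthy reference infants in the same chronological age bin and scaled by their standard deviation. The exact binning and reference set used in this chapter are given in the Methods and are not reproduced here; the function name and numbers below are hypothetical.

```python
import statistics

def microbiota_for_age_z(predicted_age, reference_ages):
    """Z-score of one sample's predicted microbiota age.

    reference_ages: predicted microbiota ages of healthy reference infants
    in the same chronological age bin (assumed definition, see lead-in).
    """
    if len(reference_ages) < 2:
        raise ValueError("need at least two reference samples")
    center = statistics.median(reference_ages)
    spread = statistics.stdev(reference_ages)
    if spread == 0:
        raise ValueError("reference ages have zero spread")
    return (predicted_age - center) / spread

# Hypothetical predicted ages in months for one age bin, for illustration only.
near_term_reference = [2.0, 3.0, 4.0]
print(microbiota_for_age_z(1.0, near_term_reference))
```

A negative score indicates a community that the model reads as younger than those of age-matched reference infants, which is the pattern described for preterm infants in the first months of life.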

2.2.3 Antibiotic resistome of preterm infant gut microbiota


To characterize the antibiotic resistome encoded in the gut microbiota of our infant cohort, we first conducted a high throughput functional metagenomic analysis4 of 217 preterm and near term infant stools, selected to represent the diversity in gestational age at birth, chronological age, and antibiotic treatment history in our cohort. We constructed 22 functional metagenomic libraries by pooling metagenomic DNA from 9-10 stools per library, encompassing 396 gigabase pairs (Gb) of metagenomic DNA with an average library size of 18 Gb (Supplementary Fig. 2.4; see Methods) and an average insert size of 2-3 kilobase pairs (kb). We selected libraries on sixteen antibiotics relevant to neonatal and pediatric populations (Supplementary Table 2.2) and recovered resistant transformants for each antibiotic except meropenem (Supplementary Fig. 2.4). Alarmingly, we found that the infant gut metagenome encoded transferable resistance even to antibiotics rarely or never used in neonates, such as ciprofloxacin and chloramphenicol, and those that represent last lines of defense against MDROs, such as tigecycline and colistin. Interestingly, only one of eight libraries constructed from stools of antibiotic naïve, near term infants encoded ciprofloxacin resistance (mediated by loci other than gyrA or parC), compared to six of fourteen libraries constructed from preterm infant stools. This observation, viewed in light of the scarce use of ciprofloxacin in neonates269, suggests that acquired (as opposed to intrinsic) ciprofloxacin resistance occurs naturally in preterm infant gut bacterial communities, or that organisms resistant to ciprofloxacin may be co-selected by other antibiotics to which they are resistant.

We sequenced resistance-conferring inserts and assembled 874 unique, functionally identified ARGs. The median identity of functionally selected ARGs to sequences in the NCBI non-redundant protein database (retrieved May 21, 2018) was 94.4%, while their median identity to the Comprehensive Antibiotic Resistance Database (CARD, version 1.2.1, retrieved January 24, 2018)326 was 32.0% (Fig. 2.3a). Hence, while the resistance determinants discovered in our functional selections have largely been previously sequenced, in most cases they have not yet been assigned resistance functions, as we have previously noted7. The predicted sources of resistance conferring ORFs (determined by best BLAST hit to the NCBI non-redundant protein database) were predominantly uncultured bacteria or Enterobacteriaceae (Fig. 2.3b). The identification of Enterobacteriaceae as the likely hosts of ARGs in the infant gut microbiota is consistent with prior studies7 and current understanding of Enterobacteriaceae as prolific hosts and traffickers of ARGs327-329. Additionally, the identification of uncultured (189 ORFs) and unclassified (24 ORFs) bacteria as sources of ARGs highlights the value of functional metagenomics as a culture- and sequence-unbiased method for characterizing resistomes330.

Figure 2.3: Preterm infants harbor an enriched gut resistome. (a) Amino acid identity between all functionally selected ARGs and their top hit in CARD vs their top hit in the NCBI nr protein database, colored by class of antibiotic used for selection. Notably, ARGs recovered by fluoroquinolone and polymyxin selection have very low median identity to CARD. (b) The most commonly predicted hosts of functionally selected ARGs based on highest identity BLAST hit in the NCBI nr protein database. (c) Gut resistome composition is distinct between near term infants, preterm infants with early only antibiotic treatment, and preterm infants with early and subsequent antibiotic treatment (Bray-Curtis, p<0.001, Adonis). (d) Preterm infants had fewer unique ARGs encoded in their gut metagenomes than near term infants (* p<0.05, ** p<0.01, Wilcoxon). (e) The cumulative resistome relative abundance was significantly higher in the gut microbiota of preterm infants with early and subsequent antibiotic treatment compared to both preterm infants with only early antibiotic treatment and near term infants (* p<0.05, Wilcoxon). (f) A random forests model trained on near term gut resistome poorly predicts chronological age of preterm infants, suggesting distinct patterns of resistome assembly based on gestational age at birth and antibiotic treatment status. (g) The programmed assembly of the gut resistome in preterm infants differs from that of near term infants in prolonged carriage of some ARGs, delayed acquisition of others, and differences in the relative abundance of ARGs in the gut.

To extend our resistome analysis to all samples in our cohort, we used ShortBRED235 to quantify translated ARG abundance in all sequenced metagenomes using a custom database that included canonical ARGs in CARD as well as the functionally selected ARGs identified here (see Methods). Resistomes clustered according to gestational age at birth and antibiotic treatment status (Fig. 2.3c, p<0.001, Adonis). The gut metagenomes of preterm infants encoded fewer unique ARGs than those of near term infants (p<0.01, Wilcoxon, Fig. 2.3d). However, the cumulative resistome relative abundance was significantly higher in the gut microbiota of preterm infants with early-plus-subsequent antibiotic treatment compared to preterm infants with early-only antibiotic treatment and to antibiotic naïve, near term infants (p<0.05, Wilcoxon, Fig. 2.3e). There was a weak inverse correlation between alpha diversity and cumulative resistome burden across all metagenomes (R² = 0.09, Supplementary Fig. 2.6a), indicating that microbiota with an enriched resistome are dominated by a few species. Indeed, in 41 of the 54 metagenomes with a cumulative resistome abundance > 5000 reads per kilobase per million mapped reads (RPKM), a single species comprised > 50% of microbiota relative abundance. In 25 of these samples, the dominant species was E. coli (Supplementary Fig. 2.6b). Other dominant species were Enterococcus faecalis (n=5), Klebsiella pneumoniae (n=2), Staphylococcus epidermidis (n=2), Enterobacter aerogenes (n=2), Bifidobacterium breve (n=2), Pseudomonas aeruginosa, Bifidobacterium longum and Citrobacter koseri (n=1, each). Thus, it appears that extreme prematurity, associated hospitalization, and antibiotic treatment select for one or two MDROs that dominate the infant gut microbiota rather than enriching for a greater diversity of resistant organisms and selecting for a diverse resistome.
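The two quantities used in this paragraph, RPKM and single-species dominance, can be illustrated with a minimal sketch. The RPKM function uses the textbook definition (reads per kilobase of reference per million mapped reads); ShortBRED's own normalization may differ in detail. The 50% dominance threshold comes from the text, while the function names, read counts, and profile are hypothetical.

```python
def rpkm(mapped_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of reference per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return mapped_reads / kilobases / millions

def dominant_species(profile, threshold=50.0):
    """Return the species exceeding `threshold` percent abundance, else None."""
    species, abundance = max(profile.items(), key=lambda item: item[1])
    return species if abundance > threshold else None

# Hypothetical values, for illustration only.
print(rpkm(mapped_reads=500, gene_length_bp=1_000, total_mapped_reads=10_000_000))
print(dominant_species({"Escherichia coli": 72.0, "Enterococcus faecalis": 28.0}))
```

Summing per-gene RPKM values across all ARGs in a sample would give a cumulative resistome abundance comparable to the threshold discussed above.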

To define the developmental progression of the infant gastrointestinal resistome over the first months of life, we regressed the abundance of ARGs (in RPKM) in a subset of antibiotic naïve, near term infant gut metagenomes (as determined by ShortBRED) against the day of life for these infants using random forests325. Fivefold cross-validation indicated that 50 predictors were sufficient for optimal model performance, so we constructed a sparse model using the 50 most informative ARGs. The sparse model was subsequently applied to preterm samples to predict the apparent ‘resistome age.’ A clear developmental trajectory based on these 50 ARGs was evident in near term infants (Fig. 2.3g). The developmental trajectory of the preterm infant gut resistome deviates from that of the antibiotic naïve, near term infants in prolonged carriage of some ARGs (e.g., oqxA, oqxB, catI, fosA5, cdeA), near absence of others (e.g., abeM), and a general increase in the normalized abundance of these genes in the gut across all timepoints (Fig. 2.3g). Overall, we found that the model only modestly predicted the chronological age of preterm infants (R² = 0.62, Fig. 2.3f), suggesting that there are distinct patterns of resistome development based on antibiotic treatment status and gestational age at birth.

Highlighting the potential for lateral ARG exchange within the infant microbiome, 225 contigs (6.4% of all contigs) recovered in functional selections encoded a putative mobile genetic element (Supplementary Fig. 2.5a-e). Mobile genetic elements were most commonly observed in tetracycline selections (Supplementary Fig. 2.5f), but were also commonly observed in β-lactam, chloramphenicol, gentamicin, and ciprofloxacin selections. We observed an enrichment for mobile genetic elements on amoxicillin/clavulanate (p<0.01, hypergeometric test), tetracycline (p<0.01, hypergeometric test), and gentamicin (p<0.001, hypergeometric test). The synteny of functionally selected ARGs with mobile genetic elements provides evidence for a potentially mobile resistome in the infant gut microbiota.
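A one-sided hypergeometric enrichment test of this kind can be sketched as below. The function name and all counts in the example are hypothetical placeholders, and the exact test configuration used for the reported p-values is not specified here, so this illustrates the technique rather than reproducing the analysis.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement from a
    population that contains `successes` marked items."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3500 contigs overall, 225 carrying a mobile element;
# one antibiotic's selections yield 300 contigs, 35 of which carry one.
p_value = hypergeom_sf(k=35, population=3500, successes=225, draws=300)
```

Here the population is all contigs, the marked items are contigs carrying a mobile genetic element, and the draws are the contigs recovered on a single antibiotic.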

2.2.4 Persistence of multidrug resistant Enterobacteriaceae in the preterm infant gut microbiota

Whole metagenome shotgun sequencing is a powerful method for describing gross microbiota composition and function, but it is less able to elucidate strain level variation. The gut has been established as an early reservoir of bacteria that cause late onset bloodstream infections in neonates274 and is dominated by multidrug resistant Proteobacteria7, but the extent to which these early colonizing strains persist in the infant gut microbiota is not well understood. We reasoned that early life antibiotic treatment in preterm infants might create a gastrointestinal niche for such Proteobacteria that is not relinquished even after discharge from the NICU. To better understand the persistence of specific bacterial strains in the microbiota of infants in our cohort, we cultured pairs of stools collected 8-10 months apart from 15 infants (nine preterm and six near term) on a series of selective agars (see Methods). We optimized culture conditions to isolate opportunistic extraintestinal pathogens known to be highly prevalent and abundant in the preterm infant gut microbiota as well as those that are frequently multidrug resistant. In total, we cultured 530 isolates from these 30 samples. We performed whole genome sequencing, assembly, and annotation of 277 isolates from the preterm set and 253 isolates from the near term set.

The species most frequently isolated by this direct antibiotic selection were E. coli (n=139), K. pneumoniae (n=62), E. faecalis (n=50), Enterobacter cloacae (n=42), E. faecium (n=22), C. freundii (n=15), and K. oxytoca (n=14). We identified persistence of nearly identical strains of E. coli, E. cloacae, and K. variicola in samples collected from infants both while in the NICU and 8-10 months following discharge (Fig. 2.4g). Among the persistent isolates recovered from infants born prematurely were strains of E. coli ST405 and E. cloacae ST108 (classified by in silico multilocus sequence typing (MLST)), both of which are high-risk international lineages, with demonstrated propensity for harboring extended-spectrum β-lactamases (ESBLs) and the NDM-family carbapenemases331-333. Each of the E. coli strains encoded a TEM-1 β-lactamase as well as an aac(3)-IId aminoglycoside acetyltransferase with predicted resistance to aminoglycosides, and each of the E. cloacae strains encoded an AmpC type β-lactamase. The K. variicola strains each encoded oqxAB, the RND-type multidrug efflux pump334, and the chromosomal Klebsiella β-lactamase blaOKP-B-1335. The isolation of nearly identical (average nucleotide identity (ANI) >99.997%) multidrug resistant Enterobacteriaceae from the gut while infants are in the NICU and then again many months later after discharge provides evidence for an enduring and transmissible pathological microbiome “scar” associated with preterm birth, early life hospitalization, and antibiotic treatment.

Figure 2.4: Multidrug resistant Enterobacteriaceae lineages persist in preterm infant gut microbiota. Maximum likelihood core genome phylogenies of E. coli (a), Klebsiella spp. (c), and E. cloacae (e) isolated from infant stool. Annotations to the right of metadata indicate timepoints of isolation and sequence type (determined by in silico MLST) for E. coli and E. cloacae or species for Klebsiella. Persistent isolates are highlighted in red. Average nucleotide identity heat maps for E. coli (b), Klebsiella spp. (d), and E. cloacae (f) indicate that persistent isolates are isogenic, i.e., they share >99.997% nucleotide identity. (g) Timeline of isolation of persistent Enterobacteriaceae from infant stool.

Because Enterococcus species are highly prevalent and abundant in the preterm infant gut7, often multidrug resistant336, and cause nosocomial blood stream infections in preterm infants337, we investigated the resistance and virulence phenotypes of these strains. A particular concern among neonatal populations is vancomycin resistant Enterococci (VRE)338. Of the 15 unique Enterococcus strains we isolated, ten were E. faecalis and five were E. faecium (Supplementary Fig. 2.7a). Each E. faecalis isolate was susceptible to vancomycin, while two E. faecium isolates were resistant to vancomycin339. However, no E. faecium strain formed a biofilm, while four of the E. faecalis strains formed robust biofilms at room temperature and an additional six formed biofilms at 37˚C (Supplementary Fig. 2.7b). Interestingly, all strains that formed biofilms were isolated from preterm infant stools. This is consistent with the prevailing understanding that early colonizers of the preterm infant gut are largely surface adapted strains that are prevalent in the NICU environment279. In line with this, E. faecalis strains that form biofilms, while susceptible to vancomycin when planktonic, were resistant to this antibiotic when in biofilms (Supplementary Fig. 2.7c). Thus, despite the apparent tradeoff between vancomycin resistance and biofilm formation observed among Enterococcus strains, nearly all have evolved strategies for surviving vancomycin exposure. This finding is disturbing because of the widespread usage of vancomycin (Table 2.1) and high prevalence of Enterococcus colonization (Fig. 2.1b) in NICUs.

2.2.5 Persistent metagenomic signature of antibiotic treatment in premature infants

To understand if there are persistent effects of prematurity, hospitalization, and early life antibiotic treatment on gut microbial content and function, we sought to identify metagenomic features that distinguish post-NICU discharge samples in preterm infants from age-matched samples from antibiotic naïve, near term infants. We used a supervised learning approach to classify samples as originating from a hospitalized preterm infant (including both early-only and early-plus-subsequent antibiotic treatment groups) or an antibiotic-naïve near term infant residing at home, based on the relative abundance of bacterial taxa and ARGs in their gut microbiota. Using a support vector machine, we identified the fifteen most informative features and constructed a sparse model consisting of only these variables, which correctly classified all preterm samples and misclassified only two near term samples (96.4% accuracy, Fig. 2.5a). Of the fifteen variables most important to model performance, six were ARGs and nine were bacterial taxa (Fig. 2.5b). The ARGs important to classification were the class A β-lactamase cfxA6340, and three and two genes functionally selected on piperacillin and tetracycline, respectively. The highest identity BLAST hit of four of the functionally selected ARGs was an ABC transporter, while the other was a MATE family efflux transporter. The predictive species were primarily members of the order Clostridiales (Eubacterium rectale, Ruminococcus obeum, R. lactaris, Dorea formicigenerans, E. ventriosum, E. ramulus, and E. eligens) and Bacteroidales (Prevotella copri, Barnesiella intestinihominis). Our model identified with high accuracy if a preterm infant was hospitalized and received early life antibiotic treatment based on the taxonomic and functional microbiota composition following discharge from the NICU despite the apparent recovery observed at a gross architectural level.

Figure 2.5: Enduring damage to the preterm infant gut microbiota. (a) Support vector machine confusion matrix for classification of gestational age based on the species, metabolic pathways, and ARGs present in the microbiota following discharge from the NICU or at matched timepoints in nonhospitalized near term infants. (b) Twenty predictors most important to classification. A sparse model trained using only these twenty predictors was highly accurate (96.4% classification accuracy), indicating a persistent metagenomic signature of preterm birth and associated hospitalization and antibiotic treatment.

2.3 Materials and Methods

2.3.1 Sample and metadata collection

All samples and patient metadata used in this study were collected as part of the Neonatal Microbiome and Necrotizing Enterocolitis Study (P.I.T., P.I.) or the St. Louis Neonatal Microbiome Initiative (B.B.W., P.I.) at Washington University in St. Louis School of Medicine and approved by the Human Research Protection Office (approval numbers 201105492 and
201104267, respectively). Samples were obtained from infants after parents provided informed consent. We stratified our cohort by antibiotic exposure and gestational age at birth, with a subset of individuals with early antibiotic exposure only (N=9) with no antibiotic exposure outside the first week of life, a subset of individuals with early and subsequent antibiotic exposure (N=32), and a subset of late preterm or early term infants (N=17) who were not hospitalized and were antibiotic naïve over the first months of life (Table 2.1). All stools produced were collected and stored as previously described 249,341. In total, 444 samples collected longitudinally from 58 infants were shotgun sequenced and included in all metagenomic analysis.

2.3.2 Metagenomic DNA extraction

Metagenomic DNA was extracted from approximately 100 mg of stool using the PowerSoil DNA Isolation Kit (MoBio Laboratories) following the manufacturer’s protocol with the following modification: samples were lysed by two rounds of bead beating (2 minutes each at 2.5k oscillations per minute, separated by 1 minute on ice) using a Mini-Beadbeater-24 (Biospec Products). DNA was quantified using a Qubit fluorometer dsDNA BR Assay (Invitrogen) and stored at -20˚C.

2.3.3 Metagenomic sequencing library preparation

Metagenomic DNA was diluted to a concentration of 0.5 ng/µL prior to sequencing library preparation. Libraries were prepared using a Nextera DNA Library Prep Kit (Illumina) following the modifications described in Baym et al., 2015342. Libraries were purified using the Agencourt AMPure XP system (Beckman Coulter) and quantified using the Quant-iT PicoGreen dsDNA assay (Invitrogen). For each sequencing lane, 10 nM of approximately 96 samples were pooled three independent times. These pools were quantified using the Qubit dsDNA BR Assay and combined in an equimolar fashion. Samples were submitted for 2×150 bp paired-end sequencing on an Illumina NextSeq High-Output platform at the Center for Genome Sciences and Systems Biology at Washington University in St. Louis with a target sequencing depth of 2.5 million reads per sample.

2.3.4 Rarefaction analysis

To determine the appropriate sequencing depth necessary to fully characterize infant gut microbiota composition and function, 17 representative metagenomes that were sequenced most deeply were subsampled at the following read depths: 8000000, 7000000, 6000000, 5000000, 4000000, 3000000, 2000000, 1000000, 100000, and 10000. Subsampled metagenomes were profiled using MetaPhlAn 2.0280 to determine species richness at each depth.
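The subsampling logic can be sketched as follows. This is a simplified illustration that assumes each read already carries a species label; in the analysis above, richness was instead obtained by profiling each subsample with MetaPhlAn. The function name, the labels, and the fixed seed are hypothetical.

```python
import random


def richness_at_depths(read_labels, depths, seed=0):
    """Subsample reads without replacement at each depth and count distinct species.

    Depths larger than the number of available reads are skipped.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    richness = {}
    for depth in depths:
        if depth > len(read_labels):
            continue
        subsample = rng.sample(read_labels, depth)
        richness[depth] = len(set(subsample))
    return richness


# Toy metagenome: one dominant, one intermediate, and one rare species.
toy_reads = ["species_A"] * 900 + ["species_B"] * 90 + ["species_C"] * 10
curve = richness_at_depths(toy_reads, depths=[5000, 1000, 100, 10])
```

A depth is considered sufficient once richness stops increasing as more reads are added.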

2.3.5 Metagenome profiling

Prior to all downstream analysis, Illumina paired-end reads were binned by index sequence. Adapter and index sequences were trimmed and sequences were quality filtered using Trimmomatic v0.36343 with the following parameters: java -Xms2048m -Xmx2048m -jar trimmomatic-0.33.jar PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true SLIDINGWINDOW:6:10 LEADING:13 TRAILING:13 MINLEN:36. Relative abundance of species was calculated using MetaPhlAn 2.0204 (repository tag 2.2.0). Relative abundance tables were merged using the merge_metaphlan_tables.py script. Abundance of metabolic pathways was determined using HUMAnN2. Raw count values were normalized for sequencing depth, collapsed by ontology, and tables were merged using the humann2_renorm_table, humann2_regroup_table, and humann2_join_tables utility scripts278.

2.3.6 Construction of metagenomic libraries from infant gut samples for functional selection

Approximately 5 µg of purified total metagenomic DNA was used as starting material for metagenomic library construction. To create small-insert metagenomic libraries, DNA was sheared to a target size of 3,000 bp using the Covaris E210 sonicator following the manufacturer’s recommended settings (http://covarisinc.com/wp-content/uploads/pn_400069.pdf). Sheared DNA was concentrated using the QIAquick PCR Purification Kit (Qiagen) and eluted in 30 μl nuclease-free H2O. The purified DNA was then size-selected to a range of 1000-6000 bp on a BluePippin instrument (Sage Science) using a premade 0.75% Pippin gel cassette. Size-selected DNA was then end-repaired using the End-It DNA End Repair kit (Epicentre) with the following protocol:

(1) Mix the following in a 50 μl reaction volume: 30 μl of purified DNA, 5 μl dNTP mix (2.5 mM), 5 μl 10X End-Repair buffer, 1 μl End-Repair Enzyme Mix and 4 μl nuclease-free H2O.

(2) Mix gently and incubate at room temperature for 45 min.

(3) Heat-inactivate the reaction at 70°C for 15 min.

End-repaired DNA was then purified using the QIAquick PCR purification kit (Qiagen), quantified using the Qubit fluorometer BR assay kit (Life Technologies), and ligated into the pZE21-MCS-1 vector at the HincII site. The pZE21 vector was linearized at the HincII site using inverse PCR with PFX DNA polymerase (Life Technologies):

(1) Mix the following in a 50 μl reaction volume: 10 μl of 10X PFX reaction buffer, 1.5 μl of 10 mM dNTP mix (New England Biolabs), 1 μl of 50 mM MgSO4, 5 μl of PFX enhancer solution, 1 μl of 100 pg/μl circular pZE21, 0.4 μl of PFX DNA polymerase, 0.75 μl forward primer (5’ GAC GGT ATC GAT AAG CTT GAT 3’), 0.75 μl reverse primer (5’ GAC CTC GAG GGG GGG 3’) and 29.6 μl of nuclease free H2O to a final volume of 50 μl.

(2) PCR cycle temperature as follows: 95°C for 5 min, then 35 cycles of [95°C for 45 s, 55°C for 45 s, 72°C for 2.5 min], then 72°C for 5 min.

Linearized pZE21 was size-selected (~2,200 bp) on a 1% low melting point agarose gel (0.5X TBE) stained with GelGreen dye (Biotium) and purified using the QIAquick Gel Extraction Kit (Qiagen). Pure vector was dephosphorylated using calf intestinal alkaline phosphatase (CIP, New England BioLabs) by adding 1/10th reaction volume of CIP, 1/10th reaction volume of New England BioLabs Buffer 3, and nuclease-free H2O to the vector eluate and incubating at 37°C overnight before heat inactivation for 15 min at 70°C. End-repaired metagenomic DNA and linearized vector were ligated together using the Fast-Link Ligation Kit (Epicentre) at a 5:1 ratio of insert:vector using the following protocol:

(1) Mix the following in a 15 μl reaction volume: 1.5 μl 10X Fast-Link buffer, 0.75 μl ATP (10 mM), 1 μl FastLink DNA ligase (2 U/μl), 5:1 ratio of metagenomic DNA to vector, and nuclease-free H2O to final reaction volume.

(2) Incubate at room temperature overnight.

(3) Heat inactivate for 15 min at 70°C.

After heat inactivation, ligation reactions were dialyzed for 30 min using a 0.025 µm cellulose membrane (Millipore catalogue number VSWP09025) and the full reaction volume used for transformation by electroporation into 25 μl E. coli MegaX (Invitrogen) according to the manufacturer’s recommendations. Cells were recovered in 1 ml Recovery Medium (Invitrogen) at 37°C for one hour. Libraries were titered by plating out 0.1 μl and 0.01 μl of recovered cells onto Luria–Bertani (LB) agar plates containing 50 μg/ml kanamycin. For each library, insert size distribution was estimated by gel electrophoresis of PCR products obtained by amplifying the insert from 36 randomly picked clones using primers flanking the HincII site of the multiple cloning site of the pZE21 MCS1 vector (which contains a selectable marker for kanamycin resistance). The average insert size across all libraries was determined to be 3 kb, and library size estimates were calculated by multiplying the average PCR-based insert size by the number of titered colony forming units (CFUs) after transformation recovery. The rest of the recovered cells were inoculated into 50 ml of LB containing 50 μg/ml kanamycin and grown overnight. The overnight culture was frozen with 15% glycerol and stored at -80°C for subsequent screening.
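The library size estimate described above (average insert size multiplied by titered CFUs) can be written out as a short calculation. The function name and the colony count in the example are hypothetical; only the 0.1 μl plating volume, the 1 ml recovery volume, and the 3 kb average insert size come from the text.

```python
def library_size_gb(colonies: int, plated_volume_ul: float,
                    recovery_volume_ul: float, mean_insert_bp: float) -> float:
    """Estimate library size in gigabases from a titer plate.

    Total CFUs are extrapolated from the plated volume to the full recovery
    volume, then multiplied by the mean insert size.
    """
    total_cfu = colonies / plated_volume_ul * recovery_volume_ul
    return total_cfu * mean_insert_bp / 1e9


# Hypothetical titer: 150 colonies from 0.1 ul of a 1 ml (1000 ul) recovery.
estimate = library_size_gb(colonies=150, plated_volume_ul=0.1,
                           recovery_volume_ul=1000, mean_insert_bp=3000)
```

With these illustrative numbers the library holds about 1.5 million clones, or roughly 4.5 Gb of cloned metagenomic DNA.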

2.3.7 Functional selections for antibiotic resistance

Each metagenomic library was selected for resistance to each of 16 antibiotics on LB agar (at concentrations listed in Supplementary Table 2.2, plus 50 μg/ml kanamycin for plasmid library maintenance). Of note, as our library host, E. coli, is intrinsically resistant to vancomycin, we are unable to functionally screen for loci conferring resistance to this antibiotic. Further, the use of kanamycin as the selective marker for the metagenomic plasmid library results in low-level cross-resistance with other aminoglycoside antibiotics, resulting in a higher required minimum inhibitory concentration for gentamicin. For each metagenomic library, the number of cells plated on each antibiotic selection represented 10x the number of unique CFUs in the library, as determined by titers during library creation. Depending on the titer of live cells following library amplification and storage, the appropriate volume of freezer stock was either diluted to 100 μl using MH broth + 50 μg/ml kanamycin or centrifuged and reconstituted in this volume for plating. After plating (using sterile glass beads), antibiotic selections were incubated at 37°C for 18 hours to allow the growth of clones containing an antibiotic resistance conferring DNA insert. Of the 352 antibiotic selections performed, 296 yielded antibiotic-resistant E. coli transformants (Supplementary Fig. 2.4). After overnight growth, all colonies from a single antibiotic plate (library by antibiotic selection) were collected by adding 750 μl of 15% LB-glycerol to the plate and scraping with an L-shaped cell scraper to gently remove colonies from the agar. The slurry was then collected and this process was repeated a second time for a total volume of 1.5 mL to ensure that all colonies were removed from the plate. The bacterial cells were then stored at -80°C before PCR amplification of antibiotic-resistant metagenomic fragments and Illumina library creation.

2.3.8 Amplification and sequencing of functionally selected fragments

Freezer stocks of antibiotic-resistant transformants were thawed and 300 μl of cells pelleted by centrifugation at 13,000 revolutions per minute (r.p.m.) for two minutes and gently washed with 1 mL of nuclease-free H2O. Cells were subsequently pelleted a second time and re-suspended in 30 μl of nuclease-free H2O. Re-suspensions were then frozen at -20°C for one hour and thawed to promote cell lysis. The thawed re-suspension was pelleted by centrifugation at 13,000 r.p.m. for two minutes and the resulting supernatant was used as template for amplification of resistance-conferring DNA fragments by PCR with Taq DNA polymerase (New England BioLabs):

(1) Mix the following for a 25 μl reaction volume: 2.5 μl of template, 2.5 μl of ThermoPol reaction buffer (New England BioLabs), 0.5 μl of 10 mM deoxynucleotide triphosphates (dNTPs, New England Biolabs), 0.5 μl of Taq polymerase (5 U/μl), 3 μl of a custom primer mix, and 16 μl of nuclease-free H2O.

(2) PCR cycle temperature as follows: 94°C for 10 min, then 25 cycles of [94°C for 45 s, 55°C for 45 s, 72°C for 5.5 min], then 72°C for 10 min.

The custom primer mix consisted of three forward and three reverse primers, each targeting the sequence immediately flanking the HincII site in the pZE21 MCS1 vector, and staggered by one base pair. The staggered primer mix ensured diverse nucleotide composition during early Illumina sequencing cycles and contained the following primer volumes (from a 10 mM stock) in a single PCR reaction: (primer F1, CCGAATTCATTAAAGAGGAGAAAG, 0.5 μl); (primer F2, CGAATTCATTAAAGAGGAGAAAGG, 0.5 μl); (primer F3, GAATTCATTAAAGAGGAGAAAGGTAC, 0.5 μl); (primer R1, GATATCAAGCTTATCGATACCGTC, 0.21 μl); (primer R2, CGATATCAAGCTTATCGATACCG, 0.43 μl); (primer R3, TCGATATCAAGCTTATCGATACC, 0.86 μl). The amplified metagenomic inserts were then cleaned using the Qiagen QIAquick PCR purification kit and quantified using the Qubit fluorometer HS assay kit (Life Technologies).
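The one-base stagger of the primer set can be checked directly from the sequences listed above. In this sketch the primer sequences and volumes are taken from the text, while the function name and the two "core" anchor substrings are chosen here purely for illustration.

```python
FORWARD = {
    "F1": "CCGAATTCATTAAAGAGGAGAAAG",
    "F2": "CGAATTCATTAAAGAGGAGAAAGG",
    "F3": "GAATTCATTAAAGAGGAGAAAGGTAC",
}
REVERSE = {
    "R1": "GATATCAAGCTTATCGATACCGTC",
    "R2": "CGATATCAAGCTTATCGATACCG",
    "R3": "TCGATATCAAGCTTATCGATACC",
}
VOLUMES_UL = {"F1": 0.5, "F2": 0.5, "F3": 0.5, "R1": 0.21, "R2": 0.43, "R3": 0.86}


def stagger_offsets(primers: dict, core: str) -> dict:
    """Return the 0-based position of a shared core sequence in each primer.

    Offsets that differ by one between primers indicate a one-base stagger.
    """
    return {name: seq.index(core) for name, seq in primers.items()}


# Illustrative anchors shared by all primers of each orientation.
forward_offsets = stagger_offsets(FORWARD, "GAATTCATT")
reverse_offsets = stagger_offsets(REVERSE, "GATATCAAGC")
total_volume_ul = sum(VOLUMES_UL.values())
```

The offsets come out as 2, 1, 0 for F1 to F3 and 0, 1, 2 for R1 to R3, and the six volumes add up to the 3 μl of primer mix used in the PCR.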

For amplified metagenomic inserts from each antibiotic selection, elution buffer was added to PCR template for a final volume of 200 μl and sonicated in a half-skirted 96-well plate on a Covaris E210 sonicator with the following settings: duty cycle, 10%; intensity, 5; cycles per burst, 200; sonication time, 600 s. Following sonication, sheared DNA was purified and concentrated using the MinElute PCR Purification kit (Qiagen) and eluted in 20 μl of pre-warmed nuclease-free H2O. In the first step of library preparation, purified sheared DNA was end-repaired:

(1) Mix the following for a 25 μl reaction volume: 20 μl of eluate, 2.5 μl T4 DNA ligase buffer with 10 mM ATP (10X, New England BioLabs), 1 μl dNTPs (1 mM, New England BioLabs), 0.5 μl T4 polymerase (3 U/μl, New England BioLabs), 0.5 μl T4 PNK (10 U/μl, New England BioLabs), and 0.5 μl Taq Polymerase (5 U/μl, New England BioLabs).

(2) Incubate the reaction at 25°C for 30 min followed by 20 min at 75°C.

Next, to each end-repaired sample, 5 μl of 1 μM pre-annealed, barcoded sequencing adapters were added (adapters were thawed on ice). Barcoded adapters consisted of a unique 7-bp oligonucleotide sequence specific to each antibiotic selection, facilitating the de-multiplexing of mixed-sample sequencing runs. Forward and reverse sequencing adapters were stored in TES buffer (10 mM Tris, 1 mM EDTA, 50 mM NaCl, pH 8.0) and annealed by heating the 1 μM mixture to 95°C followed by a slow cool (0.1°C per second) to a final holding temperature of 4°C. After the addition of barcoded adapters, samples were incubated at 16°C for 40 min and then for 10 min at 65°C. Before size-selection, 10 μl each of adapter-ligated samples were combined into pools of 12 and concentrated by elution through a MinElute PCR Purification Kit (Qiagen), eluting in 14 μl of elution buffer (10 mM Tris-Cl, pH 8.5). The pooled, adapter-ligated, sheared DNA was then size-selected to a target range of 300-400 bp on a 2% agarose gel in 0.5X TBE, stained with GelGreen dye (Biotium) and extracted using a MinElute Gel Extraction Kit (Qiagen). The purified DNA was enriched using the following protocol:

(1) Mix the following for a 25 μl reaction volume: 2 μl of purified DNA, 12.5 μl 2X Phusion HF Master Mix (New England BioLabs), 1 μl of 10 mM Illumina PCR Primer Mix (5’-AAT GAT ACG GCG ACC ACC GAG ATC-3’ and 5’-CAA GCA GAA GAC GGC ATA CGA GAT-3’), and 9.5 μl of nuclease-free H2O.

(2) PCR cycle as follows: 98°C for 30 s, then 18 cycles of [98°C for 10 s, 65°C for 30 s, 72°C for 30 s], then 72°C for 5 min.

Amplified DNA was measured using the Qubit fluorometer HS assay kit (Life Technologies) and 10 nM of each sample were pooled for sequencing. Subsequently, samples were submitted for paired-end 101-bp sequencing using the Illumina NextSeq platform at the DNA Sequencing and Innovation Lab at the Edison Center for Genome Sciences and Systems Biology, Washington University in St Louis, USA. In total, three sequencing runs were performed at 10 pM concentration per lane.

2.3.9 Assembly and annotation of functionally selected fragments

Illumina paired-end sequence reads were binned by barcode (exact match required), such that independent selections were assembled and annotated in parallel. Assembly of the resistance-conferring DNA fragments from each selection was achieved using PARFuMS344 (Parallel Annotation and Reassembly of Functional Metagenomic Selections), a tool developed specifically for the high-throughput assembly and annotation of functional metagenomic selections.
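Exact-match barcode binning can be sketched as below. This is not the PARFuMS code: it assumes, for illustration, that the 7-bp barcode sits at the very start of each read, and the barcode sequences, selection names, and function name are hypothetical.

```python
from collections import defaultdict


def bin_reads_by_barcode(reads, barcodes, barcode_len=7):
    """Assign reads to selections by exact match of the leading barcode.

    Reads whose prefix matches no barcode exactly are dropped. The barcode
    is trimmed from every retained read.
    """
    bins = defaultdict(list)
    for read in reads:
        selection = barcodes.get(read[:barcode_len])
        if selection is not None:
            bins[selection].append(read[barcode_len:])
    return dict(bins)


# Hypothetical barcodes for two selections from one library.
example_barcodes = {"ACGTACG": "lib1_tetracycline", "TTGCAAC": "lib1_piperacillin"}
example_reads = [
    "ACGTACGGGGAAA",  # exact match to the first barcode
    "TTGCAACCCCTTT",  # exact match to the second barcode
    "ACGTACCGGGAAA",  # one mismatch in the barcode, so it is discarded
]
binned = bin_reads_by_barcode(example_reads, example_barcodes)
```

Requiring an exact match discards reads with sequencing errors in the barcode, trading some yield for a lower risk of assigning a read to the wrong selection.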

Open reading frames (ORFs) were predicted in assembled contigs using MetaGeneMark345 and annotated by searching amino acid sequences against an ARG specific profile hidden Markov model (pHMM) database, Resfams346, with HMMER3347. MetaGeneMark was run using default gene-finding parameters while hmmscan (HMMER3) was run with the option --cut_ga as implemented in the script annotate_functional_selections.py. Selections were excluded from analysis if (a) more than 200 contigs were assembled or (b) the number of contigs assembled exceeded the number of colonies on the selection plate by a factor of ten. Further, assembled contigs less than 500 bp were discarded. Proteins were classified as resistance determinants using the following hierarchical scheme: If a contig encoded a protein with a 100% amino acid identity hit to the CARD database326, it was considered the causative resistance determinant on that contig. Next, if a contig encoded a protein with a significant hit to a Resfams pHMM using profile specific gathering thresholds, it was considered the causative resistance determinant on that contig. In absence of a high scoring hit to the CARD or Resfams databases, contigs were manually curated to identify plausible resistance determinants on an antibiotic specific basis. Using these criteria, 1184 of the 5658 unique predicted proteins (20.9%) were classified as resistance determinants.
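The exclusion rules and the hierarchical classification scheme above can be expressed as a short sketch. The thresholds come from the text, whereas the function names, the dictionary keys, and the example proteins are hypothetical, and the manual curation step is represented only by a label.

```python
def keep_selection(n_contigs: int, n_colonies: int) -> bool:
    """Apply exclusion rules (a) and (b) to one antibiotic selection."""
    too_many_contigs = n_contigs > 200
    too_many_per_colony = n_contigs > 10 * n_colonies
    return not (too_many_contigs or too_many_per_colony)


def classify_contig(contig_length_bp: int, proteins: list):
    """Return (protein id, evidence) for the causative determinant on a contig.

    Each protein is a dict with hypothetical keys 'id', 'card_identity'
    (percent amino acid identity to CARD, or None) and 'resfams_hit' (bool).
    Returns None for contigs shorter than 500 bp, which are discarded.
    """
    if contig_length_bp < 500:
        return None
    for protein in proteins:  # tier 1: perfect CARD match
        if protein.get("card_identity") == 100.0:
            return protein["id"], "CARD"
    for protein in proteins:  # tier 2: Resfams hit above gathering threshold
        if protein.get("resfams_hit"):
            return protein["id"], "Resfams"
    return None, "manual curation"  # tier 3: inspected by hand


example = classify_contig(2400, [
    {"id": "orf1", "card_identity": 87.5, "resfams_hit": True},
    {"id": "orf2", "card_identity": 100.0, "resfams_hit": True},
])
```

In the example, orf2 is chosen over orf1 because a perfect CARD match outranks a Resfams hit.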

The percent identity of all resistance determinants was determined using a BlastP116 query against both the NCBI non-redundant protein database (retrieved May 21, 2018) and the CARD326 database (version 1.2.1, retrieved January 24, 2018). Once the top local alignment was identified with BlastP, it was used for a global alignment using the Needleman-Wunsch algorithm as implemented in the needle program of EMBOSS348 version 6.6.0 as previously described7.

Putative mobile genetic elements were identified on functionally selected contigs based on string matches to one of the following keywords in Pfam and TIGRfam annotations: ‘transposase’, ‘transposon’, ‘conjugative’, ‘integrase’, ‘integron’, ‘recombinase’, ‘conjugal’, ‘mobilization’, ‘recombination’, or ‘plasmid’.
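The keyword screen can be sketched as a simple substring match. The keyword list is taken from the text; the case-insensitive matching, the function names, and the example annotations are assumptions made for illustration.

```python
MGE_KEYWORDS = (
    "transposase", "transposon", "conjugative", "integrase", "integron",
    "recombinase", "conjugal", "mobilization", "recombination", "plasmid",
)


def is_putative_mge(annotation: str) -> bool:
    """True if a Pfam/TIGRfam annotation string contains any keyword."""
    text = annotation.lower()  # case-insensitive matching is an assumption
    return any(keyword in text for keyword in MGE_KEYWORDS)


def contigs_with_mge(contig_annotations: dict) -> list:
    """Return ids of contigs with at least one matching annotation.

    contig_annotations maps contig id -> list of annotation strings.
    """
    return [
        contig for contig, annotations in contig_annotations.items()
        if any(is_putative_mge(a) for a in annotations)
    ]


# Hypothetical annotations for three contigs.
flagged = contigs_with_mge({
    "contig_1": ["Beta-lactamase", "Transposase DDE domain"],
    "contig_2": ["ABC transporter"],
    "contig_3": ["Phage integrase family"],
})
```

Substring matching of this kind is permissive by design, which is why the resulting elements are described as putative.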

2.3.10 Bacterial isolation from infant stools

Approximately 50 mg of frozen stool was resuspended in 1 mL Tryptic Soy Broth (TSB) and incubated with shaking at 37˚C for four hours. 50 µL of culture was streaked for isolation using the four quadrant method on each of the following agars: Bile Esculin Agar, ESBL Agar, MacConkey Agar, MacConkey Agar+cefotaxime, MacConkey Agar+ciprofloxacin, and Blood Agar (Hardy Diagnostics catalog numbers G12, G321, G35, G121, G258, A10, respectively). Plates were incubated for 18-24 hours at 37˚C. Four colonies of each distinct morphology on each plate were substreaked onto blood agar and incubated for 18-24 hours at 37˚C. Following confirmation of morphology, 1 mL of TSB was inoculated with a single colony and grown overnight at 37˚C with shaking. Overnight cultures were frozen in 15% glycerol in TSB.

2.3.11 Genomic DNA isolation

1.5 mL TSB was inoculated from isolate glycerol stocks and grown overnight at 37˚C with shaking. DNA was extracted using the BiOstic Bacteremia DNA Isolation Kit (MoBio Laboratories) following manufacturer’s protocols. Genomic DNA was quantified using a Qubit fluorometer dsDNA BR Assay (Invitrogen) and stored at -20˚C.

2.3.12 Isolate sequencing library preparation

Isolate sequencing libraries were prepared in the same manner as described for metagenomic sequencing libraries, following the protocol described in Baym et al., 2015. For each sequencing lane, 10 nM of approximately 300 samples were pooled three independent times. These pools were quantified using the Qubit dsDNA BR Assay and combined in an equimolar fashion. Samples were submitted for 2×150 bp paired-end sequencing on an Illumina NextSeq High-Output platform at the Center for Genome Sciences and Systems Biology at Washington University in St. Louis with a target sequencing depth of 1 million paired end reads per sample.

2.3.13 Assembly of isolate genomes

Prior to all downstream analysis, Illumina paired end reads were binned by index sequence. Adapter and index sequences were trimmed using Trimmomatic v0.36343 with the following parameters: java -Xms2048m -Xmx2048m -jar trimmomatic-0.33.jar PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true. Contaminating human reads were removed using DeconSeq349 and unpaired reads were discarded. Reads were assembled using SPAdes350 with the following parameters: spades.py -k 21,33,55,77 --careful. Contigs less than 500 bp were excluded from further analysis. Assembly quality was assessed using QUAST351. Average coverage across the assembly was calculated by mapping raw reads to contigs using bbmap (https://jgi.doe.gov/data-and-tools/bbtools/).

2.3.14 Isolate genomic analysis

A total of 406 assemblies had an N50 greater than 50,000 and fewer than 500 total contigs longer than 1000 bp, and were included in further analysis. Genomes were annotated using Prokka352 with default parameters. Multilocus sequence types were determined using in silico MLST (https://github.com/tseemann/mlst). Species assignments were determined by querying assemblies against a RefSeq sketch using Mash and identifying the RefSeq hit with the minimum Mash distance353. Assemblies were binned by species according to Mash designation. For each of the seven most commonly occurring species, pangenome analysis was performed using Roary, with core genome alignments created with PRANK354. An outgroup assembly of the same genus but different species was downloaded from NCBI and included in each pangenome analysis (Supplementary Table 2.1). Maximum likelihood core genome phylogenies were constructed using RAxML under the GTRGAMMA model with 1000 bootstraps and maximum likelihood optimization initialized from a random starting tree.
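The assembly inclusion criteria and the minimum-distance species call can be sketched as follows. The thresholds are those stated above; the function names and example values are hypothetical, and the sketch assumes N50 is computed over all contigs supplied, which the text does not specify. In the analysis itself these quantities came from QUAST and Mash.

```python
def n50(contig_lengths) -> int:
    """Length of the contig at which the cumulative sum of lengths, taken
    from longest to shortest, first reaches half of the total assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def passes_qc(contig_lengths) -> bool:
    """N50 > 50,000 bp and fewer than 500 contigs longer than 1000 bp."""
    long_contigs = [length for length in contig_lengths if length > 1000]
    return n50(contig_lengths) > 50_000 and len(long_contigs) < 500


def assign_species(mash_distances: dict) -> str:
    """Pick the reference with the smallest Mash distance.

    mash_distances maps reference species name -> Mash distance.
    """
    return min(mash_distances, key=mash_distances.get)


species = assign_species({"Escherichia coli": 0.002, "Klebsiella pneumoniae": 0.15})
```

Both criteria guard against fragmented assemblies, which would otherwise distort the core genome alignments used for the phylogenies.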

2.3.15 Enterococcus biofilm formation assay

Mid-log phase cultures in freshly-prepared tryptic soy broth containing 0.5% glucose (TSBG) were diluted to OD 0.1. 200 µl of the diluted culture was added in quadruplicate to 96 well polystyrene plates and incubated at room temperature or 37˚C without shaking. After 24 hours of growth, wells were decanted, washed three times with sterile PBS, and fixed for 30 minutes with 200 µl Bouin’s solution. Fixative was removed by washing three times with sterile PBS, then wells were stained with 0.1% crystal violet for 30 minutes. Excess stain was removed by washing three times with sterile PBS, the stain was solubilized in 200 µl ethanol, and absorbance read at 590 nm. E. faecalis strains TX5682 (biofilm negative) and TX82 (biofilm positive) were used as controls355.

2.3.16 Enterococcus vancomycin susceptibility testing

Isolates identified as Enterococcus were phenotyped for vancomycin resistance using microbroth dilution according to the CLSI guidelines339. ATCC29212 (vancomycin susceptible) and ATCC51299 (vancomycin-resistant) were included in all assays as controls. Isolates were grown to mid-log phase, diluted in culture media to 1×10⁶ CFU/ml, and used to inoculate plates containing vancomycin ranging from 128 to 2 µg/ml. After 24 hours of static growth at 37˚C, optical density was read at 600 nm and MIC was determined by scoring by eye for turbidity.
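Reading an MIC from a dilution series can be sketched as below. The 128 to 2 µg/ml range comes from the text; the twofold spacing, the function names, and the growth calls in the example are assumptions made for illustration, and no interpretive breakpoints are applied.

```python
def twofold_series(high: float = 128.0, low: float = 2.0) -> list:
    """Concentrations from high down to low in twofold steps (assumed spacing)."""
    series = []
    conc = high
    while conc >= low:
        series.append(conc)
        conc /= 2
    return series


def mic_from_growth(growth: dict):
    """Lowest concentration with no visible growth, provided every higher
    concentration is also clear.

    growth maps concentration (ug/ml) -> True if the well is turbid.
    Returns None when even the highest concentration shows growth,
    meaning the MIC lies above the tested range.
    """
    mic = None
    for conc in sorted(growth, reverse=True):
        if growth[conc]:
            break
        mic = conc
    return mic


# Hypothetical isolate that grows only at 4 and 2 ug/ml.
wells = {conc: conc <= 4 for conc in twofold_series()}
example_mic = mic_from_growth(wells)
```

For this hypothetical isolate the MIC is 8 µg/ml, the lowest concentration at which the well stays clear.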

Vancomycin resistance of biofilms was assayed after establishing biofilms as above. After 24 hours of static growth at 37˚C, planktonic cells were removed by washing three times with sterile water, then 200 µl of TSBG containing 5 mg/ml, 5 µg/ml, or no vancomycin was added, and the plates were incubated at 37˚C for an additional 24 hours. After planktonic cells were again removed by washing three times with sterile water, 200 µl sterile water was added to each well and the viability of the cells in the biofilm was assessed using an XTT Cell Viability Kit (Cell Signaling Technology, #9095) according to manufacturer’s protocols, reading absorbance at 450 nm 60 minutes after addition of reagents.

2.3.17 Microbiota age regression using random forests

Random forests were used to regress the relative abundances of all species predicted by MetaPhlAn2 in infant stool samples against the infants' chronological age using the R package “randomForest”356 as previously described251. The default parameters were used with the following exceptions: ntree=10,000, importance=TRUE. Fivefold cross-validation was performed using the rfcv function over 100 iterations to estimate the minimum number of features needed to accurately predict microbiota age. The features most important for prediction were identified over 100 iterations of the importance function, and a sparse model consisting of the 50 most important features was constructed and trained on a set of nine antibiotic naïve near term infants randomly selected from the larger near term infant set. This model was validated in the remaining eight antibiotic naïve near term infants, and then applied to preterm infants to predict microbiota age. Microbiota for age Z-score was computed as previously described251.
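The text defers the Z-score definition to the cited work. As a hedged sketch, one common formulation standardizes a sample's predicted microbiota age against the predicted ages of healthy reference infants of similar chronological age, using the reference median and standard deviation. That formulation, the function name, and the way reference samples are chosen are all assumptions of this example and should be checked against the cited method.

```python
from statistics import median, stdev


def microbiota_for_age_z(predicted_age, reference_ages):
    """Z-score of one predicted microbiota age against a reference set.

    reference_ages: predicted microbiota ages of healthy reference infants
    in the same chronological age bin (binning is left to the caller).
    """
    if len(reference_ages) < 2:
        raise ValueError("need at least two reference values")
    spread = stdev(reference_ages)
    if spread == 0:
        raise ValueError("reference values have zero spread")
    return (predicted_age - median(reference_ages)) / spread


# Made-up values (days), for illustration only.
print(microbiota_for_age_z(8.0, [10.0, 12.0, 14.0]))
```

A negative score indicates a microbiota that appears younger than expected for the infant's chronological age.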

2.3.18 Classification of post-discharge samples

A single sample from each individual was selected (the final post-discharge sample collected from each preterm infant and a roughly age-matched sample from each near term infant). All metagenomic data (species and ARG abundances, centered and scaled) were initially used as input for a support vector machine as implemented in the R package e1071. Feature importance was determined by computing the elementwise absolute value of the product of the matrix of weights and the matrix of support vectors357. A sparse model was subsequently constructed consisting of only the fifteen most important features.
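The feature-importance calculation described above can be sketched in plain Python. This is an illustration under the assumption of a linear kernel, for which the weight vector is the sum of the support vectors scaled by their coefficients; the kernel is not stated in the text, and the function names are hypothetical rather than part of e1071.

```python
def linear_svm_feature_importance(coefs, support_vectors, feature_names):
    """Absolute value of each entry of w = sum_i coefs[i] * support_vectors[i]."""
    n_features = len(feature_names)
    weights = [0.0] * n_features
    for coef, sv in zip(coefs, support_vectors):
        if len(sv) != n_features:
            raise ValueError("support vector length does not match features")
        for j, value in enumerate(sv):
            weights[j] += coef * value
    return {name: abs(w) for name, w in zip(feature_names, weights)}


def top_features(importance, k):
    """Names of the k features with the largest importance."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Made-up coefficients and support vectors, for illustration only.
importance = linear_svm_feature_importance(
    coefs=[1.0, -0.5],
    support_vectors=[[2.0, 0.0, 1.0], [0.0, 6.0, 1.0]],
    feature_names=["species_a", "species_b", "arg_c"],
)
print(top_features(importance, 2))
```

Selecting the top fifteen features in this way would give the input set for the sparse model.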

2.4 Discussion

By combining metagenomic sequencing, selective and differential stool culture paired with isolate sequencing, functional metagenomics, and machine learning, we demonstrate for the first time persistent and potentially pathological metagenomic signatures of early life antibiotic treatment and hospitalization in preterm infants. This is manifest in an enriched gut resistome and persistent carriage of multidrug resistant Enterobacteriaceae, despite apparent recovery at the gross taxonomic level. Our work highlights the need to integrate high resolution and culture-based approaches for deeply interrogating the gut microbiota to reveal metagenomic signatures of microbiota perturbation. These complementary methods provide the first evidence of a persistent metagenomic signature of early life antibiotic treatment in the dynamic microbial community housed in the infant gut. This provides compelling evidence for an underappreciated lasting effect of preterm birth and associated hospitalization and antibiotic treatment on the microbiome, which may play a role in chronic pathologies associated with prematurity for which the etiology is unclear. From a clinical standpoint, our findings emphasize a necessity for alternatives to broad-spectrum antimicrobial therapy for managing infection in the NICU. While it is possible that the metagenomic scars we identified are implicated in sequelae of preterm birth such as neurodevelopmental358-360, metabolic361,362, cardiac363,364, and respiratory365,366 defects, further experiments with model systems including gnotobiotic animals are needed to establish a link between these persistent dysbioses and lasting pathologies.

2.5 Acknowledgments

This work was done in collaboration with Bin Wang, Xiaoqing Sun, Elizabeth A. Kennedy, Ariel Hernandez-Leyva, Phillip I. Tarr, Barbara B. Warner and Gautam Dantas. This chapter is a manuscript currently under review. A.J.G. and G.D. conceived and designed the study. P.I.T. and B.B.W. assembled the cohorts, oversaw transfer of biologics and clinical metadata, and provided clinical insight. A.J.G. and B.W. extracted metagenomic DNA from stools and prepared shotgun metagenomic sequencing libraries. A.J.G., B.W., and A.H-L. performed stool culturing experiments and isolate genomic DNA extraction. A.J.G. and B.W. prepared isolate genome sequencing libraries. E.A.K. performed Enterococcus phenotyping experiments. X.S. created functional metagenomic libraries, performed functional selections, and prepared functional metagenomic sequencing libraries. A.J.G. analyzed clinical metadata, shotgun metagenomic sequencing data, isolate genome sequencing data, and functional metagenomic data. A.J.G. wrote the manuscript with input from all coauthors.

This work is supported in part by awards to G.D. through the National Institute of General Medical Sciences of the National Institutes of Health (R01 GM099538), the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (R01 AI123394), and the US Centers for Disease Control and Prevention (200-2016-91955); to P.I.T. through the National Institutes of Health (5P30 DK052574 [Biobank, DDRCC]); and to G.D., P.I.T., and B.B.W. through the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health (R01 HD092414). A.J.G. received support from a NIGMS training grant through award number T32 GM007067 (Jim Skeath, Principal Investigator) and from the NIDDK Pediatric Gastroenterology Research Training Program under award number T32 DK077653 (P.I.T., Principal Investigator). Assembled functional metagenomic contigs, shotgun metagenomic reads, and genome assemblies have been deposited to NCBI GenBank and SRA under BioProject ID PRJNA489090.

2.6 Supplementary Figures

Supplementary Fig. 2.1. Rarefaction curve of species identified by MetaPhlAn2 by sequencing depth. Rarefaction curve for the number of unique species identified at each subsampled sequencing depth. Black bar indicates median sequencing depth for all samples, gray shading indicates one standard deviation.

Supplementary Fig. 2.2. Temporal changes in relative abundance of top eight classes by group. The changes in relative abundance of the eight most abundant classes of bacteria in the gut microbiota of all infants over time. Curves are loess regression fit to the data. Shading indicates 95% confidence interval.

[Figure: sampling and antibiotic treatment timeline, one row per individual. x-axis: Day of Life (DOL), 0 to 600. Legend (Antibiotic): Ampicillin, Cefotaxime, Gentamicin, Meropenem, Other, Ticarcillin/Clavulanate, Vancomycin.]

Supplementary Fig. 2.3. Timeline of samples and antibiotic treatments for infants in this study. Sampling timeline for near term (top) and preterm (bottom) infants included in this study. Periods of antibiotic treatment are indicated by colored bars.

Supplementary Fig. 2.4. Functional selections of 22 metagenomic libraries constructed from infant stool. Heatmap of resistance phenotypes observed in functional metagenomic selections. Library size and source (term or preterm infant) is indicated to the left of heatmap.

Supplementary Fig. 2.5. Synteny of functionally selected ARGs with mobile genetic elements. Putative mobile genetic elements were identified syntenic to ARGs recovered from tetracycline (A), gentamicin (B), chloramphenicol (C), cefoxitin (D), and piperacillin (E) selections. (F) Mobile genetic elements were most commonly identified in tetracycline selections.

Supplementary Fig. 2.6. Microbiota with enriched resistome are low in diversity and dominated by Enterobacteriaceae. (A) There is a negative correlation between cumulative relative resistome and alpha diversity (R² = 0.092). (B) Composition of the microbiota in samples with a resistome relative abundance > 5000 RPKM. Samples are low in diversity and dominated by a few organisms. In 41 out of 54 samples (76%), a single species comprises >50% of the microbiota. These organisms are most commonly E. coli, Enterococcus faecalis, Klebsiella pneumoniae, Staphylococcus epidermidis, Enterobacter aerogenes, and Bifidobacterium breve.

Supplementary Fig. 2.7. Genomic and phenotypic heterogeneity in Enterococcus isolated from infant stool. (A) Maximum likelihood core genome phylogeny of Enterococcus strains isolated from infant gut microbiota. Persistent strains are highlighted in red. Annotations indicate time point and species. (B) Biofilm formation phenotypes of E. faecalis and E. faecium strains at 37˚C and at room temperature. (C) Viability of Enterococcus strains in biofilms following vancomycin treatment.

Supplementary Table 2.1. Assemblies used as outgroups for maximum likelihood phylogenies

Species               Count  Outgroup to include                                          Outgroup assembly accession
Escherichia coli      30     Escherichia fergusonii ATCC 35469                            ASM2622v1
Enterococcus spp.     19     Lactococcus garvieae ATCC 49156                              ASM26992v1
Enterobacter cloacae  11     Enterobacter asburiae ATCC 35953                             ASM152171v1
Klebsiella spp.       21     Salmonella enterica subsp. enterica serovar Typhi str. CT18  ASM19599v1

Supplementary Table 2.2. Antibiotic concentrations used for functional selections

Antibiotic                 Abbreviation  Screening concentration (µg/ml)
Amoxicillin                AX            16
Amoxicillin + Clavulanate  AXCL          16-8
Ampicillin                 AP            64
Aztreonam                  AZ            8
Cefepime                   CP            8
Cefoxitin                  CX            64
Ceftazidime                CZ            16
Chloramphenicol            CH            8
Ciprofloxacin              CI            0.5
Colistin                   COL           8
Gentamicin                 GE            16
Meropenem                  ME            16
Penicillin G               PE            128
Piperacillin               PI            16
Tetracycline               TE            8
Tigecycline                TG            2

Chapter 3: Tetracycline resistance by inactivation across environmental, commensal, and pathogenic microbes

3.1 Introduction

3.1.1 Abstract

Many antibiotic resistance genes have their evolutionary origins in benign environmental microbes. Identification of environmental antibiotic resistance genes that are not yet widespread in the clinical setting can allow for the proactive mitigation of the threat posed by these genes. Tetracycline resistance by antibiotic inactivation (e.g. by Tet(X)) has been identified in numerous environmental microbes, but occurrences in pathogens or clinical isolates have thus far been limited. Here, we identify putative genes encoding tetracycline resistance by inactivation across diverse human-associated and environmental metagenomes by antibiotic selection of metagenomic libraries. We find that identified genes formed two distinct clades by microbial habitat of origin, and that resistance phenotypes were similarly correlated. Each of the genes isolated from the human gut encodes resistance to next-generation tetracycline antibiotics including eravacycline and omadacycline (recently approved for human use). We report a comprehensive biochemical and structural characterization of one of these enzymes, TE_7F_3. Further, we identify TE_7F_3 in a clinical Pseudomonas aeruginosa isolate, indicating that this family of genes has already disseminated to pathogens of serious clinical concern. Lastly, our results highlight the need for further surveillance of this family of emerging resistance determinants and provide the foundation for an inhibition-based strategy for countering this resistance.

3.1.2 Introduction

Benign environmental microbes are ancient and diverse reservoirs of antibiotic resistance genes, and are the likely evolutionary progenitors of most clinical resistance15,20,25,58. Non-pathogenic bacteria from habitats as varied as soils and healthy human guts have recently exchanged multidrug-resistance cassettes with globally distributed human pathogens4. With widespread antibiotic use in the clinical and agricultural settings, there is strong selective pressure for pathogens to acquire resistance genes with novel activities and substrate specificities from environmental resistomes93,367. The discovery and characterization of environmental resistance genes with novel activities before they are acquired by and widely disseminated into pathogens can help lessen their potential clinical impact, as well as improve fundamental understanding of the ecology, evolution, and transmission of resistance genes across habitats25,94. Further, understanding the evolutionary origins, genetic contexts, and molecular mechanisms of antibiotic resistance is critical to devising strategies to curb the spread of resistant organisms and their antibiotic resistance genes, and for sustainable development of new strategies for countering infection.

Since their discovery from extracts of Streptomyces aureofaciens in 1948, the tetracyclines have become one of the most widely used classes of antibiotics in agriculture, aquaculture, and the clinic due to their broad antimicrobial spectrum, oral availability, and low cost368,369. Extensive use over the past seven decades has selected for the expansion of tetracycline resistance in environmental microorganisms21, human and animal commensals370, and among bacterial pathogens371. Tetracycline use is particularly prevalent in agriculture, with tetracyclines comprising 66% of total therapeutic antibiotic use in livestock372. Widespread anthropogenic use has resulted in detectable ng/µL to µg/L quantities of tetracyclines in livestock manure and wastewater373, with tetracycline concentration directly correlating with changes in microbial community composition and increase in antibiotic resistance374. In human pathogens, tetracycline resistance is typically acquired via horizontal gene transfer and occurs almost exclusively by ribosomal protection or antibiotic efflux368,371. This distinguishes tetracycline resistance from the primary enzymatic inactivation mechanisms observed against other natural-product antibiotics26,375. Both common tetracycline resistance mechanisms have their evolutionary origins in the environment, but are now found widely distributed in many commensal and pathogenic bacteria25,94,376.

The rise in tetracycline resistance has been partially countered by the development of synthetic and semisynthetic next-generation tetracyclines368. The first generation natural product antibiotics, including chlortetracycline (1948), oxytetracycline (1951), and tetracycline (1953), were followed by the semisynthetic second generation tetracyclines, including doxycycline (1967) and minocycline (1971), and third generation tetracyclines such as tigecycline (2005), eravacycline (2018) and omadacycline (2018)377. As a result, the tetracyclines are still widely used in agriculture and medicine378. Tetracyclines remain the frontline therapies for a range of infections, including those caused by Rickettsia, Mycoplasma pneumoniae, Borrelia recurrentis, Yersinia pestis, Vibrio cholerae, Clostridium species, Chlamydia species, Neisseria species, and Bacillus anthracis369,379. Next-generation tetracyclines represent a promising option for treating recalcitrant infections caused by carbapenem non-susceptible Acinetobacter baumannii380, extended spectrum β-lactamase or carbapenemase producing Enterobacteriaceae381, ventilator associated pneumonia382, and complicated urinary tract infections and complicated intra-abdominal infections due to a range of bacterial pathogens383.

Regrettably, modification of tetracycline’s naphthacene scaffold is not a sustainable strategy for countering resistance, as clinical resistance arises soon after, or even prior to, the approval of next-generation antibiotics for clinical use. For example, while tigecycline is unaffected by efflux and ribosomal protection139,384, it is vulnerable to oxidative inactivation by Tet(X)385. Until recently, Tet(X) was the only known enzyme capable of providing both tetracycline and tigecycline resistance, and it had been identified exclusively in non-pathogenic bacteria. In 2013, however, tet(X) was identified in pathogens for the first time. Eleven isolates from the Enterobacteriaceae, Comamonadaceae, and Pseudomonadaceae families from clinical urine specimens in a hospital in Sierra Leone tested positive for tet(X)138, and tet(X) has now been reported in four out of six ESKAPE pathogens137,138,386. The discovery of a tetracycline-inactivating gene in nosocomial pathogens foreshadows the increasing clinical resistance to next-generation tetracycline antibiotics and emphasizes the immediate importance of studying this emerging mode of tetracycline resistance by enzymatic inactivation.

In 2015, we reported a family of tetracycline inactivating enzymes from tetracycline selections of soil metagenomes and a previously uncharacterized homolog from the human pathogen Legionella longbeachae29. These genes were annotated as FAD-dependent oxidoreductases and could not be identified as resistance genes based on sequence alone. Our in vitro characterizations indicated that they were mechanistically distinct from Tet(X)387, consistent with their low amino acid identity to all known tet(X) genes388. Flavoenzymes are an abundant and diverse enzyme family, and display a proclivity for horizontal gene transfer (HGT) and gene duplication, allowing them to spread between species and acquire novel functionality with relative ease389. Thus, these genes are likely candidates for dissemination to the clinic, potentially compromising new tetracycline derivatives and motivating surveillance of the prevalence and abundance of this gene family across microbial habitats.

Understanding the current prevalence and distribution of resistance mechanisms across habitats and anticipating when, where, and how further resistance might spread or evolve upon imposition of antibiotic selective pressure is an important component of addressing antibiotic resistance. To this end, we sought to characterize the tetracycline resistome across diverse environmental and human-associated metagenomes. We find that genes that encode tetracycline-inactivating enzymes are widespread in diverse microbial communities but cluster by habitat of origin, and that resistance phenotypes also correlate with microbial habitat. Furthermore, we describe TE_7F_3, a tetracycline resistance gene recovered from a human gut metagenome that confers resistance to omadacycline and eravacycline, recently approved next-generation tetracyclines. We identify TE_7F_3 in a clinical isolate of Pseudomonas aeruginosa from a cystic fibrosis patient in a tertiary care hospital in Pakistan, signaling the clinical arrival of tetracycline resistance by inactivation, but show that inhibition of this enzyme with anhydrotetracycline or analogues can prevent tetracycline degradation and rescue antibiotic efficacy.

3.2 Results

3.2.1 Ecology of tetracycline inactivation across habitats

In order to understand better the prevalence of tetracycline inactivation as a resistance mechanism across habitats, we conducted a retrospective screen of our 744 Gb of functional metagenomic libraries that had previously been selected on tetracycline and tigecycline3,4,7,23,58,103 [GenBank accession numbers: JX009202-JX009380, KJ691878-KJ696532, KU605810-KU608292, KF626669-KF630360, KX125104-KX128902, KU543693-KU549046]. In addition to the 10 tetracycline inactivating genes from soil metagenomes that had previously been described29, we discovered an additional 69 potential tetracycline inactivating enzymes based on growth in the presence of a tetracycline or tigecycline selective agent and annotation as an FAD-dependent oxidoreductase (Fig. 3.1a). We found that these genes formed two distinct clades that were correlated with habitat of origin (Fig. 3.1b). There was greater sequence diversity within the soil-derived subset (mean ± s.d. percent amino acid identity 66.48% ± 9.37%) compared to the gut-derived subset (mean ± s.d. percent amino acid identity 91.42% ± 4.72%), but very low identity between the two habitats (mean ± s.d. percent amino acid identity 20.81% ± 1.45%, Fig. 3.1c).

Figure 3.1: Diversity in tetracycline inactivating enzyme sequence corresponds to microbial habitat of origin. (a) Seventy tetracycline inactivators were identified from tetracycline and tigecycline selections of diverse metagenomic libraries. Genes selected for further analysis are indicated by gray dots at branch tips. (b) A phylogeny of the 70 genes selected for further analysis reveals that genes cluster by microbial habitat of origin. Genes from soil metagenomes are shown with red branches, and genes from gut metagenomes are shown with blue branches. (c) Percent amino acid identity heatmap indicates greater relative sequence diversity in the soil derived set compared to the gut derived set, but low identity between the two sets.
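The within-habitat and between-habitat identity summaries reported in the text above can be illustrated with a short sketch. It assumes the sequences have already been aligned to equal length and counts identity over columns where neither sequence has a gap; that denominator convention, the function names, and the toy sequences are assumptions of this example, not a description of the tool actually used.

```python
from itertools import combinations, product
from statistics import mean, stdev


def percent_identity(seq_a, seq_b):
    """Percent identical residues over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize(pairs):
    """Mean and sample standard deviation of identity over sequence pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def within_group(seqs):
    """Summary over all unordered pairs drawn from one habitat."""
    return summarize(combinations(seqs, 2))


def between_groups(seqs_a, seqs_b):
    """Summary over all pairs with one sequence from each habitat."""
    return summarize(product(seqs_a, seqs_b))


# Toy aligned sequences, for illustration only.
gut = ["MKTA", "MKTA", "MKSA"]
soil = ["MRSL"]
print(within_group(gut), between_groups(gut, soil))
```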

We selected ten genes from this set (nine human gut derived and one soil derived) for further analysis and cloned them into E. coli DH10β. We performed antimicrobial susceptibility testing using the broth microdilution method for these constructs as well as the ten previously reported tetracycline inactivators29 against 11 tetracycline compounds as well as anhydrotetracycline. All but one of these computationally predicted tetracycline inactivators had a bona fide resistance function when subcloned into E. coli. We found that resistance phenotypes, like genotypes, were highly correlated with habitat of origin (Fig. 3.2). While all gut-derived tetracycline inactivators were capable of conferring high-level resistance to tigecycline, minocycline, eravacycline, and omadacycline, E. coli expressing soil-derived enzymes were all susceptible or intermediate to these drugs. Thus, resistance to next-generation tetracyclines separates the soil-derived and gut-derived genes functionally and discriminates between habitats of origin.

3.2.2 TE_7F_3 confers resistance to next-generation tetracyclines via inactivation

Each of the gut-derived tetracycline inactivators encoded resistance to every tetracycline currently approved for use in the clinic, including minocycline, tigecycline, eravacycline, and omadacycline. Tigecycline was approved for human use in 2005, and is an antibiotic of last resort for recalcitrant infections including those caused by carbapenem resistant Enterobacteriaceae134,381. Eravacycline was approved in August 2018 for treatment of complicated intra-abdominal infections, and omadacycline was approved in October 2018 for treatment of community acquired bacterial pneumonia and acute skin and skin structure infections. Reports of resistance to eravacycline and omadacycline have to date been limited. Eravacycline resistance in Klebsiella pneumoniae has been attributed to overexpression of the OqxAB and MacAB efflux pumps390, and nonsusceptibility (MIC 4 µg/mL) has been observed in Escherichia coli DH10β overexpressing tet(X). The recent clinical approval of these drugs and the limited reports of resistance motivate further study of this mode of resistance. To understand better the mechanistic basis of resistance to next-generation tetracyclines by these resistance determinants, we selected one of these genes, TE_7F_3, for further analysis. We found that media conditioned by E. coli expressing TE_7F_3 could support the growth of a susceptible E. coli strain, indicating that the mechanism of resistance is drug inactivation.

Figure 3.2: Phenotypic profiles of heterologously expressed tetracycline inactivating enzymes correlate with habitat of origin. Antimicrobial susceptibility testing was performed by microbroth dilution on a subset of predicted tetracycline inactivators. We observed broad high-level resistance to tetracyclines within this class. Resistance phenotypes clustered by microbial habitat of origin, with resistance to next-generation tetracyclines (e.g. minocycline, tigecycline, eravacycline, omadacycline) discriminating between habitats.

In vitro reactions containing purified TE_7F_3, NADPH, and tigecycline, eravacycline, or omadacycline exhibited a time dependent decrease in absorbance between 360 nm and 400 nm (Supplementary Fig. 3.1a-c). This is consistent with enzymatic disruption of tetracycline’s β-diketone chromophore, as has previously been described for enzymatic modification of tetracycline29. To understand the biochemical basis of next-generation tetracycline (Fig. 3.3d) modification by TE_7F_3, we determined kinetic parameters of enzymatic activity by monitoring in vitro reaction progress using absorbance at 400 nm. For comparison, we determined kinetic parameters of Tet(X) mediated inactivation of tigecycline and eravacycline in a similar manner. The catalytic efficiency of TE_7F_3 was five times greater than that of Tet(X) for degradation of eravacycline (kcat/KM values of 0.068 ± 0.017 and 0.013 ± 0.002 µM⁻¹ min⁻¹, respectively), and eight times greater for degradation of tigecycline (kcat/KM values of 0.072 ± 0.009 and 0.009 ± 0.001 µM⁻¹ min⁻¹, respectively; Fig. 3.3a-b). The difference in catalytic efficiency is largely mediated by more rapid substrate turnover by TE_7F_3 compared to Tet(X), as KM values are of similar magnitude across all pairwise enzyme-substrate combinations, but TE_7F_3 kcat values are an order of magnitude greater compared to Tet(X) (Fig. 3.3c). When enzymatic reactions were analyzed by liquid chromatography-mass spectrometry, we observed that the primary product of omadacycline, tigecycline, and eravacycline degradation is consistent with monohydroxylation (Supplementary Fig. 3.1d-f). For each next-generation tetracycline substrate, there was a time dependent decrease in the relative extracted ion counts of the tetracycline substrate (m/z for [M+H]+ equal to 586, 557, 559 for tigecycline, eravacycline, and omadacycline, respectively), and a corresponding increase in the relative extracted ion counts of the monooxygenated product (m/z for [M+H]+ equal to 602, 573, 575 for tigecycline, eravacycline, and omadacycline, respectively).

Figure 3.3: Biochemical basis for tetracycline inactivation by TE_7F_3. Michaelis-Menten kinetics of tetracycline destructase-mediated degradation of next-generation tetracyclines. (a) Michaelis-Menten plot of Tet(X) degradation of tigecycline and eravacycline. (b) Michaelis-Menten plot of TE_7F_3 (Tet(3)) degradation of next-generation tetracyclines. (c) Apparent KM, kcat, and catalytic efficiencies for the tetracycline destructase-mediated degradation of tigecycline, eravacycline, and omadacycline. (d) Next-generation tetracycline antibiotics tigecycline, eravacycline, and omadacycline. Error bars represent standard deviation for three independent trials. All data points possess error bars, though some are not visible at the plotted scale.
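Two arithmetic checks follow directly from the kinetic and mass spectrometry values reported in the text above: the ratio of catalytic efficiencies between the two enzymes, and the m/z difference between each substrate and its product. The sketch below uses only the central values quoted in the text (uncertainties are ignored) and hypothetical helper names.

```python
# kcat/KM central values (µM^-1 min^-1) copied from the text above.
efficiency = {
    "eravacycline": {"TE_7F_3": 0.068, "Tet(X)": 0.013},
    "tigecycline": {"TE_7F_3": 0.072, "Tet(X)": 0.009},
}

# [M+H]+ m/z values as (substrate, product), copied from the text above.
mz = {
    "tigecycline": (586, 602),
    "eravacycline": (557, 573),
    "omadacycline": (559, 575),
}


def fold_difference(values):
    """Ratio of TE_7F_3 to Tet(X) catalytic efficiency for one substrate."""
    return values["TE_7F_3"] / values["Tet(X)"]


def mass_shift(pair):
    """Nominal m/z gained going from substrate to product."""
    substrate, product = pair
    return product - substrate


for drug, values in efficiency.items():
    print(f"{drug}: {fold_difference(values):.1f}-fold higher kcat/KM")
for drug, pair in mz.items():
    print(f"{drug}: +{mass_shift(pair)} m/z")
```

The ratios come out at roughly five and eight, matching the fold differences stated in the text, and every substrate gains 16 m/z units, the nominal mass of one oxygen atom, which is what monohydroxylation would produce.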

3.2.3 TE_7F_3 confers resistance in clinical P. aeruginosa isolate

While functional metagenomics is a powerful method for identifying resistance genes in a sequence and culture unbiased manner, it is poorly equipped to associate specific resistance determinants with their native hosts. Thus, we sought to identify homologues of our set of functionally validated resistance genes in sequenced clinical isolates. We identified Pa-3, a Pseudomonas aeruginosa strain isolated from a 45-year-old male cystic fibrosis patient’s tracheal aspirate in an ICU in a tertiary care hospital in Pakistan in December 2016. This strain was predicted to encode a 97% ID tet(X) homologue by BLAST, which we identified as having 100% nucleotide identity to TE_7F_3. We conducted antimicrobial susceptibility testing on this strain using disk diffusion for a panel of clinical antibiotics. This isolate was resistant to all antibiotics tested with the exception of piperacillin/tazobactam, a β-lactam/β-lactamase inhibitor combination, and colistin (Table 3.1). The isolate was positive by carbapenem inactivation assay, indicating that it encodes a functional carbapenemase. Multidrug resistant P. aeruginosa are designated a serious clinical threat by the CDC, and acquisition of resistance to next-generation tetracyclines would further exacerbate this hazard.

Table 3.1: Zone of clearance (mm) and phenotypic resistance determination for clinical Pseudomonas aeruginosa isolate bearing TE_7F_3. Blank interpretation indicates that no interpretive criteria have been established.

Antibiotic                     Kirby-Bauer (mm)  Interpretation
Delafloxacin                   6                 Resistant
Ceftazidime                    6                 Resistant
Cefepime                       6                 Resistant
Meropenem                      6                 Resistant
Imipenem                       6                 Resistant
Piperacillin/Tazobactam        24                Sensitive
Ceftolozane/Tazobactam         6                 Resistant
Ceftazidime/Avibactam          6                 Resistant
Ciprofloxacin                  6                 Resistant
Levofloxacin                   6                 Resistant
Gentamicin                     6                 Resistant
Amikacin                       6                 Resistant
Trimethoprim/Sulfamethoxazole  6
Fosfomycin                     12                Resistant
Colistin                       13                Sensitive
Aztreonam                      17                Intermediate
Doxycycline                    6
Minocycline                    6
Tigecycline                    6
Nitrofurantoin                 6
Carbapenem inactivation assay  14                Positive

In addition to antimicrobial susceptibility testing by disk diffusion, we performed broth microdilution for a subset of tetracycline antibiotics for which disks are not yet commercially available against Pa-3 (the strain encoding TE_7F_3), Pa-8 (another clinical isolate from the same collection which did not encode TE_7F_3), and the P. aeruginosa type strain ATCC 27853. Each P. aeruginosa isolate phenotyped displayed at least moderate resistance to tigecycline, eravacycline, and omadacycline, even in the absence of a putative tetracycline inactivator. This is likely due to drug efflux, as low-level OprM mediated tigecycline resistance in P. aeruginosa PAO1 has been previously reported391. Although tigecycline, eravacycline, and omadacycline breakpoints do not yet exist for P. aeruginosa, Pa-3 was 4-8 fold more resistant to these drugs than Pa-8 and ATCC 27853. To our knowledge, this is the first description of resistance to either eravacycline or omadacycline via enzymatic inactivation in a clinical isolate, although we anticipate clinical deployment of these antibiotics will lead to selection for and expansion of this mode of resistance in both the hospital and community settings. Media conditioned by Pa-3 could support the growth of susceptible E. coli, while media conditioned by ATCC 27853 or Pa-8 could not, confirming that the mechanism of resistance in Pa-3 but not Pa-8 or ATCC 27853 involves antibiotic inactivation. The resistance phenotype was not transferable to wild type E. coli via transformation of bulk plasmid purification from Pa-3, so it is unlikely that TE_7F_3 is plasmidic. Nonetheless, clinical use of next generation tetracyclines may select for dissemination of this strain. Furthermore, Pseudomonas has a highly plastic genome and undergoes horizontal gene transfer at rates greater than observed for other genera392, so even chromosomally encoded genes are possible threats for transfer beyond this strain. Lastly, we observed that TE_7F_3 was syntenic with putative rolling circle transposases (Pfam:PF04986 per Prodigal annotation, see Methods), suggesting that it may be poised for horizontal transfer4,393.

3.2.4 Anhydrotetracycline analogues rescue tetracycline efficacy against resistant strains

We have previously shown that anhydrotetracycline is an inhibitor of tetracycline inactivating enzymes in vitro, and that this inhibition is sufficient to rescue tetracycline efficacy against E. coli resistant via heterologous expression of a tetracycline inactivator294. We reasoned that this strategy might likewise prove useful to prevent degradation of next-generation tetracyclines by TE_7F_3. To this end, we conducted in vitro inhibition assays with purified TE_7F_3, all necessary cofactors, a next-generation tetracycline substrate, and an anhydrotetracycline inhibitor (see Methods). Reaction progress was monitored by measuring absorbance at 400 nm. In addition to parent anhydrotetracycline, we conducted assays using anhydrodemeclocycline as an inhibitor. We found that anhydrotetracycline inhibited TE_7F_3 degradation of tigecycline, eravacycline, and omadacycline at the micromolar level (IC50s of 1.06 ± 0.108 µM, 6.89 ± 0.655 µM, and 2.37 ± 0.510 µM, respectively). Anhydrodemeclocycline was an even more potent inhibitor of TE_7F_3 degradation of these next-generation substrates, with IC50s of 0.262 ± 0.035 µM, 2.75 ± 0.256 µM, and 0.309 ± 0.036 µM, respectively (Fig. 3.4a-c). We reason based on the predicted structural similarity to previously published cocrystal structures of Tet(50) (PDB ID: 5TUF) with anhydrotetracycline that the mechanism of inhibition likely occurs similarly294. Specifically, anhydrotetracycline or analogue binds slightly offset from the active site and inhibits degradation of tetracycline or analogue by both competing for substrate binding and sterically restricting dynamics of the FAD cofactor critical for catalysis.

Figure 3.4: Anhydrotetracycline inhibits inactivation of next-generation tetracyclines by TE_7F_3. Anhydrotetracycline and analogues inhibit enzymatic inactivation of tetracycline and next-generation tetracyclines. (a) In vitro, anhydrotetracycline, anhydrochlortetracycline, and anhydrodemeclocycline inhibit Tet(X) modification of tigecycline. Likewise, anhydrotetracycline and anhydrodemeclocycline inhibit in vitro enzymatic modification of tigecycline (b), eravacycline (c), and omadacycline (d) by TE_7F_3. (e) Anhydrodemeclocycline at subinhibitory concentrations can partially rescue tetracycline, tigecycline, eravacycline, and omadacycline efficacy against Pseudomonas aeruginosa Pa-3 and E. coli DH10β heterologously expressing TE_7F_3.

To extend our in vitro studies, we sought to determine whether anhydrotetracycline or anhydrodemeclocycline could restore the activity of next-generation tetracyclines against Pa-3, the clinical P. aeruginosa isolate that encodes TE_7F_3. Anhydrotetracycline at 32 µg/mL caused a two- to four-fold increase in sensitivity of Pa-3 to tigecycline, eravacycline, or omadacycline (Fig. 3.4e). In the same way, 8 µg/mL anhydrodemeclocycline was sufficient to cause a fourfold increase in sensitivity to tetracycline and a twofold increase in sensitivity to tigecycline, eravacycline, or omadacycline. Thus, this strategy has promise for restoring the efficacy of tetracycline antibiotics, including next-generation tetracyclines, against bacteria that resist tetracycline through inactivation.

3.3 Materials and Methods

3.3.1 Identification of candidate tetracycline inactivators in sequenced functional selections

In order to identify additional tetracycline inactivators, we queried previously sequenced functional selections of human gut, animal gut, latrine, and soil metagenomes3,4,7,23,58,103 [GenBank accession numbers: JX009202-JX009380, KJ691878-KJ696532, KU605810-KU608292, KF626669-KF630360, KX125104-KX128902, KU543693-KU549046]. Functionally selected contigs were annotated using Resfams122 as previously described3, and annotations were queried for putative tetracycline inactivating function based on string matches to one of the following keywords in Pfam and TIGRfam annotations: “FAD dependent oxidoreductase”, “oxidoreductase”, “FAD binding domain”.
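The keyword query described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the input layout (a dictionary mapping open reading frame identifiers to annotation descriptions), the case-insensitive matching, and all function names are assumptions made for illustration. Only the three keywords come from the text.

```python
# Keywords quoted in the text. Case-insensitive matching is an assumption.
KEYWORDS = (
    "FAD dependent oxidoreductase",
    "oxidoreductase",
    "FAD binding domain",
)


def is_candidate(description):
    """Return True if an annotation description contains any keyword."""
    text = description.lower()
    return any(keyword.lower() in text for keyword in KEYWORDS)


def find_candidates(annotations):
    """Return sorted ORF identifiers that have at least one matching annotation.

    `annotations` maps a (hypothetical) ORF identifier to a list of
    Pfam/TIGRfam description strings.
    """
    return sorted(
        orf
        for orf, descriptions in annotations.items()
        if any(is_candidate(d) for d in descriptions)
    )


# Hypothetical annotations, for illustration only.
example = {
    "orf_1": ["FAD binding domain"],
    "orf_2": ["ABC transporter"],
    "orf_3": ["Pyridine nucleotide-disulphide oxidoreductase"],
}
candidates = find_candidates(example)
```

Because "oxidoreductase" is a substring of the first keyword, the broad term alone determines most matches; the longer phrases matter only for annotations that lack that word.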

3.3.2 Construction of maximum likelihood phylogenies

A multiple sequence alignment of predicted tetracycline inactivating enzymes was created with MAFFT using the L-INS-i method394. A maximum likelihood phylogeny was then constructed using RAxML with the GAMMA model of rate heterogeneity and JTT empirical base frequencies with 100 bootstraps395. Trees were visualized and decorated with metadata using iTOL396.

3.3.3 Subcloning predicted tetracycline inactivators from metagenomic source

Specific predicted tetracycline inactivators were selected for subcloning for downstream phenotypic analysis if they met each of the following criteria: (1) the open reading frame contained a start codon; (2) the open reading frame contained a stop codon; and (3) the open reading frame was greater than 1000 base pairs but less than 1500 base pairs. Tet(47-56) and Tet(X) had previously been subcloned29. Additional predicted tetracycline inactivators were amplified from their initial metagenomic source using Phusion High-Fidelity Polymerase (ThermoFisher) and primers as specified in Supplemental Table 3.1. Amplicons were ligated into pZE21 at the KpnI/HindIII sites using Fast-Link DNA Ligase (Lucigen) and transformed into E. coli MegaX DH10β cells (Invitrogen) via electroporation. The orientation and sequence of all inserts were confirmed by Sanger sequencing prior to phenotypic analyses.
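The three selection criteria listed above can be expressed as a short filter. The following Python sketch is illustrative only: the set of accepted start codons is an assumption (the text does not say which start codons were allowed), and the function name and example sequences are hypothetical.

```python
# Assumption: common bacterial start codons. The text only says "a start codon".
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_criteria(orf):
    """Apply the three criteria: start codon, stop codon, 1000 < length < 1500 bp."""
    seq = orf.upper()
    has_start = seq[:3] in START_CODONS
    has_stop = seq[-3:] in STOP_CODONS
    right_length = 1000 < len(seq) < 1500
    return has_start and has_stop and right_length


# Hypothetical sequences, for illustration only.
full_length = "ATG" + "GCT" * 377 + "TAA"   # 1137 bp, complete ORF
no_start = "GCT" * 378 + "TAA"              # 1137 bp, missing start codon
too_short = "ATG" + "GCT" * 100 + "TAA"     # 306 bp
```

Under these assumptions, only `full_length` passes; the other two fail criterion (1) and criterion (3), respectively.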

3.3.4 Antimicrobial susceptibility testing by microbroth dilution

Antibiotic susceptibility testing was performed in E. coli MegaX cells (Invitrogen) bearing the pZE21 expression vector with the tetracycline inactivating gene of interest. Minimum inhibitory concentrations were determined according to Clinical and Laboratory Standards Institute (CLSI) procedures339 using Mueller-Hinton broth with 50 µg/mL kanamycin and a range of tetracycline antibiotic concentrations. Growth was profiled via absorbance measurements at 600 nm (OD600) at 45 minute intervals for 48 hours at 37ºC using the Synergy H1 microplate reader (Biotek Instruments, Inc), and minimum inhibitory concentrations were scored by eye following 20 hours of growth at 37ºC.

3.3.5 Pseudomonas aeruginosa genomic DNA isolation

1.5 mL TSB was inoculated from isolate glycerol stocks and grown overnight at 37˚C with shaking. DNA was extracted using the BiOstic Bacteremia DNA Isolation Kit (MoBio Laboratories) following manufacturer’s protocols. Genomic DNA was quantified using a Qubit fluorometer dsDNA BR Assay (Invitrogen) and stored at -20˚C.

3.3.6 Pseudomonas aeruginosa isolate sequencing library preparation

Genomic DNA was diluted to a concentration of 0.5 ng/µL prior to sequencing library preparation. Libraries were prepared using a Nextera DNA Library Prep Kit (Illumina) following the modifications described in Baym et al., 2015342. Libraries were purified using the Agencourt AMPure XP system (Beckman Coulter) and quantified using the Quant-iT PicoGreen dsDNA assay (Invitrogen). For each sequencing lane, 10 nM of approximately 300 samples were pooled three independent times. These pools were quantified using the Qubit dsDNA BR Assay and combined in an equimolar fashion. Samples were submitted for 2×150 bp paired-end sequencing on an Illumina NextSeq High-Output platform at the Center for Genome Sciences and Systems Biology at Washington University in St. Louis with a target sequencing depth of 1 million reads per sample.

3.3.7 Assembly and annotation of Pseudomonas aeruginosa genome

Prior to all downstream analysis, Illumina paired end reads were binned by index sequence. Adapter and index sequences were trimmed using Trimmomatic v0.36343 using the following parameters: java -Xms2048m -Xmx2048m -jar trimmomatic-0.33.jar PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true. Contaminating human reads were removed using DeconSeq349 and unpaired reads were discarded. Reads were assembled using SPAdes350 with the following parameters: spades.py -k 21,33,55,77 --careful. Contigs less than 500 bp were excluded from further analysis. Assembly quality was assessed using QUAST351. Average coverage across the assembly was calculated by mapping raw reads to contigs using bbmap (https://jgi.doe.gov/data-and-tools/bbtools/)397. Genomes were annotated using Prokka352 with default parameters.
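The contig length filter mentioned above (contigs shorter than 500 bp excluded) could be applied to an assembly FASTA file along the following lines. This is a minimal standard-library sketch rather than the script used in this work; the function names and the toy assembly are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {contig name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the identifier before any space
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_contigs(records, min_length=500):
    """Keep contigs of at least `min_length` bp (shorter contigs are excluded)."""
    return {n: s for n, s in records.items() if len(s) >= min_length}


# Toy assembly, for illustration only: 600 bp, 499 bp, and 500 bp contigs.
fasta = (
    ">contig_1\n" + "A" * 600 + "\n"
    ">contig_2\n" + "C" * 499 + "\n"
    ">contig_3\n" + "G" * 250 + "\n" + "T" * 250 + "\n"
)
kept = filter_contigs(parse_fasta(fasta))
```

Here the 499 bp contig is dropped, while the contig of exactly 500 bp is retained, matching the wording "less than 500 bp were excluded".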

3.3.8 Antimicrobial susceptibility testing for Pseudomonas aeruginosa

Susceptibility testing was performed using the Kirby-Bauer disk diffusion method on Mueller-Hinton agar (Hardy Diagnostics) in accordance with CLSI standards. Pseudomonas aeruginosa ATCC 27853 was used as a quality control.

3.3.9 Cloning, expression and purification of tetracycline-inactivating enzymes

All genes encoding tetracycline-inactivating enzymes were cloned into the pET28b(+) vector (Novagen) at BamHI and NdeI restriction sites. Constructs were transformed into BL21-Star (DE3) competent cells (Life Technologies). Cells harboring the plasmid were grown at 37ºC in LB medium containing a final concentration of 0.03 mg/mL kanamycin. Once cells reached an OD600 of 0.6, cells were cooled to 15 ºC and induced with 1 mM IPTG overnight. After this period, cells were harvested by centrifugation at 4000 rpm for 10 min at 4 ºC. Cell pellets were suspended in 10 mL of 50 mM Tris (pH 8.0), 100 mM NaCl, 10 mM imidazole (pH 8.0), 1 mM PMSF, and 5 mM BME per 1 liter of LB medium and stored at -80ºC.

Cells were thawed in the presence of 0.25 mg/mL lysozyme and disrupted using sonication on ice for 60 seconds. The cell extract was obtained by centrifugation at 13,000 rpm for 30 min at 4 ºC and was applied onto nickel rapid run agarose beads (Goldbio) equilibrated with wash buffer (50 mM Tris (pH 8.0), 150 mM NaCl, 20 mM imidazole (pH 8.0), and 5 mM BME). The wash buffer was used to wash the nickel column three times with five column volumes. After washing, protein was eluted with five column volumes of elution buffer (wash buffer with 300 mM imidazole). The protein sample was further purified by gel chromatography using a HiLoad 16/600 Superdex 200pg column (GE Healthcare) equilibrated with 10 mM Tris (pH 8.0), 150 mM NaCl, 5 mM dithiothreitol (DTT). The fractions containing the protein of interest were pooled and concentrated using a 30K MWCO Amicon centrifugal filter (Millipore).

3.3.10 In vitro tetracycline inactivation assays

Reactions were prepared in 100 mM TAPS buffer with 100 µM substrate, 14.4 µM enzyme, and an NADPH regenerating system consisting of the following components (final concentrations): glucose-6-phosphate (40 mM), NADP+ (4 mM), MgCl2 (1 mM), and glucose-6-phosphate dehydrogenase (4 U/mL). The regeneration system was incubated at 37ºC for 30 minutes to generate NADPH before use in reactions. Reactions were sampled at various timepoints, and quenched in four volumes of an acidic quencher consisting of equal parts acetonitrile and 0.25 M HCl.

Products generated from enzymatic inactivation of both tetracycline and chlortetracycline were separated by reverse phase HPLC using a Phenomenex Luna C18 column (5 µm, 110 Å, 2 × 50 mm) and 0.1% trifluoroacetic acid in water (A) and acetonitrile (B) as mobile phase. Injections of 25 µL sample volume were eluted using a linear gradient from 25% B to 75% B over 14 minutes at a flow rate of 1 mL/min.
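As a small worked illustration of the linear gradient described above, the mobile phase composition at any time during the run can be computed as follows. The function name is hypothetical, and holding the composition constant before time zero and after 14 minutes is an assumption, since the text does not describe those segments.

```python
def percent_b(t_min, start=25.0, end=75.0, duration=14.0):
    """Percent mobile phase B at time t (minutes) for a linear gradient.

    Defaults follow the text: 25% B to 75% B over 14 minutes.
    Assumption: composition is held at the endpoints outside the gradient window.
    """
    if t_min <= 0:
        return start
    if t_min >= duration:
        return end
    return start + (end - start) * t_min / duration


midpoint = percent_b(7.0)  # halfway through the gradient
```

At the midpoint of the run the composition is 50% B, and the slope of the gradient is (75 - 25) / 14, or roughly 3.6% B per minute.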

3.3.11 Kinetic characterization of tetracycline inactivation

The optimal enzyme concentration for steady-state kinetics assays was determined by varying the concentration of enzyme while keeping chlortetracycline and NADPH concentration constant. 0.4 μM enzyme was found to give linear slopes for all concentrations of substrate tested, and was used as the enzyme concentration for all kinetics experiments.

Reactions were prepared in 100 mM TAPS buffer at pH 8.5 with 0-160 µM substrate, 1.6 mM NADPH, and 0.4 µM enzyme. UV-visible spectroscopy measurements were taken in triplicate at 400 nm with a Cary 60 UV/Vis system (Agilent) for 10 minutes at room temperature. Initial reaction velocities were determined by linear regression using the Agilent Cary WinUV Software, and fitted to the Michaelis-Menten equation:

$$v_0 = \frac{V_{\max}[S]}{K_M + [S]}$$

using GraphPad Prism 6, where $v_0$ is the initial velocity, $[S]$ is the substrate concentration, $V_{\max}$ is the maximal velocity, and $K_M$ is the Michaelis constant.
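For readers without access to GraphPad Prism, the Michaelis-Menten parameters can be estimated with a few lines of Python. The sketch below deliberately swaps in a simpler technique, the Hanes-Woolf linearization ([S]/v0 plotted against [S], with slope 1/Vmax and intercept KM/Vmax), instead of the nonlinear regression performed by Prism. The data are synthetic and noise-free, and the parameter values are hypothetical; none of these numbers are results from this work.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, velocity):
    """Estimate (Vmax, KM) by least-squares regression of [S]/v0 on [S].

    Hanes-Woolf form: [S]/v0 = [S]/Vmax + KM/Vmax.
    Points with zero substrate or zero velocity are skipped.
    """
    points = [(s, s / v) for s, v in zip(substrate, velocity) if s > 0 and v > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic data (hypothetical Vmax and KM), spanning the 0-160 µM substrate range.
substrate = [5, 10, 20, 40, 80, 160]
true_vmax, true_km = 2.0, 30.0
velocity = [michaelis_menten(s, true_vmax, true_km) for s in substrate]
vmax_fit, km_fit = hanes_woolf_fit(substrate, velocity)
```

With real, noisy data a linearization weights errors differently from direct nonlinear fitting, so the two approaches can give somewhat different estimates; the sketch is meant only to make the model concrete.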

3.3.12 LC-MS characterization of chlortetracycline degradation products

Reactions were prepared in 100 mM phosphate buffer at pH 8.5 with 1 mM chlortetracycline (CTC), 0.5 mM NADPH, 5 mM MgCl2 and 0.4 µM Tet(55). After 10 minutes, the reaction was centrifuge filtered for 10 minutes using a Millipore Amicon Ultracel (3 kDa MW cutoff) to remove enzyme. Prior to centrifugation, filters were triply rinsed with phosphate buffer to remove excess glycerol. The filtrate was collected and analyzed by LC-MS using an Agilent 6130 single quadrupole instrument with G1313 autosampler, G1315 diode array detector, and 1200 series solvent module. Reaction products were separated on a Phenomenex Gemini C18 column, 50 × 2 mm (5 µm), with guard column cassette, using a linear gradient of 0% acetonitrile + 0.1% formic acid to 95% acetonitrile + 0.1% formic acid over 14 min at a flow rate of 0.5 mL/min prior to analysis by electrospray ionization.

3.3.13 In vitro characterization of anhydrotetracycline inhibition

IC50 values were determined for TE_7F_3 by measuring the initial velocity of tetracycline degradation in the presence of varying concentrations of anhydrotetracycline or anhydrodemeclocycline. The concentrations of tetracycline and NADPH were kept constant at 25 μM and 500 μM, respectively. Assays were prepared by combining all components except for enzyme and equilibrating to 25°C for five minutes. After the addition of enzyme, absorbance at 400 nm was measured for five minutes. All measurements were taken in triplicate. The final concentrations for assay components were 100 mM TAPS buffer (pH 8.5), 25 μM tetracycline, 500 μM NADPH, 16 mM MgCl2, 0.4 μM enzyme, and 0.05 – 150 μM inhibitor. IC50 values were determined by plotting the log of inhibitor concentration against v0 in GraphPad Prism 6.
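The idea of reading an IC50 from a plot of v0 against log inhibitor concentration can be sketched in Python as follows. This is a simplified stand-in, not the nonlinear dose-response fit performed in GraphPad Prism: it finds the two measured concentrations that bracket half of the uninhibited velocity and interpolates linearly on the log10 concentration axis. The function name, the simple inhibition model used to generate data, and all numbers are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, velocities, v_uninhibited):
    """Estimate IC50 by log-linear interpolation at half the uninhibited velocity."""
    half = 0.5 * v_uninhibited
    points = sorted(zip(concentrations, velocities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= half >= v_hi and v_lo != v_hi:
            frac = (v_lo - half) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    raise ValueError("half-maximal velocity is not bracketed by the data")


# Synthetic data from a simple model v = v_max / (1 + [I]/IC50),
# with a hypothetical IC50 of 1.0 µM and v_max of 1.0 (arbitrary units).
inhibitor_um = [0.05, 0.25, 0.5, 2.0, 4.0, 150.0]
v0 = [1.0 / (1.0 + c / 1.0) for c in inhibitor_um]
ic50_estimate = ic50_by_interpolation(inhibitor_um, v0, v_uninhibited=1.0)
```

Interpolation depends on how closely the measured concentrations bracket the midpoint, which is one reason a fitted dose-response curve with reported error, as used in this chapter, is preferable for real data.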

3.4 Discussion

Our results indicate that tetracycline inactivation is a resistance mechanism that is widespread in environmental and human associated metagenomes. We describe two distinct clades of tetracycline resistance genes, and demonstrate that genotypes and resistance phenotypes are correlated with microbial habitat of origin. Amongst the gut-derived enzymes reported herein, we also provide the first report of resistance by inactivation to eravacycline and omadacycline (recently approved for human use) in a clinical isolate. We identify TE_7F_3 in a clinical isolate of Pseudomonas aeruginosa, indicating that this mechanism of resistance has already disseminated to the clinical setting. Lastly, we show that inhibition of TE_7F_3 by anhydrotetracycline and derivatives may be a promising strategy for restoring efficacy of next-generation tetracyclines in the face of high-level tetracycline resistance. The results presented herein motivate continued surveillance of tetracycline resistance by inactivation in environmental and clinical settings. Similarly, this work underscores the value of proactive screening of antimicrobial agents in the pipeline for resistance determinants present in diverse microbial communities such that strategies to minimize their eventual clinical impact can be explored.

3.5 Acknowledgments

This work was done in collaboration with Jana L. Markley, Hirdesh Kumar, Bin Wang, Sidra Irum, Meghan Wallace, Carey-Ann D. Burnham, Saadia Andleeb, Niraj H. Tolia, Timothy A. Wencewicz and Gautam Dantas. This chapter is a manuscript currently under preparation.

A.J.G., G.D., T.A.W., and N.H.T. conceived and designed the study. A.J.G. performed bioinformatics analyses. A.J.G. and B.W. subcloned tetracycline inactivators and performed susceptibility testing by broth microdilution. J.L.M. performed biochemical assays. H.K. crystallized TE_7F_3, collected data, and solved structure. S.I. and S.A. collected clinical sample and cultured P. aeruginosa clinical isolate. M.W. and C-A.D.B. performed P. aeruginosa phenotyping by disk diffusion. A.J.G. wrote the manuscript with input from all authors.

This work is supported in part by awards to G.D., N.H.T., and T.A.W. through the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (R01 AI123394). A.J.G. received support from a NIGMS training grant through award number T32 GM007067 (Jim Skeath, Principal Investigator) and from the NIDDK Pediatric Gastroenterology Research Training Program under award number T32 DK077653 (Phillip Tarr, Principal Investigator).


3.6 Supplementary Figures

Supplementary Figure 3.1: TE_7F_3 inactivates next-generation tetracyclines by monooxygenation. In vitro reactions containing purified TE_7F_3 (Tet(3)), NADPH, and tigecycline (a), omadacycline (b), or eravacycline (c) were assayed by absorbance scans taken at one minute intervals. The rainbow pattern depicts a spectral change over time. Decrease in absorbance over time from 360 nm to 400 nm indicates enzymatic disruption of the characteristic tetracycline β-diketone chromophore. In the case of tigecycline (d), omadacycline (e), and eravacycline (f), the product of enzymatic modification is consistent with monooxygenation. Plots show the relative extracted ion counts of the parent tetracycline (blue curves; m/z for [M+H]+ equal to 586, 557, 559, respectively) or the monooxygenated product (orange curves; m/z for [M+H]+ equal to 602, 573, 575, respectively).
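The assignment of the products as monooxygenated species rests on a simple mass argument: addition of one oxygen atom raises the nominal m/z of the [M+H]+ ion by 16. The short Python check below restates that arithmetic using only the m/z values given in the legend above; the variable and function names are hypothetical.

```python
OXYGEN_NOMINAL_MASS = 16  # nominal mass of one oxygen atom

# [M+H]+ values as listed in the legend of Supplementary Figure 3.1.
PARENT_MZ = {"tigecycline": 586, "omadacycline": 557, "eravacycline": 559}
PRODUCT_MZ = {"tigecycline": 602, "omadacycline": 573, "eravacycline": 575}


def mass_shifts(parent, product):
    """Return the product-minus-parent m/z difference for each drug."""
    return {drug: product[drug] - parent[drug] for drug in parent}


shifts = mass_shifts(PARENT_MZ, PRODUCT_MZ)
consistent_with_monooxygenation = all(
    shift == OXYGEN_NOMINAL_MASS for shift in shifts.values()
)
```

All three substrates show the same +16 shift, which is what the legend means by a product "consistent with monooxygenation".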


Supplementary Table 3.1: Primers used in this study

Primer sequence (5' → 3')           Designed specificity

ATGACTTTGCTAAAAAATAAAAAAATTA        TE_7F_Contig_3 (452_1588)
TTATAGATTCATTAGTTTTTGGAATGA         TE_7F_Contig_3 (452_1588)
ATGAATTTACTAAACAATAAAAAAGTTACA      TG_0402_Contig_27 (425_1561)
TTATAGATTCATTAGTTTTTGGAATGA         TG_0402_Contig_27 (425_1561)
ATGACAATGCGAATAGATACAGACA           TE_0402_Contig_1789 (2156_3322)
TTATACATTTAACAATTGCTGAAACG          TE_0402_Contig_1789 (2156_3322)
ATGACTTTGCTAAAAAATAAAAAAATT         TE_0402_Contig_1211 (2189_3325)
TTATAGATTCATTAGTTTTTGGAATGA         TE_0402_Contig_1211 (2189_3325)
ATGACAATGCGAATAGATACAGACA           TG_0402_Contig_45 (861_2027)
TTATACATTTAACAATTGCTGAAACG          TG_0402_Contig_45 (861_2027)
ATGACAATGCGAATAGATACAGACA           TG_0401_3a_Contig_236 (181_1347)
TTATACATTTAACAATTGCTGAAACG          TG_0401_3a_Contig_236 (181_1347)
ATGACTTTACTAAAACATAAAAAAATTACA      TE_6F_Contig_7 (1101_2237)
TTATAGATTCATTAGTTTTTGGAATGA         TE_6F_Contig_7 (1101_2237)
ATGACTAGCGATAAGAACCCT               S08_TE.2:1046-1630 Contig:2
TCATTCGTATTCTGGTAGCG                S08_TE.2:1046-1630 Contig:2


Chapter 4: Plasticity, dynamics, and inhibition of emerging tetracycline-resistance enzymes

4.1 Introduction

4.1.1 Abstract

While tetracyclines are an important class of antibiotics in agriculture and the clinic, their efficacy is threatened by increasing resistance. Resistance to tetracyclines can occur through efflux, ribosomal protection, or enzymatic inactivation. Surprisingly, tetracycline enzymatic inactivation has remained largely unexplored despite providing the distinct advantage of antibiotic clearance. The tetracycline destructases are a recently-discovered family of tetracycline-inactivating flavoenzymes from pathogens and soil metagenomes with a high potential for broad dissemination. Here, we show tetracycline destructases accommodate tetracycline-class antibiotics in diverse and novel orientations for catalysis, and antibiotic binding drives unprecedented structural dynamics facilitating tetracycline inactivation. We identify a key inhibitor binding mode that locks the flavin adenine dinucleotide cofactor in an inactive state, functionally rescuing tetracycline activity. Our results reveal the potential of a novel tetracycline/tetracycline destructase inhibitor combination therapy strategy to overcome resistance by enzymatic inactivation and restore the use of an important class of antibiotics.


4.1.2 Introduction

Antibiotics revolutionized the treatment of infectious diseases, enabling significant reductions in deaths due to infection over the past 80 years. However, the prolific anthropogenic use of these life-saving chemotherapeutics in the clinic and agriculture has also selected for a steady increase in antibiotic resistance in both benign and pathogenic bacteria21. Regrettably, increasing antibiotic resistance has been accompanied by a decrease in development and regulatory approval of new antibiotics157, threatening the end of the modern antibiotic era. The likely origin of virtually all clinical antibiotic resistance genes is environmental microbial communities, which harbor ancient and diverse resistomes4,15,20,58,94,398,399. Indeed, environmental reservoirs have been identified for a number of recently-emerged and rapidly-disseminating resistance genes representing urgent clinical threats (e.g. plasmid-borne and chromosomally-acquired carbapenem88, colistin75, and quinolone93 resistance genes). This motivates the need to better understand resistance mechanisms of environmental origin before they are widespread in the clinic and ultimately guide new drug discovery and therapeutic strategies that mitigate emerging mechanisms of resistance.

Despite growing resistance, the tetracyclines remain among the most widely used antibiotics in clinical and agricultural settings368. Indeed, tetracyclines ranked in the top three antibiotics in both clinical prescriptions in the United States in 2010 (representing 15% of all antibiotic prescriptions) and in global sales for animal use in 2009 ($500 million in sales)400. Furthermore, next-generation derivatives are currently fueling a tetracycline renaissance, with the 2005 clinical approval of tigecycline134 and ongoing late-stage clinical trials of eravacycline and omadacycline139,140 justifying urgent interrogation of emerging and novel tetracycline resistance mechanisms. Previously, tetracycline resistance was thought to occur almost exclusively by two mechanisms: ribosomal protection or antibiotic efflux368,371. However, an alternate mechanism – enzymatic inactivation – has been documented in benign and pathogenic bacteria, as exemplified by the enzyme Tet(X)28,100,136,385,401-405. We recently identified a new family of tetracycline-inactivating enzymes through functional metagenomic selections for tetracycline resistance from 18 grassland and agricultural soils58. We showed that these nine proteins, Tet(47-55), were able to enzymatically inactivate tetracycline, resulting in 16-64 fold increases in minimum inhibitory concentration (MIC)406 when expressed in E. coli.

Here, we pursued a multi-pronged structural, in vitro enzymatic, and bacterial phenotypic investigation of the emerging tetracycline destructases. We show that a recently identified tetracycline destructase confers tetracycline resistance to a known soil-derived human pathogen.

We hypothesized that structural characteristics of tetracycline-inactivating enzymes would reveal useful information about their unique activity profiles and lead to the rational design of inhibitors, similar to the widely employed β-lactamase inhibitors110. Discerning the structural and mechanistic details of conformational or transitional states in target proteins has been crucial for the rational design of successful inhibitors in a number of cases, as exemplified by inhibitors of HIV-1 protease407 and mechanistic inhibitors of glycosyltransferases that involve significant conformational movement in the active site408. Through structure-function analyses of four tetracycline destructases alone and in complex with tetracycline-class ligands, we present the molecular basis for unexpected structural dynamics in tetracycline destructases driven by antibiotic binding. These unprecedented changes provide a novel mechanism for inhibition that has the potential to synergistically restore tetracycline activity and rescue an important class of antibiotics.


4.2 Results

4.2.1 Tetracycline inactivation by Legionella longbeachae

The tetracycline destructase family was initially discovered by functional metagenomic selection for tetracycline resistance from soil samples58. We observed that the soil-derived human pathogen Legionella longbeachae, the causative agent of Pontiac Fever and Legionnaires’ Disease409,410, encodes a homolog of the tetracycline destructases, termed tet(56). Like the other tetracycline destructases, Tet(56) is able to inactivate tetracycline in vitro and expression of tet(56) in E. coli confers high-level tetracycline resistance406. To confirm that tet(56) is a functional resistance determinant in L. longbeachae, we deleted the gene and examined the strain for changes in drug sensitivity. Deletion of the chromosomally-encoded tet(56) resulted in an increase in the tetracycline sensitivity of L. longbeachae (Fig. 4.1). Moreover, overexpression of tet(56) in the L. longbeachae Δtet(56) strain resulted in increased tetracycline resistance to levels even higher than the wild type L. longbeachae strain containing vector. Finally, expression of tet(56) in L. pneumophila, a Legionella strain lacking a tetracycline destructase homolog, also dramatically increased the level of tetracycline resistance of the strain. These results demonstrate that tetracycline destructases are already functional in a known human pathogen and their introduction into a related pathogen would lead to increased antibiotic resistance.

Figure 4.1: Dose-response curve showing the effect of tetracycline on growth of Legionella strains. Deletion of tet(56) from L. longbeachae causes an increase in tetracycline sensitivity. Complementation with a plasmid containing the tet(56) insert rescues the tetracycline resistance phenotype compared to strains bearing the empty-vector control. Furthermore, introduction of the complementing vector into L. pneumophila, which lacks a tet(56) homolog, results in an increase in tetracycline resistance. Data are represented as mean ± s.d. of three technical replicates.

4.2.2 Tet(50,51,55,56) crystal structures reveal a conserved domain architecture and a dynamic substrate loading channel

We began our structural analysis by solving the X-ray crystal structures of four tetracycline-inactivating enzymes: Tet(50), Tet(51), Tet(55), and Tet(56) (Fig. 4.2a and Supplementary Fig. 4.1a-e). Although only sharing ~24% amino acid identity with the previously crystallized Tet(X) and requiring initial structure determination via experimental phasing (Supplementary Table 4.1), Tet(50,51,55,56) and Tet(X) exhibit a similar overall architecture. Each possesses a flavin adenine dinucleotide (FAD)-binding Rossmann-type fold domain, a tetracycline-binding domain, and a C-terminal α-helix that bridges the two domains (Fig. 4.2a-d). Surprisingly, each of the new structures revealed an unexpected second α-helix at the C-terminus (Fig. 4.2a-c) that is not present in Tet(X) (Fig. 4.2d) and could not be predicted based on their primary sequences. Comparisons to co-crystal structures of Tet(X) in complex with either chlortetracycline [PDB 2Y6R] or iodtetracycline [PDB 2Y6Q]385 reveal that this helical extension comes in close proximity to the tetracycline binding site (Fig. 4.2d), contributing to the formation of a substrate-loading channel. In the Tet(50) crystal structure, we observed two distinct monomers in the asymmetric unit. In Tet(50) monomer A (Fig. 4.2b), this substrate-loading channel is blocked by a flexible loop (Fig. 4.2e,f), whereas in Tet(50) monomer B (Fig. 4.2c), the channel is open, allowing tetracycline to access the substrate-binding site (Fig. 4.2g,h). The absence of the second α-helix in Tet(X) results in a widely exposed entrance to the substrate-binding site (Fig. 4.2i,j), which likely contributes to alternate substrate specificity of this enzyme.

Figure 4.2: Crystal structures of Tet(50), Tet(51), Tet(55), and Tet(56) reveal a conserved architecture, structural changes that enable substrate loading channel accessibility, and two conformations of the FAD cofactor. (a) Overlay of the Tet(50) monomer A, Tet(50) monomer B, Tet(51), Tet(55), and Tet(56) crystal structures. The FAD-binding domain (salmon), the tetracycline-binding domain (pale green), the first (cyan) and second (deep teal) C-terminal α-helices, and FAD molecules (orange) are shown. (b) Cartoon depiction of Tet(50) monomer A. (c) Cartoon depiction of Tet(50) monomer B. (d) A previously published crystal structure of Tet(X) with chlortetracycline (yellow) bound – PDB ID 2Y6R. Tet(X) does not possess the second C-terminal α-helix. (e) Surface representation of Tet(50) monomer A reveals that the substrate-loading channel is closed. Rotated 180º along the y-axis compared to b. (f) Zoomed in view of the closed substrate-loading channel in Tet(50) monomer A. (g) Surface representation of Tet(50) monomer B reveals that the substrate-loading channel is open. Rotated 180º along the y-axis compared to c. (h) Zoomed in view of the open substrate-loading channel in Tet(50) monomer B. (i) Surface representation of Tet(X) reveals that absence of the second C-terminal α-helix results in a wide-open substrate-binding site. Rotated 180º along the y-axis compared to d. (j) Zoomed in view of the wide open substrate-binding site in Tet(X). (k) The FAD cofactor adopts the IN conformation in Tet(50) monomer A, characterized by a 12.3 Å distance between the C8M and C2B atoms of the FAD molecule. (l) In Tet(50) monomer A, FAD is IN and the substrate-loading channel is closed. (m) The FAD cofactor adopts the OUT conformation in Tet(50) monomer B, characterized by a 5.2 Å distance between the C8M and C2B atoms of the FAD molecule. (n) In Tet(50) monomer B, FAD is OUT, and the channel is open. (o) The IN conformation of FAD allows for substrate catalysis. The OUT conformation of FAD allows for regeneration of the reduced FAD for the next round of catalysis. The green area indicates the substrate-binding site. The pink area indicates the putative NADPH binding site.

4.2.3 Two FAD conformations are linked to the accessibility of the substrate loading channel

Tetracycline destructases are flavoenzymes that utilize an FAD cofactor to degrade their substrate411,412. These enzymes bind FAD in two distinct conformations that are important for catalysis413. Both conformations are captured by the structures presented here (Supplementary Fig. 4.1f-j). Tet(50) monomer A, which has a closed substrate-loading channel, bound FAD in an IN conformation (Fig. 4.2k,l). In this conformation the reactive isoalloxazine moiety of the FAD is stretched away from the adenosine moiety and into the substrate-binding site. This allows reaction with molecular oxygen to produce an FAD-hydroperoxide intermediate that is in close proximity to the tetracycline substrate (the C4a of FAD is ~5.9 Å away from the C11a substrate hydroxylation site in Tet(X)), allowing for hydroxylation and subsequent spontaneous degradation of the tetracycline substrate (Fig. 4.2o)414. After catalysis, FAD flips away from the substrate-binding site, adopting the OUT conformation. Tet(50) monomer B, which has an open substrate-loading channel, binds FAD in an OUT conformation where the isoalloxazine moiety is bent towards the adenosine and away from the substrate-binding site (Fig. 4.2m,n). This conformational change allows for products to be released through the open channel and positions the oxidized FAD for reduction by NADPH in a distinct NADPH binding site during cofactor regeneration (Fig. 4.2o and Fig. 4.3a)414. After reduction, FAD is poised to flip back to the IN conformation for the next round of catalysis upon substrate binding. Our observation of FAD in both IN and OUT conformations implies that FAD exists in an equilibrium between the two states in the absence of substrate binding.
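The IN and OUT conformations are distinguished in Figure 4.2 by the distance between the C8M and C2B atoms of FAD (12.3 Å for IN, 5.2 Å for OUT). A minimal Python sketch of how such a distance-based label could be computed from atomic coordinates is given below. The midpoint cutoff, the function names, and the example coordinates are all assumptions made for illustration; they are not taken from the deposited structures.

```python
import math

IN_DISTANCE = 12.3   # Å, C8M-C2B distance reported for FAD IN (Tet(50) monomer A)
OUT_DISTANCE = 5.2   # Å, C8M-C2B distance reported for FAD OUT (Tet(50) monomer B)

# Assumption: use the midpoint of the two reported distances as a cutoff.
CUTOFF = (IN_DISTANCE + OUT_DISTANCE) / 2


def atom_distance(a, b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(a, b)


def classify_fad(c8m_xyz, c2b_xyz, cutoff=CUTOFF):
    """Label an FAD as 'IN' (extended) or 'OUT' (bent) from the C8M-C2B distance."""
    return "IN" if atom_distance(c8m_xyz, c2b_xyz) > cutoff else "OUT"


# Hypothetical coordinates chosen to give roughly the reported distances.
extended = classify_fad((0.0, 0.0, 0.0), (12.3, 0.0, 0.0))  # 12.3 Å apart
bent = classify_fad((0.0, 0.0, 0.0), (3.0, 4.0, 1.3))       # about 5.2 Å apart
```

In practice the coordinates would be read from a structure file, and intermediate distances near the cutoff would need manual inspection rather than a binary label.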

4.2.4 Chlortetracycline binds to Tet(50) in an unprecedented mode, driving FAD conversion and substrate loading channel closure

Since accessibility of the substrate-loading channel appeared to be dependent on the conformation of FAD in the Tet(50) crystal structures, we soaked Tet(50) with various tetracycline compounds. Surprisingly, chlortetracycline binds to Tet(50) in a ~180° rotated orientation compared to the orientation in which chlortetracycline and other tetracycline substrates (e.g. iodtetracycline, minocycline, tigecycline) bind Tet(X)385,415 (Fig. 4.3a-d and Supplementary Fig. 4.2). Tetracycline compounds have a four-ring system (labeled A-D; Fig. 4.3a,c) and a distinctive three-dimensional architecture with a significant bend between rings A and B, allowing for unambiguous modeling into the electron density. In the Tet(X)+chlortetracycline structure, the chlortetracycline D-ring with the attached chlorine faces away from the substrate-binding site and towards bulk solvent (Fig. 4.3c,d). This places the C11a substrate hydroxylation site of ring C proximal to FAD. In the Tet(50)+chlortetracycline structure, the D-ring chlorine now faces FAD with the dimethylamine group of the A-ring making van der Waals contacts with Phe-95 from the flexible loop, Val-348 from the first C-terminal α-helix, and Ile-371 from the second C-terminal α-helix (Fig. 4.3a,b). Surprisingly, this new orientation positions C11a of chlortetracycline away from C4a of FAD.

We observed a second notable characteristic when comparing the structures of Tet(50) in the presence or absence of chlortetracycline. In the absence of chlortetracycline, Tet(50) monomer A had FAD in an IN conformation with a closed channel (Fig. 4.2e,f,k,l) and monomer B had FAD in an OUT conformation with an open channel (Fig. 4.2g,h,m,n). However, in the presence of chlortetracycline, we only detected bound chlortetracycline in monomer B, which now had FAD in an IN conformation and a closed channel (Fig. 4.3e-h). Thus, substrate binding to Tet(50) monomer B induced a conformational switch from FAD OUT to FAD IN and loop closure (Fig. 4.3h,i).

Figure 4.3: Crystal structure of Tet(50) in complex with chlortetracycline reveals an unexpected mode of binding to the tetracycline-binding site that drives substrate loading channel closure and FAD conversion. (a) Chlortetracycline binds to the active site of Tet(50) in a ~180º rotated orientation relative to Tet(X)+chlortetracycline such that the D-ring of chlortetracycline is proximal to FAD in the IN conformation (orange). A model of FAD OUT (grey) is overlaid as found in the Tet(50) monomer B in the absence of chlortetracycline (see Fig. 4.2m and 4.2n). (b) The rotated orientation of chlortetracycline in the Tet(50)+chlortetracycline structure is supported by van der Waals contacts provided by Val-348 (cyan) and Ile-371 (deep teal) from the two C-terminal α-helices in Tet(50) to the dimethylamine group of the A-ring of chlortetracycline. Additionally, Phe-95 from the flexible loop makes contacts with the dimethylamine group and closes off the substrate-binding site. (c) Tet(X) with chlortetracycline bound with D-ring distal to FAD in the IN conformation. The substrate-binding site is widely exposed to bulk solvent. (d) Residue Met-375 from the first C-terminal α-helix in Tet(X) (cyan) makes van der Waals contacts to the D-ring of chlortetracycline. A second C-terminal helix (red dashed circle, colored in deep teal) does not exist in Tet(X), and substrate can potentially enter from various possible directions. (e) Surface representation of Tet(50)+chlortetracycline monomer A reveals the substrate-loading channel is closed. (f) The substrate-loading channel is also closed in Tet(50)+chlortetracycline monomer B. (g) In Tet(50)+chlortetracycline monomer A, FAD is IN, the loop is closed, and no chlortetracycline is bound. (h) In Tet(50)+chlortetracycline monomer B, FAD is IN, the loop is closed, and chlortetracycline is bound. (i) While the substrate-loading channel is open in Tet(50) monomer B, with FAD OUT, in the absence of chlortetracycline (grey), the flexible loop containing Phe-95 closes over the channel in Tet(50)+chlortetracycline monomer B, with FAD now IN.

4.2.5 Tetracycline destructases degrade chlortetracycline despite the novel binding mode

Due to the unanticipated orientation of chlortetracycline binding, we examined whether the tetracycline destructases could degrade chlortetracycline. Enzymatic reactions were analyzed at several time points by reverse-phase high-performance liquid chromatography (HPLC). We observed the time- and enzyme-dependent degradation of chlortetracycline by Tet(50) and Tet(X) (Fig. 4.4a). Kinetic parameters of enzymatic inactivation were determined by monitoring in vitro reaction progress using absorbance at 400 nm. The catalytic efficiency of Tet(50) was five times higher than that of Tet(X) (kcat/KM values of 0.55 and 0.11 µM⁻¹ min⁻¹, respectively) (Supplementary Table 4.2). This increased efficiency is primarily due to increased turnover, as the apparent KM values are comparable between Tet(50) (6.3±2.0 µM) and Tet(X) (7.9±2.7 µM) in spite of different substrate binding orientations. Tet(55) and Tet(56) also degraded chlortetracycline in vitro with 4-fold and 15-fold greater efficiency than Tet(X), respectively (Supplementary Table 4.2). Furthermore, tet(50,51,55,56,X) each confer chlortetracycline resistance when expressed in E. coli at levels 16-32 fold greater than the vector-only control (Supplementary Table 4.3). Thus, despite employing a distinct mode of substrate binding, Tet(50,51,55,56) are able to degrade chlortetracycline more efficiently than Tet(X).
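The statement that the efficiency gain comes mainly from turnover rather than substrate affinity can be checked with simple arithmetic. The sketch below uses only the kcat/KM and KM values quoted above; the kcat values it derives are back-calculated for illustration (an assumption, since the measured kcat values are in Supplementary Table 4.2 and are not reproduced here), and the KM uncertainties are ignored.

```python
# Values quoted in the text: kcat/KM in µM^-1 min^-1, apparent KM in µM.
reported = {
    "Tet(50)": {"kcat_over_km": 0.55, "km": 6.3},
    "Tet(X)": {"kcat_over_km": 0.11, "km": 7.9},
}


def implied_kcat(kcat_over_km, km):
    """Back-calculate kcat (min^-1) as (kcat/KM) * KM. Illustrative only."""
    return kcat_over_km * km


# Overall fold difference in catalytic efficiency, Tet(50) relative to Tet(X).
fold_efficiency = (
    reported["Tet(50)"]["kcat_over_km"] / reported["Tet(X)"]["kcat_over_km"]
)

# Split that fold difference into a turnover term and an affinity term.
kcat = {name: implied_kcat(**vals) for name, vals in reported.items()}
fold_kcat = kcat["Tet(50)"] / kcat["Tet(X)"]                   # turnover contribution
fold_km = reported["Tet(X)"]["km"] / reported["Tet(50)"]["km"]  # KM contribution

print(f"efficiency fold change: {fold_efficiency:.2f}")
print(f"from turnover (kcat):   {fold_kcat:.2f}")
print(f"from affinity (KM):     {fold_km:.2f}")
```

Under these assumptions the turnover term accounts for roughly a 4-fold difference and the KM term for well under 2-fold, in line with the interpretation given in the text.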

Figure 4.4: Chlortetracycline is degraded by tetracycline destructases despite the unusual binding mode. (a) HPLC chromatograms show the time- and enzyme-dependent consumption of chlortetracycline. (b) High-resolution MS-MS analysis of the tetracycline destructase reaction with chlortetracycline supports clean conversion to the m/z 467 oxidation product. MS-MS spectrum of the m/z 467 ion from the Tet(55) reaction with proposed fragmentation pathway. The closest reactive carbons to C4a of the FAD cofactor are C3 (c) and C1 (d) of the chlortetracycline A ring, both of which are closer than C11a (e), the hydroxylation site observed in Tet(X)-mediated chlortetracycline degradation. (f) Two proposed mechanisms for degradation of chlortetracycline, consistent with MS and crystallographic data. Addition of the C4a flavin peroxide to C3 generates intermediate 1, which can undergo epoxide formation to give an equilibrating mixture of intermediates 2 and 3. Intermediate 3 can also be generated via intermediate 4 arising from direct attack of the C4a flavin peroxide on carbonyl C1. Intermediate 3 can rearrange to cycloheptanone intermediate 5. Fragmentation will give intermediate 6 via loss of carbon monoxide followed by ring contraction resulting in formation of product 7 with m/z 467 for [M+H]+.

Tetracycline inactivation by Tet(X) occurs via catalysis at C11a, resulting in a product of m/z 461 (ref. 406). Because chlortetracycline binds Tet(50) in an alternative mode that positions C11a far away from the reactive flavin peroxide moiety, we sought to characterize the degradation product to establish substrate hydroxylation. Enzymatic reactions were analyzed by liquid chromatography-mass spectrometry (Supplementary Fig. 4.6), and found to convert chlortetracycline (m/z 479) to an oxidation product with an m/z of 467. This is in contrast to the m/z 461 monooxygenation product observed for tetracycline406, consistent with an alternate binding mode for chlortetracycline. To further characterize this product, reactions were subjected to high resolution mass spectrometry. Reactions with each enzyme assayed (Tet(50,55,56)) yielded a primary product with an exact m/z of 467.12 (Supplementary Fig. 4.7). In the alternative binding mode, the nonplanarity of the chlortetracycline substrate positions the reactive A-ring C3 in closest proximity to the flavin cofactor. Notably, C3 is 6.1 Å (Fig. 4.4c) and the C1 carbonyl is 7.4 Å (Fig. 4.4d) from the C4a of the flavin cofactor. These distances are similar to the distance between C11a of chlortetracycline and C4a of FAD in the Tet(X)+chlortetracycline structure385, and well within the C4a-reactive atom distances observed for flavin monooxygenases416. C11a in the Tet(50)+chlortetracycline structure, on the other hand, is on the opposite side of the molecule and 7.9 Å away (Fig. 4.4e). Accordingly, we propose a mechanism in which the flavin peroxide can attack C3 of the chlortetracycline A ring, yielding intermediate 1 (Fig. 4.4f). Spontaneous epoxide formation gives intermediates 2 and 3; intermediate 3 rearranges to cycloheptanone intermediate 5 via Baeyer-Villiger ring expansion. Expulsion of carbon monoxide yields intermediate 6, and ring contraction yields oxidation product 7, with an m/z of 467. Alternatively, intermediate 3 can be formed by flavin peroxide attack on C1 of the chlortetracycline A ring, via intermediate 4, and then similarly continuing through products 5-7. The final product 7 is consistent with the fragmentation pattern observed in tandem mass spectrometry (Fig. 4.4b). Similar oxidative cascades proceeding through Baeyer-Villiger reactions have been observed in the biosynthesis of the cyclic type II polyketide mithramycin by the flavin monooxygenase MtmOIV417. The discovery of alternative substrate binding modes and characterization of degradation products demonstrate the plasticity of tetracycline destructases for adapting flavoenzyme-mediated degradation chemistries to achieve resistance in the presence of diverse tetracycline scaffolds.
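The mass bookkeeping behind the m/z 479 to m/z 467 conversion can be illustrated with a short calculation. The sketch below assumes the standard molecular formula of chlortetracycline (C22H23ClN2O8) and treats the product as the result of adding one oxygen atom and then losing carbon monoxide, as in the proposed mechanism. The product formula is inferred here for illustration and is not taken from the source.

```python
# Monoisotopic masses (Da) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Cl": 34.9688527,
}
PROTON = 1.00727646


def mz_protonated(formula):
    """Return m/z of the singly protonated ion [M+H]+ for an element-count dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


chlortetracycline = {"C": 22, "H": 23, "Cl": 1, "N": 2, "O": 8}

# Assumed net transformation: monooxygenation (+O) followed by loss of CO (-C, -O).
product = dict(chlortetracycline)
product["O"] += 1   # oxygen insertion by the flavin peroxide
product["C"] -= 1   # carbon lost as carbon monoxide
product["O"] -= 1   # oxygen lost as carbon monoxide

print(f"chlortetracycline [M+H]+: {mz_protonated(chlortetracycline):.2f}")
print(f"proposed product  [M+H]+: {mz_protonated(product):.2f}")
```

The net change is the loss of a single carbon atom (12.00 Da), which is why the product ion appears 12 units below the substrate ion rather than 16 units above it, as would be expected for simple monooxygenation.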


Figure 4.5: Anhydrotetracycline binds to the active site of Tet(50), trapping FAD in the unproductive OUT conformation. (a) Anhydrotetracycline binds the active site of Tet(50) and traps the FAD cofactor in the unproductive OUT conformation (orange) in monomer B. The IN conformation of FAD from monomer A is superimposed in grey for comparison, and sterically clashes with the D-ring hydroxyl of anhydrotetracycline. (b) Surface representation of Tet(50)+anhydrotetracycline reveals that the substrate-loading channel remains open, which corresponds to FAD locked in the OUT conformation. (c) In Tet(50)+anhydrotetracycline monomer B, FAD is OUT, the loop is open, and anhydrotetracycline is bound (not shown: in monomer A, FAD is IN, the loop is closed, no anhydrotetracycline is bound). (d) Residue Thr-207 in Tet(50) makes van der Waals interactions with the planar 6-methyl group of anhydrotetracycline (aTC) (yellow) in the bound orientation, but would sterically clash with the 6-methyl and 6-hydroxyl groups that branch from the C ring of tetracycline or chlortetracycline if bound in a flipped orientation.

4.2.6 Anhydrotetracycline is a mechanistic and competitive inhibitor that locks the FAD in an unproductive conformation

Due to the global dissemination of the β-lactamases, nearly all β-lactam antibiotics are co-developed with β-lactamase inhibitors110, an approach that has successfully prolonged their clinical utility. We reasoned that a similar strategy might be useful to counteract tetracycline resistance by inactivation, and therefore sought to identify small molecule inhibitors of these enzymes. Previously, we observed that anhydrotetracycline, a key biosynthetic precursor418 and degradation product419 of tetracycline with poor antibiotic activity, was not degraded by Tet(47-56)406. Nonetheless, it is known to be an effector of tetracycline producers and tetracycline-resistant bacteria by inducing expression of energetically expensive tetracycline efflux pumps, permitting tetracycline producers to survive and selecting against tetracycline resistance112. Based on the structural similarity to tetracycline and the intimate role that anhydrotetracycline plays in tetracycline biology, we hypothesized that anhydrotetracycline represents an evolutionarily-privileged chemical lead for inhibitor design.

We obtained a co-crystal structure of Tet(50) with anhydrotetracycline bound, and observed two unique features in comparison to our Tet(50)+chlortetracycline and the earlier Tet(X)+chlortetracycline structures. First, anhydrotetracycline binds to Tet(50) in a flipped orientation and in a position distinct from where chlortetracycline binds (Fig. 4.5a-c and Supplementary Fig. 4.3-4). The unique binding mode for anhydrotetracycline is enabled by the lack of a 6-hydroxyl group of ring C present in tetracycline or chlortetracycline (Fig. 4.5a). Without this substitution at the 6 position, the tetracycline gains additional aromatic stabilization. The resultant planar structure allows the 6-methyl group to make van der Waals interactions with a conserved Thr/Ser at residue position 207 in Tet(47-56) (Fig. 4.5d and Supplementary Fig. 4.4-5). Thr-207 would cause a steric clash with the 6-methyl and 6-hydroxyl groups of ring C in tetracycline or chlortetracycline, explaining the distinct binding modes.

The second interesting feature is that when anhydrotetracycline was bound, FAD was in the OUT conformation and the substrate-loading channel was open (Fig. 4.5b,c). The unique binding location of anhydrotetracycline locks the isoalloxazine moiety of FAD away from the substrate-binding site and sterically blocks the transition to the FAD IN conformation observed in the Tet(50)+chlortetracycline monomer B. This unexpected binding mode establishes a novel mechanism for inhibitors that stabilize the inactive OUT conformation of the FAD cofactor in flavoenzymes and prevent transition to the FAD IN conformation necessary for catalysis. Therefore, anhydrotetracycline is a mechanistic inhibitor of the tetracycline destructases that also competitively blocks substrate binding.

4.2.7 Anhydrotetracycline inhibits tetracycline destructases and functionally rescues tetracycline

We examined the effect of anhydrotetracycline on tetracycline destructase activity in vitro. We performed in vitro enzymatic reactions in the presence or absence of anhydrotetracycline followed by HPLC. For clinical relevance, we first focused on Tet(56), the tetracycline destructase from pathogenic L. longbeachae. We observed the Tet(56)-dependent degradation of 0.1 mM tetracycline over time, as demonstrated by the decrease in the tetracycline peak (Fig. 4.6a). However, in the presence of 1 mM anhydrotetracycline, the tetracycline peak does not change, indicating that tetracycline is not degraded (Fig. 4.6b). Similar results were observed for Tet(50,51,55,X) (Supplementary Fig. 4.8, Supplementary Table 4.4). We also monitored enzymatic inactivation of tetracycline using absorbance at 400 nm in the presence of a range of anhydrotetracycline concentrations. Anhydrotetracycline inhibited Tet(50,55,56) with IC50 values of 83.2±1.2 µM, 25.6±1.2 µM, and 37.1±1.1 µM, respectively (Fig. 4.6c). Thus, anhydrotetracycline prevents the enzyme-dependent degradation of tetracycline. Together with our structural data, this indicates a common mechanism of inhibition for tetracycline-inactivating enzymes, and establishes anhydrotetracycline as a lead compound that presents a flexible starting point for generating tetracycline destructase inhibitors with improved activity420. This inhibition strategy that stabilizes inactive cofactor states is widely applicable to the larger superfamily of flavoenzymes and offers new avenues for inhibiting any member of this superfamily, many of which have been implicated in human disease and represent promising targets for hypercholesterolemia and antifungal drugs421.


Figure 4.6: Anhydrotetracycline prevents enzymatic tetracycline degradation, functionally rescuing tetracycline antibiotic activity. (a) Tetracycline (TC) is degraded by Tet(56) in vitro. HPLC chromatograms show in vitro reactions with UV detection at 363 nm and separation on a C18 column. (b) TC degradation is attenuated by the addition of an excess of aTC. (c) Dose-dependent inhibition of Tet(50,55,56) activity by anhydrotetracycline. Velocity is determined by measuring tetracycline consumption via change in absorbance at 400 nm. Data are represented as mean ± s.d. of three technical replicates. (d) Dose-response curve showing effect of aTC on sensitivity of Tet(56)-expressing E. coli to TC in liquid culture. Data are represented as mean ± s.e.m. of three technical replicates. (e) TC and aTC synergistically inhibit growth of E. coli expressing Tet(56), FICI = 0.1875. Points show minimum inhibitory concentrations of two drugs in combination. Dashed line indicates the theoretical concentration of additive drug interaction. Data represented as mean ± s.e.m. of three technical replicates.

4.2.8 The novel inhibition mechanism restores tetracycline activity against tetracycline destructase-expressing bacteria

Our data suggest that a tetracycline/tetracycline-destructase-inhibitor (e.g. anhydrotetracycline) combination therapy strategy could potentially be employed to rescue antibiotic activity of tetracyclines against bacteria that encode tetracycline-inactivating enzymes. Tet(X) and Tet(56) are of particular interest due to their clinical significance. tet(X) has been recently identified in a number of human pathogens, including 11 nosocomial uropathogens from Sierra Leone138 and 12 Acinetobacter baumannii isolates from a hospital in China137. We also showed that tet(56) is present and functional in L. longbeachae406, a pathogen responsible for causing Pontiac Fever and Legionnaires’ Disease409,410. Accordingly, we tested whether anhydrotetracycline rescues tetracycline efficacy against E. coli expressing tet(56). Two µg/mL anhydrotetracycline caused a greater than 5-fold change in sensitivity of E. coli expressing tet(56) to tetracycline in liquid culture, as indicated by a change in IC50 from 47.4 to 8.27 µg/mL (Fig. 4.6d). Further, anhydrotetracycline and tetracycline acted synergistically to inhibit growth of E. coli expressing Tet(50,51,55,56), with fractional inhibitory concentration indices (FICI) of 0.625, 0.5, 0.375 and 0.1875, respectively (Fig. 4.6e and Supplementary Fig. 4.9). Although anhydrotetracycline is not degraded by Tet(47-56), it is slowly degraded by Tet(X)406. However, anhydrotetracycline still was able to prevent tetracycline degradation by Tet(X) in vitro (Supplementary Fig. 4.8). Our proof of concept experiment, taken together with our structural and in vitro data, reveals that a co-administration strategy based on inhibition of tetracycline-inactivating enzymes could be effective for the treatment of tetracycline-resistant bacterial infections.

4.3 Materials and Methods

4.3.1 Legionella plasmid construction

The Tet(56) deletion plasmid, pJB7204, was constructed by amplifying 2 kb of DNA upstream and downstream of the Tet(56) ORF using primers JVP2913/JVP2910 and JVP2911/JVP2912 with Legionella longbeachae chromosomal DNA as template. The PCR products were digested with SalI/NotI and NotI/SacI, respectively, and ligated into SalI/SacI-digested suicide vector pSR47S (ref. 422). The ligated product was transformed into EC100D::Δpir and selected on LB plates containing 20 µg/ml kanamycin. The Tet(56) complementing clone, pJB7207, was constructed by amplifying the Tet(56) ORF using primers JVP2921/JVP2922 with Legionella longbeachae chromosomal DNA as template. The PCR product was digested with BamHI/SalI and cloned into BamHI/SalI-digested expression vector pJB1625 (Supplementary Table 4.5).

4.3.2 Legionella strain construction

The Tet(56) deletion strain, JV8858, was constructed by a traditional loop-in/loop-out strategy. Briefly, the wild type Legionella longbeachae strain JV595 was transformed by electroporation with the ΔTet(56) suicide plasmid pJB7204 and integrants were selected on CYE plates423 containing 30 µg/ml kanamycin. Resolution of the merodiploid was obtained on CYE plates containing sucrose. Strains were then electroporated with the vector pJB1625 or the Tet(56) complementing clone pJB7207 and transformants were selected on CYE plates containing 5 µg/ml chloramphenicol.

4.3.3 Tetracycline inactivation in Legionella

Antibiotic susceptibility testing was performed using L. longbeachae wild type and deletion strains and L. pneumophila424, bearing either the vector pJB1625 (ref. 425) or the Tet(56) complementing clone pJB7207. Minimum inhibitory concentrations were determined according to Clinical and Laboratory Standards Institute (CLSI) procedures with the following modifications. The strains were initially grown as a patch on CYE plates containing chloramphenicol for 2 days at 37ºC. The bacteria were swabbed into distilled water, washed one time, and resuspended to an absorbance at 600 nm (OD600) of 1 (~1×10⁹ CFU/ml). The culture was diluted 200-fold into 10 mL of buffered AYE media containing 2 µg/ml chloramphenicol and a range of tetracycline concentrations but lacking supplemental iron, as iron can interfere with tetracycline activity. The cultures were grown for 48 hours at 37ºC on a roller drum and the absorbances (OD600) were periodically measured using a Genesys 20 Spectrophotometer. Results are representative of three independent experiments.

4.3.4 Cloning, expression and purification of tetracycline-inactivating enzymes

All genes encoding tetracycline-inactivating enzymes were cloned into the pET28b(+) vector (Novagen) at BamHI and NdeI restriction sites. Constructs were transformed into BL21-Star (DE3) competent cells (Life Technologies). Cells harboring the plasmid were grown at 37ºC in LB medium containing a final concentration of 0.03 mg/mL kanamycin. Once cells reached an OD600 of 0.6, cells were cooled to 15 ºC and induced with 1 mM IPTG overnight. After this period, cells were harvested by centrifugation at 4000 rpm for 10 min at 4 ºC. Cell pellets were suspended in 10 mL of 50 mM Tris (pH 8.0), 100 mM NaCl, 10 mM imidazole (pH 8.0), 1 mM PMSF, and 5 mM BME per 1 liter of LB medium and stored at -80ºC.

Cells were thawed in the presence of 0.25 mg/mL lysozyme and disrupted using sonication on ice for 60 seconds. The cell extract was obtained by centrifugation at 13,000 rpm for 30 min at 4 ºC and was applied onto nickel rapid run agarose beads (Goldbio) equilibrated with wash buffer (50 mM Tris (pH 8.0), 150 mM NaCl, 20 mM imidazole (pH 8.0), and 5 mM BME). The wash buffer was used to wash the nickel column three times with five column volumes. After washing, protein was eluted with five column volumes of elution buffer (wash buffer with 300 mM imidazole). The protein sample was further purified by gel chromatography using a HiLoad 16/600 Superdex 200pg column (GE Healthcare) equilibrated with 10 mM Tris (pH 8.0), 150 mM NaCl, 5 mM dithioerythritol (DTT). The fractions containing the protein of interest were pooled and concentrated using a 30K MWCO Amicon centrifugal filter (Millipore).


4.3.5 Tet(55) selenomethionine-labelling

For selenomethionine-labelled Tet(55) (Se-Met Tet(55)), cells were grown in 1 L of SelenoMet™ Medium supplemented with SelenoMet Nutrient Mix (Molecular Dimensions Limited). Once cells reached an OD600 of 0.6, feedback inhibition amino acid mix (0.1 g of lysine, threonine, phenylalanine; 0.05 g of leucine, isoleucine, valine; 0.05 g of L(+) selenomethionine (ACROS Organics 259960025)) was added and the cells were shaken for 15 minutes at 15 ºC. After 15 minutes, cells were induced at 15 ºC with 1 mM IPTG overnight. All other purification conditions were the same as for the native tetracycline-inactivating enzymes.

4.3.6 Crystallization, data collection, and structure determination

For crystallization, Se-Met Tet(55) was concentrated to 25 mg/mL. Crystals were obtained by vapor diffusion using hanging drops equilibrated at 18 °C. Se-Met Tet(55) crystallized in 0.1 M Tris-HCl (pH 8.5) and 20-25% PEG 3000. Se-Met Tet(55) crystals were harvested directly from the growth condition and flash-frozen under liquid nitrogen.

Native Tet(55) was concentrated to 50 mg/mL and crystallized in 0.1 M Tris-HCl (pH 8.5) and 25-27% PEG 4000. Native Tet(55) crystals were harvested directly from the growth condition and flash-frozen. Tet(50) was concentrated to 35 mg/mL and crystallized in 0.1 M MES (pH 6.0-6.5), 1.6-2.0 M ammonium sulfate, 2-10% 1,4-Dioxane. Crystals were harvested directly from the growth condition and flash-frozen. For co-crystal structures, Tet(50) was concentrated to 17 mg/mL, and Tet(50) crystals were soaked with mother liquor plus 5 mM chlortetracycline or 4 mM anhydrotetracycline for 30 minutes before flash-freezing. Tet(51) was concentrated to 13 mg/mL and crystallized in 0.1 M MES (pH 6.0) and 10% PEG 6000. Crystals were cryo-protected with 0.1 M MES (pH 6.0), 10% PEG 6000, and 30% glycerol before flash-freezing. Tet(56) was concentrated to 38 mg/mL and crystallized in 0.1 M tri-sodium citrate (pH 5.6), 10% PEG 4000, 10% isopropanol. Tet(56) crystals were cryo-protected in 0.1 M tri-sodium citrate (pH 5.6), 10% PEG 4000, 10% isopropanol, and 20% glycerol before flash-freezing.

The crystal structure of Tet(55) was solved by seleno-methionine labeling and single-wavelength anomalous dispersion (SAD) (Supplementary Table 4.1), as molecular replacement using the previously published Tet(X) structures was unsuccessful. The inability to solve the structure by molecular replacement demonstrates that tetracycline-inactivating enzymes are structurally diverse and multiple structures are required to capture the diversity within the family.

X-ray data for selenomethionine-labelled Tet(55) were collected from a single crystal using a wavelength of 0.976289 Å at synchrotron beamline 4.2.2 of the Advanced Light Source in Berkeley, CA. All other native data sets were collected at a wavelength of 1 Å. Data were collected on the CMOS detector and were processed with XDS426. Structure solution for Se-Met Tet(55) was performed using PHENIX AutoSol. Thirteen selenium sites were found, which gave a figure of merit of 0.370. The resulting Tet(55) model was refined against the native Tet(55) data set. R and Rfree flags were imported from the Se-Met Tet(55) mtz file using UNIQUEIFY within the CCP4 package427. Tet(50,51,56) structures were solved by PHENIX AutoMR using an ensemble of three domains of Tet(55) (domain 1 = aa1-70, aa100-172, aa276-319; domain 2 = aa71-99, aa173-275; domain 3 = aa320-387). Structure solution for the Tet(50) chlortetracycline and anhydrotetracycline structures was performed by refinement with the apo Tet(50) structure, from which the R and Rfree flags were imported using UNIQUEIFY.

Subsequent iterated manual building/rebuilding and refinement of models were performed using Coot428 and PHENIX429, respectively. The structure validation server MolProbity430 was used to monitor refinement of the models. All final refined models have favorable crystallographic refinement statistics, as provided in Supplementary Table 4.1. Figures were generated and rendered in PyMOL Molecular Graphics System, Version 0.99rc6, Schrödinger, LLC.

4.3.7 In vitro tetracycline and chlortetracycline inactivation assays

Reactions were prepared in 100 mM TAPS buffer with 100 µM substrate, 14.4 µM enzyme, and an NADPH regenerating system consisting of the following components (final concentrations): glucose-6-phosphate (40 mM), NADP+ (4 mM), MgCl2 (1 mM), and glucose-6-phosphate dehydrogenase (4 U/ml). The regeneration system was incubated at 37ºC for 30 minutes to generate NADPH before use in reactions. Reactions were sampled at various timepoints, and quenched in four volumes of an acidic quencher consisting of equal parts acetonitrile and 0.25 M HCl.

Products generated from enzymatic inactivation of both tetracycline and chlortetracycline were separated by reverse phase HPLC using a Phenomenex Luna C18 column (5 µm, 110 Å, 2×50 mm) and 0.1% trifluoroacetic acid in water (A) and acetonitrile (B) as mobile phase. Injections of 25 µl sample volume were eluted using a linear gradient from 25% B to 75% B over 14 minutes at a flow rate of 1 ml/min.
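As a small illustration of the elution program, the mobile-phase composition at any time during the run follows from linear interpolation between the start and end points. The sketch below is not instrument control code; the function name and the behavior outside the 0 to 14 minute window (holding at the end-point compositions) are assumptions made for illustration.

```python
def percent_b(t_min, start_b=25.0, end_b=75.0, duration_min=14.0):
    """Percent of mobile phase B at time t_min for a linear gradient.

    Times before 0 or after duration_min are clamped to the end-point
    compositions (an assumption; the source does not describe hold steps).
    """
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start_b + fraction * (end_b - start_b)


# Composition at the start, midpoint, and end of the gradient.
for t in (0, 7, 14):
    print(f"t = {t:>2} min: {percent_b(t):.1f}% B")

# Total mobile phase delivered over the gradient at 1 ml/min.
total_volume_ml = 14.0 * 1.0
```

The same function describes the LC-MS gradient in section 4.3.10 if called with `start_b=0.0` and `end_b=95.0`.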

Chlortetracycline reactions analyzed by high resolution tandem mass spectrometry were sampled at 75 minutes. The quenched samples were diluted 6X with 50% methanol in 0.1% formic acid and run on the Q-Exactive Orbitrap by direct infusion using the Advion Triversa nanomate. The data were acquired with resolution of 140,000. The MS scan was acquired from m/z 300 – 550. MS/MS spectra were acquired on the m/z 467.12 compounds.


4.3.8 Tetracycline inactivation in E. coli

Antibiotic susceptibility testing was performed in E. coli MegaX cells (Invitrogen) bearing the pZE21 expression vector with the tetracycline inactivating gene of interest. Minimum inhibitory concentrations were determined according to Clinical and Laboratory Standards Institute (CLSI) procedures339 using Mueller-Hinton broth with 50 µg/mL kanamycin and a range of chlortetracycline concentrations profiled via absorbance measurements at 600 nm (OD600) at 45 minute intervals using the Synergy H1 microplate reader (Biotek Instruments, Inc) for 48 hours at 37ºC.

4.3.9 Kinetic characterization of tetracycline and chlortetracycline inactivation

The optimal enzyme concentration for steady-state kinetics assays was determined by varying the concentration of enzyme while keeping chlortetracycline and NADPH concentration constant. 0.4 μM enzyme was found to give linear slopes for all concentrations of substrate tested, and was used as the enzyme concentration for all kinetics experiments.

Reactions were prepared in 100 mM TAPS buffer at pH 8.5 with 0-160 µM substrate, 1.6 mM NADPH, and 0.4 µM enzyme. UV-visible spectroscopy measurements were taken in triplicate at 400 nm with a Cary 60 UV/Vis system (Agilent) for 10 minutes at room temperature. Initial reaction velocities were determined by linear regression using the Agilent Cary WinUV Software, and fitted to the Michaelis-Menten equation:

$$v_0 = \frac{V_{max}[S]}{K_M + [S]}$$

using GraphPad Prism 6.
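For readers who want to see how KM and Vmax are recovered from initial velocities, the sketch below fits the Michaelis-Menten relationship v0 = Vmax[S]/(KM + [S]) to synthetic data. It is not the procedure used here: the fits in this chapter were done by nonlinear regression in GraphPad Prism, whereas this example swaps in the Hanes-Woolf linearization ([S]/v0 plotted against [S]) so that it runs with the standard library alone. The velocity values are generated from hypothetical parameters (a Vmax of 1.4 µM/min is an illustrative assumption; the KM of 6.3 µM is the Tet(50) value quoted in the Results).

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, velocity):
    """Estimate (Vmax, KM) by least squares on the Hanes-Woolf plot.

    Hanes-Woolf form: [S]/v = [S]/Vmax + KM/Vmax, so the slope is 1/Vmax
    and the intercept is KM/Vmax. Zero-substrate points must be excluded.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free data spanning the 0-160 µM range (0 µM omitted).
substrate_um = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
true_vmax, true_km = 1.4, 6.3  # hypothetical Vmax (µM/min); KM from the text (µM)
velocity = [michaelis_menten(s, true_vmax, true_km) for s in substrate_um]

vmax_fit, km_fit = fit_hanes_woolf(substrate_um, velocity)
print(f"Vmax = {vmax_fit:.3f} µM/min, KM = {km_fit:.3f} µM")
```

With real, noisy measurements a direct nonlinear fit is generally preferred, because linearized plots distort the error structure of the data.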


4.3.10 LC-MS characterization of chlortetracycline degradation products

Reactions were prepared in 100 mM phosphate buffer at pH 8.5 with 1 mM chlortetracycline (CTC), 0.5 mM NADPH, 5 mM MgCl2 and 0.4 µM Tet(55). After 10 minutes, the reaction was centrifuge filtered for 10 minutes using a Millipore Amicon Ultracel (3 kDa MW cutoff) to remove enzyme. Prior to centrifugation, filters were rinsed three times with phosphate buffer to remove excess glycerol. The filtrate was collected and analyzed by LC-MS using an Agilent 6130 single quadrupole instrument with G1313 autosampler, G1315 diode array detector, and 1200 series solvent module. Reaction products were separated on a Phenomenex Gemini C18 column, 50 × 2 mm (5 µm), with guard column cassette, using a linear gradient of 0% acetonitrile + 0.1% formic acid to 95% acetonitrile + 0.1% formic acid over 14 min at a flow rate of 0.5 mL/min prior to analysis by electrospray ionization.

4.3.11 In vitro characterization of anhydrotetracycline inhibition

IC50 values were determined for Tet(50), Tet(55), and Tet(56) by measuring the initial velocity of tetracycline degradation in the presence of varying concentrations of anhydrotetracycline. The concentrations of tetracycline and NADPH were kept constant at 25 μM and 500 μM, respectively. Assays were prepared by combining all components except for enzyme and equilibrating to 25°C for five minutes. After the addition of enzyme, absorbance at 400 nm was measured for five minutes. All measurements were taken in triplicate. The final concentrations for assay components were 100 mM TAPS buffer (pH 8.5), 25 μM tetracycline, 500 μM NADPH, 16 mM MgCl2, 0.4 μM enzyme, and 0.05 – 150 μM anhydrotetracycline. A control assay using no anhydrotetracycline was assigned a concentration of 1.0×10⁻¹⁵ μM for analysis. A second control using no enzyme and 100 μM anhydrotetracycline was assigned a concentration of 1.0×10¹⁵ μM to simulate full inhibition of enzyme. IC50 values were determined by plotting the log of anhydrotetracycline concentration against v0 in GraphPad Prism 6. Functional Tet(51) expressed poorly, so Tet(51) was omitted from these and other in vitro experiments.
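The dose-response analysis can be illustrated with a minimal sketch. Because the inhibitor concentration is plotted on a logarithmic axis, a true zero concentration cannot be placed on the plot, which is presumably why the no-inhibitor and full-inhibition controls were assigned the extreme values 1.0×10⁻¹⁵ and 1.0×10¹⁵ μM. The code below is an assumption-laden illustration rather than the Prism fit used here: it simulates a simple inhibition curve with a Hill slope of 1 (an assumed value) around the Tet(55) IC50 quoted in the Results, then estimates the IC50 by interpolating on the log-concentration axis.

```python
import math


def fractional_velocity(inhibitor_um, ic50_um, hill=1.0):
    """Fraction of uninhibited velocity remaining at a given inhibitor level."""
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


def ic50_by_log_interpolation(conc_um, frac_velocity):
    """Estimate IC50 by linear interpolation on a log10 concentration axis.

    Finds the first pair of adjacent points that bracket 50% activity and
    interpolates between them. Concentrations must be positive and ascending.
    """
    points = list(zip(conc_um, frac_velocity))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi:
            t = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("data do not bracket 50% activity")


# Hypothetical concentration series within the 0.05-150 µM range used here.
conc_um = [0.05, 0.5, 5.0, 15.0, 50.0, 150.0]
observed = [fractional_velocity(c, ic50_um=25.6) for c in conc_um]

estimate = ic50_by_log_interpolation(conc_um, observed)
print(f"estimated IC50: {estimate:.1f} µM")
```

Interpolation recovers the simulated IC50 only approximately. A full four-parameter curve fit, as performed in Prism, uses all data points and also estimates the slope and plateaus.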

4.3.12 Checkerboard synergy assay

Tetracycline (1024 µg/mL) and anhydrotetracycline (256 µg/mL) were dissolved in cation-adjusted Mueller-Hinton broth supplemented with 50 µg/mL kanamycin. A twofold dilution series of each drug was made independently across 8 rows of a 96 well master plate before 100 µL of each drug dilution series were combined into a 96 well culture plate (Costar), with rows included for no-drug and no-inocula controls. A sterile 96-pin replicator (Scinomix) was used to inoculate plates with ~1 µL of E. coli MegaX (Invitrogen) expressing a tetracycline inactivating enzyme, diluted to OD600 0.1 using. Plates were sealed with Breathe-Easy membranes (Sigma-Aldrich) and incubated at 37˚C with shaking at 220 rpm. Endpoint growth was determined by OD600 at 20 and 36 hours of growth using a Synergy H1 plate reader (BioTek, Inc.). Three independent replicates were performed for each strain on separate days. Synergy of anhydrotetracycline and tetracycline combinations was determined using the fractional inhibitory concentration index (FICI) method431,

$$\mathrm{FICI} = \frac{\mathrm{MIC}_{A,\,\mathrm{combination}}}{\mathrm{MIC}_{A,\,\mathrm{alone}}} + \frac{\mathrm{MIC}_{B,\,\mathrm{combination}}}{\mathrm{MIC}_{B,\,\mathrm{alone}}}$$

where A and B denote the two drugs, FICI>1 indicates antagonism, FICI = 1 indicates additivity, and FICI <1 indicates synergy. The efficacy of the drug combination was also evaluated in the L. longbeachae background, but synergy was not observed.
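The checkerboard arithmetic can be sketched in a few lines. The example below builds the twofold dilution series from the stock concentrations given above and computes a FICI as the sum of each drug's MIC in combination divided by its MIC alone, using the interpretation thresholds stated in this section. The MIC values passed to the function are hypothetical, chosen only so that the result equals the 0.1875 reported for Tet(56); the measured MICs are not given in this section. Eight dilution steps are assumed from the eight rows described.

```python
def twofold_series(top_conc, n_steps):
    """Concentrations of a twofold dilution series, highest first."""
    return [top_conc / (2 ** i) for i in range(n_steps)]


def fici(mic_a_combo, mic_a_alone, mic_b_combo, mic_b_alone):
    """Fractional inhibitory concentration index for a two-drug combination."""
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def interpret(value):
    """Classify a FICI using the thresholds stated in this section."""
    if value < 1:
        return "synergy"
    if value > 1:
        return "antagonism"
    return "additivity"


# Dilution series from the stock concentrations in the text (µg/mL).
tc_series = twofold_series(1024, 8)
atc_series = twofold_series(256, 8)

# Hypothetical MICs (µg/mL): tetracycline 256 alone and 32 in combination,
# anhydrotetracycline 64 alone and 4 in combination.
example = fici(mic_a_combo=32, mic_a_alone=256, mic_b_combo=4, mic_b_alone=64)
print(example, interpret(example))
```

Other conventions in the literature use stricter cutoffs for synergy (for example FICI ≤ 0.5); the thresholds coded here follow the definition given in the text.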

4.4 Discussion

The widespread anthropogenic use of tetracycline antibiotics motivates the immediate study of emerging mechanisms of tetracycline resistance, such as enzymatic inactivation. Our data provide unprecedented insight into the dynamics of tetracycline-inactivating enzymes and reveal a novel mode of inhibition. Substrates like chlortetracycline are loaded into enzymes in the FAD OUT conformation through the substrate-loading channel (Supplementary Fig. 4.7a), which is open when a flexible loop is pulled away from the channel. Upon substrate binding, the enzyme converts to FAD IN, the channel closes, and catalysis can occur due to the proximity of FAD to the substrate. Mechanistic inhibitors like anhydrotetracycline also enter the enzyme through the same channel but bind at a distinct site (Supplementary Fig. 4.7b). Binding of inhibitor in this location sterically blocks FAD conversion to the IN conformation and prevents subsequent substrate binding and catalysis. Our model predicts that compounds that either bind with higher affinity to the inhibitor-binding pocket or concomitantly bind to the inhibitor- and substrate-binding sites will provide enhanced inhibition for the control of tetracycline resistance. This novel mechanism of inhibition is not only applicable to preventing antibiotic resistance but is also highly relevant to additional FAD-dependent enzymes that comprise the flavoenzyme superfamily and are of clinical interest421.

The rise in resistance to early-generation tetracyclines has spurred the development of next-generation derivatives, including tigecycline (approved for human use in 2005)134, and eravacycline and omadacycline (currently in late-stage clinical trials)139,140. These newer drugs are designed to evade resistance by efflux or ribosomal protection, but they are largely untested against tetracycline-inactivating enzymes. Alarmingly, tigecycline was found to be vulnerable to oxidative inactivation by Tet(X)385, which was recently identified for the first time in numerous pathogens of high clinical concern137,138. These challenges highlight the immediate importance of studying mechanisms of emerging tetracycline resistance, such as those described here that expand substrate scope.

Tetracycline resistance by enzymatic inactivation has thus far been rarely documented compared to resistance by efflux or ribosomal protection. Growing evidence, however, indicates that enzymatic tetracycline inactivation is a widespread feature in soil microbial communities406, and is a recently observed emerging threat in human pathogens137,138,386. Flavoenzymes display a proclivity for horizontal gene transfer and gene duplication, bestowing the potential to spread between bacteria and acquire novel functions389. Interestingly, the contigs on which tet(47-55) were discovered also contained mobility elements and other resistance genes58,406, suggesting that their original genomic context may be as part of a multidrug resistance cassette or mobile genetic element. This indicates that tetracycline-inactivating enzymes pose a threat for facile acquisition by additional human pathogens. Indeed, we show that tet(56) is present and functional in the human pathogen L. longbeachae, and tet(X) has now been reported in four out of six ESKAPE pathogens137,138,386, demonstrating the urgency of this threat. Our results reveal the structural basis for plasticity and dynamics in substrate binding in these enzymes. We propose a novel combination therapy strategy to retain tetracycline efficacy against bacteria that harbor tetracycline-inactivating resistance genes. Our results provide the structural and biochemical foundation to counter the alarming emergence of tetracycline resistance via enzymatic inactivation.

4.5 Acknowledgements

This work was done in collaboration with Jooyoung Park, Margaret R. Reck, Chanez T. Symister, Jennifer L. Elliot, Joseph P. Vogel, Timothy A. Wencewicz, Gautam Dantas, and Niraj H. Tolia. This chapter was published in Nature Chemical Biology in 2017294. J.P. designed and performed crystallographic experiments and X-ray structure determination, analyzed data, and wrote the paper; A.J.G. designed and performed in vitro and microbiological experiments, analyzed data, and wrote the paper; M.R.R. and C.T.S. performed in vitro experiments; J.L.E. performed crystallographic experiments; J.P.V. performed Legionella experiments; T.A.W., G.D., and N.H.T. designed experiments, analyzed data, and wrote the manuscript.

This work was supported by an award to N.H.T., G.D., and T.A.W. from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (R01 AI123394). A.J.G. was supported by the National Institute of General Medical Sciences Cell and Molecular Biology Training Grant (T32 GM007067).

4.6 Supplementary Figures

Supplementary Figure 4.1: Overall structures and FAD conformation states of Tet(50,51,55,56)

Crystal structures of (a) Tet(50) monomer A, (b) Tet(50) monomer B, (c) Tet(51), (d) Tet(55), and (e) Tet(56).

(f) In Tet(50) monomer A, FAD is bound non-covalently in the IN conformation, characterized by a 12.3 Å distance between the C8M and C2B atoms of the FAD molecule.

(g) In Tet(50) monomer B, FAD is bound in the OUT conformation (5.2 Å between the C8M and C2B atoms).

(h) In Tet(51), FAD is bound in the OUT conformation (4.5 Å between the C8M and C2B atoms).

(i) In Tet(55), no electron density for ordered FAD is observed.

(j) In Tet(56), FAD is bound in the OUT conformation (5.2 Å between the C8M and C2B atoms).
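The C8M to C2B separation used above to tell the two FAD states apart is an ordinary interatomic distance. The sketch below shows one way such a measurement could be scripted. The coordinates are made up for illustration, and the 8 Å cutoff is an assumption chosen to fall between the reported values (12.3 Å for IN; 4.5 to 5.2 Å for OUT); it is not a criterion taken from this work.

```python
import math


def fad_conformation(c8m_xyz, c2b_xyz, cutoff=8.0):
    """Return the C8M-C2B distance (Å) and a label based on a hypothetical cutoff."""
    distance = math.dist(c8m_xyz, c2b_xyz)
    # The reported IN state has the longer separation (12.3 Å), OUT the shorter (about 5 Å).
    return distance, ("IN" if distance > cutoff else "OUT")


# Made-up coordinates (Å) standing in for atoms read from a structure file.
extended = fad_conformation((0.0, 0.0, 0.0), (12.3, 0.0, 0.0))
folded = fad_conformation((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
print(extended, folded)
```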

Supplementary Figure 4.2: Tetracycline compounds have a distinctive three-dimensional architecture with a significant bend between rings A and B, allowing for unambiguous modeling into the electron density

(a) The Fo - Fc map (contoured at 2.0 σ) before modeling of chlortetracycline.

(b) The 2Fo - Fc map (contoured at 1.0 σ) after modeling of chlortetracycline.

(c) The Fo - Fc map (contoured at 2.0 σ) before modeling of anhydrotetracycline.

(d) The 2Fo - Fc map (contoured at 1.0 σ) after modeling of anhydrotetracycline.

Supplementary Figure 4.3: Multiple sequence alignment of Tet(47-56). Tetracycline destructases have high levels of sequence similarity in the residues important for binding of anhydrotetracycline (aTC, orange) or chlortetracycline (CTC, pink). Conserved FAD binding motif is boxed in blue.

Supplementary Figure 4.4: Anhydrotetracycline prevents enzymatic degradation of tetracycline

Supplementary Figure 4.5: Anhydrotetracycline restores tetracycline efficacy against E. coli expressing tet(50,51,55,56)

Supplementary Figure 4.6: Anhydrotetracycline synergizes with tetracycline to kill E. coli expressing tet(50,51,55,56) but not tet(X)

Supplementary Figure 4.7: Model for binding dynamics, substrate plasticity, and inhibition of tetracycline-inactivating enzymes

(a) Substrate (e.g. chlortetracycline) can enter and bind the active site of a tetracycline destructase, resulting in a conformational switch from FAD OUT (grey) to FAD IN (orange) and closure of the substrate site.

(b) A mechanistic inhibitor (e.g. anhydrotetracycline) enters and binds the active site but sterically prevents the FAD cofactor from switching from the OUT to the IN conformation, thereby preventing catalysis. Further, it can act synergistically to competitively prevent substrate from binding.

Supplementary Table 4.1: Data collection and refinement statistics

                        Tet(55)         Tet(55)         Tet(50)         Tet(51)         Tet(56)         Tet(50) + CTC   Tet(50) + aTC
                        Native          SeMet (Peak)
Data collection
Space group             P2₁2₁2          P2₁2₁2          P2₁2₁2₁         P2₁             P2₁2₁2          P2₁2₁2₁         P2₁2₁2₁
Cell dimensions
  a (Å)                 64.74           65.14           50.94           83.49           76.49           51.10           50.99
  b (Å)                 124.43          123.98          107.61          81.69           114.02          107.22          107.37
  c (Å)                 45.73           45.75           152.48          127.31          94.81           152.63          152.79
  α, β, γ (°)           90, 90, 90      90, 90, 90      90, 90, 90      90, 96.79, 90   90, 90, 90      90, 90, 90      90, 90, 90
Wavelength              1.018211        0.976289        1.000029        1.000028        1.000032        1.000031        1.000031
Resolution (Å)          20 - 2.00       20 - 1.90       20 - 2.10       20 - 1.85       20 - 3.30       20 - 1.75       20 - 2.25
                        (2.10 - 2.00)   (2.00 - 1.90)   (2.20 - 2.10)   (1.95 - 1.85)   (3.30 - 3.20)   (1.85 - 1.75)   (2.35 - 2.25)
Rmeas                   11.0%           6.4%            10.8%           9.7%            13.2%           8.6%            11.5%
                        (96.4%)         (59.7%)         (71.6%)         (79.7%)         (119.2%)        (86.5%)         (68.7%)
I/σI                    16.14           17.45           15.07           11.93           10.76           19.31           13.57
                        (2.24)          (2.57)          (2.47)          (1.90)          (1.16)          (2.35)          (2.37)
CC1/2                   99.9%           99.9%           99.8%           99.8%           99.6%           99.9%           99.7%
                        (70.5%)         (79.5%)         (83.1%)         (69.7%)         (48.7%)         (76.4%)         (88.0%)
Completeness (%)        99.7            99.6            98.1            99.7            98.5            98.7            98.6
                        (98.8)          (98.0)          (99.3)          (99.6)          (94.0)          (98.1)          (92.7)
Redundancy              7.17            3.79            5.70            3.77            3.60            7.42            4.93
                        (6.85)          (3.59)          (5.55)          (3.67)          (3.22)          (7.37)          (4.92)

Refinement
Resolution (Å)          20 - 2.00       -               20 - 2.10       20 - 1.85       20 - 3.30       20 - 1.75       20 - 2.25
No. reflections         25,629          -               48,794          144,460         12,803          84,325          39,793
Rwork/Rfree             19.25/23.86     -               22.66/26.21     16.77/19.58     23.96/29.55     17.80/21.90     20.73/25.39
No. atoms
  Protein               3,284           -               6,641           13,095          5,819           6,733           6,697
  Ligand/ion            5               -               136             256             116             181             182
  Water                 187             -               376             942             0               444             337
B-factors
  Protein               30.24           -               29.97           27.94           97.42           23.42           30.66
  Ligand/ion            28.36           -               36.23           17.78           82.54           29.83           39.44
  Water                 30.51           -               39.75           32.00           -               26.88           28.79
R.m.s. deviations
  Bond lengths (Å)      0.005           -               0.003           0.011           0.002           0.011           0.005
  Bond angles (°)       0.797           -               0.547           1.186           0.645           1.273           0.789

CTC, chlortetracycline; aTC, anhydrotetracycline.

Supplementary Table 4.2: Kinetic parameters for Tet(50,55,56,X)

          Tetracycline                                                  Chlortetracycline
Enzyme    KM (µM)         Vmax (s⁻¹)              kcat/KM (µM⁻¹ s⁻¹)    KM (µM)         Vmax (s⁻¹)              kcat/KM (µM⁻¹ s⁻¹)
Tet(50)   16.84 ± 3.555   0.02852 ± 0.001555      0.004234              6.298 ± 2.035   0.02302 ± 0.001634      0.009138
Tet(55)   4.562 ± 1.684   0.01233 ± 0.0009667     0.006757              6.001 ± 1.315   0.0194 ± 0.0002397      0.008082
Tet(56)   7.665 ± 1.630   0.0425 ± 0.002051       0.013862              3.654 ± 1.080   0.04389 ± 0.002619      0.030029
Tet(X)    10.77 ± 2.613   0.00448 ± 0.0002618     0.00104               7.878 ± 2.668   0.00598 ± 0.0004613     0.001898
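The tabulated kcat/KM values can be reproduced from the other two columns if kcat is taken as Vmax divided by an enzyme concentration of 0.4 µM, the concentration reported for the inhibition assay in this chapter. That relationship is an inference from the numbers rather than a statement in the text, so the sketch below should be read as a consistency check under that assumption. The function name and data layout are hypothetical.

```python
ENZYME_UM = 0.4  # assumed enzyme concentration (µM); taken from the inhibition assay

# (enzyme, substrate): (KM in µM, Vmax as tabulated, kcat/KM as tabulated)
TABLE = {
    ("Tet(50)", "tetracycline"): (16.84, 0.02852, 0.004234),
    ("Tet(55)", "tetracycline"): (4.562, 0.01233, 0.006757),
    ("Tet(56)", "tetracycline"): (7.665, 0.0425, 0.013862),
    ("Tet(X)", "tetracycline"): (10.77, 0.00448, 0.00104),
    ("Tet(50)", "chlortetracycline"): (6.298, 0.02302, 0.009138),
    ("Tet(55)", "chlortetracycline"): (6.001, 0.0194, 0.008082),
    ("Tet(56)", "chlortetracycline"): (3.654, 0.04389, 0.030029),
    ("Tet(X)", "chlortetracycline"): (7.878, 0.00598, 0.001898),
}


def catalytic_efficiency(km_um, vmax, enzyme_um=ENZYME_UM):
    """kcat/KM with kcat = Vmax / [E]."""
    return vmax / (enzyme_um * km_um)


recomputed = {key: catalytic_efficiency(km, vmax) for key, (km, vmax, _) in TABLE.items()}
for key, value in recomputed.items():
    print(key, f"{value:.6f}", "tabulated:", TABLE[key][2])
```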

Supplementary Table 4.3: Relevant strains, plasmids, and primers employed in this study

Strains, Plasmids, Primers Reference or source

Strains

Lp02 Legionella pneumophila Philadelphia I 424

JV595 Legionella longbeachae ATCC 33462

JV8858 Legionella longbeachae ∆tet(56) This study

JV8861 JV595 + pJB7207 This study

JV8864 JV595 + pJB1625 This study

JV8868 JV8858 + pJB7207 This study

JV8870 JV8858 + pJB1625 This study

JV8874 Lp02 + pJB7207 This study

JV8876 Lp02 + pJB1625 This study

Plasmids

pSR47S Suicide plasmid R6K suicide vector (KanR sacB) 422

pJB1625 Complementing vector (CmR version of pJB908) 425

pJB7204 ∆tet(56) suicide plasmid This study

pJB7207 tet(56) complementing clone This study

Primers

JVP2910 GCAGCGGCCGCGCCAATGACGAGAATTTTGATATTTTTAGAC

JVP2911 GCAGCGGCCGCCAATTTCGAATGGGATTACCTTACCTC

JVP2912 GCAGAGCTCGCTTAATGATATCAAGATTAATACAATTCCAATCC

JVP2913 GCAGTCGACGCTCAATTGTATGTTCGTTATGAAGATGGG

JVP2921 CCAGGATCCTAAGAGGAGAAATTAACTATGTCTAAAAATATCAAAATTCTCGTC

JVP2922 CCAGTCGACGTCCACTATGATGATTCATATTGAGG

Chapter 5: Conclusions

5.1 The effects of early life antibiotic therapy on the preterm infant gut microbiota and resistome

A reservoir of antibiotic resistance genes of critical interest lies in human associated microbes, encompassing both benign or commensal organisms and pathogens. The human gut hosts a diverse commensal microbial community, whose collective genomes (the microbiome) encode an arsenal of functions, including vital activities that modulate host health and physiology as well as antibiotic resistance genes295. During the first years of life, the infant gut microbiota plays crucial roles in pathogen exclusion, nutrient acquisition, and immune system maturation295,432. Microbiota development can be altered by factors including gestational age at birth, delivery and feeding modes, and antibiotic exposure7,432,433. Perturbation of the gut microbiota during early infancy may be disproportionately damaging272,316, and has been linked to a variety of morbidities such as metabolic and immune disorders263,264,295,298. This is particularly relevant for preterm infants, who almost all receive antibiotics in the first days of life249,272,316.

The infant gut microbiota is shaped by both prenatal and postnatal factors246,247. Following birth, the infant gut microbiota undergoes a patterned maturation process248-250. Perturbation of the gut microbiota during this key developmental window can have lasting effects on host physiology and disease risk254-259. One of the most common perturbations during this period, antibiotic therapy258,260, can substantially alter the gut microbiota and infant physiology255,261-266. Antibiotic-induced gut microbiota dysbioses in this population are linked to the pathogenesis of necrotizing enterocolitis270-272, late onset sepsis273,274, and other adverse health outcomes275. While these correlations exist, the underlying etiologies remain unclear, motivating the study of microbiota development in the context of antibiotic therapy in this vulnerable population276.

During early-life hospitalization in the NICU, the preterm infant gut microbiota is characterized by an extensive antibiotic resistome and a high relative proportion of MDROs, which are further enriched following antibiotic exposure7. However, the persistence of these microbiome disruptions is unclear. To this end, I conducted a longitudinal sequencing- and culture-based interrogation of the preterm infant gut microbiota compared to age-matched, antibiotic-naïve near-term infant controls to quantify the effects of early life antibiotic therapy on microbiota and resistome development over the first twenty months of life. Using random forests, I trained a model of healthy microbiota development in the absence of early life hospitalization and antibiotic treatment. I applied this model to preterm infants in my cohort and identified acute but transient microbiota immaturity that resolves concomitant with discharge from the NICU. By sequencing isolates cultured from infant stools at multiple timepoints, I identified prolonged and persistent carriage of multidrug-resistant Enterobacteriaceae in the gut microbiota of preterm infants. This suggests that these potentially pathogenic strains establish a gastrointestinal niche while infants are hospitalized and retain that niche following discharge to home. To characterize the gut-resident resistome of the infants in my cohort, I used functional metagenomics to assay 320 Gb of metagenomic DNA for resistance function in high throughput. I assembled and annotated 874 unique antibiotic resistance genes from these microbiota. After computing the relative abundance of these genes across all microbiota in my cohort, I identified a persistently enriched resistome as well as perturbed patterns of resistome assembly in the gut microbiota of preterm infants over the first twenty months of life. Lastly, to synthesize my findings on enduring effects of early life antibiotic treatment on the preterm infant gut microbiota, I used a support vector machine to classify post-discharge samples as originating from an antibiotic-treated preterm infant or an antibiotic-naïve near-term infant based on the metagenomic features present in the stool. I was able to classify with high accuracy the group to which a stool belonged despite being unable to distinguish these stools based on species composition alone. This provides evidence of a persistent metagenomic signature of early life antibiotic treatment in preterm infants that lasts beyond discharge from the NICU and may play a role in long-term adverse health outcomes in preterm infants for which the etiology is currently unclear.

This work has the potential to improve antibiotic stewardship in NICU settings, as well as inform antibiotic treatment strategies in neonatal settings such that infections are appropriately treated while microbiota diversity is spared. My results underscore the need for narrow-spectrum alternatives to the antibiotic regimens currently used widely in neonatal medicine. While it is possible that the persistent metagenomic signatures of early life antibiotic treatment that I identified play a role in the long-term neurological, metabolic, and immune pathologies from which preterm infants suffer, further experiments using model systems, including gnotobiotic animals, are required to substantiate this link. Additional future work in this space could

5.2 Emergence of tetracycline resistance by inactivation across habitats

Tetracycline resistance is typically acquired via horizontal gene transfer and occurs almost exclusively by ribosomal protection or antibiotic efflux368,371. A third mechanism, enzymatic inactivation of tetracycline, is rarely observed, with Tet(X) being the only previously characterized example28,401,434. Both common tetracycline resistance mechanisms have their evolutionary origins in the environment25,94,376, but are now widely distributed in many commensal and pathogenic bacteria4,23,122. Tetracycline antibiotics have been widely used in clinical, agricultural, and industrial settings for the past seven decades, and medicinal chemists and pharmaceutical companies continue to revisit the tetracycline scaffold as a source for new antibiotics. These new developments include tigecycline (approved for human use in 2005)134, eravacycline383 (approved for human use in 2018), and omadacycline (currently in late-stage clinical trials)140.

We recently discovered a novel family of ten tetracycline-inactivating enzymes from functional selection of grassland and agricultural soil metagenomes58,406. The set included a functionally validated but previously unknown tetracycline destructase homolog from the human pathogen Legionella longbeachae. These genes were annotated as flavin adenine dinucleotide (FAD)-dependent oxidoreductases. The tetracycline destructases confer high-level resistance against tetracycline and oxytetracycline when heterologously expressed in E. coli, and degrade tetracycline in vitro with a dependence on time, enzyme, and reduced nicotinamide adenine dinucleotide phosphate (NADPH). Furthermore, tetracycline inactivation by the tetracycline destructases proceeds via diverse mechanisms, producing decay products not seen in tetracycline inactivation by Tet(X)28,387,406. The tetracycline destructases are likely candidates for dissemination to the clinic and expansion of substrate tolerance, potentially compromising next-generation tetracyclines139 and motivating deep mechanistic analysis of their activities and surveillance of their prevalence across microbial habitats.

This work has importantly identified tetracycline resistance by inactivation as a mechanism of resistance that is more widespread in commensal, environmental, and pathogenic organisms than had previously been appreciated. By elucidating the mechanism of action of these enzymes prior to their broad clinical dissemination, we were able to rationally devise an inhibitor-based strategy to counter the action of these resistance genes. I described anhydrotetracycline, an inhibitor that blocks tetracycline degradation by tetracycline-inactivating enzymes in vitro and is sufficient to restore tetracycline efficacy against resistant bacteria that encode a tetracycline inactivator. Using biochemistry and structural biology, we revealed a twofold mode of inhibition in which anhydrotetracycline both competes with tetracycline for binding in the active site and restricts critical dynamics of the FAD cofactor, thus halting the catalytic cycle. Should tetracycline resistance by inactivation expand to become a dominant mechanism of resistance to tetracycline antibiotics, coadministration of anhydrotetracycline and tetracycline (or derivatives thereof) is a promising method for circumventing this resistance threat. Additionally, and more broadly, this work has established and expanded a paradigm in which surveillance of environmental resistance threats allows for proactive development of strategies to mitigate the eventual clinical emergence of that resistance. In the face of increasing antibiotic resistance and decreasing antibiotic development, this model holds promise as a means for preemptive and sustainable development of novel approaches to address resistance threats before they are endemic.

References

1 Lewis, K. Platforms for antibiotic discovery. Nat Rev Drug Discov 12, 371-387, doi:10.1038/nrd3975 (2013).
2 Walsh, C. T. & Wencewicz, T. A. Prospects for new antibiotics: a molecule-centered perspective. J Antibiot (Tokyo) 67, 7-22, doi:10.1038/ja.2013.49 (2014).
3 Pehrsson, E. C. et al. Interconnected microbiomes and resistomes in low-income human habitats. Nature 533, 212-216, doi:10.1038/nature17672 (2016).
4 Forsberg, K. J. et al. The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 1107-1111, doi:10.1126/science.1220761 (2012).
5 UniProt: a hub for protein information. Nucleic Acids Res 43, D204-212, doi:10.1093/nar/gku989 (2015).
6 Wetterstrand, K. A. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP), <www.genome.gov/sequencingcostsdata> (
7 Gibson, M. K. et al. Developmental dynamics of the preterm infant gut microbiota and antibiotic resistome. Nat Microbiol 1, 16024, doi:10.1038/nmicrobiol.2016.24 (2016).
8 Sommer, M. O., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128-1131, doi:10.1126/science.1176950 (2009).
9 Aminov, R. I. A brief history of the antibiotic era: lessons learned and challenges for the future. Front Microbiol 1, 134, doi:10.3389/fmicb.2010.00134 (2010).
10 Brown, E. D. & Wright, G. D. Antibacterial drug discovery in the resistance era. Nature 529, 336-343, doi:10.1038/nature17042 (2016).
11 Van Boeckel, T. P. et al. Global antibiotic consumption 2000 to 2010: an analysis of national pharmaceutical sales data. Lancet Infect Dis 14, 742-750, doi:10.1016/s1473-3099(14)70780-7 (2014).
12 Wright, P. M., Seiple, I. B. & Myers, A. G. The evolving role of chemical synthesis in antibacterial drug discovery. Angew Chem Int Ed Engl 53, 8840-8869, doi:10.1002/anie.201310843 (2014).
13 Walsh, C. Antibiotics: actions, origins, resistance. (ASM Press, 2003).
14 Tyndall, J. Observations on the Optical Deportment of the Atmosphere in Reference to the Phenomena of Putrefaction and Infection. Br Med J 1, 121-124 (1876).
15 D'Costa, V. M. et al. Antibiotic resistance is ancient. Nature 477, 457-461, doi:10.1038/nature10388 (2011).
16 Bhullar, K. et al. Antibiotic resistance is prevalent in an isolated cave microbiome. PLoS ONE 7, e34953, doi:10.1371/journal.pone.0034953 (2012).
17 Clemente, J. C. et al. The microbiome of uncontacted Amerindians. Sci Adv 1, doi:10.1126/sciadv.1500183 (2015).

18 Wright, G. D. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Microbiol 5, 175-186, doi:10.1038/nrmicro1614 (2007).
19 D'Costa, V. M., McGrann, K. M., Hughes, D. W. & Wright, G. D. Sampling the antibiotic resistome. Science 311, 374-377, doi:10.1126/science.1120800 (2006).
20 Benveniste, R. & Davies, J. Aminoglycoside Antibiotic-Inactivating Enzymes in Actinomycetes Similar to Those Present in Clinical Isolates of Antibiotic-Resistant Bacteria. Proc Natl Acad Sci U S A 70, 2276-2280 (1973).
21 Knapp, C. W., Dolfing, J., Ehlert, P. A. & Graham, D. W. Evidence of increasing antibiotic resistance gene abundances in archived soils since 1940. Environ Sci Technol 44, 580-587, doi:10.1021/es901221x (2010).
22 O'Neill, J. Tackling drug-resistant infections globally: Final report and recommendations. (2016).
23 Moore, A. M. et al. Pediatric fecal microbiota harbor diverse and novel antibiotic resistance genes. PLoS ONE 8, e78822, doi:10.1371/journal.pone.0078822 (2013).
24 Yim, G., Wang, H. H. & Davies, J. Antibiotics as signalling molecules. Philos Trans R Soc Lond B Biol Sci 362, 1195-1200, doi:10.1098/rstb.2007.2044 (2007).
25 Davies, J. & Davies, D. Origins and evolution of antibiotic resistance. Microbiol Mol Biol Rev 74, 417-433, doi:10.1128/mmbr.00016-10 (2010).
26 Walsh, C. Molecular mechanisms that confer antibacterial drug resistance. Nature 406, 775-781, doi:10.1038/35021219 (2000).
27 Shaw, W. V. et al. Primary structure of a chloramphenicol acetyltransferase specified by R plasmids. Nature 282, 870-872 (1979).
28 Yang, W. et al. TetX is a flavin-dependent monooxygenase conferring resistance to tetracycline antibiotics. J Biol Chem 279, 52346-52352, doi:10.1074/jbc.M409573200 (2004).
29 Forsberg, K. J., Patel, S., Wencewicz, T. A. & Dantas, G. The Tetracycline Destructases: A Novel Family of Tetracycline-Inactivating Enzymes. Chem Biol 22, 888-897, doi:10.1016/j.chembiol.2015.05.017 (2015).
30 Neu, H. C. Effect of beta-lactamase location in Escherichia coli on penicillin synergy. Appl Microbiol 17, 783-786 (1969).
31 Bush, K. Bench-to-bedside review: The role of beta-lactamases in antibiotic-resistant Gram-negative infections. Crit Care 14, 224, doi:10.1186/cc8892 (2010).
32 Courvalin, P. Vancomycin resistance in gram-positive cocci. Clin Infect Dis 42 Suppl 1, S25-34, doi:10.1086/491711 (2006).
33 Katayama, Y., Ito, T. & Hiramatsu, K. A new class of genetic element, staphylococcus cassette chromosome mec, encodes methicillin resistance in Staphylococcus aureus. Antimicrob Agents Chemother 44, 1549-1555 (2000).

34 Ogawa, W., Onishi, M., Ni, R., Tsuchiya, T. & Kuroda, T. Functional study of the novel multidrug efflux pump KexD from Klebsiella pneumoniae. Gene 498, 177-182, doi:10.1016/j.gene.2012.02.008 (2012).
35 Ruggerone, P., Murakami, S., Pos, K. M. & Vargiu, A. V. RND efflux pumps: structural information translated into function and inhibition mechanisms. Curr Top Med Chem 13, 3079-3100 (2013).
36 Hinchliffe, P., Symmons, M. F., Hughes, C. & Koronakis, V. Structure and operation of bacterial tripartite pumps. Annu Rev Microbiol 67, 221-242, doi:10.1146/annurev-micro-092412-155718 (2013).
37 Piddock, L. J. Clinically relevant chromosomally encoded multidrug resistance efflux pumps in bacteria. Clin Microbiol Rev 19, 382-402, doi:10.1128/cmr.19.2.382-402.2006 (2006).
38 Roberts, M. C. Tetracycline resistance determinants: mechanisms of action, regulation of expression, genetic mobility, and distribution. FEMS Microbiol Rev 19, 1-24 (1996).
39 Tamber, S. & Hancock, R. E. On the mechanism of solute uptake in Pseudomonas. Front Biosci 8, s472-483 (2003).
40 Poulou, A. et al. Outbreak caused by an ertapenem-resistant, CTX-M-15-producing Klebsiella pneumoniae sequence type 101 clone carrying an OmpK36 porin variant. J Clin Microbiol 51, 3176-3182, doi:10.1128/jcm.01244-13 (2013).
41 Babouee Flury, B. et al. Association of Novel Nonsynonymous Single Nucleotide Polymorphisms in ampD with Cephalosporin Resistance and Phylogenetic Variations in ampC, ampR, ompF, and ompC in Enterobacter cloacae Isolates That Are Highly Resistant to Carbapenems. Antimicrob Agents Chemother 60, 2383-2390, doi:10.1128/aac.02835-15 (2016).
42 Gradmann, C. Re-Inventing Infectious Disease: Antibiotic Resistance and Drug Development at the Bayer Company 1945-80. Med Hist 60, 155-180, doi:10.1017/mdh.2016.2 (2016).
43 Fleming, A. in Nobel Lectures, Physiology or Medicine 1942-1962 83-93 (1945).
44 Abraham, E. P. & Chain, E. An enzyme from bacteria able to destroy penicillin. 1940. Rev Infect Dis 10, 677-678 (1988).
45 Chambers, H. F. & Deleo, F. R. Waves of resistance: Staphylococcus aureus in the antibiotic era. Nat Rev Microbiol 7, 629-641, doi:10.1038/nrmicro2200 (2009).
46 Akiba, T., Koyama, K., Ishiki, Y., Kimura, S. & Fukushima, T. On the mechanism of the development of multiple-drug-resistant clones of Shigella. Jpn J Microbiol 4, 219-227 (1960).
47 Rappe, M. S. & Giovannoni, S. J. The uncultured microbial majority. Annu Rev Microbiol 57, 369-394, doi:10.1146/annurev.micro.57.030502.090759 (2003).
48 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, 13219, doi:10.1038/ncomms13219 (2016).

49 Adu-Oppong, B., Gasparrini, A. J. & Dantas, G. Genomic and functional techniques to mine the microbiome for novel antimicrobials and antimicrobial resistance genes. Ann N Y Acad Sci, doi:10.1111/nyas.13257 (2016).
50 Rondon, M. R. et al. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66, 2541-2547 (2000).
51 Martinez, J. L. & Baquero, F. Mutation frequencies and antibiotic resistance. Antimicrob Agents Chemother 44, 1771-1777 (2000).
52 Pino, M., Power, P., Gutkind, G. & Di Conza, J. A. INQ-1, a chromosome-encoded AmpC beta-lactamase from Inquilinus limosus. J Antimicrob Chemother 69, 560-562, doi:10.1093/jac/dkt378 (2014).
53 McGrath, M., Gey van Pittius, N. C., van Helden, P. D., Warren, R. M. & Warner, D. F. Mutation rate and the emergence of drug resistance in Mycobacterium tuberculosis. J Antimicrob Chemother 69, 292-302, doi:10.1093/jac/dkt364 (2014).
54 Martinez, J. L., Baquero, F. & Andersson, D. I. Predicting antibiotic resistance. Nat Rev Microbiol 5, 958-965, doi:10.1038/nrmicro1796 (2007).
55 Martinez, J. L., Coque, T. M. & Baquero, F. What is a resistance gene? Ranking risk in resistomes. Nat Rev Microbiol 13, 116-123, doi:10.1038/nrmicro3399 (2015).
56 Lam, K. N., Cheng, J., Engel, K., Neufeld, J. D. & Charles, T. C. Current and future resources for functional metagenomics. Front Microbiol 6, 1196, doi:10.3389/fmicb.2015.01196 (2015).
57 Pehrsson, E. C., Forsberg, K. J., Gibson, M. K., Ahmadi, S. & Dantas, G. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs. Front Microbiol 4, 145, doi:10.3389/fmicb.2013.00145 (2013).
58 Forsberg, K. J. et al. Bacterial phylogeny structures soil resistomes across habitats. Nature 509, 612-616, doi:10.1038/nature13377 (2014).
59 Craig, J. W., Chang, F. Y., Kim, J. H., Obiajulu, S. C. & Brady, S. F. Expanding Small-Molecule Functional Metagenomics through Parallel Screening of Broad-Host-Range Cosmid Environmental DNA Libraries in Diverse Proteobacteria. Appl Environ Microbiol 76, 1633-1641, doi:10.1128/aem.02169-09 (2010).
60 Martinez, A. et al. Genetically Modified Bacterial Strains and Novel Bacterial Artificial Chromosome Shuttle Vectors for Constructing Environmental Libraries and Detecting Heterologous Natural Products in Multiple Expression Hosts. Appl Environ Microbiol 70, 2452-2463, doi:10.1128/aem.70.4.2452-2463.2004 (2004).
61 Stokes, J. M. et al. Cold Stress Makes Escherichia coli Susceptible to Glycopeptide Antibiotics by Altering Outer Membrane Integrity. Cell Chem Biol 23, 267-277, doi:10.1016/j.chembiol.2015.12.011 (2016).
62 Liebl, W. et al. Alternative hosts for functional (meta)genome analysis. Appl Microbiol Biotechnol 98, 8099-8109, doi:10.1007/s00253-014-5961-7 (2014).

63 Siefert, J. L. Defining the mobilome. Methods Mol Biol 532, 13-27, doi:10.1007/978-1-60327-853-9_2 (2009).
64 Spanogiannopoulos, P., Waglechner, N., Koteva, K. & Wright, G. D. A rifamycin inactivating phosphotransferase family shared by environmental and pathogenic bacteria. Proc Natl Acad Sci U S A 111, 7102-7107, doi:10.1073/pnas.1402358111 (2014).
65 Torres-Cortes, G. et al. Characterization of novel antibiotic resistance genes identified by functional metagenomics on soil samples. Environ Microbiol 13, 1101-1114, doi:10.1111/j.1462-2920.2010.02422.x (2011).
66 Riesenfeld, C. S., Goodman, R. M. & Handelsman, J. Uncultured soil bacteria are a reservoir of new antibiotic resistance genes. Environ Microbiol 6, 981-989, doi:10.1111/j.1462-2920.2004.00664.x (2004).
67 Allen, H. K., Moe, L. A., Rodbumrer, J., Gaarder, A. & Handelsman, J. Functional metagenomics reveals diverse beta-lactamases in a remote Alaskan soil. ISME J 3, 243-251, doi:10.1038/ismej.2008.86 (2009).
68 Lang, K. S. et al. Novel florfenicol and chloramphenicol resistance gene discovered in Alaskan soil by using functional metagenomics. Appl Environ Microbiol 76, 5321-5326, doi:10.1128/aem.00323-10 (2010).
69 Perron, G. G. et al. Functional characterization of bacteria isolated from ancient arctic soil exposes diverse resistance mechanisms to modern antibiotics. PLoS ONE 10, e0069533, doi:10.1371/journal.pone.0069533 (2015).
70 Donato, J. J. et al. Metagenomic analysis of apple orchard soil reveals antibiotic resistance genes encoding predicted bifunctional proteins. Appl Environ Microbiol 76, 4396-4401, doi:10.1128/aem.01763-09 (2010).
71 Tao, W., Lee, M. H., Wu, J., Kim, N. H. & Lee, S. W. Isolation and characterization of a family VII esterase derived from alluvial soil metagenomic library. J Microbiol 49, 178-185, doi:10.1007/s12275-011-1102-5 (2011).
72 Tao, W. et al. Characterization of two metagenome-derived esterases that reactivate chloramphenicol by counteracting chloramphenicol acetyltransferase. J Microbiol Biotechnol 21, 1203-1210 (2011).
73 Jeon, J. H. et al. Novel metagenome-derived carboxylesterase that hydrolyzes beta-lactam antibiotics. Appl Environ Microbiol 77, 7830-7836, doi:10.1128/aem.05363-11 (2011).
74 McGarvey, K. M., Queitsch, K. & Fields, S. Wide variation in antibiotic resistance proteins identified by functional metagenomic screening of a soil DNA library. Appl Environ Microbiol 78, 1708-1714, doi:10.1128/aem.06759-11 (2012).
75 Liu, Y. Y. et al. Emergence of plasmid-mediated colistin resistance mechanism MCR-1 in animals and human beings in China: a microbiological and molecular biological study. Lancet Infect Dis 16, 161-168, doi:10.1016/s1473-3099(15)00424-7 (2016).

76 Carnevali, C. et al. Occurrence of MCR-1 colistin-resistant Salmonella isolates recovered from human and animals in Italy, 2012-2015. Antimicrob Agents Chemother, doi:10.1128/aac.01803-16 (2016).
77 Ortega-Paredes, D., Barba, P. & Zurita, J. Colistin-resistant Escherichia coli clinical isolate harbouring the mcr-1 gene in Ecuador. Epidemiol Infect 144, 2967-2970, doi:10.1017/s0950268816001369 (2016).
78 Teo, J. Q. et al. mcr-1 in Multidrug-Resistant blaKPC-2-Producing Clinical Enterobacteriaceae Isolates in Singapore. Antimicrob Agents Chemother 60, 6435-6437, doi:10.1128/aac.00804-16 (2016).
79 Fernandes, M. R. et al. First Report of the Globally Disseminated IncX4 Plasmid Carrying the mcr-1 Gene in a Colistin-Resistant Escherichia coli Sequence Type 101 Isolate from a Human Infection in Brazil. Antimicrob Agents Chemother 60, 6415-6417, doi:10.1128/aac.01325-16 (2016).
80 Delgado-Blas, J. F., Ovejero, C. M., Abadia-Patino, L. & Gonzalez-Zorn, B. Coexistence of mcr-1 and blaNDM-1 in Escherichia coli from Venezuela. Antimicrob Agents Chemother 60, 6356-6358, doi:10.1128/aac.01319-16 (2016).
81 Kline, K. E. et al. Investigation of First Identified mcr-1 Gene in an Isolate from a U.S. Patient - Pennsylvania, 2016. MMWR Morb Mortal Wkly Rep 65, 977-978, doi:10.15585/mmwr.mm6536e2 (2016).
82 Wong, S. C. et al. Colistin-Resistant Enterobacteriaceae Carrying the mcr-1 Gene among Patients in Hong Kong. Emerg Infect Dis 22, 1667-1669, doi:10.3201/eid2209.160091 (2016).
83 Brauer, A. et al. Plasmid with colistin resistance gene mcr-1 in ESBL-producing Escherichia coli strains isolated from pig slurry in Estonia. Antimicrob Agents Chemother, doi:10.1128/aac.00443-16 (2016).
84 von Wintersdorff, C. J. et al. Detection of the plasmid-mediated colistin-resistance gene mcr-1 in faecal metagenomes of Dutch travellers. J Antimicrob Chemother, doi:10.1093/jac/dkw328 (2016).
85 Sanders, C. C. & Sanders, W. E. Emergence of Resistance to Cefamandole: Possible Role of Cefoxitin-Inducible Beta-Lactamases. Antimicrob Agents Chemother 15, 792-797 (1979).
86 Perez, F., Endimiani, A., Hujer, K. M. & Bonomo, R. A. The continuing challenge of ESBLs. Curr Opin Pharmacol 7, 459-469, doi:10.1016/j.coph.2007.08.003 (2007).
87 Potter, R. F., D’Souza, A. W. & Dantas, G. The rapid spread of carbapenem-resistant Enterobacteriaceae. Drug Resistance Updates 29, 30-46, doi:10.1016/j.drup.2016.09.002 (2016).
88 Yong, D. et al. Characterization of a new metallo-beta-lactamase gene, bla(NDM-1), and a novel erythromycin esterase gene carried on a unique genetic structure in Klebsiella pneumoniae sequence type 14 from India. Antimicrob Agents Chemother 53, 5046-5054, doi:10.1128/aac.00774-09 (2009).

89 Castanheira, M. et al. Early dissemination of NDM-1- and OXA-181-producing Enterobacteriaceae in Indian hospitals: report from the SENTRY Antimicrobial Surveillance Program, 2006-2007. Antimicrob Agents Chemother 55, 1274-1278, doi:10.1128/aac.01497-10 (2011).
90 Kumarasamy, K. K. et al. Emergence of a new antibiotic resistance mechanism in India, Pakistan, and the UK: a molecular, biological, and epidemiological study. Lancet Infect Dis 10, 597-602, doi:10.1016/s1473-3099(10)70143-2 (2010).
91 Yigit, H. et al. Novel carbapenem-hydrolyzing beta-lactamase, KPC-1, from a carbapenem-resistant strain of Klebsiella pneumoniae. Antimicrob Agents Chemother 45, 1151-1161, doi:10.1128/aac.45.4.1151-1161.2001 (2001).
92 Pesesky, M. W. et al. KPC and NDM-1 genes in related Enterobacteriaceae strains and plasmids from Pakistan and the United States. Emerg Infect Dis 21, 1034-1037, doi:10.3201/eid2106.141504 (2015).
93 Poirel, L., Rodriguez-Martinez, J. M., Mammeri, H., Liard, A. & Nordmann, P. Origin of plasmid-mediated quinolone resistance determinant QnrA. Antimicrob Agents Chemother 49, 3523-3525, doi:10.1128/aac.49.8.3523-3525.2005 (2005).
94 Allen, H. K. et al. Call of the wild: antibiotic resistance genes in natural environments. Nat Rev Microbiol 8, 251-259, doi:10.1038/nrmicro2312 (2010).
95 Robicsek, A. et al. Fluoroquinolone-modifying enzyme: a new adaptation of a common aminoglycoside acetyltransferase. Nat Med 12, 83-88, doi:10.1038/nm1347 (2006).
96 Jacoby, G. A., Strahilevitz, J. & Hooper, D. C. Plasmid-mediated quinolone resistance. Microbiol Spectr 2, doi:10.1128/microbiolspec.PLAS-0006-2013 (2014).
97 Li, J. et al. Colistin: the re-emerging antibiotic for multidrug-resistant Gram-negative bacterial infections. Lancet Infect Dis 6, 589-601, doi:10.1016/s1473-3099(06)70580-1 (2006).
98 Zheng, B. et al. Coexistence of MCR-1 and NDM-1 in Clinical Escherichia coli Isolates. Clin Infect Dis 63, 1393-1395, doi:10.1093/cid/ciw553 (2016).
99 Sun, J. et al. Co-transfer of blaNDM-5 and mcr-1 by an IncX3-X4 hybrid plasmid in Escherichia coli. Nat Microbiol 1, 16176, doi:10.1038/nmicrobiol.2016.176 (2016).
100 Diaz-Torres, M. L. et al. Novel tetracycline resistance determinant from the oral metagenome. Antimicrob Agents Chemother 47, 1430-1432 (2003).
101 Diaz-Torres, M. L. et al. Determining the antibiotic resistance potential of the indigenous oral microbiota of humans using a metagenomic approach. FEMS Microbiol Lett 258, 257-262, doi:10.1111/j.1574-6968.2006.00221.x (2006).
102 Cheng, G. et al. Functional screening of antibiotic resistance genes from human gut microbiota reveals a novel gene fusion. FEMS Microbiol Lett 336, 11-16, doi:10.1111/j.1574-6968.2012.02647.x (2012).
103 Moore, A. M. et al. Gut resistome development in healthy twin pairs in the first year of life. Microbiome 3, 27, doi:10.1186/s40168-015-0090-9 (2015).

104 Fouhy, F. et al. Identification of aminoglycoside and beta-lactam resistance genes from within an infant gut functional metagenomic library. PLoS ONE 9, e108016, doi:10.1371/journal.pone.0108016 (2014).
105 Pal, C., Bengtsson-Palme, J., Kristiansson, E. & Larsson, D. G. The structure and diversity of human, animal and environmental resistomes. Microbiome 4, 54, doi:10.1186/s40168-016-0199-5 (2016).
106 Gonzales, P. R. et al. Synergistic, collaterally sensitive beta-lactam combinations suppress resistance in MRSA. Nat Chem Biol 11, 855-861, doi:10.1038/nchembio.1911 (2015).
107 Baym, M., Stone, L. K. & Kishony, R. Multidrug evolutionary strategies to reverse antibiotic resistance. Science 351, aad3292, doi:10.1126/science.aad3292 (2016).
108 Stone, L. K. et al. Compounds that select against the tetracycline-resistance efflux pump. Nat Chem Biol 12, 902-904, doi:10.1038/nchembio.2176 (2016).
109 Lomovskaya, O., Zgurskaya, H. I., Totrov, M. & Watkins, W. J. Waltzing transporters and 'the dance macabre' between humans and bacteria. Nat Rev Drug Discov 6, 56-65, doi:10.1038/nrd2200 (2007).
110 Drawz, S. M., Papp-Wallace, K. M. & Bonomo, R. A. New beta-lactamase inhibitors: a therapeutic renaissance in an MDR world. Antimicrob Agents Chemother 58, 1835-1846, doi:10.1128/aac.00826-13 (2014).
111 Wright, G. D. Antibiotic Adjuvants: Rescuing Antibiotics from Resistance. Trends Microbiol, doi:10.1016/j.tim.2016.06.009 (2016).
112 Palmer, A. C., Angelino, E. & Kishony, R. Chemical decay of an antibiotic inverts selection for resistance. Nat Chem Biol 6, 105-107, doi:10.1038/nchembio.289 (2010).
113 Ejim, L. et al. Combinations of antibiotics and nonantibiotic drugs enhance antimicrobial efficacy. Nat Chem Biol 7, 348-350, doi:10.1038/nchembio.559 (2011).
114 Brown, D. Antibiotic resistance breakers: can repurposed drugs fill the antibiotic discovery void? Nat Rev Drug Discov 14, 821-832, doi:10.1038/nrd4675 (2015).
115 Cox, G. et al. A Common Platform for Antibiotic Dereplication and Adjuvant Discovery. Cell Chem Biol 24, 98-109, doi:10.1016/j.chembiol.2016.11.011 (2017).
116 Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403-410, doi:10.1016/s0022-2836(05)80360-2 (1990).
117 Wallace, J. C., Port, J. A., Smith, M. N. & Faustman, E. M. FARME DB: a functional antibiotic resistance element database. Database (Oxford) 2017, doi:10.1093/database/baw165 (2017).
118 Liu, B. & Pop, M. ARDB--Antibiotic Resistance Genes Database. Nucleic Acids Res 37, D443-447, doi:10.1093/nar/gkn656 (2009).
119 McArthur, A. G. et al. The comprehensive antibiotic resistance database. Antimicrob Agents Chemother 57, 3348-3357, doi:10.1128/aac.00419-13 (2013).

120 Xavier, B. B. et al. Consolidating and Exploring Antibiotic Resistance Gene Data Resources. J Clin Microbiol 54, 851-859, doi:10.1128/jcm.02717-15 (2016).
121 Zankari, E. et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother 67, 2640-2644, doi:10.1093/jac/dks261 (2012).
122 Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J 9, 207-216, doi:10.1038/ismej.2014.106 (2015).
123 Eddy, S. R. A new generation of homology search tools based on probabilistic inference. Genome Inform 23, 205-211 (2009).
124 Pesesky, M. W. et al. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data. Front Microbiol 7, doi:10.3389/fmicb.2016.01887 (2016).
125 Boulund, F., Johnning, A., Pereira, M. B., Larsson, D. G. & Kristiansson, E. A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences. BMC Genomics 13, 695, doi:10.1186/1471-2164-13-695 (2012).
126 Flach, C. F., Boulund, F., Kristiansson, E. & Larsson, D. J. Functional verification of computationally predicted qnr genes. Ann Clin Microbiol Antimicrob 12, 34, doi:10.1186/1476-0711-12-34 (2013).
127 Bertelli, C. & Greub, G. Rapid bacterial genome sequencing: methods and applications in clinical microbiology. Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 19, 803-813, doi:10.1111/1469-0691.12217 (2013).
128 Didelot, X., Bowden, R., Wilson, D. J., Peto, T. E. A. & Crook, D. W. Transforming clinical microbiology with bacterial genome sequencing. Nature reviews. Genetics 13, 601-612, doi:10.1038/nrg3226 (2012).
129 Zumla, A. et al. Rapid point of care diagnostic tests for viral and bacterial respiratory tract infections--needs, advances, and future prospects. The Lancet Infectious Diseases 14, 1123-1135, doi:10.1016/s1473-3099(14)70827-8 (2014).
130 Kothari, A., Morgan, M. & Haake, D. A. Emerging technologies for rapid identification of bloodstream pathogens. Clin Infect Dis 59, 272-278, doi:10.1093/cid/ciu292 (2014).
131 Pulido, M. R., Garcia-Quintanilla, M., Martin-Pena, R., Cisneros, J. M. & McConnell, M. J. Progress on the development of rapid methods for antimicrobial susceptibility testing. J Antimicrob Chemother 68, 2710-2717, doi:10.1093/jac/dkt253 (2013).
132 Bradley, P. et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat Commun 6, 10063, doi:10.1038/ncomms10063 (2015).
133 Du, H., Chen, L., Tang, Y. W. & Kreiswirth, B. N. Emergence of the mcr-1 colistin resistance gene in carbapenem-resistant Enterobacteriaceae. Lancet Infect Dis 16, 287-288, doi:10.1016/s1473-3099(16)00056-6 (2016).

134 Kasbekar, N. Tigecycline: a new glycylcycline antimicrobial agent. Am J Health Syst Pharm 63, 1235-1243, doi:10.2146/ajhp050487 (2006).
135 Sun, Y. et al. The emergence of clinical resistance to tigecycline. Int J Antimicrob Agents 41, 110-116, doi:10.1016/j.ijantimicag.2012.09.005 (2013).
136 Moore, I. F., Hughes, D. W. & Wright, G. D. Tigecycline is modified by the flavin-dependent monooxygenase TetX. Biochemistry 44, 11829-11835, doi:10.1021/bi0506066 (2005).
137 Deng, M. et al. Molecular epidemiology and mechanisms of tigecycline resistance in clinical isolates of Acinetobacter baumannii from a Chinese university hospital. Antimicrob Agents Chemother 58, 297-303, doi:10.1128/aac.01727-13 (2014).
138 Leski, T. A. et al. Multidrug-resistant tet(X)-containing hospital isolates in Sierra Leone. Int J Antimicrob Agents 42, 83-86, doi:10.1016/j.ijantimicag.2013.04.014 (2013).
139 Sutcliffe, J. A., O'Brien, W., Fyfe, C. & Grossman, T. H. Antibacterial activity of eravacycline (TP-434), a novel fluorocycline, against hospital and community pathogens. Antimicrob Agents Chemother 57, 5548-5558, doi:10.1128/AAC.01288-13 (2013).
140 Macone, A. B. et al. In vitro and in vivo antibacterial activities of omadacycline, a novel aminomethylcycline. Antimicrob Agents Chemother 58, 1127-1135, doi:10.1128/aac.01242-13 (2014).
141 Lederberg, J. & McCray, A. The Scientist : 'Ome Sweet 'Omics-- A Genealogical Treasury of Words. The Scientist 17, doi:citeulike-article-id:1874228 (2001).
142 Turnbaugh, P. J. et al. The Human Microbiome Project. Nature 449, 804-810 (2007).
143 Sender, R., Fuchs, S. & Milo, R. Are We Really Vastly Outnumbered? Revisiting the Ratio of Bacterial to Host Cells in Humans. Cell 164, 337-340, doi:10.1016/j.cell.2016.01.013 (2016).
144 Ursell, L. K., Metcalf, J. L., Parfrey, L. W. & Knight, R. Defining the human microbiome. Nutr Rev 70 Suppl 1, S38-44, doi:10.1111/j.1753-4887.2012.00493.x (2012).
145 Roesch, L. F. W. et al. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J 1, 283-290, doi:http://www.nature.com/ismej/journal/v1/n4/suppinfo/ismej200753s1.html (2007).
146 Waldor, M. K. et al. Where next for microbiome research? PLoS Biol 13, e1002050, doi:10.1371/journal.pbio.1002050 (2015).
147 Donia, M. S. et al. A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics. Cell 158, 1402-1414, doi:10.1016/j.cell.2014.08.032 (2014).
148 Sommer, M. O., Church, G. M. & Dantas, G. The human microbiome harbors a diverse reservoir of antibiotic resistance genes. Virulence 1, 299-303, doi:10.4161/viru.1.4.12010 (2010).
149 Perry, J. A., Westman, E. L. & Wright, G. D. The antibiotic resistome: what's new? Curr Opin Microbiol 21, 45-50, doi:10.1016/j.mib.2014.09.002 (2014).

150 Sommer, M. O. & Dantas, G. Antibiotics and the resistant microbiome. Curr Opin Microbiol 14, 556-563, doi:10.1016/j.mib.2011.07.005 (2011).
151 Wright, G. D. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Micro 5, 175-186 (2007).
152 O'Neill, J. Securing new drugs for future generations: the pipeline of antibiotics. (2015).
153 Antibiotic Resistance Threats in the United States, 2013. (Centers for Disease Control and Prevention, 2013).
154 Cooper, M. A. & Shlaes, D. Fix the antibiotics pipeline. Nature 472, 32, doi:10.1038/472032a (2011).
155 Roberts, R. R. et al. Hospital and societal costs of antimicrobial-resistant infections in a Chicago teaching hospital: implications for antibiotic stewardship. Clin Infect Dis 49, 1175-1184, doi:10.1086/605630 (2009).
156 Fischbach, M. A. & Walsh, C. T. Antibiotics for emerging pathogens. Science 325, 1089-1093, doi:10.1126/science.1176667 (2009).
157 Kinch, M. S., Patridge, E., Plummer, M. & Hoyer, D. An analysis of FDA-approved drugs for infectious disease: antibacterial agents. Drug Discov Today 19, 1283-1287, doi:10.1016/j.drudis.2014.07.005 (2014).
158 Howard, D. H. & Scott, R. D. The Economic Burden of Drug Resistance. Clinical Infectious Diseases 41, S283-S286, doi:10.1086/430792 (2005).
159 Miller, A. Antibacterial Development: a Changing Landscape. Microbe Magazine, doi:10.1128/microbe.11.111.1.
160 Tenover, F. C. Mechanisms of antimicrobial resistance in bacteria. Am J Infect Control 34, S3-10; discussion S64-73, doi:10.1016/j.ajic.2006.05.219 (2006).
161 Alekshun, M. N. & Levy, S. B. Molecular mechanisms of antibacterial multidrug resistance. Cell 128, 1037-1050, doi:10.1016/j.cell.2007.03.004 (2007).
162 MacLean, R. C., Hall, A. R., Perron, G. G. & Buckling, A. The population genetics of antibiotic resistance: integrating molecular mechanisms and treatment contexts. Nat Rev Genet 11, 405-414, doi:10.1038/nrg2778 (2010).
163 Baltz, R. H. Genomics and the ancient origins of the daptomycin biosynthetic gene cluster. J Antibiot (Tokyo) 63, 506-511, doi:10.1038/ja.2010.82 (2010).
164 Fajardo, A. & Martinez, J. L. Antibiotics as signals that trigger specific bacterial responses. Curr Opin Microbiol 11, 161-167, doi:10.1016/j.mib.2008.02.006 (2008).
165 Clardy, J., Fischbach, M. A. & Currie, C. R. The natural history of antibiotics. Curr Biol 19, R437-441, doi:10.1016/j.cub.2009.04.001 (2009).
166 Nelson, M. L., Dinardo, A., Hochberg, J. & Armelagos, G. J. Brief communication: Mass spectroscopic characterization of tetracycline in the skeletal remains of an ancient population from Sudanese Nubia 350-550 CE. Am J Phys Anthropol 143, 151-154, doi:10.1002/ajpa.21340 (2010).

167 Bassett, E. J., Keith, M. S., Armelagos, G. J., Martin, D. L. & Villanueva, A. R. Tetracycline-labeled human bone from ancient Sudanese Nubia (A.D. 350). Science 209, 1532-1534 (1980).
168 Jones, D. S., Podolsky, S. H. & Greene, J. A. The burden of disease and the changing task of medicine. N Engl J Med 366, 2333-2338, doi:10.1056/NEJMp1113569 (2012).
169 Elsner, H. L. The new treatment of syphilis (ehrlich-hata): Observations and results. Journal of the American Medical Association 55, 2052-2057, doi:10.1001/jama.1910.04330240030008 (1910).
170 Fleming, A. On the antibacterial action of cultures of a penicillium, with special reference to their use in the isolation of B. influenzae. 1929. Bull World Health Organ 79, 780-790 (2001).
171 Schatz, A., Bugie, E. & Waksman, S. A. Streptomycin, a substance exhibiting antibiotic activity against gram-positive and gram-negative bacteria. 1944. Clin Orthop Relat Res, 3-6 (2005).
172 Zetterstrom, R. Selman A. Waksman (1888-1973) Nobel Prize in 1952 for the discovery of streptomycin, the first antibiotic effective against tuberculosis. Acta Paediatr 96, 317-319 (2007).
173 Pelaez, F. The historical delivery of antibiotics from microbial natural products--can history repeat? Biochem Pharmacol 71, 981-990, doi:10.1016/j.bcp.2005.10.010 (2006).
174 Fernandes, P. Antibacterial discovery and development--the failure of success? Nat Biotechnol 24, 1497-1503, doi:10.1038/nbt1206-1497 (2006).
175 Zaffiri, L., Gardner, J. & Toledo-Pereyra, L. H. History of antibiotics. From salvarsan to cephalosporins. J Invest Surg 25, 67-77, doi:10.3109/08941939.2012.664099 (2012).
176 de la Cruz, F. & Davies, J. Horizontal gene transfer and the origin of species: lessons from bacteria. Trends Microbiol 8, 128-133 (2000).
177 Blair, J. M., Webber, M. A., Baylay, A. J., Ogbolu, D. O. & Piddock, L. J. Molecular mechanisms of antibiotic resistance. Nat Rev Microbiol 13, 42-51, doi:10.1038/nrmicro3380 (2015).
178 Martinez, J. L. Natural antibiotic resistance and contamination by antibiotic resistance determinants: the two ages in the evolution of resistance to antimicrobials. Front Microbiol 3, 1, doi:10.3389/fmicb.2012.00001 (2012).
179 Hall, B. G., Salipante, S. J. & Barlow, M. Independent origins of subgroup Bl + B2 and subgroup B3 metallo-beta-lactamases. J Mol Evol 59, 133-141, doi:10.1007/s00239-003-2572-9 (2004).
180 Hall, B. G. & Barlow, M. Evolution of the serine beta-lactamases: past, present and future. Drug Resist Updat 7, 111-123, doi:10.1016/j.drup.2004.02.003 (2004).
181 Stokes, H. W. & Gillings, M. R. Gene flow, mobile genetic elements and the recruitment of antibiotic resistance genes into Gram-negative pathogens. FEMS Microbiol Rev 35, 790-819, doi:10.1111/j.1574-6976.2011.00273.x (2011).

182 Gould, S. J. & Vrba, E. S. Exaptation--a Missing Term in the Science of Form. Paleobiology 8, 4-15, doi:10.1017/S0094837300004310 (1982).
183 D'Costa, V. M., Griffiths, E. & Wright, G. D. Expanding the soil antibiotic resistome: exploring environmental diversity. Curr Opin Microbiol 10, 481-489, doi:10.1016/j.mib.2007.08.009 (2007).
184 Didelot, X., Bowden, R., Wilson, D. J., Peto, T. E. & Crook, D. W. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet 13, 601-612, doi:10.1038/nrg3226 (2012).
185 Kwong, J. C., McCallum, N., Sintchenko, V. & Howden, B. P. Whole genome sequencing in clinical and public health microbiology. Pathology 47, 199-210, doi:10.1097/pat.0000000000000235 (2015).
186 Koser, C. U. et al. Whole-genome sequencing for rapid susceptibility testing of M. tuberculosis. N Engl J Med 369, 290-292, doi:10.1056/NEJMc1215305 (2013).
187 Li, L., Mendis, N., Trigui, H., Oliver, J. D. & Faucher, S. P. The importance of the viable but non-culturable state in human bacterial pathogens. Front Microbiol 5, 258, doi:10.3389/fmicb.2014.00258 (2014).
188 Ma, L. et al. Gene-targeted microfluidic cultivation validated by isolation of a gut bacterium listed in Human Microbiome Project's Most Wanted taxa. Proc Natl Acad Sci U S A 111, 9768-9773, doi:10.1073/pnas.1404753111 (2014).
189 Nichols, D. et al. Use of Ichip for High-Throughput In Situ Cultivation of “Uncultivable” Microbial Species. Applied and Environmental Microbiology 76, 2445-2450, doi:10.1128/AEM.01754-09 (2010).
190 Kaeberlein, T., Lewis, K. & Epstein, S. S. Isolating "uncultivable" microorganisms in pure culture in a simulated natural environment. Science 296, 1127-1129, doi:10.1126/science.1070633 (2002).
191 Lagier, J. C. et al. Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect 18, 1185-1193, doi:10.1111/1469-0691.12023 (2012).
192 McLain, J. E., Cytryn, E., Durso, L. M. & Young, S. Culture-based Methods for Detection of Antibiotic Resistance in Agroecosystems: Advantages, Challenges, and Gaps in Knowledge. J Environ Qual 45, 432-440, doi:10.2134/jeq2015.06.0317 (2016).
193 Pettersson, E., Lundeberg, J. & Ahmadian, A. Generations of sequencing technologies. Genomics 93, 105-111, doi:10.1016/j.ygeno.2008.10.003 (2009).
194 Bonetta, L. Whole-genome sequencing breaks the cost barrier. Cell 141, 917-919, doi:10.1016/j.cell.2010.05.034 (2010).
195 Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P. & Barron, A. E. Landscape of next-generation sequencing technologies. Anal Chem 83, 4327-4341, doi:10.1021/ac2010857 (2011).
196 Christensen, K. D., Dukhovny, D., Siebert, U. & Green, R. C. Assessing the Costs and Cost-Effectiveness of Genomic Sequencing. J Pers Med 5, 470-486, doi:10.3390/jpm5040470 (2015).

197 Mardis, E. R. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9, 387-402, doi:10.1146/annurev.genom.9.081307.164359 (2008).
198 Koboldt, D. C., Steinberg, K. M., Larson, D. E., Wilson, R. K. & Mardis, E. R. The next-generation sequencing revolution and its impact on genomics. Cell 155, 27-38, doi:10.1016/j.cell.2013.09.006 (2013).
199 Mezger, A. et al. A general method for rapid determination of antibiotic susceptibility and species in bacterial infections. J Clin Microbiol 53, 425-432, doi:10.1128/jcm.02434-14 (2015).
200 Gibson, M. K. et al. Developmental dynamics of the preterm infant gut microbiota and antibiotic resistome. Nature Microbiology 1, 16024, doi:10.1038/nmicrobiol.2016.24 (2016).
201 Buelow, E. et al. Effects of selective digestive decontamination (SDD) on the gut resistome. J Antimicrob Chemother 69, 2215-2223, doi:10.1093/jac/dku092 (2014).
202 Bengtsson-Palme, J., Boulund, F., Fick, J., Kristiansson, E. & Larsson, D. G. Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Front Microbiol 5, 648, doi:10.3389/fmicb.2014.00648 (2014).
203 Meyer, F. et al. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9, 386, doi:10.1186/1471-2105-9-386 (2008).
204 Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods 9, 811-814, doi:10.1038/nmeth.2066 (2012).
205 Hu, Y. et al. Metagenome-wide analysis of antibiotic resistance genes in a large cohort of human gut microbiota. Nat Commun 4, 2151, doi:10.1038/ncomms3151 (2013).
206 Forslund, K. et al. Country-specific antibiotic use practices impact the human gut resistome. Genome Res 23, 1163-1169, doi:10.1101/gr.155465.113 (2013).
207 Durso, L. M., Harhay, G. P., Bono, J. L. & Smith, T. P. Virulence-associated and antibiotic resistance genes of microbial populations in cattle feces analyzed using a metagenomic approach. J Microbiol Methods 84, 278-282, doi:10.1016/j.mimet.2010.12.008 (2011).
208 Ma, L., Li, B. & Zhang, T. Abundant rifampin resistance genes and significant correlations of antibiotic resistance genes and plasmids in various environments revealed by metagenomic analysis. Appl Microbiol Biotechnol 98, 5195-5204, doi:10.1007/s00253-014-5511-3 (2014).
209 Abeles, S. R., Ly, M., Santiago-Rodriguez, T. M. & Pride, D. T. Effects of Long Term Antibiotic Therapy on Human Oral and Fecal Viromes. PLoS ONE 10, e0134941, doi:10.1371/journal.pone.0134941 (2015).
210 Looft, T. et al. In-feed antibiotic effects on the swine intestinal microbiome. Proc Natl Acad Sci U S A 109, 1691-1696, doi:10.1073/pnas.1120238109 (2012).
211 Nesme, J. et al. Large-scale metagenomic-based study of antibiotic resistance in the environment. Curr Biol 24, 1096-1100, doi:10.1016/j.cub.2014.03.036 (2014).

212 Modi, S. R., Lee, H. H., Spina, C. S. & Collins, J. J. Antibiotic treatment expands the resistance reservoir and ecological network of the phage metagenome. Nature 499, 219-222, doi:10.1038/nature12212 (2013).
213 Smillie, C. S. et al. Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480, 241-244, doi:10.1038/nature10571 (2011).
214 Gupta, S. K. et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 58, 212-220, doi:10.1128/aac.01310-13 (2014).
215 Yamashita, A., Sekizuka, T. & Kuroda, M. Characterization of Antimicrobial Resistance Dissemination across Plasmid Communities Classified by Network Analysis. Pathogens 3, 356-376, doi:10.3390/pathogens3020356 (2014).
216 McCall, C. A., Bent, E., Jorgensen, T. S., Dunfield, K. E. & Habash, M. B. Metagenomic Comparison of Antibiotic Resistance Genes Associated with Liquid and Dewatered Biosolids. J Environ Qual 45, 463-470, doi:10.2134/jeq2015.05.0255 (2016).
217 Thai, Q. K., Bos, F. & Pleiss, J. The Lactamase Engineering Database: a critical survey of TEM sequences in public databases. BMC Genomics 10, 390, doi:10.1186/1471-2164-10-390 (2009).
218 Fischer, M., Thai, Q. K., Grieb, M. & Pleiss, J. DWARF--a data warehouse system for analyzing protein families. BMC Bioinformatics 7, 495, doi:10.1186/1471-2105-7-495 (2006).
219 Tsafnat, G., Copty, J. & Partridge, S. R. RAC: Repository of Antibiotic resistance Cassettes. Database (Oxford) 2011, bar054, doi:10.1093/database/bar054 (2011).
220 McArthur, A. G. & Wright, G. D. Bioinformatics of antimicrobial resistance in the age of molecular epidemiology. Curr Opin Microbiol 27, 45-50, doi:10.1016/j.mib.2015.07.004 (2015).
221 Lim, H. K. et al. Characterization of a forest soil metagenome clone that confers indirubin and indigo production on Escherichia coli. Appl Environ Microbiol 71, 7768-7777, doi:10.1128/aem.71.12.7768-7777.2005 (2005).
222 Brady, S. F. & Clardy, J. Long-Chain N-Acyl Amino Acid Antibiotics Isolated from Heterologously Expressed Environmental DNA. Journal of the American Chemical Society 122, 12903-12904, doi:10.1021/ja002990u (2000).
223 Banik, J. J. & Brady, S. F. Recent application of metagenomic approaches toward the discovery of antimicrobials and other bioactive small molecules. Curr Opin Microbiol 13, 603-609, doi:10.1016/j.mib.2010.08.012 (2010).
224 Katz, M., Hover, B. M. & Brady, S. F. Culture-independent discovery of natural products from soil metagenomes. J Ind Microbiol Biotechnol 43, 129-141, doi:10.1007/s10295-015-1706-6 (2016).
225 Stevens, D. C. et al. Alternative Sigma Factor Over-Expression Enables Heterologous Expression of a Type II Polyketide Biosynthetic Pathway in Escherichia coli. PLoS ONE 8, e64858, doi:10.1371/journal.pone.0064858 (2013).

226 McMahon, M. D., Guan, C., Handelsman, J. & Thomas, M. G. Metagenomic analysis of Streptomyces lividans reveals host-dependent functional expression. Appl Environ Microbiol 78, 3622-3629, doi:10.1128/aem.00044-12 (2012).
227 Lee, M. H. & Lee, S. W. Bioprospecting potential of the soil metagenome: novel enzymes and bioactivities. Genomics Inform 11, 114-120, doi:10.5808/gi.2013.11.3.114 (2013).
228 Ling, L. L. et al. A new antibiotic kills pathogens without detectable resistance. Nature 517, 455-459, doi:10.1038/nature14098 (2015).
229 Seow, K. T. et al. A study of iterative type II polyketide synthases, using bacterial genes cloned from soil DNA: a means to access and use genes from uncultured microorganisms. J Bacteriol 179, 7360-7368 (1997).
230 Weber, T. et al. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res 43, W237-243, doi:10.1093/nar/gkv437 (2015).
231 Cimermancic, P. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412-421, doi:10.1016/j.cell.2014.06.034 (2014).
232 Raveh-Sadka, T. et al. Gut bacteria are rarely shared by co-hospitalized premature infants, regardless of necrotizing enterocolitis development. Elife 4, doi:10.7554/eLife.05477 (2015).
233 Frank, J. A. et al. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data. Sci Rep 6, 25373, doi:10.1038/srep25373 (2016).
234 Sangwan, N., Xia, F. & Gilbert, J. A. Recovering complete and draft population genomes from metagenome datasets. Microbiome 4, 8, doi:10.1186/s40168-016-0154-5 (2016).
235 Kaminski, J. et al. High-Specificity Targeted Functional Profiling in Microbial Communities with ShortBRED. PLoS Comput Biol 11, e1004557, doi:10.1371/journal.pcbi.1004557 (2015).
236 Owen, J. G. et al. Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors. Proc Natl Acad Sci U S A 112, 4221-4226, doi:10.1073/pnas.1501124112 (2015).
237 Reddy, B. V., Milshteyn, A., Charlop-Powers, Z. & Brady, S. F. eSNaPD: a versatile, web-based bioinformatics platform for surveying and mining natural product biosynthetic diversity from metagenomes. Chem Biol 21, 1023-1033, doi:10.1016/j.chembiol.2014.06.007 (2014).
238 Reddy, B. V. et al. Natural product biosynthetic gene diversity in geographically distinct soil microbiomes. Appl Environ Microbiol 78, 3744-3752, doi:10.1128/aem.00102-12 (2012).

239 Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133-138, doi:10.1126/science.1162986 (2009).
240 Tsai, Y. C. et al. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing. MBio 7, e01948-01915, doi:10.1128/mBio.01948-15 (2016).
241 Oppegard, C. et al. The two-peptide class II bacteriocins: structure, production, and mode of action. J Mol Microbiol Biotechnol 13, 210-219, doi:10.1159/000104750 (2007).
242 Koren, S. et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol 14, R101, doi:10.1186/gb-2013-14-9-r101 (2013).
243 Gomez-Escribano, J. P., Alt, S. & Bibb, M. J. Next Generation Sequencing of Actinobacteria for the Discovery of Novel Natural Products. Mar Drugs 14, doi:10.3390/md14040078 (2016).
244 Meyer, Q. C., Burton, S. G. & Cowan, D. A. Subtractive hybridization magnetic bead capture: a new technique for the recovery of full-length ORFs from the metagenome. Biotechnol J 2, 36-40, doi:10.1002/biot.200600156 (2007).
245 Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: technologies and applications. Nat Methods 11, 499-507, doi:10.1038/nmeth.2918 (2014).
246 Tamburini, S., Shen, N., Wu, H. C. & Clemente, J. C. The microbiome in early life: implications for health outcomes. Nat Med 22, 713-722, doi:10.1038/nm.4142 (2016).
247 Aagaard, K. et al. The placenta harbors a unique microbiome. Sci Transl Med 6, 237ra265, doi:10.1126/scitranslmed.3008599 (2014).
248 Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222-227, doi:10.1038/nature11053 (2012).
249 La Rosa, P. S. et al. Patterned progression of bacterial populations in the premature infant gut. Proc Natl Acad Sci U S A 111, 12522-12527, doi:10.1073/pnas.1409497111 (2014).
250 Backhed, F. et al. Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell Host Microbe 17, 690-703, doi:10.1016/j.chom.2015.04.004 (2015).
251 Subramanian, S. et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature 510, 417-421, doi:10.1038/nature13421 (2014).
252 Favier, C. F., Vaughan, E. E., De Vos, W. M. & Akkermans, A. D. Molecular monitoring of succession of bacterial communities in human neonates. Appl Environ Microbiol 68, 219-226 (2002).
253 Koenig, J. E. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A 108 Suppl 1, 4578-4585, doi:10.1073/pnas.1000081107 (2011).
254 Mshvildadze, M., Neu, J. & Mai, V. Intestinal microbiota development in the premature neonate: establishment of a lasting commensal relationship? Nutr Rev 66, 658-663, doi:10.1111/j.1753-4887.2008.00119.x (2008).


255 Cox, L. M. et al. Altering the intestinal microbiota during a critical developmental window has lasting metabolic consequences. Cell 158, 705-721, doi:10.1016/j.cell.2014.05.052 (2014). 256 Clarke, G., O'Mahony, S. M., Dinan, T. G. & Cryan, J. F. Priming for health: gut microbiota acquired in early life regulates physiology, brain and behaviour. Acta Paediatr 103, 812-819, doi:10.1111/apa.12674 (2014). 257 Vatanen, T. et al. Variation in Microbiome LPS Immunogenicity Contributes to Autoimmunity in Humans. Cell 165, 842-853, doi:10.1016/j.cell.2016.04.007 (2016). 258 Vangay, P., Ward, T., Gerber, J. S. & Knights, D. Antibiotics, pediatric dysbiosis, and disease. Cell Host Microbe 17, 553-564, doi:10.1016/j.chom.2015.04.006 (2015). 259 Hsiao, E. Y. et al. Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell 155, 1451-1463, doi:10.1016/j.cell.2013.11.024 (2013). 260 Chai, G. et al. Trends of outpatient prescription drug utilization in US children, 2002-2010. Pediatrics 130, 23-31, doi:10.1542/peds.2011-2879 (2012). 261 Nobel, Y. R. et al. Metabolic and metagenomic outcomes from early-life pulsed antibiotic treatment. Nat Commun 6, 7486, doi:10.1038/ncomms8486 (2015). 262 Gibson, M. K., Crofts, T. S. & Dantas, G. Antibiotics and the developing infant gut microbiota and resistome. Curr Opin Microbiol 27, 51-56, doi:10.1016/j.mib.2015.07.007 (2015). 263 Cho, I. et al. Antibiotics in early life alter the murine colonic microbiome and adiposity. Nature 488, 621-626, doi:10.1038/nature11400 (2012). 264 Trasande, L. et al. Infant antibiotic exposures and early-life body mass. Int J Obes (Lond) 37, 16-23, doi:10.1038/ijo.2012.132 (2013). 265 Arrieta, M. C. et al. Early infancy microbial and metabolic alterations affect risk of childhood asthma. Sci Transl Med 7, 307ra152, doi:10.1126/scitranslmed.aab2271 (2015). 266 Jernberg, C., Lofmark, S., Edlund, C. & Jansson, J. K. 
Long-term ecological impacts of antibiotic administration on the human intestinal microbiota. ISME J 1, 56-66, doi:10.1038/ismej.2007.3 (2007). 267 Zhang, T. et al. Prescription drug dispensing profiles for one million children: a population-based analysis. Eur J Clin Pharmacol 69, 581-588, doi:10.1007/s00228-012-1343-1 (2013). 268 Clark, R. H., Bloom, B. T., Spitzer, A. R. & Gerstmann, D. R. Reported medication use in the neonatal intensive care unit: data from a large national data set. Pediatrics 117, 1979-1987, doi:10.1542/peds.2005-1707 (2006). 269 Hsieh, E. M. et al. Medication use in the neonatal intensive care unit. Am J Perinatol 31, 811-821, doi:10.1055/s-0033-1361933 (2014).


270 Warner, B. B. et al. Gut bacteria dysbiosis and necrotising enterocolitis in very low birthweight infants: a prospective case-control study. Lancet, doi:10.1016/s0140-6736(16)00081-7 (2016). 271 Cotten, C. M. et al. Prolonged duration of initial empirical antibiotic treatment is associated with increased rates of necrotizing enterocolitis and death for extremely low birth weight infants. Pediatrics 123, 58-66, doi:10.1542/peds.2007-3423 (2009). 272 Greenwood, C. et al. Early empiric antibiotic use in preterm infants is associated with lower bacterial diversity and higher relative abundance of Enterobacter. J Pediatr 165, 23-29, doi:10.1016/j.jpeds.2014.01.010 (2014). 273 Sherman, M. P. New concepts of microbial translocation in the neonatal intestine: mechanisms and prevention. Clin Perinatol 37, 565-579, doi:10.1016/j.clp.2010.05.006 (2010). 274 Carl, M. A. et al. Sepsis from the gut: the enteric habitat of bacteria that cause late-onset neonatal bloodstream infections. Clin Infect Dis 58, 1211-1218, doi:10.1093/cid/ciu084 (2014). 275 Groer, M. W., Gregory, K. E., Louis-Jacques, A., Thibeau, S. & Walker, W. A. The very low birth weight infant microbiome and childhood health. Birth Defects Res C Embryo Today 105, 252-264, doi:10.1002/bdrc.21115 (2015). 276 National Vital Statistics Reports. (Centers for Disease Control and Prevention, 2015). 277 Arboleya, S. et al. Intestinal microbiota development in preterm neonates and effect of perinatal antibiotics. J Pediatr 166, 538-544, doi:10.1016/j.jpeds.2014.09.041 (2015). 278 Abubucker, S. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 8, e1002358, doi:10.1371/journal.pcbi.1002358 (2012). 279 Brooks, B. et al. Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants. Microbiome 2, 1, doi:10.1186/2049-2618-2-1 (2014). 280 Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. 
Nat Methods 12, 902-903, doi:10.1038/nmeth.3589 (2015). 281 Fujimura, K. E., Slusher, N. A., Cabana, M. D. & Lynch, S. V. Role of the gut microbiota in defining human health. Expert Rev Anti Infect Ther 8, 435-454, doi:10.1586/eri.10.14 (2010). 282 Abrahamsson, T. R. et al. Low gut microbiota diversity in early infancy precedes asthma at school age. Clin Exp Allergy 44, 842-850, doi:10.1111/cea.12253 (2014). 283 Rosenthal, V. D. et al. International Nosocomial Infection Control Consortium (INICC) report, data summary of 36 countries, for 2004-2009. Am J Infect Control 40, 396-407, doi:10.1016/j.ajic.2011.05.020 (2012). 284 Harris, P. N., Tambyah, P. A. & Paterson, D. L. beta-lactam and beta-lactamase inhibitor combinations in the treatment of extended-spectrum beta-lactamase producing Enterobacteriaceae: time for a reappraisal in the era of few antibiotic options? Lancet Infect Dis 15, 475-485, doi:10.1016/s1473-3099(14)70950-8 (2015).

285 Mahoney, T. F. & Silhavy, T. J. The Cpx stress response confers resistance to some, but not all, bactericidal antibiotics. J Bacteriol 195, 1869-1874, doi:10.1128/jb.02197-12 (2013). 286 Moles, L. et al. Preterm infant gut colonization in the neonatal ICU and complete restoration 2 years later. Clin Microbiol Infect 21, 936.e931-910, doi:10.1016/j.cmi.2015.06.003 (2015). 287 Jakobsson, H. E. et al. Short-term antibiotic treatment has differing long-term impacts on the human throat and gut microbiome. PLoS ONE 5, e9836, doi:10.1371/journal.pone.0009836 (2010). 288 Lofmark, S., Jernberg, C., Jansson, J. K. & Edlund, C. Clindamycin-induced enrichment and long-term persistence of resistant Bacteroides spp. and resistance genes. J Antimicrob Chemother 58, 1160-1167, doi:10.1093/jac/dkl420 (2006). 289 Jernberg, C., Lofmark, S., Edlund, C. & Jansson, J. K. Long-term impacts of antibiotic exposure on the human intestinal microbiota. Microbiology 156, 3216-3223, doi:10.1099/mic.0.040618-0 (2010). 290 Romero, R., Dey, S. K. & Fisher, S. J. Preterm labor: one syndrome, many causes. Science 345, 760-765, doi:10.1126/science.1251816 (2014). 291 Turnbaugh, P. J. et al. The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice. Sci Transl Med 1, 6ra14, doi:10.1126/scitranslmed.3000322 (2009). 292 Zhang, L., Huang, Y., Zhou, Y., Buckley, T. & Wang, H. H. Antibiotic administration routes significantly influence the levels of antibiotic resistance in gut microbiota. Antimicrob Agents Chemother 57, 3659-3666, doi:10.1128/aac.00670-13 (2013). 293 Gibson, M. K., Pesesky, M. W. & Dantas, G. The yin and yang of bacterial resilience in the human gut microbiota. J Mol Biol 426, 3866-3876, doi:10.1016/j.jmb.2014.05.029 (2014). 294 Park, J. et al. Plasticity, dynamics, and inhibition of emerging tetracycline resistance enzymes. Nat Chem Biol 13, 730-736, doi:10.1038/nchembio.2376 (2017). 295 Sommer, F. & Backhed, F. 
The gut microbiota--masters of host development and physiology. Nat Rev Microbiol 11, 227-238, doi:10.1038/nrmicro2974 (2013). 296 Pantoja-Feliciano, I. G. et al. Biphasic assembly of the murine intestinal microbiota during early development. ISME J 7, 1112-1115, doi:10.1038/ismej.2013.15 (2013). 297 Livanos, A. E. et al. Antibiotic-mediated gut microbiome perturbation accelerates development of type 1 diabetes in mice. Nat Microbiol 1, 16140, doi:10.1038/nmicrobiol.2016.140 (2016). 298 Lozupone, C. A., Stombaugh, J. I., Gordon, J. I., Jansson, J. K. & Knight, R. Diversity, stability and resilience of the human gut microbiota. Nature 489, 220-230, doi:10.1038/nature11550 (2012). 299 Hviid, A., Svanstrom, H. & Frisch, M. Antibiotic use and inflammatory bowel diseases in childhood. Gut 60, 49-54, doi:10.1136/gut.2010.219683 (2011).

300 Penders, J. et al. Gut microbiota composition and development of atopic manifestations in infancy: the KOALA Birth Cohort Study. Gut 56, 661-667, doi:10.1136/gut.2006.100164 (2007). 301 Ahmadizar, F. et al. Early life antibiotic use and the risk of asthma and asthma exacerbations in children. Pediatr Allergy Immunol 28, 430-437, doi:10.1111/pai.12725 (2017). 302 Stokholm, J. et al. Maturation of the gut microbiome and risk of asthma in childhood. Nat Commun 9, 141, doi:10.1038/s41467-017-02573-2 (2018). 303 Missaghi, B., Barkema, H. W., Madsen, K. L. & Ghosh, S. Perturbation of the human microbiome as a contributor to inflammatory bowel disease. Pathogens 3, 510-527, doi:10.3390/pathogens3030510 (2014). 304 Tremlett, H. et al. Gut microbiota in early pediatric multiple sclerosis: a case-control study. Eur J Neurol 23, 1308-1321, doi:10.1111/ene.13026 (2016). 305 Russell, S. L. et al. Early life antibiotic-driven changes in microbiota enhance susceptibility to allergic asthma. EMBO Rep 13, 440-447, doi:10.1038/embor.2012.32 (2012). 306 Zanvit, P. et al. Antibiotics in neonatal life increase murine susceptibility to experimental psoriasis. Nat Commun 6, 8424, doi:10.1038/ncomms9424 (2015). 307 Azad, M. B., Bridgman, S. L., Becker, A. B. & Kozyrskyj, A. L. Infant antibiotic exposure and the development of childhood overweight and central adiposity. Int J Obes (Lond) 38, 1290-1298, doi:10.1038/ijo.2014.119 (2014). 308 Boursi, B., Mamtani, R., Haynes, K. & Yang, Y. X. The effect of past antibiotic exposure on diabetes risk. Eur J Endocrinol 172, 639-648, doi:10.1530/eje-14-1163 (2015). 309 Shaw, S. Y., Blanchard, J. F. & Bernstein, C. N. Association between the use of antibiotics in the first year of life and pediatric inflammatory bowel disease. Am J Gastroenterol 105, 2687-2692, doi:10.1038/ajg.2010.398 (2010). 310 Ungaro, R. et al. Antibiotics associated with increased risk of new-onset Crohn's disease but not ulcerative colitis: a meta-analysis. 
Am J Gastroenterol 109, 1728-1738, doi:10.1038/ajg.2014.246 (2014). 311 Kronman, M. P., Zaoutis, T. E., Haynes, K., Feng, R. & Coffin, S. E. Antibiotic exposure and IBD development among children: a population-based cohort study. Pediatrics 130, e794-803, doi:10.1542/peds.2011-3886 (2012). 312 Lexmond, W. S. et al. Involvement of the iNKT cell pathway is associated with early-onset eosinophilic esophagitis and response to allergen avoidance therapy. Am J Gastroenterol 109, 646-657, doi:10.1038/ajg.2014.12 (2014). 313 Blencowe, H. et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet 379, 2162-2172, doi:10.1016/s0140-6736(12)60820-4 (2012).


314 Liu, L. et al. Global, regional, and national causes of under-5 mortality in 2000-15: an updated systematic analysis with implications for the Sustainable Development Goals. Lancet 388, 3027-3035, doi:10.1016/s0140-6736(16)31593-8 (2016). 315 Stoll, B. J. et al. Neurodevelopmental and growth impairment among extremely low-birth-weight infants with neonatal infection. JAMA 292, 2357-2365, doi:10.1001/jama.292.19.2357 (2004). 316 Fouhy, F. et al. High-throughput sequencing reveals the incomplete, short-term recovery of infant gut microbiota following parenteral antibiotic treatment with ampicillin and gentamicin. Antimicrob Agents Chemother 56, 5811-5820, doi:10.1128/aac.00789-12 (2012). 317 Stewart, C. J. et al. Preterm gut microbiota and metabolome following discharge from intensive care. Sci Rep 5, 17141, doi:10.1038/srep17141 (2015). 318 Zwittink, R. D. et al. Association between duration of intravenous antibiotic administration and early-life microbiota development in late-preterm infants. Eur J Clin Microbiol Infect Dis 37, 475-483, doi:10.1007/s10096-018-3193-y (2018). 319 ACOG Committee Opinion No 579: Definition of term pregnancy. Obstet Gynecol 122, 1139-1140, doi:10.1097/01.AOG.0000437385.88715.4a (2013). 320 Raju, T. N., Higgins, R. D., Stark, A. R. & Leveno, K. J. Optimizing care and outcome for late-preterm (near-term) infants: a summary of the workshop sponsored by the National Institute of Child Health and Human Development. Pediatrics 118, 1207-1214, doi:10.1542/peds.2006-0018 (2006). 321 Lindberg, T. P. et al. Preterm infant gut microbial patterns related to the development of necrotizing enterocolitis. J Matern Fetal Neonatal Med, 1-10, doi:10.1080/14767058.2018.1490719 (2018). 322 Abrahamsson, T. R. et al. Low diversity of the gut microbiota in infants with atopic eczema. J Allergy Clin Immunol 129, 434-440, 440.e431-432, doi:10.1016/j.jaci.2011.10.025 (2012). 323 Warner, B. B. et al. 
Gut bacteria dysbiosis and necrotising enterocolitis in very low birthweight infants: a prospective case-control study. Lancet 387, 1928-1936, doi:10.1016/s0140-6736(16)00081-7 (2016). 324 Kriss, M., Hazleton, K. Z., Nusbacher, N. M., Martin, C. G. & Lozupone, C. A. Low diversity gut microbiota dysbiosis: drivers, functional implications and recovery. Curr Opin Microbiol 44, 34-40, doi:10.1016/j.mib.2018.07.003 (2018). 325 Breiman, L. Random forests. Machine learning 45, 5-32 (2001). 326 Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res 45, D566-D573, doi:10.1093/nar/gkw1004 (2017). 327 Wyres, K. L. & Holt, K. E. Klebsiella pneumoniae as a key trafficker of drug resistance genes from environmental to clinically important bacteria. Curr Opin Microbiol 45, 131-139, doi:10.1016/j.mib.2018.04.004 (2018).


328 Navon-Venezia, S., Kondratyeva, K. & Carattoli, A. Klebsiella pneumoniae: a major worldwide source and shuttle for antibiotic resistance. FEMS Microbiol Rev 41, 252-275, doi:10.1093/femsre/fux013 (2017). 329 Goldstone, R. J. & Smith, D. G. E. A population genomics approach to exploiting the accessory 'resistome' of Escherichia coli. Microb Genom 3, e000108, doi:10.1099/mgen.0.000108 (2017). 330 Crofts, T. S., Gasparrini, A. J. & Dantas, G. Next-generation approaches to understand and combat the antibiotic resistome. Nat Rev Microbiol 15, 422-434, doi:10.1038/nrmicro.2017.28 (2017). 331 Zhang, X., Feng, Y., Zhou, W., McNally, A. & Zong, Z. Cryptic transmission of ST405 Escherichia coli carrying bla NDM-4 in hospital. Sci Rep 8, 390, doi:10.1038/s41598-017-18910-w (2018). 332 Izdebski, R. et al. MLST reveals potentially high-risk international clones of Enterobacter cloacae. J Antimicrob Chemother 70, 48-56, doi:10.1093/jac/dku359 (2015). 333 Gurnee, E. A. et al. Gut Colonization of Healthy Children and Their Mothers With Pathogenic Ciprofloxacin-Resistant Escherichia coli. J Infect Dis 212, 1862-1868, doi:10.1093/infdis/jiv278 (2015). 334 Kim, H. B. et al. oqxAB Encoding a Multidrug Efflux Pump in Human Clinical Isolates of Enterobacteriaceae. Antimicrob Agents Chemother 53, 3582-3584, doi:10.1128/aac.01574-08 (2009). 335 Fevre, C., Passet, V., Weill, F. X., Grimont, P. A. D. & Brisse, S. Variants of the Klebsiella pneumoniae OKP Chromosomal Beta-Lactamase Are Divided into Two Main Groups, OKP-A and OKP-B. Antimicrob Agents Chemother 49, 5149-5152, doi:10.1128/aac.49.12.5149-5152.2005 (2005). 336 Gasparrini, A. J. et al. Antibiotic perturbation of the preterm infant gut microbiome and resistome. Gut Microbes 7, 443-449, doi:10.1080/19490976.2016.1218584 (2016). 337 Furtado, I. et al. Enterococcus faecium and Enterococcus faecalis in blood of newborns with suspected nosocomial infection. 
Rev Inst Med Trop Sao Paulo 56, 77-80, doi:10.1590/s0036-46652014000100012 (2014). 338 Akturk, H. et al. Vancomycin-resistant enterococci colonization in a neonatal intensive care unit: who will be infected? J Matern Fetal Neonatal Med 29, 3478-3482, doi:10.3109/14767058.2015.1132693 (2016). 339 CLSI. Methods for Dilution Antimicrobial Susceptibility Testing for Bacteria That Grow Aerobically; Approved Standard--Tenth Edition. Vol. M07-A10 (Clinical and Laboratory Standards Institute, 2015). 340 Fernandez-Canigia, L., Cejas, D., Gutkind, G. & Radice, M. Detection and genetic characterization of beta-lactamases in Prevotella intermedia and Prevotella nigrescens isolated from oral cavity infections and peritonsillar abscesses. Anaerobe 33, 8-13, doi:10.1016/j.anaerobe.2015.01.007 (2015).


341 Planer, J. D. et al. Development of the gut microbiota and mucosal IgA responses in twins and gnotobiotic mice. Nature 534, 263-266, doi:10.1038/nature17940 (2016). 342 Baym, M. et al. Inexpensive multiplexed library preparation for megabase-sized genomes. PLoS ONE 10, e0128036, doi:10.1371/journal.pone.0128036 (2015). 343 Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120, doi:10.1093/bioinformatics/btu170 (2014). 344 Forsberg, K. J. et al. The shared antibiotic resistome of soil bacteria and human pathogens. Science (New York, N.Y.) 337, 1107-1111, doi:10.1126/science.1220761 (2012). 345 Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38, e132, doi:10.1093/nar/gkq275 (2010). 346 Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J, doi:10.1038/ismej.2014.106 (2014). 347 Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39, W29-37, doi:10.1093/nar/gkr367 (2011). 348 Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16, 276-277 (2000). 349 Schmieder, R. & Edwards, R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS ONE 6, e17288, doi:10.1371/journal.pone.0017288 (2011). 350 Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19, 455-477, doi:10.1089/cmb.2012.0021 (2012). 351 Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072-1075, doi:10.1093/bioinformatics/btt086 (2013). 352 Seemann, T. Prokka: rapid prokaryotic genome annotation. 
Bioinformatics 30, 2068-2069, doi:10.1093/bioinformatics/btu153 (2014). 353 Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132, doi:10.1186/s13059-016-0997-x (2016). 354 Page, A. J. et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691-3693, doi:10.1093/bioinformatics/btv421 (2015). 355 Mohamed, J. A., Huang, W., Nallapareddy, S. R., Teng, F. & Murray, B. E. Influence of origin of isolates, especially endocarditis isolates, and various genes on biofilm formation by Enterococcus faecalis. Infect Immun 72, 3658-3663, doi:10.1128/iai.72.6.3658-3663.2004 (2004). 356 Liaw, A. & Wiener, M. Classification and Regression by randomForest. R News 2, 18-22, doi:citeulike-article-id:1121494 (2002).


357 Guyon, I. & Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157-1182 (2003). 358 Kerr-Wilson, C. O., Mackay, D. F., Smith, G. C. & Pell, J. P. Meta-analysis of the association between preterm delivery and intelligence. J Public Health (Oxf) 34, 209-216, doi:10.1093/pubmed/fdr024 (2012). 359 Bhutta, A. T., Cleves, M. A., Casey, P. H., Cradock, M. M. & Anand, K. J. Cognitive and behavioral outcomes of school-aged children who were born preterm: a meta-analysis. JAMA 288, 728-737 (2002). 360 Johnson, S. et al. Academic attainment and special educational needs in extremely preterm children at 11 years of age: the EPICure study. Arch Dis Child Fetal Neonatal Ed 94, F283-289, doi:10.1136/adc.2008.152793 (2009). 361 Tinnion, R., Gillone, J., Cheetham, T. & Embleton, N. Preterm birth and subsequent insulin sensitivity: a systematic review. Arch Dis Child 99, 362-368, doi:10.1136/archdischild-2013-304615 (2014). 362 Parkinson, J. R., Hyde, M. J., Gale, C., Santhakumaran, S. & Modi, N. Preterm birth and the metabolic syndrome in adult life: a systematic review and meta-analysis. Pediatrics 131, e1240-1263, doi:10.1542/peds.2012-2177 (2013). 363 Crump, C., Winkleby, M. A., Sundquist, K. & Sundquist, J. Risk of hypertension among young adults who were born preterm: a Swedish national study of 636,000 births. Am J Epidemiol 173, 797-803, doi:10.1093/aje/kwq440 (2011). 364 Kowalski, R. R., Beare, R., Doyle, L. W., Smolich, J. J. & Cheung, M. M. Elevated Blood Pressure with Reduced Left Ventricular and Aortic Dimensions in Adolescents Born Extremely Preterm. J Pediatr 172, 75-80.e72, doi:10.1016/j.jpeds.2016.01.020 (2016). 365 Crump, C., Winkleby, M. A., Sundquist, J. & Sundquist, K. Risk of asthma in young adults who were born preterm: a Swedish national cohort study. Pediatrics 127, e913-920, doi:10.1542/peds.2010-2603 (2011). 366 Lum, S. et al. 
Nature and severity of lung function abnormalities in extremely pre-term children at 11 years of age. Eur Respir J 37, 1199-1207, doi:10.1183/09031936.00071110 (2011). 367 Poirel, L., Kampfer, P. & Nordmann, P. Chromosome-encoded Ambler class A beta-lactamase of Kluyvera georgiana, a probable progenitor of a subgroup of CTX-M extended-spectrum beta-lactamases. Antimicrob Agents Chemother 46, 4038-4040 (2002). 368 Thaker, M., Spanogiannopoulos, P. & Wright, G. D. The tetracycline resistome. Cell Mol Life Sci 67, 419-431, doi:10.1007/s00018-009-0172-6 (2010). 369 Nelson, M. L. & Levy, S. B. The history of the tetracyclines. Ann N Y Acad Sci 1241, 17-32, doi:10.1111/j.1749-6632.2011.06354.x (2011). 370 Johnson, R. & Adams, J. The ecology and evolution of tetracycline resistance. Trends Ecol Evol 7, 295-299, doi:10.1016/0169-5347(92)90226-2 (1992).


371 Chopra, I. & Roberts, M. Tetracycline antibiotics: mode of action, applications, molecular biology, and epidemiology of bacterial resistance. Microbiol Mol Biol Rev 65, 232-260, doi:10.1128/mmbr.65.2.232-260.2001 (2001). 372 Ungemach, F. R., Muller-Bahrdt, D. & Abraham, G. Guidelines for prudent use of antimicrobials and their implications on antibiotic usage in veterinary medicine. Int J Med Microbiol 296 Suppl 41, 33-38, doi:10.1016/j.ijmm.2006.01.059 (2006). 373 Jeong, J., Song, W., Cooper, W. J., Jung, J. & Greaves, J. Degradation of tetracycline antibiotics: Mechanisms and kinetic studies for advanced oxidation/reduction processes. Chemosphere 78, 533-540, doi:10.1016/j.chemosphere.2009.11.024 (2010). 374 Zhu, Y. G. et al. Diverse and abundant antibiotic resistance genes in Chinese swine farms. Proc Natl Acad Sci U S A 110, 3435-3440, doi:10.1073/pnas.1222743110 (2013). 375 Walsh, C. Antibiotics: actions, origins, resistance. (ASM Press, 2003). 376 Aminov, R. I. & Mackie, R. I. Evolution and ecology of antibiotic resistance genes. FEMS Microbiol Lett 271, 147-161, doi:10.1111/j.1574-6968.2007.00757.x (2007). 377 Nguyen, F. et al. Tetracycline antibiotics and resistance mechanisms. Biol Chem 395, 559-575, doi:10.1515/hsz-2013-0292 (2014). 378 Coenen, S. et al. European Surveillance of Antimicrobial Consumption (ESAC): outpatient use of tetracyclines, sulphonamides and trimethoprim, and other antibacterials in Europe (1997-2009). J Antimicrob Chemother 66 Suppl 6, vi57-70, doi:10.1093/jac/dkr458 (2011). 379 Dixon, T. C., Meselson, M., Guillemin, J. & Hanna, P. C. Anthrax. N Engl J Med 341, 815-826, doi:10.1056/nejm199909093411107 (1999). 380 Frederiksen, B. et al. Infant exposures and development of type 1 diabetes mellitus: The Diabetes Autoimmunity Study in the Young (DAISY). JAMA Pediatr 167, 808-815, doi:10.1001/jamapediatrics.2013.317 (2013). 381 Rodriguez-Bano, J., Gutierrez-Gutierrez, B., Machuca, I. & Pascual, A. 
Treatment of Infections Caused by Extended-Spectrum-Beta-Lactamase-, AmpC-, and Carbapenemase-Producing Enterobacteriaceae. Clin Microbiol Rev 31, doi:10.1128/cmr.00079-17 (2018). 382 Bassetti, M., Vena, A., Castaldo, N., Righi, E. & Peghin, M. New antibiotics for ventilator-associated pneumonia. Curr Opin Infect Dis 31, 177-186, doi:10.1097/qco.0000000000000438 (2018). 383 Thakare, R., Dasgupta, A. & Chopra, S. Eravacycline for the treatment of patients with bacterial infections. Drugs Today (Barc) 54, 245-254, doi:10.1358/dot.2018.54.4.2800623 (2018). 384 Shakil, S., Akram, M. & Khan, A. U. Tigecycline: a critical update. J Chemother 20, 411-419, doi:10.1179/joc.2008.20.4.411 (2008). 385 Volkers, G., Palm, G. J., Weiss, M. S., Wright, G. D. & Hinrichs, W. Structural basis for a new tetracycline resistance mechanism relying on the TetX monooxygenase. FEBS Lett 585, 1061-1066, doi:10.1016/j.febslet.2011.03.012 (2011).

386 Boucher, H. W. et al. Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis 48, 1-12, doi:10.1086/595011 (2009). 387 Ghosh, S., LaPara, T. M. & Sadowsky, M. J. Transformation of tetracycline by TetX and its subsequent degradation in a heterologous host. FEMS Microbiol Ecol, doi:10.1093/femsec/fiv059 (2015). 388 Markley, J. L. & Wencewicz, T. A. Tetracycline-Inactivating Enzymes. Front Microbiol 9, 1058, doi:10.3389/fmicb.2018.01058 (2018). 389 Walsh, C. T. & Wencewicz, T. A. Flavoenzymes: versatile catalysts in biosynthetic pathways. Nat Prod Rep 30, 175-200, doi:10.1039/c2np20069d (2013). 390 Zheng, J. X. et al. Overexpression of OqxAB and MacAB efflux pumps contributes to eravacycline resistance and heteroresistance in clinical isolates of Klebsiella pneumoniae. Emerg Microbes Infect 7, 139, doi:10.1038/s41426-018-0141-y (2018). 391 Dean, C. R., Visalli, M. A., Projan, S. J., Sum, P. E. & Bradford, P. A. Efflux-mediated resistance to tigecycline (GAR-936) in Pseudomonas aeruginosa PAO1. Antimicrob Agents Chemother 47, 972-978 (2003). 392 Shen, K. et al. Extensive Genomic Plasticity in Pseudomonas aeruginosa Revealed by Identification and Distribution Studies of Novel Genes among Clinical Isolates. Infect Immun 74, 5272-5283, doi:10.1128/iai.00546-06 (2006). 393 Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119, doi:10.1186/1471-2105-11-119 (2010). 394 Nakamura, T., Yamada, K. D., Tomii, K. & Katoh, K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34, 2490-2492, doi:10.1093/bioinformatics/bty121 (2018). 395 Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313, doi:10.1093/bioinformatics/btu033 (2014). 396 Letunic, I. & Bork, P. 
Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44, W242-245, doi:10.1093/nar/gkw290 (2016). 397 BBTools, ( 398 Davies, J. Inactivation of antibiotics and the dissemination of resistance genes. Science 264, 375-382 (1994). 399 Berendonk, T. U. et al. Tackling antibiotic resistance: the environmental framework. Nat Rev Microbiol 13, 310-317, doi:10.1038/nrmicro3439 (2015). 400 State of the World's Antibiotics, 2015. (Center for Disease Dynamics, Economics & Policy, Washington, D.C., 2015). 401 Park, B. H. & Levy, S. B. The cryptic tetracycline resistance determinant on Tn4400 mediates tetracycline degradation as well as tetracycline efflux. Antimicrob Agents Chemother 32, 1797-1800 (1988).


402 Speer, B. S. & Salyers, A. A. Characterization of a novel tetracycline resistance that functions only in aerobically grown Escherichia coli. J Bacteriol 170, 1423-1429 (1988). 403 Whittle, G., Hund, B. D., Shoemaker, N. B. & Salyers, A. A. Characterization of the 13-kilobase ermF region of the Bacteroides conjugative transposon CTnDOT. Appl Environ Microbiol 67, 3488-3495, doi:10.1128/aem.67.8.3488-3495.2001 (2001). 404 Nonaka, L. & Suzuki, S. New Mg2+-dependent oxytetracycline resistance determinant tet 34 in Vibrio isolates from marine fish intestinal contents. Antimicrob Agents Chemother 46, 1550-1552 (2002). 405 Ghosh, S., Sadowsky, M. J., Roberts, M. C., Gralnick, J. A. & LaPara, T. M. Sphingobacterium sp. strain PM2-P1-29 harbours a functional tet(X) gene encoding for the degradation of tetracycline. J Appl Microbiol 106, 1336-1342, doi:10.1111/j.1365-2672.2008.04101.x (2009). 406 Forsberg, K. J., Patel, S., Wencewicz, T. A. & Dantas, G. The tetracycline destructases: a novel family of tetracycline inactivating enzymes. Chemistry and Biology 22, 888-897 (2015). 407 Hornak, V., Okur, A., Rizzo, R. C. & Simmerling, C. HIV-1 protease flaps spontaneously close to the correct structure in simulations following manual placement of an inhibitor into the open state. J Am Chem Soc 128, 2812-2813, doi:10.1021/ja058211x (2006). 408 Pesnot, T., Jorgensen, R., Palcic, M. M. & Wagner, G. K. Structural and mechanistic basis for a new mode of glycosyltransferase inhibition. Nat Chem Biol 6, 321-323, doi:10.1038/nchembio.343 (2010). 409 Cazalet, C. et al. Analysis of the Legionella longbeachae genome and transcriptome uncovers unique strategies to cause Legionnaires' disease. PLoS Genet 6, e1000851, doi:10.1371/journal.pgen.1000851 (2010). 410 Whiley, H. & Bentham, R. Legionella longbeachae and legionellosis. Emerg Infect Dis 17, 579-583, doi:10.3201/eid1704.100446 (2011). 411 Ballou, D. P., Entsch, B. & Cole, L. J. 
Dynamics involved in catalysis by single-component and two-component flavin-dependent aromatic hydroxylases. Biochem Biophys Res Commun 338, 590-598, doi:10.1016/j.bbrc.2005.09.081 (2005). 412 van Berkel, W. J., Kamerbeek, N. M. & Fraaije, M. W. Flavoprotein monooxygenases, a diverse class of oxidative biocatalysts. J Biotechnol 124, 670-689, doi:10.1016/j.jbiotec.2006.03.044 (2006). 413 Gatti, D. L. et al. The mobile flavin of 4-OH benzoate hydroxylase. Science 266, 110-114 (1994). 414 Massey, V. Activation of molecular oxygen by flavins and flavoproteins. J Biol Chem 269, 22459-22462 (1994). 415 Volkers, G. et al. Putative dioxygen-binding sites and recognition of tigecycline and minocycline in the tetracycline-degrading monooxygenase TetX. Acta Crystallogr D Biol Crystallogr 69, 1758-1767, doi:10.1107/s0907444913013802 (2013).


416 Liu, L. K. et al. The Structure of the Antibiotic Deactivating, N-hydroxylating Rifampicin Monooxygenase. J Biol Chem 291, 21553-21562, doi:10.1074/jbc.M116.745315 (2016). 417 Gibson, M., Nur-e-alam, M., Lipata, F., Oliveira, M. A. & Rohr, J. Characterization of kinetics and products of the Baeyer-Villiger oxygenase MtmOIV, the key enzyme of the biosynthetic pathway toward the natural product anticancer drug mithramycin from Streptomyces argillaceus. J Am Chem Soc 127, 17594-17595, doi:10.1021/ja055750t (2005). 418 Wang, P., Bashiri, G., Gao, X., Sawaya, M. R. & Tang, Y. Uncovering the enzymes that catalyze the final steps in oxytetracycline biosynthesis. J Am Chem Soc 135, 7138-7141, doi:10.1021/ja403516u (2013). 419 Yuen, P. H. & Sokoloski, T. D. Kinetics of concomitant degradation of tetracycline to epitetracycline, anhydrotetracycline, and epianhydrotetracycline in acid phosphate solution. J Pharm Sci 66, 1648-1650 (1977). 420 Liu, F. & Myers, A. G. Development of a platform for the discovery and practical synthesis of new tetracycline antibiotics. Curr Opin Chem Biol 32, 48-57, doi:10.1016/j.cbpa.2016.03.011 (2016). 421 Lienhart, W. D., Gudipati, V. & Macheroux, P. The human flavoproteome. Arch Biochem Biophys 535, 150-162, doi:10.1016/j.abb.2013.02.015 (2013). 422 Merriam, J. J., Mathur, R., Maxfield-Boumil, R. & Isberg, R. R. Analysis of the Legionella pneumophila fliI gene: intracellular growth of a defined mutant defective for flagellum biosynthesis. Infect Immun 65, 2497-2501 (1997). 423 Feeley, J. C. et al. Charcoal-yeast extract agar: primary isolation medium for Legionella pneumophila. J Clin Microbiol 10, 437-441 (1979). 424 Berger, K. H. & Isberg, R. R. Two distinct defects in intracellular growth complemented by a single genetic locus in Legionella pneumophila. Mol Microbiol 7, 7-19 (1993). 425 Sexton, J. A. et al. The Legionella pneumophila PilT homologue DotB exhibits ATPase activity that is critical for intracellular growth. 
J Bacteriol 186, 1658-1666 (2004). 426 Kabsch, W. XDS. Acta Crystallogr D Biol Crystallogr 66, 125-132, doi:10.1107/s0907444909047337 (2010). 427 Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67, 235-242, doi:10.1107/s0907444910045749 (2011). 428 Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60, 2126-2132, doi:10.1107/s0907444904019158 (2004). 429 Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66, 213-221, doi:10.1107/s0907444909052925 (2010). 430 Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66, 12-21, doi:10.1107/s0907444909042073 (2010).

431 Berenbaum, M. C. A method for testing for synergy with any number of agents. J Infect Dis 137, 122-130 (1978). 432 Morelli, L. Postnatal development of intestinal microflora as influenced by infant nutrition. J Nutr 138, 1791s-1795s, doi:10.1093/jn/138.9.1791S (2008). 433 Thomas, D. W. & Greer, F. R. Probiotics and prebiotics in pediatrics. Pediatrics 126, 1217-1231, doi:10.1542/peds.2010-2548 (2010). 434 Speer, B. S. & Salyers, A. A. Novel aerobic tetracycline resistance gene that chemically modifies tetracycline. J Bacteriol 171, 148-153 (1989).
