Whole genome comparison of 1,803 bacteria: An analysis of genetic relatedness

and species-specific antibiotic target identification.

A dissertation presented

by

Anthony Bissell

to

The Department of Biology

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the field of Biology

Northeastern University

Boston, Massachusetts

October 28, 2013

© 2013

Anthony Bissell

ALL RIGHTS RESERVED

Whole genome comparison of 1,803 bacteria: An analysis of genetic relatedness

and species-specific antibiotic target identification.

by

Anthony Bissell

ABSTRACT OF DISSERTATION

Submitted in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy in Biology

in the College of Science of

Northeastern University

October 28, 2013

ABSTRACT

The primary method of identifying and classifying novel bacteria is comparison of 16S ribosomal RNA gene sequences. To better understand the relationship between 16S gene sequence relatedness and similarity of gene content between bacteria, we conducted all pairwise comparisons of all predicted open reading frames (ORFs) from 1,803 fully sequenced bacteria. Previously, mean amino acid identity (AAI) of shared gene content has been used to characterize bacterial genome relatedness. However, because AAI only accounts for orthologous genes, it does not account for the major differences in pan-genomes or unique genes; nor does it distinguish organisms with relatively small genomes (such as Mycoplasma). To fully characterize the relationship between bacterial genomes, we developed a novel metric, termed the genome similarity index (GSI). Simply, a comparison with a GSI of g has g percent of ORFs with at least g percent similarity. Unexpectedly, we find that the range of the 95th percentile of GSI of all bacterial relationships increased with 16S percent identity, indicating inconsistencies between 16S percent identity and genome similarity. At a 16S percent identity of 97%, genome similarity ranged from 49% to 100%. In addition, we analyzed the relationship between current biological taxonomic classifications (phylum, class, order, family, genus, and species) and the GSI, AAI, and 16S percent identity. With every metric we examined, there was overlap in the distribution of intra-taxonomic comparisons. Finally, the GSI (when compared to 16S percent identity) identified, where AAI did not, a distinct subset of obligate blood-borne pathogens, termed hemotropic Mycoplasma, with relatively low 16S percent identity and GSI compared to the rest of the bacteria.

ACKNOWLEDGEMENTS

I would like to thank my thesis advisor Kim Lewis for providing excellent guidance and support and for the opportunity to work with a great group of scientists for the past seven years at the Antimicrobial Discovery Center (ADC). To my dissertation committee members Christopher Sassetti, Slava Epstein, Amy Spoering, and Ken Coleman; thank you for providing guidance and advice on my thesis project. Special thanks to my committee member Eric Stewart for all the advice, support, friendship, and conversations over many cups of coffee. Thank you to Laura Fleck and Chao Chen, my fellow drug discovery comrades. Thank you to Heather for all the help with M. tuberculosis. Thank you to Katya Gavrish for all your help and advice over the years.

Thank you to Iris Keren for the brilliant discussions and critiques of my research plans.

Thank you to the people at NovoBiotic: Dallas Hughes, Aaron Peoples, Losee Ling, Theresa Farrell, and Ashley Zullo; your help and advice have been invaluable. Thank you to Bill Sheehan for the assistance with the Northeastern University Linux Cluster. Thank you to all the members of the ADC, past and present, for making the lab a wonderful place to do science. To my parents, Mom and Errol, and my siblings, Nick and Sam: I sincerely appreciate all the love and support through all of my endeavors. To my wife Thea, I am forever grateful for your unending love, encouragement, advice, and support during the past three years!

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1: Introduction
    Whole genome sequence comparison of bacteria
    Identification of species-specific antimicrobial targets
    Dissertation Aims
Chapter 2: Pairwise comparison of bacterial genomes
    Introduction
    Results
        Genomic relatedness
        Taxonomic classification
        Neighbor-joining trees
        Single strain analyses
    Discussion
    Materials and Methods
        Determination of genetic relatedness
        classification
        Taxonomy neighbor-joining trees
        Density scatterplot visualization
Chapter 3: Identification of species-selective antibiotic targets in tuberculosis
    Introduction
    Results
        Identification of Mycobacterium-selective targets
    Discussion
    Materials and Methods
        Comparison of M. tuberculosis to active and inactive strains
        Annotation of essential genes in M. tuberculosis
        Identification of Mycobacterium-selective targets
REFERENCES
SUPPLEMENTAL DATA
APPENDIX I: Platforms for drug discovery
    Introduction
    Small molecule potentiates killing of Mycobacterium tuberculosis by
        Introduction
        Screen development
        Pilot screen
        References
    A screen for species-selective compounds
        Pilot screen
        Results and discussion
        References
APPENDIX II: Lassomycin, a lasso peptide, kills Mycobacterium tuberculosis by targeting the ATP-dependent protease ClpC1P1P2

LIST OF FIGURES

Figure 1-1. Whole genome comparison of Winogradskyella thalassocola and the closest (by 16S percent identity) sequenced relative Flavobacteriales bacterium.

Figure 1-2. Timeline of discovery of FDA approved antimicrobial classes.

Figure 2-1. Comparisons of genome similarity index and mean amino acid identity.

Figure 2-2AB. Bacterial comparisons of 16S percent identity plotted against GS index (A) and mean amino acid identity (B).

Figure 2-3A. Cumulative 95th percentile for GSI and AAI.

Figure 2-3B. Range of cumulative 95th percentile for GSI and AAI.

Figure 2-4AB. Comparison of 16S percent identity and GS index (A) or mean amino acid identity (B) of all Mycoplasma to the complete library of bacteria.

Figure 2-5AB. Subset of seven hemotropic Mycoplasma comparing GSI (A) and mean amino acid identity (B) and 16S percent identity to the library of bacterial genome sequences.

Figure 2-6ABC. Distribution of GS index (A), mean amino acid identity (B), and 16S percent identity (C) of intra taxonomy classification comparisons binned at one percent intervals.

Figure 2-7. Neighbor-joining tree of GSI analysis.

Figure 2-8. Pruned neighbor-joining tree of GSI analysis.

Figure 2-9. Neighbor-joining tree of AAI analysis.

Figure 2-10. Pruned neighbor-joining tree of AAI analysis.

Figure 2-11. Neighbor-joining tree of 16S percent identity.

Figure 2-12. Pruned neighbor-joining tree of 16S percent identity.

Figure 2-13ABC. Distribution of GS index (A), mean amino acid identity (B), and 16S percent identity (C) of intra taxonomy classification comparisons binned at one percent intervals.

Figure 3-1. Relationship of best match proteins in active and inactive organisms.

Figure 3-2. Labeled best match proteins in active and inactive organisms.

Figure I-1. Time-dependent fluorescence of M. tuberculosis constitutively expressing mCherry.

Figure I-2. Fluorescence of stationary M. tuberculosis constitutively expressing mCherry treated with increasing concentrations of rifampicin.

Figure I-3. Diagram of rifampicin potentiation screen protocol.

Figure I-3. Structures of rifampicin potentiator 70D2 or acetylisoniazid (left) and Isoniazid (right).

Figure I-1. Results of the species-selective screen overlaid on the taxonomic relationship of screened organisms.

Figure II-1A. The amino acid sequence and post-translational modifications of lassomycin.

Figure II-1B. Backbone structure of lassomycin.

Figure II-1C. The lassomycin biosynthetic

Figure II-2. Time-dependent killing of M. tuberculosis mc²6020 by lassomycin.

Figure II-3. Sequences of the two N-terminal repeat regions of ClpC1 of mutants resistant to lassomycin.

Figure II-4. Lassomycin dramatically increases ATPase activity of ClpC1.

Figure II-5. Stimulation of ClpC1 ATPase activity by lassomycin is highly specific.

Figure II-6. Mutant ClpC1 model variants yield fewer binding positions for the region of the lassomycin-resistant mutations than the wild-type model (WT).

Figure II-7. An image of a docking result.

Figure II-S1. Ramachandran plot of the dihedral angles solution structures for lassomycin.

Figure II-S2. 2D-[1H–15N]-HSQC spectrum of lassomycin.

Figure II-S3. Surface representation of lassomycin.

Figure II-S4. Backbone overlay of the 20 lowest energy conformers of lassomycin.

Figure II-S5. Pairwise sequence alignment of Corynebacterium glutamicum ClpC (CgClpC) and Mycobacterium tuberculosis ClpC1 N-terminal protein sequences (residues 1-144).

Figure II-S6. Quality evaluation parameters of the Mtb ClpC1 homology model generation.

Figure II-S7. Visualizations of the ClpC N-terminus.

Figure II-S8. Torsion tree of flexible form of lassomycin.

Figure II-S9. MolProbity clashscore values of the mutant models with different conformations of the mutated residue.

Figure II-S10. MALDI FTICR spectrum of lassomycin.

LIST OF TABLES

Table 2-1. Number of representatives (reps.) per species in genome database.

Table 3-1. List of Mycobacteria-selective antibiotic targets.

Table S-1. List of species and number of representatives.

Table I-1. Hit rate of species-selective screen.

Table II-1. Lassomycin activity against Mtb.

Table II-S1. MALDI FTICR and MALDI TOF/TOF analysis of partially hydrolyzed lassomycin.

Table II-S2. Experimental parameters used to acquire NMR spectra on [13C, 15N]lassomycin to obtain chemical shift assignments, coupling constants, and NOE restraints.

Table II-S3. 1H Chemical shift assignments of lassomycin.

Table II-S4. Nitrogen and carbon chemical shift assignments of lassomycin.

Table II-S5. Structural statistics of the solution structure of lassomycin.

Chapter 1: Introduction

Whole genome sequence comparison of bacteria.

Microbial taxonomy has three primary concerns: identification, classification, and construction of the tree of life. Carl Woese's pioneering use of the 16S ribosomal RNA (rRNA) sequence as a tool to identify bacteria (Woese, 1977), together with the development of universal primers (Weisberg, 1991), has allowed for the rapid identification of unknown organisms and the classification of novel species. Given the past impracticality of sequencing whole genomes, the 16S rRNA sequence was a simple and elegant solution to bacterial identification. Recent advances in sequencing methods, diminishing costs, greater computational resources, and growing databases of fully sequenced organisms now make a move toward more comprehensive whole-genome analyses possible.

In previous research by the Lewis lab, several organisms were identified that, though similar according to 16S percent identity, showed drastic differences between their whole genome sequences. In one case, an organism we called Winogradskyella thalassocola KLE1078 (D’Onofrio, 2010) shared a 16S sequence similarity of 96.7% with the closest fully sequenced organism, Flavobacteriales bacterium ALC-1. The genome sequence comparison told a very different story (Fig. 1-1). Many of the predicted open reading frames (ORFs) were unique to W. thalassocola (more than 2,000). Of the matches that were found, there was a bimodal distribution in protein similarity: one mode at 80 percent identity and another at 25 percent identity. This implied a chimeric genome: one portion highly similar to the Flavobacteriales, a second portion distantly related to another organism, and a third portion of proteins unique to Winogradskyella.

Figure 1-1. Whole genome comparison of Winogradskyella thalassocola and the closest (by 16S percent identity) sequenced relative Flavobacteriales bacterium. Whole genomes were compared using the online tool, RAST (Aziz, 2008). (Analysis done by Eric J. Stewart).

Given the ubiquitous use of 16S sequence similarity as a tool for identification and classification of bacteria, it would be beneficial to know to what extent 16S percent identity characterizes genetic relatedness and to have a tool to characterize whole genome relatedness. Several attempts have been made to analyze the relationship between whole genome sequences and 16S percent identity (Huynen, 1999; House, 1999; Perriere, 2002; Lee, 2004; Konstantinidis, 2005). Because of computational limitations and the limited availability of complete sequences, these analyses were restricted to a few hundred sequences. The recent dramatic decrease in the cost of sequencing has increased the number of publicly available fully sequenced organisms. Using a large number of those sequences, I developed and optimized a platform to conduct pairwise comparisons of the publicly available fully sequenced bacterial genomes.

Identification of species-specific antimicrobial targets.

Diminishing returns following the golden era of antibiotic drug discovery (1940s to 1960s, Fig. 1-2) have led to the rise in multidrug resistant (MDR) and extensively drug resistant (XDR) bacteria (Swartz, 1994; Salyers, 1997; Magiorakos, 2012). This major public health threat is epitomized by the spread of the ESKAPE organisms (Enterococcus spp., Staphylococcus aureus, Klebsiella spp., Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.) (Boucher, 2009). Novel approaches are needed to identify new classes of antimicrobials and antibiotic targets (Barton, 2006; Hamad, 2010; Crunkhorn, 2013; Lewis, 2013).

Figure 1-2. Timeline of discovery of FDA approved antimicrobial classes (adapted from Lewis, 2013). Antibiotic classes are placed on the timeline and annotated with the year of discovery of the first member of the class according to publication records.

Traditionally, antibiotic discovery has focused on broad-spectrum activity, and compounds with insufficient spectrum were discarded as undesirable. After overmining and repeated rediscovery of known compounds, the pool of broad-spectrum antibiotics is thought to have been exhausted. By turning our focus back to discovering and optimizing narrow spectrum antibiotics, we hope to identify novel targets and novel classes of antimicrobials, and to breathe new life into the field.

Recent developments in sequencing technologies, accompanied by a decrease in the cost of fully sequencing a genome, have resulted in the expansion of curated libraries of fully sequenced bacterial strains (Kisand, 2013). Several attempts have been made to use bioinformatics approaches to identify novel antibiotics and antibiotic targets (Moir, 1999; Chan, 2002; Projan, 2003; Alksne, 2008). Although drugs that inhibit essential enzymes identified by virtual screening or other computational methods have shown in vitro activity (Kitchen, 2004; McInnes, 2007; Schneider, 2010), candidate drugs have failed to reach the market because of poor penetration and human cell toxicity, among other reasons. Combining the search for narrow spectrum antibiotics with in silico drug discovery offers a new opportunity for rational drug identification.

Dissertation Aims

The objective of this study was to use pairwise bacterial genome sequence comparisons to investigate the relationship between 16S ribosomal RNA (rRNA) sequence similarity and genetic relatedness of bacteria, and to develop a method to identify potential species-selective antimicrobial targets. I developed a platform on the Northeastern University Linux cluster, named Venture, to conduct, in parallel, pairwise comparisons of predicted open reading frames (ORFs) from bacteria using standalone BLASTp. Genome sequences of 1,803 fully sequenced bacteria were obtained from the National Center for Biotechnology Information (NCBI) (Sayers, 2009). A comparison database was then constructed for every bacteria-bacteria comparison (~3.25 million), recording for each query organism ORF the best percent match ORF from the subject organism. The 16S rRNA sequence was extracted by annotation from each organism and compared to the 16S rRNA sequence of every other organism using the ClustalW sequence alignment software. For organisms with multiple 16S sequences, the longest sequence was used.
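The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline that ran on the cluster: the function names `longest_16s` and `comparison_counts` and the toy sequences are hypothetical, and ties between equally long 16S sequences are broken arbitrarily (first one wins).

```python
from itertools import combinations


def longest_16s(sequences):
    """Pick the longest 16S rRNA sequence annotated for one organism."""
    return max(sequences, key=len)


def comparison_counts(n_genomes):
    """Count ordered (query, subject) pairs and unordered pairs, excluding self."""
    ordered = n_genomes * (n_genomes - 1)
    unordered = ordered // 2
    return ordered, unordered


# Toy data (hypothetical): organism name -> annotated 16S sequences.
annotated_16s = {
    "org_a": ["ACGT", "ACGTACGT"],
    "org_b": ["ACGTAC"],
    "org_c": ["AC", "ACG", "A"],
}

# One representative 16S sequence per organism, then every unordered pair.
representative = {name: longest_16s(seqs) for name, seqs in annotated_16s.items()}
pairs = list(combinations(sorted(representative), 2))

print(representative)
print(pairs)
print(comparison_counts(1803))
```

For 1,803 genomes the ordered count is 3,249,006, which agrees with the ~3.25 million bacteria-bacteria comparisons quoted in the text.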

Using the resulting comparison database, several metrics were used to determine genetic relatedness. To distinguish the functional similarity (shared protein content) and organism-specific proteins between two bacteria, it was important that the metric characterize both shared gene content and organism-specific genes. Previously, mean amino acid identity of shared gene content (AAI) has been identified as a method to compare genetic relatedness (Konstantinidis, 2005). Many organisms have large pan-genomes (genes specific to a given species) or strain-specific genes, and AAI does not account for these differences between organisms. For that reason, we developed a novel metric called the genome similarity index (GSI), based on the h-index used to calculate the publication impact of a researcher (Hirsch, 2005): the GSI is equal to g when g percent of all ORFs in the query are at least g percent similar to their best matches in the subject organism. Since the GSI incorporates the similarity of the core essential genes as well as the sometimes large number of unique genes between two organisms, I hypothesized that the GSI would allow for quantification of functional differences between two organisms and allow for an analysis of the relationship between 16S percent identity and genome relatedness.

In parallel, I utilized the database of protein sequence comparisons and publicly available essential gene data to develop a method that would allow for the identification of species-selective essential proteins which could be used as targets for antimicrobials.

Narrow spectrum antimicrobials have largely been ignored or discarded by the pharmaceutical industry as undesirable. However, repeated discovery of known compounds, the emergence of multidrug and extensively drug resistant strains, and reliable methods of infection identification have renewed interest in species-selective or narrow spectrum antibiotics. Since broad spectrum antibiotics hit conserved targets, we hoped that searching for narrow spectrum drugs would identify novel antimicrobials. Mycobacterium tuberculosis is an obligately aerobic respiratory pathogen for which several Mycobacterium-selective compounds with known antimicrobial targets exist (see Appendix II and bedaquiline) and for which publicly available transposon data identify essential genes for in vitro and in vivo growth, making it an ideal test case for a bioinformatics approach to identify potentially selective antibiotic targets. Using M. tuberculosis as a target, I developed and optimized a method that allows for the identification of potentially species-selective antimicrobial targets.

Chapter 2: Pairwise comparison of bacterial genomes

Introduction

Analysis and comparison of the 16S ribosomal sequence is widely accepted by the microbiology community as the main method to identify prokaryotes, compare relatedness, and construct phylogenies (Brenner, 2000). The development of universal primers (Weisberg, 1991) has offered a convenient tool to gain insights into a novel organism: the 16S ribosomal sequence can be quickly amplified and sequenced, the bacterium can be placed into a known taxonomic classification, and the nearest typed and characterized species can be identified. However, previous studies have shown inconsistencies between 16S percent identity and genetic relatedness (Acinas, 2004; Konstantinidis, 2005). DNA-DNA hybridization, GC content deviation, and alignment of concatenated shared genes have also been used as tools to delineate species (Brenner, 2000; Nishida, 2012). DNA-DNA hybridization assays, though informative, are expensive and require specific expertise (Goris, 2007). GC content has been suggested as a metric to differentiate higher level classification (Wayne, 1987), but the GC ratio offers little insight into the specific functional differences between two organisms.

New approaches and the significant decrease in the cost of whole-genome sequencing now make it practical to sequence bacterial genomes and conduct comparisons at the whole-genome level. This has resulted in the curation of comprehensive sequence databases such as the Genome database at the National Center for Biotechnology Information (NCBI), which houses ~2,000 fully sequenced and annotated bacterial genomes for comparisons. NCBI allows access to increasing amounts of data on identified and characterized organisms and their respective genomes. Previous work has shown that mean amino acid identity (AAI) is a robust measure of genetic and evolutionary relatedness between two organisms (Konstantinidis, 2005). AAI is calculated by averaging the best match BLASTp comparisons of all predicted open reading frames (ORFs) between a query and subject bacterial genome. ORFs in the query sequence without a match in the subject sequence are ignored. Therefore, AAI only accounts for shared gene content and fails to distinguish organisms that differ largely in unique genes.
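The blind spot of AAI described above is easy to see in a short Python sketch. The function name, the dictionary layout (query ORF mapped to best-match percent identity, or `None` when BLASTp found no match), and the toy genomes are all hypothetical illustrations rather than the study's data structures.

```python
def mean_amino_acid_identity(best_matches):
    """Average best-match identity over query ORFs that have a match.

    best_matches: dict mapping query ORF id -> percent identity of the best
        subject match, or None when no match was found. Unmatched ORFs are
        ignored, as in the AAI definition.
    """
    matched = [identity for identity in best_matches.values() if identity is not None]
    if not matched:
        return None  # AAI is undefined when nothing is shared
    return sum(matched) / len(matched)


# Two hypothetical queries sharing the same three matched ORFs; the second
# query additionally carries seven ORFs with no counterpart in the subject.
mostly_shared = {"orf1": 90.0, "orf2": 80.0, "orf3": 70.0}
mostly_unique = dict(mostly_shared, **{f"unique{i}": None for i in range(7)})

print(mean_amino_acid_identity(mostly_shared))
print(mean_amino_acid_identity(mostly_unique))
```

Both calls give the same value, because the seven unique ORFs never enter the average; a metric that divides by the total ORF count would separate the two cases.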

In this study, we have assessed the relationship between 16S percent identity and the genetic relatedness of 1,803 fully sequenced bacterial genomes (1,803 × 1,803 / 2 ≈ 1,625,405 bacteria comparisons, resulting in ~9.5 billion protein-protein comparisons). We found 16S percent identity to be only weakly related to genome similarity and that current taxonomic ranks are not sufficient to predict functional similarity between strains. Also, AAI was not able to sufficiently discriminate organisms with largely different genomes. Finally, we analyzed the relationship between bacterial relatedness metrics and current taxonomic classifications.

Results

Genomic relatedness

Pairwise comparisons of 1,803 fully sequenced single-chromosome bacterial genomes were conducted. The bacteria represented 1,112 unique species, with between 1 and 51 strains per species (average of ~1.74 strains per species) and an average of ~3,000 predicted open reading frames (ORFs) per sequence. Organisms with multiple chromosomes were excluded to maintain uniformity and consistency in our analysis. To avoid counting a large number of unique genes simply because the query sequence was significantly larger than the subject sequence, for each comparison we chose the genome that contained fewer ORFs as the query sequence, thus making the comparisons consistent by maximizing the potential similarity metric. To fully characterize the genetic relatedness of two organisms, it was important that we use a method of comparison that not only incorporated functional relatedness (amino acid similarity) but also captured the sometimes large variations in pan-genomes or organism-specific genes. Inspired by the h-index (Hirsch, 2005), developed by Jorge Hirsch to analyze the productivity of a researcher and the scientific impact of their publications, we developed a metric, termed the Genome Similarity Index (GSI), to characterize the genetic relatedness of two organisms. The GSI is the largest percentage g of the predicted open reading frames (ORFs) of the query organism that have at least g percent similarity to the subject organism. In other words, a bacterial comparison with a GSI of g has g percent of the ORFs from the query organism with at least g percent similarity to the best match ORFs in the subject organism.
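The choice of query genome described above can be written as a small helper. This Python sketch is illustrative only: the function name and ORF counts are hypothetical, and resolving a tie in favor of the first genome is an assumption, since the text does not say how equal ORF counts were handled.

```python
def orient_comparison(genome_a, genome_b, orf_counts):
    """Return (query, subject) so that the genome with fewer ORFs is the query.

    orf_counts: dict mapping genome name -> number of predicted ORFs.
    Ties keep genome_a as the query (an assumption for this sketch).
    """
    if orf_counts[genome_a] <= orf_counts[genome_b]:
        return genome_a, genome_b
    return genome_b, genome_a


# Hypothetical ORF counts for a small and a large genome.
orf_counts = {"small_genome": 600, "large_genome": 4200}

print(orient_comparison("large_genome", "small_genome", orf_counts))
print(orient_comparison("small_genome", "large_genome", orf_counts))
```

Either argument order yields the same orientation, so each unordered pair of genomes maps to a single comparison whose denominator is the smaller ORF count.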

First, we compared GSI to AAI (Fig. 2-1). GSI and AAI correlated well with one another. As expected, because AAI does not account for organism-specific genes, GSI was lower than AAI in nearly all of the comparisons. GSI was distributed from 13% to 100%, a wider range than the 27% to 100% of AAI. Finally, a distinct, separate cluster of comparisons was evident among the less similar comparisons (GSI of 20% and AAI of 35%).

Figure 2-1. Comparisons of genome similarity index and mean amino acid identity. Each point represents a bacteria-bacteria comparison and shows the GSI (x-axis) plotted against the AAI (y-axis). The density of the scatterplot is represented by purple, blue, green, yellow, and red in order of increasing density (n = 1,625,405).

We then sought to understand the relationship between 16S percent identity and genetic relatedness. We compared the 16S rRNA sequence similarity of each bacterial comparison to GSI (Fig. 2-2A) and AAI (Fig. 2-2B). The analyses have several similarities. Comparisons with a 16S percent identity of 97% have GSIs ranging from 49% to 100%, revealing organisms that are relatively dissimilar at the genome level despite unexpectedly high 16S sequence similarity.

We were interested to know how GSI and AAI changed as 16S percent identity increased for the majority of the bacterial comparisons. For that reason, we examined the cumulative middle 95th percentile of GSI and AAI at 1 percent intervals of 16S percent identity (Fig. 2-3AB). As 16S percent identity increased, the ranges of GSI and AAI both increased, more so for GSI. As 16S percent identity decreased, the ranges of GSI and AAI decreased for all comparisons, indicating a better correlation with organism dissimilarity. Though the trends were similar, the large range seen for GSI at high 16S percent identity was less pronounced for AAI. This was expected given that AAI is an analysis of shared gene content, which ignores organism-specific genes.
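One way to compute a middle-95-percent range at 1 percent intervals of 16S identity is sketched below in Python. Several choices here are assumptions, because the text does not spell them out: comparisons are grouped into independent whole-percent bins rather than pooled cumulatively, the middle 95 percent is taken as the span between the 2.5th and 97.5th percentiles, and percentiles use the nearest-rank rule. The function names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0-100) of an ascending, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def middle_95_range_by_16s(comparisons):
    """Map each whole-percent 16S bin to (low, high, high - low) of a metric.

    comparisons: iterable of (identity_16s, metric) pairs, both in percent,
        where metric is the GSI or AAI of that genome comparison.
    """
    bins = defaultdict(list)
    for identity_16s, metric in comparisons:
        bins[math.floor(identity_16s)].append(metric)
    result = {}
    for bin_start, values in sorted(bins.items()):
        values.sort()
        low = percentile(values, 2.5)
        high = percentile(values, 97.5)
        result[bin_start] = (low, high, high - low)
    return result


# Toy input: 100 comparisons in the 97 percent bin with metric values 1..100.
toy = [(97.2, value) for value in range(1, 101)]
print(middle_95_range_by_16s(toy))
```

Running the same function once with GSI values and once with AAI values, then plotting the third element of each tuple against the bin, would give curves comparable in spirit to the range plot described above.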

Figure 2-2AB. Bacterial comparisons of 16S percent identity plotted against GS index (A) and mean amino acid identity (B). Each point represents the comparison of two genomes (x-axis) and their 16S ribosomal RNA sequences (y-axis). The density of the scatterplots is represented by purple, blue, green, yellow, and red in order of increasing density (n = 1,625,405).

One of the most noticeable features of the comparison of 16S percent identity to GSI is a distinct cluster of comparisons at low GSI (~20%) and low 16S percent identity (~65%) (Fig. 2-2A).

A closer inspection revealed that several organisms annotated as part of the Mycoplasma genus were responsible for the cluster (Fig. 2-4AB). Mycoplasma are parasites or commensals of humans and other vertebrates, lack a cell wall, and are some of the smallest living cells known to exist (Hutchinson, 1999). The cluster consisted solely of comparisons with one of seven hemotropic Mycoplasma (Candidatus Mycoplasma haemolamae str. Purdue, Mycoplasma wenyonii str. Massachusetts, Mycoplasma suis str. Illinois, Mycoplasma suis KI3806, Mycoplasma haemocanis str. Illinois, Mycoplasma haemofelis Ohio2, and Mycoplasma haemofelis str. Langford 1) as query or subject (Fig. 2-5AB). These seven organisms are obligately bloodborne pathogens infecting cats, dogs, pigs, llamas, and cattle (Neimark, 2001).

The cluster of Mycoplasma was not revealed by the AAI comparison to 16S percent identity (Fig. 2-2B), indicating that the GSI differentiates distinct sets of organisms that AAI does not.

Figure 2-3A. Cumulative 95th percentile for GSI and AAI. The cumulative 95th percentile at intervals of 1 percent was calculated for GSI (red line) and AAI (blue line) and plotted against 16S percent identity (y-axis).

Figure 2-3B. Range of cumulative 95th percentile for GSI and AAI. The range of the cumulative 95th percentile at intervals of 1 percent was calculated for GSI (red line) and AAI (blue line) and plotted against 16S percent identity (x-axis).

Figure 2-4AB. Comparison of 16S percent identity and GS index (A) or mean amino acid identity (B) of all Mycoplasma to the complete library of bacteria. Each point represents the genetic relatedness metric of a bacteria-bacteria comparison (x-axis) plotted against the 16S percent identity (y-axis). The density of the scatterplots is represented by purple, blue, green, yellow, and red in order of increasing density (n = 90,631).

Figure 2-5AB. Subset of seven hemotropic Mycoplasma comparing GSI (A) and mean amino acid identity (B) and 16S percent identity to the library of bacterial genome sequences. Each point represents a comparison of two bacterial genomes (x-axis) plotted against 16S percent identity (y-axis). The density of the scatterplots is represented by purple, blue, green, yellow, and red in order of increasing density (n = 12,621).

Taxonomic classification

Since 16S percent identity is often used as a tool for taxonomic classification, we were interested to know the relationship between the different relatedness metrics (GSI, AAI, and 16S) and current taxonomic classification. Intra-taxonomic classification comparisons (phylum to phylum, class to class, family to family, etc.) were plotted at one percent intervals. GSI (Fig. 2-6A) and AAI (Fig. 2-6B) showed discernible grouping of phylum, class, order, and species classifications. The family classification distribution was bimodal for both GSI and AAI, indicating that organisms may be misclassified, at least in terms of genomic similarity. Genus classifications extended across the range of GSI and AAI. In all cases, the mode of each classification was lower for GSI than for AAI, indicating that AAI likely overestimates similarity at each level of taxonomic classification.

When 16S percent identity was binned by intra-classification comparisons, the resulting histograms of taxonomic groups ranged from 65% to 100% (Fig. 2-6C), a far narrower range than that of either genome relatedness metric (30% to 100%). The 16S percent identity species histogram exhibits a higher mode (at 100%) than either the GSI or AAI species histogram. The 16S genus histogram has a more distinct peak and a higher mode (~94%) than either genome relatedness metric. The remaining 16S groups (phylum, class, family, and order) have distributions similar to those of the respective groups in the GSI and AAI analyses; however, the 16S distributions overlap more than those of the other metrics.
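The binning of intra-taxonomic comparisons can be sketched as follows. This Python example is a hedged illustration: the names `lowest_shared_rank` and `intra_taxon_histograms` and the toy lineages are hypothetical, and assigning each comparison only to the most specific rank the two organisms share is an assumption, since the text does not state whether a comparison was also counted at every higher shared rank.

```python
import math
from collections import Counter, defaultdict

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def lowest_shared_rank(lineage_a, lineage_b):
    """Most specific rank at which two lineages carry the same taxon name."""
    shared = None
    for rank in RANKS:
        name_a = lineage_a.get(rank)
        if name_a is not None and name_a == lineage_b.get(rank):
            shared = rank
        else:
            break
    return shared


def intra_taxon_histograms(comparisons, lineages):
    """Per-rank histogram of a metric binned at one percent intervals.

    comparisons: iterable of (organism_a, organism_b, metric_percent).
    lineages: dict mapping organism -> {rank: taxon name}.
    """
    histograms = defaultdict(Counter)
    for org_a, org_b, metric in comparisons:
        rank = lowest_shared_rank(lineages[org_a], lineages[org_b])
        if rank is not None:
            histograms[rank][math.floor(metric)] += 1
    return histograms


# Toy lineages (hypothetical, truncated to three ranks for brevity).
lineages = {
    "a1": {"phylum": "P1", "class": "C1", "order": "O1"},
    "a2": {"phylum": "P1", "class": "C1", "order": "O1"},
    "b1": {"phylum": "P1", "class": "C2", "order": "O9"},
    "z1": {"phylum": "P2", "class": "C7", "order": "O3"},
}
comparisons = [("a1", "a2", 88.6), ("a1", "b1", 41.2), ("a1", "z1", 22.0)]
histograms = intra_taxon_histograms(comparisons, lineages)
print({rank: dict(counts) for rank, counts in histograms.items()})
```

The same function can be applied to GSI, AAI, or 16S percent identity values to obtain one histogram per rank for each metric.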

Figure 2-6ABC. Distribution of GS index (A), mean amino acid identity (B), and 16S percent identity (C) of intra taxonomy classification comparisons binned at one percent intervals. Phylum, class, order, family, genus, and species are indicated by lines of blue, red, green, purple, black, and orange, respectively.

Neighbor-joining trees.

We were interested to know how each of the organisms was related to the rest of the bacteria in the database according to each metric. I constructed a distance matrix for all pairwise comparisons and used it to generate a neighbor-joining tree using the “ape” package in R x64 3.0.1 (R Development Core Team, 2008) for GSI (Fig. 2-7), AAI (Fig. 2-9), and 16S percent identity (Fig. 2-11). To obtain a broad sense of the accuracy of the trees, each leaf (organism) was colored according to its currently annotated taxonomic classification. Branch points were then collapsed on each tree (Fig. 2-8, Fig. 2-10, Fig. 2-12, respectively) based on bootstrapping analysis (fewer than 700 appearances in 1,000 iterations).

For the most part, the trees appear to recapitulate clusters of the known phylogeny according to the phylum annotations. Upon close inspection, all of the trees have small clusters of other phyla interspersed within the larger groups. This may indicate incorrect classification or an insufficient number of organisms in the database to characterize a phylum. The genome comparison trees may represent a more functional relationship between organisms than the 16S percent identity tree.

Interestingly, the seven hemotropic Mycoplasma that made up the distinct cluster in the GSI and 16S percent identity analysis clustered with the other Mycoplasma in the database in the GSI tree, but not in the 16S percent identity tree. This could explain the recent reclassification of these organisms from Haemobartonella sp. to Mycoplasma sp. based on morphology, pathogenic properties, and adherence to host cells (Messick, 2008).

Figure 2-7. Neighbor-joining tree of GSI analysis. A neighbor-joining tree was generated from a distance matrix of the GSI metric using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Figure 2-8. Pruned neighbor-joining tree of GSI analysis. A neighbor-joining tree was generated from a distance matrix of the GSI metric using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Bootstrapping analysis was conducted with 1,000 iterations, and branch points with support below 700 were collapsed. Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Figure 2-9. Neighbor-joining tree of AAI analysis. A neighbor-joining tree was generated from a distance matrix of the AAI metric using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Figure 2-10. Pruned neighbor-joining tree of AAI analysis. A neighbor-joining tree was generated from a distance matrix of the AAI metric using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Bootstrapping analysis was conducted with 1,000 iterations, and branch points with support below 700 were collapsed. Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Figure 2-11. Neighbor-joining tree of 16S percent identity. A neighbor-joining tree was generated from a distance matrix of the 16S percent identity using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Figure 2-12. Pruned neighbor-joining tree of 16S percent identity. A neighbor-joining tree was generated from a distance matrix of the 16S percent identity using the “ape” package from R x64 3.0.1 (R Development Team Core, 2008). Bootstrapping analysis was conducted with 1,000 iterations, and branch points with support below 700 were collapsed. Organisms were colored based on taxonomic classification as annotated on the NCBI Taxonomy site (Sayers, 2009).

Single strain analyses

Not surprisingly, sequenced organisms are most represented by typed laboratory strains. Most of the sequences downloaded from the NCBI database were multiples of the same species (Table 2-1). Of the 1,803 sequences examined, more than half (56% or 1,011 sequences) belonged to just 213 species. The 792 remaining sequences were classified as unique species. Among the organisms with the most sequenced strains, Escherichia coli was the most represented with 51 sequences, Helicobacter pylori had 39 sequences, and Staphylococcus aureus had 29 sequences.

In order to determine whether the presence of multiple strains of common species biased the results, the same taxonomic classification distribution analysis was repeated after reconstituting the database with one randomly chosen representative sequence for each species.
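The reconstitution step can be illustrated with a minimal sketch. It assumes the database is available as a mapping from a genome identifier to a species name; the function name, the identifiers, and the fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random
from collections import defaultdict


def pick_representatives(genome_species, seed=0):
    """Return one randomly chosen genome ID per species.

    genome_species maps a genome identifier to its species name.
    """
    by_species = defaultdict(list)
    for genome_id, species in genome_species.items():
        by_species[species].append(genome_id)
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    # Sort so the draw does not depend on dictionary insertion order.
    return {sp: rng.choice(sorted(ids)) for sp, ids in sorted(by_species.items())}


# Hypothetical identifiers, not taken from the genome database itself.
genomes = {
    "ecoli_strain_1": "Escherichia coli",
    "ecoli_strain_2": "Escherichia coli",
    "saureus_strain_1": "Staphylococcus aureus",
}
reps = pick_representatives(genomes)
```

Species with a single sequenced strain are kept unchanged, so only the over-represented species are thinned.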

Table 2-1. Number of representatives (reps.) per species in genome database. # of Organisms reps. 51 Helicobacter pylori 39 Staphylococcus aureus 29 Listeria monocytogenes, enterica 24 Chlamydia trachomatis 20 19 Corynebacterium pseudotuberculosis 15 Streptococcus suis 14 Buchnera aphidicola, botulinum, Mycobacterium tuberculosis 13 Bacillus cereus, , Streptococcus pyogenes, 12 jejuni, Corynebacterium diphtheriae, Prochlorococcus marinus 11 , 10 Acinetobacter baumannii, animalis, Bifidobacterium longum, Chlamydia psittaci 9 Acetobacter pasteurianus, Bacillus amyloliquefaciens, Bacillus thuringiensis, influenzae, Propionibacterium acnes, , , pallidum 8 Pseudomonas putida, Pseudomonas aeruginosa, Rhodopseudomonas palustris, Shewanella baltica, Synechococcus sp. 7 Alteromonas macleodii, Bacillus anthracis, Carsonella ruddii, Chlamydophila psittaci, casei, lactis, , Streptococcus thermophiles 6 Bacillus subtilis, Blattabacterium sp., Chlamydophila pneumoniae, , , Geobacillus sp., , Mycobacterium sp., Mycoplasma genitalium, Pseudomonas stutzeri, Streptococcus agalactiae, 5 Synechocystis sp., Xanthomonas campestris, Xylella fastidiosa, Zymomonas mobilis Borrelia burgdorferi, Clostridium difficile, Cyanothece sp., Dehalococcoides sp., Desulfovibrio vulgaris, Enterococcus faecalis, Lactobacillus rhamnosus, , Pantoea ananatis, , Pseudomonas fluorescens, 4 Riemerella anatipestifer, Stenotrophomonas maltophilia, Streptococcus mutans, Xanthomonas oryzae, Yersinia pseudotuberculosis pleuropneumoniae, Aggregatibacter actinomycetemcomitans, Amycolatopsis mediterranei, Bacillus megaterium, Bacteroides fragilis, Bifidobacterium bifidum, pertussis, Bordetella bronchiseptica, Borrelia afzelii, Arthromitus sp., Clostridium perfringens, Corynebacterium ulcerans, Corynebacterium glutamicum, Dickeya dadantii, sp., , Lactobacillus delbrueckii, Lactobacillus helveticus, Lactobacillus johnsonii, Lactobacillus reuteri, Lactobacillus plantarum, Mycobacterium bovis, Mycobacterium 
intracellulare, Mycoplasma pneumoniae, Mycoplasma bovis, , 3 Oligotropha carboxidovorans, Paenibacillus polymyxa, Paenibacillus mucilaginosus, , Pseudomonas syringae, Ralstonia solanacearum, Rhizobium leguminosarum, , Shewanella sp., flexneri, Sinorhizobium meliloti, Sinorhizobium fredii, Streptococcus gallolyticus, Thermus thermophilus, Wolbachia endosymbiont, Acidithiobacillus ferrooxidans, Acidovorax sp., Alicycliphilus denitrificans, Alicyclobacillus acidocaldarius, Anaeromyxobacter sp., Anaplasma marginale, Arcobacter butzleri, Bacillus coagulans, Bacillus licheniformis, quintana, , Bradyrhizobium sp., Bradyrhizobium japonicum, Tremblaya princeps, Portiera aleyrodidarum, Caulobacter crescentus, Chlorobium phaeobacteroides, Clavibacter michiganensis, Clostridium sp., Clostridium thermocellum, Clostridium kluyveri, Cronobacter sakazakii, Dehalobacter sp., Desulfitobacterium hafniense, Desulfovibrio desulfuricans, Edwardsiella tarda, Enterococcus faecium, Erwinia amylovora, Erwinia pyrifoliae, Fibrobacter succinogenes, Francisella cf., Geobacter sulfurreducens, Geobacter sp., Gluconacetobacter diazotrophicus, Gluconobacter oxydans, Haemophilus somnus, Helicobacter cetorum, thermophilus, , Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus amylovorus, Lactobacillus acidophilus, Lactobacillus buchneri, Lactococcus garvieae, mesenteroides, Melissococcus plutonius, Methylophaga sp., Mycobacterium leprae, Mycobacterium avium, Mycobacterium gilvum, Mycobacterium smegmatis, , 2 , Mycoplasma suis, , Mycoplasma leachii, Mycoplasma mycoides, Mycoplasma haemofelis, Nitrosomonas sp., , Paenibacillus sp., Pectobacterium carotovorum, Phaeobacter gallaeciensis, Polynucleobacter necessarius, Pseudomonas mendocina, Rahnella aquatilis, Rhodospirillum rubrum, Rhodothermus marinus, Rickettsia massiliae, Rickettsia canadensis, Rickettsia slovaca, Rickettsia bellii, Salinibacter ruber, Secondary endosymbiont, Serratia sp., Shewanella putrefaciens, , , Staphylococcus 
pseudintermedius, Staphylococcus epidermidis, Staphylococcus lugdunensis, Streptococcus equi, Streptococcus salivarius, Streptococcus parasanguinis, Streptomyces cattleya, Sulfobacillus acidophilus, Synechococcus elongatus, Taylorella equigenitalis, Thermoanaerobacter sp., Tropheryma whipplei, Ureaplasma parvum, Wigglesworthia glossinidia, Xanthomonas axonopodis

*See table in supplemental (S-1) for complete list including single representatives.

Figure 2-13ABC. Distribution of GS index (A), mean amino acid identity (B), and 16S percent identity (C) of intra taxonomy classification comparisons binned at one percent intervals. Phylum, class, order, family, and genus are indicated by lines of blue, red, green, purple, and teal, respectively.

Discussion

Pairwise comparison of the genetic relatedness of 1,803 bacteria revealed that a given 16S rRNA sequence similarity can encompass a wide range of genetic similarities, particularly at high 16S percent identity. The extensive genetic differences (using GSI or AAI) indicate that the constraints and selective pressures acting on the 16S rRNA sequence are distinctly different from those acting at the whole-genome level. This is consistent with previous studies (Konstantinidis 2005, Ouzounis 2004); however, our analysis by GSI reveals more profound differences. An analysis of current taxonomic ranks and several relatedness metrics (GSI, AAI, and 16S) reveals that the inconsistencies extend to the current classifications. The broadest distribution of genetic relatedness is found at the genus level.

It is unlikely that the use of the 16S rRNA sequence as the primary method of identifying novel microorganisms will decrease until another tool of similar convenience and cost, and of greater accuracy, is devised. We therefore suggest that databases such as NCBI incorporate a standard analysis that identifies the highest GSI against prokaryotes with fully sequenced genomes. It would be simple to incorporate this computational step into the submission process. It would act as a supplemental mechanism to confirm the identity of the organism based on the nearest neighbor match made by 16S percent identity, and it would provide additional insight into the genetic novelty and ecology of the organism.

In summary, the genome similarity index provides a simple method to discriminate the genetic relatedness of two organisms. We establish here that mean amino acid identity is not sufficient to separate organisms with smaller, degenerate genomes such as Mycoplasma and provides less resolution than GSI. 16S percent identity is a helpful tool for identification and determination of the genetic relatedness of organisms, but the Genome Similarity Index provides functional information.

Materials and Methods

Determination of genetic relatedness. The sequence data for 1,803 bacterial genomes were obtained as annotated GenBank files from the National Center for Biotechnology Information (NCBI) ftp site (ftp://ftp.ncbi.nih.gov/). The highest percent identity match for each open reading frame (ORF), identified by “CDS” annotation, was obtained by BLASTp (standalone Linux version 2.2.27+) for every pairwise genome comparison. An e-value of less than 0.01 was used as the cutoff to indicate a non-random match. ORFs lacking a match, or lacking a match with a sufficiently low e-value, were considered missing from the subject and, consequently, unique to the query bacterium. This analysis was conducted in parallel on an 80-node Linux cluster and took 105 consecutive days of computing. Each comparison was then taken in the direction in which the query had fewer predicted ORFs than the subject organism, in order to avoid forced null matches due to significant differences in genome size. The mean amino acid identity (AAI) was calculated as the average of the BLASTp percent identities of the ORFs shared between the two organisms. The Genome Similarity Index (GSI) was calculated as g when g percent of ORFs shared at least g percent similarity. The 16S ribosomal sequence alignment (16S percent identity) was determined using the ClustalW (version 1.8) sequence alignment application. When organisms contained more than one 16S sequence, the highest match between two organisms was used.

The middle 95% of values (2.5th to 97.5th percentile) was calculated for GSI and AAI.
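The two genome relatedness metrics defined above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis. It assumes each query ORF is represented by its best-match BLASTp percent identity, or by None when no significant match was found, and it reads the GSI definition as the largest g that satisfies it. The function and variable names are hypothetical.

```python
def genome_similarity_index(best_hits):
    """Largest g such that at least g percent of ORFs have identity >= g.

    best_hits holds one entry per query ORF: the best-match percent
    identity, or None when the ORF had no significant match (assumption:
    such ORFs count as 0 percent).
    """
    n = len(best_hits)
    if n == 0:
        return 0.0
    scores = sorted((0.0 if h is None else float(h) for h in best_hits),
                    reverse=True)
    gsi = 0.0
    for rank, identity in enumerate(scores, start=1):
        pct_orfs = 100.0 * rank / n  # percent of ORFs with identity >= this one
        gsi = max(gsi, min(pct_orfs, identity))
    return gsi


def mean_amino_acid_identity(best_hits):
    """Average percent identity over shared (matched) ORFs only."""
    shared = [h for h in best_hits if h is not None]
    return sum(shared) / len(shared) if shared else 0.0


# Toy comparison: five query ORFs, one without a significant match.
hits = [100.0, 90.0, 80.0, 70.0, None]
gsi_value = genome_similarity_index(hits)   # 70.0
aai_value = mean_amino_acid_identity(hits)  # 85.0
```

In this toy case the unmatched ORF lowers the GSI but is ignored by AAI, which is the behavior that lets GSI separate organisms with reduced genomes.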

Taxonomy classification. The taxonomic information for all 1,803 bacteria was downloaded from the NCBI taxonomy website. The information, if annotated, included , phylum, class, order, family, genus, species, subspecies, and strain. Intra-classification comparisons were binned at 1 percent intervals for GSI, AAI, and 16S percent identity, and the distribution for each level of classification was plotted on a line graph. In order to determine bias based on the prevalence of certain species in the database, one representative of each species was chosen at random and the same intra-classification comparisons were conducted.
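The binning of intra-classification comparisons can be sketched as follows. The sketch assumes that a pair of organisms is counted at a given rank whenever both carry the same annotated label at that rank, whatever their lower ranks; the data layout and all names are hypothetical.

```python
from collections import Counter


def intra_rank_histogram(pair_scores, taxonomy, rank):
    """Count pairwise scores in 1 percent bins for pairs sharing a label at `rank`.

    pair_scores maps (organism_a, organism_b) to a percent score (GSI, AAI, or 16S).
    taxonomy maps an organism to a dict of rank -> label; missing ranks are skipped.
    """
    bins = Counter()
    for (a, b), score in pair_scores.items():
        label_a = taxonomy[a].get(rank)
        label_b = taxonomy[b].get(rank)
        if label_a is None or label_a != label_b:
            continue
        bins[int(score)] += 1  # bin k covers [k, k + 1); 100 falls in bin 100
    return bins


# Hypothetical organisms and scores for illustration only.
taxonomy = {
    "org_1": {"phylum": "Firmicutes", "genus": "Bacillus"},
    "org_2": {"phylum": "Firmicutes", "genus": "Bacillus"},
    "org_3": {"phylum": "Firmicutes", "genus": "Staphylococcus"},
    "org_4": {"phylum": "Proteobacteria", "genus": "Escherichia"},
}
pair_scores = {
    ("org_1", "org_2"): 92.4,
    ("org_1", "org_3"): 55.9,
    ("org_2", "org_3"): 55.1,
    ("org_1", "org_4"): 38.0,
}
genus_hist = intra_rank_histogram(pair_scores, taxonomy, "genus")
phylum_hist = intra_rank_histogram(pair_scores, taxonomy, "phylum")
```

Each histogram can then be drawn as one line per rank, as in the distribution figures.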

Taxonomy neighbor-joining trees. A distance matrix was calculated using R (R Development Team Core, 2008) for each of the bacteria relatedness metrics (GSI, AAI, and 16S percent identity). Neighbor-joining trees were constructed using the neighbor-joining function in the “ape” package in R and ladderized for visualization using the “ladderize” function. The archaeon Methanobrevibacter smithii ATCC 35061 was chosen as the outgroup. Leaf labels are the annotated genus, species, and strain names according to NCBI annotation and are colored by phylum according to annotations downloaded from the NCBI taxonomy site. Bootstrap values result from 1,000 iterations using the “boot.phylo” function. Collapsed trees had branch points with bootstrap values below 700 collapsed using the R package “phangorn”. All packages were downloaded using the R package installer.
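The text does not state how each similarity metric was converted into a distance. One simple possibility, shown here purely as an assumption, is to take the distance as 100 minus the percent similarity. The function name and the example values are hypothetical.

```python
def similarity_to_distance(names, pair_similarity):
    """Build a symmetric distance matrix with d = 100 - percent similarity.

    The 100 - similarity conversion is an assumption made for this sketch.
    """
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    dist = [[0.0 if i == j else None for j in range(n)] for i in range(n)]
    for (a, b), similarity in pair_similarity.items():
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = 100.0 - similarity
    if any(value is None for row in dist for value in row):
        raise ValueError("every pair of organisms needs a similarity value")
    return dist


# Hypothetical organisms and GSI values for illustration only.
names = ["org_A", "org_B", "org_C"]
gsi_pairs = {("org_A", "org_B"): 80.0,
             ("org_A", "org_C"): 45.0,
             ("org_B", "org_C"): 50.0}
matrix = similarity_to_distance(names, gsi_pairs)
```

A matrix of this form is the kind of input a neighbor-joining routine expects.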

Density scatterplot visualization. Density scatterplots were created with R (x64 version 3.0.1) (R Development Team Core, 2008) using the smoothScatter function. Plotting was set with an nbin of 1000, and density was represented by a linear series of colors: purple, blue, green, yellow, and red in order of increasing density.

Chapter 3: Identification of species-selective antibiotic targets in Mycobacterium tuberculosis.

Introduction

Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), is a leading cause of death resulting from infectious disease, second only to human immunodeficiency virus (HIV) (WHO, 2012). Despite a consistent decrease over the past decade in the rate of infection, an increase in treatment availability, and the development of novel TB regimens, 9 million new cases and 1.4 million TB deaths are reported each year (WHO, 2012). Multidrug resistant (MDR) and extensively drug resistant (XDR) strains are on the rise (Zignol, 2006; StopTB, 2011), and several cases of strains resistant to all current treatment methods (TDR) have been reported (Velayati, 2013), threatening the progress that has been made in combating this world health challenge. The need for novel therapies is therefore paramount.

After the golden era of antibiotic discovery (1940s to 1950s), rediscovery of known compounds and discovery of narrow spectrum drugs (considered undesirable) caused the pharmaceutical industry to abandon its antibiotic discovery divisions (Lewis, 2013). We hypothesized that the previously discarded narrow spectrum antibiotics offered an opportunity to identify novel antimicrobial targets and novel classes of antibiotics that would be effective against MDR and XDR strains. Screening natural products against M. tuberculosis and counterscreening against Staphylococcus aureus identified a novel antimicrobial with Mycobacterium-selective activity, named lassomycin for the distinctive loop shape of the compound (see Appendix II).

The activity of lassomycin was verified as specific to M. tuberculosis, Mycobacterium smegmatis, and Mycobacterium avium subsp. paratuberculosis. The compound also showed activity in several MDR and XDR M. tuberculosis strains (Table II-1), indicating a potentially novel target. Several other pathogens and symbiotic bacteria were tested for inhibition and no activity was seen (Table II-2). Through whole genome sequence analysis of resistant mutants, the target of the compound was identified as ClpC1, the ATPase subunit of the ATP-dependent protease ClpP1P2.

Having proof of principle that selective antimicrobials could be identified, I hypothesized that a bioinformatics approach could be used to identify other novel anti-Mycobacterial targets. Using the spectrum of activity of lassomycin, comparative analysis of protein sequences between M. tuberculosis and the strains in which lassomycin was active or inactive, and publicly available essential gene data, I developed a platform that could identify other species-selective antimicrobial targets.

Results

Identification of Mycobacterium-selective targets.

I hypothesized that proteins would be selective antimicrobial targets against M. tuberculosis if they met the following criteria: 1) they were essential for growth in M. tuberculosis, 2) their best match proteins were sufficiently similar in the related organisms, and 3) their best match proteins were sufficiently dissimilar, or absent, in the unrelated organisms. The sequence similarity of M. tuberculosis ClpC1 compared to the best match proteins in organisms with and without lassomycin activity was used to define parameters for selectivity. Much of antibiotic resistance in gram-negatives results from a lack of penetration of the outer membrane. For this reason, I only analyzed organisms tested with lassomycin that were gram-positive.

Sequences of type strains for each organism tested with lassomycin were downloaded from the National Center for Biotechnology Information (NCBI). All predicted open reading frames (ORFs) (annotated with “CDS” in the GenBank sequence file) from M. tuberculosis H37Rv were compared using BLASTp (standalone Linux version 2.2.27+) to the ORFs from strains with lassomycin activity (M. smegmatis and M. avium subsp. paratuberculosis) and to the ORFs of several strains in which lassomycin was inactive: Bacillus anthracis, Staphylococcus aureus, Enterococcus faecalis, Klebsiella pneumoniae, Clostridium difficile, Clostridium perfringens, and Bacteroides fragilis. The average percent identity in the active organisms and in the inactive organisms was calculated for each protein (Fig. 3-1).

[Figure 3-1: scatterplot. x-axis: Average Percent Identity (active organisms), 0 to 100; y-axis: Average Percent Identity (inactive organisms), 0 to 80.]

Figure 3-1. Relationship of best match proteins in active and inactive organisms. Each diamond is the average sequence similarity of a predicted open reading frame in M. tuberculosis compared to the best match protein by BLASTp analysis in organisms with lassomycin activity (x-axis) plotted against the average sequence similarity in organisms without lassomycin activity (y-axis). BLASTp e-values are less than 0.0001 for a match to be considered significant.

The first criterion for an antimicrobial target is that targeting it results in diminished growth of the pathogen. Using data published by DeJesus et al., each M. tuberculosis protein was annotated as either essential or non-essential for growth (Figure 3-2). Next, to determine which of the essential proteins could be targeted for selective inhibition of M. tuberculosis, I used protein similarity criteria that would retain ClpC1, the target of lassomycin, in the final list of selective targets. Proteins were labeled as hits if the best match in each of the active organisms (M. smegmatis and M. avium subsp. paratuberculosis) was greater than 90% similar to the M. tuberculosis protein and the best match in every inactive organism was less than 65% similar.
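The hit criteria can be expressed as a short predicate. This is a minimal sketch rather than the code used in the analysis. It assumes each protein carries an essentiality flag plus one best-match percent identity per organism, with None standing for no significant match; the names and example values are hypothetical.

```python
ACTIVE_MIN = 90.0    # best match must exceed this in every active organism
INACTIVE_MAX = 65.0  # best match must stay below this in every inactive organism


def is_selective_hit(essential, active_identities, inactive_identities):
    """Apply the three selectivity criteria to one M. tuberculosis protein.

    Each list holds one best-match percent identity per organism; None
    means no significant BLASTp match was found in that organism.
    """
    if not essential:
        return False
    conserved = all(v is not None and v > ACTIVE_MIN for v in active_identities)
    # A missing match in an inactive organism counts as sufficiently dissimilar.
    divergent = all(v is None or v < INACTIVE_MAX for v in inactive_identities)
    return conserved and divergent


# Hypothetical values for illustration only.
hit = is_selective_hit(True, [95.2, 93.8], [60.1, None, 48.0])
too_conserved = is_selective_hit(True, [95.2, 93.8], [60.1, 71.4, 48.0])
non_essential = is_selective_hit(False, [95.2, 93.8], [60.1, None, 48.0])
```

Because the thresholds are plain constants, the same predicate could be re-run with cutoffs derived from a different control compound.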

Of the 4115 open reading frames, 40 were identified as potential Mycobacteria-selective targets (Table 3-1). The potential targets participate in several different metabolic processes and functions, including DNA synthesis, peptide synthesis, protein degradation, and cell wall synthesis. The largest group of potential targets was ribosomal proteins (12 out of 40); the ribosome is the confirmed target of a large class of antimicrobials, the aminoglycosides.

The most interesting result of this method is the identification of AtpE, the C chain of an ATP synthase in M. tuberculosis (Haagsma, 2009). Recently, a novel antimicrobial named bedaquiline (also known as TMC207) was found to have Mycobacterium-selective activity targeting AtpE and was approved shortly afterward by the FDA to treat drug-resistant TB (Edney, 2012).

[Figure 3-2: scatterplot. x-axis: Average Percent Identity (active organisms), 0 to 100; y-axis: Average Percent Identity (inactive organisms), 0 to 80.]

Figure 3-2. Labeled best match proteins in active and inactive organisms. Each point is labeled as follows: non-essential genes (purple cross and red square), essential genes (green triangle and blue diamond), hits (blue diamond and red square), and non-hits (purple cross and green triangle). Hits are defined as having a match by BLASTp analysis above 90% similarity in all active organisms and below 65% similarity in all inactive organisms. Non-hits are defined as having less than 90% similarity in any active organism or greater than 65% similarity in any inactive organism. BLASTp e-values are less than 0.0001 for a match to be considered significant.

Table 3-1. List of Mycobacteria-selective antibiotic targets.

Gene    Predicted protein function                                              Drugs known to target protein
atpE    Probable ATP synthase C chain                                           Bedaquiline
clpB    Probable endopeptidase ATP binding protein (chain B)
clpC1   Probable ATP-dependent protease ATP-binding subunit                     Lassomycin
clpP1   Probable ATP-dependent CLP protease proteolytic subunit 1               Acyldepsipeptide
clpP2   Probable ATP-dependent CLP protease proteolytic subunit 2               Acyldepsipeptide
dnaK    Probable chaperone protein
dop     Deamidase of pup
ftsE    Putative cell division ATP-binding protein
garA    Conserved protein with FHA domain
gcpE    Probable GcpE protein
groEL2  60 kDa chaperonin 2 GroEL2
guaB2   Probable inosine-5'-monophosphate dehydrogenase
guaB3   Probable inosine-5'-monophosphate dehydrogenase
gyrA    DNA gyrase (subunit A)                                                  Ciprofloxacin
hupB    DNA-binding protein HU homolog
infC    Probable initiation factor if-3
mtrA    Two component sensory transduction transcriptional regulatory protein
murA    UDP-N-acetylglucosamine 1-carboxyvinyltransferase                       Fosfomycin (Resistant in Mtb)
pafA    Proteasome accessory factor
prrA    Two component response transcriptional regulatory protein
relA    Probable GTP pyrophosphokinase
ribA2   Probable riboflavin biosynthesis protein
rplB    50S ribosomal protein L2
rplP    50S ribosomal protein L16
rplQ    50S ribosomal protein L17
rplT    50S ribosomal protein L20
rpoA    Probable DNA-directed RNA polymerase
rpoC    DNA directed RNA polymerase (beta chain)
rpsA    30S ribosomal protein S1
rpsB    30S ribosomal protein S2
rpsC    30S ribosomal protein S3
rpsE    30S ribosomal protein S5
rpsF    30S ribosomal protein S6
rpsH    30S ribosomal protein S8
ruvB    Probable holliday junction DNA helicase
Rv1461  Hypothetical protein
Rv2050  Hypothetical protein
sdhA    Probable succinate dehydrogenase
snzP    Possible pyridoxine biosynthesis protein
whiA    Probable transcriptional regulatory protein

*When compared to the best match protein by BLASTp in M. tuberculosis, proteins are >90% similar in M. avium subsp. paratuberculosis and M. smegmatis and <65% similar in B. anthracis, S. aureus, E. faecalis, K. pneumoniae, C. difficile, C. perfringens, and B. fragilis. E-values are less than 0.001 for a match to be considered significant.

Discussion

The growing availability of next-generation sequencing data for comparative whole-genome analysis makes computational drug discovery more appealing and feasible. By comparing the essential predicted open reading frames in a pathogen of interest to those of close relatives and counterscreening for dissimilarity to organisms in which activity is undesired, I have developed a platform to identify essential, organism-specific antibiotic targets. Using this method, I have assembled a list of Mycobacterial-selective antimicrobial targets.

One of the key insights in my method came from the discovery of a highly Mycobacterium-selective antimicrobial, lassomycin. The relationship between the predicted target of lassomycin in M. tuberculosis (the ATPase subunit ClpC1) and its best match proteins in the active and inactive organisms was used as a control case to identify cutoffs for protein specificity in Mycobacterium. The method further relies on the availability of essential gene data that has been characterized for M. tuberculosis.

Several unexpected genes were in the list of Mycobacterial-selective targets. GyrA, the target of the broad spectrum antibiotic ciprofloxacin, was identified as having the potential to be selective to Mycobacterium. Studies have shown that clinically tolerable doses of ciprofloxacin, though potently bactericidal, result in a rapid emergence of resistance. The frequency of resistance is 2 to 3 orders of magnitude higher in M. tuberculosis than in gram-negative pathogens like E. coli (Gumbo, 2005). This suggests that binding of ciprofloxacin to gyrase in M. tuberculosis is easily disrupted by only a few mutations. Moxifloxacin is suggested as an alternative, less resistance-prone quinolone (Shandil, 2007). Optimizing quinolones for narrow spectrum activity may be a viable option to discover new therapeutics. MurA, the target of another broad spectrum antibiotic, fosfomycin, was also identified as a potential selective target. Interestingly, M. tuberculosis is innately resistant to fosfomycin because of key nucleotide differences in the active site of MurA (De Smet, 1999).

Encouragingly, the list of selective antimicrobial targets included AtpE, the target of bedaquiline (previously known as TMC207), a recently FDA-approved Mycobacterium-selective antibiotic that inhibits the Mycobacterial ATP synthase (Haagsma, 2009). ClpP1P2, the ATP-dependent protease of M. tuberculosis, was also identified as a Mycobacteria-selective target. ClpP1P2 is distinctive in Mycobacteria because it is made up of two distinct subunits, ClpP1 and ClpP2, where most other organisms have only one (Akopian, 2012). Many of the targets were also ribosomal proteins, a verified target of many antibiotics (Yonath, 2005). This suggests that the ribosome may offer a viable narrow spectrum target.

This study has demonstrated a bioinformatics approach that successfully identified Mycobacterial-selective antimicrobial targets. Where broad spectrum discovery has waned, narrow spectrum antimicrobials may offer new life and help fulfill the desperate need for novel antibiotics.

Materials and Methods

Comparison of M. tuberculosis to active and inactive strains. Sequences for Mycobacterium tuberculosis (NC_000962), Mycobacterium smegmatis (NC_008596), Mycobacterium avium subsp. paratuberculosis (NC_002944), Bacillus anthracis (NC_003997), Staphylococcus aureus (NC_002745), Enterococcus faecalis (NC_004668), Klebsiella pneumoniae (NC_016845), Clostridium difficile (NC_009089), Clostridium perfringens (NC_003366), and Bacteroides fragilis (NC_006347) were obtained from the National Center for Biotechnology Information (NCBI) website. Every predicted open reading frame (annotated by “CDS”) from M. tuberculosis was compared to every open reading frame from all other sequences by standalone BLASTp, and the best match by highest percent identity was identified. E-values of less than 0.001 were used to indicate that a significant match was found. An average percent identity was then calculated for both the active organisms (M. smegmatis and M. avium subsp. paratuberculosis) and inactive organisms (B. anthracis, S. aureus, E. faecalis, K. pneumoniae, C. difficile, C. perfringens, and B. fragilis).
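The best-match selection described above can be sketched as follows. The sketch assumes the BLASTp results were written in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score); the text does not state the output format that was actually used, and the identifiers below are hypothetical.

```python
import csv
import io


def best_identity(tabular_text, max_evalue=0.001):
    """Map each query ID to its highest percent identity among significant hits.

    Assumes 12-column tab-separated BLAST output, where column 3 is the
    percent identity and column 11 is the e-value.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident, evalue = row[0], float(row[2]), float(row[10])
        if evalue < max_evalue and pident > best.get(query, -1.0):
            best[query] = pident
    return best


# Hypothetical hits for two query ORFs against one subject genome.
example = (
    "mtb_orf1\tsubj_orf7\t94.10\t848\t50\t0\t1\t848\t1\t848\t0.0\t1500\n"
    "mtb_orf1\tsubj_orf9\t45.00\t300\t165\t3\t10\t309\t5\t304\t2e-60\t210\n"
    "mtb_orf2\tsubj_orf3\t30.00\t50\t35\t0\t1\t50\t1\t50\t0.5\t20\n"
)
best = best_identity(example)
```

Queries absent from the returned mapping, such as the second ORF here, had no match below the e-value cutoff.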

Annotation of essential genes in M. tuberculosis. Each M. tuberculosis gene was annotated as essential or non-essential using a list of essential and non-essential genes downloaded from the supplemental information (Table S1) of “Bayesian Analysis of Gene Essentiality based on Sequencing of Transposon Insertion Libraries” (Loerger, 2013).

Identification of Mycobacterium-selective targets. The best match proteins for M. tuberculosis ClpC1 in the organisms with lassomycin activity were greater than 90% similar. The best match proteins for ClpC1 in the inactive organisms were all less than 65% similar. These values were used as the cutoffs for labeling essential proteins as hits.

REFERENCES

Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF. (2004) Divergence and Redundancy of 16S rRNA Sequences in Genomes with Multiple rrn Operons. J Bacteriol. 186(9):2629-35.

Alksne LE, Dunman PM. (2008) Target-based antimicrobial drug discovery. Methods Mol Biol. 431:271-83.

Akopian T, Kandror O, Raju RM, Unnikrishnan M, Rubin EJ, Goldberg AL. (2012) The active ClpP protease from M. tuberculosis is a complex composed of a heptameric ClpP1 and a ClpP2 ring. EMBO J. 31(6):1529-41.

Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. (2008) The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 9(75).

Barton S. (2006) New antibiotic on the horizon? Nat Rev Drug Discov. 5:539.

Boucher HW, Talbot GH, Bradley JS, Edwards JE, Gilbert D, Rice LB, Scheld M, Spellberg B, Bartlett J. (2009) Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis. 48(1):1-12.

Brenner D, Staley J, and Krieg N. (2000) Classification of prokaryotic organisms and the concept of bacterial speciation, p. 27-31. In D. R. Boone, R. W. Castenholz, and G. M. Garrity (ed.), Bergey's manual of systematic bacteriology, 2nd ed., vol. 1. Springer-Verlag, New York, N.Y.

Chan PF, Macarron R, Payne DJ, Zalacain M, Holmes DJ. (2002) Novel antibacterials: a genomics approach to drug discovery. Curr Drug Targets Infect Disord. 2(4):291-308.

Crunkhorn S. (2013) Antibacterial drugs: New antibiotics on the horizon? Nat Rev Drug Discov. 12(2):99.

D'Onofrio A, Crawford JM, Stewart EJ, Witt K, Gavrish E, Epstein S, Clardy J, Lewis K. (2010) Siderophores from neighboring organisms promote the growth of uncultured bacteria. Chem Biol. 17(3):256-64.

Daubin, V., M. Gouy, and G. Perriere. (2002) A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res. 12:1080-1090.

De Smet KA, Kempsell KE, Gallagher A, Duncan K, Young DB. (1999) Alteration of a single amino acid residue reverses fosfomycin resistance of recombinant MurA from Mycobacterium tuberculosis. Microbiology. 145:3177-84.

Edney A. (2012) J&J Sirturo Wins FDA Approval to Treat Drug-Resistant TB. Bloomberg.

Fitz-Gibbon, S. T., and C. H. House. (1999) Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27:4218-22.

Gumbo T, Louie A, Deziel MR, Drusano GL. (2005) Pharmacodynamic Evidence that Ciprofloxacin Failure against Tuberculosis Is Not Due to Poor Microbial Kill but to Rapid Emergence of Resistance. Antimicrob Agents Chemother. 49(8):3178-81.

Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 57:81-91.

Haagsma AC, Abdillahi-Ibrahim R, Wagner MJ, Krab K, Vergauwen K, Guillemont J, Andries K, Lill H, Koul A, and Bald D. (2009) Selectivity of TMC207 towards Mycobacterial ATP Synthase Compared with That towards the Eukaryotic Homologue. Antimicrob Agents Chemother. 53(3):1290-92.

Hamad B. (2010) The antibiotics market. Nat Rev Drug Discov. 9:675-76.

Hirsch JE. (2005) An index to quantify an individual's scientific research output. Proc Natl Acad Sci. 102(46):16569-72.

Hutchison CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO, Venter JC. (1999) Global transposon mutagenesis and a minimal Mycoplasma genome. Science. 286(5447):2165-69.

Hong SH, Kim TY, Lee SY. (2004) Phylogenetic analysis based on genome-scale metabolic pathway reaction content. Appl Microbiol Biotechnol. 65:203-210.

Kisand V and Lettieri T (2013) Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools. BMC Genomics. 14:211.

Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov. 3:939-49.

Konstantinidis KT, and Tiedje JM (2005) Towards a genome-based taxonomy for prokaryotes. J Bacteriol. 187(18):6258-64.

Lewis K. (2013) Platforms for antibiotic discovery. Nat Rev Drug Discov. 12:371-87.

Magiorakos AP, Srinivasan A, Carey RB, Carmeli Y, Falagas ME, Giske CG, Harbarth S, Hindler JF, Kahlmeter G, Olsson-Liljequist B, Paterson DL, Rice LB, Stelling J, Struelens MJ, Vatopoulos A, Weber JT, Monnet DL. (2012) Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance. Clin Microbiol Infect. 18(3):268-81.

McInnes C. (2007) Virtual screening strategies in drug discovery. Curr Opin Chem Biol. 11(5):494-502.

Messick JB. (2008) Hemotrophic mycoplasmas (hemoplasmas): a review and new insights into pathogenic potential. Veterinary Clinical Pathology. 33(1):2-13.

Moir DT, Shaw KJ, Hare RS, and Vovis GF (1999) Genomics and Antimicrobial Drug Discovery. Antimicrob Agents Chemother. 43(3):439-46.

Neimark H, Johansson KE, Rikihisa Y, Tully JG. (2001) Proposal to transfer some members of the genera Haemobartonella and Eperythrozoon to the genus Mycoplasma with descriptions of Candidatus M. haemofelis, Candidatus M. haemomuris, Candidatus M. haemosuis and Candidatus M. wenyonii. Int J Syst Evol Microbiol. 51:891–99.

Nishida H (2012) Genome DNA Sequence Variation, Evolution, and Function in Bacteria and Archaea. Current Issues in Molecular Biology. 15(1):19-24.

R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Mizrachi I, Ostell J, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Yaschenko E, Ye J (2009). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 37(Database issue):D5-15.

Salyers AA, Amabile-Cuevas CF (1997) Why are antibiotic resistance genes so resistant to elimination? Antimicrob Agents Chemother. 41:2321–25.

Schneider G (2010) Virtual screening: an endless staircase? Nat Rev Drug Discov. 9:273-76.

Shandil RK, Jayaram R, Kaur P, Gaonkar S, Suresh BL, Mahesh BN, Jayashree R, Nandi V, Bharath S, Balasubramanian V. (2007) Moxifloxacin, Ofloxacin, Sparfloxacin, and Ciprofloxacin against Mycobacterium tuberculosis: Evaluation of In Vitro and Pharmacodynamic Indices That Best Predict In Vivo Efficacy. Antimicrob Agents Chemother. 51(2):576-82.

Snel B, Bork P, Huynen MA. (1999) Genome phylogeny based on gene content. Nat Genet. 21:108-110.

StopTB (2011) New global framework to support scale up to universal access to quality management of MDR-TB. (StopTB Partnership).

Swartz MN. (1994) Hospital-acquired infections: diseases with increasingly limited therapies. Proc Natl Acad Sci USA. 91(7):2420-27.

Velayati AA, Farnia P, and Masjedi MR (2013) The totally drug resistant tuberculosis (TDR-TB). Int J Clin Exp. 6(4):307-9.

Wayne LG, Brenner DJ, Colwell RR, Grimont PAD, Kandler O, Krichevsky MI, Moore LH, Moore WEC, Murray RGE, Stackebrandt E, Starr MP, Trüper HG (1987). Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic Bacteriology. 37(4):463-464.

Weisburg WG, Barns SM, Pelletier DA, Lane DJ (1991) 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol. 173(2):697–703.

WHO (2012). Global tuberculosis control. In WHO report (World Health Organization).

Woese CR and Fox GE (1977) Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci. 74(11):5088-90.

Yonath A. (2005) Antibiotics targeting ribosomes: resistance, selectivity, synergism and cellular regulation. Annu Rev Biochem. 74:649-79.

Zignol M, Hosseini MS, Wright A, Weezenbeek CL, Nunn P, Watt CJ, Williams BG, Dye C. (2006) Global incidence of multidrug-resistant tuberculosis. J Infect Dis. 194(4):479-89.

SUPPLEMENTAL DATA

Table S-1. List of species and number of representatives.

Organisms: # of reps.

Escherichia coli: 51

Helicobacter pylori: 39

Staphylococcus aureus: 29

Listeria monocytogenes, : 24

Chlamydia trachomatis: 20

Streptococcus pneumoniae: 19

Corynebacterium pseudotuberculosis: 15

Streptococcus suis: 14

Buchnera aphidicola, Clostridium botulinum, Mycobacterium tuberculosis: 13

Bacillus cereus, Neisseria meningitidis, Streptococcus pyogenes, Yersinia pestis: 12

, Corynebacterium diphtheriae, Prochlorococcus marinus: 11

Francisella tularensis, Mycoplasma gallisepticum: 10

Acinetobacter baumannii, Bifidobacterium animalis, Bifidobacterium longum, Chlamydia psittaci: 9

Acetobacter pasteurianus, Bacillus amyloliquefaciens, Bacillus thuringiensis, , Propionibacterium acnes, Rickettsia prowazekii, Rickettsia rickettsii, Treponema pallidum: 8

Pseudomonas putida, Pseudomonas aeruginosa, Rhodopseudomonas palustris, Shewanella baltica, Synechococcus sp.: 7

Alteromonas macleodii, Bacillus anthracis, Carsonella ruddii, Chlamydophila psittaci, Lactobacillus casei, Lactococcus lactis, Legionella pneumophila, Streptococcus thermophilus: 6

Bacillus subtilis, Blattabacterium sp., Chlamydophila pneumoniae, Coxiella burnetii, Enterobacter cloacae, Geobacillus sp., Klebsiella pneumoniae, Mycobacterium sp., Mycoplasma genitalium, Pseudomonas stutzeri, Streptococcus agalactiae, Synechocystis sp., Xanthomonas campestris, Xylella fastidiosa, Zymomonas mobilis: 5

Borrelia burgdorferi, Clostridium difficile, Cyanothece sp., Dehalococcoides sp., Desulfovibrio vulgaris, Enterococcus faecalis, Lactobacillus rhamnosus, Mycoplasma hyopneumoniae, Pantoea ananatis, Pasteurella multocida, Pseudomonas fluorescens, Riemerella anatipestifer, Stenotrophomonas maltophilia, Streptococcus mutans, Xanthomonas oryzae, Yersinia pseudotuberculosis: 4

Actinobacillus pleuropneumoniae, Aggregatibacter actinomycetemcomitans, Amycolatopsis mediterranei, Bacillus megaterium, Bacteroides fragilis, Bifidobacterium bifidum, , Bordetella bronchiseptica, Borrelia afzelii, Arthromitus sp., Clostridium perfringens, Corynebacterium ulcerans, Corynebacterium glutamicum, Dickeya dadantii, Frankia sp., Gardnerella vaginalis, Lactobacillus delbrueckii, Lactobacillus helveticus, Lactobacillus johnsonii, Lactobacillus reuteri, Lactobacillus plantarum, Mycobacterium bovis, Mycobacterium intracellulare, Mycoplasma pneumoniae, Mycoplasma bovis, Neisseria gonorrhoeae, Oligotropha carboxidovorans, Paenibacillus polymyxa, Paenibacillus mucilaginosus, Porphyromonas gingivalis, Pseudomonas syringae, Ralstonia solanacearum, Rhizobium leguminosarum, Rickettsia typhi, Shewanella sp., , Sinorhizobium meliloti, Sinorhizobium fredii, Streptococcus gallolyticus, Thermus thermophilus, Wolbachia endosymbiont, Yersinia enterocolitica: 3

Acidithiobacillus ferrooxidans, Acidovorax sp., Alicycliphilus denitrificans, Alicyclobacillus acidocaldarius, Anaeromyxobacter sp., Anaplasma marginale, Arcobacter butzleri, Bacillus coagulans, Bacillus licheniformis, , Bordetella parapertussis, Bradyrhizobium sp., Bradyrhizobium japonicum, Tremblaya princeps, Portiera aleyrodidarum, Caulobacter crescentus, Chlorobium phaeobacteroides, Clavibacter michiganensis, Clostridium sp., Clostridium thermocellum, Clostridium kluyveri, Cronobacter sakazakii, Dehalobacter sp., Desulfitobacterium hafniense, Desulfovibrio desulfuricans, Edwardsiella tarda, Enterococcus faecium, Erwinia amylovora, Erwinia pyrifoliae, Fibrobacter succinogenes, Francisella cf., Geobacter sulfurreducens, Geobacter sp., Gluconacetobacter diazotrophicus, Gluconobacter oxydans, Haemophilus somnus, Helicobacter cetorum, Hydrogenobacter thermophilus, Klebsiella oxytoca, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus amylovorus, Lactobacillus acidophilus, Lactobacillus buchneri, Lactococcus garvieae, Leuconostoc mesenteroides, Melissococcus plutonius, Methylophaga sp., Mycobacterium leprae, Mycobacterium avium, Mycobacterium gilvum, Mycobacterium smegmatis, Mycoplasma hyorhinis, Mycoplasma agalactiae, Mycoplasma suis, Mycoplasma fermentans, Mycoplasma leachii, Mycoplasma mycoides, Mycoplasma haemofelis, Nitrosomonas sp., Orientia tsutsugamushi, Paenibacillus sp., Pectobacterium carotovorum, Phaeobacter gallaeciensis, Polynucleobacter necessarius, Pseudomonas mendocina, Rahnella aquatilis, Rhodospirillum rubrum, Rhodothermus marinus, Rickettsia massiliae, Rickettsia canadensis, Rickettsia slovaca, Rickettsia bellii, Salinibacter ruber, Secondary endosymbiont, Serratia sp., Shewanella putrefaciens, Shigella boydii, Shigella sonnei, Staphylococcus pseudintermedius, Staphylococcus epidermidis, Staphylococcus lugdunensis, Streptococcus equi, Streptococcus salivarius, Streptococcus parasanguinis, Streptomyces cattleya, Sulfobacillus acidophilus, Synechococcus elongatus, Taylorella equigenitalis, Thermoanaerobacter sp., Tropheryma whipplei, Ureaplasma parvum, Wigglesworthia glossinidia, Xanthomonas axonopodis: 2

Acaryochloris marina, Acetobacterium woodii, Acetohalobium arabaticum, laidlawii, Achromobacter xylosoxidans, Acidaminococcus fermentans, Acidaminococcus intestini, ferrooxidans, Acidiphilium cryptum, Acidiphilium multivorum, , Acidithiobacillus ferrivorans, Acidobacterium capsulatum, Acidothermus cellulolyticus, Acidovorax ebreus, Acidovorax citrulli, Acidovorax avenae, Acinetobacter calcoaceticus, Acinetobacter oleivorans, Actinobacillus succinogenes, Actinobacillus suis, Actinoplanes missouriensis, Actinoplanes sp., Actinosynnema mirum, Advenella kashmirensis, Aequorivita sublithincola, Aerococcus urinae, , Aeromonas salmonicida, , Aggregatibacter aphrophilus, Agrobacterium sp., Agrobacterium vitis, Akkermansia muciniphila, Alcanivorax borkumensis, Alcanivorax dieselolei, Alistipes finegoldii, Alkalilimnicola ehrlichii, Alkaliphilus oremlandii, Alkaliphilus metalliredigens, Allochromatium vinosum, Alteromonas sp., Aminobacterium colombiense, Ammonifex degensii, Amphibacillus xylanus, Amycolicicoccus subflavus, Anabaena variabilis, Anaerobaculum mobile, prevotii, Anaerolinea thermophila, Anaeromyxobacter dehalogenans, Anaplasma centrale, Anaplasma phagocytophilum, Anoxybacillus flavithermus, aeolicus, haemolyticum, Arcobacter sp., Arcobacter nitrofigilis, Aromatoleum aromaticum, Arthrobacter arilaitensis, Arthrobacter phenanthrenivorans, Arthrobacter chlorophenolicus, Arthrobacter aurescens, Arthrobacter sp., Aster yellows, Atopobium parvulum, Azoarcus sp., Azorhizobium caulinodans, Azospirillum lipoferum, Azospirillum sp., Azotobacter vinelandii, Bacillus selenitireducens, Bacillus pumilus, Bacillus cytotoxicus, Bacillus pseudofirmus, Bacillus halodurans, Bacillus atrophaeus, Bacillus sp., Bacillus cellulosilyticus, Bacillus weihenstephanensis, Bacteriovorax marinus, Bacteroides helcogenes, Bacteroides salanitronis, Bacteroides vulgatus, Bacteroides thetaiotaomicron, , , Bartonella grahamii, Bartonella tribocorum, Baumannia cicadellinicola, Bdellovibrio bacteriovorus, Beijerinckia indica, Belliella baltica, Beutenbergia cavernae, Bifidobacterium adolescentis, Bifidobacterium asteroides, Bifidobacterium breve, Bifidobacterium dentium, Blastococcus saxobsidens, Bordetella avium, Bordetella petrii, Borrelia recurrentis, Borrelia garinii, Borrelia bissettii, Borrelia turicatae, Borrelia duttonii, Borrelia crocidurae, Brachybacterium faecium, hyodysenteriae, Brachyspira murdochii, Brachyspira intermedia, Brevibacillus brevis, Brevundimonas subvibrioides, Brucella microti, Brucella suis, Burkholderia pseudomallei, Butyrivibrio proteoclasticus, owensensis, Caldicellulosiruptor obsidiansis, Caldicellulosiruptor lactoaceticus, Caldicellulosiruptor kristjanssonii, Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor bescii, Caldicellulosiruptor saccharolyticus, Caldilinea aerophila, exile, Calditerrivibrio nitroreducens, Campylobacter lari, Campylobacter hominis, Campylobacter fetus, Campylobacter concisus, Hodgkinia cicadicola, Zinderia insecticola, Sulcia muelleri, Moranella endobia, Phytoplasma mali, Riesia pediculicola, Blochmannia floridanus, Blochmannia vafer, Blochmannia pennsylvanicus, Phytoplasma australiense, Azobacteroides pseudotrichonymphae, Mycoplasma haemolamae, Vesicomyosocius okutanii, Ruthia magnifica, Liberibacter asiaticus, Liberibacter solanacearum, Midichloria mitochondrii, Rickettsia amblyommii, Amoebophilus asiaticus, Pelagibacter ubique, Pelagibacter sp., Protochlamydia amoebophila, Hamiltonella defensa, Desulforudis audaxviator, Puniceispirillum marinum, defluvii, Accumulibacter phosphatis, Koribacter versatilis, Solibacter usitatus, Capnocytophaga ochracea, Capnocytophaga canimorsus, Carboxydothermus hydrogenoformans, Carnobacterium sp., Carnobacterium maltaromaticum, Catenulispora acidiphila, Caulobacter segnis, Caulobacter sp., Cellulomonas flavigena, Cellulomonas fimi, Cellulophaga lytica, Cellulophaga algicola, Cellvibrio gilvus, Cellvibrio japonicus, Chelativorans sp., Chitinophaga pinensis, Chlamydia muridarum, Chlamydophila abortus, Chlamydophila pecorum, Chlamydophila caviae, Chlamydophila felis, Chlorobaculum parvum, Chlorobium phaeovibrioides, Chlorobium chlorochromatii, Chlorobium tepidum, Chlorobium limicola, Chloroflexus aggregans, Chloroflexus aurantiacus, Chloroflexus sp., Chloroherpeton thalassium, Chromobacterium violaceum, Chromohalobacter salexigens, Citrobacter rodentium, , Clostridiales genomosp., Clostridium novyi, Clostridium tetani, Clostridium acidurici, Clostridium cellulolyticum, Clostridium acetobutylicum, Clostridium acetobutylicum, Clostridium phytofermentans, Clostridium saccharolyticum, Clostridium lentocellum, Clostridium ljungdahlii, Clostridium cellulovorans, Clostridium beijerinckii, Collimonas fungivorans, Colwellia psychrerythraea, Comamonas testosteroni, Conexibacter woesei, Coprothermobacter proteolyticus, Coraliomargarita akajimensis, Corallococcus coralloides, Coriobacterium glomerans, Corynebacterium kroppenstedtii, Corynebacterium urealyticum, Corynebacterium jeikeium, Corynebacterium resistens, Corynebacterium aurimucosum, Corynebacterium efficiens, Corynebacterium variabile, Croceibacter atlanticus, Cronobacter turicensis, Cryptobacterium curtum, Cupriavidus metallidurans, Cyanobacterium UCYN-A, Cyclobacterium marinum, Cycloclasticus sp., Dechloromonas aromatica, Dechlorosoma suillum, Deferribacter desulfuricans, Dehalococcoides ethenogenes, Dehalogenimonas lykanthroporepellens, proteolyticus, Deinococcus geothermalis, Deinococcus deserti, Deinococcus gobiensis, Deinococcus maricopensis, Delftia sp., Delftia acidovorans, Denitrovibrio acetiphilus, Desulfarculus baarsii, Desulfatibacillum alkenivorans, Desulfitobacterium dehalogenans, Desulfobacca acetoxidans, Desulfobacterium autotrophicum, , Desulfobulbus propionicus, Desulfococcus oleovorans, Desulfohalobium retbaense, Desulfomicrobium baculatum, Desulfomonile tiedjei, Desulfosporosinus meridiei, Desulfosporosinus acidiphilus, Desulfosporosinus orientis, Desulfotalea psychrophila, Desulfotomaculum carboxydivorans, Desulfotomaculum reducens, Desulfotomaculum kuznetsovii, Desulfotomaculum ruminis, Desulfotomaculum acetoxidans, Desulfovibrio alaskensis, Desulfovibrio aespoeensis, Desulfovibrio africanus, Desulfovibrio salexigens, Desulfovibrio magneticus, Desulfurispirillum indicum, Desulfurivibrio alkaliphilus, Desulfurobacterium thermolithotrophum, Dichelobacter nodosus, Dickeya zeae, Dictyoglomus turgidum, Dictyoglomus thermophilum, Dyadobacter fermentans, Edwardsiella ictaluri, Eggerthella sp., Eggerthella lenta, Ehrlichia ruminantium, Ehrlichia canis, , Elusimicrobium minutum, Emticicia oligotrophica, Enterobacter sp., Enterobacter asburiae, Enterobacter aerogenes, Enterococcus hirae, Erwinia tasmaniensis, Erwinia sp., Erwinia billingiae, Erysipelothrix rhusiopathiae, Erythrobacter litoralis, Escherichia blattae, Ethanoligenens harbinense, Eubacterium eligens, Eubacterium rectale, Eubacterium limosum, Exiguobacterium antarcticum, Exiguobacterium sibiricum: 1

Exiguobacterium sp., Ferrimonas balearica, Fervidobacterium nodosum, Filifactor alocis, Finegoldia magna, Flavobacteriaceae bacterium, Flavobacterium psychrophilum, Flavobacterium columnare, Flavobacterium indicum, Flavobacterium branchiophilum, Flavobacterium johnsoniae, Flexibacter litoralis, Flexistipes sinusarabici, Fluviicola taffensis, Francisella noatunensis, Francisella novicida, Francisella philomiragia, Francisella sp., Frankia symbiont, Frateuria aurantia, Fusobacterium nucleatum, Gallibacterium anatis, Gallionella capsiferriformans, Gamma proteobacterium, Gemmatimonas aurantiaca, Geobacillus thermodenitrificans, Geobacillus kaustophilus, Geobacillus thermoglucosidasius, Geobacillus thermoleovorans, Geobacter metallireducens, Geobacter lovleyi, Geobacter bemidjiensis, Geobacter uraniireducens, Geodermatophilus obscurus, Glaciecola nitratireducens, Glaciecola sp., Gloeobacter violaceus, Gluconacetobacter xylinus, Gordonia bronchialis, Gordonia sp., Gordonia polyisoprenivorans, Gramella forsetii, Granulibacter bethesdensis, Granulicella tundricola, Granulicella mallensis, , Haemophilus parainfluenzae, Hahella chejuensis, Halanaerobium praevalens, Halanaerobium hydrogeniformans, Haliangium ochraceum, Haliscomenobacter hydrossis, Halobacillus halophilus, Halomonas elongata, Halorhodospira halophila, Halothermothrix orenii, Halothiobacillus neapolitanus, Helicobacter mustelae, Helicobacter acinonychis, Helicobacter felis, Helicobacter hepaticus, Helicobacter bizzozeronii, , Heliobacterium modesticaldum, Herbaspirillum seropedicae, Herpetosiphon aurantiacus, Hippea maritima, Hirschia baltica, Hydrogenobaculum sp., Hyphomicrobium denitrificans, Hyphomicrobium sp., Idiomarina loihiensis, Ignavibacterium album, Ilyobacter polytropus, Intrasporangium calvum, Isoptericola variabilis, Isosphaera pallida, Jannaschia sp., Janthinobacterium sp., Jonesia denitrificans, Kangiella koreensis, Ketogulonicigenium vulgare, Ketogulonigenium vulgarum, radiotolerans, Kitasatospora setae, Klebsiella variicola, Kocuria rhizophila, Kosmotoga olearia, Kribbella flavida, Krokinobacter sp., Kyrpidia tusciae, Kytococcus sedentarius, Lacinutrix sp., Lactobacillus sanfranciscensis, Lactobacillus gasseri, Lactobacillus ruminis, Lactobacillus sakei, Lactobacillus kefiranofaciens, Lactobacillus crispatus, Lactobacillus brevis, Laribacter hongkongensis, Lawsonia intracellularis, Lawsonia intracellularis, Leadbetterella byssophila, Leifsonia xyli, Leptospirillum ferrooxidans, Leptospirillum ferriphilum, Leptothrix cholodnii, Leptotrichia buccalis, Leuconostoc carnosum, Leuconostoc citreum, Leuconostoc gelidum, Leuconostoc sp., Leuconostoc gasicomitatum, Leuconostoc kimchii, Listeria ivanovii, Listeria seeligeri, Listeria welshimeri, Listeria innocua, Lysinibacillus sphaericus, caseolyticus, Magnetococcus marinus, Magnetospirillum magneticum, Mahella australiensis, Mannheimia succiniciproducens, Maribacter sp., Maricaulis maris, Marinithermus hydrothermalis, Marinitoga piezophila, Marinobacter hydrocarbonoclasticus, Marinobacter aquaeolei, Marinobacter sp., Marinobacter adhaerens, Marinomonas posidonica, Marinomonas mediterranea, Marinomonas sp., Marivirga tractuosa, ruber, Meiothermus silvanus, Melioribacter roseus, Mesoplasma florum, Mesorhizobium ciceri, Mesorhizobium opportunistum, Mesorhizobium loti, Mesotoga prima, Methanobrevibacter smithii, Methylacidiphilum infernorum, Methylibium petroleiphilum, Methylobacillus flagellatus, Methylobacterium extorquens, Methylobacterium chloromethanicum, Methylobacterium populi, Methylobacterium radiotolerans, Methylobacterium sp., Methylobacterium nodulans, Methylocella silvestris, Methylococcus capsulatus, Methylocystis sp., Methylomicrobium alcaliphilum, Methylomirabilis oxyfera, Methylomonas methanica, Methylotenera mobilis, Methylotenera versatilis, Methylovorus sp., Methylovorus glucosetrophus, Micavibrio aeruginosavorus, Microbacterium testaceum, Micrococcus luteus, Microcystis aeruginosa, Microlunatus phosphovorus, Micromonospora sp., Micromonospora aurantiaca, curtisii, Modestobacter marinus, Moorella thermoacetica, , Muricauda ruestringensis, Mycobacterium massiliense, Mycobacterium indicus, Mycobacterium marinum, Mycobacterium vanbaalenii, , Mycoplasma mobile, Mycoplasma putrefaciens, Mycoplasma wenyonii, Mycoplasma synoviae, Mycoplasma crocodyli, Mycoplasma conjunctivae, Mycoplasma pulmonis, , , Mycoplasma haemocanis, Myxococcus fulvus, Myxococcus xanthus, Nakamurella multipartita, Natranaerobius thermophilus, Nautilia profundicola, Neorickettsia risticii, Neorickettsia sennetsu, Niastella koreensis, Nitratifractor salsuginis, Nitratiruptor sp., Nitrobacter winogradskyi, Nitrobacter hamburgensis, Nitrosococcus watsonii, Nitrosococcus oceani, Nitrosococcus halophilus, Nitrosomonas eutropha, Nitrosomonas europaea, Nitrosospira multiformis, Nocardia farcinica, Nocardia brasiliensis, Nocardioides sp., Nocardiopsis dassonvillei, Nocardiopsis alba, Nostoc sp., Nostoc punctiforme, Novosphingobium aromaticivorans, Novosphingobium sp., Oceanimonas sp., Oceanithermus profundus, Oceanobacillus iheyensis, Odoribacter splanchnicus, Oenococcus oeni, Olsenella uli, Onion yellows, Opitutus terrae, Ornithobacterium rhinotracheale, Oscillibacter valericigenes, Owenweeksia hongkongensis, Paenibacillus terrae, Paludibacter propionicigenes, Pantoea vagans, Pantoea sp., Parabacteroides distasonis, Parachlamydia acanthamoebae, Parvibaculum lavamentivorans, Parvularcula bermudensis, Pectobacterium wasabiae, Pectobacterium atrosepticum, Pediococcus claussenii, Pediococcus pentosaceus, Pedobacter saltans, Pedobacter heparinus, Pelobacter carbinolicus, Pelobacter propionicus, Pelodictyon phaeoclathratiforme, Pelotomaculum thermopropionicum, Petrotoga mobilis, Phenylobacterium zucineum, Photobacterium profundum, Photorhabdus asymbiotica, Photorhabdus luminescens, Phycisphaera mikurensis, Pirellula staleyi, Planctomyces limnophilus, Planctomyces brasiliensis, Polaromonas naphthalenivorans, Polaromonas sp., Polymorphum gilvum, Porphyromonas asaccharolytica, Prevotella denticola, Prevotella ruminicola, Propionibacterium freudenreichii, Propionibacterium propionicum, Propionibacterium acidipropionici, Prosthecochloris aestuarii, , Providencia stuartii, Pseudoalteromonas atlantica, Pseudogulbenkiania sp., Pseudomonas fulva, Pseudomonas brassicacearum, Pseudomonas protegens, Pseudonocardia dioxanivorans, Pseudovibrio sp., Pseudoxanthomonas suwonensis, Pseudoxanthomonas spadix, Psychrobacter arcticus, Psychrobacter sp., Psychrobacter cryohalolentis, Psychroflexus torquis, Psychromonas ingrahamii, Pusillimonas sp., Rahnella sp., Ralstonia eutropha, Ramlibacter tataouinensis, Renibacterium salmoninarum, Rhodobacter sphaeroides, Rhodobacter capsulatus, Rhodococcus equi, Rhodococcus erythropolis, Rhodococcus jostii, Rhodococcus opacus, Rhodoferax ferrireducens, Rhodomicrobium vannielii, Rhodopirellula baltica, Rhodospirillum photometricum, Rhodospirillum centenum, Rickettsia peacockii: 1

Rickettsia japonica, , Rickettsia montanensis, Rickettsia montanensis, , Rickettsia rhipicephali, , Rickettsia heilongjiangensis, , Rickettsia philipii, , , Robiginitalea biformata, Roseburia hominis, Roseiflexus castenholzii, Roseiflexus sp., Roseobacter denitrificans, Roseobacter litoralis, Rothia mucilaginosa, Rothia dentocariosa, Rubrivivax gelatinosus, xylanophilus, Ruegeria sp., Ruegeria pomeroyi, Ruminococcus albus, Runella slithyformis, Saccharomonospora viridis, Saccharophagus degradans, Saccharopolyspora erythraea, Salinispora tropica, Salinispora arenicola, Salmonella bongori, Sanguibacter keddieii, Saprospira grandis, Sebaldella termitidis, Segniliparus rotundus, Selenomonas sputigena, Selenomonas ruminantium, Serratia symbiotica, Serratia proteamaculans, Serratia plymuthica, Shewanella amazonensis, Shewanella denitrificans, Shewanella loihica, Shewanella frigidimarina, Shewanella oneidensis, Shewanella pealeana, Shewanella halifaxensis, Shewanella violacea, Shewanella sediminis, Shewanella woodyi, Shewanella piezotolerans, , Sideroxydans lithotrophicus, Simiduia agarivorans, Sinorhizobium medicae, Slackia heliotrinireducens, Sodalis glossinidius, Solibacillus silvestris, Solitalea canadensis, Sorangium cellulosum, Sphaerochaeta pleomorpha, Sphingobacterium sp., Sphingobium japonicum, Sphingobium sp., Sphingomonas wittichii, Sphingopyxis alaskensis, coccoides, Spirochaeta thermophila, Spirochaeta africana, Spirochaeta caldaria, Spirochaeta sp., Spirochaeta smaragdinae, Spirosoma linguale, Stackebrandtia nassauensis, Staphylococcus saprophyticus, Staphylococcus carnosus, Starkeya novella, Stigmatella aurantiaca, Streptobacillus moniliformis, Streptococcus uberis, Streptococcus parauberis, Streptococcus pasteurianus, Streptococcus infantarius, Streptococcus oralis, Streptococcus macedonicus, Streptococcus mitis, Streptococcus gordonii, Streptococcus dysgalactiae, Streptococcus pseudopneumoniae, Streptomyces flavogriseus, Streptomyces sp., Streptomyces griseus, Streptomyces venezuelae, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces violaceusniger, Streptomyces scabiei, Streptomyces hygroscopicus, Streptomyces bingchenggensis, Streptosporangium roseum, Sulfuricurvum kujiense, Sulfurihydrogenibium sp., Sulfurihydrogenibium azorense, Sulfurimonas denitrificans, Sulfurimonas autotrophica, Sulfurospirillum deleyianum, Sulfurovum sp., Symbiobacterium thermophilum, Syntrophobacter fumaroxidans, Syntrophobotulus glycolicus, Syntrophomonas wolfei, Syntrophothermus lipocalidus, Syntrophus aciditrophicus, , Taylorella asinigenitalis, Tepidanaerobacter sp., Teredinibacter turnerae, Terriglobus saanensis, Tetragenococcus halophilus, Thauera sp., Thermacetogenium phaeum, marianensis, Thermanaerovibrio acidaminovorans, Thermincola potens, Thermoanaerobacter mathranii, Thermoanaerobacter brockii, Thermoanaerobacter pseudethanolicus, Thermoanaerobacter italicus, Thermoanaerobacter wiegelii, Thermoanaerobacter tengcongensis, Thermoanaerobacterium xylanolyticum, Thermoanaerobacterium thermosaccharolyticum, Thermoanaerobacterium saccharolyticum, Thermobifida fusca, Thermobispora bispora, Thermocrinis albus, Thermodesulfatator indicus, Thermodesulfobacterium sp., Thermodesulfobium narugense, Thermodesulfovibrio yellowstonii, Thermomicrobium roseum, Thermomonospora curvata, Thermosediminibacter oceani, Thermosipho africanus, Thermosynechococcus elongatus, naphthophila, , Thermotoga sp., Thermotoga maritima, Thermotoga thermarum, Thermotoga lettingae, Thermovibrio ammonificans, Thermovirga lienii, Thermus sp., Thermus scotoductus, Thioalkalimicrobium cyclicum, Thioalkalivibrio sp., Thioalkalivibrio sulfidophilus, Thiobacillus denitrificans, Thiocystis violascens, Thiomicrospira crunogena, Thiomonas intermedia, Tistrella mobilis, Tolumonas auensis, Treponema paraluiscuniculi, Treponema succinifaciens, Treponema brennaborense, Treponema denticola, Treponema azotonutricium, Treponema primitia, Trichodesmium erythraeum, Trichormus azollae, Truepera radiovictrix, Tsukamurella paurometabola, Turneriella parva, Uncultured Termite, , Variovorax paradoxus, , Verminephrobacter eiseniae, Verrucosispora maris, Waddlia chondrophila, Weeksella virosa, Weissella koreensis, Wolbachia sp., Wolinella succinogenes, Xanthobacter autotrophicus, Xanthomonas albilineans, Xenorhabdus bovienii, Xenorhabdus nematophila, Xylanimonas cellulosilytica, Zobellia galactanivorans, Zunongwangia profunda: 1

APPENDIX I: Platforms for drug discovery

Introduction

Antibiotic drug discovery has been on the decline since the 1950s (Lewis, 2013). Despite major efforts by the pharmaceutical industry, no new classes of broad-spectrum antibiotics have been discovered in the past 50 years. New strategies need to be developed to combat the increase in multidrug-resistant and extensively drug-resistant strains (Payne, 2007; Boucher, 2009; Higgins, 2010). The goal of the following studies was to develop and optimize the novel antimicrobial screens described here and to demonstrate proof of principle that they can identify compounds that either potentiate the activity of rifampicin against Mycobacterium tuberculosis or act on M. tuberculosis targets with potentially species-selective activity.

Small molecule potentiates killing of Mycobacterium tuberculosis by rifampicin.

Introduction

The current therapy for tuberculosis is known as directly observed short-course therapy (DOTS). Despite the name, the therapy is complex and lengthy (WHO, 2012). A patient is required to take a daily dose of four antibiotics for two months, followed by two antibiotics for an additional four months. It has been estimated that therapies that reduce treatment time from six months to two months would decrease the incidence of M. tuberculosis by ninety-four percent (Abbu-Raddad, 2009). With that in mind, we set out to develop a screen that would identify antimicrobials capable of shortening treatment time.

Screen development

Persister cells have been proposed as the cause of recalcitrance in treating M. tuberculosis (Lewis, 2010). Because stationary-phase cells are the most tolerant to antibiotics, they were a logical starting place to search for persister-specific molecules. Traditionally, optical density is used to indicate whether an antibiotic inhibits growth of a cell population. This was not an option for a screen against a stationary population, because optical density cannot distinguish live cells from dead cells in an already dense culture. We therefore turned to fluorescence.

M. tuberculosis mc²6020 constitutively expressing the fluorescent protein mCherry from an exogenous plasmid allowed us to monitor fluorescence as an indication of growth. Mtb continued to produce significant amounts of mCherry far into stationary phase (Fig. I-1), indicating that fluorescence is a useful measure of cell viability.

We also wanted to be certain that the molecules we identified specifically targeted persister cells. Potentiating the effects of an existing antibiotic could serve that exact purpose by ensuring growing cells were killed. Rifampicin is currently the most effective antibiotic used to treat Mtb. When treated with rifampicin, even at concentrations 10 times the minimum inhibitory concentration (0.1 µg/ml), stationary Mtb continued to produce mCherry (Fig. I-2).

[Figure I-1: plot of arbitrary fluorescence units (560 ex., 620 em.) against time (days).]

Figure I-1. Time-dependent fluorescence of M. tuberculosis constitutively expressing mCherry. Stationary M. tuberculosis was diluted 1:100, and 100 µl of cells was dispensed into a 96-well black-wall clear-bottom plate and covered with a BreatheEasy strip to prevent evaporation. Fluorescence (560 ex., 620 em.) was read at 0, 1, 2, 3, 6, 7, 8, 9, 13, 14, 16, 17, and 29 days. Plates were incubated with shaking at 37 °C.

[Figure I-2 plot: fluorescence in arbitrary units (560 ex., 620 em.; axis range 0 to 8000) versus rifampicin concentration (μg/ml).]

Figure I-2. Fluorescence of stationary M. tuberculosis constitutively expressing mCherry treated with increasing concentrations of rifampicin. A freezer stock was diluted and grown for 5 days and then diluted 1:100 and regrown for 14 days to stationary phase. 100 ul of culture was dispensed into a 96-well black-wall clear-bottom plate. 1 ul of a serial dilution of rifampicin was dispensed to the final indicated concentrations. The plate was covered with a BreatheEasy strip to prevent evaporation. Fluorescence readings (ex. 560, em. 620) were taken at the time of challenge (blue bar) and after 7 days of treatment (red bar).


We used these results to develop our screening protocol (Fig. I-3). Compounds were screened at 15 ug/ml in 96-well plates with and without 0.1 ug/ml rifampicin.

Compounds that inhibited fluorescence in the presence of rifampicin but were inactive alone after 11 days of treatment were considered hits (Fig. I-2). Ideally, this method excludes compounds that are generally toxic or antiseptic.
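The hit criterion can be sketched in code. This is only an illustration and not the analysis used for the screen: the text does not state a numeric threshold for inhibited fluorescence, so the 50% cutoff, the function names, and the example readings below are all assumptions.

```python
def percent_inhibition(signal, control):
    """Percent loss of fluorescence relative to a control well."""
    return 100.0 * (1.0 - signal / control)


def is_potentiator_hit(alone, with_rif, rif_only, untreated, cutoff=50.0):
    """Hit = inactive alone, but inhibits fluorescence when rifampicin is present.

    `cutoff` (percent inhibition) is a hypothetical threshold chosen for illustration.
    Activity alone is judged against untreated wells; activity with rifampicin
    is judged against the rifampicin-only control.
    """
    inactive_alone = percent_inhibition(alone, untreated) < cutoff
    active_with_rif = percent_inhibition(with_rif, rif_only) >= cutoff
    return inactive_alone and active_with_rif


# Hypothetical day-11 fluorescence readings (arbitrary units).
untreated, rif_only = 8000.0, 7000.0
potentiator_like = is_potentiator_hit(7800.0, 2000.0, rif_only, untreated)
generally_toxic = is_potentiator_hit(1500.0, 1200.0, rif_only, untreated)
print(potentiator_like, generally_toxic)
```

Requiring inactivity alone is what filters out generally toxic or antiseptic compounds, which would reduce fluorescence with or without rifampicin.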

Given that no rifampicin potentiator currently exists, chlorhexidine (an antiseptic) in combination with rifampicin served as the positive control. Rifampicin alone was the negative control. A Z-prime of 0.64 was obtained for this screen; Z-prime is a measure of the separation between the positive and negative controls of a screen relative to their variation (Zhang, 1999). A Z-prime above 0.8 is usually necessary to progress a high-throughput screen in industry. However, a Z-prime of 0.5 or greater is sufficient to distinguish hits from false positives (Zhang, 1999). Given the novelty of this screen and the variability of stationary cultures, we considered 0.64 a sufficient Z-prime to produce reliable results and continue the screening process.
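For reference, the Z-prime statistic is commonly computed from the means and standard deviations of the control wells as Z' = 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg|. The sketch below shows the calculation; the well values are hypothetical and are not data from this screen.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive) - mean(negative))
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical control-well fluorescence (arbitrary units), for illustration only.
positive_wells = [900.0, 1000.0, 1100.0]   # e.g. chlorhexidine + rifampicin
negative_wells = [6700.0, 7000.0, 7300.0]  # e.g. rifampicin alone
print(round(z_prime(positive_wells, negative_wells), 2))
```

A value near 1 means the two control distributions are well separated; the value falls as the controls become noisier or their means move closer together.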


Figure I-3. Diagram of rifampicin potentiation screen protocol.

Pilot screen

Approximately 6,000 compounds from the Chembridge Diverset E small molecule library were screened in vitro for rifampicin-specific activity against M. tuberculosis. Twenty-eight compounds were identified in the screen as inhibiting fluorescence only in the presence of rifampicin. Time-dependent killing with and without rifampicin at the screening concentrations was used to further validate these compounds. Compounds that were inactive alone but caused increased killing in the presence of rifampicin were considered validated hits.


One small molecule, compound 70D2 (acetylisoniazid) (Fig. I-3), passed validation and was found to potentiate the killing activity of rifampicin against M. tuberculosis while having no killing activity alone (Fig. I-4). Acetylated isoniazid is the liver breakdown product of isoniazid, a first-line drug used to treat tuberculosis. Isoniazid has been reported to have synergistic killing effects when combined with rifampicin to treat tuberculosis (Simmons, 1977).

The identification of acetylisoniazid with rifampicin-specific activity validates the screening method and indicates the potential for finding more rifampicin potentiators against Mycobacterium tuberculosis.

Figure I-3. Structures of rifampicin potentiator 70D2 or acetylisoniazid (left) and Isoniazid (right).

[Figure I-4 plot: CFU/mL on a log scale (axis range 1.00E+05 to 1.00E+09) for the treatments rif, 70 D2, 70 D2 + rif, and INH 10 + rif.]

Figure I-4. M. tuberculosis killing by rifampicin (0.1 ug/ml) and isoniazid (3 ug/ml) or acetylisoniazid (15 ug/ml). CFU/ml was enumerated at 0 days (blue bar), 3 days (red bar), and 15 days (green bar) of treatment.


References

Abu-Raddad et al. (2009) Epidemiological benefits of more-effective tuberculosis vaccines, drugs, and diagnostics. PNAS.

Boucher, H. W. et al. (2009) Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis. 48, 1–12.

Higgins, P. G., Dammhayn, C., Hackel, M. & Seifert, H. J. (2009) Global spread of carbapenem-resistant Acinetobacter baumannii. J Antimicrob Chemother. 65, 233–238.

Lewis, K. (2010) Persister cells. Annu Rev of Microbiology.

Lewis, K. (2013) Platforms for antibiotic discovery. Nat Rev Drug Discov 12(5):371-387.

Payne DJ, Gwynn MN, Holmes DJ, & Pompliano DL. (2007) Drugs for bad bugs: confronting the challenges of antibacterial discovery. Nat Rev Drug Discov 6(1):29-40.

Simmons NA. (1977) Synergy and rifampicin. J Antimicrob Chemother.

Zhang JH et al. (1999) A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen.

WHO (2012). Global tuberculosis control. In WHO report (World Health Organization).


A screen for species-selective compounds

No new classes of antibiotics have been discovered in the past 50 years (Lewis, 2013).

After overmining of natural product and synthetic libraries for broad-spectrum antibiotics resulted in the repeated discovery of known compounds, discovery of novel antibiotics halted. Since then, multidrug- and extensively drug-resistant strains have been on the rise, and regimens of toxic antibiotics are now being prescribed as a last-ditch effort to combat infections (Payne, 2007; Boucher, 2009; Higgins, 2010). Narrow-spectrum antibiotics, previously discarded as undesirable, present an opportunity for the discovery of novel classes of therapeutics.

We hypothesized that by screening a series of organisms in parallel, compounds with selective activity could be identified. This method also has the benefit of screening out already known, broad-spectrum compounds. We also hypothesized that by selecting for species-selective targets, cytotoxic compounds (drugs that hit human cell targets) would be identified less frequently.

Pilot screen

Bacteria were grown to stationary phase in rich medium and diluted 1:1000 into 15 ug/ml of compound at 100 ul total volume. The bacteria were allowed to regrow to stationary phase, and growth was measured by optical density (600 nm). Any compound exhibiting 80% inhibition compared to full growth (no compound) was considered active against that organism. We used this activity data to create a spectrum-of-activity profile for each compound. We further characterized the specificity of each compound by examining the taxonomic classification and relationship between all of the organisms tested.
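The activity call and the resulting spectrum-of-activity profile can be sketched as follows. This is a minimal illustration under stated assumptions: the 80% criterion is treated as "at least 80%", the function names are hypothetical, and the OD600 readings are invented for the example.

```python
def growth_inhibition(od_treated, od_full_growth, od_blank=0.0):
    """Percent inhibition of growth relative to the no-compound control."""
    return 100.0 * (1.0 - (od_treated - od_blank) / (od_full_growth - od_blank))


def activity_profile(od_treated_by_organism, od_full_by_organism, threshold=80.0):
    """Map each organism to True if the compound reaches the inhibition threshold."""
    return {
        organism: growth_inhibition(od, od_full_by_organism[organism]) >= threshold
        for organism, od in od_treated_by_organism.items()
    }


# Hypothetical OD600 readings for one compound (illustrative values only).
full_growth = {"S. aureus": 1.20, "C. difficile": 0.90, "E. faecalis": 1.00}
treated = {"S. aureus": 1.15, "C. difficile": 0.09, "E. faecalis": 0.60}
profile = activity_profile(treated, full_growth)
print(profile)
```

In this example the compound would be called active only against C. difficile, since partial inhibition of E. faecalis does not reach the threshold.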


Table I-1. Hit rate of species-selective screen.

Organism or group          All Hits        Specific Hits
All Bacteria ()            2 (0.02%)
Firmicutes (phylum)        19 (0.17%)      10 (0.09%)
(class)                    26 (0.23%)      1 (0.01%)
Lactobacillales (order)    28 (0.25%)      0 (0.00%)
Clostridium (genus)        126 (1.13%)     61 (0.54%)
C. difficile               354 (3.16%)     184 (1.64%)
C. perfringens             456 (4.07%)     267 (2.38%)
S. aureus                  185 (1.65%)     64 (0.57%)
E. faecalis                40 (0.36%)      3 (0.03%)
S. mutans                  75 (0.67%)      1 (0.01%)
B. fragilis                36 (0.32%)      4 (0.04%)
H. pylori                  535 (4.78%)     451 (4.03%)

*H. pylori data courtesy of Arietis Pharmaceuticals.

Using the classification of each organism, as described by the National Center for Biotechnology Information (NCBI) taxonomy website (Sayers, 2009), I created a classification tree for all of the organisms tested. I further characterized the compounds that were “All Hits” or “Specific Hits” at each branch point for which data were available and overlaid the resulting selectivity data on the taxonomic tree (Fig. I-1). There were specific hits and “All Hits” at every branch point of the tree.


Figure I-1. Results of the species-selective screen overlaid on the taxonomic relationship of screened organisms. Specific hits (left of branch point label) are the number (in parentheses) and percentage of compounds that exclusively inhibited organisms included in that branch point and none of the other organisms. “All Hits” (right of branch point label) are the number (in parentheses) and percentage of compounds that inhibited the organisms at that branch point regardless of other activity.
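The two hit definitions at a branch point can be expressed as set comparisons. The sketch below assumes that an “All Hit” must inhibit every screened organism under the branch point, which is an interpretation (it is consistent with Table I-1, where counts for a group are lower than those of its member species). The compound names and profiles are hypothetical.

```python
# Screened organisms under two branch points of the taxonomy.
CLADES = {
    "Clostridium (genus)": frozenset({"C. difficile", "C. perfringens"}),
    "Lactobacillales (order)": frozenset({"E. faecalis", "S. mutans"}),
}


def classify(inhibited, clade):
    """Return (all_hit, specific_hit) for one compound at one branch point."""
    inhibited = frozenset(inhibited)
    all_hit = clade <= inhibited       # every member inhibited; other activity allowed
    specific_hit = inhibited == clade  # every member inhibited and nothing else
    return all_hit, specific_hit


def tally(profiles, clades):
    """Count All Hits and Specific Hits per branch point across compounds."""
    counts = {name: {"all": 0, "specific": 0} for name in clades}
    for inhibited in profiles.values():
        for name, clade in clades.items():
            all_hit, specific_hit = classify(inhibited, clade)
            counts[name]["all"] += int(all_hit)
            counts[name]["specific"] += int(specific_hit)
    return counts


# Hypothetical compounds and the organisms each one inhibited.
profiles = {
    "cmpd_A": {"C. difficile", "C. perfringens"},
    "cmpd_B": {"C. difficile", "C. perfringens", "S. aureus"},
    "cmpd_C": {"C. difficile"},
}
counts = tally(profiles, CLADES)
print(counts)
```

Under these definitions every specific hit is also an “All Hit”, so the specific count at a branch point can never exceed the “All Hits” count.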


Results and discussion

Approximately 15,000 compounds from the Chembridge Diverset-E small molecule library were tested in parallel against Staphylococcus aureus, Clostridium difficile, Clostridium perfringens, Bacteroides fragilis, Helicobacter pylori (activity data courtesy of Arietis Pharmaceuticals), Streptococcus mutans, and Enterococcus faecalis.

Species-selective hit percentages (compounds that only inhibit that organism and none of the others tested) and “All Hits” percentages (compounds that hit each organism regardless of other activity) were calculated for each organism (Table I-1). H. pylori was the most sensitive of the organisms tested, both for “All Hits” and species selective hits.

Streptococcus mutans had the fewest species-selective hits, with only 1 compound. Bacteroides fragilis was the least susceptible overall, with a hit rate of 0.32% for “All Hits”.

Perhaps the most interesting result from the study is that compounds with selective activity were identified at every branch point of the taxonomy analyzed. This not only validates that species-selective compounds can be identified using a comparative screening approach, but also suggests that activity can potentially be tailored to target any organism or, if desired, entire classes of organisms.


References

Boucher, H. W. et al. (2009) Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin Infect Dis. 48, 1–12.

Higgins, P. G., Dammhayn, C., Hackel, M. & Seifert, H. (2009) Global spread of carbapenem-resistant Acinetobacter baumannii. J Antimicrob Chemother. 65, 233–238.

Lewis K (2013) Platforms for antibiotic discovery. Nat Rev Drug Discov 12(5):371-387.

Payne DJ, Gwynn MN, Holmes DJ, & Pompliano DL (2007) Drugs for bad bugs: confronting the challenges of antibacterial discovery. Nat Rev Drug Discov 6(1):29-40.


APPENDIX II: Lassomycin, a lasso peptide, kills Mycobacterium tuberculosis by targeting the ATP-dependent protease ClpC1P1P2

*Submitted for publication in Chemistry and Biology

Lassomycin, a lasso peptide, kills Mycobacterium tuberculosis by targeting the ATP-dependent protease ClpC1P1P2

Ekaterina Gavrish1, Clarissa S. Sit2, Olga Kandror3, Amy Spoering4, Aaron Peoples4, Losee Ling4, Ashley Fetterman4, Dallas Hughes4, Shugeng Cao2, Anthony Bissell1, Heather Torrey1, Tatos Akopian3, Andreas Mueller3, Slava Epstein1, Alfred Goldberg3, Jon Clardy2, and Kim Lewis1

1Antimicrobial Discovery Center, Department of Biology, Northeastern University, Boston, MA, USA 02115. 2Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA 02115. 3Goldberg Laboratory, Department of Cell Biology, Harvard Medical School, Boston, MA, USA 02115. 4NovoBiotic Pharmaceuticals, LLC, Cambridge, MA, USA 02138.

Correspondences to:

Kim Lewis, 360 Huntington Ave. Boston, MA, 02115, (617) 373-8238, [email protected]. Jon Clardy, 240 Longwood Ave. Boston, MA 02115, (617) 432-2845, [email protected]. Alfred Goldberg, 240 Longwood Ave. Boston, MA 02115, (617) 432-1855, [email protected].

Keywords: Drug Discovery, Mycobacterium tuberculosis, Lassomycin, Species specific, Lasso peptide


Abstract

Diminishing returns from exploitation of natural products is largely responsible for the end of the golden era of antibiotic discovery. In the absence of an effective platform for drug discovery, resistant pathogens rise and spread unchallenged. We considered uncultured bacteria that make up 99% of all diversity as an untapped source of secondary metabolites. Methods to grow uncultured bacteria such as incubation in situ enabled production of a large library of extracts. In order to further improve the probability of discovering novel compounds, a specific screen was developed to target a particular pathogen. There are now pan-resistant strains of M. tuberculosis, and we chose to target this important pathogen. In a screen that included a counter-selection against S. aureus, a novel antimicrobial, lassomycin, was discovered. Lassomycin is a potent bactericidal compound with selective activity against mycobacteria, including drug-resistant forms of M. tuberculosis. The compound had little activity against other bacteria or mammalian cells. Lassomycin targets the ClpC1 ATPase of the ClpC1P1P2 protease which is essential and unique to mycobacteria. Lassomycin is a lasso-shaped peptide with a unique structural fold. The compound dramatically activates the ATPase activity of ClpC1, and effectively kills both growing and dormant cells of the pathogen.


Introduction

The scarcity of novel lead compounds is now the major bottleneck in the development of novel antimicrobial drugs (1, 2). In the absence of new therapies, the rise and spread of multidrug resistant pathogens will continue unchecked. Most antibiotics in use today resulted from screening of soil actinomycetes for active compounds. However, overmining of this limited resource has led to diminishing returns, and most current efforts result in rediscovering known compounds (3). Consequently, there has been a general elimination of natural product discovery in most pharmaceutical companies.

Using an untapped source of compounds and solving the problem of rediscovery could lead to novel antimicrobials. Uncultured species of bacteria account for 99% of all microbial diversity and represent an unexploited source of secondary metabolites (4).

We have developed general methods to grow uncultured bacteria, based on cultivation in diffusion chambers in their natural environments (5) and on prolonged incubation in vitro (6). This approach produces up to 40% growth recovery. However, even with this untapped source, most of the effort in antibiotic screening is spent on rediscovery of known compounds or generally toxic ones. We reasoned that this problem may be solved by a species-selective approach, whereby compounds with broad spectra are eliminated, and only compounds active against a specific species are considered. We chose Mycobacterium tuberculosis as a target organism for this approach, since few natural products are known to act specifically against this pathogen, and therefore most of the specific hits obtained should be new agents. There is also a considerable medical need for novel anti-TB compounds (7) to stem the spread of extensively and totally drug-resistant strains of the pathogen.


We screened extracts from a collection of soil bacteria, obtained by in situ cultivation and by prolonged incubation, against M. tuberculosis and counterscreened against S. aureus. A novel antimicrobial, lassomycin, was purified from an extract of Lentzea kentuckyensis sp. Lassomycin is a potent bactericidal compound that targets the ClpC1 ATPase of the essential mycobacterial protease ClpC1P1P2.

Results and Discussion

A library of extracts from soil actinomycetes was screened against M. tuberculosis.

Determining inhibition of M. tuberculosis growth by optical density requires 2 weeks. To shorten the duration of screens, we constructed a strain constitutively expressing mCherry, and used fluorescence as a readout. This method allowed for a reliable detection of growth inhibition in five days. The screen had a hit rate of 10% against M. tuberculosis. A counterscreen against S. aureus had a hit rate of 30%, and the hit rate for extracts specifically acting against M. tuberculosis was 2%. One of the first extracts identified that acted specifically against M. tuberculosis was from isolate IS009804, a Lentzea kentuckyensis sp. (99.7% identical to L. kentuckyensis, accession number DQ291145, by 16S rDNA). The extract contained a compound with an unreported mass. The extract was fractionated by HPLC, and a single active fraction was identified by bioassay-guided purification. This fraction was lyophilized, leaving a white powder. Analysis of this fraction by LC-MS indicated that a single major compound was present ([M+H]+ = 1880).


Preliminary NMR studies indicated that the active compound was a peptide, and further analysis revealed an Asp-Gln-Leu-Val-Gly pentapeptide sequence. Elucidation of the entire structure proved to be quite challenging, and multiple approaches were employed. The producing strain was cultured in a 13C glucose and 15N uracil medium to produce a uniformly labeled compound for further analysis by three-dimensional NMR techniques; the pentapeptide sequence was used as a search fragment in the producing strain’s genome to identify the biosynthetic genes; and MS/MS was employed to experimentally identify the peptide’s sequence. These combined approaches revealed that the active compound, which we have named lassomycin, consists of 16 amino acids in which the N-terminal residues form an 8-residue ring through formation of an amide bond between the terminal amine and the side chain carboxyl group of Asp8, with the C-terminal carboxyl converted to a methyl ester (Fig. 1A). Acid hydrolysis of lassomycin, followed by derivatization with Marfey’s reagent and LC/MS analysis, established that all of the residues are L-amino acids.

The three-dimensional solution structure of lassomycin was deduced from the NOE distance restraints obtained from three-dimensional NMR data using CYANA 2.1 (Fig. 1B). Surprisingly, the solution structure of lassomycin lacks the characteristic knot structure reported for other homologous lasso peptides like lariatin A and microcin J25 (8), as the C-terminal end packs tightly against the N-terminal ring instead of feeding through the macrolactam (Fig. 1B, Fig. S1-S4, Table S1). Lassomycin has four positively charged arginine side chains and no negatively charged group, as the terminal carboxyl is esterified.

The resulting structure was consistent with the biosynthetic genes identified in the producing strain’s genome. The structural gene itself shows highest homology by BLAST to larA in the larABCDE operon, which codes for lariatin A. Produced by Rhodococcus jostii, lariatin A is a cell wall synthesis inhibitor (9) and a member of the lasso peptide class (Fig. 1A). Lasso peptides consist of 16-21 amino acids with an N-terminal macrolactam and are produced by (Streptomyces, Rhodococcus) and (Escherichia, Burkholderia). Lariatin A is an 18-amino acid peptide with an 8-membered ring that is formed between the N-terminal glycine and glutamic acid at position 8. larA encodes a precursor peptide that is believed to be cleaved by LarD and enzymatically converted to the mature lasso structure by LarB to produce the active peptide. LarE exports the mature peptide. LarC’s function is unknown, but it is necessary for activity (10). The lariatin A precursor peptide shares only 53% homology with the lassomycin precursor. The remainder of the lassomycin operon shares high homology with the lariatin operon (Fig. 1C).


Figure II-1A. The amino acid sequence and post-translational modifications of lassomycin. Blue numbering indicates the positions of residues 1, 8 and 16.

Figure II-1B. Backbone structure of lassomycin. The N- and C-termini are labeled (left), and structure of lassomycin showing side chains (right) is presented.

Figure II-1C. The lassomycin biosynthetic operon. Protein sequence homology to the lariatin A biosynthetic operon in Rhodococcus is indicated in parentheses next to the respective genes.


Lassomycin had a minimum inhibitory concentration (MIC) of 0.8-3 μg/ml, fairly potent for a peptide, against a variety of M. tuberculosis strains, including MDR (multidrug-resistant) and XDR (extensively drug-resistant) isolates (Table 1).

Table II-1. Lassomycin activity against Mtb.

Strains                                                MIC, μg/ml
H37Rv                                                  0.78-1.56
186, susceptible clinical isolate                      1.56
83, susceptible clinical isolate                       1.56-3.1
84, resistant to INH, STR                              1.56-3.1
85, resistant to INH, RIF                              1.56-3.1
7, resistant to INH, RIF                               1.56
86, resistant to INH, RIF, STR                         1.56
136, resistant to INH, RIF, STR, FQ                    0.78
133, resistant to INH, RIF, STR, FQ                    0.78
189, resistant to INH, RIF, STR, FQ                    3.1
3, resistant to INH, RIF, EMB, FQ                      3.1
30, resistant to INH, RIF, EMB, PZA, FQ                0.78
181, resistant to INH, RIF, EMB, PZA, FQ               0.78
183, resistant to INH, RIF, STR, EMB, PZA, FQ          3.1
188, resistant to INH, RIF, STR, EMB, PZA, FQ          3.1

INH, isoniazid; RIF, rifampicin; STR, ; EMB, ethambutol; PZA, pyrazinamide; FQ, resistant to at least one fluoroquinolone.

Table II-2. Lassomycin specificity.

Strains                            MIC, μg/ml
Actinobacteria
M. tuberculosis H37Rv              0.78-1.56
M. tuberculosis mc26020            0.39-0.78
M. avium paratuberculosis          0.125-0.25
M. smegmatis                       0.78-2
Propionibacterium acnes            12.5-25
Bifidobacterium longum             25-50
Other Gram-positive bacteria
Clostridium difficile              >50
Clostridium perfringens            >50
Lactobacillus reuteri              >50
Lactobacillus casei                >50
Streptococcus mutans               >50
Enterococcus faecalis              >50
Enterococcus faecalis VRE          >50
Bacillus anthracis Sterne          >50
Staphylococcus aureus              >50
Gram-negative bacteria
Bacteroides fragilis               >50
Escherichia coli K12               >50
Klebsiella pneumoniae              >50

Lassomycin was discovered in a screen designed to identify compounds acting specifically against M. tuberculosis. We therefore tested the compound against a panel of diverse bacterial species (Table 2). Lassomycin was highly active against M. tuberculosis; Mycobacterium avium paratuberculosis, a gastrointestinal pathogen of cattle and a suspected pathogen in Crohn’s disease; and Mycobacterium smegmatis, a soil microorganism. The compound was less active against the other actinobacteria tested and had no activity against the other microorganisms tested (Table 2). Importantly, lassomycin was inactive against symbionts of the human microbiota that are suppressed by conventional non-specific antibiotics (11). The lassomycin MIC was unchanged in the presence of serum, indicating resistance to serum proteases and the lack of significant protein binding. The compound caused no lysis of erythrocytes and had low cytotoxicity (IC50, 350 µg/ml) against mammalian NIH 3T3 and HepG2 cells.

Lassomycin had a minimum bactericidal concentration (MBC) of 1-4 µg/ml against M. tuberculosis and M. avium paratuberculosis. Lassomycin showed excellent killing activity in a time-dependent assay against exponentially growing cells of M. tuberculosis (Fig. 2A). Thus, its potency is similar to that of the best existing bactericidal agent, rifampicin (Fig. 2A). Stationary cells of M. tuberculosis are highly tolerant to most antibiotics; for example, rifampicin showed characteristic biphasic killing (Fig. 2B), and a significant number of persister cells survived exposure (12). By contrast, lassomycin had greater killing activity against stationary M. tuberculosis than rifampicin, without an obvious presence of surviving persisters.


Figure II-2. Time-dependent killing of M. tuberculosis mc26020 by lassomycin. All drugs administered at 10x MIC. Each point represents the average of three biological replicates.

Rifampicin (circles), lassomycin (triangles), or untreated (squares) of exponential (A) and stationary (B) M. tuberculosis. Dashed line indicates the limit of detection.

In order to determine the target of lassomycin, resistant mutants of M. tuberculosis were obtained by selecting colonies on nutrient agar plates containing the compound.

Mutants resistant to 16 µg/ml of lassomycin were obtained at a frequency of 3 × 10^-7.
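As a worked illustration of how such a frequency is usually derived, the number of resistant colonies is divided by the number of viable cells plated. The colony and cell counts below are hypothetical and were chosen only to reproduce the reported order of magnitude; they are not the experimental values.

```python
def resistance_frequency(resistant_colonies, viable_cells_plated):
    """Fraction of plated viable cells that gave rise to resistant colonies."""
    if viable_cells_plated <= 0:
        raise ValueError("viable_cells_plated must be positive")
    return resistant_colonies / viable_cells_plated


# Hypothetical counts: 30 colonies recovered from 1e8 CFU plated on selective agar.
frequency = resistance_frequency(resistant_colonies=30, viable_cells_plated=1e8)
print(f"{frequency:.1e}")
```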

Genome sequencing of six resistant mutants (Fig. 3) showed mutations in the ClpC1 subunit of the hexameric ATPase complex, which regulates the two-ring protease complex, ClpP1P2, in mycobacteria (13, 14). Together, they form a large (26-subunit) ATP-dependent protease complex, (ClpC1)6(ClpP1)7(ClpP2)7(ClpC1)6, in which the ClpC1 ATPase binds protein substrates, unfolds them, and translocates them into the ClpP1P2 central chamber for proteolysis (15, 16).

Figure II-3. Sequences of the two N-terminal repeat regions of ClpC1 of mutants resistant to lassomycin. Amino acid changes are indicated in red.

Because the resistant mutants were mapped to the ClpC1 gene, we tested directly if lassomycin altered the activity of this ATPase, which is a member of the AAA family of hexameric ATPases (15, 17). The cloned His6-tagged M. tuberculosis ClpC1 was expressed in M. smegmatis and purified to near-homogeneity, as described previously (13). As expected, ClpC1 exhibited ATPase activity and stimulated the ATP-dependent degradation of the model protein substrate, casein, by the novel ClpP1P2 protease complex, which we recently described (13, 14). Thus, ClpC1 promotes casein translocation into the proteolytic compartment formed by ClpP1P2. Surprisingly, in the presence of low concentrations of lassomycin, ATP hydrolysis by ClpC1 increased dramatically (up to 7-10 fold). Protein substrates of ClpC1 (e.g. casein) enhance its ATPase activity 2-3 fold (13), but this stimulation by lassomycin is much more dramatic, and a similar activation was seen in the presence of casein. This activation showed a Ka of 0.5 µM, which resembles the MIC against M. tuberculosis cells. Thus, there must be little or no barrier to lassomycin’s entry into the bacteria. This effect of lassomycin on ATP hydrolysis was highly cooperative and showed a Hill coefficient of 2 (Fig. 4). This unexpected dramatic activation of ClpC1 likely accounts for lassomycin’s bactericidal activity, which perhaps results from excessive protein degradation by the ClpC1P1P2 complex.
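The comparison between the Ka (in µM) and the MIC (in µg/ml) requires a unit conversion. The sketch below assumes a molar mass of about 1879 g/mol, inferred from the LC-MS ion reported at m/z 1880 for the singly protonated compound; that value and the function name are assumptions made for illustration, and counter-ions or salt forms are ignored.

```python
APPROX_MOLAR_MASS = 1879.0  # g/mol; assumed from the reported ion at m/z 1880


def ug_per_ml_to_micromolar(conc_ug_per_ml, molar_mass=APPROX_MOLAR_MASS):
    """Convert a mass concentration (ug/ml) to a molar concentration (uM)."""
    # ug/ml equals mg/L; dividing by g/mol gives mmol/L, so multiply by 1000 for umol/L.
    return conc_ug_per_ml / molar_mass * 1000.0


KA_UM = 0.5  # activation constant reported in the text
# MIC range reported for M. tuberculosis H37Rv (0.78-1.56 ug/ml).
mic_low_uM, mic_high_uM = (ug_per_ml_to_micromolar(c) for c in (0.78, 1.56))
print(f"MIC range: {mic_low_uM:.2f}-{mic_high_uM:.2f} uM; Ka: {KA_UM} uM")
```

Under this assumption the H37Rv MIC range corresponds to roughly 0.4 to 0.8 µM, which brackets the Ka of 0.5 µM.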

Interestingly, the bactericidal activity of the acyldepsipeptide antibiotics has been shown to result from their direct binding to ClpP, which also causes excessive ClpP-mediated proteolysis (18). Alternatively, the excessive ATP consumption could itself be deleterious, or it may be uncoupled from proteolysis and may prevent the regulated destruction of ClpC1P1P2 substrates. It is noteworthy that “cyclomarin”, a novel antibiotic that had been proposed to target this same enzyme (19), had no effect on the ATPase activity of these preparations of ClpC1.

Figure II-4. Lassomycin dramatically increases ATPase activity of ClpC1. A. 2 µg of pure ClpC1 were mixed with 100 µl of the assay buffer (50 mM TrisHCl pH 7.8; 100 mM KCl; 10% glycerol; 1 mM phosphoenolpyruvate; 1 mM NADH; 2ml pyruvate kinase/lactic dehydrogenase (Sigma); 1 mM ATP) and the ATPase activity of ClpC1 was followed by measuring the coupled oxidation of NADH to NAD spectrophotometrically at 340 nm. Similar results were obtained when the ATPase activity was measured with the Malachite Green method (20). The rate of ATPase activity in the absence of lassomycin was taken as 100%. B. The Ka and Hill coefficient for lassomycin activation of ClpC1 ATPase were determined by curve fitting with classic Hill kinetics using a scaled Levenberg-Marquardt algorithm (tolerance 0.0001).
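The Hill model behind the Ka and Hill coefficient can be illustrated with a short sketch. Note that this does not reproduce the Levenberg-Marquardt fit named in the caption; it swaps in the simpler linearized Hill plot, log10(theta / (1 - theta)) = n log10[L] - n log10(Ka), solved by ordinary least squares. The basal and maximal activities (100% and 800%) and the concentrations are hypothetical, and the data are generated from the model rather than measured.

```python
import math


def hill_activity(conc_uM, v0, vmax, ka_uM, n):
    """ATPase activity (% of basal) predicted by a Hill activation model."""
    fraction = conc_uM**n / (ka_uM**n + conc_uM**n)
    return v0 + (vmax - v0) * fraction


def hill_plot_fit(concs_uM, activities, v0, vmax):
    """Estimate (Ka, n) by least squares on the linearized Hill plot."""
    xs, ys = [], []
    for conc, activity in zip(concs_uM, activities):
        theta = (activity - v0) / (vmax - v0)  # fractional activation
        if conc > 0 and 0.0 < theta < 1.0:
            xs.append(math.log10(conc))
            ys.append(math.log10(theta / (1.0 - theta)))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope), slope  # (Ka, Hill coefficient)


# Synthetic, noise-free data generated with Ka = 0.5 uM and n = 2 (illustration only).
concs = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
data = [hill_activity(c, v0=100.0, vmax=800.0, ka_uM=0.5, n=2.0) for c in concs]
ka_fit, n_fit = hill_plot_fit(concs, data, v0=100.0, vmax=800.0)
print(round(ka_fit, 3), round(n_fit, 3))
```

The linearized form needs the basal and maximal activities to be known in advance, which is one reason a nonlinear fit such as the one described in the caption is generally preferred for real data.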

ClpC1 is a member of the large AAA family of ATPases that serve a variety of key functions in animal and bacterial cells (16). Use of lassomycin as an anti-TB drug would presumably not be advisable if it activated other such ATPases. To determine whether lassomycin also affected the activities of related AAA ATPases, we tested several purified, well-characterized bacterial and mammalian homologs. No effect was seen on ATP hydrolysis by the other Clp family ATPase in M. tuberculosis, ClpX, which activates degradation of distinct proteins by ClpP1P2 (Fig. 5). In addition, no stimulation was observed with the E. coli ClpC1 homolog ClpA, a component of the E. coli ClpAP protease complex; PAN, the proteasome-activating ATPase from the archaeon Methanococcus jannaschii; or the mammalian 26S proteasome (Fig. 5). This highly specific activation of ClpC1 is clearly novel and of appreciable mechanistic interest. Somehow, binding of multiple lassomycin molecules must lead to an accelerated ATP-ADP exchange cycle and much more rapid functioning of its six subunits (21).


Figure II-5. Stimulation of ClpC1 ATPase activity by lassomycin is highly specific. The activities of purified ATPases from bacteria (M. tuberculosis ClpX; E. coli ClpA, ClpB and GroEL), archaea (PAN) and mouse (26S proteasome) were measured in the presence and absence of lassomycin (10 µM) as described in Fig. 4. ATPase activity of each ATPase in the absence of lassomycin was taken as 100%.

In order to gain further insight into the interaction of lassomycin with ClpC1, an in silico approach was utilized. Because the structure of ClpC1 from M. tuberculosis has not been solved, a homology model was created based on the known structure(s) of the N-terminus of Corynebacterium glutamicum ClpC (1, 22, 23). We focused on the N-terminal region, where all the lassomycin-resistance mutations were localized. ClpC from C. glutamicum and M. tuberculosis have >74% sequence identity and >87% identity in their N-terminal regions (residues 1-144, Fig. S5). Visualization of the mutation sites revealed them to be close to each other on ClpC1 (Fig. S7A). Furthermore, they were all located in an acidic region, which is likely to be the drug’s binding site. Lassomycin's four positively charged guanidinium groups should interact strongly with this acidic region. Docking (24) of lassomycin onto the homology model of ClpC1’s N-terminus showed all of the nine obtained binding states in the same vicinity. Analysis of the residues contributing to binding indicated that Gln17, which was altered in four of six resistant mutants (Fig. 6), is one of the major interacting residues through hydrogen bonding. In all four cases, Gln17 was mutated to an Arg or His, and the resulting introduction of a positive charge should markedly reduce the tendency for lassomycin binding. The other mutation sites, Arg21 and particularly Pro79, are located, in most models, on the rim of the surface area contacting the drug (Fig. 7).

Figure II-6. Mutant ClpC1 model variants yield fewer binding positions for the region of the lassomycin-resistant mutations than the wild-type model (WT). Docking was performed assuming a flexible (black) or rigid (shaded) backbone of lassomycin.

To evaluate binding of lassomycin to the resistant mutants of ClpC1, similar models of the mutant forms were created. Due to software limitations, several variant models had to be tested. The docking showed that the Gln17Arg mutation has the largest impact on lassomycin binding, and four of the six Gln17Arg mutations showed a significant reduction in the number of likely positions for lassomycin, assuming a flexible backbone (Fig. 6, S3A). Only one variant of the Pro79Thr mutation also reduced binding. Docking analysis using a rigid backbone for lassomycin also showed reduced binding to four of the six Gln17Arg mutants. Although assuming a rigid backbone for lassomycin resulted in fewer positive positions with the wild-type (WT) enzyme than the flexible one, the rigid conformation showed reduced binding in seven of eleven cases. Both results confirmed that Gln17 is critical for binding, while Arg21 and Pro79 appear less important and probably decrease binding by altering the protein's tertiary structure. This analysis, while useful, has clear limitations. For example, it cannot model changes in ClpC1’s tertiary structure, and possible structural changes upon hexamer formation are not considered.

In fact, AAA ATPases are highly dynamic structures, and the upper loop in the C. glutamicum ClpC N-terminal crystal structure has a large temperature-factor (Debye-Waller) value, indicating a flexible region (Fig. S7B). Also, the conformation of lassomycin on the enzyme and the possible influence of water on binding are not known. Nevertheless, the approach indicates clear differences between WT and mutant structures in the likely drug-binding site (Fig. 7).

Figure II-7. An image of a docking result: Lassomycin (purple) binds at the mutation sites in the M. tuberculosis ClpC1 N-domain (green; C-terminus in red). Mutation sites are labeled and shown in orange. Other residues interacting with lassomycin are colored in dark green. Polar contacts are depicted as yellow dashed lines. Surfaces of both molecules are transparently overlaid.


In multiple respects, lassomycin is a very unusual bactericidal antibiotic with a novel mechanism of action. 1) It is unusual in its ability to selectively kill mycobacteria. This specificity can be explained by its targeting of ClpC1, without affecting related hexameric AAA ATPases, and the fact that ClpC1 and ClpP1P2 are essential for viability in mycobacteria but not in the great majority of bacteria. 2) Its structure, which proved difficult to elucidate, is generated by cleavage of a ribosomal product, cyclization, and C-terminal esterification. 3) It is an unusually basic, 16-residue lasso peptide, which nevertheless even at low concentrations can enter the M. tuberculosis cytosol to interact with an acidic terminal pocket on ClpC1. 4) It is quite unusual for a drug to activate (and not inhibit) its target enzyme. In fact, the cooperative mechanism by which lassomycin dramatically stimulates ClpC1 activity is unclear and intriguing, since these hexameric enzymes normally function by an ordered reaction cycle (13) involving each of their six subunits. 5) A related important question for future research is how the acceleration of ATP hydrolysis by ClpC1 kills the cells. One attractive possibility is that it drives excessive protein unfolding and degradation by ClpP1P2. 6) Based on its potency, which resembles or exceeds that of the standard treatments for TB (e.g. rifampicin), lassomycin (or derivatives) represents a promising approach to treat this widespread disease and its drug-resistant forms.


Materials and methods

Bacterial strains and plasmids

M. smegmatis mc²155 was grown at 37 °C in Middlebrook 7H9 broth with 0.05% Tween-80 and ADC (0.5% BSA, 0.2% dextrose, 0.085% NaCl, 0.003 g catalase per 1 L of medium) with hygromycin (50 µg/ml) and anhydrotetracycline (ATc) (100 ng/ml). C-terminally 6XHis-tagged ClpC1 was expressed in M. smegmatis from the pTetOR plasmid, which carries a tetracycline-inducible promoter.

Strain identification

16S rDNA sequence analysis was used to determine the taxonomic identity of isolate IS009804. Chromosomal DNA was isolated from approximately 10^6 cells after five minutes of vigorous agitation in the presence of 50 mg of glass beads and 100 μl of H2O in a 0.5 ml Eppendorf tube. Nucleotide bases 20 through 710 of the gene encoding the 16S rRNA were amplified by PCR using IS009804 chromosomal DNA, GoTaq Green Master Mix (Promega M7122), and universal primers Bac8F and 782R (Baker 2003). PCR thermocycler parameters included 30 cycles of 95 °C for 30 s, 45 °C for 30 s, and 72 °C for 105 s. The amplified DNA fragment was sequenced by Macrogen (Rockville, MD) using primer 782R and compared by BLAST alignment to the GenBank nucleotide collection.

Fermentation of natural isolates

Colonies of IS009804 were homogenized and transferred into a 250 mL Erlenmeyer flask containing 40 mL of seed broth (15 g glucose, 10 g malt extract, 10 g soluble starch, 2.5 g yeast extract, 5 g casamino acids, and 0.2 g CaCl2•2H2O per 1 L dIH2O, pH adjusted to 7.0 before autoclaving). The seed broth was incubated for 7 days at 28 °C on a rotary shaker (2.5 inch throw, 200 rpm) and then used to inoculate production medium at 2.5% (v/v). IS009804 was screened for antibiotic production after growth in a panel of fermentation production media by removing aliquots of the crude broths after 4 and 11 days of growth. The active fermentation production medium was R4 broth (10 g glucose, 1 g yeast extract, 0.1 g casamino acids, 3 g proline, 10 g MgCl2•6H2O, 4 g CaCl2•2H2O, 0.2 g K2SO4, 5.6 g TES free acid (2-[[1,3-dihydroxy-2-(hydroxymethyl)propan-2-yl]amino]ethanesulfonic acid) per 1 L dIH2O, pH adjusted to 7 before autoclaving). The samples were concentrated to dryness and reconstituted in 100% DMSO. Further fermentation was performed in 500 mL aliquots of R4 in 2 L tri-baffled flasks for 5 days at 28 °C on a rotary shaker (2.5 inch throw, 200 rpm).

Compound spectrum, activity

To monitor in vitro cytotoxicity, exponentially growing NIH 3T3 and HepG2 cells in media supplemented with 10% fetal bovine serum were harvested and seeded at 5000 cells per well in a 96-well flat-bottom plate (25). After 24 hours, the medium was replaced with fresh medium containing compounds added in a two-fold serial dilution, in a process similar to an MIC assay. After 24 and 48 hours of incubation, cell viability was measured with the CellTiter 96® Aqueous One Solution Cell Proliferation Assay (Promega), according to the manufacturer’s recommendations.

High throughput screen

An auxotrophic strain of M. tuberculosis, mc²6020 (ΔlysA ΔpanCD), expressing the red fluorescent protein mCherry was screened against 25,600 crude extracts (3200 strains, 4 fermentation media, 2 time points). All crude extracts were counterscreened against S. aureus, and the resulting hit rate was 2%. To evaluate the quality of our assay, the Z’ factor was calculated for several growth conditions. An assay with a Z’ factor between 1 and 0.9 is considered excellent, one between 0.9 and 0.7 is considered good, and one between 0.7 and 0.5 will benefit significantly from any improvement (26). The highest Z’ factor (0.8 ± 0.06) was obtained for plates containing 150 µl of cell culture after 5 days of incubation with shaking. Because this value falls in the range considered good, we proceeded with the high-throughput screen. Lassomycin was detected in crude extracts of one actinomycete isolate.
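As an illustration of the assay-quality statistic, the following minimal sketch computes the Z’ factor from control wells using the standard definition, Z’ = 1 - 3(SD_pos + SD_neg)/|mean_pos - mean_neg| (26). The function name and the fluorescence readings are hypothetical and are not data from this screen.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Return the Z' factor for two sets of control-well readings.

    Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical mCherry fluorescence readings (arbitrary units).
growth_controls = [1000, 1020, 980, 1010, 990]   # untreated wells
killed_controls = [100, 110, 90, 105, 95]        # fully inhibited wells

print(f"Z' = {z_prime(growth_controls, killed_controls):.2f}")
```

Sample standard deviations are used here; with only a handful of control wells per plate, the choice between sample and population standard deviation changes the result noticeably, so it should be stated alongside any reported value.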

Isolation of lassomycin

The fermentation culture (4.0 L) was centrifuged for 20 minutes, the supernatant was decanted, and the cell pellet was extracted with acetone (2.0 L) and centrifuged again. The acetone extract was combined with the supernatant and HP-20 resin (75 g). This mixture was then concentrated under reduced pressure on a rotary evaporator until all of the acetone had been removed. The resin was washed first with water and then with 80% aqueous acetone before eluting with acetone. Each sample was tested in an M. smegmatis bioassay. The acetone sample contained all of the activity. The acetone solution was then concentrated under reduced pressure to an orange oil. Hexanes were added and the mixture was centrifuged. The hexanes were decanted, tested for activity, and discarded because they showed no activity. The remaining residue was reconstituted in 50% aqueous DMSO and fractionated on an HPLC instrument equipped with a reversed-phase C18 column, eluting with a gradient of H2O/acetonitrile/0.1% TFA over 45 minutes. All fractions were tested for activity against M. smegmatis and only one fraction was active. This fraction was lyophilized, leaving a white powder (80 mg). Analysis of this fraction by LC-MS indicated that a major compound was present ([M+H]+ = 1880). Subsequent searches of a natural product database (AntiBase) did not return any previously reported compound that matched our results.

Purification of M. tuberculosis ClpC1

C-terminally 6XHis-tagged M. tuberculosis ClpC1 subunits were overexpressed in M. smegmatis using an ATc-inducible expression system. After overnight induction with ATc (100 ng/ml), cells were resuspended in buffer A (50 mM Tris-HCl pH 7.8, 100 mM KCl, 10% glycerol, 1 mM ATP, 4 mM MgCl2) and lysed by French press, and lysates were centrifuged for 1 hour at 100,000 × g. ClpC1 was isolated from the supernatant by Ni-NTA affinity chromatography (Qiagen). Eluted fractions containing ClpC1 protein were pooled and further purified by size-exclusion chromatography on a Sephacryl S-300 column in buffer A. Fractions containing ClpC1 protein were concentrated and used in the ATPase activity assay.

Measurement of ATPase

2 µg of pure ClpC1 was mixed with 100 ml of the assay buffer (50 mM Tris-HCl pH 7.8; 100 mM KCl; 10% glycerol; 1 mM phosphoenolpyruvate; 1 mM NADH; 2 ml pyruvate kinase/lactic dehydrogenase (Sigma); 1 mM ATP), and the ATPase activity of ClpC1 was followed by measuring the coupled oxidation of NADH to NAD+ spectrophotometrically at 340 nm.
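In a pyruvate kinase/lactic dehydrogenase coupled assay, one NADH is oxidized per ATP hydrolyzed, so the rate of ATP hydrolysis can be read from the slope of the absorbance trace at 340 nm. The sketch below shows one way to do this conversion. It assumes a 1 cm path length and the commonly used NADH molar extinction coefficient of 6220 M^-1 cm^-1; the function names and the absorbance readings are hypothetical and are not measurements from this work.

```python
NADH_EXTINCTION_M_CM = 6220.0  # M^-1 cm^-1 at 340 nm (assumed literature value)


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator


def atp_hydrolysis_rate_um_per_min(times_min, a340, path_cm=1.0):
    """Convert the decrease in A340 into ATP hydrolyzed (µM per minute).

    Beer-Lambert: concentration = absorbance / (extinction * path length).
    The sign is flipped because NADH absorbance falls as ATP is consumed.
    """
    slope = least_squares_slope(times_min, a340)  # absorbance units per minute
    return -slope / (NADH_EXTINCTION_M_CM * path_cm) * 1e6


# Hypothetical trace: A340 falls by 0.031 per minute.
times = [0, 1, 2, 3, 4]
absorbance = [1.000, 0.969, 0.938, 0.907, 0.876]
print(f"{atp_hydrolysis_rate_um_per_min(times, absorbance):.2f} µM ATP/min")
```

Dividing the result by the enzyme concentration in the well would give a specific activity; in a plate reader the effective path length depends on the well volume and would replace the 1 cm assumed here.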


Protein structure visualization

Protein structures were visualized with the PyMOL Molecular Graphics System, Version 1.5.0.1 (Schrödinger, LLC).

Protein sequence alignment

Protein sequences were aligned with the EMBOSS Needle web tool for pairwise protein sequence alignment (EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK).

Molecule docking

AutoDock Vina v1.1.2 was used to perform the docking runs (24). The structure files for the ligand and the receptor were prepared with AutoDockTools v1.5.6rc3 (Molecular Graphics Laboratory, The Scripps Research Institute). Preparation of the receptor molecules (M. tuberculosis ClpC1 N-domain homology model and mutants) included the following steps: addition of all hydrogens, computation of Gasteiger charges, and merging of non-polar hydrogens. Mutated receptor molecules were created with PyMOL's mutagenesis tool. Lassomycin was also converted with AutoDockTools, assuming all bonds to be rotatable except those of the closed lasso ring backbone. The docking runs were computed on the Harvard University Faculty of Arts and Sciences Research Computing cluster “Odyssey”.

In silico docking of lassomycin to ClpC1

The solved structure of Corynebacterium glutamicum ClpC (PDB ID: 3FH2) was used as a template. According to the SWISS-MODEL quality evaluation, the evaluation parameters of the resulting M. tuberculosis ClpC1 homology model were close to the average of solved X-ray structures. The model achieved a QMEAN Z-score of 0.52 (Fig. S6A). The ANOLEA algorithm revealed favorable energy states across the whole sequence of the N-terminus (Fig. S6B).

The structure files of the M. tuberculosis ClpC1 homology model and lassomycin were converted from PDB to PDBQT format with AutoDockTools for use with the AutoDock Vina docking software. AutoDock Vina requires the definition of a limited region, the search space, in which to perform the docking. Large search spaces enclosing the whole molecule are used for blind docking, but they enlarge the scoring landscape that the algorithm must explore and add many local minima. Increasing the search space beyond a certain size therefore also requires more search time to find the global minimum; this search effort is controlled by the parameter “exhaustiveness”. To reduce the problems and risks associated with large search spaces, we used semi-blind docking: the search space was restricted to the acidic half of the ClpC1 N-domain and the exhaustiveness was set to 100 (default is 8).
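The search-space setup described above can be sketched as a small script that derives a box from a set of atom coordinates and writes the text of an AutoDock Vina configuration file. The coordinates, the padding, and the file names are hypothetical placeholders and not values from this study; only the exhaustiveness of 100 is taken from the text. The keys written (receptor, ligand, center_x/y/z, size_x/y/z, exhaustiveness) are standard Vina configuration options.

```python
def search_box(coords, padding=5.0):
    """Return (center, size) of an axis-aligned box enclosing coords.

    coords  : iterable of (x, y, z) tuples in Å, e.g. atoms lining the pocket
    padding : extra margin in Å added on every side (hypothetical default)
    """
    coords = list(coords)
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(receptor, ligand, center, size, exhaustiveness=100):
    """Build the text of a Vina configuration file for one docking run."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


# Hypothetical coordinates standing in for atoms of the acidic half of the N-domain.
pocket_atoms = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (6.0, 1.0, -3.0)]
center, size = search_box(pocket_atoms)
print(vina_config("clpc1_ndomain_wt.pdbqt", "lassomycin_flexible.pdbqt", center, size))
```

The printed text could be saved to a file and passed to Vina with its `--config` option. Deriving the box from selected residues rather than from the whole receptor is what makes the run semi-blind in the sense used above.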

The lassomycin model was also prepared with AutoDockTools. Since the bioactive conformation is not known, two ligand files were created. In the first, all bonds that can be rotated were treated as rotatable; in practice, the lasso ring remains rigid, while the rest of the backbone can be altered by the software during docking (Fig. S8). In the second, the peptide backbone was kept rigid and only the side chains were rotatable.

The models carrying the resistance mutations were generated with PyMOL's mutagenesis tool. The tool offers conformations of the newly introduced residue ordered by their frequency in protein crystal structures. The program cannot compute a new tertiary structure after the residue is mutated. Hence, several conformations with the fewest visible clashes of the van der Waals radii were selected. The mutant models were verified with MolProbity and compared to the wild-type (Fig. S9). Q17Rrot19 was intentionally generated with many clashes and serves as a negative control. All other mutant models, except the Q17H mutants, scored close to the original ClpC1 homology model (wild-type). The high clashscores for the Q17H mutants can have two explanations: 1) The molecule deviates greatly in its tertiary structure from the wild-type when Gln17 is mutated. 2) The newly introduced His residue adopts a conformation that is not covered by the PyMOL software. However, the second explanation is rather unlikely, because many possible orientations were examined during generation of the Q17H mutant.

Vina calculates a binding energy for each position it finds based on the number of rotatable bonds allowed for the ligand and the environment on the receptor. Hence, a comparison of the binding energies between different receptor models is not appropriate, because the mutant models have a different protein sequence. Instead, the results were scored by the position of the docked ligand on the receptor molecule.
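The text does not state the exact criterion used to score ligand position, so the following is only a sketch of one plausible approach: a docked pose counts as bound at the mutation site if any ligand atom lies within a distance cutoff of any atom of the mutated residues. The 4 Å cutoff, the function names, and the coordinates are all assumptions made for illustration.

```python
import math


def min_distance(ligand_atoms, site_atoms):
    """Shortest distance (Å) between any ligand atom and any site atom."""
    return min(math.dist(a, b) for a in ligand_atoms for b in site_atoms)


def poses_at_site(poses, site_atoms, cutoff=4.0):
    """Return indices of poses with at least one atom within cutoff of the site.

    poses      : list of poses, each a list of (x, y, z) ligand atom coordinates
    site_atoms : list of (x, y, z) coordinates of mutation-site atoms
    cutoff     : contact distance in Å (hypothetical choice)
    """
    return [i for i, pose in enumerate(poses)
            if min_distance(pose, site_atoms) <= cutoff]


# Hypothetical coordinates: one site atom and two docked poses.
site = [(0.0, 0.0, 0.0)]
docked = [
    [(3.0, 0.0, 0.0), (8.0, 0.0, 0.0)],   # closest atom is 3 Å from the site
    [(10.0, 0.0, 0.0)],                   # 10 Å away
]
print(poses_at_site(docked, site))
```

Comparing the fraction of poses that contact the site between wild-type and mutant receptor models avoids comparing binding energies across models, which the paragraph above argues is not appropriate.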

Partial hydrolysis of lassomycin

The peptide was partially hydrolyzed using microwave-assisted acid hydrolysis. In a 1.5 mL polypropylene centrifuge tube, the sample solution contained 0.1 µg/µL peptide in 25% TFA or 3 M HCl. The sample tube was placed in the water bath of a CEM microwave chamber (CEM Discover System, CEM Corporation, Matthews, NC) to perform the hydrolysis. Microwave hydrolysis conditions were as follows: 80 °C, 300 W, 32 to 70 minutes for TFA hydrolysis and 12 minutes for HCl hydrolysis. After microwave hydrolysis, samples were either dried using a SpeedVac vacuum centrifuge (Savant, Holbrook, NY), purified, or diluted 5- to 10-fold with water, followed by MS analysis.

MS/MS sequencing and exact mass determination of partially hydrolyzed lassomycin

The partially hydrolyzed lassomycin peptide was diluted 5- to 10-fold, directly spotted (0.5 µL) onto a Bruker Daltonics MTP AnchorChip 800/384 target, and air dried. 0.5 µL of α-cyano-4-hydroxycinnamic acid (CHCA) matrix solution was spotted on top and air dried. The spots were washed 3 times with water to remove excess acids and salts. The matrix solution was prepared by diluting 36 µL of saturated matrix solution in 0.1% TFA in 90:10 ACN:H2O to 800 µL final volume using 0.1% TFA in 85:15 ACN:H2O, containing 1 mM ammonium phosphate. Mass spectra were obtained in the positive reflectron mode of ionization using a Bruker Daltonics (Bremen, Germany) UltrafleXtreme MALDI TOF/TOF mass spectrometer. The MS and MS/MS spectra were obtained in an automated mode of operation; for MS/MS analysis the CID (collision-induced dissociation) gas was turned off. The instrument was calibrated over the mass range 700 to 3500 Da using a mixture of standard peptides.

The high-resolution exact mass MALDI MS data of the partially hydrolyzed peptide were collected on a Bruker 9.4T Apex-Qe FTICR (Bruker Daltonics, Billerica, MA) using the samples already spotted on the MTP AnchorChip 800/384 target. The FTICR was externally calibrated using a mixture of standard peptides (Fig. S10).

In addition, LC-MS/MS was performed as follows: the partially hydrolyzed peptide was ZipTip (Millipore, Billerica, MA) purified and 5 µL of the resultant peptide solution was loaded onto a Waters nanoAcquity UPLC system (Waters, Milford, MA) using a peptide trap (180 µm × 20 mm, Symmetry® C18 nanoAcquity™ column, Waters, Milford, MA) and an analytical column (75 µm × 150 mm, Atlantis™ dC18 nanoAcquity™ column, Waters, Milford, MA). Desalting on the peptide trap was achieved by flushing the trap with 2% acetonitrile, 0.1% formic acid at a flow rate of 10 µL/min for 3 min. Peptides were separated with a gradient of 2-60% solvent B (acetonitrile, 0.1% formic acid) over 35 min at a flow rate of 350 nL/min. The column was connected to a Waters Q-TOF Premier (Waters, Milford, MA) for ESI-MS/MS analysis.

Data interpretation was completed manually to provide a proposed peptide sequence.

NMR spectroscopy of [13C, 15N]lassomycin

NMR data were acquired and processed as previously described (27, 28). A Varian Inova 800-MHz spectrometer with a triple-resonance HCN cold probe and PFGs was used to record spectra. [13C, 15N]lassomycin was dissolved in dimethyl sulfoxide-d6 (Cambridge Isotope Laboratories, Andover, MA), and the sample was heated to 40 °C for data collection. Table S2 lists the experimental parameters used to acquire the NMR spectra for lassomycin. Tables S3 and S4 list the proton, backbone nitrogen, and carbon chemical shift assignments of the peptide. The 15N-HSQC (Figure S2) gave reasonably well-dispersed peaks, with 12 out of 16 unique backbone NH signals observed, indicating that lassomycin holds a defined structure in solution. The backbone NH signals for Arg3, Leu5, Arg14, and Ile16 could not be definitively assigned due to spectral overlap. Most of the proton chemical shift assignments were made based on data from the HCCH-TOCSY, 13C-NOESYHSQC, and 15N-TOCSYHSQC experiments. Most of the carbon and nitrogen chemical shift assignments were made based on the backbone experiments HNCACB and CBCA(CO)NH (27).

Structure calculations

CYANA 2.1 was used to calculate the structures of all the stereoisomers (29), using NOE restraints measured from the 13C-NOESYHSQC and 15N-NOESYHSQC experiments combined with angle restraints obtained from the HNHA experiment. The automatically assigned NOEs were calibrated within CYANA according to their intensities. The same NOE peak lists were used for the structure calculations of each stereoisomer, following the same procedure as previously reported (27). After seven rounds of calculation for lassomycin (10,000 steps per round), a total of 449 cross-peak assignments were used in the final calculation.

The structural statistics for the 20 lowest energy conformations of lassomycin are summarized in Table S5. These conformations had no constraint violations, a high number of long-range NOEs (61) used in the structure calculation, a low average target function value (0.01), a low backbone rmsd (0.35 ± 0.10 Å), and a low heavy atom rmsd (1.00 ± 0.23 Å). Coordinates for lassomycin have been deposited in the Protein Data Bank (2mai) and chemical shift assignments have been deposited in the BioMagResBank (19355). All other figures were generated using PyMOL.


Long term incubation setup

1 g of soil sample was vortexed vigorously in 10 ml of dIH2O for 10 minutes in a 50 ml conical tube. This sample was serially diluted and mixed with molten 2% SMS agar (0.125 g casein, 0.1 g potato starch, 1 g casamino acids, 20 g bacto-agar in 1 L dIH2O) for plating at multiple densities of colony-forming units. 100 µl of these mixtures were then pipetted into flat-bottom 96-well plates. Plates were incubated at room temperature (22 °C) in a humidified chamber and observed weekly under a dissecting microscope at 50× magnification.

General experimental procedures

ESI-LC-MS data were recorded on a MicroMass Q-Tof-2 spectrometer equipped with an Agilent 1100 solvent delivery system and an online diode array detector using a Phenomenex Gemini-C18 reversed-phase column (50 × 2.0 mm, 3.0 µm particle size). Elution was performed with a linear gradient using deionized water with 0.1% formic acid and CH3CN with 0.1% formic acid as solvents A and B, respectively, at a flow rate of 0.2 mL/min. The gradient increased from 25 to 100% of solvent B over 10 minutes, followed by an isocratic elution at 100% of solvent B for 7 minutes. Analytical and semi-preparative chromatography was performed on a Zorbax SB-C18 reversed-phase column (250 × 9.4 mm, 5.0 µm particle size) using a Shimadzu SCL-10AVP HPLC system including an SPD-M10AVP diode array detector set at 254 nm. Elution was performed with a linear gradient using deionized water with 0.1% trifluoroacetic acid and CH3CN with 0.1% trifluoroacetic acid as solvents A and B, respectively, at a flow rate of 3.0 mL/min. The gradient increased from 10 to 100% of solvent B over 20 minutes, followed by an isocratic elution at 100% of solvent B for 8 minutes.
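The two elution programs above share the same shape: a linear ramp in solvent B followed by an isocratic hold at 100% B. As a minimal sketch, the solvent composition at any time point can be computed as follows; the function name and its argument names are illustrative and do not correspond to any instrument software.

```python
def percent_b(t_min, start_b, ramp_min, hold_min, end_b=100.0):
    """Percent solvent B at time t_min for a linear ramp followed by a hold.

    start_b  : % B at time zero
    ramp_min : duration of the linear gradient (minutes)
    hold_min : duration of the isocratic hold at end_b (minutes)
    """
    if t_min < 0 or t_min > ramp_min + hold_min:
        raise ValueError("time lies outside the programmed run")
    if t_min >= ramp_min:
        return end_b
    return start_b + (end_b - start_b) * t_min / ramp_min


# LC-MS program: 25 to 100% B over 10 min, then 7 min at 100% B.
print(percent_b(5, start_b=25, ramp_min=10, hold_min=7))
# Semi-preparative program: 10 to 100% B over 20 min, then 8 min at 100% B.
print(percent_b(10, start_b=10, ramp_min=20, hold_min=8))
```

Knowing the composition at a compound's retention time is useful when transferring a separation between the analytical and semi-preparative methods, which start at different percentages of solvent B.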


References

1. Payne DJ, Gwynn MN, Holmes DJ, & Pompliano DL (2007) Drugs for bad bugs: confronting the challenges of antibacterial discovery. Nat Rev Drug Discov 6(1):29-40.

2. Lewis K (2013) Platforms for antibiotic discovery. Nat Rev Drug Discov 12(5):371-387.

3. Lewis K (2012) Antibiotics: Recover the lost art of drug discovery. Nature 485(7399):439-440.

4. Staley JT & Konopka A (1985) Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu Rev Microbiol 39:321-346.

5. Kaeberlein T, Lewis K, & Epstein SS (2002) Isolating "uncultivable" microorganisms in pure culture in a simulated natural environment. Science 296(5570):1127-1129.

6. Buerger S, et al. (2012) Microbial scout hypothesis and microbial discovery. Appl Environ Microbiol 78(9):3229-3233.

7. Sacchettini JC, Rubin EJ, & Freundlich JS (2008) Drugs versus bugs: in pursuit of the persistent predator Mycobacterium tuberculosis. Nat Rev Microbiol 6(1):41-52.

8. Arnison PG, et al. (2013) Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep 30(1):108-160.

9. Iwatsuki M, et al. (2006) Lariatins, antimycobacterial peptides produced by Rhodococcus sp. K01-B0171, have a lasso structure. J Am Chem Soc 128(23):7486-7491.

10. Inokoshi J, Matsuhama M, Miyake M, Ikeda H, & Tomoda H (2012) Molecular cloning of the gene cluster for lariatin biosynthesis of Rhodococcus jostii K01-B0171. Appl Microbiol Biotechnol 95(2):451-460.

11. Dethlefsen L & Relman DA (2011) Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc Natl Acad Sci USA 108 Suppl 1:4554-4561.

12. Keren I, Minami S, Rubin E, & Lewis K (2011) Characterization and transcriptome analysis of Mycobacterium tuberculosis persisters. MBio 2(3):e00100-00111.

13. Akopian T, et al. (2012) The active ClpP protease from M. tuberculosis is a complex composed of a heptameric ClpP1 and a ClpP2 ring. EMBO J 31(6):1529-1541.

14. Raju RM, et al. (2012) Mycobacterium tuberculosis ClpP1 and ClpP2 function together in protein degradation and are required for viability in vitro and during infection. PLoS Pathog 8(2):e1002511.

15. Erzberger JP & Berger JM (2006) Evolutionary relationships and structural mechanisms of AAA+ proteins. Annu Rev Biophys Biomol Struct 35:93-114.

16. White SR & Lauring B (2007) AAA+ ATPases: achieving diversity of function with conserved machinery. Traffic 8(12):1657-1667.

17. Hanson PI & Whiteheart SW (2005) AAA+ proteins: have engine, will work. Nat Rev Mol Cell Biol 6(7):519-529.


18. Kirstein J, et al. (2009) The antibiotic ADEP reprogrammes ClpP, switching it from a regulated to an uncontrolled protease. EMBO Mol Med 1(1):37-49.

19. Schmitt EK, et al. (2011) The natural product cyclomarin kills Mycobacterium tuberculosis by targeting the ClpC1 subunit of the caseinolytic protease. Angew Chem Int Ed Engl 50(26):5889-5891.

20. Geladopoulos TP, Sotiroudis TG, & Evangelopoulos AE (1991) A malachite green colorimetric assay for protein phosphatase activity. Anal Biochem 192(1):112-116.

21. Smith DM, Fraga H, Reis C, Kafri G, & Goldberg AL (2011) ATP binds to proteasomal ATPases in pairs with distinct functional effects, implying an ordered reaction cycle. Cell 144(4):526-538.

22. Schwede T, Kopp J, Guex N, & Peitsch MC (2003) SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res 31(13):3381-3385.

23. Guex N & Peitsch MC (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18(15):2714-2723.

24. Trott O & Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31(2):455-461.

25. Smee DF, Morrison AC, Barnard DL, & Sidwell RW (2002) Comparison of colorimetric, fluorometric, and visual methods for determining anti-influenza (H1N1 and H3N2) virus activities and toxicities of compounds. J Virol Methods 106(1):71-79.

26. Zhang JH, Chung TD, & Oldenburg KR (1999) A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 4(2):67-73.

27. Sit CS, McKay RT, Hill C, Ross RP, & Vederas JC (2011) The 3D structure of thuricin CD, a two-component bacteriocin with cysteine sulfur to alpha-carbon cross-links. J Am Chem Soc 133(20):7680-7683.

28. Rea MC, et al. (2010) Thuricin CD, a posttranslationally modified bacteriocin with a narrow spectrum of activity against Clostridium difficile. Proc Natl Acad Sci USA 107(20):9352-9357.

29. Guntert P, Mumenthaler C, & Wuthrich K (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273(1):283-298.


Supporting Materials

List of Abbreviations

C-terminus carboxy terminus

FTICR Fourier transform ion cyclotron resonance

HCN proton, carbon, nitrogen

MALDI matrix-assisted laser desorption/ionization

MS mass spectrometry

MS/MS tandem mass spectrometry

N-terminus amino terminus

NanoESI nano-electrospray ionization

NMR nuclear magnetic resonance

NOE nuclear Overhauser effect

Q-TOF quadrupole time of flight

rmsd root mean square deviation

TOF time of flight


Supplementary Figures

Figure II-S1. Solution structures for lassomycin.

[Figure image: 2D spectrum with axes HN and 15N; peak labels G1, L2, R3, R4, F6, A7, D8, Q9, L10, V11, G12, R13, R14, N15.]

Figure II-S2. 2D-[1H–15N]-HSQC spectrum of lassomycin. 12 backbone resonances are labeled in blue. Sidechain amide resonances are paired together with green lines and labeled in green. The second sidechain resonance for Asn15 (7.45 ppm) is not shown because of its extremely weak intensity.

[Figure image: two surface views related by a 180° rotation; labeled residues R3, R4, Q9, R13, R14, N15, I16.]

Figure II-S3. Surface representation of lassomycin. Hydrophobic side chains are highlighted in yellow and hydrophilic sidechains are shown in cyan. Hydrophilic residues and the C-terminal residue (Ile16) are labeled.

Figure II-S4. Backbone overlay of the 20 lowest energy conformers of lassomycin.


CgClpC    1 MFERFTDRARRVIVLAQEEARMLNHNYIGTEHILLGLIHEGEGVAAKALE 50
            ||||||||||||:||||||||||||||||||||||||||||||||||:||
MtbClpC   1 MFERFTDRARRVVVLAQEEARMLNHNYIGTEHILLGLIHEGEGVAAKSLE 50

CgClpC   51 SMGISLDAVRQEVEEIIGQGSQPTTGHIPFTPRAKKVLELSLREGLQMGH 100
            |:||||:.||.:||||||||.|..:|||||||||||||||||||.||:||
MtbClpC  51 SLGISLEGVRSQVEEIIGQGQQAPSGHIPFTPRAKKVLELSLREALQLGH 100

CgClpC  101 KYIGTEFLLLGLIREGEGVAAQVLVKLGADLPRVRQQVIQLLSG 144
            .|||||.:|||||||||||||||||||||:|.||||||||||||
MtbClpC 101 NYIGTEHILLGLIREGEGVAAQVLVKLGAELTRVRQQVIQLLSG 144

Figure II-S5. Pairwise sequence alignment of Corynebacterium glutamicum ClpC (CgClpC) and Mycobacterium tuberculosis ClpC1 N-terminal protein sequences (residues 1-144). Algorithm parameters: EBLOSUM62 matrix, gap_penalty = 10, extend_penalty = 0.5. Results: identity: 126/144 (87.5%), similarity: 135/144 (93.8%), gaps: 0/144 (0.0%), score: 634.0
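The identity figure reported for this alignment can be checked directly from the two aligned sequences. The short sketch below counts identical positions in a gap-free pairwise alignment; the sequences are transcribed from the alignment itself, and the function name is illustrative. It does not reproduce the similarity value, which would additionally require the EBLOSUM62 substitution matrix.

```python
CG_CLPC = (
    "MFERFTDRARRVIVLAQEEARMLNHNYIGTEHILLGLIHEGEGVAAKALE"
    "SMGISLDAVRQEVEEIIGQGSQPTTGHIPFTPRAKKVLELSLREGLQMGH"
    "KYIGTEFLLLGLIREGEGVAAQVLVKLGADLPRVRQQVIQLLSG"
)
MTB_CLPC1 = (
    "MFERFTDRARRVVVLAQEEARMLNHNYIGTEHILLGLIHEGEGVAAKSLE"
    "SLGISLEGVRSQVEEIIGQGQQAPSGHIPFTPRAKKVLELSLREALQLGH"
    "NYIGTEHILLGLIREGEGVAAQVLVKLGAELTRVRQQVIQLLSG"
)


def identity(aligned_a, aligned_b):
    """Return (identical positions, alignment length) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return matches, len(aligned_a)


matches, length = identity(CG_CLPC, MTB_CLPC1)
print(f"identity: {matches}/{length} ({100 * matches / length:.1f}%)")
```

Run on the sequences above, this gives 126 identical positions out of 144, in agreement with the reported 87.5% identity.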

Figure II-S6. Quality evaluation parameters of the Mtb ClpC1 homology model generation. (A) QMEAN Z-scores of given categories. QMEAN assesses the quality of the obtained geometry (1). Z-score is the relative value of QMEAN to the average value of reference X-ray structures. (B) ANOLEA: energy potential per residue (2).


Figure II-S7. Visualizations of the ClpC N-terminus: (A) Mutation sites (green; labels) are localized to a distinct area. The ClpC N-domain protein backbone structure is shown in pale orange. A transparent overlapping surface map shows electrostatics, making visible the positively charged regions (blue) and negatively charged regions (red). (B) The B-factor values of the top loop of the C. glutamicum ClpC N-terminus crystal structure indicate a very flexible part (red) of the molecule. The color changes from very flexible in red to very rigid in blue.


Figure II-S8. Torsion tree of flexible form of lassomycin. Red and purple bonds are rigid, green bonds are rotatable.

Figure II-S9. MolProbity clashscore values of the mutant models with different conformations of the mutated residue. WT indicates the Mtb ClpC1 homology model, the other models are named after the mutation and conformation number of the new residue.


Figure II-S10. MALDI FTICR spectrum of lassomycin.


Supplementary Tables

Table II-S1. MALDI FTICR and MALDI TOF/TOF analysis of partially hydrolyzed lassomycin.

| Amino acids (X) | Proposed formula [X+H]+ | Calculated mass [X+H]+ | Observed mass [X+H]+ (b) | Error (ppm) |
|---|---|---|---|---|
| Series 1 (ion series A & B) | | | | |
| ELVGRRNI-COOCH3 (a) | C41H76N15O12 | 970.5792 | 970.5790 | 0.3 |
| LVGRRNI-COOCH3 | C36H69N14O9 | 841.5367 | 841.5363 | 0.4 |
| VGRRNI-COOCH3 | C30H58N13O8 | 728.4526 | 728.4527 | -0.2 |
| GRRNI-COOCH3 | C25H49N12O7 | 629.3842 | 629.3841 | 0.1 |
| RRNI-COOCH3 | C23H46N11O6 | 572.3627 | 572 (c) | – |
| RNI-COOCH3 | C17H34N7O5 | 416.2616 | 416 (c) | – |
| Series 2 (ion series C) | | | | |
| GLRRFLAD | C42H69N14O10 | 929.5316 | 929.5316 | 0.0 |
| GLRRFLAD+H2O | C42H71N14O11 | 947.5421 | 947.5424 | -0.3 |
| GLRRFLADE (a) | C47H78N15O14 | 1076.5847 | 1076.5840 | 0.7 |
| GLRRFLADEL | C53H89N16O15 | 1189.6688 | 1189.6691 | -0.2 |
| GLRRFLADELV | C58H98N17O16 | 1288.7372 | 1288.7371 | 0.1 |
| GLRRFLADELVG | C60H101N18O17 | 1345.7587 | 1345.7586 | 0.1 |
| GLRRFLADELVGR | C66H113N22O18 | 1501.8598 | 1501.8578 | 1.3 |
| Series 3 (ion series D) | | | | |
| M+H2O with loss E-L-V (a) | C67H117N26O17 | 1557.9085 | 1557.9100 | -1.0 |
| M+H2O with loss E-L | C72H126N27O18 | 1656.9769 | 1656.9766 | 0.2 |
| M+H2O with loss E | C78H137N28O19 | 1770.0609 | 1770.0657 | -2.7 |
| 1-16 (molecular ion) | | | | |
| GLRRFLADQLVGRRNI-COOCH3 | C83H143N30O20 | 1880.1089 | 1880.1079 | 0.5 |
| GLRRFLADELVGRRNI-COOCH3 (a) | C83H142N29O21 | 1881.0930 | 1881.0914 | 0.9 |
| GLRRFLADELVGRRNI-COOCH3+H2O | C83H144N29O22 | 1899.1035 | 1899.1044 | -0.5 |

(a) Deamidation during acid hydrolysis converted Q to E. (b) Exact masses measured on MALDI FTICR MS. (c) Observed in MALDI TOF/TOF MS/MS but not in MS mode on FTICR MS.
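The calculated masses and ppm errors in Table II-S1 can be reproduced from the proposed ion formulas. The sketch below sums monoisotopic atomic masses, subtracts one electron mass for the singly charged cation, and expresses the error as (calculated - observed)/calculated, which is the sign convention that matches the tabulated values. The function names are illustrative, and the atomic masses are standard reference values rather than numbers taken from this work.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
ELECTRON_MASS = 0.00054857991  # u


def cation_mass(formula, charge=1):
    """Monoisotopic m/z numerator for an ion formula such as 'C41H76N15O12'.

    The formula is taken to already include the added proton, as in the
    '[X+H]+' column of the table; only the electron mass is subtracted.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total - charge * ELECTRON_MASS


def ppm_error(calculated, observed):
    """Mass error in parts per million, signed as calculated minus observed."""
    return (calculated - observed) / calculated * 1e6


print(f"{cation_mass('C41H76N15O12'):.4f}")        # ELVGRRNI-COOCH3 fragment
print(f"{ppm_error(1501.8598, 1501.8578):.1f}")    # GLRRFLADELVGR row
```

Small differences in the last digit of a ppm value can arise because the tabulated errors were presumably computed from unrounded masses, whereas the table lists masses to four decimal places.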

Table II-S2. Experimental parameters used to acquire NMR spectra on [13C, 15N]lassomycin to obtain chemical shift assignments, coupling constants, and NOE restraints.

| Exp. Name (a) | Nuclei (b) | x-sw (c) | y-sw | z-sw | x-pts | y-pts | z-pts | Ref |
|---|---|---|---|---|---|---|---|---|
| 13C-HSQC (full) | 1H, 13C | 11990 | 28155 | | 1024 | 128 | | |
| 15N-HSQC | 1H, 15N | 11990 | 2800 | | 1024 | 128 | | (3) |
| HNHA | 1H, 1H, 15N | 11990 | 8000 | 1945 | 1024 | 96 | 32 | (1, 4) |
| CBCA(CO)NH | 1H, 13C, 15N | 11990 | 12000 | 1945 | 1024 | 64 | 32 | (5) |
| HCCH-TOCSY | 1H, 1H, 13C | 11990 | 9000 | 12001 | 1024 | 144 | 28 | (6) |
| HNCO | 1H, 13C(O), 15N | 11990 | 3770 | 1945 | 1024 | 64 | 32 | (5, 7-9) |
| HNCACB | 1H, 13C, 15N | 11990 | 16089 | 1945 | 1024 | 64 | 32 | (5, 9, 10) |
| 13C-NOESYHSQC | 1H, 1H, 13C | 11990 | 8000 | 8000 | 1024 | 128 | 32 | (2) |
| 15N-NOESYHSQC | 1H, 1H, 15N | 11990 | 7998 | 1945 | 1024 | 128 | 32 | (11) |
| 15N-TOCSYHSQC | 1H, 1H, 15N | 11990 | 8000 | 1945 | 1024 | 128 | 32 | (11) |

(a) Experiments were acquired at 800 MHz. (b) The nucleus acquired in each dimension (e.g. 1H, 15N indicates hydrogen x, nitrogen y). (c) x,y,z-pts and sw are the number of complex points and sweep width in each respective dimension (x is the directly detected dimension).


Table II-S3. 1H Chemical shift assignments of lassomycin.

HN H H others Gly 1 8.16 4.79, 3.36

Leu 2 8.39 4.53 1.88, 1.36 CH 1.59, CH3 0.87, 0.85 a Arg 3 NA 4.48 1.52, 1.19 CH2 1.21, 1.17, CH2 3.17, 2.88, 2NH2 7.76, 7.66 Arg 4 7.14 4.40 1.66, 1.54 CH2 1.44, 1.34, CH2 3.02, 2NH2 7.69, 7.58 Leu 5 NA 4.41 1.10, 0.81 CH 1.03, CH3 0.67, 0.61 Phe 6 8.41 3.98 3.64, 3.03 Ala 7 8.65 4.25 1.38 Asp 8 6.98 4.65 2.95, 1.88

Gln 9 8.32 3.59 1.80, 1.65 CH2 2.15, 2.10, NH2 7.21, 6.73 Leu 10 7.72 4.40 1.61, 1.38 CH 1.40, CH3 0.86, 0.83 Val 11 7.72 4.39 1.87 CH3 0.83, 0.83 Gly 12 8.98 5.39, 3.56

Arg 13 7.08 5.17 1.20, 1.06 CH2 1.25, 1.13, CH2 2.85, 2.68, 2NH2 7.42, 7.32 Arg 14 NA NA 1.80, 1.54 CH2 1.51, CH2 3.09, 2.99, 2NH2 7.20, 7.10 Asn 15 9.16 5.32 2.69 NH2 7.47, 7.03 b Ile 16 NA 4.07 1.77 CH2 1.44, 1.23 CH3 0.89, CH3 0.86, CH3 3.55 aNA = not assigned. Due to spectral overlap, a chemical shift could not be definitively assigned to the HA b of Arg 14 and the HN proton of Arg3, Leu5, Arg14 or Ile16. Carboxy group of Ile is methylated (CH3).

Table II-S4. Nitrogen and carbon chemical shift assignments of lassomycin.

N C C others Gly 1 104.75 45.02 Leu 2 118.46 53.33 43.05 C 27.28, C 26.05, 24.92 a Arg 3 NA 54.50 28.52 C 43.90, N2 110.11 Arg 4 114.27 57.97 34.42 C 28.09, C 43.31, N2 108.79 Leu 5 NA 53.76 46.89 C 31.83, C 25.70, 24.71 Phe 6 114.24 60.05 36.78 Ala 7 123.92 53.92 20.54 Asp 8 118.95 52.05 39.51 Gln 9 114.39 59.38 29.57 C 34.63, N 108.24 Leu 10 110.45 56.22 45.10 C 25.31, 24.85 Val 11 111.10 60.14 35.01 C 18.56, 16.22 Gly 12 115.58 44.17

Arg 13 119.20 54.35 33.20 C 27.81, C 43.81, N2 110.32 Arg 14 NA NA 33.75 C 27.25, C 44.29, N2 109.33 Asn 15 115.56 52.02 41.47 N 109.72 b Ile 16 NA 60.86 39.11 C 28.17, C' 18.20, C 14.31, C 54.82 aNA = not assigned. Due to spectral overlap, a chemical shift could not be definitively assigned to the C of Arg14 and the amide N of Arg3, Leu5, Arg14 or Ile16. bCarboxy group of Ile is methylated (C).


Table II-S5. Structural statistics of the solution structure of lassomycin.

Structural statistics

Distance and angle restraints
total cross peak assignments: 449
short (|i−j| ≤ 1): 369
medium (1 < |i−j| < 5): 19
long (|i−j| ≥ 5): 61
number of φ angles: 1

Average target function value: 0.01

rmsd (Å) for residues 1-16
backbone: 0.35 ± 0.10
heavy atoms: 1.00 ± 0.23


References

1. Kuboniwa H, Grzesiek S, Delaglio F, & Bax A (1994) Measurement of HN-H alpha J couplings in calcium-free calmodulin using new 2D and 3D water-flip-back methods. J Biomol NMR 4(6):871-878.

2. Farrow NA, et al. (1994) Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15N NMR relaxation. Biochem 33(19):5984-6003.

3. Kay LE, Keifer P, & Saarinen T (1992) Pure absorption gradient enhanced heteronuclear single quantum correlation spectroscopy with improved sensitivity. J Am Chem Soc 114(26):10663-10665.

4. Vuister GW & Bax A (1993) Quantitative J correlation: a new approach for measuring homonuclear three-Bond J(HNH.alpha.) coupling constants in 15N-enriched proteins. J Am Chem Soc 115(17):7772-7777.

5. Muhandiram DR & Kay LE (1994) Gradient enhanced triple resonance three-dimensional NMR experiments with improved sensitivity. J Magn Reson Ser B 103(3):203-216.

6. Sattler M, Schwendinger MG, Schleucher J, & Griesinger C (1995) Novel strategies for sensitivity enhancement in heteronuclear multidimensional NMR experiments employing pulsed-field gradients. J Biomol NMR 6(1):11-22.

7. Ikura M, Kay LE, & Bax A (1990) A novel approach for sequential assignment of H-1, C-13, and N-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochem 29(19):4659-4667.

8. Grzesiek S & Bax A (1992) Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J Mag Reson Im 96(2):432-440.

9. Kay LE, Xu GY, & Yamazaki T (1994) Enhanced-sensitivity triple-resonance spectroscopy with minimal H2O saturation. J Magn Reson Ser A 109(1):129-133.

10. Wittekind M & Mueller L (1993) HNCACB, a high-sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with the alpha-carbon and beta-carbon resonances in proteins. J Magn Reson Ser B 101(2):201-205.

11. Zhang OW, Kay LE, Olivier JP, & Forman-Kay JD (1994) Backbone 1H and 15N resonance assignments of the N-terminal SH3 domain of drk in folded and unfolded states using enhanced-sensitivity pulsed-field gradient NMR techniques. J Biomol NMR 4(6):845-858.
