DIVERSE ENVIRONMENTAL PSEUDOMONAS ENCODE UNIQUE SECONDARY METABOLITES THAT INHIBIT HUMAN PATHOGENS

Elizabeth Davis

A Thesis

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

August 2017

Committee:

Hans Wildschutte, Advisor

Ray Larsen

Jill Zeilstra-Ryalls

© 2017

Elizabeth Davis

All Rights Reserved

ABSTRACT

Hans Wildschutte, Advisor

Antibiotic resistance has become a crisis of global proportions. People all over the world are dying from multidrug resistant infections, and it is predicted that bacterial infections will once again become the leading cause of death. One human opportunistic pathogen of great concern is Pseudomonas aeruginosa. P. aeruginosa is the most abundant pathogen in cystic fibrosis (CF) patients’ lungs over time and is resistant to most currently used antibiotics. Chronic infection of the CF lung is the main cause of morbidity and mortality in CF patients. With the rise of multidrug resistant strains and the lack of novel antibiotics, treatment for CF patients will become more problematic. Escalating the problem is a lack of research from pharmaceutical companies due to low profitability, resulting in a large void in the discovery and development of antibiotics. Thus, research labs within academia have played an important role in the discovery of novel compounds. Environmental bacteria are known to naturally produce secondary metabolites, some of which allow them to outcompete surrounding bacteria for resources. We hypothesized that environmental Pseudomonas from diverse soil and water habitats produce secondary metabolites capable of inhibiting the growth of CF derived P. aeruginosa. To address this hypothesis, we used a population based study in tandem with transposon mutagenesis and bioinformatics to identify eight biosynthetic gene clusters (BGCs) from four different environmental Pseudomonas strains, S4G9, LE6C9, LE5C2 and S3E10. Of the eight BGCs identified, seven had putative products of non-ribosomal peptide synthetases and one had a putative product of a phenazine. All compounds appeared to be diverse and potentially novel, but further biochemical research must be done to verify these findings. Overall, we were able to identify genes that encode secondary metabolites capable of inhibiting the growth of CF derived P. aeruginosa as well as other human pathogens. This research lays the groundwork for extracting, characterizing and developing new antibiotics.

For Mikayla

ACKNOWLEDGMENTS

I would like to acknowledge Bowling Green State University and the Biological Sciences department for this wonderful opportunity to learn and grow. I would like to acknowledge my committee members for all their help and support. I thank Dr. Ray Larsen for always being right across the hall and allowing me to stop in at any time with questions, and for being such a big supporter of me and my research. I thank Dr. Jill Zeilstra-Ryalls for all the wonderful advice and the great walks to class twice a week. Working with her has been a great pleasure, and I am very thankful for her support during this process. I want to acknowledge Dr. Hans Wildschutte for being an amazing advisor and leading by example through hard work and dedication. I also cannot thank him enough for his feedback and help at any time of the day; even when he was very busy, he always stopped to help with anything.

I would like to acknowledge my lab mates because without them my time in the lab would have been much more difficult. Thank you, Payel Chatterjee, for always helping me. Thank you for teaching me techniques and life skills that will be so beneficial in the future. Thank you, Joe Basalla, for always giving amazing insight on my research and being able to think outside the box. Also thanks to both Payel and Joe for being my coffee break buddies.

I must also acknowledge Britney Eggly, Mahnur Khan and Emily Vervrugge for all their help with mutant hunts and any other task done in the lab.

Finally, I would like to acknowledge my family for love and support throughout this process and my friends for always letting me vent to them. I want to give the biggest thank you to Matthew Burgess. He was such an important part of this process, supporting me and believing in me when I did not believe in myself.

TABLE OF CONTENTS

CHAPTER I. INTRODUCTION
1.1 The antibiotic resistance crisis
1.2 Cystic fibrosis and Pseudomonas aeruginosa
1.3 Pseudomonas as a model organism
1.4 Antagonistic activity of environmental Pseudomonas
1.5 Objectives

CHAPTER II. MATERIALS AND METHODS
2.1 Selecting isolates
2.2 Growth conditions
2.3 Transposon (Tn) mutagenesis
2.3.1 Tri-parental mating filter method
2.3.2 Tri-parental mating spotting method
2.4 Optimization of transposon (Tn) mutagenesis
2.5 Replica plating
2.6 Optimization of mutant screening
2.7 Scaled up mutant hunt and verification of mutants
2.8 Antagonistic assay using human pathogens
2.9 Mutant DNA extraction and PCR to identify transposon insertion
2.9.1 Linker-mediated (LM) PCR
2.9.2 Arbitrary PCR
2.10 Wildtype DNA extraction and genome sequencing

CHAPTER III. RESULTS
3.1 Strains tested for the ability to undergo Tn mutagenesis
3.2 Optimization of Tn mutagenesis
3.3 Optimization of mutant screen
3.4 Optimization and mutant hunt results
3.4.1 Strain S4G9
3.4.2 Strain LE6G11
3.4.3 Strain LE6C9
3.4.4 Strain LE5C2
3.4.5 Strain S3E10
3.5 Antagonistic assay against human pathogens
3.6 PCR and sequencing results
3.7 Analysis of genomes and biosynthetic gene clusters (BGCs)
3.7.1 Alignment of mutants to wildtype genome
3.7.2 Wildtype whole genome annotation results
3.7.3 Identification and characterization of biosynthetic gene clusters (BGCs)
3.7.4 S4G9 gene clusters
3.7.5 LE6C9 gene clusters
3.7.6 LE5C2 gene clusters
3.7.7 S3E10 gene clusters
3.7.8 Comparison of gene clusters
3.7.9 Average nucleotide identity (ANI)

CHAPTER IV. DISCUSSION
4.1 Population-level diversity of Pseudomonas
4.2 Transposon mutagenesis and bioinformatics
4.3 Secondary metabolites of Pseudomonas
4.4 Future directions

REFERENCES

APPENDIX A. FIGURES

APPENDIX B. TABLES

LIST OF ABBREVIATIONS

AA amino acid

Amp ampicillin

ANI average nucleotide identity

antiSMASH antibiotic and secondary metabolite analysis shell

ARB arbitrary

AU Pseudomonas aeruginosa

BGC biosynthetic gene cluster

BHI brain heart infusion

BLAST basic local alignment search tool

C+K cetrimide with 50 µg/mL kanamycin plate

Cam chloramphenicol

CAS chrome azurol S

CDC Centers for Disease Control and Prevention

CF cystic fibrosis

CFTR cystic fibrosis transmembrane conductance regulator

CRISPR clustered regularly interspaced short palindromic repeats

ESKAPE Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter spp.

IMG/ABC integrated microbial genomes—atlas of biosynthetic gene clusters

IMG/M integrated microbial genomes and microbiomes

JGI Joint Genome Institute

Kan kanamycin

KanR kanamycin resistant

LB lysogeny broth

LM linker-mediated

LOI loss of inhibition

MDR multidrug resistant

MH Mueller Hinton

MRSA methicillin-resistant Staphylococcus aureus

NB nutrient broth

NCBI National Center for Biotechnology Information

NRPS non-ribosomal peptide synthetase

OD600 optical density at 600 nm

ORF open reading frame

PCR polymerase chain reaction

SWI Small World Initiative

Tn transposon

TSB tryptic soy broth

WHO World Health Organization

CHAPTER I. INTRODUCTION

1.1 The antibiotic resistance crisis

After the discovery of penicillin and its use in treating diseases, antibiotics were believed to be miracle drugs (Bennett and Chung 2001). This breakthrough resulted in the survival of millions of people, and had a huge impact on modern medicine and surgery (US Department of Health and Human Services 1999). Unfortunately, the disadvantages of antibiotic overuse were overshadowed by the lifesaving benefits, and today a rapid emergence of antibiotic resistant bacteria is threatening the major strides made decades ago. Resistance has been identified to almost every antibiotic currently being used (Ventola 2015). It is estimated that in the United States alone, about 2 million people are infected with multidrug resistant (MDR) bacteria and 23,000 people die each year from these infections (Frieden 2013, Center for Disease Dynamics 2015). Worldwide, this number is even more alarming at 700,000 deaths each year (O’Neil 2014). These numbers are startling and are rising. If current habits of antibiotic use are not changed, and resistance continues along the same evolutionary path, by the year 2050, the Centers for Disease Control and Prevention (CDC) predicts that 10 million deaths per year worldwide will be the result of antibiotic resistant infections (O’Neil 2014). If these predictions hold true, antibiotic resistant infections will once again be the leading cause of death.

The World Health Organization (WHO) recently released the first ever list of bacteria that are the most serious antibacterial resistant threats to humans (Lawe-Davies and Bennett 2017). Bacteria have been grouped into critical, high, and medium priority. These threats include both gram-positive and gram-negative bacteria, including but not limited to a group of bacteria known as the ESKAPE pathogens: Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter spp. (Hidron et al. 2008, Sievert et al. 2013, Tommasi 2015). The name ESKAPE comes from the first letter of each pathogen listed above and their ability to essentially escape the effects of antibiotics. All of these pathogens were responsible for causing serious and life-threatening infections even before they became antibiotic resistant. The largest concern is for immunosuppressed people, particularly people in health care settings, because most of these pathogens are opportunistic (Mauldin et al. 2010, Peleg and Hooper 2010). Common infections caused by these bacteria include skin and blood infections, urinary tract infections, bacterial pneumonia, eye infections, ear infections, meningitis, and tuberculosis, just to name a few (Frieden 2013). These infections were treatable 30 years ago, but with evolution of antibiotic resistance, they are once again a major concern.

The misuse and abuse of antibiotics is a selective pressure that has driven the evolution of antibiotic resistance. From overuse in hospitals and agriculture to incorrect prescribing, bacteria have had the opportunity to evolve resistance to virtually all antibiotics (Ventola 2015). Escalating the problem is a lack of drug discovery from pharmaceutical companies due to low profitability. Antibiotics have little economic appeal to these companies because the drugs are used for short periods of time, are relatively inexpensive for consumers, and take too much time and money to produce (Tommasi 2015, Freire-Moran et al. 2011). This emerging resistance and lack of antibiotic discovery is moving the world toward times that resemble the “pre-antibiotic era.” This will have many negative global impacts on people. Not only does antimicrobial resistance have the potential to become the leading cause of death, but it will also have a negative economic impact (McGowan 2001). Increasing antibiotic resistant infections result in longer hospital stays and the use of last resort antibiotics, both of which are very expensive. Mauldin et al. (2010) found that hospital patients with MDR bacterial infections pay about 29.3% more (median of $40,000 more) for their treatment than patients with nonresistant bacterial infections, and have a 23.8% increase in length of stay in the hospital. Also, the increase in morbidity and mortality results in fewer people working and lower economic output from companies (Center for Disease Dynamics 2015, McGowan 2001, O’Neil 2014).

The CDC and WHO have proposed many management strategies to address the current antibiotic resistance crisis, in hopes of avoiding future complications (Center for Disease Dynamics 2015, O’Neil 2014, Ventola 2015). One of the main focuses of these organizations is educating people, prescribers, and the health care systems on the negative effects of the inappropriate use of antibiotics. Other efforts focus on monitoring and improving the practices of prescribing antibiotics, improving the diagnosis of the cause of disease, and preventing further transmission of antibiotic resistant bacteria. Better strategies to track and record the types of infections could also provide a centralized database of the resistant bacteria, the locations where they occurred, and outbreaks (Frieden 2013, Ventola 2015). However, educating people, finding ways to decrease the spread of infection, and slowing the evolution of resistance can only go so far. The lack of new antibiotic discovery is still an ever-looming problem, and although there are some promising drugs on the brink of production, more must be discovered. Broad-spectrum antibiotics and the approaches to drug discovery through synthetic biology are becoming less effective (Tommasi 2015). Thus, there is a growing demand for novel antibiotic compounds, as well as promising systems for identifying the compounds. Since research and discovery is not occurring in pharmaceutical companies, the responsibility falls on academia, where university labs hold a significant role in drug discovery.

1.2 Cystic fibrosis and Pseudomonas aeruginosa

An example of the benefits of antibiotics, as well as the negative impacts of resistance, occurs in cystic fibrosis (CF) patients. Advanced treatments for CF patients have greatly increased life expectancy, and most people are now surviving past the age of 30 (LiPuma 2010, Rau et al. 2010, Tunney et al. 2010). However, the predominant colonizer of adult CF lungs, Pseudomonas aeruginosa, is evolving resistance to all currently used antibiotics, and consequently becoming even more difficult to treat (LiPuma 2010, Perry et al. 2008, Rau et al. 2010, Smith et al. 2006).

Approximately 30,000 people suffer from CF in the United States, and it is estimated that people in other countries are equally affected by this disease (Caballero et al. 2015, Klepac-Ceraj et al. 2010, LiPuma 2010). CF results from a mutation in the cystic fibrosis transmembrane conductance regulator (CFTR) that inhibits the proper movement of chloride ions across the cell membrane, thus causing mucus to build up inside the lungs (Coutinho et al. 2008, Klepac-Ceraj et al. 2010, LiPuma 2010). Although multiple organs are affected by this disorder, one of the most severe problems for CF patients is chronic infection of the lungs. Mucus build-up in the respiratory system provides an ideal environment for bacterial colonization and subsequent infection, which has a significant impact on the morbidity and mortality of CF patients (Coutinho et al. 2008, Gross and Loper 2009, Silby et al. 2011, Smith et al. 2006). The chronic infection of CF patients’ lungs is the main cause of the low life expectancy (Coutinho et al. 2008, Cox et al. 2010, Klepac-Ceraj et al. 2010). The immune response elicited by the infection causes repeated cycles of inflammation and scarring, resulting in poor lung function and eventually organ failure (Blainey et al. 2012, Parad et al. 1999).

Multiple bacterial species have been collected and isolated from the lungs of CF patients, including P. aeruginosa, Burkholderia spp., Staphylococcus aureus, Stenotrophomonas maltophilia, Achromobacter xylosoxidans, and Mycobacterium spp. (Coutinho et al. 2008, Cox et al. 2010, LiPuma 2010). However, research shows that P. aeruginosa is the predominant bacterium in chronic infection, having the highest prevalence in CF patients over time, specifically in adults (Coutinho et al. 2008, Cox et al. 2010, Zhao et al. 2012). Additionally, P. aeruginosa has been shown to be the main concern in chronic infection because of its multidrug resistance (Coutinho et al. 2008, Jelsbak et al. 2007, Parad et al. 1999).

When P. aeruginosa initially colonizes the CF lung, its phenotype is similar to that of naturally occurring environmental Pseudomonas strains, and it exhibits a non-mucoid state, motility, and virulence (Jelsbak et al. 2007, Rau et al. 2010). After colonization, P. aeruginosa rapidly adapts to the CF lung through increased rates of mutation that affect gene regulation, including mutations in the mucA gene, which cause increased alginate production and contribute to higher fitness in the lungs (Jelsbak et al. 2007, Rau et al. 2010, Smith et al. 2006). Multiple mutations, in addition to those in the mucA gene, occur during colonization of the CF lung, allowing not only biofilm formation but also increased antibiotic resistance (Rau et al. 2010, Silby et al. 2011, Smith et al. 2006). Pathogenic P. aeruginosa from CF patients has been found to be resistant to β-lactam antibiotics, aminoglycosides, aztreonam, ceftazidime, and ciprofloxacin (LiPuma 2010, Perry et al. 2008, Smith et al. 2006). This rise of MDR P. aeruginosa is alarming, and more research is needed to address this problem.

1.3 Pseudomonas as a model organism

The genus Pseudomonas consists of gram-negative bacteria from the phylum Proteobacteria (Palleroni et al. 2010), which is one of the most diverse groups of bacteria and has been studied for many years (Gomila et al. 2015). Currently over 200 species of Pseudomonas have been identified and all can be found online at http://www.bacterio.net/pseudomonas.html (Tayeb et al. 2005). Pseudomonas are model organisms for many reasons: they are relevant to both ecology and human health, can be worked with easily in a lab setting, and have been used in a variety of biotech, agricultural and industrial settings because of their metabolic capabilities (Silby et al. 2011).

As described above, P. aeruginosa is a human pathogen that is of great concern because of its virulence and rising antibiotic resistance. Although it is the predominant colonizer of CF patients’ lungs, it is also an opportunistic pathogen capable of causing pneumonia and infections of burn victims’ wounds in nosocomial settings (Richard et al. 1994, Yayan et al. 2015). P. aeruginosa is not the only pathogenic Pseudomonas species and not the only concern for antibiotic resistance. Pseudomonas syringae is a well characterized plant pathogen, which can infect a variety of plant species, causing lesions on fruits and leaves, blasts, galls and cankers (Moore 1988). Pseudomonas entomophila is an insect pathogen responsible for infections and deaths of insects, and has been highly studied in Drosophila melanogaster (Vodovar et al. 2005). All these pathogens are capable of producing toxins, including cytotoxins, phytotoxins, and TccC-type toxins, which contribute to disease in infected hosts (Gross and Loper 2009, Silby et al. 2011, Peix et al. 2009).

Pseudomonas are described as being ubiquitous throughout the environment, having been isolated in large numbers from a variety of habitats including plants, animals, soil and water (Palleroni et al. 2010, Silby et al. 2011, Spiers et al. 2000). The wide abundance of Pseudomonas species in the environment allows for sampling, isolating, and identifying bacteria that can be utilized in the lab. Furthermore, specific strains of pseudomonads (defined as members of the genus Pseudomonas) have been well characterized through culture based experiments and genomic analyses (Gomila et al. 2015, Meyer et al. 2002, Peix et al. 2009). Genetic tractability throughout this genus has given great insight into the bacteria’s capabilities. For instance, transposon mutagenesis has been used to identify genes involved in antagonistic activity (defined as the inhibition of growth by one bacterial strain on another) (Chatterjee et al. 2017, Davis et al. 2017). Furthermore, multiple phylogenetic analyses using housekeeping genes (defined as genes shared by all members in the genus) have been performed on Pseudomonas to identify strains at the species level (Gomila et al. 2015, Spiers et al. 2000). These include the rpoB, recA, atpD, carA, gyrB, and rpoD genes (Peix et al. 2009, Spiers et al. 2000, Tayeb et al. 2005, Yamamoto et al. 2000), all of which allow for greater resolution and differentiation of the genus and species compared to 16S rRNA gene analysis alone. The 16S rRNA gene is highly conserved and does not evolve as quickly as other essential genes (Yamamoto et al. 2000). Many studies have shown that closely related species are hard to distinguish from one another when using the 16S rRNA gene sequence (Peix et al. 2009, Tayeb et al. 2005). When working within one genus, such as Pseudomonas, a phylogenetic tree based on the 16S rRNA gene does not provide as high a phylogenetic resolution as one based on a housekeeping gene such as gyrB.

Pseudomonads have genomes that range in size from ~3.7 Mb to 7.1 Mb and that give rise to different metabolic functions (Spiers et al. 2000). They possess diverse biochemical capabilities, and play roles in degradation of organic and synthetic compounds. Studies have shown that Pseudomonas species are involved in numerous activities ranging from plant growth promotion to bioremediation and biocontrol, but also pathogenicity (Palleroni et al. 2010, Silby et al. 2011). The strain Pseudomonas protegens Pf-5 is well characterized, and known to produce various secondary metabolites including 2,4-diacetylphloroglucinol (a toxin to fungal plant pathogens), pyoluteorin (a toxin to Oomycetes), and phenazines (nitrogen-containing compounds commonly used as antifungal compounds) (Gross and Loper 2009, Pierson and Pierson 2010, Silby et al. 2011). Furthermore, the strain Pseudomonas fluorescens NCIMB 10586 produces the antibiotic mupirocin, which is effective against MRSA (Kassem El-Sayed et al. 2003). Thus, pseudomonads are model organisms because they are easy to isolate, can be utilized in the lab for experimentation, and are known to produce antagonistic factors.

1.4 Antagonistic activity of environmental Pseudomonas

Studies suggest that, prior to infection and adaptation in the CF lung, P. aeruginosa originates in the environment (Jelsbak et al. 2007, Rau et al. 2010). However, research has shown that P. aeruginosa is not an abundant species in ecological habitats of soil and water (Kahn et al. 2007, Deredjian et al. 2014). Instead, other Pseudomonas species are predominant in these habitats. A possible explanation for the low abundance of P. aeruginosa in the environment, compared to its fitness in the CF lung, is bacterial competition, whereby other Pseudomonas strains inhibit the growth of or outcompete P. aeruginosa when it is outside of the CF host. Chatterjee et al. (2017) performed a population-level based study on environmental pseudomonads to test this hypothesis. This research involved the isolation, phylogenetic characterization, and antagonistic activity of pseudomonads isolated from soil and water habitats (Chatterjee et al. 2017). The strains were collected from diverse habitats including soil from Bowling Green, Ohio during the spring, April 2012, and a fresh water ecosystem from the central basin of Lake Erie during the winter, February 2012. These habitats were chosen because they represent diverse solid state and fluidic environments, and likely expose microbes to distinct selective pressures. If strains are adapted to such ecosystems, the products they produce may differ. For instance, a molecule may be more active or hydrophobic if produced by water adapted strains, since the compounds are likely to rapidly diffuse away from the host. In soil environments, diffusion may be less of a factor limiting the effectiveness of a compound.

About 390 strains were isolated consisting of 192 from water (96 strains denoted LE5 and 96 strains denoted LE6) and 192 from soil (96 strains denoted S3 and 96 strains denoted S4) habitats. Genomic DNA was extracted from all 390 strains, and used to amplify and sequence the gyrB housekeeping gene. This gene encodes DNA gyrase, which plays a role in DNA replication and is essential for pseudomonad survival (Yamamoto et al. 2000). The gyrB gene sequences were aligned and used to construct a phylogenetic tree (Figure 1, Chatterjee et al. 2017), which was used to characterize the population-level diversity. Genetic diversity was evident as the tree was resolved into 13 different populations, based on common ancestors, denoted by branching patterns. Data from the corresponding two habitats were superimposed onto the tree to investigate ecological distribution compared to genetic diversity. It was observed that some of the populations were made up of strains mainly from one environment, such as populations 1, 4, 6, 7, and 11 that are mainly from water, and populations 2, 3, 10, and 13 that are mainly from soil. This indicates that strains within the same environment are genetically similar based on the gyrB gene, and may be adapted to one habitat. In addition, populations 5, 8, 9 and 12 have a mixture of soil and water isolates. These strains may represent generalists that can persist across multiple environments (Figure 1). Overall, the results suggested that population-level diversity occurs within, as well as between, different environments. Interestingly, no P. aeruginosa strains were isolated during this work, suggesting their low or rare abundance in these habitats.

As discussed above, many Pseudomonas species are known for their production of secondary metabolites. The Wildschutte lab previously performed a series of competition plate assays to test the hypothesis that environmental Pseudomonas strains are able to outcompete other environmental Pseudomonas isolates (Chatterjee et al. 2017). Antagonistic assays were performed to compete all strains within a habitat in a pairwise fashion; specifically, the soil (S3 and S4) and water (LE5 and LE6) strains were tested against each other. A high-throughput experimental method resulted in over 50,000 individual interactions generated from 25,600 individual soil to soil, and water to water assays. Hundreds of antagonistic events were observed in this experiment. Out of the LE5 and LE6 strains, about 61% of 163 strains inhibited at least two other strains and nine of the strains were capable of inhibiting more than 20. Among the S3 and S4 strains, about 69.2% of 167 strains inhibited at least two other strains and eight of these strains were capable of inhibiting more than 10 (Chatterjee et al. 2017).

Since pseudomonads have the ability to inhibit other environmental strains, the hypothesis that environmental isolates inhibit pathogenic CF-derived P. aeruginosa was tested. A panel of 35 clinical P. aeruginosa strains (AU strains) isolated from CF patients’ sputum was provided by Dr. John LiPuma at the University of Michigan. The samples were taken from the lungs of 32 unrelated CF patients from 12 cities in the United States from 2004-2012. To test for activity, the same antagonistic assay used to test for pseudomonad inhibition among soil and water isolates was applied. Each AU strain was spread plated on a nutrient broth agar medium and all strains (S3, S4, LE5 and LE6) were tested for antagonistic activity against the pathogens. The antagonistic results were overlaid on the phylogenetic tree (Figure 1). Strains were capable of inhibiting multiple AU strains. Interestingly, some environmental strains that did not inhibit other habitat-derived isolates showed antagonism against AU pathogens. Overall results were striking, with 156 of the environmental isolates showing a total of 263 antagonistic activities. About 60% of the water strains and 40% of the soil strains inhibited one or more AU strains (Chatterjee et al. 2017). Out of the 35 AU strains tested, 30 were susceptible to at least one soil or water isolate. The large range of susceptibility in the AU strains implies that a variety of secondary metabolites or antimicrobial factors are being produced by the environmental strains (Chatterjee et al. 2017). These results suggest that pseudomonads from different habitats have the ability to outcompete pathogenic P. aeruginosa through the production of diverse compounds.

1.5 Objectives

This research involving pseudomonads from different habitats is significant for the identification of novel antimicrobial factors, and has the potential to contribute to drug discovery against P. aeruginosa and other pathogens. Based upon previous studies (Chatterjee et al. 2017), we hypothesized that environmental Pseudomonas produce secondary metabolites capable of inhibiting the growth of pathogenic P. aeruginosa isolated from the CF lung. The objective of this study was to identify and characterize diverse gene clusters from Pseudomonas strains that are involved in the production of secondary metabolites, using transposon mutagenesis and bioinformatics. This was accomplished through the following aims: (1) optimize transposon mutagenesis for at least two strains from distinct habitats, (2) perform a large scale mutant hunt to identify gene clusters involved in antagonistic activity, and (3) characterize gene clusters through bioinformatic approaches.

CHAPTER II. MATERIALS AND METHODS

2.1 Selecting isolates

To increase the chances of identifying dissimilar compounds involved in antagonistic activity, distinct strains were chosen based on different populations, separate environments, and ability to inhibit different pathogenic P. aeruginosa. Using the phylogenetic tree in Figure 1, strains were selected based on these criteria and on whether they showed antagonistic activities against multiple AU strains. For the water isolates, strains that inhibited four or more pathogens were selected. For the soil isolates, strains that showed antagonistic activities against one or more pathogens were selected. After strains were chosen, it was determined whether the isolates were capable of undergoing transposon mutagenesis (see below).
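The habitat-specific thresholds described above can be expressed as a simple filter. The following is a minimal sketch and not the procedure actually used in this work: the `Isolate` record, the function names, and the example isolates and their counts are hypothetical, and the sketch covers only the pathogen-count thresholds, not the additional criteria of distinct populations and environments.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Minimum number of AU strains an isolate must inhibit, per habitat
# (four or more for water isolates, one or more for soil isolates).
MIN_PATHOGENS_INHIBITED = {"water": 4, "soil": 1}


@dataclass(frozen=True)
class Isolate:
    """Hypothetical record for one environmental isolate."""
    name: str
    habitat: str              # "water" or "soil"
    pathogens_inhibited: int  # number of AU strains inhibited


def meets_threshold(isolate: Isolate) -> bool:
    """Return True if the isolate meets the threshold for its habitat."""
    return isolate.pathogens_inhibited >= MIN_PATHOGENS_INHIBITED[isolate.habitat]


def select_candidates(isolates: Iterable[Isolate]) -> List[Isolate]:
    """Keep only isolates that meet their habitat-specific threshold."""
    return [isolate for isolate in isolates if meets_threshold(isolate)]


# Made-up example data, for illustration only.
example_isolates = [
    Isolate("water_isolate_A", "water", 5),
    Isolate("water_isolate_B", "water", 3),
    Isolate("soil_isolate_A", "soil", 1),
    Isolate("soil_isolate_B", "soil", 0),
]

selected = select_candidates(example_isolates)
print([isolate.name for isolate in selected])
```

Candidates passing this filter would still need to be checked for their ability to undergo transposon mutagenesis.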

2.2 Growth conditions

Strains were collected and stored as previously described (Chatterjee et al. 2017). Environmental Pseudomonas strains were cultured in Nutrient Broth (NB) (BD Difco) at 30°C for 24 hours prior to experimentation. For transposon mutagenesis, the donor strain E. coli CC118 (pBAM1 plasmid) was cultured in Lysogeny Broth (LB) (BD Difco) containing 50 μg/mL of kanamycin (Kan) and 150 μg/mL of ampicillin (Amp) at 37°C for 24 hours, and the helper strain E. coli HB101 (pRK600 plasmid) was cultured in LB containing 30 μg/mL of chloramphenicol (Cam) at 37°C for 24 hours (Martínez-García et al. 2011).

All pathogenic P. aeruginosa (AU strains) were cultured in liquid NB at 37°C for 24 hours. Non-disease causing relatives of the ESKAPE pathogens were grown in liquid or solid NB. Enterococcus raffinosus, Bacillus subtilis, Staphylococcus epidermidis, Escherichia coli, Acinetobacter baylyi, Enterobacter aerogenes, and Mycobacterium smegmatis were cultured for 24 hours at 37°C; Erwinia carotovora and Lysobacter antibioticus SH10 TSA25 were cultured for 24 hours at 30°C. The ESKAPE pathogens and other human pathogens, Bacillus cereus, Shigella flexneri, Klebsiella pneumoniae, Acinetobacter baumannii, and Enterobacter cloacae, were cultured in liquid NB for 24 hours at 37°C; Listeria monocytogenes, Enterococcus faecium and Enterococcus faecalis were cultured in Brain Heart Infusion (BHI) broth (BD Difco) for 24 hours at 37°C; methicillin-resistant Staphylococcus aureus (MRSA) was cultured in Tryptic Soy Broth (TSB) (BD Bacto) for 24 hours at 37°C. When strains were grown on plates, 1.5% agar was used in the media.

2.3 Transposon (Tn) mutagenesis

Conjugation and Tn mutagenesis were performed using a tri-parental mating system with a donor strain, E. coli CC118, that contains the Tn-carrying pBAM1 KanR vector; a helper strain, E. coli HB101 (pRK600 plasmid); and a recipient Pseudomonas bacterium (Martínez-García et al. 2011) (Figure 2A, B). P. putida KT2440 was used as an experimental positive control recipient strain. During conjugation and Tn mutagenesis, the origin of transfer on the pBAM1 plasmid (in E. coli CC118) is recognized by the transfer functions encoded on the pRK600 plasmid (in E. coli HB101), and the pBAM1 plasmid is transferred to the Pseudomonas recipient. Once pBAM1 has been transferred to the Pseudomonas strain, the KanR Tn is mobilized by the transposase and randomly inserted into the Pseudomonas chromosome. If the Pseudomonas recipient is capable of Tn mutagenesis, it will grow on a solid cetrimide agar plate containing 50 μg/mL Kan.

E. coli and environmental Pseudomonas isolates were grown and cultured as described above. The tri-parental mating filter method and the tri-parental mating spotting method were both utilized for transposon mutagenesis. Each is described below.


2.3.1 Tri-parental mating filter method

Each strain (E. coli HB101, E. coli CC118 and Pseudomonas recipient) was cultured as described above. The optical density at 600 nm (OD600) of each culture was adjusted to 1.0 using 10 mM MgSO4. Each adjusted culture was centrifuged for 3 minutes at 12,000 rpm and washed with 1 mL of 10 mM MgSO4 to remove any traces of antibiotics. The cells were mixed in a 1:1:1 ratio into 5 mL of 10 mM MgSO4, giving each of the three cultures an OD600 of 0.03. The mating mixture was then vortexed and the suspension was passed over a 0.45-μm-pore filter disk with a 47-mm diameter (Whatman). The filter was placed onto solid NB agar and incubated for 20 hours at 30°C. Following incubation, the filter was transferred to a 50 mL culture tube containing 5.0 mL of 10 mM MgSO4 solution and vortexed to resuspend the cells. Then, 100 μL was plated onto solid cetrimide agar with 50 μg/mL Kan to select for Pseudomonas transconjugants.
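The dilution in this step follows from C1V1 = C2V2. As a rough sketch only, assuming OD600 scales linearly with dilution and that 5 mL is the final volume of the mating mixture (neither is stated explicitly in the protocol), the volume of each OD600 1.0 culture can be estimated as follows. The function name `volume_needed_ul` is hypothetical.

```python
def volume_needed_ul(stock_od: float, target_od: float, final_volume_ml: float) -> float:
    """Return the stock culture volume (in microliters) that gives target_od
    in final_volume_ml, using C1 * V1 = C2 * V2.

    Assumption: OD600 is linear with cell density over this range.
    """
    if stock_od <= 0 or final_volume_ml <= 0:
        raise ValueError("stock_od and final_volume_ml must be positive")
    if not 0 <= target_od <= stock_od:
        raise ValueError("target_od must lie between 0 and stock_od")
    return target_od * final_volume_ml / stock_od * 1000.0


# Values taken from the filter method: OD600 1.0 stocks, 0.03 per strain, 5 mL total.
per_strain_ul = volume_needed_ul(stock_od=1.0, target_od=0.03, final_volume_ml=5.0)
cells_ul = 3 * per_strain_ul            # donor + helper + recipient (1:1:1)
buffer_ul = 5.0 * 1000.0 - cells_ul     # 10 mM MgSO4 to bring the mix to 5 mL

print(f"each culture: {per_strain_ul:.0f} uL, MgSO4: {buffer_ul:.0f} uL")
```

Under these assumptions each culture contributes roughly 150 μL; if the cultures were instead added on top of 5 mL of buffer, the final OD600 would be slightly lower.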

2.3.2 Tri-parental mating spotting method

All strains (E. coli HB101, E. coli CC118 and Pseudomonas recipient) were cultured in liquid medium as described above. From an overnight culture, 500 μL was centrifuged for 3 minutes at 12,000 rpm. The supernatant was removed, and the pellet was washed with 500 μL of 10 mM MgSO4 to remove any traces of antibiotics. A mating mixture was created by adding 100 μL of each washed pellet (E. coli HB101: E. coli CC118: Pseudomonas recipient, 1:1:1) to a 1.7 mL Eppendorf tube. Additionally, 100 μL of the re-suspended Pseudomonas pellet was placed in a separate tube. The mixed culture and the Pseudomonas alone were vortexed and centrifuged for 1 minute and 30 seconds at 12,000 rpm. The supernatant was removed from the tubes and the pellets were re-suspended in 10 μL of 10 mM MgSO4. The re-suspended pellets were spotted onto NB agar medium. Cultures were incubated at 30°C for 20-24 hours. Following incubation, the conjugation spots were scraped up with a bent pipet tip and re-suspended in 200 μL of 10 mM MgSO4. These were vortexed and 100 μL was plated onto solid cetrimide agar with 50 μg/mL Kan to select for Pseudomonas transconjugants. The Pseudomonas isolate alone was plated as a control to test whether that isolate was naturally resistant to Kan.

After transposon mutagenesis by either method, the Pseudomonas transconjugants were replica plated onto sensitive P. aeruginosa or other sensitive pathogens as described below.

2.4 Optimization of transposon (Tn) mutagenesis

Strains capable of Tn mutagenesis were used to optimize a large-scale mutant hunt. Two processes had to be optimized: the number of transconjugants per plate and the mutant screen (discussed below). Tn mutagenesis was considered optimized when 25-50 transconjugants grew per cetrimide plate containing 50 µg/mL kanamycin (C+K). This target was set by the needs of the screen: with too few mutants per plate, thousands of plates would need to be screened, whereas too many mutants per plate would mask the loss of inhibition phenotype. Since the genome of most pseudomonads is ~6 Mb and contains around 6000 genes (Gross and Loper 2009, Silby et al. 2011), we screened 200 C+K plates containing 50 transconjugants each, resulting in 10,000 mutants. In theory, this gives an average of more than one Tn insertion per gene.
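The coverage argument above can be made concrete with a simple Poisson estimate. This is a minimal sketch, assuming that insertions are uniformly random and that every gene is an equally sized target; real genes differ in length and essential genes cannot be recovered, so the numbers are only indicative. The function names are hypothetical.

```python
import math


def mean_insertions_per_gene(n_mutants: int, n_genes: int) -> float:
    """Average number of Tn insertions per gene under uniform random insertion."""
    return n_mutants / n_genes


def fraction_of_genes_hit(n_mutants: int, n_genes: int) -> float:
    """Expected fraction of genes with at least one insertion (Poisson model)."""
    return 1.0 - math.exp(-mean_insertions_per_gene(n_mutants, n_genes))


plates = 200             # C+K plates screened per mutant hunt
mutants_per_plate = 50   # upper end of the 25-50 transconjugant target
genes = 6000             # approximate gene count for a ~6 Mb pseudomonad genome

n_mutants = plates * mutants_per_plate
mean_hits = mean_insertions_per_gene(n_mutants, genes)
hit_fraction = fraction_of_genes_hit(n_mutants, genes)

print(f"{n_mutants} mutants, {mean_hits:.2f} insertions per gene on average")
print(f"expected fraction of genes hit at least once: {hit_fraction:.2f}")
```

Under this simplified model the average exceeds one insertion per gene, but a fraction of genes is still expected to be missed in any single hunt, which is one reason repeated hunts can be useful.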

To optimize Tn mutagenesis and achieve 25-50 transconjugants per C+K plate, several experimental parameters were tested individually as well as in combination with one another. Multiple incubation times for overnight cultures of the Pseudomonas isolates and E. coli strains were tested; these times ranged from 15 hours to 30 hours. Environmental pseudomonads were subjected to a heat shock of 30 minutes at 42°C to assist in the uptake of the Tn vector. Incubation times for the conjugation mixture spot on the NB plate were tested, ranging from 20 hours to 30 hours. Serial dilutions of the conjugation mixture spot in 10 mM MgSO4 were performed for all the different growth times. We also experimented with plating different dilutions of strains on the C+K plates. During optimization, only a few C+K plates were worked with at a time. Once a set of experimental parameters resulted in 25-50 transconjugants per plate, the strain was considered fully optimized for Tn mutagenesis.

2.5 Replica plating

Replica plating was used to transfer the transconjugants onto the sensitive strain for co-culture and subsequent mutant screening. C+K plates containing transconjugants were replica plated onto NB plates containing spread-plated sensitive strains using a replicator (Bel-Art™) and an 11-μm-pore filter disk with a 150-mm diameter (Whatman). First, the filter paper was placed on top of the replicator. Then, a C+K plate was overlaid onto the filter paper, leaving the transconjugants on the paper. Finally, an NB plate containing the sensitive strain was inverted and pressed onto the filter paper, thus transferring the transconjugants. A new filter paper was used for every replica plating process. The sensitive strain and the transconjugants were co-cultured overnight and screened for zones of inhibition.

2.6 Optimization of mutant screening

Many environmental Pseudomonas isolates can inhibit the growth of multiple AU strains (Figure 1); however, the zones of inhibition vary between the sensitive strains. Reproducible tight (1 mm zone surrounding the transconjugant) and clear (total inhibition of the sensitive strain) zones of inhibition were needed for mutant screening. Large zones of inhibition can overlap a mutant that has lost the ability to inhibit, and small zones of inhibition result in multiple false positive mutants (Figure 3). To identify a candidate sensitive strain for mutant screening, the environmental Pseudomonas and the AU strains were cultured overnight as described above. Two μL of the Pseudomonas isolates were spotted on 50 μL of spread-plated AU strain on an NB agar plate. After incubation at 30°C for 24 hours followed by incubation at 37°C for a second night, candidate AU strains were chosen that could provide optimal screens. Initially, AU strains were tested for optimization if they had previously been shown to be antagonized (Chatterjee et al. 2017). However, some AU strains from the preliminary data were slow growing or gave very faint, unclear zones of inhibition. Therefore, the other AU strains were tested for optimal zones of inhibition. If none of the sensitive AU strains were ideal candidates, because of growth rates or because the zones of inhibition were too large or too small, then non-disease-causing relatives of the ESKAPE pathogens were tested as described above and optimized for zones of inhibition.

Multiple parameters were tested during the optimization of zone clearing. The parameters were tested individually and in combination with one another. All combinations were tested during Tn mutagenesis and replica plating. The tri-parental mating procedure was followed as described above (Figure 2A, B) and then transconjugants were replica plated onto the tested sensitive strain (Figure 2C, D). Numerous parameters were tested for optimizing replica plating of the transconjugants onto the sensitive strains. Various amounts of the selected sensitive strain (for example, 25 μL, 50 μL and 100 μL) were spread plated, and transconjugants were then replica plated. The sensitive strains were plated on Mueller Hinton (MH) agar (BD Difco) plates as well as NB plates containing 2% agar, 2.5% agar and 3% agar and competed with transconjugants. Furthermore, different growth times for both the transconjugants and AU strains were tested. The zones of inhibition against a sensitive strain were considered optimized when the zones were clear, tight and repeatable. Once both the tri-parental mating and the zones of inhibition were optimized, a large-scale mutant hunt could be performed.

2.7 Scaled up mutant hunt and verification of mutants

Once both Tn mutagenesis and mutant screens were optimized for a strain, a scaled up mutant hunt could be performed. For the Tn mutagenesis, 200 C+K plates were prepared. For replica plating, either 200 NB agar plates or 200 MH agar plates were prepared, depending on the strain used and the mutant screening optimization. Only one environmental Pseudomonas isolate was subjected to a scaled up mutant hunt at a time. First, the optimized Tn mutagenesis protocol for the chosen environmental Pseudomonas was followed as described above. The conjugation mixture was diluted according to the optimized parameters and 50 - 100 µL was spread plated on all 200 C+K plates. After incubating the transconjugants for 24 - 48 hours, all 200 C+K plates were replica plated onto 200 NB or MH plates containing the optimized sensitive strain. The transconjugants and the sensitive strain were co-cultured for the optimized time. Finally, the 200 NB or MH plates containing the transconjugants and the sensitive strain were screened for loss of inhibition mutants. Any transconjugant that exhibited a loss of inhibition of the sensitive strain was marked for verification.

Each potential loss of inhibition (LOI) mutant was verified according to the following procedure. Briefly, mutants were picked from the original C+K plate (prior to replica plating), streaked on a new C+K plate and cultured for 24 hours at 30°C. Single colonies were picked and grown overnight in NB at 30°C; frozen stocks in 15% glycerol were prepared and stored at -80°C. The sensitive strain was also grown overnight. The sensitive strain was spread plated onto an NB plate and 2 μL of the potential LOI mutant was spotted on top. These were again co-cultured for 24 hours at 30°C, followed by 24 hours at 37°C. Finally, the LOI mutants were screened. If a mutant again exhibited a LOI phenotype, it was considered a verified mutant, and the result suggested that the gene region where the Tn is located is involved in the production of an antagonistic factor (Figure 4).

2.8 Antagonistic assay using human pathogens

After transposon mutagenesis was performed and LOI mutants were identified, the wildtype environmental Pseudomonas strains were tested for the ability to inhibit the growth of non-P. aeruginosa human pathogens, including the ESKAPE pathogens. We tested other human pathogens to assess the antagonistic capability of the strains and to gain greater insight into the potential compounds being produced by each strain. The Pseudomonas strains and the pathogens were prepared and grown overnight as described above. Two μL of the Pseudomonas isolates were spotted on 50 μL of spread-plated pathogen on an NB agar plate. After incubation at 30°C for 24 hours followed by incubation at 37°C for 24 hours, the plates were screened for zones of inhibition. Antagonism was defined as a zone of clearing, 1 mm in diameter, around the Pseudomonas isolates.

2.9 Mutant DNA extraction and PCR to identify transposon insertion

Genomic DNA extractions from all mutants were performed using the Wizard Genomic DNA Purification Kit (Promega), following the protocol for gram-negative bacteria.

2.9.1 Linker-mediated (LM) PCR

Two μg of genomic DNA was digested using restriction enzymes PvuII, ScaI, SmaI, and SspI from New England Biolabs per their protocols. The products were purified using the NucleoSpin Gel and PCR Clean-up kit and protocol (Macherey-Nagel). The digested and purified DNA was ligated to 4 μM of annealed linker PCR primers BPHI and BPHII (Table 1) using T4 DNA ligase. The products were again purified using the PCR Clean-up kit and protocol mentioned above. Linker-mediated (LM) PCR was performed in two rounds. LM-PCR I was performed using 2 μL of ligated DNA, 5 μM primer 224 and 5 μM primer pBAM1 3424 Rev (Table 1). PCR conditions for the LM-PCR I reaction were [19 cycles of 10 seconds at 92°C, 60 seconds at 50°C and 90 seconds at 72°C]. LM-PCR II used 1 μL of LM-PCR I product as the template, 5 μM primer 224, and 5 μM primer pBAM1 3373 Rev (Table 1). PCR conditions for the LM-PCR II reaction were [34 cycles of 120 seconds at 92°C, 30 seconds at 55°C and 90 seconds at 72°C]. Sequencing was performed using primers 224 and pBAM1 Rev at the University of Chicago Comprehensive Cancer Center DNA Sequencing and Genotyping facility (Chatterjee et al. 2017, Davis et al. 2017).

2.9.2 Arbitrary PCR

Arbitrary (ARB) PCR (arbitrarily primed polymerase chain reaction) was performed on the mutants in two rounds (Martínez-García et al. 2011, Das et al. 2005). ARB-PCR I was performed using 2 μL of genomic DNA and 5 μM primer ARB6, in combination with 5 μM primer ME-I-extR or 5 μM primer ME-O-extF (Table 1). The conditions for the ARB-PCR I reaction were [5 minutes at 95°C, 6 cycles of 30 seconds at 95°C, 30 seconds at 30°C and 90 seconds at 72°C, 30 cycles of 30 seconds at 95°C, 30 seconds at 45°C, and 90 seconds at 72°C, followed by a final extension of 4 minutes at 72°C]. ARB-PCR II was performed using 1 μL of ARB-PCR I product as the template and 5 μM primer ARB2 in combination with 5 μM primer ME-I-intR or 5 μM primer ME-O-intF (Table 1). The conditions for the ARB-PCR II reaction were [60 seconds at 95°C, 30 cycles of 30 seconds at 95°C, 30 seconds at 52°C and 90 seconds at 72°C, followed by a final extension of 4 minutes at 72°C]. PCR purification was performed on each ARB-PCR II product using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). Samples were sequenced at the University of Chicago Comprehensive Cancer Center DNA Sequencing and Genotyping facility using either the ME-I-intR primer or the ME-O-intF primer. All the mutant sequences were aligned to the wild-type genome sequence to determine the location of the transposon insertion.
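The two ARB-PCR cycling programs can be written down as structured data, which makes them easy to compare and to check for total programmed hold time. This is an illustrative sketch only: the `Step` and `Stage` classes and the helper names are hypothetical, and the totals exclude ramp time between temperatures, which depends on the thermocycler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    seconds: int   # hold time
    temp_c: int    # block temperature in degrees Celsius


@dataclass(frozen=True)
class Stage:
    steps: Tuple[Step, ...]
    cycles: int = 1


def amplification(cycles: int, anneal_c: int) -> Stage:
    """Denature 30 s at 95 C, anneal 30 s at anneal_c, extend 90 s at 72 C."""
    return Stage((Step(30, 95), Step(30, anneal_c), Step(90, 72)), cycles)


def hold_time_seconds(program: Tuple[Stage, ...]) -> int:
    """Sum of all programmed hold times; ramp time is not included."""
    return sum(st.cycles * sum(s.seconds for s in st.steps) for st in program)


# Conditions as given in the text for each round.
ARB_PCR_I = (
    Stage((Step(300, 95),)),     # initial denaturation, 5 minutes
    amplification(6, 30),        # low-stringency cycles
    amplification(30, 45),
    Stage((Step(240, 72),)),     # final extension, 4 minutes
)
ARB_PCR_II = (
    Stage((Step(60, 95),)),
    amplification(30, 52),
    Stage((Step(240, 72),)),
)

for name, program in (("ARB-PCR I", ARB_PCR_I), ("ARB-PCR II", ARB_PCR_II)):
    print(f"{name}: {hold_time_seconds(program) / 60:.0f} min of programmed holds")
```

Summing both rounds gives about three hours of hold time, which fits with ARB-PCR being completed within one working day.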

2.10 Wildtype DNA extraction and genome sequencing

Genomic DNA of wildtype Pseudomonas strains was extracted using the Wizard Genomic DNA Purification Kit (Promega). The purified genomic DNA was sent to the University of Delaware DNA Sequencing & Genotyping Center, where PacBio sequencing was performed as described by Chin et al. (2013) and Davis et al. (2017). After sequencing was completed, each genome was submitted to antiSMASH (antibiotic and Secondary Metabolite Analysis SHell) for analysis of secondary metabolites and to the Joint Genome Institute (JGI) genome portal for annotation and further analysis (Hadjithomas et al. 2015, Markowitz et al. 2014, Weber et al. 2015).


CHAPTER III. RESULTS

To characterize gene clusters involved in the production of antibiotic compounds, the loci first had to be identified using transposon (Tn) mutagenesis (Figure 2). To do this, conjugations (Figure 2A), transposon mutagenesis (Figure 2B) and mutant screens (Figure 2C-D) had to be optimized for each strain in order to perform a large-scale mutant hunt. Prior to optimization, the strains were tested for the ability to undergo conjugation and Tn mutagenesis. The results for each strain tested, the optimization processes, the mutant hunts, and the identification and characterization of the gene clusters are described below.

3.1 Strains tested for the ability to undergo Tn mutagenesis

A total of thirty environmental Pseudomonas strains (from Figure 1) were tested for their ability to undergo Tn mutagenesis (Table 2). Of these, sixteen were water strains and fourteen were soil strains. In total, fifteen strains were capable of undergoing Tn mutagenesis (Table 2). Eight of the strains, five water and three soil, were tested with both Tn mutagenesis methods: filter and spotting (Figure 2A). Both methods yielded similar numbers of transconjugants. Because the filter method was more labor intensive, the remaining strains were tested with the spotting method only.

The results summarized in Table 2 show that seven water strains (two from population 2, one from population 5, two from population 6, one found between populations 6 and 7, and one from population 9) were capable of Tn mutagenesis (Figure 2A-B). Similarly, eight soil strains (one from population 2, three from population 9, one from population 10, one from population 11, one from population 13, and one that is not represented on the phylogenetic tree) were capable of Tn mutagenesis. The strains that were not capable of Tn mutagenesis or that were naturally resistant to Kan were distributed throughout the populations (Table 2). These results are interesting because strains that are considered genetically similar, based on relatedness in the phylogenetic tree, were found to differ in their susceptibility to Tn mutagenesis. For example, the strains LE5C2, LE5E2, S3E7, and S4F3 are all found in population 2 (Figure 1). Both water strains (designated LE) are capable of undergoing Tn mutagenesis, while only one soil strain (designated S), S3E7, can undergo Tn mutagenesis. Not only did the results show which strains could be optimized, but they also support other findings that there is genetic diversity among environmental Pseudomonas, even among closely related strains.

3.2 Optimization of Tn mutagenesis

Strains capable of conjugation and Tn mutagenesis were optimized using the Tn mutagenesis spotting method (Figure 2A). A strain was considered optimized when 25-50 transconjugants consistently grew after Tn mutagenesis was repeated at least three times (Figure 2B). Optimization was attempted with strains LE6C8 and LE5B11 (Table 2) but was not pursued because they were located adjacent to the strain LE6C6 on the phylogenetic tree (Figure 1) and both were found to possess the thioquinolobactin synthetase gene involved in the antagonistic activity of strain LE6C6. These results suggest that LE6C8 and LE5B11 encode the same antagonistic factor previously identified in LE6C6 (Chatterjee et al. 2017).

Optimization was also started with strains S3F11 and LE5E2 but conjugation results were inconsistent; sometimes 25-50 transconjugants were selected for on the cetrimide with kanamycin (C+K) plate and other times over 100 transconjugants were present when the same parameters were used. Therefore, optimization was stopped for both strains. To increase the chances of finding diverse genes encoding novel compounds, strains were chosen from different populations. Consequently, strains that were genetically similar to already optimized strains, based upon the gyrB gene sequence, were not pursued. For instance, LE5C2 was one of the optimized strains (discussed below), and since LE5E2 is closely related to it, optimization for LE5E2 was not pursued further. Similarly, optimization for strains S3E2 and S3E7 was halted because of their genetic similarity to the optimized strain S4G9 (Figure 1 and Table 2). S4F6 was tested for conjugation ability and Tn mutagenesis but was used in a different study, so optimization was no longer pursued for this project.

Out of the fifteen strains capable of undergoing conjugation and Tn mutagenesis, eight strains were optimized. These strains and their optimized Tn mutagenesis parameters are listed in Table 3. The number of transconjugants per plate varied, with some strains yielding fewer than 20 transconjugants while others yielded up to 60. Although there was some variation in the optimized parameters, all Tn mutagenesis protocols resulted in approximately 25-50 transconjugants on a C+K plate. Optimization was validated when the same number of transconjugants was obtained in three conjugation trials.

The overnight cultures of the recipient (environmental pseudomonad), donor strain (E. coli CC118 (pBAM1)) and helper strain (E. coli HB101 (pRK600)) used in conjugation were cultured for the same length of time (Table 3). Once this growth time was determined for a strain, it was held constant to avoid variation in the number of transconjugants. Heat shock was tested and was retained only if it produced a significant change in the number of transconjugants per plate. All strains, except S3E10 and S3G10, required heat shock to increase Tn mutagenesis efficiency (Table 3). Diversity of the strains again became apparent when comparing Tn mutagenesis efficiencies based upon the different dilutions of the re-suspended spots and the different amounts spread plated. Some strains, such as LE6G11, had to be diluted 1:5500 to yield 10-35 transconjugants, while LE5C2 was only diluted 1:100. These results further support the genetic diversity suggested by the gyrB tree (Figure 1).

3.3 Optimization of the mutant screen

Each strain optimized for conjugation and Tn mutagenesis (Table 3) also required optimization for a mutant screen (Figure 2C-D). Each strain was replica plated onto the AU strains it could inhibit (Figure 1) and the screen was optimized for a clear (no growth of the AU strain around the transconjugant) and tight (inhibition of approximately 1 mm around the transconjugant) zone of inhibition. Examples of the different zone of inhibition phenotypes encountered during the optimization process are found in Figure 3. Optimization was initiated for strains LE6G8, S3G10, and S3H3 but all were eventually discontinued. When replica plating LE6G8 and S3H3 transconjugants from C+K plates (Figure 2B) onto NB plates (Figure 2C) containing the sensitive strains, additional smaller colonies, possibly contaminants, appeared on the NB plates after incubation (Figure 2D); these colonies had the same phenotype as the transconjugants but were smaller. The smaller colonies were not growing on the C+K plates (Figure 2B) prior to or after replica plating (Figure 2C). We hypothesized that these smaller colonies were dormant E. coli cells that could not grow on the C+K plates but could grow again when replica plated onto NB. To test this hypothesis, we subjected the smaller colonies to 16S PCR and sequencing. The colonies were determined to be the recipient Pseudomonas isolates and not contaminants. These results suggest that the smaller colonies did not take up the transposon, resulting in a lack of growth on the C+K plates. However, they were able to survive on the C+K plate such that they grew on the NB plates when replica plated. Because these smaller colonies were so abundant during Tn mutagenesis and replica plating, optimization of both strains, LE6G8 and S3H3, was no longer pursued. The zones of inhibition produced by S3G10 were tight but not clear. Even after different optimization parameters were tried, it would have been difficult to identify loss of inhibition mutants. Each of these strains could be worked with further and optimized, but due to time constraints it was decided to set them aside for potential future work. Strains S4G9, LE6C9, LE6G11, LE5C2, and S3E10 were all optimized and the parameters for a large-scale mutant hunt are listed in Tables 3 and 4.

Overall the Tn mutagenesis process is described in Figure 2.

3.4 Optimization and mutant hunt results

3.4.1 Strain S4G9

For strain S4G9, zones of inhibition were optimized using three separate sensitive strains. Initially, optimization began with the AU sensitive strains, but the zones of inhibition were so large that it was difficult to determine the boundaries of two distinct zones of inhibition (such as in Figure 3C). To decrease the zones of inhibition, the concentration of agar in the NB plates was increased, but this did not reduce the zone diameter; instead, the zones of inhibition became more opaque and difficult to detect. Alternatively, a sensitive environmental Pseudomonas strain, S3E3, was used to optimize the S4G9 mutant hunt screen (Figure 2D). However, when performing the scaled up mutant hunt, S3E3 proved not to be an ideal strain since its mucoid phenotype caused difficulties in screening the transconjugants. No LOI mutants were found during the mutant hunt.

Next, another strain was optimized for a S4G9 mutant hunt. An antagonistic assay of S4G9 against the safe ESKAPE relatives showed growth inhibition of Lysobacter antibioticus and Mycobacterium smegmatis. The zones of inhibition of L. antibioticus were optimized (such as in Figure 3A) and three mutant hunts were performed. This resulted in the identification of three LOI mutants: S4G9-13, S4G9-35 and S4G9-39. Each mutant was named after the number of the plate (1-200) on which it was found during the mutant hunt. Since three mutant hunts resulted in only three mutants, it was decided to revisit the AU strains and try to optimize another isolate that was initially found to produce large zones of inhibition (such as in Figure 3C).

Eventually S4G9 was optimized with the strain AU12176 for the mutant hunt. After trying different media for growth of the sensitive strains, it was determined that replica plating transconjugants after 24 hours of growth resulted in clear and tight zones of inhibition (such as in Figure 3D) for screening (Figure 2D). Additionally, the sensitive strain had to grow at 37°C for an added 24 hours. The final mutant hunt was performed with AU12176 as the sensitive strain and seven mutants were identified and verified: S4G9-11, S4G9-51, S4G9-74, S4G9-86.2, S4G9-133, S4G9-154 and S4G9-156.

3.4.2 Strain LE6G11

The optimization process of LE6G11 was not as difficult as with S4G9. LE6G11 did not give clear and tight zones of inhibition against the AU strains, so it was decided to test the safe ESKAPE relatives for the inhibition of growth. This resulted in the inhibition of S. epidermidis, E. coli, E. aerogenes, A. baylyi and B. subtilis. B. subtilis was optimized as a sensitive strain (Tables 3 and 4). Three mutant hunts were performed on LE6G11 using B. subtilis. No LOI mutants were found during these mutant hunts. Since optimizing the zones of inhibition of LE6G11 against any of the AU strains was difficult, and optimizing against the other safe ESKAPE relatives was proving to be a challenge, it was decided to pursue other environmental strains for Tn mutagenesis.

3.4.3 Strain LE6C9

Strain LE6C9 inhibited the growth of six AU strains (Figure 1). After testing all six, strain AU17108 became the focus of the optimization since its zones of inhibition were clear and tight (Figure 3D). When replica plating to the sensitive strain on NB agar medium the zones were clear; however, spread plating the sensitive strain on MH agar medium (Figure 2C) resulted in an even more optimal zone of inhibition (Figure 2D). While optimizing the zones of inhibition on the MH plates using AU17108, three LOI mutants were identified: LE6C9-1, LE6C9-3 and LE6C9-4. In this instance, the number in each name reflects the order in which the mutant was identified. After optimization, one mutant hunt was performed on LE6C9 with the parameters in Tables 3 and 4. Six LOI mutants were identified and verified: LE6C9-79, LE6C9-95, LE6C9-199, LE6C9-209, LE6C9-237.1 and LE6C9-237.2. Each was named after the plate from which it was isolated during the large-scale mutant hunt.

3.4.4 Strain LE5C2

The strain LE5C2 inhibited the growth of two AU strains (Figure 1). After testing additional AU strains, eight were found to be inhibited by LE5C2, and clear and tight zones of inhibition were observed with strains AU10014 and AU12176. Strain AU10014 gave a more defined tight zone of inhibition, so this isolate was used for optimization. Two LOI mutants were identified during the optimization process and named LE5C2-4 and LE5C2-5. One mutant hunt was performed with LE5C2 following the optimized parameters (Tables 3 and 4). Sixteen additional LOI mutants were identified and verified during the mutant hunt: LE5C2-7, LE5C2-12.1, LE5C2-12.2, LE5C2-25, LE5C2-27, LE5C2-39, LE5C2-41, LE5C2-48, LE5C2-75, LE5C2-80, LE5C2-82, LE5C2-117, LE5C2-139, LE5C2-156, LE5C2-178 and LE5C2-181.

3.4.5 Strain S3E10

S3E10 inhibited the growth of one AU strain according to Chatterjee et al. (2017) (Figure 1). After testing the AU strains again with a spot assay, S3E10 was found to inhibit nine AU strains. Optimization was focused on the AU strains 9276, 12176 and 17659. The mucoid phenotype of AU9276 resulted in poor screening because the biofilm produced interfered with the zones of inhibition. The zones of inhibition of AU12176 and AU17659 were similar on NB, but tight zones of inhibition were obtained when AU17659 was grown on MH agar medium. One mutant hunt was performed on S3E10 with the sensitive strain AU17659 using the optimized parameters (Tables 3 and 4). Seventeen LOI mutants were identified during the mutant hunt: S3E10-8, S3E10-22, S3E10-33, S3E10-37, S3E10-42.2, S3E10-107, S3E10-113, S3E10-129, S3E10-134, S3E10-139, S3E10-163, S3E10-172, S3E10-175, S3E10-183, S3E10-184, S3E10-186, and S3E10-200.

3.5 Antagonistic assay against human pathogens

In addition to testing for antagonistic activity against the AU P. aeruginosa CF-derived strains, we also determined inhibition against other human pathogens with the environmental strains in which LOI mutants were identified. Most of the pathogens are on the ‘List of bacteria for which new antibiotics are urgently needed’ by the WHO (Lawe-Davies and Bennett 2017).

Based upon the antagonistic assay S4G9 and LE6C9 could inhibit two and three of the pathogens, respectively (Table 5). These antagonistic results suggest that these pseudomonads are producing broad spectrum compounds capable of inhibiting the growth of gram-positive and gram-negative pathogens.

3.6 PCR and sequencing results

Linker-mediated (LM) PCR was initially used to amplify the regions flanking the transposon within each mutant. While amplification of the targeted DNA region was successful for the water strains, it was not for the soil isolates. After NCBI nucleotide BLAST analysis, the top hits for most LE6C9 mutants were non-ribosomal peptide synthetases (Table 7), but the top hit for S4G9-39 was the pBAM1 cloning vector (Table 6). In other words, when LM-PCR was used on the soil strain, only the transposon was amplified and sequenced. While this occurred for mutant S4G9-39, the other S4G9 mutants showed no amplification when subjected to LM-PCR.

As an alternative method to LM-PCR, arbitrary (ARB) PCR was utilized (Martínez-García et al. 2011) to amplify genomic DNA flanking the transposon insertion in the mutant (see Table 1 for ARB PCR primers). ARB-PCR was successful for both S4G9 and LE6C9 mutants and was used for the remaining LOI mutants (Tables 6-7), because it consistently amplified the regions flanking the transposon in both soil and water strains. Additionally, ARB-PCR was less labor intensive; it took one day to perform while LM-PCR took a minimum of two days.

When performing ARB-PCR, one amplification was performed with the ME-O primers and one with the ME-I primers (Figure 5). The two primer sets were used to amplify the upstream and downstream flanking regions of the Tn insertion site. If both sets of primers amplified genomic DNA of an LOI mutant, then all products were sequenced. If only one primer set produced an amplicon, then that product was sequenced. In some instances, both sets of primers failed to amplify a product or sequencing quality was poor, resulting in no sequencing results or analysis.
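The sequencing decision described above can be written as a small rule. The following is a minimal sketch only; the function name and the boolean inputs are hypothetical and are not taken from the protocol itself.

```python
def products_to_sequence(me_o_amplified: bool, me_i_amplified: bool) -> list[str]:
    """Return the ARB-PCR products to send for sequencing for one LOI mutant.

    Hypothetical helper: each flag records whether the corresponding
    primer set (ME-O or ME-I) produced an amplicon for this mutant.
    """
    products = []
    if me_o_amplified:
        products.append("ME-O")
    if me_i_amplified:
        products.append("ME-I")
    # An empty list means neither primer set amplified, so no
    # sequencing result is available for this mutant.
    return products
```

A mutant for which the function returns an empty list corresponds to the cases above in which no sequencing result or analysis could be obtained.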

The PCR method and the top NCBI nucleotide BLAST hit for each LOI mutant can be found in Tables 6 and 7 for the S4G9 and LE6C9 strains, respectively. Only ARB-PCR was performed for the mutants derived from strains LE5C2 and S3E10 (Tables 8 and 9, respectively). Interestingly, the BLAST results did not identify the same gene within all mutants. Instead, a wide variety of potential gene products was identified among the mutants. As shown in Tables 6-9, when comparing the mutants from one strain, some gene products were similar and some differed. For example, in strain LE5C2 (Table 8), the gene products of mutants LE5C2-4 and LE5C2-5 were identified as putative non-ribosomal peptide synthetases, whereas the gene products of mutants LE5C2-25 and LE5C2-27 were predicted L-lactate permeases. Thus, BLAST analyses gave insight into the gene disrupted; however, these results provided only a partial understanding of the genes involved in antibiotic production because they depend on sequences in the NCBI database. To gain a better understanding of loci that encode antagonistic factors, the full genome of each wild-type strain was sequenced.

3.7 Analysis of genomes and biosynthetic gene clusters (BGCs)

3.7.1 Alignment of mutants to wild type genomes

To identify the location of each Tn insertion in the LOI mutant genomes, the DNA sequence flanking the transposon insertion site was aligned to the wild-type whole genome sequence using an NCBI nucleotide BLAST analysis. Knowing the Tn insertion site gave insight into the metabolite produced by the Pseudomonas strain. Table 10 lists the LOI mutants and the coordinates of the wild-type genome that aligned with the flanking DNA sequence. When the Tn insertion sites of different mutants from one strain were compared, many had the Tn inserted into the same gene region: either the Tn was inserted into the same gene or it was found in proximal loci, upstream or downstream. Other mutants had Tn insertions in distal genes, suggesting that other regions contribute to production of the relevant compound. For instance, the Tn insertions of mutants S4G9-11 and S4G9-86.2 were located within the same genomic region. However, mutants S4G9-133 and S4G9-156 aligned to different regions of the genome, almost 2 Mb away (Table 10). Similar results occurred with the other environmental strains and their LOI mutants (discussed below).
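To illustrate how insertion coordinates can be grouped into proximal and distal loci, the following minimal sketch clusters insertions by genomic position. The mutant names, coordinates, and the 100 kb gap are invented for illustration and are not values from Table 10; the sketch also treats the genome as linear and ignores wrap-around on a circular chromosome.

```python
def group_by_proximity(insertions: dict[str, int], max_gap: int) -> list[list[str]]:
    """Group mutants whose Tn insertion coordinates lie close together.

    insertions maps a mutant name to the wild-type genome coordinate of
    its insertion. Consecutive insertions (after sorting by coordinate)
    that are at most max_gap bp apart are placed in the same group.
    """
    ordered = sorted(insertions.items(), key=lambda item: item[1])
    groups: list[list[str]] = []
    previous_position = None
    for name, position in ordered:
        if previous_position is None or position - previous_position > max_gap:
            groups.append([name])      # start a new genomic region
        else:
            groups[-1].append(name)    # same region as the previous insertion
        previous_position = position
    return groups


# Hypothetical coordinates (bp), for illustration only.
example = {
    "mutant_1": 1_200_000,
    "mutant_2": 1_204_500,
    "mutant_3": 3_150_000,
    "mutant_4": 3_152_000,
}
regions = group_by_proximity(example, max_gap=100_000)
```

With these invented inputs, the first two mutants fall in one region and the last two fall in a second region roughly 2 Mb away, which mirrors the kind of pattern described above.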


3.7.2 Wildtype whole genome annotation results

The genome sequences of S4G9, LE6C9, LE5C2 and S3E10 were annotated by JGI IMG/M (Integrated Microbial Genomes and Microbiomes). JGI IMG/M provided statistics on the genomes and gave comparative measures such as protein coding regions, RNA genes, and biosynthetic gene clusters (Table 11). The genome sizes of S4G9, LE6C9, LE5C2 and S3E10 ranged from 5.4-6.9 Mb, which is consistent with well-characterized Pseudomonas genomes (Spiers et al. 2000). Protein coding genes made up about 95-97 percent of all the genes in each genome. Interestingly, the soil strains tended to have similar percentages of total genes when compared to one another but not to the water strains, and the water strains likewise had percentages similar to one another. For example, RNA genes other than tRNA or rRNA genes made up about three percent of the total genes in the soil strains, but only one percent in the water strains (Table 11). This may reflect adaptation to particular environments, whereby some genes needed by soil strains may not be selected for in water isolates. An important statistic provided by JGI for our research was the number of biosynthetic gene clusters (BGCs) in the genomes. BGCs are known to encode a variety of secondary metabolites, which are involved in antagonistic activity and are a source of novel antibiotics (Chatterjee et al. 2017, Weber et al. 2015). The number of BGCs ranged from three gene clusters in LE5C2 to thirteen in LE6C9 (Tables 11-12).

3.7.3 Identification and characterization of biosynthetic gene clusters (BGCs)

For each annotated wildtype strain, S4G9, LE6C9, LE5C2 and S3E10, BGCs were identified by JGI IMG/M and JGI IMG/ABC (Atlas for Biosynthetic Gene Clusters). Each BGC was also verified using antiSMASH, a program that identifies gene clusters involved in secondary metabolite production (Weber et al. 2015). To identify whether the BGCs might be involved in the production of antagonistic factors, we determined whether the coordinates of the Tn mutants (Table 10) were within BGCs (Table 12). Interestingly, each strain had mutants with Tn insertions in a BGC, suggesting that the particular gene region plays a role in the production of an antagonistic factor. Table 12 shows all the BGCs identified within each individual wild-type strain and the Tn insertions in these regions.
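The comparison of Tn insertion coordinates with BGC boundaries amounts to an interval lookup. The sketch below shows one way this check could be expressed; the cluster names and coordinates are hypothetical placeholders, not the JGI cluster IDs or the coordinates reported in Tables 10 and 12.

```python
from typing import NamedTuple, Optional


class BGC(NamedTuple):
    """A biosynthetic gene cluster described by its genome coordinates (bp)."""
    cluster_id: str
    start: int
    end: int


def find_bgc(position: int, clusters: list[BGC]) -> Optional[str]:
    """Return the ID of the first BGC that contains the insertion, or None."""
    for cluster in clusters:
        if cluster.start <= position <= cluster.end:
            return cluster.cluster_id
    return None


# Hypothetical clusters, for illustration only.
clusters = [
    BGC("BGC_A", start=1_180_000, end=1_282_000),
    BGC("BGC_B", start=3_120_000, end=3_173_000),
]
hit = find_bgc(1_204_500, clusters)   # insertion inside BGC_A
miss = find_bgc(2_000_000, clusters)  # insertion outside any BGC
```

An insertion that maps to no cluster (a `None` result) would correspond to a mutant whose disrupted gene lies outside the predicted BGCs.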

Intriguingly, each strain had Tn hits within two separate BGCs, suggesting that genes within different BGCs are necessary for the production of a secondary metabolite. For instance, S4G9 had two mutants with Tn insertions within a 102 kb BGC and two mutants with Tn insertions within a 53 kb BGC. LE6C9 had five mutants with Tn insertions within a 23 kb BGC and three mutants with Tn insertions within a 50 kb BGC. LE5C2 had three mutants with Tn insertions within a 53 kb BGC and five mutants with Tn insertions within a 79 kb BGC. Finally, S3E10 had three mutants with Tn insertions within a 53 kb BGC and four mutants with Tn insertions within a 112 kb BGC (Table 12).

Non-ribosomal peptide synthetases (NRPSs) are multi-domain enzyme complexes responsible for the production of non-ribosomal peptides, which are the core components of peptide antibiotics (Gross and Loper 2009). All but one of the BGCs discussed above encoded a putative NRPS (Table 12). Furthermore, Tn disruptions consistently appeared in NRPS TIGR01720/amino acid adenylation domain-containing proteins. The Tn disruptions in these domains are significant because the amino acid adenylation domain-containing protein catalyzes the first enzymatic step in the production of a non-ribosomal peptide. Additionally, the other domains that are required for the production of a non-ribosomal peptide are also found in the BGCs. These domains include thiolation and thioesterase domains (Felnagle et al. 2008, Sieber and Marahiel 2005). Although most BGCs identified encoded a NRPS, these enzyme complexes are known to produce diverse compounds. This diversity is apparent from the sizes of the gene clusters as well as the non-homologous ORFs between the BGCs (discussed below).

3.7.4 S4G9 gene clusters

S4G9 encodes a total of nine BGCs, two of which had more than one Tn insertion (Table 12). Although both BGCs were predicted to encode a NRPS, many genes within the clusters are non-homologous. These differences are evident from the sizes of the gene clusters as well as the genes and predicted products (Figure 6 and Tables 13-14). The BGC 161816947 (JGI ID number) had Tn insertions within open reading frames (ORFs) 20 and 22 (Figure 6A). LOI mutant S4G9-11 had the Tn insertion within gene 20 and S4G9-86.2 had the Tn inserted in gene 22 of this gene cluster. The putative products of ORFs 20 and 22 were both NRPS amino acid adenylation domain-containing proteins (Table 13). The BGC 161816952 only had Tn insertions in gene 22 (Figure 6B), which also encoded a predicted NRPS amino acid adenylation domain-containing protein (Table 14). Both BGCs 161816947 and 161816952 appear to encode many enzymes that may be involved in the production of non-ribosomal peptides. Additionally, genes encoding proteins and receptors for iron uptake are found in the gene clusters, suggesting the NRPS might produce a siderophore (Tables 13-14). Further analysis and testing would have to be performed to verify these results.

3.7.5 LE6C9 gene clusters

LE6C9 encodes a total of 13 BGCs (Table 12). One of the BGCs had five Tn insertions and another gene cluster had three (Figure 7). The BGC 161816930 was predicted to encode a phenazine. Phenazines are nitrogen-containing heterocyclic compounds that are known to have broad-spectrum antibiotic properties and to be involved in virulence (Pierson and Pierson 2010). This cluster was 23 kb in size with 25 ORFs. Within this cluster, ORFs 16 and 20 had one and four Tn insertions, respectively. The Tn insertion in ORF 16 corresponds to mutant LE6C9-1; the Tn insertions within gene 20 correspond to mutants LE6C9_3, LE6C9-209, LE6C9-237.1 and LE6C9-237.2 (Figure 7A). ORFs 16 and 20 encode a putative phenazine biosynthesis protein A/B and a phenazine biosynthesis protein phzE, respectively (Table 15). Additionally, multiple genes are predicted to encode secretion pathway proteins, suggesting that the compound is secreted (Table 15).

The putative product of BGC 161816936 was a NRPS. This cluster was 50 kb in size with 33 ORFs. Within this cluster, three Tn insertions occurred within three different genes (Figure 7B). The LOI mutant LE6C9-79 had the Tn insertion within ORF 11, which is predicted to encode a condensation domain-containing protein (Table 16). The mutant LE6C9-95 had a Tn insertion in ORF 16, which is predicted to encode an amino acid adenylation domain-containing protein, and the mutant LE6C9-199 had the Tn insertion within ORF 20, which is predicted to encode a condensation domain-containing protein (Table 16). These results suggest that multiple genes in the BGC play an important role in producing a functional non-ribosomal peptide. Additionally, this gene cluster could also produce a siderophore for nickel uptake, since ORFs 1 and 2 are predicted to encode a nickel transport system and ATP binding domains, but further analysis would have to be performed to verify these findings.

3.7.6 LE5C2 gene clusters

A total of three putative BGCs were identified in LE5C2. Of these, two were mutated by Tn insertions (Table 12). The gene cluster 161819466 was 53 kb in size with 41 ORFs and was predicted to encode a NRPS. Three Tn hits occurred within this BGC (Figure 8A). ORF 23 was disrupted by a Tn in the LOI mutants LE5C2-5 and LE5C2-12.1. The putative product of ORF 23 was a NRPS amino acid adenylation domain-containing protein. ORF 24 was disrupted by a Tn in the LOI mutant LE5C2-48. The putative product of ORF 24 was a hypothetical protein (Table 17). These genes are characteristic of an NRPS. Additionally, ORFs 26 and 27 have putative products of macrolide efflux systems. Because macrolides are a class of antibiotics, this suggests that LE5C2 may be capable of producing a macrolide-like compound (Table 17).

The gene cluster 161819467 was 79 kb in size with 53 ORFs and was predicted to encode a bacteriocin/NRPS (Table 12). A total of five mutants had Tn insertions in three genes (Figure 8B). The LOI mutants LE5C2-4, LE5C2-39 and LE5C2-75 all had a Tn insertion within ORF 20. ORF 20 encoded a putative amino acid adenylation domain-containing protein (Table 18). Disruption of ORF 23 occurred in LOI mutant LE5C2-139; this ORF was also predicted to encode an amino acid adenylation domain-containing protein. In LOI mutant LE5C2-178, the Tn inserted into ORF 32, which was predicted to encode a Zn-dependent dipeptidase (Table 18). Along with these Tn-disrupted genes, other ORFs coding for iron uptake were located within the BGC, suggesting that this locus might produce an NRPS-derived siderophore.

3.7.7 S3E10 gene cluster

The strain S3E10 encodes a total of eight BGCs (Table 12). Two of the gene clusters had Tn disruptions within mutants, and both BGCs encoded a putative NRPS. Again, even though these are putative NRPSs, the products are expected to be dissimilar because of the differences in BGC size and the number of non-homologous genes between the clusters. The S3E10 cluster 161835842 is 52 kb in size with 42 ORFs. Three LOI mutants, S3E10-8, S3E10-172, and S3E10-183, had Tn insertions in ORF 22, which is predicted to encode a putative NRPS amino acid adenylation domain-containing protein (Figure 9A, Table 19).

The gene cluster 161835844 was 112 kb with 51 ORFs (Table 12). Four LOI mutants had Tn insertions within this gene cluster (Figure 9B). LOI mutants S3E10-33, S3E10-163 and S3E10-175 had Tn insertions in ORF 29. Mutant S3E10-37 had a Tn insertion within ORF 31. ORFs 29 and 31 both encoded putative NRPS amino acid adenylation domain-containing proteins (Table 20). Both of these gene clusters contain predicted products that are characteristic of a NRPS. Further analysis will be needed to determine the structure of the NRPS products.

3.7.8 Comparison of gene clusters

A strength of our population-level approach is that genetic, ecological, and functional data (Figure 1) can be applied to optimize the chances of identifying diverse BGCs that encode dissimilar antagonistic products. Since all but one BGC with Tn insertions encoded a putative NRPS (Table 12), we wanted to determine how similar or different the gene clusters and potential products were to one another. To do this, we created a heat map that displays all seven NRPS BGCs and compares the predicted protein domains within each cluster (Figure 10). From these results, we observed that although there are similarities between predicted proteins within each cluster, there are extensive differences, and no BGCs are identical. BGCs within S3E10 and S4G9 are more similar to one another than to those of other strains, while the LE6C9 and LE5C2 BGCs group together and are more similar to one another than to those of the soil strains. Altogether, the results suggest that these strains, for which we have identified LOI Tn mutants, produce different compounds, and they offer a proof of concept of the utility of the population-level approach for diverse compound discovery.
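One simple way to quantify how similar two clusters are in domain content is a Jaccard index over their sets of predicted protein domains. The sketch below is an illustration under stated assumptions: the text does not specify the metric behind the heat map in Figure 10, so the Jaccard index is a stand-in, and the cluster names and domain sets are invented.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared domains divided by all domains in either set."""
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def pairwise_similarity(domains_by_cluster: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the Jaccard similarity for every pair of clusters."""
    names = sorted(domains_by_cluster)
    return {
        (x, y): jaccard(domains_by_cluster[x], domains_by_cluster[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical domain annotations, for illustration only.
example_domains = {
    "cluster_1": {"adenylation", "thiolation", "condensation"},
    "cluster_2": {"adenylation", "thiolation", "thioesterase"},
    "cluster_3": {"adenylation", "thiolation", "condensation"},
}
similarity = pairwise_similarity(example_domains)
```

A matrix of such pairwise values could then be displayed as a heat map; a value of 1.0 would indicate identical domain content, while lower values indicate increasingly dissimilar clusters.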

3.7.9 Average nucleotide identity (ANI)

To identify the species of each strain, we performed an Average Nucleotide Identity (ANI) analysis with S4G9, LE6C9, LE5C2 and S3E10. ANI was performed against characterized Pseudomonas strains of known species, as described by Silby et al. (2011), in addition to over 340 Pseudomonas genomes in the JGI database. ANI results are summarized in Table 21, which shows only the characterized Pseudomonas strains and the top ANI percentages for each isolate used within this study. An ANI over 95% with another genome suggests that the two strains are of the same species (Zhang et al. 2014). An ANI less than 95% suggests that the strains are different species.

The only strain with an ANI over 95% was LE6C9, with a species match to Pseudomonas chlororaphis. The highest ANI was 97.95% to Pseudomonas chlororaphis chlororaphis ATCC 9446 (Table 21). The other strains did not exhibit an ANI over 95%. S3E10 exhibited an ANI of 91.7% to Pseudomonas lurida LMG 21995. This strain is closely related to P. fluorescens SBW25, and the ANI of S3E10 to SBW25 was 91.31% (Table 21). P. lurida, P. azotoformans, P. simiae and P. extremorientalis all group phylogenetically within the P. fluorescens clade, suggesting all are genetically similar to P. fluorescens (Gomila et al. 2015). Although S3E10 is most closely related to P. fluorescens, it is likely not of the same species, having an ANI less than 95%. S4G9 exhibited an ANI of 89.42% to Pseudomonas poae A2-S9. It also had an 89% ANI to P. fluorescens SBW25, P. grimontii BS2976, P. marginalis BS2952, and S3E10 (Table 21). Finally, the LE5C2 ANI results were interesting because it only exhibited 83% ANI to other strains (Table 21). The low ANI percentages of S4G9, S3E10 and LE5C2 to other known strains suggest that our isolates cannot be designated a particular species and may represent new species within the Pseudomonas genus. Moreover, the diverse BGCs of each of these strains (Figure 10) add to the speculation that these loci encode novel antagonistic compounds. Future research will be directed at biochemically characterizing the compounds.
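The species assignment rule used above can be expressed as a threshold test on the top ANI value for each isolate. The following minimal sketch uses the top ANI percentages quoted in this section; the value for LE5C2 is entered as 83.0 because only an approximate figure is given, and the function and variable names are our own.

```python
SPECIES_THRESHOLD = 95.0  # percent ANI; values above this suggest the same species

# Top ANI percentage for each isolate, as quoted in the text (LE5C2 is approximate).
top_ani = {
    "LE6C9": 97.95,
    "S3E10": 91.7,
    "S4G9": 89.42,
    "LE5C2": 83.0,
}


def same_species(ani_percent: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """Return True when the ANI exceeds the species-level threshold."""
    return ani_percent > threshold


assigned = [strain for strain, ani in top_ani.items() if same_species(ani)]
unassigned = [strain for strain, ani in top_ani.items() if not same_species(ani)]
```

Under this rule only LE6C9 receives a species assignment, consistent with the interpretation given above; the threshold is a convention rather than a strict biological boundary.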


CHAPTER IV. DISCUSSION

Based upon previous findings that P. aeruginosa is not abundant in the environment (Deredjian et al. 2014, Khan et al. 2007), we hypothesized that Pseudomonas species outcompete pathogenic P. aeruginosa strains when outside of the CF lung. One factor that may contribute to competition by native soil- and water-derived strains is the production of secondary metabolites and antimicrobial factors. Previously, the Wildschutte lab showed that environmental strains can outcompete CF-derived P. aeruginosa pathogens (Chatterjee et al. 2017). The goal of this study was to identify and characterize gene clusters from diverse environmental Pseudomonas that encode products capable of inhibiting the growth of P. aeruginosa. To achieve this goal, we optimized transposon mutagenesis for large scale mutant hunts, identified LOI mutants, and characterized the genes involved in the antagonistic activity.

We identified four strains of Pseudomonas, S4G9, LE6C9, LE5C2, and S3E10, with the predicted capability to produce secondary metabolites (Table 12). All but one BGC were predicted to encode NRPSs which appear to synthesize unique and diverse compounds. From this work, we present an interdisciplinary approach of wet-bench work combined with bioinformatics for BGC identification and analysis. Furthermore, we gained insight into the genetic diversity of the strains and population structure of the Pseudomonas strains in our collection. With the knowledge of BGCs involved in the production of antagonistic activity, this research provides a strong foundation for future chemical compound characterization and antibiotic discovery.

4.1 Population-level diversity of Pseudomonas

By isolating environmental strains, genetically characterizing them and competing the environmental pseudomonads against CF-derived P. aeruginosa (Chatterjee et al. 2017), we showed that environmental bacteria inhibit pathogenic P. aeruginosa. To identify inhibitory compounds produced by environmental Pseudomonas strains, we used a population-level approach developed by the Wildschutte lab to categorize distinct strains based upon the gyrB gene, their habitat, and activity (Figure 1). Here, we have shown the success of this methodology.

The structure of the Pseudomonas population was characterized using gyrB gene sequences (Figure 1). This housekeeping gene is more diverse than the more conserved 16S rRNA gene and thus provides a finer-scale phylogenetic tree. Moreover, a 16S analysis would weaken our genetic approach to identifying distinct strains. For instance, S4G9 and S3E10 were within the same population in the phylogenetic tree (Figure 1) based on the gyrB gene; the 16S rRNA gene sequences of these strains are likely to be identical, so one of them might have been removed from the analysis to avoid characterizing clonal isolates. Using the gyrB gene, we determined that related strains, such as S4G9 and S3E10, are distinct, which is further supported by their dissimilar antagonistic profiles (Figure 1). Additionally, the secondary metabolites produced by these strains appear to be unique, suggesting that they produce different inhibitory factors (Table 12). This population-level diversity among Pseudomonas strains has also been observed among marine Vibrio species (Hunt et al. 2008, Preheim et al. 2011, Cordero et al. 2012). Phylogenetic analysis using the hsp60 housekeeping gene has shown similar population-level diversity (Hunt et al. 2008) and distribution among habitats in Vibrio species (Preheim et al. 2011). It is interesting to speculate about whether comparable diversity exists among other bacterial populations. Furthermore, similar antagonistic data (Figure 1) have also been observed among Vibrio species (Cordero et al. 2012, Burks et al. 2017 in press), suggesting that competition is a selective pressure among groups of bacteria. Future inter-population analyses between Vibrio and Pseudomonas can be used to determine if distinct compounds are produced by different genera of bacteria, or if genes encoding beneficial products are horizontally transferred between groups.

Beyond BGC differences, the genomic diversity suggested by the gyrB gene is also supported by differences observed in conjugation and Tn mutagenesis efficiency (Table 2). Strains from the same population varied in their capacity for Tn mutagenesis, and efficiencies differed widely (Table 2). For example, strains LE6G11, S3E2, S3E10 and S4G9 were all capable of Tn mutagenesis and are located in population 9, while LE6G7 is also located in population 9 but could not undergo Tn mutagenesis. Additionally, S3E10 was about twice as efficient at Tn mutagenesis as S4G9 even though they were genetically similar based on the gyrB gene (Table 3). Overall, these results support the prediction made by Chatterjee et al. (2017) that environmental pseudomonads are genetically distinct and produce a variety of secondary metabolites.

4.2 Transposon mutagenesis and bioinformatics

Our approach in this study utilized a combination of wet bench work and bioinformatics which has proven to be useful in identifying and characterizing genes involved in the production of antagonistic factors. Although many steps had to be optimized (Tables 2-4), Tn mutagenesis has proven to be effective in identifying BGCs involved in antimicrobial activities.

Our overall goal was to identify diverse BGCs involved in antagonistic activities. This was accomplished by using the pBAM1 suicide vector to randomly insert a KanR transposon into the genomes of strains (Martínez-García et al. 2011) in order to identify the genes involved in antagonistic activities. Although this may seem simple, many challenges had to be overcome. First, conjugation with environmental strains was difficult, likely due to CRISPR and restriction-modification systems (Vasu and Nagaraja 2013). To improve our chances of success with conjugation and mutagenesis, we used the characterized pBAM1 system, which has been shown to conjugate efficiently in Pseudomonas isolates (Martínez-García et al. 2011). Second, the pBAM1 system still required optimization to identify LOI mutants. Although this was labor intensive, it is a reliable approach to identify BGCs involved in antagonistic activity (Table 12).

Although Tn mutagenesis requires multiple optimization steps, this approach was proven to be an effective tool in identifying genes involved in the production of antimicrobial factors.

With this working system, Tn mutagenesis can be used to identify genes involved in antagonistic activity against other pathogens. For instance, members of the Wildschutte lab have adapted Tn mutagenesis to identify BGCs whose products inhibit oomycete plant pathogens and toxin-producing algae involved in freshwater harmful algal blooms. In other applications, our approach has been adapted for use in the Small World Initiative (SWI) and has been shown to be an effective teaching strategy to help students understand genetics and antibiotic production. For example, Tn mutagenesis was introduced in SWI (http://www.smallworldinitiative.org/) in 2015 at Bowling Green State University, where students learn first-hand about scientific research and the identification of genes involved in antibiotic production (Davis et al. 2017). It is worth noting that SWI was established in 2012 and has traditionally used chemical extraction techniques to isolate antibiotics from strains with antagonistic activity. Despite global efforts involving over 100 universities across 12 countries, no antibiotic has been discovered using a biochemical approach. The Davis et al. (2017) manuscript provides a streamlined approach to antibiotic discovery and is expected to ease the discovery of natural antibiotics through SWI in the near future.

Tn mutagenesis alone is a powerful tool, but without next generation sequencing and bioinformatics, characterizing the genes and potential secondary metabolites responsible for the antagonistic activities would be extremely challenging, if not impossible. For this research, we focused on using antiSMASH and the JGI tools. Both can take genome sequences and identify BGCs that encode secondary metabolites. The antiSMASH program identifies putative biosynthetic gene clusters and protein domains based on similarity to sequences in the antiSMASH database (Weber et al. 2015). The JGI genome portal integrates genomes from all three domains of life. Analysis of a genome through JGI IMG/M and JGI IMG/ABC can be extensive: users can compare genomes to other well characterized genomes, identify BGCs, and analyze genes and functions (Hadjithomas et al. 2015, Markowitz et al. 2013). With the availability of antiSMASH, JGI IMG/M, JGI IMG/ABC and other computational programs, the process of identifying novel secondary metabolites has become user friendly, without the need to design tailored programs for specific needs (Markowitz et al. 2014, Medema and Fischbach 2015, Weber et al. 2015). We have demonstrated the power of combining wet bench work and computational analysis to identify BGCs involved in antibiotic production. This approach has broad potential uses and can be adapted to identify the production of other secondary metabolites.

4.3 Secondary metabolites of Pseudomonas

By utilizing an interdisciplinary approach involving molecular biology, genomics, and bioinformatics, we not only identified genes involved in the production of secondary metabolites, but also identified BGCs that encode antagonistic factors (Table 12). We identified two BGCs within each of the following strains, S4G9 (Tables 13 and 14), LE6C9 (Tables 15 and 16), LE5C2 (Tables 17 and 18), and S3E10 (Tables 19 and 20), yielding a total of eight BGCs with unique putative secondary metabolite products. Interestingly, seven of the eight BGCs were predicted to encode NRPSs, which are diverse multi-domain enzyme complexes responsible for the production of non-ribosomal peptides. NRPSs have been well characterized and produce a variety of compounds that are medically and ecologically relevant (Felnagle et al. 2008, Martínez-Núñez and López y López 2016).

The enzymology of NRPSs is quite interesting and gives rise to diverse groups of compounds. All NRPSs have core domains encoded within the gene clusters, which include an adenylation domain, a thiolation (peptidyl carrier protein) domain, and a condensation domain (Figure 11, Sieber and Marahiel 2004). Additionally, other domains may be present, including an epimerization domain, a methyltransferase domain and a cyclization domain (Felnagle et al. 2008, Tyc et al. 2017). The domains together form modules. The diversity of the compounds produced by NRPSs results from interactions of the domains, the number of modules, and the surrounding coding regions within the gene cluster (Marahiel et al. 1997).
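Checking a cluster for the core NRPS domains named above reduces to a set comparison. The sketch below is illustrative only: the domain labels are simplified strings of our own choosing, not the annotation terms used by JGI IMG/ABC or antiSMASH, and the example annotations are hypothetical.

```python
# Core NRPS domains named in the text, using simplified labels of our own.
CORE_DOMAINS = frozenset({"adenylation", "thiolation", "condensation"})


def missing_core_domains(cluster_domains: list[str]) -> set[str]:
    """Return the core NRPS domains that are absent from a cluster's annotations."""
    return set(CORE_DOMAINS - set(cluster_domains))


# Hypothetical annotations for two clusters, for illustration only.
complete = missing_core_domains(
    ["adenylation", "thiolation", "condensation", "epimerization"]
)
incomplete = missing_core_domains(["adenylation", "methyltransferase"])
```

An empty result indicates that all three core domains are annotated in the cluster; accessory domains such as epimerization or methyltransferase domains do not affect the check.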

Our results show that each of the gene clusters predicted to encode a NRPS encodes all the core domains (Tables 13-20). Furthermore, the diversity of the ORFs within these gene clusters is evident, suggesting that these NRPSs yield different products (Figures 6-9). Additionally, the strains S4G9, LE5C2 and S3E10 had loss of inhibition mutants with transposon disruptions in two NRPS gene clusters, suggesting that both gene clusters are required for the production of the antagonistic compounds. In some bacterial systems, NRPS proteins interact with other NRPS proteins, using the modules from both NRPSs for compound production (Martínez-Núñez and López y López 2016), and recently, it was shown that products from different strains of fungi combine to form an antibiotic (Stierle et al. 2017). Thus, the mechanisms leading to the production of these potential compounds can be diverse and complex. Research has shown that NRPSs are capable of producing many products including antibiotics (Kleinkauf and Döhren 1990), antifungals (Michelsen and Stougaard 2010), immunosuppressants (Bushley et al. 2013) and anticancer drugs (Lombó et al. 2006). Many drugs have been discovered as natural products of bacteria, while others have been genetically engineered (Sieber and Marahiel 2004, Smanski et al. 2016). With so many potential products known, it is difficult to determine what S4G9, LE6C9, LE5C2 and S3E10 are producing. However, with such a diversity of NRPSs, it is possible that we have identified four novel compounds. Additionally, these compounds could have multiple applications, such as the inhibition of human and plant pathogens.

One class of secondary metabolites produced by NRPSs is siderophores. These compounds are high-affinity iron chelators and are typically upregulated under iron-poor conditions (Tyc et al. 2017). Pseudomonads produce pyoverdines, which are siderophores that fluoresce and are involved in iron acquisition and, in pathogenic strains, virulence (Meyer and Abdallah 1978, Lamont et al. 2002). Recently, we showed that a siderophore could be involved in growth inhibition (Chatterjee et al. 2017). If a bacterium can sequester iron more efficiently than the surrounding bacteria, then that bacterium has a better chance of survival in its environment (Johnstone and Nolan 2015). Some of the NRPS gene clusters have genes encoding iron binding complexes (Tables 13-14, 16, 18, 20). This suggests the possibility that these strains are producing siderophores, such as pyoverdines, with antimicrobial-like functions. Future experimental research and biochemical characterization will be needed to identify the structure and function of these compounds.

The final BGC, which was not discussed above, is from the strain LE6C9 and could be responsible for producing a phenazine product. Phenazines are nitrogen-containing heterocyclic compounds that possess antibiotic, antifungal and antitumor properties (Laursen and Nielsen 2004, Mavrodi et al. 2006). Like NRPS products, phenazines are a diverse group of compounds with over 100 structural derivatives and more than 6,000 compounds produced (Pierson and Pierson 2010). From the results, the LE6C9 phenazine appears to be a broad-spectrum compound with antagonistic activity against both gram-positive and gram-negative pathogens (Table 5). This is supported by other research showing that phenazines possess broad-spectrum antibiotic capabilities (Mavrodi et al. 2006). Furthermore, these compounds are produced by P. chlororaphis strains, and according to the ANI analysis, LE6C9 is a member of that species (Table 21, Pierson and Thomashow 1992). Further analysis of LE6C9 and its antibiotic compounds could be very beneficial and could give insight into potential interactions of phenazines and NRPSs, since LE6C9 encodes both and had Tn insertions in each BGC (Table 12).

Preliminary work has been performed to identify the putative structures of the secondary metabolites using an algorithm called PRediction Informatics for Secondary Metabolomes (PRISM) (Skinnider et al. 2015). However, the predicted structures are based upon similarities to entries in the PRISM database, which means the results are putative. For instance, the predicted structures for S4G9 BGC 161816952 and LE5C2 BGC 161819466 are the same compound (Table 22). However, based upon the differences between these BGCs (Figures 6B, 8A, 10), it seems unlikely that the products are identical. Additionally, each prediction is based on just one gene cluster, whereas the results suggest that each strain has two gene clusters involved in the production of the secondary metabolite. Nevertheless, putative structures can give insight into the actual compounds produced and can assist in methods for compound isolation. The predicted structure for LE6C9 BGC 161816930 is the exact structure of phenazine-1-carboxylic acid, which is known to be produced by P. chlororaphis (Table 22, Pierson and Pierson 2010). This predicted structure is consistent with the putative function of the genes and gene cluster as well as the ANI results for LE6C9 (Table 21). To reiterate, these structures are only predictions based upon known compounds in the PRISM database. Additional experiments and biochemical characterization are required to identify the compounds' structures.

4.4 Future directions

We have provided a strong foundation for future work involving identification of the chemical structure and function of compounds produced by strains S4G9, LE6C9, LE5C2 and S3E10. To verify that the genes and BGCs are involved in the production of the antimicrobial compounds, cloning and gene deletions can be performed. Additionally, we can test each mutant for the production of siderophores using siderophore typing methods and a blue agar CAS (chrome azurol S) assay for siderophore detection (Louden et al. 2011, Meyer et al. 2002). If the wildtype strains produce a siderophore but the gene deletion mutants do not, this would suggest that the NRPS produces a siderophore involved in antagonistic activity. In addition, collaborators at the University of Michigan are extracting the compounds for biochemical characterization.

Our results are significant because the genes within these BGCs are hallmarks of antibiotic biosynthesis. This work provides a basis for future work in which the compounds can be extracted and identified, providing insight into novel antibiotics that are useful in treating devastating pathogens. Gaining knowledge about the biochemical properties of these antagonistic compounds and their activity has the potential to help address the global crisis of antibiotic resistance.

REFERENCES

Bennett JW, Chung KT. 2001. Alexander Fleming and the discovery of penicillin. Advances in

Applied Microbiology 49:163-184.

Blainey PC, Milla CE, Cornfield DN, Quake SR. 2012. Quantitative analysis of the human

airway microbial ecology reveals a pervasive signature for cystic fibrosis. Science

Translational Medicine 4(153): 153ra130.

Burks DJ, Norris S, Kauffman K, Joy A, Arevalo P, Azad RK, and Wildschutte H. 2017.

Environmental vibrios represent a source of antagonistic compounds that inhibit

pathogenic Vibrio cholerae and Vibrio parahaemolyticus strains. MicrobiologyOpen (in

press).

Bushley KE, Raja R, Jaiswal R, Cumbie JS, Nonogaki, Boyd AE, Owensby CA, Knaus BK,

Elser J, Miller D, Di Y, McPhail KL, Spatafora JW. 2013. The genome of Tolypocladium

inflatum: Evolution, organization, and expression of the cyclosporin biosynthetic gene

cluster. PLoS Genetics 9(6): e1003496. doi: 10.1371/journal.pgen.1003496.

Caballero JD, Clark ST, Coburn B, Zhang Y, Wang PW, Donaldson SL, Tullis DE, Yau YCW,

Waters VJ, Hwang DM, Guttman DS. 2015. Selective sweeps and parallel

pathoadaptation drive Pseudomonas aeruginosa evolution in cystic fibrosis lung. mBio

6(5): e00981-15.

Center for Disease Dynamics, Economics & Policy. 2015. State of the world’s antibiotics, 2015.

CDDEP: Washington, D.C. [https://cddep.org/sites/default/files/swa_2015_final.pdf].

Chatterjee P, Davis E, James S, Wildschutte JH, Yu F, Sherman DH, McKay RM, LiPuma JJ,

Wildschutte H. 2017. Population-level diversity of pseudomonads and inhibition of cystic

fibrosis patient-derived Pseudomonas aeruginosa. Applied and Environmental

Microbiology 83: e02701-16. doi: 10.1128/AEM.02701-16.

Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A,

Huddleston J, Eichler EE, Turner SW, Korlach J. 2013. Nonhybrid, finished microbial

genome assemblies from long-read SMRT sequencing data. Nature Methods 10(6): 563-

569.

Cordero OX, Wildschutte H, Kirkup B, Proehl S, Ngo L, Hussain F, Le Roux F, Mincer T, Polz

MF. 2012. Ecological populations of bacteria act as socially cohesive units of antibiotic

production and resistance. Science 337: 1228-1231. doi: 10.1126/science.1219385.

Coutinho HDM, Falcão-Silva VS, Gonçalves GF. 2008. Pulmonary bacterial pathogens in cystic

fibrosis patients and antibiotic therapy: a tool for the health workers. International

Archives of Medicine 1:24.

Cox MJ, Allgaier M, Taylor B, Baek MS, Huang YJ, Daly RA, Karaoz U, Andersen GL, Brown

R, Fujimura KE, Wu B, Tran D, Koff J, Kleinhenz ME, Nielson D, Brodie EL, Lynch

SV. 2010. Airway microbiota and pathogen abundance in age-stratified cystic fibrosis

patients. PLoS ONE 5(6): e11044.

Das S, Noe JC, Paik S, Kitten T. 2005. An improved arbitrary primed PCR method for rapid

characterization of transposon insertion sites. Journal of Microbiological Methods 63: 89-

94.

Davis E, Sloan T, Aurelius K, Barbour A, Bodey E, Clark B, Dennis C, Drown R, Fleming M,

Humbert A, Glasgo E, Kerns T, Lingro K, McMillian M, Meyer A, Pope B, Stalevicz A,

Steffen B, Steindel A, Williams C, Wimberly C, Zenas R, Butela K, Wildschutte H.

2017. Antibiotic discovery throughout the Small World Initiative: a molecular strategy to

identify biosynthetic gene clusters involved in antagonistic activity. MicrobiologyOpen

00:1-9. doi: 10.1002/mbo3.435

Deredjian A, Colinon C, Hien E, Brothier E, Youenou B, Cournoyer B, Dequiedt S, Harmann A,

Jolivet C, Houot S, Ranjard L, Saby NP, Nazaret S. 2014. Low occurrence of

Pseudomonas aeruginosa in agricultural soils with and without organic amendment.

Front Cell Infect Microbiol 4:53.

Felnagle EA, Jackson EE, Chan YA, Podevels AM, Berti AD, McMahon MD, Thomas MG.

2008. Nonribosomal peptide synthetases involved in the production of medically relevant

natural products. Mol. Pharm 5(2):191-211.

Freire-Moran L, Aronsson B, Manz C, Gyssens IC, So AD, Monnet DL, Cars O. 2011. Critical

shortage of new antibiotics in development against multidrug-resistant bacteria – time to

react is now. Drug Resistance Updates 14: 118-124. doi: 10.1016/j.drup.2011.02.003.

Frieden T. 2013. Antibiotic resistance threats in the United States, 2013. Centers for Disease

Control and Prevention, Atlanta: CDC. https://www.cdc.gov/drugresistance/pdf/ar-threats-2013-508.pdf.

Gomila M, Peña A, Mulet M, Lalucat J, García-Valdés E. 2015. Phylogenetics and systematics

in Pseudomonas. Front. Microbiol. 6:214. doi: 10.3389/fmicb.2015.00214.

Gross H, Loper JE. 2009. Genomics of secondary metabolite production by Pseudomonas spp.

Natural Product Reports 26: 1408-1446.

Hadjithomas M, Chen IMA, Chu K, Ratner A, Palaniappan K, Szeto E, Huang J, Reddy TBK,

Cimermančič P, Fischbach MA, Ivanova NN, Markowitz VM, Kyrpides NC, Pati A. 2015.

IMG-ABC: a knowledge base to fuel discovery of biosynthetic gene clusters and novel

secondary metabolites. mBio 6(4): e00932-15. doi:10.1128/mBio.00932-15.

Hidron AI, Edwards JR, Patel J, Horan TC, Sievert DM, Pollock DA, Fridkin SK, National

Healthcare Safety Network Team and Participating National Healthcare Safety Network

Facilities. 2008. Antimicrobial-resistant pathogens associated with healthcare-associated

infections: annual summary of data reported to the National Healthcare Safety Network at

the Centers for Disease Control and Prevention, 2006-2007. Infection Control and

Hospital Epidemiology 29(11): 996-1011.

Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. 2008. Resource partitioning and

sympatric differentiation among closely related bacterioplankton. Science 320: 1081-

1085.

Jelsbak L, Johansen HK, Frost AL, Thøgersen R, Thomsen LE, Ciofu O, Yang L, Haagensen

JAL, Høiby N, Molin S. 2007. Molecular epidemiology and dynamics of Pseudomonas

aeruginosa populations in lung of cystic fibrosis patients. Infection and Immunity 75(5):

2214-2224.

Johnstone TC, Nolan EM. 2015. Beyond iron: non-classical biological functions of bacterial

siderophores. Dalton Transactions 44(14): 6320-6339.

Khan NH, Ishii Y, Kimata-Kino N, Esaki H, Nishino T, Nishimura M, Kogure K. 2007. Isolation

of Pseudomonas aeruginosa from open ocean and comparison with freshwater, clinical

and animal isolates. Microb Ecol 53:173-186.

Kassem El-Sayed A, Hothersall J, Cooper SM, Stephens E, Simpson TJ, Thomas CM. 2003.

Characterization of the mupirocin biosynthesis gene cluster from Pseudomonas

fluorescens NCIMB 10586. Chemistry & Biology 10: 419-430.

Kleinkauf H, von Döhren H. 1990. Nonribosomal biosynthesis of peptide antibiotics. European

Journal of Biochemistry 192:1-15. doi: 10.1111/j.1432-1033.1990.tb19188.x.

Klepac-Ceraj V, Lemon KP, Martin TR, Allgaier M, Kembel SW, Knapp SV, Bohannan BJM,

Green JL, Maurer BA, Kolter R. 2010. Relationship between cystic fibrosis respiratory

tract bacterial communities and age, genotype, antibiotics and Pseudomonas aeruginosa.

Environmental Microbiology 12(5): 1293-1303.

Lamont IL, Beare PA, Ochsner U, Vasil AI, Vasil ML. 2002. Siderophore-mediated signaling

regulates virulence factor production in Pseudomonas aeruginosa. PNAS 99(10): 7072-

7077.

Laursen JB, Nielsen J. 2004. Phenazine natural products: biosynthesis, synthetic analogues and

biological activity. Chem Rev 104: 1663-1685.

Lawe-Davies O, Bennett S. 2017. WHO publishes list of bacteria for which new antibiotics are

urgently needed [news release]. World Health Organization. Available from

http://www.who.int/mediacentre/news/releases/2017/bacteria-antibiotics-needed/en/

LiPuma JJ. 2010. The changing microbial epidemiology in cystic fibrosis. Clinical Microbiology

Reviews 23(2): 299-323.

Lombó F, Velasco A, Castro A, de la Calle F, Braña AF, Sánchez-Puelles JM, Méndez C, Salas

JA. 2006. Deciphering the biosynthesis pathway of the antitumor thiocoraline from a

marine Actinomycete and its expression in two Streptomyces species. ChemBioChem 7:

366-376. doi: 10.1002/cbic.200500325.

Louden BC, Haarmann D, Lynne AM. 2011. Use of blue agar CAS assay for siderophore

detection. Journal of Microbiology & Biology Education 12(1): 51-53.

Marahiel MA, Stachelhaus T, Mootz HD. 1997. Modular peptide synthetases involved in

nonribosomal peptide synthesis. Chemical Reviews 97(7): 2651-2674. doi:

10.1021/cr960029e

Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Pillay M, Ratner A, Huang J,

Woyke T, Huntemann M, Anderson I, Billis K, Varghese N, Mavromatis K, Pati A,

Ivanova NN, Kyrpides NC. 2014. IMG 4 version of the integrated microbial genomes

comparative analysis system. Nucleic Acids Research 42: D560-D567.

Martínez-García E, Calles B, Arévalo-Rodríguez M, de Lorenzo V. 2011. pBAM1: an all-synthetic

genetic tool for analysis and construction of complex bacterial phenotypes. BMC

Microbiology 11:38.

Martínez-Núñez MA, López y López VE. 2016. Nonribosomal peptides synthetases and their

applications in industry. Sustainable Chemical Processes 4:13.

Mauldin PD, Salgado CD, Hansen IS, Durup DT, Bosso JA. 2010. Attributable hospital cost and

length of stay associated with health care-associated infections caused by antibiotic

resistant gram negative bacteria. Antimicrobial Agents and Chemotherapy 54(1): 109-

115.

Mavrodi DV, Blankenfeldt W, Thomashow LS. 2006. Phenazine compounds in fluorescent

Pseudomonas spp.: biosynthesis and regulation. Annu. Rev. Phytopathol. 44:417-445.

McGowan Jr JE. 2001. Economic impact of antimicrobial resistance. Emerging Infectious

Diseases 7(2): 286-292.

Medema MH, Fischbach MA. 2015. Computational approaches to natural product discovery.

Nature Chemical Biology 11: 639-648. doi: 10.1038/NCHEMBIO.1884.

Meyer JM, Abdallah MA. 1978. The fluorescent pigment of Pseudomonas fluorescens:

biosynthesis, purification and physicochemical properties. Journal of General

Microbiology 107:319-328.

Meyer JM, Geoffroy VA, Baida N, Gardan L, Izard D, Lemanceau P, Achouak W, Palleroni NJ.

2002. Siderophore typing, a powerful tool for the identification of fluorescent and

nonfluorescent pseudomonads. Applied and Environmental Microbiology 68(6): 2745-

2753.

Michelsen CF, Stougaard P. 2010. A novel antifungal Pseudomonas fluorescens isolated from

potato soils in Greenland. Curr Microbiol 62:1185-1192. doi: 10.1007/s00284-010-9846-

4.

Moore LW. 1988. Pseudomonas syringae: disease and ice nucleation activity. Ornamentals

Northwest Archives 12(2): 3-16.

O’Neill J. 2014. Antimicrobial resistance: tackling a crisis for the health and wealth of nations.

The Review on Antimicrobial Resistance. amr-review.org.

Palleroni NJ, Pieper DH, Moore ERB. 2010. Microbiology of Hydrocarbon-Degrading

Pseudomonas. Handbook of Hydrocarbon and Lipid Microbiology. doi: 10.1007/978-3-

540-77587-4_129.

Parad RB, Gerard CJ, Zurakowski D, Nichols DP, Pier GB. 1999. Pulmonary outcome in cystic

fibrosis is influenced primarily by mucoid Pseudomonas aeruginosa infection and

immune status and only modestly by genotype. Infection and Immunity 67(9): 4744-

4750.

Peix A, Ramírez-Bahena MH, Velázquez E. 2009. Historical evolution and current status of the

taxonomy of genus Pseudomonas. Infection, Genetics and Evolution 9: 1132-1147.

Peleg AY, Hooper DC. 2010. Hospital acquired infections due to gram-negative bacteria. New

England Journal of Medicine 362(19): 1804-1813.

Perry JD, Laine L, Hughes S, Nicholson A, Galloway A, Gould FK. 2008. Recovery of

antimicrobial-resistant Pseudomonas aeruginosa from sputa of cystic fibrosis patients by

culture on selective media. Journal of Antimicrobial Chemotherapy 61: 1057-1061.

Pierson 3rd LS, Pierson EA. 2010. Metabolism and function of phenazines in bacteria: impacts

on the behavior of bacteria in the environment and biotechnological processes. Appl

Microbiol Biotechnol 86: 1659-1670.

Pierson 3rd LS, Thomashow LS. 1992. Cloning and heterologous expression of the phenazine

biosynthetic locus from Pseudomonas aureofaciens 30-84. Mol Plant Microbe Interact

5(4):330-339.

Preheim SP, Boucher Y, Wildschutte H, David LA, Veneziano D, Alm EJ, Polz MF. 2011.

Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environ

Microbiol 13:265-275.

Rau MH, Hansen SK, Johansen HK, Thomsen LE, Workman CT, Nielsen KF, Jelsbak L, Høiby

N, Yang L, Molin S. 2010. Early adaptive developments of Pseudomonas aeruginosa

after the transition from life in the environment to persistent colonization in the airways

of human cystic fibrosis hosts. Environmental Microbiology 12(6): 1643-1658.

Richard P, Le Floch R, Chamoux C, Pannier M, Espaze E, Richet H. 1994. Pseudomonas

aeruginosa outbreak in a burn unit: role of antimicrobials in the emergence of multiply

resistant strains. J. Infect. Dis. 170:380-383. doi:10.1093/infdis/170.2.377.

Sieber SA, Marahiel MA. 2005. Molecular mechanisms underlying nonribosomal peptide

synthesis: approaches to new antibiotics. Chem. Rev. 105:715-738. doi:

10.1021/cr0301191.

Sievert DM, Ricks P, Edwards JR, Schneider A, Patel J, Srinivasan A, Kallen A, Limbago R,

Fridkin S, National Healthcare and Safety Network (NHSN) Team and Participating

NHSN Facilities. 2013. Antimicrobial-resistant pathogens associated with healthcare-

associated infections: summary of data reported to the National Healthcare Safety Network

at the Centers for Disease Control and Prevention, 2009-2010. Infection Control and

Hospital Epidemiology 34(1): 1-14.

Silby MW, Winstanley C, Godfrey SAC, Levy SB, Jackson RW. 2011. Pseudomonas genomes:

diverse and adaptable. FEMS Microbiology Reviews 35(4): 652-680.

Skinnider MA, Dejong CA, Rees PN, Johnston CW, Li H, Webster ALH, Wyatt MA, Magarvey

NA. 2015. Genomes to natural products PRediction Informatics for Secondary

Metabolomes (PRISM). Nucleic Acids Research 43(20): 9645-9662. doi:

10.1093/nar/gkv1012.

Smanski MJ, Zhou H, Claesen J, Shen B, Fischbach MA, Voigt CA. 2016. Synthetic biology to

access and expand nature’s chemical diversity. Nature Reviews Microbiology 14: 135-

149.

Smith EE, Buckley DG, Wu Z, Saenphimmachak C, Hoffman LR, D’Argenio DA, Miller SI,

Ramsey BW, Speert DP, Moskowitz SM, Burns JL, Kaul R, Olson MV. 2006. Genetic

adaptation by Pseudomonas aeruginosa to the airways of cystic fibrosis patients. PNAS

103(22): 8487-8492.

Spiers AJ, Buckling A, Rainey PB. 2000. The causes of Pseudomonas diversity. Microbiology

146: 2345-2350.

Stierle AA, Stierle DB, Decato D, Priestley ND, Alverson JB, Hoody J, McGrath K, Klepacki D.

2017. The berkeleylactones, antibiotic macrolides from fungal coculture. Journal of

Natural Products 80(4): 1150-1160. doi: 10.1021/acs.jnatprod.7b00133.

Tayeb LA, Ageron E, Grimont F, Grimont PAD. 2005. Molecular phylogeny of the genus

Pseudomonas based on rpoB sequences and application for the identification of isolates.

Research in Microbiology 156: 763-773.

Tommasi R, Brown DG, Walkup GK, Manchester JI, Miller AA. 2015. ESKAPEing the

labyrinth of antibacterial discovery. Nature Reviews Drug Discovery 14(8): 529-542.

Tunney MM, Klem ER, Fodor AA, Gilpin DF, Moriarty TF, McGrath SJ, Muhlebach MS,

Boucher RC, Cardwell C, Doering G, Elborn JS, Wolfgang MC. 2010. Use of culture and

molecular analysis to determine the effect of antibiotic treatment on abundance during

exacerbation in patients with cystic fibrosis. Thorax 66(7): 579-584.

Tyc O, Song C, Dickschat JS, Vos M, Garbeva P. 2017. The ecological role of volatile and

soluble secondary metabolites produced by bacteria. Trends in Microbiology 25(4): 280-

292.

U.S. Department of Health and Human Services, Centers for Disease Control and Prevention.

1999. Achievements in Public Health 1900-1999. Morbidity and Mortality Weekly Report

48(29): 621-629.

Vasu K, Nagaraja V. 2013. Diverse functions of restriction-modification systems in addition to

cellular defense. Microbiology and Molecular Biology Reviews 77(1): 53-72.

Ventola CL. 2015. The antibiotic resistance crisis part 1: causes and threats. P&T 40(4): 277-

283.

Ventola CL. 2015. The antibiotic resistance crisis part 2: management strategies and new agents.

P&T 40(5): 344-352.

Vodovar N, Vinals M, Liehl P, Basset A, Degrouard J, Spellman P, Boccard F, Lemaitre B.

2005. Drosophila host defense after oral infection by an entomopathogenic Pseudomonas

species. PNAS 102(32): 11414-11419.

Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA, Müller R,

Wohlleben W, Breitling R, Takano E, Medema MH. 2015. antiSMASH 3.0 – a

comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic

Acids Research 43: W237-W243.

Yamamoto S, Kasai H, Arnold DL, Jackson RW, Vivian A, Harayama S. 2000. Phylogeny of the

genus Pseudomonas: intragenic structure reconstructed from the nucleotide sequences of

gyrB and rpoD genes. Microbiology 146: 2385-2394.

Yayan J, Ghebremedhin B, Rasche K. 2015. Antibiotic resistance of Pseudomonas aeruginosa in

pneumonia at a single university hospital center in Germany over a 10-year period. PLoS

ONE 10(10): e0139836. doi: 10.1371/journal.pone.0139836

Zhang W, Du P, Zheng H, Yu W, Wan L, Chen C. 2014. Whole-genome sequence comparison

as a method for improving bacterial species definition. J. Gen. Appl. Microbiol. 60:70-78.

Zhao J, Schloss PD, Kalikin LM, Carmody LA, Foster BK, Petrosino JF, Cavalcoli JD,

VanDevanter DR, Murray S, Li JZ, Young VB, LiPuma J. 2012. Decade-long bacterial

community dynamics in cystic fibrosis airways. PNAS 109(15): 5809-5814.

APPENDIX A. FIGURES

Figure 1. Phylogenetic analysis and antagonistic activity among environmental Pseudomonas strains. Population structure for 330 environmental pseudomonads by neighbor-joining analysis of the gyrB sequence, overlain with data for habitat (inner columns: cream color, soil; blue, water) and antagonistic activity (outer bars: black). The magnitude of antagonism is indicated by the height of the bar. Populations are shaded and numbered. Population 6 (highlighted in yellow) is enriched with strains that antagonize multiple environmental and CF-derived isolates. Red squares indicate inhibition of CF-derived P. aeruginosa. Dotted gray lines delineate population boundaries. The top X-axis consists of the pathogenic P. aeruginosa strain phylogeny based on the gyrB gene sequence. Both trees were rooted by P. aeruginosa PAO1. The 0.1 and 0.001 represent scale bar values of diversity among environmental and clinical strains, respectively. (Reprinted with permission from ASM: Chatterjee P, Davis E, James S, Wildschutte JH, Yu F, Sherman DH, McKay RM, LiPuma JJ, Wildschutte H. 2017. Population-level diversity of pseudomonads and inhibition of cystic fibrosis patient-derived Pseudomonas aeruginosa. Applied and Environmental Microbiology 83: e02701-16. doi:10.1128/AEM.02701-16.)

Figure 2. Tn mutagenesis strategy to identify mutants with a loss of antagonistic phenotype. (A) Conjugation of Tn vector from the donor E. coli to the desired recipient strain. (B) Selection for transconjugants with Tn insertion on cetrimide agar with kanamycin. (C) Transconjugants are replica plated to a spread plated sensitive strain on NB. (D) Mutants for loss of antagonistic phenotype are screened. (E) Insertion of Tn in recipient genome is identified by a PCR method. See Methods for experimental details.

Figure 3. Various zones of inhibition encountered during optimization. Sensitive strains are spread plated on NB plates and transconjugants are replica plated to the sensitive strain. Optimization of a mutant screen involves the identification of clear and tight zones of inhibition. (A and D) Examples of zones of inhibition that would be considered optimized because they are clear and tight. (B) Zones of inhibition that would not be considered tight because the zones of inhibition are too small, <1 mm, and (C) too large whereby overlapping zones may interfere with the identification of loss of inhibition mutants.

Figure 4. Loss of antagonistic phenotype by transposon insertion. (A) Wildtype strain inhibiting the growth of a sensitive P. aeruginosa pathogen. (B) Verified mutant with the loss of inhibition phenotype to the sensitive P. aeruginosa.

Figure 5. Arbitrary PCR. Arbitrary PCR consists of two rounds of amplification. Round 1: primers ME-I extR and ME-O extF (dark blue arrows), specific to the transposon (white box, with open triangles at each end), are paired with arbitrary primers (solid and striped red arrows) that bind randomly to the DNA regions flanking the transposon. Products of the first round are used as templates for the second round. Round 2: primers ME-I intR and ME-O intF (light blue arrows) are paired with arbitrary 2 primers (red striped arrows), which are identical to the 5’ sequence of the first-round arbitrary primers. Amplicons from the second round are sequenced and analyzed. (Reprinted from Journal of Microbiological Methods, 63, Das S, Noe JC, Paik S, and Kitten T, An improved arbitrary primed PCR method for rapid characterization of transposon insertion sites, 89-94, 2005, with permission from Elsevier.)
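The relationship between the two arbitrary primers can be checked directly from the sequences listed in Table 1. The following is a minimal Python sketch, not part of the thesis methods: it splits ARB6 into its fixed 5’ tail, degenerate N-run, and 3’ clamp, and then lists positions in a template where the clamp could anneal. The helper names and the template sequence are hypothetical and invented for illustration, and the template is treated as the strand identical to the primer, which is a simplification.

```python
import re

# Primer sequences as listed in Table 1.
ARB6 = "GGCACGCGTCGACTAGTACNNNNNNNNNNACGCC"  # round 1 arbitrary primer
ARB2 = "GGCACGCGTCGACTAGTAC"                 # round 2 arbitrary primer


def split_arbitrary_primer(primer):
    """Split a degenerate primer into its 5' tail, N-run, and 3' clamp."""
    match = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", primer)
    if match is None:
        raise ValueError("expected a fixed tail, a run of N, and a fixed clamp")
    return match.groups()


def clamp_sites(template, clamp, n_length):
    """Return 0-based start positions where N-run + clamp could anneal.

    Only the 3' clamp has to match exactly; the N positions pair with any
    base. Sites too close to the template start to fit the N-run are skipped.
    """
    sites = []
    start = template.find(clamp)
    while start != -1:
        if start >= n_length:
            sites.append(start - n_length)
        start = template.find(clamp, start + 1)
    return sites


tail, n_run, clamp = split_arbitrary_primer(ARB6)

# Hypothetical flanking sequence, invented for this example only.
template = "TTGACCGATTGGACGCCGGATCAAGCTTACGCCTA"
sites = clamp_sites(template, clamp, len(n_run))
```

In this sketch the fixed 5’ tail of ARB6 is the same sequence as ARB2, which is why the second-round primer can amplify any first-round product regardless of where the degenerate region annealed.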

Figure 6. BGCs of S4G9 that were mutated by Tn insertions. (A) Gene cluster 161816947 is predicted to encode an NRPS and has two Tn insertions in LOI mutants S4G9-11 and S4G9-86.2. (B) Gene cluster 161816952 is also predicted to encode an NRPS and has two Tn insertions in LOI mutants S4G9-133 and S4G9-156. Tn insertions are identified by red triangles. Arrowed boxes are ORFs; identical colors represent homologous loci.

Figure 7. BGCs of LE6C9 that were mutated by Tn insertions. (A) Gene cluster 161816930 encodes a putative homoserine lactone / phenazine and has five Tn insertions in LOI mutants LE6C9-1, LE6C9-3, LE6C9-209, LE6C9-237.1, and LE6C9-237.2. (B) Gene cluster 161816936 encodes a putative NRPS and has three Tn insertions in LOI mutants LE6C9-79, LE6C9-95 and LE6C9-199. Tn insertions are identified by red triangles. Arrowed boxes are ORFs; identical colors represent homologous loci.

Figure 8. BGCs of LE5C2 that were mutated by Tn insertions. (A) Gene cluster 161819466 encodes a putative NRPS and has three Tn insertions in LOI mutants LE5C2-5, LE5C2-12.1 and LE5C2-48. (B) Gene cluster 161819467 also encodes a putative NRPS and has five Tn insertions in LOI mutants LE5C2-4, LE5C2-39, LE5C2-75, LE5C2-139 and LE5C2-178. Tn insertions are identified by red triangles. Arrowed boxes are ORFs; identical colors represent homologous loci.

Figure 9. BGCs of S3E10 that were mutated by Tn insertions. (A) Gene cluster 161835842 encodes a putative NRPS and has three Tn insertions in LOI mutants S3E10-8, S3E10-172 and S3E10-183. (B) Gene cluster 161835844 also encodes a putative NRPS and has four Tn insertions in LOI mutants S3E10-33, S3E10-37, S3E10-163 and S3E10-175. Tn insertions are identified by red triangles. Arrowed boxes are ORFs; identical colors represent homologous loci.

Figure 10. Function heat map of S4G9, LE6C9, LE5C2 and S3E10 NRPS BGCs. Comparison of the predicted protein families within each gene cluster. The red and blue bar across the top denotes every protein family in the clusters. The purple heat map shows the genes within each protein family in the clusters; a darker color denotes a higher frequency of that gene among the clusters. The far left shows a hierarchical tree based on similarity of predicted protein domains within each gene cluster.
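A comparison of this kind can be illustrated with a small Python sketch. The caption does not state which similarity measure underlies the hierarchical tree, so the Jaccard index used here is an assumption, and the cluster names and protein-family identifiers are placeholders invented for the example rather than the families reported for these BGCs.

```python
from itertools import combinations

# Hypothetical protein-family content per cluster; the identifiers are
# placeholders, not the families reported for the BGCs in this study.
cluster_families = {
    "cluster_A": {"fam1", "fam2", "fam3", "fam4"},
    "cluster_B": {"fam1", "fam2", "fam3"},
    "cluster_C": {"fam1", "fam5"},
}


def jaccard(a, b):
    """Shared families divided by all families seen in either cluster."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Pairwise similarity for every pair of clusters.
similarity = {
    (x, y): jaccard(cluster_families[x], cluster_families[y])
    for x, y in combinations(sorted(cluster_families), 2)
}

# The most similar pair would be joined first in an agglomerative tree.
closest_pair = max(similarity, key=similarity.get)
```

Under these assumptions, clusters that share most of their protein families sit close together in the tree, while a cluster with a largely distinct family content joins last.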

Figure 11. NRPS core domains. Function of a domain is indicated by the color red. (A) Recognition and activation of an amino acid with ATP by the A domain forms an aminoacyl-AMP intermediate. (B) The T domain or peptidyl carrier protein (PCP) works with the A domain to form the aminoacylthioester intermediate from the aminoacyl-AMP intermediate. (C) The C domain is responsible for catalyzing the peptide bonds between the two substrates tethered to the T domains within two separate modules. The acceptor substrate is the aminoacylthioester accepting the substrate from the previous module, while the donor substrate is the aminoacylthioester being donated down the NRPS to the next module. (Reprinted with permission from Sieber SA, Marahiel MA. 2005. Molecular mechanisms underlying nonribosomal peptide synthesis: approaches to new antibiotics. Chem. Rev. 105:715-738. doi: 10.1021/cr0301191. Copyright 2005 American Chemical Society.)

APPENDIX B. TABLES

Table 1. Primers used in this study.
Name | Sequence 5’ → 3’ | Usage | Reference/Source
BPHI | CAAGGAAGGACGCTGTCTGTCGAAGGTAAGGAACGGACGAGAGAAGGGAGA | LM-PCR Ligation | Davis et al. 2017
BPHII | CTCTCCCTTTCGAATCGTAACCGTTCGTACGAGAATCGCTGTCCTCTCCTTG | LM-PCR Ligation | Chatterjee et al. 2017
224 | CGAATCGTACCGTTCGTACGAGAATCGCT | LM-PCR Round 1/Round 2/Sequencing |
pBAM1 3424 Rev | ATCCATGTTGCTGTTCAGAC | LM-PCR Round 1 |
pBAM1 3373 Rev | ATGGCTCATAACACCCCT | LM-PCR Round 2/Sequencing |
ARB6 | GGCACGCGTCGACTAGTACNNNNNNNNNNACGCC | ARB PCR Round 1 | Martínez-García et al. 2011
ARB2 | GGCACGCGTCGACTAGTAC | ARB PCR Round 2 | Martínez-García et al. 2011
ME-I-extR | CTCGTTTCACGCTGAATATGGCTC | ARB PCR Round 1 |
ME-I-intR | CAGTTTTATTGTTCATGATGATATA | ARB PCR Round 2/Sequencing |
ME-O-extF | CGGTTTACAAGCATAACTAGTGCGGC | ARB PCR Round 1 |
ME-O-intF | AGAGGATCCCCGGGTACCGAGCTCG | ARB PCR Round 2/Sequencing |

Table 2. Pseudomonas strains tested for conjugation and Tn mutagenesis.
Environment | Strains tested for conjugation | Population | Method | Conjugate
Water | 09A 9 (LE5A9) | 1 | Spotting | DNC(a)
Water | 07B 19 (LE5B7) | 5 | Spotting | DNC
Water | 08B 20 (LE5B8) | 7 | Spotting | DNC
Water | 11B 23 (LE5B11) | 6 | Spotting | X(b)
Water | 02C 26 (LE5C2)(c) | 2 | Spotting | X
Water | 02E 50 (LE5E2) | 2 | Spotting | X
Water | 10F 70 (LE5F10) | 5 | Filter/Spotting | DNC
Water | 11G 83 (LE5G11) | 5 | Spotting | DNC
Water | 09H 93 (LE5H9) | 6 | Filter/Spotting | NRK
Water | 10H 94 (LE5H10) | 7 | Spotting | NRK
Water | 08C 128 (LE6C8) | 6 | Spotting | X
Water | 09C 129 (LE6C9) | 5 | Filter/Spotting | X
Water | 07G 175 (LE6G7) | 9 | Spotting | DNC
Water | 08G 176 (LE6G8) | 6/7 | Filter/Spotting | X
Water | 11G 178 (LE6G11) | 9 | Filter/Spotting | X
Water | 07H 187 (LE6H7) | 5 | Spotting | DNC
Soil | S08D 252 (S3D8) | 12 | Spotting | DNC
Soil | S02E 205 (S3E2) | 9 | Spotting | X
Soil | S07E 245 (S3E7) | 2 | Spotting | X
Soil | S10E 269 (S3E10) | 9 | Spotting | X
Soil | S11F 278 (S3F11) | 13 | Spotting | X
Soil | S10G 271 (S3G10) | 10 | Spotting | X
Soil | S03H 216 (S3H3) | 11 | Spotting | X
Soil | S01A 289 (S4A1) | 13 | Spotting | DNC
Soil | S07A 337 (S4A7) | 10 | Spotting | DNC
Soil | S03F 310 (S4F3) | 2 | Filter/Spotting | DNC
Soil | S06F 334 (S4F6) | | Spotting | X
Soil | S08F 350 (S4F8) | 10 | Filter/Spotting | DNC
Soil | S09G 359 (S4G9) | 9 | Filter/Spotting | X
Soil | S03H 312 (S4H3) | 13 | Spotting | DNC
a: DNC means does not conjugate
b: X means does conjugate
c: Highlighted strains represent the strains on which mutant hunts were performed

Table 3. Optimized parameters for conjugation and Tn mutagenesis.
Strain optimized | Growth time of overnights | Heat shock | Mating mixture spot growth time | Dilution of re-suspended spot | Amount spread plated on C+K plate | Number of transconjugants per C+K plate
S4G9 | 20 hrs | Yes | 22 hrs | 1:1000 | 50 μL | 10-50
LE6G8 | 20 hrs | Yes | 22 hrs | 1:1000 | 100 μL | 20-45
LE6G11 | 20 hrs | Yes | 22 hrs | 1:5500 | 100 μL | 10-35
LE6C9 | 20 hrs | Yes | 22 hrs | 1:300 | 100 μL | 25-45
LE5C2 | 20 hrs | Yes | 22 hrs | 1:100 | 50 μL | 20-40
S3H3 | 20 hrs | Yes | 22 hrs | 1:50 | 50 μL | 30-60
S3E10 | 23 hrs | No | 22 hrs | 1:2000 | 50 μL | 15-30
S3G10 | 23 hrs | No | 22 hrs | 1:500 | 100 μL | 20-30

Table 4. Optimized parameters for mutant screen.
Environmental strain | Sensitive strain | Growth time of transconjugants | Amount of sensitive strain spread plated | Media | Time transconjugants and sensitive strain co-grew
S4G9 | S3E3 | 48 hrs | 30 μL | NB | 24 hrs in 30C
 | Lysobacter antibioticus | 24 hrs | 100 μL | NB | 24 hrs in 30C
 | AU12176 | 24 hrs | 150 μL | NB | 24 hrs in 30C; 24 hrs in 37C
LE6G11 | B. subtilis | 30 - 40 hrs | 100 μL | NB | 24 hrs in 30C
LE6C9 | AU17108 | 30 hrs | 25 μL | MH | 48 hrs in 30C
LE5C2 | AU10014 | 48 hrs | 150 μL | NB | 24 hrs in 30C; 24 hrs in 37C
S3E10 | AU17659 | 48 hrs | 50 μL | MH | 24 hrs in 30C

Table 5. Human pathogen antagonistic assay results.
Environmental strain | Pathogen inhibited
S4G9 | MRSA, L. monocytogenes
LE6C9 | MRSA, K. pneumoniae, B. cereus
LE5C2 | None
S3E10 | None

Table 6. PCR and NCBI nucleotide BLAST results for S4G9 LOI mutants. Mutant PCR BLAST Results Gene product S4G9-11 ARB PCR Pseudomonas agarici Hypothetical protein S4G9-13 ARB PCR Pseudomonas Chemotaxis protein CheY. fluorescens Chemotaxis protein CheR LM-PCR No sequence n/a S4G9-35 ARB PCR Pseudomonas Dihydrodipicolinate reductase azotoformans molecular chaperone DnaJ Copper resistance protein CopD. Molecular chaperone DnaJ S4G9-39 ARB PCR Pseudomonas Copper resistance protein CopD azotoformans pyruvate dehydrogenase LM-PCR Cloning vector pBAM1 n/a PvuII 224 LM-PCR Cloning vector pBAM1 n/a PvuII 3373R LM-PCR Cloning vector pBAM1 n/a SmaI 224 LM-PCR Cloning vector pBAM1 n/a SmaI 3373R S4G9-51 ARB PCR Pseudomonas UTP—GlnB (protein PII) extremaustralis uridylyltransferase S4G9-74 ARB PCR No sequence n/a S4G9-86.2 ARB PCR Cloning Vector pBAM1 n/a S4G9-133 ARB PCR Pseudomonas Non-ribosomal peptide synthase azotoformans S4G9-154 ARB PCR No sequence n/a S4G9-156 ARB PCR Pseudomonas brenneri Non-ribosomal peptide synthase

Table 7. PCR and NCBI nucleotide BLAST results for LE6C9 LOI mutants. Mutant PCR BLAST Gene product Results LE6C9-1 LM-PCR SspI Pseudomonas Non-ribosomal peptide synthase. 224 chlororaphis Transcriptional regulator LM-PCR SspI Pseudomonas AHL-dependent transcriptional 3373R chlororaphis regulator ARB PCR Pseudomonas Non-ribosomal peptide synthetase. chlororaphis Phenazine biosynthesis protein LE6C9-3 LM-PCR SmaI Pseudomonas Non-ribosomal peptide synthetase 2,3- 224 chlororaphis dihydro-3-hydroxyanthranilate isomerase LM-PCR SmaI Pseudomonas Non-ribosomal peptide synthetase. 3373R chlororaphis Anthranilate synthase LE6C9-4 LM-PCR PvuII Pseudomonas Non-ribosomal peptide synthetase. 224 chlororaphis Hybrid sensor histidine kinase/response regulator LM-PCR PvuII Pseudomonas Non-ribosomal peptide synthetase. 3373R chlororaphis Hybrid sensor histidine kinase/response regulator LE6C9-79 ARB PCR Pseudomonas Surfactin synthase thioesterase subunit chlororaphis hypothetical protein LE6C9-95 ARB PCR Pseudomonas Amino acid adenylation domain- chlororaphis containing protein

LE6C9-199 ARB PCR Pseudomonas Condensation domain-containing ME-O chlororaphis protein predicted arabinose efflux permease, MFS family ARB PCR Condensation domain-containing ME-I LE6C9-209 ARB PCR Pseudomonas Anthranilate synthase 2,3-dihydro-3- ME-O chlororaphis hydroxyanthranilate isomerase ARB PCR Glycine hydroxymethyltransferase ME-I Surfactin synthase thioesterase subunit LE6C9-237.1 ARB PCR Pseudomonas Non-ribosomal peptide synthetase. ME-O chlororaphis Anthranilate synthase ARB PCR Anthranilate synthase ME-I 2,3-dihydro-3-hydroxyanthranilate isomerase LE6C9-237.2 ARB PCR Pseudomonas Glycine hydroxymethyltransferase ME-O chlororaphis ARB PCR Non-ribosomal peptide synthetase. ME-I Anthranilate synthase


Table 8. NCBI nucleotide BLAST results for LE5C2 LOI mutants.

| Mutant | BLAST Results | Gene Product |
| --- | --- | --- |
| LE5C2-4 | Pseudomonas fluorescens | Non-ribosomal peptide synthetase |
| LE5C2-5 | | Non-ribosomal peptide synthetase |
| LE5C2-7 | No sequence | n/a |
| LE5C2-12.1 | Pseudomonas putida | Peptide synthetase |
| LE5C2-12.2 | No sequence | n/a |
| LE5C2-25 | Pseudomonas entomophila | L-lactate permease |
| LE5C2-27 | Pseudomonas entomophila | L-lactate permease |
| LE5C2-39 | Pseudomonas sp. | SyrP protein/Non-ribosomal peptide synthetase |
| LE5C2-41 | Pseudomonas entomophila | L-lactate permease |
| LE5C2-48 | Pseudomonas putida | Hypothetical protein |
| LE5C2-75 | Pseudomonas sp. | Non-ribosomal peptide synthase |
| LE5C2-80 | No sequence | n/a |
| LE5C2-82 | Pseudomonas entomophila | L-lactate permease |
| LE5C2-117 | No sequence | n/a |
| LE5C2-139 | Cloning vector pBAM1 | n/a |
| LE5C2-156 | Pseudomonas entomophila | L-lactate permease |
| LE5C2-178 | Pseudomonas poae | Dipeptidase |
| LE5C2-181 | Pseudomonas entomophila | L-lactate permease |


Table 9. NCBI nucleotide BLAST results for S3E10 LOI mutants.

| Mutant | BLAST Results | Gene Product |
| --- | --- | --- |
| S3E10-8 | Pseudomonas trivialis | Peptide synthase |
| S3E10-22 | No sequence | n/a |
| S3E10-33 | Pseudomonas chlororaphis | Amino acid adenylation domain-containing protein |
| S3E10-37 | Pseudomonas sp. bs2935 | Non-ribosomal peptide synthase |
| S3E10-42.2 | Pseudomonas trivialis | Ornithine carbamoyltransferase |
| S3E10-107 | No sequence | n/a |
| S3E10-113 | Pseudomonas azotoformans | Copper resistance protein CopD; 3-octaprenyl-4-hydroxybenzoate decarboxylase |
| S3E10-129 | Pseudomonas fluorescens | 4-hydroxy-tetrahydrodipicolinate synthase |
| S3E10-134 | Pseudomonas azotoformans | Sec-independent protein translocase protein TatA, phosphoribosyl-ATP pyrophosphatase |
| S3E10-139 | Pseudomonas fluorescens | N-acetyl-gamma-glutamyl-phosphate reductase; membrane protein |
| S3E10-163 | Pseudomonas azotoformans | Non-ribosomal peptide synthase |
| S3E10-172 | Pseudomonas fluorescens | Pyoverdine synthetase A; Putative non-ribosomal peptide synthetase |
| S3E10-175 | Pseudomonas fluorescens | Peptide synthase |
| S3E10-183 | Pseudomonas cedrina | Non-ribosomal peptide synthase |
| S3E10-184 | Pseudomonas cedrina | Glycerol-3-phosphate dehydrogenase (NAD(P)+) |
| S3E10-186 | Pseudomonas fluorescens | tRNA uridine(34) 5-carboxymethylaminomethyl synthesis GTP |
| S3E10-200 | Pseudomonas extremorientalis | Cytochrome bo3 quinol oxidase subunit 2 |
| | Pseudomonas fluorescens | Cytochrome ubiquinol oxidase subunit II |


Table 10. Alignment of mutant sequences to the wild-type genome sequence.

| Mutant | Alignment Coordinates in Wild-Type Genome |
| --- | --- |
| S4G9-11 | 2578408-2579050 |
| S4G9-13 | 3533780-3534560 |
| S4G9-35 | 5841869-5842537 |
| S4G9-39 | 556121-556340 |
| S4G9-51 | 1453021-1453175 |
| S4G9-86.2 | 2590008-2590493 |
| S4G9-133 | 4864313-4864838 |
| S4G9-156 | 4855581-4856009 |
| LE6C9-1 | 6208853-6209407 |
| LE6C9-3 | 6213967-6214298 |
| LE6C9-3 | 6213601-6214136 |
| LE6C9-4 | 5502299-5502800 |
| LE6C9-79 | 3890586-3889537 |
| LE6C9-95 | 3897595-3898217 |
| LE6C9-199 | 3903503-3904363 |
| LE6C9-209 | 6213749-6214704 |
| LE6C9-237.1 | 6213573-6214592 |
| LE6C9-237.2 | 6213754-6212699 |
| LE5C2-4 | 1928577-1928973 |
| LE5C2-5 | 1670450-1671056 |
| LE5C2-12.1 | 1667604-1667750 |
| LE5C2-25 | 4465925-4466915 |
| LE5C2-27 | 4465925-4466915 |
| LE5C2-39 | 1934944-1935947 |
| LE5C2-41 | 4466136-4467182 |
| LE5C2-48 | 1673181-1673660 |
| LE5C2-75 | 1933648-1933877 |
| LE5C2-82 | 4465948-4466915 |
| LE5C2-139 | 1940567-1941328 |
| LE5C2-156 | 4465958-4466915 |
| LE5C2-178 | 1961728-1962297 |
| LE5C2-181 | 4465958-4466915 |
| S3E10-8 | 4181084-4181473 |
| S3E10-33 | 6258072-6258236 |
| S3E10-37 | 6291025-6290874 |
| S3E10-42.2 | 945758-945639 |
| S3E10-113 | 2559551-2559695 |
| S3E10-129 | 633635-633329 |
| S3E10-134 | 1837159-1836651 |
| S3E10-139 | 2935979-2934924 |
| S3E10-163 | 6263726-6263883 |
| S3E10-163 | 6281179-6281335 |
| S3E10-172 | 4172925-4173348 |
| S3E10-175 | 6264939-6265420 |
| S3E10-175 | 6282388-6282861 |
| S3E10-183 | 4181318-4180972 |
| S3E10-184 | 257276-257340 |
| S3E10-186 | 2289695-2289115 |
| S3E10-200 | 3399091-3399039 |
| S3E10-200 | 3399203-3399651 |

Table 11. Genome statistics of wild-type environmental strains.

| Statistic | S4G9 Number | S4G9 % of total | LE6C9 Number | LE6C9 % of total | LE5C2 Number | LE5C2 % of total | S3E10 Number | S3E10 % of total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IMG Genome ID | 2703719187 | - | 2703719185 | - | 2706794715 | - | 2724679614 | - |
| Total bases | 6767894 | 100 | 6897567 | 100 | 5404496 | 100 | 6520822 | 100 |
| G+C bases | 4120509 | 60.88 | 4350801 | 63.08 | 3342264 | 61.84 | 3962757 | 60.77 |
| Total genes | 6347 | 100 | 6199 | 100 | 4959 | 100 | 6170 | 100 |
| Protein coding genes | 6032 | 95.04 | 6047 | 97.55 | 4811 | 97.02 | 5891 | 95.48 |
| rRNA genes | 19 | 0.30 | 16 | 0.26 | 22 | 0.44 | 16 | 0.26 |
| tRNA genes | 69 | 1.09 | 67 | 1.08 | 74 | 1.49 | 66 | 1.07 |
| Other RNA genes | 227 | 3.58 | 69 | 1.11 | 52 | 1.05 | 197 | 3.19 |
| Protein coding genes with function predicted | 4956 | 78.08 | 4981 | 80.35 | 3943 | 79.51 | 4753 | 77.03 |
| Protein coding genes without function predicted | 1076 | 16.95 | 1066 | 17.20 | 868 | 17.50 | 1138 | 18.44 |
| Protein coding genes with Pfams | 5275 | 83.11 | 5312 | 85.69 | 4212 | 84.94 | 5072 | 82.20 |
| Biosynthetic gene clusters (BGCs) | 9 | - | 13 | - | 3 | - | 8 | - |
| Genes in BGCs | 208 | 3.28 | 280 | 4.52 | 102 | 2.06 | 185 | 3 |
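The "% of total" columns in Table 11 follow directly from the counts. As a minimal sketch (the dictionary and function names below are illustrative and not part of the thesis), the G+C content of each genome can be recomputed from the "Total bases" and "G+C bases" rows:

```python
# Counts copied from Table 11: (total bases, G+C bases) for each strain.
GENOMES = {
    "S4G9": (6767894, 4120509),
    "LE6C9": (6897567, 4350801),
    "LE5C2": (5404496, 3342264),
    "S3E10": (6520822, 3962757),
}


def gc_percent(total_bases: int, gc_bases: int) -> float:
    """Return G+C content as a percentage of total bases, rounded to two decimals."""
    return round(100 * gc_bases / total_bases, 2)


# Hypothetical helper mapping: strain name -> recomputed G+C percentage.
gc = {strain: gc_percent(*counts) for strain, counts in GENOMES.items()}

for strain, pct in gc.items():
    print(f"{strain}: {pct}% G+C")
```

The same ratio, taken against "Total genes" instead of "Total bases", should give the percentages in the gene-count rows.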

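Table 12. Predicted BGCs for Pseudomonas strains and Tn insertions.

| Strain | BGC (kb) | # of genes in cluster | Putative product | BGC coordinates | Tn hits | Mutants w/ Tn hit |
| --- | --- | --- | --- | --- | --- | --- |
| S4G9 | 43 | 29 | Other | 131774-175151 | | |
| S4G9 | 11 | 11 | Bacteriocin | 1534391-1545266 | | |
| S4G9 | 102* | 47 | NRPS | 2549202-2651550 | 2 | 11, 86.2 |
| S4G9 | 12 | 10 | Siderophore | 3169456-3181306 | | |
| S4G9 | 11 | 11 | Bacteriocin | 4213178-4224002 | | |
| S4G9 | 42 | 28 | Other | 4301713-4343695 | | |
| S4G9 | 46 | 32 | NRPS | 4409435-4455686 | | |
| S4G9 | 53 | 41 | NRPS | 4831944-4884840 | 2 | 133, 156 |
| S4G9 | 11 | 10 | Bacteriocin | 6708264-6719109 | | |
| LE6C9 | 43 | 20 | Other | 573688-617122 | | |
| LE6C9 | 11 | 8 | Butyrolactone | 4511217-4522023 | | |
| LE6C9 | 71 | 35 | NRPS | 5034159-5105099 | | |
| LE6C9 | 53 | 33 | NRPS | 5184096-5237139 | | |
| LE6C9 | 23 | 25 | Phenazine | 6197676-6220462 | 5 | 1, 3, 209, 237.1, 237.2 |
| LE6C9 | 11 | 11 | Bacteriocin | 655423-666214 | | |
| LE6C9 | 11 | 11 | Bacteriocin | 1935730-1946623 | | |
| LE6C9 | 82 | 40 | NRPS | 2757007-2838626 | | |
| LE6C9 | 21 | 17 | Hserlactone | 3269496-3290155 | | |
| LE6C9 | 11 | 10 | Bacteriocin | 3456214-3467098 | | |
| LE6C9 | 50 | 24 | NRPS | 3874653-3924442 | 3 | 79, 95, 199 |
| LE6C9 | 19 | 15 | Siderophore | 3998779-4017769 | | |
| LE6C9 | 41 | 32 | Other | 4118438-4159523 | | |
| LE5C2 | 53 | 41 | NRPS | 1639452-1692405 | 3 | 5, 12.1, 48 |
| LE5C2 | 79 | 53 | Bacteriocin; NRPS | 1906137-1985107 | 5 | 4, 39, 75, 139, 178 |
| LE5C2 | 24 | 20 | Terpene | 2317878-2341520 | | |
| S3E10 | 11 | 11 | Bacteriocin | 746142-757017 | | |
| S3E10 | 10 | 11 | Bacteriocin | 1763602-1807177 | | |
| S3E10 | 43 | 29 | Other | 2130052-2173429 | | |
| S3E10 | 11 | 10 | Bacteriocin | 2328387-2339232 | | |
| S3E10 | 53 | 38 | NRPS | 4149521-4202399 | 3 | 8, 172, 183 |
| S3E10 | 46 | 30 | NRPS | 4482295-4528504 | | |
| S3E10 | 112 | 46 | NRPS | 6199092-6311180 | 4 | 33, 37, 163, 175 |
| S3E10 | 11 | 10 | Bacteriocin | 6394326-6405150 | | |

*: highlighted numbers represent BGCs with Tn insertions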
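The "Mutants w/ Tn hit" column in Table 12 can be cross-checked against the alignment coordinates in Table 10 by testing whether each aligned interval falls inside a BGC interval. The sketch below is illustrative only: it uses a subset of rows from both tables, and the names `BGCS`, `ALIGNMENTS`, and `find_bgc` are hypothetical helpers, not part of the thesis workflow. Some alignments in Table 10 are listed in descending order, so the coordinates are sorted before comparison.

```python
from typing import Optional, Tuple

# BGC coordinates copied from Table 12 (subset: clusters with Tn hits
# in strains S4G9 and LE6C9).
BGCS = {
    "S4G9": [("NRPS", 2549202, 2651550), ("NRPS", 4831944, 4884840)],
    "LE6C9": [("Phenazine", 6197676, 6220462), ("NRPS", 3874653, 3924442)],
}

# Mutant alignment coordinates copied from Table 10 (subset).
ALIGNMENTS = {
    "S4G9-11": (2578408, 2579050),
    "S4G9-13": (3533780, 3534560),
    "S4G9-86.2": (2590008, 2590493),
    "S4G9-133": (4864313, 4864838),
    "S4G9-156": (4855581, 4856009),
    "LE6C9-1": (6208853, 6209407),
    "LE6C9-79": (3890586, 3889537),  # listed in descending order in Table 10
}


def find_bgc(mutant: str, coords: Tuple[int, int]) -> Optional[Tuple[str, int, int]]:
    """Return the BGC (product, start, end) that fully contains the alignment, if any."""
    strain = mutant.split("-")[0]
    low, high = sorted(coords)
    for product, start, end in BGCS.get(strain, []):
        if start <= low and high <= end:
            return (product, start, end)
    return None


hits = {mutant: find_bgc(mutant, coords) for mutant, coords in ALIGNMENTS.items()}

for mutant, bgc in hits.items():
    print(mutant, "->", bgc if bgc else "no BGC in this subset")
```

With these rows, S4G9-11 and S4G9-86.2 map to the S4G9 NRPS cluster at 2549202-2651550, and S4G9-133 and S4G9-156 map to the cluster at 4831944-4884840, matching the mutants listed in Table 12.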


Table 13. Predicted ORFs for S4G9 BGC 161816947.

| ORF | JGI Locus Tag Ga0139561_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 112392 | 215 | nicotinamidase/pyrazinamidase | Pseudomonas fluorescens WH6 |
| 2 | 112393 | 412 | Nicotinate phosphoribosyltransferase | Pseudomonas fluorescens SBW25 |
| 3 | 112394 | 271 | NAD+ synthase | Pseudomonas fluorescens WH6 |
| 4 | 112395 | 221 | two component transcriptional regulator, LuxR family | Pseudomonas fluorescens SBW25 |
| 5 | 112396 | 1127 | Signal transduction histidine kinase | Pseudomonas brassicacearum brassicacearum NFM421 |
| 6 | 112397 | 87 | C4 antisense RNA | |
| 7 | 112398 | 402 | amino acid/amide ABC transporter substrate-binding protein, HAAT family | Pseudomonas brassicacearum brassicacearum NFM421 |
| 8 | 112399 | 305 | amino acid/amide ABC transporter membrane protein 1, HAAT family | Pseudomonas brassicacearum brassicacearum NFM421 |
| 9 | 112400 | 383 | amino acid/amide ABC transporter membrane protein 2, HAAT family | Pseudomonas brassicacearum brassicacearum NFM421 |
| 10 | 112401 | 252 | urea transport system ATP-binding protein | Pseudomonas brassicacearum brassicacearum NFM421 |
| 11 | 112402 | 299 | amino acid/amide ABC transporter ATP-binding protein 2, HAAT family | Pseudomonas brassicacearum brassicacearum NFM421 |
| 12 | 112403 | 409 | formamidase | Pseudomonas fluorescens SBW25 |
| 13 | 112404 | 115 | putative regulatory protein, FmdB family | Pseudomonas fluorescens SBW25 |
| 14 | 112405 | 347 | amidase | Pseudomonas fluorescens SBW25 |
| 15 | 112406 | 318 | C-terminal, D2-small domain-containing protein, of ClpB protein | Pseudomonas fluorescens SBW25 |
| 16 | 112407 | 451 | putative efflux protein, MATE family | Pseudomonas fluorescens SBW25 |
| 17 | 112408 | 390 | Predicted arabinose efflux permease, MFS family | Pseudomonas fluorescens SBW25 |
| 18 | 112409 | 121 | Predicted nuclease of the RNAse H fold, HicB family | Pseudomonas fluorescens SBW25 |
| 19 | 112410 | 87 | HicA toxin of toxin-antitoxin | Pseudomonas fluorescens SBW25 |
| 20 | 112411 | 3657 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas protegens Pf-5 |
| 21 | 112412 | 2622 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas putida F1 |
| 22 | 112413 | 2848 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas putida W619 |
| 23 | 112414 | 319 | acetyl esterase | Pseudomonas putida GB-1 |
| 24 | 112415 | 830 | outer-membrane receptor for ferric coprogen and ferric-rhodotorulic acid | Pseudomonas putida GB-1 |
| 25 | 112416 | 549 | putative ATP-binding cassette transporter | Pseudomonas putida KT2440 |
| 26 | 112417 | 288 | Formylglycine-generating enzyme, required for sulfatase activity, contains SUMF1/FGE domain | Pseudomonas putida F1 |
| 27 | 112418 | 425 | Tat (twin-arginine translocation) pathway signal sequence | Pseudomonas putida BIRD-1 |
| 28 | 112419 | 450 | Zn-dependent dipeptidase, dipeptidase homolog | Pseudomonas fluorescens SBW25 |
| 29 | 112420 | 538 | Hypothetical protein | |
| 30 | 112421 | 4301 | arthrofactin-type cyclic lipopeptide synthetase B | Pseudomonas fluorescens SBW25 |
| 31 | 112422 | 3610 | arthrofactin-type cyclic lipopeptide synthetase C | Pseudomonas fluorescens SBW25 |
| 32 | 112423 | 141 | Hypothetical protein | |
| 33 | 112424 | 383 | membrane fusion protein, macrolide-specific efflux system | Pseudomonas fluorescens SBW25 |
| 34 | 112425 | 652 | macrolide transport system ATP-binding/permease protein | Pseudomonas fluorescens SBW25 |
| 35 | 112426 | 309 | Hypothetical protein | |
| 36 | 112427 | 1078 | Hypothetical protein | |
| 37 | 112428 | 225 | transcriptional regulator, LuxR family | Pseudomonas fluorescens SBW25 |
| 38 | 112429 | 414 | methionine-gamma-lyase | Pseudomonas fluorescens SBW25 |
| 39 | 112430 | 156 | transcriptional regulator, AsnC family | Pseudomonas fluorescens SBW25 |
| 40 | 112431 | 122 | TwoAYGGAY RNA | |
| 41 | 112432 | 196 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 42 | 112433 | 612 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 43 | 112434 | 205 | Nicotinamidase-related amidase | Pseudomonas fluorescens SBW25 |
| 44 | 112435 | 293 | transcriptional regulator, LysR family | Pseudomonas fluorescens WH6 |
| 45 | 112436 | 345 | transcriptional regulator, AraC family | Pseudomonas fluorescens SBW25 |
| 46 | 112437 | 304 | diguanylate cyclase (GGDEF) domain-containing protein | Pseudomonas fluorescens SBW25 |
| 47 | 112438 | 837 | TonB-dependent receptor | Pseudomonas fluorescens SBW25 |


Table 14. Predicted ORFs for S4G9 BGC 161816952.

| ORF | JGI Locus Tag Ga0139561_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 114484 | 261 | Nucleoside-specific outer membrane channel protein Tsx | Pseudomonas fluorescens SBW25 |
| 2 | 114485 | 382 | NTE family protein | Pseudomonas fluorescens WH6 |
| 3 | 114486 | 310 | KDO2-lipid IV(A) lauroyltransferase | Pseudomonas fluorescens SBW25 |
| 4 | 114487 | 245 | septum site-determining protein MinC | Pseudomonas fluorescens WH6 |
| 5 | 114488 | 270 | septum site-determining protein MinD | Pseudomonas fluorescens WH6 |
| 6 | 114489 | 84 | cell division topological specificity factor MinE | Pseudomonas fluorescens SBW25 |
| 7 | 114490 | 21 | ribosomal large subunit pseudouridine synthase A | Pseudomonas fluorescens SBW25 |
| 8 | 114491 | 429 | aspartyl aminopeptidase | Pseudomonas fluorescens SBW25 |
| 9 | 114492 | 87 | C4 antisense RNA | |
| 10 | 114493 | 58 | Hypothetical protein | |
| 11 | 114494 | 722 | Small-conductance mechanosensitive channel | Pseudomonas fluorescens SBW25 |
| 12 | 114495 | 73 | MbtH protein | Pseudomonas fluorescens SBW25 |
| 13 | 114496 | 470 | diaminobutyrate aminotransferase apoenzyme | Pseudomonas fluorescens SBW25 |
| 14 | 114497 | 122 | TwoAYGGAY RNA | |
| 15 | 114498 | 304 | N-acetylmuramoyl-L-alanine amidase | Pseudomonas fluorescens SBW25 |
| 16 | 114499 | 440 | Signal transduction histidine kinase | Pseudomonas fluorescens SBW25 |
| 17 | 114500 | 226 | two component transcriptional regulator, winged helix family | Pseudomonas fluorescens SBW25 |
| 18 | 114501 | 580 | Thiol:disulfide interchange protein DsbD | Pseudomonas fluorescens WH6 |
| 19 | 114502 | 281 | Thiol-disulfide isomerase or thioredoxin | Pseudomonas fluorescens SBW25 |
| 20 | 114503 | 253 | Thiol:disulfide interchange protein DsbC | Pseudomonas fluorescens WH6 |
| 21 | 114504 | 322 | Hypothetical protein | |
| 22 | 114505 | 4298 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens SBW25 |
| 23 | 114506 | 250 | Surfactin synthase thioesterase subunit | Pseudomonas fluorescens SBW25 |
| 24 | 114507 | 183 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas fluorescens SBW25 |
| 25 | 114508 | 337 | Protein N-acetyltransferase, RimJ/RimL family | Pseudomonas fluorescens Pf0-1 |
| 26 | 114509 | 241 | amino acid ABC transporter substrate-binding protein, PAAT family | Pseudomonas fluorescens WH6 |
| 27 | 114510 | 207 | transcriptional regulator, TetR family | Pseudomonas fluorescens SBW25 |
| 28 | 114511 | 592 | Protein of unknown function (DUF1446) | Pseudomonas fluorescens SBW25 |
| 29 | 114512 | 290 | citronellol/citronellal dehydrogenase | Pseudomonas fluorescens SBW25 |
| 30 | 114513 | 538 | geranyl-CoA carboxylase beta subunit | Pseudomonas fluorescens WH6 |
| 31 | 114514 | 385 | citronellyl-CoA dehydrogenase | Pseudomonas fluorescens SBW25 |
| 32 | 114515 | 253 | isohexenylglutaconyl-CoA hydratase | Pseudomonas fluorescens SBW25 |
| 33 | 114516 | 627 | geranyl-CoA carboxylase alpha subunit | Pseudomonas fluorescens WH6 |
| 34 | 114517 | 180 | Inhibitor of the KinA pathway to sporulation, predicted exonuclease | Pseudomonas fluorescens SBW25 |
| 35 | 114518 | 94 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 36 | 114519 | 268 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 37 | 114520 | 311 | CheW protein | Pseudomonas fluorescens SBW25 |
| 38 | 114521 | 149 | Hypothetical protein | |
| 39 | 114522 | 297 | Signal transduction histidine kinase | Pseudomonas fluorescens SBW25 |
| 40 | 114523 | 219 | two component transcriptional regulator, LuxR family | Pseudomonas fluorescens SBW25 |
| 41 | 114524 | 306 | lipid kinase YegS | Pseudomonas fluorescens SBW25 |


Table 15. Predicted ORFs for LE6C9 BGC 161816930.

| ORF | JGI Locus Tag Ga0139558_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 115526 | 633 | general secretion pathway protein D | Pseudomonas aeruginosa sv. O12 PA7 |
| 2 | 115527 | 494 | type II secretion system protein E (GspE) | Pseudomonas aeruginosa sv. O12 PA7 |
| 3 | 115528 | 404 | general secretion pathway protein F | Pseudomonas aeruginosa sv. O12 PA7 |
| 4 | 115529 | 144 | general secretion pathway protein G | Pseudomonas aeruginosa sv. O12 PA7 |
| 5 | 115530 | 156 | general secretion pathway protein H | Pseudomonas aeruginosa sv. O12 PA7 |
| 6 | 115531 | 124 | general secretion pathway protein I | Pseudomonas aeruginosa sv. O12 PA7 |
| 7 | 115532 | 200 | general secretion pathway protein J | Pseudomonas aeruginosa sv. O12 PA7 |
| 8 | 115533 | 290 | general secretion pathway protein K | Pseudomonas aeruginosa sv. O12 PA7 |
| 9 | 115534 | 360 | type II secretion system protein L (GspL) | Pseudomonas aeruginosa sv. O12 PA7 |
| 10 | 115535 | 156 | general secretion pathway protein M | Pseudomonas aeruginosa sv. O12 PA7 |
| 11 | 115536 | 238 | MOSC domain-containing protein YiiM | Pseudomonas aeruginosa sv. O12 PA7 |
| 12 | 115537 | 405 | MFS transporter, FSR family, fosmidomycin resistance protein | Pseudomonas protegens Pf-5 |
| 13 | 115538 | 74 | Hypothetical protein | |
| 14 | 115539 | 196 | acyl homoserine lactone synthase | Burkholderia graminis C4D1M |
| 15 | 115540 | 183 | LuxR family transcriptional regulator/LuxR family transcriptional regulator, transcriptional activator of rhlAB and lasB | Acinetobacter baumannii ATCC 17978 |
| 16 | 115541 | 163 | Phenazine biosynthesis protein A/B | Pseudomonas aeruginosa sv. O12 PA7 |
| 17 | 115542 | 163 | Phenazine biosynthesis protein A/B | Pseudomonas aeruginosa PACS2 |
| 18 | 115543 | 400 | 3-deoxy-D-arabinoheptulosonate-7-phosphate synthase | Pseudomonas aeruginosa ATCC 33351 |
| 19 | 115544 | 207 | Isochorismate hydrolase | Pseudomonas aeruginosa PAP7 |
| 20 | 115545 | 637 | phenazine biosynthesis protein phzE | Pseudomonas aeruginosa PAP7 |
| 21 | 115546 | 278 | trans-2,3-dihydro-3-hydroxyanthranilate isomerase | Pseudomonas aeruginosa ATCC 33351 |
| 22 | 115547 | 222 | Pyridoxamine 5'-phosphate oxidase | Pseudomonas aeruginosa UCBPP-PA14 |
| 23 | 115548 | 610 | asparagine synthase (glutamine-hydrolysing) | Pseudomonas aeruginosa DoWo1 |
| 24 | 115549 | 612 | Gamma-glutamyltransferase 1 threonine peptidase. MEROPS family T03 | Pseudomonas protegens Pf-5 |
| 25 | 115550 | 166 | Acetyltransferase (GNAT) domain-containing protein | Pseudomonas protegens Pf-5 |


Table 16. Predicted ORFs for LE6C9 BGC 161816936.

| ORF | JGI Locus Tag Ga0139558_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 113519 | 329 | peptide/nickel transport system ATP-binding protein | Pseudomonas brassicacearum brassicacearum NFM421 |
| 2 | 113520 | 323 | peptide/nickel transport system ATP-binding protein | Pseudomonas brassicacearum brassicacearum NFM421 |
| 3 | 113521 | 240 | NAD(P)-dependent dehydrogenase, short-chain alcohol dehydrogenase family | Stigmatella aurantiaca DW4/3-1 |
| 4 | 113522 | 89 | Hypothetical protein | |
| 5 | 113523 | 320 | regulator of nucleoside diphosphate kinase | Pseudomonas protegens Pf-5 |
| 6 | 113524 | 210 | Hypothetical protein | |
| 7 | 113525 | 421 | Hypothetical protein | |
| 8 | 113526 | 1490 | RHS repeat-associated core domain-containing protein | Burkholderia cepacia 383 |
| 9 | 113527 | 123 | REP element-mobilizing transposase RayT | Rubrivivax benzoatilyticus JA2 |
| 10 | 113528 | 189 | DJ-1/PfpI family protein | Chromobacterium violaceum ATCC 12472 |
| 11 | 113529 | 375 | Condensation domain-containing protein | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 12 | 113530 | 333 | Surfactin synthase thioesterase subunit | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 13 | 113531 | 451 | glycine hydroxymethyltransferase | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 14 | 113532 | 244 | Multimeric flavodoxin WrbA | Streptomyces ghanaensis ATCC 14672 |
| 15 | 113533 | 372 | Winged helix DNA-binding domain-containing protein | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 16 | 113534 | 1840 | amino acid adenylation domain-containing protein | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 17 | 113535 | 441 | Hypothetical protein | |
| 18 | 113536 | 57 | Hypothetical protein | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 19 | 113537 | 474 | Predicted arabinose efflux permease, MFS family | Bacillus amyloliquefaciens Campbell F, DSM 7 |
| 20 | 113538 | 442 | Condensation domain-containing protein | Pseudomonas protegens Pf-5 |
| 21 | 113539 | 456 | Uncharacterized membrane protein | Pseudomonas protegens Pf-5 |
| 22 | 113540 | 510 | Glycosyl transferases group 1 | |
| 23 | 113541 | 329 | Hypothetical protein | Pseudomonas protegens Pf-5 |
| 24 | 113542 | 451 | PelD GGDEF domain-containing protein | |
| 25 | 113543 | 173 | Hypothetical protein | Pseudomonas protegens Pf-5 |
| 26 | 113544 | 1196 | Tetratricopeptide repeat-containing protein | Pseudomonas protegens Pf-5 |
| 27 | 113545 | 936 | hypothetical protein | Pseudomonas protegens Pf-5 |
| 28 | 113546 | 310 | UDP-glucose 4-epimerase | Pseudomonas protegens Pf-5 |
| 29 | 113547 | 279 | 5'-nucleotidase, lipoprotein e(P4) family | |
| 30 | 113548 | 545 | Hypothetical protein | |
| 31 | 113549 | 66 | Hypothetical protein | Methanosarcina mazei Go1 |
| 32 | 113550 | 449 | von Willebrand factor type A domain-containing protein | Pseudomonas fluorescens WH6 |
| 33 | 113551 | 525 | alkaline phosphatase D | |


Table 17. Predicted ORFs for LE5C2 BGC 161819466.

| ORF | JGI Locus Tag Ga0151585_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 111511 | 362 | (R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase | Pseudomonas putida GB-1 |
| 2 | 111512 | 182 | adenine phosphoribosyltransferase | Pseudomonas putida KT2440 |
| 3 | 111513 | 244 | transcriptional regulator, Crp/Fnr family | Pseudomonas putida KT2440 |
| 4 | 111514 | 460 | coproporphyrinogen III oxidase, anaerobic | Pseudomonas putida S16 |
| 5 | 111515 | 227 | hypothetical protein | Pseudomonas putida W619 |
| 6 | 111516 | 69 | cytochrome oxidase maturation protein, cbb3-type | Pseudomonas putida W619 |
| 7 | 111517 | 820 | Cu2+-exporting ATPase | Pseudomonas entomophila L48 |
| 8 | 111518 | 174 | hypothetical protein | Pseudomonas entomophila L48 |
| 9 | 111519 | 471 | cytochrome c oxidase accessory protein FixG | Pseudomonas entomophila L48 |
| 10 | 111520 | 326 | cytochrome c oxidase cbb3-type subunit 3 | Pseudomonas entomophila L48 |
| 11 | 111521 | 61 | cytochrome c oxidase cbb3-type subunit 4 | Pseudomonas putida W619 |
| 12 | 111522 | 202 | cytochrome c oxidase cbb3-type subunit 2 | Pseudomonas putida S16 |
| 13 | 111523 | 480 | cytochrome c oxidase cbb3-type subunit 1 | Pseudomonas putida GB-1 |
| 14 | 111524 | 313 | cytochrome c oxidase cbb3-type subunit 3 | Pseudomonas putida GB-1 |
| 15 | 111525 | 65 | cytochrome c oxidase cbb3-type subunit 4 | Pseudomonas entomophila L48 |
| 16 | 111526 | 202 | cytochrome c oxidase cbb3-type subunit 2 | Pseudomonas putida W619 |
| 17 | 111527 | 474 | cytochrome c oxidase cbb3-type subunit 1 | Pseudomonas putida W619 |
| 18 | 111528 | 230 | Hypothetical protein | |
| 19 | 111529 | 94 | hypothetical protein | Pseudomonas entomophila L48 |
| 20 | 111530 | 183 | Inhibitor of the KinA pathway to sporulation, predicted exonuclease | Pseudomonas putida KT2440 |
| 21 | 111531 | 275 | amino acid ABC transporter substrate-binding protein, PAAT family | Pseudomonas entomophila L48 |
| 22 | 111532 | 176 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas entomophila L48 |
| 23 | 111533 | 4317 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas entomophila L48 |
| 24 | 111534 | 539 | Hypothetical protein | |
| 25 | 111535 | 467 | efflux transporter, outer membrane factor (OMF) lipoprotein, NodT family | Pseudomonas entomophila L48 |
| 26 | 111536 | 653 | macrolide transport system ATP-binding/permease protein | Pseudomonas putida W619 |
| 27 | 111537 | 392 | membrane fusion protein, macrolide-specific efflux system | Pseudomonas putida BIRD-1 |
| 28 | 111538 | 172 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas putida W619 |
| 29 | 111539 | 159 | Response regulator receiver domain-containing protein | Pseudomonas entomophila L48 |
| 30 | 111540 | 444 | Predicted arabinose efflux permease, MFS family | Pseudomonas putida KT2440 |
| 31 | 111541 | 554 | electron-transferring-flavoprotein dehydrogenase | Pseudomonas putida KT2440 |
| 32 | 111542 | 249 | electron transfer flavoprotein beta subunit | Pseudomonas aeruginosa C 1426 |
| 33 | 111543 | 309 | electron transfer flavoprotein alpha subunit apoprotein | Pseudomonas entomophila L48 |
| 34 | 111544 | 273 | amino acid ABC transporter substrate-binding protein, PAAT family | Pseudomonas putida F1 |
| 35 | 111545 | 118 | protein of unknown function (DUF4398) | Pseudomonas putida F1 |
| 36 | 111546 | 269 | Outer membrane protein OmpA | Pseudomonas entomophila L48 |
| 37 | 111547 | 480 | transcriptional regulator, GntR family | Pseudomonas putida KT2440 |
| 38 | 111548 | 164 | Hypothetical protein | |
| 39 | 111549 | 84 | Hypothetical protein | |
| 40 | 111550 | 202 | START domain-containing protein | Pseudomonas putida F1 |
| 41 | 111551 | 429 | citrate synthase | Pseudomonas putida F1 |


Table 18. Predicted ORFs for LE5C2 BGC 161819467.

| ORF | JGI Locus Tag Ga0151585_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 111745 | 358 | monosaccharide ABC transporter substrate-binding protein, CUT2 family | Pseudomonas putida GB-1 |
| 2 | 111746 | 37 | Hypothetical protein | |
| 3 | 111747 | 72 | Protein of unknown function (DUF2970) | Pseudomonas putida W619 |
| 4 | 111748 | 1235 | methionine synthase (B12-dependent) | Pseudomonas putida F1 |
| 5 | 111749 | 769 | Fatty acid cis/trans isomerase (CTI) | Pseudomonas putida W619 |
| 6 | 111750 | 344 | Peptidoglycan/LPS O-acetylase OafA/YrhL, contains acyltransferase and SGNH-hydrolase domains | Pseudomonas entomophila L48 |
| 7 | 111751 | 194 | Fe/S biogenesis protein NfuA | Pseudomonas putida W619 |
| 8 | 111752 | 201 | protein SCO1/2 | Pseudomonas putida KT2440 |
| 9 | 111753 | 162 | hypothetical protein | Pseudomonas putida KT2440 |
| 10 | 111754 | 75 | Hypothetical protein | |
| 11 | 111755 | 277 | transcriptional regulator, AraC family | Pseudomonas putida GB-1 |
| 12 | 111756 | 104 | Branched-chain amino acid transport protein | Pseudomonas putida GB-1 |
| 13 | 111757 | 233 | 4-azaleucine resistance probable transporter AzlC | Pseudomonas putida W619 |
| 14 | 111758 | 214 | Hypothetical protein | |
| 15 | 111759 | 357 | Hypothetical protein | |
| 16 | 111760 | 210 | Threonine/homoserine/homoserine lactone efflux protein | Pseudomonas entomophila L48 |
| 17 | 111761 | 267 | hypothetical protein | Pseudomonas putida W619 |
| 18 | 111762 | 306 | transcriptional regulator, LysR family | Pseudomonas putida S16 |
| 19 | 111763 | 786 | DNA damage-inducible DNA polymerase II | Pseudomonas putida GB-1 |
| 20 | 111764 | 3083 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens SBW25 |
| 21 | 111765 | 352 | Taurine dioxygenase, alpha-ketoglutarate-dependent | Pseudomonas entomophila L48 |
| 22 | 111766 | 1026 | amino acid adenylation domain-containing protein | Pseudomonas entomophila L48 |
| 23 | 111767 | 3489 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens Pf0-1 |
| 24 | 111768 | 145 | Transposase, Mutator family | Pseudomonas brassicacearum brassicacearum NFM421 |
| 25 | 111769 | 51 | Hypothetical protein | |
| 26 | 111770 | 837 | iron complex outermembrane receptor protein/outer-membrane receptor for ferric coprogen and ferric-rhodotorulic acid | Achromobacter xylosoxidans A8 |
| 27 | 111771 | 551 | putative ATP-binding cassette transporter | Pseudomonas putida KT2440 |
| 28 | 111772 | 582 | choline/carnitine/betaine transport | Pseudomonas putida W619 |
| 29 | 111773 | 338 | Protein N-acetyltransferase, RimJ/RimL family | Pseudomonas putida F1 |
| 30 | 111774 | 292 | Formylglycine-generating enzyme, required for sulfatase activity, contains SUMF1/FGE domain | Pseudomonas brassicacearum brassicacearum NFM421 |
| 31 | 111775 | 426 | Selenocysteine lyase/Cysteine desulfurase | Pseudomonas fluorescens SBW25 |
| 32 | 111776 | 457 | Zn-dependent dipeptidase, dipeptidase homolog | Pseudomonas brassicacearum brassicacearum NFM421 |
| 33 | 111777 | 187 | protein of unknown function (DUF4174) | Pseudomonas putida GB-1 |
| 34 | 111778 | 96 | Hypothetical protein | |
| 35 | 111779 | 276 | hypothetical protein | Pseudomonas putida S16 |
| 36 | 111780 | 239 | hypothetical protein | Pseudomonas entomophila L48 |
| 37 | 111781 | 318 | EamA domain-containing membrane protein RarD | Pseudomonas entomophila L48 |
| 38 | 111782 | 294 | DNA-binding transcriptional regulator, LysR family | Pseudomonas entomophila L48 |
| 39 | 111783 | 389 | diguanylate cyclase (GGDEF) domain-containing protein | Pseudomonas fluorescens Pf0-1 |
| 40 | 111784 | 469 | Hypothetical protein | |
| 41 | 111785 | 301 | Protein phosphatase 2C | Pseudomonas entomophila L48 |
| 42 | 111786 | 796 | outer-membrane receptor for ferric coprogen and ferric-rhodotorulic acid | Pseudomonas protegens Pf-5 |
| 43 | 111787 | 307 | FecR family protein | Pseudomonas protegens Pf-5 |
| 44 | 111788 | 189 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas protegens Pf-5 |
| 45 | 111789 | 173 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas putida W619 |
| 46 | 111790 | 319 | FecR family protein | Pseudomonas putida W619 |
| 47 | 111791 | 824 | outer-membrane receptor for ferric coprogen and ferric-rhodotorulic acid | Pseudomonas putida W619 |
| 48 | 111792 | 48 | Hypothetical protein | |
| 49 | 111793 | 402 | Predicted arabinose efflux permease, MFS family | Pectobacterium carotovorum brasiliensis PBR1692 |
| 50 | 111794 | 195 | transcriptional regulator, TetR family | Pectobacterium atrosepticum SCRI1043 |
| 51 | 111795 | 327 | protein of unknown function (DUF1852) | Pseudomonas protegens Pf-5 |
| 52 | 111796 | 340 | methionine synthase (B12-independent) | Pseudomonas syringae pv. aesculi NCPPB 3681 |
| 53 | 111797 | 224 | Anti-sigma-K factor RskA | Pseudomonas fluorescens SBW25 |


Table 19. Predicted ORFs for S3E10 BGC 161835842.

| ORF | JGI Locus Tag Ga0182367_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 113987 | 160 | CheW protein | Pseudomonas protegens Pf-5 |
| 2 | 113988 | 130 | Protein of unknown function (DUF2802) | Pseudomonas fluorescens SBW25 |
| 3 | 113989 | 132 | TwoAYGGAY RNA | |
| 4 | 113990 | 306 | lipid kinase YegS | Pseudomonas fluorescens SBW25 |
| 5 | 113991 | 219 | two component transcriptional regulator, LuxR family | Pseudomonas fluorescens SBW25 |
| 6 | 113992 | 297 | Signal transduction histidine kinase | Pseudomonas fluorescens SBW25 |
| 7 | 113993 | 149 | Hypothetical protein | |
| 8 | 113994 | 311 | CheW protein | Pseudomonas fluorescens SBW25 |
| 9 | 113995 | 268 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 10 | 113996 | 94 | hypothetical protein | Pseudomonas fluorescens SBW25 |
| 11 | 113997 | 180 | Inhibitor of the KinA pathway to sporulation, predicted exonuclease | Pseudomonas fluorescens WH6 |
| 12 | 113998 | 627 | geranyl-CoA carboxylase alpha subunit | Pseudomonas fluorescens SBW25 |
| 13 | 113999 | 253 | isohexenylglutaconyl-CoA hydratase | Pseudomonas fluorescens SBW25 |
| 14 | 114000 | 385 | citronellyl-CoA dehydrogenase | Pseudomonas fluorescens SBW25 |
| 15 | 114001 | 538 | geranyl-CoA carboxylase beta subunit | Pseudomonas fluorescens SBW25 |
| 16 | 114002 | 290 | citronellol/citronellal dehydrogenase | Pseudomonas fluorescens SBW25 |
| 17 | 114003 | 592 | Protein of unknown function (DUF1446) | Pseudomonas fluorescens SBW25 |
| 18 | 114004 | 207 | transcriptional regulator, TetR family | Pseudomonas fluorescens SBW25 |
| 19 | 114005 | 262 | amino acid ABC transporter substrate-binding protein, PAAT family | Pseudomonas fluorescens SBW25 |
| 20 | 114006 | 183 | RNA polymerase sigma-70 factor, ECF subfamily | Pseudomonas fluorescens SBW25 |
| 21 | 114007 | 250 | Surfactin synthase thioesterase subunit | Pseudomonas fluorescens SBW25 |
| 22 | 114008 | 4292 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens SBW25 |
| 23 | 114009 | 253 | Thiol:disulfide interchange protein DsbC | Pseudomonas fluorescens SBW25 |
| 24 | 114010 | 283 | Thiol-disulfide isomerase or thioredoxin | Pseudomonas fluorescens SBW25 |
| 25 | 114011 | 575 | Thiol:disulfide interchange protein DsbD | Pseudomonas fluorescens SBW25 |
| 26 | 114012 | 226 | two component transcriptional regulator, winged helix family | Pseudomonas fluorescens SBW25 |
| 27 | 114013 | 440 | Signal transduction histidine kinase | Pseudomonas fluorescens SBW25 |
| 28 | 114014 | 255 | N-acetylmuramoyl-L-alanine amidase | Pseudomonas fluorescens SBW25 |
| 29 | 114015 | 464 | diaminobutyrate aminotransferase apoenzyme | Pseudomonas fluorescens SBW25 |
| 30 | 114016 | 73 | MbtH protein | Pseudomonas fluorescens SBW25 |
| 31 | 114017 | 448 | carboxypeptidase Ss1. Metallo peptidase. MEROPS family M20D | Pseudomonas fluorescens SBW25 |
| 32 | 114018 | 721 | Small-conductance mechanosensitive channel | Pseudomonas fluorescens SBW25 |
| 33 | 114019 | 58 | Hypothetical protein | |
| 34 | 114020 | 429 | aspartyl aminopeptidase | Pseudomonas fluorescens WH6 |
| 35 | 114021 | 89 | C4 antisense RNA | |
| 36 | 114022 | 211 | ribosomal large subunit pseudouridine synthase A | Pseudomonas fluorescens WH6 |
| 37 | 114023 | 84 | cell division topological specificity factor MinE | Pseudomonas fluorescens SBW25 |
| 38 | 114024 | 270 | septum site-determining protein MinD | Pseudomonas fluorescens WH6 |
| 39 | 114025 | 245 | septum site-determining protein MinC | Pseudomonas fluorescens SBW25 |
| 40 | 114026 | 310 | KDO2-lipid IV(A) lauroyltransferase | Pseudomonas fluorescens SBW25 |
| 41 | 114027 | 377 | NTE family protein | Pseudomonas fluorescens SBW25 |
| 42 | 114028 | 261 | Nucleoside-specific outer membrane channel protein Tsx | Pseudomonas fluorescens SBW25 |


Table 20. Predicted ORFs for S3E10 BGC 161835844.

| ORF | JGI Locus Tag Ga0182367_ | Number of AA | Predicted Protein | Best Hit genome |
| --- | --- | --- | --- | --- |
| 1 | 115912 | 253 | outer membrane transport energization protein TonB | Pseudomonas fluorescens SBW25 |
| 2 | 115913 | 148 | Cell division and transport-associated protein TolR | Pseudomonas fluorescens SBW25 |
| 3 | 115914 | 230 | Cell division and transport-associated protein TolQ | Pseudomonas fluorescens SBW25 |
| 4 | 115915 | 621 | 3-phytase | Pseudomonas fluorescens SBW25 |
| 5 | 115916 | 837 | TonB-dependent receptor | Pseudomonas fluorescens SBW25 |
| 6 | 115917 | 304 | diguanylate cyclase (GGDEF) domain-containing protein | Pseudomonas fluorescens SBW25 |
| 7 | 115918 | 345 | transcriptional regulator, AraC family | Pseudomonas fluorescens SBW25 |
| 8 | 115919 | 293 | transcriptional regulator, LysR family | Pseudomonas fluorescens SBW25 |
| 9 | 115920 | 205 | Nicotinamidase-related amidase | Pseudomonas fluorescens SBW25 |
| 10 | 115921 | 612 | hypothetical protein | Pseudomonas fluorescens WH6 |
| 11 | 115922 | 196 | Hypothetical protein | |
| 12 | 115923 | 132 | TwoAYGGAY RNA | |
| 13 | 115924 | 156 | transcriptional regulator, AsnC family | Pseudomonas fluorescens SBW25 |
| 14 | 115925 | 411 | methionine-gamma-lyase | Pseudomonas fluorescens SBW25 |
| 15 | 115926 | 223 | transcriptional regulator, LuxR family | Pseudomonas fluorescens SBW25 |
| 16 | 115927 | 652 | macrolide transport system ATP-binding/permease protein | Pseudomonas fluorescens SBW25 |
| 17 | 115928 | 383 | membrane fusion protein, macrolide-specific efflux system | Pseudomonas fluorescens SBW25 |
| 18 | 115929 | 3764 | arthrofactin-type cyclic lipopeptide synthetase C | Pseudomonas fluorescens SBW25 |
| 19 | 115930 | 4275 | arthrofactin-type cyclic lipopeptide synthetase B | Pseudomonas fluorescens SBW25 |
| 20 | 115931 | 541 | Hypothetical protein | |
| 21 | 115932 | 449 | Zn-dependent dipeptidase, dipeptidase homolog | Pseudomonas fluorescens SBW25 |
| 22 | 115933 | 423 | Tat (twin-arginine translocation) pathway signal sequence | Pseudomonas fluorescens SBW25 |
| 23 | 115934 | 286 | Formylglycine-generating enzyme, required for sulfatase activity, contains SUMF1/FGE domain | Pseudomonas fluorescens SBW25 |
| 24 | 115935 | 276 | Formyl transferase | Pseudomonas fluorescens SBW25 |
| 25 | 115936 | 550 | putative ATP-binding cassette transporter | Pseudomonas fluorescens SBW25 |
| 26 | 115937 | 806 | outer-membrane receptor for ferric coprogen and ferric-rhodotorulic acid | Pseudomonas aeruginosa PAO1 |
| 27 | 115938 | 271 | TIGR02646 family protein | |
| 28 | 115939 | 734 | AAA domain-containing protein, putative AbiEii toxin, Type IV TA system | Acinetobacter junii SH205 |
| 29 | 115940 | 4990 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens SBW25 |
| 30 | 115941 | 2593 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas aeruginosa 2192 |
| 31 | 115942 | 3666 | non-ribosomal peptide synthase domain TIGR01720/amino acid adenylation domain-containing protein | Pseudomonas fluorescens SBW25 |
| 32 | 115943 | 83 | HicA toxin of toxin-antitoxin | Pseudomonas fluorescens SBW25 |
| 33 | 115944 | 121 | Predicted nuclease of the RNAse H fold, HicB family | Pseudomonas fluorescens SBW25 |
| 34 | 115945 | 390 | Predicted arabinose efflux permease, MFS family | Pseudomonas fluorescens SBW25 |
| 35 | 115946 | 195 | transcriptional regulator, TetR family | Pseudomonas fluorescens SBW25 |
| 36 | 115947 | 451 | putative efflux protein, MATE family | Pseudomonas fluorescens SBW25 |
| 37 | 115948 | 327 | C-terminal, D2-small domain-containing protein, of ClpB protein | Pseudomonas fluorescens SBW25 |
| 38 | 115949 | 347 | amidase | Pseudomonas fluorescens SBW25 |
| 39 | 115950 | 115 | putative regulatory protein, FmdB family | Pseudomonas fluorescens SBW25 |
| 40 | 115951 | 409 | formamidase | Pseudomonas fluorescens SBW25 |
| 41 | 115952 | 546 | methyl-accepting chemotaxis protein | Pseudomonas fluorescens SBW25 |
| 42 | 115953 | 246 | transcriptional regulator, GntR family | Pseudomonas fluorescens SBW25 |
| 43 | 115954 | 541 | 3-carboxy-cis,cis-muconate cycloisomerase | Delftia acidovorans SPH-1 |
| 44 | 115955 | 462 | aerobic C4-dicarboxylate transport protein | Methylobacterium nodulans ORS 2060 |
| 45 | 115956 | 462 | fumarase, class II | Cupriavidus necator N-1, ATCC 43291 |
| 46 | 115957 | 405 | Hypothetical protein | |
| 47 | 115958 | 229 | amino acid/amide ABC transporter ATP-binding protein 2, HAAT family | Pseudomonas fluorescens et76 |
| 48 | 115959 | 252 | urea transport system ATP-binding protein | Pseudomonas brassicacearum brassicacearum NFM421 |
| 49 | 115960 | 387 | amino acid/amide ABC transporter membrane protein 2, HAAT family | Pseudomonas fluorescens et76 |
| 50 | 115961 | 305 | amino acid/amide ABC transporter membrane protein 1, HAAT family | Pseudomonas fluorescens et76 |
| 51 | 115962 | 402 | amino acid/amide ABC transporter substrate-binding protein, HAAT family | Pseudomonas fluorescens et76 |


Table 21. Average nucleotide identity (%) between the genomes of Pseudomonas strains.
Genome | S4G9 | LE6C9 | LE5C2 | S3E10
S4G9 | - | 82.97 | 78.54 | 89.08
LE6C9 | 82.95 | - | 79.37 | 82.93
LE5C2 | 78.58 | 79.37 | - | 78.50
S3E10 | 89.09 | 82.99 | 78.58 | -
Pseudomonas aeruginosa PAO1* | 77.07 | 78.63 | 77.48 | 77.08
Pseudomonas antarctica BS2772 | 88.03 | 82.28 | 78.01 | 87.79
Pseudomonas azotoformans LMG 21611 | 88.89 | 82.9 | 78.59 | 90.89
Pseudomonas capeferrum WCS358 | 78.74 | 79.92 | 83.55 | 78.92
Pseudomonas chlororaphis aurantiaca LMG 21630 | 82.99 | 95.44 | 79.39 | 83.01
Pseudomonas chlororaphis aureofaciens ATCC 13985 | 83.08 | 95.57 | 79.44 | 83.04
Pseudomonas chlororaphis chlororaphis ATCC 9446 | 82.91 | 97.95 | 79.41 | 82.97
Pseudomonas chlororaphis piscium DSM 21509 | 82.91 | 95.32 | 79.41 | 82.94
Pseudomonas entomophila L48 | 79.04 | 80.31 | 83.66 | 79.1
Pseudomonas extremorientalis BS2774 | 88.86 | 82.99 | 78.54 | 90.92
Pseudomonas fluorescens Pf0-1 | 82.33 | 83.94 | 78.73 | 82.28
Pseudomonas fluorescens SBW25 | 89.07 | 82.84 | 78.42 | 91.31
Pseudomonas grimontii BS2976 | 89.22 | 83.01 | 78.5 | 88.93
Pseudomonas guariconensis LMG 27394 | 78.5 | 83.23 | 83.02 | 78.70
Pseudomonas lurida LMG 21995 | 88.65 | 82.83 | 78.43 | 91.67
Pseudomonas marginalis BS2952 | 89.26 | 83.18 | 78.66 | 88.95
Pseudomonas mendocina ymp | 77.3 | 78.71 | 77.51 | 77.42
Pseudomonas monteilii DSM 14164 | 78.85 | 79.68 | 83.25 | 78.91
Pseudomonas mosselii DSM 17497 | 79.03 | 80.38 | 83.72 | 79.06
Pseudomonas plecoglossicida DSM 15088 | 79.33 | 80.37 | 83.78 | 79.33
Pseudomonas poae A2-S9 | 89.42 | 83.23 | 78.68 | 88.96
Pseudomonas protegens Pf-5 | 82.25 | 85.75 | 79.15 | 82.33
Pseudomonas putida F1 | 78.51 | 79.5 | 82.91 | 78.67
Pseudomonas putida GB-1 | 78.7 | 79.65 | 83.25 | 78.82
Pseudomonas putida KT2440 | 78.54 | 79.46 | 82.99 | 78.71
Pseudomonas soli LMG 27941 | 79.01 | 80.16 | 83.55 | 78.79
Pseudomonas stutzeri A1501 | 76.26 | 77.46 | 76.44 | 76.41
Pseudomonas syringae pv. syringae B728a | 78.83 | 79.7 | 77.64 | 79.01
Pseudomonas taiwanensis DSM 21245 | 78.5 | 79.52 | 82.76 | 78.52
Pseudomonas trivialis BS3111 | 87.74 | 83.07 | 78.67 | 87.55
*: highlighted strains denote genomes of highly characterized species described in Silby et al. 2011
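One way to read Table 21 is to ask, for each environmental isolate, which reference genome gives the highest ANI and whether that value clears a species-level cutoff. The sketch below does this with the reference rows transcribed from the table. The 95% cutoff is an assumption (a commonly used approximate species boundary for ANI), not a threshold stated in this table, and the names `REFERENCE_ANI`, `SPECIES_CUTOFF` and `closest_references` are illustrative only.

```python
# ANI values (%) transcribed from Table 21; tuple order follows ISOLATES.
ISOLATES = ("S4G9", "LE6C9", "LE5C2", "S3E10")

REFERENCE_ANI = {
    "P. aeruginosa PAO1": (77.07, 78.63, 77.48, 77.08),
    "P. antarctica BS2772": (88.03, 82.28, 78.01, 87.79),
    "P. azotoformans LMG 21611": (88.89, 82.9, 78.59, 90.89),
    "P. capeferrum WCS358": (78.74, 79.92, 83.55, 78.92),
    "P. chlororaphis aurantiaca LMG 21630": (82.99, 95.44, 79.39, 83.01),
    "P. chlororaphis aureofaciens ATCC 13985": (83.08, 95.57, 79.44, 83.04),
    "P. chlororaphis chlororaphis ATCC 9446": (82.91, 97.95, 79.41, 82.97),
    "P. chlororaphis piscium DSM 21509": (82.91, 95.32, 79.41, 82.94),
    "P. entomophila L48": (79.04, 80.31, 83.66, 79.1),
    "P. extremorientalis BS2774": (88.86, 82.99, 78.54, 90.92),
    "P. fluorescens Pf0-1": (82.33, 83.94, 78.73, 82.28),
    "P. fluorescens SBW25": (89.07, 82.84, 78.42, 91.31),
    "P. grimontii BS2976": (89.22, 83.01, 78.5, 88.93),
    "P. guariconensis LMG 27394": (78.5, 83.23, 83.02, 78.70),
    "P. lurida LMG 21995": (88.65, 82.83, 78.43, 91.67),
    "P. marginalis BS2952": (89.26, 83.18, 78.66, 88.95),
    "P. mendocina ymp": (77.3, 78.71, 77.51, 77.42),
    "P. monteilii DSM 14164": (78.85, 79.68, 83.25, 78.91),
    "P. mosselii DSM 17497": (79.03, 80.38, 83.72, 79.06),
    "P. plecoglossicida DSM 15088": (79.33, 80.37, 83.78, 79.33),
    "P. poae A2-S9": (89.42, 83.23, 78.68, 88.96),
    "P. protegens Pf-5": (82.25, 85.75, 79.15, 82.33),
    "P. putida F1": (78.51, 79.5, 82.91, 78.67),
    "P. putida GB-1": (78.7, 79.65, 83.25, 78.82),
    "P. putida KT2440": (78.54, 79.46, 82.99, 78.71),
    "P. soli LMG 27941": (79.01, 80.16, 83.55, 78.79),
    "P. stutzeri A1501": (76.26, 77.46, 76.44, 76.41),
    "P. syringae pv. syringae B728a": (78.83, 79.7, 77.64, 79.01),
    "P. taiwanensis DSM 21245": (78.5, 79.52, 82.76, 78.52),
    "P. trivialis BS3111": (87.74, 83.07, 78.67, 87.55),
}

# Assumed species-level ANI cutoff (%); not taken from the thesis.
SPECIES_CUTOFF = 95.0


def closest_references(ani=REFERENCE_ANI, isolates=ISOLATES, cutoff=SPECIES_CUTOFF):
    """Return {isolate: (closest reference, ANI, ANI >= cutoff)}."""
    result = {}
    for col, isolate in enumerate(isolates):
        # Pick the reference genome with the highest ANI in this isolate's column.
        name, values = max(ani.items(), key=lambda item: item[1][col])
        result[isolate] = (name, values[col], values[col] >= cutoff)
    return result


if __name__ == "__main__":
    for isolate, (name, value, above) in closest_references().items():
        note = "at or above" if above else "below"
        print(f"{isolate}: closest reference {name} ({value}%), {note} the assumed cutoff")
```

Under this assumed cutoff, only LE6C9 reaches species-level identity with a listed reference (the P. chlororaphis genomes). The other three isolates fall below it for every reference genome in the table.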


Table 22. Predicted compound structures for Pseudomonas gene clusters with Tn insertions.
Biosynthetic Gene Cluster | Putative Product | Putative Biochemical Structure
S4G9 gene cluster 161816947 | NRPS | [structure image]
S4G9 gene cluster 161816952 | NRPS | [structure image]
LE6C9 gene cluster 16186930 | Phenazine | [structure image]
LE6C9 gene cluster 161816936 | NRPS | [structure image]
LE5C2 gene cluster 161819466 | NRPS | [structure image]
LE5C2 gene cluster 161819467 | NRPS | [structure image]
S3E10 gene cluster 161835842 | NRPS, predicted product 1 | [structure image]
S3E10 gene cluster 161835842 | NRPS, predicted product 2 | [structure image]
S3E10 gene cluster 161835844 | NRPS | [structure image]