Identification, distribution and evolution of three type III secretion systems and their

effectors in the pathogen, Pantoea stewartii sp. stewartii DC283

A Thesis

Submitted to the Faculty of Graduate Studies and Research

In Partial Fulfillment of the Requirements

For the Degree of

Master of Science

In

Biology

University of Regina

By

Morgan William Bruce Kirzinger

Regina, Saskatchewan

December, 2012.

Copyright © 2012: M. W. B. Kirzinger

UNIVERSITY OF REGINA

FACULTY OF GRADUATE STUDIES AND RESEARCH

SUPERVISORY AND EXAMINING COMMITTEE

Morgan William Bruce Kirzinger, candidate for the degree of Master of Science in Biology, has presented a thesis titled, Identification, Distribution and Evolution of Three Type III Secretion Systems and Their Effectors in the Pathogen, Pantoea Stewartii sp. stewartii DC 283, in an oral examination held on November 30, 2012. The following committee members have found the thesis acceptable in form and content, and that the candidate demonstrated satisfactory knowledge of the subject material.

External Examiner: Dr. Aaron White, University of Saskatchewan

Supervisor: Dr. John Stavrinides, Department of Biology

Committee Member: Dr. Andrew Cameron, Department of Biology

Chair of Defense: Dr. Laurie Sykes Tottenham, Department of Psychology

Abstract

The type III secretion system (T3SS) is an extracellular appendage used primarily by for pathogenesis in both plants and animals, including humans. Related to the bacterial flagellum, this nanomachine uses an extracellular pilus to actively transport effector proteins from Gram-negative bacteria directly into the host cell to cause disease.

One genus that uses the T3SS is Pantoea, which comprises several well-studied plant pathogenic species, as well as opportunistic human pathogenic species. Three T3SSs were identified in a diverse collection of Pantoea representing plant, clinical and environmental samples: a plant-specific T3SS Hrc 1, and two animal-specific T3SSs,

SPI-1 and SPI-2. A PCR-based genetic survey identified 17 Hrc 1 systems, 12 SPI-1 systems, and 9 SPI-2 systems in 9 species, 5 species from the genus Pantoea.

Phylogenetic analysis together with GC content and comparative genomic analyses of the

T3SS genomic islands revealed that the evolution of the plant is more consistent with vertical inheritance, whereas the SPI-1 and SPI-2 systems appear to have been acquired later though horizontal gene transfer. Most interestingly, however, was the identification of the plant T3SS in clinical isolates and the presence of the animal T3SSs in plant isolates. The Pantoea stewartii subsp. stewartii DC283 genome was also analyzed for the presence of type III secreted effectors (T3SEs), which resulted in the identification of 5 novel T3SEs. The evolutionary analysis of the three T3SSs and the identification of known and novel T3SEs highlights the importance of these secretion systems and their effectors in the biology of Pantoea.

i

Acknowledgements

I would like to thank Dr. J. Stavrinides for letting me run wild with my Masters project, for without the freedom, I would have never discovered what secrets the Pantoea genome holds. I am forever grateful for the time that you have invested in my future as a researcher and your patience and support. You are an amazing supervisor and mentor, and I am forever in your debt for all of your inspiration, infamous knowledge and support throughout the course of my tenure as a Graduate student at the University of Regina.

I would like to thank the Stavrinauts, both past and present, for their moral and intellectual support. A special shout out to the one and only Sunshine for helping me in the lab for the summer. I greatly appreciate your research help.

Also, big thanks to KB Arnstead, RJ Hartnett, ZC Johnson, NCA Kirzinger, TA

Langelotz, BN Liski, SA MacLintock, TR McLellan, DM Pieracci, DC Ross, BJ Ryan,

KM Schmalenberg, and CR Varjassy for their unprecedented support and patience throughout my entire Masters. I greatly appreciate your support and am forever grateful for your encouragement. True friends are hard to come by.

I would also like to thank Dr. Gwyn Beattie, Dr. David Coplin, Dr. Brion Duffy,

Dr. Paul Levett, New Zealand Culture of Plant Pathogens, Regina General Hospital, St.

Boniface General Hospital, Sunnybrook Hospital, Texas Children`s Hospital, and Dr.

Steven Lindow for graciously providing our lab with the Pantoea sp. isolates that were used in the genetic survey.

I am also like to acknowledge the Faculty of Graduate Studies and Research for providing funding during my masters at the University of Regina.

ii

Post Defense Acknowledgements

I would like to sincerely thank Dr. Aaron White for taking the time to be my external examiner. Your feedback and support throughout this process has been amazing.

Thank you for making the defense process less intimidating and more enjoyable.

iii

Dedication

I would like to dedicate my thesis to my parents Lorne and Sheila Kirzinger, for without their support, patience, and infinite wisdom this endeavour would not have been possible. You always somehow knew I would be your little “mad scientist” and you have helped me out every step of the way. I cannot thank you enough for letting me explore the world with curiosity and passion, and I am forever grateful for your love and support.

Thank you.

iv

Table of Contents

Abstract ...... i Acknowledgements ...... ii Post Defense Acknowledgements ...... iii Dedication ...... iv List of Tables ...... vi List of Figures ...... vii List of Appendices ...... viii List of Abbreviations ...... ix Introduction ...... 1 Materials and Methods ...... 13 Results and Discussion ...... 18 Concluding Remarks ...... 53 Future Directions ...... 55 List of References ...... 56 Appendix ...... 62

v

List of Tables

Table 1 Summary of results from the bioinformatic prediction of type III secreted effectors...... 39

Table 2 Genes with a plant T3SS promoter (Hrc 1, Hrp box) in P. stewartii DC283, E. amylovora, E. pyrifoliae and E. tasmaniensis...... 40

Table 3 SPI-1 promoter search results for P. stewartii DC283 and E. amylovora, E. pyrifoliae, E. tasmaniensis, S. glossinidius, Y. aldovae and Y. regensburgei...... 45

Table 4 SPI-2 promoter (SsrA box) survey results for P. stewartii DC283, S. glossinidius, Y. aldovae, and Y. regensburgei...... 47

Table 5 N-terminal secretion signal survey results for P. stewartii DC283...... 51

vi

List of Figures

Figure 1 Type III secretion system promoter boxes...... 17

Figure 2 Distribution of the plant (Hrc 1) and animal (SPI-1, SPI-2) T3SSs within the Pantoea sp. group as determined by a PCR-based survey of the ATPase...... 19

Figure 3 Phylogenetic analysis of T3SS ATPases...... 22

Figure 4 Comparison of the evolutionary history of the plant Hrc 1 T3SS to the species tree...... 25

Figure 5 Gene synteny and GC content of the plant Hrc 1 T3SSs...... 27

Figure 6 Phylogenetic analysis and evolution of the animal SPI-1 T3SS...... 30

Figure 7 Gene synteny and GC content of the animal SPI-1 T3SSs...... 32

Figure 8 Phylogenetic analysis and evolution of the animal SPI-2 T3SS...... 34

Figure 9 Gene synteny and GC content of the animal SPI-2 T3SSs...... 35

Figure 10 Strategy for identifying candidate type III effectors...... 38

vii

List of Appendices

Appendix A Perl scripts used for the identification of type III secretion effectors...... 62

Appendix B Information regarding the strains used in this thesis...... 74

Appendix C Host specificity determinants as a genetic continuum...... 79

Appendix D Insights into cross-kingdom plant ...... 105

viii

List of Abbreviations

µL microlitre

Blast basic local alignment search tool

CIP calf intestinal alkaline phosphatase

DNA deoxyribonucleic acid

Exo exonuclease 1

GEF guanine nucleotide exchange factor

GTP guanine triphosphate kb kilobase

LB lysogeny broth

MgSO4 magnesium sulfate mL millilitre

MLSA multilocus sequence analysis mM millimolar

NCBI National Center for Biotechnology Information

ORF open reading frame

PCR polymerase chain reaction

PSI-2 Pantoea secretion island 2

PstDC283 Pantoea stewartii subsp. stewartii str. DC283

SPI-1 Salmonella pathogenicity island-1

SPI-2 Salmonella pathogenicity island-2

T3SE type III secreted effector

T3SS type III secretion system

ix tdh thermostable direct haemolysin trh tdh-related haemolysin

U unit

x

Introduction

The type III secretion system

The type III secretion system (T3SS) is a unique extracellular nanomachine used by Gram-negative bacteria to inject effector proteins into the cytoplasm of host cells during pathogenesis (Cornelis 2006). The T3SS has also been shown to play an important role in host-specific mutualistic relationships between Rhizobium and its host plants, and in the association between the endosymbiont Sodalis and the tsetse fly

(Alfano and Collmer 1997; Viprey, Del Greco et al. 1998; Galan and Collmer 1999;

Marie, Broughton et al. 2001; Toh, Weiss et al. 2006). The T3SS gene cluster is made up of approximately 25 genes organized in operons that form part of the larger 20 – 30 kilobase (kb) T3SS regulon, or genomic island (Hueck 1998; Nguyen, Paulsen et al.

2000; Saier 2004). The majority of these genes encode the structural components of the

T3SS (Hueck 1998), in addition to regulators and in some cases, secreted effector proteins (Nguyen, Paulsen et al. 2000; Cornelis 2006).

To date, seven different T3SSs have been identified through phylogenetic analysis: four animal specific systems and three plant specific systems (Troisfontaines and Cornelis 2005). The different systems have been classified based on their host specificity, animal or plant and are further subdivided based on phylogenetic analysis of the T3SS ATPase. The three plant systems have been shown to play a role in plant pathogenesis and symbiosis through the secretion of effector proteins into plant host cells

(Alfano and Collmer 1997; Alfano, Charkowski et al. 2000; Roden, Belt et al. 2004;

Mudgett 2005). The Hrc 1 plant T3SS, named for the hypersensitive response conserved genes of the T3SS, is found primarily among the enteric plant pathogens including

1

Erwinia sp., Brenneria sp., Pantoea sp., Dickeya sp. and Pectobacterium sp. (Alfano and

Collmer 1997; Alfano, Charkowski et al. 2000; Mudgett 2005; Troisfontaines and

Cornelis 2005). It has been well characterized and its role in plant pathogenesis has been established experimentally in several species such as Pseudomonas sp., Xanthomonas,

Erwinia, and Pantoea sp. (Alfano, Charkowski et al. 2000; Nissan, Manulis-Sasson et al.

2006).

The four animal T3SSs that have been identified and characterized are known to contribute to the virulence capabilities of opportunistic and obligate human and animal pathogens, in addition to host associations in some insect symbionts (Suarez and

Russmann 1998; Tardy, Homble et al. 1999; Subtil, Blocker et al. 2000; Dale, Young et al. 2001; Cornelis 2002; Dale, Jones et al. 2005; Troisfontaines and Cornelis 2005;

Dieye, Ameiss et al. 2009). This includes several genera of the , such as Salmonella. The Salmonella pathogenicity island-1 (SPI-1) and Salmonella pathogenicity island-2 (SPI-2) are typical pathogenicity islands, which are characterized for their decreased %G+C content, and their increased likelihood to undergo horizontal gene transfer (Hansen-

Wester and Hensel 2001). They have been well studied and their role in pathogenesis in animal hosts has been well established (Suarez and Russmann 1998; Dieye, Ameiss et al.

2009; Lara-Tejero and Galan 2009). Animal pathogens use the SPI-1 system to invade animal cells by inducing phagocytosis of the bacterial cell, which results in the bacterium being engulfed and enclosed in a vacuole (Lostroh and Lee 2001). In Salmonella pathogenesis, this vacuole is referred to as the Salmonella-containing vacuole. The SPI-2 system is then used to survive and cause disease from within the vacuole, where the

2 effector proteins are used to manipulate host processes and commandeer the cell (Hensel

2000). Although these T3SSs have been characterized primarily in Salmonella, they have been found in Shigella, Escherichia, Sodalis, Burkholderia, Chromobacterium and more recently in other genera of the (Foultier, Troisfontaines et al.

2003).

Much like Salmonella, several bacterial species have been shown to harbour more than one type of T3SS in their genome, including Burkholderia, Chromobacterium,

Salmonella, Sodalis, Vibrio, and most recently Pantoea (Galan 2001; Rainbow,

Hart et al. 2002; Betts, Chaudhuri et al. 2004; Dale, Jones et al. 2005; Pallen, Beatson et al. 2005; Nissan, Manulis-Sasson et al. 2006; Toh, Weiss et al. 2006; Correa, Majerczak et al. 2008; Noriea, Johnson et al. 2010). One interesting example of the presence of multiple T3SSs in the same genome is the tsetse fly symbiont Sodalis glossinidius str.

‘morsitans’, which carries three T3SSs (Toh, Weiss et al. 2006). The three S. glossinidius symbiosis regions (SSR) are SSR-1, most similar to the Ysa system, and SSR-2 and SSR-3 which are most similar to SPI-1 and SPI-2 from

Salmonella respectively (Toh, Weiss et al. 2006). A comparison of the SSR-1 and SSR-2

T3SSs in S. glossinidius to the Y. enterocolitica T3SS and SPI-1 systems revealed that some genes of the S. glossinidius T3SSs are non-functional, such as sipA and sipC from

SSR-1, and invE and hilA from SSR-2, suggesting possible degeneration of the locus

(Toh, Weiss et al. 2006). Additionally, the S. glossinidius chromosome has homologues of several well-characterized Salmonella effectors, virulence regulators and invasions proteins, including sipD, phoPQ and ssrAB, that currently have no known function in the symbiosis of S. glossinidius (Toh, Weiss et al. 2006).

3

Another interesting example of the presence of more than one T3SS in the same genome is in (Noriea, Johnson et al. 2010). V. parahaemolyticus is a halophilic bacterium found in marine environments, including sediment and the water column, and coastal environments around the world (Noriea, Johnson et al. 2010). It has previously been shown that the majority of V. parahaemolyticus strains have two distinct

T3SSs in their genome that contribute to pathogenesis, T3SS1 and T3SS2 (Makino,

Oshima et al. 2003). Recently it was found that there are in fact two lineages of the

T3SS2: T3SS2α and T3SS2β (Noriea, Johnson et al. 2010), which appear to be correlated with the presence of genes encoding two different haemolysins (Noriea, Johnson et al.

2010). The thermostable direct haemolysin (tdh) and tdh-related haemolysin (trh) genes are rare in environmental isolates: tdh+/trh- strains carry T3SS2α, and tdh-/trh+ strains carry T3SS2β (Noriea, Johnson et al. 2010). Since both the presence of either tdh or trh has been associated with human pathogenesis, there may be selection pressure for the presence of evolutionary distinct systems that have distinctive environment-specific regulation.

Functions of the type III secretion system

The T3SS alone is insufficient to cause disease or enable the establishment of symbiotic relationships; rather, it is the action of secreted effector proteins that are injected directly into host cells by the T3SS. Type III secreted effectors (T3SEs) are proteins that are transported through the T3SS pilus from the bacterial cytoplasm into the host cell and which target host defences, manipulate cell processes and allow for pathogen survival and replication (Galan 2001; Alfano and Collmer 2004; Mudgett 2005;

4

Grant, Fisher et al. 2006). The T3SS exports effectors from the cytosol of the bacterial cell into the cytosol of the host plant or animal cell in an ATP-dependent manner. The N- terminal secretion signal is necessary for the export of T3SEs during pathogenesis, and is identifiable in the first 50 amino acids of a proteins sequence (Lower and Schneider

2009; Jehl, Arnold et al. 2011; McDermott, Corrigan et al. 2011). There are three defining characteristics of the N-terminal secretion signal: 1) greater than 10% serine in the first 50 amino acids, 2) isoleucine, leucine, valine, alanine, or proline in the third or fourth amino acid position, and 3) no aspartic acid or glutamic acid in the first 12 amino acids (Guttman, Vinatzer et al. 2002). Although there are different suites of T3SEs among the bacterial species that harbour a T3SS, the secretion signal is universal, and can be used to identify these effector genes in genomes (Guttman, Vinatzer et al. 2002).

Each T3SS is regulated by a specific transcriptional regulator that binds to a distinct promoter box, which appears upstream of each of the operons containing structural components of the system as well as upstream of T3SEs (Innes, Bent et al.

1993; Lostroh and Lee 2001; Zwiesler-Vollick, Plovanich-Jones et al. 2002;

Tomljenovic-Berube, Mulder et al. 2010). These promoter boxes are vastly different between the different T3SSs groups. For instance, the P. syringae plant Hrc 1 system is regulated by a 28 – 30 bp promoter box (consensus GGAACCn{15-19}CCAC)

(Zwiesler-Vollick, Plovanich-Jones et al. 2002), while the promoter box of the

Salmonella SPI-1 contains two direct repeat hexamers separated by 5 nucleotides

(TTTCATnnnnnTTTCAT) (Lostroh, Bajaj et al. 2000). The Salmonella SPI-2 system, uses a 18 bp promoter that is made up of 7 bp, a 4 bp spacer, then the palindrome of the

5 first 7 bp (consensus ATCAGGTnnnnACCTGAT) (Tomljenovic-Berube, Mulder et al.

2010).

Much like the differences in the regulatory components of the different T3SSs, there are also differences in the suites of effector proteins between the animal T3SSs and plant T3SSs. The main, and obvious reason, is that plant cells are physiologically and structurally different than animal cells and as a result, the effector proteins have different functions and roles depending on the target cell. Plant-specific T3SEs can have a multitude of functions, although their primary role is interfering with host cell defence to facilitate pathogen proliferation (Grant, Fisher et al. 2006). For example, HopN1 in P. syringae, is a cysteine protease that suppresses cell death in nonhost plants (Lopez-

Solanilla, Bronstein et al. 2004), whereas AvrPto1 from P. syringae suppresses papillae formation, which disables Arabidopsis salicylic acid-independent cell wall based defences (Hauck, Thilmony et al. 2003).

In contrast, animal effector proteins generally function to inhibit host cell phagocytosis, disable host defences and facilitate pathogen survival (Galan 2001). For instance, the Yersinia T3SE YopT, a cysteine protease, prevents macrophage mediated phagocytosis by cleaving isoprenyl groups from Rho GTPases releasing them from their plasma membrane bound position into the cytoplasm (Shao, Merritt et al. 2002). YopT, when secreted by the Yersinia T3SS, has been shown to have a cytotoxic effect in mammalian cells (Iriarte and Cornelis 1998). In contrast, SopE in Salmonella is a guanine nucleotide exchange factor (GEF) that activates Cdc42 and Rac1, which are two small guanine triphosphates (GTPs) responsible for cytoskeleton remodelling and actin polymerization (Hardt, Chen et al. 1998). When secreted by the SPI-1 T3SS during

6

Salmonella pathogenesis, SopE facilitates efficient entry into host cells (Hardt, Chen et al. 1998).

In addition to roles in pathogenesis, T3SSs have also been shown to play important roles in mutualistic associations, such as those exhibited by the plant symbiont,

Rhizobium sp. and the insect symbiont, S. glossinidius (Viprey, Del Greco et al. 1998;

Dale, Young et al. 2001; Dale, Jones et al. 2005; Toh, Weiss et al. 2006). For example, the Rhizobium T3SS secretes nodulation outer proteins (Nops), which can act in a similar manner to some pathogenic plant T3SEs by interfering with host cell processes and suppressing plant defence responses (Bartsev, Deakin et al. 2004). One well studied Nop,

NopL, is secreted into the plant cell where it is phosphorylated by plant protein kinases and is proposed to play a role in the interference of plant defence responses (Bartsev,

Deakin et al. 2004).

In S. glossinidius, some of the effectors generally associated with pathogenic bacteria have either been eliminated from the genome or modified to play role in the mutualism (Toh, Weiss et al. 2006). For example, there are homologues of several

Salmonella T3SS effector proteins including sipD in the S. glossinidius genome, although their function in symbiosis has yet to be determined (Toh, Weiss et al. 2006). Until recently, no symbiont effector proteins had been identified outside of the T3SS gene cluster in S. glossinidius (Dale, Jones et al. 2005); however, Costa et al. (2012) recently identified a class IB chaperone in S. glossinidius, along with two effectors, SG0576 and

SG0764, which were shown to be secreted under conditions that induce type III secretion in vitro (Costa, Schmitz et al. 2012). The functions of SG0576 and SG0764 are still not known. There are numerous other animal and plant T3SEs that have been identified and

7 characterized. For reviews on known T3SEs see: Galan 2001; Alfano and Collmer 2004;

Mudgett 2005; Grant, Fisher et al. 2006.

Evolution of type III secretion systems

Many studies have examined the distribution and evolution of the different types of T3SSs (Nguyen, Paulsen et al. 2000; Gophna, Ron et al. 2003; Pallen, Beatson et al.

2005; Troisfontaines and Cornelis 2005; Naum, Brown et al. 2009). Troisfontaines et al.

(2005) used phylogenetics to demonstrate that T3SSs in bacteria form seven distinct phylogenetic clusters. When a gene phylogeny containing all of the systems is compared to the 16S rRNA species phylogeny, there is strong evidence for lateral genetic transfers between organisms. Similarly, Gophna et al. (2003) demonstrated that if all of the

ATPase components of the animal and plant T3SSs were combined into one phylogeny and compared to a 16S tree of the same species, the two were incongruent, suggesting multiple horizontal gene transfer events (Gophna, Ron et al. 2003).

In the case of the plant Hrc 1 T3SS, several studies have examined its evolutionary origins (Gophna, Ron et al. 2003; Troisfontaines and Cornelis 2005; Naum,

Brown et al. 2009). A phylogenetic analysis of the hrcJ and hrcV T3SS genes in the enteric plant pathogens (Erwinia, Pantoea, Pectobacterium, Brenneria, and Dickeya) showed that the Hrc 1 plant T3SS was acquired horizontally (Naum, Brown et al. 2009).

For P. syringae, there is evidence of the Hrc 1 T3SS being maintained vertically. The P. syringae Hrc 1 T3SS has been shown to be ancestral to the species, with the evolutionary history of the two genes at the extreme ends of the cluster, hrpL and hrpS being consistent with housekeeping genes like gyrB and rpoD (Sawada, Suzuki et al. 1999).

8

This indicates that the T3SS plays and has played an essential role in the plant pathogenic capabilities of P. syringae, and was acquired prior to the diversification of the P. syringae species group. Despite this, there are still several groups within P. syringae that either do not have a T3SS or have a novel system that is distantly related to the “classic” Hrc 1

T3SS (Clarke, Cai et al. 2010). Thus, although the T3SS is ancestral in the species, there is evidence of loss, modification, and replacement, which may be reflective of the evolution of an alternate strategy for plant pathogenesis or a complete loss of plant pathogenic capabilities in some strains.

The evolution of the SPI-1 and SPI-2 T3SSs has also been studied (Ochman and

Groisman 1996). The evolution of these two systems has been shown to be the result of horizontal gene transfer at different points in the evolutionary history of Salmonella. SPI-

1 can be found in both S. enterica and S. bongori, a species of Salmonella that is older than S. enterica. In contrast, the SPI-2 T3SS is not present in S. bongori, suggesting that the SPI-1 system is older than the SPI-2 system (Ochman and Groisman 1996; Hansen-

Wester and Hensel 2001). Ochman and Groisman determined that the SPI-1 T3SS was acquired by Salmonella prior to the speciation of S. bongori and S. enterica whereas SPI-

2 was acquired more recently, after the speciation of S. enterica (Ochman and Groisman

1996).

Although the origins of the T3SSs in some species are uncertain, the T3SS appears to undergo both horizontal and vertical inheritance, and in the case of P. syringae, vertical inheritance of the cluster does not appear to equate to a global distribution of the cluster in all strains.

9

Pantoea as a model organism

The genus Pantoea comprises some of the most interesting Gram-negative species of the Enterobacteriaceae since they are closely related to human pathogenic S. enterica, and E. coli. Formerly known as Enterobacter agglomerans (Gavini, Mergaert et al.

1989), Pantoea has been isolated from numerous commercially relevant crops including corn, wheat, onions, melons and roses (Gitaitis and Gay 1997; Amellal, Burtin et al.

1998; Ham, Majerczak et al. 2006; Duerinckx 2008; Kido, Adachi et al. 2008), and has gained a reputation as a plant pathogen for its pathogenic capabilities on rice, beets and gypsophila (Manulis and Barash 2003; Nissan, Manulis-Sasson et al. 2006; Duan, Yi et al. 2007). Some strains, such as strain Pantoea agglomerans Eh252, do not appear to have plant pathogenic capabilities, and are used as a biocontrol agent for fire blight in orchards (Stockwell, Johnson et al. 2002).

The majority of studies on the pathogenic capabilities of Pantoea have centered on the plant T3SS (Alfano and Collmer 1997; Manulis and Barash 2003; Alfano and

Collmer 2004; Nissan, Manulis-Sasson et al. 2006). This T3SS, a Hrc 1 system similar to the plant T3SS of the other enteric plant pathogens such as Erwinia, is essential for disease establishment of P. agglomerans on gypsophila and beet (Nissan, Manulis-Sasson et al. 2006). It was suggested that these plant pathogenic capabilities were acquired recently through gain of a plant T3SS on pPATH, an approximately 150 kb pathogenicity plasmid (Manulis and Barash 2003). The T3SS was found to be in a pathogenicity island

(PAI), which showed a different %GC than the rest of the plasmid, suggesting horizontal gene transfer (Manulis and Barash 2003). While this study did suggest the importance of this system to the evolution of plant pathogenicity, it did not evaluate the distribution in

10

Pantoea, nor did it use any phylogenetic methods to determine the evolutionary origin of the system. Ultimately, the origin and evolution of the Pantoea plant T3SS has not been evaluated comprehensively. A more comprehensive analysis of the plant Hrc 1 T3SS in

Pantoea could lead to a better understanding of the distribution of plant pathogens within the genus, and can identify whether there is a correlation between plant pathogenesis and the presence of a T3SS.

More recently, Pantoea sp. have also been found to be relevant opportunistic human pathogens (Bicudo, Macedo et al. 2007; Duan, Yi et al. 2007; Brady, Cleenwerck et al. 2008; Rezzonico, Smits et al. 2009; Volksch, Thon et al. 2009). In addition to being isolated from human skin abrasions and bodily fluids such as blood and urine, Pantoea sp. have also been the causative agent of nosocomial outbreaks, monoarthritis and other human infections (De Champs, Le Seaux et al. 2000; Bicudo, Macedo et al. 2007; Cruz,

Cazacu et al. 2007; Aly, Salmeen et al. 2008; Brady, Cleenwerck et al. 2008; Rezzonico,

Smits et al. 2009; Lalas and Erichsen 2010). There have also been studies that link the plant–associating capabilities of Pantoea with its animal pathogenic capabilities. Several human infections have resulted from direct human contact with infected plants, often by cuts and scrapes from thorns, and splinters (Flatauer and Khan 1978; De Champs, Le

Seaux et al. 2000; Kratz, Greenberg et al. 2003; Cruz, Cazacu et al. 2007).

Recently, a second T3SS in the Pantoea stewartii subsp. stewartii str. DC283

(PstDC283) genome was characterized, Pantoea secretion island 2 (PSI-2) (Correa,

Majerczak et al. 2012). The PSI-2 system is evolutionarily related to the SPI-1 system of

Salmonella and it is used in the persistence of PstDC283 in the beetle, which it uses as a vector for reaching its host, maize (Correa, Majerczak et al. 2008; Correa, Majerczak

11 et al. 2012). The PSI-2 T3SS in PstDC283 will be referred to as SPI-1 for the rest of this thesis. Since this is the first report of a SPI-1 like T3SS in Pantoea, and there are no published examples of a Pantoea SPI-2 system, there are currently no studies that examine the evolution of the either animal system in Pantoea. Aside from the 3 plant- specific T3SEs that are known (Ham, Majerczak et al. 2006; Nissan, Manulis-Sasson et al. 2006; Ham, Majerczak et al. 2008; Ham, Majerczak et al. 2009; Weinthal, Barash et al. 2011), there are a handful of Pantoea SPI-1 effectors that were identified (Correa,

Majerczak et al. 2012), but there is currently no information regarding the function of these effectors.

Objectives

The primary objective of this thesis was to examine the known T3SSs of the sequenced strain PstDC283, Hrc 1, SPI-1, and SPI-2, and evaluate their distribution and evolution across the genus Pantoea. The secondary objective was to use bioinformatic approaches to identify T3SEs for each of the three T3SS in the PstDC283 genome and in other closely related species. These objectives will serve to expand our current knowledge of the role of these T3SSs in the pathogenic capabilities of PstDC283 and its relatives.

12

Materials and Methods

Bacterial Strains

The genetic survey examined 129 strains of Pantoea from 33 plant strains, 37 environmental strains, and 59 clinical strains. The plant strains used in the survey were obtained from New Zealand Culture of Plant Pathogens Dr. Steven Lindow at Berkeley, and Dr. Gwyn Beattie at Ohio State and the environmental isolates were collected through environmental sampling around the province of Saskatchewan. Clinical strains were obtained from the Regina General Hospital, Dr. Paul Levett at the Saskatchewan

Disease Control Laboratory, St. Boniface General Hospital in Winnipeg, Texas

Children’s Hospital, and Sunnybrook Hospital (Appendix B).

DNA extraction, amplification and sequencing

Bacterial strains were grown on lysogeny broth (LB) medium for 24 hours at 30 degrees Celsius. Genomic DNA was extracted and purified from 5 mL overnight cultures in LB medium using the Qiagen genomic DNA purification kit (Mississauga, Ontario).

Polymerase chain reaction (PCR) amplification and sequencing of internal portions of the three T3SS ATPase genes were performed using custom degenerate primers for each of the three T3SSs: Plant system primers were 5’ ACCGGACTGCGGGCGATTGA 3’ and

5’ GAACGMACTTCATCGGCYACCGGATC 3’, for SPI-1, 5’

TGACCTGTGGTGAAGGACAG CGCAT 3’ and 5’

GCGAAACGCYTCRGCGAYGGTG 3’ and for SPI-2 5’ TAAGCCG

TTTCGTATCGAGGTGCATG 3’ and 5’ GACGTGGCGTAGACCAGGATAATTTGC

3’ where M = A+C, Y = C+T, R = A+G. Primers were designed in conserved regions

13 using the Genbank sequences for the ATPase components of each of the T3SSs in

PstDC283 and close homologues. Each PCR was performed in a total volume of 20 µL using 14.4 μL sterile water, 2.0 μL 10x Standard Taq Buffer, 0.2 μL Taq DNA polymerase (5 U/μL), 0.2 μL forward primer (50 μM), 0.2 μL reverse primer (50 μM),

2.0 μL dNTPs (200 μM each), and 1 μL (200-500 ng/μL) template DNA. PCR amplicons were then purified from the PCR mix by cleaning with Calf Intestinal Alkaline

Phosphatase (CIP) and Exonuclease 1 (Exo) and incubated at 37 degrees Celsius for 30 minutes followed by inactivation at 85 degrees Celsius for 15 minutes. Primers were then premixed with clean PCR products and sent for sequencing to Operon, Huntsville,

Alabama.

Data and sequence analysis

All nucleotide and amino acid sequences were aligned using Clustal W

(Thompson, Higgins et al. 1994). For the genetic survey, the partial ATPase sequences were compared to known complete ATPase genes collected from GenBank, other species from the Enterobacteriaceae with known T3SSs. The alignment was transferred into

Molecular Evolutionary Genetics Analysis (MEGA) version 4.0 (Tamura, Dudley et al.

2007) to create a Neighbour-Joining phylogenetic tree with 1000 bootstrap replicates. For the evolutionary studies, sequences were also aligned in ClustalW and concatenated to generate a comprehensive multi locus sequence analysis (MLSA) species tree (composed of 16S, leuS, recA and rpoB) and MLSA trees for the plant Hrc 1 (HrcN, HrcV, HrcU and

HrpY), SPI-1 (InvG, SpaL, SpaS and InvA), and SPI-2 (SsaC, SsaN, SsaU, and SsaV)

T3SSs. These alignments were also transferred to MEGA to generate Neighbour-Joining

14 phylogenetic trees with 1000 bootstrap replicates. The Maximum Composite Model was used for all nucleotide trees and the Jones-Taylor-Thornton model for amino acid substitutions was used for all protein trees.

Annotating T3SSs

In many instances, the T3SSs were not annotated in the genomes, or were present on partial contigs of incomplete genomes. In these instances, the region of the T3SS was identified by tblastx using the hrcN gene. VectorNTI was used to find the open reading frames (ORFs) of the putative T3SS, and all proposed ORFs were subsequently Blasted with tblastx (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and annotated as genes of the T3SS.

Bacterial genomic data

All bacterial genome data was collected from the NCBI database using Blast, except for genome data for Brenneria salicis ATCC 15712, Dickeya dadantii 3937,

Erwinia carotovora subsp. atroseptica SCRI1043 (Pectobacterium carotovorum subsp. atroseptica SCRI1043), and Pectobacterium subsp. brasiliensis PBR1692, which were obtained from JGI Genome Portal (http://genome.jgi.doe.gov/). Sequences were prepared for analysis using Context text editing program and put into fasta format. PERL was also used for formatting and analyzing sequence data.

Perl survey

The Perl script “One script.pl” was used to extract open reading frames, search for the N-terminal secretion signal, and simultaneously survey the upstream region of each

15 gene for each of the three promoter boxes (Appendix A.2). The N-terminal secretion signal portion of the script examined the first 50 amino acids of all annotated genes for three conditions: 1) greater than 10% S in the first 50 amino acids, 2) I, L,V, A or P in the third or fourth amino acid position, and 3) no D or E in the first 12 amino acids. The promoter box portion of the script examined all 200 bp regions upstream of the start codon for each gene. The Perl script identified exact matches to each of the promoter box sequences in the 200 bp upstream region. The plant Hrp promoter box (Figure 1A), named for the hyper-sensitive response and pathogenicity genes in the Hrc 1 T3SS, survey used [GT]GGA[GA]C[TC]n{15,19}CCAC, based off of the P.syringae Hrp Box, for the first run, and GGAAC[CT]n{16}[CG]CAC for the PstDC283-specific re-run of

One Script.pl (where “[ ]” denotes or, and “{ }” denotes in between, and “n” denotes any nucleotide.) The SPI-1 promoter box (Figure 1B) effector survey used

[CG]GCnnTnnAnnnnAnnGAA as the consensus sequence for the PstDC283 survey.

Finally, the SPI-2 SsrA box survey used two sequences, a strong consensus

T[CA]AGG[CT]n[AT][AT][AT]nnCnGAT, and a weak consensus sequence,

Tn[AC][GA][GC][TC]n[AT][AT]n[AT][ACT]Cn[GT]AT, based off of the S. enterica

SsrA box, for the first run, and for the PstDC283 specific re-run

T[ACT][AC][GA]GT[TAC][AT][AT]nA[CAT]CT[GT]AT (Figure 1C). In the event that there were multiple contigs in the same file all in Genbank format, the Perl script

“Separate.pl” was used to separate the contigs into individual Genbank files for analysis

(Appendix A.1).

16

A)

B)

C)

Figure 1: Type III secretion system promoter boxes. A) Motif of the Hrc 1 promoter box for the four candidate effectors identified in PstDC283. B) Motif of the SPI-1 promoter box used for the SPI-1 effector search. C) The motif of the SPI-2 promoter box identified as a result of the first search.

17

Results and Discussion

Type III secretion system survey and distribution

The recently completed genome of PstDC283 was used as the basis for the genetic survey of the T3SSs in Pantoea. Three different T3SSs were discovered within the genome of PstDC283: a plant Hrc 1 T3SS, and two animal systems, SPI-1 and SPI-2.

The plant system in Pantoea has been previously characterized and is known to be extremely similar to the plant system found in the enteric plant pathogen, Erwinia

(Gophna, Ron et al. 2003; Troisfontaines and Cornelis 2005; Naum, Brown et al. 2009).

The Pantoea SPI-1 system has been shown to play a role in bacterial persistence in the flea beetle (Correa, Majerczak et al. 2008), whereas the SPI-2 system had not been reported previously. Since the presence of three T3SSs in one organism is rare, a survey was conducted to determine if all species of Pantoea have all three T3SSs. To evaluate the distribution of the T3SSs in Pantoea, a collection of 37 environmental, 59 clinical, and 33 plant-associated isolates was surveyed using a PCR-based approach using three sets of T3SS-specific primers. The Hrc 1 T3SS genetic survey uncovered 17 Hrc 1

T3SSs from the 129 isolates surveyed (Figure 2). We identified the plant system in 10 clinical isolates, and 7 plant isolates, yet none of the 34 environmental isolates from

Saskatchewan were found to have a plant T3SS. What was particularly surprising is that a high number of clinical isolates harbour a plant system. This suggests that the clinical isolates may have the potential to infect plants, and that the dual host capabilities of

Pantoea sp. could result in both human to plant, and plant to human transmission

(Kirzinger, Nadarasah et al. 2011).

18

Figure 2: Distribution of the plant (Hrc 1) and animal (SPI-1, SPI-2) T3SSs within the Pantoea sp. group as determined by a PCR-based survey of the ATPase. The presence of a T3SS is denoted by a black box corresponding to the type of T3SS and the strain. Clinical isolates are denoted by red circles, and plant isolates are identified by green squares. T3SSs were also identified in three enteric non-Pantoea species, Erwinia billingae, Cronobacter sakazakii and Klebsiella.

19

The plant system was identified in 5 of the 11 species of Pantoea examined. The distribution of the plant system in the different species groups could be evidence that prior to speciation, Pantoea acquired a plant system that was subsequently inherited vertically with diversification. Because of the lack of representation of the Hrc 1 T3SS across strains; however, this may be more support for horizontal gene transfer. It is also possible that there are different distinct lineages of the plant T3SS that are not identifiable by these primers. Still, the Hrc 1 plant T3SS was also identified in Erwinia billingiae,

Cronobacter sakazakii, and Klebsiella, which were used as outgroups to evaluate the stringency of the PCR-based approach. The identification of the Hrc 1 T3SS in the outgroup species, especially in E. billingiae, helps supports the robustness of the primers used for the Hrc 1 system.

The SPI-1 PCR survey found 12 instances of the SPI-1 T3SS in our Pantoea collection, 6 of which were identified in clinical isolates. Similarly to the Hrc 1 survey, it was found that there was a wide distribution of the SPI-1 system among 5 of the 11 species of Pantoea surveyed (Figure 2). Additionally, the 9 strains that were found to have SPI-1 were also found to have the plant Hrc 1 T3SS, 5 of which were isolated from a nosocomial setting. Because of this, it was expected that the SPI-1 T3SS would be identified in more clinical isolates throughout the library; however, since the function of the SPI-1 T3SS in Pantoea is unknown, it is difficult to determine if this system is necessary for human pathogenesis. The SPI-1 system was also identified in one of the survey controls, C. sakazakii, confirming that the primers used could identify T3SSs in other species.

20

The survey for the presence of a SPI-2 T3SS in our collection successfully identified 9 instances of the SPI-2 T3SS across 5 of the 11 species; 5 were identified in clinical isolates, and 4 in plant isolates (Figure 2). The genetic survey found that not only were there both clinical and plant isolates having the Hrc 1 and SPI-2 T3SSs, or the Hrc 1 and SPI-1 T3SSs, but some had only the SPI-2 T3SS. Given its known function in

Salmonella pathogenesis, intracellular pathogenesis and promoting pathogen survival from within the vacuole, its presence alone could denote that its function in pathogenesis in Pantoea sp. is different than in Salmonella. Alternatively, the SPI-2 secretion system identified could have gained dual functionality. If the SPI-2 system had the capabilities of both the SPI-1 T3SS and the SPI-2 T3SS, then it would be able to function in the absence of SPI-1 and still allow the bacterium to associate with a host. Furthermore, the SPI-2 system could be present in the absence of SPI-1 if there were an additional animal T3SS having a similar function to SPI-1.

Furthermore, there was only one isolate that was found to have only the plant Hrc

1 system with no accompanying animal system, SPI-1 or SPI-2, although interestingly it was one of the outgroups, E. billingiae. Additionally, there were no strains that had just the SPI-1 and SPI-2 systems, although three strains did have all three T3SSs. This included P. agglomerans B025670, P. brenneri B024858, and PstDC283. This interesting and novel finding is extremely exciting as it provides evidence that supports current host-pathogen models in Pantoea, which indicate the potential for cross-kingdom pathogenesis or host-association (Kirzinger, Nadarasah et al. 2011). No T3SSs were identified in any of the strains from P. ananatis, P. anthophila, P. calida,

21

Figure 3: Phylogenetic analysis of T3SS ATPases. (A) Hrc 1 system, (B) SPI-1 system and (C) SPI-2 system phylogenies were generated using amino acid sequence data of the T3SS ATPase. Clinical isolates are denoted by red circles and plant isolates are denoted by green squares. Taxa with reference T3SS systems (no square or circle symbol) were included for each phylogeny. All phylogenies are Neighbour-Joining using 1000 bootstrap replicates. The Agrobacterium flagellar ATPase component was used as an out-group for all phylogenies.

22

P. dispersa, P. eucrina, or P. septica. This is consistent with the loss of the T3SS in some species or lineages.

Phylogenetic analysis of the sequence data for each of the three genetic surveys confirmed that the PCR-based genetic survey was primer-specific and did not recover the same system more than once in strains harbouring more than one T3SS (Figure 3). For each of the three systems, the ATPase sequence data from the genetic survey grouped into the phylogeny where expected for each of the three T3SSs. For example, the plant

Hrc 1 T3SS ATPase sequences collected grouped with other known Hrc 1 T3SSs, confirming that there were three distinct T3SSs.

There are several potential reasons why no T3SSs were identified in some species and isolates surveyed. In the case of the environmental isolates processed by our laboratory, if the T3SSs were present on plasmids, loss of the plasmid carrying the T3SS could have occurred when the samples were cultured in vitro. Another possibility is that the T3SSs of these environmental isolates could be sufficiently divergent that they could not be amplified with the primers used. Lastly, and most simply, there may be no T3SSs present in these genomes, which could be the result of the T3SSs being lost from species or strains of Pantoea over time. Still, even though the survey did not identify T3SSs in several strains, there is not enough evidence to conclude that these strains do not have a

T3SS. Further investigation with primers from other T3SS lineages would be required to determine if they possess any other known T3SSs.

23

Evolutionary analysis of the plant (Hrc 1) T3SS

Phylogenies were generated to determine the evolutionary origins of the Pantoea sp. Hrc 1 T3SS. When a multilocus sequence analysis (MLSA) species tree composed of

16S rRNA, leuS, recA and rpoB was compared to the plant T3SS gene tree composed of

HrcC, HrcN, HrcU, and HrcV (Figure 4), it was noted that the two trees were congruent suggesting vertical inheritance. Although this is contrary to what has been previously reported for Pantoea (Gophna, Ron et al. 2003; Troisfontaines and Cornelis 2005; Naum,

Brown et al. 2009), this does not exclude horizontal gene transfer within and between the

Erwinia and Pantoea groups, which are evolutionarily closely related species.

Evaluation of horizontal transfer between these genera would require more Hrc 1 systems to be analyzed, and an MLSA species tree with greater resolution. Additionally, the

T3SS of Dickeya does not appear to have undergone horizontal gene transfer with other species.

The conserved topology of the clade consisting of Erwinia, Pantoea, and Dickeya in both phylogenies would suggest that these T3SSs have been in these genomes for an extended period of time, prior to speciation. Because of this, there is support for the

Pantoea sp. Hrc 1 T3SS being acquired ancestrally, prior to the speciation of Erwinia and

Pantoea. Contrary to previous studies, ancestral acquisition of the plant Hrc 1 T3SS provides evidence suggesting that species of Pantoea are very old plant pathogens that may have acquired the ability to cause disease in animal hosts.

To gain further insight into the evolution of the Hrc 1 T3SSs and to validate the phylogenetic groupings, the synteny of the Hrc 1 genomic island and its GC content was overlayed onto the gene tree (Figure 5). The genomic island organization of the

24

Figure 4: Comparison of the evolutionary history of the plant Hrc 1 T3SS to the species tree. The evolution of the Hrc 1 T3SS tree (HrcC, HrcN, HrcU, and HrcV) was evaluated by comparison to a species tree (concatenated 16S rRNA, leuS, recA, rpoB). The taxa in the species tree are all strains used in this study. Species in the MLSA tree that do not have a Hrc 1 system were greyed out. Lines joining the two trees represent identical branching patterns between the two trees, supporting vertical evolution. All of the genomic data in the species tree related to the organisms labelled, Pantoea agglomerans 299R species information was used because there was no genetic information for 16S rRNA, leuS, recA, and rpoB in Pantoea agglomerans pv. gypsophila. Also there was no identifiable 16S rRNA sequence for Yokenella regensburgei ATCC 43003, and as a result the 16s rRNA sequence from the closely related Yokenella regensburgei strain CIP 105435 was used. Additionally, the hrpY gene from Pseudomonas syringae strain B728a was used as there was no information regarding the hrpY gene in Pseudomonas syringae strain 642. Agrobacterium housekeeping genes and flagella were used for the species tree and gene tree respectively. Both phylogenies are Neighbour-Joining, 1000 bootstrap replicates.

25

Hrc 1 systems showed that the genetic composition and synteny of the plant systems was largely conserved. There were instances of deletions in a few systems, such as Brenneria salicis 15712, and even some insertions that were site specific, and conserved within the same species. In the Dickeya zeae Ech1591 and the D. dadantii 3937 genomes, there were three genes inserted at the same site in the T3SS cluster, and there was a single conserved site-specific insertion in Pectobacterium carotovorum subsp. atroseptica

SCRI1043 (P. atroseptica), P. carotovorum subsp. brasiliensis PRB1692 (P. brasiliensis), and P. carotovorum subsp. carotovorum PC1 (P. carotovorum). Instances of conserved site-specific and species-specific insertions are strong indicators of a more ancestral insertion of the T3SS into the genome that diversified with speciation. Aside from insertions and deletions, the entire Hrc 1 T3SS gene cluster is conserved across all enteric phytopathogens examined in this study. Not only are all of the operons facing in the same direction and in the same order, but the genes within each operon are also syntenic.

The GC content of genes within a genome was also examined, as it can be a useful indicator of the extent of acclimation to a genome. The %GC of the genes of the

T3SS regulon was compared to the %GC of all of the open reading frames of the genome, for the Plant Hrc 1 T3SS (Figure 5). Genes that have been in the genome longer are expected to have a similar %GC overall to that of the genome as compared to genes that were recently acquired. The exception would be in instances where genes that are acquired horizontally have come from a genome with a very similar %GC range as the recipient genome. The acclimatization of genes can be due to selective pressure that favours the preferred codon usage of the organism, which leads to changes in the DNA

26

Figure 5: Gene synteny and GC content of the plant Hrc 1 T3SSs. Gene synteny, organization and %GC of all T3SS Hrc 1 systems, overlayed onto the T3SS concatenated gene tree. Delta GC represents %GC of the genome - %GC of the T3SS. The genetic synteny of the T3SS gene cluster was also evaluated to demonstrate similarities between the T3SS gene clusters. Protein sequences used to create the genealogy on the left are shown in the cluster, highlighted, darkest to lightest: HrcN, HrcC, HrcV, and HrcU. Genes with diagonal lines represent conserved insertions into the T3SS gene cluster. Boundaries of the T3SS gene clusters were determined by Genbank annotation and Blast.

27 sequence (Puigbo, Bravo et al. 2008). In order to simplify this comparison, a ΔGC was used, where ΔGC = (average %GC of genes within the genome – the average %GC of genes within the T3SS gene cluster). Therefore, a ΔGC of -5 would mean that the genome has a %GC 5% lower than that of the T3SS, whereas a ΔGC of +5 would mean that the genome has a %GC 5% higher than that of the T3SS.

Given the phylogenetic data for the plant Hrc 1, it was expected that the ΔGC values would be close to 0. This was the case for the three species of Erwinia, all of which have a ΔGC of 0 (Figure 5). Interestingly, P. agglomerans pv. gypsophilae and

PstDC283 had ΔGCs of 4 and 5 respectively, the two highest values of ΔGC seen in the

Hrc 1 plant systems. Although there is strong phylogenetic evidence that suggests that these systems are the result of vertical inheritance, there is clear disagreement between the %GC of their T3SS and genome. Because there is evidence that the GC content of species within the same genus can vary, this could indicate that this may have been acquired from a different species of Pantoea. For example, in the genus Pseudomonas, the difference in GC content can be up to 6% between species (Guttman, Morgan et al.

2008). Likewise some species of Pantoea, like Pantoea ananatis, have an overall %GC as low as 53% (Choi, Lim et al. 2012). Alternatively, these systems have been horizontally acquired from a completely different bacterial genus not included in this analysis having a %GC around 50-52. If this were the case, this species would have to fall between the Erwinia and Pantoea groups. Given the phylogenetic congruence of the species tree, there is still support for vertical transmission of the Hrc 1 T3SS within the

Pantoea group. Although it is possible that the Hrc 1 T3SS was acquired initially as the result of a horizontal gene transfer event, the evidence for this acquisition if undetectable,

28 and there is enough evidence to support acquisition of the T3SS prior to the divergence of

Pantoea and Erwinia.

Evolutionary analysis of the SPI-1 T3SS

The genomic data collected from Genbank identified unreported SPI-1 T3SSs in

E. amylovora, E. tasmaniensis, and E. pyrifoliae, while the PCR based genetic survey identified previously un-reported SPI-1 T3SSs in Pantoea. Phylogenetic analysis of these systems showed that the Erwinia and Pantoea SPI-1 T3SSs group together in the T3SS gene tree in a manner that reflects the topology of the species tree (Figure 6). Given the branching pattern of the Erwinia and PstDC283, two models for evolution of the

PstDC283 T3SS are plausible. The first model supports the vertical inheritance of the

PstDC283 SPI-1 T3SS; that is, the SPI-1 T3SS was acquired prior to the diversification of Pantoea and Erwinia. This model also suggests that the Yokenella regensburgei system was acquired from PstDC283 through horizontal gene transfer. The second model suggests that although the PstDC283 T3SS groups with Erwinia, it was horizontally acquired from Y. regensburgei. This second model is difficult to discount given the absence of information on the Y. regensburgei T3SS and its distribution among the Yokenella group. The evolution of the SPI-1 T3SS with respect to PstDC283 could easily be determined with more genetic information and representation from Yokenella and Pantoea. The remainder of the T3SSs analyzed support that the SPI-1 T3SS is subject to horizontal gene transfer. An analysis of GC content and synteny supports this since the core of the gene cluster containing the conserved apparatus proteins is maintained across species.

29

Figure 6: Phylogenetic analysis and evolution of the animal SPI-1 T3SS. The tree on the left is the concatenated MLSA tree used in Figure 4. The tree on the right is a concatenated SPI-1 T3SS tree, constructed from alignments of InvG, SpaL, SpaS and InvA. Both phylogenies are Neighbour-Joining, 1000 bootstrap replicates. Only one representative from Yersinia enterocolitica and one representative from Burkholderia were used for the SPI-1 analysis, 8081 and K96243 respectively.

30

Further analysis of the ΔGC of the SPI-1 systems revealed that there were values that were greater than expected. In one instance, the ΔGC for 5a was 18, likely due to the presence of the T3SS on a plasmid (%GC = 34%), while the genome

%GC was 52%. Additionally, the Erwinia sp. has ΔGC values between +11 and +16, denoting extremely low %GC values for the T3SS. In contrast, the Yersinia sp. and

Providencia stuartii 25827 have ΔGC values close to zero. The SPI-1 T3SS organization is also extremely conserved across species (Figure 7), with the exception of a few species that have distinct rearrangements. These exceptions are Shewanella baltica, C. violaceum and B. thailandensis, all of which show the same conserved region with the other SPI-1 systems, but have noticeable rearrangements and insertions in the other regions of the

T3SS cluster. For PstDC283, the ΔGC was 2, while the Y. regensburgei ΔGC was 7.

PstDC283, and Y. regensburgei also show remarkable amounts of conservation and synteny within the gene cluster along with the remaining species, which given the phylogenetic evidence for horizontal gene transfer, suggests that the SPI-1 system is readily moving horizontally between species.

Evolutionary analysis of the SPI-2 T3SS

Much like the SPI-1 T3SS, the phylogenetic analysis of the SPI-2 system showed support for multiple horizontal gene transfer events, with no evidence of vertical inheritance for the SPI-2 T3SS. The combined MLSA species tree and SPI-2 concatenated gene tree (SsaC, SsaN, SsaU, and SsaV), showed several potential horizontal gene transfer events involving Burkholderia pseudomallei and

31

Figure 7: Gene synteny and GC content of the animal SPI-1 T3SSs. Gene synteny of the SPI-1 T3SS gene cluster and GC content of the T3SS cluster and genome was overlayed onto the SPI-1 gene phylogeny. The delta GC value was calculated as %GC of the genome - %GC of the T3SS. The genes used in the T3SS tree are SpaL, InvA, InvG, and SpaS, and their position is shown in the cluster by gray shading, from darkest to lightest.

32

Chromobacterium violaceum (Figure 8). This could largely be the result of missing information regarding the presence of a SPI-2 T3SS in species of the enteric bacteria that are not represented in this study. One of the more interesting results from the evaluation of the distribution of the SPI-2 T3SS was its absence in the Erwinia sp. despite being present and well distributed in Pantoea. Although this could suggest that the system was acquired horizontally in Pantoea, it is also possible that the system was acquired ancestrally and there has been a complete loss of the SPI-2 T3SS in Erwinia. The phylogenetic grouping of Y. regensburgei and PstDC283 in the system tree suggests one of two things: 1) that these two SPI-2 systems may have been transferred between these two species, or 2) that the system was acquired ancestrally prior to the speciation of

Pantoea and was lost from the Erwinia. The latter model would require the acquisition of the SPI-2 T3SS prior to the Pantoea/Erwinia group and the

Yokenella/Shigella/Salmonella group followed by the loss of the system in Erwinia,

Salmonella and Shigella. The Salmonella species would then have re-acquired a SPI-2

T3SS from a different lineage. Since this model for evolution would require at least 4 evolutionary steps, it is less parsimonious than the first model in which there is only a single transfer event between Pantoea and Yokenella. Additionally, the close grouping of these two species with Yersinia aldovae, whose position does not match the species tree, suggests that multiple exchanges have occurred. Interestingly, the ΔGC values of the

SPI-2 system were closer to 0 than those of the SPI-1 system, which is unexpected given the initial phylogenetic results that strongly suggest horizontal transfer (Figure 9). This suggests that these systems have either had some time to adapt to the genome, or that they may have been acquired from species that had similar genome %GC. In fact,

33

Figure 8: Phylogenetic analysis and evolution of the animal SPI-2 T3SS. The species tree from Figure 4 is shown on the left. The tree on the right was constructed using concatenation of SsaC, SsaN, SsaU, and SsaV from the SPI-2 T3SS. Species that do not have a SPI-2 system were greyed out in the species tree. Only one representative from Yersinia enterocolitica and one representative from Burkholderia were used for the SPI-2 analysis, 105r and MSMB43 respectively.

34

Figure 9: Gene synteny and GC content of the animal SPI-2 T3SSs. The SPI-2 T3SS gene clusters and GC content were overlayed onto the SPI-2 gene phylogeny. Delta GC was calculated as %GC of the genome - %GC of the T3SS as a means to simplify the comparison of %GC between the T3SS and the genome. The genes used to construct the concatenated T3SS tree are shown highlighted, darkest to lightest: SsaN, SsaC, SsaV and SsaU.

35

Y. aldovae, PstDC283, Y. regensburgei, and S. glossinidius all have genome %GCs within 7% of one another and their T3SS %GCs are within 4% of one another, 52%,

54%, 55% and 56% respectively. Additionally, these four systems also have T3SSs that are closely related phylogenetically with high bootstrap support, and their genomic organization is consistent within groups, suggesting horizontal transfer of the T3SS within this group (Figure 9). Additional resolution of the SPI-2 system phylogeny would be beneficial for determining the direction of transfer.

The PstDC283 ΔGC was +1, suggesting that the SPI-2 was acquired from a species with a very similar genomic %GC, or it is ancestral and has acclimatized to the genome. If we examine the first scenario, between Y. aldovae and Y. regensburgei, the species with the T3SS that has a %GC closest to the PstDC283 T3SS is Y. regensburgei,

54%, which is the same %GC of the coding sequences of the PstDC283 SPI-2 system. In conjunction with the phylogenetic data, it is difficult to determine whether the PstDC283

SPI-2 T3SS is from Y. regensburgei, or vice versa. On the other hand, since the SPI-2

T3SS was not detected in the Erwinia group studied, this model for the evolution of the

SPI-2 T3SS is less parsimonious. A more extensive dataset of the SPI-2 T3SS from

Yokenella, Pantoea and closely related species would need to be analyzed to determine which of these is most likely.

Bioinformatic identification and analysis of type III secreted effectors

A bioinformatic survey was used to identify potential T3SEs in the PstDC283 genome. A Perl script was used to extract open reading frames (ORFs) from genome files to search for the characteristic T3SE N-terminal secretion signal, in addition to extracting

36 the 200 bases upstream from the ORFs to search for the T3SS-specific promoter boxes.

Although the N-terminal secretion signal and the T3SS promoter boxes appear to be specific enough to positively identify T3SEs in the genome, there are always false positives. To reduce these, the characteristic N-terminal secretion signal was used in tandem with the presence of the promoter box to identify candidate T3SEs. Some effector proteins are physically linked to the T3SS gene cluster, while the majority are dispersed randomly throughout the genome (Frederick, Ahmad et al. 2001; Ham, Majerczak et al.

2006). The strategy for the identification of candidate effectors can be seen in Figure 10.

The survey identified 4 genes that met the criteria for a promoter for the Hrc 1 plant T3SS, 21 for the SPI-1 and 17 genes for the SsrA box SPI-2, while 260 genes in

PstDC283 met all criteria for the N-terminal secretion signal (Table 1). Of these, 12 genes had both the N-terminal secretion signal and one of the three promoter boxes, making these excellent candidate T3SEs. To ensure that other putative effectors were not being discarded, genes with just the T3SS promoter boxes, and the 260 genes with just the N-terminal secretion signal were also examined further.

Hrc 1 type III secretion effectors

Of the candidate effectors in PstDC283, 4 had a plant-specific T3SS promoter box (called a Hrp box), including wtsE, a known plant T3SE; hrpA, a component of the

T3SS; hrcJ, a secreted T3SS protein; and, schF, a T3SE chaperone protein (Table 2). wtsE is an AvrE family effector protein that has been identified and well studied in

PstDC283 and other plant pathogens (Frederick, Ahmad et al. 2001; Ham, Majerczak et al. 2006; Ham, Majerczak et al. 2008; Ham, Majerczak et al. 2009). wtsE has been shown

37

Extracted all open reading frames from genome

Extracted the first 150 nucleotides of the open reading frame

Extracted 200 bp upstream of each open reading frame

Translated the 150 nucleotides to amino acids

Pool of candidate Pool of candidate ILVAP in the 3rd or 4th amino Plant HrcN 1 effectors effectors acid position

Pool of candidate SPI-1 Pool of candidate No D or E in the first 12 effectors effectors amino acids

Pool of candidate SPI-2 Pool of candidate Greater than 10% S in the effectors effectors first 50 amino acids

Blast against the Genbank database

Homologous to genes in other species Homologous to effectors in other species No close homologues in the database, of Pantoea involved in metabolism with a similar T3SS, or homologous to and not homologous to genes in other and normal cell processes hypothetical proteins in these species species of Pantoea

False positives Candidate effectors Candidate effectors

Figure 10: Strategy for identifying candidate type III effectors.

38

Table 1: Summary of results from the bioinformatic prediction of type III secreted effectors Promoter Box N-terminal Total Number N-terminal secretion Species HrcN 1 SPI-1 SPI-2 sequence and Genes signal promoter box Agrobacterium sp. H3-13 2752 139 1 5 3 0 Burkholderia pseudomallei K96243 6255 269 3 15 5 1 Chromobacterium violaceum ATCC 12472 4405 266 1 14 11 3 Dickeya dadantii 3937 4571 308 7 10 15 6 Dickeya zeae Ech1591 4163 262 5 10 16 8 Erwinia amylovora CFBP1430 3677 238 6 8 16 8 Erwinia pyrifoliae Ep-96 3645 246 6 5 18 9 Erwinia tasmaniensis ET-99 3427 226 3 8 10 6 Pantoea stewartii subsp. stewartii DC283 5109 260 4 22 17 12 Pectobacterium carotovorum subsp. atroseptica SCRI1043 4510 335 6 14 15 9 Pectobacterium carotovorum PC1 4246 286 7 13 15 7 Providencia stuartii 25827 4737 327 0 9 21 16 Pseudomonas fluorescens WH6 6024 371 6 19 10 3 Pseudomonas syringae DC3000 5481 427 30 12 9 17 Salmonella typhimurium 1344 4531 253 0 13 21 7 LT2 4423 242 0 13 20 6 Salmonella enterica RKS4594 4574 250 0 9 24 7 Shewanella baltica OS195 4499 405 1 12 18 8

Sodalis glossinidius str. morsitans 2432 115 0 6 11 1 Yersinia aldovae 35236 3967 253 2 10 18 3 Yersinia enterocolitica 105R 3936 266 3 8 23 5 Yersinia enterocolitica 8081 4053 271 4 6 16 5 Yokenella regensburgei 43003 4722 266 0 8 20 4 Average 4354 273 4 11 15 7

39

Table 2: Genes with a plant T3SS promoter (HrcN1, Hrp box) in P. stewartii DC283, E. amylovora, E. pyrifoliae and E. tasmaniensis. Protein sequence GenBank length (aa); % Species accession no. Gene product in Pst DC283 Closest homologue (GenBank accession no.) identity (E-value) P. stewartii ZP_09830825 WtsE avirulence protein P. agglomerans DspE (AF271717) 1835; 69 (0.0) ZP_09830830 T3SS HrpF family protein E. tasmaniensis HrpF (YP_001906480) 74; 85 (8e-38) ZP_09830835 T3SS pilus protein HrpA E. amylovora HrpA (YP_003529900) 48; 81 (53-18) ZP_09830840 T3SS HrpJ component P. agglomerans HrpJ (CAC43012) 388; 77 (0.0) ZP_09831082 T3SS chaperone protein SchF E. pyrifoliae ShcF (YP_002648387) 136; 68 (1e-64) E. amylovora YP_003529893 T3SS HrpJ component E. pyrifoliae HrpJ (YP_002650067) 388; 96 (0.0) YP_003529900 T3SS pilus protein HrpA E. pyrifoliae HrpA (YP_002650062) 75; 100 (6e-24) YP_003531627 hypothetical protein EAMY_2269 E. amylovora ShcF (CBX81149) 136; 99 (6e-95) E. pyrifoliae YP_002648387 T3SS chaperone protein SchF E. amylovora hypothetical protein (YP_003531627) 136; 95 (2e-92) YP_002650062 T3SS pilus protein HrpA E. amylovora HrpA (YP_003529900) 48; 98 (1e-23) YP_002650067 T3SS HrpJ component E. amylovora HrpJ YP_003529893) 388; 96 (0.0) E. tasmaniensis YP_001906470 T3SS HrpJ component E. pyrifoliae HrpJ (YP_002650067) 388; 89 (0.0) YP_001906475 T3SS pilus protein HrpA E. pyrifoliae HrpA (YP_002650062) 54; 96 (3e-27) YP_001906489 HrpW E. pyrifoliae HrpW (YP_002650048)* 153; 74 (4e-38) YP_001906489 HrpW *HrpW had similarities to both ends but not the middle 73 nt 221; 87 (2e-120)

40 to be essential for the virulence of PstDC283 in corn, and even has cell death-inducing capabilities and defence suppressing activities in non-host plants in addition to being shown to prevent yeast growth (Ham, Majerczak et al. 2006; Ham, Majerczak et al.

2008). The identification of this known effector validated the bioinformatic approach and the use of the Hrp box sequence. Similarly, hrpA is a known structural component of the

T3SS (Taira, Tuimala et al. 1999). hrpA is the pilus subunit and is produced and secreted through the pore in order to form the pilus necessary for the translocation of effectors, and was also expected to be recovered (Taira, Tuimala et al. 1999; Brown, Mansfield et al. 2001). hrpJ on the other hand, though annotated as an outer membrane component of the T3SS, appears to be secreted through the T3SS and plays an important role in virulence once inside host cells (Fu, Guo et al. 2006; Crabill, Karpisek et al. 2012).

Mutational analysis showed that a hrpJ mutant was non-pathogenic, and though the T3SS was formed, hrpJ mutants lacked the ability to inject harpin and effectors into plant cells

(Deng and Huang 1999; Crabill, Karpisek et al. 2012). Furthermore, hrpJ is required for the wild-type elicitation of the hypersensitivity response (Nissinen, Ytterberg et al. 2007).

As a result, since hrpJ is transported into host cells, it is not surprising that it has both the secretion signal and the promoter box. It clearly plays an important role in pathogenesis and the regulation of effector secretion.

Lastly, schF was identified through the Hrc 1 effector survey. schF is a known and well-studied gene that codes for the chaperone protein for T3SEs hopF1 and hopF2

(Desveaux, Singer et al. 2006; Robert-Seilaniantz, Shan et al. 2006). A closer look at the gene up stream of schF in the PstDC283 genome revealed that schF may be the chaperone for Eop3, a T3SE found in E. amylovora that is homologous to the HopX

41 family of effectors (Nissinen, Ytterberg et al. 2007). hopX, formerly known as avrPphE, has previously been shown to suppress effector induced hypersensitivity response of

Pseudomonas syringae pv. tomato DC3000 in host Arabidopsis thaliana (Jamir, Guo et al. 2004). It was also shown that hopX can suppress Bax-induced programmed cell death in yeast (Jamir, Guo et al. 2004). Eop3 of the highly virulent strains of E. amylovora

Ea273, renamed HopX1Ea (Bocsanczy, Schneider et al. 2012), was shown to delay the hypersensitivity response when inoculated into Nicotiana benthamiana but not into

Nicotiana tabacum (Bocsanczy, Schneider et al. 2012). More interestingly, when over expressed in E. amylovora Ea273, the development of disease in apple shoots was significantly reduced (Bocsanczy, Schneider et al. 2012). These functions and the functions of hopX would suggest that Eop3 in PstDC283 functions to suppress a hypersensitivity response in host plants, although a functional assay would be necessary to confirm this.

In addition to these effectors found in PstDC283, the genes with Hrp boxes from

E. amylovora, E. pyrifoliae, and E. tasmaniensis were examined in detail and compared to the effector results from the PstDC283 Hrp box survey (Table 2). hrpA and hrpJ were identified in all three species of Erwinia (E. amylovora, E. pyrifoliae, and E. tasmaniensis), and schF was identified in E. amylovora, E. pyrifoliae but not in E. tasmaniensis. However, hrpW, a harpin, was identified in E. tasmaniensis but not in the other two species of Erwinia by the promoter box survey. BLAST was used to determine that there is no hrpW homologue in PstDC283.

Due to the low number of Hrp boxes identified in the PstDC283 genome, a secondary Hrp box analysis was initiated. This analysis involved refining the Hrp box

42 promoter consensus sequence using the four identified Hrp boxes from hrpA, hrpJ, schF, and wtsE. This second Hrp box identified three new candidate effectors, none of which were found to have an N-terminal secretion signal, two false positives and hrpF, a putative translocon protein required for pathogenesis and is proposed to be a mediator for effector delivery at the bacterial-plant interface (Buttner, Nennstiel et al. 2002).

Surprisingly, it was reported that hrpF has a secretion signal (Buttner, Nennstiel et al.

2002), yet it was not recovered in the 260 proteins that had predicted secretion signals.

Upon closer inspection it was discovered that hrpF in PstDC283 meets two of the three selection criteria for the N-terminal secretion sequence, further evidence that there are exceptions to the specific criteria used for bioinformatic identification of T3SEs.

SPI-1 type III secretion effectors

The search for the SPI-1 promoter box was more difficult than the search for plant and SPI-2 effectors. This was primarily due to the lack of a HilA homologue in

PstDC283, which prompted the search for a novel regulatory protein for the PstDC283

SPI-1 T3SS before an analysis to identify a novel SPI-1 promoter box. Analysis to find a novel regulatory protein involved examining all genes in the SPI-1 for conserved domains similar to those of HilA, primarily “trans_reg_c” DNA binding domain from the

HTH superfamily (Marchler-Bauer, Lu et al. 2011). Two candidate regulators were found, CadC and MixE. CadC has high similarity to the trans_reg_c domain from the

HTH superfamily with a DNA binding site (E value = 5.43e-14). A homologue of the

CadC gene was also identified in Y. regensburgei, which also lacks a functional HilA protein. Secondly, during the search for a protein with the HTH superfamily domain,

43

MixE from PstDC283 was found to have high similarity to a transcriptional regulator

InvF domain (PRK15340) (E value 4.45e-24). A similar protein was also identified in Y. regensburgei with an E value = 1.93e-24. Although these are strong candidates for the regulation of the SPI-1 system, further functional analysis is required to determine if these two genes play a role in regulation.

To identify a potential promoter box that could be used to identify SPI-1 T3SEs,

the promoter regions from mixE and psaF the genes at the beginning of the two

major structural operons of the SPI-1 gene cluster were analyzed for a common

promoter box. The DNA box, [CG]GCnnTnnAnnnnAnnGAA, appeared in both

sequences. When this sequence was applied as a filter to the upstream regions of

all genes in the PstDC283 genome, 21 candidates were identified. The promoter

regions of the genes used to generate the motif (mixE and psaF) were identified

(Table 3), but the other 19 genes were also present in Pantoea isolates that lack a

T3SS and associated with metabolism, suggesting these are false positives. This

means that either a) there is insufficient information from these three sequences to

positively identify a meaningful pattern to search for in the upstream regions of

the PstDC283, b) the regulation of the SPI-1 system is more complicated than the

SPI-1 T3SS from Salmonella, or c) there are no T3SEs for the SPI-1 system in the

PstDC283 genome. In the absence of additional candidate effectors for this

system, functional testing would be required to definitively identify a promoter

consensus sequence for the regulatory element of the SPI-1 T3SS in PstDC283.

44

Table 3: SPI-1 promoter search results for P. stewartii DC283 and E. amylovora, E. pyrifoliae, E. tasmaniensis, S. glossinidius, Y. aldovae and Y. regensburgei. Protein sequence GenBank length (aa); % Species accession no. Gene product in Pst DC283 Closest homologue (GenBank accession no.) identity (E-value) P. stewartii ZP_09831173 T3SS regulatory protein MixE Y. regensburgei transcriptional regulator (ZP_09390285) 287; 63 (1e-106) ZP_09831172 T3SS apparatus protein psaF Y. regensburgei PrgH (ZP_09390287) 245; 62 (6e-104) E. amylovora - - - - E. pyrifoliae - - - - E. tasmaniensis - - - - S. glossinidius YP_454241 T3SS apparatus InvG S. glossinidius YsaC (AAS66838) 586; 100 (0.0) YP_454524 hypothetical protein SG0844 P. stewartii hypothetical protein CKS_5694 (ZP_09831446) 84; 99 (5e-52) Y. aldovae - - - - Y. regensburgei - - - -

45

The three candidate T3SEs for SPI-1 identified from the PstDC283 genome were compared to the candidates identified in E. amylovora, E. pyrifoliae, E. tasmaniensis, S. morsitans, Y. aldovae and Y. regensburgei (Table 3) by Blast. These results showed that there were no shared candidate effectors for any of the species, however, the S. glossinidius screen identified invG, a component of the T3SS, which was not only found to have the new proposed SPI-1 promoter box, but also had an N-terminal secretion signal. A conserved hypothetical protein from S. glossinidius also matched a hypothetical protein in PstDC283, but this protein was not recovered by the SPI-1 promoter search in

PstDC283. Despite the fact that this hypothetical protein does not have any conserved domains and no secretion signal, it still has the potential to be an effector for the SPI-2 system and further testing is required to determine its role, if any in the pathogenicity of

PstDC283.

SPI-2 type III secretion effectors

The promoter box for SPI-2, the SsrA box, which was defined in the Salmonella system, was used in a survey to identify SPI-2 T3SSs (Tomljenovic-Berube, Mulder et al.

2010). Both strong and weak SsrA box consensus sequences were identified in a previous study that analyzed Salmonella (Tomljenovic-Berube, Mulder et al. 2010). The strong box pattern was extremely specific whereas the weak box pattern had more variability in the sequence and thus, lower stringency. There were candidate effectors in PstDC283 that were identified by the strong box that were not identified by the weak box and vice versa.

In PstDC283, the SsrA box survey identified 17 candidate T3SEs for the SPI-2

T3SS (Table 4); however, only 6 of these candidates were examined in further detail,

46

Table 4: SPI-2 promoter (SsrA box) survey results for P. stewartii DC283, S. glossinidius, Y. aldovae, and Y. regensburgei. Protein sequence GenBank length (aa); % Species accession no. Gene product in Pst DC283 Closest homologue (GenBank accession no.) identity (E-value) Pst DC283 ZP_09829532 translocated effector protein SifA S. enterica SifA (AF325833_2) 343; 26 (2e-18) ZP_09830229 hypothetical protein CKS_1167 Rahnella sp. hypothetical protein (YP_004211931) 571; 58 (0.0) ZP_09831121 T3SS apparatus protein SsaR Y. regensburgei T3SS YscR family protein (ZP_09390311) 251; 98 (1e-129) ZP_09831134 T3SS apparatus protein SsaG Y. regensburgei T3SS needle protein (ZP_09390324) 71; 94 (9e-41) ZP_09831147 T3SS apparatus protein SsaC Y. regensburgei T3SS YscC (ZP_09390336) 505; 95 (0.0) S. glossinidius YP_455765 T3SS SpaO protein Candidatus Sodalis SpaO (AFH88792) 338; 71 (4e-165) YP_454961 secreted effector protein Candidatus Sodalis SsaB (AFH88803) 132; 72 (9e-57) YP_454972 T3SS apparatus protein SsaG Candidatus Sodalis SsaG (AFH88807) 71; 89 (1e-38) YP_454985 T3SS apparatus protein SsaR Candidatus Sodalis SsaR (AFH88820) 215; 95 (3e-128) Y. aldovae ZP_04619023 pathogenicity island protein C Y. regensburgei putative T3SS protein (ZP_09390337) 107; 73 (1e-41) ZP_04619034 T3SS apparatus protein SsaG Y. regensburgei T3SS needle protein SsaG (ZP_09390324) 71; 92 (3e-41) ZP_04619041 T3SS protein ssaM Y. regensburgei T3SS apparatus protein SsaM (ZP_09390317) 122; 66 (3e-50) ZP_04619047 T3SS apparatus protein YscR Y. regensburgei T3SS apparatus protein YscR (ZP_09390311) 215; 89 (2e-122) Y. regensburgei ZP_09390311 T3SS apparatus protein YscR P. stewartii SsaR (ZP_09831121) 215; 98 (1e-129) ZP_09390324 T3SS needle protein SsaG Y. aldovae SsaG (ZP_04619034) 71; 92 (3e-41) ZP_09390336 T3SS outer membrane pore YscC P. stewartii SsaC (ZP_09831147) 494; 95 (0.0) ZP_09390337 putative T3SS protein P. stewartii secreted effector protein (ZP_09831148) 133; 88 (1e-70)

47 since the other 11 candidates were found to be genes responsible for metabolic processes or regulation present in other species of Pantoea lacking a T3SS. Of the six candidates, five were identified as known T3SS-associated proteins, including three apparatus proteins and two homologs of known secreted effectors, Effector 1 and Effector 2. The apparatus proteins identified were SsaC, SsaG and SsaR, a putative outer ring protein, a putative needle component protein, and a putative component of the type III secretion apparatus, respectively (Hensel, Shea et al. 1997; Aizawa 2001; Rappl, Deiwick et al.

2003). The first candidate T3SE, Effector 1, shares weak similarity with the previously characterized T3SE ssaB (E value: 4e-24), and is homologous to a putative pathogenicity island protein of Y. regensburgei ATCC 43003 (E value: 1e-70), pathogenicity island protein C of Y. aldovae ATCC 35236 (E value: 1e-37), and a secreted effector protein in

S. morsitans (E value: 8e-25). Effector 1 was also found to have a conserved domain match, and shares homology with SipC, a provisional pathogenicity island chaperone protein (E value: 2.70e-34). SipC is a well known and characterized chaperone protein that along with SsaM and is necessary for the ordered secretion of SPI-2 effectors in

Salmonella (Yu, Ruiz-Albert et al. 2002; Yu, Liu et al. 2004). Effector 2 has a Sif super family domain (E value: 8.55e-24) and shares weak similarity with SifA (E value: 2e-18), which promotes intracellular survival and replication by associating with the Salmonella- containing vacuole and the Salmonella-induced filaments (Brumell, Goosney et al. 2002;

Knodler, Vallance et al. 2003; Alto, Shao et al. 2006). Although both of these candidate effectors have known homologues, a full functional analysis would be necessary to determine their function in PstDC283 host-association.

48

The last candidate effector of the six identified is a hypothetical protein that shares strong similarity with other hypothetical proteins in Rahnella sp. Y9602 (E value:

0.0), Yersinia mollaretii ATCC 43969 (E value: 0.0). It has two conserved domains, a predicted ATPase domain (E value: 3.27e-21), and a DUF2326 super family domain, found in conserved hypothetical proteins, but has no known function (E value: 6.83e-13).

The entire set of candidate effectors from the SPI-2 SsrA promoter box search of

PstDC283 were compared to all SsrA hits for S. morsitans, Y. aldovae and Y. regensburgei in order to determine if the promoter box survey had the ability to identify

T3SEs in closely related SPI-2 T3SSs. SsaG and YscR were all identified in the three species, suggesting that the SsrA box is conserved (Table 4). In S. glossinidius, SpaO and

SsaB were also identified, which are required for the needle complex assembly and an effector that inhibits cellular trafficking in macrophages (Galan 2001; Hindle, Chatfield et al. 2002). A putative T3SS protein and SsaM, a protein that has been shown to be essential for the secretion of certain T3SEs (Yu, Liu et al. 2004), in Y. aldovae, and YscC and a putative effector protein, which shares similarity to a secreted effector protein in

PstDC283, were identified in Y. regensburgei.

Similar to the Hrp box and SPI-1 promoter box surveys, the survey for the SsrA box was also conducted using a consensus promoter from the six aforementioned genes.

The consensus sequence for this box is

T[ACT][CA][GA]GT[CAT][AT][AT]N[AT][CAT]C T[GT]AT. These results did not yield any additional candidate effectors.

49

N-terminal secretion signal survey

The bioinformatic analysis identified 12 candidate effectors from the genome that were found to have both a T3SS promoter box and the N-terminal secretion signal in

PstDC283. However, of these 12 genes, only two were identified as potential

T3SEs, hrpJ (Hrc 1) and sifA (SPI-1). The other 10 candidates were found to have homology to known genes involved in metabolic and regulatory processes in other species of Pantoea that do not have a T3SS, suggesting these were false positives. To identify possible false-negatives, the 260 genes identified in the N-terminal secretion signal survey were compared against the Genbank database, and 42 were identified as potential candidate T3SEs. Of these 42 genes, several were known flagellar genes, bacterial phage genes and genes involved in metabolism, and were removed from the pool. The remaining 15 genes were not found in other species of Pantoea lacking a T3SS, and were analysed in further detail (Table 5).

Of the 15 genes, six are Hrc 1 T3SS-associated, including components of the apparatus, hrpJ, hrcV, and hrcQ, two known SPI-1 effectors, ospC1 and sipD (IpaD in

Table 5), and one known SPI-2 effector sifA. The remaining two plant apparatus genes, hrcV and hrcQ are an inner membrane channel protein and a cytoplasmic protein of the

T3SS respectively (Viprey, Del Greco et al. 1998; Rossier, Van den Ackerveken et al.

2000). ospC1 and sipD are both Shigella virulence genes, ospC1 contribute to the polymorphonuclear transepithelial migration of Shigella (Zurawski, Mitsuhata et al.

2006) and ipaD, the Shigella homologue of sipD, has been shown to play a crucial role in gaining entry into epithelial cells and escaping the phagosome

50

Table 5: N-terminal secretion signal survey results for P. stewartii DC283 Protein sequence GenBank length (aa); % Identified Identified by accession no. Gene product in Pst DC283 Closest homologue (GenBank accession no.) identity (E-value) by Blast Promoter box ZP_09827682 hypothetical protein CKS_3879 S. enterica hypothetical protein (EHC71380) 95; 98 (4e-60) x ZP_09827693 hypothetical protein CKS_3890 No Homologues 374; N/A x ZP_09827752 hypothetical protein CKS_3957 hypothetical protein (EKA98854) 62; 69 (2e-25) x ZP_09828622 hypothetical protein CKS_2598 E. coli hypothetical protein (EII22819) 197; 43 (6e-43) x ZP_09828815 hypothetical protein CKS_2381 E. coli hypothetical protein (ZP_06657356) 342; 80 (0.0) x ZP_09829527 hypothetical protein CKS_1634 hypothetical protein (EKB70366) 80; 38 (4e-11) x ZP_09829532 translocated effector protein SifA S.enterica SifA (AF325833_2) 343; 26 (2e-18) x x ZP_09830840 T3SS HrpJ component P. agglomerans HrpJ (CAC43012) 388; 77 (0.0) x x ZP_09830841 T3SS protein HrcV P. agglomerans HrcV (CAC43013) 717; 90 (0.0) x ZP_09830846 T3SS protein HrcQ E. pyrifoliae HrcQ (ABA39816) 319; 56 (1e-117) x ZP_09831166 hypothetical protein CKS_4542 Y. regensburgei hypothetical protein (ZP_09390293) 124; 48 (5e-26) x ZP_09831189 IpaD family protein Y. regensburgei IpaD (ZP_09390269) 382; 55 (2e-118) x ZP_09831198 OspC1 family protein Shigella flexneri OspC1 (NP_858227) 480; 54 (6e-161) x ZP_09831387 hypothetical protein CKS_5585 No Homologues 136; N/A x ZP_09831476 hypothetical protein CKS_5533 Photorhabdus asymbiotica insecticidal toxin (YP_003041066) 1856; 38 (2e-148) x

51

(Menard, Sansonetti et al. 1993). Of the 15 genes, only hrpJ and sifA were identified through both the promoter box search as well as the N-terminal secretion signal search.

Of the remaining 9 hypothetical proteins, one had weak similarity to conserved domains, the E3 ubiquitin-protein ligase SopA domain (E value, 4.70e-4) and a pentapeptide repeats-containing domain of unknown function (E value, 4.21e-05). Both of these domains are located in the same region of the hypothetical protein. Of particular interest was the hypothetical protein that shares high similarity to an insecticidal toxin in

Photorhabdus asymbiotica, Yersinia ruckeri, and E. tasmaniensis (E values = 2e-148, 3e-

147, and 5e-102, respectively). However, this protein has no conserved domains.

The promoter box analyses identified several putative novel effectors: 1 Hrc 1, 1

SPI-1, and 3 SPI-2 effectors in addition to identifying several potential effectors through the N-terminal secretion signal survey (Table 5). Functional analysis is required in order to definitively determine the role that these genes play in the general biology and host- association of PstDC283. In addition, the promoter box and N-terminal secretion signal analyses also identified 5 proteins previously associated with the Hrc 1 system, 3 proteins previously associated with the SPI-1 system and 4 proteins previously associated with the

SPI-2 T3SS. Although the bioinformatics surveys only identified a few novel effectors, their ability to successfully identify previously identified components of the respective

T3SSs in both PstDC283 and closely related species with T3SSs validates the use of bioinformatic analyses for T3SE identification.

52

Concluding Remarks

Pantoea sp. are ubiquitous plant pathogens and opportunistic pathogens that have been attributed to numerous infections in both commercially relevant crops and nosocomial settings. The presence of T3SSs distributed among Pantoea and the identification of the presence of three different T3SSs in both clinical isolates and a plant isolate of Pantoea provides insight into the versatility of the T3SS in host-association, and our lack of understanding of Pantoea biology.

This work revealed that the plant T3SS in Pantoea is the most ancestral, followed by the SPI-1 T3SS and the SPI-2 T3SS. The plant Hrc 1 T3SS phylogenetic analysis showed extremely strong support for the ancestral acquisition of the Hrc 1 T3SS.

Additionally the T3SS gene cluster itself is extremely conserved across species and there are species-specific insertions and deletions, which indicate that this system was acquired prior to speciation. The SPI-1 T3SS also supports a model for the vertical inheritance prior to the speciation of the Pantoea and Erwinia. Although there is also the possibility that the Yokenella and Pantoea SPI-1 T3SSs may have been transferred horizontally at some point, additional genomic data from additional species in each genus would be required to determine if there is any possibility that there was vertical inheritance followed by loss. Finally the SPI-2 system analysis revealed two potential models for the evolution of the SPI-2 system: 1) horizontal exchange with between Pantoea and

Yokenella and 2) ancestral acquisition prior to the separation of the Erwinia/Pantoea group and the Shigella/Yokenella/Salmonella group. However, of these two possibilities, the horizontal exchange between Pantoea and Yokenella is far more parsimonious than an ancestral acquisition.

53

This work suggests that Pantoea was an ancestral plant pathogen that evolved the ability to associate with animal hosts as either vectors for plant pathogenesis, or alternatively, as primary hosts that they can exploit either opportunistically or when appropriate host plants are unavailable. Although the functions of these novel T3SSs have yet to be determined, and there is no correlation between the type of isolate of

Pantoea (clinical, environmental or plant) with the presence of a certain type of T3SS, their distribution among known plant and clinical isolates suggests complex patterns of dual host pathogenicity and cycles of disease that await discovery.

54

Future Directions

This study has set the ground work for future evolutionary analysis of T3SSs in addition to bioinformatic surveying for T3SEs in the genus Pantoea. The next step for this project is mutating the ATPase components of each of the systems in order to determine the contribution of the T3SSs to colonization of both animal and plant hosts. In addition, the role of the T3SEs identified can also be tested in the lab. This data could be combined with the work of Geeta Nadarasah (Nadarasah 2012), which included host assays for the entire collection of Pantoea isolates in corn, onion and fruit flies, and the work Amanda Dancsok (Dancsok 2012), which includes a nematode host assay of several of the species.

Finally, the bioinformatics work can continue with the recent completion of sequencing of 6 genomes from our collection that have one or more T3SSs. These systems can be added to the phylogeny, and the genomes can be surveyed for effector proteins as well. Though this project has revealed several secrets of the T3SSs in the genus Pantoea, there are still numerous exciting directions for further exploration of the role of the T3SS in host-microbe interactions.

55

List of References

Alfano, J. R., A. O. Charkowski, et al. (2000). "The Pseudomonas syringae Hrp pathogenicity island has a tripartite mosaic structure composed of a cluster of type III secretion genes bounded by exchangeable effector and conserved effector loci that contribute to parasitic fitness and pathogenicity in plants." Proc Natl Acad Sci U S A 97(9): 4856-4861.

Alfano, J. R. and A. Collmer (1997). "The type III (Hrp) secretion pathway of plant pathogenic bacteria: trafficking harpins, Avr proteins, and death." J Bacteriol 179(18): 5655-5662.

Alfano, J. R. and A. Collmer (2004). "Type III secretion system effector proteins: double agents in bacterial disease and plant defense." Annu Rev Phytopathol 42: 385- 414.

Bartsev, A. V., W. J. Deakin, et al. (2004). "NopL, an effector protein of Rhizobium sp. NGR234, thwarts activation of plant defense reactions." Plant Physiol 134(2): 871-879.

Betts, H. J., R. R. Chaudhuri, et al. (2004). "An analysis of type-III secretion gene clusters in Chromobacterium violaceum." Trends in Microbiology 12(11): 476- 482.

Choi, O., J. Y. Lim, et al. (2012). "Complete Genome Sequence of the Rice Pathogen Pantoea ananatis Strain PA13." Journal of Bacteriology 194(2): 531.

Cornelis, G. R. (2002). "Yersinia type III secretion: send in the effectors." J Cell Biol 158(3): 401-408.

Cornelis, G. R. (2006). "The type III secretion injectisome." Nat Rev Microbiol 4(11): 811-825.

Correa, V. R., D. R. Majerczak, et al. (2012). "The Bacterium Pantoea stewartii Uses Two Different Type III Secretion Systems To Colonize Its Plant Host and Insect Vector." Applied and Environmental Microbiology 78(17): 6327-6336.

Correa, V. R., D. R. Majerczak, et al. (2008). "Characterization of a Pantoea stewartii TTSS gene required for persistence in its flea beetle vector." Phytopathology 98(6): S41-S41.

Costa, S. C., A. M. Schmitz, et al. (2012). "A new means to identify type 3 secreted effectors: functionally interchangeable class IB chaperones recognize a conserved sequence." MBio 3(1).

56

Dale, C., T. Jones, et al. (2005). "Degenerative evolution and functional diversification of type-III secretion systems in the insect endosymbiont Sodalis glossinidius." Mol Biol Evol 22(3): 758-766.

Dale, C., S. A. Young, et al. (2001). "The insect endosymbiont Sodalis glossinidius utilizes a type III secretion system for cell invasion." Proceedings of the National Academy of Sciences of the United States of America 98(4): 1883-1888.

Dancsok, A. (2012). Comparative virulence assays of environmental and clinical isolates of Pantoea sp. on nematodes. Department of Chemistry and Biochemistry. Regina, University or Regina. B.Sc.: 41.

Dieye, Y., K. Ameiss, et al. (2009). "The Salmonella Pathogenicity Island (SPI) 1 contributes more than SPI2 to the colonization of the chicken by Salmonella enterica serovar Typhimurium." BMC Microbiology 9(1): 3.

Foultier, B., P. Troisfontaines, et al. (2003). "Identification of substrates and chaperone from the Yersinia enterocolitica 1B Ysa type III secretion system." Infect Immun 71(1): 242-253.

Galan, J. E. (2001). "Salmonella interactions with host cells: type III secretion at work." Annu Rev Cell Dev Biol 17: 53-86.

Galan, J. E. and A. Collmer (1999). "Type III secretion machines: bacterial devices for protein delivery into host cells." Science 284(5418): 1322-1328.

Gavini, F., J. Mergaert, et al. (1989). "Transfer of Enterobacter agglomerans (Beijerinck 1888) Ewing and Fife 1972 to Pantoea gen. nov. as Pantoea agglomerans comb. nov. and Description of Pantoea dispersa sp. nov." International Journal of Systematic Bacteriology 39(3): 337-345.

Gophna, U., E. Z. Ron, et al. (2003). "Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events." Gene 312: 151-163.

Grant, S. R., E. J. Fisher, et al. (2006). "Subterfuge and manipulation: type III effector proteins of phytopathogenic bacteria." Annu Rev Microbiol 60: 425-449.

Guttman, D. S., R. L. Morgan, et al. (2008). The Evolution of the Pseudomonads. M. B. Fatmi, A. Collmer, N. S. Iacobelliset al, Springer Netherlands: 307-319.

Guttman, D. S., B. A. Vinatzer, et al. (2002). "A functional screen for the type III (Hrp) secretome of the plant pathogen Pseudomonas syringae." Science 295(5560): 1722-1726.

Hansen-Wester, I. and M. Hensel (2001). "Salmonella pathogenicity islands encoding type III secretion systems." Microbes Infect 3(7): 549-559.

57

Hardt, W. D., L. M. Chen, et al. (1998). "S. typhimurium encodes an activator of Rho GTPases that induces membrane ruffling and nuclear responses in host cells." Cell 93(5): 815-826.

Hauck, P., R. Thilmony, et al. (2003). "A Pseudomonas syringae type III effector suppresses cell wall-based extracellular defense in susceptible Arabidopsis plants." Proc Natl Acad Sci U S A 100(14): 8577-8582.

Hensel, M. (2000). "Salmonella pathogenicity island 2." Mol Microbiol 36(5): 1015- 1023.

Hueck, C. J. (1998). "Type III protein secretion systems in bacterial pathogens of animals and plants." Microbiol Mol Biol Rev 62(2): 379-433.

Innes, R. W., A. F. Bent, et al. (1993). "Molecular analysis of avirulence gene avrRpt2 and identification of a putative regulatory sequence common to all known Pseudomonas syringae avirulence genes." J Bacteriol 175(15): 4859-4869.

Iriarte, M. and G. R. Cornelis (1998). "YopT, a new Yersinia Yop effector protein, affects the cytoskeleton of host cells." Molecular Microbiology 29(3): 915-929.

Jehl, M. A., R. Arnold, et al. (2011). "Effective--a database of predicted secreted bacterial proteins." Nucleic Acids Res 39(Database issue): D591-595.

Kirzinger, M. W. B., G. Nadarasah, et al. (2011). "Insights into Cross-Kingdom Plant Pathogenic Bacteria." Genes 2(4): 980-997.

Lara-Tejero, M. and J. E. Galan (2009). "Salmonella enterica serovar typhimurium pathogenicity island 1-encoded type III secretion system translocases mediate intimate attachment to nonphagocytic cells." Infect Immun 77(7): 2635-2642.

Lopez-Solanilla, E., P. A. Bronstein, et al. (2004). "HopPtoN is a Pseudomonas syringae Hrp (type III secretion system) cysteine protease effector that suppresses pathogen-induced necrosis associated with both compatible and incompatible plant interactions." Mol Microbiol 54(2): 353-365.

Lostroh, C. P., V. Bajaj, et al. (2000). "The cis requirements for transcriptional activation by HilA, a virulence determinant encoded on SPI-1." Molecular Microbiology 37(2): 300-315.

Lostroh, C. P. and C. A. Lee (2001). "The HilA box and sequences outside it determine the magnitude of HilA-dependent activation of P(prgH) from Salmonella pathogenicity island 1." J Bacteriol 183(16): 4876-4885.

Lostroh, C. P. and C. A. Lee (2001). "The Salmonella pathogenicity island-1 type III secretion system." Microbes Infect 3(14-15): 1281-1291.

58

Lower, M. and G. Schneider (2009). "Prediction of type III secretion signals in genomes of gram-negative bacteria." PLoS One 4(6): e5917.

Makino, K., K. Oshima, et al. (2003). "Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V cholerae." Lancet 361(9359): 743- 749.

Manulis, S. and I. Barash (2003). "Pantoea agglomerans pvs. gypsophilae and betae, recently evolved pathogens?" Mol Plant Pathol 4(5): 307-314.

Marie, C., W. J. Broughton, et al. (2001). "Rhizobium type III secretion systems: legume charmers or alarmers?" Curr Opin Plant Biol 4(4): 336-342.

McDermott, J. E., A. Corrigan, et al. (2011). "Computational prediction of type III and IV secreted effectors in gram-negative bacteria." Infect Immun 79(1): 23-32.

Mudgett, M. B. (2005). "New insights to the function of phytopathogenic bacterial type III effectors in plants." Annu Rev Plant Biol 56: 509-531.

Nadarasah, G. (2012). Phylogenetic Analysis and Characterization of Plant, Environmental, and Clinical Strains of Pantoea. Department of Biology. Regina, University of Regina. M.Sc.: 162.

Naum, M., E. W. Brown, et al. (2009). "Phylogenetic evidence for extensive horizontal gene transfer of type III secretion system genes among enterobacterial plant pathogens." Microbiology 155(Pt 10): 3187-3199.

Nguyen, L., I. T. Paulsen, et al. (2000). "Phylogenetic analyses of the constituents of Type III protein secretion systems." J Mol Microbiol Biotechnol 2(2): 125-144.

Nissan, G., S. Manulis-Sasson, et al. (2006). "The type III effectors HsvG and HsvB of gall-forming Pantoea agglomerans determine host specificity and function as transcriptional activators." Mol Microbiol 61(5): 1118-1131.

Nissinen, R. M., A. J. Ytterberg, et al. (2007). "Analyses of the secretomes of Erwinia amylovora and selected hrp mutants reveal novel type III secreted proteins and an effect of HrpJ on extracellular harpin levels." Mol Plant Pathol 8(1): 55-67.

Noriea, N. F., 3rd, C. N. Johnson, et al. (2010). "Distribution of type III secretion systems in Vibrio parahaemolyticus from the northern Gulf of Mexico." J Appl Microbiol 109(3): 953-962.

Ochman, H. and E. A. Groisman (1996). "Distribution of pathogenicity islands in Salmonella spp." Infect Immun 64(12): 5410-5412.

59

Pallen, M. J., S. A. Beatson, et al. (2005). "Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective." FEMS Microbiol Rev 29(2): 201-229.

Puigbo, P., I. G. Bravo, et al. (2008). "E-CAI: a novel server to estimate an expected value of Codon Adaptation Index (eCAI)." BMC Bioinformatics 9: 65.

Rainbow, L., C. A. Hart, et al. (2002). "Distribution of type III secretion gene clusters in Burkholderia pseudomallei, B. thailandensis and B. mallei." J Med Microbiol 51(5): 374-384.

Roden, J. A., B. Belt, et al. (2004). "A genetic screen to isolate type III effectors translocated into pepper cells during Xanthomonas infection." Proc Natl Acad Sci U S A 101(47): 16624-16629.

Saier, M. H., Jr. (2004). "Evolution of bacterial type III protein secretion systems." Trends Microbiol 12(3): 113-115.

Shao, F., P. M. Merritt, et al. (2002). "A Yersinia effector and a Pseudomonas avirulence protein define a family of cysteine proteases functioning in bacterial pathogenesis." Cell 109(5): 575-588.

Stockwell, V. O., K. B. Johnson, et al. (2002). "Antibiosis Contributes to Biological Control of Fire Blight by Pantoea agglomerans Strain Eh252 in Orchards." Phytopathology 92(11): 1202-1209.

Suarez, M. and H. Russmann (1998). "Molecular mechanisms of Salmonella invasion: the type III secretion system of the pathogenicity island 1." Int Microbiol 1(3): 197-204.

Subtil, A., A. Blocker, et al. (2000). "Type III secretion system in Chlamydia species: identified members and candidates." Microbes Infect 2(4): 367-369.

Tamura, K., J. Dudley, et al. (2007). "MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0." Mol Biol Evol 24(8): 1596-1599.

Tardy, F., F. Homble, et al. (1999). "Yersinia enterocolitica type III secretion- translocation system: channel formation by secreted Yops." EMBO J 18(23): 6793-6799.

Thompson, J. D., D. G. Higgins, et al. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice." Nucleic Acids Res 22(22): 4673-4680.

60

Toh, H., B. L. Weiss, et al. (2006). "Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host." Genome Res 16(2): 149-156.

Tomljenovic-Berube, A. M., D. T. Mulder, et al. (2010). "Identification of the regulatory logic controlling Salmonella pathoadaptation by the SsrA-SsrB two-component system." PLoS Genet 6(3): e1000875.

Troisfontaines, P. and G. R. Cornelis (2005). "Type III secretion: more systems than you think." Physiology (Bethesda) 20: 326-339.

Viprey, V., A. Del Greco, et al. (1998). "Symbiotic implications of type III protein secretion machinery in Rhizobium." Mol Microbiol 28(6): 1381-1389.

Yu, X.-J., M. Liu, et al. (2004). "SsaM and SpiC interact and regulate secretion of Salmonella Pathogenicity Island 2 type III secretion system effectors and translocators." Molecular Microbiology 54(3): 604-619.

Zwiesler-Vollick, J., A. E. Plovanich-Jones, et al. (2002). "Identification of novel hrp- regulated genes through functional genomic analysis of the Pseudomonas syringae pv. tomato DC3000 genome." Mol Microbiol 45(5): 1207-1218.

61

Appendix

Appendix A. Perl scripts used in effector scan.

A.1 Separate.pl

##################################################################### # Purpose: # The purpose of this script is to separate the contigs from a genbank file into individual contig Genbank files for further processing. # # Input Files: # The script is designed to handle *.txt # # Output Files: # "Species".cintig."contig number".gb #######################################################################

#!/usr/bin/perl use strict; use warnings;

#Open file directory opendir (DIRECTORY, "F:\\");

#Declare variables my @filenames; my $filename; my @gbfile; my $contig; my $in_gb; my $line; my $i;

#Grab filenames from directory @filenames = grep (/txt/, readdir DIRECTORY); #Go through each file foreach $filename (@filenames) {

open (GBFILE, $filename); @gbfile = ; close GBFILE;

#identifies each contig, copies it and puts it into a new output file $i=1; foreach $line (@gbfile) { if( $line =~ /^\/\/\n/ ) { open (OUTFILE, ">Yokelenna.contig.$i.gb"); print OUTFILE "$contig\/\/"; close OUTFILE; $contig=''; $i++; } else { $contig .= $line; } } }

62

A.2 One Script.pl

##################################################################### # # One Script (a.k.a. One script to rule them all.pl) # # Purpose: # The purpose of this script is to extract all of the open reading frames (ORFS) from a *.gb file (NCBI genome file), search the first 50 amino acids of the protein sequence of the ORF for a type III secretion system (T3SS) specific N-terminal secretion signal and look at the 200 bases upstream (before) the ORF for the presence of one of three specific promoter boxes. # # The N-terminal secretion signal has three criteria: # 1. Greater than 10% Serine (S) in the first 50 amino acids --> S(50) # 2. Isoleucine (I), Leucine (L), Valine (V), Alanine (A) or Proline (P) in the third or fourth position --> ILVAP(3/4) # 3. No Aspartic Acid (D) or Glutamic Acid (E) in the first twelve amino acids --> DC (12) # # The strongest N-terminal sequences are the ones which have met all three of these conditions (a.k.a. Calvin). These sequences are used to identify T3SS effector proteins (T3SEs), which are targeted for secretion by the T3SS. The program also prints out any N-terminal secretion systems that meet and two of the three criteria, as well as sequences that only meet one of the criterion. Inevitably, there will be false positive results, and there will also be sequences that do not meet these criteria that are indeed T3SEs. The aim of this survey is to look for the ones that follow these criteria, which have been defined as a consensus sequence. # # The script also looks in the 200 base pairs upstream of each ORF for one of three promoter boxes (a.k.a. Hobbes): # 1. Plant T3SS Hrp Box --> GREEN a.k.a. Louis: # [GT]GGA[GA]C[TC].{15,17}CCAC # 2.Salmonella Pathogenicity Island-1 (SPI-1) T3SS HilA Box --> RED a.k.a. Huey: # G[GT]T{5,8}.{1,7}A{3,5} # 3. SPI-2 T3SS SsrA Box --> BLUE a.k.a. Dewey: # Srtong box: .T[CA]AGG[CT].[AT][AT][AT]..C.GAT # Weak box: .T.[AC][GA][GC][TC].[AT][AT].[AT][ACT]C.[GT]AT # # The promoter boxes are searched for independently, so that in the event that there is more than one box in a given upstream region (due to the presence of a false positive result), the screen will still correctly identify the correct promoter box. Also, there are two search patterns for the SsrA box as a result of two distinct patterns that emerged from Coombs et al. (2011) PloSOne. When tested, the strong box pulled up effectors that the weak box did not pull up and vice versa. Ergo, they have both been incorporated into the script (the weak box was put in as a separate promoter box, but still outputs to the same files as the strong box). # # Input Files:

63

# The script is designed to handle *.gb files only. # # Output Files: # The script is designed to print 14 output files per genome (as records), and 8 output files per run (summary files for all genomes). # # The 14 output files per genome and their formats are as follows, where $FILENAME --> the name of the organism analyzed, *.fas denotes fasta formatted output, *.table denotes tab delimited output compatible with Excel: # # $FILENAME.ORFs.fas --> Prints all ORFS from genome # $FILENAME.upstreams.fas --> Prints all upstream regions from genome # $FILENAME.aa.SeqOut.fas --> Prints all proteins from genome # $FILENAME.Nterm.3hits.fas --> Prints all N-term sequences if meets all three condtions # $FILENAME.Nterm.fas --> Prints all N-term sequences as a log file # $FILENAME.Nterm.S50.fas --> Prints all N-terms sequences that meet 10% Serine condition # $FILENAME.Nterm.ILVAP.fas --> Prints all N-terms sequences that meet ILVAP condition # $FILENAME.Nterm.DE12.fas --> Prints all N-terms sequences that meet No DE condition # $FILENAME.Nterm.2hits.fas --> Prints all N-terms sequences that meet 2 of 3 condition # $FILENAME.Hrp.Box.fas --> Prints all Hrp Box hits # $FILENAME.HilA.Box.fas --> Prints all HilA Box hits # $FILENAME.SsrA.Box.fas --> Prints all SsrA Box hits # $FILENAME.Calvin-and-Hobbes.txt --> Prints ORF if has both N-term and promoter # $FILENAME.Table.table --> Prints everything in a summary table # # The 8 ouput files per run of One Script.pl and their formats are as follows, .out denotes output file: # # N-term.Summary.out --> Prints all N-term 3 of 3 in fasta # Promoter-Box.Summary.out --> Prints all promoter boxes and which one # Calvin.and.Hobbes.Summary.out --> Prints all N-term with promoter box # Calvin.Only.Summary.out --> Prints all N-term without promoter box # Hobbes.Only.Summary.out --> Prints all promoter box without N-term # Louis.Summary.out --> Prints all Hrp Box Orfs in Fasta # Huey.Summary.out --> Prints all HilA Box Orfs in Fasta # Dewey.Summary.out --> Prints all SsrABox Orfs in Fasta ##################################################################### use warnings;

# Open location of files for analysis # Need to change directory in order for file to work opendir (DIRECTORY, "C:/Users/Morgan/Desktop/Run DMC Condensed/");

# Declare all filename variables for the output files # Names do not need to be changed as they are declared in the program my $OutputFileName; my $OutputFileName1; my $OutputFileName2; my $OutputFileName3; my $OutputFileName4; my $OutputFileName5; my $OutputFileName6; my $OutputFileName7; my $OutputFileName8; my $OutputFileName9; my $OutputFileName10; my $OutputFileName11;

64 my $OutputFileName12; my $OutputFileName100; my $OutputFileName101; my $OutputFileName102; my $OutputFileName103; my $OutputFileName104; my $OutputFileName105; my $OutputFileName106; my $OutputFileName107; my $OutputFileName108;

# Declare all global variables used in the program both as sequence holders # and as counters and checks my $Calvin; my $noCalvin; my $Hobbes; my $noHobbes; my $Susie; my $Tigger; my $Count; my $BigBird; my $Beaker; my $upstream; my $whatupstream; my $Protein; my $pass; my $nseq; my $position; my $nser; my $aa; my $k; my $l;

# Open directory and read all files and grab all file names # ending in .gb and put the filenames into an array @filenames = grep (/gb/, readdir DIRECTORY);

#Open summary output files, these files are global and are only opened once # foreach of the .gb files used. They display data for all genomes analyzed. $OutputFileName101="N-term.Summary.out"; open (OUTFILE101,">$OutputFileName101") || die "Could not open $OutputFileName101\n"; $OutputFileName102="Promoter-Box.Summary.out"; open (OUTFILE102,">$OutputFileName102") || die "Could not open $OutputFileName102\n"; $OutputFileName103="Calvin.and.Hobbes.Summary.out"; open (OUTFILE103,">$OutputFileName103") || die "Could not open $OutputFileName103\n"; $OutputFileName104="Calvin.Only.Summary.out"; open (OUTFILE104,">$OutputFileName104") || die "Could not open $OutputFileName104\n"; $OutputFileName105="Hobbes.Only.Summary.out"; open (OUTFILE105,">$OutputFileName105") || die "Could not open $OutputFileName105\n"; $OutputFileName106="Louis.Summary.out"; open (OUTFILE106,">$OutputFileName106") || die "Could not open $OutputFileName106\n"; $OutputFileName107="Huey.Summary.out"; open (OUTFILE107,">$OutputFileName107") || die "Could not open $OutputFileName107\n"; $OutputFileName108="Dewey.Summary.out"; open (OUTFILE108,">$OutputFileName108") || die "Could not open $OutputFileName108\n";

$OutputFileName1001="All.Contigs.Table.table"; open (OUTFILE1001,">$OutputFileName1001") || die "Could not open $OutputFileName1001\n"; print OUTFILE1001 "Pantoea stewartii subsp. stewartii DC283\n\n"; print OUTFILE1001 "Product\tS(50)\tILVAP\tNoDE\tGreen\tHrpBox\tRed\tHilABox\tBlue\tSsrABo x\tPromoter and N-term";

# For each loop designed to go through each gb file in directory foreach $filename (@filenames) {

65

# Reset all of the variables used in this loop prior to processing genome $genome = ''; $in_sequence = 0; $line = ''; @gbfile = (); $GB= ''; @GB= (); $ORF = ''; $start = ''; $end = ''; $length = ''; $product = ''; $i = '';

# Extract the file name from the *.gb file so that it can be used to # keep track of which genome the result comes from $filename =~ m/(.+).gb/ig; # Extract file name without .gb $FILENAME = $1; #Set to $FILENAME: use this for printing out files

open (GBFILE, $filename); # Open GB file @gbfile = ; # Grab contents of file close GBFILE; # Close GB file

# Extract all the sequence lines foreach $line (@gbfile) { if( $line =~ /^\/\/\n/ ) { last; } # Denotes end of file elsif( $in_sequence ) { $genome .= $line; } # If not end of file, add sequence to $genome elsif ( $line =~ /^ORIGIN/ ) { $in_sequence = 1; } # Denotes beginning of genome } $genome =~ s/[\s0-9]//g; # Remove all numbers and spaces from $genome

# Extract the CDS data from the gb file and capture lines with # CDS, Product or Strain foreach $line (@gbfile) { $line =~ s/\n//g; if ( $line =~ /\s+CDS\s+|\/product/i ) { $GB .= $line; } elsif ( $line =~ /\/strain="(.+)"/ ) { $strain = $1; } }

# Split the array on CDS, generating a new scalar for eac gene. $GB =~ s/\.\./-/ig; @GB = split(/\s+CDS/,$GB);

# Open all output files that are genome specific here, so that a new # file opens for each genome (generates several output files) $OutputFileName="$FILENAME.ORFs.fas"; # Prints all ORFS from genome open (OUTFILE,">$OutputFileName") || die "Could not open $OutputFileName\n"; $OutputFileName1="$FILENAME.upstreams.fas"; open (OUTFILE1,">$OutputFileName1") || die "Could not open $OutputFileName1\n"; $OutputFileName11="$FILENAME.aa.SeqOut.fas";

66

open (OUTFILE11,">$OutputFileName11") || die "Could not open $OutputFileName11\n";

$OutputFileName2="$FILENAME.Nterm.3hits.fas"; open (OUTFILE2,">$OutputFileName2") || die "Could not open $OutputFileName2\n"; $OutputFileName3="$FILENAME.Nterm.fas"; open (OUTFILE3,">$OutputFileName3") || die "Could not open $OutputFileName3\n"; $OutputFileName8="$FILENAME.Nterm.S50.fas"; open (OUTFILE8,">$OutputFileName8") || die "Could not open $OutputFileName8\n"; $OutputFileName9="$FILENAME.Nterm.ILVAP.fas"; open (OUTFILE9,">$OutputFileName9") || die "Could not open $OutputFileName9\n"; $OutputFileName10="$FILENAME.Nterm.DE12.fas"; open (OUTFILE10,">$OutputFileName10") || die "Could not open $OutputFileName10\n"; $OutputFileName12="$FILENAME.Nterm.2hits.fas"; open (OUTFILE12,">$OutputFileName12") || die "Could not open $OutputFileName12\n";

$OutputFileName4="$FILENAME.Hrp.Box.fas"; open (OUTFILE4,">$OutputFileName4") || die "Could not open $OutputFileName4\n"; $OutputFileName5="$FILENAME.HilA.Box.fas"; open (OUTFILE5,">$OutputFileName5") || die "Could not open $OutputFileName5\n"; $OutputFileName6="$FILENAME.SsrA.Box.fas"; open (OUTFILE6,">$OutputFileName6") || die "Could not open $OutputFileName6\n";

$OutputFileName7="$FILENAME.Calvin-and-Hobbes.fas"; open (OUTFILE7,">$OutputFileName7") || die "Could not open $OutputFileName7\n"; $OutputFileName100="$FILENAME.Table.table"; open (OUTFILE100,">$OutputFileName100") || die "Could not open $OutputFileName100\n";

# Prints the genome name and the column headers for the table file # Table file is printed in tabulated format and can be opened in excel print OUTFILE100 "$FILENAME\n\n"; print OUTFILE100 "Product\tS(50)\tILVAP\tNoDE\t1of3\t2of3\t3of3\t0of3\tGreen\tHrpBox\tRe d\tHilABox\tBlue\tSsrABox\tNoBox\tCalvin and Hobbes\n\n";

print OUTFILE1001 "\n$FILENAME\n";

$BigBird=0; # Counter for 101 N-term $Beaker=0; # Counter for 102 Promoter-Box

# Loops through each ORF in the genome and processes it for($i=1;$i<=$#GB;$i++) { # Looks for compliment CDS in the gb file if ($GB[$i] =~ /complement/i) {# Extracts the complement DNA ORFS from $genome

67

$GB[$i] =~ /(\d+)-(\d+)/; # Looks for the position of the CDS $start = $1; $end = $2; $length = $end - $start; $GB[$i] =~ /\"(.+)/; # Looks for the name of the gene product $product = $1; $product =~ s/\"//; $ORF = substr ($genome, $start-1, $length+1);

# Extracts ORF and the upstream and generates the reverse compliment $genome =~ m/$ORF(.{200})/; $upstream = $1; $upstream = reverse ($upstream); $upstream =~ tr/ACGTacgt/TGCAtgca/; $ORF = reverse ($ORF); $ORF =~ tr/ACGTacgt/TGCAtgca/; } else { # Extracts direct ORFS from $genome $GB[$i] =~ /(\d+)-(\d+)/; $start = $1; $end = $2; $length = $end - $start;

# Extracts direct ORF and upstream $GB[$i] =~ /\"(.+)/; $product = $1; $product =~ s/\"//; $ORF = substr ($genome, $start-1, $length+1); $genome =~ m/(.{200})$ORF/; $upstream = $1; }

# Prints the ORF and upstream to files along with fasta format print OUTFILE ">$strain\_$start\_$end\# $product\n$ORF\n\n"; print OUTFILE1 ">$strain\_$start\_$end\# $product\n$upstream\n\n";

####################################################################### # Translates the ORF to amino acid, one codon at a time my $SeqLength; my $codon; $SeqLength = length ($ORF); $Protein=''; # Calls subroutine to convert DNA to protein one codon at a time for (my $k=0; $k < ($SeqLength-2); $k+=3) { $codon = substr($ORF, $k, 3); $Protein .= codon2aa($codon); }

# Prints protein sequence to file in fasta format print OUTFILE11 ">$product\n$Protein\n\n"; print OUTFILE100 "$product\t";

# Set all flags and counters to defaults for each gene $Calvin=0; $noCalvin=1; $pass=0; $fail=0; $chair=0; $nser=0; $Count=0; $ser=0; $three4=0; $de=0; $Susie=0; $Tigger=0;

68

# Searches for greater than 10% Seine in the first 50 amino acids of the protein for ($position=0; $position<50; $position++) { $aa = substr($Protein, $position, 1); if ($aa eq 'S') { $nser++; } } if ($nser>4){ $ser=1; $pass=1; $Count++; print OUTFILE3 ">$product\n$ORF\n\# S(50) "; print OUTFILE8 ">$product\n$ORF\n\n"; } else { $ser=0; }

# Print if greater than 10% Serine if ($ser) { print OUTFILE100 "+\t"; } else { print OUTFILE100 "-\t"; }

# Search for ILVAP in the third of fourth position my $threefour = substr($Protein, 2, 2); if ($threefour =~ /([ILVAP])/ ) { $three4=1; $fail=1; $Count++; if ($pass){ print OUTFILE3 "ILVAP(3/4) "; } else {print OUTFILE3 ">$product\n$ORF\n\# ILVAP(3/4) "; } print OUTFILE9 ">$product\n$ORF\n\n"; } else { $three4=0; }

# Print if ILVAP in the third or fourth position if ($three4) { print OUTFILE100 "+\t"; } else { print OUTFILE100 "-\t"; }

# Looks for no D or E in the first 12 my $twelve = substr($Protein, 0, 12); if ($twelve =~ /([DE])/) { $de=0; } else { $de=1; $chair=1; $Count++; if ($pass || $fail || $pass && $fail){ print OUTFILE3 "DE(12)"; } else { print OUTFILE3 ">$product\n$ORF\n\# DE(12)"; } print OUTFILE10 ">$product\n$ORF\n\n"; }

# Print if no D or E in the first 12 if ($de) { print OUTFILE100 "-\t"; } else { print OUTFILE100 "+\t"; }

# Prints any combination of the above results in the table if ( $Count==1 ) { print OUTFILE100 "+\t-\t-\t-\t"; } elsif ( $Count==2 ) { print OUTFILE100 "-\t+\t-\t-\t"; } elsif ( $Count==3 ) { print OUTFILE100 "-\t-\t+\t-\t"; } elsif ( $Count==0 ) { print OUTFILE100 "-\t-\t-\t+\t"; }

69

# Prints any combination of two of the above if ($ser && $three4 || $ser && $de || $three4 && $de){ print OUTFILE12 ">$product\n$ORF\n"; if ( $ser && $three4 ){ print OUTFILE12 "\#S(50) ILVAP(3/4)\n\n"; } elsif ( $ser && $de ){ print OUTFILE12 "\#S(50) DE(12)\n\n"; } elsif ( $three4 && $de ){ print OUTFILE12 "\#DE(12) ILVAP(3/4)\n\n"; } }

# Prints if all three conditions are met if ($ser && $three4 && $de) { print OUTFILE2 ">$product\n$ORF\n\n"; print OUTFILE101 "\#$FILENAME\n>$product\n$ORF\n\n"; $Calvin=1; $noCalvin=0; $BigBird=1; }

if ($ser && $three4 && $de) { print OUTFILE1001 "$product\t";

if ($ser) { print OUTFILE1001 "+\t"; } else { print OUTFILE1001 "-\t"; }

if ($three4) { print OUTFILE1001 "+\t"; } else { print OUTFILE1001 "-\t"; }

if ($de) { print OUTFILE1001 "+\t"; } else { print OUTFILE1001 "-\t"; }

$Susie=1; } else { $Tigger=1; }

####################################################################### # Sets all flags to default for each gene $Hobbes=0; $GREEN=0; $RED=0; $SBLUE=0; $WBLUE=0; $whatupstream=$upstream;

# Search for Hrp Box and print to chosen files #Original search pattern: [GT]GGA[GA]C[TC].{15,19}CCAC #Modified search pattern: GGAAC[CT]n{16}[CG]CAC if ($upstream =~ m/([GT]GGA[GA]C[TC].{15,19}CCAC)/ig){ $hrpbox =$1; $local = pos($upstream); $location = 200 - $local; print OUTFILE4 ">P\-$location\_$product\n$hrpbox\n\n"; print OUTFILE102 "\#$FILENAME\n\#Hrp Box\n>$product\n$ORF\n\n"; print OUTFILE106 "\#$FILENAME\n>$product\n$ORF\n\n"; $Beaker=1; $GREEN=1; } else { $GREEN=0; }

70

if ($GREEN) { print OUTFILE100 "+\t$hrpbox\t"; } else { print OUTFILE100 "-\t\t"; }

# Search for HilA Box and print to chosen files # Pantoea specific promoter box: [CG]GC..T..A….A..GAA if ($upstream =~ m/([CG]GC..T..A….A..GAA)/ig){ $hilabox = $1; $local = pos($upstream); $location = 200 - $local; print OUTFILE5 ">1\-$location\_$product\n$hilabox\n\n"; print OUTFILE102 "\#$FILENAME\n\#HilaA Box\n>$product\n$ORF\n\n"; print OUTFILE107 "\#$FILENAME\n>$product\n$ORF\n\n"; $Beaker=1; $RED=1; } else { $RED=0; }

if ($RED) { print OUTFILE100 "+\t$hilabox\t"; } else { print OUTFILE100 "-\t\t"; }

# Search for Ssra Box using two different algorithms and print # Original strong promoter box for SsrA: .T[CA]AGG[CT].[AT][AT][AT]..C.GAT # Original weak promoter box for SsrA: .T.[AC][GA][GC][TC].[AT][AT].[AT][ACT]C.[GT]AT # Modified search pattern for Pantoea specific search: T[ACT][AC][GA]GT[TAC][AT][AT]nA[CAT]CT[GT]AT if ($whatupstream =~ m/(.T[CA]AGG[CT].[AT][AT][AT]..C.GAT)/ig){ $ssrabox = $1; $local = pos($whatupstream); $location = 200 - $local; print OUTFILE6 ">2\-$location\_$product\n$ssrabox\n\n"; print OUTFILE102 "\#$filename\n\#Strong SsrA Box\n>$product\n$ORF\n\n"; print OUTFILE108 "\#$filename\#Strong SsrA Box\n>$product\n$ORF\n\n"; $Beaker=1; $SBLUE=1; } elsif ($whatupstream =~ m/(.T.[AC][GA][GC][TC].[AT][AT].[AT][ACT]C.[GT]AT)/ig){ $ssrabox = $1; $local = pos($whatupstream); $location = 200 - $local; print OUTFILE6 ">2\-$location\_$product\n$ssrabox\n\n"; print OUTFILE102 "\#$FILENAME\n\#Weak SsrA Box\n>$product\n$ORF\n\n"; print OUTFILE108 "\#$FILENAME\#Weak SsrA Box\n>$product\n$ORF\n\n"; $Beaker=1; $WBLUE=1; } else { $SBLUE=0; $WBLUE=0; }

# If statements to print to specific files and table if ($SBLUE || $WBLUE) { print OUTFILE100 "+\t$ssrabox\t"; }

71

else { print OUTFILE100 "-\t\t"; }

if ($GREEN || $RED || $SBLUE || $WBLUE){ $Hobbes=1; $noHobbes=0; print OUTFILE100 "+\t"; } else { $Hobbes=0; $noHobbes=1; print OUTFILE100 "-\t"; }

if ($GREEN || $RED || $SBLUE || $WBLUE){ if ($Susie) { if ($GREEN) { print print OUTFILE1001 "+\t$hrpbox\t"; } else { print OUTFILE1001 "-\t-\t"; }

if ($RED) { print print OUTFILE1001 "+\t$hilabox\t"; } else { print OUTFILE1001 "-\t-\t"; }

if ($SBLUE || $WBLUE) { if ($SBLUE) { print print OUTFILE1001 "S\t$ssrabox\t"; } elsif ($WBLUE) { print print OUTFILE1001 "W\t$ssrabox\t"; } } else { print OUTFILE1001 "-\t-\t"; } } elsif ($Tigger) { print OUTFILE1001 "$product\t-\t-\t-\t"; if ($GREEN) { print print OUTFILE1001 "+\t$hrpbox\t"; } else { print OUTFILE1001 "-\t-\t"; }

if ($RED) { print print OUTFILE1001 "+\t$hilabox\t"; } else { print OUTFILE1001 "-\t-\t"; }

if ($SBLUE || $WBLUE) { if ($SBLUE) { print print OUTFILE1001 "S\t$ssrabox\t"; } elsif ($WBLUE) { print print OUTFILE1001 "W\t$ssrabox\t"; } } else { print OUTFILE1001 "-\t-\t"; } } } elsif ($Susie && $noHobbes) { print OUTFILE1001 "-\t-\t-\t-\t-\t- \t-\n"; }

# Prints if ORF has both promoter and N-term secretion signal if ($Calvin && $Hobbes){ print OUTFILE7 ">$product\n$ORF\n\n"; print OUTFILE100 "+\t\n"; print OUTFILE1001 "+\t\n"; if ($GREEN) { print OUTFILE103 "\#$FILENAME\t\#Hrp Box\n>$product\n$ORF\n\n"; } elsif ($RED) { print OUTFILE103 "\#$FILENAME\t\#HilA Box\n>$product\n$ORF\n\n"; }

72

elsif ($SBLUE) { print OUTFILE103 "\#$FILENAME\t\#Strong SsrA Box\n>$product\n$ORF\n\n"; } elsif ($WBLUE) { print OUTFILE103 "\#$FILENAME\t\#Weak SsrA Box\n>$product\n$ORF\n\n"; } } elsif ($Tigger && $Hobbes) { print OUTFILE1001 "-\t\n"; } else { print OUTFILE100 "-\t\n"; } # End of Hobbes

# Prints if only has N-term or Promoter but not both to specific files if ($Calvin && $noHobbes) { print OUTFILE104 "\#$FILENAME\n>$product\n$ORF\n\n"; } elsif ($Hobbes && $noCalvin) { print OUTFILE105 "\#$FILENAME\n>$product\n$ORF\n\n"; }

} # After this bracket, outside of the FOR loop --> no longer extracting CDS from the $genome } # After this bracket, no longer in the GB file. Will start a new GB file after this bracket.

####################################################################### # Codon to Amino Acid Subroutine # When passed a series of three nucleotides, returns the corresponding amino acid ####################################################################### sub codon2aa { my($codon)=@_; if ($codon =~ /GC[GATC]/i) {return 'A'} #Alanine elsif ($codon =~ /TG[TC]/i) {return 'C'} #Cysteine elsif ($codon =~ /GA[TC]/i) {return 'D'} #Aspartic Acid elsif ($codon =~ /GA[AG]/i) {return 'E'} #Glutamic Acid elsif ($codon =~ /TT[TC]/i) {return 'F'} #Phenylalanine elsif ($codon =~ /GG[GATC]/i) {return 'G'} #Glycine elsif ($codon =~ /CA[TC]/i) {return 'H'} #Histdine elsif ($codon =~ /AT[TCA]/i) {return 'I'} #Isoleucine elsif ($codon =~ /AA[AG]/i) {return 'K'} #Lysine elsif ($codon =~ /TT[AG]|CT[GATC]/i){return 'L'} #Leucine elsif ($codon =~ /ATG/i) {return 'M'} #Methionine elsif ($codon =~ /AA[TC]/i) {return 'N'} #Asparagin elsif ($codon =~ /CC[GATC]/i) {return 'P'} #Proline elsif ($codon =~ /CA[AG]/i) {return 'Q'} #Glutamine elsif ($codon =~ /CG[GATC]|AG[AG]/i){return 'R'} #Arginine elsif ($codon =~ /TC[GATC]|AG[TC]/i){return 'S'} #Serine elsif ($codon =~ /AC[GATC]/i) {return 'T'} #Threonine elsif ($codon =~ /GT[GATC]/i) {return 'V'} #Valine elsif ($codon =~ /TGG/i) {return 'W'} #Tryptophan elsif ($codon =~ /TA[TC]/i) {return 'Y'} #Tyrosine elsif ($codon =~ /TA[AG]|TGA/i) {return ''} #Stop else {print STDERR "Bad codon $codon!!\n"; return '';} #Error }

73

Appendix B. Information regarding the strains used in the genetic survey.

Strain Origin Isolated Host Patient Information 83 Environment 1 Isolated from wheat 625 Environment 1 Isolated from grain sorghum 626 Environment 1 Isolated from corn 788 Environment 1 Green bean 1373 Environment 1 Balsam 1512 Environment 1 Green bean 1574 Environment 1 Unidentified 3581 Environment 1 Oat seed 5565 Environment 1 Soybean 6686 Clinical, human 2 Unidentified Headache 7373 Environment 1 Onion 7612 Animal 1 Grassgrub 7703 Clinical, human 2 Unidentified Abdominal pain 10854 Clinical, human 2 Ruptured appendix 12202 Environment 1 Melon 12531 Environment 1 Baby's breath 12534 Clinical, human 1 Knee laceration 13301 Environment 1 Golden delicious apple 15320 Environment 1 Rice 17124 Environment 1 Olive 17671 Environment 1 Rice 81828 Clinical, human 2 Unidentified Post Hemicholectomy 91151 Clinical, human 2 Unidentified 101150 Clinical, human 2 Unidentified 770398 Clinical, human 3 Blood Female Cerebellar CVA 062465A Clinical, human 2 Unidentified (stroke) Cerebellar CVA 062465B Clinical, human 2 Unidentified (stroke) 091957A Clinical, human 2 Renal failure 091957B Clinical, human 2 Renal failure 240R Environment 4 None 26SR6 Environment 4 None 299R Environment 4 None 308R Environment 4 None B011017 Clinical, human 5 Superficial wound back 11 year old female B014130 Clinical, human 5 Superficial wound leg 11 year old male B015092 Clinical, human 5 Urine midstream 9 years old female

74

Strain Origin Isolated Host Patient Information B016375 Clinical, human 5 Finger 56 year old female B016381 Clinical, human 5 Groin 52 year old female B016395 Clinical, human 5 Superficial wound forearm 83 year old female B021323 Clinical, human 5 Urine midstream 28 year old female B024858 Clinical, human 5 Breast abscess 26 year old female B025670 Clinical, human 5 Superficial wound toe 13 year old male B026440 Clinical, human 5 Superficial wound toe 10 year old male B7 Environment 4 Unidentified BB350028A Clinical, human 2 Blood culture, fever Female BB350028B Clinical, human 2 Blood culture, fever Female BB834250 Clinical, human 2 Sputum, aortic aneurysm Female BB957621A1 Clinical, human 2 CAPD dialysate, peritonitis Male BB957621A2 Clinical, human 2 CAPD dialysate, peritonitis Male BB957621B1 Clinical, human 2 CAPD dialysate, peritonitis Male BB957621C1 Clinical, human 2 CAPD dialysate, peritonitis Male BB957621C2 Clinical, human 2 CAPD dialysate, peritonitis Male BRT 175 Environment 6 Unidentified BRT 98 Environment 4 Unidentified Cit30-11R Environment 4 Unidentified Biocontrol of Eh318 Environment 7 Erwiniaamylovora EhWHF18 Environment 6 Unidentified EhWHL9 Environment 6 Unidentified F9026 Clinical, human 3 Blood Male G0063668 Clinical, human 8 CSF G2291404 Clinical, human 8 Blood G3271436 Clinical, human 8 Urine G4032547 Clinical, human 8 Ear G4061350 Clinical, human 8 Urine G4071105 Clinical, human 8 Urine H42501 Clinical, human 3 Blood Male M1517 Clinical, human 3 Blood Female M1657A Clinical, human 3 Blood Male M1657B Clinical, human 3 Blood Male M232A Environment 4 Unidentified M41864 Clinical, human 3 Blood Female Pantoea agglomerans DC432 Plant 9 Type strain Pantoea ananatis DC434 Plant 9 Type strain

75

Strain Origin Isolated Host Patient Information Pantoea herbicola DC556 Plant 9 Type strain Pantoea stewartii DC283 Plant 9 Type strain SM02150 Environment 10 Water with water SM03214 Environment 10 Goose feces SN01080 Environment 10 Slug SN01121 Environment 10 Bee SN01122 Environment 10 Bee SN01170 Environment 10 Caterpillar SP00101 Environment 10 Raspberry bush SP00202 Environment 10 Apple SP00303 Environment 10 Raspberry bush SP01201 Environment 10 Strawberry SP01202 Environment 10 Strawberry leaf and stem SP01220 Environment 10 Healthy rose bush Virginia creeper leaves and SP01230 Environment 10 stem SP02021 Environment 10 Thistle leaf SP02230 Environment 10 Unidentified tree SP02243 Environment 10 Unidentified tree SP02243 Environment 10 Unidentified tree SP03310 Environment 10 Unidentified tree SP03310 Environment 10 Unidentified tree SP03372 Environment 10 Corn SP03372 Environment 10 Corn SP03383 Environment 10 Corn SP03383 Environment 10 Corn SP03391 Environment 10 Unidentified SP03391 Environment 10 Unidentified SP03412 Environment 10 Bean leaf SP03412 Environment 10 Bean leaf SP04010 Environment 10 Tomato leaf SP04010 Environment 10 Tomato leaf SP04011 Environment 10 Tomato SP04011 Environment 10 Tomato SP04013 Environment 10 Tomato leaf SP04013 Environment 10 Tomato leaf SP04021 Environment 10 Tomato leaf SP04021 Environment 10 Tomato leaf

76

Strain Origin Isolated Host Patient Information SP04022 Environment 10 Tomato leaf SP04022 Environment 10 Tomato leaf SP05051 Environment 10 Tomato leaf SP05051 Environment 10 Tomato leaf SP05052 Environment 10 Tomato leaf SP05052 Environment 10 Tomato leaf SP05061 Environment 10 Tomato leaf SP05061 Environment 10 Tomato leaf SP05091 Environment 10 Tomato leaf SP05091 Environment 10 Tomato leaf SP05092 Environment 10 Tomato leaf SP05092 Environment 10 Tomato leaf SP05120 Environment 10 Corn leaf SP05120 Environment 10 Corn leaf SP05130 Environment 10 Corn stamen SP05130 Environment 10 Corn stamen SS02010 Environment 10 Soil SS02010 Environment 10 Soil SS03231 Environment 10 Soil SS03231 Environment 10 Soil TX1 Clinical, human 11 CF sputum TX10 Clinical, human 11 CF sputum TX2 Clinical, human 11 CF sputum TX3 Clinical, human 11 Blood TX4 Clinical, human 11 Blood TX5 Clinical, human 11 Blood TX6 Clinical, human 11 Blood VB38951A Clinical, human 2 Blood culture, sore throat Female VB38951B Clinical, human 2 Blood culture, sore throat Female X44686 Clinical, human 3 Blood Female

77

1 Received from Maureen Fletcher, Landcare Research, New Zealand Culture of Plant Pathogens 2 Received from St. Boniface General Hospital, Winnipeg, Manitoba 3 Received from Sunnybrook Hospital, Toronto, Ontario 4 Received from Dr. Steven Lindow, Berkeley, California 5 Received from Dr. Paul Levett, Saskatchewan Disease Control Laboratory, Regina, Saskatchewan 6 Received from Gwyn Beattie, Iowa State University, Iowa 7 Received from Dr. Brion Duffy, Switzerland 8 Received from the General Hospital in Regina, Saskatchewan 9 Received from David Coplin, Columbus, Ohio 10 Isolated in Saskatchewan by summer students of Dr. John Stavrinides 11 Received from Texas Children`s Hospital, Houston, Texas

78

Appendix C. Host specificity determinants as a genetic continuum

Note: This appendix has been republished, with permission, from Kirzinger, MWB, and Stavrinides, J. Host specificity determinants as a genetic continuum. Trends in Microbiology. 20 (2): 88-93. License number 3004521270270.

Host specificity determinants as a genetic continuum

Morgan W. B. Kirzinger1 and John Stavrinides1

1Department of Biology, University of Regina, 3737 Wascana Parkway, Regina,

Saskatchewan, Canada, S4S0A2

Corresponding Author:

John Stavrinides Department of Biology University of Regina Regina, Saskatchewan S4S 0A2 Tel: (306) 337-8478 Fax: (306) 337-2410 [email protected]

79

Summary

Host specificity is an important concept that underlies the interaction of all clinically and agriculturally relevant bacteria with their hosts. Changes in the host specificity of animal pathogens, in particular, are often of greatest concern due to their immediate and unexpected impact on human health. Host switching or host jumps can often be traced to modification of key microbial pathogenicity factors that facilitate the formation of particular host associations. An increase in the number of genome-level studies has begun revealing that almost any type of change, from the simplest to the most complex, can potentially impact host specificity. This review highlights examples of host specificity determinants of viruses, bacteria, and fungi, and presents them from within a genetic continuum that spans from the single residue through to entire genomic islands.

80

Host specificity

Host specificity is a fundamental concept that describes the nature of host- microbe associations, and most frequently gains attention for its impact on our understanding of pathogen virulence potential, zoonoses, and emerging and reemerging infectious diseases (see Glossary); however, host specificity transcends the entire spectrum of parasitic, mutualistic, and commensalistic relationships, making it integral for understanding key coevolutionary processes that shape the highly intimate and rather complex interactions between organisms. Host specificity is used to describe the nature and extent of microbial adaptation to one or more particular hosts, with two broad categories being defined: generalists and specialists. A microorganism is described as a generalist if it can associate or infect multiple hosts with little selectivity, and a specialist if it is selective and able to colonize only one defined host or group of hosts [1]. Host specificity can be measured at both the interspecific level (extent of specialization in one species over another) and the intraspecific level (extent of specialization to hosts within a species), although the latter differs depending on whether the host is a plant or an animal.

Intraspecific host specificity in plants is often determined by resistance or R genes that recognize specific pathogen proteins or epitopes [2,3], while in animals and humans, genetic differences in immune determinants can lead to differential susceptibilities between populations [4,5,6].

Microbes utilize a suite of pathogenicity factors for associating with and causing disease in their respective hosts. When pathogenicity factors exploit a very particular feature of a given host to initiate an association, they are referred to as host specificity factors. Probably the most classic example of host specificity determinants are the nod

81 genes of the mutualist Rhizobium, which are part of an elaborate host-specific signal exchange between the host plant and the bacterium [7,8,9,10]. The Nod factors are recognized by specific receptor kinases in the plant, which results in the ability of the plant to respond to the microbe, and the subsequent formation of nitrogen-fixing nodules on the plant root [7,8,9,10]. In contrast, the generalist uses a combination of type III secretion, toxin production, and secreted P. aeruginosa pigments called phenazines to less selectively exploit a large number of hosts, including nematodes, humans, Arabidopsis thaliana, and many other plants and animals

[11,12,13,14,15]. It is the action of these particular pathogenicity determinants that define the host range, and thus the degree of host specificity exhibited by a microbe.

Because of their direct interaction with host substrates, and in particular, components of the host defense response, host-specificity determinants are subjected to strong selective pressures.

Evolutionary processes

Because pathogenicity and virulence determinants are the defining elements of host-specific interactions, evolutionary signatures of host association are often apparent in the comparative patterns of genetic and genomic variation in these determinants. This variation is often introduced by random mutation, recombination, and for microbes in particular, through horizontal gene transfer. Mutations or polymorphisms are the simplest forms of genetic change, and can result in an amino acid substitution, which may impact protein folding, substrate specificity, binding affinity, or protein function [16].

Recombination, a genetic process by which DNA or RNA is swapped with a similar piece leads to changes in patterns of polymorphism and sometimes, in new mosaic genes

82

[17]. Lastly, novel genetic variation is introduced through the process of horizontal gene transfer, wherein genetic fragments or intact transcriptional units are acquired and incorporated into the genome of a microbe [18]. Large genetic units that are acquired horizontally are referred to as pathogenicity, fitness, or genomic islands, and are often marked by a different %GC content, presence of mobile elements, and an insertion site near a transfer RNA [19,20] . Changes in host specificity determinants, whether by mutation, recombination, or acquisition of new genetic determinants that confer a selective advantage in a particular host environment can have a profound impact on host specificity, and can lead to host jumps.

The genetic continuum

When considering the underlying causes of changes in host specificity of a microbe, we often envision either gene gain or amino acid substitution in a particular epitope as the main mechanisms altering host-specific interactions. The movement toward larger genome-scale studies, however, has begun to reveal that host specificity determinants actually encompass a continuum that ranges from changes in single residues, through to gene domains, whole genes, gene repertoires, and eventually to entire genomic islands (Figure 1). Effectively, host specificity determinants identified from any pathosystem, whether involving viruses, bacteria, or fungi, can be placed and defined within this genetic continuum.

Gene: a single residue

At the finest level, single nucleotide substitutions, and thus single amino acid substitutions, are known to impact host-specific interactions. One of the best documented examples is the influenza virus, which undergoes mutation (antigenic drift) in several of

83 its key coding regions [21]. An amino acid substitution at residue 627 in subunit two of the H5N1 RNA polymerase (PB2), for example, has been shown to enable H5N1 pathogenesis in a mouse model [22,23]. A lysine residue at this position enables mouse infection, while a glutamic acid abolishes pathogenicity [23]. In fact, residue 627 of PB2 is a lysine in almost all human/mammalian influenza viruses, and glutamic acid in the majority of avian influenza viruses, with the exception of the ‘Qinghai Lake’ lineage of

H5N1 avian influenza viruses and its descendants [22], indicating host-specific adaptation. But point mutations that affect host specificity are not confined to the influenza virus. Point mutations in coding regions of bacteriophage Φ6 have also been shown to alter host specificity through experimental evolution. Bacteriophage Φ6 is a double stranded RNA virus, which normally has a broad host range that includes four pathogenic varieties or pathovars of Pseudomonas syringae (P. syringae pathovar phaseolicola, P. syringae pv. persicae, P. syringae pv. savastanoi and P. syringae pv. tagetis) [24]. Bacteriophage Φ6 was subjected to ~150 generations of serial passaging in

Pseudomonas pseudoalcaligenes ERA, yielding an ‘evolved’ phage population known as the Φ6E1narrow population. The Φ6E1narrow phage were unable to infect the original four P. syringae hosts, but had gained the ability to infect P. pseudoalcaligenes ERA [25,26].

This shift in host specificity was the direct result of a single nucleotide substitution

(G274A) in the host attachment gene, P3 [26]. Although there was an additional mutation in the untranslated 3' region of the medium segment, mutations in this non-coding region were suggested not to be involved [26,27].

Amino acid residue changes in key proteins of insect-transmitted viruses may lead to shifts in vector specificity, as seen in a recent human epidemic of the Chikungunya

84 virus. The Chikungunya virus is characterized by fever, severe conjunctivitis in addition to low back pain and symmetrical polyarthritis, predominantly in the knees, wrists, and shoulders [28]. Normally, Chikungunya virus is transmitted by the Aedes aegypti mosquito; however, an outbreak in 2005 on Runion Island was the result of an amino acid substitution (A226V) in the E1 gene that encodes the viral envelope protein, which allowed the virus to be transmitted by the Aedes albopictus mosquito [29]. Under laboratory conditions, the mutant Chikungunya virus strain LR (A226V), tagged with green fluorescent protein, was shown to be100-fold more infectious toward A. albopictus as compared to the original strain. The single amino acid substitution in the virus envelope protein was sufficient for the increase in infectivity toward a second mosquito species, A. albopictus, and the expansion of vector host range [29].

Changes in host specificity that are the direct result of changes in single residues may not be confined to virus-host or virus-vector interactions. Human-adapted serovars of Salmonella enterica were shown to have a distinct genetic polymorphism in two type

III secretion system (T3SS) proteins, SseC and SseF [30]. The T3SS is an extracellular needle-like complex that Gram-negative bacteria produce as a means of actively transporting effector proteins and toxins directly into host cells to facilitate pathogenesis

[31]. SseC along with SseB and SseD form the translocon at the tip of the T3SS, and are responsible for the attachment of the secretion system to the phagosomal membrane of human cells [30]. The sseC gene shows extensive sequence conservation across all human and multiple host serovars, whereas sseD alleles that are associated with the human-adapted serovars and multi-host serovars were found to be identical in sequence

[30]. Since these genes are responsible for the attachment of the T3SS to host

85 phagosomal membrane, strong genetic similarities in addition to a genetic polymorphism found only in human-associated serovars are suggested to be responsible for host-specific binding of the T3SS and thus, interactions with humans [30]. Furthermore, an additional protein, SseF, is responsible for recruiting dynein to the Salmonella-containing vacuole, which keeps the vacuole in a juxtanuclear orientation and allows for the bacteria to survive within the vacuole [30]. The single genetic polymorphism in SseF not only potentially plays a role in defining which host cells a certain serovar interacts with, but is also potentially responsible for later stages of infection that involve the host-specific interaction effector proteins and host cell substrates. In this example, and all other examples described above, single nucleotide or amino acid changes can have a dramatic impact on host-specific interactions.

Gene: a subgenic region

Changes in regions or domains of particular genes have also been linked to host- specific adaptation. A seven-strain microarray was utilized for comparative genomic hybridization to evaluate the genetic suites responsible for host specificity in 51 animal and 161 human isolates of Staphylococcus aureus collected from the United Kingdom

[32]. The seven reference strains of S. aureus used to generate the reference microarray were sequenced strains that were isolated from human infections. Two surface proteins found in all S. aureus animal isolates, FnaA and FnaB, contain hypervariable regions, which did not match any of the variable regions on the microarray due to sequence dissimilarity [32]. The variable regions differ between human and animal isolates, demonstrating that isolates with differing host specificities have different surface binding affinities that are dependent on their target host. Similarly, a coagulase gene, also found

86 in all S. aureus animal isolates, had different variable regions than the human isolates on the microarray, likely due to the interaction of host-adapted coagulase with host-specific blood prothrombin [33].

Portions of virulence-associated type III secreted effectors (T3SEs) have also been shown to determine host-specificity [34,35]. In one case, a specific domain within two T3SEs directs the host specificity of P. agglomerans pv. gypsophilae (Pag), a pathogen of gypsophila that elicits a hypersensitive response on beets, and P. agglomerans pv. betae (Pab) a pathogen of both beets and gypsophila [36]. HsvG, present in Pag and Pab, appears to be necessary for pathogenicity in gypsophila, whereas the HsvG homolog, HsvB, is present in both strains and appears to be necessary for host- specific interactions with beet [36]. Swapping of these two homologous effectors between strains was sufficient to alter their host specificity. Genetic dissection of each gene revealed that their host specific nature was dependent on the number of amino acid repeats present in a particular domain. HsvG required two direct amino acid repeats in order to confer pathogenicity on gypsophila, whereas HsvB required a single repeat for pathogenicity in beets [36]. These repeats appear to be involved in transcriptional regulation in the plant host, since HsvG and HsvB are translocated to the nucleus of host cells where they function as transcriptional regulators. The number of direct amino acid repeats present in the effectors acts as a determinant for preferential DNA binding, and thus, host specificity. For both Pantoea and Staphylococcus, two very different microbes, changes in variable or repeat regions in key loci play a direct role in host specificity.

Genome: one gene

87

Host specificity is known to be affected by the presence or absence of a gene. In

Vibrio fischeri, an abundant marine symbiont of the North Pacific squid, Euprymna scolopes, a two-component sensor kinase, RscS, is responsible for inducing the Syp exopolysaccharide, which results in biofilm formation and the colonization of the squid light organ [37,38]. A comparative genomic analysis of the squid symbiont V. fischeri

ES114 and the Japanese pinecone fish (Monocentris japonica) symbiont V. fischeri

MJ11, which does not colonize the squid light organ, revealed that rscS was present in the squid symbiont ES114, but absent from the fish symbiont MJ11. Expression of rscS in MJ11 allowed this strain to colonize the squid light organ, where the MJ11 strain without rscS could not, illustrating that this single regulatory gene is necessary and sufficient to allow for colonization of this particular host [37].

The nitrogen-fixing rhizobacteria exhibit a very narrow specificity for particular leguminous plant hosts with which they form mutualistic associations [39]. This host specificity has been shown to be mediated by the bacterial nod genes, which are induced in response to specific plant compounds [40]. A nodH mutant of Rhizobium meliloti loses its ability to associate with and nodulate its host, alfalfa; however, the mutant gains the ability to infect and cause nodule like structures on two, normally non-host plants, Vicia satvia and Vicia villosa [41]. This single genetic locus can therefore alter the host specific interactions of this particular mutualistic strain, making the nod genes important host specificity factors.

Single-gene host specificity determinants are not confined to the interactions between mutualists and their hosts. The entomopathogenic fungus Lecanicillium longisporum uses host-specific isoforms of subtilisin Pr1-like proteases to degrade the

88 cuticle of its insect hosts [42]. The proteases from five aphid-pathogenic isolates of L. longisporum exhibit substrate specificity, with that of isolate KV71 being shown to degrade the aphid cuticle at greater efficiency than that of the locust. The aphid-specific protease was induced by the presence of N-acetylglucosamine and showed significantly more activity on the aphid cuticle, suggesting that host-specific interactions are initiated through direct bacterial contact with the specific target host [42]. Other protease isoforms from the same isolate, however, are not activated by N-acetylglucosamine and are produced more prominently on substrates containing chitin [42]. For this entomopathogen, host preference is maintained by the induction and activity of substrate- specific, and thus, host-specific proteases. Single genes can therefore define host specificity for mutualistic and pathogenic microbes.

Genome: gene repertoires

Although host specificity can be modulated by a single gene, in some cases, changes in sets of genes have been shown to modulate host-specific interactions.

Ralstonia solanacearum is a plant pathogen of solanaceous plants; however, it is unable to cause disease in tobacco plants [43]. A double mutant of R. solanacearum that has two disrupted T3SEs, avrA and popP1, is capable of infecting and causing wilt in tobacco

[43]. Disruption of either of these T3SEs independently is insufficient to enable pathogenesis in tobacco, but does not alter virulence in either tomato or Arabidopsis [43].

Their ability to restrict pathogen infection of tobacco is likely due to host-specific resistance or R genes, yet both avrA and popP1 likely play an important role in the infection of solanaceous plants.

89

In Xanthomonas, larger T3SE repertoires appear to be important determinants of host specificity. A comprehensive study examining 35 T3SE proteins in 132 strains of

Xanthomonas axonopodis found that the T3SE repertoires were different between strains belonging to each of the 18 host-specific pathovars studied [44]. The majority of the 18 pathovars examined had conserved T3SE repertoires; however, there was still some variation between some groups, where some strains had largely identical T3SE repertoires and others showed more variation [44]. Most interesting though, were the differences between the T3SE repertoires of different strains within the same pathovar, which was attributed to the host from which the strains were isolated [44]. Six strains of

X. axonopodis pv. anacardii have nearly identical T3SE repertoires with the exception of two strains that were isolated from mango (Mangifera indica), which have an additional

T3SE (xopX) than the other four strains that were isolated from the cashew tree

(Anacardium occidentale) [44]. These differences in T3SE repertoires among different pathovars is also seen in plant pathogens such as P. syringae [45], and are a reflection of host-specific adaptation of these strains to their respective hosts.

Host-specific interactions involving plant pathogens are not defined only by

T3SEs. P. syringae pv. glycinea is a pathogen of soybean, whose flagellin protein triggers the hypersensitive response in both non-host tomato plants and host tobacco plants [46]. The strain carries a glycosylation island containing three open reading frames (orf1, orf2, and orf3), two of which (orf1 and orf2) encode putative glycosyltransferases. When orf1 and orf2 are mutated, P. syringae pv. glycinea grows equally well as the parent strain in tobacco, but loses its pathogenicity on the original soybean host. Although flagellin is a known microbe-associated molecular pattern or

90

MAMP, this example illustrates that these modifying enzymes, which glycosylate flagellin, function as an additional host specificity determinant to modulate host-specific interactions.

Genetic repertoires can also determine host specificity in mutualistic interactions.

The bacterium Xenorhabdus nematophila is a symbiont of the entomophagous nematode,

Steinernema carposapsae [47]. Juvenile nematodes transport X. nematophila in a specialized structure near the front of the nematode gut referred to as the receptacle, and after the nematode enters its insect host, it releases the bacteria from this structure into the insect body cavity [48]. X. nematophila attacks the insect from the inside using a suite of insecticidal proteins, providing the nematode with the nutrition it needs to reproduce inside the insect host [49]. Before leaving the insect cavity, the nematode progeny are colonized with X. nematophila, a process that appears to be dependent on three nematode intestinal localization genes (nilA, nlB, and nilC). These genes, identified by transposon mutagenesis, were only present in one of eight strains of X. nematophila studied [50]. Transfer of these genes to a non-nematode colonizing bacterial strain resulted in the microbe gaining the ability to colonize the nematode. The levels of colonization were 60 times lower than in wild type strains expressing nilA, nilB and nilC, suggesting the involvement of additional determinants. Still, these three genes are sufficient to turn a non-colonizing strain into a colonizer [50,51]. Genetic suites are therefore important for defining host specificity for the nematode mutualist, as well as several plant pathogenic bacteria.

Genome: genomic islands

91

Genomic islands are large sections in a genome that may harbor genes involved in symbiosis, and usually exhibit signatures of having been acquired by lateral gene transfer, including a skewed GC content, and linkage to mobile genetic elements and their remnants [52,53]. In some cases, genomic islands are composed of clusters of coregulated genes that function as a regulon, exemplified by both the T3SS and type IV secretion systems (T4SS) that are required for host association for many bacteria [31,54].

Although only their secreted virulence proteins are often the focus of studies examining host specificity, one study examined whether the T4SS system of the animal pathogen,

Bartonella, could itself function as a host specificity determinant, since portions of the system interacted directly with the surface of the host [55]. Expression of the T4SS from the rat-specific tribocorum in the cat-specific and the human-specific strains increased the latter two strains ability to infect rat erythrocytes [55]. Host specificity was shown to be a direct consequence of the polymorphic trwJ and trwL, which are responsible for the attachment of the T4SS to the eukaryotic cell membrane [56]. These two genes independent of the system are not sufficient to cause a change in host specificity, but are required to mediate the host- specific attachment of the Bartonella T4SS to host cells.

Concluding remarks

The genetic determinants of host specificity in viruses, bacteria and fungi can be viewed as a genetic continuum of changes, ranging from the simplest mutations in single residues, to the acquisition of the optimal repertoire of T3SEs or a genomic island. These key pathogenicity determinants provide the means for viruses, bacteria and fungi to interact with their current hosts, with changes to these determinants frequently enabling

92 expansion into new host systems. For pathogens, in particular, the implications of host specificity from the perspective of zoonoses and epidemiology are far-reaching, and the ongoing advancement toward larger scale genomic and proteomic analyses will continue to reveal that we have only just begun to uncover the highly complex subtleties of host specificity.

Acknowledgements

The authors gratefully acknowledge Taylor Duda, April Sefton and Geeta Nadarasah for their feedback on this manuscript. John Stavrinides is supported by grants from the

NSERC Discovery Grants Program, Canada Foundation for Innovation, and the Faculty of Science at the University of Regina.

93

References

1 Kassen, R. (2002) The experimental evolution of specialists, generalists, and the maintenance of diversity. Journal of Evolutionary Biology 15, 173-190

2 Ellis, J., et al. (2000) Structure, function and evolution of plant disease resistance genes. Current Opinion in Plant Biology 3, 278-284

3 Tamura, M. and Tachida, H. (2011) Evolution of the number of LRRs in plant disease resistance genes. Molecular Genetics and Genomics 285, 393-402

4 Keirstead, N.D., et al. (2011) Single nucleotide polymorphisms in collagenous lectins and other innate immune genes in pigs with common infectious diseases. Veterinary

Immunology and Immunopathology 142, 1-13

5 Uciechowski, P., et al. (2011) Susceptibility to tuberculosis is associated with TLR1 polymorphisms resulting in a lack of TLR1 cell surface expression. Journal of Leukocyte

Biology 90, 377-388

6 Lambrechts, L., et al. (2005) Host genotype by parasite genotype interactions underlying the resistance of anopheline mosquitoes to Plasmodium falciparum. Malaria

Journal 4

7 Fauvart, M. and Michiels, J. (2008) Rhizobial secreted proteins as determinants of host specificity in the rhizobium–legume symbiosis. FEMS Microbiology Letters 285, 1-9

8 Spaink, H.P., et al. (1987) Rhizobium nodulation gene nodD as a determinant of host specificity. Nature 328, 337-340

9 Göttfert, M., et al. (1990) Proposed regulatory pathway encoded by the nodV and nodW genes, determinants of host specificity in Bradyrhizobium japonicum. Proceedings of the

National Academy of Sciences 87, 2680-2684

94

10 Iwamoto, T., et al. (2004) Identification of host-specificity determinants in betanodaviruses by using reassortants between striped jack nervous necrosis virus and sevenband grouper nervous necrosis virus. J Virol 78, 1256-1262

11 Mahajan-Miklos, S., et al. (1999) Molecular mechanisms of bacterial virulence elucidated using a Pseudomonas aeruginosa-Caenorhabditis elegans pathogenesis model. Cell 96, 47-56

12 Cruz, A.T., et al. (2007) Pantoea agglomerans - A plant pathogen causing human disease. J. Clin. Microbiol., JCM.00632-00607

13 Wolfgang, M.C., et al. (2003) Conservation of genome content and virulence determinants among clinical and environmental isolates of Pseudomonas aeruginosa.

Proceedings of the National Academy of Sciences 100, 8484-8489

14 Mahajan-Miklos, S., et al. (2000) Elucidating the molecular mechanisms of bacterial virulence using non-mammalian hosts. Mol Microbiol 37, 981-988

15 Starkey, M. and Rahme, L.G. (2009) Modeling Pseudomonas aeruginosa pathogenesis in plant hosts. Nat. Protocols 4, 117-124

16 Rocha, E.P.C. and Danchin, A. (2004) An analysis of determinants of amino acids substitution rates in bacterial proteins. Molecular Biology and Evolution 21, 108-116

17 Meselson, M.S. and Radding, C.M. (1975) A general model for genetic recombination. Proceedings of the National Academy of Sciences 72, 358-361

18 Krishnapillai, V. (1996) Horizontal gene transfer. Journal of Genetics 75, 219-232

19 Hacker, J. and Carniel, E. (2001) Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2, 376-381

95

20 Hacker, J., et al. (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Molecular Microbiology 23, 1089-1097

21 Gething, M.-J., et al. (1980) Cloning and DNA sequence of double-stranded copies of haemagglutinin genes from H2 and H3 strains elucidates antigenic shift and drift in human influenza virus. Nature 287, 301-306

22 Neumann, G., et al. (2009) Emergence and pandemic potential of swine-origin H1N1 influenza virus. Nature 459, 931-939

23 Tarendeau, F., et al. (2008) Host determinant residue lysine 627 lies on the surface of a discrete, folded domain of influenza virus polymerase PB2 subunit. PLoS Pathog 4, e1000136

24 Cvirkaite-Krupovic, V., et al. (2010) Phospholipids act as secondary receptor during the entry of the enveloped, double-stranded RNA bacteriophage phi6. J Gen Virol 91,

2116-2120

25 Duffy, S., et al. (2006) Pleiotropic costs of niche expansion in the RNA bacteriophage phi 6. Genetics 172, 751-757

26 Duffy, S., et al. (2007) Evolution of host specificity drives reproductive isolation among RNA viruses. Evolution 61, 2614-2622

27 Mindich, L. (2006) Phages with segmented double-stranded RNA genomes. In The bacteriophages (Calendar, R. and Abedon, S.T., eds), pp. 197-207, Oxford University

Press

28 Brighton, S.W., et al. (1983) Chikungunya virus infection. A retrospective study of

107 cases. S Afr Med J 63, 313-315

96

29 Tsetsarkin, K.A., et al. (2007) A single mutation in chikungunya virus affects vector specificity and epidemic potential. PLoS Pathog 3, e201

30 Eswarappa, S.M., et al. (2008) Differentially evolved genes of Salmonella pathogenicity islands: insights into the mechanism of host specificity in Salmonella.

PLoS One 3, e3829

31 Hueck, C.J. (1998) Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol Mol Biol Rev 62, 379-433

32 Sung, J.M., et al. (2008) Staphylococcus aureus host specificity: comparative genomics of human versus animal isolates by multi-strain microarray. Microbiology 154,

1949-1959

33 Moreillon, P., et al. (1995) Role of Staphylococcus aureus coagulase and clumping factor in pathogenesis of experimental endocarditis. Infect Immun 63, 4738-4743

34 Stavrinides, J., et al. (2006) Terminal reassortment drives the quantum evolution of type III effectors in bacterial pathogens. PLoS Pathog 2, e104

35 Stavrinides, J., et al. (2008) Host-pathogen interplay and the evolution of bacterial effectors. Cellular Microbiology 10, 285-292

36 Nissan, G., et al. (2006) The type III effectors HsvG and HsvB of gall-forming

Pantoea agglomerans determine host specificity and function as transcriptional activators. Mol Microbiol 61, 1118-1131

37 Mandel, M.J., et al. (2009) A single regulatory gene is sufficient to alter bacterial host range. Nature 458, 215-218

38 Nair, V. and Nishiguchi, M.K. (2009) Biological properties (in vitro) exhibited by free-living and symbiotic Vibrio isolates. Vie Milieu Paris 59, 277-285

97

39 Perret, X., et al. (2000) Molecular basis of symbiotic promiscuity. Microbiol. Mol.

Biol. Rev. 64, 180-201

40 Long, S.R. (2001) Genes and signals in the Rhizobium-legume symbiosis. Plant

Physiology 125, 69-72

41 Horvath, B., et al. (1986) Organization, structure and symbiotic function of Rhizobium meliloti nodulation genes determining host specificity for alfalfa. Cell 46, 335-343

42 Bye, N.J. and Charnley, A.K. (2008) Regulation of cuticle-degrading subtilisin proteases from the entomopathogenic fungi, Lecanicillium spp: implications for host specificity. Arch Microbiol 189, 81-92

43 Poueymiro, M., et al. (2009) Two type III secretion system effectors from Ralstonia solanacearum GMI1000 determine host-range specificity on tobacco. Mol Plant Microbe

Interact 22, 538-550

44 Hajri, A., et al. (2009) A "repertoire for repertoire" hypothesis: repertoires of type three effectors are candidate determinants of host specificity in Xanthomonas. PLoS One

4, e6632

45 Sarkar, S.F. and Guttman, D.S. (2004) Evolution of the core genome of Pseudomonas syringae, a highly clonal, endemic plant pathogen. Applied and Environmental

Microbiology 70, 1999-2012

46 Takeuchi, K., et al. (2003) Flagellin glycosylation island in Pseudomonas syringae pv. glycinea and its role in host specificity. J Bacteriol 185, 6658-6665

47 Martens, E.C., et al. (2003) Early colonization events in the mutualistic association between Steinernema carpocapsae nematodes and Xenorhabdus nematophila bacteria. J

Bacteriol 185, 3147-3154

98

48 Goodrich-Blair, H. (2007) They've got a ticket to ride: Xenorhabdus nematophila-

Steinernema carpocapsae symbiosis. Curr Opin Microbiol 10, 225-230

49 Mandel, M.J. (2010) Models and approaches to dissect host-symbiont specificity.

Trends Microbiol 18, 504-511

50 Heungens, K., et al. (2002) Identification of Xenorhabdus nematophila genes required for mutualistic colonization of Steinernema carpocapsae nematodes. Mol Microbiol 45,

1337-1353

51 Cowles, C.E. and Goodrich-Blair, H. (2004) Characterization of a lipoprotein, NilC, required by Xenorhabdus nematophila for mutualism with its nematode host. Mol

Microbiol 54, 464-477

52 Dobrindt, U., et al. (2004) Genomic islands in pathogenic and environmental microorganisms. Nat Rev Micro 2, 414-424

53 Hacker, J. and Kaper, J.B. (2000) Pathogenicity islands and the evolution of microbes.

Annu Rev Microbiol 54, 641-679

54 Cascales, E. and Christie, P.J. (2003) The versatile bacterial type IV secretion systems. Nat Rev Microbiol 1, 137-149

55 Vayssier-Taussat, M., et al. (2010) The Trw type IV secretion system of Bartonella mediates host-specific adhesion to erythrocytes. PLoS Pathog 6, e1000946

56 Nystedt, B., et al. (2008) Diversifying selection and concerted evolution of a type IV secretion system in Bartonella. Mol Biol Evol 25, 287-300

99

Box 1: Outstanding Questions

 Are there any finer-scale changes beyond the single nucleotide change, such as

methylation, that can impact host specificity?

 Are there any intermediate levels of genetic change within the continuum, such as

post translational modification?

 Will additional examples of host specificity reveal that mutualistic or parasitic

associations fall within one or several select areas within the genetic continuum?

 Is there a matching continuum of genetic changes for host immunity?

 Are host-specific associations involving more genes generally older? Do suites of

host-specificity determinants represent a fine-tuning of a particular association?

 Can the genetic complexity of host-specific interactions be used for predicting how

likely an engineered and commercialized microbe will undergo a host jump?

100

Glossary

Commensalism: a host-microbe interaction where the microbe benefits from its association to the host, but at no cost to the host.

Generalist: a microbe or organism that has the ability to interact with or colonize multiple hosts.

Genomic Islands:

Horizontal/Lateral gene transfer: the transfer of genetic material from one organism into another; can result in the acquisition of novel genes, genomic islands or plasmids.

Host: an organism which harbours a microbe, either in or on itself, in a relationship that either harms, benefits or has no effect on the organism.

Host jump: the ability to colonize or infect a new host.

Host range: the collection of organisms with which a pathogen can interact and associate.

Host specificity factors/determinants: genetic factors that are required for a microbe to cause disease in a specific host.

Host specificity: partiality to infecting one or a group of defined hosts (hving a particular host range).

Multi-host microbes: microbes having more than one suitable host with which they interact.

101

Mutualism: a host-microbe interaction whereby both the host and the microbe benefit from the association.

Mutation: a change in the genetic makeup of an organism.

Parasite: an organism that exploits a host for survival, but does not necessarily cause disease or symptoms of disease. All pathogens are parasites.

Parasitism: a host-microbe interaction where the microbe benefits from the association at some cost to the host.

Pathogen: an organism that colonizes, exploits, and reproduces within a host resulting in disease.

Pathogenicity factors/determinants: genetic factors that enable a pathogen to cause disease in a host.

Pathovar: a ‘pathogenic variety’ of bacterial pathogen that exhibits some level of host preference.

Population: a group of organisms of the same species or type, which occupy the same geographical location or niche.

Recombination: A process by which DNA or RNA is broken, exchanged and reattached; can involve like pieces of DNA (homologous recombination), or unlike pieces of DNA

(non-homologous or illegitimate recombination).

102

Resistance (R) gene: a gene that codes for a protein that can recognize specific microbial proteins, and which leads to a defense response characterized by rapid cell death and restriction of pathogen growth.

Specialist: a microbe or organism that only has the ability to infect or associate with a defined set of hosts.

Symbiosis: living together; encompasses parasitic, mutualistic, commensalistic and amensalistic symbioses; often confused with ‘mutualism’.

Virulence factors/determinants: genetic factors of a microbe that affect the degree of disease in a suitable host.

Zoonoses: movement of a microbe from a non-human host to a human host, often causing disease.

103

Figure Legends

Figure 1. The genetic determinants of host specificity as a continuum. Changes in host specificity can range from the smallest to the largest genetic change, encompassing single nucleotide and amino acid substitutions (residues), portions of genes (regions), whole genes (genes), genetic complements (repertoires), and entire genomic islands

(genomic islands).

104

Appendix D. Insights into cross-kingdom plant pathogenic bacteria

Note: This appendix has been republished, with permission, from Kirzinger, M.W.B., Nadarasah, G., and Stavrinides, J. Insights into cross-kingdom plant pathogenic bacteria. Genes. 2(4): 980-997.

Insights into cross-kingdom plant pathogenic bacteria

Morgan W. B. Kirzinger1, Geetanchaly Nadarasah1, and John Stavrinides

Department of Biology, University of Regina, 3737 Wascana Parkway, Regina,

Saskatchewan, Canada, S4S0A2

1Equal Contribution

Corresponding Author: John Stavrinides Department of Biology University of Regina Regina, Saskatchewan S4S 0A2 Tel: (306) 337-8478 Fax: (306) 337-2410 [email protected]

105

Abstract

Plant and human pathogens have evolved disease factors to successfully exploit their respective hosts. Phytopathogens utilize specific determinants that help to breach reinforced cell walls and manipulate plant physiology to facilitate the disease process, while human pathogens use determinants for exploiting mammalian physiology and overcoming highly developed adaptive immune responses. Emerging research, however, has highlighted the ability of seemingly dedicated human pathogens to cause plant disease, and specialized plant pathogens to cause human disease. Such microbes represent an interesting paradigm in the evolution of cross-kingdom pathogenicity, and the benefits and tradeoffs of exploiting multiple hosts with drastically different morphologies and physiologies. This review will explore cross-kingdom pathogenicity, where plants and humans are common hosts. We illustrate that while cross-kingdom pathogenicity appears to be maintained, the directionality of host association (plant to human, or human to plant) is difficult to determine. Cross-kingdom human pathogens, and their potential plant reservoirs, have important implications for the emergence of infectious diseases.

106

Introduction

Plant pathogenic bacteria are responsible for major economical losses in agricultural and farming industries worldwide, prompting massive research efforts to understand their ecology, pathology and epidemiology. Studies of many agriculturally relevant pathogens like Pseudomonas syringae, Xylella fastidiosa, Erwinia amylovora, and Xanthomonas campestris [1-4] have revealed an extensive specialization to plant systems, including a wealth of plant-specific pathogenicity determinants such as type III secretion systems, plant hormone analogs, and enzymes that target plant-specific cell wall components [1,5-7]. Despite this, we often neglect the fact that the interactions of phytopathogenic bacteria are not confined to plants, and may include other organisms in the environment. Recent studies have begun to show that many plant pathogens have the capacity to colonize other hosts outside of the plant kingdom, including insects, animals, and humans [8] [9].

Much like our perceptions of plant pathogens, human pathogens are studied primarily for their detrimental impact on human health. Pathogens such as , Listeria monocytogenes, and possess a diverse set of genetic factors that enable pathogenicity, ranging from specialized secretion systems to toxins and adhesins, all of which are involved in manipulating or circumventing the human immune system [10-12]. We often consider these human pathogenic bacteria to be exclusive animal pathogens, causing disease and epidemics, yet the constant interaction of human carriers with their environment predisposes these pathogens to alternative niches, which includes free-living states, as well as the potential association with non- human hosts. Not surprisingly, we have begun to discover that some animal pathogens,

107 which are known to cause serious human diseases, also have plant pathogenic capabilities. The epidemiology and disease strategies of these pathogens, whether considered primarily a plant or human pathogen, are of particular interest from the perspectives of both the biology and evolution of cross-kingdom pathogenesis.

The disease strategies used by cross-kingdom pathogens to infect unrelated hosts are of particular interest since plant and animal hosts have distinctive physical barriers and defense responses. While specialists may have evolved dedicated factors for overcoming the physical barriers and innate defenses in a certain host, a cross-kingdom pathogen would require a diverse library of genes and disease strategies to overcome the specific obstacles of each of its hosts. This would either necessitate a suite of pathogenicity factors to enable attachment, disease development, and dispersal for each host, or a universal disease strategy in which similar subsets of pathogenicity factors are used for all hosts. Although signatures of coevolution may not be as pronounced for cross-kingdom pathogens, host adaptation will still proceed through incremental nucleotide substitutions, genetic rearrangements, or acquisition of genetic elements involved in virulence or host specificity [13]. The identification of these genetic determinants in cross-kingdom pathogens with both human- and plant-pathogenic potential can provide a better understanding of the evolution of phytopathogenicity, as well as the role of plants as potential reservoirs for clinically relevant bacteria.

This review will examine various bacterial pathogens that exhibit cross-kingdom pathogenicity, where humans and plants are both potential hosts. Pantoea, Burkholderia,

Salmonella, Serratia and Enterobacter, and Enterococcus are six genera that have brought insight into cross-kingdom pathogenicity, and the versatility of bacteria (Table

108

1). First, we examine the pathogenicity of Pantoea and Burkholderia, which are commonly regarded as plant pathogens, but have been shown to cause human disease.

We then examine what is known about the disease strategies of the human pathogens

Salmonella, Serratia, Enterobacter and Enterococcus, which have been shown to cause plant disease. We highlight what is known about the disease determinants and strategies for each pathogen in each of its hosts, and identify those examples where specific determinants are used in multiple hosts. Finally, we address the evolutionary means by which plant pathogens may evolve to be pathogenic to humans, as well as the possible routes of microbes that can lead to the evolution and maintenance of cross-kingdom pathogenicity.

Plant pathogens that infect humans

Plant pathogens have evolved a repertoire of pathogenicity factors that are used to invade plant host cells, facilitate disease development, and eventually promote pathogen dispersal [14]. Surprisingly, some of these plant pathogens cause disease in humans, and are frequently isolated from human infections in the nosocomial environment. The genus

Pantoea, for example, currently includes seven species (Pantoea agglomerans, Pantoea ananatis, Pantoea citrea, Pantoea dispersa, Pantoea punctata, Pantoea stewartii, and

Pantoea terra) [15-21] that are known to cause plant disease. P. agglomerans causes crown and root gall disease of gypsophila and beet [15,20], P. ananatis causes bacterial blight and dieback of Eucalyptus [16], brown stalk rot of maize [22], and stem necrosis of rice [23], and P. citrea is the causal agent of pink disease in pineapple [21]. The type III secretion system (T3SS) appears to be an essential plant pathogenicity factor for many species of Pantoea, enabling efficient colonization and onset of disease through the

109 injection of bacterial effector proteins that disrupt defense signalling in host cells [24].

Pantoea agglomerans pvs. gypsophilae and betae are known for using a T3SS for causing galls on their host plants [25]. While P. agglomerans pv. betae can induce gall formation on gypsophila and beet plants, P. agglomerans pv. gypsophilae is only capable of causing gall formation on gypsophila. Host range is determined in part by the type III effector protein, PthG, which is recognized by the beet plant resulting in a defense response, but functions as a virulence factor in gypsophila [25]. Studies with transgenic

Nicotiana tabacum plants expressing PthG suggest that the effector may interfere in the plant auxin signalling pathways, resulting in higher observed auxin and ethylene levels, and subsequent blockage of root and shoot development [25]. The manipulation of plant- specific hormones is likely directly responsible for callus/gall formation, and highlights the specialization of this pathogen to plant systems.

Despite this specialization to plants, species of Pantoea have also been discovered to be pathogenic to humans. Now classified as an opportunistic human pathogen, P. agglomerans was implicated in a large U.S. and Canadian outbreak of septicaemia caused by contaminated closures on infusion fluid bottles [26]. P. agglomerans has since been associated in the contamination of intravenous fluid, parenteral nutrition, blood products, propofol, and transference tubes, causing illness and even death [27-31]. P. agglomerans has also been obtained from joint fluids of patients with synovitis, osteomyelitis, and arthritis [17], where infection often occurs following injuries with wood slivers, plant thorns, or wooden splinters [17,32-34]. This is also true for P. ananatis, P. septica, and

P. dispersa, which are known for causing disease in onion and sugar cane plants, but have also been implicated in bacteremic infections, septicaemia, and leukemia,

110 respectively [35-37]. Phylogenetic studies examining the relationship between the plant isolates and the clinical isolates have shown that they are indistinguishable, and their potential pathogenicity in plant and animal hosts is unclear [38].

Much like Pantoea, species of Burkholderia are also recognized phytopathogens, but are ubiquitous in the terrestrial and aquatic environment. Burkholderia species, including Burkholderia cepacia, Burkholderia cenocepacia, Burkholderia ambifaria, and

Burkholderia pyrrocinia were initially identified as inhabitants of agricultural soil, with some, like Burkholderia glathei being found in fossil soils in Germany [39]. Given their ubiquity in the terrestrial environment, it is not surprising that some strains, like

Burkholderia plantarii and Burkholderia gladioli have been shown to be pathogenic to onion, rice, gladiolus and iris [39]. Similarly, Burkholderia glumae, the most important bacterial pathogen of rice in Japan, Korea and Taiwan, was shown to carry a plant T3SS, which is essential for its ability to cause plant disease [40]. An analysis of the proteins regulated by HrpB, which activates the expression of T3SS genes, identified 34 extracellular proteins that were secreted, with 21 of 34 having putative HrpB-binding sequences in their upstream regulatory regions [41]. Another set of 16 proteins were recognized to be secreted via a type II protein secretion system (T2SS) [41]. Mutants lacking either the T2SS or the T3SS produced toxoflavin, but were less virulent to rice panicles, indicating the importance of these proteins in plant pathogenicity [41].

Likewise, Burkholderia pseudomallei possesses multiple T3SS gene clusters, one of which, T3SS2, showed high similarity to the T3SSs in phytopathogenic Xanthomonas spp. and Ralstonia solanacearum [42]. Arabinose-negative strains of B. pseudomallei are more virulent in tomato plants, and these strains appear to carry a T3SS, suggesting that

111 these two phenotypes are linked to virulence in plants [43]. Other species, like B. cepacia, have strains that are specialized to plant roots and have been found in the maize rhizosphere [44], while others are pathogenic to onion, and cause severe rots using a cocktail of plant-specific enzymes following biofilm formation [45].

Interestingly, B. cepacia that causes onion rot can also cause life-treating pulmonary infections in individuals with chronic granulomatous disease, or cystic fibrosis (CF) [46]. Between 1980 and 1990, B. cepacia was associated with a number of cases of “cepacia syndrome” in CF treatment centres and social gatherings, where CF patients became infected by other B. cepacia-carrying individuals, with many of the infections being fatal [39,47-50]. Recently, fatal infections were also observed in non-CF patients in reanimation wards in Europe and North America [51]. Investigation into the ability of various strains of B. cepacia to penetrate airway barriers revealed that all three of the B. cepacia strains tested circumvented the mechanical barriers of mucus and ciliary transport to penetrate the airway epithelium [52]. Different strains of B. cepacia use distinct invasion pathways and virulence determinants, which may account for differences in virulence strains [52].

Several of the pathogenicity determinants for different species of Burkholderia in human hosts have been determined. For example, B. pseudomallei, which infects tomato plants, contains a complete pqsA–pqsE operon, which is highly similar to the genes responsible for the synthesis of the virulence-associated signalling molecule 2-heptyl-3- hydroxy-4(1H)-quinolone (PQS) found in Pseudomonas aeruginosa [53]. Introduction of the B. pseudomallei hhqA and hhqE genes into P. aeruginosa pqsA and pqsE mutants, promoted the restoration of PQS production and virulence. Likewise, the presence of a

112 capsular polysaccharide has been implicated in the pathogenicity of Burkholderia toward humans. Parental strains of B. mallei with the capsule were highly virulent in hamsters and mice, while capsule mutants were avirulent in both animal models [54].

Human pathogens that infect plants

Our human-centric view of disease often leads to sweeping assumptions that human pathogenic bacteria are devoted to human hosts. Species of Salmonella, Serratia,

Enterobacter, and Enterococcus are commonly considered problematic human pathogens that are frequently found in the nosocomial environment, and which cause food poisoning, general infections, and septicaemia [55-58] (Table 1). Relatively recent studies, however, have begun to uncover that these human-pathogenic bacterial species are also accomplished phytopathogens that are capable of causing disease in a wide variety of plant hosts.

Salmonella is the causal agent of many human, animal, and bird diseases worldwide, including gastroenteritis and , and results in approximately 1.4 million human illnesses and 600 deaths annually in the United States [56]. Successful host infection by Salmonella appears dependent on many pathogenicity determinants, including two T3SSs and a large suite of secreted effectors that function to mediate both intercellular and intracellular survival in the host [59]. Other virulence factors include the flhD gene, which regulates the production of the flagellum, is essential for the full invasive potential of the bacterium [60]. Other virulence factors like flavohemoglobin protect against nitric oxides, and mediate bacterial survival in macrophages after phagocytosis [61].

113

Despite its clear adaptation to survival in human hosts, Salmonella has also been isolated from the phyllosphere of tomato crops [62]. S. enterica levels on wild tomato

(Solanum pimpinellifolium) were lower than on domesticated tomato cultivars (Solanum lycopersicum), with S. enterica preferentially colonizing type 1 trichomes. Plants irrigated with contaminated water had larger S. enterica populations than plants grown from seeds planted in infected soil; however, both routes of contamination resulted in detectable S. enterica populations in the phyllosphere, suggesting that S. enterica has the ability to associate with plants and has adapted to survive in the phyllosphere. agfA and agfB were identified as being involved in enabling S. enterica to associate with plants

[63]. agfA mutants were unaffected in their ability to attach or colonize alfalfa sprouts, whereas agfB mutants showed reduced colonization ability. This suggests that agfB alone plays a role in plant attachment [63].

Salmonella is not only capable of colonizing the phyllosphere; it is also capable of infecting the plant Arabidopsis, and causing death of plant organs [64,65]. Inoculation of

Salmonella in Arabidopsis via shoot or root tissues resulted in chlorosis, wilting, and eventually death of the infected tissues within seven days [64]. The specific virulence factors involved are not yet known; however, its plant pathogenicity, much like its human pathogenicity appears dependent, in part, on the flagellum. Fili, a protein needed for flagellar assembly has been shown to be essential for plant pathogenesis, possibly through its involvement in a specialized protein export pathway [66]. The Fili protein is similar to the HrpB6 protein of the rice pathogen Xanthomonas, which mediates interactions between Xanthomonas and its host [66]. Secretion of proteins may therefore

114 be integral for both human and plant association for Salmonella, although whether the same proteins are used for pathogenesis, is still unknown.

Serratia marcescens is a human pathogen commonly found in the respiratory and urinary tracts of humans, and is responsible for approximately 1.4% of nosocomial infections in the United States [57]. Through transposon mutagenesis, genes involved in lipopolysaccharide (LPS) biosynthesis, iron uptake, and hemolysin production were discovered to be essential for bacterial virulence in the host Caenorhabditis elegans [67].

Further studies have shown that purified protease proteins of S. marcescens administered to the lung tissue of guinea pigs and mice produced symptoms and haemorrhaging, which was similar to those animals with acute Serratia pneumonia [68].

Yet, despite its animal virulence factors, Serratia has also been found to be a common phytopathogen. S. marcescens is recognized as a phloem-resident pathogen that causes cucurbit yellow vine disease of pumpkin (Cucurbita moschata L.) and squash (Cucurbita pepo L.), which is marked by wilting, phloem discoloration, and yellowing of foliage

[69-71]. Recent studies have shown that S. marcescens produces a biofilm along the sides of the phloem vessels, blocking the transport of nutrients and eventually causing the plant to wilt and die [72,73]. A genetic screen to identify genes that modulate biofilm formation in S. marcescens revealed the involvement of fimbrial genes, as well as an oxyR homolog – a conserved bacterial transcription factor having a primary role in oxidative stress response [74]. The involvement of these plant-specific genes in human pathogenicity has not been determined.

Another bacterium that has also shown cross-kingdom pathogenesis is the Gram- negative bacterium , an important nosocomial pathogen responsible

115 for bacteremia, lower respiratory tract infections, skin and soft-tissue infections, as well as urinary tract infections [55]. Recently, an E. cloacae infection of the bloodstream was traced back to contaminated human albumin [75]. E. cloacae synthesize a Shiga-like toxin II-related cytotoxin, which was implicated in an infant case of haemolytic-uremic syndrome [76]. Studies have shown that the concentration of the outer membrane protein

(OmpX) produced by E. cloacae during infection influenced the ability of the bacterium to invade rabbit intestinal tissue [77]. Overproduction of OmpX led to a 10-fold increase in the invasiveness of E. cloacae in rabbit intestinal enterocytes in situ, whereas the mutant and wildtype strain were unable to effectively invade the same tissue [77].

Interestingly, OmpX shares a high amino acid similarity to the virulence proteins PagC and Rck of Salmonella typhimurium as well as the virulence-associated Ail protein of

Yersinia enterocolitica. Although there is evidence that E. cloacae has evolved to colonize the human host, it has also been identified as a plant pathogen of macadamia

(Macadamia integrifolia), and the causal agent of grey kernel disease [78]. The onset of grey kernel disease affects not only the quality of the kernels produced by the plant, but results in grey discoloration and a foul odour [78]. E. cloacae also causes bacterial soft rot disease in dragon fruit (Hylocereus spp.) [18], bacterial leaf rot in Odontioda orchids

[79], and is also responsible for internal yellowing disease in papaya [78]. Again, the specificity of the virulence factors used in each of these hosts is not known.

Cross-kingdom pathogenesis is not limited to Gram-negative bacteria.

Enterococci are part of the normal intestinal flora of humans and animals, but are also important pathogens responsible for serious infections, especially in immunocompromised patients [58]. With increasing antibiotic resistance, enterococci are

116 recognized as nosocomial pathogens that can be challenging to treat. The genus

Enterococcus includes more than 17 species, but only a few can cause local or systemic clinical infections including urinary tract and abdominal infections, wound infections, bacteremia, and endocarditis [80]. Clinical isolates of Enterococcus faecalis have been found to produce hemolysin – a virulence factor that leads to the lysis of red blood cells

[81]. Hemolytic strains exhibit multiple drug resistance more frequently than non- hemolytic strains, while strains isolated from fecal specimens of healthy individuals display a low (17%) incidence of hemolysin production [81]. The salB gene of E. faecalis has also been shown to be a virulence factor, since it increases bacterial adherence to extracellular matrix (ECM) proteins and promotes biofilm formation during infection [82]. The surface protein Esp was shown to contribute to the colonization and persistence of the bacterium in the urinary tract [83]. E. faecalis has also been shown to exhibit pathogenicity toward insects in addition to human hosts, where the extracellular gelatinase of E. faecalis was found to destroy the host defence system through the degradation of inducible antimicrobial peptides in insect hemolymph and in human serum

[84]. Similarly, a putative quorum-sensing system gene (fsrB) and a serine protease

(sprE) were shown to play an important role in mammalian and nematode models of infection [58].

E. faecalis is not only capable of infecting mammalian and nematode hosts; it also can infect the leaves and root tissue of Arabidopsis thaliana, causing plant death seven days after inoculation [58]. The manifestation of disease initially begins once the bacterium has successfully attached itself to the leaf surface, where upon entry of the leaf tissue through the stomata or wounds, E. faecalis multiplies and colonizes the

117 intercellular spaces of the plant host and causes rotting and disruption of the plant cell wall and membrane structures. The phytopathogenicity of E. faecalis appears to involve some of the same genetic determinants involved in animal pathogenesis, including the quorum sensing system (fsrB) and a serine protease (sprE) [58]. The quorum-sensing mutant (ΔfsrB) exhibited an attenuated virulence in the A. thaliana root model, since fewer bacteria were able to attach to the root surfaces, and biofilm formation was greatly reduced. Similarly, the serine protease mutant (ΔsprE) no longer caused plant death, suggesting that this gene plays an important role in E. faecalis plant pathogenesis [58].

E. faecalis clearly uses a general disease strategy that allows it to use the same virulence factors for exploiting two different hosts.

Evolutionary models

Cross-kingdom pathogenicity represents an intriguing balance between the costs and benefits of bacterial generalization versus specialization. While there are benefits to specializing on one host, and evolving to exploit its subtle vulnerabilities, the availability of this host would limit the success of the pathogen, thereby favouring host diversification. Conversely, a broad host range would ensure host availability for the pathogen, but at the significant cost of having to utilize a universal, and likely suboptimal infection strategy for exploiting that host. When the alternative host is from a different kingdom, the dynamics become even more intriguing, since the disease strategy is likely to be quite different. The evolution of cross-kingdom pathogenicity and its origins can be described using several simplistic models. In the context of the examples described here, one model describes the evolution of a successful animal pathogen that acquires the ability to cause disease in a plant host (human to plant). If we presuppose that animal

118 pathogens are readily acquired from the environment and are already adapted to animal hosts [85], there are several ways in which these pathogens may acquire plant-pathogenic potential. Deposition of animal pathogens back into the environment, such that they are introduced near or onto plants recurrently may facilitate the exchange genetic information with other organisms in the environment, and the gain of a selective advantage that enables persistence. One primary route to this general environment is through the natural strategies used by bacteria to ensure dissemination from their infected host. For humans, symptoms such as diarrhea that result from an infection provide the bacteria with an escape route from the host into the environment [86] (Figure 1). Released bacteria move from human wastes to natural watersheds, with subsequent reuse of these sources for irrigation of commercially relevant crops increasing bacterial titres in the phyllosphere

[64,65]. But, the movement of human pathogens to the general environment may also occur through indirect routes. The Pharoah ant (Monomorium pharaonis) is able to transport human pathogenic bacteria including Salmonella, Staphylococcus, and

Streptococcus within a nosocomial setting [87], and there have been reports of the common house fly, Musca domestica, carrying Pseudomonas aeruginosa, Enterococcus faecalis and Staphylococcus aureus [88] (Figure 1). These insects function as transports for human pathogenic bacteria, moving them from the clinical setting to the general environment [9], and likely onto plants. Acclimation to a plant host would occur over extended periods of time, and would be accelerated with repeated cycling of pathogens from humans to the general environment.

Under this same model, the evolution of cross-kingdom pathogenicity can also be viewed as a secondary trait following the acquisition of key factors by bacteria, which act

119 as anti-feeding determinants against potential animal predators, or which function to ensure efficient survival and persistence in the environment [91]. Virulence or defense determinants may target key defense pathways of animal predators, which could include the innate immune system – the most basal and common defense pathway of eukaryotes

[92]. This would allow a pathogen with such virulence or defense factors to exploit humans (and other animals) as well as plants, thereby promoting pathogen replication and dispersal; however, host association and/or contact would be incidental, and facilitated by external processes, as the microbe would not have strategies for host entry or association.

The maintenance of cross-kingdom pathogenicity would require only the selective pressure that originally resulted in the evolution of the virulence or defense factor.

A second model that can account for the evolution of the cross-kingdom pathogens has plant pathogenicity as the ancestral state [89], with established plant pathogens having evolved the ability to exploit animals (including humans) as alternative hosts. The simplest mode of transmission is direct contact with an epiphytically colonized or diseased plant by humans, whether by ingestion, contact with the skin or other mucosal membranes, scrapes or abrasions, providing the bacteria with a direct route to a potential host. Several plant pathogens that cause opportunistic human infections are associated with commercially relevant crop plants. Species of Pantoea are found on beets [15,20], maize [22], rice [23], and pineapple [21]. In the case of Pantoea, human infections occur frequently following abrasions caused by rose thorns, splinters, and arthritis, suggesting that these plant-associated strains can lead to human infection directly (Figure 1) [17,32-34]. Similarly, Burkholderia infects various plant species including onion, rice, sorghum and velvet beans [39,47,90], and may also have a direct

120 route to humans. Indirect transfer to humans or even to the nosocomial environment may be mediated by other organisms like ants and flies, which have been implicated as carriers of many bacterial species [87,88] , and may provide environmental isolates with a direct route to other environments and hosts.

Although there is evidence to support these models of evolution, it is difficult to establish evolutionary directionality, or the underlying selective pressures that lead to the evolution of cross-kingdom pathogenicity. The ability of human pathogens to exploit plants as alternative hosts is of particular significance from the perspective of host jumps and host-specific adaptation, since plants would ultimately serve as an extensive reservoir for clinically relevant bacteria. Some plant endophytes, which are organisms that live inside of plants often asymptomatically, have been shown to exhibit human pathogenic potential. Species like Morganella morganii, Klebsiella pneumoniae, Pantoea agglomerans and Streptomyces sp., have clinical relevance, but have been more closely investigated for their association with plants [93,94]. M. morganii is a phosphate solubilising bacterium [95], K. pneumoniae a nitrogen-fixing endophyte [96], P. agglomerans a gall-forming phytopathogen, and Streptomyces sp. a plant endophyte and plant pathogen [97-99]. Many of these animal-pathogenic endophytes have been shown to be latent plant pathogens, where after establishing a symbiotic mutualistic relationship with their plant host, the bacterium infects and causes severe disease within the plant host possibly due to environmental changes [100]. It has also been observed that an organism that is an endophyte on one plant species may have the potential to cause disease in a different plant species [101]. Yet, many human pathogens have been found to occur naturally in the rhizosphere [93], and it has been suggested that the mechanisms

121 necessary for surviving in the rhizosphere are similar to those necessary for causing human infection [102]. Contributing to the maintenance of these cross-kingdom pathogens may be anthropogenic factors. Certain species with human and plant pathogenic potential, such as Pantoea and Burkholderia cepacia are also commonly used as biocontrol agents against Erwinia amylovora and Rhizoctonia solani, respectively

[103,104]. The extensive use of these bacteria as biocontrol agents on commercially relevant crops can lead to broad dispersal in the environment, increasing their populations dramatically and providing them with greater opportunity for interaction with other microbes. Horizontal gene transfer with other plant and human pathogenic bacteria in the environment can lead to extensive exchange of host-specific virulence factors or niche- specific determinants, creating rapidly-evolving populations that may exhibit oscillations between host-associative and free-living states.

Conclusion

Advances in molecular genetics coupled with the exploration of the pathogenic potential of seemingly dedicated pathogens is beginning to reveal that many phytopathogenic bacteria are capable of exploiting human hosts, and many human pathogens are capable of exploiting plant hosts. The constant cycling of pathogens from the general environment to human environments may help to maintain cross-kingdom pathogenicity, with plants serving as intermediate hosts or reservoirs for human pathogens. The ability of these cross-kingdom pathogens to maintain their population levels in a variety of environments increases their pan genome and evolutionary potential, ultimately making these pathogens significant from the perspective of emerging and remerging infectious diseases.

122

Acknowledgements

We would like to thank Taylor Duda, April Sefton, and Lucas Robinson, and Kollin

Schmalenberg for their helpful suggestions and comments. J.S. is supported by funding from the NSERC Discovery Grant Program, and the Faculty of Science at the University of Regina.

123

References

1. Hopkins, D.L., Xylella Fastidiosa: Xylem-limited bacterial pathogen of plants.

Annual Review of Phytopathology 1989, 27, 271-290.

2. Whalen, M.C.; Innes, R.W.; Bent, A.F.; Staskawicz, B.J., Identification of

Pseudomonas syringae pathogens of Arabidopsis and a bacterial locus

determining avirulence on both Arabidopsis and soybean. The Plant Cell Online

1991, 3, 49-59.

3. Jock, S.; Donat, V.; López, M.M.; Bazzi, C.; Geider, K., Following spread of fire

blight in Western, Central and Southern Europe by molecular differentiation of

Erwinia amylovora strains with PFGE analysis. Environmental Microbiology

2002, 4, 106-114.

4. Dow, J.M.; Crossman, L.; Findlay, K.; He, Y.-Q.; Feng, J.-X.; Tang, J.-L.,

Biofilm dispersal in Xanthomonas campestris is controlled by cell-cell signaling

and is required for full virulence to plants. Proceedings of the National Academy

of Sciences 2003, 100, 10995-11000.

5. Osbourn, A.E.; Barber, C.E.; Daniels, M.J., Identification of plant-induced genes

of the bacterial pathogen Xanthomonas campestris pathovar campestris using a

promoter-probe plasmid. EMBO J 1987, 6, 23-28.

6. Wei, Z.M.; Laby, R.J.; Zumoff, C.H.; Bauer, D.W.; He, S.Y.; Collmer, A.; Beer,

S.V., Harpin, elicitor of the hypersensitive response produced by the plant

pathogen Erwinia amylovora. Science 1992, 257, 85-88.

124

7. Guttman, D.S.; Vinatzer, B.A.; Sarkar, S.F.; Ranall, M.V.; Kettler, G.; Greenberg,

J.T., A functional screen for the type III (Hrp) secretome of the plant pathogen

Pseudomonas syringae. Science 2002, 295, 1722-1726.

8. Chantanao, A.; Jensen, H.J., Saprozoic nematodes as carriers and disseminators of

plant pathogenic bacteria. J Nematol 1969, 1, 216-218.

9. Nadarasah, G., Stavrinides, J., Insects as alternative hosts for phytopathogenic

bacteria. FEMS Microbiol Rev 2011, 35, 555-575.

10. Enders, U.; Karch, H.; Toyka, K.V.; Michels, M.; Zielasek, J.; Pette, M.;

Heesemann, J.; Hartung, H.P., The spectrum of immune responses to

Campylobacter jejuni and glycoconjugates in Guillain-Barre syndrome and in

other neuroimmunological disorders. Ann Neurol 1993, 34, 136-144.

11. Jarvis, K.G.; Giron, J.A.; Jerse, A.E.; McDaniel, T.K.; Donnenberg, M.S.; Kaper,

J.B., Enteropathogenic Escherichia coli contains a putative type III secretion

system necessary for the export of proteins involved in attaching and effacing

lesion formation. Proc Natl Acad Sci U S A 1995, 92, 7996-8000.

12. Lenz, L.L.; Mohammadi, S.; Geissler, A.; Portnoy, D.A., SecA2-dependent

secretion of autolytic enzymes promotes Listeria monocytogenes pathogenesis.

Proc Natl Acad Sci U S A 2003, 100, 12432-12437.

13. Woolhouse, M.E.; Taylor, L.H.; Haydon, D.T., Population biology of multihost

pathogens. Science 2001, 292, 1109-1112.

14. Panstruga, R.; Dodds, P.N., Terrific protein traffic: the mystery of effector protein

delivery by filamentous plant pathogens. Science 2009, 324, 748-750.

125

15. Cooksey, D.A., Galls of Gypsophila paniculata caused by Erwinia herbicola.

Plant Dis 1986, 70, 464-468.

16. Coutinho, T.A., Preisig, O., Mergaert, J., Cnockaert, M.C., Riedel, K.H., Swings,

J., Wingfield, M.J., Bacterial blight and dieback of Eucalyptus species, hybrids,

and clones in South Africa. Plant Dis 2002, 86, 20-25.

17. Cruz, A.T.; Cazacu, A.C.; Allen, C.H., Pantoea agglomerans, a plant pathogen

causing human disease. J Clin Microbiol 2007, 45, 1989-1992.

18. Masyahit, M., Sijam, K., Awang, Y., Ghazali, M., First report on bacterial soft rot

disease on dragon Fruit (Hylocereus spp.) caused by Enterobacter cloacae in

peninsular Malaysia. Int J Agric Biol 2009, 11, 659-666.

19. Koutsoudis, M.D.; Tsaltas, D.; Minogue, T.D.; von Bodman, S.B., Quorum-

sensing regulation governs bacterial adhesion, biofilm development, and host

colonization in Pantoea stewartii subspecies stewartii. Proc Natl Acad Sci U S A

2006, 103, 5983-5988.

20. Burr, T.J., Katz, B.H., Abawi, G.S., Crosier, D.C. , Comparison of tumorgenic

strains of Erwinia herbicola isolated from table beet with E.h. gypsophilae. Plant

Dis 1991, 75, 855-858.

21. Marín-Cevada, V.; Vargas, V.H.; Juárez, M.; López, V.G.; Zagada, G.;

Hernández, S.; Cruz, A.; Caballero-Mellado, J.; López-Reyes, L.; Jiménez-

Salgado, T., et al., Presence of Pantoea citrea, causal agent of pink disease, in

pineapple fields in Mexico. Plant Pathology 2006, 55, 294-294.

126

22. Goszczynska, T., Botha, W.J., Venter, S.N., Coutinho, T.A. , Isolation and

identification of the casual agent of brown stalk rot, a new disease of maize in

South Africa. Plant Dis 2007, 91, 711-718.

23. Cother, E.J., Reinke, R., McKenzie, C., Lanoiselet, V.M., Noble, D.H., An

unusual stem necrosis of rice caused by Pantoea ananas and the first record of

this pathogen on rice in Australia, Austr. . Plant Pathol 2004, 33, 495-503.

24. Alfano, J.R.; Collmer, A., Type III secretion system effector proteins: double

agents in bacterial disease and plant defense. Annu Rev Phytopathol 2004, 42,

385-414.

25. Weinthal, D.; Yablonski, S.; Singer, S.; Barash, I.; Manulis-Sasson, S.; Gaba, V.,

The type III effector PthG of Pantoea agglomerans pv. gypsophilae modifies host

plant responses to auxin, cytokinin and light. European Journal of Plant

Pathology 2010, 128, 289-302.

26. Maki, D.G.; Rhame, F.S.; Mackel, D.C.; Bennett, J.V., Nationwide epidemic of

septicemia caused by contaminated intravenous products. I. Epidemiologic and

clinical features. Am J Med 1976, 60, 471-485.

27. Matsaniotis, N.S., Syriopoulou, V.P., Theodoridou, M.C., Tzanetou, K.G.,

Mostrou, G.I., Enterobacter sepsis in infants and children due to contaminated

intravenous fluids. Infect Control 1984, 5, 471-477.

28. Alvarez, F.E.; Rogge, K.J.; Tarrand, J.; Lichtiger, B., Bacterial contamination of

cellular blood components. A retrospective review at a large cancer center. Ann

Clin Lab Sci 1995, 25, 283-290.

127

29. Bennett, S.N.; McNeil, M.M.; Bland, L.A.; Arduino, M.J.; Villarino, M.E.;

Perrotta, D.M.; Burwen, D.R.; Welbel, S.F.; Pegues, D.A.; Stroud, L., et al.,

Postoperative infections traced to contamination of an intravenous anesthetic,

propofol. N Engl J Med 1995, 333, 147-154.

30. Habsah, H.; Zeehaida, M.; Van Rostenberghe, H.; Noraida, R.; Wan Pauzi, W.I.;

Fatimah, I.; Rosliza, A.R.; Nik Sharimah, N.Y.; Maimunah, H., An outbreak of

Pantoea spp. in a neonatal intensive care unit secondary to contaminated

parenteral nutrition. J Hosp Infect 2005, 61, 213-218.

31. Bicudo, E.L.; Macedo, V.O.; Carrara, M.A.; Castro, F.F.; Rage, R.I., Nosocomial

outbreak of Pantoea agglomerans in a pediatric urgent care center. Braz J Infect

Dis 2007, 11, 281-284.

32. Flatauer, F.E.; Khan, M.A., Septic arthritis caused by Enterobacter agglomerans.

Arch Intern Med 1978, 138, 788.

33. De Champs, C., Le Seaux, S., Dubost, J.J., Boisgard, S., Sauvezie, B., Sirot, J. ,

Isolation of Pantoea agglomerans in two cases of septic mono-arthritis after plant

thorn and wood sliver injuries. J Clin Microbiol 2000, 38.

34. Kratz, A., Greenberg, D., Barki, Y., Cohen, E., Lifshitz, M., Pantoea

agglomerans as a cause of septic arthritis after palm tree thorn injury; case report

and literature review. Arch Dis Child 2003, 88, 542-544.

35. Schmid, H.; Schubert, S.; Weber, C.; Bogner, J.R., Isolation of a Pantoea

dispersa-like strain from a 71-year-old woman with acute myeloid leukemia and

multiple myeloma. Infection 2003, 31, 66-67.

128

36. De Baere, T.; Verhelst, R.; Labit, C.; Verschraegen, G.; Wauters, G.; Claeys, G.;

Vaneechoutte, M., Bacteremic infection with Pantoea ananatis. J Clin Microbiol

2004, 42, 4393-4395.

37. Brady, C.L.; Cleenwerck, I.; Venter, S.N.; Engelbeen, K.; De Vos, P.; Coutinho,

T.A., Emended description of the genus Pantoea, description of four species from

human clinical samples, Pantoea septica sp. nov., Pantoea eucrina sp. nov.,

Pantoea brenneri sp. nov. and Pantoea conspicua sp. nov., and transfer of

Pectobacterium cypripedii (Hori 1911) Brenner et al. 1973 emend. Hauben et al.

1998 to the genus as Pantoea cypripedii comb. nov. Int J Syst Evol Microbiol

2010, 60, 2430-2440.

38. Volksch, B.; Thon, S.; Jacobsen, I.D.; Gube, M., Polyphasic study of plant- and

clinic-associated Pantoea agglomerans strains reveals indistinguishable virulence

potential. Infect Genet Evol 2009, 9, 1381-1391.

39. Coenye, T.; Vandamme, P., Diversity and significance of Burkholderia species

occupying diverse ecological niches. Environ Microbiol 2003, 5, 719-729.

40. Maeda, Y.; Kiba, A.; Ohnishi, K.; Hikichi, Y., Implications of amino acid

substitutions in GyrA at position 83 in terms of oxolinic acid resistance in field

isolates of Burkholderia glumae, a causal agent of bacterial seedling rot and grain

rot of rice. Appl Environ Microbiol 2004, 70, 5613-5620.

41. Kang, Y.; Kim, J.; Kim, S.; Kim, H.; Lim, J.Y.; Kim, M.; Kwak, J.; Moon, J.S.;

Hwang, I., Proteomic analysis of the proteins regulated by HrpB from the plant

pathogenic bacterium Burkholderia glumae. Proteomics 2008, 8, 106-121.

129

42. Stevens, M.P.; Haque, A.; Atkins, T.; Hill, J.; Wood, M.W.; Easton, A.; Nelson,

M.; Underwood-Fowler, C.; Titball, R.W.; Bancroft, G.J., et al., Attenuated

virulence and protective efficacy of a Burkholderia pseudomallei bsa type III

secretion mutant in murine models of . Microbiology 2004, 150, 2669-

2676.

43. Winstanley, C.; Hart, C.A., Presence of type III secretion genes in Burkholderia

pseudomallei correlates with Ara(-) phenotypes. J Clin Microbiol 2000, 38, 883-

885.

44. Di Cello, F.; Bevivino, A.; Chiarini, L.; Fani, R.; Paffetti, D.; Tabacchioni, S.;

Dalmastri, C., Biodiversity of a Burkholderia cepacia population isolated from

the maize rhizosphere at different plant growth stages. Appl Environ Microbiol

1997, 63, 4485-4493.

45. Mahenthiralingam, E.; Urban, T.A.; Goldberg, J.B., The multifarious,

multireplicon Burkholderia cepacia complex. Nat Rev Microbiol 2005, 3, 144-

156.

46. Govan, J.R.; Hughes, J.E.; Vandamme, P., Burkholderia cepacia: medical,

taxonomic and ecological issues. J Med Microbiol 1996, 45, 395-407.

47. Ballard, R.W.; Palleroni, N.J.; Doudoroff, M.; Stanier, R.Y.; Mandel, M.,

Taxonomy of the aerobic pseudomonads: Pseudomonas cepacia, P. marginata, P.

alliicola and P. caryophylli. J Gen Microbiol 1970, 60, 199-214.

48. Govan, J.R.; Brown, P.H.; Maddison, J.; Doherty, C.J.; Nelson, J.W.; Dodd, M.;

Greening, A.P.; Webb, A.K., Evidence for transmission of Pseudomonas cepacia

by social contact in cystic fibrosis. Lancet 1993, 342, 15-19.

130

49. Holmes, A.; Nolan, R.; Taylor, R.; Finley, R.; Riley, M.; Jiang, R.Z.; Steinbach,

S.; Goldstein, R., An epidemic of Burkholderia cepacia transmitted between

patients with and without cystic fibrosis. J Infect Dis 1999, 179, 1197-1205.

50. LiPuma, J.J., Burkholderia cepacia. Management issues and new insights. Clin

Chest Med 1998, 19, 473-486, vi.

51. Graves, M.; Robin, T.; Chipman, A.M.; Wong, J.; Khashe, S.; Janda, J.M., Four

additional cases of Burkholderia gladioli infection with microbiological correlates

and review. Clin Infect Dis 1997, 25, 838-842.

52. Schwab, U.; Leigh, M.; Ribeiro, C.; Yankaskas, J.; Burns, K.; Gilligan, P.; Sokol,

P.; Boucher, R., Patterns of epithelial cell invasion by different species of the

Burkholderia cepacia complex in well-differentiated human airway epithelia.

Infect Immun 2002, 70, 4547-4555.

53. Diggle, S.P.; Lumjiaktase, P.; Dipilato, F.; Winzer, K.; Kunakorn, M.; Barrett,

D.A.; Chhabra, S.R.; Camara, M.; Williams, P., Functional genetic analysis

reveals a 2-Alkyl-4-quinolone signaling system in the human pathogen

Burkholderia pseudomallei and related bacteria. Chem Biol 2006, 13, 701-710.

54. DeShazer, D.; Waag, D.M.; Fritz, D.L.; Woods, D.E., Identification of a

Burkholderia mallei polysaccharide gene cluster by subtractive hybridization and

demonstration that the encoded capsule is an essential virulence determinant.

Microb Pathog 2001, 30, 253-269.

55. Fluit, A.C.; Schmitz, F.J.; Verhoef, J., Frequency of isolation of pathogens from

bloodstream, nosocomial pneumonia, skin and soft tissue, and urinary tract

131

infections occurring in European patients. Eur J Clin Microbiol Infect Dis 2001,

20, 188-191.

56. Mead, P.S.; Slutsker, L.; Dietz, V.; McCaig, L.F.; Bresee, J.S.; Shapiro, C.;

Griffin, P.M.; Tauxe, R.V., Food-related illness and death in the United States.

Emerg Infect Dis 1999, 5, 607-625.

57. Wisplinghoff, H.; Bischoff, T.; Tallent, S.M.; Seifert, H.; Wenzel, R.P.; Edmond,

M.B., Nosocomial bloodstream infections in US hospitals: analysis of 24,179

cases from a prospective nationwide surveillance study. Clin Infect Dis 2004, 39,

309-317.

58. Jha, A.K.; Bais, H.P.; Vivanco, J.M., Enterococcus faecalis mammalian

virulence-related factors exhibit potent pathogenicity in the Arabidopsis thaliana

plant model. Infect Immun 2005, 73, 464-475.

59. Coburn, B.; Li, Y.; Owen, D.; Vallance, B.A.; Finlay, B.B., Salmonella enterica

serovar Typhimurium pathogenicity island 2 is necessary for complete virulence

in a mouse model of infectious enterocolitis. Infect Immun 2005, 73, 3219-3227.

60. Schmitt, C.K.; Ikeda, J.S.; Darnell, S.C.; Watson, P.R.; Bispham, J.; Wallis, T.S.;

Weinstein, D.L.; Metcalf, E.S.; O'Brien, A.D., Absence of all components of the

flagellar export and synthesis machinery differentially alters virulence of

Salmonella enterica serovar Typhimurium in models of typhoid fever, survival in

macrophages, tissue culture invasiveness, and calf enterocolitis. Infect Immun

2001, 69, 5619-5625.

132

61. Stevanin, T.M.; Poole, R.K.; Demoncheaux, E.A.; Read, R.C., Flavohemoglobin

Hmp protects Salmonella enterica serovar Typhimurium from nitric oxide-related

killing by human macrophages. Infect Immun 2002, 70, 4399-4405.

62. Barak, J.D.; Kramer, L.C.; Hao, L.Y., Colonization of tomato plants by

Salmonella enterica is cultivar dependent, and type 1 trichomes are preferred

colonization sites. Appl Environ Microbiol 2011, 77, 498-504.

63. Barak, J.D.; Gorski, L.; Naraghi-Arani, P.; Charkowski, A.O., Salmonella

enterica virulence genes are required for bacterial attachment to plant tissue. Appl

Environ Microbiol 2005, 71, 5685-5691.

64. Schikora, A.; Carreri, A.; Charpentier, E.; Hirt, H., The dark side of the salad:

Salmonella typhimurium overcomes the innate immune response of Arabidopsis

thaliana and shows an endopathogenic lifestyle. PLoS One 2008, 3, e2279.

65. Cooley, M.B.; Miller, W.G.; Mandrell, R.E., Colonization of Arabidopsis

thaliana with Salmonella enterica and enterohemorrhagic Escherichia coli

O157:H7 and competition by Enterobacter asburiae. Appl Environ Microbiol

2003, 69, 4915-4926.

66. Dreyfus, G., Williams, A.W., Kawagishi, I., Macnab, R.M., Genetic and

biochemical analysis of Salmonella typhimurium flu, a flagellar protein related to

the catalytic subunit of the FoF1 ATPase and to virulence proteins of mammalian

and plant pathogens. J Bacteriol 1993, 175, 3131-3138.

67. Kurz, C.L.; Chauvet, S.; Andres, E.; Aurouze, M.; Vallet, I.; Michel, G.P.; Uh,

M.; Celli, J.; Filloux, A.; De Bentzmann, S., et al., Virulence factors of the human

133

opportunistic pathogen identified by in vivo screening.

EMBO J 2003, 22, 1451-1460.

68. Lyerly, D.; Gray, L.; Kreger, A., Characterization of rabbit corneal damage

produced by Serratia keratitis and by a Serratia protease. Infect Immun 1981, 33,

927-932.

69. Rascoe, J.; Berg, M.; Melcher, U.; Mitchell, F.L.; Bruton, B.D.; Pair, S.D.;

Fletcher, J., Identification, phylogenetic analysis, and biological characterization

of Serratia marcescens strains causing cucurbit yellow vine disease.

Phytopathology 2003, 93, 1233-1239.

70. Bruton, D.B.; Mitchell, F.; Fletcher, J.; Pair, S.D.; Wayadande, A.; Melcher, U.;

Brady, J.; Bextine, B.; Popham, T.W., Serratia marcescens, a phloem-colonizing,

squash bug-transmitted bacterium: casual agent of cucurbit yellow vine disease.

Plant Dis. 1998, 87, 512-520.

71. Bruton, D.B.; Brady, J.; Mitchell, F.; Bextine, B.; Wayadande, A.; Pair, S.D.;

Fletcher, J.; Melcher, U., Yellow vine of cucurbits: Pathogenicity of Serratia

marcescens and transmission by Anasa tristis.(Abstr.) Phytopathology 2001, 91

(suppl.), S11.

72. Labbate, M.; Queek, S.Y.; Koh, K.S.; Rice, S.A.; Givskov, M.; Kjelleberg, S.,

Quorum sensing-controlled biofilm development in Serratia liquefaciens MG1.

Journal of Bacteriology 2004, 186, 692-698.

73. Labbate, M.; Zhu, H.; Thung, L.; Bandara, R.; Larsen, M.R.; Willcox, M.D.P.;

Givskov, M.; Rice, S.A.; Kjelleberg, S., Quorum sensing regulation of adhesion

134

in Serratia marcescens MG1 is surface dependent. J. Bacteriol. 2007, 189, 2702-

2711.

74. Shanks, R.M.Q.; Stella, N.A.; Kalivoda, E.J.; Doe, M.R.; O'Dee, D.A.; Lathrop,

K.L.; Guo, F.L.; Nau, G.J., A Serratia marcescens OxyR homolog mediates

surface attachment and biofilm formation. Journal of Bacteriology 2007, 189,

7262-7272.

75. Wang, S.A.; Tokars, J.I.; Bianchine, P.J.; Carson, L.A.; Arduino, M.J.; Smith,

A.L.; Hansen, N.C.; Fitzgerald, E.A.; Epstein, J.S.; Jarvis, W.R., Enterobacter

cloacae bloodstream infections traced to contaminated human albumin. Clin

Infect Dis 2000, 30, 35-40.

76. Paton, A.W.; Paton, J.C., Enterobacter cloacae producing a Shiga-like toxin II-

related cytotoxin associated with a case of hemolytic-uremic syndrome. J Clin

Microbiol 1996, 34, 463-465.

77. de Kort, G.; Bolton, A.; Martin, G.; Stephen, J.; van de Klundert, J.A., Invasion of

rabbit ileal tissue by Enterobacter cloacae varies with the concentration of OmpX

in the outer membrane. Infect Immun 1994, 62, 4722-4726.

78. Nishijima, K.A., Wall, M.M., Siderhurst, M.S., Demonstrating pathogenicity of

Enterobacter cloacae on macadamia and identifying associated volatiles of gray

kernel of macadamia in Hawaii. Plant Disease 2007, 91, 1221-1228.

79. Takahashi, Y., Takahashi, K., Watanabe, K., Kawano, T., Bacterial black spot

caused by Burkholderia andropogonis on Odontoglossum and intergeneric hybrid

orchids. J. Gen. Plant Path. 2004, 70, 284-287.

135

80. Murray, B.E., Diversity among the multidrug-resistant enterococci. Emerging Infectious

Diseases 1998, 4, 46-65.

81. Ike, Y.; Hashimoto, H.; Clewell, D.B., Hemolysin of Streptococcus faecalis

subspecies zymogenes contributes to virulence in mice. Infect Immun 1984, 45,

528-530.

82. Mohamed, J.A.; Huang, D.B., Biofilm formation by enterococci. J Med Microbiol

2007, 56, 1581-1588.

83. Shankar, N.; Lockatell, C.V.; Baghdayan, A.S.; Drachenberg, C.; Gilmore, M.S.;

Johnson, D.E., Role of Enterococcus faecalis surface protein Esp in the

pathogenesis of ascending urinary tract infection. Infect Immun 2001, 69, 4366-

4372.

84. Park, S.Y.; Kim, K.M.; Lee, J.H.; Seo, S.J.; Lee, I.H., Extracellular gelatinase of

Enterococcus faecalis destroys a defense system in insect hemolymph and human

serum. Infect Immun 2007, 75, 1861-1869.

85. Pearce-Duvet, J.M., The origin of human pathogens: evaluating the role of

agriculture and domestic animals in the evolution of human disease. Biol Rev

Camb Philos Soc 2006, 81, 369-382.

86. Muller, H.E., Occurrence and pathogenic role of Morganella-Proteus-

Providencia group bacteria in human feces. J. Clin. Microbiol. 1986, 23, 404-405.

87. Beatson, S., Pharaoh's ants as pathogen vectors in hospitals. The Lancet 1972,

299, 425-427.

136

88. Fotedar, R.; Banerjee, U.; Singh, S.; Shriniwas; Verma, A.K., The housefly

(Musca domestica) as a carrier of pathogenic microorganisms in a hospital

environment. J Hosp Infect 1992, 20, 209-215.

89. Stavrinides, J., Origin and evolution of phytopathogenic bacteria. In Plant Pathogenic

Bacteria: Genomics and Molecular Biology, Jackson, R.W., Ed. Caister Academic Press:

2009.

90. Palleroni, N., The genus Pseudomonas. The Williams & Wilkins Co.: Baltimore,

USA, 1984; Vol. 1.

91. van Baarlen, P.; van Belkum, A.; Summerbell, R.C.; Crous, P.W.; Thomma,

B.P.H.J., Molecular mechanisms of pathogenicity: how do pathogenic

microorganisms develop cross-kingdom host jumps? Fems Microbiology Reviews

2007, 31, 239-277.

92. Medzhitov, R.; Janeway, C.A., Innate immunity: Impact on the adaptive immune

response. Curr. Opin. Immunol. 1997, 9, 4-9.

93. Rosenblueth, M.; Martinez-Romero, E., Bacterial endophytes and their

interactions with hosts. Mol Plant Microbe Interact 2006, 19, 827-837.

94. Andreote, F.D.; Rossetto, P.B.; Souza, L.C.; Marcon, J.; Maccheroni, W., Jr.;

Azevedo, J.L.; Araujo, W.L., Endophytic population of Pantoea agglomerans in

citrus plants and development of a cloning vector for endophytes. J Basic

Microbiol 2008, 48, 338-346.

95. Thaller, M.C.; Berlutti, F.; Schippa, S.; Lombardi, G.; Rossolini, G.M.,

Characterization and sequence of PhoC, the principal phosphate-irrepressible acid

phosphatase of Morganella morganii. Microbiology 1994, 140, 1341-1350.

137

96. Iniguez, A.L.; Dong, Y.; Triplett, E.W., Nitrogen fixation in wheat provided by

Klebsiella pneumoniae 342. Molecular Plant-Microbe Interactions 2004, 17,

1078-1085.

97. Coombs, J.T.; Franco, C.M., Visualization of an endophytic Streptomyces species

in wheat seed. Appl Environ Microbiol 2003, 69, 4260-4262.

98. Loria, R.; Bignell, D.; Moll, S.; Huguet-Tapia, J.; Joshi, M.; Johnson, E.; Seipke,

R.; Gibson, D., Thaxtomin biosynthesis: the path to plant pathogenicity in the

genus Streptomyces. Antonie van Leeuwenhoek 2008, 94, 3-10.

99. Rose, C.E., III; Brown, J.M.; Fisher, J.F., Brain abscess caused by Streptomyces

infection following penetration trauma: case report and results of susceptibility

analysis of 92 isolates of Streptomyces species submitted to the CDC from 2000

to 2004. J. Clin. Microbiol. 2008, 46, 821-823.

100. Lund, B.; Wyatt, G., The effect of oxygen and carbon dioxide concentrations on

bacterial soft rot of potatoes. I. King Edward potatoes inoculated with Erwinia

carotovora var. atroseptica. Potato Research 1972, 15, 174-179.

101. van Peer, R.; Punte, H.L.; de Weger, L.A.; Schippers, B., Characterization of root

surface and endorhizosphere Pseudomonads in relation to their colonization of

roots. Appl Environ Microbiol 1990, 56, 2462-2470.

102. Berg, G.; Eberl, L.; Hartmann, A., The rhizosphere as a reservoir for opportunistic

human pathogenic bacteria. Environmental Microbiology 2005, 7, 1673-1685.

103. Parke, J.L.; Gurian-Sherman, D., Diversity of the Burkholderia cepacia complex

and implications for risk assessment of biological control strains

Annual Review of Phytopathology 2001, 39, 225-258.

138

104. Wright, S.A.I.; Zumoff, C.H.; Schneider, L.; Beer, S.V., Pantoea agglomerans

strain EH318 produces two antibiotics that inhibit Erwinia amylovora in vitro.

Appl. Environ. Microbiol. 2001, 67, 284-292.

139

Table 1. Summary of cross-kingdom pathogens.

Pathogen Plant host/Niche Plant pathogenicity factors1 Human disease Human pathogenicity factors1 Macadamia, dragon fruit, Respiratory/skin/urinary infection, Enterobacter cloacae Unknown cytotoxin, ompX orchids, papaya septicaemia Urinary/abdominal/cutaneous Enterococcus faecalis Arabidopsis thaliana fsrB, sprE Hemolysin, salB, esp, fsrB, sprE infections, septicaemia Burkholderia ambifaria Soil, maize roots Dioxygenase Unknown Unknown Burkholderia cenocepacia Soil, maize roots, onion T4SS Septicaemia Unknown Burkholderia cepacia Soil, rice, maize, wheat, onion Unknown Septicaemia Unknown Burkholderia gladioli Onion, gladiolus, iris, rice Unknown Septicaemia Unknown Burkholderia glathei Soil Unknown Unknown Unknown Chronic granulomatous Burkholderia glumae Rice toxR, T3SS Unknown disease Soil Unknown Melioidosis/glander Capsular polysaccharide Burkholderia plantarii Rice, gladiolus, iris rpoS, quorum sensing Melioidosis Rhamnolipids Burkholderia pseudomallei Tomato T3SS Melioidosis/glander pqsA–pqsE operon Burkholderia pyrrocinia Soil Unknown Melioidosis/glander Unknown Pantoea agglomerans Crown/root gall pthG, T3SS Arthritis/septicaemia T3SS Pantoea ananatis Eucalyptus, maize, rice T3SS Septicaemia Unknown Pantoea citrea Pineapple T3SS Septicaemia Unknown Pantoea dispersa Seeds T3SS Septicaemia Unknown Pantoea punctata Japanese mandarin oranges T3SS Unknown Unknown Pantoea septica Unknown Unknown Septicemia Unknown Pantoea stewartii Maize T3SS Unknown Unknown Pantoea terrea Japanese mandarin oranges Unknown Unknown Unknown Salmonella enterica Tomato, Arabidopsis thaliana agfA, agfB, Fili Gastroenteritis/typhoid fever Flagellum (flhD), T3SS LPS, iron uptake, hemolysin, Serratia marcescens Squash, pumpkin Fimbrial genes, biofilm, oxyR Septicaemia, urinary tract infection protease 1 T3SS, type III secretion system; T4SS, type IV secretion system;

140

Figure 1. Cycling of cross-kingdom pathogens between plants and humans. Movement of human-associated bacteria (Salmonella, Serratia, Enterobacter, Enterococcus) into the general environment may occur through wastewater, with additional microbial input coming from agricultural and livestock runoff. Irrigation using contaminated water, along with vectoring by insects can lead to inoculation of plant surfaces or plant soils.

Movement of potential human pathogenic bacteria from plants is also likely facilitated by insects, which can disperse bacteria into the general environment. The route back to humans is less clear, although some pathogens like Burkholderia and Pantoea can cause direct infections following cutaneous lesions or injuries from thorns or splinters.

141

Figure 1

142