Genome Sequence of the Enterobacterial Phytopathogen Erwinia Carotovora Subsp
Total Page:16
File Type:pdf, Size:1020Kb
Genome sequence of the enterobacterial phytopathogen Erwinia carotovora subsp. atroseptica and characterization of virulence factors K. S. Bell*†, M. Sebaihia*, L. Pritchard†, M. T. G. Holden*, L. J. Hyman†, M. C. Holeva†, N. R. Thomson*, S. D. Bentley*, L. J. C. Churcher*, K. Mungall*, R. Atkin*, N. Bason*, K. Brooks*, T. Chillingworth*, K. Clark*, J. Doggett*, A. Fraser*, Z. Hance*, H. Hauser*, K. Jagels*, S. Moule*, H. Norbertczak*, D. Ormond*, C. Price*, M. A. Quail*, M. Sanders*, D. Walker*, S. Whitehead*, G. P. C. Salmond‡, P. R. J. Birch†, J. Parkhill*§, and I. K. Toth†§ *The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom; †Plant–Pathogen Interactions Programme, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, United Kingdom; and ‡Department of Biochemistry, Tennis Court Road, Cambridge University, Cambridge CB2 1QW, United Kingdom Edited by Robert Haselkorn, University of Chicago, Chicago, IL, and approved June 3, 2004 (received for review April 6, 2004) The bacterial family Enterobacteriaceae is notable for its well Genome sequencing of the phytopathogens Pseudomonas studied human pathogens, including Salmonella, Yersinia, Shi- syringae pv. tomato (13), Ralstonia solanacearum (14), Xan- gella, and Escherichia spp. However, it also contains several plant thomonas campestris pv. campestris and Xanthomonas axonopo- pathogens. We report the genome sequence of a plant pathogenic dis pv. citri (15), Xylella fastidiosa (16, 17), and Agrobacterium enterobacterium, Erwinia carotovora subsp. atroseptica (Eca) strain tumefaciens (18, 19), has yielded a wealth of information on novel SCRI1043, the causative agent of soft rot and blackleg potato and shared candidate phytopathogenicity determinants. Here, diseases. Approximately 33% of Eca genes are not shared with we describe the complete genome sequence of Eca strain sequenced enterobacterial human pathogens, including some pre- SCRI1043 (Eca1043) and assess its similarity to other members dicted to facilitate unexpected metabolic traits, such as nitrogen of the Enterobacteriaceae and to other plant pathogenic bacte- fixation and opine catabolism. This proportion of genes also ria. We describe a wide array of genes previously unknown in Eca contains an overrepresentation of pathogenicity determinants, and, in some cases, in the Enterobacteriaceae, which are poten- including possible horizontally acquired gene clusters for putative tially involved in pathogenicity and metabolism. In a functional type IV secretion and polyketide phytotoxin synthesis. To investi- genomic investigation, we provide evidence that genes in two gate whether these gene clusters play a role in the disease process, identified clusters have a role in disease development. an arrayed set of insertional mutants was generated, and muta- Materials and Methods tions were identified. Plant bioassays showed that these mutants were significantly reduced in virulence, demonstrating both the Genome Sequencing and Annotation. The strain Eca SCRI1043 presence of novel pathogenicity determinants in Eca, and the (American Type Culture Collection catalog no. BAA-672) was impact of functional genomics in expanding our understanding of isolated from a potato stem with blackleg disease symptoms in phytopathogenicity in the Enterobacteriaceae. 1985 in Perthshire, Scotland (the sample used to provide DNA for sequencing was frozen shortly after isolation). This strain has been used in epidemiological and molecular studies for many he bacterial family Enterobacteriaceae includes many well years and is amenable to genetic manipulation (20, 21). The Tstudied human pathogens within genera such as Salmonella, initial shotgun was generated from 54,600 paired-end sequences Yersinia, and Shigella, as well as the model commensal and from two libraries in pUC18 with insert sizes of 2.2–2.5 and pathogenic species Escherichia coli, and the insect pathogen 2.5–4.0 kb, and 25,800 paired-end reads from a 4.5- to 5.5-kb Photorhabdus luminescens. More genome sequences have been insert library in pMAQ1b by using dye terminator chemistry on determined from this family than from any other (1–6). How- ABI3700 automated sequencers, giving 8.2-fold coverage. ever, the Enterobacteriaceae also include several significant Paired-end sequences from 500 clones from a 12- to 23-kb library plant pathogens, and here we present the complete genome in pBACe3.6 were used to generate a scaffold with 1.8-fold clone sequence of such a pathogen, Erwinia carotovora subsp. atrosep- coverage. All identified repeats were bridged by using read pairs tica (Eca). or end-sequenced PCR products. Eca, also named Pectobacterium carotovorum subsp. atrosep- The initial shotgun sequences were assembled by using PHRAP ticum (7), and recently, Pectobacterium atrosepticum (8), is a (www.phrap.org), and gap closure and finishing used GAP4 MICROBIOLOGY commercially important pathogen restricted to potato in tem- (http:͞͞staden.sourceforge.net) in combination with directed perate regions, where it causes blackleg in the field and soft rot sequencing (primer walks and subcloning and shotgun sequenc- of tubers after harvest. In contrast, the related soft rot bacteria ing of bacterial artificial chromosome clones or PCR products Erwinia carotovora subsp. carotovora (Ecc) and Erwinia chrysan- that spanned sequence gaps). The final sequence was composed themi (Ech) have broader host ranges, including potato, and of 106,500 reads, giving an average 10.2-fold coverage. The cause disease in warmer climates (9). The soft rot erwiniae also live as epiphytes and endophytes on plants or saprophytes in soil and ground water (9, 10), although little is known about these This paper was submitted directly (Track II) to the PNAS office. nonpathogenic lifestyles. Moreover, the molecular bases of Abbreviations: Eca, Erwinia carotovora subsp. atroseptica; Ecc, Erwinia carotovora subsp. differences in host range and the requirements for establishing carotovora; Ech, Erwinia chrysanthemi; CDS, coding sequence; HAI, horizontally acquired genomic island; RBH, reciprocal best hit; PCWDE, plant cell wall-degrading enzyme; T3SS, disease are largely unknown. Although soft rot pathogenesis type III secretion system; T4SS, type IV secretion system; LSD, least significant difference. relies primarily on the prolific production of extracellular plant Data deposition: The sequence reported in this paper has been deposited in the EMBL͞ cell wall-degrading enzymes (PCWDEs) that cause extensive GenBank database (accession no. BX950851). tissue maceration (9, 11), discoveries in recent years suggest the §To whom correspondence may be addressed: E-mail: [email protected] or itoth@ process may be far more subtle and complex than previously scri.sari.ac.uk. thought (12). © 2004 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0402424101 PNAS ͉ July 27, 2004 ͉ vol. 101 ͉ no. 30 ͉ 11105–11110 Downloaded by guest on September 29, 2021 Fig. 2. Ratio of observed to expected CDSs by functional class and bacterial group distribution. The log2(Cfunc͞Call) ratio of CDSs from Eca strain SCRI1043 shared with no other species (i, dark blue), nonenteric only (ii, cyan) enteric only (iii, yellow), or enteric and nonenteric species (iv, red), by functional class. Fig. 1. Comparison of the Eca genome sequence with other bacterial ͞ genomes. Inner to outer tracks show the locations of RBHs found by reciprocal value log2(Cfunc Call) was plotted to illustrate over- or under- FASTA of Eca CDSs against those from 32 bacterial genomes (circular plot): representation of each group (Fig. 2). Gram-positive (gray); Shewanella oneidensis (ochre); nonenteric animal pathogens (green); plant-associated bacteria (brown); nonenteric plant Transposon Mutation Grid. Donor (E. coli S17–1[pir] carrying the pathogens (red); and enterobacteria (blue) (Table 2). The locations of CDSs on Tn5 cassette [mTn5-gusA-pgfp21] in the pUT vector; ref. 28) and the Eca genome colored by functional class (see legend to Fig. 5). Two tracks indicating HAIs listed in Table 1. Shown are islands with evidence of recent recipient (Eca1043) strains were mated and transconjugants Kan acquisition (red bars) and possible islands based on reciprocal FASTA analysis recovered on M9 . A total of 9,216 transconjugants were (green bars). A plot of GϩC skew (red) and percent GϩC content (blue). grown in 24 384-well microtiter plates (M9Kan broth plus 4% glycerol), constituting the mutation grid. Clones from each of the 24 384-well plates were pooled and DNA extracted for PCR finished sequence was compared with DNA sequences in public screening by using primers designed to anneal to the gene of databases by using FASTA, BLASTX, and BLASTN (22). Potential interest and the end of the transposon cassette. Having identified coding sequences (CDSs) were predicted by a combination of a plate pool containing a mutant, PCR from each clone within ORPHEUS (23) and GLIMMER (24) and manual curation, and were that plate was conducted by using a 384-well PCR machine. searched for PFAM (25) domains, and compared with the protein Primers used to identify mutants were: VIRB4F 5Ј- databases by using BLASTP. ARTEMIS (26) was used to collate data CATTACCTCAGTTATGGCTATTACAGGGC-3Ј (virB4); and facilitate annotation. Stable noncoding RNAs were identi- PKSF3 5Ј-CGGATCGGTGGTTCCGTC-3Ј (cfa6); PKSR1 5Ј- fied by comparison with the RFAM database (27). Metabolic GGAGCACTATGCTTCCTGCAC-3Ј (cfa7); and Tn5R 5Ј- pathways