<<

, of protective bacterial from pathogenic ancestors

Patrick H. Degnana,1, Yeisoo Yub, Nicholas Sisnerosb, Rod A. Wingb, and Nancy A. Morana

aDepartment of and , bArizona Genomics Institute, University of Arizona, Tucson, AZ 85721

Edited by Edward F. DeLong, Massachusetts Institute of Technology, Cambridge, MA, and approved April 14, 2009 (received for review January 7, 2009) engage in a multitude of beneficial and deleterious also be transmitted horizontally either intraspecifically [e.g., interactions with . Hamiltonella defensa, an endosymbiont sexually (22)] or interspecifically (12, 17). Moreover, protection of and other sap-feeding , protects its host by H. defensa has been shown to be transferable between from attack by wasps. Thus H. defensa is only condi- distantly related aphid species (19). tionally beneficial to hosts, unlike ancient nutritional symbionts, Although H. defensa confers protection, it also exhibits many such as , that are obligate. Similar to pathogenic bacteria, attributes of enteric pathogens. Its lifestyle requires that it invade H. defensa is able to invade naive hosts and circumvent host novel hosts, and a preliminary survey of its genome content immune responses. We have sequenced the genome of H. defensa showed that it contains many pathogenicity factors related to to identify possible mechanisms that underlie its persistence in host invasion (18). APSE strains encode toxins, including cyto- healthy aphids and protection from . The 2.1-Mb ge- lethal distending toxin and Shiga-like toxin, intimating a role of nome has undergone significant reduction in size relative to its horizontal transfer (HGT) in modulating the protection closest free-living relatives, which include and conferred by H. defensa (18, 23). species (4.6–5.4 Mb). Auxotrophic for 8 of the 10 essential amino To shed light on the interactions of H. defensa, its hosts, acids, H. defensa is reliant upon the essential amino acids produced , and invading parasitoids, we have sequenced the by Buchnera. Despite these losses, the H. defensa genome retains more and pathways for a variety of structures and H. defensa genome from a strain previously shown to confer processes than do obligate symbionts, such as Buchnera. Further- protection to A. pisum (16). The H. defensa genome combines more, putative pathogenicity loci, encoding type-3 secretion sys- mechanisms known from both symbiotic and pathogenic bacte- MICROBIOLOGY tems, and toxin homologs, which are absent in obligate symbionts, rial species. are abundant in the H. defensa genome, as are regulatory genes Results and Discussion that likely control the timing of their expression. The genome is also littered with mobile DNA, including phage-derived genes, Both general and specific features of the H. defensa genome , and insertion-sequence elements, highlighting its dy- reflect its lifestyle as a host-restricted, mutualist symbiont that namic nature and the continued role plays invades host cells. The moderately reduced genome consists of in shaping it. a 2,110,331-bp circular and a 59,034-bp conjugative with average G ϩ C contents of 40.1% and 45.3%, pisum ͉ facultative endosymbiont ͉ mobile DNA ͉ respectively (Table 1, Fig. 1). The chromosome contains a bacteriophage APSE canonical (oriC) situated between mnmG (gidA) and mioC. Of the 2,100 predicted coding sequences nsects host a wide diversity of noncultivable bacteria, which (CDS), 1,665 (79%) have homologs present in GenBank. Most Ihave important ecological ranging from remaining unique hypothetical proteins (75%) are Ͻ100 aa to (1, 2). Genome sequencing of noncultivatable (AA), making their identity as true genes equivocal. In addition, parasitic bacteria has revealed possible mechanisms responsible 188 readily identifiable pseudogenes were present; this number for reproductive manipulations (3–5), whereas of ob- is similar to that in Escherichia coli genomes (24). ligate mutualists of , aphids, psyllids, tsetse flies, and sharp- Phylogenies based on single loci place H. defensa in the shooters have documented biosynthetic abilities important to , but are otherwise poorly resolved (12, 25). host nutrition (6–10). Heritable that protect In analyses of multigene alignments of conserved, single-copy, their hosts from parasites and pathogens are increasingly being core proteins, H. defensa and another aphid endosymbiont, recognized as common. Because they are occasionally trans- , consistently fell within a containing ferred horizontally, sometimes between distantly related species, Yersinia spp. and Serratia spp. (Fig. 2). Low bootstrap values near these symbionts provide a conduit for the transfer of highly these nodes are elevated by removing Hamiltonella and Regiella adaptive and stably inherited traits (resistance and defense) from analyses, suggesting that the long branches reduce confi- between host species. So far, no such defensive symbiont has dence. Regardless, the phylogeny suggests that Hamiltonella and been studied using genome sequencing. Regiella form a lineage distinct from the entomopathogenic Hamiltonella defensa, a gamma-proteobacterium, is a mater- nematode symbionts and Xenorhabdus, and from nally transmitted defensive endosymbiont found sporadically in the sequenced tsetse symbiont glossinidius. sap-feeding insects, including aphids, psyllids, and whiteflies (11–13). In aphids (), H. defensa can block larval development of the solitary endoparasitoid wasps Author contributions: P.H.D., Y.Y., R.A.W., and N.A.M. designed research; P.H.D. and N.S. Aphidius ervi and Aphidius eadyi, rescuing the aphid host (14– performed research; P.H.D. analyzed data; and P.H.D. and N.A.M. wrote the paper. 16). The reduction in aphid mortality is variable among H. The authors declare no conflict of interest. defensa strains and is correlated to the presence of a temperate, This article is a PNAS Direct Submission. lambda-like bacteriophage APSE, which infects H. defensa (17– Data deposition: The sequences reported in this paper have been deposited in the GenBank 20). H. defensa occurs sporadically in A. pisum and is beneficial database (accession nos. CP001277, CP001278). only when parasitoids are present (21). Consequently, infection 1To whom correspondence should be addressed. E-mail: [email protected]. frequencies increase under strong parasitoid pressure but de- This article contains supporting information online at www.pnas.org/cgi/content/full/ crease when parasitoids are absent. H. defensa and APSE can 0900194106/DCSupplemental.

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0900194106 PNAS Early Edition ͉ 1of6 Downloaded by guest on September 28, 2021 Table 1. Comparison of H. defensa genome features to those of relevant Enterobacteriaceae B. aphidicola APS H. defensa 5AT S. glossindius E. coli K12 Y. pestis CO92

Chromosome, bp 640,681 2,110,331 4,171,146 4,639,221 4,653,728 Extrachromosomal elements 2 1 3 – 3 Total G ϩ C (%) 26.2 40.1 54.7 50.8 47.6 Total predicted CDS 571 2,100 2,432 4,284 4,012 Coding density (%) 86.7 80.8 50.9 87.9 83.8 Average CDS size (bp) 984 812 873 950 998 Pseudogenes 13 188 972 150 149 rRNA operons 2 3 7 7 6 tRNAs 32 42 69 86 70 Lifestyle Obligate Facultative Facultative Commensal Pathogen

CDS, coding sequences.

Complementarity of Host and Symbiont . The metabo- lacking from the insect diet of phloem sap (26). Our data suggest lism of H. defensa inferred from the genome confirms that it is that both H. defensa and the host insect rely on Buchnera, the host-dependent. It is an aerobic heterotroph that shares its required endosymbionts that synthesize essential amino acids central metabolic machinery with that of most free-living enteric from this limited carbon and nitrogen source (9, 10). Except for bacteria (Fig. 3). Unlike most endosymbionts, Hamiltonella is the glutamate/aspartate transporter (gltP), the H. defensa ge- also capable of the fermentation of pyruvate to lactate (pykF, nome contains no trace of the missing biosynthetic or transporter ldhA) and acetyl-CoA to acetate (pta, ackA). Thus, H. defensa genes. This suggests that, unlike S. glossinidius, which very appears able to produce energy even under oxygen-limiting recently became host-restricted (27, 28), H. defensa has had a conditions. long-term association with insects (Table S1) consistent with Biosynthesis of essential amino acids and vitamins is a hall- previous evidence (12). mark of nutritional endosymbionts, exemplified by Buchnera. Based on its gene set, H. defensa synthesizes only 2 essential and Putative Virulence Mechanisms Involved in . H. defensa’s 7 nonessential amino acids, but can make most essential vitamins abilities to invade novel insect hosts, to persist in them, and to except thiamine (B1) and pantothenate (B5) (see Fig. 3). Unlike kill their endoparasites are likely dependent on the presence of Buchnera, which lacks most active transport mechanisms, H. numerous loci commonly involved in pathogenicity (18). Our defensa likely acquires missing building blocks via substrate- results give a complete inventory of these pathogenicity or specific transporters. symbiosis factors and indicate that some of these loci have been The essential amino acids that H. defensa requires are largely rearranged or disrupted. For example, H. defensa carries two

0 0 A 0 B 2,00 0 5 1 2 0 00

0 ,80 1

4 coordinates 0

0 CDS 2 4 0 0 G+C skew 30

0 hypothetical mobile genetic

60 C , n=629 elements 1 n=514

AP*SE

0

0 6

1 , 40

0 coordinates G+C skew core genome CDS n=1,145 00 8 virulence/ transposase cell structure regulation 1 ,20 rRNA 0 cell processes transport mobile genetic plasmid putative 0 elements 1,00 virulence info. transfer unknown

Fig. 1. Genomic characteristics of H. defensa str. 5AT. (A) H. defensa genome schematic; rings starting from outer to innermost: (i) coordinates in kb; (ii)Gϩ C skew of hexamers using a 1,000-bp window; (iii) Predicted CDS with E. coli hit (red), NR hit (blue), hypothetical (yellow), pseudogene (gray); (iv) ribosomal RNAs (black), and putative virulence loci (pink); (v) IS elements (light green), group II introns (dark green), phage (blue), or plasmid (purple) islands. Lines connect repeated phage (blue) or plasmid (purple) blocks that are on the same strand (light) or inverted (dark). Asterisk indicates the location of the APSE prophage and the dashed line in (ii) is the location of the incomplete genome juncture. (B) Schematic of plasmid pHD5AT: outer ring (i) coordinates in kb; (ii) predicted coding sequences (CDS) of plasmid origin (purple), hypothetical (yellow), pseudogene (gray), IS elements (light green), and group II introns (dark green); inner ring (iii)Gϩ C skew of hexamers. (C) Graph of primary functional roles for chromosomal CDS and pseudogenes (stippled).

2of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0900194106 Degnan et al. Downloaded by guest on September 28, 2021 Enterobacteriaceae */* Escherichia coli K12 defensa T3SS are complete, neither forms a single genomic */* Escherichia coli O157:H7 island. Putative secreted effector proteins are scattered through- */* Salmonella typhimurium */* Citrobacter koseri out the genome and were probably acquired by multiple HGT */* Klebsiella pneumoniae events (Table S2). */* Enterobacter sp. 638 */* Enterobacter sakazakii The most abundant putative virulence factors are RTX (re- */* Erwinia tasmaniensis peats in toxin) toxins: a protein family that includes a variety of Pectobacterium carotovora */* Sodalis glossinidius – I exported proteins including ␣-hemolysin and leukotoxin (31). */* */* Regiella insecticola – I Hamiltonella defensa – I These proteins have highly variable lengths (800–6,000 AA) and 65/48 44/14 Yersinia pseudotuberculosis contain a tandemly repeated nonapeptide sequence that is */* Yersinia pestis – I */* */* Yersinia enterocolitica involved in binding calcium. The toxin genes (rtxA) tend to occur Serratia marcescens in operons containing an activating acyltransferase (rtxC) and an */* Serratia proteamaculans Proteus mirabilis ABC transporter (rtxBD). H. defensa contains 32 CDS with */* I */* Photorhabdus luminescens – similarity to rtxA, 2 copies of rtxB, and only a single copy of rtxD. */* Xenorhabdus nematophila – I These sequences are significantly diverged from known RTX */* */* Haemophilus influenzae Pasteurellaceae */* Pasteurella multocida toxins (20–40% AA identity), and several are possibly paralogs Haemophilus ducreyi (60–92% AA identity). The rtxA copies include both intact (n ϭ */* Vibrio cholerae Vibrionaceae Vibrio fischeri 10) and fragmented (n ϭ 22) CDS. Together, these data suggest */* Pseudomonas aeruginosa Pseudomonadaceae Pseudomonas entomophila – I past duplication and diversification of these toxin genes, fol- Xylella fastidiosa */* Xanthomonadaceae lowed by mutation and inactivation of some copies. */* Xanthomonas axonopodis Xanthamonas campestris 0.1 amino acid substitutions per site Response of H. defensa to Changing Environments. Despite the constrained biosynthetic capabilities of H. defensa, it has con- Fig. 2. Phylogenetic reconstruction of H. defensa and related Enterobacter- siderably more cell structural, DNA replication, recombination, iaceae using 88 single-copy orthologous proteins. Bacteria engaged in asso- ciations with insects are indicated (I). Support values are reported from 100 and repair genes than do obligate endosymbionts (2). H. defensa bootstrap replicates from RaxML, and PhyML analyses values greater than 80 also retains more regulatory genes, including global regulators are indicated by asterisks. (e.g., 4 sigma factors), specific regulators of biosynthetic path- ways (e.g., for production of biotin, cysteine, fatty acids), 4 pairs of putative 2-component regulators, and 3 genes involved in MICROBIOLOGY type-3 secretion systems (T3SS), which are similar in gene quorum sensing. content and order to T3SS in Salmonella typhimurium LT2 Pathogenic bacteria typically express virulence factors under (SPI-1, SPI-2) (18). These protein translocation systems are strict regulatory controls. In H. defensa, putative regulatory normally used by pathogens to invade host cells and evade host genes flank both T3SS, one of which is homologous to hilA, the immune responses (29) and are required for the maintenance of key regulator for SPI-1 (32). We have also identified homologs the Sodalis- symbiosis (28, 30). Although both H. of Hha and SlyA, which activate the expression of hemolysins

Tol- glucose, fructose, glucosamine, mannose Pal PTS SPI-1 SPI-2 permease ABC ABC

GSP PEP pyruvate 3+

SEC peptides PO FAD ribulose-5-P glucose–6-P 4 ADP Fe2+ Mg2+ chelated ATP AICAR PRPP ribose-5-P fructose–6–P ABC Fe2+ ABC Zn2+ IMP UMP glyceraldehyde– pyridoxal 5’-P 3–P ATP H+ purines pyrimidines chorismate PEP UPP ADP 2H O 4H+ 2 pyruvate OPP

O2 ubiquinone THF lactate THR LYS acetyl-CoA CoA acetate Na+ NADH L-aspartate semialdehyde pantothenate SSS malate + 4H thiamine TCA cycle ABC NAD+ ASP ASN glycerate-3-P SAM MET ABC

dehydrogenase I NADH dehydrogenase NADP GLU GLY SER ARG ABC

ABC VAL, ILE, Na+ hemin glutathione CYS ALA LIVCS LEU H+ SER PRO TYR GLN AA* PHE, TRP, APC Fe-S cluster heme o H+ H+ Na+ H+ pimeloyl-CoA biotin TYR assembly H+ LYS APC SDAC APC SSS ArAAP ABC ABC

Fig. 3. Metabolic reconstruction of H. defensa indicates that it can complete glycolysis, the tricarboxylic acid (TCA) cycle, and the pentose phosphate pathway, in addition to producing both prymidines and purines. Essential (red) and nonessential (green) amino acids are either synthesized de novo or imported by a substrate-specific transporter. Most vitamins and cofactors (blue) are synthesized, although pantothenate and thiamin must be imported. Circles indicate genes in a particular pathway that are present (filled) or absent (open). *Putative ‘‘polar’’ amino acid transporter may transport histidine or threonine.

Degnan et al. PNAS Early Edition ͉ 3of6 Downloaded by guest on September 28, 2021 (33, 34), and 2-component regulators and quorum-sensing genes A B no. are also known to influence expression of virulence factors. The APSE (4/1) glycolysis protein peptides emPAI diversity of regulatory genes suggests a mechanism by which H. other (5/0) GroEL 603 43.5 (18/2) defensa copes with changing environments, such as invasion of a OmpA 167 14.3 membrane OmpN 212 8.8 new host species or attack of hosts by parasitoids. components (12/4) TufB 90 2.8 Tpx 12 2.1 Repetitious Genomics. The genome of H. defensa is riddled with OmpX 29 2.0 mobile DNA. Insertion sequences (IS), group II introns, inte- (8/0) GroES 16 1.9 grated prophage, and plasmids comprise 21% of the genome DnaK 143 1.9 (444,936 bp) (see Fig. 1). Estimates of genetic diversity for the DegP 62 1.9 stress & Ftn 9 1.8 most prevalent, intact IS elements are very low (␲ ϭ 0.000– chaperones APSE capsid 61 1.7 0.040), suggesting recent transpositional activity or gene con- (10/4) (20/1) HDEF_0002 14 1.2 version (see Table S3). The single active group II intron also appears to have undergone recent retrotransposition (see Fig. 1, Fig. 4. Functional distribution of H. defensa proteins recovered from MudPit and Table S3). The lack of site specificity has resulted in analysis. (A) The 89 identified proteins are divided by principle functional roles. retrotransposition within and between genes, as well as into Numbers in parentheses indicate the number of expressed (open, Ͻ1.0) and previously retrotransposed group II introns. PCR screens of H. highly expressed (stippled, Ͼ1.0) proteins in each category based on exponen- defensa strains from different hosts showed that ISHde1, tially modified protein abundance index (emPAI) values. (B) Table of the 12 highly expressed proteins, the number of peptides recovered for each protein, and ISHde2, and ISHde3 were widespread, whereas ISHde4 and the emPAI values. Colors correspond to the assigned functional roles in (A). group II intron were in fewer than half of tested strains (see Table S3). Proliferation of repeats is expected in intracellular bacteria, as they tend to have small effective population sizes Other recovered H. defensa proteins include ones involved in (Ne) because of recurrent transmission bottlenecks, increasing core processes (e.g., transcription, translation) and conserved or the level of genetic drift (35). hypothetical proteins encoded in the genome but having un- Genome evolution and virulence in H. defensa, as in many known functions. free-living bacteria, has been influenced by interactions with bacteriophage (23). Apart from the APSE prophage, H. defensa Conclusions contains 22 phage-like gene blocks (153,384 bp), several of which The reduced size and compositional bias in the genome of H. have undergone partial duplication (see Fig. 1 and Table S4). defensa reflects a long-term, stable association with its insect The prophage islands were readily identified because of both hosts. In this respect, the H. defensa genome is similar to gene content (e.g., phage integrases) and elevated G ϩ C% genomes, which are small, have highly reduced bio- (mean 46.5%). Except for APSE, the prophage appear to be synthetic capabilities, and encode an abundance of mobile inactive, as all of the islands are fragmentary and most contain genetic elements (3, 5). Whereas Wolbachia is known mostly as inactivated or truncated genes. Mobile elements were probably a reproductive parasite and antagonist of its hosts, H. defensa involved in the inactivation, rearrangement, and duplication of protects hosts from parasites. Genes for toxins, effector proteins, the gene blocks, most of which (16 of 22) are flanked on one or and 2 T3SS are likely to be critical elements underlying this more sides by either an IS element or group II intron. mutualistic role. The presence of numerous homologs of known H. defensa bears a conjugative IncFII plasmid pHD5AT. The virulence factors, which have homologs in other insect symbionts type IV secretion system (T4SS) and pilus it encodes are similar and in mammalian and pathogens, reiterates how con- to the tra and pil loci from the Serratia entomophila plasmid served genetic mechanisms involved in bacterial-eukaryotic pADAP. These loci underlie the mobilization and dissemination cellular interactions can result in vastly different outcomes. of pADAP, which carries genes responsible for the cessation of Some of the virulence-gene homologs (e.g., rtxA) are not intact, insect feeding (36). In contrast to pADAP, the pHD5AT plasmid suggesting a changing role for the toxins in this symbiosis. These has no genes implicated in virulence or resistance. shifting gene sets likely reflect the inherent dynamism of antag- Integrated plasmid genes represent 9% of the H. defensa genome onistic interactions, which impose ongoing selection for counter- (197,022 bp) (see Fig. 1 and Table S4). They share features with the in parasites, hosts, and symbionts. Gene losses and ϩ prophage blocks, including elevated G C content (44.2%), a large inactivations in H. defensa are tempered by gene gains via HGT, fraction of pseudogenes or truncated proteins, and flanking IS evidenced by the abundance of plasmid and phage islands. elements or group II introns. Two of the islands are the result of Although the variable toxins encoded by the phage APSE appear chromosomal integration and decay of pHD5AT, as indicated by to contribute to parasitoid protection, the H. defensa genome missing or inactivated genes (Fig. S1 and SI Methods). Two other reveals a history of association with other phage and plasmids plasmid islands are inactivated T4SS, yet are phylogenetically that likely played an earlier role in resorting ecologically impor- distinct from the tra on pHD5AT (see Table S4). The tant genes among H. defensa strains and possibly other bacteria. remaining islands contain a variety of plasmid-associated genes, but precise assignation of fragments to plasmids or integration events Methods are difficult because of recombination. DNA Isolation and Construction of Libraries. We used 2 complementary se- quencing strategies to complete the H. defensa genome: (i) subcloning and H. defensa Proteome. To explore the expression of H. defensa Sanger sequencing a large insert BAC library and (ii) pyrosequencing (Fig. S2). genes and proteins, we performed a proteomics experiment on Intact H. defensa cells were purified from whole insects to minimize contam- a sample of purified H. defensa cells, using the genome sequence ination with aphid and Buchnera DNA, as described previously (18). A BAC for peptide and protein identification. Implementing conserva- library was constructed, fingerprinted, and minimal tiling paths were chosen tive identity cutoffs, we identified 89 expressed proteins (Fig. 4 (as in ref. 38). Individual BACs were then subcloned, sequenced bi- and Table S5). Several phage APSE proteins and one T3SS directionally with ABI3730xl sequencers, and assembled using Phred, Phrap, and Consed (39–41). Overlapping and validated BACs were then merged. protein (SseC) were recovered. Among the most highly ex- Bacterial genomes contain nonclonable fragments, so we performed py- pressed proteins were those involved bacterial responses to stress rosequencing as an unbiased sequencing method. High molecular weight and membrane components. Indeed, the most abundant protein, DNA was isolated directly from the purified H. defensa cells using the Pure- GroEL (Hsp60, MopA) or , is also the most abundant gene Tissue Core Kit B (Qiagen). We generated a standard and paired-end protein in other obligate and facultative endosymbionts (37). single-stranded template DNA (sstDNA) library using the GS DNA Library

4of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0900194106 Degnan et al. Downloaded by guest on September 28, 2021 Preparation Kits (Roche Applied Sciences) that were then amplified by emPCR 29 other genomes (Table S6) (44). Protein sequences of each ortholog were and sequenced on a GS-FLX (454 Sciences). The 454 reads were assembled aligned in Muscle v3.6 (45), and all invariant and gap-containing columns were with Newbler (v1.1.03.24) using default parameters. removed. Individual protein alignments were then concatenated into 4 align- ments (Table S7). Alignments without H. defensa and R. insecticola sequences Final Assembly and Genome Closure. Putative H. defensa contigs generated were also generated to assess impact of of long-branch attraction or other from the 454 reads and distinct from the finished BACs were sorted and artifacts. Each dataset was analyzed with RaXML and PhyML (46, 47), and oriented using linking information from the paired ends. PCR primers were unique topologies were compared using the SH-test in TREE-PUZZLE 5.2 (48). designed at the contig ends, and products were amplified and sequenced The topology with the lowest log likelihood and that disagreed with the using standard protocols described elsewhere (23). The 454 reads for each fewest datasets is presented. Support values were estimated from 100 non- scaffold were then reassembled with Newbler, and Sanger reads were incor- parametric bootstrap replicates. porated in Consed using Phrap. Protein Expression. Briefly, H. defensa cells were isolated as above and imme- Genome Annotation. Genes were predicted for the finished H. defensa ge- diately frozen at –80 °C. The cell pellet was thawed, homogenized, and nome using Glimmer v3.02 (protein-coding genes), tRNAscan-SE (tRNAs) and centrifuged, and proteins were precipitated. The resulting pellet was dis- BlastN (structural and ribosomal RNAs). Putative CDS greater than 30 AA were solved and run on a 10% SDS/PAGE gel, and the lane was divided into sections annotated using consensus of BlastP similarity searches to NR, all microbial and subjected to alkylation and in-gel tryptic . The tryptic peptides genomes, and E. coli str. K12 and protein domain searches using Hmmr and the were extracted from each gel section, concentrated, and injected into an Pfam࿝ls database (42). CDS without hits having expectation values less than LC-MS/MS system. Resultant tandem mass spectra were processed and ana- 10Ϫ10 (BlastP) and 10Ϫ4 (Pfam) were annotated as hypothetical, and CDS with lyzed with Mascot 2.2 (Matrix Science), using a database of H. defensa, B. conflicting results were assigned as putative. Predicted start codons were aphidicola, and A. pisum protein sequences. The results were filtered using a adjusted manually using alignments to the top 5 NR hits and the E. coli best hit Mascot significance threshold of 0.05 and Mowse ion score cutoff of Ͼ31, and if present. Intergenic regions were rescreened with BlastX for possible CDS the false-discovery rate for H. defensa peptides was 0.2%. missed by Glimmer. CDS with truncations Ͼ40% length or fragmented CDS were designated pseudogenes in the final annotation. Boundaries of multi- ACKNOWLEDGMENTS. The authors thank K. Hammond, B. Nankivell, K. copy repeats (e.g., insertion sequences, group II introns) were identified by Sunitsch and J. Currie, T.R. Mueller, K. Collura, R. He, and J.L. Goicoechea of the consensus alignments. Gene functions were inferred from those of identified Arizona Genomics Institute. We also thank J. Ewbank and H. Goodrich-Blair for homologs, and the integration of genes into metabolic pathways was deter- access to unpublished genome data and Q. Lin at the University of Albany mined using EcoCyc (43). Proteomics Facility for running the protein sample. This research was supported by National Science Foundation Grant 0313737 (to N.A.M.). P.H.D. received funding from National Science Foundation Integrative Graduate Education and Whole Genome Phylogeny. Multigene phylogenetic reconstruction was used to Research Traineeship Fellowship in Evolutionary and Functional Genomics, the determine the relationship of H. defensa with other gamma-. Center for Insect Science at the University of Arizona, and National Science MICROBIOLOGY Briefly, we identified 88 of 203 single copy orthologs (SICO) in H. defensa and Foundation Doctoral Dissertation Improvement Grant Award 0709992.

1. Buchner P (1965) Endosymbiosis of with Plant . (John Wiley 20. van der Wilk F, Dullemans AM, Verbeek M, van den Heuvel JF (1999) Isolation and and Sons, New York). characterization of APSE-1, a bacteriophage infecting the secondary endosymbiont of 2. Moran NA, McCutcheon JP, Nakabachi A (2008) Genomics and evolution of heritable Acyrthosiphon pisum. Virology 262:104–113. bacterial symbionts. Annu Rev Genet 42:165–190. 21. Oliver KM, Campos J, Moran NA, Hunter MS (2008) Population dynamics of defensive 3. Klasson L, et al. (2008) Genome evolution of Wolbachia strain wPip from the Culex symbionts in aphids. Proc Roy Soc 275:293–299. pipiens group. Mol Biol Evol 25:1877–1887. 22. Moran NA, Dunbar HE (2006) Sexual acquisition of beneficial symbionts in aphids. Proc 4. Sinkins SP, et al. (2005) Wolbachia variability and host effects on crossing type in Culex Natl Acad Sci USA 103:12803–12806. mosquitoes. Nature 436:257–260. 23. Degnan PH, Moran NA (2008) Diverse-phage encoded toxins in a protective insect 5. Wu M, et al. (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis endosymbiont. Appl Environ Microbiol 74:6782–6791. 24. Lerat E, Ochman H (2004) ␺-␾: Exploring the outer limits of bacterial pseudogenes. wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol Genome Res 14:2273–2278. 2:e69. 25. Moran NA, Russell JA, Koga R, Fukatsu T (2005) Evolutionary relationships of three new 6. Akman L, et al. (2002) Genome sequence of the endocellular obligate symbiont of species of Enterobacteriaceae living as symbionts of aphids and other insects. Appl tsetse flies, Wigglesworthia glossinidia. Nat Genet 32:402–407. Environ Microbiol 71:3302–3310. 7. Degnan PH, Lazarus AB, Wernegreen JJ (2005) Genome sequence of 26. Sandstro¨m JP, Pettersson J (1994) Amino acid composition of phloem sap and the pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of relation to intraspecific variation in pea aphid (Acyrthosiphon pisum) performance. insects. Genome Res 15:1023–1033. J Insect Physiol 40:947–955. 8. McCutcheon JP, Moran NA (2007) Parallel genomic evolution and metabolic interde- 27. Darby AC, et al. (2005) Extrachromosomal DNA of the symbiont Sodalis glossinidius. J pendence in an ancient symbiosis. Proc Natl Acad Sci USA 104:19392–19397. Bacteriol 187:5003–5007. 9. Nakabachi A, et al. (2006) The 160-kilobase genome of the bacterial endosymbiont 28. Toh H, et al. (2006) Massive genome erosion and functional adaptations provide Carsonella. Science 314:267. insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome 10. van Ham RC, et al. (2003) Reductive genome evolution in Buchnera aphidicola. Proc Res 16:149–156. Natl Acad Sci USA 100:581–586. 29. Hueck CJ (1998) Type III protein secretion systems in bacterial pathogens of animals and 11. Clark MA, et al. (1992) The eubacterial endosymbionts of whiteflies (Homoptera: . Microbiol Mol Biol Rev 62:379–433. Aleyrodoidea) constitute a lineage distinct from the endosymbionts of aphids and 30. Dale C, Young SA, Haydon DT, Welburn SC (2001) The insect endosymbiont Sodalis mealybugs. Curr Microbiol 25:119–123. glossinidius utilizes a type III secretion system for cell invasion. Proc Natl Acad Sci USA 12. Russell JA, Latorre A, Sabater-Mun˜oz B, Moya A, Moran NA (2003) Side-stepping 98:1883–1888. secondary symbionts: widespread horizontal transfer across and beyond the 31. Lally ET, Hill B, Kieba IR, Korostoff J (1999) The interaction between RTX toxins and Aphidoidea. Mol Ecol 12:1061–1075. target cells. Trends Microbiol 7:356–361. 13. Sandstro¨m JP, Russell JA, White JP, Moran NA (2001) Independent origins and hori- 32. Baja V, Hwang C, Lee CA (1995) hilA is a novel ompR/toxR family member that zontal transfer of bacterial symbionts of aphids. Mol Ecol 10:217–228. activates the expression of Salmonella typhimurium invasion genes. Mol Microbiol 14. Bensadia F, Boudreault S, Guay J-F, Michaud D, Cloutier C (2005) Aphid clonal resistance 18:715–727. 33. Nieto JM, et al. (1997) Construction of a double hha hns mutant of Escherichia coli: to a parasitoid fails under heat stress. J Insect Phys 52:146–157. effect on DNA supercoiling and ␣-haemolysin production. FEMS Microbiol Lett 15. Ferrari J, Darby AC, Daniell TJ, Godfray HCJ, Douglas AE (2004) Linking the bacterial 155:39–44. in pea aphids with host-plant use and natural enemy resistance. Ecol 34. Wyborn NR, et al. (2004) Regulation of Escherichia coli hemolysin E expression by H-NS Entomol 29:60–65. and Salmonella SlyA. J Bacteriol 186:1620–1628. 16. Oliver KM, Russell JA, Moran NA, Hunter MS (2003) Facultative bacterial symbionts in 35. Moran NA, Plague GR (2004) Genomic changes following host restriction in bacteria. aphids confer resistance to parasitic wasps. Proc Natl Acad Sci USA 100:1803–1807. Curr Opin Genet Dev 14:627–633. 17. Degnan PH, Moran NA (2008) Evolutionary of a defensive facultative symbi- 36. Hurst MRH, Glare TR, Jackson TA (2004) Cloning Serratia entomophila antifeeding ont of insects: exchange of toxin-encoding bacteriophage. Mol Ecol 17:916–929. genes–a putative defective prophage active against the grass grub Costelytra zeal- 18. Moran NA, Degnan PH, Santos SR, Dunbar HE, Ochman H (2005) The players in a andica. J Bacteriol 186:5116–5128. mutualistic symbiosis: insects, bacteria, , and virulence genes. Proc Natl Acad Sci 37. Fares MA, Moya A, Barrio E (2004) GroEL and the maintenance of bacterial endosym- USA 102:16919–16926. biosis. Trends Genet 20:413–416. 19. Oliver KM, Moran NA, Hunter MS (2005) Variation in resistance to parasitism in aphids 38. Peterson DG, Tomkins JP, Frisch DA, Wing RA, Paterson AH (2000) Construction of plant is due to symbionts not host . Proc Natl Acad Sci USA 102:12795–12800. bacterial artificial chromosome (BAC) libraries: An illustrated guide. J Agr Genomics 5.

Degnan et al. PNAS Early Edition ͉ 5of6 Downloaded by guest on September 28, 2021 39. Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. 44. Lerat E, Daubin V, Moran NA (2003) From gene trees to organismal phylogeny in Error probabilities. Genome Res 8:186–194. : the case of the ␥-Proteobacteria. PLoS Biol 1:e19. 40. Ewing B, Hillier L, Wendel MC, Green P (1998) Base-calling of automated sequencer 45. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high traces using phred. I. Accuracy assessment. Genome Res 8:175–185. throughput. Nucleic Acids Res 32:1792–1797. 41. Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. 46. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large Genome Res 8:195–202. phylogenies by maximum likelihood. Syst Biol 52:696–704. 42. Bateman A, et al. (2004) The Pfam protein families database. Nucleic Acids Res 47. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic anal- 32:D138–D141. yses with thousands of taxa and mixed models. 22:2688–2690. 43. Karp PD, et al. (2007) Multidimensional annotation of the Escherichia coli K-12 48. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: maximum likelihood genome. Nucleic Acids Res 22:7577–7590. phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

6of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0900194106 Degnan et al. Downloaded by guest on September 28, 2021