The mosaic genome structure of the Wolbachia wRi strain infecting Drosophila simulans

Lisa Klassona,1, Joakim Westberga,1, Panagiotis Sapountzisb, Kristina Näslunda, Ylva Lutnaesa, Alistair C. Darbya,2, Zoe Venetic, Lanming Chend,3, Henk R. Braige, Roger Garrettd, Kostas Bourtzisb,c, and Siv G. E. Anderssona,4

aDepartment of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden; bDepartment of Environmental and Natural Resources Management, University of Ioannina, 30100 Agrinio, Greece; cInstitute of Molecular Biology and Biotechnology, FORTH, 71110 Heraklion, Crete, Greece; dCentre for Comparative Genomics, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark; and eSchool of Biological Sciences, Bangor University, Wales, LL57 2UW, United Kingdom

Edited by Nancy A. Moran, University of Arizona, Tucson, AZ, and approved February 20, 2009 (received for review October 24, 2008)

The obligate intracellular bacterium Wolbachia pipientis infects around 20% of all insect species. It is maternally inherited and induces reproductive alterations of insect populations by male killing, feminization, parthenogenesis, or cytoplasmic incompatibility. Here, we present the 1,445,873-bp genome of W. pipientis strain wRi that induces very strong cytoplasmic incompatibility in its natural host Drosophila simulans. A comparison with the previously sequenced genome of W. pipientis strain wMel from Drosophila melanogaster identified 35 breakpoints associated with mobile elements and repeated sequences that are stable in Drosophila lines transinfected with wRi. Additionally, 450 genes with orthologs in wRi and wMel were sequenced from the W. pipientis strain wUni, responsible for the induction of parthenogenesis in the parasitoid wasp Muscidifurax uniraptor. The comparison of these A-group Wolbachia strains uncovered the most highly recombining intracellular bacterial genomes known to date. This was manifested in a 500-fold variation in sequence divergences at synonymous sites, with different genes and gene segments supporting different strain relationships. The substitution-frequency profile resembled that of Neisseria meningitidis, which is characterized by rampant intraspecies recombination, rather than that of Rickettsia, where genes mostly diverge by nucleotide substitutions. The data further revealed diversification of ankyrin repeat genes by short tandem duplications and provided examples of horizontal gene transfer across A- and B-group strains that infect D. simulans. These results suggest that the transmission dynamics of Wolbachia and the opportunity for coinfections have created a freely recombining intracellular bacterial community with mosaic genomes.

horizontal transfer | recombination | ankyrin repeat gene | genome evolution | insect symbiosis

Author contributions: H.R.B., R.G., K.B., and S.G.E.A. designed research; L.K., J.W., P.S., K.N., Y.L., A.C.D., Z.V., and L.C. performed research; L.K., J.W., K.B., and S.G.E.A. analyzed data; and L.K., J.W., R.G., K.B., and S.G.E.A. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The sequences described in this paper have been deposited in the GenBank database [accession nos. CP001391 (W. pipientis wRi) and ACFP01000000 (Wolbachia wUni); the version described in this article is ACFP01000000].

1L.K. and J.W. contributed equally to this work.

2Present address: School of Biological Sciences, University of Liverpool, Liverpool L69 7ZB, United Kingdom.

3Present address: School of Applied Sciences, University of Wolverhampton, Wolverhampton WV1 1LY, United Kingdom.

4To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0810753106/DCSupplemental.

Wolbachia pipientis are intracellular α-proteobacteria of the order Rickettsiales that infect insects as well as isopods, spiders, scorpions, mites, and filarial nematodes (1, 2). These bacteria represent a single species, with strains classified into supergroups, of which the most abundant are supergroups A and B. A lack of concordance between host and bacterial strain phylogenies indicates frequent host shifts in addition to maternal inheritance within the individual host (1–4). In insect populations, Wolbachia induce reproductive manipulations to enhance their own spreading. The most frequently observed reproductive abnormality is cytoplasmic incompatibility (CI) (1, 2, 5), where uninfected females are unable to produce offspring with infected males, whereas infected females can produce offspring with both infected and uninfected males, thus creating a reproductive advantage for infected females. Other spectacular effects of Wolbachia infections are male embryo killing, feminization, and parthenogenesis induction (1, 2).

Three genomes of Wolbachia have been published to date. The A-group strain wMel from Drosophila melanogaster (6) and the B-group strain wPip from the mosquito Culex quinquefasciatus (7) are both reproductive parasites that cause CI, whereas the D-group strain wBm is an obligate mutualist in the nematode Brugia malayi (8). The 1.27-Mb genome of wMel and the 1.48-Mb genome of wPip contain several prophages and high frequencies of repeated sequences, including many IS-elements. These genomes have a large repertoire of genes with ankyrin repeat motifs, 23 in wMel and 60 in wPip, several of which are associated with mobile elements (6, 7). In contrast, the 1.08-Mb genome of strain wBm contains no prophage, only a few ankyrin repeat genes and a much lower fraction of repeated sequences, possibly reflecting its mutualistic adaptation to a single-host species.

Sequence comparisons of single genes from multiple strains have provided evidence for recombination (9–12) and horizontal transfer of prophages and insertion sequence (IS)-elements across Wolbachia strains (13–16). However, the distant relationship of the 3 sequenced Wolbachia genomes, manifested as an almost complete lack of gene-order conservation and synonymous substitution frequencies that are close to saturation, has precluded attempts to infer patterns and rates of recombination at the whole-genome level. Thus, with the exception of a few single-gene studies, the extent to which recombination distorts the evolutionary coherence of Wolbachia genomes is currently unknown.

We report here the complete genome sequence of the supergroup-A Wolbachia strain wRi that naturally infects Drosophila simulans and induces almost complete CI in its host (17, 18). Additionally, we present partial genome data of Wolbachia strain wUni from Muscidifurax uniraptor, likewise a supergroup-A strain that induces parthenogenesis in its host (19). A comparison of these 2 A-group Wolbachia strains with the previously sequenced genome of the A-group strain wMel reveals the most highly recombining obligate intracellular bacterial community examined to date.
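The cytoplasmic incompatibility rule described in the introduction can be written as a small truth table. The sketch below is an idealized illustration only: it assumes complete, unidirectional CI with a single Wolbachia strain, whereas the text describes CI in wRi as "almost complete". The function name `cross_produces_offspring` is hypothetical and not taken from the paper.

```python
def cross_produces_offspring(female_infected: bool, male_infected: bool) -> bool:
    """Idealized unidirectional CI (assumption: complete penetrance).

    Only the cross 'uninfected female x infected male' fails; infected
    females are compatible with both infected and uninfected males.
    """
    return female_infected or not male_infected


# Enumerate all four crosses, keyed by (female_infected, male_infected).
crosses = {
    (female, male): cross_produces_offspring(female, male)
    for female in (True, False)
    for male in (True, False)
}
```

Under this rule, infected females succeed in every cross while uninfected females fail in one of two, which is the reproductive advantage for infected females that the text refers to.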

www.pnas.org/cgi/doi/10.1073/pnas.0810753106 | PNAS Early Edition

Results and Discussion

General Features of the Wolbachia wRi Genome. The complete genome sequence of Wolbachia pipientis wRi from Drosophila simulans is a single circular chromosome of 1,445,873 bp, with 1,150 potential protein-coding sequences and 114 pseudogenes (Fig. 1). We identified 4 prophage segments, here called wRi-WOA, wRi-WOB (present in identical duplicates), and wRi-WOC. Additionally, we found 35 genes coding for proteins containing one or more ankyrin repeat (ANK) domains (Tables S1–S3), as compared to 23 such genes annotated in the 1.27-Mb genome of wMel (6). Of the genes solely present in either the wRi or the wMel genome for which a function or domain hit can be identified, the large majority encode phage proteins, transposases, and ANK proteins (Tables S4–S6).

Overall, the wRi genome contains 22.1% repeated sequences, as compared to 8.9% in wMel (>200 bp, 95% sequence identity). About 10% of the wRi genome is covered by IS-elements; these represent 11 different types, including 67 complete copies, 48 copies with frameshifts or internal stop codons, and 11 truncated copies. A total of 46 genes have been disrupted by IS-element insertions, of which 19 are hypothetical proteins, 10 are transposases, and 5 are ANK genes. Furthermore, more than 200 insertions of the Wolbachia palindromic element (20) or remnants thereof were found in the genome of wRi and of these, 43 were inserted into genic sequences. The Wolbachia palindromic element is also present in wMel and wUni, but not always in the same genes or at the same locations. The most abundant 23-bp hairpin-loop structure is present in 79 copies in the wRi genome, in 91 copies in wMel, but only once in wBm.

Fig. 1. Circular map of the Wolbachia pipientis wRi genome (1,445,873 bp). Each circle confined by the gray lines except for the 2 innermost circles illustrates different features on the plus (outer region) and minus (inner region) strands. Lines and boxes in the 3 outermost circles are colored according to the Clusters of Orthologous Groups (COG) categories. First (Outer) circle: protein-coding genes (CDSs). Second circle: pseudogenes. Third circle: unique CDSs compared to wMel. Fourth circle: ANK genes in blue and gene synteny breakpoints compared to wMel in red. Fifth circle: prophage regions in green (not affiliated to a strand) and IS elements color-coded as described in Table S5. Sixth circle: a diagram showing the synonymous substitution frequency (Ks) between wRi and wMel potential orthologs; the maximum cut-off was set to Ks = 1. Seventh circle: GC-skew of the wRi genome.

Genome Integrity Following Host Shifts. A comparison of the structures of the wRi and wMel genomes revealed 35 gene-order breakpoints (Fig. S1), 17 of which are flanked by IS-elements, 11 are located within or flanked by prophage sequences, and 6 are flanked by long repeats. The final breakpoint contains a 22-bp hairpin structure with a 4-bp loop. Among these breakpoints, 6 are close to genes encoding reverse transcriptase and 3 to genes encoding DNA recombinase. This shows that mobile genetic elements and repeated sequences are hot spots for rearrangements in Wolbachia. To examine the stability of these recombination hot spots, we used PCR to analyze all breakpoints from genomic DNA of wRi isolated from naturally infected and transinfected symbiotic associations of D. simulans, Drosophila yakuba, Drosophila teissieri, and Drosophila santomea, which have been kept in laboratory conditions from 8 to 15 years (Table S7). The results of this survey showed that the wRi genome remains stable in structure over these sites and suggest that the wRi genome does not oscillate between different genomic structures, nor do host switches trigger rearrangements at these sites. The observed short-term stability indicates that the recombination frequencies at these repeated sequences are lower than could be detected over this time period.

The A-Group Wolbachia Strains are Evolutionary Genome Mosaics. We estimated the nonsynonymous (Ka) and synonymous (Ks) substitution frequency per site to quantify sequence divergences across strains, although these values may not correspond to actual substitutions if genes evolve mainly by recombination. For 851 positional homologs in the wRi and wMel genomes identified by reciprocal BLAST searches (excluding phage genes), the median Ka and Ks values were estimated to be 6.2 × 10^−3 and 3.2 × 10^−2 substitutions per site, respectively, with a more than 500-fold variation in Ks-values across genes (Fig. S2a). A similar spectrum of divergences between wRi and wMel was observed for 343 core genes that are conserved across 3 Wolbachia strains (wRi, wMel, and wBm), Orientia tsutsugamushi, and 8 Rickettsia species (as defined in ref. 21) (Fig. S2b), suggesting that these estimates are not inflated by the inadvertent inclusion of inactivated gene fragments.

To investigate the mechanisms underlying the remarkable variation in Ks-values across genes, we sequenced orthologous genes in wUni, another A-group Wolbachia strain, using wMel as the reference genome. A 3-strain comparison of 450 sequenced orthologs showed a broad variability of Ks-values (Fig. 2A) that did not correlate with functional categories (Mann-Whitney and Kolmogorov-Smirnov tests using Bonferroni-Holm correction, P > 0.05 for all COG categories) (Fig. S2c). One-third of the genes were most similar in wMel and wRi, including 52 orthologs with no synonymous substitutions. Another one-third indicated the highest sequence similarity for wMel and wUni, of which 26 have no synonymous substitutions. Finally, one-fifth of the genes were most similar in wRi and wUni, with 22 lacking synonymous substitutions. Only 40 genes showed no substitutions at synonymous sites in any of the 3 pairs. Such a complex pattern of sequence divergences indicates extensive recombination.

To obtain a simple measure of the relative levels of recombination for comparisons across species, we calculated the spread of the relative Ks-values as the median distance to the average relative Ks-values in the ternary plots (see Fig. 2). Notably, the level of recombination thus inferred was higher in Wolbachia (spread = 0.42) (see Fig. 2A) than in Neisseria meningitidis (spread = 0.34) (see Fig. 2C), which is naturally competent for transformation and highly recombining (22). The level was twice as high as in its close relative Rickettsia (spread = 0.19) (see Fig. 2B), where the lowest Ks-values were typically associated with 1 pair of strains, as expected in nonrecombining bacteria.

Fig. 2. Ternary plot showing substitution frequency variation across (A) 410 genes in A-group Wolbachia (40 genes with no synonymous substitutions in all 3 strains were excluded); (B) 818 genes in Spotted-Fever Group Rickettsia; (C) 1,529 genes in Neisseria meningitidis; and (D) 2,207 genes in Staphylococcus aureus. Each dot in the diagram represents 1 gene. Absolute Ks-values have been transformed to relative values between 0 and 1. The mean relative Ks-value for each pair is shown on each axis. Numbers in bold represent the median distance to the average point. The color of each dot represents the maximum absolute Ks-value among the 3 pairs, ranging from light yellow (low values) to red (high values). Median Ks-values for all pairs are reported in Table S8. Rr, Rickettsia rickettsii; Rc, Rickettsia conorii; Rm, Rickettsia massiliae. [Panel values in bold: (A) Wolbachia, 0.42; (B) Rickettsia, 0.19; (C) Neisseria, 0.34; (D) Staphylococcus, 0.28.]

Additionally, of the 450 orthologous genes found in wRi, wMel, and wUni, intragenic recombination was detected in 129 genes by at least 2 different methods implemented in RDP3 (23). Genes for which intragenic recombination was detected by the recombination detection program (RDP) were concentrated to the middle of the inner triangle (spread = 0.27), whereas genes with no detected recombination were more spread out (spread = 0.56) in the ternary plot (Fig. S2d). Genes located near different corners of the inner triangle in the ternary plot provide the strongest evidence for inconsistency in the phylogenetic signal. As no conflict in signal was detected by RDP within these genes, it is likely that recombination has occurred over the complete sequences in many cases. Taken together, this indicates that at least three-fourths of the genes that were analyzed are affected by recombination.

The switches in sequence similarity patterns within gene alignments are here illustrated with the gene for Leucyl-tRNA synthetase (LeuRS), where 1 region was found to be identical in wMel and wUni, but contained many substitutions in wRi, followed by a divergent segment in wUni that was identical in wRi and wMel (Fig. 3A). Another example is the virB10 gene, where the 5′-part of the wRi gene clustered with the A-group strain wAtab3 from the braconid wasp Asobara tabida (Fig. 3B), whereas the 3′-part of the same gene clustered with the B-group strains wTai, from the Taiwan cricket, Teleogryllus taiwanemma, and wPip, from the mosquito Culex quinquefasciatus (Fig. 3C).

Fig. 3. Recombination within leuRS and virB10. (A) The diagram shows the Ks-values calculated for the leuRS gene pair segments consisting of a 99-base window sliding over the gene alignment with 15-base steps. The boxes under the x-axis indicate regions where wRi (red box) and wUni (black box) are highly diverged from the other two strains. (B and C) Inference of sequence relationships based on segments in the virB10 gene between (B) positions 72 and 892 and (C) positions 967 and 1526, respectively, of the gene alignment. The gray circle shows the position of wRi in the tree. The strains wKueYo, wMel, wUni, wAtab3, and wRi belong to supergroup A (names in red), whereas strains wPip and wTai belong to supergroup B (names in blue). [Panel A axes: Ks (y-axis) against window position in bp (x-axis) for the pairs wRi_wMel, wMel_wUni, and wRi_wUni.]

Rapid Diversification by Expansion-Contraction of Tandem Repeats. We classified the 35 ANK genes identified in the wRi genome into 3 different groups based on the extent of sequence divergence from their homologs in wMel (see Tables S1–S3). One-third, 13 genes, are highly conserved in length, domain organization, and nucleotide sequence (Ks < 0.05). Another 12 genes showed variability in the number of ANK domains and gene length plus high sequence divergence (Ks 0.1 to >1.0), although they are located in segments with otherwise conserved gene-order structures. The final 10 genes lacked homologs in wMel.

To study the relationships of the ANK domains within and among genomes, we performed a Bayesian phylogenetic analysis of 319 ANK domains identified in the wRi, wMel, and wUni genomes (Fig. 4). The ANK domains found in the highly conserved genes clustered with the corresponding domains in the other strains. In contrast, several ANK domains in the variable gene set clustered with other domains in the same gene. In these cases, variability in domain numbers and organization is generated by expansion or contraction of repeated sequences, with the repeated unit often spanning across the borders of the ANK domains (Fig. S3). The ANK protein WRi_003070 is particularly interesting; the N-terminal repeats are most similar to wMel and the repeats in the middle part to wUni, with recent duplications of ANK domains in both wRi and wUni (see Fig. S3). Additionally, this is the only wRi ANK gene that is solely transcribed in female adults and ovaries, but not in male adults and testes (Tables S9 and S10).

Gene Transfer Across Supergroups. Of the combined 58 ANK genes in the wRi and wMel genomes, 31 are located near to prophages and many of these are unique to the individual strain and may


Fig. 4. Clustering of ankyrin repeat domains from wRi, wMel, and wUni using Bayesian phylogenetic inference. The colors on the branches indicate different intervals of posterior probability: (red) 0.95–1.0, (yellow) 0.90–0.94, (blue) 0.80–0.89 (nonsignificant), and (black) <0.80.
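The color classes in the Fig. 4 caption amount to a simple binning of posterior probabilities. The sketch below shows one way to express that binning; the function name `branch_color` is hypothetical, and treating the two-decimal caption intervals as continuous thresholds (for example, assigning 0.945 to yellow) is an assumption, since the caption does not say how such values were rounded.

```python
def branch_color(posterior: float) -> str:
    """Map a posterior probability to the Fig. 4 branch color class.

    Assumption: the caption's intervals are read as half-open bins
    [0.95, 1.0], [0.90, 0.95), [0.80, 0.90) and [0, 0.80).
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    if posterior >= 0.95:
        return "red"
    if posterior >= 0.90:
        return "yellow"
    if posterior >= 0.80:
        return "blue"
    return "black"
```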

have been acquired recently. One such gene, WRi_006870, is located near phage wRi-WOC. Although the gene is absent from the other 2 A-group strains, homologs were identified in wNo, wMa, and wMau, B-group strains that like wRi use D. simulans as their natural host. The gene is not transcribed in early or late (overnight) embryos in wRi, not in testes and early embryos in wNo, and not in adult males in wMau: that is, this ANK gene exhibits stage-specific expression patterns in both A- and B-group genomic backgrounds (see Tables S9 and S10). A phylogeny based on the minor capsid protein shows that wRi-WOC clusters with one of the prophages in wNo (Fig. S4), indicating that prophage wRi-WOC, along with some of its associated ANK genes, may have been acquired from the B-group strains. Transmission of phages between Wolbachia strains of different supergroups that infect the same host has been seen in moths (14) and has also been shown to occur between wNo and wHa during coinfections of D. simulans in the laboratory (15).

In contrast, no examples of horizontal gene transfer between the A-group strains and the D-group Wolbachia strain wBm were observed, consistent with the absence of prophages and low levels of recombination in mutualistic Wolbachia strains (24). Likewise, mutualistic endosymbionts of aphids show low recombination frequencies and have the most stable bacterial genomes identified to date (25). This dramatic difference in recombination may reflect variability in host-adaptation strategies, access to mobile elements, and different sets of recombination genes (15), as well as increased possibilities for rare genome variants to reach fixation in such populations (26, 27).

Wolbachia Sequences in Drosophila Genome Assemblies. Wolbachia sequences were previously identified in the trace archive files of both the D. simulans and the Drosophila ananassae genome projects and assumed to originate from contaminating bacterial DNA (28), a discovery that was followed by a debate about whether any corresponded to wRi (29, 30). The more recent finding that Wolbachia genes have been transferred into the nuclear genomes of D. ananassae and other hosts (31–34) further added complexity with regard to the origin of these sequences. Using BLAST, we recovered 73 scaffolds from the FlyBase assembly of the D. ananassae genome containing 177 kb (excluding gaps) that partially or fully matched wRi sequences, some of which are currently annotated as D. ananassae genes in both GenBank and FlyBase. We did not identify any wRi sequences that are shared with wMel, presumably because the wMel genome was used to filter out Wolbachia reads. A comparison to the wRi genome revealed the absence of 2 IS-elements within scaffolds of otherwise conserved gene-order structures, suggesting that the Wolbachia sequences in the D. ananassae genome are similar but not identical to the wRi genome. In contrast, no wRi sequences were identified in the FlyBase assembly of the D. simulans genome. Furthermore, amplification of wRi ANK and VIR sequences by PCR from D. simulans treated with tetracycline failed. We conclude that a corresponding transfer of Wolbachia genes into the nuclear genome of D. simulans is not likely.

Patchy Wolbachia Populations. Pervasive recombination in parasitic Wolbachia destroys the anticipated correlation between gene history, genome history, and strain phenotype. The wsp surface protein has been extensively used for genotyping but was found to be especially prone to recombination (9, 35), and 2 different sets of housekeeping genes, gatB, coxA, hcpA, fbpA, ftsZ (11) and aspC, atpD, sucB, and pdhB (12), were proposed as an alternative. However, not even these genes are protected from recombination events (11), and our comparisons of wUni, wMel, and wRi show divergences at synonymous sites ranging from Ks = 0 in aspC to Ks = 0.1–0.2 for gatB, with different genes and segments of genes supporting different strain relationships. Hence, no single gene sequence will accurately describe the relationships of these A-group Wolbachia strains.

The global Wolbachia population is likely to consist of many subpopulations, or patches (36), where the boundaries are defined by, for example, geography or host specificity. Recombination among strains is expected to be frequent within patches, but less so between patches. For example, mutualistic adaptations to a single host may lead to the isolation and evolution of a subpopulation with limited recombination, as is possibly the case in nematode Wolbachia. While multilocus sequence typing may be useful to characterize supergroups, the intense recombination seen between A-group strains indicates that characterization of genotypes might require analysis at the whole genome level. However, as selection is expected to act on traits involved in host-adaptation processes within a patch, such genes may be useful to identify "fitness types," although not conferring any information that is meaningful in a phylogenetic sense.

Future Perspectives. The availability of complete genome sequence data for the model organisms D. melanogaster and D. simulans, as well as for their respective Wolbachia endosymbionts, offers an excellent opportunity to study host-adaptation processes by monitoring the coevolution of host and endosymbiont gene interactions in natural and transinfected hosts. Rapid diversification of the ANK genes by segmental gene duplication may reflect diversifying selection to match a divergent set of target molecules in different cells, tissues, and hosts. To test this hypothesis, target proteins should be searched for among host genes that are also rapidly evolving, with prime candidates in gametogenesis, meiosis, reproduction (37), and innate immunity responses (38). An exciting avenue for future research is to identify the interacting endosymbiont-host proteins and determine whether these evolve by purifying, positive, or diversifying selection within Wolbachia subpopulations.

The association between wRi and D. simulans is one of a few Wolbachia infections that have been studied in natural populations. Using wRi as the reference genome, it is now possible to initiate comparative studies of wRi genomes extracted from natural D. simulans populations with different phenotypes. Although we have demonstrated stability over the breakpoints during a period of 15 years in wRi strains kept in the laboratory, other genetic changes, such as transposition of IS-elements, gene inactivation by IS-element insertions, and novel gene acquisition could occur rapidly in natural populations. For example, changes in fecundity have been observed during a 20-year period in a natural D. simulans population infected with wRi in southern California (18). The genetic basis of this and other rapidly changing phenotypes can now be investigated.

Materials and Methods

Sequencing Strategies. wRi: Drosophila simulans Riverside eggs were collected after 2 h and 1 to 2 ml of embryos were homogenized. A continuous renografin gradient (28%–45%) was used to concentrate Wolbachia cells. The 28% to 32% zone was collected and placed in agarose plugs that were treated with bacterial cell lysis and proteinase K solution. To remove contaminating host DNA, the plugs were run on a 1% Seakem Gold agarose (FMC BioProducts) gel for 24 h and the isolated DNA was subsequently used for library construction in a modified M13 vector as described previously (39). From the M13 library, 34,322 reads were sequenced, of which 19,727 were present in the final assembly, resulting in an overall 8.2-times coverage. An additional 18,031 reads were generated during gap closure and finishing.

wUni: DNA was isolated from 300 dissected ovaries from adult females of Muscidifurax uniraptor using a CTAB protocol, followed by lysozyme treatment and chloroform extraction. Primers were designed based on the genome sequence of Wolbachia strain wMel, to amplify 1,100-bp products with 300-bp overlap on both ends to the adjacent product. Primers that successfully amplified short PCR products were selected and combined to generate long-range PCR products, where short products did not amplify. Next, 26,834 reads were sequenced and assembled into 287 contigs, of which 106 were longer than 2 kb. Short PCR-products were sequenced directly and long products were sheared by nebulization and cloned into the pSMART-HCKan vector before sequencing.

Verifying the wRi Genome Assembly. The wRi assembly was confirmed over each IS-element or inferred breakpoint using genomic DNA from wRi isolated from D. simulans and other infected hosts using PCR with specific primers. The size of the assembled wRi genome is slightly lower than the 1.66 Mb previously estimated from pulsed-field gel electrophoresis (40). However, the relative order of the observed restriction fragments matches that predicted from the genome sequence, except that the sizes of the individual fragments appear to have been systematically overestimated in the pulsed-field gel electrophoretic analysis.

Informatics. Assembly was performed with PHRED-PHRAP-CONSED (41–43). Protein-coding genes were identified with GLIMMER (44) and CRITICA (45) and tRNA genes by tRNAscan-SE (46). Putative functions were inferred using BLAST against the National Center for Biotechnology Information databases and InterProScan (47). Repeat identification was made using MUMmer (48). Codeml in PAML 3.14 (49) was used to calculate substitution rates. Orthologs used for Ks calculations were retrieved by reciprocal best BLAST with additional cutoffs. RDP3 (23) was used to check nucleotide alignments for intragenic recombination using 6 methods, RDP, Geneconv, Bootscan, MaxChi, Chimaera, and 3Seq, with default settings except for window and step sizes. Sequences of the minor capsid gene were aligned with CLUSTALW (50) on the protein level and back-translated to nucleotide sequences. The phylogeny was reconstructed using MrBayes 3.12 (51) with the GTR+G model and run for

Klasson et al. PNAS Early Edition ͉ 5of6 Downloaded by guest on September 30, 2021 10,000,000 generations. Ankyrin repeats were found with the ANK HMM from random primers (Promega), and thereafter treated with RNase H. For each PFAM (52) running HMMER 2.0 (53). An amino acid alignment was produced gene, specific primers were designed based on the corresponding wRi gene with hmmalign and then back-translated to nucleotides. The phylogeny was nucleotide sequence and used for PCR amplification reconstructed using MrBayes3.12 (51) under the GTRϩIϩG model and run for 27,000,000 generations. For both trees, sampling was made every one- ACKNOWLEDGMENTS. We thank Richard Stouthamer and Fabrice Vavre for hundredth generation with 2 runs of 4 chains and default priors and a providing Muscidifurax uniraptor, Gabor Nyiro for technical assistance, and Lionel Guy for helpful suggestions. This work was supported by Grant QLK3- consensus trees were constructed using a ‘‘burnin’’ of 25%. CT2000-01079, “The European Wolbachia Project: Towards novel biotechno- logical approaches for control of pests and modification of bene- Transcription Analyses. For each of the tested Wolbachia-Drosophila associa- ficial arthropod species by endosymbiotic bacteria” from the European Union tions, 300 testis and 150 ovaries were dissected from adults (1-day-old males (to K.B., H.R.B., R.G. and S.G.E.A.), from the European Community’s Seventh ࿝ and 3-day-old females). Embryos were collected every 2 h and late embryos Framework Program CSA-SA REGPROT-2007–1 under Grant agreement 203590 and intramural funding from University of Ioannina (to K.B.), and from every 16 h. Total RNA was extracted using TRIzol (Invitrogen) and treated with the Swedish Agricultural Research Council, the Swedish Research Council, the RNase-free DNase (Invitrogen). 
First-strand cDNA was synthesized from 5 ␮g Go¨ran Gustafsson Foundation, the Swedish Foundation for Strategic Research of total RNA using reverse transcriptase (SuperScript III; Invitrogen) and and the Knut and Alice Wallenberg Foundation (to S.G.E.A.).
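
The primer layout described for wUni (1,100-bp products that share 300 bp with each neighboring product) implies that consecutive products start 800 bp apart. A minimal Python sketch of that arithmetic follows. The function name `tile_amplicons`, the 0-based half-open coordinates, and the treatment of the target as a linear region are illustrative assumptions, not the authors' primer-design procedure.

```python
def tile_amplicons(region_length, product_length=1100, overlap=300):
    """Return (start, end) pairs for overlapping PCR products.

    Coordinates are 0-based and half-open (an assumption made for this
    sketch). Consecutive products start `product_length - overlap` apart,
    so each neighboring pair shares `overlap` bases.
    """
    step = product_length - overlap
    # Stop once the previous product already reaches the end of the region.
    last_start_limit = max(region_length - overlap, 1)
    return [
        (start, min(start + product_length, region_length))
        for start in range(0, last_start_limit, step)
    ]


# Toy 3,000-bp region (hypothetical, not a wMel coordinate range).
products = tile_amplicons(3000)
for start, end in products:
    print(start, end)
```

The final product is truncated at the end of the region; a real design would also have to respect primer placement constraints, which this sketch ignores.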

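Aligning on the protein level and then back-translating to nucleotides, as described for the minor capsid gene and the ankyrin repeats, amounts to threading each coding sequence onto its gapped protein sequence one codon at a time. The sketch below illustrates the idea in Python. The function `back_translate` and the toy sequences are hypothetical and are not the tool or data used in the study.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto its aligned protein.

    Each residue consumes the next 3 nucleotides of `cds`; each gap
    character in the protein alignment becomes a 3-nucleotide gap.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is shorter than 3 nt per residue")
    codons = []
    pos = 0  # index of the next unused nucleotide in cds
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy input, made up for illustration only.
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
coding_sequences = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: back_translate(aligned, coding_sequences[name])
    for name, aligned in protein_alignment.items()
}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are inserted in whole codons, the reading frame is preserved in the nucleotide alignment, which is what makes it suitable for codon-based substitution models.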
1. Werren JH, Baldo L, Clark ME (2008) Wolbachia: master manipulators of invertebrate biology. Nat Rev Microbiol 6:741–751.
2. Stouthamer R, Breeuwer JA, Hurst GD (1999) Wolbachia pipientis: microbial manipulator of arthropod reproduction. Annu Rev Microbiol 53:71–102.
3. Werren JH, Zhang W, Guo LR (1995) Evolution and phylogeny of Wolbachia: reproductive parasites of arthropods. Proc Biol Sci 261:55–63.
4. McGraw EA, O'Neill SL (1999) Evolution of Wolbachia pipientis transmission dynamics in insects. Trends Microbiol 7:297–302.
5. Bourtzis K, Braig HR, Karr TL (2003) in Insect Symbiosis, eds. Bourtzis K, Miller TA (CRC Press, Boca Raton), pp. 217–246.
6. Wu M, et al. (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol 2:E69.
7. Klasson L, et al. (2008) Genome evolution of Wolbachia strain wPip from the Culex pipiens group. Mol Biol Evol 25:1877–1887.
8. Foster J, et al. (2005) The Wolbachia genome of Brugia malayi: endosymbiont evolution within a human pathogenic nematode. PLoS Biol 3:e121.
9. Baldo L, Lo N, Werren JH (2005) Mosaic nature of the Wolbachia surface protein. J Bacteriol 187:5406–5418.
10. Baldo L, Bordenstein S, Wernegreen JJ, Werren JH (2006) Widespread recombination throughout Wolbachia genomes. Mol Biol Evol 23:437–449.
11. Baldo L, et al. (2006) Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol 72:7098–7110.
12. Paraskevopoulos C, Bordenstein SR, Wernegreen JJ, Werren JH, Bourtzis K (2006) Toward a Wolbachia multilocus sequence typing system: discrimination of Wolbachia strains present in Drosophila species. Curr Microbiol 53:388–395.
13. Cordaux R, et al. (2008) Intense transpositional activity of insertion sequences in an ancient obligate endosymbiont. Mol Biol Evol 25:1889–1896.
14. Masui S, Kamoda S, Sasaki T, Ishikawa H (2000) Distribution and evolution of bacteriophage WO in Wolbachia, the endosymbiont causing sexual alterations in arthropods. J Mol Evol 51:491–497.
15. Bordenstein SR, Wernegreen JJ (2004) Bacteriophage flux in endosymbionts (Wolbachia): infection frequency, lateral transfer, and recombination rates. Mol Biol Evol 21:1981–1991.
16. Gavotte L, et al. (2004) Diversity, distribution and specificity of WO phage infection in Wolbachia of four insect species. Insect Mol Biol 13:147–153.
17. Hoffmann AA, Turelli M, Simmons GM (1986) Unidirectional incompatibility between populations of Drosophila simulans. Evolution 40:692–701.
18. Weeks AR, Turelli M, Harcombe WR, Reynolds KT, Hoffmann AA (2007) From parasite to mutualist: rapid evolution of Wolbachia in natural populations of Drosophila. PLoS Biol 5:e114.
19. Stouthamer R, Breeuwer JA, Luck RF, Werren JH (1993) Molecular identification of microorganisms associated with parthenogenesis. Nature 361:66–68.
20. Ogata H, Suhre K, Claverie JM (2005) Discovery of protein-coding palindromic repeats in Wolbachia. Trends Microbiol 13:253–255.
21. Fuxelius HH, Darby AC, Cho NH, Andersson SG (2008) Visualization of pseudogenes in intracellular bacteria reveals the different tracks to gene destruction. Genome Biol 9:R42.
22. Jolley KA, Wilson DJ, Kriz P, McVean G, Maiden MC (2005) The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol Biol Evol 22:562–569.
23. Martin DP, Williamson C, Posada D (2005) RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 21:260–262.
24. Jiggins FM (2002) The rate of recombination in Wolbachia bacteria. Mol Biol Evol 19:1640–1643.
25. Tamas I, et al. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.
26. Cho NH, et al. (2007) The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes. Proc Natl Acad Sci USA 104:7981–7986.
27. Darby AC, Cho NH, Fuxelius HH, Westberg J, Andersson SG (2007) Intracellular pathogens go extreme: genome evolution in the Rickettsiales. Trends Genet 23:511–520.
28. Salzberg SL, et al. (2005) Serendipitous discovery of Wolbachia genomes in multiple Drosophila species. Genome Biol 6:R23.
29. Iturbe-Ormaetxe I, Riegler M, O'Neill SL (2005) New names for old strains? Wolbachia wSim is actually wRi. Genome Biol 6:401; author reply 401.
30. Salzberg SL, et al. (2005) Correction: Serendipitous discovery of Wolbachia genomes in multiple Drosophila species. Genome Biol 6:402.
31. Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T (2002) Genome fragment of Wolbachia endosymbiont transferred to X chromosome of host insect. Proc Natl Acad Sci USA 99:14280–14285.
32. Fenn K, et al. (2006) Phylogenetic relationships of the Wolbachia of nematodes and arthropods. PLoS Pathog 2:e94.
33. Hotopp JC, et al. (2007) Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes. Science 317:1753–1756.
34. Nikoh N, et al. (2008) Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally transferred endosymbiont genes. Genome Res 18:272–280.
35. Baldo L, Werren JH (2007) Revisiting Wolbachia supergroup typing based on WSP: spurious lineages and discordance with MLST. Curr Microbiol 55:81–87.
36. Berg OG, Kurland CG (2002) Evolution of microbial genomes: sequence acquisition and loss. Mol Biol Evol 19:2265–2276.
37. Begun DJ, et al. (2007) Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol 5:e310.
38. Sackton TB, et al. (2007) Dynamic evolution of the innate immune system in Drosophila. Nat Genet 39:1461–1468.
39. Andersson B, Wentland MA, Ricafrente JY, Liu W, Gibbs RA (1996) A "double adaptor" method for improved shotgun library construction. Anal Biochem 236:107–113.
40. Sun LV, et al. (2001) Determination of Wolbachia genome size by pulsed-field gel electrophoresis. J Bacteriol 183:2219–2225.
41. Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186–194.
42. Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–185.
43. Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8:195–202.
44. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:544–548.
45. Badger JH, Olsen GJ (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16:512–524.
46. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964.
47. Zdobnov EM, Apweiler R (2001) InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848.
48. Kurtz S, et al. (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.
49. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–556.
50. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680.
51. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
52. Bateman A, et al. (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–D141.
53. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763.
