<<

View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Virology 363 (2007) 26–35 www.elsevier.com/locate/yviro

Shared and species-specific features among ichnovirus genomes

Kohjiro Tanaka a,1,2, Renée Lapointe b,2, Walter E. Barney a, Andrea M. Makkay c, Don Stoltz c, ⁎ ⁎ Michel Cusson b, , Bruce A. Webb a,

a Department of Entomology, University of Kentucky, Lexington, KY 40503, USA b Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre, 1055 du PEPS, PO Box 10380 Ste-Foy stn, Quebec City, QC, Canada G1V 4C7 c Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7 Received 3 August 2006; returned to author for revision 13 September 2006; accepted 13 November 2006 Available online 15 February 2007

Abstract

During egg-laying, some endoparasitic wasps transmit a to their caterpillar host, causing physiological disturbances that benefit the wasp larva. Members of the two recognized polydnavirus taxa, ichnovirus (IV) and (BV), have large, segmented, dsDNA genomes containing virulence genes expanded into families. A recent comparison of IV and BV genomes revealed taxon-specific features, but the IV database consisted primarily of the genome sequence of a single species, the Campoletis sonorensis IV (CsIV). Here we describe analyses of two additional IV genomes, the Hyposoter fugitivus IV (HfIV) and the Tranosema rostrale IV (TrIV), which we compare to the sequence previously reported for CsIV. The three IV genomes share several features including a low coding density, a strong A+T bias, similar estimated aggregate genome sizes (∼250 kb) and the presence of nested genome segments. In addition, all three IV genomes contain members of six conserved gene families: repeat element, cysteine motif, viral innexin, viral ankyrin, N-family, and a newly defined putative family, the polar-residue-rich proteins. The three genomes, however, differ in their degree of segmentation, in within-family gene frequency and in the presence, in TrIV, of a unique gene family (TrV). These interspecific variations may reflect differences in parasite/host biology, including -induced pathologies in the latter. © 2007 Published by Elsevier Inc.

Keywords: Ichnovirus; Polydnavirus; Genome evolution

Introduction generation through the wasp germ line (Stoltz, 1990). Importantly, replication and virion assembly are restricted to The two recognized polydnavirus (PDV) taxa, ichnovirus the calyx cells, which are located at the junction of the ovarioles (IV) and bracovirus (BV), share an obligate mutualistic and the lateral oviduct. The PDV genome consists of multiple association with endoparasitic wasps of the families Ichneu- circular double stranded DNA segments, ranging in size from 2 monidae and Braconidae, respectively. A unique trait of PDVs to 42 kb (Kroemer and Webb, 2004). The number of genome is that their genome is integrated into the chromosomal DNA of segments and their size distribution vary from one PDV species the wasp carrier and is thus vertically transmitted to the next to another. The estimated size of characterized PDV genomes ranges from 187 to 567 kb (Dupuy et al., 2006; Espagne et al., 2004; Webb, 1998; Webb et al., 2006). ⁎ Corresponding authors. During oviposition, PDV virions are injected into the E-mail addresses: [email protected] (K. Tanaka), lepidopteran host hemocoel, along with wasp eggs, venom [email protected] (R. Lapointe), [email protected] (W.E. Barney), and ovarian proteins (Webb, 1998). PDV infection causes [email protected] (A.M. Makkay), [email protected] (D. Stoltz), physiological alterations in the parasitized larva, including an [email protected] (M. Cusson), [email protected] (B.A. Webb). arrest or delay in development and/or a suppression of the 1 Present address: Institute for Biological Resources and Functions, National Institute of Advanced Industrial Sciences and Technology (AIST), Tsukuba, immune response that would otherwise lead to encapsulation Ibaraki 305-8566, Japan. and death of wasp eggs by host hemocytes. PDV genes are 2 These authors contributed equally to the work. classified in three groups according to the host specificity of

0042-6822/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.virol.2006.11.034 K. Tanaka et al. / Virology 363 (2007) 26–35 27 their expression (Webb, 1998). Genes expressed exclusively in For both HfIV and TrIV, we constructed genomic libraries the wasp or in the lepidopteran host are designated as class I and that we used for whole genome shotgun sequencing. In parallel, class II genes, respectively, whereas class III genes are we used the EZ∷TN transposon system (Epicentre) to purify expressed in both hosts. Given that affected wasps are individual genome segments, which were then submitted for asymptomatic, class II genes have received the greatest sequencing so as to facilitate sequence assembly and sequence attention; indeed most of the proteins they encode are believed assignment to genome segments. Our results indicate that HfIV to be responsible for the pathological effects observed in the and TrIV share several genome properties and that each lepidopteran host and, therefore, hold some potential for the contains members of the gene families found in CsIV, including development of new pest-control strategies (Kroemer and a newly described group of related putative genes; however, we Webb, 2004). also observed some disparities in genome composition, which The genomes of the Campoletis sonorensis ichnovirus may reflect differences in the types of disturbances the (CsIV), Microplitis demolitor bracovirus (MdBV) and Cotesia cause in their respective hosts. congregata bracovirus (CcBV) were recently sequenced and compared with respect to their organization and gene content Results (Espagne et al., 2004; Webb et al., 2006). They were observed to share various attributes, including segmentation, a lack of General features of the HfIV and TrIV genomes genes required for DNA replication and strong gene expansion into families. On the other hand, the comparison between CsIV The HfIV genome and MdBV indicated that these ichnovirus and bracovirus We identified and sequenced 56 unique HfIV genome genomes contain few shared gene families; however, a segments, 11 of which can yield smaller, “daughter” or “nested” preliminary comparison of available sequence data from several segments, which are generated by intramolecular recombination PDV species suggested that the gene families identified so far of larger genome segments (Kroemer and Webb, 2004). This are well conserved within the IV and BV taxa (Webb et al., degree of genome segmentation is higher than that assessed for 2006). either TrIV or CsIV (Fig. 1A), making HfIV the most highly In the present article, we report on the genome sequences of segmented IV genome characterized to date. Its genome two campoplegine IVs, the Hyposoter fugitivus ichnovirus segments range in size between 2.6 and 8.9 kb, with a median (HfIV) and the Tranosema rostrale ichnovirus (TrIV). In size of 4.0 kb (Fig. 1B), and its aggregate genome size is undertaking this work, we wished to determine whether the estimated at 246 kb (Fig. 1C). The HfIV genome has a G+C general features described for the CsIV genome are shared by content of 43.1%, a value comparable to that determined for other selected IV representatives, with the aim of achieving a TrIV and CsIV (Fig. 1D). Overall, coding sequences represent level of intrataxonomic comparison similar to that which is 30% of its genome (Fig. 1E), although coding density is higher currently possible for the BV group. Earlier work has shown among larger genome segments (>5 kb: 38.9%) than among that at least one gene family, the rep family, is shared by three smaller ones (<5 kb: 26.8%; see Fig. 2). IV species (Volkoff et al., 2002). In addition, we wanted to assess the feasibility of sequencing an entire IV genome (TrIV The TrIV genome in this case) using material obtained from a wild wasp We completed the sequencing of 20 TrIV genome segments, population because the documented occurrence of polymor- which included genome segment F whose sequence had been phism in PDV genomes (Stoltz and Xu, 1990) was expected to reported earlier and shown to contain members of the rep gene make genome assembly more difficult. To conduct work on family (Volkoff et al., 2002). These 20 genome segments make TrIV, we have had to rely on yearly field collections of up 130,961 bp, to which we must add seven partially (∼60%) parasitized host larvae inasmuch as maintenance of a T. rostrale sequenced and annotated genome segments (Fig. 1A) as well as colony in the laboratory has so far proven impossible (Cusson five ORF-containing sequences that could not be assigned to et al., 1998a). specific genome segments. Thus, although the sequenced In selecting the viruses of H. fugitivus and T. rostrale, which portions of the TrIV genome add up to 176,243 bp, we estimate are parasitoids of the forest tent caterpillar Malacosoma disstria at ∼200 kb the aggregate size of the 27 genome segments (Stoltz et al., 1986) and the spruce budworm Choristoneura characterized here, a value obtained by computing the full sizes fumiferana (Cusson et al., 1998a), respectively, we also wanted of the partially sequenced genome segments, which were to determine whether differences in pathological effects are assessed by gel electrophoresis of the associated EZ∷TN clones reflected in viral gene composition. Unlike other IVs charac- (data not shown). Many additional small contigs, totaling terized to date, TrIV has been shown to have little or no apparent ∼75 kb (most of which contained no detectable ORFs), have impact on the host cellular immune response, although it not yet been assigned to any genome segment, but several of induces a strong inhibition of caterpillar metamorphosis these are expected to partially fill the ∼25 kb gaps of the seven (Cusson et al., 1998b; Doucet and Cusson, 1996a, 1996b). partially sequenced genome segments. Thus, the aggregate Conversely, HfIV has a marked effect on host hemocyte genome size of TrIV is estimated at ∼250 kb, a value distributed behavior (Stoltz and Guzo, 1986; Stoltz et al., 1986) but little over what we believe to be fewer than 40 genome segments apparent impact on metamorphosis (Stoltz, unpublished (Figs. 1A, C). The latter DNAs range in size between 4.1 and observations). 10.1 kb, with a median of 6.1 kb (Fig. 1B). One of the genome 28 K. Tanaka et al. / Virology 363 (2007) 26–35

Fig. 1. General features of the HfIV, TrIV and CsIV genomes. (A) Number of genome segments per genome. In the case of TrIV, the dash line represents the estimated number of additional genome segments (i.e., ∼50 kb of unassigned contigs divided by the median size of the fully sequenced genome segments). (B) Size range of genome segments. Box: median size. (C) Aggregate genome size. In the case of TrIV, the dash line represents the estimated aggregate size of partially sequenced genome segments and unassigned contigs. (D) Proportion of G+C nucleotides in each genome. (E) Percentage of coding sequence in each genome. HflV genome sequence accession numbers are AB291165-AB291209, AY547319, AY556383-4, AY563518-9, AY570798-9, AY577428-9, AY597814 and AY935249. TrIV genome sequence accession numbers are AB291138-64, AY940454, AF421353, AB291213-15, AY563518-9, AY570798-9, AY577428-9, AY597814 and AY935249. segments (D3′) is nested, but the parent (donor) segment (D3) one of these ORFs are overlapping in either direction. On has not yet been fully sequenced. The TrIV genome has a G+C average, individual TrIV genome segments contain more ORFs content of 42.2% (Fig. 1D) and a proportion of coding sequence than those of HfIV: one of the fully sequenced TrIV genome of 21.9% (Fig. 1E). segments contains only one putative ORF, whereas there are 8, 4, 4, 1 and 2 genome segments that contain 2, 3, 4, 5 and 6 General features of HfIV and TrIV ORFs ORFs, respectively. Eight of the 31 TrIV genome segments and contigs (Fig. 2) contain ORFs that do not show similarity with In the HfIV genome, we identified 150 ORFs, which we known PDV gene families. numbered according to (i) the genome segment on which they As observed for other IVs, the HfIV and TrIV genomes were found and (ii) their segment-specific size sorting order (see display large intergenic distances and their ORFs show no Supplementary data). Twenty of these putative ORFs are absolute preference with respect to orientation but members of a overlapping in either direction (Fig. 2). Approximately one gene family often have similar orientation on a given segment fourth (15/56) of the HfIV genome segments contain only one (Fig. 2). putative ORF, whereas there are 12, 11, 13, 1, 1, 1 and 1 genome segments with 2, 3, 4, 5, 6, 7 and 8 ORFs, respectively. There is Predicted ORFs displaying similarity with members of known one genome segment for which we did not detect any putative PDV gene families ORF. Sixteen of the 56 HfIV genome segments contain ORFs that are not homologous with any known PDV gene. Homology searches indicated that 78 of the 151 ORFs (52%) In the TrIV genome, we identified 86 putative ORFs on the predicted from the HfIV genome encode proteins showing 27 fully or partially sequenced genome segments as well as on significant similarity to members of previously characterized IV five unassigned contigs, with ORFs being detected on all but gene/protein families: repeat element (rep), viral innexin (inx), one of the genome segments thus far studied (Fig. 2). Twenty- viral ankyrin (ank), N gene and cysteine motif (cys)(Fig. 3; see K. Tanaka et al. / Virology 363 (2007) 26–35 29

Fig. 2. Graphic representation of the HfIV and TrIV genome sequences and annotations, where non-redundant circular genome segments are shown as linear molecules. HfIV: the 56 individual HfIV genome segments are displayed from smallest (A1=2.56 kb) to largest (G1=8.85 kb). TrIV: the 20 fully and 7 partially sequenced TrIV genome segments are displayed from smallest (A1=4.12 kb) to largest (G5=10.14 kb), along with five (right uppermost bars) ORF-containing contigs which could not be assigned to any of the cloned genome segments. Colored boxes show the sizes and locations of ORFs, with orientation indicated by the arrowhead on each box. Refer to legend for color coding of gene families. Non-coding sequences are shown in gray while unsequenced portions (TrIV) are shown in white. The sizes of the latter were estimated by gel electrophoresis of cloned genome segments. The locations of daughter segments are indicated with black lines drawn under the colored bars; in the text, these are designated by the name of the parental genome segment followed by the prime sign (′).

Supplementary data for details). Of these, the rep family is the the TrV family do not contain any cysteine motifs. Interest- best represented followed by the inx and ank families (Fig. 3). ingly, we identified only one cys gene in TrIV (on contig No orthologs of BV-specific genes were found. c111), whereas this family constitutes the second and fifth Analysis of the TrIV genome segments sequenced to date largest gene families in CsIV and HfIV, with ten and five revealed the presence of representatives of each of the genes, respectively (Fig. 3). aforementioned gene families. Although we do not yet have the full sequence of the TrIV genome, it is clear from the A new putative gene family shared by HfIV, TrIV and CsIV available data (≥80% of the genome) that this virus has significantly fewer representatives of the rep, inx, ank and cys Our initial criteria for the selection of putative IV ORFs families than either HfIV or CsIV (Fig. 3); however, as is true imposed a lower limit of 100 codons, including a methionine of the genomes of the latter two viruses, the rep genes start codon. In an effort to assess the likelihood that smaller constitute the largest family of TrIV. The TrV gene family, ORFs could encode functional proteins, we performed local three members of which were characterized earlier (Béliveau et blast searches on a database built from all of the HfIV, TrIV and al., 2000, 2003), takes second position in terms of frequency, CsIV genome sequences and reduced the size limit to 50 with seven members (Figs. 3 and 4A). Four of these are newly codons. The rationale behind this strategy was that if a small described ORFs, including two (TrV3, TrV7) whose encoded ORF has homologs in all three genomes, the chances that it proteins are very similar to TrV1 and TrV2, and two others encodes a functional protein are increased. Using this approach, (TrV5, TrV6) displaying greater similarity to TrV4, although we uncovered a new family of putative genes encoding polar- they are devoid of the long octad repeat characteristic of TrV4 residue-rich proteins (PRRP) found in all three viruses (Fig. 3). (Béliveau et al., 2003; Fig. 4A). The TrV3 ORF is located on In HfIV, PRRPs make up the second largest gene family, with 11 genome segment G2 along with TrV1, while the TrV5 gene is distinct members and 4 copies on daughter segments. In this found on genome segment G3, along with TrV2. The TrV6 virus, PRRPs contain between 67 and 128 a.a., with molecular ORF, the smallest TrV gene identified so far, is located on weights (Mr) ranging from 8.1 to 17.3 kDa. These putative genome segment D1, whereas TrV7 is on an unassigned contig proteins have a content of polar residues varying between 63 (Fig. 2). Because the N-termini of TrV and cys proteins display and 77% over the entire sequence, the majority of which are significant similarities, we suggested earlier the existence of Arg, Glu and Gln residues displaying short, repetitive sequences some distant relationship between these two families of (Fig. 4B). A short stretch of small, hydrophobic residues is proteins (Béliveau et al., 2000, 2003); however, proteins of present in the amino terminal portion of these proteins. In 30 K. Tanaka et al. / Virology 363 (2007) 26–35

from 10 to 2326 bp (mean=495) and from 9 to 2394 bp (mean=522), respectively (see Supplementary data for details).

Repetitive sequences

In HfIV and TrIV, we identified 348 and 217 tandem repeats (perfect and imperfect combined), respectively (see Supple- mentary data for details). In both genomes, at least one tandem repeat sequence was found on each genome segment, with a maximum of 14 and 34 per genome segment and an average repeat size of 15.8 and 15.2 bp for HfIV and TrIV, respectively. The majority of repetitive sequences are located in non-coding regions.

tRNA searches

Although seven tRNA genes were found in MdBV, none could be predicted from the CsIV genome (Webb et al., 2006). Similarly, we could not find any tRNA gene in HfIV, but genome segment G5, of TrIV, was observed to contain one tRNA gene that is a cognate for valine.

Discussion

Shared features among IV genomes

The present study provides the first comparative analysis of ichnovirus genomes, from which we can begin to extract some of the features that define this taxon. As pointed out in an earlier report, in which some preliminary data on the HfIV and TrIV Fig. 3. Pie charts showing the proportions of ORFs in each gene family identified for HfIV, TrIVand CsIV (the latter data set is from Webb et al., 2005). genomes were presented (Webb et al., 2006), IV genomes share While the charts on the left show the proportions of assigned and unassigned a number of characteristics with those of BVs. Apart from their ORFs in each genome, those on the right show only the proportions of ORFs that hallmark segmentation, the viral genomes compared here could be assigned to gene families. Actual numbers of ORFs are shown beside display coding densities (22–30%; Fig. 1E) and A+T biases the gene family name abbreviations. (57–59%; Fig. 1D) that are similar to those observed in MdBV (17 and 66%, respectively; Webb et al., 2006) and CcBV (27 and 66%, respectively; Espagne et al., 2004); however, unlike comparison, the TrIV genome contains only one PRRP (116 a. the two characterized BV genomes, which have vastly different a.; Mr: 14.4 kDa) whereas five are found in the CsIV genome aggregate sizes (MdBV: 189 kb, Webb et al., 2006; CcBV: plus two copies on daughter segments (77–122 a.a.; Mr: 9.7– 568 kb, Espagne et al., 2004), the three IV genomes have 14.2 kDa). None of the PRRP sequences is predicted to contain similar estimated aggregate sizes of ∼250 kb. signal peptides; these putative proteins are therefore expected to Genome segment nesting, a phenomenon documented earlier remain within the cells in which they are expressed. for both HfIV (Xu and Stoltz, 1993) and CsIV (Cui and Webb, 1997), is here reported for the first time in TrIV, with genome Putative promoter elements and polyadenylation signals segments D3′ being derived from a larger DNA (Fig. 2). This phenomenon, also believed to occur in H. didymator ichnovirus All predicted ORFs contain putative upstream promoter (HdIV; Galibert et al., 2003) but not yet described for any BV, elements. In one and five instances for TrIV and HfIV, may be common among IVs. The smaller, “nested” genome respectively, a TATA box sequence was assigned to two segments are apparently generated by larger genome segments ORFs (see Supplementary data for details). The predicted 5′ undergoing intramolecular recombination at intrasegmental UTR lengths range from 1 to 2899 bp (mean=302) and from 3 repeats (Kroemer and Webb, 2004). Given that PDVs do not to 2315 bp (mean=246) for HfIV and TrIV, respectively. replicate in their lepidopteran hosts, segment nesting could In HfIV, 12 poly-A signals out of 150 are predicted to be provide a mechanism for amplifying gene copy numbers (Cui utilized by two separate ORFs, and 36 are of the ATTAAA type. and Webb, 1997). If this interpretation is correct, high-level In TrIV, 8 poly-A signals out of 79 are predicted to be utilized expression of genes carried by those genome segments could be by two separate ORFs, and 15 are of the ATTAAA type. critically important to the success of parasitism. Hypermolarity Lengths of the 3′ UTRs are similar in HfIV and TrIV, ranging of certain genome segments may serve a function similar to that K. Tanaka et al. / Virology 363 (2007) 26–35 31

Fig. 4. Multiple alignment of amino acid sequences (ClustalW) for the TrV and polar-residue-rich proteins (PRRP) families of proteins. (A) The seven known TrV proteins from TrIV. Sequences of TrV1, TrV2 and TrV4 were reported earlier (Béliveau et al., 2000, 2003); those of TrV3, TrV5, TrV6 and TrV7 (bold characters) are from this study. Note that in Supplementary data, some of these genes have been renamed to improve consistency in gene nomenclature (TrV1: G2.1; TrV2: G3.1; TrV3: G2.2; TrV5: G3.2; TrV6: D1.2). Black letters/green shading: identical residues among at least 5 of the 7 proteins; white letters/black shading: identical residues among 2–4 proteins (in cases of multiple sets, identities among first 4 proteins only); black letters/gray shading: conservative substitutions; black letters/yellow shading: residues which, among TrV4, TrV5 and TrV6, are either identical or similar while being different from those in the other 4 proteins. (B) Portions of the HfIV, TrIVand CsIV PRRPs containing charged amino acids. The negatively (D and E) and positively (R, K, and H) charged amino acids are shaded in black and light gray, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) of genome segment nesting. The higher representation of certain two TrIV genomic libraries (Béliveau et al., 2000; Cusson et al., genome segments has been well documented in CsIV (Webb 2001). This strongly suggested that TrV1 was on a high- et al., 2005). The present sequencing effort indicates that this abundance genome segment. Interestingly, TrV1 is a spliced phenomenon likely occurs in HfIV and TrIV as well, as gene that encodes a secreted protein, a feature shared with the suggested by the higher representation of some clones in the cys genes which, in CsIV, are found on hypermolar genome libraries. In the case of TrIV, earlier work had indicated that the segments (Webb et al., 2006). A similar association is observed highly expressed TrV1 gene was greatly over-represented in in MdBV, where the high-abundance genome segment O carries 32 K. Tanaka et al. / Virology 363 (2007) 26–35 the egf-motif genes, which also are spliced and encode secreted hosts (22 versus 6 and 9 for T. rostrale and C. sonorensis, proteins (Webb et al., 2006). Whether hypermolarity of IV respectively; Krombein, 1979) and with the largest set of genes genome segments is consistently associated with high-level (Fig. 3). Finally, the patterns observed could reflect differences expression of secreted proteins needs to be formally assessed. in the pathological effects caused by each virus in its habitual or Perhaps the most striking similarity among the three IV most common lepidopteran host. For example, TrIV has been genomes considered here is the presence of members, in each of shown to have little or no impact on the cellular immune them, of the following six gene families: rep, cys, inx, ank,N- response of its C. fumiferana host (Doucet and Cusson, 1996a), family and the polar-residue-rich proteins (PRRP), a newly an observation that is in sharp contrast with those reported for defined putative family (Fig. 3). In view of the fact that the TrV HfIV (Stoltz and Guzo, 1986) and CsIV (Edson et al., 1981). family, which is unique to TrIV, could very well be derived from Interestingly, there are five and ten members of the cys gene cys genes (see below), the present analysis suggests that the family in HfIVand CsIV, respectively, while we found only one activity of a basic set of genes is required for successful cys gene in TrIV (Fig. 3); in CsIV-infected Heliothis virescens parasitism by IV-carrying wasps. Also remarkable is the domi- hosts, cys gene products have been implicated in the inhibition nance, among these families, of the rep genes, which make up of the cellular immune response that leads to encapsulation of ∼50% of the assigned genes in all three viruses (Fig. 3). The the immature parasitoid (Li and Webb, 1994; Cui et al., 1997). function of these genes remains unknown (Kroemer and Webb, Another important difference among the three IV genomes 2004), but their presence in high numbers in all three genomes compared here is the presence in TrIV of the TrV gene family, suggests that their products are essential components of IV which is absent in both HfIV and CsIV (Fig. 3). The present activity in parasitized host larvae. study adds four new members to this group of small proteins which, in spite of their clear relatedness, display important Differences among IV genomes differences, particularly at their C-termini (Fig. 4A; Béliveau et al., 2003). Although they lack a cys-motif, TrV proteins appear The present study reveals significant interspecific differences related to the CsIV cys proteins, given the significant similarity in the degree of IV genome segmentation (Fig. 1A), with HfIV among their N-termini (Béliveau et al., 2000, 2003). In addition, displaying the greatest level of PDV genome segmentation both families contain spliced genes (although not all TrV genes reported to date. In addition, because the aggregate genome sizes are spliced) that encode abundantly secreted proteins. In the H. estimated here are similar, variations in genome segmentation virescens–C. sonorensis host–parasitoid system, the CsIV cys yield differences in the size distribution of genome segments proteins have been shown to disrupt both the immune system (Fig. 1B). It is not clear yet which selective pressure, if any, (Cui and Webb, 1996) and the development (Fath-Goodin et al., drives PDV genome segmentation up or down. Theoretically, for 2006) of host larvae. In view of the fact that T. rostrale does not a given genome size and coding density, increasing the degree of appear to rely on PDV gene expression to evade the host segmentation could result in fewer genes being present on each immune response (Doucet and Cusson, 1996a; Cusson et al., genome segment, thereby providing greater flexibility with 1998b) and that TrV1 is the most abundantly expressed TrIV respect to molar ratios of individual genes through differential gene during arrestment of host development (Béliveau et al., amplification of specific genome segments. The present data set, 2000, 2003), the absence of the cys-motif in these proteins could however, is too small to permit a reliable assessment of this be an indication that this motif is not required for disruption of hypothesis. Alternatively, differences in the degree of genome host metamorphosis. segmentation could have a purely mechanistic origin, with Evidence for the existence of other species-specific IV gene variation in segmentation, within defined boundaries, being families has been provided by independent studies on HdIV under little or no selective pressure. (Volkoffet al., 1999; Galibert et al., 2003). One of these families, Although the three IV genomes presented here share six gene the glycine- and proline-rich proteins, contains at least three families, the proportion of members within each family tends to members, which are secreted proteins varying in size between 40 vary significantly from one species to another, with the and 70 kDa and displaying repeated sequences arranged in exception of the rep family (Fig. 3). For example, the cys tandem arrays; one of them, P40, is encoded by a spliced gene. family (Dib-Haji et al., 1993) makes up 17% of assigned genes Using an antibody raised against the latter protein, the authors in CsIV but only 3% in TrIV; similarly, the PRRP family makes observed cross-reactivity to an 80 kDa host protein, leading to up 16% of assigned genes in HfIV but only 3% in TrIV; the ank the suggestion that this HdIV gene family may be involved in genes are also more abundant in CsIV and HfIV than in TrIV. immunoevasion (Volkoff et al., 1999). Two additional genes, These differences could simply reflect variations in tissue- apparently unrelated to one another, have been identified as specific requirements for each gene family in a given host; some potentially unique to HdIV; HdCorfS6 and HdGorfP30 encode members of the ank gene family of CsIV, for example, have serine- and threonine-rich proteins that are either secreted or been shown to be differentially expressed in tissues such as fat membrane-associated. The HdGorfP30 protein contains a body and hemocytes (Kroemer and Webb, 2005). Differences in mucin-like motif (Galibert et al., 2003) similar to that of the the relative proportion of each gene family among genomes MdBV Glc proteins (Trudeau et al., 2000), suggestive of could also result from differential constraints imposed by host convergent evolution between the two viruses. Whether the IV range; for example, among the three wasp species considered genomes that we sequenced contain distant relatives (that here, H. fugitivus is the one with the greatest number of known escaped detection) of these HdIV genes is not clear at this point. K. Tanaka et al. / Virology 363 (2007) 26–35 33

A new putative gene family: the polar-residue-rich proteins density and in the number of representatives within the shared (PRRPs) IV gene families (e.g., cys genes), an observation that gives support to the hypothesis that the IV-like particles found in the Given that IV genomes have been shown to encode some wasp Venturia canescens, which are devoid of DNA, have relatively small proteins (see, for example, proteins of the TrV “lost” their genetic material because it is no longer required in family), we felt that a 100-codon lower cutoff for detection of the wasp's present association with its host (Schmidt and putative ORFs could lead to some potentially important ORFs Schuchmann-Feddersen, 1989). being dismissed on the basis that they are too small and display no homology to known proteins. After adjusting our criteria, we Materials and methods identified one group of small ORFs that have homologs in all three IV genomes considered here. These putative proteins are PDV genomic DNA preparation most abundant in the HfIV genome, with 11 members containing between 67 and 128 amino acids that are dominated by polar T. rostrale and H. fugitivus female wasps were obtained as residues (Fig. 4B), hence the name polar-residue-rich proteins described (Cusson et al., 1998a; Doucet and Cusson, 1996a, (PRRPs). Although blastp searches did not yield clear homologs, 1996b; Krell and Stoltz, 1980; Stoltz et al., 1986). Virions were many larger proteins have charged-amino-acid-rich domains purified and their DNA was extracted as described (Krell and that resemble the PRRPs. These domains are predicted to have a Stoltz, 1980; Stoltz et al., 1986). high probability of coiled-coil formation, structures that are For the construction of HfIV genomic libraries, viral DNA thought to be involved in homo- or hetero-oligomerization of was first digested with one of several restriction enzymes (ApaI, proteins (Sohda et al., 2001). Despite their ubiquitous presence BamHI, EcoRI, KpnI, SstI, SpeI, PstI, XbaI and XhoI), and the in IVs, the PRRPs identified here have yet to be assigned a resulting fragments were cloned within appropriate sites of the function and their expression levels remain to be assessed. pZErO-2 vector (Invitrogen). The TrIV genome was digested with the same enzymes, except for ApaI and SstI, which were Sequencing IV genomes using viral DNA obtained from wild not used, and the resulting fragments were cloned into either the wasp populations: the case of TrIV pTZ18R (Béliveau et al., 2000) or pZErO-2 vectors. Plasmid DNA was purified from bacterial cultures using either the In undertaking the sequencing of the TrIV genome, we QIAprep Spin Miniprep Kit (QIAGEN) or Wizard Magnesil expected greater difficulty in sequence assembly relative to the Plasmid Purification System (Promega). In order to confirm that HfIV and CsIV genomes inasmuch as the wild T. rostrale purified plasmids contained only one digested genome population that we used to extract TrIV DNA was expected to fragment, purified plasmids were digested with appropriate generate high levels of viral DNA polymorphism as compared restriction enzymes and fractionated by electrophoresis in 0.6% to H. fugitivus and C. sonorensis. Although this prediction agarose gels. proved to be correct, we were nonetheless able to sequence Individual HfIV and TrIV genome segments were isolated, ≥80% of the TrIV genome. In considering that the latter has an replicated and stably maintained in Escherichia coli using the estimated coding density of ∼22% (Fig. 1E), we feel confident EZ∷TN transposon system, following the manufacturer's that the majority of genes present in the TrIV genome were directions (Epicentre). In this process, bacterial origins of identified in the present sequencing effort. Thus, our experience replication were inserted into the circular genome segments, suggests that the sequencing of additional IV genomes from which permitted the cloning of individual genome segments in wild wasp populations is feasible and may be a worthwhile their entirety. endeavor. Sequencing and assembly Conclusions To sequence the HfIV and TrIV genomes, we used both The present work points to various similarities and genomic and EZ∷TN libraries, for a total of 10 and 8 libraries, differences among ichnovirus genomes. Previous PDV genome respectively. Sequencing reactions were conducted with vector- analyses (Espagne et al., 2004; Webb et al., 2006) have led to and virus-specific primers using the ABI Prism BigDye the conclusion that these unusual viral entities are devoid of Terminator Cycle Sequencing Ready Reaction Kit (PE Applied genes involved in replication and DNA packaging and that Biosystems) on an ABI Model 310 Prism DNA Sequencer (PE most, if not all, PDV virulence genes have been acquired from Applied Biosystems) or the Big Dye v3.1 kit (Applied wasp hosts. In this context, the important similarities observed Biosystems) on an ABI 3730/xl sequencer (Applied Biosys- here are likely derived from an IV (and a wasp host) that is tems), at the University of Kentucky and at the CHUL Research ancestral to the three species that we employed for our analysis. Centre of Laval University, respectively. Conversely, the species-specific differences that we identified The DNA sequences were first edited to remove vector are believed to reflect ongoing adaptations of parasitic wasps to sequences and then assembled using either SeqMan II changes in their hosts. In the case of T. rostrale, the reduced (Lasergene DNAStar) or Genchek v. 2061 (Ocimum Biosolu- requirement for depressing the host immune system appears to tions). Contigs were built with a minimum of 50 nt overlap, and have led to a decrease, in the TrIV genome, in both coding sequence variation below 0.5% was treated as polymorphism. 34 K. Tanaka et al. / Virology 363 (2007) 26–35

The gaps between contigs on the same genome segment were from RNA sequence alignments with secondary structure closed by primer walking. information. The closest predicted poly-A signal to an upstream ORF was considered the putative poly-A signal for that ORF. ORF predictions and sequence homology searches Searches for repeat sequences and tRNA ORFs were identified by using NCBI's ORF Finder (http:// www.ncbi.nlm.nih.gov/gorf/gorf.html), the Gene Construction Polydnavirus genomes contain a number of repetitive Kit 2 program (Texco Inc.) and Genchek v. 2061 (Ocimum sequences (Webb, 1998). In order to search for perfect and Biosolutions). Only ORFs encoding more than 100 amino acids imperfect direct tandem repeat sequences, we used the Tandem (a.a.) with an initiator methionine were initially considered as Repeat Finder (http://tandem.bu.edu/trf/trf.submit.options. putative genes, although some exceptions were made for html), with default settings, at the most permissive values smaller ORFs that generated strong blastp values in internal (alignment parameters: weight for match, mismatch and comparisons involving the HfIV, TrIV and CsIV genomes only. insertions/deletions were 2, 3 and 5, respectively), excluding Database searches were performed using NCBI's blastn, blastx short tandem repeats up to penta tandem repeats (Benson, and blastp (http://www.ncbi.nlm.nih.gov/BLAST/)(Altschul et 1999). PALINDROME (EMBOSS) (http://bioweb.pasteur.fr/ al., 1990, 1997). The a.a. sequences were also checked for seqanal/interfaces/palindrome.html) with default settings was alignments with protein families at InterProScan (http://www. used for detecting inverted repeats (Rice et al., 2000). ebi.ac.uk/InterProScan/) and for protein families and domains at Sequences were also scanned for possible tRNA genes using Prosite (http://au.expasy.org/prosite/). the tRNAscan-SE (v1.21) program (http://www.genetics.wustl. edu/eddy/tRNAscan-SE/), using default settings (Lowe and Identification of putative promoter element and transcription Eddy, 1997). initiation site Acknowledgments The HfIV and TrIV genome sequences were scanned for promoter regions and transcription initiation sites with the This research was supported by Initiative for Future Neural Network Promoter Prediction (NNPP) program v. 2.2, at Agriculture and Food Systems Grant no. 2001-52100-11332 the Berkeley Drosophila Genome Project's (BDGP) web site from the USDA Cooperative State Research, Education, and (http://www.fruitfly.org/seq_tools/promoter.html), using a cut- Extension Service and NSF-MCB-0094403 grants to BAW, a off score of 0.6 (out of 1.0) (Reese et al., 1996). The closest grant from Genome Canada through the Ontario Genomics predicted region to a downstream ORF was designated as the Institute to MC and grants from the Natural Sciences and promoter region. Cellular and viral TATA box element Engineering Research Council of Canada to DS and MC. This is sequences in this promoter region were searched with the publication #06-08-039 of the University of Kentucky Agri- TRANSFACFind search engine (http://motif.genome.jp/), using cultural Experiment Station. a cutoff score of 60 (out of 100) (Heinemeyer et al., 1999). The predicted TATA box with the highest score was considered the Appendix A. Supplementary data putative TATA box for that ORF. Supplementary data associated with this article can be found, Prediction of polyadenylation signals in the online version, at doi:10.1016/j.virol.2006.11.034.

The HfIV and TrIV genome sequences were submitted to References polyadq, a human polyadenylation (poly-A) signal search engine (http://rulai.cshl.org/tools/polyadq/polyadq_form.html). The Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic polyadq program was designed to detect and evaluate potential local alignment search tool. J. Mol. Biol. 215, 403–410. polyadenylation signals in human DNA sequences using weight Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of matrices for base composition and position in the downstream protein database search programs. Nucleic Acids Res. 25, 3389–3402. element of a U- or GU-rich sequence located 20 to 40 bases after Béliveau, C., Laforge, M., Cusson, M., Bellemare, G., 2000. Expression of a the cleavage site (Colgan and Manley, 1997; Tabaska and Zhang, Tranosema rostrale polydnavirus gene in the spruce budworm 1999). In CsIV, 3′ RACE experiments showed that two innexin Choristoneura fumiferana. J. Gen. Virol. 81, 1871–1880. genes (genome segment Q) and one vankyrin gene (genome Béliveau, C., Levasseur, A., Stoltz, D., Cusson, M., 2003. Three related TrIV ichnovirus genes: comparative sequence analysis, and expression in host segment P) utilize ATTAAA instead of AATAAA as the poly-A larvae and Cf-124T cells. J. Insect Physiol. 49, 501–511. signal (unpublished data). We therefore used the polyadq Benson, G., 1999. Tandem repeats finder: a program to analyze DNA sequences. (Tabaska and Zhang, 1999) program to detect the ATTAAA Nucleic Acids Res. 27, 573–580. variant of the polyadenylation signal as well as the AATAAA Colgan, D.E., Manley, J.L., 1997. Mechanism and regulation of mRNA – signal. We also used Erpin, an RNA motif search program (http:// polyadenylation. Genes Dev. 11, 2755 2766. Cui, L., Webb, B.A., 1996. Isolation and characterization of a member of the tagc.univ-mrs.fr/erpin/) (Easy RNA Profile IdentificatioN) cysteine-rich gene family from Campoletis sonorensis polydnavirus. J. Gen. (Gautheret and Lambert, 2001) to predict polyadenylation Virol. 77, 797–809. signals from two matrices and profile types that were constructed Cui, L., Webb, B.A., 1997. Campoletis sonorensis polydnavirus W segment K. Tanaka et al. / Virology 363 (2007) 26–35 35

family: integration in the wasp genome and segment nesting. J. Virol. 71, Kroemer, J.A., Webb, B.A., 2005. Iκβ-related vankyrin genes in the Campoletis 8504–8513. sonorensis ichnovirus: temporal and tissue-specific patterns of expression in Cui, L., Soldevila, A., Webb, B.A., 1997. Expression and hemocyte-targeting of parasitized Heliothis virescens lepidopteran hosts. J. Virol. 79, 7617–7628. a Campoletis sonorensis polydnavirus cysteine-rich gene in Heliothis Krombein, K.V., 1979. Catalog of Hymenoptera in America North of Mexico. virescens larvae. Arch. Insect Biochem. Physiol. 36, 251–271. Smithsonian Institution Press, Washington, D.C. Cusson, M., Barron, J.R., Goulet, H., Régnière, J., Doucet, D., 1998a. Biology Li, X., Webb, B.A., 1994. Apparent functional role for a cysteine-rich and status of Tranosema rostrale rostrale (Hymenoptera: Ichneumonidae), a polydnavirus protein in suppression of the insect cellular immune parasitoid of the eastern spruce budworm (: Tortricidae). Ann. response. J. Virol. 68, 7482–7489. Entomol. Soc. Am. 91, 87–93. Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection Cusson, M., Lucarotti, C., Stoltz, D., Krell, P., Doucet, D., 1998b. A of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, polydnavirus from the spruce budworm parasitoid, Tranosema rostrale 955–964. (Ichneumonidae). J. Invert. Pathol. 72, 50–56. Reese, M.G., Harris, N.L., Eeckman, F.H., 1996. Large scale sequencing Cusson, M., Béliveau, C., Laforge, M., Bellemare, G., Levasseur, A., Stoltz, D., specific neural networks for promoter and splice site recognition. In: Hunter, 2001. Hormonal alterations and molecular mechanisms underlying the L., Klein, T.E. (Eds.), Biocomputing. Proceedings of the 1996 Pacific induction of host developmental arrest by endoparasitic wasps. In: Edwards, Symposium. World Scientific Publishing Co, Singapore. J.P., Weaver, R.J. (Eds.), Endocrine Interactions of Parasites and Pathogens. Rice, P., Longden, I., Bleasby, A., 2000. EMBOSS: the European Molecular BIOS Scientific Publishers, Oxford, pp. 111–121. Biology Open Software Suite. Trends Genet. 16, 276–277. Dib-Haji, S.D., Webb, B.A., Summer, M.D., 1993. Structure and evolutionary Schmidt, O., Schuchmann-Feddersen, I., 1989. Role of virus-like particles in implications of a “cysteine-rich” Campoletis sonorensis polydnavirus gene parasitoid–host interaction of insects. In: Harris, J.R. (Ed.), Subcellular family. Proc. Natl. Acad. Sci. U.S.A. 90, 3765–3769. Biochemistry, vol. 15. Plenum Press, New York, pp. 91–119. Doucet, D., Cusson, M., 1996a. Role of calyx fluid in alterations of immunity in Sohda, M., Misumi, Y., Yamamoto, A., Yano, A., Nakamura, N., Ikehara, Y., Choristoneura fumiferana larvae parasitized by Tranosema rostrale. Comp. 2001. Identification and characterization of a novel Golgi protein, GCP60, Biochem. Physiol. 114A, 311–317. that interacts with the integral membrane protein giantin. J. Biol. Chem. 48, Doucet, D., Cusson, M., 1996b. Alteration of developmental rate and growth of 45298–45306. Choristoneura fumiferana parasitized by Tranosema rostrale: role of the Stoltz, D.B., 1990. Evidence for chromosomal transmission of polydnavirus calyx fluid. Entomol. Exp. Appl. 81, 21–30. DNA. J. Gen. Virol. 71, 1051–1056. Dupuy, C., Huguet, E., Drezen, J.M., 2006. Unfolding the evolutionary story of Stoltz, D.B., Guzo, D., 1986. Apparent haemocytic transformations associated . Virus Res. 117, 81–89. with parasitoid-induced inhibition of immunity in Malacosoma disstria Edson, K.M., Vinson, S.B., Stoltz, D.B., Summers, M.D., 1981. Virus in a larvae. J. Insect Physiol. 32, 377–388. parasitoid wasp: suppression of the cellular immune response in the Stoltz, D.B., Xu, D., 1990. Polymorphism in polydnavirus genomes. Can. J. parasitoid's host. Science 211, 582–583. Microbiol. 36, 538–543. Espagne, É., Dupuy, C., Huguet, É., Cattolico, L., Provost, B., Martins, N., Stoltz, D.B., Guzo, D., Cook, D., 1986. Studies on polydnavirus transmission. Poirié, M., Periquet, G., Drezen, J.-M., 2004. Genome sequence of a Virology 155, 120–131. polydnavirus: insights into symbiotic virus evolution. Science 306, Tabaska, J.E., Zhang, M.Q., 1999. Detection of polyadenylation signals in 286–289. human DNA sequences. Gene 231, 77–86. Fath-Goodin, A., Gill, T.A., Martin, S.B., Webb, B.A., 2006. Effect of Trudeau, D., Witherell, R.A., Strand, M.R., 2000. Characterization of two novel Campoletis sonorensis ichnovirus cys-motif proteins on Heliothis Microplitis demolitor polydnavirus mRNAs expressed in Pseudoplusia virescens larval development. J. Insect Physiol. 52, 576–585. includens haemocytes. J. Gen. Virol. 81, 3049–3058. Galibert, L., Rocher, J., Ravallec, M., Duonor-Cerutti, M., Webb, B.A., Volkoff, Volkoff, A.N., Cerutti, P., Rocher, J., Ohresser, M.C., Devauchelle, G., Duonor- A.N., 2003. Two Hyposoter didymator ichnovirus genes expressed in the Cerutti, M., 1999. Related RNAs in lepidopteran cells after in vitro infection lepidopteran host encode secreted or membrane-associated serine and with Hyposoter didymator virus define a new polydnavirus gene family. threonine rich proteins in segments that may be nested. J. Insect Physiol. 49, Virology 263, 349–363. 441–451. Volkoff, A.N., Béliveau, C., Rocher, J., Hilgarth, R., Levasseur, A., Duonor- Gautheret, D., Lambert, A., 2001. Direct RNA motif definition and identification Cérutti, M., Cusson, M., Webb, B.A., 2002. Evidence for a conserved from multiple sequence alignments using secondary structure profiles. polydnavirus gene family: ichnovirus homologs of the CsIV repeat element J. Mol. Biol. 313, 1003–1011. genes. Virology 300, 316–331. Heinemeyer, T., Chen, X., Karas, H., Kel, A.E., Kel, O.V., Liebich, I., Webb, B.A., 1998. Polydnavirus biology, genome structure, and evolution. In: Meinhardt, T., Reuter, I., Schacherer, F., Wingender, E., 1999. Expanding Miller, L.K., Balls, L.A. (Eds.), The Insect Viruses. Plenum Press, New the TRANSFAC database towards an expert system of regulatory molecular York, pp. 105–139. mechanisms. Nucleic Acids Res. 27, 318–322. Webb, B.A., Strand, M.R., Dickey, S.E., Beck, M.H., Hilgarth, R.S., Barney, Krell, P.J., Stoltz, D.B., 1980. Virus-like particles in the ovary of an W.E., Kadash, K., Kroemer, J.A., Lindstrom, K.G., Rattanadechakul, W., ichneumonid wasp: purification and preliminary characterization. Virology Shelby, K.S., Thoetkiattikul, H., Turnbull, M.W., Witherell, R.A., 2006. 101, 408–418. Polydnavirus genomes reflect their dual roles as mutualists and pathogens. Kroemer, J.A., Webb, B.A., 2004. Polydnavirus genes and genomes: emerging Virology 347, 160–174. gene families and new insights into polydnavirus replication. Annu. Rev. Xu, D., Stoltz, D.B., 1993. Polydnavirus genome segment families in the Entomol. 49, 431–456. ichneumonid parasitoid Hyposoter fugitivus. J. Virol. 67, 1340–1349.