<<

Tengs et al. Virology Journal 2013, 10:132 http://www.virologyj.com/content/10/1/132

RESEARCH Open Access A mobile genetic element with unknown function found in distantly related Torstein Tengs1*, Anja Bråthen Kristoffersen1, Tsvetan R Bachvaroff2 and Christine Monceyron Jonassen1

Abstract Background: The genetic element s2m seems to represent one of very few examples of mobile genetic elements in viruses. The function remains obscure and a scattered taxonomical distribution has been reported by numerous groups. Methods: We have searched GenBank in order to identify all viral accessions that have s2m(−like) sequence motifs. Rigorous phylogenetic analyses and constrained tree topology testing were also performed in order to investigate the apparently mobile nature of s2m. Results: The stem-loop s2m structure can be found in four families of + ssRNA viruses; Astroviridae, , Picornaviridae and . In all of these families, with the possible exception of Caliciviridae, multiple gains and/or losses of s2m would have to be postulated in order to explain the distribution of this character. Conclusions: s2m appears to be a mobile genetic element with a unique evolutionary history in all of the four virus families where it can be found. Based on our findings and a review of the current literature on s2m, a hypothesis implying an RNAi-like function for the s2m element is also outlined. Keywords: , Caliciviruses, , , Mobile genetic element, s2m

Background The s2m sequence is also present in the of In 1997, a 43 basepair, conserved sequence motif was the severe acute respiratory syndrome (SARS) corona- described near the 3’ end of members of the Astroviridae virus (SARS-CoV) where its three-dimensional crystal family [1]. The genetic element corresponded to the sec- structure has been characterized in great detail [4]. The ond most 3’ stem-loop structure (stem-loop II) in function of s2m remains obscure, although in the case of astroviruses and was subsequently named s2m [2]. The SARS-CoV, it was suggested that the structure might sequence motif has later been found in three other virus interfere with synthesis through mimicry of families; Caliciviridae, Picornaviridae and Coronaviridae small subunit ribosomal RNA (SSU rRNA) and subse- [3]. The distribution of s2m seems to be limited to quent binding of ribosomal [4]. This proposed positive-sense, single-stranded RNA (+ssRNA) viruses affinity was, however, only observed when interactions and the element is always located near the 3’ end of the were modeled using prokaryotic proteins. One of the genome. Most commonly, s2m can be found down- many interesting features of s2m is that there seems to stream of the last reading frame, but there are also in- be a high level of conservation on all levels; primary stances where it appears that the final stop codon is part structure (sequence) of both stem and loop regions, sec- of the motif itself [2]. Recently, several examples of vi- ondary (stem/loop structure) and tertiary conformation, ruses containing two copies of s2m have also been indicating that all of these characteristics are important reported [3]. The level of conservation is particularly for functionality. The conserved nature of s2m has led striking given the high rates seen in RNA researchers to suggest using it as a drug target [4] and viruses. there are also protocols describing how primers targeting the conserved s2m sequence can be used for virus discovery using reverse transcription PCR [5]. * Correspondence: [email protected] 1National Veterinary Institute, P.O. Box 750 Sentrum, 0106, Oslo, Norway Full list of author information is available at the end of the article

© 2013 Tengs et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Tengs et al. Virology Journal 2013, 10:132 Page 2 of 9 http://www.virologyj.com/content/10/1/132

The phylogenetic distribution of s2m seems to support Except for the coronaviruses, numerous sequences were a model where the genetic element can be transferred obtained from isolates that had not been assigned a spe- horizontally and mobility of s2m has been suggested in cific taxonomic placement within their respective fam- the literature [3]. We present a thorough analysis of the ilies in the NCBI Taxonomy database, but in general, all distribution of s2m in viral and perform trees constructed were consistent with previously pub- likelihood-based phylogenetic analyses of all the relevant lished reports on viral phylogeny. virus groups. We propose models for the evolutionary The ln likelihood difference (Δln) between uncon- history of s2m within the different host virus families strained optimal trees and the optimal tree where all s2m and perform likelihood ratio tests to investigate the ap- sequences were required to be holophyletic (i.e. a single parent mobile nature of this sequence motif. origin of the s2m sequence) ranged from coronaviruses (3102.9), picornaviruses (939.0), astroviruses (392.3), to Results caliciviruses (3.4). In all but the case of the caliciviruses, A total of 682 s2m-containing sequences were found in monophyly of the s2m-containing viruses was rejected GenBank when a consensus sequence-based approach with p < 0.01 (Additional file 2: Figure S1). Furthermore, was used directly. Allowing a single mismatch even constrained trees assuming two gains of s2m were increased this number to 702 and when two mismatches strongly rejected for coronaviruses, picornaviruses and were allowed, a total of 706 sequences could be found astroviruses (Additional file 2: Figure S2, S3 and S4). For from four different virus families (Additional file 1). example in the picornaviruses one explanation for the dis- Within this set of accessions, representatives from all tribution of the s2m sequence could be that the sequence the s2m-containing virus families were found that was gained four times (Figure 3). The Approximately Un- contained two copies of s2m (Table 1), but no sequences biased (AU) test found that the tree where the s2m se- were found to contain three or more. The s2m se- quences were constrained into a single clade (requiring quences from genomes with two copies were never iden- holophyly of s2m containing taxa) was significantly worse tical to each other. The genomes of than the optimal tree (Additional file 2: Figure S3). Even strains GVI.1/HKU_Ca026F/2007/HKG and GVI.1/ constraining Bat 3 and Canine picornavirus HKU_Ca035F/2007/HKG had identical 3’ends, both with into a single clade was significantly worse than the optimal two copies of s2m, although there were minor sequence tree (Additional file 2: Figure S3). differences in the rest of the sequences. For the astroviruses and caliciviruses, partial RNA- Discussion dependent RNA polymerase (RdRP) amino acid se- Traditionally, the term mobile genetic elements (MGEs) quences were used for the phylogenetic analyses (257 has been restricted to include , and 265 amino acid residues, respectively), whereas lar- and transposons, although it is now widely recognized ger parts of the polyprotein sequence could be unam- that this classification is becoming obsolete as many ele- biguously aligned for the picornaviruses (998 residues) ments with novel features as well as new combinations and the coronaviruses (2833 residues). The s2m- of known features are found [7,8]. Homologous and containing viruses did not form monophyletic groups in non- events have been in- any of the four virus families (Figures 1A, 2, 3 and 4). ferred based on observations done in a wide range of

Table 1 GenBank accessions containing two copies of s2m Family Virus (strain/isolate) Accession number Caliciviridae Norovirus dog (GVI.1/HKU_Ca026F/2007/HKG)* FJ692500.1 Caliciviridae Norovirus dog (GVI.1/HKU_Ca035F/2007/HKG)* FJ692501.1 Picornaviridae Pigeon picornavirus (B 03/641) FR727144.1 Picornaviridae Pigeon picornavirus (A 03/603) FR727145.1 Coronaviridae Zaria bat (ZBCoV) HQ166910.1 Astroviridae (FP3) JN582328.1 Astroviridae Chicken astrovirus (11672) JN582327.1 Astroviridae Chicken astrovirus (GA2011) JF414802.1 Astroviridae Bat astrovirus (Tm/Guangxi/LD77/2007) FJ571066.1 Astroviridae HMO Astrovirus (B NI-196)** GQ415661.1 * - identical 3’ ends (including s2m motifs). ** - possibly due to incorrect sequence assembly. Tengs et al. Virology Journal 2013, 10:132 Page 3 of 9 http://www.virologyj.com/content/10/1/132

A 99 Camberwell virus (AF145896; Hu/NLV/Camberwell/101922/94/AUS) Hawaii calicivirus (HCU07611; Hu/NLV/Hawaii virus/1971/US) 1 (ABB90153; Mu/NoV/GV/MNV1/2002/USA) Norovirus dog (FJ692500; GVI.1/HKU_Ca026F/2007/HKG) 96 Norwalk virus (AAB50465) Chiba virus (AB042808; Hu/NLV/Chiba 407/1987/JP) Norovirus dog (ACV88132; C33/Viseu/2007/PRT) 84 Norovirus dog (ACX69258; GIV.2/Bari/91/2007/ITA) 96 Norovirus dog (JF939046; FD210/2007/Ita) Bat (AFJ39355; TLC58/HK) Porcine enteric sapovirus (AAF04560; Cowden) 81 99 Sapovirus (YP_164336; C12) Sapovirus (AAQ17058; Mc10) Sapovirus (AAU09265; Hu/Dresden/pJG-Sap01/DE) (AAA79323; Urbana) 91 Canine calicivirus (BAB83601) 100 Vesicular exanthema of swine virus (AAG13641; A48) 96 Steller sea lion (ABP88257; V1415) Walrus calicivirus (NP_777370) Rabbit vesivirus (CAI29268)

98 Rabbit calicivirus (ACJ66650; Australia 1 MIC-07) European brown hare syndrome virus (CAA93445; EBHSV-GD) 82 Rabbit hemorrhagic disease virus (AAA47285; FRG) Newbury agent (YP_529550; 1) 100 Calicivirus (AAL99277; NB) Calicivirus (BAD90664; TCG) 0.5 Calicivirus (ACQ44559; AB90/CAN)

B ACX69258 JF939046 ACV88132 FJ692500* FJ692500*

G G G G G A U A U A U A U A U G A G A G A G A G A C G C G C G C G C G C G C G C G C G C G A A A A A U U U U U G C G C G C G C G C C G C G C G C G C G A A A A A A A A A A G G G G G C G C G C G C G C G C G C G C G C G C G G U G U G U G U G U G G G G A G A A A A C A C C C C C C C C G G G G G A A A A A C C C C C C G U A C G C G C G G C G C G C G C A U C G U A A U A U G C G C U A U G U A U A CUAC UGCG U A U A U G 5’ C G 3’ 5’ C G 3’ 5’ C G 3’ 5’ 3’ 5’ 3’ Figure 1 s2m in caliciviruses. (A) Phylogenetic analysis of caliciviruses using partial RNA-dependent RNA polymerase sequences (265 amino acids) and maximum likelihood (RAxML). Bootstrap values > 80% have been indicated and s2m-containing sequences have been highlighted (boldface/underscore). (B) Secondary structure of norovirus s2m sequences folded using mfold [6]. Box and arrow indicate nucleotide positions that vary within the noroviruses. * - The two copies of s2m found in Norovirus dog (GVI.1/HKU_Ca026F/2007/HKG) and Norovirus dog (GVI.1/HKU_Ca035F/2007/HKG). Tengs et al. Virology Journal 2013, 10:132 Page 4 of 9 http://www.virologyj.com/content/10/1/132

Beluga whale coronavirus (NC_010646; SW1) 100 Infectious bronchitis virus (JF274479; ck/CH/LHLJ/07VII)* Turkey_coronavirus (YP_001941164; MG10) 100 Infectious bronchitis virus (AAA70234; Beaudette) 100 Duck coronavirus (JF705860; DK/CH/HN/ZZ2004) Night heron coronavirus (AFD29225; HKU19-6918) 100 Wigeon coronavirus (AFD29233; HKU20-9243) Common moorhen coronavirus (AFD29243; HKU21-8295) Thrush coronavirus (YP_002308496; HKU12-600) Bulbul coronavirus (ACJ12043; HKU11-796) 100 White-eye coronavirus (AFD29200; HKU16-6847) Sparrow coronavirus (AFD29208; HKU17-6124) 100 Porcine coronavirus (AFD29186; HKU15-44) Asian leopard cat coronavirus (EF584908; Guangxi/F230/2006) 100 Munia coronavirus (ACJ12061; HKU13-3514) 100 Magpie-robin coronavirus (AFD29216; HKU18-chu3) Rhinolophus bat coronavirus (ABQ57207; HKU2/GD/430/2006) 100 Miniopterus bat coronavirus (ACA52170; HKU8/AFCD77) Porcine epidemic virus (NP_598309; CV777) 100 Scotophilus bat coronavirus (YP_001351683; BtCoV/512/2005) Bat coronavirus (ABN10910; HKU9-1/BF_005I)

100 SARS coronavirus (NP_828849; Tor2) Bat coronavirus (ADK66840; BtCoV/BM48-31/BGR/2008) 100 100 Bat coronavirus (YP_001039952; HKU4-1 B04f) 100 Bat coronavirus (ABG47051; BtCoV/133/2005) Bat coronavirus (YP_001039961; HKU5-1 LMH03f) (ACT11039; Parker) 100 Murine hepatitis virus (AAB86818; MHV-A59) 100 Murine hepatitis virus (YP_209229; JHM) Human coronavirus (YP_173236; HKU1) Rabbit coronavirus (YP_005454239; HKU14-1) Equine coronavirus (YP_001671996; NC99) 100 Porcine hemagglutinating encephalomyelitis virus (AAY68294; VW572) 100 (AAK83365; BCoV-ENT) 0.3 Giraffe coronavirus (ABP38314; US/OH3-TC/2006) 100 Sable antelope coronavirus (ABP38299; US/OH1/2003/WD1418) Human enteric coronavirus (YP_003038518; 4408) Human coronavirus (AAR01012; ATCC VR-759/OC43) Figure 2 Phylogenetic analysis of coronaviruses using partial polyprotein sequences (2833 amino acids) and maximum likelihood (RAxML). Bootstrap values of 100% have been indicated and s2m-containing sequences have been highlighted (boldface/underscore). virus groups, particularly in the single-stranded RNA vi- sequences themselves (Figure 1B) could not, however, re- ruses [9]. The genomes of bacteriophages are also solve the evolutionary history of s2m for this group, most known to contain genetic material from multiple sources likely due to residues either being too rapidly evolving or and appear to be quite promiscuous when it comes to too conserved to give a good phylogenetic signal (data not acquiring novel genomic features [10]. For non-phage vi- shown). For the coronaviruses, two gains and two losses ruses, mobile genetic elements seem very rare. To our would have to be postulated in order to explain the distri- knowledge, the only other example of what appears to bution of s2m in the most parsimonious way (Figure 2). If be a mobile genetic element in regular viruses is the S7 the ancestral state for the gamma/delta coronavirus group domain found in certain double-stranded RNA viruses, was to contain s2m, losses would have to be proposed for although the degree of amino acid conservation is quite the Night heron and Wigeon isolates, albeit there was lit- low [11]. tle support separating these species so they could repre- No highly supported branches separated the dog sent a single loss. Loss would have to be proposed, norovirus sequences in the calicivirus group (Figure 1A), however, for the Beluga whale coronavirus. The s2m- so a parsimonious model for the distribution of s2m here containing, monophyletic group comprising the SARS would be a single gain of this character. In contrast to virus and a bat coronavirus can be explained through a what was found when investigating the other three s2m- single gain. In the picornaviruses, there appears to a more containing virus families, a single gain of s2m was not complex distribution of s2m (Figure 3). Single gains can rejected by the AU test. Phylogenetic analyses of the s2m explain the two monophyletic groups that have s2m Tengs et al. Virology Journal 2013, 10:132 Page 5 of 9 http://www.virologyj.com/content/10/1/132

Turdivirus 2 (ADL38959; 10717) 100 Paraturdivirus Turdivirus 3 (ADL38961; 10878) 100 Turdivirus 1 (ADL38957; 00356) Mouse (AEM05831; M-5/USA/2010) 100 Bovine kobuvirus (BAC23066; U-1) Kobuvirus 100 Porcine kobuvirus (YP_005102465; SH-W-CHN/2010/China) 100 Porcine kobuvirus swine (ACI95948; S-1-HUN/2007/Hungary) 100 Human klassevirus (ACS91540; 1/02394-01) (YP_003038594; NG-J1) 100 Hepatitis A virus (AAA45466) Avian encephalomyelitis virus (NP_653151; Calnek) 100 Seal picornavirus (ABV57376; HO.02.21) 100 Duck hepatitis A virus (ABB76602; 03D/1) Human (CAA06679; Gregory) 100 Parechovirus Ljungan virus (AAM46079; 87-012) 100 Equine rhinitis 1 (CAA65615; P1436 /71) Equine rhinitis B virus 2 (AAK915; P313/75) Equine rhinitis A virus (CAA65614; PERV) 100 100 100 Foot-and-mouth disease virus (AAG16748; C-S8) Bovine rhinitis B virus (YP_001686841; EC11/2) Seneca valley virus (ABG23522; SVV-001) Encephalomyocarditis virus (NP_056777; Ruckert) 100 100 Theiler's encephalomyelitis virus (CAA39496; GDVII) 100 Human TMEV-like (YP_001949875) 100 Human E1 (ACL99815) Cosavirus Human cosavirus A1 (ACL15185; HCoSV-A1/0553) Porcine 1 (CAB40546; 1/F65) Avian (YP_164335; TW90A) Quail picornavirus (YP_004935357; QPV1/HUN/2010) Pigeon picornavirus B (CBY02483; 03/641) Porcine sapelovirus 1 (NP_653145; V13/8) 100 Simian sapelovirus 1 (AAL57904; 2383) Feline picornavirus (AET06137; 073F) Bat picornavirus 3 (AEM23661; TLC5F) 100 Canine picornavirus (YP_005351240; 325F) 100 Bat picornavirus 2 (AEM23659; MH9F) Bat picornavirus 1 (AEM23657; NC16A) Human (AAA45756; B14) Human rhinovirus C (ABU62848; 024) Human rhinovirus (CAA00931; A89) Bovine (BAA24003; VG-5-27) 100 Porcine enterovirus 9 (AAM00274; UKG/410/73) Human 1 (CAA24461.1; Mahoney) 100 Enterovirus 0.4 Human enterovirus C109 (ADK22846.1; 109/ NICA08-4327) 100 Simian enterovirus A (NP_653149; A-2 plaque virus) Human enterovirus D (NP_040760; D/70) Simian picornavirus (YP_001700373; N125) Simian agent 5 (AAL69623; B165) Human enterovirus A (NP_042242) Figure 3 Phylogenetic analysis of picornaviruses using partial polyprotein sequences (998 amino acids) and maximum likelihood (RAxML). Bootstrap values of 100% have been indicated and s2m-containing sequences have been highlighted (boldface/underscore).

(paraturdiviruses and Equine rhinitis virus B 1 and 2) and explanation could be that the ancestral state for this en- the presence of s2m in Pigeon picornavirus B, but the tire family was to contain s2m. For this to be true, a sin- phylogenetic placement of the two other s2m viruses (Bat gle loss would have to be proposed for the picornavirus 3 and Canine picornavirus) is more ambigu- ( 3 and Astrovirus strain ous. They are not separated by highly supported branches CDB-2012). Two losses would have to have occurred in and could thus reflect a single gain, or more complex ex- Bat astrovirus 1 and Bat astrovirus strain Tm/Guangxi/ planations can be proposed, implying multiple gains and LD38/2007. All members of the large, monophyletic losses. group that includes the classical human astroviruses and The broad distribution of s2m in the astroviruses has the California sea lion astrovirus 2 as the most basal (Figure 4) has been noticed previously [3] and a possible branch contain s2m. This group is not separated from Tengs et al. Virology Journal 2013, 10:132 Page 6 of 9 http://www.virologyj.com/content/10/1/132

100 Bovine astrovirus (AED89611; B76-2/HK) Porcine astrovirus (AER29999; 43/USA) 100 Porcine astrovirus 4 (AER30008; 35/USA) Astrovirus wild boar (YP_8005271208; wild boar/WBAstV-1/2011/HUN) 94 Astrovirus rat (ADJ38390; RS118/HKG/2007) Astrovirus (ACN44170; MLB1/WD0016) Minke whale astrovirus 2 (ADX97520; MH03-629) Bottlenose dolphin astrovirus 3 (ADX97504; Tt23) Bottlenose dolphin astrovirus 1 (ACR54279; Bd1) California sea lion astrovirus 2 (ACR54273; CSL2) 98 Porcine astrovirus 2 (ADO30543; Shanghai/2008) 82 Human astrovirus 3 (AEN74891; Rus-Nsc04-H355) 100 Human astrovirus 6 (ADJ17722) 100 Human astrovirus 8 (AAF85963; Yuc-8) Human astrovirus 4 (AAW51878; Dresden) Human astrovirus 1 (ACN78551; Beijing/176/2006/CHN) 100 Bat astrovirus 1 (ACF75864; AFCD337) 88 Bat astrovirus (ACN88707; Tm/Guangxi/LD38/2007) Bat astrovirus (ACN88713; Ha/Guangxi/LS11/2007) 99 Bat astrovirus (ACN88709; Tm/Guangxi/LD77/2007) Bat astrovirus 1 (ACF75852; AFCD57) Mouse astrovirus (NC_015935; M-52/USA/2008) Bat astrovirus (ACN88711; Tm/Guangxi/LD71/2007 91 Astrovirus (ACR23347; VA1) HMO Astrovirus A (ACX85471; NI-295) Mamastrovirus (ADR65075; SMS-AstV) Mamastrovirus 13 (CAB95003) Porcine astrovirus 5 (AER30002; 33/USA) 89 100 Wood pigeon astrovirus (CBY02491; 06/15660-1) Pigeon avian nephritis virus (AEQ33643; SH10) 100 Chicken astrovirus (NP_620617; G-4260) Turkey astrovirus (NP_853540) Avastrovirus 83 100 Avastrovirus 3 (AAF60952) Astrovirus (AFF57941; CDB-2012) 0.2 Duck astrovirus 1 (ADB79809; DA08) Chicken astrovirus (AEE88304; GA2011) Figure 4 Phylogenetic analysis of astroviruses using partial RNA-dependent RNA polymerase sequences (257 amino acids) and maximum likelihood (RAxML). Bootstrap values > 80% have been indicated and s2m-containing sequences have been highlighted (boldface/underscore). the rest of the (non-s2m containing) by character. Alternative hypotheses would have to propose any highly supported branches, and is thus possible that that s2m was present in the last common ancestor of this is a ‘primitive’ member of this virus family and that the + ssRNA viruses. There is also an apparent lack of the absence of s2m in the remaining isolates can be intermediate/primitive forms of s2m motifs in GenBank. explained through a single loss. It is probable that our search strategy for s2m(−like) mo- It is intuitive that loss of a complex character, and in tifs has a certain false negative rate, but the number of particular a character that can only provide an evolu- sequences that could be found quickly reached a plateau, tionary advantage in a direct or indirect interplay with where allowing more substitutions or fixing a smaller an existing cellular mechanism and does not lead to a number of consensus motif characters only led to an ex- tremendous increase in fitness, is more likely than gain ponential increase in the number of obvious false posi- of such a feature in an evolutionary perspective. Given tives (data not shown). the data currently available for s2m we conclude that it A model where containing the s2m motif is the pro- is impossible to establish a statistical model that can take posed ancestral state for all + ssRNA viruses would have into account any such differences. In spite of this, we to postulate a large number of independent losses and a still believe that horizontal transfer is the most plausible extraordinary selection pressure to maintain s2m in cer- explanation for the distribution of s2m. It is also for- tain viral lineages. This seems unlikely as s2m only ap- mally possible that this is a case of convergent evolution, pears to provide a somewhat subtle (yet immediate) but we consider this highly unlikely given the high de- selective advantage for the host viruses. The fact that gree of s2m similarity and the complexity of the s2m remains conserved in spite of the high mutation Tengs et al. Virology Journal 2013, 10:132 Page 7 of 9 http://www.virologyj.com/content/10/1/132

rates seen in RNA viruses indicates that the virus some- mitochondria, it is possible to do a phylogenetic analysis how benefits from acquiring the sequence motif, but un- of the genetic element that has (presumably) been trans- fortunately there are very few examples from the ferred and then compare the resulting topology with that published literature on closely related viruses that differ of the hosts. s2m is short, but due to its secondary struc- in their s2m status. The turdiviruses (Figure 3) were all ture it can be unambiguously aligned and there should collected from dead , and the authors were unsuc- also be sufficient characters that show some degree of cessful in their efforts to culture the viral strains for fur- variability to give reasonable resolution if data from ther characterization [12]. All bats that were found to closely related species are compared. Regardless of this, contain bat picornaviruses appeared healthy (Figure 3 we were unable to find any correlation between s2m and [13]) and there is no data indicating that the two mutational patterns and host phylogenies. For instance, viral strains that apparently have lost s2m within the we were unable to assess whether the two copies of s2m delta/gammacoronavirus group (Night-heron corona- found in the dog norovirus strains came from independ- virus strain HKU19-6918 and Wigeon coronavirus strain ent sources or if they are the result of some sort of du- HKU20-9243) are significantly different from the other plication event and subsequent independent evolution members of this group in terms of pathogenicity, (Figure 1B and Table 1), and the phylogeny of the host specificity, etc. [14]. Exchanging the (non-s2m deltacoronovirus s2m sequences was poorly resolved containing) 3’-end of a (MCV) with and did not match the phylogeny of the hosts (data not the (s2m containing) 3’-end of a SARS-CoV did not ap- shown). There seems to be a number of loci that ‘permit’ pear to have a dramatic effect on the virus [15] and an certain substitutions and that these mutate quite quickly, IBV strain with a deleted version of s2m that was discov- masking any phylogenetic signal. ered as an escape mutant in a vaccine development pro- Although our analyses did not address the function of ject (GenBank accession number JF274479) did not s2m per se, we believe that our observations might pro- appear phenotypically different from closely related vi- vide some clues as to how s2m might evolve and provide ruses in culturing experiments (Dr. Shengwang Liu, per- a selective advantage to the host viruses. We believe that sonal communication). s2m must have some sort of ‘autonomous’ function that The viruses that contain s2m can infect a wide range of does not require complex interactions with other parts higher , including birds, bats, horses, and of the viral genome/transcriptome as there do not ap- , and display different tissue tropisms. The most pear to be any conserved flanking regions in s2m viruses likely scenario for the emergence of a new s2m- (proximity to coding region, adjacent reading frames, containing virus would be a situation where a co-infection nucleotide motifs etc.). Neither do there appear to be includes both an s2m-containing donor virus and a recipi- any conserved amino acid motifs in any of the annotated ent virus. Bats have been shown to carry many different open reading frames except the GDD core of the RNA- viruses, including members of all the families that have dependent RNA polymerase when looking at protein se- been shown to harbor s2m [13,16]. Due to their mobility, quence data from representatives from the four virus feeding habits, long span, roosting behavior, general families (data not shown). The conserved nature of s2m virus susceptibility, etc., it has been proposed that bats might also imply that the s2m targets are homologous as may represent an important reservoir for emerging viruses all infected are (relatively) closely related in [17]. However, for the coronaviruses, several s2m- an evolutionary perspective whereas the viruses are dis- containg members have been postulated to have an avian tantly related when looking both at molecular data and origin [14]. On a molecular level, is has been suggested functionality (genome replication and transcription/ that transfer of s2m occurs through non-homologous re- translation strategies). An intracellular target also seems combination in an replication-dependent manner [3]. plausible, as all s2m containing viruses replicate in the Based on our sequence alignments, it is impossible to de- cytoplasm and this is where s2m is likely to be available termine whether or not such a model should include just for interactions with the cellular machinery. The obser- the hairpin structure or if transfer of the entire 3’ end of vation that s2m can apparently be transferred between the genome represents a more likely scenario. The non- unrelated viruses and remain functional (under selection coding nature of this part of the genome is associated with pressure to maintain sequence and structure) also sug- high mutation rates and any sequence similarity in the gests strongly that the target for s2m is host-specific and s2m-flanking regions (particularly downstream, near the not viral. Based on structural similarities between micro poly(A) tail) would quickly be lost due to the high error RNA (miRNA) hairpins involved in regulation, we rates observed in replicating RNA virus genomes [18]. propose that s2m functions through a RNA interference In most cases where phylogenetic analyses are used to (RNAi)-like mechanism, possibly targeting homologous investigate the horizontal transfer of a genetic element sequence loci in infected organisms. Recent observations or a DNA-containing organelle such as or using a reverse -based approach and a recombinant Tengs et al. Virology Journal 2013, 10:132 Page 8 of 9 http://www.virologyj.com/content/10/1/132

Sindbis virus indicate that the required cellular machinery s2m-containing clades were present alternate topologies for this to function should be in place in human cells [19], were constructed in a pairwise manner to test topologies and this model would also be consistent with an additive ef- with two s2m clades constrained together. The optimal fect, where more copies of s2m would allow the formation tree compatible with the constraint was calculated using of more miRNA/protein complexes, resulting in a more RAxML with the above amino acid substitution model, profound effect on target gene regulation. followed by site likelihood calculation with RAxML. The Based on the finding of s2m in what appears to be Approximately Unbiased (AU) test was calculated using newly emerging viruses, such as SARS-CoV, we also be- CONSEL [23]. lieve that s2m still maintains its mobility and will play a role in the future of virus evolution. Additional file

Conclusions Additional file 1: List of s2m-containing sequences found in The s2m sequence motif appears to be an active mobile Genbank. genetic element that thus far can be found in four differ- Additional file 2: Figure S1, S2, S3 and S4. For each virus family, the ent families of + ssRNA viruses. It seems likely that s2m optimal tree with bootstrap support values is shown (identical to Figures 1, 2, 3 and 4). Each of the s2m-containing clades was numbered, provides some kind of selective advantage for the viruses from top to bottom without regard to taxonomy, and the total number that contain the motif, and a possible function could of observed s2m gains was mapped onto the tree. Constraint trees were be related to RNAi-like gene regulation of infected constructed to test a series of hypotheses for s2m gains. For example, all s2m-containing sequences were constrained to a single clade to see if a organisms. tree with only a single gain of the s2m sequence was significantly less likely given the alignment. Other less constrained trees were also Methods constructed to test if trees with more than one s2m gain were also significantly less likely given the alignment. The specific constraints on All sequence data were downloaded from GenBank. s2m clades are shown in the table below. In all cases the differences in Only accessions where continuous sequence information log likelihood and p-values for the different trees are shown in from the 3’ end of the genome to the selected for comparison to the most likely unconstrained tree. In addition to the Approximately Unbiased (AU) p-values, other test support values are also phylogenetic analyses was available were used when shown (np = bootstrap probablility, bp = bootstrap proportion, pp = investigating the evolutionary relationship between the posterior probability, kh = Kishino-Hasegawa, sh = Shimodaira-Hasegawa, (s2m-containing) viruses. wkh = weighted Kishino-Hasegawa, wsh = weighted Shimodaira- Hasegawa). By using all the available sequence information from publications pertaining to s2m, several conserved se- quences domains within s2m could be identified. These Competing interests The authors declare that they have no competing interests. sequence motifs were used to perform nucleotide BLAST searches through the NCBI portal and >400 s2m- Authors’ contributions containing sequences could be identified. The s2m motifs TT conceptualized the project together with CMJ, collected the sequence (43–44 long) were individually extracted and data and constructed the amino acid alignments. TT also wrote the final version of the manuscript. TRB performed all the phylogenetic analyses and aligned, and the following consensus sequence could be the AU testing. ABK did all the database mining and programming required generated: CGNGG(N)CCACGNNGNGT(N)ANNANCG to identify and tabulate the s2m-containing sequences in GenBank. All AGGGT(N)ACAG (N’s in parenthesis indicate possible authors read and approved the final manuscript. indels) for the conserved core region of s2m. This text ’ string profile was used to search all viral sequences in Authors information TT and ABK are senior staff scientists at the Norwegian Veterinary Institute GenBank using different combinations of the indicated (NVI; Section for virology and Section for epidemiology, respectively). TT indels and allowing for nucleotide substitutions. works on emerging and ABK works on epidemiology and All the alignments were constructed using the Clustal genetics of and fish pathogens. CMJ is section leader at NVI (Section for virology) and was the first to describe the s2m element in viruses. TRB W algorithm [20] and manually edited using Bioedit works as a senior research scientist at the Institute for Marine and (version 7.0.5.3; [21]). Only unambiguously aligned do- Environmental Technology and is currently investigating gene transfer and mains were included in the subsequent analyses. Phylo- gene duplications in . genetic analyses were performed using RAxML with the Acknowledgements optimal GTR (General Time Reversible) with Gamma Funding for this project was provided by the Research Council of Norway distribution model for amino acid substitution and 100 (grant 1869073). bootstrap replicates [22]. Author details Minimally constrained trees were constructed to test 1National Veterinary Institute, P.O. Box 750 Sentrum, 0106, Oslo, Norway. whether trees with monophyly of s2m-containing se- 2Institute for Marine and Environmental Technology, 701 E. Pratt St, quences were significantly less likely than the optimal Baltimore, MD 21202, USA. unconstrained tree. The primary constraint was to make Received: 12 December 2012 Accepted: 18 April 2013 all s2m-containing taxa monophyletic. If more than two Published: 25 April 2013 Tengs et al. Virology Journal 2013, 10:132 Page 9 of 9 http://www.virologyj.com/content/10/1/132

References 21. Hall TA: BioEdit: a user-friendly biological sequence alignment editor and 1. Monceyron C, Grinde B, Jonassen TO: Molecular characterisation of the analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 1999, 3′-end of the astrovirus genome. Arch Virol 1997, 142:699–706. 41:95–98. 2. Jonassen CM, Jonassen TO, Grinde B: A common RNA motif in the 3′ end 22. Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic of the genomes of astroviruses, avian infectious bronchitis virus and an analyses with thousands of taxa and mixed models. Bioinformatics 2006, equine rhinovirus. J Gen Virol 1998, 79(Pt 4):715–718. 22:2688–2690. 3. Kofstad T, Jonassen CM: Screening of feral and wood pigeons for viruses 23. Shimodaira H, Hasegawa M: CONSEL: for assessing the confidence of harbouring a conserved mobile viral element: characterization of novel phylogenetic tree selection. Bioinformatics 2001, 17:1246–1247. Astroviruses and Picornaviruses. PLoS One 2011, 6:e25964. 4. Robertson MP, Igel H, Baertsch R, Haussler D, Ares M Jr, Scott WG: The doi:10.1186/1743-422X-10-132 structure of a rigorously conserved RNA element within the SARS virus Cite this article as: Tengs et al.: A mobile genetic element with genome. PLoS Biol 2005, 3:e5. unknown function found in distantly related viruses. Virology Journal 5. Jonassen CM: Detection and sequence characterization of the 3′-end of 2013 10:132. coronavirus genomes harboring the highly conserved RNA motif s2m. Methods Mol Biol 2008, 454:27–34. 6. Zuker M: Mfold web server for folding and hybridization prediction. Nucleic Acids Res 2003, 31:3406–3415. 7. Leplae R, Hebrant A, Wodak SJ, Toussaint A: ACLAME: a CLAssification of mobile genetic elements. Nucleic Acids Res 2004, 32:D45–D49. 8. Leplae R, Lima-Mendez G, Toussaint A: ACLAME: a CLAssification of mobile genetic elements, update 2010. Nucleic Acids Res 2010, 38:D57–D61. 9. Chare ER, Holmes EC: A phylogenetic survey of recombination frequency in RNA viruses. Arch Virol 2006, 151:933–946. 10. Hatfull GF, Hendrix RW: Bacteriophages and their genomes. Current Opinion Virol 2011, 1:298–303. 11. Liu H, Fu Y, Xie J, Cheng J, Ghabrial SA, Li G, Peng Y, Yi X, Jiang D: Evolutionary of mycovirus-related dsRNA viruses reveals cross-family and evolution of diverse viral lineages. BMC Evol Biol 2012, 12:91. 12. Woo PC, Lau SK, Huang Y, Lam CS, Poon RW, Tsoi HW, Lee P, Tse H, Chan AS, Luk G, et al: Comparative analysis of six genome sequences of three novel picornaviruses, turdiviruses 1, 2 and 3, in dead wild birds, and proposal of two novel genera, orthoturdivirus and paraturdivirus, in the family picornaviridae. J Gen Virol 2010, 91:2433–2448. 13. Lau SK, Woo PC, Lai KK, Huang Y, Yip CC, Shek CT, Lee P, Lam CS, Chan KH, Yuen KY: Complete genome analysis of three novel picornaviruses from diverse bat species. J Virol 2011, 85:8819–8828. 14. Woo PC, Lau SK, Lam CS, Lau CC, Tsang AK, Lau JH, Bai R, Teng JL, Tsang CC, Wang M, et al: Discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus. J Virol 2012, 86:3995–4008. 15. Goebel SJ, Taylor J, Masters PS: The 3′ cis-acting genomic replication element of the severe acute respiratory syndrome coronavirus can function in the murine coronavirus genome. J Virol 2004, 78:7846–7851. 16. Tse H, Chan WM, Li KS, Lau SK, Woo PC, Yuen KY: Discovery and genomic characterization of a novel bat sapovirus with unusual genomic features and phylogenetic position. PLoS One 2012, 7:e34987. 17. Calisher CH, Childs JE, Field HE, Holmes KV, Schountz T: Bats: important reservoir hosts of emerging viruses. Clin Microbiol Rev 2006, 19:531–545. 18. Domingo E: Mechanisms of viral emergence. Vet Res 2010, 41:38. 19. Shapiro JS, Langlois RA, Pham AM, Tenoever BR: Evidence for a cytoplasmic microprocessor of pri-miRNAs. RNA 2012, 18:1338–1346. 20. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through Submit your next manuscript to BioMed Central sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22:4673–4680. and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit