Novikova et al. BMC Genomics 2010, 11:231 http://www.biomedcentral.com/1471-2164/11/231

RESEARCH ARTICLE Open Access EvolutionaryResearch article genomics revealed interkingdom distribution of Tcn1-like chromodomain-containing Gypsy LTR retrotransposons among fungi and

Olga Novikova*1, Georgiy Smyshlyaev2 and Alexander Blinov1

Abstract Background: Chromodomain-containing Gypsy LTR retrotransposons or chromoviruses are widely distributed among eukaryotes and have been found in plants, fungi and vertebrates. The previous comprehensive survey of chromoviruses from mosses (Bryophyta) suggested that genomes of non-seed plants contain the clade which is closely related to the retrotransposons from fungi. The origin, distribution and evolutionary history of this clade remained unclear mainly due to the absence of information concerning the diversity and distribution of LTR retrotransposons in other groups of non-seed plants as well as in fungal genomes. Results: In present study we preformed in silico analysis of chromodomain-containing LTR retrotransposons in 25 diverse fungi and a number of species including spikemoss Selaginella moellendorffii (Lycopodiophyta) coupled with an experimental survey of chromodomain-containing Gypsy LTR retrotransposons from diverse non-seed vascular plants (, ferns, and horsetails). Our mining of Gypsy LTR retrotransposons in genomic sequences allowed identification of numerous families which have not been described previously in fungi. Two new well-supported clades, Galahad and Mordred, as well as several other previously unknown lineages of chromodomain-containing Gypsy LTR retrotransposons were described based on the results of PCR-mediated survey of LTR retrotransposon fragments from ferns, horsetails and lycophytes. It appeared that one of the clades, namely Tcn1 clade, was present in basidiomycetes and non-seed plants including mosses (Bryophyta) and lycophytes (genus Selaginella). Conclusions: The interkingdom distribution is not typical for chromodomain-containing LTR retrotransposons clades which are usually very specific for a particular taxonomic group. Tcn1-like LTR retrotransposons from fungi and non- seed plants demonstrated high similarity to each other which can be explained by strong selective constraints and the 'retained' genes theory or by horizontal transmission.

Background rotransposons are divided into several superfamilies: Retrotransposons are a class of mobile genetic elements, Copia (Pseudoviridae), Gypsy (Metaviridae), Bel-Pao, which use reverse transcription in their transposition. Retrovirus (Retroviridae), and ERV [1]. Five orders of retrotransposons are recognized: those Chromodomain-containing LTR retrotransposons or having long terminal repeats (LTRs) (LTR retrotranspo- chromoviruses are the most widespread lineage of Gypsy sons); those lacking LTRs (non-LTR retrotransposons); LTR retrotransposons and are present in genomes of DIRS retrotransposons; Penelope-like retrotransposable fungi as well as in plants and vertebrates [2,3]. The char- elements; and short interspersed nuclear elements acteristic feature of chromoviruses is the presence of an (SINEs). According to the modern classification, LTR ret- additional domain - the chromodomain (CHD). CHDs are present in various eukaryotic proteins involved in * Correspondence: [email protected] chromatin remodeling and regulation of gene expression 1 Laboratory of Molecular Genetic Systems, Institute of Cytology and Genetics, Novosibirsk, Russia during development [4-6]. CHDs perform a wide range of Full list of author information is available at the end of the article

© 2010 Novikova et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons BioMed Central Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Novikova et al. BMC Genomics 2010, 11:231 Page 2 of 20 http://www.biomedcentral.com/1471-2164/11/231

diverse functions including chromatin targeting and pro- in mosses (Bryophyta) [8]. The survey was performed teinDNA/RNA interactions [6]. Recently, it has been using genome sequence data for the 25 fungal species shown that the CHDs target integration of new LTR ret- listed in Table 1. The hemiascomycetous yeasts were not rotransposon copies into heterochromatin by recognizing included in the present investigation since a comprehen- histone modifications [7]. sive survey of LTR retrotransposons from this group of Our previous comprehensive survey of chromoviruses ascomycetes was recently published [10]. First, reverse from mosses (Bryophyta) suggested that the diversity of transcriptase (RT) and integrase (Int) coding regions of CHD-containing Gypsy LTR retrotransposons in plant Gypsy LTR retrotransposons were detected in genomic genomes is underestimated [8,9]. There are four well- sequences using algorithm based on hidden Markov known CHD-containing Gypsy LTR retrotransposon model implemented in uGENE software http:// clades widely distributed among gymnosperms and ugene.unipro.ru/. The transposable elements thus identi- angiosperms: Tekay, CRM, Galadriel and Reina [2,3]. fied were then classified into families based on RT and Int Four novel clades were found to be present in mosses. domains sequence similarity. Members of the same fam- Moreover, we showed that representatives from one of ily shared high amino acid identity (90-100%) but had the moss-specific clades are more closely related to ret- very little similarity to elements from other families. Our rotransposons from fungi than to retrotransposons from survey has identified more than 150 novel Gypsy LTR ret- plants. Although we proposed that the retrotransposons rotransposon families which have not been described from this clade could have been 'retained' from the last previously (Additional file 1). common ancestor of Fungi/Metazoa lineage and plants, The newly identified Gypsy LTR retrotransposons fall the origin of this clade remains unclear. into two major, distinct lineages according to phyloge- The questions addressed in current investigation are as netic analysis based on RT and partial Int domains: chro- follows: (1) what kind of Gypsy LTR retrotransposons moviruses (or CHD-containing Gypsy LTR from fungi are closely related to the LTR retrotranspo- retrotransposons) and Ylt1-like LTR retrotransposons. sons detected in mosses; (2) how widely those clades Two retrotransposons (LacBicTy3-15 from Laccaria which were previously identified in mosses are distrib- bicolor and CopConTy3-14 from Coprinus cinereus) uted among other non-seed plants including lycophytes, formed their own lineage closely related to Ylt1-like LTR ferns and horsetails; and (3) what is the origin of the clade retrotransposons. This lineage cannot be attributed to which is common for non-seed plants and fungi. The in Ylt1 because of low bootstrap support (only 64%; Figure silico analysis of Gypsy LTR retrotransposons in 25 spe- 1) and was named SN_1006. Ylt1-like LTR retrotranspo- cies of fungi, genomes of which available in public data- sons were found in a number of fungal species (Figure 1). bases, along with survey of related LTR retrotransposons Previously, Ylt1-like LTR retrotransposons were reported from whole genome sequences (WGS) and expressed for Yarrowia lipolytica (original Ylt1 retrotransposon) sequence tags (ESTs) of diverse plants including spike- [11], Candida albicans (Tca3 element) [12], and basidio- moss Selaginella moellendorffii (Lycopodiophyta) and mycete Cryptococcus neoformans [13]. In the present PCR-based screening of ferns, horsetails and lycophytes study, LTR retrotransposons belonging to the Ylt1 lineage showed that a common clade of CHD-containing Gypsy have been found in both ascomycete and basidiomycete LTR retrotransposons can be found in mosses, lycophytes fungi. They fall into several clearly separated groups: the from genus Selaginella, and basidiomycetes. According to branch formed by transposable elements from Basidio- the classification of CHD-containing Gypsy LTR ret- mycota, the group of retrotransposons from ascomycetes, rotransposons proposed by Gorinsek et al. (2004) [3] this and the branch of the original Ylt1 LTR retrotransposon clade has name Tcn1. It seems that Tcn1 is a unique clade (Figure 1). of chromoviruses which has a wide inter-kingdom distri- Twenty monophyletic clades can be recognized in the bution. Tcn1-like LTR retrotransposons from fungi and phylogenetic tree of fungal CHD-containing Gypsy LTR non-seed plants demonstrated higher similarity to each retrotransposons, nine of which have been previously other in comparison with LTR retrotransposons from reported [3]. Thirteen clades are specific for ascomycetes other clades. This can be explained by strong selective (Nessie, Pyret, Maggy, Pyggy, MGLR3, Yeti, Coccy1, constraints and the 'retained' genes theory or by horizon- Coccy2, Polly, Afut1, Tf1, Ty3, and Afut4), six have been tal transmission. found only in genomes of basidiomycetes (MarY1, Laccy1, Laccy2, Tcn2, Puccy1, and Puccy2) and one clade Results (Tcn1) is present in both basidiomycetes and chytridio- Gypsy LTR retrotransposons survey from fungal genomes mycetes (Batrachochytrium dendrobatidis JEL423) (Fig- The Gypsy LTR retrotransposons mining from fungal ure 2). genomes was initiated in an attempt to identify ret- For each of the newly identified retrotransposon fami- rotransposons closely related to those which were found lies, we attempted to isolate a full-length representative Novikova et al. BMC Genomics 2010, 11:231 Page 3 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 1: List of fungal species, genomes of which were analyzed in silico in present study

Phylum/Subphylum Class Species and strain genome size LTRa (Mb)

Ascomycota/ Sordariomycetes Chaetomium globosum 36 23 Pezizomycotina CBS 148.51

Fusarium oxysporum 60 39 4286 FGSC

Fusarium verticillioides 46 4 7600

Nectria haematococca 40 65 MPVI

Podospora anserina S 37 11 mat+

Trichoderma reesei 33 5 QM6a

Trichoderma virens 38 5 Gv29-8

Eurotiomycetes Aspergillus clavatus 35 29 NRRL 1

Aspergillus niger 37 1 ATCC1015

Aspergillus terreus 35 3 NIH2624

Coccidioides immitis RS 29 236

Histoplasma 28 74 capsulatum NAm1

Uncinocarpus reesii 30 64 1704

Leotiomycetes Sclerotinia sclerotiorum 38 25 1980

Botrytis cinerea B05.10 38 25

Dothideomycetes Alternaria brassicicola 30 108 ATCC 96866 Novikova et al. BMC Genomics 2010, 11:231 Page 4 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 1: List of fungal species, genomes of which were analyzed in silico in present study (Continued)

Pyrenophora tritici- 37.8 118 repentis Pt-1C-BFP

Stagonospora 37 12 nodorum SN15

Basidiomycota/ Agaricomycetes Amanita bisporigera 58 7 Agaricomycotina (Homobasidiomycetes)

Coprinus cinereus 38 160 Okayama7#130

Laccaria bicolor S238N 61 114

Postia placenta MAD- 90 921 698

Basidiomycota/ Urediniomycetes Sporobolomyces roseus 21 6 Pucciniomycotina

Puccinia graminis f. sp. 81.5 534 tritici

Chytridiomycota Chytridiomycetes Batrachochytrium 24 1 dendrobatidis JEL423 a total copy number of detected Gypsy LTR retrotransposons or to reconstruct it using overlaps between partial domain varied among diverse families of PosPlaTy3 ele- sequences. The lengths of the elements thus identified ments. It can be found either at amino-terminus varied greatly from approximately 4.4 kb to more than 13 (PosPlaTy3-3) and carboxyl-terminus of Pol (PosPlaTy3- kb. The structural features for each family are listed in 4) or between PR and RT domains (PosPlaTy3-5) (Figure Additional Table S1 (Additional file 1). The majority of 3). The presence of dUTPase in LTR retrotransposon full-length LTR retrotransposons had either a single open sequences has been described earlier for the elements reading frame (ORF) encoding a fused Gag-Pol polypro- from a basidiomycete Phanerochaete chrysosporium and tein or two ORFs encoding separate proteins (Figure 3). an ascomycete Tuber melanosporum [14,15]. The role The Gag protein sequences differed greatly between fam- and origin of this domain remained unclear. Moreover, it ilies. Nevertheless, cysteine motifs characterized by the seems that the described LTR retrotransposons acquire amino acid sequence C-X2-C-X4-H-X4-C (CCHC) were this domain independently from different sources (Addi- found in Gag for some of the identified Gypsy LTR ret- tional file 2) [14]. It was assumed that the presence of rotransposons (see Additional file 1). The Pol polypro- dUTPase allows viruses that contain this domain to repli- teins sequences were more conserved than Gag, cate in non-dividing cells, in which cellular dUTPase especially with the RT and Int domains. RT, Int, PR (pro- activity is absent because replication of DNA does not teinase) and chromodomains (CHDs, in chromoviruses) occur [16]. were detected. Characteristic motifs were found through- Tnc1 clade is found in non-seed plants out the Gag and Pol sequences of all of the putative intact element copies. Further phylogenetic analysis revealed that previously In addition to the abovelisted enzymatic domains, a described CHD-containing LTR retrotransposons from deoxyuridine triphosphatase domain (dUTPase) has been mosses including PpatensLTR retrotransposons isolated found in several LTR retrotransposons from the basidio- from genomic sequence of moss Physcomitrella patens mycete Postia placenta MAD-698. The location of this formed a common branch with Tcn1-like LTR retrotrans- posons from fungi (Figure 4) [8]. In an attempt to deter- Novikova et al. BMC Genomics 2010, 11:231 Page 5 of 20 http://www.biomedcentral.com/1471-2164/11/231

The whole sequence of SM-Tcn1 LTR retrotransposon 100 REAL Alternaria alternata AB025309 95 PYGGY Pyrenophora graminea AF533704 was obtained from WGS. SM-Tcn1 is 5704 bp in length 81 MAGGY Magnaporthe grisea XM_001414071 100 Afut2 Aspergillus fumigatus AF202956 and carries two putative ORFs. ORF1 or gag (969 bp in 93 MGLR3 Magnaporthe grisea AF314096 CHROMOVIRUS 100 Afut3 Aspergillus fumigatus GQ294562 89 Tricholoma bakamatsutake AB036886 length) encodes a 323 amino acid (aa) protein with strong 64 Tricholoma magnivelare AB036885 92 Tf2-7 Schizosaccharomyces pombe NM_001020326 osvaldo Drosophila buzzatii AJ133521 similarity to retroviral Gag proteins (pfam03732). ORF2 Ulysses Drosophila virilis X56645 OSVALDO 52 79 mag Bombyx mori X17219 or pol (3714 bp) encodes a 1238 aa polyprotein, with 100 Anopheles gambiae GYPSY69-I_AG [Repbase] MAG 100 gypsy Drosophila subobscura M12927 82 yoyo Ceratitis capitata U60529 characteristic retroviral aspartyl protease (PR), RT, Int, 100 Tom Drosophila ananassae Z24451 GYPSY 100 McClintock Drosophila melanogaster AF541948 and CHD domains (Figure 3; Additional file 3). SM-Tcn1 mdg3 Drosophila melanogaster X95908 100 blastopia Drosophila melanogaster Z27119 MDG3 possesses 440 bp LTRs with conserved features, including 100 LacBicTy3-15 sc_10 CopCinTy3-14 XM_001838426 SN_1006 Ylt1 Yarrowia lipolytica XM_499888 the dinucleotide end sequences (TG...CA). Target site 64 75 PyrTriTy3-8 supercontig4b 100 BotCinTy3-4 XM_001559456 duplication (AACAC...AACAC) was also detected for the 100 100 NecHaemTy3-5 sc_20 NecHaemTy3-6 sc_107 100 PucGraTy3-15 supercontig18 described copy of SM-Tcn1. The LTRs contained TATA 98 100 PucGraTy3-16 supercontig29 PucGraTy3-14 supercontig24 YLT1 and CAAT boxes. No putative primer-binding site (PBS) 100 PcMetavir15 Phanerochaete chrysosporium sc_14 92 LacBicTy3-14 sc_74 100 PcMetavir12 Phanerochaete chrysosporium sc_23 was found. The PBS is necessary for initiation of reverse 78 PcMetavir13 Phanerochaete chrysosporium sc_26 100 CopCinTy3-15 XM_001841487 transcription and synthesis of the first strand comple- 100 PcMetavir14 Phanerochaete chrysosporium sc_12 100 LacBicTy3-13 sc_63 100 CopCinTy3-16 XM_001840638 mentary 5'LTR sequence [17]. Typical PBS is located near Human immunodeficiency virus 1 NC_001802 RETROVIRIDAE 100 Human immunodeficiency virus 2 NC_001722 the 5'LTR and complementary to the 3' terminal nucle-

0.2 otides of the primer tRNA used for initiation. Other known mechanism for initiation of the first strand syn- Figure 1 Neighbor-joining (NJ) phylogenetic trees based on RT thesis is self-priming, in this case a sequence derived and partial Int amino acid sequences of Gypsy LTR belonging to Ylt1 and SN_1006 clades. Statistical support was evaluated by boot- from LTR is located just downstream of the 5'LTR [18,19]. strapping (1000 replications); nodes with bootstrap values over 50% However, evidences were found neither for tRNA prim- are indicated. The Gypsy LTR retrotransposons clades are shown on the ing nor for self-priming of SM-Tcn1 LTR retrotranspo- right and include Chromovirus, Osvaldo, mag, Gypsy, mdg3, SN_1006 son. The sequence presented between 5' LTR and gag is and Ylt1. Sequences of human immunodeficiency viruses (Retroviri- conservative and thymine-rich (Figure 3). The possible dae) were used as outgroup. The name of the host species and acces- sion number are indicated for all elements taken from GenBank. Newly mechanism for initiation of reverse transcription of SM- identified retrotransposons are highlighted by bold; localization in ge- Tcn1 remained unclear. A polypurine tract (PPT) was nomic sequence is indicated for each of them. Genomic sequences of detected immediately upstream of the 3'LTR. The PPT Laccaria bicolor S238N and Nectria haematococca MPVI have been tak- sequence is involved in second-strand DNA synthesis. en from The DOE Joint Genome Institute [55]; the following species are The BLAST search (blastn) of full-length SM-Tcn1 ret- available at Broad Institute [54]: Botrytis cinerea B05.10; Pyrenophora trit- ici-repentis Pt-1C-BFP; Coprinus cinereus okayama7#130; Puccinia rotransposon indicated the presence of more than 200 graminis f. sp. tritici. For more details: Additional files 1, 5 and 6. hits in the S. moellendorffii genome. The close examina- tion of identified copies showed that they have in average mine the distribution of the Tcn1clade among plants, we 91.7% nucleotide identity with the original SM-Tcn1 used public databases for further survey of LTR ret- sequence. A BLAST search (tblastn) using SM-Tcn1 rotransposons including genomic databases for red and putative Pol protein as a query yielded more than 500 green algae, spikemoss Selaginella moellendorffii, and hits. The size of S. moellendorffii genome is only ~100 seed plants (see Materials and Methods section). A Tcn1- Mbp [20], thus SM-Tcn1 retrotransposons comprise ca. like LTR retrotransposon search was implemented with 1.5% of genomic sequence. BLAST (blastp and blastx). Amino acid sequences of RT Gypsy LTR retrotransposons from non-seed vascular plants and Int domains of known Tcn1-like (Tcn1 from C. neo- The bioinformatic survey of CHD-containing Gypsy LTR formans, Ccchromovir1 and Ccchromovir2 from C. retrotransposons, which allowed us to identify Tcn1-like cinereus, PcMetavir6 from Phanerochaete chrysosporium, retrotransposons in a few basidiomycetes, P. patens and S. and PpatensLTRs from P. patens) and newly identified moellendorffii, did not provide a satisfactory answer to retrotransposons (SpoRosTy3-4 and BatDenTy3-1) were the question concerning distribution of this clade among used as the queries. The Tcn1-like LTR retrotransposons non-seed vascular plants. Therefore, we used PCR with were identified only in the whole genomic sequence of degenerate primers to investigate the distribution of Selaginella moellendorffii (SM-Tcn1, Figure 4); none of CHD-containing Gypsy LTR retrotransposons in 26 ferns the tested algae or seed plant genomes contained LTR and horsetails (monilophytes) belonging to three classes: retrotransposons from this clade. It seems that the Tcn1 Psilotopsida, Polypodiopsida and Equisetopsida, and in clade can be found in basidiomycetes and chytridiomy- 10 lycophytes from two classes: Isoetopsida and Lycopo- cetes fungi as well as non-seed plants (Bryophyta and diopsida (Table 2). Lycopodiophyta). Novikova et al. BMC Genomics 2010, 11:231 Page 6 of 20 http://www.biomedcentral.com/1471-2164/11/231

99 100 Tricholoma magnivelare AB036885 90 MarY1 Tricholoma matsutake AB028236 CopCinTy3-7 XM_001829959 CopCinTy3-5 XM_001834739 100 PcMetavir1 Phanerochaete chrysosporium sc_44 Ccchromovir4 (CopCinTy3-6) XM_001834772 55 90 100 PosPlaTy3-2 sc_46 PosPlaTy3-1 sc_192 100 66 PcMetavir2 Phanerochaete chrysosporium sc_4 PcMetavir3 Phanerochaete chrysosporium sc_19 100 PcMetavir10 Phanerochaete chrysosporium sc_5 78 100 PosPlaTy3-4 sc_165 PosPlaTy3-3 sc_138 MARY1 Tricholoma bakamatsutake AB036886 70 PcMetavir4 Phanerochaete chrysosporium sc_9 PcMetavir11 Phanerochaete chrysosporium sc_6 58 100 PcMetavir5 Phanerochaete chrysosporium sc_7 98 LacBicTy3-2 sc_21 97 AmaBisTy3-1 GQ294561 PosPlaTy3-5 sc_166 CopCinTy3-4 XM_001839944 CopCinTy3-2 XM_001834926 CopCinTy3-3 XM_001838010 PucGraTy3-1 supercontig107 BatDenTy3-1 supercontig13 87 SpoRosTy3-4 sc_1b 56 Tcn1 Cryptococcus neoformans XM_571377 91 PcMetavir6 Phanerochaete chrysosporium sc_10 TCN1 100 Ccchromovir1 (CopCinTy3-1) XM_001834840 100 Ccchromovir2 (CopCinTy3-10) XM_001835230 100 LacBicTy3-1 sc_24 PosPlaTy3-6 sc_1655 82 PucGraTy3-9 supercontig1 100 PucGraTy3-10 supercontig5b 95 PucGraTy3-7 supercontig134 100 PucGraTy3-8 supercontig15 100 100 PucGraTy3-5 supercontig80 LACCY1 PucGraTy3-6 supercontig12 100 PucGraTy3-2 supercontig3 100 PucGraTy3-3 supercontig5a 77 PucGraTy3-4 supercontig41 NecHaemTy3-3 sc_9 96 79 Aspergillus oryzae GYPSY2-I_AO [Repabase] 96 HisCapTy3-1 XM_001540215 HisCapTy3-2 supercontig3 55 BotCinTy3-2 XM_001552430 HisCapTy3-6 XM_001541811 NESSIE 78 UncReeTy3-5 supercontig1 100 HisCapTy3-4 XM_001538690 100 HisCapTy3-5 supercontig9 61 98 skippy Fusarium oxysporum S60179 Cgret Glomerella cingulata AF264028 ChaGloTy3-4 XM_001222906 100 Boty Botryotinia fuckeliana X81791 80 ScleSclerTy3-1 supercontig6a ScleSclerTy3-2 supercontig1 PYRET 100 AFLAV Aspergillus flavus AY485786 Gypsy4-I_AO Aspergillus oryzae Repbase 71 CfT-1 Cladosporium fulvum AF051915 89 PyrTriTy3-1 supercontig4a 100 PyrTriTy3-6 supercontig26 88 PyrTriTy3-7 supercontig19 100 PyrTriTy3-5 supercontig1 PyrTriTy3-3 supercontig2 68 51 100 PyrTriTy3-4 supercontig9 84 ChaGloTy3-1 XM_001229222 BotCinTy3-3 XM_001548791 89 100 ChaGloTy3-2 XM_001222219 MAGGY 100 ChaGloTy3-3 XM_001229594 91 Dane1 Aspergillus nidulans AF295689 Afut2 Aspergillus fumigatus AF202956 68 99 MAGGY Magnaporthe grisea XM_001414071 96 FusOxyTy3-1 supercontig31 100 FusOxyTy3-2 supercontig17 grh Magnaporthe grisea T18350 100 98 PyrTriTy3-2 supercontig12 75 92 NecHaemTy3-4 sc_24 HT ChaGloTy3-8 XM_001224270 85 Dane4 Aspergillus nidulans XM_655183 PYGGY 86 AltBraTy3-2 Contig5.76 54 REAL Alternaria alternata AB025309 95 PYGGY Pyrenophora graminea AF533704 ScleSclerTy3-3 supercontig6b Afut3 Aspergillus fumigatus GQ294562 94 ScleSclerTy3-4 supercontig4 85 100 MGLR3 Magnaporthe grisea AF314096 MGLR3 69 PodAnsTy3-3 EU697468 94 95 PodAnsTy3-2 EU697464 ChaGloTy3-7 XM_001223152 Yeti Podospora anserina AJ272171 YETI 52 100 UncReeTy3-4 supercontig2 CocImmTy3-3 supercontig4 COCCY1 100 UncReeTy3-1 supercontig3 CocImmTy3-5 supercontig6 COCCY2 79 polly Leptosphaeria maculans AM084347 99 StaNodTy3-3 sc_55 61 AltBraTy3-3 Contig9.4 TrichReeTy3-1 sc_1 AltBraTy3-1 Contig1.100 POLLY HisCapTy3-3 supercontig5 97 UncReeTy3-2 supercontig4 96 CocImmTy3-6 supercontig5 100 Afut1 Aspergillus fumigatus L76086 60 AspTerTy3-1 supercontig13 AltBraTy3-5 Contig0.191 100 StaNodTy3-4 sc_67 100 70 FusOxyTy3-3 supercontig14 100 FusVerTy3-2 supercontig1 77 FusVerTy3-1 supercontig16 NecHaemTy3-1 sc_19 TrichVirTy3-1 sc_16 AFUT1 PodAnsTy3-1 CU633867 StaNodTy3-2 sc_49 StaNodTy3-1 sc_4 96 AspClaTy3-1 supercontig77 AspClaTy3-2 supercontig17 TrichReeTy3-2 sc_27 AltBraTy3-4 Contig2.232 Tf2-7 Schizosaccharomyces pombe NM_001020326 F 92 LacBicTy3-3 sc_174 T 1 58 CopCinTy3-12 XM_001835305 97 LacBicTy3-4 sc_56 96 PcMetavir9 Phanerochaete chrysosporium sc_16b 100 CopCinTy3-11 XM_001841125 100 PosPlaTy3-8 sc_112 PcMetavir8 Phanerochaete chrysosporium sc_16a CopCinTy3-9 XM_001840214 PucGraTy3-13 supercontig25 100 100 Tcn2 Cryptococcus neoformans XM_567971 100 Tcn5 Cryptococcus neoformans XM_766715 TCN2 65 100 100 Tcn3 Cryptococcus neoformans XM_766714 Tcn4 Cryptococcus neoformans XM_766730 60 PcMetavir7 Phanerochaete chrysosporium sc_28 100 Ccchromovir-5 (CopCinTy3-8) XM_001828110 80 PosPlaTy3-7 sc_89 71 LacBicTy3-5 sc_91 89 LacBicTy3-6 sc_2 Ty3 Saccharomyces cerevisiae M34549 TY3 PucGraTy3-11 supercontig37 PucGraTy3-12 supercontig4 UCCY 91 100 P 1 100 PucGraTy3-35 supercontig5 PucGraTy3-36 supercontig78 100 PucGraTy3-37 supercontig8 89 PucGraTy3-33 supercontig15 100 PucGraTy3-34 supercontig12 100 PucGraTy3-30 supercontig27a 98 PucGraTy3-31 supercontig114 100 PucGraTy3-32 supercontig27b PUCCY2 100 PucGraTy3-28 supercontig1 100 PucGraTy3-29 supercontig10 SpoRosTy3-1 sc_9 100 SpoRosTy3-2 sc_1a 100 SpoRosTy3-3 sc_2 100 85 LacBicTy3-8 sc_8 100 LacBicTy3-11 sc_14 LacBicTy3-9 sc_38a 65 100 100 LacBicTy3-10 sc_9 LacBicTy3-12 sc_38b LACCY2 LacBicTy3-7 sc_23 100 CopCinTy3-13 XM_001839531 AspNigTy3-1 AM270026 TrichReeTy3-3 sc_10 CocImmTy3-7b supercontig1b 79 CocImmTy3-7a supercontig1a HisCapTy3-10 XM_001539008 59 HisCapTy3-9 supercontig10 54 HisCapTy3-11 supercontig2 100 Dane3 Aspergillus nidulans XM_655128 80 Afut4 Aspergillus fumigatus Gq294563 87 ChaGloTy3-6 XM_001230211 AFUT4 NecHaemTy3-2 sc_59 81 ChaGloTy3-5 XM_001228540 64 TrichReeTy3-4 sc_38 HisCapTy3-8 supercontig1 StaNodTy3-5 sc_19 98 AspOryTy3-3 AP007162 AspClaTy3-5 supercontig79 51 AspClaTy3-4 supercontig86 Ylt1 Yarrowia lipolytica AJ310725 72 osvaldo Drosophila buzzatii AJ133521 0.1

Figure 2 Neighbor-joining (NJ) phylogenetic trees based on RT and partial Int amino acid sequences of Gypsy LTR retrotransposons includ- ing newly described fungal chromodomain-containing LTR retrotransposons. Statistical support was evaluated by bootstrapping (1000 replica- tions); nodes with bootstrap values over 50% are indicated. The clades are shown on the right. The name of the host species and accession number are indicated for all elements taken from GenBank. Newly identified retrotransposons are highlighted in bold; localization in genomic sequence is in- dicated for each of them. Genomic sequences of Trichoderma reesei QM6a, Trichoderma virens Gv29-8, Nectria haematococca MPVI, Aspergillus niger ATCC1015, Alternaria brassicicola ATCC 96866, Stagonospora nodorum SN15, Laccaria bicolor S238N, Postia placenta MAD-698, and Sporobolomyces ro- seus have been taken from The DOE Joint Genome Institute [55]; the following species are available at Broad Institute [54]: Chaetomium globosum CBS 148.51; Fusarium oxysporum 4286 FGSC;Fusarium verticillioides 7600; Aspergillus clavatus NRRL 1; Aspergillus terreus NIH2624; Coccidioides immitis RS; His- toplasma capsulatum NAm1; Uncinocarpus reesii 1704; Sclerotinia sclerotiorum 1980; Botrytis cinerea B05.10; Pyrenophora tritici-repentis Pt-1C-BFP; Copri- nus cinereus okayama7#130; Puccinia graminis f. sp. tritici; Batrachochytrium dendrobatidis JEL423. The possible horizontal transmission (HT) is marked. For more details: Additional files 1, 5 and 6. Novikova et al. BMC Genomics 2010, 11:231 Page 7 of 20 http://www.biomedcentral.com/1471-2164/11/231

revealed that 10 clones were not from CHDcontaining CopCinTy3-6 5’LTR gag pol 3’LTR PR RT RH INT (6903 bp) TSD CCHC chromo TSD LTR retrotransposons but were from Athila-like Gypsy PosPlaTy3-4 5’LTR gag pol chromo 3’LTR PR RT INT (8007 bp) elements (Additional file 4). Many representatives of this dUTPase 5’LTR gag pol 3’LTR MARY1 PosPlaTy3-3 PR RT INT (7414 bp) clade possess not only classical gag and pol sequences, TSD dUTPase chromo TSD PosPlaTy3-5 5’LTR gag-pol 3’LTR PR RT INT (7241 bp) but also an additional open reading frame that might TSD CCHC dUTPase chromo TSD gag pol BatDenTy3-1 5’LTR 3’LTR PR RT INT encode an env-like protein [25,26]. TCN1 (5440 bp) CCHC chromo

5’LTR gag pol 3’LTR HisCapTy3-6 RT INT The phylogenetic relationships among obtained clones NESSIE (6565 bp) TSD CCHC HHCC chromo TSD and known CHD-containing Gypsy LTR retrotranspo- 5’LTR gag pol 3’LTR NecHaemTy3-5 PR RT RH INT sons, extracted from databases, were reconstructed using YLT1 (12164 bp) HHCC add. ORF

PPT neighbor-joining (NJ) analysis based on the multiple ..gaggagggggtg 3’LTR SM-Tcn1 5’LTR gag pol 3’LTR Selaginella moellendorffii (5706 bp) TSD PR RT INT TSD alignment of nucleotide sequences of RT fragment (Fig- chromo 5’LTR gtttgattcagattgtgttttttgcttttcttttagctcgttttgattatttgattgcctttgtctttttcccccttttcttttctttcttcttgcca gag ure 4). The Gypsy LTR retrotransposons from Drosophila PBS? melanogaster were used as an outgroup. The newlyidenti- Figure 3 Structural organization of a number of full-length LTR fied LTR retrotransposon grouped into four large clusters retrotransposons from fungi and SM-Tcn1 LTR retrotransposon from spikemoss Selaginella moellendorffii identified in present on the phylogenetic tree. The group of clones from study. The clade for each of elements is shown on the left. Abbrevia- diverse monilophytes and one LTR retrotransposon from tions: LTR - long terminal repeat, TSD - target site duplication, PR - pro- lycophytes alpinum (LycAlpTy3-1 clone) teinase, RT - reverse transcriptase, RH - ribonuclease H, Int - core form a common group with previously described retro- integrase, chromo - chromodomain, dUTPase - deoxyuridine triphos- elements from mosses Tetraphis pellucida and Vesicu- phatase domain, CCHC and HHCC - Zn-finger motifs, add. ORF - addi- tional open reading frame with unknown function, PPT - polypurine laria dubyana [8]. Although the bootstrap support is tract, PBS? - no putative primer-binding site was found for SM-Tcn1. below 50% (data not shown), this new clade (named "Galahad") seems to be a sister group to the Galadriel The estimated diversity of monilophytes (= Infradivi- clade. Galahad appears to be one of the oldest widely dis- sion Moniliformopses) is about 9000 species and includes tributed clades of CHD-containing LTR retrotransposons horsetails, whisk ferns, and all eusporangiate and lep- from plants. Since Galahad clade was found in all non- tosporangiate ferns [19]. Most of the species examined in seed plants including mosses, the probable age of this the present study were leptosporangiate ferns from the clade would be in the range of 400-700 Myr, which is esti- order Polypodiales, class Polypodiopsida. This order cov- mated time divergence of liverworts and mosses from ers more than 80% of current known diversity of ferns vascular plants [27]. [21]. Additionally, one representative of heterosporous The second group is formed by LTR retrotransposons ferns, Salvinia natans (Polypodiopsida, Salviniales), two from both monilophytes and lycophytes. The phyloge- ophioglossoid ferns (Psilotopsida, Ophioglossales) and netic analysis did not provide support for a monophyletic three horsetails (Equisetopsida, Equisetales) were origin of this cluster. Moreover, the relationships inside included [22]. Lycophytes are much less diverse in com- the cluster remained unclear, with the exception of sev- parison with monilophytes and comprise less than 1% of eral lineages. One of the lineages ('a' in Figure 4), con- extant land plants (around 1200 living species). Three tained members from lycophytes in the family major lineages are distinguished among lycophytes: club- : Lycopodium clavatum (LycClavGty3 mosses and firmosses (Lycopodiaceae), spikemosses clones), L. japonicum (LycJapGty3-1 clone), and Diphasi- (Selaginellaceae), and quillworts (Isoetaceae) [23]. astrum complanatum (DiphComGty3 clones). Another Among lycophytes included in the present study are Iso- lineage seems to have had a long-term association with etes and Huperzia species (Isoetaceae) as well as two fern genomes, since it appears to be widely distributed Selaginella species (Selaginellaceae), which belong to the among leptosporangiate ferns and can be found in class Isoetopsida, and seven diverse species from Lycopo- Dennstaedtiaceae (Pteridium aquilinum), Pteridaceae diaceae (Lycopodiales, ). (Adiantum pedatum), Aspleniaceae (Asplenium viride), The presence of CHD-containing Gypsy LTR retroele- Woodsiaceae (Athyrium distentifolium and Cystopteris ments among the listed plants was tested by amplifying fragilis), and Dryopteridaceae (Dryopteris crassirhizoma genomic DNA with previously developed degenerate oli- and Polystichum tripteron) (lineage 'b' on Figure 4). Addi- gonucleotide primers [8,24]. Consistent with the spacing tionally, five satellite lineages, represented mostly by sin- of reverse transcriptase (RT) domains, the amplified PCR gle clones can be found on the phylogenetic tree. products were approximately 320 bp in length. In total, 98 The largest group is represented by 37 LTR retrotrans- clones with sequence similarity to known RT sequences posons. Three clearly separated clusters can be found were isolated, of which 76 were from monilophytes and inside this group (marked as 'd', 'f', and Mordred on Fig- 22 from lycophytes. The preliminary blastp search ure 4). One of these clusters is formed by LTR retrotrans- posons from Ophioglossaceae (Botrychium multifidum), Novikova et al. BMC Genomics 2010, 11:231 Page 8 of 20 http://www.biomedcentral.com/1471-2164/11/231

50 SelKraGty3-3 78 SelKraGty3-1 60 100 SelKraGty3-4 Selaginella SelKraGty3-2 (lycophytes) 53 SM-Tcn1 Selaginella moellendorffii sc_0 99 SelPulGty3-1 100 SelPulGty3-4 100 SelPulGty3-2 SelPulGty3-5 SpoRosTy3-4 Sporobolomyces roseus sc_1 Tcn1 Cryptococcus neoformans XM_571377 54 PcMetavir6 Phanerochaete chrysosporium sc_10 100 Ccchromovir1 Coprinus cinereus XM_001834840 89 Ccchromovir2 Coprinus cinereus XM_001835230 Tcn1 57 BatDenTy3-1 Batrachochytrium dendrobatidis supercontig13 91 PpatensLTR3 Physcomitrella patens GQ294566 99 FunGygTy3a Funaria hygrometrica EU408440 PpatensLTR4 Physcomitrella patens GQ294567 75 PpatensLTR2 Physcomitrella patens GQ294565 98 60 FunHygTy3b Physcomitrella patens GQ294566 PpatensLTR1 Physcomitrella patens GQ294564 88 mosses 57 TetPellTy3 Tetraphis pellucida EU408416 HedCilTy3 Hedwigia ciliata EU408450 62 DicPolyTy3 Dicranum polysetum EU408400 PlagLaetTy3a Plagiothecium laetum EU408471 100 58 PleuShrebTy3 Pleurozium schreberi EU408374 100 LesSaxTy3 Lescuraea saxicola EU408372 AmaBisTy3-1 Amanita bisporigera GQ294561 98 PosPlaTy3-1 Postia placenta sc_192 91 CopCinTy3-7 Coprinus cinereus XM_001829959 MarY1 99 MarY1Tricholoma matsutake AB028236 PlagLaetTy3bPlagiothecium laetum EU408463 81 BraSchTy3-2 Brasenia schreberi DQ206458 56 NymTetTy3-2 Nymphaea tetragona AY959265 60 rGyRT15 Musa acuminata AM901561 100 Monkey Musa acuminata AF143332 Galadriel RT Vitis vinifera Am474226 97 MagGraTy3-1 Magnolia grandiflora AY959239 Tntom1 Nicotiana tomentosiformis AJ508603 Galadriel Lycopersicon esculentum AF119040 100 TetPellTy3-11Tetraphis pellucida EU408417 100 51 TetPellTy3-1Tetraphis pellucida EU408407 VesDubTy3 Vesicularia dubyana EU408406 LycAlpGty3-2 SalNatGty3-1 100 SalNatGty3-2 SalNatGty3-3 62 PolyVulGty3-2 60 DryoFilGty3-3 100 PolyBrauGty3-2 DryoExpGty3-1 56 100 MattStrutGty3-1 PolyBrauGty3-1 Galahad 58 DryoFilGty3-2 100 DryoFilGty3-1 91 99 CystFragGty3-4 88 CystFragGty3-3 CystFragGty3-5 100 WoodSubGty3-1 100 WoodSubGty3-2 100 AthyMonGty3-4 AthySinGty3-3 AthyMonGty3-3 100 AthyMonGty3-2 100 AthySinGty3-1 100 RT Oryza sativa AC091724 54 Retrosat4 Oryza sativa AC134047 RIRE3 Oryza sativa AP008216 100 RT Oryza sativa AY360387 Tekay 99 RT Sorghum bicolor AF061282 68 Tekay Zea mays AF448416 Retrosor2 Sorghum bicolor AF061282 100 PhegConnGty3-2 PhegConnGty3-1 79 PolyVulGty3-1 c AthyDistGty3-2 HupSquGty3-1 52 LycClavGty3-3 100 LycClavGty3-1 99 LycClavGty3-2 72 DiphComGty3-3 a 77 100 DiphComGty3-1 LycClavGty3-4 LycJapGty3-1 IsoHisGty3-1 DiphComGty3-2 65 DryoCarGty3-1 100 DryoCarGty3-2 DryoCarGty3-3 AdiPedaGty3-2 100 DryoCrasGty3-1 87 CystFragGty3-2 57 AthyDistGty3-3 b 89 PteAquGty3-1 97 PolyTripGty3-1 99 AspVirGty3-2 94 AspVirGty3-3 95 BotMultGty3-1 100 BotMultGty3-5 100 BotMultGty3-6 100 BotMultGty3-4 100 BotMultGty3-7 68 BotMultGty3-2 d DryoExpGty3-2 97 AspVirGty3-1 95 DryoCrasGty3-2 DryoCarGty3-4 100 76 DryoCrasGty3-3 WoodPolGty3-3 100 WoodPolGty3-4 77 WoodPolGty3-2 f 60 WoodPolGty3-5 LycAnnGty3-1 82 100 LycMagGty3-2 100 LycMagGty3-1 71 LycMagGty3-4 LycMagGty3-3 BotMultGty3-3 100 OphVulGty3-1 OphVulGty3-2 OnoSensGty3-1 91 100 OnoSensGty3-2 EquFluvGty3-1 84 EquHiemGty3-1 Modred 100 EquHiemGty3-2 56 99 EquHiemGty3-3 CystFragGty3-1 AspRutGty3-1 91 80 WoodPolGty3-1 90 97 DryoFilGty3-4 DryoCrasGty3-4 60 AdiPedaGty3-1 65 AthyMonGty3-1 100 AthyDistGty3-1 84 AthySinGty3-2 ATGP6 Arabidopsis thaliana AC012327 99 LjCRM1 Lotus japonicus AP00452 86 Beetle1 Beta vulgaris AJ539424 68 RIRE7 Oryza sativa AB033235 CRM 100 CRM Zea mays AY129008 75 CRR Oryza sativa AC022352 76 Cereba Hordeum vulgare AY040832 Gloin Arabidopsis thaliana AC007188 63 Gimli Arabidopsis thaliana AL049655 51 LORE1e Lotus japonicus AJ96699 52 rn_20-10 Oryza sativa AP008207 Reina Reina Zea mays AF123535 55 rn_50-33 Oryza sativa AP008208 ZAM Drosophila melanogaster AJ000387 98 mdg3 Drosophila melanogaster X95908 0.05

Figure 4 Neighbor-joining (NJ) phylogenetic tree based on RT nucleotide sequences of CHD-containing Gypsy LTR retrotransposons in- cluding newly described elements from monilophytes and lycophytes plants (highlighted in bold). Statistical support was evaluated by boot- strapping (1000 replications); nodes with bootstrap values over 50% are indicated. The name of the host species and accession number are indicated for LTR retrotransposons taken from GenBank. Four diverse clusters of LTR retrotransposons from mosses, monilophytes and lycophytes are shown by arrows. The group of Tcn1-like LTR retrotransposons from mosses (Bryophyta) is also indicated. Previously known clades, clades described in this study, and unclassified lineages (a-f) are shown on the right. Novikova et al. BMC Genomics 2010, 11:231 Page 9 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 2: List of monilophytes and lycophytes, which were analyzed experimentally in present study

Division Class Family Species Chr (Athila)a

Moniliformopses Psilotopsida Ophioglossaceae Ophioglossum vulgatum L. 2

Botrychium multifidum 6 (Gmelin) Rupr.

Polypodiopsida Dennstaedtiaceae Pteridium aquilinum (L.) 1 (1) Kuhn

Pteridaceae Adiantum pedatum L. 2

Aspleniaceae Asplenium viride Huds. 3 (1)

Asplenium ruta-muraria L. 1

Woodsiaceae Athyrium sinense Rupr. 3 (3)

Athyrium distentifolium 3 (1) Tausch ex Opiz

Athyrium monomachii 4 (1) (Kom.) Kom.

Cystopteris fragilis (L.) Bern. 5

Woodsia polystichoides DC 5 Eaton

Woodsia subcordata Turcz. 2

Thelypteridaceae Phegopteris connectilis 2 (Michx.) Watt.

Onocleaceae Matteuccia struthiopteris (L.) 1 (1) Tod.

Onoclea sensibilis L. 2

Dryopteridaceae Dryopteris expansa (Presl) 2 Fraser-Jenk. & Jermy

Dryopteris filix-mas (L.) 4 Schott

Dryopteris crassirhizoma 4 Nak. Novikova et al. BMC Genomics 2010, 11:231 Page 10 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 2: List of monilophytes and lycophytes, which were analyzed experimentally in present study (Continued)

Dryopteris carthusiana (Vill.) 4 Fuchs

Polystichum braunii 2 (Spenner) Fée

Polystichum tripteron 1 (Kunze) Presl

Polypodiaceae Polypodium vulgare L. 2

Pyrrosia lingua (Thunb.) -- (1) Farw.

Salviniaceae Salvinia natans (L.) All. 3

Equisetopsida Equisetaceae Equisetum hiemale L. 3

Equisetum fluviatile L. 1

Lycopodiophyta Isoetopsida Isoetaceae Isoetes histrix Bory 1

Selaginellaceae Selaginella kraussiana 4 (Kunze) A. Braun

Selaginella pulvinata (Hook. 4 & Grev.) Maxim.

Lycopodiopsida Lycopodiaceae Diphasiastrum 3 complanatum (L.) Holub

Huperzia squarrosa (Forst.) 1 Trevis.

Lycopodium alpinum L. 1

Lycopodium clavatum L. 4

Lycopodium magellanicum 4 Hert. ex Nessel

Lycopodium japonicum 1 Thunb. ex Murr.

Lycopodium annotinum L. 1 (1) a number of unique sequences for CHD-containing LTR retrotransposons (Chr) and Athila-like LTR retrotransposons obtained in present study Novikova et al. BMC Genomics 2010, 11:231 Page 11 of 20 http://www.biomedcentral.com/1471-2164/11/231

Aspleniaceae (A. viride) and Dryopteridaceae (Dryopteris The hypothesis of 'retained' genes is based on the expansa, D. carthusiana, and D. crassirhizoma). The sec- observation that EST data of Physcomitrella contained a ond cluster has a bootstrap support of 77% and is repre- fraction of transcripts derived from putative genes sented by clones isolated from Lycopodium annotinum ('retained' genes), which are not present in seed plants (Lycopodiaceae) and Woodsia polystichoides (Woodsi- but can be found in other kingdoms including fungi. It aceae). The last monophyletic cluster, clade Mordred, is was proposed that such retained genes along with Phy- the largest, well-supported clade which is widely distrib- scomitrella-specific (or moss-specific) genes encode uted among representatives of all investigated classes functions that make mosses unique in terms of physiol- except Isoetopsida. It was found to be present in fern ogy and metabolism [28]. We used these data and com- genomes from families Ophioglossaceae (Ophioglossum pared the levels of similarity for RT-Int fragments from vulgatum and Botrychium multifidum), Pteridaceae (Adi- Tcn1-like LTR retrotransposons and two putatively antum pedatum), Woodsiaceae (Athyrium sinense, A. dis- retained genes from Physcomitrella, which showed a high tentifolium, A. monomachii, C. fragilis, and W. similarity with functional genes from fungi: uric acid- polystichoides), Onocleaceae (Onoclea sensibilis), and xanthine permease (uapA, TIGR00801) and inorganic Dryopteridaceae (Dryopteris filix-mas and D. crassirhi- phosphate transporter (Pho88, pfam10032). zoma); in horsetails Equisetum hiemale and E. fluviatile The pairwise comparisons between hypothetical Pho88 (Equisetaceae); and Lycopodium magellanicum proteins from basidiomycetes Coprinus cinereus (Lycopodiaceae). Okayama7#130, Phanerochaete chrysosporium and Cryp- The Tcn1-like LTR retrotransposons were detected tococcus neoformans revealed 51.3% to 64.9% similarity only in Selaginella species, S. kraussiana (SelKraGty3 whereas only 23.0% identical amino acid residues was clones) and S. pulvinata (SelPulGty3 clones) in addition found on average in pairwise comparisons between fun- to the previously described LTR retrotransposons from gal proteins and putative Pho88 from Physcomitrella mosses and SM-Tcn1 LTR retrotransposon from Selag- (Table 3). The most closely related homolog for putative inella moellendorffii (Figure 4) [8]. The absence of Tcn1- Pho88 from P. patens was found in Schizosaccharomyces like clones isolated from other lycophytes, ferns, and pombe (26.8% of similarity). The similarity between uapA horsetails can be explained by failed PCR amplification from Physcomitrella and Cryptococcus (36.9%) was due to the high divergence of Tcn1-like elements in almost the same as between proteins from Cryptococcus genomes of these species or, more likely, by lack of these and Coprinus (39.6%) or Cryptococcus and Phanerochaete elements from their genomes. (43.2%). More then 67% of amino acid residues are identi- cal in permeases from Coprinus and Phanerochaete. Pre- Tcn1-like LTR retrotransposons: 'retained' or horizontally dicted uapA from Ashbya gossypii and Physcomitrella transmitted? share 40.7% of amino acid residues. As a rule chromoviruses clades are specific for a particu- It seems to be that the hypothesis of 'retained' genes lar group of eukaryotic organisms such as Ascomycota cannot be implemented as explanation for Tcn1 clade dis- fungi (Nessie, Pyret, Maggy, Pyggy, MGLR3, Yeti, Coccy1, tribution since investigated RT-Int fragments of Tcn1-like Coccy2, Polly, Afut1, Tf1, Ty3, and Afut4), Basidiomycota LTR retrotransposons from fungi and plants have higher fungi (MarY1, Laccy1, Laccy2, Tcn2, Puccy1, and similarity to each other than functional proteins which Puccy2), or plants (Reina, CRM, Tekay, Galadriel, and were proposed to be 'retained' [28]. RT-Int fragments Chlamyvir as well as additional less investigated clades from Tcn1-like LTR retrotransposons have average simi- from mosses) [3,8]. In the light of such specificity, it was larity 49%. Moreover, evolutionary rates estimated for unexpected to find a clade containing elements from Tcn1 LTR retrotransposons appeared to be less than evo- basidiomycetes and non-seed plants. Nevertheless, it lutionary rates for 'retained' genes or other LTR ret- seems that Tcn1 clade has an interkingdom distribution rotransposons (Table 3). and can be found in a number of fungi, diverse mosses (Bryophyta) as well as in lycophytes (genus Selaginella). Discussion Such a wide distribution makes the Tcn1 clade unique Despite a number of whole genome sequence studies, the among the CHD-containing Gypsy LTR retrotranspo- distribution and diversity of CHD-containing Gypsy LTR sons. The interkingdom distribution of Tcn1 clade could retrotransposons is still poorly understood. Current be the result of horizontal transmission (HT) of LTR ret- knowledge of distribution and evolution of this group of rotransposons among fungi and plants; otherwise, Tcn1- mobile elements has been mainly obtained from diverse like LTR retrotransposons could have been 'retained' by model organisms [3]. The quickly generated massive data mosses and lycophytes from the most recent common sets (for example, WGS and EST databases) provide a ancestor of plants and Fungi/Metazoa lineage of eukary- great opportunity to perform detailed analysis for non- otes [8,28]. model organisms. Experimental data accumulation also Novikova et al. BMC Genomics 2010, 11:231 Page 12 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 3: Amino acid divergences of proteins and RT-Int fragments of CHD-containing Gypsy LTR retrotransposons from Tcn1, Pyggy and Pyret clades

Genes or LTR Length Amino acid identity (%) Evolutionary rate (10-9)b retrotransposons

inorganic phosphate transporter (Pho88)a

Physcomitrella patens 154 aa 26.8 0.420 (XM_001783642)/ Schizosaccharomyces pombe (NM_001019478)

P. patens (XM_001783642)/ 151 aa 23.1 0.445 Coprinus cinereus (XM_001836748)

P. patens/Phanerochaete 151 aa 25.0 0.428 chrysosporium (JGI: scaffold_7 [325337..326089])

P. patens/Cryptococcus 152 aa 20.1 0.521 neoformans (XM_569621)

C. cinereus/Ph. chrysosporium 151 aa 64.9 ND

C. cinereus/C. neoformans 151 aa 51.3 0.472

Ph. chrysosporium/C. 151 aa 58.6 ND neoformans

uric acid-xanthine permease (uapA)a

P. patens (XM_001784081)/ 466 aa 40.7 0.266 Ashbya gossypii (NM_212305)

P. patens/C. cinereus 456 aa 38.4 0.288 (XM_001839036)

P. patens/Ph. chrysosporium 391 aa 43.6 0.263 (JGI: scaffold_22 [347275..348820])

P. patens/C. neoformans 472 aa 36.9 0.318 (AF542528)

C. cinereus/Ph. chrysosporium 391 aa 67.9 ND

C. cinereus/C. neoformans 456 aa 39.6 0.560 Novikova et al. BMC Genomics 2010, 11:231 Page 13 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 3: Amino acid divergences of proteins and RT-Int fragments of CHD-containing Gypsy LTR retrotransposons from Tcn1, Pyggy and Pyret clades (Continued)

Ph. chrysosporium/C. 391 aa 43.2 ND neoformans

Tcn1

Tcn1 C. neoformans/ 684 aa 53.6 0.441 Ccchromovir-1 C. cinereus

Tcn1 C. neoformans/ 684 aa 52.2 ND PcMetavir6 Ph. chrysosporium

Tcn1 C. neoformans/ 677 aa 49.1 0.381 BatDenTy3-1 B. dendrobatidis

PcMetavir6 Ph. chrysosporium/ 688 aa 72.4 ND Ccchromovir-1 C. cinereus

PcMetavir6 Ph. chrysosporium/ 677 aa 47.0 0.412 BatDenTy3-1 Batrachochytrium dendrobatidis

BatDenTy3-1 B. dendrobatidis/ 677 aa 46.5 0.411 Ccchromovir-1 C. cinereus

Tcn1 C. neoformans/ 675 aa 52.0 0.216 (HT) PpatensLTR1 P. patens

Tcn1 C. neoformans/SM-Tcn1 682 aa 50.3 0.235 (HT) Selaginella moellendorffii

PcMetavir6 Ph. chrysosporium/ 675 aa 51.8 0.217 (HT) PpatensLTR1 P. patens

PcMetavir6 Ph. chrysosporium/ 682 aa 49.5 0.230 (HT) SM-Tcn1 S. moellendorffii

PpatensLTR1 P. patens/SM- 675 aa 58.6 0.429 Tcn1 S. moellendorffii

Pyggy

PyrTriTy3-2 Pyrenophora tritici- 706 aa 77.0 0.259 (HT) repentis/NecHaemTy3- 4Nectria haematococca

PyrTriTy3-2 P. tritici-repentis/ 706 aa 48.5 0.706 ChaGloTy3-8 Chaetomium globosum Novikova et al. BMC Genomics 2010, 11:231 Page 14 of 20 http://www.biomedcentral.com/1471-2164/11/231

Table 3: Amino acid divergences of proteins and RT-Int fragments of CHD-containing Gypsy LTR retrotransposons from Tcn1, Pyggy and Pyret clades (Continued)

PyrTriTy3-2P. tritici-repentis/ 701 aa 49.2 0.703 grh Magnaporthe grisea

PyrTriTy3-2 P. tritici-repentis/ 695 aa 55.5 0.531 Dane4Aspergillus nidulans

AltBraTy3-2Alternaria 709 aa 47.7 0.880 brassicicola/NecHaemTy3-4N. haematococca

AltBraTy3-2A. brassicicola/ 695 aa 49.2 0.649 Dane4 A. nidulans

grh M. grisea/Dane4 A. 695 aa 48.5 0.670 nidulans

Pyret

skippy Fusarium oxysporum/ 648 aa 40.4 0.772 PyrTriTy3-1 P. tritici-repentis

skippy F. oxysporum/AFLAV 673 aa 41.4 0.766 Aaspergillus flavus

AFLAV A. flavus/PyrTriTy3-1 P. 648 aa 44.3 0.656 tritici-repentis a The corresponding accession numbers in GenBank are provided in the brackets; b ND - not determined: information concerning time divergence between species groups is unavailable; HT - putative horizontal transmission should not be neglected. In the absence of information recognized by the presence of very closely related mobile about genomic sequences from monilophytes, PCR elements in distant host taxa [29-33]. HT is well known screening seems to be very useful for isolation and char- for gypsy LTR retrotransposons in Drosophila [30] and acterization of new LTR retrotransposons from this has been suggested to have occurred in plants [2,24]. group of non-seed vascular plants. Two new well-sup- Recently, the evidence was provided for HT of RIRE1 LTR ported clades, Galahad and Mordred, as well as several retrotransposon between representatives of genus Oryza other previously unknown lineages of CHD-containing [32] and Route66 LTR retrotransposon between repre- Gypsy LTR retrotransposons were described based on the sentatives of Panicoideae (Poaceae) and several species of results of PCR-mediated survey of RT fragments from the genus Oryza [33]. ferns, horsetails and lycophytes. Several criteria can be used for HT event recognition. One of the clades originally described for fungal The first criterion is inconsistencies between the phylog- genomes, Tcn1, appeared to be present in genomes of enies of transposable elements (TEs) and host species mosses (Bryophytes) and lycophytes (genus Selaginella) [29,34]. There are potential problems with application of (Figure 5). Such an interkingdom distribution is not typi- this criterion for HT detection. Multiple transposable ele- cal for CHD-containing LTR retrotransposons clades ment lineages can be present within genomes. Moreover, which are usually very specific for a particular taxonomic transposable elements are multicopy components of group [see [8]]. Data suggested that horizontal transmis- genomes. Comparisons of paralogous copies instead of sion took place between fungi and non-seed plants (prob- orthologs along with varying rates of their sequence evo- ably mosses and lycophytes). Horizontal transmissions or lution are the main sources for incongruence in phyloge- horizontal transfers (HTs) of mobile elements are usually netic analysis, this could be misidentified as HT. The Novikova et al. BMC Genomics 2010, 11:231 Page 15 of 20 http://www.biomedcentral.com/1471-2164/11/231

second criterion, which seems to offer the strongest evi- and red algae as well as in all seed plants investigated so dence, is a higher degree of observed sequence similarity far. The hypothesis of 'retained' genes in moss Physcomi- for transposable elements than for functional genes, so trella represented an attractive alternative to horizontal called 'slowdown effect on evolutionary rates'. Once transmission as an explanation of the phylogenetic incon- inserted, a new copy of transposable element is presumed sistencies as well as the existence of a number of func- to evolve without functional constrains. Thus, all types of tional genes in Physcomitrella genome, which seem to mutations should have an equal chance to be fixed [35]. have non-plant origin and can be found in bacteria, fungi The lower than expected sequence divergence of TEs in and protozoa but not in higher plants [28]. Nevertheless, comparison with non-mobile nuclear genes of the host the most important feature of Tcn1-like LTR retrotrans- species can be explained either by strong selective con- posons in the context of HT is their lower evolutionary straints in TE sequence coupled with a strict vertical rates in comparison with other groups of CHD-contain- transmission, or by horizontal transfer [31,36,37]. The ing LTR retrotransposons. The close examination and third criterion of inferring HT is the discontinuous distri- comparison of evolutionary rates for LTR retrotranspo- bution of TEs among closely related taxa, i.e., presence of sons including representatives of Tcn1, Pyggy and Pyret a TE in one lineage and its absence in a sister lineage. clades, and evolutionary rates estimated for putatively Such discontinuous distribution could be due to random 'retained' genes suggests that a horizontal transmission of loss of TEs, ancestral polymorphism, or independent Tcn1-like LTR retrotransposons took place among fungi sorting of copies into descendant species. By itself, this and the last common ancestor (LCA) of mosses and lyco- kind of evidence provides only weak support for HT since phytes (Table 3 and Figure 5) [27]. Alternatively, it is pos- TE can be lost through population dynamics or ecological sible, but highly unlikely, that two independent acts of HT forces that are difficult to reconstruct [38,39]. occurred. First HT event could happen among fungi and All three criteria are satisfied in case of Tcn1-like LTR LCA of mosses since all investigated mosses contain retrotransposons. They demonstrated patchy distribution Tcn1-like LTR retrotransposons [8]. The second HT among fungi and plants (Figure 5). They were found in all could occur among fungi and LCA of Selaginella since investigated mosses, but only in a few lycophytes and only representatives of this genus carry this group of ret- they absent in basal lineages of green plants such as green rotransposons among all investigated lycophytes (Figure 4 and Figure 5). It is necessary to note that despite HT seeming to be a preferable explanation for the observed distribution. The evidence is not strong enough to dis- D M EINA EKAY ALA RIEL R CR T G card other explanations; such as selective pressure cou- EUDICOTS pled with vertical transmission of retrotransposons in MONOCOTS 300 Mya genomes of non-seed plants and loss of these elements by BASAL ANGIOSPERMS AHA 360 Mya D GYMNOSPERMS AL DO RED other plants. G M MONILOPHYTES Another putative case of HT based on the results of 500 Mya Selaginella LYCOPHYTES present survey of LTR retrotransposons from fungal spe- MOSSES PLANTS LIVERWORTS cies was found for PyrTriTy3-2 LTR retrotransposon 1200 Mya CHAROPHYTES from Pyrenophora tritici-repentis Pt-1C-BFP (Dothideo- 1600 Mya CHLOROPHYTES mycetes). PyrTriTy3-2 belongs to Pyggy clade and ASCOMYCETES 1 1210 Mya 2 appeared to be more closely related to LTR retrotranspo- BASIDIOMYCETES FUNGI CHYTRIDIOMYCETES sons from Sordariomycetes (NecHaemTy3-4 from Nec- TCN1 tria haematococca MPVI and ChaGloTy3-8 from Figure 5 Distribution of different clades of CHD-containing Gyp- Chaetomium globosum CBS 148.51) than to the elements sy LTR retrotransposons in plants. Evolutionary tree is represented from other Dothideomycetes such as AltBraTy3-2 from according to Bowman et al., 2007 [40] and Berbee and Taylor, 2001 [41] Alternaria brassicicola ATCC 96866, REAL from Alter- with minor modifications. Divergence times (Mya - million years ago) naria alternata (AB025309) [42], and PYGGY from are indicated according to Hedges, 2002 [27]. Data suggest that Tcn1- like LTR retrotransposons were horizontally transmitted between fungi Pyrenophora graminea (AF533704) [43] (Figure 2). The and non-seed plants (indicated by arrows). Presumably HT took place pairwise comparisons of RT-Int fragments and investiga- among fungi and the last common ancestor (LCA) of mosses and lyco- tion of evolutionary rates for retrotransposons from phytes (indicated as 1). Alternatively, it is possible that two indepen- Pyggy and Pyret clades revealed the unexpectedly high dent acts of HT occurred (indicated as 2). First HT event could happen similarity between PyrTriTy3-2 and NecHaemTy3-4 (77% among fungi and LCA of mosses since all investigated mosses contain Tcn1-like LTR retrotransposons. The second HT could occur among identical amino acids), much higher than between any fungi and LCA of Selaginella since only representatives of this genus other retrotransposons from Pyggy or Pyret clades, and carry this group of retrotransposons among all the investigated lyco- at least two times lower evolutionary rate in the couple phytes. Novikova et al. BMC Genomics 2010, 11:231 Page 16 of 20 http://www.biomedcentral.com/1471-2164/11/231

PyrTriTy3-2/NecHaemTy3-4 than in comparisons of Conclusions other LTR retrotransposons (Table 3). Tcn1-like LTR retrotransposons were found in basidio- The high similarity, phylogenetic inconsistencies, as mycota fungi and non-seed plants, including all investi- well as lower evolutionary rates could be explained by gated mosses and lycophytes from genus Selaginella. very strict evolutionary constraints or a HT event. How- Such interkingdom distribution is not typical for chro- ever, taking into consideration that the high selective modomain-containing LTR retrotransposons clades pressure could be implemented only in the case of func- which are usually very specific for a particular taxonomic tional importance of the PyrTriTy3-2 or NecHaemTy3-4, group and can be explained by strong selective con- HT looks more preferable for the explanation of the straints and the 'retained' genes theory or by horizontal described case. It is known that transposable elements transmission. The close examination and comparison of can alter gene expression since they carry their own regu- evolutionary rates for LTR retrotransposons including latory sequences and insertions can be selectively advan- representatives of Tcn1 and two other clades of LTR ret- tageous. However, only those transposable elements, rotransposons, and evolutionary rates estimated for puta- which were involved in regulation, evolve under strict tively 'retained' genes from mosses and fungi suggests selective pressure [44,45]. that a horizontal transmission of Tcn1-like LTR ret- While extremely rare, horizontal transfer seems to be rotransposons took place among fungi and mosses/lyco- quite common and recurrent in eukaryotes. An incom- phytes. However the evidence is not strong enough to plete list of putative HT events includes: HT as a key discard other explanations; such as selective pressure event in the evolution of several fungal genes [46-48]; HT coupled with vertical transmission of retrotransposons in from fungi to rice weevil Sitophilus oryzae proposed for genomes of non-seed plants and loss of these elements by pectinase gene [49]; numerous HT events described for other plants. eukaryotic transposable elements [30-38]; as well as HTs of mitochondrial genes, for example, multiple angio- Methods sperm-angiosperm HTs of homing group I intron in the Genomic sequences screening, sequence and phylogenetic mitochondrial cox1 gene (for a review, see [50]); and a HT analysis of the intron II and two adjacent exons of the mitochon- Fungal genomic sequences are available at: Fungal drial nad1 gene from the flowering plants (angiosperms) Genome Initiative [56]; The DOE Joint Genome Institute to Gnetum (gymnosperms) [51]. [57]; and The Sanger Institute [58]. The source of individ- The actual mechanisms of horizontal transfer for ual genomes can be found in table represented in Addi- eukaryotic genes and transposable elements are still tional file 5. unknown since it is not possible to show experimentally We used UniPro uGENE software [59] for LTR ret- how HT can occur. Parasites, symbionts, bacteria, or rotransposons identification. The designed pipeline for viruses all could be suggested as potential vectors for hor- Gypsy LTR retrotransposons identification and classifica- izontal transfer. Moreover, based on an example of mas- tion included: loading genomic sequence, translation of sive HT from a land plant donor to the basal angiosperm genomic sequence over six possible reading frames to Amborella trichopoda, it has been demonstrated that amino acids, and subsequent search for homologous direct plant-to-plant transfer can take a place [52]. The regions performed using "HMMER search" options of associations between biotrophic fungi and their plant UniPro uGENE. The algorithm of HMMER search is hosts are ubiquitous in nature and range from mutually based on profile hidden Markov models, which can per- beneficial to potentially fatal pathogenic interactions. form amino acid sequence searches by use of an appro- Mycorrhiza refers to an association or symbiosis between priate profile [60]. For the analyses, we used a multiple plants and fungi that colonize the cortical tissue of plant alignment consensus sequence, which contains Gypsy roots. Ectomycorrhizal fungi are mostly basidiomycetes LTR retrotransposon reverse transcriptase (RT) and par- that grow between root cortical cells of many tree species tial integrase (Int) domains. The profile HMM, based on whereas arbuscular mycorrhizal (AM) fungi belong to the this consensus sequence, was built using UniPro uGENE order Glomales (Glomeromycota) and form highly software. An additional test for the presence of RT and branched structures called arbuscules, within root corti- partial Int domains was performed using BLAST (blastp) cal cells of wide range of land plant species [53-55]. Both which also was incorporated in the designed pipeline. All types of mycorrhiza represent intimate association and BLAST analysis was essentially performed using could provide suitable conditions for HT of transposable sequence databases accessible from the National Center elements. AM-like mycorrhiza is widely distributed for Biotechnology Information [61]. The classification of among mosses, ferns and lycophytes (for review, [53]). the newly identified elements was performed by a com- parative analysis of their sequences. Newly identified ele- Novikova et al. BMC Genomics 2010, 11:231 Page 17 of 20 http://www.biomedcentral.com/1471-2164/11/231

ments and their accession numbers in public databases Sordaryomycetes/Dothideomycetes, 490 Myr according are listed in Additional file 6. to Padovan et al. (2005) [74]; and Heterobasidiomycetes The whole nucleotide sequences of the transposable (or Tremellomycetes)/Homobasidiomycetes (Agaricomy- elements, if possible, were also extracted with the assis- cetes and Dacrymycetes), 700 Myr according to Hibbett tance of UniPro uGENE software. After localization of et al. (2007) [75] and Taylor et al. (2004) [76]. amino acid sequences obtained during HMMER search in the initial genomes in its nucleotide representation, the Species collection and total DNA isolation sequences were expanded up to 15 Kb and used for long Table 2 lists plant species, and Table 3 lists fungal species terminal repeats (LTRs) search. The algorithm for repeats used in present study. The of vascular non- search, 'Repeat Find', is included to the UniPro uGENE as seed plants (monilophytes and lycophytes) is given after well as the visualization feature and 'ORF Find' option Pryer et al. (2004) [22], Smith et al. (2006) [21], and Korall which were used to identify the putatively intact copies of et al. (2007) [77]. Plant species (monilophytes) were col- LTR retrotransposons. Structural features of newly iden- lected in nature. The detailed label data are available from tified LTR retrotransposons can be found in Additional the authors. The genomic DNA of lycophytes was pro- file 1. vided by the Royal Botanic Gardens, Kew, London, UK Tcn1-like LTR retrotransposon search was carried out [78]. Genomic DNA was isolated from the leaves. Extrac- using BLAST (blastp and blastx). BLAST analysis was tion was performed using the QIAGEN DNeasy Plant performed using sequence databases accessible from the Mini Kit (QIAGEN). Isolated DNA was used directly in National Center for Biotechnology Information (NCBI) PCR amplifications. server [59], The U.S. Department of Energy Joint Gypsy LTR retrotransposons PCR amplification and Genome Institute [57], and Broad Institute of MIT and sequencing Harvard [56] as well as Phytozome, a tool for green plant Previously designed degenerate PCR primers for chro- comparative genomics [62]. The described copy of SM- modomain-containing Gypsy LTR retrotransposons were Tcn1 from spikemoss Selaginella moellendorffii (Lycopo- used in present study: GyRT1 = 5'-MRNATGTGYGT- diophyta) is located in scaffold_0 (1426925-1421008) of NGAYTAYMG-3' [24] and ty3-A = 5'-AATTCGCTGC- genomic sequence version 1.0 which is available at The CGCTAAGATNARNADRTCRTC-3' [8], where M = A + U.S. Department of Energy Joint Genome Institute web- C, Y = C + T, R = A + G, D = A + G + T and N = A + G + site [57]. The whole sequence of SM-Tcn1 with annota- C + T. These primers were designed to amplify the most tions can be found in Additional file 3. Other websites conserved part of the reverse transcriptase (RT) domain used in the present study were: Repbase [63], NCBI con- of LTR retrotransposons and were proved to be efficient served domain database and search service [64], ESTs [8,24]. The expected length of PCR products was about from Porphyra yezoensis at Kazusa DNA Research Insti- 320 bp. PCR amplification with degenerate primers was tute [65], Cyanidioschyzon merolae Genome Project [66], performed using 0.1 μg of genomic DNA in 10-μl volume The Plant Genomics Consortium [67], The Institute for of 10 mM Tris-HCl (pH 8.9), 1 mM (NH ) SO , 4 mM Genomic Research [68], Cassava and Leafy Spurge EST 4 2 4 Project [69]. MgCl2, 200 μM each of four dNTPs, 0.5 μM primers, and All multiple DNA alignments were performed by Clust- 2.5 units of Taq polymerase. After an initial denaturation alW [70] and edited manually in UniPro uGENE. Phylo- step for 3 min at 94°C, the PCR reactions were subjected genetic analyses were performed using the Neighbor- to 30 cycles of amplification consisting of 30 sec denatur- Joining (NJ) method in MEGA 4.0 program [71]. Statisti- ation at 94°C, 42 sec annealing at 50°C, and 1 min exten- cal support for the NJ tree was evaluated by bootstrap- sion at 72°C. PCR products were separated by agarose gel ping (number of replications, 1000) [72]. Evolutionary electrophoresis. The resulting PCR products were rates were estimated by standard methods [73]. Poisson directly ligated into a pGEM vector using a pGEM-T- correction distances (d) were estimated from the equa- Easy cloning kit (Promega) for sequence determination. tion d = -ln(1 - p), where p represents the proportion of Clones were amplified by PCR with M13 primers, and different amino acids. The rate of amino acid substitution 40 ng of the product was used in a 10 μl cycle sequencing (r) was estimated by the standard equation r = d/2T, reaction with the ABI BigDye Terminator Kit on an ABI where T is the divergence time of the last common ances- 310 Genetic Analyser (Applied Biosystems); or sequenc- tor of the compared species. The estimated divergence ing reactions were performed with Dye Terminator Cycle times used were: Plants/Fungi, 1500 Myr and Basidiomy- Sequencing Kit (Beckman Coulter) and analyzed on CEQ cetes/Ascomycetes, 1200 Myr according to Hedges 8000 Genetic Analysis System. Sequences were deposited (2002) [27]; Homobasidiomycetes/Chytridiomycetes, 900 to GenBank under Acc. Numbers GQ443314-GQ443445 Myr, Sordaryomycetes/Eurotiomycetes, 540 Myr, and and AY959294-AY959313. Novikova et al. BMC Genomics 2010, 11:231 Page 18 of 20 http://www.biomedcentral.com/1471-2164/11/231

List of abbreviations used Author Details 1 LTR: long terminal repeat; SINE: short interspersed Laboratory of Molecular Genetic Systems, Institute of Cytology and Genetics, Novosibirsk, Russia and 2Novosibirsk State University, Novosibirsk, Russia nuclear element; CHD: chromodomain; RT: reverse tran- scriptase; Int: integrase; ORF: open reading frame; PR: Received: 15 October 2009 Accepted: 8 April 2010 Published: 8 April 2010 proteinase; dUTPase: deoxyuridine triphosphatase This©BMC 2010 isarticleGenomics an Novikova Open is available 2010, Access et al; 11 from:licenseearticle:231 http:/ distributed BioMed/www.biomedcentral.com/1471-2164/11/231 Central under Ltd.the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. domain; PPT: polypurine tract; PBS: primer-binding site; References HT: horizontal transfer or horizontal transmission; TE: 1. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, transposable element; LCA: last common ancestor; AM: Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH: A unified classification system for eukaryotic transposable elements. Nat arbuscular mycorrhiza; HMM: hidden Markov models; Rev Genet 2007, 8:973-982. Myr: million years; Mya: million years ago. 2. Marin I, Llorens C: Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol Biol Evol 2000, 17:1040-1049. Additional material 3. Gorinsek B, Gubensek F, Kordis D: Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol 2004, 21:781-798. Additional file 1 Structure of novel LTR retrotransposons from Fungi. 4. Paro R, Hogness DS: The Polycomb protein shares a homologous Table contained list of novel LTR retrotransposons from Fungi detected in domain with a heterochromatin-associated protein of Drosophila. Proc present study, their copy numbers and putative structure including pre- Natl Acad Sci USA 1991, 88:263-267. dicted enzymatic domains. 5. Koonin EV, Zhou S, Lucchesi JC: The chromo superfamily: new members, Additional file 2 Phylogenetic analysis of dUTPase. Neighbor-Joining duplication of the chromo domain and possible role in delivering phylogenetic tree reconstructed based on dUTPase amino acid sequences transcription regulators to chromatin. Nucleic Acids Res 1995, from eukaryotes, viruses, and dUTPase domains from CHD-containing 23:4229-4233. Gypsy LTR retrotransposons. 6. Brehm A, Tufteland KR, Aasland R, Becker PB: The many colours of chromodomains. Bioessays 2004, 26:133-140. Additional file 3 SM-Tcn1 CHD-containing Gypsy LTR retrotranspo- 7. Gao X, Hou Y, Ebina H, Levin HL, Voytas DF: Chromodomains direct son. Sequence of SM-Tcn1 CHD-containing Gypsy LTR retrotransposon integration of retrotransposons to heterochromatin. Genome Res 2008, from spikemoss Selaginella moellendorffii (Lycopodiophyta) with annota- 18:359-369. tions in GenBank format. 8. Novikova O, Mayorov V, Smyshlyaev G, Fursov M, Adkison L, Pisarenko O, Additional file 4 Phylogenetic analysis of Athila-like LTR retrotrans- Blinov A: Novel clades of chromodomain-containing Gypsy LTR posons. Neighbor-joining (NJ) phylogenetic tree based on RT nucleotide retrotransposons from mosses (Bryophyta). Plant J 2008, 56:562-574. sequences of Athila-like LTR retrotransposons including newly described 9. Novikova O: Chromodomains and LTR retrotransposons in plants. elements. Comm & Integr Biol 2009, 2:158-162. Additional file 5 List of fungal species, genomes of which were ana- 10. Neuvéglise C, Feldmann H, Bon E, Gaillardin C, Casaregola S: Genomic lyzed. Table contained the list of fungal species, genomes of which were evolution of the long terminal repeat retrotransposons in analyzed in silico in the present study and the sources of genomic hemiascomycetous yeasts. Genome Res 2002, 12:930-943. sequences. 11. Schmid-Berger N, Schmid B, Barth G: Ylt1, a highly repetitive Additional file 6 Novel Gypsy LTR retrotransposons from Fungi. Table retrotransposon in the genome of the dimorphic fungus Yarrowia contained the list of novel Gypsy LTR retrotransposons from Fungi detected lipolytica. J Bacteriol 1994, 176:2477-2482. in present study and their accession numbers. 12. Goodwin TJ, Poulter RT: Multiple LTR-retrotransposon families in the asexual yeast Candida albicans. Genome Res 2000, 10:174-191. 13. Goodwin TJ, Poulter RT: The diversity of retrotransposons in the yeast Authors' contributions Cryptococcus neoformans. Yeast 2001, 18:865-880. ON participated in the design of the study, carried out the analysis, partici- 14. Novikova OS, Blinov AG: dUTPase-containing Metaviridae LTR pated in the sequence analysis and drafted the manuscript. GS contributed to retrotransposons from the genome of Phanerochaete chrysosporium data acquisition, participated in analysis and interpretation. AB participated in (Fungi: Basidiomycota). Dokl Bioch Bioph 2008, 420:146-149. the design of the study, performed coordination and has given final approval 15. Riccioni C, Rubini A, Belfiori B, Passeri V, Paolocci F, Arcioni S: Tmt1: the of the submitted version. All authors read and approved the final manuscript. first LTR-retrotransposon from a Tuber spp. Curr Genet 2008, 53:23-34. 16. Payne SL, Elder JH: The role of retroviral dUTPases in replication and Acknowledgements virulence. Curr Protein Pept Sci 2001, 2:381-388. Author thanks Dr. Mark L. Farman (Department of Plant Pathology, University of 17. Varmus H, Brown P: Retroviruses. In Mobile DNA Edited by: Berg DE, Kentucky, USA) for the helpful comments and Dr. David Thornbury (Depart- Howe MM. Washington: American Society for Microbiology; 1989:53-108. ment of Plant Pathology, University of Kentucky, USA) for his stylistic sugges- 18. Wilhelm M, Wilhelm FX: Reverse transcription of retroviruses and LTR tions. This work was supported by the Russian Foundation for Basic Research retrotransposons. Cell Mol Life Sci 2001, 58:1246-1262. (grant number RFBR 09-04-00360-a) and by state contract 10002-251/П-25/ 19. Levin HL: A novel mechanism of self-primed reverse transcription 155-270/200404-082 and Siberian Branch of the Russian Academy of Sciences defines a new family of retroelements. Mol Cell Biol 1995, 15:3310-3317. (project No. 10.4). 20. Wang W, Tanurdzic M, Luo M, Sisneros N, Kim HR, Weng JK, Kudrna D, The sequence data for Laccaria bicolor, Trichoderma reesei, Trichoderma virens, Mueller C, Arumuganathan K, Carlson J, Chapple C, de Pamphilis C, Aspergillus niger, Stagonospora nodorum, Postia placenta and Sporobolomyces Mandoli D, Tomkins J, Wing RA, Banks JA, et al.: Construction of a roseus were produced by the US Department of Energy Joint Genome Institute bacterial artificial chromosome library from the spikemoss Selaginella http://www.jgi.doe.gov/. Preliminary sequence data for Alternaria brassicicola moellendorffii: a new resource for plant comparative genomics. BMC were obtained from Genome Sequencing Center at Washington University Plant Biol 2005, 5:10. Medical School http://genome.wustl.edu/. 21. Smith AR, Pryer KM, Schuettpelz E, Korall P, Schneider H, Wolf PG: A The genomic DNA of lycophytes was provided by the Royal Botanic Gardens, classification for extant ferns. Taxon 2006, 55:705-731. Kew, London, UK http://www.kew.org. Monilophytes material was kindly pro- 22. Pryer KM, Schuettpelz E, Wolf PG, Schneider H, Smith AR, Cranfill R: Phylogeny and evolution of ferns (monilophytes) with a focus on the vided by Dr. Alexander Shmakov (Altai State University, Barnaul, Russia) and Dr. early leptosporangiate divergences. Am J Bot 2004, 91:1582-1598. Elena Korolyuk (Central Siberian Botanical Garden, Novosibirsk, Russia). 23. Wikstrom N, Kenrick P: Evolution of Lycopodiaceae (Lycopsida): estimating divergence times from rbcL gene sequences by use of nonparametric rate smoothing. Mol Phyl Evol 2001, 19:177-186. Novikova et al. BMC Genomics 2010, 11:231 Page 19 of 20 http://www.biomedcentral.com/1471-2164/11/231

24. Friesen N, Brandes A, Heslop-Harrison JS: Diversity, origin, and 48. Khaldi N, Collemare J, Lebrun MH, Wolfe KH: Evidence for horizontal distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol transfer of a secondary metabolite gene cluster between fungi. Evol 2001, 18:1176-1188. Genome Biol 2008, 9:R18. 25. Wright DA, Voytas DF: Potential retroviruses in plants: Tat1 is related to a 49. Shen Z, Denton M, Mutti N, Pappan K, Kanost MR, Reese JC, Reeck GR: group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode Polygalacturonase from Sitophilus oryzae: possible horizontal transfer envelope-like proteins. Genetics 1998, 149:703-715. of a pectinase gene from fungi to weevils. J Insect Sci 2003, 3:24. 26. Vitte C, Panaud O: LTR retrotransposons and flowering plant genome 50. Richardson AO, Palmer JD: Horizontal gene transfer in plants. J Exp Bot size: emergence of the increase/decrease model. Cytogenet Genome Res 2007, 58:1-9. 2005, 110:91-107. 51. Won H, Renner SS: Horizontal gene transfer from flowering plants to 27. Hedges SB: The origin and evolution of model organisms. Nat Rev Genet Gnetum. Proc Natl Acad Sci USA 2003, 100:10824-10829. 2002, 3:838-849. 52. Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD: 28. Rensing SA, Fritzowsky D, Lang D, Reski R: Protein encoding genes in an Massive horizontal transfer of mitochondrial genes from diverse land ancient plant: analysis of codon usage, retained genes and splice sites plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA in a moss, Physcomitrella patens. BMC Genomics 2005, 6:43. 2004, 101:17747-17752. 29. Hartl DL, Lohe AR, Lozovskaya ER: Modern thoughts on an ancyent 53. Albrecht C, Geurts R, Bisseling T: Legume nodulation and mycorrhizae marinere: function, evolution, regulation. Annu Rev Genet 1997, formation; two extremes in host specificity meet. EMBO J 1999, 31:337-358. 18:281-288. 30. Jordan IK, Matyunina LV, McDonald JF: Evidence for the recent horizontal 54. Wang B, Qiu YL: Phylogenetic distribution and evolution of mycorrhizas transfer of long terminal repeat retrotransposon. Proc Natl Acad Sci USA in land plants. Mycorrhiza 2006, 16:299-363. 1999, 96:12621-12625. 55. Hibbett DS, Matheny PB: The relative ages of ectomycorrhizal 31. Novikova O, Sliwiñska E, Fet V, Settele J, Blinov A, Woyciechowski M: CR1 mushrooms and their plant hosts estimated using Bayesian relaxed clade of non-LTR retrotransposons from Maculinea butterflies molecular clock analyses. BMC Biol 2009, 7:13. (Lepidoptera: Lycaenidae): evidence for recent horizontal 56. Broad Institute [http://www.broad.mit.edu] transmission. BMC Evol Biol 2007, 7:93. 57. The DOE Joint Genome Institute [http://www.jcvi.org/] 32. Roulin A, Piegu B, Wing RA, Panaud O: Evidence of multiple horizontal 58. The Sanger Institute [http://www.sanger.ac.uk] transfers of the long terminal repeat retrotransposon RIRE1 within the 59. UniPro uGENE software [http://genome.unipro.ru] genus Oryza. Plant J 2008, 53:950-959. 60. Eddy SR: A probabilistic model of local sequence alignment that 33. Roulin A, Piegu B, Fortune PM, Sabot F, D'Hont A, Manicacci D, Panaud O: simplifies statistical significance estimation. PLoS Comput Biol 2008, Whole genome surveys of rice, maize and sorghum reveal multiple 4:e1000069. horizontal transfers of the LTR-retrotransposon Route66 in Poaceae. 61. National Center for Biotechnology Information [http:// BMC Evol Biol 2009, 9:58. www.ncbi.nlm.nih.gov] 34. Kidwell MG: Horizontal transfer. Curr Opin Genet Dev 1992, 2:868-873. 62. Phytozome, a tool for green plant comparative genomics [http:// 35. Volff JN, Körting C, Schartl M: Multiple lineages of the non-LTR www.phytozome.net] retrotransposon Rex1 with varying success in invading fish genomes. 63. Repbase [http://www.girinst.org] Mol Biol Evol 2000, 17:1673-1684. 64. National Center for Biotechnology Information conserved domain 36. Kordis D, Gubensek F: Unusual horizontal transfer of a long interspersed database and search service [http://www.ncbi.nlm.nih.gov/Structure/ nuclear element between distant vertebrate classes. Proc Natl Acad Sci cdd/cdd.shtml] USA 1998, 95:10704-10709. 65. ESTs from Porphyra yezoensis at Kazusa DNA Research Institute [http:/ 37. Novikova O, Fet V, Blinov A: Non-LTR retrotransposons in fungi. Funct /est.kazusa.or.jp] Integr Genomics 2009, 9:27-42. 66. Cyanidioschyzon merolae Genome Project [http://merolae.biol.s.u- 38. Kaplan N, Darden T, Langley CH: Evolution and extinction of tokyo.ac.jp] transposable elements in Mendelian populations. Genetics 1985, 67. The Plant Genomics Consortium [http://nypg.bio.nyu.edu/] 109:459-480. 68. The Institute for Genomic Research [http://www.tigr.org] 39. Lohe AR, Moriyama EN, Lidholm DA, Hartl DL: Horizontal transmission, 69. Cassava and Leafy Spurge EST Project [http://titan.biotec.uiuc.edu/ vertical inactivation, and stochastic loss of mariner-like transposable cassava] elements. Mol Biol Evol 1995, 12:62-72. 70. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the 40. Bowman JL, Floyd SK, Sakakibara K: Green genes-comparative genomics sensitivity of progressive multiple sequence alignment through of the green branch of life. Cell 2007, 129:229-234. sequence weighting, position-specific gap penalties and weight 41. Berbee ML, Taylor JW: Fungal Molecular Evolution: Gene Trees and matrix choice. Nucleic Acids Res 1994, 22:4673-4680. Geologic Time. In The Mycota: a comprehensive treatise on fungi as 71. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary experimental systems for basic and applied research. Systematics and Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 2007, Evolution, Part B Volume VII. Edited by: McLaughlin DJ, McLaughlin EG, 24:1596-1599. Lemke PA. New York: Springer-Verlag; 2001:229-246. 72. Felsenstein J: Confidence limits on phylogenies: an approach using the 42. Kaneko I, Tanaka A, Tsuge T: REAL, an LTR retrotransposon from the bootstrap. Evolution 1985, 39:783-791. plant pathogenic fungus Alternaria alternata. Mol Gen Genet 2000, 73. Nei M, Kumar S: Molecular evolution and phylogenetics New York: Oxford 263:625-634. University Press; 2000. 43. Taylor EJ, Konstantinova P, Leigh F, Bates JA, Lee D: Gypsy-like 74. Padovan AC, Sanson GF, Brunstein A, Briones MR: Fungi evolution retrotransposons in Pyrenophora: an abundant and informative class revisited: application of the penalized likelihood method to a Bayesian of molecular markers. Genome 2004, 47:519-525. fungal phylogeny provides a new perspective on phylogenetic 44. Medstrand P, Landry JR, Mager DL: Long terminal repeats are used as relationships and divergence dates of Ascomycota groups. J Mol Evol alternative promoters for the endothelin B receptor and 2005, 60:726-735. apolipoprotein C-I genes in humans. J Biol Chem 2001, 276:1896-1903. 75. Hibbett DS, Binder M, Bischoff JF, Blackwell M, Cannon PF, Eriksson OE, 45. Ono R, Kobayashi S, Wagatsuma H, Aisaka K, Kohda T, Kaneko-Ishino T, Huhndorf S, James T, Kirk PM, Lücking R, Thorsten Lumbsch H, Lutzoni F, Ishino F: A retrotransposon-derived gene, PEG10, is a novel imprinted Matheny PB, McLaughlin DJ, Powell MJ, Redhead S, Schoch CL, Spatafora gene located on human chromosome 7q21. Genomics 2001, JW, Stalpers JA, Vilgalys R, Aime MC, Aptroot A, Bauer R, Begerow D, Benny 73:232-237. GL, Castlebury LA, Crous PW, Dai YC, Gams W, Geiser DM, Griffith GW, 46. Wenzl P, Wong L, Kwang-won K, Jefferson RA: A functional screen Gueidan C, Hawksworth DL, Hestmark G, Hosaka K, Humber RA, Hyde KD, identifies lateral transfer of beta-glucuronidase (gus) from bacteria to Ironside JE, Kõljalg U, Kurtzman CP, Larsson KH, Lichtwardt R, Longcore J, fungi. Mol Biol Evol 2005, 22:308-316. Miadlikowska J, Miller A, Moncalvo JM, Mozley-Standridge S, Oberwinkler 47. Slot JC, Hibbett DS: Horizontal transfer of a nitrate assimilation gene F, Parmasto E, Reeb V, Rogers JD, Roux C, Ryvarden L, Sampaio JP, cluster and ecological transitions in fungi: a phylogenetic study. PLoS Schüssler A, Sugiyama J, Thorn RG, Tibell L, Untereiner WA, Walker C, Wang ONE 2007, 2:e1097. Novikova et al. BMC Genomics 2010, 11:231 Page 20 of 20 http://www.biomedcentral.com/1471-2164/11/231

Z, Weir A, Weiss M, White MM, Winka K, Yao YJ, Zhang N: A higher-level phylogenetic classification of the Fungi. Mycol Res 2007, 111:509-547. 76. Taylor JW, Spatafora J, O'Donnell K, Lutzoni F, James T, Hibbett DS, Geiser D, Bruns TD, Blackwell M: The relationships of fungi. In Assembling the tree of life Edited by: Cracraft J, Donoghue MJ. New York: Oxford University Press; 2004:171-196. 77. Korall P, Conant DS, Metzgar JS, Schneider H, Pryer KM: A molecular phylogeny of scaly tree ferns (Cyatheaceae). Am J Bot 2007, 94:873-886. 78. The Royal Botanic Gardens, Kew, London, UK [http://www.kew.org]

doi: 10.1186/1471-2164-11-231 Cite this article as: Novikova et al., Evolutionary genomics revealed interk- ingdom distribution of Tcn1-like chromodomain-containing Gypsy LTR ret- rotransposons among fungi and plants BMC Genomics 2010, 11:231