<<

Horizontal transfer of retrotransposons between PNAS PLUS bivalves and other aquatic of multiple phyla

Michael J. Metzgera,b,c, Ashley N. Paynterd, Mark E. Siddalld, and Stephen P. Goffa,b,e,1

aDepartment of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10027; bHoward Hughes Medical Institute, Columbia University, New York, NY 10027; cPacific Northwest Research Institute, Seattle, WA 98122; dSackler Institute of Comparative Genomics, American Museum of Natural History, New York, NY 10024; and eDepartment of Microbiology and Immunology, Columbia University, New York, NY 10027

Contributed by Stephen P. Goff, March 2, 2018 (sent for review October 2, 2017; reviewed by Cédric Feschotte and Welkin Johnson) The LTR retrotransposon Steamer is a selfish endogenous element We previously identified an LTR retrotransposon, Steamer,in in the soft-shell clam genome that was first detected because of its the genomes of soft-shell clams (Mya arenaria) that was highly dramatic amplification in bivalve transmissible neoplasia afflicting amplified in the transmissible cancer of that species (2–10 copies the species. We amplified and sequenced related retrotransposons per haploid genome in normal individuals vs. 100–300 copies in from the genomic DNA of many other bivalve species, finding neoplastic cells) (11, 12). Steamer is a 4,968-bp retrotransposon evidence of horizontal transfer of retrotransposons from the ge- in the Mag family of the Ty3 lineage of LTR retrotransposons, nome of one species to another. First, the phylogenetic tree of the with a single 1,335-aa gag-pol ORF. As with all other Mag ele- Steamer-like elements from 19 bivalve species is markedly discor- ments, it has no detectible env gene. The dramatically expanded dant with host phylogeny, suggesting frequent cross-species copy number of Steamer in the clonal cancer line may be re- transfer throughout bivalve evolution. Second, sequences nearly sponsible for the oncogenic phenotype or may contribute to the Steamer identical to were identified in the genomes of Atlantic continued evolution of this contagious cancer lineage. We also razor clams and Baltic clams, indicating recent transfer. Finally, a identified sequences of retroelements related to Steamer in sev- search of the National Center for Biotechnology Information se- eral bivalves susceptible to bivalve transmissible neoplasia, but quence database revealed that Steamer-like elements are present those specific retroelements were not amplified in the neoplasias in the genomes of completely unrelated organisms, including in those species (13). The identification of Steamer did, however, EVOLUTION zebrafish, sea urchin, acorn worms, and coral. Phylogenetic incon- gruity, a patchy distribution, and a higher similarity than would be prompt us to look further throughout the genomes of bivalves to expected by vertical inheritance all provide evidence for multiple understand the diversity of Steamer-like elements (SLEs) and to long-distance cross-phyla horizontal transfer events. These data determine if their phylogenetic relationships suggest vertical in- suggest that over both short- and long-term evolutionary time- heritance from a bivalve ancestor or more recent HTT between scales, Steamer-like retrotransposons, much like retroviruses, can species. We found SLEs in many, but not all, bivalve species and move between organisms and integrate new copies into new found evidence for multiple, frequent cross-species transfers, host genomes. including recent transfer of nearly identical elements between soft-shell clams, razor clams, and Baltic clams. We furthermore retrotransposons | transposable elements | horizontal transfer | bivalves | found evidence for widespread and frequent transfer throughout cross-phyla horizontal transfer bivalve evolutionary history and even cross-phyla transfer into

ransposable elements (TEs) are selfish genetic elements that Significance Tgenerate new copies of themselves within the genomes of their host cells and are inherited vertically during replication of An LTR retrotransposon, Steamer, was previously identified by their hosts. TEs include both the DNA transposons, which usu- virtue of high expression and dramatic amplification in a ally replicate through cut-and-paste mechanisms, and the retro- transmissible cancer in soft-shell clams (Mya arenaria). Here, transposons, which replicate through reverse transcription of an we investigated genome sequences obtained from both phys- RNA transcribed from a DNA copy resident in the host genome ical collections of bivalves and genome databases and found Steamer (1). These TEs most often increase their copy number by in- evidence of horizontal transfer of -like transposons from one species to another, with jumps between bivalves and tracellular retrotransposition, leading to new insertions into the even between of completely different phyla. Some genome of the cell they inhabit. In somatic cells, these events can events were ancient, but some (in particular, those between cause mutations and lead to cancers, and, if they occur in germ bivalves) appear to be recent, as the elements are nearly cells or progenitors of germ cells, can result in increased copy identical in different species. These data show that horizontal number in the germ-line genome of the host species (2, 3). The transfer of LTR retrotransposons like Steamer has occurred and gag and pol genes of LTR retrotransposons are related to those continues to occur frequently and that the marine environment of retroviruses (4), and it appears very likely that the vertebrate may be particularly suitable for transfer of transposons. retrovirus lineage itself arose from an ancestral LTR retro- transposon which acquired an envelope gene (5). Despite this Author contributions: M.J.M. and S.P.G. designed research; M.J.M. and A.N.P. performed research; M.J.M., A.N.P., and M.E.S. contributed new reagents/analytic tools; M.J.M. and evolutionary relationship with retroviruses, LTR retrotransposons S.P.G. analyzed data; and M.J.M. and S.P.G. wrote the paper. and other TEs are not expected to transmit easily from cell to Reviewers: C.F., Cornell University; and W.J., Boston College. cell or from individual to individual. It is even harder to under- The authors declare no conflict of interest. stand how these elements can be transmitted from the germ line of Published under the PNAS license. one species to another. To do this, the TE must be released from a Data deposition: The sequences reported in this paper have been deposited in the GenBank cell in one individual and then transported into and integrated database (accession nos. MH012205–MH012241 and MH025768–MH025795). into the germ line of a different organism. While this is a rare 1To whom correspondence should be addressed. Email: [email protected]. event compared with intracellular transposition, with multiple This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. barriers, there are numerous reports of horizontal transfer of 1073/pnas.1717227115/-/DCSupplemental. TEs (HTT) from one organism to another (6–10).

www.pnas.org/cgi/doi/10.1073/pnas.1717227115 PNAS Latest Articles | 1of9 Downloaded by guest on September 30, 2021 many other marine organisms, including vertebrates, sea urchins, of multiple independent researchers, and from multiple commercial and coral. sources. The integrity of the genomic DNA and the species iden- tities were confirmed by amplification and sequencing of a region of Results the mitochondrial cytochrome c oxidase subunit I gene (COI). Se- Multiple Cross-Species Transfers Throughout Bivalve Evolution. To quences from this gene could be amplified from only 24 species search for SLEs across the bivalve class, we performed PCR using reported pan-invertebrate primers (14), but by using other primer amplification using degenerate primers in conserved positions in variants, mitochondrial COI DNA was amplified and sequenced the reverse transcriptase-integrase (RT-IN) region of the pol gene from all 37 species. in the genomic DNA of 36 bivalve species (and one gastropod) Using degenerate primers in conserved regions of Steamer, obtained from the Ambrose Monell Cryo Collection (AMCC) of the SLE sequences were amplified from 19 of the 37 species analyzed American Museum of Natural History (AMNH), from collections (Table S1). The DNAs were cloned from these 18 bivalves and one

A B

2.5 3.5

3 2 2.5

1.5 2

1.5 1

1 0.5 0.5 SLE nucleotide sequence distance 0 SLE amino acid sequence distance 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 COI nucleotide sequence distance COI amino acid sequence distance

C COI SLE

Retusa obtusa candida 100 Cardites floridanus x Limaria pellucida 2 100 Placopecten magellanicus Placopecten magellanicus 2 70 Spondylus tenuis x Limaria pellucida Mytilus trossulus Pteria colymbus x Limaria pellucida 1 100 Anadara transversa x Geukensia demissa Barbatia candida obtusa 100 Crassostrea virginica Crassostrea virginica Crassostrea gigas 58 100 Mytilus trossulus Ischadium recurvum 100 Mytilus edulis Crassostrea gigas 92 Brachidontes exustus x patula 94 Geukensia demissa Mya arenaria 65 100 Ischadium recurvum directus 1 70 61 Ctena orbiculata x 86 78 100 Gastrochaena ovata x Limecola balthica 99 Siliqua patula Mytilus edulis 100 97 68 Ensis directus Cerastoderma edule 1 100 generosa Panopea generosa 100 Donax variabili x Mercenaria mercenaria 2 Limecola balthica Cerastoderma edule 4 78 Mya arenaria 100 Dreissena polymorpha philippinarum 100 Caryocorbula swiftiana x Polititapes aureus 1 62 Mulinia lateralis x Venerupis corrugata 2 84 75 55 Chama macerophylla x Dreissena polymorpha 96 66 Mercenaria mercenaria Cerastoderma edule 2 59 97 72 Chione elevata x 53 Venerupis philippinarum Venerupis corrugata 1 97 Polititapes aureus Mercenaria mercenaria 1 Venerupis corrugata Cerastoderma edule 3 100 Sphaerium fabale x Ensis directus 2 100 Cerastoderma glaucum x Placopecten magellanicus 1 Cerastoderma edule 66 Elliptio complanata x Mercenaria mercenaria 3 100 Lampsilis siliquoidea x Polititapes aureus 3 Lasmigona costata x Polititapes aureus 2

Fig. 1. Comparison of pairwise distances and phylogenetic trees of SLEs and host mitochondrial DNA. (A and B) Nucleotide sequence distances (A) and amino acid sequence distances (B) are plotted for all pairwise comparisons of bivalve SLE sequences (496) and are compared with the distances of the corresponding host COI [circles and crosses; comparisons between sequences reported in Paynter et al. (16) and Steamer from M. arenaria are marked with a cross]. A linear regression was plotted, and pairwise comparisons in which the SLE sequence was higher (blue) or lower (red) than expected are marked, using 1.5 times the IQR as the cutoff. (C, Left) The host phylogenetic tree was made from the alignment of COI nucleotide sequences (Fig. S1) rooted on the gastropod R. obtusa. (Right) The SLE phylogenetic tree was made from the alignment of amino acid sequences from the RT-IN region amplified by conserved primers, rooted on the Polititapes aureus 2/3/Mercenaria mercenaria 3 sequences. The 21 pairwise comparisons in which the SLE distance was lower than expected in either A or B are marked by red dotted lines. The HTT events predicted by both phylogenetic incongruity and pairwise distance comparison are shown on the host COI tree as red marks (the dotted mark represents a case in which pairwise distance comparison data could support either one or two events). Sequences represent amplified sequences from genomic DNA of 36 bivalve species and R. obtusa as well as a sequence from the C. gigas genome assembly (Table S1). PhyML 3.0 was used to generate the trees, and Dendroscope 3.5.9 was used to generate the tanglegram. An “x” marks a species in which no SLE was identified. The species of origin is listed, bootstrap values above 50 are shown, and nodes with bootstrap support below 25 have been condensed.

2of9 | www.pnas.org/cgi/doi/10.1073/pnas.1717227115 Metzger et al. Downloaded by guest on September 30, 2021 gastropod, and one to four distinct elements were sequenced from (Fig. S1) and current molecular phylogenetic analyses (17), but PNAS PLUS each species. Notably, while these sequences were similar to Steamer, when we aligned the host tree with the SLE tree, there was the sequences identified from each species were unique and profound discordance between the COI tree and the SLE tree therefore could not be explained by laboratory contamination with throughout the resulting tanglegram (Fig. 1). Many SLE sequences a clonal source of DNA. Additionally, a highly related SLE was are more closely related to each other than would be expected by identified in the fully sequenced genome (Crassostrea gigas, vertical inheritance, and there is evidence of acquisition of multiple, scaffold 39526) (15). SLE sequences were present in 11 families phylogenetically distinct SLEs in a single species (such as cockles, and nine orders. A previous study of bivalves from the AMCC which host at least four distinct elements). SLEs were not detected using highly specific primers (16) identified SLEs nearly identical in many species, leading to a patchy distribution of SLEs throughout to Steamer in two bivalves (Baltic clam, Limecola balthica, and Atlantic the bivalve lineage, although it is possible that other SLEs are razor clam, Ensis directus), and we confirmed those findings using present in some species but were not detected with the primers the degenerate primers. used here. We then compared the SLE phylogeny with the phylogeny of Additionally, we analyzed each pairwise comparison between the host organisms. The host phylogenetic tree based on COI bivalve SLEs and compared the distance to the host COI distance sequence agrees with the established of bivalve species (Fig. 1). Many of the comparisons fell along a line expected for

A Species Common name Location Steamer Ce1 Ce2 Ce3 Ce4 PEI-Wfar-NM01 Mya arenaria soft-shell clam PEI, Canada 100 - - - -

100 Mar-C6-Ge2 Mya arenaria soft-shell clam Germany 99.7 - - - - 98 Mar-C5-Or2 Mya arenaria soft-shell clam Oregon, USA 99.7 - - - - 99 WAP1 Mya arenaria soft-shell clam Washington, USA 98.9 - - - - 1Z Dreissena polymorpha zebra New York, USA - - 66.5 - - CeH3 Cerastoderma edule cockle Galicia, Spain - 100 100 100 100 50 98 AMNH-12F Cerastoderma edule cockle Suddorfer, Germany - 98.6 - 98.8 - 96 EVOLUTION 100 AMNH-16F Cerastoderma edule cockle Odde, Germany - - - - - 52 Cgl-06 Cerastoderma glaucum lagoon cockle Galicia, Spain - - - - - AMNH-11F Limecola balthica baltic clam Suddorfer, Germany 98.6 - - - - 100 100 AMNH-15F Limecola balthica baltic clam Odde, Germany - - - - - AMNH-13F Ensis directus atlantic razor clam Suddorfer, Germany 98.0 - - 72.6 - 97 100 AMNH-1R Ensis directus atlantic razor clam Maine, USA 99.1 - - - - 65 PRC1 Siliqua patula Washington, USA 92.0 - - - - Geo1 Panopea generosa Washington, USA 72.2 - - - - CB11 Mytilus trossulus mussel BC, Canada - - - - - 100 AMNH-18F Mytilus edulis mussel Suddorfer, Germany - 99.8 - - -

0.4 B Mya arenaria, PEI-WfarNM01 (Steamer) C Cerastoderma edule, CeH3 (SLECe2) 99 66 Mya arenaria, Mar-C5-Or2 Dreissena polymorpha, 1Z Mya arenaria, Mar-C6-Ge2 90 Ensis directus, AMNH-1R Venerupis corrugata 2 99 95 Ensis directus 1, AMNH-13F Venerupis corrugata 1 Mya arenaria, WAP1 99 85 Limecola balthica, AMNH-11F 0.2 Siliqua patula, PRC1 D Cerastoderma edule, CeH3 (SLECe3) Panopea generosa, Geo1 75 100 Cerastoderma edule, AMNH-12F Cerastoderma edule, AMNH-12F 100 Ensis directus 2, AMNH-13F 100 Cerastoderma edule, CeH3 (SLECe1) Venerupis corrugata 1 Mytilus edulis, AMNH-18F 100 Mercenaria mercenaria 2 Venerupis philippinarum

0.09 0.4

Fig. 2. Polymorphism within species and evidence for multiple, recent HTT between bivalve species. (A, Left) A phylogenetic tree of COI sequences from multiple individuals from multiple locations is shown. Proposed HTT events are shown as lines indicating likely routes of transfer of Steamer (red) from E. directus to M. arenaria and L. balthica and of SLECe1 (blue) between M. edulis and populations of C. edule. Dotted lines between sequences similar to SLECe2 and SLECe3 show more distant relationships. The SLECe1 sequence from the individual used as a reference here (CeH3) is 99.2% identical to the previously published sequence from a different individual in the same population (KX018578). (Right) The Steamer-like elements found in each individual sample are shown. Columns correspond to the distinct elements found in M. arenaria and C. edule: Steamer (red), SLECe1 (blue), SLECe2 (green), and SLECe3 (brown). The first member of each group identified is the reference and is listed as 100%. If a member of that group was detected in the sample, the percent identity compared with the group reference is shown (treating gaps as missing data). Samples negative for PCR amplification of each group are marked as “−.” (B) Phylogenetic tree of the SLE clade including Steamer and SLECe1 sequences, rooted with the sequence from M. mercenaria.(C) Phylogenetic tree of SLECe2 and closely related sequences. (D) SLECe3 and closely related sequences. The sequence from S. patula has a single frameshift mutation, the sequence from P. generosa has two closely spaced frameshift mutations that maintain an ORF, and the sequence from D. polymorpha has four frameshift mutations.

Metzger et al. PNAS Latest Articles | 3of9 Downloaded by guest on September 30, 2021 vertical inheritance (SLE distance increases as host distance in- sequences of three more distinct elements were amplified from creases). In some comparisons, the SLE distance is low and the cockles (Cerastoderma edule) from Galicia, Spain. Here we tested host distance is high (marked in red in Fig. 1), providing strong additional cockle samples from two locations in Germany and found evidence of HTT events. In other comparisons, the SLE distances one individual with only two of the four SLEs (>98% identity to were much higher than expected (marked in blue in Fig. 1), and that found in cockles from Galicia), while no SLEs were amplified this represents comparisons of different lineages of SLE within from the individual from the other location. None of the SLE-Ce closely related species (for example, in cockles there are four sequences were detected in the closely related lagoon cockle distinct elements, so there are six pairwise comparisons with a SLE (Cerastoderma glaucum). Thus, while these SLEs were present in nucleotide distance of 1.1–1.7 and a host distance of zero). all cockles tested at a single site (Galicia), differences could be Overall, the complete discordance of the host and SLE trees, the observed in geographically separated populations of the same patchy distribution of SLEs in the bivalve lineage, and the high species, showing either that the HTT events were not fixed in the similarity of SLEs compared with lower similarity of host sequences species when the populations diverged or that the elements have strongly argue for many cases of cross-species HTT throughout the been lost from some populations. bivalve lineage, with vertical maintenance of some elements. Evidence of HTT of SLEs from Cockles. In addition to evidence of HT Recent Horizontal Transfer of Steamer Among Soft-Shell Clams, Razor of Steamer itself, we found evidence of cross-species HT of other Clams, and Baltic Clams. A recent report used highly specific pri- SLEs. An element nearly identical to SLECe1 was amplified mers to investigate the presence of sequence fragments of from the DNA of a sample of Mytilus edulis collected from the Steamer in the AMCC (16). Consistent with this study, we am- same location as a cockle with the element, strongly suggesting plified and sequenced a fragment of the pol gene that is nearly another recent HTT event (Fig. 2). SLECe1 is closely related to identical to the Steamer elements from soft-shell clams (M. are- Steamer itself (79.3% identical), and since SLECe1 is not found naria) in multiple Atlantic razor clams (E. directus) and in a in the closely related cockles (C. glaucum)ormussels(Mytilus trossulus), single sample of Baltic clam (L. balthica) (Fig. 2). We amplified the data are consistent with the hypothesis that SLECe1 in cockles and cloned two large fragments of retroelements from a sample and derives from an earlier HTT from the of E. directus. One clone covers most of the ORF (AMNH-1R- lineage or some other source. LPM-02, 1,662 bp, 97.5% identical to Steamer) and has an intact Other elements, less closely related to SLEs from cockles, RT-IN region but two frameshifts earlier in the element. The were also found in a zebra mussel (Dreissena polymorpha, 66.2% other clone was amplified with primers in each LTR, so it covers identical to SLECe2) and an Atlantic razor clam (E. directus, the entire coding region, and it has multiple frameshifts and stop 72.6% identical to SLECe3). These may be older HTT events, codons throughout the element (AMNH-1R-L2L2-02, 4,725 bp, but the lack of amplification of the SLECe3-like element in other 97.2% identical). Atlantic razor clams and soft-shell clams are E. directus individuals suggests that these may have been recent both bivalves, but they have been genetically isolated from each HTT events from unknown sources. other for an estimated 300–500 My (18–20), and their COI se- quences are only 65% identical. The presence of the nearly Cross-Phyla Horizontal Transfer of Retroelements. We next looked identical sequences in these distantly related species strongly for SLEs more broadly and systematically by searching the National argues for recent cross-species horizontal transfer (HT) of the Center for Biotechnological Information (NCBI) nucleotide data- retrotransposon from one species to the other. Additionally, the base with the same conserved RT-IN region of the pol gene from all sequences are too similar to be due to vertical inheritance of known SLEs, including Steamer itself and all SLEs identified the element, but the sequences are unique, and therefore they here and in our previous studies (32 total). Unexpectedly, SLEs are not due to contamination with any previously amplified or were identified in sequences from organisms of completely dif- sequenced DNA. ferent phyla, including Chordata (zebrafish, cichlids, and salmon), To explore the range of distribution of the Atlantic razor clam Echinodermata (sea urchin), Priapulida (marine priapulid worms), element in other razor clam species, we investigated genomic Hemichordata (acorn worms), Cnidaria (acropirid coral), and Porifera DNA from a Pacific razor clam (Siliqua patula), another member (sponges) (Fig. 3). Many of these sequences have been annotated as of the family, and a geoduck (Panopea generosa), an- K02A2.6-like, based on a more distantly related Caenorhabditis other member of the Adapedonta order. A highly similar se- elegans retrotransposon (InterPro accession Q09575). quence could be amplified from Pacific razor clams (92.0% These TBLASTN hits strongly suggest cross-phyla HT of SLEs, nucleotide identity, with one stop codon), and a less related se- but it is possible that genome assemblies are contaminated with quence could be amplified from the geoduck (72.2%, with two foreign sequences, so we validated each initial hit with a TBLASTN closely spaced frameshift mutations which combine to maintain search directly against the species’ current reference genome an ORF). These results are most consistent with the entry of a assembly. We checked the number of contigs in the genome that retrotransposon into the genome of a common ancestor of the contain sequences nearly identical to each hit and determined Adapedonta order, followed by vertical inheritance within the the size of those contigs. Four initial hits were identified only in Adapedonta order and then more recent cross-species HTT small contigs (<10 kb) and therefore were excluded from analysis. from razor clams into Baltic clams and soft-shell clams (possibly Each remaining hit (98 total) was found in contigs at least 59 kb in through other intermediate organisms). length, with 55 hits found in contigs >1 Mb. This confirms that the SLEs were found in high-confidence genomic sequences and rep- Lack of Fixation of SLEs in Some Species Suggests Recent HTT Events. resent bona fide sequences in the genomes of the correct species. The finding of a nearly identical element (98.6% both to Steamer To further validate the finding of cross-phyla HTT, we investigated from soft-shell clams and to the element from Atlantic razor the hits in the zebrafish genome more closely. Our search identified clams) in one sample of Baltic clam but not in another (Fig. 2) many sequence fragments with similarity to the RT-IN region and suggests a recent HTT event which has not become fixed in all five sequence regions with high identity to the entire Steamer members of the species. Interestingly, the individual that was retrotransposon. One sequence (found in CU571394.10) has evidence positive for Steamer was the one collected in the same location as of a 5-bp target-site duplication (AAGAG) at the predicted ends of the Atlantic razor clams, in the Suddorfer Strand in the North the LTRs, and the 5′ and 3′ LTRs are nearly identical (167 of Sea of Germany. 168 bp), consistent with recent retroelement integration. The other In addition to the one SLE identified previously in cockles elements have either only one LTR or have LTRs with mismatched (13) (previously termed “SLE-Ce” and now “SLECe1”), SLE sequences immediately adjacent, suggesting recombination between

4of9 | www.pnas.org/cgi/doi/10.1073/pnas.1717227115 Metzger et al. Downloaded by guest on September 30, 2021 PNAS PLUS

77 Mya arenaria (Steamer) A 76 97 Ensis directus 1 4 Limecola balthica 92 Siliqua patula 93 100 Cerastoderma edule 1 3.5 100 Mytilus edulis Panopea generosa Mercenaria mercenaria 2 3 64 XM_011666388 Strongylocentrotus purpuratus (5x, 114 - 829 kb) XM_011662828 Strongylocentrotus purpuratus (2x, 467 - 813 kb) 69 XM_015916950 Acropora digitifera (5x, 89 - 1,210 kb) 2.5 XM_015904904 Acropora digitifera (5x, 263 - 744 kb) 58 XM_015915497 Acropora digitifera (1x, 399 kb) Crassostrea gigas 2 100 XM_019259416 Larimichthys crocea (3x, 239 - 738 kb) 99 XM_015378095 Cyprinodon variegatus (1x, 1,311 kb) LN596472 Cyprinus carpio (1x, 110 kb) 1.5 96 CU571394 Danio rerio (3x, 52,660 - 62,628 kb) LN590702 Cyprinus carpio (1x, 7,653 kb) HF933223 Oryzias latipes (1x, 31,901 kb) 96 1 100 XM_014330929 Haplochromis burtoni (2x, 45 - 67 kb) 50 HF933211 Oryzias latipes (1x, 34,899 kb) XM_015603866 Astyanax, mexicanus (1x, 66 kb) 0.5 80 100 BX088646 Danio rerio (5x, 398 - 59,641 kb)

SLE amino acid sequence distance LN590916 Cyprinus carpio (3x, 18 - 1,260 kb) HM159471 Salmo salar (8x, 26,434 - 159,039 kb) 0 XM_019773121 Branchiostoma belcheri (3x, 176 - 3,733 kb) XM_021508498 Mizuhopecten yessoensis (7x, 290 - 1,411 kb) 0 0.25 0.5 0.75 1 1.25 1.5 XM_014821780 Priapulus caudatus (5x, 68 - 607 kb) 99 XM_014819782 Priapulus caudatus (1x, 102 kb) Host TopIIA and P-type ATPase amino acid sequence distance 82 XM_015901361 Acropora digitifera (3x, 515 - 1,031 kb) 100 XM_011449091 Crassostrea gigas (4x, 48 - 1,861 kb) Cerastoderma edule 4 52 XM_013227287 Biomphalaria glabrata (3x, 2 - 84 kb) XM_020773833 Montastraea faveolata (1x, 1,031 kb) 100 XM_011674777 Strongylocentrotus purpuratus (1x, 1,377 kb) XM_011663844 Strongylocentrotus purpuratus (1x, 763 kb) Retusa obtusa XM_015892014 Acropora digitifera (2x, 14 - 124 kb) B XM_021041930 Exaiptasia pallida (2x, 366 - 584 kb) 86 XM_011683927 Strongylocentrotus purpuratus (2x, 340 - 2,287 kb) Crassostrea virginica Ischadium recurvum 68 XM_021045156 Exaiptasia pallida (1x, 310 kb) XM_015922092 Acropora digitifera (1x, 235 kb) 94 XM_006813320 Saccoglossus kowalevskii (3x, 2 - 97 kb) XM_011663474 Strongylocentrotus purpuratus (3x, 103 - 832 kb) Chordata 69 XM_019758444 Branchiostoma belcheri (3x, 8 - 1,333 kb) XM_015900257 Acropora digitifera (1x, 926 kb) 89 XM_011417475 Crassostrea gigas (4x, 83 - 671 kb) Cnidaria XM_011439518 Crassostrea gigas (7x, 3 - 1,104 kb) 97 HF933227 Oryzias latipes (3x, 28 - 31,901 kb) 50 CR385090 Danio rerio (1x, 54,305 kb) Echinodermata 67 HF933230 Oryzias latipes (2x, 52 - 24,237 kb) 90 BX321920 Danio rerio (3x, 48 - 59,641 kb) LN594951 Cyprinus carpio (1x, 60 kb) Priapulida XM_014821056 Priapulus caudatus (1x, 78 kb) 100 XM_011442468 Crassostrea gigas (3x, 18 - 570 kb) Mytilus trossulus

Brachiopoda GU207459 Crassostrea gigas (5x, 159 - 1,589 kb) EVOLUTION Limaria pellucida 91 XM_019355045 Oreochromis niloticus (6x, 418 - 54,509 kb) 100 CR759901 Danio rerio (1x, 51,023 kb) Hemichordata LN591664 Cyprinus carpio (1x, 167 kb) XM_011446228 Crassostrea gigas (2x, 67 - 1,620 kb) 79 Geukensia demissa Porifera XM_013553308 Lingula anatina (2x, 69 - 335 kb) XM_013553501 Lingula anatina (7x, 84 - 559 kb) 62 Polititapes aureus 1 100 Venerupis corrugata 2 56 Venerupis philippinarum 100 Cerastoderma edule 2 63 63 Dreissena polymorpha Venerupis corrugata 1 99 XM_021505722 Mizuhopecten yessoensis (2x, 78 - 2,297 kb) 53 56 XM_020072616 Crassostrea gigas (2x, 235 - 303 kb) XM_014811798 Priapulus caudatus (2x, 106 - 308 kb) Mercenaria mercenaria 1 94 XM_021485688 Mizuhopecten yessoensis (1x, 366 kb) 93 HF933214 Oryzias latipes (4x, 35 - 34,899 kb) XM_019073533 Cyprinus carpio (2x, 100 - 19,716 kb) 54 CT956089 Danio rerio (5x, 346 - 62,628 kb) 98 89 GU129140 Salmo salar (44x, 1 - 159039 kb) 99 XM_016450855 Sinocyclocheilus anshuiensis (4x, 68 - 3,124 kb) BX005065 Danio rerio (6x, 423 - 78,094 kb) 100 Cerastoderma edule 3 100 Ensis directus 2 XM_011435015 Crassostrea gigas (3x, 11 - 1,470 kb) 87 XM_015942522 Nothobranchius furzeri (3x, 42 - 57,680 kb) 100 XM_020701023 Oryzias latipes (5x, 9 - 31,901 kb) 100 XM_019888906 Hippocampus comes (4x, 31 - 1,493 kb) 97 LN590713 Cyprinus carpio (2x, 11,347 - 16,766 kb) 100 XM_011678706 Strongylocentrotus purpuratus (3x, 47 - 850 kb) 57 M75723 Tripneustes gratilla (SURL) 95 XM_019092442 Cyprinus carpio (1x, 454 kb) 59 XM_019893937 Hippocampus comes (3x, 56 - 6,007 kb) XM_014823023 Priapulus caudatus (2x, 102 - 107 kb) Placopecten magellanicus 1 100 LN597085 Cyprinus carpio (5x, 6 - 27,918 kb) HF933227 Oryzias latipes (6x, 2 - 29,600 kb) LN594387 Cyprinus carpio (1x, 78 kb) 100 100 LN590717 Cyprinus carpio (1x, 12,725 kb) LN599955 Cyprinus carpio (2x, 46 - 428 kb) XM_017684405 Pygocentrus nattereri (5x, 41 - 2,506 kb) 99 XM_021484412 Mizuhopecten yessoensis (2x, 55 - 1,102 kb) Placopecten magellanicus 2 100 XM_021041583 Exaiptasia pallida (2x, 4 - 820 kb) XM_015898474 Acropora digitifera (1x, 1,056 kb) 51 XM_014821163 Priapulus caudatus (2x, 76 - 223 kb) XM_021057482 Exaiptasia pallida (1x, 59 kb) AB231867 Nematostella vectensis (14x, 43 - 1,304 kb) 98 XM_019348637 Oreochromis niloticus (4x, 31,245 - 44,097 kb) 99 XM_014408448 Maylandia zebra (6x, 4 - 14,997 kb) 99 HF933223 Oryzias latipes (3x, 4 - 31,901 kb) 97 LN590717 Cyprinus carpio (4x, 134 - 24,322 kb) XM_021042941 Exaiptasia pallida (1x, 364 kb) 72 XM_011668435 Strongylocentrotus purpuratus (2x, 101 - 270 kb) XM_020712351 Oryzias latipes (2x, 24,562 - 31,847 kb) 100 XM_015914925 Acropora digitifera (5x, 133 - 1,170 kb) 88 XM_021059990 Exaiptasia pallida (2x, 29 - 106 kb) XM_011407944 Amphimedon queenslandica (1x, 96 kb) 81 HF933215 Oryzias latipes (4x, 21 - 36,487 kb) 100 LN594209 Cyprinus carpio (1x, 224 kb) 92 Limaria pellucida Barbatia candida 82 XM_021488224 Mizuhopecten yessoensis (1x, 2,381 kb) 100 Polititapes aureus 2 71 XM_003724223 Strongylocentrotus purpuratus (6x, 190 - 938 kb) 94 Polititapes aureus 3 100 XM_021484917 Mizuhopecten yessoensis (4x, 1490 - 2,016 kb) Mercenaria mercenaria 3 100 EF199621 Chlamys farreri (CFG1) 97 EF199622 Mizuhopecten yessoensis (PYG1) X17219 Bombyx, mori (Mag)

0.5

Fig. 3. Pairwise comparisons and the phylogenetic tree of SLEs provide evidence of cross-phyla HTT. (A) Amino acid sequence distances are plotted for all 7,738 pairwise comparisons of SLEs and are compared with the distances of the corresponding host using combined alignments from TopIIA and P-Type ATPase as host reference genes. A line with a slope of 1 is shown in red. (B) A phylogenetic tree made from the alignment of amino acid sequences from the RT-IN region amplified by conserved primers and from TBLASTN searches of the NCBI nucleotide database. The species of origin is listed, and bootstrap values above 50 are shown. Each hit was used to search the reference genome assembly of the species; the number of contigs with hits and the size range of contigs with hits are in parentheses. The sequences of SURL, CFG1, and PYG1 are included as known references, and Mag is included as an outgroup. Members of different phyla are color coded as shown (the reference, Mag, from the silkworm genome, is gray). Red dotted lines connect tips of the tree for 39 pairwise comparisons in which the SLE distance is lower than the distance of the conserved host genes. Diamonds mark the proposed cross-phyla HTT events that are supported by both phylogenetic evidence and pairwise comparisons.

Metzger et al. PNAS Latest Articles | 5of9 Downloaded by guest on September 30, 2021 elements as well as insertions and deletions, and none has a com- The most likely explanation is that four (or more) distinct elements plete intact ORF. Genomic DNA from zebrafish embryos from were transmitted to an ancestor of the two species before their three laboratory lines (AB, TL, and Casper) were assayed for the divergence (∼85–125 Mya) (20). presence of the three elements that contain two LTRs, using a PCR Many groups of closely related SLE sequences are present in primer in the SLE and a reverse primer in the flanking genomic completely different organisms, and pairwise analyses of SLE sequence. Two of the elements were confirmed to be present in all distances provide further evidence of multiple cross-phyla transfer three lines, and the third was found in the Casper but not in the AB events of closely related elements (Figs. 3 and 4). These high- or TL lines (Fig. S2). confidence HTT events include multiple events between bivalves While the phylogenetic analysis of the TBLASTN search and fish, between fish and echinoderms, and in the SLE-Ce4 shows closely related SLEs in very distantly related organisms, clade between species including bivalves, a coral, and a priapulid which likely represent HTT events (Fig. 3), we additionally an- worm. Notably, from the systematic search of the entire NCBI alyzed all pairwise comparisons of SLEs and their corresponding nonredundant nucleotide database and the wide variety of phyla hosts. Since mitochondrial mutations occur at variable rates in represented therein, all of the species found to harbor SLEs are different phyla, nuclear genes were used as a host comparison. aquatic, providing a plausible route of transfer of these elements. According to the OrthoDB, there are 147 groups of proteins that are present in all of the phyla with SLEs, and there are no Discussion proteins that are single copy in 80% or more of the species. We While there have been multiple reports of HT of LTR retro- chose two proteins (TopIIA and P-type ATPase) with the lowest transposons between species (6–8), the majority of these have number of duplicate genes as host references. Combining the been observed within Drosophila (21) and between plant species amino acid sequences of these two host reference proteins, we (22, 23). Additionally, many of the LTR retrotransposon HTT find 39 pairwise comparisons in which the SLE distance is events in Drosophila involve the transfer of Gypsy, a retroelement smaller than the distance of host genes. These comparisons mark which has acquired an envelope gene (24), making it functionally 12 high-confidence cross-phyla HTT events and six intraphyla a retrovirus despite its falling within the Ty3/gypsy lineage of HTT events. The host reference genes are highly conserved, so retrotransposons rather than within the vertebrate retroviruses. this is a very stringent criterion and likely significantly underes- Cross-phyla HTT events have been observed with DNA transpo- timates HTT. sons, but as recently as 2010, cross-phyla HT of retrotransposons Interestingly, there is a large clade of SLEs that is only found in had been rarely observed (8, 25), and it was suggested that DNA fish. This provides evidence either of vertical inheritance through transposons may be more suited to long jumps into highly di- fish evolution or of more restricted fish-specific transfer of this vergent hosts (7). Recently, however, there have been three reports transposon. The pairwise comparisons suggest that there have been of ancient “long-distance” HTT between plants and arthropods at least five cross-species HTT events from one fish genome to (26) and between birds and nematodes (27), and there is even a another (although we cannot exclude the possibility of an inter- case of Tcn1-like LTR retrotransposons present in both fungi and mediate vector). We also observed pairs of closely related SLE plants, suggesting an ancient transkingdom HTT event (28). An sequences from zebrafish (Danio rerio)andcarp(Cyprinus carpio) earlier report has shown that sequences nearly identical to Steamer appearing in four distinct conserved positions in the tree of SLEs. could be found in two bivalve species in addition to M. arenaria

Mollusca 86 Crassostrea gigas Chordata 100 Mizuhopecten yessoensis Cnidaria 100 Biomphalaria glabrata Echinodermata 50 Lingula anatina Priapulida Priapulus caudatus Brachiopoda Bombyx mori Hemichordata 88 Saccoglossus kowalevskii Porifera Strongylocentrotus purpuratus 100 Haplochromis burtoni 100 Maylandia zebra 60 Oreochromis niloticus 73 Cyprinodon variegatus 58 54 Nothobranchius furzeri 100 Oryzias latipes 89 Larimichthys crocea Hippocampus comes Salmo salar 100 100 Astyanax mexicanus Pygocentrus nattereri 100 100 Cyprinus carpio 100 Sinocyclocheilus anshuiensis Danio rerio Branchiostoma belcheri 100 Nematostella vectensis 100 Exaiptasia pallida 100 Montastraea faveolata Acropora digitifera Amphimedon queenslandica

Fig. 4. Summary of cross-phyla HTT events. A phylogenetic tree was made using the TopIIA and P-type ATPase amino acid sequences of the species identified as harboring SLEs in the TBLASTN search. Bars mark HTT events supported by both phylogenetic analysis and by a lower than expected SLE distance. Bar colors identify the likely SLE donor based on the current evidence; striped bars reflect cases in which the donor could not be predicted.

6of9 | www.pnas.org/cgi/doi/10.1073/pnas.1717227115 Metzger et al. Downloaded by guest on September 30, 2021 (16), and here we have expanded that study with broad and systematic quite recently. The 5′ and 3′ LTRs are identical in the one sequenced PNAS PLUS strategies which show that there has been widespread HTT of SLEs full-length endogenous element present in the soft-shell clam throughout bivalve evolution as well as across phyla throughout the genome. This argues for a recent origin, but the short length of marine environment. the LTR (177 bp) makes an accurate analysis of integration date Phylogenetic incongruence and a patchy distribution across using a molecular clock difficult. The finding of Steamer in both phylogeny provide evidence to support claims of HTT, but it has Pacific and European samples of M. arenaria suggests that the been argued that phylogenetic analysis can be misleading in some first transmission into the M. arenaria genome occurred at least cases (7, 29). Additionally, PCR amplification can be negative due 800 y ago, as M. arenaria is believed to have been brought to Europe to trivial point mutations in primer regions, so a negative result around the 1300s (33). The Steamer sequences from the M. arenaria may not necessarily mean that no element is present, and a patchy samples in this study and in 10 more studies published recently (16) distribution may not necessarily mean that HTT has occurred. are all nearly identical, again suggesting recent entry into the Therefore, we have additionally analyzed pairwise comparisons genome or a recent expansion of a single lineage within the species. of SLEs and host sequence to identify cases in which the SLE The SLEs in cockles appear to be recently acquired as well. None of distances are lower than expected based on host sequences, the four cockle SLEs was present in all members of the species, based on the methods of previous studies of HTT (30, 31). The showing that they have not reached fixation, and none was identi- data from all methods support our conclusions of cross-species and fied in C. glaucum, suggesting that the HTT events all occurred since cross-phyla HTT. These pairwise comparisons can be problematic the divergence of C. edule and C. glaucum. when the host genes are subject to variable mutation rates. For For the most recent HTT events identified here (Steamer trans- example, mollusks appear to have a higher mitochondrial mutation ferring from E. directus to M. arenaria and L. balthica, and SLEs from rate than other phyla, so the COI distance between two bivalves, C. edule transferring to M. edulis and E. directus), the geographic such as soft-shell clams and mussels, is larger than the distance ranges of the hosts clearly overlap. In one case, phylogenetic between soft-shell clams and vertebrates or even sponges, which is incongruity suggests HTT between bivalves M. trossulus and clearly at odds with the phylogenetic tree generated by these se- L. pellucida; while these samples were taken from very different quences and with known taxonomy. We therefore did not use COI geographic locations (Pacific and Atlantic coasts, respectively) sequences and instead used nuclear genes in the analysis of cross- the host range of M. trossulus is quite large, and it hybridizes with phyla HTT. The criteria used to define unexpectedly similar SLEs other members of the Mytilus , so there would be ample depend on the host genes selected. The two host genes selected for opportunity for close proximity that could allow for HTT. One our analysis of cross-phyla HTT are highly conserved, and they are interesting exception is the predicted HTT between a cockle EVOLUTION indeed among a small list of genes conserved across all of the phyla (C. edule) and a zebra mussel (D. polymorpha). The zebra mussel harboring SLEs. Therefore, their use as a measure of host distance is the only freshwater mussel in which we identified an SLE, but means that our analysis is very conservative and likely overlooks many they are quite invasive in many geographic areas, and the pres- true cases of HTT. Overall, data of all three types (phylogenetic ence of C. edule in estuaries may have allowed for transfer. The incongruence, patchy distribution, and more similar than expected SLEs in the two species are only 66% identical to each other, so transposon sequence) support multiple HTT events throughout this is not likely to be a recent HTT event, and may also have bivalves and across phyla. occurred in a marine ancestor of the zebra mussel or transferred One additional limitation of HTT detection strategies is that through an unknown intermediate host. they are unlikely to identify HTT between closely related species. While many of the species involved in HTT are in close proximity, This makes the detected number of HTT events a conservative the exact mechanism of these events remains unknown. Evidence estimate. Assuming that there are greater barriers to HTT into of HTT through host–parasite interactions, such as the ancient more divergent hosts, then a significant number of HTT events HTT between birds and nematodes, suggested that pathogenic may be occurring within species or within closely related species interactions might be particularly suitable for HTT (27, 34–36). in an undetectable manner. Notably, we did not observe any sequences corresponding to ob- Potentially, contamination of either the DNA samples used in vious parasites acting as vehicles for HTT, although this could be PCR or contamination of the genome assemblies themselves due to the limited availability of marine parasite sequence data. could lead to spurious evidence suggesting HTT, but several lines The widespread transmissible neoplasias in bivalves may provide a of evidence suggest this is not the case here. Each bivalve SLE potential vector for the spread of TEs. While most cancers arise sequence and TBLASTN hit was unique, and after the exclusion of within an individual as a result of oncogenic changes within cells of four hits from small contigs, the majority of the hits were found in that individual, transmissible clonal cancers that jump from one more than one contig, the majority were found in contigs of >1Mb host to another [first found in Tasmanian devils (37) and dogs (38, in length, and all had hits in contigs of at least 59 kb. 39)] have been found in an increasing number of species. Recently, While previously reported cross-phyla HTT events were ancient, we found independent lineages of bivalve transmissible neoplasia the identification of three recent HTT events in the current study in at least four marine bivalve species, showing that this is a broad of bivalves suggests that LTR retrotransposons, at least in the phenomenon across the kingdom and that marine inver- Steamer clade, are active and able to transfer relatively frequently. tebrates may be particularly susceptible (12, 13). It has also been One previous report of HT of a retrotransposon in the marine shown that the transmissible neoplastic cells of soft-shell clams environment involved SURL elements, which belong to the same express high levels of Steamer RNA (11, 40), and reverse tran- Mag family retrotransposons as the SLEs (32). In this case, in- scriptase activity can be detected in hemolymph and cell super- vestigators found that vertical inheritance predominated within natants (11, 41–43), suggesting that some products of Steamer are echinoid species but also found evidence of HTT in a few cases being released from neoplastic cells. LTR retrotransposons also between highly related species. While there is evidence for vertical generate virus-like particles (VLPs) within the cell, and it is pos- inheritance of many copies of SLEs, we find strong evidence that sible that some other virus or envelope-like membrane protein HTT plays a significant role across multiple phyla, and vertical in- from the cell could transpackage these VLPs. Such VLPs could, in heritance can explain only a small number of the bivalve SLE principle, introduce SLE sequences into germ cells and thus into sequences identified. the germ line. Limited searches by EM have not revealed the The timing of HTT events is difficult to determine accurately presence of VLPs in the M. arenaria neoplastic cells. with the sequence data available here. The lack of fixation of The consequences of the frequent HT of retrotransposons into SLEs in some species argues that those HTT events occurred new species are unknown. It has been hypothesized that intro- after the most recent species divergence and may have occurred duction of a new element into a species without a specific control

Metzger et al. PNAS Latest Articles | 7of9 Downloaded by guest on September 30, 2021 mechanism could be followed by rapid expansion (6), which could Species Confirmation. The mitochondrial COI sequence of each species was lead to pathogenesis. Indeed, the transmissible cancer lineage compared with sequences available on the NCBI database to confirm species’ identities. Of 44 samples from 36 bivalve species and one gastropod, we spreading throughout the soft-shell clam population has suffered a ≥ massive amplification of the Steamer retrotransposon. It is unknown amplified a COI sequence that was 99% identical to the sequence from the same species in the NCBI database in 32 samples (24 species). In eight species, whether any of the new integration events in the transmissible no other COI sequence was available from that species for comparison, and cancer lineage are drivers of oncogenesis, but insertional muta- the sequence was 70–85% identical to another member of the same genus. genesis of retroelements can cause oncogenic mutations (3, 44). In three samples from the AMCC, Limaria pellucida, Codakia orbiculata,and In the canine transmissible venereal tumor, a canine LINE1 element Retusa obtusa (a gastropod used as an outgroup for the bivalve class), the was found integrated immediately upstream of the c-myc gene (45– match to a currently existing NCBI COI sequence from the respective species 47), likely playing a significant role in the evolution of that trans- was 70–90%, with no clear identical sequences in the NCBI database. This missible cancer. Thus, the introduction of Steamer or SLEs into new suggests high diversity within the species or misclassification at the species species can lead to detrimental mutations and may increase the level of either the AMCC samples or the previously reported samples in the NCBI database. One sample, Sphaerium fabale, was 99% identical to a dif- incidence of both conventional and transmissible cancers, and it may ferent species, Sphaerium striatinum (no S. fabale COI sequence was avail- have significant effects on the evolution of these organisms. able), suggesting either that the two species are not distinct or that either The findings here suggest that the marine/aquatic environ- the AMCC samples or the previously reported samples had been misclassified ment may be particularly amenable to HTT due to the ability of at the species level. In each case, the species identification made by the particles to spread without the exposure to UV or the dry air of AMCC was used in the analysis. the terrestrial environment. A recent transcriptomic study of the Pacific white shrimp (48) identified TEs of multiple classes which BLAST Search and Alignment. A TBLASTN search was conducted on the NCBI appear to be derived by HTT from other aquatic organisms. nonredundant nucleotide database using the conserved 226-aa region of Most of the organisms found to harbor SLEs are marine, al- Steamer pol targeted by primers DHKPL-F1 and PXRPW-R1 (Table S2) and all additional bivalve SLE sequences identified in this study (July 6, 2017). Target though several freshwater fish were identified as well. It is un- sequences were identified with complete coverage across the region that clear whether the transmission of the SLEs in fish occurred in excludes the primer sequences themselves. Sequences with e values above the freshwater environment or in the marine environment of an 10−60 were excluded, exact duplicates were excluded, and where multiple ancestor of those fish. While it is not possible to directly compare similar sequences (>80% identity) were identified within a single species, a the rates of HTT in the marine environment with that between single representative was selected. Additionally, bivalve SLE sequences al- terrestrial organisms, the results of our broad sampling of dif- ready in the NCBI database, which were a part of the query set, and SURL ferent bivalve species combined with a systematic database search were excluded, leaving 102 unique sequences. (with very stringent criteria) argue that the LTR transposons in- Each of these 102 hits was used as query for a TBLASTN search of the current genome assembly of the species in which it was found, counting only vestigated here have spread widely, but only within the aquatic −60 hits with e values below 10 , with >80% coverage and >80% identity. The environment. The cross-species and cross-phyla HTT events reported number of hits in each genome assembly, the number of unique contigs here also suggest that this is both a recent phenomenon and one containing hits, and the sizes of the contigs were recorded. Four queries in that has been occurring throughout long evolutionary timescales. this secondary BLAST search were found in only a single contig <10 kb in Together, these data provide evidence to suggest that the aquatic length, and therefore they were excluded from analysis, leaving 98 unique environment itself may act as both the vehicle and ecological sequences, with 407 total hits throughout the genome assemblies. connection (49) that can allow the spread of TEs from one genomic reservoir into new germ lines. Phylogenetic Analysis. Sequences were aligned with MAFFT v 7.311 (51) using the E-INS-i method (as the sequences include a variable region between two Materials and Methods conserved regions). Some manual adjustments in DNA alignments were made based on the amino acid alignments. Additional alignments using Sample Sources. Bivalve tissue samples were obtained from several sources CLUSTAL yielded similar results. Primer-binding regions were excluded from including the AMCC of the AMNH, multiple independent research collections analysis. Maximum likelihood phylogenetic trees were generated using and previous studies (11, 13, 50), and commercial sources. Species collected, PhyML 3.0 (52), with nearest neighbor interchange tree improvement and collection locations, and sources of additional bivalve samples are listed in 100 bootstrap replicates, treating gaps in the alignment as missing data. For Table S1. Species names and classifications are used according to the World amino acid trees, frameshifts were also treated as missing data. One se- Register of Marine Species (www.marinespecies.org). quence contained a “J,” which can stand for a site that is ambiguous for either leucine or isoleucine. We therefore excluded it from analysis by DNA Extraction. Tissues were frozen or stored in ethanol. Samples from the marking it as a missing site. Amino acid trees used the LG model +G+I, and AMNH collection were extracted using the E.Z.N.A. Mollusc DNA Kit (Omega nucleotide trees used GTR +G+I, based on the Akaike Information Criterion Bio-tek). DNA extraction of other samples was done using the DNeasy Blood analysis of the full SLE amino acid alignment and the full DNA COI align- and Tissue Kit (Qiagen) with minor modification, as done previously. Briefly, ment. Single trees were visualized using FigTree version 1.4.2 or Dendro- DNA extraction of tissues included an additional step to reduce the amount scope 3.5.9 (53). The tanglegram was constructed using a Neighbor-Net μ of PCR-inhibiting polysaccharides. After tissue lysis, 63 L of buffer P3 heuristic, which optimizes branch crossings, implemented in Dendroscope (54). (Qiagen) was added to the lysate and allowed to precipitate for 5 min. The lysate was spun for 10 min at full speed at 4 °C, and the resulting super- Pairwise Analysis. For the within-bivalve analyses, MAFFT-aligned SLE and COI natant was mixed with buffer AL (Qiagen) for 10 min at 56 °C and then was sequences were used (as in phylogenetic analysis). A linear trendline was mixed with ethanol and added to the column, continuing with the standard made with an intercept of zero, and the interquartile range (IQR) of the protocol. distance from the trendline was calculated. A cutoff of 1.5 times the IQR wasusedtodetermineexpectedvalues. For analysis of the TBLASTN hits, PCR. Primers used are listed in Table S2. PfuUltra II Fusion HS DNA Polymerase OrthoDB (55) was used to identify two host genes likely to be found in (Agilent) was used to amplify 10–50 ng of genomic DNA for 35 cycles. PCR all taxa containing SLEs, with minimal duplications. TopIIA (group products were purified or gel extracted using spin columns (Qiagen), and EOG091G00U2) and P-Type ATPase (group EOG091G022E) genes were se- PCR products were either sequenced directly or cloned using the Zero Blunt lected. BLASTP searches of the relevant genome databases were used to TOPO Kit (Invitrogen) and sequenced with primers flanking the cloning site. identify the genes in species not included in OrthoDB, and BLASTP was used When multiple, nearly identical clones were sequenced from a single indi- to identify the top hit in cases of multiple genes and isoforms. In vertebrates, vidual, one representative was selected for analysis. The first clone sequenced the P-type ATPase is duplicated (atp7a and atp7b); atp7a was selected from with an ORF was selected, and the first clone with a frameshift or stop codon each species. Sequences were aligned with MAFFT as above. For bivalves, a was used if no clones had an ORF. An SLE sequence was determined to be a sequence was available only for C. gigas and Mizuhopectin yessoensis,sothe distinct element if it was <80% identical at the DNA sequence level and amino distance from C. gigas was used as a proxy value for all other bivalves for all acid level. PCR amplification was considered positive only if an SLE sequence cross-phyla comparisons. Strongylocentrotus purpuratus was also used as a was obtained. proxy for Tripneustes gratilla. Therefore, intrabivalve and intraechinoderm

8of9 | www.pnas.org/cgi/doi/10.1073/pnas.1717227115 Metzger et al. Downloaded by guest on September 30, 2021 comparisons were excluded from analysis. The F84 model was used for tions) for the samples of zebra mussels (D. polymorpha)andElliptio complanata PNAS PLUS computation of nucleic acid distances with DNADist, and the JTT model was collected from the Hudson River; Josh Barber (Institute of Comparative Medicine used for computation of amino acid distances with ProtDist, v3.6a2.1 and the Zebrafish Core Facility, Columbia University Medical Center) for (J. Felsenstein). embryos of zebrafish of multiple strains; Doug Rogers and Camille Speck (Washington State Department of Fish and Wildlife); Carly Strasser and Phillipe ACKNOWLEDGMENTS. We thank the investigators who collected the St-Onge for additional M. arenaria samples; and Daniel Huson for advice on samples for the AMCC and the investigators who collected the samples the use of Dendroscope. The work was supported by the Howard Hughes used in previous studies of bivalve transmissible neoplasia, including Antonio Medical Institute (S.P.G. and M.J.M.), NIH Training Grant T32 CA009503 (to Villalba, María J. Carballal, and David Iglesias (Xunta de Galicia, Spain), M.J.M.), and a Research Experiences for Undergraduates Site Grant from the James Sherry and Carol Reinisch (Environment Canada), and Annette F. Division of Biological Infrastructure of the National Science Foundation (to Muttray and Susan A. Baldwin (University of British Columbia, Canada); A.N.P.). The funding bodies had no part in the design of the study, collection, Denise Mayer (New York State Museum, Division of Research and Collec- analysis, interpretation of data, or writing of the paper.

1. Goodier JL, Kazazian HH, Jr (2008) Retrotransposons revisited: The restraint and re- 29. Capy P, Anxolabéhère D, Langin T (1994) The strange phylogenies of transposable habilitation of parasites. Cell 135:23–35. elements: Are horizontal transfers the only explantation? Trends Genet 10:7–12. 2. Goodier JL (2014) Retrotransposition in tumors and brains. Mob DNA 5:11. 30. Gilbert C, Hernandez SS, Flores-Benabib J, Smith EN, Feschotte C (2012) Rampant 3. Kemp JR, Longworth MS (2015) Crossing the line toward genomic instability: LINE- horizontal transfer of SPIN transposons in squamate reptiles. Mol Biol Evol 29: 1 retrotransposition in cancer. Front Chem 3:68. 503–515. 4. Xiong Y, Eickbush TH (1990) Origin and evolution of retroelements based upon their 31. Pace JK, 2nd, Gilbert C, Clark MS, Feschotte C (2008) Repeated horizontal transfer of a – reverse transcriptase sequences. EMBO J 9:3353 3362. DNA transposon in mammals and other tetrapods. Proc Natl Acad Sci USA 105: 5. Malik HS, Henikoff S, Eickbush TH (2000) Poised for contagion: Evolutionary origins of 17023–17028. the infectious abilities of invertebrate retroviruses. Genome Res 10:1307–1318. 32. Gonzalez P, Lessios HA (1999) Evolution of sea urchin retroviral-like (SURL) elements: 6. Schaack S, Gilbert C, Feschotte C (2010) Promiscuous DNA: Horizontal transfer of Evidence from 40 echinoid species. Mol Biol Evol 16:938–952. transposable elements and why it matters for eukaryotic evolution. Trends Ecol Evol 33. Strasser M (1998) Mya arenaria—An ancient invader of the North Sea coast. Helgol 25:537–546. Meeresunters 52:309–324. 7. Wallau GL, Ortiz MF, Loreto EL (2012) Horizontal transposon transfer in eukarya: 34. Panaud O (2016) Horizontal transfers of transposable elements in eukaryotes: The Detection, bias, and perspectives. Genome Biol Evol 4:689–699. – 8. Dotto BR, et al. (2015) HTT-DB: Horizontally transferred transposable elements da- flying genes. CRBiol339:296 299. tabase. Bioinformatics 31:2915–2917. 35. Wijayawardena BK, Minchella DJ, DeWoody JA (2013) Hosts, parasites, and horizontal 9. Ivancevic AM, Walsh AM, Kortschak RD, Adelson DL (2013) Jumping the fine LINE gene transfer. Trends Parasitol 29:329–338. between species: Horizontal transfer of transposable elements in animals catalyses 36. Gilbert C, Schaack S, Pace JK, 2nd, Brindley PJ, Feschotte C (2010) A role for host- genome evolution. BioEssays 35:1071–1082. parasite interactions in the horizontal transfer of transposons across phyla. Nature EVOLUTION 10. Walsh AM, Kortschak RD, Gardner MG, Bertozzi T, Adelson DL (2013) Widespread 464:1347–1350. horizontal transfer of retrotransposons. Proc Natl Acad Sci USA 110:1012–1016. 37. Pearse AM, Swift K (2006) Allograft theory: Transmission of devil facial-tumour dis- 11. Arriagada G, et al. (2014) Activation of transcription and retrotransposition of a novel ease. Nature 439:549. retroelement, Steamer, in neoplastic hemocytes of the mollusk Mya arenaria. Proc 38. Murgia C, Pritchard JK, Kim SY, Fassati A, Weiss RA (2006) Clonal origin and evolution Natl Acad Sci USA 111:14175–14180. of a transmissible cancer. Cell 126:477–487. 12. Metzger MJ, Reinisch C, Sherry J, Goff SP (2015) Horizontal transmission of clonal 39. Rebbeck CA, Thomas R, Breen M, Leroi AM, Burt A (2009) Origins and evolution of a – cancer cells causes leukemia in soft-shell clams. Cell 161:255 263. transmissible cancer. Evolution 63:2340–2349. 13. Metzger MJ, et al. (2016) Widespread transmission of independent cancer lineages 40. Siah A, McKenna P, Danger JM, Johnson GR, Berthe FC (2011) Induction of trans- – within multiple bivalve species. Nature 534:705 709. posase and polyprotein RNA levels in disseminated neoplastic hemocytes of soft-shell 14. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplifica- clams: Mya arenaria. Dev Comp Immunol 35:151–154. tion of mitochondrial cytochrome c oxidase subunit I from diverse metazoan inver- 41. AboElkhair M, et al. (2009) Reverse transcriptase activity associated with haemic tebrates. Mol Mar Biol Biotechnol 3:294–299. neoplasia in the soft-shell clam Mya arenaria. Dis Aquat Organ 84:57–63. 15. Zhang G, et al. (2012) The oyster genome reveals stress adaptation and complexity of 42. AboElkhair M, et al. (2009) Reverse transcriptase activity in tissues of the soft shell shell formation. Nature 490:49–54. clam Mya arenaria affected with haemic neoplasia. J Invertebr Pathol 102:133–140. 16. Paynter AN, Metzger MJ, Sessa JA, Siddall ME (2017) Evidence of horizontal trans- 43. House ML, Kim CH, Reno PW (1998) Soft shell clams Mya arenaria with disseminated mission of the cancer-associated Steamer retrotransposon among ecological cohort – bivalve species. Dis Aquat Organ 124:165–168. neoplasia demonstrate reverse transcriptase activity. Dis Aquat Organ 34:187 192. 17. Combosch DJ, et al. (2017) A family-level tree of life for bivalves based on a Sanger- 44. Shukla R, et al. (2013) Endogenous retrotransposition activates oncogenic pathways – sequencing approach. Mol Phylogenet Evol 107:191–208. in hepatocellular carcinoma. Cell 153:101 111. 18. Plazzi F, Passamonti M (2010) Towards a molecular phylogeny of mollusks: Bivalves’ 45. Katzir N, Arman E, Cohen D, Givol D, Rechavi G (1987) Common origin of trans- early evolution as revealed by mitochondrial genes. Mol Phylogenet Evol 57:641–657. missible venereal tumors (TVT) in dogs. Oncogene 1:445–448. 19. Kano Y, Kimura S, Kimura T, Warén A (2012) Living monoplacophora: Morphological 46. Katzir N, et al. (1985) “Retroposon” insertion into the cellular oncogene c-myc in conservatism or recent diversification? Zool Scr 41:471–488. canine transmissible venereal tumor. Proc Natl Acad Sci USA 82:1054–1058. 20. Hedges SB, Marin J, Suleski M, Paymer M, Kumar S (2015) Tree of life reveals clock-like 47. Choi YK, Kim CJ (2002) Sequence analysis of canine LINE-1 elements and p53 gene in speciation and diversification. Mol Biol Evol 32:835–845. canine transmissible venereal tumor. J Vet Sci 3:285–292. 21. Bartolomé C, Bello X, Maside X (2009) Widespread evidence for horizontal transfer of 48. Wang X, Liu X (2016) Close ecological relationship among species facilitated hori- transposable elements across Drosophila genomes. Genome Biol 10:R22. zontal transfer of retrotransposons. BMC Evol Biol 16:201. 22. El Baidouri M, et al. (2014) Widespread and frequent horizontal transfers of trans- 49. Venner S, et al. (2017) Ecological networks to unravel the routes to horizontal posable elements in plants. Genome Res 24:831–838. transposon transfers. PLoS Biol 15:e2001536. 23. Roulin A, et al. (2009) Whole genome surveys of rice, maize and sorghum reveal 50. Strasser CA, Barber PH (2009) Limited genetic variation and structure in softshell multiple horizontal transfers of the LTR-retrotransposon Route66 in Poaceae. BMC clams (Mya arenaria) across their native and introduced range. Conserv Genet 10: Evol Biol 9:58. 803–814. 24. Song SU, Gerasimova T, Kurkulos M, Boeke JD, Corces VG (1994) An env-like protein 51. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: encoded by a Drosophila retroelement: Evidence that gypsy is an infectious retrovirus. Improvements in performance and usability. Mol Biol Evol 30:772–780. Genes Dev 8:2046–2057. 52. Guindon S, et al. (2010) New algorithms and methods to estimate maximum- 25. Konieczny A, Voytas DF, Cummings MP, Ausubel FM (1991) A superfamily of Arabi- likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol 59: dopsis thaliana retrotransposons. Genetics 127:801–809. – 26. Lin X, Faridi N, Casola C (2016) An ancient transkingdom horizontal transfer of 307 321. Penelope-like retroelements from arthropods to conifers. Genome Biol Evol 8: 53. Huson DH, Scornavacca C (2012) Dendroscope 3: An interactive tool for rooted phy- – 1252–1266. logenetic trees and networks. Syst Biol 61:1061 1067. 27. Suh A, et al. (2016) Ancient horizontal transfers of retrotransposons between birds 54. Scornavacca C, Zickmann F, Huson DH (2011) Tanglegrams for rooted phylogenetic and ancestors of human pathogenic nematodes. Nat Commun 7:11396. trees and networks. Bioinformatics 27:i248–i256. 28. Novikova O, Smyshlyaev G, Blinov A (2010) Evolutionary genomics revealed inter- 55. Zdobnov EM, et al. (2017) OrthoDB v9.1: Cataloging evolutionary and functional kingdom distribution of Tcn1-like chromodomain-containing Gypsy LTR retro- annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs. Nucleic transposons among fungi and plants. BMC Genomics 11:231. Acids Res 45:D744–D749.

Metzger et al. PNAS Latest Articles | 9of9 Downloaded by guest on September 30, 2021