
Vassetzky et al. Mobile DNA (2021) 12:10 https://doi.org/10.1186/s13100-021-00238-y

RESEARCH Open Access

New Ther1-derived SINE Squam3 in scaled reptiles

Nikita S. Vassetzky1,2*, Sergei A. Kosushkin2, Vitaly I. Korchagin1 and Alexey P. Ryskov1

Abstract
Background: SINEs comprise a significant part of genomes and are used to study the evolution of diverse taxa. Despite significant advances in SINE studies in vertebrates and higher eukaryotes in general, their own evolution is poorly understood.
Results: We have discovered and described in detail a new Squam3 SINE specific for scaled reptiles (Squamata). The subfamilies of this SINE demonstrate different distribution in the genomes of squamates, which together with the data on similar SINEs in the tuatara allowed us to propose a scenario of their evolution in the context of reptilian evolution.
Conclusions: Ancestral SINEs preserved in small numbers in most genomes can give rise to taxa-specific SINE families. Analysis of this aspect of SINEs can shed light on the history and mechanisms of SINE variation in reptilian genomes.
Keywords: SINEs, Retrotransposons, Squamata, Reptilia, Evolution

* Correspondence: [email protected]
1Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
2Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background
Genomes are invaded by various repetitive elements, the most abundant of which (at least in higher eukaryotes) are Long and Short INterspersed Elements (LINEs and SINEs, respectively). The amplification cycle of these retrotransposons includes the transcription of their genomic copies, reverse transcription and integration into the genome. LINEs rely on the transcription by the cellular RNA polymerase II, while reverse transcription and integration are fulfilled by their own enzymes. SINEs do not encode any enzymes and employ the cell machinery for their transcription by RNA polymerase III (pol III) and the machinery of their partner LINE for their reverse transcription and integration into chromosomes. Accordingly, SINEs have pol III promoters for transcription and sequences recognized by the enzymes of their partner LINE for reverse transcription/integration.

A typical SINE consists of the head derived from one of the cellular RNA (tRNA, 7SL RNA, or 5S RNA); the body, the terminal part of which is recognized by the partner reverse transcriptase (RT); and the tail, a stretch of simple repeats. There are variations; certain SINEs have no body or their body contains sequences of unknown origin and function (some of them called central domains) that are shared between otherwise unrelated SINE families, etc. [1].

LINEs are found in the genomes of all higher eukaryotes. Clearly, SINEs cannot exist without LINEs but not vice versa; there are rare genomes that have LINEs but lack SINEs (e.g., Saccharomyces or Drosophila). During evolution, LINE (sub)families can become inactive and their partner SINEs also cease to amplify. If another LINE family becomes active in a particular genome, replacement of the sequence recognized by its RT can reanimate a SINE [2]. Usually, a genome harbors one or several SINE families; some of them can be inactive and were amplified in the ancestors. The analysis of SINE variation in different taxa allows us to use them as reliable phylogenetic markers [3, 4].

The main lineages of the reptile-bird clade are scaled reptiles (Squamata), tuatara (Rhynchocephalia), turtles (Testudines), crocodiles (Crocodilia), and birds (Aves). Squamata, the largest order of reptiles, include the following major lineages: Serpentes (snakes), Iguania (including iguanids, agamids, ), , , , , and . Phylogenetic relations among squamate reptiles are highly controversial due to the conflicting signals provided by molecular, morphological, and paleontological data. Together with tuatara, the only extant representative species of Rhynchocephalia, they form monophyletic superorder Lepidosauria, which is the sister group to Archelosauria, the clade that contains archosaurs (crocodiles and birds) and turtles [5].

The first reptile SINE was found in 1990 in the Chinese pond turtle [6]; currently, we know approximately ten SINE families in reptiles [1] with a different taxonomic distribution, e.g., Cry is limited to turtles and degraded copies of AmnSINE, which was active in the ancestor of amniotes [7], can be found far beyond reptiles. Another example is Ther1 initially described as a mammalian SINE (MIR) but renamed later [8, 9]. Several known Ther1/MIR subfamilies (MIRb, MIRc, and MIR_Testu) have minor differences from Ther1 except the Alligator mississippiensis’s MIR1_AMi with an extended deletion. Moreover, active Ther1/MIR SINEs were found in non-avian reptiles, so ample and diverse derived SINEs could be expected in their genomes [10]. This is further corroborated by active diversification of reptilian L2 [11].

Despite active sequencing of genomes of various species of lizards and snakes, no detailed comparative genomic studies of a SINE family in different taxa at the order level are available. We discovered a new SINE named Squam3 in the genomes of Darevskia and Anolis lizards. Further analysis demonstrated their distribution throughout squamates; a similar SINE was found in the tuatara [12] but not in other reptiles or birds. However, Squam3 remained unnoticed in almost 40 genomes of squamates. Here, we analyzed the structure, distribution, and evolution of Squam3 and its relatives.

Results
Squam3 identification
The consensus sequence of Darevskia Squam3 was used to search the genomes of scaled reptiles. It was found in all sequenced genomes (as well as in a variety of GenBank sequences of squamate species whose genomes have not been sequenced; Table S1). No Squam3 was found beyond Squamata (see below). The analysis of their consensus sequences has revealed three major subfamilies that we called Squam3A, Squam3B, and Squam3C.

Squam3 structure
Squam3 is a typical SINE [13] composed of the tRNA-derived head, the body with a central domain and the 3′-terminus matching that of the partner LINE, and the tail, a stretch of several simple repeats. The consensus sequences range from 218 to 239 nt (without tail). There is no clear preference for a particular tRNA species (which is not uncommon among SINEs).

The body is similar to a fragment of the CORE central domain; the pronounced similarity spans over 28 nt (double-overlined in Fig. 1). There is also a similarity with the very 3′-terminus of LINEs of the L2 clade identified in Darevskia valentini (data not shown) and a less pronounced similarity with L2 LINEs of Anolis carolinensis (L2-26_ACar and L2-24_ACar in Repbase).

The tail of Squam3 is largely composed of (TAAA)n or (CTT)n; however, certain species have (GTT)n, (ATT)n, or poly(A) (Table 1). Squam3 has a very low rate of target site duplications. This is unusual but not exceptional among SINEs and can point to an alternative cleavage pattern in different DNA strands by the partner LINE endonuclease [13].

Squam3 subfamilies
Genomic copies of SINEs are subject to random mutations; accordingly, single-nucleotide mutations can be used to identify subfamilies only for highly conserved SINEs. We use extended insertions/deletions to distinguish between the three major Squam3 subfamilies designated as Squam3A, Squam3B, and Squam3C (Fig. 1). Squam3B has a characteristic 11-nt insertion (marked in pink in Fig. 1), and Squam3C has a characteristic 7-nt insertion (marked in blue in Fig. 1). There are also minor differences between the Squam3 subfamilies. In addition, there are sub-subfamilies; one of these (Squam3B3) has become a major variant in the two Gekkonidae species.

Further analysis of Squam3-related sequences in the tuatara genome has revealed a similar SINE (tuaMIRa) with a 32-nt insertion (marked in amaranth in Fig. 1). This insertion restores the CORE central domain and makes the element similar to Ther1 (MIR). It should be noted that this deletion in Squam3 and tuaMIRc relative to Ther1 is distinct from the deletion in MIR1_AMi (Fig. S2A). TuaMIR SINEs also have an 8–13-nt deletion in the LINE-derived region (marked in violet in Fig. 1). Moreover, another element (tuaMIRb) with a similar insertion lacks the ~40-nt region between the CORE and the LINE-derived region conserved in other Squam3-

and Ther1-related SINEs but has a much longer L2 LINE-derived region due to the 77-nt insertion (marked in mango in Fig. 1). The sequences of these tuatara SINE families were recently reported [12] but only the relation to MIR (former name of Ther1) and the mean divergence of all Ther1-related sequences were mentioned.

Fig. 1 Sequence alignment of Squam3 subfamilies of squamate reptiles with tuatara tuaMIR SINEs and Ther1. The tRNA-derived region, CORE central domain, LINE-derived region, and tail are indicated above the sequences. See text for other explanations.

Apart from that, Squam3 subfamilies differ by the tail, which is largely (TAAA)n in Squam3A/C or (CTT)n in Squam3B. The mean sequence similarity also differs between subfamilies: it peaks in Squam3B (up to 94%) but is lower in Squam3C (~63%) and Squam3A (54–63%).

Figure 2 visualizes the diversity of Squam3 in the genomes of lizards, snakes, and tuatara. Squam3C in most snake species demonstrates little variation between species; this contrasts with the diversity within Squam3A and Squam3B subfamilies. The tuatara SINEs clearly constitute a cluster separate from Ther1.

The number of Squam3 full-length copies varied over a wide range: from ~500 in Anolis carolinensis to ~260,000 in Gekko japonicus (0.005 and 2.55% of the genomes by length, respectively) (Fig. 3). The mean similarity of Squam3 subfamilies in most species is 60–65% with the notable exceptions of Squam3B (~90%) and Squam3A in Iguania (53%).

Distribution of Squam3 in reptile genomes
We next searched for the consensus sequences of Squam3 subfamilies in genomes of squamates and neighboring taxa. Overall, the genomes of 38 squamates, tuatara, turtle (Trachemys scripta elegans), crocodile (Crocodylus porosus), and bird (Gallus gallus) were analyzed. Squam3 was found in all squamates but neither in other reptiles nor in birds (Table 1). Similar SINE families were found in the tuatara (Sphenodon punctatus). When this work was in progress, Gemmell et al. [12] reported these SINEs, so we use their nomenclature of tuatara SINEs.

The genomes of Gekkota and Lacertoidea (Gekkonidae, Eublepharidae, Lacertidae, and Teiidae families) had both Squam3A and Squam3B subfamilies in similar proportions (although the proportion of Squam3A could be occasionally as low as 12%). Snakes had the Squam3C subfamily except for the python, which had 43% Squam3A. The rest of the squamates (Shinisauridae, Anguidae, Varanidae, Agamidae, and Dactyloidae families) had the Squam3A subfamily alone (Table 1). The analysis of individual NCBI sequences of squamate species not listed in Table 1 largely confirms this pattern except that a few highly divergent Squam3A sequences were found in three more snake families (, Lamprophiidae, and ) (Table S1). We specifically searched for Squam3A in one of the advanced snakes (Vipera berus), and found ~330 copies.

The tuatara (Sphenodontidae) has a set of tuaMIR families related to Squam3 and Ther1. Thus, we specifically searched for these sequences in the genomes of Squamata. No tuaMIRb or tuaMIRc were found, while minor tuaMIRa quantities exist in all squamate genomes analyzed ranging from a single full-length copy to ~500 (in Shinisaurus crocodilurus) (Table S2). All snakes have a single tuaMIRa copy in the same genomic locus (as judged by very similar flanking regions).

Squam3 and other similar CORE SINEs
We compared Squam3 with tuaMIR and other CORE-containing SINEs of vertebrates. While the 5′-sequences of all COREs are similar, the characteristic deletion (marked in amaranth in Fig. 1) distinguishes all Squam3 and tuaMIRc from other SINEs (Fig. S2C).

Table 1 Squam3 SINE in scaled reptiles. Major subfamilies are described by the proportion and estimated number of full-length copies, the mean sequence similarity, and the tail repeat unit.
Certain parameters of genome assemblies are given in the right columns (the level column indicates chromosome-, scaffold-, or contig-level assembly).

| Taxon | Lineage | Family | Species | Subfamily | Copies | Length w/o tail, nt | Similarity | Tail | Level | 'N' | N50 | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Squamata | Gekkota | Gekkonidae | Gekko japonicus | 3A (21%) | 54,829 | 224 | 60% | (TAAA)n | | 4% | 707,733 | [14] |
| | | | | 3B (10%) | 26,109 | 238 | 75% | (CTT)n | | | | |
| | | | | 3B3 (69%) | 180,151 | 271 | 81% | (CTT)n | | | | |
| | | | Paroedura picta | 3A (12%) | 17,761 | 221 | 57% | (TAAA)n | | 9% | 4,106,116 | [15] |
| | | | | 3B (19%) | 28,122 | 238 | 61% | (CTT)n | | | | |
| | | | | 3B3 (57%) | 84,367 | 267 | 74% | (CTT)n | | | | |
| | | Eublepharidae | Eublepharis macularius | 3A (50%) | 68,299 | 224 | 60% | (TAAA)n | | 2% | 663,762 | [16] |
| | | | | 3B (50%) | 63,045 | 239 | 85% | (GTT)n | | | | |
| | Lacertoidea | Lacertidae | Darevskia valentini | 3A (48%) | 25,848 | 218 | 63% | (TAAA)n | | 16% | 658,539 | [17] |
| | | | | 3B (52%) | 28,003 | 238 | 91% | (CTT)n | | | | |
| | | | Lacerta agilis | 3A (16%) | 17,446 | 219 | 62% | (TAAA)n | | 0% | 86,565,987 | [18] |
| | | | | 3B (84%) | 91,590 | 238 | 92% | (CTT)n | | | | |
| | | | Lacerta bilineata | 3A (39%) | 24,123 | 219 | 63% | (TAAA)n | | 0% | 368,212 | [19] |
| | | | | 3B (61%) | 37,732 | 238 | 75% | (CTT)n | | | | |
| | | | Lacerta viridis | 3A (39%) | 24,836 | 219 | 61% | (TAAA)n | | 0% | 662,519 | |
| | | | | 3B (61%) | 38,847 | 238 | 94% | (CTT)n | | | | |
| | | | Podarcis muralis | 3A (35%) | 16,967 | 220 | 61% | (TAAA)n | | 0% | 92,398,148 | [20] |
| | | | | 3B (65%) | 31,440 | 238 | 88% | (CTT)n | | | | |
| | | | Zootoca vivipara | 3A (38%) | 10,036 | 220 | 61% | (TAAA)n | | 3% | 92,810,032 | [21] |
| | | | | 3B (62%) | 16,374 | 238 | 89% | (CTY)n | | | | |
| | | Teiidae | Salvator merianae | 3A (53%) | 4,892 | 221 | 54% | (TAAA)n | | 2% | 55,382,274 | [22] |
| | | | | 3B (47%) | 4,338 | 234 | 85% | (CTT)n | | | | |
| | Serpentes | | Pantherophis guttatus | 3C | 12,936 | 226 | 64% | (TAAA)n | | 5% | 16,790,024 | [23] |
| | | | Pantherophis obsoletus | 3C | 12,961 | 226 | 63% | (TAAA)n | | 3% | 14,519,768 | [24] |
| | | | Ptyas mucosa | 3C | 19,524 | 226 | 63% | (TAAA)n | | 3% | 15,963,960 | [25] |
| | | | Thamnophis elegans | 3C | 16,934 | 226 | 64% | (TAAA)n | | 0% | 440,193 | [26] |
| | | | Thamnophis sirtalis | 3C | 12,410 | 226 | 63% | (TAAA)n | | 21% | 647,592 | [27] |
| | | | Thermophis baileyi | 3C | 15,914 | 228 | 65% | (TAAA)n | | 8% | 2,413,955 | [28] |
| | | Elapidae | Emydocephalus ijimae | 3C | 15,782 | 226 | 62% | (TAAA)n | | 0% | 18,937 | [29] |
| | | | Hydrophis cyanocinctus | 3C | 15,094 | 226 | 64% | (TAAA)n | | 4% | 7,437 | [30] |
| | | | Hydrophis hardwickii | 3C | 15,782 | 228 | 62% | (TAAA)n | | 4% | 5,391 | [31] |
| | | | Hydrophis melanocephalus | 3C | 14,271 | 226 | 63% | (TAAA)n | | 11% | 59,810 | [29] |
| | | | Laticauda colubrina | 3C | 19,118 | 226 | 63% | (TAAA)n | | 13% | 3,139,541 | |
| | | | Laticauda laticaudata | 3C | 27,835 | 226 | 61% | (TAAA)n | | 0% | 39,330 | |
| | | | Naja naja | 3C | 10,813 | 226 | 64% | (TAAA)n | | 6% | 224,088,900 | [32] |
| | | | Notechis scutatus | 3C | 27,122 | 226 | 63% | (TAAA)n | | 5% | 5,997,050 | [33] |
| | | | Ophiophagus hannah | 3C | 11,613 | 226 | 63% | (TAAA)n | | 13% | 241,519 | [34] |
| | | | Pseudonaja textilis | 3C | 17,187 | 226 | 65% | (TAAA)n | | 2% | 14,685,528 | [35] |
| | | Pythonidae | Python bivittatus | 3A (43%) | 9,349 | 221 | 58% | (TAAA)n | | 4% | 213,970 | [36] |
| | | | | 3C (57%) | 12,393 | 237 | 75% | (A)n | | | | |
| | | Viperidae | Crotalus horridus | 3C | 15,006 | 226 | 63% | (TAAA)n | | 12% | 23,829 | [37] |
| | | | Crotalus pyrrhus | 3C | 15,556 | 226 | 64% | (TAAA)n | | 0% | 5,299 | [38] |
| | | | Crotalus viridis viridis | 3C | 18,694 | 226 | 63% | (TAAA)n | | 6% | 179,897,795 | [39] |
| | | | Protobothrops flavoviridis | 3C | 20,667 | 226 | 64% | (TAAA)n | | 3% | 467,050 | [40] |
| | | | Protobothrops mucrosquamatus | 3C | 20,184 | 228 | 64% | (TAAA)n | | 8% | 424,052 | [41] |
| | | | Vipera berus berus | 3C | 19,964 | 226 | 64% | (TAAA)n | | 14% | 126,452 | [42] |
| | Shinisauria | Shinisauridae | Shinisaurus crocodilurus | 3A | 165,288 | 225 | 58% | (TAAA)n | | 8% | 1,469,749 | [43] |
| | Anguimorpha | Anguidae | Dopasia gracilis | 3A | 35,118 | 225 | 71% | (TAAA)n | | 3% | 1,273,270 | [44] |
| | Varanoidea | Varanidae | Varanus komodoensis | 3A | 108,651 | 229 | 66% | (TAAA)n | | 1% | 23,831,982 | [45] |
| | Iguania | Agamidae | Pogona vitticeps | 3A | 4,542 | 221 | 53% | (TAAA)n | | 4% | 2,290,546 | [46] |
| | | Dactyloidae | Anolis carolinensis | 3A | 457 | 217 | 54% | (TAA)n | | 5% | 150,641,573 | [47] |
| Rhynchocephalia | | | Sphenodon punctatus | none | | | | | | 10% | 3,052,611 | [12] |
| Testudines | | | Trachemys scripta elegans | none | | | | | | 1% | 147,425,149 | [48] |
| Crocodilia | | | Crocodylus porosus | none | | | | | | 5% | 84,437,661 | [49] |
| Aves | | | Gallus gallus | none | | | | | | 1% | 91,315,245 | [50] |
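Methods describes keeping genomic hits with at least 65% identity and 90% length overlap with the consensus, and Table 1 reports each major subfamily as a share of the full-length copies. The following is a minimal sketch of that bookkeeping, not the authors' Perl scripts: the `Hit` record, its fields, and all function names are hypothetical and introduced only for illustration. The example input uses the Gekko japonicus copy numbers from Table 1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate genomic copy (hypothetical record, not the authors' format)."""
    subfamily: str      # e.g. "3A", "3B", "3B3"
    identity: float     # fraction of identical aligned positions, 0..1
    aligned_len: int    # number of consensus positions covered by the hit
    consensus_len: int  # consensus length without tail, nt


def is_full_length(hit, min_identity=0.65, min_overlap=0.90):
    """Apply the thresholds quoted in Methods: >=65% identity, >=90% length overlap."""
    overlap = hit.aligned_len / hit.consensus_len
    return hit.identity >= min_identity and overlap >= min_overlap


def count_copies(hits):
    """Count accepted copies per subfamily."""
    return Counter(h.subfamily for h in hits if is_full_length(h))


def proportions(copies):
    """Share of each subfamily among all full-length copies, as whole percentages."""
    total = sum(copies.values())
    return {name: round(100 * n / total) for name, n in copies.items()}


# Copy numbers for Gekko japonicus as listed in Table 1.
gekko = {"3A": 54829, "3B": 26109, "3B3": 180151}
print(proportions(gekko))
```

With these inputs the computed shares match the 21%, 10%, and 69% given for Gekko japonicus in Table 1.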

Fig. 2 Unrooted NJ tree of consensus sequences of Squam3 and tuaMIR SINEs
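Fig. 2 is a neighbor-joining (NJ) tree, which the authors built in MEGA. For readers unfamiliar with the method, the following is a minimal sketch of the textbook NJ procedure on a toy distance matrix. It is not MEGA's implementation and it omits bootstrapping and the "partial deletion" treatment of gaps; the labels and distances are made up for illustration and are not Squam3 data.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the list of joins made by neighbor joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Each join is (node_a, node_b, branch_a, branch_b); the last entry connects
    the two remaining nodes, with the whole remaining distance in branch_a.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of every node from all other nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q value.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        branch_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d_ab - branch_a
        new = f"({a},{b})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d_ab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        joins.append((a, b, branch_a, branch_b))
    a, b = nodes
    joins.append((a, b, d[frozenset((a, b))], 0.0))
    return joins


# Toy five-taxon distance matrix (illustrative values only).
labels = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy = {frozenset(k): v for k, v in pairs.items()}
for join in neighbor_joining(labels, toy):
    print(join)
```

In practice the input distances would come from pairwise comparison of the aligned consensus sequences, and the resulting unrooted topology is what a figure such as Fig. 2 displays.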

Discussion
One of the most intriguing aspects of SINEs is how they emerged and evolved. This study gives us a unique opportunity to trace this for a single SINE family in a very wide range of taxa. The Squam3 SINE was found in scaled reptiles (Squamata) but not in the tuatara (Rhynchocephalia) and further lineages including crocodiles, birds, and turtles. We found three major subfamilies distinguished by relatively long insertions/deletions (Squam3A, Squam3B, and Squam3C). They also differ by the number of copies and the mean sequence similarity, which points to the age of a SINE subfamily (to be precise, to the time of its amplification) since SINE genomic copies are not subject to selective pressure and gradually accumulate mutations with time.

Evolution of Squam3
Overall, presumably there was a small pool (a few hundred?) of not very active Squam3A in the genomes of ancestral Squamata. In some lineages (Shinisauridae and Varanidae), Squam3A amplified quite actively without significant sequence modifications (to reach ~165,000 copies in Shinisaurus crocodilurus; the number of Squam3 copies was higher only in Gekko japonicus, whose genome is about twice as large). Squam3A amplification was also active in Anguidae (~35,000 copies in Dopasia gracilis) but it started relatively recently considering the high mean similarity (71%) of the SINE sequences in this legless lizard. On the contrary, Squam3A gradually declined in Agamidae (~4500 copies and 53% mean similarity in Pogona vitticeps). Finally, Squam3A ceased to propagate (and evolve) in Dactyloidae (< 500 copies in Anolis carolinensis).

While other Squam3 subfamilies emerged in squamate lineages, Squam3A continued to amplify in Gekkota and Lacertoidea (from ~5000 to ~65,000 copies) but not in snakes (except primitive ones, ~9000 in Python bivittatus). We could find only ~300 copies in Vipera berus; individual copies were also found in non-genomic sequences of four other snake families (Table S1).

After Squam3A declined in the Gekkota and Lacertoidea, their genomes gave rise to the Squam3B subfamily. It is arguably the youngest Squam3 subfamily. Amazingly, the mean similarity of Squam3B is very high in Lacerta agilis (92%) and L. viridis (94%) but as low as 75% in L. bilineata. This indicates that Squam3B is likely active in L. viridis and L. agilis but not in L. bilineata representing the same genus. In Gekkonidae, the more prolific Squam3B3 sub-subfamily emerged (~180,000 copies in Gekko japonicus, which is the highest number of all Squam3 subfamilies). For some reason, the activity of both Squam3A and Squam3B was low in Teiidae (Salvator merianae) but still, Squam3B amplified later than Squam3A.

The Squam3C subfamily is limited to snakes; moreover, it is the only major subfamily in most snakes.

Fig. 3 Schematic distribution of Squam3 SINEs in Squamata lineages studied. Colored band lengths are proportional to their copy numbers in genomes and the numbers above indicate the mean similarity of individual copies. If more than one species was available, the mean values are given
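The mean similarity values in Fig. 3 and Table 1 were, according to Methods, determined with the alistat program for 100 randomly selected sequences (or all available if fewer). The sketch below shows one plausible reading of such a statistic, the average pairwise identity over a random sample of aligned sequences. It is not alistat's code: the gap handling (columns where either sequence has a gap are skipped), the fixed random seed, and the function names are assumptions made for illustration.

```python
import random
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def mean_similarity(aligned, sample_size=100, seed=0):
    """Mean pairwise identity of up to `sample_size` randomly chosen aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    rng = random.Random(seed)
    if len(aligned) <= sample_size:
        sample = list(aligned)  # use all available sequences
    else:
        sample = rng.sample(list(aligned), sample_size)
    pairs = list(combinations(sample, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (made-up sequences, equal length, "-" marks a gap).
toy_alignment = ["ACGT-A", "ACGTTA", "ACCTTA"]
print(f"{mean_similarity(toy_alignment):.3f}")
```

Because mutations accumulate in SINE copies over time, a lower value of this kind of statistic is read in the text as an older amplification and a higher value as a more recent one.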

Squam3A quantities were probably present in all squamates but did not propagate in most snakes. Instead, the Squam3C in advanced snakes (Caenophidia) became active slightly later or in the same period of time (the mean Squam3C similarity is 61–65% vs. 51–71% in Squam3A). This pattern is not true for Python bivittatus representing more primitive snakes, where the amplification of Squam3A was followed by that of Squam3C (with the mean similarities of 58 and 75%, respectively).

Origin of Squam3
We were very excited to find what is called the “missing link” of Squam3 evolution in the tuatara. The genome of Sphenodon punctatus has three SINE families that are similar to Squam3 in the leftmost ~120 nt except the 32-nt deletion in Squam3 relative to two of them (tuaMIRa and tuaMIRb). Thus, a large CORE fragment was deleted in two tuaMIR SINEs. Another tuatara SINE (tuaMIRc) has this deletion and is similar to Squam3 within this region (but differs in the head and LINE-derived regions). It is plausible that the ancestor of Ther1 that was active in the common ancestor of mammals, reptiles, birds, and even coelacanth [9, 51] acquired the 32-nt deletion within the CORE domain in the Lepidosauria ancestor and the same region is present in related SINEs (Figs. S2B and S2C). This precursor SINE gave rise to tuaMIRc in the tuatara and Squam3 in Squamata.

Conclusions
We discovered a new SINE Squam3 found in all (38 at the time of analysis) sequenced genomes of scaled reptiles (Squamata). Despite the ever-increasing amount of genomic data for lizards and snakes, this quite prolific SINE was not reported previously. The evolutionary dynamics of SINE families and subfamilies is obscure and linked to the divergence of the genomes. This study is a step forward in understanding how SINEs emerge and decline. We identified and described Squam3 subfamilies and directly compared their structural traits and copy number across a variety of major squamate taxa in comparison with related tuatara SINE families. This study gives an insight into how SINE families emerge and evolve.

Methods
Most genomic data were downloaded from NCBI Genomes (https://www.ncbi.nlm.nih.gov/genome) except Anolis carolinensis, Podarcis muralis (Ensembl, https://www.ensembl.org), Dopasia gracilis, Shinisaurus crocodilurus (diArk, https://www.diark.org/diark), and Darevskia valentini [17]. We used the genomic sequences of Lacerta agilis and Thamnophis elegans with permission from the Vertebrate Genomes Project. Individual sequences of squamate species not listed in Table 1 were also extracted from NCBI (https://www.ncbi.nlm.nih.gov//advanced). If no data on the genome size was available in publications or the Animal Genome Size Database [52], it was calculated as the mean for the closest species.

We used custom Perl scripts based on the Smith-Waterman search to find genomic copies of SINEs with at least 65% identity and 90% length overlap with the consensus. After all Squam3 families were identified, the genome bank was successively depleted using their consensus sequences and all hits were combined for further analysis.

Multiple sequence alignments were generated using MAFFT [53] and edited by GeneDoc [54]. Subfamilies were identified manually and analyzed in a larger sample if necessary. We considered only ample subfamilies (≥1% of the total number of full-length copies). A search for tuaMIR SINEs in reptile/bird genomes was carried out by initial identification of all copies with at least 65% similarity to the consensus sequences followed by manual subsampling and realigning of candidate copies possibly containing specific mutations separating them from tuaMIRa sequences. The mean similarity was determined for 100 randomly selected sequences (or all available if fewer) using the alistat program (Eddy S., Cambridge, [55]). A neighbor-joining tree was constructed using MEGA software with 1000 bootstrap replications and the “partial deletion” option.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s13100-021-00238-y.

Additional file 1: Fig. S1. Alignment of species-specific Squam3 sequences. Green, Squam3A; red, Squam3B; blue, Squam3C. Species designations are: Squam3EmA, Eublepharis macularius; Squam3GjA, Gekko japonicus; Squam3PpA, Paroedura picta; Squam3Vk, Varanus komodoensis; Squam3Ch, Crotalus horridus; Squam3Pt, Pseudonaja textilis; Squam3Cp, Crotalus pyrrhus; Squam3Pg, Pantherophis guttatus; Squam3Nn, Naja naja; Squam3Oh, Ophiophagus hannah; Squam3Pmc, Protobothrops mucrosquamatus; Squam3Ts, Thamnophis sirtalis; Squam3Hc, Hydrophis cyanocinctus; Squam3Te, Thamnophis elegans; Squam3Pf, Protobothrops flavoviridis; Squam3Po, Pantherophis obsoletus; Squam3Cv, Crotalus viridis; Squam3Hh, Hydrophis hardwickii; Squam3Tb, Thermophis baileyi; Squam3Vb, Vipera berus; Squam3Ej, Emydocephalus ijimae; Squam3Hm, Hydrophis melanocephalus; Squam3Ll, Laticauda laticaudata; Squam3Lc, Laticauda colubrina; Squam3Ns, Notechis scutatus; Squam3Pr, Protobothrops mucrosquamatus; Squam3PbC, Python bivittatus; Squam3DvB, Darevskia valentini; Squam3LbB, Lacerta bilineata; Squam3LaB, Lacerta agilis; Squam3LvB, Lacerta viridis; Squam3PmB, Podarcis muralis; Squam3ZvB, Zootoca vivipara; Squam3GjB, Gekko japonicus; Squam3PpB and Squam3PpB3, Paroedura picta; Squam3GjB3, Gekko japonicus; Squam3EmB, Eublepharis macularius; Squam3SmB, Salvator merianae.
Additional file 2: Fig. S2. A. Alignment of Ther1/MIR subfamilies. B. Comparison of full-length consensus sequences of Squam3, tuaMIR and other CORE SINEs with tRNA- and L2-derived regions. The corresponding regions are indicated above the sequences. C. CORE domains of CORE SINEs in vertebrates. The characteristic Squam3 deletion is marked in amaranth (as in Fig. 1).
Additional file 3: Fig. S3. Alignment of LINE-derived regions of tuaMIRb and Ther1 and 3′-terminal sequences of several L2 LINEs. The origin and total length are given in parentheses.
Additional file 4: Table S1. Squam3 copies found in individual NCBI sequences of squamate species not listed in Table 1.
Additional file 5: Table S2. Distribution of tuaMIR subfamilies in the genomes studied.

Acknowledgments
We thank Dr. Dmitri Kramerov for critical reading of the manuscript.

Authors’ contributions
NSV and APR, conceptualization; all, genomic data analysis; NSV and SAK, study design and manuscript preparation; APR and VIK, supervision; APR, project administration and funding acquisition. All authors read and approved the final manuscript.

Funding
This research was funded by the Russian Science Foundation (RSF) Project No. 19-14-00083.

Availability of data and materials
The data generated are available in the manuscript supporting files. The banks of Squam3 SINEs, as well as multiple alignments of random sets of SINE sequences, are available for each species on request.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Received: 16 December 2020 Accepted: 25 February 2021

References
1. Vassetzky NS, Kramerov DA. SINEBase: a database and tool for SINE analysis. Nucleic Acids Res. 2013;41:D83–9 [cited 2014 Jun 4]. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3531059&tool=pmcentrez&rendertype=abstract.
2. Kramerov DA, Vassetzky NS. Origin and evolution of SINEs in eukaryotic genomes. Heredity (Edinb). 2011;107:487–95. Available from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=21673742.
3. Shedlock AM, Takahashi K, Okada N. SINEs of speciation: tracking lineages with retroposons. Trends Ecol Evol. 2004;19:545–53. Available from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16701320.
4. Suh A, Bachg S, Donnellan S, Joseph L, Brosius J, Kriegs JO, et al. De-novo emergence of SINE retroposons during the early evolution of passerine birds. Mob DNA. 2017;8:1–8.
5. Crawford NG, Parham JF, Sellas AB, Faircloth BC, Glenn TC, Papenfuss TJ, et al. A phylogenomic analysis of turtles. Mol Phylogenet Evol. 2015;83:250–7. Available from: https://doi.org/10.1016/j.ympev.2014.10.021.
6. Endoh H, Nagahashi S, Okada N. A highly repetitive and transcribable sequence in the tortoise genome is probably a retroposon. Eur J Biochem. 1990;189:25–31. Available from: http://www.ncbi.nlm.nih.gov/pubmed/1691979.
7. Nishihara H, Smit AF, Okada N. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. 2006;16:864–74. Available from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16717141.
8. Smit AF, Riggs AD. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 1995;23:98–102. Available from: http://www.ncbi.nlm.nih.gov/pubmed/7870595.
9. Gilbert N, Labuda D. Evolutionary inventions and continuity of CORE-SINEs in mammals. J Mol Biol. 2000;298:365–77. Available from: http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10772856&dopt=Abstract.
10. Shedlock AM, Botka CW, Zhao S, Shetty J, Zhang T, Liu JS, et al. Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome. Proc Natl Acad Sci U S A. 2007;104:2767–72.
11. Shedlock AM. Phylogenomic investigation of CR1 LINE diversity in reptiles. Syst Biol. 2006;55:902–11.
12. Gemmell NJ, Rutherford K, Prost S, Tollis M, Winter D, Macey JR, et al. The tuatara genome reveals ancient features of amniote evolution. Nature. 2020;584:403–9.
13. Kramerov DA, Vassetzky NS. SINEs. Wiley Interdiscip Rev RNA. 2011;2:772–86 [cited 2014 Jun 4]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21976282.
14. Liu Y, Zhou Q, Wang Y, Luo L, Yang J, Yang L, et al. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration. Nat Commun. 2015;6 [cited 2020 Nov 29]. Available from: https://pubmed.ncbi.nlm.nih.gov/26598231/.
15. Hara Y, Takeuchi M, Kageyama Y, Tatsumi K, Hibi M, Kiyonari H, et al. Madagascar ground gecko genome analysis characterizes asymmetric fates of duplicated genes. BMC Biol. 2018;16:1–19.
16. Xiong Z, Li F, Li Q, Zhou L, Gamble T, Zheng J, et al. Draft genome of the leopard gecko, Eublepharis macularius. GigaScience. 2016;5. Available from: https://doi.org/10.1186/s13742-016-0151-4.
17. Darevskia (ID 327916) - BioProject - NCBI [Internet]. [cited 2020 Dec 9]. Available from: https://www.ncbi.nlm.nih.gov/bioproject/327916
18. GenomeArk - Lacerta agilis [Internet]. [cited 2020 Dec 2]. Available from: https://vgp.github.io/genomeark/Lacerta_agilis/
19. Kolora SRR, Weigert A, Saffari A, Kehr S, Walter Costa MB, Spröer C, et al. Divergent evolution in the genomes of closely related lacertids, Lacerta viridis and L. bilineata, and implications for speciation. Gigascience. 2019;8:22 [cited 2020 Nov 29]. Available from: http://orcid.org/0000-0001-7839-735X.
20. Andrade P, Pinho C, De Lanuza GPI, Afonso S, Brejcha J, Rubin CJ, et al. Regulatory changes in pterin and carotenoid genes underlie balanced color polymorphisms in the wall lizard. Proc Natl Acad Sci U S A. 2019;116:5633–42 [cited 2020 Dec 2]. Available from: https://www.pnas.org/content/116/12/5633.
21. Yurchenko AA, Recknagel H, Elmer KR. Chromosome-level assembly of the common lizard (Zootoca vivipara) genome. Genome Biol Evol. 2020;12:1953–60.
22. Roscito JG, Sameith K, Pippel M, Francoijs KJ, Winkler S, Dahl A, et al. The genome of the tegu lizard Salvator merianae: combining Illumina, PacBio, and optical mapping data to generate a highly contiguous assembly. Gigascience. 2018;7:1–13.
23. Ullate-Agote A, Milinkovitch MC, Tzika AC. The genome sequence of the corn snake (Pantherophis guttatus), a valuable resource for EvoDevo studies in squamates. Int J Dev Biol. 2014;58:881–8.
24. Pantherophis obsoletus (ID 88953) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/88953?genome_assembly_id=889057
25. Ptyas mucosa (ID 44753) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/44753?genome_assembly_id=884075
29. Kishida T, Go Y, Tatsumoto S, Tatsumi K, Kuraku S, Toda M. Loss of olfaction in sea snakes provides new perspectives on the aquatic adaptation of amniotes. Proc R Soc B Biol Sci. 2019;286.
30. Hydrophis cyanocinctus (ID 75161) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/75161?genome_assembly_id=437861
31. Hydrophis hardwickii (ID 75162) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/75162?genome_assembly_id=437862
32. Suryamohan K, Krishnankutty SP, Guillory J, Jevit M, Schröder MS, Wu M, et al. The Indian cobra reference genome and transcriptome enables comprehensive identification of toxins. Nat Genet. 2020;52:106–17. Available from: https://doi.org/10.1038/s41588-019-0559-8.
33. Notechis scutatus (ID 14408) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/14408?genome_assembly_id=408294
34. Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, McCleary RJR, et al. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc Natl Acad Sci U S A. 2013;110:20651–6.
35. Pseudonaja textilis (ID 72610) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/72610?genome_assembly_id=408420
36. Castoe TA, De Koning APJ, Hall KT, Card DC, Schield DR, Fujita MK, et al. Erratum: The Burmese python genome reveals the molecular basis for extreme adaptation in snakes (Proceedings of the National Academy of Sciences of the United States of America (2013) 110, 51, (20645–20650) DOI: 10.1073/pnas.1314475110). Proc Natl Acad Sci U S A. 2014;111:3194.
37. Crotalus horridus (ID 16679) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/16679?genome_assembly_id=274149
38. Gilbert C, Meik JM, Dashevsky D, Card DC, Castoe TA, Schaack S. Endogenous hepadnaviruses, bornaviruses and circoviruses in snakes. Proc R Soc B Biol Sci. 2014;281.
39. Crotalus viridis viridis (ID 71654) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/71654?genome_assembly_id=434976
40. Shibata H, Chijiwa T, Oda-Ueda N, Nakamura H, Yamaguchi K, Hattori S, et al. The habu genome reveals accelerated evolution of venom protein genes. Sci Rep. 2018;8:1–11.
41. Aird SD, Arora J, Barua A, Qiu L, Terada K, Mikheyev AS. Population genomic analysis of a pitviper reveals microevolutionary forces underlying venom chemistry. Genome Biol Evol. 2017;9:2640–9.
42. Vipera berus berus (ID 14467) - Genome - NCBI [Internet]. [cited 2020 Dec 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/14467?genome_assembly_id=214193
43. Gao J, Li Q, Wang Z, Zhou Y, Martelli P, Li F, et al. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus. Gigascience. 2017;6:1–6.
44. Song B, Cheng S, Sun Y, Zhong X, Jin J, Guan R, et al. A genome draft of the legless anguid lizard, Ophisaurus gracilis. Gigascience. 2015;4:15–7.
45. Lind AL, Lai YYY, Mostovoy Y, Holloway AK, Iannucci A, Mak ACY, et al. Genome of the komodo dragon reveals adaptations in the cardiovascular and chemosensory systems of monitor lizards. Nat Ecol Evol. 2019;3:1241–52. Available from: https://doi.org/10.1038/s41559-019-0945-8.
46. Georges A, Li Q, Lian J, O’Meally D, Deakin J, Wang Z, et al. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps. GigaScience. 2015;4. Available from: https://doi.org/10.1186/s13742-015-0085-2.
47. Alföldi J, Di Palma F, Grabherr M, Williams C, Kong L, Mauceli E, et al. The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature. 2011;477:587–91.
48.
Brian Simison W, Parham JF, Papenfuss TJ, Lam AW, Henderson JB. An 26. GenomeArk - Thamnophis elegans [Internet]. [cited 2020 Dec 2]. Available annotated chromosome-level reference genome of the red-eared slider from: https://vgp.github.io/genomeark/Thamnophis_elegans/ turtle (Trachemys scripta elegans). Genome Biol Evol. 2020;12:456–62. 27. Thamnophis sirtalis (ID 16688) - Genome - NCBI [Internet]. [cited 2020 Dec 49. Ghosh A, Johnson MG, Osmanski AB, Louha S, Bayona-Vásquez NJ, 2]. Available from: https://www.ncbi.nlm.nih.gov/genome/16688?genome_a Glenn TC, et al. A High-Quality Reference Genome Assembly of the ssembly_id=245767 Saltwater Crocodile, Crocodylus porosus, Reveals Patterns of Selection in 28. Li JT, Gao YD, Xie L, Deng C, Shi P, Guan ML, et al. Comparative genomic Crocodylidae. Genome Biol Evol. 2019;12:3635–46 Oxford University investigation of high-elevation adaptation in ectothermic snakes. Proc Natl Press. [cited 2020 Dec 2]. Available from: https://pubmed.ncbi.nlm.nih. Acad Sci U S A. 2018;115:8406–11. gov/31821505/. Vassetzky et al. Mobile DNA (2021) 12:10 Page 10 of 10

50. Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, et al. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695–716. 51. Nikaido M, Noguchi H, Nishihara H, Toyoda A, Suzuki Y, Kajitani R, et al. Coelacanth genomes reveal signatures for evolutionary transition from water to land. Genome Res. 2013;23:1740–8. 52. Gregory TR. Animal Genome Size Database. 2020. Available from: http:// www.genomesize.com 53. Yamada KD, Tomii K, Katoh K. Application of the MAFFT sequence alignment program to large data - reexamination of the usefulness of chained guide trees. Bioinformatics. 2016;32:3246–51. 54. Nicholas KB, Nicholas HBJ. GeneDoc: Analysis and Visualization of Genetic Variation 1997. Available from: http://www.nrbsc.org/gfx/genedoc/index. html 55. Eddy S, Cambridge U. SQUID - C function library for sequence analysis; 2005.
