The future of annotation and their classification in the light of functional genomics -what we can learn from the fables of Jean de la Fontaine? Peter Arensburger, Benoit Piegu, Yves Bigot

To cite this version:

Peter Arensburger, Benoit Piegu, Yves Bigot. The future of transposable element annotation and their classification in the light of functional genomics - what we can learn from thefables of Jean de la Fontaine?. , Taylor & Francis, 2016, 6 (6), pp.e1256852. ￿10.1080/2159256X.2016.1256852￿. ￿hal-02385838￿

HAL Id: hal-02385838 https://hal.archives-ouvertes.fr/hal-02385838 Submitted on 27 May 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Copyright Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and progressfurther in than aid genewere rather specifically developed to annotation to study TEs. We that argue subsequently developed. is whole Today, mostly annotation using done TE tools that David Finnegan near last the end the of century Transposable ideas sciencebysignificantly element influenced has (TE) pioneering the beenof Abstract transposableTE, element mobile geneticMGE, element Institute GIRI, Genetic Information ICE, integrative conjuga Abbreviations keywords address: author Corresponding Bi 2 1 Peter Arensburger functional genomics of future The element transposable annotation classification andtheir light the of in

University, CAPomona, 91768 Nouzilly United States America. of Physiologie de lareproduction etPhysiologie des Comportements, UMR INRA SciencesBiological Department, State University, California Polytechnic CA Pomona, 91768 their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

– : mobile genetic element, ontology taxonomy, repeat, genome,

France :

1 the TE field is field classification both impeded TE byTE the schemes current a and by Piégu, Benoît -

what we can learn from the fables of fables de the canJean la Fontaine? learn we from what tive elements Comment citer cedocument:

-

2 United States America. of E ological Sciencesological State California Polytechnic Department, , and Yves Bigot

, as as systems classification the well by were that 1

2

- mail: [email protected] mail: - CNRS 7247, PRC, CNRS 7247, 37380

-

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and standardsclassification. andanadoption of annotation improved TE for a of formation societythe organize to a debate larger an questions these on as asmethods well ideas possible encourage is for schemes. classification alternative to hope Our understand TE wide . genome annotation are Novel redefinemethods helping to our biology recognize to TE that failure is fundamentally that different multicellular from of their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables ing of TE of sequence Weing briefly discuss and . origins these some of new Comment citer cedocument:

2

d to promote the the promote to d Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and transmission toindependently of vectors. be definition this we find While is it useful important be TEs, they are considered to to TEsbecause not able between theymove hosts are to whiledefinition, ,and integrative phages, conjugative elements (ICE) have the plasmidscopies integrated into or their of tohosts. this Therefore, according includes it that mob parasitic vectors they that use lateral transfers" for fromchromosome one a or genome host another, between location to or hosts by using needsdefinition be are to "TEs broadened discre to their in between genomeanother host or different ". Recently, this that we proposed Haren et al. definition ofA TEs the all these voices by the questions in "Lion the andRat". including is it for community, time the We for organized. call emergence societybiology the an address of for international TEsto of c views TE the is and it for time theyhow areRecent (see classified publications taxonomically. in discussions the what is,concepts a include natureof TE TEsare how assembled identified in genomes, and and accepted by arethat a championed without minority eukaryotic transposable of field organized elementsaroundis several currently (TEs) concepts and scientificthe set making of for decisions future the field. procedures We on the that argue societies information of co electedthe where within flow the representatives organize for allow debates. contradictory to time second this in category often form Communities fr emphasize input systemexpense collegiality. comesAlternatively, butoften of at the some scientific communities that dictate dominant the ideascommunity makes whichis field, the “might in an efficient right”, greater”. the Inhelp scientific circles sometimes respectivelysociety thatsummarized may and be “even makes “might as smallest the right” can theseand Lamb”society. “Wolf the Among for his fablesknown protagonists involving organi the examine that Jean seventeenth poetsthe of French the of de famous laFontaine, most the century, is well their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables 5

defined TEs as "discrete segments of DNA fromlocusof one capabledefined to "discrete TEsas segmentsmoving of om all members,om all lines the along second the fable, of comesof butthis at a cost

ile DNA sequences that are primarily DNAile sequencesprimarily maintained by are that vertical transmission as Comment citer cedocument: ommunity to re ommunity to

1

and the “Lion andRat” and “Lion the - examine its which science bases the upon is 3

it is the most powerful members is powerful it the most the of

3 te segments of DNA capable of moving withinte DNA segmentsmoving of capable of . An important aspect is . definition this Animportant of argument by a majority.These 2

describe views human two of zation zation humanof 3,4 ) reflect these similar featuressimilar mmunity mmunity

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and Repegenome, varies a butin genome, recent modelchicken the by in ( that article we showed establish a effect The match. these RepeatMasker of reliability the on and Repbase annotations genomic and/or 3) genome, repeats th consensuses library the Repbase, absenceconsensuses in from 2) of a repeat true matching the in three of result the typically issues: constructed mislabeled taxonomically and/or poorly 1) (i.e.precision misidentified). repeats are The taxonomically RepeatMasker genome, the of results often lacks repeats accuracyall annotated) not (i.e. and are U databases entities. by commercial academic chargea use fee thesefree, to of use the isthese databases for of GIRIright reservesthe grants. of Theseare composed submissions libraries partially While by groups. academic outside institution supported inprivate, profit part non by private funds, don InstituteRepbase (GIRI) Information by ( Genetic the based(a.k.a. th Typically, searching). sections sequenced newly the matching of a preestablished to genome repeats known of library eukaryotic genomes (e.g. andlists have abundance studycomposition beenthe repeated used of to RepeatMasker a annotated also listof provide can genome. the location in repeats and their Such RepeatMaskerprogram to task the simplify order This of prediction. masking usin is nearly always performed sequencednewly a genome common strategy eukaryotic in maskregions is to genomic repeated interestInitially, TE in rationale and for quality annotationsUses TE high closelytransfer within related sequences both alsobetween but very divergent groups. definition isformal difficult is viruses, that as phages, three by are (all MGEs ICEs grouped based a than TEsrather formal definition of scientific only on evidence.a reasons the such Oneof is it that more note a of definition practical to that nfortunately, for those for nfortunately, studying interested in but gene in repeated the annotation not portions their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables researchers) and exchangelateral and through TEs, can genetic recombine material atMasker and Repbase annotated only about half about half existing the only of atMasker repeats and Repbase annotated (11% vs. annotation wasannotation to linked gene When annotation. annotated a in are 6 . In addition to to outputting a with addition masked genome . In repeats, Comment citer cedocument: 7,8 ). RepeatMasker works on the principle of library principle the of on RepeatMasker works ).

at are too divergent from the divergentat from consensus library are too to e libraries repeats of are ones the at hosted 4

helps to deal with the deal with diversityhelps to and complexity http://www.girinst.org/repbase/

se annotations are problematic ations and US federal sequences many in - based searching, Gallus gallus ) a

) g theg Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and relies on output two, the mostly programs the of RECON are RepeatModeler programs, several do packagesbeen just to mor have the Two that. of developed Because isdesirable multiple it of usually results the combine researchers extensive expertise. without computer speed, produce consistent res both and accurate butmust in become adopted these widely to the methods they calculation needonly rapid community not researchers institutio at public personalcomputersof has clusters increased accessible have more and computing become to has last becomeThis point much less last the in speed few a years of problem as computation the manythese haveand of 3) historically programs requireds place andaccuracy,precision way inherent to no 2) groups, discoveredtaxonomic repeats into sequence. disadvantages The of describedpreviously repeats, th repeatshave when from divergedsubstantially or identify repeatshave to no potential the sequence these or even little bear when to similarity conserved k repeats pairwise genome the itself of alignments to (e.g. from RECON searchthat patterns (e.g. repetition for Repeat of Tandem Finder sequence assembly. Numerous An alternativea create generate way repeat to to one is library creating for methodology models is incompletely described literature. the in is it available as2) writing this of sequences some in cases than it human the TEsin genome causing RepeatMasker, detect to fewer However, be Dfam we find to because: unsatisfying for itappearsaTE 1) useflawed model to models Markov andhidden sequence instead alignments o issueaddress problematic (very third the divergent repeats) Dfam introducing by RepeatMasker of community. the a authors to lackaccountability The of attempted to have generated and annotated Repbase in been never detail, has in and this published lack transparency of many consensus atby methodology GIRI. The which sequencesare fi The have by researchers been confirmed independently Institute at Roslin the 20% repeats) underestimated and dramatically diversity the TEs of their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables rst two rst problematic the two of issues RepeatMasker are for a probably due, at to part, in least - mers

(e.g. RepeatExplorer 16

and TEdenovo from REPET the packageand TEdenovo Comment citer cedocument: ns allocations (e.g. XSEDE programs have and this include been programsprograms for developed de novo for only five species, only for as RepeatMasker, just its 3) with

15

methods include: 1) a 1) lackstudies include: of testingmethods their ). Anattractivefeature these of is they that methods 5

13 https://www.xsede.org ults and be easy enough to be to used andby beults enough easy

and RepeatScout f consensussequences. simple ubstantial computational resources. ubstantial computational 17 de novo . RepeatModeler, the older the older the .of RepeatModeler, de novo 12 ) and thoseidentify that ) 9 . Since then, our findings findings . Since then, our 13

repeat searching , dnaPipeTE based on the genome genome the on based 10 18 .

but has such poorsuch but has

). In order for). In order e popular ones in turn leads to turn in 11 e consensus 14

which useswhich ) or ) or Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and may each be to host. a variantsspecific represented of byset divergent sequences.indels and/or highly several in is that a species found Furthermore, TE host completemost version the of isIn model” representation a this not of consensus main only “TE the composed sequence (the and creators REPET the RepeatExplorer the of why programs havea “TE replacedmodel”. with it copies Clearly, genome. the in a single sequence ad cannot by aattempts represent single consensus sequence TE to an that averaged sequence multiple of literature the of Indeed, andthe many of Repbase) much databases (including speciesa define TE of development The description ofThe changing species TE is a field medical the annotations, including for sciencegenome only whatrequired for genomebe will constitutes aThis great annotation. of proper importance not be will decide if, to parallel in scientific with these community standards developments, are gold acceptable superficiallyannotate geno repeated only to are able run to researcherimportant. power Asmost available computing the is a reaching stage to they where nuclei, in architecture and genome the need for in portion genic the non of the of becomes genome for in betterits role gene understood regulation We are the on repeatedhighly sequences. these of development programs, butat discovering moment the they appear to be only to limited RepeatExplorer purposethis include discover some repeat unassembled the sequences using sequence d genome. the portions of assembly In cases available anisbe such not when to may possible it quality genome existencethe a assembly high contains that sequences the of repeated most of the assumptionin An important novo value to dubious modern research efforts discovering in performance repeats TEdenovo to compared their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

repeat genomes. discovery assembled in

cusp of arepeatcusp of genome sea As of field change the annotation. in importance the de novo de novo

repeat discovery programs on their own, soon be will their it repeat no on longer discovery programs s, but for other discipliness, other quality genome butfor high depend that upon Comment citer cedocument:

TE annotationTE tools species way the is TE are changing descri

TE) but also TE) all of consensusesthe detected to variants as due de novo

15

and Red 20

repeat we have discovery methods discussed so fa . 17

. This leaves TEdenovo as. leaves This new the standard TEdenovo for

12 6

among others. It early is days still among the in - depth depth repeat w annotation me sequences. the next The issue for equately fulfill this function this is which equately fulfill that it should it that be considered of

ata. Programs designed for designedata. Programs for ill become more ill become more bed. bed. r is is r de Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and groups defi (i.e.species) emerge. class, our TEs to Finally, of family, and grouping purposes,for divisions such are flexible taxonomic not novel for and make difficult it to usecontinue lacks forfield social structure such an organized a debate. weaknessA second is we that scheme has (yet) not been large subject the of scheme our that proposal we recognize is def Casposons new classes,integrating and new orders new proposal awe new outline our taxonomic for scheme which classesTE recent areand orders discoveries necessary with up this keep in to evolutionary convergencesattention to level, the addition several and 3) of at molecular the new and DerbyshireCurcio science view appears current fitthe us to best, to butrequires caref world advocated TE view the justified longer of science, byno by Finnegan modern is the 2) an our ) into and have integrate recent attempted most the discoveries to among and (both Over last the few years we have analysed schemes, bases the critically taxonomy variousTE of history. reflects evolutionary better were that a acquiredit basis much by TEs, making for intermediate transposition en and Curcio basedtaxonomy models. transposition on Finnegan (and subse communitiesdifferent have TE diverged.eukaryotic transposon community the The adopted taxonomy updatesTE ( accepteda scienti the swath bylarge of transposons) DNA. can DNA to Finnegan's has transpose from basal been dichotomy directly intermediate a using DNA His transposition. of classelements I can tra classes phylogenetic atcould, base, their two bebased classified presumedmechanism into their DavidIn late the Finnegan TE 1980's pioneered systematics Rationale th for revisiting their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables zymatic machineries that trigger TE their machinerieszymatic integration host trigger that into natureof the DNA, including 26,27 Derbyshire . However, taxonomic as samestandards previous the using to applied we have

taxonomic divisionsas such taxonomic classes, and super orders quent updates) model, while the prokaryotic communityquent focused while updates) prokaryotic the model, a on

that describedthat world diversity the based TE the of diversity the on of 23,24 alysis. the 1) that we concluded findings basal Based dichotomy out on Comment citer cedocument: - RNA

25 see e taxonomye of TEs . This view more accurately. view captures This more evolutionary the many of

- 3 DNA mechanism, while the classDNA the mechanism, while II elements DNA (a.k.a.

for review issue). this of However,for an issue is this where

fic community and has been fic subsequent community basis the two of

The prokaryotic view was a in prokaryotic outlined paper 2002 The by nspose by reverse of an of RNA bynspose reverse transcription - icient several in respects. first The is our that families, as such recently the described - scale debate in the field, partially becausescale debate field, the partially in TE the

7

21,22 has the advantage of allowing the has the advantage the allowing of . His was TEs that conception - families. practical While nition of nition of is awhat TE

a TE taxonomy that that taxonomy a TE field 3 . In Table 1 ul ul Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and de laRecherche et de laTechnologie. STUDIUM and Ministère dethe Recherche Groupement the Nationale, CNRS 2157, de l’Education work This was by funded State University, California the Polytechnic C.N.R.S., the I.N.R.A., the Acknowledgments fable we it, disputeiswould Without great always “The of help. “Cat the andFox”: sleeping!” be a such certainly society of organization The would butto La bethe quote Fontaine challenging including theirbiology organiz role nuclear in scientificinternational societyvarious communities focused on that gathers several aspects TE of of taxonomy the TEs. However, emergence the may a a such be to first of committee step an and eukaryotic sides)prokaryotic take as as to the chargeof well virologists Transposable Taxonomy the of gather (ICTTE) researchers Element that would TE the on (both we In 2015 Concluding remarks phages,prokaryotic and TEs. ICEs have PhiGO used a such scheme and ACLAME create to this new to field, haswhile been previously Ariane Since early the Toussaint's used. 2000's team sequences novel of characterization taxonomic groups. Such new TE a of propo and discovery system. be shared clustered basedterms. their ontological would both This TEscould on allow thatforth a system base discussionthese aIn of course the of on recentmeeting international issues ICE and TEs. between(seeabove) strict since becertain a viruses, continuum may there too is likely phages, their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

called forcalled creation the a theof new organizationcalled Committee International for

d on ontology terms could be terms create developed to ontology a could on taxonomic dynamic d Comment citer cedocument:

ation, regulatorynetworks, as as taxonomy. well TE 8

29 – 31 , tools to annotate and to classify, tools 28 issuessurrounding

a proposal was a put proposal sal, 32 .

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Refer

their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

fundamental update to meet the challenge of their diversity and complexity. Piégu, B., Bire, S., Arensburger, P. & Bigot, Y. A survey of transposable element classification systems La Fontaine, J. ‘The Lion and Rat’ Book 2, Fable 11. La Fontaine, J. ‘The Wolf and Lamb’ Book 1, Fable 10. 29, wide characterization of eukaryotic repetitive elements next from Novak, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplor aegypti). dnaPipeTEwith from Raw Genomic Reads and Comparative Analysis thewith Yellow Fever Mosquito (Aedes Goubert, C. 1269 Bao,Automated Z. De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Benson, Tandem G. repeats finder: programa to analyze sequencDNA Hubley, R. genome. Mason, A. S., Fulton, E.,J. Hocking, M.P. & Burt, W.D. A new look at the L BMC Genomics repeats in the genome model of the red jungle Gallusfowl, gallus, using seriesa of de novo investigating tools. Guizard, S., Piégu, B., Arensburger, P., Guillou, & F. Bigot, DeepY. landscape Holt, R. A. Lander, E. S. A Smit, A, Hubley,F R. & Green, RepeatMaskerP. Open Microbiol. Haren Hoen, R. D. (2015). ences

792 – , L., , Ton 1276 (2002).

– 793 (2013). Genome Biol. Evol.

et al. 53, et al. et al. et al. et al.

- 245

Hoang, B. & Chandler, M. Integrating DNA: and Retroviral . 17,

The genome sequence of the malaria mosquito Anopheles gambiae. The Dfam Thedatabase Dfam of repetitive families.DNA

A call for benchmarking transposable element annotation methods. De Novo Assembly and Annotation of the Asian Tiger Mosquito ( BMC Genomics

Initial sequencinInitial – 659 (2016). 281 (1999).

Comment citer cedocument:

7,

1192

17, g g and analysis of the human genome. –

688 (2016). 1205 (2015).

9 -

3.0. 3.0. 1996 Nucleic Acids Res. - 2010. (1996). - generation sequence reads. er: a er: Galaxy a es. TR contentof the Nature Nucleic Acids Res.

update of dispersed and tandem Mol. Phylogenet. Evol.

409,

44, - Aedes albopictus) based web server for genome Science

D81 Mob. DNA 860 – – D89 (2016). 921 (2001).

298, 27,

Bioinformatics 6, 573 Genome Res.

129 (2002).

(2015).

Annu. Rev. 86, –

580 (1999). -

A call for a 90 –

109

12,

-

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and 32. 31. 30. 29. 28. 27. 26. 25. 24. 23. 22. 21. 20. 19. 18. 17. 16. their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

Girgis, Girgis, H. Z. Red: an intelligent, rapid, accurate tool for detecting repeats de 21, Price, A. L., Jones, N. C. & Pevzner, A. P. De novo identification of repeat in families large genomes. Annotation Approaches. Flutre, Duprat, T., E., Feu A Smit, A, H.,F R. RepeatModeler Open La La Fontaine, J. ‘The Cat and Fox’ Book 9, Fable 14. Nucleic Acids Res. Leplae, R., Lima Microbiol. Toussaint, A., Lima Leplae, R. ACLAME: A CLAssification of Mobile genetic Elements. Analysis and Annotation of DNA Repeats and Dark Matter in Eukaryotic Genomes. in (2015). that generates target site duplications. Hickman, A. B. & Dyda, TheF. casposon Synthesizin Krupovic, M., Shmakov, S., Makarova, K. S., Forterre, & P. Koonin, E. V. Recent Mobility of Casposons, Self 4, Cu Repbase. Kapitonov, V. V. & Jurka, J. A universal classification of eukaryotic transposable elements implemented in (2007). Wicker, T. Finnegan, J.D. Eukaryotic transposable elements and genome evolution. Finnegan, J.D. Transposable elements. Ardeljan, D. Bioinformatics

865 rcio, M. J. & Derbyshire, K. M. The outs and ins of transposition: from mu to kangaroo.

i351 –

877 (2003). – Nat. Rev. Genet. i358 (2005).

158, et al. g g Transposons at the Origin of the CRISPR et al.

16,

567 - A unified cl Mendez, & Toussaint,G. A. ACLAME: A CLAssification of Mobile genetic Elements, update 2010.

Meeting Report: The Role of the in Cancer. 38,

(2015). -

– Mendez, G. & Leplae, R. PhiGO, phagea ontology associated with the ACLAME database. 571 (2007).

D57

illet, C. & Quesneville, ConsideringH. Transposable Element Diversification in De Novo PLoS ONE 9, Comment citer cedocument: –

D61 (2010). assification system for eukaryotic transposable elements. 411 –

412 (2008).

6,

e16526 (2011). - Curr. Opin.Dev. Genet. Nucleic A

- 1.0. 1.0. (2008).

encoded Cas1 from

cids Res.

10

- Cas Immunity.

43,

10576 2,

861 Nucleic Acids Res. – – Aciduliprofundum boonei 10587 (2015). 867 (1992). Genome Biol. Evol. Cancer Res Trends Genet. TIG - novo on the genomic scale.

.

76, 32,

Nat. Rev. Genet.

4316 45D

8, Nat. Rev. Mol. Biol.

375 5, – – 49 (2004).

4319 (2016). 103 is a DNA DNA is a – 386 (2016). – Bioinforma 107 (1989).

8,

BMC 973 -

Res. 982 tics

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and 33. their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

102 Huang, S. – 114 (2016). et al.

Discovery of an Active RAG Transposon Illuminates the Origins of V(D)J Recombination.

Comment citer cedocument:

11

Cell

166,

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and some with havingDNA members a Classes TE transposon 1. classification TE Table pr DDE Nuclease/Recombinase Class their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables -

transposons

(Cut intermediate transposition with dsDNA a linear transposonsDDE/D (Copy intermediate transposition with no DNA transposonsDDE mechanism Transposition Order

Comment citer cedocument: - out/Paste in)

- in) oposed in

-

3

Mu Nuclease/Recombinase Phylogenetic relationships between Superfamilies IS1595 IS630/Tc1/mariner (ITm)/Zator ISAs1, ISL3 IS110, IS630, IS982, IS1380, IS1182, IS6, IS21, IS66, IS30, IS1, IS3, IS4, IS701, ISH3, IS1634, Tn3

12 - Merlin,

°

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and

their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables out/Paste in) a heteromericusing intermediate and transposition with dsDNA a linear transposonsDDE/D

Comment citer cedocument: (Cut

- CACTA/Mirage/Chapaev (CMC), Academ, IS1380/PiggyBac, IS256/MuDR/Mutator/Rehavkus IS5/PIF/Harbinger, Tn7 /ProtoRag Kolobok, P(?),Sola, Dada, Hobo/Ac/Tam (hAT),

13

33

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and

Y1 their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables - transposons

dsDNA transpositiondsDNA with a circular Y1 (Cut transposition a circular dsDNA Y1 transposonswith in) (Copy retrotransposons LTR in) intermediate transpositiondsDNA with a circular transposonsDDE (Copy

Comment citer cedocument: - out/Paste in) - - out/Paste out /Paste

-

-

Copia IS3 DIRS Crypton CTnDOT Tn916 IS200/IS605 ERV3 ERV2 ERV1 BEL Gypsy

14

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and

classification TEs pending Casposons S Y2 - their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables transposons - transposons 26,27

? ? Copy (Copy to defineremains a configuration that intermediateDNA in Casposase (Cut transposition dsDNAcircular S withtransposons a Copy (Copy transposition a circular ssDNA Y2 transposonswith in) (Copy

Comment citer cedocument: - - - out/paste in or paste - in) - - in or out/Paste in orin

with a - - out/ out/

- in) -

in) -

Helitrons IS91 VIPER Ngaro Fanzor ISAs1 Casposons Tn5397 IS607

15

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and non Class for non a Classes with TE TEs their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables - LTR LTR retrotransposons

“Zisuptase”) ondepending a transposition Zisupton another origin transposase having with a DDE DDE retrotransposons integrases of LTR putatively related to Transposase Copy (Copy integraseDDE /Mavericks LINEs (En) Order Comment citer cedocument: - - transposons in) - in orin

(Unknown - LTR retrotransposonLTR phenotype - -

out/

Ginger1 Tlr1 Polintons/Mavericks Mavirus (?) Zisupton P(?) Ginger2

16

LINEs withLINEs a PD withLINEs EN anAP endonuclease,then RT Phylogenetic relationships between Superfamilies

°

- (D/E)XK EN

Version postprint of Jean de la Fontaine?. Mobile Genetic Elements, 6(6), 1-17. DOI: 10.1080/2159256X.2016.1256852 Arensburger, P.,Piégu, B.,Bigot,Y. (2016). Thefuture oftransposable elementannotation and Retron/msRNA RT features Class Group I intron (G1i) Intein host genes Machinery for excision of Class respectwith to the previous version indicated in italics just below the levels Orderof Class, and Superfamilies. Bibliographic refere in black, those in eukaryotes being in blue. Both colors are used for mixed superfamilies. The criteria used are  for prokaryotic a with TEs Classes TE rare for SSE an Classes with phenotype TE TEs  With With the exception of the Intein and Group intronI Classes, names of their classification inthe lightoffunctional genomics -whatwecan learnfromthe fables

(retrotransposition) Retron/msRNA Transposition mechanism Order dependent(HEN HR) Vsr dependent(HEN HR) PD dependent(HEN HR) GIY dependent(HEN HR) His dependent(HEN HR) HNH dependent(HEN HR) LAGLIDADG dependent(HEN HR) HNH inteins dependent(HEN HR) LAGLIDADG inteins Transposition mechanism Order introns II Group Penelope Comment citer cedocument: - - - (D/E)XK G1i (?) Cys YIG G1i

3 G1i G1i

are indicated. - like elements (PLE)

G1i G1i

°

phenotype

17

InBase (http://tools.neb.com/inbase/)InBase could inserted be proposed as used, in genesHost in which each intein specifically or site into host genes Phylogenetic relationships between HEN, and Superfamilies Introners Mobile introns lariat introns II Group Penelope, GIY GIYNeptune, Coprina, no GIY Athena, no GIY withLINEs both and AP PD msRNA Phylogenetic relationships between RT Superfamilies specifically could inserted also be used sitesHost in which each group intron I superfamilies found in prokaryotes are typed

- like - - YIG domain

elements YIG domain - - YIG domain

 YIG domain

nces of updated points

- (D/E)XK EN*

-