COMMENTARY

Evolutionary tinkering with transposable elements

I. King Jordan*
National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894

PNAS | May 23, 2006 | vol. 103 | no. 21 | 7941–7942 | www.pnas.org/cgi/doi/10.1073/pnas.0602656103
See companion article on page 8101.
Conflict of interest statement: No conflicts declared.
*E-mail: [email protected].

It was almost 30 years ago when François Jacob declared that evolutionary innovation (the emergence of novel form and function over time) occurred primarily via a process of “tinkering” (1). By tinkering, Jacob essentially meant the creation of novelty through random combinations of preexisting forms. Two fundamental and countervailing notions are implicit in this view of evolution: optimality versus constraint. Were evolution to perform optimally, a more apt metaphor might be that of an engineer. An engineer works according to a plan, with a precise goal for the desired end, and uses material designed specifically toward that end. Evolution, on the other hand, must work without the benefit of foresight and is subject to very real constraints with respect to the material at its disposal; as such, evolutionary biology is replete with examples of suboptimal solutions to functional challenges (2). Similarly, a tinkerer works without a clear plan, using anything and everything at his disposal to produce an entity that possesses some kind of (unanticipated) functional utility. In this issue of PNAS, Cordaux et al. (3) explore an example of tinkering along the human evolutionary lineage, whereby an existing host gene merged with a transposable element (TE) to create a primate-specific chimeric gene.

In the decades since Jacob’s exposition, molecular biology studies have produced a deluge of primary data (tens of thousands of three-dimensional protein structures and literally billions of nucleotides of gene sequences, including hundreds of complete genomes in the past few years alone). Comparative studies of the resulting data have underscored the extent to which genome evolution is indeed characterized by tinkering. There is a discrete and finite number of structural folds, protein sequence domains, and gene families (4); new genes evolve through slight modifications and/or recombinations of these preexisting forms. The actual de novo evolution of protein coding sequences is exceedingly rare. For instance, despite the ≈80–100 million years that have elapsed since the human and mouse lineages diverged, the genomes of these two species share >99% homologous genes (5). Clearly, however, mammalian evolution has been marked by substantial functional innovation, and so it must be that the genome-level dynamics underlying this innovation are dominated by creation through rearrangement.

One of the largely unanticipated results of mammalian genome sequencing efforts was the revelation of the extent to which these genomes are made up of sequences derived from TE insertions. The human genome sequence was found to consist of ≈45% TE-derived sequences (6), and this figure is certainly a vast underestimate because many TE-derived human sequences have diverged beyond recognition. In addition to being ubiquitous genomic elements, TEs are also autonomous in the sense that they carry the regulatory and protein coding sequences necessary to catalyze their own transposition. The ubiquity of TEs, along with the functional machinery that they encode, makes them ideal genetic building blocks that evolution can tinker with to create novel forms. Indeed, despite the early notion of TEs as strictly selfish (parasitic) elements that serve no function for their hosts (7), there now exist numerous examples of formerly mobile TE sequences that have been “domesticated” (8) to serve some functional role for the host genomes in which they reside (9, 10). However, there is still a relative paucity of detailed studies that address both the evolutionary dynamics of TE-derived host genes and the functional roles of the proteins that they encode. The work of Cordaux et al. (3) on the SETMAR gene represents an important step toward closing this knowledge gap.

SETMAR, originally discovered by Robertson and Zumpano (11), is a chimeric gene made up of a SET histone methyltransferase transcript fused to the transposase domain of a formerly mobile TE sequence. The transposase domain in question comes from a member of the Hsmar1 mariner-like family of elements. Mariner-like elements are more commonly found in insects, and Hsmar1 was the first TE of this type found in the human genome. Hsmar1 elements are class II, or DNA, elements that have terminal inverted repeats (TIRs) flanking an ORF that encodes a transposase. Class II elements transpose via a cut-and-paste mechanism catalyzed by the transposase, which binds to the TIRs, excises the element, and then inserts it in a new location. Class I elements, or retrotransposons, which transpose via the reverse transcription of an RNA intermediate, are actually far more common than DNA elements in the human genome. The so-called long- and short-interspersed nuclear elements, LINEs and SINEs, respectively, make up ≈25% of the human genome. However, for as yet unknown reasons, DNA elements like Hsmar1 are overrepresented among host genes with TE-derived coding sequences. This overrepresentation may reflect the broad utility of the DNA-binding properties encoded by the transposase ORF. In fact, there is a distinct possibility that, as these kinds of chimeric genes are born, they are able to bind to multiple dispersed sites around the genome (those occupied by their cognate TIRs), resulting in the emergence of complex regulatory networks (Fig. 1). Britten and Davidson (12) articulated a very similar model for the evolution of cis-regulatory networks based on repetitive DNA. Recruitment of DNA-type element sequences into host genes may also represent a distinctly mutualistic evolutionary strategy that these relatively low-frequency elements occasionally employ to help ensure their long-term survival in the genome.

Fig. 1. Establishment of a TE-derived genetic regulatory network. (a) A DNA-type (class II) TE (blue) inserts downstream of host gene exons (red). (b) The TE binding domain fuses with the host gene transcript. (c) The chimeric gene can now regulate multiple locations around the genome that contain its cognate binding sites.

Cordaux et al. (3) began their study by performing a series of sequence analyses aimed at elucidating the evolutionary dynamics and potential function of the SETMAR gene. First, they were able to identify SETMAR orthologs computationally among a fairly diverse set of vertebrate genomes ranging from mouse to zebrafish. All of these orthologs were shown to possess only the two SET exons, and none of them is flanked by an Hsmar1 element insertion. A more detailed analysis of orthologous regions cloned and sequenced from eight primate genomes was then used to determine precisely when SETMAR emerged along the evolutionary lineage leading to humans. Based on presence/absence patterns, they were able to determine that an Hsmar1 element inserted in the SET locus 40–58 million years ago. Interestingly, an Alu (SINE) element inserted into the Hsmar1 5′ TIR at around the same time, rendering the element immobile. After the Hsmar1 insertion, an exon capture event that fused the transposase-encoding domain to the preexisting SET transcript was facilitated by a 27-bp deletion that removed the original SET stop codon and also activated a downstream cryptic 5′ donor splice site. This 5′ splice site presumably became active together with a cryptic 3′ splice acceptor site in the Hsmar1 sequence, resulting in the formation of a novel intron/exon structure.

In addition to detailing how the SETMAR gene fusion occurred, Cordaux et al. (3) took the critical step of demonstrating that this chimeric gene is actually functional. SETMAR function was demonstrated by (i) showing that the gene is widely expressed and (ii) demonstrating that the SETMAR coding sequences are evolving under selective constraint. The latter conclusion is based on a pattern of elevated synonymous (KS) relative to nonsynonymous (KA) substitution rates. KS ≫ KA is consistent with purifying (negative) selection due to functional constraint (13). The relative differences of KS vs. KA along the coding sequence were also taken to suggest that the DNA-binding capability of the N-terminal MAR domain is being conserved, whereas the catalytic activity located in the C-terminal domain has been lost. The substitution of the characteristic transposase catalytic sequence motif in the C-terminal MAR domain is also consistent with the absence of catalytic activity.

The sequence-based evidence described earlier, together with previously conducted experimental work demonstrating SETMAR methyltransferase activity (14), makes up a compelling and fairly detailed story of the birth of the TE-derived chimeric gene. However, Cordaux et al. (3) did not stop there; they went on to biochemically characterize the MAR domain’s ability to bind TIR-like DNA sequences, as well as its potential to encode an active transposase. The experimental assays followed directly from the sequence analyses that suggested conservation of the binding domain and loss of the catalytic domain. Indeed, the experiments bear these predictions out: the MAR peptide was shown to bind TIR sequences but could not catalyze transposition in a standard in vivo assay. The tight integration of sequence analysis and experimental work is one of the distinguishing features of the article by Cordaux et al.; the sequence analyses yielded specific predictions that were then experimentally confirmed. Moreover, the binding experiments suggest specific sequence analyses that could be used to characterize the distribution and evolutionary conservation of SETMAR binding sites in the human genome. One can easily imagine further experiments that could uncover the regulatory properties of the SETMAR gene. Such an approach could help to illuminate the most provocative aspect of this study: the suggestion of a specific mechanism for the rapid evolution of a genetic regulatory network composed of a domesticated transposase domain and its cognate binding sites dispersed throughout the genome (Fig. 1).

1. Jacob, F. (1977) Science 196, 1161–1166.
2. Darwin, C. (1859) On the Origin of Species (John Murray, London).
3. Cordaux, R., Udit, S., Batzer, M. A. & Feschotte, C. (2006) Proc. Natl. Acad. Sci. USA 103, 8101–8106.
4. Chothia, C. (1992) Nature 357, 543–544.
5. Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J. F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002) Nature 420, 520–562.
6. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001) Nature 409, 860–921.
7. Doolittle, W. F. & Sapienza, C. (1980) Nature 284, 601–603.
8. Miller, W. J., Hagemann, S., Reiter, E. & Pinsker, W. (1992) Proc. Natl. Acad. Sci. USA 89, 4018–4022.
9. Kidwell, M. G. & Lisch, D. R. (2001) Evolution Int. J. Org. Evolution 55, 1–24.
10. Smit, A. F. (1999) Curr. Opin. Genet. Dev. 9, 657–663.
11. Robertson, H. M. & Zumpano, K. L. (1997) Gene 205, 203–217.
12. Britten, R. J. & Davidson, E. H. (1971) Q. Rev. Biol. 46, 111–138.
13. Sharp, P. M. (1997) Nature 385, 111–112.
14. Lee, S. H., Oshige, M., Durant, S. T., Rasila, K. K., Williamson, E. A., Ramsey, H., Kwan, L., Nickoloff, J. A. & Hromas, R. (2005) Proc. Natl. Acad. Sci. USA 102, 18075–18080.
