
2008 LTR retrotransposons in plants: mechanisms of targeted integration and applications in genome engineering Yi Hou Iowa State University


Recommended Citation Hou, Yi, "LTR retrotransposons in plants: mechanisms of targeted integration and applications in genome engineering" (2008). Graduate Theses and Dissertations. 11646. https://lib.dr.iastate.edu/etd/11646

LTR retrotransposons in plants: mechanisms of targeted integration and applications in genome engineering

by

Yi Hou

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Molecular, Cellular, and Developmental Biology

Program of Study Committee: Daniel Voytas, Major Professor; W. Allen Miller; Stephen Howell; Thomas Peterson; Yanhai Yin

Iowa State University Ames, Iowa 2008

Copyright © Yi Hou, 2008. All rights reserved.

TABLE OF CONTENTS

ABSTRACT

CHAPTER 1. GENERAL INTRODUCTION
REFERENCES

CHAPTER 2. CHROMODOMAINS DIRECT INTEGRATION OF RETROTRANSPOSONS TO HETEROCHROMATIN
ABSTRACT
INTRODUCTION
RESULTS AND DISCUSSION
REFERENCES
ACKNOWLEDGEMENTS
REFERENCES

CHAPTER 3. CENTROMERIC RETROTRANSPOSONS: POTENTIAL ROLE OF THE C-TERMINUS IN DETERMINING GENOMIC DISTRIBUTION
ABSTRACT
INTRODUCTION
RESULTS
DISCUSSION
PERSPECTIVE
MATERIALS AND METHODS
REFERENCES

CHAPTER 4. PLANT RETROELEMENTS FOR AMPLIFICATION AND DELIVERY
ABSTRACT
INTRODUCTION
RESULTS
DISCUSSION
MATERIALS AND METHODS
REFERENCES

CHAPTER 5. GENERAL CONCLUSIONS
GENERAL CONCLUSIONS
REFERENCES

ACKNOWLEDGEMENTS


ABSTRACT

One important step in the replication of LTR retrotransposons is the selection of a favorable chromosomal site for integrating cDNA into the host genome. The data emerging from studies of retroviruses and the yeast LTR retrotransposons suggest that a tethering mechanism underlies target site choice: integration complexes are tethered to specific components of chromatin, and this determines where the retrotransposon integrates. The chromoviruses are Ty3/gypsy retrotransposons that have chromodomains located in their integrase C-termini. Three groups of chromoviruses are described based on amino acid sequence relationships of their chromodomains and comparisons of their chromodomains to the heterochromatin protein 1 (HP1) chromodomain, which typically recognizes histone H3-K9 methylation, an epigenetic mark characteristic of heterochromatin. When fused to fluorescent marker proteins, the chromoviral chromodomains target proteins to specific subnuclear foci coincident with heterochromatin. The chromodomain of Maggy, a fungal chromovirus, recognizes histone H3 dimethyl- and trimethyl-K9 in vitro. Furthermore, when fused to the integrase of the Schizosaccharomyces pombe Tf1 retrotransposon, the Maggy chromodomain directs integration to sites of H3-K9 methylation. A model was suggested in which mobile elements specifically target heterochromatin and then perpetuate heterochromatin by establishing epigenetic marks through the RNAi pathway.

The plant centromere-specific retrotransposons (CR elements) are chromoviruses that are highly associated with centromeres. An eleven amino acid sequence motif (the CR motif) was identified in the integrase C-terminus of CR elements from both monocots and dicots. When a larger domain containing the CR motif was fused to YFP and expressed in Arabidopsis protoplasts, specific nuclear loci were observed that were coincident with centromere-specific markers, namely the DAPI-staining chromocenters and CENP-C, a component of the kinetochore.
Enrichment of the CR domain in the chromocenters was decreased in ddm1 and met1 mutant cells, which have reduced levels of heterochromatin. An arginine residue was identified in the CR motif that is required for its function. Mutations in this residue disrupt the ability of the CR motif to target proteins to centromeres. Collectively, these data suggest that plant CR elements adhere to the tethering model and that the integrase-encoded CR motif recognizes a specific component of plant centromeric heterochromatin to direct integration of CR elements to centromeres.

Understanding the mechanism by which plant retrotransposons target integration requires functional elements that can be genetically manipulated. Toward this end, a retrotransposon vector system was developed that is derived from the tobacco Tnt1 retrotransposon. Mini-Tnt1 vectors were constructed by replacing portions of the Tnt1 open reading frames with a selectable marker gene. Tnt1-encoded gene products required for transposition were provided in trans by endogenous Tnt1 elements, whose expression was induced during the generation of tobacco protoplasts. Two different mini-Tnt1 vectors were developed: transcription of one is driven by a complete 5’ LTR; transcription of the other is driven by a chimeric 5’ LTR in which the U3 region was replaced by the CaMV 35S promoter. It was shown that both vectors can be effectively complemented in trans by the endogenous helper Tnt1s after transformation into tobacco protoplasts. Also, like endogenous Tnt1, insertion sites of mini-Tnt1s were within or near coding sequences. Experimental evidence was obtained indicating that genetic recombination occurs during Tnt1 reverse transcription and that multiple copies of Tnt1 mRNA are packaged into virus-like particles. This supports an emerging picture indicating that multiple mRNAs are used during reverse transcription of diverse LTR retroelements, and that recombination and template switching are common occurrences during replication.


CHAPTER 1. GENERAL INTRODUCTION

The discovery of transposable elements – the major building blocks of the genome – was made by Barbara McClintock in the late 1940s. Her insight and careful experimentation made it clear that transposable elements are able to move or change position within a genome. Since then, transposable elements have been shown to drive genome change in multiple ways (Bennetzen, 2000). Through the process of integration, they can alter gene sequences, regulate gene expression, or initiate chromosome rearrangements. Although the host has evolved mechanisms to control transposition, massive expansions of transposable elements have been tolerated during evolution. For these reasons, transposable elements are useful tools for understanding gene structure and genome evolution (Kumar and Bennetzen, 1999). In addition, they have been developed as useful devices for mutagenesis and gene delivery in experimentally tractable organisms (Kumar and Hirochika, 2001).

LTR retrotransposons

Transposable elements share two major properties: one is the ability to move from one place to another within a genome and the other is the ability to multiply their copy number via transposition. Both activities are achieved by the two major classes of transposable elements, namely the Class I elements or retrotransposons and the Class II or DNA transposable elements. These two classes are distinguished by their replication strategies. Retrotransposons transpose through reverse transcription of an mRNA intermediate, whereas DNA transposons cut and then paste a DNA copy of an element into a new location. In contrast to DNA transposons, the replication mechanism of retrotransposable elements can greatly amplify copy numbers, and thereby they rapidly increase host genome size. For example, the contribution of retrotransposons to genome content ranges from 3% in budding yeast to 50-80% in some plants (Kim et al., 1998; Meyers et al., 2001; SanMiguel et al., 1998). The abundance of retrotransposons is an important evolutionary factor in shaping genomes and driving processes such as mutation, recombination, sequence duplication and genome expansion.


Based on their genetic organization and sequence similarities, retrotransposons are divided into 5 major orders: LTR (long terminal repeat) retrotransposons, DIRS-like elements, Penelope-like elements, LINEs and SINEs (Wicker et al., 2007). Among these, the LTR retrotransposons are the most extensively studied due to their close relationship with the retroviruses. Retrotransposons and retroviruses are collectively called retroelements. Retroelements encode two primary genes: gag, which encodes a structural protein, and pol, which encodes a group of enzymes including reverse transcriptase (RT), integrase (IN) and protease (PR). The retroelement LTRs do not encode any known proteins, but they contain the promoters and terminators required for transcription (Kumar et al., 1999). LTR retrotransposons are further sub-classified into Pseudoviridae/Ty1-copia and Metaviridae/Ty3-gypsy elements, which differ from each other both in their degree of sequence similarity and the order of their encoded gene products (Xiong et al., 1990). For the Pseudoviridae/Ty1-copia elements, integrase precedes reverse transcriptase, and this gene order is reversed for the Metaviridae/Ty3-gypsy elements.
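The gene-order rule in the preceding paragraph can be written as a small decision function. The following is only a minimal sketch: the function name `classify_by_pol_order` and the use of the labels "PR", "IN" and "RT" as input are illustrative assumptions, and real classification also relies on sequence similarity, which is not modeled here.

```python
def classify_by_pol_order(domain_order):
    """Classify an LTR retrotransposon from the order of its pol-encoded domains.

    domain_order: list of domain labels in 5'-to-3' order, e.g. ["PR", "IN", "RT"].
    Hypothetical helper for illustration; it applies only the gene-order rule
    (IN before RT -> Ty1-copia, RT before IN -> Ty3-gypsy).
    """
    order = [label.upper() for label in domain_order]
    if "IN" not in order or "RT" not in order:
        # Without both domains the gene-order rule cannot be applied.
        return "unclassified"
    if order.index("IN") < order.index("RT"):
        return "Ty1-copia"
    return "Ty3-gypsy"


print(classify_by_pol_order(["PR", "IN", "RT"]))
print(classify_by_pol_order(["PR", "RT", "IN"]))
```

Elements lacking a recognizable IN or RT domain are reported as unclassified rather than forced into either group.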

The life cycle of the LTR retrotransposons includes several major steps (Fig. 1): transcription, translation, particle formation, reverse transcription and integration. The life cycle begins when the host transcription and translation machinery produce an mRNA that is translated into a polyprotein. Protease processes the polyprotein into the four major functional units: Gag, protease (PR), integrase (IN) and reverse transcriptase (RT). Gag forms the virus-like particle (VLP), in which RNA copies of the element are packaged. Reverse transcription is carried out by RT to produce cDNA. IN then binds the cDNA as part of a preintegration complex and inserts the cDNA into the genome. The primary difference between the LTR retrotransposons and the retroviruses is that the VLP does not leave the host through membrane budding and fusion, which is the function of the retroviral env gene.


Figure 1. Life cycle of the LTR retroelements. The primary difference between the LTR retrotransposons and the retroviruses is that the retroviral particle can be released from the host and infect new cells through membrane budding and fusion. IN, integrase; RT, reverse transcriptase; VLP, virus-like particle.

Integrase

Integration is an essential step in retrotransposon replication and is required for the persistence and spread of retroelements. Retrotransposon and retroviral integrases (INs) have three distinct domains (Fig. 2): 1) an N-terminal region with an HHCC zinc-binding domain that has potential cDNA-binding activity; 2) a catalytic domain that performs the integration reaction and has a signature DD35E motif; and 3) a highly variable C-terminal domain. The catalytic domain terminates about 120 amino acids downstream of the glutamate in the DD35E motif (Fig. 2). The C-terminus begins at that point and extends to the end of integrase.
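The boundary rule stated above (the C-terminus begins roughly 120 residues past the DD35E glutamate) can be sketched in a few lines of Python. This is an illustration only, not the procedure used in this work: the regular expression assumes that the spacing between the two aspartates falls in a 39-58 residue range (a commonly quoted range, used here as an assumption), and the function name, the constant and the test sequence are all hypothetical.

```python
import re

# Assumption: D, then 39-58 residues, then D, then exactly 35 residues, then E.
# Only the "35" spacing is implied by the motif name; the first range is illustrative.
DD35E_PATTERN = re.compile(r"D.{39,58}D.{35}E")

# Approximate distance from the motif glutamate to the end of the catalytic domain.
CATALYTIC_TAIL = 120


def c_terminus_start(protein):
    """Return the 0-based index where the integrase C-terminus is estimated to begin.

    Returns None if no DD35E-like motif is found or the sequence is too short.
    """
    match = DD35E_PATTERN.search(protein)
    if match is None:
        return None
    glutamate_index = match.end() - 1  # position of the E that closes the motif
    start = glutamate_index + CATALYTIC_TAIL
    return start if start < len(protein) else None


# Synthetic sequence (not a real integrase): alanine filler around a DD35E-like motif.
synthetic = "A" * 10 + "D" + "A" * 40 + "D" + "A" * 35 + "E" + "A" * 150
start = c_terminus_start(synthetic)
print(start, len(synthetic) - start)
```

Because the 120-residue offset is approximate, a boundary obtained this way would in practice be checked against an alignment of related integrases.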

In contrast to the high conservation of the zinc-binding motif and the catalytic domain, the C-terminal domain is divergent both in structure and function among different families of retrotransposons. For the Pseudoviridae/Ty1-copia elements, a GKGY motif, which is believed to be necessary for the stability of integrase and reverse transcriptase, is found about 65 residues downstream of the catalytic domain (Peterson-Burch and Voytas, 2002; unpublished data). A nuclear localization signal was also identified at the C-terminus of integrase in both the Ty1 and Ty3 elements of S. cerevisiae (Kenna et al., 1998; Lin et al., 2001). A six amino acid targeting domain (TD) is located at the very C-terminus of the integrase of Ty5, a member of the Ty1-copia family of retrotransposons. Ty5 is specifically enriched in heterochromatic regions due to targeted transposition. Targeting by Ty5 results from an interaction between TD and Sir4p, a structural component of heterochromatin. For the Metaviridae/Ty3-gypsy elements, a GPF/Y motif was identified in the IN C-terminus by sequence analysis (Fig. 2) (Malik and Eickbush, 1999). The GPF/Y motif promotes integrase multimerization and intermolecular strand joining, based on studies of Tf1, an LTR retrotransposon in Schizosaccharomyces pombe (Ebina et al., 2008). Chromovirus is a genus of Metaviridae/Ty3-gypsy elements whose IN C-termini contain a chromodomain downstream of the GPF/Y motif (Malik and Eickbush, 1999). The chromodomain of the fungal element Maggy interacts with H3 dimethyl- and trimethyl-K9. When fused to the IN of the S. pombe Tf1 element, the chromodomain directs Tf1 integration to sites of H3-K9 methylation. In addition to its targeting function, the Maggy chromodomain also confers high transposition activity on the element in vivo (Nakayashiki et al., 2005). The Tf1 chromodomain also modulates the activity and specificity of integrase (Hizi and Levin, 2005). The IN C-terminus, therefore, carries out various functions in different retrotransposons to guarantee precision and specificity of the integration reaction. In later chapters, I discuss in more detail the targeting function of the C-terminus of integrase, specifically the C-terminus of the centromere-specific retrotransposons, which are a specific lineage of chromoviruses.


Figure 2. Conserved sequence features of LTR-retrotransposon and retroviral integrases. Conserved amino acid residues in the protein are labeled.

Targeted integration

A non-random distribution for retroviruses and retrotransposons has been reported in many different eukaryotes (Hua-Van et al., 2005; Kim et al., 1998; Labrador and Corces, 1997). In yeast, nearly all retrotransposons show target specificity. For example, the Ty3 elements of Saccharomyces cerevisiae target specifically to a few nucleotides upstream of RNA polymerase III (Pol III) transcription initiation sites (Chalker, 1992). Ty1 finds its genomic niche within 750 bp upstream of Pol III transcribed genes (Devine et al., 1996). Ty5 primarily inserts in silenced regions of the yeast genome, including the silent mating-type cassettes and the telomeric regions (Gai and Voytas, 1998; Zou and Voytas, 1997).

Diverse evolutionary influences constrain the chromosomal distribution of retrotransposons within a genome. These include purifying selection, which acts on the host to counter the deleterious effects caused by integration in gene-rich euchromatin. Further, different rates of recombination across chromosomes can cause differential accumulation or loss of mobile elements (Charlesworth and Langley, 1989). The retroelements themselves can also influence their chromosomal distribution. In the targeting model, the retroelement-encoded integrase recognizes specific features of chromatin and this causes the retrotransposon to accumulate in distinct chromosomal regions. The tethered-targeting model is supported by an increasing amount of data from studies of yeast retrotransposons and the retroviruses (Bushman, 2003).


Retrotransposons use several strategies for targeted integration. Some retrotransposons show sequence specificity. The R2 elements, for example, which are non-LTR retrotransposons of insects, encode a sequence-specific endonuclease that is responsible for their integration at a unique site in the 28S rDNA (Yang et al., 1999). Another retrotransposon from Drosophila, ZAM, preferentially recognizes and inserts into the sequence 5’-GCGCGCg-3’ (Leblanc et al., 1999).

The most compelling evidence for the retroelement targeting model comes from studies of the Ty5 retrotransposons of Saccharomyces cerevisiae. It was discovered that the yeast Ty5 retrotransposon typically integrates into regions of heterochromatin at the telomeres and silent mating loci (Zou et al., 1996). Targeting by Ty5 has been shown to rely on the interaction between IN and Sir4p, the latter being a structural component of heterochromatin (Xie et al., 2001). The domain of Ty5 integrase (six amino acids in the integrase C-terminus) that interacts with Sir4p (a region near the C-terminus) has been characterized (Zhu et al., 2003). Furthermore, Ty5 integration hotspots can be created when Sir4p is tethered to ectopic DNA sites, implying that the interaction between TD and Sir4p is the only determinant of Ty5 target specificity (Zhu et al., 2003).

The targeting model is also supported by studies of the Tf1 retrotransposon in S. pombe, HIV, and other Ty retrotransposons in S. cerevisiae (Ciuffi and Bushman, 2006; Kirchner et al., 1995; Leem et al., 2008; Sandmeyer, 2003). For example, in Saccharomyces cerevisiae, Ty1 and Ty3 elements integrate at sites of RNA polymerase III transcription by recognizing the Pol III transcription complex or chromatin components associated with Pol III transcription (Bachman et al., 2005; Mou et al., 2006; Yieh et al., 2002; Yieh et al., 2000). The interaction between IN and a transcription activator, Atf1p, brings Tf1 to the fbp1 promoter. In addition, retroviruses use a strategy similar to that of Ty5 for targeted integration. An interaction between HIV integrase and the transcription factor LEDGF/p75 underlies HIV’s preference to integrate near actively transcribed genes (Ciuffi et al., 2005; Llano et al., 2006; Shun et al., 2007).


LTR retrotransposons in plant genomes

Retrotransposons are ubiquitous in plant genomes. About 50% of the maize, rye, barley and wheat genomes are thought to be composed of retrotransposons. In addition, it is generally believed that the differences in genome size observed in the plant kingdom are caused by variations in the content of LTR retrotransposons and polyploidy (Kumar and Bennetzen, 1999; Tikhonov et al., 1999; Wicker et al., 2001). One aspect of LTR retrotransposon biology that interests many plant researchers is how the genomic distribution patterns of retrotransposons are achieved.

Sequence analysis of the Arabidopsis genome showed that Pseudoviridae and Metaviridae tend to accumulate within pericentromeric heterochromatin. This is particularly pronounced for certain Metaviridae sublineages (Peterson-Burch et al., 2004). A variety of factors, including selection against integration into euchromatin, preferential targeting to heterochromatin and suppression of recombination near the pericentromere, likely contribute to this non-random distribution. Comparisons based on the age of elements and their chromosomal location suggest that integration-site specificity may be the determining factor for the non-random distribution of the Metaviridae (Pereira, 2004). However, there have been no experimental data to verify this prediction.

Centromeres and transposable elements

Centromeres are specialized regions of the chromosome that guarantee the faithful segregation of genetic material during mitosis and meiosis. They serve as the anchor point for the spindle microtubules, which attach to a proteinaceous structure called the kinetochore. The kinetochore is a large multiprotein complex that orchestrates chromosome movement (Jiang et al., 2003). Centromeric regions are defined as those sequences to which the kinetochore is localized. By contrast, the pericentromeric region is defined as the area that joins sister centromeres together and maintains a boundary that isolates the centromeres from the chromosome arms (Dawe, 2003). Although the functions of centromeres are conserved among all eukaryotes, both the DNA and proteins that comprise kinetochores have evolved rapidly (Dawe and Henikoff, 2006).


Centromeric DNA is strikingly variable among species, ranging from the simple 125 bp centromeres of S. cerevisiae to the holocentric centromeres of Caenorhabditis elegans, which span the entire length of the chromosome. In most eukaryotes, however, the centromeres consist of long stretches of short tandem repetitive “satellite” DNA. For example, human centromeres are composed of several megabases of simple AT-rich 171 bp α-satellite repeats, and Arabidopsis centromeres are made up of long arrays of a different AT-rich 178 bp satellite repeat (Wevrick and Willard, 1989; Hosouchi, 2002). Although structurally similar, the high sequence divergence among satellite repeats, even between closely related species, suggests that centromeric satellite repeats are the most rapidly evolving regions of complex genomes (Malik and Henikoff, 2002).

Unlike the high variability in size and sequence of centromeric DNA, the architecture and composition of centromeric chromatin are conserved between different species. The most prominent epigenetic hallmark of centromeric chromatin is the presence of a specialized histone H3 named CENH3 (also known as CENP-A in humans, or Cse4 in S. cerevisiae). The presence of CENH3 distinguishes centromeric chromatin from the surrounding pericentromeric heterochromatin. It was shown that CENH3 is the core histone that replaces canonical histone H3 in centromeric nucleosomes (Palmer et al., 1991; Palmer et al., 1987). Centromere-specific nucleosomes serve as the foundation for the assembly of kinetochore proteins. In yeast, all known components of the kinetochore require centromere-specific nucleosomes to assemble (Collins et al., 2005), and it is likely that kinetochore proteins are localized to centromeres in plants and animals through similar mechanisms. The post-translational modification of histones is another level of epigenetic regulation that occurs at centromeres. In humans, Ser-7 of CENH3 was found to be phosphorylated by Aurora A or Aurora B kinase, a modification that is important for proper kinetochore function (Kunitoku et al., 2003). In addition, histone H3 modifications present in human and Drosophila centromeric chromatin, which are enriched in H3 di-methyl K4, are distinct from those in euchromatin and heterochromatin (Sullivan and Karpen, 2004).


The recent discovery of non-octameric nucleosomes at centromeres triggered a reinterpretation of the classical model of chromatin organization. At S. cerevisiae centromeres, Scm3 forms a complex with the CENH3 homolog Cse4 and histone H4 to create abnormal nucleosomes that lack H2A and H2B (Mizuguchi et al., 2007; Zhang et al., 2007). In Drosophila, CENH3-containing nucleosomes were also found to occur as a tetramer including one molecule each of CENH3, H4, H2A and H2B, referred to as a hemisome (Dalal et al., 2007). If confirmed, these models will have radical implications for our understanding of centromere-specific nucleosome structure and function as well as how higher order chromatin is formed.

Transposable elements are the most abundant building blocks of genetic material in higher eukaryotes. As a rule, the abundance of retrotransposons is lower in centromeres than in other regions of the genome. This implies that inserted transposons are removed frequently by unequal recombination or that transposon invasions cannot be tolerated at centromeres (Jiang et al., 2003). Recent studies, however, have found exceptions to this rule, and an increasing number of diverse species have been shown to have quite a few transposable elements in their centromeric and/or pericentromeric regions, suggesting that these elements may directly contribute to the structure and function of the centromere (Amor and Choo, 2002; Cheng et al., 2002; Zhong et al., 2002).

There are three major lines of evidence that indicate a role for transposable elements in the evolution of centromeric DNA and centromeric proteins (Wong and Choo, 2004). First, the presence of centromere-specific retrotransposons (CRs) provides direct evidence to support the idea that transposable elements are involved in centromere structure. CRs were first found at the centromeres of different grass species, but they have recently been identified in many monocots and dicots (Aragon-Alcaide et al., 1996; Gorinsek et al., 2004; Jiang et al., 1996; Miller et al., 1998; Presting et al., 1998). They are specifically enriched in centromeres and share high levels (80%) of DNA sequence conservation among various grass species, including those species estimated to have diverged >55 million years ago. In addition, CRs are irregularly interspersed with the centromeric satellite repeats, and both interact with CENH3 in species such as rice, maize, barley and sugarcane (Houben et al., 2007; Hudakova et al., 2001; Nagaki et al., 2004; Nagaki and Murata, 2005; Zhong et al., 2002).

Second, several lines of evidence indicate that transposable elements have served as templates to evolve centromeric and pericentromeric repeats. The 250 bp centromeric tandem repeats of wheat show high similarity with part of a Ty3/gypsy-like retrotransposable element (cereba) (Cheng and Murata, 2003). Also, one of the En/Spm-like (Atenspm) DNA transposons was found to be involved in generating satellite units in pericentromeric regions of Arabidopsis (Kapitonov and Jurka, 1999).

Finally, transposable elements have contributed to the evolution of centromere-specific proteins. CENP-B is an important centromere-associated protein, which displays high sequence similarity to pogo-like transposases (Casola et al., 2008), and CENP-B likely originated from these transposases. There are three CENP-B homologs in S. pombe (Abp1, Cbh1 and Cbh2), which bind to the centromeric dg and dh repeats and are believed to be involved in initiating heterochromatin formation (Nakagawa et al., 2002). The dg and dh repeats are transcribed and generate siRNA, which participates in methylation of H3-K9 through recruitment of the heterochromatin protein Swi6 (Volpe et al., 2003; Volpe et al., 2002). Transposon-derived proteins, therefore, may act as centromere-specific factors that aid in the assembly of centromeric heterochromatin.

The tobacco Tnt1 retrotransposon

Tnt1 as a model for studying mechanisms of retrotransposition in plants

Our laboratory has an interest in studying how plant retrotransposons select chromosomal integration sites. One initial challenge is to identify an active retrotransposon to use as a basis for such studies. We have chosen to focus research on the Tnt1 elements of tobacco, which are among the few plant retrotransposons known to be transpositionally competent. Tnt1 is a member of a superfamily of LTR retrotransposons that are widely distributed in the Solanaceae. The original Tnt1 was isolated after it integrated into a target nitrate reductase gene (Grandbastien et al., 1989). Sequence analysis showed that Tnt1 belongs to the Ty1/copia family, and there are over 100 copies of Tnt1 in the tobacco genome.

The mechanisms by which the Tnt1 elements are regulated have been well studied, particularly at the transcriptional level. Except for low levels of transcription in roots, Tnt1 is not expressed in other organs of healthy tobacco plants (Pouteau et al., 1991). However, the mobility of Tnt1 is highly activated by stresses such as wounding, biotic elicitors and pathogen attack (Grandbastien et al., 1997). Newly transposed Tnt1 copies were detected in nearly 25% of the tobacco plants regenerated from protoplast cultures (Melayah et al., 2001). Activation of transposition is directly related to the increase in Tnt1 transcription observed under stress conditions (Moreau-Mhiri et al., 1996; Grandbastien et al., 1997; Mhiri et al., 1997; Pouteau et al., 1994). The transcription of Tnt1 is determined by its regulatory cis-elements, which are located in the LTRs. The most important motifs found in U3 (the unique 3’ region) are the BI box, which is a short palindromic sequence similar to the G-box, and the BII repeats, which are tandem repeats closely related to the H-box. G- and H-boxes are also found in the regulatory regions of several plant defense genes (Loake et al., 1992), which suggests that LTRs provide a molecular basis for Tnt1 stress regulation. Consistent with this, there is some evidence that BI acts to activate transcription and that BII can interact with specific nuclear factors in tobacco protoplasts (Casacuberta and Grandbastien, 1993).

The U3 region in Tnt1 LTRs is highly divergent among different Tnt1 elements. Many Tnt1 elements, for example, vary in the number of BII repeats. RT-PCR analysis of Tnt1 populations transcribed in protoplasts revealed that about 90% of the transcripts contain 4 BII repeats and 10% contain 3 BII repeats. Sequence analysis of newly transposed Tnt1 insertions, however, revealed a slightly different distribution of BII repeats (66% for 4 BII repeats and 31% for 3 BII repeats) compared to the RNA templates (Casacuberta et al., 1997; Grandbastien et al., 1997). It was suggested that the genome-wide population of Tnt1 retrotransposons may be regulated by gradually producing Tnt1 copies with lower transcriptional activity (Grandbastien et al., 2005).
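The shift between the transcript pool and the new insertions can be summarized as a simple ratio of fractions. The sketch below uses only the four percentages quoted above; the ratio itself and the helper name `representation_ratio` are illustrative assumptions and are not statistics reported in the cited studies.

```python
# Fractions quoted in the text for each BII repeat class.
transcripts = {"4 BII": 0.90, "3 BII": 0.10}  # RT-PCR of transcripts in protoplasts
insertions = {"4 BII": 0.66, "3 BII": 0.31}   # newly transposed Tnt1 copies


def representation_ratio(insertion_fraction, transcript_fraction):
    """Ratio above 1 means the class is more common among insertions than transcripts."""
    return insertion_fraction / transcript_fraction


ratios = {
    repeat_class: representation_ratio(insertions[repeat_class], transcripts[repeat_class])
    for repeat_class in transcripts
}

for repeat_class, ratio in ratios.items():
    print(f"{repeat_class}: {ratio:.2f}")
```

On these figures the three-repeat class has a ratio above 1 and the four-repeat class a ratio below 1, which is the direction expected if new copies tend toward fewer BII repeats.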


Since Tnt1 is one of the few plant elements known to be transpositionally active, its insertion preference has been well characterized. Newly transposed Tnt1 elements prefer to integrate within or near gene coding sequences (CDS), and this distribution pattern also applies to pre-existing genomic insertions (Le et al., 2007). Tnt1 insertions within or near CDSs appear to be tolerated by the host, suggesting that Tnt1 may contribute to tobacco genome diversity. Tnt1 was also demonstrated to be active in various heterologous hosts, including Arabidopsis (Courtial et al., 2001; Lucas et al., 1995), Medicago truncatula (d'Erfurth et al., 2003; Tadege et al., 2008), and lettuce (Lactuca sativa) (Mazier et al., 2007). Analysis of Tnt1 insertion sites in these species revealed that the majority of the Tnt1 insertion sites are located in coding regions (Courtial et al., 2001; Mazier et al., 2007; Tadege et al., 2008), thereby demonstrating the potential for Tnt1 to be used for gene tagging or insertional mutagenesis.

Plant LTR retrotransposons have unique features that make them superior for gene tagging compared with other, traditional insertional elements (Kumar and Hirochika, 2001). Because they transpose via an RNA intermediate, the mutations they create are highly stable. Further, transposition target sites are not linked to the progenitor sites, which makes it easier to generate a large collection of random insertions. The insertions are also stable during seed-to-seed propagation (Kumar and Hirochika, 2001). Tnt1 elements are particularly suitable for gene tagging, especially in plants with large genomes, because they transpose into gene-rich regions. Tnt1 elements are very active and generate multiple insertions during regeneration through tissue culture (Courtial et al., 2001; Mazier et al., 2007; Tadege et al., 2008). One goal of the research presented here was to develop Tnt1 for use in basic and applied plant biology.

Dissertation organization

Chapter II was published in Genome Research, and the paper was highlighted on the cover. In the manuscript, we provide evidence that retrotransposon-encoded chromodomains recognize specific histone modifications. Further, the interaction between the chromodomain and the modified histone can direct integration complexes to heterochromatic domains, resulting in targeted integration. This work was a collaborative effort with Xiang Gao, a former postdoctoral fellow in our lab. A postdoctoral fellow in Henry Levin’s lab (Hirotaka Ebina) also participated in the study. I showed that chromodomains target YFP to specific nuclear loci that are coincident with the localization of CFP-TFL2 fusion proteins. TFL2 has a chromodomain that targets TFL2 to heterochromatin in Arabidopsis protoplasts. I also performed the control experiments that used the Tnt1 C-terminus of integrase, which targets to euchromatin.

Chapter III describes an 11 amino acid motif at the integrase C-terminus of plant centromeric retrotransposons (CRs). When fused with YFP, the CR motif localizes to specific nuclear domains coincident with DAPI-rich chromocenters. These domains overlap with the region occupied by CENP-C, a major component of the kinetochore. Recruitment of the CR motif to the chromocenter is abrogated in Arabidopsis ddm1 and met1 mutants, which have reduced levels of heterochromatin. These data collectively suggest that the CR motif interacts with a component of plant centromeric heterochromatin and that this interaction contributes to the centromeric distribution of CR elements. I performed all of the work described in this study.

Chapter IV describes a mini-Tnt1 two-component retroelement system designed to replicate and deliver genes of interest into plant genomes. This work was a collaborative effort among myself; Jyothi Rajagopal, a former postdoctoral fellow; Phillip Irwin, a former lab manager; and David Wright, a scientist in the lab. Phillip Irwin made the mini-Tnt1 backbone vector. Jyothi Rajagopal made the 35S-mini-Tnt1 element and obtained preliminary results indicating that the mini-Tnt1 elements made cDNA. David Wright helped design the constructs, and I performed all the remaining work.

Chapter V provides general conclusions of my research as well as observations and future directions not mentioned in previous chapters.

REFERENCES

Amor, D.J., and Choo, K.H. (2002). Neocentromeres: role in human disease, evolution, and centromere study. Am J Hum Genet 71, 695-714.

Aragon-Alcaide, L., Miller, T., Schwarzacher, T., Reader, S., and Moore, G. (1996). A cereal centromeric sequence. Chromosoma 105, 261-268.

Bachman, N., Gelbart, M.E., Tsukiyama, T., and Boeke, J.D. (2005). TFIIIB subunit Bdp1p is required for periodic integration of the Ty1 retrotransposon and targeting of Isw2p to S. cerevisiae tDNAs. Genes Dev 19, 955-964.

Bennetzen, J.L. (2000). Transposable element contributions to plant gene and genome evolution. Plant Mol Biol 42, 251-269.

Bushman, F.D. (2003). Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115, 135-138.

Casacuberta, J.M., and Grandbastien, M.A. (1993). Characterisation of LTR sequences involved in the protoplast specific expression of the tobacco Tnt1 retrotransposon. Nucleic Acids Res 21, 2087-2093.

Casacuberta, J.M., Vernhettes, S., Audeon, C., and Grandbastien, M.A. (1997). Quasispecies in retrotransposons: a role for sequence variability in Tnt1 evolution. Genetica 100, 109-117.

Casola, C., Hucks, D., and Feschotte, C. (2008). Convergent domestication of pogo-like transposases into centromere-binding proteins in fission yeast and mammals. Mol Biol Evol 25, 29-41.

Charlesworth, B., and Langley, C.H. (1989). The population genetics of Drosophila transposable elements. Annu Rev Genet 23, 251-287.

Cheng, Z., Dong, F., Langdon, T., Ouyang, S., Buell, C.R., Gu, M., Blattner, F.R., and Jiang, J. (2002). Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14, 1691-1704.

Cheng, Z.J., and Murata, M. (2003). A centromeric family originating from a part of Ty3/gypsy-retroelement in wheat and its relatives. Genetics 164, 665-672.

Ciuffi, A., and Bushman, F.D. (2006). Retroviral DNA integration: HIV and the role of LEDGF/p75. Trends Genet 22, 388-395.

Ciuffi, A., Llano, M., Poeschla, E., Hoffmann, C., Leipzig, J., Shinn, P., Ecker, J.R., and Bushman, F. (2005). A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med 11, 1287-1289.

Collins, K.A., Castillo, A.R., Tatsutani, S.Y., and Biggins, S. (2005). De novo kinetochore assembly requires the centromeric histone H3 variant. Mol Biol Cell 16, 5649-5660.

Moreau-Mhiri, C., Morel, J.B., Audeon, C., Ferault, M., Grandbastien, M.A., and Lucas, H. (1996). Regulation of expression of the tobacco Tnt1 retrotransposon in heterologous species following pathogen-related stresses. Plant J 9, 409-419.

Courtial, B., Feuerbach, F., Eberhard, S., Rohmer, L., Chiapello, H., Camilleri, C., and Lucas, H. (2001). Tnt1 transposition events are induced by in vitro transformation of Arabidopsis thaliana, and transposed copies integrate into genes. Mol Genet Genomics 265, 32-42.

d'Erfurth, I., Cosson, V., Eschstruth, A., Lucas, H., Kondorosi, A., and Ratet, P. (2003). Efficient transposition of the Tnt1 tobacco retrotransposon in the model legume Medicago truncatula. Plant J 34, 95-106.

Dalal, Y., Furuyama, T., Vermaak, D., and Henikoff, S. (2007). Structure, dynamics, and evolution of centromeric nucleosomes. Proc Natl Acad Sci U S A 104, 15974-15981.

Dawe, R.K. (2003). RNA interference, transposons, and the centromere. Plant Cell 15, 297-301.

Dawe, R.K., and Henikoff, S. (2006). Centromeres put epigenetics in the driver's seat. Trends Biochem Sci 31, 662-669.

Ebina, H., Chatterjee, A.G., Judson, R.L., and Levin, H.L. (2008). The GPY/F domain of Tf1 integrase multimerizes when present in a fragment, and substitutions in this domain reduce enzymatic activity of the full length protein. J Biol Chem.

Gai, X., and Voytas, D.F. (1998). A single amino acid change in the yeast retrotransposon Ty5 abolishes targeting to silent chromatin. Mol Cell 1, 1051-1055.

Gorinsek, B., Gubensek, F., and Kordis, D. (2004). Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol 21, 781-798.

Grandbastien, M.A., Audeon, C., Bonnivard, E., Casacuberta, J.M., Chalhoub, B., Costa, A.P., Le, Q.H., Melayah, D., Petit, M., Poncet, C., et al. (2005). Stress activation and genomic impact of Tnt1 retrotransposons in Solanaceae. Cytogenet Genome Res 110, 229-241.

Grandbastien, M.A., Lucas, H., Morel, J.B., Mhiri, C., Vernhettes, S., and Casacuberta, J.M. (1997). The expression of the tobacco Tnt1 retrotransposon is linked to plant defense responses. Genetica 100, 241-252.

Grandbastien, M.A., Spielmann, A., and Caboche, M. (1989). Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337, 376-380.

Hizi, A., and Levin, H.L. (2005). The integrase of the long terminal repeat-retrotransposon tf1 has a chromodomain that modulates integrase activities. J Biol Chem 280, 39086-39094.

Houben, A., Schroeder-Reiter, E., Nagaki, K., Nasuda, S., Wanner, G., Murata, M., and Endo, T.R. (2007). CENH3 interacts with the centromeric retrotransposon cereba and GC-rich satellites and locates to centromeric substructures in barley. Chromosoma 116, 275-283.

Hua-Van, A., Le Rouzic, A., Maisonhaute, C., and Capy, P. (2005). Abundance, distribution and dynamics of retrotransposable elements and transposons: similarities and differences. Cytogenet Genome Res 110, 426-440.

Hudakova, S., Michalek, W., Presting, G.G., ten Hoopen, R., dos Santos, K., Jasencakova, Z., and Schubert, I. (2001). Sequence organization of barley centromeres. Nucleic Acids Res 29, 5029-5035.

Jiang, J., Birchler, J.A., Parrott, W.A., and Dawe, R.K. (2003). A molecular view of plant centromeres. Trends Plant Sci 8, 570-575.

Jiang, J., Nasuda, S., Dong, F., Scherrer, C.W., Woo, S.S., Wing, R.A., Gill, B.S., and Ward, D.C. (1996). A conserved repetitive DNA element located in the centromeres of cereal chromosomes. Proc Natl Acad Sci U S A 93, 14210-14213.

Kapitonov, V.V., and Jurka, J. (1999). Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica 107, 27-37.

Kenna, M.A., Brachmann, C.B., Devine, S.E., and Boeke, J.D. (1998). Invading the yeast nucleus: a nuclear localization signal at the C terminus of Ty1 integrase is required for transposition in vivo. Mol Cell Biol 18, 1115-1124.

Kim, J.M., Vanguri, S., Boeke, J.D., Gabriel, A., and Voytas, D.F. (1998). Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res 8, 464-478.

Kirchner, J., Connolly, C.M., and Sandmeyer, S.B. (1995). Requirement of RNA polymerase III transcription factors for in vitro position-specific integration of a retroviruslike element. Science 267, 1488-1491.

Kumar, A., and Bennetzen, J.L. (1999). Plant retrotransposons. Annu Rev Genet 33, 479-532.

Kumar, A., and Hirochika, H. (2001). Applications of retrotransposons as genetic tools in plant biology. Trends Plant Sci 6, 127-134.

Kunitoku, N., Sasayama, T., Marumoto, T., Zhang, D., Honda, S., Kobayashi, O., Hatakeyama, K., Ushio, Y., Saya, H., and Hirota, T. (2003). CENP-A phosphorylation by Aurora-A in prophase is required for enrichment of Aurora-B at inner centromeres and for kinetochore function. Dev Cell 5, 853-864.

Labrador, M., and Corces, V.G. (1997). Transposable element-host interactions: regulation of insertion and excision. Annu Rev Genet 31, 381-404.

Le, Q.H., Melayah, D., Bonnivard, E., Petit, M., and Grandbastien, M.A. (2007). Distribution dynamics of the Tnt1 retrotransposon in tobacco. Mol Genet Genomics 278, 639-651.

Leblanc, P., Dastugue, B., and Vaury, C. (1999). The integration machinery of ZAM, a retroelement from Drosophila melanogaster, acts as a sequence-specific endonuclease. J Virol 73, 7061-7064.

Leem, Y.E., Ripmaster, T.L., Kelly, F.D., Ebina, H., Heincelman, M.E., Zhang, K., Grewal, S.I., Hoffman, C.S., and Levin, H.L. (2008). Retrotransposon Tf1 is targeted to Pol II promoters by transcription activators. Mol Cell 30, 98-107.

Lin, S.S., Nymark-McMahon, M.H., Yieh, L., and Sandmeyer, S.B. (2001). Integrase mediates nuclear localization of Ty3. Mol Cell Biol 21, 7826-7838.

Llano, M., Vanegas, M., Hutchins, N., Thompson, D., Delgado, S., and Poeschla, E.M. (2006). Identification and characterization of the chromatin-binding domains of the HIV-1 integrase interactor LEDGF/p75. J Mol Biol 360, 760-773.

Loake, G.J., Faktor, O., Lamb, C.J., and Dixon, R.A. (1992). Combination of H-box [CCTACC(N)7CT] and G-box (CACGTG) cis elements is necessary for feed-forward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate p-coumaric acid. Proc Natl Acad Sci U S A 89, 9230-9234.

Lucas, H., Feuerbach, F., Kunert, K., Grandbastien, M.A., and Caboche, M. (1995). RNA-mediated transposition of the tobacco retrotransposon Tnt1 in Arabidopsis thaliana. EMBO J 14, 2364-2373.

Malik, H.S., and Eickbush, T.H. (1999). Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73, 5186-5190.

Malik, H.S., and Henikoff, S. (2002). Conflict begets complexity: the evolution of centromeres. Curr Opin Genet Dev 12, 711-718.

Mazier, M., Botton, E., Flamain, F., Bouchet, J.P., Courtial, B., Chupeau, M.C., Chupeau, Y., Maisonneuve, B., and Lucas, H. (2007). Successful gene tagging in lettuce using the Tnt1 retrotransposon from tobacco. Plant Physiol 144, 18-31.

Melayah, D., Bonnivard, E., Chalhoub, B., Audeon, C., and Grandbastien, M.A. (2001). The mobility of the tobacco Tnt1 retrotransposon correlates with its transcriptional activation by fungal factors. Plant J 28, 159-168.

Meyers, B.C., Tingey, S.V., and Morgante, M. (2001). Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res 11, 1660-1676.

Mhiri, C., Morel, J.B., Vernhettes, S., Casacuberta, J.M., Lucas, H., and Grandbastien, M.A. (1997). The promoter of the tobacco Tnt1 retrotransposon is induced by wounding and by abiotic stress. Plant Mol Biol 33, 257-266.

Miller, J.T., Dong, F., Jackson, S.A., Song, J., and Jiang, J. (1998). Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150, 1615-1623.

Mizuguchi, G., Xiao, H., Wisniewski, J., Smith, M.M., and Wu, C. (2007). Nonhistone Scm3 and histones CenH3-H4 assemble the core of centromere-specific nucleosomes. Cell 129, 1153-1164.

Mou, Z., Kenny, A.E., and Curcio, M.J. (2006). Hos2 and Set3 promote integration of Ty1 retrotransposons at tRNA genes in Saccharomyces cerevisiae. Genetics 172, 2157-2167.

Nagaki, K., Cheng, Z., Ouyang, S., Talbert, P.B., Kim, M., Jones, K.M., Henikoff, S., Buell, C.R., and Jiang, J. (2004). Sequencing of a rice centromere uncovers active genes. Nat Genet 36, 138-145.

Nagaki, K., and Murata, M. (2005). Characterization of CENH3 and centromere-associated DNA sequences in sugarcane. Chromosome Res 13, 195-203.

Nakagawa, H., Lee, J.K., Hurwitz, J., Allshire, R.C., Nakayama, J., Grewal, S.I., Tanaka, K., and Murakami, Y. (2002). Fission yeast CENP-B homologs nucleate centromeric heterochromatin by promoting heterochromatin-specific histone tail modifications. Genes Dev 16, 1766-1778.

Nakayashiki, H., Awa, T., Tosa, Y., and Mayama, S. (2005). The C-terminal chromodomain-like module in the integrase domain is crucial for high transposition efficiency of the retrotransposon MAGGY. FEBS Lett 579, 488-492.

Palmer, D.K., O'Day, K., Trong, H.L., Charbonneau, H., and Margolis, R.L. (1991). Purification of the centromere-specific protein CENP-A and demonstration that it is a distinctive histone. Proc Natl Acad Sci U S A 88, 3734-3738.

Palmer, D.K., O'Day, K., Wener, M.H., Andrews, B.S., and Margolis, R.L. (1987). A 17-kD centromere protein (CENP-A) copurifies with nucleosome core particles and with histones. J Cell Biol 104, 805-815.

Pereira, V. (2004). Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol 5, R79.

Peterson-Burch, B.D., Nettleton, D., and Voytas, D.F. (2004). Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol 5, R78.

Peterson-Burch, B.D., and Voytas, D.F. (2002). Genes of the Pseudoviridae (Ty1/copia retrotransposons). Mol Biol Evol 19, 1832-1845.

Pouteau, S., Huttner, E., Grandbastien, M.A., and Caboche, M. (1991). Specific expression of the tobacco Tnt1 retrotransposon in protoplasts. EMBO J 10, 1911-1918.

Presting, G.G., Malysheva, L., Fuchs, J., and Schubert, I. (1998). A Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J 16, 721-728.

Sandmeyer, S. (2003). Integration by design. Proc Natl Acad Sci U S A 100, 5586-5588.

SanMiguel, P., Gaut, B.S., Tikhonov, A., Nakajima, Y., and Bennetzen, J.L. (1998). The paleontology of intergene retrotransposons of maize. Nat Genet 20, 43-45.

Shun, M.C., Raghavendra, N.K., Vandegraaff, N., Daigle, J.E., Hughes, S., Kellam, P., Cherepanov, P., and Engelman, A. (2007). LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev 21, 1767-1778.

Sullivan, B.A., and Karpen, G.H. (2004). Centromeric chromatin exhibits a histone modification pattern that is distinct from both euchromatin and heterochromatin. Nat Struct Mol Biol 11, 1076-1083.

Pouteau, S., Grandbastien, M.A., and Boccara, M. (1994). Microbial elicitors of plant defence responses activate transcription of a retrotransposon. Plant J 5, 535-542.

Tadege, M., Wen, J., He, J., Tu, H., Kwak, Y., Eschstruth, A., Cayrel, A., Endre, G., Zhao, P.X., Chabaud, M., et al. (2008). Large-scale insertional mutagenesis using the Tnt1 retrotransposon in the model legume Medicago truncatula. Plant J 54, 335-347.

Volpe, T., Schramke, V., Hamilton, G.L., White, S.A., Teng, G., Martienssen, R.A., and Allshire, R.C. (2003). RNA interference is required for normal centromere function in fission yeast. Chromosome Res 11, 137-146.

Volpe, T.A., Kidner, C., Hall, I.M., Teng, G., Grewal, S.I., and Martienssen, R.A. (2002). Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297, 1833-1837.

Wevrick, R., and Willard, H.F. (1989). Long-range organization of tandem arrays of alpha satellite DNA at the centromeres of human chromosomes: high-frequency array-length polymorphism and meiotic stability. Proc Natl Acad Sci U S A 86, 9394-9398.

Wicker, T., Sabot, F., Hua-Van, A., Bennetzen, J.L., Capy, P., Chalhoub, B., Flavell, A., Leroy, P., Morgante, M., Panaud, O., et al. (2007). A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8, 973-982.

Wong, L.H., and Choo, K.H. (2004). Evolutionary dynamics of transposable elements at the centromere. Trends Genet 20, 611-616.

Xie, W., Gai, X., Zhu, Y., Zappulla, D.C., Sternglanz, R., and Voytas, D.F. (2001). Targeting of the yeast Ty5 retrotransposon to silent chromatin is mediated by interactions between integrase and Sir4p. Mol Cell Biol 21, 6606-6614.

Yang, J., Malik, H.S., and Eickbush, T.H. (1999). Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc Natl Acad Sci U S A 96, 7847-7852.

Yieh, L., Hatzis, H., Kassavetis, G., and Sandmeyer, S.B. (2002). Mutational analysis of the transcription factor IIIB-DNA target of Ty3 retroelement integration. J Biol Chem 277, 25920-25928.

Yieh, L., Kassavetis, G., Geiduschek, E.P., and Sandmeyer, S.B. (2000). The Brf and TATA-binding protein subunits of the RNA polymerase III transcription factor IIIB mediate position-specific integration of the gypsy-like element, Ty3. J Biol Chem 275, 29800-29807.

Zhang, W., Mellone, B.G., and Karpen, G.H. (2007). A specialized nucleosome has a "point" to make. Cell 129, 1047-1049.

Zhong, C.X., Marshall, J.B., Topp, C., Mroczek, R., Kato, A., Nagaki, K., Birchler, J.A., Jiang, J., and Dawe, R.K. (2002). Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14, 2825-2836.

Zhu, Y., Dai, J., Fuerst, P.G., and Voytas, D.F. (2003). Controlling integration specificity of a yeast retrotransposon. Proc Natl Acad Sci U S A 100, 5891-5895.

Zou, S., and Voytas, D.F. (1997). Silent chromatin determines target preference of the Saccharomyces retrotransposon Ty5. Proc Natl Acad Sci U S A 94, 7412-7416.

CHAPTER 2. CHROMODOMAINS DIRECT INTEGRATION OF RETROTRANSPOSONS TO HETEROCHROMATIN

Xiang Gao, Yi Hou, Hirotaka Ebina, Henry Levin, and Daniel F. Voytas

Published in Genome Research, 18:359-369

ABSTRACT

The enrichment of mobile genetic elements in heterochromatin may be due, in part, to targeted integration. The chromoviruses are Ty3/gypsy retrotransposons with chromodomains at their integrase C-termini. Chromodomains are logical determinants for targeting to heterochromatin, because the chromodomain of heterochromatin protein 1 (HP1) typically recognizes histone H3 K9 methylation, an epigenetic mark characteristic of heterochromatin. We describe three groups of chromoviruses based on amino acid sequence relationships of their integrase C-termini. Genome sequence analysis indicates that representative chromoviruses from each group are enriched in gene-poor regions of the genome relative to other retrotransposons, and when fused to fluorescent marker proteins, the chromodomains target proteins to specific subnuclear foci coincident with heterochromatin. The chromodomain of the fungal element, MAGGY, interacts with histone H3 dimethyl- and trimethyl-K9, and when the MAGGY chromodomain is fused to integrase of the Schizosaccharomyces pombe Tf1 retrotransposon, new Tf1 insertions are directed to sites of H3 K9 methylation. Repetitive sequences such as transposable elements trigger the RNAi pathway resulting in their epigenetic modification. Our results suggest a dynamic interplay between retrotransposons and heterochromatin, wherein mobile elements recognize heterochromatin at the time of integration and then perpetuate the heterochromatic mark by triggering epigenetic modification.

INTRODUCTION

Mobile elements are a major component of the genetic material, and to protect genomes from the deleterious consequences of transposition, cells have evolved strategies, such as RNAi, to rein in mobile element amplification (Bosher and Labouesse 2000; Vastenhouw and Plasterk 2004). Such surveillance systems lead to epigenetic inactivation of mobile elements through posttranscriptional silencing (Sijen and Plasterk 2003), histone and DNA modifications (Lippman and Martienssen 2004; Martienssen et al. 2004), as well as the recently discovered piwi/Aubergine germline silencing pathway (Girard et al. 2006; Grivna et al. 2006; Vagin et al. 2006). Mobile element insertions that decorate eukaryotic genomes, therefore, frequently define unique chromatin domains. Whereas the genetic consequences of transposition in terms of mutation and genome rearrangement have long been recognized, the biological consequences of their epigenetic marks, which include effects on gene expression and the formation of heterochromatin, are only beginning to be appreciated (Slotkin and Martienssen 2007).

Underlying the genetic and epigenetic impact of mobile elements is integration site choice. Although forces such as selection or recombination contribute to the non-random distribution of mobile elements in eukaryotic genomes, an increasing number of studies indicate that many mobile elements target integration to specific chromosomal sites (Bushman 2003). For some elements, target sites are determined by recognizing specific DNA sequences whereas for others, including several retrotransposons and retroviruses, chromatin impacts target site choice. The Ty1 and Ty3 retrotransposons of Saccharomyces cerevisiae, for example, integrate near sites of RNA polymerase III transcription by recognizing pol III transcription complexes or chromatin states associated with pol III transcription (Bachman et al. 2005; Mou et al. 2006; Yieh et al. 2002; Yieh et al. 2000). The Tf1 retrotransposon of Schizosaccharomyces pombe recognizes certain pol II promoters (Bowen et al. 2003; Singleton and Levin 2002). Similarly, the retroviruses, which are closely related to the retrotransposons, show target site biases. Murine leukemia virus (MLV) shows a preference for promoter regions (Wu et al. 2003), and human immunodeficiency virus (HIV) integrates preferentially into actively transcribed genes at sites with transcription-associated histone modifications (Schroder et al. 2002; Wang et al. 2007).

In contrast to targeting to specific transcription complexes or chromatin states associated with transcription, the Ty5 retrotransposon of S. cerevisiae integrates preferentially into transcriptionally inactive regions of the genome, namely heterochromatin found at the telomeres and silent mating loci (Zou et al. 1996). Targeting to heterochromatin requires a six amino acid motif (the targeting domain) at the C-terminus of Ty5 integrase that interacts with the heterochromatin protein Sir4 (Gai and Voytas 1998; Xie et al. 2001). This interaction tethers the Ty5 integration complex to target sites resulting in the observed target site biases. Although targeting determinants encoded by other retrotransposons remain to be identified, increasingly, it appears that integrase is the mediator of target site choice. Recent experiments in which domains of HIV and MLV integrase were swapped demonstrated that integrase is responsible for the different patterns of integration observed at genes for these two viruses (Lewinski et al. 2006), and in a separate study, sequence differences among various retroviral integrases were found to correlate with global and local integration patterns (Derse et al. 2007). Analogous to the Ty5/Sir4 relationship, an interaction between HIV integrase and the transcription factor LEDGF/p75 underlies HIV’s preference for actively transcribed genes (Ciuffi et al. 2005; Llano et al. 2006; Shun et al. 2007).

The targeting determinants for Ty5 lie within the integrase C-terminus, a region of the protein that is highly divergent among retrotransposons both in sequence composition and size. One exception occurs in a specific lineage of retrotransposons referred to as the chromoviruses (family Metaviridae) (Gorinsek et al. 2004; Gorinsek et al. 2005; Marin and Llorens 2000). The integrase C-termini of some chromoviruses have a chromodomain (CHD) – an approximately 40-50 amino acid sequence motif that can interact with diverse targets, including proteins, RNA, and DNA (Brehm et al. 2004). Among the best-characterized CHD partners are methylated histone residues. The heterochromatin protein 1 (HP1) (chromobox homolog, CBX) CHD typically interacts with histone H3 methyl-K9 (Jacobs and Khorasanizadeh 2002; Nielsen et al. 2002), and the polycomb CHD interacts with H3 methyl-K27 (Fischle et al. 2003; Min et al. 2003), both of which are marks typically found in heterochromatin. In humans, the tandem CHDs of Chromo-helicase/ATPase DNA-binding protein 1 (CHD1) act cooperatively to specifically bind H3 methyl-K4 (Flanagan et al. 2005), a mark characteristic of euchromatin. In each of these proteins, the CHD targets the protein to sites bearing the specific epigenetic marks. Consequently, when CHDs were identified in retrotransposon integrases, they were hypothesized to play a role in target specificity (Koonin et al. 1995; Malik and Eickbush 1999). In this study, we provide evidence that some retrotransposon CHDs do, in fact, recognize histone modifications characteristic of heterochromatin and that retrotransposon chromodomains can tether integration complexes to heterochromatin resulting in targeted integration.

RESULTS AND DISCUSSION

Diversity of retrotransposon-encoded chromodomains

Amino acid sequences of integrase C-termini were analyzed from chromoviruses present in the genomes of diverse plants, animals and fungi. Two distinct groups of CHDs were identified (Supplementary Figure S1). Group I CHDs display high sequence similarity to cellular (i.e. non-transposable element) CHDs and are found in diverse eukaryotes. In the HP1 CHD and its homologues, three conserved aromatic residues recognize methylated lysines on histone H3 (Jacobs and Khorasanizadeh 2002; Nielsen et al. 2002); all three residues are conserved in group I CHDs. The group II CHDs are found only in plant retrotransposons. Both sequence similarity and predicted secondary structures indicate that they are homologs of the classical, group I CHDs; however, the group II motif lacks the first conserved aromatic residue and usually the third, suggesting that they interact with different partners. A very different motif was identified at the corresponding position of the CHD among plant centromere-specific retrotransposons (CR elements) (Jiang et al. 2003). Although this group of retrotransposons has been classified as chromoviruses, the motif could not be convincingly aligned to CHDs, and results of secondary structure predictions suggest that this well-conserved motif is structurally distinct (data not shown). In the studies described below, the MAGGY retrotransposon from Magnaporthe grisea represents elements with group I CHDs; group II elements are represented by both a Tma-like element from Arabidopsis thaliana (referred to hereafter as Tma) and Os_rn 377-208, a RIRE3-like element from Oryza sativa (referred to hereafter as Os); the Zea mays CRM and O. sativa CRR2 are representative elements with the CR motif.

α1 α2 α3 β

Cellular CHD
Dm HP1a --EEEYAVEKIID--RRVRK---GKVE----YYLKWKGYPE-TENTWEPENNLD-CQDLIQQYEASRK
Mm HP1a EDEEEYVVEKVLD--RRMVK---GQVE----YLLKWKGFSE-EHNTWEPEKNLD-CPELISEFMKKYK
Sp Swi6 --EDEYVVEKVLK--HRMARKG-GGYE----YLLKWEGYDDPSDNTWSSEADCSGCKQLIEAYWNEHG
Sp Chp2 ---EEFAVEMILD--SRMKKDG-SGFQ----YYLKWEGYDDPSDNTWNDEEDCAGCLELIDAYWESRG

Group I CHD
Fo_Skippy --PEVYEAEAIRD--TRKIN---GQRE----YLIKWKNYPE-NENTWEPPKH------
Cf Cft1 EAENEFEVEKILD---KK-----GQR-----YLVKWKGYDE-SENTWEPRINLANCYQLLRQFQKWRQ
Mg Pyret DAEQTYNVKQIFD--HRRNH---GKIK----YFIKWENYGH-EKNIWEPLNHLQDCQEPLRQYYQELD
Cr Chlamy DGNAYWTVHDVIE--HRDRR---VGRKPVRDFLVKWEGFGP-EHNSWEPEANLR-EDELVADIVDKYL
Mg MGRL-3 ---EEWEVERVIS--SRVLR---GVLQ----YQVQWRGWDP-DPEFYDAEG-FKNAAVQLRQYHNAYP
Uh Chromovir1 DEDLDFKVEALID--KRSHN---GTTE----YKVLWRGYSE-EAASWEPVENL-NCPDLIQEYEVSEG
Tm MarY1 -GEPQYEVKSILD--SRLHR---GKLQ----YLVHWKGYGY-EENSWVEESDIN-APRLIKEFHRWHT
Xt Silurana -DQQEFEVQEIID--SRKSR---GKVQ----YLLHWKGFGP-EERSWVNAKDVH-APRLITSFHR---
Dr Sushi33 -GSPAYSVRRLLD--VRRRGR--G-FQ----YLVDWEGYGP-EERSWIPARHVL-DRAVIMDF-----
Dr Sushi5 -GETAYSVKRILD--SRRRGR--G-FQ----YLVDWEGYGP-EHRSWVPAGDIL-DHSLIDDY-----
Ga Sushi -GGPVYTVRKLLA--VRKRGR-RGR-Q----FLVDWEGYGP-EERQWVSSSFIV-DPDLIRDFYRAHP
Tr Sushi -GEPVWSVNKLLA--VRRRGR--G-FQ----YLVDWVGYGP-EDQSWVPPSYLA-DPSLLEDF-----
Dr Sushi -GEEAYLVREVLD--SRRRG---GALQ----YLVDWEGYGP-EERSWVNARDIL-DPTLTEEFHQNQP
Mg MAGGY* ---REYEVEEILDSFWETRGRGGRRLK----YIVRWAGY---SEPTTEPADYLENAAQLVKNFHRRYP
Cc Chromovir ----EYEVHDILDSRWRGRGKN-RKLE----YLVSWKGYGS-TDDTWEPEENLEHAPEIVKEFHQRHP

Group II CHD X ?
Lj Chro ---SSIQPFHILAH-RTVLRH--G-ID-VAQVLVQWQGQPL-EEATWEDKVTIQSQFPA------
Os Osr35 ALQV---PFQFLD--KRLVKK--GNRS-VLQLLTHWYHSSP-SESTWEDMEDLFARFPRALAW-----
Os rn377-208* AADVW--PELILD--RRLVKK--GNAA-HVQVLIKWSSLSA-DDATWEDYDVLKTRFPSAPAWGQAAS
Zm 1 PIQV---PTRILQ--RRFIDR--GGEL-IAQVKVVWSGMTE-DLATWEDVEALRARFPKALIWDQAGA
Zm Reina ELKI---PTEVLE--SRLLRK--GNKV-IPQLLIRWSNWPA-SLSTWEDEHA------
At 1 DLTYVEKPTRILE--TSERK---TRNKVIRFCKVQWSHHSE-EEATWEREDELKATHP------
Os Osr34 DLTYVEKPVRILD--TSERR---TRNKVTRFCRVQWSHHSE-EEATWEREDELKAAHPHLF------
Os RIRE3 DLTYVEKPIRILE--TSERR---TRNRVIRFCKVQWSNHSE-EESTWEREDELKSAHPHLF------
Zm Tekay DLTYSEYPVRILE--TSRRI---TRSKVINMCKVQWSHHSE-DEATWEREDELRAEFP------
Os 1 NLTYKERPIKVLE--EAERQ---TRRKTIKFYKVQWSNHSE-DEATWEREDLLRAEFPEL------
Sb RetroSor2 TLEYREYPVRILD--RATKE---TRSSTFPMCKVLWSNHTE-REATWEKESELQLRYPPYLFERYVTP
At Tma* NMTLEARPVRVLE--RRIKE---LRRKKIPLIKVLWDCDGV-TEETWEPDARMKARFKKCF------
At Tma3-1 ---LETRPVRVLE--RRIKE---LRRKKIPLIKVLWDCDGV-TEETWEPEARMKARFKTWFEKQVA--
Pt 1 VTAFDKDVDYIIA--DRTVSRRGVPAHS--EYLVKWKNLPE-SEATWEREDDLWQFAEHI------
Pt 2 TASYEKRVETILA--DRKIKLPNGAEQT--EYLVKWRKLLR-TEASWEPEDAL------
Le Galariel PTQYDAEIEKILD--HRVLGTSKKNTKT--EELVHWKGKSA-ADAVWEKAKDLWQFDAQIDDYLKTVS

CR motif
Os CRR2* STSPQVQLHDPITRARARQLNYQVSSFLNSCS---SCLYPGDATLVLLRNDGEDPNGERI
Zm CRM* SIPIQVPISGPITRARARQLNHQVITLLSSCP---SYLDHGDPTLVLLRNQGEDRKGKGF
Hv Cereba PTAPAAIHTGPVTRARARQLNYQVLSFIGNTSNH-EHMMLPKLTFVVLMNEGPSMDKKD-
Mt 1 TSASIQGLGGPMTRSRTKKAKEALTQLVAKVLESKPTLESMEDKMVMCIKPLEEGWG---
Lj 1 GHGALKGLGGPMTRARAKRAKEALQQMIALALEEGTHVRELEPKLVNFLMNYEEE-----
At CRA2 VVEDVLVTPAVPTRSRAKLFDQAIAGMLNHIRDRPNDLSQVTS-LVLFQAQGPHQDG---
At CRA3 KKEAMHVLNGPMTRSKTKRLLNQDITTLLQHIE--GSLKQDACQTLVVI-QAV------

Figure S1. Amino acid sequence alignments of retrotransposon CHDs. The alignments are similar to those described in previous studies (Gorinsek et al. 2004; Gorinsek et al. 2005; Marin and Llorens 2000), and all of the amino acid sequences were aligned together with the exception of the CR motifs, which were aligned separately. The CHDs separate into two groups based on similarity to CHDs from cellular proteins. Among conserved residues shared by group I and cellular CHDs are three aromatic amino acids that interact with methylated lysine residues on histone H3 in the HP1 CHD (arrowheads) (Jacobs and Khorasanizadeh 2002; Nielsen et al. 2002). Only one of these aromatic residues is conserved among the group II CHDs (arrowhead). Another is completely absent (X) and the third is present only in some sequences (?). Residues mutated and tested for effects on CHD subnuclear localization (Figure 2B) are shaded in dark red. Arrows above the alignment denote conserved secondary structures shared by the cellular, group I and group II CHDs. The CR retrotransposons encode a very different motif at a position in integrase that corresponds to the location of the CHDs in the group I and group II elements. The two-letter abbreviations before each retrotransposon sequence designate species of origin (At, Arabidopsis thaliana; Cc, Coprinopsis cinerea; Cf, Cladosporium fulvum; Cr, Chlamydomonas reinhardtii; Cn, Cryptococcus neoformans; Dm, Drosophila melanogaster; Dr, Danio rerio; Fo, Fusarium oxysporum; Ga, Gasterosteus aculeatus; Hv, Hordeum vulgare; Le, Lycopersicon esculentum; Lj, Lotus japonicus; Mg, Magnaporthe grisea; Mm, Mus musculus; Mt, Medicago truncatula; Os, Oryza sativa; Pt, Poncirus trifoliata; Sb, Sorghum bicolor; Sp, Schizosaccharomyces pombe; Tm, Tricholoma matsutake; Tr, Takifugu rubripes; Uh, Ustilago hordei; Xt, Xenopus tropicalis; Zm, Zea mays). GenBank accession numbers for the retrotransposon sequences are as follows: Dm HP1a, P05205; Mm HP1a, Q61686; Sp Swi6, P40381; Sp Chp2, CAA16917; Fo_Skippy, L34658; Cf Cft1, AF051915; Mg Pyret, AB062507; Cr Chlamydomonas, sc_2123; Mg MGRL-3, AF314096; Uh Uhchromovir1, AC114899; Tm MarY1, AB028236; Xt Silurana, AC147356; Dr Drsushi33, AL596141; Dr Drsushi5, AL714031; Ga Gasushi, AC145765; Tr sushi, AF030881; Dr Drsushi, AL732562; Mg MAGGY, L35053; Cc Ccchromovir, AACS01000152; Lj Ljchro, AP004915; Os Osr35, AC068924; Os rn_377-208, AK068625; Zm 1, AF466646; Zm Reina, U69258; At 1, AP002071a; Os Osr34, AP004365a; Os RIRE3, AC119148; Zm Tekay, AF050455; Os dagul, AC087542; Sb RetroSor2, AF061282; At Tma, AF147263; At Tma3-1, AC005965; Pt 1, AF506028; Pt 2, AF506028; Le galariel, AF119040; Os CRR2, AK068116; Zm CRM, AC152494; Hv Cereba, AY040832; Mt 1, AC131249; Lj 1, AP004525; At CRA2, At2g06890; At CRA3, AC007887. Asterisks indicate those elements studied in detail.

Retrotransposons with chromodomains are associated with repetitive regions of host genomes. If CHDs are targeting determinants, chromoviruses would be predicted to have biased chromosomal distributions. In species such as A. thaliana, where blocks of euchromatin and heterochromatin are distinct and well characterized, Ty3/gypsy retrotransposons (which include the chromoviruses) show a significant association with heterochromatin compared to Ty1/copia retrotransposons (Pereira 2004; Peterson-Burch et al. 2004). This holds true for insertions of our representative group II chromovirus, Tma, which are clustered in pericentromeric regions (Supplementary Figure S2A).

Figure S2. Retrotransposons with CHDs and CR motifs are preferentially located in gene-poor, transposon-rich regions of their host genomes. A) The genomic distribution of Ty1/copia (yellow), group II (red) and CR (blue) retrotransposon insertions in the A. thaliana genome. Sequences representing the C-termini of Tos17 (Ty1/copia), Tma (group II), and CR element integrases were used to query the annotated A. thaliana genome sequence by tblastn. Sequence hits with E-values lower than 1E-4 and with over 70% identity and 90% coverage were considered. The positions of insertions are plotted on each of the five A. thaliana chromosomes. Note that both the group II and CR elements are enriched in pericentromeric regions (depicted in black), which are the major domains of heterochromatin in A. thaliana (Fransz et al. 2002). In contrast, Ty1/copia retrotransposon insertions extend further out along the chromosome arms (Peterson-Burch et al. 2004). The breaks in the chromosome denote the centromeres, which were not sequenced. B) DNA sequence divergence between LTRs of full-length Tos17 and Os elements. The 5’ and 3’ LTR sequences from each element were parsed using a Perl script, and divergence was estimated using the Jukes-Cantor model (Jukes and Cantor 1969).
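The Jukes-Cantor estimate named in panel B can be written out as a short sketch. The function name, the toy LTR fragments and the handling of gaps are illustrative assumptions; the Perl script used in the study is not reproduced here.

```python
import math

def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (1969) distance between two aligned DNA sequences.

    d = -(3/4) * ln(1 - (4/3) * p), where p is the proportion of
    compared sites that differ. Sites carrying a gap or an ambiguous
    base are skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# Toy 5' and 3' LTR fragments (hypothetical, not taken from Tos17 or Os)
ltr_5 = "ACGTACGTACGTACGTACGT"
ltr_3 = "ACGTACGTACGAACGTACGT"
d = jukes_cantor_distance(ltr_5, ltr_3)
```

Because the two LTRs of an element are identical at the time of insertion, a small distance between them is read as a recent insertion.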

The clustering of transposable elements on all seven chromosomes of M. grisea – the host of the group I element, MAGGY – has previously been noted (Dean et al. 2005; Thon et al. 2004). Unfortunately, the distributions of epigenetic marks that define heterochromatin, such as histone and DNA methylation, have not been characterized in this species. In general, heterochromatin is gene poor and enriched in repeats and transposable elements (Grewal and Elgin 2007; Grewal and Jia 2007). We therefore quantitatively assessed the frequency with which MAGGY insertions are associated with genes (a proxy for euchromatin) or transposable elements (a proxy for heterochromatin). Analysis of 20 kb windows centered on MAGGY insertions revealed that on average, MAGGY insertions are associated with 2.40 (SE 0.18) genes and 0.56 (SE 0.10) transposable elements (Figure 1A), whereas random windows have 3.88 (SE 0.24) genes and 0.17 (SE 0.05) transposable elements (P < 0.00001 for both genes and transposable elements). Windows of 10 kb and 40 kb were also tested, and biases in distribution were also found to be significant (data not shown). MAGGY insertions, therefore, appear to be clustered with other transposable elements and are located in relatively gene poor regions.
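The window analysis described above can be sketched as follows. The coordinates, feature lists and function names are hypothetical, and because the text does not state which statistical test produced the reported P values, none is applied here.

```python
from math import sqrt
from statistics import mean, stdev

def count_in_window(center: int, features: list[tuple[int, int]],
                    half_width: int = 10_000) -> int:
    """Count features (start, end) that overlap the window center +/- half_width."""
    lo, hi = center - half_width, center + half_width
    return sum(1 for start, end in features if start <= hi and end >= lo)

def mean_and_se(counts: list[int]) -> tuple[float, float]:
    """Mean and standard error of the mean for per-window counts."""
    return mean(counts), stdev(counts) / sqrt(len(counts))

# Hypothetical gene annotation and insertion sites on one contig
genes = [(1_000, 3_000), (12_000, 15_000), (40_000, 42_000), (58_000, 61_000)]
insertions = [10_000, 50_000, 30_000]

# 20 kb windows (10 kb on each side of the insertion site)
gene_counts = [count_in_window(pos, genes) for pos in insertions]
avg, se = mean_and_se(gene_counts)
```

The same two functions would be applied to transposable element annotations and to randomly chosen window centers to obtain the control values.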

Heterochromatin in rice includes the pericentromere as well as islands along the chromosomes (Houben et al. 2003). In contrast to Tma of A. thaliana, insertions of Os – the other representative group II element – are dispersed on the rice chromosome arms (Figure 1B). The association of Os with genes and other transposable elements was evaluated in 40 kb windows centered on Os insertions (40 kb windows were chosen because of the larger intergenic regions in plants relative to fungi; however, similar results were also obtained with 20 kb and 80 kb windows, data not shown). In regions flanking the 917 Os insertions in the rice genome, there were on average 2.96 (SE 0.07) genes and 2.93 (SE 0.06) transposable elements (Figure 1C). In contrast, random 40 kb windows averaged 5.03 (SE 0.09) genes and 1.20 (SE 0.05) transposable elements (P < 0.00001 for both genes and transposable elements). As a control, we evaluated gene and transposon associations for Tos17 – a Ty1/copia retrotransposon – which prefers to integrate near genes (Miyao et al. 2003). Consistent with this integration specificity, Tos17 insertions were associated on average with 4.90 (SE 0.25) genes and 1.10 (SE 0.12) transposable elements (P < 0.00001 for both transposable elements and genes). Tos17 insertions therefore occupy a genomic context that differs from that of the Os chromovirus and is more similar to random sites (P < 0.33 for both transposable elements and genes). Comparisons of LTR DNA sequence divergence indicate that insertions comprising the Tos17 and Os families are comparable in age and very young (Supplementary Figure S2B). These insertions, therefore, are less likely to be fixed, suggesting that target site specificity is the major underlying factor in their different genomic distributions. Although forces such as recombination and selection could contribute, they would have to act very differently on these two comparably aged families in order to yield such different genomic distributions.

The CR retrotransposons were named for their association with centromeric repeats, and they therefore show a clear heterochromatin association (Cheng et al. 2002; Miller et al. 1998). Although there are only a few CR insertions present in the rice genome sequence, most are associated with pericentromeric regions (Figure 1B). Many additional insertions reside in the centromeric heterochromatin, as evidenced by the recent sequencing of a rice centromere and FISH analyses in a variety of plant species (Cheng et al. 2002; Dong et al. 1998; Jiang et al. 1996; Nagaki et al. 2004).

Figure 1. Retrotransposons with CHDs and CR motifs are preferentially located in gene-poor, transposon-rich regions of their host genomes. A) The genomic distribution of M. grisea MAGGY (group I) insertions. For each of the seventy-five MAGGY insertions in the completed genome sequence (Dean et al. 2005), 20 kb windows at the site of insertion (10 kb upstream and downstream) were surveyed for genes and transposable elements. Seventy-five randomly selected sites were similarly surveyed as a control. B) The distribution of group II (red), CR (blue) and Tos17 (yellow) insertions on chromosome 1 of O. sativa. Insertions of group II chromoviruses and Tos17 are distributed along the chromosome arms; however, as indicated in C below, these two element families occupy different genomic contexts. Only a few CR elements are found in the assembled rice genome sequence and these are near the pericentromeric regions (gap in chromosome). Many other CR insertions have been revealed by FISH analysis or are present in the recent sequence of the centromere from chromosome 8 (Cheng et al. 2002; Nagaki et al. 2004). C) Os (group II) insertions are in transposon-rich, gene-poor regions scattered across rice chromosome 1. For each Os insertion site, a 40 kb window (20 kb upstream and 20 kb downstream) was surveyed for genes and transposable elements. Windows containing Os insertions have more transposable elements and fewer genes than random windows or windows centered on insertions of Tos17, a retrotransposon that integrates preferentially into gene-rich regions of the rice genome (Miyao et al. 2003).

Retroelement chromodomains direct proteins to heterochromatin

Our bioinformatic analyses demonstrated that all five representative retrotransposons from the three groups of chromoviruses are associated with repeat-rich regions of their respective host genomes. Since some CHDs target cellular proteins to heterochromatin through interactions with histone H3 methyl-K9 (Jacobs and Khorasanizadeh 2002; Nielsen et al. 2002), we reasoned that retrotransposon CHDs may serve a similar function for integrase. To more directly test this hypothesis, we transiently expressed fusion proteins generated between YFP and retrotransposon CHDs in A. thaliana suspension cells and leaf protoplasts. The YFP-CHD fusions were visualized by confocal microscopy, and both types of CHDs and the CR motif showed a punctate nuclear distribution (Figure 2A). In contrast, YFP fusions to the C-terminus of integrase from Tnt1 – a Ty1/copia retrotransposon associated with genes (Le et al. 2007) – were distributed throughout the nucleus. A CFP fusion to TFL2 (HP1-like homologue in A. thaliana) co-localized with the retrotransposon YFP-CHD fusions. When transiently expressed, TFL2 localizes to the chromocenters, the primary domain of heterochromatin in A. thaliana (Fransz et al. 2002; Zemach et al. 2006). We confirmed this association by showing co-localization between TFL2 and a YFP fusion to CENP-C – a kinetochore protein (Supplementary Figure S3A).

Group I and group II CHDs have conserved aromatic amino acid residues, which, in the HP1 CHD, interact with methylated H3 K9 (Bannister et al. 2001; Lachner et al. 2001) (see Supplementary Figure S1). These conserved residues were mutated in the MAGGY, Tma and Os CHDs, and the mutations abolished the localization of all three CHDs to heterochromatin (Figure 2B). In summary, localization experiments with representative retrotransposon CHDs and their mutants are consistent with a role for these motifs in recognizing heterochromatin and directing integration complexes to these sites.

The MAGGY chromodomain interacts with histone H3 methyl-K9

To test directly whether the MAGGY CHD interacts with histones, pull-down experiments were performed using a His6-tagged MAGGY CHD and purified histones from calf thymus (Supplementary Figure S4A). MAGGY specifically pulled down H3. To test the specificity of the H3 interaction, pull-downs were conducted with a calf thymus histone extract and various glutathione S-transferase (GST)-tagged CHDs. As indicated on stained polyacrylamide gels, the MAGGY CHD pulled down from the mixture a protein of the same size as histone H3 (Figure 3A) that was subsequently confirmed to be H3 by mass spectrometry (data not shown). Neither GST alone nor a MAGGY CHD with mutations in conserved aromatic amino acids pulled down histone H3. Moreover, no proteins were pulled down by GST-Tma and GST-Os fusions, suggesting that the partners for these CHDs are plant-specific.

To determine if the MAGGY CHD interacts with a specific H3 modification, His-tagged wild type and mutant MAGGY CHDs were used in pull-down assays with a series of biotinylated histone H3 modified peptides, including H3 dimethyl-R2, H3 aa 18-36, H3 methyl-K27, H3 dimethyl-K27, H3 trimethyl-K27, H3 aa 28-41, H3 methyl-K36, H3 dimethyl-K36, H3 trimethyl-K36, H3 aa 1-21, H3 methyl-K4, H3 methyl-R8, H3 aa 9-29, H3 trimethyl-K14, H3 methyl-R17, H3 trimethyl-K18, and H3 dimethyl-K9 (M. Bedford, personal communication and data not shown). The MAGGY CHD specifically pulled down the dimethyl-K9 peptide but neither the methyl-K4 peptide (Figure 3B) nor any of the other peptides tested (data not shown). Additional experiments were performed to test for interactions with both di- and trimethyl-K9 peptides, and the MAGGY CHD interacted with both modified forms (Supplementary Figure S4B) with some apparent preference for the trimethylated form (data not shown). In summary, the pull-down experiments indicate that like its HP1 homologue, the MAGGY CHD interacts with histone H3 and has strong specificity for H3 di- and trimethyl-K9.

Figure 2. Subnuclear localization of retrotransposon CHDs and the CR motif. A) YFP fusion proteins expressing retrotransposon CHDs or the CR motif localize to sites of heterochromatin in A. thaliana cells. Constructs expressing fusion proteins between YFP and either group I or group II CHDs or the CR motif were transformed into A. thaliana suspension cell protoplasts and visualized by confocal microscopy. The fusion proteins formed punctate foci in the nucleus and were enriched in the nucleolus. Localization was coincident with a fusion between CFP and TFL2 – the A. thaliana HP1 homologue. B) Mutations in a conserved aromatic amino acid in group I and group II CHDs that is predicted to interact with the methyl group on H3 K9 abrogate subnuclear localization of YFP-CHD fusion proteins. Residues modified in the representative group I and group II CHDs are highlighted in red in Supplementary Figure S1. In each case, residues were mutated to valine.

H3 methyl-K9 is found in heterochromatin in most eukaryotes, including fungi, plants and animals (Gendrel et al. 2002; Jacobs et al. 2001; Peters et al. 2001; Tamaru and Selker 2001), and the interaction between the MAGGY CHD and H3 methyl-K9 suggests that this modification is responsible for the observed subnuclear localization of the MAGGY CHD fusion protein. The role of H3 methyl-K9 in heterochromatin biology has been studied extensively in S. pombe (Grewal and Rice 2004). In wild type S. pombe strains, we observed that an EGFP-MAGGY CHD fusion localized to 1-3 subnuclear foci in approximately 70% of the nuclei examined (Figure 3C, top panels and data not shown). This localization was similar to that observed for Chp2, a protein with a CHD that localizes to heterochromatin (Sadaie et al. 2004). The two proteins co-localize in both swi6-115 and wild type cells (Figure 3C, bottom panels and data not shown). The swi6-115 strain has a mutation in the HP1 homologue of S. pombe and yet normal H3 methyl-K9 in heterochromatic regions (Nakayama et al. 2001). The subnuclear localization of the MAGGY CHD was abrogated by mutations in the CHD residues that recognize H3 methyl-K9 (Supplementary Figure S3B). In S. pombe, Clr4 is the H3 K9 methyltransferase, and clr4Δ strains lack histone H3 methyl-K9 (Nakayama et al. 2001). No subnuclear localization of the MAGGY CHD or Chp2 control was observed in a clr4Δ strain (Supplementary Figure S3B); instead the tagged CHDs were evenly distributed throughout the nucleus. We conclude, therefore, that the MAGGY CHD associates with heterochromatin in S. pombe and that H3 methyl-K9 is required for this association.

Figure 3. The MAGGY CHD recognizes histone H3 methyl-K9. A) MAGGY CHD interacts with histone H3. GST-MAGGY CHD and GST-Tma CHD fusion proteins were mixed with a calf thymus histone extract and then pulled down using glutathione agarose beads. Both the input and pull-down reactions were separated by SDS-PAGE and visualized by SYPRO Ruby protein gel stain. H3, as indicated by the asterisk, was enriched in the pull-down with the MAGGY CHD. No interaction was observed with the MAGGY CHD RW mutant or with wild type or mutant variants of the Tma CHD. B) The MAGGY CHD specifically interacts with H3 dimethyl-K9. Biotin-labeled histone peptides were incubated with His6-tagged MAGGY CHD, and then pulled down with streptavidin agarose beads. Pull-down reactions were separated by SDS-PAGE and transferred to nylon membranes. The presence of the CHD was detected using an anti-His6 antibody. The MAGGY CHD specifically interacts with histone H3 dimethyl-K9, and not with H3 dimethyl-K4. Mutations in conserved residues of the CHD abrogate the interaction. C) MAGGY CHD localizes to sites of H3 methyl-K9 in S. pombe. The MAGGY CHD localizes to punctate sites within the nucleus as does Chp2, a CHD protein found at sites of centromeric heterochromatin in S. pombe (Sadaie et al. 2004) (top panel). Co-localization of the MAGGY CHD and Chp2 is observed in a swi6-115 strain, which lacks the S. pombe HP1 homologue (Nakagawa et al. 2002). The swi6-115 strain should allow expression of genes in centromeric heterochromatin and was used in transposition assays that require expression of a marker gene carried by Tf1 (see Figure 4).

Figure S3. Further characterization of the subnuclear localization of chromovirus CHDs. A) The TFL2 protein of A. thaliana co-localizes with CENP-C in protoplasts. CENP-C is a marker for chromocenters in A. thaliana (Shibata and Murata 2004). B) Subnuclear localization of the MAGGY CHD in S. pombe requires conserved residues in the CHD (first panel). Neither the MAGGY CHD nor Chp2 localizes to heterochromatin in a clr4Δ strain, which lacks H3 methyl-K9 (Nakayama et al. 2001) (second and third panels). C) Unlike the MAGGY CHD, the Tma and Os CHDs do not localize to heterochromatin in S. pombe.

Figure S4. The MAGGY CHD recognizes histone H3 methyl-K9. A) MAGGY CHD interacts with purified histone H3. Individually purified histones from calf thymus were used in pull-down experiments with a His6-tagged MAGGY CHD. The pull-down products were separated by SDS-PAGE and visualized by Coomassie blue staining (upper panel). A specific interaction was only observed between the MAGGY CHD and histone H3. The weak interaction with H4 is non-specific, since it is observed with control proteins that lack the MAGGY CHD (lower panel). M, a histone mixture. B) The MAGGY CHD interacts with both H3 dimethyl-K9 and H3 trimethyl-K9 peptides. The experiment was performed as described in the Figure 3B legend. Specifically, biotin-labeled histone peptides were incubated with His6-tagged MAGGY CHD, and then pulled down with streptavidin agarose beads. Pull-down reactions were separated by SDS-PAGE and transferred to nylon membranes. The presence of the CHD was detected using an anti-His6 antibody.

Group II CHDs and the CR motif also target fluorescent marker proteins to heterochromatin (Figure 2), yet they do not recognize H3 methyl-K9 (data not shown). Similarly, we did not observe an interaction with any of the histone modifications mentioned above, or with H2AX, H2AX pS, H4 pS1, H4 dimethyl-R3, H4 aa 11-28, H4 methyl-K20, H4 dimethyl-K20, and H4 trimethyl-K20 (M. Bedford, personal communication and data not shown). Consistent with the pull-down experiments, no subnuclear localization was observed for the Tma and Os CHDs in S. pombe (Supplementary Figure S3C). Ongoing research is focused on identifying the interacting partner of the group II CHD and CR motif, which we predict will be a factor or molecular mark uniquely associated with plant heterochromatin. Notably, we could not find any closely related cellular proteins that bear similarity to either motif. These domains, therefore, may have been ‘invented’ by retrotransposons to target to heterochromatin. Alternatively, if they were acquired from a cellular protein, this motif has diverged in the cellular progenitor to such an extent that it is no longer recognizable.

The MAGGY chromodomain targets Tf1 insertions to heterochromatin

We wanted to test directly whether retrotransposon CHDs are determinants of target specificity. MAGGY actively transposes in M. grisea (Nakayashiki et al. 1999), and ideally, we wanted to determine whether point mutations in the MAGGY CHD that abrogate interactions with histone H3 alter target specificity. However, little work has been carried out on M. grisea heterochromatin, and target specificity of de novo MAGGY insertions has not been characterized. In contrast, and as indicated above, heterochromatin in S. pombe has been extensively studied (Grewal and Rice 2004). Furthermore, the Tf1 retrotransposon of S. pombe is a chromovirus (Hizi and Levin 2005); however, its highly diverged CHD does not fall into any of the three groups shown in Supplementary Figure S1. Unlike the group I and group II retrotransposons and the CR elements, Tf1 is associated with genes and integrates preferentially in euchromatin, particularly a 100-400 bp window upstream of transcriptional start sites (Behrens et al. 2000; Bowen et al. 2003; Singleton and Levin 2002). Tf1 elements have never been observed to integrate into heterochromatic regions (Behrens et al. 2000; Singleton and Levin 2002), and none of the endogenous Tf elements in the completed S. pombe genome sequence are located within heterochromatin (Bowen et al. 2003). Although the role of the Tf1 CHD in target specificity remains obscure, our goal was to add the MAGGY CHD to Tf1 to determine if it directed integration to heterochromatin.

Transposition of Tf1 in S. pombe was monitored using a plasmid carrying a neo-marked Tf1 element (Levin 1995). cDNA generated by this marked Tf1 confers G418 resistance either by integrating into the genome or by recombining with native Tf sequences. Integration and recombination frequencies can be calculated by comparing frequencies of G418 resistance conferred by a wild type element and an integrase frameshift mutant for which Tf1 cDNA can only enter the genome through recombination. The integration frequency, therefore, can be calculated by subtracting the number of G418 resistant colonies observed in the frameshift mutant from the number observed in wild type. Wild type Tf1 retroelements typically show about 20-fold more integration than recombination events (Figure 4A).

We anticipated that expression of the neo gene within Tf1 might be compromised if Tf1 variants with a MAGGY CHD integrated into heterochromatin. We therefore used for our experiments a swi6-115 host strain, which has a mutation in HP1 (W269R). Swi6 protein levels are dramatically reduced in this strain, and marker genes inserted into centromeric repeats are expressed (Nakagawa et al. 2002). However, and importantly for our experiments, histone modifications that characterize heterochromatin are not altered in swi6-115 strains (Nakayama et al. 2001), although they tend to be restricted to sites where heterochromatin nucleates, such as CenH at the mating loci and the dg-dh repeats in the outer centromere (Noma et al. 2004). As described above, the subnuclear localization of a MAGGY CHD-EGFP fusion is indistinguishable between wild type and a swi6-115 background, and the MAGGY CHD co-localizes with the heterochromatin marker Chp2 in swi6-115 cells (Figure 3C).

In initial experiments, we replaced the Tf1 CHD with the CHD from MAGGY; however, the modified element was incapable of transposition (data not shown). The Tf1 integrase CHD is located at the very C-terminus of the Tf1 polyprotein, and as an alternative strategy, we simply fused the MAGGY CHD to the integrase C-terminus, thereby creating a Tf1 element with tandem CHDs (Tf1-Mac). A variant of Tf1-Mac was also created that carries the integrase frameshift mutation (Tf1-Mac-fs). Although the addition of the MAGGY CHD reduced transposition frequencies seven-fold compared to wild type, Tf1-Mac still produced four-fold more integration than recombination events in a swi6-115 strain (Figure 4A).

Figure 4. Target site choice of the Tf1 retrotransposon in S. pombe is altered by adding the MAGGY CHD to integrase. A) Transposition and cDNA recombination frequencies of Tf1 and Tf1-Mac. The MAGGY CHD was fused to the C-terminus of Tf1 integrase, creating Tf1-Mac. Tf1-Mac transposes, but at a lower frequency than wild type Tf1. Integrase frameshift mutants (Tf1-fs and Tf1-Mac-fs) abolish Tf1 or Tf1-Mac integration, and therefore serve to infer frequencies of cDNA recombination. B) A screen for insertions of Tf1 and Tf1-Mac in heterochromatin. A diagram of S. pombe centromeric repeats and mating type region depicting the location of the PCR primers used to identify heterochromatic Tf1 insertions. C) Targeting of Tf1-Mac to heterochromatin. Centromeric heterochromatin in S. pombe, which is enriched in histone H3 methyl-K9, becomes a target site for Tf1-Mac in the swi6-115 strain. In the swi6-115 strain, there are eight integration events into heterochromatin out of 2200 Tf1-Mac transposition events. This number of transposition events includes elements incorporated into the genome by either integration or recombination. The number that arose specifically by integration was inferred by normalizing the data using the value for the ratio of transposition to recombination (4:1). The normalized heterochromatin targeting frequency is 8 out of 1650 integration events. Tf1-Mac does not target into heterochromatin in the clr4Δ strain, which lacks H3 methyl-K9. D) Native Tf1 target specificity is retained by the Tf1-Mac construct. Fusion of the MAGGY CHD to Tf1 does not abolish Tf1’s native targeting specificity in the swi6-115 strain. Sites of insertion for nine randomly selected Tf1-Mac integration events were determined by inverse PCR. Eight were found within promoters and one near a transcription terminator, consistent with previously documented Tf1 targeting patterns (Behrens et al. 2000; Singleton and Levin 2002).

To determine if Tf1-Mac integrates into heterochromatin, Tf1-Mac transposition was induced, and DNA was prepared from the resulting G418 resistant colonies. PCR experiments were then carried out using a Tf1 primer and either 1) one of four primers at the two ends of the centromeric dg repeats or 2) one of four primers at the ends of the K-region at the mating locus (Figure 4B). For wild type Tf1, 2400 G418 resistant colonies were screened. Based on the frequency of integration relative to recombination (Figure 4A), this approximates 2267 integration events. No insertions of wild type Tf1 elements were detected in either the outer centromere or the mating locus. For Tf1-Mac, 2200 G418 resistant colonies were screened, representing approximately 1650 integration events. In this case, eight insertions were identified in heterochromatin (Figure 4C and data not shown). These eight insertions were cloned and all eight arose by transposition, as they were flanked by target site duplications (data not shown). These results suggest that the addition of the MAGGY CHD to Tf1 integrase alters the Tf1 integration pattern.
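The conversion from screened colonies to estimated integration events can be illustrated with a short sketch. The function names are hypothetical. The 4:1 ratio of total transposition to recombination for Tf1-Mac is taken from the Figure 4 legend; ratios for other elements or strains would have to be read from Figure 4A and are not assumed here.

```python
def integration_frequency(intact_freq: float, frameshift_freq: float) -> float:
    """Integration frequency: G418-resistance frequency of the intact element
    minus that of its integrase frameshift mutant (recombination only)."""
    return intact_freq - frameshift_freq

def estimated_integration_events(colonies: int, total_to_recombination: float) -> float:
    """Scale screened colonies by the fraction that arose through integration.

    If total events : recombination events = r : 1, the fraction of
    colonies that arose by integration is (r - 1) / r.
    """
    r = total_to_recombination
    return colonies * (r - 1.0) / r

# Tf1-Mac in swi6-115: 2200 colonies screened, ratio 4:1 (Figure 4 legend)
tf1_mac_events = estimated_integration_events(2200, 4.0)
# Eight heterochromatic insertions among the estimated integration events
heterochromatin_fraction = 8 / tf1_mac_events
```

With these inputs the estimate is 1650 integration events, so the eight heterochromatic insertions correspond to roughly 1 in 200 integrations.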

Since the majority of the Tf1-Mac insertions did not integrate into the dg repeats or K-region, we used inverse PCR to determine the genomic integration sites of randomly selected Tf1-Mac transposition events (Figure 4D). Of the nine integration events recovered, all were near genes and most occurred within the preferred window for Tf1 integration, namely upstream of genes transcribed by RNA polymerase II. The observed pattern was indistinguishable from wild type Tf1 (Behrens et al. 2000; Singleton and Levin 2002). We concluded that native Tf1 targeting mechanisms determine the integration site for the majority of Tf1-Mac insertions, and that the addition of the MAGGY CHD directs approximately 1/200 integration events to the regions of heterochromatin surveyed in our screen. In terms of possible target sites, the intergenic regions in euchromatin approximate 7 Mb, whereas our PCR screen surveyed less than 0.1 Mb of heterochromatin. Therefore, in terms of availability alone, there are many more euchromatic integration sites for Tf1 relative to sites in heterochromatin. The efficiency differences of the two targeting mechanisms could also be due to the difference in affinity between targeting determinants and their cellular partners, which cannot be evaluated because the interacting partner of Tf1’s chromodomain remains unknown.

It is possible that addition of the MAGGY CHD impaired Tf1’s native targeting mechanism, resulting in the eight heterochromatic Tf1-Mac insertions, rather than directing targeting to heterochromatin through interaction between the MAGGY CHD and histone H3 methyl-K9. To control for this possibility, we first tested transposition of a Tf1-Mac variant with mutations in conserved residues that recognize H3 methyl-K9 (the RW to VV mutant). This element, however, was compromised for transposition (data not shown), and so as an alternative control, we tested transposition of Tf1-Mac in a clr4Δ strain. This strain lacks histone H3 methyl-K9 (Nakayama et al. 2001) and abrogates the subnuclear localization of the MAGGY EGFP-CHD (Supplementary Figure S3B). Transposition and recombination frequencies were comparable in the swi6-115 and clr4Δ strains (Figure 4A). A screen of 2000 G418 resistant colonies approximating 1566 integration events failed to recover any Tf1-Mac insertions in heterochromatin in the clr4Δ strain (Figure 4C). The Tf1-Mac targeting pattern in the swi6-115 strain is significantly different from that in the clr4Δ strain (P < 0.008). This control suggests that Tf1-Mac integrates into heterochromatin due to the interaction between the MAGGY CHD and histone H3 methyl-K9.
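The comparison between the two strains can be checked with a small calculation. The text does not name the statistical test behind P < 0.008; the sketch below assumes a two-sided Fisher's exact test on the estimated integration events (8 of 1650 in swi6-115 versus 0 of 1566 in clr4Δ), and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that x of the col1 events fall in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: swi6-115, clr4Δ; columns: heterochromatic, other insertions
p_value = fisher_exact_two_sided(8, 1650 - 8, 0, 1566)
```

Under these assumptions the value comes out just under 0.008, in line with the reported threshold, although the authors may have used a different test.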

We are aware that binding specificities differ for some CHDs depending on whether they are assessed in vitro or in vivo. For example, the Arabidopsis TFL2 CHD binds H3 di- and trimethyl-K9 as well as trimethyl-K27 in vitro, but primarily recognizes H3 trimethyl-K27 in vivo (Turck et al. 2007; Zhang et al. 2007). We recognize that our heterologous experimental system for measuring MAGGY CHD function (i.e. S. pombe vs. M. grisea) has limitations; however, both our in vitro biochemical and in vivo genetic experiments suggest that the MAGGY CHD interacts exclusively with H3 di- and trimethyl-K9. We would like to note that in the filamentous fungus Neurospora crassa, which shares a common ancestor with M. grisea (200 Myr ago), the HP1 homologue interacts with H3 trimethyl-K9 both in vivo and in vitro (Freitag et al. 2004).

Some HP1 isoforms in mammals and Drosophila melanogaster target proteins to euchromatin where they repress gene expression (Hiragami and Festenstein 2005). Such differences in targeting may be due to sequences outside of the chromodomain, and we cannot rule out the possibility that other sequences in integrase play a role in targeting. Moreover, different targeting patterns among chromoviruses could result from different cellular partners of their CHDs. Despite these possibilities, our experiments demonstrate that the chromovirus CHD itself can target integration, and this observation – coupled with our data indicating that chromoviruses are enriched in gene-poor regions, GFP-CHD fusion proteins localize to heterochromatin, and CHDs recognize histone modifications – supports the conclusion that in their native context, chromovirus CHDs perform a targeting function.

Heterochromatin and transposable elements

Targeted integration into domains of heterochromatin likely has significant consequences for both retrotransposons and their hosts. In plants, for example, retroelement activity is one of the most important factors contributing to genome expansion (SanMiguel et al. 1996), and rampant transposition would be predicted to expand domains of heterochromatin. Support for this comes from analysis of pericentromeric heterochromatin in A. thaliana, which has undergone about a 6-10 fold expansion within the last 5 Myr relative to related species (Hall et al. 2006). In contrast, the size of euchromatin in these species is largely unchanged.

Figure 5. Self-perpetuating model for heterochromatin expansion. Repetitive elements such as retrotransposons produce dsRNAs that trigger the RNAi pathway. This results in the targeting of DNA and histone methyltransferases to retrotransposons resident on host chromosomes. Histone modification (e.g. H3 methyl-K9) establishes heterochromatin and, in turn, creates recognition sites for retrotransposon integrases. The targeting of retrotransposons to the domain of heterochromatin serves to reinforce the epigenetic mark.

Selective targeting to heterochromatin might benefit the mobile element by allowing it to avoid negative selection arising from insertion into genes (Boeke and Devine 1998). Although the mobile element may be silenced, this would not necessarily compromise its ability to transpose, especially for those that insert in facultative heterochromatin. There are also many examples of genes that reside in constitutive heterochromatin (Grewal and Elgin 2007). Among these are the S. cerevisiae Ty5 retrotransposons, which are transcriptionally activated 20-fold by the pheromone response pathway despite their location in heterochromatin (Ke et al. 1997).

Whereas we provide evidence that retrotransposon CHDs target integration to heterochromatin by recognizing epigenetic marks, abundant evidence has accumulated over the past several years indicating that transposable element insertions themselves perpetuate the epigenetic marks that define heterochromatin (Slotkin and Martienssen 2007). The RNAi pathway keeps transposable element insertions quiescent and is triggered by double-stranded RNA that arises from read-through transcription of nested transposable elements (Figure 5). siRNAs are then generated by dicer through cleavage of dsRNA, which ultimately leads to the silencing of the mobile element by DNA and histone methylation. As suggested by our data, the epigenetically marked transposable elements now become targets of additional integration events, and this self-perpetuating mechanism creates a genomic safe haven for the mobile elements and reinforces the expansion and persistence of domains of heterochromatin. Although it is clear that forces such as recombination and selection impact localization of mobile elements in heterochromatin, we believe we are only beginning to appreciate the extent to which integration site choice contributes to the landscape of the eukaryotic genome.

METHODS

Strains and plasmids

YFP and CFP coding sequences from plasmids pSKY36 and pSKC36 (gift of S. Howell) (Banno et al. 2001) were amplified by primers DVO3302 and DVO3303 and cloned into the XbaI and SacII sites of pYH37 (Havecker et al. 2005) to create pYH48 and pYH49, respectively. (The DNA sequences of all primers used in this study are available upon request.) The SV40 large T antigen NLS (MAPKKKRKV) was fused to the N-terminus of YFP and CFP (Kalderon et al. 1984). TFL2 was PCR amplified using pAVA121NF as a template (gift of V. Gaudin) (Gaudin et al. 2001) and primers DVO3259 and DVO3260. The PCR fragment was cloned into the XmaI and SacI sites of pYH49 to create pYH94. The MAGGY, Tma, and Os CHDs were amplified by PCR from genomic DNA extracted from M. grisea (gift of R. Dean), A. thaliana, and Zea mays, respectively (primers DVO3439 and DVO3440, MAGGY; DVO3484 and DVO3486, Tma; DVO3489 and DVO3490, Os; DVO3327 and DVO3328, CRM). All of the above PCR products were cloned into the XmaI and SacI sites of pYH48 to create pXG147 (MAGGY CHD), pXG156 (Tma3 CHD), pXG158 (Os CHD), and pYH67 (CRM CR motif).

To create His6-tagged CHDs for the pull-down experiments, the coding sequences of the CHDs were PCR amplified (using primers DVO3567 and DVO3440, MAGGY; DVO3520 and DVO3486, Tma; DVO3522 and DVO3488, Os) and cloned into the NdeI and SacI sites of pET28b+ (Novagen). GST fusion constructs were generated by cloning PCR products (generated with primers DVO3622 and DVO3440, MAGGY; DVO3623 and DVO3486, Tma; DVO3624 and DVO3490, Os) into the NcoI and SacI sites of pET42 (Novagen). Mutant chromodomains were generated by site-specific PCR mutagenesis (Ausubel et al. 1987) (primers DVO3557 and DVO3558 for the MAGGY R28W29 to VV mutation (amino acid positions are according to sequence in Supplementary Figure S1; spaces are not counted); primers DVO3559 and DVO3560 for the Tma3 W31 to V mutation; primers DVO3561 and DVO3562 for the K28W29 to VV mutation). The BL21-codon plus E. coli strain (Invitrogen) was used for protein expression.

The MAGGY CHD was fused to the C-terminus of Tf1 integrase by two-step PCR-based gene construction (Dillon and Rosen 1990) using primers DVO3378 – DVO3582. The PCR product encoding the gene fusion was used to replace the NarI and BsrGI restriction fragment of the Tf1 element in pHL414-2 (Levin et al. 1993), thereby creating Tf1-Mac (pXG175). The gene fusion was also cloned into the Tf1 frameshift mutant in pHL431 (Hoff et al. 1998) to create Tf1-Mac-fs (pXG194). The swi6-115 mutant (YB429: h90 ade6-M216 leu1-32 ura4-DS/E swi6-115) was used to test Tf1-Mac target specificity (gift of J. Nakayama) (Nakayama et al. 2001). yXG150 was derived from SPM1049 (h90 ade6-M210 leu1-32 ura4-D18 clr4::KanMX6) (gift of J. Nakayama) (Nakayama et al. 2001) to detect Tf1-Mac transposition in the clr4Δ strain. Because Tf1 carries a marker gene that confers kanamycin resistance after transposition, the KanMX6 construct was deleted in yXG150. This was accomplished by first carrying out a two-step PCR amplification to create a LEU2 gene with 200 bp of flanking KanMX6 sequence. Primers used for the amplification were DVO4040 – DVO4044. The PCR product was then used for targeted gene replacement.

Constructs to monitor subnuclear localization in S. pombe were generated by PCR-amplifying chromodomains of MAGGY, Tma, and Os (DVO3718 and DVO3721, MAGGY; DVO3719 and DVO3521, Tma; DVO3720 and DVO3523, Os). The 5’ primer for each amplification was designed to encode the SV40 large T antigen NLS (MAPKKKRKV) (Kalderon et al. 1984). PCR products were cloned into the NdeI and BamHI sites of pYB232 to generate fusions with EGFP under the control of the nmt promoter (gift of J. Nakayama) (Sadaie et al. 2004). As a control for the subnuclear localization experiments, mRFP was PCR-amplified (DVO4076 and DVO4077) using pFA6a-mRFP-hph as a template (gift of T. Toda) (Asakawa et al. 2005). The PCR product was cloned into the NdeI site of pREP42 (Maundrell 1993) to generate pXG204. The Chp2 coding sequence was PCR-amplified (DVO4156 and DVO4157) using pYB232 as a template. The product was digested with XhoI and BamHI and cloned into the SalI and BamHI sites of pXG204. To monitor the subnuclear localization of chromodomains in S. pombe, the strain yXG126 was derived from spYB399 (pYB232/h90 ade6-M216 leu1-32 ura4-DS/E) (Sadaie et al. 2004) by losing plasmid pYB232.

DNA sequence analysis

DNA sequences of chromoviruses and chromodomains from cellular genes were retrieved from NCBI. The chromovirus integrase C-termini were used to query rice (Sasaki et al. 2002) and A. thaliana (Initiative 2000) genome sequences by tblastn to identify sites of chromovirus insertions. Retrotransposon distributions were mapped on chromosomes using a BioPerl script. The sequence alignment of chromodomains was generated by ClustalX (Thompson et al. 1997) with manual adjustments. Secondary structure predictions of retrotransposon chromodomains were carried out using Jpred software (Cuff et al. 1998).

To assess the genomic distribution of MAGGY insertions, the amino acid sequence of MAGGY integrase was used to query the M. grisea genome (NCBI, 2003 version) (Dean et al. 2005) by tblastn (Altschul et al. 1997). Sequence hits with E-values lower than 1E-4 and over 70% identity and 90% coverage were collected (75 elements). Using the CDS annotations from GenBank, the numbers of genes or transposable elements were then counted in 10 kb windows both upstream and downstream of the insertion site. A similar analysis was conducted with 75 random sites selected by a random number generator. For the O. sativa genome, the locations of chromodomain-encoding retrotransposons and Tos17 (AAP53905) were mapped against the protein database annotated by TIGR (Yuan et al. 2005). For these analyses, the amino acid sequences of the rn_377-208 (Os) and Tos17 integrase C-termini were used to query the rice genome sequence by blastp (Altschul et al. 1997). Insertions were identified by the following cutoff criteria: 1E-4, 30% identity and 90% alignment coverage for rn_377-208 (917 elements); 1E-4, 40% identity and 90% alignment coverage for Tos17 (106 elements). Annotated coding regions were evaluated to determine the number of genes adjacent to retrotransposon insertions. Annotations were manually checked to ensure that each annotation corresponded to a unique insertion. It is likely that many degenerate insertions and solo LTRs are not included in the available annotations, and so the actual number of transposable elements adjacent to a given insertion is likely an underestimate; however, this underestimate only strengthens the conclusion that chromodomain-encoding retrotransposons are associated with other mobile elements. Analyses were also performed in which the chromovirus group under question was excluded from the dataset to control for self-association. For rice, when other group II elements were eliminated from the 40 kb intervals, the number of transposable elements per interval dropped to 2.6 from 2.9, which is still significantly different (P < 0.00001) from the number of transposable elements associated with random sites or Tos17 elements. For MAGGY and Tos17, no other MAGGY or Tos17-like elements were present in the intervals analyzed. The significance of the association of retrotransposons with genes or transposable elements was evaluated by the t-test.
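
The hit filtering, window counting, and comparison of means described above can be sketched in Python. This is an illustrative sketch only, not the script used in this study: the `Hit` record, the function names, the inclusive handling of the identity and coverage thresholds, and the choice of Welch's form of the t statistic are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class Hit:
    """One BLAST hit, reduced to the fields needed for filtering (hypothetical record)."""
    evalue: float
    pct_identity: float
    pct_coverage: float
    chrom: str
    position: int  # insertion coordinate on the chromosome


def passes_cutoffs(hit, max_evalue=1e-4, min_identity=70.0, min_coverage=90.0):
    # Defaults mirror the MAGGY criteria in the text. Treating the identity
    # and coverage thresholds as inclusive is an assumption.
    return (hit.evalue < max_evalue
            and hit.pct_identity >= min_identity
            and hit.pct_coverage >= min_coverage)


def count_features_near(chrom, position, features, flank=10_000):
    """Count annotated features overlapping [position - flank, position + flank].

    `features` is an iterable of (chrom, start, end) tuples.
    """
    lo, hi = position - flank, position + flank
    return sum(1 for c, start, end in features
               if c == chrom and start <= hi and end >= lo)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples of per-site feature counts."""
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se
```

In use, the counts obtained for retrotransposon insertion sites and for the randomly chosen sites would be passed to `welch_t` as the two samples; converting the statistic to a P value requires the t distribution, which is left to a statistics package.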

The LTR sequences of the Tos17 and rice group II chromoviruses were collected as described (Peterson-Burch et al. 2004). Comparisons of the 5’ and 3’ LTR sequences were performed to determine the number of substitutions per neutral site (Jukes and Cantor 1969). This provided an estimate of the age of the elements, since the two LTRs are typically identical at the time of integration.

In vivo protein localization

Established protocols were used for transformation and heterologous protein expression in S. pombe and A. thaliana suspension cells (Havecker et al. 2005; Moreno et al. 1991). Images were collected using a Leica upright confocal microscope with a 60X water objective lens and 4X zoom lens at the Image Analysis Facility, College of Veterinary Medicine, Iowa State University, or with a Nikon Eclipse E800 microscope at the CBS Imaging Center, University of Minnesota.

Pull-down assays

His6- and GST-CHD fusion proteins were purified from E. coli extracts with Ni-agarose beads and glutathione-sepharose as described by the manufacturer (Sigma). For pull-down assays with the calf thymus histones (Worthington), GST-tagged chromodomains were incubated with the histone extract (5 µg) in BTP buffer (25 mM BTP, pH 6.8, 1M KCl, 0.5% Triton X-100) for two hours and washed six times with binding buffer (Huyen et al. 2004). Samples were separated by SDS-PAGE and gels stained with SYPRO Ruby Protein Gel Stain (Invitrogen). To identify interactions with specific H3 modifications, His6-tagged fusion proteins were incubated with biotinylated H3 peptides and streptavidin agarose in binding buffer (150 mM NaCl, 50 mM Hepes, pH 7.5, 0.1% Tween and 10% glycerol) for two hours and washed with binding buffer four times. Bound proteins were eluted with SDS loading buffer at 37°C for 30 min, followed by immunoblotting using an anti-His6 antibody (Qiagen). Similarly, GST-tagged fusion chromodomain proteins were incubated with biotinylated H3 peptides with di- and tri-methyl H3 K9, pulled down by streptavidin agarose beads and detected by immunoblotting using anti-GST antibodies.

Transposition assays and target site analyses

Quantitative Tf1 transposition assays were performed as previously described (Levin 1995) except that cells were scraped from EMM 5-FOA plates, diluted, and plated to YES medium with G418 and 5-FOA to select transposition events. To determine the total number of cells, dilutions were plated on YES medium without G418. The ratio of the number of G418- and 5-FOA-resistant cells to the total cell number was taken as the transposition frequency.
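
The frequency calculation amounts to correcting each colony count for its dilution and plated volume and then taking the ratio. A minimal Python sketch follows; the function names, the default plated volume, and all colony counts in the usage note are hypothetical and are not data from this study.

```python
def cells_per_ml(colonies, dilution_factor, volume_plated_ml):
    """Back-calculate the cell concentration of the undiluted suspension."""
    return colonies * dilution_factor / volume_plated_ml


def transposition_frequency(resistant_colonies, resistant_dilution,
                            total_colonies, total_dilution,
                            volume_plated_ml=0.1):
    """Ratio of G418- and 5-FOA-resistant cells to total cells.

    The same plated volume is assumed for both media (an assumption).
    """
    resistant = cells_per_ml(resistant_colonies, resistant_dilution,
                             volume_plated_ml)
    total = cells_per_ml(total_colonies, total_dilution, volume_plated_ml)
    return resistant / total
```

For example, 50 resistant colonies at a 10-fold dilution and 200 colonies on non-selective medium at a 100,000-fold dilution would correspond to a frequency of 2.5 x 10^-5.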

To identify insertions in S. pombe heterochromatin, DNA was purified (Ausubel et al. 1987) from pools of 20 colonies growing on YES + G418 + 5-FOA plates. The DNA was used as a template in PCR reactions with a Tf1 primer that was not complementary to S. pombe Tf2 elements (DVO3639). Eight primers were used to survey Tf1 insertions in the outer pericentromeric region and mating loci (DVO3790 – DVO3793 and DVO3797 – DVO3800), each of which was paired with DVO3936 in separate PCR reactions. DVO3791 and DVO3792 were used to amplify dh repeats as a control for DNA quality and PCR conditions. The PCR products were examined on 0.8% agarose gels and sequenced with the Tf1 primer DVO3731, which is contained in DVO3639. The resulting 5’ flanking sequences were searched against the NCBI non-redundant database using blastn (Altschul et al. 1997) to identify Tf1 insertion sites. DNA sequences flanking the 3’ end of Tf1 insertions were determined by PCR using a primer that recognizes a downstream genomic sequence and the Tf1 primer DVO3743. The PCR products were then sequenced, and the 5’ and 3’ flanking sequences made it possible to detect target site duplications. The Fisher exact test was performed to assess the significance of the frequency of insertions in heterochromatin.
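
For readers unfamiliar with the Fisher exact test, the calculation on a 2 x 2 table can be sketched in Python using only the standard library. The one-sided form shown here and the table layout (rows for the two elements being compared, columns for pools with and without a heterochromatic insertion) are assumptions made for illustration; the counts in any call are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null with fixed
    margins, of a top-left count at least as large as `a`.
    """
    row1 = a + b          # e.g. pools tested for the first element
    col1 = a + c          # e.g. pools with a heterochromatic insertion
    n = a + b + c + d
    k_max = min(row1, col1)
    favourable = sum(comb(col1, k) * comb(n - col1, row1 - k)
                     for k in range(a, k_max + 1))
    return favourable / comb(n, row1)
```

A small P value indicates that heterochromatic insertions are over-represented in the first row relative to the second.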

To detect the genome-wide insertion sites of Tf1 and Tf1-Mac, inverse PCR reactions were carried out with DNA prepared from randomly selected colonies growing on YES plus G418 plates (Ochman et al. 1988). Primers used for the inverse PCR were DVO3639 and DVO3743. PCR products were sequenced and insertion sites determined as described above. Recombination events with endogenous Tf2 sequences or solo LTRs were not mapped. For each integration event, the distance to the nearest neighboring genes was determined using the S. pombe genome annotation (Wood et al. 2002).
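
The distance calculation can be sketched as follows. This is a minimal illustration under stated assumptions: genes are represented as hypothetical (start, end) coordinate pairs on a single chromosome, an insertion inside a gene is assigned a distance of zero, and strand is ignored.

```python
def distance_to_nearest_gene(position, genes):
    """Distance in bp from an insertion site to the closest annotated gene.

    `genes` is an iterable of (start, end) pairs with start <= end.
    Returns 0 if the site lies within a gene and None if `genes` is empty.
    """
    best = None
    for start, end in genes:
        if start <= position <= end:
            return 0
        # The site lies entirely to one side of this gene.
        d = start - position if position < start else position - end
        if best is None or d < best:
            best = d
    return best
```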

ACKNOWLEDGEMENTS

We thank J. Nakayama for providing several S. pombe strains, M. Bedford for help in testing interactions between CHDs and modified histones and M. Sanders for assistance with the microscopy. This work was supported by National Institutes of Health Grant GM061657 (D.V.) and the Intramural Research Program of the NIH from the National Institute of Child Health and Human Development (H.L.).

REFERENCES

Altschul, S.F., T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.

Asakawa, K., M. Toya, M. Sato, M. Kanai, K. Kume, T. Goshima, M.A. Garcia, D. Hirata, and T. Toda. 2005. Mal3, the fission yeast EB1 homologue, cooperates with Bub1 spindle checkpoint to prevent monopolar attachment. EMBO Rep 6: 1194-1200.

Ausubel, F.M., R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, and K. Struhl. 1987. Current Protocols in Molecular Biology. Greene/Wiley Interscience, New York.

Bachman, N., M.E. Gelbart, T. Tsukiyama, and J.D. Boeke. 2005. TFIIIB subunit Bdp1p is required for periodic integration of the Ty1 retrotransposon and targeting of Isw2p to S. cerevisiae tDNAs. Genes Dev 19: 955-964.

Bannister, A.J., P. Zegerman, J.F. Partridge, E.A. Miska, J.O. Thomas, R.C. Allshire, and T. Kouzarides. 2001. Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 410: 120-124.

Banno, H., Y. Ikeda, Q.W. Niu, and N.H. Chua. 2001. Overexpression of Arabidopsis ESR1 induces initiation of shoot regeneration. Plant Cell 13: 2609-2618.

Behrens, R., J. Hayles, and P. Nurse. 2000. Fission yeast retrotransposon Tf1 integration is targeted to 5' ends of open reading frames. Nucleic Acids Res 28: 4709-4716.

Boeke, J.D. and S.E. Devine. 1998. Yeast retrotransposons: finding a nice quiet neighborhood. Cell 93: 1087-1089.

Bosher, J.M. and M. Labouesse. 2000. RNA interference: genetic wand and genetic watchdog. Nat Cell Biol 2: E31-36.

Bowen, N.J., I.K. Jordan, J.A. Epstein, V. Wood, and H.L. Levin. 2003. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res 13: 1984-1997.

Brehm, A., K.R. Tufteland, R. Aasland, and P.B. Becker. 2004. The many colours of chromodomains. Bioessays 26: 133-140.

Bushman, F.D. 2003. Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115: 135-138.

Cheng, Z., F. Dong, T. Langdon, S. Ouyang, C.R. Buell, M. Gu, F.R. Blattner, and J. Jiang. 2002. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14: 1691-1704.

Ciuffi, A., M. Llano, E. Poeschla, C. Hoffmann, J. Leipzig, P. Shinn, J.R. Ecker, and F. Bushman. 2005. A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med 11: 1287-1289.

Cuff, J.A., M.E. Clamp, A.S. Siddiqui, M. Finlay, and G.J. Barton. 1998. JPred: a consensus secondary structure prediction server. Bioinformatics 14: 892-893.

Dean, R.A., N.J. Talbot, D.J. Ebbole, M.L. Farman, T.K. Mitchell, M.J. Orbach, M. Thon, R. Kulkarni, J.R. Xu, H. Pan, et al. 2005. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 434: 980-986.

Derse, D., B. Crise, Y. Li, G. Princler, N. Lum, C. Stewart, C.F. McGrath, S.H. Hughes, D.J. Munroe, and X. Wu. 2007. Human T-cell leukemia virus type 1 integration target sites in the human genome: comparison with those of other retroviruses. J Virol 81: 6731-6741.

Dillon, P.J. and C.A. Rosen. 1990. A rapid method for the construction of synthetic genes using the polymerase chain reaction. Biotechniques 9: 298, 300.

Dong, F., J.T. Miller, S.A. Jackson, G.L. Wang, P.C. Ronald, and J. Jiang. 1998. Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc Natl Acad Sci U S A 95: 8135-8140.

Fischle, W., Y. Wang, S.A. Jacobs, Y. Kim, C.D. Allis, and S. Khorasanizadeh. 2003. Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by Polycomb and HP1 chromodomains. Genes Dev 17: 1870-1881.

Flanagan, J.F., L.Z. Mi, M. Chruszcz, M. Cymborowski, K.L. Clines, Y. Kim, W. Minor, F. Rastinejad, and S. Khorasanizadeh. 2005. Double chromodomains cooperate to recognize the methylated histone H3 tail. Nature 438: 1181-1185.

Fransz, P., J.H. De Jong, M. Lysak, M.R. Castiglione, and I. Schubert. 2002. Interphase chromosomes in Arabidopsis are organized as well defined chromocenters from which euchromatin loops emanate. Proc Natl Acad Sci U S A 99: 14584-14589.

Freitag, M., P.C. Hickey, T.K. Khlafallah, N.D. Read, and E.U. Selker. 2004. HP1 is essential for DNA methylation in Neurospora. Mol Cell 13: 427-434.

Gai, X. and D.F. Voytas. 1998. A single amino acid change in the yeast retrotransposon Ty5 abolishes targeting to silent chromatin. Mol Cell 1: 1051-1055.

Gaudin, V., M. Libault, S. Pouteau, T. Juul, G. Zhao, D. Lefebvre, and O. Grandjean. 2001. Mutations in LIKE HETEROCHROMATIN PROTEIN 1 affect flowering time and plant architecture in Arabidopsis. Development 128: 4847-4858.

Gendrel, A.V., Z. Lippman, C. Yordan, V. Colot, and R.A. Martienssen. 2002. Dependence of heterochromatic histone H3 methylation patterns on the Arabidopsis gene DDM1. Science 297: 1871-1873.

Girard, A., R. Sachidanandam, G.J. Hannon, and M.A. Carmell. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442: 199-202.

Gorinsek, B., F. Gubensek, and D. Kordis. 2004. Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol 21: 781-798.

Gorinsek, B., F. Gubensek, and D. Kordis. 2005. Phylogenomic analysis of chromoviruses. Cytogenet Genome Res 110: 543-552.

Grewal, S.I. and S.C. Elgin. 2007. Transcription and RNA interference in the formation of heterochromatin. Nature 447: 399-406.

Grewal, S.I. and S. Jia. 2007. Heterochromatin revisited. Nat Rev Genet 8: 35-46.

Grewal, S.I. and J.C. Rice. 2004. Regulation of heterochromatin by histone methylation and small RNAs. Curr Opin Cell Biol 16: 230-238.

Grivna, S.T., E. Beyret, Z. Wang, and H. Lin. 2006. A novel class of small RNAs in mouse spermatogenic cells. Genes Dev 20: 1709-1714.

Hall, A.E., G.C. Kettler, and D. Preuss. 2006. Dynamic evolution at pericentromeres. Genome Res 16: 355-364.

Havecker, E.R., X. Gao, and D.F. Voytas. 2005. The Sireviruses, a plant-specific lineage of the Ty1/copia retrotransposons, interact with a family of proteins related to dynein light chain 8. Plant Physiol 139: 857-868.

Hiragami, K. and R. Festenstein. 2005. Heterochromatin protein 1: a pervasive controlling influence. Cell Mol Life Sci 62: 2711-2726.

Hizi, A. and H.L. Levin. 2005. The integrase of the long terminal repeat-retrotransposon tf1 has a chromodomain that modulates integrase activities. J Biol Chem 280: 39086-39094.

Hoff, E.F., H.L. Levin, and J.D. Boeke. 1998. Schizosaccharomyces pombe retrotransposon Tf2 mobilizes primarily through homologous cDNA recombination. Mol Cell Biol 18: 6839-6852.

Houben, A., D. Demidov, D. Gernand, A. Meister, C.R. Leach, and I. Schubert. 2003. Methylation of histone H3 in euchromatin of plant chromosomes depends on basic nuclear DNA content. Plant J 33: 967-973.

Huyen, Y., O. Zgheib, R.A. Ditullio, Jr., V.G. Gorgoulis, P. Zacharatos, T.J. Petty, E.A. Sheston, H.S. Mellert, E.S. Stavridi, and T.D. Halazonetis. 2004. Methylated lysine 79 of histone H3 targets 53BP1 to DNA double-strand breaks. Nature 432: 406-411.

Initiative, T.A.G. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.

Jacobs, S.A. and S. Khorasanizadeh. 2002. Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail. Science 295: 2080-2083.

Jacobs, S.A., S.D. Taverna, Y. Zhang, S.D. Briggs, J. Li, J.C. Eissenberg, C.D. Allis, and S. Khorasanizadeh. 2001. Specificity of the HP1 chromo domain for the methylated N-terminus of histone H3. EMBO J 20: 5232-5241.

Jiang, J., J.A. Birchler, W.A. Parrott, and R.K. Dawe. 2003. A molecular view of plant centromeres. Trends Plant Sci 8: 570-575.

Jiang, J., S. Nasuda, F. Dong, C.W. Scherrer, S.S. Woo, R.A. Wing, B.S. Gill, and D.C. Ward. 1996. A conserved repetitive DNA element located in the centromeres of cereal chromosomes. Proc Natl Acad Sci U S A 93: 14210-14213.

Jukes, T.H. and C. Cantor. 1969. Evolution of protein molecules. In Mammalian protein metabolism, pp. 21-132. Academic Press, New York.

Kalderon, D., B.L. Roberts, W.D. Richardson, and A.E. Smith. 1984. A short amino acid sequence able to specify nuclear location. Cell 39: 499-509.

Ke, N., P.A. Irwin, and D.F. Voytas. 1997. The pheromone response pathway activates transcription of Ty5 retrotransposons located within silent chromatin of Saccharomyces cerevisiae. EMBO J 16: 6272-6280.

Koonin, E.V., S. Zhou, and J.C. Lucchesi. 1995. The chromo superfamily: new members, duplication of the chromo domain and possible role in delivering transcription regulators to chromatin. Nucleic Acids Res 23: 4229-4233.

Lachner, M., D. O'Carroll, S. Rea, K. Mechtler, and T. Jenuwein. 2001. Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins. Nature 410: 116-120.

Le, Q.H., D. Melayah, E. Bonnivard, M. Petit, and M.A. Grandbastien. 2007. Distribution dynamics of the Tnt1 retrotransposon in tobacco. Mol Genet Genomics 278: 639-651.

Levin, H.L. 1995. A novel mechanism of self-primed reverse transcription defines a new family of retroelements. Mol Cell Biol 15: 3310-3317.

Levin, H.L., D.C. Weaver, and J.D. Boeke. 1993. Novel gene expression mechanism in a fission yeast retroelement: Tf1 proteins are derived from a single primary translation product. EMBO J 12: 4885-4895.

Lewinski, M.K., M. Yamashita, M. Emerman, A. Ciuffi, H. Marshall, G. Crawford, F. Collins, P. Shinn, J. Leipzig, S. Hannenhalli, C.C. Berry, J.R. Ecker, and F.D. Bushman. 2006. Retroviral DNA integration: viral and cellular determinants of target-site selection. PLoS Pathog 2: e60.

Lippman, Z. and R. Martienssen. 2004. The role of RNA interference in heterochromatic silencing. Nature 431: 364-370.

Llano, M., D.T. Saenz, A. Meehan, P. Wongthida, M. Peretz, W.H. Walker, W. Teo, and E.M. Poeschla. 2006. An essential role for LEDGF/p75 in HIV integration. Science 314: 461-464.

Malik, H.S. and T.H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73: 5186-5190.

Marin, I. and C. Llorens. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol Biol Evol 17: 1040-1049.

Martienssen, R., Z. Lippman, B. May, M. Ronemus, and M. Vaughn. 2004. Transposons, tandem repeats, and the silencing of imprinted genes. Cold Spring Harb Symp Quant Biol 69: 371-379.

Maundrell, K. 1993. Thiamine-repressible expression vectors pREP and pRIP for fission yeast. Gene 123: 127-130.

Miller, J.T., F. Dong, S.A. Jackson, J. Song, and J. Jiang. 1998. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150: 1615-1623.

Min, J., Y. Zhang, and R.M. Xu. 2003. Structural basis for specific binding of Polycomb chromodomain to histone H3 methylated at Lys 27. Genes Dev 17: 1823-1828.

Miyao, A., K. Tanaka, K. Murata, H. Sawaki, S. Takeda, K. Abe, Y. Shinozuka, K. Onosato, and H. Hirochika. 2003. Target site specificity of the Tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15: 1771-1780.

Moreno, S., A. Klar, and P. Nurse. 1991. Molecular genetic analysis of fission yeast Schizosaccharomyces pombe. Methods Enzymol 194: 795-823.

Mou, Z., A.E. Kenny, and M.J. Curcio. 2006. Hos2 and Set3 promote integration of Ty1 retrotransposons at tRNA genes in Saccharomyces cerevisiae. Genetics 172: 2157-2167.

Nagaki, K., Z. Cheng, S. Ouyang, P.B. Talbert, M. Kim, K.M. Jones, S. Henikoff, C.R. Buell, and J. Jiang. 2004. Sequencing of a rice centromere uncovers active genes. Nat Genet 36: 138-145.

Nakagawa, H., J.K. Lee, J. Hurwitz, R.C. Allshire, J. Nakayama, S.I. Grewal, K. Tanaka, and Y. Murakami. 2002. Fission yeast CENP-B homologs nucleate centromeric heterochromatin by promoting heterochromatin-specific histone tail modifications. Genes Dev 16: 1766-1778.

Nakayama, J., J.C. Rice, B.D. Strahl, C.D. Allis, and S.I. Grewal. 2001. Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science 292: 110-113.

Nakayashiki, H., K. Kiyotomi, Y. Tosa, and S. Mayama. 1999. Transposition of the retrotransposon MAGGY in heterologous species of filamentous fungi. Genetics 153: 693-703.

Nielsen, P.R., D. Nietlispach, H.R. Mott, J. Callaghan, A. Bannister, T. Kouzarides, A.G. Murzin, N.V. Murzina, and E.D. Laue. 2002. Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9. Nature 416: 103-107.

Noma, K., T. Sugiyama, H. Cam, A. Verdel, M. Zofall, S. Jia, D. Moazed, and S.I. Grewal. 2004. RITS acts in cis to promote RNA interference-mediated transcriptional and post-transcriptional silencing. Nat Genet 36: 1174-1180.

Ochman, H., A.S. Gerber, and D.L. Hartl. 1988. Genetic applications of an inverse polymerase chain reaction. Genetics 120: 621-623.

Pereira, V. 2004. Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol 5: R79.

Peters, A.H., D. O'Carroll, H. Scherthan, K. Mechtler, S. Sauer, C. Schofer, K. Weipoltshammer, M. Pagani, M. Lachner, A. Kohlmaier, S. Opravil, M. Doyle, M. Sibilia, and T. Jenuwein. 2001. Loss of the Suv39h histone methyltransferases impairs mammalian heterochromatin and genome stability. Cell 107: 323-337.

Peterson-Burch, B.D., D. Nettleton, and D.F. Voytas. 2004. Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol 5: R78.

Sadaie, M., T. Iida, T. Urano, and J. Nakayama. 2004. A chromodomain protein, Chp1, is required for the establishment of heterochromatin in fission yeast. EMBO J 23: 3825-3835.

SanMiguel, P., A. Tikhonov, Y.K. Jin, N. Motchoulskaia, D. Zakharov, A. Melake-Berhan, P.S. Springer, K.J. Edwards, M. Lee, Z. Avramova, and J.L. Bennetzen. 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765-768.

Sasaki, T., T. Matsumoto, K. Yamamoto, K. Sakata, T. Baba, Y. Katayose, J. Wu, Y. Niimura, Z. Cheng, Y. Nagamura, et al. 2002. The genome sequence and structure of rice chromosome 1. Nature 420: 312-316.

Schroder, A., P. Shinn, H. Chen, C. Berry, J. Ecker, and F. Bushman. 2002. HIV-1 Integration in the Human Genome Favors Active Genes and Local Hotspots. Cell 110: 521.

Shun, M.C., N.K. Raghavendra, N. Vandegraaff, J.E. Daigle, S. Hughes, P. Kellam, P. Cherepanov, and A. Engelman. 2007. LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev 21: 1767-1778.

Sijen, T. and R.H. Plasterk. 2003. Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426: 310-314.

Singleton, T.L. and H.L. Levin. 2002. A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot Cell 1: 44-55.

Slotkin, R.K. and R. Martienssen. 2007. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet 8: 272-285.

Tamaru, H. and E.U. Selker. 2001. A histone H3 methyltransferase controls DNA methylation in Neurospora crassa. Nature 414: 277-283.

Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882.

Thon, M.R., S.L. Martin, S. Goff, R.A. Wing, and R.A. Dean. 2004. BAC end sequences and a physical map reveal transposable element content and clustering patterns in the genome of Magnaporthe grisea. Fungal Genet Biol 41: 657-666.

Turck, F., F. Roudier, S. Farrona, M.L. Martin-Magniette, E. Guillaume, N. Buisine, S. Gagnot, R.A. Martienssen, G. Coupland, and V. Colot. 2007. Arabidopsis TFL2/LHP1 specifically associates with genes marked by trimethylation of histone H3 lysine 27. PLoS Genet 3: e86.

Vagin, V.V., A. Sigova, C. Li, H. Seitz, V. Gvozdev, and P.D. Zamore. 2006. A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313: 320-324.

Vastenhouw, N.L. and R.H. Plasterk. 2004. RNAi protects the Caenorhabditis elegans germline against transposition. Trends Genet 20: 314-319.

Wang, G.P., A. Ciuffi, J. Leipzig, C.C. Berry, and F.D. Bushman. 2007. HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications. Genome Res 17: 1186-1194.

Wood, V., R. Gwilliam, M.A. Rajandream, M. Lyne, R. Lyne, A. Stewart, J. Sgouros, N. Peat, J. Hayles, S. Baker, et al. 2002. The genome sequence of Schizosaccharomyces pombe. Nature 415: 871-880.

Wu, X., Y. Li, B. Crise, and S.M. Burgess. 2003. Transcription start regions in the human genome are favored targets for MLV integration. Science 300: 1749-1751.

Xie, W., X. Gai, Y. Zhu, D.C. Zappulla, R. Sternglanz, and D.F. Voytas. 2001. Targeting of the yeast Ty5 retrotransposon to silent chromatin is mediated by interactions between integrase and Sir4p. Mol Cell Biol 21: 6606-6614.

Yieh, L., H. Hatzis, G. Kassavetis, and S.B. Sandmeyer. 2002. Mutational analysis of the transcription factor IIIB-DNA target of Ty3 retroelement integration. J Biol Chem 277: 25920-25928.

Yieh, L., G. Kassavetis, E.P. Geiduschek, and S.B. Sandmeyer. 2000. The Brf and TATA-binding protein subunits of the RNA polymerase III transcription factor IIIB mediate position-specific integration of the gypsy-like element, Ty3. J Biol Chem 275: 29800-29807.

Yuan, Q., S. Ouyang, A. Wang, W. Zhu, R. Maiti, H. Lin, J. Hamilton, B. Haas, R. Sultana, F. Cheung, J. Wortman, and C.R. Buell. 2005. The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol 138: 18-26.

Zemach, A., Y. Li, H. Ben-Meir, M. Oliva, A. Mosquna, V. Kiss, Y. Avivi, N. Ohad, and G. Grafi. 2006. Different domains control the localization and mobility of LIKE HETEROCHROMATIN PROTEIN1 in Arabidopsis nuclei. Plant Cell 18: 133-145.

Zhang, X., S. Germann, B.J. Blus, S. Khorasanizadeh, V. Gaudin, and S.E. Jacobsen. 2007. The Arabidopsis LHP1 protein colocalizes with histone H3 Lys27 trimethylation. Nat Struct Mol Biol 14: 869-871.

Zou, S., N. Ke, J.M. Kim, and D.F. Voytas. 1996. The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev 10: 634-645.

CHAPTER 3. CENTROMERIC RETROTRANSPOSONS: POTENTIAL ROLE OF THE INTEGRASE C-TERMINUS IN DETERMINING GENOMIC DISTRIBUTION

This is a manuscript to be submitted to BMC Plant Biology

Yi Hou and Daniel F. Voytas

ABSTRACT

The plant CR retrotransposons are a highly conserved family of Ty3/gypsy elements found almost exclusively in centromeric heterochromatin. This biased genomic distribution may be caused by targeted integration of CR retrotransposons into centromeric regions or by selection against insertions into other chromosomal sites. The C-terminus of CR integrases from both monocots and dicots encodes a short amino acid sequence motif (the CR motif) that we hypothesize is involved in targeted integration at centromeres. When the CR motif is fused to YFP and expressed in vivo, specific sub-nuclear foci are observed that are coincident with DAPI-rich chromocenters in Arabidopsis protoplasts. These YFP foci also overlap with CENP-C, a major component of the kinetochore, which is associated with centromeric regions throughout the cell cycle. This sub-nuclear localization pattern was also observed in tobacco and maize protoplasts but not in fission yeast or rat cells. Deletion analysis determined that as few as 11 contiguous amino acids in the CR motif can mediate centromeric localization. Recruitment of the CR motif to chromocenters was decreased in Arabidopsis ddm1 and met1 mutant cells that have significantly reduced heterochromatin. Our data suggest that the CR motif interacts with specific components of plant centromeric heterochromatin. We hypothesize that this interaction underlies the novel distribution pattern of the CR retrotransposon insertions in plant genomes.


INTRODUCTION

The long terminal repeat (LTR) retrotransposons are a specific type of transposable element which, like their cousins the retroviruses, replicate through reverse transcription of their mRNA and integration of their cDNA into a new locus in their host’s genome. Genome sequencing has revealed that LTR retrotransposons constitute a large fraction of eukaryotic genomes, from 3% of the Saccharomyces cerevisiae genome to over 50% of some plant genomes, such as that of maize (Bennetzen, 1996; Kim et al., 1998). Sequence analysis and fluorescence in situ hybridization (FISH) show that most retrotransposons are not randomly distributed in the genome (Peterson-Burch et al., 2004; Mroczek and Dawe, 2003). To avoid the negative consequences of integration, some retrotransposons accumulate in ‘safe havens’, principally regions of low gene density such as heterochromatin (Craig 1997; Craigie 1992; Sandmeyer 1990). In turn, enrichment of retrotransposons in certain regions of the genome can contribute to or reinforce chromatin structure (Slotkin and Martienssen, 2007).

It has been suggested that many LTR retrotransposons identify certain chromosomal regions during integration by recognizing specific chromatin states or DNA-bound protein complexes (Bushman, 2003). For example, in the Saccharomyces cerevisiae genome, Ty1 and Ty3 elements integrate specifically near RNA polymerase III transcription initiation sites by recognizing the pol III transcription complex or chromatin features associated with pol III transcription (Bachman et al., 2005; Mou et al., 2006; Yieh et al., 2002; Yieh et al., 2000). In a plasmid-based assay, Tf1, an LTR retrotransposon in Schizosaccharomyces pombe, was shown to integrate into the fbp1 promoter region due to the interaction between integrase and the transcriptional activator Atf1 (Leem et al., 2008). The most compelling support for the retrotransposon targeting model comes from studies of the Ty5 retrotransposon. Ty5 targets the heterochromatin of telomeres and the silent mating loci (Zou et al., 1996). Targeting by Ty5 is mediated by a small motif (the targeting domain, TD) at the C-terminus of integrase (IN). TD interacts with Sir4, a component of heterochromatin, and this interaction brings the integration complex to target sites. Retroviruses use a strategy similar to that of Ty5 to target integration. An interaction between HIV IN and the transcription factor LEDGF/p75 causes HIV to integrate near actively transcribed genes (Ciuffi et al., 2005; Llano et al., 2006; Shun et al., 2007).

Chromoviruses are chromodomain-containing members of the Ty3/gypsy group of retrotransposons. The C-terminus of chromoviral INs typically contains a chromodomain (CHD), a 40 to 50 amino acid sequence motif that can interact with diverse components of chromatin (Malik and Eickbush, 1999; Brehm et al., 2004). It was reported recently that the interaction between the CHD of the fungal element MAGGY and H3 K9 methylation, a histone mark associated with heterochromatin, can target integration complexes to heterochromatin (Gao et al., 2008). Centromere-specific retrotransposable elements (CRs) are chromoviruses that are highly associated with centromeres (Cheng et al., 2002; Hudakova et al., 2001; Jiang et al., 2003; Miller et al., 1998; Mroczek and Dawe, 2003; Presting et al., 1998; Zhong et al., 2002). CRs were first found in grasses, but have recently been discovered in many other monocots and dicots (Gorinsek et al., 2005). They belong to a distinct clade of the chromoviruses, the “CRM clade” (Gorinsek et al., 2005), and CRs differ from the other chromoviruses in their chromodomains (Gorinsek et al., 2005; Gao et al., 2008).

Here we identify a short, 11 amino acid sequence motif (the CR motif) from the C-terminus of CR integrases from both monocots and dicots. We hypothesize that the CR motif is involved in targeting integration to centromeres. When the CR motif is fused to YFP and expressed in vivo, specific sub-nuclear foci are observed that coincide with a centromeric marker in Arabidopsis protoplasts. Recruitment of the CR motif to chromocenters is decreased in Arabidopsis ddm1 and met1 mutant cells, which have significantly reduced heterochromatin. Our data suggest that the CR motif interacts with specific components of plant centromeric heterochromatin. We hypothesize that this interaction underlies the novel distribution pattern of CR retrotransposon insertions in the genome. To our knowledge, this is the first targeting domain identified in a plant LTR retrotransposon.


RESULTS

The C-termini of integrases from CRs have a distinct subnuclear localization. To understand the cellular function of the C-terminus of CR element integrase, we analyzed its localization in Arabidopsis and maize leaf protoplasts. Translational fusions of integrase C-termini were made to a YFP marker with a nuclear localization signal (NLS). To avoid possible conformational artifacts or inactivation of the fluorescence due to protein fusions, C-termini of CRs were fused to the C-terminal region of YFP. As a control, the YFP construct with the NLS alone was used. The constructs from CRM were introduced into maize and Arabidopsis leaf protoplasts. The construct from CRA was introduced into Arabidopsis protoplasts. As expected, the localization of YFP with only the NLS was uniform throughout the nucleus. In contrast, the NLS-YFP fused to the C-termini of CR integrase showed a novel pattern (Fig. 1A, 1B). A specific subnuclear localization was observed, as indicated by numerous discrete rounded foci. These localization experiments suggest that the integrase C-terminus could be a mediator of target specificity.

The CR domain is the determinant of subnuclear localization. CR elements form a specific clade of the Chromoviruses, whose chromodomain likely targets these retroelements to specific sites in the genome (Malik and Eickbush, 1999; Xiang et al., 2008). A BLAST search using known CR integrases as queries identified new CR elements in many monocot and dicot plants. An alignment of the C-termini of these integrases was constructed, and a novel CR domain was identified (Fig. 1C). We asked whether this CR domain is required for the subnuclear foci observed when the C-termini of CR integrases were fused to YFP. A YFP fusion lacking the CR domain was analyzed for its subnuclear localization pattern in the transient protoplast expression assay. Consistent with the previous results, wild-type C-termini of CRs localized to subnuclear foci (speckles), which were distributed evenly throughout the nucleus. In contrast, deletion of the CR domain resulted in complete loss of subnuclear foci (Fig. 1E). We next asked whether the CR domain itself was sufficient for the localization to nuclear foci. We isolated the CR domain from the C-terminus, fused it to YFP, and tested it in the transient expression assay.


Figure 1. Subnuclear localization of CR domains identified by sequence analysis. (A) YFP fusion proteins expressing the C-terminus of integrase from centromeric retrotransposons of Z. mays (CRM), O. sativa (CRR) and A. thaliana (CRA) localize to discrete foci in nuclei of A. thaliana cells. There are no discrete foci in the nuclei of A. thaliana cells transformed with NLS-YFP. (B) YFP fusion proteins expressing the C-terminus of integrase from CRM localize to discrete foci in nuclei of Z. mays cells. (C) Structural organization of IN. The bars depict IN of a typical CR element. The N-terminal HHCC domain and the catalytic DD35E domain are labeled. The C-terminus is marked by slanted lines. Specific features in the C-terminus are as follows: GPF/Y motif, CR domain (black box). The aligned sequences are from conserved CR domains found in the C-termini of CR integrases. (D) Neighbor-joining tree (unrooted, bootstrapped 10,000 replicates) based on the CR domain alignment. Nodes with confidence values are indicated. (E) YFP fused to the CR domain shows a nuclear localization pattern similar to that of YFP fused to the entire C-terminus of integrase. (a) Scheme of the integrase C-terminus, the integrase C-terminus without the CR domain, and the CR domain, each fused to YFP. (b) The indicated fusion proteins were transiently expressed in A. thaliana cells.

The CR domain by itself gave a subnuclear localization pattern similar to that of the entire integrase C-terminus (Fig. 1E). These experiments demonstrate that the CR domain is sufficient to target proteins to subnuclear foci. To explore the relationship between the CR domain and the CR elements that carry it, an unrooted neighbor-joining tree was made using CR domain amino acid sequences (Fig. 1D). The tree is congruent with the phylogeny of the hosts of the CR elements.

The CR domain localizes to chromocenters. Based on the retroelement targeting model (Bushman et al., 1995), integration sites are determined when retroelement integrases recognize unique chromatin features. This is supported by studies of the Ty5 retrotransposons of Saccharomyces cerevisiae, whose integration into heterochromatin relies upon the interaction of a targeting domain (six amino acids in the integrase C-terminus) and Sir4p, a structural protein of heterochromatin (Zhu et al., 2003). We hypothesized that the CR domain recognizes specific components of centromeric heterochromatin and that this interaction is responsible for the highly specific distribution of CR elements at centromeres. To address this question more directly, we investigated whether the subnuclear localization of the CR domain coincides with centromeric heterochromatin. In Arabidopsis interphase nuclei, heterochromatin forms clearly distinguishable chromocenters, which appear as bright, fluorescent regions after 4',6-diamidino-2-phenylindole (DAPI) staining (Fransz et al., 2002). Immunostaining was carried out with Arabidopsis leaf protoplasts expressing a YFP-CR domain construct using antibodies that recognize YFP. The YFP-CR fusion protein co-localizes with heterochromatic regions that stain with DAPI (Fig. 2A). Centromere protein C (CENP-C) is a component of the kinetochore and is essential for chromosome segregation (Dawe et al., 1999). In Arabidopsis, CENP-C appears at the centromeric regions throughout the cell cycle, as evidenced by immunofluorescence and fluorescence in situ hybridization experiments using centromeric repeats (Shibata and Murata, 2004). To further characterize the YFP-CR domain subnuclear foci, we performed co-localization experiments using the CR domain fused to CFP and CENP-C fused to YFP. YFP-CENP-C co-localized with the CFP-CR fusion protein; however, the CFP-CR signal was less focused (Fig. 2B).

Figure 2. The CR domain localizes to centromeric heterochromatin. (A) Immunostaining shows the localization of the CR domain at the intensely DAPI-stained chromocenters. Fixed nuclei from Arabidopsis leaf protoplasts expressing CR domain-YFP fusions were immunolabeled with anti-GFP and inspected under a confocal microscope. Note the intense YFP signal at the perinucleolar chromocenters and lower signals at other chromocenters. (B) The localization of CR domain-CFP fusion proteins is coincident with a fusion between YFP and CENP-C, a component of the kinetochore. The image was taken on a confocal microscope.


Localization of the YFP-CR domain is affected in mutants displaying reduced amounts of centromeric heterochromatin. To examine the importance of DNA and histone H3-K9 methylation for CR domain subnuclear localization, the YFP-CR domain was transiently expressed in protoplasts derived from several Arabidopsis mutants: suvh2 has a defect in a histone methyltransferase gene that causes a strong reduction in H3-K9 methylation (Naumann et al., 2005); kyp has a defect in the kryptonite H3-K9 methyltransferase gene that causes decreases in CpNpG DNA methylation and in histone H3-K9 methylation (Jackson et al., 2002); ddm1 has a mutation in a SWI/SNF2 chromatin remodeling factor; met1 has a mutation in a DNA methyltransferase gene. The latter two mutants have reduced DNA methylation and histone H3-K9 methylation in heterochromatin (Soppe et al., 2002; Tariq et al., 2003). Subnuclear localization experiments showed that the distribution of the CR domain in kyp and suvh2 was similar to the pattern observed in wild-type cells (Fig. 3A), suggesting that H3-K9 dimethylation is dispensable for the localization of the CR motif at chromocenters. The number and intensity of CR domain foci in nuclei were reduced in ddm1 and met1 mutants (Fig. 3A). To quantify these differences, we counted the number of foci in 100 nuclei of wild-type cells and cells from each mutant. As the graph shows, most wild-type nuclei contain 7 or more green spots. In contrast, most of the ddm1 and met1 nuclei contain fewer than 5 foci (Fig. 3C). We also performed immunostaining and found that the YFP-CR domain foci co-localized with DAPI in met1 and ddm1 mutants (Fig. 3B). Recruitment of the CR domain to chromocenters was decreased in ddm1 and met1 mutant cells, which have significantly reduced heterochromatin, suggesting that the CR domain interacts with specific components of plant heterochromatin.
To ask further whether the CR domain specifically recognizes centromeric heterochromatin, a YFP-CR domain construct was introduced into cells derived from an nrpd2 mutant, which is defective in establishing facultative heterochromatin but not constitutive heterochromatin (Onodera et al., 2005). The localization pattern of the CR domain was not changed in nrpd2 cells (Fig. 3D). YFP-CENP-C also showed a wild-type localization signal in nrpd2 cells (Fig. 3D).
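
The foci quantification described above (foci counted per nucleus, 100 nuclei per genotype) can be summarized with a few lines of code. The following is a minimal sketch, not the procedure used in this study; the function name `summarize_foci` and the example counts are hypothetical and serve only to illustrate the tabulation.

```python
from collections import Counter


def summarize_foci(counts, threshold):
    """Summarize foci-per-nucleus counts for one genotype.

    Returns (fraction of nuclei with at least `threshold` foci,
    histogram mapping foci number -> number of nuclei).
    """
    if not counts:
        raise ValueError("no nuclei were scored")
    histogram = dict(sorted(Counter(counts).items()))
    fraction = sum(1 for c in counts if c >= threshold) / len(counts)
    return fraction, histogram


# Hypothetical counts for illustration only; these are NOT data from this study.
example_counts = {
    "wild type": [8, 7, 9, 6, 10],
    "ddm1": [3, 4, 2, 5, 7],
}

for genotype, counts in example_counts.items():
    fraction, histogram = summarize_foci(counts, threshold=7)
    print(f"{genotype}: {fraction:.0%} of nuclei with >= 7 foci; histogram {histogram}")
```

The threshold of 7 foci mirrors the cutoff mentioned in the text for wild-type nuclei; any other cutoff can be passed in the same way.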


Figure 3. Localization of the CR domain to chromocenters requires the integrity of constitutive heterochromatin. (A) Subnuclear localization of CR domain-YFP was reduced in ddm1 and met1 but not in suvh2 and kyp mutant cells. (B) Immunostaining shows the localization of the CR domain at the DAPI-stained chromocenters in ddm1 mutant cells. (C) Distribution of the number of foci per nucleus enriched in YFP-CR domain proteins in ddm1, met1 and wild type cells. One hundred nuclei were counted for each sample. (D) Subnuclear localization of CR domain-YFP and CENP-C-YFP was not changed in nrpd2 mutant cells.


Figure 4. The CR motif is sufficient for CR domain subnuclear localization patterns, and a conserved arginine is essential. (A) The blue arrows denote conserved residues, which when mutated, have no effect on CR domain subnuclear localization. The red arrow indicates the key arginine residue. The CR motif is marked by the light yellow square. (B) Mutations in the conserved arginine residue in the CR domain of both CRM and CRR abrogate subnuclear localization of YFP-CR domain fusion proteins. The arginine was mutated to alanine. (C) The CR motif fused with YFP can co-localize with the CR domain fused with CFP.

An 11 amino acid motif in the CR domain is sufficient for subnuclear localization. The CR domain was mutagenized to identify the amino acid residues responsible for the distinct pattern of subnuclear localization. Several conserved residues identified in alignments of the CR domain from different species were separately mutated to alanine. The mutant CR domains were fused with YFP and expressed in Arabidopsis leaf protoplasts. Only the arginine mutation (marked in red in the alignment) caused a loss of subnuclear localization (Fig. 4A, 4B). Since mutations in other conserved residues did not alter CR domain localization, we further asked whether the most conserved 11 amino acid GPITRARARQL motif (the CR motif) is sufficient for CR domain subnuclear localization. Discrete subnuclear foci were observed when only the CR motif was fused to YFP. To confirm that the CR motif is completely responsible for the subnuclear localization of the CR domain, co-localization of YFP-CR motif and CFP-CR domain fusion proteins was assessed. As shown in Fig. 4C, the CR motif is sufficient for subnuclear localization of the CR domain.
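
As an illustration of how the GPITRARARQL motif could be located in a protein sequence, and how a single-residue alanine substitution could be modeled in silico, the following minimal sketch scans a sequence window by window. The function names, the toy sequence, and the choice of which arginine to substitute are hypothetical; the text does not specify the position of the critical arginine, and this is not the analysis pipeline used in this study.

```python
CR_MOTIF = "GPITRARARQL"  # 11 amino acid CR motif given in the text


def find_motif(protein, motif=CR_MOTIF, max_mismatches=0):
    """Return (start, window, mismatches) for every window of `protein`
    that differs from `motif` at no more than `max_mismatches` positions."""
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


def to_alanine(sequence, index):
    """Return `sequence` with the residue at 0-based `index` replaced by alanine."""
    if not 0 <= index < len(sequence):
        raise ValueError("index outside sequence")
    return sequence[:index] + "A" + sequence[index + 1:]


# Toy sequence for illustration only; it is NOT a real integrase sequence.
toy_protein = "MKT" + CR_MOTIF + "EE"
print(find_motif(toy_protein))

# Substitute one arginine (index 4 is an arbitrary, hypothetical choice).
mutant_motif = to_alanine(CR_MOTIF, 4)
print(mutant_motif)
print(find_motif("MKT" + mutant_motif + "EE", max_mismatches=1))
```

Allowing a small number of mismatches makes it possible to flag diverged copies of the motif, which is relevant when asking whether non-centromeric elements carry mutated versions.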

DISCUSSION

Centromere-specific retrotransposons and the CR motif: Centromere-specific retrotransposons are highly enriched in plant centromeric heterochromatin (Cheng et al., 2002; Nagaki and Murata, 2005; Presting et al., 1998; Zhong et al., 2002). This distribution may be caused by CRs specifically targeting heterochromatic regions during transposition or by selection acting against retrotransposon insertions into the other chromosomal sites. A detailed analysis of CR sequences suggests that CRs accumulate at centromeres through retrotransposition. First, CRs encode functional proteins required for retrotransposition (Cheng et al., 2002; Zhong et al., 2002). Second, full-length CRM and CRR elements have transposed recently based on the sequence similarities between their 5’ and 3’ LTRs (Sharma and Presting, 2008). Third, the relatively high ratio of synonymous to non-synonymous codon changes in inter- and intraspecific CR sequence comparisons suggests a typical pattern of enrichment in the genome caused by retrotransposition (Langdon et al., 2000).
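
The second argument above rests on the fact that the two LTRs of an element are identical at the time of insertion and diverge afterwards, so high 5’/3’ LTR similarity indicates recent transposition. A common way to turn this into an age estimate is T = K / (2r), where K is the corrected divergence between the LTRs and r is the substitution rate per site per year. The sketch below assumes pre-aligned LTR sequences and a Jukes-Cantor correction; the function names and the rate value are assumptions for illustration, not values taken from this study or from the cited work.

```python
import math


def p_distance(ltr5, ltr3):
    """Proportion of differing sites between two pre-aligned LTRs.
    Columns containing a gap ('-') in either sequence are skipped."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance K = -3/4 * ln(1 - 4p/3)."""
    if p == 0:
        return 0.0
    if not 0 < p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(ltr5, ltr3, rate):
    """Estimate time since insertion as T = K / (2 * rate)."""
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2 * rate)


# ASSUMED placeholder rate (substitutions per site per year); replace it with
# a rate appropriate for the species being analyzed.
ASSUMED_RATE = 1.3e-8

# Toy aligned sequences for illustration only; NOT real LTR sequences.
print(insertion_age_years("ACGTACGTAC", "ACGTACGTAT", ASSUMED_RATE))
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion.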

Retrotransposon and retroviral integrases (INs) have three distinct domains: 1) an N-terminal region with an HHCC zinc-binding domain that potentially binds cDNA, 2) a catalytic domain that performs the integration reaction, and 3) a C-terminal domain that is highly divergent among different families of retroelements. One function of the C-terminus of integrase was discovered for the Ty5 elements of Saccharomyces. The extreme C-terminus of Ty5 IN encodes a targeting domain that is required for the preintegration complex to interact with heterochromatin. The integrase C-termini of the Chromoviruses have chromodomains, which are thought to have a targeting function similar to that of the Ty5 targeting domain. The chromodomain of the fungal element, Maggy, interacts with H3 dimethyl- and trimethyl-K9. When the Maggy chromodomain is fused to the integrase of the S. pombe Tf1 element, it can direct Tf1 to sites of H3-K9 methylation (Xiang et al., 2008). In this study, we identified a short amino acid sequence at the C-terminus of integrase in CR elements (the CR motif). Localization experiments demonstrated that the CR motif has the ability to direct proteins such as YFP to centromeric heterochromatin, suggesting the CR motif may contribute to centromeric target specificity.

A more detailed view of the distribution of CR elements in the rice and maize genomes has been made possible by the ongoing maize and rice genome sequencing efforts. Despite their preferential localization in centromeric heterochromatin, several CRRs are found in pericentromeric and euchromatic regions on all 12 rice chromosomes. Among these is CRM4, an old subfamily within the CR clade of chromoviruses, which is non-centromeric (Nagaki et al., 2005b; Sharma and Presting, 2008). We could not find the CR motif in the C-terminus of the CRM4 integrase. Because of the low fidelity of reverse transcription (Miller, 1997), it is possible that the CR motif was mutated or lost during reverse transcription. It is known that Ty5 no longer targets heterochromatin when important residues in the targeting domain are mutated (Dai et al., 2007). The non-centromeric distribution of CR elements may therefore be caused by integrases with mutated or missing CR motifs. Sequence analysis showed that CRM4 elements are located close to centromeres. It is also possible that the non-centromeric distribution of CR elements is limited to old CR families that lost the ability to retrotranspose into active centromeres and were therefore pushed into pericentromeric regions by rapidly evolving satellite repeats (Sharma and Presting, 2008; Wong and Choo, 2004). Further sequence analysis should be carried out to distinguish among these possibilities.

Phylogenetic analysis revealed that the relationships among CR motifs reflect the relationships among the species from which they are derived (Gorinsek et al., 2005). It has been suggested that the CR element may have acquired the CR motif before the CR elements were widespread in seed plants (Spermatophyta). Once the CR motif was formed, it brought some selective advantage to the CR element (e.g. specialized targeting) and to the host (e.g. contributions to centromere function). A BLAST search with the CR motif was carried out, but we could not find any endogenous protein with CR-like motifs. It may be that the CR motif has evolved rapidly within the retrotransposon, and so it is difficult to identify the ancestral sequences.

The CR motif and centromeric heterochromatin: The results presented here strongly argue that the CR domain specifically targets proteins to plant centromeric heterochromatin. First, the CR domain co-localizes with CENP-C, a constitutive component of the kinetochore. Second, the localization of the CR domain is reduced in ddm1 and met1 mutants, in which heterochromatic domains are reduced (Soppe et al., 2002). In contrast, the localization of the CR domain was not changed in nrpd2 mutants, which display altered facultative heterochromatin but show no alterations in constitutive heterochromatin (Onodera et al., 2005). It is particularly intriguing that the 11 amino acid CR motif within the CR domain is sufficient to target proteins to centromeric heterochromatin. The targeting domain of Ty5 is also a small amino acid motif (6 amino acids), and phosphorylation on a serine regulates integration of Ty5 into heterochromatin (Dai et al., 2007). We speculate that some kind of modification, perhaps methylation of the conserved and critical arginine, may regulate target specificity of CR elements.

Subnuclear localization of the CR motif was observed in maize protoplasts, tobacco protoplasts (data not shown) and Arabidopsis protoplasts, but not in S. pombe and mammalian cells (data not shown). One remarkable feature of the CR motif is its high level of conservation; the CR motifs from maize and rice have the same amino acid sequence despite the fact that these two species diverged >55 million years ago. We hypothesize that the CR motif recognizes a very conserved component of plant centromeres. Alternatively, the CR motif may be modified by a plant-specific protein, and this modification may be required for the CR motif to target to centromeres.


In contrast to the diversity of centromeric sequences, centromeric chromatin has many shared features among different species (Carroll and Straight, 2006). CEN-H3, a histone H3 variant, replaces the canonical histone H3 in centromeres and interacts with other histone proteins to construct a specific nucleosome that is important for kinetochore function (Choo, 2001; Henikoff et al., 2001). ChIP-based analyses with a CEN-H3 antibody showed that CEN-H3 specifically binds CENT-O, CR elements in rice and maize, CENT-C, and centromeric repeats in sugarcane and Luzula nivea (Nagaki et al., 2004; Nagaki et al., 2005a; Nagaki and Murata, 2005; Zhong et al., 2002). CEN-H3, therefore, is a reasonable candidate to be the CR motif's partner. Unfortunately, no interaction was detected between CEN-H3 and the CR domain in in vitro pull-down and yeast two-hybrid assays. In vivo assays should be developed to further characterize the CR motif's interacting partner.

Epigenetic modification is another distinguishing hallmark of centromeres. Studies on humans and fruit flies have revealed that centromeres have a chromatin structure distinct from both euchromatin and heterochromatin (Sullivan and Karpen, 2004). However, no epigenetic modifications specific to plant centromeres have been described to date (Sharma and Presting, 2008; Zhang et al., 2008; Li et al., 2008; Nagaki et al., 2004; Shi and Dawe, 2006). GST-tagged wild-type and mutant CR domains were used in pull-down assays with a series of biotinylated histone H3 modified peptides, including H3 dimethyl-R2, H3 methyl-K27, H3 dimethyl-K27, H3 trimethyl-K27, H3 methyl-K36, H3 dimethyl-K36, H3 trimethyl-K36, H3 methyl-K4, H3 methyl-R8, H3 trimethyl-K14, H3 methyl-R17, H3 trimethyl-K18, and H3 dimethyl-K9 (M. Bedford, pers. communication; data not shown). None of these histone modifications was found to interact with the CR domain. Because centromeric chromatin is not yet well characterized, the CR domain may recognize a specific centromere modification that has not been discovered.

PERSPECTIVE

Centromere evolution and centromere-specific retrotransposons: Recent work in primates and barley indicates that centromere function (neocentromeres) can be obtained without canonical centromeric repeats (Choo, 2001; Nasuda et al., 2005). It has also been shown that regions without centromeric repeats can recruit CEN-H3 and other essential proteins to form a functional kinetochore when normal centromeres are lost or inactivated (Amor et al., 2004; Amor and Choo, 2002; Nasuda et al., 2005). Here we propose that the CR element may play a role in initiating centromere assembly by recognizing neocentromere-specific markers and serving as the template from which centromeric repeats evolve. In our studies, we identified a CR motif that has the ability to localize to centromeres. The CR motif may recognize an epigenetic mark associated with neocentromeres and thereby bring the CR element to neocentromeric regions that lack centromeric repeats. It has been shown that centromeric satellite repeats have sequence homology to known transposable elements. One particular example is a 250 bp centromeric tandem repeat of the wheat relative Aegilops speltoides, which shares high sequence similarity with part of a CR element (cereba) (Cheng and Murata, 2003). It is also predicted that once a new centromere has formed, the repeats can recruit sequence-specific binding proteins to establish centromeres and ensure their stable inheritance (Henikoff et al., 2001; Malik and Henikoff, 2002). Much remains to be learned about the role of retroelements in centromere function. It is clear that such a role is consistent with the many other ways retroelements affect genome integrity and evolution.

MATERIALS AND METHODS

Plant materials and protoplast transformation: Seeds of Arabidopsis thaliana mutants were kindly provided by several individuals: ddm1 and met1 were provided by E. Richards; suvh2 by G. Reuter; nrpd2 by C. Pikaard. The mutant kyp was obtained from the ABRC. Wild-type Arabidopsis thaliana (ecotype Columbia) and the various mutants were grown under short-day conditions at 24 °C. Four to six weeks after germination, rosette leaves were used for the isolation and transformation of protoplasts as described (Yoo et al., 2007). Seeds of Zea mays were kindly provided by the North Central Regional Plant Introduction Station. Protoplast isolation and transformation were as described (http://genetics.mgh.harvard.edu/sheenweb/protocols_reg.html). The YFP and CFP signals were detected 10 hours after transfection using a laser confocal microscope (Nikon Eclipse 200 microscope). Images were obtained using an excitation wavelength of 488 nm for YFP and 440 nm for CFP. Images for YFP, CFP and chlorophyll signals were collected through 505-525 nm, 480 nm and 630 nm filters, respectively.

Immunolabeling: Nuclei isolation and fixation were as previously described (Bowler et al., 2004). Slides were blocked with 2% BSA in PBS for 30 min at 37 °C, followed by overnight incubation at 4 °C with 100 µl of primary antibody mixture containing 0.5 µg of anti-GFP (Molecular Probes) in 1% BSA. Slides were washed three times, 5 min each, in PBS, followed by a 45 min incubation at 37 °C with FITC-conjugated secondary antibody (Cappel/ICN; Southern Biotech). Slides were washed as described above, stained with DAPI in mounting medium, and inspected with a confocal microscope as described above.

REFERENCES

Amor, D.J., Bentley, K., Ryan, J., Perry, J., Wong, L., Slater, H., and Choo, K.H. (2004). Human centromere repositioning "in progress". Proc Natl Acad Sci U S A 101, 6542-6547.

Amor, D.J., and Choo, K.H. (2002). Neocentromeres: role in human disease, evolution, and centromere study. Am J Hum Genet 71, 695-714.

Bachman, N., Gelbart, M.E., Tsukiyama, T., and Boeke, J.D. (2005). TFIIIB subunit Bdp1p is required for periodic integration of the Ty1 retrotransposon and targeting of Isw2p to S. cerevisiae tDNAs. Genes Dev 19, 955-964.

Bennetzen, J.L. (1996). The contributions of retroelements to plant genome organization, function and evolution. Trends Microbiol 4, 347-353.

Bowler, C., Benvenuto, G., Laflamme, P., Molino, D., Probst, A.V., Tariq, M., and Paszkowski, J. (2004). Chromatin techniques for plant cells. Plant J 39, 776-789.

Bushman, F.D. (2003). Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115, 135-138.

Carroll, C.W., and Straight, A.F. (2006). Centromere formation: from epigenetics to self-assembly. Trends Cell Biol 16, 70-78.

Cheng, Z., Dong, F., Langdon, T., Ouyang, S., Buell, C.R., Gu, M., Blattner, F.R., and Jiang, J. (2002). Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14, 1691-1704.


Cheng, Z.J., and Murata, M. (2003). A centromeric tandem repeat family originating from a part of Ty3/gypsy-retroelement in wheat and its relatives. Genetics 164, 665-672.

Choo, K.H. (2001). Domain organization at the centromere and neocentromere. Dev Cell 1, 165-177.

Ciuffi, A., Llano, M., Poeschla, E., Hoffmann, C., Leipzig, J., Shinn, P., Ecker, J.R., and Bushman, F. (2005). A role for LEDGF/p75 in targeting HIV DNA integration. Nat Med 11, 1287-1289.

Dai, J., Xie, W., Brady, T.L., Gao, J., and Voytas, D.F. (2007). Phosphorylation regulates integration of the yeast Ty5 retrotransposon into heterochromatin. Mol Cell 27, 289-299.

Dawe, R.K., Reed, L.M., Yu, H.G., Muszynski, M.G., and Hiatt, E.N. (1999). A maize homolog of mammalian CENPC is a constitutive component of the inner kinetochore. Plant Cell 11, 1227-1238.

Fransz, P., De Jong, J.H., Lysak, M., Castiglione, M.R., and Schubert, I. (2002). Interphase chromosomes in Arabidopsis are organized as well defined chromocenters from which euchromatin loops emanate. Proc Natl Acad Sci U S A 99, 14584-14589.

Gao, X., Hou, Y., Ebina, H., Levin, H.L., and Voytas, D.F. (2008). Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res 18, 359-369.

Gorinsek, B., Gubensek, F., and Kordis, D. (2005). Phylogenomic analysis of chromoviruses. Cytogenet Genome Res 110, 543-552.

Henikoff, S., Ahmad, K., and Malik, H.S. (2001). The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098-1102.

Hudakova, S., Michalek, W., Presting, G.G., ten Hoopen, R., dos Santos, K., Jasencakova, Z., and Schubert, I. (2001). Sequence organization of barley centromeres. Nucleic Acids Res 29, 5029-5035.

Jackson, J.P., Lindroth, A.M., Cao, X., and Jacobsen, S.E. (2002). Control of CpNpG DNA methylation by the KRYPTONITE histone H3 methyltransferase. Nature 416, 556-560.

Jiang, J., Birchler, J.A., Parrott, W.A., and Dawe, R.K. (2003). A molecular view of plant centromeres. Trends Plant Sci 8, 570-575.

Kim, J.M., Vanguri, S., Boeke, J.D., Gabriel, A., and Voytas, D.F. (1998). Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res 8, 464-478.


Langdon, T., Seago, C., Mende, M., Leggett, M., Thomas, H., Forster, J.W., Jones, R.N., and Jenkins, G. (2000). Retrotransposon evolution in diverse plant genomes. Genetics 156, 313-325.

Leem, Y.E., Ripmaster, T.L., Kelly, F.D., Ebina, H., Heincelman, M.E., Zhang, K., Grewal, S.I., Hoffman, C.S., and Levin, H.L. (2008). Retrotransposon Tf1 is targeted to Pol II promoters by transcription activators. Mol Cell 30, 98-107.

Li, X., Wang, X., He, K., Ma, Y., Su, N., He, H., Stolc, V., Tongprasit, W., Jin, W., Jiang, J., et al. (2008). High-resolution mapping of epigenetic modifications of the rice genome uncovers interplay between DNA methylation, histone methylation, and gene expression. Plant Cell 20, 259-276.

Llano, M., Vanegas, M., Hutchins, N., Thompson, D., Delgado, S., and Poeschla, E.M. (2006). Identification and characterization of the chromatin-binding domains of the HIV-1 integrase interactor LEDGF/p75. J Mol Biol 360, 760-773.

Malik, H.S., and Eickbush, T.H. (1999). Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 73, 5186-5190.

Malik, H.S., and Henikoff, S. (2002). Conflict begets complexity: the evolution of centromeres. Curr Opin Genet Dev 12, 711-718.

Miller, J.T., Dong, F., Jackson, S.A., Song, J., and Jiang, J. (1998). Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150, 1615-1623.

Mou, Z., Kenny, A.E., and Curcio, M.J. (2006). Hos2 and Set3 promote integration of Ty1 retrotransposons at tRNA genes in Saccharomyces cerevisiae. Genetics 172, 2157-2167.

Mroczek, R.J., and Dawe, R.K. (2003). Distribution of retroelements in centromeres and neocentromeres of maize. Genetics 165, 809-819.

Nagaki, K., Cheng, Z., Ouyang, S., Talbert, P.B., Kim, M., Jones, K.M., Henikoff, S., Buell, C.R., and Jiang, J. (2004). Sequencing of a rice centromere uncovers active genes. Nat Genet 36, 138-145.

Nagaki, K., Kashihara, K., and Murata, M. (2005a). Visualization of diffuse centromeres with centromere-specific histone H3 in the holocentric plant Luzula nivea. Plant Cell 17, 1886-1893.

Nagaki, K., and Murata, M. (2005). Characterization of CENH3 and centromere-associated DNA sequences in sugarcane. Chromosome Res 13, 195-203.


Nagaki, K., Neumann, P., Zhang, D., Ouyang, S., Buell, C.R., Cheng, Z., and Jiang, J. (2005b). Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice. Mol Biol Evol 22, 845-855.

Nasuda, S., Hudakova, S., Schubert, I., Houben, A., and Endo, T.R. (2005). Stable barley chromosomes without centromeric repeats. Proc Natl Acad Sci U S A 102, 9842-9847.

Naumann, K., Fischer, A., Hofmann, I., Krauss, V., Phalke, S., Irmler, K., Hause, G., Aurich, A.C., Dorn, R., Jenuwein, T., et al. (2005). Pivotal role of AtSUVH2 in heterochromatic histone methylation and gene silencing in Arabidopsis. EMBO J 24, 1418-1429.

Onodera, Y., Haag, J.R., Ream, T., Nunes, P.C., Pontes, O., and Pikaard, C.S. (2005). Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120, 613-622.

Peterson-Burch, B.D., Nettleton, D., and Voytas, D.F. (2004). Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol 5, R78.

Presting, G.G., Malysheva, L., Fuchs, J., and Schubert, I. (1998). A Ty3/gypsy retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J 16, 721-728.

Sharma, A., and Presting, G.G. (2008). Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity. Mol Genet Genomics 279, 133-147.

Shi, J., and Dawe, R.K. (2006). Partitioning of the maize epigenome by the number of methyl groups on histone H3 lysines 9 and 27. Genetics 173, 1571-1583.

Shibata, F., and Murata, M. (2004). Differential localization of the centromere-specific proteins in the major centromeric satellite of Arabidopsis thaliana. J Cell Sci 117, 2963-2970.

Shun, M.C., Raghavendra, N.K., Vandegraaff, N., Daigle, J.E., Hughes, S., Kellam, P., Cherepanov, P., and Engelman, A. (2007). LEDGF/p75 functions downstream from preintegration complex formation to effect gene-specific HIV-1 integration. Genes Dev 21, 1767-1778.

Slotkin, R.K., and Martienssen, R. (2007). Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet 8, 272-285.

Soppe, W.J., Jasencakova, Z., Houben, A., Kakutani, T., Meister, A., Huang, M.S., Jacobsen, S.E., Schubert, I., and Fransz, P.F. (2002). DNA methylation controls histone H3 lysine 9 methylation and heterochromatin assembly in Arabidopsis. EMBO J 21, 6549-6559.


Sullivan, B.A., and Karpen, G.H. (2004). Centromeric chromatin exhibits a histone modification pattern that is distinct from both euchromatin and heterochromatin. Nat Struct Mol Biol 11, 1076-1083.

Tariq, M., Saze, H., Probst, A.V., Lichota, J., Habu, Y., and Paszkowski, J. (2003). Erasure of CpG methylation in Arabidopsis alters patterns of histone H3 methylation in heterochromatin. Proc Natl Acad Sci U S A 100, 8823-8827.

Wong, L.H., and Choo, K.H. (2004). Evolutionary dynamics of transposable elements at the centromere. Trends Genet 20, 611-616.

Yieh, L., Hatzis, H., Kassavetis, G., and Sandmeyer, S.B. (2002). Mutational analysis of the transcription factor IIIB-DNA target of Ty3 retroelement integration. J Biol Chem 277, 25920-25928.

Yieh, L., Kassavetis, G., Geiduschek, E.P., and Sandmeyer, S.B. (2000). The Brf and TATA-binding protein subunits of the RNA polymerase III transcription factor IIIB mediate position-specific integration of the gypsy-like element, Ty3. J Biol Chem 275, 29800-29807.

Yoo, S.D., Cho, Y.H., and Sheen, J. (2007). Arabidopsis mesophyll protoplasts: a versatile cell system for transient gene expression analysis. Nat Protoc 2, 1565-1572.

Zhang, W., Lee, H.R., Koo, D.H., and Jiang, J. (2008). Epigenetic modification of centromeric chromatin: hypomethylation of DNA sequences in the CENH3-associated chromatin in Arabidopsis thaliana and maize. Plant Cell 20, 25-34.

Zhong, C.X., Marshall, J.B., Topp, C., Mroczek, R., Kato, A., Nagaki, K., Birchler, J.A., Jiang, J., and Dawe, R.K. (2002). Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14, 2825-2836.

Zou, S., Ke, N., Kim, J.M., and Voytas, D.F. (1996). The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev 10, 634-645.


CHAPTER 4. PLANT RETROELEMENTS FOR AMPLIFICATION AND DELIVERY

A manuscript to be submitted to Plant Journal

Yi Hou, Jyothi Rajagopal, Phillip A. Irwin, David Wright and Daniel F. Voytas

ABSTRACT

Retrotransposons are mobile genetic elements that are widely distributed in plants and constitute the interspersed repetitive sequences in most plant genomes. The abundance of plant retrotransposons and their replication mechanism suggest they may be useful genetic tools for plant genome analysis, such as gene tagging and large-scale insertional mutagenesis. Here we describe a novel mini-retrotransposon vector system that is derived from the plant retrotransposon Tnt1 and is useful for integrating foreign DNA into plants. The mini-Tnt1 vectors lack coding sequence yet retain the 5’ LTR and 3’ LTR regions that supply transcription initiation and termination sites as well as all cis-elements required for reverse transcription. Two different mini-Tnt1s were developed: transcription of one is driven by a complete 5’ LTR; transcription of the other is driven by a chimeric 5’ LTR whose first 233 bp were replaced by the CaMV 35S promoter. After transfer into tobacco protoplasts, both vectors undergo retrotransposition during protoplast regeneration. This demonstrates that mini-Tnt1 elements lacking functional gene products can be complemented in trans by endogenous elements. Transposition frequencies of mini-Tnt1 vectors with the complete 5’ LTR were comparable to those of the endogenous Tnt1. Also like endogenous Tnt1, mini-Tnt1s inserted within or near coding sequences. We provide experimental evidence that genetic recombination occurs during Tnt1 reverse transcription, suggesting that multiple copies of Tnt1 mRNA are packaged in virus-like particles. Our data demonstrate that mini-Tnt1 retrotransposons can be engineered for various applications in plant genome engineering.

INTRODUCTION

Long terminal repeat (LTR) retrotransposons are mobile genetic elements that replicate through reverse transcription of an mRNA intermediate. Since LTR retrotransposons and retroviruses are similar in their genetic organization (Boeke et al., 1997), they are both referred to as retroelements. The LTRs contain the promoters and terminators associated with the transcription of the retroelement. Between the LTRs are a primer binding site (PBS) and a polypurine tract (PPT), which are required for reverse transcription. All retroelement mRNAs encode two common genes: gag, which encodes the protein that forms virus or virus-like particles, and pol, which encodes three enzymes, namely reverse transcriptase (RT), integrase (IN) and protease (PR).

Retroelements have a common replication cycle, and the major feature distinguishing the retrotransposons from the retroviruses is that the former are not infectious. Replication begins when the host transcription machinery produces an element mRNA, which is translated into the proteins required for replication and also serves as the template for reverse transcription. In the cell cytoplasm, Gag forms virus or virus-like particles (VLPs) in which RNA copies of the element and RT are packaged. Reverse transcription, which is carried out by RT to produce cDNA copies of the retroelement, is initiated at the PBS close to the 5’ LTR using a host-encoded tRNA or a tRNA-derived primer. Minus-strand synthesis extends to the 5’ end of the mRNA to generate minus-strand strong-stop DNA (ssDNA). Because of sequence redundancy between the 5’ and 3’ LTRs, the ssDNA jumps to the 3’ end of the same or a different mRNA by pairing with complementary sequences. DNA synthesis then proceeds to the 5’ end of the mRNA, completing minus-strand synthesis. The RNase H activity of RT removes the RNA from the RNA/DNA hybrid, leaving the PPT, which serves as the plus-strand primer. When the plus-strand DNA reaches the end of the template, the plus-strand ssDNA undergoes a second jump by pairing with the sequences at the 5’ end of the cDNA. Completion of plus-strand synthesis generates a full-length double-stranded cDNA, which is the substrate for integration. The cDNA is carried into the nucleus by the integration complex, the major component of which is integrase. Integrase cuts the genome and inserts the cDNA into the host DNA.

The ability to catalyze cDNA integration makes retroelements a good tool for gene delivery (Miller, 1997). Retroelement-based gene delivery strategies typically apply a two-component vector system. One component is a transfer vector, which carries a foreign gene for delivery to the host genome. The transfer vector also has all the cis-regulatory sequences required for high-efficiency retroelement replication. The second component is a helper retroelement, which supplies the proteins required for replication. After integration into the host cell, the vector retroelement is not capable of additional rounds of replication without the helper retroelement, and so the gene of interest is maintained and expressed in the modified cell. When a cell expresses both helper and vector elements, functional replication intermediates are formed. For the retroviruses, the intermediates are virus particles that can be harvested and used to infect the tissue of choice to deliver the target gene. For the retrotransposons, the replication intermediates are VLPs that carry a copy of the vector retroelement mRNA. This mRNA is reverse transcribed into cDNA, which can then integrate or recombine with the host genome.

Two-component retroviral vector systems are widely used for gene delivery in human gene therapy due to their ability to infect non-dividing cells and maintain persistent long-term expression of the transgene (Li and Rossi, 2008). Two-component retrotransposon gene delivery systems have also been developed in budding and fission yeast (Bolton et al., 2005; Xu and Boeke, 1990). So far there are no reports describing the use of two-component retroelement delivery systems in higher plants.

LTR retrotransposons are particularly abundant in plant genomes compared with those of animals or lower eukaryotes (Kumar and Bennetzen, 1999; L'Homme et al., 2000; Vitte and Bennetzen, 2006). Although a number of retrotransposons have been identified in plants, only the tobacco Tnt1 and Tto1 elements and the rice Tos17 elements have been shown to be transpositionally active (Grandbastien et al., 1989; Hirochika, 1993; Hirochika et al., 1996). Of these three, Tnt1 is the best characterized. Mobility of Tnt1 is greatly increased by stresses such as protoplast isolation, cell culture, and pathogen attack (Grandbastien et al., 2005; Grandbastien et al., 1994). For example, newly transposed Tnt1 copies have been detected in nearly 25% of the tobacco plants regenerated from protoplast culture (Melayah et al., 2001). It was also shown that transposed Tnt1 elements prefer to integrate within or near gene coding sequences and that these integrations are typically tolerated by the host (Le et al., 2007). Furthermore, Tnt1 elements are highly transposition competent in various heterologous plant species, including Arabidopsis, Medicago truncatula and lettuce (Lactuca sativa) (Courtial et al., 2001; d'Erfurth et al., 2003; Lucas et al., 1995; Mazier et al., 2007). Because of these features, Tnt1 was chosen for use in developing a two-component retroelement vector system.

In this report, we describe a plant two-component retroelement system designed to replicate and deliver foreign genes of interest. Mini-Tnt1 elements were constructed by replacing portions of the Tnt1 open reading frames (ORFs) with the selectable marker neomycin phosphotransferase II (NPTII). Tnt1-encoded gene products required for transposition were provided in trans by endogenous Tnt1 elements whose expression was induced by protoplast isolation. We show that mini-Tnt1 elements lacking functional gene products can be effectively complemented in trans by the endogenous helper Tnt1s. Our results suggest that, like the vertebrate retroviruses, LTR retrotransposons can be engineered to be efficient vehicles for gene delivery.

RESULTS

The mini-Tnt1 two-component system

Tnt1 was PCR-amplified from tobacco genomic DNA, and oligonucleotide-directed mutagenesis was used to modify cloned PCR products so that they were identical to the published Tnt1 sequence (GenBank accession X13777). A mini-Tnt1 vector was constructed for testing a two-component retroelement DNA delivery system in plants (Fig. 1). The mini-Tnt1 vector consists of the complete 5’ and 3’ LTRs, which provide transcription initiation and termination sites, respectively. Sequences within the 5’ internal domain of Tnt1 that extend through the first 315 bp of gag were also included, as were non-coding internal domain sequences near the 3’ LTR. The internal domain sequences contain cis-acting elements (e.g. priming sites) required for reverse transcription (Bolton et al., 2005; Xu and Boeke, 1990). A modified version of the mini vector was generated in which the 5’ region of the 5’ LTR (U3 region) was replaced with the CaMV 35S promoter. The CaMV 35S promoter should enable constitutive mini-Tnt1 expression. The 35S promoter-driven mini-Tnt1 (35S-mini-Tnt1) can also be used in the assay for transposition mediated through reverse transcription (Fig. 1) (Lucas et al., 1995). More specifically, replication by reverse transcription will result in loss of the 35S sequences in progeny elements (see also below).

A neomycin phosphotransferase II (NPTII) gene with a nopaline synthase (NOS) promoter was cloned into both mini-Tnt1 vectors. Plant cells expressing NPTII are kanamycin resistant. The NPTII gene also contains an 85 bp artificial intron, which is in the same orientation as the mini-Tnt1 transcript. When mini-Tnt1 vectors are transferred into tobacco protoplasts, the endogenous Tnt1 elements, whose expression is induced by protoplast isolation (Pouteau et al., 1991), should provide in trans the gene products required for mini-Tnt1 retrotransposition. Two negative controls were made by deleting the PBS in the mini-Tnt1 vectors. Since transposition of yeast retrotransposon Ty1 was abolished in similar PBS mutants (Chapman et al., 1992), disruption of mini-Tnt1 transposition by these mutations was expected.

There are three ways to test for mini-Tnt1 retrotransposition (Fig. 1). First, after splicing of the mRNA and reverse transcription, the intron of NPTII should not be present in the newly transposed elements. This enables parental and progeny elements to be distinguished by PCR (Fig. 2B). Second, during transposition, a complete 5’ LTR of the 35S-mini-Tnt1 element will be regenerated because the 5’ end of the 3’ LTR is used as a template for the synthesis of the 5’ LTR. Reconstitution of the 5’ LTR in a newly transposed 35S-Tnt1 element in Arabidopsis has previously been demonstrated (Lucas et al., 1995). Third, if mini-Tnt1 elements undergo transposition, the flanking sequences of new insertions should contain the characteristic duplication at the integration site.
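The first two predictions can be traced with a small symbolic model. The sketch below is only an illustration under simplifying assumptions: the element is represented as an ordered list of named segments (the segment names are hypothetical labels, not sequence data), the transcript is taken to run from R of the 5’ LTR to R of the 3’ LTR, and reverse transcription is reduced to its net outcome, namely two complete LTRs rebuilt from the U3, R and U5 present in the mRNA.

```python
# Symbolic sketch of 35S-mini-Tnt1 replication. Segment names are
# hypothetical labels chosen for illustration only.

def transcribe(element):
    """mRNA spans from R of the 5' LTR to R of the 3' LTR (inclusive)."""
    start = element.index("R")
    end = len(element) - 1 - element[::-1].index("R")
    return element[start:end + 1]

def splice(mrna, intron="intron"):
    """Remove the artificial intron from the transcript."""
    return [segment for segment in mrna if segment != intron]

def reverse_transcribe(mrna):
    """Net result of the two strand transfers: both LTRs are rebuilt as
    U3-R-U5, using the U3 near the 3' end and the U5 near the 5' end."""
    if mrna[0] != "R" or mrna[-1] != "R":
        raise ValueError("expected an mRNA of the form R-U5-...-U3-R")
    u5, u3 = mrna[1], mrna[-2]
    internal = mrna[2:-2]
    ltr = [u3, "R", u5]
    return ltr + internal + ltr

# Parental 35S-mini-Tnt1: the U3 of the 5' LTR is replaced by the 35S promoter.
parental = ["35S", "R", "U5", "PBS", "gag_5prime", "NPTII_a",
            "intron", "NPTII_b", "PPT", "U3", "R", "U5"]

progeny = reverse_transcribe(splice(transcribe(parental)))
print(progeny)
```

In this model the progeny element carries neither the 35S segment nor the intron, and its two LTRs are identical, which corresponds to predictions (1) and (2). The third prediction, the target site duplication, arises at integration and is outside the scope of this sketch.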


Figure 1. Molecular reagents for the mini-Tnt1 two-component system. Tnt1 is an autonomous retrotransposon present in the tobacco genome that provides Gag and Pol for the mini-Tnt1 vector element. In Arabidopsis, Gag and Pol are expressed from the CaMV 35S promoter. Mini-Tnt1 is a non-autonomous element carrying an NPTII gene that is driven by the NOS promoter and contains an artificial intron. The mRNA of the mini-Tnt1 vector element is spliced and reverse transcribed by the helper Tnt1 Gag and Pol proteins. The replicated mini-Tnt1 lacks the intron (1), has a complete 5’ LTR (2) and has a target site duplication (3). LTR sequences are shown as solid triangles. The short thin red lines denote the locations of the PBS and polypurine tract (PPT). Open reading frames and promoters are indicated by the open boxes. The small red box denotes the artificial intron. Amino acid sequence domains conserved in retroviral proteins include PR, IN, RT and GAG.

Assay for mini-Tnt1 cDNA synthesis

Tobacco protoplasts (2 × 10⁶) were electroporated with the mini-Tnt1 constructs (pJR17, pYH175) and the negative controls with the PBS deletions (pYH176, pYH175) (Fig. 2A). Endogenous Tnt1 elements are expressed during protoplast isolation (Pouteau et al., 1991) and should provide the Gag and Pol proteins required for the replication of the mini-Tnt1 elements. After electroporation, the protoplasts were allowed to grow for two months, and kanamycin resistant calli were selected and scored (Table 1). Kanamycin resistance can arise either by integration of the mini-Tnt1 constructs into the genome or by expression, replication and integration of mini-Tnt1 cDNA. To distinguish between these possibilities, genomic DNA was isolated from kanamycin resistant calli and PCR-amplified with primers flanking the intron (primers 1 & 2, Fig. 2). If mini-Tnt1 cDNA is synthesized, loss of the intron will be detected as a PCR product 85 bp shorter than the fragment with the intron. For the 35S-mini-Tnt1 construct, about 10% of the calli had a PCR product of the size expected after splicing. For mini-Tnt1, about one third of the calli showed intron loss. Sequence analysis of the smaller PCR fragments indicated that the intron had indeed been spliced (data not shown), thereby demonstrating that mini-Tnt1 cDNA is synthesized.
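The scoring logic of this PCR screen can be written down compactly. The following is a minimal sketch, not the procedure used to score the gels: the only value taken from the text is the 85 bp intron length, whereas the amplicon size of the intron-containing product (600 bp here), the size tolerance, and the function name are hypothetical and chosen purely for illustration.

```python
# Sketch of scoring a callus from PCR band sizes obtained with primers
# flanking the NPTII intron. The 600 bp parental amplicon and the 10 bp
# tolerance are illustrative assumptions; only the 85 bp intron length
# comes from the text.

INTRON_BP = 85

def classify_callus(band_sizes_bp, intron_containing_bp,
                    intron_bp=INTRON_BP, tolerance_bp=10):
    """Return 'intron retention', 'intron loss', 'both' or 'no product'."""
    spliced_bp = intron_containing_bp - intron_bp
    has_parental = any(abs(b - intron_containing_bp) <= tolerance_bp
                       for b in band_sizes_bp)
    has_spliced = any(abs(b - spliced_bp) <= tolerance_bp
                      for b in band_sizes_bp)
    if has_parental and has_spliced:
        return "both"
    if has_spliced:
        return "intron loss"
    if has_parental:
        return "intron retention"
    return "no product"

# Hypothetical band sizes for three calli.
for bands in ([600], [515], [600, 515]):
    print(bands, "->", classify_callus(bands, intron_containing_bp=600))
```

A callus scored as "both" carries parental and progeny elements, whereas "intron loss" alone indicates that only the reverse-transcribed copy is detectable.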

Among the samples showing evidence of splicing, about half had two PCR bands. One band corresponds to the parental mini-Tnt1 elements, which integrated into the plant genome by non-homologous end-joining or by homologous recombination with endogenous Tnt1 elements. The other band corresponds to the progeny mini-Tnt1 elements, which transposed into the genome after reverse transcription. The other half of the samples showed only the smaller band, suggesting that mini-Tnt1 cDNA synthesis occurred very soon after transformation, even before the parental mini-Tnt1 construct integrated into the genome.

To confirm that reverse transcription of mini-Tnt1 was responsible for generating the intron-less elements, PCR screens for intron loss were performed with kanamycin resistant calli obtained with the PBS mutants. We anticipated that the PBS mutations would compromise transposition; however, to our surprise, intron loss was still detected with both mutant mini-Tnt1 constructs (Table 1). It is known that multiple copies of the retrotransposon mRNA can be packaged into VLPs (Feng et al., 2000; Haag et al., 2000). During protoplast isolation, the transcription of endogenous Tnt1 elements is greatly increased, and so it is possible that VLPs contain transcripts of both endogenous Tnt1 and the PBS mutants. If minus-strand strong-stop DNA (ssDNA) is generated from the endogenous Tnt1 mRNA, it could transfer to the mRNA of a mini-Tnt1 element during the first strand transfer that occurs during cDNA synthesis. Second-strand cDNA synthesis would then be primed from the PPT on the mini-Tnt1 mRNA. To test this possibility, a double mutant mini-Tnt1 was created that lacks both the PBS and the PPT. Kanamycin resistant calli derived from the double mutants were analyzed, and no evidence of intron loss was obtained. These experiments confirm that the observed intron loss is due to reverse transcription of the mini-Tnt1 elements in the transformed protoplasts.
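The reasoning behind the single and double mutants can be summarized as a small truth table. The sketch below encodes only the hypothesis stated above and is not a model of the underlying biochemistry: it assumes that minus-strand synthesis can start either from the element's own PBS or from strong-stop DNA made on a co-packaged endogenous Tnt1 mRNA, and that plus-strand synthesis requires the PPT on the mini-Tnt1 mRNA itself. The function and argument names are hypothetical.

```python
# Truth-table sketch of the co-packaging hypothesis for PBS and PPT mutants.
# Names are illustrative; the logic restates the hypothesis in the text.

def cdna_expected(has_pbs, has_ppt, copackaged_with_endogenous):
    """Predict whether spliced mini-Tnt1 cDNA (intron loss) can be formed."""
    # Minus strand: primed at the element's own PBS, or rescued by strong-stop
    # DNA transferred from an endogenous Tnt1 mRNA in the same VLP.
    minus_strand_ok = has_pbs or copackaged_with_endogenous
    # Plus strand: primed from the PPT carried by the mini-Tnt1 mRNA.
    plus_strand_ok = has_ppt
    return minus_strand_ok and plus_strand_ok

constructs = {
    "mini-Tnt1": (True, True),
    "PBS deletion": (False, True),
    "PBS and PPT deletion": (False, False),
}
for name, (pbs, ppt) in constructs.items():
    print(name, "->", cdna_expected(pbs, ppt, copackaged_with_endogenous=True))
```

Under these assumptions the PBS deletion is rescued only when an endogenous transcript is co-packaged, and the double mutant is never rescued, which matches the pattern of intron loss reported for the three classes of constructs.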

Mini-Tnt1 elements transpose in tobacco protoplasts

We next sought to ascertain whether the amplified PCR products without introns are derived from transposition events. To do this, we examined the flanking sequences of several mini-Tnt1 insertions in kanamycin resistant calli to see whether they contain the target site duplications characteristic of transposition. Ten kanamycin resistant calli were selected that showed no evidence of the parental mini-Tnt1 plasmid in the PCR assays. Among these, 4 were derived from experiments with the 35S-mini-Tnt1 and the remaining 6 were derived from experiments with the complete 5’ LTR mini-Tnt1. DNA prepared from the calli was digested with HaeIII, for which cut sites are present in the 5’ and 3’ regions of NPTII. There are no HaeIII sites in the mini-Tnt1 LTR. Digestion with HaeIII, therefore, releases both the 5’ and 3’ LTRs as well as part of the sequence of NPTII. After intramolecular ligation of the digested DNA, oligonucleotides located in each LTR and NPTII (and directed outwards) were used as primers in inverse polymerase chain reactions (IPCR) (Fig. 3A) (Silver and Keerikatte, 1989). Since one of the primers in each reaction is located within NPTII, IPCR will only amplify sequences flanking the 5’ or 3’ LTRs of mini-Tnt1s in the genome and not those of endogenous elements. A nested IPCR was carried out using two primer pairs to increase the specificity of the reaction. The PCR products were subsequently cloned and sequenced. The sequences flanking the 5’ and 3’ LTRs from the same sample were compared. Amplified mini-Tnt1 elements from 6 samples were flanked on both sides by 5 bp target site duplications (Fig. 3B). Four samples showed more than one target site duplication, suggesting the mini-Tnt1 transposed more than once (data not shown). Mini-Tnt1, therefore, creates a duplication of 5 bp upon integration into the tobacco genome, similar to what was observed for a full-length element (Grandbastien et al., 1989).
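The comparison of flanking sequences amounts to checking whether the bases immediately upstream of the 5’ LTR are repeated immediately downstream of the 3’ LTR. A minimal sketch of that check follows; the 5 bp length is taken from the text, while the function name and the example flanks are hypothetical and are not the sequences recovered in this study.

```python
# Sketch of a target site duplication (TSD) check on IPCR-derived flanks.
# The example sequences are invented for illustration.

def target_site_duplication(upstream_flank, downstream_flank, length=5):
    """Return the duplicated sequence if the last `length` bases before the
    5' LTR equal the first `length` bases after the 3' LTR, else None."""
    if len(upstream_flank) < length or len(downstream_flank) < length:
        return None
    left = upstream_flank[-length:].upper()
    right = downstream_flank[:length].upper()
    return left if left == right else None

# Hypothetical insertion: ...ttgacGATTC [mini-Tnt1] GATTCaagct...
print(target_site_duplication("ttgacGATTC", "GATTCaagct"))
```

A return value of None for a pair of flanks would argue against a simple transposition event at that site, for example if the two flanks came from different insertions in the same callus.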

Figure 2. Assay for mini-Tnt1 cDNA synthesis. A. Constructs used for tobacco protoplast transformation. B. Results of PCR screens for intron loss in kanamycin resistant calli derived from various mini-Tnt1s. Lanes 1 & 11, DNA markers; lane 2, negative control, namely the parental mini-Tnt1; lane 3, positive control, namely a plasmid carrying NPTII without an intron; lane 4, control without template; lanes 7, 13 & 15, intron loss; lanes 5, 6, 10, 12, 16, 17, 18 & 19, intron retention; lanes 8, 9, 14 & 20, both intron loss and retention. Primers used in the PCR assay are indicated by blue arrows. LTR sequences are shown as solid triangles. The short thin red lines denote the locations of the PBS and polypurine tract (PPT). Open reading frames and promoters are indicated by the open boxes. The small red box (AI) denotes the artificial intron. Italicized PBS and PPT denote the deleted PBS and PPT sites.

Table 1: Summary of PCR screens for intron loss.


There are over 100 copies of Tnt1 in the tobacco genome (Le et al., 2007). It is possible that the integrated elements described above were the result of homologous recombination between mini-Tnt1 cDNA and endogenous Tnt1 elements. To test this possibility, PCR was carried out with wild type tobacco genomic DNA using primers located upstream and downstream of the mini-Tnt1 insertion sites (Fig. 3B). The wild type insertion sites were rescued, and none showed evidence of pre-existing Tnt1 sequences (data not shown). This is consistent with the mini-Tnt1s having integrated into the tobacco genome by transposition rather than by recombination with resident elements. Each of the target site sequences was searched against the DNA sequence databases. Sequence similarity with an e-value below a cut-off of 1.00E-5 was considered significant. Four of the six insertions showed similarity to previously characterized DNA sequences (Table 2). Three were within coding sequences and one was in a retrotransposon.
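The significance filter applied to the database hits is a simple threshold on the e-value. The sketch below shows that filter only; the cut-off of 1.00E-5 comes from the text, whereas the record layout, the site labels and the e-values are invented placeholders and do not reproduce the entries of Table 2.

```python
# Sketch of the e-value filter used to call a database hit significant.
# The hit records are hypothetical placeholders, not the data in Table 2.

E_VALUE_CUTOFF = 1.00e-5

def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose e-value is strictly below the cut-off."""
    return [hit for hit in hits if hit["evalue"] < cutoff]

example_hits = [
    {"site": "insertion_A", "description": "hypothetical coding sequence", "evalue": 3e-20},
    {"site": "insertion_B", "description": "hypothetical weak match", "evalue": 0.02},
    {"site": "insertion_C", "description": "hypothetical retrotransposon", "evalue": 8e-7},
]

for hit in significant_hits(example_hits):
    print(hit["site"], hit["description"], hit["evalue"])
```

Because the comparison is strict, a hit with an e-value exactly equal to the cut-off would not be counted as significant in this sketch.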

The mRNA template used by LTR retrotransposons for reverse transcription begins at the junction between the U3 and R regions in the 5’ LTR and ends at the junction between R and U5 in the 3’ LTR. If mini-Tnt1 replicates using the same mechanism as retroviruses and other retrotransposons, then we expect newly integrated 35S-mini-Tnt1 elements to lack the 35S promoter and to have complete 5’ LTRs with U3, R and U5 (Fig. 1). Oligonucleotides complementary to sequences upstream of the 5’ LTR and the beginning of the NOS promoter were used in a PCR reaction with template DNA from kanamycin resistant calli. Four PCR products were obtained that were derived from independently transposed elements. DNA sequencing analysis revealed that in all four, the 5’ LTRs were reconstituted through reverse transcription. For three of these, the sequence was consistent with the use of the mini-Tnt1 U5 region as a template. The remaining 5’ LTR, however, was chimeric, and its sequence matched both the U3 region of endogenous Tnt1 elements and part of the mini-Tnt1 U3 region (Fig. 4). The 3’ LTR of the element with the chimeric 5’ LTR was rescued by PCR using primers located downstream of the 3’ LTR and within the 3’ end of NPTII. DNA sequencing revealed that the 3’ LTR was identical to the 5’ LTR, namely a chimera between the U3 region of the mini-Tnt1 5’ LTR and an endogenous Tnt1 element. It has been suggested that reverse transcriptase transfers from one RNA template to the other to generate full-length copies of the element (Sharma et al., 2008). The chimeric LTR is consistent with the use of both mini-Tnt1 and endogenous element mRNA templates.

Figure 3. Integration site analysis of mini-Tnt1s. A. Genomic DNA from calli with putative transposition events was digested with the restriction enzyme HaeIII. After intramolecular ligation of the HaeIII fragments, two pairs of primers (indicated by the horizontal arrows) were used in separate IPCR reactions that were specific for the 5’ or 3’ LTRs. Sequences flanking 6 mini-Tnt1 insertions (3 from construct pJR17, 3 from construct pYH175) are shown. Duplicated sequences at the insertion site are indicated by capital letters. B. Two pairs of primers were selected downstream and upstream of the insertion sites of the mini-Tnt1s. PCR was performed with genomic DNA from wild type tobacco to recover the empty mini-Tnt1 insertion sites.


Figure 4. Characterization of mini-Tnt1 reverse transcription products. A. Structure of the predicted reverse transcription product of the 35S-mini-Tnt1. The two pairs of primers were used to specifically amplify reverse transcribed copies of the 5’ LTR of 35S-mini-Tnt1. B. Sequence comparison of a reconstituted 5’ LTR of 35S-mini-Tnt1 and the D15 LTR, which represents a newly transposed endogenous Tnt1 (Melayah et al., 2001). Dashes indicate sequence identity and X’s indicate deletions. Only the nucleotides that differ from the sequence of the reconstituted 5’ LTR are shown. Blue sequences, BI repeats; green sequences, BII repeats; red sequences, target site duplication.

Table 2. BLAST search results for sequences at the mini-Tnt1 insertion sites.

cDNA is synthesized when the mini-Tnt1 two-component system is introduced into Arabidopsis

It is difficult to study retrotransposition of Tnt1 in tobacco because of the high copy number of Tnt1 and Tnt1-like elements in the genome (more than 100 copies) (Grandbastien et al., 2005). Arabidopsis is a well-studied model plant, and Tnt1 actively transposes in Arabidopsis tissue culture (Courtial et al., 2001). Because of this, we introduced into Arabidopsis protoplasts both a 35S-mini-Tnt1 and a helper element in which the Tnt1 gag and pol are driven by the 35S promoter. Kanamycin resistant calli were selected and allowed to grow for 3 months. Two independent experiments produced 296 calli. The PCR-based assay was used to screen for intron loss, and 9 of 296 calli (about 3%) showed evidence of spliced cDNA (Fig. 5). Further characterization of these events is in progress.

Figure 5. Assay for mini-Tnt1 cDNA synthesis in Arabidopsis thaliana. PCR results for intron loss from mini-Tnt1. Lane 1 is a DNA marker; lanes 2, 10 & 14 are calli with two types of mini-Tnt1 elements, namely those with and without introns; lanes 3, 4, 5, 6, 7, 8, 9, 11, 12, & 13 are calli with mini-Tnt1 elements that show no evidence of intron loss.

DISCUSSION

Mini-Tnt1 transposition during tobacco protoplast regeneration

We have demonstrated here that after mini-Tnt1 vectors are introduced into tobacco protoplasts, they can be complemented in trans by helper endogenous Tnt1 elements and undergo the three major steps of retrotransposition: transcription, reverse transcription of mRNA, and integration of cDNA into the genome. To the best of our knowledge, this is the first two-component retroelement delivery system that has been developed for higher plants. Two mini-Tnt1 vectors were evaluated in this study, both of which retrotransposed during protoplast regeneration. Based on a PCR screen for cDNA synthesis that assessed loss of an intron, transposed mini-Tnt1 elements were detected in one third of the calli generated from protoplasts transformed with the mini-Tnt1 vector that had a complete 5’ LTR. This level of transposition is comparable to the 25% transposition frequency reported for endogenous Tnt1 elements when plants were regenerated from protoplasts (Melayah et al., 2001). For the construct with the 35S promoter (35S-mini-Tnt1), transposition was observed in about 10% of the transformed protoplasts, or about 3-fold lower than for the mini-Tnt1 with the complete 5’ LTR. The U3 region of the 5’ LTR was replaced with the 35S promoter in 35S-mini-Tnt1. U3 contains several cis-acting elements that behave as transcriptional activators (Grandbastien et al., 2005). In fact, the native 5’ LTR showed higher transcriptional activity than the 35S promoter in a tobacco transient expression assay (Pouteau et al., 1991). Because transcription is one of the major regulatory steps in Tnt1 transposition, higher transcription may be one reason for the higher transposition frequency of the mini-Tnt1 element with the complete LTR.

Our results provide evidence that multiple copies of mRNA are packaged into the Tnt1 VLPs and that recombination occurs during Tnt1 reverse transcription. Incorporating two molecules of genomic RNA into a single viral particle is characteristic of retroviruses. It was observed that LTR retrotransposons in yeast also package and use more than one mRNA during reverse transcription (Feng et al., 2000; Haag et al., 2000). Here we show two lines of experimental evidence that Tnt1 packages more than one copy of mRNA into its VLP. First, cDNA was still synthesized from the mini-Tnt1 elements with PBS deletions. We believe this can be explained if mRNAs from both endogenous Tnt1 and the mutant mini-Tnt1 are packaged into the VLP. The mRNA of the mini-Tnt1 mutant may anneal to the minus ssDNA derived from the endogenous Tnt1. This would enable reverse transcription of the mini-Tnt1 to take place. This is supported by the observation that no cDNA was produced from constructs with both the PBS and PPT deleted, as the PPT from the mini-Tnt1 would be required to complete reverse transcription. Second, the 5’ LTR of 35S-mini-Tnt1 was repaired, but in one case, U3 had an extra BII repeat derived from U3 of the endogenous Tnt1. Additionally, U3 of the 3’ LTR of the same transposed element had exactly the same sequence as the 5’ LTR. This indirectly demonstrates that both Tnt1 and 35S-mini-Tnt1 mRNAs were packaged into one VLP and that template switching occurred between them during the minus-strand DNA synthesis of 35S-mini-Tnt1. Recombination and template switching have been frequently observed in retroviruses (Hu et al., 1997; Stuhlmann and Berg, 1992). High frequencies of deletion caused by homologous recombination were also revealed during retrotransposition of Ty1 in Saccharomyces cerevisiae (Xu and Boeke, 1987). Recently, sequence analysis suggested that template switching occurred during reverse transcription in Triticeae genomes, and this was invoked to explain how some complex LTR retrotransposon insertions arose (Sabot and Schulman, 2007). Our evidence supports the emerging picture that multiple mRNAs are used during reverse transcription of diverse LTR retroelements, and that recombination and template switching are common occurrences during replication.

Figure 6. Model for genetic recombination in a Tnt1 VLP during mini-Tnt1 reverse transcription. Transcripts of a mini-Tnt1 and an endogenous Tnt1 were packaged into the same VLP. In the copy choice model, genetic recombination occurs when the growing minus ssDNA switches from one RNA template to another during minus-strand DNA synthesis.

Applications of the mini-Tnt1 two-component system

Our experiments demonstrate that the mini-Tnt1 two-component retrotransposon vector system can be used to replicate and deliver gene sequences of interest. Such retroelement-derived vectors have a variety of applications in plant biology and biotechnology:

Gene tagging. Insertional mutagenesis is a powerful tool for forward genetic analysis in plants (Kumar and Hirochika, 2001). The most popular insertion elements are Agrobacterium T-DNA and class II transposable elements, such as Ac, En/Spm and Mutator of maize. However, T-DNA is not particularly useful for plants with large genomes, because it integrates randomly, and it is also not useful for plants that are difficult to transform with Agrobacterium. Class II elements move by a “cut and paste” mechanism, thereby creating unstable mutations (Wessler, 2006). Tnt1 can overcome all of the above limitations because it targets genes and moves by a “copy and paste” mechanism. In this regard, full-length Tnt1 elements have already been successfully used as insertional mutagens in Medicago truncatula and lettuce (Mazier et al., 2007; Tadege et al., 2005; Tadege et al., 2008). Mini-Tnt1 vector systems have several unique features that make them valuable tools for gene tagging and mutagenesis: 1) The distribution of mini-Tnt1 genomic insertion sites and the frequency of mini-Tnt1 transposition are both comparable to those of endogenous Tnt1; 2) The small size of mini-Tnt1s makes them useful for transferring DNA into plants; 3) After transformation, the helper element can be removed by backcrossing so that the insertions will no longer be active; 4) Half of the transposed mini-Tnt1 elements integrate into the genome shortly after transformation, even before the parental mini-Tnt1 copies. This simplifies analysis of mini-Tnt1 integration sites.

Delivery vectors for homologous recombination. Mini-Tnt1s can be used to produce cDNA that engages in recombination, including homologous recombination or gene targeting. The two most widely used methods for introducing DNA into plants are infection by Agrobacterium and high-velocity particle bombardment. In both methods, the input DNA remains in the plant cell for a very short period of time. Since homologous recombination is infrequent, the transient delivery of DNA cannot provide enough donor DNA for efficient recombination. LTR retrotransposon cDNA has been shown to engage in homologous recombination in Saccharomyces cerevisiae (Ke et al., 1999; Ke and Voytas, 1997). In this study, we show that mini-Tnt1 vector elements experience high levels of retrotransposition and that cDNA is synthesized during protoplast regeneration. If integration is blocked, mini-Tnt1s should only be able to enter the genome by homologous recombination, and therefore they can be used as a source of donor DNA for homologous recombination. We predict that using retrotransposons to deliver donor DNA will increase frequencies of homologous recombination.

Targeted integration. Some retrotransposons show high degrees of integration specificity, and the integration specificity of some retrotransposons can be changed by modifying their integrases to recognize DNA-bound protein partners (Brady et al., 2008; Gao et al., 2008; Zhu et al., 2003). Based on this work, it may be possible to develop a retrotransposon-based two-component element that can be used to deliver gene sequences to specific chromosomal sites. Such vectors could rely on the retroelement’s native target specificity, or the integrase could be engineered to recognize preferred target sites.

Understanding mechanisms of retrotransposition in plants. Although the regulation of Tnt1 transcription is well studied, very little is known about other aspects of plant retrotransposition, such as reverse transcription, integration and post-transcriptional regulation. Here we provide evidence that Tnt1 may use a mechanism similar to that of the retroviruses and other retrotransposons, packaging multiple copies of mRNA into VLPs that then engage in recombination during reverse transcription. The mini-Tnt1 two-component retrotransposon system should provide a valuable model for investigating other aspects of the replication and regulation of the widespread and abundant plant retroelements.

MATERIALS AND METHODS

DNA constructs

The Tnt1-94 element was originally cloned from genomic DNA isolated from Nicotiana tabacum cv. Xanthi. The complete element was assembled from four different PCR products. Sequence analysis revealed 0.5% sequence mismatches in comparison to the reference sequence (X13777). The mismatched nucleotides were repaired using overlapping primers, and the resulting clones were then used to assemble a full-length element, which was cloned into vector pDW860 (unpublished data) between the ClaI and ApaI sites to create the plasmid pIP57.

Mini-Tnt1 vector retroelements were designed to carry only the cis-elements necessary for reverse transcription. Two different versions of the vector retroelements were developed: pIP62 is under the control of the native LTR; pIP65 is under the control of the CaMV 35S promoter. The MluI-ApaI fragment containing 324 bp upstream of the 3’ LTR and the whole 3’ LTR was cloned into pDW860 to create pJR10. The vector pIP62 was created by three-fragment ligation. The three fragments were the ClaI-XhoI fragment containing the 5’ LTR and the downstream sequence through the first 315 bp of gag, the XhoI-MluI polylinker fragment created by annealing DVO2039 and DVO2040, and pJR10 digested with ClaI and MluI. The CaMV 35S promoter was joined to the 5’ LTR, replacing 233 nucleotides at the 5’ end of the LTR, using overlapping primers to create the chimeric promoter 35S-LTR. The ClaI-XbaI fragment containing the 35S-LTR was cloned into pIP62 in place of the fragment between the same sites to construct pIP65.

To provide a marker gene to follow transposition, an 85 bp artificial intron (5’-GTAAGTTTATCAGTTAAATATAATAAATAAAGAAGAAAACCAAAAAAATGGCTAACTAAAACGATGGTCTTATGATTTTATGCAG-3’) was inserted into NPTII, and the modified NPTII fragment was then fused to the nopaline synthase (NOS) promoter. Splice sites in NPTII were predicted with SplicePredictor (http://deepc2.psi.iastate.edu/cgi-bin/sp.cgi). The XhoI-NdeI fragment containing the NOS promoter and modified NPTII was inserted into pIP62 and pIP65 between the same sites to construct pYH175 and pJR17. The construct pJR17 was digested with BglII and PflMI, and the resulting fragment was treated with mung bean nuclease and religated with T4 DNA ligase to make the PBS deletion pYH176. The construct pYH175 was digested with XbaI and PflMI, and the resulting fragment was treated with mung bean nuclease and religated to make the PBS deletion pYH178. PCR with two pairs of primers was used to delete the PPT site. The NdeI-ApaI PCR fragment with the PPT deleted was ligated into the construct pYH176 in place of the original fragment to create pYH229. The same strategy was applied to pYH178 to make construct pYH233, which carries both PBS and PPT deletions.
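The intron-loss screen depends on the artificial intron behaving like a canonical intron: a reverse-transcribed (spliced) NPTII copy should give a PCR product 85 bp shorter than the unspliced donor. The following is a minimal Python sketch that checks the quoted intron sequence for its stated length and for GT...AG boundaries, and then illustrates the expected size shift. The exon fragments `EXON_5` and `EXON_3` and the function names are hypothetical placeholders for illustration only; they are not the actual NPTII sequence or the primers used in this study.

```python
# Artificial intron sequence as quoted in the text (5' to 3').
INTRON = (
    "GTAAGTTTATCAGTTAAATATAATAAATAAAGAAGAAAACCAAAAAAATGGCTA"
    "ACTAAAACGATGGTCTTATGATTTTATGCAG"
)


def has_canonical_boundaries(intron: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    return intron.startswith("GT") and intron.endswith("AG")


def splice(pre_mrna: str, intron: str) -> str:
    """Remove the intron, requiring that it occurs exactly once."""
    if pre_mrna.count(intron) != 1:
        raise ValueError("intron must occur exactly once in the transcript")
    return pre_mrna.replace(intron, "", 1)


# Hypothetical exon fragments flanking the insertion site (assumption).
EXON_5 = "ATGGATTGCACGCAG"
EXON_3 = "GTTCTCCGGCCGCTT"

unspliced = EXON_5 + INTRON + EXON_3   # donor construct (intron present)
spliced = splice(unspliced, INTRON)    # cDNA copy (intron lost)
size_shift = len(unspliced) - len(spliced)

print("intron length:", len(INTRON))
print("canonical GT...AG:", has_canonical_boundaries(INTRON))
print("expected PCR size shift (bp):", size_shift)
```

In a real screen the size shift would be read from amplicons spanning the intron insertion site; the sketch only shows why intron loss is a convenient signature of an RNA intermediate.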

Plant transformation and plant material

Transformed tobacco calli were obtained by culturing electroporated protoplasts isolated from leaf mesophyll of Nicotiana tabacum cv. Xanthi. The culture medium contained kanamycin (50 µg/ml), as described by Wright et al. (2005). Kanamycin-resistant (Kanr) calli were counted after 60 days. Genomic DNA was extracted from the calli using the Epicentre DNA extraction kit.

Arabidopsis calli were obtained by culturing PEG-transformed protoplasts as described by Wenck and Marton (1995), with modifications. The Arabidopsis thaliana ecotype Columbia was grown under short-day conditions at 24°C in germination medium. At 4 to 6 days after germination, the seedlings were collected and treated with cellulase (0.25% w/v) and macerozyme (0.05% w/v) in K3 medium (Nagy and Maliga, 1976) for 24 hours. Protoplasts were recovered by flotation in K3 medium and washed once. PEG transformation of Arabidopsis thaliana protoplasts was performed as described (Yoo et al., 2007). After transformation, the conditions for protoplast culture were the same as described by Wenck and Marton (1995). Selection was carried out as described (Damm et al., 1989). Kanr calli were counted after 90 days.

Inverse PCR. Two pairs of primers (DVO4707 and DVO4872 or DVO4944 and DVO4874) were used to amplify the sequences flanking the 5’ LTRs of new mini-Tnt1 insertions. Two other pairs of primers (DVO4877 and DVO4878 or DVO4783 and DVO4880) were used to amplify the sequences flanking the 3’ LTRs of new insertions. Genomic DNA (200 ng) was digested with HaeIII in a 20 µl reaction volume for 12 hours at 37°C. After heat inactivation of the restriction enzyme (20 min at 80°C), the DNA was self-ligated at 15°C overnight in 100 µl of ligase buffer containing 40 U ligase (NEB). The ligation products (5 µl) were used in subsequent PCR reactions. The first reaction was carried out in a 50 µl volume of 1X JumpStart Taq buffer, with 0.75 µl of each primer (20 µM), 8 µl of 2.5 mM dNTPs and 2.5 U JumpStart Taq polymerase (Sigma). The product of the first PCR reaction was diluted 100-fold, and 1 µl of the diluted product was used in the next round of PCR, which was set up as described above for the first reaction.
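The reaction set-up above can be converted to final concentrations with the usual C1·V1 = C2·V2 relation. The short Python sketch below does this arithmetic, assuming microliter volumes, a 20 µM primer stock, and a 2.5 mM dNTP stock (whether 2.5 mM refers to each nucleotide or to the total is not stated, so the result is simply reported in the same terms as the stock). The function and variable names are illustrative only.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for C2; result is in the units of stock_conc."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


REACTION_UL = 50.0  # PCR volume stated in the text

# 0.75 ul of a 20 uM primer stock in a 50 ul reaction.
primer_final_uM = final_concentration(20.0, 0.75, REACTION_UL)

# 8 ul of a 2.5 mM dNTP stock in a 50 ul reaction.
dntp_final_mM = final_concentration(2.5, 8.0, REACTION_UL)

# Second round: first-round product diluted 100-fold, then 1 ul is
# added to a 50 ul reaction, a further 50-fold dilution.
template_dilution = 100 * (REACTION_UL / 1.0)

print(f"primer: {primer_final_uM:.2f} uM each")
print(f"dNTPs: {dntp_final_mM:.2f} mM")
print(f"first-round product in second round: 1:{template_dilution:.0f}")
```

Under these assumptions each primer is at 0.3 µM, the dNTPs are at 0.4 mM, and the first-round product is diluted 5000-fold overall in the second-round reaction.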

Sequence analysis

Sequences flanking mini-Tnt1s were used as queries in BLASTN, BLASTX or TBLASTX searches against the non-redundant (nr) and EST databases at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) or against the Tobacco Gene Index (NtGI, http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=tobacco). Flanking sequences that returned high-scoring pairs with an E-value < 1.00E-5 were considered significant.
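The significance criterion can be applied programmatically when hits are available as text. The sketch below is one possible way to do so in Python, assuming the hits were exported in the standard 12-column BLAST tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The example records, identifiers, and function names are invented for illustration and are not data from this study.

```python
from typing import Iterable, List, NamedTuple

E_VALUE_CUTOFF = 1.00e-5  # threshold stated in the text (strictly less than)


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-separated BLAST hits, skipping blanks/comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1],
                        float(fields[10]), float(fields[11])))
    return hits


def significant(hits: Iterable[Hit],
                cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


# Invented example records (hypothetical flanking sequences and subjects).
EXAMPLE = """\
flank_01\tsubjA\t98.5\t200\t3\t0\t1\t200\t501\t700\t2e-95\t350
flank_01\tsubjB\t85.0\t40\t6\t0\t10\t49\t12\t51\t0.003\t38.2
flank_02\tsubjC\t91.0\t120\t10\t1\t5\t124\t88\t206\t1e-05\t60.1
"""

kept = significant(parse_tabular(EXAMPLE.splitlines()))
for hit in kept:
    print(hit.query, hit.subject, hit.evalue)
```

Note that a hit with an E-value exactly equal to the cutoff is excluded, because the criterion is a strict inequality.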

REFERENCES

Bolton, E.C., Coombes, C., Eby, Y., Cardell, M., and Boeke, J.D. (2005). Identification and characterization of critical cis-acting sequences within the yeast Ty1 retrotransposon. RNA 11, 308-322.

Brady, T.L., Schmidt, C.L., and Voytas, D.F. (2008). Targeting integration of the Saccharomyces Ty5 retrotransposon. Methods Mol Biol 435, 153-163.

Chapman, K.B., Bystrom, A.S., and Boeke, J.D. (1992). Initiator methionine tRNA is essential for Ty1 transposition in yeast. Proc Natl Acad Sci U S A 89, 3236-3240.

Courtial, B., Feuerbach, F., Eberhard, S., Rohmer, L., Chiapello, H., Camilleri, C., and Lucas, H. (2001). Tnt1 transposition events are induced by in vitro transformation of Arabidopsis thaliana, and transposed copies integrate into genes. Mol Genet Genomics 265, 32-42.

d'Erfurth, I., Cosson, V., Eschstruth, A., Lucas, H., Kondorosi, A., and Ratet, P. (2003). Efficient transposition of the Tnt1 tobacco retrotransposon in the model legume Medicago truncatula. Plant J 34, 95-106.

Damm, B., Schmidt, R., and Willmitzer, L. (1989). Efficient transformation of Arabidopsis thaliana using direct gene transfer to protoplasts. Mol Gen Genet 217, 6-12.

Feng, Y.X., Moore, S.P., Garfinkel, D.J., and Rein, A. (2000). The genomic RNA in Ty1 virus-like particles is dimeric. J Virol 74, 10819-10821.

Gao, X., Hou, Y., Ebina, H., Levin, H.L., and Voytas, D.F. (2008). Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res 18, 359-369.

Grandbastien, M.A., Audeon, C., Bonnivard, E., Casacuberta, J.M., Chalhoub, B., Costa, A.P., Le, Q.H., Melayah, D., Petit, M., Poncet, C., et al. (2005). Stress activation and genomic impact of Tnt1 retrotransposons in Solanaceae. Cytogenet Genome Res 110, 229-241.

Grandbastien, M.A., Audeon, C., Casacuberta, J.M., Grappin, P., Lucas, H., Moreau, C., and Pouteau, S. (1994). Functional analysis of the tobacco Tnt1 retrotransposon. Genetica 93, 181-189.

Grandbastien, M.A., Spielmann, A., and Caboche, M. (1989). Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337, 376-380.

Haag, A.L., Lin, J.H., and Levin, H.L. (2000). Evidence for the packaging of multiple copies of Tf1 mRNA into particles and the trans priming of reverse transcription. J Virol 74, 7164-7170.

Hirochika, H. (1993). Activation of tobacco retrotransposons during tissue culture. EMBO J 12, 2521-2528.

Hirochika, H., Otsuki, H., Yoshikawa, M., Otsuki, Y., Sugimoto, K., and Takeda, S. (1996). Autonomous transposition of the tobacco retrotransposon Tto1 in rice. Plant Cell 8, 725-734.

Hu, W.S., Bowman, E.H., Delviks, K.A., and Pathak, V.K. (1997). Homologous recombination occurs in a distinct retroviral subpopulation and exhibits high negative interference. J Virol 71, 6028-6036.

Ke, N., Gao, X., Keeney, J.B., Boeke, J.D., and Voytas, D.F. (1999). The yeast retrotransposon Ty5 uses the anticodon stem-loop of the initiator methionine tRNA as a primer for reverse transcription. RNA 5, 929-938.

Ke, N., and Voytas, D.F. (1997). High frequency cDNA recombination of the Saccharomyces retrotransposon Ty5: The LTR mediates formation of tandem elements. Genetics 147, 545-556.

Kumar, A., and Bennetzen, J.L. (1999). Plant retrotransposons. Annu Rev Genet 33, 479-532.

Kumar, A., and Hirochika, H. (2001). Applications of retrotransposons as genetic tools in plant biology. Trends Plant Sci 6, 127-134.

L'Homme, Y., Seguin, A., and Tremblay, F.M. (2000). Different classes of retrotransposons in coniferous spruce species. Genome 43, 1084-1089.

Le, Q.H., Melayah, D., Bonnivard, E., Petit, M., and Grandbastien, M.A. (2007). Distribution dynamics of the Tnt1 retrotransposon in tobacco. Mol Genet Genomics 278, 639-651.

Li, M., and Rossi, J.J. (2008). Lentiviral vector delivery of siRNA and shRNA encoding genes into cultured and primary hematopoietic cells. Methods Mol Biol 433, 287-299.

Lucas, H., Feuerbach, F., Kunert, K., Grandbastien, M.A., and Caboche, M. (1995). RNA-mediated transposition of the tobacco retrotransposon Tnt1 in Arabidopsis thaliana. EMBO J 14, 2364-2373.

Mazier, M., Botton, E., Flamain, F., Bouchet, J.P., Courtial, B., Chupeau, M.C., Chupeau, Y., Maisonneuve, B., and Lucas, H. (2007). Successful gene tagging in lettuce using the Tnt1 retrotransposon from tobacco. Plant Physiol 144, 18-31.

Melayah, D., Bonnivard, E., Chalhoub, B., Audeon, C., and Grandbastien, M.A. (2001). The mobility of the tobacco Tnt1 retrotransposon correlates with its transcriptional activation by fungal factors. Plant J 28, 159-168.

Pouteau, S., Huttner, E., Grandbastien, M.A., and Caboche, M. (1991). Specific expression of the tobacco Tnt1 retrotransposon in protoplasts. EMBO J 10, 1911-1918.

Sabot, F., and Schulman, A.H. (2007). Template switching can create complex LTR retrotransposon insertions in Triticeae genomes. BMC Genomics 8, 247.

Sharma, A., Schneider, K.L., and Presting, G.G. (2008). Sustained retrotransposition is mediated by nucleotide deletions and interelement recombinations. Proc Natl Acad Sci U S A 105, 15470-15474.

Silver, J., and Keerikatte, V. (1989). Novel use of polymerase chain reaction to amplify cellular DNA adjacent to an integrated provirus. J Virol 63, 1924-1928.

Stuhlmann, H., and Berg, P. (1992). Homologous recombination of copackaged retrovirus RNAs during reverse transcription. J Virol 66, 2378-2388.

Tadege, M., Ratet, P., and Mysore, K.S. (2005). Insertional mutagenesis: a Swiss Army knife for functional genomics of Medicago truncatula. Trends Plant Sci 10, 229-235.

Tadege, M., Wen, J., He, J., Tu, H., Kwak, Y., Eschstruth, A., Cayrel, A., Endre, G., Zhao, P.X., Chabaud, M., et al. (2008). Large-scale insertional mutagenesis using the Tnt1 retrotransposon in the model legume Medicago truncatula. Plant J 54, 335-347.

Vitte, C., and Bennetzen, J.L. (2006). Analysis of retrotransposon structural diversity uncovers properties and propensities in angiosperm genome evolution. Proc Natl Acad Sci U S A 103, 17638-17643.

Wessler, S.R. (2006). Transposable elements and the evolution of eukaryotic genomes. Proc Natl Acad Sci U S A 103, 17600-17601.

Xu, H., and Boeke, J.D. (1987). High-frequency deletion between homologous sequences during retrotransposition of Ty elements in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 84, 8553-8557.

Xu, H., and Boeke, J.D. (1990). Localization of sequences required in cis for yeast Ty1 element transposition near the long terminal repeats: analysis of mini-Ty1 elements. Mol Cell Biol 10, 2695-2702.

Yoo, S.D., Cho, Y.H., and Sheen, J. (2007). Arabidopsis mesophyll protoplasts: a versatile cell system for transient gene expression analysis. Nat Protoc 2, 1565-1572.

Zhu, Y., Dai, J., Fuerst, P.G., and Voytas, D.F. (2003). Controlling integration specificity of a yeast retrotransposon. Proc Natl Acad Sci U S A 100, 5891-5895.

CHAPTER 5. GENERAL CONCLUSIONS

The tethered-targeting model states that retroelement-encoded proteins recognize specific components of chromatin during integration and that this results in the accumulation of retroelements in distinct chromosomal regions (Bushman, 2003). The work presented in this thesis provides support for the tethered-targeting model, and is part of an emerging body of data suggesting that tethered targeting is the primary mechanism by which LTR retrotransposons and retroviruses identify integration sites. Targeted integration into heterochromatin is believed to benefit both retrotransposons and their hosts. It allows the retrotransposon to avoid negative selection that results from insertion into genes (Boeke and Devine, 1998). It may reduce harm to the host, since heterochromatin is typically gene-poor. Furthermore, it has been suggested that the accumulation of retrotransposons in certain chromosomal domains through targeted integration may play an important role in the formation of heterochromatin (Slotkin and Martienssen, 2007). The data in this dissertation support a dynamic interplay between retrotransposons and heterochromatin in which retrotransposons recognize specific heterochromatin markers at the time of integration and then help establish heterochromatin by triggering epigenetic modification through the RNAi pathway.

The C-terminus of integrase: a universal determinant of target site choice

Integrase (IN) is important for the replication of retrotransposons and serves as the molecular machine that cuts and joins DNA. Whereas the N-terminus of IN carries out the catalytic activity, the C-terminal domain is highly divergent among retroelements. A function for the C-terminus of IN was discovered for the Ty5 retrotransposons of Saccharomyces. The very C-terminus of Ty5 IN encodes a 6 amino acid targeting domain that interacts with the heterochromatin protein Sir4p. This interaction is required for Ty5 to specifically integrate into heterochromatin. The IN C-termini of the chromoviruses, a specific lineage of Ty3/gypsy elements, have chromodomains that are thought to function as targeting determinants like the targeting domain of Ty5. Here we show that the chromodomain of the fungal element, Maggy, interacts with H3 dimethyl- and trimethyl-K9 in vitro. Furthermore, when the Maggy chromodomain was fused to the IN of the Schizosaccharomyces pombe Tf1 retrotransposon, new Tf1 insertions were directed to chromosomal sites with methylated H3-K9. This indirectly demonstrated a targeting function for the Maggy chromodomain. One avenue for future work would be to study the function of the Maggy chromodomain in its native context (i.e. within the Maggy retrotransposon). Maggy is known to transpose in its host, the fungus Magnaporthe grisea. Further, the M. grisea genome has been sequenced, which would aid in the characterization of Maggy insertion site preference. This line of research would make it possible to better understand the role of IN chromodomains in transposition and targeting.

Plant centromere-specific retrotransposons (CRs) form a specific lineage of chromoviruses that are highly associated with centromeres. We described a highly conserved CR motif that is 11 amino acids in length and is located at the C-terminus of CR integrases from both monocots and dicots. Localization experiments demonstrated that the CR motif can direct fluorescent marker proteins to centromeric heterochromatin, suggesting the CR motif may contribute to centromeric target specificity. As with Maggy, it would be ideal to study the function of the CR motif within the context of a CR retrotransposon. To date, no transpositionally-competent CR elements have been described. The CR elements, however, are highly conserved at the nucleotide sequence level. It might be possible to assemble a CR element that matches the consensus sequence for recently transposed CR elements in the maize or rice genomes. Although challenging, the identification of a functional CR element would open up lines of research both to understand targeted integration and also to understand the role of CR elements in centromere function.

One way to overcome the lack of a CR element is to use a substitute retrotransposon to study targeting. For example, it might be possible to add the CR motif to other plant retrotransposons to direct them to centromeric heterochromatin. Although LTR retrotransposons have been identified in plants for many years, only a few are known to be transposition-competent. We developed a vector system based on the Tnt1 elements of tobacco to study targeting and other aspects of transposition, such as reverse transcription, integration and post-transcriptional regulation. With respect to reverse transcription, we provide evidence from characterizing our mini-Tnt1 vectors that Tnt1 packages multiple copies of mRNA into VLPs that then engage in recombination during reverse transcription. We are currently using the mini-Tnt1 system to address the targeting function of the CR motif. We have engineered Tnt1 IN to carry a CR motif, and we have shown that the motif localizes IN to centromeric heterochromatin. We are currently testing whether the Tnt1 insertions derived from the modified integrase are located at centromeres. The mini-Tnt1 two-component retrotransposon system should provide a valuable tool for investigating other aspects of the replication and regulation of the widespread and abundant plant retroelements.

Heterochromatin and transposable elements

We provide evidence in this dissertation that retrotransposon chromodomains can target integration to heterochromatin by recognizing epigenetic markers such as H3-K9 methylation. Abundant evidence has accumulated suggesting that transposable elements may also contribute to heterochromatin structure and function (Slotkin and Martienssen, 2007). The RNAi pathway keeps transposable element insertions silenced and is triggered by double-stranded RNA that is produced from the transcription of accumulated transposable elements. siRNAs are then generated and lead to the silencing of transposable elements by DNA and histone methylation. As shown by our data, the epigenetically marked transposable elements will become the targets of new insertion events. This self-reinforcing mechanism creates a genomic safe haven for the mobile elements and strengthens the expansion and persistence of domains of heterochromatin.

Consistent with the role of the integrase C-terminus in targeting Ty5 and Maggy retrotransposons, the centromere-specific retrotransposons (CRs) encode an integrase that recognizes specific components of centromeric heterochromatin and may be responsible for the enrichment of CRs in centromeric regions. Constitutive heterochromatin is typically found at centromeres and is essential for chromosome function and genome integrity. In most eukaryotes, the centromere contains various tandem repeats, ranging from several kilobases to several megabases in length. CRs are exclusive to the centromeres in a wide range of plant species and contribute to the repetitive nature of plant centromeres. Although direct evidence is lacking, recent data suggest that CRs may directly contribute to the evolution of the structure and function of the centromere:

First, CRs share high levels (80%) of DNA sequence conservation among various grass species that are estimated to have diverged >55 million years ago. In addition, CRs are irregularly interspersed with the centromeric satellite repeats, and both interact with CENH3, the most prominent epigenetic hallmark of centromeric chromatin. Secondly, it is reported that CR elements in rice (CRRs) are highly enriched in chromatin associated with H3-K9 dimethylation. Small RNAs corresponding to CRRs were detected, suggesting that CRRs may contribute to the RNAi-mediated pathway for formation and maintenance of centromeric heterochromatin (Neumann et al., 2007). Thirdly, transcripts of CRs in maize (CRMs) are tightly bound to CENH3, suggesting that single-stranded CRM RNA may have a targeting or stabilizing role for the centromere complex (Topp et al., 2004). Lastly, CRs may serve as a template to evolve centromeric and pericentromeric repeats through extensive deletion or other mutational forces. The 250 bp centromeric tandem repeats of wheat show high similarity to a wheat CR (cereba) (Cheng and Murata, 2003). All of the above data suggest a dynamic interplay between CRs and centromeres. As indicated above, a functional CR element may make it possible to more directly explore relationships between CRs and plant centromere function.

REFERENCES

Boeke, J.D., and Devine, S.E. (1998). Yeast retrotransposons: finding a nice quiet neighborhood. Cell 93, 1087-1089.

Bushman, F.D. (2003). Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell 115, 135-138.

Cheng, Z.J., and Murata, M. (2003). A centromeric tandem repeat family originating from a part of Ty3/gypsy-retroelement in wheat and its relatives. Genetics 164, 665-672.

Neumann, P., Yan, H., and Jiang, J. (2007). The centromeric retrotransposons of rice are transcribed and differentially processed by RNA interference. Genetics 176, 749-761.

Slotkin, R.K., and Martienssen, R. (2007). Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet 8, 272-285.

Topp, C.N., Zhong, C.X., and Dawe, R.K. (2004). Centromere-encoded RNAs are integral components of the maize kinetochore. Proc Natl Acad Sci U S A 101, 15986-15991.

ACKNOWLEDGEMENTS

I would like to thank my professor, Dr. Daniel Voytas, for his guidance and perseverance in teaching me to think creatively and for his mentorship in science, generosity and friendship. I gladly acknowledge many good friends for their contribution to my training and education as well as their good company, especially Ericka Havecker and Xiang Gao. I am very grateful to Ericka Havecker for her encouragement and mentorship as well as her enthusiasm for science, and to Xiang Gao for his pioneering work in initiating the project. I also thank David Wright for the many conversations about our favorite retrotransposon, Tnt1, and for his ability to approach problems with a critical and objective view. I would also like to thank lab members who have encouraged and helped me to perform this research, namely Junbiao Dai, Weiwu Xie, Troy Brandy, Jiquan Gao, Fengli Fu, Jeffery Townsend, Ronnie Winfrey and Clarice Schmidt. Finally and most importantly, I would like to thank my parents for their patience and unconditional support.