Strategies for re-targeting the non-LTR retrotransposon Alu

AN ABSTRACT

SUBMITTED ON THE FIFTH DAY OF SEPTEMBER OF 2017

TO THE DEPARTMENT OF CELL AND MOLECULAR BIOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

BY

Catherine M. Ade

APPROVED:

Astrid J. Engel, Ph.D.

David A. Mullin, Ph.D.

William C. Wimley, Ph.D.

ABSTRACT

Genetic engineering of biological molecules has provided the ability to create improved tools that have been applied in numerous fields, including research, agriculture, industrial biotechnology, and medicine. Mobile elements are the perfect platform for developing gene targeting systems in humans, as they are endogenous and capable of mobilizing genetic material. Here, I present data demonstrating the first successful genetic engineering of a human non-LTR element protein to promote site-specific insertions. I tested two distinct strategies to target Alu insertions to specific sites in the human genome. The first strategy consisted of fusing a site-specific DNA binding domain (DBD) to the L1 ORF2 protein to favor Alu insertions at the target site. Five distinct DNA binding domains targeting specific sequences were tested: the adeno-associated virus (AAV) Rep proteins, a TAL effector, the Cre recombinase, several zinc fingers, and the catalytically inactive dCas9. The second strategy utilized the CRISPR system, fusing a catalytically active Cas9 protein to an endonuclease-deficient L1 ORF2 protein, with targeting provided by a gRNA. Fusing a six-finger zinc finger, ZF4, to the N-terminus of the ORF2 protein enriched Alu insertions 47-fold within a 1.2 kb window around the target sequence. Recovered insertions showed a distinct bias, inserting upstream of the ZF4 target sequence when compared to recovered Alu insertions driven by unfused L1 ORF2. Other DBDs were unable to alter the targeting preference of the ORF2 protein. However, the data provided valuable information on the requirements for designing successful genetic engineering approaches. My findings demonstrate that multiple factors, including target site abundance, linker selection, terminus for fusion, size of the DNA binding domain, and overall complexity of the system, play an important role in designing fusion proteins with targeting capabilities. The data demonstrate that it is possible to redirect the insertion preference of human non-LTR retroelements.

Strategies for re-targeting the non-LTR retrotransposon Alu

A DISSERTATION

SUBMITTED ON THE FIFTH DAY OF SEPTEMBER OF 2017

TO THE DEPARTMENT OF CELL AND MOLECULAR BIOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

BY

Catherine M. Ade

APPROVED:

Astrid J. Engel, Ph.D.

David A. Mullin, Ph.D.

William C. Wimley, Ph.D.

© Copyright by Catherine M. Ade, 2017

All Rights Reserved

ACKNOWLEDGEMENTS

I need to first express my gratitude to Astrid Engel, for taking a chance on an awkward CMB student. You have been a great mentor in and outside the lab, giving me countless opportunities to grow and succeed. Without your guidance, wisdom, kindness, and patience, I would not be the person I am today.

To the former members of the lab, Dr. Brad Wagstaff and Rebecca Derbes: thank you for training and helping me get through the first few years of graduate school. Without your advice, experience, and humor, this experience would have been quite different. To the members of COMET, especially Dr. Maria Morales in the Deininger lab: thank you for reining in, dealing with, and sometimes participating in the crazy. All y’all made it an easy decision to join Astrid’s lab.

To everyone I consider family (biological relatives, Virginia band geeks, Ohio nerds, and New Orleans comrades): thank you. Thank you for all of your unwavering support, lending me your ears and minds. Without you all, this process would not have been as rewarding or entertaining.

Finally, extra scritches and cuddles to my fur babies Tazo, Vinnie, Violet, Blue, Frisco, and Boudreaux. You always know how to make me smile.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1. INTRODUCTION

1.1 Mobile elements in the human genome

1.2 LINE-1

1.2.1 LINE-1 Structure

1.2.2 LINE-1 transcription, translation, and expression

1.2.3 LINE-1 replication cycle

1.2.4 Non-Autonomous elements mobilized by LINE-1

1.3 Alu elements

1.3.1 Origin of Alu elements

1.3.2 Alu Mobilization

1.4 contribute to human diseases

1.4.1 Insertional mutagenesis

1.4.2 Non-allelic Homologous recombination (NAHR) events

1.4.3 Adverse effects from L1 protein expression

1.5 Retroelement activity and expression in cancer

1.6 Environmental influences on mobile element activity

1.7 Host suppression mechanisms

1.8 Assays to detect retrotransposition events

1.8.1 Alu and L1 retrotransposition assay

1.8.2 Alu and L1 rescue assay

1.8.3 Next-generation sequencing assays to detect new insertions

1.9 Naturally occurring elements exhibit site-specific insertions

1.10 DNA binding domains with targeting capabilities

1.10.1 Adeno-associated viral proteins (AAV)

1.10.2 Cre recombinase

1.10.3 TALENS

1.10.4 Zinc fingers

1.10.5 The CRISPR/Cas9 targeting system

1.11 Engineering mobile elements to promote site-specific insertions

CHAPTER 2. MATERIALS & METHODS

2.1 Constructs

2.1.A Alu Constructs

2.1.B Creating the fusion proteins: LZ-ORF2, TZ-ORF2, and Cre-ORF2 (Chapter 3)

2.1.C Creation of the fusion proteins: TAL-ORF2 (Chapter 4)

2.1.D Creation of the fusion proteins: ZF2.17-ORF2, ZF2.18-ORF2, and ZF2.1817-ORF2 (Chapter 5)

2.1.E Creation of the fusion proteins: N-ZF4-ORF2, N-ZF2-ORF2, C-ZF4-ORF2, and C-ZF2-ORF2 with GHL, FL4, and HL4 linkers (Chapter 6)

2.1.F Creation of the CRISPR/Cas9 fusion proteins and gRNAs (Chapter 7)

2.2. Retrotransposition, Alu Rescue assay and insert analysis

2.3. Creation of a HeLa-LoxP cell line and HeLa-EGFP cell line.

2.4 Western Blot Analysis

CHAPTER 3. The effect of the multimeric DNA binding proteins TZ, LZ, and Cre on redirecting ORF2p targeting capabilities

3.1. Introduction

3.2. Results

3.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins

3.2.2 Expression of the ORF2 fusion proteins

3.2.3 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu

3.3. Discussion

CHAPTER 4. The Transcription Activator-like Effector: The effect of monomeric DNA binding domains with a few genomic target sequences on targeting capabilities

4.1. Introduction

4.2. Results

4.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins

4.2.2 Expression of the TAL-ORF2 fusion protein

4.2.3 Evaluation of the capability of the TAL-ORF2-fusion proteins to redirect insertion preference of the Alu

4.3 Discussion

CHAPTER 5. Zinc finger proteins: The effect of monomeric DNA binding domains with a few genomic target sequences on targeting capabilities

5.1 Introduction

5.2 Results

5.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins

5.2.2 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu

5.3 Discussion

CHAPTER 6. Six-fingered zinc fingers that target multiple sites in the genome enrich for Alu insertions when fused to ORF2

6.1. Introduction

6.2. Results

6.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins

6.2.2 Expression of the ORF2 fusion proteins

6.2.3 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu

6.2.4 Insertion bias observed due to availability of endo site

6.3. Discussion

CHAPTER 7. Adapting the CRISPR system to the ORF2 protein in order to target Alu insertions

7.1 Introduction

7.2 Results

7.2.1 Evaluation of the Cas9 targeting capability of the Cas9-ORF2-fusion proteins

7.2.2 Evaluation of the retrotransposition capability of the functional Cas9-endonuclease defective ORF2-fusion proteins

7.2.3 Evaluation of the capability of the functional Cas9-endonuclease defective ORF2-fusion proteins to redirect insertion preference of Alu

7.2.4 Evaluation of the capability of the dCas9-ORF2 (defective Cas9-functional ORF2)-fusion proteins to redirect insertion preferences of Alu

7.2.5 Using an MS2-ORF2 fusion protein to increase the interaction with an MS2-gRNA-Cas9 complex and the efficiency of targeting Alu insertions

7.2.6 Re-designing the A-tail of the Alu Rescue Cassette to promote sequence homology between a specific genomic target and Alu insert

7.3 Discussion

CHAPTER 8. CONCLUSIONS

BIBLIOGRAPHY

APPENDIX

BIOGRAPHY

LIST OF TABLES

Table 1. Relative retrotransposition rates of Alu when driven by ORF2 or by the indicated fusion protein

Table 2. Relative retrotransposition rates of Alu when driven by ORF2 or by N-terminally fused TAL effector

Table 3. Two potential Alu targeting events when Alu is driven by TAL-ORF2

Table 4. Relative retrotransposition rates of Alu when driven by ORF2 or by N-terminally fused EGFP targeting zinc fingers

Table 5. Relative retrotransposition rates of Alu when driven by ORF2 or by the N-ZF4-ORF2 fusion protein

Table 6. Recovered Alu retrotransposition events driven by the ORF2 and the N-ZF4-ORF2 fusion construct

Table 7. Relative retrotransposition rates of Alu when driven by ORF2 or by N-terminal or C-terminal ZF4 fusion proteins with different linkers

Table 8. Frequency of Alu inserts driven by the N-ZF4-ORF2 and C-ZF4-ORF2 fusion proteins with different linkers recovered within 10, 4 and 1.2 kb from the target sequence

Table 9. Relative retrotransposition rates of Alu when driven by ORF2 or by the N-ZF2-ORF2 fusion protein

Table 10. Frequency of Alu inserts driven by the N-ZF2-ORF2 and C-ZF2-ORF2 fusion proteins with different linkers recovered within 10, 4 and 1.2 kb from the target sequence

Table 11. Relative retrotransposition rates of Alu when driven by ORF2 or by the functional Cas9-endonuclease defective ORF2-fusion proteins

Table 12. Recovered Alu insertions driven by the four Cas9 fusion proteins with the gRNA2 targeting chromosome 8:2587720-2587745

Table 13. Genomic frequency of the selected gRNA target sites to L1

Table 14. Relative retrotransposition rates of Alu when driven by ORF2 or by the dCas9-ORF2 fusion protein using distinct gRNAs targeting L1

Table 15. Recovered Alu insertions driven by the dCas9-ORF2 fusion proteins using four different gRNAs

Table 16. Relative retrotransposition rates of Alu when driven by ORF2 or by the MS2-ORF2 fusion proteins, co-transfected with dCas9 and a gRNA

Table 17. Recovered Alu insertions driven by dCas9 and the MS2-ORF2 fusion protein using four different MS2-tagged gRNAs

LIST OF FIGURES

Figure 1. Schematic representation of the class I elements found in the human genome

Figure 2. Structure of the LINE-1 (L1) element

Figure 3. L1 Replication Cycle

Figure 4. Target Primed Reverse Transcription

Figure 5. Structure of an Alu element

Figure 6. L1 construct used in the L1 retrotransposition assay

Figure 7. L1 and Alu constructs used in rescue retrotransposition assays

Figure 8. Examples of naturally occurring non-LTR mobile elements that exhibit site-specific insertions

Figure 9. Schematic of the Alu Constructs for Evaluation of Alu Retrotransposition and Recovery of Alu Insertion sites

Figure 10. Schematic of the ORF2 fusion constructs utilizing DNA binding domains that target sequences as multimers

Figure 11. Schematic of the pBud TAL-ORF2CH construct

Figure 12. Schematic of the ORF2 fusion constructs utilizing zinc fingers that target EGFP

Figure 13. Driver ORF2 fusion constructs utilizing DNA binding domains that target genomic sequences present multiple times in the human genome

Figure 14. Schematic of the CRISPR-ORF2 fusion constructs

Figure 15. Schematic of the Alu retrotransposition assay

Figure 16. The Rep proteins TZ and LZ form obligate dimers and tetramers, respectively, in order to bind their genomic target sequence

Figure 17. Schematic of two Cre recombinase molecules binding to a single LoxP target site in the genome

Figure 18. Expression analysis of LZ-ORF2, TZ-ORF2, and C-Cre-ORF2 fusion proteins

Figure 19. Histogram of the chromosomal distribution of genomic locations of the recovered Alu inserts driven by the TZ-ORF2 and LZ-ORF2 fusion proteins

Figure 20. Potential model of why TZ, LZ, and Cre ORF2 fusion proteins were unable to drive targeted Alu insertions

Figure 21. Schematic of TAL-ORF2 fusion protein

Figure 22. Expression analysis of the TAL-ORF2 fusion protein

Figure 23. Distribution of Alu insertions driven by the TAL-ORF2 fusion protein

Figure 24. Expression analysis of ORF2 and N-ZF4-ORF2

Figure 25. Insertional Distribution of Alu elements driven by ORF2 or the N-ZF4-ORF2 fusion construct

Figure 26. Density histogram of ORF2 endonuclease sites located within 2 kb of the ZF4 target sequence

Figure 27. Schematic of the insertion distribution of Alu driven by all of the individual N-ZF4-ORF2 and C-ZF4-ORF2 constructs

Figure 28. Schematic of the combined insertion distribution of Alu driven by all N-ZF4-ORF2 and C-ZF4-ORF2 constructs

Figure 29. Expression analysis of C-ZF4-ORF2, N-ZF2-ORF2, and C-ZF2-ORF2

Figure 30. Strategy for Cas9 directed TPRT of Alu inserts

Figure 31. Evaluation of function of the Cas9-ORF2 and nickase-ORF2 fusion proteins

Figure 32. Location of target sites of L1-directed gRNAs

Figure 33. The strategy for retargeting Alu insertions using the MS2-ORF2 fusion protein

Figure 34. Adapting the 3’ end of the Alu RNA for TPRT

Figure 35. Schematic detail of the recovered inserts driven by the nickase endo-- fusion protein using the pBS-Ya5rescue-L1 PAM plasmid

Figure 36. Model of proposed explanation for the lack of targeted Alu insertions using the MS2-ORF2 fusion protein CRISPR/Cas9 strategy

Chapter 1. Introduction

1.1 Mobile elements in the human genome

Sequencing the human genome demonstrated that almost fifty percent of the human genome is composed of repetitive DNA sequences, with the majority of these sequences originating from mobile elements (Lander et al., 2001). Most of these repetitive sequences can be classified into five categories based on sequence homology and derivation (Lander et al., 2001): (1) simple repetitive DNA (e.g. (A)n, (CA)n and (CGG)n), (2) tandemly repeated functional sequences (which can be found at telomeres and centromeres), (3) segmental duplications of DNA (blocks of DNA between 10 and 300 kb that were copied into a new genomic location), (4) mobile element-derived repeats, and (5) processed pseudogenes.

Although mobile element-derived repeats were first estimated to occupy 45% of the human genome (Lander et al., 2001), this percentage is likely an underestimate. Current literature suggests that transposable elements comprise as much as two-thirds of the human genome (de Koning et al., 2011). Over time, older, fixed mobile element insertions become harder to recognize due to the accumulation of mutations. Implementing better recognition programs has allowed for the identification of many previously unrecognized, highly mutated elements.

Mobile elements can be divided into two classes on the basis of their mode of transposition: Class I (retroelements or retrotransposons), which are subdivided into Long Terminal Repeat (LTR) retrotransposons and non-Long Terminal Repeat (non-LTR) retrotransposons, and Class II (DNA transposons) [reviewed in (Wessler, 2006)].

Class II DNA transposons mobilize into new genomic locations through a ‘cut and paste’ mechanism [reviewed in (Munoz-Lopez and Garcia-Perez, 2010)]. Each DNA transposon is flanked by terminal inverted repeats, which mark where the transposase cleaves the genomic DNA. In the human genome, there are at least seven different classes of transposons (Smit, 1996). Although Class II mobile elements contribute about three percent of the human genome, analyses show that these transposable elements have not been active for 50 million years.

Class I retroelements mobilize by using an RNA intermediate. These elements can be further classified into LTR retroelements and non-LTR retroelements based on the presence or absence of flanking long terminal repeats (LTRs) (Figure 1). LTR retrotransposons, also known as endogenous retroviruses (ERVs), constitute about eight percent of the human genome and share structural similarities with retroviruses (Dewannieux and Heidmann, 2013). These mobile elements are unique in that the machinery responsible for successful integration (the gag and pol proteins, specifically) is coded for in the RNA intermediate, yet the RNA polymerase start site is coded for in the 5’ LTR (Havecker et al., 2004). Reverse transcription of the LTR retroelement RNA intermediate into cDNA begins in the cytoplasm, primed by a tRNA in a virus-like particle (VLP), before the cDNA integrates into the genome (Havecker et al., 2004). There are no data supporting any current activity of ERV LTR retrotransposons in the human genome.

The non-LTR retrotransposons are the only currently active mobile elements in the human genome. Like the LTR retrotransposons, these elements mobilize through a ‘copy and paste’ mechanism using an RNA intermediate; however, the cDNA copy of the retrotransposon is synthesized at the insertion site. Non-LTR retrotransposons can be classified into two categories based on their ability to encode the retrotransposition machinery: autonomous and non-autonomous (Daniels and Deininger, 1985; Daniels and Deininger, 1991; Moran et al., 1996; Dewannieux et al., 2003; Hancks et al., 2011; Raiz et al., 2011).

Figure 1. Schematic representation of the class I elements found in the human genome.

Class I mobile elements mobilize through a ‘copy and paste’ mechanism and can be subdivided into LTR retroelements and non-LTR retroelements. LTR retroelements are flanked by long terminal repeats (LTR, white arrows) and are considered to be inactive in the human genome. The non-LTR retroelements are flanked by target site duplications (TSDs, black arrows). L1 is an autonomous element that contains two open reading frames, ORF1 and ORF2, which encode proteins needed for retrotransposition. The non-autonomous elements encompass Alu and SVA.

The main active autonomous retrotransposon is Long INterspersed Element 1 (LINE-1 or L1), which makes up approximately 17% of the human genome (Lander et al., 2001). There are two active non-autonomous retrotransposons: SVA and the Short INterspersed Element (SINE) Alu. Alu elements make up approximately 11% of the human genome, while SVA elements comprise less than 0.5%. These three non-LTR retroelements account for the majority of retrotransposition events that occurred in the human lineage [reviewed in (Deininger et al., 2003; Kazazian, Jr., 2004)]. Both L1 and Alu elements insert in a dispersed manner throughout the genome. However, L1 elements are enriched in AT-rich regions of the genome and are depleted in gene-rich portions. Conversely, Alu elements are enriched in gene-rich portions of the genome and are found less frequently in AT-rich regions (Lander et al., 2001). This observation is unexpected, as both Alu and L1 retroelements use the same insertion machinery. Evolutionary analyses indicate that post-insertion selection likely accounts for the observed difference.

A full-length L1 element is 6 kb in length, and there are over half a million L1 copies present in the human genome. However, most L1 insertions are 5’ truncated. The average human genome is estimated to contain between 3000 and 5000 full-length L1 copies (Lander et al., 2001). Due to the accumulation of inactivating mutations, only around 100 of these full-length elements remain retrotranspositionally competent. Moreover, the rate of retrotransposition varies among these 100 active elements. Only a few of these retrotranspositionally competent L1 elements are highly active; they are referred to as ‘hot’ L1s. These hot L1 elements are considered to contribute the bulk of new L1 insertions in the human genome (Brouha et al., 2003; Beck et al., 2010). Many of these highly active L1 elements are not fixed in the human genome; that is, they are polymorphic, and only a subset of the human population carries an L1 element at a particular locus. These polymorphic L1 elements can be traced back to germline insertions that occurred sometime after human speciation (Boissinot et al., 2000; Myers et al., 2002).

1.2 LINE-1 Elements

1.2.1 LINE-1 Structure

A full-length L1 element in the human genome is approximately 6 kb in length and contains four distinct regions (Figure 2). The 5’ untranslated region (UTR) contains the RNA polymerase II promoter, allowing for transcription of the L1 element. An intact, active L1 element codes for two distinct proteins (ORF1p and ORF2p), which are translated from a bicistronic L1 RNA. Both ORF1 and ORF2 proteins are required for successful L1 retrotransposition events. Finally, L1 elements conclude with a 3’ UTR, which contains a polyadenylation signal and poly-A tail (Dombroski et al., 1991; Alisch et al., 2006).

Figure 2. Structure of the LINE-1 (L1) element

A full-length L1 element is approximately 6000 bp long and is transcribed by RNA polymerase II (Pol II) from a sense promoter (SP). An antisense promoter (ASP), located in the 5’ UTR, has the potential to transcribe upstream sequences. L1 codes for two proteins, ORF1 and ORF2. ORF2 has endonuclease and reverse transcriptase functions (Endo and RT, respectively). A poly-A tail is added by the RNA Pol II machinery.

1.2.1.A LINE-1 5’ UTR

The 5’UTR of genomic L1 elements is GC-rich, approximately 900 bp long, and responsible for initiating transcription. The 5’UTR contains a CpG island, which is usually methylated, and binding sites for RUNX3, YY1 and SRY. The cellular factors recruited to these binding sites are proposed to aid in L1 transcription. For example, binding of the YY1 protein is thought to help with transcribing sequences with GC-rich promoters (Swergold, 1990; Minakami et al., 1992; Becker et al., 1993; Yang et al., 2003). Furthermore, it is hypothesized that the YY1 protein is responsible for correctly positioning the transcription initiation complex (Weis and Reinberg, 1997).

The first 100 bp contains the internal sense promoter (SP), which is required for transcription initiation (Swergold, 1990). However, the 5’UTR also contains a functional antisense promoter (ASP) that can initiate the transcription of genes upstream of particular L1 loci (Speek, 2001; Matlik et al., 2006). The antisense promoter has been implicated in driving the alternative transcription of upstream genes, which can result in chimeric transcripts that have the potential to be translated (Speek, 2001; Matlik et al., 2006). For example, published data show that L1 antisense promoter activity led to expression of c-MET, a proto-oncogene located upstream of a genomic L1 locus (Birchmeier et al., 2003).

The antisense promoter is thought to hinder L1 retrotransposition events by competing with the L1 sense promoter, limiting transcription and, subsequently, L1 retrotransposition (Matlik et al., 2006). Therefore, L1 antisense promoters may act as a cellular control mechanism to limit retrotransposition events of active L1 elements.

Transcriptional repression of L1 occurs through CpG island methylation (Hata and Sakaki, 1997; Woodcock et al., 1997). The high GC content present in the L1 promoter is speculated to be part of a host defense mechanism to mitigate the deleterious effect of mobile element activity. Consistent with this, L1 promoter hypomethylation has been reported in cancer and is associated with higher expression of L1 and of retrotransposons mobilized in trans by L1 (Florl et al., 1999; Santourlidis et al., 1999).

1.2.1.B LINE-1 ORF1 protein

The first protein encoded by the bicistronic L1 mRNA molecule is the 41 kDa Open Reading Frame 1 (ORF1) protein. The L1 ORF1 protein (in addition to the L1 ORF2 protein) is required for successful retrotransposition events of L1 (Moran et al., 1996). There are three distinct functional regions of the L1 ORF1 protein: the non-conserved N-terminus, which contains a coiled-coil region, and two highly conserved domains, the RNA recognition motif (RRM) and the C-terminal domain (CTD). The coiled-coil domain is required for ORF1 multimerization (Martin et al., 2000), while both conserved regions enable the L1 ORF1 protein to interact with L1 mRNA during retrotransposition events (Kolosha and Martin, 1995; Martin and Bushman, 2001; Kolosha and Martin, 2003; Martin et al., 2005). Monomers of ORF1p come together to form trimers that interact with the L1 RNA (Martin et al., 2003).

The function of the L1 ORF1 protein remains unclear. It is hypothesized that ORF1 proteins bind to nascent L1 mRNA molecules, protecting them from degradation once transcribed. In addition, in vitro studies demonstrate that the ORF1 protein has nucleic acid chaperone activity (Martin and Bushman, 2001). These two observations led to the proposal that the ORF1 protein facilitates nucleic acid strand transfer during L1 retrotransposition events (Martin and Bushman, 2001; Martin et al., 2005).

1.2.1.C LINE-1 ORF2 protein

The second protein coded for by the bicistronic L1 mRNA molecule is the 150 kDa ORF2 protein. This protein contains three domains: an N-terminal endonuclease (EN) domain, a reverse transcriptase (RT) domain, and a cysteine-rich (CYS) domain located near the C-terminus of the protein (Mathias et al., 1991; Feng et al., 1996; Moran et al., 1996; Clements and Singer, 1998; Cost and Boeke, 1998). The first functional domain in the ORF2 protein, the endonuclease domain, consists of approximately 239 amino acids (Feng et al., 1996; Weichenrieder et al., 2004). The ORF2 endonuclease is an apurinic/apyrimidinic (AP) endonuclease, similar to a family of endonucleases known to participate in DNA repair pathways (Barzilay et al., 1995). The structure of the ORF2 endonuclease closely resembles those of DNase I and the E. coli ExoIII protein, which nick target DNA sequences (Mol et al., 1995). Consequently, the AP endonuclease present in the L1 ORF2 protein acts as a target site nickase (Feng et al., 1996). The ORF2 endonuclease cleaves one strand to expose a T-rich 3’ DNA end that base pairs with the A-tail of the L1 RNA and serves as a priming site to initiate Target Primed Reverse Transcription (TPRT) (Luan et al., 1993). The second DNA strand is cleaved by a poorly understood mechanism; the cut may be made either by the ORF2 protein or by another endonuclease present in the cell. This cleavage occurs several base pairs away from the ORF2 endonuclease cleavage site to create a staggered cut, leading to the creation of a target site duplication (TSD) that flanks the newly retrotransposed insert (Feng et al., 1996; Jurka, 1997; Cost and Boeke, 1998).
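The way a staggered cut gives rise to a TSD can be illustrated with a short sketch. The Python function below is a toy model and not a description of the methods used in this work: it represents the target locus as a single top-strand string, and the function name and the parameters `first_cut` and `stagger` are hypothetical, introduced only for illustration. It assumes that the bases lying between the two cuts are copied on both sides of the insert once the single-stranded gaps are filled in.

```python
def insert_with_tsd(target: str, first_cut: int, stagger: int, insert: str) -> str:
    """Toy model of an insertion at a staggered cut (illustrative only).

    target    -- top-strand sequence of the target locus, 5' to 3'
    first_cut -- index in `target` where the first nick is placed (assumed)
    stagger   -- distance in bp between the first and second cuts (assumed)
    insert    -- sequence of the newly inserted element
    """
    if first_cut < 0 or stagger < 0 or first_cut + stagger > len(target):
        raise ValueError("cut positions fall outside the target sequence")

    left = target[:first_cut]
    # Bases between the two cuts: these become the target site duplication.
    tsd = target[first_cut:first_cut + stagger]
    right = target[first_cut + stagger:]

    # After gap repair, the same short target sequence flanks the insert.
    return left + tsd + insert + tsd + right


if __name__ == "__main__":
    # Hypothetical 16 bp locus; the insert is lowercase to make it visible.
    print(insert_with_tsd("CCGGTTAAAAGACTGG", 4, 8, "nnnn"))
```

In this simplified picture, the length of the TSD is set entirely by the distance between the two cuts, which is why the duplicated sequence flanking an insert marks where the target was cleaved.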

The central domain of the L1 ORF2 protein provides the reverse transcriptase activity and is referred to as the RT domain. The RT domain generates a cDNA copy of an RNA template, such as L1, Alu, SVA, or the mRNA of a gene, at the site of insertion (Moran et al., 1996; Dewannieux et al., 2003; Hancks et al., 2011). This is in part due to the indiscriminate priming characteristics exhibited by the ORF2 RT domain (Dhellin et al., 1997). The L1 mRNA has been shown to interact with both L1 proteins, ORF1 and ORF2, to form a ribonucleoprotein particle (RNP) (Kulpa and Moran, 2005; Kulpa and Moran, 2006; Taylor et al., 2013). The L1 RT shares both high sequence homology and sensitivity to nucleoside analog RT inhibitors with the HIV reverse transcriptase (Dai et al., 2011).

The C-terminal domain of the L1 ORF2 protein is known as the CYS domain, as it contains a highly conserved CCHC zinc-knuckle motif. Mutations in the CYS domain hinder retrotransposition events (Moran et al., 1996). Although its function is unknown, the CYS domain is hypothesized to aid in the binding of L1 mRNA to the ORF2 protein (Piskareva et al., 2013). Additionally, similar zinc-knuckle domains exist in retroviral RTs, where they mediate RNA binding to the retroviral reverse transcriptase (Ostertag and Kazazian Jr, 2001). Therefore, the L1 ORF2 CYS domain is speculated to be responsible for mediating and stabilizing interactions between the ORF2 protein and the mRNA at the new insertion site.

1.2.1.D LINE-1 3’ UTR

The 3’UTR of active, genomic L1 elements contains several conserved features. The 3’UTR contains a nuclear export factor (NXF) binding site (Lindtner et al., 2002). This binding site was suggested to be important for transporting the large (approximately 6 kb) nascent L1 mRNA into the cytoplasm in order for both L1 proteins to be translated (Lindtner et al., 2002). In addition, guanine-rich sequences in hominoid-specific L1 3’ UTRs are proposed to form G-quadruplex structures that stimulate retrotransposition (Sahakyan et al., 2017). The 3’ UTR ends with a stretch of adenines at the very 3’ end of the L1 element. These adenine residues are introduced by reverse transcribing the poly-A tail of the template L1 RNA when generating a new insert. Interestingly, synthetic L1 constructs lacking the 3’UTR, but containing the polyA tail, are fully functional and highly efficient (Wagstaff et al., 2011).

The final feature of the L1 3’UTR is a weak polyadenylation signal. This signal is not very effective: many L1 transcripts read through the L1 polyadenylation signal and terminate downstream at a stronger genomic termination signal. This results in the incorporation of flanking genomic sequence into the L1 RNA template used in retrotransposition events and is referred to as 3’ transduction. The transcribed flanking genomic sequences are mobilized with the L1 element into a new genomic location (Moran et al., 1999; Ejima and Yang, 2003; Helman et al., 2014; Pitkanen et al., 2014). These transduced genomic sequences are hypothesized to be an important evolutionary mechanism for exon shuffling (Cajuso et al., 2014; Pitkanen et al., 2014).

1.2.2 LINE-1 transcription, translation, and expression

1.2.2.A LINE-1 Transcription

Transcription of the bicistronic L1 transcript initiates at an internal promoter located in the 5’UTR of genomic L1 elements. L1 elements are transcribed by RNA polymerase II; full-length L1 mRNA molecules are approximately 6 kb long and contain the internal promoter sequence, a 5’ methylguanosine cap, and a polyA tail (Swergold, 1990). In addition, the GC-rich internal promoter of genomic L1 elements contains several transcription factor binding sites, including YY1 and RUNX (Swergold, 1990; Minakami et al., 1992; Becker et al., 1993; Yang et al., 2003). YY1 has been suggested to aid in directing transcription machinery to the transcription start site (Weis and Reinberg, 1997).

Multiple features of the L1 promoter can regulate transcription of the L1 mRNA. Regulation can occur through methylation of CpG islands in the GC-rich 5’UTR. In addition, transcription driven by the L1 antisense promoter could interfere with transcription of the sense strand that generates the L1 mRNA.


1.2.2.B LINE-1 mRNA splicing

A full-length L1 element contains multiple internal splice acceptor (SA) and splice donor (SD) sites, in both the sense and antisense orientation (Belancio et al., 2006). Data from Northern blot analyses showed that these internal SA and SD sites can generate alternatively spliced L1 mRNA products (Belancio et al., 2008). Furthermore, this study also showed a delay in L1 mRNA splicing: spliced products were not detected until more than 6 hours post-transfection. This delay was proposed as a defense mechanism employed by the cell against de novo L1 inserts. Splicing usually occurs in the nucleus and is generally a prerequisite for cytoplasmic localization (and subsequently translation). Because the L1 mRNA is likely to be recognized as improperly spliced in the cytoplasm, the authors hypothesized that a full-length (i.e., unspliced) L1 mRNA could be targeted for degradation, limiting the retrotransposition capabilities of L1 elements (Belancio et al., 2008).

1.2.2.C LINE-1 premature polyadenylation

Genomic L1 elements contain multiple internal AT-rich sites that function as canonical polyadenylation signals (pAs) (Belancio et al., 2008). Therefore, premature polyadenylation can occur, truncating the L1 RNA. Some of these internal pAs are stronger than the canonical 3’UTR pAs (Perepelitsa-Belancio and Deininger, 2003). As with alternative splicing, premature polyadenylation of the L1 RNA could also function as a cellular control mechanism to limit the effects of L1 on the human genome. This was demonstrated when L1 retrotransposition increased after the internal polyadenylation sites were mutated, abolishing these signals.

1.2.2.D LINE-1 Translation

Translation of the two L1 ORFs (ORF1 and ORF2) occurs from the same RNA molecule. Previous studies demonstrated that the L1 ORF1 protein is expressed at significantly higher levels than the L1 ORF2 protein when translated from the same mRNA molecule. This observation led to two distinct models of ORF2 protein translation to explain the observed expression differences between ORF1 and ORF2 (Goodier et al., 2004; Li et al., 2006; Alisch et al., 2006).

The first model of L1 translation proposes the presence of two internal ribosomal entry site (IRES) sequences: one for each L1 protein. Published literature indicates that the presence of an IRES between two ORFs is needed for efficient and effective translation of the second protein (Jang et al., 1988; Pelletier and Sonenberg, 1989). The first IRES sequence is located upstream of ORF1 and promotes translation of ORF1. The second IRES signal is located at the 3’ end of the ORF1 coding sequence and initiates translation of the ORF2 protein. The data supporting this model came from a mouse L1 and suggest that the expression differences may be attributed to different efficiencies of the two IRES signals (Li et al., 2006). However, these observations have not been replicated in human L1 studies due to the lack of sequence conservation between human L1 elements and the IRES sequences identified in rodents (Furano, 2000).


The second model proposes an inefficient re-initiation of ORF2 translation. In this model, ORF1 translation is initiated by the canonical cap-dependent translation machinery, and not by an IRES signal. The ribosome stops at the stop codon of ORF1, completing ORF1 translation. Next, the ribosome must re-initiate translation in order to express the ORF2 protein from the same L1 mRNA molecule. In this model, the observed expression discrepancy between the L1 proteins may be explained by the inefficient re-initiation of ORF2 translation by the ribosome, and is therefore not dependent on IRES signals (Alisch et al., 2006; Dmitriev et al., 2007).

1.2.3 LINE-1 replication cycle

An L1 retrotransposition event begins with the transcription of a genomic L1 element by RNA polymerase II. Like most RNA polymerase II transcripts, the full-length 6 kb transcript has a 5’ 7-methylguanylate cap, and is polyadenylated using either its own weak polyadenylation signal (included in the 3’ UTR) or a genomic polyadenylation signal (found in the 3’ flanking genomic region) (Moran et al., 1999; Ejima and Yang, 2003; Dmitriev et al., 2007). The nascent L1 mRNA molecule is exported from the nucleus to the cytoplasm. Once in the cytoplasm, the bicistronic mRNA is translated to generate both ORF1 and ORF2 proteins (Figure 3). These two proteins exhibit cis preference: ORF1p and ORF2p favor interacting with the RNA that generated them, forming the ribonucleoprotein (RNP) complex (Wei et al., 2001). The L1 RNP enters the nucleus through an unknown mechanism, where it proceeds to generate a new L1 copy through the Target Primed Reverse Transcription (TPRT) mechanism (Figure 4) (Luan et al., 1993).


Figure 3. L1 replication cycle.

(1) Genomic L1 elements are transcribed by RNA Pol II in the nucleus. (2) Nascent mRNA molecules are transported to the cytoplasm, where (3) translation of the ORF1 and ORF2 proteins occurs. (4) The newly translated ORF1 and ORF2 proteins bind to the same mRNA molecule, forming the L1 ribonucleoprotein (RNP) complex. (5) The RNP complex goes back into the nucleus, where (6) target primed reverse transcription (TPRT) occurs, integrating a new copy of the L1 element into the genome.


Figure 4. Target Primed Reverse Transcription.

The L1 ORF2 endonuclease site (5’-TT|AAAA-3’) is cleaved by the ORF2 endonuclease. This cleavage creates a T-rich flap, which anneals to the A-tail of the L1 or Alu retroelement, allowing for reverse transcription to occur. A second nick occurs in the complementary DNA strand, and second strand synthesis is performed through an unknown mechanism. This process results in a new genomic copy of a retroelement.


Once in the nucleus, the first step in the L1 integration process is ORF2 nicking the genomic DNA at a non-stringent canonical endonuclease cleavage site (5’-TT|AAAA-3’) (Feng et al., 1996). The nick allows access to a T-rich DNA strand that base pairs with the L1 poly-A tail. This nick provides the priming site to start reverse transcription of the L1 RNA template, generating cDNA at the site of insertion. The L1 insertion process is often interrupted due to either the low processivity of ORF2 or by interfering cellular factors. These characteristics lead to the 5’ truncation of the majority of new L1 insertions (Lander et al., 2001).
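As an illustration of the cleavage-site description above, the following minimal Python sketch scans one strand of a DNA sequence for exact matches to the consensus 5’-TT|AAAA-3’ site and reports where the nick would fall. It is a simplification: the text describes the site as non-stringent, so real insertions also occur at variant sites, and the function name, the exact-match rule, and the example sequence are illustrative assumptions rather than a method used in this work.

```python
import re
from typing import List

# Consensus L1 ORF2 endonuclease site from the text: 5'-TT|AAAA-3'.
CONSENSUS_SITE = "TTAAAA"
# The nick falls between the second T and the first A of the motif.
NICK_OFFSET = 2


def find_consensus_nick_sites(seq: str) -> List[int]:
    """Return 0-based nick positions for exact consensus matches on the given strand.

    Hypothetical helper for illustration only: it ignores the opposite strand
    and the degenerate (non-consensus) sites that ORF2 can also cleave.
    """
    # A zero-width lookahead reports overlapping matches as well.
    pattern = re.compile("(?=" + CONSENSUS_SITE + ")")
    return [m.start() + NICK_OFFSET for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    example = "GGCTTAAAAGCATTAAAATT"  # made-up sequence
    print(find_consensus_nick_sites(example))  # [5, 14]
```

Each returned index marks the first A of the AAAA stretch, so the bases immediately 5’ of that position on this strand are the TT of the motif.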

A second nick on the other DNA strand must also occur. However, this process is poorly understood, and the source of the second nick is currently unknown. Proposed models speculate that the second nick could result from either the L1 ORF2 endonuclease or Ercc1/XPF, the flap endonuclease from the nucleotide excision repair (NER) pathway (Gasior et al., 2008). Completion of the insertion generates target site duplications (TSDs), which average ~15-20 bases but can vary in size up to several thousand (Gilbert et al., 2002; Symer et al., 2002; Gilbert et al., 2005; Gasior et al., 2006a; Wagstaff et al., 2012; Servant et al., 2017).

1.2.2.E LINE-1 Localization

Each of the L1 components needs to be in specific cellular locations at very specific times in order to carry out successful retrotransposition events. L1 RNA is first transcribed from genomic DNA by RNA polymerase II in the nucleus. The RNA is translocated into the cytoplasm, where both ORF1 and ORF2 are translated into proteins. The L1 RNA is then brought back into the nucleus by both L1 proteins to the new insertion site. Then, the L1 ORF2 protein nicks the target DNA and reverse transcribes a cDNA copy of the L1 RNA. Therefore, during de novo retrotransposition events, the L1 components are likely localized to specific subcellular compartments.

However, several older reports on L1 protein localization differ from more recent studies. For example, one group utilized immunohistochemistry (IHC) studies, using antibodies against the C-terminus of both L1 proteins, to show that L1 mRNA, ORF1, and ORF2 co-localize to cytoplasmic stress granules (Goodier et al., 2007; Goodier et al., 2010), which was also confirmed by later studies (Harris et al., 2010; An et al., 2011). This IHC strategy relies on constructs fused to tags, which can alter both the stability and subcellular localization of the fusion protein (Goodier et al., 2004; Taylor et al., 2013; Sokolowski et al., 2013). An additional complication is that IHC strategies tend not to be as sensitive as other assays, as it is difficult to separate the positive signal from the non-specific binding of the antibody. Additionally, at the time these studies were performed, the only available antibodies for both L1 proteins were not specific and not widely commercially available (Sokolowski et al., 2014).

Western blot analysis of subfractionated cellular extracts shows that untagged L1 ORF1 protein localizes predominantly to the nucleus (Sokolowski et al., 2013). This study also showed that VP16-tagged L1 ORF1 protein localizes to both nuclear and cytoplasmic compartments in the cell. Additionally, our lab showed that both myc-tagged and untagged ORF1 localization is predominantly nuclear (Ade, in press), demonstrating that in this case, the addition of the myc tag does not alter ORF1 localization. This study also showed that nuclear localization was not altered when the ORF1 protein was expressed from either the full-length L1 construct or a plasmid containing the ORF1 sequence alone. However, further experiments must be performed and optimized in order to better determine the subcellular localization of L1 RNPs.

1.2.4 Non-Autonomous elements mobilized by LINE-1

The L1 machinery can be hijacked by other mobile elements or RNA molecules. The SINE Alu and the retrotransposon SVA are able to parasitize the L1 machinery in trans for insertion events. In contrast to L1, Alu elements are transcribed by RNA polymerase III and only require the L1 ORF2 protein to mobilize (Dewannieux et al., 2003). Additionally, this study showed that Alu retrotransposition increases when the L1 ORF2 protein is supplied alone in trans compared to when ORF2 is supplied by a full-length L1 construct. Although not required for retrotransposition, the addition of the ORF1 protein enhances Alu retrotransposition (Wallace et al., 2008a). The non-autonomous SVA retroelement also requires the L1 ORF2 protein for mobilization events. However, it remains unclear to what extent SVA relies on ORF1p for its mobilization (Hancks et al., 2011; Raiz et al., 2011). One study showed that a re-constructed SVA element, based on a copy found on chromosome 10, was dependent on ORF1p for retrotransposition, while a different re-constructed SVA element from chromosome 2 did not require ORF1p for mobilization (Hancks et al., 2011).


1.3 Alu elements

Alu elements are primate-specific Short Interspersed Elements (SINEs). Like all SINEs, Alu elements are non-autonomous and rely on the L1 machinery to drive retrotransposition events (Dewannieux et al., 2003). Alu elements are approximately 300 bp long, and with over one million copies, they comprise approximately 11% of the human genome (Lander et al., 2001). The vast majority of Alu element copies are both transcriptionally and retrotranspositionally incompetent due to mutations and epigenetic silencing (Shen et al., 1991). Still, retrotranspositionally competent Alu elements contribute the most to genetic diversity in the human population when compared to other active retroelements (Stewart et al., 2011; Witherspoon et al., 2013). Studies estimate that de novo Alu insertions occur in approximately 1 in every 20 live human births (Cordaux et al., 2006).
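The copy number, element length, and genome fraction quoted above can be checked against each other with simple arithmetic. The sketch below assumes a haploid human genome size of roughly 3.1 billion bp, a value that is not stated in the text, and treats every copy as a full 300 bp, which overestimates truncated copies; it is meant only as a rough consistency check.

```python
ALU_LENGTH_BP = 300        # approximate Alu length given in the text
GENOME_SIZE_BP = 3.1e9     # assumed haploid human genome size (not from the text)


def genome_fraction(copies: float, length_bp: float, genome_bp: float) -> float:
    """Fraction of the genome occupied by `copies` elements of `length_bp` each."""
    return copies * length_bp / genome_bp


if __name__ == "__main__":
    # "Over one million copies": one million is a lower bound, 1.1 million is
    # an illustrative higher value chosen for comparison.
    for copies in (1_000_000, 1_100_000):
        fraction = genome_fraction(copies, ALU_LENGTH_BP, GENOME_SIZE_BP)
        print(f"{copies:,} copies -> {fraction:.1%} of the genome")
```

Under these assumptions, one million copies account for just under 10% of the genome, so the quoted figure of approximately 11% is in line with a copy number somewhat above one million.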

1.3.1 Origin of Alu elements

Alu elements are derived from 7SL RNA, a component of the signal recognition particle (SRP), which is responsible for the co-translational targeting of membrane and secretory proteins to the endoplasmic reticulum (Walter and Blobel, 1982; Walter et al., 1982). Due to its importance, the 7SL RNA sequence is highly conserved across species. Alu elements are dimers, composed of two truncated 7SL monomers connected by an A-rich region. Alu elements contain four distinct regions: the left monomer, middle A-rich region, right monomer, and a poly-A tail (Figure 5). All of these components are required for effective Alu retrotransposition events.


Figure 5. Structure of an Alu element.

A full-length genomic Alu element consists of the left and right monomer, connected by the middle A-rich region. Alu elements are transcribed by RNA polymerase III (Pol III), which utilizes the A and B boxes located in the left monomer to assemble upstream of the element. Alu elements do not contain a terminator sequence; rather, transcription stops when the RNA polymerase III machinery encounters a T-rich region in the flanking genomic sequence.


The first structure in an intact Alu element is the bipartite RNA polymerase III promoter (Figure 5). This promoter contains the A and B boxes, which are responsible for directing the polymerase machinery to the correct upstream transcriptional start site [reviewed in (Weiner et al., 1986)]. Sequence analyses of the 5’ terminus of genomic Alu elements show that the A box is not as conserved as the B box when compared to the 7SL consensus sequence (Ullu and Weiner, 1985). However, only certain sequences within the B box are required for transcription initiation (Fuhrman et al., 1981). Therefore, it has been hypothesized that Alu elements have evolved to not require the function of the A box in their promoter region.

1.3.2 Alu Mobilization

1.3.2.A Alu Transcription

Alu elements are transcribed by RNA polymerase III. RNA polymerase III utilizes the A and B boxes found in the promoter region of the Alu element to direct the transcription machinery upstream (Duncan et al., 1981; Ullu and Weiner, 1985; Paulson and Schmid, 1986; Batzer and Deininger, 2002). However, the exact function of the A box in Alu transcription has been debated, as the A box sequences in retro-competent Alu elements have diverged significantly from the 7SL sequence (Ullu and Weiner, 1985; Weiner et al., 1986). Alu elements do not contain a transcription termination signal; instead, transcription proceeds through the encoded A-tail until an RNA polymerase III termination signal (usually 4 Ts) is reached in the flanking genomic sequence (Fuhrman et al., 1981; Deininger et al., 1981; Weiner et al., 1986). Therefore, due to the accumulation of mutations in individual Alu elements, a variable polymorphic A-tail, and locus-specific downstream sequences, each Alu RNA molecule is unique (Ade et al., 2013).
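The read-through into flanking sequence described above can be sketched in a few lines of Python. The function below takes a genomic locus and the position where the Alu A-tail ends, then returns the flanking bases that would be included in the transcript before the first run of four Ts. The function name, the fixed TTTT terminator, and the example loci are illustrative assumptions; real Pol III termination is not always this strict.

```python
from typing import Optional


def downstream_flank_in_transcript(
    locus: str, alu_end: int, terminator: str = "TTTT"
) -> Optional[str]:
    """Return the flanking sequence transcribed between the end of the Alu
    A-tail (0-based index `alu_end`) and the first terminator run.

    Returns None when no terminator is found downstream. Hypothetical helper
    for illustration only.
    """
    locus = locus.upper()
    hit = locus.find(terminator, alu_end)
    if hit == -1:
        return None
    return locus[alu_end:hit]


if __name__ == "__main__":
    # Two made-up loci sharing the same Alu 3' end but different flanks.
    alu_3prime = "GGCC" + "A" * 10
    locus_a = alu_3prime + "GCATCG" + "TTTT" + "ACGT"
    locus_b = alu_3prime + "CC" + "TTTT" + "ACGT"
    print(downstream_flank_in_transcript(locus_a, len(alu_3prime)))
    print(downstream_flank_in_transcript(locus_b, len(alu_3prime)))
```

Because the two example loci differ only in their flanks, the differing return values illustrate why transcripts from otherwise identical Alu copies still end in locus-specific sequence.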

Alu elements are GC-rich and contain multiple CpGs. These CpGs are subjected to methylation and other epigenetic modifications, significantly affecting the chromatin status of that genomic location (Slagel and Deininger, 1989; Liu and Schmid, 1993; Englander et al., 1993; Liu et al., 1994; Vorce et al., 1994; Englander and Howard, 1995). These modifications likely account for the low transcriptional activity of Alu elements. Due to their abundance, Alu methylation accounts for approximately 25% of all genome methylation events (Xie et al., 2009).

1.3.2.B Alu RNP

Alu RNA molecules bind to the signal recognition particle proteins SRP9 and SRP14 (Chang et al., 1996). SRP9 and SRP14 facilitate bringing 7SL RNA to ribosomes (Terzi et al., 2004). It is hypothesized that these two proteins are responsible for localizing Alu RNA to ribosomes (Hsu et al., 1995), increasing the possibility of Alu hijacking L1 proteins as they are being made (Boeke, 1997; Roy-Engel et al., 2002; Dewannieux et al., 2003; Ade et al., 2013). Mutations altering the ability of the Alu RNA to interact with SRP9 and SRP14 reduce or abolish retrotransposition competency (Sarrowa et al., 1997; Bennett et al., 2008). Studies suggest that Alu RNA-ribosome interactions (Alu RNP) might begin in the nucleolus, resulting in Alu RNA being exported to the cytoplasm (Jacobson and Pederson, 1998). PolyA binding protein (PABP), which binds to the A-tail of SINE RNA molecules (West et al., 2002; Muddashetty et al., 2002), is proposed to interact with and further favor Alu RNA localization to ribosomes. PABP is known to interact with eIF4G, a protein in the cap-binding complex of RNA polymerase II derived transcripts. PABP, while bound to Alu RNA, might interact with the eIF4G at the 5’ cap of L1 mRNA while it is being translated. This interaction places the Alu RNA at an ideal location to steal the L1 ORF2 protein immediately following translation, which would allow for Alu retrotransposition events (Gingras et al., 1999).

1.3.2.C Alu Retrotransposition

The L1 ORF2 protein is required for Alu retrotransposition (Dewannieux et al., 2003). Just like L1, Alu elements integrate through the process of target primed reverse transcription (TPRT). Similarly, the A-tail of the Alu element is proposed to base pair with the Ts exposed in the cleaved DNA (Figure 4). The length of the Alu A-tail is critical for successful retrotransposition events, priming reverse transcription events at the new site of integration. The mean length of the Alu A-tail in younger subfamilies is approximately 30 bp, while the older, inactive elements have shorter A-tails (Roy-Engel et al., 2002). This study also showed that disease-causing de novo Alu elements had average A-tail lengths of over 50 adenine residues. Tissue culture studies demonstrated the importance of the length of the Alu A-tail (Dewannieux and Heidmann, 2005). Additionally, studies demonstrated that the length of the A-tail expands due to ORF2 reverse transcriptase slippage on the Alu RNA (Wagstaff et al., 2012). Once inserted into the genome, these A-tails shorten and mutate, decreasing the retrotransposition capabilities of these elements (Roy-Engel et al., 2001; Roy-Engel et al., 2002). Once inserted, Alu elements are also flanked by target site duplications, a hallmark of bona fide retrotransposition events.

1.3.2.D Not all Alu Elements are Capable of Retrotransposition

Retrotransposition begins with transcription of an Alu element by RNA polymerase III. Although many Alu elements contain intact internal promoters, RNA pol III transcribed Alu RNA appears to be scarce in primate cells (Paulson and Schmid, 1986; Matera et al., 1990; Shaikh et al., 1997; Oler et al., 2012). In addition to the presence of an internal promoter, it has been demonstrated that specific sequence features of Alu elements can influence retrotransposition efficiency, such as right monomer composition. Random mutations in the right monomer affected retrotransposition rates sporadically: some mutations did not alter the retrotransposition efficiency, while others decreased retrotransposition rates. Therefore, the exact effect of a particular mutation cannot be predicted. Additionally, the length and composition of the A-tail greatly influence the retrotransposition capability of a genomic Alu element (Comeaux et al., 2009). In this study, retrotransposition was most effective when the A-tail was at least 20 adenines long, contained no disruptions, and had few (if any) bases between the A-tail and the pol III termination signal.
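The three A-tail criteria summarized above can be expressed as a small screening function. The Python sketch below measures the uninterrupted run of adenines at the start of the sequence following the right monomer and the distance from the end of that run to the first TTTT. The 20-adenine threshold comes from the text; the cutoff of 10 bases for "few" intervening bases, the function names, and the example sequences are assumptions made purely for illustration, not values from the cited study.

```python
import re
from typing import Dict, Optional


def a_tail_features(three_prime: str, terminator: str = "TTTT") -> Dict[str, Optional[int]]:
    """Measure the A-tail of the sequence that follows the Alu right monomer.

    `a_run` is the length of the uninterrupted run of A's at the start of the
    sequence, so a disrupted tail shows up as a shorter run. `gap_to_terminator`
    is the number of bases between that run and the first terminator, or None
    when no terminator is found. Hypothetical helper for illustration only.
    """
    seq = three_prime.upper()
    a_run = re.match(r"A*", seq).end()
    term = seq.find(terminator, a_run)
    gap = None if term == -1 else term - a_run
    return {"a_run": a_run, "gap_to_terminator": gap}


def meets_a_tail_criteria(three_prime: str, min_a: int = 20, max_gap: int = 10) -> bool:
    """Apply the criteria from the text; `max_gap` is an assumed cutoff."""
    features = a_tail_features(three_prime)
    gap = features["gap_to_terminator"]
    return features["a_run"] >= min_a and gap is not None and gap <= max_gap


if __name__ == "__main__":
    intact = "A" * 25 + "GC" + "TTTT"
    disrupted = "A" * 10 + "G" + "A" * 15 + "TTTT"
    print(meets_a_tail_criteria(intact), meets_a_tail_criteria(disrupted))
```

A screen like this captures only the A-tail features; as the preceding paragraph notes, the effect of right monomer mutations cannot be predicted in the same way.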


Very few studies have provided reliable information on bona fide Alu transcripts [reviewed in (Roy-Engel, 2012)]. One complication of detecting and analyzing these RNA Pol III derived Alu transcripts is the presence of Alu RNA in RNA Pol II transcripts; Alu sequences have been found in both the introns and 3’ UTRs of mRNA molecules. Importantly, RNA Pol II transcribed Alu RNA molecules are unable to retrotranspose (Yulug et al., 1995; Kroutter et al., 2009). RNA polymerase III occupancy has been used as a proxy for potential transcription activity. A few successful studies have used genome-wide ChIP-seq to identify and analyze RNA polymerase III driven transcripts (Oler et al., 2010; Canella et al., 2010; Moqtaderi et al., 2010; Oler et al., 2012). Analysis of ChIP-identified Alu loci demonstrated that most genomic Alu elements with bound transcription machinery lacked the sequence characteristics needed for retrotransposition (Comeaux et al., 2009; Oler et al., 2012). Therefore, the ability of a particular Alu element to bind transcription machinery (i.e., transcription potential) is not a good indicator of its retrotransposition capability.

1.4 Retrotransposons contribute to human diseases

There are three main mechanisms by which retrotransposons contribute to human disease: insertional mutagenesis, non-allelic homologous recombination (NAHR), and expression of proteins that may contribute to genetic damage (Batzer and Deininger, 2002; Gasior et al., 2006b; Hedges and Deininger, 2007; Cordaux and Batzer, 2009; Hancks and Kazazian, Jr., 2012; Kaer and Speek, 2013; Hancks and Kazazian, Jr., 2016). Most of the genetic damage caused by mobile elements is irreversible. Moreover, humans do not possess a mechanism to specifically remove a mobile element insertion once integration occurs. Therefore, these mobile elements are a constant threat to genomic integrity.

1.4.1 Insertional mutagenesis

Insertional mutagenesis is the first mechanism by which mobile elements can cause diseases. Retrotransposon-based insertional mutagenesis is estimated to contribute to around 0.3% of genetic diseases (Deininger and Batzer, 1999; Callinan and Batzer, 2006). De novo insertion events can disrupt gene function, as they can occur within introns or exons of genes, potentially disrupting splicing patterns and introducing foreign promoters and regulatory sequences into new genomic regions (Kaer and Speek, 2013). As of 2017, there are over one hundred examples of germline diseases that have been caused by mobile element insertions into a gene (Kaer and Speek, 2013; Hancks and Kazazian, Jr., 2016). Interestingly, the vast majority are due to de novo Alu insertions. A wide diversity of diseases has been reported, including (but not limited to) immune deficiencies (Apoil et al., ; Brouha et al., 2002), vision deficiencies (Schwahn et al., 1998), degenerative muscle conditions (Narita et al., 1993; Holmes et al., 1994; Awano et al., 2010; Solyom et al., 2012), and blood clotting disorders such as hemophilia (Kazazian et al., 1988; Li et al., 2001; Ganguly et al., 2003).


1.4.2 Non-allelic Homologous recombination (NAHR) events

Due to their abundance, genomic Alu and L1 elements can promote unequal homologous recombination events, termed non-allelic homologous recombination (NAHR) (Deininger and Batzer, 1999; Han et al., 2008; Startek et al., 2015). These recombination events can lead to segmental duplications, deletions, and chromosomal rearrangements (Batzer and Deininger, 2002). Alu, being more abundant, is more commonly observed contributing to NAHR events. Alu/Alu NAHR events contribute to several diseases, such as acute myelogenous leukemia, Ewing sarcoma, Tay-Sachs disease, and Lesch-Nyhan Syndrome (Deininger and Batzer, 1999; Batzer and Deininger, 2002). Alu elements are found dispersed through genes at varying densities. Data from published literature indicate that specific genes containing a high density of Alus are particularly susceptible to this form of genetic instability (Waldman and Liskay, 1988; Labuda and Striker, 1989; Batzer et al., 1990; Batzer and Deininger, 2002). For example, the VHL, MLL1, MLH1, MSH2, BRCA1, and BRCA2 genes are recurrently affected by Alu-mediated NAHR (Konkel and Batzer, 2010).

1.4.3 Adverse effects from L1 protein expression

Expression of L1 proteins is reported to contribute to genetic damage. Studies show that L1 ORF2 expression causes double strand breaks (DSB), damaging the genome (Gasior et al., 2006b). Furthermore, the expression of either full-length L1 or ORF2 alone has been shown to induce a senescence-like phenotype in human fibroblasts and adult stem cells (Belancio et al., 2010). Expression of either ORF2 or full-length L1 in MCF7 and HeLa cells led to a decrease in cell viability (Wallace et al., 2008b). Cell viability was partially restored when mutations to either the reverse transcriptase or endonuclease domain were made. However, mutating both domains eliminated the negative effects of the ORF2 protein on cell viability. Additionally, it has been demonstrated that truncated ORF2 proteins can have adverse effects on the genome both by creating double strand breaks and mobilizing Alu elements (Kines et al., 2014). Therefore, ORF2 proteins need not be completely functional to have adverse effects on cell viability.

1.4.4 Retroelement activity and expression in cancer

Several cancers have been identified that arose when a mobile element inserted into the genome [recently reviewed in (Kaer and Speek, 2013; Burns, 2017)]. Two types of heritable cancers arose when either an Alu or an L1 element inserted into the DNA repair genes MLH1 and MSH2 [reviewed in (Burns, 2017) and (Kloor et al., 2004)]. Additionally, an L1 insertion into the RB1 gene caused a familial case of retinoblastoma (Rodriguez-Martin et al., 2016). An Alu insertion into the germline also led to the development of sporadic ovarian cancer (Rowe et al., 1995). Studies in breast cancer tumor samples showed that Alu elements inserted into both the BRCA1 and BRCA2 genes in the germline, resulting in the aberrant splicing of BRCA2, which led to breast cancer (Miki et al., 1996; Teugels et al., 2005). Mobile element insertions into the Neurofibromin 1 (NF1) gene have caused 18 separate cases of Neurofibromatosis type I (Wimmer et al., 2011). Mobile element insertions were found in both the introns and exons of NF1, with three insertion sites having more than one mobile element landing in that locus in separate patients. This particular disease analysis shows that some genomic locations might be more susceptible to mobile element insertions that lead to disease.

Multiple cases of somatic insertions have been associated with several cancers. For example, a somatic L1 insertion into the PTEN gene is associated with the development of endometrial carcinoma (Helman et al., 2014). The insertion of an Alu element into the Mlvi-2 locus resulted in a B-cell lymphoma (Economou-Pachnis and Tsichlis, 1985). In another instance, an L1 element inserted into the APC gene of a somatic cell, leading to the development of colon cancer (Miki et al., 1992). However, it is difficult to discern whether mobile element insertions discovered in cancers are driver mutations (contributing to the cancer’s progression) or passenger mutations, simply representing a consequence of the cancer’s progression.

Recently, L1 proteins have been used as a hallmark for certain cancers, acting as a marker for cancer severity and prognosis (Leibold et al., 1990; Rodic et al., 2014; Ardeljan et al., 2017). In over 90% of pancreatic, breast, and ovarian cancers, the L1 ORF1 protein can be detected by IHC (Rodic et al., 2014). Additionally, ORF1 protein can be detected in approximately 50% of GI tract, lung, and prostate cancers (Rodic et al., 2014). However, L1 ORF1p expression can vary by patient, representing the different epigenetic landscapes of individual tumors (Rodic et al., 2015; Burns, 2017). Moreover, DNA methylation studies in lung cancer samples have compared samples that have de novo somatic L1 retrotransposition events with those that do not, demonstrating the importance of the DNA landscape to L1 activity (Iskow et al., 2010). One recent study identified an increased number of L1 insertions and presence of the L1 ORF1 protein in Barrett’s esophagus when compared to matched, normal tissue (Doucet-O'Hare et al., 2015).

1.5 Environmental influences on mobile element activity

Mobile element expression and retrotransposition can be influenced by exposure to environmental factors. Exposures that alter methylation, either directly or indirectly, could influence L1 expression levels (Bourc'his and Bestor, 2004). For example, one study demonstrated that treatment with the carcinogen benzo[a]pyrene (BaP) changed the methylation status of the L1 promoter and increased L1 expression (Stribinskis and Ramos, 2006; Teneng et al., 2011). The authors proposed that BaP inhibits the assembly of the methylation machinery DNMT1 and DNMT3A, which leads to hypomethylation of the L1 promoter (Weisenberger and Romano, 1999). The authors also indicated that exposure to this compound promoted an enrichment of open chromatin at the histone level at the L1 promoter (Teneng et al., 2011). Several groups have investigated how heavy metals and other carcinogens affect L1 promoter methylation status. Phthalate, arsenic, cadmium, benzene, hydrogen peroxide, lead, and etoposide were reported to decrease L1 promoter methylation (Hagan et al., 2003; Bollati et al., 2007; Pilsner et al., 2009; Intarasunanont et al., 2012; Hossain et al., 2012; Kloypan et al., 2015; Huen et al., 2016). Although more research needs to be performed in order to determine the exact consequences of L1 promoter hypomethylation, these studies demonstrate that exposure to environmental agents can have an impact on L1 epigenetics.

Several in vitro studies demonstrated that exposure to UV, gamma, and ionizing radiation increases L1 retrotransposition when compared to control cells (Servomaa and Rytomaa, 1988; Servomaa and Rytomaa, 1990; Farkash et al., 2006; Tanaka et al., 2012; Kloypan et al., 2015; Luzhna et al., 2015).

Additionally, L1 retrotransposition increased in tissue culture experiments when exposed to the drugs morphine, cocaine, methamphetamine, and cigarette smoke extracts (Miglino et al., 2012; Okudaira et al., 2014; Moszczynska et al., 2015; Okudaira et al., 2016).

Compounds have different effects on mobile element stimulation and activity, making it difficult to accurately predict the effects that exposure to heavy metals, carcinogens, and other agents has on L1 activity. For example, an in vivo study showed that exposure to methamphetamine increased L1 retrotransposition events in rat brains (Moszczynska et al., 2015). A separate study showed that exposure to light at night decreased melatonin production and stimulated L1 retrotransposition events in the tumors of nude male mice (deHaro et al., 2014). Other studies show that some compounds have no effect on L1 retrotransposition. For example, the chemotherapeutic agents cisplatin (a DNA crosslinking agent), calicheamicin (a DNA cleaving agent), and camptothecin (a topoisomerase inhibitor) do not increase L1 retrotransposition in culture (Terasaki et al., 2013).


These studies demonstrate that the cellular responses triggered by the exposure to different compounds are complex. This limits the ability to dissect all of the mechanistic pathways responsible for the effects of the various compounds on L1 activity. Furthermore, it would not be surprising if individual compounds have different mechanisms affecting L1 activity. These mechanisms could vary depending on the cell lines, tumor, or unique epigenetic landscape of each patient sample utilized in these experiments. In order to enact protective measures against L1 mobilization and activity, we need to first understand the mechanisms behind these changes in L1 mobilization.

1.6 Host suppression mechanisms

Retroelements are considered to be parasitic, endogenous mutagens to a genome. Although the activity of these mobile elements can lead to apoptosis or cellular senescence (Goodier et al., 2004; Gasior et al., 2006b), there is no known mechanism to specifically remove a mobile element once it has inserted into the genome. Therefore, cells have developed preventive mechanisms for limiting the activity of these elements to better preserve genomic integrity.

A host has many different ways of suppressing retroelement activity. The first major cellular control mechanism is at the expression level, including transcriptional silencing. Promoter methylation significantly contributes to preventing expression of L1 and Alu. In mouse germline cells, the DNA methyltransferase 3-like gene (Dnmt3L), MILI, and MIWI2 proteins are responsible for keeping de novo mobile element insertion events methylated (Bourc'his and Bestor, 2004; Weisenberger et al., 2005; Aravin et al., 2008). Dnmt3L knockout mice show that losing this gene prevents de novo methylation of L1 elements after embryonic reprogramming, leading to their reactivation (Bourc'his and Bestor, 2004). Alternate regulation strategies in mammals include the Drosha microprocessor machinery, which has been shown to hinder Alu retrotransposition in cell culture assays (Heras et al., 2013). Additionally, studies demonstrated that MOV10, a protein in the RNA-induced silencing complex (RISC), inhibited both L1 and Alu retrotransposition using similar tissue-culture methods (Goodier et al., 2012). The APOBEC3 family of proteins has been shown to downregulate L1 and Alu retrotransposition in culture. However, the exact mechanism behind this repression remains unclear. Different APOBEC proteins show different effects on Alu and L1 activity. APOBEC3A, APOBEC3B, and APOBEC3C limit both L1 and Alu retrotransposition (Bogerd et al., 2006; Muckenfuss et al., 2006; Stenglein and Harris, 2006). The APOBEC3G protein is unique because it limits Alu retrotransposition, but not L1 retrotransposition (Hulme et al., 2007). APOBECs are thought to inhibit these elements by sequestering the RNA molecules, preventing mobilization (Chiu et al., 2006; Hulme et al., 2007). Surprisingly, knocking out the APOBEC3 protein in mice did not lead to an increase in retrotransposition events from L1 or L1-driven retroelements (Mikl et al., 2005). Therefore, the exact role of APOBEC3 proteins in inhibiting mobile element activity needs to be further elucidated.


1.7 Assays to detect retrotransposition events

Due to the importance of retrotransposons in expanding and diversifying the human genome, researchers have been interested in understanding the mechanism and regulation of these mobile elements. Therefore, the development of assays to study, monitor, and evaluate mobile element insertions in tissue culture systems was crucial to the field (Dewannieux et al., 2003; Raiz et al., 2011). There are currently three distinct methods to evaluate mobile element insertions: the retrotransposition assay, rescue assay, and next-generation sequencing assays. Each method is designed to measure specific characteristics of mobile element insertions.

1.7.1 Alu and L1 retrotransposition assay

The first assay developed was the L1 retrotransposition assay, which evaluates L1 activity in tissue culture (Freeman et al., 1994; Moran et al., 1996). This assay relies on a reporter cassette that contains an inverted reporter gene interrupted by an intron in the sense orientation (Figure 6). During a retrotransposition event, the intron will be spliced out due to the position of the intron relative to the L1 promoter. Because the reporter cassette is in the opposite orientation of the L1 construct, only successful integration of the L1 construct into the genome will result in expression of the reporter cassette. Currently there are L1 constructs with reporter cassettes for neomycin resistance, blasticidin resistance, fluorescence, and firefly luciferase (Moran et al., 1996; Ostertag et al., 2000; Goodier et al., 2007; Xie et al., 2010).


Figure 6. L1 construct used in the L1 retrotransposition assay

L1 retrotransposition can be measured by using an L1 retrotransposition reporter construct. Both L1 proteins, L1 ORF1 and L1 ORF2, are expressed from an RNA Pol II promoter. Constructs contain a reporter cassette and promoter in the opposite orientation of the L1 sequence. The reporter cassette possesses an intron, which is in the sense orientation relative to the L1 sequence. Upon transcription, the intron located in the reporter gene will be spliced out. The reporter gene will be expressed when it is integrated into the genome with the intron spliced out.


Unlike the L1 assay, the Alu retrotransposition assay has only one resistance cassette, neomycin (Dewannieux et al., 2003). This is due to the unique requirements of an Alu element that is transcribed by RNA polymerase III. For example, a self-splicing intron had to be used in the Alu reporter cassette, as Pol III transcripts do not go through the same processing (splicing) as RNA polymerase II transcripts do (Esnault et al., 2002). Additionally, Pol III transcripts rely on genomic T-rich sequences to act as their terminator. Therefore, Alu retrotransposition cassettes must be devoid of T-rich sequences to minimize truncated transcripts. Finally, the available Alu construct will not work in cell lines that already have neomycin resistance.

1.7.2 Alu and L1 rescue assay

The retrotransposition assays described above are great tools to analyze how well a particular L1 or Alu element jumps in tissue culture. However, these assays do not allow for identification and sequence analysis of the insertion site. To meet this need, both the L1 and the Alu rescue assays were developed to analyze de novo inserts in a tissue culture system (Ostertag and Kazazian, Jr., 2001; Gilbert et al., 2002; Symer et al., 2002; Gilbert et al., 2005; Wagstaff et al., 2012). Briefly, the rescue cassettes are designed so that de novo insertions can function as a plasmid once transformed into bacterial cells. The cassettes used in these experiments contain a bacterial promoter to confer resistance in bacterial cells, in addition to an origin of replication (Stalker et al., 1982; Shafferman and Helinski, 1983) (Figure 7). These two additional features allow for easy sequencing of the genomic DNA flanking the Alu or L1 insertion. In the assay, either the L1 or Alu rescue vector is transfected into retrotranspositionally competent cells, along with a driver of choice (in the case of the Alu rescue vector) (Symer et al., 2002; Wagstaff et al., 2012). After colonies form, the genomic DNA is extracted and digested, and the DNA fragments are ligated together to form circles. The presence of an origin of replication and the bacterial promoter driving resistance allows for the DNA to replicate in the bacterial cells under kanamycin selection. Growth of kanamycin resistant bacterial colonies will yield plasmids containing the inserts and flanking genomic sequences that can be analyzed. The recovered insertions can be sequenced to determine the location of the insertion, A-tail composition, and target site duplications (TSDs).


Figure 7. L1 and Alu constructs used in rescue retrotransposition assays

Both the L1 and Alu rescue constructs contain a mobile element in the sense orientation and a reporter cassette in the opposite orientation. Upon transcription, the intron located in the reporter cassette will be spliced out. These constructs also contain a bacterial origin of replication (Ori) and bacterial promoter so that circular DNA containing the inserts can function as a plasmid in bacterial cells.


1.7.3 Next-generation sequencing assays to detect new insertions

There are multiple high-throughput methods that have been implemented to study mobile element insertions in the human genome. Due to their repetitive nature, these elements must be analyzed by specifically designed pipelines that collate mobile DNA instead of removing these sequences. Various labs have developed methods for analyzing repetitive DNA that rely on enriching for these insertions during the library preparations of the samples used for next generation sequencing.

One approach uses a hemi-specific nested PCR strategy, termed L1-Seq, where a human-specific L1 primer is used to extend and enrich for newer L1 insertions in the genome (Ewing and Kazazian, Jr., 2010). This approach identified over 1100 genomic L1 insertions in 25 individuals, with a high level of specificity and sensitivity. One drawback of this method is that the primers used are specific to L1, and share sequence homology with over 1000 loci in the human genome (due to the repetitive nature of L1 elements). Therefore, there is a high probability that artifacts could be created during the initial PCR steps. Additionally, this method only acquired data from the 3’ end of genomic L1 insertions; therefore, this strategy is unable to detect and distinguish information on full-length L1 elements.

Transposon-Seq, an alternate approach from the Devine lab, utilizes genomic DNA digestion preceding Roche 454 linker ligation. The DNA containing linkers is then PCR amplified using a nested PCR approach with primers specific to newer L1 elements to detect genomic polymorphic L1 elements (Iskow et al., 2010). With this approach, 650 uncharacterized polymorphic L1 insertions in 76 individuals were identified, including a somatic L1 insertion in a lung cancer cell line that was not present in the matched, normal tissue. This approach has the potential to be utilized further in identifying de novo L1 insertions in heterogeneous samples, such as tumors. A limitation of this approach is its inability to distinguish full-length L1 elements from 5’ truncated L1 elements. Additionally, PCR duplicates and independent ligation events cannot be distinguished, due to the digestion of the genomic DNA: all DNA from a single locus will be cleaved in the same manner.

In 2015, the Deininger lab developed an NGS approach that identified full-length genomic L1 elements, termed SIMPLE (Streva et al., 2015). In this approach, randomly sheared genomic DNA is PCR amplified using a primer specific to the 5’ UTR of genomic L1 elements to enrich for full-length L1 insertions. Next, a linker is ligated to the 3’ end of the PCR products, taking advantage of the adenine overhang provided by the Taq polymerase. The samples are then PCR amplified with Illumina specific primers and submitted for NGS analysis. SIMPLE identified almost all of the previously reported full-length L1 elements in the human genome. Using this approach, the group reported the detection of 228 full-length polymorphic insertions in seven individuals.

An alternate strategy developed by the Faulkner lab attempted to detect both full-length Alu and L1 insertions in brain tissues using a hybridization capture technique termed RC-Seq (Baillie et al., 2011). This strategy relies on the development of probes to the 5’ and 3’ ends of genomic L1 and Alu elements, which are put onto arrays. Genomic samples were PCR amplified and hybridized to the capture arrays, where the hybridized samples were PCR amplified again before Illumina sequencing. This approach identified over 7000 somatic L1 insertions and over 13,000 somatic Alu insertions in only 3 patients (Baillie et al., 2011). Over 70% of the analyzed L1 insertions contained 5’ transductions, which is much higher than previously reported. Further manual inspection of these data indicated that a few of the 5’ transduced regions contained the Illumina primer sequence, indicating these insertions were artifacts.

Since 2011, improvements have been made to the initial RC-Seq method. For example, L1-Seq and RC-Seq were performed in parallel to detect 107 potential tumor-specific insertions in 16 colorectal cancer samples (Solyom et al., 2012). Conclusions from this study showed an increase in both sensitivity and specificity in L1-Seq over RC-Seq, as RC-Seq results were not as readily validated by PCR. A second improvement to the RC-Seq strategy used new, updated probes and multiplex liquid-phase capture, which identified 12 tumor-specific hepatocellular carcinoma L1 insertions (Shukla et al., 2013).

The Jorde lab focuses on detecting genomic Alu insertions via an NGS approach termed ME-Scan (Witherspoon et al., 2010; Witherspoon et al., 2013). This protocol starts with sheared genomic DNA and subsequent linker ligation to each end of a genomic fragment, followed by nested PCR amplification and analysis by Illumina sequencing. Their approach successfully identified over 2500 previously unidentified Alu insertions from diverse human populations (Witherspoon et al., 2013). This technique utilized custom sequencing primers and random shearing, allowing for easy detection of the Alu-genomic DNA junction.

1.8 Naturally occurring elements exhibit site-specific insertions

Mobile elements fall into two groups based on insertion preference: those that exhibit site-specificity and those that insert dispersed throughout the genome. L1 and Alu insert dispersed throughout the genome (Gilbert et al., 2005; Wagstaff et al., 2012). The main determinant as to where these insertions occur seems to be defined by the L1 ORF2 endonuclease site, 5’-TT/AAAA-3’ (consensus) (Feng et al., 1996; Cost and Boeke, 1998) and the ability of the RNA to base pair for TPRT to occur (Roy-Engel, 2012). In contrast, site-specific retrotransposons consistently insert into specific sequences. Several non-LTR retrotransposons insert into a specific sequence located in the ribosomal genes in the host DNA (Fujiwara, 2015).
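To make the consensus endonuclease site concrete, the following is a minimal sketch that scans a DNA sequence for exact matches to 5’-TT/AAAA-3’ on either strand and reports where the nick would fall. The function names and the coordinate convention are illustrative assumptions, and exact matching is a simplification: actual L1 ORF2 cleavage sites are degenerate variants of the consensus, and RNA base pairing during TPRT also influences where insertions occur.

```python
# Sketch only: exact-match scan for the L1 ORF2 endonuclease consensus
# 5'-TT/AAAA-3'. Names and coordinate convention are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus_sites(seq, motif="TTAAAA", nick_offset=2):
    """Return sorted (strand, nick) pairs for exact consensus matches.

    `nick` is the 0-based top-strand index of the base immediately to the
    right of the nicked phosphodiester bond.
    """
    seq = seq.upper()
    hits = []

    # Top strand: TT/AAAA, nick two bases into the motif.
    start = seq.find(motif)
    while start != -1:
        hits.append(("+", start + nick_offset))
        start = seq.find(motif, start + 1)

    # Bottom strand: the motif appears as its reverse complement on top.
    rc_motif = reverse_complement(motif)
    start = seq.find(rc_motif)
    while start != -1:
        hits.append(("-", start + len(motif) - nick_offset))
        start = seq.find(rc_motif, start + 1)

    return sorted(hits, key=lambda hit: hit[1])
```

Because a six-base consensus is expected to occur very frequently in a mammalian genome, a scan of this kind illustrates why L1 and Alu insertions appear dispersed rather than site-specific.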

The type of endonuclease used by the non-LTR retrotransposons appears to play a role in the insertion site selection. Non-LTR retrotransposons can be divided into specific groups based on the endonuclease domain present: apurinic/apyrimidinic endonuclease (AP) or restriction enzyme-like (RLE) endonucleases (Feng et al., 1996; Yang et al., 1999). There are two major clades of AP retroelements and five different RLE clades that exhibit site-specific insertions into repetitive stretches of DNA (Jurka, 1997; Fujiwara, 2015). Although multiple types of repetitive DNA are targets for non-LTR retrotransposons (including rRNA, tRNA, snRNA, telomeres, and transposon DNA), the literature available suggests that two types of DNA are particularly amenable to insertions: telomeres and ribosomal DNA. Drosophila telomeres are composed of and maintained through retrotransposition events of three mobile elements: TART, HeT-A, and TAHRE (Biessmann et al., 1992; Levis et al., 1993; Pardue and DeBaryshe, 2003; Capkova et al., 2008; Pardue and DeBaryshe, 2011; Silva-Sousa and Casacuberta, 2013; Raffa et al., 2013; Zhang et al., 2014). All three of these mobile elements belong to the AP retrotransposons. Although they target telomeres, they do not insert into a preferred sequence at the telomeric target site. In contrast to Drosophila, the TRAS and SART AP retrotransposons insert specifically into the TTAGG repeats present in the telomeres of Bombyx mori (Okazaki et al., 1995; Takahashi et al., 1997; Kubo et al., 2001; Osanai-Futahashi et al., 2008). Both TRAS and SART retrotransposons also play a role in telomere maintenance, as TRAP studies showed a decreased level of telomerase activity in cultured cells from Bombyx mori (Mitchell et al., 2010). These data indicate that some site-specific mobile element insertions play a critical role in maintaining the viability and longevity of cells.

The R2 retroelement from Bombyx mori inserts specifically into the 28S ribosomal DNA in its host genome (Burke et al., 1987). The R2 element contains a restriction-like endonuclease (RLE) in order to perform TPRT events (Eickbush and Jamburuthugoda, 2008). The site-specific insertional characteristic of this mobile element allowed for in vitro studies dissecting the insertional mechanism of non-LTR retrotransposons (Luan et al., 1993; Luan and Eickbush, 1995; George et al., 1996; Christensen et al., 2005; Christensen et al., 2006). These studies led to the development of the Target Primed Reverse Transcription model. This insertional mechanism has been extrapolated to explain the integration process of the non-LTR retrotransposons L1 and Alu, which do not exhibit site-specific insertions.

The type of endonuclease is not the only factor determining site insertion preference. Studies show that the site-specificity exhibited by the R2-A retrotransposon is also influenced by the presence of specific DNA binding domains coded for in the N-terminus of the retrotransposon (Shivram et al., 2011) (Figure 8). Different R2 subfamilies contain a Myb DNA binding domain in addition to one (or several) zinc fingers. These DNA binding domains confer targeting to the 28S ribosomal DNA (Shivram et al., 2011; Thompson and Christensen, 2011). Therefore, the presence of DNA binding domains upstream of the endonuclease and reverse transcriptase domains might be a key factor in targeting insertions of retrotransposons.


Figure 8. Examples of naturally occurring non-LTR mobile elements that exhibit site-specific insertions.

A. The open reading frame of the R2 retroelement. The N-terminus contains DNA binding domains (ZF and Myb), which confer site-specificity. The reverse transcriptase (RT) and C-terminal RLE-endonuclease domains are downstream.

B. R2-A and R2-D are the two main variants of the R2 retroelement. The R2-A variant contains four DNA binding motifs, including three ZFs and one Myb binding domain. R2-D still targets the 28S rDNA; however, this mobile element contains only one ZF and one Myb binding domain.


1.9 DNA binding domains with targeting capabilities

1.9.A Adeno-associated viral proteins (AAV)

The adeno-associated virus (AAV) genome contains an open reading frame encoding the Rep proteins. The two largest proteins encoded by the REP gene, Rep78 and Rep68, have helicase, ATPase, and endonuclease activities (Tratschin et al., 1984; Im and Muzyczka, 1990; Ni et al., 1994; Ward et al., 1994; Wonderling et al., 1995). These proteins target and bind to specific Rep Recognition Sequences (RRS), one of which is located in the AAVS1 region on chromosome 19 in the human genome (Kotin et al., 1990; Weitzman et al., 1994). Fusing a multimerization domain to the N-terminus of the Rep78 protein created two novel DNA binding domains: TZ and LZ (Waterman et al., 1996). These two proteins can be utilized to engineer proteins with targeting capabilities (Owens et al., 1993; Cathomen et al., 2000).

The AAV proteins have been useful tools in genome engineering and gene delivery mechanisms. In tissue culture experiments, AAVS1 integration of a specific gene occurred when the gene of interest (GOI) was flanked by AAV inverted terminal repeats and co-transfected with the Rep proteins (Kogure et al., 2001; Recchia et al., 2004). Additionally, AAV-based vectors have been developed to treat a variety of diseases, including rheumatoid arthritis, hemophilia B, and Parkinson’s disease (Herzog et al., 1997; Williams, 2007; Kaplitt et al., 2007). Therefore, AAV-derived targeting strategies could prove to be very useful for site-specific integration systems.


1.9.B Cre recombinase

One of the most developed approaches for gene manipulation is the Cre-recombinase system from the P1 bacteriophage [reviewed in (Nagy, 2000)]. This 38 kDa protein is responsible for catalyzing recombination events between LoxP sites (Hamilton and Abremski, 1984). Two Cre recombinase molecules bind to each LoxP site, forming a tetramer to mediate efficient recombination events (Voziyanov et al., 1999). Studies showed that the Cre-Lox system could efficiently mediate these recombination events in mammalian cells (Sauer and Henderson, 1988; Sauer and Henderson, 1989; Sauer and Henderson, 1990). Although some cryptic LoxP sites are reported in mammalian genomes (Semprini et al., 2007), this specific 34 bp LoxP sequence is expected to occur at random only once in 10^18 bases of DNA, which is much larger than the human genome (Nagy, 2000).

The first experiments using Cre-recombinase technologies were aimed at activating a reporter gene through a bipartite selectable marker. This marker is placed in the genome and engineered in such a way that only recombination by Cre-recombinase restores its function (Nagy, 2000). Additionally, it has been shown that Cre is able to interact with the target DNA if the recombinase capabilities are removed (Lee and Sadowski, 2003). Therefore, the Cre protein can be effectively used as a DNA binding domain without any interference from the recombinase activities. Since the initial in vitro experiments, several in vivo technologies and procedures have been developed that utilize Cre-recombinase mediated knockouts in mice (Sakai and Miyazaki, 1997; Lallemand et al., 1998).


The Cre recombinase system can be used to conditionally knock out genes, either in a tissue specific manner or during a specific developmental time point for genes that are embryonically lethal (Lakso et al., 1992; Holland et al., 1998).

1.9.C TALENs

Transcription activator-like effector endonucleases, or TALENs, are proteins that contain a DNA binding domain (TAL effector), derived from the plant-pathogenic bacterium Xanthomonas, fused to an endonuclease (EN) domain (Boch et al., 2009; Moscou and Bogdanove, 2009). A TAL effector contains multiple 33-35 amino acid domains that are linked together in order to recognize longer DNA targets. Within each 33-35 amino acid domain, targeting specificity is conferred by two specific amino acids, termed repeat-variable diresidues (RVDs) (Mak et al., 2012; Deng et al., 2012). TAL effectors are usually fused to an endonuclease, such as FokI, in order to successfully create targeted breaks in the genome (Christian et al., 2010; Cermak et al., 2011). TALENs have successfully been utilized for precise, targeted genomic editing and engineering strategies in multiple species [reviewed in (Gaj et al., 2013)]. For example, several TALENs were designed for disrupting genes (such as the CCR5 gene in humans (Miller et al., 2007; Mussolino et al., 2011)) and adding genes into a genome (such as AAVS1 (Hockemeyer et al., 2011)). TAL effectors can be used without an endonuclease domain to stimulate transcription when fused with a transcription activating domain. In one example, TAL effectors were able to upregulate transcription of endogenous SOX2 and KLF4 in 293T cells (Zhang et al., 2011). Conversely, TAL effectors can also be used to inhibit transcription of mammalian genes when fused to a KRAB or mSin3 domain (Peng et al., 2000; Cong et al., 2012; Li et al., 2012).
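The one-repeat-to-one-base logic of TAL effectors can be illustrated with a short sketch that converts between a DNA target and a list of RVDs. The RVD-to-base table below is the commonly cited simplified code (NI for A, HD for C, NG for T, NN for G) and is an assumption added here for illustration; it is not stated in this chapter, and NN is known to tolerate A as well as G. The function names are hypothetical.

```python
# Sketch only: simplified TAL effector RVD code (assumed, not from the text).
# NN is treated as G-specific here although it also recognizes A.

RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}


def rvds_for_target(target):
    """Return one RVD per base of the DNA target, in 5' to 3' order."""
    return [BASE_TO_RVD[base] for base in target.upper()]


def target_for_rvds(rvds):
    """Return the DNA sequence recognized by an ordered list of RVDs."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)
```

Under this simplified code, a target of n bases requires n repeats, which is why longer targets are recognized by linking more 33-35 amino acid domains together.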

1.9.D Zinc Fingers

Zinc finger nucleases were discovered as genomic engineering tools in the early 1990s (Li et al., 1992). These fusion proteins are designed to have two distinct, yet functional components: a DNA binding domain that confers specificity for a particular target, and an endonuclease that cleaves the DNA. Each finger of the zinc finger protein binds three DNA bases; therefore, these DNA binding domains can be tailored to bind very specific sequences based on experimental requirements (Pavletich and Pabo, 1991; Carroll, 2011). Usually, zinc finger nucleases contain the FokI endonuclease, which nicks the DNA; therefore, two zinc finger-FokI fusion proteins must be designed and utilized in order to create the targeted double strand breaks (Bitinaite et al., 1998; Smith et al., 2000).
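Since each finger contacts three bases, the length of the recognized sequence scales with the number of fingers. The sketch below works through that arithmetic and estimates how often a target of a given length would be expected by chance. The uniform base composition, the single-strand count, and the haploid genome size of roughly 3.1 × 10^9 bp are simplifying assumptions introduced for illustration, and the function names are hypothetical.

```python
# Sketch only: back-of-the-envelope target length and chance occurrence.
# Assumes equal base frequencies and counts one strand; real genomes are
# neither uniform nor free of repeats.

HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid genome size


def zf_target_length(n_fingers, bases_per_finger=3):
    """Length in bp of the sequence recognized by an array of zinc fingers."""
    return n_fingers * bases_per_finger


def expected_random_sites(target_len, genome_size=HUMAN_GENOME_BP):
    """Expected number of chance matches to one specific sequence."""
    return genome_size / 4 ** target_len
```

For example, `zf_target_length(6)` gives 18 bp, and `expected_random_sites(18)` is well below one, whereas a three-finger array (9 bp) would be expected to match thousands of sites by chance under the same assumptions.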

One distinct advantage of utilizing and optimizing zinc finger genomic editing strategies is that this technology can be applied to multiple organisms. Zinc finger nucleases have been successfully created to specifically target DNA sequences in multiple species [reviewed in (Carroll, 2011)]. mRNA injection of zinc-finger nucleases into the embryos of several species successfully targeted a range of genes to produce viable offspring with specific genetic aberrations (Doyon et al., 2008; Meng et al., 2008; Foley et al., 2009; Geurts et al., 2009; Mashimo et al., 2010; Carbery et al., 2010; Meyer et al., 2010; Ochiai et al., 2010; Young et al., 2011). A different strategy, viral delivery of targeting zinc finger nucleases, was utilized to successfully modify the genomes of several crop species (Marton et al., 2010). The range of genomic engineering successes demonstrates that zinc finger strategies can be utilized for a wide variety of applications.

1.9.E The CRISPR/Cas9 targeting system

The Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) system isolated from S. pyogenes has revolutionized genomic engineering strategies (Strauß and Lahaye, 2013). CRISPR strategies have been successfully adapted in mammalian cells to target and alter specific genomic sequences by utilizing a Cas9 endonuclease protein in addition to a targeting guide RNA (gRNA). The targeting gRNAs can be designed to target any feasible site by manipulating the crRNA sequence, making the CRISPR strategy particularly flexible to select and modify genomic targets of interest (Sander and Joung, 2014). One of the requirements of designing gRNAs is that there must be a Protospacer Adjacent Motif (PAM) sequence directly downstream of the sequence targeted by the crRNA (Sander and Joung, 2014). The gRNA and Cas9 protein form a complex that is directed to the desired genomic target. The PAM adjacent to each target site positions the Cas9 endonuclease to cleave the target DNA, creating a targeted double strand break. Additionally, there are several Cas9 protein variants that are currently utilized for different targeting strategies that provide a wide array of approaches (Doudna and Charpentier, 2014). The catalytically inactive version of Cas9, dCas9, has been used as a targeting DNA binding domain when supplemented with a gRNA (Jinek et al., 2012; Gasiunas et al., 2012). When fused to the transcriptional activating domain VP64, dCas9 can regulate the expression of endogenous genes in human and mouse cells (Maeder et al., 2013; Perez-Pinera et al., 2013; Konermann et al., 2013; Gilbert et al., 2013; Ebina et al., 2013), including VEGFA (Maeder et al., 2013). Thus, CRISPR is a flexible platform that can be modified to fit a wide range of applications.
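The PAM requirement described above can be sketched as a simple search for candidate target sites. The example assumes the S. pyogenes Cas9 PAM 5’-NGG-3’ and a 20-nucleotide protospacer, neither of which is specified in this section, and it scans only the given strand. The function name is hypothetical, and no off-target or efficiency scoring is attempted.

```python
# Sketch only: list candidate protospacers followed by an NGG PAM.
# PAM (NGG) and protospacer length (20 nt) are assumptions for SpCas9.

import re


def find_protospacers(seq, spacer_len=20):
    """Return (start, protospacer, pam) for each candidate on this strand.

    A lookahead is used so that overlapping candidates are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = []
    for match in pattern.finditer(seq):
        start = match.start()
        pam = seq[start + spacer_len:start + spacer_len + 3]
        hits.append((start, match.group(1), pam))
    return hits
```

Running the same function on the reverse complement of the sequence would recover candidates on the opposite strand.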

1.10 Engineering mobile elements to promote site-specific insertions

Using the knowledge that DNA binding domains can impart targeting specificity to mobile elements, the Ivics lab demonstrated that fusing DNA binding domains to the N-terminus of the DNA transposon, Sleeping Beauty (SB), enriched targeted insertions (Voigt et al., 2012; Ammar et al., 2012). The first study utilized the adeno-associated viral (AAV) Rep proteins, which target AAVS1, a specific RRS sequence located on human chromosome 19 (Kotin et al., 1990). In this study, fusing the AAV REP proteins to the N-terminus of SB increased SB transposition events within 10 kb of genomic RRS sequences by 2.7 fold (Ammar et al., 2012). In the same year, the Ivics lab fused a 6-fingered zinc finger (ZF), which targets an 18 bp sequence in genomic L1 elements, to the N-terminus of SB. This strategy successfully enriched targeted SB insertions up to four-fold in a 400 bp window around the ZF target site when compared to unfused SB (Voigt et al., 2012).
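The fold-enrichment values quoted above compare how often insertions land near a target site with and without the fused DNA binding domain. A minimal sketch of that comparison is given below. It assumes a single target position and simple lists of insertion coordinates on one chromosome; the cited studies analyzed many genomic target sites, so this is an illustration of the ratio rather than a reconstruction of their analysis. All names and numbers are hypothetical.

```python
# Sketch only: fold enrichment of insertions within a window around one
# target site. Coordinates and function names are hypothetical.


def fraction_in_window(insertions, target, window):
    """Fraction of insertions within window/2 bp on either side of target."""
    half = window / 2
    near = [pos for pos in insertions if abs(pos - target) <= half]
    return len(near) / len(insertions)


def fold_enrichment(fused, unfused, target, window):
    """Ratio of the in-window fraction for fused versus unfused protein."""
    baseline = fraction_in_window(unfused, target, window)
    if baseline == 0:
        raise ValueError("no unfused insertions in window; ratio undefined")
    return fraction_in_window(fused, target, window) / baseline


# Hypothetical coordinates: 2 of 4 fused and 1 of 8 unfused insertions
# fall within a 400 bp window centered on position 1000.
fused_positions = [1000, 1100, 5000, 9000]
unfused_positions = [1000, 4000, 6000, 8000, 20000, 30000, 40000, 50000]
example_fold = fold_enrichment(fused_positions, unfused_positions, 1000, 400)
```

Normalizing to the unfused control matters because the window may already contain a preferred endonuclease site, in which case some insertions would be expected there without any targeting.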


An additional study using an E2C zinc finger-HIV fusion protein demonstrated an enrichment of HIV insertions near the E2C binding site (Tan et al., 2004). The zinc finger used in this particular study is a polydactyl protein that contains 6 distinct fingers, which bind to an 18 base pair genomic DNA sequence in the E2C binding site (Beerli et al., 1998; Beerli et al., 2000). Interestingly, fusing the zinc finger to the N-terminus of the HIV integrase did not disrupt the function of either the DNA binding domain or the integrase. These studies showed that the addition of a DNA binding domain to the N-terminus of reverse transcription machinery could be used for redirecting insertions.

Currently little is known about the cellular factors that regulate the activity of mammalian Alu and L1 elements. The details known about the insertion mechanisms of these two retroelements are extrapolated from the previous studies of R2, a non-LTR retrotransposon that inserts into 28S rDNA (Luan et al., 1993; Christensen et al., 2006). One of the biggest hurdles in studying human mobile elements is their dispersed distribution throughout the human genome. Therefore, redirecting a mobile element to either insert into a specific sequence or be enriched at a particular site in the genome would be very beneficial to the field. For example, the only established systems to study Alu and L1 rely on large tags (the neomycin cassette is three times the size of an Alu element), creating an artificial system. By targeting a mammalian retroelement to a specific location in the genome, the reliance on a resistance tag could be eliminated. This development could pave the way for creating an Alu or L1 transgenic mouse model, which does not rely on a tagged element. If Alu or L1 elements integrate in specific genomic locations, site-specific primers and proteins can be used to evaluate both retrotransposition intermediates and protein complexes formed at the site of integration. This dissertation presents the data from evaluating a variety of DNA binding domains and their capability of redirecting retrotransposition insertion site preference. Five different types of DNA binding domains (REP proteins, Cre, dCas9, TAL effectors and zinc fingers) were fused to the L1 ORF2 protein to redirect Alu insertions to a specific site in the human genome.


Chapter 2. Materials and Methods

All constructs were sequence verified (Elim Biopharmaceuticals, Inc, Hayward, California), and analyzed using DNA Star software (Lasergene 10). Plasmids were purified using the Qiagen Plasmid Midi Prep kit.

2.1 Constructs

2.1.A Alu Constructs

The following previously published Alu constructs were used to evaluate the ability of the ORF2 fusion constructs to drive targeted Alu insertions:

pAluYa5-neoTET (Dewannieux et al., 2003) contains the AluYa5 tagged with the neoTET retrotransposition cassette.

pBS-Ya5rescue-A70D-SH (Wagstaff et al., 2012) contains the EM7 bacterial promoter upstream of the neoTET cassette in order to confer kanamycin resistance in bacterial cells. The minimal γ origin of replication (ORI) of plasmid R6K was also introduced into this construct: any circular DNA containing this origin of replication will function as a plasmid when transformed into bacterial cells (Stalker et al., 1982; Shafferman and Helinski, 1983). Additionally, the Shine-Dalgarno sequence was modified to remove AT richness that could function as an RNA polymerase III terminator sequence (Orioli et al., 2011).


pBS-Ya5rescue-L1 PAM contains the target sequence to the 3’ end of genomic L1 elements needed for base pairing during TPRT (see Figure 4) instead of the A tail of the pBS-Ya5rescue-A70D-SH construct. The new 3’ end was cloned into the EcoRI and MluI sites of the Ya5rescue-A70D-SH construct using two phosphorylated oligonucleotides, corresponding to the sequence of the new A-tail:

P-5’-CGCGTGCTAGCATGGCACATGTATACATTTTTTTTG-3’ and

3’-ACGATCGGACCGTGTACATATGTAAAAAAAACTTAA-5’-P.

Oligos were designed to contain an NheI site for easy detection of the insertion by digestion, plus an eight T-run to ensure termination of the RNA polymerase III transcription.
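The design features stated for these oligonucleotides can be checked with a short script. The sketch below takes the two sequences exactly as written above, locates the NheI site (GCTAGC), measures the longest T-run on the top strand, and lists positions in the annealed region where the two strands do not form a Watson-Crick pair. The four-base overhangs are an assumption inferred from the MluI and EcoRI cloning sites named in the text, and the helper names are hypothetical.

```python
# Sketch only: sanity checks on the annealed oligonucleotide pair.
# Sequences are copied as written above; 4-nt overhangs are assumed.

TOP_5TO3 = "CGCGTGCTAGCATGGCACATGTATACATTTTTTTTG"
BOTTOM_3TO5 = "ACGATCGGACCGTGTACATATGTAAAAAAAACTTAA"
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
NHEI_SITE = "GCTAGC"


def longest_run(seq, base):
    """Length of the longest uninterrupted run of `base` in `seq`."""
    best = current = 0
    for nucleotide in seq:
        current = current + 1 if nucleotide == base else 0
        best = max(best, current)
    return best


def duplex_mismatches(top, bottom_3to5, top_overhang=4):
    """0-based positions in the paired region that do not base pair."""
    paired_top = top[top_overhang:]
    paired_bottom = bottom_3to5[:len(paired_top)]
    return [i for i, (t, b) in enumerate(zip(paired_top, paired_bottom))
            if PAIR[t] != b]


nhei_index = TOP_5TO3.find(NHEI_SITE)      # position of the NheI site
t_run = longest_run(TOP_5TO3, "T")         # length of the terminator T-run
mismatches = duplex_mismatches(TOP_5TO3, BOTTOM_3TO5)
```

With the sequences as transcribed here, the NheI site and the eight-T run are both present, and the pairing check reports a single non-complementary position (index 7 of the paired region, immediately 3’ of the NheI site on the top strand). The text alone does not indicate which strand reflects the intended sequence.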

A detailed schematic of the Alu constructs is shown in Figure 9.


Figure 9. Schematic of the Alu Constructs for Evaluation of Alu Retrotrans- position and Recovery of Alu Insertions Sites.

A. AluYa5neoTET contains the 7SL upstream enhancer region and the Alu Ya5 consensus sequence, followed by the neoTET self-splicing indicator cassette. This construct also contains a poly-A stretch and a pol III terminator. A functional neomycin resistance cassette is restored when successful retrotransposition of the spliced Alu RNA occurs. B. pBS-Ya5rescue-A70Du was created from the AluYa5-neoTET construct by substituting the 3′ region with a commercially synthesized sequence. These changes included introducing the EM7 bacterial promoter upstream of the neoTET cassette in order to obtain kanamycin resistance in bacterial cells (green arrow, indicated by *) (Kroutter et al., 2009). Additionally, the minimal γ origin of replication (ORI) of plasmid R6K was introduced in order to minimize transcript length, and poly-T runs (noted as blue rectangle, #) were modified to prevent premature termination of RNA polymerase III transcription (Stalker et al., 1982; Shafferman and Helinski, 1983). Lastly, the Shine-Dalgarno sequence was modified to remove AT richness that could function as an RNA polymerase III terminator from pBS-Ya5rescue-A70D.


2.1.B Creating the fusion proteins: LZ-ORF2, TZ-ORF2, and Cre-ORF2 (Chapter 3)

pBudORF2CH (Wagstaff et al., 2012) was used as the base plasmid to create all constructs. This plasmid contains the fully codon optimized ORF2 from L1RP, cloned into the expression vector pBudCE4.1 (Invitrogen/Thermo Scientific), under control of the CMV promoter. Additionally, a glycine helical peptide linker (GHL) (KLGGGAPAVGGGPKAADK), commercially synthesized by Genscript, was cloned into the PstI and BamHI sites of pBudCE4.1, creating the new plasmid pBudO2CH +LGHL. The restriction sites HindIII, PstI, and SalI were added to the 5’ end of the GHL linker to allow N-terminal cloning procedures.

pcDNALZ-ORF2 and pcDNATZ-ORF2 contain the fully codon optimized ORF2 protein from L1RP cloned in frame downstream of the LZ (leucine zipper) or TZ (engineered leucine zipper) Rep binding motifs of the Adeno-Associated virus (AAV) and the transcriptional activation domain of VP16 (AD). A schematic of the constructs is shown in Figure 10. ORF2 was PCR amplified with primers that generated 5’ ApaI and 3’ AgeI restriction sites for cloning into pcDNARepLZAD and pcDNARepTZAD (Cathomen et al., 2000).

pBud-N-Cre-ORF2 and pBud-C-Cre-ORF2 contain Cre either at the N-terminus or the C-terminus of the ORF2 protein (Figure 10). Cre was PCR amplified from pBS185 CMV-Cre (Addgene plasmid #11916 (Sauer and Henderson, 1990)) with the following primers containing compatible ends for cloning the Cre recombinase protein upstream or downstream of the L1 ORF2 protein into the pBudO2CH +LGHL plasmid:


FPstI: 5’- GAACGGATCTGCAGGATCAGGATCAGGCATGTCCAATTTACTGAC -3’

RSalI: 5’- GCAGCTCAGTCGACCCTAATCGCCATCTTCCAG -3’.

For the N-terminally fused construct, the 5’ PstI and 3’ SalI sites were used for cloning; for the C-terminally fused Cre, the 5’ BamHI and 3’ EcoRI sites were used for cloning into pBudORF2CH.


Figure 10. Schematic of the ORF2 fusion constructs utilizing DNA binding domains that target sequences as multimers. Expression in all constructs is driven by the CMV promoter.

A. pBud-ORF2CH contains the fully codon optimized ORF2 from L1RP, cloned into the expression vector pBudCE4.1 under control of the CMV promoter.

B. pcDNALZ-ORF2 and pcDNATZ-ORF2 contain the Rep proteins TZ and LZ fused to the N-terminus of the L1 ORF2 protein.

C. pBud-N-Cre-ORF2 and pBud-C-Cre-ORF2 contain the Cre recombinase protein fused to either the N-terminus or the C-terminus of the L1 ORF2 protein. The N-Cre-ORF2 fusion protein contains an 18 amino acid glycine-helical linker (GHL) between the Cre recombinase and the L1 ORF2 protein.


2.1.C Creation of the fusion proteins: TAL-ORF2 (Chapter 4)

pBud TAL-ORF2CH contains the TAL Effector targeting the AAV site fused upstream of the ORF2 (Figure 11). Complementary oligonucleotides that contain a BsmBI site (which provides compatible ends with SalI) were cloned into the EarI site of the hAAVS1 1L TALEN plasmid (Addgene plasmid #35431 (Sanjana et al., 2012)). The TAL Effector was excised from the modified plasmid using the BsmBI and SacI sites and cloned into the SacI and SalI sites of pBudO2CH +LGHL to introduce the TAL Effector upstream of the GHL linker and downstream of the CMV promoter.


Figure 11. Schematic of the pBud TAL-ORF2CH construct. pBud TAL-ORF2 contains the left TAL effector targeting the human AAVS1 site cloned in frame upstream of the pBudO2CH +LGHL plasmid. This construct also contains the ORF2 protein and GHL linker.


2.1.D Creation of the fusion proteins: ZF2.17-ORF2, ZF2.18-ORF2, and ZF2.1817-ORF2 (Chapter 5)

pBud ZF2.17-ORF2CH, pBud ZF2.18-ORF2CH and pBud ZF2.1817-ORF2CH were created by introducing the zinc fingers upstream of the L1 ORF2 protein (Figure 12). These zinc fingers target specific sequences in the enhanced green fluorescent protein (EGFP) gene. ZF2.18 (a 3-fingered zinc finger targeting a 9 bp sequence in EGFP) was PCR amplified and cloned in frame into the PstI site of the pBudO2CH +LGHL plasmid using primers that introduced flanking PstI sites. ZF2.17 (a 3-fingered zinc finger that targets a 9 bp sequence in EGFP adjacent to the binding site of ZF2.18) was PCR amplified and cloned into the SalI and PstI sites of the ORF2CH+LGHL plasmid. These two zinc fingers were designed to function either as separate 3-fingered zinc fingers or together as a 6-fingered zinc finger. To create the ZF2.1817-ORF2 fusion protein, ZF2.18 was cloned upstream of the ZF2.17 sequence, and the combined 2.18+2.17 sequence created a 6-fingered zinc finger.


Figure 12. Schematic of the ORF2 fusion constructs utilizing zinc fingers that target EGFP. pBud ZF2.17-ORF2CH and pBud ZF2.18-ORF2CH each contain a three-fingered zinc finger that targets a 9 bp sequence in EGFP. The pBud ZF2.1817-ORF2CH is designed to target an 18 bp sequence in EGFP.


2.1.E Creation of the fusion proteins: N-ZF4-ORF2, N-ZF2-ORF2, C-ZF4-ORF2, and C-ZF2-ORF2 with GHL, FL4, and HL4 linkers (Chapter 6).

The ZF4 and ZF2 zinc fingers were a kind gift from Dr. Zoltan Ivics (Voigt et al., 2012). Different constructs were generated with the ZF4 and ZF2 fused either at the N-terminus or the C-terminus of the L1 ORF2 protein. In addition, sets of the constructs were generated containing the alternate linkers FL4 and HL4 (Figure 13 shows schematics of all the variations of the constructs made). The constructs were made as follows:

pBud-N-ZF4-ORF2CH and pBud-N-ZF2-ORF2CH were generated by cloning PCR amplified ZF4 or ZF2 DNA upstream of L1 ORF2 into the SalI and PstI sites of the ORF2CH+LGHL plasmid. The ZF4 sequence was PCR amplified using the following two primers:

F PST-ZF4: 5’-ATTCCTGCAGGCCACCATGCTGGAACCCGGCGAGA-3’ and

R Sal-ZF4: 5’- TGTCGACGCTGGTCTTTTTGCCAGTATGGG-3’.

The ZF2 sequence was amplified for use with the Gibson Assembly Cloning kit (NEB) using the following two primers:

F-GA O2 ZF2: 5’- GCTTGCATTCCTGCAGGCCACCATGCTGGAACCCGGCGAG-3’

R GA Linker ZF2: 5’-CCGCCGCCCAGCTTGTCGACGCTGGTCTTTTTGCCCGTATG-3’

The forward primer used for this cloning method contained overlapping sequences to the flanking sequences in the backbone plasmid (pBudCE4.1) and the N-terminus of the ZF2 zinc finger. The reverse primer contained overlapping sequences with the GHL linker and the 3’ end of the ZF2 zinc finger. This PCR product was combined, using the Gibson Assembly kit (NEB, E5510) following the manufacturer’s recommended protocol, with the pBud-N-ZF4-ORF2CH construct that had been previously digested with both SalI and PstI.

Similarly, to create the C-terminally fused ZF proteins, the zinc finger and the glycine helical linker (GHL) sequences were moved to the C-terminus of the ORF2CH sequence using the Gibson Assembly kit (NEB, E5510) following the manufacturer’s recommended protocol. Two separate PCR products were amplified and combined with the pBudORF2CH (Wagstaff et al., 2012) EcoRI and BamHI fragment. The GHL linker was amplified with the following primers that contained overlapping sequences to the C-terminus of the L1 ORF2 protein and the N-terminus of the ZF4 zinc finger:

F ORF2-LinkerGA: 5’-CTTCAGCCTGATCGGCGGCAACGGATCCGTCGACAAGC-3’

R ZF4-linkerGA: 5’-TCTCGCCGGGTTCCAGCATGATATCTTTATCG-3’

The following two primers were designed to amplify either the ZF4 or ZF2 zinc fingers that contained overlapping sequences to the C-terminus of the GHL linker and the pBud vector:

F linker ZF4GA: 5’-CCCAAGGCCGCCGATAAAGATATCATGCTGGAACCC-3’

RpBud-ZF4GA: 5’-CATCAATGTATCTTATCATGTCTGAATTCAGCTGGTCTTTTTGCC-3’
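Gibson Assembly joins fragments through the sequence their ends share, so it can be useful to confirm which stretch two junction primers have in common. The following is a minimal sketch, not the procedure used in this work: it takes R ZF4-linkerGA and F linker ZF4GA exactly as printed above, reverse-complements the reverse primer, and reports the longest stretch shared with the forward primer. The function names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[nt] for nt in reversed(seq.upper()))


def longest_shared_stretch(a: str, b: str) -> str:
    """Longest common substring of a and b (simple dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Primers copied from the text above (5' -> 3').
R_ZF4_LINKER = "TCTCGCCGGGTTCCAGCATGATATCTTTATCG"      # R ZF4-linkerGA
F_LINKER_ZF4 = "CCCAAGGCCGCCGATAAAGATATCATGCTGGAACCC"  # F linker ZF4GA

# The reverse primer anneals to the bottom strand, so compare its
# reverse complement with the forward primer on the top strand.
overlap = longest_shared_stretch(F_LINKER_ZF4, reverse_complement(R_ZF4_LINKER))
print(f"Shared junction sequence: {overlap} ({len(overlap)} nt)")
```

The shared stretch includes GATATC, the EcoRV recognition sequence, which is consistent with the EcoRV site used for the linker swaps described below.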

The Flexible Linker 4 (FL4: LSGGGGSGGGGSGGGGSGGGGSAAA) and Helical Linker 4 (HL4: LAEAAAKEAAAKEAAAKEAAAKAAA) (Arai et al., 2001) were synthesized by Genscript with flanking 5’ SalI and 3’ EcoRV sites, which were used to swap the GHL linker sequence for the new linkers.


Figure 13. Driver ORF2 fusion constructs utilizing DNA binding domains that target genomic sequences present multiple times in the human genome.

All Driver constructs contained the ORF2CH protein, driven by the CMV promoter. Fusions were created by adding the ZF4 and the ZF2 to the N-terminus or the C-terminus of the L1 ORF2 protein. In addition, all constructs were generated using three different protein linkers separating the two proteins. Schematics of the plasmids are shown: A. pBud-N-ZF4-ORF2CH set; B. pBud-C-ZF4-ORF2CH set; C. pBud-N-ZF2-ORF2CH set; and D. pBud-C-ZF2-ORF2CH set. The small rectangle represents the linker sequence: the 18 amino acid glycine-helical linker (GHL, green rectangle); the 25 amino acid Flexible Linker 4 (FL4, dark teal rectangle); and the 25 amino acid Helical Linker 4 (HL4, purple rectangle).


2.1.F Creation of the CRISPR/Cas9 fusion proteins and gRNAs (Chapter 7).

2.1.F.1 Cas9, nickase and dCas9 fusion constructs

The Cas9 and the nickase proteins were fused upstream of an L1 ORF2 protein lacking endonuclease activity. For this purpose, either an endonuclease double mutant or an N-terminally truncated ORF2 protein (lacking the endonuclease domain) was used to create the fusion constructs. In contrast, the dCas9 protein is a catalytically inactive version of the Cas9 protein that retains its ability to interact with a gRNA (Qi et al., 2013). Thus, dCas9 was fused to the endonuclease competent ORF2 protein. Schematics of all the constructs are shown in Figure 14.


Figure 14. Schematic of the CRISPR-ORF2 fusion constructs.

Cas9 or the D10A nickase variant was fused to the N-terminus of the ORF2 endonuclease double mutant or the ORF2 RTCYS domains (which completely lack the endonuclease domain).

A. The nickase variant of the Cas9 protein contained the D10A mutation, indicated by the orange line. Cas9 was fused to the N-terminus of the L1 ORF2 endonuclease double mutant or the RTCYS domains, and co-transfected with a gRNA.

B. In the second system, dCas9 is utilized as the targeting agent, and the L1 ORF2 protein is utilized for its endonuclease and reverse transcriptase abilities. MS2 is fused to the N-terminus of the L1 ORF2 protein and co-transfected with the dCas9 protein and a targeting gRNA.


To fuse the Cas9, dCas9 and the nickase proteins upstream of the ORF2 protein, additional restriction sites were added to the pBudO2CH +LGHL plasmid. Complementary oligonucleotides containing the ClaI, PacI, and BstEII restriction sites were cloned into the HindIII and SalI sites of pBudO2CH +LGHL, creating pBudO2CH +CLGHL. The oligonucleotides used were as follows:

Top Hind newRE Sal: 5’- Phos –AGCTTatcgattaattaagaattccaccatgctggtaaccG -3’

Bot Hind newRE Sal: 5’- Phos-TCGACggttaccagcatggtggaattcttaattaatcgatA -3’
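The design of this oligonucleotide pair can be sanity-checked computationally. The sketch below is a minimal illustration, assuming the two sequences exactly as printed above: it confirms that the oligonucleotides anneal into a duplex with 4-nt 5’ overhangs compatible with HindIII (AGCT) and SalI (TCGA) cut sites, and that the duplex carries ClaI (ATCGAT), PacI (TTAATTAA), and BstEII (GGTNACC) recognition sequences. Variable names are illustrative.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[nt] for nt in reversed(seq.upper()))


# Oligonucleotides copied from the text above (5' -> 3'), upper-cased.
TOP = "AGCTTatcgattaattaagaattccaccatgctggtaaccG".upper()
BOT = "TCGACggttaccagcatggtggaattcttaattaatcgatA".upper()

# Each oligo starts with a 4-nt 5' overhang; the remainder should base-pair.
top_overhang, top_paired = TOP[:4], TOP[4:]
bot_overhang, bot_paired = BOT[:4], BOT[4:]
duplex_ok = top_paired == reverse_complement(bot_paired)

# Recognition sequences expected in the insert (N = any base for BstEII).
SITES = {"ClaI": "ATCGAT", "PacI": "TTAATTAA", "BstEII": "GGT[ACGT]ACC"}
found = {name: bool(re.search(pattern, TOP)) for name, pattern in SITES.items()}

print(f"duplex pairs correctly: {duplex_ok}")
print(f"5' overhangs: top={top_overhang}, bottom={bot_overhang}")
print(f"sites found: {found}")
```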

The pBudRTCYS+CLGHL construct was created by PCR amplifying the reverse transcriptase and Cys domains (RTCYS) of the L1 ORF2 protein with EcoRV and BamHI sites. The RTCYS PCR product was used to substitute the full length ORF2 sequence of the pBudO2CH +CLGHL plasmid. The following two primers were utilized:

F EcoRV RTCysCH: 5’- AAGATATCATGAAGGCCGAGATCAAGATGTTCTTC -3’

R BamHI ORF2CH: 5’- GGATCCGTTGCCGCCGATCAGGC -3’

To generate the ORF2 endonuclease double mutant constructs, the first 500 bp of the ORF2 coding sequence of the pBudO2CH +CLGHL plasmid was swapped with the sequence of an ORF2 endonuclease double mutant to create the pBud L1 endo-- +CLGHL construct. The 500 bp PCR product from the endonuclease double mutant was amplified with the following primers containing XbaI and EcoRV sequences to allow for insertion into the pBudO2CH +CLGHL plasmid:


F EcoRV ORF2ch: 5’- ATCATGACCGGCAGCACCAGCC -3’

R XbaI endo (-) ORF2ch: 5’- CTCTAGATTCTCCAGCTTGTTGGCGTACAGG -3’

The Cas9 nuclease was amplified from the spCas9 plasmid (Addgene #48137) (Cong et al., 2013) to create the pBud Cas9-RTCYS+CLGHL and pBud Cas9-L1 endo-- +CLGHL constructs using the Gibson Assembly method per the manufacturer’s protocol. Primers used to amplify the Cas9 nuclease were:

F-Cas9/nickaseGA 5’ –ACCCAAGCTTATCGATTAATCGACCATGGATAAAAAGTATTC -3’

R Cas9 GA: 5’- CCAGCTTGTCGACGGTTACCTTGACTTTCCTCTTCTTCTTG -3’

The D10A Cas9 nickase protein was PCR amplified from the hCas9_D10A plasmid (Addgene #41816) (Mali et al., 2013) to create the pBud Nickase-L1 endo-- +CLGHL and pBud dCas9-ORF2 +CLGHL constructs using the Gibson Assembly method. The following primers were utilized:

F-Cas9/nickaseGA 5’-ACCCAAGCTTatcgattaatCGACCATGGATAAAAAGTATTC -3’

R nickase GA: 5’- CCAGCTTGTCGACGGTTACCTTCACCTTCCTCTTCTTCTTG -3’

The dCas9 protein was added to the pBudO2CH +CLGHL vector using the Gibson Assembly protocol. To build the pBud dCas9-ORF2, the dCas9 protein (catalytically inactive D10A H841 double mutant) was PCR amplified from the plasmid pcDNA dCas9-VP64 (Addgene #61422) (Konermann et al., 2015) using the following primers:


F-dCas9 VP64 GA 5’–ACCCAAGCTTATCGATTAATAACGAGATGGCCAAGGTGGACGA-3’

R dCas9VP64 GA: 5’- CCAGCTTGTCGACGGTTACCTTCTTGTACAGCTCGTCCATGCC -3’

To build the plasmid pBud MS2-ORF2 +CLGHL, the MS2 protein was PCR amplified from the MS2-HB plasmid (Addgene #35573) (Tsai et al., 2011) and inserted using the Gibson Assembly protocol. The primers used to amplify the MS2 domain were:

F-MS2 GA: 5’ –ACCCAAGCTTATCGATTAATTCGACCATGGCTTCTAACTTTAC-3’

R MS2 GA: 5’- CCAGCTTGTCGACGGTTACCTTGTTAATTAAGGAGTTTGCTGCG -3’

2.1.F.2 gRNA Construction

Two guide RNA (gRNA) backbones were used for the CRISPR/Cas9 gRNA plasmids: MLM 3636 (Addgene 43860) and sgRNA MS2 (Addgene 61424) (Konermann et al., 2015). The sgRNA MS2 backbone was used to create the gRNAs that utilized the MS2 protein, as this construct contains two MS2 binding loops encoded in the gRNA sequence. Five gRNAs were created using the MLM 3636 backbone, and three gRNAs were designed to clone into the sgRNA MS2 backbone:

gRNA 2 was designed to a unique site on chromosome 8 (5’-CTGATAAATAGTCAGTTAAA-3’) using the available website www.ZiFit.partners.org. This sequence contains the appropriate sequences to both direct site-specific Cas9 cleavage and provide the sequences for A-tail annealing during TPRT.


Three gRNAs were designed to the 5’ UTR of genomic L1 elements: gRNA 551, gRNA 765, and gRNA 892. These three gRNAs targeted the following sequences: 5’-GCCTCTGTAGGCTCCACCTC-3’, 5’-AGCAGGGGCACACTGACACC-3’, and 5’-GTAGATAAAACCACAAAGAT-3’, respectively. An additional gRNA was designed to the 3’ end of genomic L1 elements, with the target sequence 5’-GTGGGTGCAGCGCACCAGCA-3’.

Three gRNAs containing the MS2 binding sites were designed using the same genomic L1 5’ UTR target sequences as listed above.
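Basic properties of the listed target sequences, such as length and GC content, are easy to tabulate before cloning. The following is a small sketch, assuming the sequences as printed in this section; the dictionary keys are labels chosen here for illustration (the 3’ end guide is not given a name in the text), and `gc_fraction` is a hypothetical helper.

```python
# Target sequences copied from this section (5' -> 3').
GUIDES = {
    "gRNA 2": "CTGATAAATAGTCAGTTAAA",
    "gRNA 551": "GCCTCTGTAGGCTCCACCTC",
    "gRNA 765": "AGCAGGGGCACACTGACACC",
    "gRNA 892": "GTAGATAAAACCACAAAGAT",
    "L1 3' end": "GTGGGTGCAGCGCACCAGCA",  # label assumed for illustration
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(nt in "GC" for nt in seq) / len(seq)


# Map each guide name to (length, GC fraction).
report = {name: (len(seq), gc_fraction(seq)) for name, seq in GUIDES.items()}

for name, (length, gc) in report.items():
    print(f"{name:10s} length={length} nt  GC={gc:.0%}")
```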

2.1.F.3 WRN Assay

HeLa cells were transiently transfected with 1 µg of the Cas9, nickase, or Cas9 fusion protein constructs, 1 µg of the gRNA construct, and 300 ng of the Homology Arm Cassette (HAC). The following day, the transfected cells were switched to media with blasticidin (2 µg/mL) and grown under selection for about 12-14 days until individual colonies were observed. To confirm recombination of the HAC into the WRN gene (Werner syndrome RecQ like helicase), individual blasticidin-resistant colonies were evaluated by semi-nested PCR using combinations of the following primers:

Red set: FWRN Nest1: 5’-GAGCCATGTAGTATATTATGGC-3’ / FWRN Nest3: 5’-ATGTTTCATCCCACCATCTTTAATGAG-3’ and R106Blast: 5’-ATCTCATGCTGGAGTTCTTCGC-3’.


Green set: F107Blast: 5’-ATGGGGATGCTGTTGATTGTAGCCG-3’ and RWRN Nest1: 5’-GCCAACAAACTACTTGTTGAGTAC-3’ / RWRN Nest3: 5’-ATAGTAAACAAGAGTCAAATAGGGA-3’.

The guide RNA was designed to target a unique site (5’-GTCATAGCTACCATAGCTTT-3’) in exon 21 of the WRN gene using the available website www.ZiFit.partners.org.

2.2. Retrotransposition, Alu Rescue assay and insert analysis

Transient Alu retrotransposition and Alu rescue assays were performed in HeLa cells as previously described (Wagstaff et al., 2012). The Alu plasmid pAluYa5-neoTET is an engineered Alu tagged with a retrotransposition indicator cassette for evaluating Alu retrotransposition in culture. This cassette is designed so that the selectable marker (neomycin) is expressed only following an RNA-mediated retrotransposition event of the spliced, tagged Alu RNA (Figure 15). Because Alu retrotransposition requires expression of either the L1 retrotransposon or the L1 ORF2 protein, this construct was used to evaluate the function of the ORF2 fusion proteins. To evaluate insertion preference, the modified pAluYa5-neoTET construct, pBS-Ya5rescue-A70D-SH, driven by the ORF2 protein or the ORF2 fusion constructs was used. Recovery of Alu inserts followed our previously published detailed protocol (Ade and Roy-Engel, 2016). Briefly, for the Alu rescue assay, neomycin resistant colonies from individual flasks were pooled separately and processed independently in order to distinguish potentially “identical” independent insertions from those representing the recovery of the same insertion multiple times. The plasmids containing the recovered Alu inserts were sent for sequencing to Elim Biopharmaceuticals, Inc., Hayward, California. DNA Star Lasergene 10 software was utilized to analyze the flanking genomic sequences. The genomic position of each rescued Alu insertion was determined by a BLAT (http://genome.ucsc.edu) search against the human genome reference (GRCh37/hg19).


Figure 15. Schematic of the Alu retrotransposition assay.

RNA transcription is performed by the internal RNA Pol III promoter of Alu, enhanced by the 7SL upstream sequence. A self-splicing intron interrupts the neomycin (neo) resistance gene driven by the SV40 promoter present in an inverted orientation. Because of this orientation, the intron will splice out only from the transcripts generated by the retroelement’s promoter. When the spliced RNA undergoes retrotransposition, it will generate a new insert tagged with a functional neomycin gene.


2.3. Creation of a HeLa-LoxP cell line and HeLa-EGFP cell line.

To evaluate the ORF2-Cre fusion proteins, we generated the HeLa-LoxP cell line by stably integrating the LoxP target sequence using a previously developed Sleeping Beauty strategy (Izsvak et al., 2009); the cells were grown under blasticidin selection. To create the EGFP-containing HeLa cell line, the pExT plasmid (Addgene #36889) (Tasic et al., 2012) was stably transfected into HeLa cells and kept under hygromycin selection.

2.4 Western Blot Analysis

Two to four T75 flasks of HeLa cells (4 x 10^6/flask) were transiently transfected with 5 µg of plasmid per flask, using Lipofectamine plus (Life Technologies) following the manufacturer’s recommended protocol. Cells were harvested 24 or 48 hours post-transfection using total lysis buffer (50 mM Tris, 150 mM NaCl, 10 mM EDTA, 0.5% Triton X, pH 7.2). Protein samples were sonicated three times for 10 seconds using a Microson Ultrasonic Cell Disrupter and incubated on ice between sonications. The protein concentrations of the lysates were determined using the Bradford protein assay (BioRad #500-0006) with a Bovine Serum Albumin (BSA) standard curve. Equivalent amounts of protein for each sample were mixed with standard Laemmli buffer plus 2-β-mercaptoethanol and boiled for 15 min. A total of 5 µg of each protein extract was electrophoresed on NuPage Bis-Tris gels (Invitrogen/Thermo Scientific) and transferred to a nitrocellulose membrane using the iBlot gel transfer system (Invitrogen/Thermo Scientific) for 8 minutes, using the P3 setting. Membranes were blocked for at least one hour in PBS pH 7.4, 0.05% Tween 20, 5% non-fat dry milk (BioRad) at room temperature. The membrane was incubated with primary antibody overnight at 4ºC. Custom polyclonal rabbit antibodies were generated against amino acids 159–172 of the mouse L1 ORF2 endonuclease domain (deHaro et al., 2014), or a primary monoclonal antibody against the endonuclease of L1 ORF2 was used (Sokolowski et al., 2014). The secondary HRP-donkey anti-rabbit (Santa Cruz Biotechnology Inc; sc-2317) or goat anti-mouse (Santa Cruz sc-2317 and Santa Cruz sc-2020) antibody was diluted 1:5000 in PBS pH 7.4, 0.05% Tween 20, 3% non-fat dry milk (BioRad) and incubated for at least 1 h at room temperature. Signals were detected using the SuperSignal West Pico Chemiluminescent Substrate (Pierce, Rockford, IL). The protein standard was visualized using the Precision Protein StrepTactin-HRP conjugate (BioRad).

2.5 Motif instances

The identification of instances and respective locations of motifs was implemented in Python. The sequence of interest and its reverse complement were identified using the regex Python library against the human hg19 reference sequence with a maximum of 1, 2, or 3 errors, as indicated. Distances to the closest L1-endonuclease site were calculated using the ‘closest’ function available from BEDtools (Quinlan and Hall, 2010).
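The general approach can be illustrated with a self-contained sketch. This is not the code used in this work: it replaces the regex library and BEDtools with plain standard-library Python, counts substitutions only (the "errors" allowed by a fuzzy regular expression may also include insertions and deletions), and runs on a toy sequence with hypothetical site coordinates. All function names are illustrative.

```python
from bisect import bisect_left


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[nt] for nt in reversed(seq.upper()))


def find_motif(genome: str, motif: str, max_mismatches: int):
    """Scan both strands; return (start, strand, mismatches) for each hit.

    Only substitutions are counted (Hamming distance), which is an
    assumption of this sketch.
    """
    genome, motif = genome.upper(), motif.upper()
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    k = len(motif)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, pattern in patterns:
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


def distance_to_closest(position: int, sorted_sites: list) -> int:
    """Distance from position to the nearest entry of a non-empty sorted list."""
    i = bisect_left(sorted_sites, position)
    neighbours = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - site) for site in neighbours)


# Toy data (hypothetical): one exact hit, one reverse-strand hit,
# and one hit carrying a single substitution.
GENOME = "AAGACCTGAAAACAGGTCTTGACGTGAA"
MOTIF = "GACCTG"
ENDO_SITES = [5, 18, 40]  # hypothetical, sorted L1-endonuclease site positions

for start, strand, mismatches in find_motif(GENOME, MOTIF, 1):
    dist = distance_to_closest(start, ENDO_SITES)
    print(f"start={start} strand={strand} mismatches={mismatches} closest={dist}")
```

On a genome-scale reference, a window-by-window scan in pure Python would be slow; the sketch is meant only to make the logic of the two steps explicit.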


Chapter 3. The effect of the multimeric DNA binding proteins, TZ and LZ and Cre on redirecting ORF2p targeting capabilities

3.1. Introduction

DNA binding domains allow certain cellular proteins to interact with specific genomic sequences. Site-specific recognition is conferred by discrete tertiary structures present in the properly folded protein, such as leucine zippers, zinc fingers, zinc knuckles, and SAND motifs (Bottomley et al., 2001). Sequence recognition capabilities can be dictated by the direct interaction of either one protein unit (monomeric) or by the creation of a recognizing unit formed by several protein units (multimeric). Usually, when multimeric binding proteins have the ability to form complex quaternary structures, this multimerization is a requirement for proper recognition and binding of the target DNA sequence (Chen and Pirrotta, 1993). Proteins that require multimerization to impart targeting and binding specificity can be composed of the same protein (homo-polymeric), or different protein units (hetero-polymeric) [reviewed in (Hudson and Ortlund, 2014)].

Previous studies demonstrate that integrating DNA binding domains into proteins can redirect the interactions of the engineered fusion protein to selected DNA sequences. For example, both the yeast two-hybrid and the mammalian two-hybrid systems rely on the interaction of fusion proteins to determine potential protein-protein interactions (Fields and Song, 1989; Luo et al., 1997). Another example of activation domains fused to a DNA binding domain utilizes the CRISPR/Cas9 targeting system. In this system, the endonuclease deficient variant of the Cas9 protein, dCas9, is fused to the VP64 activation domain in order to drive expression of VEGFA (Maeder et al., 2013). In this chapter, we present the data from evaluating the effect of fusing the Adeno Associated Virus (AAV) Rep proteins and the Cre recombinase to the L1 ORF2 protein on redirecting ORF2p targeting capabilities.

The Adeno Associated Virus (AAV) genome contains two open reading frames, one of which is the Rep gene. The Rep gene codes for four proteins: Rep78, Rep68, Rep52, and Rep40. The two largest Rep proteins, Rep78 and Rep68, stimulate replication and have endonuclease, helicase, and ATPase activities (Tratschin et al., 1984; Im and Muzyczka, 1990; Ni et al., 1994; Ward et al., 1994; Wonderling et al., 1995). These proteins bind to the specific Rep recognition sequences (RRS) located in inverted terminal repeats in the viral genome (Im and Muzyczka, 1989; Owens et al., 1993; Linden et al., 1996). Targeted binding occurs through multimerization of the Rep proteins (Im and Muzyczka, 1989; Owens et al., 1993; Smith et al., 1997), possibly as hexamers (Smith et al., 1997). The human genome contains a similar RRS sequence, located in what is referred to as the AAVS1 region on chromosome 19, which can interact with the large viral Rep proteins (Weitzman et al., 1994) and likely mediates targeted integration of the AAV virus at this locus.


Fusion of the Rep78 protein N-terminus to GCN4-based multimerization domains allowed for two novel DNA binding domains to be used in engineering proteins with DNA targeting capabilities (Owens et al., 1993; Cathomen et al., 2000). The wild-type leucine zipper (LZ) forms a dimer, and the engineered leucine zipper (TZ) functions as a tetramer (Waterman et al., 1996) (Figure 16). Using both a one-hybrid system and electromobility shift assays, the authors demonstrated that the dimerization and tetramerization of LZ and TZ, respectively, was a critical requirement for the Rep fusion proteins to bind to the RRS sequence.


Figure 16. The Rep proteins LZ and TZ form obligate dimers and tetramers, respectively, in order to bind their genomic target sequence.

Both LZ and TZ bind to a 16 bp sequence in the AAVS1 locus, located on chromosome 19 in the human genome. The wild-type leucine zipper LZ forms a dimer, while the engineered TZ domain forms a tetramer to target the AAVS1 locus.


When fused to the N-terminus of the Sleeping Beauty (SB) transposase, the engineered LZ and TZ DNA binding domains redirected the insertions of the SB transposon (Ammar et al., 2012). Both the LZ- and TZ-transposase fusions were able to drive the SB transposon to integrate into the genome, but at a lower rate than the wild-type SB (Ammar et al., 2012). Additionally, using a mammalian one-hybrid system, the authors demonstrated that both fusion proteins were able to bind to the RRS motif. These observations confirmed that fusing the TZ or LZ Rep proteins to the N-terminus of the SB transposase did not abolish either transposon activities or RRS binding capabilities. Using an inter-plasmid transposition assay, the authors demonstrated that there was a 15-fold increase in SB transposition events that occurred into a particular TA dinucleotide sequence located approximately 700 bp downstream of the RRS target sequence. The authors found that TZ- and LZ-SB fusions promoted a 2.7-fold increase in SB insertions near genomic RRS sequences, indicating the potential targeting capabilities of these DNA binding domains. These results strongly suggest that the Rep proteins were able to redirect the SB insertion preference.

Cre recombinase is well characterized for its ability to specifically bind a target sequence and is a widely used tool for implementing and evaluating targeted mutation and integration strategies. The ~39 kDa Cre recombinase protein is derived from the P1 bacteriophage and recognizes a 34 bp recognition sequence, known as LoxP (Hamilton and Abremski, 1984). A LoxP sequence is composed of two 13 bp palindromic Cre binding sites, separated by an 8 bp spacer sequence (Figure 17). One Cre monomer binds to an individual palindromic repeat within the LoxP sequence, resulting in two Cre recombinase proteins bound to a single LoxP sequence. Successful recombination occurs when two distinct LoxP sites are brought together, forming a Cre recombinase protein tetramer. Through this interaction of four recombinase molecules and two LoxP sequences, recombination occurs within the 8 bp spacer region of each LoxP sequence (Voziyanov et al., 1999).
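The 13-8-13 architecture described above can be made concrete with a short sketch. The sequence used here is the commonly cited wild-type loxP site; it is not given in this text and is included as an assumption for illustration only. The code splits the site into its two arms and spacer and checks that the arms are inverted repeats of each other while the spacer is asymmetric, which is what gives the site its orientation.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[nt] for nt in reversed(seq.upper()))


# Commonly cited wild-type loxP sequence (assumed here; not from this text).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

# 13 bp Cre binding arm, 8 bp spacer, 13 bp Cre binding arm.
left_arm, spacer, right_arm = LOXP[:13], LOXP[13:21], LOXP[21:]

arms_are_inverted_repeats = right_arm == reverse_complement(left_arm)
spacer_is_asymmetric = spacer != reverse_complement(spacer)

print(f"total length: {len(LOXP)} bp")
print(f"left arm:  {left_arm}\nspacer:    {spacer}\nright arm: {right_arm}")
print(f"arms are inverted repeats: {arms_are_inverted_repeats}")
print(f"spacer is asymmetric: {spacer_is_asymmetric}")
```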


Figure 17. Schematic of two Cre recombinase molecules binding to a single LoxP target site in the genome.

Two Cre recombinase monomers bind to the palindromic repeats within LoxP sequences. Successful recombination events occur when two LoxP sequences are brought together, forming a Cre recombinase tetramer. Recombination occurs in the 8 base pairs between the two palindromic repeats.


Soon after the introduction of in vivo Cre-mediated recombination and deletion events, the need for conditional gene knock-outs arose. The role of a deleted gene cannot be thoroughly evaluated if its complete absence results in embryonic lethality. To mitigate these issues, researchers developed conditional knockouts, where transcription of Cre recombinase is driven by a cell or tissue specific promoter (Tsien et al., 1996; Takeda et al., 1998). This strategy results in tissue specific knock-outs of a gene of interest, while maintaining the expression in tissues critical for embryonic survival.

Researchers developed several methods of inducing genetic recombination, deletion, and insertion events temporally using modified Cre recombinase strategies. The first method relies on fusing the Cre recombinase protein with the ligand-binding domain of a modified estrogen receptor (ER). This modification allows for the estrogen receptor to only be activated in the presence of synthetic ER ligands, such as tamoxifen (Feil et al., 1997; Indra et al., 1999; Metzger and Chambon, 2001). In these experiments, Cre-ER fusion proteins remain inactive until treatment with tamoxifen, or its active metabolite. When Cre-ER binds to its ligand, the fusion protein is transported into the nucleus and is able to interact with LoxP flanked sequences. Therefore, genetic recombination events will occur only when the specific compound of interest is administered. A more complete review of Cre-recombinase methods, strategies, applications, and limitations can be found in the review articles by Murray et al. (2012) and Van Duyne (2015).

In this study, we elected to evaluate whether fusing the LZ Rep protein, TZ Rep protein, or Cre recombinase to the N-terminus of the L1 ORF2 protein is capable of redirecting Alu insertions to the target site. We chose to test the Rep proteins for two reasons. First, previous studies showed that these DNA binding domains enriched for SB transposition events near the RRS sequence (Ammar et al., 2012). Second, these proteins are non-pathogenic and can be utilized in a variety of cell lines (Flotte et al., 1994; Kaplitt et al., 1994; Russell et al., 1994). Finally, we chose to evaluate Cre fused to both the N- and C-terminus of the ORF2 protein because Cre-recombinase methods have been the gold standard in genome editing techniques for the past few decades (for comprehensive reviews, see Murray et al., 2012; Van Duyne, 2015).

3.2. Results

3.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins.

To test the ability of TZ-ORF2, LZ-ORF2, C- and N-Cre-ORF2 fusion proteins to drive retrotransposition, the constructs were co-transfected with an Alu tagged with the neoTET cassette (Dewannieux et al., 2003) to determine Alu retrotransposition in HeLa cells or the HeLa-LoxP cell line (see Materials and Methods for details). The ORF2 expression construct (pBudORF2CH) was used as control and as reference to evaluate retrotransposition efficiency of the ORF2 fusion proteins. All fusion proteins were able to support Alu retrotransposition. However, only the N-Cre-ORF2 and C-Cre-ORF2 fusion constructs supported Alu retrotransposition with efficiencies comparable to the unfused ORF2 construct (no significant difference, P = 0.6974 and P = 0.9881, respectively; Table 1). Both TZ-ORF2 and LZ-ORF2 fusion constructs had significantly lower retrotransposition rates when compared to the ORF2 control (Table 1, P < 0.05). These data suggest that fusing TZ, LZ, or Cre proteins to the N- or C-terminus of the L1 ORF2 protein does not prevent ORF2 from driving Alu retrotransposition events.

Table 1: Relative retrotransposition rates of Alu when driven by ORF2 or by the indicated fusion protein.

ORF2/fusion protein        Cell line    Retrotransposition rate    Relative to ORF2 (%)
ORF2 (HeLa control)        HeLa         8.93 ± 0.13 x 10^4         100
LZ-ORF2                    HeLa         9.46 ± 0.15 x 10^4         106.0 ± 0.9#
TZ-ORF2                    HeLa         9.49 ± 0.11 x 10^4         106.3 ± 1.25#
ORF2 (HeLa LoxP control)   HeLa LoxP    9.68 ± 0.19 x 10^4         100
N-Cre-ORF2                 HeLa LoxP    9.84 ± 0.13 x 10^4         99.2 ± 1.9
C-Cre-ORF2                 HeLa LoxP    9.67 ± 0.53 x 10^4         99.9 ± 4.2

#: Significantly different from ORF2 control, P < 0.05 (Unpaired Student T-test)
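
The two derived quantities reported in Table 1, the rate relative to the ORF2 control and the unpaired Student t-test comparison, can be sketched as follows. This is only an illustration and not the analysis pipeline used in this work: the function names and the replicate values are hypothetical, and the sketch stops at the t statistic and degrees of freedom (the P value would then be read from a t distribution).

```python
from math import sqrt
from statistics import mean, variance


def relative_rate_percent(test_rates, control_rates):
    """Mean test rate expressed as a percentage of the mean control rate."""
    return 100.0 * mean(test_rates) / mean(control_rates)


def unpaired_t_statistic(a, b):
    """Student's t statistic for two independent samples (pooled variance).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical replicate rates (arbitrary units), NOT data from this study.
control = [10.0, 11.0, 12.0]
fusion = [7.0, 8.0, 9.0]

print(relative_rate_percent(fusion, control))
print(unpaired_t_statistic(fusion, control))
```

A pooled-variance test assumes similar variances in the two groups; whether that assumption holds for a given set of replicates would need to be checked separately.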

3.2.2 Expression of the ORF2 fusion proteins

One concern when creating recombinant proteins is that the fusion protein may be unstable, i.e., degraded or partially processed by cellular enzymes, which would limit its ability to drive targeted Alu insertions. Western blot analyses were performed to determine whether the ORF2 fusion proteins were expressed at the expected molecular weight. ORF2, TZ-ORF2, and LZ-ORF2 were expressed at the predicted sizes (149 kDa for ORF2, and 190 kDa for both TZ-ORF2 and LZ-ORF2) (Figure 18A). However, we were unable to detect the N- and C-terminal Cre-ORF2 fusion proteins at their expected size of ~190 kDa (C-Cre-ORF2: Figure 18B; N-Cre-ORF2: data not shown). Thus, it appears that fusing certain DNA binding proteins, such as Cre, to either terminus of the L1 ORF2 protein may alter the stability of the fusion protein. However, ORF2 activity was still detected (Table 1), which suggests that the fusion proteins are not completely degraded. Instead, processed products retaining ORF2 activity may exist, or a very small amount of the full-length fusion protein may be sufficient to drive Alu retrotransposition efficiently.

Figure 18. Expression analysis of LZ-ORF2, TZ-ORF2, and C-Cre-ORF2 fusion proteins.

A. Evaluation of TZ-ORF2 and LZ-ORF2 fusion proteins. Bands of the expected size for the full-length TZ-ORF2 and LZ-ORF2 fusion proteins (~190 kDa) were observed. B. Evaluation of the C-Cre-ORF2 fusion protein. We were unable to detect a band of the expected size (~190 kDa, indicated by the red arrow) for the full-length C-Cre-ORF2 fusion protein.

3.2.3 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu.

To evaluate the targeting capability of these fusion proteins, we used our previously described Alu rescue assay to recover and determine the locations of de novo Alu insertions (Wagstaff et al., 2012; Ade and Roy-Engel, 2016). We recovered 75 Alu insertions driven by LZ-ORF2 and 14 insertions driven by TZ-ORF2 (Figure 19). All insertions showed the signature characteristics of Alu retrotransposition events: direct repeats with an average length of 14 bp, an A-tail, and insertion at the canonical endonuclease site (5’-TTAAAA-3’) (Wagstaff et al., 2012). Only two of the Alu inserts driven by LZ-ORF2 inserted into chromosome 19, where the AAVS1 target site is located. However, these inserts were not located within 10 kb of the target sequence, and no potential AAVS1 target sequences were recovered. Detailed genomic locations of these insertions are listed in Appendix Tables 1 and 2.

Figure 19. Histogram of the chromosomal distribution of the recovered Alu inserts driven by the TZ-ORF2 and LZ-ORF2 fusion proteins.

We recovered 75 insertions driven by LZ-ORF2 and 14 insertions driven by TZ-ORF2. Only two recovered Alu insertions were located in chromosome 19, which contains the AAVS1 target site; however, the two inserts were in completely unrelated locations in chromosome 19. The X axis corresponds to the chromosome number in the human reference genome, and the Y axis represents the number of Alu insertions recovered in each chromosome. Column color indicates whether the recovered Alu inserts were driven by TZ-ORF2 (orange) or LZ-ORF2 (blue).

Although expression of the full-length fusion protein was not observed, we recovered Alu inserts driven by the C-Cre-ORF2 and N-Cre-ORF2 fusion constructs to verify that they represented bona fide retrotransposition events. We recovered 11 Alu insertions driven by the two Cre-ORF2 fusion proteins. These inserts showed the signature characteristics of retrotransposition described above. Detailed genomic locations are listed in Appendix 3. Analysis of the genomic sequence flanking each insertion verified that the LoxP target sequence was absent from the insertion site and its immediate vicinity.

We further extended our evaluation of the flanking genomic region to 5 kb upstream and 5 kb downstream of each insertion site to assess whether any Alu elements inserted near sequences sharing similarity (allowing for 3 mismatches) to the LoxP target site or the AAVS1 site. None of the genomic locations contained sequences resembling the AAVS1 locus (for insertions driven by the TZ-ORF2 and LZ-ORF2 fusion proteins) or the LoxP target sequence (for insertions driven by N-Cre-ORF2 or C-Cre-ORF2) within that 10 kb span of genomic DNA. Overall, there was no indication that the fused DNA binding domains evaluated altered the insertion preference of Alu elements.
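
The flanking-sequence search described above (a 10 kb window scanned for sites within 3 mismatches of a target) can be sketched as a simple sliding-window comparison on both strands. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function names, the toy target, and the toy window are all hypothetical, and a real analysis would scan the actual 10 kb flanking sequence with the actual target site.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_near_matches(window, target, max_mismatches=3):
    """Scan both strands of `window` for sites close to `target`.

    Returns (start, strand, mismatch_count) tuples, where `start` is the
    0-based offset of the site on the forward strand of `window`.
    """
    hits = []
    n = len(target)
    probes = (("+", target), ("-", reverse_complement(target)))
    for start in range(len(window) - n + 1):
        site = window[start:start + n]
        for strand, probe in probes:
            count = mismatches(site, probe)
            if count <= max_mismatches:
                hits.append((start, strand, count))
    return hits


# Hypothetical 8 bp target and toy window (a real window would span 10 kb).
toy_target = "GATTACAG"
toy_window = "CCCGATTACAGCCCCTGTTATCCCC"
print(find_near_matches(toy_window, toy_target))
```

Note that this sketch counts substitutions only; sites that differ from the target by an insertion or deletion would require an alignment-based search instead.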

3.3. Discussion

Fusion of ORF2 with the AAV Rep proteins showed no capability to redirect Alu insertions to the AAVS1 site. However, previous work showed that these proteins, when fused to the DNA transposon Sleeping Beauty, enriched for SB insertions near the AAVS1 locus (Ammar et al., 2012). A potential explanation for the difference is that the two AAV Rep proteins contain GCN4-based oligomerization domains. Proper binding of the Rep protein requires the formation of multimers: a dimer for LZ and a tetramer for TZ (Owens et al., 1993; Cathomen et al., 2000). We propose that LZ and TZ are able to form their obligate multimers when fused to the Sleeping Beauty transposon without interfering with its insertion ability, as this transposon has itself been proposed to function and bind to DNA as a multimer (Izsvak et al., 2002). Thus, in this case the multimerization needed for targeting may not interfere with Sleeping Beauty function.

Similarly, the Cre recombinase protein was unable to enrich Alu insertions near LoxP sites when fused to either the N- or the C-terminus of the L1 ORF2 protein. Like the Rep proteins, Cre functions as a multimer in order to effectively target the LoxP sequence, with some studies estimating that Cre targets the LoxP site as a hexamer (Smith et al., 1997). The requirement for multiple Cre monomers to assemble at a single LoxP target site might have had an adverse effect on the ability of the L1 ORF2 fusion proteins to drive Alu insertions. An additional concern is that we were unable to detect the full-length C-Cre-ORF2 fusion protein by Western blot analysis (Figure 18B). In this case, the lack of a full-length fusion protein is the most likely explanation for the inability of the Cre-ORF2 fusion proteins to target Alu inserts to a genomic location. Our data suggest that some engineered fusion proteins might be more susceptible to processing and thereby lose their ability to confer targeting. Therefore, it may not be possible to generate stable fusions between the Cre recombinase and the L1 ORF2 protein with these constructs. Moreover, even if some full-length Cre-ORF2 fusion protein was generated, its requirement for multimerization may not be compatible with ORF2 function.

We propose a model in which the multimerization requirements of the TZ, LZ, and Cre molecules at an individual target site hinder the L1 ORF2 protein from driving Alu retrotransposition events near the AAVS1 or LoxP target sites (Figure 20). As stated previously, the success of fusing a Rep protein to the DNA transposon Sleeping Beauty could have been due to both proteins being obligate multimers. We speculate that these three ORF2 fusion proteins were unable to enrich for Alu insertions near their respective target sequences because multimerization of the DNA binding domain interfered with the ability of the fused ORF2 protein to drive Alu retrotransposition. Only when ORF2 interacted with the DNA in its monomeric form was it able to drive Alu retrotransposition successfully. Because the monomeric form of the DNA binding protein is unable to bind to the selected sequence, targeted Alu insertions would not occur. Based on these data, we decided to focus on DNA binding domains that function and bind to their target sequences as monomers.

Figure 20. Potential model of why TZ, LZ, and Cre ORF2 fusion proteins were unable to drive targeted Alu insertions.

A. Multimers of the fusion protein would interact with the target site but would be unable to drive insertions. We propose that if the LZ (blue), TZ (orange), or Cre (red) DNA binding proteins were able to form their obligate multimers and bind to the appropriate target site, the L1 ORF2 protein may be unable to function or interact with the flanking genomic DNA sequences, thereby inhibiting targeted Alu insertions. B. Monomeric ORF2 fusion proteins would drive Alu insertions but would lose the capability to interact with a target site. In this scenario, the ORF2 portion of the fusion protein interacts with the genomic DNA to drive Alu insertions, but does not allow the obligate multimerization of the TZ, LZ, or Cre recombinase to occur. Therefore, targeted Alu retrotransposition events would not occur.

Chapter 4. The Transcription Activator-like Effector: The effect of monomeric DNA binding domains with a few genomic target sequences on targeting capabilities.

TAL Effector that targets the AAVS1 locus on chromosome 19 fused to the N-terminus of the L1 ORF2 protein.

4.1. Introduction

The transcription activator-like (TAL) effectors were discovered in the genomes of Xanthomonas bacteria (reviewed in Boch and Bonas, 2010). TAL effectors are secreted proteins that modify the genomes of the plant species these bacteria colonize (Bogdanove et al., 2010; Boch and Bonas, 2010). The naturally occurring TAL effectors contain a variable number of monomeric repeats, from 1.5 to 33.5 monomers, each composed of 34 amino acids (Boch and Bonas, 2010). Each monomer binds a single base pair, with specificity determined by the two amino acid residues at the 12th and 13th positions of the monomer, termed the repeat variable diresidue (RVD): NI = A, HD = C, NG = T, NN = G or A (Boch et al., 2009; Moscou and Bogdanove, 2009). The combination of individual monomers forms a TAL effector capable of highly specific targeting (Boch et al., 2009).
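
The RVD code listed above lends itself to a small lookup-table sketch. The example below is illustrative only: the function names are hypothetical, the choice of NN for every G is a simplifying assumption, and every base of the site is treated as bound by one repeat, which ignores details of real TAL effector architecture. The example site is the AAVS1 target sequence given in Table 3.

```python
# RVD-to-base code as listed in the text (NN recognizes G or A).
RVD_TO_BASES = {"NI": {"A"}, "HD": {"C"}, "NG": {"T"}, "NN": {"G", "A"}}

# Reverse lookup for design; using NN for G is a simplifying assumption.
BASE_TO_RVD = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}


def design_rvds(target):
    """Return one RVD per base of `target`, in order."""
    return [BASE_TO_RVD[base] for base in target.upper()]


def rvds_recognize(rvds, site):
    """True if every RVD accepts the base at its position in `site`."""
    if len(rvds) != len(site):
        return False
    return all(base in RVD_TO_BASES[rvd] for rvd, base in zip(rvds, site.upper()))


aavs1_target = "TGTGGGGTGGAGGGGACA"
rvds = design_rvds(aavs1_target)
print(" ".join(rvds))
print(rvds_recognize(rvds, aavs1_target))
```

Because NN accepts either G or A, an array designed this way also accepts sites in which a G is replaced by an A, which is one way that near-matches to a target can arise.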

The ability of TAL effectors to interact specifically with selected sequences led researchers to develop strategies for designing and modifying TAL effectors for a variety of genetic engineering purposes. The binding sites of natural TAL effectors always begin with a thymidine residue. Therefore, the target sites of engineered TAL effectors must also begin with a thymidine (Moscou and Bogdanove, 2009; Boch et al., 2009). This appears to be the only limitation in designing TAL effectors, making them highly versatile for targeting a large variety of sequences. In fact, a recent study developed TAL effectors that could target over 18,000 protein-coding genes in the human genome (Kim et al., 2013). The flexibility of TAL effectors has been exploited to perform genome editing. For example, TAL effectors have been fused to endonucleases to form TAL effector nucleases (TALENs). One successful TALEN was designed to disrupt the CCR5 gene (Mussolino et al., 2011), while a different approach inserted genes required for maintaining pluripotent stem cells into the human genome (Mussolino et al., 2011; Hockemeyer et al., 2011).

We selected a previously tested TAL effector designed to target the AAVS1 locus on human chromosome 19 (Sanjana et al., 2012). We chose this TAL effector for multiple reasons. First, the AAVS1 site might be a safe harbor for genome manipulation, as previous studies have used this site for genomic engineering with both AAV and TALEN strategies (Hockemeyer et al., 2011; Ammar et al., 2012). Additionally, the selected TAL effector targets a site that is surrounded by multiple A-rich regions of DNA, which could function as L1 ORF2 endonuclease sites. This attribute is critical, as cleavage of DNA by the L1 ORF2 protein is required for TPRT and retrotransposition. To test this TAL effector, we fused it to the N-terminus of the L1 ORF2 protein via a glycine helical linker (GHL) (Voigt et al., 2012) (Figure 21).

Figure 21. Schematic of TAL-ORF2 fusion protein.

The AAVS1-targeting TAL effector was fused to the N-terminus of the ORF2 protein, connected by a glycine helical linker (GHL). The target sequence in the AAVS1 site in chromosome 19 is shown.

4.2. Results

4.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins.

To evaluate the ability of the TAL-ORF2 fusion protein to drive retrotransposition, we co-transfected either the ORF2 or the TAL-ORF2 expression construct with an Alu tagged with the neoTET cassette (Dewannieux et al., 2003). In our tissue culture assay system, ORF2 with the N-terminally fused TAL effector supported Alu retrotransposition with an efficiency comparable to that of ORF2 alone (Table 2).

Table 2. Relative retrotransposition rates of Alu when driven by ORF2 or by the N-terminally fused TAL effector.

ORF2/fusion protein    Retrotransposition rate    Relative to ORF2 (%)
ORF2 control           8.93 ± 0.13 x 10^4         100
TAL-ORF2               8.89 ± 0.22 x 10^4         99.6 ± 2.4

There was no significant difference in retrotransposition between the TAL-ORF2 protein and the L1 ORF2 protein control, P = 0.4228 (Unpaired Student T-test).

4.2.2 Expression of the TAL-ORF2 fusion protein

Western blot analyses determined that the ~250 kDa fusion protein was expressed at the predicted size (Figure 22). From these two results, we determined that fusing this TAL effector to the N-terminus of the L1 ORF2 protein did not hinder the ability of ORF2 to drive Alu retrotransposition events, nor did it appear to negatively impact the stability of the fusion protein.

Figure 22. Expression analysis of the TAL-ORF2 fusion protein.

Western blot analysis of the TAL-ORF2 fusion protein transiently expressed in HeLa cells. The ORF2 protein was visualized at approximately 149 kDa (gray arrow). The TAL-ORF2 fusion protein was visualized at approximately 250 kDa (brown arrow).

4.2.3 Evaluation of the capability of the TAL-ORF2 fusion protein to redirect insertion preference of the Alu.

We used the Alu rescue assay to determine the genomic locations of Alu inserts driven by the TAL-ORF2 fusion protein (Wagstaff et al., 2012). We recovered 42 Alu insertions (Figure 23; detailed data in Appendix Table 4). None of the recovered Alu elements inserted into chromosome 19, where the AAVS1 locus is located. We evaluated the flanking genomic regions of all recovered Alu inserts (5 kb upstream and 5 kb downstream of each insertion) to assess whether any inserted near sequences sharing similarity to the AAVS1 target site. We found two insertions located near sequences that share roughly 67% to 72% sequence similarity with the AAVS1 target sequence (Table 3). Interestingly, these two Alu elements inserted 38 bp and 140 bp away from the potential target site. Because we recovered only two “potential” targeted Alu insertions, neither of which was a 100% match to the AAVS1 target, we conclude that the TAL-ORF2 fusion had no significant effect on the insertion preference of the tagged Alu.

Table 3. Two potential Alu targeting events when Alu is driven by TAL-ORF2.

Alu    Chr.    Potential target      % Match    Distance to target away from Alu
1      1       AATGGGTTGGAGGAG-CA    72.2%      38 bp
2      20      TGTGGG-TGGAGTGAGGG    66.7%      140 bp

The TAL effector targets the sequence TGTGGGGTGGAGGGGACA. Red letters indicate a match to the target sequence. Dashes represent a missing base pair.
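
The "% Match" values in Table 3 can be reproduced by a position-by-position comparison of each aligned candidate against the 18 bp target, counting a dash as a mismatch. The sketch below is an illustration of that arithmetic; the function name is hypothetical, and it assumes the candidates are already aligned to the target exactly as printed in the table.

```python
def percent_match(candidate, target):
    """Percentage of target positions matched by an aligned candidate.

    Both strings must already be aligned to the same length; a dash in the
    candidate marks a missing base and is counted as a mismatch.
    """
    if len(candidate) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for c, t in zip(candidate, target) if c == t and c != "-")
    return 100.0 * matches / len(target)


TARGET = "TGTGGGGTGGAGGGGACA"
CANDIDATES = [("Alu 1", "AATGGGTTGGAGGAG-CA"), ("Alu 2", "TGTGGG-TGGAGTGAGGG")]

for name, candidate in CANDIDATES:
    print(name, round(percent_match(candidate, TARGET), 1))
# prints "Alu 1 72.2" and then "Alu 2 66.7"
```

These correspond to 13 of 18 and 12 of 18 matching positions, respectively.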

Figure 23. Distribution of Alu insertions driven by the TAL-ORF2 fusion protein.

We recovered 42 Alu insertions in total. No recovered Alu elements inserted in chromosome 19, the location of the AAVS1 target site (red box).

4.3 Discussion

Fusing a TAL effector to the N-terminus of the L1 ORF2 protein did not enrich for insertions near the AAVS1 locus in chromosome 19. Although we did not recover any insertions within chromosome 19, we did recover two Alu insertions within 200 bp of plausible target sites that contained multiple mismatches. One conclusion from this strategy is that multiple factors likely need to be considered when designing and implementing targeting strategies. This TAL effector is able to target the AAVS1 locus as a monomer, which eliminates the potential negative effect that multimerization may have on ORF2 targeting (Chapter 3). Monomeric binding alone, however, was not sufficient to redirect insertions. The frequency of available target sites in the genome might also play an important role in determining the targeting capabilities of an engineered targeting system (as will be demonstrated in Chapter 6). Therefore, we postulate that our strategy may be more successful in enriching Alu insertions near a target sequence when the target site is highly abundant in the human genome.

Chapter 5. Zinc finger proteins: The effect of monomeric DNA binding domains with a few genomic target sequences on targeting capabilities

Zinc Fingers targeting the EGFP locus

5.1 Introduction

Zinc finger technologies were among the first techniques employed by researchers for genome engineering. This is in part due to the abundance of zinc fingers already present in eukaryotic genomes (Gaj et al., 2013). The Cys2His2 zinc finger proteins were first identified in the structure of the transcription factor TFIIIA and were the first to be evaluated for function and structure (Miller et al., 1985; Brown et al., 1985). Analyses of the crystal structure showed that an individual zinc finger is approximately 30 amino acids long and has a canonical ββα configuration that mediates the interaction between the zinc finger protein and the target DNA (Pavletich and Pabo, 1991; Beerli et al., 1998; Beerli and Barbas, III, 2002). The specificity of a given zinc finger is determined by the amino acid residues present at what is known as ‘the fingertip’ of the protein (Wolfe et al., 2000; Laity et al., 2001). Each individual zinc finger recognizes a 3 bp target sequence, and successful targeting requires binding to a DNA sequence that is 9 to 18 bp in length. Therefore, most engineered zinc finger proteins contain between 3 and 6 fingers to effectively target a unique DNA sequence in a genome (Beerli et al., 1998; Beerli et al., 2000; Gaj et al., 2013). Due to the targeting ability of zinc fingers, multiple strategies have been designed and implemented to simplify the process of creating novel zinc fingers.
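
The relationship between finger number, target length, and uniqueness in a genome can be illustrated with a rough back-of-the-envelope calculation. The sketch below rests on assumptions that are not from this work: a haploid human genome of approximately 3.1 billion bp, equal and independent base frequencies, and both strands counted. Real genomes are repetitive and compositionally biased, so the numbers are only an order-of-magnitude guide, and the function names are hypothetical.

```python
GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def site_length(fingers, bp_per_finger=3):
    """Length in bp of the site bound by an array of zinc fingers."""
    return fingers * bp_per_finger


def expected_random_sites(length_bp, genome_bp=GENOME_BP):
    """Expected chance occurrences of one specific site of `length_bp`.

    Assumes equal, independent base frequencies and counts both strands.
    """
    return 2 * genome_bp / 4 ** length_bp


for fingers in (3, 4, 5, 6):
    length = site_length(fingers)
    print(fingers, length, expected_random_sites(length))
```

Under these assumptions, a 9 bp site is expected to occur by chance many thousands of times, whereas an 18 bp site is expected less than once, which is consistent with the use of longer arrays to address unique sequences.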

Zinc finger proteins cannot be used for genetic engineering by themselves, as their main role is to mediate DNA binding. In other words, zinc finger proteins do not contain the catalytic activities required to modify the genome. Thus, most zinc finger proteins are fused to functional domains to create molecular tools. For example, an endonuclease domain (or an entire protein) is added to the zinc finger in order to achieve targeted DNA cleavage. This concept was first demonstrated with the FokI cleavage domain, which cleaves DNA without sequence specificity when separated from the FokI DNA binding domain (Li et al., 1992). Chandrasegaran’s lab showed that the FokI binding domain could be swapped for other targeting domains, redirecting FokI-mediated cleavage (Li et al., 1992; Kim and Chandrasegaran, 1994; Kim et al., 1996; Kim et al., 1998). This discovery drove the creation of engineered zinc finger nucleases designed to perform targeted gene mutagenesis and gene replacement at specific loci in the Drosophila soma and germline (Bibikova et al., 2001; Bibikova et al., 2002; Bibikova et al., 2003). After their successful implementation, several methods for constructing zinc finger nucleases were developed in order to easily target unique locations in the genome. As of 2013, individual zinc fingers that recognize nearly all of the 64 possible nucleotide triplets had been constructed, allowing for the creation of zinc fingers that target almost any sequence in the human genome (Gaj et al., 2013).

These developments led to several successful genetic manipulations in both human cells and model organisms. For example, the CCR5 and T-cell receptor (TCR) functions were disrupted in human cell lines using zinc finger nuclease technologies (Perez et al., 2008; Holt et al., 2010; Gaj et al., 2012). Furthermore, zinc finger nucleases were used to successfully integrate the VEGF-A gene and correct the function of several others in human cell lines (Maeder et al., 2008; Gaj et al., 2013). In general, these nucleases can be designed and modified with relative ease and can be employed for multiple strategies. Due to their wide range of uses and flexibility of design, we decided to use two zinc fingers, ZF2.17 and ZF2.18, previously designed to target sequences in the enhanced green fluorescent protein (EGFP) gene (kind gift of Dr. Zoltan Ivics, unpublished data). Each zinc finger is designed to target 9 bp in the EGFP gene: ZF2.17 targets the sequence 5’-GAGGACGGC-3’, and ZF2.18 targets the sequence 5’-ATCCGCCAC-3’. These two proteins can be linked together to form a 6-fingered zinc finger targeting an 18 bp sequence in EGFP, 5’-ATCCGCCACnnnnnnGAGGACGGC-3’. As with the other DNA binding domains, we fused these zinc fingers to the N-terminus of the L1 ORF2 protein (Figure 12) via a glycine helical linker (GHL) (Voigt et al., 2012).
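
The composite 6-finger site described above, two 9 bp half-sites separated by a 6 bp spacer of any sequence, can be expressed as a regular expression. The sketch below is illustrative only: the constant and function names are hypothetical, the toy sequence is made up and is not the EGFP sequence, and only the forward strand is scanned.

```python
import re

ZF2_17_SITE = "GAGGACGGC"  # 9 bp site given in the text for ZF2.17
ZF2_18_SITE = "ATCCGCCAC"  # 9 bp site given in the text for ZF2.18

# 5'-ATCCGCCACnnnnnnGAGGACGGC-3', where n is any base.
COMPOSITE_SITE = re.compile(ZF2_18_SITE + "[ACGT]{6}" + ZF2_17_SITE)


def find_composite_sites(sequence):
    """Return 0-based start offsets of composite sites on the forward strand."""
    return [match.start() for match in COMPOSITE_SITE.finditer(sequence.upper())]


# Made-up toy sequence with a hypothetical spacer (NOT the EGFP sequence).
toy_sequence = "TTT" + ZF2_18_SITE + "AAGCTT" + ZF2_17_SITE + "GG"
print(find_composite_sites(toy_sequence))
```

To cover both orientations, the same search could be repeated on the reverse complement of the input sequence.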

5.2 Results

5.2.1 Evaluation of the retrotransposition capability of the ORF2-fusion proteins.

As in the previous chapters, we evaluated the capability of the ORF2 fusion proteins to drive Alu retrotransposition, with one modification. We first introduced the EGFP sequence into the HeLa genome using a previously described Sleeping Beauty transposition strategy (details in Chapter 2) to provide the selected target sequence. The new EGFP-containing HeLa cell line is referred to as HeLa EGFP. Using this cell line, we compared the retrotransposition rates of Alu driven by either L1 ORF2 or the zinc finger-ORF2 fusion proteins. Our results show that the N-terminally fused zinc fingers were able to drive Alu retrotransposition events. However, two of the three EGFP-targeting fusion proteins did not reach efficiencies comparable to the ORF2 construct: both the ZF2.18-ORF2 and ZF2.17 2.18-ORF2 constructs supported lower levels of Alu retrotransposition than ORF2 alone (Table 4).

Table 4. Relative retrotransposition rates of Alu when driven by ORF2 or by N-terminally fused EGFP-targeting zinc fingers.

ORF2/fusion protein    Cell line    Retrotransposition rate    Relative to ORF2 (%)
ORF2 control           HeLa EGFP    8.93 ± 0.13 x 10^4         100
ZF2.17-ORF2            HeLa EGFP    8.89 ± 0.22 x 10^4         119.9 ± 23.8
ZF2.18-ORF2            HeLa EGFP    8.93 ± 0.13 x 10^4         90.5 ± 3.9#
ZF2.17 2.18-ORF2       HeLa EGFP    8.89 ± 0.22 x 10^4         82.4 ± 11.7#

#: Significantly different from ORF2 control, P < 0.05 (Paired Student T-test). P = 0.18 for the ZF2.17-ORF2 construct.

Despite multiple attempts, we were unable to detect the full-length fusion proteins by Western blot analysis (data not shown). From these two results, we determined that these fusion proteins retained the ability to drive Alu retrotransposition events, but that the fusion proteins might not be stable and therefore might not be capable of driving targeted Alu insertions.

5.2.2 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu.

We proceeded to recover Alu inserts driven by these fusion proteins using our previously described Alu rescue assay (Wagstaff et al., 2012). We recovered 14 Alu insertions driven by ZF2.17-ORF2, 12 insertions driven by ZF2.18-ORF2, and 11 insertions driven by the ZF2.17 2.18-ORF2 construct (detailed data in Appendix Table 5). We were unable to identify the EGFP sequence within 700 bp of any Alu insertion (the average length of a sequencing read). In addition, none of the recovered flanking sequences (within 10 kb of the Alu insert) remotely resembled the EGFP target sequences. Therefore, we determined that this strategy was unable to enrich Alu insertions near the EGFP target site.

5.3 Discussion

There are several potential explanations for why fusing these zinc fingers to the N-terminus of the ORF2 protein failed to target Alu insertions. First, although these zinc finger proteins had previously been shown to target the EGFP sequence (Zoltan Ivics, personal communication), it is possible that targeting is hindered in the fusion context. Second, these fusion proteins may not be stable, as indicated by the lack of detection by Western blot analysis. If these three fusion proteins are processed prior to retrotransposition, it is very unlikely that the processed proteins can support targeted Alu retrotransposition events. Therefore, the stability of fusion proteins needs to be taken into consideration when engineering designer fusion proteins.

Lastly, the number of available target sites might also play a key role in the success of fusion protein strategies. We generated a HeLa cell line that contained at least one perfect EGFP target sequence per cell. However, the low frequency of the target sequence may represent a challenge, as it forces the zinc finger-ORF2 fusion protein to scan the whole genome to encounter the site. Having a minimal number of target sites can be ideal, especially for strategies where off-target effects can be deleterious. However, a DNA binding domain that targets a site present multiple times in the genome might be more effective. This can be seen in the naturally occurring zinc fingers of the Bombyx mori retrotransposon protein, which target a highly repetitive sequence in the genome (rDNA) (Burke et al., 1987). Based on these data, along with the results in Chapters 4 and 6, we propose that the number of available target sites is also a key factor in determining the success of a fusion protein strategy.

Chapter 6. N-ZF4: A 6-fingered zinc finger that targets multiple sites in the genome enriches for Alu insertions when fused to the N-terminus of ORF2

6.1. Introduction

Several non-LTR retrotransposons contain DNA binding domains, such as zinc fingers, at their N-termini. These DNA binding domains allow for targeted integration of the retrotransposon into predictable genomic locations. For example, the non-LTR retrotransposon R2, isolated from Bombyx mori, contains a zinc finger at its N-terminus that confers site-specific insertion into the 28S rDNA in the genome. Without these consistent, specific insertions into the 28S ribosomal DNA sequences, the field would not have the model of Target Primed Reverse Transcription (TPRT) (Luan and Eickbush, 1995; George et al., 1996; Christensen et al., 2005; Christensen et al., 2006).

One explanation for the targeting efficiency of these retrotransposons is that they evolved to target repetitive sequences in the genome. By targeting repetitive sequences, site-specific retrotransposons encounter a large number of available insertion sites. For example, there are hundreds of tandem copies of the 28S rDNA sequence that the retroelement R2 targets in the human genome (Lander et al., 2001). Retrotransposons containing a DNA binding domain that targets repetitive elements have two major evolutionary advantages. First, due to the abundance of target sequences in the genome, de novo retrotransposition events have many opportunities to insert into the correct sequence. Second, cells will not lose the function of a critical gene if a retrotransposon inserts into one of the copies, as several unaltered copies of the gene remain. Therefore, de novo insertions have multiple opportunities to insert into their target sequences, and cells are not irreparably harmed by them.

In 2012, a group of researchers redirected insertions of the DNA transposon Sleeping Beauty (SB) by fusing a six-finger zinc finger to the N-terminus of the SB transposon (Voigt et al., 2012). The zinc finger used (ZF4) targets an 18 bp sequence in the 3’ end of genomic L1 elements, which provides upwards of 15 thousand potential target sequences throughout the human genome. When compared to the wild-type Sleeping Beauty construct, the addition of the ZF4 protein to the N-terminus increased targeted events 2.5-fold. A second benefit of targeting repetitive elements in the human genome is that these elements are not critical for cellular survival; therefore, repetitive elements could be a safe harbor for designing and implementing targeting strategies. In our previous studies, zinc fingers targeting unique sequences (Chapter 5) failed to redirect the insertional preference of Alu inserts, possibly due to the low probability of the fusion protein locating the target site. In this chapter, we present data evaluating zinc fingers that target repetitive sequences. Two zinc finger proteins were selected to fuse to the L1 ORF2 protein: ZF4 and ZF2. Both zinc fingers bind to different sequences located in the 3’ end of genomic L1 elements, providing a significant increase in the abundance of target sites.

6.2. Results

6.2.1 Evaluation of the retrotransposition capability of the ZF4-ORF2 fusion proteins.

ZF4 was the first zinc finger targeting an abundant genomic sequence that we tested, as it had been shown to successfully retarget SB insertions (Voigt et al., 2012). ZF4 was fused to the N-terminus of ORF2 using the same glycine helical linker (GHL) as in the SB study. To evaluate the ability of N-ZF4-ORF2 to drive Alu retrotransposition, we co-transfected either the ORF2 or the N-ZF4-ORF2 expression construct with an Alu tagged with the neoTET cassette (Dewannieux et al., 2003). The N-ZF4-ORF2 fusion protein was able to support Alu retrotransposition with an efficiency comparable to that of the ORF2 construct in HeLa cells (Table 5).

Table 5: Relative retrotransposition rates of Alu when driven by ORF2 or by the N-ZF4-ORF2 fusion protein.

ORF2 fusion    Cell line    Retrotransposition rate    Relative to ORF2 (%)
ORF2           HeLa         8.93 ± 0.13 x 10^-4        100
N-ZF4-ORF2     HeLa         9.73 ± 0.33 x 10^-4        109.0 ± 3.4

No significant differences were observed (Unpaired Student T-test).

6.2.2 Expression of the ORF2 fusion proteins


One concern when creating recombinant proteins is that the fusion protein will not be stable and will be partially or completely degraded within cells once transfected. If fusion proteins are not stable, they will be unable to drive targeted Alu insertions into the genome. Western blot analyses were performed on extracts from transfected cells to evaluate the expression of the ORF2 fusion proteins and to determine whether any processing occurred. Proteins were expressed from both the ORF2 and the N-ZF4-ORF2 fusion constructs, and the sizes of the observed bands corresponded to the predicted sizes of the proteins (149 kDa and 170 kDa, respectively) (Figure 24). Together, these two results suggest that fusing the 6-fingered zinc finger ZF4 to the N-terminus of the L1 ORF2 protein did not hinder the ability of ORF2 to drive Alu retrotransposition events, and that the protein appears to be stable (no evident processing).


Figure 24. Expression analysis of ORF2 and N-ZF4-ORF2.

Western blot analysis of protein extracts from HeLa cells transiently transfected with ORF2 (149 kDa, blue arrow) or the N-ZF4-ORF2 fusion protein (170 kDa, green arrow).


6.2.3 Evaluation of the capability of the ORF2-fusion proteins to redirect insertion preference of the Alu.

We evaluated the targeting capability of the ZF4-ORF2 fusion protein using our previously described Alu rescue assay to recover and determine the location of de novo Alu insertions (Wagstaff et al., 2012; Ade and Roy-Engel, 2016). We recovered 117 Alu insertions driven by the N-ZF4-ORF2 fusion construct (Table 6). We compared the locations of the recovered insertions relative to the target site with the insertion sites of previously published Alu insertions driven by ORF2 (Wagstaff et al., 2012). One third of the recovered Alus inserted within a 10 kb window flanking the ZF4 target site (5 kb upstream, 5 kb downstream), representing approximately a four-fold enrichment of Alu elements when compared to ORF2-driven inserts. We observed about a 13-fold enrichment of Alu elements that inserted within a 4 kb window (2 kb upstream of the target site, 2 kb downstream), and a 47-fold enrichment within a 1.2 kb window (600 bp upstream, 600 bp downstream) (Table 6).

Table 6. Recovered Alu Retrotransposition events driven by the ORF2 and the N-ZF4-ORF2 fusion construct.

ORF2/fusion protein    Recovered rescues    10 kb window    4 kb window    1.2 kb window
ORF2                   226 (100%)           17 (7.5%)       5 (2.2%)       1 (0.44%)
N-ZF4-ORF2             117 (100%)           38 (32.5%)      33 (28.2%)     24 (20.5%)
Fold enrichment        N/A                  4.3             12.8           47

Percentages of the recovered rescues are given in parentheses next to each count.
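As a worked illustration of how the fold enrichment values in Table 6 relate to the rescue counts, the following minimal Python sketch divides the fraction of N-ZF4-ORF2 insertions in each window by the corresponding fraction for unfused ORF2. The variable and function names are illustrative assumptions and are not part of the original analysis. Values computed from the raw counts may differ slightly from those in the table, which appear to have been derived from the rounded percentages.

```python
# Counts taken from Table 6: (recovered rescues, hits in 10 kb, 4 kb, 1.2 kb windows).
counts = {
    "ORF2": (226, 17, 5, 1),
    "N-ZF4-ORF2": (117, 38, 33, 24),
}
windows = ("10 kb", "4 kb", "1.2 kb")


def fraction_in_window(construct, index):
    """Fraction of recovered insertions that fall inside the given window."""
    total = counts[construct][0]
    hits = counts[construct][index + 1]
    return hits / total


def fold_enrichment(index):
    """Ratio of the fusion-protein fraction to the unfused ORF2 fraction."""
    return fraction_in_window("N-ZF4-ORF2", index) / fraction_in_window("ORF2", index)


for i, window in enumerate(windows):
    print(f"{window} window: {fold_enrichment(i):.1f}-fold enrichment")
```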


In addition, ZF4 was able to successfully target older L1 subfamilies that contained one mismatch in the ZF4 target sequence (5’-GCCATAAAAAAGGATGAG-3’). The orientation and location of each recovered Alu, driven by either the N-ZF4-ORF2 fusion protein or ORF2, that inserted within a 10 kb window (5 kb upstream and 5 kb downstream) was plotted relative to the ZF4 target sequence (Figure 25). Of the 117 Alu insertions driven by the N-ZF4-ORF2 fusion protein, 38 (32.5%) inserted within a 10 kb window, 33 (28.2%) inserted within a 4 kb window, and 24 (20.5%) inserted within a 1.2 kb window around the ZF4 target site (more data on sequence analyses are found in Appendix 6).


Figure 25. Insertional distribution of Alu elements driven by ORF2 or the N-ZF4-ORF2 fusion construct that inserted within a 10 kb window around the target site.

The plots show the insertion sites of the 117 Alu insertions driven by the N-ZF4-ORF2 fusion construct and 226 previously characterized Alu insertions driven by ORF2. Triangles represent individual Alu inserts at the approximate site of insertion relative to the target site (black vertical line, representing “0” on the x-axis). The location of the L1 sequence is shown as a light gray arrow along the x-axis. Triangles above the axis represent insertions in the same orientation and triangles below represent insertions in the opposite orientation relative to the target site. Red triangles represent inserts located within 0-599 bp of the target site, yellow triangles 600-1999 bp of the target site and blue triangles 2000-5000 bp of the target site. There is a bias toward more Alu insertions upstream of the target site. Statistical analysis confirms there is an enrichment of Alu insertions 2 kb upstream vs. 2 kb downstream of the ZF4 target site (Binomial test, P = 0.040861).


We observed an insertional bias of Alu elements, with more inserting upstream of the ZF4 target site when driven by the N-ZF4-ORF2 fusion protein (Figure 25). Statistical analysis shows a significant enrichment of Alu insertions 2 kb upstream vs. downstream of the ZF4 target site (Binomial test, P = 0.040861). There are two potential explanations for the observed insertional bias: introduction of conformational constraints by the presence of the ZF4 zinc finger at the N-terminus of ORF2, or differences in the availability of ORF2 endonuclease sites surrounding the target sequence.
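The comparison of upstream and downstream insertion counts can be sketched as an exact two-sided binomial test against an equal (0.5) expectation. The snippet below is a minimal illustration using only the Python standard library. The function name and the counts are hypothetical placeholders; they are not the data or the software behind the reported P value.

```python
from math import comb


def binomial_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    observed = pmf(successes)
    # Small tolerance so that ties are not lost to floating-point rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


# Hypothetical counts for illustration only (NOT the thesis data):
upstream, downstream = 21, 9
p_value = binomial_two_sided(upstream, upstream + downstream)
print(f"upstream={upstream}, downstream={downstream}, P={p_value:.4f}")
```

Under the null hypothesis, an insertion within the window is equally likely to land upstream or downstream of the target site, so a small P value indicates a directional bias.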

First, the observed insertional bias could be a consequence of where and how the ZF4 is fused to the L1 ORF2 protein. Presence of ZF4 at the N-terminus of ORF2 could influence the insertion process, where the conformation of the bound zinc finger and GHL linker might position ORF2 to preferentially drive Alu insertions upstream of the ZF4 target site. To address this possibility, we created fusion proteins that contained the ZF4 at the C-terminus of the L1 ORF2 protein. Furthermore, the flexibility of the linker could affect both the targeting capabilities and insertional bias of the Alu insertions. To address this, we selected two previously published 25 amino acid long linkers, one flexible (FL4: LSGGGGSGGGGSGGGGSGGGGSAAA) and one helical (HL4: LAEAAAKEAAAKEAAAKEAAAKAAA), which would alter the spacing and relationship of the zinc finger and ORF2 protein (Arai et al., 2001).

The second possible explanation for the observed insertional bias of Alu elements driven by our N-ZF4-ORF2 fusion protein is the availability and density of potential ORF2 endonuclease sites near the ZF4 target sequence. To address this possibility, we determined the density of endonuclease sites around the ZF4 sequence in genomic L1 elements (Figure 26). The sequences of interest (endonuclease consensus 5’-TTAAAA-3’ and target site 5’-GCCATAAAAAAGGATGAG-3’) and their reverse complements were identified using the regex Python library against the human hg19 reference sequence with a maximum of 2 errors (mismatches). Examination of the data indicates the presence of a genomic region (approximately 300 bp), located directly downstream of the ZF4 target sequence, that shows a very low density of endonuclease sites, which we termed the ‘endonuclease desert’. Not surprisingly, this region comprises the 3’ GC-rich region of genomic L1 elements and contains fewer ORF2 endonuclease sites than expected when compared to random, adjacent genomic sequences. Therefore, the lack of endonuclease sites downstream of the ZF4 target sequence could, in part, explain the observed insertional bias of Alu elements upstream of the ZF4 target sequence.
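The mismatch-tolerant search described above can be illustrated with a small, self-contained sketch. This is a plain-Python stand-in that uses a sliding-window mismatch count rather than the regex library used in the analysis. The function names and the toy sequence are assumptions made for illustration, and scanning a full genome this way would be slow.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_with_mismatches(genome, motif, max_mismatches=2):
    """Yield (start, strand) for every window within max_mismatches
    substitutions of the motif ('+') or of its reverse complement ('-')."""
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    size = len(motif)
    for start in range(len(genome) - size + 1):
        window = genome[start:start + size]
        for strand, pattern in patterns:
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                yield start, strand


ZF4_TARGET = "GCCATAAAAAAGGATGAG"  # target site quoted in the text
# Toy sequence (hypothetical): one exact forward-strand site followed by one
# reverse-strand site that carries a single mismatch.
toy = "TTTT" + ZF4_TARGET + "CCCC" + reverse_complement("GCCATAAAAAAGGATGAC") + "GG"
hits = list(find_with_mismatches(toy, ZF4_TARGET))
print(hits)
```

Only substitutions are counted here, which matches the "mismatches" wording in the text; insertions and deletions are not considered.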

To further address the observed insertional bias, we selected a second zinc finger, ZF2, which also targets a sequence in the 3’ end of L1, but is localized approximately 260 bp upstream and in the opposite orientation of the ZF4 target sequence (Voigt et al., 2012). By selecting a ZF with a different binding site, the target sequence is flanked by different sequences, altering the density of available ORF2 endonuclease sites. Additionally, the ‘endonuclease desert’ (identified in Figure 26) is no longer immediately adjacent to the ZF target sequence.



Figure 26. Density histogram of ORF2 endonuclease sites located within 2 kb of the ZF4 target sequence.

The sequences of interest (endonuclease consensus and target site) and their reverse complements were identified using the regex Python library against the human hg19 reference sequence with a maximum of 2 errors (mismatches). Distances to the closest L1-endonuclease site were calculated using the ‘closest’ function available from BEDtools. Distances were graphed using the histogram function in Excel with 25 bp bins. An ‘endonuclease desert’ was observed directly downstream of the ZF4 target site. The black arrow indicates the target site, corresponding to “0” on the x-axis; negative numbers represent upstream locations and positive numbers represent downstream locations relative to the target site. The end of genomic L1 elements is shown approximately 325 bp downstream of the ZF4 target site. The location corresponding to the downstream genomic flank region is indicated.
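The binning step behind a density histogram of this kind can be sketched in a few lines of Python. This is a simplified illustration and not the BEDtools and Excel workflow described in the legend: it pools every endonuclease site within 2 kb of each target (rather than only the closest one), assumes all coordinates lie on one strand of one chromosome, and uses hypothetical coordinates and function names.

```python
from collections import Counter

BIN_WIDTH = 25   # bp, as in the figure legend
MAX_DIST = 2000  # bp on either side of the target site


def bin_start(offset):
    """Lower edge of the 25 bp bin that contains a signed offset."""
    return (offset // BIN_WIDTH) * BIN_WIDTH


def site_density(targets, endo_sites):
    """Count endonuclease sites per bin, pooled over all target sites.
    Negative offsets are upstream of the target, positive are downstream."""
    histogram = Counter()
    for target in targets:
        for site in endo_sites:
            offset = site - target
            if -MAX_DIST <= offset <= MAX_DIST:
                histogram[bin_start(offset)] += 1
    return histogram


# Hypothetical coordinates, for illustration only.
targets = [10_000, 50_000]
endo_sites = [9_990, 9_700, 10_400, 49_980, 50_410, 60_000]
density = site_density(targets, endo_sites)
for start in sorted(density):
    print(f"[{start}, {start + BIN_WIDTH}): {density[start]}")
```

A run of empty bins immediately downstream of offset 0 in real data would correspond to the region described as the endonuclease desert.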


6.2.4 Insertion bias observed due to structural conformation of the N-ZF4-ORF2 fusion protein

In order to evaluate the effect that linker and terminus selection have on the insertional bias, we created five new constructs. First, we cloned the ZF4 zinc finger and the GHL linker to the C-terminus of the ORF2 protein, creating C-ZF4-ORF2 GHL. In addition, we exchanged the GHL linker in the N-ZF4-ORF2 and C-ZF4-ORF2 constructs for the flexible FL4 or helical HL4 linker, to create a total of six constructs utilizing the ZF4 zinc finger (Figure 13). All five of the new ZF4 fusion proteins were able to drive Alu retrotransposition. Some showed slightly lower efficiencies (Table 7), but none were significantly different from the ORF2 controls (Unpaired Student T-test). Western blot analysis confirmed the expression of a protein with a size compatible with the expected size of the C-ZF4-ORF2 GHL fusion, showing that this fusion protein is likely stable when expressed in mammalian cells (Figure 29, lane 1).

Table 7. Relative retrotransposition rates of Alu when driven by ORF2 or by N-terminal or C-terminal ZF4 fusion proteins with different linkers.

ORF2/fusion protein    Cell line    Retrotransposition rate#    Relative to ORF2 (%)
ORF2 Control           HeLa         5.39 ± 4.43 x 10^-4         100
N-ZF4-ORF2 FL4         HeLa         3.88 ± 3.2 x 10^-4          69.6 ± 13.8
N-ZF4-ORF2 HL4         HeLa         4.47 ± 3.1 x 10^-4          86.0 ± 4.0
C-ZF4-ORF2 GHL*        HeLa         3.69 ± 4.6 x 10^-4*         72.1 ± 2.4
C-ZF4-ORF2 FL4         HeLa         4.17 ± 3.1 x 10^-4          80.4 ± 7.7
C-ZF4-ORF2 HL4         HeLa         3.55 ± 2.8 x 10^-4          68.4 ± 2.7

#: No significant difference from ORF2 controls at P< 0.05 (Unpaired Student T-test).
*: Rate has been adjusted to reflect ORF2 controls used for linker data.


Using the Alu rescue assay, we evaluated the targeting capability of all of the new ZF4-ORF2 fusion proteins. We recovered between 47 and 108 Alu inserts per construct (Table 8). All ZF4 fusion proteins showed some targeting when compared to the ORF2 protein (P< 0.05, Fisher Exact Test). Details of recovered sequences can be found in Appendix Tables 6-11.

Table 8. Frequency of Alu inserts driven by the N-ZF4-ORF2 and C-ZF4-ORF2 fusion proteins with different linkers recovered within 10, 4 and 1.2 kb from the target sequence.

ORF2/fusion protein    Recovered rescues    10 kb window    4 kb window    1.2 kb window
N-ZF4-ORF2 GHL         117 (100%)           38 (32.5%)      33 (28.2%)     24 (20.5%)
N-ZF4-ORF2 FL4         108 (100%)           26 (24.1%)      15 (13.8%)     11 (10.2%)
N-ZF4-ORF2 HL4         81 (100%)            19 (23.5%)      12 (14.8%)     8 (9.9%)
C-ZF4-ORF2 GHL         93 (100%)            13 (14.0%)      9 (9.7%)       4 (4.3%)
C-ZF4-ORF2 FL4         47 (100%)            12 (25.5%)      6 (12.8%)      2 (4.3%)
C-ZF4-ORF2 HL4         92 (100%)            17 (18.5%)      16 (17.4%)     10 (10.9%)

Percentages of the recovered rescues are given in parentheses next to each count.
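A comparison of targeting frequencies between two constructs can be sketched as a two-sided Fisher exact test on a 2x2 table of insertions inside versus outside a window. The standard-library implementation below is illustrative only. The choice of the 4 kb window is an assumption, since the text does not state which window the reported tests used, and the ORF2 counts are taken from Table 6.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# (inside, outside) the 4 kb window; window choice is an assumption.
n_zf4_ghl = (33, 117 - 33)  # Table 8
orf2 = (5, 226 - 5)         # Table 6
p_value = fisher_exact_two_sided(*n_zf4_ghl, *orf2)
print(f"N-ZF4-ORF2 GHL vs ORF2, 4 kb window: P = {p_value:.2e}")
```

The same function can be applied to any pair of rows in Table 8 by substituting the corresponding counts.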

The fusion proteins containing the ZF4 at the C-terminus showed slightly lower targeting efficiencies when compared to the N-ZF4-ORF2 GHL fusion protein (Fisher Exact Test, P< 0.05) (Table 8), except for the C-ZF4-ORF2 HL4 fusion protein (Fisher Exact Test, P> 0.05). However, N-ZF4-ORF2 FL4 and C-ZF4-ORF2 FL4, which use the same flexible linker, show comparable targeting efficiencies (13.8% vs 12.8%) of Alu inserts landing within 2 kb of the target site (sequencing data found in Appendix tables 7 and 8). To further evaluate insertional bias, the orientation and location of each recovered Alu insertion, driven by each of the N-ZF4-ORF2 and C-ZF4-ORF2 fusion proteins, that inserted within a 10 kb window (5 kb upstream and 5 kb downstream) was plotted relative to the ZF4 target sequence (Figure 27). The insertional bias previously observed for the N-ZF4-ORF2 fusion proteins was not observed in C-terminally fused ZF4 constructs (P=0.075, Binomial Test).

Overall, the data indicate that fusing a DNA binding domain to the amino terminus of the L1 ORF2 protein is better for targeting Alu insertions. In general, the GHL linker was significantly better at targeting Alu insertions to genomic L1 elements than the FL4 and HL4 linkers within N-ZF4-ORF2 constructs (P< 0.05, Fisher Exact test). However, no significant differences existed between the N-ZF4-ORF2 constructs that contained either the FL4 or HL4 linker (Fisher Exact Test, P> 0.05). The N-ZF4-ORF2 GHL construct is significantly better than the C-ZF4-ORF2 GHL construct (P<0.05, Fisher Exact test); however, the C-ZF4-ORF2 HL4 construct drove targeted Alu retrotransposition events with comparable efficiencies to the N-ZF4-ORF2 GHL construct (P> 0.05, Fisher Exact Test). These data indicate that, overall, the linker selection has little impact on targeting efficiency and that the N-terminus seems a better location to add a DNA binding domain to impart targeting to ORF2.


Figure 27. Schematic of the insertion distribution of Alu driven by all of the individual N-ZF4-ORF2 and C-ZF4-ORF2 constructs.

Triangles represent individual Alu inserts and the approximate site of insertion relative to the target site (black vertical line, representing “0” on the x-axis). The location of the L1 sequence is shown as a light gray arrow along the x-axis. Triangles above the axis represent insertions in the same orientation and triangles below represent insertions in the opposite orientation relative to the target site. Red triangles represent inserts located within 0-599 bp of the target site, yellow triangles 600-1999 bp of the target site and blue triangles 2000-5000 bp of the target site.


The data from the N- and C-terminally fused ZF4 proteins were combined to better analyze the targeting capabilities and insertional bias of the fusion proteins (Figure 28). Statistical analysis of the combined data demonstrates that inserts driven by any of the C-terminally fused ZF4 constructs do not show the upstream insertion bias observed for the N-terminally fused ZF4 constructs. We found an enrichment of Alu insertions within 2 kb upstream of the ZF4 target site when driven by N-terminal ZF4 fusions (Binomial test, P = 0.005223). In contrast, there was no significant enrichment of Alu insertions driven by C-ZF4-ORF2 fusion constructs within 2 kb upstream or downstream of the ZF4 target (Binomial test, P = 0.075) (Appendix tables 9, 10, and 11). This suggests that the position of the DNA binding domain relative to the endonuclease domain of ORF2 may influence insertion site preference.


Figure 28. Schematic of the combined insertion distribution of Alu driven by all N-ZF4-ORF2 and C-ZF4-ORF2 constructs.

A total of 306 and 232 Alu insertions driven by all N-ZF4-ORF2 fusion constructs or C-ZF4-ORF2 fusion constructs, respectively, were recovered. Triangles represent individual Alu inserts and the approximate site of insertion relative to the target site (black vertical line, representing “0” on the x-axis). The location of the L1 sequence is shown as a light gray arrow along the x-axis. Triangles above the axis represent insertions in the same orientation and triangles below represent insertions in the opposite orientation relative to the target site. Red triangles represent inserts located within 0-599 bp of the target site, yellow triangles 600-1999 bp of the target site and blue triangles 2000-5000 bp of the target site. We found an enrichment of Alu insertions within 2 kb upstream of the ZF4 target site located in genomic L1 elements when Alu was driven by the N-terminally fused ZF4 constructs (Binomial test, P = 0.005223). We did not see enrichment of Alu insertions within 2 kb upstream or downstream of the ZF4 target site located in genomic L1 elements when Alu elements were driven by the C-terminally fused ZF4 constructs (Binomial test, P = 0.075).


Figure 29. Expression analysis of C-ZF4-ORF2, N-ZF2-ORF2, and C-ZF2-ORF2.

Lane 1: C-ZF4-ORF2; Lane 2: N-ZF2-ORF2; Lane 3: C-ZF2-ORF2; Lane 4: empty vector control. C-ZF4-ORF2 and C-ZF2-ORF2 fusion proteins were detected at the predicted size (~170 kDa).


6.2.5 Insertion bias observed due to availability of endo site

To address whether the endonuclease site density influenced the insertional bias, we tested a second ZF (ZF2) that targets a different sequence in the 3’ end of L1. We exchanged the ZF4 sequence for the ZF2 sequence in all six previously tested zinc finger constructs, including the constructs with the GHL, FL4, and HL4 linkers (Figure 13). All six of the ZF2 fusion proteins were able to successfully drive Alu retrotransposition events in our Alu retrotransposition assay, although some with lower efficiencies than the ORF2 protein (Table 9).

Table 9: Relative retrotransposition rates of Alu when driven by ORF2 or by the N-ZF2-ORF2 fusion protein.

ORF2 fusion            Cell line    Retrotransposition rate    Relative to ORF2 (%)
ORF2 (HeLa control)    HeLa         5.39 ± 4.43 x 10^-4        100
N-ZF2-ORF2 GHL*        HeLa         3.02 ± 3.6 x 10^-4*        57.1 ± 4.7
N-ZF2-ORF2 FL4         HeLa         4.92 ± 3.5 x 10^-4         95.8 ± 14.9
N-ZF2-ORF2 HL4         HeLa         5.45 ± 3.08 x 10^-4        118.1 ± 17.8
C-ZF2-ORF2 GHL*        HeLa         5.73 ± 7.9 x 10^-4*        107.4 ± 10.4
C-ZF2-ORF2 FL4         HeLa         4.11 ± 3.62 x 10^-4        91.9 ± 14.4
C-ZF2-ORF2 HL4         HeLa         4.26 ± 4.43 x 10^-4        83.9 ± 5.7

#: Significantly different from ORF2 control P< 0.05 (Unpaired Student T-test).
*: Rates have been adjusted to reflect listed ORF2 controls used for linker data.
Retrotransposition rates are shown ± standard deviation. Percentages are shown ± standard error.

Protein expression analyses were performed using extracts of cells transiently transfected with the N-ZF2-ORF2 and C-ZF2-ORF2 fusion constructs. We were unable to detect the full-length N-terminally fused protein. However, we were able to faintly detect the C-terminally fused ZF2 protein at the expected size of 170 kDa (Figure 29, lanes 2 and 3, respectively). Coupled with the retrotransposition efficiencies, these results suggest that fusing the 6-fingered zinc finger ZF2 to the N-terminus of the L1 ORF2 protein did not hinder the ability of ORF2 to drive Alu retrotransposition events, but that the final fusion protein may have lower stability when transfected into mammalian cells (Figure 29), which could explain the lower retrotransposition rates.

We were able to rescue 167 Alu insertions driven by all of the ZF2 fusion constructs using our Alu rescue assay. We recovered only one Alu that inserted within 2 kb of the ZF2 target sequence, driven by the N-ZF2-ORF2 GHL construct (Table 10) (sequence analysis of recovered insertions can be found in Appendix tables 12-17).

Table 10. Frequency of Alu inserts driven by the N-ZF2-ORF2 and C-ZF2-ORF2 fusion proteins with different linkers recovered within 10, 4 and 1.2 kb from the target sequence.

ORF2/fusion protein    Recovered rescues    10 kb window    4 kb window    1.2 kb window
N-ZF2-ORF2 GHL         24 (100%)            1 (4.2%)        1 (4.2%)       0 (0.0%)
N-ZF2-ORF2 FL4         19 (100%)            0 (0%)          0 (0%)         0 (0%)
N-ZF2-ORF2 HL4         31 (100%)            0 (0%)          0 (0%)         0 (0%)
C-ZF2-ORF2 GHL         25 (100%)            1 (4.0%)        0 (0%)         0 (0.0%)
C-ZF2-ORF2 FL4         15 (100%)            0 (0.0%)        0 (0.0%)       0 (0.0%)
C-ZF2-ORF2 HL4         53 (100%)            1 (1.9%)        0 (0.0%)       0 (0.0%)

Percentages of the recovered rescues are given in parentheses next to each count.


Although we recovered two Alu insertions within the 10 kb window that were driven by either the C-ZF2-ORF2 GHL or C-ZF2-ORF2 HL4 constructs (Table 10), no construct was able to drive targeted Alu insertions closer than 2 kb. Therefore, we determined that ZF2, when fused to the L1 ORF2 protein, was unable to drive targeted Alu insertions. Although ZF2 showed similar binding capabilities to ZF4 in vitro (Voigt et al., 2012), our data indicate that in vivo this zinc finger is unable to redirect targeting of Alu insertions. Unfortunately, we were therefore unable to evaluate whether the insertional bias observed with our N-ZF4-ORF2 GHL construct was due to the density of endonuclease sites flanking the target sequence.

6.3 Discussion

Data obtained from the ZF4 and ZF2 ORF2 fusion proteins indicate that fusing a six-fingered zinc finger to the N- or C-terminus of the L1 ORF2 protein did not hinder the ability of ORF2 to drive Alu retrotransposition events. Additionally, most of the fusion constructs utilizing the ZF4 DNA binding domain were able to enrich for Alu insertions in a 4 kb window around the ZF4 target sequence compared to unfused ORF2 (Figures 27 and 28). These data demonstrate that the fusion proteins were able to fold properly, and that each domain was capable of interacting with its respective nucleic acid(s) (DNA or RNA). Therefore, ORF2 can tolerate fusions to the amino- and carboxy-termini. This knowledge could expand the utilization of ORF2 protein applications, including fusing different tags to ORF2 for in vitro localization or pull-down assays (Taylor et al., 2013).


Our data demonstrated that the observed insertional bias of Alu elements landing upstream of the ZF4 target sequence was due, in part, to the fusion of ZF4 to the N-terminus of ORF2. When the data were combined, all N-terminally fused ZF4 constructs showed this insertional bias, driving Alu elements to insert upstream of the ZF4 target sequence (Figure 28, Binomial test P=0.003026). Conversely, when the ZF4 zinc finger was fused to the carboxy-terminus of the ORF2 protein, this insertional bias was lost (Figure 28, Binomial test P=0.23975). Therefore, it is plausible that the conformation of the final fusion proteins containing N-terminally fused ZF4 proteins favors the interaction of the ORF2 protein upstream of the DNA target sequence. C-terminally fused DNA binding domains, when bound to their target sequence, might position the ORF2 protein in a more neutral position, where there is more flexibility in where ORF2 can interact with target DNA.

The lack of targeting data from the ZF2-ORF2 fusion proteins limits our ability to determine the extent to which endonuclease site density contributes to the insertional bias observed with the N-ZF4-ORF2 fusion proteins. Because of the similarities between the ZF4 and ZF2 zinc fingers, we are unsure why ZF2 constructs were unable to enrich for Alu insertions near genomic L1 elements. The ZF2 fusion proteins might not have been as stable as the ZF4 fusion proteins, as we were unable to detect full-length N-ZF2-ORF2 fusion proteins through Western blot analysis (Figure 29). In vitro studies of ZF4 and ZF2 demonstrated comparable DNA binding efficiencies to their respective target sequences (Voigt et al., 2012). However, our studies show that these zinc fingers differed significantly in their targeting capabilities in our ex vivo assay. One possible explanation for why our ZF2-ORF2 fusion constructs were unable to drive targeted Alu insertions is that the ZF2 zinc finger does not interact with genomic DNA in a manner that is compatible with the ORF2 protein. ZF2 could bind to its target sequence too tightly, which might not allow the ORF2 protein to cleave the target DNA and carry out TPRT of the Alu element.

One important conclusion from this study is that the abundance of the target site is not the only characteristic of a DNA binding domain to consider. We speculate that one of the properties that contributed to the success of the N-ZF4-ORF2 fusion protein in driving targeted Alu insertions was the abundance of ZF4 target sites in the genome (~17000). ZF2 and ZF4 target sequences are present in approximately the same number, in the same repetitive element, in the human genome. Therefore, other properties of the DNA binding domain, including stability, conformation and chromatin state of the target DNA sequence, in addition to the binding strength of the DNA binding domain, could be more important to the targeting success of the fusion protein.

Finally, we determined that the linker selected in these experiments does not, overall, have a significant impact on the success of the targeting strategy. Although the N-ZF4-ORF2 GHL construct was able to drive targeted Alu insertions more efficiently than the other ZF4-containing constructs (Figure 28), the linker selection did not significantly alter the targeting capabilities of the other constructs (P> 0.05, Fisher Exact test). For example, the N-ZF4-ORF2 FL4 and N-ZF4-ORF2 HL4 constructs drove targeted Alu retrotransposition events with similar efficiencies to each other (P> 0.05, Fisher Exact test). The only construct where the linker selection might have had an impact on targeting capabilities is the C-ZF4-ORF2 HL4 construct: there was no significant difference between recovered Alu insertions driven by the unfused ORF2 construct and Alu insertions driven by the C-ZF4-ORF2 HL4 construct. Lack of observed targeting, however, could have been due to the low number of recovered Alu insertions. More data on recovered Alu insertions could make targeting data from this construct significant. Additionally, changing the linker in the ZF2-ORF2 constructs did not improve the targeting capabilities of these ZF2 fusion proteins. Therefore, although the GHL linker facilitated effective targeting in the N-ZF4-ORF2 GHL fusion construct, overall the choice of zinc finger remained the main determining factor for imparting targeting capabilities. At this time we are unsure why the GHL linker was significantly better at driving targeted Alu retrotransposition events (Fisher Exact Test, P< 0.05) when fused to the N-terminus of the ORF2 protein.


Chapter 7. Adapting the CRISPR system to the ORF2 protein in order to target Alu insertions

7.1 Introduction

CRISPR, or clustered regularly interspaced short palindromic repeats, systems are bacterial immune systems that incorporate foreign DNA into the host genome for protection (Sander and Joung, 2014). The foreign DNA is incorporated into arrays within the host genome and transcribed into CRISPR RNA molecules, or crRNA. Each target sequence must also be flanked by a protospacer adjacent motif, or PAM, directly downstream of the unique, invading DNA sequence. The PAM is a 3 nucleotide ‘NGG’ sequence that directs the Cas9 nuclease to cleave the target DNA. Without the PAM sequence, the Cas9 nuclease cannot cleave the target sequence (Sander and Joung, 2014). Additionally, crRNA molecules must hybridize to a transactivating CRISPR RNA molecule (tracrRNA), which forms a complex with the Cas9 nuclease (Deltcheva et al., 2011).

The type II CRISPR system, isolated from S. pyogenes, has been modified as a molecular tool to perform site-specific genome editing and double-strand break formation (Jinek et al., 2012; Strauß and Lahaye, 2013). Successful targeting requires that the Cas9 nuclease, tracrRNA, and crRNA are provided. The modern CRISPR system has been designed to include both the tracrRNA and crRNA in one plasmid, forming the guide RNA (gRNA). These targeting gRNAs can be designed to target any feasible site by manipulating the 20-nucleotide crRNA sequence, making the CRISPR strategy particularly flexible for selecting genomic targets of interest.

Wild-type CRISPR strategies have been implemented in the laboratory for multiple gene editing applications. For example, the CCR5, EMX1, and PVALB genes have been successfully disrupted using gRNAs targeted to these genes (Cho et al., 2013; Cong et al., 2013). Additionally, the AAVS1 gene was successfully integrated into human cells using the CRISPR/Cas9 system (Mali et al., 2013). One beneficial aspect of this system is that multiple versions of the Cas9 nuclease exist. These alternate proteins can be used in applications where targeting is required, but Cas9 nuclease-directed double-strand breaks are not.

The endonuclease properties of Cas9 come from amino acids 10 and 840. When either of these two amino acids is mutated, the Cas9 protein becomes a nickase that cleaves one DNA strand (Haft et al., 2005; Makarova et al., 2006). Cas9 nickases have been used to successfully induce homology directed recombination (HDR) with similar efficiencies to that of the wild-type Cas9 nuclease (Mali et al., 2013; Cong et al., 2013). Additionally, a catalytically inactive version of the Cas9 protein, dCas9, has been used with a targeting gRNA to direct different proteins to a specific genomic location (Jinek et al., 2012; Gasiunas et al., 2012). When fused to the transcriptional activating domain VP64, dCas9 can regulate the expression of endogenous genes in human and mouse cells (Maeder et al., 2013; Perez-Pinera et al., 2013; Konermann et al., 2013; Gilbert et al., 2013; Ebina et al., 2013). In one specific example, a VP64-dCas9 fusion protein stimulated the transcription of VEGFA (Maeder et al., 2013). These data demonstrate that CRISPR is a flexible platform that can be modified to fit a wide range of applications, such as imparting sequence-specific targeting capabilities to proteins.

CRISPR/Cas9 constructs have been integral for genome engineering strategies in recent years. We adapted and evaluated three separate CRISPR strategies for re-targeting Alu insertions into specific locations in the genome. The wild-type Cas9 nuclease cleaves DNA bluntly, targeted and mediated by a gRNA. Two nickase variations of the Cas9 protein exist that cleave one strand of the DNA, either complementary or non-complementary, depending on the mutated nuclease. Finally, a nuclease deficient version of the Cas9 protein, dCas9, was created to interact site-specifically with a target DNA site without any catalytic activities.

Because of the previous success of Cas9 fusions in redirecting proteins, we chose to test this system in order to modify the targeting capability of the L1 ORF2 protein. However, for this approach to function, the Cas9 endonuclease activity should not compete with the L1 endonuclease activity. Thus, for our purposes, Cas9 fusion proteins should only have one functional endonuclease domain. In order to create these fusion proteins we have two alternative strategies: (1) use an ORF2 without endonuclease activity, or (2) use an endonuclease-deficient Cas9. Our first approach was to create a fusion protein containing an ORF2 protein devoid of endonuclease activity. We chose to fuse either the wild-type (WT) Cas9 protein or the D10A nickase variant (Cong et al., 2013) to the N-terminus of either an ORF2 endonuclease double mutant (Endo --) or a truncated ORF2 protein lacking the endonuclease domain (i.e. containing only the reverse transcriptase and CYS domain).

Previously published data showed that changing the endonuclease domains of telomere-specific LINE elements altered their insertion preference according to the specific endonuclease domain present (Takahashi and Fujiwara, 2002). However, this approach was not effective when applied to the L1 ORF2 protein, as analyses of L1 insertion events showed that human L1 elements still inserted into the canonical L1 ORF2 endonuclease sites (Repanas et al., 2007). Because of the target sites selected, the experimental design did not provide the sequences required for proper A-tail priming during TPRT, and targeted insertions would not occur if the necessary sequences were absent from the target site. To remedy this problem, we used the publicly available website www.ZiFit.partners.org to design a gRNA to a genomic location that, when cleaved by the Cas9 nuclease or the nickase, would provide the necessary T-rich sequence for TPRT priming of the Alu A-tail (Figure 30).
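The site-selection criterion described above (a Cas9 target whose cleavage exposes a T-rich stretch) can be illustrated with a short script. This is only a minimal sketch and not the procedure implemented by the ZiFit website. It assumes the commonly used S. pyogenes Cas9 rules (a 20-nt protospacer followed by an NGG PAM, with cleavage about 3 bp 5' of the PAM). The function name `find_candidate_sites`, the 6-nt window, and the threshold of four T residues are hypothetical choices made for illustration, and only the scanned strand is examined.

```python
import re

def find_candidate_sites(seq, protospacer_len=20, window=6, min_t=4):
    """Return (start, protospacer, pam, t_count) for NGG sites whose
    predicted cut is immediately preceded by a T-rich window.

    Assumptions (not from the source): cut position is 3 bp 5' of the PAM,
    and 'T-rich' means at least `min_t` T residues in the `window` bases
    5' of the cut on the scanned strand.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    for m in pattern.finditer(seq):
        start = m.start()
        protospacer, pam = m.group(1), m.group(2)
        cut = start + protospacer_len - 3
        upstream = seq[max(0, cut - window):cut]
        t_count = upstream.count("T")
        if t_count >= min_t:
            hits.append((start, protospacer, pam, t_count))
    return hits

# Hypothetical input: a protospacer with a T-rich stretch just 5' of the cut.
example = "GCGCGCGCGCG" + "TTTTTT" + "GCA" + "AGG"
candidates = find_candidate_sites(example)
```

A real design would also scan the reverse strand and check the uniqueness of each candidate in the genome; those steps are omitted here.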


Figure 30. Strategy for Cas9 directed TPRT of Alu inserts.

The ORF2 endonuclease cleaves at AT-rich regions, exposing a T-rich DNA strand that can base pair with the A-tail of the Alu RNA, which is needed for TPRT to occur. The T-rich DNA strand provides the priming site for reverse transcription of the Alu RNA to generate the cDNA (i.e., the new insert). The strategy is to select a guide RNA that will target the Cas9 (double-strand cut) or nickase (single-strand cut) to cleave at an AT-rich site so that it exposes a T-rich DNA strand comparable to that produced by ORF2 endonuclease cleavage. An advantage of using Alu RNA is that the A-tail RNA sequence can be manipulated so that it is perfectly complementary to the exposed DNA strand (indicated by *).


Our second strategy utilized an endonuclease-defective Cas9 (dCas9) fused to the functional ORF2 protein. Although the dCas9 protein is catalytically inactive, it retains its ability to interact with a gRNA and, therefore, its targeting capability (Qi et al., 2013). For this approach, we selected gRNAs that would target sequences proximal to ORF2 endonuclease sites (AT-rich). We also took this strategy one step further by utilizing the MS2 coat protein and MS2 binding sites: the MS2 coat protein was fused to the N-terminus of ORF2, and this fusion was co-transfected with the dCas9 protein and gRNAs containing MS2 binding loops.

This chapter presents all the results from both CRISPR strategies.

7.2. Results

7.2.1 Evaluation of the Cas9 targeting capability of the Cas9-ORF2 fusion proteins.

Both the Cas9 and the L1 ORF2 proteins are relatively large (~150 kDa), and a fusion of the two may compromise the Cas9 protein/gRNA interaction. To evaluate the Cas9 targeting and cleavage capabilities of the fusion proteins, we utilized a homology arm assay designed to site-specifically promote homology-driven recombination of a blasticidin resistance cassette, directed by a targeting gRNA, into the WRN gene. The blasticidin resistance cassette is designed as a homology arm cassette (HAC), which contains sequences homologous to the specific genomic location targeted by the gRNA. The rationale behind this assay is that homologous recombination between the HAC and the targeted genomic region is stimulated if site-specific cleavage by Cas9 occurs. Targeted insertions of the HAC can be identified through PCR analyses of the genomic location (Figure 31A).

Using this assay, we evaluated the wild-type Cas9, the D10A nickase, and the Cas9-ORF2endo-, Cas9-RTCYS, nickase-ORF2endo- and nickase-RTCYS fusion proteins. PCR analysis demonstrated that three of our four fusion proteins were able to cleave within the WRN gene, resulting in successful integration of the HAC into the WRN gene (Figure 31B), which is indicative of proper Cas9/nickase function in these fusion proteins. Despite multiple attempts, no positive PCR results were observed for the nickase-RTCYS fusion protein.



Figure 31. Evaluation of function of the Cas9-ORF2 and nickase-ORF2 fusion proteins.

A. Schematic of the WRN target site and the expected Cas9-driven insertion of the homology arm cassette (HAC). Red and green arrow sets show the primer annealing locations used for PCR analysis.

B. PCR results from individual blasticidin resistant colonies generated by: 1- Cas9, 2- D10A nickase, 3- Cas9-ORF2endo-, 4- Cas9-RTCYS and 5- nickase-ORF2endo-- fusion proteins. Red and green arrows show the corresponding expected size for each primer pair. The (+) represents a PCR product that was previously generated and used as a positive control for insertion size.


7.2.2 Evaluation of the retrotransposition capability of the functional Cas9-endonuclease defective ORF2 fusion proteins.

Next, we analyzed whether the fusion of the CRISPR protein compromised the ability of the modified L1 ORF2 protein to drive Alu retrotransposition events. We selected a guide RNA (gRNA2) to a unique site on chromosome 8 (2587720-2587745) that contained the appropriate sequences both to direct site-specific Cas9 cleavage and to provide the sequences for A-tail annealing during TPRT (Figure 30). Only the nickase-ORF2endo-- and nickase-ORF2RTCYS constructs were able to drive retrotransposition, and only at very modest levels (Table 11). This suggests that the Cas9 endonuclease is unable to competently substitute for the endonuclease activity of ORF2.

Table 11. Relative retrotransposition rates of Alu when driven by ORF2 or by the functional Cas9-endonuclease defective ORF2 fusion proteins.

ORF2 fusion            Cell line   Relative to ORF2 (%)
ORF2 (HeLa control)    HeLa        100
Cas9 Endo--            HeLa        0
Cas9 RTCYS             HeLa        0
Nickase Endo--         HeLa        0.0817 ± 0.0817#
Nickase RTCYS          HeLa        0.1225 ± 0.1225#

#: Significantly different from ORF2 control, P < 0.00001 (unpaired Student's t-test).
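As an illustration of how a "relative to ORF2 (%)" value with an error term can be derived, the sketch below normalizes replicate colony counts to the mean of the control and reports mean ± standard error. The calculation actually used for Table 11 is not described in this section, so the function `relative_rate`, the use of the standard error of the mean, and all colony counts are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

def relative_rate(test_counts, control_counts):
    """Return (mean, SEM) of test colony counts expressed as a
    percentage of the mean control colony count."""
    control_mean = mean(control_counts)
    percents = [100.0 * c / control_mean for c in test_counts]
    sem = stdev(percents) / sqrt(len(percents))
    return mean(percents), sem

# Hypothetical colony counts from three replicate transfections.
control_colonies = [400, 410, 414]
fusion_colonies = [0, 0, 1]   # a single colony in one of three replicates

rate, sem = relative_rate(fusion_colonies, control_colonies)
```

With three replicates in which only one yields any colonies, the mean and the standard error are numerically identical. This is the same form as the nickase entries in Table 11, although the underlying replicate counts are not given here.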


7.2.3 Evaluation of the capability of the functional Cas9-endonuclease defective ORF2 fusion proteins to redirect insertion preference of Alu.

Although the efficiency of the Cas9-ORF2 endo-defective fusion proteins is extremely poor, it is possible that the few events that do occur represent targeted insertions. To evaluate this possibility, we analyzed insertions using the Alu rescue system. We scaled up the experimental design and were able to observe a few colonies from all tested fusion proteins. We rescued and analyzed a total of 7 Alu inserts. No targeting to the expected location on chromosome 8 was observed (Table 12; details of recovered sequences can be found in Appendix Table 18).

Table 12. Recovered Alu insertions driven by the four Cas9 fusion proteins with the gRNA2 targeting chromosome 8:2587720-2587745.

ORF2 fusion       Cell line   Number of rescues   Targeting?
Cas9 Endo--       HeLa        3                   No
Cas9 RTCYS        HeLa        2                   No
Nickase Endo--    HeLa        1                   No
Nickase RTCYS     HeLa        1                   No

None of the recovered insertions landed near the target sequence located on chromosome 8, nor did the recovered Alu insertions contain the hallmarks of Cas9-mediated nuclease activity (presence of a PAM and sequence homology to the selected gRNA). If Alu TPRT had occurred using the cleavage site provided by the WT Cas9 or D10A nickase protein, the PAM sequence should be present. Additional sequencing of the recovered inserts showed that cleavage occurred at sites that resembled canonical ORF2 endonuclease cleavage sites (5’-TTAAAA-3’), which may have resulted from endogenous ORF2 present in the HeLa cell line. Therefore, we conclude that this strategy is ineffective for targeting Alu insertions to specific locations in the genome. One reason for the lack of targeting may be the low frequency of the target site in the human genome. Thus, for the next experiments we decided to choose a new target location that is present multiple times in the genome, to increase the odds of targeting events. In addition, we chose to evaluate the second strategy, using fusions containing an ORF2 with a functional endonuclease and a Cas9 with a defective nuclease (dCas9).

7.2.4 Evaluation of the capability of the dCas9-ORF2 (defective Cas9-functional ORF2) fusion proteins to redirect insertion preferences of Alu.

The recent availability of an endonuclease-deficient Cas9 protein (dCas9) has led to an increase in potential genome editing strategies (Gilbert et al., 2013; Qi et al., 2013). The dCas9 protein retains the ability to complex with a targeting gRNA in order to target and interact with specific genomic sequences. In this strategy, the dCas9 functions as an alternate DNA binding domain and relies on the endonuclease activity of the L1 ORF2 protein. Therefore, the gRNAs for this strategy need to be designed to target sequences proximal to potential ORF2 endonuclease sites (i.e., AT-rich sequences).


Data from R2 retroelement studies in Bombyx mori, the retargeting of Sleeping Beauty, and our N-terminal ZF4-ORF2 fusion protein in Chapter 6 indicate that the number of available target sites is likely a crucial component for successful targeting events. Thus, we decided to utilize previously designed gRNAs that target the 5’ UTR of genomic L1 elements. We selected three gRNAs for the analyses of our dCas9-ORF2 fusion proteins; they target different sequences in the 5’ UTR of L1: 551, 765, and 892 (numbers denote position in the 5’ UTR; see Figure 32). We chose these three target locations because they are located proximal to multiple potential ORF2 endonuclease sites. In addition, these three target sites are located far enough apart that they can be compared: if one gRNA targets Alu insertions better than the others, we will be able to evaluate the influence of target site selection in our system. At the same time, the selected target sites are close enough to one another that we will be able to easily identify overlapping targeted insertions mediated by two different gRNAs. We also designed a gRNA to the 3’ UTR of L1 elements (Figure 32). Genome analysis evaluating the frequency of the selected target sites (allowing for one mismatch) confirms the abundance of these sites in the genome (Table 13).

Table 13. Genomic frequency of the selected gRNA target sites to L1.

gRNA    Target sequence                   Instances (allowing for max. 1 mismatch)
551     5’-gaggtggagcctacagaggcaggc-3’    3266
765     5’-ggtgtcagtgtgcccctgctggg-3’     1346
892     5’-gtagataaaaccacaaagatggg-3’     4739
3’L1    5’-gtgggtgcagcgcaccagcatgg-3’     4706
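The kind of count reported in Table 13 (occurrences of a target sequence allowing for at most one mismatch) can be sketched as follows. The tool and genome build actually used for the analysis are not stated in this section, so this is only a naive illustration. The function names are hypothetical, both strands are searched by assumption, and a dedicated short-read aligner would be needed for a genome-scale search.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_sites(genome, target, max_mismatch=1):
    """Count windows of `genome` that match `target` or its reverse
    complement with at most `max_mismatch` mismatches."""
    genome = genome.upper()
    target = target.upper()
    queries = {target, revcomp(target)}
    k = len(target)
    count = 0
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if any(hamming(window, q) <= max_mismatch for q in queries):
            count += 1
    return count

# The 892 target sequence from Table 13, searched in a toy "genome".
target_892 = "gtagataaaaccacaaagatggg"
n_hits = count_sites(target_892.upper(), target_892)
```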


Figure 32. Location of target sites of L1-directed gRNAs.

Schematic of the L1 is shown with the location and sequences of the gRNA target sites.


We used the Alu retrotransposition assay to evaluate the ability of our dCas9-ORF2 fusion protein to drive Alu retrotransposition events. In this assay, we co-transfected our dCas9-ORF2 fusion protein with the 551, 765, 892 or 3’L1 gRNA to ensure the gRNA did not inhibit retrotransposition. All three experimental conditions were able to drive Alu retrotransposition (Table 14).

Table 14. Relative retrotransposition rates of Alu when driven by ORF2 or by the dCas9-ORF2 fusion protein using distinct gRNAs targeting L1.

ORF2 fusion       gRNA    Relative retrotransposition rate
ORF2 (control)    N/A     100
dCas9 ORF2        551     81.00 ± 5.51#
dCas9 ORF2        765     80.33 ± 10.27
dCas9 ORF2        892     84.67 ± 14.68

#: Significantly different from ORF2 control (unpaired Student's t-test).

Next, we analyzed 39 recovered Alu inserts driven by our dCas9-ORF2 fusion construct targeted with the 551 gRNA, 32 insertions targeted with the 765 gRNA, 38 insertions targeted with the 892 gRNA and 8 targeted with the 3’L1 gRNA. Furthermore, we used both a 1:1 and a 5:1 ratio of each gRNA to the dCas9-ORF2 fusion protein in order to increase the likelihood of targeting events. We did not observe an increase in targeting efficiency with any gRNA at either ratio when compared to Alu inserts driven by ORF2 (all rescue data are combined in Table 15; details of recovered sequences can be found in Appendix Table 19).


Table 15. Recovered Alu insertions driven by the dCas9-ORF2 fusion proteins using four different gRNAs.

                         Insertion distance from the target site
gRNA     Total rescues   Within 5 kb   Within 2 kb   Within 600 bp
551      39              2             1             0
765      31              1             0             0
892      38              2             2             1
3’ L1    8               1             0             0
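The distance bins in Table 15 can be tallied from a list of recovered insertion coordinates and a list of target-site coordinates. The sketch below is a minimal illustration, not the analysis pipeline used here. It assumes the bins are cumulative (an insertion within 600 bp is also counted within 2 kb and 5 kb, which is consistent with the rows of Table 15), it measures distance to the nearest target site on the same chromosome, and all coordinates and function names are hypothetical.

```python
from bisect import bisect_left

def nearest_distance(pos, sorted_targets):
    """Distance from `pos` to the closest position in a sorted, non-empty list."""
    i = bisect_left(sorted_targets, pos)
    candidates = sorted_targets[max(0, i - 1):i + 1]
    return min(abs(pos - t) for t in candidates)

def bin_insertions(insertions, targets, windows=(5000, 2000, 600)):
    """Count insertions falling within each distance window of a target site.

    insertions: list of (chromosome, position) tuples
    targets:    dict mapping chromosome -> list of target-site positions
    """
    sorted_targets = {c: sorted(p) for c, p in targets.items()}
    counts = {w: 0 for w in windows}
    for chrom, pos in insertions:
        sites = sorted_targets.get(chrom)
        if not sites:
            continue  # no target site on this chromosome
        d = nearest_distance(pos, sites)
        for w in windows:
            if d <= w:
                counts[w] += 1
    return counts

# Hypothetical coordinates for illustration only.
example_targets = {"chr1": [10000, 50000]}
example_insertions = [("chr1", 10500), ("chr1", 53000),
                      ("chr1", 80000), ("chr2", 100)]
example_counts = bin_insertions(example_insertions, example_targets)
```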

7.2.5. Using an MS2-ORF2 fusion protein to increase the interaction with an MS2-gRNA-Cas9 complex and the efficiency of targeting Alu insertions

Studies showed that utilizing the strong interaction between the MS2 protein and an MS2-tagged gRNA can improve targeting efficiency in CRISPR/Cas9 systems (Konermann et al., 2015). We decided to adapt this strategy in order to drive targeted insertions of Alu. Our design strategy was to create an MS2-ORF2 fusion protein, which can be complexed successfully with a modified gRNA that contains MS2 binding loops (see Figure 33). The rationale for this strategy is that the dCas9 protein will complex with the gRNA containing the MS2 binding loops. The MS2 protein will bind to its binding loops on the gRNA, which will bring the MS2-ORF2 fusion protein to the targeted genomic location.


Figure 33. The strategy for retargeting Alu insertions using the MS2-ORF2 fusion protein.

The rationale behind this strategy was to re-target Alu insertions using a gRNA containing MS2 binding domains and the catalytically inactive dCas9 protein. The ORF2 protein is fused to MS2, which should be brought to the target site by dCas9 through the interaction of MS2 with the MS2 binding loops located on the gRNA.


We evaluated this approach using the same gRNAs that targeted sequences in genomic L1 elements. Using the same genomic targets with the MS2-ORF2 fusion protein strategy allows direct comparison with our previous results obtained with the dCas9-ORF2 fusion protein strategy.

We used the Alu retrotransposition assay to evaluate the ability of our MS2-ORF2 fusion protein to drive Alu retrotransposition events. In this assay, we co-transfected our MS2-ORF2 fusion protein with either the 551, 765, or 892 gRNA containing MS2 binding loops to ensure the gRNA did not inhibit retrotransposition. All three experimental conditions were able to drive Alu retrotransposition events (Table 16).

Table 16. Relative retrotransposition rates of Alu when driven by ORF2 or by the MS2-ORF2 fusion proteins, co-transfected with dCas9 and a gRNA.

ORF2 fusion            Cell line   gRNA   Relative to ORF2 (%)
ORF2 (HeLa control)    HeLa        N/A    100
MS2-ORF2               HeLa        551    73.33 ± 5.49#
MS2-ORF2               HeLa        765    80 ± 12
MS2-ORF2               HeLa        892    61 ± 6#

#: Significantly different from ORF2 control, P < 0.05 (unpaired Student's t-test).

We analyzed 13 recovered Alu inserts driven by our MS2-ORF2 fusion construct targeted with the 551-MS2 gRNA, 13 insertions targeted with the 765-MS2 gRNA, and 15 insertions targeted by the 892-MS2 gRNA. We did not observe an increase in targeting efficiency with any of the tested gRNAs when compared to Alu inserts driven by ORF2 (Table 17; details of recovered sequences can be found in Appendix Table 20).

Table 17. Recovered Alu insertions driven by the MS2-ORF2 fusion protein, co-transfected with dCas9 and three different MS2-tagged gRNAs.

                           Insertion distance from the target site
gRNA       Total rescues   Within 5 kb   Within 2 kb   Within 600 bp
551-MS2    13              0             0             0
765-MS2    13              0             0             0
892-MS2    15              1             0             0

7.2.6. Re-designing the A-tail of the Alu Rescue Cassette to promote sequence homology between a specific genomic target and Alu insert.

To increase the number of available target sites relative to gRNA2, we designed a gRNA to the 3’ UTR of genomic L1 elements. This gRNA was designed so that a PAM sequence lies directly downstream of the genomic target, in order to direct where the Cas9 endonuclease cleaves the DNA. The 3’ UTR of L1 elements is not an ideal sequence for priming TPRT events, as it does not contain the nucleotides necessary for Alu A-tail annealing. To address this, we modified the Alu Rescue Vector so that the 3’ end of the Alu transcript perfectly matches the 3’ L1 UTR target site when it is cleaved site-specifically by either the Cas9 or nickase nuclease. The new Alu end we designed is called ‘L1 PAM’ and has perfect sequence homology to the downstream or “bottom” DNA strand. This target site also contains the PAM sequence. Therefore, this new L1 PAM Alu rescue vector can be used with fusion proteins that contain either the WT Cas9 endonuclease or the D10A nickase (Figure 34).


Figure 34. Adapting the 3’ end of the Alu RNA for TPRT.

A. When the 3’ end of genomic L1 elements is cleaved by either the Cas9 nuclease or the D10A nickase, base pairs are exposed that will need an A-tail sequence that is complementary.

B. The A-tail is complementary to the target DNA (shown in blue), and anneals to the genomic DNA. The Alu RNA can then be reverse transcribed into the genome by the modified L1 ORF2 protein, which retains the reverse transcriptase function.


The observed rescue retrotransposition rates using the Alu construct with the modified 3’ end were extremely low. Out of over six million transfected HeLa cells, only two neomycin resistant colonies formed; therefore, we were able to recover only two insertions from this strategy. Unfortunately, neither recovered sequence reflected the Alu insert predicted by this targeting strategy. Analysis of the first insert demonstrated that the recovered sequence was not a bona fide retrotransposition event (see Figure 35A). Rather, the sequence is an intronless version of the pBS-Ya5rescue-L1 PAM construct that integrated into the genome. We determined that the insert was not derived from an RNA polymerase III transcript, as we were able to identify the presence of the 7SL upstream enhancer sequence (which would not be part of the Alu RNA) as well as vector sequences downstream of the RNA polymerase III terminator. A potential explanation for this insertion is that it is the retrotransposition event of a spurious transcript generated by an alternate promoter (likely an RNA polymerase II promoter) within the plasmid. We were unable to obtain genomic flanking sequences to verify any hallmarks of retrotransposition.

The second recovered insertion landed in chromosome 22 (Figure 35B). Unlike the first insert, this second insert lacked the upstream 7SL enhancer sequence; however, it still contained vector sequences downstream of the terminator. This result suggests that transcription did not terminate at the expected location within the rescue construct. We were unable to determine direct repeats or evaluate other hallmarks of retrotransposition, as this recovered insert contained only the 5’ genomic flank.


Figure 35. Schematic detail of the recovered inserts driven by the nickase endo-- fusion protein using the pBS-Ya5rescue-L1 PAM plasmid.

A. The first insert contained vector sequences upstream (7SL and vector DNA) and downstream (terminator and vector) of the Alu sequence.

B. The second insert landed within chromosome 22 and did not contain the 7SL upstream sequence, but did have the terminator sequence at the 3’ end of the Alu element. H- represents the HindIII sites used in the recovery procedure; N- represents the flanking vector sequence. Red arrows highlight features not expected in bona fide Alu retrotransposed insertions.


7.3 Discussion

Although dCas9 is about the same size as ORF2 (~150 kDa), fusions of these two proteins supported retrotransposition. However, retrotransposition efficiency was drastically diminished or abolished when an endonuclease double mutant of ORF2 or the N-terminally truncated variant (RTCYS) was included in Cas9 or nickase fusion proteins (Table 11). These results suggest that the cleavage capabilities of Cas9 or the nickase are unable to substitute for the ORF2 endonuclease. Furthermore, they imply that the ORF2 endonuclease domain may have other functions needed for retrotransposition in addition to DNA cleavage.

None of the CRISPR/Cas9 strategies were able to drive targeted retrotransposition events (Tables 12, 15, and 17). The Cas9 proteins utilized in these experiments are large proteins: approximately 150 kDa, about the same size as the ORF2 protein. Although the dCas9-ORF2 fusion protein was able to drive Alu retrotransposition events, it might not have been able to target Alu insertions because of the size of the final fusion protein. The dCas9-ORF2 fusion protein might be unable to allow both the dCas9 and ORF2 proteins to interact simultaneously with their nucleic acid partners, including the target DNA, gRNA, and Alu RNA. Although the MS2 protein is much smaller than any of the Cas9 protein variants utilized (~20 kDa vs. 150 kDa), the MS2-ORF2 fusion protein was also unable to redirect Alu insertions to specific locations in the genome. This strategy relies on multiple components (located on separate constructs) stably interacting with each other inside the cell at very precise locations. If any of these interactions were lost or unstable, targeting would not occur. Instead, the MS2-ORF2 protein would be free to drive Alu insertions elsewhere in the genome, independent of the gRNA/dCas9 interaction (Figure 36). Previous experiments using the MS2 strategy to target activation domains showed decreased targeting efficiency compared to fusing the same activation domains to the dCas9 protein alone (Mali et al., 2013). Therefore, we propose that, in our case, overly complex targeting systems that rely on multiple separate components might not be the ideal strategy for retargeting Alu insertions.


Figure 36. Model of proposed explanation for the lack of targeted Alu insertions using the MS2-ORF2 fusion protein CRISPR/Cas9 strategy.

The MS2-ORF2 strategy relied on the interaction of the fusion protein with the MS2 binding domain on the gRNA. This complex needed to be brought to the target DNA through the interaction with dCas9. If the MS2-ORF2 fusion protein is unable to remain in the dCas9 complex, the MS2-ORF2 fusion protein would be free to drive dispersed Alu retrotransposition events throughout the genome.


In our final approach, we modified the A-tail of the pBS-Ya5rescue-A70Du cassette, creating the pBS-Ya5rescue-L1 PAM construct. This new tail is designed to base pair perfectly with the target DNA, which creates the template for TPRT when the DNA is cleaved by either the Cas9 nuclease or the D10A nickase (Figure 34B). Because of the dramatically reduced retrotransposition efficiency, we were able to recover only two insertions. Analyses of the recovered sequences showed that the strategy was likely recovering artifacts that arise from pushing the limits of this system. Although the CRISPR proteins target very efficiently in other experimental approaches, we concluded that this system is ineffective for targeting retrotransposition events.


Chapter 8. CONCLUSION

DNA binding domains (DBDs) have been used with great success to impart targeting capabilities to a variety of proteins, creating highly useful genomic tools. We evaluated the ability of five types of DBDs and strategies (AAV Rep proteins, Cre, TAL effectors, zinc fingers, and the dCas9/gRNA system) to redirect the L1 ORF2 protein to drive Alu retrotransposition events to specific sequences. Although only one DBD evaluated was successful, the failed experiments generated valuable information that can provide guidance in the design of engineered ORF2 fusion proteins.

We first learned that the ORF2 protein tolerates the addition of small and large protein domains at both the amino and carboxy termini. Although in some instances retrotransposition efficiencies diminished slightly (~20%, such as in the N-ZF2-ORF2 GHL construct), all fusion proteins containing a functional ORF2 protein were capable of driving Alu retrotransposition in culture. However, only one specific DBD was able to successfully redirect Alu insertions to a specific genomic location: the six-fingered zinc finger ZF4. We were able to enrich for Alu insertions landing within a 1.2 kb window surrounding the ZF4 target sequence by 47-fold when we fused this relatively small (20 kDa) ZF domain to the N-terminus of the ORF2 protein. Targeted Alu insertions were successfully driven by three different fusion protein linker variants, containing either the GHL, FL4, or HL4 linker. The N-ZF4-ORF2 GHL construct enriched for Alu insertions significantly more than the N-ZF4-ORF2 FL4 or N-ZF4-ORF2 HL4 constructs (Fisher exact test, P = 0.009452 and P = 0.037758, respectively). However, the linker change neither abolished nor significantly improved targeting capabilities when compared to Alu insertions driven by the unfused ORF2 construct (Fisher exact test, P > 0.05). Additionally, there was no statistical difference between C-ZF4-ORF2 constructs that utilized the same three linkers (Fisher exact test, P > 0.05), suggesting that location (N- vs. C-terminus) may be more important than the type of linker used. Each of these three constructs drove targeted Alu retrotransposition events with comparable efficiencies (Table 8). Overall, although important, linker selection appears to have a lesser role in influencing targeting efficiency. However, more studies using shorter and longer versions of the linkers would be needed to confirm this assertion.
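For readers unfamiliar with the Fisher exact test used in these comparisons, the following minimal sketch computes a two-sided P value for a 2x2 table of targeted versus non-targeted insertions recovered from two constructs. It is a generic textbook implementation using only the Python standard library, not the software used for the analyses reported here, and the counts in the usage example are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be two constructs; columns could be targeted and
    non-targeted insertions. The P value is the summed probability of all
    tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x targeted inserts in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: construct 1 with 8 targeted and 2 non-targeted
# inserts, construct 2 with 1 targeted and 5 non-targeted inserts.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

The test is appropriate here because the numbers of rescued inserts per construct are small, which makes large-sample approximations such as the chi-squared test unreliable.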

By combining the information obtained from the successful ZF4 targeting strategies with that from the unsuccessful DBD and CRISPR strategies, we were able to learn about some of the preferences, characteristics, and requirements of retargeting a retroelement. We believe several characteristics of the ZF4 DNA binding domain contributed directly to its success. First, the full-length N-ZF4-ORF2 GHL fusion protein seemed to be stably expressed in mammalian cells based on the Western blot analysis results (Figure 24). However, not all fusion proteins could be detected, including the N-ZF2-ORF2 GHL fusion protein, which also contained a six-fingered zinc finger targeting an alternate sequence in the 3’ end of the L1 sequence. The ZF2 zinc finger is very similar to the ZF4 protein, with the only notable difference being the DNA sequence targeted. Given their close sequence similarity (at the nucleic acid and amino acid level), it is surprising that the ZF2 and ZF4 fusion proteins showed contrasting expression levels. Overall, these data indicate that the stability of individual ORF2 fusion proteins varies and is difficult to predict, even between similar DNA binding domains.

Second, the ZF4 zinc finger functions and binds to its target DNA sequence as a monomer. Although a previous study successfully retargeted the DNA transposon Sleeping Beauty using the REP proteins LZ and TZ (which function as a dimer and tetramer, respectively), we were unable to re-target Alu insertions to the AAVS1 locus by fusing these proteins to the N-terminus of the ORF2 protein (Ammar et al., 2012). Re-targeting Sleeping Beauty might have been successful because the transposon also functions as a multimer. From these data, we conclude that multimerization requirements of both the DNA binding domain and mobile element need to be taken into consideration when designing fusion proteins.

Third, the abundance of target sites in the genome is likely a limiting factor in determining the efficiency and capability of the fusion protein to drive targeted Alu insertions. We were unable to observe an enrichment of Alu insertions near the artificially introduced EGFP sequence when using our EGFP-targeting zinc-finger fusion proteins. This is in contrast to the ZF4 zinc finger, which targets a sequence in L1 that is present over 17,000 times in the human genome. An important factor to consider is our ability to detect insertions at a target site that is present only once in the genome. For example, if we assume there are ~1 million endonuclease sites available in the genome, then 1 million inserts would need to be recovered in order to detect an insertion into one unique locus. In our case, our best ZF4-ORF2 fusion showed a 47-fold enrichment of Alu inserts. This would mean that detecting an insertion into one unique locus would require the analysis of over 21,000 inserts. With the Alu rescue assay, this would be an almost impossible task. Thus, abundance of the target sequence in the genome plays an important role in determining the targeting capabilities of any fusion protein. It is not, however, the only factor that is important for the targeting success of these strategies. For example, the DNA binding domain ZF2 was unable to drive targeted Alu insertions when fused to the L1 ORF2 protein on either terminus. The ZF2 target sequence is located approximately 250 bp upstream of the ZF4 target site, and is also abundant in the human genome (approximately 17,000 target sites). Although in vitro studies demonstrated that the ZF2 and ZF4 zinc fingers had comparable binding efficiencies (Voigt et al., 2012), their targeting capabilities differed in vivo. One possible explanation is that some fusion proteins containing the ZF2 zinc finger were unstable, as we were unable to detect expression of full-length N-ZF2-ORF2 fusion proteins through Western blot analysis (Figure 29). It should be noted that even though we were unable to detect ZF2-ORF2 fusion protein expression, we cannot definitively conclude that the fusion protein is not being expressed. ORF2 expression has been notoriously difficult to detect via Western blot analysis. Currently, there are no good commercial antibodies available. Furthermore, ORF2 antibodies developed in labs vary in their capabilities, and many detect non-specific bands. However, all N-ZF2-ORF2 constructs were able to drive Alu retrotransposition events with efficiencies comparable to the unfused ORF2 protein. Therefore, at least the ORF2 portion of the ZF2-ORF2 fusion proteins must be stable long enough to drive retrotransposition events.
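The detection arithmetic given earlier in this paragraph can be reproduced with a few lines of code. The sketch below uses a deliberately simple model that is assumed here for illustration: every available endonuclease site is equally likely to receive an insertion, and the fold enrichment simply multiplies the probability of hitting the target. The function `inserts_needed` and its parameter names are hypothetical.

```python
def inserts_needed(n_sites, fold_enrichment=1.0, n_targets=1):
    """Expected number of recovered inserts per on-target insertion.

    n_sites:         assumed number of available endonuclease sites
    fold_enrichment: enrichment for the target provided by the fusion protein
    n_targets:       number of copies of the target site in the genome
    """
    return max(1.0, n_sites / (fold_enrichment * n_targets))

# Unique locus, unfused ORF2: about 1 million inserts per expected hit.
baseline = inserts_needed(1_000_000)

# Unique locus with the 47-fold enrichment observed for ZF4-ORF2:
# 1,000,000 / 47 is approximately 21,277, i.e. "over 21,000 inserts".
enriched = inserts_needed(1_000_000, fold_enrichment=47)
```

Under the same simple model, raising `n_targets` lowers the number of inserts required in direct proportion, which is the quantitative form of the argument that abundant target sites make targeting events far easier to detect.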

Fourth, the CRISPR/Cas9 systems we utilized provided some valuable information regarding the limitations of our targeting systems. Our four constructs that utilized an endonuclease-deficient ORF2 protein (Cas9-Endo--, Cas9-RTCYS, Nickase-Endo--, and Nickase-RTCYS) were unable to drive Alu retrotransposition events, even though three of the four fusion proteins were able to cleave the gRNA-directed WRN target DNA. Additionally, the ORF2 protein still maintained retrotransposition capabilities. Therefore, the cleavage capabilities of either the Cas9 or nickase proteins are unable to substitute for the ORF2 endonuclease, suggesting that the endonuclease domain may have other functions needed for retrotransposition. For example, the DNA at the cleavage site might need to be immediately positioned in a highly specific manner for reverse transcription to occur, which might only happen upon ORF2-mediated cleavage. It is likely that ORF2 positioning is lost when the cleavage site is provided by a different nuclease, such as the Cas9 endonuclease and nickase proteins. Although we were able to recover a few insertions driven by these constructs, no effective targeting was observed when using gRNAs that target either a unique or an abundant genomic location. Additionally, strategies that relied on the inactive dCas9 protein as a DBD were also unable to drive targeted Alu insertions. These mechanisms likely involved too many components and processes that needed to occur at a very specific time at the site of insertion. Furthermore, the more components required to assemble in a complex, the less likely it is to work. Therefore, strategies that are complex (regarding the insertion and targeting machinery requirements, timing, or precision) are more likely to fail than simple strategies that rely only on fusing a DNA binding domain to the ORF2 protein.

Overall, our results demonstrate that modifying ORF2 retrotransposition to target specific sequences is more complex than the simple introduction of a DNA binding domain. A successful domain must be able to function independently of the ORF2 protein without interfering with the activities of the ORF2 protein, either through size or mechanism. Additionally, we observed that DNA binding domains re-target Alu insertions more successfully when there are thousands of target sequences in the genome, which is likely not useful for gene therapy strategies and approaches. Lastly, adapting complex systems with stringent targeting requirements could push these strategies to their limit. More effort needs to be put into determining the exact requirements and limitations of a successful ORF2 targeting system, as these preferences are currently very difficult to discern.

BIBLIOGRAPHY

Ade,C. and Roy-Engel,A.M. (2016). SINE Retrotransposition: Evaluation of Alu Activity and Recovery of De Novo Inserts. Methods Mol. Biol. 1400, 183-201.

Ade,C., Roy-Engel,A.M., and Deininger,P.L. (2013). Alu elements: an intrinsic source of human genome instability. Curr. Opin. Virol. 3, 639-645.

Alisch,R.S., Garcia-Perez,J.L., Muotri,A.R., Gage,F.H., and Moran,J.V. (2006). Unconventional translation of mammalian LINE-1 retrotransposons. Genes Dev. 20, 210-224.

Ammar,I., Gogol-Doring,A., Miskey,C., Chen,W., Cathomen,T., Izsvak,Z., and Ivics,Z. (2012). Retargeting transposon insertions by the adeno-associated virus Rep protein. Nucleic Acids Res. 40, 6693-6712.

An,W., Dai,L., Niewiadomska,A.M., Yetil,A., O'Donnell,K.A., Han,J.S., and Boeke,J.D. (2011). Characterization of a synthetic human LINE-1 retrotransposon ORFeus-Hs. Mob. DNA 2, 2.

Apoil,P.A., Kuhlein,E., Robert,A., Rubie,H., and Blancher,A. HIGM syndrome caused by insertion of an AluYb8 element in exon 1 of the CD40LG gene. Immunogenetics.

Arai,R., Ueda,H., Kitayama,A., Kamiya,N., and Nagamune,T. (2001). Design of the linkers which effectively separate domains of a bifunctional fusion protein. Protein Eng 14, 529-532.

Aravin,A.A., Sachidanandam,R., Bourc'his,D., Schaefer,C., Pezic,D., Toth,K.F., Bestor,T., and Hannon,G.J. (2008). A piRNA Pathway Primed by Individual Transposons Is Linked to De Novo DNA Methylation in Mice. Molecular Cell 31, 785-799.

Ardeljan,D., Taylor,M.S., Ting,D.T., and Burns,K.H. (2017). The Human Long Interspersed Element-1 Retrotransposon: An Emerging Biomarker of Neoplasia. Clin. Chem. 63, 816-822.

Awano,H., Malueka,R.G., Yagi,M., Okizuka,Y., Takeshima,Y., and Matsuo,M. (2010). Contemporary retrotransposition of a novel non-coding gene induces exon-skipping in dystrophin mRNA. J Hum. Genet.

Baillie,J.K., Barnett,M.W., Upton,K.R., Gerhardt,D.J., Richmond,T.A., De,S.F., Brennan,P., Rizzu,P., Smith,S., Fell,M., Talbot,R.T., Gustincich,S., Freeman,T.C., Mattick,J.S., Hume,D.A., Heutink,P., Carninci,P., Jeddeloh,J.A., and Faulkner,G.J. (2011). Somatic retrotransposition alters the genetic landscape of the human brain. Nature.

Barzilay,G., Walker,L.J., Robson,C.N., and Hickson,I.D. (1995). Site-directed mutagenesis of the human DNA repair enzyme HAP1: identification of residues important for AP endonuclease and RNase H activity. Nucleic Acids Res. 23, 1544-1550.

Batzer,M.A. and Deininger,P.L. (2002). Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370-379.

Batzer,M.A., Kilroy,G.E., Richard,P.E., Shaikh,T.H., Desselle,T.D., Hoppens,C.L., and Deininger,P.L. (1990). Structure and variability of recently inserted Alu family members [published erratum appears in Nucleic Acids Res 1991 Feb 11;19(3):698-9]. Nucleic Acids Res. 18, 6793-6798.

Beck,C.R., Collier,P., Macfarlane,C., Malig,M., Kidd,J.M., Eichler,E.E., Badge,R.M., and Moran,J.V. (2010). LINE-1 Retrotransposition Activity in Human Genomes. Cell 141, 1159-1170.

Becker,K.G., Swergold,G.D., Ozato,K., and Thayer,R.E. (1993). Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum. Mol. Genet. 2, 1697-1702.

Beerli,R.R. and Barbas,C.F., III (2002). Engineering polydactyl zinc-finger transcription factors. Nat. Biotechnol. 20, 135-141.

Beerli,R.R., Dreier,B., and Barbas,C.F., III (2000). Positive and negative regulation of endogenous genes by designed transcription factors. Proc. Natl. Acad. Sci. U. S. A 97, 1495-1500.

Beerli,R.R., Segal,D.J., Dreier,B., and Barbas,C.F., III (1998). Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proc. Natl. Acad. Sci. U. S. A 95, 14628-14633.

Belancio,V.P., Roy-Engel,A.M., and Deininger,P. (2008). The impact of multiple splice sites in human L1 elements. Gene 411, 38-45.

Belancio,V.P., Roy-Engel,A.M., Pochampally,R.R., and Deininger,P. (2010). Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 38, 3909-3922.

Belancio,V.P., Hedges,D.J., and Deininger,P. (2006). LINE-1 RNA splicing and influences on mammalian gene expression. Nucleic Acids Research 34, 1512-1521.

Bennett,E.A., Keller,H., Mills,R.E., Schmidt,S., Moran,J.V., Weichenrieder,O., and Devine,S.E. (2008). Active Alu retrotransposons in the human genome. Genome Res 18, 1875-1883.

Bibikova,M., Beumer,K., Trautman,J.K., and Carroll,D. (2003). Enhancing gene targeting with designed zinc finger nucleases. Science 300, 764.

Bibikova,M., Carroll,D., Segal,D.J., Trautman,J.K., Smith,J., Kim,Y.G., and Chandrasegaran,S. (2001). Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol. Cell Biol. 21, 289-297.

Bibikova,M., Golic,M., Golic,K.G., and Carroll,D. (2002). Targeted chromosomal cleavage and mutagenesis in Drosophila using zinc-finger nucleases. Genetics 161, 1169-1175.

Biessmann,H., Valgeirsdottir,K., Lofsky,A., Chin,C., Ginther,B., Levis,R.W., and Pardue,M.L. (1992). HeT-A, a transposable element specifically involved in "healing" broken chromosome ends in Drosophila melanogaster. Mol. Cell Biol. 12, 3910-3918.

Birchmeier,C., Birchmeier,W., Gherardi,E., and Vande Woude,G.F. (2003). Met, metastasis, motility and more. Nat. Rev. Mol. Cell Biol. 4, 915-925.

Bitinaite,J., Wah,D.A., Aggarwal,A.K., and Schildkraut,I. (1998). FokI dimerization is required for DNA cleavage. Proc. Natl. Acad. Sci. U. S. A 95, 10570-10575.

Boch,J. and Bonas,U. (2010). Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu. Rev. Phytopathol. 48, 419-436.

Boch,J., Scholze,H., Schornack,S., Landgraf,A., Hahn,S., Kay,S., Lahaye,T., Nickstadt,A., and Bonas,U. (2009). Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509-1512.

Boeke,J.D. (1997). LINEs and Alus--the polyA connection. Nat. Genet. 16, 6-7.

Bogdanove,A.J., Schornack,S., and Lahaye,T. (2010). TAL effectors: finding plant genes for disease and defense. Curr. Opin. Plant Biol. 13, 394-401.

Bogerd,H.P., Wiegand,H.L., Hulme,A.E., Garcia-Perez,J.L., O'shea,K.S., Moran,J.V., and Cullen,B.R. (2006). Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc. Natl. Acad. Sci. U. S. A 103, 8780-8785.

Boissinot,S., Chevret,P., and Furano,A.V. (2000). L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17, 915-928.

Bollati,V., Baccarelli,A., Hou,L., Bonzini,M., Fustinoni,S., Cavallo,D., Byun,H.M., Jiang,J., Marinelli,B., Pesatori,A.C., Bertazzi,P.A., and Yang,A.S. (2007). Changes in DNA methylation patterns in subjects exposed to low-dose benzene. Cancer Res. 67, 876-880.

Bottomley,M.J., Collard,M.W., Huggenvik,J.I., Liu,Z., Gibson,T.J., and Sattler,M. (2001). The SAND domain structure defines a novel DNA-binding fold in transcriptional regulation. Nat. Struct. Biol. 8, 626-633.

Bourc'his,D. and Bestor,T.H. (2004). Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature 431, 96-99.

Brouha,B., Meischl,C., Ostertag,E., de Boer,M., Zhang,Y., Neijens,H., Roos,D., and Kazazian,H.H., Jr. (2002). Evidence consistent with human L1 retrotransposition in maternal meiosis I. Am. J Hum. Genet. 71, 327-336.

Brouha,B., Schustak,J., Badge,R.M., Lutz-Prigge,S., Farley,A.H., Moran,J.V., and Kazazian,H.H., Jr. (2003). Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. U. S. A 100, 5280-5285.

Brown,R.S., Sander,C., and Argos,P. (1985). The primary structure of transcription factor TFIIIA has 12 consecutive repeats. Febs Lett. 186, 271-274.

Burke,W.D., Calalang,C.C., and Eickbush,T.H. (1987). The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell Biol. 7, 2221-2230.

Burns,K.H. (2017). Transposable elements in cancer. Nat. Rev. Cancer 17, 415-424.

Cajuso,T., Hanninen,U.A., Kondelin,J., Gylfe,A.E., Tanskanen,T., Katainen,R., Pitkanen,E., Ristolainen,H., Kaasinen,E., Taipale,M., Taipale,J., Bohm,J., Renkonen-Sinisalo,L., Mecklin,J.P., Jarvinen,H., Tuupanen,S., Kilpivaara,O., and Vahteristo,P. (2014). Exome sequencing reveals frequent inactivating mutations in ARID1A, ARID1B, ARID2 and ARID4A in unstable colorectal cancer. Int. J. Cancer 135, 611-623.

Callinan,P.A. and Batzer,M.A. (2006). Retrotransposable elements and human disease. Genome Dyn. 1, 104-115.

Canella,D., Praz,V., Reina,J.H., Cousin,P., and Hernandez,N. (2010). Defining the RNA polymerase III transcriptome: Genome-wide localization of the RNA polymerase III transcription machinery in human cells. Genome Res. 20, 710-721.

Capkova,F.R., Biessmann,H., and Mason,J.M. (2008). Regulation of telomere length in Drosophila. Cytogenet. Genome Res. 122, 356-364.

Carbery,I.D., Ji,D., Harrington,A., Brown,V., Weinstein,E.J., Liaw,L., and Cui,X. (2010). Targeted genome modification in mice using zinc-finger nucleases. Genetics 186, 451-459.

Carroll,D. (2011). Genome engineering with zinc-finger nucleases. Genetics 188, 773-782.

Cathomen,T., Collete,D., and Weitzman,M.D. (2000). A chimeric protein containing the N terminus of the adeno-associated virus Rep protein recognizes its target site in an in vivo assay. J Virol. 74, 2372-2382.

Cermak,T., Doyle,E.L., Christian,M., Wang,L., Zhang,Y., Schmidt,C., Baller,J.A., Somia,N.V., Bogdanove,A.J., and Voytas,D.F. (2011). Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 39, e82.

Chang,D.Y., Hsu,K., and Maraia,R.J. (1996). Monomeric scAlu and nascent dimeric Alu RNAs induced by adenovirus are assembled into SRP9/14-containing RNPs in HeLa cells. Nucleic Acids Res 24, 4165-4170.

Chen,J.D. and Pirrotta,V. (1993). Multimerization of the Drosophila zeste protein is required for efficient DNA binding. Embo J. 12, 2075-2083.

Chiu,Y.L., Witkowska,H.E., Hall,S.C., Santiago,M., Soros,V.B., Esnault,C., Heidmann,T., and Greene,W.C. (2006). High-molecular-mass APOBEC3G complexes restrict Alu retrotransposition. Proc. Natl. Acad. Sci. U. S. A 103, 15588-15593.

Cho,S.W., Kim,S., Kim,J.M., and Kim,J.S. (2013). Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230-232.

Christensen,S.M., Bibillo,A., and Eickbush,T.H. (2005). Role of the Bombyx mori R2 element N-terminal domain in the target-primed reverse transcription (TPRT) reaction. Nucleic Acids Res. 33, 6461-6468.

Christensen,S.M., Ye,J., and Eickbush,T.H. (2006). RNA from the 5' end of the R2 retrotransposon controls R2 protein binding to and cleavage of its DNA target site. Proc. Natl. Acad. Sci. U. S. A 103, 17602-17607.

Christian,M., Cermak,T., Doyle,E.L., Schmidt,C., Zhang,F., Hummel,A., Bogdanove,A.J., and Voytas,D.F. (2010). Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186, 757-761.

Clements,A.P. and Singer,M.F. (1998). The human LINE-1 reverse transcriptase: effect of deletions outside the common reverse transcriptase domain. Nucleic Acids Res. 26, 3528-3535.

Comeaux,M.S., Roy-Engel,A.M., Hedges,D.J., and Deininger,P.L. (2009). Diverse cis factors controlling Alu retrotransposition: What causes Alu elements to die? Genome Res 19, 545-555.

Cong,L., Ran,F.A., Cox,D., Lin,S., Barretto,R., Habib,N., Hsu,P.D., Wu,X., Jiang,W., Marraffini,L.A., and Zhang,F. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823.

Cong,L., Zhou,R., Kuo,Y.C., Cunniff,M., and Zhang,F. (2012). Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat. Commun. 3, 968.

Cordaux,R. and Batzer,M.A. (2009). The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 10, 691-703.

Cordaux,R., Hedges,D.J., Herke,S.W., and Batzer,M.A. (2006). Estimating the retrotransposition rate of human Alu elements. Gene.

Cost,G.J. and Boeke,J.D. (1998). Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry 37, 18081-18093.

Dai,L., Huang,Q., and Boeke,J.D. (2011). Effect of reverse transcriptase inhibitors on LINE-1 and Ty1 reverse transcriptase activities and on LINE-1 retrotransposition. BMC. Biochem. 12, 18.

Daniels,G. and Deininger,P. (1985). Repeat sequence families derived from mammalian tRNA Genes. Nature 317, 819-822.

Daniels,G. and Deininger,P.L. (1991). Characterization of a third major SINE family of repetitive sequences in the galago genome. Nucleic Acids Res. 19, 1649-1656.

de Koning,A.P., Gu,W., Castoe,T.A., Batzer,M.A., and Pollock,D.D. (2011). Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 7, e1002384.

deHaro,D., Kines,K.J., Sokolowski,M., Dauchy,R.T., Streva,V.A., Hill,S.M., Hanifin,J.P., Brainard,G.C., Blask,D.E., and Belancio,V.P. (2014). Regulation of L1 expression and retrotransposition by melatonin and its receptor: implications for cancer risk associated with light exposure at night. Nucleic Acids Res. 42, 7694-7707.

Deininger,P.L. and Batzer,M.A. (1999). Alu repeats and human disease. Mol. Genet. Metab. 67, 183-193.

Deininger,P.L., Jolly,D., Rubin,C., Friedmann,T., and Schmid,C.W. (1981). Base Sequence Studies of 300 Nucleotide Renatured Repeated Human DNA Clones. J. Mol. Biol. 151, 17-33.

Deininger,P.L., Moran,J.V., Batzer,M.A., and Kazazian,H.H., Jr. (2003). Mobile elements and mammalian genome evolution. Curr. Opin. Genet. Dev. 13, 651-658.

Deltcheva,E., Chylinski,K., Sharma,C.M., Gonzales,K., Chao,Y., Pirzada,Z.A., Eckert,M.R., Vogel,J., and Charpentier,E. (2011). CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602-607.

Deng,D., Yan,C., Pan,X., Mahfouz,M., Wang,J., Zhu,J.K., Shi,Y., and Yan,N. (2012). Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335, 720-723.

Dewannieux,M., Esnault,C., and Heidmann,T. (2003). LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35, 41-48.

Dewannieux,M. and Heidmann,T. (2005). Role of poly(A) tail length in Alu retrotransposition. Genomics 86, 378-381.

Dewannieux,M. and Heidmann,T. (2013). Endogenous retroviruses: acquisition, amplification and taming of genome invaders. Curr. Opin. Virol. 3, 646-656.

Dhellin,O., Maestre,J., and Heidmann,T. (1997). Functional differences between the human LINE retrotransposon and retroviral reverse transcriptases for in vivo mRNA reverse transcription. Embo J. 16, 6590-6602.

Dmitriev,S.E., Andreev,D.E., Terenin,I.M., Olovnikov,I.A., Prassolov,V.S., Merrick,W.C., and Shatsky,I.N. (2007). Efficient Translation Initiation Directed by the 900 Nucleotides-Long and GC-Rich 5' UTR of the Human Retrotransposon LINE-1 mRNA is Strictly Cap-Dependent Rather Than IRES-Mediated. Mol. Cell Biol. 27, 4685-4697.

Dombroski,B.A., Mathias,S.L., Nanthakumar,E., Scott,A.F., and Kazazian,H.H. (1991). Isolation of an active human transposable element. Science 254, 1805-1810.

Doucet-O'Hare,T.T., Rodic,N., Sharma,R., Darbari,I., Abril,G., Choi,J.A., Young,A.J., Cheng,Y., Anders,R.A., Burns,K.H., Meltzer,S.J., and Kazazian,H.H., Jr. (2015). LINE-1 expression and retrotransposition in Barrett's esophagus and esophageal carcinoma. Proc. Natl. Acad. Sci. U. S. A 112, E4894-E4900.

Doudna,J.A. and Charpentier,E. (2014). Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096.

Doyon,Y., McCammon,J.M., Miller,J.C., Faraji,F., Ngo,C., Katibah,G.E., Amora,R., Hocking,T.D., Zhang,L., Rebar,E.J., Gregory,P.D., Urnov,F.D., and Amacher,S.L. (2008). Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nat. Biotechnol. 26, 702-708.

Duncan,C.H., Jagadeeswaran,P., Wang,R.R., and Weissman,S.M. (1981). Structural analysis of templates and RNA polymerase III transcripts of Alu family sequences interspersed among the human beta-like globin genes. Gene 13, 185-196.

Ebina,H., Misawa,N., Kanemura,Y., and Koyanagi,Y. (2013). Harnessing the CRISPR/Cas9 system to disrupt latent HIV-1 provirus. Sci. Rep. 3, 2510.

Economou-Pachnis,A. and Tsichlis,P.N. (1985). Insertion of an Alu SINE in the human homologue of the Mlvi-2 locus. Nucleic Acids Research 13, 8379-8387.

Eickbush,T.H. and Jamburuthugoda,V.K. (2008). The diversity of retrotransposons and the properties of their reverse transcriptases. Virus Res. 134, 221-234.

Ejima,Y. and Yang,L. (2003). Trans mobilization of genomic DNA as a mechanism for retrotransposon-mediated exon shuffling. Hum. Mol. Genet. 12, 1321-1328.

Englander,E.W. and Howard,B.H. (1995). Nucleosome positioning by human Alu elements in chromatin. J. Biol. Chem. 270, 10091-10096.

Englander,E.W., Wolffe,A.P., and Howard,B.H. (1993). Nucleosome interactions with a human Alu element. Transcriptional repression and effects of template methylation. J. Biol. Chem. 268, 19565-19573.

Esnault,C., Casella,J.F., and Heidmann,T. (2002). A Tetrahymena thermophila ribozyme-based indicator gene to detect transposition of marked retroelements in mammalian cells. Nucleic Acids Res. 30, e49.

Ewing,A.D. and Kazazian,H.H., Jr. (2010). High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 20, 1262-1270.

Farkash,E.A., Kao,G.D., Horman,S.R., and Prak,E.T. (2006). Gamma radiation increases endonuclease-dependent L1 retrotransposition in a cultured cell assay. Nucleic Acids Res 34, 1196-1204.

Feil,R., Wagner,J., Metzger,D., and Chambon,P. (1997). Regulation of Cre recombinase activity by mutated estrogen receptor ligand-binding domains. Biochem. Biophys. Res. Commun. 237, 752-757.

Feng,Q., Moran,J.V., Kazazian,H.H., Jr., and Boeke,J.D. (1996). Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905-916.

Fields,S. and Song,O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245-246.

Florl,A.R., Lower,R., Schmitz-Drager,B.J., and Schulz,W.A. (1999). DNA methylation and expression of LINE-1 and HERV-K provirus sequences in urothelial and renal cell carcinomas. Br. J. Cancer 80, 1312-1321.

Flotte,T.R., Afione,S.A., and Zeitlin,P.L. (1994). Adeno-associated virus vector gene expression occurs in nondividing cells in the absence of vector DNA integration. Am. J. Respir. Cell Mol. Biol. 11, 517-521.

Foley,J.E., Yeh,J.R., Maeder,M.L., Reyon,D., Sander,J.D., Peterson,R.T., and Joung,J.K. (2009). Rapid mutation of endogenous zebrafish genes using zinc finger nucleases made by Oligomerized Pool ENgineering (OPEN). PLoS. ONE. 4, e4348.

Freeman,J.D., Goodchild,N.L., and Mager,D.L. (1994). A modified indicator gene for selection of retrotransposition events in mammalian cells. Biotechniques 17, 46, 48-49, 52.

Fuhrman,S., Deininger,P.L., LaPorte,P., Friedmann,T., and Geiduschek,P. (1981). Analysis of Transcription of the Human Alu Family of Ubiquitous Repeating Element by Eukaryotic RNA Polymerase III. Nucleic Acids Res. 9, 6439-6455.

Fujiwara,H. (2015). Site-specific non-LTR retrotransposons. Microbiol. Spectr. 3, MDNA3-2014.

Furano,A.V. (2000). The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog. Nucleic Acid Res. Mol. Biol. 64, 255-294.

Gaj,T., Gersbach,C.A., and Barbas,C.F., III (2013). ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol. 31, 397-405.

Gaj,T., Guo,J., Kato,Y., Sirk,S.J., and Barbas,C.F., III (2012). Targeted gene knockout by direct delivery of zinc-finger nuclease proteins. Nat. Methods 9, 805-807.

Ganguly,A., Dunbar,T., Chen,P., Godmilow,L., and Ganguly,T. (2003). Exon skipping caused by an intronic insertion of a young Alu Yb9 element leads to severe hemophilia A. Hum. Genet. 113, 348-352.

Gasior,S.L., Preston,G., Hedges,D.J., Gilbert,N., Moran,J.V., and Deininger,P.L. (2006a). Characterization of pre-insertion loci of de novo L1 insertions. Gene.

Gasior,S.L., Roy-Engel,A.M., and Deininger,P.L. (2008). ERCC1/XPF limits L1 retrotransposition. DNA Repair (Amst) 7, 983-989.

Gasior,S.L., Wakeman,T.P., Xu,B., and Deininger,P.L. (2006b). The human LINE-1 retrotransposon creates DNA double-strand breaks. J. Mol. Biol. 357, 1383-1393.

Gasiunas,G., Barrangou,R., Horvath,P., and Siksnys,V. (2012). Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. U. S. A 109, E2579-E2586.

George,J.A., Burke,W.D., and Eickbush,T.H. (1996). Analysis of the 5' junctions of R2 insertions with the 28S gene: implications for non-LTR retrotransposition. Genetics 142, 853-863.

Geurts,A.M., Cost,G.J., Freyvert,Y., Zeitler,B., Miller,J.C., Choi,V.M., Jenkins,S.S., Wood,A., Cui,X., Meng,X., Vincent,A., Lam,S., Michalkiewicz,M., Schilling,R., Foeckler,J., Kalloway,S., Weiler,H., Menoret,S., Anegon,I., Davis,G.D., Zhang,L., Rebar,E.J., Gregory,P.D., Urnov,F.D., Jacob,H.J., and Buelow,R. (2009). Knockout rats via embryo microinjection of zinc-finger nucleases. Science 325, 433.

Gilbert,L.A., Larson,M.H., Morsut,L., Liu,Z., Brar,G.A., Torres,S.E., Stern-Ginossar,N., Brandman,O., Whitehead,E.H., Doudna,J.A., Lim,W.A., Weissman,J.S., and Qi,L.S. (2013). CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442-451.

Gilbert,N., Lutz,S., Morrish,T.A., and Moran,J.V. (2005). Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol. Cell Biol. 25, 7780-7795.

Gilbert,N., Lutz-Prigge,S., and Moran,J.V. (2002). Genomic deletions created upon LINE-1 retrotransposition. Cell 110, 315-325.

Gingras,A.C., Raught,B., and Sonenberg,N. (1999). eIF4 initiation factors: effectors of mRNA recruitment to ribosomes and regulators of translation. Annu. Rev. Biochem. 68, 913-963.

Goodier,J.L., Cheung,L.E., and Kazazian,H.H., Jr. (2012). MOV10 RNA helicase is a potent inhibitor of retrotransposition in cells. PLoS Genet., in press.

Goodier,J.L., Mandal,P.K., Zhang,L., and Kazazian,H.H., Jr. (2010). Discrete subcellular partitioning of human retrotransposon RNAs despite a common mechanism of genome insertion. Hum. Mol Genet.

Goodier,J.L., Ostertag,E.M., Engleka,K.A., Seleme,M.C., and Kazazian,H.H., Jr. (2004). A potential role for the nucleolus in L1 retrotransposition. Hum. Mol. Genet. 13, 1041-1048.

Goodier,J.L., Zhang,L., Vetter,M.R., and Kazazian,H.H., Jr. (2007). LINE-1 ORF1 Protein Localizes in Stress Granules with Other RNA-Binding Proteins, Including Components of RNA Interference RNA-Induced Silencing Complex. Mol. Cell Biol. 27, 6469-6483.

Haft,D.H., Selengut,J., Mongodin,E.F., and Nelson,K.E. (2005). A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS. Comput. Biol. 1, e60.

Hagan,C.R., Sheffield,R.F., and Rudin,C.M. (2003). Human Alu element retrotransposition induced by genotoxic stress. Nat. Genet. 35, 219-220.

Hamilton,D.L. and Abremski,K. (1984). Site-specific recombination by the bacteriophage P1 lox-Cre system. Cre-mediated synapsis of two lox sites. J. Mol. Biol. 178, 481-486.

Han,K., Lee,J., Meyer,T.J., Remedios,P., Goodwin,L., and Batzer,M.A. (2008). L1 recombination-associated deletions generate human genomic variation. Proc. Natl. Acad. Sci. U. S. A 105, 19366-19371.

Hancks,D.C., Goodier,J.L., Mandal,P.K., Cheung,L.E., and Kazazian,H.H., Jr. (2011). Retrotransposition of marked SVA elements by human L1s in cultured cells. Hum. Mol Genet 20, 3386-3400.

Hancks,D.C. and Kazazian,H.H., Jr. (2012). Active human retrotransposons: variation and disease. Curr. Opin. Genet Dev. 22, 191-203.

Hancks,D.C. and Kazazian,H.H., Jr. (2016). Roles for retrotransposon insertions in human disease. Mob. DNA 7, 9.

Harris,C.R., Normart,R., Yang,Q., Stevenson,E., Haffty,B.G., Ganesan,S., Cordon-Cardo,C., Levine,A.J., and Tang,L.H. (2010). Association of Nuclear Localization of a Long Interspersed Nuclear Element-1 Protein in Breast Tumors with Poor Prognostic Outcomes. Genes Cancer 1, 115-124.

Hata,K. and Sakaki,Y. (1997). Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene 189, 227-234.

Havecker,E.R., Gao,X., and Voytas,D.F. (2004). The diversity of LTR retrotransposons. Genome Biol. 5, 225.

Hedges,D.J. and Deininger,P.L. (2007). Inviting instability: Transposable elements, double-strand breaks, and the maintenance of genome integrity. Mutat. Res. 616, 46-59.

Helman,E., Lawrence,M.S., Stewart,C., Sougnez,C., Getz,G., and Meyerson,M. (2014). Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053-1063.

Heras,S.R., Macias,S., Plass,M., Fernandez,N., Cano,D., Eyras,E., Garcia-Perez,J.L., and Caceres,J.F. (2013). The Microprocessor controls the activity of mammalian retrotransposons. Nat. Struct. Mol. Biol. 20, 1173-1181.

Herzog,R.W., Hagstrom,J.N., Kung,S.H., Tai,S.J., Wilson,J.M., Fisher,K.J., and High,K.A. (1997). Stable gene transfer and expression of human blood coagulation factor IX after intramuscular injection of recombinant adeno-associated virus. Proc. Natl. Acad. Sci. U. S. A 94, 5804-5809.

Hockemeyer,D., Wang,H., Kiani,S., Lai,C.S., Gao,Q., Cassady,J.P., Cost,G.J., Zhang,L., Santiago,Y., Miller,J.C., Zeitler,B., Cherone,J.M., Meng,X., Hinkley,S.J., Rebar,E.J., Gregory,P.D., Urnov,F.D., and Jaenisch,R. (2011). Genetic engineering of human pluripotent cells using TALE nucleases. Nat. Biotechnol. 29, 731-734.

Holland,E.C., Hively,W.P., DePinho,R.A., and Varmus,H.E. (1998). A constitutively active epidermal growth factor receptor cooperates with disruption of G1 cell-cycle arrest pathways to induce glioma-like lesions in mice. Genes Dev. 12, 3675-3685.

Holmes,S.E., Dombroski,B.A., Krebs,C.M., Boehm,C.D., and Kazazian,H.H., Jr. (1994). A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion. Nat Genet 7, 143-148.

Holt,N., Wang,J., Kim,K., Friedman,G., Wang,X., Taupin,V., Crooks,G.M., Kohn,D.B., Gregory,P.D., Holmes,M.C., and Cannon,P.M. (2010). Human hematopoietic stem/progenitor cells modified by zinc-finger nucleases targeted to CCR5 control HIV-1 in vivo. Nat. Biotechnol. 28, 839-847.

Hossain,M.B., Vahter,M., Concha,G., and Broberg,K. (2012). Low-level environmental cadmium exposure is associated with DNA hypomethylation in Argentinean women. Environ. Health Perspect. 120, 879-884.

Hsu,K., Chang,D.Y., and Maraia,R.J. (1995). Human signal recognition particle (SRP) Alu-associated protein also binds Alu interspersed repeat sequence RNAs. Characterization of human SRP9. J Biol Chem 270, 10179-10186.

Hudson,W.H. and Ortlund,E.A. (2014). The structure, function and evolution of proteins that bind DNA and RNA. Nat. Rev. Mol. Cell Biol. 15, 749-760.

Huen,K., Calafat,A.M., Bradman,A., Yousefi,P., Eskenazi,B., and Holland,N. (2016). Maternal phthalate exposure during pregnancy is associated with DNA methylation of LINE-1 and Alu repetitive elements in Mexican-American children. Environ. Res. 148, 55-62.

Hulme,A.E., Bogerd,H.P., Cullen,B.R., and Moran,J.V. (2007). Selective inhibition of Alu retrotransposition by APOBEC3G. Gene 390, 199-205.

Im,D.S. and Muzyczka,N. (1989). Factors that bind to adeno-associated virus terminal repeats. J. Virol. 63, 3095-3104.

Im,D.S. and Muzyczka,N. (1990). The AAV origin binding protein Rep68 is an ATP-dependent site-specific endonuclease with DNA helicase activity. Cell 61, 447-457.

Indra,A.K., Warot,X., Brocard,J., Bornert,J.M., Xiao,J.H., Chambon,P., and Metzger,D. (1999). Temporally-controlled site-specific mutagenesis in the basal layer of the epidermis: comparison of the recombinase activity of the tamoxifen-inducible Cre-ER(T) and Cre-ER(T2) recombinases. Nucleic Acids Res. 27, 4324-4327.

Intarasunanont,P., Navasumrit,P., Waraprasit,S., Chaisatra,K., Suk,W.A., Mahidol,C., and Ruchirawat,M. (2012). Effects of arsenic exposure on DNA methylation in cord blood samples from newborn babies and in a human lymphoblast cell line. Environ. Health 11, 31.

Iskow,R.C., McCabe,M.T., Mills,R.E., Torene,S., Pittard,W.S., Neuwald,A.F., Van Meir,E.G., Vertino,P.M., and Devine,S.E. (2010). Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 141, 1253-1261.

Izsvak,Z., Khare,D., Behlke,J., Heinemann,U., Plasterk,R.H., and Ivics,Z. (2002). Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J. Biol. Chem. 277, 34581-34588.

Jacobson,M.R. and Pederson,T. (1998). Localization of signal recognition particle RNA in the nucleolus of mammalian cells. Proc. Natl. Acad. Sci. U. S. A 95, 7981-7986.

Jang,S.K., Krausslich,H.G., Nicklin,M.J., Duke,G.M., Palmenberg,A.C., and Wimmer,E. (1988). A segment of the 5' nontranslated region of encephalomyocarditis virus RNA directs internal entry of ribosomes during in vitro translation. J. Virol. 62, 2636-2643.

Jinek,M., Chylinski,K., Fonfara,I., Hauer,M., Doudna,J.A., and Charpentier,E. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821.

Jurka,J. (1997). Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl. Acad. Sci. U. S. A 94, 1872-1877.

Kaer,K. and Speek,M. (2013). Retroelements in human disease. Gene 518, 231-241.

Kaplitt,M.G., Feigin,A., Tang,C., Fitzsimons,H.L., Mattis,P., Lawlor,P.A., Bland,R.J., Young,D., Strybing,K., Eidelberg,D., and During,M.J. (2007). Safety and tolerability of gene therapy with an adeno-associated virus (AAV) borne GAD gene for Parkinson's disease: an open label, phase I trial. Lancet 369, 2097-2105.

Kaplitt,M.G., Leone,P., Samulski,R.J., Xiao,X., Pfaff,D.W., O'Malley,K.L., and During,M.J. (1994). Long-term gene expression and phenotypic correction using adeno-associated virus vectors in the mammalian brain. Nat. Genet. 8, 148-154.

Kazazian,H.H., Jr. (2004). Mobile elements: drivers of genome evolution. Science 303, 1626-1632.

Kazazian,H.H., Wong,C., Youssoufian,H., Scott,A.F., Phillips,D.G., and Antonarakis,S.E. (1988). Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164-166.

Kim,Y., Kweon,J., Kim,A., Chon,J.K., Yoo,J.Y., Kim,H.J., Kim,S., Lee,C., Jeong,E., Chung,E., Kim,D., Lee,M.S., Go,E.M., Song,H.J., Kim,H., Cho,N., Bang,D., Kim,S., and Kim,J.S. (2013). A library of TAL effector nucleases spanning the human genome. Nat. Biotechnol. 31, 251-258.

Kim,Y.G., Cha,J., and Chandrasegaran,S. (1996). Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U. S. A 93, 1156-1160.

Kim,Y.G. and Chandrasegaran,S. (1994). Chimeric restriction endonuclease. Proc. Natl. Acad. Sci. U. S. A 91, 883-887.

Kim,Y.G., Smith,J., Durgesha,M., and Chandrasegaran,S. (1998). Chimeric restriction enzyme: Gal4 fusion to FokI cleavage domain. Biol. Chem. 379, 489-495.

Kines,K.J., Sokolowski,M., deHaro,D.L., Christian,C.M., and Belancio,V.P. (2014). Potential for genomic instability associated with retrotranspositionally-incompetent L1 loci. Nucleic Acids Res. 42, 10488-10502.

Kloor,M., Sutter,C., Wentzensen,N., Cremer,F.W., Buckowitz,A., Keller,M., von Knebel,D.M., and Gebert,J. (2004). A large MSH2 Alu insertion mutation causes HNPCC in a German kindred. Hum. Genet. 115, 432-438.

Kloypan,C., Srisa-art,M., Mutirangura,A., and Boonla,C. (2015). LINE-1 hypomethylation induced by reactive oxygen species is mediated via depletion of S-adenosylmethionine. Cell Biochem. Funct. 33, 375-385.

Kogure,K., Urabe,M., Mizukami,H., Kume,A., Sato,Y., Monahan,J., and Ozawa,K. (2001). Targeted integration of foreign DNA into a defined locus on chromosome 19 in K562 cells using AAV-derived components. Int. J. Hematol. 73, 469-475.

Kolosha,V.O. and Martin,S.L. (1995). Polymorphic sequences encoding the first open reading frame protein from LINE-1 ribonucleoprotein particles. J. Biol. Chem. 270, 2868-2873.

Kolosha,V.O. and Martin,S.L. (2003). High-affinity, non-sequence-specific RNA binding by the open reading frame 1 (ORF1) protein from long interspersed nuclear element 1 (LINE-1). J. Biol. Chem. 278, 8112-8117.

Konermann,S., Brigham,M.D., Trevino,A., Hsu,P.D., Heidenreich,M., Cong,L., Platt,R.J., Scott,D.A., Church,G.M., and Zhang,F. (2013). Optical control of mammalian endogenous transcription and epigenetic states. Nature 500, 472-476.

Konermann,S., Brigham,M.D., Trevino,A.E., Joung,J., Abudayyeh,O.O., Barcena,C., Hsu,P.D., Habib,N., Gootenberg,J.S., Nishimasu,H., Nureki,O., and Zhang,F. (2015). Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583-588.

Konkel,M.K. and Batzer,M.A. (2010). A mobile threat to genome stability: The impact of non-LTR retrotransposons upon the human genome. Semin. Cancer Biol 20, 211-221.

Kotin,R.M., Siniscalco,M., Samulski,R.J., Zhu,X.D., Hunter,L., Laughlin,C.A., McLaughlin,S., Muzyczka,N., Rocchi,M., and Berns,K.I. (1990). Site-specific integration by adeno-associated virus. Proc. Natl. Acad. Sci. U. S. A 87, 2211-2215.

Kroutter,E.N., Belancio,V.P., Wagstaff,B.J., and Roy-Engel,A.M. (2009). The RNA polymerase dictates ORF1 requirement and timing of LINE and SINE retrotransposition. PLoS. Genet. 5, e1000458.

Kubo,Y., Okazaki,S., Anzai,T., and Fujiwara,H. (2001). Structural and phylogenetic analysis of TRAS, telomeric repeat-specific non-LTR retrotransposon families in Lepidopteran insects. Mol. Biol. Evol. 18, 848-857.

Kulpa,D.A. and Moran,J.V. (2005). Ribonucleoprotein particle formation is necessary but not sufficient for LINE-1 retrotransposition. Hum. Mol. Genet. 14, 3237-3248.

Kulpa,D.A. and Moran,J.V. (2006). Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat. Struct. Mol. Biol. 13, 655-660.

Labuda,D. and Striker,G. (1989). Sequence conservation in Alu evolution. Nucleic. Acids. Res. 17, 2477-2491.

Laity,J.H., Lee,B.M., and Wright,P.E. (2001). Zinc finger proteins: new insights into structural and functional diversity. Curr. Opin. Struct. Biol. 11, 39-46.

Lakso,M., Sauer,B., Mosinger,B., Jr., Lee,E.J., Manning,R.W., Yu,S.H., Mulder,K.L., and Westphal,H. (1992). Targeted oncogene activation by site-specific recombination in transgenic mice. Proc. Natl. Acad. Sci. U. S. A 89, 6232-6236.

Lallemand,Y., Luria,V., Haffner-Krausz,R., and Lonai,P. (1998). Maternally expressed PGK-Cre transgene as a tool for early and uniform activation of the Cre site-specific recombinase. Transgenic Res. 7, 105-112.

Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W., Funke,R., Gage,D., Harris,K., Heaford,A., Howland,J., Kann,L., Lehoczky,J., LeVine,R., McEwan,P., McKernan,K., Meldrim,J., Mesirov,J.P., Miranda,C., Morris,W., Naylor,J., Raymond,C., Rosetti,M., Santos,R., Sheridan,A., Sougnez,C., Stange-Thomann,N., Stojanovic,N., Subramanian,A., Wyman,D., Rogers,J., Sulston,J., Ainscough,R., Beck,S., Bentley,D., Burton,J., Clee,C., Carter,N., Coulson,A., Deadman,R., Deloukas,P., Dunham,A., Dunham,I., Durbin,R., French,L., Grafham,D., Gregory,S., Hubbard,T., Humphray,S., Hunt,A., Jones,M., Lloyd,C., McMurray,A., Matthews,L., Mercer,S., Milne,S., Mullikin,J.C., Mungall,A., Plumb,R., Ross,M., Shownkeen,R., Sims,S., Waterston,R.H., Wilson,R.K., Hillier,L.W., McPherson,J.D., Marra,M.A., Mardis,E.R., Fulton,L.A., Chinwalla,A.T., Pepin,K.H., Gish,W.R., Chissoe,S.L., Wendl,M.C., Delehaunty,K.D., Miner,T.L., Delehaunty,A., Kramer,J.B., Cook,L.L., Fulton,R.S., Johnson,D.L., Minx,P.J., Clifton,S.W., Hawkins,T., Branscomb,E., Predki,P., Richardson,P., Wenning,S., Slezak,T., Doggett,N., Cheng,J.F., Olsen,A., Lucas,S., Elkin,C., Uberbacher,E., Frazier,M., Gibbs,R.A., Muzny,D.M., Scherer,S.E., Bouck,J.B., Sodergren,E.J., Worley,K.C., Rives,C.M., Gorrell,J.H., Metzker,M.L., Naylor,S.L., Kucherlapati,R.S., Nelson,D.L., Weinstock,G.M., Sakaki,Y., Fujiyama,A., Hattori,M., Yada,T., Toyoda,A., Itoh,T., Kawagoe,C., Watanabe,H., Totoki,Y., Taylor,T., Weissenbach,J., Heilig,R., Saurin,W., Artiguenave,F., Brottier,P., Bruls,T., Pelletier,E., Robert,C., Wincker,P., Smith,D.R., Doucette-Stamm,L., Rubenfield,M., Weinstock,K., Lee,H.M., Dubois,J., Rosenthal,A., Platzer,M., Nyakatura,G., Taudien,S., Rump,A., Yang,H., Yu,J., Wang,J., Huang,G., Gu,J., Hood,L., Rowen,L., Madan,A., Qin,S., Davis,R.W., Federspiel,N.A., Abola,A.P., Proctor,M.J.,
Myers,R.M., Schmutz,J., Dickson,M., Grimwood,J., Cox,D.R., Olson,M.V., Kaul,R., Raymond,C., Shimizu,N., Kawasaki,K., Minoshima,S., Evans,G.A., Athanasiou,M., Schultz,R., Roe,B.A., Chen,F., Pan,H., Ramser,J., Lehrach,H., Reinhardt,R., McCombie,W.R., de la,B.M., Dedhia,N., Blocker,H., Hornischer,K., Nordsiek,G., Agarwala,R., Aravind,L., Bailey,J.A., Bateman,A., Batzoglou,S., Birney,E., Bork,P., Brown,D.G., Burge,C.B., Cerutti,L., Chen,H.C., Church,D., Clamp,M., Copley,R.R., Doerks,T., Eddy,S.R., Eichler,E.E., Furey,T.S., Galagan,J., Gilbert,J.G., Harmon,C., Hayashizaki,Y., Haussler,D., Hermjakob,H., Hokamp,K., Jang,W., Johnson,L.S., Jones,T.A., Kasif,S., Kaspryzk,A., Kennedy,S., Kent,W.J., Kitts,P., Koonin,E.V., Korf,I., Kulp,D., Lancet,D., Lowe,T.M., McLysaght,A., Mikkelsen,T., Moran,J.V., Mulder,N., Pollara,V.J., Ponting,C.P., Schuler,G., Schultz,J., Slater,G., Smit,A.F., Stupka,E., Szustakowki,J., Thierry-Mieg,D., Wagner,L., Wallis,J., Wheeler,R., Williams,A., Wolf,Y.I., Wolfe,K.H., Yang,S.P., Yeh,R.F., Collins,F., Guyer,M.S., Peterson,J., Felsenfeld,A., Wetterstrand,K.A., Patrinos,A., and Morgan,M.J. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.

Lee,L. and Sadowski,P.D. (2003). Identification of Cre residues involved in synapsis, isomerization, and catalysis. J. Biol. Chem. 278, 36905-36915.

Leibold,D.M., Swergold,G.D., Singer,M.F., Thayer,R.E., Dombroski,B.A., and Fanning,T.G. (1990). Translation of LINE-1 DNA elements in vitro and in human cells. Proc. Natl. Acad. Sci. U. S. A 87, 6990-6994.

Levis,R.W., Ganesan,R., Houtchens,K., Tolar,L.A., and Sheen,F.M. (1993). Transposons in place of telomeric repeats at a Drosophila telomere. Cell 75, 1083-1093.

Li,L., Wu,L.P., and Chandrasegaran,S. (1992). Functional domains in Fok I restriction endonuclease. Proc. Natl. Acad. Sci. U. S. A 89, 4275-4279.

Li,P.W., Li,J., Timmerman,S.L., Krushel,L.A., and Martin,S.L. (2006). The dicistronic RNA from the mouse LINE-1 retrotransposon contains an internal ribosome entry site upstream of each ORF: implications for retrotransposition. Nucleic Acids Res. 34, 853-864.

Li,X., Scaringe,W.A., Hill,K.A., Roberts,S., Mengos,A., Careri,D., Pinto,M.T., Kasper,C.K., and Sommer,S.S. (2001). Frequency of recent retrotransposition events in the human factor IX gene. Hum. Mutat. 17, 511-519.

Li,Y., Moore,R., Guinn,M., and Bleris,L. (2012). Transcription activator-like effector hybrids for conditional control and rewiring of chromosomal transgene expression. Sci. Rep. 2, 897.

Linden,R.M., Winocour,E., and Berns,K.I. (1996). The recombination signals for adeno-associated virus site-specific integration. Proc. Natl. Acad. Sci. U. S. A 93, 7966-7972.

Lindtner,S., Felber,B.K., and Kjems,J. (2002). An element in the 3' untranslated region of human LINE-1 retrotransposon mRNA binds NXF1(TAP) and can function as a nuclear export element. RNA. 8, 345-356.

Liu,W.M., Maraia,R.J., Rubin,C.M., and Schmid,C.W. (1994). Alu transcripts: cytoplasmic localisation and regulation by DNA methylation. Nucleic. Acids. Res. 22, 1087-1095.

Liu,W.M. and Schmid,C.W. (1993). Proposed roles for DNA methylation in Alu transcriptional repression and mutational inactivation. Nucleic. Acids. Res. 21, 1351-1359.

Luan,D.D. and Eickbush,T.H. (1995). RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell Biol. 15, 3882-3891.

Luan,D.D., Korman,M.H., Jakubczak,J.L., and Eickbush,T.H. (1993). Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595-605.

Luo,Y., Batalao,A., Zhou,H., and Zhu,L. (1997). Mammalian two-hybrid system: a complementary approach to the yeast two-hybrid system. Biotechniques 22, 350-352.

Luzhna,L., Ilnytskyy,Y., and Kovalchuk,O. (2015). Mobilization of LINE-1 in irradiated mammary gland tissue may potentially contribute to low dose radiation-induced genomic instability. Genes Cancer 6, 71-81.

Maeder,M.L., Linder,S.J., Cascio,V.M., Fu,Y., Ho,Q.H., and Joung,J.K. (2013). CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977-979.

Maeder,M.L., Thibodeau-Beganny,S., Osiak,A., Wright,D.A., Anthony,R.M., Eichtinger,M., Jiang,T., Foley,J.E., Winfrey,R.J., Townsend,J.A., Unger-Wallace,E., Sander,J.D., Muller-Lerch,F., Fu,F., Pearlberg,J., Gobel,C., Dassie,J.P., Pruett-Miller,S.M., Porteus,M.H., Sgroi,D.C., Iafrate,A.J., Dobbs,D., McCray,P.B., Jr., Cathomen,T., Voytas,D.F., and Joung,J.K. (2008). Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol. Cell 31, 294-301.

Mak,A.N., Bradley,P., Cernadas,R.A., Bogdanove,A.J., and Stoddard,B.L. (2012). The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716-719.

Makarova,K.S., Grishin,N.V., Shabalina,S.A., Wolf,Y.I., and Koonin,E.V. (2006). A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct. 1, 7.

Mali,P., Aach,J., Stranges,P.B., Esvelt,K.M., Moosburner,M., Kosuri,S., Yang,L., and Church,G.M. (2013). CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833-838.

Martin,S.L., Branciforte,D., Keller,D., and Bain,D.L. (2003). Trimeric structure for an essential protein in L1 retrotransposition. Proc. Natl. Acad. Sci. U. S. A 100, 13815-13820.

Martin,S.L. and Bushman,F.D. (2001). Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol. Cell Biol. 21, 467-475.

Martin,S.L., Cruceanu,M., Branciforte,D., Wai-Lun,L.P., Kwok,S.C., Hodges,R.S., and Williams,M.C. (2005). LINE-1 retrotransposition requires the nucleic acid chaperone activity of the ORF1 protein. J Mol. Biol. 348, 549-561.

Martin,S.L., Li,J., and Weisz,J.A. (2000). Deletion analysis defines distinct functional domains for protein-protein and nucleic acid interactions in the ORF1 protein of mouse LINE-1. J. Mol. Biol. 304, 11-20.

Marton,I., Zuker,A., Shklarman,E., Zeevi,V., Tovkach,A., Roffe,S., Ovadis,M., Tzfira,T., and Vainstein,A. (2010). Nontransgenic genome modification in plant cells. Plant Physiol 154, 1079-1087.

Mashimo,T., Takizawa,A., Voigt,B., Yoshimi,K., Hiai,H., Kuramoto,T., and Serikawa,T. (2010). Generation of knockout rats with X-linked severe combined immunodeficiency (X-SCID) using zinc-finger nucleases. PLoS. ONE. 5, e8870.

Matera,A.G., Hellmann,U., and Schmid,C.W. (1990). A transpositionally and transcriptionally competent Alu subfamily. Mol. Cell. Biol. 10, 5424-5432.

Mathias,S.L., Scott,A.F., Kazazian,H.H., Jr., Boeke,J.D., and Gabriel,A. (1991). Reverse transcriptase encoded by a human transposable element. Science 254, 1808-1810.

Matlik,K., Redik,K., and Speek,M. (2006). L1 antisense promoter drives tissue-specific transcription of human genes. J Biomed. Biotechnol. 2006, 71753.

Meng,X., Noyes,M.B., Zhu,L.J., Lawson,N.D., and Wolfe,S.A. (2008). Targeted gene inactivation in zebrafish using engineered zinc-finger nucleases. Nat. Biotechnol. 26, 695-701.

Metzger,D. and Chambon,P. (2001). Site- and time-specific gene targeting in the mouse. Methods 24, 71-80.

Meyer,M., de Angelis,M.H., Wurst,W., and Kuhn,R. (2010). Gene targeting by homologous recombination in mouse zygotes mediated by zinc-finger nucleases. Proc. Natl. Acad. Sci. U. S. A 107, 15022-15026.

Miglino,N., Roth,M., Lardinois,D., Sadowski,C., Tamm,M., and Borger,P. (2012). Cigarette smoke inhibits lung fibroblast proliferation by translational mechanisms. Eur. Respir. J. 39, 705-711.

Miki,Y., Katagiri,T., Kasumi,F., Yoshimoto,T., and Nakamura,Y. (1996). Mutation analysis in the BRCA2 gene in primary breast cancers. Nat. Genet. 13, 245-247.

Miki,Y., Nishisho,I., Horii,A., Miyoshi,Y., Utsunomiya,J., Kinzler,K.W., Vogelstein,B., and Nakamura,Y. (1992). Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643-645.

Mikl,M.C., Watt,I.N., Lu,M., Reik,W., Davies,S.L., Neuberger,M.S., and Rada,C. (2005). Mice deficient in APOBEC2 and APOBEC3. Mol. Cell Biol. 25, 7270-7277.

Miller,J., McLachlan,A.D., and Klug,A. (1985). Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. Embo J. 4, 1609-1614.

Miller,J.C., Holmes,M.C., Wang,J., Guschin,D.Y., Lee,Y.L., Rupniewski,I., Beausejour,C.M., Waite,A.J., Wang,N.S., Kim,K.A., Gregory,P.D., Pabo,C.O., and Rebar,E.J. (2007). An improved zinc-finger nuclease architecture for highly specific genome editing. Nat. Biotechnol. 25, 778-785.

Minakami,R., Kurose,K., Etoh,K., Furuhata,Y., Hattori,M., and Sakaki,Y. (1992). Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucleic Acids Res. 20, 3139-3145.

Mitchell,M., Gillis,A., Futahashi,M., Fujiwara,H., and Skordalakes,E. (2010). Structural basis for telomerase catalytic subunit TERT binding to RNA template and telomeric DNA. Nat. Struct. Mol. Biol. 17, 513-518.

Mol,C.D., Kuo,C.F., Thayer,M.M., Cunningham,R.P., and Tainer,J.A. (1995). Structure and function of the multifunctional DNA-repair enzyme exonuclease III. Nature 374, 381-386.

Moqtaderi,Z., Wang,J., Raha,D., White,R.J., Snyder,M., Weng,Z., and Struhl,K. (2010). Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells. Nat Struct Mol Biol 17, 635-640.

Moran,J.V., DeBerardinis,R.J., and Kazazian,H.H., Jr. (1999). Exon shuffling by L1 retrotransposition. Science 283, 1530-1534.

Moran,J.V., Holmes,S.E., Naas,T.P., DeBerardinis,R.J., Boeke,J.D., and Kazazian,H.H., Jr. (1996). High frequency retrotransposition in cultured mammalian cells. Cell 87, 917-927.

Moscou,M.J. and Bogdanove,A.J. (2009). A simple cipher governs DNA recognition by TAL effectors. Science 326, 1501.

Moszczynska,A., Flack,A., Qiu,P., Muotri,A.R., and Killinger,B.A. (2015). Neurotoxic Methamphetamine Doses Increase LINE-1 Expression in the Neurogenic Zones of the Adult Rat Brain. Sci. Rep. 5, 14356.

Muckenfuss,H., Hamdorf,M., Held,U., Perkovic,M., Lower,J., Cichutek,K., Flory,E., Schumann,G.G., and Munk,C. (2006). APOBEC3 proteins inhibit human LINE-1 retrotransposition. J. Biol. Chem.

Muddashetty,R., Khanam,T., Kondrashov,A., Bundman,M., Iacoangeli,A., Kremerskothen,J., Duning,K., Barnekow,A., Huttenhofer,A., Tiedge,H., and Brosius,J. (2002). Poly(A)-binding Protein is Associated with Neuronal BC1 and BC200 Ribonucleoprotein Particles. J. Mol. Biol. 321, 433-445.

Munoz-Lopez,M. and Garcia-Perez,J.L. (2010). DNA transposons: nature and applications in genomics. Curr. Genomics 11, 115-128.

Murray,S.A., Eppig,J.T., Smedley,D., Simpson,E.M., and Rosenthal,N. (2012). Beyond knockouts: cre resources for conditional mutagenesis. Mamm. Genome 23, 587-599.

Mussolino,C., Morbitzer,R., Lutge,F., Dannemann,N., Lahaye,T., and Cathomen,T. (2011). A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res. 39, 9283-9293.

Myers,J.S., Vincent,B.J., Udall,H., Watkins,W.S., Morrish,T.A., Kilroy,G.E., Swergold,G.D., Henke,J., Henke,L., Moran,J.V., Jorde,L.B., and Batzer,M.A. (2002). A comprehensive analysis of recently integrated human Ta L1 elements. Am. J. Hum. Genet. 71, 312-326.

Nagy,A. (2000). Cre recombinase: the universal reagent for genome tailoring. Genesis. 26, 99-109.

Narita,N., Nishio,H., Kitoh,Y., Ishikawa,Y., Ishikawa,Y., Minami,R., Nakamura,H., and Matsuo,M. (1993). Insertion of a 5' truncated L1 element into the 3' end of exon 44 of the dystrophin gene resulted in skipping of the exon during splicing in a case of Duchenne muscular dystrophy. J. Clin. Invest 91, 1862-1867.

Ni,T.H., Zhou,X., McCarty,D.M., Zolotukhin,I., and Muzyczka,N. (1994). In vitro replication of adeno-associated virus DNA. J. Virol. 68, 1128-1138.

Ochiai,H., Fujita,K., Suzuki,K., Nishikawa,M., Shibata,T., Sakamoto,N., and Yamamoto,T. (2010). Targeted mutagenesis in the sea urchin embryo using zinc-finger nucleases. Genes Cells 15, 875-885.

Okazaki,S., Ishikawa,H., and Fujiwara,H. (1995). Structural analysis of TRAS1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol. Cell Biol. 15, 4545-4552.

Okudaira,N., Ishizaka,Y., and Nishio,H. (2014). Retrotransposition of long interspersed element 1 induced by methamphetamine or cocaine. J. Biol. Chem. 289, 25476-25485.

Okudaira,N., Ishizaka,Y., Nishio,H., and Sakagami,H. (2016). Morphine and Fentanyl Citrate Induce Retrotransposition of Long Interspersed Element-1. In Vivo 30, 113-118.

Oler,A.J., Alla,R.K., Roberts,D.N., Wong,A., Hollenhorst,P.C., Chandler,K.J., Cassiday,P.A., Nelson,C.A., Hagedorn,C.H., Graves,B.J., and Cairns,B.R. (2010). Human RNA polymerase III transcriptomes and relationships to Pol II promoter chromatin and enhancer-binding factors. Nat Struct Mol Biol 17, 620-628.

Oler,A.J., Traina-Dorge,S., Derbes,R.S., Canella,D., Cairns,B.R., and Roy-Engel,A.M. (2012). Alu expression in human cell lines and their retrotranspositional potential. Mob. DNA 3, 11.

Orioli,A., Pascali,C., Quartararo,J., Diebel,K.W., Praz,V., Romascano,D., Percudani,R., van Dyk,L.F., Hernandez,N., Teichmann,M., and Dieci,G. (2011). Widespread occurrence of non-canonical transcription termination by human RNA polymerase III. Nucleic Acids Res. 39, 5499-5512.

Osanai-Futahashi,M., Suetsugu,Y., Mita,K., and Fujiwara,H. (2008). Genome-wide screening and characterization of transposable elements and their distribution analysis in the silkworm, Bombyx mori. Insect Biochem. Mol. Biol. 38, 1046-1057.

Ostertag,E.M. and Kazazian,H.H., Jr. (2001). Biology of mammalian L1 retrotransposons. Annu. Rev. Genet. 35, 501-538.

Ostertag,E.M. and Kazazian,H.H., Jr. (2001). Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res. 11, 2059-2065.

Ostertag,E.M., Prak,E.T., DeBerardinis,R.J., Moran,J.V., and Kazazian,H.H., Jr. (2000). Determination of L1 retrotransposition kinetics in cultured cells. Nucleic Acids Res 28, 1418-1423.

Owens,R.A., Weitzman,M.D., Kyostio,S.R., and Carter,B.J. (1993). Identification of a DNA-binding domain in the amino terminus of adeno-associated virus Rep proteins. J. Virol. 67, 997-1005.

Pardue,M.L. and DeBaryshe,P.G. (2003). Retrotransposons provide an evolutionarily robust non-telomerase mechanism to maintain telomeres. Annu. Rev. Genet. 37, 485-511.

Pardue,M.L. and DeBaryshe,P.G. (2011). Retrotransposons that maintain chromosome ends. Proc. Natl. Acad. Sci. U. S. A 108, 20317-20324.

Paulson,K.E. and Schmid,C.W. (1986). Transcriptional inactivity of Alu repeats in HeLa cells. Nucleic. Acids. Res. 14, 6145-6158.

Pavletich,N.P. and Pabo,C.O. (1991). Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science 252, 809-817.

Pelletier,J. and Sonenberg,N. (1989). Internal binding of eucaryotic ribosomes on poliovirus RNA: translation in HeLa cell extracts. J. Virol. 63, 441-444.

Peng,H., Begg,G.E., Harper,S.L., Friedman,J.R., Speicher,D.W., and Rauscher,F.J., III (2000). Biochemical analysis of the Kruppel-associated box (KRAB) transcriptional repression domain. J. Biol. Chem. 275, 18000-18010.

Perepelitsa-Belancio,V. and Deininger,P.L. (2003). RNA truncation by premature polyadenylation attenuates human mobile element activity. Nat Genet 35, 363-366.

Perez,E.E., Wang,J., Miller,J.C., Jouvenot,Y., Kim,K.A., Liu,O., Wang,N., Lee,G., Bartsevich,V.V., Lee,Y.L., Guschin,D.Y., Rupniewski,I., Waite,A.J., Carpenito,C., Carroll,R.G., Orange,J.S., Urnov,F.D., Rebar,E.J., Ando,D., Gregory,P.D., Riley,J.L., Holmes,M.C., and June,C.H. (2008). Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat. Biotechnol. 26, 808-816.

Perez-Pinera,P., Kocak,D.D., Vockley,C.M., Adler,A.F., Kabadi,A.M., Polstein,L.R., Thakore,P.I., Glass,K.A., Ousterout,D.G., Leong,K.W., Guilak,F., Crawford,G.E., Reddy,T.E., and Gersbach,C.A. (2013). RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat. Methods 10, 973-976.

Pilsner,J.R., Hu,H., Ettinger,A., Sanchez,B.N., Wright,R.O., Cantonwine,D., Lazarus,A., Lamadrid-Figueroa,H., Mercado-Garcia,A., Tellez-Rojo,M.M., and Hernandez-Avila,M. (2009). Influence of prenatal lead exposure on genomic methylation of cord blood DNA. Environ. Health Perspect. 117, 1466-1471.

Piskareva,O., Ernst,C., Higgins,N., and Schmatchenko,V. (2013). The carboxy-terminal segment of the human LINE-1 ORF2 protein is involved in RNA binding. FEBS Open. Bio 3, 433-437.

Pitkanen,E., Cajuso,T., Katainen,R., Kaasinen,E., Valimaki,N., Palin,K., Taipale,J., Aaltonen,L.A., and Kilpivaara,O. (2014). Frequent L1 retrotranspositions originating from TTC28 in colorectal cancer. Oncotarget. 5, 853-859.

Qi,L.S., Larson,M.H., Gilbert,L.A., Doudna,J.A., Weissman,J.S., Arkin,A.P., and Lim,W.A. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173-1183.

Quinlan,A.R. and Hall,I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841-842.

Raffa,G.D., Cenci,G., Ciapponi,L., and Gatti,M. (2013). Organization and Evolution of Drosophila Terminin: Similarities and Differences between Drosophila and Human Telomeres. Front Oncol. 3, 112.

Raiz,J., Damert,A., Chira,S., Held,U., Klawitter,S., Hamdorf,M., Lower,J., Stratling,W.H., Lower,R., and Schumann,G.G. (2011). The non-autonomous retrotransposon SVA is trans-mobilized by the human LINE-1 protein machinery. Nucleic Acids Res.

Recchia,A., Perani,L., Sartori,D., Olgiati,C., and Mavilio,F. (2004). Site-specific integration of functional transgenes into the human genome by adeno/AAV hybrid vectors. Mol. Ther. 10, 660-670.

Repanas,K., Zingler,N., Layer,L.E., Schumann,G.G., Perrakis,A., and Weichenrieder,O. (2007). Determinants for DNA target structure selectivity of the human LINE-1 retrotransposon endonuclease. Nucleic Acids Res. 35, 4914-4926.

Rodic,N., Sharma,R., Sharma,R., Zampella,J., Dai,L., Taylor,M.S., Hruban,R.H., Iacobuzio-Donahue,C.A., Maitra,A., Torbenson,M.S., Goggins,M., Shih,I., Duffield,A.S., Montgomery,E.A., Gabrielson,E., Netto,G.J., Lotan,T.L., De Marzo,A.M., Westra,W., Binder,Z.A., Orr,B.A., Gallia,G.L., Eberhart,C.G., Boeke,J.D., Harris,C.R., and Burns,K.H. (2014). Long interspersed element-1 protein expression is a hallmark of many human cancers. Am. J. Pathol. 184, 1280-1286.

Rodic,N., Steranka,J.P., Makohon-Moore,A., Moyer,A., Shen,P., Sharma,R., Kohutek,Z.A., Huang,C.R., Ahn,D., Mita,P., Taylor,M.S., Barker,N.J., Hruban,R.H., Iacobuzio-Donahue,C.A., Boeke,J.D., and Burns,K.H. (2015). Retrotransposon insertions in the clonal evolution of pancreatic ductal adenocarcinoma. Nat. Med. 21, 1060-1064.

Rodriguez-Martin,C., Cidre,F., Fernandez-Teijeiro,A., Gomez-Mariano,G., de la Vega,L., Ramos,P., Zaballos,A., Monzon,S., and Alonso,J. (2016). Familial retinoblastoma due to intronic LINE-1 insertion causes aberrant and noncanonical mRNA splicing of the RB1 gene. J. Hum. Genet. 61, 463-466.

Rowe,S.M., Coughlan,S.J., McKenna,N.J., Garrett,E., Kieback,D.G., Carney,D.N., and Headon,D.R. (1995). Ovarian carcinoma-associated TaqI restriction fragment length polymorphism in intron G of the progesterone receptor gene is due to an Alu sequence insertion. Cancer Res. 55, 2743-2745.

Roy-Engel,A.M. (2012). LINEs, SINEs and other retroelements: do birds of a feather flock together? Front Biosci. 17, 1345-1361.

Roy-Engel,A.M., Carroll,M.L., Vogel,E., Garber,R.K., Nguyen,S.V., Salem,A.H., Batzer,M.A., and Deininger,P.L. (2001). Alu insertion polymorphisms for the study of human genomic diversity. Genetics 159, 279-290.

Roy-Engel,A.M., Salem,A.H., Oyeniran,O.O., Deininger,L., Hedges,D.J., Kilroy,G.E., Batzer,M.A., and Deininger,P.L. (2002). Active Alu element "A-tails": size does matter. Genome Res. 12, 1333-1344.

Russell,D.W., Miller,A.D., and Alexander,I.E. (1994). Adeno-associated virus vectors preferentially transduce cells in S phase. Proc. Natl. Acad. Sci. U. S. A 91, 8915-8919.

Sahakyan,A.B., Murat,P., Mayer,C., and Balasubramanian,S. (2017). G-quadruplex structures within the 3' UTR of LINE-1 elements stimulate retrotransposition. Nat. Struct. Mol. Biol. 24, 243-247.

Sakai,K. and Miyazaki,J. (1997). A transgenic mouse line that retains Cre recombinase activity in mature oocytes irrespective of the cre transgene transmission. Biochem. Biophys. Res. Commun. 237, 318-324.

Sander,J.D. and Joung,J.K. (2014). CRISPR-Cas systems for editing, regulating and targeting genomes. Nat. Biotechnol. 32, 347-355.

Sanjana,N.E., Cong,L., Zhou,Y., Cunniff,M.M., Feng,G., and Zhang,F. (2012). A transcription activator-like effector toolbox for genome engineering. Nat. Protoc. 7, 171-192.

Santourlidis,S., Florl,A., Ackermann,R., Wirtz,H.C., and Schulz,W.A. (1999). High frequency of alterations in DNA methylation in adenocarcinoma of the prostate. Prostate 39, 166-174.

Sarrowa,J., Chang,D.Y., and Maraia,R.J. (1997). The decline in human Alu retroposition was accompanied by an asymmetric decrease in SRP9/14 binding to dimeric Alu RNA and increased expression of small cytoplasmic Alu RNA. Mol. Cell Biol. 17, 1144-1151.

Sauer,B. and Henderson,N. (1988). Site-specific DNA recombination in mammalian cells by the Cre recombinase of bacteriophage P1. Proc. Natl. Acad. Sci. U. S. A 85, 5166-5170.

Sauer,B. and Henderson,N. (1989). Cre-stimulated recombination at loxP-containing DNA sequences placed into the mammalian genome. Nucleic Acids Res. 17, 147-161.

Sauer,B. and Henderson,N. (1990). Targeted insertion of exogenous DNA into the eukaryotic genome by the Cre recombinase. New Biol. 2, 441-449.

Schwahn,U., Lenzner,S., Dong,J., Feil,S., Hinzmann,B., van Duijnhoven,G., Kirschner,R., Hemberger,M., Bergen,A.A., Rosenberg,T., Pinckers,A.J., Fundele,R., Rosenthal,A., Cremers,F.P., Ropers,H.H., and Berger,W. (1998). Positional cloning of the gene for X-linked retinitis pigmentosa 2. Nat. Genet. 19, 327-332.

Semprini,S., Troup,T.J., Kotelevtseva,N., King,K., Davis,J.R., Mullins,L.J., Chapman,K.E., Dunbar,D.R., and Mullins,J.J. (2007). Cryptic loxP sites in mammalian genomes: genome-wide distribution and relevance for the efficiency of BAC/PAC recombineering techniques. Nucleic Acids Res. 35, 1402-1410.

Servant,G., Streva,V.A., Derbes,R.S., Wijetunge,M.I., Neeland,M., White,T.B., Belancio,V.P., Roy-Engel,A.M., and Deininger,P.L. (2017). The Nucleotide Excision Repair Pathway Limits L1 Retrotransposition. Genetics 205, 139-153.

Servomaa,K. and Rytomaa,T. (1988). Suicidal death of rat chloroleukaemia cells by activation of the long interspersed repetitive DNA element (L1Rn). Cell Tissue Kinet. 21, 33-43.

Servomaa,K. and Rytomaa,T. (1990). UV light and ionizing radiations cause programmed death of rat chloroleukaemia cells by inducing retropositions of a mobile DNA element (L1Rn). Int. J. Radiat. Biol. 57, 331-343.

Shafferman,A. and Helinski,D.R. (1983). Structural properties of the beta origin of replication of plasmid R6K. J. Biol. Chem. 258, 4083-4090.

Shaikh,T.H., Roy,A.M., Kim,J., Batzer,M.A., and Deininger,P.L. (1997). cDNAs derived from primary and small cytoplasmic Alu (scAlu) transcripts. J Mol. Biol. 271, 222-234.

Shen,M.R., Batzer,M.A., and Deininger,P.L. (1991). Evolution of the master Alu gene(s). J Mol. Evol. 33, 311-320.

Shivram,H., Cawley,D., and Christensen,S.M. (2011). Targeting novel sites: The N-terminal DNA binding domain of non-LTR retrotransposons is an adaptable module that is implicated in changing site specificities. Mob. Genet. Elements. 1, 169-178.

Shukla,R., Upton,K.R., Munoz-Lopez,M., Gerhardt,D.J., Fisher,M.E., Nguyen,T., Brennan,P.M., Baillie,J.K., Collino,A., Ghisletti,S., Sinha,S., Iannelli,F., Radaelli,E., Dos,S.A., Rapoud,D., Guettier,C., Samuel,D., Natoli,G., Carninci,P., Ciccarelli,F.D., Garcia-Perez,J.L., Faivre,J., and Faulkner,G.J. (2013). Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell 153, 101-111.

Silva-Sousa,R. and Casacuberta,E. (2013). The JIL-1 kinase affects telomere expression in the different telomere domains of Drosophila. PLoS. ONE. 8, e81543.

Slagel,V. and Deininger,P. (1989). In vivo transcription of a cloned prosimian primate SINE sequence. Nucleic Acids Res. 17, 8669-8682.

Smit,A.F. (1996). The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6, 743-748.

Smith,J., Bibikova,M., Whitby,F.G., Reddy,A.R., Chandrasegaran,S., and Carroll,D. (2000). Requirements for double-strand cleavage by chimeric restriction enzymes with zinc finger DNA-recognition domains. Nucleic Acids Res. 28, 3361-3369.

Smith,R.H., Spano,A.J., and Kotin,R.M. (1997). The Rep78 gene product of adeno-associated virus (AAV) self-associates to form a hexameric complex in the presence of AAV ori sequences. J. Virol. 71, 4461-4471.

Sokolowski,M., DeFreece,C.B., Servant,G., Kines,K.J., deHaro,D.L., and Belancio,V.P. (2014). Development of a monoclonal antibody specific to the endonuclease domain of the human LINE-1 ORF2 protein. Mob. DNA 5, 29.

Sokolowski,M., deHaro,D., Christian,C.M., Kines,K.J., and Belancio,V.P. (2013). Characterization of L1 ORF1p self-interaction and cellular localization using a mammalian two-hybrid system. PLoS. ONE. 8, e82021.

Solyom,S., Ewing,A.D., Rahrmann,E.P., Doucet,T.T., Nelson,H.H., Burns,M.B., Harris,R.S., Sigmon,D.F., Casella,A., Erlanger,B., Wheelan,S., Upton,K.R., Shukla,R., Faulkner,G.J., Largaespada,D.A., and Kazazian,H.H. (2012). Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res.

Speek,M. (2001). Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol. Cell Biol. 21, 1973-1985.

Stalker,D.M., Kolter,R., and Helinski,D.R. (1982). Plasmid R6K DNA replication: I. Complete nucleotide sequence of an autonomously replicating segment. J. Mol. Biol. 161, 33-43.

Startek,M., Szafranski,P., Gambin,T., Campbell,I.M., Hixson,P., Shaw,C.A., Stankiewicz,P., and Gambin,A. (2015). Genome-wide analyses of LINE-LINE-mediated nonallelic homologous recombination. Nucleic Acids Res. 43, 2188-2198.

Stenglein,M.D. and Harris,R.S. (2006). APOBEC3B and APOBEC3F Inhibit L1 Retrotransposition by a DNA Deamination-independent Mechanism. J. Biol. Chem. 281, 16837-16841.

Stewart,C., Kural,D., Stromberg,M.P., Walker,J.A., Konkel,M.K., Stutz,A.M., Urban,A.E., Grubert,F., Lam,H.Y., Lee,W.P., Busby,M., Indap,A.R., Garrison,E., Huff,C., Xing,J., Snyder,M.P., Jorde,L.B., Batzer,M.A., Korbel,J.O., and Marth,G.T. (2011). A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genet 7, e1002236.

Strauß,A. and Lahaye,T. (2013). Zinc fingers, TAL effectors, or Cas9-based DNA binding proteins: what's best for targeting desired genome loci? Mol. Plant 6, 1384-1387.

Streva,V.A., Jordan,V.E., Linker,S., Hedges,D.J., Batzer,M.A., and Deininger,P.L. (2015). Sequencing, identification and mapping of primed L1 elements (SIMPLE) reveals significant variation in full length L1 elements between individuals. BMC. Genomics 16, 220.

Stribinskis,V. and Ramos,K.S. (2006). Activation of Human Long Interspersed Nuclear Element 1 Retrotransposition by Benzo(a)pyrene, an Ubiquitous Environmental Carcinogen. Cancer Res 66, 2616-2620.

Swergold,G.D. (1990). Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell Biol. 10, 6718-6729.

Symer,D.E., Connelly,C., Szak,S.T., Caputo,E.M., Cost,G.J., Parmigiani,G., and Boeke,J.D. (2002). Human L1 retrotransposition is associated with genetic instability in vivo. Cell 110, 327-338.

Takahashi,H. and Fujiwara,H. (2002). Transplantation of target site specificity by swapping the endonuclease domains of two LINEs. EMBO J 21, 408-417.

Takahashi,H., Okazaki,S., and Fujiwara,H. (1997). A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucleic Acids Res. 25, 1578-1584.

Takeda,K., Kaisho,T., Yoshida,N., Takeda,J., Kishimoto,T., and Akira,S. (1998). Stat3 activation is responsible for IL-6-dependent T cell proliferation through preventing apoptosis: generation and characterization of T cell-specific Stat3-deficient mice. J. Immunol. 161, 4652-4660.

Tan,W., Zhu,K., Segal,D.J., Barbas,C.F., III, and Chow,S.A. (2004). Fusion proteins consisting of human immunodeficiency virus type 1 integrase and the designed polydactyl zinc finger protein E2C direct integration of viral DNA into specific sites. J Virol. 78, 1301-1313.

Tanaka,A., Nakatani,Y., Hamada,N., Jinno-Oue,A., Shimizu,N., Wada,S., Funayama,T., Mori,T., Islam,S., Hoque,S.A., Shinagawa,M., Ohtsuki,T., Kobayashi,Y., and Hoshino,H. (2012). Ionising irradiation alters the dynamics of human long interspersed nuclear elements 1 (LINE1) retrotransposon. Mutagenesis 27, 599-607.

Tasic,B., Miyamichi,K., Hippenmeyer,S., Dani,V.S., Zeng,H., Joo,W., Zong,H., Chen-Tsai,Y., and Luo,L. (2012). Extensions of MADM (mosaic analysis with double markers) in mice. PLoS. ONE. 7, e33332.

Taylor,M.S., LaCava,J., Mita,P., Molloy,K.R., Huang,C.R., Li,D., Adney,E.M., Jiang,H., Burns,K.H., Chait,B.T., Rout,M.P., Boeke,J.D., and Dai,L. (2013). Affinity proteomics reveals human host factors implicated in discrete stages of LINE-1 retrotransposition. Cell 155, 1034-1048.

Teneng,I., Montoya-Durango,D.E., Quertermous,J.L., Lacy,M.E., and Ramos,K.S. (2011). Reactivation of L1 retrotransposon by benzo(a)pyrene involves complex genetic and epigenetic regulation. Epigenetics. 6, 355-367.

Terasaki,N., Goodier,J.L., Cheung,L.E., Wang,Y.J., Kajikawa,M., Kazazian,H.H., Jr., and Okada,N. (2013). In vitro screening for compounds that enhance human L1 mobilization. PLoS. ONE. 8, e74629.

Terzi,L., Pool,M.R., Dobberstein,B., and Strub,K. (2004). Signal recognition particle Alu domain occupies a defined site at the ribosomal subunit interface upon signal sequence recognition. Biochemistry 43, 107-117.

Teugels,E., De Brakeleer,S., Goelen,G., Lissens,W., Sermijn,E., and De Greve,J. (2005). De novo Alu element insertions targeted to a sequence common to the BRCA1 and BRCA2 genes. Hum. Mutat. 26, 284.

Thompson,B.K. and Christensen,S.M. (2011). Independently derived targeting of 28S rDNA by A- and D-clade R2 retrotransposons: Plasticity of integration mechanism. Mob. Genet Elements. 1, 29-37.

Tratschin,J.D., Miller,I.L., and Carter,B.J. (1984). Genetic analysis of adeno-associated virus: properties of deletion mutants constructed in vitro and evidence for an adeno-associated virus replication function. J. Virol. 51, 611-619.

Tsai,B.P., Wang,X., Huang,L., and Waterman,M.L. (2011). Quantitative profiling of in vivo-assembled RNA-protein complexes using a novel integrated proteomic approach. Mol. Cell Proteomics. 10, M110.

Tsien,J.Z., Chen,D.F., Gerber,D., Tom,C., Mercer,E.H., Anderson,D.J., Mayford,M., Kandel,E.R., and Tonegawa,S. (1996). Subregion- and cell type-restricted gene knockout in mouse brain. Cell 87, 1317-1326.

Ullu,E. and Weiner,A.M. (1985). Upstream sequences modulate the internal promoter of the human 7SL RNA gene. Nature 318, 371-374.

Van Duyne,G.D. (2015). Cre Recombinase. Microbiol. Spectr. 3, MDNA3-2014.

Voigt,K., Gogol-Doring,A., Miskey,C., Chen,W., Cathomen,T., Izsvak,Z., and Ivics,Z. (2012). Retargeting sleeping beauty transposon insertions by engineered zinc finger DNA-binding domains. Mol. Ther. 20, 1852-1862.

Vorce,R.L., Lee,B., and Howard,B.H. (1994). Methylation- and mutation-dependent stimulation of Alu transcription in vitro. Biochem. Biophys. Res. Commun. 203, 845-851.

Voziyanov,Y., Pathania,S., and Jayaram,M. (1999). A general model for site-specific recombination by the integrase family recombinases. Nucleic Acids Res. 27, 930-941.

Wagstaff,B.J., Barnerssoi,M., and Roy-Engel,A.M. (2011). Evolutionary conservation of the functional modularity of primate and murine LINE-1 elements. PLoS One. 6, e19672.

Wagstaff,B.J., Hedges,D.J., Derbes,R.S., Campos,S.R., Chiaromonte,F., Makova,K.D., and Roy-Engel,A.M. (2012). Rescuing Alu: Recovery of New Inserts Shows LINE-1 Preserves Alu Activity through A-Tail Expansion. PLoS Genet 8, e1002842.

Waldman,A.S. and Liskay,R.M. (1988). Dependence of intrachromosomal recombination in mammalian cells on uninterrupted homology. Mol. Cell Biol. 8, 5350-5357.

Wallace,N., Wagstaff,B.J., Deininger,P.L., and Roy-Engel,A.M. (2008a). LINE-1 ORF1 protein enhances Alu SINE retrotransposition. Gene 419, 1-6.

Wallace,N.A., Belancio,V.P., and Deininger,P.L. (2008b). L1 mobile element expression causes multiple types of toxicity. Gene 419, 75-81.

Walter,P. and Blobel,G. (1982). Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum. Nature 299, 691-698.

Walter,P., Gilmore,R., Muller,M., and Blobel,G. (1982). The protein translocation machinery of the endoplasmic reticulum. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 300, 225-228.

Ward,P., Urcelay,E., Kotin,R., Safer,B., and Berns,K.I. (1994). Adeno-associated virus DNA replication in vitro: activation by a maltose binding protein/Rep 68 fusion protein. J. Virol. 68, 6029-6037.

Waterman,M.J., Waterman,J.L., and Halazonetis,T.D. (1996). An engineered four-stranded coiled coil substitutes for the tetramerization domain of wild-type p53 and alleviates transdominant inhibition by tumor-derived p53 mutants. Cancer Res. 56, 158-163.

Wei,W., Gilbert,N., Ooi,S.L., Lawler,J.F., Ostertag,E.M., Kazazian,H.H., Boeke,J.D., and Moran,J.V. (2001). Human L1 retrotransposition: cis preference versus trans complementation. Mol. Cell Biol. 21, 1429-1439.

Weichenrieder,O., Repanas,K., and Perrakis,A. (2004). Crystal structure of the targeting endonuclease of the human LINE-1 retrotransposon. Structure. 12, 975-986.

Weiner,A., Deininger,P., and Efstratiadis,A. (1986). The Reverse Flow of Genetic Information: pseudogenes and transposable elements derived from nonviral cellular RNA. Annual Reviews of Biochemistry 55, 631-661.

Weis,L. and Reinberg,D. (1997). Accurate positioning of RNA polymerase II on a natural TATA-less promoter is independent of TATA-binding-protein-associated factors and initiator-binding proteins. Mol. Cell Biol. 17, 2973-2984.

Weisenberger,D.J., Campan,M., Long,T.I., Kim,M., Woods,C., Fiala,E., Ehrlich,M., and Laird,P.W. (2005). Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res. 33, 6823-6836.

Weisenberger,D.J. and Romano,L.J. (1999). Cytosine methylation in a CpG sequence leads to enhanced reactivity with Benzo[a]pyrene diol epoxide that correlates with a conformational change. J. Biol. Chem. 274, 23948-23955.

Weitzman,M.D., Kyostio,S.R., Kotin,R.M., and Owens,R.A. (1994). Adeno-associated virus (AAV) Rep proteins mediate complex formation between AAV DNA and its integration site in human DNA. Proc. Natl. Acad. Sci. U. S. A 91, 5808-5812.

Wessler,S.R. (2006). Transposable elements and the evolution of eukaryotic genomes. Proc. Natl. Acad. Sci. U. S. A 103, 17600-17601.

West,N., Roy-Engel,A., Imataka,H., Sonenberg,N., and Deininger,P. (2002). Shared Protein Components of SINE RNPs. J. Mol. Biol. 321, 423-432.

Williams,D.A. (2007). RAC reviews serious adverse event associated with AAV therapy trial. Mol. Ther. 15, 2053-2054.

Wimmer,K., Callens,T., Wernstedt,A., and Messiaen,L. (2011). The NF1 Gene Contains Hotspots for L1 Endonuclease-Dependent De Novo Insertion. PLoS Genet 7, e1002371.

Witherspoon,D.J., Xing,J., Zhang,Y., Watkins,W.S., Batzer,M.A., and Jorde,L.B. (2010). Mobile element scanning (ME-Scan) by targeted high-throughput sequencing. BMC. Genomics 11, 410.

Witherspoon,D.J., Zhang,Y., Xing,J., Watkins,W.S., Ha,H., Batzer,M.A., and Jorde,L.B. (2013). Mobile element scanning (ME-Scan) identifies thousands of novel Alu insertions in diverse human populations. Genome Res. 23, 1170-1181.

Wolfe,S.A., Nekludova,L., and Pabo,C.O. (2000). DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct. 29, 183-212.

Wonderling,R.S., Kyostio,S.R., and Owens,R.A. (1995). A maltose-binding protein/adeno-associated virus Rep68 fusion protein has DNA-RNA helicase and ATPase activities. J. Virol. 69, 3542-3548.

Woodcock,D.M., Lawler,C.B., Linsenmeyer,M.E., Doherty,J.P., and Warren,W.D. (1997). Asymmetric methylation in the hypermethylated CpG promoter region of the human L1 retrotransposon. J. Biol. Chem. 272, 7810-7816.

Xie,H., Wang,M., Bonaldo,M.F., Smith,C., Rajaram,V., Goldman,S., Tomita,T., and Soares,M.B. (2009). High-throughput sequence-based epigenomic analysis of Alu repeats in human cerebellum. Nucleic Acids Res. 37, 4331-4340.

Xie,Y., Rosser,J.M., Thompson,T.L., Boeke,J.D., and An,W. (2010). Characterization of L1 retrotransposition with high-throughput dual-luciferase assays. Nucleic Acids Res.

Yang,J., Malik,H.S., and Eickbush,T.H. (1999). Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. U. S. A 96, 7847-7852.

Yang,N., Zhang,L., Zhang,Y., and Kazazian,H.H., Jr. (2003). An important role for RUNX3 in human L1 transcription and retrotransposition. Nucleic Acids Res. 31, 4929-4940.

Young,J.J., Cherone,J.M., Doyon,Y., Ankoudinova,I., Faraji,F.M., Lee,A.H., Ngo,C., Guschin,D.Y., Paschon,D.E., Miller,J.C., Zhang,L., Rebar,E.J., Gregory,P.D., Urnov,F.D., Harland,R.M., and Zeitler,B. (2011). Efficient targeted gene disruption in the soma and germ line of the frog Xenopus tropicalis using engineered zinc-finger nucleases. Proc. Natl. Acad. Sci. U. S. A 108, 7052-7057.

Yulug,I.G., Yulug,A., and Fisher,E.M. (1995). The frequency and position of Alu repeats in cDNAs, as determined by database searching. Genomics 27, 544-548.

Zhang,F., Cong,L., Lodato,S., Kosuri,S., Church,G.M., and Arlotta,P. (2011). Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat. Biotechnol. 29, 149-153.

Zhang,L., Beaucher,M., Cheng,Y., and Rong,Y.S. (2014). Coordination of transposon expression with DNA replication in the targeting of telomeric retrotransposons in Drosophila. Embo J. 33, 1148-1158.

Appendix Table of Contents

Appendix Table 1. Chromosome locations of Alu inserts driven by LZ-ORF2 fusion protein ...... 205

Appendix Table 2. Chromosome locations of Alu inserts driven by TZ-ORF2 fusion protein ...... 209

Appendix Table 3. Chromosome locations of Alu inserts driven by C-Cre-ORF2 and N-Cre-ORF2 fusion proteins ...... 210

Appendix Table 4. Chromosome locations of Alu inserts driven by Tal-ORF2 fusion protein ...... 211

Appendix Table 5. Chromosome locations of Alu inserts driven by EGFP targeting zinc fingers ...... 213

Appendix Table 6. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 GHL targeting zinc fingers ...... 215

Appendix Table 7. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 FL4 targeting zinc fingers ...... 221

Appendix Table 8. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 HL4 targeting zinc fingers ...... 227

Appendix Table 9. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 GHL targeting zinc fingers ...... 231

Appendix Table 10. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 FL4 targeting zinc fingers ...... 236

Appendix Table 11. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 HL4 targeting zinc fingers ...... 239

Appendix Table 12. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 GHL targeting zinc finger ...... 245

Appendix Table 13. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 FL4 targeting zinc finger ...... 247

Appendix Table 14. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 HL4 targeting zinc finger ...... 248

Appendix Table 15. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 GHL targeting zinc finger ...... 250

Appendix Table 16. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 FL4 targeting zinc finger ...... 252

Appendix Table 17. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 HL4 targeting zinc finger ...... 253

Appendix Table 18. Chromosome locations of Alu inserts driven by ORF2 endonuclease deficient constructs fused to Cas9 or the Cas9 nickase ...... 256

Appendix Table 19. Chromosome locations of Alu inserts driven by dCas9-ORF2 constructs supplemented with a targeting gRNA ...... 257

Appendix Table 20. Chromosome locations of Alu inserts driven by MS2-ORF2 constructs supplemented with a targeting gRNA and dCas9 ...... 262

APPENDIX

Appendix Table 1. Chromosome locations of Alu inserts driven by LZ-ORF2 fusion protein

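| Driver | Clone | Chr | Position | Target site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| LZ-ORF2 | 1 | 1 | 1329515 | No | | |
| LZ-ORF2 | 2 | 1 | 8937306 | No | | |
| LZ-ORF2 | 3 | 1 | 21737472 | No | | |
| LZ-ORF2 | 4 | 1 | 54191570 | No | | |
| LZ-ORF2 | 5 | 1 | 65860631 | No | | |
| LZ-ORF2 | 6 | 1 | 66642147 | No | | |
| LZ-ORF2 | 7 | 1 | 67441039 | No | | |
| LZ-ORF2 | 8 | 1 | 117549581 | No | | |
| LZ-ORF2 | 9 | 1 | 182982875 | No | | |
| LZ-ORF2 | 10 | 1 | 186433913 | No | | |
| LZ-ORF2 | 11 | 1 | 205596728 | No | | |
| LZ-ORF2 | 12 | 2 | 69430365 | No | | |
| LZ-ORF2 | 13 | 2 | 70318707 | No | | |
| LZ-ORF2 | 14 | 2 | 189995636 | No | | |
| LZ-ORF2 | 15 | 2 | 210800221 | No | | |
| LZ-ORF2 | 16 | 2 | 238628075 | No | | |
| LZ-ORF2 | 17 | 3 | 33345902 | No | | |
| LZ-ORF2 | 18 | 3 | 124745971 | No | | |
| LZ-ORF2 | 19 | 3 | 150736931 | No | | |
| LZ-ORF2 | 20 | 3 | 171402897 | No | | |
| LZ-ORF2 | 21 | 3 | 184205951 | No | | |
| LZ-ORF2 | 22 | 3 | 197689496 | No | | |
| LZ-ORF2 | 23 | 4 | 124759370 | No | | |
| LZ-ORF2 | 24 | 5 | 19387313 | No | | |
| LZ-ORF2 | 25 | 5 | 36255339 | No | | |
| LZ-ORF2 | 26 | 5 | 64336200 | No | | |
| LZ-ORF2 | 27 | 5 | 76483382 | No | | |
| LZ-ORF2 | 28 | 5 | 134356756 | No | | |
| LZ-ORF2 | 29 | 5 | 141379654 | No | | |
| LZ-ORF2 | 30 | 5 | 148459878 | No | | |
| LZ-ORF2 | 31 | 5 | 167190715 | No | | |
| LZ-ORF2 | 32 | 6 | 135233306 | No | | |
| LZ-ORF2 | 33 | 6 | 137359240 | No | | |
| LZ-ORF2 | 34 | 7 | 1287709 | No | | |
| LZ-ORF2 | 35 | 7 | 65619169 | No | | |
| LZ-ORF2 | 36 | 7 | 72232576 | No | | |
| LZ-ORF2 | 37 | 7 | 74666432 | No | | |
| LZ-ORF2 | 38 | 7 | 156719757 | No | | |
| LZ-ORF2 | 39 | 8 | 97323906 | No | | |
| LZ-ORF2 | 40 | 9 | 22157141 | No | | |
| LZ-ORF2 | 41 | 9 | 131304493 | No | | |
| LZ-ORF2 | 42 | 10 | 73047314 | No | | |
| LZ-ORF2 | 43 | 10 | 91034339 | No | | |
| LZ-ORF2 | 44 | 10 | 112270251 | No | | |
| LZ-ORF2 | 45 | 10 | 117012451 | No | | |
| LZ-ORF2 | 46 | 11 | 28671379 | No | | |
| LZ-ORF2 | 47 | 11 | 74484409 | No | | |
| LZ-ORF2 | 48 | 11 | 101262810 | No | | |
| LZ-ORF2 | 49 | 12 | 29807168 | No | | |
| LZ-ORF2 | 50 | 12 | 104616769 | No | | |
| LZ-ORF2 | 51 | 12 | 112411757 | No | | |
| LZ-ORF2 | 52 | 13 | 77685020 | No | | |
| LZ-ORF2 | 53 | 13 | 79200920 | No | | |
| LZ-ORF2 | 54 | 13 | 92850663 | No | | |
| LZ-ORF2 | 55 | 14 | 56285222 | No | | |
| LZ-ORF2 | 56 | 14 | 65094075 | No | | |
| LZ-ORF2 | 57 | 14 | 91245526 | No | | |
| LZ-ORF2 | 58 | 14 | 92352545 | No | | |
| LZ-ORF2 | 59 | 15 | 59724193 | No | | |
| LZ-ORF2 | 60 | 15 | 69480707 | No | | |
| LZ-ORF2 | 61 | 16 | 30514516 | No | | |
| LZ-ORF2 | 62 | 16 | 89575313 | No | | |
| LZ-ORF2 | 63 | 17 | 39958396 | No | | |
| LZ-ORF2 | 64 | 17 | 61238975 | No | | |
| LZ-ORF2 | 65 | 17 | 68043422 | No | | |
| LZ-ORF2 | 66 | 18 | 32733002 | No | | |
| LZ-ORF2 | 67 | 19 | 57679450 | No | | |
| LZ-ORF2 | 68 | 20 | 43781473 | No | | |
| LZ-ORF2 | 69 | 20 | 48437556 | No | | |
| LZ-ORF2 | 70 | 20 | 52835539 | No | | |
| LZ-ORF2 | 71 | 20 | 59087654 | No | | |
| LZ-ORF2 | 72 | 21 | 46280278 | No | | |
| LZ-ORF2 | 73 | 22 | 20311747 | No | | |
| LZ-ORF2 | 74 | 22 | 38002187 | No | | |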

Appendix Table 2. Chromosome locations of Alu inserts driven by TZ-ORF2 fusion protein

| Driver | Clone | Chromosome | Position | Target site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| TZ-ORF2 | 1 | 1 | 186648182 | No | | |
| TZ-ORF2 | 2 | 1 | 206991724 | No | | |
| TZ-ORF2 | 3 | 2 | 10342071 | No | | |
| TZ-ORF2 | 4 | 2 | 230982473 | No | | |
| TZ-ORF2 | 5 | 3 | 24844836 | No | | |
| TZ-ORF2 | 6 | 3 | 120085269 | No | | |
| TZ-ORF2 | 7 | 3 | 181561035 | No | | |
| TZ-ORF2 | 8 | 5 | 131066328 | No | | |
| TZ-ORF2 | 9 | 6 | 86242220 | No | | |
| TZ-ORF2 | 10 | 7 | 27741174 | No | | |
| TZ-ORF2 | 11 | 7 | 45867351 | No | | |
| TZ-ORF2 | 12 | 8 | 102554170 | No | | |
| TZ-ORF2 | 13 | 11 | 57578561 | No | | |
| TZ-ORF2 | 14 | 16 | 69597400 | No | | |
| TZ-ORF2 | 15 | 17 | 62685113 | No | | |

Appendix Table 3. Chromosome locations of Alu inserts driven by C-Cre-ORF2 and N-Cre-ORF2 fusion proteins

| Driver | Clone | Chr | Position | Target site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| N-Cre-ORF2 | 1 | 2 | 239001769 | No | | |
| N-Cre-ORF2 | 2 | 2 | 241683668 | No | | |
| N-Cre-ORF2 | 3 | 2 | 88857884 | No | | |
| N-Cre-ORF2 | 4 | 20 | 17413977 | No | | |
| N-Cre-ORF2 | 5 | 17 | 23141471 | No | | Multiple 100% matches on chromosome 17. No LoxP target site |
| N-Cre-ORF2 | 6 | 18 | 29147758 | No | | |
| C-Cre-ORF2 | 1 | 14 | 69599516 | No | | |
| C-Cre-ORF2 | 2 | 2 | 157758596 | No | | |
| C-Cre-ORF2 | 3 | 17 | 30038502 | No | | |
| C-Cre-ORF2 | 4 | 2 | 224621653 | No | | |
| C-Cre-ORF2 | 5 | 8 | 100540787 | No | | |
| C-Cre-ORF2 | 6 | 1 | 186557305 | No | | |
| C-Cre-ORF2 | 7 | 15 | 41069408 | No | | |
| C-Cre-ORF2 | 8 | 2 | 239923465 | No | | |
| C-Cre-ORF2 | 9 | 2 | 242623083 | No | | |
| C-Cre-ORF2 | 10 | 2 | 89157396 | No | | |
| C-Cre-ORF2 | 11 | 18 | 26727723 | No | | |

Appendix Table 4. Chromosome locations of Alu inserts driven by Tal-ORF2 fusion protein

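| Clone | Chr | Position | Target site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|
| 1 | 8 | 143956366 | No | | |
| 2 | 6 | 100354134 | No | | |
| 3 | 2 | 111554759 or 87307000 | No | | Flanking sequence matched two loci perfectly in Chr 2. |
| 4 | 17 | 43839357 | No | | |
| 5 | 1 | 88630563 | No | | |
| 6 | 6 | 117392925 | No | | |
| 7 | 10 | 21921931 | No | | |
| 8 | 14 | 55468550 | No | | |
| 9 | 2 | 201946347 | No | | |
| 10 | 3 | 132294375 | No | | |
| 11 | 2 | 36485919 | No | | |
| 12 | 11 | 34840115 | No | | |
| 13 | 10 | 10513336 | No | | |
| 14 | 4 | 122894976 | No | | |
| 15 | 5 | 65835788 | No | | |
| 16 | 4 | 122894593 | No | | |
| 17 | 4 | 68898932 | No | | |
| 18 | 9 | 24868953 | No | | |
| 19 | 8 | 23140477 | No | | |
| 20 | X | 77951391 | No | | |
| 21 | 7 | 95236089 | No | | |
| 22 | 1 | 185030940 | No | | |
| 23 | 13 | 75676994 | No | | |
| 24 | 2 | 197607012 | No | | |
| 25 | 10 | 18169299 | No | | |
| 26 | 20 | 52048053 | No | | |
| 27 | 12 | 21559659 | No | | |
| 28 | 4 | 142756482 | No | | |
| 29 | 20 | 64309176 | No | | |
| 30 | X | 22797646 | No | | |
| 31 | 2 | 233823638 | No | | |
| 32 | 10 | 18295714 | No | | |
| 33 | 20 | 733421 | No | | |
| 34 | 2 | 197266174 | No | | |
| 35 | 2 | 66761408 | No | | |
| 36 | 7 | 1687592 | No | | |
| 37 | 15 | 59846378 | No | | |
| 38 | 2 | 61037197 | No | | |
| 39 | 17 | 74898842 | No | | |
| 40 | 5 | 18326666 | No | | |
| 41 | 5 | 136793782 | No | | |
| 42 | 5 | 58399150 | No | | |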

Appendix Table 5. Chromosome locations of Alu inserts driven by EGFP targeting zinc fingers

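| Driver | Clone | Chr | Position | Target site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| 2.17-ORF2 | 1 | 3 | 89557580 | No | | |
| 2.17-ORF2 | 2 | 9 | 36057110 | No | | |
| 2.17-ORF2 | 3 | 5 | 101499336 | No | | |
| 2.17-ORF2 | 4 | 6 | 13530183 | No | | |
| 2.17-ORF2 | 5 | 20 | 33575974 | No | | |
| 2.17-ORF2 | 6 | 3 | 189574527 | No | | |
| 2.17-ORF2 | 7 | 12 | 20198625 | No | | |
| 2.17-ORF2 | 8 | 20 | 52047801 | No | | |
| 2.17-ORF2 | 9 | 1 | 223793011 | No | | |
| 2.17-ORF2 | 10 | 20 | 50092025 | No | | |
| 2.17-ORF2 | 11 | 15 | 35606159 | No | | |
| 2.17-ORF2 | 12 | 3 | 98942189 | No | | |
| 2.17-ORF2 | 13 | 13 | 106508879 | No | | |
| 2.17-ORF2 | 14 | 7 | 116484971 | No | | |
| 2.18-ORF2 | 1 | 20 | 52048087 | No | | |
| 2.18-ORF2 | 2 | 12 | 71882810 | No | | |
| 2.18-ORF2 | 3 | 3 | 105333414 | No | | |
| 2.18-ORF2 | 4 | 7 | 23504847 | No | | |
| 2.18-ORF2 | 5 | 19 | 15394850 | No | | |
| 2.18-ORF2 | 6 | 1 | 155338626 | No | | |
| 2.18-ORF2 | 7 | 2 | 19930097 | No | | |
| 2.18-ORF2 | 8 | 17 | 58894378 | No | | |
| 2.18-ORF2 | 9 | 8 | 74598418 | No | | |
| 2.18-ORF2 | 10 | 4 | 65863043 | No | | |
| 2.18-ORF2 | 11 | 5 | 143050994 | No | | |
| 2.1817-ORF2 | 1 | 6 | 12055301 | No | | |
| 2.1817-ORF2 | 2 | 6 | 7831544 | No | | |
| 2.1817-ORF2 | 3 | 14 | 49863115 | No | | |
| 2.1817-ORF2 | 4 | 2 | 36706350 | No | | |
| 2.1817-ORF2 | 5 | 5 | 138417515 | No | | |
| 2.1817-ORF2 | 6 | 3 | 57769007 | No | | |
| 2.1817-ORF2 | 7 | 15 | 39440427 | No | | |
| 2.1817-ORF2 | 8 | 1 | 2396116 | No | | |
| 2.1817-ORF2 | 9 | 11 | 12090139 | No | | |
| 2.1817-ORF2 | 10 | 2 | 37191061 | No | | |
| 2.1817-ORF2 | 11 | 18 | 9986720 | No | | |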

Appendix Table 6. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 GHL targeting zinc fingers

| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| N-ZF4-ORF2 | 1 | 7 | 20185429 | Yes | Alu inserted 3114 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 2 | 5 | 106539781 | Yes | Alu inserted 447 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 3 | 13 | 36614335 | No | | |
| N-ZF4-ORF2 | 4 | 3 | 44150558 | Yes | Alu inserted 1763 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 5 | 17 | 64921000 | Maybe | Alu inserted 173 bp downstream and in the opposite orientation of the potential target site: TGGTTGTAAAAATGATGAG | |
| N-ZF4-ORF2 | 6 | 11 | 10301471 | No | | |
| N-ZF4-ORF2 | 7 | 12 | 54226570 | No | | |
| N-ZF4-ORF2 | 8 | 1 | 103026310 | No | | |
| N-ZF4-ORF2 | 9 | 7 | 151696769 | No | | |
| N-ZF4-ORF2 | 10 | 9 | 70760042 | Yes | Alu inserted 393 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 11 | 6 | 139216267 | Yes | Alu inserted 39 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 12 | 17 | 64707295 | No | | |
| N-ZF4-ORF2 | 13 | 3 | 19071915 | No | | |
| N-ZF4-ORF2 | 14 | 9 | 26557157 | Yes | Alu inserted 1836 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 15 | 9 | 136021098 | No | | |
| N-ZF4-ORF2 | 16 | 8 | 126990222 | No | | |
| N-ZF4-ORF2 | 17 | X | 72596070 | Yes | Alu inserted 4805 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 18 | 8 | 72775760 | Yes | Alu inserted 307 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 20 | 3 | 15183794 | No | | |
| N-ZF4-ORF2 | 21 | 13 | 27327371 | No | | |
| N-ZF4-ORF2 | 22 | 5 | 40662640 | Yes | Alu inserted 2123 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 23 | 18 | 9828292 | No | | |
| N-ZF4-ORF2 | 24 | 20 | 52048287 | Maybe | Alu inserted 1435 bp downstream in the same orientation as a potential ZF4 target site: GCTAGAAAATGAGATGAG | |
| N-ZF4-ORF2 | 25 | 5 | 12251298 | Yes | ZF4 target sequence located 2808 bp downstream of the Alu insertion in the same orientation | |
| N-ZF4-ORF2 | 26 | 5 | 106515434 | No | | |
| N-ZF4-ORF2 | 27 | 4 | 38951920 | No | | |
| N-ZF4-ORF2 | 28 | 2 | 197607012 | Yes | Alu inserted 134 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 29 | 13 | 90960575 | Yes | Alu inserted 1397 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 30 | 13 | 89830753 | No | | |
| N-ZF4-ORF2 | 31 | 5 | 40110264 | No | | |
| N-ZF4-ORF2 | 32 | 8 | 109175264 | No | | |
| N-ZF4-ORF2 | 33 | 1 | 86181516 | Yes | Alu inserted 320 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 34 | 3 | 170275588 | No | | |
| N-ZF4-ORF2 | 35 | 1 | 237576237 | Yes | Alu inserted 226 bp upstream and in the same orientation as the ZF4 target site | |
| N-ZF4-ORF2 | 36 | 8 | 98891296 | No | | |
| N-ZF4-ORF2 | 37 | 4 | 124217834 | Yes | Alu inserted 622 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 38 | 22 | 27235131 | Yes | Alu inserted 317 bp upstream of the ZF4 target sequence in the same orientation | |
| N-ZF4-ORF2 | 39 | 3 | 28963299 | Yes | Alu inserted 763 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 40 | 1 | 144489971 | No | | |
| N-ZF4-ORF2 | 41 | 7 | 136637160 | Maybe | Alu inserted 647 bp upstream in the opposite orientation from partial target: GGCATAAAAATGTGTTC | |
| N-ZF4-ORF2 | 42 | 16 | 57471997 | No | | |
| N-ZF4-ORF2 | 43 | 9 | 20033134 | No | | |
| N-ZF4-ORF2 | 44 | 19 | 24378752 | No | | |
| N-ZF4-ORF2 | 45 | 4 | 79096340 | No | | |
| N-ZF4-ORF2 | 46 | | | Yes | Alu inserted 137 bp upstream and in the opposite orientation of the ZF4 target sequence | Inserted in an L1HS: multiple hits in the genome; can't map exactly |
| N-ZF4-ORF2 | 47 | 17 | 42775476 | No | | |
| N-ZF4-ORF2 | 48 | 2 | 184900794 | No | | |
| N-ZF4-ORF2 | 49 | 10 | 16494837 | No | | |
| N-ZF4-ORF2 | 50 | 11 | 98250855 | No | | |
| N-ZF4-ORF2 | 51 | 22 | 18484497 | Maybe | Alu inserted 757 bp upstream and in the same orientation as the potential target site: GGCATAAAAAAGACATGA | |
| N-ZF4-ORF2 | 52 | 4 | 89134310 | No | | |
| N-ZF4-ORF2 | 53 | 5 | 28364955 | No | | |
| N-ZF4-ORF2 | 54 | 15 | 83451561 | Yes | Alu inserted 404 bp downstream and in the same orientation as the ZF4 target site | |
| N-ZF4-ORF2 | 55 | 15 | 57716255 | Yes | Alu landed 3909 bp downstream and in the opposite orientation of the ZF4 target site | |
| N-ZF4-ORF2 | 56 | 4 | 79096364 | No | | |
| N-ZF4-ORF2 | 57 | 1 | 215440505 | Maybe | Alu inserted 1305 bp downstream and in the same orientation as the potential target site. Target site is not an exact match, and is located in L1PA13. There are more than 3 mismatches | |
| N-ZF4-ORF2 | 58 | 6 | 107377193 | Yes | Alu inserted 509 bp upstream and in the same orientation as the ZF4 target site | |
| N-ZF4-ORF2 | 59 | 1 | 75492134 | Maybe | Alu inserted 226 bp upstream and in the opposite orientation of the potential ZF4 target sequence: ATTTCAAAACATGATGAG | |
| N-ZF4-ORF2 | 60 | 9 | 23503765 | No | | |
| N-ZF4-ORF2 | 61 | 10 | 88442629 | No | | |
| N-ZF4-ORF2 | 62 | 5 | 61176934 | No | | |
| N-ZF4-ORF2 | 63 | 10 | 22748976 | Yes | Alu inserted 31 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 64 | 9 | 86732651 | No | | |
| N-ZF4-ORF2 | 65 | 4 | 126401319 | No | | |
| N-ZF4-ORF2 | 66 | 14 | 59159312 | No | | |
| N-ZF4-ORF2 | 67 | 15 | 85970958 | Yes | Alu inserted 314 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 68 | 7 | 114490634 | Yes | Alu inserted 199 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 69 | 14 | 74775854 | No | | |
| N-ZF4-ORF2 | 70 | 7 | 112350562 | No | | |
| N-ZF4-ORF2 | 71 | 12 | 120047165 | Yes | Alu inserted 1373 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 72 | X | 151057853 | Yes | Alu inserted 69 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 73 | X | 116455726 | No | | |
| N-ZF4-ORF2 | 74 | 1 | 241075505 | No | | |
| N-ZF4-ORF2 | 75 | 11 | 59904416 | Yes | Alu inserted 589 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 76 | 6 | 72296320 | No | | |
| N-ZF4-ORF2 | 77 | 6 | 81306831 | No | | |
| N-ZF4-ORF2 | 78 | 12 | 69267188 | No | | |
| N-ZF4-ORF2 | 79 | 2 | 111176340 | No | | |
| N-ZF4-ORF2 | 80 | 7 | 116443326 | No | | |
| N-ZF4-ORF2 | 81 | 1 | 164592550 | No | | |
| N-ZF4-ORF2 | 82 | 5 | 174350172 | Yes | Alu inserted 1871 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 83 | 9 | 30015503 | Yes | Alu inserted 281 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 84 | 16 | 71464331 | No | | |
| N-ZF4-ORF2 | 85 | 1 | 202734999 | No | | |
| N-ZF4-ORF2 | 86 | 1 | 95920351 | No | | |
| N-ZF4-ORF2 | 87 | 3 | 64074827 | No | | |
| N-ZF4-ORF2 | 88 | 7 | 119913979 | Yes | Alu inserted 577 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 89 | 8 | 28180582 | Yes | Alu inserted 331 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 90 | 2 | 222160061 | No | | |
| N-ZF4-ORF2 | 91 | 3 | 45160713 | No | | |
| N-ZF4-ORF2 | 92 | 12 | 62545959 | No | | |
| N-ZF4-ORF2 | 93 | 5 | 164248893 | Yes | Alu inserted 367 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 94 | X | 65008309 | Yes | Alu inserted 1916 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 95 | 4 | 168034827 | No | | |
| N-ZF4-ORF2 | 96 | 14 | 54316658 | No | | |
| N-ZF4-ORF2 | 97 | 1 | 94758373 | No | | |
| N-ZF4-ORF2 | 98 | 7 | 5496667 | No | | |
| N-ZF4-ORF2 | 99 | 17 | 29323010 | No | | |
| N-ZF4-ORF2 | 100 | 7 | 93823421 | No | | |
| N-ZF4-ORF2 | 101 | 12 | 121202296 | No | | |
| N-ZF4-ORF2 | 102 | 11 | 95046760 | No | | |
| N-ZF4-ORF2 | 103 | 8 | 144856724 | No | | |
| N-ZF4-ORF2 | 104 | 12 | 27696861 | No | | |
| N-ZF4-ORF2 | 105 | 11 | 6000204 | Yes | Alu inserted 75 bp upstream and in the same orientation as the Alu | |
| N-ZF4-ORF2 | 106 | 2 | 201541544 | Yes | Alu inserted 2907 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 107 | 15 | 50427263 | No | | |
| N-ZF4-ORF2 | 108 | 21 | 32731068 | No | | |
| N-ZF4-ORF2 | 109 | 2 | 194393767 | Yes | Alu inserted 746 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 110 | 21 | 6721650 | Yes | Alu inserted 338 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4-ORF2 | 111 | 1 | 16392625 | No | | |
| N-ZF4-ORF2 | 112 | 8 | 103399620 | No | | |
| N-ZF4-ORF2 | 113 | 15 | 63412576 | No | | |
| N-ZF4-ORF2 | 114 | 7 | 124536609 | No | | |
| N-ZF4-ORF2 | 115 | 11 | 66378393 | Yes | Alu inserted 241 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4-ORF2 | 116 | 10 | 69402626 | No | | |
| N-ZF4-ORF2 | 117 | 13 | 43261742 | Yes | Alu inserted 195 bp upstream and in the opposite orientation of the ZF4 target sequence | |

Appendix Table 7. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 FL4 targeting zinc fingers

| Driver | Clone | Chromosome | Position | ZF4 site? | Location of Alu relative to Target Sequence (If applicable) | Notes |
|---|---|---|---|---|---|---|
| N-ZF4 FL4 | 1 | 13 | 85296511 | No | | |
| N-ZF4 FL4 | 2 | 17 | 44956329 | No | | |
| N-ZF4 FL4 | 3 | 1 | 47115028 | Yes | Alu inserted 84 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 4 | 11 | 5445906 | Yes | Alu inserted 15 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 5 | 2 | 181624173 | No | | |
| N-ZF4 FL4 | 6 | 3 | 55997888 | No | | |
| N-ZF4 FL4 | 7 | 13 | 103335682 | No | | |
| N-ZF4 FL4 | 8 | X | 31013822 | No | | |
| N-ZF4 FL4 | 9 | 1 | 178511225 | No | | |
| N-ZF4 FL4 | 10 | 12 | 68650325 | No | | |
| N-ZF4 FL4 | 11 | 1 | 164626605 | No | | |
| N-ZF4 FL4 | 12 | 22 | 49345600 | No | | |
| N-ZF4 FL4 | 13 | 6 | 67898471 | No | | |
| N-ZF4 FL4 | 14 | 4 | 39099648 | No | | |
| N-ZF4 FL4 | 15 | 9 | 16250633 | No | | |
| N-ZF4 FL4 | 16 | 14 | 36046269 | No | | |
| N-ZF4 FL4 | 17 | 3 | 55592785 | No | | |
| N-ZF4 FL4 | 18 | 17 | 62776227 | No | | |
| N-ZF4 FL4 | 19 | 3 | 191006316 | Yes | Alu inserted 4599 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 20 | 17 | 16660271 | No | | |
| N-ZF4 FL4 | 21 | 16 | 75307506 | No | | |
| N-ZF4 FL4 | 22 | 1 | 40527455 | Yes | Alu inserted 280 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 23 | 12 | 8966669 | Yes | Alu located 332 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 24 | 14 | 66510479 | No | | |
| N-ZF4 FL4 | 25 | 22 | 44775366 | No | | |
| N-ZF4 FL4 | 26 | 8 | 60144473 | No | | |
| N-ZF4 FL4 | 27 | 5 | 27957579 | Yes | Alu inserted 617 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 28 | 2 | 153926332 | Yes | Alu located 2326 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 29 | 1 | 96679459 | Yes | Alu insertion located 1255 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 30 | 1 | 153565543 | No | | |
| N-ZF4 FL4 | 31 | 7 | 95570122 | Yes | Alu inserted 3721 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 32 | 4 | 85880267 | No | | |
| N-ZF4 FL4 | 33 | 5 | 43652633 | No | | |
| N-ZF4 FL4 | 34 | 2 | 174823389 | No | | |
| N-ZF4 FL4 | 35 | 5 | 40624287 | No | | |
| N-ZF4 FL4 | 36 | 1 | 178891486 | No | | |
| N-ZF4 FL4 | 37 | 6 | 47216395 | Yes | Alu inserted 598 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 38 | 11 | 63624676 | No | | |
| N-ZF4 FL4 | 39 | 2 | 187501397 | No | | |
| N-ZF4 FL4 | 40 | 21 | 37068803 | No | | |
| N-ZF4 FL4 | 41 | 6 | 114078602 | Maybe | Alu inserted 2106 bp upstream and in the opposite orientation from a partial target sequence: GCCATAAAAAATGAAACC | |
| N-ZF4 FL4 | 42 | 2 | 141614494 | No | | |
| N-ZF4 FL4 | 43 | 5 | 30515974 | No | | |
| N-ZF4 FL4 | 44 | 15 | 34672009 | No | | |
| N-ZF4 FL4 | 45 | 8 | 88058176 | Yes | Alu inserted 4297 bp upstream and in the same orientation as the ZF4 target site. See seqbuilder file for more details. | |
| N-ZF4 FL4 | 46 | 7 | 43848962 | No | | |
| N-ZF4 FL4 | 47 | 2 | 9383996 | Yes | Alu inserted 600 bp upstream in the opposite orientation of the ZF4 target site | |
| N-ZF4 FL4 | 48 | 7 | 96077565 | No | | |
| N-ZF4 FL4 | 49 | 21 | 24732728 | No | | |
| N-ZF4 FL4 | 50 | 17 | 72844091 | No | | |
| N-ZF4 FL4 | 51 | 3 | 164214956 | Yes | Alu inserted 4001 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 52 | 17 | 63295142 | Yes | Alu inserted 3958 bp downstream and in the opposite orientation of the ZF4 target site | |
| N-ZF4 FL4 | 53 | 3 | 99813152 | No | | |
| N-ZF4 FL4 | 54 | 2 | 33504242 | No | | |
| N-ZF4 FL4 | 55 | 2 | 138162508 | No | | |
| N-ZF4 FL4 | 56 | 18 | 39712547 | No | | |
| N-ZF4 FL4 | 57 | 5 | 64777381 | No | | |
| N-ZF4 FL4 | 58 | 5 | 32965012 | No | | |
| N-ZF4 FL4 | 59 | 1 | 75492052 | No | | |
| N-ZF4 FL4 | 60 | 1 | 85342637 | Yes | Alu inserted 4942 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 61 | 11 | 14428475 | No | | |
| N-ZF4 FL4 | 62 | 6 | 84827051 | Yes | Alu inserted 21 bp upstream and in the same orientation as the ZF4 target site | |
| N-ZF4 FL4 | 63 | 18 | 49249772 | No | | |
| N-ZF4 FL4 | 64 | 21 | 41417351 | No | | |
| N-ZF4 FL4 | 65 | 6 | 152882688 | No | | |
| N-ZF4 FL4 | 66 | 4 | 134902304 | Yes | Alu inserted 320 bp upstream in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 67 | 13 | 73196324 | No | | |
| N-ZF4 FL4 | 68 | 2 | 159254410 | No | | |
| N-ZF4 FL4 | 69 | 9 | 78912744 | No | | |
| N-ZF4 FL4 | 70 | 2 | 188733643 | No | | |
| N-ZF4 FL4 | 71 | 3 | 167478775 | No | | |
| N-ZF4 FL4 | 72 | 10 | 76479428 | Yes | Alu inserted 497 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 73 | 11 | 33652631 | No | | |
| N-ZF4 FL4 | 74 | 17 | 71520151 | Yes | Alu inserted within the ZF4 target sequence, located in L1PA6 in the same orientation | |
| N-ZF4 FL4 | 75 | 17 | 70393239 | No | | |
| N-ZF4 FL4 | 76 | 11 | 86394869 | No | | |
| N-ZF4 FL4 | 77 | 2 | 231103903 | No | | |
| N-ZF4 FL4 | 78 | 12 | 119325641 | No | | |
| N-ZF4 FL4 | 79 | 5 | 168664383 | No | | |
| N-ZF4 FL4 | 80 | 2 | 56742983 | No | | |
| N-ZF4 FL4 | 81 | 14 | 59082274 | No | | |
| N-ZF4 FL4 | 82 | 5 | 35497285 | Yes | Alu inserted 2767 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 83 | 11 | 5801499 | No | | |
| N-ZF4 FL4 | 84 | 2 | 158567910 | No | | |
| N-ZF4 FL4 | 85 | 7 | 27883436 | Yes | Alu inserted 3480 bp upstream and in the same orientation as the ZF4 target sequence. The ZF4 target sequence had two mismatches: GCCATAAAAAAGAATGAG | |
| N-ZF4 FL4 | 86 | 15 | 66739945 | No | | |
| N-ZF4 FL4 | 87 | 10 | 88561661 | No | | |
| N-ZF4 FL4 | 88 | 5 | 163571269 | No | | |
| N-ZF4 FL4 | 89 | 7 | 45377325 | Yes | Alu inserted 514 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 90 | 14 | 70100490 | No | | |
| N-ZF4 FL4 | 91 | 5 | 151000445 | No | | |
| N-ZF4 FL4 | 92 | 12 | 87932911 | No | | |
| N-ZF4 FL4 | 93 | 2 | 226533338 | Yes | Alu inserted 28 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 94 | 9 | 73543992 | No | | |
| N-ZF4 FL4 | 95 | 4 | 47106238 | No | | |
| N-ZF4 FL4 | 96 | 9 | 26235976 | No | | |
| N-ZF4 FL4 | 97 | 8 | 87507864 | No | | |
| N-ZF4 FL4 | 98 | 17 | 54972677 | No | | |
| N-ZF4 FL4 | 99 | 13 | 49443071 | No | | |
| N-ZF4 FL4 | 100 | 14 | 36689961 | Yes | Alu inserted 4691 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 101 | 5 | 164733407 | Yes | Alu inserted 2 bp upstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 102 | 9 | 132011936 | Yes | Alu inserted 1220 bp downstream and in the same orientation as the ZF4 target sequence | |
| N-ZF4 FL4 | 103 | 17 | 78444169 | No | | |
| N-ZF4 FL4 | 104 | 21 | 29442557 | Yes | Alu inserted 2281 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N-ZF4 FL4 | 105 | 6 | 83141843 | No | | |
| N-ZF4 FL4 | 106 | X | 45485661 | No | | |
| N-ZF4 FL4 | 107 | 17 | 41452857 | No | | |
| N-ZF4 FL4 | 108 | 10 | 2352840 | No | | |


Appendix Table 8. Chromosome locations of Alu inserts driven by N-ZF4-ORF2 HL4 targeting zinc fingers

| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| N ZF4 HL4 | 1 | 11 | 73829777 | No | | |
| N ZF4 HL4 | 2 | 10 | 10858161 | No | | |
| N ZF4 HL4 | 3 | 8 | 116181550 | Yes | Alu inserted 341 bp upstream in the same orientation from ZF4 target site | |
| N ZF4 HL4 | 4 | 2 | 32660278 | Yes | Alu inserted 318 bp upstream and in the same orientation as the ZF4 target sequence | |
| N ZF4 HL4 | 5 | 4 | 89919278 | No | | |
| N ZF4 HL4 | 6 | 6 | 101758687 | No | | |
| N ZF4 HL4 | 7 | 2 | 116394922 | No | | |
| N ZF4 HL4 | 8 | 1 | 12780503 | No | | |
| N ZF4 HL4 | 9 | 11 | 74782114 | No | | |
| N ZF4 HL4 | 10 | 7 | 97407146 | No | | |
| N ZF4 HL4 | 11 | 12 | 83182951 | No | | |
| N ZF4 HL4 | 12 | 2 | 225595768 | No | | |
| N ZF4 HL4 | 13 | 20 | 32087627 | No | | |
| N ZF4 HL4 | 14 | 9 | 87853001 | No | | |
| N ZF4 HL4 | 15 | 5 | 152213568 | Yes | Alu inserted 564 bp downstream and in the same orientation of the ZF4 target site | |
| N ZF4 HL4 | 16 | 5 | 168469263 | No | | |
| N ZF4 HL4 | 17 | 3 | 121524927 | No | | |
| N ZF4 HL4 | 18 | 7 | 44594297 | No | | |
| N ZF4 HL4 | 19 | 5 | 18706507 | Yes | Alu inserted 1469 bp downstream and in the same orientation of the ZF4 target sequence in an L1PA7 | |
| N ZF4 HL4 | 20 | 6 | 133856086 | No | | |
| N ZF4 HL4 | 21 | 9 | 131183543 | No | | |
| N ZF4 HL4 | 22 | 1 | 191002269 | Yes | Alu inserted 604 bp upstream and in the same orientation as the target sequence | |
| N ZF4 HL4 | 23 | 14 | 37665034 | No | | |
| N ZF4 HL4 | 24 | 12 | 39906714 | No | | |
| N ZF4 HL4 | 25 | 4 | 125200638 | No | | |
| N ZF4 HL4 | 26 | 8 | 34794510 | No | | |
| N ZF4 HL4 | 27 | 5 | 148186963 | No | | |
| N ZF4 HL4 | 28 | 3 | 149466096 | No | | |
| N ZF4 HL4 | 29 | 1 | 220304776 | No | | |
| N ZF4 HL4 | 30 | 5 | 14788501 | No | | |
| N ZF4 HL4 | 31 | 3 | 100571572 | Yes | Alu inserted 355 bp downstream in the opposite orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 32 | 4 | 91728145 | No | | |
| N ZF4 HL4 | 33 | 7 | 89192026 | Yes | Alu inserted 1738 bp downstream in the opposite orientation of the ZF4 target sequence in L1PA3 | |
| N ZF4 HL4 | 34 | 5 | 135552528 | No | | |
| N ZF4 HL4 | 35 | 14 | 87018057 | Yes | Alu inserted 1699 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 36 | 18 | 10890957 | No | | |
| N ZF4 HL4 | 37 | 2 | 223703448 | No | | |
| N ZF4 HL4 | 38 | 12 | 75673633 | No | | |
| N ZF4 HL4 | 39 | 6 | 154827627 | No | | |
| N ZF4 HL4 | 40 | 5 | 128257563 | No | | |
| N ZF4 HL4 | 41 | 1 | 73716324 | Yes | Alu inserted 384 bp upstream and in the opposite orientation of the ZF4 target in L1PA7 | |
| N ZF4 HL4 | 42 | 11 | 76838786 | No | | |
| N ZF4 HL4 | 43 | 8 | 88058176 | Yes | Alu inserted 4297 bp upstream and in the same orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 44 | 12 | 20380508 | No | | |
| N ZF4 HL4 | 45 | X | 118575160 | No | | |
| N ZF4 HL4 | 46 | 12 | 63053225 | No | | |
| N ZF4 HL4 | 47 | 1 | 242202807 | No | | |
| N ZF4 HL4 | 48 | 12 | 45231819 | Yes | Alu inserted 4855 bp downstream of the ZF4 target site in the same orientation | |
| N ZF4 HL4 | 49 | 8 | 40019577 | Yes | Alu inserted 2684 bp downstream in the same orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 50 | 8 | 136425024 | No | | |
| N ZF4 HL4 | 51 | 8 | 133679565 | No | | |
| N ZF4 HL4 | 52 | 6 | 111160362 | No | | |
| N ZF4 HL4 | 53 | 2 | 85535655 | No | | |
| N ZF4 HL4 | 54 | 10 | 103015065 | No | | |
| N ZF4 HL4 | 55 | 4 | 102415118 | Yes | Alu inserted 492 bp upstream in the opposite orientation of the ZF4 target site | |
| N ZF4 HL4 | 56 | 11 | 76611089 | No | | |
| N ZF4 HL4 | 57 | 9 | 20986633 | No | | |
| N ZF4 HL4 | 58 | 9 | 8636678 | No | | |
| N ZF4 HL4 | 59 | 2 | 223937436 | No | | |
| N ZF4 HL4 | 60 | 17 | 63295287 | Yes | Alu inserted 3813 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 61 | 15 | 57137881 | No | | |
| N ZF4 HL4 | 62 | 9 | 20361705 | Yes | Alu inserted 4668 bp downstream in the opposite orientation of the ZF4 target sequence | |
| N ZF4 HL4 | 63 | 17 | 70255999 | No | | |
| N ZF4 HL4 | 64 | 2 | 127414901 | No | | |
| N ZF4 HL4 | 65 | 15 | 32721437 | No | | |
| N ZF4 HL4 | 66 | 17 | 28810673 | No | | |
| N ZF4 HL4 | 67 | 1 | 66345133 | No | | |
| N ZF4 HL4 | 68 | 12 | 39813708 | Yes | Alu inserted 3471 bp upstream and in the same orientation as the ZF4 target sequence | |
| N ZF4 HL4 | 69 | 15 | 68458497 | No | | |
| N ZF4 HL4 | 70 | 20 | 19845309 | Yes | Alu inserted 85 bp upstream and in the same orientation as the ZF4 target sequence | |
| N ZF4 HL4 | 71 | 6 | 85078344 | No | | |
| N ZF4 HL4 | 72 | 11 | 19648150 | Yes | Alu inserted 4640 bp upstream and in the opposite orientation as the ZF4 target sequence | |
| N ZF4 HL4 | 73 | 15 | 68458497 | No | | |
| N ZF4 HL4 | 74 | 1 | 100880227 | No | | |
| N ZF4 HL4 | 75 | 1 | 100880354 | No | | |
| N ZF4 HL4 | 76 | 1 | 51618531 | No | | |
| N ZF4 HL4 | 77 | 2 | 67951867 | No | | |
| N ZF4 HL4 | 78 | 9 | 135558998 | No | | |
| N ZF4 HL4 | 79 | 3 | 156354811 | No | | |
| N ZF4 HL4 | 80 | 3 | 78380615 | No | | |
| N ZF4 HL4 | 81 | 17 | 34327826 | Yes | Alu inserted 9 bp upstream and in the same orientation as the ZF4 target sequence | |


Appendix Table 9. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 GHL targeting zinc fingers

| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C-ZF4-ORF2 | 1 | 7 | 106810346 | No | | |
| C-ZF4-ORF2 | 2 | 1 | 185841938 | No | | |
| C-ZF4-ORF2 | 3 | 6 | 108209137 | No | | |
| C-ZF4-ORF2 | 4 | 1 | 41811798 | No | | |
| C-ZF4-ORF2 | 5 | 2 | 229336516 | No | | |
| C-ZF4-ORF2 | 6 | 5 | 35707075 | No | | |
| C-ZF4-ORF2 | 7 | 6 | 108210916 | No | | |
| C-ZF4-ORF2 | 8 | 14 | 50739919 | No | | |
| C-ZF4-ORF2 | 9 | 1 | 34615066 | No | | |
| C-ZF4-ORF2 | 10 | 17 | 1766600 | No | | |
| C-ZF4-ORF2 | 11 | 4 | 82435681 | No | | |
| C-ZF4-ORF2 | 12 | 8 | 100947987 | No | | |
| C-ZF4-ORF2 | 13 | 7 | 43435178 | No | | |
| C-ZF4-ORF2 | 14 | 15 | 35675240 | No | | |
| C-ZF4-ORF2 | 15 | 4 | 48532067 | No | | |
| C-ZF4-ORF2 | 16 | 6 | 74672589 | No | | |
| C-ZF4-ORF2 | 17 | 10 | 89310400 | No | | |
| C-ZF4-ORF2 | 18 | 13 | 49022314 | No | | |
| C-ZF4-ORF2 | 19 | 4 | 83589305 | No | | |
| C-ZF4-ORF2 | 20 | 10 | 66975021 | No | | |
| C-ZF4-ORF2 | 21 | 10 | 100332515 | No | | |
| C-ZF4-ORF2 | 22 | 14 | 95326473 | No | | |
| C-ZF4-ORF2 | 23 | 20 | 50361882 | Yes | Alu inserted 753 bp downstream and in the same orientation as the target sequence | See Word document |
| C-ZF4-ORF2 | 24 | 6 | 2925385 | No | | |
| C-ZF4-ORF2 | 25 | 1 | 172845159 | No | | |
| C-ZF4-ORF2 | 26 | 10 | 71249098 | No | | |
| C-ZF4-ORF2 | 27 | | | Yes | Alu inserted ~70 bp downstream and in the opposite orientation of target sequence | Matched multiple locations due to L1PA3. Can't map exactly. |
| C-ZF4-ORF2 | 28 | 6 | 124883382 | No | | |
| C-ZF4-ORF2 | 29 | 8 | 23371256 | No | | |
| C-ZF4-ORF2 | 30 | 12 | 64039522 | No | | |
| C-ZF4-ORF2 | 31 | 10 | 107327569 | Yes | Alu inserted ~400 bp downstream and in the opposite orientation of the ZF4 target sequence | See Word document |
| C-ZF4-ORF2 | 32 | 1 | 171996221 | No | | |
| C-ZF4-ORF2 | 33 | 2 | 124392783 | No | | |
| C-ZF4-ORF2 | 34 | 2 | 105342712 | No | | |
| C-ZF4-ORF2 | 35 | 7 | 35331767 | No | | |
| C-ZF4-ORF2 | 36 | 9 | 21798262 | No | | |
| C-ZF4-ORF2 | 37 | 21 | 10349664 | Yes | Alu inserted ~320 bp downstream and in the opposite orientation of the ZF4 target sequence | See Word document |
| C-ZF4-ORF2 | 38 | 5 | 174406617 | No | | |
| C-ZF4-ORF2 | 39 | 8 | 19147097 | No | | |
| C-ZF4-ORF2 | 40 | 7 | 85135997 | No | | |
| C-ZF4-ORF2 | 41 | 10 | 61307102 | No | | |
| C-ZF4-ORF2 | 42 | 17 | 30385489 | No | | |
| C-ZF4-ORF2 | 43 | 8 | 37691046 | No | | |
| C-ZF4-ORF2 | 44 | 12 | 106828673 | No | | |
| C-ZF4-ORF2 | 45 | 17 | 38115097 | No | | |
| C-ZF4-ORF2 | 46 | 9 | 34755727 | Yes | Alu inserted 665 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 47 | 2 | 99139348 | No | | |
| C-ZF4-ORF2 | 48 | 18 | 67720417 | Yes | Alu inserted 1229 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 49 | 22 | 29314147 | No | | |
| C-ZF4-ORF2 | 50 | 6 | 57092303 | No | | |
| C-ZF4-ORF2 | 51 | 6 | 7703645 | No | | |
| C-ZF4-ORF2 | 52 | 6 | 20402830 | No | | |
| C-ZF4-ORF2 | 53 | 9 | 133417064 | No | | |
| C-ZF4-ORF2 | 54 | 1 | 105246853 | Yes | Alu inserted 1060 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 55 | 10 | 114694055 | No | | |
| C-ZF4-ORF2 | 56 | 2 | 223418305 | No | | |
| C-ZF4-ORF2 | 57 | 6 | 107579114 | No | | |
| C-ZF4-ORF2 | 58 | 3 | 32410017 | No | | |
| C-ZF4-ORF2 | 59 | 8 | 127649784 | No | | |
| C-ZF4-ORF2 | 60 | 1 | 235264008 | No | | |
| C-ZF4-ORF2 | 61 | 8 | 96347813 | No | | |
| C-ZF4-ORF2 | 62 | 20 | 7595533 | No | | |
| C-ZF4-ORF2 | 63 | 4 | 39298836 | Yes | Alu inserted 2238 bp upstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 64 | 6 | 32442143 | No | | |
| C-ZF4-ORF2 | 65 | 8 | 98686670 | No | | |
| C-ZF4-ORF2 | 66 | 5 | 10471323 | No | | |
| C-ZF4-ORF2 | 67 | 6 | 31580514 | No | | |
| C-ZF4-ORF2 | 68 | 3 | 22983356 | No | | |
| C-ZF4-ORF2 | 69 | 3 | 44148496 | Yes | Alu inserted 5 bp upstream and in the same orientation as the ZF4 target sequence | |
| C-ZF4-ORF2 | 70 | 10 | 75755200 | No | | |
| C-ZF4-ORF2 | 71 | 16 | 46793087 | No | | |
| C-ZF4-ORF2 | 72 | 17 | 29681885 | No | | |
| C-ZF4-ORF2 | 73 | 2 | 23820972 | No | | |
| C-ZF4-ORF2 | 74 | X | 9066248 | No | | |
| C-ZF4-ORF2 | 75 | 14 | 68772217 | No | | |
| C-ZF4-ORF2 | 76 | 4 | 30539216 | No | | |
| C-ZF4-ORF2 | 77 | 9 | 17782380 | Yes | Alu inserted 1392 bp upstream and in the same orientation as the ZF4 target sequence | |
| C-ZF4-ORF2 | 78 | 3 | 188534644 | No | | |
| C-ZF4-ORF2 | 79 | 11 | 100890694 | No | | |
| C-ZF4-ORF2 | 80 | 1 | 233147279 | No | | |
| C-ZF4-ORF2 | 81 | 6 | 127387690 | No | | |
| C-ZF4-ORF2 | 82 | X | 15045378 | Yes | Alu inserted 2037 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 83 | 5 | 140211058 | No | | |
| C-ZF4-ORF2 | 84 | 19 | 48215241 | No | | |
| C-ZF4-ORF2 | 85 | 2 | 25593919 | No | | |
| C-ZF4-ORF2 | 86 | 7 | 76806909 | No | | |
| C-ZF4-ORF2 | 87 | 13 | 33567553 | No | | |
| C-ZF4-ORF2 | 88 | 9 | 73359137 | No | | |
| C-ZF4-ORF2 | 89 | 8 | 107084100 | No | | |
| C-ZF4-ORF2 | 90 | 3 | 147983524 | Yes | Alu inserted 3720 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C-ZF4-ORF2 | 91 | 15 | 63040828 | No | | |
| C-ZF4-ORF2 | 92 | 5 | 167655928 | No | | |
| C-ZF4-ORF2 | 93 | 1 | 101186721 | Yes | Alu inserted 2535 bp upstream and in the opposite orientation of the ZF4 target sequence | |


Appendix Table 10. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 FL4 targeting zinc fingers

| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C ZF4 FL4 | 1 | 4 | 1.2E+08 | No | | |
| C ZF4 FL4 | 2 | 19 | 9810443 | No | | |
| C ZF4 FL4 | 3 | 7 | 95299329 | No | | |
| C ZF4 FL4 | 4 | 15 | 80448247 | No | | |
| C ZF4 FL4 | 5 | 1 | 150979180 | No | | |
| C ZF4 FL4 | 6 | 4 | 16086400 | No | | |
| C ZF4 FL4 | 7 | 10 | 88758343 | No | | |
| C ZF4 FL4 | 8 | 14 | 62591012 | No | | |
| C ZF4 FL4 | 9 | 5 | 98964341 | No | | |
| C ZF4 FL4 | 10 | 6 | 74885388 | Yes | Alu inserted 789 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 11 | 2 | 83282559 | No | | |
| C ZF4 FL4 | 12 | 9 | 72961522 | Yes | Alu inserted 2069 bp upstream and in the same orientation of the ZF4 target sequence | |
| C ZF4 FL4 | 13 | 4 | 134336635 | Yes | Alu inserted 370 bp downstream and in the same orientation as the ZF4 target site | |
| C ZF4 FL4 | 14 | 8 | 66763102 | No | | |
| C ZF4 FL4 | 15 | 4 | 54126082 | No | | |
| C ZF4 FL4 | 16 | 5 | 40699445 | No | | |
| C ZF4 FL4 | 17 | 1 | 240279911 | Yes | Alu inserted 4292 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 18 | 8 | 49545235 | Yes | Alu inserted 4256 bp upstream and in the same orientation as a potential ZF4 target sequence | |
| C ZF4 FL4 | 19 | 9 | 41555687 | Yes | Alu inserted 1011 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 20 | 9 | 29364194 | No | Alu inserted 5738 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 21 | 7 | 74857886 | No | | |
| C ZF4 FL4 | 22 | 17 | 8009394 | No | | |
| C ZF4 FL4 | 23 | 2 | 26941082 | No | | |
| C ZF4 FL4 | 24 | 9 | 131029089 | No | | |
| C ZF4 FL4 | 25 | 14 | 38258857 | Yes | Alu inserted 696 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 FL4 | 26 | 5 | 121093292 | No | | |
| C ZF4 FL4 | 27 | 13 | 63931559 | No | | |
| C ZF4 FL4 | 28 | 7 | 102854728 | Yes | Alu inserted 523 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 29 | 4 | 47842135 | Yes | Alu inserted 4592 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 30 | 4 | 160212937 | Yes | Alu inserted 2999 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 FL4 | 31 | 4 | 142427097 | No | | |
| C ZF4 FL4 | 32 | 11 | 9116399 | No | | |
| C ZF4 FL4 | 33 | 7 | 100422178 | No | | |
| C ZF4 FL4 | 34 | 10 | 7309546 | No | | |
| C ZF4 FL4 | 35 | 11 | 10309773 | No | | |
| C ZF4 FL4 | 36 | 5 | 159176048 | No | | |
| C ZF4 FL4 | 37 | 1 | 216124229 | No | | |
| C ZF4 FL4 | 38 | 16 | 19532931 | No | | |
| C ZF4 FL4 | 39 | 3 | 166543851 | No | | |
| C ZF4 FL4 | 40 | 6 | 165521371 | No | | |
| C ZF4 FL4 | 41 | 1 | 79117576 | Yes | Alu inserted 3817 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 FL4 | 42 | 5 | 144439191 | No | | |
| C ZF4 FL4 | 43 | 2 | 96598983 | No | | |
| C ZF4 FL4 | 44 | 1 | 66797119 | Yes | Alu inserted 911 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 FL4 | 45 | 1 | 8705969 | No | | |
| C ZF4 FL4 | 46 | 14 | 59337320 | No | | |
| C ZF4 FL4 | 47 | 5 | 159176048 | No | | |


Appendix Table 11. Chromosome locations of Alu inserts driven by C-ZF4-ORF2 HL4 targeting zinc fingers

| Driver | Clone | Chr. | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C ZF4 HL4 | 1 | 3 | 166021522 | Yes | Alu located 709 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 HL4 | 2 | 14 | 102120637 | No | | |
| C ZF4 HL4 | 3 | 16 | 56468358 | No | | |
| C ZF4 HL4 | 4 | 15 | 59214636 | No | | |
| C ZF4 HL4 | 5 | 9 | 125850639 | No | | |
| C ZF4 HL4 | 6 | 8 | 42527550 | No | | |
| C ZF4 HL4 | 7 | 3 | 11262411 | No | | |
| C ZF4 HL4 | 8 | 15 | 39756843 | No | | |
| C ZF4 HL4 | 9 | 13 | 85384521 | No | | |
| C ZF4 HL4 | 10 | 6 | 166116445 | No | | |
| C ZF4 HL4 | 11 | 1 | 115382552 | Yes | Alu landed 1112 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 HL4 | 12 | 22 | 41148732 | No | | |
| C ZF4 HL4 | 13 | 2 | 27770351 | No | | |
| C ZF4 HL4 | 14 | 7 | 112423540 | No | | |
| C ZF4 HL4 | 15 | 10 | 88474425 | No | | |
| C ZF4 HL4 | 16 | X | 36081054 | No | | |
| C ZF4 HL4 | 17 | 15 | 38918790 | No | | |
| C ZF4 HL4 | 18 | 14 | 102512154 | No | | |
| C ZF4 HL4 | 19 | 6 | 5310701 | No | | |
| C ZF4 HL4 | 20 | 12 | 23677505 | No | | |
| C ZF4 HL4 | 21 | 11 | 106383288 | No | | |
| C ZF4 HL4 | 22 | 4 | 79935140 | No | | |
| C ZF4 HL4 | 23 | 6 | 108075262 | No | | |
| C ZF4 HL4 | 24 | 11 | 118442456 | No | | |
| C ZF4 HL4 | 25 | 4 | 149513130 | No | | |
| C ZF4 HL4 | 26 | 5 | 14654084 | No | | |
| C ZF4 HL4 | 27 | 11 | 29144229 | Yes | Alu landed 651 bp downstream and in the same orientation as the ZF4 target site | |
| C ZF4 HL4 | 28 | 17 | 80093945 | No | | |
| C ZF4 HL4 | 29 | X | 46785681 | No | | |
| C ZF4 HL4 | 30 | 11 | 118442453 | No | | |
| C ZF4 HL4 | 31 | 8 | 66762914 | No | | |
| C ZF4 HL4 | 32 | 11 | 34318884 | No | | |
| C ZF4 HL4 | 33 | 16 | 75375448 | No | | |
| C ZF4 HL4 | 34 | 6 | 1796299 | No | | |
| C ZF4 HL4 | 35 | 5 | 58862230 | No | | |
| C ZF4 HL4 | 36 | 3 | 194547481 | No | | |
| C ZF4 HL4 | 37 | 3 | 125230534 | No | | |
| C ZF4 HL4 | 38 | 7 | 88880545 | No | | |
| C ZF4 HL4 | 39 | 9 | 96460897 | No | | |
| C ZF4 HL4 | 40 | 2 | 186033502 | No | | |
| C ZF4 HL4 | 41 | 15 | 44891077 | Yes | Alu inserted 29 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 42 | 2 | 218887150 | No | | |
| C ZF4 HL4 | 43 | 2 | 1071411 | Yes | Alu inserted 76 bp downstream in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 44 | 3 | 171448707 | Yes | Alu inserted 373 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 HL4 | 45 | 5 | 148237862 | No | | |
| C ZF4 HL4 | 46 | 14 | 22470678 | No | | |
| C ZF4 HL4 | 47 | 1 | 66275253 | No | | |
| C ZF4 HL4 | 48 | 14 | 42474768 | No | | |
| C ZF4 HL4 | 49 | 1 | 102708222 | No | | |
| C ZF4 HL4 | 50 | 1 | 85569426 | No | | |
| C ZF4 HL4 | 51 | 1 | 196157189 | Yes | Alu inserted 766 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 52 | 14 | 42474768 | No | | |
| C ZF4 HL4 | 53 | 15 | 34765010 | No | | |
| C ZF4 HL4 | 54 | 1 | 199994257 | No | | |
| C ZF4 HL4 | 55 | 7 | 105643989 | No | | |
| C ZF4 HL4 | 56 | 1 | 218481590 | No | | |
| C ZF4 HL4 | 57 | 19 | 41440408 | No | | |
| C ZF4 HL4 | 58 | 11 | 97889294 | No | | |
| C ZF4 HL4 | 59 | 6 | 146988846 | Yes | Alu inserted 152 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 60 | 9 | 132529601 | Yes | Alu inserted 410 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 61 | 4 | 152263397 | No | | |
| C ZF4 HL4 | 62 | 12 | 78085705 | No | | |
| C ZF4 HL4 | 63 | 20 | 6816769 | Yes | Alu inserted 318 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 64 | 12 | 52795148 | No | | |
| C ZF4 HL4 | 65 | 2 | 182759437 | No | | |
| C ZF4 HL4 | 66 | 7 | 7566207 | No | | |
| C ZF4 HL4 | 67 | 11 | 56510829 | No | | |
| C ZF4 HL4 | 68 | 15 | 76634754 | No | | |
| C ZF4 HL4 | 69 | 1 | 88698880 | No | | |
| C ZF4 HL4 | 70 | 8 | 127276362 | No | | |
| C ZF4 HL4 | 71 | X | 129342303 | Yes | Alu landed 420 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 72 | 2 | 161655611 | Yes | Alu landed 3608 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 73 | 5 | 144972874 | Yes | Alu landed 293 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 74 | 8 | 127276373 | No | | |
| C ZF4 HL4 | 75 | 3 | 35013924 | Yes | Alu inserted 736 bp downstream and in the opposite orientation of the ZF4 target sequence | |
| C ZF4 HL4 | 76 | 1 | 35531824 | No | | |
| C ZF4 HL4 | 77 | 3 | 151799661 | No | | |
| C ZF4 HL4 | 78 | 1 | 221052887 | Yes | Alu inserted 316 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 79 | 2 | 131020368 | Yes | Alu inserted 660 bp downstream and in the same orientation as the ZF4 target sequence (more like 350 bp in HeLa cells, as they were missing the sequence) | |
| C ZF4 HL4 | 80 | 9 | 134310757 | No | | |
| C ZF4 HL4 | 81 | 15 | 76634758 | No | | |
| C ZF4 HL4 | 82 | 5 | 94290346 | Yes | Alu inserted 276 bp downstream and in the same orientation as the ZF4 target sequence | |
| C ZF4 HL4 | 83 | 5 | 31096755 | No | | |
| C ZF4 HL4 | 84 | 3 | 124510169 | No | | |
| C ZF4 HL4 | 85 | 2 | 149814955 | No | | |
| C ZF4 HL4 | 86 | 6 | 137830771 | No | | |
| C ZF4 HL4 | 87 | 9 | 91074730 | No | | |
| C ZF4 HL4 | 88 | 5 | 57943274 | No | | |
| C ZF4 HL4 | 89 | 6 | 74918388 | No | | |
| C ZF4 HL4 | 90 | 12 | 50038561 | No | | |
| C ZF4 HL4 | 91 | 2 | 59137213 | No | | |
| C ZF4 HL4 | 92 | 1 | 199994258 | No | | |


Appendix Table 12. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 GHL targeting zinc finger

| Driver | Clone | Chr | Position | ZF2 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| N-ZF2-ORF2 | 1 | 7 | 131042832 | No | | |
| N-ZF2-ORF2 | 2 | 12 | 59969863 | No | | |
| N-ZF2-ORF2 | 3 | 20 | 5412591 | No | | |
| N-ZF2-ORF2 | 4 | 9 | 129696062 | No | | |
| N-ZF2-ORF2 | 5 | 6 | 82185443 | No | | |
| N-ZF2-ORF2 | 6 | 3 | 27809669 | Yes | Alu inserted 1997 bp downstream of ZF2 target site | |
| N-ZF2-ORF2 | 7 | X | 109714911 | No | | |
| N-ZF2-ORF2 | 8 | 14 | 62819431 | No | | |
| N-ZF2-ORF2 | 9 | 6 | 113047416 | No | | |
| N-ZF2-ORF2 | 10 | 1 | 174720071 | No | | |
| N-ZF2-ORF2 | 11 | 11 | 28127703 | No | | |
| N-ZF2-ORF2 | 12 | 1 | 112958778 | No | | |
| N-ZF2-ORF2 | 13 | 18 | 15114894 | No | | |
| N-ZF2-ORF2 | 14 | 14 | 55069487 | No | | |
| N-ZF2-ORF2 | 15 | 13 | 36799671 | | | |
| N-ZF2-ORF2 | 16 | | | | | Can't map with 100% certainty; matched several genomic locations |
| N-ZF2-ORF2 | 17 | 6 | 25436148 | No | | |
| N-ZF2-ORF2 | 18 | | | | | Mapped to two different locations perfectly. Can't map with 100% certainty |
| N-ZF2-ORF2 | 19 | 22 | 40677529 | No | | |
| N-ZF2-ORF2 | 20 | 1 | 109568160 | No | | |
| N-ZF2-ORF2 | 21 | 8 | 91746118 | No | | |
| N-ZF2-ORF2 | 22 | 11 | 73206419 | No | | |
| N-ZF2-ORF2 | 23 | 4 | 81962905 | No | | |
| N-ZF2-ORF2 | 24 | 9 | 122885670 | No | | |
| N-ZF2-ORF2 | 25 | 6 | 36501735 | No | | |
| N-ZF2-ORF2 | 26 | 12 | 48936863 | No | | |


Appendix Table 13. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 FL4 targeting zinc finger

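| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) |
|---|---|---|---|---|---|
| N ZF2 FL4 | 1 | 22 | 30766300 | No | |
| N ZF2 FL4 | 2 | 10 | 67760478 | No | |
| N ZF2 FL4 | 3 | 11 | 13439921 | No | |
| N ZF2 FL4 | 4 | 10 | 23090212 | No | |
| N ZF2 FL4 | 5 | 20 | 34535439 | No | |
| N ZF2 FL4 | 6 | 14 | 61492770 | No | |
| N ZF2 FL4 | 7 | 22 | 21751522 | No | |
| N ZF2 FL4 | 8 | 7 | 31246660 | No | |
| N ZF2 FL4 | 9 | 6 | 25032546 | No | |
| N ZF2 FL4 | 10 | 1 | 201587454 | No | |
| N ZF2 FL4 | 11 | 2 | 224525986 | No | |
| N ZF2 FL4 | 12 | 22 | 30976419 | No | |
| N ZF2 FL4 | 13 | X | 77915094 | No | |
| N ZF2 FL4 | 14 | 11 | 99209331 | No | |
| N ZF2 FL4 | 15 | 4 | 156518830 | No | |
| N ZF2 FL4 | 16 | 5 | 74758128 | No | |
| N ZF2 FL4 | 17 | 2 | 199514748 | No | |
| N ZF2 FL4 | 18 | 8 | 91266619 | No | |
| N ZF2 FL4 | 19 | 6 | 2562281 | No | |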


Appendix Table 14. Chromosome locations of Alu inserts driven by N-ZF2-ORF2 HL4 targeting zinc finger

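| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| N ZF2 HL4 | 1 | 7 | 152157954 | No | | |
| N ZF2 HL4 | 2 | 11 | 66136859 | No | | |
| N ZF2 HL4 | 3 | 4 | 114466414 | No | | |
| N ZF2 HL4 | 4 | 6 | 2933496 | No | | |
| N ZF2 HL4 | 5 | 20 | 10779093 | No | | |
| N ZF2 HL4 | 6 | 3 | 146486266 | No | | |
| N ZF2 HL4 | 7 | 6 | 111763914 | No | | |
| N ZF2 HL4 | 8 | 2 | 224797650 | No | | |
| N ZF2 HL4 | 9 | 11 | 65499320 | No | | |
| N ZF2 HL4 | 10 | 5 | 138649049 | No | | |
| N ZF2 HL4 | 11 | 9 | 120796708 | No | | |
| N ZF2 HL4 | 12 | 9 | 71937146 | No | | |
| N ZF2 HL4 | 13 | 12 | 122609469 | No | | |
| N ZF2 HL4 | 14 | 2 | 238277234 | No | | |
| N ZF2 HL4 | 15 | 2 | 142970622 | No | | |
| N ZF2 HL4 | 16 | 22 | 31357819 | No | | |
| N ZF2 HL4 | 17 | 9 | 137422489 | No | | |
| N ZF2 HL4 | 18 | 20 | 25156896 | No | | |
| N ZF2 HL4 | 19 | 2 | 182077477 | No | | |
| N ZF2 HL4 | 20 | 12 | 13280665 | No | | |
| N ZF2 HL4 | 21 | 8 | 127504904 | No | | |
| N ZF2 HL4 | 22 | 17 | 27244978 | No | | |
| N ZF2 HL4 | 23 | 2 | 201070170 | No | | |
| N ZF2 HL4 | 24 | 12 | 104495002 | No | | |
| N ZF2 HL4 | 25 | 14 | 49862915 | No | | |
| N ZF2 HL4 | 26 | 18 | 9600436 | No | | |
| N ZF2 HL4 | 27 | 22 | 40778528 | No | | |
| N ZF2 HL4 | 28 | 6 | 17943657 | No | | |
| N ZF2 HL4 | 29 | 1 | 36061942 | No | | |
| N ZF2 HL4 | 30 | 8 | 66763059 | No | | |
| N ZF2 HL4 | 31 | 13 | 98792206 | No | | |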


Appendix Table 15. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 GHL targeting zinc finger

| Driver | Clone | Chr | Position | ZF2 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C-ZF2 ORF2 | 1 | 12 | 7319253 | No | | |
| C-ZF2 ORF2 | 2 | 20 | 50530732 | No | | |
| C-ZF2 ORF2 | 3 | 20 | 41050908 | No | | |
| C-ZF2 ORF2 | 4 | 10 | 33244991 | No | | |
| C-ZF2 ORF2 | 5 | 12 | 189949540 | No | | |
| C-ZF2 ORF2 | 6 | | | | | Chimeric SINE from two different chromosomes |
| C-ZF2 ORF2 | 7 | 4 | 82969740 | No | | |
| C-ZF2 ORF2 | 8 | 12 | 12600704 | No | | |
| C-ZF2 ORF2 | 9 | 2 | 174557492 | No | | |
| C-ZF2 ORF2 | 10 | 20 | 47351851 | No | | |
| C-ZF2 ORF2 | 11 | 8 | 38779300 | No | | |
| C-ZF2 ORF2 | 12 | 17 | 8577002 | No | | |
| C-ZF2 ORF2 | 13 | 1 | 207229683 | No | | |
| C-ZF2 ORF2 | 14 | 16 | 15203375 | No | | |
| C-ZF2 ORF2 | 15 | 7 | 128602161 | No | | |
| C-ZF2 ORF2 | 16 | 8 | 7496736 | No | | |
| C-ZF2 ORF2 | 17 | 11 | 8890743 | No | | |
| C-ZF2 ORF2 | 18 | 7 | 93997572 | No | | |
| C-ZF2 ORF2 | 19 | 3 | 42726548 | No | | |
| C-ZF2 ORF2 | 20 | 7 | 17353822 | Yes | Alu inserted approximately away from target sequence | |
| C-ZF2 ORF2 | 21 | 12 | 59141569 | No | | |
| C-ZF2 ORF2 | 22 | 12 | 38308871 | No | | |
| C-ZF2 ORF2 | 23 | 7 | 134536520 | No | | |
| C-ZF2 ORF2 | 24 | 8 | 39583065 | No | | |
| C-ZF2 ORF2 | 25 | 5 | 50741453 | No | | |
| C-ZF2 ORF2 | 26 | 10 | 12575295 | No | | |


Appendix Table 16. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 FL4 targeting zinc finger

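| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C ZF2 FL4 | 1 | 13 | 21151335 | No | | |
| C ZF2 FL4 | 2 | 9 | 125028388 | | | |
| C ZF2 FL4 | 3 | 10 | 62245024 | No | | |
| C ZF2 FL4 | 4 | 14 | 49862885 | | | |
| C ZF2 FL4 | 5 | 13 | 43298388 | | | |
| C ZF2 FL4 | 6 | 20 | 38375621 | | | |
| C ZF2 FL4 | 7 | 1 | 65805879 | | | |
| C ZF2 FL4 | 8 | 7 | 94108898 | | | |
| C ZF2 FL4 | 9 | X | 107869224 | | | |
| C ZF2 FL4 | 10 | 10 | 30482652 | | | |
| C ZF2 FL4 | 11 | 1 | 226204810 | | | |
| C ZF2 FL4 | 12 | 7 | 6045795 | | | |
| C ZF2 FL4 | 13 | 3 | 196235073 | | | |
| C ZF2 FL4 | 14 | 3 | 127188004 | | | |
| C ZF2 FL4 | 15 | 12 | 92152321 | | | |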


Appendix Table 17. Chromosome locations of Alu inserts driven by C-ZF2-ORF2 HL4 targeting zinc finger

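| Driver | Clone | Chr | Position | ZF4 site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|
| C ZF2 HL4 | 1 | 1 | 180022071 | No | | |
| C ZF2 HL4 | 2 | 2 | 42697024 | No | | |
| C ZF2 HL4 | 3 | 2 | 41872320 | No | | |
| C ZF2 HL4 | 4 | 2 | 14985268 | No | | |
| C ZF2 HL4 | 5 | 2 | 26005813 | No | | |
| C ZF2 HL4 | 6 | 6 | 132153320 | No | | |
| C ZF2 HL4 | 7 | 2 | 33562232 | No | | |
| C ZF2 HL4 | 8 | 16 | 56931659 | No | | |
| C ZF2 HL4 | 9 | 12 | 54620797 | No | | |
| C ZF2 HL4 | 10 | 3 | 149466076 | No | | |
| C ZF2 HL4 | 11 | 9 | 21503865 | No | | |
| C ZF2 HL4 | 12 | 16 | 56931652 | No | | |
| C ZF2 HL4 | 13 | 10 | 28091896 | No | | |
| C ZF2 HL4 | 14 | 10 | 124397176 | No | | |
| C ZF2 HL4 | 15 | 5 | 172732096 | No | | |
| C ZF2 HL4 | 16 | 18 | 46217311 | No | | |
| C ZF2 HL4 | 17 | 14 | 49183855 | No | | |
| C ZF2 HL4 | 18 | 4 | 136021611 | No | | |
| C ZF2 HL4 | 19 | 5 | 66040426 | No | | |
| C ZF2 HL4 | 20 | 7 | 22501637 | No | | |
| C ZF2 HL4 | 21 | 5 | 21617433 | No | | |
| C ZF2 HL4 | 22 | 18 | 44617305 | No | | |
| C ZF2 HL4 | 23 | 9 | 115796994 | No | | |
| C ZF2 HL4 | 24 | 20 | 51207610 | No | | |
| C ZF2 HL4 | 25 | 17 | 57031821 | No | | |
| C ZF2 HL4 | 26 | 1 | 209815503 | No | | |
| C ZF2 HL4 | 27 | 4 | 99576414 | No | | |
| C ZF2 HL4 | 28 | 2 | 145905609 | No | | |
| C ZF2 HL4 | 29 | 17 | 78740448 | No | | |
| C ZF2 HL4 | 30 | 1 | 209815503 | No | | |
| C ZF2 HL4 | 31 | 4 | 76374372 | No | | |
| C ZF2 HL4 | 32 | 4 | 134331854 | Yes | Alu located 4953 bp upstream and in the same orientation as the ZF4 target sequence | |
| C ZF2 HL4 | 33 | 8 | 19285605 | No | | |
| C ZF2 HL4 | 34 | 2 | 200706028 | No | | |
| C ZF2 HL4 | 35 | 4 | 120317576 | No | | |
| C ZF2 HL4 | 36 | 10 | 103227613 | No | | |
| C ZF2 HL4 | 37 | 15 | 64987491 | No | | |
| C ZF2 HL4 | 38 | 8 | 141280110 | No | | |
| C ZF2 HL4 | 39 | 1 | 7237668 | No | | |
| C ZF2 HL4 | 40 | 3 | 170316793 | No | | |
| C ZF2 HL4 | 41 | 14 | 68287264 | No | | |
| C ZF2 HL4 | 42 | 16 | 70558218 | No | | |
| C ZF2 HL4 | 43 | 8 | 8607409 | No | | |
| C ZF2 HL4 | 44 | 2 | 45241966 | No | | |
| C ZF2 HL4 | 45 | 7 | 34456803 | No | | |
| C ZF2 HL4 | 46 | 10 | 114563397 | No | | |
| C ZF2 HL4 | 47 | 11 | 7976759 | No | | |
| C ZF2 HL4 | 48 | 1 | 213078798 | No | | |
| C ZF2 HL4 | 49 | 18 | 63128165 | No | | |
| C ZF2 HL4 | 50 | 8 | 19307133 | No | | |
| C ZF2 HL4 | 51 | 14 | 55344062 | No | | |
| C ZF2 HL4 | 52 | 1 | 51108227 | No | | |
| C ZF2 HL4 | 53 | 15 | 34143355 | No | | |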


Appendix Table 18. Chromosome locations of Alu inserts driven by ORF2 endonuclease-deficient constructs fused to Cas9 or the Cas9 nickase

| Driver | gRNA | Clone | Chr. | Position | gRNA 2 site | Location of Alu relative to target sequence (if applicable) |
|---|---|---|---|---|---|---|
| Cas9-Endo-- | 2 | 1 | 3 | 189574526 | No | |
| Cas9-Endo-- | 2 | 2 | 2 | 197607012 | No | |
| Cas9-Endo-- | 2 | 3 | 20 | 52048002 | No | |
| Cas9-RTCYS | 2 | 1 | 11 | 1528733 | No | |
| Cas9-RTCYS | 2 | 2 | 5 | 146243316 | No | |
| Nickase Endo-- | 2 | 1 | 1 | 155657534 | No | |
| Nickase RTCYS | 2 | 1 | 2 | 64319926 | No | |


Appendix Table 19. Chromosome locations of Alu inserts driven by dCas9-ORF2 constructs supplemented with a targeting gRNA

| Driver | gRNA | Clone | Chr | Position | Target site? | Location of Alu relative to target sequence (if applicable) |
|---|---|---|---|---|---|---|
| dCas9-ORF2 | 551 | 1 | 4 | 3115237 | No | |
| dCas9-ORF2 | 551 | 2 | 6 | 102405371 | No | |
| dCas9-ORF2 | 551 | 3 | 6 | 56205348 | Yes | Alu landed 914 bp downstream and in the same orientation as the 551 target sequence |
| dCas9-ORF2 | 551 | 4 | 10 | 29982061 | No | |
| dCas9-ORF2 | 551 | 5 | 5 | 95668696 | No | |
| dCas9-ORF2 | 551 | 6 | 22 | 45894468 | No | |
| dCas9-ORF2 | 551 | 7 | 4 | 12241030 | No | |
| dCas9-ORF2 | 551 | 8 | 16 | 58536376 | No | |
| dCas9-ORF2 | 551 | 9 | 15 | 42766986 | Yes | Alu landed 2317 bp downstream of the 551 target sequence |
| dCas9-ORF2 | 551 | 10 | 16 | 31248288 | No | |
| dCas9-ORF2 | 551 | 11 | X | 103246009 | No | |
| dCas9-ORF2 | 551 | 12 | N/A | N/A | N/A | |
| dCas9-ORF2 | 551 | 13 | 12 | 53415019 | No | |
| dCas9-ORF2 | 551 | 14 | 6 | 10486439 | No | |
| dCas9-ORF2 | 551 | 15 | 7 | 80699914 | No | |
| dCas9-ORF2 | 551 | 16 | 2 | 159621910 | No | |
| dCas9-ORF2 | 551 | 17 | 9 | 126949424 | No | |
| dCas9-ORF2 | 551 | 18 | 11 | 121919136 | No | |
| dCas9-ORF2 | 551 | 19 | 19 | 47169758 | No | |
| dCas9-ORF2 | 551 | 20 | 12 | 50540637 | No | |
| dCas9-ORF2 | 551 | 21 | 10 | 24900656 | No | |
| dCas9-ORF2 | 551 | 22 | 14 | 69069801 | No | |
| dCas9-ORF2 | 551 | 23 | 5 | 102003817 | No | |
| dCas9-ORF2 | 551 | 24 | 10 | 11923033 | No | |
| dCas9-ORF2 | 551 | 25 | 11 | 76182702 | No | |
| dCas9-ORF2 | 551 | 26 | 11 | 95656742 | No | |
| dCas9-ORF2 | 551 | 27 | 12 | 9995642 | No | |
| dCas9-ORF2 | 551 | 28 | 13 | 82699209 | No | |
| dCas9-ORF2 | 551 | 29 | X | 36123177 | No | |
| dCas9-ORF2 | 551 | 30 | 3 | 35855845 | No | |
| dCas9-ORF2 | 551 | 31 | 7 | 129808403 | No | |
| dCas9-ORF2 | 5X 551 | 32 | 9 | 15528193 | No | |
| dCas9-ORF2 | 5X 551 | 33 | 11 | 72953783 | No | |
| dCas9-ORF2 | 5X 551 | 34 | 5 | 82305929 | No | |
| dCas9-ORF2 | 5X 551 | 35 | 5 | 96192442 | No | |
| dCas9-ORF2 | 5X 551 | 36 | N/A | N/A | N/A | |
| dCas9-ORF2 | 5X 551 | 37 | 4 | 52542761 | No | |
| dCas9-ORF2 | 5X 551 | 38 | 20 | 8923530 | No | |
| dCas9-ORF2 | 5X 551 | 39 | 4 | 74520805 | No | |
| dCas9-ORF2 | 765 | 1 | 3 | 157039846 | No | |
| dCas9-ORF2 | 765 | 2 | 9 | 19556476 | No | |
| dCas9-ORF2 | 765 | 3 | N/A | N/A | N/A | |
| dCas9-ORF2 | 765 | 4 | 5 | 14123304 | No | |
| dCas9-ORF2 | 765 | 5 | 2 | 195488586 | No | |
| dCas9-ORF2 | 765 | 6 | 2 | 200979233 | No | |
| dCas9-ORF2 | 765 | 7 | 2 | 4701628 | No | |
| dCas9-ORF2 | 765 | 8 | 8 | 104803558 | No | |
| dCas9-ORF2 | 765 | 9 | 20 | 34901286 | No | |
| dCas9-ORF2 | 765 | 10 | 15 | 78566006 | No | |
| dCas9-ORF2 | 765 | 11 | 7 | 95268147 | No | |
| dCas9-ORF2 | 765 | 12 | N/A | N/A | N/A | |
| dCas9-ORF2 | 765 | 13 | 1 | 40614422 | No | |
| dCas9-ORF2 | 765 | 14 | 7 | 5221776 | No | |
| dCas9-ORF2 | 765 | 15 | 12 | 50624310 | No | |
| dCas9-ORF2 | 765 | 16 | X | 37609459 | No | |
| dCas9-ORF2 | 765 | 17 | 14 | 67063448 | No | |
| dCas9-ORF2 | 765 | 18 | 10 | 124819025 | No | |
| dCas9-ORF2 | 765 | 19 | 5 | 13809038 | No | |
| dCas9-ORF2 | 765 | 20 | 6 | 145799727 | No | |
| dCas9-ORF2 | 765 | 21 | 11 | 34740060 | No | |
| dCas9-ORF2 | 765 | 22 | 20 | 32162878 | No | |
| dCas9-ORF2 | 765 | 23 | 2 | 26254562 | No | |
| dCas9-ORF2 | 765 | 24 | 3 | 193983887 | No | |
| dCas9-ORF2 | 765 | 25 | 4 | 86844739 | No | |
| dCas9-ORF2 | 765 | 26 | 11 | 63493674 | No | |
| dCas9-ORF2 | 5X 765 | 27 | 6 | 25436248 | No | |
| dCas9-ORF2 | 5X 765 | 28 | N/A | N/A | N/A | |
| dCas9-ORF2 | 5X 765 | 29 | 1 | 156652084 | No | |
| dCas9-ORF2 | 5X 765 | 30 | 11 | 102231749 | No | |
| dCas9-ORF2 | 5X 765 | 31 | 5 | 39032067 | Yes | |
| dCas9-ORF2 | 892 | 1 | 10 | 52365577 | Yes | Alu landed 24 bp upstream and in the same orientation as the target sequence |
| dCas9-ORF2 | 892 | 2 | 1 | 41003547 | No | |
| dCas9-ORF2 | 892 | 3 | 2 | 224874617 | No | |
| dCas9-ORF2 | 892 | 4 | 6 | 63394667 | No | |
| dCas9-ORF2 | 892 | 5 | 14 | 95412799 | No | |
| dCas9-ORF2 | 892 | 6 | 1 | 201860765 | No | |
| dCas9-ORF2 | 892 | 7 | 5 | 139291433 | No | |
| dCas9-ORF2 | 892 | 8 | 2 | 58562043 | No | |
| dCas9-ORF2 | 892 | 9 | 17 | 80266906 | No | |
| dCas9-ORF2 | 892 | 10 | 9 | 124663845 | No | |
| dCas9-ORF2 | 892 | 11 | 16 | 4687568 | No | |
| dCas9-ORF2 | 892 | 12 | 20 | 45158944 | No | |
| dCas9-ORF2 | 892 | 13 | 3 | 168729441 | No | |
| dCas9-ORF2 | 892 | 14 | 3 | 142716537 | No | |
| dCas9-ORF2 | 892 | 15 | 5 | 16335292 | Yes | Alu landed 1198 bp upstream and in the opposite orientation as the 892 target sequence |
| dCas9-ORF2 | 892 | 16 | 20 | 44178659 | No | |
| dCas9-ORF2 | 892 | 17 | 21 | 28104921 | | |
| dCas9-ORF2 | 892 | 18 | 8 | 109000356 | | |
| dCas9-ORF2 | 892 | 19 | 6 | 100679963 | | |
| dCas9-ORF2 | 892 | 20 | 11 | 59528067 | | |
| dCas9-ORF2 | 892 | 21 | 5 | 20999717 | | |
| dCas9-ORF2 | 892 | 22 | 6 | 117741145 | | |
| dCas9-ORF2 | 892 | 23 | 1 | 51422840 | | |
| dCas9-ORF2 | 892 | 24 | 3 | 170733637 | | |
| dCas9-ORF2 | 892 | 25 | 20 | 35321061 | | |
| dCas9-ORF2 | 892 | 26 | 19 | 36154567 | | |
| dCas9-ORF2 | 892 | 27 | 1 | 86532691 | | |
| dCas9-ORF2 | 892 | 28 | 17 | 13020444 | | |
| dCas9-ORF2 | 892 | 29 | 12 | 60162957 | | |
| dCas9-ORF2 | 892 | 30 | 8 | 37669299 | | |
| dCas9-ORF2 | 5X 892 | 31 | 9 | 111943975 | | |
| dCas9-ORF2 | 5X 892 | 32 | 10 | 12701113 | | |
| dCas9-ORF2 | 5X 892 | 33 | 16 | 21655860 | | |
| dCas9-ORF2 | 5X 892 | 34 | 1 | 170066310 | | |
| dCas9-ORF2 | 5X 892 | 35 | 7 | 17353916 | | |
| dCas9-ORF2 | 5X 892 | 36 | 4 | 41065494 | | |
| dCas9-ORF2 | 5X 892 | 37 | 3 | 138957104 | | |
| dCas9-ORF2 | 5X 892 | 38 | 2 | 20089322 | | |
| dCas9-ORF2 | 3' L1 | 1 | 11 | 28127811 | No | |
| dCas9-ORF2 | 3' L1 | 2 | 11 | 66772364 | No | |
| dCas9-ORF2 | 3' L1 | 3 | 15 | 34122649 | No | |
| dCas9-ORF2 | 3' L1 | 4 | 2 | 237439074 | No | |
| dCas9-ORF2 | 3' L1 | 5 | 16 | 21655830 | No | |
| dCas9-ORF2 | 3' L1 | 6 | 8 | 38779792 | Yes | Alu landed 2962 bp downstream and in the same orientation as the 3' L1 target sequence |
| dCas9-ORF2 | 3' L1 | 7 | 13 | 36799671 | No | |
| dCas9-ORF2 | 3' L1 | 8 | 20 | 50362133 | No | |


Appendix Table 20. Chromosome locations of Alu inserts driven by MS2-ORF2 constructs supplemented with a targeting gRNA and dCas9

| Driver | gRNA | Clone | Chr. | Position | Target site? | Location of Alu relative to target sequence (if applicable) | Notes |
|---|---|---|---|---|---|---|---|
| MS2-ORF2 | MS2 551 | 1 | 2 | 190904311 | No | | |
| MS2-ORF2 | MS2 551 | 2 | 17 | 8534722 | No | | |
| MS2-ORF2 | MS2 551 | 3 | 15 | 34221612 | No | | |
| MS2-ORF2 | MS2 551 | 4 | 6 | 143234396 | No | | |
| MS2-ORF2 | MS2 551 | 5 | 12 | 68457435 | No | | |
| MS2-ORF2 | MS2 551 | 6 | 3 | 14498960 | No | | |
| MS2-ORF2 | MS2 551 | 7 | 19 | 46973079 | No | | |
| MS2-ORF2 | MS2 551 | 8 | 11 | 4749443 | No | | |
| MS2-ORF2 | MS2 551 | 9 | 4 | 105257632 | No | | |
| MS2-ORF2 | MS2 551 | 10 | 10 | 73883119 | No | | |
| MS2-ORF2 | MS2 551 | 11 | 17 | 80388417 | No | | |
| MS2-ORF2 | MS2 551 | 12 | 7 | 46083068 | No | | |
| MS2-ORF2 | MS2 551 | 13 | 10 | 60702728 | No | | |
| MS2-ORF2 | MS2 765 | 1 | 2 | 175258257 | No | | |
| MS2-ORF2 | MS2 765 | 2 | 13 | 36061485 | No | | |
| MS2-ORF2 | MS2 765 | 3 | 1 | 7190338 | No | | |
| MS2-ORF2 | MS2 765 | 4 | 2 | 42119219 | No | | |
| MS2-ORF2 | MS2 765 | 5 | 18 | 31722079 | No | | |
| MS2-ORF2 | MS2 765 | 6 | 5 | 33456230 | No | | |
| MS2-ORF2 | MS2 765 | 7 | 7 | 114289476 | No | | |
| MS2-ORF2 | MS2 765 | 8 | 7 | 32843748 | No | | |
| MS2-ORF2 | MS2 765 | 9 | 3 | 167963192 | No | | |
| MS2-ORF2 | MS2 765 | 10 | 4 | 150904853 | No | | |
| MS2-ORF2 | MS2 765 | 11 | 2 | 227445035 | No | | |
| MS2-ORF2 | MS2 765 | 12 | 2 | 98561653 | No | | |
| MS2-ORF2 | MS2 765 | 13 | 14 | 104288026 | No | | |
| MS2-ORF2 | MS2 892 | 1 | N/A | N/A | N/A | | Landed in an Alu, no perfect match in the genome, cannot map it. |
| MS2-ORF2 | MS2 892 | 2 | 9 | 92659699 | No | | |
| MS2-ORF2 | MS2 892 | 3 | 2 | 162456127 | No | | |
| MS2-ORF2 | MS2 892 | 4 | N/A | N/A | Yes | Alu landed 2278 bp downstream of 892 target site | Many hits to L1PA5 in the genome. None 100%; all 89%; can't map exactly |
| MS2-ORF2 | MS2 892 | 5 | 7 | 73520969 | No | | |
| MS2-ORF2 | MS2 892 | 6 | 9 | 135110453 | No | | |
| MS2-ORF2 | MS2 892 | 7 | 5 | 159573306 | No | | |
| MS2-ORF2 | MS2 892 | 8 | 2 | 33144656 | No | | |
| MS2-ORF2 | MS2 892 | 9 | 9 | 13961341 | No | | |
| MS2-ORF2 | MS2 892 | 10 | 15 | 69456817 | No | | |
| MS2-ORF2 | MS2 892 | 11 | 4 | 121147830 | No | | |
| MS2-ORF2 | MS2 892 | 12 | 20 | 32290396 | No | | |
| MS2-ORF2 | MS2 892 | 13 | 19 | 41375742 | No | | |
| MS2-ORF2 | MS2 892 | 14 | 2 | 215355676 | No | | |
| MS2-ORF2 | MS2 892 | 15 | 9 | 118061563 | No | | |


BIOGRAPHY.

Catherine M. Ade was born in Fairfax, Virginia. She grew up playing flute, and participated in both the marching and concert bands in high school. In college, she double majored in Zoology and Music at Miami University of Ohio, receiving degrees in 2010 and 2011, respectively. She matriculated into the Department of Cell and Molecular Biology at Tulane University in the fall of 2011, where she joined the lab of Dr. Astrid Engel.