RECURRENT EVOLUTION OF VERTEBRATE TRANSCRIPTION FACTORS VIA TRANSPOSASE CAPTURE

A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Rachel Leigh Cosby December 2019

© 2019 Rachel Leigh Cosby

RECURRENT EVOLUTION OF VERTEBRATE TRANSCRIPTION FACTORS VIA TRANSPOSASE CAPTURE

Rachel Leigh Cosby, Ph. D. Cornell University 2019

Despite their vital role as regulators of gene expression, the evolutionary origin of transcription factors (TFs) and the mechanisms by which new TFs evolve remain poorly understood. Some TFs evolved via transposase capture, the process by which transposases, the proteins that facilitate DNA transposon mobility, fuse to domains from host proteins, but the extent of this phenomenon and the events required remain unexplored. Here we use comparative genomics to characterize the frequency, composition, and function of host-transposase fusions (HTF) in the tetrapod lineage. We find that HTF is a recurrent evolutionary process that occurred at least 88 times during tetrapod evolution, primarily through alternative splicing. By analyzing the domain structure of HTFs, we also determined that host-transposase fusion in the tetrapod lineage occurs most commonly between the transcriptionally repressive KRAB domain and transposase DNA-binding domains. We demonstrate that four KRAB-transposase fusions repress in a sequence-specific manner in reporter assays, consistent with a potential role as TFs. To further test this, we chose a bat-specific KRAB-transposase fusion, KRABINER, as our model. We performed precision run-on sequencing (PRO-seq) and Cleavage Under Targets and Release Using Nuclease (CUT&RUN) in KRABINER KO bat cells, engineered using CRISPR-Cas9, and in KRABINER KO cells rescued with either wild-type or mutant KRABINER transgenes. We found that KRABINER regulates transcription of both genes and transcriptional regulatory elements, and that a subset of these changes is associated with KRABINER binding, confirming that KRABINER acts as a TF in bat cells. Transposase capture is thus a heretofore underappreciated mechanism to generate novel vertebrate TFs, and provides a plausible mechanism for the origin of several extant TFs.

BIOGRAPHICAL SKETCH

Rachel Cosby was born in Oklahoma, where she received her B.S. degrees in Microbiology, Cell and Molecular Biology and Biochemistry from Oklahoma State University. During that time, she completed a senior honors thesis, advised by Jeffrey Hadwiger, PhD, on the role of protein- A/G- protein coupled-receptor signaling in differentiation and motility of the slime mold Dictyostelium discoideum. Following completion of her degree, Rachel began her PhD in the Molecular Biology program at the University of Utah.

There she joined the labs of Drs. Ellen Pritham and Cedric Feschotte, studying the role of transposons in bat evolution. After her fourth year at the University of Utah, she transferred to Cornell University to complete her PhD in Dr. Feschotte’s lab, studying the role of transposase cooption in generating novel cellular genes. Following graduation, she will pursue a postdoctoral position at the National Institutes of Health studying the role of KRAB-zinc finger proteins and transposons in mammalian evolution.


To my husband, Kevin Cosby.


ACKNOWLEDGMENTS

There are many people and institutions to acknowledge, both for their contributions to this work and to my success as a graduate student. First, I owe an incredible debt of gratitude to my mentors, Ellen Pritham and Cedric Feschotte, whose guidance throughout my PhD has been invaluable. They have supported me mentally and professionally, and I couldn’t have completed this project without their help. I also thank the Feschotte/Pritham labs, including all current and past members, for their tremendous support and for making the lab a fun place to work. I especially acknowledge Claudia Marquez, Jainy Thomas, Aurelie Kapusta, Edward Chuong, Xiaoyu Zhuo, and Julius Judd, all of whom helped, whether with technical knowledge or moral support, throughout my graduate career. I also thank my incredibly talented undergraduates, Ruiling Zhang, Alan Zhong, and Nathaniel Gerry, for their assistance with this project.

I also thank the University of Utah, including the Molecular Biology PhD program and the Human Genetics department, and Cornell University, including the Genetics, Genomics, and Development and Molecular Biology and Genetics communities for providing excellent training and a collaborative environment.

My dissertation committees, at both the University of Utah and Cornell, also made vital contributions to this work. I also thank the National Institutes of Health for continued funding for my work, and numerous collaborators, including David Ray, Todd Macfarlan, Woodring Wright, Helen Rowe, and Joanna Wysocka, for providing reagents and experimental assistance.


I also owe my success and sanity to my close friends and family. My best friends, Julie Feusier, Rebecca Bruders, and John and Rosika Frank have been a constant source of support throughout my PhD, and together we have faced the highs and lows of graduate school. My parents, Julie and Aaron Rice, have also supported me throughout my life by never doubting me and by encouraging me to pursue my interests in science. Finally, I am most thankful for my husband, Kevin Cosby, for his eternal support, and the tremendous sacrifices he has made and continues to make in order for me to pursue my dreams.


TABLE OF CONTENTS

BIOGRAPHICAL SKETCH
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 – Host-transposon interactions: conflict, cooperation, and cooption
  1.1 ABSTRACT
  1.2 INTRODUCTION
  1.3 ARMS RACES
    1.3.1 piRNA-mediated TE silencing
    1.3.2 KRAB Zinc Finger Proteins as an adaptive TE silencing system
    1.3.3 Counter-defense mechanisms
  1.4 ESCAPE AND SELF-CONTROL STRATEGIES
    1.4.1 Bypassing host surveillance
    1.4.2 Self-regulatory mechanisms
    1.4.3 Targeting preference
  1.5 HOST-TRANSPOSON MUTUALISM
    1.5.1 Candidate host-transposon mutualisms
    1.5.2 Conflicts in disguise?
  1.6 EN ROUTE TO COOPTION
  1.7 OUTLOOK
  REFERENCES
CHAPTER 2 – Recurrent evolution of vertebrate transcription factors via transposase capture
  2.1 ABSTRACT
  2.2 INTRODUCTION
  2.3 RESULTS
    2.3.1 Transposase capture is a pervasive mechanism to generate novel genes in tetrapods
    2.3.2 Transposase capture occurs through alternative splicing
    2.3.3 Fusion of transposase DBDs to host KRAB domains is the most frequent HTF combination
    2.3.4 KRAB-transposase fusions act as sequence-specific repressors of gene expression
    2.3.5 KRABINER regulates transcription in bat cells
    2.3.6 KRABINER binds to genomic mariner TIRs
    2.3.7 KRABINER binding is associated with downregulation of TREs
  2.4 DISCUSSION
  2.5 MATERIALS AND METHODS
    2.5.1 Cell lines and culture methods
    2.5.2 Identifying and characterizing transposase fusion
    2.5.3 Selection analysis
    2.5.4 Transposase consensus sequence generation
    2.5.5 Determining HTF gene birth mechanism
    2.5.6 Determining the evolutionary history of KRABINER
    2.5.7 KRABINER mutant sequence design
    2.5.8 Vector construction
    2.5.9 Luciferase assays
    2.5.10 Generating and validating KRABINER KO cells
    2.5.11 Generating and validating KRABINER rescue cells
    2.5.12 KRABINER rescue transgene immunofluorescence assays
    2.5.13 Sample preparation for PRO-seq and CUT&RUN
    2.5.14 PRO-seq library preparation
    2.5.15 PRO-seq alignment and processing
    2.5.16 Differential transcription analysis of genes and TREs
    2.5.17 CUT&RUN library preparation
    2.5.18 CUT&RUN data processing and analysis
  REFERENCES
CHAPTER 3 – Discussion and future directions
  3.1 DISCUSSION
  3.2 FUTURE DIRECTIONS
    3.2.1 How common is transposase capture in other lineages?
    3.2.2 What other functions do host-transposase fusion genes have?
    3.2.3 What is the biological function of KRABINER in bats?
  3.3 CONCLUSION
  REFERENCES
APPENDIX: KRABINER regulated genes


LIST OF FIGURES

Figure 1.1: piRNA and KRAB-ZFPs: two host systems that recognize and silence TEs
Figure 1.2: Evidence of host-TE arms races
Figure 1.3: Evidence of TE counter-defense
Figure 1.4: Cooperation paves the way for cooption
Figure 1.5: Model for host-TE interactions
Figure 2.1: Gene birth by transposase capture is pervasive in tetrapods
Figure 2.2: Transposase capture by alternative splicing
Figure 2.3: KRABINER evolved in the vespertilionid bat ancestor
Figure 2.4: Biochemical activities of host-transposase fusion proteins
Figure 2.5: HTF domain structure is varied
Figure 2.6: Sequences of TIRs and mutants used in KRAB-transposase luciferase assays
Figure 2.7: KTIGD3 and KRABINER regulate gene expression in a KAP1-independent manner
Figure 2.8: KRABINER regulates transcription of genes and TREs in bat cells
Figure 2.9: KRABINER KO cell line validation
Figure 2.10: KRABINER rescue cell line validation
Figure 2.11: PRO-seq QC metrics for WT, KO, and rescue cell lines
Figure 2.12: Some KRABINER-mediated transcriptional changes require only one of its functional domains
Figure 2.13: KRABINER binds to mariner TIRs in bat cells
Figure 2.14: CUT&RUN QC metrics
Figure 2.15: KRABINER binding is associated with transcriptional downregulation of TREs
Figure 2.16: KRABINER over-expression results in changes in TRE transcription in a domain-dependent manner
Figure 2.17: KRABINER binding is associated with transcriptional downregulation of some TREs


LIST OF TABLES

Table 2.1: NCBI RefSeq queried in HTF search
Table 2.2: Transposase domains used as query for HTF search
Table 2.3: Summary of identified HTF genes
Table 2.4: PCR and RT-PCR primer sequences
Table 2.5: Sequencing and alignment statistics
Table A2.1: KRABINER regulated genes

LIST OF ABBREVIATIONS

2C  2-cell stage
DBD  DNA-binding domain
DNA  Deoxyribonucleic acid
DNMT  DNA methyltransferase
DOX  Doxycycline
CDART  Conserved Domain Architecture Retrieval Tool
cDNA  Complementary DNA
CMV  Cytomegalovirus
CUT&RUN  Cleavage Under Targets and Release Using Nuclease
ERV  Endogenous retrovirus
FDR  False discovery rate
gRNA  Guide RNA
HDAC  Histone deacetylase
HERV  Human endogenous retrovirus
hESC  Human embryonic stem cell
HTF  Host-transposase fusion
HTH  Helix-turn-helix
IES  Internal eliminated sequence
IF  Immunofluorescence
KRAB  Krüppel-associated box
KMD  DBD mutant KRABINER
KMK  KRAB mutant KRABINER
KO  Knock-out
KTF  KRAB-transposase fusion
KWT  Wild-type KRABINER
L1  LINE-1
LCA  Last common ancestor
LRT  Likelihood ratio test
LTR  Long terminal repeat
MAC  Macronucleus
MIC  Micronucleus
MLV  Murine leukemia virus
MYR  Million years
NuRD  Nucleosome remodeling deacetylase
OE  Over-expression
ORF  Open reading frame
PB  PiggyBac
PB-TRE  PiggyBac Tet-responsive element
PCR  Polymerase chain reaction
piRNA  Piwi-interacting RNA
PRO-seq  Precision run-on sequencing
PTGS  Post-transcriptional gene silencing
PWM  Position weight matrix
R  Rescue
rDNA  Ribosomal DNA
RL  Renilla luciferase
RNA  Ribonucleic acid
RNAi  RNA interference
RT
RT-PCR  Reverse transcription-PCR
SA  Splice acceptor
SIM  Similarity
TBE  Telomere-bearing element
TE  Transposable element
TES  Transcription end site
TET  Tetracycline
TGS  Transcriptional gene silencing
TIR  Terminal inverted repeat
TF  Transcription factor
TRE  Transcription regulatory element
TSS  Transcription start site
UBQC  Ubiquitin-C
UMI  Unique molecular identifier
WT  Wild-type
ZFP  Zinc-finger protein
ZNF/ZF  Zinc-finger


CHAPTER 1 HOST-TRANSPOSON INTERACTIONS: CONFLICT, COOPERATION, AND COOPTION¹

¹ This work is published as “Cosby RL, Chang NC, and Feschotte C (2019) Host-transposon interactions: conflict, cooperation, and cooption. Genes and Development” and is reprinted here with permission. The author contributions are as follows: Cosby RL and Feschotte C chose the focus of the review; Cosby RL conducted the literature research, provided input on figure design, and wrote the paper; Feschotte C assisted in writing and manuscript preparation; Chang NC designed and generated all figures and provided input to the review.

1.1 ABSTRACT

Transposable elements (TEs) are mobile DNA sequences that colonize genomes and threaten genome integrity. As a result, several mechanisms appear to have emerged during eukaryotic evolution to suppress TE activity. Yet, TEs are ubiquitous and account for a prominent fraction of most eukaryotic genomes. We argue that the evolutionary success of TEs cannot be explained solely by evasion of host control mechanisms. Rather, some TEs have evolved commensal and even mutualistic strategies that mitigate the cost of their propagation. These co-evolutionary processes promote the emergence of complex cellular activities, which in turn pave the way for the cooption of TE sequences for organismal function.

1.2 INTRODUCTION

Transposable elements (TEs) are mobile, repetitive DNA sequences that comprise a substantial fraction of eukaryotic genomes (Bourque et al. 2018). For instance, TEs and their remnants account for more than half the nuclear DNA content of maize, zebrafish, and humans, and approximately a third of the nuclear DNA content of Drosophila melanogaster and Caenorhabditis elegans (Schnable et al. 2009; Howe et al. 2013; de Koning et al. 2011; Hoskins et al. 2015; C. elegans Sequencing Consortium 1998). The evolutionary success of TEs lies in their ability to mobilize and replicate independently of the host genome (Orgel and Crick 1980; Doolittle and Sapienza 1980). TEs are classified into two broad groups based on their transposition intermediate: Class I elements (retrotransposons) mobilize via an RNA intermediate, while Class II elements (DNA transposons) mobilize via a DNA intermediate. Retrotransposons include endogenous retroviruses (ERVs) and related long terminal repeat (LTR) retrotransposons as well as non-LTR retrotransposons (Wicker et al. 2007; Bourque et al. 2018). DNA transposons include “cut-and-paste” DNA transposons as well as elements that do not move by a cut-and-paste mechanism (Wicker et al. 2007; Bourque et al. 2018). If transposition occurs in cells that contribute to the next generation (i.e. the germline), the newly integrated TE copy becomes heritable and may spread vertically within the population. TEs also spread horizontally between species at an appreciable frequency (Gilbert and Feschotte 2018), which is crucial to their evolutionary persistence because it allows them to colonize new genomes.


As for any heritable mutational event, natural selection and genetic drift dictate the fate of a new TE insertion in the population. Because of their sheer length (~100-10,000 bp) and propensity to carry regulatory elements (e.g. promoters, splice sites, poly-A signals), TE insertions are particularly prone to disrupt gene function. For instance, ~10% and ~50% of spontaneous mutants isolated in laboratory strains of mice and flies, respectively, are caused by de novo TE insertions within the coding or noncoding portions of genes (Eickbush and Furano 2002; Gagnier et al. 2019). An equally sizeable fraction of maize mutants arose from transposition events (Neuffer et al. 1997). In humans, at least 120 Mendelian diseases have been attributed to de novo TE insertions (Hancks and Kazazian 2016). These observations, together with more direct measurements of fitness effects in caged Drosophila populations (e.g. Pasyukova 2004), indicate that TE mobilization is highly mutagenic and represents a significant source of genome instability in a wide range of organisms.

Even TEs that are no longer mobile still pose a threat to organismal fitness. The repetitive nature of TE families provides a substrate for ectopic recombination events that can lead to chromosomal rearrangements, often with deleterious consequences (Montgomery et al. 1991; Bennetzen and Wang 2014; Deininger et al. 2003; Ade et al. 2013; Han et al. 2008). TE-encoded products (RNA, cDNA, and proteins), even with compromised functionalities, can also be toxic, and their accumulation to aberrant levels is associated with, and increasingly recognized as directly contributing to, various disease states including cancer, senescence, and chronic inflammation (De Cecco et al. 2019; Bourque et al. 2018; Tubio et al. 2014; Lee et al. 2012a; Tang et al. 2017; Schauer et al. 2018; Burns 2017; Dubnau 2018).

The manifold capacity of TEs to compromise genome integrity and interfere with normal cell function suggests that uncontrolled TE amplification can have catastrophic consequences for both organisms and TEs, as their fitness is inextricably linked. Thus, as for any parasitic system, natural selection will eliminate TEs with excessive activity and overly deleterious effects on their host. It will also select for the emergence of host-encoded mechanisms that suppress or dampen TE activity. If a defense mechanism completely blocks a given TE family, it will in turn place selective pressure on these elements to evolve a counter-defense or escape mechanism in order to keep propagating, setting in motion an endless escalation between host and TE weaponry. Such genetic conflict is often framed in terms of an arms race (Hurst and Werren 2001; Burt and Trivers 2006; Werren 2011; McLaughlin and Malik 2017a; Ozata et al. 2019), which builds on the Red Queen model for host-pathogen interactions (Van Valen 1973). However, it seems at least equally plausible that an arms race could be avoided if TEs mitigated their conflict with the host, either through self-control or by providing a benefit that offsets their cost. Such strategies would reduce pressure on the host to evolve systems to counteract TEs, promoting equilibrium rather than precipitating an arms race.

In the first part of this review, we examine the evidence for and against the arms race model of host-TE interactions. We propose that the extent to which TEs can outcompete their host is constrained by their dependence on organismal fitness. These constraints favor TEs that evolve strategies that circumvent or attenuate, rather than block, host defenses. We then explore alternative models, including self-regulatory mechanisms and mutualistic interactions, that reduce the cost of TE activity. We speculate that cooperative strategies may be more widespread than currently appreciated and often pave the way for the cooption of TE sequences for host function.

1.3 ARMS RACES

Consistent with a need to prevent the deleterious effects of rampant transposition, a variety of host-encoded mechanisms are known to repress eukaryotic TEs at both the transcriptional level (chromatin modification and DNA methylation) (Ozata et al. 2019; Yoder et al. 1997) and the post-transcriptional level (modification and degradation of TE transcripts) (Czech et al. 2018; Goodier 2016; Borges and Martienssen 2015). This leads to suppression of TE expression, mobility, and/or ability to promote recombination. However, all of these mechanisms also regulate host gene expression or protect against exogenous pathogens, and thus do not act exclusively on TE sequences (Klose and Bird 2006; Fedoroff 2012; Klemm et al. 2019). This raises the question: are TEs the raison d’être and primary target of these mechanisms, or are they caught in the crossfire? One way to address this question is to examine whether some of these mechanisms have evolved the ability to distinguish TEs from host sequences. Another predicted hallmark of a TE defense system is that it should be able to adapt to control new or variant TEs that would inevitably arise to evade repression, triggering an arms race between TEs and components of the pathway. Below we discuss two pathways, Piwi-interacting RNAs (piRNAs) and Krüppel-associated box-containing zinc-finger proteins (KRAB-ZFPs), that appear to possess these attributes.

1.3.1 piRNA-mediated TE silencing

Piwi-interacting RNAs (piRNAs) are small RNAs (25-30 nucleotides long) found in most metazoans that are generally produced from long precursor transcripts derived from specialized loci called piRNA clusters (reviewed in Ozata et al. 2019; Ernst et al. 2017). Once expressed, primarily in the gonads but also in the soma of some organisms, piRNA precursors are processed into mature piRNAs, which then complex with PIWI-clade Argonaute proteins. These ribonucleoprotein complexes recognize complementary target mRNAs in the cytoplasm or nascent RNAs in the nucleus, triggering a cascade of biochemical processes that ultimately reduces target RNA expression via post-transcriptional or transcriptional mechanisms, respectively (Fig. 1.1A; Ozata et al. 2019). piRNAs produced from TE loci act as potent trans-repressors of related TEs located throughout the genome (Ozata et al. 2019). At least in some organisms, piRNA-mediated TE repression appears necessary to preserve the integrity of the germline. This is best documented in Drosophila melanogaster, where the loss of piRNAs that normally repress certain TEs (e.g. P element, I factor) leads to rampant transposition and hybrid dysgenesis, which is characterized by extensive DNA damage, gonadal atrophy, and sterility (Kidwell et al. 1977; Bucheton et al. 1984; Brennecke et al. 2008; Wang et al. 2018). In the mouse, elimination of PIWI proteins also results in massive accumulation of retrotransposon transcripts in sperm and oocytes (Kabayama et al. 2017; Carmell et al. 2007; De Fazio et al. 2011; Kuramochi-Miyagawa et al. 2008). This activation precedes, and may cause, defects in spermatogenesis due to meiotic arrest and apoptosis (Frost et al. 2010; Deng and Lin 2002; Carmell et al. 2007; Kuramochi-Miyagawa et al. 2004), though whether these defects are due to transposition remains unclear (Newkirk et al. 2017). Similar phenotypes are also seen in C. elegans upon loss of the PIWI protein PRG-1, which leads to transposon activation and impaired fertility (Batista et al. 2008; Wang and Reinke 2008; Bagijn et al. 2012; Lee et al. 2012b). Thus, it is indisputable that one function of the piRNA pathway is to recognize and silence TEs.

Fig. 1.1 piRNA and KRAB-ZFPs: two host systems that recognize and silence TEs. A) piRNA pathway in Drosophila melanogaster. Mature piRNAs reenter the nucleus to perform transcriptional gene silencing (TGS) or participate in the Ping-Pong cycle to perform post-transcriptional gene silencing (PTGS) of TEs. Rhi=Rhino; Del=Deadlock; Aub=Aubergine; Ago3=Argonaute 3. B) KRAB-ZFP pathway in tetrapods (see text). NURD=nucleosome remodeling deacetylase; HDAC=histone deacetylase; DNMTs=DNA methyltransferases.

The piRNA pathway evolves rapidly, but why?

Is the piRNA pathway engaged in an arms race with TEs? A common hallmark of an arms race between host and pathogen is the rapid diversification of host and pathogen proteins that are engaged in conflicting interactions. This may be a direct physical interaction (e.g. a host protein blocks a pathogen protein), but indirect or secondary interactions can also drive rapid evolution (McLaughlin and Malik 2017a; Parhad and Theurkauf 2018). Diversification and adaptation of protein sequences often manifest as so-called positive selection at the gene level, which is characterized by an excess of nonsynonymous substitutions per nonsynonymous site relative to synonymous substitutions per synonymous site accumulated over time. Signatures of positive selection are pervasive in piRNA pathway genes of invertebrates, both within and between species, and at virtually every step of piRNA biogenesis, including piRNA precursor production and maturation, as well as in effector proteins (Simkin et al. 2013; Begun et al. 2007; Larracuente et al. 2008; Obbard et al. 2009; Mackay et al. 2012; Luo and Lu 2017; Blumenstiel et al. 2016). Signatures of positive selection are also apparent in the evolution of several piRNA pathway genes across fish species, including Piwil1, but are less evident in their mammalian homologs (Yi et al. 2014).
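Positive selection of this kind is commonly summarized by the ratio ω = dN/dS, the number of nonsynonymous substitutions per nonsynonymous site divided by the number of synonymous substitutions per synonymous site; ω > 1 is read as evidence of positive selection and ω < 1 as evidence of purifying selection. The Python sketch below illustrates only that arithmetic, under stated assumptions: it takes site and difference counts for a pair of aligned coding sequences as already computed (for example, by Nei-Gojobori counting), applies a Jukes-Cantor correction for multiple hits, and uses hypothetical function names (jukes_cantor, omega, classify) and made-up counts. It is not the selection analysis performed in the studies cited here or in this dissertation; published analyses generally rely on codon-substitution models and formal statistical tests rather than a single point estimate.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits
    with the Jukes-Cantor model: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Return omega = dN/dS from pre-counted totals (hypothetical helper).

    n_diffs, n_sites: observed nonsynonymous differences and nonsynonymous sites
    s_diffs, s_sites: observed synonymous differences and synonymous sites
    """
    p_n = n_diffs / n_sites  # proportion of nonsynonymous sites that differ
    p_s = s_diffs / s_sites  # proportion of synonymous sites that differ
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous divergence; omega is undefined")
    return d_n / d_s


def classify(w: float, tol: float = 1e-9) -> str:
    """Crude reading of a single omega value; no statistical test is applied."""
    if w > 1.0 + tol:
        return "positive selection"
    if w < 1.0 - tol:
        return "purifying selection"
    return "neutral"


if __name__ == "__main__":
    # Made-up counts for one hypothetical pair of orthologous coding sequences.
    w = omega(n_diffs=30, n_sites=300, s_diffs=5, s_sites=100)
    print(round(w, 2), classify(w))
```

A single gene-wide ω averages over all codons, so it is a conservative indicator: adaptive changes confined to a few residues, such as a protein-protein interaction interface, can be masked when the rest of the protein is under strong constraint.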

Another prediction of a rapidly evolving host defense system is that it may lead to functional incompatibilities and introduce a breach in the defenses of the progeny of individuals carrying divergent components. Support for this scenario came from an elegant study of the rapidly evolving piRNA pathway components Rhino and Deadlock in hybrids of Drosophila melanogaster and D. simulans (Parhad et al. 2017). In drosophilids, Rhino and Deadlock directly interact as part of a complex that binds to and promotes the transcription of piRNA clusters (Fig. 1.1A; Ozata et al. 2019). Genetic loss of Rhino (Klattenhoff et al. 2009) or depletion of Deadlock (Mohn et al. 2014) in D. melanogaster leads to loss of cluster-derived piRNAs and results in massive transcriptional activation of TEs. While introduction of a D. melanogaster rhino transgene rescues this mutant phenotype, a D. simulans rhino transgene does not (Fig. 1.2A; Parhad et al. 2017). Domain swapping experiments, co-immunoprecipitation, and co-localization assays indicate that this is due to an inability of the D. simulans Rhino protein to interact with D. melanogaster Deadlock, caused by species-specific amino acid changes at the interaction interface of the two orthologous proteins (Parhad et al. 2017). D. simulans Rhino does, however, interact with D. simulans Deadlock, indicating that coevolution of these proteins has occurred in the simulans lineage as well. The authors speculate that the rapid coevolution of the Rhino and Deadlock proteins was precipitated by a yet-unknown TE-encoded anti-silencing factor present in one or both species lineages that may directly compete or interfere with their interaction (Parhad et al. 2017).

Fig. 1.2 Evidence of host-TE arms races. A) Melanogaster-simulans Rhino-Deadlock incompatibility. Rhi=Rhino; Del=Deadlock; Pol II=RNA polymerase II. Heterospecific (D. melanogaster/D. simulans) Rhino-Deadlock pairs cannot cooperate to transcribe piRNA clusters (see text). B) ZNF93- and ZNF649-mediated silencing of primate L1 elements. Older primate L1 families are silenced by ZNF93 and ZNF649, but younger L1 families lack ZNF93 and ZNF649 binding sites and are not silenced.

Another indicator of adaptive evolution of the piRNA pathway lies in the recurrent duplication and turnover of genes involved in the pathway. Again, this is most striking in invertebrates. For example, a phylogenomic survey of Piwi-clade Argonaute genes across 84 dipteran species revealed that these genes have independently duplicated 27 times (Lewis et al. 2016). Interestingly, this study revealed a disproportionate number of duplication events in mosquitoes, which carry a large and diverse TE load (Matthews et al. 2018). More recently, a broader survey of arthropods identified an additional 17 Piwi gene duplication events, 14 of which occurred in the lineage of the pea aphid alone (Lewis et al. 2018).

The rapid diversification of the piRNA pathway machinery in insects via positive selection and recurrent gene duplication is consistent with the idea that the system is engaged in an arms race. What remains unclear, however, is whether adaptation in the pathway is driven primarily by a need to adjust to TE activity or evasion, or by another conflict. If TE activity is the primary driver, one would predict that the timing and intensity of positive selection should track with TE load and diversity across species lineages. Yet in Drosophila, where this prediction has been examined, this does not appear to be the case. TE abundance across the Drosophila genus was found to be correlated with the level of purifying selection (constraint) on piRNA pathway components, but not with the rate at which these proteins have diversified (Castillo et al. 2011).

Another prediction of an arms race model is that variation in TE activity would be accompanied by commensurate changes in the expression of piRNA pathway components or in the abundance or composition of the piRNA pool. The former possibility was examined across wild-type strains of Drosophila simulans with variable TE content (Fablet et al. 2014). While piRNA pathway genes exhibited wide variation in transcript levels across strains, there was no direct positive correlation with TE copy numbers. As to the latter, RNA sequencing of 16 inbred lines from the Drosophila Genetic Reference Panel identified only minor variation in piRNA expression, and piRNA cluster expression did not correlate with the presence of strain-specific TE insertions (Song et al. 2014).

Another study investigated the genomic factors that contribute to the piRNA pool by integrating genomic, mRNA, and small RNA data for two lab strains of D. melanogaster (Kelleher and Barbash 2013). While variation in piRNA abundance between strains appears to be positively correlated with total TE content and expression, the most recently active TEs did not produce the most abundant piRNAs in ovaries (Kelleher and Barbash 2013). Thus, collectively, the evidence so far in Drosophila does not support the notion that TE activity is a primary driver of rapid evolution in the piRNA pathway. Further characterization of the relationship between piRNA pathway evolution and TE composition across species with more drastic variation in TE content might reveal a different picture.

piRNAs function beyond TE silencing and outside the germline

What other forces could drive adaptive evolution of the piRNA pathway? In addition to TEs, it is well established that piRNAs also target host genes and viruses (Ozata et al. 2019), and both types of targets could have considerable influence on piRNA pathway evolution. The fact that host genes are not immune to, but are actually frequent targets of, piRNAs in a wide range of organisms implies a need to minimize off-target effects. Indeed, if piRNA targeting is not sufficiently specific to TEs, it could interfere with host gene expression. The autoimmunity hypothesis (Blumenstiel et al. 2016) posits that positive selection in piRNA genes reflects alternating periods of high and low TE activity, which impose opposite constraints on the piRNA response: high TE activity requires high piRNA specificity, while low activity calls for greater sensitivity. In this model, TE activity indirectly influences piRNA evolution. Measurable fitness defects caused by off-target effects of TE-derived small RNAs (Lee 2015; Hollister and Gaut 2009) lend empirical support to the model. Further evidence comes from a recent study that compared the off-target effects of three piRNA pathway components, Aubergine (aub), Armitage (armi), and Spindle E (spnE), in D. melanogaster mutant backgrounds trans-complemented with the D. simulans protein (Wang et al. 2019). When mutant flies were trans-complemented with the D. simulans proteins, more D. melanogaster protein-coding genes were repressed than when they were complemented with the D. melanogaster versions, suggesting that the D. melanogaster proteins have adapted to avoid genic off-targeting in the D. melanogaster background, whereas the D. simulans proteins have not (Wang et al. 2019). These results support the idea that avoidance of off-target effects on ‘self’ sequences is a plausible driver of adaptation in the piRNA pathway.


There is also mounting evidence that the piRNA pathway functions in antiviral defense, which may trigger another conflict underlying rapid evolution.

The strongest evidence for antiviral activity of piRNAs thus far comes from studies in mosquitoes. These insects have a dramatically expanded repertoire of Piwi genes (Lewis et al. 2016) and deploy somatic piRNAs to combat arbovirus infection (Miesen et al. 2016; Morazzani et al. 2012; Vodovar et al. 2012; Schnettler et al. 2013; Léger et al. 2013). Small RNA profiling also revealed abundant virus-derived piRNAs in Drosophila cell culture, thought to reflect naturally occurring infections (Wu et al. 2010), but not in flies subjected to experimental viral infection (Petit et al. 2016). Chickens that harbor endogenized Avian Leukosis Virus (ALV) sequences also produce copious amounts of piRNAs from these loci in their testes (Sun et al. 2017). Since one of these ALV loci, ALVE6, has been historically associated with ALV resistance (Robinson et al. 1981), it is tempting to speculate that piRNAs derived from this locus offer antiviral protection. A similar mechanism has been proposed to operate in primates and rodents, where endogenous bornavirus-like elements generate piRNAs in the testes (Parrish et al. 2015). It remains unknown whether these piRNAs could confer resistance to bornavirus infection (which is typically neurotropic). Nevertheless, these observations suggest that endogenous viral sequences are a common source of piRNAs in diverse animals. Thus, both gene and viral targeting by piRNAs add a layer of complexity that must be considered among the potential drivers of rapid piRNA pathway evolution.


1.3.2 KRAB Zinc Finger Proteins as an adaptive TE silencing system

The expansion of KRAB-ZFPs in mammalian genomes is increasingly recognized as an adaptive response to control TE invasion (Yang et al. 2017b).

KRAB-ZFPs minimally contain an N-terminal KRAB domain followed by a variable array of C2H2-type zinc fingers (Imbeault et al. 2017; Yang et al. 2017b). Most KRAB-ZFPs characterized so far act as transcriptional repressors (for review, Yang et al. 2017b; Ecco et al. 2016). Typically, KRAB-ZFPs bind DNA through their zinc fingers and recruit the corepressor KAP1 (TRIM28) via their KRAB domain. In turn, KAP1 recruits a variety of epigenetic modifiers, such as histone methyltransferases (SETDB1) and DNA methyltransferases, that nucleate the formation of repressive chromatin at the target locus (Fig. 1.1B). Several KRAB-ZFPs have been implicated in the silencing of specific TE families that they recognize through sequence-specific DNA binding interactions (Schmitges et al. 2016; Najafabadi et al. 2015; Imbeault et al. 2017). Perhaps the best characterized example is mouse Zfp809. It targets and represses murine leukemia virus (MLV) in embryonic stem cells (Wolf and Goff 2009) and is necessary to establish stable repression of ERV-like VL30 TEs in early embryonic development (Wolf et al. 2015). Thus, there is growing evidence that a common function of KRAB-ZFPs is to silence TEs.

Several lines of evidence suggest that KRAB-ZFP gene evolution is driven in part by an arms race with TEs. First, KRAB-ZFPs are massively expanded in all tetrapod lineages examined (with the notable exception of birds), with a range of 200-400 genes in most mammals. Very few of them are deeply conserved, however, which implies pervasive evolutionary turnover (Imbeault et al. 2017). Second, the copy number of ZFPs in a given genome, including KRAB-ZFPs, is strikingly correlated with LTR retrotransposon copy number across a wide range of vertebrate species and timescales (Thomas and Schneider 2011). This suggests a persistent co-evolutionary relationship between TEs and ZFPs. Third, the vast majority of mammalian KRAB-ZFPs profiled thus far bind only one or a few TE families, and the emergence of a KRAB-ZFP often closely follows the expansion of the TE family it targets (Imbeault et al. 2017; Schmitges et al. 2016; Najafabadi et al. 2015). Finally, the DNA-contacting residues of many mammalian ZFPs exhibit telltale signatures of positive selection (Schmidt and Durrett 2004; Emerson and Thomas 2009; Nowick et al. 2010). This suggests that some KRAB-ZFPs adapt their targeting capacity to repress new DNA sequences, such as those introduced by newly expanded TE families.

Studies of two KRAB-ZFPs, ZNF93 and ZNF649, offer a compelling example of coevolution with the LINE1 (L1) family of non-LTR retrotransposons in the primate lineage (Fig. 1.2B; Jacobs et al. 2014; Fernandes et al. 2018).

ZNF93 is a primate-specific gene whose zinc fingers underwent a series of adaptive changes in the great ape lineage. These changes enabled ZNF93 to bind the L1PA6 and L1PA5 subfamilies (L1PA6-PA5) shortly after their genomic amplification. Apparently, this had only moderate repressive effects on these subfamilies, since they continued to amplify. ZNF93 then gained increased specificity for their descendants (L1PA4-PA3), which presumably led to tighter repression. Later, a 129-bp deletion within the 5’ UTR of an L1PA3 subfamily derivative (L1PA3-6030) removed the ZNF93 binding site. This deletion relieved repression and apparently allowed these elements to escape ZNF93 control, leading to the propagation of their modern descendants (L1PA2/L1HS). Indeed, reinserting the deleted 129-bp segment at its original position within the presently active L1HS element restores ZNF93 binding and transcriptional repression, and significantly decreases the transposition activity of L1HS in vitro (Jacobs et al. 2014). A recent study revealed that ZNF93 cooperated with an older KRAB-ZFP, ZNF649, to silence L1PA6 elements. Like ZNF93, ZNF649 evolved zinc fingers that bind a motif within the 5’ UTR of the ancestral L1PA6 element, but this motif lies upstream of the 129-bp region bound by ZNF93. However, descendants of L1PA6 progressively evaded ZNF649 binding through a series of point mutations within their 5’ UTR, in parallel with their evasion of ZNF93 binding (Fig. 1.2B; Fernandes et al. 2018). This intricate game of cat-and-mouse provides a vivid illustration of a host-TE arms race spanning ~30 million years of primate evolution. Given the wide diversity of TEs and KRAB-ZFPs across tetrapods, it is likely that many other conflicts have shaped the coevolution of KRAB-ZFPs and TEs.

KRAB-ZFPs also regulate host genes and viral activity

Evidence for a host-TE arms race is stronger for KRAB-ZFPs than for piRNAs, but it likely explains only part of KRAB-ZFP evolution. Several KRAB-ZFPs are known to bind non-TE sequences and to play important roles in host physiology and development that appear independent of TE repression (Imbeault et al. 2017; Yang et al. 2017a; reviewed in Yang et al. 2017b). Additionally, KRAB-ZFPs can persist in the genome long after their identified TE targets have lost transposition activity, and new KRAB-ZFPs can evolve to target TEs that have long ceased to be active (Imbeault et al. 2017).

These and other observations have led to the hypothesis that the recurrent interaction of KRAB-ZFPs with TEs is not a defensive response but rather a “massive and sophisticated enterprise of TE domestication for the evolutionary benefit of the host” (Friedli and Trono 2015). In this model, KRAB-ZFPs are selected to modulate host gene expression in a lineage- and cell-type-specific fashion. They do so by exploiting a vast reservoir of previously dispersed TE families, which often contain pre-existing cis-regulatory activities (Chuong et al. 2017; Yang et al. 2017b; Trono 2015; Ecco et al. 2016). This scenario does not preclude occasional arms races with TEs, but it offers a host-centric alternative worth considering as an additional driver of KRAB-ZFP evolution.

Some KRAB-ZFPs are also known to repress the activity of exogenous retroviruses. ZFP809, for example, protects mouse embryonic stem cells from MLV replication. Its expression in differentiated cell lines is sufficient to render cells resistant to MLV infection (Wolf and Goff 2009). ZFP809 restricts MLV by binding to the proline tRNA primer binding site of proviral DNA, which represses proviral transcription. Interestingly, the same primer binding site sequence is shared by various retroviruses, which suggests that a single KRAB-ZFP could potentially restrict a wide range of retroviruses. Recently, genetic analysis showed that a pair of KRAB-ZFPs, Suppressor of non-ecotropic ERV (Snerv) 1 and 2, are required to silence non-ecotropic ERV (NEERV) expression (Treger et al. 2019). In immunodeficient mice, NEERV loci can recombine to generate infectious retroviruses, and expression of the NEERV glycoprotein gp70 contributes to lupus nephritis susceptibility (Ottina et al. 2018; Ito et al. 2013). Similar to ZFP809, SNERV1 recruits KAP1 to silence NEERV elements by binding sequences overlapping their LTR, including a glutamine tRNA primer binding site (Treger et al. 2019). Together, these findings suggest that antiviral activity may be a recurrent theme promoting the selection of novel KRAB-ZFPs.

Furthermore, tRNA primer binding sites are among the most evolutionarily constrained sequences in retroviral genomes, and multiple KRAB-ZFPs have repeatedly evolved the ability to target them. This implies that retroviruses frequently evade KRAB-ZFPs that bind other parts of their genome. These observations point to retroviruses as common targets and important drivers of KRAB-ZFP evolution.

1.3.3 Counter-defense mechanisms

Invoking an arms race between TEs and host control systems implies that TEs commonly evade silencing. Yet there are very few explicit cases of TEs that have evolved active escape mechanisms. To our knowledge, only three examples of TE-encoded anti-silencing mechanisms have been reported so far, all from plants (Fu et al. 2013; Hosaka et al. 2017; Nosaka et al. 2012; McCue et al. 2014; Nosaka et al. 2014).

In cultivated rice (Oryza sativa), a family of CACTA DNA transposons carries a microRNA gene, mir820, which downregulates the expression of the de novo DNA methyltransferase gene OsDRM2 (Fig. 1.3B; Nosaka et al. 2012). Binding of miR820 to the 3’ UTR of OsDRM2 mRNA modestly reduces OsDRM2 expression. Independent RNAi-mediated knock-down of OsDRM2 reduced DNA methylation of a variety of TEs and concomitantly elevated TE expression. This suggests that inhibition of OsDRM2 by miR820 would enable several TEs, including CACTA elements, to evade silencing (Nosaka et al. 2012; 2014). Interestingly, compensatory mutations appear to have been selected during rice evolution to maintain interactions between the TE-encoded microRNA and its binding site within the OsDRM2 mRNA, which may reflect a signature of an ongoing arms race (Nosaka et al. 2012).

In Arabidopsis, some TEs produce small interfering RNAs that can affect host gene expression in trans (tasiRNAs). One of these tasiRNAs, derived from Athila6 retrotransposons, was shown to target the 3’ UTR of the UBP1b mRNA, which encodes a host protein involved in global translational repression under stress conditions (McCue et al. 2014). tasiRNA-mediated repression of UBP1b results in elevated transcript and protein levels of Athila6 elements, supporting the anti-silencing role of this tasiRNA. Small RNA-based regulation is usually regarded as a prominent mechanism for silencing TEs, so it is ironic that a TE would deploy it to promote its own propagation.

VANDAL DNA elements in Arabidopsis provide perhaps the most convincing case reported thus far of transposons that encode a suppressor of TE silencing. At least two distantly related TE families (VANDAL21 and VANDAL6) were shown to encode an accessory protein, VANC21 and VANC6 respectively. When transiently expressed from a transgene, each protein induces demethylation of its cognate VANDAL elements. It does not affect methylation of the other family or of any other TEs, including closely related VANDAL subfamilies (Fig. 1.3A; Hosaka et al. 2017; Fu et al. 2013). How VANC promotes hypomethylation of VANDAL elements remains mechanistically unclear. However, the process depends on a short tandem sequence motif that is found in high copy number within VANDAL21/6 elements but at low copy number elsewhere in the genome (Hosaka et al. 2017). By achieving sequence-specific anti-silencing, VANDAL elements have evolved a powerful selfish strategy that promotes their own mobility without affecting that of other transposons. This limits the deleterious impact of their anti-silencing system on host fitness.

To our knowledge, no TE-encoded anti-silencing systems have been described against either the piRNA or KRAB-ZFP pathways, even though TEs seem to be engaged in an arms race with these defense systems (Parhad et al. 2017; Wang et al. 2018; Jacobs et al. 2014). The apparent dearth of anti-silencing strategies described in eukaryotic TEs is all the more surprising given the plethora of strategies described for viruses and other pathogens to counteract host defense mechanisms. These include many examples of virally encoded proteins that directly antagonize or degrade host defense systems, such as RNAi, CRISPR and nucleic acid sensors, to name just a few (Hynes et al. 2018; Landsberger et al. 2018; Crow et al. 2016).


[Fig. 1.3 panels: (A) A. thaliana, VANC21 and DNA methylation of VANDAL21; (B) O. sativa, CACTA-encoded miR820, OsDRM2 and DNA methylation]

Fig. 1.3 Evidence of TE counter-defense. A) VANDAL21 elements in A. thaliana encode VANC21, which inhibits host DNA methylation (grey circles) of VANDAL21 elements. B) Some CACTA DNA transposons in O. sativa encode a miRNA, mir820, which base-pairs with OsDRM2 mRNA and reduces translation of OsDRM2, a DNA methyltransferase.


1.4 ESCAPE AND SELF-CONTROL STRATEGIES

Why are there apparently so few TE-encoded anti-silencing mechanisms? One fundamental difference between TEs and viruses is that TEs must replicate in the germline in order to propagate within a population, whereas viruses generally do not (Haig 2016). Thus, TE fitness is intimately intertwined with the reproductive fitness of their host organisms. This dependency places an important limitation on the ability of TEs to evolve broadly effective anti-silencing mechanisms. For instance, a mechanism that blocked the entire piRNA pathway would lead to massive mobilization of diverse TEs. This would simultaneously compromise host fertility and doom TE propagation (Blumenstiel et al. 2016; Haig 2016), as documented in piRNA mutant backgrounds (e.g. Wang et al. 2018). Consistent with this quandary, all TE-encoded anti-silencing mechanisms described thus far have narrow effects or modes of action. They either cause modest decreases in host regulatory proteins (mir820, Athila6 tasiRNA) or selectively target individual families (VANC). This is in stark contrast to viruses, which evolve mechanisms that achieve broad and/or highly effective blocks of the targeted pathways (Hynes et al. 2018; Landsberger et al. 2018; Crow et al. 2016).

1.4.1 Bypassing host surveillance

Alternatively, TEs may evade host silencing but do so in contexts where their activity does not impact host fitness, as is the case for gypsy and I factor retrotransposons in Drosophila melanogaster (Wang et al. 2018). These TEs hijack ovarian nurse cells as factories to produce virus-like particles, because nurse cells are permissive to their transcription but apparently refractory to transposition. These particles are then trafficked via microtubules and delivered to the oocyte, where transposition takes place (Wang et al. 2018). These retrotransposons are still susceptible to piRNA silencing upon entry into the oocyte. However, assembling the viral particles in permissive cells reduces the number of transposition steps that are exposed to host silencing. Similarly, the virus-like particles produced by EVADÉ retrotransposons in Arabidopsis partially protect their mRNAs against small RNA-mediated degradation (Marí-Ordóñez et al. 2013).

These bypass strategies suggest that selection favors TEs that circumvent, rather than antagonize or block, host silencing pathways.

1.4.2 Self-regulatory mechanisms

If evasion is a limited option for TEs, how else can a TE ensure it remains active for extended periods of time? One solution is for TEs to evolve regulatory mechanisms that minimize their deleterious effects on the host. A prominent mechanism is spatiotemporal regulation of TE activity. TEs are present in the genome of all cells and could in principle mobilize in both germline and somatic tissues. However, mobilization in somatic tissue has a greater potential to harm organismal fitness. Mobilization in the germline has lesser immediate effects on host function, as long as it does not affect fertility (Haig 2016). Thus, suppression of TE activity in the soma is advantageous for both hosts and TEs, which predicts that TEs should evolve mechanisms to restrict their expression to the germline.

A classic example is the Drosophila P element. Its transposase ORF is interrupted by an intron that is spliced only in the germline. This prevents P element mobility in the soma, where only prematurely truncated transposase is produced (Laski et al. 1986). This regulatory switch appears to have evolved through the gain of sequence elements within the transposon that recruit somatically expressed splicing-inhibitory factors (reviewed by Majumdar and Rio 2015). A recent study adds another layer of intricacy. It implicates the piRNA pathway in repressing the splicing of this intron in germ cells through piRNA-mediated chromatin changes within the P element (Teixeira et al. 2017). Thus, spatiotemporal regulation of the P element in the fly involves an interplay of mechanisms evolved by the transposon itself and by the host.

In mammals, several retroelements are known to have evolved exquisite stage-specificity of expression in early development (reviewed in Rodriguez-Terrones and Torres-Padilla 2018). For instance, in mouse, MaLR/MT elements are transcribed specifically in oocytes (Peaston et al. 2004; Brind’Amour et al. 2018). In contrast, transcription of mouse endogenous retrovirus type L (MERVL; Macfarlan et al. 2012) and young L1 subfamilies (Jachowicz et al. 2017; Percharde et al. 2018) peaks at the 2-cell (2C) stage and coincides with zygotic genome activation. In humans, expression of human endogenous retrovirus type K (HERV-K) peaks at the 8-cell stage of embryonic development (Grow et al. 2015), while HERVH/LTR7 elements are expressed in the pluripotent stem cells of the blastocyst (Wang et al. 2014; Göke et al. 2015). The mechanisms that enable such developmental precision in expression are becoming increasingly clear. Each of these TE families recruits host transcription factors that precisely specify these developmental stages, for example Oct4 and Nanog for HERVH/LTR7 (Kunarso et al. 2010; Ito et al. 2017) or Dux for MERVL (De Iaco et al. 2017; Hendrickson et al. 2017). Because these stages precede the differentiation of germ cells, expression at these stages allows all of these elements to generate heritable insertions. At the same time, each element occupies a distinct expression niche, which might reduce competition between them for cellular resources.

Another mitigating strategy is the evolution of suboptimal transposition and self-restraining copy number control mechanisms. In order to persist, TEs must be active enough to generate new insertions, but not so active as to impair host fitness. To this effect, some TEs have evolved mechanisms to reduce their own activity. This is best studied in Tc1/mariner transposons, which self-regulate their mobility in at least three ways. The first is the evolution of suboptimal transposases, as supported by the isolation of hyperactive mutants (Liu and Chalmers 2013; Mátés et al. 2009; Lampe et al. 1999). The second is inhibition of transposase function when transposase expression is high, also known as over-production inhibition (Lohe and Hartl 1996). The third is selection for imperfect terminal inverted repeats that reduce transposition efficiency (Augé-Gouillou et al. 2001). The transposase of TcBuster, a hAT element from the beetle Tribolium castaneum, also forms aggregates when overexpressed (Woodard et al. 2017), as does the transposase of Ac elements in maize (Heinlem et al. 1994). A subset of Ty1 elements in yeast also encode a truncated Gag protein with a dominant-negative effect on Ty1 copy number (Saha et al. 2015). These examples suggest that TEs commonly use self-control mechanisms to mitigate their deleterious impact.

1.4.3 Targeting preferences

TEs can direct their insertion to “safe havens”, that is, regions of the genome where insertion causes minimal harm. Studies that map de novo transposition events have shown that TEs, especially those colonizing compact genomes, have repeatedly evolved mechanisms to target benign or highly redundant regions of the genome. For example, Ty1 and Ty3 LTR retrotransposons in yeast, Skipper retroelements in Dictyostelium discoideum, and Dada DNA transposons in fish independently evolved preferences for integration in the immediate vicinity of tRNA genes, with apparently little to no impact on tRNA expression (for review: Sultana et al. 2017; Cheung et al. 2018). Ribosomal DNA arrays are another safe harbor. R1 and R2 non-LTR retroelements in arthropods (Pérez-González and Eickbush 2002) and Pokey DNA transposons in Daphnia (Penton and Crease 2004) independently evolved targeting to these arrays. Ty5 in yeast (Zou et al. 1996), HeT-A/TAHRE/TART in Drosophila (Pardue and DeBaryshe 2011) and TRAS/SART in silkworm (Fujiwara et al. 2005) all independently evolved the ability to target (sub)telomeric regions. Several TE families also show a preference for insertion upstream of protein-coding genes. These include Tf retrotransposons in fission yeast (Levin and Boeke 1992), Drosophila P elements (Liao et al. 2000), maize Mutator elements (Dietrich et al. 2002), and rice mPing transposons (Naito et al. 2009). Insertion of these elements in this compartment must occasionally perturb host gene expression, but it is still less likely to be detrimental than insertion in coding regions. It also provides these elements with the added benefit of an ‘open’ chromatin environment, which facilitates further mobilization and might shield them from host silencing. Remarkably, TEs can also evolve a preference for insertion into other TEs. Tx1 non-LTR retrotransposons target Tx1d DNA transposons in Xenopus laevis (Christensen et al. 2000), and Tourist elements preferentially insert into other Tourist elements in rice and maize (Jiang and Wessler 2001). Various other TE families in maize share this strategy (Stitzer et al. 2019). Collectively, these data indicate that TEs have repeatedly adapted to occupy genomic niches that minimize the cost of transposition on host fitness.

1.5 HOST-TRANSPOSON MUTUALISM

TE self-regulation and targeting may reduce the impact TEs have on host fitness, but they do not directly provide a selective advantage to the host. Is it conceivable that TEs and their hosts could achieve a mutualistic relationship? Mutualism can be attained if maintenance of TE activity directly and immediately benefits the host. This form of host-TE cooperation is commonplace in bacteria, where transposons and conjugative plasmids frequently carry and shuttle antibiotic resistance genes and other adaptations to environmental stress (Wintersdorff et al. 2016). Thus far, very few examples akin to host-TE mutualism have been documented in eukaryotes. Here we highlight three possible cases:
- telomeric retroelements in Drosophila,
- TBE-mediated genome rearrangement in the ciliate Oxytricha trifallax, and
- an emerging role for mammalian retrotransposons in facilitating early embryonic development.

1.5.1 Candidate host-transposon mutualisms

Retrotransposons maintain Drosophila telomeres (and centromeres?)

Eukaryotes have adopted several mechanisms to ensure replication of the ends of linear chromosomes. In most eukaryotes, chromosome ends are protected by telomeric repeats maintained by a specialized reverse transcriptase called telomerase (Fig. 1.4A; Kordyukova et al. 2018). Drosophilid species, however, have lost the gene encoding telomerase. In all Drosophila species examined thus far (with one exception, discussed below), the telomeric repeats have been replaced by arrays of non-LTR retrotransposons, referred to collectively as telomeric elements (Fig. 1.4A; Casacuberta 2017; Pardue and DeBaryshe 2014). In D. melanogaster, telomeric elements belong to three related jockey-like subfamilies: HeT-A, TAHRE and TART. These elements are continuously inserted head-to-tail at chromosome ends via target-primed reverse transcription, which is supported by proteins produced from autonomous TAHRE and TART copies (Fig. 1.4A). Messenger RNAs transcribed from all three elements are imported into the nucleus through their association with their respective gag-like proteins. However, targeting of TAHRE RNPs to telomeres requires association with the HeT-A gag protein (Pardue and DeBaryshe 2014; Rashkova et al. 2002). HeT-A, in turn, requires pol proteins from TAHRE/TART to be reverse transcribed (Pardue and DeBaryshe 2014). Thus, telomeric elements cooperate both for their own amplification and for telomere maintenance (Capkova Frydrychova et al. 2008).

As such, telomeric elements have long been regarded as the prototypical example of host-TE mutualism (Kidwell and Lisch 2001), and they illustrate that TE families can also cooperate with each other. Intriguingly, evidence is mounting that a distantly related group of non-LTR retrotransposons (G2/jockey3 subfamilies) contributes directly to the organization and function of centromeres in D. melanogaster and in its sister species D. simulans (Chang et al. 2019).

Active transposons are required for Oxytricha development

The role of telomere-bearing elements (TBEs) in O. trifallax genome rearrangement represents another tantalizing case of host-TE mutualism.

Ciliates such as O. trifallax are single-celled eukaryotes that possess two types of nuclei (Fig. 1.4B; Chen et al. 2014):
- a transcriptionally silent, diploid germline micronucleus (MIC), and
- a somatic macronucleus (MAC) that maintains gene expression in vegetative cells.

Genes in the micronuclear genome are interrupted by noncoding sequences referred to as internal eliminated sequences (IESs). In O. trifallax, the exons are also arranged out of order and are often inverted (Chen et al. 2014). TEs comprise a large fraction of IESs and contribute substantially to the size of the micronuclear genome (Hamilton et al. 2016; Chen et al. 2014). IES removal is required for proper MIC-to-MAC development. It occurs through an extensive genome rearrangement shortly after sexual conjugation. This process reduces the 1-Gb micronuclear genome to an ~50-Mb macronuclear genome, which consists of thousands of single-gene chromosomes (~2 kb long) that are subsequently amplified to thousands of copies (Fig. 1.4B; Swart et al. 2013; Chen et al. 2014).

This complex remodeling of the genome requires TBEs, a family of DNA transposons (Nowacki et al. 2009). TBEs mobilize during meiosis. Experimental silencing of all three families of TBE transposases via RNAi impairs cell growth and causes cell death, due to defects in the germline elimination of both TBEs and IESs (Nowacki et al. 2009). Silencing a single TBE family was insufficient to cause this phenotype, indicating that all three TBE families cooperate to promote MAC development (Nowacki et al. 2009). Therefore, TBE elements not only remain active in the Oxytricha genome but also serve an indispensable developmental role for their hosts. This suggests that a mutualistic compromise was established during Oxytricha evolution (Vogt et al. 2013).

Additional support for this model comes from purifying selection acting on TBE-encoded transposases. This selection suggests that TBE transposases are under evolutionary constraint to serve both transposon and host functions (Chen and Landweber 2016).


Fig. 1.4 Cooperation paves the way for cooption. A) Drosophila telomeric transposons as a potential model for telomerase evolution. In D. melanogaster, three non-LTR transposon families maintain telomeres. Telomeric elements form a head-to-tail array in the telomere. Gag and reverse transcriptase (RT) proteins from HeT-A, TAHRE and TART complex with their cognate mRNAs, forming a ribonucleoprotein (RNP) complex capable of telomere elongation. In other organisms, telomerase reverse transcriptase (TERT) and telomerase RNA (TR) form the telomerase complex, which maintains telomeres. B) Transposase proteins necessary for germline IES elimination in Tetrahymena and Paramecium may have evolved from a mechanism analogous to TBE excision in Oxytricha (see text). In the developing Oxytricha MAC, TBE transposases excise internal eliminated sequences (IESs). Subsequent steps stitch exons together into single-gene chromosomes, which are amplified to thousands of copies in the mature MAC. In Paramecium and Tetrahymena, piggyBac-derived proteins are domesticated for IES recognition and excision. ORF = open reading frame; Pgm = PiggyMac and interactors; TBP = Tetrahymena piggyBac-like. Scissors represent transposase proteins; arrows represent genes.


Is mammalian embryogenesis addicted to transposable elements?

There is growing evidence that retrotransposons and endogenous retroviruses are intimately intertwined with mammalian embryonic development. MERVL and HERVL appear to promote zygotic genome activation in mice and humans, respectively (Kigami et al. 2003; Macfarlan et al. 2012; Svoboda et al. 2004; Hendrickson et al. 2017; De Iaco et al. 2017; Whiddon et al. 2017). Furthermore, MERVL transcription in the 2C embryo triggers the formation of hundreds of chromatin loops. These loops fold the genome into a 3D organization that is unique to this transient totipotent stage and potentially crucial for it (Kruse et al. 2019). In human embryonic stem cells (hESCs), a distinct family, HERVH, is highly expressed and marks pluripotent cell populations (Santoni et al. 2012; Wang et al. 2014; Göke et al. 2015). As for MERVL, the transcription of several HERVH loci appears to promote the formation of chromatin loops and topological domains that are unique to hESCs (Zhang et al. 2018). It also appears to promote the expression of long noncoding RNAs (Kelley and Rinn 2012; Kapusta et al. 2013; Lu et al. 2013; Wang et al. 2014). RNAi knock-down of some of these individual HERVH-derived lncRNAs in induced pluripotent stem cells resulted in loss of pluripotency (Izsvak et al. 2015). CRISPR knock-out of a single HERVH locus increased the capacity of cells to differentiate into cardiomyocytes (Zhang et al. 2018). The regulatory activities emanating from MERVL and HERVH elements are intriguing. However, it remains unclear whether they are relics of selfish or cooperative strategies that these TEs deployed to occupy a niche facilitating their transmission, or whether they have been coopted for lineage-specific developmental innovations (Haig 2016; Chuong et al. 2017; Izsvak et al. 2015).

Recently, evidence has surfaced for a possible partnership between L1 retrotransposons and mouse embryonic development. There is a striking nuclear accumulation of the RNA produced by the youngest, transpositionally active L1 subfamilies in the 2-cell (2C) embryo. Experimental depletion of endogenous L1 RNAs at that stage elicited a wide range of chromatin and regulatory defects that block developmental progression (Jachowicz et al. 2017;

Percharde et al. 2018). Interestingly, a similar phenotype was observed when

L1 transcription was experimentally prolonged beyond the 2C stage by targeted chromatin manipulation (TALE-VP64), but not when full-length L1 mRNA was injected, and the phenotype could not be rescued by inhibiting L1 reverse transcriptase. Together, these results suggest that the precise transcriptional activation of L1 in the nuclei of preimplantation embryos is required for the establishment of a chromatin state that promotes developmental progression.

Mechanistically, nuclear L1 RNAs appear to exert these activities by recruiting at least two host proteins, Nucleolin and KAP1, that in complex activate ribosomal RNA transcription and repress the Dux locus, thereby promoting exit from the 2C stage (Percharde et al. 2018). It is unknown whether these findings extend beyond the mouse, but it is worth noting that Nucleolin was previously reported to bind human L1 RNA (Peddigari et al. 2013). Furthermore, it remains difficult to rule out off-target or non-specific effects of the experimental approaches used to manipulate L1 expression in these studies. Nevertheless, these intriguing observations clearly call for further investigation.

1.5.2 Conflicts in disguise?

Although the examples described above suggest that mutualistic interactions between TEs and their hosts may be more widespread than currently appreciated, they remain open to alternative interpretations. Rather than true mutualisms, they may be viewed as commensalisms (benefiting the TE but of no direct benefit to the host) or even as “addictions”, where TEs have supplanted ancestral functions essential to the host without providing an adaptive benefit, while nevertheless creating a dependency on active transposition (Jangam et al. 2017). Such addiction may escalate into a conflict if transposition occurs in excess or otherwise imposes a net cost on host fitness, which may once again set off an arms race in which the host evolves mechanisms to control transposition (Fig. 1.4A). There is growing evidence that such instability may be at play at Drosophila telomeres. First, intra- and inter-specific evolutionary analyses of 29 genes encoding proteins associated with telomere maintenance and function revealed that they have experienced repeated bouts of diversifying selection, indicating recurrent conflicts that necessitate the rapid adaptation of telomeric proteins (Lee et al. 2016).

Furthermore, a recent study tracing the evolution of D. melanogaster telomeric retroelements in closely related species shows that this lineage of elements has undergone rapid turnover, with drastic changes in abundance across species (Saint-Leandre et al. 2018). These results uncover a paradoxical level of evolutionary instability for seemingly essential components of the genome.

Surprisingly, one species, D. biarmipes, appears to have lost its telomeric retroelements altogether; they have been replaced by a mixture of unrelated TEs at the tips of its chromosomes (Saint-Leandre et al. 2018). It remains to be seen how D. biarmipes copes with the loss of telomeric retroelements, as the TEs found at its telomeres appear to be no longer active (Saint-Leandre et al. 2018). These findings support the idea that replacing vital host functions with TE activity may be an evolutionarily contentious and ultimately untenable situation.

1.6 EN ROUTE TO COOPTION

An advantage of adopting a cooperation-centric model of host-TE interaction is that cooperation provides a facile path for TE cooption to emerge.

Cooption is the process by which natural selection utilizes TE sequences to evolve new host functions. Although TE cooption was once considered rare, numerous examples have now been described; however, the evolutionary routes and forces that initially lead to cooption remain poorly understood (reviewed in Jangam et al. 2017; Frank and Feschotte 2017; Chuong et al. 2017; Sinzelle et al. 2009; Feschotte and Pritham 2007). We propose that the nature of the cooperative activities in which TEs and their hosts engage directly influences the functions for which TEs are coopted.

The case of Drosophila telomeric retroelements provides a good example to illustrate this model. There are striking similarities between this system and telomerase, the enzyme used by most eukaryotes to maintain telomere length (Fig. 1.4A; Pardue and DeBaryshe 2014; Kordyukova et al. 2018). Telomerase is a reverse transcriptase that appears most closely related to the reverse transcriptase encoded by Penelope-like retrotransposons (Gladyshev and Arkhipova 2007). Moreover, in organisms lacking telomerase (Drosophila) or with low telomerase expression (e.g. the silkworm Bombyx mori), telomeric retrotransposons can supplant telomerase function (Servant and Deininger 2016; Fujiwara 2015). This evidence supports the long-standing model that these two distinct forms of telomere maintenance may have arisen from a common ancestral transposon (Eickbush 1997; Nakamura and Cech 1998; Pardue and DeBaryshe 2014). If so, then the cooperation between Drosophila telomeric retrotransposons and their hosts may be an evolutionary replay of how an ancient group of retroelements (perhaps Penelope-like) maintained their activity prior to being coopted by the host for telomere maintenance (Fig. 1.4A).

O. trifallax and its TBEs appear to be engaged in a mutualistic, or perhaps addicted, relationship. But other ciliates, such as Paramecium and Tetrahymena, appear to have advanced that relationship a step further: these species utilize fully domesticated transposase proteins to mediate their MIC to MAC transition (Fig. 1.4B). In Paramecium, PiggyMac (Pgm) is a piggyBac-derived transposase that catalyzes DNA elimination and interacts with five additional related Pgm-like proteins to ensure complete IES targeting (Bischerour et al. 2018; Bétermier and Duharcourt 2014). Tetrahymena also utilizes a suite of related transposase-derived genes (TPB1, TPB2, TPB6, and LIA5) to ensure proper genome rearrangement and IES deletion (Cheng et al. 2016; Vogt and Mochizuki 2013). Although these proteins are all clearly related to piggyBac transposases, they are extremely diverged from each other and share no close similarity to any extant TEs in these species, indicating that they have been fully and possibly independently coopted (Bétermier and Duharcourt 2014; Cheng et al. 2010; Vogt et al. 2013). Whether these transposases were once derived from elements with activity similar to that of TBEs in Oxytricha is unknown, but there are clear sequence similarities between ciliate IESs and the termini of transposons (Hamilton et al. 2016; Cheng et al. 2016; Fass et al. 2011). These observations suggest that the Pgm and TPB transposases may have been coopted as a means to resolve an “addiction” to their cognate elements similar to that seen in Oxytricha, which would link their cooption directly to their initial cooperation with their hosts (Fig. 1.4B).

1.7 OUTLOOK

In this review, we have examined two host-encoded TE silencing systems, piRNAs and KRAB-ZFPs, that carry strong signatures of adaptive evolution in some lineages, suggesting that they are engaged in an arms race with TEs (McLaughlin and Malik 2017). However, there is still no evidence that TEs have evolved mechanisms that directly neutralize either of these two pathways, and a variety of other factors may explain their rapid evolution. Clearly, more work is needed to better understand the forces driving the diversification of these systems and to gain a fuller picture of their interactions with TEs across a broad range of species.

Adaptive evolution of the piRNA pathway has been extensively studied in insects but not in mammals, and the few studies that have addressed this gap suggest that mammalian piRNA proteins have not diversified extensively (Yi et al. 2014). One explanation for this contrast could be that mammals have replaced the antiviral arm of the piRNA pathway with other systems. Indeed, mammalian cells possess a variety of antiviral responses, including nucleic acid sensors and interferon responses, which might have supplanted the piRNA pathway in this immune function or relieved it of this role (reviewed in tenOever 2016). Such a scenario would support the hypothesis that piRNA pathway evolution outside of mammals is driven primarily by its antiviral function.

Detailed studies of KRAB-ZFPs have so far been carried out only in mice and humans, but these genes are also dramatically expanded in diverse tetrapods (Imbeault et al. 2017). While evidence has accumulated for an intricate coevolution of KRAB-ZFPs and L1 retrotransposons in primates (Jacobs et al. 2014; Fernandes et al. 2018) (Fig. 1.2B), it remains unclear how common such a tug of war is. It may be illuminating to investigate these questions in tetrapods with more aggressive TE activity, such as frogs, axolotl, opossum or bats (Pritham and Feschotte 2007; Ray et al. 2008; Rogers et al. 2018; Sotero-Caio et al. 2017; Mikkelsen et al. 2007; Hellsten et al. 2010; Nowoshilow et al. 2018).

Another take-home message is that the arms race is only one form of host-TE interaction in eukaryotes, and perhaps not the most prevalent. TEs have also evolved subtle evasive strategies, as well as self-control and targeting mechanisms that likely attenuate the cost of transposition on host fitness. Some TEs even appear to have engaged in cooperative strategies with their host organism in a way that resembles a mutualistic or symbiotic relationship. While few cases of mutualism have surfaced thus far in eukaryotes, this strategy is commonplace in prokaryotes (Wintersdorff et al. 2016). It is possible that symbiotic interactions are widespread in eukaryotes but harder to detect because they are challenging to uncover and test experimentally, in part because of the large numbers of TEs that must be manipulated simultaneously (e.g. retrotransposons in mammalian embryogenesis). The advent of genome editing and other large-scale perturbation approaches offers powerful new tools to overcome these challenges (Bourque et al. 2018; Fuentes et al. 2018; Smith et al. 2019). It is also possible that many mutualistic interactions are evolutionarily unstable, because they are prone to tilt toward disproportionately benefiting either the TE (e.g. Drosophila telomeres) or the host, in which case they may turn into full cooption events (Fig. 1.5).

Fig. 1.5 Model for host-TE interactions. Conflict: TEs (purple) harm the host (orange), leading to host silencing of TEs. TEs occasionally evolve direct anti-silencing mechanisms (dashed line). Most host-TE conflict leads to TE death. Cooperation: TEs evolve self-regulatory mechanisms to mitigate impacts on host fitness. Hosts and TEs can also evolve a mutualistic relationship. Cooperation can lead to both conflict and cooption. Cooption: The host repurposes all or part of a TE for a novel host function at the expense of the TE.

We therefore envision a model whereby the host and TE cooperate for a period of time, and this cooperation resolves in one of three ways: 1) the TE no longer cooperates, leading to reactivation and possible loss of the family from the population if it is too active (arms race); 2) the TE fades into obscurity due to relaxed selection pressure on its sequence; or 3) the host maintains TE features for cellular function rather than the TE family as a whole, leading to eventual loss of the TE family (cooption) (Fig. 1.5). Validating this model will require the study of transitional systems such as those described in this review, and of others that are likely to surface.

REFERENCES

Ade C, Roy-Engel AM, Deininger PL. 2013. Alu elements: an intrinsic source of instability. Current Opinion in Virology 3: 639–645.

Augé-Gouillou C, Hamelin MH, Demattei MV, Periquet M, Bigot Y. 2001. The wild-type conformation of the Mos-1 Inverted Terminal Repeats is suboptimal for transposition in bacteria. Mol Gen Genomics 265: 51–57.

Bagijn MP, Goldstein LD, Sapetschnig A, Weick E-M, Bouasker S, Lehrbach NJ, Simard MJ, Miska EA. 2012. Function, Targets, and Evolution of Caenorhabditis elegans piRNAs. Science 337: 574.

Batista PJ, Ruby JG, Claycomb JM, Chiang R, Fahlgren N, Kasschau KD, Chaves DA, Gu W, Vasale JJ, Duan S, et al. 2008. PRG-1 and 21U-RNAs Interact to Form the piRNA Complex Required for Fertility in C. elegans. Molecular Cell 31: 67–78.

Begun DJ, Lindfors HA, Kern AD, Jones CD. 2007. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176: 1131–1137.

Bennetzen JL, Wang H. 2014. The Contributions of Transposable Elements to the Structure, Function, and Evolution of Plant Genomes. Annu Rev Plant Biol 65: 505–530.

Bétermier M, Duharcourt S. 2014. Programmed Rearrangement in Ciliates: Paramecium. Microbiology spectrum 2.

Bischerour J, Bhullar S, Denby Wilkes C, Régnier V, Mathy N, Dubois E, Singh A, Swart E, Arnaiz O, Sperling L, et al. 2018. Six domesticated PiggyBac transposases together carry out programmed DNA elimination in Paramecium. eLife 7: 990.

Blumenstiel JP, Erwin AA, Hemmer LW. 2016. What Drives Positive Selection in the Drosophila piRNA Machinery? The Genomic Autoimmunity Hypothesis. The Yale journal of biology and medicine 89: 499–512.

Borges F, Martienssen RA. 2015. The expanding world of small RNAs in plants. Nature Reviews Molecular Cell Biology 16: 727–741.

Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvak Z, Levin HL, Macfarlan TS, et al. 2018. Ten things you should know about transposable elements. Genome Biol 19: 199.

Brennecke J, Malone CD, Aravin AA, Sachidanandam R, Stark A, Hannon GJ. 2008. An epigenetic role for maternally inherited piRNAs in transposon silencing. Science 322: 1387–1392.

Brind’Amour J, Kobayashi H, Albert JR, Shirane K, Sakashita A, Kamio A, Bogutz A, Koike T, Karimi MM, Lefebvre L, et al. 2018. LTR retrotransposons transcribed in oocytes drive species-specific and heritable changes in DNA methylation. Nature Communications 9: 3331.

Bucheton A, Paro R, Sang HM, Pelisson A, Finnegan DJ. 1984. The molecular basis of I-R hybrid Dysgenesis in Drosophila melanogaster: Identification, cloning, and properties of the I factor. Cell 38: 153–163.

Burns KH. 2017. Transposable elements in cancer. Nature Reviews Cancer 17: 415–424.

Burt A, Trivers R. 2006. Genes in Conflict. Belknap Press of Harvard University Press, Cambridge.

Capkova Frydrychova R, Biessmann H, Mason JM. 2008. Regulation of telomere length in Drosophila. Cytogenetic and Genome Research 122: 356–364.

Carmell MA, Girard A, van de Kant HJG, Bourc’his D, Bestor TH, de Rooij DG, Hannon GJ. 2007. MIWI2 Is Essential for Spermatogenesis and Repression of Transposons in the Mouse Male Germline. Developmental Cell 12: 503–514.

Casacuberta E. 2017. Drosophila: Retrotransposons Making up Telomeres. Viruses 9.

Castillo DM, Mell JC, Box KS, Blumenstiel JP. 2011. Molecular evolution under increasing transposable element burden in Drosophila: A speed limit on the evolutionary arms race. BMC Evolutionary Biology 11: 258.

Chang C-H, Chavan A, Palladino J, Wei X, Martins NMC, Santinello B, Chen C-C, Erceg J, Beliveau BJ, Wu C-T, et al. 2019. Islands of retroelements are the major components of Drosophila centromeres. bioRxiv 537357.

Chen X, Bracht JR, Goldman AD, Dolzhenko E, Clay DM, Swart EC, Perlman DH, Doak TG, Stuart A, Amemiya CT, et al. 2014. The Architecture of a Scrambled Genome Reveals Massive Levels of Genomic Rearrangement during Development. Cell 158: 1187–1198.

Chen X, Landweber LF. 2016. Phylogenomic analysis reveals genome-wide purifying selection on TBE transposons in the ciliate Oxytricha. Mobile DNA 7: 2.

Cheng C-Y, Vogt A, Mochizuki K, Yao M-C. 2010. A Domesticated piggyBac Transposase Plays Key Roles in Heterochromatin Dynamics and DNA Cleavage during Programmed DNA Deletion in Tetrahymena thermophila. MBoC 21: 1753–1762.

Cheng C-Y, Young JM, Lin C-YG, Chao J-L, Malik HS, Yao M-C. 2016. The piggyBac transposon-derived genes TPB1 and TPB6 mediate essential transposon-like excision during the developmental rearrangement of key genes in Tetrahymena thermophila. Genes Dev 30: 2724–2736.

Cheung S, Manhas S, Measday V. 2018. Retrotransposon targeting to RNA polymerase III-transcribed genes. Mobile DNA 9: 14.

Christensen S, Pont-Kingdon G, Carroll D. 2000. Comparative studies of the endonucleases from two related Xenopus laevis retrotransposons, Tx1L and Tx2L: target site specificity and evolutionary implications. Genetica 110: 245–256.

Chuong EB, Elde NC, Feschotte C. 2017. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet 18: 71–86.

C. elegans Sequencing Consortium. 1998. Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology. Science 282: 2012.

Crow MS, Lum KK, Sheng X, Song B, Cristea IM. 2016. Diverse mechanisms evolved by DNA viruses to inhibit early host defenses. Critical Reviews in Biochemistry and Molecular Biology 51: 452–481.

Czech B, Munafò M, Ciabrelli F, Eastwood EL, Fabry MH, Kneuss E, Hannon GJ. 2018. piRNA-Guided Genome Defense: From Biogenesis to Silencing. Annu Rev Genet 52: 131–157.

De Cecco M, Ito T, Petrashen AP, Elias AE, Skvir NJ, Criscione SW, Caligiana A, Brocculi G, Adney EM, Boeke JD, et al. 2019. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566: 73–78.

De Fazio S, Bartonicek N, Di Giacomo M, Abreu-Goodger C, Sankar A, Funaya C, Antony C, Moreira PN, Enright AJ, O’Carroll D. 2011. The endonuclease activity of Mili fuels piRNA amplification that silences LINE1 elements. Nature 480: 259–263.

De Iaco A, Planet E, Coluccio A, Verp S, Duc J, Trono D. 2017. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nature Genetics 49: 941–945.

de Koning APJ, Gu W, Castoe TA, Batzer MA, Pollock DD. 2011. Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLOS Genetics 7: e1002384.

Deininger PL, Moran JV, Batzer MA, Kazazian HH. 2003. Mobile elements and mammalian genome evolution. Current Opinion in Genetics & Development 13: 651–658.

Deng W, Lin H. 2002. miwi, a Murine Homolog of piwi, Encodes a Cytoplasmic Protein Essential for Spermatogenesis. Developmental Cell 2: 819–830.

Dietrich CR, Cui F, Packila ML, Li J, Ashlock DA, Nikolau BJ, Schnable PS. 2002. Maize Mu transposons are targeted to the 5' untranslated region of the gl8 gene and sequences flanking Mu target-site duplications exhibit nonrandom nucleotide composition throughout the genome. Genetics 160: 697–716.

Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601–603.

Dubnau J. 2018. The Retrotransposon storm and the dangers of a Collyer's genome. Current Opinion in Genetics & Development 49: 95–105.

Ecco G, Cassano M, Kauzlaric A, Duc J, Coluccio A, Offner S, Imbeault M, Rowe HM, Turelli P, Trono D. 2016. Transposable Elements and Their KRAB-ZFP Controllers Regulate Gene Expression in Adult Tissues. Developmental Cell 36: 611–623.

Eickbush TH. 1997. Telomerase and Retrotransposons: Which Came First? Science 277: 911.

Eickbush TH, Furano AV. 2002. Fruit flies and humans respond differently to retrotransposons. Current Opinion in Genetics & Development 12: 669–674.

Emerson RO, Thomas JH. 2009. Adaptive evolution in zinc finger transcription factors. PLOS Genetics 5: e1000325.

Ernst C, Odom DT, Kutter C. 2017. The emergence of piRNAs against transposon invasion to preserve mammalian genome integrity. Nature Communications 8: 1411.

Erwin JA, Marchetto MC, Gage FH. 2014. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat Rev Neurosci 15: 497–506.

Fablet M, Akkouche A, Braman V, Vieira C. 2014. Variable expression levels detected in the Drosophila effectors of piRNA biogenesis. Gene 537: 149–153.

Fass JN, Joshi NA, Couvillion MT, Bowen J, Gorovsky MA, Hamilton EP, Orias E, Hong K, Coyne RS, Eisen JA, et al. 2011. Genome-Scale Analysis of Programmed DNA Elimination Sites in Tetrahymena thermophila. G3 1: 515–522.

Fedoroff NV. 2012. Presidential address. Transposable elements, epigenetics, and genome evolution. Science 338: 758–767.

Fernandes JD, Haeussler M, Armstrong J, Tigyi K, Gu J, Filippi N, Pierce J, Thisner T, Angulo P, Katzman S, et al. 2018. KRAB Zinc Finger Proteins coordinate across evolutionary time scales to battle retroelements. bioRxiv 429563.

Feschotte C, Pritham E. 2007. DNA Transposons and the Evolution of Eukaryotic Genomes. Annu Rev Genet 41: 331–368.

Frank JA, Feschotte C. 2017. Co-option of endogenous viral sequences for host cell function. Current Opinion in Virology 25: 81–89.

Friedli M, Trono D. 2015. The Developmental Control of Transposable Elements and the Evolution of Higher Species. Annual Review of Cell and Developmental Biology 31.

Frost RJA, Hamra FK, Richardson JA, Qi X, Bassel-Duby R, Olson EN. 2010. MOV10L1 is necessary for protection of spermatocytes against retrotransposons by Piwi-interacting RNAs. Proc Natl Acad Sci USA 107: 11847.

Fu Y, Kawabe A, Etcheverry M, Ito T, Toyoda A, Fujiyama A, Colot V, Tarutani Y, Kakutani T. 2013. Mobilization of a plant transposon by expression of the transposon-encoded anti-silencing factor. EMBO J 32: 2407–2417.

Fuentes DR, Swigut T, Wysocka J. 2018. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. eLife 7: e35989.

Fujiwara H. 2015. Site-specific non-LTR retrotransposons (eds. Craig, Chandler, Gellert, Lambowitz, Rice, and Sandmeyer). Microbiology spectrum 3: 1147–1163.

Fujiwara H, Osanai M, Matsumoto T, Kojima KK. 2005. Telomere-specific non-LTR retrotransposons and telomere maintenance in the silkworm, Bombyx mori. Chromosome Res 13: 455–467.

Gagnier L, Belancio VP, Mager DL. 2019. Mouse germ line mutations due to retrotransposon insertions. Mobile DNA.

Gilbert C, Feschotte C. 2018. Horizontal acquisition of transposable elements and viral sequences: patterns and consequences. Current Opinion in Genetics & Development 49: 15–24.

Gladyshev EA, Arkhipova IR. 2007. Telomere-associated endonuclease-deficient Penelope-like retroelements in diverse eukaryotes. Proc Natl Acad Sci USA 104: 9352.

Goodier JL. 2016. Restricting retrotransposons: a review. Mobile DNA 7: 16.

Göke J, Lu X, Chan Y-S, Ng HH, Ly L-H, Sachs F, Szczerbinska I. 2015. Dynamic Transcription of Distinct Classes of Endogenous Retroviral Elements Marks Specific Populations of Early Human Embryonic Cells. Cell Stem Cell 16: 135–141.

Grow EJ, Flynn RA, Chavez SL, Bayless NL, Wossidlo M, Wesche DJ, Martin L, Ware CB, Blish CA, Chang HY, et al. 2015. Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 522: 221–225.

Haig D. 2016. Transposable elements: Self-seekers of the germline, team-players of the soma. BioEssays 38: 1158–1166.

Hamilton EP, Kapusta A, Huvos PE, Bidwell SL, Zafar N, Tang H, Hadjithomas M, Krishnakumar V, Badger JH, Caler EV, et al. 2016. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome. eLife 5: e19090.

Han K, Lee J, Meyer TJ, Remedios P, Goodwin L, Batzer MA. 2008. L1 recombination-associated deletions generate human genomic variation. Proc Natl Acad Sci USA 105: 19366.

Hancks DC, Kazazian HH. 2016. Roles for retrotransposon insertions in human disease. Mobile DNA 7: 9.

Heinlein M, Brattig T, Kunze R. 1994. In vivo aggregation of maize Activator (Ac) transposase in nuclei of maize endosperm and Petunia protoplasts. Plant J 5: 705–714.

Hellsten U, Harland RM, Gilchrist MJ, Hendrix D, Jurka J, Kapitonov V, Ovcharenko I, Putnam NH, Shu S, Taher L, et al. 2010. The Genome of the Western Clawed Frog Xenopus tropicalis. Science 328: 633.

Hendrickson PG, Doráis JA, Grow EJ, Whiddon JL, Lim J-W, Wike CL, Weaver BD, Pflueger C, Emery BR, Wilcox AL, et al. 2017. Conserved roles of mouse DUX and human DUX4 in activating cleavage-stage genes and MERVL/HERVL retrotransposons. Nature Genetics 49: 925–934.

Hollister JD, Gaut BS. 2009. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res 19: 1419–1428.

Hosaka A, Saito R, Takashima K, Sasaki T, Fu Y, Kawabe A, Ito T, Toyoda A, Fujiyama A, Tarutani Y, et al. 2017. Evolution of sequence-specific anti-silencing systems in Arabidopsis. Nature Communications 8: 2161.

Hoskins RA, Carlson JW, Wan KH, Park S, Mendez I, Galle SE, Booth BW, Pfeiffer BD, George RA, Svirskas R, et al. 2015. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res.

Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. 2013. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496: 498–503.

Hurst GDD, Werren JH. 2001. The role of selfish genetic elements in eukaryotic evolution. Nat Rev Genet 2: 597–606.

Hynes AP, Rousseau GM, Agudelo D, Goulet A, Amigues B, Loehr J, Romero DA, Fremaux C, Horvath P, Doyon Y, et al. 2018. Widespread anti-CRISPR proteins in virulent bacteriophages inhibit a range of Cas9 proteins. Nature Communications 9: 2919.

Imbeault M, Helleboid P-Y, Trono D. 2017. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543: 550–554.

Ito J, Sugimoto R, Nakaoka H, Yamada S, Kimura T, Hayano T, Inoue I. 2017. Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses. PLOS Genetics 13: e1006883.

Ito K, Baudino L, Kihara M, Leroy V, Vyse TJ, Evans LH, Izui S. 2013. Three Sgp loci act independently as well as synergistically to elevate the expression of specific endogenous retroviruses implicated in murine lupus. Journal of Autoimmunity 43: 10–17.

Izsvak Z, Wang J, Singh M, Mager DL, Hurst LD. 2015. Pluripotency and the endogenous retrovirus HERVH: Conflict or serendipity? BioEssays 38: 109–117.

Jachowicz JW, Bing X, Pontabry J, Boskovic A, Rando OJ, Torres-Padilla M-E. 2017. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nature Genetics 49: 1502–1510.

Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, Paten B, Salama SR, Haussler D. 2014. An evolutionary arms race between KRAB zinc finger genes 91/93 and SVA/L1 retrotransposons. Nature 516: 242–245.

Jangam D, Feschotte C, Betrán E. 2017. Transposable Element Domestication As an Adaptation to Evolutionary Conflicts. Trends in Genetics 33: 817–831.

Jiang N, Wessler SR. 2001. Insertion Preference of Maize and Rice Miniature Inverted Repeat Transposable Elements as Revealed by the Analysis of Nested Elements. Plant Cell 13: 2553.

Kabayama Y, Toh H, Katanaya A, Sakurai T, Chuma S, Kuramochi-Miyagawa S, Saga Y, Nakano T, Sasaki H. 2017. Roles of MIWI, MILI and PLD6 in small RNA regulation in mouse growing oocytes. Nucleic Acids Res 12 : gkx027–5398.

Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, Yandell M, Feschotte C. 2013. Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs. PLOS Genetics 9: e1003470.

Kelleher ES, Barbash DA. 2013. Analysis of piRNA-Mediated Silencing of Active TEs in Drosophila melanogaster Suggests Limits on the Evolution of Host Genome Defense. Molecular Biology and Evolution 30: 1816–1829.

Kelley D, Rinn J. 2012. Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol 13: R107.

Kidwell MG, Kidwell JF, Sved JA. 1977. Hybrid Dysgenesis in Drosophila melanogaster: A Syndrome of Aberrant Traits Including Mutation, Sterility and Male Recombination. Genetics 86: 813–833.

Kidwell MG, Lisch DR. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution 55: 1–24.

Kigami D, Minami N, Takayama H, Imai H. 2003. MuERV-L Is One of the Earliest Transcribed Genes in Mouse One-Cell Embryos. Biology of Reproduction 68: 651–654.

Klattenhoff C, Xi H, Li C, Lee S, Xu J, Khurana JS, Zhang F, Schultz N, Koppetsch BS, Nowosielska A, et al. 2009. The Drosophila HP1 Homolog Rhino Is Required for Transposon Silencing and piRNA Production by Dual-Strand Clusters. Cell 138: 1137–1149.

Klemm SL, Shipony Z, Greenleaf WJ. 2019. Chromatin accessibility and the regulatory epigenome. Nat Rev Genet 20: 207–220.

Klose RJ, Bird AP. 2006. Genomic DNA methylation: the mark and its mediators. Trends in Biochemical Sciences 31: 89–97.

Kordyukova M, Olovnikov I, Kalmykova A. 2018. Transposon control mechanisms in telomere biology. Current Opinion in Genetics & Development 49: 56–62.

Kruse K, Diaz N, Enriquez-Gasca R, Gaume X, Torres-Padilla M-E, Vaquerizas JM. 2019. Transposable elements drive reorganisation of 3D chromatin during early embryogenesis. bioRxiv 523712.

Kunarso G, Chia N-Y, Jeyakani J, Hwang C, Lu X, Chan Y-S, Ng HH, Bourque G. 2010. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nature Genetics 42: 631–634.

Kuramochi-Miyagawa S, Kimura T, Ijiri TW, Isobe T, Asada N, Fujita Y, Ikawa M, Iwai N, Okabe M, Deng W, et al. 2004. Mili, a mammalian member of piwi family gene, is essential for spermatogenesis. Development 131: 839–849.

Kuramochi-Miyagawa S, Watanabe T, Gotoh K, Totoki Y, Toyoda A, Ikawa M, Asada N, Kojima K, Yamaguchi Y, Ijiri TW, et al. 2008. DNA methylation of retrotransposon genes is regulated by Piwi family members MILI and MIWI2 in murine fetal testes. Genes Dev 22: 908–917.

Lampe DJ, Akerley BJ, Rubin EJ, Mekalanos JJ, Robertson HM. 1999. Hyperactive transposase mutants of the Himar1 mariner transposon. Proc Natl Acad Sci USA 96: 11428–11433.

Landsberger M, Gandon S, Meaden S, Rollie C, Chevallereau A, Chabas H, Buckling A, Westra ER, van Houte S. 2018. Anti-CRISPR Phages Cooperate to Overcome CRISPR-Cas Immunity. Cell 174: 908–916.e12.

Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG. 2008. Evolution of protein-coding genes in Drosophila. Trends in Genetics 24: 114–123.

Laski FA, Rio DC, Rubin GM. 1986. Tissue specificity of Drosophila P element transposition is regulated at the level of mRNA splicing. Cell 44: 7–19.

Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ, Lohr JG, Harris CC, Ding L, Wilson RK, et al. 2012a. Landscape of somatic retrotransposition in human cancers. Science 337: 967–971.

Lee H-C, Gu W, Shirayama M, Youngman E, Conte D Jr, Mello CC. 2012b. C. elegans piRNAs Mediate the Genome-wide Surveillance of Germline Transcripts. Cell 150: 78–87.

Lee YCG. 2015. The Role of piRNA-Mediated Epigenetic Silencing in the Population Dynamics of Transposable Elements in Drosophila melanogaster. PLOS Genetics 11: e1005269.

Lee YCG, Leek C, Levine MT. 2016. Recurrent Innovation at Genes Required for Telomere Integrity in Drosophila. Molecular Biology and Evolution 34: 467–482.

Levin HL, Boeke JD. 1992. Demonstration of retrotransposition of the Tf1 element in fission yeast. EMBO J 11 : 1145–1153.

Lewis SH, Quarles KA, Yang Y, Tanguy M, Frézal L, Smith SA, Sharma PP, Cordaux R, Gilbert C, Giraud I, et al. 2018. Pan-arthropod analysis reveals somatic piRNAs as an ancestral defence against transposable elements. Nat Ecol Evol 2: 174–181.

Lewis SH, Salmela H, Obbard DJ. 2016. Duplication and Diversification of Dipteran Argonaute Genes, and the Evolutionary Divergence of Piwi and Aubergine. Genome Biology and Evolution 8: 507–518.

Léger P, Lara E, Jagla B, Sismeiro O, Mansuroglu Z, Coppée JY, Bonnefoy E, Bouloy M. 2013. Dicer-2- and Piwi-Mediated RNA Interference in Rift Valley Fever Virus-Infected Mosquito Cells. J Virol 87 : 1631–1648.

Liao G-C, Rehm EJ, Rubin GM. 2000. Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc Natl Acad Sci USA 97 : 3347–3351.

Liu D, Chalmers R. 2013. Hyperactive mariner transposons are created by mutations that disrupt allosterism and increase the rate of transposon end synapsis. Nucleic Acids Res 42 : 2637–2645.

Lohe AR, Hartl DL. 1996. Autoregulation of mariner transposase activity by overproduction and dominant-negative complementation. Molecular Biology and Evolution 13 : 549–555.

Lu Q, Ren S, Lu M, Zhang Y, Zhu D, Zhang X, Li T. 2013. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 14 : 651–651.


Lu X, Sachs F, Ramsay L, Jacques P-É, Göke J, Bourque G, Ng HH. 2014. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nature Structural & Molecular Biology 21: 423–425.

Macfarlan TS, Gifford WD, Driscoll S, Lettieri K, Rowe HM, Bonanomi D, Firth A, Singer O, Trono D, Pfaff SL. 2012. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487 : 57–63.

Mackay TFC, Richards S, Stone EA, Barbadilla A, Ayroles JF, Zhu D, Casillas S, Han Y, Magwire MM, Cridland JM, et al. 2012. The Drosophila melanogaster Genetic Reference Panel. Nature 482 : 173–178.

Majumdar S, Rio DC. 2015. P Transposable Elements in Drosophila and other Eukaryotic Organisms eds. Craig, Chandler, Gellert, Lambowitz, Rice, and Sandmeyer. Microbiology spectrum 3: 727–752.

Marí-Ordóñez A, Marchais A, Etcheverry M, et al. 2013. Reconstructing de novo silencing of an active plant retrotransposon. Nature Genetics 45: 1029–1039.

Matthews BJ, Dudchenko O, Kingan SB, Koren S, Antoshechkin I, Crawford JE, Glassford WJ, Herre M, Redmond SN, Rose NH, et al. 2018. Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature 563 : 501–507.

Mátés L, Chuah MKL, Belay E, Jerchow B, Manoj N, Acosta-Sanchez A, Grzela DP, Schmitt A, Becker K, Matrai J, et al. 2009. Molecular evolution of a novel hyperactive Sleeping Beauty transposase enables robust stable gene transfer in vertebrates. Nature Genetics 41: 753–761.

McCue AD, Nuthikattu S, Slotkin RK. 2014. Genome-wide identification of genes regulated in trans by transposable element small interfering RNAs. RNA Biology 10: 1379–1395.

McLaughlin RN Jr, Malik HS. 2017a. Genetic conflicts: the usual suspects and beyond. The Journal of experimental biology 220 : 6–17.

Miesen P, Ivens A, Buck AH, van Rij RP. 2016. Small RNA Profiling in Dengue Virus 2-Infected Aedes Mosquito Cells Reveals Viral piRNAs and Novel Host miRNAs ed. G.D. Ebel. PLOS Neglected Tropical Diseases 10 : e0004452.

Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, et al. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447 : 167–177.


Mohn F, Sienski G, Handler D, Brennecke J. 2014. The Rhino-Deadlock-Cutoff Complex Licenses Noncanonical Transcription of Dual-Strand piRNA Clusters in Drosophila. Cell 157 : 1364–1379.

Montgomery EA, Huang SM, Langley CH, Judd BH. 1991. Chromosome rearrangement by ectopic recombination in Drosophila melanogaster: genome structure and evolution. Genetics 129 : 1085–1098.

Morazzani EM, Wiley MR, Murreddu MG, Adelman ZN, Myles KM. 2012. Production of Virus-Derived Ping-Pong-Dependent piRNA-like Small RNAs in the Mosquito Soma ed. S.-W. Ding. PLOS Pathogens 8: e1002470.

Naito K, Zhang F, Tsukiyama T, Saito H, Hancock CN, Richardson AO, Okumoto Y, Tanisaka T, Wessler SR. 2009. Unexpected consequences of a sudden and massive transposon amplification on rice gene expression. Nature 461 : 1130–1134.

Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, Albu M, Weirauch MT, Radovani E, Kim PM, et al. 2015. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotech advance online publication.

Nakamura TM, Cech TR. 1998. Reversing Time: Origin of Telomerase. Cell 92 : 587–590.

Neuffer MG, Coe EH, Wessler SR. 1997. Mutants of Maize. Revised. Cold Spring Harbor Laboratory Press.

Newkirk SJ, Lee S, Grandi FC, Gaysinskaya V, Rosser JM, Vanden Berg N, Hogarth CA, Marchetto MCN, Muotri AR, Griswold MD, et al. 2017. Intact piRNA pathway prevents L1 mobilization in male meiosis. Proc Natl Acad Sci USA 114 : E5635.

Nosaka M, Ishiwata A, Shimizu-Sato S, Ono A, Ishimoto K, Noda Y, Sato Y. 2014. The copy number of rice CACTA DNA transposons carrying MIR820 does not correlate with MIR820 expression. Plant Signaling & Behavior 8: e25169.

Nosaka M, Itoh J-I, Nagato Y, Ono A, Ishiwata A, Sato Y. 2012. Role of Transposon-Derived Small RNAs in the Interplay between Genomes and Parasitic DNA in Rice ed. A.C. Ferguson-Smith. PLOS Genetics 8: e1002953.

Nowacki M, Higgins BP, Maquilan GM, Swart EC, Doak TG, Landweber LF. 2009. A Functional Role for Transposases in a Large Eukaryotic Genome. Science 324 : 935–938.


Nowick K, Hamilton AT, Zhang H, Stubbs L. 2010. Rapid sequence and expression divergence suggest selection for novel function in primate-specific KRAB-ZNF genes. Molecular Biology and Evolution 27: 2606–2617.

Nowoshilow S, Schloissnig S, Fei J-F, Dahl A, Pang AWC, Pippel M, Winkler S, Hastie AR, Young G, Roscito JG, et al. 2018. The axolotl genome and the evolution of key tissue formation regulators. Nature 554 : 50–55.

Obbard DJ, Gordon KHJ, Buck AH, Jiggins FM. 2009. The evolution of RNAi as a defence against viruses and transposable elements. Philos Trans R Soc Lond, B, Biol Sci 364 : 99–115.

Orgel LE, Crick FHC. 1980. Selfish DNA: the ultimate parasite. Nature 284 : 604–607.

Ottina E, Levy P, Eksmond U, Merkenschlager J, Young GR, Roels J, Stoye JP, Tüting T, Calado DP, Kassiotis G. 2018. Restoration of Endogenous Retrovirus Infectivity Impacts Mouse Cancer Models. Cancer Immunol Res 6: 1292.

Ozata DM, Gainetdinov I, Zoch A, O’Carroll D, Zamore PD. 2019. PIWI-interacting RNAs: small RNAs with big functions. Nat Rev Genet 20: 89–108.

Pardue M-L, DeBaryshe PG. 2014. Drosophila Telomeres: A Variation on the Telomerase Theme. Fly 2: 101–110.

Pardue ML, DeBaryshe PG. 2011. Retrotransposons that maintain chromosome ends. Proc Natl Acad Sci USA 108 : 20317–20324.

Parhad SS, Theurkauf WE. 2018. Rapid evolution and conserved function of the piRNA pathway. Open Biology 9: 180181.

Parhad SS, Tu S, Weng Z, Theurkauf WE. 2017. Adaptive Evolution Leads to Cross-Species Incompatibility in the piRNA Transposon Silencing Machinery. Developmental Cell 43 : 60–70.e5.

Parrish NF, Fujino K, Shiromoto Y, Iwasaki YW, Ha H, Xing J, Makino A, Kuramochi-Miyagawa S, Nakano T, Siomi H, et al. 2015. piRNAs derived from ancient viral processed pseudogenes as transgenerational sequence-specific immune memory in mammals. RNA 21: 1691–1703.

Pasyukova EG. 2004. Accumulation of Transposable Elements in the Genome of Drosophila melanogaster is Associated with a Decrease in Fitness. Journal of Heredity 95 : 284–290.


Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, Knowles BB. 2004. Retrotransposons Regulate Host Genes in Mouse Oocytes and Preimplantation Embryos. Developmental Cell 7: 597–606.

Peddigari S, Li PW-L, Rabe JL, MARTIN SL. 2013. hnRNPL and nucleolin bind LINE-1 RNA and function as host factors to modulate retrotransposition. Nucleic Acids Res 41 : 575–585.

Penton EH, Crease TJ. 2004. Evolution of the Transposable Element Pokey in the Ribosomal DNA of Species in the Subgenus Daphnia (Crustacea: Cladocera). Molecular Biology and Evolution 21 : 1727–1739.

Percharde M, Lin C-J, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, Biechele S, Huang B, Shen X, Ramalho-Santos M. 2018. A LINE1-Nucleolin Partnership Regulates Early Development and ESC Identity. Cell 174: 391–405.e19.

Petit M, Mongelli V, Frangeul L, Blanc H, Jiggins F, Saleh M-C. 2016. piRNA pathway is not required for antiviral defense in Drosophila melanogaster. Proc Natl Acad Sci USA 113 : E4218–E4227.

Pérez-González CE, Eickbush TH. 2002. Rates of R1 and R2 retrotransposition and elimination from the rDNA locus of Drosophila melanogaster. Genetics 162 : 799–811.

Pritham EJ, Feschotte C. 2007. Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus. Proceedings of the National Academy of Sciences of the United States of America 104: 1895–1900.

Rashkova S, Karam SE, Kellum R, Pardue M-L. 2002. Gag proteins of the two Drosophila telomeric retrotransposons are targeted to chromosome ends. J Cell Biol 159 : 397.

Ray DA, Feschotte C, Pagan HJT, Smith JD, Pritham EJ, Arensburger P, Atkinson PW, Craig NL. 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res 18 : 717–728.

Robinson HL, Astrin SM, Senior AM, Salazar FH. 1981. Host Susceptibility to endogenous viruses: defective, glycoprotein-expressing proviruses interfere with infections. J Virol 40 : 745–751.

Rodriguez-Terrones D, Torres-Padilla M-E. 2018. Nimble and Ready to Mingle: Transposon Outbursts of Early Development. Trends in Genetics.

Rogers RL, Summers K, Wu Y, Guo C, Zheng J, Xun X, Xiong Z, Yang H, Zhou L, Zhang G, et al. 2018. Genomic Takeover by Transposable Elements in the Strawberry Poison Frog. Molecular Biology and Evolution 35: 2913–2927.

Saha A, Mitchell JA, Nishida Y, Hildreth JE, Ariberre JA, Gilbert WV, Garfinkel DJ. 2015. A trans-dominant form of Gag restricts Ty1 retrotransposition and mediates copy number control. ed. W.I. Sundquist. J Virol 89 : 3922–3938.

Saint-Leandre B, Nguyen SC, Levine M. 2018. Diversification and collapse of a telomere elongation mechanism. bioRxiv 445429.

Santoni FA, Guerra J, Luban J. 2012. HERV-H RNA is abundant in human embryonic stem cells and a precise marker for pluripotency. Retrovirology 9: 111–111.

Schauer SN, Carreira PE, Shukla R, Gerhardt DJ, Gerdes P, Sánchez-Luque FJ, Nicoli P, Kindlova M, Ghisletti S, Santos Dos A, et al. 2018. L1 retrotransposition is a common feature of mammalian hepatocarcinogenesis. Genome Res 28 : 639–653.

Schmidt D, Durrett R. 2004. Adaptive Evolution Drives the Diversification of Zinc-Finger Binding Domains. Molecular Biology and Evolution 21: 2326–2339.

Schmitges FW, Radovani E, Najafabadi HS, Barazandeh M, Campitelli LF, Yin Y, Jolma A, Zhong G, Guo H, Kanagalingam T, et al. 2016. Multiparameter functional diversity of human C2H2 zinc finger proteins. Genome Res 26 : 1742–1752.

Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al. 2009. The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science 326 : 1112.

Schnettler E, Donald CL, Human S, Watson M, Siu RWC, McFarlane M, Fazakerley JK, Kohl A, Fragkoudis R. 2013. Knockdown of piRNA pathway proteins results in enhanced Semliki Forest virus production in mosquito cells. Journal of General Virology 94 : 1680–1689.

Servant G, Deininger PL. 2016. Insertion of Retrotransposons at Chromosome Ends: Adaptive Response to Chromosome Maintenance. Front Genet 6: 1620.

Simkin A, Wong A, Poh Y-P, Theurkauf WE, Jensen JD. 2013. Recurrent and recent selective sweeps in the piRNA pathway. Evolution 67: 1081–1090.


Sinzelle L, Izsvak Z, Ivics Z. 2009. Molecular domestication of transposable elements: From detrimental parasites to useful host genes. Cell Mol Life Sci 66 : 1073–1093.

Smith CJ, Castanon O, Said K, Volf V, Khoshakhlagh P, Hornick A, Ferreira R, Wu C-T, Güell M, Garg S, et al. 2019. Enabling large-scale genome editing by reducing DNA nicking. bioRxiv 574020.

Song J, Liu J, Schnakenberg SL, Ha H, Xing J, Chen KC. 2014. Variation in piRNA and Transposable Element Content in Strains of Drosophila melanogaster. Genome Biology and Evolution 6: 2786–2798.

Sotero-Caio CG, Platt RN II, Suh A, Ray DA. 2017. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biology and Evolution 9: 161–177.

Stitzer MC, Anderson SN, Springer NM, Ross-Ibarra J. 2019. The Genomic Ecosystem of Transposable Elements in Maize. bioRxiv 559922.

Sultana T, Zamborlini A, Cristofari G, Lesage P. 2017. Integration site selection by retroviruses and transposable elements in eukaryotes. Nat Rev Genet 18 : 292–308.

Sun YH, Xie LH, Zhuo X, Chen Q, Ghoneim D, Zhang B, Jagne J, Yang C, Li XZ. 2017. Domestic chickens activate a piRNA defense against avian leukosis virus. eLife 6: 8634.

Svoboda P, Stein P, Anger M, Bernstein E, Hannon GJ, Schultz RM. 2004. RNAi and expression of retrotransposons MuERV-L and IAP in preimplantation mouse embryos. Developmental Biology 269 : 276–285.

Swart EC, Bracht JR, Magrini V, Minx P, Chen X, Zhou Y, Khurana JS, Goldman AD, Nowacki M, Schotanus K, et al. 2013. The Oxytricha trifallax Macronuclear Genome: A Complex Eukaryotic Genome with 16,000 Tiny Chromosomes ed. J.A. Eisen. PLOS Biology 11 : e1001473–e1001473.

Tang Z, Steranka JP, Ma S, Grivainis M, Rodić N, Huang CRL, Shih I-M, Wang T-L, Boeke JD, Fenyo D, et al. 2017. Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer. Proc Natl Acad Sci USA 114 : E733–E740.

Teixeira FK, Okuniewska M, Malone CD, Coux R-X, Rio DC, Lehmann R. 2017. piRNA-mediated regulation of transposon alternative splicing in the soma and germ line. Nature 8: 272–272.

tenOever BR. 2016. The Evolution of Antiviral Defense Systems. Cell Host & Microbe 19: 142–149.


Thomas JH, Schneider S. 2011. Coevolution of retroelements and tandem zinc finger genes. Genome Res 21 : 1800–1812.

Treger RS, Pope SD, Kong Y, Tokuyama M, Taura M, Iwasaki A. 2019. The Lupus Susceptibility Locus Sgp3 Encodes the Suppressor of Endogenous Retrovirus Expression SNERV. Immunity 50 : 334–347.e9.

Trono D. 2015. Transposable Elements, Polydactyl Proteins, and the Genesis of Human-Specific Transcription Networks. Cold Spring Harbor symposia on quantitative biology 80 : 281–288.

Tubio JMC, Li Y, Ju YS, Martincorena I, Cooke SL, Tojo M, Gundem G, Pipinikas CP, Zamora J, Raine K, et al. 2014. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345 : 1251343–1251343.

Van Valen L. 1973. A new evolutionary law. Evolutionary Theory 1–30.

Vodovar N, Bronkhorst AW, van Cleef KWR, Miesen P, Blanc H, van Rij RP, Saleh M-C. 2012. Arbovirus-Derived piRNAs Exhibit a Ping-Pong Signature in Mosquito Cells ed. S. Pfeffer. PLOS ONE 7: e30861.

Vogt A, Goldman AD, Mochizuki K, Landweber LF. 2013. Transposon Domestication versus Mutualism in Ciliate Genome Rearrangements ed. S.M. Rosenberg. PLOS Genetics 9: e1003659–e1003659.

Vogt A, Mochizuki K. 2013. A domesticated PiggyBac transposase interacts with heterochromatin and catalyzes reproducible DNA elimination in Tetrahymena. PLOS Genetics 9: e1004032–e1004032.

Wang G, Reinke V. 2008. A C. elegans Piwi, PRG-1, Regulates 21U-RNAs during Spermatogenesis. Current Biology 18 : 861–867.

Wang J, Xie G, Singh M, Ghanbarian AT, Raskó T, Szvetnik A, Cai H, Besser D, Prigione A, Fuchs NV, et al. 2014. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature 516 : 405–409.

Wang L, Barbash D, Kelleher E. 2019. Divergence of piRNA pathway proteins affects piRNA biogenesis but not TE transcript level. bioRxiv 521773.

Wang L, Dou K, Moon S, Tan FJ, Zhang ZZ. 2018. Hijacking Oogenesis Enables Massive Propagation of LINE and Retroviral Transposons. Cell.

Werren JH. 2011. Selfish genetic elements, genetic conflict, and evolutionary innovation. Proc Natl Acad Sci USA 108 Suppl 2 : 10863–10870.


Whiddon JL, Langford AT, Wong C-J, Zhong JW, Tapscott SJ. 2017. Conservation and innovation in the DUX4-family gene network. Nature Genetics 49: 935–940.

Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. 2007. A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8: 973–982.

von Wintersdorff CJH, Penders J, van Niekerk JM, Mills ND, Majumder S, van Alphen LB, Savelkoul PHM, Wolffs PFG. 2016. Dissemination of Antimicrobial Resistance in Microbial Ecosystems through Horizontal Gene Transfer. Front Microbiol 7: 173.

Wolf D, Goff SP. 2009. Embryonic stem cells use ZFP809 to silence retroviral DNAs. Nature 458 : 1201–1204.

Wolf G, Yang P, Füchtbauer AC, Füchtbauer E-M, Silva AM, Park C, Wu W, Nielsen AL, Pedersen FS, Macfarlan TS. 2015. The KRAB zinc finger protein ZFP809 is required to initiate epigenetic silencing of endogenous retroviruses. Genes Dev 29 : 538–554.

Woodard LE, Downes LM, Lee Y-C, Kaja A, Terefe ES, Wilson MH. 2017. Temporal self-regulation of transposition through host-independent transposase rodlet formation. Nucleic Acids Res 45 : 353–366.

Wu F, Schweizer C, Rudinskiy N, Taylor DM, Kazantsev A, Luthi-Carter R, Fraering PC. 2010. Novel γ-secretase inhibitors uncover a common nucleotide-binding site in JAK3, SIRT2, and PS1. The FASEB Journal 24 : 2464–2474.

Yang P, Wang Y, Hoang D, Tinkham M, Patel A, Sun M-A, Wolf G, Baker M, Chien H-C, Lai K-YN, et al. 2017a. A placental growth factor is silenced in mouse embryos by the zinc finger protein ZFP568. Science 356 : 757–759.

Yang P, Wang Y, Macfarlan TS. 2017b. The Role of KRAB-ZFPs in Transposable Element Repression and Mammalian Evolution. Trends in Genetics 33 : 871–881.

Yi M, Chen F, Luo M, Cheng Y, Zhao H, Cheng H, Zhou R. 2014. Rapid Evolution of piRNA Pathway in the Teleost Fish: Implication for an Adaptation to Transposon Diversity. Genome Biology and Evolution 6: 1393–1407.

Yoder JA, Walsh CP, Bestor TH. 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends in Genetics 13 : 335–340.


Zhang Y, Li T, Preissl S, Grinstein J, Farah E, Destici E, Lee AY, Chee S, Qiu Y, Ma K, et al. 2018. 3D Chromatin Architecture Remodeling during Human Cardiomyocyte Differentiation Reveals A Novel Role of HERV-H In Demarcating Chromatin Domains. bioRxiv 485961.

Zou S, Ke N, Kim JM, Voytas DF. 1996. The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev 10 : 634–645.

CHAPTER 2

RECURRENT EVOLUTION OF VERTEBRATE TRANSCRIPTION FACTORS VIA TRANSPOSASE CAPTURE 2

2.1 ABSTRACT

Although genes play a critical role in organismal evolution, the mechanisms by which genes of novel function evolve remain obscure. Here we investigate the capacity of transposases, proteins that promote DNA transposon mobility, to supply domains and splice sites for the assembly of novel genes via exon-shuffling. We find that transposase domains have been captured to form new host fusion proteins at least 88 times independently over ~350 Myr of tetrapod evolution. Transposase capture occurs primarily through alternative splicing and is biased toward the fusion of transposase DNA-binding domains to host chromatin regulatory domains, in particular the transcriptionally repressive KRAB domain. Consistent with this, we find that KRAB-transposase fusions born in four distinct mammalian lineages can repress gene expression in a sequence-specific fashion. Functional studies of one KRAB-transposase fusion, KRABINER, in bat cells demonstrate that it acts as a transcriptional regulator. Transposase capture is thus a heretofore underappreciated mechanism to generate novel vertebrate transcription factors (TFs).

2 This chapter is currently in preparation for publication and will be available on bioRxiv after submission (Cosby RL, Judd J, Zhang R, Zhong A, Garry N, Pritham E, and Feschotte C). RC, EP, and FC developed the project. RC designed all experiments, performed all experiments except PRO-seq (JJ), analyzed all data, and wrote the manuscript. With help from RC, RZ performed evolutionary analyses. AZ, with help from RC, performed the luciferase assays. NG, with help from RC, generated transgenic rescue lines. FC assisted in writing and editing the manuscript.

2.2 INTRODUCTION

As the fundamental unit of inheritance, genes play a critical role in organismal evolution. Consequently, the origins of genes and the mechanisms by which new genes evolve are an active area of study. No mechanism of gene birth has been studied more extensively than gene duplication (Ohno 1970), which has contributed to the expansion of many gene families, including the essential developmental regulatory Hox (Ruddle, et al. 1994) and Pax genes (Bouchard, et al. 2008). Although these gene duplicates have evolved divergent developmental functions relative to their inferred ancestral gene, their biochemical functions as transcription factors remain constrained (Ruddle, et al. 1994; Bouchard, et al. 2008). Genes with entirely novel biochemical functions occasionally arise via gene duplication (Deng, et al. 2010; Lynch 2007), but this appears to be rare (Innan 2009). This raises the question: how do genes with entirely novel biochemical functions evolve? Such genes could evolve de novo, which recent studies suggest may be common (for review, see Van Oss and Carvunis 2019). However, de novo gene birth is likely to be a slow process, limiting its overall utility. Neither gene duplication nor de novo gene birth therefore seems sufficient to explain the evolution of genes with novel biological function.

Exon-shuffling could address the limitations of both gene duplication and de novo gene birth. Initially proposed by Gilbert (1978), exon-shuffling is the process whereby protein domains with distinct biochemical functions are brought together to generate a new gene, typically through splicing. In this way, diverse protein domains, with their pre-built functions, can combine to rapidly generate proteins with myriad possible functions. Exon-shuffling is therefore a potentially powerful mechanism to generate novel functional genes, but it requires a source of both protein domains and splice sites as raw material.

One possible source of raw material for exon-shuffling is DNA transposons, or . DNA transposons encode transposase proteins, which recognize and mobilize their cognate transposons through direct, sequence-specific interactions with transposon DNA (Feschotte and Pritham, 2007). Because of this life cycle, transposase proteins possess both sequence-specific DNA-binding domains and catalytic domains, either of which could be repurposed, or co-opted, by the host (Feschotte and Pritham, 2007).

Moreover, the very mobility of DNA transposons would facilitate exon-shuffling by introducing these functional domains into new genomic contexts, where they could then be spliced to host domains to generate novel host-transposase fusion (HTF) genes. This process, transposase capture, has been proposed as the origin of the Pax gene family, which appears to have acquired its paired DNA-binding domain from an ancient mariner transposase (Breitling and Gerber 2000). Additional anecdotal examples have also been described (Tellier and Chalmers, 2019; Gray, et al. 2012; Feschotte and Pritham 2007), but the extent of transposase capture, the mechanisms facilitating it, and the functions of the resulting genes remain unclear. Here we show that transposase capture is a recurrent and widespread mechanism for novel gene birth via exon-shuffling over the past ~400 million years of vertebrate evolution, and that this process appears predisposed to generate new transcription factors (TFs).

2.3 RESULTS

2.3.1 Transposase capture is a pervasive mechanism to generate novel genes in tetrapods

Table 2.1 NCBI RefSeq genomes queried in HTF search
Taxon name | Taxon ID | # genomes
Tetrapoda | 32523 | 596
Mammalia | 40674 | 373
Eutheria | 9347 | 366
Metatheria | 9263 | 6
Monotremata | 9255 | 1
Sauropsida | 8457 | 213
Aves | 8782 | 160
Testudines | 8459 | 20
Crocodilia | 1294634 | 4
Squamata | 8509 | 28
Amphibia | 8292 | 10

To identify host-transposase fusion (HTF) genes, we used the Conserved Domain Architecture Retrieval Tool (CDART; Geer, et al. 2002) to survey all tetrapod gene annotations (NCBI RefSeq; Table 2.1) for predicted proteins with at least one domain of transposase origin (Pfam; Table 2.2) fused in-frame to host-derived protein sequence, and we retained only gene models with RNA-sequencing evidence supporting all annotated exon/intron junctions (see Methods). To trace the evolutionary origin of each HTF gene, we searched for syntenic orthologs as well as paralogs across all vertebrate genomes available in the NCBI RefSeq database (see Methods). This analysis revealed 98 unique HTF genes originating from 88 independent fusion events and 10 subsequent duplication events across the 596 species examined (Fig. 2.1; Table 2.3). This is likely a minimal estimate given the stringency of our pipeline, which requires robust gene models supported by transcriptomic data.
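The core filtering criteria of this search can be illustrated with a short Python sketch. It is a hypothetical simplification, not the pipeline used here (see Methods): the `GeneModel` record, its fields, and `is_htf_candidate` are names introduced for illustration, the domain names are taken from Table 2.2, and "host-derived sequence" is approximated as "at least one domain outside the transposase query set".

```python
from dataclasses import dataclass, field

# Query domains as named in Table 2.2. Treating every hit to these names as
# transposase-derived is a simplifying assumption of this sketch.
TRANSPOSASE_DOMAINS = frozenset({
    "HTH_Tnp_Tc5", "HTH", "rve", "DDE_1_7", "DDE_1_4",
    "zf-BED", "Dimer_Tnp_hAT", "THAP", "FLYWCH",
})


@dataclass
class GeneModel:
    """Hypothetical record for one annotated protein-coding gene model."""
    gene_id: str
    domains: list[str]                    # domain names, N- to C-terminus
    junctions: list[tuple[int, int]]      # annotated (donor, acceptor) pairs
    rnaseq_junctions: set[tuple[int, int]] = field(default_factory=set)


def is_htf_candidate(model: GeneModel) -> bool:
    """True if the model carries a transposase domain, a host domain, and
    RNA-seq support for every annotated exon/intron junction."""
    has_tnp = any(d in TRANSPOSASE_DOMAINS for d in model.domains)
    has_host = any(d not in TRANSPOSASE_DOMAINS for d in model.domains)
    fully_supported = all(j in model.rnaseq_junctions for j in model.junctions)
    return has_tnp and has_host and fully_supported


# Usage with made-up coordinates:
fused = GeneModel("toy_KRAB_mariner", ["KRAB", "HTH_Tnp_Tc5", "rve"],
                  junctions=[(120, 900)], rnaseq_junctions={(120, 900)})
unsupported = GeneModel("toy_unsupported", ["KRAB", "HTH_Tnp_Tc5"],
                        junctions=[(120, 900), (1000, 1500)],
                        rnaseq_junctions={(120, 900)})
print(is_htf_candidate(fused), is_htf_candidate(unsupported))  # True False
```

Requiring support for every junction, rather than most, is what makes such a filter conservative: a single unsupported junction discards the model, which is one reason the resulting count is best read as a minimum.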


Table 2.2 Transposase Pfam domains used as query for HTF search
ID | Accession | Type | Description | TE superfamily (refs)
HTH_Tnp_Tc5 | PF03221 | Domain | Tc5 transposase DNA-binding domain | Tc1/mariner (1, 2)
HTH | cl21459 | Superfamily | Helix-turn-helix domains | Tc1/mariner (1, 2)
rve | PF00665 | Domain | Integrase core domain | Tc1/mariner, Ginger (3, 4)
DDE_1_7 | PF13843 | Domain | Transposase IS4 | PiggyBac (3)
DDE_1_4 | PF13701 | Domain | Transposase DDE domain group 1 | PIF/Harbinger (3)
zf-BED | PF02892 | Domain | BED zinc finger | hAT (5, 6)
Dimer_Tnp_hAT | PF05699 | Domain | hAT family C-terminal dimerisation region | hAT (7)
THAP | cl02739 | Superfamily | THAP domain | P element, Kolobok (8, 9)
FLYWCH | PF04500 | Domain | FLYWCH zinc finger domain | Mutator (10)
References: 1 Pietrokovski and Henikoff 1997; 2 Aravind, et al. 2005; 3 Yuan and Wessler 2011; 4 Bao, et al. 2010; 5 Aravind 2000; 6 Hayward, et al. 2013; 7 Mobile DNA III 2015; 8 Roussigne, et al. 2003; 9 Kapitonov and Jurka 2007


Fig. 2.1: Gene birth by transposase capture is pervasive in tetrapods. Boxes represent HTF genes; color indicates the transposase superfamily captured. OWM=Old World monkeys; NWM=New World monkeys; GM=Gray mouse; H=Hystricoid; C=Castorid; M=Muroid; Miniopt=Miniopterid; Vesper=Vespertilionid; S.S=soft-shelled; B=bearded dragon; G=Green; B=Burmese python; L=Lacertid; J=Japanese; T=Tropical; A=African; M=Mountain; LCA=last common ancestor; MY=million years.


Fusion events appear to have occurred continuously throughout tetrapod evolution. Some (11.4%) preceded the divergence of tetrapods, whereas others arose more recently, as indicated by their restriction to relatively small lineages (< 5 species; 26.1%) or to a single species (20.4%) (Fig. 2.1; Table 2.3).
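One way to turn a phylogenetic distribution into an age bracket like those in Table 2.3 is to place the fusion on the branch leading to the most recent common ancestor (MRCA) of all species carrying the syntenic ortholog. The Python sketch below assumes parsimony with no secondary losses and uses a hypothetical toy tree whose node ages are illustrative placeholders, not the divergence times used in this study; `TREE`, `mrca`, and `fusion_age_bracket` are names introduced here.

```python
from typing import Optional

# Hypothetical toy tree: node -> (parent, node age in My). Ages are
# illustrative placeholders, not the divergence times used in the study.
TREE: dict[str, tuple[Optional[str], float]] = {
    "Tetrapoda": (None, 350.0),
    "Amphibia": ("Tetrapoda", 290.0),
    "Amniota": ("Tetrapoda", 320.0),
    "Mammalia": ("Amniota", 180.0),
    "Sauropsida": ("Amniota", 280.0),
    "frog": ("Amphibia", 0.0),
    "human": ("Mammalia", 0.0),
    "platypus": ("Mammalia", 0.0),
    "anole": ("Sauropsida", 0.0),
}


def path_to_root(node: str) -> list[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = TREE[current][0]
    return path


def mrca(carriers: list[str]) -> str:
    """Most recent common ancestor of all carrier species."""
    paths = [path_to_root(c) for c in carriers]
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    return next(n for n in paths[0] if n in shared)


def fusion_age_bracket(carriers: list[str]) -> tuple[str, float, Optional[float]]:
    """Assuming no secondary losses, the fusion arose on the branch above the
    carriers' MRCA: no younger than the MRCA and no older than its parent.
    Returns (MRCA, minimum age, maximum age or None at the root)."""
    node = mrca(carriers)
    parent = TREE[node][0]
    max_age = TREE[parent][1] if parent is not None else None
    return node, TREE[node][1], max_age


print(fusion_age_bracket(["human", "platypus"]))  # ('Mammalia', 180.0, 320.0)
print(fusion_age_bracket(["frog"]))               # ('frog', 0.0, 290.0)
```

A fusion found in a single species receives only an upper bound, analogous to the "< 58" style entries in Table 2.3, while one shared by amphibians and amniotes maps to the tetrapod root.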

Several lineages acquired multiple HTFs of recent origin, including the green anole (n = 7), the Burmese python (n = 3), the tropical clawed frog (n = 2), and vespertilionid bats (n = 2), consistent with recent episodes of DNA transposon activity documented in these lineages (Fig. 2.1; Alföldi, et al. 2011; Castoe, et al. 2013; Mitros, et al. 2019; Ray, et al. 2007; Pritham and Feschotte 2007).

Mammals generally have more HTF genes per species (mean = 40.16 ± 2.73) than other classes (reptiles: mean = 29.61 ± 2.53; amphibians: mean = 27.3 ± 2.08), reflecting apparent bursts of HTF evolution in the mammalian (5.7%), therian (3.4%), and eutherian (9.1%) ancestors. All known major eukaryotic DNA transposon superfamilies contributed HTFs (Fig. 2.1), but Tc1/mariner (42.9%), hAT (23.5%), and P element/Kolobok (21.4%) transposases predominate, mirroring the success of these superfamilies throughout tetrapod evolution (Fig. 2.1; Sotero-Caio, et al. 2017).

To test whether the transposase-derived coding region of each HTF gene has evolved under functional constraint, we performed codon-based selection analyses on every HTF shared by two or more species separated by >50 Myr of divergence (n = 57/98). All tested HTF transposase regions display signatures of purifying selection (dN/dS < 1, p < 0.001, likelihood ratio test [LRT]; Table 2.3), consistent with their domestication for organismal function. Taken together, these data suggest that transposase capture has been a recurrent mechanism for the generation of novel cellular genes in tetrapod evolution.
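The likelihood ratio test behind such p values can be illustrated with a short Python sketch. We assume here, as is common with codon models such as those implemented in PAML's codeml, that a null model with ω = dN/dS fixed at 1 is compared with an alternative in which ω is estimated freely, giving one degree of freedom; the models actually compared are given in Methods, and the log-likelihood values below are invented. `lrt_df1` and `purifying_selection_supported` are hypothetical helper names.

```python
import math


def lrt_df1(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test with one degree of freedom.

    lnl_null: log-likelihood with omega (dN/dS) fixed at 1 (neutral evolution).
    lnl_alt:  log-likelihood with omega free to vary.
    Returns (statistic, p value). For df = 1 the chi-square survival function
    equals erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # guard against tiny negatives
    return stat, math.erfc(math.sqrt(stat / 2.0))


def purifying_selection_supported(omega_hat: float, lnl_null: float,
                                  lnl_alt: float, alpha: float = 0.001) -> bool:
    """Require both an estimated omega below 1 and rejection of the neutral
    null at significance level `alpha`."""
    _, p = lrt_df1(lnl_null, lnl_alt)
    return omega_hat < 1.0 and p < alpha


# Invented values: the free-omega model improves lnL by 10 units.
stat, p = lrt_df1(lnl_null=-5010.0, lnl_alt=-5000.0)
print(round(stat, 1), f"{p:.2e}")
print(purifying_selection_supported(0.25, -5010.0, -5000.0))  # True
```

The direction check matters: a significant LRT alone only says ω differs from 1, so the estimated ω must also fall below 1 before the result is read as purifying selection.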

71 Table 2.3: Summary of identified host-transposase fusion genes

Study ID Taxonomic Estimated Transposase TE best hit Best Host Transcripts Gene birth Fusion dN/dS dN/dS Refseq species span fusion age type domains (Repbase) repabase domains present mechanism support p value ID'd in (my) (Pfam) hit (Pfam) (% ID/ %SIM) KTIGD1 Pteropodidae < 58 Tigger HTH_Tnp GOLEM/Tig 92 %ID KRAB (x2) Parental + fusion Alternative 100% cov; 0.384 0.0037 Pteropus _Tc5, ger3 splicing 4 samples vampyrus HTH, rve w/ SFAI (105304939), Pteropus alecto (102881560), Rousettus aegypticus (107502811) KMARD1 Chinchilla < 57 Mariner HTH_Tnp Mariner- 71 %ID KRAB Fusion only Splicing 100% cov; NT NT Chinchilla lanigera _Tc5, 1_PM 20 lanigera HTH, rve samples (102029308) w/ SFAI KMARD2 Monotremata 46-177 Tigger HTH_Tnp Mariner- 63 %ID KRAB Fusion only Splicing 100% cov; NT NT Ornithorhynchu _Tc5, 2_AMi/Mon 1 sample s anatinus HTH, rve oRep123/M w/ SFAI (103168937) ariner- 1_Crp KTIGD2 Monodelphis < 80 Tigger HTH_Tnp TIGGER7 76 %ID KRAB Fusion only Splicing 100% cov; NT NT Monodelphis domestica _Tc5, 15 domestica HTH, rve samples (103103968) w/ SFAI KMARD3 Anolis < 150 Mariner HTH_Tnp Mariner- 93 %ID KRAB, Fusion only Splicing 100% cov; NT NT Anolis a carolinensis _Tc5, 2_Acar SCAN, 2 samples carolinensis HTH, rve ZF_C2H2 w/ SFAI (103281019) (x4) KMARD3 Anolis < 150 Mariner HTH_Tnp Mariner- 93 %ID KRAB, Fusion only Segmental 100% cov; NT NT Anolis b carolinensis _Tc5, 2_Acar SCAN, duplication 4 samples carolinensis HTH, rve ZF_C2H2 w/ SFAI (100561333) (x4) KMARD4 Trionychinae < 53 Mariner HTH_Tnp Mariner- 79 %ID KRAB Fusion only Splicing 100% cov; 0.2481 0.0000 Pelodiscus _Tc5, 6_Crp 6 samples 6 11 sinensis HTH, rve, w/ SFAI (102461695), CENPB Apalone spinifera

KMARD5 Archosauria 250-300 Mariner HTH_Tnp Mariner- 20/37 %S KRAB Fusion only Splicing 100% cov; 0.077 0 Aquila _Tc5, rve 10_Crp/Mar IM 20 chrysaetos iner- samples canadensis, 13_Cgi/SM w/ SFAI Haliaeetus AR20/Marin leucocephalus, er-2_CM Ciconia boyciana KTIGD3 Phascolarcto < 50 Tigger HTH_Tnp Mariner- 89 %ID KRAB Fusion only Splicing 100% cov; NT NT Phascolarctos s cinereus _Tc5, 4_Ami/MAR 4 samples cinereus HTH INER3_MD w/ SFAI (110219927) KTIGD4 Metatheria 80-150 Tigger HTH_Tnp Mariner- 64 %ID KRAB Fusion only Splicing 100% cov; 0.2155 0 Phascolarctos _Tc5, 6_Crp, 11 3 cinereus HTH Mariner- samples (110199972), 37_SM/Mari w/ SFAI Monodelphis ner-37_SM domestica (103101550), Sarcophilis harrisi KMARD6 Xenopus < 54 Mariner HTH_Tnp Mariner- 32.4 %SI KRAB Parental + fusion Alternative 100% cov; NT NT Xenopus laevis laevis _Tc5, 4_NV M splicing 106 (108718340) HTH samples w/ SFAI KMARD7 Anolis < 150 Mariner HTH_Tnp Mariner- 65 %ID KRAB Fusion only Splicing 100% cov; NT NT Anolis carolinensis _Tc5, 1_Crp/Mari 16 carolinensis HTH ner-29_SM samples (107983704) w/ SFAI PGBD1 Theria 160-180 PiggyBac DDE_1_7 piggyBac- 39 %SIM KRAB, Fusion only Splicing 100% cov; 0.2463 0 All mammals; 2_Hmel SCAN 4 samples 1 ref id: w/ SFAI Microcebus murinus (105858740) KHARBI Toxicofera 170-200 PIF/Harbinger DDE_1_4 Harbinger- 67 %ID KRAB Fusion only Splicing 100% cov; 0.08 0 Pogona D1 2_BF/Harbi 10 vitticeps nger- samples (110070590), 2D_CPB/Ha w/ SFAI Anolis rbinger- carolinensis 7_CPB/Har (103280956), binger- Python 1_Gav bivittatus, Protobothrops mucros

73

KHARBI Python < 90 PIF/Harbinger DDE_1_4 Harbinger- 61 %ID KRAB Fusion only Splicing 100% cov; NT NT Python D2 bivittatus 2E_CPB/Ha 4 samples bivittatus rbinger- w/ SFAI (103049612) 5B_CPB/Ha rbinger- 7_CPB KHARBI Python < 90 PIF/Harbinger DDE_1_4 Harbinger- 62 %ID KRAB Fusion only Splicing 100% cov; NT NT Python D3 bivittatus 6_CPB/Har 3 samples bivittatus binger- w/ SFAI (103055168) 5B_CPB/Ha rbinger- 5D_CPB/Ha rbinger- 2F_CPB KHARBI Archosauria 250-300 PIF/Harbinger DDE_1_4 Harbinger- 70 %ID KRAB, Parental + fusion Alternative 100% cov; 0.0448 0 Alligator D4 4C_CPB/Ha ZF_C2H2 splicing 1 samples 9 missisipiensis rbinger- (x12) w/ SFAI (102577222), 9B_CPB/Ha Alligator rbinger- sinensis, 1C_Crp Apteryx rowi (112973620), other birds KHATD1 Testidunes 180-250 hAT zf-BED hAT- 32 %SIM KRAB (x2) Fusion only Splicing 93% cov; 0.67 0.0004 Chrysemys (some) (x2), 26_CPB/hA 3 samples 79 picta bellii Dimer_T T-34_Lmi w/ SFAI (101952564), np_hAT Chelonia mydas, Malaclemys terrapin terrapin, Terrapene carolina, Platysternon megacephalum, Dermochelys coriacea KTHAPD Xenopus < 57 P/Kolobok? THAP No sig hit NA KRAB, Fusion only Splicing 100% cov; NT NT Xenopus 1 tropicalis PHA03307, 3 samples tropicalis PHA03247 w/ SFAI (101731566) KKOLOD Anura 200-350 Kolobok THAP Kolobok- 31 %SIM KRAB, Fusion only Splicing 100% cov; 0.1078 0 Xenopus 1 4_Aqu ZF_C2H2 70 9 tropicalis (x4) samples (100495066), w/ SFAI Xenopus laevis

(108698464), Nanorana parkeri (108797006) KKOLOD Xenopus 57-200 Kolobok THAP Kolobok- 33/45 %S KRAB, Fusion only Splicing 100% cov; 0.1375 5.3290 Xenopus laevis 2 2_XT/Kolob IM ZF_C2H2 33 6 7E-15 (108701152), ok-1_XT (x4) samples Xenopus w/ SFAI tropicalis KTHAPD Testidunes 180-250 P/Kolobok? THAP Kolobok- 46/53 %S KRAB, Fusion only Splicing 100% cov; 0.1522 1.0235 Pelodiscus 2 1_Aqu/P- IM ZF_C2H2 12 9 5E-08 sinensis 11_Lsal (x4) samples (102446752), w/ SFAI Chelonia mydas, Chrysemys picta bellii, Malaclemys terrapin terrapin, Apalone spinifera (?) KTHAPD Reptiles 250-310 P/Kolobok? THAP Kolobok- 34/40 %S KRAB, Fusion only Splicing 100% cov; 0.0702 0 Alligator 3 1_Cte/Kolo IM ZF_C2H2 26 6 missisipiensis bok-1_CS (x4) samples (106737434), w/ SFAI Aptenodytes forsteri (109279250), Crocodylus porosus (109305755), Fulmarus glacialis (104079790), Struthio camelus australis (104143137), Alligator sinensis (106722913), Haliaeetus albicilla (104322907), Chrysemys

picta bellii (101936093), Pelodiscus sinensis (102445153), Aquila chrysaetos canadensis (10 5401065), Pygoscelis adeliae (103925982), Chelonia mydas (102931549), Manacus vitellinus (103756388), Geospiza fortis (103756388), Lepidothrix coronata (108508219), Parus major (107199954), Ficedula alibicollis (101816114), Sturnus vulgaris (106862799) KPELD1 Latimeria < 400 P element THAP P- 80 %ID KRAB Parental + fusion Alternative 100% cov; NT NT Latimeria chalumnae 2_Lsal/DNA splicing 5 samples chalumnae 1a_Lch w/ SFAI (106703257) KKOLOD Xenopus 57-200 Kolobok THAP Kolobok- 34/52 %S KRAB, Parental + fusion Alternative 100% cov; 0.0959 0 Xenopus 3 2_XT/Kolob IM ZF_C2H2 splicing 34 9 tropicalis ok-1_XT (x4) samples (100498365), w/ SFAI Xenopus laevis KRBA2 Eutheria 100-150 Ginger rve Ginger2- 66.18 %ID KRAB Fusion only Splicing 100% cov; 0.22 0 All placental 1_LMi 1 sample mammals w/ SFAI POGK Theria 150-180 Tigger BrkDBD, Mariner- 71.15 %ID KRAB Fusion only Splicing 100% cov; 0.03 0 All mammals CENPB, 10_Crp 1 sample rve w/ SFAI

ZNF862 Mammalia 180-312 hAT Dimer_T hAT- 30.66/24.9 KRAB Fusion only Splicing 100% cov; 0.09 0 Mammals + np_hAT 1_SK/hAT- 6 %SIM 39 monotremes 6_BF samples w/ SFAI KRABIN Vespertilionid 27-45 Mariner HTH, rve Mlmar1 NA KRAB Parental + fusion Alternative RTPCR, 0.32 0 Vespertillionid ER ae splicing sequencin bats g, RNA- seq, Pro- seq KHATD2 Anolis < 150 hAT ZnF_TTF hAT- 64.78/72.1 DUF4371, Fusion only Splicing 100% cov NT NT Anolis carolinensis , 35_LCh/hA 5/71.08 % KRAB carolinensis Dimer_T T- ID (100558310) np_hAT 16_HM/hAT -6_TC KMUTD1 Toxicofera 170-200 Mutator FLYWCH MuDR- 27.68 %S KRAB, Parental + fusion Alternative 100% cov; 0.1260 0 Thamnophis 5_Lsal IM SCAN splicing 7 samples 6 sirtalis w/ SFAI (106544370), Python bivittatus (103058648), Pogona vitticeps (110088025), Anolis carolinensis (103277990) KTIGD5 Vespertilionid 27-45 Tigger HTH_Tnp Mariner- 66.4 %ID KRAB, Parental + fusion Alternative de-novo NT NT Myotis occultus, ae _Tc5, 1_Crp DUF4404 splicing RNA-seq Myotis HTH, rve lucifugus, Eptesicus fuscus, Myotis velifer HATDG1 Alligatoridae < 90 hAT Zf-Bed, hAT- 87.474/93. Atrophin Fusion only Splicing 100% cov, 0.4733 0.2010 Alligator Dimer_T 3_Gav#DN 392 %ID w/ SFAI in 1 3078 mississipiensis np_hAT A/hAT-Tag1 19 (102561186), samples Alligator sinensis HATDG2 Serpentes 100-180 hAT ZnF_BE hAT- 69.149/66. SCAN Fusion only Splicing 100% cov, 0.3921 5.4439 Python D 11_AMi#DN 892 %ID w/ SFAI in 8E-12 bivittatus A/hAT-Ac 4 samples (103053651), Thamnophis sirtalis (106542826),

Notechis scutatus (113425109), Pseudonaja textilis (113450216) HATDG3 Pogona < 150 hAT ZnF_BE hAT-53_HM 0.4164 % SCAN Fusion only Splicing 100% cov, NT NT Pogona vitticeps D SIM w/ SFAI in vitticeps 11 (110090344) samples HATDG4 Muroidea ~65 hAT Dimer_T SPIN_Og# 85.897 %I CHCH, Parental + fusion Alternative 100% cov, 0.1913 0 Rattus np_hAT DNA/hAT- D DUF4371 splicing 1 sample 9 norvegicus Charlie w/ SFAI (288622), Mus musculus (71970), Mus pahari (110312903), Mus caroli (), Meriones unguiculatus, Cricetulus griseus, Mesocricetus auratus, Peromyscus maniculatus bairdii, Microtus ochrogaster, Peromyscus maniculatus) HATDG5 Durocryptodir 100-160 hAT Dimer_T hAT- 0.3762 % COG5048, Fusion only Splicing 100% cov, 0.4344 1.1225 Chrysemys a np_hAT 26_CPB SIM COG2888, w/ SFAI in 3 9E-05 picta belli ZF_C2H2, 19 (101943458), zf-H2C2_2, samples Terrapene DUF45 mexicana triunguis, Chelonia mydas HATDG6 Tetrapoda 350-410 hAT zf-BED, hAT- 0.2611 % SET Fusion only Splicing 100% cov, 0.1401 0 Xenopus (lost? In all Dimer_T 33_LCh SIM w/ SFAI in 6 tropicalis but Anura + np_hAT 63 (100486384), some reptiles) samples Python bivittatus (103053198),

Chrysemys picta belli (101941832), Thamnophis sirtalis (106542524), Protobothrops mucrosquamatu s (107282398), Xenopus laevis (108696725), Nanorana parkeri (108795475), Pogona vitticeps (110086948), Notechis scutatus (113413376), Pseudonaja textilis (113442840), Anolis carolinensis(10 3279277) hAT- Archosauria 250-300 hAT Dimer_T hAT-1_SK 0.25 %SI CH_2, NK Parental + fusion Alternative 100% cov, 0.3310 2.3542 Pelodiscus SPEF2 np_hAT, M splicing w/ SFAI in 9 1E-07 sinensis ZnF_TTF 8 samples (102447905), , Anser DUF4371 cygnoides domesticus (106036369), Alligator missisipiensis (102559565), Oryctolagus cuniculus (100353102), Apaloderma vittattum (104279174), Chrysemys

picta belllii (101939198) CSB- Primates 70-90 PiggyBac DDE_Tn piggyBac- 0.4334 % DEXDc, Parental + fusion Alternative 100% cov, 0.1164 0 Primates; ref id: PGBD3 p_1_7 2_Hmel SIM HELICc, splicing w/ SFAI in Cercocebus SNF2_N 14 atys samples (105591447) ZMYM2- Lemuriformes 60-70 PiggyBac DDE_Tn piggyBac- 0.3969 % zf-FCS Parental + fusion Alternative 100% cov, 0.1467 2.4944 Microcebus PGBD p_1_7 3_SM SIM splicing w/ SFAI in 5 9E-11 murinus 11 (105878851), samples Propithecus coquereli HARBID- Crocodilia 90-250 Harbinger DDE_Tn Harbinger- 0.3709 % FAM222A Fusion only Splicing 100% cov, 0.2271 0.0075 Alligator G1 p_4 2_CGi SIM w/ SFAI in 12626 sinensis(10236 11 7928), Alligator samples mississippiensis (102557726), Gavialis gangeticus (109302758), Crocodylus porosus (109306436) HARBID- Squamata 200-270 Harbinger DDE_Tn Harbinger- 0.4451 % SCAN Fusion only Splicing 100% cov, 0.0363 0 Python G2 p_1, 3_LCh SIM w/ SFAI in 6 bivittatus DDE_Tn 31 (103053249), p_4 samples Anolis carolinensis (103053249), Thamnophis sirtalis (106545642), Pogona vitticeps (110091519), Protobothrops mucrosquamatu s, Notechis scutatus, Pseudonaja textilis, Gekko japonicus

HARBID- Xenopus < 57 Harbinger DDE_Tn Harbinger- 92.432 %I IG_like, Ig Fusion only Splicing 100% cov, NT NT Xenopus G3 tropicalis p_4 1_XT#DNA/ D w/ SFAI in tropicalis PIF- 2 samples (100493151) Harbinger THAPD- Passeriforme 50-60 P THAP P-2_Lsal 0.4 %SIM ZF_C2H2 Fusion only Splicing 100% cov, 0.2621 2.6645 Ficedula G1 s element/Kolo w/ SFAI in 6 4E-15 albicollis bok 5 samples (101816114), Corvus cornix cornix (109144009), Parus major (107199954), Cyanistes caeruleus (111925580), Manacus vitellinus (103756388), Empidonax traillii (114072132), Corapipo altera (113959500), Lepidothrix coronata (108508219), Sturnus vulgaris, Pseudopodoces humilis, Serinus canaria, Corvus brachyrhynchos , Lonchura striata domestica, Taeniopysia gutatta, Geospiza fortis, Neopelma chrysocephalu m, Acanthisitta chloris

THAP2- Rodentia 73-82 P THAP P-25_Lsal 0.4432 % DUF92 Parental + fusion Alternative 100% cov, 0.0001 2.5786 Cricetulus TMEM19 element/Kolo SIM splicing w/ SFAI in 6E-07 griseus bok 7 samples (100760567), Peromyscus maniculatus bairdii (102927959), Mesocricetus auratus (101830095), Cavia porcellus (100732602) THAPD- Anura 200-350 P THAP ISL2EU- 0.3161 % DDE_Tnp_4, Fusion only Splicing 100% cov, 0.0493 5.3290 Xenopus laevis G2 element/Kolo 7_Hma SIM HTH_Tnp_4 w/ SFAI in 9 7E-14 (108699800), bok 101 Xenopus samples tropicalis (LOC10049440 3), Nanorana parkeri (108798727), Scutiger boulengeri THAPD- Squamata 200-270 P THAP P-2_LSal 0.3562 % COG5048, Fusion only Splicing 100% cov, 0.0451 0 Python G3 element/Kolo SIM SCAN, w/ SFAI in 8 bivitattus bok ZF_C2H2, 28 (103049051), zf-H2C2_2 samples Notechis scutatus (113424437), Pseudonaja textilis (113450944), Podarcis muralis, Pogona vitticeps, Gekko japonicus, Paroedura picta, Anolis carolinensis TIGD-G1 Cetartiodactyl 65-90 Tigger HTH_Tnp Mariner- 0.5458 % SCAN Fusion only Splicing 100% cov, 0.1978 0 Bubalus bubalis a _Tc5, 1_Crp SIM w/ SFAI in 4 (102398530), HTH, rve 24 Bison bison samples bison (105001376),

Ovis aries (101121964), Bos indicus (109572837), Odocoileus virgianus (110140937), Neophocaena asiaeorientalis (112396907) TIGD-G2 Otolemur <60 Tigger HTH_Tnp Mariner- 95.626 %I SCAN Parental + fusion Alternative 100% cov, NT NT Otolemur garnettii _Tc5, rve 3_XT#DNA/ D splicing w/ SFAI in garnetti TcMar-Tc2 1 sample (100942720) TIGD-G3 Anolis < 150 Tigger HTH_Tnp SMAR13#D 65.887 %I SCAN Fusion only Splicing 100% cov, NT NT Anolis carolinensis _Tc5, NA/TcMar- D w/ SFAI in carolinensis HTH, rve Tigger 5 samples (103278108) TIGD- Hystricomorp 45-70 Tigger HTH_Tnp Mariner- 66.214 %I SCAN, Parental + fusion Alternative 100% cov, 0.2637 2.2511 Fukomys G4- ha _Tc5, 2_PM#DNA D Myb_DNA- splicing w/ SFAI in 4 1E-08 damarensis Zscan29 HTH /TcMar- bind_4 11 (104862669), Tigger samples Cavia porcellus (100717023), Chinchilla lanigera (102028621), Heterocephalus glaber (101720736), Octodon degus TIGD-G5 Anolis < 150 Tigger HTH_Tnp Mariner- 92.96/90.1 SCAN Parental + fusion Alternative 100% cov, NT NT Anolis carolinensis _Tc5, 6_XT#DNA/ 04 %ID splicing w/ SFAI in carolinensis BrkDBD TcMar-Tc2 4 samples (103278093) TIGD-G6 Alligatoridae < 90 Tigger HTH_Tnp Mariner- 72.778/82. SCAN Parental + fusion Alternative 100% cov, 0.0001 0.0011 Alligator _Tc5, 10_Crp#DN 609 %ID splicing w/ SFAI in 62703 sinensis BrkDBD A/TcMar- 6 samples (102368406), Tc2 Alligator mississippiensis (102564554) TIGD- Perissodactyl 55-90 Tigger HTH_Tnp Tigger7#DN 93.991 %I PTZ00093, Parental + fusion Alternative 100% cov, 0.3623 0.0035 Equus caballus G7- a _Tc5 A/TcMar- D NDK splicing w/ SFAI in 1 65358 (100533966), NME1 Tigger 124 Equus samples przewalskii (103549548), Equus asinus

(106826468), Ceratotherium simum simum MARD- Lagomorphs 50-80 Mariner HTH_32 HSMAR2#D 90.189 %I Pumilio Parental + fusion Alternative 100% cov, 0.2723 1.7019 Oryctolagus G1- NA/TcMar- D splicing w/ SFAI in 2 9E-05 cuniculus PUM2 Mariner 21 (100358505), samples Ochotona princeps TIGD-G8 Euastraliadel 60-80 Tigger HTH, Mariner- 68.113/69. SCAN Fusion only Splicing 100% cov, 0.0675 0 Phascolarctos phia HTH_TN 7_Croc#DN 697 %ID w/ SFAI in cinereus P5, rve A/TcMar- 2 samples (110197848), Tigger Sarcophilus harrisii (100928377), Vombatus ursinus MARD- Laurasiatheri 90-100 Mariner HTH_29, Mariner- 69.955 %I UPF0547 Parental + fusion Alternative 100% cov, 0.3060 6.4584 Canis lupus G2- a HTH 13_AEc#D D splicing w/ SFAI in 9 6E-06 familiaris C16orf87 NA/TcMar- 72 (102151457), Mariner samples Mustela putorius furo (101674374), Leptonychotes weddellii (102743244), Odobenus rosmarus divergens (101370422), Neomonachus schauinslandi (110593610), Balaenoptera acutorostrata scammoni (103005581), Rousettus aegypticus (107512266), Camelus dromedarius (105091176), others

TIGD-G9 Anolis < 150 Tigger HTH_Tnp SMAR13#D 65.887 %I SCAN Fusion only Splicing 100% cov, NT NT Anolis carolinensis _Tc5, NA/TcMar- D w/ SFAI in carolinensis HTH, rve Tigger 5 samples (103278108) TIGD- Python < 90 Tigger HTH_Tnp MarsTigger 82.141 %I SCAN Parental + fusion Alternative 100% cov, NT NT Python G10 bivittatus _Tc5, 6#DNA/TcM D splicing w/ SFAI in bivittatus HTH, rve ar-Tigger 23 (103053031) samples TIGD- Lacertidae < 170 Tigger HTH_Tnp Mariner- 66.627/71. SCAN, Parental + fusion Alternative 100% cov, 0.2094 0 Podarcis G11 _Tc5, 1_MMa#DN 429 %ID COG5048 splicing w/ SFAI in 5 muralis HTH, rve A/TcMar- 7 samples (114592144), Tigger Lacerta viridis, Lacerta bilineata MARD- Gallus gallus < 55 Mariner HTH_32 Mariner1_G 91.922 %I STKc_VRK1 Parental + fusion Alternative 100% cov, NT NT Gallus gallus G3-VRK1 G#DNA/Tc D splicing w/ SFAI in (423443) Mar-Mariner 15 samples MARD- Eutheria 100-150 Mariner HTH [low SMAR25B 0.3485 % CutA1 Parental + fusion Alternative 98% cov, 1.1564 0.7534 Sus scrofa G4- E value in SIM splicing w/ SFAI in 3 99271 (102164650), CUTA conserve 146 Camelus d domain samples bactrianus search] (105065270), Camelus ferus (102511268), Myotis brandtii (102240272), Vicugna pagos (102545300), others MARD- Artiodactyla < 54 Mariner HTH_32 Mar1a_Tars 85.258 %I SAP Parental + fusion Alternative 100% cov, NT NT Bubalus bubalis G5-DEK i#DNA/TcM D splicing w/ SFAI in (102406335), ar-Mariner 144 Bos indicus samples (109576980), Box taurus (540945) SETMAR Anthropoid 40-60 Mariner HTH, Hsmar1 94 %ID SET, Pre-Set Parental + fusion Alternative 100% 0.2369 1.88E- Anthropoid primates Transpos splicing coverage 7 08 primates; ref id ase_1 w/ SFAI in Homo sapiens 1 sample (6419) POGZ Vertebrates 615-680 Tigger CENPB, Mariner- 27% SIM ZF_C2H2, Fusion only Splicing 100% 0.0420 0 Vertebrates; ref rve 21_LCh Med15, coverage 3 id Homo AccB w/ SFAI in sapiens (23126)

14 samples PAX1 Vertebrates 615-680 Mariner PAX Mariner- 36% SIM - Fusion only Gene 100% 0.0121 0 Vertebrates; ref 6_Adi duplication coverage 4 id Homo w/ SFAI in sapiens (5075) 1 sample PAX2 Metazoans 950-1100 Mariner PAX Mariner- 34% SIM Homeodoma Fusion only Splicing 100% 0.0001 0 Metazoans; ref 6_Adi in, Pax2_C coverage id Homo w/ SFAI in sapiens (5076) 1 sample PAX3 Vertebrates 615-680 Mariner PAX Mariner- 31% SIM Homeobox, Fusion only Gene 100% 0.0065 0 Vertebrates; ref 6_Adi Pax7, duplication coverage 7 id Homo MFAP1 w/ SFAI in sapiens (5077) 1 sample PAX4 Tetrapods 350-400 Mariner PAX Mariner- 31% SIM Homeobox Fusion only Gene 100% 0.1385 0 Vertebrates; ref 6_Adi duplication coverage 7 id Homo w/ SFAI in sapiens (5078) 1 sample PAX5 Vertebrates 615-680 Mariner PAX Mariner- 35% SIM Pax2_C Fusion only Gene 100% 0.0033 0 Vertebrates; ref 6_Adi duplication coverage 2 id Homo w/ SFAI in sapiens (5079) 1 sample PAX6 Metazoans 950-1100 Mariner PAX Mariner- 36% SIM Homeobox Fusion only Splicing 100% 0.0033 0 Metazoans; ref 6_Adi coverage id Homo w/ SFAI in sapiens (5080) 1 sample PAX7 Vertebrates 615-680 Mariner PAX Mariner- 37% SIM Pax7, Fusion only Gene mixed/part 0.0020 0 Vertebrates; ref 6_Adi Homeobox duplication ial sample 4 id Homo support sapiens (5081) PAX8 Vertebrates 615-680 Mariner PAX Mariner- 38% SIM Pax2_C Fusion only Gene mixed/part 0.0032 0 Vertebrates; ref 6_Adi duplication ial sample 6 id Homo support sapiens (7849) PAX9 Vertebrates 615-680 Mariner PAX Mariner- 36% SIM - Fusion only Gene 100% 0.0130 0 Vertebrates; ref 6_Adi duplication coverage 3 id Homo w/ SFAI in sapiens (5083) 1 sample GTF2IRD Eutheria 100-150 hAT - Charlie8 75 %SIM GTF2I (x2) Fusion only Splicing 100% 0.2986 7.60E- Eutheria; ref id 2 coverage 4 10 Homo sapiens w/ SFAI in (84163) 1 sample

GTF2IRD Primates 70-90 hAT - Charlie8 75 %SIM GTF2I (x2) Fusion only Gene 100% 0.2492 3.75E- Primates; ref id 2B duplication coverage 3 08 Homo sapiens w/ SFAI in (389524) 1 sample PRKRIR Vertebrates 615-680 hAT THAP, hAT-5_NV 38 %SIM DUF4371 Fusion only Splicing mixed/part NT NT Vertebrates; ref Dimer_T ial sample id Homo np_hAT support sapiens (5612) ZBED9 Tetrapods 350-400 hAT Dimer_T Ginger2- 50/87% SCAN Fusion only Splicing 100% 0.3014 0 Tetrapods; ref id np_hAT, 1_LMi/CHA SIM coverage 2 Homo sapiens rve, RLIE10 w/ SFAI in (114821) DUF4371 1 sample , CENP- F_N ZMYM6 Eutheria 100-150 hAT zf-BED hAT-2_XT 42 %SIM TRASH Fusion only Splicing mixed/part 0.1328 0 Eutheria; ref id ial sample 6 Homo sapiens support (9204) ZMYM1 Eutheria 100-150 hAT ZnF_TTF hAT-20_SM 28 %SIM zf-FCS, Fusion only Splicing 100% 0.2075 0 Eutheria; ref id , TRASH coverage Homo sapiens Dimer_T w/ SFAI in (79830) np_hAT, 1 sample DUF4371 ) ZNF618 Euteleostomi 430-470 hAT Dimer_T HERMES 20 %SIM ZF_C2H2, Fusion only Splicing 100% 0.3435 0 Euteleostomi; np_hAT Metaviral_G coverage 4 ref id Homo w/ SFAI in sapiens 1 sample (114991) KIAA158 Eutheria 100-150 hAT Dimer_T hAT-38_HM 82 %SIM - Fusion only Splicing 100% 0.0325 0 Eutheria; ref id 6 np_hAT coverage 8 Homo sapiens w/ SFAI in (57691) 1 sample SPEF2 Euteleostomi 430-470 hAT - - - CH_2, NK Fusion only Splicing 100% 0.2030 0 Euteleostomi; coverage 4 ref id Homo w/ SFAI in sapiens (79925) 1 sample ZMYM5 Eutheria 100-150 hAT ZnF_TTF hAT-55_HM 40 %SIM zf-FCS Fusion only Splicing 100% 0.4056 2.61E- Eutheria; ref id coverage 4 08 Homo sapiens w/ SFAI in (9205) 1 sample HATDG7 Theria 150-180 hAT - Charlie8hA 75 %SIM - Fusion only Splicing 100% 0.1481 8.9150 Theria; ref id -MED25 THomo coverage 3 9E-14 Homo sapiens w/ SFAI in (81857) 1 sample

THAP1 Gnathostoma 470-615 P element THAP P-13_HM 33 %SIM - Fusion only Splicing 100% 0.0171 0 Gnathostomata; ta coverage 5 ref id Homo w/ SFAI in sapiens (55145) 1 sample THAP2 Mammalia 180-312 P element THAP P-28_HM 47 %SIM - Fusion only Splicing 100% 0.0620 8.40E- Mammalia; ref coverage 2 12 id Homo w/ SFAI in sapiens (83591) 1 sample THAP3 Mammalia 180-312 P element THAP P-31_HM 39 %SIM - Fusion only Splicing 100% 0.0569 0 Mammalia; ref coverage 6 id Homo w/ SFAI in sapiens (90326) 1 sample THAP4 Vertebrates 615-680 P element THAP P-13_HM 45 %SIM nitrobindin, Fusion only Splicing 100% 0.1780 0 Vertebrates; ref DUF1794, coverage 8 id Homo SRP68-RBD w/ SFAI in sapiens (51078) 1 sample THAP5 Euteleostomi 430-470 P element THAP P-13_HM 32 %SIM RILP-like Fusion only Splicing 100% 0.1133 0 Euteleostomi; coverage 4 ref id Homo w/ SFAI in sapiens 1 sample (168451) THAP6 Mammalia 180-312 P element THAP P-13_HM 29 %SIM - Fusion only Splicing 100% 0.0652 0 Mammalia; ref coverage 3 id Homo w/ SFAI in sapiens 1 sample (152815) THAP7 Vertebrates 615-680 ISL2EU THAP ISL2EU- 37 %SIM - Fusion only Splicing 100% 0.0653 0 Vertebrates; ref 4_HM coverage 6 id Homo w/ SFAI in sapiens (80764) 1 sample THAP8 Amniotes 310-350 P element THAP P-29_HM 34 %SIM - Fusion only Splicing 100% 0.0322 0 Amniotes; ref id coverage 1 Homo sapiens w/ SFAI in (199745) 1 sample THAP10 Eutheria 100-150 P element THAP P1_Cis 28 %SIM - Fusion only Splicing 100% 0.2907 5.46E- Mammalia; ref coverage 12 id Homo w/ SFAI in sapiens (56906) 1 sample THAP11 Gnathostoma 470-615 ISL2EU THAP ISL2EU- 32 %SIM - Fusion only Fusion intronless 0.0685 0 Gnathostomata; ta 3_HM 1 ref id Homo sapiens (57215) TIGD- Anthropoid 40-60 Tigger - Tigger3b 77 %SIM S1, TPR, Fusion only Splicing 100% 0.5865 0.4282 Anthropoid G12- primates TRP_1, coverage 4 83384 primates; ref id TTC14 TPR_11

w/ SFAI in Homo sapiens 1 sample (151613) ZBED3 Mammalia 180-312 hAT ZnF_BE hAT-52_HM 26 %SIM - Fusion only Splicing 100% 0.1016 0 Mammalia; ref D coverage 3 id Homo w/ SFAI in sapiens (84327) 1 sample ZBED1 Vertebrates 615-680 hAT zf-BED, hAT-10_XT 38 %SIM - Fusion only Splicing 100% 0.0308 0 Mammalia; ref Dimer_T coverage 7 id Homo np_hAT, w/ SFAI in sapiens (9189) DUF4413 1 sample NT = not-tested

2.3.2 Transposase capture occurs through alternative splicing

To illuminate the mechanism by which transposase domains are captured to form new chimeric proteins, we examined the gene structure of HTFs in more detail. In every case, the transposase-derived domains are encoded by exons distinct from those encoding the host domains, suggesting that transposase capture occurred via splicing events. To further delineate the process, we examined the birth of a recently evolved HTF in vespertilionid bats, dubbed KRABINER.

KRABINER is predicted to encode a 447-amino-acid protein consisting of a full-length mariner DNA transposase fused to an N-terminal Krüppel-associated box (KRAB) (Fig. 2.2A). Using a combination of comparative genomics, PCR, and RT-PCR, we inferred that KRABINER originated in the common ancestor of all nine vespertilionid species examined (but after the split from miniopterids, ~45 My ago) through the following steps: 1) mariner insertion into the last intron of ZNF112, a gene present in all eutherian mammals; 2) alternative splicing to the upstream exons of ZNF112 using a splice acceptor site pre-existing in the ancestral mariner transposon; and 3) a unique single-nucleotide deletion in the transposase coding sequence, which generated an in-frame fusion (Fig. 2.2B; Fig. 2.3). This sequence of events is reminiscent of the process that gave birth to two other HTFs, SETMAR (Cordaux, et al. 2006) and PGBD3-CSB (Gray, et al. 2012), suggesting that DNA transposons possess features that facilitate their capture via alternative splicing.
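The logic of step 3 can be illustrated with a toy example. The Python sketch below splices a hypothetical upstream host coding exon to a hypothetical transposon-derived exon and translates the junction with the standard genetic code. The sequences are invented and are not the ZNF112 or KRABINER sequences, and for simplicity the deleted nucleotide sits at the exon junction, whereas in KRABINER the deletion lies within the transposase coding sequence. With the extra nucleotide, the transposon-derived sequence is read out of frame and a premature stop codon appears; removing that single nucleotide yields an open, in-frame fusion.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first nucleotide; '*' marks stops, a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def delete_nucleotide(seq: str, pos: int) -> str:
    """Return seq with the nucleotide at 0-based position pos removed."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical sequences for illustration only (not real data)
host_exon = "ATGGCTGCT"           # upstream host coding exon, 3 codons
tn_exon = "TGAAATGTCATTTAAA"      # transposon-derived exon joined by splicing

ancestral = host_exon + tn_exon
derived = delete_nucleotide(ancestral, len(host_exon))  # 1-nt deletion at the junction

print(translate(ancestral))  # MAA*NVI*  (premature stop, no fusion protein)
print(translate(derived))    # MAAEMSFK  (in-frame fusion, no stop)
```

The same reasoning applies to any length of intervening sequence: the downstream ORF is in frame only when the number of nucleotides preceding it in the spliced transcript is a multiple of three.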

Fig. 2.2: Transposase capture by alternative splicing. A) ZNF112/KRABINER locus in vespertilionid bats. B) Steps required for KRABINER birth. C) Age of fusion genes with (green) or without (gray) evidence for alternative splicing. Fusion age (bottom) was determined as the midpoint of the age range for each fusion, as described in Table S1; the top panel is a qualitative illustration of host transcript loss over time. D) Summary of transposon splice site usage for 10 HTFs. Red denotes nucleotides in the splice site that diverge from the transposon consensus sequence. SA = splice acceptor; LCA = last common ancestor. ***p < 0.001, two-sample Wilcoxon test.

Fig 2.3: KRABINER evolved in the vespertilionid bat ancestor. A) PCR genotyping of the mariner insertion at the ZNF112 locus in 9 vespertilionid bats and one outgroup (MM). B) RT-PCR validation of KRABINER (left) and ZNF112 (right) expression in bat cell lines. C) Partial alignment of the ZNF112 mariner insertion across 7 vespertilionid bat species compared to the consensus sequence. D) Amino acid alignment of KRABINER’s mariner transposase sequence, including DNA-binding domains (HTH) and catalytic domain (DD34D), across 7 vespertilionid bat species. E) Summary of the events required for KRABINER evolution.

To assess whether this mechanism is generalizable, we surveyed all HTF gene models for evidence of alternative splicing. For most of the young HTFs (18 out of 31 HTFs <100 My old), we found unequivocal evidence for the co-existence of both fusion and parental gene transcripts, whereas for older HTFs generally only the fusion transcript was detected (Fig. 2.2C; Table 2.3).

These findings suggest that most HTFs are born as alternatively spliced variants of an ancestral gene, but that over time the HTF transcript may become the primary or sole transcript for that gene. Thus, alternative splicing is a prominent mechanism for the assimilation of transposase domains by the host proteome.
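Fig. 2.2C compares fusion ages between HTFs with and without evidence of alternative splicing using a two-sample Wilcoxon test, taking each fusion's age as the midpoint of its age range. The dependency-free Python sketch below shows that comparison. Two details are assumptions not stated in the text: how open-ended ranges such as "< 50" were converted (here treated as 0 to 50 My, and "~65" as exactly 65 My), and how the p-value was computed (here, the normal approximation without continuity or tie correction). In practice, `scipy.stats.mannwhitneyu` or R's `wilcox.test` would be the usual choice; the handful of table rows used below is an illustrative subset and the result is not the thesis statistic.

```python
import math


def age_midpoint(age_range: str) -> float:
    """Midpoint (My) of an age-range string as written in the HTF table.

    Assumptions: '< N' is read as the interval 0-N, and '~N' as exactly N.
    """
    s = age_range.replace(" ", "")
    if s.startswith("<"):
        return float(s[1:]) / 2
    if s.startswith("~"):
        return float(s[1:])
    low, high = (float(x) for x in s.split("-"))
    return (low + high) / 2


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test: returns (U for x, approximate p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# Age ranges of a few rows from the table above, grouped by transcript evidence
parental_and_fusion = ["< 54", "250-300", "57-200", "27-45", "170-200", "~65"]
fusion_only = ["250-300", "80-150", "160-180", "100-150", "150-180", "170-200"]
u, p = rank_sum_test([age_midpoint(a) for a in parental_and_fusion],
                     [age_midpoint(a) for a in fusion_only])
```

Average ranks are used so that fusions sharing an identical age range (common when ages are assigned by clade) are handled symmetrically.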

The splice site that enabled capture of KRABINER’s mariner transposase was provided by the ancestral mariner transposon. We therefore asked whether this is a general feature of transposase capture. To do this, we selected nine additional recently emerged HTFs whose cognate transposons were still detectable in the genome and generated a majority-rule consensus sequence for each transposon family, which serves as a proxy for the ancestral transposon (Table 2.3). We then compared the coordinates of each annotated splice site with the location of the transposon in the gene body to determine whether the splice site was contained in the transposon and, if so, whether the splice site sequence was present in the consensus sequence. For 7 of the 9 HTF genes (77.8%), the splice site responsible for the fusion was both contributed by the TE and present in the consensus sequence, and for the remaining two (22.2%) it differed from the consensus by only 1 bp (Fig 2.2D). These results suggest that DNA transposons are capable of providing not only the necessary protein domains but also the splice sites required for exon shuffling.
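The consensus-and-compare procedure can be sketched in Python as follows. The aligned copies are hypothetical, the alignment is assumed to have been produced beforehand by a multiple-sequence aligner, and the 6-nt acceptor window ending in AG is an illustrative choice; the tools and window size actually used are not specified in this section.

```python
from collections import Counter


def majority_consensus(aligned: list[str], gap: str = "-") -> str:
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    Columns in which the gap is the most common state are dropped. Ties go to the
    state seen first in the column (Counter.most_common preserves insertion order).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        state, _ = Counter(column).most_common(1)[0]
        if state != gap:
            consensus.append(state)
    return "".join(consensus)


def splice_site_mismatches(site: str, consensus: str, start: int) -> int:
    """Number of positions where an observed splice site differs from the consensus window."""
    window = consensus[start:start + len(site)]
    if len(window) != len(site):
        raise ValueError("window runs past the end of the consensus")
    return sum(a != b for a, b in zip(site.upper(), window.upper()))


# Hypothetical aligned genomic copies of one transposon family
copies = ["TTTCAGGTA", "TTTCAGGTA", "TTCCAGGTA", "TTTCAG-TA"]
consensus = majority_consensus(copies)                   # "TTTCAGGTA"
print(splice_site_mismatches("TTTCAG", consensus, 0))  # 0: site present in the consensus
print(splice_site_mismatches("TTCCAG", consensus, 0))  # 1: differs by 1 bp
```

Dropping gap-majority columns keeps the consensus in ungapped transposon coordinates, so splice-site positions can be compared directly with positions in the annotated gene.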

2.3.3 Fusion of transposase DBDs to KRAB domains is the most frequent HTF combination

Fig 2.4: Biochemical activities of host-transposase fusion proteins. A) A variety of host domains are fused to transposases. The x-axis specifies the number of HTF genes in which a given domain is present; some fusions contain more than one domain. Inset shows representative domain architecture schematics for select host-transposase fusions. B) KRAB-transposase fusions repress gene expression in a sequence-specific manner. C) KRABINER requires both its KRAB and DBD domains to repress gene expression. Y axes in B-C boxplots correspond to mean luminescence relative to the KTF(-) state for each comparison. KTF = KRAB-transposase fusion; TIR = terminal inverted repeat; filled triangle = consensus TIR; interrupted triangle = scrambled TIR; +/- = presence/absence of respectively; *** adj. p < 0.001; two-sample Wilcoxon test, Bonferroni correction.

To investigate the cellular function of HTF genes, we first characterized their domain architecture and composition (Fig. 2.4A, Fig. 2.5). Among transposon-derived domains, DNA-binding domains predominate (76.5%; Fig. 2.5), though some HTFs contain catalytic or accessory transposase domains (Fig. 2.5). Among host domains (those not typically found in transposases), we identified 48 distinct conserved domains, most of which (75%) were involved in a single fusion event (Fig. 2.4A). Several of the host domains, such as KRAB, SET, and SCAN, are predicted to be involved in transcriptional regulation (Bruno, et al. 2019; Herz, et al. 2013; Edelstein and Collins, 2005). By far the host domain most frequently fused to a transposase was the KRAB domain, which we inferred to have been involved in 30 independent fusion events across the phylogeny, accounting for approximately one third of all HTFs. KRAB domains are abundant in tetrapod genomes and are most commonly found in KRAB-zinc finger proteins (KRAB-ZFPs), an exceptionally diverse family of transcription factors (>200 genes in most tetrapod genomes; 487 in humans) (Imbeault, et al. 2017). However, the prevalence of KRAB-transposase fusions is unlikely to be explained solely by the genomic abundance of KRAB-ZFP genes, because (i) other equally expansive gene families are not involved in HTF (e.g., olfactory receptors, ~350 genes in humans) (Malnic, et al. 2004) and (ii) we still find three independent KRAB-transposase fusions in bird genomes despite the paucity of KRAB-ZFPs in birds (~8 per genome) (Imbeault, et al. 2017). These observations suggest that the combination of KRAB and transposase has frequently been favored by natural selection.
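The quantity plotted in Fig. 2.4A, the number of HTF genes in which each host domain occurs, can be tallied as in the Python sketch below. The annotations are a small illustrative subset transcribed from the table above, so the counts are not the thesis totals (48 domains across all HTFs). Note that counting genes that carry a domain is simpler than inferring independent fusion events (such as the 30 KRAB events), which additionally requires phylogenetic reasoning not attempted here.

```python
from collections import Counter

# Illustrative subset of host-domain annotations from the table above.
# KHATD1 lists "KRAB (x2)"; KHARBID4's "ZF_C2H2 (x12)" is abbreviated to one entry.
HOST_DOMAINS = {
    "KMARD5": ["KRAB"],
    "KHATD1": ["KRAB", "KRAB"],
    "PGBD1": ["KRAB", "SCAN"],
    "KMUTD1": ["KRAB", "SCAN"],
    "KHARBID4": ["KRAB", "ZF_C2H2"],
    "TIGD-G1": ["SCAN"],
    "HATDG5": ["COG5048", "COG2888", "ZF_C2H2", "zf-H2C2_2", "DUF45"],
    "SETMAR": ["SET", "Pre-Set"],
    "HATDG6": ["SET"],
}


def genes_per_domain(host_domains: dict[str, list[str]]) -> Counter:
    """Number of HTF genes in which each host domain occurs (set() counts a domain once per gene)."""
    return Counter(domain for domains in host_domains.values() for domain in set(domains))


def single_fusion_fraction(counts: Counter) -> float:
    """Fraction of distinct host domains that occur in exactly one HTF gene."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)


counts = genes_per_domain(HOST_DOMAINS)
print(counts.most_common(2))           # [('KRAB', 5), ('SCAN', 3)]
print(single_fusion_fraction(counts))  # 5 of 9 distinct domains in this subset
```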

Fig 2.5: HTF domain structure is varied. Hexagons represent host domains, colored by identity. TE domains are colored by the superfamily that provided them. Some gene architectures are found in multiple fusions.

Fig. 2.6: Sequences of TIRs and mutants used in KRAB-transposase luciferase assays. A) Consensus and scrambled TIR sequences used for KRAB-transposase fusion assays. B) Alignment of KRABINER’s DNA-binding domains to the closely related Mos1 transposase DNA-binding domains (top) and alignment of KRABINER’s KRAB domain to the Pfam consensus (bottom). Sites shown to be critical for domain function are marked with an asterisk (*); residues mutated in the MUTDBD and MUTKRAB mutant constructs are highlighted in red. HTH = helix-turn-helix. C) Western blot showing protein expression of myc-tagged KRABINER or GAPDH in HEK293T cells transiently transfected with the specified KRABINER variant or in non-transfected (NT) controls. Tet = tetracycline induction.

2.3.4 KRAB-transposase fusions act as sequence-specific repressors of gene expression

Given the prevalence of KRAB-transposase fusions and the canonical function of KRAB domains in establishing silent chromatin when tethered to DNA (Bruno, et al. 2019), we next used these genes as a paradigm to test the hypothesis that transposase fusion creates novel sequence-specific transcriptional regulators. We selected four recently emerged KRAB-transposase fusions for which we had previously generated consensus sequences of their cognate transposon family, which enabled us to identify the predicted binding site of each fusion’s transposase DBD, namely the terminal inverted repeats (TIRs) (Fig. 2.6). We cloned the consensus sequence of each TIR, or a scrambled version, upstream of a firefly luciferase reporter and measured luciferase expression in HEK293T cells in the presence or absence of a vector expressing the cognate HTF protein. Each KRAB-transposase fusion protein strongly repressed luciferase expression in the presence of its cognate intact TIR but not the scrambled sequence, indicating that each fusion can repress gene expression in a sequence-specific manner (Fig. 2.4B).
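Fig. 2.4B-C reports luminescence relative to the KTF(-) condition together with Bonferroni-adjusted Wilcoxon p-values. The Python sketch below shows only the normalization and the multiple-testing adjustment; the readings and p-values are invented, the per-comparison p-values would come from a two-sample Wilcoxon (rank-sum) test, and the number of comparisons over which the correction was applied is an assumption.

```python
from statistics import mean


def relative_luminescence(values: list[float], reference: list[float]) -> list[float]:
    """Express each reading relative to the mean of the reference (KTF-absent) condition."""
    baseline = mean(reference)
    return [v / baseline for v in values]


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical firefly luciferase readings for one fusion on its consensus-TIR reporter
without_ktf = [1000.0, 1200.0, 800.0]
with_ktf = [150.0, 300.0, 150.0]
print(relative_luminescence(with_ktf, without_ktf))  # [0.15, 0.3, 0.15]

# Hypothetical raw p-values for four comparisons (e.g., one per fusion)
print(bonferroni([0.25, 0.125, 0.5, 0.0001]))
```

Normalizing within each comparison, as the legend describes, removes differences in basal reporter activity between the consensus and scrambled TIR constructs so that only the fusion's effect remains.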

Fig 2.7: KTIGD3 and KRABINER regulate gene expression in a KAP1-independent manner. A: Western blot validation of KAP1 KO cell lines. B: Boxplot summarizing luciferase assays in WT or KAP1 KO HEK293T cells for all four tested KRAB-transposase fusions. TIR = terminal inverted repeat; triangle = consensus TIR. *** adj. p < 0.001; two-sample Wilcoxon test, Bonferroni correction. N = 15 per condition.

To test whether repression by KRAB-transposase fusions depends on KAP1 (TRIM28), a transcriptional corepressor often recruited by the KRAB domain (Bruno, et al. 2019), we repeated the reporter assays in KAP1 knockout (KO) HEK293T cells (Tie, et al. 2018). The results (Fig 2.7) show that repression by KMARD1 and KTIGD1 is dependent on KAP1, whereas repression by KRABINER and KTIGD3 is only partially dependent on KAP1. To further dissect the requirement for individual domains, we generated two mutant versions of KRABINER by altering residues predicted to compromise (i) DNA-binding activity (mutDBD) or (ii) the function of the KRAB domain (mutKRAB). To generate the DBD mutant, we leveraged the close similarity of KRABINER’s mariner transposase to that of Mos1, a well-characterized transposase from Drosophila (Ray, et al. 2007). Previous studies demonstrated that a single point mutation in the first helix-turn-helix motif was sufficient to abolish Mos1 binding to its TIRs (Zhang, et al. 2001). We mutated the homologous site in KRABINER’s DBD, as well as three additional residues shown to directly contact TIR DNA in the Mos1 crystal structure (Richardson, et al. 2009) (Fig. 2.6B). For the KRAB mutant, we introduced several point mutations at conserved residues previously identified as critical for KRAB-mediated repression (Margolin, et al. 1994; Witzgall, et al. 1994; Friedman, et al. 1996; Murphy, et al. 2016). Although the mutDBD and mutKRAB proteins were expressed at levels comparable to wild-type (WT) KRABINER (Fig. 2.6C), both lost the ability to repress reporter gene expression (Fig. 2.4C). Together, the results of these reporter assays support the hypothesis that KRAB-transposase fusions are modular proteins that function as sequence-specific transcriptional repressors.
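Transferring a mutation from Mos1 to KRABINER amounts to mapping residue numbers through a pairwise alignment such as the one in Fig. 2.6B. The Python sketch below does this for two gapped, pre-aligned sequences; the sequence fragments are hypothetical and do not correspond to the residues actually mutated in mutDBD.

```python
def map_residue(aln_src: str, aln_dst: str, src_pos: int):
    """Map a 1-based residue position in the source protein onto the aligned destination.

    Returns (destination position, destination residue), or None if the source
    residue is aligned to a gap in the destination.
    """
    if len(aln_src) != len(aln_dst):
        raise ValueError("aligned sequences must have equal length")
    i_src = i_dst = 0
    for a, b in zip(aln_src, aln_dst):
        if a != "-":
            i_src += 1  # count ungapped residues in the source
        if b != "-":
            i_dst += 1  # count ungapped residues in the destination
        if a != "-" and i_src == src_pos:
            return (i_dst, b) if b != "-" else None
    raise IndexError("source position beyond end of alignment")


# Hypothetical alignment fragment (not the real Mos1 or KRABINER sequences)
mos1 = "MSSF-VPNKEQ"
krabiner = "M-SFKVPNREQ"
print(map_residue(mos1, krabiner, 8))  # (8, 'R'): Mos1 K8 corresponds to R8
print(map_residue(mos1, krabiner, 3))  # (2, 'S'): numbering shifts after a gap
print(map_residue(mos1, krabiner, 2))  # None: aligned to a gap
```

Returning the destination residue alongside its position makes it easy to check whether the targeted residue is conserved before designing the substitution.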

2.3.5 KRABINER regulates transcription in bat cells

Fig 2.8: KRABINER regulates transcription of genes and TREs in bat cells. A) Strategy to generate KRABINER KO and rescue lines. TRE = tet-responsive element; CMV = cytomegalovirus. B-C) Summary of transcriptional changes of genes and TREs, respectively, upon loss and restoration of KRABINER. KRABINER-regulated genes (up or down) change reciprocally between the KO vs WT and WT KRABINER rescue vs KO comparisons. p values were calculated via a right-tailed hypergeometric test. DE 1 condition refers to differential transcription in either the KO vs WT or the WT KRABINER vs KO comparison. Non-specific refers to a gene rescued by WT KRABINER and by one or both of the mutDBD and mutKRAB variants. Unchanged refers to genes/TREs with adj. p > 0.05.
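The categories in this legend can be written as a small decision procedure, sketched in Python below together with a right-tailed hypergeometric test. This is an interpretation, not the analysis code: the adj. p < 0.05 threshold comes from the legend, but treating "DE 1 condition" as differential transcription in exactly one of the two comparisons, setting aside features that change in the same direction in both, and defining a mutant "rescue" as a significant change opposite to the KO effect are assumptions. Which sets were compared in the hypergeometric test is not specified here.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Contrast:
    log2fc: float  # log2 fold change for the comparison
    padj: float    # adjusted p-value (e.g., from DESeq2)


def is_de(c: Contrast, alpha: float = 0.05) -> bool:
    return c.padj < alpha


def classify(ko_vs_wt: Contrast, wt_rescue_vs_ko: Contrast,
             mutant_rescues_vs_ko: list[Contrast], alpha: float = 0.05) -> str:
    """Assign a gene or TRE to a Fig. 2.8-style category (interpretation, see lead-in)."""
    de_ko, de_rescue = is_de(ko_vs_wt, alpha), is_de(wt_rescue_vs_ko, alpha)
    if not de_ko and not de_rescue:
        return "unchanged"
    if not (de_ko and de_rescue):
        return "DE 1 condition"
    if ko_vs_wt.log2fc * wt_rescue_vs_ko.log2fc >= 0:
        return "not reciprocal"  # assumption: same-direction changes are set aside
    # If any mutant also significantly reverses the KO change, the rescue is non-specific
    if any(is_de(m, alpha) and m.log2fc * ko_vs_wt.log2fc < 0 for m in mutant_rescues_vs_ko):
        return "non-specific"
    return "UP" if ko_vs_wt.log2fc > 0 else "DOWN"  # direction of change in KO


def hypergeom_right_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) for a hypergeometric X (right-tailed enrichment test)."""
    upper = min(draws, successes)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(observed, upper + 1))
    return tail / comb(population, draws)


# Hypothetical gene: up in KO, reversed by WT rescue, not reversed by either mutant
gene = classify(Contrast(1.5, 0.001), Contrast(-1.2, 0.01),
                [Contrast(0.1, 0.9), Contrast(-0.2, 0.5)])
print(gene)  # UP
```

Because `math.comb` returns 0 when the lower index exceeds the upper, impossible overlap sizes contribute nothing to the tail sum without extra bounds checks.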

Fig 2.9: KRABINER KO cell line validation. A) PCR genotyping of KO clone, with estimated product sizes for the WT (top) and KO allele (bottom). B) RT-PCR assaying expression of full-length KRABINER and ZNF112 mRNAs in WT and KO cells. C) The KO clone is homozygous for a single 1807bp deletion (top) relative to the Myoluc2.0 reference allele (bottom).

To further test whether transposase capture gives birth to transcriptional regulators, we examined the ability of KRABINER to modulate gene expression in embryonic fibroblasts of the bat Myotis velifer, in which the gene is endogenously expressed (Fig 2.3). We used the CRISPR-Cas9 system to engineer a KRABINER knockout (KO) cell line with a pair of gRNAs designed to mediate precise deletion of the mariner transposon from the ZNF112 locus, leaving the parental gene intact (Fig. 2.8A; Fig. 2.9). We then used a piggyBac vector to deliver transgenes into the KO cell line and established independent clonal lines reintroducing wild-type KRABINER (WT, n=4 cell lines), the predicted DNA-binding mutant (mutDBD, n=3), or the predicted KRAB mutant (mutKRAB, n=3) (Fig. 2.8A; Fig. 2.10). Each transgene was cloned under the control of a tetracycline-inducible promoter and carried a C-terminal myc tag to monitor protein expression (Fig. 2.8A; Fig. 2.10). In the non-induced condition, leaky transgene expression more closely recapitulated the level of KRABINER transcription in WT cells (hereafter termed “rescue”, R), whereas transgene induction resulted in KRABINER over-expression (OE) relative to the parental cell line (Fig. 2.11).


Fig 2.10: KRABINER rescue cell line validation. A) Schematic diagram of the rescue transgene, with expected product sizes for PCR and RT-PCR primers. B) PCR validation of transgene insertion in rescue lines. NT=no template; pDNA=PiggyBac transgene plasmid. C) RT-PCR validation of transgene expression in WT RESCUE cell lines. NC=KO cDNA; PC=PiggyBac transgene plasmid. D) Western blot validating transgene protein expression across rescue lines, with beta-actin as a loading control. E) Localization of wild-type KRABINER in clonal rescue lines and of DNA-binding mutant KRABINER in mixed populations following doxycycline induction; imaged at 100X with oil immersion.


To investigate whether KRABINER modulates transcription, we profiled KRABINER KO and WT cells with Precision Run-On followed by sequencing (PRO-seq), a technique that sensitively measures nascent transcription throughout the genome, including gene bodies and transcribed regulatory elements (TREs) such as promoters and enhancers (Kwak, et al. 2013). We identified 2,644 genes differentially transcribed between WT and KO cells (Fig. 2.8B; Fig. 2.12; 1,295 upregulated in KO (UP) and 1,349 downregulated (DOWN); DESeq2, adj. p < 0.05), indicating that KRABINER has wide-ranging effects on gene transcription. Of those 2,644 genes, 121 (43 UP, 78 DOWN) had their transcription level consistently restored in WT transgenic lines but not in any of the mutant transgenic lines, suggesting that proper transcription of these genes in bat cells requires both the DNA-binding and KRAB activities of KRABINER (Fig. 2.8B). We also identified 3,472 TREs (identified using dREG; Wang, et al. 2019) that were differentially transcribed following loss of KRABINER, of which 99 were restored exclusively in the WT lines (33 UP, 66 DOWN; Fig. 2.8C). A subset of these TREs is associated with restored gene body transcription (18% UP, 12% DOWN), while others are distal (>100 kb) to genes (18% UP, 33% DOWN) or are associated with gene bodies that are not differentially transcribed (64% UP, 55% DOWN). Although our reporter assays indicate that KRABINER can act as a strong repressor, our functional analyses in cultured cells suggest that the protein modulates transcription of the bat genome in both directions, with both repressive and activating contributions to gene expression.
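The categories used above (KRABINER-specific UP or DOWN, non-specific, DE in one condition, unchanged) amount to a decision rule applied to per-feature DESeq2 results. The following is a hypothetical reconstruction, not the original analysis code: it collapses the replicate lines of each genotype into a single comparison, assumes each comparison yields a log2 fold change and an adjusted p value, treats a feature as "rescued" by a transgene when it changes significantly in KO vs WT and significantly in the opposite direction in transgene vs KO, and uses the adj. p < 0.05 threshold from the figure legends. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 0.05  # adjusted p threshold taken from the figure legends


@dataclass
class DEResult:
    """One feature's result in one comparison (hypothetical container)."""
    log2fc: float
    padj: Optional[float]  # None stands in for an NA adjusted p value


def is_sig(r: DEResult) -> bool:
    return r.padj is not None and r.padj < ALPHA


def is_rescued(ko_vs_wt: DEResult, transgene_vs_ko: DEResult) -> bool:
    """Significant in both comparisons with opposite signs (reciprocal change)."""
    return (is_sig(ko_vs_wt) and is_sig(transgene_vs_ko)
            and ko_vs_wt.log2fc * transgene_vs_ko.log2fc < 0)


def classify(ko_vs_wt: DEResult, wt_vs_ko: DEResult,
             mutdbd_vs_ko: DEResult, mutkrab_vs_ko: DEResult) -> str:
    """Assign a gene or TRE to one of the categories of Fig. 2.8B-C."""
    if not is_sig(ko_vs_wt) and not is_sig(wt_vs_ko):
        return "unchanged"
    if not is_rescued(ko_vs_wt, wt_vs_ko):
        # Significant in only one comparison (or, rarely, not reciprocal).
        return "DE 1 condition"
    if is_rescued(ko_vs_wt, mutdbd_vs_ko) or is_rescued(ko_vs_wt, mutkrab_vs_ko):
        return "non-specific"
    # Positive log2FC in KO vs WT means the feature rose when KRABINER was lost.
    return "specific UP" if ko_vs_wt.log2fc > 0 else "specific DOWN"
```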


Fig 2.11: PRO-seq QC metrics for WT, KO, and rescue cell lines. A) Non-induced KRABINER rescue lines (R) express wild-type levels of KRABINER, whereas induction results in over-expression (OE). ZNF112 is unchanged across all conditions. DESeq2 library-size-normalized PRO-seq counts; *** adj. p < 0.001, one-way ANOVA with Tukey HSD correction. B-C) PCA plots generated from rlog-normalized counts for genic (B) and TRE (C) transcription, respectively. D) Representative pairwise replicate correlation scatterplots based on library-size-normalized PRO-seq gene body counts for rescue (R) samples. Density refers to the number of genes within a given hex bin. Spearman R is comparable across OE samples.


In addition to transcriptional changes unique to the WT transgenic lines, several genes and TREs were rescued by both the WT transgene and either the mutDBD or the mutKRAB transgene (Fig. 2.12). Thus, while at some loci KRABINER’s regulatory activity appears to require both its DNA-binding and KRAB domains, its transcriptional effects at other loci may occur through other mechanisms. Such mechanisms may explain the ability of KRABINER to both activate and repress transcription. Taken together, these data show that KRABINER makes a substantial contribution to transcriptional regulation of the bat genome.


Fig. 2.12: Some KRABINER-mediated transcriptional changes require only one of its functional domains. A-B) UpSet plots summarizing overlaps between sets of differentially transcribed genes (A) and TREs (B) for the rescue comparison (D&E). Left = KRABINER-upregulated genes/TREs (orange); right = KRABINER-downregulated genes/TREs. Green = genes/TREs rescued by both WT and mutKRAB transgenes; blue = genes/TREs rescued by both WT and mutDBD transgenes.


Fig. 2.13: KRABINER binds to mariner TIRs in bat cells. A) Heatmaps and metaplots summarizing TIRs clustered based on CUT&RUN read coverage in the WT (n=2) and mutDBD (n=2) samples. B) Quantification of CUT&RUN coverage enrichment for each cluster/genotype combination (WT=pink; mutDBD=blue). Y axis = fold enrichment of average CUT&RUN signal over TIRs (30 bp) +/- 100 bp flanking sequence relative to average CUT&RUN signal across the entire 10 kb window. Dashed line (y=1) represents no enrichment. **** adj. p < 0.0001, 2-sample Wilcoxon test, Benjamini-Hochberg correction. C) TIR conservation varies by cluster. Boxplot summary of the % divergence of each TIR within a cluster relative to the mariner consensus TIR; lower values indicate a higher degree of conservation. Kruskal-Wallis global p = 0.0057; * adj. p < 0.05; ** adj. p < 0.01, 2-sample Wilcoxon test, Benjamini-Hochberg corrected.
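The fold-enrichment (FE) statistic in panel B compares mean CUT&RUN signal over a TIR plus 100 bp on each side with mean signal over the surrounding 10 kb window. A minimal sketch, assuming per-base, spike-in normalized coverage for the window is already available as a list and that FE is set to 0 when the window contains no reads at all (a choice made here to avoid division by zero, not one stated in the methods):

```python
from typing import Sequence


def fold_enrichment(window: Sequence[float], site_start: int, site_end: int,
                    flank: int = 100) -> float:
    """Mean coverage over [site_start - flank, site_end + flank) divided by
    mean coverage over the whole window (0-based, half-open coordinates
    relative to the start of the window)."""
    if not 0 <= site_start < site_end <= len(window):
        raise ValueError("site must lie inside the window")
    lo = max(0, site_start - flank)
    hi = min(len(window), site_end + flank)
    core = window[lo:hi]
    window_mean = sum(window) / len(window)
    if window_mean == 0:
        return 0.0  # no reads anywhere in the window: report no enrichment
    return (sum(core) / len(core)) / window_mean
```

Uniform coverage gives FE = 1, matching the dashed "no enrichment" line; coverage confined to the TIR and its flanks gives FE equal to the window length divided by the core length.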


2.3.6 KRABINER binds genomic mariner TIRs

To determine whether KRABINER binds genomic DNA in bat cells, we performed Cleavage Under Targets and Release Using Nuclease (CUT&RUN; Skene and Henikoff, 2017). Specifically, we measured binding of the myc-tagged KRABINER transgenes in the WT and mutDBD backgrounds 24 hr post induction (OE, n=2 each). Both WT and mutDBD KRABINER proteins appear to be nuclear localized (Fig. 2.10E). Initial analysis revealed a higher-than-expected noise-to-signal ratio in our samples, evidenced by spiky signal coverage and poor replicate correlation within peak regions. To address this issue, we focused on the primary predicted binding sites for KRABINER’s transposase DBD: its cognate mariner TIR sequences. Because these repetitive sequences are poorly mappable (Fig. 2.14A), we used k-means to cluster all mariner TIRs (n=2,862) based on the number of spike-in normalized CUT&RUN reads (WT or mutDBD) that mapped uniquely to the junction between the TIR sequence and flanking DNA. We also determined the fold enrichment (FE) of KRABINER binding to TIRs relative to flanking genomic sequence (see Methods). Although most TIRs were not bound by this measure (cluster 4, n=1,959, FE = 0), we identified two clusters of TIRs that were bound exclusively by WT KRABINER: a strongly bound cluster (cluster 1, n=131, median FE = 3.98) and a weakly bound cluster (cluster 3, n=510, median FE = 2.65) (Fig. 2.13; Fig. 2.14). A further cluster of TIRs was bound by the mutDBD KRABINER protein (cluster 2, n=262, median FE = 4.03) but not by WT KRABINER (median FE = 0.84; Fig. 2.13; Fig. 2.14). These results demonstrate that KRABINER can bind genomic TIR sequences, and that most of its binding depends on its transposase DBD.
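As an illustration of the clustering step, the sketch below implements plain Lloyd’s k-means in pure Python on one feature vector per TIR, here assumed to be the (WT, mutDBD) spike-in normalized junction read counts. The feature layout, the random initialization, and the function names are assumptions; the original analysis may have used a different implementation, full coverage profiles as features, or a different initialization.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: returns a cluster label per point and the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids: List[Vector] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids
```

With k=4, matching the four TIR clusters reported above, `labels, centroids = kmeans(features, 4)` partitions the TIRs; k-means labels are arbitrary, so cluster numbers would be assigned afterwards, for example by ranking centroids by WT signal.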

To identify sequence differences between clusters that might explain differential KRABINER binding, we calculated, for each TIR within each cluster, the percent divergence from the consensus TIR, a measure of TIR age or conservation. Based on the known age of this family, the average TIR sequence should be approximately 5% diverged from the mariner consensus (Ray, et al. 2008). We found that the TIRs strongly bound by WT KRABINER were less diverged (C1; median 3.85%) than those bound by mutDBD KRABINER (C2; median 4.76%) or those bound by neither protein (C4; median 5%) (Fig. 2.13C; Kruskal-Wallis global p = 0.0057; Wilcoxon test, Benjamini-Hochberg adj. p = 0.022 and adj. p = 0.002, respectively). The weakly WT-bound TIRs are also modestly less diverged (C3; median 4.54%) than the unbound TIRs (C3 vs C4; adj. p = 0.038). These results suggest that KRABINER preferentially binds sequences that closely resemble the consensus TIR, but can still bind imperfect TIRs.
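Percent divergence can be computed from a pairwise alignment of each TIR to the consensus. The sketch below uses the uncorrected proportion of mismatched, ungapped alignment columns; this is an assumed simplification, since divergence values reported by tools such as RepeatMasker may additionally correct for CpG sites or apply a substitution model. The helper that summarizes each cluster by its median is likewise illustrative.

```python
from statistics import median
from typing import Dict, List


def percent_divergence(aligned_tir: str, aligned_consensus: str) -> float:
    """Percent of ungapped alignment columns at which the TIR differs from
    the consensus (uncorrected p-distance x 100)."""
    if len(aligned_tir) != len(aligned_consensus):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for a, b in zip(aligned_tir.upper(), aligned_consensus.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped, not counted as differences
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def median_divergence_by_cluster(divergences: Dict[str, List[float]]) -> Dict[str, float]:
    """Median percent divergence for each cluster label."""
    return {cluster: median(values) for cluster, values in divergences.items()}
```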


Fig. 2.14: CUT&RUN QC metrics. A) Heatmap and metaplot summary of the mappability of mariner elements, including TIR sequences, based on 37-mers (GEM mappability; Derrien, et al. 2012). Higher values indicate higher mappability. B) Heatmap and metaplot summary of CUT&RUN reads per replicate mapped to TIR sequences within each TIR cluster.


Fig 2.15: KRABINER binding is associated with transcriptional downregulation of TREs. A) MA plot summarizing changes in TRE transcription upon over-expression of WT KRABINER. Non-specific (black) refers to changes in TRE transcription that are shared between over-expression of WT KRABINER and one or both mutant KRABINER variants. Unchanged refers to TREs with adj. p > 0.05. B-C) Downregulated (B) and upregulated (C) TREs were clustered based on WT KRABINER CUT&RUN read coverage within the TRE region; B and C depict heatmaps and metaplot summaries of the cluster 1 TREs obtained from each analysis. D) Quantification of CUT&RUN coverage enrichment for each condition/genotype combination (WT=pink; mutDBD=blue). Y axis = fold enrichment of average CUT&RUN signal over TREs (~1,000 bp) relative to average CUT&RUN signal across the entire 10 kb window. Dashed line (y=1) represents no enrichment. **** adj. p < 0.0001, 2-sample Wilcoxon test, Benjamini-Hochberg correction. E-F) Genome snapshots of two downregulated TREs (XPA promoter, E; GL430198 enhancer, F) that are proximal to KRABINER-bound TIRs. PRO-seq signal represents merged replicates (WT n=4; mutDBD n=3); CUT&RUN signal is shown for each replicate separately (n=2). R = rescue; OE = over-expression; norm. = normalized (library size for PRO-seq, spike-in for CUT&RUN).


2.3.7 KRABINER binding is associated with downregulation of TREs

We next asked whether KRABINER genomic binding is associated with transcriptional change, as suggested by our reporter assays. To do this, we induced expression of the KRABINER transgenes and performed PRO-seq 24 hours post induction (OE), matching the conditions of the CUT&RUN experiments. We then identified TREs that are differentially transcribed between the OE and rescue (R) conditions, which share the same genotype and in principle differ only in the level of KRABINER expression (Fig. 2.10; Fig. 2.11).

Because TREs represent discrete transcriptional units such as promoters and enhancers, we reasoned that KRABINER binding to these regions would be more likely to affect transcription than binding within a gene body. KRABINER OE resulted in 391 differentially transcribed TREs (178 UP, 213 DOWN; Fig. 2.15A) specific to the WT transgene (n=4) and to neither of the mutant transgenes (n=3 each for mutDBD and mutKRAB). Additionally, several TREs were differentially transcribed upon OE of both the WT KRABINER transgene and either the mutDBD or the mutKRAB transgene (Fig. 2.16), consistent with the hypothesis that a subset of KRABINER’s transcriptional effects requires only one of its functional domains.

To determine whether KRABINER binding is associated with differential TRE transcription, we performed an analysis similar to that described for TIRs above. Specifically, we mapped spike-in normalized CUT&RUN read coverage over either the downregulated or the upregulated TREs, clustered the TREs by k-means based on their signal in the WT samples, and determined fold enrichments for each cluster. We found that 67 of the 213 downregulated TREs are enriched for WT (cluster 1, median FE = 2.26) but not mutDBD (median FE = 1.02) KRABINER binding (adj. p = 4.7e-07, 2-sample Wilcoxon test, Benjamini-Hochberg corrected; Fig. 2.15; Fig. 2.17). In contrast, a similar clustering analysis of upregulated TREs (cluster 1, n=64) found no enrichment for either WT (median FE = 0.97) or mutDBD (median FE = 1.04) KRABINER binding (adj. p = 0.32; 2-sample Wilcoxon test, Benjamini-Hochberg corrected; Fig. 2.15; Fig. 2.17). Together with our reporter assay data, these results suggest that although KRABINER can both activate and repress transcription, its direct targets are likely to be downregulated.
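The Benjamini-Hochberg correction applied to the Wilcoxon p values follows the standard step-up procedure: sort the m p values, scale the p value of rank i by m/i, and enforce monotonicity from the largest rank downward, capping at 1. A short sketch (the function name is illustrative); it should reproduce the adjusted values returned by R’s `p.adjust(p, method = "BH")`:

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```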

Fig 2.16: KRABINER over-expression changes TRE transcription in a domain-dependent manner. UpSet plots summarizing overlaps between sets of differentially transcribed TREs for the OE vs rescue comparison. Left = KRABINER-upregulated TREs (orange); right = KRABINER-downregulated TREs. Green = TREs differentially transcribed upon over-expression of both WT and mutKRAB transgenes; blue = TREs differentially transcribed upon over-expression of both WT and mutDBD transgenes.


Fig 2.17: KRABINER binding is associated with transcriptional downregulation of some TREs. Downregulated and upregulated TREs were each clustered into two groups based on the number of WT CUT&RUN reads mapped to the TRE. A-B) Heatmaps and metaplots summarizing WT and mutDBD KRABINER CUT&RUN reads mapped to downregulated (A) or upregulated (B) TREs within each cluster. C) Heatmap and metaplot summarizing WT and mutDBD KRABINER CUT&RUN reads, per replicate, mapped to cluster 1 of the downregulated TREs. D) Quantification of fold enrichment of KRABINER binding to downregulated (left) or upregulated (right) TREs for each cluster/genotype combination. Dashed line (y=1) represents no enrichment. **** adj. p < 0.0001, 2-sample Wilcoxon test, Benjamini-Hochberg corrected.


We then asked whether KRABINER binding to the downregulated TREs could be explained by proximal TIR sequences. Of the 67 KRABINER-bound downregulated TREs, 5 are located within 1 kb of a mariner TIR. One example is the promoter of the bat ortholog of the XPA, DNA damage recognition and repair factor gene (XPA) (Fig. 2.15E), which lies immediately adjacent to a mariner TIR bound by WT KRABINER and is downregulated (log2FC = -0.69, adj. p = 0.0079) upon WT KRABINER over-expression. A similar pattern is seen for an intergenic enhancer element located between the bat homologs of the family with sequence similarity 174 member B (FAM174B) and chromodomain helicase DNA binding protein 2 (CHD2) genes. This region contains three distinct TREs, two of which are downregulated upon WT KRABINER over-expression (log2FC = -1.1, adj. p = 0.04 and log2FC = -1.09, adj. p = 0.006, respectively), and this change is associated with KRABINER binding to the mariner TIRs immediately upstream of these TREs (Fig. 2.15F). These examples, and others like them, suggest that KRABINER binding to its cognate mariner TIRs can lead to transcriptional change in some contexts. However, the majority (62/67) of KRABINER-bound downregulated TREs are not proximal to a mariner element. This observation, combined with the absence of both transcriptional change and binding at these regions upon over-expression of mutDBD KRABINER, suggests that KRABINER’s transposase DBD can also bind non-TIR sequences.
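Testing which TREs lie within 1 kb of a mariner TIR is an interval-proximity query. The sketch below indexes TIR coordinates per chromosome and uses binary search to limit the candidates examined for each TRE. The coordinate convention (0-based, half-open), the inclusive 1,000 bp cutoff, and all names are assumptions for illustration; tools such as bedtools window perform the same kind of query on BED files.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def gap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance in bp between two half-open intervals (0 if they overlap or abut)."""
    return max(0, b_start - a_end, a_start - b_end)


class TirIndex:
    """Per-chromosome, start-sorted TIR coordinates for proximity queries."""

    def __init__(self, tirs: Iterable[Interval]):
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in tirs:
            by_chrom[chrom].append((start, end))
        self._tirs = {c: sorted(v) for c, v in by_chrom.items()}
        self._starts = {c: [s for s, _ in v] for c, v in self._tirs.items()}
        # The longest TIR bounds how far left of a query a relevant TIR may start.
        self._max_len = max((e - s for v in self._tirs.values() for s, e in v), default=0)

    def near(self, chrom: str, start: int, end: int,
             max_dist: int = 1000) -> List[Tuple[int, int]]:
        """TIRs on `chrom` within `max_dist` bp of the interval [start, end)."""
        tirs = self._tirs.get(chrom, [])
        starts = self._starts.get(chrom, [])
        lo = bisect_left(starts, start - max_dist - self._max_len)
        hi = bisect_right(starts, end + max_dist)
        return [(s, e) for s, e in tirs[lo:hi] if gap_bp(start, end, s, e) <= max_dist]
```

Counting the TREs for which `near(...)` returns a non-empty list gives the number of TREs proximal to a TIR.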

2.4 DISCUSSION


How genes with novel functions evolve is a fundamental biological question. The process of gene birth via gene duplication (Ohno 1970) has been extensively studied, but the biochemical functions of genes that evolve in this way are often limited to those of the ancestral genes (Ruddle, et al. 1994; Bouchard, et al. 2008). In contrast, de novo gene birth can give rise to genes with novel functions, but this process is likely to be evolutionarily slow (Van Oss and Carvunis, 2019). Thus, neither mechanism alone is sufficient to explain the evolution of genes with novel functions. Exon shuffling (Gilbert 1978), which exploits the modularity of eukaryotic genes to generate proteins with diverse combinations of pre-built functional domains via alternative splicing, potentially addresses both of these limitations. However, the extent of this process, as well as how new sources of protein domains and splice sites are acquired, remains unclear.

Our study determined that DNA transposons contribute not only functional DNA-binding and endonuclease domains but also, in many cases, the very splice sites that facilitate their capture by the host to generate host-transposase fusion genes. We also found that, over time, the fusion transcript generally replaces the parental transcript, a pattern consistent with previous observations of transposase-derived genes (Feschotte and Pritham 2007).

Moreover, transposase capture is not merely an occasional occurrence, as previous anecdotal evidence suggested (Cordaux, et al. 2006; Gray, et al. 2012), but rather a pervasive phenomenon in vertebrates. Transposase capture is also likely to be more common than our survey indicates, given our focus on tetrapod genomes, the stringency of our pipeline, and the likelihood of HTF turnover during evolution.

While transposases are fused to a variety of host domains, and thus likely engage in a variety of cellular activities, we found a propensity for fusion to domains involved in chromatin modification and transcriptional modulation. KRAB-transposase fusions are particularly widespread, and our reporter assays indicate that fusion of transposases to KRAB domains is a mechanism by which new sequence-specific transcriptional repressors may arise. KRABINER loss in bat cells results in both up- and downregulated transcriptional changes, suggesting that the in vivo function of KRAB-transposase fusions is more complex. How KRABINER accomplishes these divergent modes of regulation remains unclear, but it could be due to interactions with multiple proteins, a common feature of KRAB domains (Helleboid, et al. 2019). Indeed, our reporter assays indicate that KRABINER repression is only partially dependent on the corepressor KAP1, suggesting that it can recruit alternative effector complexes. This is consistent with previous observations that evolutionarily older KRAB domains, such as the one encoded by KRABINER and its parental protein ZNF112 (Murphy, et al. 2016), generally interact with a more diverse set of proteins than KRAB domains of younger origin (Helleboid, et al. 2019). While we chose KRABINER as our test case for HTF function, the ability of other KRAB-transposase fusions to repress gene expression in reporter assays suggests that these genes, and perhaps other fusions, could also act as transcriptional regulators in their endogenous contexts.

Why transposase capture appears predisposed to generate novel transcriptional regulators remains unknown, but it may be explained by a model in which transposase proteins, with their sequence-specific DNA-binding domains and dispersed cognate binding sites, provide a facile path for the emergence of new transcription factors (Feschotte 2008). We found that KRABINER acts as a transcription factor in bat cells, and that at least some of its transcriptional effects are associated with KRABINER binding to its TIRs or to other genomic loci. Our data, as well as previous studies of the host-transposase fusion genes SETMAR (Tellier and Chalmers, 2019) and PGBD3-CSB (Gray, et al. 2012), are consistent with the notion that transposase capture facilitates the emergence of complex cis-regulatory networks.

Although we focused on KRAB-transposase fusions because of their abundance in the tetrapod lineage, many other host domains present in HTFs, such as the SET and SCAN domains, are implicated in chromatin and transcriptional control (Herz, et al. 2013; Edelstein and Collins, 2005). Given the ancient and ubiquitous nature of transposases (Aziz, et al. 2010), it is likely that the processes described here are broadly applicable and have fueled the emergence of novel regulatory proteins throughout evolution.

2.5 MATERIALS AND METHODS


2.5.1 Cell lines and culture methods

The following cell lines were used in this study: Myotis velifer embryonic fibroblasts (generous gift of D. Ray), Myotis lucifugus embryonic fibroblasts (generous gift of D. Ray), Eptesicus fuscus immortalized skin fibroblasts (generous gift of W. Wright; Gomes, et al. 2011), HEK293T cells (generous gift of N. Elde), HEK293T-Rex cells (ThermoFisher; generous gift of J. Lis), HEK293T-WT cells (generous gift of H. Rowe, ref), and HEK293T-KAP1-KO cells (generous gift of H. Rowe; Tie, et al. 2018). All human cell lines were cultured in high-glucose DMEM supplemented with 10% FBS, 1% penicillin/streptomycin, and 1% sodium pyruvate. All bat cell lines were cultured in high-glucose DMEM supplemented with 20% FBS, 1% penicillin/streptomycin, and 1% sodium pyruvate. All cells were grown at 37°C and 5% CO2 and passaged upon reaching 80% confluency. All cell culture work was performed under sterile conditions in a biosafety hood.

2.5.2 Identifying and characterizing transposase fusion genes

To identify transposase fusion genes, we first extracted all eukaryotic transposase-derived Pfam domains (Table 2.2). These domains had been determined to be transposase-derived in previous studies via a combination of sequence similarity and phylogenetic analysis (Pietrokovski and Henikoff 1997; Aravind, et al. 2005; Yuan and Wessler 2011; Bao, et al. 2010; Aravind 2000; Hayward, et al. 2013; Mobile DNA II 2015; Roussigne, et al. 2003; Kapitonov and Jurka 2007; Babu, et al. 2006). We then searched all NCBI RefSeq tetrapod gene annotations (Table 2.1; Conserved Domain Architecture Tool [CDART]; Geer, et al. 2002; O’Leary, et al. 2016) for gene models that met the following criteria: 1) contained a transposase domain, 2) had two or more exons (to exclude standalone transposases), and 3) had RNA-seq/EST support for all annotated introns. Gene models that met these criteria were considered host-transposase fusion genes (HTFs), and each was further characterized to determine, where possible, its domain structure (Conserved Domain Search, default parameters), originating gene (NCBI) and transposon (Repbase v20170127; Bao, et al. 2015), and evolutionary history.

Due to inconsistencies in annotation and genome quality, the age of each transposase-fusion gene was determined using a combination of synteny and homology-based searches (BLASTn; NCBI nr/nt and NCBI Refseq_genomes databases; O’Leary, et al. 2016) of closely related species. Specifically, a transposase-fusion gene was considered conserved if it: 1) had a hit containing both the transposase domain and the host domain in the same transcript (nr/nt) or on the same contig (Refseq_genomes), and 2) was located in a syntenic region of the genome, as determined by the identity of flanking genes. These conservation data were used to assign each gene a taxonomic span based on parsimony. Each fusion was also assigned a corresponding age in millions of years based on estimates of taxon divergence times (TimeTree; Kumar, et al. 2017). The insertion timing of the originating transposon for each gene was determined by using the TE sequence plus 200 bp of flanking genomic DNA as a query against the genomes of closely related species (BLASTn; Refseq_genomes; O’Leary, et al. 2016), requiring a hit with 100% query coverage for the insertion to be considered conserved.
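The three inclusion criteria for host-transposase fusion (HTF) candidates translate directly into a filter over annotated gene models. In the sketch below the `GeneModel` fields and the transposase domain label are hypothetical placeholders (the actual transposase Pfam domains are listed in Table 2.2); only the KRAB accession PF01352, cited in the mutant design section of these methods, is taken from this chapter.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class GeneModel:
    """Minimal, hypothetical summary of one annotated gene model."""
    gene_id: str
    domains: FrozenSet[str]   # Pfam accessions or labels found in the protein
    exon_count: int
    intron_count: int
    supported_introns: int    # introns with RNA-seq/EST evidence


def is_htf_candidate(model: GeneModel, transposase_domains: FrozenSet[str]) -> bool:
    has_transposase = bool(model.domains & transposase_domains)       # criterion 1
    multi_exon = model.exon_count >= 2                                  # criterion 2
    fully_supported = model.supported_introns == model.intron_count    # criterion 3
    return has_transposase and multi_exon and fully_supported


def htf_candidates(models: Iterable[GeneModel],
                   transposase_domains: FrozenSet[str]) -> List[str]:
    return [m.gene_id for m in models if is_htf_candidate(m, transposase_domains)]
```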

2.5.3 Selection Analysis

For each HTF conserved in at least two species that diverged more than 50 million years ago (to provide sufficient statistical power), we collected and aligned the transposase ORF sequences from each species (Kalign, default parameters). Alignments were manually curated and stop codons were removed. These alignments were then used to estimate the functional constraint on each transposase by calculating dN/dS with the Phylogenetic Analysis by Maximum Likelihood (PAML) package (CodonFreq = 2, model = 0, NSsites = 0, fix_omega = 0, omega = 0.4) (Yang, et al. 2007). The significance of each dN/dS value was determined via comparison to a model assuming neutral evolution (fix_omega = 1, omega = 1; likelihood-ratio test [LRT], p < 0.05, chi-squared distribution).
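The likelihood-ratio test compares the codeml run with ω (dN/dS) estimated freely against the run with ω fixed at 1. Because the two models differ by one free parameter, twice the log-likelihood difference is compared with a chi-squared distribution with one degree of freedom, whose upper tail has the closed form erfc(sqrt(x/2)). A minimal sketch, assuming the two log-likelihood values have already been read from the PAML output; the function name is illustrative.

```python
from math import erfc, sqrt


def lrt_pvalue_one_df(lnl_free: float, lnl_fixed: float) -> float:
    """p value for 2 * (lnL_free - lnL_fixed) under a chi-squared(1) null.

    lnl_free:  log-likelihood with omega estimated (fix_omega = 0)
    lnl_fixed: log-likelihood with omega fixed at 1 (fix_omega = 1, omega = 1)
    """
    stat = 2.0 * (lnl_free - lnl_fixed)
    # The free model nests the fixed one, so stat should be >= 0; small
    # negative values can arise from incomplete optimization.
    stat = max(stat, 0.0)
    return erfc(sqrt(stat / 2.0))  # upper tail of chi-squared with 1 df
```

At the p < 0.05 threshold used here, dN/dS differs significantly from neutral expectation when 2ΔlnL exceeds about 3.84.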

2.5.4 Transposase consensus sequence generation

To generate consensus sequences for the cognate transposons of ten recently evolved KRAB-transposase fusions (KTIGD1, KMARD1, KTIGD3, KRABINER, KMARD4, KHATD2, HATDG4, TIGD-G1, TIGD-G4-Zscan29, and MARD-G1), we first determined the boundaries of each transposon by taking the transposon sequence plus increasing amounts of flanking genomic sequence (+/- 200 bp intervals) and querying the appropriate genome (Pteropus alecto [ASM32557v1], Chinchilla lanigera [ChiLan1.0], Phascolarctos cinereus [phaCin_unsw_v4.1], Myotis lucifugus [Myoluc2.0], Pelodiscus sinensis [PelSin1.0], Anolis carolinensis [AnoCar2.0], Mus musculus [GRCm38.p6], Bubalus bubalis [UOA_WB_1], Fukomys damarensis [DMR_v1.0], and Oryctolagus cuniculus [OryCun2.0], respectively). We then extracted the sequence of one full-length copy and used it as a query to collect an additional ten full-length copies of each transposon. These transposon sequences were then aligned (Kalign; Madeira, et al. 2019) and manually curated to correct CpG sites. We then used the curated alignments to generate a majority-rule consensus for each element and annotated each consensus for the presence of transposase ORFs and terminal inverted repeat (TIR) sequences.
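A majority-rule consensus can be derived column by column from the curated multiple alignment. The sketch below assumes equal-length aligned sequences, calls a base only when it occurs in more than half of the sequences at that column, writes N otherwise, and drops columns whose majority state is a gap. These conventions are assumptions, and the manual CpG curation described above is not modeled.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(alignment: Sequence[str], min_fraction: float = 0.5,
                       gap: str = "-") -> str:
    """Strict-majority consensus of equal-length aligned sequences."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        state, count = Counter(ch.upper() for ch in column).most_common(1)[0]
        if count / len(column) > min_fraction:
            if state != gap:
                consensus.append(state)  # majority base is kept
            # a majority gap drops the column from the consensus
        else:
            consensus.append("N")  # no state reaches a strict majority
    return "".join(consensus)
```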

2.5.5 Determining HTF gene birth mechanism

To determine the birth mechanism of each HTF, we first partitioned the genes into two classes: genes born via splicing and genes born via duplication (through ancestral whole-genome duplications or segmental duplications); such duplications generally occurred after the original gene was born via splicing. We then asked whether the genes born by splicing showed evidence of alternative splicing. To do this, we surveyed each gene model, and if the fusion transcript (transposase + host sequence) co-occurred with the original host transcript, we considered the gene to have originated via alternative splicing. For a subset of spliced HTFs for which we could construct consensus sequences of the cognate transposon (see above), we also determined whether the splice site was present in the transposon and, if so, whether the sequence was present in the ancestral consensus sequence.


Table 2.4 Primers used in this study

Primer sequence (5' to 3') | Pair | Orientation | Target | Expected product size (bp)
ACATATGAAGTTTTGAAGCTTGCTG | KRABINER junction RT-PCR | FWD | endogenous KRABINER transcript | 125
TCAGTGGGGACTCAGCTCTT | KRABINER junction RT-PCR | REV | endogenous KRABINER transcript | 125
CTTGATTCCTGAACACCCATCTTT | ZNF112 junction RT-PCR | FWD | endogenous ZNF112 transcript | 120
AACCTGGTCTCAGTGGGGAC | ZNF112 junction RT-PCR | REV | endogenous ZNF112 transcript | 120
TCTCATGCACTCAGGCATCC | KRABINER full-length RT-PCR | FWD | endogenous KRABINER transcript | 1600
AGTCAGCCATTCCAGTCCTGT | KRABINER full-length RT-PCR | REV | endogenous KRABINER transcript | 1600
AGCTCGTTTAGTGAACCGTC | Mlmar1-KRAB genomic PCR | FWD | endogenous Mlmar1-KRAB locus | 2000
GCCCTTCGTCTGACGTG | Mlmar1-KRAB genomic PCR | REV | endogenous Mlmar1-KRAB locus | 2000
AGCTCGTTTAGTGAACCGTC | Rescue transgene PCR | FWD | rescue KRABINER locus (all variants) | 1573
GCCCTTCGTCTGACGTG | Rescue transgene PCR | REV | rescue KRABINER locus (all variants) | 1573
GAACTTCCGGAACCTGGTCT | KRABINER rescue RT-PCR | FWD | rescue KRABINER transcript | 1266
TCTTCTGAGATGAGTTTTTGTTCG | KRABINER rescue RT-PCR | REV | rescue KRABINER transcript | 1266
tccgcccgggctcgagATGACCAAGTTCCAGGAGC | KRABINER PiggyBac IF | FWD | pcDNA4/TO/myc-His-KRABINER (all variants) | 1400
cgcggaggccacgcgtTCAATGGTGATGGTGATGATGAC | KRABINER PiggyBac IF | REV | pcDNA4/TO/myc-His-KRABINER (all variants) | 1400
GGAACTGACATGGTGGCTTT | Mlmar1-KRAB CRISPR genotyping PCR | FWD | endogenous Mlmar1-KRAB locus | 2000 (present); 250 (absent)
GCCTCAGATGTGGTTAACAGTG | Mlmar1-KRAB CRISPR genotyping PCR | REV | endogenous Mlmar1-KRAB locus | 2000 (present); 250 (absent)

2.5.6 Determining the evolutionary history of KRABINER

To determine which bat species carry the Mlmar1 mariner insertion at the ZNF112 locus, we took two complementary approaches: identifying the presence or absence of the insertion in genomic data, where available, and PCR amplification of Mlmar1-KRAB from DNA extracted from a variety of bat species (generous gift of R. Baker). In the first approach, bat genomes were queried (BLASTn) with the mariner insertion plus 200 bp of flanking sequence on each side, and the insertion was considered present at the ZNF112 locus if the hit covered 100% of the query sequence, including the flanking DNA. For PCR validation, primers were designed to target the unique genomic DNA flanking the mariner insertion (Fig. 2.3, Table 2.4; NEB Q5-HF polymerase #M0492). Amplicons were run on a 1% agarose gel to determine the presence or absence of the insertion, then excised, gel extracted (Zymo Gel DNA Extraction Kit #D4007), and subcloned (ThermoFisher Zero Blunt TOPO PCR cloning kit #K280020). The subcloned inserts were then Sanger sequenced.
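The presence criterion (a hit covering 100% of the insertion plus its 200 bp flanks) can be checked from BLAST+ tabular output. This sketch assumes BLASTn was run with a custom tabular format, `-outfmt "6 qseqid sseqid qstart qend qlen"`, and checks coverage only; confirming that the hit falls at the ZNF112 locus (for example from `sseqid` and subject coordinates) is left out, and the function name is hypothetical.

```python
from typing import Iterable


def insertion_present(blast_lines: Iterable[str], query_id: str) -> bool:
    """True if any HSP for `query_id` spans the full query (100% coverage).

    Expects tab-separated lines with columns: qseqid sseqid qstart qend qlen.
    """
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, qstart, qend, qlen = line.rstrip("\n").split("\t")[:5]
        if qseqid != query_id:
            continue
        lo, hi = sorted((int(qstart), int(qend)))
        if lo == 1 and hi == int(qlen):
            return True
    return False
```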

To test for KRABINER expression, we performed RT-PCR. RNA was extracted from three bat cell lines (M. velifer embryonic fibroblasts, M. lucifugus embryonic fibroblasts, and E. fuscus immortalized skin fibroblasts; Qiagen RNeasy Mini Kit #74104) and converted to cDNA (Maxima First Strand cDNA Synthesis Kit with dsDNase #K1672). To verify correct splicing of the KRAB domain (exon 4) to the mariner transposase (exon 6), primers were designed to span the exon 4-5 (FWD) and exon 5-6 (REV) junctions (Fig. 2.3, Table 2.4). Additionally, primers were designed in exons 1 and 6 to amplify the full-length KRABINER transcript (Fig. 2.3, Table 2.4). cDNA from all cell lines was amplified using the specified primers, and amplicons were gel extracted, subcloned, and sequence verified as described above.

2.5.7 KRABINER mutant sequence design

To generate a KRABINER DNA-binding domain mutant, the transposase region of KRABINER was aligned to the sequence of the closely related Drosophila Mos1 transposon to identify the DNA-binding domain. Residues homologous to those known to be critical for Mos1 binding to its TIRs, as shown by gel-shift experiments (Zhang, et al. 2001) or the crystal structure (Richardson, et al. 2009), were mutated (R150A, Q202A, S206A, R208A).

To generate a KRABINER KRAB domain mutant, the KRAB sequence was aligned to the KRAB_Abox Pfam consensus domain (PF01352), and several conserved residues previously shown to be critical for KRAB function were mutated (D12A, V13A, E20A, E21A, L32A, Y33A, R34A) (Margolin, et al. 1994; Witzgall, et al. 1994; Friedman, et al. 1996; Murphy, et al. 2016).
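Substitutions such as R150A follow the usual notation: wild-type residue, 1-based position, replacement residue. Applying them programmatically, with a check that the expected wild-type residue is present, guards against numbering errors when designing mutant sequences. This is a hypothetical helper; because the KRABINER protein sequence is not reproduced here, the example uses a toy sequence, and the numbering is assumed to refer to whatever sequence is passed in.

```python
import re
from typing import Iterable

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, substitutions: Iterable[str]) -> str:
    """Apply point substitutions like 'D12A' to a protein sequence (1-based)."""
    residues = list(protein)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{sub}: expected {wild_type} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mutant
    return "".join(residues)
```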

2.5.8 Vector construction

To generate expression vectors, the ORFs of all KRAB-transposase fusions except KRABINER were synthesized as gBlocks (IDT) with 15 bp of homology on the 5’ and 3’ ends to facilitate In-Fusion cloning (Clontech #638920) into either pcDNA3.1+ (Addgene #V790-20; BamHI/NotI sites) for over-expression or pcDNA4/TO/myc-His-B (ThermoFisher #V103020, generous gift of J. Schimenti; NotI/XbaI sites) for doxycycline-inducible expression. The pcDNA3.1+ ORFs carry an N-terminal FLAG tag and the pcDNA4/TO/myc-His-B ORFs carry a C-terminal myc tag. Wild-type KRABINER expression vectors were generated in a similar manner, except that the KRABINER sequence was amplified from cDNA prepared from Myotis velifer embryonic fibroblasts (Qiagen RNeasy Mini Kit #74104; Maxima First Strand cDNA Synthesis Kit with dsDNase #K1672) using primers unique to the KRABINER transcript (Table 2.4). Mutant KRABINER sequences (DNA-binding and KRAB mutants) were synthesized as gBlocks (IDT) and cloned into the vectors as described for the other KTFs.


Firefly luciferase reporter vectors were generated using the pGL3pro vector (Promega #E1751; generous gift of T. Macfarlan). For each tested KTF, the TIR sequence of the appropriate reconstructed transposon consensus, plus 15 bp of homology to the pGL3pro vector on the 5’ and 3’ ends (BamHI/NotI sites), was synthesized as complementary oligos, which were then mixed and annealed to generate double-stranded TIR fragments for In-Fusion cloning. A sequence-scrambled version of each TIR was also synthesized and cloned into the pGL3pro vector to test for sequence specificity.
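Building the annealed TIR inserts amounts to concatenating the vector homology arms with the TIR and taking the reverse complement for the partner oligo. The scrambled control is sketched here as a composition-preserving shuffle, which is an assumption (how the TIRs were scrambled is not specified above). The function names are hypothetical, and the arm sequences in the test are placeholders, not the actual pGL3pro sequences flanking the BamHI/NotI sites.

```python
import random
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def annealing_oligos(insert: str, left_arm: str, right_arm: str) -> Tuple[str, str]:
    """Top-strand oligo (arm + insert + arm) and its complementary oligo."""
    top = left_arm + insert + right_arm
    return top, reverse_complement(top)


def scrambled(seq: str, seed: int = 0, max_tries: int = 100) -> str:
    """Shuffle bases (same composition) until the result differs from `seq`."""
    rng = random.Random(seed)
    bases = list(seq)
    for _ in range(max_tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if candidate != seq:
            return candidate
    raise ValueError("could not generate a scramble distinct from the input")
```

In practice the 15 bp arms would be taken from the ends of the linearized vector, so that the annealed duplex can be used directly for In-Fusion cloning.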

To generate the KRABINER rescue vectors, we digested the PB-TRE-KRABdCas9 vector (generous gift J. Wysocka; Gu, et al. 2018) with XhoI and MluI to remove the KRABdCas9 insert and replaced it (via In-Fusion cloning) with the wild-type (PB-TRE-WT), DNA-binding mutant (PB-TRE-mutDBD), or KRAB mutant (PB-TRE-mutKRAB) form of KRABINER, PCR-amplified from the expression vectors described above (Table 2.4).

All vectors were validated by Sanger sequencing.

2.5.9 Luciferase assays

Luciferase assays were performed in one of two formats: overexpression of KRAB-transposase fusion ORFs in HEK293T cells (pcDNA3.1+ vectors) or doxycycline-inducible expression of KRAB-transposase fusions in HEK293T-Rex cells (pcDNA4/TO/myc-His-B vectors). In all cases, cells were first seeded at 500,000 cells per well in a 12-well plate (day 1) and allowed to grow. On day 2, cells were transfected via Lipofectamine 2000 (Fisher #11668030) with three plasmids, 333 ng each for a total of 1ug DNA: (1) a KRAB-transposase fusion expression vector or empty vector, (2) a firefly luciferase reporter vector (consensus or scrambled TIR; Fig. 2.6), and (3) pRL-SV40 (Promega #E2231; generous gift T. Macfarlan). On day 3, 2mL of cell culture medium was added, and cells were either treated (pcDNA4/TO/myc-His-B) or not (pcDNA3.1+) with 1ug/mL doxycycline. On day 4, cells were lysed and each lysate was split into 5 wells of a 96-well plate (n=5 technical replicates). Firefly and renilla luminescence were measured on a plate reader (Varioskan LUX) using the Promega Dual-Glo system (#E2920). Firefly and renilla values were first blank-subtracted (untransfected cell lysate), and the resulting firefly luminescence was normalized to the blank-subtracted renilla luminescence. The values for each replicate were then normalized to the average renilla-normalized firefly luminescence of the empty vector condition within each experiment. Each vector combination was tested in a minimum of three independent experiments, and significant differences in luminescence relative to the empty vector control were determined via pairwise Wilcoxon tests with Bonferroni multiple testing correction (significant if adjusted p < 0.05).
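The normalization and multiple-testing steps above can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used here (the statistics were run in R): the function names, the dictionary layout of raw readings, and the use of a single blank value per channel are assumptions. The rank-based Wilcoxon test itself is not re-implemented; only the Bonferroni adjustment applied to its p-values is shown.

```python
from statistics import mean


def normalize_experiment(firefly, renilla, blank_firefly, blank_renilla,
                         empty_key="empty"):
    """Normalize one independent luciferase experiment (illustrative sketch).

    firefly, renilla: dicts mapping condition -> list of raw readings
    (technical replicates, paired by index). blank_*: readings from
    untransfected lysate. Returns condition -> values relative to the mean
    renilla-normalized firefly signal of the empty vector condition.
    """
    # Blank-subtract both channels, then take the firefly/renilla ratio.
    ratios = {
        cond: [(f - blank_firefly) / (r - blank_renilla)
               for f, r in zip(firefly[cond], renilla[cond])]
        for cond in firefly
    }
    baseline = mean(ratios[empty_key])  # empty-vector reference level
    return {cond: [v / baseline for v in vals] for cond, vals in ratios.items()}


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    result = normalize_experiment(
        firefly={"empty": [1100, 1100, 1100], "KTF": [300, 500, 400]},
        renilla={"empty": [600, 600, 600], "KTF": [600, 600, 600]},
        blank_firefly=100, blank_renilla=100)
    print(result["KTF"])
    print(bonferroni([0.01, 0.02, 0.5]))
```

In this toy example the hypothetical fusion reduces reporter signal to roughly 20 to 40% of the empty vector level; normalizing within each experiment before pooling keeps day-to-day differences in transfection efficiency from dominating the comparison.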

Luciferase assays were also repeated in HEK293T cells that were either wild-type or KAP1-KO (Tie, et al. 2018; Fig. 2.7) to test whether the effect depends on KAP1. All statistical tests were performed in R.

2.5.10 Generating and validating KRABINER KO cells

KRABINER KO cell lines were generated as previously described (Ran, et al. 2013). In brief, a pair of gRNAs (gRNA 1 (GL430169:87458-87477): CATTTAGTTTCAGCCTCTCATGG; gRNA 2 (GL430169:89264-89286): TAATACGTAAGCTGCTGTGTGGG) was designed against the unique genomic DNA sequence (Myoluc2.0) flanking the mariner insertion in order to generate a complete deletion of the mariner element (Fig. 2.8). The gRNAs were also designed to be unique to that location, with no predicted genic off-target sites at up to 3 mismatches (BreakingCas/CRISPOR/Cas-OFFinder; Oliveros, et al. 2016; Haeussler, et al. 2016; Bae, et al. 2014). These gRNAs were synthesized as oligos (IDT), annealed, and cloned into the PX459 vector (Addgene #62988). We then seeded M. velifer embryonic fibroblasts into 6-well plates and transfected them with the resulting pair of Cas9-gRNA vectors (5ug DNA each, for a total of 10ug DNA) via electroporation (ThermoFisher Neon Transfection System #MPK10025; 1600V, 20ms, 1 pulse). Cells were allowed to recover for one day and were then treated with 1.5ug/mL puromycin for 1 week to select for transfected cells, given the low transfection efficiency (< 50%). Cells were then seeded at low density (~60 cells) in 96-well plate format for clonal expansion. Plates were checked 1 week after seeding to eliminate wells seeded with more than one cell. Following clonal expansion, DNA was extracted (Lucigen QuickExtract DNA #QE09050) and clones were genotyped using PCR primers outside of the gRNA-targeted region (Table 2.4) to identify clones homozygous for the mariner deletion. Clones passing initial genotyping were further expanded, and RNA and DNA were extracted (Qiagen RNeasy Mini Kit #74104; Qiagen DNeasy Blood and Tissue Kit #69504).
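The off-target criterion used for gRNA design (no hits with up to 3 mismatches) can be illustrated with a brute-force scan. The sketch below is a simplified stand-in for tools such as Cas-OFFinder, not their implementation: it assumes the listed gRNA sequences end in a 3-nt NGG PAM, searches both strands of a toy sequence for NGG-adjacent sites within a mismatch threshold, and ignores DNA/RNA bulges and gene annotation. The function names and the toy locus are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def find_sites(spacer, genome, max_mismatches=3):
    """Find NGG-adjacent sites within `max_mismatches` of a gRNA spacer.

    Returns (strand, start, mismatches) tuples. `start` is a 0-based index
    into the + strand for '+' hits, or into the reverse complement for '-'
    hits. Bulges (insertions/deletions) are not considered.
    """
    spacer = spacer.upper()
    k = len(spacer)
    hits = []
    for strand, seq in (("+", genome.upper()), ("-", revcomp(genome))):
        for i in range(len(seq) - k - 2):  # leave room for the 3-nt PAM
            if seq[i + k + 1:i + k + 3] != "GG":  # require an NGG PAM
                continue
            mismatches = sum(a != b for a, b in zip(spacer, seq[i:i + k]))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


if __name__ == "__main__":
    grna1 = "CATTTAGTTTCAGCCTCTCATGG"  # gRNA 1 as listed; assumed to include PAM
    spacer = grna1[:-3]                # 20-nt protospacer
    toy_locus = "AAAA" + grna1 + "AAAA"  # hypothetical toy sequence
    print(find_sites(spacer, toy_locus))
```

On the toy locus, the 20-nt spacer of gRNA 1 is recovered once on the + strand with zero mismatches; a genome-wide search with the same rule would flag any additional sites within 3 mismatches for inspection against gene annotations.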

Clones were genotyped again to verify absence of the mariner insertion (Fig. 2.9), and amplicons were gel-extracted and Sanger sequenced as described above to confirm the deletion (Fig. 2.9). RNA was converted to cDNA, and absence of KRABINER transcription was verified using the protocol described above (Fig. 2.9; Table 2.4). We selected a homozygous clone that met the above criteria for use in this study (hereafter referred to as KO).
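Because ~60 cells were distributed across a 96-well plate for clonal expansion, some wells are expected to receive more than one cell, which is why plates were inspected after seeding. Under the simplifying assumption that cells land in wells independently (Poisson loading), the expected occupancy can be computed as follows; the function name is hypothetical.

```python
from math import exp


def poisson_well_occupancy(cells, wells):
    """Expected fractions of wells with 0, 1, or >1 cells, assuming cells
    are distributed independently across wells (Poisson, mean = cells/wells)."""
    lam = cells / wells
    p_empty = exp(-lam)
    p_single = lam * exp(-lam)
    return {"empty": p_empty, "single": p_single,
            "multiple": 1.0 - p_empty - p_single}


if __name__ == "__main__":
    occupancy = poisson_well_occupancy(60, 96)
    for category, fraction in occupancy.items():
        print(f"{category}: {fraction:.1%} ({fraction * 96:.1f} wells)")
    occupied = 1.0 - occupancy["empty"]
    print(f"multi-cell among occupied wells: {occupancy['multiple'] / occupied:.1%}")
```

With 60 cells in 96 wells (mean 0.625 cells per well), roughly 54% of wells are expected to be empty, 33% to contain a single cell, and 13% to contain more than one cell, so about 28% of occupied wells would need to be excluded.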

2.5.11 Generating and validating KRABINER rescue cells

The KO cell line was transfected via electroporation, as described above, with 5ug of one of the following PiggyBac vectors: PB-TRE-WT, PB-TRE-mutDBD, or PB-TRE-mutKRAB, together with 5ug of the PiggyBac transposase expression vector (System Biosciences #PB210PA-1). Cells were allowed to recover for 24 hours and then treated with 1.5ug/mL puromycin for 1 week to select for construct integration. Cells were then clonally expanded and genotyped as described above for KO cells (Table 2.4). We tested for genomic integration of the KRABINER transgene using primers designed to the CMV promoter (FWD) and the UbqC promoter (REV), which are not present elsewhere in the genome (Fig. 2.10; Table 2.4). PCR products were gel-extracted and sequenced as described above to verify the presence of the correct transgene. We validated expression of the KRABINER transgene at the RNA level via both RT-PCR (WT only; reverse primer anchored in the Myc-His tag; Fig. 2.10) and precision run-on sequencing (PRO-seq; all rescue variants; Fig. 2.11). We also validated protein expression and inducibility of the KRABINER rescue transgenes via western blot (anti-myc antibody, ThermoFisher #MA1-21316; β-actin loading control #8457).

2.5.12 KRABINER rescue transgene immunofluorescence assays

To assess KRABINER protein localization in bat cells, we performed immunofluorescence assays. In brief, WT clonal lines (P4C11, P4B3, P4D7, and P3E12) as well as a mixed population of mutDBD cells were seeded at a density of 200,000 cells per well of a 4-well chambered coverslip (Ibidi #80426) and either treated with 1ug/mL doxycycline for 24 hours or left untreated. Cells were then fixed and permeabilized with 300uL of 100% methanol per well for 3 minutes at -20°C. Following fixation, cells were washed 3X with sterile PBS for 10 minutes each and then blocked with 2% BSA/0.01% saponin in PBS for 1 hour at RT with rocking. Cells were washed again as described above and then incubated overnight with rocking in a humidity chamber at 4°C with anti-myc primary antibody (ThermoFisher #MA1-21316) diluted 1:1000 in blocking solution. On day 2, cells were washed and incubated with anti-mouse secondary antibody (ThermoFisher #A28175; 2ug/mL in blocking solution) for 2 hours at RT, then washed again. Slides were then mounted and counterstained with DAPI (Fluoromount-G with DAPI; Southern Biotech #0100-20) and imaged at 100X with an oil immersion objective (EVOS FL).

Table 2.5 Sequencing and alignment statistics

Sample name | Replicate | Genotype | Condition | Experiment type | Uniquely mapped (reads) | Read size (bp) | Run ID/Cond.
mutDBD_P1D7 | 1 | mutDBD | rescue | PRO-seq | 5382701 | 37x37 | Exp2
mutDBD_P1D7 | 1 | mutDBD | OE | PRO-seq | 4158717 | 37x37 | Exp2
mutDBD_P2B5 | 2 | mutDBD | rescue | PRO-seq | 5189341 | 37x37 | Exp2
mutDBD_P2B5 | 2 | mutDBD | OE | PRO-seq | 3472751 | 37x37 | Exp2
mutDBD_P2B7 | 3 | mutDBD | rescue | PRO-seq | 7164433 | 37x37 | Exp2
mutDBD_P2B7 | 3 | mutDBD | OE | PRO-seq | 4024084 | 37x37 | Exp2
mutDBD_P3A8 | 1 | mutKRAB | rescue | PRO-seq | 3249836 | 37x37 | Exp2
mutKRAB_P3A8 | 1 | mutKRAB | OE | PRO-seq | 3384202 | 37x37 | Exp2
mutKRAB_P4D7 | 2 | mutKRAB | rescue | PRO-seq | 5690531 | 37x37 | Exp2
mutKRAB_P4D7 | 2 | mutKRAB | OE | PRO-seq | 4465691 | 37x37 | Exp2
mutKRAB_P4F4 | 3 | mutKRAB | rescue | PRO-seq | 7188179 | 37x37 | Exp2
mutKRAB_P4F4 | 3 | mutKRAB | OE | PRO-seq | 5980444 | 37x37 | Exp2
KKO_rep1 | 1 | KO | NA | PRO-seq | 14533454 | 37x37 | Exp1
KKO_rep2 | 2 | KO | NA | PRO-seq | 16500786 | 37x37 | Exp1
WT_P3E12 | 1 | WT transgene | rescue | PRO-seq | 5528617 | 37x37 | Exp1
WT_P3E12 | 1 | WT transgene | OE | PRO-seq | 3887454 | 37x37 | Exp1
WT_P4B3 | 2 | WT transgene | rescue | PRO-seq | 6786284 | 37x37 | Exp1
WT_P4B3 | 2 | WT transgene | OE | PRO-seq | 3762761 | 37x37 | Exp1
WT_P4C11 | 3 | WT transgene | rescue | PRO-seq | 8573539 | 37x37 | Exp1
WT_P4C11 | 3 | WT transgene | OE | PRO-seq | 5921826 | 37x37 | Exp1
WT_P4D7 | 4 | WT transgene | rescue | PRO-seq | 9280826 | 37x37 | Exp1
WT_P4D7 | 4 | WT transgene | OE | PRO-seq | 6814282 | 37x37 | Exp1
WT_P4C11 | 5 | WT transgene | rescue | PRO-seq | 2848269 | 37x37 | Exp2
WT_P4C11 | 5 | WT transgene | OE | PRO-seq | 2221675 | 37x37 | Exp2
WT_P4D7 | 6 | WT transgene | rescue | PRO-seq | 3084257 | 37x37 | Exp2
WT_P4D7 | 6 | WT transgene | OE | PRO-seq | 5289074 | 37x37 | Exp2
WT_rep1 | 1 | WT | NA | PRO-seq | 16952858 | 37x37 | Exp1
WT_rep2 | 2 | WT | NA | PRO-seq | 17581494 | 37x37 | Exp1
WT_P4C11 | 1 | WT transgene | OE | CUT&RUN-myc | 11691157 | 37x37 | SD
WT_P4D7 | 2 | WT transgene | OE | CUT&RUN-myc | 12940178 | 37x37 | SD
mutDBD_P1D7 | 1 | mutDBD | OE | CUT&RUN-myc | 15006533 | 37x37 | SD
mutDBD_P2B5 | 2 | mutDBD | OE | CUT&RUN-myc | 10842954 | 37x37 | SD

2.5.13 Sample preparation for PRO-seq and CUT&RUN

For PRO-seq, all samples (Table 2.5) were seeded into 15cm plates and processed as described below. Rescue transgenic lines were either treated with 1ug/mL doxycycline (induced, OE) or not (non-induced, rescue) 24 hours prior to PRO-seq. For CUT&RUN, WT and mutDBD transgenic cell lines (Table 2.5) were seeded into 10cm plates, grown to 80% confluency, and treated with 1ug/mL doxycycline 24 hours prior to the experiment. Cell viability was assessed via Trypan blue staining (Countess II, ThermoFisher #C10228), and >90% viability was required for both PRO-seq and CUT&RUN experiments.

2.5.14 PRO-seq library preparation

PRO-seq libraries were prepared as previously described (Mahat, et al. 2016; Kwak, et al. 2013), with some modifications. Cells were grown to approximately 80% confluency, chilled on ice, washed twice with ice-cold PBS, incubated in PBS with 1 mM EDTA for 5 minutes, and harvested by scraping. Cells were permeabilized by incubating on ice for 5 minutes in permeabilization buffer (10 mM Tris-Cl pH 7.5, 10 mM KCl, 250 mM sucrose, 5 mM MgCl2, 1 mM EGTA, 0.5 mM DTT, 0.05% Tween-20, 4 U/mL SUPERase-In [Thermo Fisher], 1X EDTA-free Pierce Protease Inhibitors [Thermo Fisher], and 0.1% NP-40). Permeabilization was verified by Trypan blue staining followed by visual inspection. Cells were washed once with permeabilization buffer and flash frozen in LN2 in storage buffer (50 mM Tris-Cl pH 8.0, 40% v/v glycerol, 5 mM MgCl2, 0.1 mM EDTA, and 0.5 mM DTT). The nuclear run-on was performed for 5 minutes at 37 °C by mixing equal volumes of permeabilized cells in storage buffer and 2X Run-On buffer (10 mM Tris-Cl pH 8.0, 5 mM MgCl2, 1 mM DTT, 300 mM KCl, 40 µM ATP, 40 µM GTP, 40 µM Biotin-11-CTP [Perkin Elmer], 40 µM Biotin-11-UTP [Perkin Elmer], 2 U/mL SUPERase-In [Thermo Fisher], and 1% Sarkosyl). RNA was isolated by TRIzol [Thermo Fisher] extraction and fragmented by base hydrolysis in 0.2 N NaOH on ice for 5 min. Following neutralization with pH 6.8 Tris-Cl, RNA was passed through an RNase-free Bio-Gel P-30 column [Bio-Rad]. The 3’ and 5’ adapters were identical to those in Mahat, et al. (2016), except that 6 random nucleotides were added at the ligation junction as a unique molecular identifier (UMI) to facilitate computational removal of PCR duplicates. The 3’ adapter was ligated to the total RNA pool (containing both nascent and non-nascent RNAs) for 1 h at 25 °C using 30 U T4 RNA Ligase 1 [NEB] in 10% PEG8000. Nascent RNA was then captured using Dynabeads MyOne C1 Streptavidin beads [Thermo Fisher] and washed once with high salt wash buffer and once with low salt wash buffer as described in Mahat, et al. (2016). While attached to the beads, RNA was 5’ phosphorylated using T4 PNK [NEB] and then decapped using RppH [NEB]. RNA was then eluted from the beads using TRIzol, and the 5’ adapter was ligated using 10 U T4 RNA Ligase 1 for 1 h at 25 °C. RNA was again isolated using Dynabeads MyOne C1 Streptavidin beads, and reverse transcription was performed on the beads using Maxima H Minus RT [Thermo Fisher]. cDNA was eluted from the beads by heating, and final libraries were amplified for 11 PCR cycles using Q5 High-Fidelity DNA Polymerase [NEB]. Libraries were sequenced on an Illumina NextSeq 500 platform using paired-end 37 by 37 chemistry.

2.5.15 PRO-seq alignment and processing

The pipeline used to process PRO-seq data is available at http://github.com/JAJ256/. Briefly, UMIs were extracted and adapter sequences were trimmed using fastp (Chen, et al. 2018). Ribosomal reads were removed by mapping to one copy of the human rDNA repeat using bowtie2 --fast-local (Langmead and Salzberg 2012) and retaining unmapped reads. The remaining reads were then mapped with bowtie2 --sensitive-local to the Myotis lucifugus assembly (Myoluc2.0), the closest relative of Myotis velifer (< 10 million years diverged) with a publicly available genome, with the sequence of the wild-type KRABINER transgene spiked in as an additional contig. PCR duplicates were removed using UMI-tools (Smith, et al. 2017) with the “directional” method (avg. 0.02% duplication rate), and bigWig score tracks for visualization and downstream analysis were generated using deepTools bamCoverage (Ramírez, et al. 2014).
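The “directional” deduplication rule referenced above (Smith, et al. 2017) merges UMIs that differ by one base when the more abundant UMI has at least 2n − 1 reads, where n is the read count of the less abundant one. The simplified Python sketch below illustrates that rule for the UMIs observed at a single alignment position; it is not the UMI-tools implementation, and the function names are hypothetical. Each returned cluster is treated as one original molecule.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_clusters(umi_counts):
    """Cluster UMIs seen at one alignment position with the directional rule.

    An edge a -> b is drawn when a and b differ at one base and
    count(a) >= 2 * count(b) - 1. Starting from the most abundant
    unassigned UMI, every unassigned UMI reachable along edges joins its
    cluster.
    """
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    edges = {
        u: [v for v in umis
            if v != u and hamming(u, v) == 1
            and umi_counts[u] >= 2 * umi_counts[v] - 1]
        for u in umis
    }
    seen, clusters = set(), []
    for root in umis:
        if root in seen:
            continue
        seen.add(root)
        cluster, queue = [], deque([root])
        while queue:  # breadth-first search along directed edges
            node = queue.popleft()
            cluster.append(node)
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    counts = {"AAAAAA": 50, "AAAAAT": 3, "AAAATT": 1, "CCCCCC": 10}
    groups = directional_clusters(counts)
    print(len(groups), "molecules from", sum(counts.values()), "reads")
```

In the toy example, the two low-count UMIs are absorbed into the abundant AAAAAA cluster as likely sequencing or PCR errors, while CCCCCC remains a separate molecule; two one-base neighbours with similar counts (for example 5 and 4 reads) would not be merged.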

Promoter-proximal and gene body regions were defined as -500 to +500 bp relative to the TSS and +500 bp to the TES, respectively, of Myoluc2.0 Ensembl gene annotations (GCA_000147115.1). The PRO-seq data were also used to call transcriptional regulatory elements (TREs) for each sample (dREG; Wang, et al. 2019), which were then merged to generate a comprehensive TRE set. Read counts in gene bodies and TREs were calculated by generating single-nucleotide-resolution bedGraph files using a custom script and bedtools map (Quinlan and Hall, 2010).
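The region definitions and per-region counting described above can be sketched as follows. This is an illustrative sketch only, not the custom script or bedtools map: it assumes 0-based, half-open coordinates, TSS and TES given as genomic positions (with TES < TSS for minus-strand genes), and per-base counts supplied as a dictionary standing in for a single-nucleotide bedGraph from the strand matching the gene. The function names are hypothetical.

```python
def gene_regions(tss, tes, strand, window=500):
    """Return (promoter_proximal, gene_body) as half-open (start, end)
    intervals: TSS-500..TSS+500 and TSS+500..TES in the direction of
    transcription. gene_body is None for genes shorter than the window."""
    promoter = (max(0, tss - window), tss + window)
    if strand == "+":
        body = (tss + window, tes)
    elif strand == "-":
        body = (tes, tss - window)  # minus-strand genes: TES < TSS
    else:
        raise ValueError("strand must be '+' or '-'")
    if body[1] <= body[0]:
        body = None
    return promoter, body


def count_in_interval(per_base_counts, interval):
    """Sum single-nucleotide counts (position -> reads) within an interval."""
    if interval is None:
        return 0
    start, end = interval
    return sum(n for pos, n in per_base_counts.items() if start <= pos < end)


if __name__ == "__main__":
    promoter, body = gene_regions(tss=1000, tes=5000, strand="+")
    counts = {600: 2, 1499: 1, 1500: 4, 4999: 3, 5000: 10}  # toy counts
    print(count_in_interval(counts, promoter), count_in_interval(counts, body))
```

Separating the promoter-proximal window from the gene body keeps promoter-proximal paused polymerase, which often dominates PRO-seq signal, from inflating gene body counts.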

2.5.16 Differential transcription analysis of genes and TREs

PRO-seq read counts from each sample were combined into a single counts matrix. Differential expression analysis was performed separately for gene bodies and TREs using DESeq2 (Love, et al. 2014). Principal component analysis of rlog-normalized counts separated samples by condition and treatment for both genes and TREs (Fig. 2.11). Three comparisons were performed: KO vs WT, WT/mutDBD/mutKRAB rescue (R) vs KO, and WT/mutDBD/mutKRAB OE vs R. The OE vs R comparison was performed individually for each rescue, controlling for clonal identity. We considered a gene or TRE to be regulated by KRABINER if it exhibited significant (adj. p < 0.05) reciprocal changes in the KO vs WT and the WT/mutDBD/mutKRAB R vs KO comparisons, or if it was differentially transcribed in the OE vs R comparison (adj. p < 0.05). To identify differentially transcribed genes and TREs unique to the WT rescue lines or shared between the WT rescue lines and either the mutDBD or mutKRAB lines, we used the UpSetR package (Conway, et al. 2017).

Genic transcription was highly correlated across replicates (average Spearman's ρ = 0.97, p < 0.001; Fig. 2.11). Raw and DESeq2 library-normalized counts are available on GEO.
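The decision rule for calling a gene or TRE KRABINER-regulated, and the UpSet-style partition of regulated sets, can be expressed compactly. The sketch below assumes per-comparison DESeq2 results have already been reduced to (log2 fold change, adjusted p) pairs; the comparison keys and function names are hypothetical, and “reciprocal” is interpreted as opposite signs of the log2 fold changes in the KO vs WT and R vs KO comparisons.

```python
def krabiner_regulated(results, alpha=0.05):
    """Apply the regulation rule to one gene or TRE.

    results: dict mapping a comparison name ('KO_vs_WT', 'R_vs_KO',
    'OE_vs_R'; hypothetical keys) to (log2_fold_change, padj). padj may be
    None, mirroring the NA that DESeq2 reports for filtered features.
    """
    def significant(key):
        _, padj = results.get(key, (0.0, None))
        return padj is not None and padj < alpha

    # Loss of KRABINER and its restoration should move transcription in
    # opposite directions.
    reciprocal = (significant("KO_vs_WT") and significant("R_vs_KO")
                  and results["KO_vs_WT"][0] * results["R_vs_KO"][0] < 0)
    return bool(reciprocal or significant("OE_vs_R"))


def exclusive_intersections(named_sets):
    """UpSet-style partition: map each exact combination of set names to
    the items found in exactly those sets."""
    membership = {}
    for name, items in named_sets.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    combos = {}
    for item, names in membership.items():
        combos.setdefault(frozenset(names), set()).add(item)
    return combos


if __name__ == "__main__":
    print(krabiner_regulated({"KO_vs_WT": (1.2, 0.01), "R_vs_KO": (-0.8, 0.03)}))
    regulated = {"WT": {"g1", "g2", "g3"}, "mutDBD": {"g2"}, "mutKRAB": {"g3", "g4"}}
    for combo, items in exclusive_intersections(regulated).items():
        print(sorted(combo), sorted(items))
```

Partitioning by exact set membership, rather than by pairwise overlaps, is what lets each regulated feature be counted once as WT-specific or shared with a particular mutant.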

2.5.17 CUT&RUN library preparation


CUT&RUN was performed on WT and mutDBD rescue cells, prepared as above, as previously described (Skene and Henikoff, 2017; https://www.protocols.io/view/cut-amp-run-targeted-in-situ-genome-wide-profiling-zcpf2vn). Two samples of each transgene genotype (WT and mutDBD) were prepared using the standard CUT&RUN protocol (1 million cells each; Table 2.5) with the following antibodies: mouse anti-myc (ThermoFisher #MA1-21316) and goat anti-mouse IgG secondary antibody (ThermoFisher #A27022). Following CUT&RUN, purified DNA was end-repaired (2X master mix: 10X Ligase Buffer with ATP [NEB], 25mM dNTPs, 10U/uL T4 PNK [NEB], 3U/uL Klenow large fragment [NEB]), A-tailed (1X master mix: 10X NEBuffer 2, 10mM dATP [NEB], 5U/uL Klenow exo- [NEB]), ligated to Illumina adapters (1X T4 DNA ligase buffer with ATP [NEB], 0.5uL of 0.5uM diluted TruSeq adapter, 3uL T4 DNA ligase), and PCR amplified (15 cycles, NEB Q5-HF polymerase). The resulting libraries were purified, quantified (Qubit), checked for fragment size (Bioanalyzer) to ensure quality, and sequenced on an Illumina NextSeq 500 platform using paired-end 37 by 37 chemistry.

2.5.18 CUT&RUN data processing and analysis

CUT&RUN fastq files were processed to remove sequencing adapters (cutadapt; Martin 2011) and aligned to both the reference (Myoluc2.0) and spike-in (dm6) assemblies using bowtie2 with the following parameters: --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 (Langmead and Salzberg 2012). Low-quality and multimapping alignments (MAPQ < 30) and reads mapping to blacklisted high-accessibility regions (contigs AAPE02059006, AAPE02072785, GL429886, GL429923, GL429927, GL429979, GL430411, GL430644, GL431357) were removed (samtools; Li, et al. 2009). Scale factors for normalization were determined by dividing the number of reads mapped to the spike-in genome for each sample by the average number of spike-in reads across samples. BAM files were converted to paired-end BED files (bedtools bamtobed; Quinlan and Hall 2010), fragments > 120bp were removed, and the remaining fragments were converted to spike-in-normalized coverage files (bedtools genomecov; Quinlan and Hall 2010). Normalized coverage files were then converted into normalized bigWig files (bedGraphToBigWig; Kent, et al. 2010), and replicates were either merged by genotype (WT rescue or mutDBD) or kept separate for downstream analysis and comparison. bigWig files are available on GEO.
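The spike-in scaling and fragment-size filter described above can be sketched as follows. The text does not state whether coverage was divided or multiplied by the scale factor; this sketch assumes division, so that samples with more spike-in reads are scaled down. Fragments are represented as hypothetical (chromosome, start, end) tuples in 0-based, half-open coordinates, and the function names and sample labels are illustrative, not part of bedtools.

```python
def spikein_scale_factors(spike_counts):
    """Scale factor per sample = spike-in reads / mean spike-in reads."""
    average = sum(spike_counts.values()) / len(spike_counts)
    return {sample: n / average for sample, n in spike_counts.items()}


def normalized_coverage(fragments, scale_factor, max_len=120):
    """Per-base coverage from (chrom, start, end) fragments, half-open.

    Fragments longer than max_len bp are discarded, and depth is divided by
    the sample's scale factor (assumed direction of scaling).
    """
    coverage = {}
    for chrom, start, end in fragments:
        if end - start > max_len:
            continue  # keep only short, TF-sized fragments
        for pos in range(start, end):
            coverage[(chrom, pos)] = coverage.get((chrom, pos), 0) + 1
    return {key: depth / scale_factor for key, depth in coverage.items()}


if __name__ == "__main__":
    factors = spikein_scale_factors({"WT_1": 2000, "WT_2": 1000, "mutDBD_1": 3000})
    frags = [("contig1", 0, 100), ("contig1", 50, 300)]  # toy fragments
    cov = normalized_coverage(frags, factors["WT_2"])
    print(factors, len(cov), cov[("contig1", 0)])
```

Here the 250bp fragment is discarded by the size filter, and the sample with half the average spike-in reads has its coverage doubled.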

Initial attempts at peak calling (WT vs mutDBD, MACS2; Zhang, et al. 2008) identified WT KRABINER-bound peaks, but the replicates were poorly correlated due to a low signal-to-noise ratio (data not shown). To address this issue, we instead directly quantified enrichment of WT or mutDBD KRABINER binding at TIRs and at differentially transcribed TREs. First, we identified TIRs in the Myoluc2.0 assembly using a profile HMM built from an alignment of 70 mariner TIR sequences (nhmmer; http://hmmer.org/). Because TIRs are poorly mappable with 37-mer reads (Fig. 2.14; GEM-mappability, Derrien, et al. 2012), we assessed KRABINER binding to TIRs by counting the CUT&RUN reads from the WT and mutDBD samples that mapped to the TIR (30bp) and to the more mappable 100bp of unique genomic DNA flanking it. We then clustered the TIRs into 4 clusters by k-means based on signal from both conditions (WT and mutDBD) (deepTools computeMatrix/plotHeatmap; Ramírez, et al. 2014). To quantify enrichment of KRABINER binding for each cluster/genotype combination, we divided the average CUT&RUN reads within the TIR and its 100bp flanking DNA by the average CUT&RUN reads across a 10kb window (+/-5kb). We performed a similar analysis for the TREs differentially transcribed upon KRABINER overexpression. In brief, we clustered the upregulated and downregulated TREs separately, each into two clusters, based on CUT&RUN read coverage in the WT rescue samples, and calculated fold enrichment of reads mapped within the TRE (avg. 1000bp) relative to the average reads across a 10kb window (+/-5kb). Global statistical significance of all fold-enrichment comparisons was determined using the Kruskal-Wallis test and, if significant (p < 0.05), pairwise comparisons were assessed via two-sample Wilcoxon tests with Benjamini-Hochberg multiple testing correction. Pairwise comparisons were considered significant at adj. p < 0.05. All statistical tests were performed in R.
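The fold-enrichment metric and the Benjamini-Hochberg adjustment used above can be sketched in Python. This sketch assumes a per-base coverage array for one contig, interprets “100bp flanking DNA” as 100bp on each side, and centers the 10kb background window on the feature midpoint; these are assumptions, and the Kruskal-Wallis and Wilcoxon tests themselves are not re-implemented. The BH function follows the step-up procedure used by R's p.adjust(method = "BH"); the function names are hypothetical.

```python
def fold_enrichment(coverage, start, end, flank=100, half_window=5000):
    """Mean depth over [start - flank, end + flank) divided by mean depth
    over a +/- half_window window centred on the feature midpoint."""
    n = len(coverage)
    mid = (start + end) // 2
    f_lo, f_hi = max(0, start - flank), min(n, end + flank)
    w_lo, w_hi = max(0, mid - half_window), min(n, mid + half_window)
    local = sum(coverage[f_lo:f_hi]) / (f_hi - f_lo)
    background = sum(coverage[w_lo:w_hi]) / (w_hi - w_lo)
    return local / background if background > 0 else float("nan")


def benjamini_hochberg(pvalues):
    """BH step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [1.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    depth = [1.0] * 10000
    for pos in range(4900, 5130):  # toy enrichment over a 30bp TIR + flanks
        depth[pos] = 10.0
    print(round(fold_enrichment(depth, 5000, 5030), 2))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Using a wide local background rather than a genome-wide mean makes the ratio robust to regional differences in accessibility and mappability around each TIR or TRE.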


REFERENCES

Alföldi J, Di Palma F, Grabherr M, Williams C, Kong L, Mauceli E, Russell P, Lowe CB, Glor RE, Jaffe JD, et al. 2011. The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature 477: 587–591.

Aravind L. 2000. The BED finger, a novel DNA-binding domain in chromatin-boundary-element-binding proteins and transposases. Trends in Biochemical Sciences 25: 421–423.

Aravind L, Anantharaman V, Balaji S, Babu MM, Iyer LM. 2005. The many faces of the helix-turn-helix domain: Transcription regulation and beyond. FEMS Microbiology Reviews 29: 231–262.

Babu MM, Iyer LM, Balaji S, Aravind L. 2006. The natural history of the WRKY-GCM1 zinc fingers and the relationship between transcription factors and transposons. Nucleic Acids Res 34: 6505–6520.

Bae S, Park J, Kim J-S. 2014. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30: 1473–1475.

Bao W, Kapitonov VV, Jurka J. 2010. Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mobile DNA 1: 3.

Bao W, Kojima KK, Kohany O. 2015. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6: 11.

Bouchard M, Schleiffer A, Eisenhaber F, Busslinger M. 2001. Pax Genes: Evolution and Function. John Wiley & Sons, Ltd, Chichester, UK.

Breitling R, Gerber J-K. 2000. Origin of the paired domain. Development Genes and Evolution 210: 644–650.

Bruno M, Mahgoub M, Macfarlan TS. 2019. The Arms Race Between KRAB–Zinc Finger Proteins and Endogenous Retroelements and Its Impact on Mammals. Annu Rev Genet 53: annurev-genet-112618-043717.

Castoe TA, de Koning APJ, Hall KT, Card DC, Schield DR, Fujita MK, Ruggiero RP, Degner JF, Daza JM, Gu W, et al. 2013. The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci USA 110: 20645.

Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34: i884–i890.

Conway JR, Lex A, Gehlenborg N. 2017. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33: 2938–2940.

Cordaux R, Udit S, Batzer MA, Feschotte C. 2006. Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proceedings of the National Academy of Sciences of the United States of America 103: 8101–8106.

Craig, Chandler, Gellert, Lambowitz, Rice, Sandmeyer, eds. 2015. Mobile DNA III. American Society for Microbiology, Washington, DC.

Deng C, Cheng CHC, Ye H, He X, Chen L. 2010. Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc Natl Acad Sci USA 107: 21593.

Derrien T, Estellé J, Marco Sola S, Knowles DG, Raineri E, Guigó R, Ribeca P. 2012. Fast Computation and Applications of Genome Mappability. PLOS ONE 7: e30377.

Edelstein LC, Collins T. 2005. The SCAN domain family of zinc finger transcription factors. Gene 359: 1–17.

Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat Rev Genet 9: 397–405.

Feschotte C, Pritham E. 2007. DNA Transposons and the Evolution of Eukaryotic Genomes. Annu Rev Genet 41: 331–368.

Friedman JR, Fredericks WJ, Jensen DE, Speicher DW, Huang XP, Neilson EG, Rauscher FJ. 1996. KAP-1, a novel corepressor for the highly conserved KRAB repression domain. Genes Dev 10: 2067–2078.

Geer LY, Domrachev M, Lipman DJ, Bryant SH. 2002. CDART: Protein Homology by Domain Architecture. Genome Res 12: 1619–1623.

Gomes NMV, Ryder OA, Houck ML, Charter SJ, Walker W, Forsyth NR, Austad SN, Venditti C, Pagel M, Shay JW, et al. 2011. Comparative biology of mammalian telomeres: hypotheses on ancestral states and the roles of telomeres in longevity determination. Aging Cell 10: 761–768.

Gray LT, Fong KK, Pavelitz T, Weiner AM. 2012. Tethering of the Conserved piggyBac Transposase Fusion Protein CSB-PGBD3 to Chromosomal AP-1 Proteins Regulates Expression of Nearby Genes in Humans. PLOS Genetics 8: e1002972.

Gu B, Swigut T, Spencley A, Bauer MR, Chung M, Meyer T, Wysocka J. 2018. Transcription-coupled changes in nuclear mobility of mammalian cis-regulatory elements. Science 359: 1050–1055.

Haeussler M, Schönig K, Eckert H, Eschstruth A, Mianné J, Renaud J-B, Schneider-Maunoury S, Shkumatava A, Teboul L, Kent J, et al. 2016. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17: 148.

Hayward A, Ghazal A, Andersson G, Andersson L, Jern P. 2013. ZBED evolution: repeated utilization of DNA transposons as regulators of diverse host functions. PLOS ONE 8: e59940.

Helleboid P-Y, Heusel M, Duc J, Piot C, Thorball CW, Coluccio A, Pontis J, Imbeault M, Turelli P, Aebersold R, et al. 2019. The interactome of KRAB zinc finger proteins reveals the evolutionary history of their functional diversification. EMBO J 38: e101220.

Herz H-M, Garruss A, Shilatifard A. 2013. SET for life: biochemical activities and biological functions of SET domain-containing proteins. Trends in Biochemical Sciences 38: 621–639.

Imbeault M, Helleboid P-Y, Trono D. 2017. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543: 550–554.

Innan H. 2009. Population genetic models of duplicated genes. Genetica 137: 19.

Kapitonov V, Jurka J. 2010. Kolobok, a novel superfamily of eukaryotic DNA transposons. Repbase Rep 7: 111–122.

Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. 2010. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26: 2204–2207.

Kumar S, Stecher G, Suleski M, Hedges SB. 2017. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Molecular Biology and Evolution 34: 1812–1819.

Kwak H, Fuda NJ, Core LJ, Lis JT. 2013. Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing. Science 339: 950.

Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Love M, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550.

Lynch VJ. 2011. Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evolutionary Biology 11: 1.

Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, et al. 2019. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47: W636–W641.

Mahat DB, Kwak H, Booth GT, Jonkers IH, Danko CG, Patel RK, Waters CT, Munson K, Core LJ, Lis JT. 2016. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nature Protocols 11: 1455.

Malnic B, Godfrey PA, Buck LB. 2004. The human olfactory receptor gene family. Proc Natl Acad Sci USA 101: 2584.

Margolin JF, Friedman JR, Meyer WK, Vissing H, Thiesen HJ, Rauscher FJ. 1994. Krüppel-associated boxes are potent transcriptional repression domains. Proceedings of the National Academy of Sciences of the United States of America 91: 4509–4513.

Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17: 10–12.

Mitros T, Lyons JB, Session AM, Jenkins J, Shu S, Kwon T, Lane M, Ng C, Grammer TC, Khokha MK, et al. 2019. A chromosome-scale genome assembly and dense genetic map for Xenopus tropicalis. Developmental Biology 452: 8–20.

Murphy KE, Shylo NA, Alexander KA, Churchill AJ, Copperman C, García-García MJ. 2016. The Transcriptional Repressive Activity of KRAB Zinc Finger Proteins Does Not Correlate with Their Ability to Recruit TRIM28. PLOS ONE 11: e0163555.

Ohno S. 1970. Evolution by Gene Duplication. Springer-Verlag, Berlin, Heidelberg.

O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, et al. 2016. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44: D733–D745.

Oliveros JC, Franch M, Tabas-Madrid D, San-León D, Montoliu L, Cubas P, Pazos F. 2016. Breaking-Cas-interactive design of guide RNAs for CRISPR-Cas experiments for ENSEMBL genomes. Nucleic Acids Res 44: W267–W271.

Pietrokovski S, Henikoff S. 1997. A helix-turn-helix DNA-binding motif predicted for transposases of DNA transposons. Molec Gen Genet 254: 689–695.

Pritham EJ, Feschotte C. 2007. Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus. Proc Natl Acad Sci USA 104: 1895.

Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842.

Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. 2014. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42: W187–W191.

Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. 2013. Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8: 2281–2308.

Ray DA, Feschotte C, Pagan HJT, Smith JD, Pritham EJ, Arensburger P, Atkinson PW, Craig NL. 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res 18: 717–728.

Richardson JM, Colloms SD, Finnegan DJ, Walkinshaw MD. 2009. Molecular Architecture of the Mos1 Paired-End Complex: The Structural Basis of DNA Transposition in a Eukaryote. Cell 138: 1096–1108.

Roussigne M, Kossida S, Lavigne A-C, Clouaire T, Ecochard V, Glories A, Amalric F, Girard J-P. 2003. The THAP domain: a novel protein motif with similarity to the DNA-binding domain of P element transposase. Trends in Biochemical Sciences 28: 66–69.

Ruddle FH, Bartels JL, Bentley KL, Kappen C, Murtha MT, Pendleton JW. 1994. Evolution of Hox Genes. Annu Rev Genet 28: 423–442.

Skene PJ, Henikoff S. 2017. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6: e21856.

Smith T, Heger A, Sudbery I. 2017. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 27: 491–499.

Sotero-Caio CG, Platt RN II, Suh A, Ray DA. 2017. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biology and Evolution 9: 161–177.

Tellier M, Chalmers R. 2018. Human SETMAR is a DNA sequence-specific histone-methylase with a broad effect on the transcriptome. Nucleic Acids Res 38: 4207.

Tie CH, Fernandes L, Conde L, Robbez Masson L, Sumner RP, Peacock T, Rodriguez Plata MT, Mickute G, Gifford R, Towers GJ, et al. 2018. KAP1 regulates endogenous retroviruses in adult human cells and contributes to innate immune control. EMBO Rep e45000.

Van Oss SB, Carvunis A-R. 2019. De novo gene birth. PLOS Genetics 15: e1008160.

Wang Z, Chu T, Choate LA, Danko CG. 2019. Identification of regulatory elements from nascent transcription using dREG. Genome Res 29: 293–303.

Witzgall R, O'Leary E, Leaf A, Onaldi D, Bonventre JV. 1994. The Krüppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proceedings of the National Academy of Sciences of the United States of America 91: 4514–4518.

Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Molecular Biology and Evolution 24: 1586–1591.

Yuan Y-W, Wessler SR. 2011. The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies. Proc Natl Acad Sci USA 108: 7884–7889.

Zhang L, Dawson A, Finnegan DJ. 2001. DNA-binding activity and subunit interaction of the mariner transposase. Nucleic Acids Res 29: 3566–3575.

CHAPTER 3

DISCUSSION AND FUTURE DIRECTIONS

3.1 DISCUSSION

The fitness of transposons and their hosts is intimately intertwined. The capacity for transposons to impair host fitness spurs both transposons and their hosts to employ mechanisms to mitigate transposon activity (Cosby et al. 2019).

We proposed that this interaction facilitates cooperation between transposons and their hosts, but that this phase of host-transposon interaction is transient and is resolved in one of three ways: 1) the TE stops cooperating, leading to reactivation and, if it becomes too active, possible loss of the family from the population (arms race); 2) the TE fades into obscurity due to relaxed selection pressure on its sequence; or 3) TE features are maintained for cellular function rather than for the benefit of the TE family as a whole, leading to eventual loss of the TE family (cooption) (Fig. 1-5). Although the primary focus of this dissertation has been on the third type of resolution (cooption), the results presented here also have implications for arms races and for self-mitigation strategies used by transposons.

Transposase capture is clearly a powerful mechanism to generate novel genes in tetrapods, and it appears biased toward generating transcriptional regulators. This has implications not only for the evolution of cellular regulatory networks but also for genome defense. The vast majority of these transcriptional regulators contain the KRAB domain, which is predicted to be largely repressive based both on data reported here and on previous work (reviewed in Bruno et al. 2019). In contrast, there are far fewer fusions with transcriptional activation domains (Fig. 2-4 and Fig. 2-5). Because these fusion proteins are predicted to bind to their cognate transposable elements via their recycled transposase DNA-binding domains, it is perhaps not surprising that targeting transposons for repression, rather than activation, would be more evolutionarily favorable.

KRAB-transposase fusions could function to protect the genome against transposon activity, similar to their KRAB-ZFP brethren. One possible advantage KRAB-transposase fusions might have over KRAB-ZFPs, however, is their ability to act rapidly to silence transposons upon cooption. While the diverse repertoire of KRAB-ZFPs (Imbeault et al. 2017) facilitates adaptation to new TE invasions, there is usually a period of time before a KRAB-ZFP can evolve specificity for the invading transposon (Yang et al. 2017). In contrast, fusion of a transposase DNA-binding domain to a KRAB domain could be immediately beneficial to the host, as the DNA-binding domain is already adapted to bind the transposon. Moreover, this could potentially lead to cross-repression of several DNA transposon families, as transposases have been shown to bind to and occasionally cross-mobilize closely related transposons (Atkinson et al. 1993; Feschotte et al. 2005; Yang et al. 2009). This potential role in genome defense provides a plausible explanation for the abundance of KRAB-transposase fusions, and it supports our assertion that the true extent of HTF in the tetrapod lineage remains unknown, as KRAB-transposase fusions could exhibit turnover similar to KRAB-ZFPs once the transposon promoting their maintenance loses its ability to transpose. In some cases, a role in genome defense could also facilitate their maintenance long enough for other functions, such as transcription factor activity, to evolve in the “stepping stone” fashion that has been proposed for cooption of other transposons (Frank and Feschotte 2017).

We investigated the possibility that KRAB-transposase fusions act as genome defense mechanisms by studying KRABINER ex vivo. Although we demonstrated that KRABINER binds the TIR sequences of its cognate mariner transposon, we were unable to find convincing evidence that it directly silences mariner elements in modern bats. Most notably, our epigenetic data, including CUT&RUN and ChIP-seq data for several histone marks, both activating (H3K27ac, H3K4me3) and repressive (H3K9me3, H3K27me3), suggest that mariner elements are largely inert regardless of KRABINER’s presence or absence (data not shown). One possible reason KRABINER loss might not result in mariner reactivation is that the mariner family from which KRABINER is derived, Mlmar1, has been extinct in bats for an estimated 10 million years (Ray et al. 2008), with lineage-specific insertions likely due to the high degree of incomplete lineage sorting in this clade (Platt et al. 2017). This is supported by preliminary data generated by our collaborators suggesting that, at least in Myotis lucifugus embryonic fibroblasts, Mlmar1 elements are heavily methylated (T. Wang).

Thus, it seems likely that Mlmar1 elements are stably suppressed, such that even removal of their putative repressor, KRABINER, is not sufficient to relieve silencing. Future experiments that transiently inhibit DNA methyltransferases in the presence or absence of KRABINER may reveal a different story.

The mechanism of transposase capture, alternative splicing to preexisting cellular genes, also has implications for host-transposon interactions. In the case of KRABINER, the splice acceptor site required for fusion was present in the consensus sequence of the mariner transposon, which led us to ask whether this was true of other fusions as well. To test this, a talented student working with me (R. Zhang) reconstructed the ancestral sequences of seven additional transposons responsible for HTFs, and in 6/7 cases the splice site used in the fusion was present in the consensus sequence (data not shown). Additionally, studies of two recently evolved HTFs, SETMAR (Cordaux et al. 2006) and PGBD3-CSB (Gray et al. 2012), documented a similar pattern, suggesting that many DNA transposons carry splice acceptor sites and, in some cases, such as the THAP fusions, splice donor sites.
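The consensus check described above amounts to asking whether a canonical splice-site dinucleotide sits at the fusion junction within the reconstructed transposon sequence. A minimal Python sketch follows, assuming only canonical GT-AG introns, 0-based coordinates, and that the junction position in the consensus has already been located by aligning the fusion transcript (not shown). The function names and toy sequences are hypothetical and are not the pipeline used for the analysis above.

```python
"""Toy check for canonical splice sites at fusion junctions in a transposon consensus.

Hypothetical illustration only:
- introns are assumed to follow the canonical GT...AG rule (others are ignored);
- positions are 0-based indices into the consensus sequence;
- junction positions are assumed to come from a prior transcript alignment.
"""


def has_acceptor(consensus: str, exon_start: int) -> bool:
    """True if the two bases just upstream of exon_start are 'AG' (intron 3' end)."""
    if exon_start < 2 or exon_start > len(consensus):
        return False
    return consensus[exon_start - 2:exon_start].upper() == "AG"


def has_donor(consensus: str, exon_end: int) -> bool:
    """True if the two bases starting at exon_end (exclusive exon end) are 'GT' (intron 5' end)."""
    if exon_end < 0 or exon_end + 2 > len(consensus):
        return False
    return consensus[exon_end:exon_end + 2].upper() == "GT"


def acceptor_support(cases):
    """Return (number with an acceptor, total) for (consensus, exon_start) pairs."""
    hits = sum(has_acceptor(seq, pos) for seq, pos in cases)
    return hits, len(cases)


if __name__ == "__main__":
    toy = "TTTTTCAGATGGCC"  # hypothetical consensus; 'AG' occupies indices 6-7
    print(acceptor_support([(toy, 8), (toy, 9)]))
```

Summing booleans over all cases gives the same kind of "k of n" tally reported above, but real analyses would also need to consider non-canonical (for example GC-AG) introns and alignment uncertainty at the junction.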

Although the reason why some DNA transposons carry splice sites remains unknown, one possibility is that it facilitates their expression. DNA transposons, unlike retrotransposons, frequently lack promoters or possess only minimal ones (O'Kane and Gehring 1987; Palazzo et al. 2017, 2019), and this has been proposed to be a potentially advantageous strategy to self-mitigate their negative impacts on the host and to facilitate expression in a wide variety of organisms, such as after horizontal transfer to another species (Palazzo et al. 2017). Additionally, the lack of promoter elements may allow DNA transposons to go undetected by host defense machinery, as is the case for the Ping/mPing elements in rice (Chen et al. 2019).

While the lack of a promoter might be beneficial in certain situations, it raises the question: if the transposase cannot be expressed, how can the transposon family be mobile? We propose that some DNA transposons carry splice sites in their consensus sequences to facilitate the use of adjacent or cryptic host promoters to express their transposase mRNAs. This process would not always result in expression of a functional transposase, due to reading-frame incompatibilities or the generation of nonfunctional chimeric transcripts. When it does, however, it may enable the transposon to evade silencing, because the host machinery would likely target promoters upstream of the actual transposon. Our lab has some preliminary data suggesting that mariner elements, at least in bats, produce many chimeric transcripts (R. Zhang), but more work is needed to determine how widespread this is and what biological significance it has.
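The reading-frame constraint mentioned above can be made concrete. If a host exon contributes L coding nucleotides (counted from its start codon to the splice junction) and the transposon acceptor lands p nucleotides downstream of the transposase start codon, the chimera is translated in the transposase frame only when L and p share a codon phase, that is, when (L - p) mod 3 = 0. The Python sketch below encodes that rule plus a naive scan for premature stop codons. It assumes the standard genetic code and ignores UTRs; all names and toy sequences are hypothetical.

```python
"""Toy check of whether a host-exon/transposase chimera stays in the transposase frame.

Hypothetical illustration: standard genetic code, the host coding sequence starts at
its own ATG, and the transposase ORF coordinates are known from the consensus.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_in_frame(host_cds_len: int, acceptor_offset: int) -> bool:
    """In frame when the junction falls at the same codon phase in both sequences.

    host_cds_len: coding nt contributed by the host exon(s), ATG to junction.
    acceptor_offset: nt from the transposase ATG to the acceptor junction.
    """
    return (host_cds_len - acceptor_offset) % 3 == 0


def build_chimera(host_cds: str, transposase_orf: str, acceptor_offset: int) -> str:
    """Join the host coding sequence to the transposase ORF downstream of the acceptor."""
    return host_cds + transposase_orf[acceptor_offset:]


def has_premature_stop(cds: str) -> bool:
    """Read codons from position 0; report any stop codon before the final codon."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    orf = "ATGAAACCCTAA"  # hypothetical 4-codon transposase ORF
    for host in ("ATGGCC", "ATGGCCT"):  # 6 nt (in frame) vs 7 nt (frameshifted)
        chimera = build_chimera(host, orf, 3)
        print(host, junction_in_frame(len(host), 3), has_premature_stop(chimera))
```

With the 7-nt host exon the junction shifts the frame and a TAA appears in the third codon, illustrating how a chimeric transcript can fail to encode a functional transposase.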

3.2 FUTURE DIRECTIONS

3.2.1 How common is transposase capture in other lineages?

Although we focused on transposase capture in tetrapods, it seems probable that transposase capture is a more general mechanism for TF birth. This mechanism requires 1) the presence of transcriptional effector domains, 2) transposases, and 3) splicing. All three of these features are common in eukaryotes, suggesting that transposase capture could in principle contribute to the birth of TFs across Eukarya. This prediction is consistent with preliminary data (not shown) from R. Zhang, who observed numerous host-transposase fusion genes in fish, which have abundant DNA transposon activity (Howe et al. 2013; Lien et al.). The domains incorporated in other lineages are likely to vary, especially considering that KRAB domains are not expanded outside of tetrapods (Imbeault et al. 2017), but the general principles are likely to be conserved. It will be especially interesting to investigate these possibilities in well-characterized model organisms, such as Drosophila and C. elegans, where the functional hypotheses proposed herein can be tested in vivo.

3.2.2 What other potential functions do host-transposase fusion genes have?

The preponderance of transposase DNA-binding domains and host transcriptional effector domains in our identified HTFs strongly suggested a role in transcriptional regulation, which we confirmed. However, this is unlikely to be the only function of HTF genes. To mediate transposition, transposase proteins must carry out a variety of biochemical functions in addition to sequence-specific DNA binding. Specifically, they must also possess a catalytic core to mediate excision and subsequent reintegration into DNA, and they frequently contain protein interaction domains that mediate interactions with other transposases and with various host systems, such as the DNA replication machinery.

Any of these features could potentially be recycled, and, indeed, many of our HTF genes contain domains in addition to or in lieu of DNA-binding domains, including catalytic cores and dimerization domains (Fig. 2-5). These domains could enable HTFs to act similarly to other coopted transposase proteins, including those that mediate programmed DNA rearrangement and recombination (Henssen et al. 2017; Huang et al.; Nowacki et al. 2009), or as insulator proteins, because transposition by many DNA transposons requires DNA looping to form the paired-end complex (Craig et al. 2015).

Moreover, these diverse transposase domains, when combined with myriad host domains, likely acquire new or combinatorial functions that remain unknown, representing a fascinating example of the evolution of new genes via exon shuffling (Gilbert 1978).

3.2.3 What is the biological function of KRABINER in bats?

Our data strongly support KRABINER as a transcriptional regulator in bats, but the biological consequence of this regulation remains unknown. Over the course of generating the KRABINER KO cell lines, I made several observations that may provide clues to KRABINER’s biological function. First, following clonal expansion to generate the KO cell line, I noticed that the cells exhibited a phenotype consistent with aging and senescence, including a slow growth rate and an enlarged, flattened morphology (Hernandez-Segura et al. 2018). RNA-seq and PRO-seq analysis of these cells also identified activation of a transcriptional network characteristic of senescent cells (Lackner et al. 2014). This included what appeared to be aberrant activation or increased accessibility of DNA on contigs homologous to the X arm of human chromosome 12. Initially I believed this to be due to clonal expansion of a non-immortalized cell line, and I excluded the aberrantly regulated contigs from all downstream analyses, but subsequent analysis of the KO and rescue lines suggested alternative hypotheses.

Despite the aging phenotype I observed in the KO cells, I was able to repeat the clonal expansion to generate transgenic rescue lines derived from these cells. Because this would likely have been impossible if the progenitor KO cells were truly senescent, it suggested to me that something else might explain the aging phenotype. Additionally, during this process, the co-cultured progenitor KO line began growing at an accelerated rate, suggesting it had become immortalized. Consistent with this, PRO-seq data from these cells indicated that they no longer exhibited the aging-associated transcriptional changes or the aberrant activation of the region homologous to human chromosome 12. A similar phenomenon was observed for the KRABINER DNA-binding mutant rescue (mutDBD) but not for the wild-type (WT) or KRAB-mutant (mutKRAB) rescues. Whether this is because a biological defect is rescued by the WT and mutKRAB variants but not by mutDBD, or is a consequence of the mutDBD lines having been derived after the progenitor KO line was transformed, remains to be investigated.

Additionally, analysis of the genes differentially regulated by KRABINER (Table A.1), either in the rescue or overexpression conditions, suggests that many play a role in cellular proliferation, cell-cycle checkpoints, DNA damage, and cell migration. Taken together with the changes in cell state and morphology described above, these data suggest that KRABINER may function similarly to a tumor suppressor in bats. If so, KRABINER may partially explain the exceptional longevity of this clade (Foley et al. 2018; Huang et al. 2019; Podlutsky et al. 2005).
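Table A.1 reports, for each gene, log2 fold changes and adjusted p-values (padj) for the rescue (KWTvKO) and overexpression (KWT OEvR) comparisons, alongside a regulation call and an evidence label. The exact criteria behind those labels are not spelled out here. The Python sketch below shows one plausible rule, assuming a padj cutoff of 0.05 for the rescue call and requiring the overexpression change to be significant and in the same direction before "OEvR" is added as evidence. The cutoff, the sign-agreement requirement, and all names are assumptions for illustration, and the numbers in the tests are toy values, not entries from the table.

```python
"""Hypothetical labelling rule for rescue / overexpression evidence.

Assumptions (not the documented criteria): padj < 0.05 defines significance, and an
overexpression (OEvR) change counts as supporting evidence only when it is
significant and has the same sign as the rescue (KWT vs KO) change.
"""
from dataclasses import dataclass
from typing import List, Optional, Tuple

PADJ_CUTOFF = 0.05  # assumed significance threshold


@dataclass
class GeneResult:
    gene: str
    rescue_log2fc: float  # KWT vs KO
    rescue_padj: float
    oe_log2fc: float      # KWT overexpression vs rescue
    oe_padj: float


def classify(result: GeneResult) -> Tuple[Optional[str], List[str]]:
    """Return ('up' | 'down' | None, evidence labels) for one gene."""
    if result.rescue_padj >= PADJ_CUTOFF or result.rescue_log2fc == 0:
        return None, []
    direction = "up" if result.rescue_log2fc > 0 else "down"
    evidence = ["rescue"]
    # A positive product means both changes point in the same (nonzero) direction.
    same_sign = result.oe_log2fc * result.rescue_log2fc > 0
    if result.oe_padj < PADJ_CUTOFF and same_sign:
        evidence.append("OEvR")
    return direction, evidence


if __name__ == "__main__":
    print(classify(GeneResult("toy_gene", 1.2, 0.001, 0.5, 0.01)))
```

Under this assumed rule, a gene whose overexpression change is significant but opposite in sign keeps only the "rescue" label.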

3.3 CONCLUSION

In conclusion, transposase capture is a prominent mechanism to generate novel transcription factors in tetrapods. However, this process likely produces genes with myriad functions, some of which could include genome defense, recombination, or insulator functions. Transposase capture is also a potent source of lineage-specific gene variation that may help explain differences in species-specific traits. Transposase capture is thus a major evolutionary force that generates cellular genes in tetrapods, and it may prove to be a foundational mechanism for gene birth in Eukarya.

REFERENCES

Atkinson PW, Warren WD, O'Brochta DA. 1993. The hobo transposable element of Drosophila can be cross-mobilized in houseflies and excises like the Ac element of maize. Proceedings of the National Academy of Sciences of the United States of America 90 : 9693–9697.

Bruno M, Mahgoub M, Macfarlan TS. 2019. The Arms Race Between KRAB–Zinc Finger Proteins and Endogenous Retroelements and Its Impact on Mammals. Annu Rev Genet 53 : annurev–genet–112618–043717.

Chen J, Lu L, Benjamin J, Diaz S, Hancock CN, Stajich JE, Wessler SR. 2019. Tracking the origin of two genetic components associated with transposable element bursts in domesticated rice. Nature Communications 10 : 641.

Cordaux R, Udit S, Batzer MA, Feschotte C. 2006. Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proceedings of the National Academy of Sciences of the United States of America 103 : 8101–8106.

Cosby RL, Chang N-C, Feschotte C. 2019. Host-transposon interactions: conflict, cooperation, and cooption. Genes Dev 33 : 1098–1116.

Craig NL, Chandler M, Gellert M, Lambowitz AM. 2015. Mobile DNA III. ASM Press, Washington, DC.

Feschotte C, Osterlund MT, Peeler R, Wessler SR. 2005. DNA-binding specificity of rice mariner-like transposases and interactions with Stowaway MITEs. Nucleic Acids Res 33 : 2153–2165.

Foley NM, Hughes GM, Huang Z, Clarke M, Jebb D, Whelan CV, Petit EJ, Touzalin F, Farcy O, Jones G, et al. 2018. Growing old, yet staying young: The role of telomeres in bats’ exceptional longevity. Sci Adv 4: eaao0926.

Frank JA, Feschotte C. 2017. Co-option of endogenous viral sequences for host cell function. Current Opinion in Virology 25 : 81–89.

Gilbert W. 1978. Why genes in pieces? Nature 271 : 501.

Gray LT, Fong KK, Pavelitz T, Weiner AM. 2012. Tethering of the Conserved piggyBac Transposase Fusion Protein CSB-PGBD3 to Chromosomal AP-1 Proteins Regulates Expression of Nearby Genes in Humans. PLOS Genetics 8: e1002972.


Henssen AG, Koche R, Zhuang J, Jiang E, Reed C, Eisenberg A, Still E, MacArthur IC, Rodríguez-Fos E, Gonzalez S, et al. 2017. PGBD5 promotes site-specific oncogenic mutations in human tumors. Nature Genetics 49 : 1005–1014.

Hernandez-Segura A, Nehme J, Demaria M. 2018. Hallmarks of Cellular Senescence. Trends in Cell Biology 28 : 436–453.

Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. 2013. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496 : 498–503.

Huang S, Tao X, Yuan S, Zhang Y, Li P, Beilinson HA, Zhang Y, Yu W, Pontarotti P, Escriva H, et al. Discovery of an Active RAG Transposon Illuminates the Origins of V(D)J Recombination. Cell.

Huang Z, Whelan CV, Foley NM, Jebb D, Touzalin F, Petit EJ, Puechmaille SJ, Teeling EC. 2019. Longitudinal comparative transcriptomics reveals unique mechanisms underlying extended healthspan in bats. Nat Ecol Evol 3: 1110–1120.

Imbeault M, Helleboid P-Y, Trono D. 2017. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543 : 550–554.

Lackner DH, Hayashi MT, Cesare AJ, Karlseder J. 2014. A genomics approach identifies senescence-specific gene expression regulation. Aging Cell 13 : 946–950.

Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, Hvidsten TR, Leong JS, Minkley DR, Zimin A, et al. The Atlantic salmon genome provides insights into rediploidization. Nature 533 : 200.

Nowacki M, Higgins BP, Maquilan GM, Swart EC, Doak TG, Landweber LF. 2009. A Functional Role for Transposases in a Large Eukaryotic Genome. Science 324 : 935–938.

O'Kane CJ, Gehring WJ. 1987. Detection in situ of genomic regulatory elements in Drosophila. Proceedings of the National Academy of Sciences of the United States of America 84 : 9123–9127.

Palazzo A, Caizzi R, Viggiano L, Marsano RM. 2017. Does the Promoter Constitute a Barrier in the Horizontal Transposon Transfer Process? Insight from Bari Transposons. Genome Biology and Evolution 9: 1637–1645.

Palazzo A, Lorusso P, Miskey C, Walisko O, Gerbino A, Marobbio CMT, Ivics Z, Marsano RM. 2019. Transcriptionally promiscuous “blurry” promoters in Tc1/mariner transposons allow transcription in distantly related genomes. Mobile DNA 10 : 13.

Platt RN II, Faircloth BC, Sullivan KAM, Kieran TJ, Glenn TC, Vandewege MW, Lee TE Jr., Baker RJ, Stevens RD, Ray DA. 2017. Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats. Systematic Biology 67 : 236–249.

Podlutsky AJ, Khritankov AM, Ovodov ND, Austad SN. 2005. A New Field Record for Bat Longevity. J Gerontol A Biol Sci Med Sci 60 : 1366–1368.

Ray DA, Feschotte C, Pagan HJT, Smith JD, Pritham EJ, Arensburger P, Atkinson PW, Craig NL. 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res 18 : 717–728.

Yang G, Nagel DH, Feschotte C, Hancock CN, Wessler SR. 2009. Tuned for Transposition: Molecular Determinants Underlying the Hyperactivity of a Stowaway MITE. Science 325 : 1391.

Yang P, Wang Y, Macfarlan TS. 2017. The Role of KRAB-ZFPs in Transposable Element Repression and Mammalian Evolution. Trends in Genetics 33: 871–881.


APPENDIX

KRABINER REGULATED GENES

Table A.1: KRABINER regulated genes

Gene ID | Gene name | KRABINER regulation status | Specificity | Evidence | KWTvKO (log2FC) | KWTvKO (padj) | KWT OEvR (log2FC) | KWT OEvR (padj) | KOvWT (log2FC) | KOvWT (padj)

ENSMLUG00000011763 LOC10243270 up KWT- rescue, 7.4019357 3.05E-67 0.71781283 9.1297E- - 9.49E- 6 KMK OEvR 4 23 3.2352514 12 ENSMLUG00000024494 PLCB1 up KWT- rescue, 7.3775075 2.65E-34 0.649346 1.2102E- - 6.53E- KMK OEvR 2 08 3.6623075 06 ENSMLUG00000007784 MTMR7 up KWT rescue, 7.2432077 3.86E-28 0.68920358 1.1666E- - 0.00774 OEvR 4 07 2.8775881 288 ENSMLUG00000010564 MYZAP up KWT rescue, 6.5817162 2.48E-60 1.09991453 2.2042E- - 0.00357 OEvR 8 37 2.1531285 105 ENSMLUG00000011072 ENTPD1 up KWT rescue, 6.1345185 5.10E- 0.49735444 1.2568E- - 8.23E- OEvR 3 110 10 1.8408459 06 ENSMLUG00000015296 ENSMLUG000 up all rescue, 5.2752276 1.61E-15 0.65139232 0.0003328 - 1.61E- 00015296 OEvR 3 4.2493702 19 ENSMLUG00000022218 MEOX2 up KWT rescue, 4.5919786 6.11E- 0.55451339 9.4863E- - 0.00287 OEvR 7 103 08 1.1622898 029 ENSMLUG00000008652 IFI44L up KWT rescue, 3.5983138 8.46E-19 0.90353472 5.9709E- - 0.02743 OEvR 2 05 1.4109496 725 ENSMLUG00000014981 PDGFB up KWT- rescue, 3.4414232 5.77E-05 0.491344 0.0060155 - 8.53E- KMK OEvR 1 2 3.7761214 60 ENSMLUG00000009083 IL33 up KWT rescue, 3.0221864 1.50E-10 0.53230499 0.0002423 - 9.03E- OEvR 7 9 1.6493997 08 ENSMLUG00000002554,ENSMLUG00000017874 PANK1 up KWT- rescue, 2.8850553 1.75E-58 0.50312505 6.6356E- - 0.02419 KMK OEvR 09 0.4754331 724 ENSMLUG00000013841 PARD3B up KWT- rescue, 2.7785496 1.13E-56 0.44211907 8.8453E- -0.697509 7.99E- KMD OEvR 7 13 13 ENSMLUG00000014661 GRIK5 up KWT- rescue, 2.7701342 4.11E-37 0.81304562 4.3995E- - 4.64E- KMD OEvR 3 14 1.6963489 21 ENSMLUG00000002136 AHI1 up KWT rescue, 2.0790931 7.20E-38 0.37237229 4.8252E- - 0.04576 OEvR 2 07 0.3066728 176


ENSMLUG00000005127 UST up KWT rescue, 1.7130333 2.32E-18 0.64868012 3.1123E- - 5.99E- OEvR 4 13 0.7673557 11 ENSMLUG00000005323 MAMLD1 up KWT rescue, 1.6095510 4.89E-19 0.39385017 2.0452E- - 0.01523 OEvR 6 05 0.4019105 419 ENSMLUG00000026453 RF00012 up KWT- rescue, 1.4994067 0.000729 1.53639132 3.1883E- - 0.00619 KMK OEvR 1 7 16 0.9210762 73 ENSMLUG00000009004 SYNPO up KWT rescue, 1.2875496 0.001027 0.72894445 1.6794E- -1.782998 3.85E- OEvR 1 99 15 64 ENSMLUG00000004316,ENSMLUG00000019230 PRKAG2 up KWT- rescue, 1.2618969 4.87E-05 0.53123678 1.004E-08 - 1.03E- KMK OEvR 8 0.5376967 07 ENSMLUG00000003652 DNAH5 up KWT rescue, 1.2438477 0.043718 0.37559619 0.0046516 - 3.76E- OEvR 6 28 4 1.7666216 151 ENSMLUG00000015613,ENSMLUG00000030620 P3H2 up KWT- rescue, 1.2077273 0.005656 0.61390244 0.0443005 - 0.00013 KMK OEvR 11 5 1.1745721 944 ENSMLUG00000003138 EDN1 up KWT- rescue, 1.1979378 0.010689 0.75269044 1.4216E- - 4.00E- KMK OEvR 3 53 08 1.1266614 15 ENSMLUG00000005506 GATA2 up all rescue, 0.9651717 0.000333 0.54025881 0.0385869 - 8.88E- OEvR 2 7 6 1.0824316 05 ENSMLUG00000015932 LOC10244025 up KWT rescue, 0.8536898 4.96E-07 0.34700141 0.0349740 - 0.00108 8 OEvR 9 6 0.5295873 144 ENSMLUG00000007264 CUX1 up KWT rescue, 0.8520679 2.91E-09 0.45593126 4.6166E- - 4.82E- OEvR 7 11 0.3191403 06 ENSMLUG00000003608 GRK5 up KWT rescue, 0.8108517 4.55E-05 0.56942038 1.849E-09 - 0.00021 OEvR 1 0.4179951 074 ENSMLUG00000026521 LOC10243271 up KWT rescue, 0.7978342 0.046021 0.27143311 0.0025650 - 0.01182 3 OEvR 2 67 9 0.2729548 831 ENSMLUG00000011364 TMEM184B up KWT rescue, 0.6919828 0.034765 0.34725141 0.0340548 - 0.01766 OEvR 1 67 4 0.3893547 057 ENSMLUG00000009300 LIMCH1 up KWT- rescue, 0.6107875 0.014998 0.67082098 1.3227E- - 0.04091 KMD OEvR 6 01 10 0.3268038 016 ENSMLUG00000001180 SLC38A9 up KWT rescue, 0.5822914 0.021135 0.4700554 0.0021840 - 0.00036 OEvR 9 18 2 0.5637685 285 ENSMLUG00000019756 RF00004 up all rescue 4.2184131 2.11E-33 -0.595697 0.0515532 - 1.44E- 3 1.4479436 08 
ENSMLUG00000013927 VTCN1 up KWT- rescue 3.6944382 5.65E-11 -0.2877797 0.4120949 - 6.57E- KMK 3 7 2.3370967 07 ENSMLUG00000001015 SERPINE1 up KWT- rescue 2.4088107 1.74E-05 -0.0881342 0.8247687 - 2.39E- KMK 9 2.6885498 35 ENSMLUG00000015777 EPHA4 up KWT- rescue 2.3813419 0.000222 -0.4267403 0.3620036 - 0.01218 KMD 1 93 2 0.6760985 628 ENSMLUG00000028044 LOC10241873 up all rescue 2.1154580 7.72E-08 0.08392617 0.8797454 - 0.00065 6 9 7 1.0140993 44


ENSMLUG00000007219 FOXP1 up KWT rescue 2.0317657 0.000176 0.07517193 0.8426118 - 2.21E- 4 2 2.0944175 31 ENSMLUG00000013780 ENSMLUG000 up KWT- rescue 2.0139162 0.048378 0.47615001 0.6122718 -3.173467 1.58E- 00013780 KMK 9 62 3 06 ENSMLUG00000004783 LHX6 up all rescue 2.0082993 7.05E-08 0.16319995 0.7426114 - 1.78E- 2 4 1.8054276 06 ENSMLUG00000026515 RF00004 up all rescue 2.0077312 2.57E-13 -0.6082524 3.045E-06 - 2.08E- 7 0.9460653 11 ENSMLUG00000019586 RF00096 up KWT rescue 1.6326119 4.03E-06 0.39671401 0.3985478 - 2.62E- 6 0.8383929 06 ENSMLUG00000004391 SLC6A8 up KWT- rescue 1.6091979 0.036187 0.05073752 0.9567106 - 0.00502 KMK 1 17 5 1.2512634 42 ENSMLUG00000014038 CPE up KWT- rescue 1.6074930 1.23E-05 -0.06344 0.8269626 - 1.63E- KMK 9 9 2.0048372 56 ENSMLUG00000010651 TYMP up KWT- rescue 1.5096462 0.018827 0.17052647 0.4607676 - 2.58E- KMK 8 7 2 0.6021053 07 ENSMLUG00000015328 PHF7 up all rescue 1.4999922 9.83E-05 -0.2028627 0.5101956 - 2.66E- 4 2 0.9610985 05 ENSMLUG00000007494 THSD7A up KWT rescue 1.4513717 0.012731 0.00814848 0.9848328 - 2.75E- 5 86 4 2.5233699 61 ENSMLUG00000013376 ERG28 up KWT- rescue 1.4202240 1.10E-05 -0.010094 0.9889283 - 0.00333 KMD 7 5 1.0946333 623 ENSMLUG00000013355 SLC7A5 up KWT- rescue 1.4013420 0.002718 0.00900509 0.9827151 - 0.00023 KMK 8 17 4 0.6844614 409 ENSMLUG00000000725 ACAT2 up KWT- rescue 1.3975531 1.46E-09 -0.0083194 0.9806263 - 5.60E- KMK 9 3 1.0198421 19 ENSMLUG00000017345 SPATC1L up KWT- rescue 1.3818810 0.000903 -0.0821604 0.8844149 - 1.44E- KMK 8 04 5 1.2198351 08 ENSMLUG00000021157 RF00009 up KWT rescue 1.3746584 0.005917 0.05271982 0.9370057 - 0.04518 9 64 2 0.6591961 015 ENSMLUG00000003288 HMGCS1 up all rescue 1.3475989 1.01E-12 -0.34397 0.0023279 - 4.31E- 9 4 1.2941338 49 ENSMLUG00000016212 TNIK up all rescue 1.3119587 0.003331 0.03693237 0.9195349 - 0.03848 5 92 6 0.5054265 655 ENSMLUG00000026921 PALM2 up KWT- rescue 1.3100799 0.000555 0.14730803 0.4693878 -1.331563 2.35E- KMK 39 45 ENSMLUG00000002889 MTCL1 
up all rescue 1.3068793 9.09E-15 -0.0931651 0.4140920 - 0.02033 5 0.2286244 723 ENSMLUG00000012048 HSD17B7 up KWT- rescue 1.3009068 2.59E-05 0.18756651 0.5270979 - 5.34E- KMK 5 5 1.0373766 09 ENSMLUG00000013148 KDELR3 up KWT rescue 1.2974497 0.001723 -0.1906323 0.4466831 - 1.28E- 1 82 2 0.6973013 05


ENSMLUG00000008875 MVD up all rescue 1.2801074 6.38E-06 -0.0222781 0.9472599 - 3.43E- 1 1.2350238 23 ENSMLUG00000002549,ENSMLUG00000017898 SREBF2 up KWT- rescue 1.2777466 7.05E-06 0.0574497 0.7910511 - 1.46E- KMK 6 5 0.7020323 18 ENSMLUG00000010452 LSS up KWT- rescue 1.2703896 0.000315 0.12919944 0.8126670 -1.193905 1.42E- KMK 2 72 2 25 ENSMLUG00000013747 TES up KWT rescue 1.2007011 0.018613 -0.3988423 0.0144923 - 0.00117 9 6 2 0.5449769 747 ENSMLUG00000013820 TRIB1 up KWT- rescue 1.1939033 3.95E-06 0.10554717 0.6417017 - 1.82E- KMK 3 5 0.8289438 08 ENSMLUG00000006832 ASNS up all rescue 1.1837057 7.21E-06 -0.3322444 0.3051517 - 0.02306 0.7249263 273 ENSMLUG00000004238 HMGCR up all rescue 1.1835541 2.44E-12 -0.0642116 0.7025760 - 2.17E- 8 6 1.1230725 50 ENSMLUG00000017228 PDE3A up KWT- rescue 1.1746462 0.001340 -0.7595086 0.0319988 - 2.12E- KMD 45 1.3597788 12 ENSMLUG00000011535 ACKR3 up all rescue 1.1702063 0.004439 0.4838975 0.0881194 - 0.04808 9 58 1 0.6941899 018 ENSMLUG00000011035,ENSMLUG00000020449 ENSMLUG000 up all rescue 1.1686048 9.53E-12 -0.0291626 0.8252915 - 4.89E- 00011035 3 1 0.6306884 21 ENSMLUG00000012552 SRGAP1 up KWT rescue 1.1591799 6.06E-06 -0.0195473 0.9285529 - 0.01321 8 9 0.2732046 603 ENSMLUG00000013813 CHN2 up KWT- rescue 1.1456229 0.026057 0.19445634 0.7058473 - 4.86E- KMK 78 2.0892197 20 ENSMLUG00000007217,ENSMLUG00000027275 PARM1 up KWT rescue 1.1446652 0.010984 0.15289603 0.6576602 - 0.00550 5 71 2 0.7201079 934 ENSMLUG00000011405 SH3BP1 up all rescue 1.1272091 0.000953 0.47003336 0.2018242 - 0.02824 51 5 0.9239489 013 ENSMLUG00000004303 ATP8B1 up KWT rescue 1.1095653 0.000272 0.12028703 0.4163998 - 0.00865 3 02 6 0.3372571 725 ENSMLUG00000020357 RF00618 up KWT rescue 1.1043622 0.021113 0.32514396 0.3109652 - 0.00595 7 64 6 0.6078718 755 ENSMLUG00000007329 OGFRL1 up KWT rescue 1.0912042 0.003561 0.17932918 0.6777152 - 0.00117 5 65 9 1.1219839 881 ENSMLUG00000013151 DDX17 up KWT rescue 1.0729713 0.000370 -0.0115732 0.9558706 - 0.00028 7 
49 5 0.3032679 785 ENSMLUG00000016554 SCD up KWT- rescue 1.0592975 0.001723 0.06834137 0.8318571 - 1.41E- KMK 9 82 6 1.5301326 42 ENSMLUG00000013609 FABP3 up all rescue 1.0495528 0.018111 0.06710532 0.8893448 -1.196517 3.55E- 7 29 14 ENSMLUG00000003584 EFNB2 up KWT rescue 1.0118490 0.029037 -0.157399 0.5911919 - 7.87E- 2 46 9 0.8515444 06


ENSMLUG00000003088 TLE1 up KWT- rescue 1.0066407 3.69E-09 -0.112606 0.2730821 -0.193645 0.01904 KMD 2 3 825 ENSMLUG00000000188,ENSMLUG00000019127 RBKS up KWT- rescue 1.0048694 0.000381 -0.4034239 0.1218905 - 0.00775 KMK 3 74 7 0.6466216 107 ENSMLUG00000017714 CYP51A1 up all rescue 0.9938576 3.54E-10 0.01905206 0.9367119 - 1.51E- 4 0.7106107 15 ENSMLUG00000000673 PARD6G up KWT- rescue 0.9826570 0.002809 -0.3648927 0.0003065 - 9.29E- KMK 3 01 7 0.4286804 06 ENSMLUG00000014033 MSMO1 up all rescue 0.9337668 7.05E-08 -0.3249837 0.0108511 - 4.42E- 2 3 0.7861497 18 ENSMLUG00000004525,ENSMLUG00000019952 PDK3 up all rescue 0.9266477 2.93E-05 -0.0886496 0.7389850 -0.627833 3.98E- 1 4 05 ENSMLUG00000026784 LOC10241906 up KWT rescue 0.9142400 0.008173 0.76338731 0.2635412 - 0.00119 1 8 23 7 0.6368528 14 ENSMLUG00000002043 GARS up all rescue 0.9075186 6.24E-06 -0.2179488 0.2026709 - 0.00219 5 8 0.3686416 028 ENSMLUG00000013381 TTLL5 up KWT- rescue 0.8948562 3.99E-09 -0.1801406 0.1352021 - 2.87E- KMD 9 9 0.5793331 12 ENSMLUG00000015053 SNAI1 up KWT rescue 0.8941120 0.011729 -0.2833225 0.3426613 - 0.01427 6 32 2 0.4747999 642 ENSMLUG00000017115 CCND2 up all rescue 0.8885490 0.042966 -0.3952663 0.1807731 - 3.22E- 5 1 3 2.6802722 71 ENSMLUG00000007913 IGFBP4 up KWT- rescue 0.8838566 0.032888 -0.4310637 7.5906E- - 3.23E- KMK 4 02 07 0.5564036 15 ENSMLUG00000007669 RDH11 up all rescue 0.8653567 5.51E-06 0.03784474 0.8856252 - 2.89E- 5 7 0.6741878 09 ENSMLUG00000007220 LOC10241841 up KWT- rescue 0.8619419 0.001299 -0.1146408 0.7452520 - 0.02902 5 KMD 4 2 0.5432808 89 ENSMLUG00000011838 NCAM1 up KWT rescue 0.8597862 0.035911 -0.2915607 0.1065249 - 5.91E- 3 21 6 0.5271101 05 ENSMLUG00000000360 HSP90B1 up all rescue 0.8584957 8.61E-08 -0.2209307 0.2449546 - 0.00083 9 8 0.5143162 85 ENSMLUG00000015779 BIRC2 up all rescue 0.8582162 0.008099 -0.1844783 0.2533460 - 1.20E- 4 55 8 0.4726511 07 ENSMLUG00000009010 ZNF318 up KWT- rescue 0.8401315 0.004918 -0.1076893 0.8007207 -0.659405 0.01456 
KMD 4 8 669 ENSMLUG00000000526 FAM102B up KWT- rescue 0.8374902 0.002430 -0.1150752 0.7063497 - 0.00032 KMK 5 17 8 0.6583774 647 ENSMLUG00000014817 ARHGAP24 up KWT- rescue 0.8349237 0.032757 -0.0901805 0.5819494 - 0.00019 KMD 2 77 3 0.3416785 407 ENSMLUG00000003188 MDGA1 up KWT rescue 0.8333201 0.046060 -0.0458971 0.9485209 - 0.02306 7 1 5 0.7266141 365


ENSMLUG00000000613 SQLE up all rescue 0.8274024 1.60E-05 -0.3064446 0.0343235 - 3.86E- 6 4 0.8887204 23 ENSMLUG00000027669 CTTNBP2 up KWT- rescue 0.8267846 0.024167 -0.0062531 0.9858303 - 1.06E- KMK 7 34 2 0.9774145 11 ENSMLUG00000016823 INSIG1 up KWT- rescue 0.8108587 0.009214 0.23137391 0.5375016 - 6.31E- KMK 6 51 1.2594751 51 ENSMLUG00000014810,ENSMLUG00000021882 EP300 up KWT- rescue 0.8036074 0.006902 -0.0561493 0.7363980 - 1.78E- KMK 6 77 1 0.3416325 06 ENSMLUG00000030513 ENSMLUG000 up all rescue 0.7708078 0.002667 -0.5230454 0.0001277 - 2.54E- 00030513 7 28 9 1.1680717 38 ENSMLUG00000015267 COQ2 up KWT- rescue 0.7568003 0.004304 -0.0878293 0.8893448 - 0.02258 KMD 37 0.7558501 441 ENSMLUG00000003152 PELI1 up KWT- rescue 0.7555650 1.10E-05 0.16133653 0.3863706 - 0.00089 KMK 7 1 0.4779261 852 ENSMLUG00000006295 PYROXD1 up KWT- rescue 0.7548635 0.005168 -0.1031505 0.7600346 - 0.04072 KMD 5 71 7 0.4735552 683 ENSMLUG00000008863 CYBA up KWT rescue 0.7459256 0.024653 0.03823572 0.9121371 - 5.15E- 8 24 0.8151013 09 ENSMLUG00000002929 CRYBG1 up KWT- rescue 0.7391938 0.006479 -0.1991535 0.3591727 - 0.00268 KMD 6 55 9 0.4195868 504 ENSMLUG00000013566 GATA6 up KWT- rescue 0.7302312 0.001117 -0.3033956 0.2616821 - 1.44E- KMD 4 91 4 1.2087802 11 ENSMLUG00000003877 INTS2 up all rescue 0.7264047 0.000107 -0.3796994 0.0733762 - 0.04244 9 18 3 0.4116972 26 ENSMLUG00000016280 MVK up KWT- rescue 0.7168060 0.001532 0.25804237 0.1710381 - 0.00395 KMK 2 63 1 0.4753697 173 ENSMLUG00000011189 TARS up KWT- rescue 0.6966650 0.004769 0.03710677 0.9167734 - 0.00132 KMK 6 05 5 0.5053914 687 ENSMLUG00000002419 ERMP1 up KWT rescue 0.6874422 0.029437 -0.1162556 0.7849745 - 2.38E- 7 84 1.1162426 08 ENSMLUG00000009421 JAG1 up KWT rescue 0.6839800 0.002582 -0.7031238 1.0316E- -0.410182 6.86E- 4 35 26 12 ENSMLUG00000005508 FDFT1 up all rescue 0.6837612 2.91E-09 -0.2321802 0.0364396 - 5.09E- 3 4 0.8561147 27 ENSMLUG00000017439 RFX2 up KWT rescue 0.6705048 0.007313 0.09770468 0.7553162 - 0.00182 1 
6 5 0.5069736 82 ENSMLUG00000016912 DHCR24 up all rescue 0.6647678 0.001179 -0.2758668 0.0775169 - 6.48E- 1 43 0.8031161 20 ENSMLUG00000010093,ENSMLUG00000018399,ENSMLUG RPL21 up all rescue 0.6645891 0.034276 0.19902459 0.5522625 - 0.00349 00000018921 3 74 3 0.5953324 129 ENSMLUG00000013418,ENSMLUG00000013446 HCN3/RUSC1 up KWT- rescue 0.6509634 0.031471 0.03658553 0.8856252 - 8.36E- KMK 4 84 7 0.7272501 25


ENSMLUG00000009101 MYO10 up KWT- rescue 0.6430377 5.33E-05 -0.0778304 0.4085841 - 6.15E- KMK 2 1 0.2741793 08 ENSMLUG00000012483 SETD3 up KWT rescue 0.6405388 0.029298 0.08042784 0.8154916 - 0.03613 5 98 5 0.3948306 073 ENSMLUG00000000074 TMPO up KWT- rescue 0.6382515 0.000262 -0.2218712 0.0498477 - 0.00801 KMD 2 81 7 0.2910991 796 ENSMLUG00000010466 STAG1 up KWT- rescue 0.6257747 8.10E-05 -0.2412823 0.0101004 - 1.88E- KMD 5 8 0.3317614 05 ENSMLUG00000012756 JAK2 up all rescue 0.6061860 0.000624 -0.2186341 0.0430247 - 0.01219 7 09 2 0.2493596 655 ENSMLUG00000007627 KIAA1841 up KWT- rescue 0.5989566 0.027965 -0.0811653 0.8122531 -0.596078 0.00196 KMD 1 62 5 396 ENSMLUG00000009516,ENSMLUG00000009546 PPP6C/HSPA5 up all rescue 0.5788816 5.69E-07 -0.0838904 0.5761008 - 0.03171 1 9 0.2218321 661 ENSMLUG00000003657 VMP1 up KWT rescue 0.5750902 0.012839 -0.2035335 0.0103046 - 0.02309 9 01 2 0.1417025 343 ENSMLUG00000008327 SLC12A2 up all rescue 0.5747435 0.042115 -0.0710887 0.8563750 - 4.35E- 9 47 5 0.6385371 05 ENSMLUG00000002839 C3orf67 up KWT- rescue 0.5704121 0.040786 -0.1540741 0.6625159 - 0.02651 KMD 2 01 4 0.4906908 236 ENSMLUG00000005411 KLF6 up KWT- rescue 0.5614540 0.015373 -0.0445487 0.7969689 - 5.52E- KMK 9 61 8 0.8437302 42 ENSMLUG00000000518,ENSMLUG00000025212 ALMS1 up all rescue 0.5588441 1.06E-05 -0.1625268 0.1642778 - 0.00348 6 1 0.2750455 861 ENSMLUG00000008018 ATP11C up KWT rescue 0.5435442 0.015588 -0.0556624 0.8691877 -0.422732 0.02346 8 59 347 ENSMLUG00000011693 MAP3K1 up KWT rescue 0.5347972 0.005030 0.18357522 0.4059913 -0.391979 0.02899 2 95 5 524 ENSMLUG00000000183,ENSMLUG00000020337 PGD up all rescue 0.5264112 0.007339 -0.2274376 0.3200132 - 0.04318 1 18 8 0.3674061 545 ENSMLUG00000010300,ENSMLUG00000021396 PKD2 up KWT- rescue 0.5153073 0.001605 -0.0966296 0.5722158 -0.267994 0.01212 KMK 3 72 2 423 ENSMLUG00000005512,ENSMLUG00000027544 OSBPL11 up KWT- rescue 0.5117111 0.002052 -0.1827201 0.3035015 - 0.00420 KMK 1 12 7 0.3835995 814 
ENSMLUG00000010589 ATP9B up KWT- rescue 0.4889838 0.014552 -0.1625514 0.1659200 - 3.76E- KMD 2 27 8 0.3423699 05 ENSMLUG00000012837 ENSMLUG000 up KWT- rescue 0.4870126 0.008806 -0.1938463 0.3986843 - 0.00086 00012837 KMK 2 84 2 0.4825683 001 ENSMLUG00000009598 TBC1D9 up KWT- rescue 0.4338266 0.019687 -0.195119 0.2149096 - 0.00172 KMK 9 27 5 0.3865843 816 ENSMLUG00000004227 BRD9 up KWT rescue 0.4257747 0.028668 0.06719662 0.8312537 - 0.03023 4 21 0.3689889 896


ENSMLUG00000027361 LOC10242976 up KWT- rescue 0.4188737 0.041438 -0.159707 0.0568559 - 1.32E- 3 KMK 2 04 2 0.2433488 05 ENSMLUG00000011700 ENSMLUG000 up all rescue 0.4104338 0.007797 -0.1429253 0.4608112 - 0.00660 00011700 3 84 0.3530309 378 ENSMLUG00000001435 DYRK1A up all rescue 0.3720737 0.007934 -0.0380076 0.8825622 - 0.00388 8 7 5 0.2837806 523 ENSMLUG00000013293 PPP2R5C up KWT- rescue 0.3644773 0.003553 0.07333368 0.6671978 - 0.04896 KMD 9 73 4 0.2369831 156 ENSMLUG00000013258 EMC2 up all rescue 0.3558121 0.037178 0.09099873 0.7514795 - 0.00143 4 84 8 0.4970772 033 ENSMLUG00000015266 CHD7 up KWT- rescue 0.3432320 0.035378 0.09911351 0.4885680 - 0.01113 KMD 9 23 7 0.2060888 059 ENSMLUG00000005739 SMG6 up KWT- rescue 0.2860536 0.024553 -0.1130737 0.3606557 - 0.02364 KMD 3 84 0.1725066 694 ENSMLUG00000010918 TRABD2B down all rescue, -3.4312208 1.89E-18 -0.7100467 0.0272748 2.7947487 1.88E- OEvR 8 7 52 ENSMLUG00000005851 SLC4A4 down all rescue, -2.9491147 5.82E-05 -0.7474548 0.0210136 0.4899246 1.49E- OEvR 5 9 06 ENSMLUG00000015238,ENSMLUG00000021792 ECM2 down KWT- rescue, -2.5680293 3.83E-06 -1.2796024 0.0065047 0.8336772 1.08E- KMK OEvR 1 1 08 ENSMLUG00000001157 LAMA2 down all rescue, -1.9760424 2.39E-05 -0.8748112 0.0028738 2.0676419 1.08E- OEvR 3 2 51 ENSMLUG00000010440 ENSMLUG000 down all rescue, -1.7919245 0.000520 -1.2678793 0.0046878 2.7324826 3.47E- 00010440 OEvR 76 2 3 30 ENSMLUG00000014302 ADGRL2 down all rescue, -1.7579369 2.49E-08 -0.2881266 0.0240931 0.8118398 5.67E- OEvR 5 2 58 ENSMLUG00000006692 MAP2K6 down KWT- rescue, -1.6217624 4.60E-05 -0.839264 0.0101179 0.4665407 0.00052 KMK OEvR 8 5 792 ENSMLUG00000012270 DKK1 down KWT rescue, -1.5134697 0.030313 -0.815686 0.0081198 3.6976026 4.84E- OEvR 02 3 9 115 ENSMLUG00000013557,ENSMLUG00000023482 FGD4 down KWT rescue, -1.418965 0.029290 -0.5525036 0.0057464 0.4294038 1.01E- OEvR 96 7 1 09 ENSMLUG00000029080 FNDC1 down all rescue, -1.3403513 0.004043 -0.7651914 9.9586E- 4.5751231 3.55E- OEvR 79 08 3 
295 ENSMLUG00000009658 RNF150 down KWT- rescue, -1.3194325 0.000320 -0.4543017 0.0078237 0.2085032 0.02422 KMK OEvR 97 3 2 779 ENSMLUG00000007857,ENSMLUG00000029689 FBXL7 down all rescue, -1.254536 0.000420 -0.7131095 8.4907E- 1.0913817 3.81E- OEvR 88 07 2 32 ENSMLUG00000010725 LVRN down KWT- rescue, -0.9763674 1.27E-05 -0.3284134 0.0240872 2.2890870 7.81E- KMK OEvR 5 6 150 ENSMLUG00000013107 MAN2A1 down all rescue, -0.9343267 9.37E-14 -0.3113221 0.0120315 0.8369053 9.78E- OEvR 1 5 25


ENSMLUG00000002004,ENSMLUG00000019219,ENSMLUG CBLB down KWT- rescue, -0.8820142 2.19E-05 -0.3099406 0.0073128 0.4110067 5.10E- 00000022462 KMK OEvR 1 11 ENSMLUG00000010646,ENSMLUG00000028376 GPC3 down KWT rescue, -0.7580928 0.004385 -0.3036455 0.0002064 0.7482185 5.30E- OEvR 39 9 47 ENSMLUG00000001090 SVIL down KWT- rescue, -0.7470458 0.003336 -0.311398 0.0347484 0.7010584 6.34E- KMK OEvR 39 7 8 15 ENSMLUG00000002314 EHBP1 down KWT rescue, -0.731863 0.001278 -0.3893343 2.845E-07 0.4337285 5.60E- OEvR 18 4 13 ENSMLUG00000015756,ENSMLUG00000027498 ASPH down KWT- rescue, -0.6372528 0.001676 -0.4088278 0.0106831 0.3057391 0.00618 KMK OEvR 79 2 1 558 ENSMLUG00000007351,ENSMLUG00000021353 BBX down KWT rescue, -0.636702 0.004038 -0.3618896 0.0013389 0.4169827 1.64E- OEvR 23 1 4 09 ENSMLUG00000005610 DST down KWT rescue, -0.5472574 0.010272 -0.3152627 1.1298E- 0.5861806 3.01E- OEvR 8 05 3 41 ENSMLUG00000004962 NT5C2 down KWT- rescue, -0.5422442 0.002131 -0.404862 2.18E-06 0.6082447 3.89E- KMD OEvR 39 9 25 ENSMLUG00000001838 FN1 down KWT rescue, -0.4923159 0.022006 -0.457608 3.1076E- 0.8188775 7.21E- OEvR 92 06 4 85 ENSMLUG00000016103 VIM down KWT rescue, -0.4654611 0.002784 -0.2287453 0.0060308 0.4420863 1.61E- OEvR 46 6 8 17 ENSMLUG00000015230 OGN down KWT- rescue -7.6568961 1.66E-09 -2.1685499 NA 3.8948478 4.62E- KMK 9 95 ENSMLUG00000015234 ASPN down KWT- rescue -6.0496335 6.96E-05 -1.3051188 NA 4.5469865 2.79E- KMK 2 59 ENSMLUG00000004620 ENSMLUG000 down all rescue -5.7153396 1.32E-14 1.06464145 NA 4.5701039 8.84E- 00004620 3 50 ENSMLUG00000004614 AUTS2 down KWT rescue -5.6910384 0.006844 -0.8776095 NA 3.0473701 2.10E- 98 1 59 ENSMLUG00000015062 CLCA2 down KWT- rescue -5.6591079 3.54E-15 -0.8449244 NA 1.2782099 7.50E- KMK 4 05 ENSMLUG00000002221 CACNA2D4 down all rescue -5.3226193 8.73E-14 0.41833644 NA 4.2362401 8.71E- 4 18 ENSMLUG00000015509 ACTA2 down KWT rescue -4.8258577 0.000569 -0.0818275 0.9436464 2.1784317 6.48E- 76 5 1 92 ENSMLUG00000016213,ENSMLUG00000018709 
LRRC2 down KWT- rescue -4.5507498 0.000133 0.07730251 NA 2.3321342 0.00092 KMK 33 1 746 ENSMLUG00000011029 FAM180A down all rescue -4.4723178 2.36E-16 -0.1017368 NA 2.1217128 4.32E- 4 15 ENSMLUG00000016208 LEPR down all rescue -4.422289 4.21E-12 -2.3447561 NA 2.3633306 2.21E- 9 13 ENSMLUG00000001055 DDIT3 down all rescue -4.400746 1.98E-05 0.80224482 NA 2.5781189 1.67E- 07


ENSMLUG00000008646 PTGFR down KWT- rescue -4.321821 0.000156 1.12326265 NA 3.4772301 4.66E- KMD 27 05 ENSMLUG00000002976 DKK2 down KWT- rescue -3.9327527 0.000113 -0.3464618 NA 2.9295956 7.37E- KMD 98 09 ENSMLUG00000019567 RF00003 down all rescue -3.8943414 0.000720 1.24720989 NA 1.3095374 0.00615 22 566 ENSMLUG00000024426 LOC10243934 down KWT rescue -3.8839893 0.014069 -0.0067596 NA 2.2205793 0.00710 6 32 1 163 ENSMLUG00000004068 NEXN down KWT- rescue -3.8796311 1.26E-11 0.64011434 0.5307623 2.5593160 1.31E- KMD 1 6 34 ENSMLUG00000015339 SEMA3G down KWT- rescue -3.8791353 0.001352 1.07395965 NA 3.5648117 3.64E- KMD 72 6 05 ENSMLUG00000005976 ENSMLUG000 down KWT rescue -3.8188964 0.003872 -0.3521883 NA 1.3858908 0.02596 00005976 81 7 144 ENSMLUG00000003707,ENSMLUG00000021246 ABI3BP down all rescue -3.7070773 3.54E-10 -0.1912227 0.8569192 4.0432808 6.79E- 4 9 68 ENSMLUG00000004108 COLEC12 down KWT rescue -3.7034232 1.40E-07 0.139855 NA 1.7329355 1.55E- 4 13 ENSMLUG00000015691 SLC6A3 down all rescue -3.6863469 4.53E-05 -0.003484 NA 2.4106817 0.00011 9 091 ENSMLUG00000030169 SLC26A10 down all rescue -3.6734287 2.00E-15 0.77643385 0.1367730 2.4217506 1.72E- 7 7 45 ENSMLUG00000007704 LUM down KWT rescue -3.6439124 0.000715 0.00020228 NA 1.5236301 0.00015 52 1 222 ENSMLUG00000000923 TSGA13 down KWT rescue -3.610042 0.036689 1.05668145 NA 4.1054171 0.01383 1 068 ENSMLUG00000004481 NKD1 down KWT rescue -3.5851268 0.000763 -0.5425659 NA 3.6873833 1.65E- 27 4 12 ENSMLUG00000003586 PYM1 down all rescue -3.5751041 2.30E-05 -0.6147459 NA 1.9993591 1.24E- 7 05 ENSMLUG00000026014 IKZF4 down all rescue -3.5282046 9.57E-06 0.3869581 NA 1.5784467 4.06E- 9 07 ENSMLUG00000017178,ENSMLUG00000018638 GRIP1 down all rescue -3.399179 1.35E-07 -0.1336293 NA 1.4972633 6.42E- 10 ENSMLUG00000008103 ITGBL1 down KWT- rescue -3.3364367 8.82E-12 -0.8633719 0.2225425 5.5489625 6.33E- KMK 5 4 66 ENSMLUG00000008071 CCDC59 down all rescue -3.3131485 1.61E-25 -0.0902538 0.9065410 3.3346923 2.18E- 9 96 
ENSMLUG00000002230 ENSMLUG000 down all rescue -3.1582843 0.002099 1.469112 NA 3.0077445 7.99E- 00002230 66 5 05 ENSMLUG00000010233,ENSMLUG00000019166 TMTC2 down all rescue -3.1119601 5.75E-24 0.43742344 0.5587047 2.9052151 9.49E- 2 1 39


ENSMLUG00000014160 CACNA1C down KWT- rescue -3.0148686 0.003316 -0.0268074 0.9784449 4.5179497 3.50E- KMD 63 8 7 81 ENSMLUG00000008084 METTL25 down all rescue -2.9915551 2.26E-34 -0.1364241 0.7040469 2.7832464 2.17E- 2 181 ENSMLUG00000002925 ADAMTS17 down all rescue -2.9693761 1.06E-20 -0.0524777 0.9562336 1.3616077 7.36E- 9 7 16 ENSMLUG00000023596 ENSMLUG000 down KWT- rescue -2.9536007 0.000786 -0.3023592 NA 3.0528459 1.36E- 00023596 KMD 1 2 21 ENSMLUG00000001013 GLI1 down all rescue -2.9352722 6.15E-06 0.25311459 NA 2.9382830 6.55E- 6 19 ENSMLUG00000008815 HMOX1 down all rescue -2.9290364 3.25E-13 -0.5643677 0.4102007 1.0078604 2.33E- 6 4 10 ENSMLUG00000008555 TSPAN9 down all rescue -2.9213374 3.32E-08 0.45111212 0.5482970 2.2545646 3.83E- 5 4 21 ENSMLUG00000016273 LOC10243871 down KWT rescue -2.9123156 0.021349 2.00203707 NA 1.7361719 0.01143 4 15 8 968 ENSMLUG00000006731 ENSMLUG000 down all rescue -2.9075727 9.95E-06 0.30783119 0.0313272 2.1998521 1.13E- 00006731 3 6 75 ENSMLUG00000006662 NAB2 down all rescue -2.8794061 7.47E-18 0.3285766 0.4381794 2.1969535 2.05E- 9 74 ENSMLUG00000027812 ADRA1A down KWT- rescue -2.8760998 0.002617 -0.4389375 NA 2.8494574 3.38E- KMK 06 18 ENSMLUG00000006653 GRB14 down KWT- rescue -2.8230076 0.000446 0.58584153 NA 3.0398616 5.63E- KMD 21 4 07 ENSMLUG00000010988 SLC13A4 down all rescue -2.8136454 3.27E-05 -0.5000669 NA 2.3865540 6.50E- 5 10 ENSMLUG00000006591 MYL6B down all rescue -2.8071173 2.99E-05 -0.2758662 NA 2.5302144 4.25E- 9 09 ENSMLUG00000001537 PPFIA2 down all rescue -2.7958257 1.77E-10 0.01299096 NA 2.5885940 1.80E- 3 14 ENSMLUG00000006546 APOE down KWT rescue -2.7880048 0.012268 -0.4071646 NA 2.8855028 0.01151 66 5 8 ENSMLUG00000007140 SLC9A3 down KWT- rescue -2.7823753 0.004753 -0.6032183 NA 1.4072754 0.01169 KMD 03 7 202 ENSMLUG00000002252 DCP1B down all rescue -2.7402353 1.19E-26 -0.0607912 0.8948235 1.5885769 3.39E- 4 59 ENSMLUG00000002683 LRIG1 down KWT rescue -2.7291842 0.003282 0.43193877 NA 1.4184500 0.00031 
81 9 618 ENSMLUG00000003022 PROB1 down all rescue -2.723214 0.000346 1.50122928 NA 2.1640508 6.21E- 42 8 06 ENSMLUG00000003606 SUOX down all rescue -2.6757807 2.97E-08 -0.5569179 0.5277431 1.9396374 1.10E- 8 6 16


ENSMLUG00000008153 LOC10242609 down KWT- rescue -2.6211795 0.026538 0.86337325 NA 1.5444400 0.02751 3 KMD 5 1 549 ENSMLUG00000011727 ADAMTS2 down all rescue -2.6099989 5.73E-19 0.04617179 0.9665606 2.7825824 5.01E- 8 3 26 ENSMLUG00000030361 ENSMLUG000 down all rescue -2.6099232 6.31E-06 0.29255372 0.0030359 2.2284774 0 00030361 8 6 ENSMLUG00000002767 NUDT4 down all rescue -2.5877909 1.09E-30 -0.2344653 0.4690908 2.6140695 4.15E- 7 7 107 ENSMLUG00000008432 RAB3IP down KWT- rescue -2.5674555 7.62E-06 -0.0199007 0.9857971 3.9496309 4.89E- KMD 1 8 34 ENSMLUG00000005602 SPATA13 down KWT rescue -2.5421316 0.000119 -0.0360471 NA 2.1171759 1.10E- 16 07 ENSMLUG00000028917 LOC10244311 down all rescue -2.5249775 2.43E-07 -0.3329914 NA 2.2115875 4.37E- 1 8 10 ENSMLUG00000013498 ADCY2 down KWT rescue -2.5232004 2.23E-05 -0.6091097 NA 5.5627697 1.21E- 6 10 ENSMLUG00000010290 SASH1 down all rescue -2.5173906 1.77E-07 -0.3522791 0.3821304 0.8253128 5.84E- 7 2 14 ENSMLUG00000001115 ARHGEF25 down all rescue -2.4866005 1.13E-09 0.66377124 0.0475165 1.6993693 1.80E- 5 3 44 ENSMLUG00000003578,ENSMLUG00000030389 DNAJC14 down all rescue -2.4847222 5.15E-26 -0.0498658 0.9317477 1.7406130 5.92E- 9 6 51 ENSMLUG00000006568 ESYT1 down all rescue -2.4750268 1.16E-22 0.28271068 0.3922207 1.5508569 3.11E- 4 41 ENSMLUG00000000222 PROSER2 down KWT- rescue -2.4377553 0.004713 1.13425557 NA 1.9432288 0.00492 KMD 62 7 189 ENSMLUG00000014959 COL3A1 down KWT- rescue -2.4325312 0.002607 -0.3996477 0.0804401 0.2568799 0.00511 KMK 69 1 7 13 ENSMLUG00000001110 DTX3 down all rescue -2.3958078 3.27E-07 -0.339051 0.7062105 1.9306295 8.64E- 2 6 17 ENSMLUG00000005462 USP18 down KWT- rescue -2.3940508 9.91E-05 0.75471289 NA 1.7461029 4.23E- KMD 1 05 ENSMLUG00000013785 CYP1B1 down all rescue -2.3883776 4.57E-06 -0.1017831 0.8579622 0.3188342 0.01044 8 2 437 ENSMLUG00000006100 SERPINF1 down KWT- rescue -2.3777369 1.78E-08 -0.9447955 NA 1.0376979 0.00018 KMK 7 126 ENSMLUG00000006689 STAT6 down all rescue -2.3641716 
9.67E-07 -0.248431 0.5597679 2.0288609 1.99E- 2 5 67 ENSMLUG00000011085 PLEKHB1 down all rescue -2.3612524 0.008511 -0.1057472 NA 2.8616232 1.38E- 47 4 06 ENSMLUG00000002911 TPH2 down all rescue -2.3459965 2.90E-09 0.12547845 0.8601932 3.8574855 5.92E- 7 1 51


ENSMLUG00000000516 SGIP1 down KWT- rescue -2.3431554 0.006636 0.42669928 NA 2.2275089 2.02E- KMD 41 8 05 ENSMLUG00000023985 NALCN down all rescue -2.3410171 7.38E-13 -0.1057765 0.9355931 1.4027815 1.60E- 6 6 08 ENSMLUG00000012406,ENSMLUG00000018036 TRHDE down all rescue -2.3303506 2.46E-10 0.51129557 0.4650147 2.3307842 6.31E- 4 17 ENSMLUG00000004459 DGKG down KWT- rescue -2.317167 1.29E-10 0.03097385 0.9840084 2.6634726 3.55E- KMK 3 4 13 ENSMLUG00000003602 RAB5B down all rescue -2.3134569 2.32E-17 -0.1798791 0.7911957 1.8890528 3.71E- 7 9 37 ENSMLUG00000005779 SLC16A7 down KWT- rescue -2.3082036 0.008420 0.40530372 NA 2.7591118 7.29E- KMD 35 6 05 ENSMLUG00000006614 MYL6 down all rescue -2.2803217 9.16E-24 0.39007536 0.1001557 2.1881908 3.61E- 6 6 114 ENSMLUG00000001526 LIN7A down KWT rescue -2.2791954 0.027207 -0.203092 0.7374631 1.4197312 2.01E- 42 18 ENSMLUG00000001079 KIF5A down all rescue -2.2681994 3.96E-08 0.08368237 0.8656300 1.3234854 8.58E- 5 7 38 ENSMLUG00000002578 B4GALNT1 down all rescue -2.2621979 0.025732 -0.6270376 NA 2.1291989 0.00143 77 6 075 ENSMLUG00000006625 SMARCC2 down all rescue -2.2590827 1.20E-44 -0.0308064 0.9261503 1.9420875 9.48E- 1 4 154 ENSMLUG00000005053 SPNS2 down KWT rescue -2.2406536 0.000122 -1.370822 NA 3.3864271 8.29E- 83 2 10 ENSMLUG00000006564 ZC3H10 down all rescue -2.2176131 0.000925 0.1671422 0.8856252 2.1993055 2.55E- 77 7 5 11 ENSMLUG00000008775 AGAP2 down all rescue -2.2121793 1.82E-07 0.1589249 0.6799749 2.0037283 2.20E- 9 2 82 ENSMLUG00000005905 OS9 down all rescue -2.2103579 3.65E-16 0.15954196 0.6384713 2.3277721 2.44E- 2 3 80 ENSMLUG00000000530 RBMS2 down all rescue -2.2088224 3.41E-18 0.34142263 0.1201025 1.6742529 1.73E- 6 2 61 ENSMLUG00000005157 ACCS down KWT- rescue -2.2066528 0.010822 0.53303363 NA 4.4316644 4.00E- KMD 08 4 07 ENSMLUG00000014555,ENSMLUG00000025933 TBC1D15 down all rescue -2.2058131 8.87E-12 -0.0636287 0.8426118 2.4223833 1.55E- 2 4 220 ENSMLUG00000000279 ENSMLUG000 down all rescue -2.2000424 
5.45E-09 1.38579601 0.1478995 0.5152477 0.04297 00000279 6 9 794 ENSMLUG00000007771 SLC16A2 down KWT- rescue -2.1860652 0.017944 0.37631203 NA 3.2998958 8.63E- KMD 02 4 05 ENSMLUG00000003515 TCEA3 down all rescue -2.1789986 2.41E-11 0.24523572 0.5665519 0.4559242 0.00468 2 189


ENSMLUG00000005568 COLGALT2 down KWT- rescue -2.168558 0.015218 -0.0231428 NA 1.7461791 0.00057 KMK 03 1 282 ENSMLUG00000006057 RNF41 down all rescue -2.162256 6.43E-19 0.03598492 0.9130688 2.0543460 2.28E- 2 3 132 ENSMLUG00000006850 SPRYD4 down all rescue -2.1290048 0.015157 -0.6668519 NA 2.6829952 8.91E- 3 09 ENSMLUG00000004969 STAC3 down all rescue -2.126753 2.11E-19 0.22641468 0.5051722 1.8838461 8.54E- 6 8 53 ENSMLUG00000026475 ENSMLUG000 down all rescue -2.1091195 1.01E-09 0.3086587 0.4855923 2.1201813 1.81E- 00026475 5 8 34 ENSMLUG00000013086 WNT9A down all rescue -2.1018225 0.001140 -1.1431283 0.1213511 2.7808562 3.16E- 29 7 6 28 ENSMLUG00000000546 ENSMLUG000 down all rescue -2.0967329 3.15E-07 0.0131537 0.9772882 2.1799286 3.32E- 00000546 8 2 106 ENSMLUG00000025000 CPEB1 down all rescue -2.0886414 1.53E-08 0.585797 0.3009104 3.0987428 4.65E- 5 7 25 ENSMLUG00000003581 MMP19 down all rescue -2.0802726 2.00E-13 -0.343955 0.5984598 2.0222017 1.55E- 4 26 ENSMLUG00000007246 SIX2 down all rescue -2.0754394 4.04E-06 0.02793361 0.9844612 7.4218649 4.83E- 9 3 11 ENSMLUG00000029995 SOBP down all rescue -2.0500945 4.24E-21 -0.1735578 0.3606250 0.3500431 7.65E- 9 08 ENSMLUG00000012488 PLCB4 down KWT- rescue -2.0449363 3.80E-07 -0.0582819 0.9433037 1.9607766 1.39E- KMD 1 6 24 ENSMLUG00000003565 SARNP down all rescue -2.0341992 7.90E-13 -0.0693404 0.8505426 2.1774669 2.23E- 7 4 96 ENSMLUG00000003018 MZB1 down KWT- rescue -2.0340023 0.006507 -0.1129461 NA 2.6982866 7.14E- KMD 39 6 08 ENSMLUG00000006725 CNPY2 down all rescue -2.0309445 7.13E-18 0.03438306 0.9533119 2.4267989 9.48E- 7 7 52 ENSMLUG00000010430 CD63 down all rescue -2.0292385 1.21E-22 0.19874645 0.6518730 2.0589345 1.18E- 2 43 ENSMLUG00000014602 MAGI2 down KWT- rescue -2.0267086 4.19E-27 -0.261509 0.1622434 0.70432 8.72E- KMD 21 ENSMLUG00000006674 ANKRD52 down all rescue -2.0261181 3.69E-50 0.21711496 0.2338512 1.8633617 7.76E- 6 7 134 ENSMLUG00000005310 TEAD4 down all rescue -1.9999022 8.91E-11 0.4802976 
0.0011380 1.5179056 2.37E- 5 6 72 ENSMLUG00000003679 TMEM45A down KWT- rescue -1.992915 0.003569 -0.1339339 NA 2.4272006 8.97E- KMD 04 7 09 ENSMLUG00000016109 PPFIA4 down KWT- rescue -1.9849435 0.000134 0.45099922 0.4085841 0.7134642 3.62E- KMD 03 1 4 05


ENSMLUG00000016973 NPAS3 down KWT- rescue -1.9846086 0.025557 -0.3643985 NA 2.0366081 0.00801 KMD 63 8 796 ENSMLUG00000001285 LOC10244268 down all rescue -1.9796671 8.60E-06 0.07524334 0.8601932 1.0870769 2.62E- 9 7 6 22 ENSMLUG00000009825 NCKAP5L down KWT- rescue -1.9788061 4.57E-06 0.47458255 0.0878208 1.1093858 4.86E- KMD 5 9 17 ENSMLUG00000006654 SLC39A5 down all rescue -1.964926 0.001814 -0.6179052 NA 3.1554252 2.30E- 75 4 09 ENSMLUG00000006364,ENSMLUG00000019392 PRPF40B down all rescue -1.9609052 1.10E-05 0.55738835 0.0874024 0.8854314 2.65E- 4 1 09 ENSMLUG00000005405 DISC1 down KWT- rescue -1.9440125 1.02E-05 0.3386556 0.6633190 1.3036830 1.42E- KMD 6 2 07 ENSMLUG00000010406 BLOC1S1 down all rescue -1.9382306 4.24E-28 0.11520509 0.7587610 2.2799584 1.52E- 5 1 128 ENSMLUG00000000404 MYO1A down all rescue -1.9344595 5.31E-09 0.16105668 0.5504323 2.9302149 1.12E- 6 4 147 ENSMLUG00000015964 GAB2 down all rescue -1.9338812 1.78E-08 0.29101083 0.6337827 0.6065545 0.00630 5 5 996 ENSMLUG00000005170,ENSMLUG00000019397 BICC1 down all rescue -1.9232723 5.19E-37 0.02408479 0.9558041 1.1913399 6.65E- 2 1 35 ENSMLUG00000006646 NABP2 down all rescue -1.91924 6.41E-06 0.28470441 0.6527011 1.8123109 2.76E- 9 7 14 ENSMLUG00000001272 MYO15A down KWT- rescue -1.9026219 0.034268 1.07795099 NA 2.2816944 0.00018 KMK 31 2 703 ENSMLUG00000014181 ESR1 down all rescue -1.896408 0.000156 -0.2000832 0.7932160 2.1019116 1.19E- 65 7 2 21 ENSMLUG00000000390 ZBTB39 down all rescue -1.8958597 2.94E-06 0.4959718 0.3477706 2.1484942 8.81E- 9 6 17 ENSMLUG00000010069 KRT7 down KWT rescue -1.8817764 0.013016 -0.5218641 0.5987720 0.8305359 0.02138 73 3 9 279 ENSMLUG00000030747 LOC10242860 down KWT- rescue -1.873533 6.63E-05 0.69853284 0.1895258 0.9163793 0.00053 4 KMD 3 6 408 ENSMLUG00000006785 STAT2 down all rescue -1.8733931 8.04E-12 0.0915018 0.8330511 1.9453084 4.76E- 7 2 46 ENSMLUG00000017277 ITGB7 down all rescue -1.8695973 4.73E-05 -0.0726748 0.9260123 2.8481733 1.58E- 9 7 39 
ENSMLUG00000011587 EDIL3 down KWT rescue -1.8572317 0.026596 -0.7548521 0.3630021 3.7984100 4.57E- 67 5 26 ENSMLUG00000011866 CALCOCO1 down KWT- rescue -1.8554397 0.000681 -0.0402458 0.9525720 1.0365217 5.64E- KMD 04 9 8 14 ENSMLUG00000001102 PIP4K2C down all rescue -1.8482878 0.000261 -0.2691358 0.4540824 1.7708684 2.50E- 32 2 9 32


ENSMLUG00000004980,ENSMLUG00000023009 R3HDM2 down all rescue -1.8459488 1.86E-11 -0.1079808 0.3728058 1.9367038 4.45E- 9 286 ENSMLUG00000027657 RF00002 down all rescue -1.8439452 1.03E-06 -0.223084 0.6942721 1.9366886 0.00457 6 1 516 ENSMLUG00000014887 TRIM47 down all rescue -1.8354466 6.69E-06 0.21684731 0.7363980 0.9759740 2.59E- 1 8 07 ENSMLUG00000005327 LRIG3 down all rescue -1.8328657 1.19E-09 -0.122413 0.4089213 0.9436253 2.34E- 2 6 82 ENSMLUG00000001065 DCTN2 down all rescue -1.8210252 4.38E-12 0.18095793 0.3585853 1.9973355 6.20E- 8 4 180 ENSMLUG00000009982 UBE2NL down KWT- rescue -1.8196878 0.001289 -0.2778537 0.8124648 2.2633376 7.91E- KMK 57 4 1 10 ENSMLUG00000011359 PXDC1 down all rescue -1.818047 0.001247 0.10511993 NA 1.8852330 3.52E- 75 1 05 ENSMLUG00000017172 IRAK3 down all rescue -1.8000031 0.001012 -0.5135038 0.3115384 1.3838678 7.90E- 89 6 9 13 ENSMLUG00000010421 RDH5 down all rescue -1.7787713 2.09E-18 -0.0816156 0.8418870 2.0687574 3.94E- 6 2 64 ENSMLUG00000001169 NACA down all rescue -1.7438917 7.03E-11 0.09254613 0.6571271 2.4848963 4.28E- 7 4 179 ENSMLUG00000014768 ENSMLUG000 down all rescue -1.7293065 8.50E-07 0.25089487 0.7553162 4.9611162 2.92E- 00014768 5 3 33 ENSMLUG00000006706 COQ10A down all rescue -1.7199917 0.013983 0.20588899 0.8709215 1.8276856 6.12E- 95 8 9 07 ENSMLUG00000002769 AP3S1 down all rescue -1.7176113 4.99E-16 -0.2492534 0.1748583 0.2232612 0.00916 6 9 598 ENSMLUG00000012253 NHSL2 down KWT rescue -1.6983713 1.58E-05 -0.3057206 0.1306446 1.5201112 2.19E- 6 20 ENSMLUG00000011161 TNFSF9 down KWT- rescue -1.6811727 0.042958 -1.1026331 NA 3.1636174 0.00029 KMK 28 5 445 ENSMLUG00000015259 CDC42BPG down all rescue -1.6639203 0.000960 -0.093 NA 1.0192684 0.03266 6 654 ENSMLUG00000025725 ENSMLUG000 down KWT- rescue -1.646762 0.036538 0.60401688 NA 1.3482329 0.03550 00025725 KMD 66 88 ENSMLUG00000011826 CAVIN4 down all rescue -1.6437294 0.000158 0.14476813 0.8662212 0.7980976 0.00098 46 6 7 819 ENSMLUG00000014516 GOLGA7 down KWT 
rescue -1.6293774 1.49E-07 -0.2462154 0.3735589 1.5519341 7.06E- 2 8 44 ENSMLUG00000023690 ENSMLUG000 down KWT- rescue -1.6231764 0.007202 -0.1427721 0.8518746 2.1821036 1.83E- 00023690 KMK 37 1 4 18 ENSMLUG00000025891 IGFBP5 down KWT rescue -1.619741 0.036506 -0.3413676 0.3490056 0.8971932 3.02E- 12 8 5 10


ENSMLUG00000027856 CCDC141 down KWT rescue -1.6188046 0.022699 -0.4207394 0.5544709 0.8233701 0.00033 99 3 8 438 ENSMLUG00000010114 TRIM44 down KWT- rescue -1.5920216 0.008291 0.11825706 0.9016220 0.6740119 0.02618 KMD 19 1 7 973 ENSMLUG00000004443 TTC39A down KWT- rescue -1.586997 0.002199 0.14027827 0.7299496 0.5825992 1.91E- KMD 07 1 4 06 ENSMLUG00000008394 ENSMLUG000 down KWT- rescue -1.5767916 0.000259 0.04497663 0.9647564 1.1384577 9.61E- 00008394 KMK 5 5 8 05 ENSMLUG00000003590,ENSMLUG00000020026 DGKA down all rescue -1.5559801 8.52E-07 0.09233432 0.8063509 2.1034188 2.08E- 3 5 73 ENSMLUG00000004470 KLHL29 down all rescue -1.554341 0.005171 0.65534041 0.0502477 2.2972777 1.73E- 9 2 8 24 ENSMLUG00000012908 MMP14 down all rescue -1.5543173 4.93E-06 0.15479856 0.7070775 0.4948537 0.00020 3 2 597 ENSMLUG00000002606 SLC35E3 down all rescue -1.5541404 2.44E-09 -0.1416539 0.4003302 3.7275139 0 1 8 ENSMLUG00000028207 UNC13A down KWT- rescue -1.5530631 0.022151 1.05982846 0.0403955 2.8455866 2.41E- KMD 67 3 9 10 ENSMLUG00000011991 TSGA10IP down KWT rescue -1.5430055 0.010754 -0.060813 NA 1.1121364 0.01677 54 299 ENSMLUG00000012937 ZNF385A down all rescue -1.5400752 1.90E-06 1.85345146 1.152E-08 0.7448666 0.00230 4 263 ENSMLUG00000029968 RPS26 down all rescue -1.5250001 2.57E-13 -0.1192836 0.8444115 2.4308559 3.84E- 4 2 42 ENSMLUG00000009505 IFIH1 down KWT rescue -1.5246444 0.011141 0.38582109 0.7311601 1.2363688 0.01460 97 3 9 284 ENSMLUG00000014336 FHL1 down KWT- rescue -1.5187557 0.039207 0.64128146 0.2076102 0.7332586 0.00637 KMD 97 1 3 509 ENSMLUG00000016091 LOC10243613 down KWT- rescue -1.5049275 0.002960 -0.8179196 0.2649953 1.3059773 1.88E- 9 KMK 32 7 3 05 ENSMLUG00000006712 CS down all rescue -1.4912418 6.96E-34 -0.0841957 0.7409731 2.1316193 6.02E- 3 7 118 ENSMLUG00000006737 PAN2 down all rescue -1.4762096 1.91E-07 -0.1123422 0.8010225 1.6652074 6.22E- 6 31 ENSMLUG00000003139 WDR26 down KWT rescue -1.4756977 3.27E-11 -0.1656564 0.5004490 0.3392398 0.00044 9 
6 764 ENSMLUG00000001037 MARS down all rescue -1.4709825 1.49E-09 0.073778 0.8269626 2.3636626 2.34E- 9 2 82 ENSMLUG00000016765,ENSMLUG00000021120 NTRK2 down KWT rescue -1.4647475 0.024360 0.50687916 0.6386099 1.4472275 0.00444 23 3 567 ENSMLUG00000005593 CACNA1G down all rescue -1.4641124 9.29E-14 0.52572101 0.0771728 0.4519807 0.00259 9 6 795


ENSMLUG00000014171 EYA2 down KWT- rescue -1.463164 0.000214 -0.1150321 0.7411767 5.6227319 5.03E- KMD 39 6 8 119 ENSMLUG00000008569 NTN1 down KWT rescue -1.4545258 0.002653 0.03736246 NA 2.3925071 5.53E- 54 06 ENSMLUG00000006987 ZNF219 down KWT rescue -1.4513962 0.001027 0.31771225 0.7040469 0.9511012 0.00493 56 4 026 ENSMLUG00000016195,ENSMLUG00000018830,ENSMLUG ATP5F1B down KWT- rescue -1.4491921 0.003958 -0.1394906 0.6122718 2.4068883 6.45E- 00000025381 KMD 43 3 1 144 ENSMLUG00000005274 ENKD1 down all rescue -1.4256495 0.008546 0.16269019 0.8608899 0.9219452 0.00952 08 8 7 507 ENSMLUG00000008264 CCDC9B down KWT- rescue -1.4214551 0.000718 0.41087855 0.1062393 0.2849896 0.03005 KMD 01 3 7 403 ENSMLUG00000014859 BMPER down KWT- rescue -1.4145183 0.000381 0.09821292 0.8479583 1.9482991 4.22E- KMD 96 2 1 23 ENSMLUG00000007166 SHF down all rescue -1.4131494 6.41E-06 0.36241949 0.3325003 0.8450515 1.16E- 2 2 05 ENSMLUG00000008654 NID2 down KWT rescue -1.371017 0.018851 -0.5520452 0.4384693 1.3092756 7.28E- 75 1 3 07 ENSMLUG00000007572 CARD6 down KWT- rescue -1.3689705 0.001396 -0.1794708 0.8658515 2.5049797 8.43E- KMD 46 6 10 ENSMLUG00000029846 ARF3 down KWT rescue -1.3596183 0.007319 0.24426666 0.7729776 1.1174700 0.00134 24 8 856 ENSMLUG00000016672 RALGPS1 down all rescue -1.3567055 0.000147 0.26112893 0.4385095 0.3349408 0.03874 68 3 197 ENSMLUG00000025066 SLC8A3 down KWT- rescue -1.3312561 0.032514 -0.3463398 NA 1.6194261 0.00103 KMD 36 1 681 ENSMLUG00000002506,ENSMLUG00000024066 SEPT5- down all rescue -1.3311772 0.000278 0.60375513 0.0001470 0.3479112 0.00643 /GP1BB 88 4 2 989 ENSMLUG00000015927 PTGES3 down all rescue -1.3187811 1.09E-18 -0.1578757 0.2978666 2.4806241 3.28E- 2 1 83 ENSMLUG00000003979 BCHE down KWT rescue -1.3172061 0.022151 -0.1725569 0.9016220 1.6512388 0.00012 67 1 2 933 ENSMLUG00000000695 RGCC down all rescue -1.3085707 0.007665 -0.5236381 0.4684725 2.9065208 1.92E- 97 3 5 12 ENSMLUG00000013588 EYA1 down KWT- rescue -1.3071897 2.10E-06 
0.13682541 0.4675312 0.2491119 7.65E- KMK 9 6 06 ENSMLUG00000027186 ENSMLUG000 down KWT- rescue -1.2959438 0.012519 -0.2383553 0.7426114 0.5915863 0.01401 00027186 KMD 19 4 195 ENSMLUG00000017368 YJEFN3 down all rescue -1.2951645 0.001770 0.22333648 0.7729776 0.8346715 0.01022 45 6 763 ENSMLUG00000011181 CCND1 down KWT- rescue -1.2930173 0.000141 -0.0295446 0.9368445 2.7920557 1.47E- KMK 21 5 4 18


ENSMLUG00000003560 GDF11 down KWT- rescue -1.2927568 0.012825 0.52389217 0.5718865 1.7495292 0.00165 KMD 65 5 938 ENSMLUG00000004713 CDK2 down all rescue -1.2898318 1.62E-07 0.18760301 0.6796989 2.2620188 1.24E- 5 3 33 ENSMLUG00000000815 PA2G4 down all rescue -1.2856677 2.29E-09 -0.0493093 0.9214201 2.1921717 8.96E- 1 3 65 ENSMLUG00000004955 SHMT2 down all rescue -1.2765722 0.001067 -0.1862099 0.6753323 1.5353078 9.58E- 35 7 6 10 ENSMLUG00000012946 ITGA5 down all rescue -1.2657947 3.22E-09 0.26863947 0.2274792 0.8504181 3.90E- 9 14 ENSMLUG00000012829 LBH down KWT- rescue -1.2646165 0.009796 0.32740673 0.6371275 1.3738203 1.09E- KMK 83 7 8 06 ENSMLUG00000000881 SI down KWT- rescue -1.2539435 0.014540 -0.7751012 0.3384778 1.0666402 0.00347 KMD 12 7 3 681 ENSMLUG00000009166 CAND1 down all rescue -1.2467861 9.73E-05 0.55657044 7.5567E- 3.2202934 0 10 7 ENSMLUG00000002634 FZD1 down all rescue -1.245585 1.10E-07 0.38340226 0.1918017 0.5886535 0.00018 1 6 074 ENSMLUG00000013562 ANO4 down KWT- rescue -1.245406 0.001050 0.02314569 0.9723385 1.7846210 1.85E- KMD 66 9 3 18 ENSMLUG00000001372 ST5 down all rescue -1.2453187 8.28E-10 -0.0559969 0.7782849 0.1974733 0.00346 1 5 823 ENSMLUG00000008201 PAK6 down KWT- rescue -1.2432973 0.003175 0.15079685 0.8426118 1.2985803 9.06E- KMD 72 2 2 06 ENSMLUG00000009938 ATP2A1 down all rescue -1.2298576 3.00E-05 0.41134162 0.2460695 0.8625749 3.44E- 7 4 05 ENSMLUG00000011156 PRKAG1 down all rescue -1.2147236 2.16E-12 0.27400005 0.3586463 1.0634639 5.51E- 6 12 ENSMLUG00000015104 SLC24A1 down KWT rescue -1.2142506 0.021444 0.36332541 0.7223947 1.3812344 0.00921 41 1 3 405 ENSMLUG00000001777 KITLG down KWT rescue -1.1939611 0.000740 -0.2352807 0.3907004 1.5805578 3.06E- 58 7 6 34 ENSMLUG00000005119 GPD1 down KWT- rescue -1.1937883 0.003100 -0.0176423 0.9844612 1.0828340 8.89E- KMD 48 9 1 05 ENSMLUG00000003968 ENSMLUG000 down KWT rescue -1.1796936 0.015157 0.57116713 0.1046429 1.0911686 1.24E- 00003968 3 3 9 06 ENSMLUG00000012225 TIMM8B down 
KWT rescue -1.1726417 0.029437 0.19335436 0.8330511 0.7888784 0.03167 84 7 5 058 ENSMLUG00000010487 CCDC39 down KWT rescue -1.1642775 0.014067 0.05706497 0.9614332 1.1707848 0.00275 11 9 5 58 ENSMLUG00000002665 PTEN down all rescue -1.1601734 5.36E-06 -0.2873131 0.0502126 0.9110639 3.04E- 7 5 33


ENSMLUG00000016236 PTCH1 down all rescue -1.1596916 4.50E-13 0.07390123 0.7389850 0.8593062 4.81E- 4 6 27 ENSMLUG00000009959 DAZAP2 down KWT rescue -1.1527075 0.001428 0.244708 0.6597282 0.7422827 0.00774 9 5 3 288 ENSMLUG00000013893 SLC11A2 down all rescue -1.1467451 1.12E-11 -0.0351215 0.9310445 0.3948573 0.00073 9 2 005 ENSMLUG00000008324 HIGD1C down KWT- rescue -1.1293125 0.004321 -0.5945953 0.1664025 0.8223546 0.00059 KMD 68 1 9 037 ENSMLUG00000001759 FGFR1 down KWT- rescue -1.1285111 0.002898 0.09871349 0.7311601 1.2319445 1.44E- KMD 56 3 28 ENSMLUG00000009226 SYNPO2 down KWT- rescue -1.1214856 0.002943 0.11149566 0.4963822 0.3498912 4.80E- KMD 7 6 08 ENSMLUG00000012834 FAM131A down KWT- rescue -1.1187577 0.001165 0.08028565 0.9211070 1.2093565 1.38E- KMD 59 2 6 05 ENSMLUG00000011274 NFATC4 down all rescue -1.1148628 0.000165 -0.227609 0.5463254 0.3783452 0.02776 63 8 4 295 ENSMLUG00000002342 SMARCA2 down KWT rescue -1.1108753 0.001157 -0.2890244 0.2697364 2.3992552 5.40E- 83 5 64 ENSMLUG00000002685 CAT down all rescue -1.1101765 5.68E-08 -0.1491875 0.7216153 0.3395976 0.02947 6 8 639 ENSMLUG00000000894 CPEB2 down KWT- rescue -1.1047778 0.000477 -0.3759169 0.0855650 0.6765218 1.53E- KMK 35 6 2 10 ENSMLUG00000025423 ENSMLUG000 down KWT rescue -1.1047482 0.007181 0.33322876 0.4361518 0.7780620 0.00107 00025423 03 6 6 081 ENSMLUG00000014413 PTPRG down all rescue -1.100702 1.16E-05 0.3660649 0.0085833 0.7219678 5.08E- 1 1 16 ENSMLUG00000005400 PCDH19 down KWT- rescue -1.092208 0.002653 -0.1672559 0.6402371 0.8303382 4.40E- KMK 54 4 4 08 ENSMLUG00000011179 KMT2D down all rescue -1.0891147 2.15E-10 0.21367978 0.0955073 0.7918597 2.85E- 9 1 35 ENSMLUG00000005389 HMGN3 down KWT- rescue -1.0851904 6.85E-05 0.05417634 0.8710326 0.9710385 1.56E- KMK 6 9 22 ENSMLUG00000008353 DHH down all rescue -1.0697576 0.002415 0.49165863 0.0435104 1.2278397 1.24E- 13 1 2 16 ENSMLUG00000002372 ITGB5 down all rescue -1.0688328 1.36E-08 -0.1025327 0.4444298 0.2579698 8.92E- 3 06 
ENSMLUG00000000419 NEMP1 down all rescue -1.0613011 0.000539 0.16469474 0.5228011 2.6611477 1.73E- 85 8 1 85 ENSMLUG00000023181 ATF1 down KWT rescue -1.0504896 0.001650 0.14280109 0.7003675 0.6710141 7.93E- 2 5 08 ENSMLUG00000011084 FKBP11 down KWT- rescue -1.0352098 0.043929 -0.6062178 0.3668422 1.5184467 3.32E- KMD 22 8 06


ENSMLUG00000016930 SPATS2 down KWT-KMK rescue -1.0221477 2.06E-06 0.06193156 0.8903603 0.6497514 2.66E- 2 05
ENSMLUG00000004029 STEAP3 down all rescue -1.0195907 0.015281 0.01374488 0.9874657 1.5655313 7.38E- 84 1 1 08
ENSMLUG00000025293 CREG1 down KWT-KMK rescue -0.9985709 0.032514 0.00853007 0.9965875 1.1022464 0.02364 36 5 8 694
ENSMLUG00000008735 HS3ST3B1 down KWT rescue -0.9733306 0.001018 0.27786472 0.4072862 2.7316445 1.69E- 28 6 9 32
ENSMLUG00000010299 UBXN11 down KWT-KMK rescue -0.9704988 0.010334 -0.3814672 0.4985722 0.7900780 0.00668 7 3 7 81
ENSMLUG00000004487 ACVR1B down all rescue -0.9668256 5.79E-09 0.08689966 0.7796477 0.6720416 9.15E- 6 2 09
ENSMLUG00000004000 MMP2 down KWT-KMK rescue -0.9572676 0.032467 -0.2209206 0.5630223 3.9360359 4.62E- 32 2 5 49
ENSMLUG00000010943 ADCY6 down KWT-KMD rescue -0.9556916 0.000256 0.1767038 0.4301066 0.3464165 0.00760 11 8 5 8
ENSMLUG00000008230 MEF2C down KWT rescue -0.9479006 0.006438 0.0662151 0.8691398 0.4007615 0.00058 54 8 702
ENSMLUG00000005998,ENSMLUG00000030164 MANBA down KWT-KMD rescue -0.9468348 0.019899 0.22304162 0.7739196 1.8242395 8.39E- 76 5 6 08
ENSMLUG00000002144 CAB39L down KWT rescue -0.9443632 0.000535 -0.1463568 0.6706356 0.3710312 0.01729 38 8 4 913
ENSMLUG00000008272 RNF24 down KWT-KMK rescue -0.9360862 0.017744 -0.1457582 0.6217931 0.4396048 0.00016 85 2 2 762
ENSMLUG00000009119 ATAD3A down KWT rescue -0.9212909 0.007707 0.37129263 0.2363968 0.5148220 0.02007 54 1 6 798
ENSMLUG00000010679 CDKN1A down KWT-KMK rescue -0.9185422 0.033599 -0.3396583 0.3779878 6.3508988 2.50E- 69 9 2 29
ENSMLUG00000012369 TOMM20L down all rescue -0.9110876 0.008181 0.04439987 0.9570077 1.2728769 2.36E- 17 2 9 06
ENSMLUG00000013915 CSRNP2 down all rescue -0.9027125 0.000134 -0.0086695 0.9848328 0.8333497 1.26E- 03 4 09
ENSMLUG00000005360,ENSMLUG00000019728 DIP2B down KWT-KMK rescue -0.8964116 0.012137 -0.2188554 0.0627610 0.9322823 8.77E- 07 4 3 45
ENSMLUG00000003428 UBE3A down KWT-KMD rescue -0.8957239 3.67E-08 0.06793354 0.7999630 0.2429655 0.04469 8 6 645
ENSMLUG00000000491 SMUG1 down KWT rescue -0.8881086 0.048753 0.13615697 0.8413591 0.7994827 0.00387 68 9 822
ENSMLUG00000012736,ENSMLUG00000027251 MAST1 down KWT rescue -0.8689181 0.043464 0.18410001 0.5375016 1.0511318 7.34E- 98 3 14
ENSMLUG00000002622 TUBA1B down all rescue -0.8571185 0.001955 0.45183669 0.0073128 1.4439044 1.20E- 8 1 43
ENSMLUG00000016776,ENSMLUG00000023038 PRIM1 down all rescue -0.8555548 0.002490 -0.079823 0.8095034 2.7647221 4.59E- 55 3 2 98
ENSMLUG00000015971,ENSMLUG00000018003,ENSMLUG00000019832 COPZ1 down all rescue -0.8446651 0.000520 -0.294216 0.1569020 0.7308135 4.39E- 59 3 10
ENSMLUG00000001307 DISP1 down KWT rescue -0.8425093 0.048753 -0.0692889 0.8341493 0.7443785 4.79E- 68 2 7 16
ENSMLUG00000005238 NR2F1 down KWT-KMD rescue -0.8409075 0.000326 -0.2211604 0.1955147 0.5119461 2.30E- 28 5 1 08
ENSMLUG00000016947 MCRS1 down KWT-KMD rescue -0.8375565 0.014392 0.23362307 0.3996451 0.9561302 1.37E- 96 2 9 11
ENSMLUG00000027488 ENSMLUG00000027488 down KWT-KMD rescue -0.8271945 0.001263 0.18918149 0.1648796 0.6968379 1.89E- 61 6 7 29
ENSMLUG00000011744 GTF2A1L down KWT rescue -0.8195667 0.002560 0.32988863 0.3449400 0.7101504 0.00051 92 1 889
ENSMLUG00000001478 SH3D19 down KWT rescue -0.8046478 0.024517 -0.1125226 0.8299723 1.0369251 8.86E- 37 7 07
ENSMLUG00000004454 EPS15 down KWT rescue -0.8015142 2.29E-06 0.00756226 0.9823655 0.3155352 0.00137 5 3 175
ENSMLUG00000000967 IRS1 down KWT-KMK rescue -0.7960939 0.006393 0.08037991 0.8557507 0.6448539 8.32E- 49 4 9 05
ENSMLUG00000004606 UGDH down KWT rescue -0.7875115 0.008511 -0.1561634 0.5395794 0.468838 4.28E- 47 3 05
ENSMLUG00000002536 KLF12 down KWT rescue -0.7773029 0.006564 -0.1545285 0.2737822 0.2322215 0.00357 54 8 1 363
ENSMLUG00000013907 LETMD1 down KWT-KMD rescue -0.7722114 0.026057 -0.1348652 0.8084207 0.8311628 2.01E- 78 8 2 05
ENSMLUG00000009454 MOB3B down KWT-KMD rescue -0.7622893 0.030888 -0.177695 0.5239959 3.4690946 3.28E- 27 1 2 80
ENSMLUG00000004938 FAM3D down KWT-KMD rescue -0.7595705 0.012642 0.19966183 0.3213699 0.3388791 0.00173 62 1 2 129
ENSMLUG00000005013 PPP1R15A down all rescue -0.7444441 0.021537 0.42871805 0.1675749 0.4536834 0.03257 63 3 8 554
ENSMLUG00000013160 CNN2 down KWT-KMD rescue -0.7439875 0.002236 0.1600448 0.3630021 0.4816295 1.35E- 73 1 10
ENSMLUG00000002339 LOC102430380 down KWT-KMD rescue -0.7265472 0.034546 -0.0854167 0.6694401 3.3892792 1.01E- 62 7 3 158
ENSMLUG00000012794 NFIX down all rescue -0.7244334 2.09E-05 0.16273471 0.2419621 0.1307563 0.03897 4 4 263
ENSMLUG00000007218 SLC30A10 down KWT rescue -0.7241232 0.010133 -0.0285123 0.9734165 0.6548228 0.03613 11 8 7 073
ENSMLUG00000011013 SPOP down KWT-KMD rescue -0.7212004 0.008546 -0.2021349 0.4985769 0.3672145 0.01507 08 2 4 285
ENSMLUG00000010266 BAX down all rescue -0.7209952 0.010579 0.28110453 0.3396355 0.9582935 3.75E- 26 9 8 09
ENSMLUG00000005927 CYP27B1 down KWT-KMD rescue -0.7197124 0.000361 0.4309931 0.0003065 2.8627185 6.61E- 53 7 6 111
ENSMLUG00000000648 HNRNPA1 down all rescue -0.7164412 3.75E-05 -0.0655732 0.8496302 0.8816334 6.98E- 7 6 18
ENSMLUG00000007373 TBC1D30 down KWT-KMD rescue -0.711886 0.049667 -0.5037219 0.1631538 1.0473095 4.40E- 53 4 8 08
ENSMLUG00000007336 LAMC1 down KWT-KMK rescue -0.707951 6.09E-10 -0.1815095 0.0878604 0.9804808 6.74E- 9 69
ENSMLUG00000016244 DNAJC11 down KWT-KMD rescue -0.7015184 0.000697 0.45563551 0.0020569 0.4063699 0.00391 89 5 5 997
ENSMLUG00000001602 MICU3 down KWT rescue -0.6877603 0.049758 -0.0624322 0.8860186 0.5798430 0.00107 26 9 6 435
ENSMLUG00000013918 TFCP2 down KWT rescue -0.6828072 0.003316 -0.0319522 0.9183771 0.7514734 2.45E- 63 3 5 14
ENSMLUG00000000900 HDAC7 down all rescue -0.6725416 9.53E-06 0.18645732 0.2737049 0.3387865 0.00034 6 4 143
ENSMLUG00000011642 STRN3 down KWT-KMK rescue -0.6705868 1.81E-06 -0.1341471 0.3569933 0.2376443 0.00145 5 8 529
ENSMLUG00000008307 NEK7 down KWT rescue -0.6703168 0.004497 -0.0747394 0.8578888 0.3671895 0.02270 14 1 5 269
ENSMLUG00000009971 PCBP2 down KWT-KMK rescue -0.6607625 8.77E-06 -0.0453925 0.8134984 0.7761460 3.63E- 3 2 29
ENSMLUG00000003456 MATN3 down KWT-KMD rescue -0.6474616 0.031852 0.39726159 0.4146064 0.7113849 0.02763 32 6 6 182
ENSMLUG00000011937 ARHGAP31 down KWT-KMD rescue -0.6432645 0.010669 0.0397052 0.9093629 0.5951283 5.43E- 34 4 5 07
ENSMLUG00000006798 RECK down KWT-KMD rescue -0.6412055 0.046937 -0.1184371 0.6153958 0.5236446 1.13E- 72 6 07
ENSMLUG00000007486 SAMD8 down KWT rescue -0.6339446 0.049621 -0.1106109 0.8330511 0.5327288 0.01441 29 7 9 942
ENSMLUG00000012513,ENSMLUG00000021489 LARP4 down KWT rescue -0.6321048 0.000156 -0.0994965 0.6897071 0.9542276 1.86E- 65 9 2 29
ENSMLUG00000011537 MSRB3 down KWT-KMD rescue -0.6273869 0.041107 0.22530679 0.0828554 0.9695385 4.98E- 21 2 3 119
ENSMLUG00000006969 PRDM2 down KWT-KMD rescue -0.6206404 5.08E-10 0.12433349 0.2525064 0.2220061 0.00062 4 1 361
ENSMLUG00000014957 EPHA2 down KWT-KMD rescue -0.6150007 0.000224 0.13976767 0.2344904 0.4364500 6.93E- 71 9 14
ENSMLUG00000015446 PPP3CA down KWT rescue -0.6136596 0.002230 -0.0518075 0.8381655 0.8537305 3.23E- 02 3 25
ENSMLUG00000006205,ENSMLUG00000019450 TCF12 down KWT rescue -0.6136573 7.12E-06 -0.2924668 0.0561200 0.4207386 8.97E- 2 9 06
ENSMLUG00000012354,ENSMLUG00000020090 ARID4A down KWT-KMD rescue -0.6040689 0.009746 0.00399962 0.9886501 0.8746082 2.40E- 24 7 3 29
ENSMLUG00000005657 RSF1 down KWT rescue -0.5981956 0.002416 -0.089617 0.5779665 0.2595986 0.00100 04 6 407
ENSMLUG00000003067 FAM172A down KWT rescue -0.587994 0.005409 -0.0905933 0.5949075 0.2074786 0.02138 67 9 8 279
ENSMLUG00000017450 UVRAG down KWT rescue -0.5878097 6.79E-07 -0.0422679 0.8006440 0.1660979 0.03668 5 4 135
ENSMLUG00000001165 CLIC4 down KWT rescue -0.5698138 0.001446 -0.2421868 0.2720850 1.2613840 5.42E- 12 3 4 37
ENSMLUG00000012180 RRBP1 down KWT-KMD rescue -0.5669788 0.028801 0.04770318 0.7552574 0.1538093 0.01092 29 9 2 936
ENSMLUG00000017769 DENND5A down KWT rescue -0.5628857 6.00E-05 -0.0780543 0.6933166 0.3189938 0.00026 7 536
ENSMLUG00000005110 SMARCD1 down KWT-KMD rescue -0.5534764 0.004304 -0.1623433 0.5319547 0.8189401 8.93E- 38 8 09
ENSMLUG00000005663 ADAM10 down KWT-KMD rescue -0.5485495 0.002548 -0.0795852 0.6504721 0.7243441 3.23E- 52 9 7 30
ENSMLUG00000017139 GNL2 down KWT rescue -0.5458035 0.008806 -0.0784833 0.8324867 0.4414766 0.00430 43 4 869
ENSMLUG00000017066 PHACTR4 down KWT rescue -0.541955 0.012123 0.20329664 0.3530832 0.4065448 0.00442 3 6 3 532
ENSMLUG00000014552 ANO5 down KWT-KMK rescue -0.5404222 0.030888 -0.1253682 0.7676325 1.1632160 6.72E- 27 8 9 09
ENSMLUG00000004146 ZFAT down KWT-KMD rescue -0.5297382 0.018564 0.10093197 0.7997910 0.7566435 2.44E- 48 9 7 05
ENSMLUG00000015559 ATXN2 down KWT rescue -0.5151546 0.004425 -0.092848 0.5322677 0.3455931 1.62E- 86 7 07
ENSMLUG00000005140 LIMA1 down KWT-KMD rescue -0.4988912 0.007446 0.40334884 0.0004781 1.4883854 1.74E- 36 3 120
ENSMLUG00000015248 IPPK down KWT rescue -0.4979008 0.035996 0.13945593 0.3203477 0.5591439 4.20E- 51 9 10
ENSMLUG00000007947 PLXNA1 down all rescue -0.4925005 0.003094 0.12414161 0.4681528 0.3458868 3.06E- 74 6 05
ENSMLUG00000006672 BAZ1A down KWT rescue -0.4792331 0.023773 -0.1341723 0.4526069 0.2331030 0.03125 62 8 3 353
ENSMLUG00000000652,ENSMLUG00000021632 CBX5 down KWT-KMK rescue -0.4757875 0.003329 0.09910754 0.5142349 0.6707798 1.79E- 04 6 2 18
ENSMLUG00000005729 PCCA down KWT-KMK rescue -0.4750852 0.004689 -0.1172455 0.4406396 0.2103380 0.02335 1 9 1 728
ENSMLUG00000029976 CCDC88C down all rescue -0.4734592 0.009547 0.17724807 0.4897303 0.5192025 0.00098 6 187
ENSMLUG00000014730 FASTK down all rescue -0.4553339 0.045408 0.51265472 0.0004229 0.3063072 0.02608 41 6 6 762
ENSMLUG00000006542 SNTB2 down KWT rescue -0.4379206 0.014592 -0.1712258 0.3805372 0.4185965 9.11E- 82 2 06
ENSMLUG00000012761,ENSMLUG00000018702,ENSMLUG00000027250,ENSMLUG00000029219 KANSL2 down KWT-KMD rescue -0.4205104 0.049262 -0.0284275 0.9326724 0.9875499 3.96E- 25 9 5 22
ENSMLUG00000004456 CHD9 down KWT-KMK rescue -0.3881327 0.042295 -0.1611427 0.1884518 0.1950694 0.01324 71 3 3 293