
Characterization of TOPHAT, a novel domesticated transposable element gene

Danny K. Leung
Department of Biology
McGill University, Montreal
February 2015

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Master of Science.

© Danny K. Leung, 2015


Table of Contents

Abstract

Résumé

Acknowledgments

Statement of contributions

Introduction

Introduction to transposons

Types of transposons

Influence of transposable elements on genome evolution

Influence on gene expression

Transduplication and transduction

Domestication

hAT transposable elements

Characterization of TOPHAT

Initial characterization of TOPHAT

Bioinformatic characterization

Initial phenotyping

Additional phenotyping

Root length assay

Germination rate

Transcriptomic analyses using an RNA- approach in

Synergistic model of gene regulation

Transcriptional differences common to TOPHAT knockout mutant and overexpression genotypes

Transcriptional differences in the TOPHAT knockout mutant genotype

Transcriptional differences in the TOPHAT overexpression genotype

Biotic stress response phenotype

Bioinformatic approaches to the analysis of TOPHAT

Conservation of TOPHAT in plant genomes

Phylogenetic analyses of TOPHAT and other SLEEPER-related genes

Conclusions

Methods

Cloning

Transformation

Salt assay

Phenotyping

Root length assay

Germination rate assay

RNA-SEQ pipeline

Growth and sample preparation

Bioinformatics

Bacterial infection assay

Phylogeny

Works cited

List of figures

List of tables


Abstract

With the advent of high-throughput sequencing, it has been revealed that protein-coding genes constitute only a small portion of the genome. A large segment of the genome instead consists of transposons, or transposable elements (TEs), which have classically been regarded as “selfish” or “junk” DNA. Deciphering the non-coding regions has become a focus of research more recently, and TEs have been shown to be a contributing factor in genome evolution. The activity of TEs can be induced by environmental and population factors in various organisms. As part of the VEGI project, abiotic stress screens were performed on a curated set of T-DNA insertional mutagenesis lines to identify domesticated transposable element (DTE) genes with putative functions. In these screens, one uncharacterized DTE, which we have named TOPHAT, showed significant phenotypes in multiple abiotic tolerance screens (salt, nitrogen use efficiency, freezing). TOPHAT overexpression lines were created in a wild-type background to further characterize this DTE gene. In a parallel RNA-sequencing experiment, expression analyses confirmed the osmotic stress phenotype in the knockout line and identified a large group of pathogen defence genes that are expressed constitutively in the overexpression genotype. Proteins encoded by these genes play critical roles in pathogen recognition and the activation of defence responses. We propose that TOPHAT is a DTE that causes changes in the expression of a host of genes through synergistic co-regulation with other genes in its gene family. Among these are genes associated with abiotic and biotic resistance and response. TOPHAT and the closely related SLEEPER genes may be acting collectively to repress a suite of stress-related genes so that their expression remains transient. We hope that our study characterizing this novel DTE can further our knowledge of the importance of DTEs in genome evolution.


Résumé

With the advent of high-throughput sequencing, it has been revealed that protein-coding genes constitute only a minority of the genome. Generally viewed as “selfish” or “junk” DNA, transposable elements (TEs) account for a large portion of the genome. Research on deciphering the non-coding regions is a more recent field of study. TEs have been shown to contribute to genome evolution. TE activity can be triggered by population and environmental factors in various organisms. As part of the VEGI project, abiotic stress screens were carried out on a carefully selected set of T-DNA insertional mutagenesis lines in order to identify domesticated transposable element (DTE) genes with putative functions. In these screens, an uncharacterized DTE that we named TOPHAT showed phenotypes in several abiotic tolerance tests (salt, nitrogen use efficiency, freezing). TOPHAT overexpression lines were created in a wild-type plant in order to characterize this DTE gene. In a parallel RNA-sequencing experiment, expression analysis confirms the osmotic stress phenotype in the knockout line. The analysis also reveals a large group of pathogen defence genes that are expressed constitutively in the overexpression genotype. The proteins encoded by these genes play a critical role in pathogen identification and the activation of defence responses. We propose that TOPHAT is a DTE that causes changes in the expression of a variety of genes through synergistic co-regulation with other genes of the same family. Among these genes, several are associated with abiotic and biotic resistance and response. TOPHAT and its related SLEEPER genes may act collectively to repress a suite of stress-related genes in order to maintain transient expression. I hope that my study characterizing this new DTE can deepen our knowledge of the importance of DTEs in the evolution of genes.


Acknowledgments

I sincerely appreciate the help and support provided by those around me. Science requires the collaboration and resourcefulness of others who are willing to be generous with their time. A big thanks to Akiko Tomita, Zoé Joly-Lopez, Douglas Hoen, Ewa Forczek, and Emilio Vello for their help and teamwork throughout the project. I have learned a lot about research and writing and have become acquainted with many important techniques throughout this degree. I would like to thank Dr. Thomas Bureau for his supervision and valuable advice and critique, as well as those given by committee members Dr. Joseph Dent and Dr. Ehab Abouheif at the McGill biology department. I especially appreciated the independence afforded to me to explore the project and determine its path. A last word of thanks to the friends and family who have supported me during this time. My funding was in part provided by the Natural Sciences and Engineering Research Council of Canada and Genome Québec/Canada through the VEGI project headed by Dr. Thomas Bureau.

Statement of contributions

All experiments in this thesis were developed and performed by Danny Leung except for the following. The salt tolerance and freezing tolerance assays from which the TOPHAT mutant was discovered were designed and performed by Zoé Joly-Lopez and Dr. Ewa Forczek, respectively (Figures 6 and 7). The expression visualization figures (2 and 4) were created using the AtGenExpress Visualization Tool (Schmid et al., 2005). The genome browser figures were visualized using a version of the UCSC genome browser adapted for Brassicaceae developed by Haudry et al. (2013). Lastly, the Pseudomonas syringae pv. DC3000 colonies were kindly provided by the laboratory of Dr. David Guttman at the University of Toronto.


Introduction

With the advent of high-throughput sequencing, it has been revealed that protein-coding genes constitute only a small portion of the genome. A large portion of the genome instead consists of transposons, or transposable elements (TEs), which have classically been regarded as “selfish” or “junk” DNA. Deciphering the non-coding regions has been a more recent area of focus, and many examples have been found in which TEs have contributed to the evolution of the genome. For example, TE insertions or excisions can change gene expression patterns by shuffling transcription factor binding sites or through epigenetic processes (Feschotte, 2008). Another example is the ability of TEs to duplicate, combine, and mobilize parts of the genome to create chimeric genes or novel regulatory sequences (Hoen et al., 2006). The last example is molecular domestication, a process by which TE protein genes are co-opted and evolve to perform host functions, giving rise to domesticated transposable element (DTE) genes. To date, there are three major examples of functional DTEs in Arabidopsis thaliana. FAR1 and FHY3 are transcription factors (TFs) involved in regulating phyA light signalling (Hudson et al., 2003). DAYSLEEPER has lost its transposase function and acquired new essential functions in plant development (Bundock & Hooykaas, 2005). Our lab is characterizing the MUSTANG family of genes, which show severe developmental and reproductive defects when knocked out (Joly-Lopez et al., 2012). My research focuses on transposable element domestication, with the aim of characterizing a novel DTE discovered through high-throughput functional screens. In this thesis, TOPHAT was characterized using a number of approaches. Functional characterization methods were used to further examine the initial phenotypes found through the abiotic stress tolerance screens. Expression analysis was performed using a transcriptomic approach to identify underlying pathways and gene networks. Lastly, computational methods were used to probe the evolutionary background of the gene. This research will reveal ways in which transposable elements can contribute to coding capacity. In particular, it will help to clarify the evolutionary impact of domestication and to establish the importance of DTEs and their derivatives as host genes.


Introduction to transposons

In the 1940s, Barbara McClintock first discovered transposable elements (TEs) while conducting her ground-breaking work on cytogenetics in maize (McClintock, 1950). She noticed changes in the genome, including insertions, deletions, and translocations, caused by TEs, and recognized their potential to play an important role in genome evolution. TEs are genetic elements that, as suggested by their name, transpose and persist in the genome, often multiplying to great numbers. Their abundance accounts for much of the difference in genome size among species despite the small variance in gene number. Current estimates of TE abundance include 50% of the human genome (Kent et al., 2002), 85% of the maize genome (Sanmiguel & Bennetzen, 1998), and 15-24% of the Arabidopsis thaliana genome (The Arabidopsis Genome Initiative, 2000; Hu et al., 2011; de la Chaux et al., 2012). TEs are frequently described as “selfish” because they can persist without necessarily producing beneficial phenotypes for the host. This is called self-replicative selection, in which TEs persevere by relying on high transposition rates to counteract the accumulation of mutations. However, too much TE activity can disrupt existing genes and jeopardize the fitness of the host, and therefore there is negative selection pressure on the activity of TEs. Mechanisms have evolved for the host genome to recognize and silence TEs. This can be achieved through post-transcriptional gene silencing, in which siRNAs target TE mRNAs for degradation (Sijen & Plasterk, 2003), and through RNA-directed DNA methylation, which triggers histone modifications and prevents transcription and transposition from occurring (Aravin et al., 2007; Castel & Martienssen, 2013). These silencing signatures are key markers for distinguishing transposable elements from domesticated transposable elements (DTEs), that is, elements which have been co-opted and provide a beneficial phenotype for the host. TEs were originally dismissed and ignored as genetic parasites with no beneficial function to the genome (Doolittle & Sapienza, 1980). McClintock’s original hypothesis is now back in the spotlight, with TEs being viewed as a possible key driver behind genome evolution (Feschotte, 2008; Rebollo et al., 2010) and a source of genetic material to be exapted to confer functional advantages in the host (Hudson et al., 2003; Bundock & Hooykaas, 2005; Cowan et al., 2005; Kapitonov & Jurka, 2005; Joly-Lopez et al., 2012).


Types of transposons

At present, transposable elements are categorized into two classes based on their method of transposition. Class I transposons, or retrotransposons, proliferate via reverse transcription: they are transcribed into an RNA intermediate which, after reverse transcription, is reinserted into the host genome. This class can be further separated into those with long terminal repeats (LTR retrotransposons) and those without (non-LTR retrotransposons). Like retroviruses, LTR retrotransposons carry genes encoding their own reverse transcriptase enzymes to facilitate reintegration into the genome. Non-LTR retrotransposons, as their name suggests, do not contain the terminal repeats, and can be categorized into two subtypes. Long interspersed elements (LINEs) are found in large numbers across eukaryotic genomes, comprising around 17% of the human genome (Lander et al., 2001), and usually contain two open reading frames (ORFs), which encode an RNA-binding protein and an enzyme with endonuclease and reverse transcriptase activities. Short interspersed elements (SINEs) are short non-LTR sequences that do not encode functional proteins and rely on trans-acting factors from other mobile elements for transposition. Retrotransposons account for the greatest share of genome coverage because of their length and their ability to “copy and paste” throughout the genome using the cDNA intermediate.

Class II transposons, also known as “cut and paste” transposable elements or DNA transposons, encode a protein that recognizes terminal inverted repeat (TIR) sites at the 5’ and 3’ ends of these elements, allowing them to excise and reinsert into the genome. These transposons share similarities in domain architecture, usually containing an N-terminal DNA-binding domain that recognizes and binds to specific motifs such as TIRs and a C-terminal catalytic domain that facilitates the DNA cleavage and reinsertion events (Feschotte & Pritham, 2007). Some DNA transposons, such as the MUSTANG-B subfamily of transposable elements described by the Bureau lab (Cowan et al., 2005), contain extra domains, likely gained through transduplication events (described below). A final category of DNA transposons has been reported to replicate by a “rolling-circle” process, in which a single strand is pasted into the target site and filled in by replication (Kapitonov & Jurka, 2001).
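The terminal inverted repeats that mark the ends of a Class II element can be illustrated with a small sketch. It assumes perfect, exact-match TIRs of a known length, which is a simplification (real TIRs are often imperfect); the function names and the toy sequence are invented for illustration and do not come from any real element.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def has_terminal_inverted_repeats(element: str, tir_length: int) -> bool:
    """True if the first tir_length bases equal the reverse complement
    of the last tir_length bases (a perfect TIR pair)."""
    if tir_length <= 0 or len(element) < 2 * tir_length:
        return False
    element = element.upper()
    left = element[:tir_length]
    right = element[-tir_length:]
    return left == reverse_complement(right)


# Toy element (invented): 8-bp TIRs flanking an arbitrary internal region.
toy_element = "CAGGGTCG" + "ATATATGCGC" + "CGACCCTG"
print(has_terminal_inverted_repeats(toy_element, 8))
```

A search tool for real elements would need to tolerate mismatches and unknown TIR lengths; the exact comparison here only shows what "inverted" means at the sequence level.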


Lastly, both classes of TEs can be further classified as either “autonomous” or “non-autonomous” according to self-sufficiency. Non-autonomous DNA transposons still have functional terminal inverted repeats (TIRs) that allow for transposition, but they lack the coding capacity for the mechanisms needed to move by themselves and therefore require trans-acting factors from autonomous TEs in order to mobilize. Often, this is because mutations have disrupted the production of a functional transposase, or because the TE is derived from a pair of conserved TIRs sandwiching genetic material of non-transposon origin. In the case of non-autonomous retrotransposons, the reverse transcriptase is disrupted or no longer expressed.

Influence of transposable elements on genome evolution

Influence on gene expression

Although TEs are able to replicate independently from their hosts, they inescapably participate in a dynamic relationship with the host genome. Through its mobility, a TE has the potential to influence the genome in many ways. Most evidently, TEs can insert directly into the coding sequence of a gene, truncating the gene sequence or adding new coding sequences, domains, or splicing variants. Even when inserted into intergenic regions, TEs are able to influence the expression of nearby genes (Britten, 1996). The inserted TE, as a mobile piece of genomic material, may contain sequences such as alternative start sites or binding sites for regulatory elements. The additional genetic material can also act as a spacer that separates existing cis-elements from the nearby genes whose expression they regulate. Since TEs are also targets of siRNA mechanisms that silence their expression, the epigenetic state around the regions of insertion can also be incidentally altered (Sijen & Plasterk, 2003).


Transduplication and transduction

Transduplication and transduction are the processes by which TEs capture or duplicate genes and gene fragments of non-transposon origin. DNA transposons can duplicate and mobilize parts of the genome in a process called transduplication. Genomic material is shuffled around the genome, potentially resulting in new regulatory sequences or chimeric genes through the creation of new exons (Juretic et al., 2005). All examples in Oryza sativa that have been studied appear to be pseudogenes, that is, transcribed transduplicates with truncations or missing characterized domains that preclude the formation of a functional protein (Hoen et al., 2006). Nonetheless, transcribed pseudogenes may function in the regulation of gene expression, much as non-coding RNAs such as microRNAs have regulatory roles. In the case of retrotransposons, transduction is an equivalent process which involves transcription from a retroviral promoter and a subsequent read-through into host gene sequences. The chimeric transcript undergoes reverse transcription and the host sequences are incorporated into the transposon.

Domestication

The last example, and an area of focus of the Bureau lab, is domestication, a process by which TE protein genes are co-opted and evolve to perform host functions. This process is predicted to occur when mutations create a host-beneficial phenotype and immobilize the transposable element, allowing selection to fix the mutations in the population (Miller et al., 1999). In recent years, many examples of domesticated transposable element (DTE) genes have been described in detail in organisms ranging from vertebrates to plants. Many of these genes fulfill roles in the nucleus, such as acting as putative transcription factors. As TEs already typically contain a DNA-binding transposase domain and a nuclear localization sequence, their domestication into nuclear transcription factors is a logical outcome. In addition, these DNA-binding domains recognize motifs within TEs, and since TEs are duplicated and spread across the genome, DNA transposons have an innate predisposition to be exapted as de novo regulatory networks. Of the eukaryotic DTE genes derived from DNA transposons, about half are annotated as putative transcription factors (Feschotte & Pritham, 2007; Sinzelle et al., 2009). A hallmark of transposon-derived genes with known host functions is that they usually lack the transposon-specific terminal sequences. This is likely because the transposition of a gene with host functions could dramatically change its expression and reduce host fitness, so selection should act to inhibit its mobility.

In vertebrates, it is believed that TEs may have been co-opted by the immune system to produce a diverse array of antibodies. To respond to a virtually unlimited number of antigens expressed by bacteria and viruses, a large repertoire of antibodies must be synthesized from a limited number of genes. A combinatorial solution to encode surface receptors, along with somatic mutations and insertions and deletions at segment junctions, allows vertebrates to establish the necessary antibody diversity (Tonegawa, 1983). The V(D)J recombination system used to create antibody diversity is catalyzed by the RAG1 protein, which has a catalytic core region similar to transposases encoded by Transib DNA transposons (Kapitonov & Jurka, 2005). It is proposed that the RAG1 protein is a chimeric domestication event, with its catalytic core derived from the Transib transposase and the N-terminal domain assembled from separate proteins.

In plants, at least three DTE families have been described, with various functions all pertaining to gene regulation. FAR1 and FHY3 were identified in Arabidopsis thaliana as transcription factors (TFs) involved in regulating phyA light signalling (Hudson et al., 2003). The proteins encoded by these genes are similar to Mutator transposases, but the genes lack TE terminal sequences. The FAR1 protein is sufficient to activate transcription in A. thaliana and may be a type of transcriptional regulator. Although FAR1 was isolated in a genetic screen for far-red light response, the gene may operate on a pleiotropic level, as it appears to affect other genes that are not responsive to light. However far-reaching its roles may be, it is proposed that the mechanism by which FAR1 and FHY3 control the expression of target genes evolved directly from the ability of the transposase domain to bind to the TIRs of TEs. In other words, the DNA-binding mechanisms ancestrally used to recognize transposition cleavage sites have been exapted to recognize and bind sequences and thereby regulate transcription.

DAYSLEEPER is a hAT-like DTE that has lost its transposase function and acquired new essential roles in plant development (Bundock & Hooykaas, 2005). It has the same conserved domain architecture as hAT TEs but lacks features required for active transposition. The hAT-like dimerization domain has been demonstrated to be active and is responsible for the homodimerization of DAYSLEEPER (Knip et al., 2013). Arabidopsis thaliana plants lacking or strongly overexpressing DAYSLEEPER display abnormal development. The gene was isolated as a factor binding to the Kubox1 motif during a yeast one-hybrid screen, but its influence appears to be more far-reaching than initially suggested. The genes that are differentially regulated upon overexpression are involved in a range of processes, from development to pathogen defense (Bundock & Hooykaas, 2005). Hence, it is suggested that DAYSLEEPER may be a transcription factor with many roles, much like other observed DTEs.

Our lab has also characterized the MUSTANG family of genes which, when knocked out, show severe developmental and reproductive defects. Unlike the above DTE families, which were discovered by reverse genetic screens, MUSTANG was discovered through an in silico targeted search for DTEs (Cowan et al., 2005). The genes belonging to the MUSTANG gene family contain the same conserved domains that are found in MULE TEs. One subset of MUSTANG, the MUG-B subfamily, also contains the Phox/Bem1p domain (PB1), a possible example of transduplication or exon shuffling, since this domain is not usually found in transposable elements. Mutants of Arabidopsis thaliana MUSTANG genes yield severe developmental phenotypes, displaying decreased plant size, delayed flowering, abnormal development of floral organs, and markedly reduced fertility (Joly-Lopez et al., 2012). The continued characterization of DTE phenotypes provides a more complete understanding of the evolutionary impact of transposons.

hAT transposable elements

DNA transposons can be organized into families based on their ability to be activated by the same transposase. Logically, members of a family also share nucleotide similarity at their termini, where the TIRs recognized by the transposase are located, as well as amino acid sequence similarity in the transposase that recognizes those sequences. The first transposon found by Barbara McClintock, Activator, is a member of the large and diverse hAT transposon superfamily (McClintock, 1950). Along with hobo in Drosophila (Calvi et al., 1991) and Tam3 in Antirrhinum majus (Hehl et al., 1991), it is one of the founding members of this superfamily. Since then, many other transposable elements distributed across eukaryotes have been discovered to share significant similarity to Activator. In fact, the sequencing of the human genome has revealed that hAT TEs are the most abundant type of DNA transposon in humans (Lander et al., 2001). Phylogenetic analyses reveal that hAT TEs tend to cluster by kingdom and suggest that the superfamily likely predates the plant-fungi-animal separation (Rubin et al., 2001). In terms of signatures, hAT TEs share an N-terminal BED zinc finger domain, which is a suggested DNA-binding domain, and a catalytic dimerization domain close to the C-terminus (Essers et al., 2000; Zhou et al., 2004). In addition, there are characteristic invariable nucleotides in the 8-bp target site duplications (TSDs) created upon transposon reintegration. Being widely proliferated elements with existing DNA-binding and dimerization properties makes hAT TEs particularly prone to exaptation; examples of domesticated hAT TEs have been characterized in a multitude of organisms (Sinzelle et al., 2009).
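The 8-bp target site duplication mentioned above is a direct repeat: the same short host sequence appears immediately upstream and downstream of the inserted element. A minimal sketch of that check follows. The function name, the exact-match requirement, and the toy flanking sequences are assumptions made for illustration; the sketch does not test for the characteristic invariable nucleotides of hAT TSDs.

```python
from typing import Optional


def find_tsd(left_flank: str, right_flank: str, tsd_length: int = 8) -> Optional[str]:
    """Return the target site duplication if the tsd_length bases directly
    upstream and directly downstream of an insertion are identical, else None."""
    if len(left_flank) < tsd_length or len(right_flank) < tsd_length:
        return None
    upstream = left_flank[-tsd_length:].upper()
    downstream = right_flank[:tsd_length].upper()
    return upstream if upstream == downstream else None


# Toy flanks (invented): host sequence on each side of a hypothetical insertion.
tsd = "GTCTAGAC"
left_flank = "AACCGG" + tsd    # host DNA ending at the 5' edge of the element
right_flank = tsd + "TTGGCC"   # host DNA starting at the 3' edge of the element
print(find_tsd(left_flank, right_flank))
```

Because domesticated elements typically lose their terminal sequences, the absence of a detectable TSD is not surprising for a DTE and is not, on its own, evidence either way.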


Characterization of TOPHAT

TOPHAT was initially highlighted as a candidate DTE as a result of an extensive search for functional conserved “non-coding” regions in members of the Brassicaceae, including Arabidopsis thaliana. The DTE branch of this search was a systematic examination based on many lines of evidence, including expression, repetitiveness, conserved microsynteny, and methylation, to differentiate these genes from regular TE genes (Hoen & Bureau, in press). The search detected all 29 known A. thaliana DTEs, confirming its sensitivity, and predicted 38 novel putative DTEs. The activity of transposable elements (TEs) can be induced by environmental factors in various organisms (Naito et al., 2009). As a logical inference, the functions of potential DTEs may be related to stress responses in plants. Stated another way, plants with mutant DTE genotypes may show visibly altered phenotypes when challenged with environmental stresses if those DTEs are indeed functional host genes. As part of the VEGI project, abiotic stress screens were performed on a curated set of DTE T-DNA insertional mutagenesis lines to identify DTE genes with putative functions. One uncharacterized DTE, named TOPHAT due to its similarity to hAT family TEs, showed significant phenotypes in multiple abiotic tolerance screens, including salt tolerance and freezing tolerance (data shared by Zoé Joly-Lopez [salt] and Ewa Forczek [freezing]). A large-scale study was therefore performed to further characterize this gene as an example of a DTE discovered through a systematic bioinformatic screen.
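The idea of combining several lines of evidence to separate gene-like DTE candidates from ordinary TEs can be sketched as a simple filter. This is a hypothetical illustration only and not the method of Hoen & Bureau: the class, the boolean fields, the `max_copies` threshold, and both example records are invented, and a real search would weigh quantitative evidence rather than yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class TECandidate:
    """Evidence recorded for one TE-derived gene (all fields are illustrative)."""
    name: str
    is_expressed: bool           # detectable transcript, e.g. in EST or RNA-seq data
    genome_copies: int           # number of similar copies in the genome (repetitiveness)
    has_conserved_synteny: bool  # flanking genes conserved in related species
    is_methylated: bool          # carries silencing marks typical of TEs


def looks_domesticated(candidate: TECandidate, max_copies: int = 1) -> bool:
    """Flag candidates that behave like host genes rather than mobile TEs."""
    return (
        candidate.is_expressed
        and candidate.genome_copies <= max_copies
        and candidate.has_conserved_synteny
        and not candidate.is_methylated
    )


# Two invented records: one gene-like, one resembling a silenced, multi-copy TE.
candidates = [
    TECandidate("candidate_A", True, 1, True, False),
    TECandidate("candidate_B", False, 12, False, True),
]
putative_dtes = [c.name for c in candidates if looks_domesticated(c)]
print(putative_dtes)
```

Requiring every criterion at once is a deliberately strict choice for the sketch; a scoring scheme would be more forgiving of missing or noisy evidence.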

Initial characterization of TOPHAT

Bioinformatic characterization

TOPHAT differs from other hAT TEs in that, as a DTE, it displays genetic features found in ordinary protein-coding genes while sharing an evolutionary heritage and sequence similarity with hAT TEs. By contrasting TOPHAT with an “undomesticated”, or bona fide, hAT TE in A. thaliana, we can identify the attributes that, along with its stress response phenotypes, provide a convincing case that TOPHAT has been exapted. Unlike hAT TEs, TOPHAT is well expressed across different tissue types (Figure 1). EST data and RNA-sequencing data also confirm this observation. Judging by small non-coding RNA targets, TOPHAT appears to have escaped the epigenetic mechanisms that have evolved in eukaryotic cells to silence the expression of TEs and to inhibit their mobility, allowing transcription to occur. There are no small RNA targets in TOPHAT, while a bona fide TE typically contains many throughout the length of its sequence (Figure 2). Transcriptional gene silencing is linked to the ability of TEs to transpose and cause genomic volatility, and these heritable epigenetic defences are thought to be necessary for the stability of the genome (Ito & Kakutani, 2014). As DTEs are immobilized and eventually fixed in the population due to their positive effect on fitness, they no longer require such mechanisms to keep them at bay. On the contrary, the expression of functional DTEs is favoured so that they can perform their exapted roles in the host. Newly domesticated genes undergo positive selection because their novel roles contribute positively to the fitness of the host, leading to their eventual fixation in the population. In TOPHAT, we observe this in the form of conservation and synteny among diverged species. Among the Brassicaceae, amino acid sequence conservation is high, with Arabidopsis lyrata and Capsella rubella at 94.4% and 86.6% identity, respectively (Table 1). Sequence identity remains well conserved (>50% amino acid identity in all but one of the species compared) even across greater evolutionary distances. Along with this conservation, TOPHAT contains three putative domains: (1) a BED zinc-finger DNA-binding domain (PFAM:02892) near the N-terminus, commonly found in transposases; (2) a conserved domain of unknown function (DUF4413; PFAM:14372), which is thought to be a part of the RNase-H fold section of transposable element proteins; and (3) a hAT family dimerization domain (PFAM:05699) at the C-terminus (Figure 3). TOPHAT maintains microsyntenic blocks across all cruciferous species (Figure 4). In stark contrast, the hAT TE example found in A. thaliana shows an evident lack of synteny in the surrounding region, owing to its mobility and the lack of selective pressure.

Figure 1. Expression of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g02050) (B). In both panels, the x-axis shows tissue type and the y-axis shows the normalized signal intensity of the microarray. Images were visualized using the AtGenExpress Visualization Tool (Schmid et al., 2005). The individual points represent different samples of each tissue type, which are listed in Table S1 of the cited paper. A signal intensity of around 5 likely represents background noise, as this value is consistently the minimum in this microarray data set.

Figure 2. mRNA and small RNA expression of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g05020) (B). The browser tracks shown are taken from an adapted version of the UCSC Genome Browser (http://mustang.biol.mcgill.ca:8885/) (Haudry et al., 2013). The tracks represent: (1) in maroon, Arabidopsis thaliana gene models from TAIR9; (2) mRNA expression aggregated from several NCBI SRA experiments (SRR013411, SRR019183, SRR019209, SRR064165, SRR309186), with black representing the average expression calculated from the combined data of the archived mRNA sequences; and (3) known small non-coding RNAs aggregated by the Arabidopsis thaliana Small RNA Project (http://asrp.danforthcenter.org/).

Figure 3. Locations of T-DNA insertions in the context of the TOPHAT gene. One of the T-DNA insertions lies in an intron in the 5’ UTR. The remaining three are located in the middle of the CDS. The conserved PFAM domains are highlighted in green boxes: (1) BED zinc-finger, (2) domain of unknown function (DUF4413), and (3) hAT-C dimerization domain.

Figure 4. Synteny of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g05020) (B). The browser tracks shown are taken from an adapted version of the UCSC Genome Browser (http://mustang.biol.mcgill.ca:8885/) (Haudry et al., 2013). The tracks represent: (1) in maroon, Arabidopsis thaliana gene models from TAIR9; (2) multiple sequence alignments showing conservation in Brassicaceae, represented by the best orthologous chains between Arabidopsis thaliana and the respective cruciferous species; and (3) conservation in Brassicaceae broken down into three groups of non-overlapping chains for B. rapa and L. alabamica, whose genomes are large as a result of whole-genome triplication.


Organism               Identity (%)  Positive (%)  Sequence Length
Arabidopsis lyrata     94.4          96.8          676
Capsella rubella       86.6          92.3          593
Brassica rapa          74.0          83.6          641
Carica papaya          62.8          76.9          655
Cucumis sativus        55.6          74.1          674
Mimulus guttatus       50.2          69.2          656
Eucalyptus grandis     55.8          74.0          663
Vitis vinifera         56.7          74.6          659
Amborella trichocarpa  41.4          46.3          685

Table 1. Comparison of Arabidopsis thaliana TOPHAT amino acid coding sequence to its orthologs in other fully sequenced species.

Initial phenotyping

As mentioned above, TOPHAT exhibited significant phenotypes in multiple abiotic stress tolerance screens, which prompted further examination of this DTE gene candidate. Four independently derived mutant alleles with homozygous T-DNA insertions were obtained from the Arabidopsis Biological Resource Center, Ohio State University (Figure 3). Because of the volume of mutants being screened, a multi-allelic T-DNA approach was used instead of backcrossing to ensure that the phenotype corresponded to the mutation. Since these T-DNA mutants are independently derived, similar phenotypes in multiple mutant alleles would suggest that the phenotype is attributable to the insertion in the TOPHAT locus. When grown under standard Arabidopsis thaliana growth conditions, the mutant lines exhibited little difference compared to wild-type (Figure 5). This is unsurprising, as many stress-related genes are only utilized when the plant is challenged by that particular stress. Basic morphology parameters such as average height, average rosette diameter, flowering time, silique count, and seed count were measured at regular intervals. None of these measures were significantly different from wild-type, even though the mutants had an obvious phenotype when faced with stress. In the various abiotic stress screens performed as part of the putative DTE characterization pipeline, TOPHAT yielded significant phenotypes in freezing tolerance and salt tolerance compared to wild-type. When young seedlings are challenged with salt stress, the phenotypes (measured by percentage survival) are comparable to those of characterized reference genes in the literature, including SOS3 (SALT OVERLY SENSITIVE 3), a calcium sensor protein for which knockdown of expression yields Na+ hypersensitive phenotypes (Figure 6) (Liu & Zhu, 1998). TOPHAT mutants appear to have a salt sensitivity phenotype, with survival rates significantly lower than the wild-type reference. When challenged with a freezing stress protocol, two of the TOPHAT mutants also showed reduced survivability, albeit not as drastically as the reference gene SFR2 (SENSITIVE TO FREEZING 2) used in the assay (examples shown in Figure 7) (Thorlby et al., 2004).

[Figure 5 plots: (A) Average Rosette Diameter at 30 DAS (Days After Sowing) (mm); (B) Average Height at 30 DAS (Days After Sowing) (mm); x-axis: Plant Line.]

Figure 5. Basic morphology measurements in Arabidopsis thaliana plants grown at standard conditions (21°C, 18h/6h day/night). (A) Average rosette diameter, (B) average height.

[Figure 6 plot: Primary screens on DTE lines. y-axis: Survival %; x-axis: Abiotic assay (Salt, Freezing); bars: Wild-type, Tophat-KO1 to Tophat-KO4, Reference 1, Reference 2.]

Figure 6. Results for the TOPHAT T-DNA mutant lines in the abiotic stress tolerance screens from the VEGI project. Salt tolerance and freezing tolerance assays were designed and performed by Zoé Joly-Lopez and Dr. Ewa Forczek, respectively. Reference genes were chosen from the literature to provide a benchmark for the effectiveness of the assay. The reference gene shown for salt is SOS3 (SALT OVERLY SENSITIVE 3); for freezing, SFR2 (SENSITIVE TO FREEZING 2). Reference 1 and 2 refer to the two runs of the experiment (3 replicates). TOPHAT-KO1, 2, 3, and 4 refer to mutant lines CS860461, CS860391, SALK_017471C, and SALK_069419, respectively.

Figure 7. Representative pictures from the freezing tolerance assay. (A) TOPHAT knockout. (B) Wild-type. Experiments were conducted by Dr. Ewa Forczek. Seedlings were subjected to a freezing protocol and allowed to recover. Survivability was scored based on seedlings being 50% green after recovery. As seen in (A), many seedlings cannot recover from the treatment and remain yellow, whereas more plants are able to recover and continue growth in (B).


Additional phenotyping

In order to further characterize the pathways and mechanisms on which TOPHAT acts, overexpression lines of the gene were created in a wild-type background. These constructs were created with the pEARLEYGATE 100 vector (Earley et al., 2006), which contains a constitutive 35S CaMV promoter to overexpress the gene across all tissue types. Homozygous overexpression lines at the third generation (T3 lines), in addition to the original knockout mutant and wild-type lines, were subject to various phenotyping experiments. To ensure successful overexpression, semi-quantitative PCR was performed on an RNA extraction of whole seedlings, and later confirmed by RNA-sequencing data. Since abiotic stress phenotypes were uncovered in the knockout, osmotic stress-related experiments were performed with these lines. In summary, the set of osmotic-stress experiments supported the knockout phenotype and its relation to osmotic stresses but revealed no significant phenotypes for the overexpression line, which behaved much like wild-type plants. One would assume that if mutagenizing a locus causes sensitivity, then overexpressing it could result in tolerance. However, the overexpression of the gene does not appear to reverse the phenotypes in the opposite direction (i.e. tolerance as opposed to sensitivity), and the RNA sequencing results following this section support and help expose why this is the case.

Root length assay

The process of root formation is well characterized in A. thaliana and is often affected by environmental cues. A number of studies have established that root system architecture is influenced by environmental stresses such as salt and osmotic stress (Fitter & Stickland, 1991; Parida & Das, 2005). In particular, the formation of lateral roots is greatly repressed as water availability is reduced (Deak & Malamy, 2005). Correspondingly, an osmotic-tolerant Arabidopsis thaliana mutant has been characterized as exhibiting longer primary and lateral roots (Chen et al., 2012). One hypothesis is that plants escape water availability stress by increasing root growth to maximize water uptake and penetrate deep into the soil in search of moisture. As a convenient quantitative proxy for seedling growth, root elongation can be used to determine the responses of various mutants to osmotic treatments. Similar to the methods used to study the growth of Arabidopsis thaliana seedlings under water deficit (van der Weele, 2000), the wild-type, T-DNA knockout mutant, and overexpression lines were grown on 1/2 MS agar medium for 7 days and transferred to media containing different salts and osmotica. Root length was measured by capturing images with a fixed camera and using RootTrace to measure growth in pixels (French et al., 2009). As shown in Figure 8, both NaCl and KCl applications resulted in significantly reduced root growth in the knockout compared to wild-type, indicating impaired root growth under osmotic stress. Overexpression lines showed wild-type levels of salt tolerance in all root length assays. The data suggest that the overexpression of TOPHAT does not shift the osmotic phenotypes in the opposite direction. On the NaCl plates, the T-DNA mutants showed a 10% decrease in root length compared to wild-type. When NaCl was replaced with KCl, the difference increased to a 39% decrease compared to wild-type. Control plates were also included to ensure that the lines do not have inherently different root lengths in the absence of stress. To determine whether TOPHAT mutant seedlings are hypersensitive to general osmotic stress or have an ion-specific sensitivity, root elongation in response to mannitol was also measured (Figure 8). TOPHAT T-DNA mutant seedlings are sensitive to the osmotic stress caused by mannitol, suggesting that the preliminary phenotype uncovered during the stress screens is not just a Na+- or K+-specific sensitivity. For LiCl, a similar trend was observed, although the difference was not statistically significant. 
It is possible that a non-optimal concentration was used for the assay. Because LiCl is a more toxic analogue of NaCl, the assay would be expected to yield a similar if not stronger phenotype. However, as this series of experiments suggests, it is not so much ion toxicity as general osmotic stress that is responsible for these phenotypes. It is conceivable that the lower concentration used to avoid Li+ toxicity was not enough to generate significant osmotic stress.
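
The percentage decreases quoted above are relative differences in mean root length. A minimal Python sketch of that arithmetic follows; the function name and the measurements are hypothetical placeholders chosen for illustration, not data from these assays.

```python
from statistics import mean

def percent_decrease(wild_type_lengths, mutant_lengths):
    """Return the drop in mean root length of the mutant relative to wild-type, in percent."""
    wt_mean = mean(wild_type_lengths)
    mutant_mean = mean(mutant_lengths)
    return 100 * (wt_mean - mutant_mean) / wt_mean

# Hypothetical root lengths in mm (illustrative only, not measured values).
wild_type = [39.0, 40.0, 41.0]
knockout = [35.0, 36.0, 37.0]
print(f"{percent_decrease(wild_type, knockout):.1f}% decrease")
```

Expressing the difference relative to the wild-type mean makes treatments with different absolute root lengths comparable.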

[Figure 8 plot: Root length grown on different osmotic treatments. y-axis: Root length (mm); x-axis: Treatment (Control, NaCl, KCl, Mannitol, LiCl); series: WT, OE-821, KO-S69.]

Figure 8. Root length of the three genotypes grown in different osmotic treatments. Assays were performed over a spectrum of concentrations and a representative non-lethal but growth inhibitory concentration is shown. Concentrations are as follows: 50mM NaCl, 100mM KCl, 50mM mannitol, 10mM LiCl. Stars represent significance at a p≤0.05.

Germination rate

Previous studies have indicated that osmotic tolerance is developmentally specific and that tolerance at one stage in development is not necessarily correlated with tolerance at other stages (Deak & Malamy, 2005). Therefore, to fully characterize the abiotic stress phenotypes, we must also study tolerance and sensitivity at different stages in plant development, such as germination. Another metric of the fitness of the plant lines under stress is the germination rate when seeds are sown directly on osmoticum. This assay targets the “initiation” stage of root development, a process distinctly different from the cell divisions that occur to form the lateral root (Casimiro et al., 2003). Seeds were plated evenly on NaCl plates of increasing concentrations and scored for healthy developing cotyledons after 5 days. Similar to the previous osmotic assays, TOPHAT T-DNA knockout mutants displayed a sensitivity response compared to wild-type (Figure 9). The overexpression line, consistent with prior results, behaved much like wild-type.

[Figure 9 plot: Germination rate on NaCl. y-axis: Germination rate (%); x-axis: Concentration of NaCl (mM) at 0, 80, 120, and 160; series: Wild-type, Tophat-OE, Tophat-KO.]

Figure 9. Germination rate of the three genotypes on NaCl plates of increasing concentration. Stars represent significance at p≤0.05.

Transcriptomic analyses using an RNA-sequencing approach in Arabidopsis thaliana

Plants with the TOPHAT mutant genotype display phenotypes of increased sensitivity to freezing and salt. Expression profiling of the mutant and overexpression lines through RNA sequencing would provide essential background information on changes in the known pathways and gene networks that cause these phenotypes. Here we report the results of the RNA sequencing of wild-type, TOPHAT overexpression, and knockout mutant lines. Seeds from wild-type, SALK_069419 (knockout), and OE821 (overexpression) genotypes were plated on ½ MS plates after sterilization with bleach. Total RNA was isolated from each genotype at 7 days after sowing (Qiagen RNeasy Plant Mini). Ten plants per line were extracted together to reduce biological noise, and two biological replicates per genotype were sequenced as single-end 100 bp reads on an Illumina HiSeq after TruSeq RNA preparation. Reads were aligned using TopHat, and expression differences were analyzed using the Cuffdiff module in Cufflinks (Trapnell et al., 2012). Differential expression was assessed using an FDR-adjusted q-value, and significant genes (q < 0.05) are discussed. We also examined genes using a more stringent q-value threshold (0.001). GO categories were independently determined using GOSEQ, to correct for length biases, and the agriGO SEA tool with TAIR10 annotations.
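
An FDR-adjusted q-value of the kind described above can be illustrated with the Benjamini-Hochberg procedure. The sketch below is a stand-alone illustration that assumes Benjamini-Hochberg correction; it is not the Cuffdiff implementation, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by those above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical p-values for four genes (illustrative only).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

Genes would then be called significant by comparing each q-value against the chosen threshold, such as 0.05 or the more stringent 0.001.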

Sample         OE-821_1     OE-821_2     KO-S69_1     KO-S69_2     WT_1         WT_2
Total reads    26,973,879   42,425,265   42,240,716   49,337,198   24,932,651   42,442,572
Aligned == 0   2,373,250    3,834,617    3,574,151    4,377,998    2,238,004    3,817,735
               (8.80%)      (9.04%)      (8.46%)      (8.87%)      (8.98%)      (9.00%)
Aligned == 1   15,450,288   24,232,483   24,311,092   28,294,556   14,255,399   24,294,833
               (57.28%)     (57.12%)     (57.55%)     (57.35%)     (57.18%)     (57.24%)
Aligned > 1    9,150,341    14,358,165   14,355,473   16,664,644   8,439,248    14,330,004
               (33.92%)     (33.84%)     (33.98%)     (33.78%)     (33.85%)     (33.76%)
Total aligned  24,600,629   38,590,648   38,666,565   44,959,200   22,694,647   38,624,837
               (91.20%)     (90.96%)     (91.54%)     (91.13%)     (91.02%)     (91.00%)

Table 2. Summary of the RNA-SEQ data generated from the sequenced samples. HiSeq 2500 produced paired end reads of approximately 100 bp. Alignment was performed using TopHat against the TAIR9 A. thaliana reference genome with transposon annotations included. OE-821 refers to the 35S overexpression line, KO-S69 refers to the homozygous T-DNA mutant line, and WT refers to Arabidopsis thaliana Col-0 used as the control, the same background in which the overexpression lines were created. Aligned == 0 refers to reads that did not map to the transcriptome, aligned == 1 refers to reads that aligned once, and aligned > 1 refers to reads that mapped more than once to the transcriptome; such reads are divided uniformly among the positions to which they map for the purposes of expression quantification.
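
The percentages in Table 2 follow directly from the read counts. As a quick check, the sketch below recomputes them for the WT_1 sample using the counts listed in the table; the variable and function names are illustrative.

```python
# Read counts for sample WT_1, copied from Table 2.
total_reads = 24_932_651
counts = {
    "Aligned == 0": 2_238_004,
    "Aligned == 1": 14_255_399,
    "Aligned > 1": 8_439_248,
}

def percent_of_total(count, total):
    """Express a read count as a percentage of all reads, rounded to two decimals."""
    return round(100 * count / total, 2)

# The three categories partition the reads, so they must sum to the total.
assert sum(counts.values()) == total_reads

percentages = {name: percent_of_total(n, total_reads) for name, n in counts.items()}
total_aligned = counts["Aligned == 1"] + counts["Aligned > 1"]

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"Total aligned: {total_aligned:,} ({percent_of_total(total_aligned, total_reads)}%)")
```

The same check can be repeated for any other column of the table by substituting its counts.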

We obtained an average of 38,058,714 reads per sample across the 6 samples. Approximately 9% of reads were unmapped, an expected proportion for a poly-A-selected RNA sequencing effort (Table 2). Unmapped reads represent sequences that do not align to our reference transcriptome because of poor sequence quality, true transcripts that are absent from our reference, contamination, or other factors. Consistent with expectations, the TOPHAT transcript was significantly reduced in expression in the knockout genotype (SALK_069419), while the overexpression line showed significantly increased expression (Figure 10). The knockout mutant showed a 1.7 fold reduction in expression compared to wild-type, while the overexpression line showed a 2.0 fold increase, both with highly significant q-values. This shows that the T-DNA insertion decreased the expression of the transcript; the remaining transcripts are likely non-functional because the insertion lies in the centre of the coding region of the gene. It also shows that the overexpression construct successfully increased expression of the gene. This is consistent with the initial semi-quantitative RT-PCR used to determine relative levels of gene expression.

Genotype        Genome browser  FPKM
Knockout        [track image]   1.45 (-56.6%)
Wild-type       [track image]   3.59
Overexpression  [track image]   9.44 (+163.0%)

Figure 10. Summary of the RNA-SEQ data generated from the sequenced samples. OE-821 refers to the 35S overexpression line, KO-S69_1 refers to the homozygous T-DNA mutant line, and WT is wild-type Col-0 used as control. FPKM refers to fragments per kilobase of exon per million fragments mapped. The genome browser column gives a visual depiction of the FPKM signal at each individual base; the y-axis is not shown, but all images are depicted on the same scale, with the numerical average in the last column.

We found that the constitutive overexpression of TOPHAT caused differential expression of 103 genes, 101 of them up-regulated and 2 down-regulated. Among these 103 differentially expressed genes, 92 showed a 2 fold or greater increase, and 2 showed a 2 fold or greater decrease (Table 3). This suggests that the overexpression of TOPHAT primarily causes positive regulation of gene expression. In the knockout mutant line, a greater level of differential expression was observed (350 up- and 75 down-regulated), with 271 and 45 genes up- and down-regulated over 2 fold, respectively (Table 3). Only 64 out of 464 differentially expressed genes were shared (14%). The enriched GO categories also reflect a difference in the function of the genes affected in the two genotypes. This has been reported before for TBPH (Hazelett et al., 2012), where loss of function and overexpression of the gene lead to non-overlapping cellular changes. This suggests that different cellular programs are activated in these two situations. As well, we found only one gene other than TOPHAT with opposing directionality in expression level differences. The remainder of the shared genes displayed upregulation in both instances. At a higher stringency (q < 0.001, >2 fold change), TOPHAT is the only gene with opposing directionality. These numbers further suggest that the overexpression of TOPHAT does not affect the same set of cellular functions as the knockout mutant.

Condition (q < 0.001) (compared to WT)                  Up-regulated           Down-regulated  Total
TOPHAT knockout (S69)                                   350                    75              425
TOPHAT knockout (S69) (>2 fold change)                  271                    45              316
TOPHAT overexpression (OE821)                           101                    2               103
TOPHAT overexpression (OE821) (>2 fold change)          92                     2               94
Shared (between knockout and overexpression data sets)  64 (2 diff direction)                  64 (13.79%)
Shared (>2 fold change)                                 52 (1 diff direction)                  52 (14.53%)

Table 3. Summary of differentially expressed genes amongst the three genotypes.
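
The shared percentages in Table 3 are consistent with dividing the number of shared genes by the number of distinct differentially expressed genes across both genotypes. A short sketch of that calculation, using the totals from the table (the function name is illustrative):

```python
def shared_percentage(knockout_total, overexpression_total, shared):
    """Percentage of all distinct differentially expressed genes found in both sets."""
    union = knockout_total + overexpression_total - shared  # count shared genes once
    return union, round(100 * shared / union, 2)

# Totals taken from Table 3.
print(shared_percentage(425, 103, 64))  # all differentially expressed genes
print(shared_percentage(316, 94, 52))   # genes with >2 fold change
```

Subtracting the shared genes from the sum avoids counting them twice, which gives the 464 distinct genes referred to in the text.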

Synergistic model of gene regulation

Before we discuss the gene sets in detail, it is important to note that caution should be used when analyzing changes in gene regulation caused by modifying the expression of a single regulatory element. It is becoming increasingly evident that combinatorial control of gene expression is a common mechanism in eukaryotic cells, as it allows for diversity in regulation using only a limited number of transcription factors. This is especially applicable to gene regulators derived from TEs, as they are members of families of closely related genes. In the case of TOPHAT, the gene has paralogs that are species-specific (poplar, grape; Figure 14). Its closest BLAST hit in Arabidopsis thaliana, AT3G14800, was likely duplicated just after angiosperm differentiation, as well-conserved orthologs of the gene are found in most angiosperms. In a synergistic gene regulatory network, the knockout and overexpression of a single gene may generate unexpected phenotypes or changes in gene expression, as is the case with our data, since so many interactors and pathways are affected. In the future, double knockout mutants with TOPHAT paralogs would allow a deeper understanding of how these genes interact and which pathways are affected. In particular, AT3G14800, the closest paralog to TOPHAT, should be examined. As I will describe in a later section, TOPHAT and its paralog AT3G14800 are related to the previously characterized SLEEPER family of hAT DTEs (Knip et al., 2012, 2013). We believe TOPHAT and its paralog form a subfamily within the SLEEPER family of DTEs, as they appear to share a common phylogenetic origin.

Figure 11. Synergistic model to explain the gene expression differences caused by overexpressing and knocking out TOPHAT in Arabidopsis thaliana. Pointed arrows represent upregulation of or influence on the pathway, and blunt arrows represent downregulation or inhibition of the pathway. The left column represents regular TOPHAT interactions under standard expression. The right column represents the effects of TOPHAT when it is overexpressed or knocked out; “incorrect cells” refers to the 35S overexpression promoter causing uniform overexpression in A. thaliana and, in the case of the knockout mutant, to the lack of TOPHAT causing the SLEEPER TFs to interact in other combinations in cells where the TOPHAT interaction is expected.


Care should be used when interpreting the transcriptomic and phenotypic changes: overexpression causes factors involved in gene regulation to be expressed in cells where they are not normally found, and may yield phenotypes that are not naturally observed. However, this still suggests that those pathways can be controlled by the SLEEPER combinatorial regulon. The knockout mutant yielded abiotic stress response phenotypes, which were confirmed transcriptomically. The overexpression lines showed an upregulation of biotic stress response genes, a portion of which were also up-regulated in the knockout. A model that would explain the differences in regulation involves the interaction of SLEEPER transcription factors (Figure 11). These effects can be explained by a dimerization hypothesis, in which SLEEPER genes (such as the closely related SLEEPER-B gene AT3G14800) work synergistically to affect gene expression. Given the tissue co-expression data (Figure 12) and the presence of an active dimerization domain, hetero- and homodimerization seem plausible. The TOPHAT-SLEEPER interaction appears to be a negative regulator upstream of the abiotic stress pathway. When TOPHAT is knocked out, the inhibition of the abiotic stress response genes is lost. In the overexpression line, we do not see gene expression differences in the opposite direction, since additional TOPHAT does not produce more of this complex, whose abundance is limited by the other interacting SLEEPER genes. The overexpression of TOPHAT may cause abundant homodimerization of TOPHAT and may lead to the misregulation of genes, especially when it is overexpressed across all cell types. Expression profiles of SLEEPER genes are very similar (Figure 12), and they are not evenly expressed across cell types. In particular, genes in the SLEEPER family of DTEs tend to have their highest levels of expression in apex tissues. 
The sudden presence of the TOPHAT homodimerization complex can have profound effects on regulation in unexpected and unnatural ways. The same observations could be explained, or further complemented, by overexpressed TOPHAT squelching other SLEEPER interactions due to its abundance. With an overabundance of TOPHAT, TOPHAT-SLEEPER or other SLEEPER-SLEEPER interactions could be thrown out of balance. Therefore, genes controlled by the availability of the common co-factors could be influenced by the misexpression of the TOPHAT transcription factor. The cascading effects of transcription factors on gene regulation and expression are complex, and it may be difficult to tease apart direct targets and downstream targets. Gene expression analysis represents the first step in understanding the transcriptional networks that are potentially influenced by TOPHAT and how a DTE family can form a complex and integral system of gene regulation. Since transposable elements have been shown to be induced by environmental factors in various organisms (Naito et al., 2009), TOPHAT and related SLEEPER genes may be acting collectively to repress a suite of stress-related genes to ensure and maintain transiency of expression.
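
The dimerization hypothesis can be made concrete with a deliberately simple stoichiometric toy model. The sketch below assumes, purely for illustration, that TOPHAT pairs preferentially with a limiting SLEEPER partner and that any excess TOPHAT forms homodimers; the function name, the abundance units, and the pairing rule are all hypothetical and are not measurements or claims from this work.

```python
def dimer_pools(tophat, sleeper_partner):
    """Toy model: count heterodimers and TOPHAT homodimers from monomer abundances.

    Assumes heterodimers form first and are capped by the scarcer monomer;
    leftover TOPHAT monomers pair up as homodimers.
    """
    hetero = min(tophat, sleeper_partner)
    homo = (tophat - hetero) // 2
    return {"heterodimers": hetero, "tophat_homodimers": homo}

# Arbitrary abundance units (hypothetical values).
wild_type = dimer_pools(tophat=4, sleeper_partner=4)
knockout = dimer_pools(tophat=0, sleeper_partner=4)
overexpression = dimer_pools(tophat=10, sleeper_partner=4)

print(wild_type, knockout, overexpression, sep="\n")
```

In this toy setting the knockout loses the heterodimer entirely, whereas overexpression leaves the heterodimer count unchanged and only adds homodimers, which mirrors the asymmetry between the two genotypes described above.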

[Figure 12, panels (A) to (E): expression plots; x-axis: tissue type.]

Figure 12. Expression profiles of SLEEPER genes. Aside from AT1G15300 (CYTOSLEEPER), the SLEEPER genes show striking similarities in cell type localization, supporting the hetero-dimerization and additive effect hypothesis. A-C represent the SLEEPER-A subfamily and D-E represent the SLEEPER-B subfamily: (A) AT1G15300 (CYTOSLEEPER), (B) AT1G80020, (C) AT3G42170 (DAYSLEEPER), (D) AT1G69950 (TOPHAT), (E) AT3G14800. The tissues include root, stem, leaf, whole plant, apex, flowers, floral organs, and seeds. Images were visualized using the AtGenExpress Visualization Tool (Schmid et al., 2005). The individual points represent different samples of the tissue type, available as a list in Table S1 of the cited paper. The y-axis represents the normalized signal intensity of the microarray.


Transcriptional differences common to TOPHAT knockout mutant and overexpression genotypes

Although the majority of gene expression differences were not shared, commonly up-regulated genes account for a considerable portion of the data: 64 of the 101 up-regulated genes (63%) in the overexpression line are also up-regulated in the knockout mutant (Table 3). These shared genes do not include any involved in the osmotic and abiotic stress-related phenotypes, such as genes in the C-REPEAT BINDING FACTOR (CBF) pathway, which are found exclusively in the gene set belonging to the knockout mutant. In the GOSEQ results, there appears to be an enrichment of GO categories responsible for biotic stresses, such as “response to stress” (GO:0006950) (Table 4). This set of GO terms appears to be similar to the set enriched in genes found only in the overexpression line. In fact, 7 of the 8 GO categories overlap between the two sets (Table 4 and Table 7). This suggests that the regulation pathways are somewhat overlapping: knocking out TOPHAT affects the regulation of both biotic and abiotic response genes, whereas overexpression only causes differences in biotic response genes.

Upregulated in knockout and overexpression mutants vs. WT

GO term (Biological Process)   p-value
response to stress             5.80E-06
secondary metabolic process    4.20E-05
response to stimulus           0.00048
response to other organism     0.0011
response to biotic stimulus    0.0014
defense response               0.0032
multi-organism process         0.0033
cellular response to stimulus  0.0047

Table 4. GO terms enriched in the upregulated genes shared by the knockout mutant and overexpression genotypes compared with wild-type. In bold are GO terms that are common with genes enriched in the overexpression genotype compared with the wild-type (Table 7).

Transcriptional differences in the TOPHAT knockout mutant genotype


Plants with the TOPHAT knockout mutant genotype displayed reduced cold and salt tolerance in the initial stress tolerance screens. Expression analyses confirm the osmotic stress phenotypes found in the knockout line. In the GOSEQ results, there appears to be an enrichment of GO categories responsible for these phenotypes in the mutant line compared to the wild-type and overexpression lines, such as response to water deprivation, response to cold, response to abscisic acid (ABA), and abscisic acid-activated signaling pathway (Table 5). As described in the above model (Figure 11), TOPHAT may be involved in a co-regulatory effort to suppress genes involved in the CBF osmotic response pathway. When TOPHAT is knocked out, this suppression is lost and CBF response genes are upregulated. This mechanism appears to be separate from the well-characterized ICE1 regulatory factor, an upstream initiator whose transcript accumulates at low temperatures and regulates CBF transcription factors (Chinnusamy et al., 2003). We do not see any gene expression differences in ICE1, leading us to believe that TOPHAT is not directly upstream of the ICE1-CBF pathway. The absence of ICE1 differences also suggests that the accumulation of CBF transcripts was not simply caused by the samples being mistakenly exposed to cold temperatures, as ICE1 would certainly appear among the differentially expressed genes in that scenario. Though it may seem counterintuitive that the constitutive expression of CBF genes could result in decreased resistance to osmotic stress, there are examples in the literature where this is also the case. The Arabidopsis thaliana FRY1 locus has been found to encode an inositol polyphosphate 1-phosphatase, and its knockout results in a loss of transiency in CBF2 expression (Xiong et al., 2001). Like TOPHAT mutants, fry1 mutant plants show increased expression of stress-responsive genes and phenotypically display sensitivity to low temperature and salt stress. 
It is hypothesized that FRY1 has roles in the attenuation of stress responses. These results contrast with those of transgenic studies done to characterize the CBF pathway, where the overexpression of CBF TFs commonly results in enhanced resistance to abiotic stresses (Maruyama et al., 2004; Novillo et al., 2007). It may also be that the mutation of these genes impairs unknown signaling mechanisms required for stress tolerance, even though the classically defined pathways are upregulated. In terms of overexpressed genes, CBF1, CBF2, and CBF3, genes with prominent roles in cold acclimation (Thomashow, 2010), all showed up as significantly differentially regulated genes. CBF2 may regulate the other CBF genes negatively to ensure transiency and control of the regulon (Novillo et al., 2007), whereas CBF1 and CBF3 activate a subset of CBF-target genes, the most prominently characterized being KIN1, KIN2, COR15B, COR15A, LTI78, COR47, and ERD10. Interestingly, even though all three CBF factors in the cold regulon are upregulated, none of these target genes are overexpressed in our gene list. In fact, only 4 of the 38 genes found to be upregulated by CBF3 (Maruyama et al., 2004) showed differential expression. This perhaps shows that the non-transiency of the CBF regulon induced by the TOPHAT knockout may negatively affect the plant's ability to cope with cold stress. Since there is a basal upregulation of the CBF genes caused by knocking out TOPHAT, the plants are less sensitive to gene induction by abiotic stress cues. It may also suggest that the CBF regulon is not sufficient and that other cues or signalling mechanisms caused by cold stress are also required to activate the CBF-target genes. Functional analysis of GO categories found significant enrichment in genes involved in stress and defense response (Table 6). DREB transcription factors, ABRE-containing genes, and other factors involved in osmotic stress are largely over-represented categories of genes in the knockout, matching the observed phenotype. LEA7 and LEA4-5 are LEA genes that typically accumulate in response to water deprivation and are widely found in plants (Battaglia & Covarrubias, 2013). 
Most genes encoding LEA proteins have ABRE (abscisic acid responsive element) and LTRE (low temperature responsive element) motifs in their promoters and are activated by the respective stresses (Hundertmark & Hincha, 2008). RD29B encodes a protein that is induced in response to water deprivation stresses such as cold, salt, and drought via the abscisic acid pathway (Nakashima et al., 2006). The promoter region of RD29B also contains two ABRE elements. A number of DREB (dehydration responsive element binding) genes also show up in our gene list. DREBs are transcription factors that regulate stress-inducible genes by interacting with a DRE/CRT element found in many abiotic stress response genes (Lata & Prasad, 2011). Lastly, PUB genes encode plant U-box armadillo repeat proteins that act as E3 ubiquitin ligases and are observed to be involved in the salt inhibition of germination (Bergler & Hoth, 2011). Since many genes upregulated in the knockout mutant are cold and osmotic response genes, this suggests that the SLEEPER genes co-regulating with TOPHAT may act on common elements found in the promoters of these genes, such as ABRE, DRE/CRT, and LTRE elements.

Up-regulated in knockout mutant vs. WT

GO term (Biological Process)        p-value
response to chemical stimulus       1.30E-14
response to chitin                  1.30E-13
response to stimulus                5.30E-13
response to carbohydrate stimulus   2.00E-12
plant-type cell wall organization   2.70E-12
response to stress                  7.50E-12
response to water deprivation       3.60E-09
response to organic substance       4.80E-09
response to water                   5.70E-09
response to abscisic acid stimulus  2.60E-05
defense response                    6.70E-05
response to oxidative stress        6.70E-05
response to abiotic stimulus        8.90E-05
protein ubiquitination              0.00015

Table 5. GO terms enriched in the knockout mutant genotype compared with wild-type. Highlighted are GO terms that are related to the observed phenotype.
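
The p-values in tables such as Table 5 come from GO enrichment tools. As a simplified illustration of the underlying idea, the sketch below computes a one-sided hypergeometric enrichment p-value in Python; it omits the length-bias correction that GOSEQ applies, and the function name and example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(genome_genes, annotated_genes, selected_genes, selected_annotated):
    """One-sided hypergeometric p-value: the chance of seeing at least
    `selected_annotated` annotated genes among `selected_genes` genes
    drawn without replacement from `genome_genes` genes."""
    total_draws = comb(genome_genes, selected_genes)
    max_hits = min(annotated_genes, selected_genes)
    favourable = sum(
        comb(annotated_genes, hits)
        * comb(genome_genes - annotated_genes, selected_genes - hits)
        for hits in range(selected_annotated, max_hits + 1)
    )
    return favourable / total_draws

# Hypothetical counts: 10 genes in total, 3 carry the GO term,
# 4 genes are differentially expressed and 2 of those carry the term.
print(enrichment_p_value(10, 3, 4, 2))
```

A small p-value indicates that the GO term appears among the differentially expressed genes more often than expected by chance under this simplified model.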

Up-regulated

Rank  Gene Locus  Gene Symbol  Description  Fold change
1     AT5G44120   CRU1         RmlC-like cupins superfamily protein  22.72
2     AT1G52690   LEA7         Late embryogenesis abundant protein (LEA) family protein  15.46
3     AT5G59310   LTP4         lipid transfer protein 4  14.17
4     AT5G50360   -            -  11.70
5     AT5G35935   -            transposable element gene  11.53
6     AT1G47600   TGG4         beta glucosidase 34  11.15
7     AT1G52060   -            Mannose-binding lectin superfamily protein  10.03
8     AT2G47770   TSPO         TSPO(outer membrane tryptophan-rich sensory protein)-related  9.42
9     AT5G06760   LEA4-5       Late Embryogenesis Abundant 4-5  9.23
10    AT1G60190   PUB19        ARM repeat superfamily protein  8.63
11    AT1G56242   -            other RNA  8.34
12    AT1G05675   -            UDP-Glycosyltransferase superfamily protein  8.25
13    AT3G02840   -            ARM repeat superfamily protein  7.71
14    AT1G51470   TGG5         beta glucosidase 35  7.68
15    AT5G24110   WRKY30       WRKY DNA-binding protein 30  7.44
16    AT3G16440   MLP-300B     myrosinase-binding protein-like protein-300B  7.22
17    AT1G56240   PP2-B13      phloem protein 2-B13  7.12
18    AT1G21326   -            VQ motif-containing protein  6.80
19    AT4G27657   -            -  6.75
20    AT5G61550   -            U-box domain-containing protein kinase family protein  6.43
21    AT2G22880   -            VQ motif-containing protein  6.32
22    AT4G25490   DREB1B       C-repeat/DRE binding factor 1  6.05
23    AT1G58420   -            Uncharacterised conserved protein UCP031279  5.97
24    AT4G28140   -            Integrase-type DNA-binding superfamily protein  5.57
25    AT5G52300   RD29B        CAP160 protein  5.53
26    AT4G15210   RAM1         beta-amylase 5  5.34
27    AT3G02480   -            Late embryogenesis abundant protein (LEA) family protein  5.33
28    AT5G04200   MCP2f        metacaspase 9  5.31
29    AT3G48650   -            -  4.95
30    AT1G61800   GPT2         glucose-6-phosphate/phosphate translocator 2  4.92
31    AT5G59320   LTP3         lipid transfer protein 3  4.84
32    AT1G64380   -            Integrase-type DNA-binding superfamily protein  4.75
33    AT3G29000   -            Calcium-binding EF-hand family protein  4.65
34    AT3G10930   -            -  4.65
35    AT1G26820   RNS3         ribonuclease 3  4.57
36    AT4G08400   -            Proline-rich extensin-like family protein  4.44
37    AT5G06630   -            proline-rich extensin-like family protein  4.39
38    AT1G79680   WAKL10       WALL ASSOCIATED KINASE (WAK)-LIKE 10  4.37
39    AT4G25470   FTQ4         C-repeat/DRE binding factor 2  4.37
40    AT2G24980   EXT6         Proline-rich extensin-like family protein  4.37
41    AT1G65390   PP2-A5       phloem protein 2 A5  4.24
42    AT5G54370   -            Late embryogenesis abundant (LEA) protein-related  4.16
43    AT3G14440   STO1         nine-cis-epoxycarotenoid dioxygenase 3  4.13
44    AT5G35190   EXT13        proline-rich extensin-like family protein  4.12
45    AT1G52070   -            Mannose-binding lectin superfamily protein  4.07
46    AT5G03210   DIP2         -  4.04
47    AT5G49080   EXT11        transposable element gene  4.00
48    AT5G60530   -            late embryogenesis abundant protein-related / LEA protein-related  3.97
49    AT5G48920   TED7         tracheary element differentiation-related 7  3.93
50    AT5G03260   LAC11        laccase 11  3.92
51    AT4G25480   DREB1A       dehydration response element B1A  3.92
52    AT4G33810   -            Glycosyl hydrolase superfamily protein  3.84
53    AT4G13390   EXT12        Proline-rich extensin-like family protein  3.64
54    AT4G08410   -            Proline-rich extensin-like family protein  3.61
55    AT1G11190   ENDO1        bifunctional nuclease i  3.46
56    AT5G06640   EXT10        Proline-rich extensin-like family protein  3.42
57    AT4G24570   DIC2         dicarboxylate carrier 2  3.40
58    AT3G54590   HRGP1        hydroxyproline-rich glycoprotein  3.39
59    AT5G63130   -            Octicosapeptide/Phox/Bem1p family protein  3.36
60    AT1G54970   RHS7         proline-rich protein 1  3.34
61    AT4G23810   WRKY53       WRKY family transcription factor  3.31
62    AT4G37370   CYP81D8      cytochrome P450, family 81, subfamily D, polypeptide 8  3.30
63    AT4G11521   -            Receptor-like protein kinase-related family protein  3.16
64    AT1G21120   IGMT2        O-methyltransferase family protein  3.13
65    AT5G67400   RHS19        root hair specific 19  3.10
66    AT5G09800   -            ARM repeat superfamily protein  3.10
67    AT1G43790   TED6         tracheary element differentiation-related 6  3.01
68    AT1G17147   -            VQ motif-containing protein  3.00
69    AT3G15210   RAP2.5       ethylene responsive element binding factor 4  3.00
70    AT2G34790   MEE23        FAD-binding Berberine family protein  3.00
71    AT1G58370   RXF12        glycosyl hydrolase family 10 protein / carbohydrate-binding domain-containing protein  2.99
72    AT1G30870   -            Peroxidase superfamily protein  2.97
73    AT2G32680   RLP23        receptor like protein 23  2.90
74    AT4G20000   -            VQ motif-containing protein  2.89
75    AT3G22830   HSFA6B       heat shock transcription factor A6B  2.89
76    AT1G07430   HAI2         highly ABA-induced PP2C gene 2  2.81
77    AT3G62680   PRP3         proline-rich protein 3  2.78
78    AT3G21550   DMP2         DUF679 domain membrane protein 2  2.76
79    AT5G57220   CYP81F2      cytochrome P450, family 81, subfamily F, polypeptide 2  2.75
80    AT3G54580   -            Proline-rich extensin-like family protein  2.75
81    AT1G49450   -            Transducin/WD40 repeat-like superfamily protein  2.73
82    AT1G28370   ERF11        ERF domain protein 11  2.67
83    AT1G69490   NAP          NAC-like, activated by AP3/PI  2.67
84    AT4G26010   -            Peroxidase superfamily protein  2.66
85    AT1G19020   -            -  2.64
86    AT5G48540   -            receptor-like protein kinase-related family protein  2.63
87    AT4G02410   -            Concanavalin A-like lectin protein kinase family protein  2.58
88    AT5G52050   -            MATE efflux family protein  2.58
89    AT5G15870   -            glycosyl hydrolase family 81 protein  2.58
90    AT5G04340   ZAT6         zinc finger of Arabidopsis thaliana 6  2.55
91    AT3G49960   -            Peroxidase superfamily protein  2.54
92    AT2G24600   -            Ankyrin repeat family protein  2.53
93    AT5G08640   FLS1         flavonol synthase 1  2.52
94    AT1G23720   -            Proline-rich extensin-like family protein  2.52
95    AT4G21390   B120         S-locus lectin protein kinase family protein  2.51
96    AT1G05575   -            -  2.51
97    AT5G13930   TT4          Chalcone and stilbene synthase family protein  2.49
98    AT2G36690   -            2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein  2.47
99    AT3G11840   PUB24        plant U-box 24  2.47
100   AT3G44550   FAR5         fatty acid reductase 5  2.47
101   AT2G40095   -            Alpha/beta hydrolase related protein  2.47
102   AT5G19240   -            Glycoprotein membrane precursor GPI-anchored  2.46
103   AT4G18205   -            Nucleotide-sugar transporter family protein  2.46
104   AT2G19190   FRK1         FLG22-induced receptor-like kinase 1  2.43
105   AT1G52890   NAC019       NAC domain containing protein 19  2.40
106   AT4G18880   HSF A4A      heat shock transcription factor A4A  2.40
107   AT1G05240   -            Peroxidase superfamily protein  2.38
108   AT5G62470   MYBCOV1      myb domain protein 96  2.37
109   AT1G05250   -            Peroxidase superfamily protein  2.36
110   AT4G18197   PUP7         purine permease 7  2.36
111   AT5G04960   -            Plant invertase/pectin methylesterase inhibitor superfamily  2.36
112   AT4G02330   PME41        Plant invertase/pectin methylesterase inhibitor superfamily  2.36
113   AT2G25735   -            -  2.35
114   AT1G80840   WRKY40       WRKY DNA-binding protein 40  2.35
115   AT1G73010   PS2          phosphate starvation-induced gene 2  2.34
116   AT4G26200   ATACS7       1-amino-cyclopropane-1-carboxylate synthase 7  2.34
117   AT5G66400   RAB18        Dehydrin family protein  2.33
118   AT2G33580   -            Protein kinase superfamily protein  2.32
119   AT3G18710   PUB29        plant U-box 29  2.32
120   AT5G07080   -            HXXXD-type acyl-transferase family protein  2.32
121   AT2G38390   -            Peroxidase superfamily protein  2.32
122   AT4G25820   XTR9         xyloglucan endotransglucosylase/hydrolase 14  2.29
123   AT2G31880   SOBIR1       Leucine-rich repeat protein kinase family protein  2.29
124   AT3G24420   -            alpha/beta-Hydrolases superfamily protein  2.29
125   AT1G55450   -            S-adenosyl-L-methionine-dependent methyltransferases superfamily protein  2.28
126   AT5G27420   CNI1         carbon/nitrogen insensitive 1  2.28
127   AT1G21110   IGMT3        O-methyltransferase family protein  2.27
128   AT2G41190   -            Transmembrane amino acid transporter family protein  2.27
129   AT1G52040   MBP1         myrosinase-binding protein 1  2.26
130   AT1G16130   WAKL2        wall associated kinase-like 2  2.25
131   AT5G18470   -            Curculin-like (mannose-binding) lectin family protein  2.24
132   AT5G64660   CMPG2        CYS, MET, PRO, and GLY protein 2  2.23
133   AT3G52450   PUB22        plant U-box 22  2.22
134   AT5G01542   -            -  2.22
135   AT2G45130   SPX3         SPX domain gene 3  2.21
136   AT1G02335   GL22         germin-like protein subfamily 2 member 2 precursor  2.21
137   AT1G17710   PEPC1        Pyridoxal phosphate phosphatase-related protein  2.20
138   AT1G23710   -            Protein of unknown function (DUF1645)  2.19
139   AT5G41040   -            HXXXD-type acyl-transferase family protein  2.19
140   AT1G66160   CMPG1        CYS, MET, PRO, and GLY protein 1  2.18
141   AT3G03530   NPC4         non-specific phospholipase C4  2.18
142   AT5G05410   DREB2A       DRE-binding protein 2A  2.18
143   AT3G28550   -            Proline-rich extensin-like family protein  2.18
144   AT2G30210   LAC3         laccase 3  2.18
145   AT1G33610   -            Leucine-rich repeat (LRR) family protein  2.17
146   AT4G12470   AZI1         azelaic acid induced 1  2.16
147   AT5G52750   -            Heavy metal transport/detoxification superfamily protein  2.15
148   AT1G51800   IOS1         Leucine-rich repeat protein kinase family protein  2.14
149   AT3G28340   GolS8        galacturonosyltransferase-like 10  2.13
150   AT3G26210   CYP71B23     cytochrome P450, family 71, subfamily B, polypeptide 23  2.12
151   AT5G63560   -            HXXXD-type acyl-transferase family protein  2.08
152   AT4G29780   -            -  2.07
153   AT1G73540   NUDT21       nudix hydrolase homolog 21  2.06
154   AT5G09530   PRP10        hydroxyproline-rich glycoprotein family protein  2.05
155   AT2G23540   -            GDSL-like Lipase/Acylhydrolase superfamily protein  2.04
156   AT2G30020   -            Protein phosphatase 2C family protein  2.03
157   AT1G51850   -            Leucine-rich repeat protein kinase family protein  2.01

Down-regulated

Rank  Gene Locus  Gene Symbol  Description  Fold change
1     AT1G53490   HEI10        RING/U-box superfamily protein  -15.10
2     AT3G19680   -            Protein of unknown function (DUF1005)  -5.49
3     AT2G41240   BHLH100      basic helix-loop-helix protein 100  -4.93
4     AT1G50040   -            Protein of unknown function (DUF1005)  -4.68
5     AT1G22470   -            -  -3.98
6     AT5G25240   -            -  -3.26
7     AT2G32100   OFP16        ovate family protein 16  -3.22
8     AT4G08950   EXO          Phosphate-responsive 1 family protein  -3.17
9     AT5G57560   XTH22        Xyloglucan endotransglucosylase/hydrolase family protein  -3.01
10    AT2G34600   TIFY5B       jasmonate-zim-domain protein 7  -2.99
11    AT1G35140   EXL1         Phosphate-responsive 1 family protein  -2.92
12    AT5G11330   -            FAD/NAD(P)-binding oxidoreductase family protein  -2.79
13    AT2G23290   MYB70        myb domain protein 70  -2.74
14    AT4G27280   -            Calcium-binding EF-hand family protein  -2.71
15    AT3G50800   -            -  -2.57
16    AT2G17230   EXL5         EXORDIUM like 5  -2.56
17    AT2G47440   -            Tetratricopeptide repeat (TPR)-like superfamily protein  -2.55
18    AT1G69950   TOPHAT       SLEEPER-family domesticated transposable element  -2.47
19    AT2G42870   PAR1         phy rapidly regulated 1  -2.43
20    AT5G19190   -            -  -2.37
21    AT1G72430   -            SAUR-like auxin-responsive protein family  -2.27
22    AT2G23130   ATAGP17      arabinogalactan protein 17  -2.26
23    AT4G37240   -            -  -2.21
24    AT4G37260   MYB73        myb domain protein 73  -2.20
25    AT3G50060   MYB77        myb domain protein 77  -2.14
26    AT5G61600   ERF104       ethylene response factor 104  -2.06
27    AT2G36050   OFP15        ovate family protein 15  -2.05

Table 6. Gene expression differences in the TOPHAT knockout mutant line. A total of 157 genes were up-regulated and 27 genes were down-regulated by more than 2-fold, at a q-value cutoff of 0.001.
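The thresholds stated in the Table 6 caption (more than a 2-fold change, q-value cutoff of 0.001) amount to a simple filter over the differential expression results. The sketch below is illustrative only: the record layout and function name are assumptions, the q-values are hypothetical, the two `HYPOTHETICAL_*` entries are invented, and treating the q-value cutoff as inclusive is also an assumption. The three fold changes with real locus identifiers are taken from Table 6.

```python
from typing import NamedTuple

class DERecord(NamedTuple):
    locus: str
    fold_change: float  # signed as in Table 6: negative means down-regulated
    q_value: float

def split_significant(records, min_fold=2.0, max_q=0.001):
    """Return (up, down) lists of records passing both thresholds."""
    up, down = [], []
    for rec in records:
        # Discard records failing the q-value or fold-change threshold.
        if rec.q_value > max_q or abs(rec.fold_change) <= min_fold:
            continue
        (up if rec.fold_change > 0 else down).append(rec)
    up.sort(key=lambda r: r.fold_change, reverse=True)  # strongest induction first
    down.sort(key=lambda r: r.fold_change)              # strongest repression first
    return up, down

records = [
    DERecord("AT5G44120", 22.72, 1e-6),      # q-value hypothetical
    DERecord("AT1G53490", -15.10, 1e-6),     # q-value hypothetical
    DERecord("AT1G51850", 2.01, 5e-4),       # q-value hypothetical
    DERecord("HYPOTHETICAL_1", 1.80, 1e-6),  # fails the fold-change threshold
    DERecord("HYPOTHETICAL_2", 6.00, 0.02),  # fails the q-value threshold
]
up, down = split_significant(records)
print([r.locus for r in up], [r.locus for r in down])
```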

Transcriptional differences in the TOPHAT overexpression genotype

TOPHAT overexpression lines did not exhibit visible osmotic or deleterious phenotypes, but their transcriptome shows that the upregulation of TOPHAT does cause sweeping changes in gene regulation. We expected to see a large cluster of genes with opposing regulation in loss- versus gain-of-function mutants. However, only a small number of the differentially expressed genes were regulated in opposite directions. This could be because TOPHAT acts redundantly and in conjunction with other transcription factors or SLEEPER genes to change gene expression (see Figure 11 for the model). This is supported by the high degree of conservation of the dimerization domain in SLEEPER genes, which has been demonstrated to dimerize in vitro (Essers et al., 2000).

Interestingly, RNA-sequencing establishes that a large group of pathogen defence genes are expressed constitutively in the overexpression line. Overexpression of TOPHAT predominantly caused an upregulation of genes involved in the salicylic acid mediated signalling pathway and of defense response genes, particularly those responding to fungi and bacteria (Table 7). Proteins encoded by these genes play a critical role in pathogen recognition and signal perception, followed by activation of defence responses. Of the 26 genes upregulated solely in the overexpression line, 15 were implicated as defence-response genes in previous reports and 5 were implicated in pathogen-plant stress transcriptomic projects, for a total of 20 out of 26 genes (Table 9, references in the table). Examples of upregulated genes include PATHOGENESIS-RELATED 1, 2, and 5 (Glazebrook, 1999), which are considered to be markers for systemic acquired resistance (SAR). PR genes are typically activated by salicylic acid in response to attacks by pathogens. As in the case of the TOPHAT knockout, it is possible that the SLEEPER regulon acts on common elements found in biotic tolerance response genes, such as the TACA(A/G)T element found in PR genes, since these genes were upregulated in the absence of pathogen stress.

Up-regulated in overexpression mutant vs. WT

GO term (Biological Process)        p-value
defense response                    3.30E-08
response to other organism          7.60E-07
response to biotic stimulus         1.20E-06
response to stress                  3.50E-06
multi-organism process              5.10E-06
response to stimulus                7.70E-05
secondary metabolic process         0.00036
response to organic substance       0.0063
response to endogenous stimulus     0.01

Table 7. GO terms enriched in the overexpression genotype compared with wild-type. All terms appear to be related to biotic stress response.

Up-regulated

Rank  Gene Locus  Gene Symbol  Description  Fold change
1     AT5G54075   U3D          U3D; snoRNA  11.35
2     AT1G52690   LEA7         Late embryogenesis abundant protein (LEA) family protein  7.65
3     AT3G21010   -            transposable element gene  6.76
4     AT1G47600   TGG4         beta glucosidase 34  6.44
5     AT1G61120   TPS4         terpene synthase 04  6.04
6     AT5G61550   -            U-box domain-containing protein kinase family protein  6.02
7     AT1G21240   WAK3         wall associated kinase 3  5.69
8     AT4G24430   -            Rhamnogalacturonate lyase family protein  5.61
9     AT1G51470   TGG5         beta glucosidase 35  5.16
10    AT5G04200   MCP2f        metacaspase 9  5.09
11    AT4G10500   -            2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein  4.79
12    AT3G57260   PR2          beta-1,3-glucanase 2  4.33
13    AT1G33960   AIG1         P-loop containing nucleoside triphosphate hydrolases superfamily protein  4.00
14    AT1G26820   RNS3         ribonuclease 3  3.94
15    AT2G32680   RLP23        receptor like protein 23  3.70
16    AT2G43570   CHI          chitinase, putative  3.59
17    AT5G54370   -            Late embryogenesis abundant (LEA) protein-related  3.52
18    AT5G03260   LAC11        laccase 11  3.49
19    AT5G48920   TED7         tracheary element differentiation-related 7  3.48
20    AT5G13320   WIN3         Auxin-responsive GH3 family protein  3.44
21    AT1G52070   -            Mannose-binding lectin superfamily protein  3.39
22    AT2G14610   PR1          pathogenesis-related gene 1  3.27
23    AT1G05675   -            UDP-Glycosyltransferase superfamily protein  2.97
24    AT5G35190   EXT13        proline-rich extensin-like family protein  2.95
25    AT3G55500   EXPA16       expansin A16  2.93
26    AT2G44910   HB4          homeobox-leucine zipper protein 4  2.91
27    AT3G21550   DMP2         DUF679 domain membrane protein 2  2.86
28    AT5G52760   -            Copper transport protein family  2.85
29    AT1G58370   RXF12        glycosyl hydrolase family 10 protein / carbohydrate-binding domain-containing protein  2.75
30    AT1G43790   TED6         tracheary element differentiation-related 6  2.74
31    AT5G07080   -            HXXXD-type acyl-transferase family protein  2.68
32    AT5G52300   RD29B        CAP160 protein  2.63
33    AT1G69950   TOPHAT       SLEEPER-family domesticated transposable element  2.63
34    AT3G45860   CRK4         cysteine-rich RLK (RECEPTOR-like protein kinase) 4  2.55
35    AT2G24850   TAT3         tyrosine aminotransferase 3  2.48
36    AT1G14880   PCR1         PLANT CADMIUM RESISTANCE 1  2.48
37    AT3G25010   RLP41        receptor like protein 41  2.41
38    AT5G10760   -            Eukaryotic aspartyl protease family protein  2.39
39    AT2G19190   FRK1         FLG22-induced receptor-like kinase 1  2.36
40    AT3G54590   HRGP1        hydroxyproline-rich glycoprotein  2.30
41    AT1G02335   GL22         germin-like protein subfamily 2 member 2 precursor  2.20
42    AT2G45220   -            Plant invertase/pectin methylesterase inhibitor superfamily  2.19
43    AT2G14560   LURP1        Protein of unknown function (DUF567)  2.18
44    AT5G57220   CYP81F2      cytochrome P450, family 81, subfamily F, polypeptide 2  2.15
45    AT2G18690   -            -  2.09
46    AT4G37370   CYP81D8      cytochrome P450, family 81, subfamily D, polypeptide 8  2.07
47    AT2G26560   PLP2         phospholipase A 2A  2.06
48    AT4G23700   CHX17        cation/H+ exchanger 17  2.02
49    AT1G32100   PRR1         pinoresinol reductase 1  2.02
50    AT3G01290   HIR2         SPFH/Band 7/PHB domain-containing membrane-associated protein family  2.01
51    AT3G26210   CYP71B23     cytochrome P450, family 71, subfamily B, polypeptide 23  2.01

Down-regulated

Rank  Gene Locus  Gene Symbol  Description  Fold change
1     AT5G53902   U3B          U3B; snoRNA  -2.12

Table 8. Gene expression differences in the TOPHAT overexpression line. A total of 51 genes were up-regulated and 1 gene was down-regulated by more than 2-fold, at a q-value cutoff of 0.001.

Up-regulated

Locus  Gene description  Fold difference (OE821/WT)  Reference
AT5G54075  U3 small nucleolar RNA  11.35  -
AT3G21010  copia-like retrotransposon family  6.76  -
AT1G61120  Geranyllinalool synthase, precursor to TMTT  6.04  (Attaran et al., 2008)
AT1G21240  Wall-associated kinase  5.69  (Lakshmanan et al., 2013)
AT4G24430  Rhamnogalacturonate lyase family protein  5.61  -
AT4G10500  2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein  4.79  (Zeilmaker et al., 2015)
AT3G57260  Pathogenesis-related gene 2 (PR2) Beta-1,3-Glucanase 2  4.33  (Glazebrook, 1999)
AT1G33960  AVRRPT2-induced gene 1 (AIG1)  4.00  (Wang & Li, 2009)
AT2G43570  Putative chitinase (CHI)  3.59  (Nishimura et al., 2003)
AT5G13320  AVRPPHB-susceptible 3 (PBS3)  3.44  (Nobuta et al., 2007)
AT2G14610  Pathogenesis-related PR1 protein (PR1)  3.27  (Glazebrook, 1999)
AT3G55500  Expansin-like protein (EXPA16)  2.93  -
AT2G44910  Homeobox-leucine zipper protein 4 (HB4)  2.91  -
AT5G52760  Copper transport protein family  2.85  (Sato et al., 2007; Ascencio-Ibáñez et al., 2008)
AT3G45860  Cysteine-rich receptor-like protein kinase (CRK4)  2.55  (Chen et al., 2004; Du & Chen, 2008)
AT2G24850  Tyrosine aminotransferase (TAT3)  2.48  (Titarenko et al., 1997)
AT1G14880  Plant cadmium resistance 1 (PCR1)  2.48  -
AT3G25010  Receptor like protein 41 (RLP41)  2.41  (Ellendorff et al., 2008)
AT5G10760  Eukaryotic aspartyl protease family protein  2.39  (Breitenbach et al., 2014)
AT2G45220  Plant invertase/pectin methylesterase inhibitor superfamily  2.19  (Bethke et al., 2014)
AT2G14560  Late upregulated in response to H. parasitica (LURP1)  2.18  (Knoth & Eulgem, 2008)
AT2G18690  Unknown protein  2.09  (Whitham et al., 2003; Ascencio-Ibáñez et al., 2008)
AT2G26560  Phospholipase A 2A (PLP2)  2.06  (La Camera et al., 2009)
AT4G23700  Member of Putative Na+/H+ antiporter family (CHX17)  2.02  (Ascencio-Ibáñez et al., 2008)
AT1G32100  Pinoresinol reductase 1 (PRR1)  2.02  (Ascencio-Ibáñez et al., 2008)
AT3G01290  SPFH/Band 7/PHB domain-containing membrane-associated protein family (HIR2)  2.01  (Qi et al., 2011)

Table 9. List of genes that were differentially expressed only in the overexpression vs. wild-type comparison (not found in the knockout vs. wild-type comparison). Highlighted in dark grey are genes shown in the literature to play a core role. Genes in light grey are implicated by pathogen-plant stress transcriptomic projects or pathogen response papers as being involved in the pathways.
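The selection behind Table 9 (genes differentially expressed in the overexpression comparison but absent from the knockout comparison) is a set difference. A minimal sketch follows, assuming each comparison is available as a list of locus identifiers. The helper name is hypothetical, and only a handful of loci from Tables 6 and 8 are used for illustration.

```python
def unique_to(primary, other):
    """Return loci present in `primary` but absent from `other`, sorted for stable output."""
    return sorted(set(primary) - set(other))

# A few loci from Table 8 (overexpression vs. WT) ...
oe_up = ["AT5G54075", "AT1G52690", "AT3G21010", "AT1G47600", "AT1G61120"]
# ... and a few from Table 6 (knockout vs. WT).
ko_de = ["AT5G44120", "AT1G52690", "AT1G47600"]

oe_only = unique_to(oe_up, ko_de)
print(oe_only)
```

With these inputs, LEA7 (AT1G52690) and TGG4 (AT1G47600) drop out because they appear in both comparisons, which is consistent with their absence from Table 9.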

Biotic stress response phenotype

As sessile organisms, plants have developed a complex range of mechanisms to defend themselves against physical and biological threats. Plant resistance to biotic stresses is usually initiated by the recognition of the pathogen, which triggers local and systemic signalling pathways to mount defence mechanisms. The hypersensitive response (HR) is activated immediately and causes rapid local tissue necrosis to prevent the spread of the infection. The systemic acquired resistance (SAR) response follows to prime the plant against a broad spectrum of pathogens (Ryals et al., 1996). SAR requires salicylic acid to mediate the upregulation of PATHOGENESIS-RELATED (PR) genes, which work collaboratively to confer resistance to a range of pathogens (Zhu et al., 1994). In vivo studies in tobacco show that overexpression of PR1 leads to a significant increase in pathogen resistance (Alexander et al., 1993). As RNA-sequencing data uncovered a concentration of upregulated pathogen defence-related genes in the TOPHAT overexpression line (Table 9), it is prudent to determine whether the plants exhibit a pathogen response phenotype.

Pseudomonas syringae is an important model system for the experimental characterization of the molecular dynamics of plant-pathogen interactions (Katagiri et al., 2002). In particular, P. syringae pathovar tomato strain DC3000 is able to infect not only its natural host, tomato, but also Arabidopsis thaliana in a laboratory setting. The pathovar carries over 200 potential virulence factors (Buell et al., 2003), and has provided an understanding of the complex molecular events that allow bacterial pathogens to suppress plant immune responses (Abramovitch & Martin, 2004). A seedling flood inoculation assay using the model pathogen P. syringae was performed in triplicate to assess this phenotype (Ishiga et al., 2011). This protocol was favoured over the more typical dip-inoculation and syringe-inoculation methods as it allowed us to assay the plants at an earlier growth stage. Since the RNA-sequencing data that suggested a possible pathogen response phenotype were derived from 10-day-old seedlings, a similar growth stage may better reflect the mechanisms at play.

Transgenic lines overexpressing TOPHAT showed reduced susceptibility compared to wild-type and knockout plants (Figure 13). This phenotype corresponds to the upregulation of genes in the salicylic acid mediated signalling pathway and of defense response genes. It was not known beforehand whether this constitutive overexpression of defense genes would produce a tolerant or a susceptible phenotype: in the case of the abiotic stress genes, decreased transiency may have left the plants unable to respond to the particular stress. In this case, it appears that the overexpression of TOPHAT changes the regulation of a wealth of biotic response genes, and that their upregulation causes a slight but significant tolerance when the plants are challenged with biotic stress.

[Figure 13: bar chart titled "Bacterial growth after inoculation with P. syringae"; x-axis: Genotype (WT, Tophat-OE, Tophat-KO); y-axis: bacterial count / 100 mg (0 to 150)]

Figure 13. Bacterial growth after seedling flood inoculation using Pseudomonas syringae DC3000. The assay was performed in triplicate. Stars represent significance of p≤0.05. Bars represent 95% confidence intervals.
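The caption reports 95% confidence intervals from a triplicate assay but does not say how they were computed. One conventional approach for small samples is a t-based interval around the mean, sketched below. The function name and the bacterial counts are hypothetical and are not data from this study; the t critical values are standard two-sided 95% values.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def mean_ci95(values):
    """Return (mean, half-width of the 95% confidence interval) for a small sample."""
    n = len(values)
    if n - 1 not in T_CRIT_95:
        raise ValueError("no t critical value stored for this sample size")
    half_width = T_CRIT_95[n - 1] * stdev(values) / sqrt(n)
    return mean(values), half_width

# Hypothetical triplicate counts per genotype, for illustration only.
counts = {
    "WT": [100, 110, 120],
    "Tophat-OE": [60, 70, 65],
    "Tophat-KO": [105, 115, 110],
}
for genotype, values in counts.items():
    m, hw = mean_ci95(values)
    print(f"{genotype}: {m:.1f} +/- {hw:.1f}")
```

With only three replicates the critical value (4.303) is large, so intervals of this kind are wide unless the replicates agree closely.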

Bioinformatic approaches to the analysis of TOPHAT

Conservation of TOPHAT in plant genomes

Bioinformatic approaches can shed light on the evolutionary history of plant transposable element domestication. In particular, tracing the evolutionary history of the SLEEPER genes can provide clues to their putative molecular functions. Currently characterized plant DTEs derived from DNA transposons have various functions, and the majority act as transcription factors. TOPHAT, like all SLEEPER genes, contains three conserved domains: an N-terminal BED zinc-finger domain (Pfam: PF02892), a C-terminal hAT family dimerization domain (Pfam: PF05699), and a domain of unknown function, DUF4413 (Pfam: PF14372) (Figure 3). These are also the most conserved regions of the gene, along with a 40-bp region downstream of the BED zinc-finger domain that appears to be the most conserved continuous stretch in the SLEEPER gene family, although it has no homology to any protein domain currently described in Pfam. The dimerization domain found at the C-terminus belongs to the hAT superfamily. The isolated dimerization domain forms extremely stable dimers in vitro. Dimerization is the proposed method of activation: either the homodimerization of TOPHAT or its heterodimerization with other SLEEPER proteins (Essers et al., 2000).

Phylogenetic analyses of TOPHAT and other SLEEPER-related genes

A simple TBLASTN search of the TOPHAT protein sequence against the Arabidopsis thaliana genome shows that the gene is closely related to the previously characterized SLEEPER family of hAT DTEs (Bundock & Hooykaas, 2005). TOPHAT also showed high similarity to an additional putative hAT DTE from the initial screens, AT3G14800. To determine the distribution of the SLEEPER gene family and its phylogenetic origins, we searched for SLEEPER homologs in 14 angiosperm species with high-quality, fully sequenced genomes (Tuskan et al., 2006; Jaillon et al., 2007; Ouyang et al., 2007; Ming et al., 2008; Huang et al., 2009; Paterson et al., 2009; Hu et al., 2011; Lamesch et al., 2012; Albert et al., 2013; Hellsten et al., 2013; Slotte et al., 2013; Wu et al., 2014; Myburg et al., 2014). All 5 Arabidopsis thaliana SLEEPER genes were used as TBLASTN queries against unmasked genomes. The highest scoring BLAST hits were extracted, and protein sequences were taken from existing annotations when available or otherwise predicted with a gene prediction program (FGENESH; Solovyev et al., 2006), provided that EST data supported expression. A phylogenetic analysis of the two previously described SLEEPER genes (DAYSLEEPER and CYTOSLEEPER) has already been performed (Knip et al., 2012); with our expanded list of DTEs, we can extend that analysis and construct a more complete picture of the evolution of the SLEEPER genes. In addition to DAYSLEEPER and CYTOSLEEPER, we included three new Arabidopsis thaliana SLEEPER DTEs in the analysis (AT1G69950, AT3G14800, and AT1G80020). The resulting sequences were aligned with MUSCLE (Edgar, 2004), and a UPGMA tree was created from the multiple sequence alignment.

The resulting phylogenetic tree suggests that the SLEEPER genes were domesticated around the time of angiosperm evolution (Figure 14), an observation previously described (Knip et al., 2012) that still holds with the addition of the second branch of the SLEEPER family. No SLEEPER homologs of either subfamily were found in gymnosperm, monilophyte, or bryophyte species. At first it was thought that the two SLEEPER subfamilies may have been domesticated in separate events, since monocots only have the SLEEPER-A subfamily. However, with the genome of the basal angiosperm Amborella trichopoda now fully sequenced, we found two sequences with EST support (Amborella1 and Amborella2, Figure 14) that correspond to the two SLEEPER subfamilies. It appears that monocots lost the SLEEPER-B subfamily (containing the TOPHAT and AT3G14800 orthologs) and gained multiple copies of SLEEPER-A through whole genome duplication events in the grasses. The branches of the phylogenetic tree suggest that the SLEEPER-A genes descended from the ancestral Amborella sequence, and that the genes subsequently diversified along separate phylogenetic trajectories. The two genes in the SLEEPER-B subfamily, TOPHAT and AT3G14800, appear to have diversified from the single corresponding ancestral Amborella trichopoda gene. The two genes are present in single copy in most of the dicot genomes surveyed, placing their divergence shortly after the evolution of angiosperms, following the first domestication of the ancestral SLEEPER-B gene. The two genes diverged early but retain very high sequence identity with each other.

[Figure 14: phylogenetic tree; clade labels: SLEEPER-B, SLEEPER-A, Monocots]

Figure 14. Phylogenetic tree of the SLEEPER family. 14 angiosperm genomes were used to build this tree. As the genomes were BLASTed with all copies of the A. thaliana SLEEPER family before building the tree, the numbers following the genus are arbitrary labels for the SLEEPER genes and do not indicate which A. thaliana gene each is related to; the relationship can be inferred from the tree itself. Arabidopsis genes are annotated by TAIR naming convention, with the chromosome number following AT and the gene number following G. Brassica: Brassica rapa, Capsella: Capsella rubella, Lyrata: Arabidopsis lyrata, Vitis: Vitis vinifera, Poplar: Populus trichocarpa, Cucumis: Cucumis sativus, Eucalyptus: Eucalyptus grandis, Citrus: Citrus sinensis, Papaya: Carica papaya, Amborella: Amborella trichopoda, Oryza: Oryza sativa, Sorghum: Sorghum bicolor, Mimulus: Mimulus guttatus, AT#G#####: Arabidopsis thaliana.
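The text states that a UPGMA tree was built from the MUSCLE alignment but does not name the software or the distance measure used. The following is only a toy sketch of the UPGMA clustering idea. The p-distance measure, the function names, and the four short sequences are all assumptions for illustration, and branch lengths are omitted.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(aligned):
    """Cluster aligned sequences by UPGMA and return the topology as a Newick string."""
    # Each cluster is keyed by its Newick label and stores (label, number of leaves).
    clusters = {name: (name, 1) for name in aligned}
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        closest = min((frozenset(p) for p in combinations(clusters, 2)), key=dist.__getitem__)
        a, b = sorted(closest)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({label_a},{label_b})"
        # Distance to the merged cluster is the size-weighted average (the UPGMA rule).
        for other in clusters:
            d = (dist[frozenset((a, other))] * size_a
                 + dist[frozenset((b, other))] * size_b) / (size_a + size_b)
            dist[frozenset((merged, other))] = d
        clusters[merged] = (merged, size_a + size_b)
    (newick, _), = clusters.values()
    return newick + ";"

# Toy aligned sequences, hypothetical and far shorter than real SLEEPER proteins.
toy_alignment = {
    "seqA": "AAAAAAAA",
    "seqB": "AAAAAAAT",
    "seqC": "TTTTAAAA",
    "seqD": "TTTTAATT",
}
tree = upgma(toy_alignment)
print(tree)
```

UPGMA assumes a constant rate of change along all lineages, which is one reason the resulting topology is best read as a rough guide to relationships.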

It is likely that a single domestication event preceded and gave rise to the two SLEEPER subfamilies. Given the high similarity (42% pairwise identity, 55% positive identity) of the two Amborella trichopoda SLEEPER genes, it appears likely that they duplicated from a common ancestral gene, and that selective pressure after their diversification has preserved sequence identity. This suggests that the ancestral SLEEPER transposase was domesticated during early angiosperm evolution. So far, all fully sequenced basal angiosperm genomes contain genes from both SLEEPER subfamilies, so it appears that the genes were domesticated and have been very stably fixed in angiosperm genomes.

In accordance with the above hypothesis that SLEEPER genes interact, there appears to be a high degree of overlap in their gene expression profiles across all tissue types (Figure 12). Using the BAR Expression Angler tool (Toufighi et al., 2005), we calculated correlation coefficients between the expression of each SLEEPER gene and that of TOPHAT using the AtGenExpress Tissue Plus expression library. Besides CYTOSLEEPER (AT1G15300), which is thought to be a neofunctionalized divergent homolog of DAYSLEEPER (Knip et al., 2012), the SLEEPER genes all have similar expression profiles, with r-values greater than 0.70 (AT1G80020: 0.825; DAYSLEEPER: 0.782; AT3G14800: 0.732; CYTOSLEEPER: 0.248). Expression appears to be consistent across tissue types but highest in the apex. With such a high degree of overlap in expression and the existence of the highly conserved hAT dimerization domain, heterodimerization appears likely for the activation of the SLEEPER genes in Arabidopsis thaliana. This may also explain the pleiotropic and modest stress sensitivity phenotypes seen in our T-DNA mutant lines: the downregulation of the TOPHAT gene may affect several routes of interaction, but TOPHAT could also be a redundant piece of the gene regulation carried out by the SLEEPER genes collectively, as described in detail in the section on the synergistic model. It may be necessary to downregulate several pieces of this puzzle to see more extreme phenotypes. We speculate that the SLEEPER genes play interdependent and overlapping roles in a variety of plant stress responses.
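The r-values above were obtained with the BAR Expression Angler tool. The sketch below only illustrates the Pearson correlation statistic itself, computed between one reference expression profile and several others. The function name, gene labels, and expression values are hypothetical and do not come from AtGenExpress.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression levels across five tissues, for illustration only.
reference = [12.0, 30.0, 18.0, 45.0, 22.0]
profiles = {
    "gene_similar": [10.0, 28.0, 20.0, 40.0, 25.0],
    "gene_divergent": [40.0, 12.0, 35.0, 15.0, 30.0],
}
for name, profile in profiles.items():
    r = pearson_r(reference, profile)
    print(f"{name}: r = {r:.3f}, above 0.70: {r > 0.70}")
```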

Conclusions

In this thesis I have characterized a novel DTE discovered through a bioinformatic screen for previously unidentified DTEs. TOPHAT appears to have a role in the regulation of a host of genes, including downstream genes in biotic and abiotic stress response pathways. The knockout mutant of TOPHAT confers salt and freezing sensitivity phenotypes and shows a misregulation of genes related to those processes, while the overexpression of TOPHAT upregulates biotic response genes and causes a notable biotic tolerance phenotype. We speculate that TOPHAT acts as a gene regulatory factor in a complex combinatorial interplay with other SLEEPER DTE genes.

Future work on TOPHAT includes creating double knockout mutants with other SLEEPER mutants. Since the DAYSLEEPER mutant itself has a lethal phenotype, it is logical to start with AT3G14800, the closest paralog of TOPHAT and a member of the same SLEEPER-B subfamily, or with AT1G80020, which shares the greatest correlation in expression profile. As with MUSTANG, whose single mutants show only modest phenotypes under standard growth conditions, double mutants may reveal more extreme phenotypes, whether due to redundancy or to cooperative interaction. When grown under standard laboratory conditions, MUSTANG double mutants exhibit severe phenotypes from germination to senescence (Joly-Lopez et al., 2012).

This work is a first step in the characterization of a gene that has been overlooked because its evolutionary heritage gives it features similar to those of a transposable element, but that confers a function that has been exapted by the host. Many genes like TOPHAT are still frequently annotated as non-coding DNA and masked during genome analyses, even though multiple lines of evidence support their putative functionality. As with other known DTEs, the exact molecular role of TOPHAT remains unknown. Further work will shed light on the mechanisms by which a transposable element protein can be altered to confer a fitness advantage on the host. The characterization of DTEs may add a further layer of understanding to the complex interplay involved in the evolution of gene regulation.


Methods

Cloning

Total RNA was extracted (Qiagen) from wild-type Arabidopsis thaliana (Col-0) and treated with DNase (Ambion), and single-stranded cDNA was synthesized using SuperScript III (Invitrogen). Using primers 69950-topoF2 and 69950-topoR (Table 10), the CDS of AT1G69950 was amplified by Phusion polymerase using the GC buffer (NEB) and cloned into a pENTR/SD/D-TOPO vector (Invitrogen). This entry vector was transformed into chemically competent E. coli and the plasmids were isolated by miniprep (Qiagen). The plasmid was digested with HincII and PvuI, and the band containing the insert and cloning sites was extracted following gel electrophoresis (Qiagen). The resulting linearized DNA was recombined into pEARLEYGATE destination vectors 100, 101, 102, 103, and 104 (Earley et al., 2006; primers listed in Table 10) using LR Clonase II (Invitrogen). In these destination vectors, expression is driven by a 35S promoter, and the vectors contain various fluorescence tags. Vectors were sequenced at each stage and aligned to the A. thaliana genome sequence to ensure that no errors had been introduced during replication.

Transformation

Expression vectors were transformed into electrocompetent Agrobacterium tumefaciens strain GV3101. Flowering wild-type Arabidopsis thaliana Col-0 plants grown for 4 weeks were transformed using the floral dip procedure (Clough & Bent, 1998). Seeds from these plants (T0) were harvested, and transformed seeds were selected on agar plates supplemented with BASTA, an herbicide that selects against untransformed seeds lacking the resistance gene, and then grown on soil until seed production. Plants were genotyped with primers to confirm insertion. T2 individuals were confirmed to be homozygous by planting their T3 seeds on BASTA plates, and the seeds from those lines were selected to carry out further experiments. T2 and T3 refer to the second and third generations of the transgenic line, with T0 being the original transformed plant. Expression was confirmed by PCR using primers designed to probe a region including the insertion construct and the insertion site (Q-35S69950-LP/RP, Table 10).

Cloning primers
69950-topoF2     CACCATGGATTTGTCAGATGCAGTG
69950-topoR      AGAGCTTTCTAGTTCGCTTTGGAT

Genotyping primers
Seq1-R           TGTGTCACAACAAGCTTCCAG
Seq2-F           GATTCCCTGGAAGCTTGTTG
Seq3-F           GTGGTGAGATGCTTGCTGAA
DNA-BD-R         TTTCGTTTTAAAACCTAAGAGTC
T7-F             TAATACGACTCACTATAGGG
DEST100-LP       ATCTCTCTGCCGACAGTGGT
DEST100-RP       TTAGGTTTGACCGGTTCTGC

RT-PCR primers
BAR11-LP         GCCAACATGGGAGTCCAAGA
BAR11-RP         ATCGATGAGCCCAGAACGAC
Actin-F          CATCAGGAAGGACTTGTACGG
Actin-R          GATGGACCTGACTCGTCATAC
Q-35S69950-LP    CGGCCGCCTTGTTTAACTTT
Q-35S69950-RP    TGAGCCGCTAAGTCGCTTCT

Table 10. Table of primers used in this study.

Salt assay

Mutant, wild-type, and overexpression line seeds were vapor-sterilized with chlorine gas (Bent et al. 2000), plated out onto ½ MS, 1% sucrose, 0.8% agar media and vernalized at 4°C for 3 days. Seeds were allowed to grow at standard growth conditions for 4 days, transferred to ½ MS plates supplemented with 160 mM of NaCl, and allowed to grow for an additional 8 days. The plates were then imaged with Lemnatec and scored for survivability. Three independent experimental replicates were performed. Percentage survival was represented as means ± SE.
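The mean ± SE summary described above can be sketched as follows. This is a minimal illustration, assuming one survival percentage per independent replicate; the function name and the replicate values are hypothetical placeholders and are not data from this study.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for replicate values."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate SE")
    mean = statistics.mean(values)
    # SE = sample standard deviation / sqrt(number of replicates)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical percentage survival from three independent replicates.
example_survival = [40.0, 50.0, 60.0]
mean, se = mean_and_se(example_survival)
print(f"{mean:.1f} ± {se:.1f} % survival")
```

With only three replicates the SE is a rough estimate, so it is best reported alongside the number of replicates.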


Phenotyping

We measured seven traits that typically reflect plant fitness: seed germination rate, height, rosette diameter, time of first flowering, silique count, seed count, and number of aborted seeds. Mutant, wild-type, and overexpression line seeds were vapor-sterilized with chlorine gas (Bent et al. 2000), plated out onto ½ MS, 1% sucrose, 0.8% agar media and vernalized at 4°C for 3 days. After 7 days of growth in regular conditions, plants were transferred to potted soil. Measurements were taken every 3 days until the end of the plant growth cycle.

Root length assay

Mutant, wild-type, and overexpression line seeds were vapor-sterilized with chlorine gas (Bent et al. 2000), plated out onto ½ MS, 1% sucrose, 0.8% agar media and vernalized at 4°C for 3 days. After 4 days of growth in regular conditions, plants were transferred to plates containing different salts. The root tip of each seedling was marked with a black marker. Plants were left to grow for an additional 7 days, and images were taken with a camera mounted at a fixed distance on a tripod. RootTrace was used to analyze the total length of root growth from the initial point noted by the marker.

Germination rate assay

Mutant, wild-type, and overexpression line seeds were vapor-sterilized with chlorine gas (Bent et al. 2000), plated out onto ½ MS, 1% sucrose, 0.8% agar media containing a variable amount of NaCl (0, 80, 120, and 160 mM), and vernalized at 4°C for 3 days. Seeds were left to germinate, and the germination rate, determined by the presence of green tissue after 3 days, was recorded.


RNA-SEQ pipeline

Growth and sample preparation

Seeds from wild-type, knockout, and overexpression genotypes were vapor-sterilized with chlorine gas (Bent et al. 2000), plated onto ½ MS, 0.8% agar, 1% sucrose media, and vernalized at 4°C for 3 days. The three genotypes were plated onto the same plate to ensure equal growth conditions. Seeds were allowed to grow at standard growth conditions for 7 days. At the end of this growth period, 10 seedlings per sample were extracted together to reduce biological noise. Two biological replicates per genotype were extracted with the Qiagen RNeasy Plant Mini Kit and treated with Ambion DNA-free to remove contaminating DNA. These samples were sent to Genome Quebec for TruSeq library preparation and subjected to Illumina HiSeq 100 bp single-end sequencing.

Bioinformatics

Reads were analyzed against the TAIR9 version of the Arabidopsis thaliana genome. Reads were aligned using TopHat and expression differences were analyzed using the Cuffdiff module in Cufflinks (Trapnell et al., 2012). Differential expression was assessed using an FDR-adjusted q-value, and genes significant at q < 0.05 are discussed. We also examined genes using a more stringent q-value cutoff (0.001). GO categories were independently determined using GOSEQ, to correct for length biases, and the agriGO SEA tool with TAIR10 annotations.
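The q-value filtering described above can be illustrated with a short sketch. It assumes a tab-separated table that uses column names from Cuffdiff's gene_exp.diff output (gene_id, status, log2(fold_change), q_value). The gene identifiers and values are hypothetical, the function name is illustrative, and treating a 2-fold change as |log2 fold change| ≥ 1 is an assumption of this sketch, not a description of the exact script used in this work.

```python
import csv
import io

# Hypothetical rows in the tab-separated layout of a Cuffdiff gene_exp.diff file.
EXAMPLE_DIFF = """gene_id\tsample_1\tsample_2\tstatus\tlog2(fold_change)\tq_value
GENE_A\tWT\tKO\tOK\t1.8\t0.0004
GENE_B\tWT\tKO\tOK\t-0.4\t0.03
GENE_C\tWT\tKO\tOK\t2.5\t0.20
GENE_D\tWT\tKO\tNOTEST\t0\t1
"""


def significant_genes(handle, q_cutoff=0.05, min_abs_log2fc=0.0):
    """Return gene IDs passing the q-value and fold-change thresholds."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK":  # skip genes that were not tested
            continue
        q = float(row["q_value"])
        log2fc = float(row["log2(fold_change)"])
        if q < q_cutoff and abs(log2fc) >= min_abs_log2fc:
            hits.append(row["gene_id"])
    return hits


# Relaxed set: q < 0.05, any fold change.
relaxed = significant_genes(io.StringIO(EXAMPLE_DIFF), q_cutoff=0.05)
# Stricter set: q < 0.001 and at least a 2-fold change (|log2 FC| >= 1).
strict = significant_genes(
    io.StringIO(EXAMPLE_DIFF), q_cutoff=0.001, min_abs_log2fc=1.0
)
print(relaxed, strict)
```

Applying the two thresholds through one function keeps the relaxed and stringent gene lists directly comparable.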

Bacterial infection assay

Pseudomonas syringae pv. tomato strain DC3000 (PstDC3000) was grown overnight in King's B medium containing 15 μg rifampicin per ml at 28°C with shaking at 220 rpm. Cells were collected by centrifugation at 2500 g and resuspended in sterile water with 0.025% Silwet L-77 to approximately 5 × 10⁶ cfu/ml, which corresponds to an OD600 of 1.0 diluted 1:100. The bacterial suspension was dispensed onto 1-week-old seedlings growing on ½ MS plates, left to incubate for 3 minutes, and then decanted. Infected plants were kept in a growth chamber under a 16 h light/8 h dark cycle at 22°C for 7 days, and then bacterial growth was measured.

To measure bacterial growth, 4 seedlings were surface sterilized with 5% H2O2 and homogenized in 10 mL sterile H2O using a mortar and pestle. 1:10 serial dilutions were created from this homogenate and 10 μl were spotted 6 times on King's B plates with 15 μg rifampicin per ml. Plates were incubated at 28°C for 48 h, and the dilutions having between 10 and 100 colonies were counted and normalized by the total weight of the seedlings. The experiment was performed in triplicate.
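The normalization of colony counts can be sketched as below. This is an illustrative calculation only: the function name, the choice of milligrams as the weight unit, and the example numbers are assumptions, while the 10 μl spot volume, 10 mL homogenate volume, 1:10 dilution steps, and 10 to 100 colony counting window follow the protocol above.

```python
def cfu_per_mg(colonies, dilution_step, seedling_weight_mg,
               spot_volume_ml=0.010, homogenate_volume_ml=10.0):
    """Estimate colony-forming units per mg of seedling tissue.

    colonies is the (mean) colony count per spot, and dilution_step is the
    number of 1:10 dilutions applied to the homogenate before spotting
    (0 means undiluted homogenate).
    """
    if not 10 <= colonies <= 100:
        raise ValueError("count only dilutions giving 10-100 colonies")
    # Scale the spot count back to the undiluted homogenate concentration.
    cfu_per_ml_homogenate = colonies / spot_volume_ml * 10 ** dilution_step
    # Total bacteria recovered from all seedlings in the homogenate.
    total_cfu = cfu_per_ml_homogenate * homogenate_volume_ml
    return total_cfu / seedling_weight_mg


# Hypothetical example: 50 colonies per spot at the third 1:10 dilution,
# from seedlings weighing 20 mg in total.
example = cfu_per_mg(colonies=50, dilution_step=3, seedling_weight_mg=20.0)
print(f"{example:.2e} cfu/mg")
```

Bacterial titres of this kind are commonly compared on a log10 scale, which would be a one-line transformation of the returned value.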

Phylogeny

Phylogenetic analyses were conducted on the amino acid sequences of TOPHAT and other related SLEEPER sequences in plant genomes. These sequences were found using a TBLASTN query on unmasked, fully sequenced plant genomes, and the highest scoring BLAST hits were extracted. If a gene annotation did not exist, the amino acid sequence was predicted using a gene prediction program (FGENESH). The resulting list of sequences was aligned with MUSCLE and a UPGMA tree was created from the multiple sequence alignment.
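To illustrate the clustering principle behind a UPGMA tree, the following minimal sketch builds a tree topology from a pairwise distance table. It is not the software used to generate the tree in this thesis. The taxon names and distances are hypothetical, branch lengths are omitted, and in practice the distances would be derived from the multiple sequence alignment.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree topology as a nested tuple.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    # Each active cluster label (name or nested tuple) -> number of leaves.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical distances between four sequences.
taxa = ["A", "B", "C", "D"]
example_dist = {
    frozenset(pair): value
    for pair, value in {
        ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
        ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
    }.items()
}
tree = upgma(taxa, example_dist)
print(tree)
```

Because UPGMA assumes a constant rate of change across lineages, the resulting topology is best read as a clustering of similar sequences.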


Works cited

Abramovitch, R.B. & Martin, G.B. 2004. Strategies used by bacterial pathogens to suppress plant defenses. Curr. Opin. Plant Biol. 7: 356–64.

Albert, V.A., Barbazuk, W.B., DePamphilis, C.W., Der, J.P., Leebens-Mack, J., Ma, H., et al. 2013. The Amborella genome and the evolution of flowering plants. Science 342: 1241089.

Alexander, D., Goodman, R.M., Gut-Rella, M., Glascock, C., Weymann, K., Friedrich, L., et al. 1993. Increased tolerance to two oomycete pathogens in transgenic tobacco expressing pathogenesis-related protein 1a. PNAS 90: 7327–7331.

Aravin, A.A., Hannon, G.J. & Brennecke, J. 2007. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318: 761–4.

Ascencio-Ibáñez, J.T., Sozzani, R., Lee, T.-J., Chu, T.-M., Wolfinger, R.D., Cella, R., et al. 2008. Global analysis of Arabidopsis gene expression uncovers a complex array of changes impacting pathogen response and cell cycle during geminivirus infection. Plant Physiol. 148: 436–54.

Attaran, E., Rostás, M. & Zeier, J. 2008. Pseudomonas syringae elicits emission of the terpenoid (E,E)-4,8,12-trimethyl-1,3,7,11-tridecatetraene in Arabidopsis leaves via jasmonate signaling and expression of the terpene synthase TPS4. Mol. Plant. Microbe. Interact. 21: 1482–97.

Battaglia, M. & Covarrubias, A.A. 2013. Late Embryogenesis Abundant (LEA) proteins in legumes. Front. Plant Sci. 4.

Bergler, J. & Hoth, S. 2011. Plant U-box armadillo repeat proteins AtPUB18 and AtPUB19 are involved in salt inhibition of germination in Arabidopsis. Plant Biol. 13: 725–30.

Bethke, G., Grundman, R.E., Sreekanta, S., Truman, W., Katagiri, F. & Glazebrook, J. 2014. Arabidopsis PECTIN METHYLESTERASEs contribute to immunity against Pseudomonas syringae. Plant Physiol. 164: 1093–107.

Breitenbach, H.H., Wenig, M., Wittek, F., Jordá, L., Maldonado-Alconada, A.M., Sarioglu, H., et al. 2014. Contrasting Roles of the Apoplastic Aspartyl Protease APOPLASTIC, ENHANCED DISEASE SUSCEPTIBILITY1-DEPENDENT1 and LEGUME LECTIN-LIKE PROTEIN1 in Arabidopsis Systemic Acquired Resistance. Plant Physiol. 165: 791–809.

Britten, R. 1996. Cases of ancient mobile element DNA insertions that now affect gene regulation. Mol. Phylogenet. Evol. 5: 13–17.


Buell, C.R., Joardar, V., Lindeberg, M., Selengut, J., Paulsen, I.T., Gwinn, M.L., et al. 2003. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. PNAS 100: 10181–6.

Bundock, P. & Hooykaas, P. 2005. An Arabidopsis hAT-like transposase is essential for plant development. Nature 436: 282–4.

Calvi, B.R., Hong, T.J., Findley, S.D. & Gelbart, W.M. 1991. Evidence for a Common Evolutionary Origin of Inverted Repeat Transposons in Drosophila and Plants: hobo, Activator, and Tam3. Cell 66: 465–471.

Casimiro, I., Beeckman, T., Graham, N., Bhalerao, R., Zhang, H., Casero, P., et al. 2003. Dissecting Arabidopsis lateral root development. Trends Plant Sci. 8: 165–71.

Castel, S.E. & Martienssen, R.A. 2013. RNA interference in the nucleus: roles for small RNAs in transcription, epigenetics and beyond. Nat. Rev. Genet. 14: 100–12.

Chen, J.-H., Jiang, H.-W., Hsieh, E.-J., Chen, H.-Y., Chien, C.-T., Hsieh, H.-L., et al. 2012. Drought and salt stress tolerance of an Arabidopsis glutathione S-transferase U17 knockout mutant are attributed to the combined effect of glutathione and abscisic acid. Plant Physiol. 158: 340–51.

Chen, K., Fan, B., Du, L. & Chen, Z. 2004. Activation of hypersensitive cell death by pathogen-induced receptor-like protein kinases from Arabidopsis. Plant Mol. Biol. 56: 271–83.

Chinnusamy, V., Ohta, M., Kanrar, S., Lee, B., Hong, X., Agarwal, M., et al. 2003. ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev. 17: 1043–54.

Clough, S. & Bent, A. 1998. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 16: 735–743.

Cowan, R.K., Hoen, D.R., Schoen, D.J. & Bureau, T.E. 2005. MUSTANG is a novel family of domesticated transposase genes found in diverse angiosperms. Mol. Biol. Evol. 22: 2084–9.

De la Chaux, N., Tsuchimatsu, T., Shimizu, K.K. & Wagner, A. 2012. The predominantly selfing plant Arabidopsis thaliana experienced a recent reduction in transposable element abundance compared to its outcrossing relative Arabidopsis lyrata. Mob. DNA 3: 2.

Deak, K.I. & Malamy, J. 2005. Osmotic regulation of root system architecture. Plant J. 43: 17–28.

Doolittle, W.F. & Sapienza, C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601–3.


Du, L. & Chen, Z. 2008. Identification of genes encoding receptor-like protein kinases as possible targets of pathogen- and salicylic acid-induced WRKY DNA-binding proteins in Arabidopsis. Plant J. 24: 837–847.

Earley, K.W., Haag, J.R., Pontes, O., Opper, K., Juehne, T., Song, K., et al. 2006. Gateway-compatible vectors for plant functional genomics and proteomics. Plant J. 45: 616–29.

Edgar, R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–7.

Ellendorff, U., Zhang, Z. & Thomma, B.P. 2008. Gene silencing to investigate the roles of receptor-like proteins in Arabidopsis. Plant Signal. Behav. 3: 893–6.

Essers, L., Adolphs, R.H. & Kunze, R. 2000. A highly conserved domain of the maize activator transposase is involved in dimerization. Plant Cell 12: 211–24.

Feschotte, C. 2008. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9: 397–405.

Feschotte, C. & Pritham, E.J. 2007. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 41: 331–68.

Fitter, A.H. & Stickland, T.R. 1991. Architectural analysis of plant root systems 2. Influence of nutrient supply on architecture in contrasting plant species. New Phytol. 118: 383–389.

French, A., Ubeda-Tomás, S., Holman, T.J., Bennett, M.J. & Pridmore, T. 2009. High-throughput quantification of root growth using a novel image-analysis tool. Plant Physiol. 150: 1784–95.

Glazebrook, J. 1999. Genes controlling expression of defense responses in Arabidopsis. Curr. Opin. Plant Biol. 2: 280–6.

Haudry, A., Platts, A.E., Vello, E., Hoen, D.R., Leclercq, M., Williamson, R.J., et al. 2013. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–8.

Hazelett, D.J., Chang, J.-C., Lakeland, D.L. & Morton, D.B. 2012. Comparison of parallel high-throughput RNA sequencing between knockout of TDP-43 and its overexpression reveals primarily nonreciprocal and nonoverlapping gene expression changes in the central nervous system of Drosophila. G3 2: 789–802.

Hehl, R., Nacken, W.K.F., Krause, A., Saedler, H. & Sommer, H. 1991. Structural analysis of Tam3, a transposable element from Antirrhinum majus, reveals homologies to the Ac element from maize. Plant Mol. Biol. 16: 369–371.


Hellsten, U., Wright, K.M., Jenkins, J., Shu, S., Yuan, Y., Wessler, S.R., et al. 2013. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. PNAS 110: 19478–82.

Hoen, D.R., Park, K.C., Elrouby, N., Yu, Z., Mohabir, N., Cowan, R.K., et al. 2006. Transposon-mediated expansion and diversification of a family of ULP-like genes. Mol. Biol. Evol. 23: 1254–68.

Hu, T.T., Pattyn, P., Bakker, E.G., Cao, J., Cheng, J.-F., Clark, R.M., et al. 2011. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat. Genet. 43: 476–81.

Huang, S., Li, R., Zhang, Z., Li, L., Gu, X., Fan, W., et al. 2009. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41: 1275–81.

Hudson, M.E., Lisch, D.R. & Quail, P.H. 2003. The FHY3 and FAR1 genes encode transposase-related proteins involved in regulation of gene expression by the phytochrome A-signaling pathway. Plant J. 34: 453–71.

Hundertmark, M. & Hincha, D.K. 2008. LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana. BMC Genomics 9: 118.

Ishiga, Y., Ishiga, T., Uppalapati, S.R. & Mysore, K.S. 2011. Arabidopsis seedling flood-inoculation technique: a rapid and reliable assay for studying plant-bacterial interactions. Plant Methods 7: 32.

Ito, H. & Kakutani, T. 2014. Control of transposable elements in Arabidopsis thaliana. Chromosome Res. 22: 217–23.

Jaillon, O., Aury, J.-M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–7.

Joly-Lopez, Z., Forczek, E., Hoen, D.R., Juretic, N. & Bureau, T.E. 2012. A gene family derived from transposable elements during early angiosperm evolution has reproductive fitness benefits in Arabidopsis thaliana. PLoS Genet. 8: e1002931.

Juretic, N., Hoen, D.R., Huynh, M.L., Harrison, P.M. & Bureau, T.E. 2005. The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15: 1292–7.

Kapitonov, V. V & Jurka, J. 2005. RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons. PLoS Biol. 3: e181.

Kapitonov, V. V & Jurka, J. 2001. Rolling-circle transposons in eukaryotes. PNAS 98: 8714–9.


Katagiri, F., Thilmony, R. & He, S.Y. 2002. The Arabidopsis thaliana-Pseudomonas syringae interaction. Arabidopsis Book 1: e0039.

Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., et al. 2002. The Human Genome Browser at UCSC. Genome Res. 12: 996–1006.

Knip, M., de Pater, S. & Hooykaas, P.J.J. 2012. The SLEEPER genes: a transposase-derived angiosperm-specific gene family. BMC Plant Biol. 12: 192.

Knip, M., Hiemstra, S., Sietsma, A., Castelein, M., de Pater, S. & Hooykaas, P. 2013. DAYSLEEPER: a nuclear and vesicular-localized protein that is expressed in proliferating tissues. BMC Plant Biol. 13: 211.

Knoth, C. & Eulgem, T. 2008. The oomycete response gene LURP1 is required for defense against Hyaloperonospora parasitica in Arabidopsis thaliana. Plant J. 55: 53–64.

La Camera, S., Balagué, C., Göbel, C., Geoffroy, P., Legrand, M., Feussner, I., et al. 2009. The Arabidopsis patatin-like protein 2 (PLP2) plays an essential role in cell death execution and differentially affects biosynthesis of oxylipins and resistance to pathogens. Mol. Plant. Microbe. Interact. 22: 469–81.

Lakshmanan, V., Castaneda, R., Rudrappa, T. & Bais, H.P. 2013. Root transcriptome analysis of Arabidopsis thaliana exposed to beneficial Bacillus subtilis FB17 rhizobacteria revealed genes for bacterial recruitment and plant defense independent of malate efflux. Planta 238: 657–68.

Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., et al. 2012. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40: D1202–10.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.

Lata, C. & Prasad, M. 2011. Role of DREBs in regulation of abiotic stress responses in plants. J. Exp. Bot. 62: 4731–48.

Liu, J. & Zhu, J.K. 1998. A calcium sensor homolog required for plant salt tolerance. Science 280: 1943–1945.

Maruyama, K., Sakuma, Y., Kasuga, M., Ito, Y., Seki, M., Goda, H., et al. 2004. Identification of cold-inducible downstream genes of the Arabidopsis DREB1A/CBF3 transcriptional factor using two microarray systems. Plant J. 38: 982–93.

McClintock, B. 1950. The origin and behavior of mutable loci in maize. PNAS 36: 344–355.


Miller, W.J., McDonald, J.F., Nouaud, D. & Anxolabéhère, D. 1999. Molecular domestication--more than a sporadic episode in evolution. Genetica 107: 197–207.

Ming, R., Hou, S., Feng, Y., Yu, Q., Dionne-Laporte, A., Saw, J.H., et al. 2008. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452: 991–6.

Myburg, A.A., Grattapaglia, D., Tuskan, G.A., Hellsten, U., Hayes, R.D., Grimwood, J., et al. 2014. The genome of Eucalyptus grandis. Nature 510: 356–62.

Naito, K., Zhang, F., Tsukiyama, T., Saito, H., Hancock, C.N., Richardson, A.O., et al. 2009. Unexpected consequences of a sudden and massive transposon amplification on rice gene expression. Nature 461: 1130–4.

Nakashima, K., Fujita, Y., Katsura, K., Maruyama, K., Narusaka, Y., Seki, M., et al. 2006. Transcriptional regulation of ABI3- and ABA-responsive genes including RD29B and RD29A in seeds, germinating embryos, and seedlings of Arabidopsis. Plant Mol. Biol. 60: 51–68.

Nishimura, M.T., Stein, M., Hou, B.-H., Vogel, J.P., Edwards, H. & Somerville, S.C. 2003. Loss of a callose synthase results in salicylic acid-dependent disease resistance. Science 301: 969–72.

Nobuta, K., Okrent, R.A., Stoutemyer, M., Rodibaugh, N., Kempema, L., Wildermuth, M.C., et al. 2007. The GH3 acyl adenylase family member PBS3 regulates salicylic acid-dependent defense responses in Arabidopsis. Plant Physiol. 144: 1144–56.

Novillo, F., Medina, J. & Salinas, J. 2007. Arabidopsis CBF1 and CBF3 have a different function than CBF2 in cold acclimation and define different gene classes in the CBF regulon. PNAS 104: 21002–7.

Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., et al. 2007. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 35: D883–7.

Parida, A.K. & Das, A.B. 2005. Salt tolerance and salinity effects on plants: a review. Ecotoxicol. Environ. Saf. 60: 324–49.

Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–6.

Qi, Y., Tsuda, K., Nguyen, L. V, Wang, X., Lin, J., Murphy, A.S., et al. 2011. Physical association of Arabidopsis hypersensitive induced reaction proteins (HIRs) with the immune receptor RPS2. J. Biol. Chem. 286: 31297–307.


Rebollo, R., Horard, B., Hubert, B. & Vieira, C. 2010. Jumping genes and epigenetics: Towards new species. Gene 454: 1–7.

Rubin, E., Lithwick, G. & Levy, A.A. 2001. Structure and evolution of the hAT transposon superfamily. Genetics 158: 949–957.

Ryals, J.A., Neuenschwander, U.H., Willits, M.G., Molina, A., Steiner, H.Y. & Hunt, M.D. 1996. Systemic Acquired Resistance. Plant Cell 8: 1809–1819.

Sanmiguel, P. & Bennetzen, J. 1998. Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann. Bot. 82: 37–44.

Sato, M., Mitra, R.M., Coller, J., Wang, D., Spivey, N.W., Dewdney, J., et al. 2007. A high-performance, small-scale microarray for expression profiling of many samples in Arabidopsis-pathogen studies. Plant J. 49: 565–77.

Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M., et al. 2005. A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37: 501–6.

Sijen, T. & Plasterk, R.H.A. 2003. Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426: 310–4.

Sinzelle, L., Izsvák, Z. & Ivics, Z. 2009. Molecular domestication of transposable elements: from detrimental parasites to useful host genes. Cell. Mol. Life Sci. 66: 1073–93.

Slotte, T., Hazzouri, K.M., Ågren, J.A., Koenig, D., Maumus, F., Guo, Y.-L., et al. 2013. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat. Genet. 45: 831–5.

Solovyev, V., Kosarev, P., Seledsov, I. & Vorobyev, D. 2006. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1: S10.1–12.

The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.

Thomashow, M.F. 2010. Molecular basis of plant cold acclimation: insights gained from studying the CBF cold response pathway. Plant Physiol. 154: 571–7.

Thorlby, G., Fourrier, N. & Warren, G. 2004. The SENSITIVE TO FREEZING2 gene, required for freezing tolerance in Arabidopsis thaliana, encodes a beta-glucosidase. Plant Cell 16: 2192–2203.


Titarenko, E., Rojo, E., León, J. & Sánchez-Serrano, J.J. 1997. Jasmonic acid-dependent and -independent signaling pathways control wound-induced gene activation in Arabidopsis thaliana. Plant Physiol. 115: 817–26.

Tonegawa, S. 1983. Somatic generation of antibody diversity. Nature 302: 575–581.

Toufighi, K., Brady, S.M., Austin, R., Ly, E. & Provart, N.J. 2005. The Botany Array Resource: e-Northerns, Expression Angling, and promoter analyses. Plant J. 43: 153–63.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., et al. 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7: 562–78.

Tuskan, G.A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–604.

Van der Weele, C.M. 2000. Growth of Arabidopsis thaliana seedlings under water deficit studied by control of water potential in nutrient-agar media. J. Exp. Bot. 51: 1555–1562.

Wang, Z. & Li, X. 2009. IAN/GIMAPs are conserved and novel regulators in vertebrates and angiosperm plants. Plant Signal. Behav. 4: 165–7.

Whitham, S.A., Quan, S., Chang, H.-S., Cooper, B., Estes, B., Zhu, T., et al. 2003. Diverse RNA viruses elicit the expression of common sets of genes in susceptible Arabidopsis thaliana plants. Plant J. 33: 271–83.

Wu, G.A., Prochnik, S., Jenkins, J., Salse, J., Hellsten, U., Murat, F., et al. 2014. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat. Biotechnol. 32: 656–62.

Xiong, L., Lee, B.H., Ishitani, M., Lee, H., Zhang, C. & Zhu, J.K. 2001. FIERY1 encoding an inositol polyphosphate 1-phosphatase is negative regulator of abscisic acid and stress signaling in Arabidopsis. Genes Dev. 15: 1971–1984.

Zeilmaker, T., Ludwig, N.R., Elberse, J., Seidl, M.F., Berke, L., Van Doorn, A., et al. 2015. DOWNY MILDEW RESISTANT 6 and DMR6-LIKE OXYGENASE 1 are partially redundant but distinct suppressors of immunity in Arabidopsis. Plant J. 81: 210–22.

Zhou, L., Mitra, R., Atkinson, P.W., Hickman, A.B., Dyda, F. & Craig, N.L. 2004. Transposition of hAT elements links transposable elements and V(D)J recombination. Nature 432: 995–1001.


Zhu, Q., Maher, E.A., Masoud, S., Dixon, R.A. & Lamb, C.J. 1994. Enhanced Protection Against Fungal Attack by Constitutive Co-expression of Chitinase and Glucanase Genes in Transgenic Tobacco. Bio/Technology 12: 807–812.


List of figures

Figure 1. Expression of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g02050) (B). Images were visualized using the AtGenExpress Visualization Tool (Schmid et al., 2005). The individual points represent different samples of the tissue type, available as a list from Table S1 in the cited paper. The y-axis represents normalized signal intensity of the microarray. The signal intensity around 5 likely represents background noise, as this value is consistently the minimum in this microarray data set.

Figure 2. mRNA and small RNA expression of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g05020) (B). The browser tracks shown are taken from an adapted version of the UCSC Genome Browser (http://mustang.biol.mcgill.ca:8885/) (Haudry et al., 2013). The tracks represent: (1) in maroon, Arabidopsis thaliana gene models from TAIR9, (2) mRNA expression aggregated from several NCBI SRA experiments (SRR013411, SRR019183, SRR019209, SRR064165, SRR309186), with black representing average expression calculated from the combined data of the archived mRNA sequences, (3) known small non-coding RNAs aggregated by the Arabidopsis thaliana Small RNA Project (http://asrp.danforthcenter.org/).

Figure 3. Locations of T-DNA insertions in the context of the TOPHAT gene. There is an intron in the 5’ UTR where one of the T-DNA insertions lies. The remaining three are located in the middle of the CDS. The conserved PFAM domains are highlighted in green boxes: (1) BED zinc finger, (2) domain of unknown function (DUF4413), (3) hAT-C dimerization domain.

Figure 4. Synteny of TOPHAT (At1g69950) (A) compared to a typical annotated hAT transposable element (At2g05020) (B). The browser tracks shown are taken from an adapted version of the UCSC Genome Browser (http://mustang.biol.mcgill.ca:8885/) (Haudry et al., 2013). The tracks represent: (1) in maroon, Arabidopsis thaliana gene models from TAIR9, (2) multiple sequence alignments showing conservation in Brassicaceae, represented by the best orthologous chains between Arabidopsis thaliana and the respective cruciferous species, (3) conservation in Brassicaceae broken down into the three groups of non-overlapping chains (B. rapa and L. alabamica) because of the large genomes due to whole-genome triplication.


Figure 5. Basic morphology measurements in Arabidopsis thaliana plants grown at standard conditions (21°C, 18 h/6 h day/night). (A) Average rosette diameter, (B) average height.

Figure 6. The results of TOPHAT T-DNA mutant lines in the abiotic stress tolerance screens from the VEGI project. Salt tolerance assay was designed and performed by Zoé Joly-Lopez and Dr. Ewa Forczek, respectively. Reference genes were chosen from literature to provide a benchmark for the effectiveness of the assay. The reference gene shown for salt is SOS3 (SALT OVERLY SENSITIVE 3); for freezing, SFR2 (SENSITIVE TO FREEZING 2). Reference 1 and 2 refer to the two runs of the experiment (3 replicates).

Figure 7. Representative pictures from the freezing tolerance assay. (A) TOPHAT knockout (B) Wild-type. Experiments were conducted by Dr. Ewa Forczek. Seedlings were subjected to a freezing protocol and allowed to recover. Survivability was scored based on seedlings having 50% green after recovery. As seen in (A), many seedlings cannot recover from the treatment and remain yellow, whereas more plants are able to recover and continue growth in (B).

Figure 8. Root length of the three genotypes grown in different osmotic treatments. Assays were performed over a spectrum of concentrations and a representative non-lethal but growth inhibitory concentration is shown. Concentrations are as follows: 50 mM NaCl, 100 mM KCl, 50 mM mannitol, 10 mM LiCl. Stars represent significance at p≤0.05.

Figure 9. Germination rate of the three genotypes on NaCl plates of increasing concentration. Stars represent significance at p≤0.05.

Figure 10. Summary of the RNA-SEQ data generated from the sequenced samples. OE-821 refers to the 35S overexpression lines, KO-S69_1 refers to the T-DNA mutant line, and WT is wild-type Col-0 used as control. FPKM refers to the fragments per kilobase of exon per million fragments mapped. The genome browser column represents a visual depiction of the FPKM readings at each individual base.

Figure 11. Synergistic model to explain the gene expression differences caused by overexpressing and knocking out TOPHAT in Arabidopsis thaliana. Pointed arrows represent an upregulation or influence on the pathway, and blunt arrows represent a downregulation or inhibition of the pathway.


Figure 12. Expression profiles of SLEEPER genes. Aside from AT1G15300 (CYTOSLEEPER), the SLEEPER genes show striking similarities in cell type localization, supporting the hetero-dimerization and additive effect hypothesis. A-C represent the SLEEPER-A subfamily and D-E represent the SLEEPER-B subfamily: (A) AT1G15300 (CYTOSLEEPER) (B) AT1G80020 (C) AT3G42170 (DAYSLEEPER) (D) AT1G69950 (TOPHAT) (E) AT3G14800. The tissues include root, stem, leaf, whole plant, apex, flowers, floral organs, and seeds. Images were visualized using the AtGenExpress Visualization Tool (Schmid et al., 2005).

Figure 13. Bacterial growth after seedling flood inoculation using Pseudomonas syringae DC3000. The assay was performed in triplicate. Stars represent significance at p≤0.05. Bars represent 95% confidence intervals.

Figure 14. Phylogenetic tree of the SLEEPER family. 14 angiosperm genomes were used to build this tree. As the genomes were BLASTed with all copies of the A. thaliana SLEEPER family before building the tree, the numbers following the genus name label the SLEEPER genes regardless of which gene they are related to; the relationship can be inferred from the tree itself. Arabidopsis genes are annotated by TAIR naming convention, with the chromosome number following AT and the gene number following G.

List of tables

Table 1. Comparison of Arabidopsis thaliana TOPHAT amino acid coding sequence to its orthologs in other fully sequenced species ...... 20

Table 2. Summary of the RNA-SEQ data generated from the sequenced samples. HiSeq 2500 produced paired-end reads of approximately 100 bp. Alignment was performed using TopHat on a reference TAIR9 A. thaliana genome with transposon annotations included. OE-821 refers to the 35S overexpression lines, KO-S69_1 refers to the T-DNA mutant line, and WT refers to Arabidopsis thaliana Col-0 used as control, the same line from which the overexpression mutants were created ...... 27

Table 3. Summary of differentially expressed genes amongst the three genotypes ...... 29

Table 4. GO terms enriched in the upregulated genes shared by the knockout mutant and overexpression genotypes compared with wild-type. In bold are GO terms that are common with genes enriched in the overexpression genotype compared with the wild-type (Table 7) ...... 35

Table 5. GO terms enriched in the knockout mutant genotype compared with wild-type. Highlighted are GO terms that are related to the observed phenotype ...... 38

Table 6. Gene expression differences in the TOPHAT knockout mutant line. A total of 157 genes were up-regulated and 27 genes were down-regulated by more than 2-fold, at a q-value cutoff of 0.001 ...... 42

Table 7. GO terms enriched in the overexpression genotype compared with wild-type. All terms appear to be related to biotic stress response ...... 43

Table 8. Gene expression differences in the TOPHAT overexpression line. A total of 51 genes were up-regulated and 1 gene was down-regulated by more than 2-fold, at a q-value cutoff of 0.001 ...... 44

Table 9. List of genes that were differentially expressed in only the overexpression vs. wild-type comparison (not found in knockout vs. wild-type comparison). Highlighted in dark grey are genes shown in the literature to play a core role. Genes in light grey are implicated in the pathways by pathogen-plant stress transcriptomic projects or pathogen response papers ...... 45

Table 10. Table of primers used in this study ...... 54