The Pennsylvania State University

The Graduate School

Intercollege Graduate Degree Program in Plant Biology

GENOME-WIDE ANALYSIS OF PLANT HETEROCHROMATIC SHORT-INTERFERING RNAs

A Dissertation in

Plant Biology

by

Feng Wang

© 2017 Feng Wang

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2017

The dissertation of Feng Wang was reviewed and approved* by the following:

Michael J. Axtell, Professor of Biology, Dissertation Advisor, Chair of Committee

Charles T. Anderson, Assistant Professor of Biology

Surinder Chopra, Professor of Agricultural Sciences

Teh-hui Kao, Distinguished Professor of Biochemistry and Molecular Biology, Chair, Intercollege Graduate Degree Program in Plant Biology

*Signatures are on file in the Graduate School


ABSTRACT

Small RNAs, usually 20-24 nt in length, are critical regulators of plant transcriptomes. They are loaded into ARGONAUTE (AGO) proteins, and mediate gene silencing by interacting with target transcripts through sequence-specific base pairing. In plants, small RNAs are classified into various different groups based on their biogenesis and function; microRNAs (miRNAs) and heterochromatic short-interfering RNAs (het-siRNAs) are the two most important types of small RNAs in flowering plants.

Plant small RNAs are subject to various forms of modification. Despite intensive studies of miRNA modification, knowledge about het-siRNA modification is lacking. I systematically studied patterns of non-templated nucleotides in plant small RNAs by analyzing small RNA sequencing (sRNA-seq) libraries from Arabidopsis, tomato, Medicago, rice, maize and Physcomitrella. Elevated rates of non-templated nucleotides were observed at the 3' end of plant small RNAs from wild-type specimens of all analyzed species. In all species I analyzed, 'off-sized' small RNAs, such as 25 nt and 23 nt siRNAs arising from het-siRNA loci, often had higher rates of non-templated nucleotides at the 3' end. In Arabidopsis, 23 nt siRNAs arising from het-siRNA clusters display a distinct pattern of 3'-non-templated nucleotides. This pattern of 3'-non-templated nucleotides in 23 nt siRNAs is not dependent on known terminal nucleotidyl transferases, and may result from modifications added to longer het-siRNA precursors.

Het-siRNAs negatively regulate gene expression through the RNA-directed DNA methylation (RdDM) pathway. Biogenesis of most het-siRNAs depends on the plant-specific RNA POLYMERASE IV (Pol IV), and AGO4 is the major effector protein of het-siRNAs. Through genome-wide analysis of sRNA-seq data sets, I found that AGO4 is required for the accumulation of a small subset of het-siRNAs in Arabidopsis thaliana. The accumulation of AGO4-dependent het-siRNAs also requires several other RdDM components, including RNA POLYMERASE V (Pol V), DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2) and SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1). I also demonstrated that het-siRNA accumulation could not be fully recovered by a slicing-defective AGO4 in ago4 mutant plants. These data suggest that AGO4-dependent het-siRNAs are secondary het-siRNAs, whose biogenesis requires prior activities of RdDM at certain loci.

In the current RdDM model, AGO4-bound het-siRNAs target Pol V transcripts through sequence-specific base pairing. However, the details of such interactions are largely unknown. Through crosslinking immunoprecipitation with an Arabidopsis AGO4 antibody and subsequent high-throughput sequencing, I identified a handful of het-siRNA:target interactions in Arabidopsis thaliana. These de novo identified het-siRNA:target interactions suggest that het-siRNAs act on both cis and trans loci. Successful interaction between a het-siRNA and its target(s) requires extensive base-pairing, and induces target cleavage between the 10th and 11th nucleotides counting from the 5' end of the het-siRNA.


TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1 Introduction
1.1 Overview of Plant Small RNAs
1.2 Plant small RNA biogenesis and classification
1.2.1 microRNAs
1.2.2 Short-interfering RNA
1.3 ARGONAUTE Proteins in Arabidopsis
1.3.1 Function of ARGONAUTE proteins in Arabidopsis
1.3.2 Catalytic ability of ARGONAUTEs in Arabidopsis
1.4 Modification of plant small RNAs
1.4.1 The 2'-O-methylation of small RNAs
1.4.2 Tailing of miRNAs
1.4.3 Trimming of miRNAs
1.4.4 miRNA editing
1.4.5 Heterochromatic siRNA modification
1.5 RNA-directed DNA methylation
1.5.1 Canonical RdDM pathway
1.5.2 Non-canonical RdDM pathway
1.5.3 Biological roles of RdDM
1.5.4 Unsolved issues in canonical RdDM
1.6 Objectives
Chapter 2 Genome-Wide analysis of single non-templated nucleotides in plant endogenous siRNAs and miRNAs
2.1 Introduction
2.2 Materials and Methods
2.2.1 Small RNA sequencing library preparation
2.2.2 Source and processing of small RNA sequencing datasets
2.2.3 Preparation of Simulated sRNA-seq Libraries
2.2.4 Analysis of non-templated nucleotides
2.3 Results
2.3.1 Elevated mismatch rates at the 3' ends of genome-aligned plant miRNAs and siRNAs are due to non-templated nucleotides
2.3.2 'Off-sized' het-siRNAs and miRNAs often have higher rates of 3' end non-templated nucleotides
2.3.3 Arabidopsis 23 nt siRNAs from the 24 nt-dominated clusters have a unique pattern of 3' non-templated nucleotides
2.3.4 23 nt siRNAs with a 3' non-templated nucleotide are infrequently bound to AGO4
2.3.5 23 nt siRNAs from the 24 nt-dominated clusters are mostly not 3'- or 5'-truncated variants of 24 nt siRNAs
2.3.6 Further properties of 23 nt siRNAs with a 3'-most non-templated nucleotide
2.4 Discussion
Chapter 3 AGO4 is specifically required for heterochromatic siRNA accumulation at Pol V-dependent loci in Arabidopsis thaliana
3.1 Introduction
3.2 Materials and Methods
3.2.1 Plant materials and growth condition
3.2.2 Cloning of wild-type and slicing-defective AGO4
3.2.3 Plant transformation and transgenic plant selection
3.2.4 sRNA-seq library preparation
3.2.5 Differential expression analysis
3.2.6 Heatmap of small RNA accumulation in AGO4-dependent clusters
3.2.7 Euler diagrams
3.2.8 Small RNA accumulation in nrpd1-4, nrpe1-12, drm2-2, shh1-1, and ago4-4
3.3 Results
3.3.1 Accumulation of a subset of 24 nt het-siRNAs depends on AGO4 in Arabidopsis
3.3.2 Accumulation of small RNAs in AGO4-dependent clusters requires NRPE1, DRM2 and SHH1
3.3.3 An AGO4 catalytic residue is required for full accumulation of most AGO4-dependent 24 nt siRNAs
3.3.4 Slicing-defective AGO4 partially complements small RNA accumulation in the ago4-4/ago6-2/ago9-1 triple mutant
3.4 Discussion
3.4.1 Most 24 nt siRNAs do not require AGO4, AGO6, or AGO9 for accumulation
3.4.2 AGO4-dependent siRNAs are likely secondary siRNAs
3.4.3 On the role of AGO4-catalyzed slicing
Chapter 4 Systematic identification of het-siRNA and target interactions in Arabidopsis thaliana
4.1 Introduction
4.2 Materials and Methods
4.2.1 AGO4-CLIP for preparation of het-siRNA chimeras
4.2.2 Characterization of het-siRNA clusters in Arabidopsis seedlings
4.2.3 Computational identification of het-siRNA chimeras
4.3 Results
4.3.1 Identification of het-siRNA:target interactions in Arabidopsis thaliana
4.3.2 Het-siRNA induces target cleavage
4.3.3 Het-siRNAs can interact with targets in cis and in trans
4.4 Discussion
4.4.1 Rules of het-siRNA targeting
4.4.2 Role of target cleavage
Chapter 5 Summary and Prospects
5.1 Summary
5.1.1 Non-templated nucleotides in het-siRNAs
5.1.2 Role of AGO4 on het-siRNA accumulation
5.1.3 Het-siRNA and target interactions
5.2 Prospects
5.2.1 Determinants for secondary het-siRNA biogenesis
5.2.2 Role of AGO4 mediated target cleavage
5.2.3 Improving AGO4-CLIP efficiency
5.2.4 RdDM in crops
5.2.5 RdDM: Beyond TE silencing
Appendix
References


LIST OF FIGURES

Figure 1.1 A simplified model of the RdDM pathway
Figure 1.2 Phylogenetic tree of Arabidopsis ARGONAUTEs
Figure 1.3 Active catalytic sites in Arabidopsis ARGONAUTEs
Figure 1.4 A model of miRNA modification pathway in Arabidopsis
Figure 2.1 Elevated mismatch rates at the 3'-most positions of genome-aligned plant small RNAs
Figure 2.2 3'-mismatch rates after using an alternative method of 3'-adapter trimming
Figure 2.3 'Off-sized' miRNAs and siRNAs have high rates of 3' mismatches in Arabidopsis
Figure 2.4 'Off-sized' miRNAs and siRNAs have high rates of 3'-mismatches in Arabidopsis – evidence from additional sRNA-seq libraries
Figure 2.5 'Off-sized' miRNAs and siRNAs have high rates of 3' end non-templated nucleotides in several plant species
Figure 2.6 Sequence features of siRNAs arising from Arabidopsis 24 nt-dominated siRNA loci
Figure 2.7 Sequence features of 23 nt siRNAs arising from the 24 nt-dominated siRNA loci in indicated species
Figure 2.8 3'-most non-templated adenines in 23 nt siRNAs arising from the 24 nt-dominated siRNA clusters are not dependent on HESO1 and URT1
Figure 2.9 Non-templated nucleotide profile of small RNAs in dcl2/dcl4 mutant background
Figure 2.10 Rates of non-templated nucleotides in siRNAs co-immunoprecipitated with Arabidopsis AGO4
Figure 2.11 Analysis of 5'- and 3'-truncations and tailings for 'off-sized' Arabidopsis small RNAs
Figure 2.12 Further properties of 23 nt siRNAs with a 3'-most non-templated nucleotide
Figure 2.13 Model of 3' non-templated nucleotides in 'off-sized' siRNAs arising from the 24 nt-dominated siRNA clusters
Figure 3.1 Expression of wild-type AGO4 and slicing-defective AGO4 proteins in transgenic plants
Figure 3.2 Overall size profiles of small RNAs in tested genotypes
Figure 3.3 Identification of AGO4-dependent small RNA clusters in Arabidopsis thaliana
Figure 3.4 AGO4-dependent and AGO4-independent 24 nt siRNA clusters in other RdDM mutants
Figure 3.5 Slicing-defective AGO4-D742A partially complements small RNA accumulation from AGO4-dependent siRNA loci
Figure 3.6 Divergence of small RNA accumulation between Col-0 and Ws
Figure 3.7 Slicing-defective AGO4-D742A partially complements small RNA accumulation in the ago4-4/ago6-2/ago9-1 background
Figure 4.1 Overall procedures for de novo identification of het-siRNA:target interactions
Figure 4.2 Characterization of de novo identified chimeras
Figure 4.3 Penalty score of Cluster 3 chimeras in ligated and control sample
Figure 4.4 Frequency of het-siRNA pairing regions in Cluster 3 chimeras


LIST OF TABLES

Table 2.1 Data sources and accession number
Table 3.1 Data sources and accession numbers of Arabidopsis thaliana sRNA-seq libraries
Table 4.1 Het-siRNA:target interactions with perfect complementarity
Table 4.2 Het-siRNA:target interactions with near-perfect complementarity
Table 4.3 Potential het-siRNA duplexes
Table 4.4 RNA duplex with extensive mismatches


ACKNOWLEDGEMENTS

I am extremely grateful to have Dr. Michael J. Axtell as my thesis adviser. He has always been patient, supportive and easy to approach. As an adviser, he creates a safe place where I can pursue challenging scientific questions without the fear of failure. He teaches me to think critically, to work smart, and to care for details. He not only influences my scientific perspective, but also motivates me to be a better person by generously sharing his wisdom and stories with me. I am particularly grateful for his insightful suggestions and enthusiastic encouragement for my research and for my daily life.

I would like to thank my dissertation committee, Dr. Charles Anderson, Dr. Surinder Chopra and Dr. Teh-hui Kao, for their great support. They offered me invaluable advice on my research, dissertation writing, and future career choices. I am also very grateful to them for their willingness to write recommendation letters for me. As an international student, I would like to extend my special thanks to Dr. Kao, who always encourages me to reach out and enjoy American life and culture. I would like to thank Dr. Ying Gu and Dr. Sarah Assmann, who offered me the opportunity to rotate in their labs in the first few months of my Ph.D. studies.

I thank all the current and former members of the Axtell lab. They are not only my lab mates, but also my good friends. I am thankful to Ceyda Coruh and Nathan Johnson for their contributions to the 'het-siRNA modification' project related to this dissertation. I thank Saima Shahid for teaching me many useful bioinformatics skills. I thank Sung Hyun (Joseph) Cho, Qikun Liu, Seth Polydore, Jo Ann Snyder and Charles Page for their help with many wet lab experiments. I would also like to thank Alice Lunardon, Matthew Endres, Zhaorong Ma and Cathy Lin for the inspiring discussions and helpful suggestions.

I would like to express my deep gratitude to my wife Jianghua Yu for being understanding and supportive all the time. She inspires and encourages me to follow my ambition and achieve my full potential. I also thank her for spending most of her time taking care of our newborn baby while I am writing this dissertation. I owe many thanks to my parents for providing me a happy childhood, supporting my education and encouraging me to study overseas. I thank my 6-month-old baby, Jiamu, for bringing me tremendous happiness. My family, including my in-laws, has given me endless support and love during my Ph.D. studies. This dissertation would not have been possible without them.


Chapter 1 * Introduction

1.1 Overview of Plant Small RNAs

Small RNAs, usually 20-24 nt in length, are key regulators of plant transcriptomes. They silence plant endogenous genes as well as exogenous viral and introduced RNAs, and thereby play important roles in plant development, stress responses and genome stabilization (Sunkar et al., 2007; Chen, 2009; Ruiz-Ferrer and Voinnet, 2009; Matzke and Mosher, 2014). Plant small RNAs are usually classified into different groups based on their biogenesis and function (Axtell, 2013a). The microRNAs (miRNAs) and short-interfering RNAs (siRNAs) comprise the majority of plant small RNAs (Axtell, 2013a). Despite their distinct biogenesis and function, functional plant small RNAs are loaded into ARGONAUTE (AGO) proteins, and guide AGO proteins to their targets by sequence-specific base pairing (Vaucheret, 2008; Fang and Qi, 2016). In plants, small RNAs induce target silencing through different mechanisms, at both transcriptional and post-transcriptional levels.

1.2 Plant small RNA biogenesis and classification

In plants, small RNAs are classified into two major groups: miRNAs and siRNAs. Both miRNAs and siRNAs are processed from longer RNA precursors through the catalytic activity of DICER-LIKE (DCL) proteins (Schauer et al., 2002; Xie et al., 2004; Gasciolli et al., 2005; Henderson et al., 2006). The major difference between the two classes of plant small RNAs is the type of precursor: miRNAs are processed from hairpin-like single-stranded RNA (ssRNA) precursors, whereas siRNAs are processed from double-stranded RNA (dsRNA) precursors (Axtell, 2013a).

* Part of this chapter is modified and reproduced from work published in Current Opinion in Plant Biology (Wang et al., 2015a).


1.2.1 microRNAs

Most plant miRNAs are ~21 nt in length, and primarily regulate gene expression through silencing target mRNAs. In plants, miRNA genes (MIRs) are primarily found in intergenic regions (Reinhart et al., 2002). Like protein coding genes, MIRs are regulated by upstream promoters and enhancers (Xie et al., 2005a), transcribed by RNA POLYMERASE II (Pol II), and undergo splicing (Mica et al., 2009; Szarzynska et al., 2009; Kruszka et al., 2013), capping and polyadenylation (Jones-Rhoades and Bartel, 2004; Xie et al., 2005a; Zhang et al., 2005). Transcription of MIRs yields hairpin-like miRNA precursors with the mature miRNA embedded in the stem region (Reinhart et al., 2002). An RNase III-like endonuclease called DICER-LIKE1 (DCL1) cuts the miRNA precursor, releasing a dsRNA duplex formed by a guide and a passenger (miRNA*) strand with a 2-nt overhang at the 3’ end (Schauer et al., 2002). Accurate and efficient dicing of plant miRNA precursors by DCL1 requires other RNA binding proteins, including DAWDLE (DDL) (Yu et al., 2008), SERRATE (SE) (Grigg et al., 2005; Lobbes et al., 2006; Yang et al., 2006a), and HYPONASTIC LEAVES1 (HYL1) (Vazquez et al., 2004a; Han et al., 2004; Kurihara et al., 2006). The miRNA/miRNA* duplex is then 2’-O-methylated by a methyltransferase, HUA ENHANCER 1 (HEN1), at the 3’ ends (Yu et al., 2005). The 2’-O-methylation protects miRNAs from uridylation mediated by HEN1 SUPPRESSOR 1 (HESO1) and UTP:RNA uridylyltransferase 1 (URT1), and from subsequent degradation (Ren et al., 2012; Zhao et al., 2012b; Tu et al., 2015; Wang et al., 2015b). Mature miRNAs are loaded into AGO1-clade AGOs to form RNA-induced Silencing Complexes (RISCs) for function (Vaucheret et al., 2004; Baumberger and Baulcombe, 2005; Qi et al., 2005; Mi et al., 2008). Most plant miRNAs induce target mRNA silencing at the post-transcriptional level, primarily through AGO1-mediated messenger RNA (mRNA) cleavage (Baumberger and Baulcombe, 2005; Qi et al., 2005). Plant miRNA-induced translational repression has also been reported (Chen, 2004; Brodersen et al., 2008; Yang et al., 2012; Li et al., 2013a). The mechanism of plant miRNA-mediated translational repression is currently unclear.
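The geometry of a diced duplex with 2-nt 3’ overhangs can be illustrated with a minimal Python sketch. It is not part of the original analysis: the function names, the example sequence, and the assumption of perfect Watson-Crick pairing are all hypothetical simplifications (real miRNA/miRNA* duplexes often contain mismatches and G:U pairs), and the unpaired passenger overhang is written as N because its sequence cannot be derived from the guide alone.

```python
# Illustrative sketch only: idealized duplex geometry under stated assumptions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def idealized_passenger(guide: str, overhang: int = 2) -> str:
    """Build a hypothetical, perfectly paired passenger strand (5'->3').

    The last `overhang` nucleotides of the guide are left unpaired (the
    guide's 3' overhang). The passenger's own 3' overhang extends past the
    guide's 5' end, so its bases are unknown and are written as 'N'.
    """
    if len(guide) <= overhang:
        raise ValueError("guide must be longer than the overhang")
    paired_region = guide[: len(guide) - overhang]
    return reverse_complement(paired_region) + "N" * overhang


guide = "UUGACAGAAGAUAGAGAGCAC"  # arbitrary 21 nt example sequence (assumption)
passenger = idealized_passenger(guide)
print(guide, passenger)
```

In this idealized layout a 21 nt guide and a 21 nt passenger share 19 paired positions, with two unpaired nucleotides at each 3’ end.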


A class of non-canonical 24 nt miRNAs (lmiRNAs) has been identified in rice (Wu et al., 2010). Unlike canonical 21 nt miRNAs, lmiRNAs are processed by DCL3, are preferentially loaded into AGO4, and are able to direct target DNA methylation (Wu et al., 2010).

1.2.2 Short-interfering RNA

In contrast to miRNAs, short-interfering RNAs (siRNAs) are processed from dsRNA precursors, which are mostly synthesized by RNA-DEPENDENT RNA POLYMERASES (RDRs) (Axtell, 2013a). According to their distinct biogenesis and function, plant siRNAs are further divided into two major subcategories: heterochromatic siRNAs (het-siRNAs) and secondary siRNAs (Axtell, 2013a).

Heterochromatic siRNA

Plant heterochromatic siRNAs (het-siRNAs) are highly abundant and present in a wide range of plant species. They are critical for transposable element (TE) repression, heterochromatin formation and genome stability (Matzke and Mosher, 2014). Het-siRNAs are ~24 nt in length, and are usually produced from intergenic or repetitive regions of the plant genome (Zhang et al., 2007; Mosher et al., 2008; Lee et al., 2012). Biogenesis and function of het-siRNAs require a complicated transcriptional pathway called RNA-directed DNA methylation (RdDM) (Figure 1.1). Briefly, a plant-specific RNA polymerase, RNA POLYMERASE IV (Pol IV), physically interacts with RNA-DEPENDENT RNA POLYMERASE 2 (RDR2), and together they synthesize double-stranded RNA precursors (Haag et al., 2012). An RNase III-like protein, DCL3, processes the double-stranded RNA precursors into 24 nt siRNA duplexes (Xie et al., 2004; Henderson et al., 2006; Gasciolli et al., 2005). In the current model, Pol IV is recruited to target loci through interactions with a histone 3 lysine 9 (H3K9) methyl-group binding protein, SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1, also known as DNA-BINDING TRANSCRIPTION FACTOR 1, DTF1). The Pol IV and RDR2-dependent precursors (P4R2 RNAs) are short (mostly 26 nt to 45 nt), and usually give rise to only one 24 nt het-siRNA duplex per precursor (Zhai et al., 2015a; Blevins et al., 2015).


Het-siRNA duplexes are 2’-O-methylated at the 3’ end by HEN1 (Li et al., 2005), and are then loaded into AGO4-clade AGOs in the cytoplasm (Ye et al., 2012). AGO4 slices passenger strands and takes guide strands back to the nucleus (Ye et al., 2012). Another plant-specific RNA polymerase, RNA POLYMERASE V (Pol V), synthesizes long non-coding RNAs that interact with AGO4-bound het-siRNAs through sequence-specific base-pairing (Wierzbicki et al., 2008; 2009; Böhmdorfer et al., 2016). Methyl-DNA-binding proteins, SUPPRESSOR OF VARIEGATION 3-9 HOMOLOGUE 2 (SUVH2) and SUVH9, recruit the DDR complex, which is composed of DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1), DEFECTIVE IN MERISTEM SILENCING 3 (DMS3) and RNA-DIRECTED DNA METHYLATION 1 (RDM1), and which further recruits Pol V to target loci (Johnson et al., 2014; Liu et al., 2014c). Facilitated by RDM1, the AGO4-mediated siRNA-target interactions recruit the downstream methyltransferase DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2) to catalyze de novo DNA methylation (Cao and Jacobsen, 2002; Naumann et al., 2011; Zhong et al., 2014).


Figure 1.1 A simplified model of the RdDM pathway

RNA POLYMERASE IV (Pol IV) and RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) together synthesize a double-stranded RNA, which is further processed into a 24 nt siRNA duplex by DICER-LIKE 3 (DCL3). HUA ENHANCER 1 (HEN1) catalyzes 2'-O-methylation at the 3' end of the 24 nt siRNA duplex. ARGONAUTE 4 (AGO4) loads the 24 nt siRNA duplex in the cytoplasm, slices the passenger strand of the duplex, and brings the guide strand back to the nucleus. The AGO4-bound 24 nt siRNA interacts with a long non-coding RNA synthesized by RNA POLYMERASE V (Pol V), and recruits a methyltransferase, DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2), by interacting with RNA-DIRECTED DNA METHYLATION 1 (RDM1). SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1) binds histone 3 lysine 9 dimethylation (H3K9me2), and recruits Pol IV to target loci. SUPPRESSOR OF VARIEGATION 3-9 HOMOLOGUE 2 (SUVH2) and SUVH9 bind methylated DNA and recruit the DDR complex, which further recruits Pol V to target loci.


Secondary siRNA

Secondary siRNAs are derived from dsRNA precursors that are synthesized by RDRs, following stimulation by one or more primary small RNAs (Fei et al., 2013). In plants, secondary siRNAs are mostly derived from TAS transcripts (Peragine et al., 2004; Vazquez et al., 2004b; Allen et al., 2005; Yoshikawa et al., 2005; Axtell et al., 2006; Xia et al., 2013), long non-coding RNAs (Johnson et al., 2009; Song et al., 2012; Zhai et al., 2015b), re-activated transposable elements (TEs) (Slotkin et al., 2009; Creasey et al., 2014), and less frequently from mRNAs (Si-Ammour et al., 2011; Parry et al., 2009; Howell et al., 2007; Arribas-Hernández et al., 2016b). Biogenesis of secondary siRNAs is triggered by miRNA-mediated target cleavage (Allen et al., 2005; Yoshikawa et al., 2005). SUPPRESSOR OF GENE SILENCING 3 (SGS3) protects the 3' end of the cleaved transcript, and RDR6 copies the cleaved RNA fragment into a double-stranded RNA precursor (Peragine et al., 2004; Yoshikawa et al., 2013). The dsRNA precursor is subsequently processed into 21 or 22 nt secondary siRNA duplexes by DCL4 and/or DCL2 (Xie et al., 2005b; Gasciolli et al., 2005; Henderson et al., 2006). Some secondary siRNAs are loaded into AGOs, and further repress target RNAs in trans. They are therefore named trans-acting siRNAs (tasiRNAs). Many secondary siRNAs are produced successively from precursors by DCL4, in a head-to-tail arrangement (Fei et al., 2013). They are termed phased siRNAs.
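The head-to-tail arrangement of phased siRNAs can be illustrated with a short Python sketch. It assumes 0-based coordinates on a single strand, a fixed 21 nt register, and windows counted downstream from a known cleavage position. The function names and these conventions are illustrative assumptions rather than the method used in this dissertation, and the 2 nt offset between the two strands of each duplex is ignored.

```python
# Illustrative sketch only: consecutive 21 nt registers from a cleavage site.

def phased_windows(cleavage_pos: int, transcript_len: int, phase: int = 21):
    """Return head-to-tail (start, end) windows, 0-based and half-open.

    Windows begin at `cleavage_pos` and are tiled every `phase` nucleotides
    until a full window no longer fits within the transcript.
    """
    windows = []
    start = cleavage_pos
    while start + phase <= transcript_len:
        windows.append((start, start + phase))
        start += phase
    return windows


def is_in_phase(start: int, cleavage_pos: int, phase: int = 21) -> bool:
    """True if a small RNA starting at `start` lies in the cleavage-set register."""
    return (start - cleavage_pos) % phase == 0


print(phased_windows(10, 80))  # three full 21 nt windows fit in this toy case
print(is_in_phase(52, 10), is_in_phase(53, 10))
```

A read whose start coordinate differs from the cleavage position by a multiple of 21 is counted as in phase under these assumptions; any other offset is out of phase.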

Two distinct mechanisms, known as the ‘one-hit’ and ‘two-hit’ systems, stimulate secondary siRNA biogenesis (Fei et al., 2013). In the ‘one-hit’ pathway, 22 nt miRNAs are loaded into AGO1, direct target cleavage, and trigger downstream secondary siRNA biogenesis (Chen et al., 2010; Cuperus et al., 2010). Alternatively, Manavella et al. (2012) suggested that miRNAs derived from asymmetric miRNA/miRNA* duplexes are capable of triggering secondary siRNA biogenesis. The ‘two-hit’ pathway requires two 21 nt miRNAs that are loaded into AGO7 to target one RNA transcript at different sites, resulting in biogenesis of secondary siRNAs upstream of the 3' cleavage site (Axtell et al., 2006).


1.3 ARGONAUTE Proteins in Arabidopsis

ARGONAUTE (AGO) proteins are the effectors of RNA-induced silencing. Small RNAs need to be loaded into AGO proteins to form RNA-induced Silencing Complexes (RISCs) for function. The Arabidopsis thaliana genome encodes 10 AGOs, which are classified into three clades based on phylogenetic analysis (Vaucheret, 2008) (Figure 1.2). The three AGO clades are distinct from each other in two aspects: the small RNAs that they bind, and the pathways in which they are involved (Vaucheret, 2008; Fang and Qi, 2016).

[Figure 1.2 image: phylogenetic tree. Tip labels: AGO8, AGO9, AGO4, AGO6, AGO5, AGO1, AGO10, AGO7, AGO2, AGO3. Scale bar: 0.20]

Figure 1.2 Phylogenetic tree of Arabidopsis ARGONAUTEs

This phylogenetic tree was prepared from the protein sequences of the 10 Arabidopsis thaliana ARGONAUTEs, using the Maximum Likelihood method with the Whelan and Goldman model. Branch lengths are measured in the number of substitutions per site. The tree was generated using MEGA7 (Kumar et al., 2016).


1.3.1 Function of ARGONAUTE proteins in Arabidopsis

AGO1 clade

The AGO1 clade of AGOs includes Arabidopsis AGO1, AGO5 and AGO10 (Figure 1.2). AGO1 is the founding member of the AGOs. The name ARGONAUTE comes from the filamentous leaf phenotype of the Arabidopsis ago1 mutant, which resembles the tentacles of Argonauta (Bohmert, 1998). AGO1 predominantly selects 21 nt miRNAs, and acts as a target slicer (Baumberger and Baulcombe, 2005; Qi et al., 2005). In plants, perfect or near-perfect complementarity between an AGO1-loaded miRNA and its target transcript is required for efficient target cleavage (Liu et al., 2014b). Knocking out or knocking down AGO1 causes pleiotropic phenotypes, mainly due to the involvement of AGO1-mediated gene silencing in many important plant development pathways (Chen, 2009). In addition, AGO1 is a major antiviral effector in Arabidopsis (Carbonell and Carrington, 2015), and is involved in secondary siRNA biogenesis (see above).

AGO5 is specifically expressed in somatic cells in growing pollen tubes and mature pollen, where it promotes megagametogenesis (Tucker et al., 2012). AGO5, along with AGO2, is required for full resistance to Potato virus X infection in Arabidopsis (Brosseau and Moffett, 2015).

AGO10 and AGO1 are closely related at the phylogenetic level. AGO10 was first recognized as a key factor in the maintenance of the shoot apical meristem (SAM) in Arabidopsis (Moussian, 1998). The expression of AGO10 is restricted to the SAM, the provasculature and the adaxial side of leaves (Lynn et al., 1999; Moussian, 1998). The role of AGO10 in meristem maintenance has been revealed: AGO10 primarily competes with AGO1 for miR166/165 loading (Zhu et al., 2011). Sequestering of miR166/165 in AGO10 positively regulates HD-ZIP III transcript accumulation and therefore maintains the shoot apical meristem (Zhu et al., 2011). Despite their competition for miRNAs, AGO10 and AGO1 may have overlapping functions. While the ago1-3/ago10-1 double mutant is embryonic lethal, the ago1-3 single mutant is viable (Arribas-Hernández et al., 2016a; Mallory et al., 2009). This implies partially redundant roles for AGO10 and AGO1.


AGO2 clade

The AGO2 clade of AGOs includes AGO2, AGO3 and AGO7 (Figure 1.2). While AGO2 and AGO3 are close paralogs, they have distinct small RNA preferences. AGO2 primarily associates with 21 nt miRNAs and siRNAs, and plays important roles in antibacterial and antiviral responses in Arabidopsis (Zhang et al., 2011; Harvey et al., 2011). AGO2 is also involved in DNA double-strand break (DSB) repair in Arabidopsis. It binds ~21 nt siRNAs that are derived from the vicinity of DSBs, and mediates DSB repair (Wei et al., 2012). In contrast, AGO3 primarily associates with 24 nt het-siRNAs, and is probably involved in the RdDM pathway (Zhang et al., 2016b). As discussed in Section 1.2.2, AGO7 is primarily involved in the 'two-hit' system of secondary siRNA biogenesis (Adenot et al., 2006; Montgomery et al., 2008).

AGO4 clade

The AGO4 clade of AGOs includes Arabidopsis AGO4, AGO6, AGO8 and AGO9 (Figure 1.2). AGO8 expression has not been detected so far, and it has been suggested to be a pseudogene (Vaucheret, 2008). AGO4 was identified through a genetic screen for mutants that suppress silencing of the Arabidopsis SUPERMAN (SUP) gene (Zilberman et al., 2003). Decreased asymmetric DNA methylation and decreased het-siRNA accumulation were observed in ago4 mutant plants. AGO4 plays a central role in RdDM: it binds 24 nt het-siRNAs, physically interacts with Pol V, and mediates sequence-specific interactions between het-siRNAs and Pol V transcripts (Matzke and Mosher, 2014).

AGO6 was identified through a genetic screen for suppressors of the ros1 mutant (Zheng et al., 2007). An AGO6 knock-out mutant showed reduced non-CG methylation at specific loci (Zheng et al., 2007). AGO6 immunoprecipitation followed by high-throughput sequencing revealed that AGO6 primarily binds 24 nt het-siRNAs (Havecker et al., 2010; Duan et al., 2015). AGO6 differs from AGO4 in both expression pattern and function. AGO6 is primarily expressed in root and shoot apical meristems, while AGO4 is ubiquitously expressed (Zheng et al., 2007). In addition, AGO6 binds 21 nt siRNAs and mediates non-canonical RDR6-dependent RdDM (McCue et al., 2015).


In Arabidopsis, AGO9 expression is restricted to somatic companion cells in ovules (Olmedo-Monfil et al., 2010). AGO9 interacts with 24 nt siRNAs that are derived from TEs, and restricts the differentiation of gametophyte precursors through epigenetic reprogramming (Olmedo-Monfil et al., 2010).

1.3.2 Catalytic ability of ARGONAUTEs in Arabidopsis

AGOs are composed of four domains: the N-terminal domain, the PAZ domain, the MID domain and the PIWI domain (Tolia and Joshua-Tor, 2007; Vaucheret, 2008). The N-terminal domain is variable among plant AGOs, and its function is currently unclear (Fang and Qi, 2016). The PAZ and MID domains are critical for small RNA binding. The MID domain binds the 5’-monophosphate of small RNAs and determines the base-specific recognition of small RNAs (Frank et al., 2010; 2012). The PAZ domain recognizes the 3’ end of small RNAs, and base-pairing between a miRNA and its target is predicted to result in release of the 3’ end of the miRNA from the PAZ domain (Yan et al., 2003; Lingel et al., 2003; Song et al., 2004; Yuan et al., 2005). The PIWI domain, a highly conserved RNase H-like module in plant AGOs, equips some AGOs with slicer capacity (Song et al., 2004; Rivas et al., 2005). A three-amino-acid Mg2+-coordinating triad, composed of Aspartate-Aspartate-Histidine (DDH) or Aspartate-Aspartate-Aspartate (DDD), constitutes the apparent catalytic triad of AGOs (Song et al., 2004; Rivas et al., 2005). A fourth amino acid, Glutamate (E), has also been proposed to be involved in active slicing (Nakanishi et al., 2012; Poulsen et al., 2013).


AGO1   757  IIFGADVTHP...IIFYRDGVSE...PAYYAHLA AF  992
AGO2   741  MFIGADVNHP...IVIFRDGVSD...PVYYADMV AF  958
AGO3   920  MFIGADVNHP...IVIFRDGVSD...PVSYADKA AS  1138
AGO4   655  IILGMDVSHG...IIIFRDGVSE...PICYAHLA AA  878
AGO5   716  IIMGADVTHP...IIFYRDGVSE...PAYYAHLA AF  951
AGO6   618  LILGMDVSHG...IIIFRDGVSE...PVRYAHLA AA  844
AGO7   729  IFMGADVTHP...IIFFRDGVSE...PAYYAHLA AY  943
AGO8   599  IIIGMDVSHG...IIIFRDGVSE...PICYAHLA AA  804
AGO9   627  IIVGMDVSHG...IIIFRDGVSE...PVCYAHLA AA  850
AGO10  704  IIFGADVTHP...IIFYRDGVSE...PAYYAHLA AF  939
Conservation: ::.* **.* *:::*****: * **. . *

Figure 1.3 Active catalytic sites in Arabidopsis ARGONAUTEs. All 10 Arabidopsis AGO proteins were aligned by Clustal Omega (Sievers et al., 2011). The active catalytic sites are highlighted in yellow. Fully conserved residues are marked by asterisks (*); highly conserved and semi-conserved residues are marked by colons (:) and single dots (.), respectively. The first and last positions of the displayed sequences are labeled.
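The asterisk row of such an alignment can be recomputed directly from the displayed residues. The following is a minimal sketch, not the Clustal Omega procedure itself: it marks only fully conserved columns and ignores the colon and dot classes, which depend on a substitution-group table. The function names and the `FIRST_BLOCK` constant are illustrative; the ten strings are the first displayed block of Figure 1.3, in the order AGO1 to AGO10.

```python
def conserved_columns(segments):
    """Return 0-based indices of columns in which all sequences share one residue."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")
    return [i for i in range(length) if len({s[i] for s in segments}) == 1]


def conservation_line(segments):
    """Build a marker line with '*' under fully conserved columns and ' ' elsewhere."""
    conserved = set(conserved_columns(segments))
    return "".join("*" if i in conserved else " " for i in range(len(segments[0])))


# First displayed block of Figure 1.3 (AGO1 through AGO10).
FIRST_BLOCK = [
    "IIFGADVTHP", "MFIGADVNHP", "MFIGADVNHP", "IILGMDVSHG", "IIMGADVTHP",
    "LILGMDVSHG", "IFMGADVTHP", "IIIGMDVSHG", "IIVGMDVSHG", "IIFGADVTHP",
]

if __name__ == "__main__":
    print(conservation_line(FIRST_BLOCK))
```

For this block the fully conserved columns are the fourth, sixth, seventh and ninth residues (G, D, V and H), which agrees with the asterisks shown under the first block of the figure.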

All 10 Arabidopsis AGOs contain the apparent catalytic triad (Figure 1.3). However, the presence of the triad does not by itself ensure cleavage of target RNAs. In Arabidopsis, the slicer ability of AGO1, AGO4, AGO7 and AGO10 has been confirmed. AGO1 and AGO7 cleave target transcripts in vivo (Baumberger and Baulcombe, 2005; Qi et al., 2005; Adenot et al., 2006; Montgomery et al., 2008). Through slicing, they mediate target silencing or secondary siRNA biogenesis. AGO10 is a slicer in vitro, but it is unclear whether its slicer activity is required for its function in vivo (Ji et al., 2011; Zhu et al., 2011).

The function of slicer activity in AGO4 is more complicated. In plants, removal of the passenger strand from het-siRNA duplexes requires AGO4 slicing (Ye et al., 2012). An in vitro slicing assay demonstrated that AGO4 slices a synthetic RNA target (Qi et al., 2006). However, it is unknown whether AGO4 slices het-siRNA targets in vivo. Qi et al. (2006) hypothesized that AGO4 slices het-siRNA targets and initiates secondary het-siRNA biogenesis. However, considering that no AGO4-sliced transcript has been identified so far, an alternative hypothesis is equally plausible: AGO4 is reprogrammed in the nucleus and does not slice the scaffold RNA.

Poulsen et al. (2013) proposed a model in which the slicer activity of an AGO protein is dynamically regulated, depending on whether or not it associates with GW-motif-containing proteins. The GW motif of a virus-encoded suppressor protein, P38, directly binds AGO1 and inhibits its slicer activity (Azevedo et al., 2010). Intriguingly, AGO4 physically interacts with the GW-WG motif of NRPE1, the largest subunit of Pol V (El-Shami et al., 2007). If the dynamic model of slicer activity is correct, the slicer ability of AGO4 would be expected to change upon its interaction with Pol V during RdDM.

1.4 Modification of plant small RNAs

Several studies have shown that small RNAs are subject to active modifications that result in size and sequence variants of small RNAs. Modified small RNAs usually show reduced stability, and occasionally, altered function. So far, most of our knowledge about plant small RNA modifications comes from miRNAs (Figure 1.4). The patterns and mechanisms of siRNA modifications in plants remain largely unknown.

Figure 1.4 A model of the miRNA modification pathway in Arabidopsis. In Arabidopsis, the miRNA/miRNA* duplex is processed from pri-miRNA by DICER-LIKE 1 (DCL1). HUA ENHANCER 1 (HEN1) catalyzes 2'-O-methylation at the 3' ends of the miRNA/miRNA* duplex. Mature miRNA is loaded into ARGONAUTE 1 (AGO1) to mediate target silencing. An unknown enzyme demethylates AGO1-loaded miRNA, leading to miRNA uridylation mediated by HEN1 SUPPRESSOR 1 (HESO1) and/or UTP:RNA uridylyltransferase 1 (URT1). Uridylated miRNA is then digested by an unknown enzyme. Alternatively, the 3' to 5' exonuclease SMALL RNA DEGRADING NUCLEASE (SDN) can trim unmethylated miRNA, resulting in miRNA degradation.

1.4.1 The 2'-O-methylation of small RNAs

All types of small RNAs in plants, Piwi-interacting RNAs (piRNAs) in animals and siRNAs in Drosophila are subject to HEN1-catalyzed 2'-O-methylation at their 3'-terminal nucleotide (Yu et al., 2005; Horwich et al., 2007; Saito et al., 2007; Kirino and Mourelatos, 2007). HEN1 was first identified in a genetic screen for mutants showing floral developmental defects, and it was then characterized as an essential factor for miRNA biogenesis in Arabidopsis (Chen et al., 2002; Park et al., 2002). Studies of hen1 mutant plants demonstrated that 2'-O-methylation is crucial for the accumulation of all types of plant small RNAs (Li et al., 2005).

The HEN1 protein in Arabidopsis contains five structural domains: a methyltransferase (MTase) domain, two double-stranded RNA binding domains (dsRBDs), a La-motif-containing domain (LCD) and a PPIase-like domain (PLD) (Huang et al., 2009). The Arabidopsis HEN1 binds small RNAs as a monomer (Huang et al., 2009). Analysis of the HEN1 crystal structure revealed that the preferred length of small RNAs for HEN1 binding is determined by the distance between the active site of the MTase domain and the 5'-end-capping site in the LCD (Huang et al., 2009).

In Arabidopsis, small RNAs are methylated before being loaded into AGO proteins (Yu et al., 2005; Li et al., 2005). The Arabidopsis HEN1 preferentially methylates 21-24 nt small RNA duplexes that have 2 nt 3'-overhangs (Yang et al., 2006c). In plants, loss of HEN1 function results in a significant reduction of miRNA accumulation and in over-accumulation of size variants with 3'-truncation or tailing (Li et al., 2005; Zhai et al., 2013). In animals, HEN1 homologues primarily methylate AGO- or PIWI-loaded single-stranded RNAs (Horwich et al., 2007; Saito et al., 2007; Kurth and Mochizuki, 2009). Loss of HEN1 function in animals also leads to 3'-truncation or tailing of small RNAs (Saito et al., 2007; Horwich et al., 2007; Kirino and Mourelatos, 2007; Kurth and Mochizuki, 2009; Kamminga et al., 2012; Billi et al., 2012).

1.4.2 Tailing of miRNAs

miRNA uridylation

Non-templated nucleotide additions at 3' termini are commonly observed in plant miRNAs, especially in the hen1 mutant background (Li et al., 2005; Zhai et al., 2013). Non-templated uridylation is the predominant form of tailing in plant miRNAs (Li et al., 2005; Zhai et al., 2013). In Arabidopsis, HESO1 is the major uridyltransferase. In hen1-1/heso1-2 double mutant plants, miRNA accumulation is partially recovered relative to hen1-1, resulting in moderately reduced developmental defects (Ren et al., 2012). This suggests that HESO1 destabilizes miRNAs by catalyzing miRNA uridylation. The remaining 3'-non-templated uridines in hen1-1/heso1-2 mutant plants could be partially explained by URT1, a paralog of HESO1 (Tu et al., 2015; Wang et al., 2015b). Uridylation of miRNAs is further reduced in hen1-2/heso1-2/urt1-3 compared to hen1-2/heso1-2, implying functional redundancy of URT1 and HESO1 (Tu et al., 2015; Wang et al., 2015b). In the green alga Chlamydomonas reinhardtii, the HESO1 homologue MUT68 catalyzes non-templated uridylation of small RNAs, suggesting that small RNA uridylation is a widespread phenomenon (Ibrahim et al., 2010).

HESO1 can only uridylate unmethylated small RNAs (Ren et al., 2012). This is consistent with the observation that active and global uridylation occurs only in the hen1 mutant, and not in wild-type plants, where most mature miRNAs are methylated (Li et al., 2005; Zhai et al., 2013). Demethylation and subsequent uridylation of miRNAs may occur after AGO1-loading. This notion is supported by the following observations: HESO1 physically interacts with AGO1 (Ren et al., 2014); HESO1 and URT1 are able to uridylate AGO1-bound miRNA (Tu et al., 2015; Wang et al., 2015b); and uridylated miRNAs are recovered from AGO1 immunoprecipitation (Zhai et al., 2013). In addition, HESO1 is involved in the degradation of cleaved targets in Arabidopsis: it uridylates the 5' fragments of AGO1-cleaved mRNAs (Ren et al., 2014).

Occasionally, uridylation of a miRNA alters its function. In the hen1 mutant, single uridine addition to the 3' end of miR171 results in a 22 nt miRNA variant, which triggers phased tasiRNA biogenesis from target mRNA (Zhai et al., 2013; Tu et al., 2015). This phenomenon seems to be miR171-specific, as no other uridylated miRNA variants have been reported to trigger secondary siRNA biogenesis.

miRNA adenylation

Plant miRNAs undergo adenylation less frequently. In Populus trichocarpa (black cottonwood), a significant proportion of isolated miRNAs contain one or more non-templated adenosines at their 3' ends (Lu et al., 2009). The adenylated miRNAs showed reduced degradation rates in an in vitro degradation assay (Lu et al., 2009).

1.4.3 Trimming of miRNAs

In plants, methylated miRNAs can be trimmed by a family of 3' to 5' exonucleases named SMALL RNA DEGRADING NUCLEASEs (SDNs) (Ramachandran and Chen, 2008). SDNs specifically act on single-stranded miRNAs in vitro, and their catalytic activities are sensitive to 2'-O-methylation and 3'-uridylation of miRNAs (Ramachandran and Chen, 2008). Simultaneously knocking out SDN1, SDN2 and SDN3 results in elevated accumulation of miRNAs, indicating that SDNs are responsible for miRNA degradation (Ramachandran and Chen, 2008).

It is still unclear how uridylated miRNAs are eventually degraded in flowering plants. SDNs are not involved in the digestion of uridylated miRNAs because their activity is inhibited by miRNA uridylation (Ramachandran and Chen, 2008). Intriguingly, a Chlamydomonas reinhardtii 3' to 5' exonuclease, RRP6, is capable of digesting uridylated, but not 2'-O-methylated, miRNAs in vitro, and depletion of RRP6 leads to increased small RNA accumulation (Ibrahim et al., 2010). This suggests that RRP6 is required for the degradation of uridylated small RNAs in this green alga.

1.4.4 miRNA editing

RNA editing refers to modifications to specific nucleotides in RNA molecules, such as cytosine (C) to uridine (U) changes and adenosine (A) to inosine (I) deaminations. In plants, RNA editing is frequently observed in mitochondria and plastids (Takenaka et al., 2013). RNA editing in plant miRNAs has not been reported so far. In contrast, adenosine to inosine (A-to-I) editing in miRNA precursors is common in various animal tissues and cell types (Luciano et al., 2004; Kawahara et al., 2008; Berezikov et al., 2011; Wulff and Nishikura, 2012; Ekdahl et al., 2012). Editing of animal miRNAs can reduce miRNA stability (Yang et al., 2006b), affect strand selection during RISC assembly (Iizasa et al., 2010), redirect a miRNA to a different set of targets (Kawahara et al., 2007), and alter silencing efficiency (Kume et al., 2014).

1.4.5 Heterochromatic siRNA modification

Despite the abundance of het-siRNAs in plants, the pattern and mechanism of het-siRNA modification are largely unknown. Het-siRNAs, like miRNAs, are subject to 2'-O-methylation at their 3' ends (Li et al., 2005). Tailed variants of a few abundantly accumulated het-siRNAs were observed in the hen1 mutant (Li et al., 2005; Ren et al., 2012; Zhao et al., 2012b), indicating that unmethylated het-siRNAs are subject to nucleotide addition. However, it is unknown whether het-siRNA modification is prevalent at all het-siRNA loci, especially at those with low accumulation levels. The preference for particular non-templated nucleotides in het-siRNA modification is also unclear.

The precursors of het-siRNAs, termed Pol IV and RDR2 dependent RNAs (P4R2 RNAs), are also subject to modification. One or more non-templated nucleotides are frequently added to the 3' end of P4R2 RNAs (Zhai et al., 2015a; Blevins et al., 2015). The template DNA nucleotide corresponding to the first non-templated position in a P4R2 RNA tends to be a cytosine (Zhai et al., 2015a). In the ddm1 mutant, where DNA methylation is strongly depleted genome-wide, the accumulation of P4R2 RNAs carrying non-templated nucleotides is significantly reduced (Zhai et al., 2015a). Therefore, Zhai et al. (2015a) proposed that Pol IV transcription is prone to errors at methylated cytosines. Blevins et al. (2015) showed that RDR2 could catalyze terminal nucleotidyl addition to P4R2 RNAs in vitro, and proposed that RDR2 incorporates non-templated nucleotides at the 3' end of het-siRNA precursors. Nevertheless, it is unknown whether mature het-siRNAs inherit the non-templated nucleotides that are initially introduced into the longer precursors.

1.5 RNA-directed DNA methylation

RNA-directed DNA methylation (RdDM) was first discovered in viroid-infected transgenic tomato plants, where viroid RNA induced methylation of its own cDNA that had been integrated into the host genome (Wassenegger et al., 1994). It was then understood that methylation of the transgene DNA was induced by siRNAs derived from longer double-stranded RNA precursors (Mette et al., 2000; Sijen et al., 2001). RdDM was also observed at plant endogenous genes, though its nature was initially unknown. In the Wassilewskija ecotype of Arabidopsis, inverted repeats in the PAI1 gene can cause stable methylation of the unlinked paralogs PAI2 and PAI3, resulting in transcriptional gene silencing (Luff et al., 1999). The core genes involved in RdDM were then characterized and extensively studied. Here I summarize the canonical and non-canonical RdDM pathways, and focus on a few unsolved issues in the canonical RdDM pathway.

1.5.1 Canonical RdDM pathway

Identification and characterization of the genes involved in RdDM has been accomplished using genetic and biochemical screens in Arabidopsis thaliana. In the current model, canonical RdDM is composed of three steps: (1) biogenesis of 24 nt het-siRNAs, which requires key components including Pol IV, RDR2 and DCL3; (2) target recognition and de novo methylation, which requires Pol V, AGO4 and DRM2; (3) genome condensation, which is downstream of de novo DNA methylation. The first two steps of RdDM are discussed in section 1.2.2 (Figure 1.1).

As an outcome of the first two steps of RdDM, target loci are de novo methylated. There are three contexts of cytosine methylation in the genome: CG, CHG and CHH (where H represents A, C or T). CG and CHG methylation, but not CHH methylation, can be catalyzed in methylation maintenance pathways that are independent of RdDM (Law and Jacobsen, 2010). In plants, CG methylation is maintained by DNA METHYLTRANSFERASE 1 (MET1); CHG methylation is maintained by a self-reinforcing loop involving a plant-specific methyltransferase, CHROMOMETHYLASE 3 (CMT3), and an H3K9 methyltransferase, SUVH4 (Law and Jacobsen, 2010). In contrast, cytosines in all three contexts can be de novo methylated by RdDM, with CHH methylation as a hallmark of this pathway (Law and Jacobsen, 2010).
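The three sequence contexts are defined purely by the two bases that follow a cytosine on the same strand, so they can be assigned mechanically. The sketch below is illustrative only; the function name is hypothetical, and it assumes an unambiguous A/C/G/T sequence read 5' to 3' on the strand carrying the cytosine.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at 0-based index pos as 'CG', 'CHG' or 'CHH'.

    Returns None when the cytosine is too close to the 3' end of seq for the
    context to be determined.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1 : pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # context undefined at the sequence end
    # The first downstream base is H (A, C or T); the second decides CHG vs CHH.
    return "CHG" if downstream[1] == "G" else "CHH"


if __name__ == "__main__":
    example = "ACGTCAGTCTTA"
    for i, base in enumerate(example):
        if base == "C":
            print(i, cytosine_context(example, i))
```

A cytosine on the opposite strand would be classified the same way after reverse-complementing the sequence.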

Much less is known about the components involved in genome condensation. A hallmark of genome condensation is the deposition of repressive histone marks. Lysine acetylation and H3K4 mono-methylation (H3K4me) are histone marks associated with active transcription. In contrast, H3K9me and H3K9me2 are repressive histone marks. In Arabidopsis, HISTONE DEACETYLASE 6 (HDA6) maintains heterochromatin silencing in cooperation with MET1 (Aufsatz et al., 2002; Earley et al., 2006; To et al., 2011; Liu et al., 2012; Blevins et al., 2014). JMJ14, a Jumonji C domain-containing protein, participates in AGO4-dependent TE repression by catalyzing H3K4me demethylation (Lu et al., 2010; Searle et al., 2010). Arabidopsis SUVH4, SUVH5 and SUVH6, which catalyze H3K9me and H3K9me2, facilitate transcriptional gene silencing (Malagnac et al., 2002; Jackson et al., 2004; Ebbs et al., 2005; Ebbs and Bender, 2006).

The canonical RdDM pathway forms a self-reinforcing loop that enhances DNA methylation and transcriptional gene silencing. RdDM leads to DNA methylation and repressive histone modifications as discussed above. Meanwhile, SHH1/DTF1 binds H3K9me2, and recruits Pol IV to certain loci through protein-protein interaction (Law et al., 2013; Zhang et al., 2013). Similarly, Pol V is recruited to RdDM target loci by SUVH2 and SUVH9, which are guided to repressive chromatin by binding methylated DNA (Johnson et al., 2014; Liu et al., 2014c). Taken together, canonical RdDM represents a de novo DNA methylation maintenance pathway.

1.5.2 Non-canonical RdDM pathway

Canonical RdDM is involved in active maintenance of pre-existing de novo DNA methylation. However, it cannot explain how transcriptionally active regions are initially methylated and turned into transcriptionally repressive regions. Increasing evidence implies a non-canonical RdDM pathway, where de novo DNA methylation is triggered by 21-22 nt siRNAs instead of 24 nt het-siRNAs (Fultz et al., 2015). The small RNAs involved in non-canonical RdDM are Pol IV-independent. Instead, they are derived from Pol II transcripts including TAS RNAs (Wu et al., 2012), re-activated transposable elements (McCue et al., 2012; Nuthikattu et al., 2013; Panda et al., 2016), and viral RNAs (Bond and Baulcombe, 2015). These Pol II-dependent transcripts are copied into double-stranded RNAs by RDR6, and further processed into 21-22 nt siRNAs by DCL4 and/or DCL2 (Wu et al., 2012; Nuthikattu et al., 2013; Bond and Baulcombe, 2015). In complex with AGO6, these 21-22 nt siRNAs potentially interact with Pol V-dependent scaffold RNAs, and mediate de novo DNA methylation by recruiting DRM2 (McCue et al., 2015). The non-canonical RdDM pathway represents the establishment of de novo DNA methylation at transcriptionally active loci. Transcriptionally active loci that are initially targeted by 21-22 nt siRNAs can transition over time to transcriptionally repressed loci, which are then maintained by 24 nt siRNAs.

1.5.3 Biological roles of RdDM

In plants, cytosine methylation is an important epigenetic mark that often associates with gene repression and chromosome condensation (Law and Jacobsen, 2010; Matzke and Mosher, 2014). As discussed above, the 24 nt siRNAs direct cytosine methylation in all three sequence contexts (CG, CHG, and CHH, where H represents A, T or C). Many of the loci from which 24 nt siRNAs originate overlap with transposable elements (TEs) and repetitive sequences (Mosher et al., 2008), suggesting a role of RdDM in TE silencing. In Arabidopsis, genome-wide loss of cytosine methylation was observed in many RdDM-defective mutants (Stroud et al., 2013b). Interestingly, Arabidopsis RdDM mutant plants usually do not display obvious morphological defects. This is possibly because only a very small number of TEs are re-activated and transposed in RdDM-defective Arabidopsis plants (Zemach et al., 2013). In contrast, RdDM-defective maize plants often display pleiotropic morphological abnormalities (Alleman et al., 2006; Parkinson et al., 2007; Nobuta et al., 2008; Erhard et al., 2009; Hale et al., 2009). It has been reported that loss of RdDM in maize results in large-scale TE and gene expression changes (Woodhouse et al., 2006; Jia et al., 2009), which may contribute to the morphological defects in maize RdDM mutants. Taken together, these observations imply a primary role of RdDM in TE repression.

1.5.4 Unsolved issues in canonical RdDM

Although extraordinary progress has been made in understanding the canonical RdDM pathway, many questions remain unsettled. The details of AGO4-mediated target recognition are of particular interest, because this step is critical for initiating downstream DNA methylation and chromosomal remodeling. Here I summarize a few unsolved issues related to AGO4 activity.

Role of AGO4 in het-siRNA accumulation

In the current canonical RdDM model, the biogenesis of 24 nt het-siRNAs depends on Pol IV, RDR2 and DCL3, while AGO4 is not directly required. Nevertheless, AGO4 is involved in 24 nt het-siRNA accumulation: the accumulation of 24 nt het-siRNAs at some loci is AGO4-dependent, and a slicing-defective AGO4 cannot fully complement 24 nt siRNA accumulation at certain loci (Zilberman et al., 2003; Qi et al., 2006). It has been hypothesized that the biogenesis of a subset of 24 nt het-siRNAs is triggered by AGO4-mediated target slicing (Qi et al., 2006). This hypothesis is of interest because it is consistent with the observation that AGO4 is required for the biogenesis of secondary het-siRNAs, which promotes the spreading of RdDM (Daxinger et al., 2009). However, it is not clear why the accumulation of het-siRNAs from some other loci is not affected in plants with a slicing-defective AGO4 (Qi et al., 2006).

An alternative hypothesis is that the AGO4-dependent het-siRNAs come from a stepwise self-reinforcing loop. Both Pol IV and Pol V are recruited to genomic regions with previously established repressive marks (Law et al., 2013; Zhang et al., 2013; Liu et al., 2014c; Johnson et al., 2014). Pol V-dependent scaffold RNAs interact with AGO4-associated siRNAs, which in turn enhances subsequent DNA methylation and histone modification (Wierzbicki et al., 2009; Böhmdorfer et al., 2014; 2016). A few sRNA-seq experiments revealed that Pol V is required for the accumulation of a small subset of het-siRNAs, which may represent secondary het-siRNAs derived from previously methylated genomic loci (Pontier et al., 2005; Mosher et al., 2008; Böhmdorfer et al., 2016). However, it is unknown whether AGO4 contributes to the accumulation of the same population of secondary siRNAs.

The elusive target of het-siRNAs

Despite the abundance of het-siRNAs, details of the interactions between het-siRNAs and their targets have not been reported. Pol V transcription is essential for het-siRNA-mediated DNA methylation and target silencing, and a key het-siRNA-associated AGO protein, AGO4, can co-immunoprecipitate with Pol V-dependent RNAs (Wierzbicki et al., 2009; Böhmdorfer et al., 2014; 2016). AGO4 physically interacts with the C-terminal GW-WG repetitive domain of NRPE1, the largest subunit of Pol V (El-Shami et al., 2007). These and similar observations have led to the predominant model for het-siRNA targeting: het-siRNA/AGO complexes identify complementary sites on nascent Pol V transcripts, and successful identification leads to recruitment of DNA methyltransferases to the vicinity of the locus. More recently, Böhmdorfer et al. (2016) systematically analyzed Pol V- and AGO4-bound long non-coding RNAs through high-throughput sequencing of immunoprecipitated RNAs, and found a significant overlap. This experiment provides genome-wide inventories of Pol V transcripts in Arabidopsis, and further implies interactions between Pol V and AGO4; however, a fine map of the genome-wide interactions between het-siRNAs and Pol V transcripts is still lacking.

It is striking that despite intense research on het-siRNAs, there has not yet been a single experimentally confirmed example of a discrete, single het-siRNA target site on a discrete Pol V RNA. Therefore, an alternative hypothesis of het-siRNA targeting is also possible. Matzke and Mosher (2014) suggest that het-siRNA/AGO complexes target genomic DNA instead of Pol V-transcribed nascent RNA. In this model, Pol V transcription is required not to produce RNA per se, but to open a transcription bubble that allows het-siRNA/AGO complexes the opportunity to scan for sequence matches in the genomic DNA. The demonstrated associations between AGO4 and Pol V RNAs could be interpreted as indirect consequences of their proximity, rather than as evidence of a direct het-siRNA/Pol V RNA interaction. That AGO4 is in close proximity to chromatin at Pol V-occupied loci is not in doubt, as AGO4 chromatin immunoprecipitation-sequencing (ChIP-Seq) has been shown to produce useful results (Zheng et al., 2013). Further study is required to determine whether the alternative hypothesis of a direct het-siRNA/genomic DNA interaction is true.

1.6 Objectives

I aim to investigate the non-templated nucleotides in plant siRNAs, the role of AGO4 in het-siRNA accumulation, and the targets of AGO4-associated siRNAs in Arabidopsis. My study is presented in the following chapters:

Chapter 2: I systematically analyzed non-templated nucleotides in small RNAs from multiple plant species, summarized the localization and nucleotidyl preference of non-templated nucleotides in het-siRNAs, and proposed a model of their origin. The work presented in Chapter 2 has been published in Nucleic Acids Research (Wang et al., 2016).

Chapter 3: I generated transgenic plants expressing either a slicing-defective AGO4 or wild-type AGO4 in both the ago4-4 single mutant and the ago4-4/ago6-2/ago9-1 triple mutant in Arabidopsis, and analyzed small RNA accumulation in these transgenic plants. I characterized AGO4-dependent siRNAs genome-wide, and studied their dependency on other key RdDM proteins. Finally, I analyzed the role of AGO4 slicer activity in het-siRNA accumulation. The work presented in Chapter 3 has been submitted to The Plant Journal, and is currently under revision.

Chapter 4: I developed a pipeline for de novo identification of het-siRNA:target interactions in Arabidopsis, through crosslinked AGO4 immunoprecipitation and subsequent high-throughput sequencing. A handful of het-siRNA:target interactions were recovered from RNA-seq libraries. Intriguing features of het-siRNA:target interactions are discussed.

Chapter 5: I summarize results in previous chapters and propose future directions.

Chapter 2 † Genome-Wide analysis of single non-templated nucleotides in plant endogenous siRNAs and miRNAs

2.1 Introduction

Plant regulatory small RNAs, which are usually 20-24 nt in length, are classified into different sub-categories based on their biogenesis and function (Axtell, 2013a). Small RNA genes can be empirically annotated by the pattern of alignments and size of predominant small RNAs (Axtell, 2013b; Shahid and Axtell, 2014). Plant microRNAs (miRNAs), usually 21 nt in length, and heterochromatic small interfering RNAs (het-siRNAs), usually 24 nt in length, are the two major types of plant small RNAs. Plant miRNAs are processed from single-stranded hairpin-forming primary RNAs and mediate post-transcriptional silencing by triggering target mRNA slicing and/or translational repression (Rogers and Chen, 2013). Plant het-siRNAs are processed from double-stranded RNA and mediate silencing by RNA-directed DNA methylation (RdDM) (Borges and Martienssen, 2015).

Though plant miRNAs and het-siRNAs have different biogenesis and functions, they both confer silencing through base-pairing interactions with target RNAs (Wierzbicki et al., 2009; Liu et al., 2014b; Wang et al., 2015a). Post-transcriptional modifications of small RNAs can lead to reduced targeting specificity and/or altered small RNA stability (Zhao et al., 2012a). In plants, both miRNAs and het-siRNAs are subject to 2'-O-methylation of the 3'-most ribose by the small RNA methyltransferase HUA ENHANCER 1 (HEN1), which protects plant small RNAs from degradation (Yu et al., 2005; Li et al., 2005). Small RNAs in hen1 mutant backgrounds are subject to extensive 3' to 5' exonucleolytic truncation and/or 3' tailing (Rogers and Chen, 2013). Various 3' end nucleotide addition patterns of plant miRNAs have been observed in hen1 mutants as well as in wild-type plants. 3' end uridylation of unmethylated miRNAs, which is the predominant 3' tailing form in Arabidopsis hen1 plants, usually signals destabilization of miRNAs. HEN1 SUPPRESSOR 1 (HESO1) is the primary uridyltransferase for uridine addition on the 3' ends of miRNAs (Ren et al., 2012; Zhao et al., 2012b). UTP:RNA uridylyltransferase 1 (URT1) cooperatively uridylates miRNAs along with HESO1 in Arabidopsis (Tu et al., 2015; Wang et al., 2015b). Other than triggering miRNA destabilization, uridylation can also affect the function of miRNAs. A uridylated 22 nt variant of miR171a that only arises in a hen1 mutant background can trigger secondary phased siRNA biogenesis (Zhai et al., 2013; Tu et al., 2015). Non-templated nucleotides other than U are also observed in the hen1 background, but their biological functions are less well understood. Interestingly, 3'-adenylated miRNAs in Populus trichocarpa appeared to be degraded more slowly (Lu et al., 2009).

† This chapter is modified and reproduced from the following published work in Nucleic Acids Research (Wang et al., 2016). Authors: Feng Wang, Nathan R. Johnson, Ceyda Coruh and Michael J. Axtell. Author contributions: Feng Wang analyzed sRNA-seq data. Feng Wang and Ceyda Coruh prepared sRNA-seq libraries. Nathan Johnson contributed simulated sRNA-seq libraries. Feng Wang and Michael Axtell wrote the manuscript. Michael Axtell conceived of the project.

Much less is known about the patterns of non-templated nucleotides in het-siRNAs. Like miRNAs, het-siRNAs are HEN1 substrates, and a few highly abundant 24 nt het-siRNAs were shown to be truncated and tailed in the hen1 mutant (Li et al., 2005; Zhao et al., 2012b; Wang et al., 2015b). However, genome-wide study of non-templated nucleotides in het-siRNAs is made more difficult by the heterogeneity inherent to het-siRNA biogenesis. Unlike miRNA loci, where typically one or two major mature miRNA sequences accumulate, het-siRNA loci produce a much more diverse set of sequences. This high diversity and general lack of a single dominant 'major' RNA species largely prevents approaches, such as miTRATA (Patel et al., 2016), that rely on identifying non-templated variants of a single abundant RNA species. In this study, we take an alternative approach, based on mismatches between aligned small RNAs and the corresponding reference genome, to comprehensively profile single non-templated nucleotides in plant miRNAs and het-siRNAs.
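The idea of reading non-templated nucleotides off alignment mismatches can be illustrated with a small sketch. This is not the analysis code used in the study; the function names are hypothetical, and the sketch assumes that each read has already been paired with the reference sequence of its aligned position, with both strings written 5' to 3' on the read's strand.

```python
from collections import Counter


def classify_mismatch(read, reference):
    """Describe the single mismatch between a read and its aligned reference.

    Returns None for a perfect match and raises ValueError if the pair differs
    in length or at more than one position.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    read, reference = read.upper(), reference.upper()
    diffs = [i for i, (a, b) in enumerate(zip(read, reference)) if a != b]
    if not diffs:
        return None
    if len(diffs) > 1:
        raise ValueError("more than one mismatch")
    i = diffs[0]
    return {
        "position": i + 1,  # 1-based, counted from the 5' end of the read
        "read_base": read[i],
        "reference_base": reference[i],
        "three_prime_terminal": i == len(read) - 1,
    }


def three_prime_mismatch_bases(pairs):
    """Count the read bases observed at 3'-terminal mismatches."""
    counts = Counter()
    for read, reference in pairs:
        hit = classify_mismatch(read, reference)
        if hit and hit["three_prime_terminal"]:
            counts[hit["read_base"]] += 1
    return counts
```

Tallying mismatches by position and by read length in this way is what allows 3'-terminal mismatches in, for example, 23 nt reads to be compared against internal positions or against reads of other sizes.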

2.2 Materials and Methods

2.2.1 Small RNA sequencing library preparation

Arabidopsis thaliana (ecotype Col-0) plants were grown at 21 °C, with 16 h day/8 h night. Inbred B73 maize plants were grown in a greenhouse at ~28 °C. Total RNA from immature Arabidopsis inflorescences and fully expanded maize leaves was extracted using the miRNeasy Mini kit (Qiagen) per the manufacturer's instructions. Small RNA libraries were prepared using the TruSeq Small RNA kit (Illumina) per the manufacturer's instructions. Small RNA libraries were sequenced on a HiSeq2500 (Illumina) with 50 nt single-end reads. Raw data have been deposited at NCBI GEO under accessions GSE79119 (Arabidopsis libraries) and GSE77657 (maize libraries).

2.2.2 Source and processing of small RNA sequencing datasets

Sources and accessions of small RNA sequencing datasets analyzed in this study are in Table 2.1. For A. thaliana AGO4 immunoprecipitation datasets (raw accession numbers SRR189808, SRR189809, SRR189810, and SRR189811), 3' adapters were removed by Cutadapt (Martin, 2011) with options -a TCGTATGC -e 0.1 -O 5 -m 15. Trimmed reads were then aligned to the A. thaliana reference genome by ShortStack 3.3 (Johnson et al., 2016) with default settings. Other datasets were processed directly by ShortStack 3.3 with default settings. 3'-adapter sequences of all libraries and reference genomes used for alignment are shown in Table 2.1. The alignment settings retained an alignment containing a single mismatch only if there were no possible perfectly aligned positions for the read in question.
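The retention rule in the last sentence can be written down in a few lines. The sketch below illustrates only that rule as stated in the text; it is not ShortStack's implementation, and the function name and the `(locus, mismatches)` tuple layout are assumptions made for the example.

```python
def retained_alignments(candidates):
    """Apply the retention rule to one read's candidate alignments.

    candidates is a list of (locus, mismatches) tuples. Perfect alignments are
    kept whenever at least one exists; single-mismatch alignments are kept only
    when the read has no perfect alignment. Anything with more mismatches is
    dropped.
    """
    perfect = [c for c in candidates if c[1] == 0]
    if perfect:
        return perfect
    return [c for c in candidates if c[1] == 1]


if __name__ == "__main__":
    # Hypothetical loci, for illustration only.
    print(retained_alignments([("Chr1:100", 0), ("Chr2:500", 1)]))
    print(retained_alignments([("Chr3:42", 1), ("Chr4:7", 2)]))
```

Under this rule, a read that is reported with a mismatch has no perfect match anywhere in the reference, which is what makes the mismatch a candidate non-templated nucleotide rather than a product of a paralogous locus.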

Table 2.1 Data sources and accession numbers

Accession Number | Species | Ecotype or Cultivar | Genotype | Tissue | Genome | Figure(s) used
GSM2086247 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.1 to 2.3, 2.5, 2.6, 2.11 and 2.12
GSM2086248 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.1 to 2.3, 2.5, 2.6, 2.11 and 2.12
GSM2086249 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.1 to 2.3, 2.5, 2.6, 2.11 and 2.12
GSM2055763 | Zea mays | B73 | wild-type | Leaf | AGPv3 | Figures 2.1, 2.5 and 2.7
GSM2055764 | Zea mays | B73 | wild-type | Leaf | AGPv3 | Figures 2.1, 2.5 and 2.7
GSM2055765 | Zea mays | B73 | wild-type | Leaf | AGPv3 | Figures 2.1, 2.5 and 2.7
GSM1081563 | Oryza sativa | Nipponbare | wild-type | Leaf | MSU 7 | Figures 2.1, 2.5 and 2.7
GSM1081564 | Oryza sativa | Nipponbare | wild-type | Leaf | MSU 7 | Figures 2.1, 2.5 and 2.7
GSM1081565 | Oryza sativa | Nipponbare | wild-type | Leaf | MSU 7 | Figures 2.1, 2.5 and 2.7
GSM1093595 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM1093596 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM1093597 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM1093598 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM1194296 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM1194297 | Physcomitrella patens | Gransden | wild-type | 10 day old protonemata | v3 | Figure 2.5
GSM577998 | Solanum lycopersicum | M82 | wild-type | Seedlings aerial tissue | SL2.40 | Figures 2.5 and 2.7
GSM577999 | Solanum lycopersicum | M82 | wild-type | Seedlings aerial tissue | SL2.40 | Figures 2.5 and 2.7
GSM1651987 | Medicago truncatula | Jemalong A17 | wild-type | Leaf | Mt4.0 | Figures 2.5 and 2.7
GSM1651988 | Medicago truncatula | Jemalong A17 | wild-type | Root | Mt4.0 | Figures 2.5 and 2.7
GSM1651989 | Medicago truncatula | Jemalong A17 | wild-type | Seedling | Mt4.0 | Figures 2.5 and 2.7
GSM1178880 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1178881 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1178882 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1668897 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1668899 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1668905 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figures 2.4 and 2.5
GSM1489494 | Arabidopsis thaliana | L-er | wild-type | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489495 | Arabidopsis thaliana | L-er | wild-type | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489496 | Arabidopsis thaliana | L-er | hen1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489497 | Arabidopsis thaliana | L-er | hen1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489498 | Arabidopsis thaliana | L-er | hen1-2/heso1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489499 | Arabidopsis thaliana | L-er | hen1-2/heso1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489500 | Arabidopsis thaliana | L-er | hen1-2/heso1-2/urt1-3 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489501 | Arabidopsis thaliana | L-er | hen1-2/heso1-2/urt1-3 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489502 | Arabidopsis thaliana | L-er | heso1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489503 | Arabidopsis thaliana | L-er | heso1-2 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489504 | Arabidopsis thaliana | L-er | heso1-2/urt1-3 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM1489505 | Arabidopsis thaliana | L-er | heso1-2/urt1-3 | Inflorescence | Ler-0 v7 | Figure 2.8
GSM707686 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | TAIR10 | Figure 2.10
GSM707687 | Arabidopsis thaliana | Col-0 | wild-type | Leaf | TAIR10 | Figure 2.10
GSM707688 | Arabidopsis thaliana | Col-0 | wild-type | Root | TAIR10 | Figure 2.10
GSM707689 | Arabidopsis thaliana | Col-0 | wild-type | Seedling | TAIR10 | Figure 2.10
GSM1668906 | Arabidopsis thaliana | Col-0 | dcl2/dcl4 | Inflorescence | TAIR10 | Figure 2.9

2.2.3 Preparation of Simulated sRNA-seq Libraries

Simulation was accomplished through the use of the script sRNA-simulator.py (Supplemental Script) with default settings. Selection of loci for simulated libraries used a hybrid approach based on prior annotations as well as real alignment profiles of sRNA-seq libraries, producing the three commonly studied classes: miRNAs, het-siRNAs and trans-acting siRNAs (tasiRNAs). 15 Arabidopsis, 12 rice and 21 maize libraries (Supplemental Table 1) were sourced for simulation, using the TAIR10, MSU 7 and AGPv3 reference genome assemblies, respectively. Selected libraries were aligned to the corresponding reference genome using Bowtie (Langmead et al., 2009) with settings to map all locations for every read. High-confidence miRNA loci as listed in miRBase 21 (Kozomara and Griffiths-Jones, 2014) were used for miRNA simulation. Loci that were dominated by 23-24 nt or 21 nt RNAs were not used for miRNA simulation, and were considered candidates for het-siRNA and tasiRNA loci, respectively. Approximately 5 million reads were simulated from selected small RNA-producing loci in each library. Thirty percent of simulated reads mimicked RNAs from miRNA loci, with the abundant 21 nt RNAs in a stranded miRNA/miRNA* pattern. Five percent of simulated reads mimicked tasiRNAs, with 21 nt RNAs arising from 125 nt long loci in a phased pattern on both strands. Het-siRNAs represented 65% of total reads, with 24 nt siRNAs produced from 200-1000 nt long loci, closely emulating het-siRNA processing in vivo (Blevins et al., 2015; Zhai et al., 2015a). Single nucleotide errors were randomly incorporated at a rate of one nucleotide per 10,000 reads. All loci produced a distribution of differently sized reads, mimicking DCL mis-processing. Once produced, simulated data were analyzed identically to real data.
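The library composition and error model described above can be sketched as follows. This is an illustrative sketch only and is not taken from sRNA-simulator.py; the constant and function names are hypothetical, and the error model is assumed to be a single base substitution at a uniformly chosen position.

```python
import random

TOTAL_READS = 5_000_000
# Percentages of simulated reads per class, as stated in the text.
CLASS_PERCENT = {"miRNA": 30, "tasiRNA": 5, "het-siRNA": 65}
ERROR_RATE_PER_READ = 1 / 10_000  # one erroneous nucleotide per 10,000 reads


def reads_per_class(total=TOTAL_READS):
    """Number of simulated reads drawn from each locus class."""
    return {name: total * pct // 100 for name, pct in CLASS_PERCENT.items()}


def add_random_error(read, rng, rate=ERROR_RATE_PER_READ):
    """With probability `rate`, substitute one randomly chosen base."""
    if rng.random() >= rate:
        return read
    i = rng.randrange(len(read))
    alternatives = [b for b in "ACGT" if b != read[i]]
    return read[:i] + rng.choice(alternatives) + read[i + 1:]
```

Because the substituted position is chosen uniformly, this error model by itself gives no reason for mismatches to accumulate at either end of a read.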

2.2.4 Analysis of non-templated nucleotides

BAM-formatted alignment files from ShortStack 3.3 (Johnson et al., 2016) output were used to analyze mismatched nucleotides. Non-mappers and secondary alignments were removed from the BAM files by SAMtools 1.1 (Li et al., 2009) with options view -b -F 4 -F 256. The remaining reads were then intersected, using Bedtools 2.19.1 (Quinlan and Hall, 2010), with five classes of clusters: high-confidence miRNA clusters as listed in miRBase 21 (Kozomara and Griffiths-Jones, 2014), ShortStack de novo annotated miRNA clusters, clusters dominated by 21 nt siRNAs, clusters dominated by 24 nt siRNAs, and clusters where less than 80% of the aligned reads were between 20 and 24 nts. Most MIRNA loci annotated de novo by ShortStack overlapped prior miRBase annotations. MIRNA loci that were truly novel and reproducible (called de novo by ShortStack from all analyzed sRNA-seq libraries in a given species) are shown in Supplemental Table 2 and Supplemental Data Set 1. If a reference genome build used in this study was different from the one in miRBase 21, coordinates of high-confidence miRNA loci were updated by aligning the hairpin sequence from miRBase 21 to the appropriate genome build using BLAST 2.2.30 (Camacho et al., 2009). Reads in each cluster were further grouped based on their length. The 'MD:Z' field from the alignment lines was used to determine positions of bases with mismatches between the read and the reference genome. Fractions of reads with mismatches in each cluster class were calculated. To analyze mismatch frequencies in the 24 nt-dominated clusters, biological replicates were merged into a single library. Reads with or without mismatched nucleotides within the same size group were processed by WebLogo 3.4 (Crooks et al., 2004) with options --format pdf --yaxis 1 --color-scheme classic.
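As an illustration of how mismatch positions can be recovered from the 'MD:Z' field, the following minimal sketch parses an MD string into 0-based positions along the aligned read. The function name is hypothetical and this is not the code used in this study. Positions are in alignment orientation, so reads on the minus strand would need to be reversed to obtain positions relative to the RNA 5' end.

```python
import re
from typing import List

# An MD string alternates match-run lengths with mismatched reference bases;
# "^" introduces reference bases that are deleted from the read.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def mismatch_positions(md: str) -> List[int]:
    """Return 0-based read positions (alignment orientation) of mismatches."""
    positions = []
    read_pos = 0
    for run, deletion, base in MD_TOKEN.findall(md):
        if run:
            read_pos += int(run)        # a run of matching bases
        elif deletion:
            continue                    # deleted reference bases consume no read bases
        else:
            positions.append(read_pos)  # a single mismatched base
            read_pos += 1
    return positions
```

For a 23 nt plus-strand read, an MD string of `22A0` therefore marks a mismatch at the 3'-most position, whereas `0C22` marks one at the 5'-most position.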

2.3 Results

2.3.1 Elevated mismatch rates at the 3' ends of genome-aligned plant miRNAs and siRNAs are due to non-templated nucleotides

To study global patterns of non-templated nucleotides in plant small RNAs, we first analyzed small RNA deep sequencing (sRNA-seq) libraries prepared from wild-type Arabidopsis, maize, and rice tissues (Stroud et al., 2013a). After adapter trimming, reads with lengths of 15 nt or longer were aligned to their corresponding genomes allowing no more than one mismatch. Alignments with single mismatches were allowed only if there were no perfectly matched alignments for a given read. Each alignment with a single mismatch could potentially arise from a non-templated nucleotide, sequencing error, software error, a single-nucleotide polymorphism (SNP) between the specimen sampled for sRNA-seq and the reference genome assembly, or an error in the reference genome assembly itself. We observed elevated rates of mismatches at the 3'-most nucleotide regardless of RNA size and plant species (Figure 2.1A). While mismatches at 3' positions were always elevated, the rates also varied by RNA size and species (Figure 2.1A). The position-specific nature of this pattern rules out SNPs and reference genome assembly errors as major contributors because there is no obvious reason that these situations would consistently result in mismatches predominantly at the 3'-most nucleotides of the reads.
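The three position classes plotted in Figure 2.1A could be computed along the following lines. This is a sketch under stated assumptions, not the analysis code itself: each read is assumed to be summarized as a (length, is_reverse, mismatch position) tuple with the position given in alignment orientation (None for a perfect match), and the rate for each class is assumed to be the fraction of reads carrying a mismatch in that class. All names are hypothetical.

```python
from collections import Counter


def position_class(aln_pos, read_len, is_reverse):
    """Classify a mismatch as 5'-most, middle, or 3'-most in RNA orientation."""
    # Minus-strand alignments are stored reverse-complemented, so flip them.
    rna_pos = read_len - 1 - aln_pos if is_reverse else aln_pos
    if rna_pos == 0:
        return "5p"
    if rna_pos == read_len - 1:
        return "3p"
    return "middle"


def mismatch_rates(reads):
    """Fraction of reads with a mismatch in each position class."""
    counts = Counter()
    total = 0
    for read_len, is_reverse, aln_pos in reads:
        total += 1
        if aln_pos is not None:
            counts[position_class(aln_pos, read_len, is_reverse)] += 1
    if total == 0:
        return {}
    return {cls: counts[cls] / total for cls in ("5p", "middle", "3p")}
```

Handling strand explicitly matters here: a mismatch at alignment position 0 of a minus-strand read lies at the 3' end of the RNA, not the 5' end.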

To test whether the elevated 3' mismatch rates were caused by software errors, we first examined simulated sRNA-seq datasets. 15 simulated Arabidopsis small RNA libraries, 21 simulated maize small RNA libraries, and 12 simulated rice small RNA libraries were aligned to their corresponding reference genomes. The simulated data included single-nucleotide errors at random positions at a rate of 0.01%. We did not observe elevated mismatch rates at the 3' ends of reads in the simulated libraries (Figure 2.1B). We then tested whether the elevated 3' mismatch rates were due to software issues associated with 3' adapter trimming; the simulated data did not include 3' adapters and as such were not trimmed. We trimmed the 3' adapter sequences of the same real Arabidopsis sRNA-seq datasets using Cutadapt 1.8.3 (Martin, 2011) instead of ShortStack 3.3 (Johnson et al., 2016). Elevated rates of 3' mismatches were still observed for all size groups of small RNAs (Figure 2.2). These results indicate that the high frequencies of 3' mismatches seen in plant sRNA-seq data are not likely due to a systematic software artifact.

Figure 2.1 Elevated mismatch rates at the 3'-most positions of genome-aligned plant small RNAs A. Mismatch rates for reference-genome aligned sRNA-seq reads of the indicated sizes and species. Rates for the 5'-most nt, all middle nts, and the 3'-most nt are plotted separately. Multiple data points for the same species and RNA size indicate values from different replicate sRNA-seq libraries. Circles: biological replicates. Bars: mean values. Libraries that were analyzed are listed in Table 2.1. B. Mismatch rates for reference-genome aligned sRNA-seq reads from simulated datasets.

Figure 2.2 3'-mismatch rates after using an alternative method of 3'-adapter trimming The 3' adapters of reads from Arabidopsis inflorescence sRNA-seq libraries (Table 2.1) were trimmed using Cutadapt and aligned to the reference genome. The rates of 3' mismatches, by RNA size and position were plotted. Bars: mean values. Circles: biological replicates. Arabidopsis libraries that were analyzed are listed in Table 2.1.

We next tested whether the pattern of 3'-mismatches was due to general sequencing errors. To do this, we grouped aligned Arabidopsis sRNA-seq reads into genomic clusters and classified the clusters. Two non-mutually exclusive groups of miRNAs were identified: those from our de novo analysis, and those listed as high-confidence loci in miRBase 21. Three groups of non-miRNA loci were also analyzed: loci dominated by 21 nt siRNAs, loci dominated by 24 nt siRNAs (which we presume are mostly het-siRNAs), and loci where less than 80% of the aligned reads were 20-24 nts in length (which we termed N loci). The N loci likely represent degraded tRNAs, rRNAs, mRNAs, and other cellular RNAs that are not related to the DCL / AGO regulatory system. The reads aligned to the N loci did not have high rates of 3'-mismatches (Figure 2.3A). High rates of 3'-mismatches were confined to certain RNA sizes within miRNA, 21 nt loci, and 24 nt loci (Figure 2.3A). These trends were not unique to our sRNA-seq libraries; the same analysis procedure applied to previously published Arabidopsis sRNA-seq data (Table 2.1) gave similar results (Figure 2.4). The specificity of the high 3'-mismatch rate for miRNA and siRNA loci, but not for degraded RNA loci, argues against sequencing errors as a major contributor. Based on these analyses, we conclude that the high 3'-mismatch rates seen for miRNAs and siRNAs in genome-aligned plant sRNA-seq data are due to the presence of non-templated nucleotides in vivo.
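The non-miRNA cluster classes used above can be sketched as a simple rule on the read-size distribution of each cluster. The 80% threshold and the 20-24 nt range come from the text; the assumption that "dominated by" means the most abundant size among the 20-24 nt reads, and all names in the sketch, are illustrative only.

```python
def classify_cluster(size_counts, size_min=20, size_max=24, min_fraction=0.8):
    """Return the dominant read size as a string, or "N" for non-siRNA-like loci.

    `size_counts` maps read length (nt) to the number of aligned reads.
    """
    total = sum(size_counts.values())
    if total == 0:
        return "N"
    in_range = {s: n for s, n in size_counts.items() if size_min <= s <= size_max}
    # N loci: less than 80% of aligned reads are 20-24 nt long.
    if sum(in_range.values()) / total < min_fraction:
        return "N"
    # Assumption: the dominant size is the most abundant size within the range.
    return str(max(in_range, key=in_range.get))
```

For example, a cluster with 90 reads of 24 nt, 5 of 23 nt and 5 of 30 nt would be labeled "24" under this rule, whereas a cluster in which only 70% of reads fall within 20-24 nt would be labeled "N".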

Figure 2.3 'Off-sized' miRNAs and siRNAs have high rates of 3' mismatches in Arabidopsis A. Mismatch rates for reference-aligned Arabidopsis sRNA-seq reads of the indicated clusters and sizes. 21: siRNA loci dominated by 21 nt RNAs. 24: siRNA loci dominated by 24 nt siRNAs (presumed het-siRNA loci). N: loci where less than 80% of reads were between 20-24 nts, and are thus presumed to not be miRNA or siRNA loci. miRNA-miRBase: High-confidence MIRNA loci from miRBase version 21. miRNA-de novo: MIRNA loci annotated by automated software. Note that most de novo annotated MIRNA loci overlapped prior miRBase annotations. Arabidopsis libraries that were analyzed are listed in Table 2.1. B. As in A, except showing normalized accumulation levels of small RNAs.

Figure 2.4 'Off-sized' miRNAs and siRNAs have high rates of 3'-mismatches in Arabidopsis – evidence from additional sRNA-seq libraries A. Mismatch rates for reference-aligned Arabidopsis sRNA-seq reads of the indicated sizes and clusters. 21: siRNA loci dominated by 21 nt RNAs. 24: siRNA loci dominated by 24 nt siRNAs (presumed het-siRNA loci). N: Loci where less than 80% of the aligned reads are 20-24 nts in length. miRNA-miRBase: High-confidence MIRNA loci from miRBase version 21. miRNA-de novo: MIRNA loci annotated by automated software. Note that most de novo annotated MIRNA loci overlapped prior miRBase annotations. Data from Zhai et al. (2013) (Table 2.1). Accession numbers of libraries are shown in the figure. B. As in A, except showing normalized accumulation levels of small RNAs. C. As in A, except for data from Zhai et al. (2015a) (Table 2.1). D. As in B, except for data from Zhai et al. (2015a) (Table 2.1).

2.3.2 'Off-sized' het-siRNAs and miRNAs often have higher rates of 3' end non-templated nucleotides

In Arabidopsis, most abundant miRNAs are 21 nts in length (Figure 2.3B). Relatively low rates of non-templated 3' nucleotides were observed for the predominant sizes of miRNAs, 21 nt siRNAs, and 24 nt siRNAs (Figure 2.3A). By contrast, 'off-sized' small RNAs showed higher rates of 3' non-templated nucleotides, with the small RNAs one nucleotide longer than the predominant size usually having the highest rates (Figure 2.3A). For example, about 35.3% of 25 nt siRNAs and about 16.7% of 23 nt siRNAs aligned to the 24 nt-dominated clusters had a 3' end non-templated nucleotide. In contrast, only about 2.6% of 24 nt siRNAs aligned to the 24 nt-dominated clusters had a 3' end non-templated nucleotide. These trends were also apparent in other previously published Arabidopsis sRNA-seq data (Figure 2.4).

We further studied whether the elevated rate of 3' end non-templated nucleotides for 'off-sized' small RNAs could be observed in other plant species. 17 sRNA-seq libraries from wild-type specimens of Solanum lycopersicum (Shivaprasad et al., 2012a; 2012b), Medicago truncatula (Fei et al., 2015), Zea mays (this study), Oryza sativa (Stroud et al., 2013a) and Physcomitrella patens (Coruh et al., 2015), along with 9 sRNA-seq libraries from Arabidopsis thaliana (same libraries as in Figures 2.3 and 2.4), were analyzed (Table 2.1). 22 nt siRNAs from the 24 nt-dominated siRNA clusters in Physcomitrella were included in the analysis, because 23 nt siRNAs and 24 nt siRNAs from the 24 nt-dominated siRNA clusters in Physcomitrella are similarly abundant (Coruh et al., 2015). Similar to Arabidopsis, we observed that small RNAs of the predominant size for their locus type usually have low rates of 3' non-templated nucleotides in all species examined (Figure 2.5). miRNAs and siRNAs one nucleotide longer than the predominant size had higher rates of 3' end non-templated nucleotides in all cluster types, with 25 nt siRNAs from the 24 nt-dominated clusters usually having the highest levels (Figure 2.5). As in Arabidopsis, RNAs aligned to N clusters (which are unlikely to be miRNAs or siRNAs) usually did not have elevated rates of 3' non-templated nucleotides (Figure 2.5). We noted that 25 nt RNAs from Oryza sativa N clusters also had elevated rates of 3'-mismatches (Figure 2.5). We speculate that this could be due to mis-classification of some clusters that are truly 24 nt siRNA-dominated clusters as N clusters in this species.

Figure 2.5 'Off-sized' miRNAs and siRNAs have high rates of 3' end non-templated nucleotides in several plant species Mismatch rates for reference-aligned sRNA-seq reads of the indicated species, clusters, and sizes. 21: siRNA loci dominated by 21 nt RNAs. 24: siRNA loci dominated by 24 nt siRNAs (presumed het-siRNA loci). N: loci where less than 80% of reads were between 20-24 nts, and are thus presumed to not be miRNA or siRNA loci. miRNA-miRBase: High-confidence MIRNA loci from miRBase version 21. miRNA-de novo: MIRNA loci annotated by automated software. Note that most de novo annotated MIRNA loci overlapped prior miRBase annotations. Libraries that were analyzed are listed in Table 2.1.

2.3.3 Arabidopsis 23 nt siRNAs from the 24 nt-dominated clusters have a unique pattern of 3' non-templated nucleotides

The 3'-most nucleotides of plant siRNAs and miRNAs are 2'-O-methylated by the HEN1 methyltransferase (Li et al., 2005; Yu et al., 2005). In the absence of HEN1 activity, miRNAs and siRNAs are subject to 3'-uridylation (Li et al., 2005). Thus, one hypothesis that could explain our observations is that HEN1 disfavors the 'off-sized' miRNAs and siRNAs, rendering them unmethylated and thus susceptible to 3'-end uridylation in the wild-type. Consistent with this hypothesis, U was a very frequent 3' non-templated nucleotide of Arabidopsis 25 nt siRNAs from the 24 nt-dominated clusters (Figure 2.6). These 25 nt siRNAs also tended to have a 5'-A, as do canonical 24 nt het-siRNAs (Figure 2.6). However, 23 nt siRNAs from the 24 nt-dominated clusters had a very different pattern. The non-templated 3'-most nucleotide for 23 nt siRNAs tended to be an A, and there was no strong tendency to have the canonical 5'-A (Figure 2.6). Surprisingly, position 22 of the 23 nt siRNAs with a 3' non-templated nucleotide had a very strong tendency to be a U or C (Figure 2.6). This pattern was also present in maize, rice, Medicago truncatula, and tomato (Figure 2.7). The U or C at position 22 is templated by the genome, suggesting that the 23 nt siRNAs with a 3'-mismatch arise preferentially from specific genomic sequence contexts. The different preferences for 3' non-templated nucleotides indicate that distinct mechanisms are likely to underlie their deposition on 23 nt and 25 nt siRNAs from Arabidopsis 24 nt-dominated loci.
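The per-position nucleotide preferences discussed above (for example, the 5'-most base, position 22, and the 3'-most base of 23 nt siRNAs) amount to base frequencies tallied over same-length reads. The following minimal sketch shows only that tally; it is not WebLogo, which additionally scales letters by information content, and the function name is hypothetical. Reads are assumed to be given 5' to 3', with T converted to U.

```python
from collections import Counter


def position_frequencies(reads):
    """Per-position base frequencies for a set of equal-length reads."""
    reads = [r.upper().replace("T", "U") for r in reads]
    if not reads:
        return []
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(r[i] for r in reads)
        freqs.append({base: counts[base] / len(reads) for base in "ACGU"})
    return freqs
```

With 23 nt reads, `freqs[0]` gives the 5'-most base frequencies, `freqs[21]` those at position 22, and `freqs[22]` those at the 3'-most (non-templated) position.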

Figure 2.6 Sequence features of siRNAs arising from Arabidopsis 24 nt-dominated siRNA loci Sequence logos for the indicated siRNAs aligned to the 24 nt-dominated siRNA loci. Three biological replicate sRNA-seq libraries from Arabidopsis inflorescence (Table 2.1) were merged for this analysis. Sequence features were analyzed by WebLogo 3.4.

Figure 2.7 Sequence features of 23 nt siRNAs arising from the 24 nt-dominated siRNA loci in indicated species Sequence logos for 23 nt siRNAs aligned to the 24 nt-dominated siRNA loci in indicated species. Biological replicate sRNA-seq libraries from indicated species (Table 2.1) were merged for this analysis. Sequence features were analyzed by WebLogo 3.4.

We next examined Arabidopsis sRNA-seq data from hen1 mutants, as well as the 3' uridylase mutants heso1 and urt1 (Wang et al., 2015b). As expected, increased levels of 3' non-templated uridylation were observed in the hen1 mutant for all sizes of siRNAs arising from the 24 nt-dominated clusters (Figure 2.8A). This uridylation was suppressed in hen1/heso1 and especially hen1/heso1/urt1 backgrounds. Similar trends of HESO1- and URT1-dependent 3' uridylation in the hen1 background were observed for miRNAs (Figure 2.8B). However, the 3' non-templated adenylation of 23 nt siRNAs from the 24 nt-dominated clusters decreased, not increased, in the hen1 mutant (Figure 2.8A). Together with the analysis of sequence motifs, these data suggest that the frequent 23 nt siRNAs with 3' non-templated A residues are produced by a mechanism distinct from the currently understood HESO1- and URT1-dependent terminal transferases.

Figure 2.8 3'-most non-templated adenines in 23 nt siRNAs arising from the 24 nt-dominated siRNA clusters are not dependent on HESO1 and URT1 A. Rates of Arabidopsis 3' non-templated nucleotides in siRNAs aligned to the 24 nt-dominated loci by siRNA size, 3'-most non-templated nucleotide, and genetic background. Data from Wang et al. (2015b) (Table 2.1). L-er: Landsberg erecta ecotype. B. As in A except for miRBase high-confidence miRNA clusters.

24 nt siRNAs in Arabidopsis are mostly generated by the Dicer-Like enzyme DCL3 (Xie et al., 2004; Henderson et al., 2006). However, DCL2 and DCL4 can also contribute 22 nt and 21 nt siRNAs, respectively, at loci dominated by 24 nt siRNAs (Henderson et al., 2006). Therefore, we tested whether the 23 nt siRNAs with a 3'-most non-templated nucleotide were derived from tailing of DCL2-dependent 22 nt siRNAs by examining sRNA-seq data from a dcl2/dcl4 double null mutant. 23 nt siRNAs with a 3'-most non-templated nucleotide were still present at the 24 nt-dominated siRNA loci in the dcl2/dcl4 background, and had the same pattern of nucleotide biases at their 3' ends (Figure 2.9). Thus we conclude that the population of 23 nt siRNAs with a 3' non-templated nucleotide is not generally produced by DCL2 or DCL4.

Figure 2.9 Non-templated nucleotide profile of small RNAs in dcl2/dcl4 mutant background A. Mismatch rates for reference-aligned Arabidopsis thaliana sRNA-seq reads of the indicated sizes and clusters in dcl2/dcl4 mutant background. 21: siRNA loci dominated by 21 nt RNAs. 24: siRNA loci dominated by 24 nt siRNAs (presumed het-siRNA loci). N: loci where less than 80% of reads were between 20-24 nts, and are thus presumed to not be miRNA or siRNA loci. miRNA-miRBase: High-confidence MIRNA loci from miRBase version 21. miRNA-de novo: MIRNA loci annotated by automated software. Note that most de novo annotated MIRNA loci overlapped prior miRBase annotations. Libraries that were analyzed are listed in Table 2.1. B. As in A, except showing normalized accumulation levels of small RNAs. C. Sequence logos for 23 nt, 24 nt and 25 nt siRNAs with 3'-most non-templated nucleotide in the dcl2/dcl4 mutant background. Analyzed siRNAs were from the 24 nt-dominated siRNA clusters. Data from Zhai et al. (2015a) (Table 2.1).

Figure 2.10 Rates of non-templated nucleotides in siRNAs co-immunoprecipitated with Arabidopsis AGO4 A. Rates of non-templated nucleotides in AGO4-immunoprecipitated RNA, indicated by cluster type and RNA sizes. 24: the 24 nt-dominated siRNA loci. N: loci where less than 80% of the aligned reads were 20-24 nts in length. Data from Wang et al. (2011) (Table 2.1). B. As in A, except showing normalized accumulation levels of small RNAs. C. Sequence logos for perfectly aligned siRNAs arising from the 24 nt-dominated siRNA clusters. The four Arabidopsis AGO4 immunoprecipitation sRNA-seq libraries from Wang et al. (2011) (Table 2.1) were merged for this analysis. Sequence features were analyzed by WebLogo 3.4. D. As in C, except for reads with a 3'-most non-templated nucleotide.

2.3.4 23 nt siRNAs with a 3' non-templated nucleotide are infrequently bound to AGO4

We then studied whether siRNAs with a 3' non-templated nucleotide from the 24 nt-dominated clusters were loaded into AGO4 by analyzing Arabidopsis AGO4 immunoprecipitation sRNA libraries (Wang et al., 2011). The rate of 3' non-templated nucleotides in AGO4-immunoprecipitated 24 nt siRNAs was low (2.4%, Figure 2.10A). In contrast, 24.8% of AGO4-immunoprecipitated 25 nt siRNAs carried a 3' non-templated nucleotide (Figure 2.10A). Interestingly, compared to total RNA, a much reduced percentage of AGO4-associated 23 nt siRNAs had 3' non-templated nucleotides (16.7% in total RNA vs. 3.9% in the AGO4 immunoprecipitates; Figures 2.3A, 2.10A). All AGO4-associated siRNAs had a strong 5' adenine preference regardless of size and the presence of a 3' end non-templated nucleotide (Figures 2.10C-D). The strong 5'-A preference was not observed for 23 nt siRNAs with a 3' non-templated nucleotide in total RNA (Figure 2.6). Thus, we speculate that the known selectivity of AGO4 for 5'-A containing siRNAs (Mi et al., 2008) reduces the loading of 23 nt siRNAs with 3' non-templated nucleotides onto AGO4. URT1 and HESO1 can act upon AGO-bound small RNAs (Zhai et al., 2013; Tu et al., 2015). The fact that 25 nt siRNAs with a non-templated 3' nucleotide are bound to AGO4 at robust frequencies, coupled with their tendency to have 5'-As and a U as the non-templated 3' nucleotide, suggests they are HESO1 and/or URT1 substrates. In contrast, the 23 nt siRNAs with a non-templated 3' nucleotide are not frequently bound to AGO4, suggesting that their 3' non-templated nucleotides are likely to be added at a step prior to AGO4 loading. Alternatively, the 23 nt siRNAs with a non-templated 3' nucleotide might instead be bound to an Argonaute other than AGO4.

2.3.5 23 nt siRNAs from the 24 nt-dominated clusters are mostly not 3'- or 5'-truncated variants of 24 nt siRNAs

We next examined whether the 23 nt siRNAs from the 24 nt-dominated clusters are frequently 3' or 5' truncated variants of the more prevalent 24 nt siRNAs. The frequencies at which the 5' or 3' ends of Arabidopsis 23 nt siRNAs overlap with 24 nt siRNAs from the same clusters were calculated. Lowly expressed clusters (those with fewer than 20 aligned 24 nt siRNAs) were excluded from analysis. The 23 nt siRNAs were infrequently 5'-truncated variants (median value ~25%) or 3'-truncated variants (median value ~12%) of 24 nt siRNAs (Figure 2.11A). 23 nt siRNAs with a 3'-most non-templated nucleotide were even less frequently 5'- or 3'-truncated variants of 24 nt siRNAs (median values of 0%) (Figure 2.11A). 25 nt siRNAs, however, showed a higher tendency to be 3'-tailing variants (median value ~35%, for RNAs with a 3'-most non-templated nucleotide), but not 5'-tailing variants (median value of 0%) of 24 nt siRNAs. As a control, we examined 20 nt and 22 nt RNAs arising from high-confidence miRNA loci in Arabidopsis. The majority of these 20 nt and 22 nt RNAs were 3'-truncation or 3'-tailing variants of a more abundant 21 nt RNA (most often the major mature miRNA) (Figure 2.11B). These observations suggest that the 23 nt siRNAs present in Arabidopsis 24 nt-dominated loci are generally not 3'- or 5'-truncations of 24 nt siRNAs, whereas 25 nt siRNAs are more likely to be 3'-tailings of 24 nt siRNAs.
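The comparison of read ends described above can be sketched as follows. This is an illustrative sketch rather than the analysis code: reads are assumed to be represented by strand and 1-based inclusive genomic coordinates, an off-sized read is counted as sharing an end when any 24 nt siRNA in the same cluster has the same strand and end coordinate, and the `Read` record and function names are hypothetical. The threshold of 20 aligned 24 nt siRNAs per cluster comes from the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # leftmost genomic coordinate (1-based, inclusive)
    end: int     # rightmost genomic coordinate (1-based, inclusive)


def five_prime(read: Read) -> Tuple[str, int]:
    return (read.strand, read.start if read.strand == "+" else read.end)


def three_prime(read: Read) -> Tuple[str, int]:
    return (read.strand, read.end if read.strand == "+" else read.start)


def shared_end_fractions(
    off_sized: List[Read], canonical: List[Read], min_canonical: int = 20
) -> Optional[Tuple[float, float]]:
    """Fractions of off-sized reads sharing a 5' or 3' end with canonical reads.

    Returns None for lowly expressed clusters or when no off-sized reads exist.
    """
    if len(canonical) < min_canonical or not off_sized:
        return None
    ends5 = {five_prime(r) for r in canonical}
    ends3 = {three_prime(r) for r in canonical}
    share5 = sum(five_prime(r) in ends5 for r in off_sized) / len(off_sized)
    share3 = sum(three_prime(r) in ends3 for r in off_sized) / len(off_sized)
    return share5, share3
```

A shorter read that shares its 5' end with a 24 nt siRNA is a candidate 3'-truncated variant, and one that shares its 3' end is a candidate 5'-truncated variant; for 25 nt reads the same comparison identifies candidate 3'- or 5'-tailed variants.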

Figure 2.11 Analysis of 5'- and 3'-truncations and tailings for 'off-sized' Arabidopsis small RNAs A. Fractions of 23 and 25 nt siRNAs that shared coincident 5' or 3' ends with 24 nt siRNAs in robustly expressed (>= 20 24 nt reads) 24 nt-dominated siRNA clusters. Boxplots show medians (horizontal lines), the 1st-3rd quartile range (boxes), other data out to 1.5 times the interquartile range (whiskers) and outliers (dots). Data from Arabidopsis inflorescences (Table 2.1). Illustrations below depict different possible relative 5'-end relationships between the canonical sized RNAs (gray) and off-sized RNAs (red). B. As in A, except for 20 nt and 22 nt RNAs aligned to high-confidence miRNA loci compared to 21 nt miRNAs.

2.3.6 Further properties of 23 nt siRNAs with a 3'-most non-templated nucleotide

In the absence of DCL proteins, RNAs longer than 24 nts accumulate from Arabidopsis het-siRNA loci (Blevins et al., 2015; Li et al., 2015b; Yang et al., 2016; Ye et al., 2016; Zhai et al., 2015a). These RNAs, termed P4R2 RNAs because of their dependence on both DNA-dependent RNA polymerase IV (Pol IV) and RNA-dependent RNA polymerase 2 (RDR2), are likely to be the direct precursors for DCL3-catalyzed production of het-siRNAs. Many P4R2 RNAs have 3' non-templated nucleotides which are thought to predominantly be located on the Pol IV-transcribed strand (Zhai et al., 2015a; Blevins et al., 2015; Yang et al., 2016). Zhai et al. (2015a) reported that cytosines were enriched in the template DNA sequence at the 3'-most non-templated positions of P4R2 RNAs and hypothesized that 5-methyl cytosines promote Pol IV transcriptional termination and addition of a non-templated nucleotide. We analyzed the DNA nucleotide frequencies at positions corresponding to the 3' non-templated nt in siRNAs from the 24 nt-dominated loci. Cytosines were strongly enriched at the 3'-most non-templated positions of 23 nt siRNAs, but not in other size groups (Figure 2.12A). We next tested if 23 nt siRNAs with a 3'-most non-templated nucleotide were frequently the passenger strand siRNA for 24 nt siRNAs. This analysis assumed a 2 nt offset of the 5' end (consistent with DCL ), and a 1 nt offset on the 3' end (assuming addition of a non-templated nt to the end of an originally blunt-ended dsRNA). These arrangements were quite rare (Figure 2.12B). If the hypothesis that canonical 24 nt siRNAs mostly derive from the 5'-ends of the Pol IV-transcribed strands of P4R2 RNAs is correct (Zhai et al., 2015a), then this observation suggests that the 23 nt siRNAs with a 3' non-templated nucleotide do not frequently originate from the RDR2-transcribed strand. Overall, these observations are consistent with the hypothesis that these 23 nt siRNAs are more frequently derived from the 3' ends of the Pol IV-transcribed strand of P4R2 RNAs as opposed to the 3' ends of the RDR2-transcribed strands.
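The tally of genomic nucleotides at the 3' non-templated positions can be sketched as below. The exact enrichment statistic used for Figure 2.12A is not specified in the text; the log2 ratio of observed to background frequency used here is an assumption for illustration, as are all function names. The reference base reported in the 'MD:Z' field is on the forward genomic strand, so it is complemented for minus-strand reads to express it in RNA orientation.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def templated_base(ref_base, is_reverse):
    """Genomic base at a mismatch, expressed in the orientation of the RNA."""
    base = ref_base.upper()
    return base.translate(COMPLEMENT) if is_reverse else base


def log2_enrichment(observed_bases, background):
    """log2(observed frequency / background frequency) for each observed base.

    `background` maps each base to its expected frequency (an assumed input,
    for example the base composition of the clusters being analyzed).
    """
    counts = Counter(observed_bases)
    total = sum(counts.values())
    result = {}
    for base in "ACGT":
        if counts[base] and background.get(base):
            result[base] = math.log2((counts[base] / total) / background[base])
    return result
```

Positive values indicate enrichment relative to the background and negative values indicate depletion; bases that were never observed are omitted rather than assigned negative infinity.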

Figure 2.12 Further properties of 23 nt siRNAs with a 3'-most non-templated nucleotide A. Enrichment/depletion analysis of genomic nucleotides corresponding to the 3'-most non-templated positions of 23 nt, 24 nt and 25 nt siRNAs arising from the 24 nt-dominated siRNA clusters. Three wild-type Arabidopsis sRNA-seq libraries (Table 2.1) were analyzed. Biological replicates are represented by circles. B. Frequency of 23 nt siRNAs with a 3'-most non-templated nucleotide that correspond to a reverse-complemented 24 nt siRNA in the 24 nt-dominated siRNA clusters. Three wild-type Arabidopsis sRNA-seq libraries were analyzed (Table 2.1). Lowly expressed clusters (those with fewer than 20 aligned 24 nt siRNAs) were excluded from analysis. Boxplots show medians (horizontal lines), the 1st-3rd quartile range (boxes), other data out to 1.5 times the interquartile range (whiskers) and outliers (dots).

2.4 Discussion

Previous analyses of non-templated nucleotides in plant small RNAs have examined minor variants of highly abundant RNAs that do not contain any non-templated nucleotides (Zhai et al., 2013; Patel et al., 2016). This allows thorough analysis of both truncation and tailing variants. This approach is well-suited for miRNAs, for which a single dominant RNA sequence accumulates to high levels, allowing relatively easy identification of truncated and tailed variants. However, this method is ill-suited to characterize plant het-siRNAs because, unlike miRNAs, het-siRNA clusters are composed of multiple distinct siRNAs rather than a single dominant product. Here we show the utility of an alternative method to examine non-templated nucleotides in plant sRNA-seq data that can be applied to het-siRNAs: genome alignment. By allowing valid alignments to contain one mismatch to the reference genome (provided that there are no zero-mismatch alignments for the read), we can capture RNAs with a single non-templated nucleotide. Importantly, control experiments rule out alternative explanations (software errors, sequencing errors, SNPs, and errors in the reference genome assembly) for the patterns of mismatched nucleotides.

Data from multiple plant species indicate that 'off-sized' 25 nt and 23 nt siRNAs that arise from clusters dominated by 24 nt siRNAs have high rates of single 3' non-templated nucleotides (Figure 2.5). We presume that most of these loci are het-siRNA loci, based on their dominant production of 24 nt siRNAs. Our data suggest that the 3' non-templated nucleotides of the 25 nt siRNAs are added after AGO4 binding by the URT1 and/or HESO1 uridyltransferases (Figure 2.13). Supporting this hypothesis are the observations that the 25 nt siRNAs tend to have a 5'-A like the canonical 24 nt siRNAs (Figure 2.6), tend to have a U residue as the 3' non-templated residue (Figure 2.6), are increased in the hen1 background in a HESO1 and/or URT1-dependent manner (Figure 2.8), and are found associated with AGO4 (Figure 2.10). In contrast, the 'off-sized' 23 nt siRNAs with a 3' non-templated nucleotide had none of these features. Thus, we hypothesize that the 23 nt siRNAs with a 3' non-templated nucleotide are modified at a step prior to AGO4 binding in a manner insensitive to the presence or absence of the 2'-O-methyl modification deposited by HEN1 (Figure 2.13). It is worth noting that 25 nt siRNAs are rare at het-siRNA loci, but 23 nt siRNAs accumulate to significant levels in Arabidopsis (Figures 2.3 and 2.4). In Physcomitrella patens, 23 nt siRNAs accumulate to even higher levels at most het-siRNA loci (Cho et al., 2008; Coruh et al., 2015).

How might the 23 nt siRNAs with a 3' non-templated nucleotide arise? P4R2 RNAs tend to have a U or C at their 3' ends in vivo (Blevins et al., 2015; Yang et al., 2016), and in vitro transcription by Pol IV also results in RNAs with a strong enrichment of U or C at the 3' position (Blevins et al., 2015). Pol IV transcribed RNAs also tend to begin with an A or G at the 5' most nucleotide (Zhai et al., 2015a; Blevins et al., 2015; Yang et al., 2016), which means the 3' ends of the RDR2-transcribed second strands will also tend to be U or C. P4R2 RNAs also have a high frequency of 3' non-templated nucleotides (Blevins et al., 2015; Yang et al., 2016; Zhai et al., 2015a). All of these data are consistent with a hypothesis where the 23 nt siRNAs with a 3' non-templated nucleotide result from DCL catalysis of a P4R2 precursor RNA that has a non-templated nucleotide (Figure 2.13). It is difficult at present to say whether the Pol IV-transcribed strand or the RDR2-transcribed strand contributes more to the pool of these 23 nt siRNAs. However, the observations that the corresponding DNA positions of the non-templated nucleotides tend to be C (Figure 2.12A), and that the 23 nt siRNAs do not tend to be reverse complements of 24 nt siRNAs (Figure 2.12B), suggest they might more frequently derive from the Pol IV-transcribed strand. These inferences are based on the previous suggestions that the non-templated residues on Pol IV-transcribed RNAs tend to align with C's in the genome, and that the canonical 24 nt siRNAs are most often derived from the 5' ends of Pol IV-transcribed RNAs (Zhai et al., 2015a). Further testing of this hypothesis may shed light on the modes of DCL protein action and biogenesis of het-siRNAs in plants.


Figure 2.13 Model of 3' non-templated nucleotides in 'off-sized' siRNAs arising from the 24 nt-dominated siRNA clusters. Non-templated nucleotides are indicated by red letters.


Chapter 3 ‡ AGO4 is specifically required for heterochromatic siRNA accumulation at Pol V-dependent loci in Arabidopsis thaliana

3.1 Introduction

24 nucleotide (nt) heterochromatic small interfering RNAs (het-siRNAs) are usually loaded into ARGONAUTE4 (AGO4) to direct repressive chromatin modifications and subsequent transcriptional gene silencing via RNA-directed DNA methylation (RdDM) (Zilberman et al., 2003; Qi et al., 2006). Het-siRNA-induced transcriptional silencing plays important roles in transposable element silencing, stress responses and genome stability (Law and Jacobsen, 2010; Matzke and Mosher, 2014). The production of het-siRNAs in Arabidopsis thaliana usually requires the plant-specific RNA POLYMERASE IV (Pol IV) (Onodera et al., 2005; Herr et al., 2005; Blevins et al., 2015; Zhai et al., 2015a), RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) (Xie et al., 2004; Kasschau et al., 2007) and one or more DICER-LIKE (DCL) proteins (most predominantly DCL3; Henderson et al., 2006). A second plant-specific RNA polymerase, Pol V, generates scaffold RNAs targeted by het-siRNAs associated with AGO4 (Wierzbicki et al., 2008; 2009). This targeting is thought to recruit the de novo DNA methyltransferase DOMAINS REARRANGED 2 (DRM2) to the local chromatin, where it catalyzes 5-methylation of cytosines (Cao and Jacobsen, 2002; Zhong et al., 2014). The SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1) protein interacts with chromatin at Pol V-transcribed loci, and recruits Pol IV to promote further siRNA biogenesis specifically from Pol V-dependent regions (Law et al., 2013; Zhang et al., 2013). Despite their positions at the effector portion of the RdDM pathway, Pol V and DRM2 are required for the accumulation of a subset of Pol IV-dependent het-siRNAs in Arabidopsis (Pontier et al., 2005; Mosher et al., 2008).

‡ This chapter is modified and reproduced from a manuscript submitted to The Plant Journal. Authors: Feng Wang and Michael J. Axtell Author contributions: Michael Axtell conceived of the project. Feng Wang generated transgenic plants, constructed small RNA-seq libraries and performed data analysis. Michael Axtell and Feng Wang wrote the manuscript.


The Arabidopsis genome has 10 AGO genes. AGO4, AGO6, AGO8, and AGO9 form a monophyletic clade (Vaucheret, 2008; Mallory and Vaucheret, 2010; Fang and Qi, 2016). AGO8 has been suggested to be a pseudogene (Vaucheret, 2008). AGO4 and AGO6 both bind 24 nt het-siRNAs and contribute to the canonical RdDM pathway in a non-redundant fashion (Zheng et al., 2007; Havecker et al., 2010; Duan et al., 2015). AGO6 also binds 21 nt siRNAs and acts as a key effector of the non-canonical RDR6-RdDM pathway (McCue et al., 2015; Panda et al., 2016). AGO9, which is primarily expressed in female gametes, interacts with het-siRNAs and silences TEs in female gametes (Olmedo-Monfil et al., 2010). Though AGO4, AGO6 and AGO9 are functionally related, the small RNA profile of an ago4/ago6/ago9 triple mutant has not yet been reported.

According to the current model of the RNA-directed DNA methylation (RdDM) pathway, the biogenesis of het-siRNAs depends on Pol IV, RDR2, and primarily DCL3 (Law and Jacobsen, 2010; Matzke and Mosher, 2014), while AGO4 is not directly required for the biogenesis of het-siRNAs. However, the accumulation of certain het-siRNAs was shown to be dependent on AGO4 in previous reports (Qi et al., 2006; Havecker et al., 2010). It has been hypothesized that the accumulation of a subset of het-siRNAs depends on AGO4-mediated target slicing (Qi et al., 2006). All 10 Arabidopsis AGOs have a conserved Asp-Asp-His (DDH) or Asp-Asp-Asp (DDD) motif thought to form a catalytic center for cleavage of target RNA. The target-slicing ability of AGO1 and AGO7 has been confirmed in vivo (Vaucheret, 2008; Fang and Qi, 2016). AGO10 can slice miRNA targets in vitro, but it is still unclear whether this slicer activity is required for its function in plants (Zhu et al., 2011). AGO4, which specifically binds 24 nt het-siRNAs, can slice synthetic het-siRNA targets in vitro (Qi et al., 2006) as well as the passenger strand of het-siRNA duplexes in vivo (Ye et al., 2012). The in vitro and/or in vivo slicing ability of AGO4 is abolished by mutagenesis of the presumed catalytic residues (Qi et al., 2006; Ye et al., 2012). However, the genome-wide effects of AGO4 slicing on global small RNA accumulation have not been previously reported.


3.2 Materials and Methods

3.2.1 Plant materials and growth conditions

All Arabidopsis thaliana plants were grown at 21 °C with 16 h light/8 h dark. ago4-4 (FLAG_216G02) was from INRA T-DNA transformants in the Wassilevskija (Ws) ecotype. ago6-2 (SALK_031553) and ago9-1 (SALK_127358) were from Salk T-DNA transformants in the Columbia-0 (Col-0) ecotype. The ago4-4/ago6-2/ago9-1 triple mutant was generated by first crossing ago4-4 to ago6-2, and then crossing the ago4-4/ago6-2 double mutant to ago9-1. Homozygous mutants were selected by genotyping using primers that specifically amplify T-DNA inserted alleles. All the genotyping primers are listed in Supplemental Table 3.

3.2.2 Cloning of wild-type and slicing-defective AGO4

A cDNA encoding AGO4 (AT2G27040) was amplified from Arabidopsis thaliana cDNA of the Col-0 ecotype. A FLAG tag was inserted at the 5' end of the AGO4 cDNA, immediately after the start codon, by PCR. The FLAG-tagged AGO4 sequence was sub-cloned into the pGII0179 vector. A ~2 kb DNA sequence located upstream of the start codon of AGO4 in Col-0, and a ~500 bp DNA sequence downstream of the stop codon of AGO4 in Col-0, were further sub-cloned into the AGO4 expression vector as the native promoter and terminator (pAGO4:FLAG-AGO4). Mutagenesis of the catalytic motif of AGO4 was performed by overlapping extension PCR. Primers carrying the desired changes, which encode alanine instead of aspartic acid at amino acid position 742 of AGO4, were used to introduce the slicing-defective mutation. The wild-type AGO4 sequence in the AGO4 expression vector was then replaced by the mutagenized AGO4 to generate the slicing-defective AGO4 expression vector (pAGO4:FLAG-AGO4-D742A). The hygromycin-B gene was inserted into both the wild-type and slicing-defective AGO4 expression vectors for hygromycin resistance selection in transgenic plants. All primers used for subcloning are listed in Supplemental Table 3.

3.2.3 Plant transformation and transgenic plant selection

The wild-type or slicing-defective AGO4 expression vector was introduced into the ago4-4 or the ago4-4/ago6-2/ago9-1 background by floral dip with Agrobacterium tumefaciens strain GV3101 bearing the pSOUP plasmid and the designated expression vector. Transgenic plants were selected on 1/2 strength Murashige-Skoog plates supplemented with 15 mg/L hygromycin-B. Independent transgenic lines with a single insertion were selected in the T2 generation. Homozygous lines with comparable wild-type or slicing-defective AGO4 protein accumulation in the T3 generation were further selected to prepare sRNA-seq libraries.

3.2.4 sRNA-seq library preparation

Libraries were constructed from 1 μg of total RNA extracted from Arabidopsis immature inflorescence tissue as described (Wang et al., 2016). Three biological replicates of each genotype were prepared. Raw data have been deposited at NCBI GEO under accession numbers GSE79119 (Col-0 samples) and GSE87333 (all other samples). Details of the sRNA-seq libraries are listed in Table 3.1.

3.2.5 Differential expression analysis

sRNA-seq data sets, including libraries from wild-type AGO4 and slicing-defective AGO4 transgenic lines in the ago4-4 and ago4-4/ago6-2/ago9-1 backgrounds, the ago4-4 and ago4-4/ago6-2/ago9-1 mutant controls, and the Col-0 and Ws wild-type controls, were merged and run with ShortStack 3.3 (Johnson et al., 2016) with options --adapter TGGAATTC --mincov 50. All sRNA-seq libraries were aligned to the Arabidopsis TAIR10 reference genome.

A matrix of raw read counts from the de novo annotated small RNA clusters, covering all three biological replicates of each genotype, was used for differential expression analysis with the R package DESeq2 (Love et al., 2014). Clusters with at least a 2-fold change relative to wild-type at a 1% false discovery rate were defined as differentially expressed.
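The thresholding rule stated above can be illustrated with a minimal Python sketch. It is not the DESeq2 analysis itself: it assumes that a log2 fold change and an adjusted p-value have already been computed for each cluster, and the names `ClusterResult` and `classify` are hypothetical. Whether the cutoffs are inclusive or strict is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterResult:
    name: str                 # hypothetical cluster identifier
    log2_fold_change: float   # mutant relative to wild-type
    padj: float               # adjusted p-value (false discovery rate)


def classify(result, min_fold=2.0, fdr=0.01):
    """Return 'down', 'up', or 'not_de' for one small RNA cluster."""
    # Clusters without a usable adjusted p-value are not called.
    if math.isnan(result.padj) or result.padj >= fdr:
        return "not_de"
    threshold = math.log2(min_fold)  # 2-fold corresponds to |log2FC| >= 1
    if result.log2_fold_change <= -threshold:
        return "down"
    if result.log2_fold_change >= threshold:
        return "up"
    return "not_de"
```

Under this rule, a cluster reduced 4-fold with an adjusted p-value of 0.001 would be called down-regulated, whereas the same fold change with an adjusted p-value of 0.2 would not be called.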

To identify differentially expressed clusters in nrpe1 compared to Col-0, sRNA-seq data sets from a previous study (Lee et al., 2012) with three biological replicates of nrpe1-1 and three biological replicates of Col-0 were analyzed with the same pipeline as described above, except that the small RNA clusters were those previously annotated from the AGO4-related data sets. sRNA-seq libraries used in this analysis are listed in Table 3.1.

3.2.6 Heatmap of small RNA accumulation in AGO4-dependent clusters

To generate the heatmap for visualizing small RNA accumulation, we first transformed the reads per million (RPM) data in AGO4-dependent clusters with the equation E = log2(Ri/Rm), where E is the input for the heatmap, Ri is the RPM of a cluster in a given sRNA-seq library, and Rm is the mean RPM of that cluster across all sRNA-seq libraries analyzed for the heatmap. The matrix of transformed RPM values was then used for heatmap preparation with the R package pheatmap (Kolde, 2015).
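As an illustration of the transformation E = log2(Ri/Rm), the following Python sketch computes the mean-centered values for a single cluster. The function name and the library labels are hypothetical, and the sketch assumes that every RPM value is greater than zero, because the handling of zero values is not specified above.

```python
import math


def mean_centered_log2(rpm_by_library):
    """Apply E = log2(Ri / Rm) to one cluster.

    rpm_by_library maps a library label to that cluster's RPM (Ri).
    Rm is the mean RPM of the cluster across the supplied libraries.
    """
    values = list(rpm_by_library.values())
    if not values or any(v <= 0 for v in values):
        # Assumption of this sketch: zero or negative RPM is rejected.
        raise ValueError("all RPM values must be positive")
    mean_rpm = sum(values) / len(values)
    return {lib: math.log2(r / mean_rpm) for lib, r in rpm_by_library.items()}
```

For example, RPM values of 2, 2 and 8 have a mean of 4, so the transformed values are -1, -1 and 1: libraries below the cluster mean become negative and libraries above it become positive.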

3.2.7 Euler diagrams

All Euler diagrams in this study were prepared with eulerAPE 3.0 (Micallef and Rodgers, 2014).

3.2.8 Small RNA accumulation in nrpd1-4, nrpe1-12, drm2-2, shh1-1, and ago4-4

sRNA-seq libraries from a previous study (Law et al., 2013) containing samples from nrpd1-4, nrpe1-12, drm2-2, shh1-1 and Col-0 were aligned to the Arabidopsis TAIR10 genome using ShortStack 3.3 (Johnson et al., 2016) with a locifile specifying the small RNA clusters defined in our differential expression analysis. The 3' adapters were removed with the option --adapter TGGAATTC. Before log2-transformation, a value of 0.5 was added to all raw counts. Log2-transformed RPMs of the 24 nt siRNA clusters from nrpd1-4, nrpe1-12, drm2-2 and shh1-1 as well as Col-0 were plotted to illustrate small RNA accumulation in the 24 nt siRNA clusters. Log2-transformed RPMs of high-confidence miRNA genes were also plotted. The linear regression and 95% prediction intervals were calculated based on the distribution of high-confidence miRNA genes. Small RNA accumulation at 24 nt siRNA loci in the indicated RdDM mutants was then normalized to the corresponding wild-type plants with the equation N = log2(RPMmutant/RPMWT). Statistical differences between the AGO4-dependent and AGO4-independent clusters were tested using the Mann-Whitney U test.
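The normalization N = log2(RPMmutant/RPMWT) and the statistic underlying the Mann-Whitney U test can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, the pseudocount is assumed to be added to the raw count before scaling to reads per million, and only the U statistic is computed (a p-value would normally come from a statistics package).

```python
import math

PSEUDOCOUNT = 0.5  # value added to every raw count, as stated in the text


def rpm(raw_count, library_size):
    """Reads per million after adding the pseudocount (order assumed)."""
    return (raw_count + PSEUDOCOUNT) / library_size * 1_000_000


def normalized_accumulation(raw_mutant, size_mutant, raw_wt, size_wt):
    """N = log2(RPM_mutant / RPM_WT) for one cluster."""
    return math.log2(rpm(raw_mutant, size_mutant) / rpm(raw_wt, size_wt))


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b scores 1, ties 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

A negative N indicates lower accumulation in the mutant than in the wild-type. A U value near zero for the AGO4-dependent group would indicate that its N values are almost always lower than those of the AGO4-independent group.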


Table 3.1 Data sources and accession numbers of Arabidopsis thaliana sRNA-seq libraries

Accession number  Genotype                     Ecotype   3'-Adapter (first 8 nt)  Source
GSM2086247        Wild-type                    Col-0     TGGAATTC                 This study
GSM2086248        Wild-type                    Col-0     TGGAATTC                 This study
GSM2086249        Wild-type                    Col-0     TGGAATTC                 This study
GSM2327936        Wild-type                    Ws        TGGAATTC                 This study
GSM2327937        Wild-type                    Ws        TGGAATTC                 This study
GSM2327938        Wild-type                    Ws        TGGAATTC                 This study
GSM2327939        ago4-4                       Ws        TGGAATTC                 This study
GSM2327940        ago4-4                       Ws        TGGAATTC                 This study
GSM2327941        ago4-4                       Ws        TGGAATTC                 This study
GSM2327945        ago4-4/wtAGO4                Ws        TGGAATTC                 This study
GSM2327946        ago4-4/wtAGO4                Ws        TGGAATTC                 This study
GSM2327947        ago4-4/wtAGO4                Ws        TGGAATTC                 This study
GSM2327948        ago4-4/D742A                 Ws        TGGAATTC                 This study
GSM2327949        ago4-4/D742A                 Ws        TGGAATTC                 This study
GSM2327950        ago4-4/D742A                 Ws        TGGAATTC                 This study
GSM2327942        ago4-4/ago6-2/ago9-1         Col-0/Ws  TGGAATTC                 This study
GSM2327943        ago4-4/ago6-2/ago9-1         Col-0/Ws  TGGAATTC                 This study
GSM2327944        ago4-4/ago6-2/ago9-1         Col-0/Ws  TGGAATTC                 This study
GSM2327951        ago4-4/ago6-2/ago9-1/wtAGO4  Col-0/Ws  TGGAATTC                 This study
GSM2327952        ago4-4/ago6-2/ago9-1/wtAGO4  Col-0/Ws  TGGAATTC                 This study
GSM2327953        ago4-4/ago6-2/ago9-1/wtAGO4  Col-0/Ws  TGGAATTC                 This study
GSM2327954        ago4-4/ago6-2/ago9-1/D742A   Col-0/Ws  TGGAATTC                 This study
GSM2327955        ago4-4/ago6-2/ago9-1/D742A   Col-0/Ws  TGGAATTC                 This study
GSM2327956        ago4-4/ago6-2/ago9-1/D742A   Col-0/Ws  TGGAATTC                 This study
GSM1103235        Col-0                        Col-0     TGGAATTC                 (Law et al., 2013)
GSM1103237        nrpd1-4                      Col-0     TGGAATTC                 (Law et al., 2013)
GSM1103238        nrpe1-4                      Col-0     TGGAATTC                 (Law et al., 2013)
GSM1103240        drm2-2                       Col-0     TGGAATTC                 (Law et al., 2013)
GSM1103239        shh1-1                       Col-0     TGGAATTC                 (Law et al., 2013)
GSM893112         Col-0                        Col-0     CACTCGGG                 (Lee et al., 2012)
GSM893113         Col-0                        Col-0     CACTCGGG                 (Lee et al., 2012)
GSM893114         Col-0                        Col-0     CACTCGGG                 (Lee et al., 2012)
GSM893115         nrpe1-1                      Col-0     CACTCGGG                 (Lee et al., 2012)
GSM893116         nrpe1-1                      Col-0     CACTCGGG                 (Lee et al., 2012)
GSM893117         nrpe1-1                      Col-0     CACTCGGG                 (Lee et al., 2012)


3.3 Results

3.3.1 Accumulation of a subset of 24 nt het-siRNAs depends on AGO4 in Arabidopsis

To systematically study the profile of AGO4-dependent het-siRNAs and the effect of AGO4 catalytic activity on het-siRNA accumulation, we expressed wild-type AGO4 (pAGO4:FLAG-AGO4-DDH, wtAGO4 hereafter) or slicing-defective AGO4 (pAGO4:FLAG-AGO4-DAH, D742A hereafter) driven by the native AGO4 promoter in both the ago4-4 single mutant background and the ago4-4/ago6-2/ago9-1 triple mutant background in Arabidopsis (Figure 3.1A). Three T3 transgenic plants with comparable levels of protein accumulation (Figure 3.1B) were used to prepare three biological replicate sRNA-seq libraries. It is worth noting that ago4-4 is in the Ws background, while ago6-2 and ago9-1 are in the Col-0 background. We therefore prepared three replicate control sRNA-seq libraries from both Ws and Col-0. We merged sRNA-seq libraries from the same genotype and aligned them to the reference genome to study the overall small RNA size distribution in the tested samples. Loci dominated by 24 nt small RNAs were the most abundant in all tested genotypes, and the fractions of small RNAs from 24 nt small RNA-dominated loci were similar across genotypes (Figure 3.2). sRNA-seq libraries from all backgrounds were then merged and aligned to the reference genome, followed by de novo definition of expressed small RNA clusters. The 24 nt siRNA clusters that were de novo annotated are listed in Supplemental Data Set 2.


Figure 3.1 Expression of wild-type AGO4 and slicing-defective AGO4 proteins in transgenic plants A. Schematic of transgenes. Indicated codons correspond to the catalytic residues required for slicing. The codon shown in red marks the mutagenized codon 742. B. Anti-FLAG immunoblot of FLAG-tagged AGO4 in T3 lines of the indicated transgenic plants. Transgenic lines that were chosen for sRNA-seq library preparation, based on approximately equal accumulation of AGO4 protein, are indicated by arrows.


Figure 3.2 Overall size profiles of small RNAs in tested genotypes Fractions of small RNA clusters with different predominant sizes in indicated genotypes are shown. R1, R2 and R3 represent three biological replicates.

We first examined siRNA accumulation in the Ws background, to compare ago4-4 to the wild-type. A differential expression analysis was performed by comparing raw read counts from our de novo annotated small RNA loci for all libraries in Ws background. A principal component analysis (PCA) plot was prepared to visualize the overall differences between samples (Figure 3.3A). The biological replicates were grouped together, indicating good reproducibility (Figure 3.3A). ago4-4/wtAGO4 grouped closely with the Ws wild-type, suggesting complementation of small RNA accumulation by expression of wtAGO4 in the ago4-4 background (Figure 3.3A). ago4-4 and ago4-4/D742A were distinct from each other and from the wild-type and ago4-4/wtAGO4 genotypes (Figure 3.3A).

Most differentially accumulated clusters were dominated by 24 nt siRNAs (Figure 3.3B). In ago4-4, 2,912 clusters were down-regulated relative to wild-type; we defined these as AGO4-dependent siRNA clusters (Figure 3.3B). Most of these (2,879) were dominated by 24 nt siRNAs. In contrast, only 121 clusters were down-regulated in ago4-4/wtAGO4, indicating nearly full complementation of small RNA accumulation by wtAGO4 (Figure 3.3B). Intriguingly, an intermediate number of clusters (1,541, Figure 3.3B) was down-regulated in ago4-4/D742A, which suggests that slicing-defective AGO4 partially recovers the accumulation of AGO4-dependent small RNAs. Most 24 nt-dominated siRNA clusters are not AGO4-dependent. Only about 18% of the de novo annotated 24 nt siRNA clusters, which contained about 22% of small RNAs in Ws wild-type, were dependent on AGO4 (Figure 3.3C).

Figure 3.3 Identification of AGO4-dependent small RNA clusters in Arabidopsis thaliana A. Principal component analysis demonstrating overall relationships between sRNA-seq libraries in Ws background. B. Number of differentially expressed (DE) clusters in the indicated genotypes and small RNA clusters compared with Ws wild-type. DE clusters were defined as clusters with at least 2-fold change compared to wild-type at a false discovery rate of 1%. The negative values indicate number of down-regulated clusters. C. Percentage of AGO4-dependent clusters and AGO4-dependent small RNAs in 24 nt siRNA loci. AGO4-dependent clusters were defined as clusters with at least 2-fold less accumulation in ago4-4 compared with Ws.

3.3.2 Accumulation of small RNAs in AGO4-dependent clusters requires NRPE1, DRM2 and SHH1

We classified the 16,061 de novo annotated 24 nt siRNA-dominated clusters into different groups based on AGO4-dependency (Supplemental Data Set 2). As stated above, 2,879 24 nt-dominated siRNA clusters were AGO4-dependent (FDR=0.01). We found another 1,359 24 nt-dominated clusters that were clearly AGO4-independent (FDR=0.01). Another 354 24 nt-dominated clusters were up-regulated in ago4-4 (FDR=0.01), and the AGO4-dependency of the remaining 11,469 24 nt-dominated siRNA clusters could not be reliably inferred using our strict statistical tests, primarily due to low expression levels. We analyzed small RNA accumulation from the AGO4-dependent and AGO4-independent clusters using sRNA-seq data from nrpd1-4, nrpe1-12, drm2-2, and shh1-1 mutants (Law et al., 2013), using accumulation from clusters overlapping high-confidence MIRNA loci (Kozomara and Griffiths-Jones, 2014) as a control (Figure 3.4A). Note that NRPD1 and NRPE1 encode the catalytic subunits of Pol IV and Pol V, respectively. In nrpd1, siRNA accumulation was strongly down-regulated in both AGO4-dependent and AGO4-independent clusters (Figure 3.4A). In contrast, AGO4-dependent clusters were much more strongly affected in the nrpe1, drm2, and shh1 backgrounds than AGO4-independent clusters (Figure 3.4A). We then normalized small RNA accumulation in AGO4-dependent and AGO4-independent het-siRNA clusters to that in wild-type plants. We observed significantly reduced small RNA accumulation (Mann-Whitney test, p<0.01) in AGO4-dependent clusters relative to AGO4-independent clusters in all analyzed RdDM mutants except nrpd1 (Figure 3.4B). Using small RNA-seq data from nrpe1-1 plants (Lee et al., 2012), we defined 2,827 NRPE1-dependent small RNA clusters, the majority of which overlapped AGO4-dependent siRNA clusters (Figure 3.4C). This extent of overlap far exceeded the number expected by random chance (Figure 3.4D). Collectively, these data indicate that the subset of 24 nt-dominated siRNA loci that depend on AGO4 for accumulation are those that are also dependent on NRPE1, DRM2, and SHH1.


Figure 3.4 AGO4-dependent and AGO4-independent 24 nt siRNA clusters in other RdDM mutants A. Small RNA accumulation from AGO4-dependent and AGO4-independent 24 nt siRNA clusters in indicated RdDM mutants. Log2-transformed reads per million (RPM) in indicated genotypes were plotted. A linear regression (solid line) and associated 95% prediction interval (dashed lines) were plotted based upon accumulation from clusters overlapping high-confidence MIRNA loci. B. Normalized small RNA accumulation in AGO4-dependent and AGO4-independent clusters in indicated RdDM mutants. Boxplots show medians (horizontal lines), the 1st-3rd quartile range (boxes), 95% confidence of medians (notches), other data out to 1.5 times the interquartile range (whiskers) and outliers (dots). Asterisks indicate significant differences (Mann-Whitney U test, p<0.01) between AGO4-dependent and AGO4-independent clusters in the indicated mutant. C. Venn diagram showing the overlap of AGO4-dependent and NRPE1-dependent 24 nt siRNA clusters. D. Percentage of overlap between AGO4-dependent and NRPE1-dependent clusters. Overlaps expected by random chance were estimated by randomly choosing 2,827 and 2,879 clusters from all 24 nt siRNA clusters. The mean and standard deviation (n=10) of randomly overlapping percentages are shown.
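The random-chance estimate described for panel D can be sketched in Python. This is a minimal illustration, not the original analysis code: the function name is hypothetical, the random seed is arbitrary, and the overlap is expressed as a percentage of the first set, which is an assumption because the denominator is not stated in the caption.

```python
import random
import statistics


def random_overlap_percentages(n_total, size_a, size_b, n_trials=10, seed=0):
    """Draw two random cluster sets per trial and report their overlap.

    The overlap is returned as a percentage of size_a (assumed denominator).
    """
    rng = random.Random(seed)
    population = range(n_total)
    percentages = []
    for _ in range(n_trials):
        set_a = set(rng.sample(population, size_a))
        set_b = set(rng.sample(population, size_b))
        percentages.append(100.0 * len(set_a & set_b) / size_a)
    return percentages


if __name__ == "__main__":
    # 2,827 and 2,879 clusters drawn from the 16,061 24 nt siRNA clusters.
    pcts = random_overlap_percentages(16061, 2827, 2879, n_trials=10)
    print(statistics.mean(pcts), statistics.stdev(pcts))
```

With these set sizes, the expected overlap of two independent random draws is 2,827 × 2,879 / 16,061, or roughly 507 clusters, which gives a baseline against which the observed overlap can be judged.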


3.3.3 An AGO4 catalytic residue is required for full accumulation of most AGO4-dependent 24 nt siRNAs

We then compared complementation of siRNA accumulation from AGO4-dependent clusters between the wtAGO4 and AGO4-D742A transgenic lines. Small RNA accumulation was recovered in wtAGO4 from nearly all AGO4-dependent clusters, but only from a subset of loci in the slicing-defective AGO4-D742A plants (Figure 3.5A). We defined AGO4-D742A complemented loci as those that were significantly down-regulated in the ago4-4 background but not in the ago4-4/AGO4-D742A transgenic plants (Figure 3.5B). Conversely, AGO4-D742A non-complemented loci were defined as those that were significantly down-regulated in both ago4-4 and ago4-4/AGO4-D742A (Figure 3.5B). By this measure, half (49.9%) of the AGO4-dependent siRNA loci required AGO4 catalytic activity for their accumulation. Detailed examination of accumulation levels revealed that recovery was generally not to full wild-type levels at loci designated as complemented by AGO4-D742A (Figure 3.5C). We conclude that the catalytic ability of AGO4 is important for full accumulation of most AGO4-dependent 24 nt siRNAs, but to varying degrees at different loci.
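The definitions of complemented and non-complemented loci amount to simple set operations, sketched here in Python. The function name and the cluster identifiers are hypothetical; the inputs are assumed to be the sets of clusters called significantly down-regulated in each genotype.

```python
def classify_complementation(down_in_mutant, down_in_transgenic):
    """Split mutant-dependent clusters by whether a transgene restores them.

    down_in_mutant: clusters down-regulated in the mutant (e.g. ago4-4).
    down_in_transgenic: clusters down-regulated in the mutant carrying
        the transgene (e.g. ago4-4/AGO4-D742A).
    """
    # Down in the mutant but no longer down with the transgene.
    complemented = down_in_mutant - down_in_transgenic
    # Still down even when the transgene is present.
    non_complemented = down_in_mutant & down_in_transgenic
    return complemented, non_complemented


# Hypothetical cluster identifiers, for illustration only.
down_ago4 = {"Cluster_1", "Cluster_2", "Cluster_3", "Cluster_4"}
down_d742a = {"Cluster_3", "Cluster_4", "Cluster_9"}
comp, non_comp = classify_complementation(down_ago4, down_d742a)
fraction_requiring_catalysis = len(non_comp) / len(down_ago4)
```

In this toy example two of the four mutant-dependent clusters remain down-regulated, so half would be scored as requiring catalytic activity. Clusters down-regulated only in the transgenic line, such as `Cluster_9` here, fall outside both categories.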


Figure 3.5 Slicing-defective AGO4-D742A partially complements small RNA accumulation from AGO4-dependent siRNA loci

A. Heatmap showing normalized (log2-transformed and mean-centered) small RNA accumulation from AGO4-dependent clusters in indicated genotypes and replicates. B. Euler diagram showing overlaps between down-regulated small RNA clusters (FDR=0.01) in the indicated genotypes compared to Ws wild-type. C: Complemented by the AGO4-D742A transgene; NC: Not complemented by the AGO4-D742A transgene. C. Normalized small RNA accumulation levels from AGO4-dependent loci that were complemented or not complemented by the AGO4-D742A transgene. The ratio of small RNA accumulation in indicated genotypes over that in the Ws wild-type was computed and then log2-transformed. Boxplots show medians (horizontal lines), the 1st-3rd quartile range (boxes), 95% confidence of medians (notches), other data out to 1.5 times the interquartile range (whiskers) and outliers (dots).

3.3.4 Slicing-defective AGO4 partially complements small RNA accumulation in the ago4-4/ago6-2/ago9-1 triple mutant

AGO4, AGO6, and AGO9 have related but non-redundant functions in gene silencing, and all three can bind 24 nt siRNAs (Havecker et al., 2010). We obtained the triple mutant ago4-4/ago6-2/ago9-1 and analyzed small RNA expression levels from inflorescence tissue. Significant ecotype-specific changes in small RNA accumulation levels were observed between Ws (the parental background of the ago4-4 allele) and Col-0 (the parental background of the ago6-2 and ago9-1 alleles) (Figure 3.6A). About 15% of the small RNA clusters had significant differential accumulation (FDR = 0.01) when comparing Ws and Col-0 (Figure 3.6B). The differing small RNA accumulation in these DE clusters was presumably caused by the different genetic backgrounds. We therefore excluded these loci from our analyses.

Figure 3.6 Divergence of small RNA accumulation between Col-0 and Ws A. Principal component analysis demonstrating overall relationships between sRNA-seq libraries. The first two principal components are shown for tested samples. B. MA plot highlighting in red small RNA clusters with at least 2-fold differences (FDR=0.01) between Col-0 and Ws ecotypes.

When analyzing the remaining, ecotype-insensitive small RNA clusters, we observed that the Col-0, Ws, and ago4-4/wtAGO4 samples were tightly grouped (Figure 3.7A). This demonstrates both the effective removal of clusters that have ecotype-specific differences in accumulation, as well as strong complementation by the wtAGO4 transgene. While ago4-4/ago6-2/ago9-1/wtAGO4 strongly diverged from ago4-4/ago6-2/ago9-1, ago4-4/ago6-2/ago9-1/D742A showed only minimal differences from ago4-4/ago6-2/ago9-1 (Figure 3.7A). This implies that introduction of wild-type AGO4, but not a slicing-defective AGO4, can rescue many of the small RNA accumulation defects of the triple mutant. Full elimination of AGO4-clade AGOs did not affect accumulation of the majority of the 24 nt siRNA clusters: about 22% (3005/13602) of the 24 nt siRNA clusters in Col-0 were AGO4/AGO6/AGO9-dependent, and these clusters contributed only about 15% of the small RNA reads (Figure 3.7B). Only about 24% (719/3005) of the AGO4/AGO6/AGO9-dependent clusters were not complemented by wtAGO4 (Figure 3.7C), indicating that AGO6 and/or AGO9 are required for accumulation from relatively few clusters. Similar to the single-mutant analysis (Figure 3.5), many of the AGO4/AGO6/AGO9-dependent clusters were not complemented by AGO4-D742A (Figure 3.7C). In addition, even the set of loci that were designated as complemented by AGO4-D742A still generally showed less accumulation than observed with the wtAGO4 transgene (Figure 3.7D). Overall, these analyses demonstrate that AGO4 is required for the accumulation of a much larger number of siRNAs than AGO6 and AGO9 in inflorescences, and that the slicing activity of AGO4 is required for full accumulation of most of these siRNAs.


Figure 3.7 Slicing-defective AGO4-D742A partially complements small RNA accumulation in the ago4-4/ago6-2/ago9-1 background A. Principal component analysis demonstrating overall relationships between all sRNA-seq libraries from ecotype-insensitive small RNA clusters. The first two principal components are shown for tested samples. B. Percentage of AGO4/AGO6/AGO9-dependent clusters and AGO4/AGO6/AGO9-dependent small RNAs in 24 nt siRNA loci. AGO4/AGO6/AGO9-dependent clusters were defined as clusters with at least 2-fold less accumulation in ago4-4/ago6-2/ago9-1 compared with Col-0. C. Euler diagram showing the number of significantly down-regulated small RNA clusters (FDR=0.01) in indicated genotypes compared with Ws wild-type. C: AGO4-D742A complemented clusters in ago4-4/ago6-2/ago9-1 background; NC: AGO4-D742A non-complemented clusters in ago4-4/ago6-2/ago9-1 background. D. Normalized small RNA accumulation in AGO4-D742A complemented and non-complemented clusters. The ratio of small RNA accumulation in indicated genotypes over that in the Ws wild-type was computed and then log2-transformed. Boxplots show medians (horizontal lines), the 1st-3rd quartile range (boxes), 95% confidence of medians (notches), other data out to 1.5 times the interquartile range (whiskers) and outliers (dots).


3.4 Discussion

3.4.1 Most 24 nt siRNAs do not require AGO4, AGO6, or AGO9 for accumulation

AGO4 is required for 24 nt small RNA accumulation at some loci, but not others (Zilberman et al., 2003; Qi et al., 2006). Our genome-wide analysis confirms this observation, and quantifies the extent of the dichotomy: most 24 nt siRNA loci are unaffected by loss of AGO4, while only a small subset have siRNA accumulation defects. Even when all three functional members of the AGO4 clade (Vaucheret, 2008) are removed in the ago4-4/ago6-2/ago9-1 triple mutant, accumulation of 24 nt siRNAs from most loci is unaffected. This situation contrasts with the relationship between the major Arabidopsis miRNA-binding Argonaute AGO1 and miRNAs: in the null mutant ago1-3, accumulation of the majority of miRNAs is decreased (Vaucheret et al., 2004; Arribas-Hernández et al., 2016a). Why might the majority of 24 nt siRNAs maintain stable accumulation levels in the absence of AGO4, AGO6, and AGO9? One possibility is that they are stabilized by AGO3. Despite not being a member of the AGO4 clade, the Arabidopsis AGO3 is primarily associated with 24 nt siRNAs, and can partially complement the DNA methylation defects seen in the ago4 mutant (Zhang et al., 2016b). Alternatively, many 24 nt siRNAs might be stabilized by association with non-AGO RNA binding proteins, or perhaps not require protein binding at all.

3.4.2 AGO4-dependent siRNAs are likely secondary siRNAs

Two models have been proposed to explain why some 24 nt siRNAs are dependent on AGO4. Qi et al. (2006) hypothesized that AGO4-dependent siRNAs might reflect target slicing-dependent secondary siRNA biogenesis similar to that which is sometimes observed from miRNA targets (Fei et al., 2013). In this model, double-stranded RNA could be synthesized from AGO4-sliced primary transcripts and then further processed into 24 nt secondary siRNAs by DCL3. Because Pol V makes chromatin-associated, long non-coding RNAs that are targeted by AGO4 (Wierzbicki, 2012), the sliced-secondary siRNA model predicts that AGO4-dependent siRNAs would also be NRPE1-dependent. Our analysis shows that this prediction is supported by the data: most NRPE1-dependent siRNA clusters are also AGO4-dependent, and vice versa. However, we also found that AGO4-dependent siRNAs tend to be DRM2-dependent. This is not an obvious prediction of the sliced-secondary siRNA model because DRM2, a de novo DNA methyltransferase, is thought to be recruited to chromatin in the vicinity of an AGO4-Pol V interaction. An alternative model proposed that an initial wave of de novo AGO4/Pol V-dependent DNA methylation at a locus could subsequently recruit Pol IV and thus produce secondary siRNAs in a self-reinforcing loop (Pontier et al., 2005). Our observation that AGO4, NRPE1, DRM2, and SHH1 were all required for accumulation of the same subsets of 24 nt siRNA loci is fully consistent with the self-reinforcing loop model for secondary het-siRNAs. Intriguingly, a much stronger reduction of het-siRNA accumulation was observed in nrpd1-4 than in shh1-1, suggesting that SHH1 may be specifically required for guiding Pol IV to the regions targeted by AGO4-dependent, self-reinforcing silencing.

3.4.3 On the role of AGO4-catalyzed slicing

A full description of the functions of AGO4-catalyzed endonuclease activity (i.e., slicing) remains elusive. In other systems, two general functions of AGO-catalyzed slicing have been described: slicing of passenger strands during AGO-loading of a small RNA duplex (Matranga et al., 2005), and slicing of target RNAs (Qi et al., 2005). For Arabidopsis AGO1, both in vitro and in vivo experiments demonstrate that AGO1-catalyzed slicing is not required for miRNA loading, but is required for many aspects of target regulation (Iki et al., 2010; Carbonell et al., 2012; Arribas-Hernández et al., 2016a; 2016b). In contrast, in vitro and in vivo data have demonstrated that AGO4-catalyzed slicing is required for passenger strand removal during siRNA loading and subsequent nuclear localization of the AGO4-siRNA complex (Ye et al., 2012). Although AGO4 can slice a free target RNA in vitro (Qi et al., 2006), to our knowledge there is no direct evidence of AGO4-catalyzed slicing of Pol V target RNAs in vivo. Our analysis showed that the catalytic capability of AGO4 is critical for the full accumulation of nearly all AGO4-dependent siRNAs. Many siRNAs were not rescued at all by slicing-defective AGO4, and even those that showed some degree of complementation almost never recovered to the extent allowed by complementation with the wild-type AGO4. The dependency of AGO4-dependent siRNAs upon AGO4-catalyzed slicing could be fully explained by defects in siRNA loading (Ye et al., 2012). In either the sliced-secondary siRNA or self-reinforcement secondary siRNA models, lack of proper loading and subsequent nuclear localization of the 'primary' siRNAs would prevent accumulation of the AGO4-dependent sub-population.

Ye et al. (2012) reported that passenger strand removal mediated by AGO4 slicing is required for nuclear localization of AGO4. Why, then, could any complementation occur at all with the slicing-defective mutant AGO4-D742A in our study? One hypothesis is that passenger strand removal for proper AGO4-loading may not be completely dependent on slicing. AGO1-mediated slicing is not required for the unwinding of miRNA/miRNA* duplexes during AGO1-loading (Iki et al., 2010; Carbonell et al., 2012; Arribas-Hernández et al., 2016a; 2016b). Slicing-independent miRNA loading may be efficient because of the mismatches and bulges that are common in miRNA/miRNA* duplexes (Iki et al., 2010). In the case of AGO4-loading, where siRNA duplexes are perfectly complementary, a slicing-independent mechanism might still contribute to passenger strand removal, but with a much lower efficiency.

Whether or not AGO4-catalyzed slicing occurs at the targeting stage (e.g. in the nucleus upon targeted Pol V transcripts) remains unclear. If so, it would seem to present difficulties for the current model of RdDM, which supposes that a stable tethering of AGO4-siRNA complexes to nascent RNAs is required to recruit DRM2 to the vicinity. Conversely, if slicing is not used at the targeting stage, the challenge becomes understanding how it is prevented in vivo, given that in vitro AGO4-siRNA complexes are perfectly competent to direct target cleavage (Qi et al., 2006). Resolution of these questions is an important goal for the future that will further illuminate the mechanisms of RdDM.

Chapter 4 Systematic identification of het-siRNA and target interactions in Arabidopsis thaliana

4.1 Introduction

ARGONAUTE4 (AGO4) is the major effector of canonical RNA-directed DNA methylation (RdDM) in Arabidopsis thaliana. It binds 24 nt het-siRNAs (Havecker et al., 2010), and recruits the de novo DNA methyltransferase DRM2 to catalyze cytosine methylation (Zhong et al., 2014). RNA POLYMERASE V (Pol V) synthesizes long non-coding RNAs at loci that were previously silenced by 24 nt het-siRNAs (Wierzbicki et al., 2008; Mosher et al., 2008; Wierzbicki et al., 2012). In the current RdDM model, AGO4 is guided to target loci through base-pairing of its associated het-siRNA with Pol V transcripts (Wierzbicki et al., 2008; 2009; Böhmdorfer et al., 2014). Genome-wide analysis of AGO4 occupancy by chromatin immunoprecipitation (ChIP) demonstrated that Pol V is generally required for AGO4 binding to chromatin (Zheng et al., 2013). However, the targets of het-siRNAs remain elusive. Despite the abundant genomic loci associated with Pol V in Arabidopsis (Zhong et al., 2012; Wierzbicki et al., 2012; Johnson et al., 2014), not a single experimentally verified het-siRNA:target interaction has been reported so far. The difficulty of finding het-siRNA targets is probably due to the low abundance of Pol V transcripts and the heterogeneity of het-siRNAs. Therefore, it is important to develop a methodology that can identify het-siRNA targets.

In plants, miRNAs induce silencing primarily through AGO1-mediated target cleavage (Rogers and Chen, 2013). Plant miRNA target sites are often predicted based on their near-perfect complementarity to miRNAs (Rhoades et al., 2002; Jones-Rhoades and Bartel, 2004; Allen et al., 2005). AGO1 slices plant miRNA targets at the phosphodiester bond between the 10th and 11th nucleotides counting from the 5' end of the miRNA, resulting in 5' and 3' fragments (Tang et al., 2003; Baumberger and Baulcombe, 2005; Qi et al., 2005). Many plant miRNA targets were verified by isolation of the cleaved fragments and subsequent gene-specific cloning by a method called RNA ligase-mediated 5' rapid amplification of cDNA ends (5' RLM-RACE) (Thomson et al., 2011). By combining 5' RLM-RACE and deep sequencing, a high-throughput method termed 'degradome sequencing' was developed for transcriptome-wide validation of plant miRNA targets (Addo-Quaye et al., 2008; German et al., 2008). The experimentally and computationally validated targets in plants reveal an agreement on the complementarity requirements for functional plant miRNA targeting: extensive base-pairing at positions 2-13 counting from the 5' end of the miRNA, almost no mismatches at positions 9-11, and more tolerance of mismatches at position 1 and the 3' end of the miRNA (Wang et al., 2015a).

In contrast to plant miRNA target sites, most animal miRNA target sites do not display extensive complementarity to miRNAs (Axtell et al., 2011), and miRNA-mediated target slicing is much less common in animals (Yekta et al., 2004; Shin et al., 2010; Karginov et al., 2010; Moran et al., 2014). Animal miRNA-mediated target silencing mainly occurs through translational inhibition, and mRNA destabilization via deadenylation and decapping (Ameres and Zamore, 2013). Efficient animal miRNA targeting requires contiguous base-pairing to the 5' region of miRNAs at position 2-7 (the so-called miRNA 'seed' region) (Bartel, 2009). Additional pairing to the 3' region of miRNA can enhance targeting efficacy (Grimson et al., 2007). Occasionally, centered sites that have 11-12 contiguous base-pairs to the center region of miRNA are capable of mediating target repression (Shin et al., 2010).

Prediction and validation of miRNA targets in animals is challenging given that animal miRNAs can use as few as 6-8 nucleotides to confer repression on targets. Animal miRNA target sites are often validated by analysis of defective mutants of specific miRNAs, and by co-expression of miRNA and targets in reporter systems (Lai, 2004; Kuhn et al., 2008; Thomson et al., 2011). These methods are not capable of identifying target sites systematically on a transcriptome level, nor can they validate direct interactions between a miRNA and its target. High-throughput sequencing of miRNA:mRNA duplexes following crosslinking and AGO-immunoprecipitation provides a transcriptome-wide approach for mapping AGO-bound miRNA:target interactions. This method, known as high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HIT-CLIP) or crosslinking-immunoprecipitation sequencing (CLIP-Seq), has been performed to identify miRNA targets in various animal tissues and cell cultures (Chi et al., 2009; Zisoulis et al., 2010; Hafner et al., 2010; Leung et al., 2011), as well as piwi-interacting RNA (piRNA) targets in germplasm (Vourekas et al., 2016). Assigning target sites to miRNAs in CLIP-Seq experiments relies on computational prediction on the basis of complementarity. In a modified CLIP-Seq experiment, miRNA and targeted mRNA were experimentally ligated in AGO-immunoprecipitates, resulting in chimeras composed of both miRNA and target fragments (Grosswendt et al., 2014). Subsequent high-throughput sequencing of the chimeras provides extensive insights into the details of in vivo miRNA:target interactions (Grosswendt et al., 2014).

In Arabidopsis, AGO4 primarily binds 24 nt het-siRNAs (Havecker et al., 2010), and physically interacts with Pol V transcripts (Wierzbicki et al., 2008; 2009; Böhmdorfer et al., 2014). In this chapter, I report a modified AGO4-CLIP-Seq method to identify het-siRNA targets in Arabidopsis thaliana. For the first time in plants, a few het-siRNA:target interactions were identified, suggesting the potential power of this approach. The identified het-siRNA:target interactions revealed that het-siRNAs act both in cis and in trans, and induce target cleavage between the 10th and 11th nucleotides counting from the 5' end of the het-siRNA.

4.2 Materials and Methods

4.2.1 AGO4-CLIP for preparation of het-siRNA chimeras

Formaldehyde crosslinking and nuclei isolation

Arabidopsis thaliana seedlings of the Col-0 ecotype were grown on half-strength Murashige and Skoog medium under a 16-hour light/8-hour dark cycle. About 2 g of two-week-old seedlings were vacuum infiltrated with 1% formaldehyde as described by Yamaguchi et al. (2014) to crosslink RNA to bound proteins. Nuclei of the formaldehyde-fixed tissue were isolated as described by Folta and Kaufman (2006), and then homogenized in 500 μl of cold PBS buffer by sonication.

AGO4-immunoprecipitation

AGO4 was immunoprecipitated from homogenized nuclei lysate using an antibody against endogenous AGO4 (Agrisera, AS09617) coupled to Protein A magnetic beads (Invitrogen, Dynabeads) for 3 h at 4 °C. For this experiment, 500 μl of nuclei lysate from about 2 g of formaldehyde-fixed seedlings, 50 μl of magnetic beads and 5 μg of AGO4 antibody were used.

RNase A/T1 treatment

Immunoprecipitates were washed twice with low salt wash buffer (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl pH 8.1, 150 mM NaCl) for 5 min each, and once with high salt wash buffer (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl pH 8.1, 500 mM NaCl) for 5 min. The immunoprecipitates were then resuspended in 500 μl PNK-WB buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.5% NP-40, 50 mM NaCl, 5 mM beta-mercaptoethanol), and treated with 1 μl RNase A/T1 solution (200 μg/ml RNase A, 250 U/ml RNase T1) for 7 min at 20 °C. The sample was then chilled on ice and washed briefly three times with cold PNK-WB buffer.

5' Phosphorylation and RNA ligation

After RNase A/T1 treatment, the immunoprecipitates were resuspended in 80 μl phosphorylation solution containing 1x PNK buffer, 1 mM rATP, 1 U/μl RNaseOUT (Thermo Fisher Scientific) and 0.5 U/μl T4 PNK minus (NEB). The 5' phosphorylation reaction was performed at 20 °C for 2 hours. The immunoprecipitates were then washed three times with PNK-WB buffer and incubated at 16 °C overnight in 160 μl ligation solution containing 1x PNK buffer, 1 mM rATP, 1 U/μl RNaseOUT and 0.25 U/μl T4 RNA ligase (NEB).

RNA elution and sequencing library preparation

RNAs bound to immunoprecipitates were isolated using 1 ml TRIzol per the manufacturer's instructions. RNA libraries were prepared using the TruSeq Small RNA kit (Illumina) per the manufacturer's instructions. Paired-end sequencing with 75 cycles was carried out on a MiSeq (Illumina).

4.2.2 Characterization of het-siRNA clusters in Arabidopsis seedlings

Het-siRNA clusters in 2-week-old Arabidopsis seedlings were characterized by analysis of the sRNA-seq libraries described by Zhang et al. (2014). For the Arabidopsis seedling sRNA-seq data sets, reads were aligned to the Arabidopsis TAIR10 genome by ShortStack v2.1 (Axtell, 2013b) with option --adapter TGGAATTC. The de novo identified clusters dominated by 24 nt small RNAs were presumed to be het-siRNA clusters.

4.2.3 Computational identification of het-siRNA chimeras

For both ligated and control samples, only the reads in forward orientation from the paired-end sequences were analyzed. The 3' adapters were trimmed by Cutadapt (Martin, 2011) with options -a TGGAATTC -e 0.1 -O 5 -m 34. Reads in which no adapter was found, or that were shorter than 34 nt after adapter trimming, were discarded. To obtain het-siRNA:target chimeras, the first 26 nt (counting from the 5' end) of each trimmed read were aligned to the Arabidopsis thaliana TAIR10 genome and transcriptome using Bowtie 1.1.1 with option -v 0 (Langmead et al., 2009). Reads in which the first 26 nt could be perfectly aligned to the Arabidopsis thaliana genome or transcriptome were discarded, because such reads are contiguous sequences rather than chimeras. The remaining reads were computationally split between the 24th and 25th nucleotides. Reads that fulfilled the following criteria were retained as candidate chimeras: 1. The 24-nt-long 5' fragment is derived from a defined het-siRNA cluster (see 4.2.2); 2. Both the 5' and 3' fragments could be perfectly aligned to the Arabidopsis thaliana genome or transcriptome. The 5' and 3' fragments were presumed to be the het-siRNA and its target, respectively.
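The filtering logic described above can be summarized in a minimal Python sketch. It is an illustration only, not the code used in this study: the function name and the two callbacks are hypothetical, with aligns_perfectly standing in for a zero-mismatch Bowtie alignment and in_hetsirna_cluster standing in for a lookup against the clusters defined in 4.2.2.

```python
from typing import Callable, Optional, Tuple

SIRNA_LEN = 24     # putative het-siRNA: first 24 nt of the trimmed read
PREFIX_LEN = 26    # prefix tested for a contiguous (non-chimeric) origin
MIN_READ_LEN = 34  # minimum trimmed read length (Cutadapt -m 34)


def split_candidate_chimera(
    read: str,
    aligns_perfectly: Callable[[str], bool],
    in_hetsirna_cluster: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (het-siRNA, ligated fragment) for a candidate chimera, else None.

    Both callbacks are hypothetical stand-ins: `aligns_perfectly` for a
    zero-mismatch alignment to the genome or transcriptome, and
    `in_hetsirna_cluster` for membership in a defined het-siRNA cluster.
    """
    if len(read) < MIN_READ_LEN:
        return None
    # If the first 26 nt align as one piece, the read is contiguous,
    # not a ligation product of two separate RNAs.
    if aligns_perfectly(read[:PREFIX_LEN]):
        return None
    # Split between the 24th and 25th nucleotides.
    sirna, fragment = read[:SIRNA_LEN], read[SIRNA_LEN:]
    # Criterion 1: the 5' fragment comes from a het-siRNA cluster.
    if not in_hetsirna_cluster(sirna):
        return None
    # Criterion 2: both fragments align perfectly on their own.
    if not (aligns_perfectly(sirna) and aligns_perfectly(fragment)):
        return None
    return sirna, fragment
```

Passing the alignment and cluster lookups as callbacks keeps the sketch independent of any particular aligner output format.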

The target sequences were extended for 20 nt on both 5' and 3' sides based on their genome or transcriptome alignments. The het-siRNAs and extended targets were forced to form RNA duplexes by RNAduplex 2.2.10 (Lorenz et al., 2011). For targets that could be aligned to multiple locations, all extended sequences were temporarily stored, and computationally hybridized to ligated het-siRNA. The sequence that formed the most stable duplex (indicated by minimum free energy, MFE) with the ligated het-siRNA was retained as het-siRNA target.
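The extension and duplex-selection step can be sketched as follows. This is a simplified illustration under stated assumptions: coordinates are taken to be 0-based and half-open, strand handling is omitted, and duplex_energy is a hypothetical callback standing in for an RNAduplex call that returns the minimum free energy of a duplex (more negative values indicate greater stability).

```python
from typing import Callable, Iterable, Tuple

FLANK = 20  # nt added on each side of the aligned target fragment


def extend_fragment(chrom_seq: str, start: int, end: int, flank: int = FLANK) -> str:
    """Extend the alignment [start, end) by `flank` nt per side.

    Coordinates are assumed 0-based and half-open; the extension is
    clipped at the ends of the reference sequence.
    """
    return chrom_seq[max(0, start - flank):min(len(chrom_seq), end + flank)]


def pick_most_stable(
    sirna: str,
    extended_candidates: Iterable[str],
    duplex_energy: Callable[[str, str], float],
) -> Tuple[float, str]:
    """Return (MFE, sequence) for the candidate forming the most stable duplex.

    `duplex_energy` is a hypothetical stand-in for a duplex-folding program;
    the lowest (most negative) energy marks the most stable duplex.
    """
    return min((duplex_energy(sirna, seq), seq) for seq in extended_candidates)
```

For a fragment with a single alignment, the candidate list has one element and the selection step is trivial.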

4.3 Results

4.3.1 Identification of het-siRNA:target interactions in Arabidopsis thaliana

I modified an animal CLIP-seq protocol (Grosswendt et al., 2014) to study AGO4-mediated het-siRNA:target interactions in Arabidopsis thaliana (Figure 4.1A). Briefly, Arabidopsis thaliana seedlings were vacuum infiltrated with formaldehyde, which crosslinked nucleic acids to bound proteins. AGO4 was immunoprecipitated from nuclei lysate, treated with RNase A/T1 and purified. The purified AGO4-RNA complexes were treated with T4 PNK minus, which blocked the 3' ends of target sequences from ligation. T4 RNA ligase was then added to the sample to ligate interacting het-siRNAs and targets. To measure endogenous non-specific ligation, a control sample was prepared without adding T4 RNA ligase. High-throughput sequencing libraries were prepared using RNAs eluted from the ligated and control samples. Candidate chimeras were identified through bioinformatic analysis (Figure 4.1B). Chimeras were split into 5' het-siRNAs and 3' target sequences. The 3' target sequences were extended based on their genomic alignments, and stable het-siRNA:target interactions were retained (Figure 4.1C).

Figure 4.1 Overall procedures for de novo identification of het-siRNA:target interactions A. Preparation of het-siRNA chimeras by AGO4-CLIP. RNAs bound to immunoprecipitated AGO4 were treated with RNase A/T1, ligated by T4 RNA ligase and deep sequenced. Control sample was not treated with T4 RNA ligase. B. Computational identification of het-siRNA chimeras. Chimeric read was computationally split into 5' het-siRNA fragment and 3' target fragment. The target fragment is extended based on its genome/transcriptome alignment. For the 3' fragment that can be aligned to multiple loci, all extended alignments are temporarily stored. The extended sequence that can form the most stable RNA duplex with het-siRNA is retained as target. C. An example of het-siRNA:target interaction recovered from het-siRNA chimera. Blue, het-siRNA sequence; Red, ligated target; Grey, extended sequence.

Figure 4.2 Characterization of de novo identified chimeras RNA duplexes were computationally formed by using het-siRNAs and extended 3' sequences recovered from chimeras. A. Numbers of chimeras in ligated and control samples. B. Classification of chimeras by k-means clustering. The number of clusters is determined by the elbow method. C. MFE ratios of cluster centers. D. MFE ratios of chimeras in ligated sample. E. MFE ratios of chimeras in control sample.

In the ligated sample, 344 chimeric reads passed the filtering process and were retained (Figure 4.2A). Unexpectedly, 153 reads were identified as chimeras in the control sample (Figure 4.2A). The control sample was not treated with T4 RNA ligase, suggesting the presence of endogenous RNA ligase activity in Arabidopsis tissue lysate. To identify the real het-siRNA:target interactions in the ligated sample, I first extended the sequences ligated to het-siRNAs based on their genomic alignments, computationally forced them to form duplexes with the het-siRNAs, and then calculated the ratio of the minimum free energy (MFE) of each duplex to the MFE of the same duplex with perfect base-pairing (Figures 4.1B and C). Chimeric reads were partitioned into three clusters by k-means clustering based on their MFE ratios, because the overall variance decreased substantially when the number of clusters increased from two to three in both the ligated and control samples (Figure 4.2B). In the ligated sample, one cluster clearly differed from the other two in that its center had a very high MFE ratio (Figure 4.2C). All RNA duplexes recovered from this cluster displayed high MFE ratios (Figure 4.2D). In contrast, none of the RNA duplexes derived from the control sample exhibited high MFE ratios (Figures 4.2C and E). The presence of chimeras that could give rise to duplexes with high MFE ratios was specific to the ligated sample, as no chimeras showed MFE ratios higher than 0.6 in the control sample (Figures 4.2D and E). Taken together, these chimeras specific to the ligated sample could well be derived from bona fide interactions of het-siRNAs and targets.
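The MFE ratio and the clustering step can be illustrated with a short, self-contained Python sketch. It is not the analysis code used here: the function names are hypothetical, the k-means routine is a plain one-dimensional Lloyd iteration with a deterministic initialization chosen for reproducibility, and the within-cluster sum of squares is used as the variance measure inspected for the elbow.

```python
from typing import List, Sequence, Tuple


def mfe_ratio(mfe_observed: float, mfe_perfect: float) -> float:
    """Ratio of the observed duplex MFE to the MFE of the same het-siRNA
    paired with a perfectly complementary sequence (both negative)."""
    if mfe_perfect == 0:
        raise ValueError("perfect-duplex MFE must be non-zero")
    return mfe_observed / mfe_perfect


def kmeans_1d(values: Sequence[float], k: int, n_iter: int = 100) -> Tuple[List[float], float]:
    """One-dimensional k-means. Returns (centers, within-cluster sum of squares)."""
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: evenly spaced order statistics (an assumption of
    # this sketch; standard implementations often use random restarts).
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # Keep the old center if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    wcss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wcss


def elbow_curve(values: Sequence[float], max_k: int) -> List[float]:
    """Within-cluster sum of squares for k = 1 .. max_k."""
    return [kmeans_1d(values, k)[1] for k in range(1, max_k + 1)]
```

Plotting the output of elbow_curve against k and choosing the value after which the curve flattens corresponds to the elbow method referred to in Figure 4.2B.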

Figure 4.3 Penalty score of Cluster 3 chimeras in ligated and control sample RNA duplexes were computationally formed by using het-siRNAs and extended 3' sequences recovered from chimeras. Mismatches, gaps and bulges in the pairing region were penalized by 1; G-U wobbles were penalized by 0.5; and the penalty scores were doubled at position two to 13 relative to the 5' end of het-siRNA.

The complementarity between het-siRNAs and their recovered targets was measured by the Allen et al. score, which calculates penalties based on position-specific mismatches (Allen et al., 2005). Mismatches, gaps and bulges in the pairing region are penalized by 1; G-U wobbles are penalized by 0.5; and the penalty scores are doubled at positions 2 to 13 relative to the 5' end of the small RNA. Het-siRNA:target interactions recovered from the ligated sample generally displayed low penalty scores, suggesting that only limited mismatches or bulges are allowed for successful interaction (Figure 4.3). In contrast, all interactions recovered from the control sample exhibited very high penalty scores (Figure 4.3).
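The scoring rule stated above can be written as a few lines of Python. The sketch below is illustrative and its encoding is an assumption: the pairing is given as a string read from the 5' end of the small RNA, with "|" for a Watson-Crick pair, "O" for a G-U wobble, and a space for a mismatch, gap or bulge.

```python
# Penalty per pairing symbol: "|" pair, "O" G-U wobble, " " mismatch/gap/bulge.
PENALTY = {"|": 0.0, "O": 0.5, " ": 1.0}


def allen_score(pairing: str) -> float:
    """Position-weighted penalty score of a small RNA:target duplex.

    `pairing` must be ordered from the 5' end of the small RNA
    (position 1) to its 3' end. Penalties are doubled at positions 2-13.
    """
    score = 0.0
    for position, symbol in enumerate(pairing, start=1):
        penalty = PENALTY[symbol]
        if 2 <= position <= 13:
            penalty *= 2
        score += penalty
    return score


# Pairing strings in Tables 4.1-4.4 are drawn with the 3' end of the
# het-siRNA on the left, so they are reversed before scoring.
entry_19 = "|" * 22 + "O" + "|"  # single wobble at position 2
entry_12 = "O" + "|" * 23        # single wobble at position 24
print(allen_score(entry_19[::-1]), allen_score(entry_12[::-1]))
```

With these two pairing strings the function returns 1.0 and 0.5, matching the scores listed for entries 19 and 12 in Table 4.2.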

4.3.2 Het-siRNA induces target cleavage

I then analyzed the regions of het-siRNAs that paired with ligated sequences recovered from chimeras in Cluster 3 (the cluster with the highest average MFE ratio). To do this, the het-siRNAs and their ligated sequences were split and forced to form duplexes directly without extension (recall that the MFE ratio is calculated using extended target sequences). Strikingly, in 16 out of 25 chimeras from the ligated sample, sequences ligated to het-siRNAs paired with positions 11-24 of the corresponding het-siRNAs counting from the 5' end (Figure 4.4A). In 8 other chimeras, sequences ligated to het-siRNAs also paired with the 3' parts of het-siRNAs, resembling the 11-24 region (Figure 4.4A). Only one chimera had a ligated sequence showing extended pairing to the het-siRNA (positions 2-21, Figure 4.4A). In contrast, there were no predominant pairing patterns in the control sample: 26 different pairing regions were observed in 33 Cluster 3 chimeras, and none of them displayed the 11-24 pairing region (Figure 4.4B).

Figure 4.4 Frequency of het-siRNA pairing regions in Cluster 3 chimeras The RNA duplexes were formed by directly using het-siRNAs and ligated sequences without extension in ligated sample (A) and control sample (B).

Why did the sequences ligated to het-siRNAs specifically pair with positions 11-24 of het-siRNAs? One hypothesis is that het-siRNA targets are sliced between the 10th and 11th nucleotides counting from the 5' end of the het-siRNA, and only the 5' fragments of the targets are retained in AGO4. The details of the recovered het-siRNA:target interactions are listed in Tables 4.1 to 4.4. The 5' ends of ligated sequences tended to pair with the 3' regions of het-siRNAs (positions 11-24 for most interactions, Tables 4.1 and 4.2). The 3' extended sequences, which were retrieved from the genome based on alignments of the ligated fragments and subsequent selection of the most stable duplex if multiple alignments were found, usually displayed near-perfect complementarity to the 5' regions of het-siRNAs (Tables 4.1 to 4.3). Many ligated fragments were uniquely aligned before computational extension, suggesting that the interactions were not artifacts caused by duplex selection. It has been reported that proper het-siRNA loading requires AGO4-mediated cleavage of the passenger strand of the het-siRNA duplex (Ye et al., 2012). In the ligated sample, I identified three potential het-siRNA duplexes (indicated by their 2 nt 3' overhangs and perfect complementarity after sequence extension), in which the passenger strands were also cleaved (Table 4.3). These observations suggest that het-siRNA targets are sliced upon interaction with het-siRNAs.

Table 4.1 Het-siRNA:target interactions with perfect complementarity

Index; Het-siRNA:target (extended) duplex* (upper strand / pairing / lower strand); Het-siRNA pairing region (before extension); Het-siRNA alignment; Target alignment (before extension)
1; 5'-..CUGGACUGUCCGAAUUUCUCAUGUGUAUACAG. / |||||||||||||||||||||||| / CCUGACAGGCUUAAAGAGUACACA-5'; 11 - 24; Chr1: 17116407-17116430; Chr1: 17116388-17116420
2; 5'-..ACUUAACUCUCCUUCUCUCCUACUUUUUCUCA. / |||||||||||||||||||||||| / AAUUGAGAGGAAGAGAGGAUGAAA-5'; 11 - 24; Chr2: 18957780-18957803; Chr2: 13404456-13404477
3; 5'-..GUUAUCUUAAUGAUCCGACGGUCCAUAAUCAU. / |||||||||||||||||||||||| / AUAGAAUUACUAGGCUGCCAGGUA-5'; 11 - 24; Chr5: 486904-486927; Chr5: 486914-486933
4; 5'-..UGUGAAAAAAGUGUCUAGUUUCCAACUUAUCU. / |||||||||||||||||||||||| / ACUUUUUUCACAGAUCAAAGGUUG-5'; 11 - 24; 5 alignments; 5 alignments
5; 5'-..GCUUUCAUGGUGUAGCCAAAGUCCAUAUGAGU. / |||||||||||||||||||||||| / AAAGUACCACAUCGGUUUCAGGUA-5'; 11 - 24; 75 alignments; 137 alignments
6; 5'-..UCGACCCAAACUCGUCCUACCAUCUAAUCUCG. / |||||||||||||||||||||||| / CUGGGUUUGAGCAGGAUGGUAGAU-5'; 15 - 24; Chr2: 8028838-8028861, Chr5: 13562081-13562104, Chr5: 13734566-13734589; Chr5: 13734580-13734592

* Computationally extended sequences are shaded in grey.

Table 4.2 Het-siRNA:target interactions with near-perfect complementarity

Index; Het-siRNA:target (extended) duplex* (upper strand / pairing / lower strand); Het-siRNA pairing region (before extension); Het-siRNA alignment; Target alignment (before extension); Allen et al. score
7; 5'-..ACAUCACUCGAACUAACGACCUCUAAGUUUCG. / ||||||||||O |||| |||||| / CAGUGAGCUUGGCUGCUAGAGAUU-5'; 13 - 23; Chr3: 2199322-2199345, Chr5: 317530-317553; Chr5: 9552698-9552714; 6
8; 5'-..UGACUUGACACGUGGCAUGAUCCUAUGAGUUA. / | |||||||||| ||||||||||| / UCAACUGUGCACAGUACUAGGAUA-5'; 13 - 22; Chr4: 4801784-4801807; 29 alignments; 3
9; 5'-..UUUGAUGGUGUAACCAAAGUCCGUAUGAGUGU. / | |||||||| |||||||||O||| / AGUACCACAUCGGUUUCAGGUAUA-5'; 11 - 24; 74 alignments; 8 alignments; 3
10; 5'-..ACAGCCAUUGUAAAUAAGUCCCAACUCCAGGA. / |||| O|||||||||O|||||||| / UCGGCGACAUUUAUUUAGGGUUGA-5'; 11 - 24; 6 alignments; 15 alignments; 2.5
11; 5'-..CUGUUUCUUCUUUUGUUUCCGCUCUGCACGUU. / ||||||| |O||||||||||||| / CAAAGAAAAGAACAAAGGCGAGAA-5'; 11 - 24; Chr4: 7923452-7923475; Chr1: 24531699-24531715, Chr3: 19753959-19753975; 2.5
12; 5'-..GUUCAUAAUGGACAAUGGUCCGAAAAAUGUCC. / O||||||||||||||||||||||| / GGUAUUACCUGUUACCAGGCUUUU-5'; 11 - 24; 5 alignments; Chr3: 16787662-16787679; 0.5
13; 5'-..AUCAAGUCAACUCAACUCAAUCCGUAAACUCU. / ||| |||||||||||||||||O| / GUUGAGUUGAGUUGAGUUAGGUAC-5'; 11 - 24; 6 alignments; Chr5: 753941-743957; 3
14; 5'-..CUCCAUUUGACAGAUCCAUCCAACUGAUUCAU. / ||||||||| |||| ||||||||O / GGUAAACUGACUAGAUAGGUUGAU-5'; 11 - 24; Chr5: 20856730-20856753; Chr1: 25400503-25400533; 3.5
15; 5'-..GUUACUCGAAAGUUCUGGGCCGAAACCGAUAA. / || ||| |||||||O||||||| / UUGUCCUU-CAAGACCUGGCUUUGA-5'; 11 - 23; Chr4: 8465974-8465997; Chr3: 11811099-11811116; 6
16; 5'-..AGACUUGACAUGUAGCACAAUCCUAAACAAAG. / ||||||||O|| |||| |||||| / UGAACUGUGCACCGUGCUAGGAUA-5'; 11 - 24; 10 alignments; Chr1: 16860617-16860633; 5.5
17; 5'-..AAACCCGUCCACCAAAAACCCAACUUUAGAAA. / |||||O|||||||||||||||||| / UGGGCGGGUGGUUUUUGGGUUGAA-5'; 11 - 24; 5 alignments; Chr1: 56511-56531, Chr3: 8209529-8209549, Chr5: 13417348-13417368; 0.5
18; 5'-..CUUC-CAGACCCGAGUUCGAUCCUCUAUGGAA. / || |||||||||||||O |||||| / AGAGUCUGGGCUCAAGUCAGGAGA-5'; 11 - 24; Chr2: 5052490-5052513; Chr5: 17833607-17833625; 4
19; 5'-..GAACUUAAACCGCAACCCGAUCUUGUAAGCCU. / ||||||||||||||||||||||O| / UGAAUUUGGCGUUGGGCUAGAAUA-5'; 11 - 24; 27 alignments; 230 alignments; 1
20; 5'-..GAUGUGAUGAUACAGGACGAAAAAUCAUGGUG. / |||||||||| |O||||||||||| / ACACUACUAUAUUCUGCUUUUUAG-5'; 11 - 24; Chr5: 18171464-18171487; Chr1: 17689784-17689813; 2
21; 5'-..AUGUUUAGAAUAUAAUUCAGAGUCAUAUCAUU. / O|||||| |||||||||||||||| / UAAAUCUAAUAUUAAGUCUCAGUA-5'; 13 - 24; 12 alignments; 13 alignments; 1.5

* Computationally extended sequences are shaded in grey.

Table 4.3 Potential het-siRNA duplexes

Index; Het-siRNA:target (extended) duplex* (upper strand / pairing / lower strand); Het-siRNA pairing region (before extension); Het-siRNA alignment; Target alignment (before extension)
22; 5'-UUUUAUAAGUGUUAACGUCCAUAC / |||||||||||||||||||||| / AGAAAAUAUUCACAAUUGCAGGUA-5'; 11 - 22; Chr1: 641179-641202; 5 alignments
23; 5'-UUUAGUUACCAUUCACAGCCAUAA / |||||||||||||||||||||| / CAAAAUCAAUGGUAAGUGUCGGUA-5'; 11 - 22; Chr2: 15097797-15097820; 5 alignments
24; 5'-CCUUCGGUGAGAAGUCCACUCUAA / |||||||||||||||||||||| / CAGGAAGCCACUCUUCAGGUGAGA-5'; 11 - 22; Chr2: 7161264-7161287; 4 alignments

* Computationally extended sequences are shaded in grey.

Table 4.4 RNA duplex with extensive mismatches

Index; Het-siRNA:target (extended) duplex (upper strand / pairing / lower strand); Het-siRNA pairing region (before extension); Het-siRNA alignment; Target alignment (before extension)
25; 5'-..ACCAAGGAGUCUGACAUGUGUG-CGAGUCAACG / OO|||||| |||||||| ||| / AAAUUUCAGAC-GUACACACUGCUA-5'; 2 - 21; Chr2: 6089711-6089734; Chr2: 6795-6829, Chr3: 14200766-14200800

4.3.3 Het-siRNAs can interact with targets in cis and in trans

The RNA duplexes recovered from Cluster 3 chimeras from the ligated sample could be classified into four groups: 6 duplexes with perfect complementarity (Table 4.1); 15 duplexes with near-perfect complementarity (Table 4.2); 3 potential het-siRNA duplexes (Table 4.3); and 1 duplex with extensive mismatches (Table 4.4). The duplex with extensive mismatches could be a false positive, as the target sequence was not sliced. For a few perfectly paired duplexes, the het-siRNAs and targets were generated from the same loci, indicating that het-siRNAs can interact with targets in cis (Table 4.1). More intriguingly, limited mismatches were allowed in het-siRNA:target interactions (Table 4.2). In these cases, het-siRNAs and their targets were generated from different loci, suggesting that het-siRNAs can interact with their targets in trans.

4.4 Discussion

4.4.1 Rules of het-siRNA targeting

It is known that het-siRNAs can target their own originating loci to enhance silencing (Wierzbicki et al., 2008; Mosher et al., 2008; Zhong et al., 2012; Wierzbicki et al., 2012). Therefore, perfect complementarity is expected for many het-siRNA:target interactions. It was not known whether het-siRNAs could interact with targets generated from trans loci. Het-siRNAs do not immediately interact with their targets after biogenesis. Instead, considerable movement is required before targeting. Het-siRNAs are initially produced in the nucleus in the form of RNA duplexes, which are transferred to the cytoplasm for AGO4 loading, and then delivered back to the nucleus in AGO4 complexes to induce target silencing (Ye et al., 2012). In addition, het-siRNAs can move from shoot to root to regulate root DNA methylation in Arabidopsis (Melnyk et al., 2011b; Lewsey et al., 2016). It is difficult to imagine that het-siRNAs would select only their own loci of biogenesis after such extensive movement.

The de novo identified het-siRNA:target interactions in my analysis directly demonstrate that het-siRNAs can act in trans. The trans-acting ability of het-siRNAs could be evolutionarily important for transposon silencing. Zhong et al. (2012) reported that Pol V is enriched in promoters and evolutionarily young transposons. Zheng et al. (2013) reported that AGO4 preferentially targets transposons embedded in promoters and protein-coding genes. These observations imply that Pol V and AGO4 act together as a genomic surveillance system for newly inserted transposons. Given the trans-acting capacity of het-siRNAs, newly inserted transposons showing sequence similarity with existing silenced transposons would be targeted by RdDM.

What is the complementarity requirement for functional het-siRNA targeting? In plants, miRNA:target interactions are commonly assessed by the Allen et al. score, which calculates penalties based on position-specific mismatches (Allen et al., 2005). Transcriptome-wide computational analysis of plant miRNA targets led to a consensus that a functional plant miRNA:target interaction should have an Allen et al. score smaller than 4 (Addo-Quaye et al., 2008; German et al., 2008). In this study, the majority of de novo identified het-siRNA:target interactions also displayed Allen et al. scores smaller than 4 (Figure 4.3, Tables 4.1 and 4.2), indicating that the Allen et al. score can be used to evaluate het-siRNA:target interactions. It is not clear whether pairing to any specific region of the het-siRNA is more important than pairing to others. It seems that the base-pairing rules for miRNA:target interactions can be applied to het-siRNA targeting. In the identified het-siRNA:target interactions, more mismatches were tolerated at positions 12-24 and at position 1; fewer mismatches were observed at positions 9-10; and no mismatch was observed at position 11 (Table 4.2). However, the sample size in the present analysis is far from sufficient. A larger population of het-siRNA:target interactions is needed to conclusively address this question.

4.4.2 Role of target cleavage

The data reported here suggest that het-siRNAs induce target cleavage upon functional interactions (Figure 4.4, Tables 4.1 to 4.3). Target cleavage is probably mediated by AGO4, as the chimeric reads were recovered from AGO4-immunoprecipitation. Previous work suggested that tethering AGO4-siRNA complexes to Pol V-dependent transcripts is required for the recruitment of DRM2 to the vicinity (Zhong et al., 2014). This leads to a critical question: why would the Pol V transcript be cleaved if it acts as a scaffold?

First of all, the slicer capacity of AGO4 is important for het-siRNA accumulation. In Chapter 3, I demonstrated that the AGO4-dependent het-siRNAs are likely a group of secondary het-siRNAs, and that their full accumulation requires the slicer capacity of AGO4. In our working model, the biogenesis of AGO4-dependent het-siRNAs is primarily a result of the self-reinforcing loop in RdDM. In this model, previously methylated loci are again targeted by the RdDM machinery in a site-specific manner, giving rise to secondary het-siRNAs. In this wave of het-siRNA amplification, AGO4 slicer capacity mainly contributes to het-siRNA loading and reprogramming of AGO4 for subsequent nuclear localization. Slicing-defective AGO4 loses its capacity for nuclear localization (Ye et al., 2012), so it cannot trigger the self-reinforcing loop of RdDM. However, it is unknown whether AGO4 slicer capacity contributes to a second but minor wave of het-siRNA amplification, in which the cleaved het-siRNA targets are processed into dsRNA for further siRNA biogenesis.

In addition, DRM2-mediated DNA methylation could be fine-tuned by target cleavage. The kinetics of AGO4-mediated slicing is completely unknown. AGO4 may not slice targets immediately after het-siRNA:target interactions. Instead, the stalling time of AGO4 on Pol V transcript could regulate the strength of DRM2 activity. In Schizosaccharomyces pombe, AGO1 mediates siRNA amplification and heterochromatin formation in both slicing-dependent and slicing-independent manners, and the residence time of AGO1 on chromatin determines the amount of repressive marks on targets (Jain et al., 2016). It will be intriguing to test if a similar mechanism is used in plants.

Alternatively, AGO4-mediated target cleavage may be a step after DRM2 activity. In this scenario, slicing would contribute to AGO4 recycling, Pol V release, and target degradation. Pol V associates with AGO4 by RNA-RNA interactions and by direct binding through the GW-WG domain in NRPE1. These interactions can 'lock' Pol V on chromatin, and inhibit it from moving forward. Upon target cleavage, Pol V is released and is capable of further transcription. In my AGO4-immunoprecipitation experiments, only the 5' fragments of cleaved targets are bound by AGO4. The remaining transcript could still be associated with Pol V and serve as a scaffold for more AGO4-siRNA complexes. This hypothesis awaits careful biochemical analysis.

In summary, the data reported here suggest that successful interactions between het-siRNAs and their targets require extensive complementarity and result in target cleavage. Importantly, we provide a powerful approach to examine the details of het-siRNA:target interactions, including the genomic coordinates of het-siRNA targets and the complementarity requirements of het-siRNA:target interactions. Future work should focus on improving this RNA-immunoprecipitation protocol to identify potential interactions more efficiently. A thorough understanding of the interactions between het-siRNAs and their targets will further clarify the effects of RdDM at a much higher resolution.


Chapter 5 Summary and Prospects

5.1 Summary

5.1.1 Non-templated nucleotides in het-siRNAs

In Chapter 2, I systematically studied the non-templated nucleotides in het-siRNAs by investigating mismatches between het-siRNAs and reference genomes. An important observation is that non-templated nucleotides are often observed in 'off-sized' het-siRNAs, but are much more rarely seen in 24 nt het-siRNAs. In Arabidopsis, 23 nt siRNAs accumulate to significant levels from het-siRNA loci. Many of them have non-templated nucleotides at the 3'-end position. The 23 nt siRNAs with 3'-most non-templated nucleotides display unique characteristics: (1) the 3'-most non-templated nucleotide tends to be an adenosine; (2) the 22nd nucleotide, which is perfectly mapped to the genome, exhibits a strong preference for a pyrimidine; (3) the template DNA position of the non-templated nucleotide tends to be a cytosine. Coincidentally, het-siRNA precursors also have a high frequency of 3'-most non-templated nucleotides, and a pyrimidine preference at the 3' last-mapped position (Zhai et al., 2015a; Blevins et al., 2015). Taken together, we proposed that 23 nt siRNAs are derived from het-siRNA precursors and are by-products of het-siRNA biogenesis.
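The three features summarized above can be read off a single aligned read once the genomic sequence under it is known. The following is a minimal sketch, not the pipeline used in Chapter 2: the function and type names are hypothetical, and it assumes the read and the genomic segment are given on the same strand, have equal length, and that only a single 3'-most mismatch is of interest.

```python
from typing import NamedTuple, Optional


class ThreePrimeTail(NamedTuple):
    nontemplated: str    # 3'-most nucleotide of the read (mismatched to the genome)
    last_templated: str  # nucleotide just upstream (position 22 in a 23 nt read)
    template_base: str   # genomic base at the 3'-most position


def three_prime_nontemplated(read: str, genome_segment: str) -> Optional[ThreePrimeTail]:
    """Return tail features if the read matches the genome except at its 3'-most base.

    Hypothetical helper: both sequences must be on the same strand and of equal length.
    """
    read = read.upper().replace("U", "T")   # compare in the DNA alphabet
    ref = genome_segment.upper()
    if len(read) != len(ref) or len(read) < 2:
        return None
    if read[:-1] != ref[:-1]:               # internal mismatch: not a clean 3' tail
        return None
    if read[-1] == ref[-1]:                 # perfectly templated read
        return None
    return ThreePrimeTail(read[-1], read[-2], ref[-1])
```

Tallying `nontemplated`, `last_templated`, and `template_base` over all 23 nt reads from a locus would give the three frequency profiles described in the text (adenosine, pyrimidine, and cytosine preferences, respectively).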

5.1.2 Role of AGO4 on het-siRNA accumulation

In Chapter 3, I addressed a key mystery of RdDM: why are only some het-siRNAs dependent on AGO4 for accumulation? I characterized AGO4-dependent het-siRNAs in Arabidopsis thaliana at the transcriptome-wide level, and found that they are only a small subset of the total het-siRNA population. The accumulation of most het-siRNAs does not decrease even in an ago4/ago6/ago9 triple mutant. I showed that AGO4-dependent het-siRNAs significantly overlap with the subset of het-siRNAs that are dependent on Pol V or downstream factors including DRM2 and SHH1. In contrast, Pol IV is required for almost all het-siRNAs. This indicates that the AGO4-dependent het-siRNAs are secondary het-siRNAs whose biogenesis requires positioning of RdDM machinery at previously methylated loci.
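One common way to put a number on an overlap between two subsets of loci is a one-sided hypergeometric test. The sketch below is illustrative only: the statistical test actually used in Chapter 3 is not restated in this summary, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(total: int, n_a: int, n_b: int, overlap: int) -> float:
    """P(observing at least `overlap` shared loci by chance).

    total   : all het-siRNA loci considered (hypothetical background set)
    n_a     : loci in the first subset (e.g. AGO4-dependent)
    n_b     : loci in the second subset (e.g. Pol V-dependent)
    overlap : loci present in both subsets
    """
    upper = min(n_a, n_b)
    denom = comb(total, n_b)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(comb(n_a, k) * comb(total - n_a, n_b - k) for k in range(overlap, upper + 1))
    return tail / denom
```

A small p-value indicates that the two subsets share more loci than expected if they had been drawn independently from the same background.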


In addition, my analysis showed that the catalytic capacity of AGO4 is required for the full accumulation of nearly all AGO4-dependent siRNAs. Many het-siRNAs were not rescued at all by slicing-defective AGO4, and even those that showed some degree of complementation almost never recovered to the level achieved by complementation with wild-type AGO4.

5.1.3 Het-siRNA and target interactions

By coupling AGO4-immunoprecipitation with high-throughput sequencing, I developed a novel method to identify in vivo het-siRNA:target interactions in Arabidopsis thaliana. For the first time, I characterized a few het-siRNA:target interactions, and found that het-siRNAs can act both in cis and in trans. Successful interactions require perfect or near-perfect complementarity and result in target cleavage.
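The complementarity requirement can be made concrete by counting unpaired positions between a het-siRNA and a candidate target site. This is a minimal sketch with hypothetical function names, not the scoring scheme of Chapter 4. It assumes an ungapped, full-length duplex and, as a simplification, counts G:U wobble pairs as mismatches.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def count_mismatches(sirna: str, target_site: str) -> int:
    """Number of non-Watson-Crick positions in an ungapped siRNA:target duplex.

    Both sequences are given 5' to 3' and must have the same length.
    G:U wobble pairs are counted as mismatches (simplifying assumption).
    """
    s = sirna.upper().replace("T", "U")
    t = target_site.upper().replace("T", "U")
    if len(s) != len(t):
        raise ValueError("siRNA and target site must have equal length")
    ideal_sirna = reverse_complement(t)  # the siRNA that would pair perfectly
    return sum(a != b for a, b in zip(s, ideal_sirna))
```

Under this sketch, "perfect or near-perfect complementarity" corresponds to a count of zero or a small number of mismatches; the exact threshold is a choice that the data in Chapter 4, not this example, would have to justify.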

5.2 Prospects

5.2.1 Determinants for secondary het-siRNA biogenesis

Given the fact that AGO4-dependent het-siRNAs are likely the secondary het-siRNAs, a critical question is: Why are these secondary het-siRNAs only produced from a small subset of het-siRNA loci? This may be explained by the distinct genomic occupancy of Pol IV and Pol V. Pol V transcription is not dependent on het-siRNA biogenesis or Pol IV function, suggesting independent transcriptional activity of Pol IV and Pol V (Wierzbicki et al., 2008; Mosher et al., 2008). Pol IV and Pol V are not likely to be recruited by specific DNA sequences, as no consensus motif was found in Pol IV and Pol V ChIP-seq experiments (Zhong et al., 2012; Wierzbicki et al., 2012; Johnson et al., 2014). Indeed, increasing evidence suggests that Pol IV and Pol V are recruited by chromatin marks. H3K9 dimethylation (H3K9me2) is associated with multiple RdDM target loci (Numa et al., 2010; Li et al., 2012). The H3K9me2 binding capacity of SHH1/DTF1 is required for CHH methylation and het-siRNA accumulation in Arabidopsis (Zhang et al., 2013; Law et al., 2013). This suggests the central role of H3K9me2 in Pol IV recruitment. Meanwhile, DNA methylation is required for Pol V recruitment. SUVH2 and SUVH9 act downstream of RdDM (Johnson et al., 2014; Liu et al., 2014c). They both contain a SET- and RING-associated (SRA) domain for binding methylated DNA (Johnson et al., 2014; Liu et al., 2014c). Tethering SUVH2 to an unmethylated locus is sufficient for Pol V recruitment and RdDM establishment (Johnson et al., 2014). These observations suggest that Pol V is recruited to methylated DNA through SUVH2 and SUVH9 activity.

Pol V and Pol IV targeting possibly represents different stages of silencing. It has been reported that Pol V is enriched at gene promoters and evolutionarily young transposons (Zhong et al., 2012). Silencing initiation of these targets requires non-canonical RDR6-RdDM machinery (Nuthikattu et al., 2013; McCue et al., 2015; Panda et al., 2016). The transition from RDR6-RdDM to Pol IV-RdDM occurs several generations later to further enhance target silencing (Nuthikattu et al., 2013; Bond and Baulcombe, 2015). Epigenetic silencing of these targets is subsequently maintained by MET1, CMT3 and HDA6 (Law and Jacobsen, 2010; Blevins et al., 2014). In this sequential epigenetic silencing process, Pol V targeting preferentially acts as genome surveillance for newly inserted transposons. Pol V targeting may no longer be needed for some stably silenced loci, but Pol IV transcription is still retained. MET1 or an uncharacterized factor could be involved in CHH methylation in these stably silenced loci. This hypothesis is supported by the fact that Pol IV and Pol V are required for CHH methylation only in specific loci and about 50% of methylated CHH do not require Pol IV and Pol V (Wierzbicki et al., 2012).

Are the distinct genomic occupancies of Pol IV and Pol V determined by differential histone modification or DNA methylation? This could be tested by genome-wide analysis of the relationship between Pol V/AGO4-dependent het-siRNA loci and H3K9me2 ChIP-seq or methylome sequencing data sets. Future analysis should also address whether evolutionarily young and ancient transposons display different repressive marks on chromatin.
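A first pass at the proposed genome-wide comparison could simply ask what fraction of a set of het-siRNA loci overlaps peaks from a ChIP-seq experiment. The sketch below is one possible starting point under stated assumptions: intervals are hypothetical `(chromosome, start, end)` tuples with half-open coordinates, the function name is invented for illustration, and the quadratic scan is only suitable for small inputs.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def fraction_overlapping(loci: List[Interval], peaks: Iterable[Interval]) -> float:
    """Fraction of loci that overlap at least one peak on the same chromosome."""
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    if not loci:
        return 0.0
    hits = 0
    for chrom, start, end in loci:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < p_end and p_start < end for p_start, p_end in peaks_by_chrom[chrom]):
            hits += 1
    return hits / len(loci)
```

Comparing this fraction between AGO4-dependent and AGO4-independent loci (for H3K9me2 peaks, or for differentially methylated regions) would indicate whether either mark distinguishes the two groups; a real analysis would also need an appropriate background model.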


5.2.2 Role of AGO4 mediated target cleavage

The role of target cleavage in RdDM remains enigmatic. In the current RdDM model, stable tethering of AGO4-siRNAs with Pol V-synthesized scaffold RNA is required to recruit DRM2 for DNA methylation (Zhong et al., 2014). This raises a question: Is target cleavage necessary for de novo DNA methylation? For S. pombe, the catalytic capacity of AGO is dispensable for siRNA amplification and heterochromatin formation (Jain et al., 2016). In Arabidopsis, a defect in the AGO4 catalytic site inhibits AGO4 from entering the nucleus (Ye et al., 2012). Future experiments should test the DNA methylation levels of RdDM targets in an ago4 mutant complemented with a nuclear-localized, slicing-defective AGO4.

The het-siRNA:target interactions identified de novo in my AGO4-CLIP-seq experiment suggest that AGO4 mediates target cleavage in vivo. Target cleavage may regulate the time of AGO4 stalling on chromatin, which further fine-tunes DRM2 activity. Careful analysis of the relationship between AGO4-slicing kinetics and target methylation levels is needed to address this hypothesis. Alternatively, AGO4-mediated target cleavage could be a step after DRM2 activity and contribute to AGO4 recycling and Pol V release. Reconstituting Pol V transcription and AGO4 targeting in vitro will help to clarify the stepwise occurrence of this process.

5.2.3 Improving AGO4-CLIP efficiency

In Chapter 4, I described a novel method to identify in vivo het-siRNA:target interactions. Even though a few de novo interactions were identified, the signal-to-noise ratio was very low. Considering the abundance of het-siRNAs in Arabidopsis thaliana, many more het-siRNA:target interactions are expected. A few modifications could be included to improve AGO4-CLIP efficiency. First of all, the quantity and quality of isolated nuclei should be measured. Ye et al. (2012) reported that only a small fraction of het-siRNAs and AGO4 proteins are present in Arabidopsis nuclei. Therefore, a large amount of high-quality nuclei would be needed to obtain enough AGO4-immunoprecipitates. This can be achieved by affinity isolation of labeled nuclei as previously described (Deal and Henikoff, 2010). Secondly, AGO4-immunoprecipitates should be washed more stringently. It will be necessary to measure the sizes of AGO4-bound RNAs by Bioanalyzer (Agilent) or gel electrophoresis following AGO4-immunoprecipitation. Ideally, two discrete groups are expected: het-siRNAs of 24 nt and longer target fragments. Furthermore, the RNase A/T1 concentration needs to be titrated. The RNase A/T1-digested het-siRNA targets should be long enough to allow confident genome alignment. Overall, large-scale identification of het-siRNA:target interactions will provide insights into the high-resolution map of Pol V transcription sites as well as complementarity requirements for het-siRNA targeting.

5.2.4 RdDM in crops

Future research is likely to reveal variation of RdDM in different species. The current model of RdDM is mostly based on studies in the model plant Arabidopsis thaliana. However, RdDM-defective Arabidopsis plants usually do not exhibit obvious morphological defects even with significant loss of het-siRNAs. This might be attributed to the facts that transposable elements (TEs) only account for about one fifth of the Arabidopsis thaliana genome (Buisine et al., 2008; Hu et al., 2011; Ahmed et al., 2011; Maumus and Quesneville, 2014), and that only a few genes are located close to annotated TEs (Zhang, 2008). In Arabidopsis, RdDM primarily targets short TEs near genes, and thus only contributes to de novo DNA methylation in a minor fraction of the genome (Zhong et al., 2012; Wierzbicki et al., 2012; Zemach et al., 2013). Indeed, the majority of CHH methylation in Arabidopsis heterochromatin is facilitated by DDM1 and mediated by CMT2 (Zemach et al., 2013).

In contrast, mutants of RdDM components in maize exhibit obvious developmental defects. Maize has a relatively large genome (~2.3 Gb), and about 85% of its genome is composed of TEs (Schnable et al., 2009). The majority of genes are located within 1 kb of TEs in maize (Schnable et al., 2009). It is thus reasonable to expect RdDM to play much broader roles in regulating both transposons and nearby genes. Consistent with this notion, it has been shown that Pol IV acts immediately upstream and downstream of Pol II-transcribed genes in maize, suggesting a previously unknown interaction between RdDM and gene expression (Erhard et al., 2015). In addition, RdDM-dependent CHH methylation is enriched at the edges of TEs near genes (Gent et al., 2013; Regulski et al., 2013; Gent et al., 2014; Li et al., 2015a; Lunardon et al., 2016), and the methylation levels of these CHH-enriched regions (mCHH islands) positively correlate with the expression levels of the proximal genes (Gent et al., 2013). Analysis of the RdDM mutants in maize suggests that the mCHH islands act as genomic boundaries to enhance TE silencing without attenuating nearby gene expression (Li et al., 2015a).

In both Arabidopsis and maize, RdDM tends to target loci in euchromatin instead of heterochromatin (Zhong et al., 2012; Wierzbicki et al., 2012; Zheng et al., 2013; Zemach et al., 2013; Gent et al., 2014; Lunardon et al., 2016). Maize provides an attractive model system to study how Pol II, Pol IV and Pol V selectively and/or cooperatively act on loci with different genomic features, as protein-coding genes and TEs are usually located close together in the maize genome. More intriguingly, as an outcrossing plant, maize is an ideal model plant to study the role of RdDM in imprinting, genomic interaction and hybrid vigor. Further research into these aspects will provide valuable knowledge for crop breeding.

5.2.5 RdDM: Beyond TE silencing

Current research mainly focuses on the role of RdDM in TE silencing. Increasing evidence indicates that RdDM has a broader range of biological functions. Here I briefly discuss the emerging roles of RdDM in plant stress responses and intercellular signal communication, and propose a few future research directions.

Stress responses

Plants are subject to a remarkable variety of environmental stresses. It has been shown that RdDM components are actively involved in both biotic and abiotic stress responses. In Arabidopsis, ago4 mutants show enhanced disease susceptibility to the bacterium Pseudomonas syringae (Agorio and Vera, 2007). Similarly, increased disease susceptibility to the necrotrophic fungal pathogens Botrytis cinerea and Plectosphaerella cucumerina was observed in the nrpd2 mutant, which is defective in the second-largest subunit shared by Pol IV and Pol V (López et al., 2011). It is currently unclear whether the whole RdDM pathway is required for plant immunity. It has been reported that while some RdDM components are required for plant pathogen responses, other components are dispensable (Agorio and Vera, 2007; López et al., 2011; Dowen et al., 2012). Growing evidence suggests that plant stress responses are regulated at the epigenetic level, including dynamic DNA methylation and histone modification (López et al., 2011; Dowen et al., 2012). The Arabidopsis methyltransferase mutant met1 displays enhanced defense responses against Pseudomonas syringae (Dowen et al., 2012; Yu et al., 2013). Global methylome analysis revealed that many genomic loci, especially the edges of protein-coding genes and TEs near genes, undergo dynamic changes of DNA methylation after pathogen infection (Dowen et al., 2012). Yu et al. (2013) hypothesized that systematic demethylation activates some immune-response genes that are closely linked to TEs or repeats. Interestingly, epigenetic regulation pathways can be a frontline of pathogen-plant warfare. Geminiviruses are single-stranded DNA viruses of plants. Their genomes are packaged with host histones to form viral minichromosomes. Arabidopsis suppresses geminivirus infection by inducing DNA methylation and H3K9me2 in the geminivirus minichromosome (Rodríguez-Negrete et al., 2009; Castillo-González et al., 2015). Meanwhile, geminiviruses encode a suppressor of the plant H3K9me2 histone methyltransferase SUVH4/KYP to escape host surveillance (Castillo-González et al., 2015).

The RdDM pathway is also involved in abiotic stress responses. In Arabidopsis, mutants of RdDM components display reduced heat tolerance (Popova et al., 2013). Arabidopsis plants grown in low-humidity conditions show a reduced stomatal index, which has been speculated to be caused by RdDM-dependent repression of genes involved in stomatal development (Tricker et al., 2012). In maize, a miniature transposon in a NAC gene is subject to RdDM, resulting in repressed NAC accumulation and reduced drought tolerance (Mao et al., 2015). In rice, phosphate starvation induces transient hypermethylation of TEs close to highly induced genes to limit their potential deleterious effects (Secco et al., 2015).

Current research suggests that RdDM is involved in stress responses in plants. However, whether the observed DNA methylation and siRNA biogenesis are positive regulators of plant adaptation or just by-products of chromosomal reprogramming following stress stimuli is unclear. It is also unknown how RdDM might mediate physiological changes of plants after stress stimuli. Another interesting question is whether RdDM-dependent epigenetic traits can serve as long-term memory of stresses. Further studies of these questions will provide insights into the development of stress-tolerant plants at the epigenome level.

Intercellular siRNA movement

Future research should also address the mechanisms of intercellular siRNA movement. Het-siRNAs can move from cell to cell, and travel long distances from organ to organ (Melnyk et al., 2011a). In Arabidopsis, the TE-derived 21 nt siRNAs in pollen vegetative cells can move to sperm cells to enhance TE silencing in sperm (Slotkin et al., 2009; Martínez et al., 2016). This conclusion, however, is still controversial. Sun et al. (2013) showed that artificial miRNAs generated in the vegetative cells of petunia pollen could not travel into sperm cells to silence their target genes. Indirect evidence indicates that siRNAs also communicate between endosperm and embryo. Demethylation of the endosperm genome coincides with hypermethylation of embryo TEs, suggesting intercellular targeting of embryo TEs by siRNAs derived from endosperm (Hsieh et al., 2009). Het-siRNAs can travel systemically through the vascular system. In Arabidopsis, het-siRNAs are capable of moving from leaves to roots through the phloem, and mediate de novo DNA methylation in recipient tissues (Molnar et al., 2010; Melnyk et al., 2011b; Lewsey et al., 2016).

The movement of siRNAs can be important for plants. The siRNA-mediated intercellular communication between pollen vegetative cells and sperm cells is critical for TE repression in sperm and potentially the next generation (Slotkin et al., 2009; Martínez et al., 2016). The long-distance systemic movement of het-siRNAs could be critical for signal transduction between organs after environmental stress stimuli (Melnyk et al., 2011a). It is also tempting to speculate that the plant can even export het-siRNAs to suppress pathogen infection. Consistent with this hypothesis, it has been observed that cotton plants export miRNAs to inhibit virulence gene expression in a pathogen (Zhang et al., 2016a). Currently, the mechanism of siRNA movement is largely unknown, and no small RNA transporter has been identified so far. It is also unclear in what form (mature siRNAs or precursors) the siRNAs move long distances. As more research is conducted in this area, we may better understand the biological functions related to siRNA movement.


Appendix

List of Supplemental Tables

Supplemental Table 1 Data sources and accession number for simulated sRNA-seq libraries
Supplemental Table 2 Newly identified MIRNA loci
Supplemental Table 3 Primers used in this study


Supplemental Table 1 Data sources and accession number for simulated sRNA-seq libraries

Accession number | Species | Ecotype or Cultivar | Genotype | Tissue | Citation
GSM1330561 | Arabidopsis thaliana | Col-0 | wild-type | Young Leaf | (Thatcher et al., 2015)
GSM1330562 | Arabidopsis thaliana | Col-0 | wild-type | Mature Leaf | (Thatcher et al., 2015)
GSM1330563 | Arabidopsis thaliana | Col-0 | wild-type | Early senescence leaf | (Thatcher et al., 2015)
GSM1330564 | Arabidopsis thaliana | Col-0 | wild-type | Late senescence leaf | (Thatcher et al., 2015)
GSM1330566 | Arabidopsis thaliana | Col-0 | wild-type | mature silique | (Thatcher et al., 2015)
GSM1330567 | Arabidopsis thaliana | Col-0 | wild-type | early senescence silique | (Thatcher et al., 2015)
GSM1330568 | Arabidopsis thaliana | Col-0 | wild-type | late senescence silique | (Thatcher et al., 2015)
GSM1495677 | Arabidopsis thaliana | Col-0 | wild-type | Inflorescence | (Slotkin et al., 2009)
GSM1495679 | Arabidopsis thaliana | Col-0 | wild-type | Pollen | (Slotkin et al., 2009)
GSM1495680 | Arabidopsis thaliana | Col-0 | wild-type | Sperm | (Slotkin et al., 2009)
GSM1499347 | Arabidopsis thaliana | Col-0 | wild-type | floral tissue | (Zhong et al., 2015)
GSM1533527 | Arabidopsis thaliana | Col-0 | wild-type | inflorescence | (Groth et al., 2014)
GSM1533528 | Arabidopsis thaliana | Col-0 | wild-type | inflorescence | (Groth et al., 2014)
GSM1533529 | Arabidopsis thaliana | Col-0 | wild-type | inflorescence | (Groth et al., 2014)
GSM1227188 | Arabidopsis thaliana | Col-0 | wild-type | ten-day-old seedlings | (Dinh et al., 2014)
GSM1059887 | Oryza sativa | Nipponbare | wild-type | leaf | (Raman et al., 2013)
GSM1081563 | Oryza sativa | Nipponbare | wild-type | 3wk-old leaf tissue | (Stroud et al., 2013a)
GSM1081564 | Oryza sativa | Nipponbare | wild-type | 3wk-old leaf tissue | (Stroud et al., 2013a)
GSM1081565 | Oryza sativa | Nipponbare | wild-type | 3wk-old leaf tissue | (Stroud et al., 2013a)
GSM1229047 | Oryza sativa | Nipponbare | wild-type | lamina joints of the rice flag leaf | (Wei et al., 2014)
GSM278571 | Oryza sativa | Nipponbare | wild-type | 1-5 days after fertilization grains | (Zhu et al., 2008)
GSM278572 | Oryza sativa | Nipponbare | wild-type | 1-5 days after fertilization grains | (Zhu et al., 2008)
GSM489087 | Oryza sativa | Nipponbare | wild-type | shoots from four-leaf stage | (He et al., 2010)
GSM562946 | Oryza sativa | Nipponbare | wild-type | young seedlings | (Song et al., 2012)
GSM562947 | Oryza sativa | Nipponbare | wild-type | young panicles (4cm) | (Song et al., 2012)
GSM647192 | Oryza sativa | Nipponbare | wild-type | inflorescence | (Barrera-Figueroa et al., 2012)
GSM943193 | Oryza sativa | Nipponbare | wild-type | leaf | (Chodavarapu et al., 2012)
GSM1057326 | Zea mays | B73 | wild-type | seedling shoot | (Chodavarapu et al., 2012)
GSM1057327 | Zea mays | B73 | wild-type | seedling root | (Chodavarapu et al., 2012)
GSM1091764 | Zea mays | B73 | wild-type | mature pollens | (Li et al., 2013b)
GSM1091765 | Zea mays | B73 | wild-type | in-vitro germinated pollens | (Li et al., 2013b)
GSM1091766 | Zea mays | B73 | wild-type | mature silks | (Li et al., 2013b)
GSM1091767 | Zea mays | B73 | wild-type | pollinated silks | (Li et al., 2013b)
GSM1160278 | Zea mays | B73 | wild-type | ears (female inflorescences) | (Liu et al., 2014a)
GSM1223589 | Zea mays | B73 | wild-type | leaf apex | (Dotto et al., 2014)
GSM1223591 | Zea mays | B73 | wild-type | leaf apex | (Dotto et al., 2014)
GSM1223593 | Zea mays | B73 | wild-type | leaf apex | (Dotto et al., 2014)
GSM433620 | Zea mays | B73 | wild-type | Leaves | (Nobuta et al., 2008)
GSM433621 | Zea mays | B73 | wild-type | Female inflorescence (ears) | (Nobuta et al., 2008)
GSM433622 | Zea mays | B73 | wild-type | Male inflorescence (tassels) | (Nobuta et al., 2008)
GSM448853 | Zea mays | B73 | wild-type | ear | (Zhang et al., 2009)
GSM448854 | Zea mays | B73 | wild-type | pollen | (Zhang et al., 2009)
GSM448855 | Zea mays | B73 | wild-type | root | (Zhang et al., 2009)
GSM448856 | Zea mays | B73 | wild-type | seedling | (Zhang et al., 2009)
GSM448857 | Zea mays | B73 | wild-type | tassel | (Zhang et al., 2009)
GSM918104 | Zea mays | B73 | wild-type | shoot apex | (Barber et al., 2012)
GSM921512 | Zea mays | B73 | wild-type | leaf | (Kang et al., 2012)
GSM921513 | Zea mays | B73 | wild-type | seed | (Kang et al., 2012)


Supplemental Table 2 Newly identified MIRNA loci

miRNA Gene Name | Species | Family | Mature miRNA Sequence | Length | Stem-loop Coordinates | Strand | Genome Build
mtr-MIRNEW1 | Medicago truncatula | MIR156 | UUGACAGAAGAGAGAGAGCAC | 21 nt | chr3:23939420-23939543 | - | Mt4.0
mtr-MIRNEW2 | Medicago truncatula | MIR166 | UCGGACCAGGCUUCAUUCCCC | 21 nt | chr8:39663069-39663188 | - | Mt4.0
mtr-MIRNEW3 | Medicago truncatula | MIR167 | UGAAGCUGCCAGCAUGAUCUA | 21 nt | chr7:38348466-38348584 | + | Mt4.0
mtr-MIRNEW4 | Medicago truncatula | MIR390 | AAGCUCAGGAGGGAUAGCGCC | 21 nt | chr3:26834609-26834798 | + | Mt4.0
mtr-MIRNEW5 | Medicago truncatula | MIR1509 | UUAAUCAGGGAAAUCACAGUU | 21 nt | chr8:38022302-38022447 | - | Mt4.0
mtr-MIRNEW6 | Medicago truncatula | MIR2111 | UAAUCUGCAUCCUGAGGUUUA | 21 nt | chr7:23584552-23584825 | - | Mt4.0
mtr-MIRNEW7 | Medicago truncatula | new family | UCUUUCAAACAAUCCCAAAGU | 21 nt | chr3:847050-847220 | - | Mt4.0
mtr-MIRNEW8 | Medicago truncatula | new family | UUACAAUUCCGCAAUGAUUAU | 21 nt | chr3:27828023-27828114 | + | Mt4.0
mtr-MIRNEW9 | Medicago truncatula | new family | AUAAAACUGGGUGCAUCUACC | 21 nt | chr7:7121637-7121722 | - | Mt4.0
sly-MIRNEW1 | Solanum lycopersicum | MIR156 | UUGACAGAAGAUAGAGAGCAC | 21 nt | SL2.40ch03:61720031-61720198 | + | SL2.40
sly-MIRNEW2 | Solanum lycopersicum | MIR162 | UCGAUAAACCUCUGCAUCCAG | 21 nt | SL2.40ch03:58491539-58491626 | + | SL2.40
sly-MIRNEW3 | Solanum lycopersicum | MIR162 | UCGAUAAACCUCUGCAUCCAG | 21 nt | SL2.40ch06:39463119-39463216 | + | SL2.40
sly-MIRNEW4 | Solanum lycopersicum | MIR166 | UCGGACCAGGCUUCAUUCCCC | 21 nt | SL2.40ch01:79167048-79167163 | + | SL2.40
sly-MIRNEW5 | Solanum lycopersicum | MIR167 | UGAAGCUGCCAGCAUGAUCUA | 21 nt | SL2.40ch09:59575908-59576005 | + | SL2.40
sly-MIRNEW6 | Solanum lycopersicum | MIR171 | UGAUUGAGCCGUGCCAAUAUC | 21 nt | SL2.40ch03:6667748-6667993 | + | SL2.40
sly-MIRNEW7 | Solanum lycopersicum | new family | AGGAGAUGUAUGCAAGAGCAA | 21 nt | SL2.40ch02:36976709-36976872 | - | SL2.40
zma-MIRNEW1 | Zea mays | new family | UUUUCUGGCUGCCAAACUAGC | 21 nt | chr3:5671575-5671659 | + | AGPv3
zma-MIRNEW2 | Zea mays | new family | AUACAUUUUCGGUCCUUAAAC | 21 nt | chr3:191058508-191058596 | + | AGPv3
zma-MIRNEW3 | Zea mays | new family | ACCGGAGGAGGUUAGAGGAGC | 21 nt | chr4:195967281-195967517 | - | AGPv3
zma-MIRNEW4 | Zea mays | new family | AUCACCUUCGGUUUUGUGGCU | 21 nt | chr1:73390302-73390423 | - | AGPv3
zma-MIRNEW5 | Zea mays | new family | AAGAAUGGAGGCGGGAGGACC | 21 nt | chr1:299826859-299826964 | - | AGPv3


Supplemental Table 3 Primers used in this study

Primer name | 5' to 3' sequence | Annotation
26.101 | CGGAAGTAGTCAGCAACTGATACTTC | AGO4 wild-type allele genotyping forward primer
26.102 | CCAATGGGAATGAAAGTCCAA | AGO4 wild-type allele genotyping reverse primer
20.80 | TCTTAGAACGACAATGGTGG | AGO6 wild-type allele genotyping forward primer
20.81 | ACTCTAAGTGCATCCTGAGC | AGO6 wild-type allele genotyping reverse primer
20.81 | ACTCTAAGTGCATCCTGAGC | ago6-2 mutant allele genotyping forward primer
20.49 | GCGTGGACCGCTTGCTGCAAC | ago6-2 mutant allele genotyping reverse primer
22.50 | TTTTTCCTTTTTGCTTGTGGAT | AGO9 wild-type allele genotyping forward primer
22.51 | AACTTGCGTTACTTTTGGCATT | AGO9 wild-type allele genotyping reverse primer
20.50 | TTTTTCCTTTTTGCTTGTGGAT | ago9-1 mutant allele genotyping forward primer
20.48 | TGGTTCACGTAGTGGGCCATC | ago9-1 mutant allele genotyping reverse primer
pAGO4F | ACCGGGCCCGTTACAAAGCATCCGATCC | AGO4 promoter forward primer
pAGO4R | CATACTAGTCTCCTGCTCAAAGAAACCAAAC | AGO4 promoter reverse primer
AGO4F | GAGACTAGTATGGACTACAAGGATGACGATGACAAGGATTCAACAAATGGTAACGGA | AGO4 cDNA forward primer
AGO4R | TTATGCGGCCGCTTAACAGAAGAACATGGA | AGO4 cDNA reverse primer
AGO4-ter-F | TTAAGCGGCCGCATGAGCAGCCCTACTTGGCTCA | AGO4 terminator forward primer
AGO4-ter-R | AACTCCTAGGCATCAGCTCCAACCAATTTCA | AGO4 terminator reverse primer
D742AF | CATAATTTTCAGGGCTGGTGTGAGTGAATC | D742A Mutagenesis forward primer
D742AR | GATTCACTCACACCAGCCCTGAAAATTATG | D742A Mutagenesis reverse primer


References

Addo-Quaye, C., Eshoo, T.W., Bartel, D.P., and Axtell, M.J. (2008). Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Current Biology 18: 758–762.

Adenot, X., Elmayan, T., Lauressergues, D., Boutet, S., Bouché, N., Gasciolli, V., and Vaucheret, H. (2006). DRB4-Dependent TAS3 trans-Acting siRNAs Control Leaf Morphology through AGO7. Current Biology 16: 927–932.

Agorio, A. and Vera, P. (2007). ARGONAUTE4 Is Required for Resistance to Pseudomonas syringae in Arabidopsis. Plant Cell 19: 3778–3790.

Ahmed, I., Sarazin, A., Bowler, C., Colot, V., and Quesneville, H. (2011). Genome-wide evidence for local DNA methylation spreading from small RNA-targeted sequences in Arabidopsis. Nucleic Acids Res. 39: 6919–6931.

Alleman, M., Sidorenko, L., McGinnis, K., Seshadri, V., Dorweiler, J.E., White, J., Sikkink, K., and Chandler, V.L. (2006). An RNA-dependent RNA polymerase is required for paramutation in maize. Nature 442: 295–298.

Allen, E., Xie, Z., Gustafson, A.M., and Carrington, J.C. (2005). microRNA-Directed Phasing during Trans-Acting siRNA Biogenesis in Plants. Cell 121: 207–221.

Ameres, S.L. and Zamore, P.D. (2013). Diversifying microRNA sequence and function. Nat. Rev. Mol. Cell Biol. 14: 475–488.

Arribas-Hernández, L., Kielpinski, L.J., and Brodersen, P. (2016a). mRNA Decay of Most Arabidopsis miRNA Targets Requires Slicer Activity of AGO1. Plant Physiol. 171: 2620–2632.

Arribas-Hernández, L., Marchais, A., Poulsen, C., Haase, B., Hauptmann, J., Benes, V., Meister, G., and Brodersen, P. (2016b). The Slicer Activity of ARGONAUTE1 Is Required Specifically for the Phasing, Not Production, of Trans-Acting Short Interfering RNAs in Arabidopsis. Plant Cell 28: 1563–1580.

Aufsatz, W., Mette, M.F., van der Winden, J., Matzke, M., and Matzke, A.J.M. (2002). HDA6, a putative histone deacetylase needed to enhance DNA methylation induced by double stranded RNA. EMBO J. 21: 6832–6841.

Axtell, M.J. (2013a). Classification and comparison of small RNAs from plants. Annu. Rev. Plant Biol. 64: 137–159.

Axtell, M.J. (2013b). ShortStack: comprehensive annotation and quantification of small RNA genes. RNA 19: 740–751.


Axtell, M.J., Jan, C., Rajagopalan, R., and Bartel, D.P. (2006). A Two-Hit Trigger for siRNA Biogenesis in Plants. Cell 127: 565–577.

Axtell, M.J., Westholm, J.O., and Lai, E.C. (2011). Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. 12: 221.

Azevedo, J., Garcia, D., Pontier, D., Ohnesorge, S., Yu, A., Garcia, S., Braun, L., Bergdoll, M., Hakimi, M.A., Lagrange, T., and Voinnet, O. (2010). Argonaute quenching and global changes in Dicer homeostasis caused by a pathogen-encoded GW repeat protein. Genes Dev. 24: 904–915.

Barber, W.T., Zhang, W., Win, H., Varala, K.K., Dorweiler, J.E., Hudson, M.E., and Moose, S.P. (2012). Repeat associated small RNAs vary among parents and following hybridization in maize. Proc. Natl. Acad. Sci. U.S.A. 109: 10444–10449.

Barrera-Figueroa, B.E., Gao, L., Wu, Z., Zhou, X., Zhu, J., Jin, H., Liu, R., and Zhu, J.-K. (2012). High throughput sequencing reveals novel and abiotic stress-regulated microRNAs in the inflorescences of rice. BMC Plant Biol. 12: 132.

Bartel, D.P. (2009). MicroRNAs: Target Recognition and Regulatory Functions. Cell 136: 215–233.

Baumberger, N. and Baulcombe, D.C. (2005). Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc. Natl. Acad. Sci. U.S.A. 102: 11928–11933.

Berezikov, E. et al. (2011). Deep annotation of Drosophila melanogaster microRNAs yields insights into their processing, modification, and emergence. Genome Res. 21: 203–215.

Billi, A.C., Alessi, A.F., Khivansara, V., Han, T., Freeberg, M., Mitani, S., and Kim, J.K. (2012). The Caenorhabditis elegans HEN1 Ortholog, HENN-1, Methylates and Stabilizes Select Subclasses of Germline Small RNAs. PLoS Genet. 8: 84–99.

Blevins, T., Podicheti, R., Mishra, V., Marasco, M., Tang, H., and Pikaard, C.S. (2015). Identification of Pol IV and RDR2-dependent precursors of 24 nt siRNAs guiding de novo DNA methylation in Arabidopsis. Elife 4: e09591.

Blevins, T., Pontvianne, F., Cocklin, R., Podicheti, R., Chandrasekhara, C., Yerneni, S., Braun, C., Lee, B., Rusch, D., Mockaitis, K., Tang, H., and Pikaard, C.S. (2014). A Two-Step Process for Epigenetic Inheritance in Arabidopsis. Mol. Cell 54: 30–42.

Bohmert, K. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J. 17: 170–180.

Bond, D.M. and Baulcombe, D.C. (2015). Epigenetic transitions leading to heritable, RNA-mediated de novo silencing in Arabidopsis thaliana. Proc. Natl. Acad. Sci. U.S.A. 112: 917–922.

Borges, F. and Martienssen, R.A. (2015). The expanding world of small RNAs in plants. Nat. Rev. Mol. Cell Biol. 16: 727–741.

Böhmdorfer, G., Rowley, M.J., Kuciński, J., Zhu, Y., Amies, I., and Wierzbicki, A.T. (2014). RNA-directed DNA methylation requires stepwise binding of silencing factors to long non-coding RNA. Plant J. 79: 181–191.

Böhmdorfer, G., Sethuraman, S., Rowley, M.J., Krzyszton, M., Rothi, M.H., Bouzit, L., and Wierzbicki, A.T. (2016). Long non-coding RNA produced by RNA polymerase V determines boundaries of heterochromatin. Elife 5: 1325.

Brodersen, P., Sakvarelidze-Achard, L., Bruun-Rasmussen, M., Dunoyer, P., Yamamoto, Y.Y., Sieburth, L., and Voinnet, O. (2008). Widespread Translational Inhibition by Plant miRNAs and siRNAs. Science 320: 1185–1190.

Brosseau, C. and Moffett, P. (2015). Functional and Genetic Analysis Identify a Role for Arabidopsis ARGONAUTE5 in Antiviral RNA Silencing. Plant Cell 27: 1742–1754.

Buisine, N., Quesneville, H., and Colot, V. (2008). Improved detection and annotation of transposable elements in sequenced genomes using multiple reference sequence sets. Genomics 91: 467–475.

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Cao, X. and Jacobsen, S.E. (2002). Role of the Arabidopsis DRM Methyltransferases in De Novo DNA Methylation and Gene Silencing. Current Biology 12: 1138–1144.

Carbonell, A. and Carrington, J.C. (2015). Antiviral roles of plant ARGONAUTES. Curr. Opin. Plant Biol. 27: 111–117.

Carbonell, A., Fahlgren, N., Garcia-Ruiz, H., Gilbert, K.B., Montgomery, T.A., Nguyen, T., Cuperus, J.T., and Carrington, J.C. (2012). Functional analysis of three Arabidopsis ARGONAUTES using slicer-defective mutants. Plant Cell 24: 3613–3629.

Castillo-González, C., Liu, X., Huang, C., Zhao, C., Ma, Z., Hu, T., Sun, F., Zhou, Y., Zhou, X., Wang, X.-J., Zhang, X., and Weigel, D. (2015). Geminivirus-encoded TrAP suppressor inhibits the histone methyltransferase SUVH4/KYP to counter host defense. Elife 4: e06671.

Chen, H.-M., Chen, L.-T., Patel, K., Li, Y.-H., Baulcombe, D.C., and Wu, S.-H. (2010). 22-Nucleotide RNAs trigger secondary siRNA biogenesis in plants. Proc. Natl. Acad. Sci. U.S.A. 107: 15269–15274.

Chen, X. (2004). A MicroRNA as a Translational Repressor of APETALA2 in Arabidopsis Flower Development. Science 303: 2022–2025.

Chen, X. (2009). Small RNAs and Their Roles in Plant Development. Annu. Rev. Cell Dev. Biol. 25: 21–44.

Chen, X., Liu, J., Cheng, Y., and Jia, D. (2002). HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129: 1085–1094.

Chi, S.W., Zang, J.B., Mele, A., and Darnell, R.B. (2009). Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460: 479–486.

Cho, S.H., Addo-Quaye, C., Coruh, C., Arif, M.A., Ma, Z., Frank, W., and Axtell, M.J. (2008). Physcomitrella patens DCL3 is required for 22-24 nt siRNA accumulation, suppression of retrotransposon-derived transcripts, and normal development. PLoS Genet. 4: e1000314.

Chodavarapu, R.K., Feng, S., Ding, B., Simon, S.A., Lopez, D., Jia, Y., Wang, G.-L., Meyers, B.C., Jacobsen, S.E., and Pellegrini, M. (2012). Transcriptome and methylome interactions in rice hybrids. Proc. Natl. Acad. Sci. U.S.A. 109: 12040–12045.

Coruh, C., Cho, S.H., Shahid, S., Liu, Q., Wierzbicki, A., and Axtell, M.J. (2015). Comprehensive Annotation of Physcomitrella patens Small RNA Loci Reveals That the Heterochromatic Short Interfering RNA Pathway Is Largely Conserved in Land Plants. Plant Cell 27: 2148–2162.

Creasey, K.M., Zhai, J., Borges, F., Van Ex, F., Regulski, M., Meyers, B.C., and Martienssen, R.A. (2014). miRNAs trigger widespread epigenetically activated siRNAs from transposons in Arabidopsis. Nature 508: 411–415.

Crooks, G.E., Hon, G., Chandonia, J.-M., and Brenner, S.E. (2004). WebLogo: a sequence logo generator. Genome Res. 14: 1188–1190.

Cuperus, J.T., Carbonell, A., Fahlgren, N., Garcia-Ruiz, H., Burke, R.T., Takeda, A., Sullivan, C.M., Gilbert, S.D., Montgomery, T.A., and Carrington, J.C. (2010). Unique functionality of 22-nt miRNAs in triggering RDR6-dependent siRNA biogenesis from target transcripts in Arabidopsis. Nat. Struct. Mol. Biol. 17: 997–1003.

Daxinger, L., Kanno, T., Bucher, E., van der Winden, J., Naumann, U., Matzke, A.J.M., and Matzke, M. (2009). A stepwise pathway for biogenesis of 24-nt secondary siRNAs and spreading of DNA methylation. EMBO J. 28: 48–57.

Deal, R.B. and Henikoff, S. (2010). A simple method for gene expression and chromatin profiling of individual cell types within a tissue. Dev. Cell 18: 1030–1040.

Dinh, T.T. et al. (2014). DNA topoisomerase 1α promotes transcriptional silencing of transposable elements through DNA methylation and histone lysine 9 dimethylation in Arabidopsis. PLoS Genet. 10: e1004446.

Dotto, M.C., Petsch, K.A., Aukerman, M.J., Beatty, M., Hammell, M., and Timmermans, M.C.P. (2014). Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet. 10: e1004826.

Dowen, R.H., Pelizzola, M., Schmitz, R.J., Lister, R., Dowen, J.M., Nery, J.R., Dixon, J.E., and Ecker, J.R. (2012). Widespread dynamic DNA methylation in response to biotic stress. Proc. Natl. Acad. Sci. U.S.A. 109: E2183–91.

Duan, C.-G. et al. (2015). Specific but interdependent functions for Arabidopsis AGO4 and AGO6 in RNA-directed DNA methylation. EMBO J. 34: 581–592.

Earley, K., Lawrence, R.J., Pontes, O., Reuther, R., Enciso, A.J., Silva, M., Neves, N., Gross, M., Viegas, W., and Pikaard, C.S. (2006). Erasure of histone acetylation by Arabidopsis HDA6 mediates large-scale gene silencing in nucleolar dominance. Genes Dev. 20: 1283–1293.

Ebbs, M.L. and Bender, J. (2006). Locus-specific control of DNA methylation by the Arabidopsis SUVH5 histone methyltransferase. Plant Cell 18: 1166–1176.

Ebbs, M.L., Bartee, L., and Bender, J. (2005). H3 lysine 9 methylation is maintained on a transcribed inverted repeat by combined action of SUVH6 and SUVH4 methyltransferases. Mol. Cell. Biol. 25: 10507–10515.

Ekdahl, Y., Farahani, H.S., Behm, M., Lagergren, J., and Ohman, M. (2012). A-to-I editing of microRNAs in the mammalian brain increases during development. Genome Res. 22: 1477–1487.

El-Shami, M., Pontier, D., and Lahmy, S. (2007). Reiterated WG/GW motifs form functionally and evolutionarily conserved ARGONAUTE-binding platforms in RNAi-related components. Genes Dev. 21: 2539–2544.

Erhard, K.F., Stonaker, J.L., Parkinson, S.E., Lim, J.P., Hale, C.J., and Hollick, J.B. (2009). RNA polymerase IV functions in paramutation in Zea mays. Science 323: 1201–1205.

Erhard, K.F., Talbot, J.-E.R.B., Deans, N.C., McClish, A.E., and Hollick, J.B. (2015). Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199: 1107–1125.

Fang, X. and Qi, Y. (2016). RNAi in Plants: An Argonaute-Centered View. Plant Cell 28: 272–285.

Fei, Q., Li, P., Teng, C., and Meyers, B.C. (2015). Secondary siRNAs from Medicago NB-LRRs modulated via miRNA-target interactions and their abundances. Plant J. 83: 451–465.

Fei, Q., Xia, R., and Meyers, B.C. (2013). Phased, secondary, small interfering RNAs in posttranscriptional regulatory networks. Plant Cell 25: 2400–2415.

Folta, K.M. and Kaufman, L.S. (2006). Isolation of Arabidopsis nuclei and measurement of gene transcription rates using nuclear run-on assays. Nat. Protoc. 1: 3094–3100.

Frank, F., Hauver, J., Sonenberg, N., and Nagar, B. (2012). Arabidopsis Argonaute MID domains use their nucleotide specificity loop to sort small RNAs. EMBO J. 31: 3588–3595.

Frank, F., Sonenberg, N., and Nagar, B. (2010). Structural basis for 5'-nucleotide base-specific recognition of guide RNA by human AGO2. Nature 465: 818–822.

Fultz, D., Choudury, S.G., and Slotkin, R.K. (2015). Silencing of active transposable elements in plants. Curr. Opin. Plant Biol. 27: 67–76.

Gasciolli, V., Mallory, A.C., Bartel, D.P., and Vaucheret, H. (2005). Partially redundant functions of Arabidopsis DICER-like and a role for DCL4 in producing trans-acting siRNAs. Current Biology 15: 1494–1500.

Gent, J.I., Ellis, N.A., Guo, L., Harkess, A.E., Yao, Y., Zhang, X., and Dawe, R.K. (2013). CHH islands: de novo DNA methylation in near-gene chromatin regulation in maize. Genome Res. 23: 628–637.

Gent, J.I., Madzima, T.F., Bader, R., Kent, M.R., Zhang, X., Stam, M., McGinnis, K.M., and Dawe, R.K. (2014). Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. Plant Cell 26: 4903–4917.

German, M.A. et al. (2008). Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol. 26: 941–946.

Grigg, S.P., Canales, C., Hay, A., and Tsiantis, M. (2005). SERRATE coordinates shoot meristem function and leaf axial patterning in Arabidopsis. Nature 437: 1022–1026.

Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27: 91–105.

Grosswendt, S., Filipchyk, A., Manzano, M., Klironomos, F., Schilling, M., Herzog, M., Gottwein, E., and Rajewsky, N. (2014). Unambiguous Identification of miRNA:Target Site Interactions by Different Types of Ligation Reactions. Mol. Cell 54: 1042–1054.

Groth, M., Stroud, H., Feng, S., Greenberg, M.V.C., Vashisht, A.A., Wohlschlegel, J.A., Jacobsen, S.E., and Ausin, I. (2014). SNF2 chromatin remodeler-family proteins FRG1 and -2 are required for RNA-directed DNA methylation. Proc. Natl. Acad. Sci. U.S.A. 111: 17666–17671.

Haag, J.R., Ream, T.S., Marasco, M., Nicora, C.D., Norbeck, A.D., Pasa-Tolic, L., and Pikaard, C.S. (2012). In Vitro Transcription Activities of Pol IV, Pol V, and RDR2 Reveal Coupling of Pol IV and RDR2 for dsRNA Synthesis in Plant RNA Silencing. Mol. Cell 48: 811–818.

Hafner, M. et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141: 129–141.

Hale, C.J., Erhard, K.F., Jr., Lisch, D., and Hollick, J.B. (2009). Production and processing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genet. 5: e1000598.

Han, M.-H., Goud, S., Song, L., and Fedoroff, N. (2004). The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc. Natl. Acad. Sci. U.S.A. 101: 1093–1098.

Harvey, J.J.W., Lewsey, M.G., Patel, K., Westwood, J., Heimstädt, S., Carr, J.P., and Baulcombe, D.C. (2011). An Antiviral Defense Role of AGO2 in Plants. PLoS ONE 6: e14639.

Havecker, E.R., Wallbridge, L.M., Hardcastle, T.J., Bush, M.S., Kelly, K.A., Dunn, R.M., Schwach, F., Doonan, J.H., and Baulcombe, D.C. (2010). The Arabidopsis RNA-Directed DNA Methylation Argonautes Functionally Diverge Based on Their Expression and Interaction with Target Loci. Plant Cell 22: 321–334.

He, G. et al. (2010). Global epigenetic and transcriptional trends among two rice subspecies and their reciprocal hybrids. Plant Cell 22: 17–33.

Henderson, I.R., Zhang, X., Lu, C., Johnson, L., Meyers, B.C., Green, P.J., and Jacobsen, S.E. (2006). Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat. Genet. 38: 721–725.

Herr, A.J., Jensen, M.B., Dalmay, T., and Baulcombe, D.C. (2005). RNA polymerase IV directs silencing of endogenous DNA. Science 308: 118–120.

Horwich, M.D., Li, C., Matranga, C., Vagin, V., Farley, G., Wang, P., and Zamore, P.D. (2007). The Drosophila RNA Methyltransferase, DmHen1, Modifies Germline piRNAs and Single-Stranded siRNAs in RISC. Current Biology 17: 1265–1272.

Howell, M.D., Fahlgren, N., Chapman, E.J., Cumbie, J.S., Sullivan, C.M., Givan, S.A., Kasschau, K.D., and Carrington, J.C. (2007). Genome-wide analysis of the RNA-DEPENDENT RNA POLYMERASE6/DICER-LIKE4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. Plant Cell 19: 926–942.

Hsieh, T.-F., Ibarra, C.A., Silva, P., Zemach, A., Eshed-Williams, L., Fischer, R.L., and Zilberman, D. (2009). Genome-Wide Demethylation of Arabidopsis Endosperm. Science 324: 1451–1454.

Hu, T.T. et al. (2011). The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat. Genet. 43: 476–481.

Huang, Y., Ji, L., Huang, Q., Vassylyev, D.G., Chen, X., and Ma, J.-B. (2009). Structural insights into mechanisms of the small RNA methyltransferase HEN1. Nature 461: 823–827.

Ibrahim, F., Rymarquis, L.A., Kim, E.-J., Becker, J., Balassa, E., Green, P.J., and Cerutti, H. (2010). Uridylation of mature miRNAs and siRNAs by the MUT68 nucleotidyltransferase promotes their degradation in Chlamydomonas. Proc. Natl. Acad. Sci. U.S.A. 107: 3906–3911.

Iizasa, H., Wulff, B.-E., Alla, N.R., Maragkakis, M., Megraw, M., Hatzigeorgiou, A., Iwakiri, D., Takada, K., Wiedmer, A., Showe, L., Lieberman, P., and Nishikura, K. (2010). Editing of Epstein-Barr Virus-encoded BART6 MicroRNAs Controls Their Dicer Targeting and Consequently Affects Viral Latency. J. Biol. Chem. 285: 33358–33370.

Iki, T., Yoshikawa, M., Nishikiori, M., Jaudal, M.C., Matsumoto-Yokoyama, E., Mitsuhara, I., Meshi, T., and Ishikawa, M. (2010). In vitro assembly of plant RNA-induced silencing complexes facilitated by molecular chaperone HSP90. Mol. Cell 39: 282–291.

Jackson, J.P., Johnson, L., Jasencakova, Z., Zhang, X., Perez-Burgos, L., Singh, P.B., Cheng, X., Schubert, I., Jenuwein, T., and Jacobsen, S.E. (2004). Dimethylation of histone H3 lysine 9 is a critical mark for DNA methylation and gene silencing in Arabidopsis thaliana. Chromosoma 112: 308–315.

Jain, R., Iglesias, N., and Moazed, D. (2016). Distinct Functions of Argonaute Slicer in siRNA Maturation and Heterochromatin Formation. Mol. Cell 63: 191–205.

Ji, L. et al. (2011). ARGONAUTE10 and ARGONAUTE1 regulate the termination of floral stem cells through two microRNAs in Arabidopsis. PLoS Genet. 7: e1001358.

Jia, Y., Lisch, D.R., Ohtsu, K., Scanlon, M.J., Nettleton, D., and Schnable, P.S. (2009). Loss of RNA-dependent RNA polymerase 2 (RDR2) function causes widespread and unexpected changes in the expression of transposons, genes, and 24-nt small RNAs. PLoS Genet. 5: e1000737.

Johnson, C., Kasprzewska, A., Tennessen, K., Fernandes, J., Nan, G.-L., Walbot, V., Sundaresan, V., Vance, V., and Bowman, L.H. (2009). Clusters and superclusters of phased small RNAs in the developing inflorescence of rice. Genome Res. 19: 1429–1440.

Johnson, L.M., Du, J., Hale, C.J., Bischof, S., Feng, S., Chodavarapu, R.K., Zhong, X., Marson, G., Pellegrini, M., Segal, D.J., Patel, D.J., and Jacobsen, S.E. (2014). SRA- and SET-domain-containing proteins link RNA polymerase V occupancy to DNA methylation. Nature 507: 124–128.

Johnson, N.R., Yeoh, J.M., Coruh, C., and Axtell, M.J. (2016). Improved Placement of Multi-mapping Small RNAs. G3 (Bethesda) 6: 2103–2111.

Jones-Rhoades, M.W. and Bartel, D.P. (2004). Computational Identification of Plant MicroRNAs and Their Targets, Including a Stress-Induced miRNA. Mol. Cell 14: 787–799.

Kamminga, L.M., Luteijn, M.J., den Broeder, M.J., Redl, S., Kaaij, L.J.T., Roovers, E.F., Ladurner, P., Berezikov, E., and Ketting, R.F. (2012). Hen1 is required for oocyte development and piRNA stability in zebrafish (vol 29, pg 3688, 2010). EMBO J. 31: 248–248.

Kang, M., Zhao, Q., Zhu, D., and Yu, J. (2012). Characterization of microRNAs expression during maize seed development. BMC Genomics 13: 360.

Karginov, F.V., Cheloufi, S., Chong, M.M.W., Stark, A., Smith, A.D., and Hannon, G.J. (2010). Diverse endonucleolytic cleavage sites in the mammalian transcriptome depend upon microRNAs, Drosha, and additional nucleases. Mol. Cell 38: 781–788.

Kasschau, K.D., Fahlgren, N., Chapman, E.J., Sullivan, C.M., Cumbie, J.S., Givan, S.A., and Carrington, J.C. (2007). Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol. 5: 479–493.

Kawahara, Y., Megraw, M., Kreider, E., Iizasa, H., Valente, L., Hatzigeorgiou, A.G., and Nishikura, K. (2008). Frequency and fate of microRNA editing in human brain. Nucleic Acids Res. 36: 5270–5280.

Kawahara, Y., Zinshteyn, B., Sethupathy, P., Iizasa, H., Hatzigeorgiou, A.G., and Nishikura, K. (2007). Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315: 1137–1140.

Kirino, Y. and Mourelatos, Z. (2007). Mouse Piwi-interacting RNAs are 2'-O-methylated at their 3' termini. Nat. Struct. Mol. Biol. 14: 347–348.

Kolde, R. (2015). pheatmap: Pretty Heatmaps. R package version 1.0.2.

Kozomara, A. and Griffiths-Jones, S. (2014). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42: D68–73.

Kruszka, K., Pacak, A., Swida-Barteczka, A., Stefaniak, A.K., Kaja, E., Sierocka, I., Karlowski, W., Jarmolowski, A., and Szweykowska-Kulinska, Z. (2013). Developmentally regulated expression and complex processing of barley pri-microRNAs. BMC Genomics 14: 34.

Kuhn, D.E., Martin, M.M., Feldman, D.S., Terry, A.V., Jr., Nuovo, G.J., and Elton, T.S. (2008). Experimental validation of miRNA targets. Methods 44: 47–54.

Kumar, S., Stecher, G., and Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 33: 1870–1874.

Kume, H., Hino, K., Galipon, J., and Ui-Tei, K. (2014). A-to-I editing in the miRNA seed region regulates target mRNA selection and silencing efficiency. Nucleic Acids Res. 42: 10050–10060.

Kurihara, Y., Takashi, Y., and Watanabe, Y. (2006). The interaction between DCL1 and HYL1 is important for efficient and precise processing of pri-miRNA in plant microRNA biogenesis. RNA 12: 206–212.

Kurth, H.M. and Mochizuki, K. (2009). 2'-O-methylation stabilizes Piwi-associated small RNAs and ensures DNA elimination in Tetrahymena. RNA 15: 675–685.

Lai, E.C. (2004). Predicting and validating microRNA targets. Genome Biol. 5: 115.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10: R25.

Law, J.A. and Jacobsen, S.E. (2010). Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 11: 204–220.

Law, J.A., Du, J., Hale, C.J., Feng, S., Krajewski, K., Palanca, A.M.S., Strahl, B.D., Patel, D.J., and Jacobsen, S.E. (2013). Polymerase IV occupancy at RNA-directed DNA methylation sites requires SHH1. Nature 498: 385–389.

Lee, T.-F., Gurazada, S.G.R., Zhai, J., Li, S., Simon, S.A., Matzke, M.A., Chen, X., and Meyers, B.C. (2012). RNA polymerase V-dependent small RNAs in Arabidopsis originate from small, intergenic loci including most SINE repeats. Epigenetics 7: 781–795.

Leung, A.K.L., Young, A.G., Bhutkar, A., Zheng, G.X., Bosson, A.D., Nielsen, C.B., and Sharp, P.A. (2011). Genome-wide identification of Ago2 binding sites from mouse embryonic stem cells with and without mature microRNAs. Nat. Struct. Mol. Biol. 18: 237–244.

Lewsey, M.G., Hardcastle, T.J., Melnyk, C.W., Molnar, A., Valli, A., Urich, M.A., Nery, J.R., Baulcombe, D.C., and Ecker, J.R. (2016). Mobile small RNAs regulate genome-wide DNA methylation. Proc. Natl. Acad. Sci. U.S.A. 113: E801–E810.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Li, J., Yang, Z., Yu, B., Liu, J., and Chen, X. (2005). Methylation protects miRNAs and siRNAs from a 3'-end uridylation activity in Arabidopsis. Curr. Biol. 15: 1501–1507.

Li, Q. et al. (2015a). RNA-directed DNA methylation enforces boundaries between heterochromatin and euchromatin in the maize genome. Proc. Natl. Acad. Sci. U.S.A. 112: 14728–14733.

Li, S. et al. (2013a). MicroRNAs Inhibit the Translation of Target mRNAs on the Endoplasmic Reticulum in Arabidopsis. Cell 153: 562–574.

Li, S., Vandivier, L.E., Tu, B., Gao, L., Won, S.Y., Li, S., Zheng, B., Gregory, B.D., and Chen, X. (2015b). Detection of Pol IV/RDR2-dependent transcripts at the genomic scale in Arabidopsis reveals features and regulation of siRNA biogenesis. Genome Res. 25: 235–245.

Li, X., Qian, W., Zhao, Y., Wang, C., Shen, J., Zhu, J.-K., and Gong, Z. (2012). Antisilencing role of the RNA-directed DNA methylation pathway and a histone acetyltransferase in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. 109: 11425–11430.

Li, X.M., Sang, Y.L., Zhao, X.Y., and Zhang, X.S. (2013b). High-throughput sequencing of small RNAs from pollen and silk and characterization of miRNAs as candidate factors involved in pollen-silk interactions in maize. PLoS ONE 8: e72852.

Lingel, A., Simon, B., Izaurralde, E., and Sattler, M. (2003). Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426: 465–469.

Liu, H. et al. (2014a). Identification of miRNAs and their target genes in developing maize ears by combined small RNA and degradome sequencing. BMC Genomics 15: 25.

Liu, Q., Wang, F., and Axtell, M.J. (2014b). Analysis of complementarity requirements for plant microRNA targeting using a Nicotiana benthamiana quantitative transient assay. Plant Cell 26: 741–753.

Liu, X., Yu, C.-W., Duan, J., Luo, M., Wang, K., Tian, G., Cui, Y., and Wu, K. (2012). HDA6 Directly Interacts with DNA Methyltransferase MET1 and Maintains Transposable Element Silencing in Arabidopsis. Plant Physiol. 158: 119–129.

Liu, Z.-W., Shao, C.-R., Zhang, C.-J., Zhou, J.-X., Zhang, S.-W., Li, L., Chen, S., Huang, H.-W., Cai, T., and He, X.-J. (2014c). The SET domain proteins SUVH2 and SUVH9 are required for Pol V occupancy at RNA-directed DNA methylation loci. PLoS Genet. 10: e1003948.

Lobbes, D., Rallapalli, G., Schmidt, D.D., Martin, C., and Clarke, J. (2006). SERRATE: a new player on the plant microRNA scene. EMBO Rep. 7: 1052–1058.

Lorenz, R., Bernhart, S.H., Höner Zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P.F., and Hofacker, I.L. (2011). ViennaRNA Package 2.0. Algorithms Mol. Biol. 6: 26.

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15.

López, A., Ramírez, V., García-Andrade, J., Flors, V., and Vera, P. (2011). The RNA Silencing Enzyme RNA Polymerase V Is Required for Plant Immunity. PLoS Genet. 7: e1002434.

Lu, F., Cui, X., Zhang, S., Liu, C., and Cao, X. (2010). JMJ14 is an H3K4 demethylase regulating flowering time in Arabidopsis. Cell Res. 20: 387–390.

Lu, S., Sun, Y.-H., and Chiang, V.L. (2009). Adenylation of plant miRNAs. Nucleic Acids Res. 37: 1878–1885.

Luciano, D.J., Mirsky, H., Vendetti, N.J., and Maas, S. (2004). RNA editing of a miRNA precursor. RNA 10: 1174–1177.

Luff, B., Pawlowski, L., and Bender, J. (1999). An Inverted Repeat Triggers Cytosine Methylation of Identical Sequences in Arabidopsis. Mol. Cell 3: 505–511.

Lunardon, A., Forestan, C., Farinati, S., Axtell, M.J., and Varotto, S. (2016). Genome-Wide Characterization of Maize Small RNA Loci and Their Regulation in the required to maintain repression6-1 (rmr6-1) Mutant and Long-Term Abiotic Stresses. Plant Physiol. 170: 1535–1548.

Lynn, K., Fernandez, A., Aida, M., Sedbrook, J., Tasaka, M., Masson, P., and Barton, M.K. (1999). The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126: 469–481.

Malagnac, F., Bartee, L., and Bender, J. (2002). An Arabidopsis SET domain protein required for maintenance but not establishment of DNA methylation. EMBO J. 21: 6842–6852.

Mallory, A. and Vaucheret, H. (2010). Form, Function, and Regulation of ARGONAUTE Proteins. Plant Cell 22: 3879–3889.

Mallory, A.C., Hinze, A., Tucker, M.R., Bouché, N., Gasciolli, V., Elmayan, T., Lauressergues, D., Jauvion, V., Vaucheret, H., and Laux, T. (2009). Redundant and Specific Roles of the ARGONAUTE Proteins AGO1 and ZLL in Development and Small RNA-Directed Gene Silencing. PLoS Genet. 5: e1000646.

Manavella, P.A., Koenig, D., and Weigel, D. (2012). Plant secondary siRNA production determined by microRNA-duplex structure. Proc. Natl. Acad. Sci. U.S.A. 109: 2461–2466.

Mao, H., Wang, H., Liu, S., Li, Z., Yang, X., Yan, J., Li, J., Tran, L.-S.P., and Qin, F. (2015). A transposable element in a NAC gene is associated with drought tolerance in maize seedlings. Nature Communications 6: 8326.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17: 10–12.

Martínez, G., Panda, K., Köhler, C., and Slotkin, R.K. (2016). Silencing in sperm cells is directed by RNA movement from the surrounding nurse cell. Nat. Plants 2: 16030.

Matranga, C., Tomari, Y., Shin, C., Bartel, D.P., and Zamore, P.D. (2005). Passenger-strand cleavage facilitates assembly of siRNA into Ago2-containing RNAi enzyme complexes. Cell 123: 607–620.

Matzke, M.A. and Mosher, R.A. (2014). RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nat. Rev. Genet. 15: 394–408.

Maumus, F. and Quesneville, H. (2014). Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter. PLoS ONE 9: e94101.

McCue, A.D., Nuthikattu, S., Reeder, S.H., and Slotkin, R.K. (2012). Gene Expression and Stress Response Mediated by the Epigenetic Regulation of a Transposable Element Small RNA. PLoS Genet. 8: e1002474.

McCue, A.D., Panda, K., Nuthikattu, S., Choudury, S.G., Thomas, E.N., and Slotkin, R.K. (2015). ARGONAUTE 6 bridges transposable element mRNA-derived siRNAs to the establishment of DNA methylation. EMBO J. 34: 20–35.

Melnyk, C.W., Molnar, A., and Baulcombe, D.C. (2011a). Intercellular and systemic movement of RNA silencing signals. EMBO J. 30: 3553–3563.

Melnyk, C.W., Molnar, A., Bassett, A., and Baulcombe, D.C. (2011b). Mobile 24 nt Small RNAs Direct Transcriptional Gene Silencing in the Root Meristems of Arabidopsis thaliana. Current Biology 21: 1678–1683.

Mette, M.F., Aufsatz, W., Van der Winden, J., Matzke, M.A., and Matzke, A.J.M. (2000). Transcriptional silencing and promoter methylation triggered by double stranded RNA. EMBO J. 19: 5194–5201.

Mi, S. et al. (2008). Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5' terminal nucleotide. Cell 133: 116–127.

Mica, E. et al. (2009). High throughput approaches reveal splicing of primary microRNA transcripts and tissue specific expression of mature microRNAs in Vitis vinifera. BMC Genomics 10: 558.

Micallef, L. and Rodgers, P. (2014). eulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses. PLoS ONE 9: e101717.

Molnar, A., Melnyk, C.W., Bassett, A., Hardcastle, T.J., Dunn, R., and Baulcombe, D.C. (2010). Small Silencing RNAs in Plants Are Mobile and Direct Epigenetic Modification in Recipient Cells. Science 328: 872–875.

Montgomery, T.A., Howell, M.D., Cuperus, J.T., Li, D., Hansen, J.E., Alexander, A.L., Chapman, E.J., Fahlgren, N., Allen, E., and Carrington, J.C. (2008). Specificity of ARGONAUTE7-miR390 Interaction and Dual Functionality in TAS3 Trans-Acting siRNA Formation. Cell 133: 128–141.

Moran, Y., Fredman, D., Praher, D., Li, X.Z., Wee, L.M., Rentzsch, F., Zamore, P.D., Technau, U., and Seitz, H. (2014). Cnidarian microRNAs frequently regulate targets by cleavage. Genome Res. 24: 651–663.

Mosher, R.A., Schwach, F., Studholme, D., and Baulcombe, D.C. (2008). PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc. Natl. Acad. Sci. U.S.A. 105: 3145–3150.

Moussian, B. (1998). Role of the ZWILLE gene in the regulation of central shoot meristem cell fate during Arabidopsis embryogenesis. EMBO J. 17: 1799–1809.

Nakanishi, K., Weinberg, D.E., Bartel, D.P., and Patel, D.J. (2012). Structure of yeast Argonaute with guide RNA. Nature 486: 368–374.

Naumann, U., Daxinger, L., Kanno, T., Eun, C., Long, Q., Lorkovic, Z.J., Matzke, M., and Matzke, A.J.M. (2011). Genetic Evidence That DNA Methyltransferase DRM2 Has a Direct Catalytic Role in RNA-Directed DNA Methylation in Arabidopsis thaliana. Genetics 187: 977–979.

Nobuta, K., Lu, C., Shrivastava, R., Pillay, M., De Paoli, E., Accerbi, M., Arteaga-Vazquez, M., Sidorenko, L., Jeong, D.H., Yen, Y., and Green, P.J. (2008). Distinct size distribution of endogeneous siRNAs in maize: Evidence from deep sequencing in the mop1-1 mutant. Proc. Natl. Acad. Sci. U.S.A. 105: 14958–14963.

Numa, H., Kim, J.M., Matsui, A., Kurihara, Y., Morosawa, T., Ishida, J., Mochizuki, Y., Kimura, H., Shinozaki, K., Toyoda, T., and Seki, M. (2010). Transduction of RNA-directed DNA methylation signals to repressive histone marks in Arabidopsis thaliana. EMBO J. 29: 352–362.

Nuthikattu, S., McCue, A.D., Panda, K., Fultz, D., DeFraia, C., Thomas, E.N., and Slotkin, R.K. (2013). The Initiation of Epigenetic Silencing of Active Transposable Elements Is Triggered by RDR6 and 21-22 Nucleotide Small Interfering RNAs. Plant Physiol. 162: 116–131.

Olmedo-Monfil, V., Duran-Figueroa, N., Arteaga-Vazquez, M., Demesa-Arevalo, E., Autran, D., Grimanelli, D., Slotkin, R.K., Martienssen, R.A., and Vielle-Calzada, J.-P. (2010). Control of female gamete formation by a small RNA pathway in Arabidopsis. Nature 464: 628–632.

Onodera, Y., Haag, J.R., Ream, T., Nunes, P.C., Pontes, O., and Pikaard, C.S. (2005). Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120: 613–622.

Panda, K., Ji, L., Neumann, D.A., Daron, J., Schmitz, R.J., and Slotkin, R.K. (2016). Full-length autonomous transposable elements are preferentially targeted by expression-dependent forms of RNA-directed DNA methylation. Genome Biol. 17: 170.

Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer Homolog, and HEN1, a Novel Protein, Act in microRNA Metabolism in Arabidopsis thaliana. Current Biology 12: 1484–1495.

Parkinson, S.E., Gross, S.M., and Hollick, J.B. (2007). Maize sex determination and abaxial leaf fates are canalized by a factor that maintains repressed epigenetic states. Dev. Biol. 308: 462–473.

Parry, G., Calderon-Villalobos, L.I., Prigge, M., Peret, B., Dharmasiri, S., Itoh, H., Lechner, E., Gray, W.M., Bennett, M., and Estelle, M. (2009). Complex regulation of the TIR1/AFB family of auxin receptors. Proc. Natl. Acad. Sci. U.S.A. 106: 22540–22545.

Patel, P., Ramachandruni, S.D., Kakrana, A., Nakano, M., and Meyers, B.C. (2016). miTRATA: a web-based tool for microRNA Truncation and Tailing Analysis. Bioinformatics 32: 450–452.

Peragine, A., Yoshikawa, M., Wu, G., Albrecht, H.L., and Poethig, R.S. (2004). SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev. 18: 2368–2379.

Pontier, D., Yahubyan, G., Vega, D., Bulski, A., Saez-Vasquez, J., Hakimi, M.A., Lerbs-Mache, S., Colot, V., and Lagrange, T. (2005). Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes Dev. 19: 2030–2040.

Popova, O.V., Dinh, H.Q., Aufsatz, W., and Jonak, C. (2013). The RdDM Pathway Is Required for Basal Heat Tolerance in Arabidopsis. Molecular Plant 6: 396–410.

Poulsen, C., Vaucheret, H., and Brodersen, P. (2013). Lessons on RNA silencing mechanisms in plants from eukaryotic argonaute structures. Plant Cell 25: 22–37.

Qi, Y., Denli, A.M., and Hannon, G.J. (2005). Biochemical specialization within Arabidopsis RNA silencing pathways. Mol. Cell 19: 421–428.

Qi, Y., He, X., Wang, X.-J., Kohany, O., Jurka, J., and Hannon, G.J. (2006). Distinct catalytic and non-catalytic roles of ARGONAUTE4 in RNA-directed DNA methylation. Nature 443: 1008–1012.

Quinlan, A.R. and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842.

Ramachandran, V. and Chen, X. (2008). Degradation of microRNAs by a family of exoribonucleases in Arabidopsis. Science 321: 1490–1492.

Raman, V., Simon, S.A., Romag, A., Demirci, F., Mathioni, S.M., Zhai, J., Meyers, B.C., and Donofrio, N.M. (2013). Physiological stressors and invasive plant infections alter the small RNA transcriptome of the rice blast fungus, Magnaporthe oryzae. BMC Genomics 14: 326.

Regulski, M. et al. (2013). The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Res. 23: 1651–1662.

Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev. 16: 1616–1626.

Ren, G., Chen, X., and Yu, B. (2012). Uridylation of miRNAs by hen1 suppressor1 in Arabidopsis. Curr. Biol. 22: 695–700.

Ren, G., Xie, M., Zhang, S., Vinovskis, C., Chen, X., and Yu, B. (2014). Methylation protects microRNAs from an AGO1-associated activity that uridylates 5' RNA fragments generated by AGO1 cleavage. Proc. Natl. Acad. Sci. U.S.A. 111: 6365–6370.

Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. (2002). Prediction of Plant MicroRNA Targets. Cell 110: 513–520.

Rivas, F.V., Tolia, N.H., Song, J.-J., Aragon, J.P., Liu, J., Hannon, G.J., and Joshua-Tor, L. (2005). Purified Argonaute2 and an siRNA form recombinant human RISC. Nat. Struct. Mol. Biol. 12: 340–349.

Rodríguez-Negrete, E.A., Carrillo-Tripp, J., and Rivera-Bustamante, R.F. (2009). RNA silencing against geminivirus: complementary action of posttranscriptional gene silencing and transcriptional gene silencing in host recovery. J. Virol. 83: 1332–1340.

Rogers, K. and Chen, X. (2013). Biogenesis, turnover, and mode of action of plant microRNAs. Plant Cell 25: 2383–2399.

Ruiz-Ferrer, V. and Voinnet, O. (2009). Roles of Plant Small RNAs in Biotic Stress Responses. Annu. Rev. Plant Biol. 60: 485–510.

Saito, K., Sakaguchi, Y., Suzuki, T., Suzuki, T., Siomi, H., and Siomi, M.C. (2007). Pimet, the Drosophila homolog of HEN1, mediates 2'-O-methylation of PIWI-interacting RNAs at their 3' ends. Genes Dev. 21: 1603–1608.

Schauer, S.E., Jacobsen, S.E., Meinke, D.W., and Ray, A. (2002). DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci. 7: 487–491.

Schnable, P.S. et al. (2009). The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science 326: 1112–1115.

Searle, I.R., Pontes, O., Melnyk, C.W., Smith, L.M., and Baulcombe, D.C. (2010). JMJ14, a JmjC domain protein, is required for RNA silencing and cell-to-cell movement of an RNA silencing signal in Arabidopsis. Genes Dev. 24: 986–991.

Secco, D., Wang, C., Shou, H., Schultz, M.D., Chiarenza, S., Nussaume, L., Ecker, J.R., Whelan, J., Lister, R., and Weigel, D. (2015). Stress induced gene expression drives transient DNA methylation changes at adjacent repetitive elements. Elife 4: e09343.

Shahid, S. and Axtell, M.J. (2014). Identification and annotation of small RNA genes using ShortStack. Methods 67: 20–27.

Shin, C., Nam, J.-W., Farh, K.K.-H., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol. Cell 38: 789–802.

Shivaprasad, P.V., Chen, H.-M., Patel, K., Bond, D.M., Santos, B.A.C.M., and Baulcombe, D.C. (2012a). A microRNA superfamily regulates nucleotide binding site-leucine-rich repeats and other mRNAs. Plant Cell 24: 859–874.

Shivaprasad, P.V., Dunn, R.M., Santos, B.A., Bassett, A., and Baulcombe, D.C. (2012b). Extraordinary transgressive phenotypes of hybrid tomato are influenced by epigenetics and small silencing RNAs. EMBO J. 31: 257–266.

Si-Ammour, A., Windels, D., and Arn-Bouldoires, E. (2011). miR393 and secondary siRNAs regulate expression of the TIR1/AFB2 auxin receptor clade and auxin-related development of Arabidopsis leaves. Plant Physiol. 157: 683–691.

Sievers, F., Wilm, A., Dineen, D., Gibson, T.J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.D., and Higgins, D.G. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7: 539.

Sijen, T., Vijn, I., Rebocho, A., van Blokland, R., Roelofs, D., Mol, J.N.M., and Kooter, J.M. (2001). Transcriptional and posttranscriptional gene silencing are mechanistically related. Curr. Biol. 11: 436–440.

Slotkin, R.K., Vaughn, M., Borges, F., Tanurdzić, M., Becker, J.D., Feijó, J.A., and Martienssen, R.A. (2009). Epigenetic Reprogramming and Small RNA Silencing of Transposable Elements in Pollen. Cell 136: 461–472.

Song, J.-J., Smith, S.K., Hannon, G.J., and Joshua-Tor, L. (2004). Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305: 1434–1437.

Song, X. et al. (2012). Roles of DCL4 and DCL3b in rice phased small RNA biogenesis. Plant J. 69: 462–474.

Stroud, H., Ding, B., Simon, S.A., Feng, S., Bellizzi, M., Pellegrini, M., Wang, G.-L., Meyers, B.C., and Jacobsen, S.E. (2013a). Plants regenerated from tissue culture contain stable epigenome changes in rice. Elife 2: e00354.

Stroud, H., Greenberg, M.V.C., Feng, S., Bernatavichute, Y.V., and Jacobsen, S.E. (2013b). Comprehensive Analysis of Silencing Mutants Reveals Complex Regulation of the Arabidopsis Methylome. Cell 152: 352–364.

Sun, P. and Kao, T.-H. (2013). Self-incompatibility in Petunia inflata: the relationship between a self-incompatibility locus F-box protein and its non-self S-RNases. Plant Cell 25: 470–485.

Sunkar, R., Chinnusamy, V., Zhu, J., and Zhu, J.-K. (2007). Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci. 12: 301–309.

Szarzynska, B., Sobkowiak, L., Pant, B.D., Balazadeh, S., Scheible, W.-R., Mueller-Roeber, B., Jarmolowski, A., and Szweykowska-Kulinska, Z. (2009). Gene structures and processing of Arabidopsis thaliana HYL1-dependent pri-miRNAs. Nucleic Acids Res. 37: 3083–3093.

Takenaka, M., Zehrmann, A., Verbitskiy, D., Härtel, B., and Brennicke, A. (2013). RNA Editing in Plants and Its Evolution. Annu. Rev. Genet. 47: 335–352.

Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev. 17: 49–63.

Thatcher, S.R., Burd, S., Wright, C., Lers, A., and Green, P.J. (2015). Differential expression of miRNAs and their target genes in senescing leaves and siliques: insights from deep sequencing of small RNAs and cleaved target RNAs. Plant Cell Environ. 38: 188–200.

Thomson, D.W., Bracken, C.P., and Goodall, G.J. (2011). Experimental strategies for microRNA target identification. Nucleic Acids Res. 39: 6845–6853.

To, T.K. et al. (2011). Arabidopsis HDA6 regulates locus-directed heterochromatin silencing in cooperation with MET1. PLoS Genet. 7: e1002055.

Tolia, N.H. and Joshua-Tor, L. (2007). Slicer and the Argonautes. Nature Chemical Biology 3: 36–43.

Tricker, P.J., Gibbings, J.G., Rodriguez Lopez, C.M., Hadley, P., and Wilkinson, M.J. (2012). Low relative humidity triggers RNA-directed de novo DNA methylation and suppression of genes controlling stomatal development. J. Exp. Bot. 63: 3799–3813.

Tu, B. et al. (2015). Distinct and cooperative activities of HESO1 and URT1 nucleotidyl transferases in microRNA turnover in Arabidopsis. PLoS Genet. 11: e1005119.

Tucker, M.R., Okada, T., Hu, Y., Scholefield, A., Taylor, J.M., and Koltunow, A.M.G. (2012). Somatic small RNA pathways promote the mitotic events of megagametogenesis during female reproductive development in Arabidopsis. Development 139: 1399–1404.

Vaucheret, H. (2008). Plant ARGONAUTES. Trends Plant Sci. 13: 350–358.

Vaucheret, H., Vazquez, F., Crété, P., and Bartel, D.P. (2004). The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev. 18: 1187–1197.

Vazquez, F., Gasciolli, V., Crété, P., and Vaucheret, H. (2004a). The Nuclear dsRNA Binding Protein HYL1 Is Required for MicroRNA Accumulation and Plant Development, but Not Posttranscriptional Transgene Silencing. Curr. Biol. 14: 346–351.

Vazquez, F., Vaucheret, H., Rajagopalan, R., Lepers, C., Gasciolli, V., Mallory, A.C., Hilbert, J.-L., Bartel, D.P., and Crété, P. (2004b). Endogenous trans-Acting siRNAs Regulate the Accumulation of Arabidopsis mRNAs. Mol. Cell 16: 69–79.

Vourekas, A., Alexiou, P., Vrettos, N., Maragkakis, M., and Mourelatos, Z. (2016). Sequence-dependent but not sequence-specific piRNA adhesion traps mRNAs to the germ plasm. Nature 531: 390–394.

Wang, F., Johnson, N.R., Coruh, C., and Axtell, M.J. (2016). Genome-wide analysis of single non-templated nucleotides in plant endogenous siRNAs and miRNAs. Nucleic Acids Res. 44: 7395–7405.

Wang, F., Polydore, S., and Axtell, M.J. (2015a). More than meets the eye? Factors that affect target selection by plant miRNAs and heterochromatic siRNAs. Curr. Opin. Plant Biol. 27: 118–124.

Wang, H., Zhang, X., Liu, J., Kiba, T., Woo, J., Ojo, T., Hafner, M., Tuschl, T., Chua, N.-H., and Wang, X.-J. (2011). Deep sequencing of small RNAs specifically associated with Arabidopsis AGO1 and AGO4 uncovers new AGO functions. Plant J. 67: 292–304.

Wang, X., Zhang, S., Dou, Y., Zhang, C., Chen, X., Yu, B., and Ren, G. (2015b). Synergistic and independent actions of multiple terminal nucleotidyl transferases in the 3' tailing of small RNAs in Arabidopsis. PLoS Genet. 11: e1005091.

Wassenegger, M., Heimes, S., Riedel, L., and Sanger, H.L. (1994). RNA-Directed De Novo Methylation of Genomic Sequences in Plants. Cell 76: 567–576.

Wei, L., Gu, L., Song, X., Cui, X., Lu, Z., Zhou, M., Wang, L., Hu, F., Zhai, J., Meyers, B.C., and Cao, X. (2014). Dicer-like 3 produces transposable element-associated 24-nt siRNAs that control agricultural traits in rice. Proc. Natl. Acad. Sci. U.S.A. 111: 3877–3882.

Wei, W., Ba, Z., Gao, M., Wu, Y., Ma, Y., Amiard, S., White, C.I., Rendtlew Danielsen, J.M., Yang, Y.-G., and Qi, Y. (2012). A Role for Small RNAs in DNA Double-Strand Break Repair. Cell 149: 101–112.

Wierzbicki, A.T. (2012). The role of long non-coding RNA in transcriptional gene silencing. Curr. Opin. Plant Biol. 15: 517–522.

Wierzbicki, A.T., Cocklin, R., Mayampurath, A., Lister, R., Rowley, M.J., Gregory, B.D., Ecker, J.R., Tang, H., and Pikaard, C.S. (2012). Spatial and functional relationships among Pol V-associated loci, Pol IV-dependent siRNAs, and cytosine methylation in the Arabidopsis epigenome. Genes Dev. 26: 1825–1836.

Wierzbicki, A.T., Haag, J.R., and Pikaard, C.S. (2008). Noncoding Transcription by RNA Polymerase Pol IVb/Pol V Mediates Transcriptional Silencing of Overlapping and Adjacent Genes. Cell 135: 635–648.

Wierzbicki, A.T., Ream, T.S., Haag, J.R., and Pikaard, C.S. (2009). RNA polymerase V transcription guides ARGONAUTE4 to chromatin. Nat. Genet. 41: 630–634.

Woodhouse, M.R., Freeling, M., and Lisch, D. (2006). The mop1 (mediator of paramutation1) mutant progressively reactivates one of the two genes encoded by the MuDR transposon in maize. Genetics 172: 579–592.

Wu, L., Mao, L., and Qi, Y. (2012). Roles of DICER-LIKE and ARGONAUTE Proteins in TAS-Derived Small Interfering RNA-Triggered DNA Methylation. Plant Physiol. 160: 990–999.

Wu, L., Zhou, H., Zhang, Q., Zhang, J., Ni, F., Liu, C., and Qi, Y. (2010). DNA Methylation Mediated by a MicroRNA Pathway. Mol. Cell 38: 465–475.

Wulff, B.-E. and Nishikura, K. (2012). Modulation of MicroRNA Expression and Function by ADARs. Curr. Top. Microbiol. Immunol. 353: 91–109.

Xia, R., Meyers, B.C., Liu, Z., Beers, E.P., Ye, S., Liu, Z., and Liu, Z. (2013). MicroRNA superfamilies descended from miR390 and their roles in secondary small interfering RNA Biogenesis in Eudicots. Plant Cell 25: 1555–1572.

Xie, Z., Allen, E., Fahlgren, N., Calamar, A., Givan, S.A., and Carrington, J.C. (2005a). Expression of Arabidopsis MIRNA genes. Plant Physiol. 138: 2145–2154.

Xie, Z., Allen, E., Wilken, A., and Carrington, J.C. (2005b). DICER-LIKE 4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc. Natl. Acad. Sci. U.S.A. 102: 12984–12989.

Xie, Z.X., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D., Jacobsen, S.E., and Carrington, J.C. (2004). Genetic and functional diversification of small RNA pathways in plants. PLoS Biol. 2: 642–652.

Yamaguchi, N., Winter, C.M., Wu, M.-F., Kwon, C.S., William, D.A., and Wagner, D. (2014). PROTOCOLS: Chromatin Immunoprecipitation from Arabidopsis Tissues. Arabidopsis Book 12: e0170.

Yan, K.S., Yan, S., Farooq, A., Han, A., Zeng, L., and Zhou, M.-M. (2003). Structure and conserved RNA binding of the PAZ domain. Nature 426: 468–474.

Yang, D.-L., Zhang, G., Tang, K., Li, J., Yang, L., Huang, H., Zhang, H., and Zhu, J.-K. (2016). Dicer-independent RNA-directed DNA methylation in Arabidopsis. Cell Res. 26: 66–82.

Yang, L., Liu, Z., Lu, F., Dong, A., and Huang, H. (2006a). SERRATE is a novel nuclear regulator in primary microRNA processing in Arabidopsis. Plant J. 47: 841–850.

Yang, L., Wu, G., and Poethig, R.S. (2012). Mutations in the GW-repeat protein SUO reveal a developmental function for microRNA-mediated translational repression in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. 109: 315–320.

Yang, W.D., Chendrimada, T.P., Wang, Q.D., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. (2006b). Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat. Struct. Mol. Biol. 13: 13–21.

Yang, Z., Ebright, Y.W., Yu, B., and Chen, X. (2006c). HEN1 recognizes 21-24 nt small RNA duplexes and deposits a methyl group onto the 2' OH of the 3' terminal nucleotide. Nucleic Acids Res. 34: 667–675.

Ye, R., Chen, Z., Lian, B., Rowley, M.J., Xia, N., Chai, J., Li, Y., He, X.-J., Wierzbicki, A.T., and Qi, Y. (2016). A Dicer-Independent Route for Biogenesis of siRNAs that Direct DNA Methylation in Arabidopsis. Mol. Cell 61: 222–235.

Ye, R., Wang, W., Iki, T., Liu, C., Wu, Y., Ishikawa, M., Zhou, X., and Qi, Y. (2012). Cytoplasmic Assembly and Selective Nuclear Import of Arabidopsis ARGONAUTE4/siRNA Complexes. Mol. Cell 46: 859–870.

Yekta, S., Shih, I.-H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304: 594–596.

Yoshikawa, M., Iki, T., Tsutsui, Y., Miyashita, K., Poethig, R.S., Habu, Y., and Ishikawa, M. (2013). 3' fragment of miR173-programmed RISC-cleaved RNA is protected from degradation in a complex with RISC and SGS3. Proc. Natl. Acad. Sci. U.S.A. 110: 4117–4122.

Yoshikawa, M., Peragine, A., Park, M.Y., and Poethig, R.S. (2005). A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev. 19: 2164–2175.

Yu, A., Lepère, G., Jay, F., Wang, J., Bapaume, L., Wang, Y., Abraham, A.-L., Penterman, J., Fischer, R.L., Voinnet, O., and Navarro, L. (2013). Dynamics and biological relevance of DNA demethylation in Arabidopsis antibacterial defense. Proc. Natl. Acad. Sci. U.S.A. 110: 2389–2394.

Yu, B., Bi, L., Zheng, B., Ji, L., Chevalier, D., Agarwal, M., Ramachandran, V., Li, W., Lagrange, T., Walker, J.C., and Chen, X. (2008). The FHA domain proteins DAWDLE in Arabidopsis and SNIP1 in humans act in small RNA biogenesis. Proc. Natl. Acad. Sci. U.S.A. 105: 10073–10078.

Yu, B., Yang, Z., Li, J., Minakhina, S., Yang, M., Padgett, R.W., Steward, R., and Chen, X. (2005). Methylation as a crucial step in plant microRNA biogenesis. Science 307: 932–935.

Yuan, Y.-R., Pei, Y., Ma, J.-B., Kuryavyi, V., Zhadina, M., Meister, G., Chen, H.-Y., Dauter, Z., Tuschl, T., and Patel, D.J. (2005). Crystal structure of A. aeolicus argonaute, a site-specific DNA-guided endoribonuclease, provides insights into RISC-mediated mRNA cleavage. Mol. Cell 19: 405–419.

Zemach, A., Kim, M.Y., Hsieh, P.-H., Coleman-Derr, D., Eshed-Williams, L., Thao, K., Harmer, S.L., and Zilberman, D. (2013). The Arabidopsis Nucleosome Remodeler DDM1 Allows DNA Methyltransferases to Access H1-Containing Heterochromatin. Cell 153: 193–205.

Zhai, J. et al. (2015a). A One Precursor One siRNA Model for Pol IV-Dependent siRNA Biogenesis. Cell 163: 445–455.

Zhai, J. et al. (2013). Plant microRNAs display differential 3' truncation and tailing modifications that are ARGONAUTE1 dependent and conserved across species. Plant Cell 25: 2417–2428.

Zhai, J., Zhang, H., Arikit, S., Huang, K., Nan, G.-L., Walbot, V., and Meyers, B.C. (2015b). Spatiotemporally dynamic, cell-type-dependent premeiotic and meiotic phasiRNAs in maize anthers. Proc. Natl. Acad. Sci. U.S.A. 112: 3146–3151.

Zhang, B.H., Pan, X.P., Wang, Q.L., Cobb, G.P., and Anderson, T.A. (2005). Identification and characterization of new plant microRNAs using EST analysis. Cell Res. 15: 336–360.

Zhang, H. et al. (2013). DTF1 is a core component of RNA-directed DNA methylation and may assist in the recruitment of Pol IV. Proc. Natl. Acad. Sci. U.S.A. 110: 8290–8295.

Zhang, H., Tang, K., Qian, W., Duan, C.-G., Wang, B., Zhang, H., Wang, P., Zhu, X., Lang, Z., Yang, Y., and Zhu, J.-K. (2014). An Rrp6-like protein positively regulates noncoding RNA levels and DNA methylation in Arabidopsis. Mol. Cell 54: 418–430.

Zhang, L., Chia, J.-M., Kumari, S., Stein, J.C., Liu, Z., Narechania, A., Maher, C.A., Guill, K., McMullen, M.D., and Ware, D. (2009). A genome-wide characterization of microRNA genes in maize. PLoS Genet. 5: e1000716.

Zhang, T., Zhao, Y.L., Zhao, J.H., Wang, S., Jin, Y., and Chen, Z.Q. (2016a). Cotton plants export microRNAs to inhibit virulence gene expression in a fungal pathogen. Nature Plants 2: 16153.

Zhang, X. (2008). The Epigenetic Landscape of Plants. Science 320: 489–492.

Zhang, X., Henderson, I.R., Lu, C., Green, P.J., and Jacobsen, S.E. (2007). Role of RNA polymerase IV in plant small RNA metabolism. Proc. Natl. Acad. Sci. U.S.A. 104: 4536–4541.

Zhang, X., Zhao, H., Gao, S., Wang, W.-C., Katiyar-Agarwal, S., Huang, H.-D., Raikhel, N., and Jin, H. (2011). Arabidopsis Argonaute 2 Regulates Innate Immunity via miRNA393∗-Mediated Silencing of a Golgi-Localized SNARE Gene, MEMB12. Mol. Cell 42: 356–366.

Zhang, Z., Liu, X., Guo, X., Wang, X.-J., and Zhang, X. (2016b). Arabidopsis AGO3 predominantly recruits 24-nt small RNAs to regulate epigenetic silencing. Nature Plants 2: 16049.

Zhao, Y., Mo, B., and Chen, X. (2012a). Mechanisms that impact microRNA stability in plants. RNA Biol 9: 1218–1223.

Zhao, Y., Yu, Y., Zhai, J., Ramachandran, V., Dinh, T.T., Meyers, B.C., Mo, B., and Chen, X. (2012b). The Arabidopsis nucleotidyl transferase HESO1 uridylates unmethylated small RNAs to trigger their degradation. Curr. Biol. 22: 689–694.

Zheng, Q., Rowley, M.J., Böhmdorfer, G., Sandhu, D., Gregory, B.D., and Wierzbicki, A.T. (2013). RNA polymerase V targets transcriptional silencing components to promoters of protein coding genes. Plant J. 73: 179–189.

Zheng, X., Zhu, J., Kapoor, A., and Zhu, J.-K. (2007). Role of Arabidopsis AGO6 in siRNA accumulation, DNA methylation and transcriptional gene silencing. EMBO J. 26: 1691–1701.

Zhong, X., Du, J., Hale, C.J., Gallego-Bartolome, J., Feng, S., Vashisht, A.A., Chory, J., Wohlschlegel, J.A., Patel, D.J., and Jacobsen, S.E. (2014). Molecular Mechanism of Action of Plant DRM De Novo DNA Methyltransferases. Cell 157: 1050–1060.

Zhong, X., Hale, C.J., Law, J.A., Johnson, L.M., Feng, S., Tu, A., and Jacobsen, S.E. (2012). DDR complex facilitates global association of RNA polymerase V to promoters and evolutionarily young transposons. Nat. Struct. Mol. Biol. 19: 870–875.

Zhong, X., Hale, C.J., Nguyen, M., Ausin, I., Groth, M., Hetzel, J., Vashisht, A.A., Henderson, I.R., Wohlschlegel, J.A., and Jacobsen, S.E. (2015). Domains rearranged methyltransferase3 controls DNA methylation and regulates RNA polymerase V transcript abundance in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. 112: 911–916.

Zhu, H., Hu, F., Wang, R., Zhou, X., Sze, S.-H., Liou, L.W., Barefoot, A., Dickman, M., and Zhang, X. (2011). Arabidopsis Argonaute10 Specifically Sequesters miR166/165 to Regulate Shoot Apical Meristem Development. Cell 145: 242–256.

Zhu, Q.-H., Spriggs, A., Matthew, L., Fan, L., Kennedy, G., Gubler, F., and Helliwell, C. (2008). A diverse set of microRNAs and microRNA-like small RNAs in developing rice grains. Genome Res. 18: 1456–1465.

Zilberman, D., Cao, X.F., and Jacobsen, S.E. (2003). ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299: 716–719.

Zisoulis, D.G., Lovci, M.T., Wilbert, M.L., Hutt, K.R., Liang, T.Y., Pasquinelli, A.E., and Yeo, G.W. (2010). Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat. Struct. Mol. Biol. 17: 173–179.

VITA

Feng Wang

EDUCATION

Ph.D. in Plant Biology
The Pennsylvania State University, University Park, PA. May 2017
Intercollege Graduate Program in Plant Biology

B.S. in Biotechnology
Huazhong Agricultural University, Wuhan, China. June 2010
College of Life Sciences and Technology

HONORS AND AWARDS

Outstanding Graduate Award, Huazhong Agricultural University, 2010
Monsanto Scholarship, Huazhong Agricultural University, 2009
National Scholarship, Ministry of Education of P.R. China, 2008

PUBLICATIONS

Wang, F. and Axtell, M.J. (2016). AGO4 is specifically required for heterochromatic siRNA accumulation at Pol V-dependent loci in Arabidopsis thaliana. In revision. (Preprint available at http://dx.doi.org/10.1101/078394).

Shahid, S., Kim, G., Wafula, E., Wang, F., Coruh, C., dePamphilis, C.W., Westwood, J.H., and Axtell, M.J. (2016). MicroRNAs from the parasitic plant Cuscuta pentagona target host messenger RNAs. Submitted.

Wang, F., Johnson, N.R., Coruh, C., and Axtell, M.J. (2016). Genome-wide analysis of single non-templated nucleotides in plant endogenous siRNAs and miRNAs. Nucleic Acids Res. 44: 7395–7405.

Wang, F., Polydore, S., and Axtell, M.J. (2015). More than meets the eye? Factors that affect target selection by plant miRNAs and heterochromatic siRNAs. Curr. Opin. Plant Biol. 27: 118–124.

Liu, Q., Wang, F., and Axtell, M.J. (2014). Analysis of complementarity requirements for plant microRNA targeting using a Nicotiana benthamiana quantitative transient assay. Plant Cell 26: 741–753.