Open Seth Polydore Doctoral Thesis.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
The Pennsylvania State University The Graduate School Intercollege Graduate Degree Program in Genetics ANALYSIS OF ARABIDOPSIS THALIANA RNA DEPENDENT RNA POLYMERASE MUTANTS REVEALS NOVEL SMALL RNAS AND IMPROVES EXISTING ANNOTATIONS A Dissertation in Genetics by Seth Polydore © 2019 Seth Polydore Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy May 2019 The dissertation of Seth Polydore was reviewed and approved* by the following: Claude dePamphilis Professor of Biology Chair of Committee Michael J. Axtell Professor of Biology Dissertation Advisor Charles T. Anderson Associate Professor of Biology Naomi S. Altman Professor of Statistics Robert F. Paulson Professor of Veterinary and Biomedical Sciences Head of Genetics PhD. Program *Signatures are on file with the Graduate School. ii Abstract Small RNAs are molecules that regulate key physiological functions in land plants through transcriptional and post-transcriptional silencing. Small RNAs can be divided into two major categories: microRNAs (miRNAs) which are precisely processed from single-stranded, stem- and-loop RNA precursors, and short interfering RNAs (siRNAs), which are derived from double- stranded RNA precursors. The processing of small RNA precursors is performed by RNASEIII- like endonucleases known as DICER-LIKE (DCL) proteins. DCLs hydrolyze small RNA precursors into 20-24 nt, double-stranded RNA molecules. One strand of these double-stranded RNA molecules is loaded into ARGONAUTE (AGO) proteins. AGO uses the loaded single- stranded RNA to target other RNA molecules for repression. Depending on the type of small RNA in question, a third type of protein family is necessary for small RNA biogenesis – RNA DEPENDENT RNA POLYMERASE (RDR). RDRs convert single-stranded RNAs into double- stranded RNAs, which can be processed by DCL proteins. Small RNAs have been the subject of intense research since their discovery and much has been discovered about their modes of biogenesis and functions. However, some questions remain unanswered and some issues have arisen: Firstly, there are many multi-mapping reads, which comprise the majority of land plant small RNAs in a typical land plant small RNA transcriptome. Due to the difficulties in dealing with multi-mapped reads, in a typical small RNA study most small RNA data are ultimately ignored or incorrectly aligned. Secondly, many known types of small RNAs are incompletely or incorrectly annotated. Many MIRNA annotations in particular are known to be erroneous. In my research, I utilize mutants of the RDR genes known to be involved in small RNA biogenesis in order to re-examine known annotations of different types of small RNAs in Arabidopsis thaliana. I also utilize these mutants in order to determine if there are other, previously unknown types of small RNAs encoded in the A. thaliana genome. Ultimately, I found 58 erroneous MIRNA annotations and 38 small RNA loci that do not follow any known methods of small RNA biogenesis. siRNAs can be further subdivided into several different groups, including phased siRNAs (phasiRNAs) and heterochromatic siRNAs (hc-siRNAs). Predominantly 24 nt in length, hc- siRNAs are produced from the biochemical actions of DCL3, DNA DEPENDENT RNA POLYMERASE (Pol) IV, and RDR2. phasiRNAs are produced from the biochemical actions of Pol II, RDR6, and DCL4. In every land plant sequenced to date, 21-22 nt dominated phasiRNAs are well represented, but 24 nt-dominated phasiRNAs have only been identified in certain monocots thus far, especially Asparagus officinalis, Hemerocallis lilioasphodelus, Lilium maculatum, and the Poaceae family. I was interested in elucidating whether or not the A. thaliana genome encoded these 24 nt-dominated phasiRNAs. I therefore carefully examined A. thaliana 24 nt-dominated loci that consistently passed PHAS-locus detecting algorithms using multiple methods and found that they are likely just heterochromatic siRNAs (hc-siRNAs). Since 24 nt-dominated siRNA loci are very numerous in angiosperms, they serve as potential source of false-positives during searches for phasiRNA-generating loci. Overall, this analysis shows that using existing phasing score algorithms to detect novel PHAS loci can lead to false positives. iii Table of Contents List of Figures ........................................................................................................................vi List of Tables .........................................................................................................................viii Acknowledgements ................................................................................................................ix Chapter 1 Introduction 1.1 Background of small RNAs in Land Plants .....................................................................1 1.2 Classifications of small RNAs and Their Biogenesis Machinery ....................................1 1.2.1 miRNA biogenesis and function ............................................................................2 1.2.2 Sub-classification of siRNAs .................................................................................4 1.2.2.1 Heterochromatic siRNA ..............................................................................4 1.2.2.2 Epigenetically Active siRNAs .....................................................................7 1.2.2.3 Phased siRNAs ............................................................................................8 1.2.2.4 Natural Anti-sense siRNAs .........................................................................10 1.2.2.5 Inverted Repeat siRNAs ..............................................................................13 1.3 Annotation of Small RNAs in Plants ...............................................................................15 1.4 The Two Clades of RDRs ................................................................................................18 1.5 Problems and unknowns in small RNA annotation .........................................................22 1.6 Objectives ........................................................................................................................24 Chapter 2 Analysis of RDR1/RDR2/RDR6-independent small RNAs in Arabidopsis thaliana improves MIRNA annotations and reveals unexplained types of siRNA loci 2.1 Introduction ......................................................................................................................26 2.2 Materials and Methods .....................................................................................................28 2.2.1 Plant materials and growth conditions ...................................................................28 2.2.2 RNA Extraction, small RNA library construction, alignment, and cluster calling ........................................................................................................................................28 2.2.3 Differential expression analysis .............................................................................32 2.2.4 Inspection of miRBase annotated MIRNA .............................................................33 2.2.5 Designation of small RNA types ...........................................................................33 2.2.6 Determining the statistics of different types of loci ...............................................35 2.2.7 Determining overlap between small RNA loci and genetic loci ...........................35 2.2.8 Expression analyses of unclassified small RNAs in various mutant libraries .......36 2.2.9 Conservation analysis of different types of loci ....................................................37 2.2.10 Alignment and expression of A. thaliana mRNAs ..............................................38 2.3 Results ..............................................................................................................................38 2.3.1 Small RNA-seq from rdr1/2/6 ...............................................................................38 2.3.2 Critical inspection of Arabidopsis thaliana MIRNA annotations in miRBase21 ..40 2.3.3 Most small RNA-producing loci are down-regulated in rdr1/2/6 .........................43 2.3.4 Unclassified RDR1/2/6-independent loci are true small RNA-producing clusters that are 21 nt or 24 nt dominated ...........................................................................................47 2.3.5 Unclassified loci overlap with genes but are not produced from their transcripts 51 2.3.6 The unclassified loci are largely not conserved .....................................................59 2.3.7 The rdr6-15 T-DNA insertion causes sRNA production at the RDR6 locus ........60 2.4 Discussion ........................................................................................................................62 iv 2.4.1 Nearly 20% of annotated A. thaliana MIRNAs are dependent on RDRs ...............62 2.4.2 Many RDR1/2/6-independent sRNA loci are MIRNAs, MIRNA-like, or hairpin-derived. ........................................................................................................................................63 2.4.3 Unclassified, RDR1/2/6-independent sRNA loci represent potentially regulatory RNAs ........................................................................................................................................64 Chapter 3 Several phased siRNA annotation methods can produce frequent false-positives for 24 nucleotide RNA-dominated loci in plants