UC Berkeley UC Berkeley Electronic Theses and Dissertations

Title Maize RNA IV complexes and their control of gene function

Permalink https://escholarship.org/uc/item/46h9237t

Author Talbot, Joy-El R B

Publication Date 2015

Peer reviewed|Thesis/dissertation

eScholarship.org Powered by the California Digital Library University of California Maize RNA polymerase IV complexes and their control of gene function

by

Joy-El Renee Barbour Talbot

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Molecular and Cell Biology

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Jay B. Hollick, Co-chair Professor Jasper D. Rine, Co-chair Professor Michael B. Eisen Professor Michael R. Freeling

Summer 2015 Maize RNA polymerase IV complexes and their control of gene function

Copyright 2015 by Joy-El Renee Barbour Talbot 1

Abstract

Maize RNA polymerase IV complexes and their control of gene function by Joy-El Renee Barbour Talbot Doctor of Philosophy in Molecular and Cell Biology University of California, Berkeley Professor Jay B. Hollick, Co-chair Professor Jasper D. Rine, Co-chair

Plants have acquired and maintained an expanded suite of DNA-dependent RNA poly- merases (RNAPs) compared to other eukaryotes. Although their exact roles remain unclear, plant-specific RNAPs (Pol IV and Pol V) are involved in epigenetic silencing of transposable elements (TEs). Zea mays (maize) Pol IV is required for proper plant development as well as the establishment and maintenance of paramutations, which are trans-homolog interactions that facilitate heritable gene silencing. Maize has duplications of Pol IV catalytic subunits, which define multiple Pol IV subtypes, and accessory proteins associated with these subtypes define distinct Pol IV complexes. Understanding the roles of Pol IV will require identify- ing the composition and function of these Pol IV subtypes and complexes. In exploring interactions between the genes encoding a Pol IV catalytic subunit and a putative accessory protein, I identified two new alleles of the Pol IV subunit and a family of potential Pol IV accessory proteins conserved in multicellular plants. Together, these Pol IV complexes are proposed to function through two, potentially overlapping, general mechanisms to control of gene function: direct competition with Pol II and/or generation of 24-nucleotide RNAs that guide de novo cytosine . I analyzed nascent transcriptome data of seedlings lacking Pol IV and their wild-type siblings, which identified a global effect of Pol IV on gene boundary transcription. Pol IV-affected loci serve as molecular and phenotypic models for dissecting the involvement of various Pol IV subtype and complex components. Such analysis implicated a Pol IV-affected allele as being controlled by Pol IV competition with Pol II. Through expanded sequencing of a paramutation participating allele (Pl1-Rhoades), I identified a new Pol IV target of five tandem repeats that may serve as an enhancer element. I used the Pl1-Rhoades allele in particular to compare and contrast the effects of Pol IV loss with that of Pol IV accessory proteins. Together, the work presented here explores the involvement of an uncharacterized protein family of putative Pol IV accessory proteins in Pol IV complexes, expands the roles of Pol IV complexes in controlling maize gene function, and identifies new Pol IV-affected loci for future study. i

To Paul and Sheila Barbour

For always believing, supporting and listening–even when I make up horrible analogies between my research and videogame plots.

To Jared Talbot

For helping me get to the other side and being my collaborator in our best genetic experiment yet.

Finally to the Yeti–my mostly silent companion in writing this dissertation. I can’t wait to meet your wonderful self. ii

Contents

Contents ii

List of Figures iv

List of Tables vi

Acknowledgements vii

1 Introduction 1 1.1 Evolutionary origin of plant RNA polymerase structural diversity ...... 2 1.2 Transcriptional activity of Pol IV ...... 6 1.3 FunctionsofPolIVinplants...... 8 1.4 Examples of genome regulation by Pol IV ...... 9 1.5 Remainingquestions ...... 12 1.6 Figures...... 14 1.7 References...... 15

2 Genetic interactions between Pol IV components 21 2.1 Introduction...... 21 2.2 MaterialsandMethods...... 23 2.3 Results...... 24 2.4 Discussion...... 31 2.5 Tables ...... 34 2.6 Figures...... 37 2.7 References...... 45

3 Pol IV affects nascent transcription 50 3.1 Introduction...... 50 3.2 MaterialsandMethods...... 52 3.3 Results...... 59 3.4 Discussion...... 68 3.5 Tables ...... 74 3.6 Figures...... 91 iii

3.7 References...... 108

4 Putative Pl1-Rhoades paramutation element 116 4.1 Introduction...... 116 4.2 MaterialsandMethods...... 118 4.3 Results...... 122 4.4 Discussion...... 128 4.5 Tables ...... 132 4.6 Figures...... 134 4.7 References...... 145

5 Perspectives 150 5.1 Composition of Pol IV complexes ...... 150 5.2 Mechanistic models of Pol IV function ...... 152 5.3 PolIVeffectsongenefunction...... 153 5.4 Pol IV effects on paramutable alleles ...... 153 5.5 Conclusions ...... 155 5.6 References...... 156

Appendices 159

A Complementation cross pedigrees 159

B Small RNA analysis scripts 164 B.1 Awk commands for processing SAM alignment files ...... 164 B.2 Python function to add total alignment count to SAM file ...... 166 iv

List of Figures

1.1 RNAPholoenzymecomposition ...... 14

2.1 Conserved carboxy-terminal motifs in the RMR2 family ...... 37 2.2 Phylogeny of RMR2 family proteins ...... 38 2.3 Expression of rmr2 anditsparalogsinmaize ...... 39 2.4 The rp(d/e)2a genemodel...... 39 2.5 Pedigree of rmr7-1 and rmr7-5 lineages ...... 40 2.6 Sectored tassels and anthers from rmr2, rmr7 complementation crosses . . . . . 41 2.7 Pedigree of rmr2-1 lineages ...... 42 2.8 Reciprocal complementation crosses between rmr2-1 and rmr7-5 ...... 44

3.1 GRO-seq reads are similarly distributed in WT and rpd1 mutant libraries . . . . 91 3.2 GRO-seq reads align to both exonic and intronic sequences ...... 92 3.3 Most TE-like sequences can align to genes ...... 92 3.4 Genic reads are highly enriched in GRO-seq mappable reads ...... 93 3.5 Nongenic GRO-seq reads are enriched near genes ...... 93 3.6 Distribution of uniquely mapping WT GRO-seq reads around genes ...... 94 3.7 ATG enrichment at the start of maize gene models ...... 95 3.8 Alternative TSS definition has little effect on coverage profiles ...... 96 3.9 Comparison of GRO-seq peak and TSS annotations ...... 97 3.10 Strong antisense coverage upstream of GRMZM2G061206 ...... 98 3.11 Pol IV loss alters global transcription profiles at gene boundaries...... 99 3.12 Uniquely mapping GRO-seq coverage across gene bodies ...... 100 3.13 Fold change between WT and rpd1 mutant antisense GRO-seq read coverage . . 101 3.14 Coverage of uniquely mapping 24mers near gene boundaries ...... 102 3.15 Specific alleles are susceptible to Pol IV-induced changes in . . . 103 3.16 qRT-PCR analysis of ocl2 in the absence of RPD1 and DCL3 ...... 105 3.17 Pol IV loss affects both entire TE families and individual elements ...... 106

4.1 Pl1-Rhoades shares structural regions with pl1-B73 ...... 134 4.2 Alignment of a Pl1-Rhoades and pl1-B73 repeatunit ...... 135 4.3 Southern blot of Pl1-Rh-containingBAC ...... 136 v

4.4 Pl1-Rhoades derivatives have similar structures around the pentarepeats . . . . 137 4.5 Structural diversity of pl1 alleles at the unique subrepeat ...... 138 4.6 The unique subrepeat is expressed in Pl-Rh florets...... 139 4.7 Expression patterns of pl1 andtheuniquesubrepeat ...... 139 4.8 Pol IV-dependent 24-nt RNAs align to the unique subrepeat ...... 140 4.9 Unique subrepeat 24-nt RNAs are made only from Pl1-Rh ...... 141 4.10 Gametophytes contribute to the 24-nt RNA pools of immaturecobs ...... 142 4.11 Cis-genetic variation does not alter 24-nt RNA profiles in hybrids ...... 143 4.12 Affects of maternal loss of RMR1 on 24-nt RNAs ...... 144

A.1 rmr7-1 and rmr7-5 pedigree with family identifiers ...... 160 A.2 rmr2-1 pedigreewithfamilyidentifiers ...... 162 vi

List of Tables

2.1 Primersused ...... 34 2.2 Initial ems072087 complementationtests...... 35 2.3 Complementation tests between ems051081 and rmr7 alleles ...... 35 2.4 Complementation tests between ems072087 and rmr7 alleles ...... 35 2.5 Complementation tests between rmr2-1 and rmr7 alleles ...... 36 2.6 Complementation tests between rmr2-mum and rmr7 alleles ...... 36

3.1 Sources of genomic feature annotations ...... 74 3.2 Primers used for RT-PCR and qRT-PCR analysis ...... 74 3.3 Genes with increased sense orientation transcription in rpd1 mutants ...... 75 3.4 Genes with decreased sense orientation transcription in rpd1 mutants ...... 78 3.5 Genes with increased antisense orientation transcription in rpd1 mutants . . . . 84 3.6 Genes with decreased antisense orientation transcription in rpd1 mutants . . . . 86 3.7 Transposons with increased sense orientation transcription in rpd1 mutants . . . 87 3.8 Transposons with decreased sense orientation transcription in rpd1 mutants . . 88 3.9 Transposons with increased antisense orientation transcription in rpd1 mutants . 89 3.10 Transposons with decreased antisense orientation transcription in rpd1 mutants 90

4.1 Tissue sources for pl1 alleles...... 132 4.2 Primersused ...... 133 vii

Acknowledgements

It takes a community to raise a child; similarly it takes a community to train a grad- uate student. During my scientific career, I have been fortunate to have the support of a continental community spanning three different universities and many places in between. My thanks go first to the teachers of Hartland Elementary School (HES) and Hartford High School (HHS). In particular, Mrs. Simon-Bouton, Mrs. Johnson and Mr. MacEachern at HES for challenging me in math and science. I would also like to thank Al Kobe for his ever challenging, but always interesting, biology classes in high school and Nancy Kent who encouraged my interests in computer programming. Je veux également remercier Mme MacHarg qui m’avoir enseigné de valoriser les expériences nouvelles. These teachers, many of whom I am forgetting, spawned my early interest in learning and helped me to build the foundation that has supported my later scholastic work. I would like to acknowledge all of the training that I received in the Molecular Genetics and Microbiology program at the University of Vermont. To Brenda Tessman, thank you for teaching me sterile technique; even if I no longer streak a plate appropriately, I still know how it ought to be done. To Stephanie Phelps, thank you for helping me build my molecular biology tool kit, and for being brave enough to run a mammalian cell culture lab (without antibiotics) with a bunch of chatty seniors. Thanks to Dr. Douglas Johnson for hosting me in my first lab experience and to his ever patient graduate students Benjamin Stark and Karen Cole: I’m sorry I didn’t always get it. Thank you to Amalthiya Prasad for training me on so many techniques in the Pederson lab and entrusting your project to me when you graduated. Finally, thanks to Dr. David Pederson for giving me independence in my research and trusting me to run your lab. I greatly enjoyed all of our talks both while working with you and after starting grad school, and I look forward to many more in the future. There are so many people at UC Berkeley to thank. All of my MCB 2009 cohort, in particular Adrienne Greene, Akemi Kunibe, Cait Schartner, Em Ho, Caitlin DeJong, and Alden Conner, you all made the transition across country and the introduction to graduate school so much more fun. Thanks to the Eisen and Rine labs for welcoming me during my rotations (and introducing me to the fun of MCB Follies). I would particularly like to thank everyone in the Amacher/Hollick lab at Berkeley for a wonderful first few years and the best quals party ever! Thanks especially to Tom Gallagher and Adrienne Maxwell, visiting your bay was always a happy adventure! Thanks to Karl Erhard for teaching me viii the ropes in the Hollick lab; I always enjoyed our crazy science talks and will forever want to shout ‘Let the great experiment begin!’ when starting new projects. Thanks to Irene Liao for enthusiastically helping with everything and keeping our lab running. Thanks also to Nathan Shih, for not complaining too much about all of my shoes under our shared desk space. In addition to the community at UC Berkeley, I got to build another academic and social community at Ohio State University (OSU) in my last three years of this degree. Thanks to the Hamel, Meyer and Slotkin labs for welcoming us to fifth floor of Aronoff and keeping things fun since then. In particular, thanks to Norman Groves and Anna Griffis for keeping me sane and keeping happy hour going even when we were all in the field. I would also like to thank the new incarnations of the Amacher and Hollick labs for continuing to be a supportive family. Special thanks also to Katherine Mejía-Guerra, Jeffrey Campbell, Erik Mukundi and all of the other members of the CAPS Computational Biology Laboratory (CCBL), who became my second lab family on west campus and helped me grow as a bioinformatician. Finally thanks to the current and past Hollick lab folks at OSU, especially Laura Schrader and Julia Kays who helped set up the lab and get things started here, and my fellow grad students Janelle Gabriel, Brian Giacopelli and Natalie Deans. Thanks also to the REU and undergraduate students who brought new perspectives and fresh enthusiasm to the lab. Many thanks to Dr. Jay Hollick for giving me several opportunities over the last five years including joining his lab, discovering the a-maize-ing power of maize genetics, participating in the new adventures at OSU, and encouraging my interests in bioinformatics. Thanks also for the awesome margarita recipe and reminding me about life outside of lab. Thanks also to Dr. Sharon Amacher who always astounds me with her quiet insights. I would also like to thank my committee members for their patience and support in this somewhat atypical PhD process. Penultimately, I would like to thank my friends and family whose unwavering encour- agement, enthusiasm and support made all off this possible. Jonathan Bowley for his great pep-talks and always remembering all of our little inside jokes that would brighten my day. Cynthia Chapin for reminding me of the rejuvenating powers of the outdoors. Jeffrey Camp- bell and Anna and Denis Griffis for the excellent game nights and general nerdiness. Dr. Lewis Houston for reminding me just what a PhD means down the road. Thanks also to Barb and David Talbot for happily welcoming me into their family. Thanks to my sister Sarah Barbour, I’m sorry you never got to visit me in Berkeley, but your calls were some of my favorite parts of the last few years. Finally, thanks to my parents Paul and Sheila Barbour: you sparked and nurtured my curiosity and taught me that nothing is truly impossible, most especially when it feels impossible. Finally, I would like to thank my partner Jared Talbot. I can safely say that I wouldn’t be here without you, and I have always appreciated your reminders to enjoy the ride along the way! 1

Chapter 1

Introduction

All cellular life uses DNA-dependent RNA (RNAPs) to transcribe stored genomic information into biologically active RNA molecules. These highly-conserved multi- subunit holoenzymes represent the core of larger complexes that guide template selectivity, improve processivity, and modify the RNAP products. Eukaryotes have divided the jobs of transcribing DNA into ribosomal RNAs (rRNA), messenger RNAs (mRNA), and transfer RNAs (tRNA) amongst three RNAPs (Pols I, II and III, respectively) that are each dis- tinguished by unique largest and second-largest subunits forming their respective catalytic cores. Multicellular plants have additional RNAPs (Pols IV and V) derived from duplica- tions of Pol II catalytic subunits (Luo & Hall 2007, Huang et al. 2015). The functions of these conserved plant-specific RNAPs remain unclear, although evidence continues to point to Pols IV and V having various roles in defining heritable regulatory information (Erhard & Hollick 2011, Matzke et al. 2015). The structures and functions of plant-specific RNAPs are being actively elucidated. Their best understood roles are in the maintenance of (TE) silencing by 24- nucleotide (24-nt) RNAs that guide cytosine methylation (Matzke et al. 2015). Pol IV is required to initiate and maintain trans-homolog interactions (THI) that facilitate heritable epigenetic silencing of certain alleles through a process known as paramutation (Brink 1973, Hollick 2010). Other eukaryotes have paramutation-like behaviors (Chandler & Stam 2004, Gabriel & Hollick 2015) and maintain TE silencing (Castel & Martienssen 2013, Sabin et al. 2013) without Pols IV and V. Instead, these organisms rely on Pol II-mediated mechanisms to help specify silencing (Sabin et al. 2013). The subfunctionalization of Pols IV and V in plants (Castel & Martienssen 2013, Huang et al. 2015) provides an opportunity to study their mechanisms of genome regulation without interfering with the essential roles of Pol II in mRNA transcription. Understanding the detailed mechanisms underlying Pol IV behaviors promises to provide fundamental insight into plant genome regulation. In the grass Zea mays (maize), Pol IV may define more than the heritable regulation of TEs and paramutations. Pol IV plays a role in promoting proper maize development (Parkinson et al. 2007, Erhard et al. 2009), not observed in the eudicots Arabidopsis thaliana (Arabidopsis) or Brassica rapa (Herr et al. 2005, Pontier et al. 2005, Onodera et al. 2005, CHAPTER 1. INTRODUCTION 2

Huang et al. 2013). Underlying differences in genome-wide TE abundance and their common presence near genes have been suggested (Hale et al. 2009) to be involved in the increased need for Pol IV in the highly repetitive maize genome (Schnable et al. 2009, Baucom et al. 2009), yet only one example of Pol IV regulation of certain alleles has been characterized fully enough to identify a controlling TE element (Erhard et al. 2013). In this dissertation, I sought to expand the understanding of the mechanisms of Pol IV regulation of the maize genome by characterizing one of its accessory proteins (Chapter 2), identifying its effects on genome transcription (Chapter 3), and identifying a new cis-regulatory feature of an allele susceptible to paramutation targeted by Pol IV (Chapter 4). Together, this work identifies novel sites of regulation by Pol IV and adds to our understanding of paramutation mechanisms and the targets upon which those mechanisms acts.

1.1 Evolutionary origin of plant RNA polymerase structural diversity

Bacterial, archaeal, and eukaryotic RNAPs share both conserved sequences and struc- tures. The core RNAP represented by five bacterial subunits includes a homodimer of α subunits that aid in the assembly of the catalytic core composed of β and β ′, and finally the ω subunit which helps β ′ fold (Mathew & Chatterji 2006, Werner 2008). Archaea and eu- karyotes possess homologs of each of these bacterial subunits, which are supplemented with additional subunits providing expanded functionality. In eukaryotes, ancient duplications of catalytic and other subunits ultimately evolved into the distinct Pols I, II and III found today (Werner & Grohmann 2011). Plants have continued to expand their RNAP diversity with similar subunit duplications and specializations (Luo & Hall 2007, Huang et al. 2015).

Composition of multi-subunit RNA polymerases Eukaryotic Pol II typically contains 12 subunits to form the functional domains of the assembly platform, catalytic core, and the additional stalk domain exclusive to archaeal and eukaryotic polymerases (Figure 1.1A on page 14) (reviewed by Werner 2008, Werner & Grohmann 2011). Following Saccharomyces cerevisiae nomenclature, the largest subunit is designated RPB1, where ‘B’ references the RNAP (Pol II in this case) and the numeral represents the subunit number roughly ordered by decreasing mass. RPB1 and RPB2, which are homologous to β ′ and β respectively, form the RNAP catalytic core. The Pol II assembly platform contains a heterodimer of RPB3 and RPB11 (structurally homologous to the α2 homodimer) plus two additional subunits (RPB10 and RPB12). The ω homolog is RPB6. RPB4 and RPB7 form the Pol II stalk, which has roles in initiation and may also interact with the nascent RNA transcript (Armache et al. 2003). Pols I, II and III are distinguished by their unique catalytic cores although they can share other subunits, for instance all three S. cerevisiae RNAPs share RPB5, RPB6, RPB8, RPB10 and RPB12 subunits (Cramer et al. 2001). CHAPTER 1. INTRODUCTION 3

Evolutionary diversity of RNA polymerase subunits Like all eukaryotic RNAPs, Pols IV and V are distinguished by their largest subunits (RPD1 and RPE1, respectively). To avoid confusion with another Arabidopsis gene abbre- viated rpd1, Arabidopsis Pol IV and V largest subunits are named nuclear d1 (nrpd1 ) and nrpe1. I will maintain their nomenclature when referring to Arabidopsis RNAP subunits specifically, but will continue with the maize, human and S. cerevisiae-style nomenclature at other times. The evolutionary origins of Pols IV and V trace to ancient duplications of RPB1 in the common ancestors of multicellular plants (Luo & Hall 2007, Huang et al. 2015). These alternative polymerases are predicted to have the largest roles in flowering plants, where they seem to be actively evolving based on various numbers of subunit homologs encoded by flowering plant genomes (Erhard et al. 2009, Ream et al. 2009, Haag et al. 2014). However, the functional characterization of these additional RNAP sub- units has been limited to the handful of species where rpd1 /nrpd1 (maize, Arabidopsis, and Brassica rapa) or nrpe1 (Arabidopsis) mutants have been identified (Herr et al. 2005, Kanno et al. 2005, Onodera et al. 2005, Pontier et al. 2005, Erhard et al. 2009, Huang et al. 2013). The unique carboxy-terminal domains (CTDs) of RPD1 and RPE1 highlight potential differences in the regulation and function of Pols IV and V compared to Pol II. RPD1 and RPE1 have the eight conserved domains found in all largest subunits (Jokerst et al. 1989); however, their CTDs are entirely distinct (Herr et al. 2005, Pontier et al. 2005, Huang et al. 2015). The RPB1 CTD coordinates Pol II transcription and association of accessory proteins responsible for downstream processing of the RNA transcript (reviewed by Egloff & Murphy 2008). Tandem 7-mer (heptad) repeats characterize the RPB1 CTD, each repeat (consensus: YSPTSPS) contains several serine residues whose phosphorylation states affect Pol II activity (Egloff & Murphy 2008). Both RPD1 and RPE1 CTDs lack RPB1-like heptad repeats, highlighting the potential for differences in regulation and early processing of the nascent transcript. Pol V has the more complex CTD, consisting of repeats enriched in GW, WG or GWG sequences that facilitate binding of (AGO) proteins (El-Shami et al. 2007). RPE1 subunits in different species have varying numbers of repeats as well as different preferences for GW, WG or GWG motifs, although how these differences affect Pol V biology in different species remains unclear (Huang et al. 2015). Both Pols IV and V have a DeCL domain named for its homology to a nuclear-encoded gene (defective chloroplasts and leaves) involved in chloroplast rRNA processing (Bellaoui et al. 2003, Herr et al. 2005, Pontier et al. 2005). The relatively short Pol IV CTD includes only this DeCL domain (Herr et al. 2005, Onodera et al. 2005, Pontier et al. 2005). Finally, structural differences in the bridge domain of the catalytic core render maize and Arabidopsis Pols IV and V insensitive to the Pol II poison α-amanitin (Erhard et al. 2009, Haag et al. 2012, Haag et al. 2014). Unlike Pols I, II and III, Pols IV and V can share the same second-largest subunit to complete their respective catalytic cores (Ream et al. 2009, Haag et al. 2014). Yet, even though only one RP(D/E)2 subunit is required, many plant genomes encode multiple RP(D/E)2 paralogs (Stonaker et al. 2009, Sidorenko et al. 2009, Haag et al. 2014). Indeed, duplications of RPB2 have also been identified in both grasses and eudicots (Haag et al. CHAPTER 1. INTRODUCTION 4

2014). However, second-largest subunit duplications tend to be poorly conserved between different groups of plants, again reflecting the rapid evolution of plant RNAPs. Grasses are an exception to this trend; all grasses sequenced maintain multiple RP(D/E)2 genes which clade into two separate groups (Stonaker et al. 2009, Sidorenko et al. 2009, Haag et al. 2014). Based on association with maize RPD1 and/or RPE1 through coimmunoprecipitation experiments on callus tissue, one clade defines RP(D/E)2 subunits that can associate with either polymerase, while the other clade represents RPE2 subunits found only with Pol V (Haag et al. 2014). The non-catalytic subunits, in analogy to the three eukaryotic RNAPs, can either be shared amongst varying numbers of RNAPs or be exclusive to a specific RNAP. In Ara- bidopsis, distinctions between Pols IV and V are thought to be linked to duplications of RPB5, two of which are either shared by Pols II and IV (NRP(B/D)5a) or shared by Pols IV and V (NRP(D/E)5b) (Ream et al. 2009, Law et al. 2011, Tucker et al. 2011). The stalk domain distinguishes Pols IV and V from Pol II, particularly in angiosperms: both Arabidopsis and maize maintain Pols IV and V-specific fourth and seventh subunits (Ream et al. 2009, Law et al. 2011, Haag et al. 2014) and phylogenetic analyses link the divergence of an RP(D/E)4 subunit to angiosperms (Tucker et al. 2011, Huang et al. 2015). The implications of a non-Pol II stalk domain on the template recognition and transcript processing of Pols IV and V remain unclear. However, varying the template and transcript interaction domains of Pols IV and V may be key to changing the behaviors of these RNAPs from their progenitor Pol II-like counterparts: when identified on the S. cerevisiae Pol II crystal structure (Cramer et al. 2001), the maize Pol IV subunits not shared with Pol II comprise the RNAP face that interacts with the incoming template and outgoing transcript (Haag et al. 2014). More alternative RNAP subunits are identified with each new plant genome assembly; what remains unclear is how incorporation of these alternative subunits affects RNAP functions in various species.

Structural composition of Pols IV and V complexes The functional importance of rapid RNAP subunit evolution in plants remains unclear, particularly when several subunits have multiple paralogs that can all associate with the same RNAP. For instance, the maize genome has two paralogs each of RPB2, RPB5 and RPB9 (Haag et al. 2014). Does each duplicate subunit combinatorially form many different holoenzymes–producing eight Pol II holoenzymes in this example. Or do the duplicates form sets (RPB2a, RPB5a, and RPB9a vs. RPB2b, RPB5b, and RPB9b) that define only a few distinct holoenzymes–two in this case. Functional studies in maize have begun to address these questions for the Pols IV and V second-largest subunits, of which there are three (RP(D/E)2a, RP(D/E)2b, and RPE2c) (Stonaker et al. 2009, Sidorenko et al. 2009, Haag et al. 2014) (Figure 1.1B on page 14). Comparing rp(d/e)2a (required to maintain repres- sion7 : rmr7 and mediator of paramutation2 : mop2 alleles) and rpd1 (rmr6 alleles) mutant phenotypes, RP(D/E)2a affects only a subset of all the Pol IV functions lost in the absence of RPD1 (Parkinson et al. 2007, Erhard et al. 2009, Stonaker et al. 2009, Sidorenko et al. CHAPTER 1. INTRODUCTION 5

2009). In callus tissue, only two Pol IV subtypes are detected, containing either RP(D/E)2a or RP(D/E)2b subunit (Haag et al. 2014). If the same subtypes are present in other tissues then non-RP(D/E)2a Pol IV functions are either specific to the RP(D/E)2b-defined Pol IV subtype or can be performed by any Pol IV subtype. Mutant alleles of maize rpe1, rp(d/e)2b, and rpe2c are needed to genetically assess the second-largest subunit requirements for various Pol IV functions, and additional coimmunoprecipitation experiments on other tissues and with other Pol IV subunits as baits will also help profile the composition of Pol IV subtypes. Although Arabidopsis has only a single RP(D/E)2 subunit, duplicates of other subunits may play an analogous role to that of maize RP(D/E)2 subunits in defining further RNAP diversity. Coimmunoprecipitation of the Arabidopsis Pols IV and V largest subunits (NRPD1 and NRPE1, respectively) identify a different set of shared vs. unique subunits than in maize (Ream et al. 2009, Law et al. 2011, Haag et al. 2014). In particular, maize has more Pol II-specific subunits that are shared with Pols IV and/or V in Arabidopsis (Ream et al. 2009, Law et al. 2011, Haag et al. 2014). Although the Pol II assembly platform can associate with Pols IV and V, Arabidopsis and maize may define additional Pols IV and/or V-specific platforms through duplications of their third and 10th subunits respectively (Ream et al. 2009, Law et al. 2011, Haag et al. 2014). In addition to the Pols IV and V-specific stalk domains found in both species, Arabidopsis has a third Pol IV-specific 7th subunit (Ream et al. 2009, Law et al. 2011). Together, these differences highlight the potential diversity of RNAP subtypes and raise questions about their respective functional roles. In addition to RNAP subunit composition, accessory proteins both directly and indirectly define these plant-specific RNAPs, as well as provide insight into their roles in genome biol- ogy. Coimmunoprecipitation experiments physically link many of these accessory proteins to large complexes centered around their respective RNAPs (Law et al. 2011, Haag et al. 2014). The RNA-dependent RNA polymerase (RDR2) that synthesizes the second strand of Pol IV transcripts prior to cleavage into 24-nt RNAs (Chan et al. 2004, Xie et al. 2004, Alleman et al. 2006), physically associates with Pol IV in both Arabidopsis and maize (Law et al. 2011, Haag et al. 2014). Indeed, Arabidopsis RDR2 in vitro RNA polymerization requires association with RPD1 (Haag et al. 2012). Reciprocal coimmunoprecipitations identify phys- ical association of maize RDR2 with both RPD1 and RP(D/E)2a (Haag et al. 2014). RDR2 could represent a component specific for only one maize Pol IV subtype (Stonaker et al. 2009, Sidorenko et al. 2009). Most peptides found associated with RDR2 represent either RP(D/E)2a or RP(D/E)2b proteins, but some were specific to RP(D/E)2a and none were specific to RP(D/E)2b alone (Haag et al. 2014). Although RPD1 and RP(D/E)2 homologs can be identified in plants from liverworts to gymnosperms and angiosperms, RDR2 ho- mologs have only been found in gymnosperm and angiosperm genomes (Huang et al. 2015), which may indicate a fundamental difference between Pol IV complexes between seed and non-seed bearing plants. Classification of RNAP accessory proteins promises to further illu- minate the differences in structure and function of Pol IV and V complexes in plants. (The characterization of one such accessory protein is discussed in Chapter 2 of this work.) CHAPTER 1. INTRODUCTION 6

1.2 Transcriptional activity of Pol IV

Although Pol IV was first described ten years ago (Herr et al. 2005, Onodera et al. 2005, Pontier et al. 2005), its transcripts remained elusive. Pol IV transcripts were previously inferred from their ultimate products: 24-nt RNAs. However, the short sequence length and repetitious nature of 24-nt RNAs made identifying sites of Pol IV transcription difficult, particularly with highly repetitive genomes like maize.

Nature of Pol IV products RPD1 mutants have been identified in Arabidopsis (Kanno et al. 2005, Herr et al. 2005, Onodera et al. 2005, Pontier et al. 2005), its relative Brassica rapa (Huang et al. 2013), and maize (Erhard et al. 2009). In all three cases, Pol IV is required for the production of 24-nt RNAs whose sequences indicate possible sites of Pol IV transcription directly in Arabidopsis (Zhang et al. 2007, Mosher et al. 2008) and indirectly by identifying RDR2-dependent 24-nt RNAs in maize (Nobuta et al. 2008, Gent et al. 2014). These small RNA profiles identified both potential originating loci and targets of 24-nt RNA-mediated cytosine methylation but could not characterize the nature of the Pol IV transcript itself. In vitro transcription reactions identified a prefered Pol IV target: a bipartite molecule of DNA template and RNA primer reminiscent of a transcription bubble (Haag et al. 2012). The tight coupling of Pol IV and RDR2 (Law et al. 2011, Haag et al. 2012, Haag et al. 2014) indicates that the difficulty in identifying Pol IV transcripts might relate to the rapid cleavage of dsRNAs from Pol IV complexes into 24-nt RNAs. Therefore, to identify 24-nt RNA precursors, the Chen group used an Arabidopsis -like2 (dcl2 ), dcl3, dcl4 triple mutant to enrich for uncleaved Pol IV-RDR2 dsRNAs (Li et al. 2015). From these molecules, they found that Pol IV transcripts lack the capping, splicing and poly-adenylation of Pol II transcripts (Li et al. 2015), consistent with the absence of an RPB1-like CTD which coordinates the required for mRNA processing (Egloff & Murphy 2008). Instead, Pol IV transcripts both begin with a 5 ′-monophosphate and lack the characteristic strand preference of Pol II (Li et al. 2015).

Nature of Pol IV templates Pol IV, like Pols II and V, have a preference for transcription initiation at AT-rich, nucleo- some poor regions based on alignment of uncleaved Pol IV-RDR2 products to the Arabidopsis genome (Li et al. 2015). In maize, there is evidence of Pol IV activity at gene boundaries through both 24-nt RNA alignments (Gent et al. 2014, Erhard et al. 2015) and alterations to Pol II transcription profiles in the absence of Pol IV (Erhard et al. 2015, Chapter 3 of this work). In both organisms, Pol IV activity is enriched over TEs (Zhang et al. 2007, Mosher et al. 2008, Nobuta et al. 2008, Gent et al. 2014). However, unlike Arabidopsis, the maize genome is >85% repetitive and its TEs are often immediately abutting genes or embedded within their introns (Schnable et al. 2009, Baucom et al. 2009). Pol IV is hypothesized to act CHAPTER 1. INTRODUCTION 7

near genes to control these infiltrating TEs, largely without compromising gene expression (Gent et al. 2014). Indeed some examples of Pol IV-dependent tissue-specific gene silencing through interactions with a nearby TE have been characterized in maize (Erhard et al. 2013, Chapter 3 of this work).

Pol IV recruitment proteins Some of the accessory proteins identified in Arabidopsis and maize may have recruitment roles helping define certain Pol IV loci. The best characterized of these is the reader SAWADEE HOMEODOMAIN HOMOLOG1/DNA-BINDING TRANSCRIPTION FACTOR1 (SHH1/DTF1) (Law et al. 2011, Zhang et al. 2013). The SAWADEE home- odomain of SHH1/DTF1 adopts a tandem Tudor domain-like fold that recognizes histone 3 lysine 4 (H3K4) and H3K9 methylation statuses (Law et al. 2013). In particular, SHH1 prefers unmethylated or monomethylated H3K4 and methylated H3K9 (Law et al. 2013). Combined with its physical association with Pol IV (Law et al. 2011), such affinity to re- pressed chromatin is hypothesized to explain Pol IV recruitment to already silenced TEs further maintaining silencing (Law et al. 2013). However, SHH1-dependent 24-nt RNAs ac- count for only half of Pol IV-dependent 24-nt RNAs (Law et al. 2013), indicating that other Pol IV recruitment mechanisms must also exist. Maize Pols IV and V both associate with SHH2 (Haag et al. 2014), a putative ortholog of the other Arabidopsis SAWADEE home- odomain protein that also preferentially binds H3K4 and H3K9me residue combinations (Law et al. 2013, Zhang et al. 2013), raising the question of how SHH2 may affect maize Pols IV and V recruitment. Pol V association with its template DNA requires a three-subunit putative chromatin- remodeler complex which includes a Rad54-like ATPase (DEFECTIVE IN RNA-DIRECTED DNA METHYLATION1: DRD1) that is predicted to have a role in unwinding template DNA (Zhong et al. 2012, Matzke & Mosher 2014). Close homologs of DRD1, RMR1 (maize) and both CLSY1 and RMR1-like proteins (Arabidopsis) physically associate with Pol IV complexes (Law et al. 2011, Haag et al. 2014). DRD1, CLSY1, and RMR1 each represent distinct, though related, clades (Hale et al. 2007) raising the question of whether CLSY1 and RMR1 play DRD1-like roles for Pol IV. Arabidopsis Pol IV complexes have poor in vitro activity on tripartite templates where strand displacement is required (Haag et al. 2012), although the necessary accessory proteins for this processivity might have been absent from the immuno-purified Pol IV complex as only RNAP subunits, RDR2, and a nuclear import protein were identified by mass spectrometry of similarly immuno-purified Pol IV complexes (Haag et al. 2012). Curiously, RMR1 affects transcript stability rather than transcription rate of at least one particular maize allele (Hale et al. 2007), which is inconsistent with a role in Pol IV recruitment because RPD1 loss affects the transcription rate of that same allele (Hollick et al. 2005). Further functional and biochemical analyses are needed to fully catalog and understand the details of Pol IV recruitment, regulation and transcription. CHAPTER 1. INTRODUCTION 8

1.3 Functions of Pol IV in plants

Studies of Pol IV mutants in Arabidopsis and maize have identified two mechanisms by which they affect Pol II transcription. Pol IV-dependent 24-nt RNAs can trigger changes in the chromatin environment that inhibits Pol II transcription (Matzke & Mosher 2014, Matzke et al. 2015), or independent of these 24-nt RNAs and the downstream proteins needed to make them, maize Pol IV can prevent Pol II transcription through a competition mechanism (Hale et al. 2009). The details of these two potentially overlapping mechanisms are discussed below.

Inhibition of Pol II transcription by chromatin modification In our current understanding of the Pol IV RdDM pathway (Matzke & Mosher 2014, Matzke et al. 2015), Pol IV transcripts are immediately made double-stranded by RDR2 in the Pol IV complex. The dsRNA is then cleaved into 24-nt RNAs by DICER-LIKE3 (DCL3). Specific ARGONAUTE (AGO) proteins (AGO4 and AGO6 types) incorporate the resulting 24-nt RNAs. Protein interactions between AGO and WG/GW motifs in Pol V’s RPE1 CTD as well as base-pairing interactions between the 24-nt RNA and the Pol V nascent transcript presumably collaborate to recruit 24-nt RNA-AGO complexes to sites of Pol V transcription. Together, this complex recruits de novo cytosine methyltransferases of the DOMAINS REARRANGED METHYLASE (DRM) class. Cytosine methylation in plants exists in three contexts: symmetric CG and CHG as well as asymmetric CHH, where H is an A, T, or C nucleotide. Maintenance methyltrans- ferases can maintain CG and CHG methylation from hemimethylated templates, but CHH methylation must be newly deposited after each mitosis through the memory of either 24-nt RNAs via DRM2 in the RdDM pathway or chromatin modifications interpreted by CHRO- MOMETHYLASE2 (CMT2) methyltransferase (Zemach et al. 2013, Stroud et al. 2013). Meanwhile, H3K9 methyltransferase KRYPTONITE is recruited by 5-methyl cytosine mod- ifications (Du et al. 2014). Together, the recruitment of cytosine methyltransferases through histone modifications and recruitment of histone methyltransferases by cytosine methylation presents a cyclic loop to maintain a repressed chromatin state (Du et al. 2014).

Inhibition of Pol II transcription by competition with Pol IV Loss of the Pol IV RPD1 subunit has many effects on maize plants that are not shared by plants lacking other 24-nt biogenesis proteins (Parkinson et al. 2007, Erhard et al. 2009, Hale et al. 2009). Increases in poly-adenylated CRM retrotransposon transcripts in the absence of RPD1, but not in rdr2 nor rmr1 mutants, led to the hypothesis that Pol IV competes with Pol II at certain loci (Hale et al. 2009). While mechanisms explaining these competitions include either titration of shared subunits or physical competition for a locus, both ultimately result in an inhibition of Pol II at undesired loci, like TEs. This early experiment (Hale et al. 2009) also setup a system to distinguish between polymerase competition and 24-nt RNA CHAPTER 1. INTRODUCTION 9

mediated Pol II regulation, by testing both the effects of Pol IV (RPD1) loss and the loss of downstream 24-nt RNA biogenesis proteins. Similar tests indicate that certain pl1 alleles are controlled by an upstream TE fragment through a 24-nt RNA mechanism in maize kernels (Erhard et al. 2013). While the genome-wide contributions of 24-nt RNA-mediated and polymerase competition-mediated Pol II regulation remain unclear, high throughput sequencing of nascent transcripts (GRO-seq, Core et al. 2008) provides a new way to identify changes in Pol II transcription and identify genes that may be targets of Pol IV-mediated regulation (Erhard et al. 2015, Chapter 3 of this work).

1.4 Examples of genome regulation by Pol IV

Studies of maize and Arabidopsis Pol IV mutants have catalogued many associated phe- notypes. The effects of rpd1 and accessory protein mutants on regulation of both endogenous genes and TEs have identified both common and distinct behaviors. These mutants can also affect development and regulatory states of epialleles. While the underlying mechanisms are still largely unknown, this catalog of mutant phenotypes provides an important frame- work that must be accommodated by any proposed mechanisms of Pol IV-mediated genome regulation.

Roles for Pol IV in development Plants deficient in Pol IV activity through rpd1 mutant alleles have varying effects on development. Arabidopsis nrpd1 mutants are delayed in flowering but otherwise develop as expected under normal environmental conditions (Pontier et al. 2005). The B. rapa nrpd1 mutants have no obvious developmental defects (Huang et al. 2013). In contrast to these eudicot representatives, maize rpd1 mutants have several developmental defects. Maize rpd1 mutants are shorter and have delayed flowering compared to their wild-type siblings (Parkinson et al. 2007). Such mutants can also display feminization of male tassels, leaf tissue mispolarization and tissue outgrowths on their stems (Parkinson et al. 2007, Erhard et al. 2009). While the dysregulated gene(s) underlying most of these phenotypes are still being identified, epistasis experiments with silkless1 (sk1 )–a locus encoding a protector of female pistil abortion (Calderon-Urrea & Dellaporta 1999)–indicate that sk1 ectopic expression in rpd1 mutant tassels may lead to the feminized tassel phenotype (Parkinson et al. 2007). The sk1 genomic locus has not been identified, prohibiting detailed characterization of its genomic context or mRNA abundance. However, the lack of similar phenotypes in downstream 24-nt RNA biogenesis mutants (Hale et al. 2009, Stonaker et al. 2009, Barbour et al. 2012) supports a polymerase competition model to explain the dysregulation of sk1 in rpd1 mutants. While still largely mechanistically uncharacterized, the developmental phenotypes of maize rpd1 mutants stand as compelling examples of the importance of Pol IV in genome regulation. CHAPTER 1. INTRODUCTION 10

Roles for Pol IV in paramutation Pol IV is required for the establishment and maintenance of epigenetic states known as paramutations (Hollick et al. 2005). Paramutations describe both the action and result of allele-specific, directed, and heritable epigenetic changes that are facilitated by trans- homolog interactions (THI) (Brink 1958, Hollick 2012). Several examples of paramutation and paramutation-like behaviors have been described in plants and animals (Chandler & Stam 2004, Gabriel & Hollick 2015). Likely because of their visible phenotypes, the best understood paramutation examples occur at alleles of maize transcription factors controlling kernel or plant pigmentation. These include the B1-Intense allele of booster1 (b1 ), the Pl1- Rhoades allele of purple plant1 (pl1 ), as well as alleles of red color1 (r1 ) and pericarp color1 (p1 ) (Brink 1956, Coe 1959, Hollick et al. 1995, Sidorenko & Peterson 2001). However, despite their continued characterization, the underlying mechanisms governing these specific THIs, including the role of Pol IV complexes, remain unclear. The Pl1-Rhoades (Pl1-Rh) allele exemplifies several paramutation behaviors that are af- fected by Pol IV. Pl1-Rh exists in distinct epigenetic states: Pl ′ and Pl-Rh, which have low vs. high expression, respectively, and result in light vs. dark pigmentation of several tissues including pollen producing anthers (Hollick et al. 1995). When combined in a heterozy- gote, the paramutagenic Pl ′ allele facilitates the repression of the paramutable Pl-Rh allele through an unknown mechanism that requires RPD1 (Pol IV) and RDR2 (Dorweiler et al. 2000, Hollick et al. 2005). The heterozygote then transmits only Pl ′ alleles regardless of which homologous chromosome (from either paternal or maternal origins) is passed to the next generation. Mosaic studies of the establishment of b1 paramutation indicate that such heterozygotes experience a gradual repression that initially requires the continued presence of the paramutagenic state (Coe 1961). At an unknown point in development, the repression of the paramutable allele becomes meiotically stable. Analogously, loss of either RMR1 or RMR2 (Pol IV accessory proteins) will interrupt early silencing of the paramutable allele in Pl ′ / Pl-Rh heterozygotes, but new meiotically heritable paramutations can still be trans- mitted to the next generation, although not at 100% efficiency (Hale et al. 2007, Barbour et al. 2012). RPD1 is also required for the somatic and meiotic maintenance of existing Pl ′ paramu- tations (Hollick et al. 2005). The Pl ′ state is normally stable, however, when it is combined with a deficiency on the homologous chromosome some of the Pl1-Rh alleles are transmit- ted to the next generation in a derepressed (or reverted) Pl-Rh state (Hollick & Chandler 1998, Gross & Hollick 2007). This finding indicates that, just as THIs are important for establishing paramutations, they are equally necessary for maintaining existing paramuta- tions. Neutral pl1 alleles that do not participate in paramutation can be divided into two classes: those that stabilize Pl ′ in trans (stabilizing alleles) and those that do not (amor- phic alleles) (Hollick & Chandler 1998, Gross & Hollick 2007). This finding indicates that the cis-regulatory sequences important for maintenance THIs are genetically separable from those required for establishment of new paramutations, although their necessity in paramu- tation remains unclear. Again, Pol IV complexes (including RPD1, RDR2, and RMR1) are CHAPTER 1. INTRODUCTION 11 required as trans-acting factors to maintain existing paramutations (Dorweiler et al. 2000, Hollick & Chandler 2001, Hollick et al. 2005). However, neither Pol IV subunit RP(D/E)2a nor Pol IV accessory protein RMR2 are required to maintain Pl ′ through meiosis (Stonaker et al. 2009, Barbour et al. 2012). THIs are required for both the establishment and maintenance of paramutations. How- ever, the nature of these interactions between homologous chromosomes remains unclear. Two mechanisms for paramutation THI have been proposed where either physical contact of the homologous chromosomes or a diffusible substance transmits (paramutation estab- lishment) or reinforces (paramutation maintenance) the repressive epigenetic marks. The involvement of Pol IV complexes in both behaviors predicts that Pol IV-dependent 24-nt RNAs might represent a diffusible substance for paramutation THIs. However, the complex array of effects that different Pol IV subunits and accessory proteins have on paramutation behaviors necessitates a special subset of 24-nt RNAs that are only lost in rpd1 and rdr2 mutants to prevent the establishment of paramutations and are only absent in rpd1, rdr2 and rmr1 mutants to disrupt the maintenance of paramutations. A more complete sequence of the regions being sensed via THI is needed to help distinguish between these two models. None of the known paramutable or paramutagenic alleles were sequenced in the B73 genome (Schnable et al. 2009). While the reference genome is useful for many gene-centric activities, the diversity of maize lines, particularly in terms of repeat content in intergenic regions (Wang & Dooner 2006), makes it difficult to predict the nature of important cis- regulatory sequences of these paramutation alleles. The paramutable B1-Intense (B1-I ) haplotype has been best characterized, including the identification of seven near-identical repeats of 853 bases each that serve as a distal enhancer sufficient for B1-I paramutation (Stam et al. 2002a, Stam et al. 2002b, Belele et al. 2013). With the advent of very long-read sequencing technologies like Oxford Nanopore and Pacific Biosciences platforms, sequencing of paramutation allele haplotypes can extend over complex repetitive regions common in in- tergenic maize sequences (Wang & Dooner 2006, Schnable et al. 2009) to potentially identify new cis-regulatory sequences that can then be used as references to identify where, when and how Pol IV complexes help to mediate different trans-homolog interactions important for paramutations (discussed in Chapter 4 of this work).

Gene silencing by nearby transposable elements (TEs) Barbara McClintock initially described TEs as controlling elements in reference to their ability to affect gene expression through the combined action of nearby and distant TE sequences (McClintock 1951). Approximately 85% of the maize genome is estimated to be made of TE sequences, which are found both in heterochromatic regions and in close juxtaposition to protein coding genes (Schnable et al. 2009, Baucom et al. 2009). At Pl1- Rhoades, an immediately upstream CACTA TE fragment of the doppia family is a target of Pol IV complexes: the absence of RPD1 conditions ectopic pl1 expression in kernels (Erhard et al. 2013). Similar ectopic expression patterns are observed both with loss of other 24-nt RNA biogenesis proteins and with other pl1 alleles that have similar doppia fragments but CHAPTER 1. INTRODUCTION 12

do not participate in paramutations (Erhard et al. 2013). Recent genome-wide sequencing of maize nascent transcription, has identified 128 genes that have significantly increased or decreased sense-strand transcription in the absence of RPD1 (Erhard et al. 2015, Chapter 3 of this work). Together these present several candidate genes that may be controlled through Pol IV effects on nearby TEs.

1.5 Remaining questions

Pol IV complexes play roles in TE suppression, paramutation, gene expression and devel- opment, all by affecting Pol II transcription. Two general mechanisms have been identified to explain Pol IV effects on Pol II transcription: through the downstream products of Pol IV (24-nt RNAs) that mediate inhibitory chromatin modifications, and through polymerase competition, presumably independent of Pol IV products. Yet several unanswered ques- tions remain both in terms of general Pol IV biology and its role(s) in regulating genome homeostasis. Phylogenetic, genetic and proteomic evidence indicates that multiple types of Pol IV complexes exist through varied incorporation of subunit paralogs and accessory proteins, but just how many types of Pol IV complexes exist; what is the extent of their functional overlap; and which associated proteins are crucial to define a particular functional role remain unclear. While many details of the TE-suppressing RdDM pathway have been described, the species specific importance of RdDM-like silencing and the amount of inter- play with other epigenetic pathways remain unclear, particularly outside of the model plants maize and Arabidopsis. More study of Pol IV complexes in other plants will be needed to determine the general trends of Pol IV function and help to determine the evolutionary role for increased RNAP diversity in plants. Understanding Pol IV’s role(s) in maize develop- ment and how gene expression can become regulated by Pol IV promise to open new avenues for breeding or engineering epigenetically controlled phenotypic variation in this and other crops. The mechanism(s) underlying paramutation have long been enigmatic, identifying the key cis-regulatory sequencings and characterizing the effects of Pol IV complexes on those sequences promises to increase our understanding of both the establishment and maintenance of paramutation. In this dissertation, I begin to address several of these questions by exploring how Pol IV complexes affect maize genome regulation. In Chapter 2, I discuss the identification of a new Pol IV accessory protein (RMR2) and use phylogenetic and bioinformatic analyses to explore potential roles for RMR2, which defines a family of proteins of unknown function found in multicellular plants. I also evaluate a potential functional link with RP(D/E)2a defined Pol IV subtypes through characterization of genetic interactions between rmr2-1 and several alleles of rp(d/e)2a. In Chapter 3, to characterize the effects of Pol IV complexes on genome-wide transcription, I analyzed nascent transcriptome data of both wild-type and rpd1 mutant seedlings. With the use of a versatile metagene analysis suite that I wrote (www.github.com/HollickLab/metagene_analysis), I identified a novel role of Pol IV at gene boundaries. Finally, in Chapter 4, I identify a candidate paramutation element for the Pl1- CHAPTER 1. INTRODUCTION 13

Rhoades allele and characterize its structural and expression variation in several neutral and Pl1-Rhoades derivative alleles. With the extended Pl1-Rhoades reference sequence, I aligned 24-nt RNAs from several rmr mutant libraries to build a potential model for Pol IV complexes associated with this region and their results on paramutation behavior. Together this work increases the knowledge of Pol IV complex composition, identifies several genomic targets of Pol IV, and explores the details of Pol IV interactions at a new portion of the Pl1-Rhoades haplotype. CHAPTER 1. INTRODUCTION 14

1.6 Figures

A RNAP subunits and domains 4 7 8 6 Catalytic Core ' RPB1 Assembly Platform 5 11 Stalk Domain 3 10 RPB2 9 12 Bacterial RNAP Eukaryotic Pol II B Maize Pol II, IV and V subtypes in callus tissue

Pol IIa Pol IIb RPB2a RPB2b

Pol IVa Pol IVb RP(D/E)2a RP(D/E)2b

Pol Va Pol Vb Pol Vc RP(D/E)2a RP(D/E)2b RPE2c

Figure 1.1: RNAP holoenzyme composition. (A) Models of bacterial RNAP and eukaryotic Pol II holoenzymes. Identical colors highlight homology between bacterial and eukaryotic RNAP subunits. The three main domains–catalytic core (blues), assembly platform (greens), and stalk domain (pink)–are highlighted. To save space, only the subunit number is rep- resented for the smaller Pol II subunits. (B) Diagrams of the Pol II, IV and V subtypes identified in maize callus tissue (Haag et al. 2014) based on their different second-largest subunits. Smaller Pol IV/V subunits that are still shared with Pol II are kept in yellow, while Pol V subunits that are still shared with Pol IV are kept in bright green. CHAPTER 1. INTRODUCTION 15

1.7 References

1. Alleman, M., Sidorenko, L., McGinnis, K., Seshadri, V., Dorweiler, J. E., White, J., Sikkink, K. & Chandler, V. L. (2006) An RNA-dependent RNA polymerase is required for paramutation in maize. Nature 442, 295–298. 2. Armache, K.-J., Kettenberger, H. & Cramer, P. (2003) Architecture of initiation- competent 12-subunit RNA polymerase II. Proceedings of the National Academy of Sciences 100, 6964–6968. 3. Barbour, J. R., Liao, I. T., Stonaker, J. L., Lim, J. P., Lee, C. C., Parkinson, S. E., Kermicle, J., Simon, S. A., Meyers, B. C., Williams-Carrier, R., Barkan, A. & Hollick, J. B. (2012) required to maintain repression2 is a novel protein that facilitates locus- specific paramutation in maize. The Plant Cell 24, 1761–1775. 4. Baucom, R. S., Estill, J. C., Chaparro, C., Upshaw, N., Jogi, A., Deragon, J.-M., Westerman, R. P., SanMiguel, P. J. & Bennetzen, J. L. (2009) Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genetics 5, e1000732. 5. Belele, C. L., Sidorenko, L., Stam, M., Bader, R., Arteaga-Vazquez, M. A. & Chandler, V. L. (2013) Specific tandem repeats are sufficient for paramutation-induced trans- generational silencing. PLoS Genetics 9, e1003773. 6. Bellaoui, M., Keddie, J. S. & Gruissem, W. (2003) DCL is a plant-specific protein re- quired for plastid ribosomal RNA processing and embryo development. Plant Molecular Biology 53, 531–543. 7. Brink, R. A. (1973) Paramutation. Annual Review Genetics 7, 129–152. 8. Brink, R. A. (1956) A genetic change associated with the R locus in maize which is directed and potentially reversible. Genetics 41, 872–889. 9. Brink, R. A. (1958) Paramutation at the R locus in maize. Cold Spring Harbor Symposia on Quantitative Biology 23, 379–391. 10. Calderon-Urrea, A. & Dellaporta, S. L. (1999) Cell death and cell protection genes determine the fate of pistils in maize. Development 126, 435–441. 11. Castel, S. E. & Martienssen, R. A. (2013) RNA interference in the nucleus: roles for small RNAs in transcription, epigenetics and beyond. Nature Reviews Genetics 14, 100–112. 12. Chan, S. W.-L., Zilberman, D., Xie, Z., Johansen, L. K., Carrington, J. C. & Jacobsen, S. E. (2004) RNA silencing genes control de novo DNA methylation. Science 303, 1336. 13. Chandler, V. L. & Stam, M. (2004) Chromatin conversations: mechanisms and impli- cations of paramutation. Nature Reviews Genetics 5, 532–544. 14. Coe, E. H. (1959) A regular and continuing conversion-type phenomenon at the B locus in maize. Proceedings of the National Academy of Sciences 45, 828–832. CHAPTER 1. INTRODUCTION 16

15. Coe, E. H. Jr. (1961) A test for somatic mutation in the origination of conversion-type inheritance at the B locus in maize. Genetics 46, 707–710. 16. Core, L. J., Waterfall, J. J. & Lis, J. T. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845– 1848. 17. Cramer, P., Bushnell, D. A. & Kornberg, R. D. (2001) Structural basis of transcription: RNA Polymerase II at 2.8 Ångstrom resolution. Science 292, 1863–1876. 18. Dorweiler, J. E., Carey, C. C., Kubo, K. M., Hollick, J. B., Kermicle, J. L. & Chandler, V. L. (2000) mediator of paramutation1 is required for establishment and maintenance of paramutation at multiple maize loci. The Plant Cell 12, 2101–2118. 19. Du, J., Johnson, L. M., Groth, M., Feng, S., Hale, C. J., Li, s., Vashisht, A. A., Gallego- Bartolome, J., Wohlschlegel, J. A., Patel, D. J. & Jacobsen, S. E. (2014) Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Molecular Cell 55, 495–504. 20. Egloff, S. & Murphy, S. (2008) Cracking the RNA polymerase II CTD code. Trends in Genetics 24, 280–288. 21. Erhard, K. F. Jr. & Hollick, J. B. (2011) Paramutation: A process for acquiring trans- generational regulatory states. Current Opinions in Plant Biology 14, 1–7. 22. Erhard, K. F. Jr., Parkinson, S. E., Gross, S. M., Barbour, J. R., Lim, J. P. & Hollick, J. B. (2013) Maize RNA Polymerase IV defines trans-generational epigenetic variation. The Plant Cell 25, 808–819. 23. Erhard, K. F. Jr., Stonaker, J. L., Parkinson, S. E., Lim, J. P., Hale, C. J. & Hollick, J. B. (2009) RNA Polymerase IV functions in paramutation in Zea mays. Science 323, 1201–1205. 24. Erhard, K. F. Jr., Talbot, J. R. B., Deans, N. C., McClish, A. E. & Hollick, J. B. (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199, 1107–1125. 25. Gabriel, J. M. & Hollick, J. B. (2015) Paramutation in maize and related behaviors in metazoans. Seminars in Cell and Developmental Biology Accepted Manuscript. 26. Gent, J. I., Madzima, T. F., Bader, R., Kent, M. R., Zhang, X., Stam, M., McGinnis, K. M. & Dawe, R. K. (2014) Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. The Plant Cell. 27. Gross, S. M. & Hollick, J. B. (2007) Multiple trans-sensing interactions affect meiotically heritable epigenetic states at the maize pl1 locus. Genetics 176, 829–839. 28. Haag, J. R., Brower-Toland, B., Krieger, E. K., Sidorenko, L., Nicora, C. D., Norbeck, A. D., Irsigler, A., LaRue, H., Brzeski, J., McGinnis, K., Ivashuta, S., Pasa-Tolic, L., Chandler, V. L. & Pikaard, C. S. (2014) Functional diversification of maize RNA poly- merase IV and V subtypes via alternative catalytic subunits. Cell Reports 9, 378–390. CHAPTER 1. INTRODUCTION 17

29. Haag, J. R., Ream, T. S., Marasco, M., Nicora, C. D., Norbeck, A. D., Pasa-Tolic, L. & Pikaard, C. S. (2012) In vitro transcription activities of Pol IV, Pol V, and RDR2 reveal coupling of Pol IV and RDR2 for dsRNA synthesis in plant RNA silencing. Molecular Cell 48, 811–818. 30. Hale, C. J., Erhard, K. F. Jr., Lisch, D. & Hollick, J. B. (2009) Production and pro- cessing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genetics 5, e1000598. 31. Hale, C. J., Stonaker, J. L., Gross, S. M. & Hollick, J. B. (2007) A novel Snf2 protein maintains trans-generational regulatory states established by paramutation in maize. PLoS Biology 5, e275. 32. Herr, A. J., Jensen, M. B., Dalmay, T. & Baulcombe, D. C. (2005) RNA polymerase IV directs silencing of endogenous DNA. Science 308, 118–120. 33. Hollick, J. B., Patterson, G. I., Coe, E. H. Jr., Cone, K. C. & Chandler, V. L. (1995) Allelic interactions heritably alter the activity of a metastable maize Pl allele. Genetics 141, 709–719. 34. Hollick, J. B. (2012) Paramutation: a trans-homolog interaction affecting heritable gene regulation. Current Opinion in Plant Biology 15, 536–543. 35. Hollick, J. B. (2010) Paramutation and development. Annual Review of Cell and De- velopmental Biology 26, 557–579. 36. Hollick, J. B. & Chandler, V. L. (1998) Epigenetic allelic states of a maize transcrip- tional regulatory locus exhibit overdominant gene action. Genetics 150, 891–897. 37. Hollick, J. B. & Chandler, V. L. (2001) Genetic factors required to maintain repression of a paramutagenic maize pl1 allele. Genetics 157, 369–378. 38. Hollick, J. B., Kermicle, J. L. & Parkinson, S. E. (2005) Rmr6 maintains meiotic inheritance of paramutant states in Zea mays. Genetics 171, 725–740. 39. Huang, Y., Kendall, T., Forsythe, E. S., Dorantes-Acosta, A., Li, S., Caballero-Perez, J., Chen, X., Arteaga-Vazquez, M., Beilstein, M. A. & Mosher, R. A. (2015) Ancient origin and recent innovations of RNA polymerase IV and V. Molecular Biology and Evolution 32, 1788–1799. 40. Huang, Y., Kendall, T. & Mosher, R. A. (2013) Pol IV-dependent siRNA production is reduced in Brassica rapa. Biology 2, 1210–1223. 41. Jokerst, R. S., Weeks, J. R., Zehring, W. A. & Greenleaf, A. L. (1989) Analysis of the gene encoding the largest subunit of RNA polymerase II in Drosophila. Molecular and General Genetics 215, 266–275. 42. Kanno, T., Huettel, B., Mette, M. F., Aufsatz, W., Jaligot, E., Daxinger, L., Kreil, D. P., Matzke, M. & Matzke, A. J. M. (2005) Atypical RNA polymerase subunits required for RNA-directed DNA methylation. Nature Genetics 37, 761–765. CHAPTER 1. INTRODUCTION 18

43. Law, J. A., Du, J., Hale, C. J., Feng, S., Krajewski, K., Palanca, A. M. S., Strahl, B. D., Patel, D. J. & Jacobsen, S. E. (2013) Polymerase IV occupancy at RNA-directed DNA methylation sites requires SHH1. Nature 498, 385–389. 44. Law, J. A., Vashisht, A. A., Wohlschlegel, J. A. & Jacobsen, S. E. (2011) SHH1, a homeodomain protein required for DNA methylation, as well as RDR2, RDM4, and chromatin remodeling factors, associate with RNA polymerase IV. PLoS Genetics 7, e1002195. 45. Li, S., Vandivier, L. E., Tu, B., Gao, L., Won, S. Y., Li, S., Zheng, B., Gregory, B. D. & Chen, X. (2015) Detection of Pol IV/RDR2-dependent transcripts at the genomic scale in Arabidopsis reveals features and regulation of siRNA biogenesis. Genome Research 25, 235–245. 46. Luo, J. & Hall, B. D. (2007) A multistep process gave rise to RNA Polymerase IV of land plants. Journal of Molecular Evolution 64, 101–112. 47. Mathew, R. & Chatterji, D. (2006) The evolving story of the omega subunit of bacterial RNA polymerase. TRENDS in Microbiology 14, 450–455. 48. Matzke, M. A., Kanno, T. & Matzke, A. J. (2015) RNA-directed DNA methylation: the evolution of a complex epigenetic pathway in flowering plants. Annual Review of Plant Biology 66, 9.1–9.25. 49. Matzke, M. A. & Mosher, R. A. (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics 15, 394–408. 50. McClintock, B. (1951) Chromosome organization and genic expression. Cold Spring Harbor Symposia on Quantitative Biology 16, 13–47. 51. Mosher, R. A., Schwach, F., Studholme, D. & Baulcombe, D. C. (2008) PolIVb influ- ences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proceedings of the National Academy of Sciences 105, 3145–3150. 52. Nobuta, K., Lu, C., Shrivastava, R., Pillay, M., De Paoli, E., Accerbi, M., Arteaga- Vazquez, M., Sidorenko, L., Jeong, D.-H., Yen, Y., Green, P. J., Chandler, V. L. & Meyers, B. C. (2008) Distinct size distribution of endogenous siRNAs in maize: evidence from deep sequencing in the mop1-1 mutant. Proceedings of the National Academy of Sciences 105, 14958–14963. 53. Onodera, Y., Haag, J. R., Ream, T., Nunes, P. C., Pontes, O. & Pikaard, C. S. (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent formation. Cell 120, 613–622. 54. Parkinson, S. E., Gross, S. M. & Hollick, J. B. (2007) Maize sex determination and abaxial leaf fates are canalized by a factor that maintains repressed epigenetic states. Developmental Biology 308, 462–473. CHAPTER 1. INTRODUCTION 19

55. Pontier, D., Yahubyan, G., Vega, D., Bulski, A., Saez-Vasquez, J., Hakimi, M.-A., Lerbs- Mache, S., Colot, V. & Lagrange, T. (2005) Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA poly- merases IV in Arabidopsis. Genes and Development 19, 2030–2040. 56. Ream, T. S., Haag, J. R., Wierzbicki, A. T., Nicora, C. D., Norbeck, A. D., Zhu, J.-K., Hagen, G., Guilfoyle, T. J., Pasa-Tolic, L. & Pikaard, C. S. (2009) Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II. Molecular Cell 33, 192–203. 57. Sabin, L. R., Delas, M. J. & Hannon, G. J. (2013) Dogma derailed: the many influences of RNA on the genome. Molecular Cell 49, 783–794. 58. Schnable, P. S. et al. (2009) The B73 maize genome: complexity, diversity, and dynam- ics. Science 326, 1112–1115. 59. El-Shami, M., Pontier, D., Lahmy, S., Braun, L., Picart, C., Vega, D., Hakimi, M.-A., Jacobsen, S. E., Cooke, R. & Lagrange, T. (2007) Reiterated WG/GW motifs form functionally and evolutionarily conserved ARGONAUTE-binding platforms in RNAi- related components. Genes and Development 21, 2539–2544. 60. Sidorenko, L. V. & Peterson, T. (2001) Transgene-induced silencing identifies sequences involved in the establishment of paramutation of the maize p1 gene. The Plant Cell 13, 319–335. 61. Sidorenko, L., Dorweiler, J. E., Cigan, A. M., Arteaga-Vazquez, M., Vyas, M., Kermicle, J., Jurcin, D., Brzeski, J., Cai, Y. & Chandler, V. L. (2009) A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genetics 5, e1000725. 62. Stam, M., Belele, C., Dorweiler, J. E. & Chandler, V. L. (2002) Differential chromatin structure within a tandem array 100 kb upstream of the maize b1 locus is associated with paramutation. Genes and Development 16, 1906–1918. 63. Stam, M., Belele, C., Ramakrishna, W., Dorweiler, J. E., Bennetzen, J. L. & Chandler, V. L. (2002) The regulatory regions required for B ′ paramutation and expression are located far upstream of the maize b1 transcribed sequences. Genetics 162, 917–930. 64. Stonaker, J. L., Lim, J. P., Erhard, K. F. Jr. & Hollick, J. B. (2009) Diversity of Pol IV function is defined by mutations at the maize rmr7 Locus. PLoS Genetics 5, e1000706. 65. Stroud, H., Greenberg, M. V. C., Feng, S., Bernatavichute, Y. V. & Jacobsen, S. E. (2013) Comprehensive analysis of silencing mutants reveals complex regulation of the arabidopsis methylome. Cell 152, 352–364. 66. Tucker, S. L., Reece, J., Ream, T. S. & Pikaard, C. S. (2011) Evolutionary history of plant multisubunit RNA polymerases IV and V: subunit origins via genome-wide and segmental gene duplications, retrotransposition, and lineage-specific subfunctionaliza- tion. Cold Spring Harbor Symposia on Quantitative Biology 75, 285–297. CHAPTER 1. INTRODUCTION 20

67. Wang, Q. & Dooner, H. K. (2006) Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proceedings of the National Academy of Sciences 103, 17644–17649. 68. Werner, F. (2008) Structural evolution of multisubunit RNA polymerases. Trends in Microbiology 16, 247–250. 69. Werner, F. & Grohmann, D. (2011) Evolution of multisubunit RNA polymerases in the three domains of life. Nature Reviews Microbiology 9, 85–98. 70. Xie, Z., Johansen, L. K., Gustafson, A. M., Kasschau, K. D., Lellis, A. D., Zilberman, D., Jacobsen, S. E. & Carrington, J. C. (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biology 2, e104. 71. Zemach, A., Kim, M. Y., Hsieh, P.-H., Coleman-Derr, D., Eshed-Williams, L., Thao, K., Harmer, S. L. & Zilberman, D. (2013) The arabidopsis nucleosome remodeler DDM1 allows DNA methyltransferases to access H1-containing heterochromatin. Cell 153, 193–205. 72. Zhang, H., Ma, Z.-Y., Zeng, L., Tanaka, K., Zhang, C.-J., Ma, J., Bai, G., Wang, P., Zhang, S.-W., Liu, Z.-W., Cai, T., Tang, K., Liu, R., Shi, X., He, X.-J. & Zhu, J.-K. (2013) DTF1 is a core component of RNA-directed DNA methylation and may assist in the recruitment of Pol IV. Proceedings of the National Academy of Sciences 110, 8290–8295. 73. Zhang, X., Henderson, I. R., Lu, C., Green, P. J. & Jacobsen, S. E. (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proceedings of the National Academy of Sciences 104, 4536–4541. 74. Zhong, X., Hale, C. J., Law, J. A., Johnson, L. M., Feng, S., Tu, A. & Jacobsen, S. E. (2012) DDR complex facilitates global association of RNA polymerase V to promoters and evolutionarily young transposons. Nature Structural and Molecular Biology 19, 870–875. 21

Chapter 2

Pol IV subunit RP(D/E)2a genetically interacts with the Pol IV accessory protein required to maintain repression2 (RMR2)

Note: Portions of this chapter have been published in: Barbour, J. R., I. T. Liao, J. L. Stonaker, J. P. Lim, C. C. Lee, S. E. Parkinson, J. Kermicle, S. A. Simon, B. C. Meyers, R. Williams-Carrier, A. Barkan, and J. B. Hollick (2012) required to maintain repression2 is a novel protein that facilitates locus-specific paramutation in maize. The Plant Cell: 24: 1761-1775.

2.1 Introduction

DNA-dependent RNA polymerase IV (Pol IV) is a plant specific RNA polymerase (RNAP) whose best-understood role centers around the production of 24-nt RNAs that mediate de novo cytosine methylation in a pathway known as RNA-directed DNA methylation (RdDM) (reviewed by Matzke et al. 2015). Studies predominantly in Arabidopsis thaliana (Ara- bidopsis) have characterized the RdDM pathway, which maintains epigenetic silencing of transposable elements (TEs) (see Matzke et al. 2015 for review). RdDM targets found ad- jacent to protein coding genes can impact their expression in both Arabidopsis (Swiezewski et al. 2007) and in Zea mays (maize) (Erhard et al. 2013). Loss of maize Pol IV also affects expression of certain TEs (Hale et al. 2009) and genic loci (Chapter 3 of this work) inde- pendent of 24-nt RNAs, through competition with Pol II (Hale et al. 2009). At other maize loci, the mechanism of their Pol IV-dependent regulation remains unclear (Parkinson et al. 2007, Erhard et al. 2009, Erhard et al. 2015). Pol IV functions in a mutli-protein complex composed of RNAP subunits and accessory proteins (Ream et al. 2009, Haag et al. 2014). Pol IV itself derives from ancient duplications CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 22

of RNAP Pol II subunits (Luo & Hall 2007). Pol IV is distinguished from Pol II by its largest and second largest subunits (Luo & Hall 2007). Together these two subunits define the catalytic core of each multisubunit RNAP (Cramer et al. 2001), while ten additional smaller subunits complex with the catalytic core to make the RNAP holoenzyme (Cramer et al. 2001). Pol IV, and another plant specific RNAP (Pol V), share many of these ten remaining RNAP subunits with Pol II (Ream et al. 2009, Law et al. 2011, Tucker et al. 2011, Haag et al. 2014). Furthermore, different plant genomes carry different copy numbers of genes encoding RNAP subunits that may associate with Pols II, IV or V (Stonaker et al. 2009, Tucker et al. 2011, Haag et al. 2014). Maize has three plant-specific second largest subunits (RP(D/E)2a, RP(D/E)2b, and RPE2c) (Stonaker et al. 2009) which associate with the single Pol IV (RPD1) or Pol V (RPE1) largest subunit to form two Pol IV subtypes and three Pol V subtypes (Haag et al. 2014). Comparisons of rpd1 and rp(d/e)2a mutant phenotypes indicate that the two Pol IV subtypes, defined by either RP(D/E)2a or RP(D/E)2b, have distinct functions (Stonaker et al. 2009, Sidorenko et al. 2009, Erhard et al. 2013). What remains unclear is how different these complexes are in terms of alternative RNAP subunit composition, associated accessory proteins, genomic targets, and ultimate functions. Accessory proteins associate both directly and indirectly with RNAP holoenzymes to facilitate RNAP recruitment, transcription, and transcript processing (Fuda et al. 2009, Kuehner et al. 2011). Although the full suite of Pol IV accessory proteins is still being defined, Pol IV currently shares only one accessory protein with Pol II (Haag et al. 2014), which highlights the importance of accessory proteins in distinguishing Pol IV complexes from other RNAPs. Furthermore, certain accessory proteins may define subpopulations of Pol IV complexes with distinct roles. For instance, loss of SAWADEE HOMEODOMAIN HOMOLOG1 (SHH1), which binds Pol IV (Law et al. 2011) and brings it to nucleosomes with both unmodified or monomethylated Histone 3 Lysine 4 (H3K4) and methylated H3K9 residues (Law et al. 2013), only disrupts 24-nt RNA accumulation at 44% of Arabidopsis Pol IV-dependent 24-nt RNA clusters (Law et al. 2013); how the other 56% of Pol IV-dependent 24-nt RNA clusters recruit Pol IV remains unclear. Understanding the molecular mecha- nisms of Pol IV regulation in the genome will require both identifying and characterizing these accessory proteins, and also associating them with the correct Pol IV subtype(s) and complex(es). Maize required to maintain repression2 (rmr2 ) encodes a novel protein required for 24- nt RNA biogenesis or maintenance (Barbour et al. 2012), maintaining epigenetic silencing of certain loci (Hollick & Chandler 2001, McGinnis et al. 2006), and cytosine methylation (Barbour et al. 2012). While the exact role of RMR2 in these processes remains obscure, the overlap in these functions with Pol IV functions (Hollick et al. 2005, Erhard et al. 2009, Erhard et al. 2013) classify RMR2 as a candidate Pol IV accessory protein. RMR2 loss only affects a subset of Pol IV-dependent 24-nt RNAs (Erhard et al. 2009, Barbour et al. 2012), which identifies RMR2 as a putative complex-specific Pol IV accessory protein. Understanding the function of RMR2 and identifying with which Pol IV complex(es) RMR2 associates promise to help distinguish the functions different Pol IV complexes in maize. Here, I present bioinformatic and phylogenetic analysis of the RMR2-defined protein family CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 23

to help identify the function of this novel protein. Additionally, I present the identification of two new rp(d/e)2a alleles and complementation data that genetically link rmr2 function with the RP(D/E)2a containing Pol IVa subtype.

2.2 Materials and Methods

Genetic Materials and Stock Synthesis The genetic nomenclature used follows established guidelines for maize (described in detail in Hollick et al. 2005), including listing the female parent of a cross first. Hand pol- linations were used for all stock syntheses and genetic assays. All stocks contain functional alleles required for anthocyanin pigmentation in the anthers, and all plants are homozygous for the Pl1-Rhoades haplotype. Details regarding pollen and anther scoring and paramuta- tion analyses have been previously published (Hollick et al. 2005). The rmr2-1 reference allele was isolated from an ems mutagenesis as previously described (Hollick & Chandler 2001). Details of the rmr2-1 pedigree as it pertains to complementa- tion tests described in this work are provided in Figures 2.7 and 2.8 (on pages 42 and 44, respectively) and Appendix 1. The rmr2 locus is linked to the nearby b1 locus, whose protein product governs plant, but not anther, pigmentation (Dooner et al. 1991) and was highlighted in the pedigrees where applicable. All other details of rmr2 alleles can be found in Barbour et al. (2012). The rmr7-4 and rmr7-5 alleles were isolated from an ems mutagenesis as previously described (Hollick et al. 2005). Both of these were introgressed into a divergent A632 back- ground to create mapping populations (Backcross 1 F2 for both rmr7-4 and rmr7-5 ) used in their molecular identification. Portions of the rmr7-5 pedigree can be found in Figure 2.5 on page 40. Complete pedigrees for complementation tests are in Appendix 1.

PCR and Capillary Sequencing Sequences of all particular primers used can be found in Table 2.1 on page 34. Genomic DNA (gDNA) was isolated from seedling leaf samples as described (Hale et al. 2007). The sampled seedlings were grown to flowering to identify mutants via anther pigmentation. For bulk segregant analysis, gDNA was pooled in equal concentrations. Primer sets for amplifying size polymorphisms between A619 and A632 backgrounds were identified through maizeGDB (maizeGDB.org). For sequencing of the rp(d/e)2a locus, a subset of primers (Table 2.1 on page 34) from Stonaker et al. (2009) were used as previously described. PCR amplicons were sequenced at the UC Berkeley DNA Sequencing Facility (Berkeley, CA). The resulting sequences were trimmed and aligned to the B73 reference sequence (Schnable et al. 2009) using Sequencher (GeneCodes, Ann Arbor, MI) to identify potential ems-induced transition mutations resulting in amino acid changes. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 24

Multiple Sequence Alignments and Phylogenetic Analysis Sequences were obtained through protein BLAST (BLASTP, BLOSUM62 comparison matrix, allowing gaps and a word length of 3) (Altschul et al. 1990) queries with the maize RMR2 protein sequence on Phytozome (http://phytozome.net; Goodstein et al. 2012). A core ClustalX (Larkin et al. 2007) alignment on the top three protein matches for each grass, plus all Arabidopsis thaliana matches and the Physcomitrella patens top match was used to delimit the conserved C-terminal region. Additional homologs (from Phytozome or National Center for Biotechnology Information (NCBI) BLASTP searches) were then man- ually added with Jalview (Waterhouse et al. 2009) to the conserved C-terminal alignment. The final alignment used for the phylogram was chosen for breadth and depth of coverage. The second P. patens protein (Pp1s114_65V6) was excluded to avoid a 15 amino acid in- sertion in the middle of the conserved region. The manually adjusted ClustalX alignment was analyzed with PHYLIP to create a maximum likelihood bootstrapped phylogram with 1000 randomized data sets (proml subprogram: Jones-Taylor-Thornton probability model with each data set randomly input 10 times and global rearrangements for each data set) (Felsenstein 2005).

RT-PCR Paralog specific primers (Barbour et al. 2012) and primers for alanine aminotransferase as a normalization control (Woodhouse et al. 2006) (Table 2.1 on page 34) were used to amplify previously generated cDNAs (Erhard et al. 2009) from several different tissue sources as previously described (Erhard et al. 2009). These were separated on a 3% agarose gel and visualized by ethidium bromide staining. Genomic DNA from an A619 background was used as a gDNA control.

A. thaliana Expression Profile Analysis The AtGenExpress Visualization Tool (jsp.weigelworld.org/expviz/expviz.jsp) was used to query the AtGE Development microarray database (Schmid et al. 2005) for the A. thaliana RMR2 family members (AT3G07730, AT1G21560, AT1G77270, and AT4G01170). AAT1 (AT1G17290) was used as a control for expression across samples and LEAFY3 (AT5G61850) as a tissue-specific control primarily to indicate the background noise when not expressed. AT3G07730 and AT4G01170 were not differentiated by the microarray hybridizations.

2.3 Results

RMR2 founds a family of plant-specific proteins The rmr2-1 reference allele was identified in a screen for the somatic derepression of an epigenetically silenced allele (Hollick & Chandler 2001). Mutator transpositions created four CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 25 additional rmr2 alleles. Next-generation sequencing from Mutator terminal inverted repeats (TIRs) (Williams-Carrier et al. 2010) identified the rmr2 locus as the GRMZM2G009208 gene model (Barbour et al. 2012). These insertions are found within the 5 ′ untranslated region (UTR) and first exon of rmr2 while the ethyl methanesulfonate (ems) induced rmr2- 1 allele encodes a nonsense mutation in the latter half of the gene (Figure 2.1A on page 37). The GRMZM2G009208 gene model encodes a protein of unknown function. EST and full- length cDNA sequences support a 366 amino acid-encoding rmr2 gene model (http://maize- sequence.org). TBLASTN searches of the predicted RMR2 protein identified two putative paralogs (GRMZM2G109217 and GRMZM2G003389) and hypothetical or predicted proteins in other multicellular plants, all lacking confirmed functional annotations. Multiple sequence alignments of these potentially related proteins (Figure 2.1B on page 37) highlighted a con- served carboxy-terminal region (CTR) corresponding to approximately one-third (amino acids 252 to 354) of the predicted RMR2 protein. I used this conserved region in a position- specific iterated BLAST search (Altschul et al. 1997) to identify more distantly related pro- teins, but this analysis returned only hypothetical and similarly unconfirmed annotations exceeding an expected = 0.01 threshold. Representative alignments of proteins sharing the conserved RMR2 CTR (Figure 2.1B on page 37) defined a unique arrangement of amino acid residues conserved from mosses through eudicots. Several highly conserved motifs can be identified (Figure 2.1C on page 37), although their functions remain unclear. Based on these conserved CTRs, phylogenetic analysis of representative RMR2-like plant sequences identify several clades within the RMR2 family (Figure 2.2 on page 38). Phylogenetic surveys of existing genome sequences indicate that all grass genomes encode three distinct proteins containing this conserved region. These predicted proteins separate into three clades distinct from the clade containing the multiple Arabidopsis thaliana representatives. Despite the predicted A. thaliana proteins sharing the conserved carboxy-terminus, none appear to be orthologous to the proteins representing grass clade A that contains RMR2. Predictive bioinformatic tools and existing rmr2 sequence data were used to better un- derstand the potential molecular function of RMR2. In silico translation predicts that rmr2 encodes a 40.5 kilodalton (kDa), negatively charged protein (theoretical pI = 4.70) (http://web.expasy.org/protparam; Gasteiger et al. 2005). The Plant-mLoc protein sort- ing algorithm (http://www.csbio.sjtu.edu.cn/cgi-bin/PlantmPLoc.cgi; Chou & Shen 2010) predicts a nuclear localization for RMR2, but comparisons to existing nuclear localization sig- nals (https://www.predictprotein.org; Rost et al. 2004) did not identify any related nuclear localization signal. PONDR-FIT (http://www.disprot.org/pondr-fit.php; Xue et al. 2010) predicted two regions with high probability of being disordered in the amino-terminus and the middle of the polypeptide. Pfam searches did not find any conserved motifs in the presumed RMR2 primary structure matching their databases (http://www.sanger.ac.uk/resources/data- bases/pfam.html; Finn et al. 2010). However, sequence-structure comparison software (FU- GUE; http://tardis.nibio.go.jp/fugue; Shi et al. 2001) identified structural similarities of the RMR2 conserved region to F93 (FUGUE z-score = 3.87), a predicted winged helix-turn-helix CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 26

(HTH) DNA binding protein from hyperthermophilic Sulfolobus turreted icosahedral virus (Larson et al. 2007) and to human saposin b (FUGUE z-score = 3.80). The protein ho- mology/analogy recognition engine (PHYRE2; http://www.sbg.bio.ic.ac.uk/˜phyre2; Kelley & Sternberg 2009) predicted structural similarity between residues 255 and 305 (the first half of the conserved CTR) and most of the HTH DNA binding domain (amino acids 3 to 60 of the 80 residue domain) of the Mycobacterium tuberculosis EspR transcription factor (Rosenberg et al. 2011). Analysis of rmr2 mutant alleles indicates that RMR2 may be involved in an epigenetic silencing pathway. RMR2 is required for the production and/or stabilization of 24-nucleotide (24-nt) RNAs (Barbour et al. 2012). In A. thaliana such 24-nt RNAs guide de novo cytosine methylation through a mechanism called RNA-directed DNA methylation (RdDM) (Matzke et al. 2015). While the RdDM pathway is highly characterized in A. thaliana, none of the A. thaliana genes encoding RMR2 family members have been identified through genetic screens. The moss (Physcomitrella patens) and spike moss (Selaginella moellendorffii) family members encode amino-terminal domains involved in protein-DNA or protein-chromatin interactions: a zinc-finger and an AT-hook motif respectively. Evidence of the same zinc- finger motif can be found in the maize RMR2 sequence, though the crucial zinc coordinating amino acids are absent.

Grasses conserve three distinct RMR2-family proteins The maize genome encodes three RMR2 family member proteins: RMR2, GRMZM2G- 109217, and GRMZM2G003389. These three proteins have likely orthologs in Oryza sativa (rice), Brachypodium distachyon (Brachypodium), and Sorghum bicolor (Sorghum) (Fig- ure 2.2 on page 38). The rmr2 gene and GRMZM2G109217 represent a tandem duplication approximately 45 kb apart on chromosome 2S in maize. A similar tandem orientation of potential orthologs is present in Brachypodium and Sorghum genomes, though not in the rice genome. While maize experienced a whole-genome duplication after its divergence from Sorghum (Swigonova et al. 2004), the maize genome does not include the expected addi- tional duplicates (ohnologs) for the three grass clades of RMR2 family proteins (Figure 2.2 on page 38). The distinct grass-specific clades for proteins encoded by rmr2, GRMZM2G109217, and GRMZM2G003389 indicate that they likely have varied functions. Indeed, phenotypes as- sociated with loss of RMR2 indicate that the two paralogs are not entirely redundant to RMR2. As of now, genes encoding the RMR2 paralogs have not been identified among the other rmr loci (J. B. Hollick, unpublished data). Using paralog-specific primers (Figure 2.3A on page 39), the RNA expression of each maize paralog was examined (Figure 2.3B on page 39). All three paralogs are expressed in the rapidly dividing tissues of the shoot apical meris- tem and developing floral structures. However, only rmr2 mRNA appears to be present in embryo tissue. Together phylogenetic, bioinformatic, and phenotypic analyses indicate that RMR2 plays an important, if unknown, role in maize epigenetic pathways. Genetic analyses discussed below further link RMR2 with other RdDM proteins. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 27

Complementation tests for ems072087 link RMR2 with Pol IV subunit RP(D/E)2a The purple plant1 (pl1 ) locus encodes a transcription factor involved in anthocyanin pigmentation in seedlings as well as adult vegetative and reproductive tissues (Cone et al. 1993). The Pl1-Rhoades allele is epigenetically regulated resulting in varying amounts of mRNA that correspond to a continuum of anthocyanin pigment production (Hollick et al. 1995). The pollen producing anthers of male inflorescences display pigmentation that corre- lates well with Pl1-Rhoades expression (Hollick et al. 1995). Anthocyanin pigmentation in the anthers is scored on a scale from 1 to 7, where anther color scores (ACS) of 1-4 represent epigenetically repressed Pl ′ types and ACS 7 values represent highly expressed Pl-Rh types (Hollick et al. 1995). To identify factors required to maintain the epigenetic repression of Pl ′, M2 progeny were screened for increased anthocyanin pigmentation (Hollick & Chandler 2001). Subsequent analysis of recessive mutants identified in these screens includes complemen- tation tests with existing reference alleles to assign complementation groups. The ems072087 allele complemented most existing reference alleles (Table 2.2 on page 35) (J. B. Hollick, un- published data). In the progeny of these complementation crosses, half of the individuals are heterozygous for both mutant alleles (doubly heterozygous). These doubly heterozygous progeny are expected to have the mutant phenotype (dark anthers) if ems072087 represents a new allele of the reference gene; these alleles fail to complement. In this case, ems072087 partially fails to complement reference alleles representing three distinct complementation groups: rmr2-1, ems051081 (then called rmr4-1 ), and rmr7-2 (Table 2.2 on page 35). The rmr7 locus encodes one of the three potential Pol IV / Pol V second largest subunits (RP(D/E)2a), which helps to form the polymerase’s catalytic core (Stonaker et al. 2009). Like RMR2, RP(D/E)2a is also required for 24-nt RNA biogenesis (Stonaker et al. 2009, Sidorenko et al. 2009). Since two of the three possible ems072087 complementation groups have defined dis- tinct loci, these complementation test results indicated that at least some second site non- complementation (SSNC) was being observed. Also known as nonallelic noncomplementa- tion, SSNC represents a low frequency outcome of complementation crosses between alleles of distinct genes, where through the poisoning of a complex or altering dosage, the two recessive alleles in a doubly heterozygous condition present a similar phenotype to either homozygous mutant (Hawley & Gilliland 2006). SSNCs can provide insight into either direct or indirect molecular interactions between the products of the non-complementing alleles. I therefore sought to identify the lesions of ems051081 and ems072087 alleles and, through additional complementation crosses, probe the SSNC behaviors of these alleles. ems051081 encodes an rmr7 allele The ems051081 allele was initially designated rmr4-1 when original tests indicated that it complemented rmr7-1 (Table 2.3, row 1 on page 35). The rmr7 and rmr2 loci are both CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 28

on chromosome 2S flanking the booster1 (b1 ) locus that, together with pl1, controls pig- mentation in adult stem and leaf tissues (Dooner et al. 1991). The ems051081 allele was isolated from a colorless b1 background, but a B1-Intense (B1-I ) allele conferring plant pigmentation was combined with ems051081. Subsequent growouts of ems051081 segre- gating material identified a linkage between B1-I and the ems051081 locus (J. B. Hollick, unpublished data), which prompted the sequencing of the rmr7 locus from ems051081 mu- tants. Sequencing identifies an early transition mutation resulting in a glutamine to stop codon change (Figure 2.4 on page 39), truncating the final seven domains conserved across all eukaryotic RNA polymerase second-largest subunits (Jokerst et al. 1989). Therefore the ems051081 allele is now designated rmr7-5. Although, a deleterious lesion was identified, the original complementation data between rmr7-5 and rmr7-1 remained contradictory. More seed from the same progeny was grown out, confirming the original result: 71 of 71 plants had strong Pl ′ phenotypes (Table 2.3 on page 35). However, subsequent complementation crosses with different lineages of rmr7-1 (Figure 2.5 on page 40), behaved as expected: the two alleles consistently failed to com- plement, producing dark-anthered Pl-Rh-like progeny (Table 2.3 on page 35). However, the progeny with darker anthers had more intermediate (ACS 5-6) than full Pl-Rh (ACS 7) types. Complementation tests between rmr7-2 or rmr7-3 and rmr7-5 produced more progeny with the expected ACS 7 types, further confirming the designation of ems051081 as rmr7-5. ems072087 encodes an rmr7 allele Original complementation data points to three potential loci for ems072087 : rp(d/e)2a, rmr2 and the then unknown locus of ems051081 (Table 2.2 on page 35). Cosegregation of A619-background specific polymorphisms with ems072087 exclude the rmr2 locus and identify a region within approximately 2 Mb of rmr7. Subsequent sequencing of the rmr7 locus from an ems072087 mutant identifies a transition mutation that results in an amino acid change from glycine (G) to aspartate (D) at a conserved residue in domain E of the polymerase subunit. Subsequent sequencing of DNA from additional ems072087 mutants confirmed the same lesion and resulted in the designation of ems072087 as rmr7-4 (Figure 2.4 on page 39). The rmr7-4 lesion changes a small nonpolar glycine residue to a large, negatively charged aspartate. This domain E residue is conserved throughout eukaryotic polymerases; although, the exact residue is specific to the associated RNA polymerase: asparagine for Pol I, serine for Pol II, glycine for Pol III, and glycine or serine for Pol IV (Luo & Hall 2007). Structural analysis of yeast Pol II, identifies domain E as important for creating the folds of the active center of the polymerase through interactions with the largest subunit (Cramer et al. 2001). While the effects of the glycine to aspartate change in rmr7-4 mutants remains un- clear, the complementation data indicate that the rmr7-4 lesion may encode a partially dysfunctional protein. Complementation tests with other rmr7 alleles support the rmr7-4 assignment (Table 2.4 on page 35), but in all complementation crosses, the non-Pl ′ progeny have lighter pigmentation (ACS 5-6 values). Furthermore, during introgressions of the rmr7- CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 29

4 allele into various genetic backgrounds, the homozygous mutant anther color tended to lighten if careful selection for the darkest anthers was not employed (J. R. B. Talbot and J. B. Hollick, unpublished observations). Together the molecular and genetic data support ems072087 as a hypomorphic allele of rmr7.

rmr2-1 partially fails to complement rmr7 alleles With the identification of ems072087 and ems051081 as rmr7 alleles, more progeny from both existing and new complementation crosses were evaluated to confirm and characterize the second site noncomplementation (SSNC) between rmr7-4 and rmr2-1. A second planting of the progeny originally evaluating complementation between rmr2-1 and rmr7-4 had a similar result with 28% of individuals with ACS 5-7 types compared to 21% originally. SSNCs can be classified into three groups: allele specific at both loci, allele specific at one locus or allele specific at neither locus (Hawley & Gilliland 2006). Stonaker et al. (2009) found that rmr2-1 complemented both rmr7-1 and rmr7-2 indicating that the observed SSNC between rmr2-1 and rmr7-4 is allele specific at one or possibly both genes. To expand the sampling from the original complementation tests, new crosses were made to create double heterozygotes of rmr2-1 and various rmr7 alleles. In most cases, the female was heterozygous (either for rmr2-1 or an rmr7 allele), and the male parent was homozygous mutant. Therefore, half of the resulting progeny individuals would be doubly heterozygous for both genes. The resulting progeny had variable complementation results, where for every allele combination tested, a portion of the progeny complemented while other progeny sets failed to complement (Table 2.5 on page 36). The origin of the rmr2-1 allele (from a homozygous mutant male or a heterozygous female) cannot predict the complementation outcome, discounting a parent-of-origin effect. The presumed severity of the rmr7 allele does not affect the likelihood of SSNC, as both early nonsense mutations (rmr7-1 ) and missense mutations at invariant residues (rmr7-2 ) (Stonaker et al. 2009) and conserved residues (rmr7- 4 ) could noncomplement rmr2-1. In the SSNC progeny, approximately half of the individuals had darkly colored anthers as expected: 56%, 49% and 55% with rmr7-1, rmr7-2 and rmr7- 4, respectively. However, the dark anthers tended to be lighter (ACS 5-6) than typical for most homozygous rmr mutants: 36%, 92% and 78% of dark anthered plants were scored ACS 5-6 for rmr7-1, rmr7-2 and rmr7-4, respectively. While the feature distinguishing progeny that have SSNC from those that do not remains unclear, when noncomplementation does occur the nature of the rmr7 allele does not matter.

Other rmr2 alleles complement rmr7 alleles Where SSNC did occur with rmr2-1, it was independent of the rmr7 allele; to test whether the noncomplementation is also independent of the rmr2 allele, several additional rmr2 alleles were tested. These rmr2 alleles have Mutator insertions in the 5 ′ UTR (rmr2- mum070823, rmr2-mum070824, and rmr2-mum070825 ) and in the first exon (rmr2-mum07- 0809 ) (Figure 2.1A on page 37) and are more likely to represent true nulls than the rmr2-1 CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 30

late nonsense mutation (Barbour et al. 2012). With one exception, these rmr2-mum alleles complemented the rmr7 alleles (Table 2.6 on page 36). The cross between rmr2-mum070824 and rmr7-3 produced progeny individuals with mosaic anthocyanin pigmentation in a small proportion (less than 1%) of the anthers (Figure 2.6C on page 41). However, all 33 individuals evaluated had Pl ′ anthers, even the ten with mosaic anthers showed predominantly ACS 1-2 pigmentation outside of the pigmented sectors. These additional complementation crosses did not identify clear SSNC between the rmr2-mum alleles and rmr7 alleles. Although, whether rmr2-mum and rmr7 alleles could fail to complement in the presence of the same unknown feature that facilitates rmr2-1 SSNC remains unknown.

A particular rmr2-1 lineage fails to complement rmr7 alleles The complementation behaviors between rmr2 and rmr7 alleles indicated the poten- tial for a third component required for the SSNC in certain rmr2-1 -containing double het- erozygotes. Both SSNC and complementing progenies were evaluated in the same field and multiple growouts of the same progeny produced the same complementation results, even when grown many years apart. Together, the consistency of complementation cross results indicate that the environment through field conditions, growth season, etc. do not have an appreciable role in the SSNC behavior. If the background factor is genetic in nature, then the rmr2-1 individuals that exhibit SSNC should be more closely related than those that complement rmr7 alleles. I traced the pedigrees of all rmr2-1 carrying parents that were used in complementation crosses back to a single progenitor family from 1996 (Figure 2.7 on page 42). From the progenitor family, several individuals were used to start distinct pedigrees distinguished by the distal b1 allele. The linkage, if any, between this lineage specific factor and the rmr2-1 lesion remains unclear, particularly because only the most terminal generations of these pedigrees were used in complementation crosses. To further confirm the hypothesis that a lineage specific component was affecting the complementation tests, several reciprocal crosses between both complementing and non- complementing rmr2-1 lineages and rmr7-5 were made. Figure 2.8A (on page 44) summa- rizes the particular crosses as well as the relative pedigrees separating the parents used for the crosses. Importantly, the two rmr7-5 families derive from the same ear, with one to two generations of additional selfing and intercrossing; and for all but two crosses, the same individual (11-537-7) was used. These parental sources were chosen to limit the genetic background variation between rmr7-5 parents. Unlike complementation tests with other rmr7 alleles, the dark anther colors were much more variable (for example ranges of ACS 4-6 were not uncommon). To minimize the bias in summarizing such complex scores, each value in a complex score (ex. ACS 4-6) was given a weighted score (1/3 point to ACS 4, 1/3 to ACS 5 and 1/3 to ACS 6), and weighted boxplots were created to summarize the total results (Figure 2.8B on page 44). Complementation crosses with the complementing b rmr2-1 allelic lineage fully complemented rmr7-5 in all cases. Families producing only B rmr2-1, rmr7-5 double heterozygotes had mean ACS values of 4 and interquartile ranges of CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 31

ACS 3-5. Those families produced from crosses of a heterozygote and a homozygote (Figure 2.8B, lines 2 and 3 on page 44) produced the greatest range, as expected since only half of the progeny were double heterozygotes. Together, these results support the model that a particular lineage of the rmr2-1 allele that fails to complement rmr7 alleles, although what causes this effect remains unclear.

2.4 Discussion

RMR2 represents a novel class of proteins with largely unknown function(s). Mutant analyses identify RMR2 as a Pol IV accessory protein from its roles in 24-nt RNA biogenesis or maintenance (Barbour et al. 2012), in the maintenance of epigenetic silencing (Hollick & Chandler 2001, McGinnis et al. 2006), and in the maintenance of cytosine methylation patterns (Barbour et al. 2012). Here, bioinformatic and phylogenetic analyses identify RMR2 family proteins in various multicellular plants and divide the RMR2 protein into a variable amino-terminal region (NTR) and a conserved CTR. Additionally, genetic interactions with a Pol IV catalytic subunit (RP(D/E)2a), the polymerase responsible for transcribing 24-nt RNA precursors (Li et al. 2015), indicate that RMR2 may function with specific Pol IV complexes in maize. The RMR2 family represents apparently diverse proteins in multicellular plants related by a novel amino acid motif embedded in a conserved CTR. The corresponding gene mod- els encode hypothetical proteins ranging from 278 (Selaginella moellendorffii gene model: 413199) to 1123 amino acids (Physcomitrella patens gene model: Pp1s36_60V6). The num- ber of genes encoding RMR2 family members per organism also varies. Compared with eudicots, which typically have predicted single RMR2-type proteins, the grasses appear to have conserved additional diversity. However, further improvements to plant genome assem- blies may identify additional gene models containing a similar CTR. Unlike other eudicots, A. thaliana and A. lyrata genomes both contain several (four and three, respectively) gene models predicting proteins with identity to the RMR2 CTR. In every replicate phylogram (n = 1000, bootstrap frequencies of 1.0), these Arabidopsis proteins group into three sub- clades, which appear to be most closely related to the grass clade C proteins, rather than the RMR2-containing clade A. Analysis of the A. thaliana developmental expression atlas (Schmid et al. 2005) indicates that all of the RMR2 family genes are expressed throughout development (see Materials and Methods for details). However, AT1G21560 RNA is typ- ically present at 10-fold higher levels than the other three members. These mRNA levels decrease to the level of the other two subclade representatives in mature pollen and in stage 8, 9, and 10 embryos. Importantly, there is no evidence of clade-specific expression in embryo and endosperm tissues as seen with the maize RMR2-type genes (Figure 2.3B on page 39). These comparisons raise the possibility that RMR2 has evolved to carry out grass-specific functions. The molecular function of the RMR2 protein family remains unknown; however, bioin- formatic analyses hint at a possible role in nucleic acid binding. Both FUGUE and PHYRE2 CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 32 algorithms predicted a series of α-helices in the RMR2 CTR that have the highest structural similarity to helix-turn-helix (HTH) DNA binding folds. The HTH folds of F93 (FUGUE result) and EspR (PHYRE2 result) both have unique traits. EspR in particular was shown by Rosenberg et al. (2011) to create large DNA loops because of the angle of the two DNA binding helices within an EspR dimer. PHYRE2 analysis of the RMR2 clade C maize mem- ber (GRMZM2G003389) identified a more significant similarity between its carboxy-terminal helices and the EspR HTH fold (confidence of 82% versus 49% for RMR2). Pfam queries also identified zinc-finger DNA binding motifs in P. patens (gene model: Pp1s36_60V6) and S. moellendorffii (gene model: 420539). The other S. moellendorffii gene model (413199) sharing identity with the RMR2 CTR has two tandem AT-hook motifs. AT-hooks can bind minor grooves of DNA and commonly function with other DNA binding motifs to alter DNA structure (Aravind & Landsman 1998). None of the RMR2 family members have obvious catalytic domains, leading to the pre- diction of a structural rather than enzymatic role for these generally small (300 to 600 amino acids) proteins. The secondary structure of the RMR2 NTR is predicted to be intrinsically disordered. In combination with the low sequence conservation and variable sizes of the RMR2 family amino-termini, these observations support the idea that RMR2 family pro- teins have rapidly evolving protein-protein interaction domains in their NTRs (Dunker et al. 2005). Perhaps the RMR2 CTR directs nucleic acid binding, while the poorly conserved NTR coordinates contacts with other peptides. These speculative assignments remain to be tested, but interestingly, a DNA binding protein associating with cis-regulatory regions of another epigenetically regulated maize gene (B1-I ) appears to play a role in facilitating its silencing (Brzeska et al. 2010). In A. thaliana, a DNA binding protein (SSH1/DTF1) required for both DNA methylation-dependent and methylation-independent gene silencing (Liu et al. 2011) coimmunoprecipitates with the largest Pol IV subunit (Law et al. 2011). Recently, putative SHH1 homologs were found to coimmunoprecipitate with both maize Pol IV and Pol V through direct interactions with the Pol V largest subunit or indirect interac- tions with RP(D/E)2a and Pol IV-binding RDR2 (Haag et al. 2014). These DNA binding proteins may act to specify genomic targeting of specific RNA polymerase complexes. The characterization of two new rmr7 alleles predicted to alter the length and compo- sition of RP(D/E)2a identified genetic interactions with rmr2. The rmr2-1 allele, but not rmr2-mum alleles, fails to complement all rmr7 alleles tested in double heterozygotes. This type of SSNC that is allele-specific for rmr2-1 and allele-nonspecific for rmr7 is commonly represented by a poisoning model where a particular allele creates a poisonous protein that sensitizes the pathway to dosage changes of other pathway components (Yook et al. 2001, Hawley & Gilliland 2006). The hypothesis of RMR2 as a structural protein with two inter- action domains predicts that the rmr2-1 product containing only the NTR might be able to serve as just such a poisonous protein. While poison-type SSNCs can indicate physical interactions of proteins within a common complex, they can also form with non-complexed members of the same pathway (Yook et al. 2001). Maize RP(D/E)2a coimmunoprecipitates several Pol IV subunits and accessory proteins, but no peptides representing RMR2 or its paralogs were recovered (Haag et al. 2014). Like RMR2, the loss of RP(D/E)2a reduces only CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 33

the repetitively mapping Pol IV-dependent 24-nt RNAs (S. Simmon, I. T. Liao, B. Meyers, and J. B. Hollick, unpublished data). Together these data support an indirect or transient RMR2 association with RP(D/E)2a-containing Pol IV complexes. However, definitive analysis of the rmr2, rmr7 complementation data requires the identi- fication of the lineage specific genetic feature required for the observed SSNC behavior. The consistency with which the noncomplementing rmr2-1 lineage fails to complement all rmr7 alleles and other rmr2-1 lineages complement the same alleles supports the presence of a distinct genetic factor that is inherited with rmr2-1. Although the SSNC rmr2-1 lineage has been distinct from other rmr2-1 lineages for many generations, only the last two generations (Figure 2.7 on page 42) were used in complementation tests. Therefore, it is unclear when the SSNC-conferring genetic factor was introduced. One appealing candidate is a cis-regulatory region that could have been disrupted by the recombination event identified in 1997 that linked rmr2-1 with a new b1 allele. Future analysis of rmr2 expression in the SSNC lineage and of the linkage between the SSNC-conferring genetic factor and rmr2-1 would test this hypothesis. All phenotypic analyses support RMR2 as an accessory protein associated with a subset of Pol IV complexes (Barbour et al. 2012). The bioinformatic, phylogenetic, and genetic anal- yses presented here build a model for this association, where RMR2 functions as a structural protein interacting with RP(D/E)2a-containing Pol IV complexes. This speculative model presents several predictions for further testing. For instance, when defined as individual loci, RMR2-dependent 24-nt RNAs are expected to be co-dependent on RP(D/E)2a. Much remains unknown about RMR2’s role in epigenome regulation and the roles of RMR2 family proteins in other plants. The potential redundancy of Arabidopsis RMR2 family proteins and the probable grass-specific nature of the RMR2 clade promise that further study will identify novel roles for these proteins in plant epigenetic regulation.

Acknowledgements

The cloning of rmr2 was a herculean endeavor that included the efforts of many graduates, undergraduates and high school students; thanks to all of those whose prior work enabled me to explore the details of rmr2 genetics. Thanks especially to Jana Lim who designed the primers I used to sequence rmr7-4 and rmr7-5 alleles. In particular, I would like to thank Jay Hollick for his impeccable pedigree book-keeping that allowed me to trace rmr2-1 lineages back to their identification in 1996. I would also like to thank Karl Erhard for teaching me the ropes during my first experiment in the lab: the cloning of ems072087. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 34

2.5 Tables

Table 2.1: Primers used in this Chapter

Name Sequence (5 ′-3 ′) Locus Region E1a_F ACAGAAGGACAGTGGGCAAC a rmr7 Exon 7 E1a_R GGCTCGGAAGAAGACTTTCC a rmr7 Exon 7 E1b_F GATGATAATGGAGCGGGAAA a rmr7 Exon 7 E1b_R AGCGAATCAAAAAGCTCCTG a rmr7 Exon 7 E1c_F CTATGCCAGCATCACACACC a rmr7 Exon 7 E1c_R ATTCACGATGGGGATCTGAG a rmr7 Exon 7 E2_F AGGCAAGCAGCTATTCATCC a rmr7 Exon 6 E2_R CGTCCCCACCTTCACATTTA a rmr7 Exon 6 E3a_F CCCAATAGCTTGTTGCGAGT a rmr7 Exon 5 E3a_R CAATGAGGTTCGCGTGTTTA a rmr7 Exon 5 E3b_F GGTGCGGTGGAAAAATAACT a rmr7 Exon 5 E3b_R GGTTGCTTTTAGTTGTTGACAC a rmr7 Exon 5 E3c_F CAGTCGTTTGGTTCCATCCT a rmr7 Exon 5 E3c_R ATGATTTCGCGAGACCAGAA a rmr7 Exon 5 E4_F AACCTGCACACCAAACATCT a rmr7 Exon 4 E4_R TGACTTAAGACTAAATCCCTACGTACA a rmr7 Exon 4 E5a_F GCACATGCAATACTTGGTTGA a rmr7 Exon 3 E5a_R CATGTACATTATTTTTCCTTTCATCTT a rmr7 Exon 3 E5b_F CAGTGGAAAAGGCACGATTT a rmr7 Exon 3 E5b_R CAATGCCAGTTTGGTTGTTG a rmr7 Exon 3 E6_F TGGTGTCATGGCAACTAGGA a rmr7 Exon 2 E6_R CACTGATTATATGTGGGTTATTTTTCA a rmr7 Exon 2 E7_F GGGAAAACCATGGCATACAT a rmr7 Exon 1 E7_R GGATGTGGATATCAAGGGAAAA a rmr7 Exon 1 RT_rmr2_F CGTGCACCCCCTACTAATTC rmr2 Exons 1-2 RT_rmr2_R TGAATCTGGCCTTCTTGTGA rmr2 Exons 1-2 RT_109217_F CAGCAATTCAGTCTCCAACG GRMZM2G109217 Exons 4-5 RT_109217_R CAAACATTCCATGACAACTCG GRMZM2G109217 Exons 4-5 RT_003389_F TCTGATCGTGATAAGAGACTGAG GRMZM2G003389 Exons 3-4 RT_003389_R AATGGGGTTGCAAGCTAGAG GRMZM2G003389 Exons 3-4 AAT_F ATGGGGTATGGCGAGGAT b alanine aminotransferase Exons 12-13 AAT_R TTGCACGACGAGCTAAAGACT b alanine aminotransferase Exons 12-13 a Designed by Jana Lim for sequencing of rmr7-1, rmr7-2 and rmr7-3 (Stonaker et al. 2009). b alanine aminotransferase primers from Woodhouse et al. 2006. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 35

Table 2.2: Initial complementation tests for ems072087 a

Female Male No. of Ears ACSb1-4 ACS 5-6 ACS 7 + / mop1-5 ems072087 1 19 0 0 + / rmr1-1 ems072087 1 19 0 0 + / rmr2-1 ems072087 1 15 0 4 + / rmr3-1 ems072087 1 19 0 0 + / ems051081 ems072087 1 14 0 5 + / rmr5-1 ems072087 1 19 0 0 + / rmr6-1 ems072087 1 9 00 + / rmr7-2 ems072087 1 17 0 3 + / rmr8-1 ems072087 1 1 00 + / rmr9-1 ems072087 1 4 00 a This table summarizes 2008 unpublished data by J. B. Hollick. b Anther Color Score

Table 2.3: Complementation tests between ems051081 and rmr7 alleles

Female Male No. of Ears ACSa1-4 ACS 5-6 ACS 7 + / rmr7-1 ems051081 1 88 0 0 + / rmr7-1 ems051081 1 18 18 0 + / ems051081 rmr7-1 1 13 6 5 + / ems051081 + / rmr7-1 1 41 0 10 + / ems051081 rmr7-2 2 28 0 40 + / rmr7-3 ems051081 1 10 3 9 + / ems051081 rmr7-3 4 68 1 78 a Anther Color Score

Table 2.4: Complementation tests between ems072087 and rmr7 alleles

Female Male No. of Ears ACSa1-4 ACS 5-6 ACS 7 + / rmr7-1 ems072087 2 2 12 + / rmr7-2 ems072087 3 69 16 13 + / rmr7-3 ems072087 1 2 50 + / rmr7-5 ems072087 2 66 18 15 + / ems072087 rmr7-5 1 20 9 3 a Anther Color Score CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 36

Table 2.5: Complementation tests between rmr2-1 and rmr7 alleles

Female Male No. of Ears ACSa1-4 ACS 5-6 ACS 7 + / rmr2-1 rmr7-1 2 29 0 0 b + / rmr7-1 rmr2-1 3 63 0 0 rmr2-1 rmr7-2 2 36 0 0 b + / rmr2-1 rmr7-3 1 53 0 0 + / rmr7-4 rmr2-1 2 35 0 0 + / rmr7-1 rmr2-1 1 14 4 7 + / rmr2-1 rmr7-2 1 21 11 1 + / rmr2-1 rmr7-4 4 66 42 12 a Anther Color Score b Data originally published in Stonaker et al. 2009.

Table 2.6: Complementation tests between Mutator induced rmr2 alleles and rmr7 alleles

Female Male No. of Ears ACSa1-4 ACS 5-6 ACS 7 + / rmr2-mum070809 + / rmr7-1 1 69 0 0 + / rmr2-mum070809 rmr7-2 1 30 0 0 + / rmr2-mum070809 rmr7-3 2 32 1 0 + / rmr2-mum070809 rmr7-4 3 33 0 0 + / rmr2-mum070823 + / rmr7-1 2 60 0 0 + / rmr2-mum070823 rmr7-2 2 31 0 0 + / rmr2-mum070823 rmr7-3 1 36 0 0 + / rmr2-mum070823 rmr7-4 2 21 0 0 + / rmr2-mum070824 + / rmr7-1 1 29 0 0 + / rmr2-mum070824 rmr7-2 1 37 0 0 + / rmr2-mum070824 rmr7-3 1 33 b 0 0 + / rmr2-mum070824 rmr7-4 4 10 0 0 a Anther Color Score b 10 of 33 Pl ′ types had sectored anthers at a low frequency (>1%) (Figure 2.6 on page 41). CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 37

2.6 Figures

A rmr2-mum070823 rmr2-mum070824 rmr2-mum070825 rmr2-mum070809

rmr2-1 5' 3' C-T 3,026 bp

R-X N Conserved C-terminus C 366 aa

B 10 20 30 40 50 60 70 80 90 100 110 ZZm GRMZM2G009208 (RMR2) (252-354) RPDSDFK *RRLIKALTKPVARKEYYRLFDTVTIRTPLMKLRQVRNETK-FYP-TEEMGSS YLDHYPDLAEQ--IMNNGR---RNGLALMRGFLFWLQNAVHED--QFKPWV DD Sb Sb06g023910 (213-318) RPDSDFKRRLIKALTKPVSRKEYYRLFETVTIRTPLMKLRQVRNETK-SYP -AEEMGKSYLEHYPDLARQ--IMKSDS---RNGLALMRGFLFWLQNNVHDD- - Q F K P WV DD Os Os04g45610 (218-320) RPDSDFKQRLLDALSKPFSRKEYIKLFDMASIRTPLVKLRQVRNDVK-FYP -TQEMGNSYFDHYPDLVDQ--VMHTSF---PNGLALMRGFFFWLQNNAHED- - Q F R P WV DV Bd Bradi5g17050 (273-375) RPDSDFKRRLVDALCKPFSRKEYLNLFDMASIRTPLVKLRQVRNDEK-FYP -TDEMGNSYFDHYPDLVEQ--VTNTSY---SKGLALMRGFFFWLQNSAHED- - Q FM P WMDD Zm GRMZM2G109217 (328-433) RPNSVFKEKLIEFLIKPFTQEEYDRYFALASDRIPMVKERRTRNTVA-YYHWTHEMSKSYFDRYPDLAQQFSLQRNNY---TNGLALLRGLFFWLQNIELED--QFRPWSDE Sb Sb06g023905 (248-350) RPNSMFKEKLIEFLIKPFTQEEYDKYFALATDRRPLLKERRTRNKVA-YYP WTHEMNKSYFDRYPDLAEQFNLQRNNY---PNGLALLRGLFFWLQNIGQED --QFRPWSDE Os Os04g14410 (339-442) REESDFKQRLIHVLNKPFSQGEYDKLFGMATIRNPLTRERRTRCGVK-YYY SQHEKAKSYFDCYPDLAKQ--VEEASY---PNRLALLRGLFFWLENIGQDD- - Q F R P WR DD Bd Bradi5g17040 (214-317) RPESRFKERLMQVLSKPFSQREYDQLVHRASIHIAATKERRTRAGVK-YFP STHEKNKSYFDSFPDLEEQ--VKSTSY---PNQLSLLRGFFFWLKNIGHED- - Q F R P WR DD Al 921132 (279-386) QRNSWFRKEIMNVLKQPYTVTELKELHNEASVHRVSSRHVELRDGTEFSFP TK-KKTPSYLDGYPDFKKQYLESLREDDE-HKALNLLRGFIFYLTKVVRDD--AFKPWLDQ At AT1G21560 (284-391) QRNSWFRREIMNVLKQPYTVTELKELHKEASVHGVSSRHIELRDGTEFSFP TN-KKKPSYLDRYPDFKKQYLDSLREKDE-RKALNLLRGFIFYITNVVRDD--AFKPWLDQ Al 903672 (366-472) SSSSWFRKELMDVLQKPYDEGELKLLHRYASIHRKMTRCRELRKGRESDYE TD-ELGQSYLESFPDFEKEYKLVVGVD-K-ARALKLLRGFFLYLKYVSHDG--VFKPWERN At AT4G01170 (332-439) SSITWFRKELMDVLKQPYDEEEMKQLHRDVSIHRKMTRCRELRKGRGSVYE TG-ELGQSYLESFPDFEKEYKLVVGVDDK-ARALKLLRGFFFYLKYVSHDG--VFKPWEKT At AT3G07730 (362-468) SSSSWFRKELMDVLKNPYDEGEMKQLHRYVSIHKKMTRCRELRKGRGSDYE TD-ELGQSYLESFPDFEKEYKLVVGVD-K-ARALKLLRGFFFYLKYVSHDG--VFKPWEKT Al 927125 (697-803) IFNSWFSKKLMEILRNPYDEKEFLRLYNEASLKRPLTKSRQLRDGREIEYN DQHKLALSYLEKYTDFNKKYHR--YQKDL-PRALNLLRGFFFYLENIVLEG--AFKPWLHE At AT1G77270 (568-674) IFNSSFSKKLMEILRNPYDEKEFLRLYNEASVKRPLTKSRQLRDGREIEYY CDDQFALSYLEKYTDFNKTYHR--YRKDL-PRALNLLRGFFFYLENIVLEG--AFKPWLHE Pt POPTR-0002s07800 (696-802) GVSSKFREELLKILEKPYDEREFEDLWQRISHRSHVNRDINLRSGCIR-FE -TKAIGKSYLDHHADLEAKIRSAQGNQ---PEILNLMRGFFYWLMNASHAG SRALRPWLDS Vv GSVIVG01008800001 (121-225) TGHSQFRDNLMVILKKPYDQLEYEELLQEIIQRKPLERHRDLRG-RIKSYS -VEICSKSYLDHHTDLKRKIDAVQGDH---CKILNLLRGFFYWLKNLSHEG--AFQPWSDA Me cassava4.1-004084m.g (475-580) NQCIWFMESVMGILKKPYDPEEYDKLLEEVYSRKPVIEDREMRNGRTRLCP -AQRMGKSYVDLHQDLDKQIKSAGSNK---PKILNLLRGFFYWLQHTPHEG- - I F K P WMD S Zm GRMZM2G003389 (263-365) GGSSTFDEKLNAVLSQPYDQNEYEELWRKATDRKPVSRQRHLRS-ASKRYV -TEAIGLSYLDYYPDLAVQIKSSDRDK---R-LS-LLRRFFFWLENLCHEG--AFMPWISK Sb Sb10g022740 (266-314) GGSSTFDEKLNAVLSQPYDQNEYEELWRKATDRKP------NLCHEG--AFMPWISK Os Os05g34920 (261-364) GIPSTFDEKLNDVLSKPYDLNEYKELLRKATDRKLVSRQRHLRN-ASKPYA -TRAVGLSFLDHYPDLAIQIDSADSDE---RKLC-LLRKFFFWLENLCHEG--AFMPWIDK Bd Bradi2g24950 (235-338) GVTSTFDEQLDSFLSRPYDRNEFQELLRKATFRKPVTRQRHLRN-ASKFYA -TGELGLSYLDEYPDLATQIDSADCDE---RRLN-LLRKFFFWLQNLTHEG--AFMPWIPK Ac AcoGoldSmith-v1.003355m.g (506-567) DHEKEFRRLLRKEISKPYDKYEYKDSLKRARVRKWVDLDKETRRTSFSVPT --KRKRKSYLGHHPDLKGLLEAALKEHFPRKRALALLRCLFFHVKYRPHPG- - Y F V P WT N P Sm 420539 (533-635) DAGLLYRQQLEKELEKPFDSQQLESMLKQASCRKPVLRHRDTRRGSMPIAT --TVTAPSYLDYHPDLAEKVEAAEDEE----K-LRLLRGFFFWLMKTCDPG- - A F K P WV T L Pp Pp1s36-60V6 (884-987) PQGTIYRTKLKSALAHPFDAKEMATLDESARCRKPIQKLRQMRGRIVDVIT --EDEGLSYLDHHQDFALKLEEAASPP----EKLALLRGFFFWLQHSCMEG--AFRPWLGA Sm 413199 (146-249) TSTSSFGQKLEAELERPFDGDELERMEKDVSHRKPVVKLRQTRQGSVPFET --ENAGNSYLDHFPDFTEVLSVTRSSE----KRLRLLRGFFFWLQHSCMMN --SFKPWLSG

C Consensus FLLPE R SYLDPDLXX LLRGFFFWLX L F FPWX F F Motif 1 Motif 2 Motif 3

Figure 2.1: Motifs within the conserved C-termini of RMR2 family proteins. (A) Schematic of the rmr2 gene model highlights lesions of rmr2 mutant alleles and the RMR2 protein’s conserved C-terminus. Exons, untranslated regions (UTRs) and introns are depicted as thick, medium and thin grey rectangles, respectively. (B) Alignment of the C-terminal ends of several RMR2 family proteins. Representative plants from mosses and spike mosses (green), grasses (yellow) and eudicots (pale green) are included in an order dictated by the phylogeny in Figure 2.2 on the following page. Conserved amino acids are colored. The rmr2-1 lesion is at the beginning of this conserved region, the relevant arginine is in bold with an asterisk above. (C) Potential RMR2 family motifs. Consensus sequences from the alignment are highlighted including invariant and nearly invariant amino acids as well as three short motifs. Abbreviations: maize (Zea mays; Zm), sorghum (Sorghum bicolor; Sb), rice (Oryza sativa; Os), Brachypodium distachyon (Bd), Arabidopsis lyrata (Al), Arabidopsis thaliana (At), Populus trichocarpa (Pt), Vitis vinifera (Vv), Manihot esculenta (Me), Aquilegia coerulea (Ac), Selaginella moellendorffii (Sm), and Physcomitrella patens (Pp). CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 38

Zm GRMZM2G009208 (RMR2) Grasses (Monocots) 930 Sb Sb06g023910 A 748 Os Os04g45610 Eudicots 425 Bd Bradi5g17050 Mosses and sses 989 Zm GRMZM2G109217 995 Sb Sb06g023905 B 695 Os Os04g14410 720 Bd Bradi5g17040 343 Al 921132 1000 At AT1G21560 416 Al 903672

1000 At AT4G01170 952 582 238 At AT3G07730 Al 927125 1000 At AT1G77270 Pt POPTR-0002s07800 392 Vv GSVIVG01008800001 627 237 Me cassava4.1-004084m.g Zm GRMZM2G003389 995 Sb Sb10g022740 633 524 C Os Os05g34920 981 Bd Bradi2g24950 713 Ac AcoGoldSmith-v1.003355m.g 1000 Sm 420539 Pp Pp1s36-60V6 Sm 413199

Figure 2.2: Phylogeny of RMR2 family proteins. Molecular phylogeny of RMR2 and related proteins including representatives from grasses (yellow), eudicots (pale green) and mosses and spike mosses (outgroup, green) based on alignments of the conserved C-terminal domain (Figure 2.1 on the preceding page). Branch lengths correspond to the indicated bootstrap values (n = 1000). Three grass-specific clades are labeled A, B, and C. Abbreviations: maize (Zea mays; Zm), sorghum (Sorghum bicolor; Sb), rice (Oryza sativa; Os), Brachypodium dis- tachyon (Bd), Arabidopsis lyrata (Al), Arabidopsis thaliana (At), Populus trichocarpa (Pt), Vitis vinifera (Vv), Manihot esculenta (Me), Aquilegia coerulea (Ac), Selaginella moellen- dorffii (Sm), and Physcomitrella patens (Pp). CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 39

A rmr2 GRMZM2G109217 2 GRMZM2G003389 9

B

Apical

MeristemImmatureTassel Immature EndospermSeedlingGenomic DNA

1 aat 109217 003389 rmr2

Figure 2.3: Expression of rmr2 and its paralogs in various maize tissues. (A) Gene models of rmr2 and its paralogs. Exons are thick grey rectangles, the annotated transcription start site is marked with a grey arrow. Positions of the paralog-specific primers (Table 2.1 on page 34) used for expression analysis are highlighted with arrows above and below the gene models. (B) RNA expression analysis of rmr2 paralogs. Paralog-specific primers ((A) and Table 2.1 on page 34) amplified cDNAs from several tissues in RT-PCR reactions. Alanine aminotransferase (aat, alt301 ) was used as a control. The paralogs are abbreviated as 003389 for GRMZM2G003389 and 109217 for GRMZM2G109217.

rmr7-5 rmr7-1 rmr7-2 rmr7-4 mop2-2 rmr7-3Mop2-1

1

W N A C D E GH I C

Figure 2.4: The rp(d/e)2a / rmr7 / mop2 gene model. A schematic of the rp(d/e)2a locus, showing coding regions, untranslated regions, and introns as thick, medium and thin rectangles, respectively. Conserved RPB2-like domains are highlighted in the protein as light grey boxes labeled A-I. Both new (rmr7-4 and rmr7-5 ) and previously identified (Stonaker et al. 2009, Sidorenko et al. 2009) alleles are highlighted in both the DNA and corresponding peptide.

CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 40 Y

Generation ear rmr7-1 Generation Year 9§ M1 9§

rmr7-1 lineages

9 9

rmr7¨ © lineages

9 9

W      

9 9

¡ ¡

  

Complementation

   

ed to complement

9 9

¡ ¡

   Complementation

alleles complemented

99 9 9

99 9 9

00 0 0

00 0 0

0 0

¢ ¢

0 0

¢ ¢

0 0

£ £

0 0

£ £

0¤ 0¤

rmr7-5 0¤

M1 0¤

0 0

¥ ¥

0 0

¥ ¥

0¦ 0¦

0¦ 0¦

0§ 0§

   4 4    4 4 4 4  %

   7

0§ 0§

0 0

0 0

0 0

¡ ¡

0 0

¡ ¡

09 0 9

09 0 9

09 0 9

0 0

¢ ¢

  4  )-

5 ( )* + ,

'

0 0

¢ ¢

11 11



$   ./2 33  4 4& 

!"# !#   / .6 11 1 11 1 1 11

12 12

Figure 2.5: (Continued on next page.) CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 41

Figure 2.5: Pedigree of rmr7-1 and rmr7-5 lineages used in complementation crosses. The pedigree includes the lineages from the original M1 families to those complementation crosses that were evaluated as either complementing (yellow circles) or failing to complement (purple circles) (Table 2.3 on page 35). Each circle represents a family and each connecting line a cross with self-cross, sibling cross and outcross represented as a single connecting line, a diamond connection, or two distinct input lines, respectively. Family circles are color-coded as marked in the legend. Female parents are always on the left of each two-parent cross. Each complementation cross is shown twice, once for each rmr7-1 and rmr7-5 lineage, with the family index numbers listed.

A Primary Spike C

(+ / rmr7-1) X rmr2-1 B Secondary Spikes

(+ / rmr7-1) X rmr2-1 (+ / rmr2-mum070824) X rmr7-3

Figure 2.6: Sectored tassels and anthers from rmr7, rmr2 complementation crosses. (A and B) Sectored tassels with both Pl-Rh and Pl ′ anthers on the primary (A) or secondary (B) spikes of progeny from complementation crosses between + / rmr7-1 and rmr2-1 / rmr2-1 parents. (C) Complementation crosses between + / rmr2-mum070824 and rmr7-3 / rmr7-3 parents produced plants with rare sectored anthers (white arrowheads). Ten of the 33 progeny had some low frequency (<1%) of sectored anthers. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 42

Earl] rmr2-1 lineage

rmr2-1 M1

QRS T eUV XSR Z [ mV\]

rmr2-1 M2

V \ X] e X e X eR

^ _; ` a

8 : ; ;=: 8 : ;<== ;> e[Te Xb[X cSm rmr2-1 lin ` lements

rmr7 alleles

e[Te Xb[X d[V\ XS cSm

a `

eUV XSR Z [ mV\] QRS T rmr2-1 lin lement

rmr7 alleles P

Generation ear b1 rmr2-1 Generation Pear FG

B-weak rmr2-1 FG

FG FG

F F

H H

F F

H H FF

B-I rmr2-1 FF

FF FF

II II

II II

IJ IJ

IJ IJ

IK IK

IK IK

IL IL

IL IL

IM IM

IM IM

IN IN

IN IN

IO IO

IO IO

IG IG

b1 rmr2-1 ?>;f E

IG IG

I I

H H

?f;<8

I I

H H

IF IF

IF IF

IF IF

JI JI = ? ;= g

7 = ? ;g D1

JI JI

11 11

11 @ABC 11

= ;f :8 1 1 1= ;DE 1 1

Figure 2.7: (Continued on next page.) CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 43

Figure 2.7: Pedigree of rmr2-1 lineages used in complementation crosses with rmr7 alleles. All rmr2-1 lineages that resulted in complementation crosses with rmr7 alleles trace back to an early cross of individual 96-207-16 (B1-I Rmr2 / b1 rmr2-1 ) by individual 96-211-7 (b1 rmr2-1 / b1 rmr2-1 ). For clarity plants resulting from that cross (peach circles) are shown multiple times though they all originate from the single parental cross. The remainder of the pedigree highlights rmr2-1 lineages that complement rmr7 alleles (green) and those that fail to complement (teal). Asterisks mark the approximate time point of recombination events that separated the original b1 and rmr2-1 loci. Families that were crossed to rmr7 alleles for complementation tests have their family number designations printed beside them. The pedigree on the far left does not fully connect due to a gap in the 2007 records; however the linked b1 allele supports that this sub-lineage is most likely related to the other b1 rmr2-1 lineage. Connector lines and orientation of the family circles follow that of Figure 2.5 on page 40. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 44

A rmr2 lineage rmr2 parents Complementation Crosses rmr7 parents rmr7 lineage

11-869-4 (+ / B rmr2-1) X rmr7-5 Rmr2 / B rmr2-1 (+ / rmr7-5) X B rmr2-1 11-537-1 11-869-2 B rmr2-1 X rmr7-5 Rmr7 / rmr7-5 B rmr2-1 / B rmr2-1 rmr7-5 X B rmr2-1 11-537-7 rmr7-5 / rmr7-5 11-531-4 Rmr2 / b rmr2-1 (+ / b rmr2-1) X rmr7-5 b rmr2-1 X rmr7-5 11-532-3 11-531-5 rmr7-5 X b rmr2-1 rmr7-5 / rmr7-5 b rmr2-1 / b rmr2-1

11-431-14 b rmr2-1 / b rmr2-1 eneration Current Generation

B Complementation Crosses Complementation Test Results (+ / b rmr2-1) X rmr7-5 (+ / B rmr2-1) X rmr7-5

half (+ / rmr7-5) X B rmr2-1

b rmr2-1 X rmr7-5 rmr7-5 X b rmr2-1 heterozygotes Expected all B rmr2-1 X rmr7-5 rmr7-5 X B rmr2-1 trans- 1 2 3 4 6 7 Anther Color Score (ACS)

Figure 2.8: Reciprocal complementation crosses between rmr2-1 and rmr7-5. (A) Descrip- tion of the crosses used to test complementation between rmr7-5 and two different lineages of the rmr2-1 allele. The parents and resulting crosses are in the center, while the more distant relationships between parents are diagrammed with each segment of dark grey hor- izontal line representing an earlier cross. (B) The results of the complementation crosses from (A). Weighted boxplots highlight the mean and variation in anther color scores (ACS) from these crosses. Those crosses where the rmr2-1 lineage complemented rmr7 alleles are colored green; while the rmr2-1 lineage that fails to complement rmr7 alleles is colored teal. See Figure 2.7 on page 42 for the detailed rmr2-1 lineages. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 45

2.7 References

1. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) Basic local alignment search tool. Journal of Molecular Biology 215, 403–410. 2. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402. 3. Aravind, L. & Landsman, D. (1998) AT-hook motifs identified in a wide variety of DNA-binding proteins. Nucleic Acids Research 26, 4413–4421. 4. Barbour, J. R., Liao, I. T., Stonaker, J. L., Lim, J. P., Lee, C. C., Parkinson, S. E., Kermicle, J., Simon, S. A., Meyers, B. C., Williams-Carrier, R., Barkan, A. & Hollick, J. B. (2012) required to maintain repression2 is a novel protein that facilitates locus- specific paramutation in maize. The Plant Cell 24, 1761–1775. 5. Brzeska, K., Brzeski, J., Smith, J. & Chandler, V. L. (2010) Transgenic expression of CBBP, a CXC domain protein, establishes paramutation in maize. Proceedings of the National Academy of Sciences 107, 5516–5521. 6. Chou, K.-C. & Shen, H.-B. (2010) Plant-mPLoc: A top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS ONE 5, e11335. 7. Cone, K. C., Cocciolone, S. M., Moehlenkamp, C. A., Weber, T., Drummond, B. J., Tagliani, L. A., Bowen, B. A. & Perrot, G. H. (1993) Role of the regulatory gene pl in the photocontrol of maize anthocyanin pigmentation. The Plant Cell 5, 1807–1816. 8. Cramer, P., Bushnell, D. A. & Kornberg, R. D. (2001) Structural basis of transcription: RNA Polymerase II at 2.8 Ångstrom resolution. Science 292, 1863–1876. 9. Dooner, H. K., Robbins, T. P. & Jorgensen, R. A. (1991) Genetic and developmental control of anthocyanin biosynthesis. Annual Reviews Genetics 25, 173–199. 10. Dunker, A. K., Cortese, M. S., Romero, P., Iakoucheva, L. M. & Uversky, V. N. (2005) Flexible nets: the roles of intrinsic disorder in protein interaction networks. The FEBS Journal 272, 5129–5148. 11. Erhard, K. F. Jr., Parkinson, S. E., Gross, S. M., Barbour, J. R., Lim, J. P. & Hollick, J. B. (2013) Maize RNA Polymerase IV defines trans-generational epigenetic variation. The Plant Cell 25, 808–819. 12. Erhard, K. F. Jr., Stonaker, J. L., Parkinson, S. E., Lim, J. P., Hale, C. J. & Hollick, J. B. (2009) RNA Polymerase IV functions in paramutation in Zea mays. Science 323, 1201–1205. 13. Erhard, K. F. Jr., Talbot, J. R. B., Deans, N. C., McClish, A. E. & Hollick, J. B. (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199, 1107–1125. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 46

14. Felsenstein, J. (2005) PHYLIP (phylogeny inference package). Seattle, WA: University of Washington version 3.6. 15. Finn, R. D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J. E., Gavin, O. L., Gunasekaran, P., Ceric, G., Forslund, K., Holm, L., Sonnhammer, E. L., Eddy, S. R. & Bateman, A. (2010) The Pfam protein families database. Nucleic Acids Research 38 (Database issue), D211–D222. 16. Fuda, N. J., Ardehali, M. B. & Lis, J. T. (2009) Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186–192. 17. Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M. R., Appel, R. D. & Bairoch, A. in The Proteomics Protocols Handbook (ed Walker, J.) 571–607 (Humana Press, Totowa, NJ, 2005). 18. Goodstein, D. M., Shu, S., Howson, R., Neupane, R., Hayes, R. D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N. & Rokhsar, D. S. (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Research 40 (Database issue), D1178–D1186. 19. Haag, J. R., Brower-Toland, B., Krieger, E. K., Sidorenko, L., Nicora, C. D., Norbeck, A. D., Irsigler, A., LaRue, H., Brzeski, J., McGinnis, K., Ivashuta, S., Pasa-Tolic, L., Chandler, V. L. & Pikaard, C. S. (2014) Functional diversification of maize RNA poly- merase IV and V subtypes via alternative catalytic subunits. Cell Reports 9, 378–390. 20. Hale, C. J., Erhard, K. F. Jr., Lisch, D. & Hollick, J. B. (2009) Production and pro- cessing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genetics 5, e1000598. 21. Hale, C. J., Stonaker, J. L., Gross, S. M. & Hollick, J. B. (2007) A novel Snf2 protein maintains trans-generational regulatory states established by paramutation in maize. PLoS Biology 5, e275. 22. Hawley, R. S. & Gilliland, W. D. (2006) Sometimes the result is not the answer: the truths and the lies that come from using the complementation test. Genetics 174, 5–15. 23. Hollick, J. B., Patterson, G. I., Coe, E. H. Jr., Cone, K. C. & Chandler, V. L. (1995) Allelic interactions heritably alter the activity of a metastable maize Pl allele. Genetics 141, 709–719. 24. Hollick, J. B. & Chandler, V. L. (2001) Genetic factors required to maintain repression of a paramutagenic maize pl1 allele. Genetics 157, 369–378. 25. Hollick, J. B., Kermicle, J. L. & Parkinson, S. E. (2005) Rmr6 maintains meiotic inheritance of paramutant states in Zea mays. Genetics 171, 725–740. 26. Jokerst, R. S., Weeks, J. R., Zehring, W. A. & Greenleaf, A. L. (1989) Analysis of the gene encoding the largest subunit of RNA polymerase II in Drosophila. Molecular and General Genetics 215, 266–275. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 47

27. Kelley, L. A. & Sternberg, M. J. E. (2009) Protein structure prediction on the Web: A case study using the Phyre server. Nature Protocols 4, 363–371. 28. Kuehner, J. N., Pearson, E. L. & Moore, C. (2011) Unravelling the means to an end: RNA polymerase II transcription termination. Nature Reviews Molecular Cell Biology 12, 283–294. 29. Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J. & Higgins, D. G. (2007) Clustal W and Clustal X version 2.0. Bioinfor- matics 23, 2947–2948. 30. Larson, E. T., Eilers, B., Menon, S., Reiter, D., Ortmann, A., Young, M. J. & Lawrence, C. M. (2007) A winged-helix protein from Sulfolobus turreted icosahedral virus points toward stabilizing disulfide bonds in the intracellular proteins of a hyperthermophilic virus. Virology 368, 249–261. 31. Law, J. A., Du, J., Hale, C. J., Feng, S., Krajewski, K., Palanca, A. M. S., Strahl, B. D., Patel, D. J. & Jacobsen, S. E. (2013) Polymerase IV occupancy at RNA-directed DNA methylation sites requires SHH1. Nature 498, 385–389. 32. Law, J. A., Vashisht, A. A., Wohlschlegel, J. A. & Jacobsen, S. E. (2011) SHH1, a homeodomain protein required for DNA methylation, as well as RDR2, RDM4, and chromatin remodeling factors, associate with RNA polymerase IV. PLoS Genetics 7, e1002195. 33. Li, S., Vandivier, L. E., Tu, B., Gao, L., Won, S. Y., Li, S., Zheng, B., Gregory, B. D. & Chen, X. (2015) Detection of Pol IV/RDR2-dependent transcripts at the genomic scale in Arabidopsis reveals features and regulation of siRNA biogenesis. Genome Research 25, 235–245. 34. Liu, J., Bai, G., Zhang, C., Chen, W., Zhou, J., Zhang, S., Chen, Q., Deng, X., He, X.-J. & Zhu, J.-K. (2011) An atypical component of RNA-directed DNA methylation ma- chinery has both DNA methylation-dependent and -independent roles in locus-specific transcriptional gene silencing. Cell Research 21, 1691–1700. 35. Luo, J. & Hall, B. D. (2007) A multistep process gave rise to RNA Polymerase IV of land plants. Journal of Molecular Evolution 64, 101–112. 36. Matzke, M. A., Kanno, T. & Matzke, A. J. (2015) RNA-directed DNA methylation: the evolution of a complex epigenetic pathway in flowering plants. Annual Review of Plant Biology 66, 9.1–9.25. 37. McGinnis, K. M., Springer, C., Lin, Y., Carey, C. C. & Chandler, V. L. (2006) Tran- scriptionally silenced transgenes in maize are activated by three mutations defective in paramutation. Genetics 173, 1637–1647. 38. Parkinson, S. E., Gross, S. M. & Hollick, J. B. (2007) Maize sex determination and abaxial leaf fates are canalized by a factor that maintains repressed epigenetic states. Developmental Biology 308, 462–473. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 48

39. Ream, T. S., Haag, J. R., Wierzbicki, A. T., Nicora, C. D., Norbeck, A. D., Zhu, J.-K., Hagen, G., Guilfoyle, T. J., Pasa-Tolic, L. & Pikaard, C. S. (2009) Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II. Molecular Cell 33, 192–203. 40. Rosenberg, O. S., Dovey, C., Tempesta, M., Robbins, R. A., Finer-Moore, J. S., Stroud, R. M. & Cox, J. S. (2011) EspR, a key regulator of Mycobacterium tuberculosis virulence, adopts a unique dimeric structure among helix-turn-helix proteins. Proceedings of the National Academy of Sciences 108, 13450–13455. 41. Rost, B., Yachdav, G. & Liu, J. (2004) The PredictProtein Server. Nucleic Acids Re- search 32 (Web server issue), W321–W326. 42. Schmid, M., Davison, T. S., Henz, S. R., Pape, U. J., Demar, M., Vingron, M., Scholkopf, B., Weigel, D. & Lohmann, J. U. (2005) A gene expression map of Ara- bidopsis thaliana development. Nature Genetics 37, 501–506. 43. Schnable, P. S. et al. (2009) The B73 maize genome: complexity, diversity, and dynam- ics. Science 326, 1112–1115. 44. Shi, J., Blundell, T. L. & Mizuguchi, K. (2001) FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. Journal of Molecular Biology 310, 243–257. 45. Sidorenko, L., Dorweiler, J. E., Cigan, A. M., Arteaga-Vazquez, M., Vyas, M., Kermicle, J., Jurcin, D., Brzeski, J., Cai, Y. & Chandler, V. L. (2009) A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genetics 5, e1000725. 46. Stonaker, J. L., Lim, J. P., Erhard, K. F. Jr. & Hollick, J. B. (2009) Diversity of Pol IV function is defined by mutations at the maize rmr7 Locus. PLoS Genetics 5, e1000706. 47. Swiezewski, S., Crevillen, P., Liu, F., Ecker, J. R., Jerzmanowski, A. & Dean, C. (2007) Small RNA-mediated chromatin silencing directed to the 3’ region of the Arabidopsis gene encoding the developmental regulator, FLC. Proceedings of the National Academy of Sciences 104, 3633–3638. 48. Swigonova, Z., Lai, J., Ma, J., Ramakrishna, W., Llaca, V., Bennetzen, J. L. & Messing, J. (2004) Close split of sorghum and maize genome progenitors. Genome research 14, 1916–1923. 49. Tucker, S. L., Reece, J., Ream, T. S. & Pikaard, C. S. (2011) Evolutionary history of plant multisubunit RNA polymerases IV and V: subunit origins via genome-wide and segmental gene duplications, retrotransposition, and lineage-specific subfunctionaliza- tion. Cold Spring Harbor Symposia on Quantitative Biology 75, 285–297. 50. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. (2009) Jalview version 2–a multiple sequence alignment editor and analysis workbench. 25, 1189–1191. CHAPTER 2. GENETIC INTERACTIONS BETWEEN POL IV COMPONENTS 49

51. Williams-Carrier, R., Stiffler, N., Belcher, S., Kroeger, T., Stern, D. B., Monde, R.-A., Coalter, R. & Barkan, A. (2010) Use of Illumina sequencing to identify transposon insertions underlying mutant phenotypes in high-copy Mutator lines of maize. Plant Journal 63, 1189–1191. 52. Woodhouse, M. R., Freeling, M. & Lisch, D. (2006) Initiation, establishment, and main- tenance of heritable MuDR transposon silencing in maize are mediated by distinct factors. PLoS Biology 4, e339. 53. Xue, B., Dunbrack, R. L., Williams, R. W., Dunker, A. K. & Uversky, V. N. (2010) PONDR-FIT: A meta-predictor of intrinsically disordered amino acids. 1804, 996– 1010. 54. Yook, K. J., Proulx, S. R. & Jorgensen, E. M. (2001) Rules of nonallelic noncomple- mentation at the synapse in Caenorhabditis elegans. Genetics 158, 209–220. 50

Chapter 3

RNA polymerase IV affects nascent transcription in Zea mays

Note: Portions of this chapter have been published in: Erhard Jr., K. F., J. R. B. Talbot, N. C. Deans, A. E. McClish, and J. B. Hollick (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics: 199: 1107-1125.

3.1 Introduction

Eukaryotes use at least three DNA-dependent RNA polymerases (RNAPs) to transcribe their genomes into functional RNAs. RNAP Pol II generates mRNAs and non-coding RNAs involved in various RNA-mediated regulatory pathways (reviewed by Sabin et al. 2013). Multicellular plant genomes encode additional RNAP subunits comprising Pol IV and Pol V, which are central to a small interfering RNA (siRNA)-based silencing pathway primarily targeting repetitive sequences such as transposable elements (TEs) (Matzke & Mosher 2014, Matzke et al. 2015). These additional RNAPs derive from duplications of specific Pol II subunits followed by subfunctionalization during plant evolution (Luo & Hall 2007, Tucker et al. 2011), yet the holoenzyme complexes still share some Pol II subunits (Ream et al. 2009, Haag et al. 2014). Zea mays (maize) has distinct largest subunits for Pol IV and V and, unlike Arabidopsis thaliana (Arabidopsis), three second-largest subunits (Erhard et al. 2009, Stonaker et al. 2009, Sidorenko et al. 2009) that in distinct combinations form two Pol IV and three Pol V subtypes (Haag et al. 2014). Genetic analyses of rna polymerase (d/e) 2a [rp(d/e)2a] encoding one of the second-largest subunits (Stonaker et al. 2009, Sidorenko et al. 2009) together with recent proteomic data showing association of a putative RNA-dependent RNA polymerase (RDR2) with only RP(D/E)2a-containing subtypes (Haag et al. 2014) indicate that maize Pol IV subtypes have diverse functional roles in managing genome homeostasis. Loss of Pol IV function has different consequences in Arabidopsis, Brassica rapa (a close CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 51

relative of Arabidopsis), and maize, species in which Pol IV mutants have been identified (Herr et al. 2005, Onodera et al. 2005, Erhard et al. 2009, Huang et al. 2013), though mutants in all three species are defective for siRNA production (Zhang et al. 2007, Mosher et al. 2008, Erhard et al. 2009, Huang et al. 2013). Arabidopsis NUCLEAR RNA POLYMERASE D1A (NRPD1A) mutants are late flowering (Pontier et al. 2005) and B. rapa nrpd1a mutants have no obvious phenotypes (Huang et al. 2013), while maize rna polymerase d1 (rpd1 ) mutants have multiple developmental defects and trans-generational degradation in plant quality compared to non-mutant siblings (Parkinson et al. 2007, Erhard et al. 2009). The disparate impacts of rpd1 /NRPD1A mutations in maize vs. Brassicaceae representatives are potentially related to different genomic TE contents as TE sequences are greatly ex- panded in maize compared to both Arabidopsis (Hale et al. 2009) and B. rapa (Wang et al. 2011). However, recently reported cytosine methylome profiles (Li et al. 2014) indicate maize TEs are equally well methylated in the absence of Pol IV as their Arabidopsis counterparts (Stroud et al. 2013), predicting that Pol IV-dependent cytosine methylation is not required to maintain TE silencing. The developmental defects observed in rpd1 mutants are both distinct and non-heritable (Parkinson et al. 2007), and therefore unlikely related to TE-derived mutations. These de- fects also appear unrelated to siRNA-induced silencing because other maize mutants affecting siRNA biogenesis, including rp(d/e)2a, are developmentally normal (Hale et al. 2007, Ston- aker et al. 2009, Barbour et al. 2012). We hypothesize that maize has co-opted RPD1/Pol IV to transcriptionally control specific alleles of genes for which TEs and TE-like repeats act as regulatory elements. Supporting this concept, specific purple plant1 (pl1 ) alleles having an upstream doppia TE fragment are regulated by RPD1 (Erhard et al. 2013). As the maize genome is composed of >85% TE-like sequences (Schnable et al. 2009), many of which occur within 5 kilobases (kb) of genes (Baucom et al. 2009, Gent et al. 2013), a large number of alleles using TE-like sequences as regulatory elements is possible. Phylogenomic comparisons between Arabidopsis thaliana and A. lyrata also support the idea that gene-proximate TEs represent a source of regulatory diversity (Hollister et al. 2011). Maize RPD1 was initially identified as a genetic factor required to maintain transcrip- tional repression of specific alleles subject to paramutation (Hollick et al. 2005) - a process by which meiotically heritable changes in gene regulation are influenced by trans-homolog interactions (Brink 1956, Brink 1958, Hollick 2012). Presumably because detailed pedigree analyses are required to recognize instances of paramutation, only a few clear examples in- volving endogenous alleles have been described (Brink 1956, Coe 1961, Hagemann & Berg 1978, Hollick et al. 1995, Sidorenko & Peterson 2001, Pilu et al. 2008). Similar behaviors involving transgenes have been noted in both plants and animals (Chandler & Stam 2004, Rassoulzadegan et al. 2006, Khaitova et al. 2011, de Vanssay et al. 2012, Shirayama et al. 2012, Ashe et al. 2012) though it remains unknown whether these examples are due to mechanistically related processes. One strategy for identifying a broader set of alleles sub- ject to paramutation would be to start with a list of candidate genes whose transcriptional regulation is affected by RPD1 function. The evolutionary function(s) of Pol IV remain enigmatic. Because Pol IV is required for CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 52

siRNA-directed cytosine methylation (reviewed by Matzke & Mosher 2014, Matzke et al. 2015), it is expected that the regulation of many alleles might be affected by RPD1/NRPD1 action though few such alleles have been identified in maize (Hollick et al. 2005, Parkinson et al. 2007, Erhard et al. 2013) and Arabidopsis (Matzke et al. 2007, Ariel et al. 2014). To date there have been no reports of genome-wide effects of RPD1/NRPD1 on gene regulation in any species, although several studies have noted correlations between siRNA profiles and nearby genic mRNA abundance (Hollister et al. 2011, Eichten et al. 2012, Greaves et al. 2012). Template competitions between Pol IV and Pol II have been proposed (Hale et al. 2009) to account for RPD1-based transcriptional repression seen at the Pl1-Rhoades allele of pl1 (Hollick et al. 2005) and for increases in polyadenylated transcripts of some Long Terminal Repeat (LTR) retrotransposons that specifically accompany loss of RPD1 but not loss of two other siRNA biogenesis factors (Hale et al. 2009). These results indicate that RPD1-containing Pol IV complexes directly interfere with Pol II transcription of RPD1- targeted genomic regions. Here we use global run-on sequencing (GRO-seq) (Core et al. 2008) to identify genome- wide targets of Pol IV-based transcriptional regulation. This technique profiles RNAs from RNAPs incorporating a brominated UTP ribonucleotide during a short nuclear run-on re- action (Core et al. 2008). Maize Pols IV and V can extend transcripts in vitro, but ri- bonucleotide incorporation is attenuated compared to Pol II (Haag et al. 2014). Because Arabidopsis Pol IV products are rapidly processed to siRNAs (Li et al. 2015), transcription rates of maize Pol IV are relatively slow in vitro (Haag et al. 2014), and no appreciable maize Pol IV RNAs are detected in vivo in short run-on experiments (Erhard et al. 2009), most non-rRNA, non-tRNA transcription detected by GRO-seq is expected to represent Pol II function. Our results show loss of Pol IV affects transcription profiles at 5 ′ and 3 ′ gene ends and at a discrete set of unique TEs and genes, some of whose dysregulation may contribute to rpd1 mutant developmental defects.

3.2 Materials and Methods

Genetic stocks The rpd1-1 null mutation (originally designated rmr6-1 ; Erhard et al. 2009) was intro- gressed into the B73 inbred background to ∼97% by repeated backcrosses of F2 rpd1-1 / rpd1-1 mutant pollen to a recurrent B73 female parent. Families segregating 1:2:1 for rpd1-1 / rpd1-1 mutants, heterozygotes, and homozygous Rpd1-B73 individuals were used for nuclei isolations and RNA isolations for polymerase chain reaction (RT-PCR) and quantitative real-time PCR (qRT-PCR) analysis. The qRT-PCR analysis of dcl3-2 mutants vs. homozygous Dcl3 wild-types, used a family (progeny number: 130009) segregating 1:2:1 (for Dcl3 / Dcl3, Dcl3 / dcl3-2 and dcl3-2 / dcl3-2, respectively) introgressed into the B73 inbred background to ∼73%. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 53

GRO-seq library preparation Ten homozygous wild-type (WT; Rpd1-B73 / Rpd1-B73 ) and 10 homozygous rpd1-1 mutants (rpd1 mutant) siblings were identified with a dCAPs marker for the rpd1-1 lesion (Erhard et al. 2013) and used for nuclei isolations. Nuclei were isolated from whole shoots (roots removed) of 8 day-old seedlings. Seedling tissues and dry ice were pulverized in a blade coffee grinder and transferred to a ceramic mortar with 15 mL of ice-cold isolation buffer (40% glycerol, 250 mM sucrose, 20 mM Tris pH 7.8, 5 mM MgCl2, 5 mM KCl, 0.25% Triton X-100, 5 mM β-mercaptoethanol). Pulverized tissue in isolation buffer was ground further with a ceramic pestle, and filtered through cheesecloth into a 50 mL conical tube. Grindates were filtered again through 40 µm nylon cell strainers (BD Biosciences, San Jose CA) into 35 mL centrifuge tubes. Nuclei were centrifuged at 6,000 g for 15 minutes at 4◦C and pellets washed with 15 mL isolation buffer. Washes were repeated two more times and pellets were resuspended in 100 µL resuspension buffer (50 mM Tris pH 8.5, 5 mM MgCl2, 20% glycerol, 5 mM β-mercaptoethanol). Transcription run-ons were performed as described (Hollick & Gordon 1993) with the following changes: 0.5 mM 5-bromouridine 5 ′-triphosphate (BrUTP) (Sigma) was substituted for UTP and 2 µM cold cytidine triphosphate (CTP) was added in addition to 10 µL of α-32P-CTP (3000 Ci/mmol, 10 mCi/mL; Perkin-Elmer). RNA isolation was as described (Hollick & Gordon 1993) with the following changes: DNase I and Proteinase K digestions were performed for 1 hour each at 37◦C and 42◦C, respectively; one acid phenol/chloroform extraction was performed to isolate RNA. High throughput sequencing libraries were prepared from in vitro labeled RNA as described (Core et al. 2008) using agarose bead-conjugated α-bromodeoxyuridine (BrdU) antibody (Santa Cruz Biotechnology).

GRO-seq library processing Fifty nucleotide (nt) single-end raw reads (54,318,135 for WT library and 54,873,783 for rpd1 mutant library) were generated on the Illumina HiSeqII platform (Vincent J. Coates Genome Sequencing Laboratory, UC Berkeley). Based on FastQC (http://www.bioinfor- matics.babraham.ac.uk/projects/fastqc/) PHRED score analysis, 10 nt were trimmed from the relatively lower quality 3 ′ end of all reads from both libraries using the FASTQ/A Trimmer script (http://hannonlab.cshl.edu/fastx_toolkit/). The Cutadapt program (Mar- tin 2011) was used to trim adapter sequences and any additional low quality bases (op- tion -q 10) from the 3 ′ end of all reads; reads shorter than 20 nt after adapter trimming (2,253,657 and 2,488,839, respectively) were excluded (option -m 20). The Fastx Quality Filter (http://hannonlab.cshl.edu/fastx_toolkit/) removed reads with fewer than 97% of their bases (option -p 97) above the Phred quality of 10 (option -q 10); 654,703 WT reads and 667,412 rpd1 mutant reads were removed. Finally, 7,546 and 5,443 sequencing artifacts were removed from the WT and rpd1 mutant libraries, respectively, using the Fastx Arti- facts Filter (http://hannonlab.cshl.edu/fastx_toolkit/). The resulting libraries consisted of 51,402,229 and 51,712,089 high quality WT and rpd1 mutant reads, respectively, with an CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 54

average length of 32 nt and these were used for subsequent mapping, computational, and statistical analyses.

Computational analyses of GRO-seq libraries Alignments to genomic features: For filtering and downstream comparisons, high qual- ity GRO-seq reads were mapped using Bowtie alignment software (version 0.12.7; Langmead et al. 2009) to annotated maize sequence features (see Table 3.1 on page 74 for file types and origins): rRNAs and tRNAs (kindly provided by Dr. Blake Meyers, University of Delaware), maize B73 AGPv2 filtered gene set (FGS), Maize Transposable Element Con- sortium (MTEC) consensus sequences, and maize pseudo-chromosomes 1 through 10 repre- senting the sequenced B73 reference genome (AGP version 2, build 5b). Bowtie indices were built from the above annotated maize feature sequences using the bowtie-build command with default parameter settings. Reads that failed to match to the rRNA/tRNA indices with up to two mismatches (option -v 2) comprise the filtered non-rRNA/non-tRNA align- ments (41,169,885 and 42,498,964 for WT and rpd1 mutant libraries, respectively) which were aligned to the maize pseudo-chromosomes 1 through 10 allowing two mismatches (op- tion -v 2) and to match only once in the genome (option -m 1). These alignments define uniquely mapping reads (12,239,069 and 11,739,444 for WT and rpd1 mutant libraries, re- spectively), which were used for the metagene and differential expression analyses described below.

Distribution of reads over genomic features: To measure the relative contribution of introns and exons in WT and rpd1 mutant GRO-seq libraries, we first isolated the subset of 26,987 (68%) maize FGS models with no predicted alternatively spliced transcript isoforms, which we define as single transcript genes. As a control, we determined the contribution of introns and exons to all single transcript genes using intron and exon chromosomal coordi- nates contained in the FGS positions annotation file (Table 3.1 on page 74). We compared the control distribution to percentages of uniquely mapping GRO-seq reads overlapping in- trons or exons within single transcript genes, which were identified using the intersectBed tool, part of the BEDTools suite (Quinlan & Hall 2010), with the following parameters: -f 1 -u -wa. Only reads contained entirely within a gene, exon, or intron were reported (no untranslated regions or exon/intron boundaries are reported by this method). Percent- ages of introns and exons were calculated as a proportion of unique reads contained within single transcript genes. For direct Bowtie alignments of filtered non-rRNA/non-tRNA reads to TE and FGS se- quence indices, I allowed up to two mismatches (option -v 2) and reads to map more than once to the respective set of sequences, but counted each multi-mapping read only once (de- fault option -k 1 to report only 1 random alignment per read for TEs and option --best to report the best FGS alignment). Analyses of differentially transcribed TE superfamilies com- pared numbers of reads aligned directly to MTEC consensus TE sequences (with the same alignment parameters) normalized by total mappable reads of each respective library. For a CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 55

control alignment to TE sequences, we generated a random sampling of genomic sequences us- ing the Sherman program (http://www.bioinformatics.babraham.ac.uk/projects/sherman/). Specifically, a set of 51,402,229 (the number of high quality reads in the WT GRO-seq li- brary; option -n 51402229) random 32-mers (the average length of high quality GRO-seq reads for both the WT and rpd1 mutant libraries, weighted for abundances; option -l 32) was generated from the maize genome, inhibiting in silico bisulfite conversion with option -cr 0. The resulting 32-mers were aligned directly to TE and FGS sequences using identical bowtie parameters as GRO-seq read alignments. To determine overlaps of uniquely and repetitively mapping reads to genomic features (genes, TEs, and intergenic regions), I created alignment tables from the raw Bowtie align- ment files (SAM formatted). For each read, alignment characteristics (unmappable, maps uniquely or maps repetitively) were extracted from the unique alignment of non-rRNA/non- tRNA reads to the B73 genome. These were compared to mapping sense / mapping antisense / not mapping values for the same reads to indices built from the FGS sequences, MTEC TE sequences, or custom sequences extending 5 kb upstream (FGS-5kb) or downstream (FGS+5kb) of the original FGS sequences. In all cases, up to two mismatches were allowed to each reference (option -v 2); for the FGS-only alignment the best alignment was reported (option --best), for the rest a random alignment was reported (default option -k 1). The resulting dataset was used to collapse reads into different groups, for instance: uniquely mapping TE reads within 5 kb of gene starts would map only once to the B73 genome, map to the MTEC and FGS-5kb indices, but not to the FGS or FGS+5kb indices. Only the original B73 genome alignment determined uniquely vs. repetitively mapping. Therefore, when a uniquely mapping read maps within 5 kb of an annotated TSS and within 5 kb of a 3 ′ gene end, the read is likely between two genes that are less than 10 kb apart.

Metagene and heatmap profiles of gene boundaries: Uniquely mapping reads overlap- ping TSSs and 3 ′ gene ends were tallied and binned using a metagene analysis pipeline of cus- tom Python scripts (https://github.com/HollickLab/metagene_analysis). Using metagene_ count.py, the 5 ′ read ends (option --count_method start) were tallied against the posi- tions of TSSs (option --feature_count start) and 3 ′ ends (option --feature_count end) of FGS models (Table 3.1, FGS positions on page 74) greater than 1 kb in length (31,794 of 39,656 total models). The tallies extended +/- 5 kb or +/- 1 kb by changing the padding option (--padding) to 5000 or 1000 respectively; in both cases counting was strand spe- cific relative to the feature orientation by default. For the 5 kb tallies, only the first 1 kb of genic windows at each gene were kept by excluding windows with starting po- sitions greater than +1000 from the TSS or less than -1000 from the 3 ′ end in R (ver- sion 2.15.2; http://www.r-project.org/). Tallies from metagene_count.py were strand- specifically binned (option --separate_groups) with metagene_bin.py in either 10 nt non-overlapping windows (--window_size 10 --step_size 10) for detailed metagene plots or 50 nt non-overlapping windows (--window_size 50 --step_size 50) for the heatmap plots. The resulting counts tables were imported to R (version 2.15.2) for normalization, statistical testing and plotting. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 56

To view the normalized coverage (reads per million uniquely mapped), a heatmap-like plot was made using image, a base R command, to plot coverage (z-axis) on a color scale at each position along the gene model (x-axis) for groups of 60 genes (y-axis). Gene models were ordered by their maximum contribution to a 10 nt window’s total abundance (Maximum Sum Contribution) for all heatmap and metagene plots. At each window, a gene’s sum contribution represents its influence on the total coverage via the calculation: gene’s coverage at the window / total coverage of all genes at the window. The Maximum Sum Contribution was also used to exclude the upper and lower 5% of genes from the metagene plots described below. Heatmap plots were binned into 50 nt non-overlapping windows, and neighboring gene models after sorting by Maximum Sum Contribution were averaged in groups of 60 genes. To summarize the average behavior of GRO-seq coverage over the FGS models, I created metagene plots summarizing the coverage of each gene using either the total (sum) or the mean coverage at each 10 nt window. Welch’s two-sample t-tests and 95% confidence inter- vals were calculated for each 10 nt window across all inner 90% quantile (by Maximum Sum Contribution) gene models. To identify the windows with statistically significant coverage differences +/- RPD1, the individual Welch’s t-test results were corrected for multiple sam- pling using the Holm-Bonferroni method (α = 0.05) across all 800 windows comprising the regions around the TSSs and 3 ′ ends on both sense and antisense strands. Final plots used base R commands to plot mean or sum abundance as lines (or bars), 95% confidence intervals as polygons, and highlight statistically significant windows with horizontal line segments.

Correlations between WT and rpd1 mutant libraries: To determine the correspon- dence between WT and rpd1 mutant libraries, the number of uniquely mapping reads per kb per million uniquely mapped reads were tallied across various regions. Near-genic analysis of the 31,794 genes analyzed by the metagene profiles divided the region around each gene model into constant 1 kb upstream of TSS, 1 kb downstream of TSS, 1 kb upstream of gene end, and 1 kb downstream of gene end regions. The internal portion (greater than 1 kb away from each gene boundary) of those genes greater than 2 kb in length (23,050 of 31,794) was used to represent the interior gene region. All counts used the 5 ′-most base of each GRO-seq read. For each region, zero-values were artificially set to 1/10th of the lowest non-zero value in either dataset - this allowed both the inclusion of zero vs. non-zero data and had minimal (linear regression) to no (Spearman’s rank correlation) effect on the summary statistics. Re- sulting normalized coverages were log10 transformed for plotting, fitting a linear regression “Data fit” line, and calculating the Spearman’s rank correlation coefficient, all of which were performed in R (version 3.0.2). siRNA analyses: NCBI-sourced profiles of maize 16-35 nt RNAs were pooled and over- lapped with the gene models used in the GRO-seq metagene analysis. Because no available profiles represented 8 day-old B73 seedling shoots, I pooled WT B73 siRNAs from seedling (day 3) root tips (SRR218319: Gent et al. 2012), seedling (day 11) shoot apices (SRR488770 and SRR488774: Barber et al. 2012), ovule-enriched unfertilized cob (at silk emergence) CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 57

(SRR408793: Gent et al. 2013), and developing ear (SRR1583943 and SRR1583944: Gent et al. 2014). Pooled sRNAs from B73 rdr2 mutant developing ears (SRR1583941 and SRR1583942: Gent et al. 2014) represent 24 nt RNA-deficiency that should mimic Pol IV loss as RDR2 is required with Pol IV for 24 nt RNA biogenesis (Matzke & Mosher 2014, Matzke et al. 2015). All raw sequences were downloaded from the NCBI Sequence Read Archive and processed to a high-quality set which all had trimmable 3 ′ adapters and no low quality (Phred scores less than 30) nor ambiguous bases. The resulting high-quality reads (11,163,623 for rdr2 mutant and 74,787,717 for WT) were filtered against rRNA/tRNA sequences and aligned to the maize B73 v2 genome with 551,194 (rdr2 mutant) and 11,466,496 (WT) reads mapping uniquely. Uniquely mapping 24 nt reads (154,509 or 28% for rdr2 mutant and 8,611,691 or 75% for WT) were subjected to metagene analysis as described above, using all size classes for library normalization in reads per million uniquely mapped per region length. For con- sistency, the order of genes in the heatmaps followed the same order (by maximum sum contribution to total GRO-seq coverage) as the GRO-seq heatmaps.

Definition of alternative transcription start sites: As an alternative TSS definition, I used the 5 ′ end sequence of full length cDNAs (flcDNAs) from predominately 7 day-old seedling tissues (ZM_BFc set: http://www.ncbi.nlm.nih.gov/nucest/?term=ZM_BFc; see Soderlund et al. 2009). Positions for the flcDNA 5 ′ ends were defined by perfectly (0- mismatches) and uniquely (only 1 B73 alignment) aligning the first 50 nt of the cDNA sequence to the maize B73 reference genome (version 2, build 5b) with Bowtie (version 0.12.7). Identical TSS positions were collapsed. Metagene counting, binning and plotting of total normalized GRO-seq read abundance were performed +/- 1 kb from all of the flcDNA- defined TSSs in 10 nt non-overlapping windows. The output of metagene_count.py was loaded into R to identify the position of the maximum GRO-read counts around the flcDNA defined TSS using the command which(x == max(x)), where x represents each item (tally of GRO-reads at that position) in an array (of positions around the TSS).

Identification of differentially transcribed genes and TEs: For differential transcript analysis of genes (gene body only) and TEs with defined genomic coordinates (positionally defined TEs), the DESeq package (Anders & Huber 2010) was used with uniquely aligning reads and the maize FGS GFF3 and MTEC TE GFF3 (Table 3.1 on page 74) annota- tion files. The counts table for DESeq was generated from tallies of raw reads overlapping features in sense and antisense orientations (-s and -S options respectively) generated by intersectBed (additional options -c -wa; Quinlan & Hall 2010). The sense and anti- sense counts tables were processed with the following DESeq parameters: fit = “local”; method = “blind”; sharingMode = “fit-only”. The Benjamini and Hochberg correction (Benjamini & Hochberg 1995) for multiple sampling adjusted the p-values adjusted (padj) for False Discovery Rate (FDR) control and a 10% FDR threshold was applied to the padj val- ues (Anders & Huber 2010). The list of features passing the DESeq thresholds were further curated and trimmed as described below. Sequence polymorphisms between the introgressed CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 58

haplotype containing the rpd1-1 mutation and the homologous B73 chromosome 1 region likely contribute to differences in the abundances of reads aligning to this region between the WT and rpd1 mutant libraries. While the size of the rpd1-1 -containing haplotype is unknown, we estimated it to be at most 20 megabases (Mb). Therefore, we excluded 12 TEs and 27 genes located within 20 Mb on either side of the rpd1 locus from the respective lists of features classified as having decreased transcription in rpd1 mutants. I manually curated genes with increased or decreased sense transcription based on visual inspection for GRO-seq read coverage consistent with the transcription unit defined by the gene annota- tion; coverage localized to only a portion of the gene model were tagged as unlikely to be related to that gene’s expression (non-bold entries in Tables 3.3 and 3.4 on pages 75 and 78, respectively). Genes with increased antisense transcription were visually inspected and TEs less than 2 kb beyond the genic 3 ′ end were tallied. I determined whether positionally defined, differentially transcribed TEs were located within annotated genic regions or less than 5 kb from their 5 ′ or 3 ′ ends using the closestBed tool (Quinlan & Hall 2010) with the -d parameter, which in this case reports the distance of the closest gene to each TE analyzed.

qRT-PCR Expression analysis Sibling seedlings, from accessions described in the Genetics Stock section, segregating 1:2:1 for Rpd1 homozygotes, Rpd1 / rpd1-1 heterozygotes, and rpd1-1 mutants were geno- typed with CAPS markers using an MwoI digestion polymorphism. The PCR primers (for- ward = 5 ′-TTG GAA TGT CTT CTG CAC GCT ATA TCT GG-3 ′ and reverse = 5 ′-TCT GAA ATG GAA CAA GCA GCC CGA CA-3 ′) produce a 241 bp PCR product that can be cleaved into 159 bp and 82 bp fragments by MwoI if the template is WT, but not if the rpd1-1 lesion is present. A similar protocol using the BlpI restriction , distinguishes Dcl3 and dcl3-2 templates which produce a single 752 bp band vs. 558 bp and 194 bp bands respectively, when amplified with PCR primers: forward = 5 ′-GAA GGC TCA TAG CTG GCA AC-3 ′ and reverse = 5 ′-CAA ACG GGC AAA ATA ACA GA-3 ′ (I. T. Liao and J. B. Hollick, unpublished data). Whole shoots (as used in the GRO-seq experiment, described above) were collected from 8 day-old siblings homozygous wild-type (Rpd1 or Dcl3 ) and mutant (rpd1-1 or dcl3-2 ) plants, flash frozen in liquid nitrogen and stored at -80◦C for subsequent RNA extraction. A total of nine seedlings for each genotype were prepared in pools of three seedlings each. These pools of three were pulverized with dry ice in a coffee grinder and subsequently ground in a mortar and pestle with 5 mL of ice-cold isolation buffer (40% glycerol, 250 mM sucrose, 20 mM Tris pH 7.8, 5 mM MgCl2, 5 mM KCl, 0.25% Triton X-100, 5 mM β-mercaptoethanol). RNAs were then extracted from 1 mL of the grindate with 1 mL of Trizol reagent (Invitrogen) following manufacturer’s protocol. An aliquot (2 µg) of total RNA was DNaseI (Roche) treated for 20 minutes at 37◦C, and heat inactivated (75◦C for 10 minutes) in 5mM EDTA. The Tetro cDNA synthesis kit (Bioline, Taunton, MA) was used for the first strand cDNA synthesis, followed by RNA degradation with an RNase A/TI/H cocktail. qRT-PCR reactions each CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 59

used cDNA from 20 ng of starting total RNA in a 1x SYBR Sensimix (Bioline, Taunton, MA) reaction with 0.25 µM of each primer (Table 3.2 on page 74). Each biological sample had three technical replicates for both the aat and ocl2 primer sets. The qRT-PCR conditions used 40 cycles with 30 second 60◦C annealing and 45 second extension steps, the melt curve ramped from 60◦C to 95◦C in 20 minutes. The technical (3 per template) and biological (3 pools of 3 seedlings each) replicates were combined to calculate the average ocl2 abundance relative to aat as 2(aat Ct - ocl2 Ct).

3.3 Results

Global run-on sequencing profiles nascent transcription in maize seedlings Additional Pol II-derived RNAPs in multicellular plants (Luo & Hall 2007) may affect transcription dynamics across the genome, particularly as all of these RNAPs share accessory subunits with Pol II (Ream et al. 2009, Law et al. 2011, Tucker et al. 2011, Haag et al. 2014). Pol II transcription of both LTR retrotransposons (Hale et al. 2009) and certain pl1 alleles (Erhard et al. 2013) is increased in rpd1 mutants. However, dysregulation of these specific loci cannot explain all the developmental phenotypes observed in rpd1 mutants (Parkinson et al. 2007, Erhard et al. 2009). We used GRO-seq to view nascent transcription profiles +/- RPD1, to identify both particular (haplotype specific) and general (locus independent) effects of Pol IV loss on maize genome transcription. GRO-seq libraries were prepared using sibling rpd1 mutant and non-mutant (WT) seed- lings with each library representing nuclei from 10 separate individuals (see Materials and Methods). Sequencing reads from these libraries were mapped to the B73 reference genome (Schnable et al. 2009) (Figure 3.1A on page 91). Transcripts from all five classes of maize RNAPs (Pols I-V) could be represented in the WT GRO-seq library, whereas only those requiring Pol IV function (including Pol IV transcripts) would be absent in the rpd1 mutant library. To focus predominantly on Pol II, IV and V transcription, we removed most Pol I and III products by excluding rRNA- and tRNA-aligning reads from subsequent analysis (see Materials and Methods). Genomic non-rRNA/non-tRNA reads separated into repeti- tively aligning and uniquely aligning groups show similar distributions +/- RPD1 (Figure 3.1A on page 91). To evaluate whether the libraries are enriched for nascent transcripts as opposed to spliced mRNAs, we compared the exonic/intronic distributions of genic reads (Figure 3.2 on page 92). Both repetitively and uniquely mapping reads include intronic sequences (∼90% and ∼40% respectively) confirming the enriched representation of nascent, unspliced transcripts. Because genes and TEs are predicted to be differentially affected by Pol IV loss, I cat- egorized each read as having possibly originated from a TE, a gene (including annotated UTRs and introns), both, or neither (intergenic). Most uniquely mapping reads originate from genic or intergenic loci having little overlap with TEs (Figure 3.1B, Unique, on page CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 60

91). Repetitively mapping reads align to TE-like sequences (Figure 3.1B, Repetitive), though these TE-like reads usually align to genes as well (Figure 3.1B, purple boxes). Approximately 96% of repetitive reads aligning to TE sequences could originate from genic transcripts. To ensure this enrichment is not biased by our categorical analysis, we repeated the analysis on a set of in silico sequences randomly generated from the maize genome (see Materials and Methods; Figure 3.3 on page 92) and found 97.3% of the TE-aligning in silico-generated 32mers also aligned to genes. These results indicate that the majority of TE sequences rep- resented in nascent transcription profiles are likely found within introns and untranslated regions of gene-derived Pol II RNAs. Exclusively genic reads in both libraries (61% and 59.7% of all mappable WT and rpd1 mutant reads, respectively) are highly enriched compared to the prevalence of genic sequences in the maize genome (8%) (Figure 3.4 on page 93) indicating that these GRO-seq profiles represent largely genic transcription. Because GRO-seq reads provide strand-specific infor- mation, I could also identify a sense-oriented strand bias amongst all genic reads, particularly the uniquely mapping reads which originate from the mapped locus (Figure 3.1C on page 91). Even reads representing TEs embedded within host genes appear biased towards the sense-strand: 95% of uniquely aligning TE-like reads (5,937 and 5,641 reads in WT and rpd1 mutant libraries) are sense oriented relative to their host genes. In all comparisons, the read distributions remain similar between WT and rpd1 mutant libraries, consistent with prior run-on transcription results showing that Pol IV contributes less than 5% of the transcribed nuclear RNA pool (Erhard et al. 2009) and the recent finding that Pol IV transcripts are rapidly processed to siRNAs (Li et al. 2015). These results indicate that loss of RPD1 does not generally shift transcription away from genes and toward TEs, suggesting Pol IV has a more precise mechanism for regulating transcription of its targets.

Maize seedling transcription is focused on genic regions Mappable GRO-seq reads aligning to neither gene nor TE features represented the “inter- genic” category. These reads could have several origins including unannotated genes and TEs or non-coding DNA sources for siRNAs. However, I find that most non-genic non-TE reads represent pretermination transcription downstream of currently annotated addition sites (PAS) (Figure 3.5 on page 93). Nascent transcription profiles in Homo sapiens (Core et al. 2008), Drosophila melanogaster (Core et al. 2012) and Caenorhabditis elegans (Kruesi et al. 2013) also identify transcription beyond the PAS and, in some cases, antisense transcription upstream of the TSS. To test whether nascent transcription is enriched near maize genes, I queried non-genic read alignments at regions 5 kb upstream of annotated TSSs and 5 kb downstream of annotated 3 ′ ends. Both intergenic and TE-only reads consist largely (>80%) of sequences that can align to within 5 kb of a gene model, representing potential genic reads (Figure 3.5A on page 93). The one exception is uniquely mapping TE-only reads, where only half of the reads align within 5 kb of genes. These intergenic and TE-only reads greater than 5 kb from the nearest gene comprise approximately 5% of the maize seedling non-rRNA / non-tRNA transcriptome (see Materials and Methods). At CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 61 our current depth of sequence coverage, I cannot confidently determine if transcription of those loci far from genes is influenced by RPD1 loss. However, our datasets are enriched over genes (∼65% of uniquely mapping reads in both WT and rpd1 mutant datasets) with an average sense strand coverage of 173 (rpd1 mutant) to 187 (WT) raw reads per gene, I therefore continued our focus predominantly on near-genic regions which are enriched in our dataset. In support of the idea that near-genic reads in the maize GRO-seq profiles represent Pol II pretermination extensions of genic transcription units, uniquely mapping near-genic reads align predominantly in the downstream 5 kb (up to 84% and 87% of WT and rpd1 mutant near-genic reads respectively; Figure 3.5B on page 93). The overlap of uniquely mapping reads between the upstream 5 kb and downstream 5 kb (22% in both genotypes) likely represents reads aligning between genes separated by less than 10 kb. Repetitively mapping near-genic reads have a more even distribution between upstream and downstream 5 kb, which could represent an artifact of alignments to non-origin loci. With uniquely mapping reads, where alignments likely correspond to the originating locus, near-genic reads are predominantly sense stranded (Figure 3.5C on page 93), in accord with the hypothesis that they represent a continuation of genic transcription. I next used pile-ups of uniquely mapped WT GRO-seq reads across genic loci to profile the typical maize genic transcription unit at higher resolution. Combining all sense and antisense profiles (Figure 3.6 on page 94) into a metagene composite (Figure 3.5D on page 93) confirms the sense-strand enrichment downstream of currently annotated gene models. Additionally, the composite profile indicates that most pretermination transcription occurring 3 ′ of PASs concludes within 1-1.5 kb of currently annotated gene ends. This result agrees with estimates from metazoan profiles (Core et al. 2008, Core et al. 2012, Kruesi et al. 2013). Beyond presumptive genic transcription termination points and upstream of TSSs there is remarkably little evidence of transcription. Although our results do not distinguish RNAs produced from different RNAPs, the nature of the read enrichment at genes (starting near the TSS and extending beyond the 3 ′ PAS) and the sense strand bias supports the prediction that most of these reads derive from gene-associated nascent Pol II RNAs.

Promoter-proximal transcription in maize is distinct from that seen in metazoans The composite metagene profile (Figure 3.5D on page 93) highlighted unexpected features of typical maize transcription initiation. The beginning of composite genic transcription is marked with a prominent narrow peak of GRO-seq reads nearly coincident with the TSS (Figure 3.5D on page 93). Sense-oriented peaks located approximately 50 bp downstream of TSSs are identified by GRO-seq profiles in human (Core et al. 2008), Drosophila (Core et al. 2012) and under stress conditions in C. elegans (Kruesi et al. 2013, Maxwell et al. 2014). These peaks are cited as evidence of promoter-proximal Pol II pausing, a transcriptional regulatory mechanism first described at the hsp70 gene in Drosophila (Rougvie & Lis 1988). CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 62

It is unclear whether the more upstream maize TSS-proximal peak (Figure 3.5D on page 93) represents Pol II pausing. Over 1/4 of maize gene models used in the original metagene profile (Figure 3.5D) begin with the triplet ATG sequence (27%), compared to only 2% when triplet sequences are sampled 5 kb upstream of the gene models (an approximation of ATG enrichment genome-wide; Figure 3.7 on page 95). This finding indicates that translation, rather than transcription, initiation sites define many current annotations of maize gene start positions. To test if an alternative gene start definition would shift the TSS-proximal peak, I first defined a set of genomic TSS annotations using maize seedling and young leaf full-length cDNA sequences (see Materials and Methods; Soderlund et al. 2009). A similar metagene analysis using this validated set of TSSs places the sense-oriented GRO-seq peak upstream of the TSS (Figure 3.8 on page 96). Curiously, the GRO-seq read peak is upstream of the flcDNA-annotated TSS. Because the existing annotations for TSS were variable and the metagene analysis naturally minimizes rare GRO-seq peaks, the highest GRO-seq peak for each gene model (within 2 kb of the TSS) was identified. These peaks can be found at nearly every position within the 2 kb region, although they focus near the annotated TSS regardless of the method used to define the TSS (Figure 3.9 on page 97). While the GRO-seq coverage peaks are enriched within 100 nt of the TSS, only 27% of the 11,059 gene models with support from both FGS and flcDNA data had their tallest peak near the TSS (Figure 3.9D on page 97). However, GRO-seq coverage profiles of randomly sampled gene models with peaks no more than 100 nt upstream (purple) or downstream (blue) of the TSS (green) show sense strand coverage that comports with transcription initiation at (or very near) the TSS continuing into the body of the gene, while gene models with peaks further upstream (red) have GRO-seq coverage inconsistent with the gene model, and those with peaks further downstream (gold) may represent much later TSSs than predicted (Figure 3.9B on page 97). These GRO-seq peak analyses identify subpopulations of genes with different peak profiles, which warrant future investigation, particularly if GRO-seq data is to be used for improving gene models. Even so, these results indicates that the peak positions relative to TSSs are dependent on annotation methods but likely do not represent canonical Pol II pausing as described in metazoans. Nascent transcription profiling (Core et al. 2008) and short RNA cDNA libraries (Seila et al. 2008) identified evidence of divergent antisense transcription peaking at approximately -250 bp at mammalian promoters. Similar to Drosophila GRO-seq profiles (Core et al. 2012), our metagene profile has no evidence of such a broad antisense peak (Figure 3.5D on page 93). The only peak at approximately -250 bp is due to antisense-biased GRO-seq coverage of a single gene (Figure 3.10 on page 98), which likely represents a transcription unit unrelated to the annotated gene model. Together, these two characteristics of near-promoter transcription - a TSS-proximal peak and lack of divergent transcription - distinguish the maize seedling transcriptional environment at promoters from that of currently profiled metazoans. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 63

Pol IV affects nascent transcription at gene boundaries Additional Pol II-derived plant RNAPs (Pols IV and V) represent a key distinction be- tween the transcriptional landscape of multicellular plants and other eukaryotes. To evaluate Pol IV effects on transcriptional activity at functionally important sites surrounding genes, such as promoters and transcription termination sites, I generated and compared metagene profiles displaying the mean GRO-seq read abundance across a composite of annotated gene edges and their flanking genomic regions (Figure 3.11A on page 99). To exclude outliers, I sorted gene models based on their maximum read count contribution to the metagene plot and included only the inner 90% (∼28.6 thousand) of gene models in the metagene profiles (see Materials and Methods). Profile comparisons reveal changes in transcription at gene boundaries +/- RPD1, while transcription of genic and upstream regions remain unaffected (Figure 3.11A on page 99, Figure 3.8 on page 96). Near TSSs, the rpd1 mutant library has significantly lower read cov- erage in both strand orientations (Figure 3.11A, Welch’s t-tests in 10 nt windows, corrected for multiple sampling by the Holm-Bonferroni method at α = 0.05). Heatmap represen- tations of read abundance differences across individual gene boundaries indicate that the trends observed in the metagene summary apply to most genes (Figure 3.11B on page 99). Elsewhere, the WT and rpd1 mutant GRO-seq profiles are remarkably similar, particularly for regions having high coverage (sense strand over gene bodies and 1 kb downstream of gene ends) (Figure 3.12 on page 100). The sense strand, gene body coverage between libraries has Spearman’s rank correlation coefficients (ρ) of 0.971, 0.963, and 0.977 (first 1 kb, middle, and last 1 kb, respectively), which approximate the ρ = 0.967 observed between biological replicates in the original GRO-seq analysis (Core et al. 2008). Together, the metagene sum- mary, fold-change heatmap, and Spearman’s correlations highlight that RPD1 has no general impact over the coding region of genes. More striking, the pretermination region beyond the PAS has significantly increased read coverage in the absence of RPD1. Downstream of the PAS there is evidently increased transcription for most genes relative to upstream of the PAS and this 3 ′ transcription is even more pronounced in rpd1 mutants (Figure 3.11B on page 99). Upstream of currently annotated gene models, the fold differences in read abundances are variable as expected for regions of relatively slow or underrepresented transcription; fold-changes in read abundances representing antisense-oriented transcription (Figure 3.13 on page 101) show similarly variable patterns. Although window sizes used for generating fold-change heatmap data are larger (50 nt vs. 10 nt) than in the metagene profiles, most of these larger windows directly above TSSs still show negative fold-changes indicating increased transcription in WT vs. rpd1 mutant samples as a common feature of many genes (Figure 3.11B on page 99). The RPD1-dependent changes in the GRO-seq profiles observed near gene boundaries implicate Pol IV activity near genes. Because Pol IV is required to generate 24 nt RNAs, presumably through downstream processing of Pol IV transcripts by RNAi-like machinery (Li et al. 2015, Matzke & Mosher 2014, Matzke et al. 2015), I looked for 24 nt read (24mer) evidence supporting Pol IV action nearby recognized Pol II transcription units. Previous CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 64

genome-wide profiling of maize 24mers identified enrichment 1.5 kb upstream of genes (Gent et al. 2014). To profile 24mers within 1 kb of gene boundaries, I pooled approximately 75 million B73 16-35 nt RNA reads from published datasets for metagene analysis. To increase effective sequencing depth, I pooled four distinct datasets: 3 day-old seedling root (Gent et al. 2012), unfertilized cobs (Gent et al. 2013, Gent et al. 2014) and 11 day-old seedling shoot apices (Barber et al. 2012) all representing the B73 inbred background, and subjected them to similar coverage analysis near genes (see Materials and Methods). Most 24mers represent repetitive features so their originating loci cannot be determined. I therefore limited analysis to uniquely mapping 24mers (8.6 million). These 24mers are enriched both upstream and downstream of genes (Figure 3.14A and B on page 102), and because they are uniquely mapping, they are presumably generated from Pol IV transcription occurring in regions immediately flanking genes. Biogenesis of 24 nt RNAs also requires RDR2 to create a double stranded RNA from Pol IV transcripts (Li et al. 2015, Matzke et al. 2015). In Arabidopsis, RDR2 only functions in physical association with Pol IV (Haag et al. 2012). While the maize RDR2 ortholog (Alleman et al. 2006) has not been tested for a similar requirement, it physically associates with Pol IV (Haag et al. 2014) and is required for 24 nt RNA biogenesis (Nobuta et al. 2008). I therefore used an existing siRNA dataset from an rdr2 mutant (Gent et al. 2014) as a proxy for identifying RPD1-dependent 24 nt RNAs. Although this comparative analysis is more limited (11.1 million total reads, ∼551,000 unique 16-35mers, and ∼154,000 unique 24mers) the 24mer coverage across all genes showed no enrichment flanking gene boundaries (Figure 3.14A, C and D on page 102), indicating that the near-genic 24 nt RNAs are RDR2-dependent. These 24mer analyses support the presence of Pol IV immediately upstream and downstream of Pol II genes. Together, this analysis of maize rpd1 mutants at gene boundaries identifies previously unknown interactions by which Pol IV affects Pol II transcription at discrete positions near genic regions. In addition, these results identify both conserved and novel features of transcription in higher plants vs. metazoans.

Transcription of specific genes is affected by RPD1 RPD1 is required for restriction of silkless1 gene expression from apical inflorescences, thus ensuring proper male flower development (Parkinson et al. 2007), though how RPD1 reg- ulates developmentally important genes is unknown. We employed a computational method (DESeq; Anders & Huber 2010) that assigns statistical significance to regions annotated as genes over- or under-represented in uniquely mapping GRO-seq reads from either WT or rpd1 mutant libraries. This approach robustly controls for false positives (Type I errors) across a dynamic range of coverage levels and, by pooling information from similarly represented loci, can estimate the required mean and variance values for each locus even when the total sequencing depth and/or number of biological replicates is small (Anders & Huber 2010). Using this method, we could treat the WT and rpd1 mutant libraries as effective biological replicates (see Materials and Methods) assuming that at most loci there was no differential transcription, an assumption supported by the trends observed in the global analysis (Fig- CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 65 ure 3.1 on page 91), the metagene profiling (Figure 3.11 on page 99), and the Spearman’s rank correlation coefficients (Figure 3.12 on page 100). As a result, we have higher confidence in our identified differentially transcribed loci than if we treated each library individually, though the method likely underestimates the total number of differentially transcribed loci. Applying this stringent statistical method identified a total of 209 annotated genes whose seedling-stage transcription across the entire gene body (annotated UTRs, introns and exons) is significantly increased or decreased by loss of RPD1. We excluded a cluster of 26 gene models having significantly reduced GRO-seq representation in the rpd1 mutant profiles found within 20 Mb of the rpd1-1 introgressed haplotype as these were likely identified because of an inability of some polymorphic rpd1 mutant reads to align to B73 sequences in this interval. The remaining 183 genes represent potential direct or indirect targets of RPD1/Pol IV regulation (Figure 3.15A, page 103 and Tables 3.3, page 75; 3.4, page 78; 3.5, page 84; 3.6, page 86). To determine whether or not genic TEs were related to RPD1-affected transcription, graduate student Karl Erhard re-annotated the 183 genes and compared their TE content with 200 randomly selected genes indicating that RPD1-regulated genes have an average TE content: 64% of the differentially transcribed genes and 66% of the randomly selected genes (Erhard et al. 2015). Visual inspection of the differentially expressed genes having sense-strand changes (148 in total) using genome browser displays identified two classes: those with consistent GRO- seq read coverage across the entire gene model and those having more biased or localized coverage. In total, 32 of 46 (70%) having increased transcription (bold entries in Table 3.3 on page 75) and 96 of 102 (94%) with decreased transcription (bold entries in Table 3.4 on page 78) matched current gene annotations and are likely related to RPD1-dependent effects on genic transcription. The remaining 20 genes identified as differentially expressed by DESeq analysis have changes in GRO read coverage localized to sub-genic regions, sometimes overlapping TEs or the beginning of the gene with persistent coverage extending from the 3 ′ end of an upstream gene. Three examples of differentially transcribed genes were subsequently examined and vali- dated at mRNA levels. Results of oligo(dT)-primed reverse transcriptase polymerase chain reaction (RT-PCR) analyses using similar biological materials confirmed increased sense- oriented mRNA levels of all three genes tested from the set identified by computational analysis of WT and rpd1 mutant GRO-seq reads (Erhard et al. 2015). As an example, the predicted fold-changes +/- RPD1 of outer cell layer 2 (ocl2 ) nascent transcripts and mRNAs are approximately 7.9- and 3-fold increases, respectively. Additional quantitative real-time polymerase chain reaction (qRT-PCR) results estimate a 10-fold increase in ocl2 mRNAs in the absence of RPD1 (Figure 3.16 on page 105). These results validate the GRO-seq technique and DESeq analyses in discovering genes whose Pol II transcription is affected by RPD1. We expected to detect primarily increased sense-specific transcription by comparing WT and rpd1 mutant GRO-seq profiles because the Pl1-Rhoades allele is transcriptionally re- pressed by RPD1 (Hollick et al. 2005). However, the largest fraction (∼0.56) of genes af- fected by loss of RPD1 has lower sense-oriented transcription in rpd1 mutants (Figure 3.15B, CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 66

page 103; Table 3.4, page 78). One possible explanation for this result is that genomic fea- tures transcriptionally repressed in an rpd1 mutant background represent indirect effects. Potentially related to this idea, the presumed maize ortholog of the Arabidopsis REPRES- SOR OF SILENCING 1 (ROS1 ) gene, which encodes a DNA glycosylase that facilitates demethylation of cytosine residues (Morales-Ruiz et al. 2006), is transcriptionally repressed in rpd1 mutants (2.9-fold decrease) although this differential representation does not pass our statistical cutoff after adjustment for multiple testing by our DESeq test (raw p-value: 0.002, adjusted p-value: 0.2). Maize ros1 RNA levels are also reduced in meristems of rdr2 mutants (Jia et al. 2009), though the mechanism by which any genes are repressed in the absence of RPD1 or Pol IV-dependent siRNAs remains unknown. Among differentially transcribed gene models in rpd1 mutants (Tables 3.3, page 87; 3.4, page 78; 3.5, page 84; 3.6, page 86), we identified several candidates whose dysregula- tion could result in developmental abnormalities and potentially contribute to rpd1 mutant phenotypes (Parkinson et al. 2007). The candidate showing the second greatest increased transcription is ocl2 (Figure 3.15C on page 103), a member of the plant-specific home- odomain leucine zipper IV (HD-ZIP IV) family of transcription factors predicted to have leaf epidermis-related functions in maize (Javelle et al. 2011). Mature rpd1-2 mutant plants often exhibit problems maintaining proper leaf polarity (adaxialized leaf sectors) (Parkin- son et al. 2007). Interestingly, ocl2 is not normally expressed in epidermal nor mesophyll cells (Javelle et al. 2011), which comprise the majority of cells used for GRO-seq library generation, indicating this gene is transcribed outside its normal expression domain in rpd1 mutants. Genome browser visualization of GRO-seq reads uniquely mapping to the genomic region containing ocl2 (Figure 3.15C on page 103) highlights coincident transcription profiles of a proximate TE and this RPD1-regulated gene. A fragment of an LTR retrotransposon of the Gypsy class (RLG) assigned to the ubid family located ∼1.3 kb upstream of the ocl2 gene is also transcribed in rpd1 mutants, but in antisense orientation with respect to the ocl2 coding region (Figure 3.15C on page 103). These results identify the ubid fragment upstream of ocl2 as a putative controlling element for this allele, with the absence of RPD1 corresponding with transcriptional increases of both the ubid fragment and adjacent gene. While Pol IV loss increases both nascent transcription and poly-adenylated ocl2 mRNA, it is unclear whether this ectopic expression relates to a loss of RdDM-like or polymerase competition based repression. The required to maintain repression5 (rmr5 ) locus encodes a dicer-like3 (DCL3) homolog required for virtually all 24-nt RNA biogenesis in maize (A. Narain, I. T. Liao and J. B. Hollick, unpublished data). We used rmr5-2 (dcl3-2 ) mutant and sibling Dcl3 / Dcl3 wild-type seedlings to determine whether 24-nt RNAs were involved in maintaining ocl2 silencing in seedlings (Figure 3.16 on page 105). The loss of DCL3, which normally cleaves the Pol IV/RDR2 dsRNA products into 24 nt RNAs (Matzke et al. 2015), had no effect on the ocl2 seedling RNA levels. This result supports a 24 nt RNA-independent mechanism for the regulation of ocl2 expression by Pol IV (Hale et al. 2009). We also identified 36 genes transcriptionally altered in the antisense orientation in rpd1 mutants (Figure 3.15B on page 103 and Tables 3.5, page 84; 3.6, page 86). Pol IV is CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 67

implicated in the production of an antisense precursor transcript, and corresponding 24 nt siRNAs, homologous to the 3 ′ end of the Arabidopsis gene FLOWERING LOCUS C (FLC) (Swiezewski et al. 2007), which encodes an epigenetically regulated MADS Box factor important for vernalization and the regulation of flowering time (Dennis & Peacock 2007). However, only four genes (Table 3.6 on page 86) show decreased antisense transcription in the rpd1 mutant, indicating that Pol IV-dependent antisense transcription of genes is unlikely a primary mechanism of its action on a genome-wide scale. Many more genes (32 total) were recognized having increased transcription in antisense orientation (Figure 3.15B on page 103 and Table 3.5 on page 84), yet only one of these (GRMZM2G045560; a gene model encoding a WRKY DNA-binding domain-containing protein) had a significant (DESeq method of Anders & Huber 2010) increase in sense transcription as well. Most of these examples appear to represent transcription of non-coding RNAs initiated 3 ′ of the annotated genes (Figure 3.15D on page 103). Additionally, visual inspections indicate most (22 of 32, 69%) of these transcription units begin at downstream TEs (Figure 3.15D on page 103). Counting MTEC TE annotations within 2 kb of the 3 ′ ends of these 32 genes identifies a large number of LTR retrotransposons immediately downstream (Figure 3.15E on page 103). These results support previous findings showing Pol IV loss allows increased transcription of certain LTR retrotransposons (Hale et al. 2009) and they highlight how such promiscuous Pol II transcription could affect gene regulation (Kashkush & Khasdan 2007). Genes encoding a histidine receptor for cytokinin - an important phytohormone - and a homolog of an Arabidopsis HD-ZIP factor ATHB-4, also have elevated antisense transcription profiles in the absence of RPD1 (Table 3.5 on page 84), indicating the potential for a biologically significant role for this novel mechanism of RPD1 gene regulation in maize.

Transcription of TE families and specific TEs is affected by RPD1 While overall TE transcription appears modest (Figures 3.1B, page 91 and 3.5A, page 93), we compared GRO-seq read representations among specific TE families and at individual TE loci to determine if certain types are preferentially transcribed. Direct alignments of total WT and rpd1 mutant GRO-seq reads (unique and non-unique) to consensus sequences of annotated maize TE classes and major super-families allowed us to compare transcription of these features to their relative abundance in the maize genome (Figure 3.17A on page 106; Schnable et al. 2009). Although the RLG (Gypsy) class of LTR TEs is the most abundant TE super-family in the maize genome (Figure 3.17A on page 106; Schnable et al. 2009), they are under-represented in both GRO-seq profiles as compared to the LTR Unknown class (RLX) and to Type II DNA TEs (Figure 3.17B on page 106). To determine if distinct TE groups were differentially represented in nascent transcrip- tomes, Karl Erhard compared normalized abundances of WT and rpd1 mutant reads map- ping to annotated TE super-families (Figure 3.17C on page 106). This comparison identified several TE super-families with elevated transcription in the rpd1 mutant background (Figure 3.17C on page 106), including Gypsy, Mariner, Helitron and CACTA non-coding elements. The majority of TE-derived GRO-seq reads cannot be mapped uniquely to specific genomic CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 68

coordinates, limiting the detection of individual rpd1 -affected TE loci to those TEs har- boring significant sequence polymorphisms with respect to their family members. We thus employed the same method (DESeq; Anders & Huber 2010) used with genes to identify genomic regions annotated as TEs (Baucom et al. 2009, Schnable et al. 2009) having sta- tistically significant differences in uniquely mapping GRO-seq read coverage. This method identifies 63 individual TEs (Figure 3.17D on page 106 and Tables 3.7, page 87; 3.8, page 88; 3.9, page 89; 3.10, page 90) representing several different super-families (Figure 3.17E on page 106) whose transcription is either increased (Figure 3.17F on page 106 and Tables 3.7, page 87; 3.9, page 89) or decreased (Tables 3.8, page 88; 3.10, page 90) in rpd1 mutants in either sense or antisense directions. Only 28 of these unique TEs are not within or nearby (+/- 5 kb) genic regions and some identify larger TE regions affected by RPD1 (Figure 3.17F on page 106 and Tables 3.7, page 87; 3.8, page 88; 3.9, page 89; 3.10, page 90). These results agree with previous analyses (Hale et al. 2009) indicating that RPD1 prohibits mRNA ac- cumulation of certain LTR retrotransposons by interfering with normal Pol II transcription and RNA processing.

3.4 Discussion

Our GRO-seq profiling of WT and rpd1 mutant maize seedlings represents the first genome-wide nascent transcription analysis in plants and of a Pol IV mutant. The GRO-seq technique facilitates future studies of RNAP dynamics relevant to basic mechanisms of gene control in higher plants. Our results identify Pol IV effects on transcription at most gene boundaries, indicating that distinctions between higher plant and metazoan transcription may be partly related to the plant-specific expansion of RNAP diversity. We also identified specific TE and genic alleles that show significant changes in nascent transcription +/- RPD1, a dataset that should prove useful for better understanding the role(s) of Pol IV function in TE silencing, paramutation, and maize development. The GRO-seq method captures snapshots of active transcription, which can identify entire transcription units from initiation to termination, helping to identify alternative TSSs and cryptic transcripts. Through comparisons of GRO-seq and RNA-seq profiles, it should be possible to identify the extent to which regulation of gene expression occurs at the level of post-transcriptional RNA stability. As GRO-seq reveals aspects of transcriptional regulation absent from the mature mRNA, nascent transcriptome profiles in metazoans, plants and fungi will continue to define and distinguish RNAP functions across eukaryotes.

General Pol IV effects on genic transcription Similar to metazoans, maize transcription extends beyond the PAS with termination occurring within approximately 1-1.5 kb downstream. However, because of its additional RNAPs, plants may have alternative mechanisms to terminate Pol II transcription. At maize 3 ′ gene ends, I speculate that Pol IV plays a role in attenuating aberrant read- CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 69 through transcription by Pol II into neighboring genes or TEs. This model is supported by the enrichment of RDR2-dependent 24 nt RNAs immediately downstream of PASs. Another possibility is that the kinetics of co-transcriptional mRNA splicing and/or polyadenylation are affected by Pol IV, perhaps related to the sharing of specific holoenzyme subunits (Haag et al. 2014). Together, the GRO-seq profiles and rpd1 mutant analysis indicate that Pol II termination in maize is unique from that seen in metazoans. Maize Pol IV also affects transcription at 5 ′ gene boundaries. Our results show rpd1 mutants have decreased transcription at most gene TSSs, identifying a previously unknown role for Pol IV at Pol II initiation sites. Enrichment of 24 nt RNAs immediately (this paper) and further upstream (on average 1.5 kb in maize; Gent et al. 2014) of genes supports the presence of Pol IV at genic promoters. Because Pol IV is predicted to engage transcrip- tion bubble-like DNA templates (Haag et al. 2012), and appears to initiate at AT-rich and nucleosome-depleted regions (Li et al. 2015), perhaps Pol IV holoenzymes synthesize RNA, either abortively or productively, at loci undergoing Pol II transcription initiation. Such behaviors could account for the relatively higher abundance of both sense and antisense 5 ′ GRO-seq reads found in WT although this idea seems inconsistent with the observed pat- terns of 24mer vs GRO-seq enrichment upstream of genes (Figures 3.11A on page 99 and 3.14 on page 102). These discordant distributions indicate the decrease in GRO-seq coverage near the TSS is an indirect effect of Pol IV loss affecting transcription from another RNAP. It remains formally possible that Pol V contributes to this TSS-proximate transcription as Arabidopsis Pol V associates with TE-proximal promoters and more transiently with other promoters (Zhong et al. 2012). Alternatively, the decrease in promoter proximal GRO-seq read coverage could be due to titration of Pol II to other initiation sites in the absence of Pol IV (Hale et al. 2009), such as to the LTR TEs downstream of genes showing increased antisense transcription in rpd1 mutants (Figure 3.15D on page 103). The maize TSS-proximal peak of GRO-seq reads appears distinct from the promoter- proximal peak associated with canonical Pol II pausing in metazoans. Whether Pol II paus- ing, as described in humans and Drosophila (Core et al. 2008, Core et al. 2012), regulates transcription elongation of some maize genes remains unknown, though I observe no strong evidence for Pol II pausing in our data sets. Our experimental design focused on capturing a broad view of nascent transcription +/- RPD1, and as such, we chose seedling tissue in which more than 90% of genes produce detectable mRNAs via ultra-deep sequencing (Mar- tin et al. 2014). Seedlings grown under laboratory conditions may have no need for Pol II elongation regulation; C. elegans tends to show evidence of Pol II pausing only under stress (Kruesi et al. 2013, Maxwell et al. 2014). Additionally, we omitted Sarkosyl from the run-on reactions not knowing how this detergent might affect Pol IV function. This omission may also prevent detection of promoter-proximal Pol II pausing peaks as Sarkosyl can dissociate inhibitory factors holding Pol II at a canonical paused gene, hsp70, in Drosophila (Rougvie & Lis 1988). However, it should be noted that GRO-seq profiles in Drosophila indicate that the pausing peak, though greatly diminished, can still be detected in the absence of Sarkosyl (Core et al. 2012). While the nature of the maize TSS-proximal peak remains unclear, there is still a significant difference in transcription behavior at these regions +/- RPD1. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 70

A third characteristic identified in metazoan GRO-seq profiles is divergent transcription, which may be a byproduct of previous Pol II initiations at the same promoter and/or a mechanism to maintain the nucleosome free region (reviewed by Seila et al. 2009). These divergent transcripts may have roles as either regulatory scaffolds or sources of small RNAs (Core et al. 2008, Seila et al. 2008). Divergent transcription is prevalent at mammalian genes, but less so at C. elegans and Drosophila promoters, perhaps related to the directional specificity of favored promoter sequences (Core et al. 2012, Kruesi et al. 2013). I found no evidence of divergent transcription at maize promoters, potentially placing them in a similar category as Drosophila, which has a median 32-fold bias for sense-oriented transcription at promoters (Kruesi et al. 2013). This finding is curious because evidence in Arabidopsis indicates Pol V can be recruited to Pol II promoters (Zhong et al. 2012) and profiles of uniquely mapping 24 nt RNAs (Figure 3.14 on page 102) place Pol IV near genes. It may be that plant-specific RNAPs provide divergent transcription to help maintain nucleosome free regions and that our GRO-seq assay conditions do not detect such nascent transcripts. If Pol IV and/or Pol V are present immediately upstream of genes, then why is there little evidence of nascent transcription upstream of genic units? The Pol IV and V catalytic cores differ from that of Pol II, leading to relatively slow elongation rates and insensitivity to the Pol II inhibitor alpha-amanitin in both Arabidopsis (Haag et al. 2012) and maize (Haag et al. 2014). These differences may also affect their sensitivity to the limiting CTP present in the GRO-seq run-on reaction that affects Pol II NTP incorporation rates (Core et al. 2008) or their ability to incorporate the brominated UTP analog. Because only uniquely mapping reads were analyzed at gene boundaries, repetitive Pol IV- and/or Pol V-derived reads would be excluded. Additionally, any nascent Pol IV RNAs co-transcriptionally processed into siRNAs (Haag et al. 2014, Li et al. 2015) would not have been incorporated into the GRO- seq libraries. However, we are still able to identify effects of RPD1-loss on Pol II transcription (Tables 3.3, page 75 and 3.4, page 78; Figures 3.15C, page 103 and 3.16, page 105; and Figure S11 from Erhard et al. 2015). Within limits of the GRO-seq assay, our results indicate that Pol II behaviors are generally affected by Pol IV and this defines fundamental differences in genic transcription between metazoans and higher plants.

Pol IV affects gene regulation McClintock referred to TEs as controlling elements because of their potential to affect gene regulation (McClintock 1951). TEs are transcriptionally repressed by Pol IV action either through direct competitions with Pol II (Hale et al. 2009) or through chromatin modifications dictated by Pol IV small RNAs (Matzke & Mosher 2014, Matzke et al. 2015), thus it is not surprising that increasing evidence points to Pol IV as a general source of epigenetic variation affecting gene regulation (Parkinson et al. 2007, Hollister et al. 2011, Gent et al. 2012, Eichten et al. 2012, Greaves et al. 2012, Erhard et al. 2013). Here we found Pol IV responsible for transcriptional control of both TEs and genes consistent with a role of TEs as regulatory elements for specific alleles. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 71

In accord with prior results (Erhard et al. 2009, Hale et al. 2009), loss of RPD1 results primarily in increased TE transcription, and both Type I and Type II TEs are among those affected. We identified only 28 unique TEs whose transcription was affected by RPD1 that were further than 5 kb of annotated genes (Tables 3.7, page 87; 3.8, page 88; 3.9, page 89; 3.10, page 90). While specific repetitive TE classes are differentially affected, our results indicate the majority of the genome-wide non-genic TEs are not transcribed at the seedling stage of development even in the absence of RPD1. Our analyses, however, likely underesti- mate the number of transcribed TEs because our sequencing depth, particularly in non-genic regions, is insufficient to detect low abundance transcripts (Martin et al. 2014). Addition- ally, B73-mapping TE-like reads representing either unannotated TEs, TEs highly divergent from the Maize TE Consortium canonical set, or chimeras from multiple insertion events may have been misclassified as intergenic reads. Our results contrast with RNA-seq data from rdr2 mutant meristems (Jia et al. 2009) showing significant increases in TE RNAs in the absence of this siRNA biogenesis factor. Assuming that TE RNA levels accurately reflect transcription rates, this difference in experimental results indicates that the mechanisms of TE repression among meristematic and differentiated cell types are distinct. Consistent with the limited cytosine methylation changes seen in the absence of maize RPD1 (Parkinson et al. 2007, Erhard et al. 2013, Li et al. 2014), Pol IV plays a potentially redundant role in repressing most TE transcription in whole seedlings though a fraction appear to be directly controlled by Pol IV action(s). The genomic and/or molecular features that distinguish these two general classes remain to be identified. In addition to TEs, about 0.5% of all B73 alleles are transcriptionally responsive to RPD1, though loss of RPD1 can result in either increased or decreased transcription, in some cases in antisense orientation. It seems plausible that inappropriate antisense gene transcription could interfere with normal co-transcriptional RNA processing steps, or that sense-antisense RNA pairs could create double-stranded substrates for endonucleases, leading to post-transcriptional degradation. Pol II / Pol IV competitions for gene and TE promoters as previously proposed (Hale et al. 2009) and here exemplified by the B73 ocl2 haplotype profiles (Figures 3.15C on page 103 and 3.16 on page 105) remain a viable hypothesis to ex- plain RPD1-specific effects. Such competitions could also account for the increased antisense transcription of many genes in the absence of RPD1 where the antisense transcription unit is contiguous with a downstream TE (Figure 3.15D-E on page 103). It will be important to characterize the make-up of nuclear transcription “factories” in plants to see if and how Pol II and Pol IV potentially compete for specific genomic templates. It will also be nec- essary to compare whole-genome transcription profiles of mutants deficient for downstream components required for RNA-directed DNA methylation such as DCL3 to identify genes whose regulation is associated with modulations of cytosine methylation. Independent of specific mechanisms, our genome-wide analyses indicate that Pol IV plays a significant regu- latory role for specific alleles in the grasses. Given that there are multiple functional Pol IV subtypes defined by alternative second largest subunits in the grasses (Stonaker et al. 2009, Sidorenko et al. 2009, Sloan et al. 2014, Haag et al. 2014), and that haplotype diversity in maize is largely based on radically different intergenic TE compositions (Wang & Dooner CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 72

2006), the potential for regulatory diversity controlled by alternate RNAPs is immense in the maize pangenome.

Transcriptional control affecting paramutation One feature of the ocl2 haplotype, transcription of an upstream TE affected by Pol IV, is shared among pl1 alleles affected by Pol IV loss. Several pl1 alleles, including Pl1-Rhoades, have an upstream Type II CACTA-like TE fragment belonging to the doppia subfamily, which defines kernel-specific expression after conditioning in an rpd1 mutant background (Erhard et al. 2013). This Pl1-Rhoades doppia fragment may be necessary, but is insuffi- cient, for facilitating paramutations (Erhard et al. 2013). A related doppia fragment serves as a kernel-specific promoter and is partly required for paramutation occurring at R-r:standard haplotypes (Kermicle 1996, Walker 1998). At both Pl1-Rhoades and R-r:standard, other doppia-independent features are clearly important for mediating the trans-homolog interac- tions characteristic of paramutation (Kermicle 1996, Erhard et al. 2013, this work Chapter 4). At the B1-Intense haplotype, paramutation interactions require a distal 5 ′ transcrip- tional enhancer composed of direct repeats (Stam et al. 2002) that loop to the b1 promoter region (Louwers et al. 2009), but there is currently no evidence supporting a functional role of specific TE sequences. In the four most studied examples of paramutation occurring in maize, transcription and/or transcriptional enhancers are functionally implicated in the mechanism required for establishing meiotically heritable repression associated with para- mutation (Kermicle 1996, Sidorenko & Peterson 2001, Stam et al. 2002, Gross & Hollick 2007). In the absence of RPD1, high levels of gene expression are restored at R-r:standard, Pl1-Rhoades, and B1-Intense haplotypes previously repressed by paramutation (Hollick et al. 2005). At Pl1-Rhoades, loss of RPD1 results in increased pl1 transcription and this is often associated with meiotically-heritable reversions of Pl1-Rhoades to a stable and highly expressed non-paramutant state (Hollick et al. 2005) resulting in strong plant pigmentation. We purposely excluded Pl1-Rhoades from materials used for the GRO-seq libraries reported here to avoid changes in transcription related to light perception that might be affected by seedling pigmentation. However, now having a list of RPD1-regulated alleles, we can use pedigree analyses to test whether they also exhibit paramutation-like properties. Charac- terizing nascent transcription in different Pol IV and siRNA mutant backgrounds across Pl1-Rhoades and other haplotypes subject to paramutation promises to uncover important features of the underlying mechanism. Our findings indicate RPD1 uses mechanistically diverse actions, some of which may be independent of its catalytic action within the Pol IV holoenzyme, to regulate alleles in different genomic contexts. The identification of alleles affected by RPD1 loss now presents the opportunity to identify specific haplotype structures that have co-opted direct Pol IV action for their regulation. Given that maize Pol IV defines both mitotically and meiotically heritable patterns of gene regulation (Parkinson et al. 2007, Erhard et al. 2013), alterations of its function by developmental, environmental or genealogical sources might lead to both ontogenetic and phylogenetic changes. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 73

Acknowledgements

I would like to thank Karl Erhard for starting this project and trusting me to take it over. Additionally, I want to acknowledge his work, particularly on the TE-related analysis that I left in this chapter including Figure 3.17C and the TE annotations in all of the DESeq tables. I also want to thank our former REU student Allison McClish and our then rotation student Natalie Deans for their hard work with the qRT-PCR project on young seedlings, some of which is discussed here. Thanks also to Janelle Gabriel, Brian Giacopelli, Julia Kays, and Laura Schrader for their constructive comments and discussions surrounding this work over the years. Thanks to members of the Slotkin, Bisaro and Ding labs who provided useful comments at our group lab meetings. And thanks to Tom Gallagher for letting me play with some of his RNA-seq data as I added extra functionality to the metagene_analysis suite. Finally, I want to thank Jeffrey Campbell, Katherine Mejia-Guerra and all of the past and present members of the CAPS Computational Biology Laboratory (CCBL) at The Ohio State University who provided many helpful and fun discussions about bioinformatics and programming–my code wouldn’t be the same without them. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 74

3.5 Tables

Table 3.1: Sources of genomic feature annotations

FeatureType FileFormat OriginalLocation rRNA/tRNA sequences FASTA Dr. Blake Meyers, University of Delaware FGS sequences FASTA release-5b/filtered-set/ZmB73_5b_FGS_genes.fasta.gza FGS positions GFF3 release-5b/filtered-set/ZmB73_5b_FGS.gff.gza MTEC consensus sequences FASTA release-5b/repeats/repeat-libs.tar.gza MTEC consensus positions GFF3 release-5b/repeats/ZmB73_5a_MTEC_repeats.gff.gza Maize AGPv2 build 5b B73 FASTA release-5b/assembly/ZmB73_RefGen_v2.tar.gza genome pseudo-chromosomes a Originally these files were downloaded from ftp://ftp.maizesequence.org/pub/gramene/maize- sequence.org/ base directory. Now you can access these same files from this base directory: ftp://ftp.gramene.org/pub/ gramene/maizesequence.org/.

Abbreviations: Filtered Gene Set (FGS) and Maize TE Consortium (MTEC)

Table 3.2: Primers used for RT-PCR and qRT-PCR analysis

Gene Model Exons Amplicon Primer Sequence (5 ′ to 3 ′) Amplified Size (bp)a Forward Reverse ocl2 6 & 7 176 bp TGTTTCCATTGATGGA- AGCGCATAAGAGGCTG- (AC235534.1_FG007) CTGC GTAG 8 & 9 125 bp ATCCTCAGCAATGGTG- AATCAAGATGCTGTTC- GCCCTAT AGGTGGGAG GRMZM2G043242 11-14 243 bp TATGCATGTGGCAGCT- AATCGTGCTTCGAGTC- AAGG TTGC GRMZM2G161658 2-4 341 bp CAGGAATTTGGTGTTG- CCCGGAGCGTTGTATG- CTGA TAAT aat 11-13 283 bp ATGGGGTATGGCGAGG- TTGCACGACGAGCTAA- (GRMZM2G088064) ATb AGACTb 12 & 13 122 bp CAATATCACTGGTCAA- TTGCACGACGAGCTAA- ATCCTTGCGA AGACTb a Shorter product sizes were used for the qRT-PCR reaction, while longer products were used for the RT-PCR experiment. b From Woodhouse et al. 2006. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 75 e a d -Value P /WT) 2 rpd1 Log Fold Change ( 631 3.652 2.28E-09 c 7839 1.266 1.27E-03 457160 1.407 4.98E-03 969840451823 2.367 4.021 2.49E-02 7.96E-05 3-36226741 2.981 3.01E-22 53-179990792 1.075 4.14E-02 680-25072205 1.735 1.26E-02 419042-254422294 2.350 1.47E-02 chr1:273032625-273036627 1.725 3.18E-03 chr1:147013445-147016840 2.008 1.51E-04 chr1:80620045-80622805 1.290 2.54E-02 Genomic Position UTR) b UTR) chr1:188855847-188859402 — 4.69E-05 ′ ′ UTR) ′ mutants - Increased Transcription - Sense Orientation noneintrons+ chr3:205694156-205695856 1.272 chr4:218998953-219053414 1.48E-03 1.987 9.42E-08 noneexon chr1:192730986-192732596exon (3 2.527 chr1:194674440-194677737 1.041 4.83E-03 2.54E-02 and intron noneexon chr1:173142735-173144264 3.188 7.07E-05 chr1:187587612-187591058 2.740 2.27E-02 (3 Content exon andtron in- rpd1 ) none chr1:45510990-45511749 2.527 3.37E-05 pdc3 glucosyltransferase like domain-containing protein (ACR5) tor srp30 epimerase/dehydratasetein family pro- tein receptor-like DUF 2930 Table 3.3: Genes differentially transcribed in GRMZM2G161658 Epoxidehydrolase2-likeGRMZM2G081363GRMZM5G866269 Hypothetical,GRMZM2G051683 unknown protein Anthocyanidin Hypothetical, unknown proteinGRMZM2G145213 14-3-3-likeprotein noneGRMZM2G174449 exon Hypothetical,GRMZM2G043242 unknown protein exon Putative ATP-binding, none ATPase- 5,3-O- chr3:3621707 chr3:127968485-127 chr3:151450891-151 intron chr4:25070 chr4:5962321-596 GRMZM2G083538 Amino acidGRMZM2G088413 Hypothetical,GRMZM2G143142 unknown binding protein protein ASF/SF2-like introns pre-mRNA splicing fac- chr1:254 GRMZM2G161233 NAD-dependent GRMZM2G081151 Beta-amryin synthase introns chr1:160287318-160 GRMZM2G303010 NBS-LRR disease resistanceGRMZM2G392791 pro- Epoxidehydrolase2-likeGRMZM2G132763 PutativeGRMZM2G430455 leucine-rich repeat Ribosomal protein S4 exon chr1:1799884 exon (3 GRMZM2G143082GRMZM2G147724 Phosphotidic Hypothetical, acid unknown phosphatase protein exon none chr1:86260377-86261 Gene IDGRMZM5G832375 Annotation Pyruvate decarboxylase ( TE/Repeat GRMZM2G087063 Hypothetical, unknown protein, CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 76 2.085 2.55E-02 333 1.159 2.92E-02 295 1.392 7.56E-05 2-74615918 1.125 2.49E-02 705-11201863 2.214 4.71E-02 0523-84051872 1.825 7.05E-03 0931-70063152 1.748 4.19E-06 8:118165588-118167724 1.681 1.26E-02 chr8:70136197-70144793 1.120 2.65E-02 chr7:85874277-85881002 1.239 2.80E-03 chr7:57059839-57064670 3.652 5.47E-07 UTR) UTR chr5:19298125-19301282 2.167 6.73E-02 UTR, exon ′ ′ ′ noneintron chr8:138861203-138862890 0.986 chr8:168823637-168828313 1.633 3.24E-02 1.89E-02 and intron none5 chr8:35561736-35562916 1.349 2.63E-02 exon (5 and intron none chr4:234873379-234874456 2.553 2.49E-02 noneexons chr5:159912586-159914501 1.935 chr5:210273706-210275023 0.990exons 7.34E-02 transcript only) 2.95E-02 (T02 carboxylate oxidase containing protein protein (ERF1) unit like glutamine amidotransferasemain) do- tion factor tein with Squamosa-promoter(SBP) binding domain family protein AC197705.4_FG001 Pyruvate decarboxylase isozyme 1GRMZM2G013448 exon 1-aminocyclopropane-1- GRMZM2G045560 WRKY DNA-binding chr domain- GRMZM2G045155 B12DproteinGRMZM2G146004GRMZM2G053503 Ethylene-responsive Protein induced upon tuberizationGRMZM2G031827 factor-like Splicing factor U2af exon 38 kDa sub- none chr8:5186351-5187 chr7:164234830-164235658 GRMZM5G814164 Peroxisome biogenesis protein 3-2- GRMZM2G333140 Hypothetical, unknown protein 3 GRMZM2G062716 Defense-related protein (Type 1 GRMZM2G021369 Putative AP2/EREBP transcrip- GRMZM5G830269 Hypothetical,GRMZM2G009080 unknown protein Hypothetical,GRMZM2G543070 unknown protein noneGRMZM2G110742 Hypothetical, exons unknown protein Putative thioredoxin superfamily pro- chr6:11200 none chr6:8405 chr6:90044328-90045 GRMZM2G098438GRMZM2G168747 Nrat1 Hypothetical,GRMZM2G028677 aluminum unknown transporter protein 1 Putative cytochrome P450 super- none exon and intron chr5:7005 chr5:7461226 CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 77 mutant counts were zero. 9696 1.962 1.774 2.49E-02 8.64E-04 rpd1 79083 2.974 1.35E-04 erg method (Benjamini & Hochberg 1995) by the .girinst.org/censor/) with 0.7 similarity or greater to del defined transcription unit as identified by visual chr10:136932960-136939534 2.988 2.65E-09 chr9:43122694-43125766 2.553 2.49E-02 UTR UTR) chr9:25386483-25390940 1.487 2.15E-02 ′ ′ tron exonexon chr9:134749957-134751048 1.254 chr10:24649645-24653324 1.21E-03 1.535 3.37E-05 and intron (HD-ZIP IV) exons and in- mutant / WT); for lines where no fold change is indicated, either WT or TE. rpd1 regulated protein B ocl2 ( 2 Panicoideae -values have been adjusted to account for multiple testing with the Benjamini-Hochb Bold entries highlight differential transcription that corresponds to the gene-mo inspection. TE content within genes isan defined annotated as a positive hit in Repeatmasker (http://www DESeq analysis (Anders & Huber 2010). p Genomic position is basedFold on change the is B73 Log maize reference genome, version 2, build 5b. a b e c d GRMZM2G024996 Pseudogene, transposon relic - up- GRMZM2G300965 Respiratory burstGRMZM2G070575AC235534.1_FG007 oxidase-like Cycloartenol synthase introns chr10:66168601-661 GRMZM2G047105 Hypothetical, unknown protein exon, 3 GRMZM2G147399 Earlynodulin93 none chr9:20665757-206669 GRMZM2G131421 Earlynodulin93GRMZM2G046528 Hypothetical, unknown protein exons (3 none chr9:20693835-206950 CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 78 e a d -Value P /WT) 2 rpd1 Log Fold Change ( 16357 -2.192 4.72E-02 c 4772060 -1.563 4.57E-05 7067-124821052 -2.685 8.54E-06 06254-27508489 -1.126 2.49E-02 chr1:123254749-123256442 -2.498 1.31E-02 Genomic Position chr1:149708692-149715244 -1.534 2.27E-02 chr1:156872706-156873991 -2.232 4.96E-02 chr1:213652786-213654502 -1.423 6.54E-05 b UTR) chr1:157173981-157177692 -2.686 1.97E-07 ′ UTR) UTR) UTR) ′ ′ ′ mutants - Decreased Transcription - Sense Orientation and(3 exon Content noneexon chr1:17624743-17625481none -1.511introns+ chr1:34594746-34596756 9.17E-03 introns+ chr1:43054782-43057900 -1.553 chr1:117114680-117142745 -2.264 7.77E-02 -1.032 chr1:117155139-117165079 9.47E-12 -1.268 7.08E-02 2.65E-02 exons andtrons in- intron (3 exon (5 exonnoneexon chr1:160808686-160810744(3 -1.608 chr1:209180557-209181568 7.40E-02 -1.297 3.98E-02 rpd1 DUF581 superfamily subunit C-1 isoform brane movement of substance tein (alpha-mannosidase) tein (alpha-mannosidase) DUF828 superfamily related At3g19553-like family member PSI light-harvesting complex type 4 protein Table 3.4: Genes differentially transcribed in GRMZM2G022804 Hypothetical, unknown protein intron Gene IDGRMZM2G475380 Gibberellin20oxidaseGRMZM2G496991 Annotation Hypothetical, unknownGRMZM2G164974 protein, 3-ketoacyl-CoAGRMZM2G089812 synthase 6-like Nuclear transcriptionGRMZM2G054332 none factor intron ATPase, Y coupledGRMZM2G038973 to Glycosyl transmem- hydrolase family 38GRMZM2G337242 pro- chr1:275 chr1:4767327- Glycosyl TE/Repeat hydrolase family 38GRMZM5G841743 pro- Hypothetical, unknown protein none chr1:123015378-1230 GRMZM2G067929 Hypothetical,GRMZM2G081977 unknown protein Hypothetical, none unknown protein, chr1:12481 GRMZM2G093038 Hypothetical, unknown protein exon and GRMZM2G429128 DNA-binding storekeeper protein- GRMZM2G331902 Probable polyamineGRMZM2G460861 transporter SAUR33-auxin-responsive SAUR GRMZM2G153184 Chlorophyll a-b binding protein 4; CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 79 -214892746 -1.830 6.38E-05 2091-27538462126-9588772 -1.398 4.14E-02 -1.313 2.63E-02 8344-10572400 -1.036 7.29E-02 973873-235982962 -0.923 8.66E-02 7279821-217280603 -1.536 2.69E-02 chr3:35353116-35357015 -3.086 1.26E-07 chr3:7275169-7280519 -1.516 1.34E-05 chr2:222270518-222276029 -1.062 1.80E-02 chr2:209536404-209537584 -1.852 6.76E-02 chr2:18460858-18463365 -1.976 4.35E-03 chr1:273091163-273101408 -0.928 9.76E-02 chr1:235645071-235668787 -1.195 2.24E-03 chr1:232400407-232405547 -0.914 7.29E-02 UTR ′ UTR) UTR) UTR, ex- UTR chr2:102279931-102283830 -1.387 9.17E-03 ′ ′ ′ ′ noneexon andtron in- chr3:32466336-32467145 -1.629 1.66E-03 exons and in- tron introns and(3 exon exonintron chr2:42257261-42259297 chr2:102136620-102139245 -1.094 -1.609 3.93E-02 1.43E-06 intronintrons chr2:102750321-102752520 -1.552(3 chr2:102897151-102905974 2.41E-05 -0.946 5.84E-02 noneexon and intron and 5 chr2:13712123-13713766 -1.491 3.88E-03 none5 chr1:262997924-263003009 -1.001 3.68E-02 ons andtron in- exon andtrons in- exon andtron in- DUF3339 superfamily ferase 1 chloroplastic precursor genase 1B-like tinin superfamily domain hydrogenase tinin superfamily domain tein tein ZOG1) LPE1 DNA-binding domain familytein pro- protein like GRMZM2G339866 Hypothetical, unknownGRMZM2G072298 protein Glycerol-3-phosphate acyltrans- GRMZM2G085019 NADP-dependent malic enzyme, GRMZM2G009188 11-beta-hydroxysteroid dehydro- GRMZM2G134264 Hypothetical protein with Agglu- GRMZM2G090028 Hypothetical, unknown protein 5 GRMZM2G337113 Glyceraldehyde-3-phosphate de- GRMZM2G056500 Hypothetical protein with Agglu- GRMZM2G090675 Cystathionin beta synthaseGRMZM2G152189 pro- Hypothetical, unknown protein exon GRMZM2G010034 Hypothetical,GRMZM2G089282 unknown protein Hypothetical,GRMZM2G093526 unknown protein Putative noneGRMZM2G000166 ent-kaurene synthase B Putative none metal intron ion-bindingGRMZM2G168474 pro- Cis-zeatin chr1:27538 O-glucosyltransferase 1 (cis- chr2:95872 chr2:1056 GRMZM2G060742 CitrateGRMZM5G858417 transporter family protein Nucleobase-ascorbate intron transporter GRMZM2G396114 Putative POX domain/homeobox chr1:235 GRMZM2G115646 Glutamate-ammonia ligase-like AC217050.4_FG007 Terpene synthase 7 introns chr1:214886200 AC217910.3_FG004 Hypothetical, unknown proteinGRMZM2G143883 Long-chain-alcohol oxidase FAO1- none chr1:21 CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 80 827787 -1.067 2.01E-02 3647436 -1.420 1.26E-02 15473931 -1.951 2.55E-09 -168962676 -1.128 2.54E-02 78-144840927 -2.279 2.28E-09 4599067-204601218 -2.323 1.35E-04 chr5:48384381-48390703 -1.113 1.03E-02 chr5:38567212-38578529 -0.919 9.76E-02 chr4:151478038-151481556 -1.566 1.34E-05 chr4:17332223-17339190 -1.117 1.62E-02 chr4:693736-696087 -1.082 6.43E-02 chr3:147397873-147401506 -1.711 1.65E-06 UTR) UTR) UTR) ′ ′ ′ exon (3 and intron nonenoneintron chr4:153318407-153320975 -1.564 chr4:197071819-197073073 1.26E-02 none -1.530 chr4:204275345-204276126 -1.561 8.66E-02 intron 3.64E-03 chr5:8334393-8335549(3 chr5:10172933-10174782and intron -1.807 -1.055 4.72E-02 2.88E-02 intron chr4:30678789-30681127tron -2.793 1.34E-05 exon andtron in- exon (3 nonetron chr3:101735928-101737972 -4.456 6.76E-02 similaritylipase/hydrolase-like protein (rice and to others)proline-rich protein and APG anther-specific GDSL-motif ferase main superfamily protein fense peptide) protein SANT Superfamily DUF247 superfamily dehydrogenase lase/oxygenase activase,plast precursor chloro- like GRMZM2G042758 Hypothetical protein with GRMZM2G163925 Phosphoribosylanthranilate trans- GRMZM2G017268 Putative Myb DNA-bindingGRMZM2G027479 do- Thionin-like proteinGRMZM2G155253 (plant Fructose-bisphosphateGRMZM2G401040 aldolase de- exon ATPGRMZM2G455869 synthase F1, delta MYB subunit family transcriptionGRMZM2G377341 Acetyl-CoA factor carboxylase 1-like chr4:20 in exon GRMZM2G337387 Hypothetical, unknownGRMZM2G374302 protein, ArgininedecarboxylaseGRMZM2G038243 family protein intron exon and in- chr4:1448356 GRMZM2G143373 Cis-2, 3-dihydrobiphenyl-2,3-diol GRMZM5G882078 Putativeankyrin-kinaseGRMZM2G077333 PSII22kDaproteinGRMZM2G132169 L-ascorbateoxidaseGRMZM2G162200 Ribulose bisphosphate exon carboxy- intron none chr3:168959356 chr3:174825201-174 chr3:183644921-18 GRMZM2G121878 Carbonicanhydrase intron chr3:215466579-2 GRMZM2G138258 Hypothetical, unknown protein exon and in- GRMZM2G062531 Methylsterol monooxygenase 1-1- CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 81 1.048 7.08E-02 8790352 -1.356 2.49E-02 922279 -0.998 4.05E-02 107312261 -1.036 4.25E-02 199595057 -1.501 2.63E-02 2-136583153 -1.537 1.21E-03 4835-202305876 -1.258 8.66E-02 0228137-190232283 -1.133 9.68E-03 92613610-192614489 -1.157 8.04E-02 chr6:151462209-151465411 -1.495 9.68E-02 chr6:133868502-133877776 -1.482 4.14E-02 chr6:103602046-103609491 -1.396 2.49E-02 chr6:63430823-63436487 -1.568 3.68E-02 chr6:61601766-61603753 -1.826 3.63E-02 chr5:213264683-213268374 -0.916 7.16E-02 UTR) UTR ′ UTR) UTR) UTR, in- ′ ′ ′ ′ exon andtron in- none(3 and chr6:119876153-119878032 introns -1.396 2.49E-02 5 trons,and exon 3 exons and in- tron exon (3 nonenone chr5:209873678-209875630 -1.041 4.38E-02 (3 chr5:210776511-210779081 -1.710 1.97E-07 none chr5:69114970-69116904 -1.968 1.21E-03 ferase 5 glucosyltransferase (putative nitrate transporter) XBAT33 andcontaining protein ankyrin repeat- DNA-binding protein cnd41 carboxylateoxidase) oxidase 1 (Acc PSI light-harvesting complex type 4 protein ferase 8 GRMZM2G059637 Glycerol-3-phosphate acyltrans- GRMZM2G030583 Monoterpene synthaseGRMZM2G088501 Transporter-like protein introns chr6:107307539- exon GRMZM2G095404 Peroxidase11-likeGRMZM2G162755 Anthocyanidin 3-O- none chr6:118788607-11 GRMZM2G169321 Adiponectinreceptor1GRMZM2G137421 NRT1/ PTR FAMILY 4.6-like exon chr6:96919201-96 GRMZM2G142345 E3 ubiquitin-protein ligase GRMZM2G358540 VAMPGRMZM2G461716 protein SEC22 Putative chloroplast nucleoid none chr6:655472-656792 - GRMZM2G052422 1-aminocyclopropane-1- GRMZM2G318662 Hypothetical, unknown protein exon GRMZM2G147172 PutativeGRMZM2G169458 F-box family protein FattyGRMZM2G398807 aldehyde dehydrogenase 1 CorticalGRMZM2G162529 cell-delineating protein intron noneGRMZM2G466309 Hypothetical, noneGRMZM2G103101 unknown protein Chlorophyll a-b binding protein 4; none chr5:19 chr5:13658165 chr5:1 none chr5:20230 chr5:199592727- GRMZM2G020320 Glycerol-3-phosphate acyltrans- CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 82 7 1.96E-02 -141294830 -1.763 2.27E-02 -80173134 -0.938 9.03E-02 3-5130176 -1.077 1.97E-02 129-2403240 -2.026 1.08E-04 0194502-10196450 -1.070 1.38E-02 803178-44815872 -2.003 1.29E-04 120671025-120672700 -1.763 6.95E-02 chr9:136175680-136180522 -1.518 7.34E-02 chr9:126718179-126723107 -1.156 9.11E-03 chr8:154028733-154034508 -1.109 3.68E-02 chr8:71714905-71721778 -1.804 8.25E-03 chr8:38487905-38488708 -1.006 2.62E-02 chr7:160255313-160258286 -1.199 2.85E-03 chr7:9201072-9202357 -1.412 5.30E-02 chr7:5523491-5527292 -1.202 2.99E-03 UTR ′ UTR) UTR) UTR ′ ′ ′ 3 exonexonsnone chr8:162182482-162183632 -1.811 chr8:166481614-166484307none 1.96E-02 -0.977 chr9:4308985-4310333 7.72E-02 none chr9:61296279-61301686exon -1.485 andtron in- -1.713 9.68E-02 chr9:106201175-106203143 2.37E-07 -1.394 1.35E-04 exon andtron in- exon andtrons in- none(3 chr8:10287956-10291623 -0.930 9.78E-02 exon andtron in- onic 3 (3 main superfamily protein feruloyl transferase-like like 1 (PEPCase 1) LIKE 16 17 protein like glucose transporter member 8 (RING finger) domainprotein containing PSI antenna protein GRMZM2G181546 Hypothetical,GRMZM2G046284 unknown protein Fructose-bisphosphateGRMZM2G079490 aldolase Aspartic none proteinase nepenthesin-1 none exon chr10:2402 chr10:1 chr10: GRMZM2G022686 Thioredoxin family protein exon and GRMZM2G082376 Putative RINGGRMZM2G034360 zinc-finger Omega-hydroxypalmitate do- GRMZM2G465046 GDSL esterase/lipase O- At1g74460- GRMZM2G083841 Phosphoenolpyruvate carboxylase GRMZM2G039586 GATA transcriptionGRMZM2G107886 factor 20 Zinc finger proteinGRMZM2G012140 CONSTANS- Putative none glycosyl hydrolase family chr9:80170947 GRMZM2G028134 MetalionbindingproteinGRMZM5G833406 IAA-amino acid hydrolase ILR1- none chr8:141293857 GRMZM2G028570 Solute carrier family 2, facilitated GRMZM2G003930 Putative zinc finger, C3HC4 type GRMZM2G097141 Hypothetical, unknown protein exon GRMZM2G177424 FlavonoidGRMZM2G058310 7-O-methyltransferaseGRMZM2G072280 intron Chlorophyll Beta-amylase a-b binding protein; chr7:44 none chr7:155357370-155360570 -1.20 GRMZM2G046750 Lipid binding protein exon and ex- GRMZM2G129246 Glycolateoxidase exon GRMZM2G012397 photosystem I reaction center6 none chr7:512927 CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 83 mutant counts were zero. 0304967 -1.888 7.96E-05 rpd1 erg method (Benjamini & Hochberg 1995) by the .girinst.org/censor/) with 0.7 similarity or greater to del defined transcription unit as identified by visual intronintron chr10:130620219-130622989exon -2.465 chr10:147593063-147597356 7.06E-03 -1.196 2.73E-03 chr10:147890562-147891878 -1.237 2.66E-02 mutant / WT); for lines where no fold change is indicated, either WT or TE. rpd1 21/22 lase 1 tein CP24 precursor ( 2 Panicoideae -values have been adjusted to account for multiple testing with the Benjamini-Hochb Bold entries highlight differential transcription that corresponds to the gene-mo inspection. TE content within genes isan defined annotated as a positive hit in Repeatmasker (http://www DESeq analysis (Anders & Huber 2010). p Genomic position is basedFold on change the is B73 Log maize reference genome, version 2, build 5b. a b e c d GRMZM2G031660 Beta-glucosidase intron chr10:130301051-13 GRMZM2G043813 NAC domain-containingGRMZM2G150248 protein Lysine-specific histoneGRMZM2G092427 demethy- Chlorophyll a/b binding apopro- CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 84 d c -Value 1.05E-05 P /WT) 2 .967 3.76E-02 rpd1 Log Fold Change ( 3 2.692 1.01E-02 67115 3.259 3.39E-03 749 2.459 4.11E-03 b 93648778 3.737 1.29E-02 76571270 2.708 4.82E-03 -224047387 1.836 5.44E-02 chr5:67476538-67482854 3.790 8.89E-10 chr1:193768325-193775666 4.161 4.19E-08 chr1:167851430-167863929 3.263 1.62E-02 Genomic Position a UTR) chr1:230198773-230199704 2.000 3.27E-02 UTR) chr1:230186289-230191174 2.628 1.75E-05 ′ ′ mutants - Increased Transcription - Antisense Orientation exonsnone chr7:115182036-115185167 3.788 chr7:130092349-130094061 2.55E-08 2.841 6.18E-04 exonexonexons chr3:214857327-214861719 andtron in- chr5:14650208-14653205 2.177 6.86E-03 3.848 6.05E-07 exon (3 trons Content exonand 3’ UTR chr1:75147439-75151689 2.082 4.63E-02 rpd1 or armadillo/beta-catenin-likecontaining repeat- protein family like protein ATHB-4 phosphatase CVP2-like protein protein related protein containing protein Table 3.5: Genes differentially transcribed in GRMZM2G074634GRMZM2G171408 Cysteine-type peptidase U-box domain-containingAC225205.3_FG003 protein 10 Phosphatidylinositol 3-and 4-kinase none chr5:84101452-84105175 1 GRMZM2G107540 Histone H2AGRMZM2G052365GRMZM2G126239 Subtilase-like protease-like Homeobox-leucineGRMZM2G094900 zipper TypeGRMZM2G472346 I protein inositol-1,4,5-triphosphate 5- IQ intron calmodulin-binding none motif family chr3:193637409-1 chr1:270053078-270054144 3.162 GRMZM5G891855GRMZM5G837822 Hypothetical,GRMZM2G430936 unknown protein Hevamine-A - chitinase (plant Acidic defense) endochitinase-like none intron chr3:176570295-1 chr2:224015038 none chr3:176597396-17659875 GRMZM2G131715 Hypothetical, unknown protein exon (3 GRMZM2G131683 Electron transporter, glutaredoxin- GRMZM2G443308GRMZM2G421579 Hypothetical, unknown protein Putative MYB DNA binding protein exons none and in- chr1:177766328-1777 Gene IDGRMZM2G385021 Annotation Hypothetical, unknown protein none chr1:45510990-45511 TE/Repeat GRMZM2G043141 Hypothetical,GRMZM2G130034 DUF 1336 Putative ureidoglycolate hydrolase Domain- exons, introns CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 85 .294 2.95E-02 mutant counts were zero. rpd1 18496 3.19306927 4.82E-03 2.761 2.76E-02 5405 2.649 2.58E-05 058 2.434 1.82E-04 -139543922 3.939 4.11E-03 12-146548756 3.075 1.07E-02 erg method (Benjamini & Hochberg 1995) by the .girinst.org/censor/) with 0.7 similarity or greater to chr8:14597519-14604732 2.932 4.19E-08 chr7:132186216-132203350 2.350 2.39E-02 UTR) UTR) chr9:142015409-142016063 2.619 2.68E-03 UTR) chr8:166909761-166911646 2.615 7.47E-03 ′ ′ ′ intron chr8:168823637-168828313 3.019 4.19E-08 exonexon (5 chr7:168403976-168405995 2.213 4.82E-03 and introns nonetrons chr7:130097614-130101719 3.396 4.78E-09 mutant / WT); for lines where no fold change is indicated, either WT or TE. rpd1 containing protein subunit 4-likekinetochore protein protein and putative TICCINO 2A-like family like protein ( 2 Panicoideae -values have been adjusted to account for multiple testing with the Benjamini-Hochb TE content within genes is defined as a positive hit in Repeatmasker (http://www an annotated DESeq analysis (Anders & Huber 2010). p Genomic position is basedFold on change the is B73 Log maize reference genome, version 2, build 5b. a d b c GRMZM2G084473 cytokinin receptor exon chr10:1465451 GRMZM2G180258 Hypothetical, unknown protein exon chr10:2137242-2141 GRMZM2G510796 Hypothetical, unknown protein none chr8:168817914-1688 GRMZM2G045560 WRKYGRMZM2G149105GRMZM5G872568 DNA-binding Hypothetical,GRMZM5G884151 unknown protein Hypothetical, unknown protein domain- Hypothetical, unknown protein none intron+ exon (3 chr9:139536440 chr8:174306169-1743 GRMZM2G009894GRMZM2G396248 60SGRMZM2G396246 ribosomal protein L29 Cytochrome P450 Hypothetical, unknown protein exon none (3 none chr8:36503543-3650 chr8:166900775-166902970 2 GRMZM2G082365 H/ACA ribonucleoproteinGRMZM2G181266 complex 3-hydroxyacyl-CoA dehydratase PAS- GRMZM2G181378 E3 ubiquitin-protein ligase UPL5-like exon and in- GRMZM5G824938 Phosphatidylinositol 3-and 4-kinase CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 86 d c -Value P /WT) 2 rpd1 Log Fold Change ( mutant counts were zero. rpd1 b 364951 -2.948 2.44E-02 3037791 -3.858 5.91E-02 erg method (Benjamini & Hochberg 1995) by the .girinst.org/censor/) with 0.7 similarity or greater to Genomic Position a none chr10:2466640-2467603 -2.841 9.33E-02 introns chr1:140039425-140056837 -1.933 2.22E-02 Content mutants - Decreased Transcription - Antisense Orientation rpd1 mutant / WT); for lines where no fold change is indicated, either WT or TE. rpd1 perfamily protein idase ( 2 Panicoideae Table 3.6: Genes differentially transcribed in -values have been adjusted to account for multiple testing with the Benjamini-Hochb TE content within genes is defined as a positive hit in Repeatmasker (http://www an annotated DESeq analysis (Anders & Huber 2010). p Genomic position is basedFold on change the is B73 Log maize reference genome, version 2, build 5b. a d b c GRMZM2G023033AC197146.3_FG002 Hypothetical, unknown Putative protein Myb DNA-binding domain su- intron chr2:33036267-3 GRMZM5G819500 Hypothetical, unknown protein exon chr1:148364112-148 Gene IDGRMZM2G061988 Annotation Putative pyridoxamine 5-phosphate ox- TE/Repeat CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 87

Table 3.7: TEs differentially transcribed in rpd1 mutants - Increased Transcription - Sense Orientationa

b d TEID GenomicPosition Log2 Fold p-Value Change (rpd1 /WT)c RLG_prem1_AC201801_7095 chr1:168882596-168883075 5.129 3.66E-02 ZM_hAT_noncoding_120 chr1:196249905-196250023 5.392 8.19E-03 RLX_uwum_AC177933_415 chr2:107508182-107508735 3.395 8.78E-08 RLG_yfages_AC197085_5383 chr3:36221587-36223382 2.896 6.81E-06 RLC_ufonah_AC194064_3852 chr3:36223389-36224416 2.807 1.42E-02 RLG_fege_AC205532_8884 chr3:36224420-36224976 3.459 3.03E-05 RLC_ufonah_AC194064_3852 chr3:36224716-36225104 3.301 1.19E-02 RLX_milt_AC211742_11402 chr3:122962737-122968593 2.739 3.03E-05 RLC_machiavelli_AC200490_6883 chr3:160664585-160664712 2.842 3.78E-05 RLC_machiavelli_AC200490_6883 chr3:160664720-160664907 2.911 4.38E-05 RLX_fageri_AC204875_8470 chr5:70052242-70052969 1.767 8.27E-02 ZM_hAT_8 chr5:84105173-84105372 3.433 3.68E-02 DTM_Zm00460_consensus chr5:98657883-98658114 2.317 3.39E-04 RLX_name_AC197689_5725 chr7:10972520-10972677 — 4.97E-04 RLG_neha_AC215285_13046 chr8:143766280-143766943 4.954 2.79E-04 RLX_naseup_AC196428_5108 chr9:69638177-69640432 2.984 1.08E-02 RLX_milt_AC209648_10275 chr9:148227395-148228110 1.905 1.41E-02 RLX_baso_AC192251_3423 chr10:23192733-23192950 2.154 6.11E-02 RLC_ji_AC193479_3665 chr10:23192806-23193005 2.191 4.97E-02 RLC_gudyeg_AC206942_9404 chr10:105781195-105781407 5.322 1.09E-02 RLC_gudyeg_AC206942_9404 chr10:105781195-105782219 5.322 1.09E-02 RLC_gudyeg_AC206942_9404 chr10:105781357-105784535 5.322 1.09E-02 RIX_ogepy_AC198740-0 chr10:142977228-142979196 4.492 1.25E-02 a Bold entries are TEs not located within genes or 5 kb of either side (see Materials and Methods). b Genomic position is based on the B73 maize reference genome, version 2, build 5b. c Fold change is Log2 (rpd1 mutant / WT); for lines where no fold change is indicated, either WT or rpd1 mutant counts were zero. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 88

Table 3.8: TEs differentially transcribed in rpd1 mutants - Decreased Transcription - Sense Orientationa

b d TEID GenomicPosition Log2 Fold p-Value Change (rpd1 /WT)c RLG_flip_AC203163_7675 chr2:64555076-64557648 -5.170 2.76E-02 ZM_Stowaway_30 chr2:102138521-102138680 -1.983 8.69E-02 RLX_yraj_AC205486_8834 chr10:132170975-132171029 — 2.45E-02 a Bold entries are TEs not located within genes or 5 kb of either side (see Materials and Methods). b Genomic position is based on the B73 maize reference genome, version 2, build 5b. c Fold change is Log2 (rpd1 mutant / WT); for lines where no fold change is indicated, either WT or rpd1 mutant counts were zero. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 89

Table 3.9: TEs differentially transcribed in rpd1 mutants - Increased Transcription - Anti- sense Orientationa

b d TEID GenomicPosition Log2 Fold p-Value Change (rpd1 /WT)c RLX_vora_AC206187_9112 chr1:92817640-92817751 6.755 6.12E-09 RLG_flip_AC208040_9765 chr1:118536328-118537641 — 1.14E-03 RLC_giepum_AC211155_11010 chr1:180819578-180819782 3.248 3.17E-02 RLG_xilon- chr2:158762498-158763219 4.138 9.23E-05 diguus_AC195486_4629 RLG_xilon- chr2:158762921-158764622 3.524 5.06E-02 diguus_AC195486_4629 ZM_CACTA_noncoding_2 chr3:7523757-7524489 2.081 1.67E-03 ZM_CACTA_noncoding_2 chr3:7523878-7524489 2.081 1.67E-03 ZM_CACTA_noncoding_1 chr3:7523973-7524489 2.081 1.67E-03 ZM_CACTA_noncoding_7 chr3:7524044-7524580 2.075 1.68E-03 ZM_hAT_noncoding_12 chr3:36220145-36220665 3.019 2.25E-04 RLG_nihep_AC194441_4115 chr3:65920084-65921482 5.322 6.98E-03 RLC_ji_AC190978_2799 chr3:65921484-65921523 — 6.98E-03 RLC_ji_AC215728_13156 chr3:65921492-65922891 4.209 5.06E-02 ZM_CACTA_34 chr3:91408073-91408137 — 1.22E-04 RLX_hani_AC186285_1359 chr3:157967848-157968598 2.443 7.82E-02 RLX_avahi_AC191363_3084 chr4:63005326-63007571 3.113 3.98E-05 ZM_CACTA_89 chr5:123935141-123936168 — 2.31E-02 ZM_CACTA_noncoding_2 chr5:127956983-127957675 2.198 3.02E-03 ZM_CACTA_noncoding_25 chr6:12704019-12704523 2.183 2.66E-02 ZM_CACTA_noncoding_2 chr7:17711484-17712080 3.379 3.79E-02 RLX_vegu_AC190718_85 chr7:115190804-115191388 — 3.02E-03 ZM_CACTA_16 chr7:133871697-133872651 2.000 9.77E-02 RLX_ewib_AC207533_9599 chr8:138863636-138864837 2.550 9.48E-03 RLG_dagaf_AC208646_9966 chr9:69644374-69648016 4.524 2.25E-04 ZM_CACTA_noncoding_2 chr9:79123736-79124344 2.126 1.45E-02 ZM_CACTA_noncoding_10 chr9:115947886-115948286 2.553 4.47E-02 ZM_CACTA_noncoding_2 chr9:115948005-115948494 2.709 2.66E-02 ZM_CACTA_noncoding_7 chr9:115948057-115948584 2.603 5.55E-02 ZM_CACTA_noncoding_1 chr9:151268135-151268651 2.441 2.99E-03 ZM_CACTA_noncoding_7 chr9:151268206-151268741 2.482 2.35E-03 ZM_CACTA_16 chr10:66170238-66171076 4.392 7.51E-05 a Bold entries are TEs not located within genes or 5 kb of either side (see Materials and Methods). b Genomic position is based on the B73 maize reference genome, version 2, build 5b. c Fold change is Log2 (rpd1 mutant / WT); for lines where no fold change is indicated, either WT or rpd1 mutant counts were zero. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 90

Table 3.10: TEs differentially transcribed in rpd1 mutants - Decreased Transcription - An- tisense Orientationa

b d TEID GenomicPosition Log2 Fold p-Value Change (rpd1 /WT)c RLX_nuhan_AC206272_9161 chr1:32985086-32985358 -3.970 1.67E-02 RLX_votaed_AC215881_13209 chr1:117129922-117132946 -2.728 7.36E-03 RLX_ubat_AC212211_11652 chr1:117135865-117137824 -3.087 8.15E-04 RLC_labe_AC203760_7937 chr1:140040803-140042013 -3.841 3.91E-02 RLX_votaed_AC215881_13209 chr1:140040939-140041818 -5.087 3.17E-02 RLX_buire_AC194484_4158 chr5:30166464-30166598 -4.129 8.72E-02 a Bold entries are TEs not located within genes or 5 kb of either side (see Materials and Methods). b Genomic position is based on the B73 maize reference genome, version 2, build 5b. c Fold change is Log2 (rpd1 mutant / WT); for lines where no fold change is indicated, either WT or rpd1 mutant counts were zero. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 91

3.6 Figures

A WT rpd1

rRNA / tRNA rRNA / tRNA 19.9% 17.8% Repetitively Repetitively Mapped Mapped 19.9% 19.1% Unmapped Unmapped 36.4% Uniquely 40.4% Uniquely Mapped Mapped 23.8% 22.7% Mapping of High Quality Reads Mapping Quality High of

B C Mapped Reads Classified by Strand Distribution of

Overlapping Feature Intergenic All Genic Reads 600 100%

500 Sense 80% RPMM) 3 400 TE 60% TE & Genic 300

40% Antisense 200

Genic 20% 100 Read Abundance(10 Read

0 0% WTrpd1 WT rpd1 WTrpd1 WT rpd1 Repetitive Unique Repetitive Unique

Figure 3.1: GRO-seq reads are similarly distributed in WT and rpd1 mutant libraries. (A) Percentages of WT and rpd1 mutant GRO-seq reads that are unmappable, map to rRNA/tRNA sequences, map repetitively (>1 alignment), or map uniquely to the B73 refer- ence genome. (B) Distribution of repetitively and uniquely mapped reads (reads per million mapped; RPMM) from A to annotated genes (blue), TEs (red), both (purple), or neither (intergenic; yellow). (C) Strandedness of the best alignment to gene models of all potentially genic reads (those that align with genes only or with both genes and TEs; blue and purple regions from B, respectively). CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 92

100% Single transcript gene models wild-type GRO reads 80% rpd1 mutant GRO reads

60%

40%

20% Percent of total genic nucleotides genic total Percent of 0% unique unique non-unique non-unique intronic exonic intronic exonic

Figure 3.2: GRO-seq reads align to both exonic and intronic sequences. Percentages of exonic and intronic sequence in all annotated single transcript genes (grey; repeated for unique vs. non-unique sets) and in WT (blue) and rpd1 mutant (green) GRO-seq reads mapping to single transcript genes.

Randomly Distributed 32-mers

TE only TE / gene Genic only (614,413) overlap (13,955,555) (22,257,282)

Neither (14,574,979)

Figure 3.3: Most TE-like sequences can align to genes. Numbers and overlap of 32-mers randomly generated from the maize genome that align to annotated maize TE and gene sequences. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 93

Figure 3.4: Genic reads are highly enriched in GRO-seq mappable reads. Distribution of genic vs. non-genic nucleotides in the maize genome (Whole Genome) and distribution of all non-rRNA / non-tRNA B73 mappable GRO-seq reads to genic vs. non-genic sequences.

A B Non-Genic Reads within 5kb of Genes C Uniquely Mapping Intergenic Reads TE-Only Reads Reads Near Genes 100% Repetitive Unique 300

WT rpd1WT rpd1 250 eS n

>5kb 80%

200 es 60% RPMUM) 3 150 nA

40% 15 %35 %23 14 %45 %23 61 %22 %26 31 %22 %56 t 100 i

20% 50 nes es NearGenic 0% Reads(10 0 WTrpd1 WT rpd1 WT rpd1WT rpd1 Up Down Up Down Up Down Up Down WT rpd1 WT rpd1 Repetitive Unique Repetitive Unique 5kb Up 5kb Down D 4 Sum Coverage at Gene Boundaries

3

2

1 sense Total Abundance Total 0 Raw ReadsRaw 10pernt Window) 3 antisense (10 1 gene -5 TSS +1 -1 End +5 Position Relative to Gene Boundary (kb)

Figure 3.5: Nongenic GRO-seq reads are enriched near genes. (A) Percentage of nongenic/non-TE (intergenic) or TE-only reads that align near genes (within 5 kb; dark gray) or >5 kb (light gray) from genes. (B) Nongenic reads within 5 kb of genes found exclusively upstream (Up, left circle), downstream (Down, right circle), or at both ends (in- tersection) of a nearby gene. (C) Distribution of uniquely mapping near-genic reads (reads per million uniquely mapped; RPMUM) by strand orientation relative to the nearby gene model. (D) Metagene profile of uniquely mapping WT GRO-seq reads summed over 10-nt windows 65 kb from FGS models. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 94 Sense StrandSense

20 10 WT Uniquely Mapping Reads (RPMUM) Reads WT Uniquely Mapping Maize FGS Genes FGS Maize 60) of Groups (in

1 MaximumSum Contribution

0.1

0.01 0 Antisense Strand Antisense Maize FGS Genes FGS Maize 60) of Groups (in MaximumSum Contribution Gene -5 SST +1 1- dnE +5 Position (kb) Relative to Gene Boundary of 50 nt Windows

Figure 3.6: Distribution of uniquely mapping WT GRO-seq reads around genes. Reads per million uniquely mapped (RPMUM) WT coverage in 50 nt windows +/- 5 kb from gene boundaries in both sense (top plot) and antisense (bottom plot) orientations. Each row rep- resents the average coverage of 60 neighboring genes sorted by maximum sum contribution. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 95

First three bases of: FGS genes 5kb upstream from FGS genes

Top 4 Sets Top 4 S

2.0% 2.0%

2.0% 2.1% ets 2.4% 2.5% 2.6% 2.6% 1.8% ATG Other (64.1%) ATG (26.9%) Other (88.9%)

Figure 3.7: FGS gene annotations start with ATG more frequently than other loci. The first three bases of all FGS gene models were tallied to determine any bias for the ATG set indicative of translation start codons. As a control for general three base set distribution, the first three bases 5 kb upstream of the FGS start were also tallied. In addition to the ATG set, the top four ranking three base sets are individually highlighted. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 96

wild-type rpd1 mutant

Total Window Coverage (RPMUM) Window Total Sense

0 2000 4000 6000 8000 10000 12000 Genes Antisense

−1000 −500 0 500 1000 Start Position of 10 nt Window Relative to TSS

Figure 3.8: Alternative TSS definition has little effect on coverage profiles. TSS sites were defined by the 5 ′ end of full-length cDNAs (flcDNAs) from a library of predominantly 7 day-old seedling tissue (Soderlund et al. 2009). Metagene analysis was performed across all flcDNA defined TSSs in 10 nt non-overlapping windows with the total coverage in reads per million uniquely mapped (RPMUM) displayed. Pair-wise Welch’s t-tests evaluated the difference between WT and rpd1 mutant read abundance at each window, and a Holm- Bonferroni correction (α = 0.05) was applied to identify the windows that were significantly different (red horizontal bar and pink shading). CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 97

A Maximum Abundance Peak by TSS-Definition

Legend: ZM_BFc Zoomed Filtered flcDNAs Gene Set (43,344) (31,794)

Intersection (11,059 Models) 0 0.010 -100-50 0 50 100 Fraction of TSS Model Peaks 0 0.005 0.010 0.015 0.020 −1000 -500 0 500 1000 Peak Position Relative to TSS (nt) B Sampling of TSS Model Read Abundance by Peak Position

D No Reads (85)

Far Up (1,896)

Far Near Up (676) Downstream TSS (183) (6,125) Near Down

Reads Abundance (Raw) (2,094) -1000 -500 0 500 1000 TSS C

-1000 -100 100 1000 Far Upstream Far Downstream Peak Location Relative to TSS Gene Body

Figure 3.9: (Continued on next page.) CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 98

Figure 3.9: (A) The method used to define TSS position alters the profile of GRO-seq max- imal peaks (GRO-seq peaks). TSS sites were defined by >1kb genes from the maize filtered gene set (used in Figure 3.11 on the next page; blue); by the 5 ′ end of full length cDNAs (flcDNAs) from a library of predominantly 7-day old seedling tissue (ZM_BFc; green); or by the intersection of both libraries taking the 5 ′-most position as TSS (pink; see Materials and Methods). Metagene analysis was performed around each defined TSS position and the GRO-seq peaks were tallied for plotting. The insets show the overlap between TSS-definition methods (left) and a zoomed view around the TSS (as defined by each method; right). Subse- quent analysis (B-D) focuses on the intersection of FGS and ZM_BFc TSS positions (pink). (B) Raw abundances of WT sense strand GRO-Seq coverage in non-overlapping 1-nt win- dows. Each line represents the coverage around a single TSS position included in Figure 3.8 on page 96. Individual lines are color-coded based on the location of their maximal GRO-Seq coverage. (C) This cartoon depicts the color-coding and locations for different subsections of the 2 kb metagene window relative to the TSS. (D) Each TSS position from Figure 3.8 on page 96 was categorized into the five subsections by the location of the highest abundance GRO-seq peak over the 2kb region. A sixth group represents the TSS positions with zero coverage over the 2 kb region.

Chromosome 3 2.5 kb 0.82

WT (S) 0.82

rpd1 (S)

GRMZM2G061206 Genes

DTH Pif / Harbinger TEs 654 Readsper Million Uniquely Mapped

WT (AS) 654

rpd1 (AS)

Figure 3.10: Strong antisense coverage upstream of GRMZM2G061206. Genome browser view of GRMZM2G061206 showing sense (S) and antisense (AS) WT (black) and rpd1 mutant (green) reads. Annotated genes are in blue with blue lines indicating introns, thin boxes UTRs and thick boxes CDS, and the arrow indicates the TSS. An annotated Type II TE is in gold with an arrow indicating its orientation. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 99

A Mean Coverage over Genes - Inner 90% of Genes by Maximum Sum Contribution

Wild-type 0.10 rpd1 mutant

0.08

0.06

0.04

0.02 Readsper Mapped Uniquely Million sense 0.00 gene model antisense

-1 TSS +1 -1 End +1 Position Relative to Gene Boundary (kb) B

5 Log 2 ( rpd1 1

0 / WT)mutant -1 Maize FGS Genes FGS Maize 60) of Groups (in

-5.5 MaximumSum Contribution Gene Model -1 TSS +1 -1 End +1 Position (kb) Relative to Gene Boundary of 50 nt Windows

Figure 3.11: Pol IV loss alters global transcription profiles at gene boundaries. (A) WT and rpd1 mutant mean GRO-seq read coverage (black and purple lines, respectively) of 90% of the maize genes within 1 kb of gene start (TSS) or 3 ′ end (End). Gray and purple shading represent 95% confidence intervals; red horizontal bars highlight 10-nt non-overlapping win- dows that significantly differ between libraries (Welch’s t-test by window, corrected to α = 0.05 with the Holm–Bonferroni method for multiple sampling). (B) Variation in coverage between WT and rpd1 mutant libraries for all FGS genes. Fold change was calculated from the average coverage (reads per million uniquely mapped) of 60 neighboring genes when sorted by their maximum sum contribution. Fifty-nucleotide windows with zero coverage in either library are plotted in white. The gold bar highlights the inner 90% of genes used in A. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 100

Number of Region Relative to Gene Models Genes TSS 6,000 1kb Upstream First 1kb Interior of Gene Last 1kb 1kb Downstream 1,250 Gene Model 10 3 10 3 3 10 3 250 ρ = 0.778 ρ = 0.971 10 ρ = 0.963 ρ = 0.977 ρ = 0.974 50 1:1 fit 1:1 fit 1:1 fit 1:1 fit 1:1 fit

Data fit Data fit Data fit Data fit Data fit Strand Antisense Strand Sense 10 1 10 1 1 1 1 10 2 10 10 10

10 -1

10 -1 10 -1 10 -1 10 -1 10 -3

10 -1 10 1 10 3 10 -1 10 1 10 3 10 -3 10 -1 10 1 10 3 10 -1 10 1 10 3 10 -1 10 1 10 3 ρ = 0.721 ρ = 0.676 ρ = 0.642 ρ = 0.738 ρ = 0.740 1:1 fit 1:1 fit 1:1 fit 1:1 fit 1:1 fit 1 Data fit Data fit 10 Data fit Data fit Data fit Mutant Abundance Mutant 10 1 10 1 10 1 10 1 -1 rpd1 10

-1 10 -1 10 -1 10 -1 10 10 -3 (ReadsMapped) Uniquely Million per kb per

10 -1 10 1 10 3 10 -1 10 1 10 -3 10 -1 10 1 10 -1 10 1 10 -1 10 1 WT Abundance (Reads per kb per Million Uniquely Mapped)

Figure 3.12: Correlation between wild-type (WT) and rpd1 mutant uniquely mapping GRO- seq coverage at genes. Near-genic regions were divided into five sections: 1 kb upstream, the first 1 kb after the TSS, a variable internal region, the last 1 kb before the gene end, and 1 kb downstream. GRO-seq coverage is plotted for both sense and antisense directions (relative to the gene model) in reads per kb per million uniquely mapped. To include genes with zero read abundance on the logarithmic scale, they were forced to 1/10th the minimum non-zero abundance of both datasets for that plot. Data fit lines represent linear regression models of the log10 transformed WT and rpd1 mutant abundances. Rho (ρ) values represent the Spearman’s rank correlation coefficient for each region. Total number of genes in each plot is 31,794 except for the interior of the gene where the total gene length had to be greater than 2 kb: 23,050 genes in total. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 101

6.5 Log 2 ( rpd1 1

0 / WT)mutant -1 Maize FGS Genes FGS Maize60) of Groups (in

-7.0 MaximumSum Contribution Gene Model -1 TSS +1 -1 End +1 Position (kb) Relative to Gene Boundary of 50 nt Windows

Figure 3.13: Fold change between WT and rpd1 mutant antisense GRO-seq read coverage. Variation in antisense coverage between WT and rpd1 mutant libraries for all FGS genes is plotted. Each horizontal line represents the average coverage of 60 neighboring genes when sorted by their maximum sum contribution (see Materials and Methods). Fifty nucleotide windows with zero coverage in either library are plotted in white. The gold bar highlights the inner 90% of genes used in Figure 3.11A on page 99. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 102

Uniquely Mapping 24mers near Gene Boundaries Sense Strand Antisense Strand A 2.0 Wild-type (WT) Wild-type (WT) rdr2 mutant rdr2 mutant

1.5

1.0

0.5 Mean CoverageMean (RPMUM) 0.0

-0.5

B 10 4e oeae(PU)Log (RPMUM) Coverage 24mer

1 WT Coverage WT (60gene groups) C 0.1 mutant mutant

Coverage 0.01 rdr2 0 (60gene groups) D 12 2 ( rdr2 mutant / WT)

mutant vs mutant WT 1 0

rdr2 -1 (60gene groups)

-12 Fold Change Fold

Gene Model Gene Model -1 TSS +1 -1 End +1 -1 TSS +1 -1 End +1 Position Relative to Gene Boundary (kb) Position Relative to Gene Boundary (kb)

Figure 3.14: Coverage of uniquely mapping 24mers near gene boundaries. Small RNA datasets representing B73 (WT) or rdr2 mutants introgressed into B73 were pooled from publically available datasets, processed, aligned, and uniquely mapping 24mers were tallied near gene boundaries as with the GRO-seq datasets (see Materials and Methods). (A) Mean coverage for sense (left) and antisense (right) strands within 1 kb of gene boundaries for the same 90% of genes analyzed in the GRO-seq metagene profiles (see Figure 3.11A on page 99). 95% confidence intervals are shaded in grey and pale purple. (B-C) Mean coverage of uniquely mapping 24mers in groups of 60 neighboring genes, when sorted as in the GRO-seq analysis (see Figure 3.11B on page 99) in WT (B) and rdr2 mutant (C) datasets. (D) Log2 fold-change of rdr2 mutant mean coverage (C) over WT (B). Only blocks (50 nt window over 60 genes) with coverage in both datasets are analyzed–any blocks with zero coverage in either or both datasets are left white. Gold vertical bars highlight the genes that were included in the metagene plots in (A). TSS: transcription start site; end: 3 ′ annotated end of gene; WT: wild-type; RPMUM: reads per million uniquely mapped. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 103

A Sense Gene Transcription Antisense Gene Transcription B Differentially Transcribed Genes (rpd1 / WT) 4 4 / WT) /

2 2 Decreased rpd1 rpd1 transcription Increased (sense) (102) 0 0 transcription (sense) -2 -2 (45) (Fold Change Change (Fold 2 Increased -4 -4 transcription Log (both) (1) 1 10 100 1000 10,000 1 10 100 1000 Increased Decreased Average Normalized Counts Average Normalized Counts transcription transcription (antisense) (31) (antisense) (4) C Chromosome 10 1.76 13 kb 1.76 WT (S) rpd1 (S) ocl2 Genes RLG_ubid Stowaway RLX_mibaab RLX_hopscotch hAT RLC_tiwe Tourist RLX_wuwe Repeats (RPMUM) 3.13 hAT 3.13 WT (AS)

0 rpd1 (AS) GRO ReadRepresentation GRO D Chromosome 7 9.8 kb 4.90 WT (S) 4.90 rpd1 (S) GRMZM2G171408 Genes RLX_eweko RLX_vegu RLX_vegu hAT RLX_name Repeats (RPMUM) RLG_huck RLC_debeh 2.45 RLC_debeh WT (AS) 2.45 GRO ReadRepresentation GRO rpd1 (AS) E Type II TE RIX RLX RLG

Type I TE I Type RLC Tandem Convergent Gene

(<2kb from 3' end) 3' from(<2kb None DownstreamFeature 0 2 4 6 8 10 12 14 16 18 Number of genes with increased antisense transcription in the absence of RPD1

Figure 3.15: (Continued on next page.) CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 104

Figure 3.15: Specific alleles are susceptible to Pol IV-induced changes in gene expression. (A) Across FGS gene bodies, the log2 fold change (rpd1 mutant/WT) of uniquely mapping read coverage vs. total coverage (average of WT and rpd1 mutant reads). Triangles rep- resent genes with infinite fold change due to zero coverage from WT (top) or rpd1 mutant (bottom) uniquely mapping reads. Of the 39,656 FGS gene bodies analyzed, those with zero coverage in both WT and rpd1 mutant datasets (7783 and 9667 for sense and antisense strand transcription, respectively) were excluded from the plots. Purple dots represent genes with significantly (by the DESeq statistical method of Anders and Huber (2010); see Ma- terials and Methods) increased or decreased GRO-seq read representation in rpd1 mutants. Teal dots represent genes within 20 Mb of the rpd1 locus whose decreased transcription in rpd1 mutants may reflect alignment artifacts (see Materials and Methods) and are excluded from subsequent analysis. (B) Distribution of categories (by direction of the change and strand) among transcriptionally altered genes in rpd1 mutants. (C) Genome browser view of WT (black peaks) and rpd1 mutant (green peaks) GRO-seq reads (normalized to reads per million uniquely mapped; RPMUM) in sense (S) and antisense (AS) orientation over the ocl2 -coding region and 3 kb of flanking genomic sequences on chromosome 10. Gray- shaded area highlights the ubid TE fragment 5 ′ of ocl2 having increased transcription in rpd1 mutants. (D) Gene browser view of GRMZM2G171408 showing increased antisense transcription in rpd1 mutants. Sense (S) and antisense (AS) transcription occur in distinct units of GRO-seq coverage in both WT (black peaks) and rpd1 mutant (green peaks) li- braries. (E) Distribution of downstream features within 2 kb by type. Type I TEs are subdivided into LINE-like elements (RIX) and Copia (RLC), Gypsy (RLG), and Unknown (RLX) classes of LTR TEs. Color coding in E applies to the TEs in browser views (C-D). Arrows indicate orientation of gene and TE features. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 105

ocl2 RNA abundance in 8-day old seedlings ) aat Ct ive to ΔΔ (Relat

Dcl3 dcl3-2 Rpd1 rpd1-1 Dcl3 dcl3-2 Rpd1 rpd1-1

Figure 3.16: qRT-PCR analysis of ocl2 in the absence of RPD1 and DCL3. Three biological replicates each of three 8 day-old seedling shoots were analyzed for ocl2 and aat (control) RNA levels by qRT-PCR with three technical replicates each. The relative RNA abundance represents 2(aat Ct - ocl2 Ct) where mean and confidence intervals of +/- 1 s.e.m. were calculated on the ∆Ct values prior to converting to fold change. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 106

A B C 0.8 Distribution of GRO-seq Read TE Type I Type II ZmB73 TE Distributions (Retroelements) (DNA Transposons) Categories WT 0.6

0.4 / WT) count ratios count WT) / rpd1 rpd1 (

2 0.2 rpd1

Gypsy LTR 0 Copia LTR Unknown LTR Normalized log Normalized LTR LTR hAT

LINE -0.2 LINE doppia Tourist CACTA Helitron Mariner Mutator En/Spm Type II DNA TEs L1 Stowaway Copia Gypsy Gypsy non-coding non-coding UnknownLTR PIF/Harbinger UnknownLINE hAT hAT D Unique Sense TE Transcription Unique Antisense TE Transcription E CACTA 6 Differentially Transcribed 4 Unique TEs

/ WT) / 4

2 2 rpd1 rpd1 Unknown LTRs (18) 0 0

-2 -2 Gypsy LTRs (10) -4 Type II

(Fold Change Change (Fold -4 DNA TEs 2 -6 Copia LTRs (22)

log (12) 1 10 100 1000 1 10 100 1000 Average Normalized Counts Average Normalized Counts F LINE (1) Chromosome 3 6 15 kb 6 WT (S) rpd1 (S) RLG_prem1 RLG_prem1 RLX_milt RLG_gyma RLG_grande Repeats 6 GRO Read GRO WT (AS) 6 rpd1 (AS) Gypsy LTR Unknown LTR Representation(RPMUM)

Figure 3.17: (Continued on next page.) CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 107

Figure 3.17: Pol IV loss affects both entire TE families and individual elements. (A) Distri- bution of TE categories within the B73 genome (Schnable et al. 2009). (B) Distribution of total (unique and repetitive) WT and rpd1 mutant GRO-seq reads within the different TE categories shown in A. (C) Log2 ratios (rpd1 mutant/WT) of GRO-seq reads, normalized to total mappable reads, mapping to annotated TE superfamilies. (D) Log2 fold change (rpd1 mutant/WT) of uniquely mapping reads in sense and antisense orientation to genomic regions annotated as TEs vs. total coverage (averages of WT and rpd1 mutant reads) to those regions. Triangles represent TEs with infinite fold change due to zero coverage from WT (top) or rpd1 mutant (bottom) uniquely mapping reads. Of the 1,612,638 TE annota- tions analyzed, those with zero coverage in both WT and rpd1 mutant datasets (1,392,382 and 1,399,008 for sense and antisense strand transcription, respectively) were excluded from the plots. Purple dots represent TEs with significantly (by the DESeq statistical method of Anders and Huber (2010); see Materials and Methods) increased or decreased GRO-seq read representation in rpd1 mutants; orange stars or triangles represent those differentially transcribed TEs farther than 5 kb from the nearest FGS gene. Teal dots represent TEs within 20 Mb of the rpd1 locus whose decreased transcription in rpd1 mutants may reflect alignment artifacts (see Materials and Methods) and are excluded from subsequent analysis. (E) Distribution of transcriptionally altered unique TEs among TE categories shown in A. (F) Genome browser view of WT (black peaks) and rpd1 mutant (green peaks) GRO-seq reads (normalized to reads per million uniquely mapped; RPMUM) in sense (S) and an- tisense (AS) orientation over a 15-kb interval on chromosome 3 containing an RLX_milt type I element with increased transcription in rpd1 mutants. Only the element outlined in black has significantly altered GRO-seq read representation in rpd1 mutants based on the statistical threshold used (Anders & Huber 2010; see Materials and Methods). CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 108

3.7 References

1. Alleman, M., Sidorenko, L., McGinnis, K., Seshadri, V., Dorweiler, J. E., White, J., Sikkink, K. & Chandler, V. L. (2006) An RNA-dependent RNA polymerase is required for paramutation in maize. Nature 442, 295–298. 2. Anders, S. & Huber, W. (2010) Differential expression analysis for sequence count data. Genome Biology 11, R106. 3. Ariel, F., Jegu, T., Latrasse, D., Romero-Barrios, N., Christ, A., Benhamed, M. & Crespi, M. (2014) Noncoding transcription by alternative RNA polymerases dynami- cally regulates an auxin-driven chromatin loop. Molecular Cell 55, 383–396. 4. Ashe, A., Sapetschnig, A., Weick, E.-M., Mitchell, J., Bagijn, M. P., Cording, A. C., Doebley, A.-L., Goldstein, L. D., Lehrbach, N. J., Le Pen, J., Pintacuda, G., Sakaguchi, A., Sarkies, P., Ahmed, S. & Miska, E. A. (2012) piRNAs can trigger a multigenerational epigenetic memory in the germline of C. elegans. Cell 150, 88–99. 5. Barber, W. T., Zhang, W., Win, H., Varala, K. K., Dorweiler, J. E., Hudson, M. E. & Moose, S. P. (2012) Repeat associated small RNAs vary among parents and following hybridization in maize. Proceedings of the National Academy of Sciences 109, 10444– 10449. 6. Barbour, J. R., Liao, I. T., Stonaker, J. L., Lim, J. P., Lee, C. C., Parkinson, S. E., Kermicle, J., Simon, S. A., Meyers, B. C., Williams-Carrier, R., Barkan, A. & Hollick, J. B. (2012) required to maintain repression2 is a novel protein that facilitates locus- specific paramutation in maize. The Plant Cell 24, 1761–1775. 7. Baucom, R. S., Estill, J. C., Chaparro, C., Upshaw, N., Jogi, A., Deragon, J.-M., Westerman, R. P., SanMiguel, P. J. & Bennetzen, J. L. (2009) Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genetics 5, e1000732. 8. Benjamini, Y. & Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B 57, 289–300. 9. Brink, R. A. (1956) A genetic change associated with the R locus in maize which is directed and potentially reversible. Genetics 41, 872–889. 10. Brink, R. A. (1958) Paramutation at the R locus in maize. Cold Spring Harbor Symposia on Quantitative Biology 23, 379–391. 11. Chandler, V. L. & Stam, M. (2004) Chromatin conversations: mechanisms and impli- cations of paramutation. Nature Reviews Genetics 5, 532–544. 12. Coe, E. H. Jr. (1961) A test for somatic mutation in the origination of conversion-type inheritance at the B locus in maize. Genetics 46, 707–710. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 109

13. Core, L. J., Waterfall, J. J., Gilchrist, D. A., Fargo, D. C., Kwak, H., Adelman, K. & Lis, J. T. (2012) Defining the Status of RNA Polymerase at Promoters. Cell Reports 2, 1025–1035. 14. Core, L. J., Waterfall, J. J. & Lis, J. T. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845– 1848. 15. De Vanssay, A., Bouge, A.-L., Boivin, A., Hermant, C., Teysset, L., Delmarre, V., Antoniewski, C. & Ronsseray, S. (2012) Paramutation in drosophila linked to emergence of a piRNA-producing locus. Nature 490, 112–115. 16. Dennis, E. S. & Peacock, W. J. (2007) Epigenetic regulation of flowering. Current Opinion in Plant Biology 10, 520–527. 17. Eichten, S. R., Ellis, N. A., Makarevitch, I., Yeh, C.-T., Gent, J. I., Guo, L., McGinnis, K. M., Zhang, X., Schnable, P. S., Vaughn, M. W., Dawe, R. K. & Springer, N. M. (2012) Spreading of heterochromatin is limited to specific families of maize retrotransposons. PLoS Genetics 8, e1003127. 18. Erhard, K. F. Jr., Parkinson, S. E., Gross, S. M., Barbour, J. R., Lim, J. P. & Hollick, J. B. (2013) Maize RNA Polymerase IV defines trans-generational epigenetic variation. The Plant Cell 25, 808–819. 19. Erhard, K. F. Jr., Stonaker, J. L., Parkinson, S. E., Lim, J. P., Hale, C. J. & Hollick, J. B. (2009) RNA Polymerase IV functions in paramutation in Zea mays. Science 323, 1201–1205. 20. Erhard, K. F. Jr., Talbot, J. R. B., Deans, N. C., McClish, A. E. & Hollick, J. B. (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199, 1107–1125. 21. Gent, J. I., Madzima, T. F., Bader, R., Kent, M. R., Zhang, X., Stam, M., McGinnis, K. M. & Dawe, R. K. (2014) Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. The Plant Cell. 22. Gent, J. I., Dong, Y., Jiang, J. & Dawe, R. K. (2012) Strong epigenetic similarity between maize centromeric and pericentromeric regions at the level of small RNAs, DNA methylation and H3 chromatin modifications. Nucleic Acids Research 40, 1550– 1560. 23. Gent, J. I., Ellis, N. A., Guo, L., Harkess, A. E., Yao, Y., Zhang, X. & Dawe, R. K. (2013) CHH islands: de novo DNA methylation in near-gene chromatin regulation in maize. Genome Research 23, 628–637. 24. Greaves, I. K., Groszmann, M., Ying, H., Taylor, J. M., Peacock, W. J. & Dennis, E. S. (2012) Trans chromosomal methylation in Arabidopsis hybrids. Proceedings of the National Academy of Sciences 109, 3570–3575. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 110

25. Gross, S. M. & Hollick, J. B. (2007) Multiple trans-sensing interactions affect meiotically heritable epigenetic states at the maize pl1 locus. Genetics 176, 829–839. 26. Haag, J. R., Brower-Toland, B., Krieger, E. K., Sidorenko, L., Nicora, C. D., Norbeck, A. D., Irsigler, A., LaRue, H., Brzeski, J., McGinnis, K., Ivashuta, S., Pasa-Tolic, L., Chandler, V. L. & Pikaard, C. S. (2014) Functional diversification of maize RNA poly- merase IV and V subtypes via alternative catalytic subunits. Cell Reports 9, 378–390. 27. Haag, J. R., Ream, T. S., Marasco, M., Nicora, C. D., Norbeck, A. D., Pasa-Tolic, L. & Pikaard, C. S. (2012) In vitro transcription activities of Pol IV, Pol V, and RDR2 reveal coupling of Pol IV and RDR2 for dsRNA synthesis in plant RNA silencing. Molecular Cell 48, 811–818. 28. Hagemann, R. & Berg, W. (1978) Paramutation at the sulfurea locus of Lycopersicon esculentum Mill. : VII. Determination of the time of occurrence of paramutation by the quantitative evaluation of the variegation. Theoretical and Applied Genetics 53, 113–123. 29. Hale, C. J., Erhard, K. F. Jr., Lisch, D. & Hollick, J. B. (2009) Production and pro- cessing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genetics 5, e1000598. 30. Hale, C. J., Stonaker, J. L., Gross, S. M. & Hollick, J. B. (2007) A novel Snf2 protein maintains trans-generational regulatory states established by paramutation in maize. PLoS Biology 5, e275. 31. Herr, A. J., Jensen, M. B., Dalmay, T. & Baulcombe, D. C. (2005) RNA polymerase IV directs silencing of endogenous DNA. Science 308, 118–120. 32. Hollick, J. B. & Gordon, M. P. (1993) A poplar tree proteinase inhibitor-like gene promoter is responsive to wounding in transgenic tobacco. Plant Molecular Biology 22, 561–572. 33. Hollick, J. B., Patterson, G. I., Coe, E. H. Jr., Cone, K. C. & Chandler, V. L. (1995) Allelic interactions heritably alter the activity of a metastable maize Pl allele. Genetics 141, 709–719. 34. Hollick, J. B. (2012) Paramutation: a trans-homolog interaction affecting heritable gene regulation. Current Opinion in Plant Biology 15, 536–543. 35. Hollick, J. B., Kermicle, J. L. & Parkinson, S. E. (2005) Rmr6 maintains meiotic inheritance of paramutant states in Zea mays. Genetics 171, 725–740. 36. Hollister, J. D., Smith, L. M., Guo, Y.-L., Ott, F., Weigel, D. & Gaut, B. S. (2011) Transposable elements and small RNAs contribute to gene expression divergence be- tween Arabidopsis thaliana and Arabidopsis lyrata. Proceedings of the National Academy of Sciences 108, 2322–2327. 37. Huang, Y., Kendall, T. & Mosher, R. A. (2013) Pol IV-dependent siRNA production is reduced in Brassica rapa. Biology 2, 1210–1223. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 111

38. Javelle, M., Klein-Cosson, C., Vernoud, V., Boltz, V., Maher, C., Timmermans, M., Depege-Fargeix, N. & Rogowsky, P. M. (2011) Genome-wide characterization of the HD- ZIP IV transcription factor family in maize: preferential expression in the epidermis. Plant Physiology 157, 790–803. 39. Jia, Y., Lisch, D. R., Ohtsu, K., Scanlon, M. J., Nettleton, D. & Schnable, P. S. (2009) Loss of RNA-dependent RNA polymerase 2 (RDR2) function causes widespread and unexpected changes in the expression of transposons, genes, and 24-nt small RNAs. PLoS Genetics 5, e1000737. 40. Kashkush, K. & Khasdan, V. (2007) Large-scale survey of cytosine methylation of retrotransposons and the impact of readout transcription from long terminal repeats on expression of adjacent rice genes. Genetics 177, 1975–1985. 41. Kermicle, J. L. in Epigenetic Mechanisms of Gene Regulation (eds Russo, V. E. A., Martienssen, R. A. & Riggs, A. D.) 267–287 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1996). 42. Khaitova, L. C., Fojtova, M., Krizova, K., Lunerova, J., Fulnecek, J., Depicker, A. & Kovarik, A. (2011) Paramutation of tobacco transgenes by small RNA-mediated transcriptional gene silencing. Epigenetics 6, 650–660. 43. Kruesi, W. S., Core, L. J., Waters, C. T., Lis, J. T. & Meyer, B. J. (2013) Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808. 44. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. (2009) Ultrafast and memory- efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25. 45. Law, J. A., Vashisht, A. A., Wohlschlegel, J. A. & Jacobsen, S. E. (2011) SHH1, a homeodomain protein required for DNA methylation, as well as RDR2, RDM4, and chromatin remodeling factors, associate with RNA polymerase IV. PLoS Genetics 7, e1002195. 46. Li, Q., Eichten, S. R., Hermanson, P. J., Zaunbrecher, V. M., Song, J., Wendt, J., Rosenbaum, H., Madzima, T. F., Sloan, A. E., Huang, J., Burgess, D. L., Richmond, T. A., McGinnis, K. M., Meeley, R. B., Danilevskaya, O. N., Vaughn, M. W., Kaeppler, S. M., Jeddeloh, J. A. & Springer, N. M. (2014) Genetic perturbation of the maize methylome. The Plant Cell. 47. Li, S., Vandivier, L. E., Tu, B., Gao, L., Won, S. Y., Li, S., Zheng, B., Gregory, B. D. & Chen, X. (2015) Detection of Pol IV/RDR2-dependent transcripts at the genomic scale in Arabidopsis reveals features and regulation of siRNA biogenesis. Genome Research 25, 235–245. 48. Louwers, M., Bader, R., Haring, M., Driel, R. v., Laat, W. d. & Stam, M. (2009) Tissue- and expression level-specific chromatin looping at maize b1 epialleles. The Plant Cell 21, 832–842. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 112

49. Luo, J. & Hall, B. D. (2007) A multistep process gave rise to RNA Polymerase IV of land plants. Journal of Molecular Evolution 64, 101–112. 50. Martin, J. A., Johnson, N. V., Gross, S. M., Schnable, J., Meng, X., Wang, M., Coleman- Derr, D., Lindquist, E., Wei, C.-L., Kaeppler, S., Chen, F. & Wang, Z. (2014) A near complete snapshot of the Zea mays seedling transcriptome revealed from ultra-deep sequencing. Scientific Reports 4, 4519. 51. Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequenc- ing reads. EMBnet.journal 17, 10–12. 52. Matzke, M. A., Kanno, T. & Matzke, A. J. (2015) RNA-directed DNA methylation: the evolution of a complex epigenetic pathway in flowering plants. Annual Review of Plant Biology 66, 9.1–9.25. 53. Matzke, M. A. & Mosher, R. A. (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics 15, 394–408. 54. Matzke, M., Kanno, T., Huettel, B., Daxinger, L. & Matzke, A. J. M. (2007) Targets of RNA-directed DNA methylation. Current Opinion in Plant Biology 10, 512–519. 55. Maxwell, C. S., Kruesi, W. S., Core, L. J., Kurhanewicz, N., Waters, C. T., Lewarch, C. L., Antoshechkin, I., Lis, J. T., Meyer, B. J. & Baugh, L. R. (2014) Pol II docking and pausing at growth and stress genes in C. elegans. Cell Reports 6, 455–466. 56. McClintock, B. (1951) Chromosome organization and genic expression. Cold Spring Harbor Symposia on Quantitative Biology 16, 13–47. 57. Morales-Ruiz, T., Ortega-Galisteo, A. P., Ponferrada-Marin, M. I., Martinez-Macias, M. I., Ariza, R. R. & Roldan-Arjona, T. (2006) DEMETER and REPRESSOR OF SILENCING 1 encode 5-methylcytosine DNA glycosylases. Proceedings of the National Academy of Sciences 103, 6853–6858. 58. Mosher, R. A., Schwach, F., Studholme, D. & Baulcombe, D. C. (2008) PolIVb influ- ences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proceedings of the National Academy of Sciences 105, 3145–3150. 59. Nobuta, K., Lu, C., Shrivastava, R., Pillay, M., De Paoli, E., Accerbi, M., Arteaga- Vazquez, M., Sidorenko, L., Jeong, D.-H., Yen, Y., Green, P. J., Chandler, V. L. & Meyers, B. C. (2008) Distinct size distribution of endogenous siRNAs in maize: evidence from deep sequencing in the mop1-1 mutant. Proceedings of the National Academy of Sciences 105, 14958–14963. 60. Onodera, Y., Haag, J. R., Ream, T., Nunes, P. C., Pontes, O. & Pikaard, C. S. (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120, 613–622. 61. Parkinson, S. E., Gross, S. M. & Hollick, J. B. (2007) Maize sex determination and abaxial leaf fates are canalized by a factor that maintains repressed epigenetic states. Developmental Biology 308, 462–473. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 113

62. Pilu, R., Panzeri, D., Cassani, E., Badone, F. C., Landoni, M. & Nielsen, E. (2008) A paramutation phenomenon is involved in the genetics of maize low phytic acid1-241 (lpa1-241 ) trait. Heredity 102, 236–245. 63. Pontier, D., Yahubyan, G., Vega, D., Bulski, A., Saez-Vasquez, J., Hakimi, M.-A., Lerbs- Mache, S., Colot, V. & Lagrange, T. (2005) Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA poly- merases IV in Arabidopsis. Genes and Development 19, 2030–2040. 64. Quinlan, A. R. & Hall, I. M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. 65. Rassoulzadegan, M., Grandjean, V., Gounon, P., Vincent, S., Gillot, I. & Cuzin, F. (2006) RNA-mediated non-Mendelian inheritance of an epigenetic change in the mouse. Nature 441, 469–474. 66. Ream, T. S., Haag, J. R., Wierzbicki, A. T., Nicora, C. D., Norbeck, A. D., Zhu, J.-K., Hagen, G., Guilfoyle, T. J., Pasa-Tolic, L. & Pikaard, C. S. (2009) Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II. Molecular Cell 33, 192–203. 67. Rougvie, A. E. & Lis, J. T. (1988) The RNA polymerase II molecule at the 5’ end of the uninduced hsp70 gene of D. melanogaster is transcriptionally engaged. Cell 54, 795–804. 68. Sabin, L. R., Delas, M. J. & Hannon, G. J. (2013) Dogma derailed: the many influences of RNA on the genome. Molecular Cell 49, 783–794. 69. Schnable, P. S. et al. (2009) The B73 maize genome: complexity, diversity, and dynam- ics. Science 326, 1112–1115. 70. Seila, A., Core, L. J., Lis, J. T. & Sharp, P. A. (2009) Divergent transcription: a new feature of active promoters. Cell Cycle 8, 2557–2564. 71. Seila, A. C., Calabrese, J. M., Levine, S. S., Yeo, G. W., Rahl, P. B., Flynn, R. A., Young, R. A. & Sharp, P. A. (2008) Divergent transcription from active promoters. Science 322, 1849–1851. 72. Shirayama, M., Seth, M., Lee, H.-C., Gu, W., Ishidate, T., Conte, D. & Mello, C. C. (2012) piRNAs initiate an epigenetic memory of non-self RNA in the C. elegans germline. Cell 150, 65–77. 73. Sidorenko, L. V. & Peterson, T. (2001) Transgene-induced silencing identifies sequences involved in the establishment of paramutation of the maize p1 gene. The Plant Cell 13, 319–335. 74. Sidorenko, L., Dorweiler, J. E., Cigan, A. M., Arteaga-Vazquez, M., Vyas, M., Kermicle, J., Jurcin, D., Brzeski, J., Cai, Y. & Chandler, V. L. (2009) A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genetics 5, e1000725. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 114

75. Sloan, A. E., Sidorenko, L. & McGinnis, K. M. (2014) Diverse gene silencing mecha- nisms with distinct requirements for RNA polymerase subunits in Zea mays. Genetics, genetics.114.168518. 76. Soderlund, C., Descour, A., Kudrna, D., Bomhoff, M., Boyd, L., Currie, J., Angelova, A., Collura, K., Wissotski, M., Ashley, E., Morrow, D., Fernandes, J., Walbot, V. & Yu, Y. (2009) Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. PLoS Genetics 5, e1000740. 77. Stam, M., Belele, C., Dorweiler, J. E. & Chandler, V. L. (2002) Differential chromatin structure within a tandem array 100 kb upstream of the maize b1 locus is associated with paramutation. Genes and Development 16, 1906–1918. 78. Stonaker, J. L., Lim, J. P., Erhard, K. F. Jr. & Hollick, J. B. (2009) Diversity of Pol IV function is defined by mutations at the maize rmr7 Locus. PLoS Genetics 5, e1000706. 79. Stroud, H., Greenberg, M. V. C., Feng, S., Bernatavichute, Y. V. & Jacobsen, S. E. (2013) Comprehensive analysis of silencing mutants reveals complex regulation of the arabidopsis methylome. Cell 152, 352–364. 80. Swiezewski, S., Crevillen, P., Liu, F., Ecker, J. R., Jerzmanowski, A. & Dean, C. (2007) Small RNA-mediated chromatin silencing directed to the 3’ region of the Arabidopsis gene encoding the developmental regulator, FLC. Proceedings of the National Academy of Sciences 104, 3633–3638. 81. Tucker, S. L., Reece, J., Ream, T. S. & Pikaard, C. S. (2011) Evolutionary history of plant multisubunit RNA polymerases IV and V: subunit origins via genome-wide and segmental gene duplications, retrotransposition, and lineage-specific subfunctionaliza- tion. Cold Spring Harbor Symposia on Quantitative Biology 75, 285–297. 82. Walker, E. L. (1998) Paramutation of the r1 locus of maize is associated with increased cytosine methylation. Genetics 148, 1973–1981. 83. Wang, Q. & Dooner, H. K. (2006) Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proceedings of the National Academy of Sciences 103, 17644–17649. 84. Wang, X. et al. (2011) The genome of the mesopolyploid crop species Brassica rapa. Nature Genetics 43, 1035–1039. 85. Woodhouse, M. R., Freeling, M. & Lisch, D. (2006) Initiation, establishment, and main- tenance of heritable MuDR transposon silencing in maize are mediated by distinct factors. PLoS Biology 4, e339. 86. Zhang, X., Henderson, I. R., Lu, C., Green, P. J. & Jacobsen, S. E. (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proceedings of the National Academy of Sciences 104, 4536–4541. CHAPTER 3. POL IV AFFECTS NASCENT TRANSCRIPTION 115

87. Zhong, X., Hale, C. J., Law, J. A., Johnson, L. M., Feng, S., Tu, A. & Jacobsen, S. E. (2012) DDR complex facilitates global association of RNA polymerase V to promoters and evolutionarily young transposons. Nature Structural and Molecular Biology 19, 870–875. 116

Chapter 4

Identification and characterization of a putative Pl1-Rhoades enhancer and paramutation element

4.1 Introduction

Paramutation describes both the process and result of meiotically heritable directed epige- netic changes facilitated by interactions between alleles on homologous chromosomes (Brink 1973). Examples of paramutation and paramutation-like behaviors have been identified in several eukaryotes (Chandler & Stam 2004, Gabriel & Hollick 2015) but have been best described in maize, where the behavior was first identified through interactions between dif- ferent alleles of the red1 (r1 ) locus that regulates kernel pigmentation (Brink 1956, Brink 1958). Several other transcription factor loci governing maize pigmentation have alleles sus- ceptible to paramutation (paramutable alleles; Coe 1959, Sidorenko & Peterson 2001, Hollick et al. 1995). The purple plant1 (pl1 ) gene encodes a MYB-like transcription factor involved in pat- terning anthocyanin production in vegetative tissues and pollen-producing anthers (Cone et al. 1993a, Cone et al. 1993b). The Pl1-Rhoades (Pl1-Rh) allele exists in a continuum of RNA expression levels that correlate with the level of anther pigmentation (Hollick et al. 1995). Its highly expressed state (Pl-Rh) is paramutable, while its low expression state (Pl ′) is paramutagenic: when present in heterozygous condition, Pl ′ facilitates paramutation of Pl-Rh to a Pl ′ state (Hollick et al. 1995). The mechanism of paramutation is broadly described as a trans-homolog interaction (THI) because conceptually the paramutagenic and paramutable alleles residing on homolo- gous chromosomes must recognize each other and share silencing information. Two prevailing THI models invoke either direct interaction or a diffusible substance to transmit the repres- sive epigenetic information between homologous chromosomes. In both models, cis-linked sequences and trans-acting factors are required. Structural analysis of paramutable and CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 117 paramutagenic alleles of r1, booster1 (b1 ) and pericarp color1 (p1 ) identify tandem repeats as a common feature (Stam et al. 2002a, Kermicle et al. 1995, Goettel & Messing 2013). However, the repeats are different at each locus. Several trans-acting proteins have been identified through screens for the maintenance of paramutations at pl1 and b1 (Dorweiler et al. 2000, Hollick & Chandler 2001). Thus far, these proteins belong to a pathway that pro- cesses DNA-dependent RNA polymerase IV (Pol IV) transcripts into 24-nucleotide (24-nt) RNAs that guide de novo cytosine methylation (Alleman et al. 2006, Hale et al. 2007, Erhard et al. 2009, Stonaker et al. 2009, Sidorenko et al. 2009, Barbour et al. 2012). These cis and trans components still need to be unified into a complete molecular model of paramutation. Currently, the B1-Intense (B1-I ) allele has the most detailed molecular model of para- mutation. A transcriptional enhancer ∼100 kb upstream of the coding region is composed of seven near-identical tandem repeats (heptarepeat) of an 853 bp unique sequence (Stam et al. 2002a). The repeat copy number directly correlates with transcriptional activity, para- mutability, and, once silenced, its paramutagenicity (Stam et al. 2002b). In other b1 null alleles, a single copy of this repeat unit can be found upstream. 24-nt RNAs align to the heptarepeat, but the 24-nt RNA abundance does not vary between the paramutagenic (B ′) or paramutable (B-I ) states of B1-I (Arteaga-Vazquez et al. 2010). Furthermore, most of these 24-nt RNAs can be identified in neutral alleles that do not participate in paramutation but carry a single repeat unit (Arteaga-Vazquez et al. 2010). Transgenic heptarepeats can facilitate paramutations even when in a non-allelic location and also serve as a transcriptional enhancer for an associated reporter (Belele et al. 2013). Belele and colleagues tested several different heptarepeat-based transgenes, and while all of their transgene constructs produced 24-nt RNAs, not all of them could facilitate paramutation (Belele et al. 2013), which is consistent with their previously identified insufficiency (Arteaga-Vazquez et al. 2010). Yet, loss of the RNA-dependent RNA polymerase (RDR2) which helps generate 24-nt RNAs pre- vented transgene-induced paramutation (Belele et al. 2013). The 24-nt RNAs are clearly not sufficient for paramutation, but RDR2 is required, indicating that certain 24-nt RNAs may be a necessary component. However, other 24-nt RNA biogenesis components are not required to establish paramutation at other loci (Hale et al. 2007, Barbour et al. 2012). The different requirements for other 24-nt RNA biogenesis machinery and the physical coupling of RDR2 and Pol IV (Haag et al. 2014) to make Pol IV complexes, which could play a role in paramutation that results in 24-nt RNAs as bioproducts still need to be addressed. The B ′ state is exceptionally stable, however paramutations at other loci can gradually revert to the expressed state over time (Styles & Brink 1969, Hollick & Chandler 1998). When heterozygous with a deficiency, Pl ′ will heritably revert to an expressed Pl-Rh state (Hollick & Chandler 1998) indicating that THI are also important for maintaining paramu- tations. The sequences required for maintenance THI can be present in non-paramutation participating alleles: certain neutral pl1 alleles can stabilize Pl ′ in trans (stabilizing alle- les; Hollick & Chandler 1998, Gross & Hollick 2007). When heterozygous with Pl ′, other neutral pl1 alleles (termed amorphic alleles) will trigger the reversion of Pl ′ indicating that these amorphic alleles lack either the sequences required for the stabilizing THI or an ability to recruit the necessary trans-acting proteins (Hollick & Chandler 1998, Gross & Hollick CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 118

2007). Because many of the amorphic alleles have little anther pigment, the combination of amorphic pl1 and Pl ′ parental alleles resulting in a highly pigmented pl1 / Pl-Rh individual is an example of overdominance (Hollick & Chandler 1998). Understanding the molecular interactions (or lack thereof) between amorphic alleles and Pl ′ underlying these overdom- inance behaviors may provide insight into the mechanisms governing hybrid vigor where hybrids exhibit phenotypes not found in their parents (Shull 1948). The protein required to maintain repression1 (RMR1), which is dispensable for establishing paramutations, is re- quired to maintain existing Pl ′ states through meiosis (Hollick & Chandler 2001, Hale et al. 2007). RMR2, on the other hand, is partially required to establish new paramutations but dispensable for their meiotic maintenance (Barbour et al. 2012). Together, these Pl ′ mainte- nance behaviors highlight a THI that alone does not enable paramutation and highlights an overdominance interaction with Pl ′ when those THI sequences are absent in one pl1 allele. While ∼16 kb of the Pl1-Rh allele have been sequenced and assembled (Gross 2007), this data does not identify a structural element like those observed at other paramutable loci sufficient for paramutation. The recombinant allele pl1-R30 places a functional paramutation element at least 12 kb downstream of the Pl1-Rh coding region (Erhard et al. 2013). pl1- R30 along with other paramutation deficient Pl1-Rh derivative alleles indicate that the paramutation element might, like the B1-I heptarepeat, also be a transcriptional enhancer as these alleles have decreased pigmentation capacity (Gross & Hollick 2007, Gross 2007, J. B. Hollick, unpublished data). Here, I identify, characterize, and evaluate a candidate enhancer and paramutation element composed of five identical tandem repeats of 2,092 bp each approximately 14 kb downstream of the Pl1-Rh coding region. This pentarepeat is both a target of Pol II transcription and Pol IV-dependent 24-nt RNAs, and it represents a new cis-linked element motivating detailed molecular study in Pl1-Rh paramutation behaviors.

4.2 Materials and Methods

Genetic Stocks Unless otherwise noted, plants for molecular materials were grown under standard field conditions to maturity or to the seedling stage in the greenhouse on flats of potting soil. In the field, anther color score (ACS) and pollen fertility were evaluated as previously described (Hollick et al. 2005). Plant materials for Southern blot and RT-PCR analysis were harvested from individuals homozygous for the particular pl1 allele. The family and progeny numbers defining these tissue sources can be found in Table 4.1 on page 132. For the visualization of small RNAs with ethidium bromide staining, plants segregating for Rpd1-B73 and rpd1-1 alleles (formerly rmr6-1 ; Erhard et al. 2009) introgressed into the B73 background as previously described (Erhard et al. 2015) were crossed by Mo17 inbreds (Rpd1-Mo17 / Rpd1-Mo17 ) to create homozygous wild-type (progeny 120617), rpd1-1 / Rpd1-Mo17 (progeny 120597), Rpd1-Mo17 / rpd1-1 (progeny 121377) progenies, in addition to a progeny of rpd1-1 / rpd1-1 homozygous individuals (progeny 092209). Kernels from CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 119 these four progenies were germinated and grown to the 7-day seedling stage in lab on flats of vermiculite and sand. Immature cobs were collected from field grown progeny of either B73 and Mo17 inbred lines crossed to produce the Rpd1 / Rpd1 individuals (13-1399-30 and 13-1399-34) or from crosses of rpd1-1 ^B73 by Mo17 to produce rpd1-1 / Rpd1 individuals (13-1400-5 and 13-1400-18).

BAC clone identification and sequencing Genomic DNA from plants homozygous for Pl1-Rhoades in a 99.25% A619 genetic back- ground were submitted to Amplicon Express (Pullman, WA) to construct a BAC library. The resulting clones were screened (J. B. Hollick, unpublished data) through hybridization to the 1.1 kb XhoI fragment derived from pl1-Tx303 (Cone et al. 1993a). Pl1-Rh-containing positive clones were end sequenced (I. T. Liao, unpublished data). One BAC clone (plate 60, position P21) was prepared and sequenced on the Illumina HiSeqII platform (Vincent J. Coates Genome Sequencing Laboratory, University of California, Berkeley, CA). Later this clone and seven others that also tested positive for containing Pl1-Rh were prepared and sequenced on the Pacific Biosciences (PacBio) platform (The Genome Analysis Centre, TGAC, Norwich, UK: C. Watkins, D. Heavens and M. Caccamo, unpublished data).

Assembly of Pl1-Rhoades haplotype Illumina sequencing of a Pl1-Rh containing BAC produced approximately ∼210 million 100 bp paired-end reads. These reads were trimmed for adapter sequences with CutAdapt for paired-end reads (Martin 2011) and reads with low quality (Phred < 30) or ambiguous (N) bases were trimmed, keeping any reads of 90 nt or longer. I had problems with the high read coverage (estimated at over 100,000x) producing very short contigs during ini- tial assembly trials. To simplify the dataset and reduce noise from sequencing errors, I ran the reads through a digital normalization pipeline (normalize-by-median.py in the khmer suite version 1.0, github.com/ged-lab/khmer/, Brown et al. 2012, Crusoe et al. 2014) with a k-mer size of 30 (--ksize 30) and a cutoff depth of 30 (--cutoff 30). Contaminating bacterial reads were removed through alignment with K-12 DH10B E. coli chromosome (hep- tamer.tamu.edu/fgb2/gbrowse/DH10B, NC_010473) using the Map to Reference feature of Geneious (Biomatters, Auckland, New Zealand) allowing a maximum of 10% gaps, 15 nt gap size per read, and maximum mismatches per read of 20%. The resulting reads were assembled with Velvet (version 1.2.10, https://www.ebi.ac.uk/∼zerbino/velvet/, Zerbino & Birney 2008) using a kmer size of 30 (velveth 30). The final velvetg assembly used an empirically determined expected coverage of 50 (-exp_cov 50) and an insert length of 550 (-ins_length 550) to produce 8,905 nodes with an n50 of 1,150 and median node size of 4,704 bp. The largest node was 14,970 bp. Nodes larger than 400 bp were roughly aligned with the Map to Reference feature (Geneious) with default parameters. NODE_3378 (4,948 bp) extended the original ∼16 kb Pl1-Rh 5 ′ reference by 2.4 kb, and NODE_5315 (11,815 bp) extend the 3 ′ reference by 11.2 kb to produce an extended reference (∼30 kb CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 120

Pl1-Rh reference). Manual alignment in Geneious compared the ∼30 kb Pl1-Rh reference to the B73 genome. PacBio sequencing of the Pl1-Rh BACs was returned as partial or complete linear contigs from TGAC. Where possible, the BACs were circularized by complementary ends, and then re-linearized by removal of the BAC backbone sequence. Smaller contigs were divided by removing the BAC backbone if present. The resulting genomic sequences were aligned to each other and the existing ∼30 kb Pl1-Rh reference using Geneious for both automated (Map to Reference feature) and manual alignment. Together, these BAC sequences ex- tended the ∼30 kb Pl1-Rh reference sequence to ∼312 kb, with six independent BACs supporting the sequence over the pl1 coding region. A combination of BLASTN (Altschul et al. 1990), GEvo (Lyons & Freeling 2008) and manual alignment in Geneious were used to compare the Pl1-Rh reference to the B73 genome assembly.

Computational small RNA analysis Reads representing small RNAs from 4 cm immature cobs +/- RPD1 (I. T. Liao, S. Simon, B. Meyers and J. B. Hollick, unpublished data) and +/- RMR2 (Barbour et al. 2012) were obtained as tab-delimited files of high-quality trimmed sequences (tags) and tag abundances per library (rpd1 and rmr2 libraries). The remaining libraries (heterosis libraries, I. T. Liao and J. B. Hollick, unpublished data) were obtained in raw FASTQ format. These were trimmed for adapter sequence with CutAdapt (Martin 2011) keeping only trimmed reads (--discard-untrimmed). Any trimmed reads with low quality (phred <30) or ambiguous (N) bases were excluded. Duplicate reads were then collapsed across all of the heterosis libraries to create a table of distinct reads with their abundances in each library (GEO ac- cession GSE52103, supplemental file GSE52103_smallRNA_tag_abundance.csv.gz). Tables of read sequence and abundances were converted to FASTA files maintaining the abundance in the sequence name. These FASTA files were aligned to the Pl1-Rh reference with Bowtie (version 0.12.7, Langmead et al. 2009) allowing up to two mismatches (-v 2) and returning all possible alignments (-a). Awk commands (Appendix 2) were used to create new align- ment files by removing alignments outside of the pentarepeat region or involving non-24-nt reads. Then the abundance information within the read name was extracted and used to generate custom SAM alignment files for each library. These were then visualized with the Integrative Genomics Viewer (IGV, Robinson et al. 2011, Thorvaldsdottir et al. 2013). The uniqueness index was created for the Pl1-Rh repeat unit to give an approximation of the repetitiveness of any 24-nt RNA that aligned to it. To calculate the uniqueness index, the reference sequence around the first Pl1-Rh repeat unit was divided into all possible 24mers with 1 nt steps. These 24mers were then aligned to the B73 genome (version 2, build 5b) using Bowtie (version 0.12.7) allowing up to two mismatches (-v 2) and returning all possible alignments (-a). The number of alignments for each 24mer were tallied and then plotted as a heatmap using the image package in R (version 2.15.2; http://www.r-project.org/). Comparison of heterosis libraries over chromosomes 6 and 7, started with the same high- quality reads described above. These were aligned to the B73 genome (version 2, build CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 121

5b) using Bowtie. To identify the genetically different 50 Mb window on chromosome 6, Bowtie (version 0.12.7) alignments required perfect matches (-v 0), while the later anal- yses allowed up to two mismatches (-v 2). In both cases, all alignments were reported (-a) and the count of the total alignments to the genome was counted and appended to each SAM-formatted alignment entry as the NH:i:## tag (Appendix 2). I wrote the window_feature.py script (github.com/HollickLab/window_feature) to tally the reads per bin accounting for both NH:i:## mapping frequency and custom NA:i:## read abundance tags. For entire chromosomes, window_feature.py counted reads in 1 Mb windows with 10 kb sliding (--window 1000000 --step 10000), while subsequent analysis on the 50 Mb sections of each chromosome used 1000 bp windows with 200 bp slide (-w 1000 -s 200). In both cases, abundances were extracted from NA:i:## tags and normalized by the number of alignments for that read (abundance / number of alignments = normalized abundance at that location; --abundance --normalize). The resulting counts tables were then plotted in R (2.15.2; http://www.r-project.org/) with its base plot package.

Visualization of small RNAs with ethidium bromide Total RNA was extracted from 8-day-old seedling shoots and leaves or immature 4.5 cm cobs and fractionated as described (Gross & Hollick 2007), except the tissue was initially pulverized in a pre-chilled coffee grinder with 3-4 pieces of dry ice prior to transfer to a pre- chilled mortar and pestle. The resulting low molecular weight RNAs (100 µg per sample) were separated on an 8M urea 15% polyacrylamide gel at 22.5 V/cm for ∼5 hours and stained with ethidium bromide.

Southern blots The unique subrepeat probe was amplified from Pl1-Rh-containing BAC DNA using the pentarepeat-probeFa and pentarepeat-probeR primers (Table 4.2 on page 133). The resulting 386 bp fragment was purified by gel extraction with the QIAEX II kit (QIAGEN, Venlo, Limburg), cloned into the pGEM-T Easy Vector System (Promega, Madison, WI), and transformed into TOP10 One-shot competent cells (Life Technologies, Carlsbad, CA). Subsequent carbenicillin resistant colonies were verified by PCR and capillary sequencing (Plant-Microbe Genomics Facility, OSU, Columbus, OH). The 32P-radiolabelled probe was made in a Klenow amplification with random hexamer primers from gel extracted PCR amplicon of the plasmid template. Genomic DNA for Southern blots was isolated from 2-6 g of flag leaf tissue or ∼3 g of 10- day old seedling shoot and leaves (3-4 seedlings) following the genomic DNA extraction from flag leaf protocol previously described (Erhard et al. 2009), except the samples were subjected to RNaseA/TI digestion for 1 hour at 30◦C prior to the final isopropanol precipitation and recovery. For each sample, 5 µg of total genomic DNA was digested overnight with the indicated restriction enzyme (New England Biolabs, Ipswich, MA) and RNaseA. The digestions were size separated on a 1% agarose gel at ∼40 V for ∼22 hours. The genomic CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 122

DNA was transferred to Hybond-N membrane (GE Healthcare, Buckinghamshire, UK), UV crosslinked, hybridized and probed as previously described (Erhard et al. 2009). Membranes were then exposed to a phosphor screen for four days and detected with a phosphorimaging device.

RNA extractions and RT-PCR Florets were collected at anthesis immediately above or below the florets actively shedding pollen and flash frozen in liquid nitrogen prior to storage at -80◦C. Four florets per sample were ground to a fine powder in a mortar and pestle with liquid nitrogen. Total RNA was then extracted with 500 µL TRIzol (Invitrogen, Waltham, MA) following the manufacturer’s protocol. 2 µg of total RNA was DNaseI (Roche) treated for 20 minutes at 37◦C, and heat inactivated (75◦C for 10 minutes) in 5 mM EDTA. 500 ng of the DNase treated RNAs were converted to cDNA with the Tetro cDNA synthesis kit (Bioline, Taunton, MA) following the manufacturer’s protocol and using the oligo-dT primer. Template RNAs were then degraded with an RNaseA/TI/H cocktail. RT-PCR reactions used an aliquot of cDNA representing 40 ng of starting total RNA. The unique subrepeat (pentarepeat-probeFb and pentarepeat-probeR primers, Table 4.2 on page 133) and alanine aminotransferase (aat) control (alt203_FP1 and aatR primers, Table 4.2 on page 133) were amplified for 35 cycles of 95◦C for 30 seconds, 58◦C for 30 seconds, 72◦C for 90 seconds. The pl1 coding region was amplified with (pl1_FP2 and phi031R primers, Table 4.2 on page 133) for 35 cycles of 95◦C for 30 seconds, 60◦C for 30 seconds, 72◦C for 90 seconds. The resulting products were separated on an agarose gel and visualized with ethidium bromide.

4.3 Results

The Pl1-Rhoades haplotype shares many conserved regions with pl1-B73 The genetically identified enhancer and paramutation element(s) (Erhard et al. 2013) lies outside of the existing ∼16 kb of known Pl1-Rh sequence (Gross 2007). To extend this Pl1-Rh reference, several BACs were sequenced containing the Pl1-Rh allele introgressed into an A619 inbred line (Figure 4.1A on page 134). I assembled short-read Illumina sequencing data from one BAC, which doubled the Pl1-Rh sequence and produced several other contigs that could be positioned via alignment to the B73 genome. However, subsequent long-read Pacific Biosciences sequencing was needed to fully assemble four BACs and partially assemble four others (The Genome Analysis Centre, Norwich, UK: C. Watkins, D. Heavens and M. Caccamo, unpublished data). Six of these BACs have contigs that align to the ∼30 kb Pl1-Rh reference, extending it to ∼180 kb with support by at least two independent BACs. A further ∼132 kb of sequence (11kb upstream and 121 kb downstream) has support from CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 123 single BACs, although the downstream portion relies on similarity to the B73 reference to position several smaller contigs. To identify potential sequences distinct to the Pl1-Rh reference (Rhoades reference), I aligned it to the pl1 region of the B73 genome. The pl1-B73 allele native to the B73 inbred line stabilizes Pl ′ states (Gross & Hollick 2007). Therefore, sequences present in both alleles, but in different copy number or location were identified as potential THI loci (Figure 4.1B on page 134). GEvo analysis (Lyons & Freeling 2008) of these two sequences identify several regions of high sequence similarity (shaded connectors in Figure 4.1B on page 134). The duplications between upstream gene model GRMZM5G850129 and pl1 were extended from their initial designation with the ∼16 kb Pl1-Rh reference (Gross 2007). Although these duplications may be necessary for paramutation, they were excluded from subsequent analysis as a putative enhancer/paramutation element as they are not downstream of pl1 (Gross 2007, Erhard et al. 2013). Five identical tandem repeats of 2,092 bp each produce a long insert of directed repeats just prior to the AY530952.1_FG001 gene model in the Pl1-Rh reference (purple shading in Figure 4.1B on page 134). The current B73 genome assembly contains a single copy of a similar repeat unit. No other multi-copy tandem repeats are present in the entire contig, although Type I long-terminal repeat (LTR) retrotransposons imbedded within and near each other present other complicated repeat structures further up and downstream (such as the Huck1 LTR highlighted in green in Figure 4.1B on page 134). The pentarepeat represents the best candidate for an enhancer/paramutation element because it is: 1) downstream of pl1 in agreement with genetic data (Erhard et al. 2013); 2) present in at least single copy in the stabilizing pl1-B73 sequence that could represent the stabilizing cis-regulatory feature (Hollick & Chandler 1998, Gross & Hollick 2007); and 3) its high copy number of repeats may be analogous to the multiple copies of other sequences important for r1 and b1 paramutation (Kermicle 1996, Stam et al. 2002a). Therefore, subsequent analysis focuses on the structural and functional definition of these pentarepeats.

The pentarepeat includes both unique and TE-like sequences Five identical 2,092 bp repeat units make up the Pl1-Rh pentarepeat, which is distinct from the entirely unique 853 bp B1-I repeat unit. In particular, each Pl1-Rh repeat contains non-unique sequences with similarity to Type I LTR, Type II and helitron transposable ele- ment (TE) sequences (Figure 4.2 on page 135). Flanked by an unclassified non-autonomous Type II TE fragment and a Harbinger Type II TE fragment is a sequence of 390 bp that, like the B1-I repeats, is unique in the B73 genome. BLAST (Altschul et al. 1990) of the unique subrepeat to the NCBI nucleotide collection identifies only a similar unique subrepeat from the pl1-B73 allele in the B73 genome. The single pl1-B73 repeat unit is slightly longer due to two insertions, one of which includes another Harbinger fragment and an unclassified non-autonomous Type II TE fragment. Because of its singular nature, the unique subrepeat was used to design a probe for Southern blot analysis, which confirmed the presence of a pentarepeat fragment in the BAC that was incompletely assembled over the pentarepeat (J. R. B. Talbot and E. Beck, unpublished data; Figure 4.3 on page 136). Together these CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 124 analyses confirm the pentarepeat structure and divide each repeat unit into repetitive and unique subrepeats.

Pl1-Rhoades derivative alleles have a structurally similar pentarepeat Derivative Pl1-Rh alleles unable to establish paramutations might be expected to be al- tered at sequences important for THIs. Three derivative alleles(pl1-mum9515, pl1-gamma9601 and pl1-mum1101 ), whose Mutator TE insertion (mum alleles) and gamma-irradiation (gamma allele) derived mutations remain unidentified, have deficiencies in paramutation behaviors (Gross & Hollick 2007, Gross 2007, J. B. Hollick, unpublished data). Tests of these paramutation behaviors use derivative allele / Pl ′ heterozygotes crossed to a Pl-Rh / Pl-Rh tester, the resulting progeny are scored for both their anther pigmentation and the presence of a structurally distinct chromosome that was carrying the original Pl ′ allele, effectively testing the paramutagenicity of both the derivative allele and the original Pl ′ allele after their exposure to each other in the heterozygote (Gross & Hollick 2007). From such tests, the pl1-mum9515 can not facilitate paramutations and can destabilize Pl ′ at low frequency (Gross & Hollick 2007). Both pl1-gamma9601 and pl1-mum1101 have decreased, though not completely absent, ability to facilitate paramutations, but only pl1-gamma9601 destabilizes Pl ′, albeit at low frequency (Gross 2007, J. B. Hollick, unpublished data). The Mutator insertions and gamma irradiation-induced lesions expected in the deriva- tive alleles are predicted to make them structurally distinct from their Pl1-Rh progenitor. Both pl1-mum9515 and pl1-gamma9601 are structurally similar to Pl1-Rh around the cod- ing region (Gross & Hollick 2007, Gross 2007). To query the structural composition of their pentarepeats, restriction enzymes predicted to cut within the repeats (BglII and EcoRV) or outside the repeats (NsiI) were used on genomic DNA isolated from flag leaves of in- dividuals homozygous for each allele. When probed with the unique subrepeat probe, all three alleles have identical or near-identical banding patterns to Pl1-Rh (Figure 4.4A-C on page 137). These results indicate that all three derivative alleles have the same (or very similar) structure within ∼20.6 kb of these repeats (Figure 4.4D on page 137). Therefore large-scale disruptions in the repeats or changes in their copy number are not responsible for the impaired paramutation phenotypes of these derivative alleles. The pl1-R30 allele represents a recombination event between the coding region of Pl1-Rh and the downstream region of pl1-B73 (Erhard et al. 2013). With its downstream pl1- B73 -like sequence, pl1-R30 is unable to participate in paramutations (Erhard et al. 2013). Although both progenitor alleles (pl1-B73 and Pl1-Rh) can stabilize Pl ′ states in trans, the pl1-R30 recombinant allows some reversions of Pl ′ to Pl-Rh (Erhard et al. 2013). Previous Southern blots (Gross 2007) and limited sequencing of SNPs identified via Illumina Pl1- Rh contigs (J. R. B. Talbot, data not shown) confined the recombination point to a ∼20 kb region downstream of the Pl1-Rh coding region, which encompasses the pentarepeats. Southern blots with the unique subrepeat probe indicate that the repeats and downstream CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 125 region is B73-like in pl1-R30 (Figure 4.4A and C on page 137), while the upstream region is slightly different from both progenitors (Figure 4.4B on page 137). The difference in EcoRV product sizes is slight, but could be explained by a recombination event within a 1.5 kb region between the pl1-B73 EcoRV site and a 359 bp B73 indel relative to the Pl1-Rh reference (4.4D on page 137). Together, the structural analysis of the Pl1-Rh derivatives indicates that large-scale structural changes of the pentarepeat itself do not underlie their deficiencies in paramutation behavior.

Diverse pl1 alleles have different structures around the unique subrepeat Natural maize diversity includes distinct pl1 alleles with varying paramutation behaviors (Hollick & Chandler 1998, Gross & Hollick 2007, Erhard et al. 2013). To identify any struc- tural diversity at the pentarepeat region that might correlate with paramutation behavior, several pl1 alleles were chosen for Southern blot analysis using the unique subrepeat probe (Figure 4.5 on page 138). Two alleles, Pl1-Rh and Pl1-CML52, participate in paramutations having both expressed and repressed states (Hollick et al. 1995, Erhard et al. 2013). Pl1- Blotched, pl1-W22 and pl1-B73 are all neutral alleles that stabilize Pl ′ in trans (Gross & Hollick 2007). Pl1-Blotched has a highly similar sequence in the coding region and upstream doppia TE fragment to Pl1-Rh, although it does not participate in paramutation (Cocciolone & Cone 1993, Hollick & Chandler 1998). Finally, pl1-Co159 and pl1-Mo17 both destabilize Pl ′ states (Hollick & Chandler 1998, Gross & Hollick 2007). All pl1 alleles tested had dis- tinct banding patterns with NsiI and BglII digestions except for Pl1-Rh and Pl1-Blotched, which had slightly different NsiI sizes, but identical BglII patterns. Pl1-CML52 has at least three copies of the unique subrepeat, although a pentarepeat like that of Pl1-Rh is possi- ble, if an additional NsiI site exists within the third repeat and a BglII site has been lost. Hybridization patterns indicate that all other alleles, including amorphic pl1-Co159 and pl1- Mo17 alleles (Hollick & Chandler 1998, Gross & Hollick 2007), have at least a single copy of the unique subrepeat. The Pl1-Blotched result indicates that the pentarepeat alone is insufficient for paramutation, although it may still be necessary. Furthermore, the presence of the unique subrepeat alone cannot be sufficient for the Pl ′-stabilizing THI, as amorphic alleles that destabilize Pl ′ also have that sequence, or one similar enough for hybridization.

Pol II transcribes the unique subrepeat The heptarepeat upstream of B1-I not only facilitates paramutation, but it also serves as a transcriptional enhancer looping to interact with the B1-I promoter (Louwers et al. 2009). The light pigmentation conferred by pl1-mum9515, pl1-gamma9601 and pl1-R30 indicate that an enhancer element may also be disrupted in these derivative alleles (Gross & Hollick 2007, Gross 2007, Erhard et al. 2013). Since enhancer elements are transcriptionally active (Core et al. 2012), the pentarepeat was tested for the production of poly-adenylated RNAs via reverse transcriptase polymerase chain reaction (RT-PCR). RNAs from florets CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 126

(which include anther and glume tissue) were converted to cDNAs with oligo-dT primers and PCR amplified with primers specific to a ∼300 bp portion of the unique subrepeat. Plants homozygous for Pl1-Rh in the Pl-Rh state produce poly-adenylated RNAs for the unique subrepeat, but such RNAs are not detected in Pl ′ plants (Figure 4.6 on page 139). Pl1-Rh mRNA abundance correlates with anthocyanin pigmentation in the anthers (Hollick et al. 1995) indicating that Pol II RNA from one or several unique subrepeats may also correlate with Pl1-Rh expression. Measuring of RNAs from florets of plants homozygous for other pl1 alleles indicates that transcription of the unique subrepeat remains correlated with anther pigmentation (Figure 4.7 on page 139). Darkly pigmented ACS 7 (Pl-Rh) anther containing florets from both Pl1-Rh and Pl1-CML52 plants have detectable poly-adenylated unique subrepeat tran- scripts. Meanwhile, their ACS1 (Pl ′) counterparts, all four Pl1-Rh derivative alleles, as well as pl1-B73 and pl1-Co159 have no detectable unique subrepeat transcripts. Pl1-Blotched florets have detectable levels of unique subrepeat RNA, although they are diminished com- pared to the ACS7 Pl1-Rh and Pl1-CML52 individuals (Figure 4.7 on page 139). Primers spanning intron 2 of the pl1 coding region were used to assay the amount of poly-adenylated pl1 transcript in the same cDNA samples. In all cases, both spliced and unspliced forms are detected. The unspliced form correlates better with anther pigmentation than the spliced form. Exons 2 and 3 are in the same frame for both pl1-B73 and Pl1-Rh, but the extra 36 bp in the pl1-B73 intron introduces two stop codons. Whether the longer PL1 protein encoded by unspliced Pl1-Rh is functional remains unclear, although the heavy bias for this unspliced mRNA and the known high levels of anthocyanin pigmentation indicate that it could be. While both stabilizing neutral alleles Pl1-Blotched and pl1-B73 have similar levels of pl1 mRNA expression, pl1-B73 does not have detectable unique subrepeat transcripts, in- dicating a disconnect between unique subrepeat and pl1 coding region expression. Together these data indicate that the unique subrepeat is transcribed, although the underlying source of variation in its transcription amongst pl1 alleles remains unclear.

Pol IV-dependent 24-nt RNAs align to the unique subrepeat Pol IV and other 24-nt RNA biogenesis proteins are required for various paramutation behaviors (Hollick & Chandler 2001, Hollick et al. 2005, Hale et al. 2007, Stonaker et al. 2009, Barbour et al. 2012). However, it remains unclear where (if at all) the Pol IV-dependent 24-nt RNAs are affecting Pl1-Rh expression. To test whether the pentarepeats might be sources and/or targets of 24-nt RNAs, small RNAs from pl1-B73 and Pl1-Rh immature cobs were deep sequenced and the 24mers were aligned to a single Pl1-Rh repeat unit (Figure 4.8 on page 140). Because all of the 24mers were aligned to a single repeat unit in isolation, the repetitiveness of the repeat was estimated based on the number of times each 24mer present in the repeat unit aligns to the B73 genome (uniqueness index). Together, the repetitiveness and 24mer pileups identify the first two peaks as likely originating from other loci, although those regions might still be targeted by the 24-nt RNAs. The third peak, over the unique subrepeat, likely represents the originating locus for those 24-nt RNAs. That peak is present CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 127 only in the Pl1-Rh immature cob, even though the pl1-B73 immature cob also contains a copy of the unique subrepeat. Furthermore, the unique subrepeat 24-nt RNAs are dependent on the largest subunit of Pol IV (RPD1) and RMR2 activity (Figure 4.8C on page 140).

Unique subrepeat 24-nt RNAs are decreased in Pl1-Rh / pl1-B73 hybrids The Pl ′ state is maintained by pl1-B73 in trans (Gross & Hollick 2007). Although 24-nt RNAs are not made from the pl1-B73 unique subrepeat they might be expected from the Pl1-Rh pentarepeat, in which case 24-nt RNA abundance in a Pl1-Rh / pl1-B73 hybrid could reflect the importance of Pol IV complexes at the unique subrepeat in maintaining Pl ′ states. 24-nt RNAs sequenced from immature cobs of parental Pl1-Rh and pl1-B73 homozygotes and their hybrids were aligned to the Pl1-Rh repeat unit (Figure 4.9 on page 141). The 24mers were also aligned to the pl1-B73 repeat unit, but showed no appreciable difference to the Pl1- Rh unique subrepeat alignment (data not shown). The unique subrepeat 24-nt RNAs align predominantly to the sense strand. The Pl1-Rh / pl1-B73 hybrid (+RMR1 hybrid) still has 24-nt RNAs aligning to the unique subrepeat, although they are diminished compared to its Pl1-Rh / Pl1-Rh parent. Another hybrid (-RMR1 hybrid) represents a similar cross between Pl1-Rh and pl1-B73 parents except that the Pl1-Rh parent was also a homozygous rmr1-1 mutant. The -RMR1 hybrid has even fewer 24-nt RNAs over the unique subrepeat than the +RMR1 hybrid. Immature cobs include both sporophytic and gametophytic tissues, and small RNAs isolated from the immature cob of an rpd1-1 / Rpd1 heterozygote has approximately half the 24-nt RNAs than from an Rpd1 / Rpd1 immature cob (Figure 4.10 on page 142). Seedling tissue where all the cells are diploid do not show a dosage effect for 24-nt RNA abundance in rpd1-1 / Rpd1 heterozygotes (Figure 4.10 on page 142). Together, these data indicate that dosage of Pl1-Rh alleles and -RMR1 gametophytes likely explain the decreases in 24-nt RNAs over the unique subrepeat in +RMR1 and -RMR1 hybrids.

Trans-generational 24-nt RNAs abundances are not influenced by large genetic differences The combination of distinct genomes leading to hybrid vigor (Shull 1948) has been corre- lated with decreases in hybrid 24-nt RNAs, particularly at loci with variable parental 24-nt RNA abundance, in both Arabidopsis and maize (Groszmann et al. 2011, Barber et al. 2012). Although whether those differences reflect nearby or distant interactions remains unclear. Since the +RMR1 and -RMR1 hybrids are predominantly in a B73 genetic background, the region around their Pl1-Rh allele would represent the combination of genetically distinct information, while other chromosomes should be entirely B73 from both parents. Perfect alignment of 24mers from the Pl1-Rh^B73 and B73 parental libraries identified a 50 Mb region around the pl1 locus where fewer Pl1-Rh^B73 24mers aligned (Figure 4.11A on page 143), while a similar region on chromosome 7 had similar 24mer abundances for the parental CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 128 libraries. These 50 Mb regions became the genetically similar (chromosome 7 ) and dissimilar (chromosome 6 ) regions to compare effects of parental 24-nt RNA abundance on the 24-nt RNA abundance of the +RMR1 hybrid (Figure 4.11B on page 143). While there is a great range of both fold difference between the parental 24-nt RNA abundances and deviation of the hybrid 24-nt RNA abundance from an expected parental average, there are no apparent differences in this spread between genetically similar or dissimilar genomic regions. Simi- larly, the only difference in 24-nt RNA abundances between +RMR1 and -RMR hybrids for the same two regions was a general bias towards more 24-nt RNAs in the +RMR1 hybrid (Figure 4.12 on page 144), likely from only half of the -RMR1 hybrid gametophytes having RMR1 activity. Together, these data indicate that relatively localized genetic differences have little effect on the overall 24-nt RNA production in hybrids.

4.4 Discussion

Detailed molecular models are needed to unify the cis-regulatory sequences and trans- acting proteins identified as important for THIs that establish and maintain paramutations. The B1-I heptarepeats represent one sequence element whose molecular characterization has provided one platform to investigate the establishment of paramutation (Stam et al. 2002a, Arteaga-Vazquez et al. 2010, Belele et al. 2013). However, the B ′ state is too stable for de- tailed study of the THI underlying maintenance of existing paramutations. Identifying the cis-regulatory sequences involved in both the establishment and maintenance of Pl1-Rh para- mutations will provide a second model for molecular studies. Here the Pl1-Rh pentarepeat, the best candidate enhancer/paramutation element within ∼200 kb of the Pl1-Rh coding region, was identified and characterized for its involvement in paramutation THIs. Although the underlying sequence and copy number vary, the Pl1-Rh pentarepeats share many common features with the B1-I heptarepeat, which indicates that the pentarepeat may play an analogous role in Pl1-Rh paramutation. In both cases, the repeats are direct tandem copies with no intervening sequence, although some single nucleotide polymorphisms (SNPs) distinguish most of the B1-I repeat units (Stam et al. 2002a). Each Pl1-Rh repeat unit appears identical to the other four based on both the PacBio sequence and alignment of Illumina reads to the pentarepeats. Although similar repeats in paramutable Pl1-CML52 may have some sequence polymorphisms. Both the heptarepeat and pentarepeat contain unique sequence not found elsewhere in the maize genome (Stam et al. 2002a), although this represents only a fraction (390 bp) of the 2,092 bp Pl1-Rh repeat unit. The unique subrepeat sequence is not present elsewhere in the assembled B73 genome, and the lack of additional bands when the unique subrepeat probe is hybridized to various non-B73 maize genomes (Figure 4.4 on page 137) indicates that the subrepeat is also unique in these other maize genomes. The B1-I heptarepeat loops to interact with the b1 promoter (Louwers et al. 2009). While any looping interactions remain to be tested for the Pl1-Rh pentarepeat, at least the unique subrepeat portion is actively transcribed by Pol II–a common feature of enhancer elements in other organisms and of the B1-I heptarepeat (Arteaga-Vazquez et al. 2010, Core CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 129 et al. 2012). Furthermore, its transcription corresponds with the expression state of both Pl1-Rh and Pl1-CML52. Finally, both produce 24-nt RNAs (Arteaga-Vazquez et al. 2010), although further testing is needed to determine if the 24-nt RNAs at the unique subrepeat vary between Pl-Rh and Pl ′ states, which would be distinct from the B1-I repeats (Arteaga- Vazquez et al. 2010). Together, the many similarities between Pl1-Rh pentarepeats and B1-I heptarepeats comports with a similar regulatory role for the pentarepeats. Meanwhile the distinctions between the two repeat sets motivate comparative analysis of their roles in paramutation behaviors. To evaluate the likelihood of the Pl1-Rh pentarepeat as the genetically identified en- hancer/paramutation element (Erhard et al. 2013), the paramutagenic alleles (Pl1-Rh and Pl1-CML52 ) must be carefully compared to the derivative Pl1-Rh alleles and Pl1-Blotched. Both Pl1-Rh and Pl1-CML52 have multiple copies of the repeat unit, which is consistent with the high copy number of repeats involved in both B1-I and r1 paramutation (Ker- micle 1996, Stam et al. 2002a). Minimally, the Southern blots identify three repeats for Pl1-CML52. The promoter and coding regions of Pl1-CML52 are highly similar to that of Pl1-Rh (Erhard et al. 2013), if the repeat region is also similar, then the two Pl1-CML52 NsiI fragments (∼5.5 and 6 kb; Figure 4.4A on page 137) could represent a total region similar in size to the ∼12 kb Pl1-Rh NsiI fragment. Then an additional NsiI site would be needed, likely within the third Pl1-CML52 repeat unit; such a polymorphic repeat unit might also be consistent with the extra Pl1-CML52 BglII fragment (Figure 4.4B on page 137). More structural analysis and potentially sequencing of the Pl1-CML52 repeats is needed to fully compare these two alleles. Before Pl1-Rh and Pl1-Bh were known to be allelic, they were genetically mapped as 0.8-1.4 centiMorgans (cM) apart (Rhoades 1948), which could corre- spond to as near as the pentarepeat or as far from the pl1 coding region as 3 Mb away (Ganal et al. 2011). The Pl1-Rh-like fragments of Pl1-Blotched, pl1-mum9515, pl1-mum1101 and pl1-gamma9601, all of which fail to establish paramutation, disprove the hypothesis that re- peat unit number alone correlates with paramutagenicity at this locus, although the repeats may still be necessary. Similarly, the presence of Pl1-Blotched unique subrepeat transcripts argues against pentarepeat transcripts being responsible for paramutagenicity. However, the Pl1-Rh derivative alleles are all deficient for transcription of their unique subrepeats, which indicates that the region is still tied into the transcriptionally activity of the coding region (Hollick et al. 2000). Identifying the extent of transcriptional activity correlated with the Pl1-Rh epigenetic state will help delimit the boundaries of paramutation elements. The AY530952.1_FG001 gene immediately downstream of the pentarepeat is not expressed in florets (Sekhon et al. 2013), but if its normal expression in seedling is not altered in Pl-Rh or Pl ′ backgrounds then the pentarepeat may define the end of the Pl1-Rh transcriptional regulatory region. Further characterization of this region will help delimit the boundaries of Pl ′ repressed transcription and identify regions to look for the Mutator TE and gamma irradiation-induced lesions of Pl1-Rh derivative alleles. The THI that maintains Pl ′ requires sequences that are present in the stabilizing neutral alleles, but not present or are fundamentally altered in the destabilizing amorphic alleles. While aligning the Pl1-Rh and pl1-B73 sequences, the unique subrepeat became a candidate CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 130 maintenance element, since it is present in a single copy in pl1-B73. However, the detection of DNA fragments from amorphic alleles pl1-Co159 and pl1-Mo17 by the unique subrepeat probe indicate that those alleles likely have the unique subrepeat in single copy, much like pl1-B73. The repetitive nature of the rest of the 2,092 bp repeat unit prohibits other probes to this region as they will likely hybridize to other TE sequences in the genome and the distinct genetic backgrounds of various pl1 alleles will likely have different TE contents (Wang & Dooner 2006). The Mo17 genome has been partially sequenced by the Joint Genomes Institute (JGI, Walnut Creek, CA). If the region downstream of pl1-Mo17 can be assembled, it might shed further light on the nature of its repeat unit and identify differences to the B73 and Pl1-Rh reference sequences. However, alignment of available JGI Mo17 sequences to the Pl1-Rh repeat unit failed to identify any reads that aligned to the unique subrepeat (data not shown). Long read sequencing of the repeat unit in amorphic pl1 alleles by capillary sequencing may be required to fully compare them to stabilizing and paramutagenic alleles. The pl1-B73 unique subrepeat accumulates neither Pol II transcripts (Figure 4.7 on page 139) nor Pol IV-dependent 24-nt RNAs (Figures 4.8 on page 140 and 4.9 on page 141), indicating that neither 24-nt RNAs nor Pol II RNAs of the unique subrepeat are necessary for the THI that maintains the Pl ′ state. Paramutation begins with a period of unstable trans-silencing before the repression be- comes mitotically and meiotically heritable (Coe 1961, Coe 1966). Similarly, all rmr alleles were identified by the somatic derepression of the Pl ′ state (Hollick & Chandler 2001), which does not always translate to meiotically heritable reversion (Stonaker et al. 2009, Barbour et al. 2012). The 24-nt RNAs that align to the unique subrepeat may be involved in this initial trans-silencing behavior. Both the uniqueness of the region and the dosage effect in Pl1-Rh / pl1-B73 hybrids (+RMR1 hybrid in Figure 4.9 on page 141) comport with these 24-nt RNAs originating from one or all of the Pl1-Rh unique subrepeats. These 24-nt RNAs are dependent on the Pol IV largest subunit (RPD1) and RMR2 (Figure 4.8 on page 140). Since immature cobs represent a mixture of 24-nt RNAs from gametophytic and sporophytic tissues (Figure 4.10 on page 142), the decrease in unique subrepeat 24-nt RNAs between the +RMR1 and -RMR1 hybrids (Figure 4.9 on page 141) indicates that these unique subrepeat 24-nt RNAs are also dependent on RMR1. RMR1 is not required and RMR2 is only partially required to establish Pl1-Rh paramutations (Hale et al. 2007, Barbour et al. 2012), indicating that these unique subrepeat 24-nt RNAs can not be necessary for establishing paramutation. Similarly, RMR2 is not required to maintain paramutations (Barbour et al. 2012), indicating that these RMR2-dependent 24-nt RNAs are equally not necessary to maintain paramuta- tions. However, their codependence on RPD1, RMR1 and RMR2 continues to support the idea that these 24-nt RNAs might be important for trans-silencing early in paramutations and between Pl ′ alleles. Although, it cannot be ruled out that Pol IV occupancy of the unique subrepeat is important for the establishment and/or maintenance of paramutation and the 24-nt RNAs are simply a side-effect. Profiling of 24-nt sRNAs at Pl1-CML52 and Pl1-Blotched may help to distinguish Pl1-Blotched from its paramutagenic relatives. Addi- tionally, measuring the abundance of these 24-nt RNAs between Pl-Rh and Pl ′ states and when Pl ′ is heterozygous with amorphic alleles may identify variations in these 24-nt RNAs CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 131 that might be important for paramutation and/or reversion. Although the exact role of the Pl1-Rh pentarepeat in paramutation behaviors remains unclear, it does represent a new portion of the Pl1-Rh haplotype that is transcribed by both Pols II and IV and may be necessary, though not by itself sufficient, for paramutation behaviors. Interactions between these two polymerases, either through competition or via epigenetic repression guided by Pol IV-dependent 24-nt RNAs, has been identified as a key regulator of both TE and genic sequences (Alleman et al. 2006, Erhard et al. 2009, Hale et al. 2009, Erhard et al. 2013, Erhard et al. 2015, Chapter 3 of this work). An upstream doppia TE fragment of the CACTA Type II TE family, is targeted by Pol IV complexes to maintain transcriptional repression of Pl1-Rh, Pl1-CML52, Pl1-Blotched and pl1-R30 in the kernel (Erhard et al. 2013). Although further characterization of the transcriptional units around the pentarepeats is needed, these represent another component of the Pl1-Rh allele whose regulation may be necessary for paramutation behaviors. Meanwhile, further exploration of the ∼ 200 kb Pl1-Rh sequence, including the overlay of 24-nt RNAs, may identify more cis-regulatory regions that play roles in Pl1-Rh epigenetic control.

Acknowledgements

I would like to thank all of those who shared unpublished datasets or reagents with me: Irene Liao, Stacey Simon, Blake Meyers and Jay Hollick for sharing Illumina sequencing data of small RNAs isolated from immature cobs; Irene Liao and Jay Hollick for their early work in the BAC library preparation, screening and Illumina sequencing; Chris Watkins, Darren Heavens and Mario Caccamo at the Genome Analysis Centre (Norwich, UK) for preparing and sequencing the eight Pl1-Rh-containing BACs on the PacBio platform; Janelle Gabriel for sharing Pl ′ and Pl-Rh cDNAs used in Figure 4.7 on page 139; Erica Beck for allowing me to reprobe her Southern blot containing BAC DNA with the unique subrepeat probe (Figure 4.3B on page 136); and Jay Hollick for sharing the results of his paramutation tests on the pl1-mum1101 allele. Thanks also to Janelle Gabriel, Laura Schrader, Julia Kays, Brian Giacopelli and Jay Hollick, as well as the Slotkin, Bisaro, and Ding labs for valuable comments on this project over the years. Thanks to Katherine Mejia-Guerra, Jeffrey Campbell, Wilbur Ouma and the rest of the CAPS Computational Biology Laboratory for their helpful comments with my bioinformatics analyses and the window_feature.py script. Finally thanks to the Center for Applied Plant Sciences (CAPS) administration that allowed the CCBL members to host Dr. Titus Brown who suggested his digital normalization procedure for handling the over-sequenced BAC assembly and Dr. Mario Caccamo who formed a collaboration with Jay to sequence our BACs on the PacBio platform. Most of the work in this chapter would not have been possible without those initial serendipitous collaborations. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 132 Flag leaf, Florets t analysis. Floret tissue was used for RNA alleles pl1 52% B73 Seedling shoots/leaves ∼ 99% W22 Flag leaf Approx. Genetic Background Tissue(s) Collected > Progeny Number(s) Table 4.1: Tissue sources for Number(s) 14-1053—14-1074 140281— 140271 49.2% 142003 B73, 48% A619, 2.8% unknown 49.2% 141648 B73, 51.8% unknown 48% A619, 49.6% A632, 2.4% unknown Flag leaf, 48% Florets A619, Seedling shoots/leaves Flag leaf, Florets 14-123114-1242 112607 112054 100% B73 Flag leaf, Florets 14-123714-123914-1433 09211214-1236 13055514-1234 071611 86.6% A619, 140702 13.4% unknown 98.4% A619, 091228 1.6% unknown 100% CML52 49% B73, 25% CML52, unknown 26% unknown Flag Florets leaf, Florets Flag leaf, Florets Flag leaf 14-122514-1196 112617 131307 100% Mo17 98.4% A619, 1.6% unknown Flag leaf, Florets Flag leaf ) 14-1240 110772 98.44% B73, 1.56% unknown Florets ) 14-1241 130111 98.44% B73, 1.56% unknown Flag leaf, Florets ′ Pl-Rh Pl ( ( Flag leaf and seedling tissuesextraction were for used RT-PCR for analysis. genomic DNA extraction for Southern blo AllelePl1-Rhoades Family Pl1-Rhoades pl1-mum9515 pl1-mum9515 pl1-mum1101 pl1-mum1101 pl1-W22 pl1-B73 pl1-R30 Pl1-CML52 Pl1-CML52 Pl1-Blotched pl1-gamma9601 pl1-Mo17 pl1-Co159 4.5 Tables CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 133

Table 4.2: Primers used

Target Primer Name Sequence (5 ′ to 3 ′) pl1 coding region phi031_F GCAACAGGTTACATGAGCTGACGA a pl1 coding region phi031_R CCAGCGTGCTGTTCCAGTAGTT a pl1 coding region pl1_FP2 GGATCTCATCGTCCGGCTCCACAA b Unique subrepeat pentarepeat-probeFa AGGGACTTAGAACAGCCAACAA Unique subrepeat pentarepeat-probeFb GTGACCGATGAGAAGACAAAGACCTA Unique subrepeat pentarepeat-probeR CTAAGCCTTCCTGTAGTTTGATTGGC aat coding region alt203_FP1 CAATATCACTGGTCAAATCCTTGCGA aat coding region aatR TTGCACGACGAGCTAAAGACT c a Primers from Chin et al. 1996. b Designed by Janelle Gabriel. c Primers from Woodhouse et al. 2006.

CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 134

u

t

w v

Pl1-

j

i

h

u

t

s s

otransposon p

s indicating each

r

q

p o

region by platform.

l k Huck1 Pl1-Rhoades is highlighted as a purple rectangle. AY530952.1_FGT001 pentarepeat Pl1-Rhoades Assemblies of the ns & Freeling 2008). Red, green and purple shadings (A) sents the sequencing of a distinct BAC containing . didate enhancer/paramutation element (purple) and LTR sequences. Shaded connectors highlight high-scoring segment pl1-B73 Pl1-Rhoades pl1-B73 doppia Pl1-Rhoades family (green). Black boxes represent gene models with arrow and Huck1 GRMZM5G850129 pl1-B73 DNA transposon immediately upstream of shares structural regions with doppia s io) Alignment of (Illumina) Pl1-Rhoades (B) da clone (Sanger) . Pl1-Rh ncing source (method) Pl1-Rh Pl1-Rh pairs (HSPs) greater than 400 bpdraw in attention length to from GEvo larger analysisretrotransposon structural (Lyo sequences features, of including the a can 4.6 Figures Figure 4.1: respective TSS. The Rhoades For the Pacific Biosciences (PacBio) platform, each row repre Direct repeats are represented by thin black arrows. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 135

Pl1-Rhoades t

transposons DNA Transposons Helitrons

pl1-B73

Figure 4.2: Alignment of a Pl1-Rhoades and pl1-B73 repeat unit. One repeat unit of the Pl1-Rhoades pentarepeat aligned to the single pl1-B73 repeat unit. Shading represents the HSPs from Figure 4.1 on the preceding page; here the insertions in pl1-B73 relative to Pl1- Rhoades are depicted below the pl1-B73 sequence. A purple rectangle represents the 390 bp unique subrepeat, while transposable elements are depicted as fat arrows in black (helitrons), dark grey (Type I LTR retrotransposons) or light grey (Type II DNA transposons). CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 136

A ‡ {| ~ xyz {|

Pl1-Rh AY530952.1_FG001 € }  e NciI

ents NcoI †

NsiI

„ ƒ

Expected StuI ‚

B

StuI

NsiI NcoI NciI

Figure 4.3: Southern blot of Pl1-Rh-containing BAC confirms the structures identified by PacBio sequencing. (A) Model of the pentarepeats (purple outline), highlighting the unique subrepeat (shaded purple) and flanking genes (grey boxes). Restriction enzyme recognition sites are marked by color-coded vertical lines (NciI = green, NcoI = teal, NsiI = cyan, StuI = magenta). Thin grey boxes under the unique subrepeats represent the unique subrepeat probe. Expected fragments based on the PacBio sequencing are highlighted as color-coded lines below the gene models. (B) A clone blot of Pl1-Rh-containing BAC DNA made by Erica Beck reprobed with the unique subrepeat probe; approximate sizes are indicated next to the Southern blot images. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 137

s s s

pl1-gamma9601 pl1-gamma9601 pl1-gamma9601 A Pl1- pl1-R pl1-mum1101pl1-mum9515 B Pl1- pl1-mum1101pl1-mum9515pl1-R C Pl1- pl1-mum1101pl1-mum9515pl1-R

NsiI (N) EcoRV (E) BglII (B)

D E N E E E E E N ‰Š‹Œ  Pl1-Rh ˆ 52.1_FG001

pl1-mum1101 pl1-mum9515 pl1-gammŽŒ

pl1-R‹ Œ recom n region

E N E N

pl1-B73 AY530952.1_FG001

Figure 4.4: Pl1-Rhoades derivatives have similar structures around the pentarepeats. (A-C) Southern blots reflecting the structure of several Pl1-Rhoades derivative alleles around the pentarepeats based on restriction enzyme digest with NsiI (A), EcoRV (B) and BglII (C). pl1-R30 represents a recombinant between the coding region of Pl1-Rhoades and downstream pl1-B73 sequences. pl1-mum1101, pl1-mum9515 and pl1-gamma9601 are derivatives of Pl1- Rhoades that are deficient in paramutation behaviors. (D) Models of the Pl1-Rhoades and pl1-B73 downstream region highlighting the pentarepeat (or single repeat) as grey arrows with the unique subrepeat in darker grey. The region of similarity between Pl1-Rhoades and its derivative alleles (pl1-mum1101, pl1-mum9515 and pl1-gamma9601 ) is highlighted in green. Combined analysis of the Southern blots (A-C) and sequence alignments narrow the putative pl1-R30 recombination region from ∼20 kb (gold) to ∼5 kb (yellow). Gene models are shown in grey rectangles with exons (thick), introns (thin) and the TSS (arrow). All NsiI (N; blue), EcoRV (E; black) and BglII (B; red) sites within the regions are depicted with vertical lines, while those assayed by the Southern blots (A-C) have single letter designations as well. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 138

CML52

Pl1-RhoadesPl1- Pl1- pl1-W22 pl1-B73 pl1-Co159 A B Pl1-RhoadesPl1-CML52 Pl1-RhoadesPl1-CML52Pl1-Blotchedpl1-B73 pl1-Mo17 pl1-Co159

NsiI (N) BglII (B) C N B B B BBBN

Pl1-Rh unique e AY530952.1_FG001

Figure 4.5: Structural diversity of pl1 alleles at the unique subrepeat. (A-B) Southern blots reflecting the structure of several pl1 alleles that undergo paramutation (Pl1-Rhoades and Pl1-CML52 ), stabilize Pl ′ states in trans (Pl1-Blotched, pl1-W22, and pl1-B73 ) and destabilize Pl ′ states (pl1-Co159 and pl1-Mo17 ) as identified by NsiI (A) and BglII (B) digests and the unique subrepeat probe. (C) Model of the Pl1-Rhoades downstream region highlighting the pentarpeat as grey arrows with the unique subrepeat in darker grey. The unique subrepeat probe used in (A-B) is highlighted in teal. Gene models are shown in grey rectangles with exons (thick), introns (thin) and the TSS (arrow). All NsiI (N; blue) and BglII (B; red) sites within the regions are depicted with vertical lines, while those assayed by the Southern blots (A-B) have single letter designations as well. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 139

Pl-Rh Pl' gDNA unique unique eat aat

Figure 4.6: The unique subrepeat is expressed in Pl-Rh florets. Total RNAs isolated from Pl-Rh and Pl ′ florets collected at anthesis were converted to cDNAs with oligo-dT primer. These cDNAs were amplified with primers specific for the latter three-fourths of the unique subrepeat or alanine aminotransferase (aat) as a control (Table 4.2 on page 133). Genomic DNA (gDNA) served as an additional control.

gDNA (ACS7) (ACS7) (ACS1) (ACS1) s s s s s (ACS 7) (ACS 7) (ACS 1) 15

o159 Blotched

Pl1- Pl1- Pl1- Pl1- Pl1- Pl1- Pl1-BlotchedPl1-CML52Pl1-CML52 Pl1-CML52 pl1-R pl1-R pl1-B73 pl1-B73 pl1-Co159 T d

unique at

unique at d)

254 218 113 pl1 coding region 254 218 113

212

122 ‘’‘“ ”“ e —˜‘“ ™š›˜‘ ‘ •”“– se 212 122

Figure 4.7: Expression patterns of pl1 and the unique subrepeat vary across diverse pl1 alleles. Total RNAs isolated from florets collected at anthesis homozygous for the pl1 allele indicated (top labels) were converted to cDNAs with oligo-dT primer. These cDNAs were then amplified with primers for the unique subrepeat, pl1 coding region and alanine amino- transferase as a control (see Table 4.2 on page 133 for primers). Predicted band sizes based on either B73 or Pl1-Rhoades reference sequences are marked. An intronic polymorphism results in a larger unspliced fragment (254 bp) for pl1-B73 than for unspliced Pl1-Rhoades (218 bp); both templates are predicted to produce products at 113 bp when spliced. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 140

A pl1-B C Pl1- h 1

Pl1- h per million B 24mer reads Pl1-Rh 2-1 t 1 transposons DNA Transposons Helitrons pl1-B73 24mer reads per million e Pl1-Rh

Figure 4.8: Pol IV-dependent 24-nt RNAs align to the unique subrepeat. (A) Profiles of 24-nt RNAs from immature cobs of either pl1-B73 or Pl1-Rh homozygotes. The 24-nt RNAs mapped to the Pl1-Rh repeat unit reference with up to two mismatches (colored lines) and were normalized to total library size. (B) Model of the repeat unit, highlighting the repetitiveness of all possible 24mer sequences making up the Pl1-Rh repeat to the maize B73 genome. 24mers aligning 0 or 1 time to the B73 genome are white, while those mapping many times become more red. (C) Pol IV and RMR2 loss affects the 24-nt RNA abundance over the unique subrepeat. Similar pile-ups as in (A), focused on the unique subrepeat portions of the Pl1-Rh reference. The rpd1 and rmr2 genotypes are indicated, in all cases the immature cobs from which these 24-nt RNAs were isolated represent Pl1-Rh / Pl1-Rh homozygotes. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 141 and parent

Sense Antisen d Pl1-Rh e Pl1-Rh Helitron odels of female parent was also Pl1-Rh in a largely B73 background; the B73 Unique except the Pl1-Rh . 24-nt RNAs (normalized to reads per million) / Pl1-Rh Pl1-Rh DNA transposon DNA allele. +RMR1 Hybrids represent crosses of the pl1-B73 Parent is homozygous . Duplicate genotypes represent biological replicates. The m

otransposon ¥

Pl1-Rh

®

¬

¥



° ¯ rmr1-1

/

®

¤ ¬

repeat.

­

¬  rmr1-1

Pl1-Rh

« « « «

ª ª ª ª

rid rid rid rid

¥ ¥ ¥ ¥

¢ ¢ ¢ ¢

ent ent ent ent

¢ ¢ ¢ ¢

¥ ¥ ¥ ¥

¡ ¡ ¡ ¡

¡ ¡ ¡ ¡

¤ ¤ ¤ ¤

£ £ £ £

rent rent rent rent

Ÿ Ÿ Ÿ Ÿ

¤ ¤ ¤ ¤

Ÿ Ÿ Ÿ Ÿ

   

£ £ £ £

   

repeat unit are shown as in Figure 4.8B on page 140.

ž ž ž ž

¨ ¨ ¨ ¨

ž ž ž ž

§ § § §

   

   

Pl1-Rh Pl1-Rh Pl1-Rh Pl1-Rh œ © © © œ œ ¦ ¦ ¦ ¦ œ ©

5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

24mer reads per million per reads 24mer 24mer reads per million per reads 24mer Pl1-Rh pl1-B73 homozygous for by the B73 parent and -RMR1 Hybrids are the same as +RMR1 hybrids pl1-B73 Parent is 100% B73 and homozygous for the aligned to the Figure 4.9: Unique subrepeat 24-nt RNAs are made only from CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 142

seedlings Immature co ♀ arent ♂ arent

As

Figure 4.10: Gametophytes contribute to the 24-nt RNA pools of immature cobs. Seedlings and immature cobs inheriting different combinations of Rpd1 and rpd1-1 alleles from their parents were used to isolate low molecular weight RNAs. The 24-nt RNA and tRNA (loading control) bands are visualized with ethidium bromide staining. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 143 and (B73 7 6 ) (B) in 1000 bp 15 pl1-B73 lysis ( (A) -genetic variation. cis is either 6

5 ² highlighted in 24-nt RNA abundances identify 5 5 7

15 15 ± (A) B and s. 6 Crick ^B73 and paternal B73 parents and their Watson Crick iding in a hit-normalized manner by adding a Watson 108 Mb on chromosome ∼ represents the average of two biological replicates. omosomes al library size and 22-nt RNA abundances, prior to Pl1-Rh wx ^B73 parents. The abundance of perfectly mapping read. These abundances were normalized to library size e +RMR1 hybrid from the parental average abundance. locus at pl1 Pl1-Rh wx b) Variation in hybrid 24-nt RNA levels are independent of (B) ^B73 parent). The 50 Mb shaded regions were used for subsequent ana Pl1-Rh wx ( Pl1-Rh wx 6 7 -genetic variation does not alter 24-nt RNA profiles in hybrid Cis Pl1-Rh

Pl1-Rh wx

parent) or (0-mismatches) 24-nt RNAs werefractional tallied value in of 1 abundance Mb /via windows possible the with mapping genome-wide 10 loci 22-nt kbCentromeric RNA for regions sl abundance. each are marked Each by abundance grey trace rectangles. The 24-nt RNA abundances werewindows tallied with over 200 the bp 50 slide Mb for regions two of bioreps chr each of the maternal Figure 4.12 on the following page). regions of genetic diversity between 100% B73 and resulting hybrid (+RMR1 hybrid).calculating the These Parental were Fold normalized Difference and to Deviation tot of th Figure 4.11: CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 144

Chromosome 6 Chromosome 7 dance

dance

Figure 4.12: Maternal loss of RMR1 has similar effects on 24-nt RNA abundance from genetically similar and dissimilar hybrid chromosomes. Each point represents the library- and 22-nt RNA-normalized 24-nt RNA abundance in a 1000 bp window (200 bp slide) across the 50 Mb regions identified in Figure 4.11A on page 143. For both +RMR1 and -RMR1 hybrids two bioreplicates of abundances are shown. The -RMR1 hybrids are heterozygous for rmr1-1 / Rmr1 because their maternal parent was an rmr1-1 mutant. The +RMR1 hybrids are homozygous Rmr1 and had homozygous Rmr1 parents. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 145

4.7 References

1. Alleman, M., Sidorenko, L., McGinnis, K., Seshadri, V., Dorweiler, J. E., White, J., Sikkink, K. & Chandler, V. L. (2006) An RNA-dependent RNA polymerase is required for paramutation in maize. Nature 442, 295–298. 2. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) Basic local alignment search tool. Journal of Molecular Biology 215, 403–410. 3. Arteaga-Vazquez, M., Sidorenko, L., Rabanal, F. A., Shrivistava, R., Nobuta, K., Green, P. J., Meyers, B. C. & Chandler, V. L. (2010) RNA-mediated trans-communication can establish paramutation at the b1 locus in maize. Proceedings of the National Academy of Sciences 107, 12986–12991. 4. Barber, W. T., Zhang, W., Win, H., Varala, K. K., Dorweiler, J. E., Hudson, M. E. & Moose, S. P. (2012) Repeat associated small RNAs vary among parents and following hybridization in maize. Proceedings of the National Academy of Sciences 109, 10444– 10449. 5. Barbour, J. R., Liao, I. T., Stonaker, J. L., Lim, J. P., Lee, C. C., Parkinson, S. E., Kermicle, J., Simon, S. A., Meyers, B. C., Williams-Carrier, R., Barkan, A. & Hollick, J. B. (2012) required to maintain repression2 is a novel protein that facilitates locus- specific paramutation in maize. The Plant Cell 24, 1761–1775. 6. Belele, C. L., Sidorenko, L., Stam, M., Bader, R., Arteaga-Vazquez, M. A. & Chandler, V. L. (2013) Specific tandem repeats are sufficient for paramutation-induced trans- generational silencing. PLoS Genetics 9, e1003773. 7. Brink, R. A. (1973) Paramutation. Annual Review Genetics 7, 129–152. 8. Brink, R. A. (1956) A genetic change associated with the R locus in maize which is directed and potentially reversible. Genetics 41, 872–889. 9. Brink, R. A. (1958) Paramutation at the R locus in maize. Cold Spring Harbor Symposia on Quantitative Biology 23, 379–391. 10. Brown, C. T., Howe, A., Zhang, Q., Pyrkosz, A. B. & Brom, T. H. (2012) A reference- free algorithm for computational normalization of shotgun sequencing data. arXiv preprint, arXiv:1203.4802. 11. Chandler, V. L. & Stam, M. (2004) Chromatin conversations: mechanisms and impli- cations of paramutation. Nature Reviews Genetics 5, 532–544. 12. Chin, E. C., Senior, M. L., Shu, H. & Smith, J. S. (1996) Maize simple repetitive DNA sequences: abundance and allele variation. Genome 39, 866–873. 13. Cocciolone, S. M. & Cone, K. C. (1993) Pl-Bh, an anthocyanin regulatory gene of maize that leads to variegated pigmentation. Genetics 135, 575. 14. Coe, E. H. (1959) A regular and continuing conversion-type phenomenon at the B locus in maize. Proceedings of the National Academy of Sciences 45, 828–832. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 146

15. Coe, E. H. (1966) The properties, origin, and mechanism of conversion-type inheritance at the B locus in maize. Genetics 53, 1035–1063. 16. Coe, E. H. Jr. (1961) A test for somatic mutation in the origination of conversion-type inheritance at the B locus in maize. Genetics 46, 707–710. 17. Cone, K. C., Cocciolone, S. M., Burr, F. A. & Burr, B. (1993) Maize anthocyanin regulatory gene pl is a duplicate of c1 that functions in the plant. The Plant Cell 5, 1795–1805. 18. Cone, K. C., Cocciolone, S. M., Moehlenkamp, C. A., Weber, T., Drummond, B. J., Tagliani, L. A., Bowen, B. A. & Perrot, G. H. (1993) Role of the regulatory gene pl in the photocontrol of maize anthocyanin pigmentation. The Plant Cell 5, 1807–1816. 19. Core, L. J., Waterfall, J. J., Gilchrist, D. A., Fargo, D. C., Kwak, H., Adelman, K. & Lis, J. T. (2012) Defining the Status of RNA Polymerase at Promoters. Cell Reports 2, 1025–1035. 20. Crusoe, M. R., Edvenson, G., Fish, J., Howe, A., McDonald, E., Nahum, J., Nanlohy, K., Ortiz-Zuazaga, H., Pell, J., Simpson, J., et al. (2014) The khmer software pack- age: enabling efficient sequence analysis. URL http://dx. doi. org/10.6084/m9. figshare 979190. 21. Dorweiler, J. E., Carey, C. C., Kubo, K. M., Hollick, J. B., Kermicle, J. L. & Chandler, V. L. (2000) mediator of paramutation1 is required for establishment and maintenance of paramutation at multiple maize loci. The Plant Cell 12, 2101–2118. 22. Erhard, K. F. Jr., Parkinson, S. E., Gross, S. M., Barbour, J. R., Lim, J. P. & Hollick, J. B. (2013) Maize RNA Polymerase IV defines trans-generational epigenetic variation. The Plant Cell 25, 808–819. 23. Erhard, K. F. Jr., Stonaker, J. L., Parkinson, S. E., Lim, J. P., Hale, C. J. & Hollick, J. B. (2009) RNA Polymerase IV functions in paramutation in Zea mays. Science 323, 1201–1205. 24. Erhard, K. F. Jr., Talbot, J. R. B., Deans, N. C., McClish, A. E. & Hollick, J. B. (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199, 1107–1125. 25. Gabriel, J. M. & Hollick, J. B. (2015) Paramutation in maize and related behaviors in metazoans. Seminars in Cell and Developmental Biology Accepted Manuscript. 26. Ganal, M. W., Durstewitz, G., Polley, A., Berard, A., Buckler, E. S., Charcosset, A., Clarke, J. D., Graner, E.-M., Hansen, M., Joets, J., Le Paslier, M.-C., McMullen, M. D., Montalent, P., Rose, M., Schon, C.-C., Sun, Q., Walter, H., Martin, O. C. & Falque, M. (2011) A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS ONE 6, e28334. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 147

27. Goettel, W. & Messing, J. (2013) Paramutagenicity of a p1 epiallele in maize. Theoret- ical and Applied Genetics 126, 159–177. 28. Gross, S. M. (2007) Trans-sensing interactions and structural features of the maize Pl1-Rhoades allele. University of California, Berkeley, CA PhD dissertation. 29. Gross, S. M. & Hollick, J. B. (2007) Multiple trans-sensing interactions affect meiotically heritable epigenetic states at the maize pl1 locus. Genetics 176, 829–839. 30. Groszmann, M., Greaves, I. K., Albertyn, Z. I., Scofield, G. N., Peacock, W. J. & Dennis, E. S. (2011) Changes in 24-nt siRNA levels in Arabidopsis hybrids suggest an epigenetic contribution to hybrid vigor. Proceedings of the National Academy of Sciences 108, 2617–2622. 31. Haag, J. R., Brower-Toland, B., Krieger, E. K., Sidorenko, L., Nicora, C. D., Norbeck, A. D., Irsigler, A., LaRue, H., Brzeski, J., McGinnis, K., Ivashuta, S., Pasa-Tolic, L., Chandler, V. L. & Pikaard, C. S. (2014) Functional diversification of maize RNA poly- merase IV and V subtypes via alternative catalytic subunits. Cell Reports 9, 378–390. 32. Hale, C. J., Erhard, K. F. Jr., Lisch, D. & Hollick, J. B. (2009) Production and pro- cessing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genetics 5, e1000598. 33. Hale, C. J., Stonaker, J. L., Gross, S. M. & Hollick, J. B. (2007) A novel Snf2 protein maintains trans-generational regulatory states established by paramutation in maize. PLoS Biology 5, e275. 34. Hollick, J. B., Patterson, G. I., Asmundsson, I. M. & Chandler, V. L. (2000) Paramu- tation alters regulatory control of the maize pl locus. Genetics 154, 1827–1838. 35. Hollick, J. B., Patterson, G. I., Coe, E. H. Jr., Cone, K. C. & Chandler, V. L. (1995) Allelic interactions heritably alter the activity of a metastable maize Pl allele. Genetics 141, 709–719. 36. Hollick, J. B. & Chandler, V. L. (1998) Epigenetic allelic states of a maize transcrip- tional regulatory locus exhibit overdominant gene action. Genetics 150, 891–897. 37. Hollick, J. B. & Chandler, V. L. (2001) Genetic factors required to maintain repression of a paramutagenic maize pl1 allele. Genetics 157, 369–378. 38. Hollick, J. B., Kermicle, J. L. & Parkinson, S. E. (2005) Rmr6 maintains meiotic inheritance of paramutant states in Zea mays. Genetics 171, 725–740. 39. Kermicle, J. L. in Epigenetic Mechanisms of Gene Regulation (eds Russo, V. E. A., Martienssen, R. A. & Riggs, A. D.) 267–287 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1996). 40. Kermicle, J. L., Eggleston, W. B. & Alleman, M. (1995) Organization of paramuta- genicity in R-stippled maize. Genetics 141, 361–372. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 148

41. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. (2009) Ultrafast and memory- efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25. 42. Louwers, M., Bader, R., Haring, M., Driel, R. v., Laat, W. d. & Stam, M. (2009) Tissue- and expression level-specific chromatin looping at maize b1 epialleles. The Plant Cell 21, 832–842. 43. Lyons, E. & Freeling, M. (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. The Plant Journal 53, 661–673. 44. Martin, M. (2011) Cutadapt removes adapter sequences from high-throughput sequenc- ing reads. EMBnet.journal 17, 10–12. 45. Rhoades, M. M. (1948) Interaction of Bh with recessive c. Maize Newsletter 22, 9 (http://mnl.maizegdb.org/mnl/22/06Rhoades.htm). 46. Robinson, J. T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E. S., G., G. & Mesirov, J. P. (2011) Integrative Genomics Viewer. Nature Biotechnology 29, 24–26. 47. Sekhon, R. S., Briskine, R., Hirsch, C. N., Myers, C. L., Springer, N. M., Buelle, C. R., de Leon, N. & Kaeppler, S. M. (2013) Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS ONE 8, e61005. 48. Shull, G. H. (1948) What is ‘heterosis’? Genetics 33, 439–446. 49. Sidorenko, L. V. & Peterson, T. (2001) Transgene-induced silencing identifies sequences involved in the establishment of paramutation of the maize p1 gene. The Plant Cell 13, 319–335. 50. Sidorenko, L., Dorweiler, J. E., Cigan, A. M., Arteaga-Vazquez, M., Vyas, M., Kermicle, J., Jurcin, D., Brzeski, J., Cai, Y. & Chandler, V. L. (2009) A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genetics 5, e1000725. 51. Stam, M., Belele, C., Dorweiler, J. E. & Chandler, V. L. (2002) Differential chromatin structure within a tandem array 100 kb upstream of the maize b1 locus is associated with paramutation. Genes and Development 16, 1906–1918. 52. Stam, M., Belele, C., Ramakrishna, W., Dorweiler, J. E., Bennetzen, J. L. & Chandler, V. L. (2002) The regulatory regions required for B ′ paramutation and expression are located far upstream of the maize b1 transcribed sequences. Genetics 162, 917–930. 53. Stonaker, J. L., Lim, J. P., Erhard, K. F. Jr. & Hollick, J. B. (2009) Diversity of Pol IV function is defined by mutations at the maize rmr7 Locus. PLoS Genetics 5, e1000706. 54. Styles, E. D. & Brink, R. A. (1969) The metastable nature of paramutable R alleles in maize. IV. Parallel enhancement of R action in heterozygotes with r and in hemizygotes. Genetics 61, 801–811. CHAPTER 4. PUTATIVE PL1-RHOADES PARAMUTATION ELEMENT 149

55. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178–192. 56. Wang, Q. & Dooner, H. K. (2006) Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proceedings of the National Academy of Sciences 103, 17644–17649. 57. Woodhouse, M. R., Freeling, M. & Lisch, D. (2006) Initiation, establishment, and main- tenance of heritable MuDR transposon silencing in maize are mediated by distinct factors. PLoS Biology 4, e339. 58. Zerbino, D. R. & Birney, E. (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18, 821–829. 150

Chapter 5

Perspectives

DNA-dependent RNA polymerases (RNAPs) are conserved across all cellular organisms to convert the information stored in the genome into biologically active RNAs. For proper development of complex body patterns and tissues, multicellular organisms encode additional information controlling when and in which cells a functional genic unit should be transcribed. Some of that controlling information is established and maintained by an RNAP through epigenetic mechanisms (Moazed 2009, Cecere & Grishok 2014). The metazoan RNAP, Pol II is evolutionarily constrained by its dual roles of genic expression and controlling epigenetic regulatory information. Multicellular plants evolved additional RNAPs (Pols IV and V) responsible for controlling epigenetic regulatory information, while Pol II continues its role in genic expression (Luo & Hall 2007, Huang et al. 2015). I propose that the subfunctional- ization of Pols II, IV and V have relaxed the constraints on plant genome composition and enabled the evolution of new epigenetic regulatory mechanisms. Transposable elements (TEs) have long been hypothesized to serve as controlling ele- ments for gene expression (McClintock 1951, McClintock 1984). As Pol IV functions are characterized, it seems that Pol IV association with TEs and the TEs themselves, together, control gene expression. The Zea mays (maize) genome is a complex mixture of genes, TEs and other repetitive sequences with little segregation between these elements (Schnable et al. 2009, Baucom et al. 2009). The maize genome also encodes multiple functionally distinct Pol IV complexes (Stonaker et al. 2009, Sidorenko et al. 2009, Haag et al. 2014) which are generally thought to target TEs and repetitive sequences for epigenetic silencing (Matzke et al. 2015, Giacopelli & Hollick 2015). Together, the genomic complexity and Pol IV diversity in maize motivated my characterization of Pol IV complexes and their genomic targets to better understand how they together control genome function.

5.1 Composition of Pol IV complexes

RNAP subunits and accessory proteins together define Pol IV complexes. Many plant genomes encode multiple paralogous copies of subunits and/or accessory proteins (Stonaker CHAPTER 5. PERSPECTIVES 151 et al. 2009, Ream et al. 2009, Law et al. 2011, Haag et al. 2014, Huang et al. 2015), although the functional result of these duplications remains largely unexplored outside of the model plants Arabidopsis thaliana (Arabidopsis) and maize. Unlike Arabidopsis, maize has at least two functionally distinct Pol IV complexes defined by different catalytic cores: RNAP sub- units RPD1 and RP(D/E)2a (Pol IVa) vs. RPD1 and RP(D/E)2b (Pol IVb) (Stonaker et al. 2009, Sidorenko et al. 2009, Ream et al. 2009, Haag et al. 2014). Accessory proteins may functionally distinguish Pols IVa and IVb. Second site non- complementation (SSNC) between rp(d/e)2a mutant alleles and the reference required to maintain repression2 (rmr2 ) allele support a functional link in certain backgrounds (Chapter 2). RMR2 and RP(D/E)2a similarly maintain epigenetic silencing of the paramutagenic Pl ′ state of Pl1-Rhoades (Stonaker et al. 2009, Barbour et al. 2012). However, the complete evaluation of RP(D/E)2a’s influence on Pl1-Rhoades behaviors must be performed before these two proteins can be fully compared. A result of Pol IV engagement at a locus is ultimately the production of 24-nucleotide (24- nt) RNAs. Profiles of small RNAs (sRNAs) confirm that RP(D/E)2a functionally identifies only a subset of Pol IV complexes (I. T. Liao, S. Simon, B. Meyers, and J. B. Hollick, unpub- lished data). RMR2-dependent 24-nt RNAs have similar profiles to RP(D/E)2a-dependent 24-nt RNAs (Barbour et al. 2012), supporting association of RMR2 with RP(D/E)2a-defined Pol IVa complexes. Since all Pol IV accessory proteins identified affect 24-nt RNA abun- dance, future comparisons of their 24-nt RNA populations will help functionally distinguish Pol IVa and Pol IVb complexes. What is the nature of RMR2 and the family of RMR2-related novel proteins present in multi-cellular plants? These proteins have a conserved carboxy-terminal domain of unknown function (Chapter 2). The lack of other RMR2 domains supports a structural rather than enzymatic role, presumably in Pol IVa function. However, it remains unclear what RMR2 binds with in such a structural capacity. A YFP-tagged RMR2 localizes to the nucleus (A. Luo and A. Sylvester, unpublished data), consistent with an association with Pol IV. RMR2 immunoprecipitation experiments utilizing the YFP-tag will determine whether RMR2 asso- ciates with proteins, nucleic acids or membranes and any specific binding partners. Together, that data will guide further biochemical characterization of this novel protein. Additionally, the function of the RMR2 paralogs in maize remains unclear, they could associate with other Pol IV complexes or have structural roles unrelated to Pol IV function. If associated with Pol IV complexes, loss of these RMR2 paralogs is expected to affect 24-nt RNA abundance. Mutant alleles of the RMR2 paralogs are needed to test their functional involvement in Pol IV biology. The second-site non-complementation (SSNC) between alleles of rp(d/e)2a and rmr2-1 remains curious. I demonstrated the consistency of the SSNC in a particular rmr2-1 lineage but was unable to identify the distinguishing feature of that lineage. The terminal two gen- erations participate in rmr2-1 SSNC and earlier generations were never tested. Identifying the SSNC background factor in the rmr2-1 lineage will require further generations of out- crosses and complementation crosses similar to our mapping procedures for new rmr alleles (Hale et al. 2007, Erhard et al. 2009, Stonaker et al. 2009). Prior to this, however, simple CHAPTER 5. PERSPECTIVES 152 experiments should rule out changes in rmr2 and rp(d/e)2a expression in the SSNC lineage that may be affecting dosage of RMR2 and RP(D/E)2a in the double heterozygote. Several other SSNC examples have been identified in particular rmr allele combinations, having an understanding of the mechanism for rmr2-1, rp(d/e)2a SSNC could help us understand the underlying behaviors in these other SSNCs as well. Mutant alleles disrupting several genes encoding Pol IV associated proteins have been identified through forward genetic screens (Dorweiler et al. 2000, Hollick & Chandler 2001, Madzima et al. 2011). However, the mutant alleles most needed to functionally distinguish Pol IV complexes are those encoding paralogous genes, which are not discovered efficiently through a forward genetic screen. I unsuccessfully attempted to recover rp(d/e)2b mutant alleles from maize reverse genetic resources, but was stymied by conflicting gene annotations and problems with molecularly tracking the mutagenic Mutator insertions. While reverse genetics resources might identify some Pol IV components, I think the continued development of targeted mutagenesis techniques holds the most promise to expedite the mutagenesis of paralogous genes needed to functionally characterize Pol IV complexes.

5.2 Mechanistic models of Pol IV function

Pol IV complexes must interact with genomic target sequences to maintain epigenetic regulation. Pol IV association with a target may inhibit the association of Pol II with the same genomic region through a polymerase competition mechanism (Hale et al. 2009). Alternatively, the 24-nt RNAs produced as a result of Pol IV transcription of the target can guide cytosine methylation of any Pol V transcribed loci with sequence homology (24-nt RNA mechanism) (reviewed by Matzke & Mosher 2014). In both cases, the physical Pol IV target may be distinct from the functional target that produces a characterizable phenotype (discussed below). Therefore, characterization of Pol IV complexes will be greatly aided by a reference genome with high quality intergenic sequencing for the inbred line or ecotype being studied. A good reference sequence contributes to the understanding of Pol IV regulation at a target locus. RPD1 inhibits transcription of the ocl2 allele in B73 seedlings (Chapter 3). Alignment of B73 nascent transcriptome data to the B73 reference genome identifies a TE upstream of the ocl2 gene with similar RPD1-dependent transcriptional repression. This TE is targeted by Pol IV-dependent 24-nt RNAs while the ocl2 gene is not (Erhard 2012), indicating that Pol IV primarily targets the TE. Additionally, qRT-PCR data implicates a polymerase competition mechanism for RPD1-dependent ocl2 silencing (Chapter 3). Taken together these data support a model where Pol IV targets the upstream TE, in the absence of this targeting Pol II is recruited to the TE which serves as a controlling element for ocl2 expression. This model predicts that other ocl2 alleles lacking the upstream TE would not be controlled by Pol IV. This model highlights the importance of both a sequenced genome and performing functional experiments in the same genetic background as the reference genome. Furthermore, the great variability in intergenic TE content between maize inbred CHAPTER 5. PERSPECTIVES 153 lines (Wang & Dooner 2006) makes such considerations of genetic background particularly vital in maize. The B73 ocl2 allele represents a model target locus for studying the composition and function of Pol IV complexes controlling genes through polymerase competition. A different Pol IV complex may be required for controlling loci through 24-nt RNAs. Identification and detailed analysis of model target loci (such as the B73 ocl2 allele) affected by different Pol IV complexes will improve our understanding of the behaviors of Pol IV complexes at all target loci. That goal motivated the nascent transcriptome profiling in Chapter 3. However, further nascent transcriptome profiling in mutants of the 24-nt RNA biogenesis pathway are also needed to compare with the RPD1-dependent nascent transcriptome. These comparisons will rapidly identify a subset of Pol IV target loci that might represent distinct Pol IV functions and serve as model target loci for future studies.

5.3 Pol IV effects on gene function

Most of the focus on the functions of Pol IV complexes has involved their control of TE and gene expression. A surprising result of the nascent transcriptome analysis is that Pol IV complexes help limit transcription beyond the end of genes (Chapter 3). Pol II nor- mally transcribes beyond the poly-adenylation signal (PAS) that marks the edge of a gene, this pre-termination transcription is particularly evident in nascent transcriptome profiles of metazoans (Core et al. 2008, Core et al. 2012, Kruesi et al. 2013). Similar pre-termination transcription in maize extends further away from the gene end in the absence of RPD1 in- dicating that Pol IV complexes may help limit Pol II transcription beyond the gene body. Similarly, RPD1 prevents antisense transcription into genes that may originate at down- stream TEs (Chapter 3). Both of these functions impacted our identification of differentially transcribed genes (Erhard et al. 2015), but the RNA-seq dataset from the same seedling tissue (K. F. Erhard and J. B. Hollick, unpublished data) needs to be examined to see if these transcriptional differences correspond to changes in stable mRNAs.

5.4 Pol IV effects on paramutable alleles

What happens if two or more Pol IV targets converge on the same genic region? The paramutable Pl1-Rhoades allele is poised to answer that question as both an immediately upstream doppia TE fragment and the downstream pentarepeats (Chapter 4) are targeted by Pol IV complexes. In kernel tissue, where pl1 is not normally expressed, the doppia sequence is repressed by Pol IV complexes partially through their production of 24-nt RNAs (Erhard et al. 2013). The unique subrepeats, which are transcribed by Pol II in florets of the expressed Pl-Rh state, are targeted by Pol IV complexes in immature cobs (Chapter 4). Whether this Pol IV targeting occurs in the florets, and what impact such targeting has on Pol II transcription of the unique subrepeat needs to be tested through measurements CHAPTER 5. PERSPECTIVES 154 of small RNA and poly-adenylated RNA abundance in various Pol IV mutants and their wild-type siblings. The increased transcription of the Pl ′ coding region in the absence of RPD1 and other Pol IV components (Erhard et al. 2009, Stonaker et al. 2009, Barbour et al. 2012) predicts that the pentarepeat will be transcribed in Pl ′ individuals lacking these same Pol IV components. Paramutations represent a unique form of gene regulation where trans-homolog interac- tions (THI) establish and maintain meiotically heritable epigenetic silencing (Brink 1973). Loss of RPD1 and RDR2 components of Pol IV complexes prevent the establishment of new paramutations and destabilize existing paramutations (Dorweiler et al. 2000, Hollick et al. 2005), although how these Pol IV components do so remains unclear. The heptarepeats sufficient for paramutation at B1-Intense (B1-I ) produce 24-nt RNAs, but so do other single copy and transgene versions of the heptarepeat unit that do not participate in paramutation (Arteaga-Vazquez et al. 2010, Belele et al. 2013). Similarly the known Pol IV targets at Pl1-Rhoades (the doppia fragment and the pentarepeat) are not sufficient for paramutation (Erhard et al. 2013, Chapter 4), but the pentarepeat 24-nt RNAs and the Pl1-Rhoades Pol IV targets may be necessary for paramutation behaviors. Gent and colleagues propose that Pol IV complexes near genes may serve to silence TEs while maintaining a chromatin state compatible with genic transcription (Gent et al. 2014). I speculate that such expression-compatible chromatin may be more epigenetically labile, which may serve a role in the reversion of Pl ′. If Pol II transcription of the pentare- peats is needed for the high expression of Pl-Rh, then destabilization of Pl ′ states might reflect stochastic transcription of those repeats. Based on the genome-wide estimates, ex- tended pre-termination of AY530952.1_FG001 transcription in rpd1 mutants could include the 3 ′ edge of the pentarepeat. If that initial transcription of the pentarepeat could trig- ger subsequent transcription extending into the pentarepeat the entire region may become transcribed enough to revert Pl ′ to a Pl-Rh state. While very speculative at this stage, the stochasticism of such a model would fit with the observed rates (∼20%) of Pl ′ reversion in rpd1 mutants (Hollick et al. 2005). Pol IV direct targets are mostly thought of as representing TE sequences. The B1-I heptarepeats and the Pl1-Rhoades pentarepeats highlight that non-TE sequences can also be directly targeted by Pol IV complexes. Understanding of how Pol IV is recruited to tandem repeats (TRs) could help explain its role in paramutation and further characterize this potential controlling element. The TR content of most genomes is underestimated by modern genome sequencing that relies on short-read assembly techniques which tend to collapse such elements. As demonstrated by the assembly of the pentarepeat in the PacBio- generated Pl1-Rhoades assembly, very long-read sequencing techniques promise to uncover more cases of TRs that could previously be identified only through structural assays. TRs also serve as areas of high diversity where uneven recombination events generate structural variants of differing repeat number and regulatory behavior (Stam et al. 2002b, Stam et al. 2002a). Characterization of Pol IV targeting to the B1-I heptarepeat and Pl1-Rhoades pentarepeat will provide models to understand how Pol IV complexes might exert gene control through targeting of TRs. CHAPTER 5. PERSPECTIVES 155

5.5 Conclusions

If TEs and tandem repeats can serve as controlling elements for gene expression, what then controls them? From the work discussed in this dissertation and the work of many others, the answer seems obvious: Pols IV and V. More importantly now: how does Pol IV control the controlling elements and, through them, gene expression? I have begun or extended the characterization of the composition (through RMR2 paralogs; Chapter 2) and functions of Pol IV complexes at certain target loci (pre-termination at gene ends, the B73 ocl2 allele and Pl1-Rhoades; Chapters 3 and 4). These target loci represent largely distinct Pol IV behaviors. While these Pol IV behaviors may be relics of ancient TE blooms, their co- option into genome regulation has become functionally relevant to normal plant development (Parkinson et al. 2007, Erhard et al. 2009). Further characterization of these target loci and others will shed light on the functions of Pol IV complexes in gene function, may identify novel mechanisms of gene regulation and will improve our understanding of the interplay between RNAP diversity and plant genome composition. CHAPTER 5. PERSPECTIVES 156

5.6 References

1. Arteaga-Vazquez, M., Sidorenko, L., Rabanal, F. A., Shrivistava, R., Nobuta, K., Green, P. J., Meyers, B. C. & Chandler, V. L. (2010) RNA-mediated trans-communication can establish paramutation at the b1 locus in maize. Proceedings of the National Academy of Sciences 107, 12986–12991. 2. Barbour, J. R., Liao, I. T., Stonaker, J. L., Lim, J. P., Lee, C. C., Parkinson, S. E., Kermicle, J., Simon, S. A., Meyers, B. C., Williams-Carrier, R., Barkan, A. & Hollick, J. B. (2012) required to maintain repression2 is a novel protein that facilitates locus- specific paramutation in maize. The Plant Cell 24, 1761–1775. 3. Baucom, R. S., Estill, J. C., Chaparro, C., Upshaw, N., Jogi, A., Deragon, J.-M., Westerman, R. P., SanMiguel, P. J. & Bennetzen, J. L. (2009) Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genetics 5, e1000732. 4. Belele, C. L., Sidorenko, L., Stam, M., Bader, R., Arteaga-Vazquez, M. A. & Chandler, V. L. (2013) Specific tandem repeats are sufficient for paramutation-induced trans- generational silencing. PLoS Genetics 9, e1003773. 5. Brink, R. A. (1973) Paramutation. Annual Review Genetics 7, 129–152. 6. Cecere, G. & Grishok, A. (2014) A nuclear perspective on RNAi pathways in metazoans. Biochimica et Biophysica Acta 1839, 223–233. 7. Core, L. J., Waterfall, J. J., Gilchrist, D. A., Fargo, D. C., Kwak, H., Adelman, K. & Lis, J. T. (2012) Defining the Status of RNA Polymerase at Promoters. Cell Reports 2, 1025–1035. 8. Core, L. J., Waterfall, J. J. & Lis, J. T. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845– 1848. 9. Dorweiler, J. E., Carey, C. C., Kubo, K. M., Hollick, J. B., Kermicle, J. L. & Chandler, V. L. (2000) mediator of paramutation1 is required for establishment and maintenance of paramutation at multiple maize loci. The Plant Cell 12, 2101–2118. 10. Erhard, K. F. Jr. (2012) Alternative polymerase genome regulation in Zea mays. Uni- versity of California, Berkeley, CA PhD dissertation. 11. Erhard, K. F. Jr., Parkinson, S. E., Gross, S. M., Barbour, J. R., Lim, J. P. & Hollick, J. B. (2013) Maize RNA Polymerase IV defines trans-generational epigenetic variation. The Plant Cell 25, 808–819. 12. Erhard, K. F. Jr., Stonaker, J. L., Parkinson, S. E., Lim, J. P., Hale, C. J. & Hollick, J. B. (2009) RNA Polymerase IV functions in paramutation in Zea mays. Science 323, 1201–1205. CHAPTER 5. PERSPECTIVES 157

13. Erhard, K. F. Jr., Talbot, J. R. B., Deans, N. C., McClish, A. E. & Hollick, J. B. (2015) Nascent transcription affected by RNA polymerase IV in Zea mays. Genetics 199, 1107–1125. 14. Gent, J. I., Madzima, T. F., Bader, R., Kent, M. R., Zhang, X., Stam, M., McGinnis, K. M. & Dawe, R. K. (2014) Accessible DNA and relative depletion of H3K9me2 at maize loci undergoing RNA-directed DNA methylation. The Plant Cell. 15. Giacopelli, B. J. & Hollick, J. B. (2015) Trans-homolog interactions facilitat- ing paramutation in Zea mays. Plant Physiology epub ahead of print, doi: http://dx.doi.org/10.1104/pp.15.00591. 16. Haag, J. R., Brower-Toland, B., Krieger, E. K., Sidorenko, L., Nicora, C. D., Norbeck, A. D., Irsigler, A., LaRue, H., Brzeski, J., McGinnis, K., Ivashuta, S., Pasa-Tolic, L., Chandler, V. L. & Pikaard, C. S. (2014) Functional diversification of maize RNA poly- merase IV and V subtypes via alternative catalytic subunits. Cell Reports 9, 378–390. 17. Hale, C. J., Erhard, K. F. Jr., Lisch, D. & Hollick, J. B. (2009) Production and pro- cessing of siRNA precursor transcripts from the highly repetitive maize genome. PLoS Genetics 5, e1000598. 18. Hale, C. J., Stonaker, J. L., Gross, S. M. & Hollick, J. B. (2007) A novel Snf2 protein maintains trans-generational regulatory states established by paramutation in maize. PLoS Biology 5, e275. 19. Hollick, J. B. & Chandler, V. L. (2001) Genetic factors required to maintain repression of a paramutagenic maize pl1 allele. Genetics 157, 369–378. 20. Hollick, J. B., Kermicle, J. L. & Parkinson, S. E. (2005) Rmr6 maintains meiotic inheritance of paramutant states in Zea mays. Genetics 171, 725–740. 21. Huang, Y., Kendall, T., Forsythe, E. S., Dorantes-Acosta, A., Li, S., Caballero-Perez, J., Chen, X., Arteaga-Vazquez, M., Beilstein, M. A. & Mosher, R. A. (2015) Ancient origin and recent innovations of RNA polymerase IV and V. Molecular Biology and Evolution 32, 1788–1799. 22. Kruesi, W. S., Core, L. J., Waters, C. T., Lis, J. T. & Meyer, B. J. (2013) Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808. 23. Law, J. A., Vashisht, A. A., Wohlschlegel, J. A. & Jacobsen, S. E. (2011) SHH1, a homeodomain protein required for DNA methylation, as well as RDR2, RDM4, and chromatin remodeling factors, associate with RNA polymerase IV. PLoS Genetics 7, e1002195. 24. Luo, J. & Hall, B. D. (2007) A multistep process gave rise to RNA Polymerase IV of land plants. Journal of Molecular Evolution 64, 101–112. CHAPTER 5. PERSPECTIVES 158

25. Madzima, T. F., Mills, E. S., Gardiner, J. M. & McGinnis, K. M. (2011) Identification of epigenetic regulators of a transcriptionally silenced transgene in maize. G3 - Genes, Genomes, Genetics 1, 75–83. 26. Matzke, M. A., Kanno, T. & Matzke, A. J. (2015) RNA-directed DNA methylation: the evolution of a complex epigenetic pathway in flowering plants. Annual Review of Plant Biology 66, 9.1–9.25. 27. Matzke, M. A. & Mosher, R. A. (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics 15, 394–408. 28. McClintock, B. (1984) The significance of responses of the genome to challenge. Science 226, 792–801. 29. McClintock, B. (1951) Chromosome organization and genic expression. Cold Spring Harbor Symposia on Quantitative Biology 16, 13–47. 30. Moazed, D. (2009) Small RNAs in transcriptional gene silencing and genome defense. Nature 457, 413–420. 31. Parkinson, S. E., Gross, S. M. & Hollick, J. B. (2007) Maize sex determination and abaxial leaf fates are canalized by a factor that maintains repressed epigenetic states. Developmental Biology 308, 462–473. 32. Ream, T. S., Haag, J. R., Wierzbicki, A. T., Nicora, C. D., Norbeck, A. D., Zhu, J.-K., Hagen, G., Guilfoyle, T. J., Pasa-Tolic, L. & Pikaard, C. S. (2009) Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II. Molecular Cell 33, 192–203. 33. Schnable, P. S. et al. (2009) The B73 maize genome: complexity, diversity, and dynam- ics. Science 326, 1112–1115. 34. Sidorenko, L., Dorweiler, J. E., Cigan, A. M., Arteaga-Vazquez, M., Vyas, M., Kermicle, J., Jurcin, D., Brzeski, J., Cai, Y. & Chandler, V. L. (2009) A dominant mutation in mediator of paramutation2, one of three second-largest subunits of a plant-specific RNA polymerase, disrupts multiple siRNA silencing processes. PLoS Genetics 5, e1000725. 35. Stam, M., Belele, C., Dorweiler, J. E. & Chandler, V. L. (2002) Differential chromatin structure within a tandem array 100 kb upstream of the maize b1 locus is associated with paramutation. Genes and Development 16, 1906–1918. 36. Stam, M., Belele, C., Ramakrishna, W., Dorweiler, J. E., Bennetzen, J. L. & Chandler, V. L. (2002) The regulatory regions required for B ′ paramutation and expression are located far upstream of the maize b1 transcribed sequences. Genetics 162, 917–930. 37. Stonaker, J. L., Lim, J. P., Erhard, K. F. Jr. & Hollick, J. B. (2009) Diversity of Pol IV function is defined by mutations at the maize rmr7 Locus. PLoS Genetics 5, e1000706. 38. Wang, Q. & Dooner, H. K. (2006) Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proceedings of the National Academy of Sciences 103, 17644–17649. 159

Appendix A

Pedigrees of complementation crosses from Chapter 2

This appendix includes the pedigree figures from Chapter 2, reproduced to include family and pedigree (or ear) numbers. Family numbers traditionally follow the format of year- family_number or year-family_number-individual; in these figures the year is on the edges and only the family and, as needed individual, number is written out.

APPENDIX A. COMPLEMENTATION CROSS PEDIGREES 160 ½

Generation ear rmr7-1 Generation ½ear

³¼ ³¼ Ô Õ Ð ÑÒÓ M1

rmr7-1 lineages

³´

Ð Ó

Û Ü

21 ³´

rmr¾¿ À lineages

³´ ³´ ÝÒÞ Ó

3 ÝÒÞÓ 2

ÁÂÃ ÄÅÆÇÈ É Æ É ÊÆ ÉË

³ ³

µ µ

Ó Þ Ó

Ü Û

Æ ÉÊÆ Í

ComplementatioÌ 1233 1234

ÃÃ ÉÃ É Ê ÂÃ Ï Î

Î ed to complement

³ ³

µ µ

Æ ÉÊÆ Í ComplementatioÌ

alleles complemented

³³ ³ ³

Þ Ò ÞÓ Ð

ß Û

2 Üß 455

ÝÔÞÓ ³³

744 7 ³ ³

ÖÖ× ØÙÚ Ö Ö× ØØ

â

¶¶ ¶ ¶

Ñ ÕÓ Ô Ñ ÕÓ Ò Ó ÔÝ

Û Üß Û ß

268 11 ß Û 4

× × × Ö

ã ä

× ××à Ú×

× × Ø Ú Ù

×××Ù á

á â ¶¶

1188 ¶ ¶

× × Ø à

ã á

¶ ¶

· · Ñ ÕÓÒ

738 737 Û

× Ø×Øàà

¶ ¶

· ·

Ý

ßÛ

× Ø ÖÖ

ã ä

× ØÙ× Úà

¶ ¶

¸ ¸

Þ ÕÓ Ý ÝÓ ÕÞ

ß Ü Ü Û

16 Ü 8

× × Ø

ã ã ä

¶ ¶

¸ ¸

ÕÐÒ Ý Ó

Ü Û Ü

ß 2

× Ù ÖÙ Ø

ã

× à × ØÖ

ã ¶¹

rmr7-5 ¶¹

¶¹ ¶¹ Õ

ß ß M1

× Ùà ×

ää

¶ ¶

º º

× Ö à

ã ã á

¶ ¶

º º

¶» ¶»

¶» ¶»

¶¼ ¶¼

Ô Ð Ó Ô Ð Ó Ô Ð ÓÐ Ô Ð ÓÐ

Ü Ü Ü

4 1184 257 257 Ü

× Ø ÙÖ

á á

× Ø Ø

á ä â

× Ø Ø

á ä â

¶¼ ¶¼ Þ

ÜÛß 1513 1513

× à ØØÙ

á ¶´

247 312 ¶´

× ×à Ú×

ä ¶´

1163 ¶´

× ØÖÖ

äã

¶ ¶ µ

65 438 µ

× × à

á âá

× Ú× ÖØ

ã ¶ ¶

µ µ

¶³ ¶ ³

Ó

ß ß Ü

× Ö××Ö

ä

¶³ ¶ ³

Ô ÕÓ Ô ÕÓ Õ Õ Ô Ó Õ

Û Û Û Ü Ü

556 611 184 1 184 2 Û

× Ö×Ù

á ä

× Ö×à Ø×

×Ö Ø Ö

á ã

¶³ ¶ ³

Þ Õ

ß ×Ö×Ù

853 856 á â

× Ö × Ù

ã ä ×Ö × Ú

ã ã × Ö Ø

ã âã

¶ ¶

· ·

Ð Ô Ó Ð Ô ÝÓ Ð Ð Õ ÐÔÓ Ð Ô ÝÓ Ü

435 436 17 6 326 Ü 17 6 331 466

Ø×Ø ÙÖ

á

¶ ¶

· ·

å å åæååå

Ø×ØÖÙ

á

Ø××Ù×

á Ø××Ù×

11 33 á 11

ØØ×Øà

á

Ð Ð Ó Ô Ò ÝÓ Ô Ò ÝÓ ÐÒÒ Ó ÔÒ ÝÓ Ð ÒÒÓ Ð Ð Ó Ô Ò ÝÓ ß

11 1 3 2 15 712 753 2 15 712 ß 1 3 753 11

ØØ× àÖ

â

ØØ× à Ö

â

ØØÙÖ

äâ

ØØÙÖ

ä â

Ý Ý Ý Ý

ßÛ ß Ü ß Ü 12 ßÛ 12

Figure A.1: (Continued on next page.) APPENDIX A. COMPLEMENTATION CROSS PEDIGREES 161

Figure A.1: Pedigree of rmr7-1 and rmr7-5 lineages used in complementation crosses. The pedigree includes the lineages from the original M1 families to those complementation crosses that were evaluated as either complementing (yellow circles) or failing to complement (purple circles) (Table 2.3). Each circle represents a family and each connecting line a cross with self-cross, sibling cross and outcross represented as a single connecting line, a diamond connection, or two distinct input lines. Family circles are color-coded as marked in the legend. Female parents are always on the left of each two-parent cross. Each complementation cross is shown twice, once for each rmr7-1 and rmr7-5 lineage, with the family index numbers listed. Family numbers are given within the circles, where a particular individual was used the individual number follows the family number (family-individual). Ear or progeny numbers are given in colored rectangles along the lines that represent crosses.

APPENDIX A. COMPLEMENTATION CROSS PEDIGREES 162

2 8 §§

J  J 

Earlþ rmr2-1 lineage

¨ § ¨ 8 ¨©

rmr2-1 M1

òóô õ ö÷ø ùôó ú û üøýþ

¨© § ¨© ¦ ¦

5 rmr2-1 M2

ø ý ùþ ö ù ö ù öó

ÿ W ¡ ¢

¨© 2 1 ¦ © ¨© 2¦¦

rmr2-1 lineage that complements

rmr7 alleles

öûõ ö ù e û ù £ ûøý ùô ¤ô ü

¢ ¡

ö÷ø ùôó ú û üøýþ òóô õ rmr2-1 lin lement

rmr7 alleles ñ Generation ear b1 rmr2-1 Generation ñear

172

ç è

2 1 2 1 2 1

172 418 172 172 418 247 172 267 B-weak rmr2-1 ç è

ç è ç è

¦ 2 ¨ ¦ 2 ¨

1257 5 1257 5

ç ç

é é

2 1¦ ¦ 2 1 8 © 8 ¥ 8¨§ 8 2

8 4 8 542 533

ç ç

é é

¦ § 2 ¦ §2 ¦ ¥© ¦ §

1 6 1 4 1 6 1 16

§ ¨ ©

2 ç ç

448 B-I rmr2-1 64 II çç

ç ç çç

8¥¦ §



 

ê ê êê

§ 12 2 ¨©

ê ê êê





ê ë ê ë

© ©© ¦

541



ê ë ê ë

¨ 2 ¨2¨



0



 

ê ì ê ì

¨1 8©8 2 ¦ 1 ¥ ¦ 1 2



  

ê ì ê ì

¦ 2 § §





18 00

0

ê í ê í

§ ¨ ¥ 2 ¦

372 373

0 



0 0 ê í

1338 ê í

0 

ê î ê î

¦ 1 2 ¦ ¨¦ © 1¥ ¦ 21¥ ¦ 21 §

5

ê î ê î



ê ï ê ï



 

 0 

ê ï ê ï

ê ð ê ð

458 ¨1

 

  ê ð 1323 ê ð

b1 rmr2-1 00 

ê è ê è

¦ ¥ 8¥ 2© 1 2 ¨ ¨

1 83

 

ê è ê è

  

 

ê ê

é é

65 532 428 427 2 ¨



ê ê   

é é

ê ç êç

¨§ 2





 



 

ê ç êç

© 1¥ ¦

556 1



00

ê ç êç

  0

848 

 

 ëê

326432 431 432 147 451 ëê







ëê ëê 1

11 31 1 11





11 431 531 8©¨ 11

Figure A.2: (Continued on next page.) APPENDIX A. COMPLEMENTATION CROSS PEDIGREES 163

Figure A.2: Pedigree of rmr2-1 lineages used in complementation crosses with rmr7 alleles. All rmr2-1 lineages that resulted in complementation crosses with rmr7 alleles trace back to an early cross of individual 96-207-16 (B1-I Rmr2 / b1 rmr2-1 ) by individual 96-211-7 (b1 rmr2-1 / b1 rmr2-1 ). For clarity plants resulting from that cross (peach circles) are shown multiple times though they all originate from the single parental cross. The remainder of the pedigree highlights rmr2-1 lineages that complement rmr7 alleles (green) and those that fail to complement (teal). Asterisks mark the approximate time point of recombination events that separated the original b1 and rmr2-1 loci. Families that were crossed to rmr7 alleles for complementation tests have their family number designations printed beside them. The pedigree on the far left does not fully connect due to a gap in the 2007 records; however the linked b1 allele supports that this sub-lineage is most likely related to the other b1 rmr2-1 lineage. Connector lines and orientation of the family circles follow that of Figure A.1. Family numbers are given within the circles, where a particular individual was used the individual number follows the family number (family-individual). Ear or progeny numbers are given in colored rectangles along the lines that represent crosses. 164

Appendix B

Scripts used in small RNA analysis

B.1 Awk commands for processing SAM alignment files

The awk and bash commands below outline a quick protocol to reduce the alignment file size to a region of interest and deconvolute the individual libraries into their own alignment files. This code is meant for example purposes only and its function is not guaranteed on source files other than those for which it was created.

# define the boundaries of the region of interest chromosome="Rhoades_200kb" start=132000 end=146000

# extract only alignments from that region to a new SAM file awk -v chromosome=$chromosome -v start=$start -v end=$end ’BEGIN{FS = "\t"; OFS = "\t"} {if (substr($0,1,1) == "@") {print $0} else {if ($3 == chromosome && $4 >= start && $4 <= end) {print $0}}}’ \ Rhoades_hits.sam > pentarepeat_hits.sam

# extract library specific abundances from the read name # read name format (comma-delimited columns): # Column 1 Sequence tag # 2 Mapping frequency to B73 genome # ---Raw abundance in library--- # 3 B73 parent 1 # 4 B73 parent 2 # 5 Pl1-Rh ^ B73 parent 1 # 6 Pl1-Rh ^ B73 parent 2 # 7 Pl1-Rh ^ B73 x B73 hybrid 1 # 8 Pl1-Rh ^ B73 x B73 hybrid 2 APPENDIX B. SMALL RNA ANALYSIS SCRIPTS 165

# 9 rmr1-1 het 1 (skipped) # 10 rmr1-1 het 2 (skipped) # 11 rmr1-1 Pl1-Rh ^ B73 x B73 hybrid 1 # 12 rmr1-1 Pl1-Rh ^ B73 x B73 hybrid 2

# strand 0 = sense strand # strand 16 = antisense strand for library in 3 4 5 6 7 8 11 12; do for strand in 0 16; do awk -v l=${library} -v s=${strand} ’BEGIN{FS = "\t"; OFS = "\t"} {if (substr($0,1,1) == "@") {print $0} else {if ($2 == s) { split($1, libraries, ","); abundance = libraries[l]; while (abundance > 0) { print $0; abundance -= 1}}}}’ pentarepeat_hits.sam \ > library-${library}_strand-${strand}_pentarepeat_hits.sam done done

# extracting only 24mers (if desired) for file in library*sam; do awk ’BEGIN{FS = "\t"; OFS = "\t"} {if (substr($0,1,1) == "@") {print $0} else {split($1, libraries, ","); if (length(libraries[1]) == 24) {print $0}}}’ $file > 24mers_${file} done

# convert to indexed BAM files for importing into IGV for file in 24mers*; do samtools view -Sb ${file} | samtools sort - ${file} samtools index ${file}.bam done APPENDIX B. SMALL RNA ANALYSIS SCRIPTS 166

B.2 Python function to add total alignment count to SAM file

Below is the function to parse through a raw Bowtie[1] SAM output and append NH:i:## tags tallying the total alignments of the read to the reference sequence. Because this method relies on all alignments for a read being in a continuous block, you will get in- accurate results if the order of the SAM file is altered (for instance by sorting) prior to running this script. If updated, this script can be found at https://github.com/Joy- El/SequenceTools/blob/master/parse_bowtie_output.py

__author__ = ’Joy-El R.B. Talbot’ """Functions to parse Bowtie output for collapse and summary. Copyright (C) 2014 Joy-El R.B. Talbot This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program (LICENSE). If not, see For details on Bowtie please see their website: http://bowtie-bio.sourceforge.net/index.shtml"""

import sys

def read_chunk(open_file_object, chunk_size=1048): """Read in file by chunk_size chunks returning one line at a time.""" # get first chunk chunk = open_file_object.read(chunk_size) # continue looping until a chunk is just EOF (empty line) while chunk: chunk_list = chunk.split("\n") # yield all but last, potentially incomplete line for c in chunk_list[:-1]: yield c # add incomplete line to beginning of next chunk read chunk = chunk_list[-1] + open_file_object.read(chunk_size)

def add_multimapping_tally(open_bowtie_file, chunk_size=2048): """Count the number of mappings of a read tag. ASSUMES: file is sorted by read tag name APPENDIX B. SMALL RNA ANALYSIS SCRIPTS 167

Either run on unmodified bowtie output or re-sort on the first column of the SAM formatted output: To preserve the header lines: samtools view -SH bowtiefile.sam > sortedbowtiefile.sam To add sorted alignment lines: samtools view -S bowtiefile.sam | sort -k1,1 >> sortedbowtiefile.sam Note that sorting may take a lot of resource to do, that’s why it is best to add multimapping tally BEFORE any other operations are done to the Bowtie output.""" master_read = "" mapping_count = 0 saved_reads = [] for alignment in read_chunk(open_bowtie_file, chunk_size): columns = alignment.split("\t") if columns[0] == master_read: mapping_count += 1 saved_reads.append(alignment) else: if master_read != "": for read in saved_reads: yield "{}\tNH:i:{}".format(read, mapping_count) master_read = columns[0] saved_reads = [alignment] if (int(columns[1]) & 0x4) == 0x4: # unmapped read according to SAM 0x4 flag in second column mapping_count = 0 else: # mapped read mapping_count = 1