Computational and experimental analysis of plant

by

Matthew W. Jones-Rhoades

B.A., Chemistry (1999) Grinnell College

Submitted to the Department of in Partial Fulfillment of the Requirement for the Degree of in Biology

MASSACHUSETTSINSTATE at the OF TECHNOLOGY

Massachusetts Institute of Technology MAY 2 7 2005

June 2005 LIBRARIES © 2005 Massachusetts Institute of Technology All rights reserved

Signatureof Author...... Department of Biology x.. '"---.~ May 20, 2005

Certified by ...... · ...... David P. Bartel Professor of Biology Thesis Supervisor

Acceptedby ...... Stephen P. Bell Professor of Biology Chairman, Graduate Student Committee

,' tti v ;S Computational and experimental analysis of plant microRNAs

Matthew W. Jones-Rhoades

Submitted to the Department of Biology on May 20, 2005 in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy in Biology

ABSTRACT

MicroRNAs (miRNAs) are small, endogenous, non-coding RNAs that mediate gene regulation in plants and animals.

We demonstrated that Arabidopsis thaliana miRNAs are highly complementary (0-3 mispairs in an ungapped alignment) to more mRNAs than would be expected by chance. These mRNAs are therefore putative regulatory targets of their complementary miRNAs. Many miRNA complementary sites are conserved to the monocot Oryza sativa (rice), implying evolutionary conservation based on function at the nucleotide level. The majority of predicted miRNA targets encode for transcription factors and other proteins with known or inferred roles in developmental patterning, implying that the miRNAs themselves are high-level regulators of development. Our findings indicated that miRNAs are key components of numerous regulatory circuits in plants and set the stage for numerous additional experiments to investigate in depth the significance of miRNA-mediated regulation for particular target families and genes.

We developed a comparative genomics approach to identify miRNAs and miRNA targets conserved between Arabidopsis and Oryza. Seven previously unknown miRNAs families were experimentally verified, bringing the total number of known miRNA genes in Arabidopsis to 92, representing 22 families. We expanded the range of functionalities known to be regulated by miRNAs to include F-box proteins, laccases, superoxide dismutases, and ATP-sulfurylases. The expression of miR395, which targets sulfate metabolizing enzymes, is induced by sulfate- starvation, demonstrating that miRNA expression can be responsive to growth conditions.

We investigated the biological role of miR394-mediated regulation of Atlg27340, an F-box gene of previously unknown function. Transgenic plants expressing a miR394-resistant version of Atlg27340 displayed a range of developmental abnormalities, including radialized and fused cotyledons, absent shoot apical meristems, curled and radialized leaves, and abortive flowers. The severity of these abnormalities correlated with the overaccumulation of Atlg27340 mRNA. These findings confirm the biological relevance of the interaction between miR394 and Atlg27340, and represent the first insights into the roles of miRNA-mediated regulation of F-box genes. Our results establish that both MIR394 and Atlg27340 are important regulators of meristem identity, and suggest that Atlg27340 targets an activator of class III HD-ZIP function for ubiquitination and proteolysis.

Thesis Supervisor: David Bartel Title: Professor of Biology

2 Acknowledgements

I would like to thank David Bartel for being a tremendous advisor, role model, and friend throughout my time at MIT. I would like to thank all the past and present members of the Bartel lab, especially Mike Axtell, Scott Baskerville, Nelson Lau, Ben Lewis, Lee Lim, Allison

Mallory, Ramya Rajagopalan, Brenda Reinhart, I-hung Shih, Herv6 Vaucheret, and Soraya

Yekta. You have been a joy to work and play with, and a continual source of reagents, help, and advice. I would also like to thank Bonnie Bartel, without whom the analysis of plant miRNAs would have been ten times harder. I would like to thank my family, especially my parents, for always believing in me and supporting me.

Most importantly, I would like to thank my wife Melinda for being my partner and my best friend. Thanks for putting up with all the late nights and odd-hours; you were always there to encourage me when things went well and to give me perspective when they didn't.

3 Table of contents

Abstract 2 Acknowledgements 3 Table of contents 4 Introduction 5 Chapter I 42 Prediction of plant microRNA targets ChapterI has been publishedpreviously as: Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, and Bartel DP, "Predictionof plant microRNAtargets. " Cell 110(4):513-520 (2002) © CellPress.

Chapter II 65 Computational identification of plant microRNAs and their targets, including a stress-induced miRNA

ChapterII has been publishedpreviously as: Jones-RhoadesMW, and BartelDP, "Computationalidentification of plant microRNAsand their targets,including a stress-inducedmiRNA. " MolecularCell 14(6): 787-799 (2004) © Cell Press.

Chapter III 101 MicroRNA-mediated regulation of an F-box gene is required for embryonic, floral, and vegetative development

Appendix A 120 Arabidopsis, Oryza, and Populus miRNA complementary sites Appendix B 128 Appendix B has been published previously as: ReinhartBJ, WeinsteinEG, RhoadesMW, BartelB, and BartelDP, "MicroRNAs in plants. "Genes & Development 16(13): 1616-1626 © Cold Spring HarborLaboratory Press.

Appendix C 139 Appendix C has been published previously as: Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, and Burge CB, "Prediction of mammalian microRNA targets. " Cell 115(7): 787-798 (2003) © Cell Press.

4 Introduction The biology of multicellular organisms requires a complex network of gene regulatory pathways. MicroRNAs (miRNAs) are key components of this network which had been overlooked until recently. Initially discovered as regulators of developmental timing in C. elegans, miRNAs are now known to serve in a variety of regulatory roles in both plants and animals. The hallmark of a miRNA is a short (-20-24 nt), endogenously expressed non-coding RNA which is processed by RNaseIII proteins such as Dicer from a longer ssRNA precursor that contains a stem-loop secondary structure (reviewed in (11)). MicroRNAs are chemically and functionally similar to short interfering RNAs (siRNAs), which are processed by Dicer from long dsRNA precursor and which are central to the related phenomenon of RNA interference (RNAi), post-transcriptional gene silencing (PTGS), and transcriptional gene silencing (TGS). Mature miRNAs are incorporated into RNAi-induced silencing complexes (RISCs), in which the miRNA guides repression of target genes. Although miRNAs are deeply conserved with both the plant and animal kingdoms, there are substantial differences in the mechanism and scope of miRNA-mediated gene regulation between the two kingdoms, several of which have been instrumental in the rapid increase in our understanding of plant miRNA biology. Plant miRNAs are highly complementary to conserved target mRNAs, a fact which has allowed for the rapid and confident bioinformatic identification of plant miRNA targets (46, 113). Plant miRNAs guide the cleavage of their complementary mRNA targets, an activity which is readily assayed in vitro and in vivo (49, 76, 126). In addition, Arabidopsis is a genetically tractable model organism, which has enabled the study of the genetic pathways which underlie miRNA-mediated regulation and the phenotypic consequences of perturbing miRNA-mediated gene regulation. The picture emerging from this recent research is that plant miRNAs are master regulators of genetic pathways: the majority of genes regulated by plant miRNAs are themselves regulators such as transcription factors, F-box proteins, and RNAi related proteins. MicroRNAs: like siRNAs, but different Before discussing miRNAs, it is useful to consider a highly similar class of small RNAs, the small interfering RNAs (siRNAs). In Arabidopsis, siRNAs are the majority of small RNAs (75, 112, 124, 138), and have been implicated in a variety of pathways, including defense against

5 viruses, the establishment of heterochromatin, silencing of transposons and transgenes, and the post-transcriptional regulation of mRNAs (reviewed in (13)). MicroRNAs and siRNAs have much in common; both types of small RNAs are 20-24 nucleotides long, and both are processed from longer RNA precursors by Dicer ribonucleases (15, 38, 43, 50). Both are incorporated into ribonucleoprotein (RNP) complexes in which the small RNAs, through their base pairing potential, guide repression of target genes, and, as discussed below, the mechanisms by which they repress target genes are also similar. The fundamental difference between the two classes, then, is nature of their precursors; siRNAs are processed from long, dsRNAs, whereas miRNAs are processed from RNAs that are single-stranded but contain imperfect stem-loop secondary structures. In addition, there are a number of general, if not absolute, characteristics that set miRNAs apart from siRNAs. Many miRNAs are conserved between related organisms, whereas most endogenously expressed siRNAs are not(57, 61, 63, 112). Many (but not all) siRNAs target the gene from which they are derived. In contrast, a miRNA regulates genes unrelated to the locus from which the miRNA was derived. In addition, although the proteins required for siRNA and miRNA biogenesis are overlapping, in many organisms, including Arabidopsis, the genetic requirements for miRNA and siRNA function are partially distinct. For example, many Arabidopsis siRNAs require RNA-dependent RNA polymerases (RDRPs) for their biogenesis, whereas miRNAs do not (14, 27, 95, 138). Conversely, most Arabidopsis miRNAs require processing by DICER-LIKE1 (DCL1), one of four dicer-like genes in Arabidopsis, whereas many siRNAs require DICER- LIKE3 (DCL3) (35, 56, 112, 138). MicroRNA Biogenesis Like other types of cellular RNAs, miRNAs must be properly processed and localized in order to function. The steps through which a plant miRNA must pass include 1) transcription, 2) processing into a miRNA/miRNA* duplex, 3) covalent modification, 4) export from the nucleus, and 5) selective incorporation of the miRNA into RISC (Figure 1). Transcription of microRNAs In most cases, miRNAs have been initially discovered as the mature, 20-24 nucleotide form. Presumably, these mature miRNAs are initially transcribed as part of longer transcripts that must minimally include enough additional sequence to generate the stem-loop structures (typically -60-300 nucleotides in plants) that are recognized by Dicer. In several cases, miRNA

6 stem-loops have been shown to be contained within much longer transcripts, termed primary miRNAs (pri-miRNAs). The overexpression of 0.5 kb and 1.4 kb transcripts that contain miR319 and miR172 stem-loops, respectively, correlate with overaccumulation of the mature miRNA (7, 100). miR163 is contained within a 0.7 kb transcript that can be processed into mature miR163 (56). In addition, numerous miRNA precursors are found within ESTs from various plant species that contain additional sequence outside of the stem-loop (7, 46, 100). At least some of these longer pri-miRNA transcripts are spliced and appear to be poly-adenylated (7, 56). Indeed, two rice miRNAs are contained within transcripts that contain exon junctions within the presumptive stem-loop precursor, implying that in these cases splicing is a necessary prerequisite for recognition by Dicer (123). Because plant miRNAs are primarily found in genomic regions not associated with protein coding genes (112), it appears that most miRNA genes are their own transcriptional units. The fact that plant pri-miRNAs can be over 1 kb long, along with the fact that they can undergo canonical splicing and polyadenylation, strongly suggests that RNA polymerase II is responsible for transcribing most plant miRNAs, as has been shown to be the case for several animal miRNAs (66). Relatively little is known about the promoters of plant miRNAs or the regulation of miRNA transcription in plants. MicroRNA processing and export A central step in the maturation of miRNAs is the excision of the mature miRNA from the pri-miRNA by RNaseIII-type endonucleases such as Dicer. Although the observed sizes of Dicer products in plants range from around 20-25 nucleotides (126, 138), the plant miRNAs are primarily 20-21 nt in length (112). In contrast, 24mers are most abundant in the population of siRNAs cloned from Arabidopsis (126, 138). It has been suggested that different Dicer activities are responsible for the different sizes of small RNAs observed in Arabidopsis (126). This idea fits with genetic data, which suggests that the four Dicer-like genes in Arabidopsis have functionally distinct roles. DICER-LIKE3 (DCL3) and DICER-LIKE2 (DCL2) process certain endogenous siRNAs and viral derived siRNAs, respectively, but each is dispensable for miRNA accumulation (138). In contrast, partial loss-of-function alleles of DICER-LIKEI (DCLI) result in reduced accumulation of miRNAs and trans-acting siRNAs, without any obvious effect on the accumulation or function of various other classes of siRNAs (35, 112, 138) (131).

7 In animals, miRNAs are processed in a stepwise manner. A nuclear localized RNaseIII enzyme known as Drosha makes the initial cuts (one on each arm of the stem-loop) in the pri- miRNAs to liberate the miRNA stem-loop, the "pre-miRNA" from the flanking sequence of the pri-miRNA(65). After export to the cytoplasm, Dicer makes a second set of cuts, separating the miRNA, duplexed with its near reverse complement, the miRNA*, from the loop region of the pre-miRNA (65). The resulting miRNA/miRNA* duplex has two-nucleotide 3' overhangs, similar to the siRNA duplexes produced by Dicer from long double-stranded RNA (15, 31, 32, 55). The situation in plants appears to be somewhat different, as plants contain no clear ortholog to Drosha. Whereas most animal Dicers are thought to be localized to the cytoplasm (65), in Arabidopsis DCL1 is localized to the nucleus, and miRNA/miRNA* duplexes are excised from the pri-miRNAs within the nucleus (101, 103, 138). RNAs corresponding to the pre-miRNAs of animals are rarely, if ever, detected in plants (112), and it seems likely that both sets of cleavage events happen in rapid succession. It is uncertain if DCL1 makes both sets of cuts, or if additional nucleases are also involved. One key difference between the biogenesis of siRNAs and miRNA, other than the nature of the precursor, is the number of small RNA species produced per precursor. A single long, double-stranded RNA precursor can be processed into multiple siRNA duplexes by Dicer (Figure ib) (106, 131, 144). However, cloning and expression data show that a miRNA precursor produces predominately a single small RNA species, the mature miRNA (57, 61, 64, 71, 112, 1.24). Although there is some heterogeneity at the 5' and 3' ends of plant miRNAs, it is clear that DCL1 cuts preferentially at specific positions in the miRNA stem-loop precursor that result in the accumulation the appropriate mature miRNA (112). The mechanism by which DCL1 knows where to cut is largely a mystery, although there is evidence for the involvement of dsRNA- binding domain of DCL1. The dcll-9 allele, which disrupts the dsRNA-binding domain, cuts the miR163 stem-loop at aberrant positions (56). In addition to DCL1, several other genes have been shown genetically to be involved in miRNA biogenesis. Mutations in HYPONASTIC LEAVES1 (HYL1) or HUA ENHANCER1 (HEN1) result in reduced miRNA accumulation and function (18, 41, 104, 130, 138). HYL1 contains a NLS and a dsRNA binding domain, and has some homology to R2D2 in Drosophila and RDE-4 in C. elegans, proteins that are thought to function together with Dicer to load

8 siRNAs into the RISC (74, 125). HEN1 contains a methyltransferase domain, and is capable of methylating miRNA/miRNA* duplexes in vitro (143). Endogenous miRNAs are methylated on either the 2' or 3' ribose hydroxyl group of the 3' nucleotide in wild-type plants, but not in henl mutants (143). The function of this miRNA methylation is a mystery, and it remains possible that HEN1 may have additional activities that are important for miRNA biogenesis. In plants, it appears that most, if not all, processing and modification of miRNAs takes place in the nucleus. However, the majority of mature miRNAs are located in the cytoplasm (103), suggesting that a pathway exists for miRNA export. One component of this pathway is HASTY (HST), a member of the importin P family of nucleocytoplasmic transporters. hst mutants have reduced accumulation of most, but not all miRNAs, suggesting that HST is an important part of the miRNA export pathway, but that other components also exist (103). A similar pathway exists in animals; Exportin-5, the mammalian ortholog of HST, exports pre- miRNA hairpins from the nucleus to the cytoplasm (78, 142). As pre-miRNAs appear to be very short-lived in plants, it is likely that HST transports either miRNA/miRNA* duplexes or single- stranded miRNAs after they are fully excised by DCL1. Northern blot data suggests that miRNAs are primarily single-stranded in the nucleus (103), suggesting either that a fraction of functional miRNAs are located within the nucleus or that miRNAs are aleady single standed before transported to the cytoplasm by HST. It is unknown whether plant miRNAs are already associated with components of RISC when transported to the cytoplasm, or if loading into RISC takes place after transport. MicroRNA incorporation into RISC MicroRNAs are processed from their pri-miRNA precursors as duplexes with their miRNA* sequences. However, cloning and expression data indicate that the miRNA strand of this duplex accumulates at much higher levels in vivo than does the miRNA* (71, 112). This asymmetry of accumulation is achieved by the preferential loading of the miRNA strand into RISC, where it is presumably protected from degradation, whereas the miRNA* strand is preferentially excluded from RISC and consequentially subject to degradation. The key insight into understanding this asymmetry of RISC loading came from bioinformatic and biochemical studies of functional siRNA duplexes: the strand of siRNA duplex with less energetically strong pairing at its 5' end is selectively loaded into RISC, where it is competent to guide silencing, while the strand with the less stable 5' end is excluded from RISC (51, 116). Most

9 miRNA/miRNA* duplexes appear to have this energetic asymmetry; the 5' ends of most miRNAs are less stably paired than are the 5' ends of the corresponding miRNA*s (51, 116). The exact mechanism by which siRNA and miRNA duplexes are unwound and asymmetrically incorporated into RISC are still only partially understood, but they appear to involve R2D2-like proteins and perhaps an unidentified RNA helicase (reviewed in (128)). The final product of the miRNA/siRNA biogenesis pathway is a single-stranded RNA incorporated into a RNP complex. There are several varieties of these RNP complexes that vary at least partially in their composition and function; a RNP that mediates RNA cleavage and PTGS is generally referred to as a RISC, whereas a RNP that mediates chromatin modification and TGS is referred to as a RNAi-induced transcriptional silencing (RITS) complex. A central component of all these RNPs is a member of the Argonaute family of proteins. Argonaute proteins, which have been implicated in a broad range of RNAi-related mechanisms, contain two conserved domains, known as the PAZ and PIWI domains (21). The PAZ domain appears to be an RNA-binding domain (72, 118, 140), and the PIWI domain has structural similarity to RNase H enzymes (73, 119). Many organisms contain multiple members of the Argonaute family; in some of these cases, there is evidence for functional specificity of the different Argonautes. For example, only one of four mammalian Argonautes, Ago2, is capable of mediating RNA cleavage (73). Arabidopsis contains ten Argonaute proteins, four of which have been investigated experimentally. AGO4 is involved in the methylation of DNA associated with transposons and inverted-repeat transgenes (146, 147). PNH/ZLL/AGO1Oand ZIP/AGO7 are required for proper development, but the mechanism by which they act is not known (42, 79, 96, 97). Only one Argonuate gene, AGOI, has thus far been shown to be required for miRNA function in Arabidopsis. agol mutants have elevated levels of miRNA targets, consistent with AGO1 being needed for miRNA function (129). A null allele of AGO1 also shows a sharp decrease in accumulation of most miRNAs compared to wild-type (129). Although this reduction in miRNA levels may stem from AGO1 playing an early role in miRNA processing, it may also be due the loss of the RISC complexes needed to bind, and thus stabilize, the processed miRNAs. Mechanisms of miRNA function There are three basic mechanisms by which Dicer-produced small RNAs have been shown to regulate gene expression: RNA cleavage, translational repression, and transcriptional silencing.

10 MicroRNA-mediated RNA cleavage Directed RNA cleavage is perhaps the best studied mechanism by which small RNAs regulate gene expression. In this mechanism, an siRNA/miRNA guides RISC to cleave a single phosphodiester bond within a complementary RNA molecule. This so called "slicer" activity is thought to reside in the PIWI domains of certain Argonaute proteins (73, 119). Several lines of evidence indicate that plant miRNAs act to guide the cleavage of complementary mRNAs. MicroRNA-guided slicer activity is present in wheat germ lysate (126). MicroRNA targets are generally expressed at higher levels in plants that have impaired miRNA function as the result of either mutations in the miRNA pathway (e.g. henl, agol, and hyll) (18, 129, 130) or the expression of certain viral suppressors of RNA silencing (22, 24, 30, 49, 82), implying that miRNAs negatively regulate the stability of their targets. Moreover, the 3' cleavage products of many miRNA targets can be detected in vivo, either by Northern blot (49, 76, 80, 120) or by 5' RACE (46, 49, 76, 80, 81, 83, 100, 123, 139). MicroRNA-mediated translational repression The first miRNAs to be identified, the lin-4 and let-7 RNAs, regulate the expression of heterochronic genes that are critical for the timing of certain cell divisions during larval development of C. elegans (40, 64, 92, 111, 117, 136). However, the induction of these miRNAs at specified points in development does not greatly affect the mRNA levels of their targets, but rather the amount of protein produced from the targeted mRNAs (40, 117, 136). The exact mechanism by which this occurs is unclear, but it appears that functional translation of the targeted mRNAs is inhibited at some point after the initiation of translation (99). It is thought that this mode of target regulation is utilized by the majority of animal miRNAs. What determines whether a small RNA will guide the cleavage of its target, as opposed to directing its target for translational repression? To a certain extent, the outcome seems to depend on the degree of complementarity between the guide RNA and the target. An siRNA or miRNA that is perfectly complementary to a target RNA will generally lead to cleavage, whereas less perfect complementarity is generally associated with translational repression (28, 29). Indeed, the same small RNAs are capable of carrying out either mechanism. In mammalian cell culture, exogenous siRNAs which are competent to direct cleavage when presented with fully complementary targets can repress the translation of other targets which contain multiple, imperfectly complementary sites (28, 29). Conversely, the let-7 miRNA from Drosophila, which

11 is presumed to regulate its endogenous targets through translational repression, can guide cleavage of perfectly complementary RNAs in vitro (44). To a large extent, then, the tendency of plant miRNAs to cleave their targets is probably due to the fact that they are highly, sometimes perfectly, complementary to them, whereas few mRNAs have extensive complementarity to animal miRNAs. However, there are exceptions; miR-196 guides the cleavage of the highly complementary HoxB8 mRNA (84, 141). Furthermore, the expression of either miR-1 or miR-124 in HeLa cells slightly reduces the levels of over 100 mRNAs with complementarity to the 5' portion of the miRNA (70). It in unclear if these mRNAs are cleaved by RISC, albeit at a low efficiency, or if miRNA/RISC binding affects mRNA stability through some other mechanism. Conversely, one Arabidopsis miRNA, miR172, appears to effect the accumulation of target protein but not target mRNA, and thus appears to mediate translational repression (7, 25). Small RNA directed transcriptional silencing Sections of transcriptionally silent DNA, known as heterochromatic regions, are associated with certain covalent modifications of DNA and histones. Evidence from several organisms now shows that small RNAs are important for the establishment and/or maintenance of these heterochromatic modifications. In fission yeast, Dicer-produced small RNAs corresponding to heterochromatic repeats have been identified (110), and deletion of Dicer or Argonaute disrupts silencing at heterochromatic regions (133, 134). This transcriptional repression has been shown to involve the RITS complex, which, like the RISC, contains Argonaute and a single-stranded Dicer-produced siRNA, as well as Chpl and Tas3, which are not thought to be present in RISC (93, 98, 132). Small RNAs also guided repressive modifications of DNA and histones in plants (reviewed in (85)). For example, AGO4 is required for siRNA-guided transcriptional silencing of the SUPERMAN gene and the maintenance of transcriptional repression triggered by inverted repeats (146, 147). Do miRNAs guide transcriptional silencing in plants? Recent evidence suggests that they might (10). Dominant mutations within the miR166 complementary sites of the PHABULOSA (PHB) and PHAVOLUTA (PHV) mRNAs result in abnormal leaf development which correlates with a reduction in miR166-guided mRNA cleavage (83, 88). Curiously, these phb and phv mutants also correlate with a reduction of DNA methylation within the coding region of the mutant alleles (10). This reduction of methylation occurs only in cis; in heterozygous plants,

12 only the mutant copy of PHB is affected, whereas the wild-type copy is not (10). Because the miRNA complementary site in these mRNAs spans an exon-junction, miR166 is presumably not able to interact with the genomic DNA, which suggests that interaction between miR166 and the nascent, but spliced, PHB mRNA somehow results in DNA methylation (10). Although intriguing, the functional significance of this change in methylation is not yet clear. While methylated promoter regions are often associated with transcriptional silencing, the observed methylation in PHB and PHV is near the 3' end of the coding regions (10), and it is unknown what effect it is having on PHB or PHV transcription. It is not known if a reduction in miRNA complementarity generally correlates with a reduction in target gene methylation. Discovery of plant microRNAs Discovery of plant miRNAs: Cloning The most direct method of miRNA discovery has been to isolate and clone small cellular RNAs from biological samples. Quite a few groups have used this approach to identify small RNAs in animals, plants, and fungi (57, 61, 63, 71, 75, 89, 104, 110, 112, 123, 124) (58, 59, 94, 107-109, 122). Although the specifics of the protocols used by various groups differ in some details, all essentially involve the isolation of small RNAs, followed by ligation of adaptor oligos, reverse transcription, amplification, and sequencing. Some of these protocols incorporate methods to select for RNAs that are products of Dicer cleavage (i.e. that have a 5' phosphate and 3' hydroxyl) and to concatemerize the short cDNAs so that many can be analyzed in a single sequencing read (61). These cloning methods were first used to identify large numbers of miRNAs in animals (57, 61, 63). An initial round of cloning experiments in Arabidopsis identified nineteen miRNAs, as well as hundreds of endogenous siRNAs (75, 89, 104, 112). Subsequent cloning experiments have expanded our knowledge of both classes of small RNAs in Arabidopsis (124, 138), and more recently, Oryza sativa (rice) (123). The Carrington lab maintains an online database of small RNAs cloned from Arabidopsis (http://asrp.cgrb.oregonstate.edu/db/). Discovery of plant miRNAs: Forward genetics Given the abundance of miRNA genes in plants, and the mounting evidence that they are key regulators of developmental events, it is in some ways surprising that plant miRNAs were not discovered genetically long ago. Although it is something of a mystery as to why more miRNAs have not been identified in genetic screens, there are several notable examples where

13 they have. However, in plants at least, in none of these cases was it realized that miRNAs were involved until after cloning experiments had established that plant genomes contained numerous miRNAs. In a sense, the dominant mutations in the HD-ZIP genes PHB, PHV, and REVOLUTA (REV) in Arabidopsis and ROLLED LEAF1 (RLDI) in maize can be thought of as miRNA- related mutations; all result in adaxialization of leaves and/or vasculature as the result of mutations within miR166 complementary sites (33, 87, 88, 145). At least three miRNA genes, nziR319 (also known as miR-JAW), miR172 (also known as EAT), and miR166 were isolated as dominant overexpressors in enhancer trap screens for mutants with developmental abnormalities (7, 53, 100). To date, only a single loss of function allele at a miRNA gene has been identified genetically in plants; early extra petalsl is caused by the insertion of a transposon 160 b.p. upstream of miR164c, and results in flowers with extra petals (9). The fact that miRNA loss-of- function mutants have been recovered so rarely is perhaps due to redundancy; most miRNAs exist in multigene families that are likely to have overlapping function, buffering against a loss of function at any single miRNA gene. Discovery of plant miRNAs: Bioinformatics In both plants and animals, cloning has been the initial means of large-scale miRNA discovery. However, cloning is biased towards RNAs that are highly and broadly expressed. MicroRNAs that are expressed at low levels, or that are expressed only in specific cell types or in response to certain environmental stimuli, will be relatively difficult to clone. Any sequence specific biases in the cloning procedure might also cause certain miRNAs to be missed. Because of these limitations, bioinformatic approaches to identify miRNAs have been useful as a complement to cloning. A relatively straightforward use of bioinformatics has been to find homologs of cloned miRNAs, both within the same genome and in the genomes of other species (57, 61, 63, 105). A more difficult challenge is to identify miRNAs unrelated to previously known miRNAs. This was first done for animal miRNAs, using algorithms that search for conservation of sequence and secondary structure (i.e. miRNA stem-loop precursors) between animal species in patterns that are characteristic of miRNAs (6, 37, 60, 69, 71). Although these methods succeeded in identifying numerous potential animal miRNAs, many of which were subsequently confirmed experimentally, they are not directly useful in finding plant miRNAs because of the longer and more heterogeneous secondary structures of plant miRNA stem-loops.

14 To address this problem, several groups have devised bioinformatic approaches specific to the identification of plant miRNAs (2, 17, 46, 135). Like the algorithms for the identification of animal miRNAs, these approaches all use conservation of secondary structure as a filter, but are necessarily more relaxed in terms of the allowed structures. Some of these approaches take advantage of the high complementarity of plant miRNAs to target mRNAs; searching for conserved stem-loops with conserved complementarity to mRNAs not only helps to distinguish authentic miRNAs from false positives, but also identifies putative regulatory targets of the predicted miRNAs (2, 46). Genomics of plant microRNAs Taken in aggregate, cloning, genetics, and bioinformatics have identified 114 potential miRNA genes in Arabidopsis (Table 1, Table 2). These 114 miRNA loci can be grouped into 41 multigene families, with each family comprised of stem-loops with the potential to produce identical or highly similar mature miRNAs. 21 families are clearly conserved to additional plant species beyond Arabidopsis (Table 1), whereas for 20 families conservation outside of Arabidopsis has not been observed or is uncertain (Table 2). The following discussion will focus primarily on evolutionarily conserved families, as these generally have more reliable evidence for their expression and regulation of target genes. Expression of plant microRNAs Some miRNAs are among the most abundant cellular RNAs in animals, with individual miRNAs having up to 10,000-50,000 copies per cell (71). Although the expression levels of plant miRNAs have not been quantified, it is clear that many of them are abundantly expressed. Certain miRNAs have been cloned hundreds of times, and most miRNAs are readily detectable by Northern blot (3, 112)(http://asrp.cgrb.oregonstate.edu/db/). More recently, microarray technology has been adapted to rapidly survey the expression profile of plant miRNAs (8). Some miRNAs are expressed in a broad range of tissues, whereas others are expressed most strongly in particular organs or developmental stages (8, 112). More precise data on the localization of a few miRNAs in plants has come from in situ hybridization to miRNAs (25, 48, 52) or from miRNA-responsive reporter genes (102). Little is known about the transcriptional or post-transcriptional regulation of miRNA expression. The expression levels of several miRNAs are responsive to phytohormones or growth conditions; miR159 levels are enhanced by gibberellin signaling (1), and miR393 levels are increased by a variety of stress conditions (124).

15 The dependence of miR395 levels on growth conditions is even more striking. A regulator of sulfate metabolizing enzymes and sulfate transporters(2, 3, 46), miR395 is undetectable in plants grown on standard MS media, but induced over 100 fold in plants which are starved for sulfate (46). Conservation of plant microRNAs Twenty miRNA families have been identified so far that are conserved between all three sequenced plant genomes: Arabidopsis, Oryza sativa (rice), and Populus trichocarpa (Table 1). There are also several examples of miRNA families which are conserved within specific lineages; miR403 is present in the dicots Arabidopsis and Populus but absent from the monocot Oryza (124). An additional three families identified by cloning in Oryza are conserved to other monocots such as Maize, but are not evident in either sequenced dicot (123). Within each family, the mature miRNA is always located on the same arm of the stem-loop for each family member (5' or 3') (Figure 2). Although the sequence of the mature miRNA and, to a lesser extent, the miRNA*, are highly conserved between members of the same miRNA family (both within and between species), the sequence, the secondary structure, and even the length of the intervening "loop" region can be highly divergent between family members (Figure 2). The pattern of pairing and non-pairing nucleotides within the mature miRNA and miRNA* is often conserved between homologous miRNA stem-loops from different species (Figure 2). The significance of these conserved bulges is unknown; perhaps they serve to guide DCL1 cleavage to the appropriate positions along the stem-loop. Most small RNA cloning efforts in plants have focused on Arabidopsis, a dicot, or Oryza, a monocot, and bioinformatic methods have focused on miRNAs conserved between these two species. Both species are angiosperms (flowering plants), and diverged from each other -145 million years ago (23). Growing evidence shows that many angiosperm miRNA families, and their complementary sites in target mRNAs, are conserved in more basal land plants. Ten miRNA families have conserved target sites in ESTs from gymnosperms or more basal plants, and a miR159 stem-loop is present in an EST from the moss Physcomitrella patens (46). A cDNA containing a miR166 stem loop as been cloned from the lycopod Selaginella kraussiana, and miR166 mediates cleavage within the highly conserved miR166 complementary sites of HD- ZIP mRNAs from gymnosperms, ferns, lycopods, and mosses (36). A systematic search for miRNA expression using microarray technology revealed that at least 11 miRNA families have

16 detectable expression in gymnosperms, and at least 2 (miR160 and miR390) are detectable in moss (8). Furthermore, a clever approach to experimentally identify verify miRNA targets in plants without sequenced genomes found evidence that four miRNA families (miR160, miR167, miR171, and miR172) cleave target mRNAs in gymnosperms, ferns, or mosses that are homologous to the verified Arabidopsis miRNA targets (8). Some of these miRNA families have been shown to regulate development in Arabidopsis, being necessary for processes such as the proper specification of floral organ identity (miR172) or leaf polarity (miR166). It is curious then that these miRNA families regulate homologous mRNAs in basal plant that have very different reproductive structures and leaf morphology. It is tempting to speculate that these miRNAs are parts of ancient, conserved regulatory pathways which underlie seemingly different developmental outcomes. Gene count Counting only the 21 conserved families, the Arabidopsis genome contains at least 91 potential miRNA genes (http://www.sanger.ac.uk/Software/Rfam/mira/index.shtml, Table 1). These families are somewhat expanded in Oryza and Populus, containing 116 and 169 potential miRNA genes, respectively (Tablel). The number of members per family in one genome ranges from 1 to 32. It is unclear why plant genomes contain so many stem-loops encoding similar miRNAs. The number of members in each family seems to be correlated between species; certain families contain numerous members in all three species (e.g. miR156, miR166, miR169), whereas others consistently contain only a few genes (e.g. miR162, miR168, miR394) (Table 1). Although it is unclear why a plant would need, for example, 12 copies of miR156, this correlation suggests a functional significance in the sizes of the various miRNA families. Non-conserved microRNAs Although many miRNA families are conserved widely in plants, others are found only in a single genome, and thus appear to be of a more recent evolutionary origin (Table 2). Based on extended homology between non-conserved miRNAs and target genes, it has been proposed some of these young miRNAs arose as tandem duplications of target-gene segments (4). Although several non-conserved miRNAs have been shown to cleave target mRNAs (4, 130), it is difficult to confidently predict targets for many because it is not possible to use conservation of complementary sites as a filter against false positives. In fact, it is difficult to be confident that all annotated non-conserved miRNAs are in fact miRNAs rather than siRNAs. The

17 established minimal standard is that a small RNA with detectable expression and the potential to from a stem-loop when joined to flanking genomic sequence can be annotated as a miRNA (5). In practice, these requirements are too loose to be useful in categorizing small RNAs cloned from plants. Many plant siRNAs are detectable on blots (138), and hundreds of thousands of non-miRNA genomic sequences can be predicted to fold into secondary structures that resemble the structures of plant miRNA precursors (46). Therefore, without the conservation of a characteristic pattern of sequence and secondary structure, it can be difficult to know if a given cloned RNA originated from a single-stranded stem-loop (i.e. is a miRNA) or from a double- stranded RNA (i.e. is a siRNA). In fact, many of the thousands of cloned Arabidopsis siRNAs (http://asrp.cgrb.oregonstate.edu/) would probably meet the literal requirements for annotation as miRNAs. A few of these sequences probably are miRNAs, but others that might meet the literal criteria probably are not. Because of this difficulty in identifying non-conserved miRNAs, it is not possible to propose a meaningful estimate on the total number of miRNA genes in Arabidopsis or other plant genomes. Because stem-loop structures that resemble miRNA precursors are so common in genomic sequence, bioinformatic searches for plant miRNAs are also prone to identifying false positives. Nine annotated miRNA families that differ from other miRNAs in several key aspects were identified in a bioinformatic screen for miRNAs conserved between Arabidopsis and Oryza (135). Unlike all the other conserved families, each has a single locus in each genome, and none of these 9 families have clearly identifiable homologs in the Populus genome or in ESTs from other plant species (M.W. Jones-Rhoades, personal communication). Taken together with the fact that the stem-loops of many of these miRNAs have more unpaired nucleotides within the miRNA/miRNA* then is typical for miRNAs with more experimental evidence, it appears likely that these sequences are bioinformatic false positives rather than bonafide miRNAs. Regulatory roles of plant microRNAs Regulatory roles of animal microRNAs As cloning experiments in animals identified large numbers of miRNAs, their functions remained largely unknown. Experience from the founding miRNAs, the lin-4 and let-7 RNAs, suggested that many, in not all, cloned miRNAs were also likely to repress the translation of protein coding genes. However, there was a considerable lag between large scale miRNA identification in animals and reliable genome-wide prediction of miRNA regulatory targets.

18 Animal miRNA are capable of repressing mRNAs to which they have quite limited complementarity; 7 or 8 adjacent paired nucleotides in the 5' portion of the miRNA is sufficient for repression in vivo (20, 29). Animal mRNAs contain numerous matches with this degree of complementarity, not only to miRNAs but also to arbitrary sequences with similar dinucleotide composition as miRNAs (67, 68). The primary challenge in predicting animal miRNA targets has been to know which of these numerous potential targets are biologically significant. Algorithms which search for the conservation of potential target sites across multiple species and which take into account the pairing requirements for translational repression (e.g. emphasis on pairing to the 5' portion of the miRNA) have identified thousands of mRNAs as probable targets of animal miRNAs (20, 34, 45, 54, 67, 68, 121). The limited pairing required for translational repression in animals, as well as the large number of predicted targets, has lead to the "micromanager model" for animal miRNA-mediated regulation, whereby many, if not most, animal mRNAs have their expression modulated to a greater or lesser extent through interaction with miRNAs (12). Identification of plant miRNA targets In contrast to the delay in animals, the high degree of complementarity between Arabidopsis miRNAs and their target mRNAs allowed for the confident prediction of targets soon after the discovery of the miRNAs themselves. The first indication of this plant-specific paradigm for miRNA target recognition came from miR171. miR171 has 4 matches in the Arabidopsis genome: one is located between protein coding genes and has a stem-loop structure, whereas the other three are all antisense to SCARECROW-LIKE (SCL) genes and lack stem-loop structures (75, 112). The intergenic miR171 locus with the stem-loop produces a miRNA that guides the cleavage of the complementary SCL mRNAs (76). Although other Arabidopsis miRNAs are not perfectly complementary to mRNAs, most of them are nearly so. An initial genome-wide screen for miRNA targets searched for mRNAs containing ungapped, antisense alignments with 0-3 mismatches to miRNAs, a degree of complementarity highly unlikely to occur by chance (113). Using this cutoff, targets could be predicted for 11 out of 13 miRNA families known at the time, comprising 49 target genes in total (113). For conserved miRNAs, more sensitive predictions that allow for gaps and more mismatches can be made by identifying cases where homologous mRNAs in Arabidopsis and Oryza each have complementarity to the same miRNA family (46).

19 Because plant miRNAs affect the stability of their targets, mRNA expression arrays can be used in experimental genome-wide screens for miRNA targets. For example, expression array data found that five mRNAs encoding TCP transcription factors are down regulated in plants over-expressing miR319 (100). Expression arrays may be especially useful in identifying miRNA targets which have been missed by bioinformatics (i.e. targets with more degenerate or non-conserved complementarity which are nonetheless subject to miRNA-guided cleavage). However, analysis of mRNAs down-regulated in plants overexpressing one of four miRNAs identified only two potentially direct targets not related to those found through bioinformatics (115). Furthermore, evidence for miRNA-guided cleavage of these targets in wild-type plants was not detected by 5' RACE, suggesting that these mRNAs may only be cleaved in plants that ectopically express miRNAs (115). The scope of miRNA-mediated regulation in plants The identity of their predicted targets suggests that plant miRNAs are master regulators; many miRNA targets encode for regulatory proteins. The 21 conserved miRNA families have 90 confirmed or predicted conserved regulatory targets in Arabidopsis (Table 3). 65 (72%) of these encode for transcription factors, pointing to a role for miRNAs in control of transcriptional regulation. Another six (7 %) are F-box proteins or E2 ubiquitin conjugation enzymes thought to be involved in the selective targeting of proteins for degradation by the proteasome, implying a role for miRNAs in regulating protein stability. DCLI, AGO1 and AGO2 are also miRNA targets, suggesting that miRNAs regulate their own biogenesis and function. Other conserved miRNA targets, such as ATP-sulfurylases, superoxide dismutases, and laccases have less clear roles as regulators; although in vivo miRNA-mediated cleavage has been shown for many of these targets, the biological significance of their regulation by miRNAs is not known. All 20 miRNA families that are conserved between Arabidopsis, Populus, and Oryza have complementary sites in target mRNAs that are also conserved in all three species (Table 3). Although these miRNAs may also have targets which are not conserved, this conservation of target sites suggests that miRNAs play similar roles in different plant species. Indeed, mutations in class III HD-ZIP genes that reduce miR166 complementarity in Arabidopsis and Oryza have similar phenotypes (48, 88, 113). However, the expansion of certain miRNA families and target classes in different species suggests that some of these miRNA families may have species- specific roles. For example, the miR397 family is complementary to mRNAs of 26 putative

20 laccase genes in Populus, whereas it has comparable complementarity to only three in Arabidopsis. Although the roles that laccases play in the biology of plants is not well understood, there is speculation that they may be involved in lignification (86), a process which may be more critical in a woody plant such as Populus. Validation of plant miRNA targets While the majority of plant miRNA targets were initially predicted through bioinformatics, a growing number have been validated experimentally. One means of target validation has been to use Agrobacterium filtration to observe miRNA-dependent cleavage of targets in Nicotiana benthiama leaves (49, 76). Another has been to assay the endogenous miRNA-mediated cleavage activity that is present in wheat germ lysate (83, 126). Perhaps the most useful method of miRNA target validation has been to use 5' RACE to detect in vivo the products of miRNA mediated cleavage reactions (46, 49, 76, 80, 81, 83, 100, 123, 139). An adaptor oligo is ligated to the 5' end of the uncapped 3' portion of a cleaved miRNA target, followed by PCR with a gene specific primer (49, 76). Sequencing of the resulting PCR product maps the precise position of cleavage within the target, usually between the nucleotides that pair to positions 10 and 11 of the miRNA. A more informative level of target validation is to examine the biological significance of the miRNA-mediated regulation of that target. As discussed below, reverse genetic approaches have yielded information about the in vivo relevance of a growing number of miRNA-target interactions. Regulatory roles of plant microRNAs The first evidence that small RNAs play roles in plant development came from mutants impaired in small RNA biogenesis or function. Indeed, several genes central to miRNA function, including DCL1, AGO1, and HEN1, were initially identified based on the developmental consequences of their mutations before they were known to be important for small RNA biogenesis or function. Multiple groups isolated dcll mutants; the most severe mutations result in early embryonic arrests, and even partial loss-of-function mutants result in pleiotropic defects, including abnormalities in floral organogenesis, leaf morphology, and axillary meristem initiation (reviewed in (114)). agol, henl, hyll, and hst mutants all have pleiotropic developmental defects that overlap with those of dcll plants (16, 26, 77, 91, 127). In addition, plants that express certain viral inhibitors of small RNA processing or function, such as

21 HC-Pro and P19, also exhibit developmental defects reminiscent of dcll mutants (22, 24, 30, 49, 82). Although many or all of these developmental defects may be the result of impaired miRNA activity, they may also reflect disruption of other pathways in which these genes are active, such as in the generation and function of siRNAs. However, in contrast to mutations in genes needed for miRNA biogenesis, mutations in genes required for the accumulation of certain siRNAs, such as AGO4, RDR6, and DCL3, result in few or mild developmental abnormalities (27, 95, 131, 138, 146). Mutations that impair a fundamental step in miRNA biogenesis result in the misregulation of numerous miRNA targets (18, 130), making it difficult to assign the observed phenotypes to any particular miRNA family. Fortunately, the ease by which transgenic Arabidopsis can be generated has allowed the investigation of particular miRNA/target interactions through one of two reverse genetic strategies. The first strategy is to make transgenic plants that overexpress a miRNA, typically under the control of the strong double 35S promoter (Table 4). This approach has the potential to downregulate all mRNAs targeted by the overexpressed miRNA. The second strategy is to make transgenic plants that express a miRNA- resistant version of a miRNA target, in which silent mutations have been introduced into the miRNA complementary site that disrupt miRNA-mediated regulation without altering the encoded protein product (Table 5). In total, eight miRNA families have been investigated in vivo by these strategies. As might have been expected from the identity of their target mRNAs, in all eight cases perturbation of miRNA-mediated regulation results in abnormal development. Taken together, they prove that miRNAs are key regulators of many facets of Arabidopsis development. One of the better studied families of miRNA targets are the class III HD-ZIP transcription factors. The importance of miR166-mediated regulation for the proper regulation of this gene class is underscored by the large number of dominant gain-of-function alleles that map to the miR166 complementary sites of HD-ZIP mRNAs (33, 48, 87, 88, 145). phb and phv mutants result in adaxialization of leaves and over-expression of phblphv mRNA(87, 88), whereas rev mutants result in radialized vasculature (33, 145). Similarly, mutations within the miR166 complementary site of the maize HD-ZIP gene RLD1 result in adaxialization of leaf primordia and overaccumulation of rldl mRNA (48). All of these HD-ZIP gain-of-function mutations result in a change in the amino acid sequence of the conserved START domain. Before the

22 discovery of miR166, it was hypothesized that the HD-ZIP mutants resulted from the loss of negative regulatory interaction mediated by the START domain (88). However, transgenic plants expressing miR166-resistant version of PHB, PHV, or REV result in plants that phenocopy their respective gain-of-function mutants, whereas transgenic plants containing additional wild- type copies of these genes have no or mild phenotypes (33, 83). This demonstrates that changes in the RNA sequence, rather than in the amino acid sequence, are sufficient to account for the developmental abnormalities observed in HD-ZIP gain-of-function mutants. miR172-mediated regulation of APETALA2 (AP2) and related AP2-like genes is needed for the proper specification of organs during flower development (7, 25). Plants that overexpress miR172 have floral defects, such as the absence of petals and the transformation of sepals to carpels, which resemble ap2 loss-of-function mutants (7, 25). Curiously, overexpression of miR172 substantially decreases the protein levels of target AP2-like genes without a commensurate change in target mRNA levels, suggesting that, unlike other known plant miRNA- target interactions, miR172 is repressing translation of AP2-like mRNAs in a manner similar to that employed by animal miRNAs (7, 25). However, the extent of complementarity between miR172 and the AP2-like mRNAs is high, comparable to that of other plant miRNA targets that undergo robust miRNA-mediated cleavage, and 3' cleavage fragments of AP2-like mRNAs can be detected by 5' RACE (7, 49). Indeed, Schwab et al. found that cleavage of miR172 targets is increased in miR172 overexpressing plants, and postulated a feedback mechanism whereby AP2- like proteins repress their own transcription, resulting in similar mRNA levels despite an increase in mRNA cleavage (115). It appears that miR172 mediated regulation of AP2-like genes is complex, and it is unclear how similar miR172-mediated regulation is to the miRNA-mediated translational repression observed in animals. Although most miRNA families are predicted to target a single class of targets, the miR159/319 family regulates both MYB and TCP transcription factors. Although miR159 and miR319 differ by only three nucleotides, they appear to be functionally distinct. Overexpression of miR319, which specifically downregulates TCP mRNAs, results in plants with uneven leaf shape and delayed flowering time (100). Expression of miR319-resistant TCP4 results in aberrant seedling that arrest with fused cotyledons and without forming apical meristems (100). Overexpression of miR159, which specifically reduces accumulation of MYB mRNAs, results in male sterility (1, 115), whereas plants that express miR159-resistant MYB33 have upwardly

23 curled leaves, reduced stature, and shortened petioles (90, 100). Thus miR159 and miR319 are related miRNAs that regulate unrelated mRNAs. In addition to the miRNAs that target transcription factors, two miRNAs families are known to target genes central to miRNA biogenesis and function; miR162 targets DCL1 (139) and miR168 targets AGOI (113, 129). The targeting of these genes suggests a feedback mechanism whereby miRNAs negatively regulate their own activity. Curiously, although plants expressing miR168-resistant AGO1 overaccumulate AGO1 mRNA as expected, they also overaccumulate numerous other miRNA targets and exhibit developmental defects which overlap with those of dcll, henl, and hyll loss-of-function mutants (129). This suggests that an overabundance of AGO1 inhibits, rather than promotes, RISC activity (129). MicroRNAs: plants vs. animals As our understanding of miRNA genomics and function in both plants and animals has grown, so has the realization that there are numerous differences between the kingdoms in terms of the ways miRNAs are made and carry out their regulatory roles. Indeed, the evolutionary relationship between plant and animal miRNAs is unclear. Did the last common ancestor of plants and animals possess miRNAs from which modem miRNA are descended, or did the plant and animal lineages independently adapt conserved RNAi machinery to use endogenously expressed stem-loop RNAs as trans regulators of other genes? Although miRNAs are deeply conserved within each kingdom (8, 36, 57, 61, 63, 71, 105), no particular miRNA is known to be conserved between kingdoms. There are several kingdom-specific differences in miRNA biogenesis. For one thing, the stem-loop precursors of plant miRNAs are markedly longer and more variable than their animal counterparts. The cellular localization of processing appears to differ between plant miRNAs, which are entirely processed within the nucleus (101, 103, 138), and animal miRNAs, which are processed both in the nucleus and in the cytoplasm (65). Perhaps more importantly, the scope and mode of regulation carried out by miRNAs appears to be drastically different between the two kingdoms. Most plant miRNAs guide the cleavage of target mRNAs (46, 49, 76, 126), and the predicted targets of Arabidopsis miRNAs, which comprise less than 1% of protein coding genes, are highly biased towards transcription factors and other regulatory genes (46, 113). Although at least some animal miRNAs guide cleavage of endogenous targets (84, 141), most appear to act through the repression of translation (19, 20, 68, 117, 136). Furthermore, the identification of conserved reverse complementary matches to the

24 5' "seed" portions of animal miRNAs suggests that a large percentage (20-30 % or more) of animal protein coding genes are conserved miRNA targets (20, 67, 137). Whatever the evolutionary relationship is between plant and animal miRNAs, the functional differences are striking. Summary Plant miRNAs were initially identified through cloning, without any indication as to their biological roles. In chapter one of this thesis, I describe the initial genome-wide bioinformatic screen for plant miRNA regulatory targets. We show that Arabidopsis miRNAs are complementary to far more mRNAs than would be expected by chance, and propose that mRNAs that can pair to miRNAs with less than three unpaired nucleotides are likely to be miRNA targets. Furthermore, many of these miRNA complementary sites are conserved to orthologous Oryza mRNAs, implying that miRNA-mediated regulation of many targets predates the divergence of dicots and monocots. It total, we identified 49 predicted targets, of which 34 encode for transcription factors. Our findings indicated that miRNAs are key components of numerous regulatory circuits in plants and set the stage for numerous additional experiments to investigate in depth the significance of miRNA-mediated regulation for particular target families and genes. Cloning is an efficient way to identify abundant miRNAs, but it is likely to miss those expressed at low levels or under specific conditions. In chapter two, I describe the development and implementation of a bioinformatic approach to identify conserved miRNAs unrelated to those discovered by cloning. In conjunction with this, I used the conservation of miRNA target sites to increase the sensitivity and selectivity of plant miRNA target prediction. Seven previously unknown families of miRNAs were identified computationally and verified experimentally. These newly identified families expanded the categories of genes known to be regulated by miRNAs to include F-box genes, sulfate metabolizing genes, laccases, and superoxide dismutases. Bioinformatic approaches have proven effective at identifying targets of plant miRNAs, and moderately high throughput methods such as 5' RACE can detect evidence for the interaction of many miRNA-mRNA pairs. However, our understanding of the biological significance of plant miRNAs has been greatly aided by reverse genetic approaches that allow for the disruption of miRNA-mediated regulation. In chapter three, I describe the role of

25 miR394 in the regulation of F-Box gene Atlg27340. Expression of miR394-resistant Atlg27340 results in numerous developmental abnormalities, including downwardly curved rosette leaves, radialized cauline leaves, abortive flowers, and arrested seedlings that lack shoot apical meristems, that correlate with an increase in Atig27340 mRNA levels.

26 Table 1. Genomic loci of conserved plant miRNA families miRNA family A.t. O.s. P.t. miR156 12 12 11 miR159/319 6 8 15 miR160 3 6 8 miR162 2 2 3 miR164 3 5 6 miR166 9 12 17 miR167 4 9 8 miR168 2 2 2 miR169 14 17 32 rniR171 4 7 10 miR172 5 3 9 miR390 2 1 4 miR393 2 2 4 miR394 2 1 2 miR395 6 19 10 miR396 2 5 7 miR397 2 2 3 miR398 3 2 3 miR399 6 11 12 miR408 1 1 1 miR403 1 0 2 miR437 0 1+ 0 miR444 0 1+ 0 miR445 0 9+ 0 Total 91 127 169 The number of identified genes in each family of miRNAs is indicated. Only miRNA families with strong evidence for conservation are listed. Otyza miRNA families which appear to be missing from Arabidopsis and Populus but are present in Maize are marked with a plus (+).

27 Table 2. Genomic loci of non-conserved plant miRNA families miRNA family A.t. O.s. P.t. miR158 2 0 0 miR161 1 0 0 miR163 1 0 0 miR173 1 0 0 miR400 1 0 0 :miR401 1 0 0 miR402 1 0 0 miR404 1 0 0 miR405 3 0 0 miR406 1 0 0 miR407 1 0 0 miR435 0 1 0 miR436 0 1 0 miR438 0 1 0 miR439 0 10 0 miR440 0 1 0 miR441 0 3 0 miR442 0 1 0 miR443 0 1 0 miR446 0 1 0 miR413 1 1 0 miR414 1 1 0 miR415 1 1 0 miR416 1 1 0 miR417 1 1 0 miR418 1 1 0 miR419 1 1 0 miR420 1 1 0 miR426 1 1 0 Total 23 29 0 The number of identified genes in each family of miRNAs is indicated. Only miRNA families without strong evidence for conservation are listed. As discussed in the text, miR413- miR426 were identified bioinformatically as conserved between Arabidopsis and Oryza, but are not evident in Populus and it is unclear if they are truly miRNAs.

28 Table 3. Regulatory targets of plant miRNAs miRNA family Target family Validated targets Validation method A.t. O.s. P.t. miR156 SBP SPL2, SPL3, SPL4, SPL10(3, 24, 49, 131) 5' RACE 11 9 16 miR159/319 MYB MYB33, MYB65(1, 90, 100) target, miRNA-resistant 8 6 target, Agro-infiltration 5 miR159/319 TCP TCP2, TCP3, TCP4, TCP10, TCP24(100) 5' RACE, miRNA-resistanttarget 5 4 7 miR160 ARF ARF10, ARF16, ARF17(3, 49, 80) 5' RACE, miRNA-resistant target 3 5 9 miR164 NAC CUC1,CUC2,NAC1,At5g07680, At5g61430(39, 49, 5' RACE, wheat germ lysate, 6 6 6 62, 81) miRNA-resistant target miR166 HD-ZlPIII PHB, PHV, REV, A THB-8, ATHB-15(33, 53, 83,126) 5' RACE, wheat germ lysate, 5 4 9 miRNA-resistanttarget miR167 ARF ARF6, ARF8(3, 49) 5' RACE 2 4 7 miR169 HAP2 Atlg17590, Atlg72830, Atlg54160, At3g05690, 5 RACE 8 7 9 At5g06510(46) miR171 SCL SCL6-111,SCL6-IV(49, 76) 5' RACE, Agro-infiltration 3 5 9 miR172 AP2 AP2, TOE1,TOE2, TOE3(7, 25, 49) 5' RACE, miRNA-resistant target 6 5 6 miR393 bZIP* Atlg27340(46) 5' RACE 1 1 1 miR396 GRF GRL1, GRL2, GRL3, GRL7, GRL8, GRL9(46) 5' RACE 7 9 9 total, transcription factors 65 65 93 miR161 PPR Atlg06580(4, 131) 5' RACE 9 0 0 miR162 Dicer DCL1(139) 5' RACE 1 1 1 miR163 SAMT Atlg66690, Atlg66700, Atlg66720, At3g44860(4) 5' RACE 5 0 0 miR168 ARGONAUTE AG01(129, 131) 5' RACE, miRNA-resistant target 1 6 2 miR393 F-box Atlg12820, At3g26810 At4g03190, 5' RACE 4 2 5 At3g23690(46)TIR1, miR394 F-box Atlg27340(46, 47) 5' RACE, miRNA-resistant target 1 1 2 miR395 APS APS1,APS4(46) 5' RACE 3 1 2 miR395 S transporter AST68(3) 5' RACE 1 2 3 miR396 Rhodenase 1 1 1 miR397 Laccase At2g29130, At2g38080, At5g60020(46) 5' RACE 3 15 26 miR398 CSD* CSD1, CSD2(46) 5' RACE 2 2 2 miR398 CytC oxidase* At3g15640(46) 5' RACE 1 1 0 miR399 Ph transporter 1 4 4 miR399 E2-UBC At2g33770(3) 5' RACE 1 1 2 miR403 ARGONAUTE AGO2 (3) 5' RACE 1 0 1 miR408 Laccase At2g30210(115) 5' RACE 3 2 3 miR408 Plantacyanin 7448.m00137(123) 5' RACE 1 3 1 total, non-transcription factors 39 42 55 Validated and predicted targets of Arabidopsis miRNAs are listed, grouped into those encoding transcription factors (top) and those encoding other functionalities (bottom). For each target family, the number of genes predicted to be targets in each of three plant species with sequenced genomes (A.t., Arabidopsis thaliana; O.s., Oryza sativa; P.t., Populus trichocarpa) is indicated. To be counted, a potential target must contain a complementary site to at least one member of the indicated miRNA family with a score of 3 or less (as described (46) ), with the exception of the target families marked with an asterisk, for which some targets with more relaxed complementarity were included. Non-validated target families are listed only if they are present in all three species. miR408-directed cleavage of plantacyanin mRNAs have been validated only in Oryza. Abbreviations: SBP, SQUAMOSA-promoter binding protein; ARF, AUXIN RESPONSE FACTOR; SCL, SCARECROW-LIKE; GRF, GROWTH REGULATING FACTOR; SAMT, SAM-dependant methyl transferase; APS, ATP-sulfurylase; CSD, COPPER SUPEROXIDE DISMUTASE; E2-UBC, E2 ubiquitin- conjugating protein

29 'Table 4. miRNA overexpression affecting development miRNA target family Consequences of overexpression miR156 SPL transcription factors Increased leaf initiation, decreased apical dominance, delayed flowering time (115) miR159 MYB transcription factors Male sterility, delayed flowering time (1) miR319 TCP transcription factors Uneven leaf shape and curvature, late flowering (100) miR164 NAC domain transcription factors Organ fusion (62, 81) miR166 HD-ZIP transcription factors Seedling arrest, fasciated apical meristems, female sterility (53) miR172 AP2-like transcription factors Early flowering, lack of petals, transformation of sepals to carpels (7, 25)

Table 5. miRNA-resistant target affecting development miRNA family miRNA-resistant target promoter Phenotype miR159 MYB33 35S Upwardly curled leaves (100) miR159 MYB33 Endogenous Upwardly curled leaves, reduced stature, shortened petioles (90) miR319 TCP4 Endogenous and 35S Arrested seedlings, fused cotyledons, lack of SAM (100) miR319 TCP2 35S Longer hypocotyls, reduced stature and apical dominance (100) miR160 ARF17 Endogenous Extra cotyledons (80) Shortened rosette leaf petioles, aberrant leaf shape, extra petals, missing miR164 CUC1 Endogenous sepals (81) miR164 CUC2 Inducible and 35S Aberrant leaf shape, extra petals, increased sepal separation (62) miR164 NACI 35S Increased number of lateral roots (39) miR166 REV Endogenous Radialized vasculature, strands of leaf tissue attached to stem (33) miR166 PHB 35S Adaxialized leaves, ectopic meristems (83) miR168 AGO1 Endogenous Curled leaves, disorganized phyllotaxy, reduced fertility (129) miR172 AP2 35S Late flowering, excess of petals and stamens (25) miR394 Atlg27340 Endogenous Curled leaves, lack of SAM, abortive flowers, "spiked" cauline leaves (47)

30 Figure Legends Figure 1. Mechanisms of small RNA biogenesis and function (A) A model for miRNA biogenesis in Arabidopsis. Following transcription (step 1), the Pri- miRNA is processed by DCL1, and perhaps other factors, to a miRNA:miRNA* duplex (step 2). Pre-miRNAs, which are readily detectable in animals, appear to be very short- lived in plants. The 3' sugars of the miRNA:miRNA* duplex are methylated by HEN1, presumably within the nucleus (step3). The miRNA is exported to the cytoplasm by HST, probably with the aid of additional factors (step 4). The mature, methylated miRNA is separated from the miRNA*, perhaps with the aid of a helicase. The miRNA is incorporated into RISC, an Argonaute containing ribonucleoprotein complex, while the miRNA* is degraded (step 5). For plant miRNAs, unwinding of the miRNA:miRNA* duplex may occur before export to the cytoplasm. (B) A model for siRNA biogenesis. Long double-stranded RNA, perhaps generated through the action of an RNA-dependent RNA polymerase (RDRP), is iteratively processed by Dicer-like proteins to yield multiple siRNA duplexes. One strand from each siRNA duplex is stably incorporated into RISC, while the other is degraded. (C) MicroRNAs or siRNAs can guide RISC to cleave mRNAs with extensive complementarity to the small RNA. The complementarity to the small RNA can occur at any point within the target RNA. (D) MicroRNAs or siRNAs can repress functional translation of target mRNAs. In animals, most known example of translational repression involve multiple sites in the 3' UTR with imperfect complementarity to the small RNA. The one plant miRNA which has been reported to mediate translational repression, miR172, recognizes its targets through a single site with near perfect complementarity. (E) Small RNAs play roles in the establishment of transcriptionally silent heterochromatin. The exact role played by the small RNAs in this pathway is not clear, nor is it known if they base pair to DNA or RNA.

Figure 2. Representative miR164 stem-loop precursor from Arabidopsis, Oryza, and Populus. The mature microRNAs are shown in red.

31 Figure 1

miR NA gene Exogenous RNA, transposon, virus,endogenous RNA

Near-perfect complementarity in coding region or UTR Pri-miRNA Long dsRNA - • -=--An k~0 I1ll-02IIIwyl or 014D I/ "" PremriiR NAý Pre-maR NA WAn

r miRNA:miRNA* duplex siWRWAWIPlxiesIIII Snort complementary segments ini-U I K 010 siR NA duplexes It M e 1PII P 17110 Mell"llllllIIIIIIIIlk I methylated miRNA:miRNA " duplex Nucleus 1 = Active chromatin Histone methylation directed by heterochromatic siR NAs

U- Silent chromatin Mei W~ WP Mature miRNA within RIS C Mature siRNAs within RISC

~TE Figure 2

AU A U G U G U C-G G U U A G U-A A-U U-A A-U U-A G G U-A CC G U U A C-G C-G G'U G-C U'G A C A-U U U G C-G A U U U C -G C-G C U A-U U G C-G C-G C-G C C U U G C UAC G A-U U U C-G U-A C-G C-G C-G G-C C-G A-U U-A G-C -C C -G U-A U-A G-C nt loop U U C-G CG - C20 C-G C-G C C U-A A-U U U U G-C U'G C-GA U U AAU U-A U-A G-C A-UA G C U A-U C-G U G-C G -C U-A U-A C-G C-G A-U G -C G-C A-UC U'G AUC AA C C U A G U C C U -A CUAC -G A-U U-A A-U C-G U-A A-U A-U U-A C-C UU U-A A-U G-C C -G C A C-G A U U -A G-C A-U AC-GC G U U -A A-U A-U U C A-U AC-G C-G G-C U-A U-A C-C C-G G-C A-U A-U A-U A-U A-U U U U -AC U U A-U U C A-U UU UC A-U C A A U C-G A U C-G A U U C-G U C -C CG.C G-C G G-C G-C CG-CU U-A U-A U-A U-A U-A U-A G'U G-C G6U C-G C-G C-G C-G C-G C-G A-U A-U A C A-U A-U C-G C-G C-G G -C G-C C A G-C G-C G G G-C G -C G-C G-C G-C G'U G U G'U G-C A G A-U G-U A A A-U C U A G A A U C U C C C C G-C G- C C U C A G-C A-U A-U G-C G-C A-U G -C G-C A-U A-U A-U A-U A-U A-U A-U G-C G-C G-C A-U AU AG C A A A U A-U G-C G-C G-C G U G-C G-C G-C G -C G-C G-C G-C G-C U-A U-A U-A U-A U-A U-A C A A-U U-A A-U U U G-C G-C G-C G-C G-C G-C G -C 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' MIR164a MIR164b MIR164a MIR164b MIR164a MIR164b Arabidopsis Arabidopsis Oryza Oryza Populus Populus 1. Achard P, Herr A, Baulcombe DC, Harberd NP. 2004. Modulation of floral development by a gibberellin-regulated microRNA. Development 131: 3357-65 2. Adai A, Johnson C, Mlotshwa S, Archer-Evans S, Manocha V, et al. 2005. Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res 15: 78-91 3. Allen E, Xie Z, Gustafson AM, Carrington JC. 2005. microRNA-Directed Phasing during Trans-Acting siRNA Biogenesis in Plants. Cell 121: 207-21 4. Allen E, Xie Z, Gustafson AM, Sung GH, Spatafora JW, Carrington JC. 2004. Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana.Nat Genet36: 1282-90 5. Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, et al. 2003. A uniform system for microRNA annotation. RNA9: 277-9 6. Ambros V, Lee RC, Lavanway A, Williams PT, Jewell D. 2003. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13: 807-18 7. Aukerman MJ, Sakai H. 2003. Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15: 2730-41 8. Axtell MJ, Bartel DP. 2005. Antiquity of MicroRNAs and Their Targets in Land Plants. Plant Cell 17 9. Baker CC, Sieber P, Wellmer F, Meyerowitz EM. 2005. The early extra petalsl mutant uncovers a role for microRNA miR164c in regulating petal number in Arabidopsis. Curr Biol 15: 303-15 10. Bao N, Lye KW, Barton MK. 2004. MicroRNA binding sites in Arabidopsis class III HD-ZIP mRNAs are required for methylation of the template chromosome. Dev Cell 7: 653-62 11. Bartel DP. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-97 12. Bartel DP, Chen CZ. 2004. Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs. Nat Rev Genet 5: 396-400 13. Baulcombe D. 2004. RNA silencing in plants. Nature 431: 356-63 1.4. Beclin C, Boutet S, Waterhouse P, Vaucheret H. 2002. A branched pathway for transgene-induced RNA silencing in plants. Curr Biol 12: 684-8 15. Bernstein E, Caudy AA, Hammond SM, Hannon GJ. 2001. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409: 363-6 16. Bohmert K, Camus I, Bellini C, Bouchez D, Caboche M, Benning C. 1998. AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17: 170-80 17. Bonnet E, Wuyts J, Rouze P, Van de Peer Y. 2004. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci U S A 101: 11511-6 18. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, et al. 2003. Arabidopsis HENI: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13: 843-8 19. Brennecke J, Hipfner DR, Stark A, Russell RB, Cohen SM. 2003. bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell 113: 25-36

34 20. Brennecke J, Stark A, Russell RB, Cohen SM. 2005. Principles of microRNA-target recognition. PLoS Biol 3: e85 21. Carmell MA, Xuan Z, Zhang MQ, Hannon GJ. 2002. The Argonaute family: tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis. Genes Dev 16: 2733-42 22. Chapman EJ, Prokhnevsky AI, Gopinath K, Dolja VV, Carrington JC. 2004. Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18: 1179-86 23. Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol 58: 424-41 24. Chen J, Li WX, Xie D, Peng JR, Ding SW. 2004. Viral virulence protein suppresses RNA silencing-mediated defense but upregulates the role of in host gene expression. Plant Cell 16: 1302-13 25. Chen X. 2004. A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303: 2022-5 26. Chen X, Liu J, Cheng Y, Jia D. 2002. HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129: 1085-94 27. Dalmay T, Hamilton A, Rudd S, Angell S, Baulcombe DC. 2000. An RNA-dependent RNA polymerase gene in Arabidopsis is required for posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell 101: 543-53 28. Doench JG, Petersen CP, Sharp PA. 2003. siRNAs can function as miRNAs. Genes Dev 17: 438-42 29. Doench JG, Sharp PA. 2004. Specificity of microRNA target selection in translational repression. Genes Dev 18: 504-11 30. Dunoyer P, Lecellier CH, Parizotto EA, Himber C, Voinnet 0. 2004. Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16: 1235-50 31. Elbashir SM, Lendeckel W, Tuschl T. 2001. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15: 188-200 32. Elbashir SM, Martinez J, Patkaniowska A, Lendeckel W, Tuschl T. 2001. Functional anatomy of siRNAs for mediating efficient RNAi in Drosophila melanogaster embryo lysate. EMBO J 20: 6877-88 33. Emery JF, Floyd SK, Alvarez J, Eshed Y, Hawker NP, et al. 2003. Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13: 1768-74 34. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. 2003. MicroRNA targets in Drosophila. Genome Biol 5: R1 35. Finnegan EJ, Margis R, Waterhouse PM. 2003. Posttranscriptional gene silencing is not compromised in the Arabidopsis CARPEL FACTORY (DICER-LIKE1) mutant, a homolog of Dicer-i from Drosophila. Curr Biol 13: 236-40 36. Floyd SK, Bowman JL. 2004. Gene regulation: ancient microRNA target sequences in plants. Nature 428: 485-6 37. Grad Y, Aach J, Hayes GD, Reinhart BJ, Church GM, et al. 2003. Computational and experimental identification of C. elegans microRNAs. Mol Cell 11: 1253-63 38. Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, et al. 2001. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106: 23-34

35 39. Guo HS, Xie Q, Fei JF, Chua NH. 2005. microRNA164 Directs NAC1 mRNA Cleavage to Downregulate Auxin Signals for Lateral Root Development. Plant Cell 40. Ha I, Wightman B, Ruvkun G. 1996. A bulged lin-4/lin-14 RNA duplex is sufficient for Caenorhabditis elegans lin-14 temporal gradient formation. Genes Dev 10: 3041-50 41. Han MH, Goud S, Song L, Fedoroff N. 2004. The Arabidopsis double-stranded RNA- binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci U S A 101: 1093-8 42. Hunter C, Sun H, Poethig RS. 2003. The Arabidopsis heterochronic gene ZIPPY is an ARGONAUTE family member. Curr Biol 13: 1734-9 43. Hutvagner G, McLachlan J, Pasquinelli AE, Balint E, Tuschl T, Zamore PD. 2001. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293: 834-8 44. Hutvagner G, Zamore PD. 2002. A microRNA in a multiple-turnover RNAi enzyme complex. Science 297: 2056-60 45. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. 2004. Human MicroRNA targets. PLoS Biol 2: e363 46. Jones-Rhoades MW, Bartel DP. 2004. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14: 787-99 47. Jones-Rhoades MW, Bartel DP. 2005. MicroRNA-mediated regulation of an F-box gene is required for embryonic, floral, and vegetative development. In Preperation 48. Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MC. 2004. microRNA-mediated repression of rolled leafl specifies maize leaf polarity. Nature 428: 84-8 49. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, et al. 2003. P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA unction. Dev Cell 4: 205-17 50. Ketting RF, Fischer SE, Bernstein E, Sijen T, Hannon GJ, Plasterk RH. 2001. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15: 2654-9 51. Khvorova A, Reynolds A, Jayasena SD. 2003. Functional siRNAs and miRNAs exhibit strand bias. Cell 115: 209-16 52. Kidner CA, Martienssen RA. 2004. Spatially restricted microRNA directs leaf polarity through ARGONAUTE. Nature 428: 81-4 53. Kim J, Jung JH, Reyes JL, Kim YS, Kim SY, et al. 2005. microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J 42: 84-94 54. Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, et al. 2004. A combined computational-experimental approach predicts human microRNA targets. Genes Dev 18: 1165-78 55. Knight SW, Bass BL. 2001. A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293: 2269-71 56. Kurihara Y, Watanabe Y. 2004. Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci U S A 101: 12753-8 57. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294: 853-8 58. Lagos-Quintana M, Rauhut R, Meyer J, Borkhardt A, Tuschl T. 2003. New microRNAs from mouse and human. Rna 9: 175-9

36 59. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. 2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12: 735-9 60. Lai EC, Tomancak P, Williams RW, Rubin GM. 2003. Computational identification of Drosophila microRNA genes. Genome Biol 4: R42 61. Lau NC, Lim LP, Weinstein EG, Bartel DP. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294: 858-62 62. Laufs P, Peaucelle A, Morin H, Traas J. 2004. MicroRNA regulation of the CUC genes is required for boundary size control in Arabidopsis meristems. Development 131: 4311-22 63. Lee RC, Ambros V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862-4 64. Lee RC, Feinbaum RL, Ambros V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-54 65. Lee Y, Ahn C, Han J, Choi H, Kim J, et al. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415-9 66. Lee Y, Kim M, Han J, Yeom KH, Lee S, et al. 2004. MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23: 4051-60 67. Lewis BP, Burge CB, Bartel DP. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15- 20 68. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. 2003. Prediction of mammalian microRNA targets. Cell 115: 787-98 69. Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP. 2003. Vertebrate microRNA genes. Science 299: 1540 70. Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, et al. 2005. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433: 769-73 71. Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, et al. 2003. The microRNAs of Caenorhabditis elegans. Genes Dev 17: 991-1008 72. Lingel A, Simon B, Izaurralde E, Sattler M. 2003. Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426: 465-9 73. Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, et al. 2004. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305: 1437-41 74. Liu Q, Rand TA, Kalidas S, Du F, Kim HE, et al. 2003. R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301: 1921-5 75. Llave C, Kasschau KD, Rector MA, Carrington JC. 2002. Endogenous and silencing- associated small RNAs in plants. Plant Cell 14: 1605-19 76. Llave C, Xie Z, Kasschau KD, Carrington JC. 2002. Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297: 2053-6 77. Lu C, Fedoroff N. 2000. A mutation in the Arabidopsis HYLI gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin. Plant Cell 12: 2351-66 78. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U. 2004. Nuclear export of microRNA precursors. Science 303: 95-8 79. Lynn K, Fernandez A, Aida M, Sedbrook J, Tasaka M, et al. 1999. The PINHEADIZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126: 469-81

37 80. Mallory AC, Bartel DP, Bartel B. 2005. microRNA-Directed Regulation of Arabidopsis AUXIN RESPONSE FACTOR17 Is Essential for Proper Development and Modulates Expression of Early Auxin Response Genes. Plant Cell 17 81. Mallory AC, Dugas DV, Bartel DP, Bartel B. 2004. MicroRNA regulation of NAC- domain targets is required for proper formation and separation of adjacent embryonic, vegetative, and floral organs. Curr Biol 14: 1035-46 82. Mallory AC, Reinhart BJ, Bartel D, Vance VB, Bowman LH. 2002. A viral suppressor of RNA silencing differentially regulates the accumulation of short interfering RNAs and micro-RNAs in tobacco. Proc Natl Acad Sci U S A 99: 15228-33 83. Mallory AC, Reinhart BJ, Jones-Rhoades MW, Tang G, Zamore PD, et al. 2004. MicroRNA control of PHABULOSA in leaf development: importance of pairing to the microRNA 5' region. EMBO J 23: 3356-64 84. Mansfield JH, Harfe BD, Nissen R, Obenauer J, Srineel J, et al. 2004. MicroRNA- responsive 'sensor' transgenes uncover Hox-like and other developmentally regulated patterns of vertebrate microRNA expression. Nat Genet 36: 1079-83 85. Matzke M, Aufsatz W, Kanno T, Daxinger L, Papp I, et al. 2004. Genetic analysis of RNA-mediated transcriptional gene silencing. Biochim Biophys Acta 1677: 129-41 86. Mayer AM, Staples RC. 2002. Laccase: new functions for an old enzyme. Phytochemistry 60: 551-65 87. McConnell JR, Barton MK. 1998. Leaf polarity and meristem formation in Arabidopsis. Development 125: 2935-42 88. McConnell JR, Emery J, Eshed Y, Bao N, Bowman J, Barton MK. 2001. Role of PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411: 709-13 89. Mette MF, van der Winden J, Matzke M, Matzke AJ. 2002. Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130: 6-9 90. Millar AA, Gubler F. 2005. The Arabidopsis GAMYB-Like Genes, MYB33 and MYB65, Are MicroRNA-Regulated Genes That Redundantly Facilitate Anther Development. Plant Cell 17: 705-21 91. Morel JB, Godon C, Mourrain P, Beclin C, Boutet S, et al. 2002. Fertile hypomorphic ARGONAUTE (agol) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14: 629-39 92. Moss EG, Lee RC, Ambros V. 1997. The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88: 637-46 93. Motamedi MR, Verdel A, Colmenares SU, Gerber SA, Gygi SP, Moazed D. 2004. Two RNAi complexes, RITS and RDRC, physically interact and localize to noncoding centromeric RNAs. Cell 119: 789-802 94. Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, et al. 2002. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16: 720-8 95. Mourrain P, Beclin C, Elmayan T, Feuerbach F, Godon C, et al. 2000. Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 101: 533-42 96. Moussian B, Haecker A, Laux T. 2003. ZWILLE buffers meristem stability in Arabidopsis thaliana. Dev Genes Evol 213: 534-40

38 97. Moussian B, Schoof H, Haecker A, Jurgens G, Laux T. 1998. Role of the ZWILLE gene in the regulation of central shoot meristem cell fate during Arabidopsis embryogenesis. EMBO J 17: 1799-809 98. Noma K, Sugiyama T, Cam H, Verdel A, Zofall M, et al. 2004. RITS acts in cis to promote RNA interference-mediated transcriptional and post-transcriptional silencing. Nat Genet 36: 1174-80 99. Olsen PH, Ambros V. 1999. The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev Biol 216: 671-80 100. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, et al. 2003. Control of leaf morphogenesis by microRNAs. Nature 425: 257-63 101. Papp I, Mette MF, Aufsatz W, Daxinger L, Schauer SE, et al. 2003. Evidence for nuclear processing of plant micro RNA and short interfering RNA precursors. Plant Physiol 132: 1382-90 102. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O. 2004. In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18: 2237-42 103. Park MY, Wu G, Gonzalez-Sulser A, Vaucheret H, Poethig RS. 2005. Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102: 3691-6 :104. Park W, Li J, Song R, Messing J, Chen X. 2002. CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12: 1484-95 105. Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, et al. 2000. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408: 86-9 106. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS. 2004. SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans- acting siRNAs in Arabidopsis. Genes Dev 18: 2368-79 1.07. Pfeffer S, Sewer A, Lagos-Quintana M, Sheridan R, Sander C, et al. 2005. Identification of microRNAs of the herpesvirus family. Nat Methods 2: 269-76 108. Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, et al. 2004. Identification of virus- encoded microRNAs. Science 304: 734-6 109. Poy MN, Eliasson L, Krutzfeldt J, Kuwajima S, Ma X, et al. 2004. A pancreatic islet- specific microRNA regulates insulin secretion. Nature 432: 226-30 110. Reinhart BJ, Bartel DP. 2002. Small RNAs correspond to centromere heterochromatic repeats. Science 297: 1831 111. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, et al. 2000. The 21- nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403: 901-6 112. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. 2002. MicroRNAs in plants. Genes Dev 16: 1616-26 113. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP. 2002. Prediction of plant microRNA targets. Cell 110: 513-20 114. Schauer SE, Jacobsen SE, Meinke DW, Ray A. 2002. DICER-LIKE : blind men and elephants in Arabidopsis development. Trends Plant Sci 7: 487-91

39 115. Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D. 2005. Specific Effects of MicroRNAs on the Plant Transcriptome. Dev Cell 8: 517-27 116. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. 2003. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199-208 117. Slack FJ, Basson M, Liu Z, Ambros V, Horvitz HR, Ruvkun G. 2000. The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol Cell 5: 659-69 118. Song JJ, Liu J, Tolia NH, Schneiderman J, Smith SK, et al. 2003. The crystal structure of the Argonaute2 PAZ domain reveals an RNA binding motif in RNAi effector complexes. Nat Struct Biol 10: 1026-32 119. Song JJ, Smith SK, Hannon GJ, Joshua-Tor L. 2004. Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305: 1434-7 120. Souret FF, Kastenmayer JP, Green PJ. 2004. AtXRN4 degrades mRNA in Arabidopsis and its substrates include selected miRNA targets. Mol Cell 15: 173-83 121. Stark A, Brennecke J, Russell RB, Cohen SM. 2003. Identification of Drosophila MicroRNA targets. PLoS Biol 1: E60 122. Suh MR, Lee Y, Kim JY, Kim SK, Moon SH, et al. 2004. Human embryonic stem cells express a unique set of microRNAs. Dev Biol 270: 488-98 123. Sunkar R, Girke T, Jain PK, Zhu JK. 2005. Cloning and Characterization of microRNAs from Rice. Plant Cell 17: 666-999 124. Sunkar R, Zhu JK. 2004. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16: 2001-19 125. Tabara H, Yigit E, Siomi H, Mello CC. 2002. The dsRNA binding protein RDE-4 interacts with RDE-1, DCR-1, and a DExH-box helicase to direct RNAi in C. elegans. Cell 109: 861-71 126. Tang G, Reinhart BJ, Bartel DP, Zamore PD. 2003. A biochemical framework for RNA silencing in plants. Genes Dev 17: 49-63 127. Telfer A, Poethig RS. 1998. HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125: 1889-98 128. Tomari Y, Zamore PD. 2005. MicroRNA biogenesis: drosha can't cut it without a partner. Curr Biol 15: R61-4 129. Vaucheret H, Vazquez F, Crete P, Bartel DP. 2004. The action of ARGONAUTE in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18: 1187-97 130. Vazquez F, Gasciolli V, Crete P, Vaucheret H. 2004. The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14: 346-51 131. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, et al. 2004. Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16: 69- 79 132. Verdel A, Jia S, Gerber S, Sugiyama T, Gygi S, et al. 2004. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303: 672-6 133. Volpe T, Schramke V, Hamilton GL, White SA, Teng G, et al. 2003. RNA interference is required for normal centromere function in fission yeast. Chromosome Res 11: 137-46

40 134. Volpe TA, Kidner C, Hall IM, Teng G, Grewal SI, Martienssen RA. 2002. Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297: 1833-7 135. Wang XJ, Reyes JL, Chua NH, Gaasterland T. 2004. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol 5: R65 136. Wightman B, Ha I, Ruvkun G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855-62 137. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, et al. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338-45 138. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, et al. 2004. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2: E104 139. Xie Z, Kasschau KD, Carrington JC. 2003. Negative feedback regulation of Dicer-Likel in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 13: 784-9 140. Yan KS, Yan S, Farooq A, Han A, Zeng L, Zhou MM. 2003. Structure and conserved RNA binding of the PAZ domain. Nature 426: 468-74 141. Yekta S, Shih IH, Bartel DP. 2004. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304: 594-6 142. Yi R, Qin Y, Macara IG, Cullen BR. 2003. Exportin-5 mediates the nuclear export of pre- microRNAs and short hairpin RNAs. Genes Dev 17: 3011-6 143. Yu B, Yang Z, Li J, Minakhina S, Yang M, et al. 2005. Methylation as a crucial step in plant microRNA biogenesis. Science 307: 932-5 144. Zamore PD, Tuschl T, Sharp PA, Bartel DP. 2000. RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101: 25-33 145. Zhong R, Ye ZH. 2004. Amphivasal vascular bundle 1, a gain-of-function mutation of the IFLI REV gene, is associated with alterations in the polarity of leaves, stems and carpels. Plant Cell Physiol 45: 369-85 146. Zilberman D, Cao X, Jacobsen SE. 2003. ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299: 716-9 1.47. Zilberman D, Cao X, Johansen LK, Xie Z, Carrington JC, Jacobsen SE. 2004. Role of Arabidopsis ARGONAUTE4 in RNA-directed DNA methylation triggered by inverted repeats. Curr Biol 14: 1214-20

41 Prediction of Plant MicroRNA Targets

Matthew W. Rhoadesl2 , Brenda J. Reinhart, Lee P. Lim1 2, Christopher B. Burge2, Bonnie Bartel3' 4, and David P. Bartel' 4

'Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142

2Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

3 Department of and Cell Biology, Rice University, 6100 Main St., Houston, Texas 77005

4To whom correspondence should be addressed. [email protected], 617-258-5287, fax 617-258-6768 [email protected], 713-348-5602

42 Summary We predict regulatory targets for 14 Arabidopsis microRNAs (miRNAs) by identifying mRNAs with near complementarity. Complementary sites within predicted targets are conserved in rice. Of the 49 predicted targets, 29 are members of transcription factor gene families involved in developmental patterning or cell differentiation. The near-perfect complementarity between plant miRNAs and their targets suggests that many plant miRNAs act similarly to small interfering RNAs and direct mRNA cleavage. The targeting of developmental transcription factors suggests that many plant miRNAs function during cellular differentiation to clear maternal regulatory transcripts from daughter cell lineages.

Introduction Nearly 200 genes for tiny, noncoding RNAs termed microRNAs (miRNAs) have been identified in animals and plants (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Lagos-Quintana et al., 2002; Llave et al., 2002; Mourelatos et al., 2002; Reinhart et al., 2002). Two miRNAs, lin-4 and let-7 RNAs, have been studied in detail; both control developmental timing in C. elegans through a mechanism that involves imperfect base pairing to the 3' UTRs of target mRNAs (Lee et al., 1993; Wightman et al., 1993; Ha et al., 1996; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000). The remaining miRNAs have unknown functions. Nonetheless, their sequences are typically conserved among different species, and many have intriguing expression patterns in different tissues or stages of development, indicating that these other miRNAs have important functions and might also modulate gene expression. This idea is supported by the observation that Dicer and Argonaute proteins, which are known to be crucial for normal plant and animal development, are needed for proper miRNA accumulation (Robinson-Beers et al., 1992; Ray et al., 1996; Ray et al., 1996; Jacobsen et al., 1999; Grishok et al., 2001; HutvAgner et al., 2001; Ketting et al., 2001; Knight and Bass, 2001; Reinhart et al., 2002). The major challenge in determining miRNA functions is to identify their regulatory targets. By analogy to lin-4 and let-7 RNAs, it is reasonable to suppose that miRNAs generally recognize their regulatory targets through base pairing. However, the small size of the mature miRNAs (20-24 nt) and the imperfect nature of miRNA:mRNA base pairing have hampered the general prediction of mRNA targets for animal miRNAs. Thus far, prediction of animal miRNA

43 targets has been achieved only after experimental evidence narrowed the number of candidate mRNAs to a small set, either by placing the mRNAs within the same regulatory pathway as the miRNA or by identifying regulatory elements within mRNA 3'-UTRs (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000; Lai, 2002). An indication that target prediction for certain plant miRNAs might be more straight-forward came with the recent identification of miR171, a plant miRNA with perfect antisense complementarity to the mRNAs of three SCARECROW-like transcription factors (Llave et al., 2002; Reinhart et al., 2002). Here we report that near complementarity to mRNAs, particularly transcription factor mRNAs, is a general trend for plant miRNAs. We have been able to identify potential regulatory targets for 14 of the 16 miRNAs studied by searching for mRNAs capable of base pairing with three or fewer mismatches to one of the miRNAs. The fact that many of these potential targets are members of gene families with roles in plant development supports the idea that the function of miRNAs in mediating development is conserved across kingdoms. Particularly compelling targets include the PHABULOSA and PHAVULOTA mRNAs, for which the identification of miRNA complementary sites may explain the ectopic expression previously described for mutations in these genes (McConnell et al., 2001). Similar analysis of animal miRNAs did not predict animal regulatory targets, suggesting mechanistic differences between plant and animal miRNA function.

Results and Discussion Plant MicroRNAs Have Significant Complementarity to Messenger RNAs To identify potential regulatory targets, we searched for Arabidopsis mRNAs that were complementary, with four or fewer mismatches, to at least one of 16 recently identified Arabidopsis miRNAs (Reinhart et al., 2002). Gaps were not allowed, and G:U and other non- canonical pairs were treated as mismatches. To evaluate the significance of these hits to annotated mRNAs, parallel analyses were performed using cohorts of randomly permuted sequences that had identical sizes and base compositions as the set of authentic miRNAs. There were substantially more antisense hits to the authentic miRNAs than to the randomized sequences (Figure 1). This difference was especially striking at higher stringency; when summing the hits with two or fewer mismatches, the number of hits to the authentic miRNA set

44 outnumbered those to the randomized cohorts by a ratio of 30:0.2 (Figure 1). Considering the low probability of so many antisense hits occurring by chance, we suggest that these complementary sites reflect a functional relationship between the miRNAs and the identified mRNAs-that these protein-coding genes are regulatory targets of the miRNAs to which they can potentially base pair. At lower stringencies, there were also significantly more hits with the authentic set of miRNAs than with the randomized cohorts. Most of the 31 hits with three mismatches are viable miRNA target candidates, although a few are likely to be mRNAs with fortuitous complementarity, as judged by the observation that on average the randomized cohorts hit 4.2 mRNAs when three mismatches were permitted (Figure 1). Some hits with four mismatches might also be genuine targets. However, they are not included in the present analysis because of the greater likelihood that their complementarity is fortuitous or occurs because they are targets of unidentified miRNAs related to our query set of 16 miRNAs. Potential regulatory targets with three or fewer mismatches were found for 14 of the 16 miRNAs (Table 1). Targets for the other two miRNAs might be identified through slight changes in the search algorithm. For example, miR163, one of the two miRNAs without predicted targets in Table 1, has extensive complementarity to members of the AtPP-like gene family (Atlg66690, Atlg66700, Atlg66720, At3g44860, At3g44870), which have unknown functions (Cui et al., 1999). All 24 nucleotides of this miRNA paired to complementary sites within these mRNAs when a single-nucleotide gap was permitted near the 3' terminus of the miRNA. Nonetheless, when searching for miRNA targets, permitting gaps did not substantially increase the number of targets predicted for the other miRNAs (data not shown). Perhaps a bulge is accommodated near the miRNA terminus more readily for miR163 because this miRNA is 24 nt in length, which is 3 nt longer than the other miRNAs queried. In all cases where an miRNA was complementary to more than one mRNA, most of the potential targets were members of the same gene family (Table 1). The fraction of the gene family members with miRNA complementary sites varied considerably. Of the 16 Squamosa Promoter Binding Protein (SBP)-like genes in Arabidopsis (Riechmann et al., 2000), 10 have miR156 complementary sites. In contrast, the MYB and NAC families each have over 100 members in Arabidopsis (Riechmann et al., 2000), of which five in each case have sites complementary to miR159 or miR164, respectively. As more miRNAs are identified it will be

45 interesting to learn whether remaining members of these gene families have complementary sites to other miRNAs. In support of this possibility, unrelated miRNAs can be complementary to different members of the same gene family, as illustrated by miR160 and miR167, which apparently target different members of the Auxin Response Factor family (Ulmasov et al., 1999). When considering the significance of multiple hits to the same gene family, it is important to address the possibility that these hits are merely the consequence of complementarity to a nucleotide sequence that encodes a critical protein motif. Indeed, for :miR161,miR165, miR170, and miR171, the miRNA complementary sites were within the context of a domain strongly conserved among family members, as shown for the miR165 complementary sites (Figure 2A). Therefore, we can not rule out the possibility that only a subset of the hits for these miRNAs are authentic targets. This possibility is less likely in the cases of miR156, miR157, miR159, miR160, miR164, and miR169. The complementary sites for these miRNAs fell outside the conserved domains that define the families and instead fell within sequence contexts that were only weakly conserved among the family members, as shown for the miR156 sites within SBP-like mRNAs (Figure 2B). Indeed, there are examples where the conservation of the miRNA complementary sites among family members must be independent of conserved protein function. In the case of the MYB genes with miR159 complementary sites, four genes translate the complementary site in the same reading frame, while the fifth gene translates the site in a different reading frame. In four other cases (miR156/157 to Atlg53160, miR156 to At2g33810, and miR169 to Atlg17590 and Atlg54160), the miRNA complementary sites are not in the coding regions at all but rather in the 3'-UTRs, as illustrated for miR156 and its complementary sites (Figure 2B).

MicroRNA Complementary Sites Are Conserved Among Flowering Plants Many complementary sites observed in Arabidopsis are conserved in rice (Oryza sativa). Analysis of rice homologs focused on the seven miRNAs perfectly conserved in Oryza (Reinhart et al., 2002) for which complementary sites had been identified in Arabidopsis (Table 1). When using a three-mismatch cutoff, six of the seven conserved miRNAs (miR156, miR160, miR164, miR167, miR169, miR171) have at least one potential target gene in Oryza homologous to a corresponding Arabidopsis target. In an analogous control study using 44 hits to the randomized

46 cohorts, there were no miRNA complementary sites in rice homologs of the Arabidopsis hits, even when four mismatches were allowed. The location of the miRNA complementary sites within the mRNAs was conserved between Arabidopsis and rice. Importantly, when there were differences between Arabidopsis and rice complementary sites within homologous genes, these differences were distributed evenly across the three codon positions (Table 2). Homologous regions under selection only at the protein level tend to exhibit a higher frequency of differences at codon position 3. Thus, the even distribution of mismatches across the codon positions indicates selection occurring at the nucleic acid level, in addition to any selection at the protein level, as would be expected if these segments act in miRNA recognition.

Most Predicted MicroRNA Targets Are Members of Transcription Factor Families Involved in Development Perhaps the most intriguing evidence that these genes are regulatory targets of the miRNAs is the identity of the genes themselves. MicroRNA complementary sites were found in 61 mRNAs, which, due to overlap between similar miRNAs, represent 49 unique genes (Table 1). Of these 49 predicted targets, 29 are known or putative transcription factors (Table 1), even though transcription factors are thought to represent only 6% of protein-coding genes in Arabidopsis (Riechmann et al., 2000). Many of these genes specify shoot and floral meristem development or, for those with unknown functions, are in a family that has members involved in meristem development. For example, the predicted targets of miR164 include CUP-SHAPED COTYLEDON2 (CUC2), which is required for shoot apical meristem formation (Aida et al., 1.997),and miR165 predicted targets include PHABULOSA (PHB) and PHAVOLUTA (PHV), which encode HD-Zip transcription factors that regulate axillary meristem initiation and leaf development (McConnell et al., 2001). A miR159 predicted target, AtMYB33, can bind to the promoter of the floral meristem identity gene LEAFY (Gocal et al., 2001). Homologs of the SBPs, which are thought to regulate the Antirrhinum floral meristem identity gene SQUAMOSA (Klein et al., 1996), may in turn be regulated by miR156 and miR157. Genetic evidence supports the regulatory roles of miR165 complementary sites within PHB and PHV (Figure 2A). Multiple gain-of-function alleles have been isolated for both genes, and each of these mutations disrupts the miR165 complementary site, usually as a single-

47 nucleotide substitution (McConnell et al., 2001). In the mutant examined, phb mRNA expression extends more broadly than in wild type (McConnell et al., 2001), suggesting that complementarity to miR165 is required for confining PHB mRNA accumulation to the proper cell types. A connection between miRNAs and meristem development is consistent with the phenotypes of the Arabidopsis carpelfactory (caj) mutant. Dicer and CAF are homologous RNaseIII-domain proteins required for the accumulation of mature miRNAs in animals and plants, respectively (Hutvdgner and Zamore, 2002; Reinhart et al., 2002). Mutant alleles of CAF, which is also known as SHORT INTEGUMENT] (SIN1), delay the meristem switch from vegetative to floral development and cause over-proliferation of the floral meristem (Ray et al., 1996; Jacobsen et al., 1999). Other genes required for miRNA accumulation in animals are homologs of the Arabidopsis gene ARGONAUTE (AGOI), which is required for axillary shoot meristem formation and leaf development in Arabidopsis (Bohmert et al., 1998). While AGO1 has not yet been reported to influence miRNA accumulation in plants, it is a predicted target of miR168 (Table 1), suggesting a negative-feedback mechanism for controlling expression of the AGO1 gene. Other predicted targets of miRNAs do not have direct roles in meristem identity but rather could have roles in cell division or differentiation. For example, miR160 and miR167 are predicted to target auxin response factors, DNA-binding proteins that are thought to control transcription in response to the phytohormone auxin (Ulmasov et al., 1999). Transcriptional regulation is important for many of the diverse developmental responses to auxin signals, which include cell elongation, division, and differentiation in both roots and shoots (Rogg and Bartel, 2001; Liscum and Reed, 2002). The predicted targets of miR170 and miR171 are three SCARECROW-like proteins, a family of transcription factors whose members have been implicated in radial patterning in roots, signaling by the phytohormone gibberellin, and light signaling (Di Laurenzio et al., 1996; Peng et al., 1997; Silverstone et al., 1998; Bolle et al., 2000; Helariutta et al., 2000). Overall, the high percentage of predicted miRNA targets that act as developmental regulators suggest that miRNAs are involved in a wide range of cell division and cell fate decisions throughout the plant.

48 Mechanistic and Functional Models for Regulation by MicroRNAs in Plants The success in identifying potential miRNA targets in Arabidopsis prompted us to examine whether our simple computational approach could also identify miRNA targets in C. elegans and D. melanogaster. In both organisms, the miRNAs had few mRNA hits with complementary sites--essentially the same number of hits as seen for randomized cohorts (data not shown). While the possibility that a few animal miRNAs will recognize their targets with near-perfect complementarity cannot be excluded, the general phenomenon of near-perfect complementarity appears to be specific to plants. Two other key differences emerge when comparing the predicted target sites of plant miRNAs with those of the C. elegans lin-4 and let-7 miRNAs. First, the plant complementary sites are primarily, though not exclusively, within the ORFs, whereas the only proposed lin-4 and let-7 sites are within 3' UTRs (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000). Second, multiple sites 'within the same target mRNA are not detected in plants, whereas there are typically multiple lin- 4 and let-7 sites within each mRNA target (Lee et al., 1993; Wightman et al., 1993; Ha et al., 1996; Reinhart et al., 2000; Slack et al., 2000). These differences observed between plant and animal miRNA target recognition have intriguing mechanistic implications for plant miRNA function (Figure 3A). Namely, plant miRNA target recognition appears to resemble that of small interfering RNAs (siRNAs) much more than that of animal miRNAs. During RNA interference (RNAi), long double-stranded RNA is processed by Dicer into -22-nt siRNAs, which serve as guide RNAs to target homologous mRNA sequences for cleavage (Bernstein et al., 2001; HutvAgnerand Zamore, 2002). Importantly, targeting either the ORF or the UTRs is effective (McManus et al., 2002), provided that the siRNA has near-perfect complementarity to the targeted mRNA (Elbashir et al., 2001). Plants also have siRNAs. Indeed, these tiny RNAs were first observed in plants and are associated with a process related to RNAi, known as posttranscriptional gene silencing (PTGS), which leads to the destruction of mRNA from plant viruses and transgenes (Hamilton and Baulcombe, 1999; Matzke et al., 2001). Plant miRNAs resemble animal miRNAs in their biogenesis, in that they are derived from endogenous, evolutionarily conserved genes and are processed from stem-loop precursors by a Dicer homolog, with accumulation of mature miRNA from only one arm of the precursor stem-loop (Reinhart et al., 2002). However, plant miRNAs resemble siRNAs in their target recognition, suggesting that they might also resemble siRNAs in

49 their mechanism of action (Figure 3A). We propose that many plant miRNAs hybridize to mRNAs with near-perfect complementarity and target the mRNAs for cleavage. A function in mediating RNA cleavage might allow the plant miRNAs to target any region of the mRNA, whereas the animal miRNAs that mediate translational attenuation might be relegated to 3'- UTRs in order to avoid the mRNA-clearing activity of ribosomes. The efficiency and finality of mRNA cleavage might require only a single complementary site in each message, whereas the regulatory mechanism of lin-4 and let-7 miRNAs, which leaves the mRNA intact, might generally require multiple target sites. In presenting this hypothesis, we leave open the possibility that some plant miRNAs might not specify cleavage of their regulatory targets, and some might specify cleavage of some targets but employ other mechanisms to regulate other targets. Targets with many mismatches, analogous to the targets of lin-4 and let-7 miRNAs, would not have been detected in our analysis. Furthermore, some mismatches for the predicted targets are near the center of the complementary sites (Table 2, data not shown) and might be expected to abrogate siRNA-mediated mRNA cleavage (Elbashir et al., 2001). However, it is difficult to know whether these mismatches are incompatible with mRNA cleavage because the types and locations of mismatches permissive for siRNA-mediated cleavage are still being determined in animals and have not yet been explored in plants. In those cases where the miRNAs might not be mediating mRNA cleavage, they might attenuate translation (Olsen and Ambros, 1999), act as guide RNAs for mRNA modifications (Kiss, 2002), or target DNA for epigenetic modifications, such as methylation (Matzke et al., 2001). Although DNA targeting cannot be excluded as an additional miRNA function for some miRNAs, two observations argue strongly for a role in targeting mRNAs in addition to any possible role in targeting DNA. First, plant miRNAs are complementary to the sense rather than antisense strands of mRNAs (data not shown). Second, the complementary sites for miR165 and miR166 span a splice junction within each of the HD-Zip mRNAs. The observation that many plant miRNAs potentially target the mRNAs of transcription factors involved in development suggests that some miRNAs might function to clear maternal regulatory transcripts from certain daughter-cell lineages (Figure 3B). Through the action miRNAs, these inherited mRNAs could be eliminated without relying on constitutively unstable messages. Now that potential miRNA binding sites in some of these developmentally important transcription factor mRNAs have been identified, it should be possible to test this speculative

50 model by disrupting the miRNA complementarity site in the mRNA without changing the protein sequence of the transcription factor. The miRNAs analyzed here are likely to be only a small fraction of the miRNAs in Arabidopsis (Llave et al., 2002; Reinhart et al., 2002). Nonetheless, the discovery that so many of these plant miRNAs appear to have readily identifiable regulatory targets will greatly facilitate experimental investigation of the functions of these tiny noncoding RNAs and the many other miRNAs remaining to be found in plants. With the ability to computationally identify candidate targets, the presumed roles of miRNAs in development can be more readily explored, and roles of miRNAs in other processes can be more readily uncovered.

Experimental Procedures Identification of miRNA Complementary Sites in Annotated mRNAs The set of annotated Arabidopsis mRNA sequences was extracted from the genomic GenBank files, January 2002 release (Arabidopsis Genome Initiative, 2000). This set was searched for complementary sites to any of 16 miRNAs (GenBank accession numbers AJ493620-AJ493656) using Patscan (Dsouza et al., 1997). When the miRNA was cloned as either a 20- or 21-nt RNA, the 21-nt RNA was used (Reinhart et al., 2002). Thus, the miR158 sequence was 20 nt, the miR163 sequence was 24 nt, and the remaining 14 miRNA sequences were 21 nt. One mismatch was added to all miR158 complementary sites to compensate for their smaller size and the correspondingly greater chance of fortuitous complementarity. Complementary sites were also found for 10 cohorts of 16 randomly permuted sequences that had identical sizes and base compositions to the authentic miRNAs. One mismatch was added to the sites complementary to the randomly permuted versions of miR158. Analogous searches for animal miRNA complementary sites queried annotated D. melanogaster mRNAs (GenBank October 2000 release) and annotated C. elegans coding regions (GenBank April 1999 release).

Identification of Homologous miRNA Complementary Sites in Oryza mRNAs For each Arabidopsis target mRNA, the mRNAs of up to 10 homologous Oryza proteins were predicted from the unannotated Oryza contigs (Yu et al., 2002) by GenomeScan, a program which identifies genes within genomic sequence using homology to input protein sequences combined with an ab initio gene-finding algorithm (Yeh et al., 2001). Complementary sites in

51 this dataset were identified by PatScan searches, and homology to the Arabidopsis targets was confirmed by alignment of the inferred protein sequences (ClustalX). One additional target homolog (TC79868) was found by searching the TIGR Rice Gene Index (9.0).

Acknowledgments We thank Earl Weinstein and Ru-Fang Yeh for computer scripts and helpful discussions. This work was supported by grants from David H. Koch Cancer Research Fund (DBP), the Alexander and Margaret Stewart Trust (DBP), the Robert A. Welch Foundation (BB), and the NIH (CBB).

52 Table 1. Potential Regulatory Targets of Arabidopsis miRNAs

_. MicroRNA Target protein family Target gene names (number of mismatches) . miR156 SQUAMOSA-PROMOTER At3g57920 (1), At2g42200/SPL9 (1), At5g50570 (1), At5g50670 (1), BINDING PROTEIN (SBP) At1g53160/SPL4 (2), At2g33810/SPL3 (2), At1g27370/SPL10 (2), like proteins At5g43270/SPL2 (2), Atlg69170/SPL6 (2), Atlg27360/SPL11 (2)

miR157 SQUAMOSA-PROMOTER Atlg27370/SPL10 (1), At3g57920 (1), At2g42200/SPL9 (1), At5g43270/SPL2 (1), BINDING PROTEIN (SBP) At1 g27360/SPL11 (1), At1g69170/SPL6 (2), At5g50570 (2), At5g50670 (2), like proteins At1g53160/SPL4 (3)

Putative RNA helicase At5g08620 (3)

Unknown proteins At3g47170 (3), Atl g22000 (3)

miR158 Unknown protein Atlg64100 (3)

miR159 MYB proteins At2g32460/AtMYB101 (2), At3g60460 (3), At2g26950/AtMYB104 (3), At5g06100/AtMYB33 (3), At3g11440/AtMYB65 (3)

Unknown protein Atlg29010 (3)

miR160 Auxin Response Factors At1g77850/ARF17 (1), At2g28350/ARF10 (2), At4g30080 (3)

miR161 PPR repeat proteins At1 g63150 (3), At1g63400 (3), Atlg06580 (3), At1g64580 (3), At5g16640 (3), Atlg62670 (3), Atlg62720 (3), At5g41170 (3), Atlg63080 (3)

miR164 NAC domain proteins At5g61430 (2), At5g07680 (2), Atlg56010/NAC1 (2), At3g15170 (3), At5g53950/CUC2 (3)

miR165 HD-Zip transcription factors At5g60690/REV (3), At3g34710/PHB (3), At4g32880/ATHB-8 (3), At1g30490/PHV (3)

miR166 HD-Zip transcription factor At1 g52150/ATHB-15 (3)

miR167 Auxin Response Factor At5g37020/ARF8 (3)

miR168 ARGONAUTE Atlg48410/AGO (3)

miR169 CCAAT Binding Factor At1g17590 (3), Atlg54160 (3) (CBF)-HAP2-like proteins

miR170 GRAS domain proteins At2g45160 (2), At3g60630 (2), At4g00150/SCL6 (2) (SCARECROW-like)

miR171 GRAS domain proteins At2g45160 (0), At3g60630 (0), At4g00150/SCL6 (0) (SCARECROW-like) For each gene, the number of mismatches between the miRNA and the mRNA is indicated in parentheses. The sequences of three pairs of miRNAs (miR156/miR157, miR165/miR166, and miR170/miR171) are closely related and therefore sometimes complementary to the same sites within the target mRNAs. Sites complementary to miR158 had an additional mismatch added to compensate for the fact that miR158 is at least 1 nt shorter than the other miRNAs.

53 Table2. MicroRNAComplementary Sites in PotentialmRNA TargetsConserved Between Arabidopsis and Oryza Target RNAsequence of Peptide gene complementarysite sequence miR156 uU GCUCAc ucU cUu CUG UCA At5g50570(1) UGU GCU CuC UCU CUU CUG UCA CALSLLS At5g50670(1) UGU GCU CuC UCU CUU CUG UCA CALSLLS At3g57920(1) UGU GCU CuC UCU CUU CUG UCA CALSLLS At2g42200(1) UGU GCU CuC UCU CUU CUG UCA CALSLLS Atlg27370 (2) aGU GCU CuC UCU CUU CUG UCA SALSLLS At1g27360(2) cGU GCU CuC UCU CUU CUG UCA RALSLLS At5g43270(2) gGU GCU CuC UCU CUU CUG UCA GALSLLS At1g69170(2) cGU GCU CuC UCU CUU CUG UCA RALSLLS At2g33810(2) UuU GCU uAC UCU CUU CUG UCA 3' UTR At1g53160(2) UcU GCU CuC UCU CUU CUG UCA 3' UTR Os20095 (1) UGU GCU CuC UCU CUU CUG UCA CALSLLS Os06618 (1) UGU GCU CuC UCU CUU CUG UCA CALSLLS Os02878 (1) UGU GCU CuC UCU CUU CUG UCA CALSLLS Os 25470(2) gGU GCU CuC UCU CUU CUG UCA GALSLLS miR160 U GGC AUA CAG GGAGCC AGG CA Atlg77850(1) U GGCAUg CAG GGAGCC AGG CA AGMQGARQ At2g28350(2) a GGa AUA CAG GGA GCC AGG CA AGIQGARQ At4g30080(3) g GGu uUA CAG GGAGCC AGG CA VGLQGARH OsTC73519(1) a GGC AUA CAG GGA GCC AGG CA AGIQGARH OsTC70631 (1) a GGC AUA CAG GGA GCC AGG CA AGIQGARH Os17478 (1) a GGC AUA CAG GGA GCC AGG CA AGIQGARH Os 02679(1) a GGC AUA CAG GGA GCCAGG CA AGIQGARH miR164 UG CAC GUG CCC UGC UUC UCC A Atlg56010(2) aG CAC GUaCCC UGC UUC UCC A EHVPCFSN At5g07680(2) Uu uAC GUG CCC UGC UUCUCC A VYVPCFSN At5g61430(2) Uc uAC GUG CCC UGC UUC UCC A VYVPCFSN At3gl 5170 (3) aG CAC GUG uCC UGu UUC UCC A EHVSCFSN At5g53950(3) aG CAC GUG uCC UGu UUC UCC A EHVSCFST Os 00116(2) cG CAC GUG aCC UGC UUC UCC A AHVTCFSN miR167 U AGA UCA UGC UGG CAG CUU CA At5g37020(3) U AGA UCA gGC UGG CAG CUl gu LRSGWQLV OsTC79868 (3)U AGA UCA gGC UGG CAG CUU gu DRSGWQLV miR169 UCG GCA AGU CAU CCU UGG CUG At1g17590(3) aaG GgA AGU CAU CCU UGG CUG 3' UTR At1g54160(3) aCG GgAAGU CAU CCU UGG CUa 3' UTR Os 04048(3) UaG GCA AcU CAU uCU UGG CUG 3' UTR Os09843 (3) UaG GCA AuU CAU CCU UGG CUu 3' UTR miR171 G AUA UUG GCG CGG CUC AAU CA At2g45160(0) G AUA UUG GCG CGG CUC AAU CA GILARLNH At3g60630(0) G AUA UUG GCG CGG CUC AAU CA GILARLNH At4g00150(0) G AUA UUG GCG CGG CUC AAU CA GILARLNQ OsTC76755(0)G AUA UUG GCG CGG CUC AAU CA EILARLNQ OsTC81772(0) G AUA UUG GCG CGG CUC AAU CA EILARLNH Os00711 (0) G AUA UUG GCG CGG CUC AAU CA EILARLNQ Os 12185(0) G AUA UUG GCG CGG CUC AAU CA EILARLNQ OsTC75254(1)G AUA UUG GCGCGG CUC AAU uA EILARLNY For eachgene, the nucleotidesequence of the miRNA complementarysite is brokeninto codonscorresponding to the readingframe of the mRNA. The reversecomplement is shownfor eachmiRNA, and for each complementarysite, mismatches are shownin lowercase. The peptidesequence of the miRNA complementarysite is shown. Oryzagenes are labeledeither by their tentativeconsensus (TC) numbersfrom the TIGRrice gene index(version 9.0) or by the genomiccontig of the mRNApredicted by GenomeScan.

54 Figure Legends Figure 1. Antisense Hits Between Arabidopsis miRNAs and Annotated mRNAs Annotated Arabidopsis mRNAs were searched for sites complementary to 16 Arabidopsis miRNAs with 0-4 mismatches (solid bars). Identical searches with cohorts of 16 randomized RNAs were also performed (open bars, mean values from 10 cohorts; error bars, one standard deviation). Note that two hits by similar miRNAs to the same complementary site within an mRNA were counted as separate hits (Table 1).

Figure 2. Sequence Context of miRNA Complementary Sites (A) The four miR165 complementary sites. These complementary sites lie within the START domain present in a subfamily of HD-Zip transcription factors. The altered protein sequences of the reported phv and phb gain-of-function alleles are indicted (McConnell et al., 2001). Each of these lesions also disrupts the miR165 complementary site. Amino acids conserved in a majority of the proteins are highlighted. (B) The miR156 complementary sites. All ten predicted targets contain the Squamosa Promoter Binding (SBP) box, but the complementary sites are downstream of this conserved domain, within a poorly conserved protein-coding context or the 3'-UTR. Amino acids conserved in a majority of the proteins are highlighted.

Figure 3. Models for the Biogenesis, Action, and Roles of miRNAs in Plants (A) Although plant miRNAs are apparently generated through the classical miRNA pathway (Reinhart et al., 2002), we propose that many act as classical siRNAs, pairing with near-perfect complementarity to their mRNA targets to specify mRNA cleavage. (B) Plant miRNAs might target transcription factor mRNAs for cleavage following cell divisions that require rapid implementation of new transcription factor programs. Following cell division, the daughter cells inherit transcription factor mRNAs from the precursor cell. At the onset of differentiation, one daughter might express not only new transcription factor mRNAs (green) but also miRNAs (red) complementary to mRNAs of key maternal transcription factors (blue). The miRNAs might direct the cleavage of the inherited transcription factor mRNA, preventing the inappropriate expression of the transcription factor protein, thus enabling the rapid differentiation of the daughter cell.

55 Rhoades et. al., Figure 1

100

80

Cn60 4-. z cr 40 I E

20 I' 17

3 l O0 0 Ai -- 0 1 2 3 4 Stringency of pairing (# of mismatches) Rhoades et. al., Figure 2

A

rn-rn

,/1miR 165 compleme si

REV ATHB-8 PHV PHB i 1 e i tN 11 aa insertion in phb - j LG to D mutation in phv gain-of-function allele and phb gain-of-function alleles

B

m m - X

miR156 complementary site

AtSg50570 SRTASLC LQPPLSLSQEA At5g50670 ISRTASLC LQPPLSLSQEA AtIg27370 PDKGVGEC HTPVAEPPPIF At1g27360 HGEDVGEY HVQPFSLLCSY At5g43270 FSKEKVTI DQPRRFTLDHH At3g57920 SSFTTCP LQTPTNTWRPS At2g42200 IPEIMDTK NNNNNNNNNNN At1g69170 ITEVSSIW FPNTTFSITQP At2g33810 miR156 complementary site in 3'-UTR Atlg53160 miR156 complementary site in 3 -UTR Rhoades et. al., Figure 3

MicroRNA Pathway PTGS/RNAi Pathway

Precursor cell expressing miRNA precursor Long double-stranded RNA transcription factor mRNAs

miRNA Many plant siRNAs miRNAs I Cap AAA. A Cap '- - AAA A I! Short segments of complementarity in 3 -UTR Near-perfect complementarity in coding region or UTR

CapnAAA. .A Cap •_*MSN AAA AA Daughter cells with distinct Cleaved mRNA transcription factor profiles Attenuated translation

I',' S References

Aida, M., Ishida, T., Fukaki, H., Fujisawa, H., and Tasaka, M. (1997). Genes involved in organ separation in Arabidopsis: an analysis of the cup-shaped cotyledon mutant. Plant Cell 9, 841- 857.

The Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.

Bernstein, E., Denli, A. M., and Hannon, G. J. (2001). The rest is silence. RNA 7, 1509-1521.

Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J. 17, 170-180.

Bolle, C., Koncz, C., and Chua, N. H. (2000). PAT1, a new member of the GRAS family, is involved in phytochrome A signal transduction. Genes Dev. 14, 1269-78.

Cui, Y., Brugiere, N., Jackman, L., Bi, Y.-M., and Rothstein, S. (1999). Structural and transcriptional comparative analysis of the S locus regions in two self-incompatible Brassica napus lines. Plant Cell 11, 2217-2231.

Di Laurenzio, L., Wysocka-Diller, J., Malamy, J. E., Pysh, L., Helariutta, Y., Freshour, G., Hahn, M. G., Feldmann, K. A., and Benfey, P. N. (1996). The SCARECROW gene regulates an asymmetric cell division that is essential for generating the radial organization of the Arabidopsis root. Cell 86, 423-33.

Dsouza, M., Larsen, N., and Overbeek, R. (1997). Searching for patterns in genomic data. Trends in Genetics 13, 497-498.

59 Elbashir, S., Martinez, J., Patkaniowska, A., Lendeckel, W., and Tuschl, T. (2001). Functional anatomy of siRNAs for mediating efficient RNAi in Drosophila melanogaster embryo lysate. EMBO J. 20, 6877-6888.

Gocal, G. F., Sheldon, C. C., Gubler, F., Moritz, T., Bagnall, D. J., MacMillan, C. P., Li, S. F., Parish, R. W., Dennis, E. S., Weigel, D., and King, R. W. (2001). GAMYB-like genes, flowering, and gibberellin signalling in Arabidopsis. Plant Physiol. 127, 1682-1693.

Grishok, A., Pasquinelli, A. E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D. L., Fire, A., Ruvkun, G., and Mello, C. C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34.

Ha, I., Wightman, B., and Ruvkun, G. (1996). A bulged lin-4/lin-14 RNA duplex is sufficient for Caenorhabditis elegans lin-14 temporal gradient formation. Genes Dev. 10, 3041-3050.

Hamilton, A. J., and Baulcombe, D. C. (1999). A novel species of small antisense RNA in posttranscriptional gene silencing. Science 286, 950-952.

Helariutta, Y., Fukaki, H., Wysocka-Diller, J., Nakajima, K., Jung, J., Sena, G., Hauser, M. T., and Benfey, P. N. (2000). The SHORT-ROOT gene controls radial patterning of the Arabidopsis root through radial signaling. Cell 101, 555-67.

Hutvdgner, G., McLachlan, J., Pasquinelli, A. E., Balint, E., Tuschl, T., and Zamore, P. D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838.

Hutvdgner, G., and Zamore, P. D. (2002). RNAi: nature abhors a double-strand. Curr. Opin. Genet. Dev. 12, 225-232.

60 Jacobsen, S. E., Running, M. P., and Meyerowitz, E. M. (1999). Disruption of an RNA helicase/RNAseIII gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126, 5231-5243.

Ketting, R. F., Fischer, S. E. J., Bernstein, E., Sijen, T., Hannon, G. J., and Plasterk, R. H. A. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev. 15, 2654-2659.

Kiss, T. (2002). Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell 109, 145-148.

Klein, J., Saedler, H., and Huijser, P. (1996). A new family of DNA binding proteins includes putative transcriptional regulators of the Antirrhinum majus floral meristem identity gene SQUAMOSA. Mol. Gen. Genet. 250, 7-16.

Knight, S., and Bass, B. (2001). A Role for the RNase III Enzyme DCR-1 in RNA Interference and Germ Line Development in Caenorhabditis elegans. Science 293, 2269-2271.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.

Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. (2002). Identification of tissue-specific microRNAs from mouse. Curr. Biol. 12, 735-739.

Lai, E. C. (2002). MicroRNAs are complementary to 3'UTR motifs that mediate negative post- transcriptional regulation. Nat. Gen. 30, 363-364.

Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

61 Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, R. C., Feinbaum, R. L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Liscum, E., and Reed, J. W. (2002). Genetics of Aux/IAA and ARF action in plant growth and development. Plant Mol. Biol. 49, 387-400.

Llave, C., Kasschau, K., Rector, M., and Carrington, J. (2002). Endogenous and silencing- associated small RNAs in plants. Plant Cell 14, 1-15.

Matzke, M. A., Matzke, A. J., Pruss, G. J., and Vance, V. B. (2001). RNA-based silencing strategies in plants. Curr. Opin. Genet. Dev. 11, 221-227.

McConnell, J. R., Emery, J., Eshed, Y., Bao, N., Bowman, J., and Barton, M. K. (2001). Role of PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411, 709- 713.

McManus, M. T., Petersen, C. P., Haines, B. B., Chen, J., and Sharp, P. A. (2002). Gene silencing using micro-RNA designed hairpins. RNA 8, 842-850.

Moss, E., Lee, R., and Ambros, V. (1997). The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88, 637-646.

Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 16, 720-728.

62 Olsen, P. H., and Ambros, V. (1999). The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN- 14 protein synthesis after the initiation of translation. Dev. Biol. 216, 671-680.

Peng, J., Carol, P., Richards, D. E., King, K. E., Cowling, R. J., Murphy, G. P., and Harberd, N. P. (1997). The Arabidopsis GAI gene defines a signaling pathway that negatively regulates gibberellin responses. Genes Dev. 11, 3194-205.

Ray, A., Lang, J. D., Golden, T., and Ray, S. (1996). SHORT INTEGUMENT (SIN1), a gene required for ovule development in Arabidopsis, also controls flowering time. Development 122, 2631-2638.

Ray, S., Golden, T., and Ray, A. (1996). Maternal effects of the short integument mutation on embryo development. Dev. Biol. 180, 365-369.

Reinhart, B., Weinstein, E., Rhoades, M., Bartel, B., and Bartel, D. (2002). MicroRNAs in plants. Genes Dev. 16, 1616-1626.

Reinhart, B. J., Slack, F. J., Basson, M., Bettinger, J. C., Pasquinelli, A. E., Rougvie, A. E., Horvitz, H. R., and Ruvkun, G. (2000). The 21 nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Riechmann, J. L., Heard, J., Martin, G., Reuber, L., Jiang, C.-Z., Keddie, J., Adam, L., Pineda, O., Ratcliffe, O. J., Samaha, R. R., Creelman, R., Pilgrim, M., Broun, P., Zhang, J. Z., Ghandehari, D., Sherman, B. K., and Yu, G.-L. (2000). Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105-2110.

Robinson-Beers, K., Pruitt, R. E., and Gasser, C. S. (1992). Ovule development in wild-type Arabidopsis and two female-sterile mutants. Plant Cell 4, 1237-1249.

63 Rogg, L. E., and Bartel, B. (2001). Auxin signaling: derepression through regulated proteolysis. Dev. Cell 1, 595-604.

Silverstone, A. L., Ciampaglio, C. N., and Sun, T. (1998). The Arabidopsis RGA gene encodes a transcriptional regulator repressing the gibberellin signal transduction pathway. Plant Cell 10, 155-69.

Slack, F. J., Basson, M., Liu, Z., Ambros, V., Horvitz, H. R., and Ruvkun, G. (2000). The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol. Cell 5, 659-669.

Ulmasov, T., Hagen, G., and Guilfoyle, T. (1999). Dimerization and DNA binding of auxin response factors. Plant J. 19, 309-319.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Yeh, R., Lim, L., and Burge, C. (2001). Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803-16.

Yu, J., Hu, S., Wang, J., Wong, G. K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92.

64 Computational identification of plant microRNAs and their targets,

including a stress-induced miRNA

Matthew W. Jones-Rhoades 1 and David P. Bartel l 2

1Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142

2Correspondence: [email protected]

Running title: Plant microRNAs

Key words: computational gene prediction, microRNA regulatory targets, noncoding RNAs

65 Summary MicroRNAs (miRNAs) are -21-nucleotide RNAs, some of which have been shown to play important gene-regulatory roles during plant development. We developed comparative genomic approaches to systematically identify both miRNAs and their targets that are conserved in Arabidopsis thaliana and rice (Oryza sativa). Twenty-three miRNA candidates, representing seven newly identified gene families, were experimentally validated in Arabidopsis, bringing the total number of reported miRNA genes to 92, representing 22 families. Nineteen newly identified target candidates were confirmed by detecting mRNA fragments diagnostic of miRNA-directed cleavage in plants. Overall, plant miRNAs have a strong propensity to target genes controlling development, particularly those of transcription factors and F-box proteins. However, plant miRNAs have conserved regulatory functions extending beyond development, in that they also target superoxide dismutases, laccases, and ATP sulfurylases. The expression of miR395, the sulfurylase-targeting miRNA, increases upon sulfate starvation, showing that miRNAs can be induced by environmental stress. Introduction MicroRNAs are endogenous 20- to 24-nucleotide RNAs, some of which are known to play important post-transcriptional regulatory roles in plants and animals (Bartel and Bartel, 2003; Lai, 2003; Bartel, 2004). MicroRNAs are initially transcribed as much longer RNAs that contain imperfect hairpins, from which the mature miRNA is excised by Dicer-like enzymes (Grishok et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001; Lee et al., 2002; Park et al., 2002; Reinhart et al., 2002; Lee et al., 2003). The mature miRNA derives from the double-stranded portion of the hairpin and is initially excised as a duplex comprising two -22-nt RNAs, one of which is the mature miRNA while the other, known as the miRNA*, comes from the opposite arm of the hairpin (Lau et al., 2001; Reinhart et al., 2002; Khvorova et al., 2003; Lim et al., 2003b; Schwarz et al., 2003). The miRNA of this miRNA:miRNA* duplex is preferentially loaded into the RNA-induced silencing complex (RISC(Hammond et al., 2000)), where it functions as a guide RNA to direct the posttranscriptional repression of mRNA targets, while the miRNA* is degraded (Hutvagner and Zamore, 2002; Mourelatos et al., 2002; Khvorova et al., 2003; Schwarz et al., 2003).

66 The primary method of identifying miRNA genes has been to isolate, reverse transcribe, clone, and sequence small cellular RNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Llave et al., 2002a; Park et al., 2002; Reinhart et al., 2002). However, molecular cloning is biased towards finding miRNAs that are relatively abundant. In animals, miRNA gene discovery by molecular cloning has been supplemented by systematic computational approaches that identify evolutionarily conserved miRNA genes by searching for patterns of sequence and secondary structure conservation that are characteristic of metazoan miRNA hairpin precursors (Ambros et al., 2003; Grad et al., 2003; Lai et al., 2003; Lim et al., 2003a; Lim et al., 2003b). The most sensitive of these methods indicate that miRNAs constitute nearly 1% of all predicted genes in nematodes, flies, and mammals (Lai et al., 2003; Lim et al., 2003a; Lim et al., 2003b). Methods developed in one animal lineage work well when extended to another animal lineage (Lim et al., 2003a), but cannot be directly applied to plants because the hairpins of plant miRNAs are more heterogeneous than those of animal miRNAs (Reinhart et al., 2002). Because the miRNAs recognize their regulatory targets through base pairing, computational methods have been invaluable for identifying these targets. The extensive complementarity between plant miRNAs and mRNAs makes systematic target identification easier in plants than in animals (Rhoades et al., 2002). A search for targets of 13 Arabidopsis miRNA families predicted 49 unique targets, with a signal-to-noise ratio exceeding 10:1, simply by looking for Arabidopsis messages with three or fewer mismatches (Rhoades et al., 2002). Evolutionary conservation of the miRNA:mRNA pairing in rice (Rhoades et al., 2002), together with experimental evidence showing that these miRNAs direct cleavage of their predicted mRNA targets (Llave et al., 2002b; Kasschau et al., 2003; Palatnik et al., 2003; Tang et al., 2003; Mallory et al., 2004; Vazquez et al., 2004) supports the validity of these predictions. Because metazoan miRNAs only rarely recognize their targets with such extensive complementarity (Yekta et al., 2004), more sophisticated methods that search for short segments of conserved complementarity to the miRNAs are required to identify metazoan miRNA targets (Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003). The previously identified plant miRNAs have a remarkable propensity to target genes involved in development, particularly those of transcription factors (Rhoades et al., 2002). In all cases where disruption of plant miRNA regulation has been reported, striking developmental abnormalities are observed. Dominant gain-of-function mutations in HD-ZIP transcription factor

67 genes PHABULOSA, PHAVULOTA, and REVOLUTA that destabilize pairing to miR165/miR166 cause loss of adaxial/abaxial polarity in developing leaves (McConnell et al., 2001; Rhoades et al., 2002; Emery et al., 2003; Kidner and Martienssen, 2003). In maize, similar mutations in the HD-ZIP gene ROLLED LEAF1 also cause adaxilization of the abaxial surface of leaves, indicating that the miR165/miR166 family has a conserved role in determining leaf polarity despite the morphological differences between Arabidopsis and maize leaves (Juarez et al., 2004). Transgenic plants with silent mutations in the miR-JAW complementary sites of TCP transcription factors arrest as seedlings with fused cotyledons and lack shoot apical meristems, while those with mutations in the miR159 complementary site of MYB33 have upwardly curled leaves(Palatnik et al., 2003). Plants deficient in miR172-mediated regulation of APETALA2 have altered patterns of floral organ development (Chen, 2004). Plants deficient in miR164- mediated regulation of CUP-SHAPED COTYLEDON1 have altered patterns of embryonic, vegetative, and floral development (Mallory et al., 2004). Finally, silent mutations in the miR168 complementary site of ARGONAUTE] lead to misregulation of miRNA targets and numerous developmental defects (Vaucheret et al., 2004). To gain a more complete understanding of plant miRNAs and their regulatory targets, we devised a computational procedure to identify conserved miRNA genes that were missed in previous cloning efforts, and we refined our computational method for identifying mRNA targets to increase its sensitivity. Using criteria that retain all 11 of the previously identified miRNA gene families conserved between Arabidopsis thaliana and Oryza sativa, we found 13 additional families of candidates. Molecular evidence showed that at least seven of these newly identified families of candidate miRNAs are authentic, and that at least six out of the seven mediate the cleavage of their predicted mRNA targets. These seven newly identified families are represented by 23 loci. When these are added to those identified by cloning, we count 92 miRNA loci in the Arabidopsis genome. Our updated analysis of the plant miRNA targets indicates a continued very strong overall bias toward transcription factors and genes involved in development. Some targets of the newly identified miRNAs, such as F-box proteins and GRL transcription factors, represent genes with demonstrated or probable roles in controlling developmental processes. Nonetheless, other newly identified miRNA targets, such as ATP sulfurylases, laccases, and superoxide dismutases, show that the range of functionalities regulated by miRNAs is broader than previously known. Furthermore, the expression of miR395, which targets genes involved in

68 sulfate assimilation, is responsive to the sulfate concentration of the growth media, demonstrating that miRNA expression can be modulated by levels of external metabolites.

Results Identification of 20mers in conserved miRNA-like hairpins Our computational approach to identify plant miRNAs was based upon six characteristics that describe previously known plant miRNAs. 1) The base pairing of the mature miRNA to its miRNA* within the hairpin precursors is relatively consistent. In contrast, both the size of the foldback and the extent of base pairing outside of the immediate vicinity of the miRNA are highly variable among the hairpins of plant miRNAs, even among those of miRNAs from the same gene family. 2) The majority of known Arabidopsis miRNAs have identifiable homologs in the Oryza sativa genome, in which the predicted mature Oryza miRNAs have 0-2 base substitutions relative to their Arabidopsis homologs. 3) The secondary structures of known miRNA hairpins are robustly predicted by RNAfold if given a sequence sufficiently long to contain both the miRNA and the miRNA*. 4) The sequences of the Arabidopsis and Oryza hairpins are generally more conserved in the miRNA and miRNA* than in the segment joining the miRNA and miRNA*. 5) All matches to known miRNAs in the Arabidopsis genome, with the exception of those antisense to coding regions, have potential miRNA-like hairpins and are thus annotated as miRNA genes. 6) Most known Arabidopsis miRNAs are highly complementary to target mRNAs, and this complementarity is conserved to Oryza. As the first step to identifying miRNAs in the genomes of Arabidopsis thaliana and Oryza sativa, we considered only those genomic portions contained in imperfect inverted repeats as defined by EINVERTED (Figure la, step 1). Within these 133,864 Arabidopsis and 410,167 Oryza inverted repeats were 73 of 86 reference set loci corresponding to the 24 previously reported miRNAs (refsetl, Table Si1). Secondary structures for the inverted repeats were predicted with RNAfold, and all 20mers within the inverted repeats were checked against MIRcheck, an algorithm written to identify 20mers with the potential to encode miRNAs (Figure la, step 2). MIRcheck takes as input a) the sequence of a putative miRNA hairpin, b) a secondary structure of the putative hairpin, and c) a 20mer sequence within the hairpin to be considered as a potential miRNA. MIRcheck takes into account the total number of unpaired nucleotides (no more than 4 in the putative miRNA), the number of bulged or asymmetrically

69 unpaired nucleotides (no more than 1 in the putative miRNA), the number of consecutive unpaired nucleotides (no more than 2 in the putative miRNA) and the length of the hairpin (at least 60 nucleotides inclusive of the putative miRNA and miRNA*). In contrast to the algorithms designed to identify metazoan miRNAs, MIRcheck has no requirements pertaining to the pattern or extent of base pairing in other parts of the predicted secondary structure. Even though these parameters were chosen to be relatively stringent, only 7 of the 73 remaining Arabidopsis and Oryza refsetl loci were lost at this step. After removal of 20mers that overlap with repetitive elements, or which have highly biased sequence compositions, 389,648 Arabidopsis 20mers (AtSetl) and 1,721,759 Oryza 20mers (OsSetl) had at least 1 locus that passed MIRcheck. We used Patscan to identify 20mers in AtSetl that matched at least one 20mer in OsSetl with 0-2 base substitutions, considering only 20mers on the same arm of their putative hairpins (Figure la, step 3). 3,851 Arabidopsis 20mers had at least 1 Oryza match (AtSet2), and 5,438 Oryza 20mers were matched at least once (OsSet2). For the previously known plant miRNAs, RNAfold predicts a secondary structure in which the miRNA is paired to the miRNA*, provided that the flanking sequence is sufficiently long to contain the miRNA*. The presence of additional flanking sequence does not interfere with the prediction of a miRNA-like secondary structure. This robustly predicted folding is observed for all of the loci of each cloned miRNA, even though they have widely divergent flanking sequences. While recognizing that the predicted folds are unlikely to be correct in all their details, it is reasonable to propose that the overall robustness of the predicted folding might reflect an evolutionary optimization for defined folding in the plant. To eliminate candidates that do not fold as robustly as the previously known miRNAs, we required AtSet2 and OsSet2 20mers to pass MIRcheck a second time after being computationally folded in the context of sequences flanking the hairpin. Patscan was used to find all matches of AtSet2 and OsSet2 to their respective genomes, RNAfold was used to predict the secondary structure of each match in the context of a 500 nt genomic sequence centered on the 20mer, and each match was evaluated by MIRcheck (Figure la, step 4). 2,588 Arabidopsis 20mers (AtSet3) and 3,083 Oryza 20mers (OsSet3) had at least one locus that passed MIRcheck. Because EINVERTED misses some hairpins and because this second MIRcheck evaluation used more relaxed cutoffs (up to 6

70 unpaired nt each in the putative miRNA and miRNA*), this step also recovered paralogs that were missed in steps 1 or 2. 'The genomic matches to known Arabidopsis miRNAs are all either in hairpins or antisense to coding regions. To ensure that computationally identified miRNAs met this criterion, Arabidopsis 20mers were removed from the analysis if less than 50% of intergenic matches passed MIRcheck, or if more than 50% of genomic matches overlapped with repetitive sequence elements (Figure la, step 5), resulting in 2,506 20mers (AtSet4). Because gene annotation in Oryza is poor, we could not reliably define matches as genic or intergenic. The 2,780 Oryza 20mers that had at least 1 locus pass MIRcheck and had no more than 50% of genomic matches in repetitive sequence elements were included in OsSet4. The next step in our analysis was to identify pairs of Arabidopsis and Oryza hairpins that have miRNA-like patterns of sequence conservation (Figure la, step 6). MicroRNA precursors are generally most conserved in the miRNA:mRNA* portion of the hairpin, a characteristic that has been used to help identify insect miRNA genes (Lai et al., 2003). In our procedure, we retained homologous pairs for which both the miRNA and miRNA* 20mers were more conserved than any 20mer from the loop regions. Doing pairwise comparisons of the hairpins of AtSet4 against those of OsSet4 resulted in 1,145 20mers (AtSet5) with at least 1 acceptable Oryza homolog. AtSet5 was mapped to the Arabidopsis genome, and overlapping 20mers were joined together to form 379 sequences with miRNA encoding potential. A single miRNA gene could be represented by up to four of these potential miRNA sequences, representing the miRNA, the miRNA*, the antisense miRNA, and the antisense miRNA*. After accounting for multiple potential miRNAs mapping to a single locus, the 379 potential miRNAs represented 228 potential miRNA loci. These 228 loci were grouped into 118 families of potential miRNA loci based on sequence similarity as determermined by blastn. Many of these newly identified miRNA candidates had patterns of secondary structure conservation resembling those of previously known plant miRNAs (Figure lb,c). For many of the miRNA loci corresponding to previously reported miRNAs, the computationally identified sequences extended 1-9 nt on either side of the cloned miRNAs, although in a few cases the actual miRNA overlapped with but extended beyond the predicted sequence.

71 A refined procedure for predicting miRNA targets We previously identified mRNAs containing ungapped, antisense matches to miRNAs with 0-3 mispairs (counting G:U pairs as mispairs) as probable miRNA targets (Rhoades et al., 2002). Although the majority of validated plant miRNA targets are captured by this cutoff, there are several authentic targets which are missed. For example, miR162 has a bulged nucleotide as it basepairs to the mRNA of DCL1, and miR-JAW has 4-5 mispairs to the mRNAs of several TCP transcription factors (Palatnik et al., 2003; Xie et al., 2003). In order to more thoroughly assess the mRNA targeting potential of both known and predicted miRNAs, we developed a more sensitive computational approach to identify target candidates. It allows for gaps and more mismatches in the mRNA:miRNA duplex but requires that the miRNA complementarity be conserved between homologous Arabidopsis and Oryza mRNAs. Each miRNA complementary site was scored, with perfect matches given a score of 0, and points were added for each G:U wobble (0.5 points), each non-G:U mismatch (1 point) and each bulged nucleotide in the miRNA or target strand (2 points). To allow the same cutoffs to be applied more evenly to miRNAs of different lengths and to avoid penalizing mismatches at the ends of longer miRNAs, those miRNAs that were longer than 20 nt were broken into overlapping 20mers, with the mRNA:miRNA pair receiving the score of the most favorable 20mer. This scoring was tested using a set of 10 unrelated miRNAs that are highly conserved (0-1 substitutions) between Arabidopsis and Oryza (refset2, Table S1). As a control, we generated 5 cohorts of permuted miRNAs, in which each permuted miRNA has the same dinucleotide composition as the corresponding miRNA in refset2. For all 20mers from the sets of real and permuted miRNAs we searched for complementary sites in Arabidopsis and Oryza mRNAs. Compared to their shuffled cohorts, the real miRNAs had many more complementary Arabidopsis mRNAs with scores < 2 (Figure 2a), which was in agreement with our previous results (Rhoades et al., 2002). Filtering the miRNA-complementary mRNAs to include only those conserved to Oryza showed that nearly all the complementary sites to authentic miRNAs with scores of < 2 are conserved (Figure 2b). For the permuted miRNAs, requiring conservation reduced to nearly zero the number of complementary sites with scores of 2-3.5, whereas for the authentic miRNAs a small but significant number of sites scoring in this range were conserved (Figure 2b). Thus, adding a requirement for conservation raised the threshold at which spurious matches were found, thereby enabling confident prediction of targets that were less extensively

72 paired to the miRNAs - in some cases forming Watson-Crick pairs to only 15 of 20 miRNA nucleotides. Each of the conserved miRNAs had at least one predicted target with score < 3.0, suggesting that the possession of predicted targets could be a criterion for screening the newly identified miRNA candidates. For each 20mer in AtSet5 and OsSetS, miRNA complementary sites were found and scored (Figure la, step 7). As would be expected even for permuted sequences, nearly all of the AtSet4 20mers (1,124 out of 1,145) had a complementary score of 3.0 to at least 1 Arabidopsis mRNA. Of these, 278 20mers (AtSet6) had at least one homologous Oryza 20mer with complementarity to a homologous Oryza mRNA. AtSet6 represented 24 families of potential miRNAs, which account for 100 potential miRNA loci. Eleven of these families, represented by 60 loci (including 41 refsetl loci), corresponded to all previously known miRNA families with identifiable Oryza homologs, suggesting that our method also identified most of the previously unknown families that have extensive conserved complementarity in Oryza.

Newly identified miRNAs are expressed Our computational screen identified 13 previously unreported families of conserved miRNA candidates with conserved complementarity to mRNAs. To determine which of these putative miRNAs are expressed, we used a PCR based assay (Lim et al., 2003a; Lim et al., 2003b) to search for the predicted miRNAs in a library of small cDNAs (Reinhart et al., 2002). In addition to verifying the expression of the miRNAs, this assay maps the 5' ends of the miRNAs (Table 2). Each PCR reaction used one common primer corresponding to the adaptor oligo attached to the 5' end of all members of the library and one primer specific the 3' portion of the predicted miRNA. For seven miRNA families, PCR reactions resulted in products in which the specific primer was extended by at least 3 nucleotides that matched the predicted miRNA sequence. In sum, the seven newly identified miRNA families comprised 23 genomic loci in Arabidopsis (Table 2). All clones for families 393, 396, 397, and 398 had the same 5' end, while for families 394, 395, and 399 miRNAs were detected with differing 5' ends that could result from inconsistent processing of precursors transcripts from a single locus, or from differential processing of precursors from different loci. Several of these miRNA families include loci that

73 would encode distinct but highly similar miRNAs (Table 2). Because the PCR primers overlapped with the residues that differ, it is not possible to know which variants were detected. ;Sixfamilies of putative miRNAs passed all computational checks but were not validated by the IPCRassay. Five of these families had a single locus in Arabidopsis, whereas the sixth had 14 Arabidopsis loci and 52 Oryza loci and likely represented a repetitive element not identified by RepeatMasker. Although the possibility that some of these non-validated predicted candidates are authentic cannot be ruled out, we consider it unlikely that they represent miRNA sequences. The expression of newly identified miRNAs was also tested by Northern blot analysis. Hybridization probes were designed for representative members of the 7 miRNA families detected by the PCR assay. Probes complementary to miR393, miR394, miR396a, miR398b detected 20-21 nt RNAs in samples from wild-type, soil-grown Columbia plants (Figure 3a), whereas probes complementary to miR395a, miR397b, and miR399b did not detect expressed small RNAs in these samples. These miRNAs that are difficult to detect on a Northern blot are likely to be expressed only at low levels or only in a subset of tissues or growth conditions. Because miR395 is complementary to mRNAs of ATP sulfurylase (APS) proteins (Figure 5), amdbecause the expression levels of numerous sulfate metabolizing genes are responsive to sulfate levels (Takahashi et al., 1997; Lappartient et al., 1999; Maruyama-Nakashita et al., 2003), we hypothesized that the expression of miR395 might be dependent on cellular sulfate levels. To test this, we probed RNA samples from plants grown in modified MS media containing various amount of sulfate. As seen for plants grown in soil, miR395 was not detected in the samples from plants grown in 2 mM SO42-. However, miR395 was readily detected in the samples grown 2 in very low sulfate (Figure 3b, 0.2 or 0.02 mM SO4 ). Induction of miR395 by low external sulfate concentrations is somewhat reminiscent of the starvation-associated miR-234 increase that has been observed in nematodes (Lim et al., 2003b), although the miR395 induction (greater than 100 fold) is much more striking than that of miR-234 (twofold). We examined whether APS1 expression changed in the conditions that induced miR395, and found that its expression decreased when miR395 increased, as would be expected if APS1 was a cleavage target of miR395 (Figure 3c).

74 Experimental verification of miRNA targets MicroRNAs, like small interfering RNAs (siRNAs (Elbashir et al., 2001)), can direct the cleavage of their mRNA targets when these messages have extensive complementarity to the miRNAs (Hutvagner and Zamore, 2002; Llave et al., 2002b; Tang et al., 2003; Yekta et al., 2004). This miRNA-directed cleavage can be detected by using a modified form of 5'-RACE (rapid amplification of cDNA ends) because the 3' product of the cleavage has two diagnostic properties: 1) a 5' terminal phosphate, making it a suitable substrate for ligation to an RNA adaptor using T4 RNA ligase, and 2) a 5' terminus that maps precisely to the nucleotide that pairs with the tenth nucleotide of the miRNA (Llave et al., 2002b; Kasschau et al., 2003). To examine whether any of the newly identified miRNAs can direct cleavage of their predicted targets in vivo, we isolated RNA from vegetative and floral tissues and performed the 5'-RACE procedure using primers specific to the predicted targets. For 19 predicted targets the 5'-RACE PCR yielded a distinct band of the predicted size on an agarose gel, which was isolated, cloned and sequenced. In all 19 cases the most common 5' end of the mRNA fragment mapped to the nucleotide that pairs to the tenth nucleotide of one of the miRNAs validated by PCR (Figure 4), indicating cleavage at sites precisely analogous to those seen for other miRNA targets (Llave et al., 2002b; Aukerman and Sakai, 2003; Kasschau et al., 2003; Palatnik et al., 2003; Xie et al., 2003; Vazquez et al., 2004)(Mallory et al., 2004), as well as for RNAs complementary to siRNAs and metazoan miRNAs (Elbashir et al., 2001; Hutvagner and Zamore, 2002; Yekta et al., 2004). These observations also corroborate the 5' ends of the miRNAs as mapped by PCR ('Table 2).

Identification of miRNA paralogs Our computational approach found 81 miRNA loci from 18 miRNA families (Table 1, Table S2). We searched for additional members of these families by searching the Arabidopsis genome for near matches (0-3) to the miRNAs of these 81 loci (Figure la, step 9). After manual inspection for potential hairpin-like secondary structures, this identified six additional loci in miRNA families that are conserved to Oryza. Together with the five loci in miRNA families without apparent Oryza homologs, this brings to 92 the total number of Arabidopsis loci that meet the criteria for designation as miRNA genes (Ambros et al., 2003) (Table S2). As is generally the case with computational gene prediction, some of these might be pseudogenes.

75 Our de-novo miRNA-finding algorithm found 88% of these, and 93% of those with Oryza homologs. These Arabidopsis genes correspond to 122 Oryza miRNA genes, of which 111 (91%) were found de-novo by our algorithm (Figure la, step 9; Table S3). As has been previously observed for numerous animal miRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001;), we find that some plant miRNA genes are clustered in the genome, most strikingly the genes of the 395 family. In Arabidopsis, miRNAs of the 395 family are located in two clusters, each containing three hairpins within 4 kb (Figure Sla). In each cluster, two MIR395 hairpins are on one strand while the third is on the opposite. Thus each cluster could not be expressed as a single primary transcript, but could be expressed as two transcripts sharing common regulatory elements. The Oryza MIR395 hairpins are also clustered, but with a different arrangement than in Arabidopsis. The two largest Oryza MIR395 clusters contain seven and six hairpins, respectively, within 1 kb, with all hairpins encoded on the same strand of DNA (Figure Slb). These clusters are likely expressed as transcripts containing multiple miRNAs, an idea supported by Oryza EST CA764701, which contains four miR395 hairpins. Prediction of conserved miRNA targets Having refined our computational method to more sensitively predict plant miRNA targets, we applied it to the prediction of conserved mRNA targets of all known Arabidopsis and Oryza miRNAs (Figure 5). Control experiments with refset2 and 5 sets of permuted miRNAs suggested that a score cutoff of < 3.5 was appropriate to identify conserved miRNA targets with high sensitivity and selectivity. However, when searching for targets of the entire set of miNRAs, this cutoff identified a number of mRNAs for which miRNA mediated cleavage products could not be found by 5'-RACE. Thus, a cutoff of < 3 was chosen to minimize the number of non-authentic targets. All previously validated targets miRNA targets are identified at this level of sensitivity, although several newly validated targets have scores of 3.5 in one or both species and are not retained using this cutoff. Thus, there is still a threshold at which it is difficult to distinguish authentic targets from potentially spurious complementarity without experimental verification. Nonetheless, a score of <3.0 in our refined method identifies targets with very high confidence (Figure 2).

Plant miRNAs are deeply conserved

76 MicroRNAs conserved between the dicot Arabidopsis thaliana and the monocot Oryza sativa are likely to be found in most flowering plants. Homologs of miR-JAW and miR-JAW complemenary sites have been found in ESTs from numerous angiosperms (Palatnik et al., 2003). To look for evidence of other miRNAs in additional plant species, we searched for ESTs representing potential homologs of Arabidopsis and Oryza miRNAs, defined here as having ;19/20nt matches and a predicted foldback that passes MIRcheck. This search identified 187 putative miRNA homologs in the ESTs (Table S4). A large majority of these appear to be authentic, in that the 10 miRNAs in refset2 each had on average 9.7 EST matches that passed MIRcheck, whereas the set of 50 permuted miRNAs averaged only 0.04 matches that passed MIRcheck. For all 18 miRNA families that are conserved between Arabidopsis and Oryza, potential miRNA precursors were found in at least one additional angiosperm species (Table S4). For miRNAs that are not conserved between Arabidopsis and Oryza, no homologous miRNAs in additional species were identified, suggesting that the lack of conservation in Oryza is a consequence of recent emergence rather than loss in the Oryza lineage. We also searched for matches to experimentally confirmed miRNA complementary sites in ESTs encoding proteins homologous to Arabidopsis targets (blastx score >10-6). For all miRNA families with validated miRNA targets, conserved miRNA complementary sites (19/20 nt matches) were found in at least one additional angiosperm (Table S5). On average, the miRNA complementary sites from 17 unrelated Arabidopsis miRNA targets were each conserved in 191 homologous ESTs, representing 14 species. This is far more than would be expected by chance; when repeating the analysis using 170 sites chosen at random from the same Arabidopsis mRNAs, the average number of ESTs and species were 2.6 and 0.5, respectively. MicroRNAs of the 166 family, as well as their binding sites in mRNAs of HD-ZIP proteins, predate the emergence of seed plants (Floyd and Bowman, 2004). We found nine miRNA families (156, 160, 166, 167, 393, 395, 396, 397 and 398) that had complementary sites conserved in gymnosperms, while a miR171 complementary site was conserved in a SCL mRNA from a fern (Ceratopteris richarii). In addition, a potential miRNA hairpin of the 159/JAW family was present in an EST from moss (Physomitrella patens). These data suggest that multiple miRNAs have deep origins in plant phylogeny.

Discussion

77 The scope of miRNAs conserved between dicots and monocots A combination of computational prediction and experimental verification identified seven families of sequences that had not previously been identified as miRNAs. A set of 2088 small RNAs from Arabidopsis was recently reported (Xie et al., 2004) (http://gac.bcc.orst.edu/smallRNA/). Sequences corresponding to miR397a, miR398b and miR399b were contained in this dataset, each having been cloned a single time, although none were annotated as miRNAs. The cloning of miR397 and miR399, which were not detected by Northern blot, corroborates their expression as determined by PCR. Families 393, 394, 395 and 396 are absent from the reported sets of cloned, sequenced small RNAs. These are each detectable by Northern analysis, and as with families 397, 398 and 399 were detected by PCR in our library of small cDNAs used for cloning. Therefore they would have been found eventually by sequencing enough small cDNAs. However, given that other miRNAs have been cloned hundreds of times (Xie et al., 2004), it seems that all seven newly identified miRNA families are relatively rare in the tissues and growth conditions from which small RNAs have been cloned. They may represent miRNAs that are needed at low levels, or whose expression is limited to rare cell types or particular growth conditions. The expression of miR395 is greatly increased by sulfate starvation; other miRNAs with seemingly low expression may also be inducible by metabolite levels or environmental stimuli. It is the identification of these difficult to clone but potentially important miRNAs that makes computational prediction a useful complement to cloning of small RNAs. The sensitivity of our computational approach, which found all 11 conserved miRNA families previously identified through cloning, suggests that most plant miRNAs with properties similar to previously cloned miRNAs have been identified. MicroRNA genes not found by our analysis are likely to fall into several categories. One set will be those without apparent conservation to Oryza. This describes four families of currently known Arabidopsis miRNAs (158, 161, 163, and 173). It is difficult to estimate how many additional non-conserved miRNA families exist in either species, but the observation that most of the cloned plant miRNAs have readily identified Oryza homologs indicates either that there are no more than a handful of non-conserved miRNAs remaining to be identified or that non-conserved miRNAs are disproportionally poorly expressed in plants.

78 Another set of false negatives will be miRNA families that are conserved between Arabidopsis and Oryza but were missed by our analysis. Most steps in our analysis have the potential to lose authentic miRNA genes. The parameters and cutoffs we used were chosen to be slightly more relaxed than what was needed to retain most loci corresponding to the 11 previously-known miRNAs families with Oryza homologs in refsetl. They found at least one member of each family and 92% (59/64) of all loci in these families. A similar percentage of loci, 96%, were correctly identified for newly discovered miRNA families (22/23), suggesting that our parameters are not over-fitted. Relaxing the parameters of MIRcheck (Figure 1, steps 2 & 4) to allow up to two asymmetric bulges, shorter hairpins (as short as 54 nt), and an additional mismatch did not identify any additional verifiable miRNAs (data not shown). Nonetheless, the low number of previously identified Arabidopsis miRNA gene families (15) precluded splitting the miRNAs into a training set and test set, as was done in our metazoan analysis to evaluate the degree of overtraining and enable firm estimates of the number of genes remaining to be identified (Lim et al., 2003b). MicroRNA families with few members would be more prone to being missed. For example, MIR393 and MIR394 each have only one identified locus in Oryza; either would have been missed if their Oryza locus had been among the fraction of authentic miRNA loci not identified as an inverted repeat or that did not pass MIRcheck, whereas miRNAs that were members of larger gene families that have multiple Oryza homologs were identified even though some Oryza homologs were missed. The observation that some miRNA primary transcripts are spliced (Aukerman and Sakai, 2003) raises the possibility that some miRNA transcripts might have an intron within the hairpin precursor, which could prevent their identification in our analysis of genomic DNA. Furthermore, any unknown miRNA family that systematically had a pattern of base pairing that failed MIRcheck would also have been lost, but there is no reason to suspect that this was a widespread problem. More significant uncertainty in plant miRNA gene number arises from the 94 families of candidate miRNAs that had conserved miRNA-like hairpins but lacked extensive and conserved complementarity to mRNAs. Some of these candidates may be authentic miRNAs with different modes of target recognition. For example, any plant miRNA that recognizes all its target mRNAs in a manner similar to that of most animal miRNAs, that is, by recognizing its targets predominantly through "seed matches" (Lewis et al., 2003), would have been missed. Therefore, further analysis will be required before a meaningful upper bound on the number of plant

79 miRNA genes can be estimated. The 92 loci tabulated to date, when considered together with the assumption that a few others might remain undetected because they are refractory to both cloning and computation, places a lower bound on the number of Arabidopsis miRNA genes at -100, or -0.4% of the predicted Arabidopsis genes-a percentage somewhat lower than that of animals. The plant miRNAs are generally in larger, more highly related families, further reducing the relative complexity of known miRNA sequences when compared to those of animals. Of course, when considering the vast number of distinct -22 nt RNAs that have been cloned from plants, which might be endogenous siRNAs but are not miRNAs, the diversity of small RNA silencing in plants could exceed that in animals.

The targets of newly identified miRNAs The detection of the RNA fragments diagnostic of miRNA-directed cleavage confirms in planta these 19 newly identified miRNA-target interactions. However, these 5'-RACE results do not rule out the possibility that the predominant mode of silencing is translational inhibition. 5'- RACE experiments demonstrate that miR172 directs the cleavage of some APETALA2 mRNA molecules, even though the predominate mode of repression appears to be translational inhibition (Aukerman and Sakai, 2003; Kasschau et al., 2003; Chen, 2004). Nonetheless, for all the other plant miRNA targets examined, inhibition of the miRNA pathway leads to increased accumulation of target mRNA (Kasschau et al., 2003; Vaucheret et al., 2004; Vazquez et al., 2004), suggesting that mRNA cleavage typically plays a significant regulatory role, although in these cases augmentation by translational repression cannot be ruled out. The same is likely to be true for our newly identified targets. Some of the newly identified targets resemble those of previous predictions with regard to their proven or inferred roles in regulating developmental processes (Figure 5). miR396 targets seven Growth Regulating Factor genes, which are putative transcription factors that regulate cell expansion in leaf and cotyledon (Kim et al., 2003). miR393 and miR394 both target the messages of F-box proteins, which in turn target specific proteins for proteolysis by making them substrates for ubiquitination by SCF E3 ubiquitin ligases (Vierstra, 2003). At2g27340, targeted by miR394, is in the same subfamily of F-box genes as UNUSUAL FLORAL ORGANS (UFO) (Gagne et al., 2002), which is involved in floral initiation and development (Wilkinson and Haughn, 1995; Samach et al., 1999). miR393 targets four closely related F-box genes, including

80 TRANSPORT INHIBITOR RESPONSE1 (TIR1), which targets AUX/IAA proteins for proteolysis in an auxin-dependent manner and is necessary for auxin-induced growth processes (Ruegger et al., 1998; Gray et al., 2001). These five F-box genes constitute a newly identified biochemical class of miRNA targets. The identification of TIR1 as a miRNA target implies that miRNAs regulate auxin- responsiveness at multiple points. Other auxin related miRNA targets include Auxin Response Factors (miR160 and miR167) (Rhoades et al., 2002; Kasschau et al., 2003), which are thought to regulate transcription in response to auxin(Ulmasov et al., 1999), and NAC1 (miR164) (Rhoades et al., 2002; Mallory et al., 2004), which promotes auxin-induced lateral root growth downstream of TIR1 (Xie et al., 2000). Finally, in addition to targeting F-box genes, miR393 also targets At3g23690, a basic helix-loop-helix transcription factor with homology to GBOF-1 from tulip, which Genbank annotates as auxin-inducible. Other newly identified miRNA targets have less obvious connections to the control of developmental patterning (Figure 5). miR397 targets putative laccases, members of a family of enzymes with numerous described roles in fungal biology but without well defined roles in plant biology (Mayer and Staples, 2002). miR399 targets two copper superoxide dismutases, CSD1 and CSD2, enzymes which protect the cell against radicals and whose expression patterns respond to oxidative stress (Kliebenstein et al., 1998). The most definitive example of a plant miRNA operating outside the gene regulatory circuitry controlling development is miR395. miR395 targets the ATP sulfurylases, APS1, APS3 and APS4, enzymes that catalyze the first step of inorganic sulfate assimilation (Leustek, 2002). The observations that the expression of miR395 depends on sulfate concentration and that APS1 expression declines with increasing miR395 corroborate the idea that this miRNA regulates sulfate metabolism (Figure 3). Our systematic analysis, which probably has identified most plant miRNAs with conserved and extensive complementarity to plant messages, including those that are expressed at very low levels during lab growth conditions, allows us to revisit the question of what this class of tiny regulatory RNAs is generally doing in plants. As before (Rhoades et al., 2002), we find an overwhelming propensity for targeting messages of known or suspected plant transcription factors (63 of 83, or 76% of genes in Figure 5) and similar propensity for targeting messages of genes with known or suspected roles in plant development (70 of 83, or 84% of

81 genes in Figure 5). A propensity to target developmental regulators differs from what has been seen in mammals (Lewis et al., 2003). Nonetheless, the conserved targets of plant miRNAs extend beyond the regulatory circuitry of development. The discovery that miRNAs regulate genes such as ATP sulfurylases, laccases, and superoxide dismutases shows that miRNAs also have an ancient role in regulating other aspects of plant biology.

Experimental procedures Details of the computational miRNA prediction method and sequences of primers used are available online at http://www.molecule.org/. PCR validation of miRNAs We used a PCR based assay to detect expression and map the 5' ends of predicted miRNAs (Lim et al., 2003b). miRNAs were PCR amplified out of a library of small cDNAs from leaf, flower, and seedling flanked by 5' and 3' adaptor oligos (Reinhart et al., 2002). Each PCR reaction used one common primer corresponding the 5' adaptor oligo and one specific primer antisense to the 3' portion of the predicted miRNA. RNA purification and Northern hybridization RNA was isolated as previously described (Vance, 1991). For developmental Northerns, 30 ,/g per lane of total RNA from soil grown Colombia plants was separated by 15% polyacrylamide electrophoresis and blotted to a nylon membrane. For plants grown on media, Columbia plants were grown in long-day conditions on modified 2 MS/agarose media, containing 0.8% Agarose-LE (USBiochem), in which the S0 4 -containing salts of minimal MS media were replaced with their chloride counterparts and the media supplemented with 20/zM to 2 mM 2(NH4)SO4. RNA was harvested from 2-week old plants. For miRNA Northerns, 40 jig per lane was used in Northern blots as above. For miR393, miR394, miR396a and miR398b, end-labeled antisense DNA probes were used. For miR395a, miR397b, and miR399b, higher specific activity Starfire (Integrated DNA technologies) probes were used. MicroRNA Northerns were hybridized and washed as previously described (Lau et al., 2001). For mRNA Northerns, 10 g per lane were separated by agarose electrophoresis and blotted as described (Mallory et al., 2001). Probes to exon 1 of APS1 were made using the Megaprime DNA labeling system (Amersham).

82 5'-RACE analysis 5'-RACE was performed on poly(A)-selected RNA from Columbia inflorescences and rosette leaves using the GeneRacer Kit (Invitrogen) as described (Kasschau et al., 2003), except that nested PCR was done for each gene, with each round of PCR using one gene-specific primer and the GeneRacer 5' Nested Primer. For each gene we designed gene-specific primers that were 180-450 bp away from the predicted miRNA binding site. PCR reactions were separated by agarose gel electrophoresis, and distinct bands of the appropriate size for miRNA-mediated cleavage were purified (excised gel slices corresponded to a size range of - 100 basepairs), cloned, and sequenced. Acknowledgements We thank M. Axtell for the 5'-RACE library, R. Rajagopalan for the library of 18-28 nt cDNAs and Allison Mallory and other Bartel lab members for helpful discussions. This work was supported by grants from the NIH.

83 Table 1. Sensitivity of computational identification of plant miRNA loci Family At loci Os loci Newly identified families 393 2/2 1/1 394 2/2 1/1 395 6/6 16/19 396 2/2 3/3 397 1/2 1/2 398 3/3 2/2 399 6/6 10/11

Previously identified conserved families 156a 12/12 12/12 159/JAW a,b,c 3/6 7/8 160 a 3/3 6/6 162 ad 2/2 2/2 164 a 3/3 5/5 166 a 8/9 10/12 167 a,b,d 3/4 9/9 168 a 2/2 1/2 169 a 14/14 15/17 171 ad 4/4 7/7 172 b 5/5 3/3

Previously identified non-conserved families 158a 0/2 0 161 a 0/1 0 163 a 0/1 0 173 b 0/1 0 All newly identified and previously known miRNA families are tallied. The number of loci found by de novo computational prediction (Figure la, through step 8) is shown (numerator) as fraction of total found by searching for near paralogs to miRNAs with verified expression (denominator). Additional details regarding the miRNA loci are reported in Tables S2 and S3 (Arabidopsis and Oryza loci, respectively). Citations for previously identified families: aReinhart et al. (2002). bPark et al. (2002). CMette et al. (2002).dLlave et al. (2002b).

84 Table 2. Newly identified miRNA gene families in Arabidopsis miRNA miRNA Chr. Arm miRNA sequence family gene 393 MIR393a 2 5' UCCAAAGGGAUCGCAUUGAUC (PCR,N,R) MIR393b 3 5' "" 394 MIR394a 1 5' uUCUUUGGCAUUCUGUCCACC (PCR,N,R) MIR394b 1 5' ". " 395 MIR395a 1 3' cUGAAGUGUUUGGGGGAACUC (PCR,N,R) MIR395b 1 3' " MIR395c 1 3' MIR395d 1 3' cUGAAGUGUUUGGGGGGACUC MIR395e 1 3' MIR395f 1 3' " 396 MIR396a 2 5' UUCCACAGCUUUCUUGAACUG (PCR,N,R) MIR396b 5 5' UUCCACAGCUUUCUUGAACUU 397 MIR397a 4 5' UCAUUGAGUGCAGCGUUGAUG (PCR,R) MIR397b 4 5' UCAUUGAGUGCAUCGUUGAUG 398 MIR398a 2 3' UGUGUUCUCAGGUCACCCCUU (PCR,N,R) MIR398b 5 3' UGUGUUCUCAGGUCACCCCUG MIR398c 5 3' " . 399 MIR399a 1 3' UGCCAAAGGAGAUUUGCCCUG (PCR) MIR399b 1 3' ccUGCCAAAGGAGAGUUGCCCUG MIR399c 5 3' MIR399d 2 3' UGCCAAAGGAGAUUUGCCCCG MIR399e 2 3' UGCCAAAGGAGAUUUGCCUCG MIR399f 2 3' UGCCAAAGGAGAUUUGCCCGG Newly identified miRNA families are listed with summary of experimental validation (PCR, PCR validation of miRNA; N, Northern blot of miRNA; R, 5'- RACE of target mRNA). The chromosome of each locus is indicated (Chr.), as is the arm of the predicted stem-loop that contains the miRNA (arm). 5' ends of miRNAs were determined from PCR of small cDNAs, and lengths of miRNAs were inferred from mobility on Northern blots. For miRNAs not detected on Northem blots (families 397 and 399), lengths of 21 nt were assumed. For miRNA families for which multiple 5' ends were detected by PCR, nucleotides present in some but not all clones are listed in lower case.

85 Figure legends Figure 1. Prediction of conserved plant miRNAs. (A) Outline of the computational approach used to identify conserved plant miRNAs. See text for description. In steps 1-8, the sensitivity is reported (blue) as the fraction of miRNA loci retained with perfect matches to previously identified miRNAs (refsetl). In step 9, this fraction extends to imperfect matches to previously identified miRNAs. In the later steps, the total numbers of predicted miRNA loci are also reported (red). (B,C) Predicted hairpin secondary structures of two newly identified miRNA families, 393 (B) and 394 (C) that target mRNAs of F-box proteins. Nucleotides in red comprise the sequence of the most common mature miRNA as deduced from PCR validation and Northern hybridization. Nucleotides in blue indicate additional portions of the hairpins predicted to have miRNA- encoding potential after identification of conserved 20mers in miRNA-like hairpins (Figure la, step 6), but before identification of conserved complementarity to mRNAs or experimental evaluation. For all three MIR393 loci, sequences antisense to the validated miRNA were also identified as potentially miRNA-encoding, but the miRNA* segments were not.

Figure 2. The utility of incorporating evolutionary conservation when predicting plant miRNA targets. (A) Arabidopsis mRNAs with sites complementary to a set of 10 diverse miRNAs conserved between Arabidopsis and Oryza (refset2) were found and scored such that lower scores indicate fewer mismatches (see text for details). The number of mRNAs with each of the indicated scores is graphed (solid bars). Complementary sites were found and scored in the same manner for 5 cohorts of permuted miRNAs with the same dinucleotide composition as the authentic miRNAs (open bars, average number of complementary mRNAs per cohort; error bars, 2 standard deviations). (B) mRNAs complementary to 10 miRNAs were found as in (A), with the additional requirement that at least one homologous Oryza mRNA be complementary to the same miRNA (solid bars). Each conserved miRNA complementary site is counted as having the either the Arabidopsis or Oryza score, whichever is higher (i.e. less complementary). Messenger RNAs with conserved complementarity to cohorts of dinucleotide shuffled miRNAs were found in the

86 same manner (open bars, average number of complementary mRNAs; error bars, 2 standard deviations).

Figure 3. Expression of newly identified miRNAs. (A) Total RNA (30 g) from seedlings (S), rosette leaves (L), flowers (F), and roots (R) were analyzed on a Northern blot, successively using radio-labeled DNA probes complementary to newly identified miRNAs. The lengths of 5'-phosphorulated radio-labeled RNA size markers (M) are indicated. As a loading control, he blot was probed for the U6 snRNA. (B) miR395 is induced with low sulfate. Total RNA (40 ttg) from 2-week-old Columbia plants grown on modified MS media containing the indicated concentrations of S042- were analyzed by Northern blot, probing for the indicated miRNAs as in (A). (C) APSi mRNA decreases in low sulfate. Total RNA (10 /tg) from 2-week-old plants grown on modified MS media containing the indicated concentrations of S04-2 were analyzed by Northern hybridization using randomly primed body-labeled DNA probes corresponding to exon 1 of the APS1 mRNA. Normalized ratios of APS1 mRNA to U6 splicosomal RNA are indicated.

Figure 4. Experimental verification of predicted miRNA targets. Each top strand (black) depicts a miRNA complementary site, and each bottom strand depicts the miRNA (red). Watson-Crick pairing (vertical dashes) and G:U wobble pairing (circles) are indicated. Arrows indicate the 5' termini of mRNA fragments isolated from plants, as identified by cloned 5'-RACE products, with the frequency of clones shown. Only cloned sequences that matched the correct gene and had 5' ends within a 100 nt window centered on the miRNA complmentary site are counted. The miRNA sequence shown corresponds to the most common miRNA suggested by miRNA PCR validation (Table 2). For miR394, the 5' end of a less common variant (1 out of 4 PCR clones) is indicated in lower case and corresponds to the most commonly cloned cleavage product.

Figure 5. Conserved predicted miRNA targets. All predicted miRNA targets with scores of 3.0 or less in both Arabidopsis and Oryza are listed. The score of the best scoring 20mer from any member of the miRNA family to each gene is given in parentheses. Predicted targets with scores greater than 3.0 in either Arabidopsis or

87 Oryza but have been validated by 5'-RACE are also listed and marked with an astrisks. Genes in red were validated as miRNA targets by 5'-RACE experiments in this work. Genes in blue are validated as miRNA targets by previous work. Additional information on these genes can be found at www.arabidopsis.org. a Vazquez et al. (2004). b Kasschau et al. (2003). c Palatnik et al.

(2003). dXie et al. (2003). e A. Mallory et al. (2004)f Tang et al. (2003). g Emery et. al. (2003). h

Vaucheret et al. (2004) Llave et al. (2002a). i Aukerman and Sakai (2003). k Chen (2003).

88 Jones-Rhoades and Bartel Figure 1 BAA GC-AC-G U-A CAA-U- u c-GIUA~ C-G UU-AU-A CCAC UU-A U U CAC UU-G U-G U A U-A C-G A A U-A G-C A A U-A C-G A A U-A U G U-A U-A C U A-U G-C G-C A-U U-A C-G",9 nt -GU 18 nt."U-A ,14 loop G-C loop loop U-A-o 26 nt-G U opC-G- C-G loop I G-C C 3. Identify Arabidol loA- A-U U-A A-U C-G mers with potential U-GU-G U-A C -C A-U U C U homologs A-U A-U A-U U A-U C-G U UU U-A C-G CC U C C-G C-G C-G U-A C-G C-G A-U U-A U-A G-C A-U A-U U- -C UG-C A-U U-A U-A C-G A-U A-U G-C C-G C-G C U G-C G-C U-A C-G C-G A-U U-A U-A G-C A-U A-U G-U G-C G-C G-C G'U G-C A-U G-C G'U A-U A-U A-U A-U A-U A-U 6. Identify miRNA- C-G A-U A-U of conservation bel c-G C-G C-G U-A C-G C-G Arabidopsis and Oi A-U U-A U-A G'U A-U A G G-C G-U C-G A A G-C G-C A-U A A AU G-C A-U A-U G-CA -C G-C AG-U C A-UG-UUG- UA8-U U 5' 3' 5' 3' 5' 3' MIR393a MIR393b MIR393 Chr 2 Chr 3 Contig 4493 Arabidopsis Arabidopsis Oryza

C U-A G-U U-A G-U GA U-A A A UU A-U C A G C-G A-U G A G-C G-C U-A U.G A-U U-A U U-G U- C C A-U A-U C C U-G C-G UAA-uGU A-U A-U Ci-i C-, ucU-A-23UCkGU nlt AGA loopn AUGA-AA U-AG-U CUC-G C-AA UG U-A U-A U C-G U-A C-G C-G C-G C-G U-A C-G U-A C-G U-A C-G C-G C-G C-G A-U C-G A-U C-G A-U C-G C-G C-G C-G U.G C-G U-G G-C U G G-C U-A G U U-A C U U-A C U U-A C U U-A U C U-A U C A-U U C A-U C-G A-U C-G G-C C-G G-C G-C G-C G-C U-A G-C U-A

C-G U-AU C-G U-A C-G U-A A G U-A U'G G A AC-G G-C A-U U-A A-U G-C G-C G G C-GA-U A-U A-U A-U A-UC-G A-UG-TP U U U U C-G 5' 3' 5' 3' 5' 3' MIR394a MIR394b MIR394 Chr 1 Chr 1 Contig 15318 Arabidopsis Arabidopsis Oryza Jones-Rhoades and Bartel Figure 2

A B 80 _- HAl

70

60

N1 z L nC 50 E c 40 E a) E 30 8 0 1 20 I

. V N1i]NC,4 IN 10

0 (A | ' E | m C', B 0 I __ _ _ _ I I I 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 3.5 4 Score (0 = 20 contiguous complementary nucleotides) Jones-Rhoades and Bartel Figure 3

2 B SO4 -(mM) S L F R M 0.02 0.2 2.0 .., -24 -24 miR393 -21 miR395 -21 miR394 -21 ~irQ -18 -24 miR156 miR396 -21 -24 miR398 t AL miR159

U6 a s"l

2 C SO4 -(mM) 0.02 0.2 2.0 APS1 *0

U6 APS1/U6 0.33 0.63 1.0 Jones-Rhoades and Bartel figure 4

1700 TIR1 .. ACAAAGCUGGAGAUGUCUU miR393a 3' CUAIA= A5AA& ,6 1874 7 At1g12820 ... GGUAGGUACGAAAj UGUCGUCUUG ... miR393a 3' CUAGACGCUA CCU 5' 1989 T At3926810 ... AGCAAGUAUGAAAAA UGUUUCAUG.

1577 I 4 At4g03190 ... GCCAAGCUAGA UGUCAUCUUG... miR393a 3' CUA A&&jA6GYAAACU 5' 396 81 At3923690 ... CUACCUUUG GA J UGGCAAUG... miR393a 3' CUAGUUACGCUAGGGAAACCU 5' 1345 2/!f/IO At1g27340 ... CUGUUGUGGAA UI ,J ,,CAUAUGGUG... miR394a 3' CCUCCACUGUUUACGGUUu 5' 1/10 10 3/1 .•/10 355 1/111•1911/ APS4 ... GAGACAGUCA U AAA A UUUAACCGU... miR395a 3 ' CUCAA•GG GUGAAGUC 5' 758 If GRL1 ... GAGGCCGCCAUCAUAG Cg U 5gACCAAAAU. . miR396a 3' GUCAAGUUCUUUCG-ACACCUU 5' 853 GRL2 ... GAGCCGUCCU AUCCAAUCU ... miR396a 3' GUCAAGUUCUUUCG-ACACCUU5' 732 GRL3 .. GUGGCCGCACCGUUCAAGAAAGCAUGUGGAAACUCCAACC.. miR396a 3' GUCAAGUUCUUUCG-ACACCUU 5' 4194f GRL7 ... GAGGUCGUCCU A CCC AU.. miR398a 3 ' 656111 GUAGUUGCACIAGUUACU 12 5 ' miR396a 3' GUCAAGUCUUUCG-ACACCUU5' 8275i GRL8 ... AGAGCCG1~.jiU CU miR396a 3' GUUCCACUG A aIUCUUGUU 5'

GRL9 ... CUAAUCGUAAAC•UAýI U CUUCU ... miR396a 3' GUCAAAUUCUUUC-CACtUU 5 ' 671 At2g29130 ... UACUACGAUUAy1E CGAACUCUUC... miR397a 3' GUA MUG. d r4A1MUUIA I A 5

737 If At2g38080 ... UGCUACGACUAGUCACGUGACUAUGAGAACUCUUU... miR397a 3' GUAUUGCGACUU ALUUACU 5' 1112 656 AM2g60020 ... uucucAGcuAAUAAuC AAUACGAGCUCUUU I I9 1 0ol miR397a 3' cGUGuUC 5' 82 CSD1 ... AuucuuuccArg AAAGGCCAAGU... miR398a 3' UUC TrA&6AU6IU 5' 10/I /12

CSD2 ... AGuGccGuCAU = AUAAAUGCCAAU... miR398a 3'1UUCCCCACU -G1 U 5' 85 At3g15640 ... CUAAUCCU UC r ,AA CAAAAC... miR398a 3' UC IAU UA-G5' Jones-Rhoades and Bartel Figure 5

miRNA family Target protein class Target genes - - 393 F-box proteins At1g12820(1), At3g26810(1), At3g629801TIR1(1.5), At4g03190(2.5) bHLH transcription factor At3g23690(2)* 394 F-box protein At1g27340(1) 395 ATP sulfurylases At3g228901APS1(1.5), At4g146801APS3(1 .5), At5g43780/APS4(0.5) 396 Growth Regulating Factor At2g22840/GRL1(3), At2g364001GRL3(3), At2g45480/GRL9(3), (GRL) transcription factors At3g529101GRL4(3), At4g241501GRL8(3), At4g377401GRL2(3), At5g536601GRL7(3) Rhodenase-like protein At2g40760(2.5) Kinesin-like protein B At4g271801ATK2(3) 397 Laccases At2g29130(0.5), At2g38080(1), At5g60020(1 ) Beta-6 tubulin At5g12250(3) 398 Copper superoxide dismutases At lgO88301CSD1(3),At2g28190/CSD2(3.5)* CytochromeC oxidase At3g15640(3)* subunit V 399 Phosphate transporter At3g54700(2) 156 Squamosa-promoter Binding Atlg273601SPL11(1), Atlg27370/SPL 10(1)a, Atlg53160/SPL4(2), Protein (SBP)-like transcription Atlg691701SPL4(1), At2g338101SPL3(1.5), At2g422001SPL9(1), factors At3g152701SPL5(3), At3g57920(1), At5g43270/SPL2(1)b, At5g50570(1), At5g50670(1) 159/JAW MYB transcription factors At2g26950/MYB104(1.5), At2g26960/MYB81(2.5), At2g324601MYB101(1.5), At3g114401MYB65(1.5)aC , At3g60460(1.5), At4g269301MYB97(2.5), At5gO6100/MYB33(1 .5)C, At5g550201MYB120(2) TCP transcription factors Atlg302101TCP24(2.5)c, AtIg53230/TCP3(3), At2g310701TCP10(2.5)c, At3gI5030/TCP4(2.5)c, At4g183901TCP2(2.5)c 160 Auxin Response Factors (ARF Atlg778501ARF17(0.5) b, At2g28350/ARFIO(1 )b transcription factors) At4g300801ARF16(1.5) 162 DICER-LIKE 1 AtIg010401DCL1(2)d 164 NAC domain transcription Atlg56010NAC1(1 )e, At3g151701CUC(1 )b'e,At5907680(1.5)e, factors At5g39610(2),At5g539501CUC2()' b , At5g61430(1.5)e 166 HD-Ziptranscription factors At1g304901PHV(1. 5 )f, Atlg52150/ A THB-15(1.5), At2g34710PHB(1 .5 )f, At4g328801A THB-8(1.5), At5g60690/REV(1 .5)9 167 Auxin ResponseFactors (ARF AtIg30330/ARF6(2), At5g370201ARF8(2)a transcription factors) 168 ARGONAUTE Atlg484101AGO(2.5)ah 169 CCAAT Binding Factor (CBF) Atlg17590(1.5), Atlg54160(2),At1g72830(1.5), At3g05690(1.5), HAP2-like transcription factors At3g20910(2), At5g06510(1.5), At5g12840(1.5) 171 SCARECROW-like At2g45160(0), At3g60630(0)a", At4g001501SCL6(O) transcription factors ' 172 APETALA2-liketrancription At2g28550/TOE1(1 5 )bJ,At2g39250(1), At4g36920/AP2(0.5)bk, factors At5601201TOE2(O.5)bJ, At5g671801TOE3(1. 5)b Ambros, V., Lee, R. C., Lavanway, A., Williams, P. T., and Jewell, D. (2003). MicroRNAs and

other tiny endogenous RNAs in C. elegans. Curr Biol 13, 807-818.

Aukerman, M. J., and Sakai, H. (2003). Regulation of flowering time and floral organ identity by

a MicroRNA and its APETALA2-like target genes. Plant Cell 15, 2730-2741.

Bartel, B., and Bartel, D. P. (2003). MicroRNAs: at the root of plant development? Plant Physiol

132, 709-717.

Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,

281-297.

Chen, X. (2004). A MicroRNA as a Translational Repressor of APETALA2 in Arabidopsis

Flower Development. Science 303, 2022-2025.

Elbashir, S. M., Lendeckel, W., and Tuschl, T. (2001). RNA interference is mediated by 21- and

22-nucleotide RNAs. Genes Dev 15, 188-200.

Emery, J. F., Floyd, S. K., Alvarez, J., Eshed, Y., Hawker, N. P., Izhaki, A., Baum, S. F., and

Bowman, J. L. (2003). Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI

genes. Curr Biol 13, 1768-1774.

Enright, A. J., John, B., Gaul, U., Tuschl, T., Sander, C., and Marks, D. S. (2003). MicroRNA

targets in Drosophila. Genome Biol 5, R1.

Floyd, S. K., and Bowman, J. L. (2004). Gene regulation: ancient microRNA target sequences in

plants. Nature 428, 485-486.

Gagne, J. M., Downes, B. P., Shiu, S. H., Durski, A. M., and Vierstra, R. D. (2002). The F-box

subunit of the SCF E3 complex is encoded by a diverse superfamily of genes in Arabidopsis.

Proc Natl Acad Sci U S A 99, 11519-11524.

94 Grad, Y., Aach, J., Hayes, G. D., Reinhart, B. J., Church, G. M., Ruvkun, G., and Kim, J. (2003).

Computational and experimental identification of C. elegans microRNAs. Mol Cell 11, 1253-

1263.

Gray, W. M., Kepinski, S., Rouse, D., Leyser, O., and Estelle, M. (2001). Auxin regulates

SCF(TIR1)-dependent degradation of AUX/IAA proteins. Nature 414, 271-276.

Grishok, A., Pasquinelli, A. E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D. L., Fire, A.,

Ruvkun, G., and Mello, C. C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing.

Cell 106, 23-34.

Hammond, S. M., Bernstein, E., Beach, D., and Hannon, G. J. (2000). An RNA-directed

nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296.

Hutvagner, G., McLachlan, J., Pasquinelli, A. E., Balint, E., Tuschl, T., and Zamore, P. D.

(2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7

small temporal RNA. Science 293, 834-838.

Hutvagner, G., and Zamore, P. D. (2002). A microRNA in a multiple-turnover RNAi enzyme

complex. Science 297, 2056-2060.

Juarez, M. T., Kui, J. S., Thomas, J., Heller, B. A., and Timmermans, M. C. (2004). microRNA-

mediated repression of rolled leafl specifies maize leaf polarity. Nature 428, 84-88.

Kasschau, K. D., Xie, Z., Allen, E., Llave, C., Chapman, E. J., Krizan, K. A., and Carrington, J.

C. (2003). P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis

development and miRNA unction. Dev Cell 4, 205-217.

95 Ketting, R. F., Fischer, S. E., Bernstein, E., Sijen, T., Hannon, G. J., and Plasterk, R. H. (2001).

Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659.

Khvorova, A., Reynolds, A., and Jayasena, S. D. (2003). Functional siRNAs and miRNAs

exhibit strand bias. Cell 115, 209-216.

Kidner, C. A., and Martienssen, R. A. (2003). Macro effects of microRNAs in plants. Trends

Genet 19, 13-16.

Kim, J. H., Choi, D., and Kende, H. (2003). The AtGRF family of putative transcription factors

is involved in leaf and cotyledon growth in Arabidopsis. Plant J 36, 94-104.

Kliebenstein, D. J., Monde, R. A., and Last, R. L. (1998). Superoxide dismutase in Arabidopsis:

an eclectic enzyme family with disparate regulation and protein localization. Plant Physiol 118,

637-650.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel

genes coding for small expressed RNAs. Science 294, 853-858.

Lai, E. C. (2003). microRNAs: runts of the genome assert themselves. Curr Biol 13, R925-936.

Lai, E. C., Tomancak, P., Williams, R. W., and Rubin, G. M. (2003). Computational

identification of Drosophila microRNA genes. Genome Biol 4, R42.

Lappartient, A. G., Vidmar, J. J., Leustek, T., Glass, A. D., and Touraine, B. (1999). Inter-organ

signaling in plants: regulation of ATP sulfurylase and sulfate transporter genes expression in

roots mediated by phloem-translocated compound. Plant J 18, 89-95.

Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny

RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

96 Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., and Kim, V. N. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature

425, 415-419.

Lee, Y., Jeon, K., Lee, J. T., Kim, S., and Kim, V. N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. Embo J 21, 4663-4670.

Leustek, T. (2002). Sulfate Metabolism. The Arabidopsis Book, 1-16.

Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., and Burge, C. B. (2003).

Prediction of mammalian microRNA targets. Cell 115, 787-798.

Lim, L. P., Glasner, M. E., Yekta, S., Burge, C. B., and Bartel, D. P. (2003a). Vertebrate microRNA genes. Science 299, 1540.

Lim, L. P., Lau, N. C., Weinstein, E. G., Abdelhakim, A., Yekta, S., Rhoades, M. W., Burge, C.

B., and Bartel, D. P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev 17, 991-

1008.

Llave, C., Kasschau, K. D., Rector, M. A., and Carrington, J. C. (2002a). Endogenous and silencing-associated small RNAs in plants. Plant Cell 14, 1605-1619.

Llave, C., Xie, Z., Kasschau, K. D., and Carrington, J. C. (2002b). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056.

Mallory, A. C., Dugas, D. V., Bartel, D. P., and Bartel, B. (2004). MicroRNA regulation of

NAC-domain targets is required for proper formation and separation of adjacent embyonic, vegatative, and floral organs. Curr Biol In press.

97 Mallory, A. C., Ely, L., Smith, T. H., Marathe, R., Anandalakshmi, R., Fagard, M., Vaucheret,

H., Pruss, G., Bowman, L., and Vance, V. B. (2001). HC-Pro suppression of transgene silencing eliminates the small RNAs but not transgene methylation or the mobile signal. Plant Cell 13,

571-583.

Maruyama-Nakashita, A., Inoue, E., Watanabe-Takahashi, A., Yamaya, T., and Takahashi, H.

(2003). Transcriptome profiling of sulfur-responsive genes in Arabidopsis reveals global effects of sulfur nutrition on multiple metabolic pathways. Plant Physiol 132, 597-605.

Mayer, A. M., and Staples, R. C. (2002). Laccase: new functions for an old enzyme.

Phytochemistry 60, 551-565.

McConnell, J. R., Emery, J., Eshed, Y., Bao, N., Bowman, J., and Barton, M. K. (2001). Role of

PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411, 709-

713.

Mette, M. F., van der Winden, J., Matzke, M., and Matzke, A. J. (2002). Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130, 6-9.

Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J.,

Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16, 720-728.

Palatnik, J. F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J. C., and Weigel, D.

(2003). Control of leaf morphogenesis by microRNAs. Nature 425, 257-263.

Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana.

Curr Biol 12, 1484-1495.

98 Reinhart, B. J., Weinstein, E. G., Rhoades, M. W., Bartel, B., and Bartel, D. P. (2002).

MicroRNAs in plants. Genes Dev 16, 1616-1626.

Rhoades, M. W., Reinhart, B. J., Lim, L. P., Burge, C. B., Bartel, B., and Bartel, D. P. (2002).

Prediction of plant microRNA targets. Cell 110, 513-520.

Ruegger, M., Dewey, E., Gray, W. M., Hobbie, L., Turner, J., and Estelle, M. (1998). The TIR1 protein of Arabidopsis functions in auxin response and is related to human SKP2 and yeast grrlp. Genes Dev 12, 198-207.

Samach, A., Klenz, J. E., Kohalmi, S. E., Risseeuw, E., Haughn, G. W., and Crosby, W. L.

(1999). The UNUSUAL FLORAL ORGANS gene of Arabidopsis thaliana is an F-box protein required for normal patterning and growth in the floral meristem. Plant J 20, 433-445.

Schwarz, D. S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P. D. (2003).

Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208.

Stark, A., Brennecke, J., Russell, R. B., and Cohen, S. M. (2003). Identification of Drosophila

MicroRNA Targets. PLoS Biol 1, E60.

Takahashi, H., Yamazaki, M., Sasakura, N., Watanabe, A., Leustek, T., Engler, J. A., Engler, G.,

Van Montagu, M., and Saito, K. (1997). Regulation of sulfur assimilation in higher plants: a sulfate transporter induced in sulfate-starved roots plays a central role in Arabidopsis thaliana.

Proc Natl Acad Sci U S A 94, 11102-11107.

Tang, G., Reinhart, B. J., Bartel, D. P., and Zamore, P. D. (2003). A biochemical framework for

RNA silencing in plants. Genes Dev 17, 49-63.

Ulmasov, T., Hagen, G., and Guilfoyle, T. J. (1999). Dimerization and DNA binding of auxin response factors. Plant J 19, 309-319.

99 Vance, V. B. (1991). Replication of potato virus X RNA is altered in coinfections with potato

virus Y. 182, 486-494.

Vaucheret, H., Vazquez, F., Crete, P., and Bartel, D. P. (2004). The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev In Press.

Vazquez, F., Gasciolli, V., Crete, P., and Vaucheret, H. (2004). The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14, 346-351.

Vierstra, R. D. (2003). The ubiquitin/26S proteasome pathway, the complex last chapter in the life of many plant proteins. Trends Plant Sci 8, 135-142.

Wilkinson, M. D., and Haughn, G. W. (1995). UNUSUAL FLORAL ORGANS Controls Meristem

Identity and Organ Primordia Fate in Arabidopsis. Plant Cell 7, 1485-1499.

Xie, Q., Frugis, G., Colgan, D., and Chua, N. H. (2000). Arabidopsis NACI transduces auxin signal downstream of TIR1 to promote lateral root development. Genes Dev 14, 3024-3036.

Xie, Z., Johansen, L. K., Gustafson, A. M., Kasschau, K. D., Lellis, A. D., Zilberman, D.,

Jacobsen, S. E., and Carrington, J. C. (2004). Genetic and Functional Diversification of Small

RNA Pathways in Plants. PLoS Biol 2, E104.

Xie, Z., Kasschau, K. D., and Carrington, J. C. (2003). Negative feedback regulation of Dicer-

Likel in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 13, 784-789.

Yekta, S., Shih, I. H., and Bartel, D. P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA.

Science In press.

100 MicroRNA-mediated regulation of an F-box gene is required for

embryonic, floral, and vegetative development

Matthew W. Jones-Rhoades1 and David P. Bartell 2

'Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142

2Correspondence: [email protected]

101 Abstract MicroRNAs are endogenous -21 nt RNAs that function as post-transcriptional regulators in both plants and animals. miR394, and conserved miR394-complementary sites in F-box mRNAs, were previously identified in a bioinformatic screen for unknown miRNAs. Here we show that miR394-mediated regulation of F-box gene Atlg27340 is required at multiple stages of Arabidopsis development. Transgenic plants expressing a miR394-resistant version of Atlg27340 display a range of developmental abnormalities, including radialized and fused cotyledons, absent shoot apical meristems, curled and radialized leaves, and abortive flowers. The severity of these abnormalities correlates with the overaccumulation of Atlg27340 mRNA, suggesting that an SCFAtlg27340complex ubiquitinates an activator of class III HD-ZIP function. Introduction MicroRNAs (miRNAs) are endogenous - 21 nucleotide non-coding RNAs that regulate gene expression in both plants and animals (reviewed in (Bartel, 2004)). Initially expressed as single-stranded stem-loop precursor RNAs, miRNAs require the RNase III enzyme DICER- LIKE1 (DCL1), as well as HEN1, HYL1, HST, and AGO1, for proper processing and accumulation (Park et al., 2002; Reinhart et al., 2002; Boutet et al., 2003; Han et al., 2004; Vaucheret et al., 2004; Vazquez et al., 2004; Park et al., 2005). Many miRNAs isolated from Arabidopsis are conserved to Oryza (rice) and other plant species (Reinhart et al., 2002; Floyd and Bowman, 2004; Jones-Rhoades and Bartel, 2004; Sunkar and Zhu, 2004; Axtell and Bartel, 2005), suggesting that miRNAs have evolutionarily conserved roles in land plants. Regulatory targets have been confidently predicted for most Arabidopsis miRNAs based on the high degree of complementarity between the miRNAs and their target mRNAs (Rhoades et al., 2002; Jones- Rhoades and Bartel, 2004). Like the miRNAs themselves, many of these miRNA target sites are broadly conserved in plant species (Rhoades et al., 2002; Jones-Rhoades and Bartel, 2004). Perhaps because of the extensive complementarity of plant miRNA-target duplexes, most Arabidopsis miRNAs guide the cleavage of target mRNAs (Llave et al., 2002; Kasschau et al., 2003; Tang et al., 2003). Several lines of evidence indicate that plant miRNAs play key roles in a broad range of developmental processes. Plants with dcll, henl, agol, hst, or hyll mutations have severe and pleotropic developmental abnormalities (Bohmert et al., 1998; Telfer and Poethig, 1998; Lu and

102 Fedoroff, 2000; Chen et al., 2002; Morel et al., 2002; Schauer et al., 2002) which correlate with the impairment of miRNA activity (Park et al., 2002; Reinhart et al., 2002; Boutet et al., 2003; Han et al., 2004; Vaucheret et al., 2004; Vazquez et al., 2004; Park et al., 2005), as do plants which express certain viral suppressors of RNA mediated silencing (Mallory et al., 2002; Kasschau et al., 2003; Chapman et al., 2004; Chen et al., 2004; Dunoyer et al., 2004). The majority of confirmed and predicted evolutionarily-conserved miRNA targets are mRNAs that encode for transcription factors and other regulatory proteins, such as F-box proteins and components of the miRNA pathway itself (Jones-Rhoades and Bartel, 2004). Plants with impaired miRNA-mediated regulation of particular transcription factor mRNAs have been shown to have various developmental phenotypes. For example, plants expressing miR166 resistant versions of HD-ZIP genes PHABULOSA, PHAVOLUTA, and REVOLUTA have radialized leaves or vasculature (Emery et al., 2003; Kidner and Martienssen, 2004; Mallory et al., 2004b; Zhong and Ye, 2004), and miRNA-resistant copies of certain TCP or ARF transcription factors result in seedlings that arrest or that have extra cotyledons, respectively(Palatnik et al., 2003; Mallory et al., 2005). A recent bioinformatic screen for conserved plant miRNAs and targets identified two miRNA families that guide the cleavage of mRNAs that encode for F-box proteins (Jones- Rhoades and Bartel, 2004). F-box proteins are specificity determinants of SCF E3 ubiquitin ligases, which facilitate the transfer of ubiquitin from E2 ubiquitin conjugating proteins to specific target proteins, thereby marking them for degradation by the 26S proteasome (reviewed in (Deshaies, 1999; Smalle and Vierstra, 2004)). In Saccharomyces cerevisiae, SCF complexes are composed of four primary subunits: Cullinl, Rbxl, and Skpl are thought to comprise the core ubiquitin ligase activity, and an F-box protein is thought to serve as a bridge between the SCF complex and the target protein (Deshaies, 1999; Smalle and Vierstra, 2004). The - 60 amino acid N-terminal F-box domain interacts with the rest of the SCF complex, and the C- terminal portion, which is highly divergent between different F-box proteins, is thought to interact with the target protein and thus confer specificity of ubiquitination (Zheng et al., 2002; Willems et al., 2004). The Arabidopsis genome contains nearly 700 F-box proteins (Gagne et al., 2002), several of which have been shown to be important for diverse aspects of plant biology such as hormone signaling, response to the environment, and developmental patterning. TRANSPORT

103 INHIBITOR RESPONSE1 (TIR1) targets AUX/IAA proteins for degradation in an auxin- dependent manner, and is needed for auxin-induced developmental processes (Ruegger et al., 1998; Gray et al., 2001). The F-box proteins EBF1/EBF2, GID2 and COI1 mediate ethylene, gibberellin, and jasmonate signaling, respectively (Xie et al., 1998; Guo and Ecker, 2003; Potuschak et al., 2003; Sasaki et al., 2003; Gagne et al., 2004). UNUSUAL FLORAL ORGANS (UFO) is required for proper floral development (Wilkinson and Haughn, 1995; Samach et al., 1999), and ORE1 regulates leaf senescence and axillary shoot growth(Woo et al., 2001; Stirnberg et al., 2002). However, the majority of Arabidopsis F-box genes have no known function. It is likely that many of these F-box proteins (or subclades of F-box proteins) each target specific proteins for ubiquitination and proteolysis. Here we show that miR394-mediated regulation of Atlg27340, an F-box gene related to UFO, is required for proper development. Seedlings expressing 5mAtlg27340, a miR394- resistant version of Atlg27340, frequently arrest without forming shoot apical meristems (SAMs), often with fused and/or radialized cotyledons. 5mAtlg27340 expressing plants that do form SAMs have pleotropic defects in vegetative and floral development, including downwardly curled leaves and abortive flowers. These developmental abnormalities correlate with the overaccumulation of Atlg27340 mRNA, suggesting that they are the result of overexpression of an Atlg27340-directed SCF ubiquitin ligase. Results Atlg27340 defines a conserved class of miR394-regulated F-box genes with homology to UFO

In a phylogenetic tree of 694 F-box genes, Atlg27340 falls in a subclade of five genes that contains UFO (Gagne et al., 2002). Although Atlg27340 is the second best blastp hit to UFO in the Arabidopsis genome (E value 6.7-2 1), the two proteins have only -30% similarity at the amino acid level. UFO is unlikely to be regulated by miR394; whereas miR394 can pair to Atlg27340 with 19 out of 20 nucleotides, only 12 out of 20 miR394 nucleotides can pair to the corresponding section of UFO (Figure la).

Although the similarity between Atlg27340 and UFO is limited, F-box genes in other plant species are highly similar to Atlg27340 and contain conserved miR394 complementary sites. Two Populus and one Oryza F-box proteins have Atlg27340 as their best Arabidopsis blastp hits (Figure la). All three of these proteins have are at least 75% similar to Atlg27340 at the amino acid level, including extensive identity in the C-terminal region that is likely to specify

104 substrate recognition, and miR394 can pair to the mRNA encoding each with 0-1 unpaired nucleotides (Figure la). In addition to these Atlg27340-like genes in plants with sequenced genomes, numerous plant species have ESTs which a) have Atlg27340 as their best Arabidopsis blastx hit and b) can pair to miR394 with 0-1 mismatches (Figure 1). These miR394- complementary, Atlg27340-like ESTs are found in both monocots and dicots, as well as in conifers (genus Picea). This conservation implies that the divergence of Atlg27340 from UFO and the regulation of Atlg27340-like genes by miR394 predate the divergence of gymnosperms and angiosperms. miR394 regulation of Atlg27340 is required for normal development In vivo miR394-directed cleavage of Atlg27340 can be detected by 5' RACE (Jones- Rhoades and Bartel, 2004). In order to investigate the biological significance miR394-mediated regulation, we constructed a mutant version of Atlg27340 with reduced complementarity to miR394. This 5mAtlg27340 construct encodes for the same amino acid sequence as Atlg27340, but has five silent mutations within the miR394 complementary site, and contains 1.6 kb of putative promoter sequence upstream of Atlg27340 (Figure lb). We transformed Arabidopsis thaliana separately with both this 5mAtlg27340 construct and with an unmutated Atlg27340 control construct. Only 1 out of 91 control Atlg27340 primary transformants (-1%) had any developmental abnormalities (small outgrowths from the midveins of a few cauline leaves on one plant). In contrast, 65 of 105 5mAtlg27340 primary transformants (62%) displayed various vegetative and floral phenotypes (Table 1). Most noticeably, 51 5mAtlg27340 transformants (49%) had moderately to severely downwardly curled rosette leaves (Figure 2a,b). Fifty-one 5nmAtlg27340 transformants (49%) also had cauline leaf abnormalities. Most commonly, cauline leaves had a spiked outgrowth protruding from the abaxial midvein (Figure 2b,c2). In other cases, the entire cauline leaf was replaced by a radialized, spiked structure (Figure 2c3-6). In some cases, these radialized cauline leaves subtended approximately wild-type axillary inflorescences (Figure 2c6), whereas in other cases radialized cauline leaves subtend axillary inflorescences that themselves produce aberrant cauline leaves and flowers (Figure 2c3,4). The number and severity of abnormal cauline leaves generally correlated with the extent of rosette leaf curling. 5mAtlg27340 transformants also exhibited various floral abnormalities. Most of the flowers that were produced had the expected numbers of organs and were fertile, although

105 flowers on 13 of the plants (12%) with stronger phenotypes were generally missing 1-4 petals and had reduced fertility. Twenty-seven 5mAtlg27340 transformants (26%) sporadically produced abortive flowers that consisted of only a filamentous structure in some cases (Figure 2d inset, Figure 2e), whereas in other cases flowers consisted of two sepals without any other floral organs (Figure 2e). The percentage of abortive flowers produced per plant varied from -1% to -40%. In most cases, a single inflorescence would alternate between producing fertile and abortive flowers in a seemingly stochastic pattern (Figure 2d,e). In extreme cases, inflorescences of 5mAtlg27340 plants produced a proliferation of determinate filaments in place of floral buds (Figure 2e). Shoots of 5mAtlg27340 expressing plants often had a seemingly stochastic phyllotaxy of maturing siliques, with the locations of the missing siliques marked by abortive filaments or empty flowers (Figure 2d). Approximately 10% of 5mAtlg27340 T1 transformants failed to develop a SAM and never formed any true leaves. Analysis of T2 seeds for several 5mAtlg27340 lines revealed that seedling arrest occurred in 0-55% of T2 seedlings, with the percentage of arrested seedlings correlating with the severity of the T1 phenotype, whereas the remainder of Basta-resistant seedlings did form SAMs and recapitulated the vegetative and floral abnormalities observed in their T1 parents (Table 2). The arrested seedlings displayed a range of different phenotypes (Figure 3a,b). Some seedlings had only one cotyledon (Figure 3bl,4), whereas others had two (Figure 3b2,3). In some cases the cotyledons were radialized (Figure 3bl,2), whereas in other cases seedlings had cotyledons that approached wild-type size and shape, but did not form functional SAMs (Figure 3b3,4). In some of these cases, one or two determinate, spike-like structures eventually emerged from the region where the SAM should have been (data not shown). 5mAtlg27340 plants overaccumulate Atlg27340 mRNA Many plant miRNAs guide the cleavage to target mRNAs. Because of this, mRNAs targeted by miRNAs generally overaccumulate in plants impaired in miRNA function, and the expression of a miRNA-resistant version of a miRNA target can similarly result in overaccumulation of the miRNA-resistant mRNA. We find that this is the case with 5mAtlg27340-expressing plants; normalized Atlg27340 mRNA levels are 1.7, 2.8, and 2.2 fold higher in leaves, inflorescences, and seedlings, respectively, in T2 5mAtlg27340 plants compared to control Atlg27340 plants (Figure 3b). Atlg27340 mRNA levels are highest in

106 .5mAtlg27340 T2 seedlings that lack SAMs; these arrested seedlings accumulate Atlg27340 transcripts at levels 1.8 fold higher than 5mAtlg27340 seedlings with functional SAMs and 3.9 -foldhigher than control Atlg27340 seedlings. Discussion We find that expression of a miR394-resistant version of Atlg27340 has broad ranging effects on Arabidopsis development, whereas expression of an additional wild-type copy does not. These results confirm the biological relevance of the interaction between miR394 and Atlg27340, and represent the first insights into the roles of miRNA-mediated regulation of F-box genes. Our finding that Atlg27340 mRNA levels are increased in plants expressing 5mAtlg27340 is consistent with the idea that miR394 exerts its influence over Atlg27340 primarily through guided RNA cleavage. Indeed, the extent of developmental abnormalities correlates with the level of Atlg27340 mRNA in that Atlg27340 transcript levels are highest in seedlings that fail to develop shoot apical meristems. The Arabidopsis shoot apical meristem is a small group of pluripotent cells which gives rise to all aerial tissues and organs (reviewed in (Baurle and Laux, 2003). The proper initiation of and maintenance of SAM pluripotency requires a complex interplay of gene interactions, and is critical to all stages of vegetative and floral development. SHOOTMERISTEMLESS (STM) and WUSCHEL (WUS), which encode for homeodomain transcription factors, act in parallel to initiate and maintain meristem identity (Endrizzi et al., 1996; Laux et al., 1996; Long et al., 1996; Mayer et al., 1998). The embryonic expression of STM, and hence the embryonic establishment of SAM identity, is dependent on the proper development of the cotyledons. Embryos with double homozygous mutations in the NAC domain transcription factors CUP- SHAPED COTYLEDONS1 (CUC1) and CUC2 have fused cotyledons and fail to initiate STM expression during embryogenesis (Aida et al., 1997; Aida et al., 1999). Similarly, the correct balance between the antagonistic activities of class III HD-ZIP and KANADI transcription factors is essential for proper cotyledon development and SAM formation. Seedlings which are either homozygous for loss-of-function mutations in three partially redundant HD-ZIP genes (phblphv/rev), or which overexpress KANADI genes, have one or two radialized cotyledons and fail to initiate SAMs (Eshed et al., 2001; Kerstetter et al., 2001; Emery et al., 2003). Our results establish that both MIR394 and Atlg27340 are also important regulators of meristem identity. MIR394 is expressed highly in inflorescences (Jones-Rhoades and Bartel,

107 2004), and the relative increase of Atlg27340 mRNA in 5mAtlg27340 plants was greatest in inflorescences. This increase in Atlg27340 mRNA levels is likely associated with the overexpression of Atlg27340 protein and/or the accumulation of Atlg27340 protein in cells in which miR394 would block wild-type Atlg27340 expression. If Atlg27340 functions in SCF E3 ubiquitin ligases as do other F-box proteins, then the observed 5mAtlg27340 phenotypes are likely to be the result of increased ubiquitination and protealysis of unknown factors targeted by the putative SCFAtlg27340ubiquitin ligase. Because many 5mAtlg27340 seedlings have abnormal cotyledons, the target of SCFAtlg27340 is likely to be upstream of STM, which is dispensable for cotyledon development (Endrizzi et al., 1996; Long et al., 1996). The 5mAtlg27340 seedlings with one or two fused cotyledons are reminiscent of homozygous phblphvlrev triple mutants and KANADI overexpressors (Eshed et al., 2001; Kerstetter et al., 2001; Emery et al., 2003), suggesting that the targets of SCFAtlg27340may be activators of the HD-ZIP activity or repressors of KANADI function. Because these genes are also important for the proper initiation and patterning of lateral organs and meristems post-embryonically, the vegetative and phenotypes observed in 5mAtlg27340 plants might also be related to a misregulation of HD-ZIP or KANADI activities. Indeed, plants homozygous for loss-of-function alleles for multiple class III HD-ZIP genes sporadically initiate abortive flowers (Prigge et al., 2005) in a manner reminiscent of 5mAtlg27340 expressing plants.

Experimental Procedures DNA constructs and transgenic plants BAC clone F17L21 was digested with SpeI and NsiI to yield a 5.1 kb fragment containing Atlg27340, as well 1.6 kb of upstream sequence and 0.9 kb of downstream sequence, which was ligated into SpeI and PsI cut pBluescriptIISK+ (Stratagene). Site directed mutagenesis was performed by PCR with PfuUltra polymerase and the primers GCACCATATGTTCGGCATGCGATCAACTTCCTTCCACAACAGTGT and ACACTGTTGTGGAAGGAAGTTGATCGCATGCCGAACATATGGTGC, followed by DpnI digestion. Following mutagenesis, a 2.5 kb Hindm-BamHI fragment of the original Atlg27340 clone was replaced with the corresponding fragment containing the mutagenized miR394 complementary site, which was sequenced to ensure that no additional mutations had occurred during PCR. Wild-type and mutant ATlg27340 5.1 kb Spel-HindIII fragments were subcloned

108 into the binary vector pGreenII0229, and electroporated into Agrobacterium tumefaciens strain GV3101::pMP90. Arabidopsis thaliana (Columbia accession) was transformed by the floral dip method (Clough and Bent, 1998), and the collected seeds were surface sterilized and plated on Bouterage No.2 media (Duchefa Biochemie) containing 10 ug/ml Basta. Seedlings were grown under long day conditions (20° C, 16 hr light, 8 hr dark) for about 10 days before transfer to soil consisting of 50% promix (Premier Horticulture) and 50% redi-earth (Scotts). RNA Isolation and Northern blot analysis Total RNA was isolated as described (Mallory et al., 2001). For mRNA northerns, 12 ug of total RNA was size fractionated on a 1% agarose/formaldehde gel and transferred to a nitrocellulose membrane as described (Mallory et al., 2005). 1.4 kb of exon2 of Atlg27340 was PCR amplified with primers AGTCTCTAGAATGGTGTTGCCCTGTATTGAGGA and CAGTAAGCTTAAGAGGTTCCACACAACCCA, and directionally cloned into pBluescriptIISK+ (Stratagene). Following XbaI digestion, this template was used to generate Atlg27340 antisense RNA probe by T7 transcription in the presence of a-3 2P UTP. Blots were hybridized in at 680 C in Ultrahyb buffer overnight, and washed successively with 2X SSC, 0.1% SDS (two times) and 0.1X SSC, 0.1% SDS (two times). For miRNA northerns, 30 ug total RNA was fractionated on a 15% polyacrylamide gel, transferred to a nitrocellulose membrane, hybridized, and washed as described, using the 5' 32p labeled DNA oligo AGGAGGTGGACAGAATGCCAA as a probe for miR394. Scanning Electron Microscopy Plant tissues were fixed, dehydrated, critical point dried, and coated with gold and palladium as described (Mallory et al., 2004a). Samples were imaged on a Jeol 5600LV scanning electron microscope.

109 Table 1. Observed phenotypes of T1 transformant plants curled spikes on radialized wild type abnormal rosette cauline cauline missing abortive construct total development development leaves leaves leaves petals flowers no SAM Atlg27340 91 90 (99%) 1 (1%) 0 (0%) 1 (1 %) 1 (1 %) 0 (0%) 0 (0%) 0 (0%) 5mAtlg27340 105 40 (38%) 65 (62%) 51 (49%) 51 (49%) 23 (22%) 13 (12%) 27 (26%) 11 (10%) The number and percentage of T1 plants with various developmental abnormalities are indicated. See text for details.

Table 2. Observed phenotypes of 5mAtlg27340 T2 transformant plants basta T1 Line total sensitive no SAM like T1 phenotpye 5mAtlg27340-16 36 27% 56% 18% severe 5mAt lg27340-1 36 23% 33% 44% strong 5mAtlg27340-23 37 30% 32% 38% strong 5mAtlg27340-6 66 21% 30% 49% severe 5mAtlg27340-30 45 27% 26% 47% severe 5mAtlg27340-44 95 25% 25% 50% severe 5mAtlg27340-33 32 25% 20% 55% strong 5mAtlg27340-3 33 20% 9% 71% mild 5mAtlg27340-18 43 36% 6% 58% mild 5mAtlg27340-24 22 24% 3% 73% slight 5mAtlg27340-27 35 23% 0% 61% slight The observed frequencies of T2 phenotypes for 5mAtlg27340 lines are indicated, as is the severity of developmental defects observed in the T1 parent of each line.

110 Figure Legends Figure 1. Atlg27340 is complementary to miR394. (A) miR394 complementary sites in F-box genes from different plant genera are depicted. Nucleotides which can form Watson-Crick base pairs with miR394 are in upper case and highlighted, whereas nucleotides which are mismatched or can form G:U wobble pairs are in lower case. For each F-box gene, the Atlg27340 blastp (for Arabidopsis, Oryza, and Populus proteins) or blastx (for ESTs from other genera) E value and rank (out of all Arabidopsis proteins) are indicated. (B) The Atlg27340 genomic clone used to transform Arabidopsis is depicted. Intergenic regions are shown as solid lines, UTR sequence as shaded boxes, coding sequence as open boxes, and intronic sequence as a dashed line. The restriction sites used to isolate the genomic clone from BAC F17L21 are indicated. Within the Atlg27340 coding region, the position of the F-box domain ("F") and miR394 complementary site ("*") are shown. The amino acid sequence, nucleotide sequence, and miR394-complementarity of the wild-type and mutated miR394 complementary sites are shown.

Figure 2. Vegetative and floral phenotypes of 5mAtlg27340 plants. (A) Three week old wild-type plant with broad, flat rosette leaves and T1 SmAtlg27340 plant with downwardly curled rosette leaves. (B) Close-up views of flat wild-type rosette and cauline leaves and curled T1 5mAtlg27340 rosette and cauline leaves (right). 5mAtlg27340 has a spiked outgrowth from the abaxial midvein (arrow). (C) Control T2 Atlg27340 (1) and 5mAtlg27340 cauline leaves and axillary shoots (2-6). (D) Shoots of T2 control Atlg27340 and 5mAtlg27340 plants. At right are close up views of inflorescences showing reduction in silique number in 5mAtlg27340 plants. The inset show the presence of filaments on 5mAtig27340 shoots where phyllotaxy suggests siliques should be. (E) Inflorescences of control Atlg27340 containing flowers in various developmental stages and inflorescences of 5mAtlg27340 plants containing numerous abortive filaments, a few empty flowers (arrows), as well as some reproductively functional flowers.

Figure 3. Seedling phenotypes of 5mAtlg27340 plants (A) Six day old control T2 Atlg27340 seedlings have the first pair of true leaves emerging from the shoot apical meristem. (B) Some T2 5mAtlg27340 seedlings display a variety of

111 developmental abnormalities, including having one radicalized cotyledon (1), two radicalized cotyledons (2), one flat cotyledon (4), and two flat cotyledons but no apparent true leaves (4). (C) Atig27340 mRNA overaccumulates in 5mAtlg27340 plants. 12 ug of total RNA from control Atlg27340 and 5mAtlg27340 rosette leaves (L), inflorescences (Inf), and seedlings (Se, SAM-), was analyzed by Northern blot using a body labeled RNA probe complementary to most of exon 2. For 5mAtlg27340, RNA was isolated separately from seedlings with (Se) and without (SAM-) evident shoot apical meristems. The levels of Atlg27340 mRNA were quantified relative to the ethidium bromide staining of the 25S ribosomal RNA.

112 Figure 1

At 9g27340 miR394 3' C Ff FY Yyy yAI Y y 5 blast E value A (rank) Atlg27340 Arabicdopsis 5s .. . .3 UFO Arabicdopsis 5' ... c 1 a la~c cig EIu c .3 6.721(3) fgenesh4_pm.C_LG_111000589 Populus 5'... ~ .. 3 2.0-153(1) estExt Genewisel vl.CLG 17715 PopulUS 5'... .. 3 4.3-174(1) CB292711 Citrus 5...lu ~. .3 2.7-'00°°(1) CD476694 Eschsscholzia 5 I .'. . .3 3. 0-69(1) AW351311 Glycirle 5'...i 3 3.7-62(1) BQ971555 Heliarnthus5'... _ u __- 3 4.4-59(1) BJ571294 Ipomcoea 5. .. u _ _. .3 9.1-91(1) BQ874161 Lactuica 5 ...... 3 33.36(1 ) CN580831 Malus 5'..._u _ .3 7.5-70(1) CX543200 Poncirus 5' ... - ..3 1.9-74(1) CV049137 PrunuS 5 '... ~ .3 7.1- 36(1) BF153392 Solan um 5 ...u _ .. 3 5.9-96(1) AJ700842 Zantedeschia 5 ... .. 3 7.4-57(1 )

Os01g69940 Oryza 5,... - ---- R' -- ar-lyr .. 3 2.7-142(1) CN820826 Avena 5 ... .. 3 4.9-6(1 ) BQ407881 Gossypium 5'... .. 3 8.6-63(1) AW982846 Hordeum 5... .. 3 8.2-81(1) ._ ~~ ~~ ·r-JUQlUlli~ BM084705 Pennisetum 5'... .. 3 1.0-43(1) _ CA076958 Saccharum 5... .3 1.7-57(1) BE366831 Sorghum 5' ... 3 2.7 52(1) BE427348 Triticum 5 ... ------~-- HITMAMldYs ..3 2.451(1) A1438876 Zea 5'... .3 1.8-44(1) CA482544 Linum 5'... [QLWlbla " IlRI U. .3 4.8-62(1) CO0204356 Picea 5'... a AL-I rYI I .3 9.2-43(1)

Atlg27340 mRNA B Spel Nsil 11111 1.6 kb 5' 0.9 kb 3' 0~~~~

E V D R M P Atlg273405'.. . G ? TJA I A . ~? V miR394 3' 2: 116 , . x 1rg32l . 3' 5m-Atlg27340 5, .. .. a U u. c c Atr c I. . 3 Figure 2

Col 5mAtlg27340 Col 5mAtlg27340

At1g27340 D 5mAt1 g27340

Atlg27340 5mAt1g27340 Atlg27340 5mAtlg27340 E

At 1g27340 5mAtlg27340 Atlg27340 5mAtlg27340 Figure 3

A D I-' D

Atl g27340

4- 4- Atlg27340 5mAt1g27340 arcu L Inf Se L Inf Se SAM- D

0 -J 75: O 0U) 0 CC 0C C U1) ar E z _ a::: U Ji -:

-21 miR394 iiiii;¸ 4i!i!!

Atlg27340 -18I1 U6 I'

25S 3.3 1.0 2.0 5.6 2.8 4.3 7.8 At1 g27340/25S Aida, M., Ishida, T., and Tasaka, M. (1999). Shoot apical meristem and cotyledon formation during Arabidopsis embryogenesis: interaction among the CUP-SHAPED COTYLEDON and SHOOT MERISTEMLESS genes. Development 126, 1563-1570. Aida, M., Ishida, T., Fukaki, H., Fujisawa, H., and Tasaka, M. (1997). Genes involved in organ separation in Arabidopsis: an analysis of the cup-shaped cotyledon mutant. Plant Cell 9, 841-857. Axtell, M.J., and Bartel, D.P. (2005). Antiquity of MicroRNAs and Their Targets in Land Plants. Plant Cell 17, 666-99999. Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297. Baurle, I., and Laux, T. (2003). Apical meristems: the plant's fountain of youth. Bioessays 25, 961-970. Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. Embo J 17, 170-180. Boutet, S., Vazquez, F., Liu, J., Beclin, C., Fagard, M., Gratias, A., Morel, J.B., Crete, P., Chen, X., and Vaucheret, H. (2003). Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13, 843-848. Chapman, E.J., Prokhnevsky, A.I., Gopinath, K., Dolja, V.V., and Carrington, J.C. (2004). Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18, 1179-1186. Chen, J., Li, W.X., Xie, D., Peng, J.R., and Ding, S.W. (2004). Viral virulence protein suppresses RNA silencing-mediated defense but upregulates the role of microrna in host gene expression. Plant Cell 16, 1302-1313. Chen, X., Liu, J., Cheng, Y., and Jia, D. (2002). HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129, 1085-1094. Clough, S.J., and Bent, A.F. (1998). Floral dip: a simplified method for Agrobacterium- mediated transformation of Arabidopsis thaliana. Plant J 16, 735-743. Deshaies, R.J. (1999). SCF and Cullin/Ring H2-based ubiquitin ligases. Annu Rev Cell Dev Biol 15, 435-467. Dunoyer, P., Lecellier, C.H., Parizotto, E.A., Himber, C., and Voinnet, 0. (2004). Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16, 1235-1250. Emery, J.F., Floyd, S.K., Alvarez, J., Eshed, Y., Hawker, N.P., Izhaki, A., Baum, S.F., and Bowman, J.L. (2003). Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13, 1768-1774. Endrizzi, K., Moussian, B., Haecker, A., Levin, J.Z., and Laux, T. (1996). The SHOOT MERISTEMLESS gene is required for maintenance of undifferentiated cells in Arabidopsis shoot and floral meristems and acts at a different regulatory level than the meristem genes WUSCHEL and ZWILLE. Plant J 10, 967-979.

116 Eshed, Y., Baum, S.F., Perea, J.V., and Bowman, J.L. (2001). Establishment of polarity in lateral organs of plants. Curr Biol 11, 1251-1260. Floyd, S.K., and Bowman, J.L. (2004). Gene regulation: ancient microRNA target sequences in plants. Nature 428, 485-486. Gagne, J.M., Downes, B.P., Shiu, S.H., Durski, A.M., and Vierstra, R.D. (2002). The F-box subunit of the SCF E3 complex is encoded by a diverse superfamily of genes in Arabidopsis. Proc Natl Acad Sci U S A 99, 11519-11524. Gagne, J.M., Smalle, J., Gingerich, D.J., Walker, J.M., Yoo, S.D., Yanagisawa, S., and Vierstra, R.D. (2004). Arabidopsis EIN3-binding F-box 1 and 2 form ubiquitin-protein ligases that repress ethylene action and promote growth by directing EIN3 degradation. Proc Natl Acad Sci U S A 101, 6803-6808. Gray, W.M., Kepinski, S., Rouse, D., Leyser, O., and Estelle, M. (2001). Auxin regulates SCF(TIR1)-dependent degradation of AUX/IAA proteins. Nature 414, 271-276. Guo, H., and Ecker, J.R. (2003). Plant responses to ethylene gas are mediated by SCF(EBF1/EBF2)-dependent proteolysis of EIN3 transcription factor. Cell 115, 667-677. Han, M.H., Goud, S., Song, L., and Fedoroff, N. (2004). The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci U S A 101, 1093-1098. Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14, 787-799. Kasschau, K.D., Xie, Z., Allen, E., Llave, C., Chapman, E.J., Krizan, K.A., and Carrington, J.C. (2003). P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA unction. Dev Cell 4, 205-217. Kerstetter, R.A., Bollman, K., Taylor, R.A., Bomblies, K., and Poethig, R.S. (2001). KANADI regulates organ polarity in Arabidopsis. Nature 411, 706-709. Kidner, C.A., and Martienssen, R.A. (2004). Spatially restricted microRNA directs leaf polarity through ARGONAUTEI. Nature 428, 81-84. Laux, T., Mayer, K.F., Berger, J., and Jurgens, G. (1996). The WUSCHEL gene is required for shoot and floral meristem integrity in Arabidopsis. Development 122, 87-96. Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056. Long, J.A., Moan, E.I., Medford, J.I., and Barton, M.K. (1996). A member of the KN07TED class of homeodomain proteins encoded by the STM gene of Arabidopsis. Nature 379, 66-69. Lu, C., and Fedoroff, N. (2000). A mutation in the Arabidopsis HYL1 gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin. Plant Cell 12, 2351-2366. Mallory, A.C., Bartel, D.P., and Bartel, B. (2005). microRNA-Directed Regulation of Arabidopsis AUXIN RESPONSE FACTOR1 7 Is Essential for Proper Development and Modulates Expression of Early Auxin Response Genes. Plant Cell 17. Mallory, A.C., Dugas, D.V., Bartel, D.P., and Bartel, B. (2004a). MicroRNA regulation of NAC-domain targets is required for proper formation and separation of adjacent embryonic, vegetative, and floral organs. Curr Biol 14, 1035-1046. Mallory, A.C., Reinhart, B.J., Bartel, D., Vance, V.B., and Bowman, L.H. (2002). A viral suppressor of RNA silencing differentially regulates the accumulation of short interfering RNAs and micro-RNAs in tobacco. Proc Natl Acad Sci U S A 99, 15228-15233.

117 Mallory, A.C., Reinhart, B.J., Jones-Rhoades, M.W., Tang, G., Zamore, P.D., Barton, M.K., and Bartel, D.P. (2004b). MicroRNA control of PHABULOSA in leaf development: importance of pairing to the microRNA 5' region. Embo J 23, 3356-3364. Mallory, A.C., Ely, L., Smith, T.H., Marathe, R., Anandalakshmi, R., Fagard, M., Vaucheret, H., Pruss, G., Bowman, L., and Vance, V.B. (2001). HC-Pro suppression of transgene silencing eliminates the small RNAs but not transgene methylation or the mobile signal. Plant Cell 13, 571-583. Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G., and Laux, T. (1998). Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. Cell 95, 805-815. Morel, J.B., Godon, C., Mourrain, P., Beclin, C., Boutet, S., Feuerbach, F., Proux, F., and Vaucheret, H. (2002). Fertile hypomorphic ARGONAUTE (agol) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14, 629-639. Palatnik, J.F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J.C., and Weigel, D. (2003). Control of leaf morphogenesis by microRNAs. Nature 425, 257-263. Park, M.Y., Wu, G., Gonzalez-Sulser, A., Vaucheret, H., and Poethig, R.S. (2005). Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102, 3691-3696. Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12, 1484-1495. Potuschak, T., Lechner, E., Parmentier, Y., Yanagisawa, S., Grava, S., Koncz, C., and Genschik, P. (2003). EIN3-dependent regulation of plant ethylene hormone signaling by two arabidopsis F box proteins: EBF1 and EBF2. Cell 115, 679-689. Prigge, M.J., Otsuga, D., Alonso, J.M., Ecker, J.R., Drews, G.N., and Clark, S.E. (2005). Class III Homeodomain-Leucine Zipper Gene Family Members Have Overlapping, Antagonistic, and Distinct Roles in Arabidopsis Development. Plant Cell 17, 61-76. Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev 16, 1616-1626. Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. (2002). Prediction of plant microRNA targets. Cell 110, 513-520. Ruegger, M., Dewey, E., Gray, W.M., Hobbie, L., Turner, J., and Estelle, M. (1998). The TIR1 protein of Arabidopsis functions in auxin response and is related to human SKP2 and yeast grrlp. Genes Dev 12, 198-207. Samach, A., Klenz, J.E., Kohalmi, S.E., Risseeuw, E., Haughn, G.W., and Crosby, W.L. (1999). The UNUSUAL FLORAL ORGANS gene of Arabidopsis thaliana is an F-box protein required for normal patterning and growth in the floral meristem. Plant J 20, 433- 445. Sasaki, A., Itoh, H., Gomi, K., Ueguchi-Tanaka, M., Ishiyama, K., Kobayashi, M., Jeong, D.H., An, G., Kitano, H., Ashikari, M., and Matsuoka, M. (2003). Accumulation of phosphorylated repressor for gibberellin signaling in an F-box mutant. Science 299, 1896-1898. Schauer, S.E., Jacobsen, S.E., Meinke, D.W., and Ray, A. (2002). DICER-LIKEI: blind men and elephants in Arabidopsis development. Trends Plant Sci 7, 487-491. Smalle, J., and Vierstra, R.D. (2004). The ubiquitin 26S proteasome proteolytic pathway. Annu Rev Plant Biol 55, 555-590.

118 Stirnberg, P., van De Sande, K., and Leyser, H.M. (2002). MAX1 and MAX2 control shoot lateral branching in Arabidopsis. Development 129, 1131-1141. Sunkar, R., and Zhu, J.K. (2004). Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16, 2001-2019. Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev 17, 49-63. Telfer, A., and Poethig, R.S. (1998). HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125, 1889-1898. Vaucheret, H., Vazquez, F., Crete, P., and Bartel, D.P. (2004). The action of ARGONAUTEl in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18, 1187-1197. Vazquez, F., Gasciolli, V., Crete, P., and Vaucheret, H. (2004). The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14, 346-351. Wilkinson, M.D., and Haughn, G.W. (1995). UNUSUAL FLORAL ORGANS Controls Meristem Identity and Organ Primordia Fate in Arabidopsis. Plant Cell 7, 1485-1499. Willems, A.R., Schwab, M., and Tyers, M. (2004). A hitchhiker's guide to the cullin ubiquitin ligases: SCF and its kin. Biochim Biophys Acta 1695, 133-170. Woo, H.R., Chung, K.M., Park, J.H., Oh, S.A., Ahn, T., Hong, S.H., Jang, S.K., and Nam, H.G. (2001). ORE9, an F-box protein that regulates leaf senescence in Arabidopsis. Plant Cell 13, 1779-1790. Xie, D.X., Feys, B.F., James, S., Nieto-Rostro, M., and Turner, J.G. (1998). COII: an Arabidopsis gene required for jasmonate-regulated defense and fertility. Science 280, 1091-1094. Zheng, N., Schulman, B.A., Song, L., Miller, J.J., Jeffrey, P.D., Wang, P., Chu, C., Koepp, D.M., Elledge, S.J., Pagano, M., Conaway, R.C., Conaway, J.W., Harper, J.W., and Pavletich, N.P. (2002). Structure of the Cull-Rbxl-Skpl-F boxSkp2 SCF ubiquitin ligase complex. Nature 416, 703-709. Zhong, R., and Ye, Z.H. (2004). Amphivasal vascular bundle 1, a gain-of-function mutation of the IFL1/REV gene, is associated with alterations in the polarity of leaves, stems and carpels. Plant Cell Physiol 45, 369-385.

119 .Appendix 1. Conserved miRNA target sites in Arabidopsis, Oryza and Populus The sequence and score (see Jones-Rhoades & Bartel, 2004 Molecular Cell 14(6):787-99) is listed for miRNA complementary sites within predicted miRNA targets of three plant species. Some complementary sites occur adjacent to annotated gene models, especially in Populus; this iis indicated as "In annotation" (Y or N). rniRNA miRNAcomplementary In family targetgene TargetFamily Score sequence species annotation? rniR156 At1g27360.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y rniR156 At1g27370.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y rniR156 Atlg53160.1 SBP 2 CUGCUCUCUCUCUUCUGUCA Arabidopsis Y rniR156 Atlg69170.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y miR156 At2g33810.1 SBP 1.5 UUGCUUACUCUCUUCUGUCA Arabidopsis Y miR156 At2g42200.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y nniR156 At3g15270.1 SBP 3 CCGCUCUCUCUCUUCUGUCA Arabidopsis Y miR156 At3g57920.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y mniR156 At5g43270.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y miR156 At5g50570.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y miR156 At5g50670.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y miR156 Os01g69830 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os02g04680 SBP 1 AUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os02g07780 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os04g46580 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os06g45310 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os06g49010 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y miR156 Os07g32170 SBP 2 AUGCUCCCUCUCUUCUGUCA Oryza Y miR156 Os08g39890 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y miR1516 Os08g41940 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y miR156 estExt_Genewisel_v1.C_1240186 SBP 1 AUGCUCUCUCUCUUCUGUCA Populus Y miR156 estExtGenewisel_vl.C_LGXV2187 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y miR156 eugene3.001:20942 SBP 2 GCGCUCUCUCUCUUCUGUCA Populus Y miR156 eugene3.00160416 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y miR156 fgenesh4pg.C_LG_11001303 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y mriRl56 fgenesh4_pg.C_LG_X001404 SBP 1 GUGCUCUCUCUCUCUGUCA Populus Y miR156 grail3.0010026801 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y mliR156 gw1.107.39.1 SBP 1.5 AUGCUCCCUCUCUUCUGUCA Populus N miR156 gw1.129.152.1 SBP 0.5 GUGCUCGCUCUCUUCUGUCA Populus N miR156 gw1.164.76.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N miR156 gw1.40.76.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N miR156 gw1.1.7783.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N miR156 gw1.111.2396.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N miR156 gw1.IV.3037.1 SBP 1.5 AUGCUCUCUCUCUWCUGUCA Populus N miR156 gwl.VII.548.1 SBP 2 UUGCUCUCUCUCUUCUGUCA Populus N miR156; gwl.XI.3794.1 SBP 1.5 AUGCUCCCUCUCUUCUGUCA Populus N miR159 At2g26950.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAAG Arabidopsis Y miR159 At2g26960.1 MYB 2.5 UCGAGUUCCCUUCAUUCCAAU Arabidopsis Y miR159 At2g32460.1 MYB 1.5 UAGAGCUUCCUUCAAACCAAA Arabidopsis Y mrniR159 At3g11440.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAA Arabidopsis Y miR159 At3g60460.1 MYB 1.5 UGGAGCUCCAUUCGAUCCAAA Arabidopsis Y miR159 At4g26930.1 MYB 2.5 AUGAGCUCUCUUCAAACCAAA Arabidopsis Y miR159 At5gO6100.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAA Arabidopsis Y miR159 At5g55020.1 MYB 2 AGCAGCUCCCUUCAAACCAAA Arabidopsis Y

120 miR159 Os01g59660 MYB 1 UGGAGCUCCCUUCACUCCAAG Oryza miR159 Os03g38210 MYB 2 CCGAGCUCCCUUCAAGCCAAU Oryza miR159 Os04g46390 MYB 1.5 UGGAGCUCCAUUCGAUCCAAA Oryza miR159 Os5g41 170 MYB 0.5 UGGAGCUCCCUUUAAUCCAAU Oryza miR159 0s06g40330 MYB 1 UAGAGCUCCCUUCACUCCAAU Oryza miR159 Os06g46560 MYB 2.5 GCGAGCUCCCUUCGAACCAAU Oryza miR159 fgenesh4_pm.C_LG_I 11000641 MYB 2.5 UGGAGCUCUAUUCGGUCCAAA Populus miR159 fgenesh4_pm.C_scaffold_40000020 MYB 1.5 UGGAGCUCCAUUCGAUCCAAA Populus miR159 gwl. 1.6885.1 MYB 1 UGGAGCUCCCUUCACUCCAAU Populus miR159 gwl .1.9701.1 MYB 1 UAGAGCUCCCUUCACUCCAAU Populus miR159 gwl.111.41.1 MYB 0 UUGAGCUCCCUUCACUCCAAU Populus miR159 Atlg30210.1 TCP 2.5 AGGGGGACCCUUCAGUCCAA Arabidopsis miR159 At1g53230.1 TCP 3 AGGGGUCCCCUUCAGUCCAU Arabidopsis miR159 At2g31070.1 TCP 2.5 AGGGGUACCCUUCAGUCCAG Arabidopsis miR159 At3g15030.1 TCP 2.5 AGGGGUCCCCUUCAGUCCAG Arabidopsis miR159 At4g18390.1 TCP 2.5 AGGGGGACCCUUCAGUCCAA Arabidopsis miR159 Os01g11550 TCP 3.5 AGGGGACCCCUUCAGUCCAGUOryza miR159 0s03g57190 TCP 2.5 AGGGGGACCCUUCAGUCCAA Oryza miR159 Os07g05720 TCP 2.5 AGGGGGACCCUUCAGUCCAA Oryza miR159 Os12g42190 TCP 2.5 CGGGGCACACUUCAGUCCAA Oryza miR159 eugene3.00110429 TCP 2.5 AGGGGGACCCUUCAGUCCAA Populus miR159 eugene3.00110631 TCP 3 AGGGGAACCCUUCAGUCCAG Populus miR159 eugene3.00121020 TCP 2.5 AGGGGGACCCUUCAGUCCAA Populus miR159 eugene3.00190830 TCP 3 AGGGGGCCCCUUCAGUCCAG Populus miR159 eugene3.00410019 TCP 3 AGGGGAACCCUUCAGUCCAG Populus miR159 grail3.0032015302 TCP 3 AGGGGACCCCUUCAGUCCAG Populus miR159 gwl .IV.2486.1 TCP 3 AUGAGCUCCCUCCACUCAACPopulus miR160 At1g77850.1 ARF 0.5 UGGCAUGCAGGGAGCCAGGCAArabidopsis miR160 At2g28350.1 ARF 1 AGGAAUACAGGGAGCCAGGCA Arabidopsis miR160 At4g30080.1 ARF 1.5 GGGUUUACAGGGAGCCAGGCA Arabidopsis miR160 0s02g41800 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 0s04g43910 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 Os04g59430 ARF 1 UGACAUUCAGGGAGCCAGGCA Oryza miR160 0s06g47150 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 Os10g33940 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 estExtfgenesh4pg.C_LG_V0901 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR160 estExt_fgenesh4_pm.C_LG_X0888 ARF 0 AGGCAUACAGGGAGCCAGGCA Populus miR160 estExt_fgenesh4_pm.C_LG_XVI0323ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR160 eugene3.00660262 ARF 0 AGGCAUACAGGGAGCCAGGCA Populus miR160 fgenesh4_pg.C_LG_11000830 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR160 fgenesh4_pg.C_LG_X001411 ARF 0 AGGCAUACAGGGAGCCAGGCA Populus miR160 fgenesh4_pg.C_LG_VI11000301 ARF 0 AGGCAUACAGGGAGCCAGGCA Populus miR160 gw1.28.631.1 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR160 gw1.28.632.1 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR161 At1g06580.1 PPR 2 CCCGGAUGUAAUCACUUUCAG Arabidopsis miR161 Atl g62670.1 PPR 1.5 CCCUGAUGUAUUCACUUUCAG Arabidopsis rniR161 At1g62720.1 PPR 2.5 CCCCGAUGUAGUGACUUAUAA Arabidopsis rniR161 At1g63080.1 PPR 2 UCCAAAUGUAGUCACUUUCAA Arabidopsis rniR161 At1g631 50.1 PPR 2.5 CCCCAAUGUUGUUACUUUCAA Arabidopsis rniR161 At1g63400.1 PPR 2 UCCAAAUGUAGUCACUUUCAAArabidopsis

121 miR161 At1g64580.1 PPR 1.5 CCCUGAUGUUGUCACUUUCAC Arabidopsis Y miR161 At5g16640.1 PPR 2 CCCUGAUGUAUUUACUUUCAA Arabidopsis Y miR161 At5g41170.1 PPR 1.5 ACCUGAUGUAAUCACUUUCAA Arabidopsis Y miR162 At gOl01040.1 DCL 2 CUGGAUGCAGAGGUAUUAUCGAArabidopsis Y miR162 Os03g02970 DCL 2 CUGGAUGCAGAGGUUUUAUCG Oryza Y miR162 eugene3.00021687 DCL 2 CUGGAUGCAGAGGUCUUAUCG Populus miR163 At1g66690.1 SAMT 0.5 AUCGAGUUCCAAGUCCUCUUCAAArabidopsis miR163 At1966700.1 SAMT 0.5 AUCGAGUUCCAAGUCCUCUUCAAArabidopsis miR163 Atlg66720.1 SAMT 1 AUCGAGUUCCAGGUCCUCUUCAA Arabidopsis miR163 At3g44860.1 SAMT 1.5 AUCGAGUUCCAAGUUUUCUUCAAArabidopsis y miR163 At3g44870.1 SAMT 1.5 AUCGAGUUCCAAGUUUUCUUCAAArabidopsis y miR164 At1g56010.1 NAC 1 AGCACGUACCCUGCUUCUCCA Arabidopsis y miR164 At3g15170.1 NAC 1 AGCACGUGUCCUGUUCUCCA Arabidopsis y miR164 At5g07680.1 NAC 1.5 UUUACGUGCCCUGCUUCUCCA Arabidopsis y miR164 At5g39610.1 NAC 2 CUCACGUGACCUGCUUCUCCG Arabidopsis yY miR164 At5g53950.1 NAC 1 AGCACGUGUCCUGUUUCUCCA Arabidopsis miR164 At5g61430.1 NAC 1.5 UCUACGUGCCCUGCUUCUCCA Arabidopsis yY miR164 Os02g36880 NAC 1 CGCACGUGACCUGCUUCUCCA Oryza yY miR164 Os04g38720 NAC 1 CGCACGUGACCUGCUUCUCCA Oryza yY miR164 0s06g23650 NAC 1 AGCUCGUGCCCUGCUUCUCCA Oryza y miR164 Os06g46270 NAC 1 AGCAAGUGCCCUGCUUCUCCA Oryza y miR164 Os08g10080 NAC 1.5 AGCAAGUGUCCUGCUUCUCCG Oryza y miR164 Os12g41680 NAC 1 AGCAAGUGCCCUGCUUCUCCA Oryza y miR164 eugene3.00150202 NAC 1.5 CCUACGUGCCCUGCUUCUCCA Populus y miR164 fgenesh4_pm.C_LG_XI 1000069 NAC 1.5 CCUACGUGCCCUGCUUCUCCA Populus y miR164 gw1.107.10.1 NAC 1 AGCACGUGUCCUGUUUCUCCA Populus YN miR164 gwl .V.3536.1 NAC 1 AGCAAGUGCCCUGCUUCUCCA Populus miR164 gwl .VII1.2722.1 NAC 1 AGCAAGUGCCCUGCUUCUCCA Populus Ny miR164 gwl .XI.3766.1 NAC 1 AGCACGUGUCCUGUUUCUCCA Populus miR166 At1g30490.1 HD-ZIP 1.5 UUGGGAUGAAGCCUGGUCCGG Arabidopsis y miR166 At1g52150.1 HD-ZIP 1.5 CUGGAAUGAAGCCUGGUCCGG Arabidopsis y miR166 At2g34710.1 HD-ZIP 1.5 UUGGGAUGAAGCCUGGUCCGG Arabidopsis y miR166 At4g32880.1 HD-ZIP 1.5 CUGGGAUGAAGCCUGGUCCGG Arabidopsis y miR166 At5g60690.1 HD-ZIP 1.5 CUGGGAUGAAGCCUGGUCCGG Arabidopsis y miR166 Os03g01890 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Oryza Yy miR166 Os03g43930 HD-ZIP 2 UUGGGAUGAAGCCUGGUCCGG Oryza miR166 Os109g33960 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Oryza y *miR166 Osl 2g41860 HD-ZIP 2 UUGGGAUGAAGCCUGGUCCGG Oryza y Y miR166 estExt_fgenesh4_pg.C_2360002 HD-ZIP 3 UUGGGAUGAAGCCUGGUCCAG Populus miR166 estExt_fgenesh4_pg.C_LG_12905 HD-ZIP 2.5 UUGGUAUGAAGCCUGGUCCGG Populus miR166 estExtfgenesh4_pg.C_LG_1110436 HD-ZIP 1.5 CUGGAAUGAAUGAAGCCUGGUCCGGPopulus Yy miR166 estExt_fgenesh4_pm.C_LG_V10713 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus miR166 estExt_Genewisel_vl .C_660759 HD-ZIP 1.5 CUGGAAUGAAGCCUGGUCCGG Populus y Y miR166 fgenesh4_pg.C_LG_XVI11000250 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus miR166 fgenesh4_pm.C_LG_1000560 HD-ZIP 1.5 CUGGAAUGAAGCCUGGUCCGG Populus miR166 gw1.6326.1.1 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus Yy y rniR166 gwl .IX.4748.1 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus rniR167 Atlg30330.1 ARF 2 GAGAUCAGGCUGGCAGCUUGU Arabidopsis y rniR167 At5g37020.1 ARF 2 UAGAUCAGGCUGGCAGCUUGU Arabidopsis rniR167 Os02g06910 ARF 3 GAGAUCAGGCUGGCAGCUUGU Oryza Y

122 miR167 Os04g57610 ARF 2 UAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 Os06g46410 ARF 3 GAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 Os12941950 ARF 3 AAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 estExt_Genewisel_vl .C_LG_110777 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 estExt_Genewiselvl .C_LG_XI2869 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 fgenesh4_pg.C_LG_1002802 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 fgenesh4_pg.C_scaffold_1006000001 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 gw1.44.432.1 ARF 2 UAGAUCAGGCUGGCAGCUUGU Populus Y miR167 gw1.IV.3880.1 ARF 2 UAGAUAGGGCUGGCAGCUUGU Populus Y miR167 gwl .V.806.1 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR168 At1g48410.1 AGO 2.5 UUCCCGAGCUGCAUCAAGCUA Arabidopsis Y miR168 Os02g45070 AGO 0 UUCCCGAGCUGCACCAAGCCU Oryza N miR168 Os02g58490 AGO 2.5 CUCCCGAGCUGCGCCAAGCAA Oryza Y miR168 Os04g47870 AGO 0 UUCCCGAGCUGCACCAAGCCCOryza Y miR168 Os04g52540 AGO 3 UUCGCCCGCUGCACCAAGCCG Oryza Y miR168 Os04g52550 AGO 3 UUCGCCCGCUGCACCAAGCCG Oryza Y miR168 Os06g51310 AGO 3 CUCCCGAGCUGCUCCAAGCAA Oryza Y miR168 grail3.0031006602 AGO 3 CACCCGAGCUGCACCAAGCUA Populus N miR168 grail3.0122002801 AGO 3 CACCCGAGCUGCACCAAGCUA Populus N miR169 At1g91 7590.1 CCAAT 1.5 AAGGGAAGUCAUCCUUGGCUG Arabidopsis Y miR169 Atlg54160.1 CCAAT 2 ACGGGAAGUCAUCCUUGGCUA Arabidopsis Y miR169 Atlg72830.1 CCAAT 1.5 AGGGGAAGUCAUCCUUGGCUA Arabidopsis Y miR169 At3g05690.1 CCAAT 1.5 AGGCAAAUCAUCUUUGGCUCA Arabidopsis Y miR169 At3g14020.1 CCAAT 2.5 UAGCCAAGGAUGACuUCCCU Arabidopsis Y miR169 At3g20910.1 CCAAT 2 CGGCAAUUCAUUCUUGGCUUU Arabidopsis N miR169 At5g06510.1 CCAAT 1.5 AGGCAAAUCAUCUUUGGCUCA Arabidopsis Y miR169 At5g12840.1 CCAAT 1.5 CCGGCAAAUCAUUCUUGGCUU Arabidopsis Y miR169 Os03g07880 CCAAT 2.5 AUGGCAAAUCAUCCUUGGCUU Oryza Y miR169 Os03g29760 CCAAT 1.5 GUGGCAAUUCAUCCUUGGCUU Oryza Y miR169 Os03g44540 CCAAT 1 UAGGCAAAUCAUUCUUGGCUC Oryza Y miR169 Os03g48970 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Oryza Y miR169 Os07g06470 CCAAT 1 GUGGCAAUUCAUCCUUGGCUG Otyza Y miR169 Os07g41720 CCAAT 1.5 GUGGCAAUUCAUCCUUGGCUU Oryza Y miR169 Os12942400 CCAAT 1 UAGGCAACUCAUUCUUGGCUG Oryza Y miR169 estExt_fgenesh4pg.C_LG_XVI110020CCAAT 1 CAGGCAAUUCAUCCUUGGCUU Populus Y miR169 eugene3.00011755 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Populus Y miR169 eugene3.00060980 CCAAT 1 CAGGCAAUUCAUCCUUGGCUU Populus N miR169 eugene3.00061121 CCAAT 3 AGGGCAAGUCGUUCUUGGCUC Populus N miR169 eugene3.00091116 CCAAT 2 GCGGCAAAUCAUUCUUGGCUU Populus Y miR169 eugene3.00160615 CCAAT 2.5 AGGGCAAGUCGUUCUUGGCUC Populus N miR169 fgenesh4_pg.C_LG_IX000987 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Populus Y miR169 grail3.0024038301 CCAAT 2.5 UUGGCAAAUCAUUCUUGGCUU Populus N miR169 gw1..1522.1 CCAAT 2.5 GCGGCAAAUCAUUCUUGGCUU Populus N miR171 At2g45160.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y miR171 At3g60630.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y miR171 At4g00150.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y miR171 Os02g44360 SCL 0 GAUAUUGGCGCGGCGCGGCUCAAUCAOryza Y miR171 Os02g44370 SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y miR171 Os04g46860 SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y imiR171 Os06g01620 SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y

123 miR171 Os10g40390 SCL 0.5 GAUAUUGGCGCGGCUCAAUUA Oryza miR171 estExt_Genewisel_vl .CLG_113184 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 eugene3.44860001 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 fgenesh4_pg.C_LG_11000787 SCL 1.5 GGUGAUAUUGG GGCGGCUCAA Populus miR171 gw1.127.243.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gwl.40.23.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gw1.57.294.1 SCL 1 GAUAUUGGAACGGCUCAACGGCUCA Populus miR171 gw1.11.1043.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gw1.111.2060.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gwl .VI1.3405.1 SCL 2.5 GAUACUGGAACGGCUCAAUCA Populus miR172 At2g28550.1 AP2 1.5 CAGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 At2g39250.1 AP2 1 UUGUAGCAUCAUCAGGAUUCC Arabidopsis miR172 At3g54990.1 AP2 1 UGCAGCAUCAUCAGGAUUCC Arabidopsis miR172 At4g36920.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 At5g60120.1 AP2 0.5 AUGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 At5g67180.1 AP2 1.5 UGGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 Os03g60430 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 0s04g55560 AP2 1 CUGCAGCAUCAUCACGAUUCC Oryza miR172 Os05g03040 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 0s06g43220 AP2 0.5 CUGCAGCAUCAUCAGGAUUCC Oryza miR172 Os07g913170 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 grail3.0019003502 AP2 0.5 CUGCAGCAUCAUCAGGAUUCC Populus miR172 gw1.28.415.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCG Populus miR172 gwl .V.4061.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Populus miR172 gw1.VII.1637.1 AP2 1 UUGCAGCAUCAUCAGGAUUCU Populus miR172 gwl .X.2501.1 AP2 0.5 UUGCAGCAUCAUCAGGAUUCU Populus miR172 gwl .XVI.2655.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCG Populus miR393 Os08g41320 bHLH 3.5 ACCAAAAGAAUCACAUCGCCC Oryza miR393 At3g23690.1 bZIP 2 GGUCAGAGCGAUCCCUUUGGC Arabidopsis miR393 eugene3.00140963 bZIP 2.5 GAUCAGAGCGAUCCCUUUGAG Populus miR393 At1g91 2820.1 Fbox 1 AAACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At3g26810.1 Fbox 1 AAACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At3g62980.1 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At4g03190.1 Fbox 2.5 AGACCAUGCGAUCCCUUUGGA Arabidopsis miR393 Os04g32460 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Oryza miR393 Os05g05800 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Oryza miR393 estExt_Genewisel_vl.C_880149 F-box 1 AAACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00012208 F-box 1 AAACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00110318 F-box 3 AGUCAAUGAGGUCACUUUGGA Populus miR393 eugene3.00140791 F-box 1.5 AGACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00141554 F-box 1.5 AGACAAUGCGAUCCCUUUGGA Populus miR394 Atl 9g27340.1 Fbox 1 GGAGGUUGACAGAAUGCCAA Arabidopsis miR394 Os01g69940 Fbox 0 GGAGGUGGACAGAAUGCCAA Oryza miR394 estExtGenewisel_vl .C_LG_17715 F-box 1 GGAGGUUGACAGAAUGCCAA Populus miR394 fgenesh4_pm.C_LG_111000589 F-box 1 GGAGGUUGACAGAAUGCCAA Populus miR395 At3g22890.1 APS 1.5 GAGUUCCUCCAAACUCUUCAU Arabidopsis miR395 At4g14680.1 APS 1.5 GAGUUCCUCCAAACUCUUCAU Arabidopsis miR395 At5g43780.1 APS 0.5 GAGUUCCUCCAAACACUUCAUArabidopsis miR395 Os03g53230 APS 0.5 GAGUUCCUCCAAGCACUUCAUOryza miR395 estExtGenewisel_vl .C_LG_VI112439APS 1.5 GAGUUCCUCCAAACUCUUCAU Populus

124 miR395 grail3.0175000802 APS 0.5 GAGUUCCUCCAAACACUUCAU Populus miR395 At5gl 0180.1 S transporter 1.5 AAGUUCUCCCAAACACUUCAA Arabidopsis miR395 Os03g09930 S transporter 1 GAGUUCACCCAAACACUUCAG Oryza miR395 Os03g09940 S transporter 0 GAGUUCCCCCAAACACUUCAG Oryza miR395 estExt_fgenesh4_pm.C_LG_110422 S Transporter 2.5 GAGUUCCCUCAAGCACUUCAA Populus miR395 eugene3.00070572 S Transporter 1 GAGUUUUCCCAAACACUUCAA Populus miR395 fgenesh4_pm.C_LG_V000080 S Transporter 3 UAUUUCCCCUGAACACUUCAA Populus miR396 At2g22840.1 GRF 3 UCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At2g36400.1 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAAArabidopsis miR396 At2g45480.1 GRF 3 ACGUUCAAGAAAGCUUGUGGAAArabidopsis miR396 At3g52910.1 GRF 3 CCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At4g24150.1 GRF 3 UCGUUCAAGAAAGCAUGUGGAAArabidopsis miR396 At4g37740.1 GRF 3 UCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At5g53660.1 GRF 3 UCGUUCAAGAAAGCAUGUGGAAArabidopsis miR396 Os02g45570 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAOryza miR396 Os02g47280 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os02g53690 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAOryza miR396 Os03g47140 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os03g51970 GRF 3 CCGUUCAAGAAAGCAUGUGGA Oryza miR396 Os04g51 190 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os06g02560 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os 1g35030 GRF 3 UCGUUCAAGAAAGAAAGCAUGUGGAOryza miR396 Os12g29980 GRF 3 CCGUUCAAGAAAGCAUGUGGA Oryza miR396 estExt_Genewisel_v.C_290455 GRF 3 CCGUUCAAGAAAGCCUGUGGA Populus miR396 eugene3.00010995 GRF 3 GCGUUCAAGAAAGCUUGUGGA Populus miR396 eugene3.00011018 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAPopulus miR396 eugene3.00021070 GRF 3 UCGUUCAAGAAAGAAAGCCUGUGGAPopulus miR396 fgenesh4_pg.C_LG_1000725 GRF 3 CCGUUCAAGAAAGCCUGUGGA Populus miR396 fgenesh4_pg.C_LG_XI1000270 GRF 3 CCGUUCAAGAAAGAAAGCAUGUGGAPopulus miR396 fgenesh4_pg.C_LG_XIV000034 GRF 3 UCGUUCAAGAAAGCCUGUGGA Populus miR396 fgenesh4_pm.C_scaffold_28000142 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAPopulus miR396 gwl .XIV.854.1 GRF 3 ACGUUCAGAAAGAAAGCUGUGGAPopulus miR396 At2g40760.1 Rhodenase 2.5 AAGUUUAAAGGAGCUGUGGAU Arabidopsis miR396 Os05g25780 Rhodenase 3 AAAUUUAAGAGAGCUGUUGAU Oryza miR396 gwl.XIX.1660.1 Rhodenase 3 AAGUUCAAAGGAGCUGUUGAU Populus miR397 At2g29130.1 Laccase 0.5 AAUCAAUGCUGCACUCAAUGA Arabidopsis miR397 At2g38080.1 Laccase 1 AGUCAACGCUGCACUUAAUGA Arabidopsis miR397 At5g60020.1 Laccase 1 AAUCAAUGCUGCACUUAAUGA Arabidopsis miR397 Os1g44330 Laccase 2 CAUCAACGCUGCAGUCAACGA Oryza miR397 Os01g61160 Laccase 3 CAUCAACGCGGCACUCAACCA Oryza miR397 OsO1g62480 Laccase 2.5 CAUCAACGCCGCGCUCAACGAOryza miR397 Os01g62490 Laccase 0.5 CAUCAACGCUGCGCUCAAUGA Oryza miR397 Os01g63180 Laccase 1.5 CAUCAACGCUGCGCUCAACAC Oryza miR397 Os2g51440 Laccase 3 CAUCAACGCUGGACUCACCAA Oryza miR397 OsO3g16610 Laccase 1.5 GAUCAACGCUGCGCUCAACGA Oryza miR397 Os05g38390 Laccase 2.5 GAUCAACGCGGCGCUCAACGA Oryza imiR397 Os05g38410 Laccase 1 CAUCAACGCUGCACUCAACGA Oryza miR397 Os05g38420 Laccase 1 CAUCAACGCUGCACUCAACGA Oryza miR397 OslgO01730 Laccase 2.5 CAUCAACGCCGCGCUCAACAC Oryza miR397 Osl 1g48060 Laccase 1 CAUCAACGCUGCACUGAAUGA Oryza

125 miR397 Os12g01730 Laccase 2.5 CAUCAACGCCGCGCUCAACAC Oryza miR397 Os12915530 Laccase 2.5 CAUCAACGCCGCGCUCAACAC Oryza miR397 Os12g915680 Laccase 1.5 CAUCAACGCUGCGCUCAACAC Oryza miR397 estExtfgenesh4_pg.C_LG_X1635 Laccase 1.5 CAUCAAUGCUGCACUCAAUCA Populus miR397 estExtfgenesh4_pm.C_LG_V10293 Laccase 1 AAUCAACGCUGCACUCAACGA Populus miR397 estExt_fgenesh4pm.C_LG_VII10291 Laccase 1.5 CAUCAAUGCUGCACUCAAUCA Populus miR397 estExtGenewisel_v .C_LG_XV13501 Laccase 0.5 GAUCAAUGCUGCACUCAAUGA Populus miR397 eugene3.00010449 Laccase 0.5 GAUCAAUGCUGCACUCAAUGA Populus miR397 eugene3.00060812 Laccase 1 AAUCAACGCUGCACUCAAUAA Populus miR397 eugene3.00091222 Laccase 1 CAUCAACGCUGCACUAAAUGA Populus miR397 eugene3.00161066 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 eugene3.01070064 Laccase 1 GAUCAACGCCGCACUCAAUGA Populus miR397 eugene3.04340001 Laccase 1 AAUCAACGCUGCACUCAAUAA Populus miR397 fgenesh4_pg.C_LG_IV001314 Laccase 1.5 CAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_IX000614 Laccase 1.5 CAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_IX001228 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_VI000783 Laccase 1.5 GAUCAAUGCAGCACUCAAUGA Populus miR397 fgenesh4_pg.C_LG_XVI000990 Laccase 2.5 AAUCAACGCUGCUCUCGAUAA Populus miR397 fgenesh4_pg.C_scaffold_107000055 Laccase 1 GAUCAACGCCGCACUCAAUGA Populus miR397 fgenesh4_pm.C_LG_1000649 Laccase 1 UAUCAACGCUGCACUAAAUGA Populus miR397 fgenesh4_pm.C_LG_1000891 Laccase 2 AAUCAACGCAGCACUAAAUGA Populus miR397 grail3.0023027201 Laccase 1.5 GAUCAAUGCAGCACUCAAUGA Populus miR397 gw1.4300.5.1 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 gwl..1 184.1 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 gw1.1.247.1 Laccase 0.5 GAUCAAUGCUGCACUCAAUGA Populus miR397 gwl .VII.3595.1 Laccase 2.5 CAUCAAUGCUGCCCUCAACGA Populus miR397 gwl .V11.2100.1 Laccase 3 GGUCAAUUCUGCACUCAAUCA Populus miR397 gwl.XI.3910.1 Laccase 1.5 GAUCAAUGCCGCACUCAAUGA Populus miR397 gwl .XI.3915.1 Laccase 1.5 GAUCAAUGCUGCCCUCAAUGA Populus miR398 Atlg08830.1 CSD 3 AAGGGGUUUCCUGAGAUCACA Arabidopsis miR398 At2g28190.1 CSD 4 UGCGGGUGACCUGGGAAACA Arabidopsis miR398 Os03g11960 CSD 4 UGUGGGCGACCUGGGAAACA Oryza miR398 Os08g44770 CSD 4 UGCGGGUGACCUGGGAAACA Oryza miR398 fgenesh4_pm.C_scaffold_163000009 CSD 4 UGCGGGUGACCUGGGAAACAU Populus miR398 gwl .IX.5030.1 CSD 4 UGCGGGUGACCUGGGAAACAU Populus miR398 Atlg15640.1 Cyt C oxidase 3 AAGGUGUGACCUGAGAAUCACAArabidopsis miR398 Os01g42650 Cyt C oxidase 4 GCGCCGCGACCUGAGAGCACA Oryza miR399 At3g54700.1 P transporter 2 CAGGCCAGCUCUUCUUUGGCU Arabidopsis miR399 Os03g04360 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 Os08g45000 P transporter 0.5 CAGGGCAACUCUUCUUUGGCU Oryza miR399 Os109g30770 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 Os109g30790 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 estExt_fgenesh4_pm.C_LG_V0552 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 eugene3.00051302 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 eugene3.186960001 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 fgenesh4_pg.C_scaffold_125000020 P transporter 1.5 CAGGGCAACUCUUCUUUGGGU Populus miR399 At2g33770.1 Ub 0.5 UAGAGCAAAUCUCCUUUGGCA Arabidopsis miR399 At2g33770.1 Ub 0.5 UAGGGCAAAUCUUCUUUGGCA Arabidopsis rniR399 At2g33770.1 Ub 0.5 UAGGGCAUAUCUCCUUUGGCA Arabidopsis miR399 At2g33770.1 Ub 0.5 UCGAGCAAAUCUCCUUUGGCA Arabidopsis

126 miR399 At2g33770.1 Ub 0.5 UUGGGCAAAUCUCCUUUGGCA Arabidopsis miR399 0s05g48390 Ub 1 CCGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 1 CGUGGUAAUUCUCCUUUGGCA Oryza miR399 0s05g48390 Ub 0 CUGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 0 UAGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 2 UCGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 2 UUGGGCAAAUCUCCUUUGGCA Oryza miR399 eugene3.00040513 Ub 0.5 CAGGGCAAAUCUUCUUUGGCA Populus miR399 eugene3.00040513 Ub 1.5 UAGGGCAAAUCUCUUUUGGCU Populus miR399 eugene3.00040513 Ub 3 AAGGAAAGAUCUUCUUUGGCA Populus miR399 eugene3.00040513 Ub 0.5 UUGGGCAAAUCUCCUUUGGCA Populus miR399 eugene3.00040513 Ub 1 UAGGGAAAAUCUCCUUUGGCA Populus miR399 eugene3.01240047 Ub 0.5 UAGGGCAAAUCUCCUUUGGCA Populus miR399 eugene3.01240047 Ub 1 UUGGGCAAAUCUCCUUUGGCA Populus miR399 eugene3.01240047 Ub 2.5 AAGGGCAGAUCUUCUUUGGCA Populus miR399 eugene3.01240047 Ub 1 UUGGGCAAAUCUCCUUUGGCA Populus miR399 eugene3.01240047 Ub 1.5 CAGGGCAAAUCUUCUUUGGCG Populus miR403 Atl 9g31280.1 AGO2 0 GGAGUUUGUGCGUGAAUCUAA Arabidopsis miR403 gw1.200.30.1 AGO2 0 GGAGUUUGUGCGUGAAUCUAA Populus miR408 At2g30210.1 Laccase 3 ACCAGUGAAGAGGCUGUGCAG Arabidopsis miR408 At5g05390.1 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAA Arabidopsis miR408 At5g07130.1 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAG Arabidopsis miR408 Os01g61160 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAA Oryza miR408 Os03g1 8640 Laccase 2.5 GCUAGUGAAGAGGCUGUGCAA Oryza miR408 eugene3.00131222 Laccase 3 ACCAGUGAAGAGGCUGUGCAG Populus miR408 eugene3.00191007 Laccase 2.5 GCCAGUGAGGAGGCUGUGCAG Populus miR408 gwl .VIII.2100.1 Laccase 3 UCCAGUGAAGAGGCUGUGCAA Populus miR408 At2g02850.1 Plantacyanin 1 CCAAGGGAAGAGGCAGUGCAU Arabidopsis miR408 Os02g49850 Plantacyanin 1 CUCGGGGAAGAGGCAGUGCAU Oryza miR408 Os03g15340 Plantacyanin 1 CCCAGGGAAGAGGCAGUGCAG Oryza miR408 Os6g15600 Plantacyanin 0.5 GCCGGGGAAGAGGCAGUGCAA Oryza miR408 estExt_fgenesh4_pm.C_LG_11I118 Plantacyanin 1.5 GCCAGGGAAGAUGCAGUGCGA Populus

127 MicroRNAs in plants

Brenda J. Reinhart,' Earl G. Weinstein, 1 Matthew W. Rhoades,' Bonnie Bartel,2 '3 and David P. BartelL'3 'Whitehead Institute for Biomedical Research, and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; 2 Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, USA

MicroRNAs (miRNAs) are an extensive class of -22-nucleotide noncoding RNAs thought to regulate gene expression in metazoans. We find that miRNAs are also present in plants, indicating that this class of noncoding RNA arose early in eukaryotic evolution. In this paper 16 Arabidopsis miRNAs are described, many of which have differential expression patterns in development. Eight are absolutely conserved in the rice genome. The plant miRNA loci potentially encode stem-loop precursors similar to those processed by Dicer (a ribonuclease III) in animals. Mutation of an Arabidopsis Dicer homolog, CARPEL FACTORY, prevents the accumulation of miRNAs, showing that similar mechanisms direct miRNA processing in plants and animals. The previously described roles of CARPEL FACTORY in the development of Arabidopsis embryos, leaves, and floral meristems suggest that the miRNAs could play regulatory roles in the development of plants as well as animals. [Key Words: miRNA; siRNA; ncRNA; Dicer; CARPEL FACTORY] Received May 6, 2002; revised version accepted May 22, 2002.

A growing body of evidence suggests that -22-nucleotide of which are conserved from worms to humans (Pas- (nt) noncoding RNA molecules play crucial roles as regu- quinelli et al. 2000; Lagos-Quintana et al. 2001; Lau et al. lators of gene expression in eukaryotes. The first endog- 2001; Lee and Ambros 2001). RNAs are classified as enous -22-nt RNAs to be identified were lin-4 RNA and miRNAs if they share the following features with lin-4 let-7 RNA, both of which are key regulatory molecules and let-7 RNAs: (1) The mature form of the RNA is a in the pathway controlling the timing of larval develop- 20-nt to 24-nt species that is usually detectable on ment in the nematode Caenorhabditis elegans (Leeet al. Northern blots. (2) The RNA has the potential to pair to 1993; Reinhart et al. 2000). When these RNAs are ex- flanking genomic sequences, placing the mature miRNA pressed, they pair to sites within the 3' untranslated re- within an imperfect RNA duplex thought to be needed gion (UTR) of target mRNAs, triggering the translational for its processing from a longer precursor transcript. In repression of the mRNA targets (Lee et al. 1993; Wight- addition, miRNAs are typically derived from a segment man et al. 1993; Reinhart et al. 2000; Slack et al. 2000). of the genome that is distinct from predicted protein- The mature lin-4 and let- 7 RNAs are processed from the coding regions. Thus far, >150 tiny RNAs that satisfy double-stranded region of RNA precursor transcripts by these criteria have been identified in animals (Lagos- Dicer, a molecule with an N-terminal helicase and tan- Quintana et al. 2001, 2002; Lau et al. 2001; Lee and Am- dem C-terminal ribonuclease III domains (Bernstein et bros 2001; Mourelatos et al. 2002). The abundance of the al. 2001; Grishok et al. 2001; Hutvagner et al. 2001; Ket- miRNA genes, their intriguing expression patterns in ting et al. 2001). Argonaute homologs also influence the different tissues or in different stages of development, accumulation of the lin-4 and let-7 RNAs, but their bio- and their evolutionary conservation imply that, as a chemical roles are unclear (Grishok et al. 2001). Argo- class, miRNAs have broad regulatory functions in addi- naute family members have a PAZ domain, which may tion to the known roles of lin-4 and let-7 RNAs in the allow protein-protein interaction with Dicer, as well as temporal control of developmental events. In support of a Piwi domain, whose function is unknown (Cerutti et this idea, six of the recently identified Drosophila miRNAs al. 2000). are complementary to 3'-UTR elements known to confer The lin-4 and let-7 regulatory RNAs are now recog- posttranscriptional regulation in this species (Lai 2002). nized as the founding members of a large class of -22-nt MicroRNAs are not the only small RNAs processed by noncoding RNAs termed microRNAs (miRNAs), several Dicer. Dicer was originally identified as a nuclease in- volved in the RNA interference (RNAi) pathway of ani- 3 Correspondingauthors. mals (Bernstein et al. 2001). This method of RNA silenc- E-MAILbartelflrice.edu; FAX (713)348-5154. ing is triggered by long double-stranded RNA (dsRNA), E-MAILdbartelwi.mit.edu; FAX (617)258-6768. Article and publication are at http://www.genesdev.org/cgi/doi/10.1101/ typically introduced by injection or expression from gad.1004402. a transgene (Fire et al. 1998). The dsRNA trigger is

1616 GENES& DEVELOPMENT16:1616-1626 O 2002 by Cold SpringHarbor Laboratory Press ISSN 0890-9369/02$5.00; www.genesdev.org microRNAsin plants

cleaved by Dicer into -22-nt RNAs (Bernstein et al. -100 were cloned from flowers. Of these, 18 sequences 2001). These -22-nt RNAs, known as small interfering were represented by more than one clone and were the RNAs (siRNAs), act as guide RNAs to target homologous subject of further analysis. Of these 18 RNAs, 16 had mRNA sequences for destruction (Hammond et al. 2000; striking similarities to the miRNAs of animals and have Zamore et al. 2000; Elbashir et al. 2001). RNAs -25 nt in therefore been named miR156 through miR171, with length are also associated with posttranscriptional gene genes designated MIR 156 through MIR 171 (Table 1). Six silencing (PTGS)in plants, and it has been suggested that of the miRNAs represent three pairs of closely related a Dicer-like activity also produces these small RNAs RNA sequences differing only by one or two nucleotides. (Hamilton and Baulcombe 1999; Matzke et al. 2001; Interestingly, most of the plant miRNAs begin with a U, Vance and Vaucheret 2001). RNAi, PTGS, and quelling a trend previously observed in animal miRNAs (Lagos- of Neurospora are related pathways that require a con- Quintana et al. 2001; Lau et al. 2001). served set of proteins (Hutvigner and Zamore 2002). For Five of the plant miRNA sequences have a single copy example, PTGS requires ARGONAUTE (Fagard et al. in the Arabidopsis genome, whereas each of the other 11 2000), the RNA-directed RNA polymerase SDE1/SGS2, sequences correspond to multiple (2-7) loci (Table 1), which may amplify dsRNA used as a trigger for silencing most likely because of duplications in the Arabidopsis (Dalmay et al. 2000; Mourrain et al. 2000), and the RNA genome (The Arabidopsis Genome Initiative 2000). As helicase SDE3 (Dalmay et al. 2001). Some aspects of expected for miRNA loci, nearly all (37 of 40) of the RNA silencing may be species-specific, such as the genomic loci lie outside of annotated segments of the RNA-directed DNA methylation required to maintain genome, and thus do not correspond to previously iden- transgene silencing in plants (Morel et al. 2000; Bender tified genes. The three exceptions are for a single 2001). Although RNA silencing has been proposed to miRNA, miR171. Furthermore, each of these 37 loci have evolved as a viral defense mechanism (Vance and place the cloned RNA sequence in a context where it can Vaucheret 2001), it can clearly be used by organisms for pair with a nearby genomic segment to form a dsRNA the regulation of endogenous genes. The Drosophila Ar- hairpin structure resembling those thought to be re- gonaute family member aubergine is involved in the en- quired for Dicer processing of miRNAs (Fig. 1; Supple- dogenous RNAi-like silencing of Stellate by dsRNA pro- mental data available online at http://www.genesdev. duced from both DNA strands of the Suppressor of Stel- orgl. As with metazoans, the mature miRNA can be pro- late locus (Aravin et al. 2001). It is possible that other cessed from either the 5' or the 3' arm of the fold-back animals or plants also generate endogenous siRNAs for precursor. Nevertheless, each miRNA with multiple gene regulation in development. matches to the genome is always present on the same To further examine the roles of small RNAs in the arm of its potential precursors, suggesting that these loci regulation of plant gene expression, we cloned endog- share a common ancestry (see Supplemental data avail- enous RNAs from Arabidopsis. Here we describe 16 able online at http://www.genesdev.org). We do not plant RNAs that have the defining features of miRNAs. know whether all of these loci are transcriptionally ac- The presence of miRNAs in plants greatly expands the tive or whether some might be pseudogenes. known phylogenetic distribution of this class of tiny The sizes of the predicted Arabidopsis hairpins are noncoding RNAs and indicates that miRNAs arose early more variable than those of animals. For example, Cae- in eukaryotic evolution, before the last common ances- norhabditis elegans miRNAs tend to be cleaved from tor of plants and animals. The presence of miRNAs in precursors -70 nt in length, with the mature miRNA plants also suggests that the developmental defects of located only -2-10 bp from the terminal loop of the carpel factory (caf)l,a mutation in a Dicer homolog (Ja- stem-loop (Lau et al. 2001). Although some of the Ara- cobsen et al. 1999), and mutations in ARGONAUTE bidopsis precursor predictions resemble those of C. el- family proteins (Bohmert et al. 1998; Moussian et al. egans (Fig. 1), others are larger, as seen for the -190-nt 1998) could result from miRNA processing defects. In predicted precursor of miR169 (Fig. 1). fact, we find that the accumulation of plant miRNAs is In other systems, only one of the RNA strands accu- substantially reduced in the caf mutant. The ancient ori- mulates following Dicer processing of miRNAs from the gin of miRNAs, together with the potential link between double-stranded region of the precursor, while the re- miRNAs and development, implies that miRNAs might mainder of the precursor quickly degrades (HutvAgneret have played roles during the origins and evolution of al. 2001). As a result, RNA from only one side of the both plant and animal multicellular life. miRNA precursor is typically cloned or detected on Northern blots, although on rare occasions RNA from the other side of the precursor is identified (Lau et al. Results 2001; Mourelatos et al. 2002), particularly if many clones Identification of Arabidopsis miRNAs are sequenced (E.G. Weinstein and D.P. Bartel, unpubl.). In contrast, Dicer processing of perfectly complementary Using methods designed to clone Dicer cleavage prod- dsRNA molecules in the RNAi pathway is thought to ucts, which are 20-nt to 24-nt RNAs with 5'-phosphate produce two stable overlapping -21-nt RNA molecules and 3'-hydroxyl groups (Bernstein et al. 2001; Elbashir et that pair to each other with -2-nt 3' overhangs (Elbashir al. 2001; Hutvigner et al. 2001; Lau et al. 2001), -200 et al. 2001; Nykiken et al. 2001). As expected, for most tiny RNAs were cloned from Arabidopsis seedlings and (14/16)of the plant miRNAs, we cloned sequences from

GENES& DEVELOPMENT 1617 Table 1. MicroRNAs cloned from Arabidopsis miRNA Fold- Fold- miRNA No. of length Oryza b;ack back gene clones miRNA sequence (nt) matches arm length Chr. Distance to nearest gene MIR156a 16 UGACAGAAGAGAGUGAGCAC 20-2 I 10 5' 82 2 3.2 kb downstream of At2g25100 (s) MIR156b 5' 80 4 0.36 kb upstream of At4g30970(a) MIR156c 5' 83 4 3.2 kb downstream of At4g31875 (s) MIR156d 5' 86 5 2.6 kb upstream of At5g10940 (s) MIR 156e 5' 96 5 1.6 kb downstream of At5gl1980 (s) MIR156f 5' 90 5 1.3 kb downstream of At5g26150 (a) MIR157a 9 UUGACAGAAGAUAGAGAGCAC 20-2 5' 91 1 1.8 kb downstream of Atlg66780 (a) MIR157b 5' 91 1 2.7 kb downstream of Atlg66790 (a) MIR 157c 5' 165 3 2.3 kb downstream of At3g18215 (a) MIR157d 5' 173 1 1.0 kb upstream of Atlg48470 (s) MIR 158 8 UCCCAAAUGUAGACAAAGCA 20 3' 64 3 0.6kb upstream of At3g10750 (s) MIR 159 8 UUUGGAUUGAAGGGAGCUCUA 21 3' 182 1 1.9 kb upstream of Atlg73690 (s) MIR 160a 4 UGCCUGGCUCCCUGUAUGCCA 21 4 5' 78 2 4.0 kb downstream of At2g39180 (a) MIR .160b 5' 80 4 2.4 kb upstream of At4g17790 (a) MIR.160c 5' 81 5 1.5 kb upstream of At5g46850 (a) MIR .161 16 UUGAAAGUGACUACAUCGGGG 20-21 - 5' 90 1 2.6 kb downstream of Atlg48270 (a) MIR.162a 3 UCGAUAAACCUCUGCAUCCAG 21 1 3' 85 5 1.2 kb upstream of At5g08190 (s) MIR .162b 3' 88 5 1.4 kb upstream of At5g23070 (s) MIR.163 24 UUGAAGAGGACUUGGAACUUCGAU 24 3' 303 1 0.6 kb upstream of Atlg66730 (s) MIR.164a 21 UGGAGAAGCAGGGCACGUGCA 21 2 5' 78 2 1.lkb upstream of At2g47590 (s) MIR164b 5' 149 5 2.4 kb upstream of At5g01750 (s) MIR165a 2 UCGGACCAGGCUUCAUCCCCC 20-2 3' 101 1 1.5 kb downstream of AtlgOI 180 (a) MIR165b 3' 136 4 2.8 kb upstream of At4gO0880(s) MIR166a 5 UCGGACCAGGCUUCAUUCCCC 21 6 3' 136 2 4.7 kb upstream of At2g46690 (a) MIR166b 3' 112 3 3.5 kb upstream of At3g61900 (a) MIR166c 3' 108 5 10 kb downstream of At5g08690 (s) MIR166d 3' 101 5 22 kb downstream of At5g08740 (a) MIR166e 3' 135 5 2.6 kb downstream of At5g41910 (a) MIR166f 3' 91 5 1.1 kb downstream of At5g43600 (s) MIR166g 3' 90 5 1.5 kb upstream of At5g63720 (s) MIR167a 19 UGAAGCUGCCAGCAUGAUCUA 21 3 5' 101 3 4.7 kb upstream of At3g22890(a) MIR167b 5' 90 3 0.19 kb downstream of At3g63370 (s) MIR168a 3 UCGCUUGGUGCAGGUCGGGGA 21 5' 104 4 2.3 kb upstream of At4g19390 (a) MIR168b 5' 89 5 0.5 kb downstream of At5g45310 (s) MIR169 3 CAGCCAAGGAUGACUUGCCGA 21 2a 5' 190 3 1.9 kb downstream of At3g13400 (a) MIRI 70 3 UGAUUGAGCCGUGUCAAUAUC 21 3' 64 5 0.5 kb downstream of At5g66040 (s) b MIR171 10 UGAUUGAGCCGCGCCAAUAUC 21 5 3' 92 3 0.5 kb downstream of At3g51380 (a) - - 2 in At2g45160 SCARECROW-like(a) - - 3 in At3g60630 SCARECROW-like(a) - - 4 in At4g00150 SCARECROW-like6 (a) Some miRNAs are represented by clones of different lengths due to heterogeneity of the RNA ends. The sequence of the most abundant clone is shown. Both miR156 and miR161 clones were found with 5' or3' heterogeneity. MIR160band MIR161 each had one clone of the same size but in a register shifted 5' of the sequence shown by 2 and 8 nucleotides, respectively. The number of perfect matches to the available rice genomic sequence (Oryza matches) are indicated, as is the arm of the predicted stem-loop precursor that contains the miRNA (Fold-backarm) and the minimum number of nt that would be required to from a fold-back structure bounded by the miRNA and the segment of the predicted precursor that pairs to the miRNA (Fold-backlength). Oryza fold-backs have the miRNA in the same arm as their Arabidopsis homologs (Supplemental data available online at http://www.genesdev.org). Chromosomal (Chr)positions, distance to the nearest annotated gene, and the position of the miRNA, sense (s)and antisense (a),relative to the nearest gene are noted for all matches in the Arabidopsis genome. aOne of the miR169 Oryza matches is at the end of a contig, precluding prediction of a fold-back precursor structure. bAs with Arabidopsis, only one of the miR171 Orzya matches has a predicted fold-back characteristic of miRNAs.

1618 GENES& DEVELOPMENT microRNAs in plants

GGA AU A-U G-C U-A UG U'G U-A U-A UG U-A U-A UGA U-A 1 U U U Ir44itod A G U U U A U U-A U-A U G U-A A-U U'G A-U U-A U'G AA BQ U-A A-U UU UCG G - C U U U-A U-A U-A G'U A-U U CU U-A UC G U UUCUUc UA C C U C U-A U-A U C G U U u u U-A U G-U U-A U-A A-U U-A G G U U A-8 U-A A-U A G U U C_ C-G A-U C-G A-U C-G G-C C-G G-C U-A G-C G-U U-A U U A-U U-A U-A U-A G G UU-A U U 4ntF A U C G-C U U G-C GGu-G A-U C U U G'U U-A op A A-U A-U A-U G-C U UUACU U A-U A-U u.-GU C-G C-G C-U G-U A G-C G-C A-U U-A U-A A-U A GAUoCB A-U C-G A-U G-C A C C-0 C A-U C-G C-G G'U G- A-U U-A C-0 UU-AC CzU GU A-U C A A-U U G C-G A-U C A-U C_ A-U G-C A-U A C A-U C-G G-C G C C-G U C-G C-G C-G AU G-C A-U A-U A C A-U A C C-G U C-G C-G C-G G C A-U A-U C-G UA G-U C U-A A-U G-C A-U C-G C-G A-U G C U-A G-C C U C-G A-U G-C A-U A-U G-C G-C G-C A-U A-U U-A G C A-U G-C C-G A-U G-C A 3 U-A G-C G-CC G-C AU U-A U-A A-U U-A CGC A-U G-C G C U-G G-C G-UC G-C A-U G C G-C G-C AU A-U G-C A-UG A-U A-U A-U C G-C U-G A-U A-U G-C G-C U G-U C U A-U U A-U A-U G-C G-C C C-G G-C A-U G-C C G A-U A-U C G-U AU A-U C-G A-U G-C A-U A U G C G-C A-UA GCC A-U G-C A U-A A-U A-U UA G-C G-C C-G G-C U U-A A C C A U-A A-U C-G a C A-U A-U G-C A-U A-U C-G G-C C-G U-A U U-A AG-C G-C A-U U-A G-C U-A G- A-U G-C C-G A-U G-C U-G G-C o-C G-C G-U U-A A-U U-A A-U G-C U U C-G A-U C-G G-C A-U A-U C C 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' c d e f MIR169 MIR170 MIR156

Figure 1. Fold-back secondary structures of Arabidopsis miRNA predicted precursors as determined by the RNAfold program. The miRNA sequences are in red. For miR156 and miR169, RNAs from the other side of the fold-back (boxed in blue) were each cloned once. The duplexes that could form between these RNAs and the miRNA from the other strand have -2-nt 3' overhangs characteristic of Dicer cleavage (Elbashir et al. 2001). only one arm of the fold-back precursor. For two loci, we similar to that seen for metazoan miRNAs (E.G. Wein- also cloned a single 21-nt sequence from the other arm of stein and D.P. Bartel, unpubl.). The isolation of these the fold-back (Fig. 1). The disparity in cloning frequency two sequences generated from the opposite arm of the between the two sides, 16:1 in the case of MIR156, was predicted fold-back supports the existence of these stem-

GENES & DEVELOPMENT 1619 Reinhart et al. loops as miRNA precursors. Furthermore, the duplexes have differently processed precursors or tissue-specific that could be formed between the sequences isolated differences in the Arabidopsis miRNA processing ma- from both sides of the stems have 2-nt 3' overhangs (Fig. chinery. We have not been able to reliably detect expres- 1), suggesting that they are products of a Dicer-like ac- sion of RNAs in the size range of 60-200 nt that might tivity similar to that which processes the metazoan correspond to the stem-loop precursors cleaved by Dicer. miRNAs (E.G. Weinstein and D.P. Bartel, unpubl.).

Arabidopsis miRNAs are produced The Arabidopsis miRNAs display developmental by CARPEL FACTORY expression differences Although the presence of precursors in Arabidopsis was Northern analysis confirmed that the 16 miRNAs were not detected on Northern blots, the potential for their stably expressed as -21-nt RNAs (Fig. 2). All are ex- production prompted us to investigate whether the -21- pressed at some level in seedlings, leaves, stems, flowers, nt miRNAs might be processed from a longer dsRNA by and siliques (seed pods). Whereas miR163 accumulates proteins homologous to those that generate metazoan in all tissues, with only slightly lower levels in seedlings miRNAs. Dicer is thought to cleave the double-stranded and siliques, other miRNAs have quite variable levels region of the miRNA precursors in Drosophila, C. el- among the tissues tested. For example, miR157 is most egans, and humans (Grishok et al. 2001; Hutvigner et al. highly expressed in seedlings, and miR171 is most highly 2001; Ketting et al. 2001; Lee and Ambros 2001). Muta- expressed in flowers, suggesting that they might play tions have been isolated in only one of the four Dicer roles in the development of these stages/organs. The size homologs in Arabidopsis, CARPEL FACTORY (CAF; of the RNAs detected approximately matches those that also named SHORT INTEGUMENT [SIN1]; GenBank were cloned. In some cases, RNAs of two sizes can be accession no. AAG38019). The pleiotropic phenotypes detected, reflecting the heterogeneity of the cloned se- associated with loss of CAF/SIN1 function, such as floral quences (Table 1). For example, a probe to miR156 de- meristem proliferation defects, floral organ morphogen- tects both 20-nt and 21-nt RNAs, and the miR156 clones esis defects, and altered ovule development, emphasize were of both sizes. In another case, miR167, a 21-nt RNA the critical developmental role of RNAs processed by accumulates in all tissues except stem, where a 22-nt CAF (Robinson-Beers et al. 1992; Ray et al. 1996a,b; Ja- RNA accumulates instead. This might reflect either dif- cobsen et al. 1999). Northern analysis showed that the ferential transcription of the two MIR167 genes that expression level of the three miRNAs tested is signifi-

M Se L St F Si M Se L St F Si

24- miR158 21- miRll Figure 2. Developmental expression of Ara- 18- bidopsis miRNAs. Total RNA from Columbia seedlings (Se), leaves (L), stems (St), flowers (F), and siliques (Si) was analyzed on Northern blots by hybridization to end-labeled DNA oli- 46,' I-, miR157 gonucleotide probes complementary to the 24- (1) miR163 miRNA. The lengths of end-labeled RNA oli- 21-0 gonucleotides run as a size marker (M) are noted to the left of each panel. Although 18- ft 24- miR165 and miR166 sequences and miR170 21- miR169 and miR171 sequences are too closely related 18- 2 to be reliably distinguished by hybridization miR156 probes, miR156 and miR157 should be specifi- cally recognized (Lau et al. 2001), as reflected 78- U6 in their different levels of expression in seed- oi lings and siliques. miR159 and miR164 show a 24- i similar expression profile to miR165, whereas miR165 miR167 miR160, miR162, and miR168 have similar profiles to miR158 (data not shown). The low 18- expression level of most miRNAs in leaves Sr w and siliques might reflect a difference in the 24- efficiency of small RNA recovery with the miR171 21- mIR170 RNA isolation method used for these two tis- sues (see Materials and Methods). Blots were 18- stripped and reprobed with an oligonucleotide probe complementary to U6 as a loading con- 78*aSu U6 trol.

1620 GENES & DEVELOPMENT

)3K microRNAs in plants

cantly reduced in carpel factory homozygotes (Fig. 3). same arm of the precursor in both species (see Supple- Although the level of miRNA precursors is increased mental data available online at http://www.genesde- when Dicer function is reduced in metazoans (Grishok v.org). The Arabidopsis and Oryza sequences have et al. 2001; Hutvigner et al. 2001; Ketting et al. 2001; Lee drifted considerably in regions outside the miRNA se- and Ambros 2001), we have not detected precursor accu- quence, but selective pressure can be seen in the seg- mulation in caf mutants (Fig. 3; data not shown). ments predicted to base-pair with the miRNAs, resulting in only a few base changes in these segments and a con- served overall propensity for dsRNA formation (Fig. 4). Evolutionary conservation of Arabidopsis miRNAs For each set of related loci, the precursor duplexes extend in Oryza beyond the length of the miRNA, but the sequence of the The evolutionary conservation of miRNA sequences in flanking duplex RNA is variable (see Supplemental data different species indicates that they have important bio- available online at http://www.genesdev.org). This con- logical functions (Pasquinelli et al. 2000; Lagos-Quin- servation in secondary structure accompanied by vari- tana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). ability in sequence provides added evidence that the sec- Eight Arabidopsis miRNAs have sets of identical ondary structural context of these RNAs is important, matches in the genome of the rice Oryza sativa L. ssp. presumably for their processing from stem-loop precur- indica (Table 1), which was estimated to have 92% func- sors. tional coverage at the time of our analysis (Yu et al. 2002). With rare exceptions (noted in Table 1), these sets An miRNA complementary to three related mRNAs of Oryza homologs have adjacent sequences that could form stem-loop precursors analogous to those of Arabi- In nematodes, lin-4 and let-7 RNA recognize their target dopsis, with the miRNA sequence invariably on the mRNAs through limited base-pairing to complementary sites within the 3' UTR of their targets. The largest re- (CAF/CAF) (CAF/caO (caf/ca) gions of uninterrupted complementarity are only -8 nt (Lee et al. 1993; Wightman et al. 1993; Reinhart et al. M L St F L St F L St F 2000; Slack et al. 2000). Consistent with this precedent, the plant miRNA sequences do not perfectly match cod- ing regions, with the exception of miRl71, which has four matches to the genome. One locus is 0.5 kb from the nearest predicted coding region and adjacent to ge- nomic sequence that can form a classical miRNA pre- cursor, consistent with the idea that it is a true miRNA. Further supporting this idea is the observation that a closely related sequence, miR170, was also cloned mul- tiple times and has all the characteristics of the other plant miRNAs. However, the other three MIR171 loci differ from those of the other miRNAs (Table 1). They are anti-sense to the coding region of three SCARE- CROW-like genes of the GRAS family of putative tran- scription factors (DiLaurenzio et al. 1996; Pysh et al. miR169 1999). This is the first example of a convincing miRNA candidate that is also the perfect anti-sense match to a 4W 40 #0 ow NoS~ coding region. Although this miR171 sequence identity might be a coincidence, the targets of this 21-nt RNA miR156 could include these three SCARECROW-like genes. miR171 (and perhaps the related miRNA, miR170) might act like a translational regulator similar to the $** WO ft r! e A*.W lin-4 and let-7 RNAs, or it might pair with these three miR158 genes for a very different type of regulatory interaction. miR171 could direct cleavage of the messages as if it were an siRNA of the RNAi pathway, or it could direct 7Jlil Ueo a nucleic acid modification such as the methylation of genomic DNA seen in PTGS and transcriptional gene silencing of plants. Interestingly, the Figure 3. Expression of miR169 is dependent on CARPEL five perfect FACTORY. Total RNA from wild-type Landsberg erecta (CAF/ matches to miR171 in Oryza also include one miRNA CAF), heterozygous (CAF/caf), and homozygous (caf/caf) carpel homolog and four anti-sense matches to SCARECROW factory leaves (L), stems (St), and flowers (F) was analyzed on a family members. This observation raises the possibility Northern blot. RNA size markers (M) are noted to the left. The that these SCARECROW segments might be conserved blot probed for miR158 was stripped and reprobed with a U6 based on their function as miRNA targets in addition to end-labeled DNA probe as a loading control. their function in coding proteins.

GENES & DEVELOPMENT 1621

21·1 Reinhart et al.

AU A U G-U G-U C-0 G-U AU G U-A G-U A-U G-U U-A C-0 A-U G G U-A cc A-U O 0 u A U-A C-0 G-C G-U G-C G-C C-G Au A-U G-U C U U-6 cc- A-U UC-0 U A C-0 B G C U U U U A G C-0 a-C C-0 U-A U CU U C C-a C-a a c A G-C A A AU C A-U 0-GU-A C-a U A U U C A C-0 C-G U-A U U A U-A U-A G-C G-C C-G C-6 A-U A-UU U U G-C C-B a a C-G U'G U-A C-a C C A-U A-U A-U U-A 0-C AAU C-0 G-C AC A A-U U-A A C C-G A U-A A G-U 0-C U AA AA U-A A-UA AU- AU C U G UG U-A C C C-G U-A U G-C 0 A-UC A-U C-a G-C C- CU-a 0-C U-AAA G - C U U.0 C C U-A A U C AC A 0 U-A U C U C a C-0 A-U U-A U-A C C-0C-G A-U C A C-G C - GAA A-U 0-0 A-U U-A U-A U-A U-C C-G U-A U-A U-A AC C-0 0-G C U A-U U-A A U U-A aGU A-U U-A A C C U C AU A-U U C A-U C-0 C-G C C C-G A-U U-A C-0 U-A U-A c-G G-CU-A A-U A-U U-A A-U G-C G-C A-U A-U uuA -U UC C-G C-0 a-C c-aG A-U A U C-0 G-c A U 0-c U-A U-A U-A C-GA-U U-A A-U A-U U-A C A A-U G-C U-A C A C-0 A-U G-U U-A U-A U-A C-G C-6 U-A U-A U-A A-U A-U G-C U-A C A A-U G c 0-C G-C C-G U-A C-G a-C G-C G-C G-C G-C G-U G-C G-C C U C U G-U G-C G-C G-C U U A A A-U A-U a-C UC A AG'CA A-U A 0-C 0-G A-U G-CA-U C U -C0 CG0 A-U G-C G-C 0-C A-U 0e-cuU•.U O A G A 0-C A-U A-U G-CA-U A-U C A A-U A-U A-U A-U A-U 8-C G-C G-C G-C G-C 0 U A-U A-U G-C AG'CA A-U a-C G-c a-C G-0 G-C U-A U-A a-C G-C U-A C-G C-G U-A U-A G-C U U G-C a-C C-0 U-A U-A G-C U-A C-9 C-0 CU G-C A-U C C a-c U U U C U-A G-C G-C G-C 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' 5' 3' MIR162a MIR162b MIR162 MIR164a MIR164b MIRI64a MIR164b Arabdopsis Arabidopsis Oryza Arabidopsis Arabidopsis Oryza Oryza Figure 4. Conservation between the Arabidopsis and Oryza predicted stem-loop precursors. (A) miR162 homologs. (B) miR164 homologs. Sequence homology is seen within the miRNA (in red), its paired sequences, and a few base pairs adjacent to the miRNA. The remainder of the sequence has drifted considerably, with the main constraint being the formation of a stem-loop structure.

Other endogenous small RNAs quences. Interestingly, both of these sequences match single loci in the same 2.3-kb region of Chromosome 2 The other two RNAs cloned multiple times, Seq C and that is also the source of four other -22-nt RNAs that we Seq F in Figure 5, are not likely to be miRNAs. Expres- cloned once (Fig. 5). These RNAs are unlikely to be sim- sion of Seq F but not Seq C can be detected on Northern ply degradation products of mRNAs. Only two of these blots (data not shown). Nonetheless, neither appears to six sequences correspond to the same DNA strand as the have the potential to form extended pairing with the two predicted protein-coding genes in this 2.3-kb region. adjoining sequence like that seen for the other 16 se- Moreover, one of the single-clone RNAs (Fig. 5, Seq B) is

1622 GENES & DEVELOPMENT microRNAs in plants

At2g39670 At2g39680 A UUCAAUAAAU AAUUGGUUCU A (1) C* B GAACUAGAAA AGACAUUGGA C (1) C E C UCCAAUGUCU UUUCUAGUUC GU (3) I 5 I D AGAGUAAGAU GGAUCUUGAU AA (1) 11I . - II I I E UAUAUCCCAU UUCUACCAUC UG (1) AB 1 kb D F F UCCAAGCGAA UGAUGAUACU U (3)

Figure 5. A cluster of small RNAs derived from Chromosome 2. Arrows represent the two predicted genes in this region, and vertical lines represent the genomic positions of the six cloned RNAs. Sequences of the RNAs are listed, with cloning frequencies in parentheses. a 2-nt-offset reverse-complement of Seq C. A duplex This would be in contrast to metazoan miRNAs, which formed between them would have 1-nt and 2-nt 3' over- often appear to be processed from metastable stem-loop hangs, reminiscent of Dicer cleavage products during precursors that have been preprocessed from a primary RNAi (Elbashir et al. 2001). The high density of 21-nt to transcript (Lau et al. 2001). Although the common role of 22-nt RNAs cloned from this region implicates either Dicer homologs in the production of plant and animal endogenous RNAi or some other, unknown Dicer-medi- miRNAs highlights the similarities between their ated event. mechanisms of production, there might be differences in the structure and production of precursors, cellular com- partmentalization, timing of precursor processing, or types of cofactors involved in processing. Discussion The increasing number of miRNAs being identified We have described 16 plant miRNAs that have the raises the question of what their cellular functions are. characteristic features of metazoan miRNAs. Like the Although some might regulate translation via base-pair- miRNAs of animals, the plant miRNAs are 20-nt to 24- ing to target gene 3' UTRs in a manner similar to regu- nt endogenous RNAs detectable on Northern blots and lation by lin-4 and let-7 RNAs, it is not clear whether all are derived from one arm of an apparent stem-loop pre- will be found to perform similar biochemical functions. cursor through the action of Dicer. As with most of the One hint that miRNAs could perform other types of metazoan miRNAs, most plant miRNAs begin with a U, RNA-mediated gene regulation is our finding that are transcribed from independent genes, and are evolu- miR171 could interact with the coding region of three tionarily conserved. The discovery that the phylogenetic GRAS family transcription factors through perfect distribution of miRNAs extends to plants indicates that complementarity rather than the limited base-pairing miRNAs arose early in eukaryotic evolution and sug- seen between lin-4 and let-7 and the 3' UTRs of their gests that they have been shaping gene expression since targets. If these genes are regulatory targets of miR171, the emergence of multicellular life. Although the evolu- the miRNA could act like other -21-nt regulatory RNAs tion of the RNAi and PTGS pathways and their related and direct mRNA degradation or epigenetic modification proteins has been attributed to defense against viruses of the genomic sequence. and transposons (Ketting and Plasterk 2000; Vance and A role for the miRNAs in development of both plants Vaucheret 2001), the presence of miRNAs in plants sug- and animals is suggested by the phenotypes of Dicer and gests that Dicer and Argonaute proteins also have an- Argonaute family mutants. In C. elegans, developmental cient roles in miRNA processing and function. defects resulting from reduction of function of dcr-1 One difference between plant and animal miRNAs is (Dicer) and alg-llalg-2 (Argonaute-like gene) have been the dsRNA precursor from which the mature miRNAs attributed to the improper processing of miRNA precur- are cleaved. Based on the length of RNA that would be sors and a reduction in mature miRNA expression necessary to allow the miRNA to be incorporated into an (Grishok et al. 2001). The mutant animals essentially RNA duplex suitable for Dicer cleavage, we predict that reiterate stem-cell-like divisions and delay the switch to plant miRNA precursors can be more than three times as a later-stage developmental program. An intriguing par- large as those of animals (Table 1). However, we have not allel in Arabidopsis is that mutant alleles of caf/sinl detected plant precursor molecules during our Northern delay the meristem switch from vegetative to floral de- analysis of wild-type or caf RNA. Our method may not velopment (Ray et al. 1996a) and cause overproliferation be sufficiently sensitive to detect very low levels of of the floral meristem (Jacobsen et al. 1999), which sug- precursors. Perhaps precursor transcripts are more rap- gests a distant link between the pathways affected by idly cleaved and turned over in Arabidopsis than in Dicer mutants in plants and animals. Mutations in two metazoans, or plant precursors might be too large or dif- Arabidopsis Argonaute family genes also alter meristem fuse in size for Northern analysis techniques maximized development. The argonaute mutants disrupt axillary for the resolution of the -21-nt mature RNAs. For in- shoot meristem formation and leaf development (Bohm- stance, plant miRNAs might be processed cotranscrip- ert et al. 1998), and ZWILLE/PINHEADis required for tionally, directly from transient primary transcripts. shoot meristem maintenance and floral development

GENES& DEVELOPMENT 1623 Reinhartet al.

(Moussian et al. 1998; Lynn et al. 1999). The existence of amide gel, electroblotted to a nylon membrane, and hybridized miRNAs in plants suggests that aberrant processing of to end-labeled anti-sense DNA probes (Lee et al. 1993). miRNAs could be responsible for some if not all of the developmental defects in caf mutants, and it is possible Sequence analysis that the same will be true for argonaute or zwille/pin- Sequences of RNA clones were compared with the Arabidopsis head mutants. However, ARGONAUTE is also required genome downloaded from ftp://ncbi.nlm.nih.gov/genbank/ for PTGS (Fagard et al. 2000; Morel et al. 2002), and a genomes/A_thaliana/ (13-Aug-2001).Predicted secondary struc- related protein is required for RNAi in animals (Tabara tures were generated using the Zucker folding algorithm and et al. 1999; Hammond et al. 2001; Williams and Rubin manually inspected for fold-backs with the RNA sequence in 2002). In fact, the Drosophila Argonaute family member the stem as is characteristic of metazoan miRNAs (Lau et al. aubergine, a gene required for oogenesis (Schupbach and 2001). To identify Oryza sativa homologs, the miRNAs were Wieschaus 1991), is involved in the endogenous RNAi- compared with the rice genome sequence downloaded from the like silencing of Stellate by dsRNA produced from both Beijing Genomics Institute Web site at http://btn.genomics. DNA strands of the Suppressor of Stellate locus (Aravin org.cn/rice (first draft) using the BLAST algorithm, and the ad- joining sequences were analyzed for fold-back secondary struc- et al. 2001), raising the possibility that the Arabidopsis tures as described above. argonaute or caf phenotypes reflect the role of these pro- teins in the production of endogenous siRNAs that con- trol gene expression. Further investigation of the roles of Acknowledgments small RNAs such as those from the Chromosome 2 clus- We thank the Arabidopsis Biological Resource Center at Ohio ter (Fig. 5) will address this possibility. State University for seeds segregatingfor the caf mutation, Nel- Finally, we suspect that other classes of Dicer- and son Lau for reagents and advice in the cloning of endogenous Argonaute-dependent small RNAs are present in Arabi- Dicer products, Lee Lim for advice on bioinformatics, and Phil dopsis. Noncoding RNAs continue to be discovered in a Zamore and members of the Bartel laboratories for comments wide range of organisms, and the roles they play in the on the manuscript. This research was supported in part by the cell are only beginning to be understood (Eddy 2001). In Robert A. Welch Foundation (C-1309). many ways, the most interesting possibility is that no The publication costs of this article were defrayed in part by one class of RNAs can be responsible for the phenotypes payment of page charges. This article must therefore be hereby of Dicer and Argonaute family mutations because organ- marked "advertisement" in accordance with 18 USC section isms use such a rich variety of RNA-mediated gene regu- 1734 solely to indicate this fact. lation in their development. References

Materials and methods The Arabidopsis Genome Initiative. 2000. Analysis of the ge- nome sequence of the flowering plant Arabidopsis thaliana. Plant growth and RNA isolation Nature 408: 796-815. Aravin, A.A., Naumova, N.M., Tulin, A.A., Rozovsky, Y.M., Total RNA from wild-type Arabidopsis thaliana (Columbia ac- and Gvozdev, V.A. 2001. Double-stranded RNA-mediated si- cession) was isolated from 6-day-old seedlings grown on agar- lencing of genomic tandem repeats and transposable ele- based medium overlaid with filter paper and from flowers and ments in Drosophila melanogaster germline. Curr. Biol. stems of 4-week-old plants grown in soil using Trizol (GIBCO 11: 1017-1027. BRL. Total RNA was prepared from leaves and siliques using a Bender, J. 2001. A vicious cycle: RNA silencing and DNA meth- modification of the method described in Nagy et al. (1988),in ylation in plants. Cell 106: 129-132. which the LiCl precipitation was replaced by ethanol precipita- Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. tion. For isolation of RNA from carpel factory plants, progeny of 2001. Role for a bidentate ribonuclease in the initiation step CAF/caf heterozygous plants (in the Landsberg erecta acces- of RNA interference. Nature 409: 295-296. sion! were grown on medium supplemented with 12 g/mL Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., kanamycin for 8 d, after which kanamycin-resistant individuals and Benning, C. 1998. AGO1 defines a novel locus of Ara- were transferred to soil and grown for an additional 24 d under bidopsis controlling leaf development. EMBO . 17: 170- continuous illumination. Plants were then scored as having 180. (caf/cafl or lacking (CAF/cafl the carpel factory phenotype (Ja- Cerutti, L., Mian, N., and Bateman, A. 2000. Domains in gene cobsen et al. 1999), and RNA was prepared from leaves, stems, silencing and cell differentiation proteins: The novel PAZ and flowers using a modification of the Nagay et al. (1988) domain and redefinition of the Piwi domain. Trends Bio- method (see above). Wild-type plants (Landsberg erecta acces- chem. Sci. 25: 481-482. sion) were processed similarly, except that seeds were originally Dalmay, T., Hamilton, A., Rudd, S., Angell, S., and Baulcombe, sown on medium lacking kanamycin. D.C. 2000. An RNA-dependent RNA polymerase in Arabi- dopsis is required for posttranscriptional gene silencing me- RNA analysis diated by a transgene but not by a virus. Cell 101: 543-553. Dalmay, T., Horsefield, R., Braunstein, T.H., and Baulcombe, Endogenous 18-nt to 26-nt RNAs from seedlings and flowers D.C. 2001. SDE3 encodes an RNA helicase required for post- were isolated from total RNA by 15% PAGE and cloned as transcriptional gene silencing in Arabidopsis. EMBO J. described (Lau et al. 2001). The laboratory protocol is available 20: 2069-2078. at http://web.wi.mit.edu/bartel/pub/. For Northern analysis, 20 DiLaurenzio, L., Wysocka-Diller, J., Malamy, J.E., Pysh, L., He- ug of total RNA per lane was separated on a 15% polyacryl- lariutta, Y., Freshour, G., Hahn, M.G., Feldmann, K.A., and

1624 GENES& DEVELOPMENT microRNAs in plants

Benfey, P.N. 1996. The SCARECROW gene regulates an Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans asymmetric cell division that is essential for generating the heterochronic gene lin-4 encodes small RNAs with anti- radial organization of the Arabidopsis root. Cell 86: 423- sense complementarity to lin-14. Cell 75: 843-854. 433. Lynn, K., Fernandez, A., Aida, M., Sedbrook, J., Tasaka, M., Eddy, S.R. 2001. Non-coding RNA genes and the modem RNA Masson, P., and Barton, M.K. 1999. The PINHEAD/ZWILLE world. Nat. Rev. Genet. 2: 919-929. gene acts pleiotropically in Arabidopsis development and Elbashir, S.M., Leneckel, W., and Tuschl, T. 2001. RNA inter- has overlapping functions with the ARGONAUTE1 gene. ference is mediated by 21- and 22-nucleotide RNAs. Genes Development 126: 469-481. & Dev. 15: 188-200. Matzke, M.A., Matzke, A.J., Pruss, G.J., and Vance, V.B. 2001. Fagard, M., Boutet, S., Morel, J.-B., Bellini, C., and Vaucheret, H. RNA-basedsilencing strategies in plants. Curr. Opin. Genet. 2000. AGO1, QDE-2, and RDE-1 are related proteins re- Dev. 11: 221-227. quired for post-transcriptional gene silencing in plants, Morel, J., Mourrain, P., B&clin, C., and Vaucheret, H. 2000. DNA quelling in fungi, and RNA interference in animals. Proc. methylation and chromatin structure affect transcriptional Natl. Acad. Sci. 97: 11650-11654. and posttranscriptional transgene silencing in Arabidopsis. Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., Curr. Biol. 10: 1591-1594. and Mello, C.C. 1998. Potent and specific genetic interfer- Morel, J.B., Godon, C., Mourrain, P., Bclin, C., Boutet, S., ence by double-stranded RNA in Caenorhabditis elegans. Feuerbach, F., Proux, F., and Vaucheret, H. 2002. Fertile hy- Nature 391: 806-811. pomorphic ARGONAUTE (agol) mutants impaired in post- Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, transcriptional gene silencing and virus resistance. Plant I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. 2001. Cell 14: 629-639. Genes and mechanisms related to RNA interference regulate Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, expression of the small temporal RNAs that control C. el- B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. 2002. egans developmental timing. Cell 106: 23-34. miRNPs: A novel class of ribonucleoproteins containing nu- Hamilton, A.J. and Baulcombe, D.C. 1999. A novel species of merous microRNAs. Genes & Dev. 16: 720-728. small antisense RNA in posttranscriptional gene silencing. Mourrain, P., Beclin, C., Elmayan, T., Feuerbach, F., Godon, C., Science 286: 950-952. Morel, J.B., Jouette, D., Lacombe, A.M., Nikic, S., Picault, Hammond, S.C., Bernstein, E., Beach, D., and Hannon, G.J. N., et al. 2000. Arabidopsis SGS2 and SGS3 genes are re- 2000. An RNA-directed nuclease mediates posttranscrip- quired for posttranscriptional gene silencing and natural vi- tional gene silencing in Drosophila cells. Nature 404: 293- rus resistance. Cell 101: 533-542. 296. Moussian, B., Schoof, H., Haecker, A., Jurgens, G., and Laux, T. Hammond, S.M., Boettcher, S., Caudy, A.A., Kobayashi, R., and 1998. Role of the ZWILLE gene in the regulation of central Hannon, G.J. 2001. Argonaute2, a link between genetic and shoot meristem cell fate during Arabidopsis embryogenesis. biochemical analyses of RNAi. Science 293: 1146-1150. EMBO . 17: 1799-1809. HutvAgner, G. and Zamore, P.D. 2002. RNAi: Nature abhors a Nagy, F., Kay, S.A., and Chua, N.-H. 1988. Analysis of gene double-strand. Curr. Opin. Genet. Dev. 12: 225-232. expression in transgenic plants. In Plant molecular biology HutvAgner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., manual ed. S.B. Gelvin and R.A. Schilperoort), Part B4, pp. Tuschl, T., and Zamore, P.D. 2001. A cellular function for 1-29. Kluwer, Dordrect. the RNA-interference enzyme Dicer in the maturation of the Nykaken, A., Haley, B., and Zamore, P.D. 2001. ATP require- let-7 small temporal RNA. Science 293: 834-838. ments and small interfering RNA structure in the RNA in- Jacobsen, S.E., Running, M.P., and Meyerowitz, E.M. 1999. Dis- terference pathway. Cell 107: 309-321. ruption of an RNA helicase/RNAseIII gene in Arabidopsis Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., causes unregulated cell division in floral meristems. Devel- Kuroda, M., Maller, B., Srinivasan, A., Fishman, M., Hay- opment 126: 5231-5243. ward, D., Ball, E., et al. 2000. Conservation across animal Ketting, R.F. and Plasterk, R.H.A. 2000. A genetic link between phylogeny of the sequence and temporal regulation of the 21 co-suppression and RNA interference in C. elegans. Nature nucleotide let-7 heterochronic regulatory RNA. Nature 404: 296-298. 408: 86-89. Ketting, R.F., Fischer, S.E.J., Bernstein, E., Sijen, T., Hannon, Pysh, L.D., Wysocka-Diller, J.W., Camilleri, C., Bouchez, D., G.J., and Plasterk, R.H.A. 2001. Dicer functions in RNA in- and Benfey, P.N. 1999. The GRAS gene family in Arabidop- terference and in synthesis of small RNA involved in devel- sis: Sequence characterization and basic expression analysis opmental timing in C. elegans. Genes & Dev. 15: 2654- of the SCARECROW-LIKE genes. Plant 1. 18:111-119. 2659. Ray, A., Lang, J.D., Golden, T., and Ray, S. 1996a. SHORT IN- Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. TEGUMENT (SIN1), a gene required for ovule development 2001. Identification of novel genes coding for small ex- in Arabidopsis, also controls flowering time. Development pressed RNAs. Science 294: 853-858. 122: 2631-2638. Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Len- Ray, S., Golden, T., and Ray, A. 1996b. Maternal effects of the deckel, W., and Tuschl, T. 2002. Identification of tissue-spe- short integument mutation on embryo development. Dev. cific microRNAs from mouse. Curr. Biol. 12: 735-739. Biol. 180: 365-369. Lai, E.C. 2002. MicroRNAs are complementary to 3' UTR mo- Reinhart, B.J., Slack, F.J., Basson, M., Bettinger, J.C., Pasquinelli, tifs that mediate negative post-transcriptional regulation. A.E., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. 2000. Nat. Genet. 30: 363-364. The 21 nucleotide let-7 RNA regulates developmental tim- Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An ing in Caenorhabditis elegans. Nature 403: 901-906. abundant class of tiny RNAs with probable regulatory roles Robinson-Beers, K., Pruitt, R.E., and Gasser, C.S. 1992. Ovule in Caenorhabditis elegans. Science 294: 858-862. development in wild-type Arabidopsis and two female-ster- Lee, R.C. and Ambros, V. 2001. An extensive class of small ile mutants. Plant Cell 4: 1237-1249. RNAs in Caenorhabditis elegans. Science 294: 862-864. Schupbach, T. and Wieschaus, E. 1991. Female sterile muta-

GENES& DEVELOPMENT 1625 Reinhart et al.

tions on the second chromosome of Drosophila melanogas- ter II. Mutations blocking oogenesis or altering egg morphol- ogy. Genetics 129: 1119-1136. Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. 2000. The lin-41 RBCC gene acts in the C. el- egans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol. Cell 5: 659- 669. Tabara, H., Sarkissian, M., Kelly, W.G., Fleenor, J., Grishok, A., Timmons, L., Fire, A., and Mello, C.C. 1999. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99: 123-132. Vance, V. and Vaucheret, H. 2001. RNA silencing in plants- Defense and counterdefense. Science 292: 2277-2280. Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855-862. Williams, R.W. and Rubin, G.M. 2002. ARGONAUTE1 is re- quired for efficient RNA interference in Drosophila em- bryos. Proc. Natl. Acad. Sci. 99: 6889-6894. Yu, ., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79- 92. Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. 2000. RNAi: Double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101: 25-33.

1626 GENES& DEVELOPMENT Cell, Vol. 115,787-798, December 26, 2003,Copyright ©2003 by Cell Press Prediction of Mammalian MicroRNA Targets

Benjamin P. Lewis,', 4 I-hung Shih,2,4 that they could have many more regulatory functions Matthew W. Jones-Rhoades, 1' 2 David P. Bartel,' 2.* than those uncovered to date (Lagos-Quintana et al., and Christopher B. Burge'l* 2001; Lau et al., 2001; Lee and Ambros, 2001; Lai et al., 'Department of Biology 2003; Um et al., 2003a, 2003b). The regulatory roles of Massachusetts Institute of Technology the vertebrate miRNAs in particular remain unknown. Cambridge, Massachusetts 02139 The possibility that many mammalian miRNAs play im- 2 Whitehead Institute for Biomedical Research portant roles during development and other processes 9 Cambridge Center is supported by their tissue-specific or developmental Cambridge, Massachusetts 02142 stage-specific expression pattems as well as their evo- lutionary conservation,which is very strong within mam- mals and often extends to invertebrate homologs (Pas- Summary quinelli et al., 2000; Aravin et al., 2001; Lagos-Quintana et al., 2001, 2002, 2003; Lau et al., 2001; Lee and Ambros, MicroRNAs (miRNAs) can play important gene regu- 2001; Ambros et al., 2003b; Dostie et al., 2003; Houbaviy latory roles in nematodes, insects, and plants by base- et al., 2003; Krichevsky et al., 2003; Lai et al., 2003; Lim pairing to mRNAs to specify posttranscriptional re- et al., 2003a, 2003b; Moss and Tang, 2003). Indeed, pression of these messages. However, the mRNAs miR-181, one of the many miRNAs conserved among regulated by vertebrate miRNAs are all unknown. Here vertebrates, is preferentially expressed in the B lympho- we predict more than 400 regulatory target genes for cytes of mouse bone marrow, and the ectopic expres- the conserved vertebrate miRNAs by identifying mRNAs sion of this miRNA in hematopoietic stem/progenitor with conserved pairing to the 5' region of the miRNA cells modulates blood cell development such that the and evaluating the number and quality of these com- proportion of B lymphocytes increases (Chen et al., plementary sites. Rigoroustests using shuffled miRNA 2003). However, regulatory targets have not been estab- controls supported a majority of these predictions, lished or even confidently predicted for any of the verte- with the fraction of false positives estimated at 31% brate miRNAs, which has slowed progress toward un- for targets identified in human, mouse, and rat and derstanding the functions of these tiny noncoding RNAs 22% for targets identified in pufferfish as well as mam- in humans and other vertebrates. mals. Eleven predicted targets (out of 15 tested) were Finding regulatory targets is much easier for the plant supported experimentally using a HeLa cell reporter miRNAs. In a systematic search for the targets of 13 system. The predicted regulatory targets of mamma- Arabidopsis miRNA families, 49 unique targets were lian miRNAs were enriched for genes involved in tran- found with a signal-to-noise ratio exceeding10:1, simply scriptional regulation but also encompassed an unex- by looking for Arabidopsis messageswith near-perfect pectedly broad range of other functions. complementarity to the miRNAs (Rhoades et al., 2002). Confidence in many of these predictions was bolstered by the observation that the complementarity Introduction is con- served among rice orthologs of the miRNAs and mes- sages (Rhoades et al., 2002), and many of the 49 have MicroRNAs are endogenous "22 nt RNAs that can play since been confirmed experimentally (Uave et al., 2002; important gene regulatory roles by pairing to the mes- Emery et al., 2003; Kasschau et al., 2003; Tang et al., sages of protein-coding genes to specify mRNA cleav- 2003). These predicted targets were greatly enriched in age or repression of productive translation (Lai, 2003; transcription factors involved in developmental pat- Bartel, 2004). The first to be discovered were the lin-4 teming or stem cell maintenance and identity, sug- and let-7 miRNAs, which are components of the gene gesting that many plant miRNAs function during cellular regulatorynetwork that controls the timing of C. elegans differentiation to clear regulatory gene transcripts from larval development (Lee et al., 1993; Wightman et al., daughter cell lineages, perhaps enabling more rapid dif- 1993; Moss et al., 1997; Reinhart et al., 2000; Abrahante ferentiation without having to depend on regulatory et al., 2003; Lin et al., 2003). More recently discovered genes having constitutively unstable messages (Rhoades miRNA functions include the control of cell proliferation, et al., 2002). An analogous search for near-perfect pair- cell death, and fat metabolism in flies (Brennecke et al., ing between the miRNAs and messages of C. elegans 2003; Xu et al., 2003) and the control of leaf and flower and Drosophila genes did not uncover more hits than development in plants (Aukerman and Sakai, 2003; would be expected by chance (Rhoadeset al., 2002). Chen, 2003; Emery et al., 2003; Palatnik et al., 2003). More sophisticated methods for predicting targets of MicroRNA genes are one of the more abundant insect miRNAs have recently been published (Stark et classes of regulatory genes in animals, estimated to al., 2003) or submitted (Enright et al. http://genomebiology. comprise between 0.5 and 1 percent of the predicted com/2003/4/11/P8). The method of Stark et al. (2003) genes in worms, flies, and humans, raising the prospect provides lists of candidate target genes that when used in combination with additional biological criteria, includ- *Correspondence:dbarteltwi.mit.edu (D.P.B.), cburgemit.edu ing functional relationshipsshared among predicted tar- (C.B.B.) gets of individual miRNAs,led to validation of six targets 4 Theseauthors contributed equally to this work. for two Drosophila miRNAs (Stark et al., 2003). The cur- rent Drosophila analyses do not include estimates of A false positive rates, leaving open the question of the SMAD-1 5' UGCCU---CUGGAAAACUAUUGAGCCUUGCAUGUACUUGAAG accuracy of these methods in cases where predicted 1111 IIII1 i iiil targets of a miRNA do not have clear functional relat- miR-26a UCGGAUAGGACCUA------GAA(IU edness. -21 8 kcallmol In the present study, we describe an approach SMAD-1 5' GAGCCUU----- GAUAAUACUUGAC that 11111 III iiliii predicts hundreds of mammalian miRNA targets and miR-26a UCGGAUAGGACCUA--AUGAACUU 5' provide computational and experimental evidence that -17.0 kcalmnol most are authentic, allowing us to begin to explore fun- -dGye-dG2jT 21. 620 17.W020 damental questions about miRNA:target relationships Z=e + e = e + e = 5.3 inanimals. Pairing to the 5' portion of the miRNA, partic- ularly nucleotides 2-8, appears to be most important for target recognition by vertebrate miRNAs. As seen previously for plant miRNAs, the predicted regulatory B targets of mammalian miRNAs are enriched for genes involved in transcriptional regulation. In addition, the predicted mammalian regulatory targets encompass an unexpectedly broad range of other functions. Indeed, several lines of evidence imply that the targets identified in this initial analysis are only a fraction of the total, supporting the possibility that miRNAs regulate the ex- pression of a large portion of the mammalian tran- scriptome.

Results and Discussion

An Algorithm for Predicting Vertebrate MicroRNA Targets To identify the targets of vertebrate miRNAs, we devel- oped an algorithm called TargetScan (the TargetScan software is available for download at http://genes.mit. edu/targetscan), which combines thermodynamics-based modeling of RNA:RNA duplex interactions with compar- ative sequence analysis to predict miRNA targets con- served across multiple genomes (Figure 1). Given an miRNA that is conserved in multiple organisms and a set of orthologous 3' UTR sequences from these organisms, TargetScan (1)searches the UTRs in the first organism for segments of perfect Watson-Crick complementarity Z7 Rank to bases 2-8 of the miRNA 0 100 200 300 400 500 600 (numbered from the 5' . 5.3 45 end)-we refer to this 7 nt segment of the miRNA as the mn -- "miRNA . 4.8 72 seed" and UTR heptamers with perfect Watson- - 4.9 76 Crick complementarity to the seed as "seed matches"; ., 5.2 16 (2)extends each seed match with additional base pairs to the miRNA as far as possible in each direction, Figure 1. Prediction of miRNA Targets allowing G:U pairs, but stopping at mismatches; (3)opti- (A)Structures, energies, and scoring for predicted RNA duplexes mizes basepairing of the remaining 3' portion of the involving human miR-26a and two target sites in the 3' UTR of the miRNA to the 35 bases of the UTR immediately 5' of human SMAD-1 gene, with seeds and seed matches in red and each seed match using the RNAfold program seed extension in blue. (Hofacker (B) Schematic for identification et al., 1994), thus extending each seed match to of targets conserved across mam- a longer mals (upper) and targets conserved in mammals and fish (lower). "target site"; (4)assigns a folding free energy G to each The number of genes from each organism with identified orthologs such miRNA:target site interaction (ignoring initiation in every other organism is indicated. free energy) using RNAeval (Hofacker et al., 1994); (5) (C)Positions of two target sites for miR-26a (blue) in orthologous assigns a Z score to each UTR, defined as: Z = SMAD-1 3' UTR sequences from human (Hs), mouse (Mm), rat (Rn), and - G Fugu (Fr), with the Z score and rank of each miRNA:UTR pair, I, e i, where n is the number of seed matches inthe with T = 20. k=l UTR, Gk is the free energy of the miRNA:target site inter- action (kcal/mol) for the k0 target site evaluated inthe as targets those genes for which both Zi - Zc and Ri - previous step, and T is a parameter described below Rc for an orthologous UTR sequence in each organism, (UTRs that have no seed match are assigned a Z score where Zc and Rc are pre-chosen Z score and rank of 1.0); (6) sorts the UTRs in this organism by Z score cutoffs. and assigns a rank Ri to each; (7)repeats this process The only free parameters in this protocol are Rc and for the set of UTRs from each organism; and (8)predicts Zc, and the T parameter inthe formula relating predicted MammalianmicroRNA Targets 789

free energy to Z score. The value of the T parameter 17166, decreased to 14539 ortholog sets in human- influences the relative weighting of UTRs with fewer mouse-rat and 10276 ortholog sets in human-mouse- high-affinity target sites to those with larger numbers of rat-Fugu. In addition, some miRNA:target interactions low-affinity target sites, and in this sense is analogous might not be conserved between mammals and fish. to temperature. However, there is no thermodynamic Another likely factor is that some features used by Tar- meaning to the T parameter or the Z scores used in this getScan to achieve an acceptable signal:noise ratio analysis; they merely provide a convenient means of might not be strictly required for miRNA regulation. For weighting and summing predicted folding free energies. example, although most known invertebrate miRNAtar- Suitable values for Rc, Zc, and T were assigned by opti- get sites have 7 nt Watson-Crick seed matches (or mization over a range of reasonablevalues using sepa- longer matches), some do not, such as lin-41, a target rate training and test sets of miRNAs. of the C. elegans let-7 miRNA (Lee et al., 1993; Wightman TargetScan was initially applied using two sets of et al., 1993; Moss et al., 1997; Reinhart et al., 2000; miRNAs: a nonredundant pan-mammalian set of 79 Abrahante et al., 2003; Brennecke et al., 2003; Lin et al., miRNAs that have homologs in human, mouse, and puf- 2003). Thus, increasing the number of species increases ferfish and identical sequence in human and mouse, the probability that the orthologous UTRof one or more but not necessarilypufferfish, and a nonredundantpan- species harbors functional sites that fail to satisfy the vertebrate set of 55 miRNAs that have identical se- criteria requiredfor TargetScandetection. Nonetheless, quence in human, mouse, and pufferfish (Lagos- in 115 cases involving the UTRs of 107 genes, the pre- Quintana et al., 2001, 2002, 2003; Mourelatos et al., 2002; dicted target sites were sufficiently conserved to be Dostie et al., 2003; Lim et al., 2003a). These sets, referred detected by TargetScan in orthologous UTRs from all to as nrMamm and nrVert, respectively (Supplemental four vertebrates (details of these predictions are given Table S1 at http://www.cell.com/cgi/content/fulV1 15/7/ in Supplemental Table S5 and Figure S1A on the Cell 787/DC1), are nonredundant in that when multiple miRNAs website). had identical seed heptamers, a single representative It is of utmost importance in this type of bioinformatic was chosen. The initial use of miRNAs that were both analysis to ensure that the shuffled control sequences nonredundant and perfectly conserved among the que- preserve all relevant compositional features of the au- ried species simplified the analysis of signal to noise. thentic miRNAs. For example, when compared to the seeds of shuffled cohorts that had not been screened Prediction of 400 Targets of Mammalian to control for the expected number of target sites and MicroRNAs at a Signal:Noise Ratio of 3.2:1 the expected strength of miRNA:targetsite interactions, the seeds of vertebrate miRNAs have To predict mammalian miRNA targets, the nrMamm set approximately 1.4 of miRNAs was searched against orthologous human, times as many seed matches in vertebrate UTRs. Specif- mouse,and rat 3' UTRsderived from the Ensemblclassi- ically, the seeds of vertebrate miRNAs each had an aver- age of about fication of orthologous genes. Using Rc = 200, Zc = 4.5, 2100 perfect-complement matches in masked vertebrate UTR regions whereas random hep- and T = 20, TargetScan identified 451 putative miRNA: tamers with the same base composition target interactions (representing 400 distinct genes), an averaged only about 1500 matches. The high average of 5.7 targets per miRNA (Figure 2A). This num- number of additional matches seen ber of predicted targets (the "signal") was compared to for the miRNA seed (and also for the antisense of the seed) argues strongly the number of targets predicted for cohorts of shuffled against the bio- logical significance of most of these matches. Instead, (i.e., randomly permuted) miRNAs (the "noise'). As de- these excess matches appear to be scribed below, these shuffled sequenceswere carefully the consequence of dinucleotide composition biases shared between ver- screened to ensure that our estimates of noise were tebrate miRNAs and UTRs, which must be controlled for as accurate as possible and not artifactually low. An in order to avoid artificially highestimates of TargetScan average of only 1.8 targets were identified per shuffled signal:noiseratios (particularly in an algorithmthat looks miRNA sequence, for a signal:noise ratio of 3.2:1. This for multiple matches). Therefore, it was important to ratio was higher than the roughly 2:1 ratio observed for ensure that the shuffled miRNA controls matched the targets of the nrMamm miRNA set predicted using only corresponding miRNAs closely in all sequence proper- the human and mouse UTRs (Figure 2A), underscoring ties that impact the expected number and quality of the importance of evolutionaryconservation across mul- TargetScan target sites. The properties we considered tiple genomes in our approach. The signal:noise ratio were (1) the expected frequency of seed matches in the improved to 4.6:1 when conservation was required addi- UTR dataset; (2) the expected frequency of matching to tionally in the fourth and most divergent species, Fugu the 3' end of the miRNA;(3) the observed count of seed rubripes, using the nrVert set of miRNAs (Figure 2A). matches in the UTR dataset; and (4) the predicted free Although the signal:noise ratio improved as more ge- energy of a seed:seed match duplex. A miRNA shuffling nomes were included, the number of predicted targets protocol, MiRshuffle, was developed to generate ran- per miRNA decreased-even though Rc and Zc were domized control sequences that possess all of these relaxed to 350 and 4.5, respectively, and the value T = properties. For a given miRNA sequence, MiRshuffle 10 was used for the four-species analysis (Figure 2A). generates a series of random permutations with the Severalfactors might contribute to this effect, including same length and base composition as the miRNA, until the increased chance that an orthologous gene will be a shuffled sequence is found that matches the parent missing from the annotations of one genome as the miRNA closely in each of the four criteria listed above. number of organisms is increased. For example, the The MiRshuffle procedure calculated expected fre- number of ortholog pairs available in human-mouse, quencies using a first-order Markov model of 3' UTR Cell 790

A 7.0 6.0 12.0 5.0' 4.0' 10.0 i 30 k 2.0

E 8.0 IL1.0 ' 0.0 1..7 2.8 3..9 4.10 5..11 6..12 713 -13-7-12-6-11-5-10..-4 -9-3 -8..-2 -7.-1 6.0 C 5' end Positionof miRNAseed 3' end

X 4.0

2.0 w 105 I ,, 0.0 i human human human 1.7 2..8 39 4.10 5..11 ..12 713 -13..-7-12.-6 -11.- 5 -10..-4-9-3 -8.-2 -7..-1 mouse mouse mouse 5' end Poi 3 end rat rat Positicon of haptamer Fugu

Figure2. PredictedmiRNA Targets Conserved in MultipleGenomes (A) Meannumber of predicted targets per miRNAfor authenticmiRNAs (filled bars)and meanand standarderror of numberof predicted targets per shuffledsequence for four cohortsof randomizedmiRNAs (open bars).Genomes used for identificationof targetsare listed below correspondingbars. The nrMammset of 79 miRNAswas usedfor human/mouseand human/mouse/rat;the nrVert set of 55 miRNAswas usedfor human/mouse/rat/Fugu. (B) Meannumber of targets per miRNAusing the human/mouse/ratUTR set and alternativemiRNA seed positionsfor the nrVert miRNAs (filled bars)and for cohortsof shuffledcontrols (open bars). Positions of seed heptamerare indicatedunder bars; positivenumbers indicate positionrelative to 5' end of miRNA,negative numbers indicate positions relative to 3' endof miRNA.Note that the signal:noisefor the seed at 2..8differs slightlyfrom that of the human/mouse/ratanalysis in (A) becausea different set of miRNAswas used. (C)Conserved heptamers among paralogous human miRNAs. For eachposition, the numberof differentheptamers that are perfectlyconserved across multiplemiRNAs in rMammis shown. composition that accounts for the long-recognized im- 5.7 - 1.8 = 3.9 true targets conserved across mammals pact of dinucleotide frequency biases on the counts of per miRNA (Figure 2A). A number of factors limit the longer oligonucleotides (Nussinov, 1981). As an addi- sensitivity of our method, including (1) the incom- tional control, anothershuffling protocol was developed, pleteness of orthologous gene annotations; (2) the pos- DiMiRshuffie, which preserves the precise dinucleotide sibility that some targets do not meet our stringent seed composition of both the seed and the 3' end of the matching, Z score, or rank criteria; (3) the possibility that miRNA, as well as the seed match count and seed:seed some mammaliantarget sites lie outside the 3' UTR, as match folding free energy. This protocol is less general often observed for plant miRNAs (Rhoadeset al., 2002); than MiRshuffle in that not every oligonucleotide can be (4) the requirement that targets be conserved in the randomizedwhile preservingexact dinucleotide compo- complete set of organisms; and (5) the limitation that sition-e.g., the only heptamer with the same dinucleo- our method does not model the simultaneousinteraction tide composition as the miR-100 seed, ACCCGUA, is of multiple miRNA species with the same UTR. Thus, ACCCGUAitself. Nevertheless,it was possibleto gener- the actual number of target genes regulated by each ate DiMiRShuffled controls for 47 of the 79 nrMamm miRNA is likely to be substantially higher. miRNAs, and a signal:noise ratio of 3.5 was observed using this control in the three-mammal analysis (data not shown), comparable to the value obtained for MiR- The Conserved 5' Region of Mammalian shuffled controls. Because of its wider applicability, MicroRNAs Is Most Important MiRshuffle was used in all reported experiments. for Target Identification In summary, even when the shuffled control se- TargetScantreats the 5' and 3' ends of miRNAs differ- quences were carefully selected to closely match the ently, with perfect basepairing required for the seed at corresponding miRNAs in all sequence properties ex- the 5' end, but no such requirement at the 3' end. The pected to influence the number and quality of target importance of complementarityto the 5' portion of inver- sites, these shuffled controls yielded far fewer targets tebrate miRNAs has been suspected since the observa- than did the authentic miRNA sequences. This differ- tion that complementary sites within the lin-14 mRNA ence results from an increased propensity of vertebrate have "core elements" of complementarityto the 5' seg- UTRs to contain multiple conserved regions of comple- ment of the lin-4 miRNA (Wightman et al., 1993) and mentarity to authentic miRNAs. We conclude that this has been corroborated with the observation that the 5' propensity reflects a functional relationshipbetween the segments of numerous invertebrate miRNAs are per- miRNAs and the identified UTRs-that is, to the extent fectly complementary to 3' UTR elements that mediate that the signal exceeds the noise,these identified UTRs posttranscriptional regulation or are known miRNA tar- are the regulatory targets of the miRNAs. gets (Lai, 2002; Stark et al., 2003).Moreover, the 5' ends Correcting for the estimated rate of false positives, of related miRNAs tend to be better conserved than TargetScan appears to have identified an average of the 3' ends (Lim et al., 2003b), further supporting the MammalianmicroRNA Targets 791

hypothesis that these segments are most critical for the rMamm set was restricted to those miRNAs with mRNA recognition. recognized Fugu homologs. The higher signal seen for To explore this hypothesis, TargetScan was applied the more broadly conserved miRNAscan be explained to predict targets of the nrVert miRNA set conserved by the idea that miRNAswith larger numbers of targets between human, mouse, and rat using versions of the would be under greater selective constraint, and there- algorithm differing in the miRNA heptamer defined as fore less likely to change during the course of evolution. the seed in step 1 (Figure 2B). Consistent with residues Thus, more broadly conserved miRNAs would be likely at the 5' end of miRNAs being most important for target to have more targets and consequently a higher Tar- recognition, the highest signal:noise ratio was observed getScan signal. This observation again supports the when the seed was positioned at or near the extreme conclusion that TargetScan is detecting authentic tar- 5' end of the miRNA, with signal:noise values of 2.7, 3.4, gets because otherwise it would be difficult to explain and 1.6 observed for seeds at segments 1..7, 2..8, and the observed difference in signal:noisefor broadly con- 3..9, respectively, and signal:noise ratios of 1.3 or less served miRNAs relative to that of less broadly con- at other seed positions. We suggest that the critical served miRNAs. importance of pairing to segment 2..8for target identifi- The 854 miRNA:UTR pairs represented UTRs of just cation in silico reflects its importance for target recogni- 442 distinct genes because many genes were hit by tion in vivo and speculate that this segment nucleates multiple miRNAs. In these cases, the miRNAs were usu- pairing between miRNAs and mRNAs. ally, but not always, from the same paralogous miRNA Thoseseed positions that had the highestsignal:noise family, often with the same seed heptamer. In those ratios in the sliding seed analysis (Figure 2B) also had the cases where the same UTR was hit by multiple miRNAs highest degree of heptamer conservation in paralogous from different families (54genes), the target sites gener- human miRNAs (Figure 2C). This observation strength- ally did not overlap, consistent with simultaneous bind- ens the assertion that the signal seen above noise in ing and regulation of some target genes by combina- our analysis reflects a functional relationship between tions of miRNAs.A complete list of the 442target genes the miRNAs and the identified UTRs because otherwise and the corresponding miRNAs is provided (Supplemen- it would be difficult to explain why the most conserved tal Figure SIB and Table S2 on the Cell website). An portions of the miRNA and not other miRNA segments abbreviated list appears as Table 1, where genes were have the greatest propensity to match multiple con- chosen on the basis of high biological interest. Genes served segments in UTRs. involved in transcription, signal transduction, and cell- cell signaling dominate this list, including a number of The Number of Predicted Targets Is Greatest human disease genes such as the tumor suppressor for the Most Highly Conserved MicroRNAs gene PTEN and the protooncogenes E2F-1, N-MYC, The set of target genes predicted using conservationof C-KIT, FLI-1, and LIF. miRNAcomplementarity across the three mammalswas most suitable in size and quality for systematic analysis Experimental Support for 11 Predicted of gene function. To obtain as large a set of targets as Regulatory Targets possible, we searched our set of orthologous mamma- Reporter assays were used to test 15 predicted targets lian 3' UTRs using an expanded set of 121 conserved of mammalian miRNAs in HeLa cells. The 15 targets mammalian miRNAs (rMamm, Supplemental Table Si selected for these experiments all had known biological on Cell website) that includes miRNAs that were ex- functions but resembledthe complete set of predictions cluded from the nrMamm set because they had redun- in other respects, e.g., there was no significant differ- dant seeds,yielding a total of 854 predicted miRNA:UTR ence in the average Z score, rank, or number of target pairs conserved across human, mouse, and rat (Supple- sites per mRNA between the tested targets and the mental Figure SIB). This number of predicted targets complete set of predicted targets. In only one case did (854) represents an 89% increase over the 451 targets the tested targets of a miRNA have obvious functional predicted for the nrMamm miRNAs, even though the relatedness (NOTCH1, a receptor for DELTA1, both pre- number of miRNAs used increased by only 53% from dicted targets of miR-34). Three of the 15 genes, 79 to 121. This discrepancy prompted us to ask whether SMAD-1, BRN-3b, and Notchl, were also in the set of membership in a multi-miRNA gene family influenced predicted targets conserved to Fugu. Eight genes were the abundance of targets. Indeed, we found that the 27 predicted targets of miRNAsthat had been cloned from miRNAs in nrMamm that were members of paralogous HeLa cells (Lagos-Quintana et al., 2001; Mourelatos et miRNA families, i.e., families with variant miRNAs that al., 2002), and three genes were predicted targets of have the same seed, had an average of 8.7 predicted miR-34, which is also expressed in HeLa cells, based targets per miRNA, more than twice the average of 4.2 on Northern analysis (data not shown). For these 11 seen for the remaining 52 nrMamm miRNAs, although genes, a 100 to 1200 nt 3' UTR segment that included the difference in signal:noise between these two sets miRNA target sites was inserted downstream of a firefly was not as pronounced. luciferase ORF, and luciferase activity was compared When initially expanding our list of mammalian to that of an analogous reporter with point substitutions miRNAs, we found that the set of 19 mammalian miRNAs disrupting the target sites (as illustrated for SMAD-1, that were conserved between human and rodents but Figure 3A). Of these 11 UTRs, mutations in eight (SMAD-1, for which a Fugu homolog was not found gave an unac- SDF-1, BRN-3b, ENX-1, N-MYC, PTEN, Deltal, and ceptably low signal:noise ratio of 1.2:1,even though the Notch1, but not HOX-A5,MECP-2, or VAMP-2)signifi- analysis did not extend to the Fugu UTRs.Accordingly, cantly enhanced expression (p < 0.001),as expected if Cell 792

Table1. HighlyCited PredictedTargets of MammalianmiRNAs Category Seed miRNAs EnsemblID GeneName Regulation of AGUGCAA miR-130,-130b 169057 Methyl-CPG-bindingprotein 2 (MECP2) transcription/ GUGCAAA miR-19a 169057 DNA binding AAAGUGC miR-20,-106 101412 Transcriptionfactor E2F1 GAGGUAG let-7(a-g,i),miR-98 100823 DNA-(apurinicor apyrimidinicsite) lyase (APEN) GAAAUGU miR-203 125347 Interferonregulatory factor 1 RF-1). ACAGUAC miR-101 134323 N-MYCprotooncogene protein GAGGUAU miR-202 134323 ... AAUCUCA miR-216 065978 Nucleasesensitive element binding protein1 (YB-1) UAAGGCA miR-124a 163403 Microphtalmia-associatedtranscription factor GCUGGUG miR-138 054598 Forkheadbox protein C1 (FKHL7) AAAGUGC miR-20,-106 103479 Retinoblastoma-likeprotein 2 (RBR-2) UCCAGUU miR-145 151702 Friendleukemia integration 1 transcriptionfactor (FLI-1) GCAGCAU miR-103,-107 137309 High mobilitygroup protein HMG-I/HMG-Y(HMG-I(Y)) GGAAGAC miR-7 136826 Kruppel-likefactor 4 (EZF) Signal UAAGGCA miR-124a 168610 Signaltransducer and act. of transcription3 (STAT3) transduction/ UGGUCCC miR-133,-133b 010610 T cell surfaceglycoprotein CD4 precursor cell-cell UCACAUU miR-23a,-23b 107562 Stromalcell-derived factor 1 precursor(SDF-1) signaling GCUACAU miR-221,-222 157404 Mast/stemcell growthfactor receptorprecursor (C-KIT) GGAAUGU miR-1,-206 176697 Brain-derivedneurotrophic factor precursor(BDNF) UAAGGCA miR-124a 154188 Angiopoietin-1precursor (ANG-1) GGCAGUG miR-34 148400 Notch homologprotein 1 precursor(HN1) CCCUGAG miR-125a,-125b 128342 Leukemiainhibitory factor precursor(LIF) AGUGCAA miR-130,-130b 184371 Macrophagecolony stimulatingfactor-1 precursor(MCSF) UCACAGU miR-27a 184371 AAUACUG miR-200b 008710 Polycystin1 precursor GAAAUGU miR-203 122641 Inhibinbeta A chainprecursor (EDF) AUUGCAC miR-25,-92 065559 Dual spec.mitogen-activated protein kinase kinase 4 GCUGGUG miR-138 070886 Ephrintype-a receptor 8 precursor(HEK3) GUAAACA miR-30(a-e) 156052 Guaninenucleotide-binding protein G(l),alpha-2subunit AUUGCAC miR-25,-92 156052 .. .. GAGAACU miR-146 175104 TNF receptor-associatedfactor 6 (TRAF6) GGCUCAG miR-24 166484 Mitogen-activatedprotein kinase7 (ERK4) GAGAUGA miR-143 166484 AGCUGCC miR-22 166484 GCAGCAU miR-103,-107 141433 Pituitaryadenylate cyclase act. polypeptideprecursor Other GUGCAAA miR-19a,-19b 171862 Phosphatidylinositol-3,4,5-trisphos.3-phosphatase (PTEN) AGUGCAA miR-130,-130b 130164 Low-densitylipoprotein receptor precursor (LDLR) GGAAUGU miR-1,-206 160211 Glucose-6phosphate 1-dehydrogenase (G6PD) UUGGCAC miR-96 101986 Adrenoleukodystrophyprotein (ALDP) AGCACCA miR-29b,-29c 168542 Collagenalpha 1(111)chain precursor AGCACCA miR-29b,-29c 114270 Collagenalpha 1(VII)chain precursor AUUGCAC miR-25,-92 168090 COP9subunit 6 AAGUGCU miR-93 168090 AAAGUGC miR-20,-106 168090 CCCUGAG miR-125a,-125b 160613 Proproteinconvertase subtilisin/kexin type 7 precursor The 442 predictedtargets conservedbetween human, mouse and rat were ranked basedon the numberof referenceslisted in the RefSeq GenBankflatfiles (11/10/03download). The top 37 most referencedpredicted targets are shown,grouped on the basis of GeneOntology annotations.The last six digits of the EnsemblID are shown (ENSG00000#).MicroRNAs with different seedsthat target the same UTRare listed on separatelines.

the endogenous miRNAs in the HeLa cells were speci- Four tested genes (G6PD, BDNF, MCSF, and LDLR) fying the repressionof reporter geneexpression by pair- were predicted targets of miR-1 and miR-130, two miRNAs ing to the predicted target sites (Figure3B). Significantly that had not been cloned from HeLa cells and were enhanced expression was also observedwhen the anal- not detected by Northem analysis. Initially, reporters ogous experiment was performed using either the full- containing UTR segments from these four genes were length C. elegans lin-41 3' UTR or a 124 nt segment of examined for response to transfected miRNAs (Doench the UTR containing the two previously proposed let-7 et al., 2003) (data not shown). Of the four, G6PD, BDNF, miRNA target sites (Reinhart et al., 2000), indicating that and MCSF responded to the transfected miRNAs. To at least some of the repression of lin-41 observed in C. further validate these targets, we used a second assay elegans can be recapitulated by HeLa let-7 miRNA in resembling the one described for targets of miRNAs this heterologousreporter assay (Figure3B). For all eight expressed in HeLa cells, except that it took advantage predicted human targets of endogenous HeLa miRNAs of HeLa cell lines ectopically expressing either human that responded to mutations, the increasein expression miR-1 or human miR-1 30. Mutations in the miRNA target seen when disrupting the pairing to the miRNA seed sites of all three of the genesthat had respondedto trans- was at least as high as that seen for mutations in the fected miRNAs led to significantly increased reporter out- let-7 target sites of lin-41 (Figure 3B). put in the lines expressing the cognate miRNAs, but not Mammalian microRNA Targets 793

A ORF 3'UTR SMAD-1 llý l I Firefly Luc+ gone

WT 5'-f-Gr GCCU--- CUGGAA-l1lt-GUACUUGAA,, l*nt (;AGCCUU- --- GAUAAUACUUGA( S't-3' fl l 11111 11111111 H ill1 II 1111111 UCGGAUAGGACCUA-- AUGAACUU UCGGAUAGGACCUA--AUGAACUU mn-2"

Mutant 5'-hnt UGCCU --- CUGGAA- 18ft-GUUCCUUAA ifnt-GAGCCUU--- GAUAAUUCGUUAC5nt-3* 1111 11111 11111 H ill I IIII I I UCGGAUAGGACCUA ----- AUGAACUU UCGGAUAGGACCUA--AUGAACUU rmwma

19 21

let-7 miRNA

100-

10. 7.9

16 S1.0 . 0.1 09

0.1 - I

+ miR-1 - mtR-1 +miR-130 -mlR-130

Figure 3. Experimental Support for Predicted Targets (A) Schematic of a reporter construct used to evaluate the role of complementarity between miR-26a and the SMAD-1 3' UTR. The wild-type (WT) construct had a 106 nt fragment of the SMAD-1 UTR (green) containing two miR-26a target sites (blue) inserted within the firefly luciferase 3' UTR. The mutant construct was identical to the WT construct except that it had three point substitutions (red) disrupting pairing to each miR-26a seed. (B)Box plots showing the luciferase activity after reporter plasmids were transfected into HeLa cells. Reporters analogous to those depicted for SMAD-1 were constructed for the indicated target genes (Supplemental Figure S2 on Cell website). The UTR fragments often had two target sites to the indicated miRNA, and both were disrupted in the mutant reporters (exceptions were SDF-1, BRN-3b, G6PD, Deltal, Notchl, and BDNF, which each had three target sites, two of which were disrupted, and N-MYC, which had one of its two miR-101 sites disrupted). Firefly luciferase activity was normalized to Renilla luciferase activity of the transfection control plasmid and then normalized to the median activity of the corresponding WT reporter. Each box represents the distribution of activity measured for each WT (blue) and mutant (red) reporter (n = 12-15; ends of the boxes define the 25* and 75" percentiles, a line indicates the median, bars define the 106 and 90* percentiles, and the number indicates the median activity of the mutant reporter). Asterisks (*)denote instances in which differences between the WT and mutant were statistically significant (p < 0.001; Mann-Whitney test). Two pairs of constructs for C. elegans lin-41, a previously known target of let-7, were tested, one with a full-length and the other with a 124 nt UTR segment (f and s, respectively). Except for miR-1 and miR-130, the miRNAs were all endogenously expressed in the HeLa cells. Reporters corresponding to predicted targets of miR-1 and miR-130 (G6PD, BDNF, and MCSF) were each examined in a HeLa cell line stably expressing the relevant miRNA (+ miR-1 or + miR-130) and the parental cell line (- miR-1 or - miR-130). in the parental lines lacking the miRNAs (Figure 3B), as et al., 2003b). For miR-1, Northern analysis with a syn- expected if these genes were authentic targets of the thetic miR-1 standard allowed accurate quantitation, respective miRNAs. The levels of ectopically expressed revealing an average expression of 500 miR-1 molecules miR-1 and miR-130 were comparable to those of endog- per cell. enous miRNAs, as judged by Northern blot analysis (Lim In sum, for 11 of the 15 cases tested, the sites identi- Cell 794

fled by TargetScan influenced expression of an up- target sites supports the assertion that most of the pre- stream ORF when expressed in the same cells as the dicted targets are authentic. For many, the pairing out- corresponding miRNAs. Additional experiments in ani- side the seed was less extensive than that previously mals will be neededto address the particular biological proposed for miRNAtargets (SupplementalFigures SI A consequencesof these regulatory interactions, but the and Si B). Perhaps TargetScan is identifying mRNA ele- evolutionary conservation of the pairings suggests that ments that are necessary but not sufficient for miRNA they are important. All four of the remaininggenes might regulation. Alternatively, these elements might be suffi- not be true targets; our statistical analysis using shuffled cient, in which case their low information content raises controls indicatedthat about 30% of predicted mamma- the possibility that miRNAs modulate the utilization of lian targets are likely to be false positives (Figure 2). a substantial fraction of the mammalian mRNAs. Altematively, some might still be authentic targets In none of the 15 cases tested was there evidence whose regulation was not detected in our assays. Regu- of miRNA-mediated activation of reporter expression; lation would be missed in cases for which cell type- changes either were not statistically significant or were specific factors were required that were not expressed in the direction of miRNA-directed repression. This re- in HeLa cells, or in cases for which additional mRNA sult suggests that mammalian miRNAs are generally elements were required but were not included in the negative regulators of gene expression, as has been UTR segments used in our reporters. observed for the known examples in invertebratesand One limitation of the existing sequence databases plants (Lai, 2003; Bartel, 2004). that complicates the systematic identification of miRNA targets is that UTR annotations are often absent or in- Predicted Mammalian MicroRNA Targets complete. In order to compensatefor this limitation, we Have Diverse Functions had extended each annotated 3' UTR with 2 kb of 3' To assess target gene functions, we evaluated the fre- flanking sequence. Using extended UTRs substantially quency of specific gene ontology (GO) molecular func- increased the number of predicted targets, with signal-to- tion classifications (Gene Ontology Consortium, 2001) noise ratios at least as high as they were for unextended among the predicted targets of the nrMamm miRNAs UTRs,suggesting that extension of the annotated UTRs and their shuffled control sequences(Table 2). Predicted allows detection of many additional authentic target miRNAtargets populated many major GOfunctional cat- genes. One consequence of using this UTR-extension egories, and for each of these categories, the number of protocol is that for some genes,all predicted target sites targets for the real miRNAs greatly exceeded the average will fall outside of annotated UTRs. Manual inspection for the shuffled cohorts. Therefore, despite the presence of the 15 UTR regions tested in our reporter assays of false positives among our predictions, the data in revealed that in all but one of these cases the tested Table 2 strongly indicate that mammalian miRNAs are target sites were contained within regions whose status involved in regulation of target genes with a wide spec- as UTRs was supported by known ESTsand predicted trum of molecular functions. polyadenylation sites, even though some of these re- We also compared the proportion of genes that fell gions are not yet annotated as human UTRs. For the in each of the GO molecular function and GO biological single exception, the Notchl gene, the tested target sites were all located downstream of the annotated 3' process categories for the predicted targets of miRNAs, for targets of shuffled control sequences, and for the UTR of the human gene, and the end of the annotated Notch1 3' UTR was supported by a predicted polyade- initial set of orthologous genes (Table 2 and Supplemen- nylation site and alignment of multiple ESTs. However, tal Table S4 on Cell website).The targets of the shuffled Notchl might have additional 3' UTR isoforms; many cohorts were enriched relativeto the initial set of ortholo- human genes-perhaps as many as 50% or more of the gous genes in certain GO biological process categories genes in the genome-have alternative polyadenylation suchas development(14% versus 8%) andtranscription sites (Iseli et al., 2002).In order to investigatethe poten- (13% versus 9%) (Table S4) and in molecular function tial expression of the tested Notch1 target sites, which categories such as nucleic acid binding (21% versus gave a positive result in our assay for miRNA regulation 14%), DNA binding (15% versus 10%), and transcrip- (Figure 3), an RT-PCR assay was used with polyA- tional regulator activity (10% versus 6%) (Table 2). The selected RNA from a pool of human tissues. Consistent biases seen for the shuffled cohorts are likely to result with the possibility that these sites lie within an altema- primarily from the TargetScan requirement for con- tive UTR isoform of Notchl, an RT-dependent product served segments in the 3' UTRs of predicted targets of the correct size and sequence was observed (data and may reflect differences in the occurrence of 3' UTR not shown).The TargetScanset of predicted mammalian regulatory elements in different classes of genes. target genes (SupplementalTable S1B on the Cell web- In the GO biological process classifications, the pre- site) undoubtedlycontains other examples for which the dicted regulatory targets of authentic miRNA genes target sites all lie outside of the UTR regions supported were enriched in the developmentcategory but no more by available data; some of these will be false positives, than the targets of shuffled controls and were substan- but others might point to the miRNAregulation of alter- tially more enriched for genes involved in transcription native mRNA isoforms. (21% of miRNA targets versus 13% of shuffled targets versus 9% of the initial dataset) and regulation of tran- Human miRNAs Predominantly Are Negative scription (21% versus 12% versus 8%) (Supplemental Regulators of Gene Expression Table S4). In terms of the GO molecularfunction classifi- The finding that a sizable fraction of the tested UTR cations, targets of authentic miRNAs were enriched in segments were sensitive to mutations disrupting their the categories DNA binding (20% versus 15% versus MammalianmicroRNA Targets 795

Table2. MolecularFunction Classification of PredictedmiRNA Targets Meanof All Orthologous GO ID MolecularFunction miRNAs ShuffledCohorts Genes None/unknown 115 (29%) 45 (37%) 5131 (35%) Knownfunction 285 (71%) 77 (63%) 9408 (65%) GO:0005215 Transporteractivity 36 (9%) 14 (12%) 1441 (10%) GO:0005515 Proteinbinding 37 (9%) 11 (9%) 1005 (7%) GO:0016787 Hydrolaseactivity 36 (9%) 12 (9%) 1502 (10%) GO:0016740 Transferaseactivity 39 (10%) 10 (8%) 1104 (8%) GO:0016301 Kinaseactivity 29 (7%) 6 (5%) 624 (4%) GO:0046872 Metal ion binding 27 (7%) 5 (4%) 952 (7%) GO:0003676 Nucleicacid binding 101 (25%) 26 (21%) 2072 (14%) GO:0003677 DNA binding 80 (20%) 18 (15%) 1431 (10%) GO:0030528 Transcriptionreg. act. 56 (14%) 12 (10%) 879 (6%) GO:0000166 Nucleotidebinding 52 (13%) 10 (8%) 1172 (8%) GO:0004871 Signaltransducer act. 55 (14%) 12 (10%) 1959 (13%) GO:0004872 Receptoractivity 29 (7%) 5 (4%) 1351 (9%) The numberand percentageof genesannotated with variousGene Ontology molecular function categories are shownfor targetsof nrMamm miRNAs,targets of shuffledcontrol miRNAs(mean of four cohorts),and for the initial set of orthologoushuman-mouse-rat genes. If GO categorieshave a parent-childrelationship, the child is indented.Because one gene can belongto multipleGO categories,the sum of the percentagesin eachcolumn is not interpretable.

10%), transcription regulatory activity (14% versus 10% the periphery of the regulatory networks, where they versus 6%), and nucleotide binding (13% versus 8% regulate genes with a variety of molecular functions. versus 8%) (Table 2). The differing numbers of predicted The predicted mammaliantargets also differ from the targets in the similar-sounding categories "regulation of plant targets with respect to biological function. Nearly transcription" (GO biological process classification)and all of the transcription factors (TFs)predicted to be plant "transcription regulatory activity" (GO molecular func- miRNA targets have known or implied roles in plant devel- tion classification) prompted us to investigate the gene opment, as do several of the other predicted plant tar- content of these two categories. Inspection of the lists gets (Rhoades et al., 2002). By comparison, only -13% of genes showed that all but two of the predicted target of predicted mammalian miRNA targets were involved genes in the "transcription regulatoryactivity" category in development according to the GO biological process were also included in the larger "regulation of transcrip- categories (Supplemental Table S4). An important ca- tion category," but that the latter category also con- veat to this analysis is that gene annotation and GO tained more than two dozen additional target genes,the categories are still evolving. Nonetheless,our data sug- annotation of which generally supporteda role in control gest that mammalian miRNAs are not exclusively, or of transcription. The GO process category "regulation even primarily, involved in the traditional miRNA role of of transcription" (Supplemental Table S4) therefore ap- developmental control. Instead, we find evidence for pears to provide a more complete listing of known and miRNA regulation of a very broad diversity of biologi- putative transcription factors. cal processes. The proportion of the predicted mammalian miRNA target genes involved in the GO process categories ExperimentalProcedures "transcription" and "regulation of transcription" was sig- nificantly higherthan that seen for either shuffled targets MicroRNADatasets or for the initial gene set (p < 0.001). Nonetheless, this Humanand mouse miRNA sequences that satisfyestablished crite- ria (Ambroset al., 2003a)were downloaded from the Rfamwebsite bias was much lower in magnitude than that seen in (http://www.sanger.ac.uk/Software/Rfam).Human miRNAs that plants: of the 49 targets predicted in a systematic search lackedannotated mouse orthologs and mousemiRNAs that lacked for complementarity to plant miRNAs, 69% were mem- annotatedhuman orthologs were searchedagainst the mouseand bers of transcription factor gene families (Rhoadeset al., humangenomes, respectively, with BLASTN(Altschul et al., 1997) 2002).Examples of other types of predicted mammalian and MiRscan(Lim et al., 2003a,2003b). To identifyFugu homologs, targets include translational regulators (e.g., COP9 sub- the humanmiRNAs were searchedagainst the Fugu genomeusing BLASTNand MiRscan,and the 121 humanmiRNAs with perfectly unit 6, ERF1), regulators of mRNA stability (e.g., HU- homologousmiRNAs in mouseand clear homologousmiRNAs in Antigen D), structural proteins (e.g., collagen), and en- Fugu were assignedto rMamm. For sets of human miRNAsin zymes (e.g., G6PD). The set of predicted miRNA targets rMammwith identicalseed heptamers,a singlerepresentative was conserved across all four vertebrates (SupplementalTa- chosen, yielding 79 human miRNAs(nrMamm). The choice was ble S5 online) was also somewhat biased toward genes basedon conservationto Fuguand C. elegansmiRNAs when possi- involved in transcription, but had annotated functions ble (i.e., the sequencemost broadlyconserved was chosen),but consistent with the broad array of biological activities was otherwiseessentially arbitrary (the miRNA with the lowestmir-# was generallychosen). The subsetof 55 miRNAsfrom nrMammthat seen for the larger mammalian target set. We conclude had perfectconservation to Fugu wereassigned to nrVert. that although mammalian miRNAs are sometimes at the center of gene regulatory networks, where they regulate 3' UTRDatasets genes,such as transcription factors, that regulate other 3' UTR sequencesfor all humangenes, and all mouse,rat, and genes, they are more likely than plant miRNAs to be at Fugugenes associated with a humanortholog, were retrieved using Cell 796

EnsMartversion 15.1 (http://www.ensembl.org/EnsMart). Annotated quence SI,S2 ,SS,S4,SS,S 7,Ss, and the seed designatedas bases 3' UTRsequences were available for only 45%of rat genes in this 2..8,E(SM) was equalto (PsPe, .Psiss.PsS,P s,s,.Pss,).Ps where Moreover,14% of annotated set and for none of the Fugugenes. PSiSA,was the conditionalfrequency of the nucleotideSk+ givenSk rat 3' UTRsequences were less than 50 nucleotidesin length.There- at the previousposition in the set of inversecomplements of the we extendedeach annotated3' UTRwith 2 kb of 3' flanking fore, UTRsin the UTRdatabase. E(TM) was the analogous quantity calcu- Repetitiveelements were maskedin these sequences sequence. latedfor the remainderof the sequence(i.e., for bases9,10, 11, ... to usingRepeatMasker (Smit, A.FA. andGreen, P., http/repeatmasker. the end of the miRNAor shuffledmiRNA). O(SM) was determined genome.washington.edu/cgi-bin/RM2_req.pl)with repeat libraries directly from heptamercounts in the UTRdataset. The predicted for primates,rodents, or vertebrates,as appropriate. folding free energyof a seed:seedmatch duplex was determined usingRNAeval. The DiMirShuffleprogram generated shuffled con- Identification of miRNATarget Sites trols for a givenmiRNA sequence by shufflingthe dinucleotidesof The 3' UTRsequences were searched for antisensematches to the the specifiedmiRNA seed (e.g.,bases 2..8 of the miRNA). designatedseed region of each miRNA(e.g., bases 2..8 starting from the 5' end).Our choice of a 7 nt seed was motivatedby the DNAConstructs observationthat shorterseeds gave substantially lower signal:noise The fireflyluciferase vector wasmodified from pGL3Control Vector ratios,while longerseeds reduced the numberof predictedtargets (Promega),such that a short sequencecontaining multiple cloning of the at comparablesignal:noise ratios. Becausechanging the size sites (5'-AGCTCTATACGCGTCTCAAGCTTACTGCTAGCGT-3')was seed has a large effect on the noise as well as the signal,these insertedinto the Xbal site immediatelydownstream from the stop observationsare much more difficult to interpretin terms of potential codon.3'UTR segments of the targetgenes were amplified by PCR mechanisticimplications than the "sliding seed"data of Figure2B. from human genomicDNA and inserted into the modified pGL3 For seeds located on the 5' portion of the miRNA,35 nt flanking vector betweenSacl andNhel sites. PCRwith the appropriateprim- on the seedmatch on the 5' end and 5 nt flankingthe seed match ers also generatedinserts with point substitutionsin the miRNA was the 3' end were retrieved(a "mirror" version of this algorithm complementarysites. Wild-type and mutant inserts were confirmed used for 3' seedsin the experimentdescribed in Figure2B). Target by sequencingand are listed (SupplementalFigure S2 online). sites in whichthe 35 nt flankingregion contained masked bases or a previous the seed matchoccurred less than 20 nt downstreamof Transfectionsand Assays the miRNAseed seed matchwere discarded.Basepairing between AdherentHeLa S3 cells weregrown in 10%FBS in DMEM,supple- basepairsas far as and UTRwas extendedwith additionalflanking mentedwith glutaminein the presenceof antibiotics,to 80%-90% pairs butdisallowing gaps. possiblein both directions,allowing G:U confluencyin 24-wellplates. Cells weretransfected with 0.4 Itg of (or in the case of The basepairingpattern of the remaining3' end thefirefly luciferase reporter vector and 0.08 ALgof thecontrol vector by runningRNAfold a 3' seed, the remaining5' end)was predicted containingRenilla luciferase,pRL-TK (Promega), in a final volume consisting of an artificial stemloop (5'- on a foldback sequence of 0.5ml using Lipofectamine2000 (Invitrogen). Firefly and Renilla where "L" is an anony- GGGCCCGGGULLLLLLACCCGGGCCC-3', luciferaseactivities were measuredconsecutively using the Dual- and all other basesare pairedto a mous unpairedloop character, luciferaseassays (Promega)30 hr after transfection.Each firefly to complementarybase on the oppositeside of the stem)attached plasmidwas tested in 12-15transfections (four or five independent optimizationwas constrained the extendedseed match. RNAfold experiments,each withthree culturereplicates) involving two inde- foundin previoussteps were fixed, the structure so that all basepairs pendent plasmid preparations(six to ninetransfections each). A the miRNAand UTR of the artificialstem was fixed, and basesin HeLacell line that constitutivelyexpressed miR-1 from a pol-ll pro- theUTR and miRNA, respec- wereallowed to pair onlywith basesin moterwas created usinga derivativeof the retroviralvector pRev- and RNAevalwas usedto esti- tively. The stemloopwas removed, TRE(Clontech) containing a 500bp fragmentof humanmir-ld gene. duplexformed by the basepairs mate theenergy of the miRNAUTR A HeLaS3 cell line that constitutivelyexpressed miR-130 from the determinedin the previoussteps. H1 pol-Ill promoterwas constructedusing a retroviralvector con- taining a 330 nt fragmentof the humanmir-130 gene and a GFP ParameterOptimization gene underthe murine3-phosphoglycerate kinase promoter, which Trainingsets were constructedwith 40 randomlychosen miRNAs servedas an infectionmarker (Chen,et al., 2003).Cells expressing from nrMammand 27 randomlychosen miRNAsfrom nrVert.The GFP followinginfection were enrichedto 95% purity by FACS. remainingmicroRNAs were assignedto the nrMammand nrVert referencesets. TargetScanwas tested on the training sets with Analysis of Gene Ontologies variousparameter values: T was variedfrom 5 to 25 in increments Geneontologies were assignedto humangenes from the Ensembl of 5, Zc was varied between1 and 10 in incrementsof 0.5, andRc databaseby crossreferencingEnsembl identifiers with GO identifi- wasvaried between 50 and 1000in incrementsof 50.The parameters ers usingEnsMart version 15.1 (http://www.ensembl.org/EnsMart). T = 20,Zc = 4.5,Rc = 200were found to give an optimalsignal:noise The GeneOntology Consortium database was retrieved from http:// of 3.4:1for the nrMammtraining set. WhenRc was raisedto 300or www.geneontology.organd function and processontologies were Zc was loweredto 4, the signal:noisedecreased only moderatelyto compiledfor all predictedtarget genes.In additionto the assigned -3:1. The parametersT = 10, Zc = 4.5, Rc = 350 were found to categories,each gene was consideredas havingall more general give an optimal signal:noiseof 4.6:1 for the nrVerttraining set used C'("parent")categories within the "MolecularFunction" and "Biologi- with UTRsets from all four genomes.For both the nrMammand cal Process"ontologies. In Tables2 and S4,sets of GOcategories nrVertsets, the signal:noiseratios obtained using the trainingsets wereselected that were both broadenough to containa significant did notdiffer significantlyfrom the correspondingsignal:noise ratios fractionof the predictedtargets and specific enough to be meaning- obtained using the referencesets, and thus results from the two ful. Becausethe GO descriptionsare not mutuallyexclusive, the sets were merged. sum of the percentagesin these tables is not interpretable.GO categorieswere also usedto produce thecategories in Table1. To Generationof RandomlyPermuted Sequences be includedin a category,a genehad to be annotatedwith at least For each miRNAin nrMamm,randomly permuted sequences with one out of a set of GOcategories. The sets of GOcategories used the same startingbase, length,and base compositionas the real were: regulationof transcription/DNAbinding (GO:0003700,GO: miRNAwere generateduntil four sequenceswere foundthat devi- 0003713,GO:0003714,GO:0016563, or GO:0045449)and signal ated from the original miRNAby less than 15% in the following transduction/cell-cell signaling (GO:0004871, GO:0004872, properties:(1) E(SM),the 1" order Markov probabilityof the seed GO:0007154,GO:0007165, GO:0007267 or GO:0008083). match,(2) E(rM), the 1" order Markovprobability of the antisense of the 3' end of the miRNA(or the 5' end in the case of a 3' miRNA Acknowledgments seed),(3) O(SM),the observedcount of seed matchesin the UTR dataset,and (4) the predicted folding free energyof a seed:seed We thank W.K.Johnston for technicalassistance, C-Z. Chenand match duplex. For a miRNA(or shuffledmiRNA) with the initial se- L.P. Lim for helpful discussions,H.F. Lodish for use of facilities MammalianmicroRNA Targets 797

and equipment,N.C. Lau for the miR-1-expressingcell line, and G. K.A.,and Carrington,J.C. (2003).P1/HC-Pro, a viral suppressorof Ruvkunfor plasmidsused to construct the lin-41 reporters.Sup- RNAsilencing, interferes with Arabidopsis development and miRNA portedby grantsfrom theN.I.H (D.P.B. and C.B.B.),the Searle Schol- function.Dev. Cell 4, 205-217. ars Program(C.B.B.), and theAlexander and MargaretStewart Trust Krichevsky,A.M., King,K.S., Donahue, C.P., Khrapko, K., and Kosik, (D.P.B.),and fellowships from the DOE(B.P.L.) and the CancerRe- K.S. (2003).A microRNAarray revealsextensive regulation of mi- searchInstitute (I.S.). croRNAsduring brain development.RNA 9, 1274-1281. Lagos-Quintana,M., Rauhut, R., Lendeckel,W., and Tuschl, T. Received:November 18, 2003 (2001).Identification of novel genes coding for small expressed Revised:December 3, 2003 RNAs.Science 294, 853-858. Accepted:December 4, 2003 Published:December 24, 2003 Lagos-Quintana,M., Rauhut,R., Yalcin,A., Meyer,J., Lendeckel, W., andTuschl, T. (2002).Identification of tissue-specificmicroRNAs from mouse.Curr. Biol. References 12,735-739. Lagos-Quintana,M., Rauhut, R., Meyer, J., Borkhardt, A., and Abrahante,J.E., Daul, A.L., Li, M.,Volk, M.L.,Tennessen, J.M., Miller, Tuschl, T. (2003).New microRNAsfrom mouse and human. RNA EA, and Rougvie,A.E. (2003). The Caenorhabditiselegans hunch- 9, 175-179. back-likegene lin-57/hbl-1 controls developmental time and is regu- Lai, E.C. (2002).MicroRNAs are complementaryto 3'UTR motifs lated by microRNAs.Dev. Cell 4, 625-637. that mediate negativepost-transcriptional regulation. Nat. Genet. Altschul, S.F., Madden,T.L., Schaffer,A.A., Zhang, J., Zhang,Z., 30, 363-364. Miller,W., and Lipman,D.J. (1997). Gapped BLAST and PSI-BLAST: Lai, E.C.(2003). MicroRNAs: runts of thegenome assert themselves a new generationof protein database searchprograms. Nucleic Curr. Biol. 13,R925-R936. Acids Res.25, 3389-3402. Lai, E.C.,Tomancak, P., Williams, R.W.,and Rubin,G.M. (2003). Ambros, V., Bartel, B., Bartel, D.P., Burge,C.B., Carrington,J.C., Computationalidentification of DrosophilamicroRNA genes. Ge- Chen,X., Dreyfuss,G., Eddy,S., Griffiths-Jones,S., Matzke,M., et nomeBiol. 4:R42,1-20. al. (2003a).A uniform system for microRNAannotation. RNA 9, Lau, N.C., Lim, L.P.,Weinstein, E.G., and Bartel,D.P. (2001).An 277-279. abundant class of tiny RNAs with probable regulatoryroles in Ambros,V., Lee, R.C.,Lavanway, A., Williams,P.T., and Jewell,D. Caenorhabditiselegans. Science 294, 858-862. (2003b).MicroRNAs and othertiny endogenousRNAs in C.elegans. Lee, R.C.,and Ambros,V. (2001).An extensiveclass of smallRNAs Curr. Biol. 13,807-818. in Caenorhabditiselegans. Science 294, 862-864. Aravin,A.A., Naumova, N.M., Tulin, A.A., Rozovsky, Y.M., and Gvoz- Lee, R.C.,Feinbaum, R.L., and Ambros, V. (1993).The C. elegans dev,VA. (2001).Double-stranded RNA-mediated silencing of geno- heterochronicgene lin-4 encodessmall RNAswith antisensecom- mic tandemrepeats and transposable elements in Drosophilamela- plementarityto lin-14. Cell 75, 843-854. nogastergermline. Curr. Biol. 11, 1017-1027. Lim, L.P., Glasner,M.E., Yekta, S., Burge,C.B., and Bartel, D.P. Aukerman,M.J., and Sakai,H. (2003).Regulation of floweringtime (2003a).Vertebrate microRNA genes. Science 299, 1540. andfloral organ identity by a MicroRNAand its APETALA2-liketarget genes.Plant Cell 15,2730-2741. Lim, L.P., Lau, N.C.,Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades,M.W., Burge,C.B., and Bartel,D.P. (2003b).The micro- Bartel,D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, RNAsof Caenorhabditiselegans. Genes Dev. 17,991-1008. and function.Cell, in press. Lin, S.Y.,Johnson, S.M., Abraham, M., Vella,M.C., Pasquinelli, A., Brennecke,J., Hipfner,D.R., Stark, A., Russell,R.B., and Cohen, Gamberi,C., Gottlieb, E., and Slack, F.J. (2003).The C. elegans S.M. (2003).bantam encodes a developmentallyregulated mi- hunchbackhomolog, hbl-1, controlstemporal patteming and is a croRNAthat controls cell proliferationand regulatesthe proapo- probablemicroRNA target. Dev.Cell 4, p639-p650. ptotic gene hid in Drosophila.Cell 113,25-36. Llave,C., Xie, Z., Kasschau,K.D., and Carrington,J.C. (2002). Cleav- Chen,C.-Z., Li, L., Lodish,H.F., and Bartel,D.P. (2003). MicroRNAs ageof scarecrow-likemRNA targets directed by a classof Arabidop- modulatehematopoietic lineage differentiation. Science, in press. sis miRNA.Science 297, 2053-2056. Publishedonline December4, 2003.10.1126/science.1 091903. Moss,E.G., Lee, R.C., and Ambros, V. (1997).The cold shockdomain Chen,X. (2003).A MicroRNAas a translationalrepressor of APET- protein LIN-28controls developmental timing in C. elegansand is ALA2in arabidopsisflower development. Science. Published online regulatedby the lin-4 RNA.Cell 88, 637-646. September11, 2003.10.1126/science.1 088060. Moss,E.G., and Tang,L. (2003).Conservation of the heterochronic Consortium,The Gene Ontology. (2001). Creating the geneontology regulatorLin-28, its developmentalexpression and microRNAcom- resource:design and implementation.Genome Res. 11, 1425-1433. plementarysites. Dev.Biol. 258, 432-442. Doench,J.G., Peterson,C.P., and Sharp, P.A.(2003). siRNAs can Mourelatos,Z., Dostie,J., Paushkin,S., Sharma,A., Charroux,B., function as miRNAs.Genes Dev. 17, 438-442. Abel,L., Rappsilber, J., Mann,M., and Dreyfuss, G. (2002). miRNPs: a Dostie,J., Mourelatos,Z., Yang,M., Sharma,A., and Dreyfuss,G. novelclass of ribonucleoproteinscontaining numerous microRNAs. (2003).Numerous microRNPs in neuronalcells containingnovel mi- GenesDev. 16,720-728. croRNAs.RNA 9, 631-632. Nussinov,R. (1981).Nearest neighbor nucleotide patterns. Struc- Emery,J.F., Floyd, S.K., Alvarez, J., Eshed,Y., Hawker,N.P., Izhaki, tural and biologicalimplications. J. Biol. Chem.256, 8458-8462. A., Baum,S.F., and Bowman,J.L. (2003).Radial patteming of Arabi- dopsis shoots by class III HD-ZIPand KANADIgenes. Curr. Biol. Palatnik,J.F., Allen, E., Wu,X., Schommer,C., Schwab, R., Carring- 13,1768-1774. ton, J.C., and Weigel,D. (2003).Control of leaf morphogenesisby microRNAs.Nature 20, 257-263.Published online August 20, 2003. Hofacker,I.L., Fontana,W., Stadler, P.F., Bonhoeffer,S., Tacker, 10.1038/nature01 958. M., and Schuster,P. (1994).Fast folding and comparisonof RNA secondarystructures. Monatshefte fur Chemie125, 167-188. Pasquinelli,A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M., Mailer, B., Srinivasan,A., Fishman,M., Hayward,D., Ball, E., et Houbaviy,H.B., Murray, M.F.,and Sharp, PA. (2003).Embryonic al. (2000).Conservation across animalphylogeny of the sequence stem cell-specificMicroRNAs. Dev. Cell 5, 351-358. and temporal regulationof the 21 nucleotidelet-7 heterochronic Iseli,C., Stevenson,B.J., de Souza,S.J., Samaia, H.B., Camargo, regulatoryRNA. Nature 408, 86-89. A.A.,Buetow, K.H., Strausberg, R.L., Simpson, A.J., Bucher, P., and Reinhart,B.J., Slack, F.J., Basson,M., Bettinger,J.C., Pasquinelli, Jongeneel,C.V. (2002). Long-range heterogeneity at the 3' ends of A.E., Rougvie,A.E., Horvitz,H.R., and Ruvkun,G. (2000).The 21 humanmRNAs. Genome Res. 12,1068-1074. nucleotidelet-7 RNAregulates developmental timing in Caenorhab- Kasschau,K.D., Xie, Z., Allen,E., Uave, C., Chapman,E.J., Krizan, ditis elegans.Nature 403, 901-906. Cell 798

Rhoades,M.W., Reinhart,B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel,D.P. (2002).Prediction of plant microRNAtargets. Cell 110,513-520. Stark,A., Brennecke,J., Russell,R.B., and Cohen, S.M. (2003). Iden- tificationof DrosophilamicroRNA targets. PLOS Biol., in press.Pub- lished onlineOctober 13, 2003.10.1371/joumal.pbio.0000060. Tang, G., Reinhart,B.J., Bartel,D.P., and Zamore, P.D. (2003).A biochemicalframework for RNA silencing in plants. Genes Dev. 17,49-63. Wightman,B., Ha, I., and Ruvkun,G. (1993).Posttranscriptional regulationof the heterochronicgene lin-14 bylin-4 mediatestempo- ral patternformation in C. elegans.Cell 75, 855-862. Xu,P., Vemooy, S.Y., Guo, M., and Hay,B.A. (2003). The Drosophila MicroRNAMir-14 suppressescell deathand is requiredfor normal fat metabolism.Curr. Biol. 13,790-795.