A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence

Joshua J. Forman, Aster Legesse-Miller, and Hilary A. Coller*

Department of Molecular Biology, Princeton University, Princeton, NJ 08544

Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA, and approved August 12, 2008 (received for review April 2, 2008)

Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions.

computational biology | posttranscriptional regulation | comparative genomics | multiple-sequence alignment | evolutionary conservation

Author contributions: J.J.F. and H.A.C. designed research; J.J.F. and A.L.-M. performed research; J.J.F., A.L.-M., and H.A.C. analyzed data; and J.J.F. and H.A.C. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
*To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0803230105/DCSupplemental.
© 2008 by The National Academy of Sciences of the USA

MicroRNAs (miRNAs) are endogenously encoded, single-stranded regulatory RNAs that bind to and inhibit the translation of transcripts with complementary sequence (1). Computational evidence suggests that miRNAs regulate at least 20% of human genes, and miRNAs have been implicated in the regulation of a wide range of biological systems (2). In plants, miRNA targets can be predicted with relatively high confidence because of the extensive base pairing between plant miRNAs and their target mRNAs (1). In animals, in contrast, miRNAs typically bind to their targets with significantly less complementarity, and the short target sequences are, therefore, difficult to identify on the basis of sequence alone. As a result, most computational approaches to predict miRNA–target interactions rely on conservation of target sites (3–6).

Although early studies reported some evidence for the targeting of miRNAs to sites within coding regions (4, 6), subsequent research has reported that there is minimal functionality for sites in ORFs or 5′ UTRs (7). A focus on miRNA target sites within 3′ UTRs is supported by evidence suggesting that the G-cap/poly(A) tail interface (which connects the two ends of eukaryotic mRNAs during translation) is important for miRNA function (8) and that miRNAs tend to be more effective when localized at the end of the 3′ UTR rather than the middle (7, 9). Indeed, the protein translation machinery might be expected to displace an miRNA complex present within a gene's coding sequence. However, exogenously added siRNAs that target coding sequences, including siRNAs with imperfect base pairing, are effective at silencing (10). More recent reports have also shown that, contrary to other highly conserved coding region motifs, miRNA target sites are conserved in all three reading frames among Drosophila species (11), and that target sites introduced into the 5′ UTR of transcripts can repress expression (12). Further, the potential importance of sites embedded within the coding sequence of genes is supported by the nature of the genetic code itself, which has been shown to be nearly optimal for conveying information in parallel to the amino acid sequence (13). One recent report has directly shown down-regulation via a coding region target (29).

We sought to investigate whether miRNA target sites within coding regions are functional, beginning with an unbiased screen for coding region sequence motifs that are more highly evolutionarily conserved than is required for amino acid conservation. We found that the motifs most highly conserved in coding regions are indeed miRNA target sites, and we experimentally confirmed that the endonuclease Dicer is targeted by the miRNA let-7 by means of sites within its coding region. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism.

Results and Discussion

We developed a computational algorithm to find conserved sequence motifs within coding regions (http://www.sitesifter.org). Our approach is based on the assumption that DNA sequences with a regulatory function should be evolutionarily conserved at the sequence level over and above any conservation required to maintain the amino acid sequence of the encoded protein. Most computational approaches for finding miRNA target sites put special emphasis on the miRNA target "seed" (corresponding to positions 2–7 of the miRNA) because complementarity at these base pairs has been shown to play an important role in target recognition (4, 14). In keeping with this approach, we chose to search for conserved motifs 8 bp in length, because we have found that this length increases specificity for miRNAs. Because coding sequences are constrained by the need to encode amino acids, a larger number of genomes must be used when surveying conservation in coding regions than in 3′ UTRs. We took advantage of a multiple-sequence alignment of 17 genomes (human, chimpanzee, macaque, mouse, rat, rabbit, dog, cow, armadillo, elephant, tenrec, opossum, chicken, frog, zebrafish, green spotted puffer, and fugu) obtained from the University of California Santa Cruz (UCSC) Genome Browser (15), used in conjunction with a list of coding region boundaries extracted from the UCSC Table Browser (16).
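The search for perfectly conserved 8-bp windows can be illustrated with a minimal sketch. It assumes a toy alignment held as equal-length strings with the reference species in the first row; the function name `conserved_kmers` and the three-row example are illustrative only and are not taken from the published pipeline. Codon frame and scoring are ignored here.

```python
def conserved_kmers(alignment, k=8):
    """Yield (start, kmer) for every ungapped window of length k that is
    identical in all rows of the alignment.

    `alignment` is a list of equal-length aligned strings; row 0 is the
    reference (here, hypothetically, the human sequence).
    """
    ref = alignment[0]
    # A column is conserved if the reference has a base (not a gap) and
    # every other row carries that same base.
    conserved = [
        ref[i] != "-" and all(row[i] == ref[i] for row in alignment)
        for i in range(len(ref))
    ]
    run = 0  # length of the current stretch of conserved columns
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= k:
            start = i - k + 1
            yield start, ref[start:i + 1]


# Toy three-species alignment (hypothetical sequences).
rows = [
    "ATGCTACCTCAGG",
    "ATGCTACCTCAGA",
    "ATACTACCTCAGG",
]
print(list(conserved_kmers(rows)))
# [(3, 'CTACCTCA'), (4, 'TACCTCAG')]
```

Overlapping windows are all reported, so a 9-bp conserved stretch yields two 8-mers; any de-duplication is left to a later filtering step.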

www.pnas.org/cgi/doi/10.1073/pnas.0803230105 PNAS | September 30, 2008 | vol. 105 | no. 39 | 14879–14884

Fig. 1. Motifs with a high SLCS are enriched for miRNA target sequences. The distribution of non-zero SLCSs (black bars) from a 17-genome alignment and the distribution that resulted from an analysis of a randomly permuted genome are plotted. The distribution demonstrates that there are significantly more high-scoring motifs within coding regions compared with the distribution of scores from a randomized alignment (white bars). The highest-scoring sequence motifs (Inset) are enriched for miRNA target sequences. Results are filtered so that conserved motifs are removed if they are within 3 bp of a higher-scoring motif.
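The filtering rule in the last sentence of the caption can be sketched as follows. This is one plausible reading, not the authors' code: it assumes each hit is a hypothetical `(position, score)` pair on a single coordinate axis, and it treats "within 3 bp" as a start-position difference of at most 3.

```python
def filter_near_higher(hits, window=3):
    """Drop any hit that lies within `window` bp of a higher-scoring hit.

    `hits` is a list of (position, score) tuples; the original order of
    the surviving hits is preserved.
    """
    return [
        (pos, score)
        for pos, score in hits
        if not any(
            other_score > score and abs(other_pos - pos) <= window
            for other_pos, other_score in hits
        )
    ]


# Hypothetical hits: two of them sit next to the higher-scoring hit at 12.
hits = [(10, 50.0), (12, 80.0), (20, 30.0), (14, 60.0)]
print(filter_near_higher(hits))
# [(12, 80.0), (20, 30.0)]
```

A greedy variant that compares only against hits already kept would behave differently on chains of neighbouring motifs; the caption does not say which behaviour was used.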

Our algorithm first parses coding regions from the whole-genome multiple alignment. The fraction of coding region sequence for which all 17 species are aligned (23.4%) is then searched for perfectly conserved sequences 8 bp in length. Conserved motifs are assigned a Sequence Level Conservation Score (SLCS) representing the degree to which the motif is conserved at the sequence level over and above the amino acid level. The SLCS is based on an empirical measure of the probability that a given codon is sequence-conserved, given that it is amino acid-conserved [see supporting information (SI) Table S1 for the full list of codons and their respective sequence level conservation probabilities]. For a given motif, the SLCS represents the negative logarithm of the probability of sequence conservation across the motif's constituent codons (that is, the sum of the logs of the reciprocal per-codon probabilities), taking into account codons that partially overlap the motif. If a motif is conserved multiple times throughout the genome, the score is summed over every conserved occurrence (see Materials and Methods).

Results were obtained from the multiple alignment as well as a randomly permuted alignment in which the amino acid sequence was left unchanged, but the choice of which codons were fully conserved at the sequence level in other species was assigned randomly. This randomization procedure alters the DNA sequence of genes in the multiple alignment while maintaining their amino acid sequences. As shown in Fig. S1, in the actual data, the overall distribution of spacing between fully conserved sites is similar to that in the permuted data, except that there is an overabundance of fully sequence-conserved codons that are adjacent to one another, which could reflect the presence of embedded regulatory motifs at the sequence level. Indeed, a comparison of the two result sets (Fig. 1) shows that there are significantly more high-scoring motifs from the real alignment compared with the permuted alignment (P < 10^−49, Kolmogorov–Smirnov test), suggesting that these motifs have been functionally conserved within coding sequences even across such a large evolutionary distance (see Table S2 for a list of high-scoring motifs). Thirteen motifs from the real dataset had an SLCS > 56.1, which was the maximum score obtained across 10 permuted datasets. There were also 41 miRNA target sites with an SLCS > 0 in the real dataset, compared with a mean ± SD of 20.2 ± 2.2 in the permuted datasets (fewer than 5% of motifs have a non-zero SLCS in the real dataset). Four miRNA target sites, corresponding to those for let-7 (except for let-7d, which differs in its first base), miR-9, miR-125, and miR-153, scored above the maximum from the permuted datasets. The sixth-highest scoring motif matches the consensus binding site of the splicing factor TRA2β. We expected the TRA2β motif to be recognized by our algorithm, because it confers a regulatory signal by means of DNA sequence and was also identified by a previous report on coding region binding sites (17). For both datasets, discovered motifs were matched against known human miRNAs obtained from miRBase (18, 19). Among the top 15 scoring motifs in the real dataset are four miRNA target sites (for let-7, miR-9, miR-125a, and miR-153), which had SLCSs of 183.6, 153.2, 132.4, and 56.6, respectively, representing significant enrichment compared with the total set of scored motifs from the permuted dataset (P < 10^−6, one-tailed hypergeometric test) (Fig. 1 Inset). Other motifs include highly conserved sequence elements of unknown function.

To investigate whether high-scoring sequence motifs matching miRNA target sites are indeed responsive to their associated miRNA, we performed a functional assay on the highest-scoring motif, which corresponds to the let-7 target site. The let-7 miRNA is highly conserved and was originally demonstrated to regulate developmental timing in the roundworm Caenorhabditis elegans (20). More recently, let-7 has been found to play a role in cell cycle regulation and cancer in humans (21). We transfected cultured human fibroblasts with a let-7 precursor or a control RNA and used microarrays to monitor changes in transcript levels after 24 h. All genes that contain the let-7 target site CTACCTCA within human coding regions were analyzed for down-regulation in response to let-7 addition. We discovered a significant correlation between the number of genomes in which the let-7 target site was conserved and the response to transfected let-7 (r = −0.196, P < 10^−5, n = 521 for coding regions; r = −0.142, P = 0.017, n = 283 for 3′ UTRs), demonstrating that functional let-7 target sites within coding regions are more likely to be evolutionarily conserved. In addition, although previous reports have shown a statistically significant but small effect for miRNA target sites within coding regions (7), we discovered that the subset of target sites in coding regions that are highly evolutionarily conserved are functional as miRNA targets. As shown in Fig. 2, highly conserved target sites within coding regions resulted in down-regulation approaching that of the total set of 3′ UTR sites.

Fig. 2. Highly conserved let-7 targets within coding regions are functional. Cultured human fibroblasts were transfected with pre-let-7b or a control RNA. Twenty-four hours after transfection, RNA was isolated, labeled, and hybridized to a whole-genome microarray. The fold change in expression between cells transfected with let-7b or the control was determined. The cumulative distribution of the fraction of genes down-regulated in the microarray data shows a significant difference between a random set of genes and genes with target sites in their 3′ UTR (P = 0.014, one-tailed Kolmogorov–Smirnov test), but not with genes with target sites in their coding region (P = 0.26). Genes with highly conserved target sites in their coding region, however, are significantly different from the set of random genes (P < 10^−4), as are genes with highly conserved target sites in their 3′ UTRs (P < 10^−3).

To determine whether the human genes containing a conserved let-7 target site are related in terms of biological function, we tested the 25% of genes with the most highly conserved let-7 target sites for enrichment in Gene Ontology categories (22). Genes were ranked by awarding one point for each genome in which the let-7 binding site is conserved. Scores were summed for genes with multiple let-7 target sites. We found significant enrichment among several Gene Ontology categories, including the processes cell cycle (multiple hypothesis-corrected P < 0.001), embryonic limb morphogenesis (corrected P < 0.002), chromatin modification (corrected P < 0.002), and posttranslational protein modification (corrected P < 0.01).

Next, we sought to verify directly that let-7 targets one of the genes with a conserved target site in its coding sequence. We selected the enzyme Dicer, which contains three highly conserved let-7 target sites (Fig. 3A) and performs the last processing step of miRNA maturation before the miRNA is loaded into the RNA-induced silencing complex (RISC). Dicer was recently found to be posttranscriptionally down-regulated by let-7, thus raising the possibility of a let-7-mediated negative-feedback loop (23, 30, 31).

Fig. 3. The microRNA let-7 targets the coding region of Dicer. (A) Dicer contains three highly conserved let-7 target sites. Alignments between the let-7 miRNA and the target sites are shown. The first two sites are conserved in 16 of 17 genomes, and the last is conserved in all 17 genomes. The coding sequence of Dicer is shown above let-7b. (B) HEK 293 cells were cotransfected with Dicer and with let-7b pre-miRNA or a pre-miRNA control. Several variants of the Dicer construct were used, from which various combinations of the let-7 sites had been abrogated by site-directed mutagenesis. (Upper) An immunoblot of Dicer levels and GAPDH levels. Wild-type Dicer is in lanes 1 and 2, fully mutated Dicer is in lanes 3 and 4, and each subsequent pair of lanes is marked to indicate which sites were intact (✓) and which were abrogated (×). Cellular lysates were analyzed by immunoblotting with antibodies to the FLAG epitope (upper band) or GAPDH (lower band). (Lower) The fold reduction in the presence versus absence of exogenous let-7 for each Dicer variant. Means (n = 4) and standard errors are plotted. Wild-type Dicer is significantly down-regulated by let-7 compared with the Dicer variant with all three target sites abrogated (two-tailed t test, P < 0.01). The Dicer variant with the first two target sites intact is also significantly down-regulated (P = 0.02). The Dicer variant with the first and third target sites intact is down-regulated with the next-highest magnitude, but the effect did not reach significance. Dicer variants with a single site intact are down-regulated less strongly than variants with two sites.

To measure whether let-7 down-regulates Dicer via its coding region, we cotransfected human embryonic kidney 293 cells with an expression vector containing the coding sequence of Dicer (without its 3′ UTR and tagged with the FLAG epitope) and one of two pre-miRNAs (pre-let-7b or a control RNA). Additionally, to assess the relative effects of the three highly conserved let-7 target sites, we also created several variants of the Dicer construct by subjecting the let-7 target sites to site-directed mutagenesis. These mutations abrogated the let-7 binding sites while keeping the protein's amino acid sequence intact (see Materials and Methods). Cell lysates were collected after 48 h and analyzed by immunoblotting with an antibody to the FLAG epitope. Protein concentrations were normalized to GAPDH band intensity for quantification.

Our results show that wild-type Dicer is strongly down-regulated by the let-7 miRNA compared with transfection with a negative control (Fig. 3B, lanes 1 and 2), whereas let-7 has minimal effect on the version of Dicer in which the three coding region let-7 target sites are mutated (lanes 3 and 4) (wild type
Forman et al. PNAS ͉ September 30, 2008 ͉ vol. 105 ͉ no. 39 ͉ 14881 Downloaded by guest on September 30, 2021 versus mutant compared by two-tailed t test, P Ͻ 0.01). The variant with the first two sites intact (lanes 5 and 6) is also significantly down-regulated compared with the fully mutated variant (two-tailed t test, P Ͻ 0.05), although to a lesser degree than wild-type Dicer. Although the variant with the first and third sites intact showed down-regulation of the next-greatest magnitude among the variants we tested (lanes 9 and 10), this effect did not achieve statistical significance. Each variant with a single intact site showed down-regulation of a lower magnitude compared with those with two intact target sites (lanes 7 and 8, 11 and 12, and 13 and 14). The increased importance of the first two let-7 target sites may be explained by their close proximity, because nearby miRNA target sites have been shown to act cooperatively (24). Our findings also suggest that endogenous levels of let-7 are sufficient for down-regulating wild-type Dicer, because FLAG band intensity that results when wild-type Dicer is cotransfected with a control RNA is significantly lower than the band intensity that results when the fully mutated variant is cotransfected (wild type versus mutant compared by paired two-tailed t test, P ϭ 0.019). To confirm these results, we transfected wild-type and fully mutated Dicer into two cell lines—the wild-type HCT116 human colorectal cancer cell line and a version of this cell line with targeted disruption of Dicer 5, which has the effect of lowering total intracellular mature miRNA concentrations (32). Down-regulation of wild-type versus mutated Dicer is larger in wild-type cells compared with cells with defective miRNA processing (Fig. S2). These results reinforce the importance of endogenous let-7 in regulation of Dicer levels. 
To assess whether conservation of target sites might give insight into the mechanism by which miRNAs silence transcripts at various target locations, we used the program RNAcofold from the ViennaRNA suite (25) to computationally fold let-7 to all of its seed matches in coding regions, 3Ј UTRs, and 5Ј UTRs. We compared base-pairing probabilities along the entire let-7 miRNA between conserved and nonconserved target seeds. Our results for 3Ј UTRs (Fig. 4B) show a preference for looping at base pairs 10–12, followed by a preference for binding at positions 13–16, in conserved compared with nonconserved seed matches. These results closely recapitulate the binding prefer- ences previously reported in papers by Kiriakidou et al. (26), which established a preference for a short bulge after the first nine bases of the miRNA, and Grimson et al. (7), which established the importance of base pairing at positions 13–16 of the miRNA. In sharp contrast, our data show a statistically significant preference for binding in positions 13–15 and 17–19 Fig. 4. Recognition sites at various positions within a transcript have differ- for conserved seed matches in coding regions and no preference ent patterns of miRNA binding. Sites complementary to the let-7 seed were for looping (Fig. 4A), suggesting a possible mechanistic differ- extracted from 3Ј UTRs (A), coding regions (B), and 5Ј UTRs (C), along with ence between miRNA targeting in coding regions and in 3Ј additional upstream sequence, and computationally folded to a consensus UTRs. Conserved coding region seeds, but not conserved 3Ј let-7 miRNA (see Materials and Methods). The fraction of target–seed inter- UTR seeds, prefer a greater number of bound nucleotides in the actions in which a particular base pair was bound was determined for each position of the miRNA outside of the seed. The fraction bound was compared remainder of the miRNA (P Ͻ 10Ϫ9, two-tailed Fisher exact Ј for highly conserved versus poorly conserved target seeds. 
Numbers indicate test). We also tested binding preferences for seed matches in 5 the number of genomes in which a seed was conserved; 12ϩ indicates indi- UTRs (Fig. 4C) and discovered a statistically significant pref- cates conservation in 12 or more genomes, 3Ϫ indicates conservation in three erence for binding in positions 13–16. These data support the or fewer genomes. Statistical comparisons were performed by using the Fisher finding that miRNA target sites in 5Ј UTRs may be functional. exact test (*, P Ͻ 0.05; **, P Ͻ 0.01; ***, P Ͻ 0.001). Different cutoffs for the Our findings demonstrate that there are number of genomes in the ‘‘conserved’’ and ‘‘nonconserved’’ groups were elements within coding regions that have been evolutionarily used for each search region, and target seeds of 6 bp were used for 5Ј UTRs to conserved through hundreds of millions of years of evolution ensure an adequate number of sequences on which to perform statistical (27) at a significantly higher level than would be expected by analysis (Fig. S3 shows results for various conservation levels). chance. These elements are enriched for miRNA-binding sites, thus supporting a potential role for Ј miRNA targeting within coding regions in animals as well as sion relative to sites within 3 UTRs in the microarray experi- plants. Further, we show that conserved let-7 target sites present ment described above. We demonstrate that the coding region of within the coding regions of human genes are responsive to one gene in particular, Dicer, contains functional target sites for miRNAs, that the regulated genes are functionally related to one let-7 and loses its responsiveness to let-7 when target sites within another, and that conserved miRNA target sites within coding its coding sequence are synonymously mutated. Regulation of regions are comparably effective at posttranscriptional repres- the Dicer endonuclease by let-7 suggests a possible negative-

14882 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0803230105 Forman et al. Downloaded by guest on September 30, 2021 feedback mechanism for miRNA processing. A similar mech- codon is sequence-conserved, further broken down by whether or not the anism has been discovered in Arabidopsis, in which an miRNA codon is amino acid-conserved. targets Dicer-like1 transcript degradation by means of a target sequence present in the middle of the coding sequence (28). Motif Scoring. To score a conserved motif, we looked up the appropriate We also provide computational evidence to suggest that probability in Table S1 for each partial or full codon. The logs of the reciprocal miRNA function in coding regions and 3Ј UTRs may differ of the probabilities are summed across the motif: mechanistically. 1 Our results provide evidence that roughly 700 genes in the Motif score ϭ ͸ logͩ ͪ . [4] full or CSLC human genome have conserved regulatory sites within coding partial codon regions. Because our investigation focused on conservation among 17 organisms with a large amount of evolutionary When a motif is scored more than once across the genome, this motif score is distance between them, this number almost certainly underes- further summed across occurrences: timates the number of human genes that contain conserved ϭ ͸ regulatory sites within coding regions. Also, the miRNAs that we SLCS motif score. [5] conserved identified with this approach are expected to be enriched for instance those that play a role in fundamental processes, such as early development or cell cycle regulation, a hypothesis supported by Permutation Testing. For each amino acid that was perfectly conserved at the the fact that the highest-scoring miRNA in our computational amino acid level in all 17 genomes, we replaced it with perfect sequence conser- screen was let-7, which is highly evolutionarily conserved and vation or with imperfect sequence conservation at the empirically determined rate. 
The regenerated genome was analyzed, and sequence level conservation targets genes in coding sequences that are enriched for these scores were determined as for the actual human genome. biological functions. Our findings represent an important step toward a comprehensive understanding of regulatory DNA let-7 Conservation. To find genes containing the let-7 target 8-mer, the human elements and the role of these elements in cellular networks and sequence from the multiple alignment was searched for the motif CTACCTCA in human health and disease. such a way that gaps in the human sequence of the alignment were ignored (i.e., the human sequence CTACC---TCA in the alignment was included in the analysis). Materials and Methods For each occurrence of this motif, a conservation score was calculated by Extracting Coding Regions. The 17-genome multiple alignment used in this counting the number of genomes in which this motif was conserved in the study was obtained from the UCSC Genome Browser (multiple alignment files same frame as the human sequence. If the motif appeared in a gene more are available directly from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/ than once, the conservation score was summed across occurrences. multiz17way/). Chromosomes M and Y were excluded from the analysis. Coding regions were identified by using the annotated gene list available Binding Profiles Downstream of the let-7 Seed. All instances of the 8-mer let-7 from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/ref- target seed were retrieved from each search region (coding regions, 3Ј UTRs, Gene.txt.gz (column names and definitions are available from http:// and 5Ј UTRs) with 27 bp of upstream sequence. Sequences were discarded if hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refGene.sql). 
Genes less than 22 bp of upstream sequence was available (e.g., if the were excluded if the cdsStartStat or cdsEndStat was not complete (‘‘cmpl’’) start site was encountered for target sites in the 5Ј UTR). The seed and to filter out genes with imprecise sequences. When coding regions over- upstream sequence were computationally folded to let-7 using the RNAcofold lapped, the longer coding region was used in the analysis. Multiple align- program of the Vienna RNA package (25) (using the ϪnoGU option to disallow ments for coding regions were directly parsed from the multiple alignment G/U base pairing, and the ϪC option to enforce seed binding). Because there files. are several members of the let-7 miRNA family, we constructed a consensus let-7 sequence, which is identical to let-7a. Scoring Model. Our scoring method is designed to detect excess sequence conservation over and above that required for amino acid conservation. The Genome Coverage. To estimate the number of genes that contain high-scoring probabilities for each of the codons in a motif are then used to assign the motif motifs conserved at the DNA sequence level, we subtracted the total number a SLCS, as described below. For a given conserved codon in a motif, its codon of genes containing a conserved motif in the permuted dataset from the sequence level conservation (CSLC) probability is calculated as follows: number of genes containing a conserved motif in the nonpermuted dataset. Only motifs with an SLCS Ͼ15 were considered. ͑ ͒ ϭ ͑ ͉ ͒ CSLC codon P consseq consaa , [1] Monitoring Dicer Levels. The CS2ϩ expression vector containing the coding where consseq denotes that the codon is sequence-conserved and consaa sequence of Dicer with His and FLAG tags was transfected into 293 cells along denotes that the codon is amino acid-conserved. When a codon is amino with control pre-miRNA, pre-let-7b, control anti-miRNA, or anti-let-7b. 
Levels acid-conserved but not sequence-conserved, the human sequence is used to of Dicer or the loading control GAPDH were determined by immunoblotting determine the codon assignment. of protein lysates, collected at 48 h, with antibodies to FLAG or GAPDH. The We also measured these probabilities for partial codons, because conserved three let-7-binding sites were mutated with the Stratagene multisite-directed

motifs will span both full and partial codons in the multiple alignment. If x CELL BIOLOGY mutagenesis kit, and the experiment was repeated. bases of the codon are within the conserved motif, the probability is measured as follows: Microarray Analysis. To monitor changes in transcript levels associated with CSLC ͑codon͒ ϭ P͑cons ͉cons ͒, [2] let-7 overexpression, dermal fibroblasts were transfected with pre-let-7b or a partial,conserved x,pos,seq aa control RNA. Samples were collected after 24 h. Total RNA was isolated by using the mirVana miRNA Isolation kit (Ambion) according to the manufac- where consx,pos,seq denotes that the first or last (denoted by pos) x bases of the codon are sequence-conserved. The portion of a codon that is part of a turer’s protocol. High-quality RNA was confirmed by using a Bioanalyzer 2100 conserved motif must be fully sequence-conserved to be scored, but the (Agilent Technologies), and the amount was determined with a NanoDrop remainder of the codon need not necessarily be conserved. As a result, we spectrophotometer (Thermo Scientific). A total of 325 ng of total RNA was must also measure partial codon probabilities in the case where x bases within amplified by using the Low RNA Input Fluorescent Labeling Kit (Agilent the codon are fully sequence-conserved, but the full codon is not amino Technologies) according to the manufacturer’s protocol. Cyanine 3-CTP (Cy-3) acid-conserved: or cyanine 5-CTP (Cy-5) (Perkin–Elmer) was directly incorporated into the cRNA during in vitro transcription. Cy-3-labeled cRNA was used as reference RNA ͑ ͒ ϭ ͑ ͉ ͒ to compare with corresponding Cy-5-labeled let-7b-overexpressing sample CSLCpartial,non-conserved codon P consx,pos,seq non-consaa , cRNA. 
where nonconsaa denotes that the codon is not amino acid-conserved. Our resulting table (Table S1) contains the probability that a given full or partial

The mixed labeled cRNA was hybridized to whole Human Genome [3] Oligo Microarray slides (Agilent Technologies) at 60°C for 17 h and subsequently washed according to the Agilent Oligo Microarray Hybridization Kit. Slides were scanned with a dual laser scanner (Agilent Technologies). The Agilent feature extraction software, in conjunction with the Princeton University Microarray database, was used to compute the log ratio of the difference between the two samples for each gene after background subtraction and dye normalization. Of the ≈44,000 probes on the microarray, 25,972 probes generated signal in four of five arrays. Data for these probes were mapped to genes based on UniGene Clusters. If multiple probes mapped to a single gene, the values were averaged, resulting in 14,590 genes.

Forman et al. PNAS | September 30, 2008 | vol. 105 | no. 39 | 14883

Site-Directed Mutagenesis. Primers (synthesized by Integrated DNA Technologies) were designed to introduce mutations into all three let-7 target sites in the Dicer coding region such that the let-7 target site would be abrogated without changing the amino acid translation of the Dicer gene. The sites were mutated as follows:

Site 1 (translation: TTS)
Before mutagenesis: ACT ACC TCA
After mutagenesis: ACA ACG AGC

Site 2 (translation: STS)
Before mutagenesis: TCT ACC TCA
After mutagenesis: AGC ACG TCA

Site 3 (translation: DYL)
Before mutagenesis: GAC TAC CTC
After mutagenesis: GAT TAT TTG

Mutagenesis was performed with the Stratagene multisite-directed mutagenesis kit according to the manufacturer's instructions. The msDicer construct was verified to contain all three mutated sites and no other mutations by primer extension sequencing performed by GENEWIZ.

ACKNOWLEDGMENTS. We thank Jason Myers, David Hendrickson, and Jim Ferrell for providing the Dicer expression vector; Burt Vogelstein for providing the HCT116 Dicerex5 cells; and Curtis Huttenhower, Peter Wei, Amy Caudy, Leonid Kruglyak, and the entire Coller and Kruglyak labs for helpful discussions. H.A.C. is the Milton E. Cassel scholar of the Rita Allen Foundation. This work was supported in part by grants from the PhRMA Foundation, National Institutes of Health/National Institute of General Medical Sciences Grant 1R01 GM081686 to H.A.C., National Institutes of Health/National Institute of General Medical Sciences Grant P50 GM071508 to H.A.C. and A.L-M., and National Cancer Institute Training Grant 5T32 CA009528 to J.J.F.
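The synonymous substitutions listed under Site-Directed Mutagenesis can be sanity-checked with a short script. The following is a minimal sketch, not the authors' procedure: the names `CODON_TABLE`, `SITES`, `translate`, `count_changes`, and `check_sites` are hypothetical helpers introduced here for illustration, and the codon table holds only the standard-genetic-code entries needed for the nine codons involved. It confirms that each mutated site keeps its amino acid translation and counts how many nucleotides were changed.

```python
# Illustrative check only; all names below are assumptions, not from the paper.
# Standard genetic code, restricted to the codons that occur at the three sites.
CODON_TABLE = {
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
    "TCT": "S", "TCA": "S", "AGC": "S",
    "GAC": "D", "GAT": "D",
    "TAC": "Y", "TAT": "Y",
    "CTC": "L", "TTG": "L",
}

# (before, after) codon triplets as listed in the Methods text.
SITES = {
    "Site 1": ("ACT ACC TCA", "ACA ACG AGC"),
    "Site 2": ("TCT ACC TCA", "AGC ACG TCA"),
    "Site 3": ("GAC TAC CTC", "GAT TAT TTG"),
}


def translate(codons: str) -> str:
    """Translate space-separated codons into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in codons.split())


def count_changes(before: str, after: str) -> int:
    """Count nucleotide positions that differ between two codon strings."""
    b = before.replace(" ", "")
    a = after.replace(" ", "")
    return sum(x != y for x, y in zip(b, a))


def check_sites(sites=SITES) -> dict:
    """Report translation, synonymy, and nucleotide changes for each site."""
    report = {}
    for name, (before, after) in sites.items():
        report[name] = {
            "before_aa": translate(before),
            "after_aa": translate(after),
            "synonymous": translate(before) == translate(after),
            "nt_changes": count_changes(before, after),
        }
    return report


if __name__ == "__main__":
    for name, row in check_sites().items():
        print(name, row)
```

Restricting the table to the codons actually used keeps the sketch short; a full translation table from an established library would be the more robust choice for other sequences.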

1. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799.
2. Xie X, et al. (2005) Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434:338–345.
3. Krek A, et al. (2005) Combinatorial microRNA target predictions. Nat Genet 37:495–500.
4. Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120:15–20.
5. Stark A, Brennecke J, Russell RB, Cohen SM (2003) Identification of Drosophila MicroRNA targets. PLoS Biol 1:E60.
6. John B, et al. (2004) Human MicroRNA targets. PLoS Biol 2:e363.
7. Grimson A, et al. (2007) MicroRNA targeting specificity in mammals: Determinants beyond seed pairing. Mol Cell 27:91–105.
8. Humphreys DT, Westman BJ, Martin DI, Preiss T (2005) MicroRNAs control translation initiation by inhibiting eukaryotic initiation factor 4E/cap and poly(A) tail function. Proc Natl Acad Sci USA 102:16961–16966.
9. Gaidatzis D, van Nimwegen E, Hausser J, Zavolan M (2007) Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics 8:69.
10. Saxena S, Jonsson ZO, Dutta A (2003) Small RNAs with imperfect match to endogenous mRNA repress translation. Implications for off-target activity of small inhibitory RNA in mammalian cells. J Biol Chem 278:44312–44319.
11. Stark A, et al. (2007) Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450:219–232.
12. Lytle JR, Yario TA, Steitz JA (2007) Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc Natl Acad Sci USA 104:9667–9672.
13. Itzkovitz S, Alon U (2007) The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res 17:405–412.
14. Brennecke J, Stark A, Russell RB, Cohen SM (2005) Principles of microRNA-target recognition. PLoS Biol 3:e85.
15. Karolchik D, et al. (2003) The UCSC Genome Browser Database. Nucleic Acids Res 31:51–54.
16. Karolchik D, et al. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32:D493–D496.
17. Down T, Leong B, Hubbard TJ (2006) A machine learning strategy to identify candidate binding sites in human protein-coding sequence. BMC Bioinformatics 7:419.
18. Griffiths-Jones S (2004) The microRNA Registry. Nucleic Acids Res 32:D109–D111.
19. Griffiths-Jones S (2006) miRBase: The microRNA sequence database. Methods Mol Biol 342:129–138.
20. Reinhart BJ, et al. (2000) The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403:901–906.
21. Johnson SM, et al. (2005) RAS is regulated by the let-7 microRNA family. Cell 120:635–647.
22. Boyle EI, et al. (2004) GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20:3710–3715.
23. Johnson CD, et al. (2007) The let-7 microRNA represses cell proliferation pathways in human cells. Cancer Res 67:7713–7722.
24. Saetrom P, et al. (2007) Distance constraints between microRNA target sites dictate efficacy and cooperativity. Nucleic Acids Res 35:2333–2342.
25. Bernhart SH, et al. (2006) Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol Biol 1:3.
26. Kiriakidou M, et al. (2004) A combined computational-experimental approach predicts human microRNA targets. Genes Dev 18:1165–1178.
27. Kumar S, Hedges SB (1998) A molecular timescale for vertebrate evolution. Nature 392:917–920.
28. Xie Z, Kasschau KD, Carrington JC (2003) Negative feedback regulation of Dicer-Like1 in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 13:784–789.
29. Duursma AM, Kedde M, Schrier M, le Sage C, Agami R (2008) miR-148 targets human DNMT3b protein coding region. RNA 14:872–877.
30. Tokumaru S, Suzuki M, Yamada H, Nagino M, Takahashi T (2008) let-7 regulates Dicer expression and constitutes a negative feedback loop. Carcinogenesis, 10.1093/carcin/bgn187.
31. Selbach M, et al. (2008) Widespread changes in protein synthesis induced by microRNAs. Nature 455:58–63.
32. Cummings J, et al. (2006) The colorectal microRNAome. PNAS 103:3687–3692.

14884 | www.pnas.org/cgi/doi/10.1073/pnas.0803230105 Forman et al.