
A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence

Joshua J. Forman, Aster Legesse-Miller, and Hilary A. Coller*

Department of Molecular Biology, Princeton University, Princeton, NJ 08544

Edited by Philip P. Green, University of Washington School of Medicine, Seattle, WA, and approved August 12, 2008 (received for review April 2, 2008)

Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions.

computational biology | posttranscriptional regulation | comparative genomics | multiple-sequence alignment | evolutionary conservation

Author contributions: J.J.F. and H.A.C. designed research; J.J.F. and A.L.-M. performed research; J.J.F., A.L.-M., and H.A.C. analyzed data; and J.J.F. and H.A.C. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
*To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0803230105/DCSupplemental.
© 2008 by The National Academy of Sciences of the USA. www.pnas.org/cgi/doi/10.1073/pnas.0803230105. PNAS | September 30, 2008 | vol. 105 | no. 39 | 14879–14884

MicroRNAs (miRNAs) are endogenously encoded, single-stranded regulatory RNAs that bind to and inhibit the translation of transcripts with complementary sequence (1). Computational evidence suggests that miRNAs regulate at least 20% of human genes and have been implicated in the regulation of a wide range of biological systems (2). In plants, miRNA targets can be predicted with relatively high confidence because of the extensive base pairing between plant miRNAs and their target mRNAs (1). In animals, in contrast, miRNAs typically bind to their targets with significantly less complementarity, and the short target sequences are, therefore, difficult to identify on the basis of sequence alone. As a result, most computational approaches to predict miRNA–target interactions rely on conservation of target sites (3–6).

Although early studies reported some evidence for the targeting of miRNAs to sites within protein coding regions (4, 6), subsequent research has reported that there is minimal functionality for sites in ORFs or 5′ UTRs (7). A focus on miRNAs present within 3′ UTRs is supported by evidence suggesting that the G-cap/poly(A) tail interface (which connects the two ends of eukaryotic mRNAs during translation) is important for miRNA function (8) and that miRNAs tend to be more effective when localized at the end of the 3′ UTR rather than the middle (7, 9). Indeed, the protein translation machinery might be expected to displace an miRNA complex present within a gene's coding sequence. However, exogenously added siRNAs that target coding sequences, including siRNAs with imperfect base pairing, are effective at silencing (10). More recent reports have also shown that, contrary to other highly conserved coding region motifs, miRNA target sites are conserved in all three reading frames among Drosophila genomes (11), and that target sites introduced into the 5′ UTR of transcripts can repress translation (12). Further, the potential importance of sites embedded within the coding sequence of genes is supported by the nature of the genetic code itself, which has been shown to be nearly optimal for conveying information in parallel to the amino acid sequence (13). One recent report has shown downregulation via a coding region target directly (29).

We sought to investigate whether miRNA target sites within coding regions are functional, beginning with an unbiased screen for coding region sequence motifs that are more highly evolutionarily conserved than is required for amino acid conservation. We found that the motifs most highly conserved in coding regions are indeed miRNA target sites, and we experimentally confirmed that the endonuclease Dicer is targeted by the miRNA let-7 by means of sites within its coding region. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism.

Results and Discussion

We developed a computational algorithm to find conserved sequence motifs within coding regions (http://www.sitesifter.org). Our approach is based on the assumption that DNA sequences with a regulatory function should be evolutionarily conserved at the nucleotide sequence level over and above any conservation required to maintain the amino acid sequence of the encoded proteins. Most computational approaches for finding miRNA target sites put special emphasis on the miRNA target "seed" (corresponding to positions 2–7 of the miRNA) because complementarity at these base pairs has been shown to play an important role in target recognition (4, 14). In keeping with this approach, we chose to search for conserved motifs 8 bp in length, because we have found that this length increases specificity for miRNAs. Because coding sequences are under constraint by the canonical coding for amino acids, a larger number of genomes must be used when surveying conservation in coding regions compared to 3′ UTRs. We took advantage of a multiple-sequence alignment of 17 genomes (human, chimpanzee, macaque, mouse, rat, rabbit, dog, cow, armadillo, elephant, tenrec, opossum, chicken, frog, zebrafish, green spotted puffer, and fugu) obtained from the University of California Santa Cruz (UCSC) Genome Browser (15), used in conjunction with a list of coding region boundaries extracted from the UCSC Table Browser (16).

Fig. 1. Motifs with a high SLCS are enriched for miRNA target sequences. The distribution of non-zero SLCSs (black bars) from a 17-genome alignment and the distribution that resulted from an analysis of a randomly permuted genome are plotted. The distribution demonstrates that there are significantly more high-scoring motifs within coding regions compared with the distribution of scores from a randomized alignment (white bars). The highest-scoring sequence motifs (Inset) are enriched for miRNA target sequences. Results are filtered so that conserved motifs are removed if they are within 3 bp of a higher-scoring motif.

Our algorithm first parses coding regions from the whole-genome multiple alignment. The fraction of coding region nucleotides for which the sequences of all 17 species are aligned (23.4%) is then searched for perfectly conserved sequences 8 bp in length. Conserved motifs are assigned a Sequence Level Conservation Score (SLCS) representing the degree to which the motif is conserved at the sequence level over and above the amino acid level. The SLCS is based on an empirical measure of the probability that a given codon is sequence-conserved, given that it is amino acid-conserved [see supporting information (SI) Table S1 for the full list of codons and their respective sequence level conservation probabilities]. For a given motif, the SLCS represents the logarithm of the probability of sequence conservation across the motif's constituent codons, taking into account codons that partially overlap the motif. If a motif is conserved multiple times throughout the genome, the score is summed over every conserved occurrence (see Materials and Methods).

Results were obtained from the multiple alignment as well as a randomly permuted alignment in which the human genome sequence was left unchanged, but the choice of which codons were fully conserved at the sequence level in other species was assigned randomly. This randomization procedure alters the DNA sequence of genes in the multiple alignment while maintaining their amino acid sequences.

…the splicing factor TRA2. We expected the TRA2 motif to be recognized by our algorithm, because it confers a regulatory signal by means of DNA sequence and was also identified by a previous report on coding region binding sites (17). For both datasets, discovered motifs were matched against known human miRNAs obtained from miRBase (18, 19). Among the top 15 scoring motifs in the real dataset are four miRNA target sites (for let-7, miR-9, miR-125a, and miR-153), which had SLCSs of 183.6, 153.2, 132.4, and 56.6, respectively, representing significant enrichment compared with the total set of scored motifs from the permuted dataset (P < 10^-6, one-tailed hypergeometric test) (Fig. 1 Inset). Other motifs include highly conserved sequence elements of unknown function.

To investigate whether high-scoring sequence motifs matching miRNA target sites are indeed responsive to their associated miRNA, we performed a functional assay on the highest-scoring motif, which corresponds to the let-7 target site. The let-7 miRNA is highly conserved and was originally demonstrated to regulate developmental timing in the roundworm Caenorhabditis elegans (20). More recently, let-7 has been found to play a role in cell cycle regulation and cancer in humans (21). We transfected cultured human fibroblasts with a let-7 precursor or a control RNA and used microarrays to monitor changes in