Genomics of Microrna
Total Page:16
File Type:pdf, Size:1020Kb
Review TRENDS in Genetics Vol.22 No.3 March 2006 Genomics of microRNA V. Narry Kim1 and Jin-Wu Nam2 1Department of Biological Sciences and Research Center for Functional Cellulomics, Seoul National University, Seoul, 151-742, Korea 2Graduate Program in Bioinformatics, Seoul National University, Seoul, 151-742, Korea Discovered just over a decade ago, microRNA (miRNA) is approaches have been used to identify miRNA genes. now recognized as one of the major regulatory gene The first approach is through forward genetics, as is the families in eukaryotic cells. Hundreds of miRNAs have case for the first two miRNAs. In Drosophila, the gene been found in animals, plants and viruses, and there are responsible for the bantam phenotype displaying smaller certainly more to come. Through specific base-pairing body size turned out to be a miRNA gene [27]. Also in C. with mRNAs, these tiny w22-nt RNAs induce mRNA elegans, lsy-6 mutants with defects in neuronal left–right degradation or translational repression, or both. asymmetry had mutations in an miRNA gene [7]. Because Because a miRNA can target numerous mRNAs, often genetic studies are based on clear phenotypes, the in vivo in combination with other miRNAs, miRNAs operate functions of genetically identified RNAs are highly complex regulatory networks. In this article, we well established. summarize the current status of miRNA gene mining A breakthrough in large-scale miRNA identification and miRNA expression profiling. We also review up-to- was made when directional cloning was used to construct date knowledge of miRNA gene structure and the a cDNA library for endogenous small RNAs [28,29]. biogenesis mechanism. Our focus is on animal miRNAs. Briefly, total RNA is separated in denaturing polyacryl- amide gel from which small RNA of 19–25 nt is gel- Introduction purified. The size-fractionated RNA is then ligated to 50 Since the discovery of RNA interference (RNAi), efforts to and 30 adaptor molecules, and amplified by RT–PCR to identify endogenous small RNAs have led to the discovery construct the cDNA library for sequencing individual of hundreds of miRNAs in nematodes, fruit flies and clones. Usually, more than half of the cloned sequences are humans [1–21]. More than 500 different miRNAs have from fragments of larger RNAs such as tRNA or rRNA. been identified in animals and plants, where the number According to current convention, miRNA is defined as a of miRNA genes is expected to increase to 500–1000 per single-stranded RNA, w22 nt in length, that is generated species, which would comprise w2–3% of protein-coding by the RNase III-type enzymes from a local hairpin genes [22]. At the time of writing, the miRNA database structure embedded in an endogenous transcript [30]. (http://www.sanger.ac.uk/Software/Rfam/mirna/)con- miRNAs differ from small interfering RNAs (siRNAs) in tained 114 Caenorhabditis elegans miRNAs, 78 Droso- that siRNAs are processed from long double-stranded phila melanogaster miRNAs, 369 Danio rerio miRNAs, RNAs (dsRNAs) [31]. So, when a small RNA is discovered 122 Gallus gallus (chicken) miRNAs, 326 Homo sapiens by cDNA cloning, it must meet the following criteria to be miRNAs and 117 Arabidopsis thaliana miRNAs [23]. classified as a miRNA (for more detailed information see Nearly all miRNAs are conserved in closely related species Ref. [32]). First, the small RNA sequence should be and many have homologs in distant species. At least a present in one arm of the hairpin structure, which lacks third of C. elegans miRNAs have homologs in humans [24], large internal loops or bulges. The hairpins are usually suggesting that their functions could also be conserved w80 nt in animals, but the lengths are more variable in throughout the evolution of animal lineages. plants. The small RNA sequences should be phylogeneti- Large DNA viruses have also been found to carry cally conserved in closely related species. The sequence miRNA genes: five in Epstein–Barr virus, 12 in Kaposi conservation is usually lower in the loop region than in the sarcoma-associated herpesvirus, nine in mouse g-herpes- mature miRNA segment. Second, its expression should be virus 68, and nine in human cytomegalovirus. confirmed by hybridization to a size-fractionated RNA In this review, we summarize the recent advances in sample, typically by northern blotting. The blot normally miRNA gene identification, biogenesis and expression shows both the mature form (a w22-nt band) and the profiling, with a focus on mammalian miRNAs. hairpin precursor (a w70-nt band). When the expression is too low for detection, if the putative precursor of the Identification of microRNA genes cloned small RNA has a conserved hairpin structure it can Considerable effort has been devoted to the identification be annotated as a miRNA. Close homologs in other species of miRNA genes since the discoveries of numerous can be annotated as miRNA orthologs without experimen- miRNAs in 2001 following the earlier genetic identifi- tal validation, provided that they fulfill the conserved cation of lin-4 and let-7 [1,25,26].Threetypesof precursor criterion. Small RNAs that do not meet these Corresponding author: Kim, V.N. ([email protected]). requirements can be classified as either siRNAs or other Available online 30 January 2006 provisional classes [31]. Hundreds of miRNAs have been www.sciencedirect.com 0168-9525/$ - see front matter Q 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2006.01.003 166 Review TRENDS in Genetics Vol.22 No.3 March 2006 cloned from various cell lines, diverse tissues of mouse, genomics analysis, Xie et al. revealed 106 conserved motifs fly and zebrafish, and a range of developmental in the 30 untranslated region (UTR) of mRNA [19], many of stages in mouse, fly, worm, frog and zebrafish which (72 motifs) turned out to be complementary to the 50 [2,5,6,10,14,20,27,29]. A limitation of this approach, ends of nearly half of the known miRNAs and were however, is that miRNA expressed at a low level or only capable of forming 6–8 bp seed duplexes [19]. Encouraged in a specific condition or specific cell types would be by this result, Xie et al. used the entire conserved motifs to difficult to find. predict miRNA genes and obtained 129 novel miRNA This problem can be overcome, at least in part, by candidates. When 12 candidates were tested for bioinformatic approaches and examining genomic expression with an RNA sample pooled from just ten sequences. Computational identification is based largely different tissues, six candidates were validated for on the phylogenetic conservation and the structural expression. This indicated that many of the 129 predicted characteristics of miRNA precursors. Simple homology candidates are likely to be authentic miRNA genes. searches can reveal homologs of known miRNAs. Also Another recent prediction method took advantage of useful is to look for stem-loops in the vicinity of known conservation of known miRNA genes [18]. The stems of miRNA genes because more than half of known miRNA miRNA hairpins are highly conserved whereas the loop genes are present as tandem arrays within operon-like region is poorly conserved. The degree of conservation clusters [5,26,33]. More extensive gene mining at a rapidly declines for sequences immediately flanking the genomic level typically begins with identifying conserved miRNA hairpins. Using this characteristic profile, 976 genomic segments in the intergenic area that potentially candidate miRNAs were predicted and 16 out of 69 fold into a stem-loop structure. The first miRNA search representative candidates (23%) were confirmed for their algorithm was MiRscan, which successfully predicted expression by northern blotting. miRNA genes that display close homology in two More recently, Nam et al. introduced a probabilistic co- nematode worms, C. elegans and C. briggsae [24]. learning model (ProMiR), a variant of the pair hidden MiRscan was further improved by defining conserved Markov model, to screen statistically both strongly and sequence motifs found in the vicinity of nematode miRNA weakly conserved stem-loops in the human genome [35]. genes [34]. MiRscan is available on the web (http://genes. ProMiR successfully detects miRNA genes with no clear mit.edu/mirscan/). Another sensitive computational scor- sequence similarity to known miRNA genes. Nine out of 23 ing tool is miRseeker, which has been applied to screen representative candidate genes (39%) were validated by a conserved stem-loops in insect genomes [8]. RT-PCR-based method for expression in HeLa cells. Further improvements have since been introduced, An intensive integrative approach was taken by enabling impressive expansion of the list of miRNA genes Bentwich et al., who combined bioinformatics predictions in diverse species (Table 1). One of the particularly useful with microarray analysis and sequence-directed cloning features used in recent studies was the strong conserva- [21]. Hairpins from the entire human genome were first tion of the nucleotides in the 50 region of mature miRNA. screened. After excluding the hairpins that are located in Seven nucleotides at 2–7 positions (relative to the 50 end of repetitive elements and protein-coding regions, high- miRNA), known as ‘seed’ sequences, are crucial in binding throughput validation was performed for w5300 selected to the target mRNA. Through systematic comparative hairpins by microarray analysis on five different human Table 1. miRNA gene prediction algorithms Prediction target Methods Detection of non- Name of program Species Refs conserved miRNA Pre-miRNA