Ingredients for in Silico Mirna Identification and Annotation Christine Gaspin, Olivier Rué, Matthias Zytnicki
Total Page:16
File Type:pdf, Size:1020Kb
Ingredients for in silico miRNA identification and annotation Christine Gaspin, Olivier Rué, Matthias Zytnicki To cite this version: Christine Gaspin, Olivier Rué, Matthias Zytnicki. Ingredients for in silico miRNA identification and annotation. JSM Biotechnology and Biomedical Engineering, 2016, 3 (5), pp.1-5. hal-01607400 HAL Id: hal-01607400 https://hal.archives-ouvertes.fr/hal-01607400 Submitted on 27 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Central JSM Biotechnology & Biomedical Engineering Bringing Excellence in Open Access Mini Review *Corresponding author Christine Gaspin INRA, MIAT & PF GenoToul Bioinfo, Université de Toulouse, Chemin de Borde Rouge, Ingredients for In silico miRNA BP52 627, 31326 Castanet Tolosan cedex, France, Tel : 33561285282; Fax: 33561285335; Email: Identification and Annotation Submitted: 04 November 2016 Christine Gaspin1,2*, Olivier Rué1,2, and Matthias Zytnicki1 Accepted: 08 December 2016 Published: 09 December 2016 1MIAT, Université de Toulouse, France 2Plate-forme Genotoul Bioinfo, Université de Toulouse, France ISSN: 2333-7117 Copyright Abstract © 2016 Gaspin et al. Annotation of microRNAs (miRNAs) is a prerequisite to the study of their functional OPEN ACCESS analysis at the genome scale. The first list of criteria published for miRNA annotation was established in 2003 by the group of Thomas Tuschl and since then, a large number Keywords of bioinformatics resources relying on these criteria have been produced. Recently, • miRNA the criteria and nomenclature considered for miRNA annotation were questioned by • Nomenclature several groups with regard to available resources (databases, miRNA prediction • Isoform software) but also considering all accumulated knowledge during the ten past years. • Identification In this paper, we revisit criteria for miRNA annotation and their importance to design • Alignment relevant bioinformatics resources able to identify accurately members of known miRNA families as well as members of putative new families. ABBREVIATIONS miRNA: microRNAs; sRNA: small RNA of such characteristics and the importance of thein silicorole of miRNA INTRODUCTION in essential processes have boosted the development bona fide miRNAs,of many resources, including specific databases but also methods, and tools dedicated to the identification of Identification and annotation of miRNAs are the first steps whose accuracy has increased by integrating recently discovered Caenorhabditisof the study of elegansmiRNA functional analysis at the genome scale. characteristics. Indeed, more than fifty organism-specific or Since the discovery of the lin-4 and let-7 miRNA genes in multi-organism miRNA databases and as many miRNA prediction [1,2], the number of discovered miRNAs software are now available to the scientific community [6]. exponentially increased in databases. The first uniform system Recently, the criteria considered for miRNA identification and dedicated to miRNA identification and annotation was proposed bona fide miRNAs annotation were questioned by several groups [7-10] with regard in 2003 [3], and the authors used a combination of expression to available resources (databases, miRNA prediction software) and biogenesis criteria to distinguish between but also considering all accumulated knowledge during the ten and other classes of small RNAs. The first repository dedicated to past years. They argue for a review of nomenclature guidelines miRNAs was miRBase [4], and it is still the most widely used by and are developing alternative resources that meet their needs. far. As the primary repository for miRNA sequence annotation, As a result, scientists who are in charge of developing tools for the initial goals of miRBase were i) to assign unique names to miRNA prediction, annotation and functional analysis have to distinct miRNAs prior to publication of their discovery in order (re)-consider regularly the continuously evolving characteristics to maintain consistent gene nomenclature and ii) to provide a and resources that can be used to identify and annotate properly comprehensive and searchable database of all published mature miRNA genes [11]. In this paper, we revisit criteria for miRNA miRNAs and related pre-miRNA hairpin sequences. As a result annotation and we discuss their importance to design relevant of bioinformatics screening and the increasing number of small C. elegans, Caenorhabditis bioinformatics resources able to identify and annotate accurately RNA sequencing efforts, the database has grown exponentially briggsae, D. melanogaster Arabidopsis members of known miRNA families as well as members of from 506 entries covering 5 species ( thaliana putativemiRNA newcharacteristics families. , human, mouse and ) in the first publication [4] to 28.645 entries in the most recent release 21 [5]. This accumulation of miRNA sequences in The essence of a miRNA identification and annotation tool miRBase but also the continued evolution of this repository to is to learn the characteristics of the miRNAs, and to use this meet the needs of the scientific community helped to agree on the as a bona fide knowledge to provide high quality results. In this section, we characteristics that must be met for a sequence to be considered will show what are the distinctive features that are specific to miRNA. These include miRNA biogenesis and miRNAs, and how these features are used by bioinformatics tools. expression characteristics, but also a strong conservation of the mature miRNA sequence in related species. The availability There are many types of small RNAs present in every Cite this article: Gaspin C, Rué O, Zytnicki M (2016) Ingredients for In silico miRNA Identification and Annotation. JSM Biotechnol Bioeng 3(5): 1071. Gaspin et al. (2016) Email: Central Bringing Excellence in Open Access eukaryotic cell, and several of them have common features with have been defined: for instance, the “seed region” of the miRNA the miRNA class. The transcripts can be produced anywhere (corresponding to nucleotides 2-8, but this region seems to vary in the genome (intergenic, introns, UTRs, exons, transposable from species to species) should match perfectly [14]. The seed is elements…) so identifying miRNAs is a very challenging task. In thought to be the region where the miRNA and the target start order to provide good predictions, current tools use a combination hybridizing, and contains, in principle, the core function of the of different aspects related to miRNAs: knowledge from the miRNA. miRNA biogenesis, knowledge from other species or validated This homology criteria also contributed to the definition miRNAs,Biogenesis and expressionrelated characteristics data (usually, sRNA-seq datasets). of miRNA families, akin to gene families. A first definition of a miRNA family is that all members of a given family should share a common ancestor. Very little work has been done to reconstruct Primary miRNAs (pri-miRNA) are mostly transcribed by the synteny of miRNAs in the tree of life [15,16], so we have to RNA polymerase II. The pri-miRNA is folded into one or more resort to guessing that nearly identical sequences have a common long hairpins, with possibly several conserved patterns in the ancestor. In practice, similarity of the seed region is often used, loop and the sequence that are at the extremities of the hairpin and a family may be defined by the set of miRNAs with a common [7]. However, the pri-miRNA seems rapidly decayed, and RNA- seed region. However, given the size of the seed region (~7 bp), Seq studies usually cannot detect evidences of pri-miRNAs [12]. the similarity may be fortuitous. Alternatively, a miRNA family This pri-miRNA is then cleaved by Drosha (in animals) or DCL (in can also be defined as the set of the miRNAs with a common plants), into one or more individual pre-miRNAs. Each pre-miRNA function. In practice, both definitions can be handled identically, adopts a stem-loop secondary structure with a constrained size if the hypothesis is that the function is encoded in the mature that differs between plants and animal, but is always present. The miRNA, or in its seed region. The only interesting difference is pre-miRNA is cleaved by a Dicer protein, leaving at determined that this definition may also group together two miRNAs that are positions three products including the two mature miRNAs nearly identical due to random or convergent evolution. Studies and the loop. The two mature miRNAs form a duplex with 2 bp on miRNA evolution also consider miRNA families according overhangs at their 3’ ends, while the loop is precisely positioned to global pre-miRNA sequence similarity [17,18] which is in between the mature miRNAs. During this process, sequence accordance with miRBase family organization (Figure 1a,1b). heterogeneity in size and content of the mature sequence may arise from imprecise cleavage and editing mechanisms. Resulting So far, there is no widely accepted definition of miRNA