RNA Ontology Consortium January 8-9, 2011
Total Page:16
File Type:pdf, Size:1020Kb
UC San Diego UC San Diego Previously Published Works Title Meeting report of the RNA Ontology Consortium January 8-9, 2011. Permalink https://escholarship.org/uc/item/0kw6h252 Journal Standards in genomic sciences, 4(2) ISSN 1944-3277 Authors Birmingham, Amanda Clemente, Jose C Desai, Narayan et al. Publication Date 2011-04-01 DOI 10.4056/sigs.1724282 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Standards in Genomic Sciences (2011) 4:252-256 DOI:10.4056/sigs.1724282 Meeting report of the RNA Ontology Consortium January 8-9, 2011 Amanda Birmingham1, Jose C. Clemente2, Narayan Desai3, Jack Gilbert3,4, Antonio Gonzalez2, Nikos Kyrpides5, Folker Meyer3,6, Eric Nawrocki7, Peter Sterk8, Jesse Stombaugh2, Zasha Weinberg9,10, Doug Wendel2, Neocles B. Leontis11, Craig Zirbel12, Rob Knight2,13, Alain Laederach14 1 Thermo Fisher Scientific, Lafayette, CO, USA 2 Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO, USA 3 Argonne National Laboratory, Argonne, IL, USA 4 Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA 5 DOE Joint Genome Institute, Walnut Creek, CA, USA 6 Computation Institute, University of Chicago, Chicago, IL, USA 7 Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA 8 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK 9 Department of Molecular, Cellular and Developmental, Yale University, New Haven, CT, USA 10 Howard Hughes Medical Institute, Yale University, New Haven, CT, USA 11 Department of Chemistry, Bowling Green State University, Bowling Green, OH, USA 12 Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH, USA 13 Howard Hughes Medical Institute, Boulder, CO, USA 14 Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA Corresponding Author: Amanda Birmingham This report summarizes the proceedings of the structure mapping working group meeting of the RNA Ontology Consortium (ROC), held in Kona, Hawaii on January 8-9, 2011. The ROC hosted this workshop to facilitate collaborations among those researchers formalizing con- cepts in RNA, those developing RNA-related software, and those performing genome annota- tion and standardization. The workshop included three software presentations, extended round-table discussions, and the constitution of two new working groups, the first to address the need for better software integration and the second to discuss standardization and ben- chmarking of existing RNA annotation pipelines. These working groups have subsequently pursued concrete implementation of actions suggested during the discussion. Further infor- mation about the ROC and its activities can be found at http://roc.bgsu.edu/. Introduction The RNA Ontology Consortium is an international published an RNA Ontology (RNAO) [1] and an coalition of RNA researchers working to develop RNA Structure Alignment Ontology [2] for use by controlled vocabularies pertaining to RNA func- the wider research community, and where appli- tion and based on RNA primary sequence, second- cable has integrated these resources with other ary structure, and tertiary structure. Launched in relevant ontologies such as the Sequence Ontology 2005 with funding from the National Science (SO) [3], Chemical Entities of Biological Interest Foundation, the ROC includes nine working (CheBI) [4], and the Ontology for Biomedical In- groups addressing the ontology needs of RNA do- vestigations (OBI) [5]. mains ranging from backbone conformation to multiple sequence alignment. The Consortium has The Genomic Standards Consortium RNA Ontology Consortium January 8-9, 2011 This workshop furthered the goals of the consor- back of the current implementation of Infernal is tium by making progress on several of the RNA that it is computationally intensive, but accelera- structure mapping concepts elaborated upon dur- tion remains a major goal of the project. ing previous ROC meetings in Kona and in Stras- Jesse Stombaugh presented a live demonstration of bourg. In addition, it brought together members of Boulder ALE (ALignment Editor) [8], a tool for the ROC with members of the Genomic Standards manual examination and adjustment of multiple Consortium (GSC) [6], an open-membership or- sequence alignments. Such manual processing is ganization working towards 1) implementing new often critical to developing gold-standard align- genomic standards, 2) capturing and exchanging ments that can be used as test cases for automated information expressed in these standards and 3) alignment tools. Among other features, Boulder harmonizing information collection and analysis ALE allows users to examine isostericity of base- efforts across the wider genomics community. The pairs in the alignment relative to a reference sec- aim of developing links with the GSC was to in- ondary or 3D structure, to show or hide entire se- form developers and curators of genomic and me- quence features, to rearrange sequences within tagenomic software and resources of the stan- the alignment, and to shift nucleotides (e.g. by in- dards for representing RNA secondary and 3D sertion/deletion of gaps in aligned sequences) and structures and alignments developed by ROC and view the effect of these additions on alignment to encourage collaboration to improve genomic quality. The program currently takes inputs in annotations. FASTA format, but will soon be modified to take The overall structure of the meeting consisted of a Stockholm format [9] inputs for increased intero- working dinner, software presentations, a round- perability with other alignment software. table discussion on annotation challenges, and as- sembly of working groups to address these chal- Zasha Weinberg introduced R2R, software new lenges. The major annotation issues that were application to aid in drawing publication-quality identified included: 1) How can information that is figures for single and consensus RNA structures useful for drawing/representing RNA structures [10]. The goal of this software is to create struc- be exchanged among different software programs ture drawing with both aesthetic layout and em- and incorporated into different genomic and me- bedded annotation while minimizing the require- tagenomic web sites for display to end users? 2) ment for manual work in a graphics program such How can information about covariance models as Adobe Illustrator. One of the notable features automatically be used to infer correspondence of R2R is its ability to gracefully depict informa- groups? 3) How can information about structure tion on variability of consensus structures, such as mapping experiments be captured to facilitate au- hairpins of variable lengths or junctions that join tomatic secondary structure inference and propa- variable numbers of stems. Currently the software gated and displayed in a way that is useful to end is implemented as a command-line tool that takes users? input in Stockholm format and produces PDF or SVG images, as well as intermediate files defining Software Presentations the annotation and layout features. Eric Nawrocki provided an introduction to struc- tural RNA homology search using Infernal (INFE- Day One Discussion Rence of RNA ALignment) [7]. He noted that After the three software presentations, Day One searching DNA sequence for noncoding RNA proceeded with an extensive round-table discus- (ncRNA) molecules is more difficult than search- sion of the role of the RNAO in annotation pipe- ing for proteins, because ncRNA molecules are lines for RNA discovery. While protein (gene) an- generally shorter than protein molecules and their notation is a relatively mature field, RNA annota- alphabet is smaller, so conserved secondary struc- tion is much less mature. Given this fact, there is ture of RNAs provides useful additional signal. In- an opportunity to implement ontological structure fernal scores the combination of consensus se- and support from the very beginning of the stan- quence and consensus structure using covariance dardization of the field. The discussion focused on models; this technique produces results that are identifying ontological needs for end users who more sensitive than either BLAST or basic hidden- create RNA annotations or integrate them into Markov model-based approaches. A major draw- larger genome resources. 253 Standards in Genomic Sciences Birmingham et al. In such a scenario, one key use for ontologies alignments, secondary and tertiary structure an- would be to identify the provenance of RNA se- notations, and molecular functions. One key idea quences incorporated into annotation pipelines and was that information about visualization is itself describe the techniques used to generate these se- an annotation. It was agreed that, WYSIWIG sup- quences. It was noted that researchers performing port might be added in the future, for now it RNA annotation on sequences generated for pro- would be easier to provide presentation informa- tein-level work (i.e. genome and metagenome se- tion through a set of linked tools using a common quences) may be unaware of quality issues asso- file format. RNAO annotations will be referenced ciated with these sequences. For example, some according to their unique identifiers in the RNAO sequencing facilities pre-filter genomic and meta- and, where appropriate, with citations to papers genomic sequences to remove low-quality se- describing them. quences before distributing the sequences