Trends and Challenges in Computational RNA Biology Alina Selega and Guido Sanguinetti*
Total Page:16
File Type:pdf, Size:1020Kb
Selega and Sanguinetti Genome Biology (2016) 17:253 DOI 10.1186/s13059-016-1117-7 MEETINGREPORT Open Access Trends and challenges in computational RNA biology Alina Selega and Guido Sanguinetti* Abstract complemented by two lively poster sessions, where partic- ipants had an opportunity to engage with over 40 posters A report on the Wellcome Trust Conference on during evening drinks receptions. Computational RNA Biology, held in Hinxton, UK, on In this report, we briefly recount the content of the 17–19 October 2016. conference by providing condensed, headline-style sum- Keywords: RNA, Review, Computational biology maries of the research described in the talks and some posters. Within the scope of this brief report, we cannot possibly do justice to the wealth and breadth of material Introduction presented and we will not be able to mention much inter- Recent years have witnessed a profound shift in our esting research, particularly within the poster sessions. understanding of RNA biology. Several novel biochemical We would like to stress that omissions in this report are and sequencing techniques are producing vast amounts not based on quality, but simply on a personal judgement of data that fundamentally challenge the textbook view as to what material could be most coherently presented in of RNA as a simple intermediate step of gene expression, a very limited space. revealing a wealth of unexpected new roles and shed- ding light on the complexity of the RNA world. While Transcripts the emerging picture unequivocally points to the cen- Perhaps the most remarkable discovery in modern RNA trality of RNA as a mediator of most cellular functions, biology is the realization of the diversity of the transcrip- the richness and heterogeneity of modern datasets pose tome. Technologies based on next-generation sequencing significant interpretative challenges and call for an inter- (NGS) have demonstrated the existence of many novel disciplinary approach where statistical and computational classes of transcripts and the great variety of protein- methods will play an increasingly important role. coding transcripts, in terms of both isoforms and syn- The Wellcome Trust Conference on Computational onymous variants. The diversity of the transcriptome and RNA Biology provided a good opportunity to overview its interaction with phenotypes was the main theme of the state of the art in this up-and-coming interdisci- both keynote talks. Ben Blencowe (University of Toronto) plinary field. Organised by the scientific committee of introduced the concept of alternative splicing regulatory Alex Bateman (European Bioinformatics Institute, UK), networks and their role in development and autistic spec- Ivo Hofacker (University of Vienna, Austria), Karissa trum disorders. Blencowe illustrated how analysis of NGS Sanbonmatsu (Los Alamos National Labs, USA) and data has enabled the discovery of a novel class of micro- Mihaela Zavolan (University of Basel, Switzerland), the exons (3–27 nucleotides) that are strongly conserved conference was held at the Wellcome Genome Campus in and whose alternative exclusion is associated with the Hinxton, near Cambridge (UK) on 17–19 October 2016. autism phenotype. Isoform quantification methods were Featuring two keynote talks by Christine Mayr (Memo- discussed by Eduardo Eyras (University of Pompeu Fabra, rial Sloan Kettering Cancer Center, New York, USA) and Barcelona, Spain), who explained how the SUPPA method Ben Blencowe (University of Toronto, Canada), thirteen achieves high computational performance by decoupling invited talks and fourteen short contributed talks, the read mapping from transcript annotation. Methodolo- conference provided a very broad survey of quantita- gies for isoform quantification from time series RNA- tive and computational RNA biology. These were further seq data using the DICEseq method were also presented in the poster session by Yuanhua Huang (University of *Correspondence: [email protected] School of Informatics, University of Edinburgh, Edinburgh, UK Edinburgh, UK). Naturally, the presence of isoform RNA molecules does not immediately imply isoform expression © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Selega and Sanguinetti Genome Biology (2016) 17:253 Page 2 of 4 at the protein level, as translational regulation may prefer- Structures entially select only a subset of isoforms. This question was RNAs in vivo fold in complex secondary and 3D struc- addressed by Lorenzo Calviello (Max Delbrück Center for tures. It is widely believed that RNA structures play a Molecular Medicine, Berlin, Germany), who used ribo- major regulatory role in determining the possible inter- some profiling data and the Splice-aware Translational action partners of RNAs, and ultimately, their function. Annotation (SaTAnn) tool. This analysis revealed that The computational biology community has long had a almost 55% of genes (in human HEK293 cells) translate a sustained interest in predicting RNA structures and the single isoform, and highlighted widespread translational conference witnessed several interesting presentations on control. SaTAnn also received the Best Acronym Award, the matter. beating stiff competition from CRAC and BUM-HMM While in principle feasible configurations could be (see below). computed by minimizing free energies derived from While alternative splicing has long been recognized as microscopic physical principles, the computation is in a major determinant of the diversity of the transcrip- general prohibitively complex. Simon Poblete (Interna- tome, recent research is also shedding light on the func- tional School for Advanced Studies, Trieste, Italy) pre- tional significance of synonymous variants, i.e. transcripts sented a novel approach to coarse-graining the state space that differ only in the non-coding region. Christine Mayr of possible configurations, leading to considerable accel- (Memorial Sloan Kettering Cancer Center, New York) erations in molecular dynamics simulations. Other talks described how transcript variants with different 3 UTRs described approaches that instead use auxiliary data to can give rise to dramatically different functions in the pro- bypass the difficult step of molecular simulations. Craig tein they code for. A prominent example is given by the Zirbel (Bowling Green State University, USA) described CD47 transcript in human: variants with a long 3 UTR JAR3D, a set of probabilistic models parametrized on are preferentially bound by the HuR protein (due to the the RNA 3D Motif Atlas, that infer new 3D motifs abundance of HuR binding sites on the long UTR), which from sequences. Debora Marks (Harvard Medical School, then leads to membrane localization of the nascent pro- USA) described how evolutionary couplings can be used tein, while CD47 proteins synthesized from a short 3 UTR within global probability models to improve the predic- variant remain in a perinuclear localization. Shorter tran- tive power of optimisation algorithms. Evolutionary argu- script variants can also arise from alternative use of polyA ments can also be invoked to exploit pairwise covariations sites, the presentation topic of Christina Leslie (also from in multiple RNA alignments to deduce the conserva- Memorial Sloan Kettering Cancer Center, New York), tion of RNA secondary structures. This line of reasoning although in this case the shorter transcript mostly results wasusedbyElenaRivas(HarvardUniversity,USA)to in a truncated protein or in a non-coding RNA (ncRNA). argue against the conservation of secondary structures The discovery of a great variety of novel ncRNAs was in long ncRNAs, stirring a certain level of debate within also one of the major breakthroughs of NGS technolo- the conference. Mutation patterns underlying structure gies; ncRNAs remain, however, largely mysterious in their conservation were also employed by Zasha Weinberg biological function. Albin Sandelin (Biotech Research (University of Leipzig) to discover a new group of & Innovation Centre, Copenhagen, Denmark) described riboswitches (metabolite-binding RNAs) and by Martin data from cap analysis of gene expression (CAGE) experi- Smith (Garvan Institute, Sydney, Australia) to cluster evo- ments illustrating the pervasiveness of bidirectional tran- lutionarily conserved RNA structural patterns. scription, often giving rise to mRNA–ncRNA pairs. He A major source of excitement within the RNA structure further explained how genomic features such as density community is the development of novel sequencing-based of polyA sites or closely spaced transcription start sites techniques for structure probing in vivo. High-throughput influence ncRNA expression. Igor Ulitsky (Weizmann experiments using a variety of probing agents are being Institute, Rehovot, Israel) used synteny to elucidate the performed at an increasing pace and Yiliang Ding (John function and origin of lincRNAs (long intergenic ncR- Innes Centre, Norwich, UK) described FoldAtlas, a NAs), highlighting