10444–10455 Nucleic Acids Research, 2015, Vol. 43, No. 21 Published online 29 September 2015 doi: 10.1093/nar/gkv932 Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction Eleonora De Leonardis1,2,3,†, Benjamin Lutz4,5,†, Sebastian Ratz4,5, Simona Cocco3, Remi´ Monasson6, Alexander Schug4,* and Martin Weigt1,2,*
1Computational and Quantitative Biology, Sorbonne Universites,´ Universite´ Pierre et Marie Curie, UMR 7238, 75006 Paris, France, 2Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France, 3Laboratoire de Physique Statistique de l’Ecole Normale Superieure,´ associe´ au CNRS et a` l’Universite´ Pierre et Marie Curie, 75005 Downloaded from Paris, France, 4Steinbuch Centre for Computing, Karlsruher Institut fur¨ Technologie, 76133 Karlsruhe, Germany, 5Fakultat¨ fur¨ Physik, Karlsruher Institut fur¨ Technologie, 76133 Karlsruhe, Germany and 6Laboratoire de Physique Theorique´ de l’Ecole Normale Superieure,´ associe´ au CNRS et a` l’Universite´ Pierre et Marie Curie, 75005 Paris, France http://nar.oxfordjournals.org/ Received June 02, 2015; Revised August 28, 2015; Accepted September 07, 2015
ABSTRACT complexes, in other cases function is directly tied to three- dimensional (3D) RNA structure (6). Gaining such struc- Despite the biological importance of non-coding tural knowledge is important for understanding function. RNA, their structural characterization remains chal- Experimental determination of RNA structure, however, lenging. Making use of the rapidly growing sequence remains challenging. Less than 6% of all structures in the databases, we analyze nucleotide coevolution across RCSB Protein Data Bank (7) contain RNA. Thanks to ad- at Karlsruhe Institute of Technology on February 10, 2016 homologous sequences via Direct-Coupling Analy- vances in sequencing technology, many RNAs have been se- sis to detect nucleotide-nucleotide contacts. For a quenced in different organisms and classified into homolo- representative set of riboswitches, we show that the gous families in the Rfam database (8). However, out of the results of Direct-Coupling Analysis in combination 2450 currently listed families, only 59 (2.4%) possess at least with a generalized Nussinov algorithm systemati- one representative PDB structure, even if 954 (39%) contain cally improve the results of RNA secondary structure more than 100 sequences, and 566 (23%) even more than 200 sequences. Computational structure prediction meth- prediction beyond traditional covariance approaches ods are a promising complementary technique to make use based on mutual information. Even more importantly, of this huge amount of sequences. Unfortunately, even the we show that the results of Direct-Coupling Anal- most advanced computational methods rarely reach RMSD ysis are enriched in tertiary structure contacts. By (root mean square deviation) values below 8–12 A˚ in blind integrating these predictions into molecular model- RNA structure prediction (9,10), staying well above the res- ing tools, systematically improved tertiary structure olutions of 2–3 A˚ reachable in X-ray crystal structures. Typ- predictions can be obtained, as compared to using ically, computational methods (11–13) struggle with various secondary structure information alone. limitations such as alternative solutions, or require expert human intervention. Topologically complex structure can- not be reliably predicted (14,15). Some of the best results INTRODUCTION in this direction have been obtained by combining compu- tational with experimental approaches: inter-residue con- Experimental work and genomic sequence analysis have re- tacts inferred from mutational studies can be introduced vealed that RNAs have a widespread role inside the cell into computational models as restraints (16,17). (1–5). In addition to the transmission of genetic informa- In the related field of protein structure prediction, recent tion, non-coding RNAs catalyze biochemical reactions and years have seen significant progress in residue contact pre- have a crucial role in a multitude of regulatory processes. diction by exploiting residue coevolution (18). These novel While some functional RNA act essentially via their single- methods are based on two ideas: (i) Tertiary contacts be- stranded information or in the context of RNA-protein
*To whom correspondence should be addressed. Tel: +49 721 60826303; Fax: +49 721 60824972; Email: [email protected] Correspondence may also be addressed to Martin Weigt. Tel. +33 1 44277368; Fax: +33 1 44277336; Email: [email protected] †These authors contributed equally to the paper as first authors.