Protein-Structure Prediction Gets Real

attribute the change in optical reflectivity to 3. Eremets, M. I., Troyan, I. A. & Drozdov, P. Preprint at 8. Loubeyre, P., Occelli, F. & LeToullec, R. Nature 416, a pressure-induced phase transition in which https://arxiv.org/abs/1601.04479 (2016). 613–617 (2002). 4. Dias, R. P. & Silvera, I. F. Science 355, 715–718 (2017). 9. McMinis, J., Clay, R. C. III, Lee, D. & Morales, M. A. electrons in the sample become free to move 5. Loubeyre, P., Occelli, F. & Dumas, P. Nature 577, 631–635 Phys. Rev. Lett. 114, 105305 (2015). like those in a metal. Hydrogen remains as a (2020). 10. Azadi, S., Drummond, N. D. & Foulkes, W. M. C. 6. Dewaele, A., Loubeyre, P., Occelli, F., Marie, O. & Phys. Rev. B 95, 035142 (2017). molecular solid up to the transition pressure; Mezouar, M. Nature Commun. 9, 2913–2922 (2018). 11. Eremets, M. I. & Troyan, I. A. Nature Mater. 10, 927–931 it possibly stays in this state above 425 GPa, but 7. Jenei, Zs. et al. Nature Commun. 9, 3563 (2018). (2011). it is difficult to confirm this by spectroscopy because there is a reduced coupling between Computational biology light and matter in these extreme conditions. It can certainly be argued that a definite proof for metallic hydrogen would come only from a measurement of the sample’s electrical Protein-structure conductivity at high pressure as a function of temperature. Solid hydrogen should prediction gets real exhibit a high level of electrical conduction that should then decrease as the sample Mohammed AlQuraishi temperature is raised. However, even with experimental techniques developed in the Two threads of research in the quest for methods that predict past few decades to study condensed matter the 3D structures of proteins from their amino-acid sequences in extreme conditions, electrical-transport have become fully intertwined. The result is a leap forward in measurements of hydrogen remain a huge challenge9,10. the accuracy of predictions. See p.706 Nevertheless, Loubeyre and co-workers’ findings should be considered as a close-to- definite proof of dense hydrogen reaching a Proteins perform or catalyse nearly all the second has proved more recalcitrant. metallic state in extreme-pressure conditions. chemical and mechanical processes in cells. The set of shapes that a protein might take Computational predictions of the pressure at Synthesized as linear chains of amino-acid can be likened to a landscape: different loca- which molecular hydrogen enters a metallic residues, most proteins spontaneously tions in the landscape correspond to different state still lack accuracy, because they require fold into one or a small number of favoured shapes, with nearby locations having similar many different quantum-mechanical correc- three-dimensional structures. The sequence shapes. The height of a location corresponds tions that are difficult to address. However, of amino acids speci fies a protein’s structure to how energetically favourable the associated the experimental value of 425 GPa agrees and range of motion, which in turn deter- shape is, with the lowest point being the most with calculations11 that predict a transition in mine its function. Over decades, structural favoured. Natural proteins evolved to have hydrogen to a different solid phase at a similar biologists have experimentally determined funnel-shaped landscapes that enable newly pressure. thousands of protein structures, but the dif- synthesized proteins, jostled by the thermal Loubeyre and colleagues’ study has ficulty of these studies has made the promise fluctuations of the cell, to cross the landscape combined innovative techniques for ultra- of a computational approach for predicting and find their way to a favoured conformation high-pressure generation with advanced protein structure from sequence alluring. On in physiologically relevant timescales (milli- experimental methods using synchrotron page 706, Senior et al.1 describe an algorithm, radiation. In doing so, it has raised expecta- AlphaFold, that takes a leap forward in solv- “The algorithm tions for the discovery of other remarkable ing this classic problem by bringing to bear properties of solid hydrogen at extreme modern machine-learning techniques. outperformed all entrants density. For the time being, many questions The diversity of protein structures at the most recent blind remain. For instance, could electrical resistiv- precludes the possibility of obtaining simple assessment of methods ity be measured across the metallic transition? folding rules, making structure prediction used to predict protein Could superconductivity at a record-high difficult. Protein folding is ultimately driven temperature be achieved in hydrogen? And by quantum mechanics. Were it possible to structures.” could the molecular order be disrupted under compute the exact energy of protein molec- ultrahigh pressure and lead to an atomic phase ules from quantum theory, and to do so for seconds to minutes)4. Algorithms can search in the solid state? every possible conformation, then predict- the landscape to find favoured conformations Competition is still strong between different ing a protein’s most energetically favoured by following the landscape’s inclination, but research groups seeking to answer these structure would be easy. Unfortunately, a the ruggedness of the terrain causes them to questions, and to further unveil and under- quantum treatment of proteins is compu- get stuck in troughs and valleys far from the stand the characteristics of hydrogen at tationally intractable (quantum computers lowest basin. extreme density. More exciting findings are might change this), and the total set of possi- The course of the structure-prediction sure to come at every stage of the race. ble conformations that any protein can take is field changed nearly a decade ago with the astronomical, prohibiting such a brute-force publication of a series of seminal papers5–7 Serge Desgreniers is in the Department approach. exploring the idea that the evolutionary record of Physics, University of Ottawa, Ottawa, This has not stopped scientists from contains clues about how proteins fold. The Ontario K1N 6N5, Canada. attempting a direct attack on the problem. idea is predicated on the following premise: e-mail: [email protected] Physical chemists have devised tractable, but if two amino-acid residues in a protein are approximate, energy models for proteins2, and close together in 3D space, then a mutation 1. Wigner, E. & Huntington, H. B. J. Chem. Phys. 3, 764–770 computer scientists have developed ways to that replaces one of them with a different resi- (1935). 3 2. Mao, H. K. & Hemley, R. J. Science 244, 1462–1465 explore protein conformations . Much pro- due (for example, large for small) will probably (1989). gress has been made on the first problem but induce, at a later time, a mutation that alters Nature | Vol 577 | 30 January 2020 | 627 ©2020 Spri nger Nature Li mited. All ri ghts reserved. ©2020 Spri nger Nature Li mited. All ri ghts reserved. News & views the other residue in a compensatory direction (in our example, swapping small for large). The 1 set of co-evolving residues therefore encodes valuable spatial information, and can be found 0.8 by analysing the sequences of evolutionarily related proteins. By transforming this co-evolutionary 0.6 information into a matrix known as a binary contact map, which encodes which residues TM score 0.4 are proximal, the set of conformations that merit consideration by algorithmic searches can be restricted. This in turn makes it possi- 0.2 ble to accurately predict the most favourable protein conformation, especially for pro- 0 teins for which many evolutionarily related Protein 1 Protein 2 Protein 3 Protein 4 Protein 5 Protein 6 sequences are known. The idea was not new8, but the rapid growth in available sequence Figure 1 | Predictions of protein structures. Senior et al.1 report a machine-learning system called data in the early 2010s, coupled with crucial AlphaFold, which predicts the 3D structures of proteins from their amino-acid sequences. Template algorithmic breakthroughs, meant that its modelling (TM) scores measure how well a predicted structure matches the overall shape of the actual time had finally come. structure, on a scale from 0 to 1. TM scores for AlphaFold were better than those of other prediction systems Co-evolutionary analysis has been respon- for 25 out of 43 proteins in a blind test. Here, the TM scores for AlphaFold (red) are compared with those sible for most progress in protein-structure of other prediction systems (grey) in the blind test for six proteins whose 3D structures could be modelled only on the basis of their amino-acid sequences — no 3D structures of proteins that have similar amino- prediction in the past few years, but it has not acid sequences were available to use as a starting point for modelling. AlphaFold made the most accurate obviated the need for algorithms to search the predictions for five of these six proteins. (Adapted from Fig. 1b of ref. 1.) energy landscapes of proteins: binary contact maps constrain the search space, but do not pin down a single 3D structure. Furthermore, and minimal search procedures have also the next five years from structure predictions. the mathematics underpinning the conversion been embedded within another deep-learning Such broad availability of structural informa- of co-evolutionary data into contact maps is model to predict protein structures12. tion might transform the life sciences, just as restricted by the types of input used and the What is notable about AlphaFold is that it sequence information did in the preceding output generated. The initial injection of deep predicts distances with sufficient accuracy to decades. This could mean that, combined learning (a type of machine learning) into outperform state-of-the-art search methods with the rapid advances in protein–struc- co-evolutionary analyses improved matters by (Fig. 1). Senior et al. used advances in deep ture determination enabled by cryo-electron incorporating richer inputs9.

Protein-Structure Prediction Gets Real

AI for Health Examples from Industry, Government, and Academia Table of Contents

Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information

Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

Machine Learning for Speech Recognition by Alice Coucke, Head of Machine Learning Research

Artificial Intelligence and Cybersecurity

Overview of Current State of Research on the Application of Artificial Intelligence Techniques for COVID-19

A Reinforcement Learning Approach to Sequential Conformer Search

Arxiv:1911.05531V1 [Q-Bio.BM] 9 Nov 2019 1 Introduction

Protein Data Analysis

Fully-Differentiable Protein Folding Powered by Energy-Based Models

Deep Learning and Generative Methods in Cheminformatics and Chemical Biology: Navigating Small Molecule Space Intelligently

Deep Learning-Based Advances in Protein Structure Prediction