
DEEPMIND’S AI PREDICTS STRUCTURES FOR A VAST TROVE OF PROTEINS

AlphaFold neural network produced ‘transformative’ database of more than 350,000 structures.

By Ewen Callaway

The human genome holds the instructions for making more than 20,000 proteins. But only about one‑third of those have had their 3D structures determined experimentally. And in many of those cases, structures have been determined only in part.

Now, a transformative artificial intelligence (AI) tool called AlphaFold, developed by Google’s sister company DeepMind in London, has predicted the structure of nearly the entire human proteome (the full complement of proteins expressed by an organism). Furthermore, the tool has predicted almost complete proteomes for various other organisms, ranging from mice and maize (corn) to the malaria parasite (see ‘Folding options’).

The more than 350,000 protein structures, which are available through a public database, vary in their accuracy. But researchers say the resource has the potential to revolutionize the life sciences.

“It’s totally transformative from my perspective. Having the shapes of all these proteins really gives you insight into their mechanisms,” says Christine Orengo, a computational biologist at University College London (UCL).

But researchers emphasize that the data dump is a beginning, not an end. They will want to validate the predictions and, more importantly, apply them to experiments that were hitherto impossible.

Prizewinning predictions

In the process of preparing AlphaFold’s code for public release, DeepMind refined it to make the code run more efficiently. Whereas an earlier version could take days to make a prediction for a protein, the updated one can now compute structures in minutes to hours.

With this added efficiency, the DeepMind team set out to predict the structures of nearly every known protein encoded by the human genome, as well as those of 20 model organisms. The structures are available in a database maintained by EMBL-EBI (the European Molecular Biology Laboratory European Bioinformatics Institute) in Hinxton, UK.

In addition to the predicted structures, which cover 98.5% of known human proteins and a similar percentage for the other organisms, AlphaFold generated a measurement of the confidence of its predictions. “We want to give experimentalists and biologists a really clear signal of which parts of the predictions they should rely on,” says Kathryn Tunyasuvunakool, a science engineer at DeepMind and first author of a Nature paper describing the human proteome predictions (K. Tunyasuvunakool et al. https://doi.org/gk9kp7; 2021). For the human proteome, 58% of AlphaFold’s predictions for the locations of individual amino acids were good enough to be confident in the shape of the protein’s folds, Tunyasuvunakool says. A subset of those predictions, 36% of the total, are potentially precise enough to detail atomic features useful for drug design, such as the active site of an enzyme.

Even the less-accurate predictions might offer insights. Biologists think that a large proportion of human proteins and those of other eukaryotes (organisms with cells that have nuclei) contain regions that are inherently disordered and take on a defined structure only in concert with other molecules. “Many proteins are just wiggly in solution, they don’t have a fixed structure,” says AlphaFold lead researcher John Jumper. Some of the regions that AlphaFold predicted with low confidence match up with those that biologists suspect are disordered, says Pushmeet Kohli, head of AI for science at DeepMind.

Deluge of data

The approximately 365,000 structure predictions deposited last week should swell to 130 million, nearly half of all known proteins, by the year’s end, says Sameer Velankar, a structural bioinformatician at EMBL-EBI.

Researchers are already using AlphaFold and related tools to help make sense out of experimental data generated using X-ray crystallography and cryo-electron microscopy. Marcelo Sousa, a biochemist at the University of Colorado Boulder, used AlphaFold to make models from X-ray data of proteins that bacteria use to evade an antibiotic called colistin. The parts of the experimental model that differed from the AlphaFold prediction were typically regions that the software had assigned with low confidence, Sousa notes, a sign that AlphaFold is accurately predicting its limits.

David Jones, a UCL computational biologist who advised DeepMind on an earlier iteration of AlphaFold, is impressed with what the network has achieved. But he says that many of the models predicted by AlphaFold could have been generated with earlier software developed by academics. “For most proteins, those results are probably good enough for quite a lot of the things you want to do.”

But the availability of so many protein structures is likely to mark a “paradigm shift” in biology, says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City who works on protein-structure prediction. His field has spent so much time and energy on predicting accurate protein structures on this scale that it hasn’t yet worked out what to do with such resources. “Everything we do today that relies on a protein sequence, we can now do with protein structure,” he says.

FOLDING OPTIONS
AlphaFold has aimed to predict the structure of every protein in humans as well as in 20 model organisms, including those listed here. For some of the proteins, it has provided multiple predictions, which explains why the numbers can be higher than the size of the proteome. In the case of Homo sapiens, the predictions include 98.5% of known proteins.
[Bar chart: predicted versus actual proteome size (thousands) for thale cress (Arabidopsis thaliana), Homo sapiens, mouse (Mus musculus), nematode worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), budding yeast (Saccharomyces cerevisiae) and malaria parasite (Plasmodium falciparum).]
SOURCE: EMBL–EBI AND HTTPS://SWISSMODEL.EXPASY.ORG/REPOSITORY

Nature | Vol 595 | 29 July 2021 | 635
©2021 Springer Nature Limited. All rights reserved.