Challenges in the Computational Modeling of the Protein Structure—Activity Relationship
Total Page:16
File Type:pdf, Size:1020Kb
computation Review Challenges in the Computational Modeling of the Protein Structure—Activity Relationship Gabriel Del Río Department of Biochemistry and Structural Biology, Institute of Cellular Physiology, UNAM, Mexico City 04510, Mexico; [email protected] Abstract: Living organisms are composed of biopolymers (proteins, nucleic acids, carbohydrates and lipid polymers) that are used to keep or transmit information relevant to the state of these organisms at any given time. In these processes, proteins play a central role by displaying different activities required to keep or transmit this information. In this review, I present the current knowledge about the protein sequence–structure–activity relationship and the basis for modeling this relationship. Three representative predictors relevant to the modeling of this relationship are summarized to highlight areas that require further improvement and development. I will describe how a basic understanding of this relationship is fundamental in the development of new methods to design proteins, which represents an area of multiple applications in the areas of health and biotechnology. Keywords: protein structure; protein function; bijection 1. Proteins: Fundamental Polymers for Biological Systems Biological systems are composed of biopolymers, including nucleic acids, polymers Citation: Del Río, G. Challenges in (e.g., DNA, RNA), carbohydrates (e.g., starch, cellulose), lipids and proteins. All these the Computational Modeling of the biopolymers play critical roles in all the processes associated with life. Biopolymers are Protein Structure—Activity Relationship. Computation 2021, 9, 39. composed of monomers that are linked in a sequential way; for instance, proteins are https://doi.org/10.3390/ composed of amino acids, while nucleic acids are compose of nucleotides and polymers computation9040039 of carbohydrates are composed of different carbohydrate monomers (e.g., starch and cellulose are made of glucose mainly), and a similar situation occurs with lipids. Since each Academic Editor: Rainer Breitling biopolymer is composed of many different monomers, this situation gives rise to an enormous number of different biopolymer sequences. For instance, proteins are composed Received: 4 February 2021 of 20 different amino acids; hence, proteins that include up to 100 monomers of amino Accepted: 20 March 2021 acids will encompass 20100 different protein sequences. Published: 24 March 2021 The information about the state of the system is maintained by these biopolymers, and this is achieved by controlling the presence/activity of these biopolymers. Activity Publisher’s Note: MDPI stays neutral refers to the action that these biopolymers perform, either maintaining a particular morphol- with regard to jurisdictional claims in ogy, transporting cargo or altering the chemical structure of other molecules, among others. published maps and institutional affil- This activity is commonly referred to as function. In this review, I will use the term activity iations. to avoid confusion with the mathematical term function, which I will also use in this review. The flux of information in biological systems that involves all these biopolymers is depicted in Figure1, where proteins are proposed to play a central role. The central dogma in molecular biology [1] is depicted at the bottom of the figure, where the information flux Copyright: © 2021 by the author. goes from DNA to RNA (transcription) and from RNA to proteins (translation). The figure Licensee MDPI, Basel, Switzerland. includes bidirectional arrows to indicate that the information also goes from RNA to DNA This article is an open access article (retro-transcription) and from protein to RNA or DNA (epigenetics); the information flux distributed under the terms and may go from DNA to DNA, from RNA to RNA or from protein to protein, as indicated conditions of the Creative Commons by circular arrows. The other biopolymers (carbohydrates and lipids) are inserted into Attribution (CC BY) license (https:// this flux of information in the top part of the figure. It is relevant to note that proteins creativecommons.org/licenses/by/ participate in any of these fluxes, e.g., protein transcription factors are required for tran- 4.0/). Computation 2021, 9, 39. https://doi.org/10.3390/computation9040039 https://www.mdpi.com/journal/computation Computation 2021, 9, x FOR PEER REVIEW 2 of 11 Computation 2021, 9, 39 2 of 11 pids) are inserted into this flux of information in the top part of the figure. It is relevant to note that proteins participate in any of these fluxes, e.g., protein transcription factors are required for transcription, ribosomal proteins are involved in translation, and the syn- thesis/degradationscription, ribosomal of proteins carbohydrates are involved or lipids in translation, also requires and proteins. the synthesis/degradation of carbohydrates or lipids also requires proteins. Figure 1. Information flux among biopolymers. Figure 1. Information flux among biopolymers. Thus, proteins are central to life due to the different activities that they perform in biologicalThus, systems.proteins Inare consequence, central to life it due is expected to the different that knowledge activities of that the proteinthey perform contents in biologicalof living organisms systems. In may consequence, help to anticipate it is expected the capabilities that knowledge of such of systems. the protein For instance,contents ofhaving living access organisms to the may list ofhelp protein to anticipate sequences the in capabilities a tumor would of such provide systems. the For information instance, havingrequired access to anticipate to the list the of tumor’s protein malignancy; sequences in knowing a tumor the would proteins provide of viruses the information may antici- paterequired their to virulence. anticipate Due the totumor’s the advances malignancy; in DNA knowing sequencing the proteins techniques, of viruses having may access an- ticipateto protein their sequence virulence. data Due is relatively to the advances straightforward in DNA nowadays. sequencing However, techniques, knowledge having ac- of proteincess to sequencesprotein sequence is not enough data tois anticipate relatively the straightforward activities that proteins nowadays. may displayHowever, in knowledgebiological systems. of protein This sequences is because, is not at the enough most basicto anticipate level, protein the activities activity that depends protein ons maythe ability display of in proteins biological to interact systems. with This other is because, molecules; at the hence, most protein basic level, activity protein depends activity on dependsthe context on inthe which ability the of proteins protein isto located. interact Thiswith dependencyother molecules; of proteins hence, protein was originally activity dependsreferred toon as the gene context sharing, in which but, more the protein recently, is itlocated. has been This considered dependency a particular of proteins case was of originallya phenomena referred referred to as to gene as moonlighting sharing, but, [2 more]. For recently, instance, it when has been a protein considered referred a topar- as ticularthymidine case phosphorylaseof a phenomena is referred present to inside as moonlighting human cells, [2]. at For the cytoplasminstance, when of these a protein cells, referredit catalyzes to as the thymidine dephosphorylation phosphorylase of thymidine, is present among inside otherhuman nucleotides; cells, at the yet, cytoplasm when this of thesesame proteincells, it iscatalyzes expressed the outside dephosphorylation of cells, it stimulates of thymidine, cell growth among and other chemotaxis nucleotides; [3]. yet, whenFrom this the previoussame protein notion, is expressed it can then outside be surmised of cells, that it stimulates different contexts cell growth provide and proteinschemotaxis with [3]. different molecules to interact with, and this may be in turn translated into differentFrom protein the previous activities. notion, The it ways can then in which be surmised this differential that different binding contexts translates provide into proteinsdifferent with activities different highlights molecules another to interact feature ofwith, proteins: and this allostery may be [4 in]. Originallyturn translated proposed into differentby Monod protein and Jacob activities. in 1961, The the modelways in nowadays which this refers differential to conformational binding changestranslates in into any proteindifferent induced activities by hi bindingghlights to another other molecules, feature of which proteins: are requiredallostery for[4]. the Originally performance pro- posedof a particular by Monod activity and [Jacob5]. Take, in 1961, for instance, the model the nowadays case of hemoglobin refers to (Hb),conformational a protein changescarrying in two any oxygen protein molecules induced thatby binding are distributed to other frommolecules, blood which to tissues, are required which displays for the performanceseveral forms of of a allostery:particular (i)activity intrinsic, [5]. whenTake, for a single instance, oxygen the case molecule of hemoglobin binds to Hb (Hb), and a proteinthe binding carrying of the two second oxygen oxygen molecules molecule that is are facilitated distributed or (ii) from heterotrophic blood to tissues, ligands which (2,3- displaysbiphosphoglycerate, several forms protons, of allostery: carbon (i) dioxide, intrinsic, among when others) a single facilitate oxygen