BioSystems 159 (2017) 1–11
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/biosystems

Equalizing the information amounts of protein and mRNA by information theory

Y. Adiguzel ∗
Biophysics Department, School of Medicine, Istanbul Kemerburgaz University, Istanbul, Turkey

∗ Correspondence to: Present address: Department of Biophysics, School of Medicine, Istanbul Kemerburgaz University, Kartaltepe Mah Incirli Cad. No: 11, Bakirkoy, Istanbul, Turkey.
http://dx.doi.org/10.1016/j.biosystems.2017.05.003
0303-2647/© 2017 Elsevier B.V. All rights reserved.

Article history:
Received 16 December 2016
Received in revised form 21 April 2017
Accepted 19 May 2017
Available online 15 June 2017

Keywords: Codon evolution; Protein conformation; mRNA; Protein; Shannon's communication theory; Information amount

Abstract

Based on Shannon's information communication theory, the information amount of the entire length of a polymeric macromolecule can be calculated in bits by adding the entropies of each building block. Proteins, DNA and RNA are such macromolecules. When only the building blocks' variation is considered as the source of entropy, there is seemingly lower information in the case of the protein if this approach is applied directly to a protein of specific size and to the coding sequence of the mRNA corresponding to that particular length of the protein. This decrease in the information amount seems contradictory, but the apparent conflict is resolved by considering the conformational variations in proteins as a new variable in the calculation and by balancing the approximated entropy of the coding part of the mRNA and the protein. Probabilities can change; therefore, we also assigned hypothetical probabilities to the conformational states. These represent the uneven distribution as the time spent in one conformation, providing the probability of the presence in either or one of the possible conformations. Results that are obtained by using hypothetical probabilities are in line with the experimental values of variations in the conformational state of protein populations. This equalization approach has further biological relevance in that it compensates for the degeneracy in the codon usage during protein translation, and it leads to the conclusion that the alphabet size for the protein is rather optimal for proper protein functioning within the thermodynamic milieu of the cell. The findings are also discussed in relation to the codon bias and have implications for the codon evolution concept. Eventually, this work brings the fields of protein structural studies and molecular protein translation processes together with a novel approach.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Shannon's information communication theory (Shannon, 1948; Apter and Wolpert, 1965; Rontó, 1999) is a fruitful field that is well known mostly in bioinformatics (Koslicki, 2011; Orosz and Ovádi, 2011). In the simplest sense, it is based on taking the logarithm of the variation in a condition that typically bears the information to be communicated. In mathematical terms, taking the logarithm has a rationale similar to representing an exponentially increasing variable on a logarithmic axis when the relation of that variable with another entity is sketched as a graph. Variations in distinct entities can be immensely diverse in number, and comparing them by using their actual quantities may practically seem meaningless even when that is actually not the case. Taking the logarithms can make sense in such situations. A similar approach is followed, in the crude sense, in the quantification of information, which is characterized by the number of possible distinct messages that could be processed at a communication event. Obviously, this depends on the type of communication and on what is being regarded as a message. Whatever they are, the universal notion in information communication is that the simplest condition of such an event involves two (equally) possible distinct messages, which could be, for example, the on or off situation of a device or an enzyme. This simplest condition could even be the presence or absence of a signal in any sort of signal communication event. Accordingly, choosing the base of the logarithm as 2 was recognised as the common approach, which resulted in the calculation of the information content of the 'simplest' signal communication event with two (equally) possible distinct messages as 1 bit. By this means, 1 bit is the unit of information, which can simply represent the presence or absence of the informing entity. Through calculating the information content of any two diverse signal communication events in bits, one can make a comparison of the two events, since this calculation also takes into account the realisation probabilities of the individual messages of the respective events. Such a comparison would not be immediately possible without such a quantification (or, in other words, standardization). So, this standardization enables quantification in various information communication systems and hence comparing and analysing them. The information content of a message can be calculated through the number of all possible messages, along with their realisation propensities, that are eliminated once a certain message is received. Messages are comprised of sequential or multiple symbols, the entropy of which would be proportional to the variation in them. This explanation is valid and understandable in terms of biological macromolecules such as DNA, RNA or proteins, wherein the nucleotides or amino acids, as the building blocks of the respective macromolecules, are the symbols and the macromolecules themselves, namely the proteins or the mRNAs that code for proteins, are the messages. In relation, the suitability of information theoretic approaches in biosciences was proved by the presence of redundancy in DNA (Battail, 2014). That redundancy serves well for reliable message transmission through the unreliable channels of biological environments. In a review, by applying the formula equivalent to information transmission channel capacity to molecular systems, the efficiency of protein binding was suggested to be 70% (Schneider, 2010).

Proteins are important functional components of the cells and they take active roles in the molecular communication networks, or nanonetworks (Akyildiz et al., 2008; Dressler and Akan, 2010; Adiguzel, 2016a). They are also part of a network that involves several other proteins, DNA and RNAs during their synthesis, namely the protein translation. Two entities of that network, mRNA and proteins, are treated here with an information theoretic approach, since information theory can be used for information quantification in distinct types of messages. Information processing capacities of various mechanisms can be related in the determination of the information contents of molecular structures (Sarkar et al., 1978; Sullivan et al., 2003; Aynechi and Kuntz, 2005a; Aynechi and Kuntz, 2005b). Accordingly, we aim to implement the information communication theory in the calculation and balancing of the information amounts of the messenger ribonucleic acid (mRNA) and protein macromolecules, which are part of the protein translation machinery as the inputs (mRNA) and the outputs (protein).

Information theoretic approaches have been applied to the problem of protein structure determination (Orosz and Ovádi, 2011; Rocha et al., 2012) and to calculating topological entropies (Koslicki, 2011). Common to these approaches, Shannon's communication

ation of the protein conformation is introduced as an additional source of entropy per residue. Basically, statistical variation of the conformations in the folded state of the protein is considered to represent the missing entropy per residue and hence the missing information amount of the proteins, compared to that of the mRNAs. So, the statistical distribution of conformations in the protein population provides additional entropy per amino acid residue of the protein. In this sense, the work has an inherent relation to the Boltzmann entropy as well.

Different from the calculations of the information contents of molecular structures (Sarkar et al., 1978; Sullivan et al., 2003; Aynechi and Kuntz, 2005a; Aynechi and Kuntz, 2005b), it is presented here that equalizing the information contents of the protein and the coding part of the mRNA compensates for the degeneracy in the codon usage during protein translation. Further, the concept is evaluated together with the concept of the reduced alphabet (Murphy et al., 2000; Solis and Rackovsky, 2000; Li et al., 2003; Bacardit et al., 2009; Hod et al., 2013), which can be used in computational approaches to protein folding studies. (Reduced alphabet refers to the least number of amino acid types that could be relied on for attaining all the present/available structural variations in the proteins.) It is concluded that the alphabet size is somewhat optimal for the thermodynamic milieu of the cell. The findings will further be discussed in relation to the codon bias and codon evolution concepts (Sorimachi and Okayasu, 2008; Cannarozzi and Schneider, 2012).

2. Calculation

Below in Eq. (1), H is the entropy in bits per residue, which would also be the information amount in each residue.

$$H = -\sum_{i=1}^{20} P_i \log_2 P_i \qquad (1)$$

The subscript i in Eq. (1) indexes the variation of the protein residues, that is, the amino acid type. The equation is the same, but the upper limit will be 4, when the mRNA nucleotides are the variables and hence the calculation is performed for the mRNA. Here the calculation is performed for the protein and therefore i goes up to 20, since there are 20 different amino acids, any of which can be present at each residue. The P_i in Eq. (1) stands for the probability distribution of the amino acid types in the corresponding subsequence (Oliver et al., 1999). So, it is the probability of the amino acids' occurrence in the protein or peptide, and it would have been that of the bases' occurrence in the mRNA.
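As an illustration of Eq. (1), the following minimal Python sketch estimates the entropy per symbol from observed symbol frequencies and multiplies it by the sequence length, which is one plain reading of adding the entropies of each building block. The function names and both toy sequences are hypothetical and are not taken from the paper. The uniform case (all P_i equal) is included only to show the upper bounds of log2(20) bits per residue and log2(4) = 2 bits per nucleotide.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Entropy in bits per symbol, with P_i estimated from symbol frequencies."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    counts = Counter(sequence)
    # Eq. (1): H = -sum_i P_i * log2(P_i), summed over the symbols that occur.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def uniform_entropy(alphabet_size):
    """Entropy in bits per symbol when all symbols are equally probable."""
    return math.log2(alphabet_size)


def total_information(sequence):
    """Per-symbol entropy added up over the whole sequence, in bits."""
    return len(sequence) * shannon_entropy(sequence)


# Hypothetical toy peptide and one possible coding sequence for it
# (illustrative only, not data from the paper).
peptide = "MKTAYIAKQR"
coding_mrna = "AUGAAAACCGCGUAUAUUGCGAAACAGCGU"

print("two equally likely messages:", shannon_entropy("01"), "bit")
print("upper bound per residue:", uniform_entropy(20), "bits")
print("upper bound per codon:", 3 * uniform_entropy(4), "bits")
print("peptide, total bits:", total_information(peptide))
print("coding mRNA, total bits:", total_information(coding_mrna))
```

With equal probabilities, a codon of three nucleotides carries 3 × 2 = 6 bits, whereas a residue carries at most log2(20), about 4.32 bits. This is the seemingly lower information on the protein side that the abstract attributes to unaccounted conformational variation.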