The Impact of the Transversion/Transition Ratio on the Optimal Genetic Code Graph Partition
Total Page:16
File Type:pdf, Size:1020Kb
The Impact of the Transversion/Transition Ratio on the Optimal Genetic Code Graph Partition Daniyah A. Aloqalaa1, Dariusz R. Kowalski1;2, Paweł Błazej˙ 3, Małgorzata Wnetrzak3, Dorota Mackiewicz3 and Paweł Mackiewicz3 1Department of Computer Science University of Liverpool, U.K. 2SWPS University of Social Sciences and Humanities, Warsaw, Poland 3Faculty of Biotechnology, University of Wrocław, Poland fdorota, pamacg@smorfland.uni.wroc.pl Keywords: Code Degeneracy, Graph Theory, Mutation, Set Conductance, Standard Genetic Code, Transition, Transver- sion. Abstract: The standard genetic code (SGC) is a system of rules ascribing 20 amino acids and stop translation signal to 64 codons, i.e triplets of nucleotides. It was proposed that the structure of the SGC evolved to minimize harmful consequences of mutations and translational errors. To study this problem, we described the SGC structure by a graph, in which codons are vertices and edges correspond to single nucleotide mutations occurring between the codons. We also introduced weights (W) for mutation types to distinguish transversions from transitions. Using this representation, the SGC is a partition of the set of vertices into 21 disjoint subsets. In this case, the question about the potential robustness of the genetic code to the mutations can be reformulated into the optimal graph clustering task. To investigate this problem, we applied an appropriate clustering algorithm, which searched for the codes characterized by the minimum average calculated from the set W-conductance of codon groups. Our algorithm found three best codes for various ranges of the applied weights. The average W-conductance of the SGC was the most similar to that of the best codes in the range of weights corresponding to the observed transversion/transition ratio in natural mutational pressures. However, it should be noted that the optimization of the SGC was not as perfect as the best codes. It implies that the evolution of the SGC was driven not only by the selection for the robustness against mutations or mistranslations but also other factors, e.g. subsequent addition of amino acids to the code according to the expansion of amino acid metabolic pathways. 1 INTRODUCTION ganized in specific groups. In most cases, the codons in such groups differ at the third position, which can The questions about the origin and the structure of be called a degenerate position. This fact suggested to the standard genetic code (SGC) have puzzled biolo- Francis Crick that only the first two codon positions gists since the first codons assignments were discov- were important in a primordial code (Crick, 1968). ered (Khorana et al., 1966; Nirenberg et al., 1966). The redundancy of the SGC causes other interest- This nearly universal, with some rare exceptions, set ing consequences related to the process of single nu- of coding rules is responsible for transmitting genetic cleotide mutations. If these changes occur in the de- information stored in DNA molecules into the pro- generate codon position, then the originally encoded tein world. The code uses all possible 64 nucleotide amino acid will not be changed. These mutations triplets, i.e. codons, to encode 20 canonical amino are called synonymous or silent, whereas those that acids and also the signal for stopping the protein syn- change the encoded amino acid or stop translation sig- thesis, i.e. the translation. Since the total number of nal are named nonsynonymous. It should be noted codons is greater then the number of encoded labels, that there are two types of nucleotide changes, tran- the SGC must be degenerate, i.e. there must exist an sitions and transversions. In the case of transition, a amino acid that is encoded by more than one codon. purine nucleotide, i.e. adenine or guanine, mutates These redundant codons, called synonymous, are or- to another purine (A$G), or a pyrimidine nucleotide, 55 Aloqalaa, D., Kowalski, D., Błazej,˙ P., Wnetrzak, M., Mackiewicz, D. and Mackiewicz, P. The Impact of the Transversion/Transition Ratio on the Optimal Genetic Code Graph Partition. DOI: 10.5220/0007381000550065 In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019), pages 55-65 ISBN: 978-989-758-353-7 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms i.e. cytosine or thymine, changes into another pyrim- representation, each genetic code is a partition of the idine (C$T). Transversion are changes in which a set of vertices into 21 disjoint subsets. Therefore, the purine mutates to a pyrimidine or vice versa (A$C, question about the potential genetic code optimality A$T, G$C, G$T). There are four possible transi- in regard to the mutations can be reformulated into tions and eight possible transversions. Transitions are the optimal graph clustering problem. more often observed in sequences than transversions In the present study, we investigated the proper- (Ducheneˆ et al., 2015; Gojobori et al., 1982; Kumar, ties of the SGC using a more general model including 1996; Lynch, 2010; Lyons and Lauring, 2017; Petrov in the graph representation information about tran- and Hartl, 1999; Rosenberg et al., 2003; Wakeley, sition to transversion ratio, which was not consid- 1996). It may result from a higher mutation rate of ered by (Tlusty, 2010; Błazej˙ et al., 2018a). From a transitions than transversions in nucleic acids due to mathematical point of view, we considered a weighted physicochemical similarity of the nucleotides. More- graph, in which all weights are dependent on the type over, transitions are accepted with a greater probabil- of nucleotide substitutions. We also modified the set ity because they rarely lead to amino acid substitu- conductance measure, which is widely used in the tions in encoded proteins due to the specific codon graph theory (Lee et al., 2014) and has many prac- degeneracy. The transitions are also more frequent tical interpretations, for example in the theory of ran- during protein synthesis (Freeland and Hurst, 1998a). dom walks (Levin et al., 2009) and social networks It should be noted, that the synonymous substitu- (Bollobas,´ 1998). In the problem considered here, tions do not have to be completely neutral mutations, the conductance of a codon group is the ratio of the even though they do not change a coded amino acid. weights of nonsynonymous mutations to the weights The specific codon usage can be associated with co- of all possible single nucleotide mutations, in which translational modifications of amino acids, efficiency the codons in this group are involved. Therefore, this and accuracy of translation as well as co-translational parameter can be used as a measure of robustness folding of synthesized proteins (Bulmer, 1991; Her- against the potential changes in protein-coding se- shberg and Petrov, 2008; Zhou et al., 2009). The syn- quences generated by the single nucleotide mutations. onymous codon usage can be also modified as a con- Basing on the methodology described in (Błazej˙ et al., sequence of selection at the amino acid level (Morton, 2018a), we found some solutions, i.e. the genetic 2001; Błazej˙ et al., 2017b). code structures, of the optimal graph clustering prob- lem. The tendency to minimize the number of non- synonymous substitutions were noticed in the SGC and this property suggested that the code could have evolved to minimize harmful consequences of muta- 2 PRELIMINARIES tions and translational errors (Ardell, 1998; Ardell and Sella, 2001; Di Giulio, 1989; Di Giulio and 2.1 Model Description Medugno, 1999; Epstein, 1966; Freeland and Hurst, 1998a; Freeland and Hurst, 1998b; Freeland et al., To study the general structure of the genetic code we 2003; Freeland et al., 2000; Gilis et al., 2001; Gold- developed its graph representation. Let G(V;E) be berg and Wittes, 1966; Goodarzi et al., 2005; Haig a graph in which V is the set of vertices represent- and Hurst, 1991; Woese, 1965). The robustness of the ing all possible 64 codons, whereas E is the set of code was usually measured as a difference between edges connecting these vertices. All connections ful- the polarity values of amino acids encoded by codons fil the property that the vertices, i.e. codons u;v 2 V before and after a single-point mutation. are connected by the edge e(u;v) 2 E (u ∼ v) if and Since the genetic code is a set of codons which only if the codon u differs from the codon v in ex- are related, e.g. by nucleotide mutations, the general actly one position. Moreover, we claim that all tran- structure of this code can be well described by the sitions are given a weight which equals always to methodology taken from graph theory (Beineke and one, while the transversions are given a weight W, Wilson, 2005; Lee et al., 2014). Similarly to (Tlusty, where W 2 [0;¥). The larger weight indicates that the 2010; Błazej˙ et al., 2018a), we assume that the code transversions are more important than transitions, re- encodes 21 items, i.e. 20 amino acids and stop trans- spectively. The weight can be interpreted as transver- lation signal, and all 64 codons create the set of ver- sion to transition ratio. Hence, the graph G is undi- tices of the graph, in which the set of edges corre- rected, weighted and regular with the vertices degree sponds to all possible single nucleotide mutations oc- equal to 9. Moreover, from a biological perspec- curring between the codons. This graph is undirected, tive, the set of edges represents all possible single nu- unweighted and regular. Moreover, according to this cleotide substitutions, which occur between codons in 56 The Impact of the Transversion/Transition Ratio on the Optimal Genetic Code Graph Partition a DNA sequence.