Synonymous Codon Usage in Different Protein Secondary Structural Classes 947


Synonymous codon usage in different protein secondary structural classes of human genes: Implication for increased non-randomness of GC3 rich genes towards protein stability

PAMELA MUKHOPADHYAY, SURAJIT BASAK and TAPASH CHANDRA GHOSH*
Bioinformatics Centre, Bose Institute, P 1/12, CIT Scheme VII M, Kolkata 700 054, India
*Corresponding author (Fax, 91-33-2355 3886; Email, [email protected])

The relationship between synonymous codon usage and different protein secondary structural classes was investigated using 401 Homo sapiens proteins extracted from the Protein Data Bank (PDB). A simple chi-square test was used to assess the significance of deviation of the observed and expected frequencies of 59 codons at the level of individual synonymous families in the four different protein secondary structural classes. It was observed that synonymous codon families show non-randomness in codon usage in the four different secondary structural classes. However, when the genes were classified according to their GC3 levels, there was an increase in non-randomness in the high GC3 group of genes. The non-randomness in codon usage was further tested among the same protein secondary structures belonging to four different protein folding classes of the high GC3 group of genes. The results show that in each protein secondary structural unit there exist some synonymous families that show a class-specific codon usage pattern. Moreover, there is an increased non-random behaviour of synonymous codons in the sheet structure of all secondary structural classes in the high GC3 group of genes. Biological implications of these results are discussed.

[Mukhopadhyay P, Basak S and Ghosh T C 2007 Synonymous codon usage in different protein secondary structural classes of human genes: Implication for increased non-randomness of GC3 rich genes towards protein stability; J. Biosci. 32 947–963]

Keywords. Aggregation reaction; codon usage; non-randomness; protein folding; protein secondary structure

http://www.ias.ac.in/jbiosci J. Biosci. 32(5), August 2007, 947–963, © Indian Academy of Sciences

1. Introduction

The genetic code is degenerate, and not all synonymous codons are used with equal frequencies. The non-random use of synonymous codons creates a codon usage bias that acts in a species-specific way (Grantham et al 1980). The factors responsible for codon usage bias in coding sequences include (i) diversity in the (G+C)% at the third codon position (Alvarez et al 1994), (ii) abundance of tRNA molecules (Ikemura 1985), (iii) overall base composition of genes (Ellis and Morrison 1995), (iv) differences in the expression level of the genes (Pouwels and Leunissen 1994), (v) differences in the cellular location of the genes in the genome (Chiapello et al 1999), (vi) optimal growth temperature (Lynn et al 2002; Basak et al 2004; Basak and Ghosh 2006) and (vii) protein secondary structures (Kahali et al 2007; Adzhubei et al 1996).

Focusing on individual codons, earlier workers demonstrated that the first, second and third positions of the codon are associated, respectively, with the biosynthetic pathway, the hydrophobicity pattern, and the alpha helix or beta strand forming potential of the coded amino acid (Volkenstein 1966; Taylor and Coates 1989; Siemion and Siemion 1994). It has also been demonstrated that the first and second positions of the codon are the structure-determining positions, whereas the third position is the species-determining position (Majumdar et al 1999). Among the secondary structural units, protein alpha helices are preferentially coded by translationally fast mRNA regions, while beta strands and coils are preferentially coded by slow mRNA regions (Thanaraj and Argos 1996 a,b). However, by comparing the three-dimensional structures of Escherichia coli and human proteins with their corresponding mRNA sequences, Oresic and Shalloway (1998) concluded that a species-specific correlation exists between the use of two synonymous codons and protein secondary structural units. Contrary to this hypothesis, Tao and Dafu (1998) found no significant correlation between synonymous codon usage and protein secondary structural units in E. coli proteins. Working on the same hypothesis with E. coli and Homo sapiens, Gu et al (2004) found no significant correlation between the use of synonymous codons and protein secondary structural class in H. sapiens. However, the compositional heterogeneity of H. sapiens genes was completely ignored by Gu et al (2004) in their analysis.

In the present study, a simple chi-square test was performed to assess the significance of codon usage among four different secondary structural classes (all-alpha, all-beta, alpha+beta, alpha/beta) in H. sapiens genes. The same analysis was performed after partitioning the genes into three groups according to their GC3 levels, to remove any noise due to compositional bias on codon usage. We show that there exists a significant correlation between synonymous codon usage and secondary structural class in H. sapiens, and that the non-randomness in synonymous codon usage increases further in genes having higher GC3 levels. Biological implications of the interrelationship between GC richness and the non-random use of synonymous codons are discussed.

2. Materials and methods

A dataset of 401 H. sapiens protein sequences was collected from the Protein Data Bank (PDB). The extracted protein sequences were classified into four secondary structural classes (all-alpha, all-beta, alpha+beta and alpha/beta) from the structural information provided by the ASTRAL Structural Classification of Proteins (SCOP) database 1.61 (Berman et al 2000; Brenner et al 2000; Conte et al 2002). The four secondary structural classes include 86 all-alpha proteins, 103 all-beta proteins, 119 alpha+beta proteins and 93 alpha/beta proteins. Mutated proteins, and proteins that have been assigned to more than one class by the SCOP database, were removed from our analysis. This accounts for the lower number of H. sapiens proteins in our dataset as compared to the 563 H. sapiens proteins collected by Gu et al (2004). The TBLASTN program was run against the 'nr' database to retrieve the corresponding coding sequences for the 401 H. sapiens proteins. Only amino acid sequences with 100% similarity scores against the 'nr' sequence database were chosen, to avoid any ambiguity in the one-to-one correspondence between amino acid and codon.

The significance of deviation of the observed and expected frequencies of the 59 synonymous codons was tested by chi-square test for the individual codon families. The expected frequencies were calculated with the assumption that the bases were randomly associated to form codons (Zhang et al 1991; Gupta et al 2000). For example, suppose there were 120 U, 150 C, 180 A and 210 G, for a sum of 660 total bases. The expected number for a codon was calculated from the probability that any base will occur at a specific position of the codon. In the calculation, the base frequencies were used as the probabilities. For example, the probability that a base occurs at a specific position of the codon is 120 ÷ 660 = 0.181818 for U, 150 ÷ 660 = 0.227272 for C, 180 ÷ 660 = 0.272727 for A and 210 ÷ 660 = 0.318181 for G. The probability at which a codon is expected to occur is the product of the probabilities of the bases in the codon. The probability of the AGG (Arg) codon, for example, is calculated as 0.272727 × 0.318181 × 0.318181 = 0.02761. Since a total of 220 codons were counted, the expected number of AGG codons is therefore 220 × 0.02761 = 6.0742. The expected frequencies of individual codons were calculated from the overall base frequencies of the genes.

Considering the compositional heterogeneity of H. sapiens, the coding sequences were partitioned into three groups according to GC3 level: low (0-46.5), midrange (46.6-64.7) and high (64.8-100). These GC3 boundaries correspond to those of the genes as distributed in the isochores of the human genome (Arhondakis et al 2004).

The secondary structural assignments of the individual residues for the coding sequences in the four different secondary structural classes were made by the Database of Secondary Structure Assignment (DSSP) program (Kabsch and Sander 1983). Alpha helices are annotated by H and G in the DSSP file, beta-sheets by E and B, and coils by the rest.

3. Results and discussion

Observed and expected frequencies of codons for the four different protein classes (all-alpha, all-beta, alpha+beta, alpha/beta) are tabulated in table 1. The significance of deviation of the observed and expected frequencies of 59 codons in individual synonymous families was tested by chi-square test. At the level of individual synonymous families, it was observed that 5 synonymous families comprising 19 codons have a significant non-random codon distribution in all four protein classes, whereas 11 synonymous families comprising 32 codons displayed a random distribution of codon frequencies among the four protein classes.

Gu et al (2004), while performing variance analysis on 563 H. sapiens proteins, observed no significant difference in synonymous codon usage in different H. sapiens protein secondary structural classes. Therefore, they claimed that synonymous codon usage is not related to protein secondary structural classes in H. sapiens. But the present analysis shows clear evidence that the synonymous codon usage for at least 2 synonymous families, representing threonine and proline, is related to the protein secondary structural classes in H. sapiens. The results seem to be in apparent discrepancy with the conclusions reached in the previously published paper of Gu et al (2004).

Considering the deviation of observed frequency from expected frequency of different synonymous groups of codons in each protein class, it is clear from tables 2 and 3 that significant deviation between observed and expected
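
The calculations described in Materials and methods can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: all function names are hypothetical, the chi-square function is the plain Pearson statistic (the excerpt does not say how expected counts were scaled within a synonymous family), and GC3 values falling between the published group boundaries (for example 46.55) are assigned to the next higher group by assumption.

```python
def base_probabilities(counts):
    """Convert raw base counts into per-base probabilities."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def expected_codon_count(codon, base_probs, n_codons):
    """Expected count of a codon if bases associate at random:
    product of the three base probabilities times the codon total."""
    p = 1.0
    for base in codon:
        p *= base_probs[base]
    return n_codons * p


def chi_square(observed, expected):
    """Pearson chi-square statistic over matched observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def gc3_percent(cds):
    """Percentage of complete codons whose third base is G or C."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    return 100.0 * sum(b in ("G", "C") for b in thirds) / len(thirds)


def gc3_group(gc3):
    """Assign a GC3 percentage to the low / midrange / high group.
    Handling of values between the published bounds is an assumption."""
    if gc3 <= 46.5:
        return "low"
    if gc3 <= 64.7:
        return "midrange"
    return "high"


def dssp_state(code):
    """Map a DSSP code to helix / sheet / coil as described in the text."""
    if code in {"H", "G"}:
        return "helix"
    if code in {"E", "B"}:
        return "sheet"
    return "coil"


# Worked example from the text: 120 U, 150 C, 180 A, 210 G and 220 codons.
probs = base_probabilities({"U": 120, "C": 150, "A": 180, "G": 210})
agg_expected = expected_codon_count("AGG", probs, 220)
print(round(agg_expected, 2))  # about 6.07, agreeing with the text up to rounding
```

Keeping the base probabilities in a dictionary makes it easy to recompute the expected counts separately for each GC3 group or structural class, since only the input base counts change.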
