Residue Contact Prediction for Protein Structure Using 2-Norm Distances Nikita V

National Conference on Innovative Paradigms in Engineering & Technology (NCIPET-2012)
Proceedings published by International Journal of Computer Applications® (IJCA)

Residue Contact Prediction for Protein Structure using 2-Norm Distances

Nikita V. Mahajan
Department of Computer Science & Engg., G.H. Raisoni College of Engineering, Nagpur

L.G. Malik
Department of Computer Science & Engg., G.H. Raisoni College of Engineering, Nagpur

ABSTRACT

Bioinformatics is a rapidly growing field that uses technology to solve problems in biology. The study of proteins, and in particular the prediction of a protein's structure in order to understand its function, is an important area of bioinformatics. The twenty amino acids that make up proteins carry all the information needed to convert a linear amino acid sequence into its unique, globular structure. The protein folding problem is to determine how a protein folds into its exact tertiary structure. A protein folds into its stable 3-D structure in a matter of seconds; once it is stable, it can perform its proper function. A contact map is an intermediate step in converting the 1-D structure to the 3-D structure: it is a graphical representation of how the protein folds into its proper structure. Here, six parameters are set, which use 2-norm distances to generate the contact map. Only the coordinate data of residues with the alpha carbon as contact type are considered, and each contact is mapped to 1 if the distance is greater than a threshold value or -1 if it is less than the threshold value.

Keywords

Protein Structure Prediction, Amino Acids, Contact Map, Non-local Contact Map, 2-Norm Distances.

1. INTRODUCTION

Bioinformatics is an emerging field undergoing rapid growth, fuelled mainly by advances in DNA sequencing, protein sequencing and mapping techniques. Protein structure prediction is a key topic in computational structural proteomics [5] [6].

A protein is a linear chain built from 20 kinds of amino acids, each covalently bonded to its neighbours; the amino acids in a chain are referred to as residues. These chains range in length from ten to thousands of residues, and they are an important part of most biochemical processes [3]. An amino acid is an organic compound that contains two functional groups, an amino group and a carboxyl group. The amino group, represented as $-\mathrm{NH_2}$, is basic in nature, while the carboxyl group, represented as $-\mathrm{COOH}$, is acidic. An amino acid is termed an α-amino acid if both the amino group and the carboxyl group are attached to the same carbon atom. An α-amino acid has the generic formula $\mathrm{H_2NCHRCOOH}$, where R is an organic substituent. Only the R group differs from one amino acid to another.

[Figure 1: Amino Acid Structure. Labels: amino end ($\mathrm{H_3N^+}$), alpha carbon (C, bonded to H), carboxyl end ($\mathrm{COO^-}$), side chain R.]

There are twenty different amino acids. They are represented by three-letter codes, such as ALA for alanine, GLY for glycine, PRO for proline and CYS for cysteine, and they are strung together to form the primary structure of a protein. Proteins play a central role in nearly all biological processes at the cellular level [7].

Protein folding is the process by which a protein folds itself into its 3D structure. Protein chains fold into unique, tightly packed, globular structures called folds [3]. The early work of Anfinsen and Levinthal [6] established that a protein chain folds spontaneously and reproducibly to a unique three-dimensional structure when placed in aqueous solution. The information needed for folding is contained entirely in the linear sequence of amino acids, which determines the unique fold, and the geometry of a protein fold largely determines its specific biological function. Predicting the tertiary structure of a protein from its residue sequence is called the Protein Folding Problem [4] [7].

[Figure 2: Protein 3-D Structure Prediction. A linear amino acid sequence (MET SER HIS TRP GLY LYS ASN GLY PRO GLU ...) is converted by protein folding into its structure.]

Indeed, proteins can be characterized by: (i) their primary structure, i.e. the amino acid sequence; (ii) their secondary structure, i.e. the spatial information related to the amino acid residues; (iii) their tertiary structure, which describes the spatial coordinates of each atom composing the whole protein [5]. A protein does not pass directly from primary to tertiary structure; it passes through secondary structure, in which each residue can be associated with one of two possible states, namely α-helix or β-strand.

The alpha helix (α-helix) is a common motif in the secondary structure of proteins. It is a right-handed coiled or spiral conformation. Every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier. One turn of the helix spans 3.6 amino acid residues. A single turn of the α-helix involves 13 atoms from the O to the H of the hydrogen bond [10].

[Figure 3: Alpha Helix Structure [13]]

Beta sheets consist of beta strands connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. In the parallel β-pleated sheet, adjacent chains run in the same direction (N → C or C → N). In the anti-parallel β-pleated sheet, adjacent strands run in opposite directions [8] [10].

[Figure 4: Beta Sheet Structure [13]]

Recently a lot of effort has been put into solving the Protein Folding Problem (PFP). Most approaches are based on heuristic methods, since the computational complexity of the underlying models makes their complete simulation very expensive and sometimes infeasible [1]. Reinforcement Learning (RL) is an approach to machine intelligence in which an agent learns to behave in a certain way by receiving punishments or rewards for its chosen actions [11]. A support vector machine (SVM) has been applied to the features to predict the structural relevance (i.e. whether two proteins are in the same fold or not) [12]. But the problem still remains unsolved [5]. One problem with protein data is that it is very noisy. One approach to this problem is to represent residues as contacts. The principle behind contact map generation is simply to show how the residues interact with one another. These interactions eventually lead to the tertiary structure of the protein. Using such a representation therefore reduces the dimensionality of the problem to a more manageable point [5].

This paper is organized as follows. Section II provides definitions of contact maps and non-local contacts. In Section III, the proposed model and the applied input features are described in more detail. Section IV presents experimental results. Finally, Section V provides some concluding remarks.

2. METHODOLOGY

The proposed structure calculation procedure contains three stages: contact map generation, contact map prediction and secondary structure prediction.

[Flow Chart 1: Schematic overview of the structure prediction procedure. Stages: input protein atom data; contact map generation (total residues, residue × residue symmetric matrix, selection of alpha-carbon residues, 2-norm distance); contact map prediction ($D_{i,j} > t$: value = 1; $D_{i,j} < t$: value = -1); secondary structure prediction.]

2.1. Contact Map

A contact map is an intermediate step showing how a chain of amino acids can be converted from its linear form to its tertiary structure. It can therefore be said that the contact map is a guide for protein structure prediction. A contact map is a graphical representation of the amino acid residues, and so it plays a vital role in predicting the 3D structure [6]. Two residues are said to be in contact if the carbon atom of one residue is in contact with the carbon atom of the other residue. When amino acids come into contact they give rise to various non-covalent bonds such as hydrogen bonds and hydrophobic interactions [6] [2].

2.2. Generating Contact Map

The input to contact map generation is a file containing all information about the amino acids. Consider a protein sequence $P$ with $M$ residues, $P = \{R_1, R_2, \ldots, R_M\}$. Two residues $R_i$ and $R_j$ are said to be in contact if the distance between them is less than a specified threshold $t$, that is,

$$\lVert R_j - R_i \rVert < t$$

A protein fold is normally defined on the 3D Cartesian coordinates of the amino acid atoms [3]. To decide whether two residues are in contact, the 2-norm distance is considered:

$$D = \lVert R_j - R_i \rVert = \sqrt{(X_j - X_i)^2 + (Y_j - Y_i)^2 + (Z_j - Z_i)^2}$$

where $R_i = (X_i, Y_i, Z_i)$ are the three-dimensional coordinates of residue $R_i$ [3].

For plotting the contact map, an $M \times M$ symmetric matrix is considered. The number of rows and the number of columns of this matrix both equal the number of residues in the protein sequence. A pairwise residue distance matrix can be defined as $D = [D_{i,j}]$ for each protein, and each element $M_{i,j}$ of the contact map matrix is set to 1 if the residues $(R_i, R_j)$ are in contact; otherwise it is set to -1. In other words, the contact map of a protein of length N can be displayed as a matrix whose elements are defined as in [4].

2.4. Input Parameter

There are six types of input parameters for each pair of positions in a protein sequence; they capture different aspects of the amino acids and of the positions i and j.

Sequence Length
Sequence length (SeqLgt) is the total number of amino acid residues in the input protein file.

Minimum Sequence Separation
Minimum sequence separation (MinSeqSep) is the minimum sequence separation filter used in the creation of the contact map.
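The contact map generation described in Section 2.2, together with the two parameters introduced so far, can be sketched in Python. This is a minimal illustration and not the authors' implementation. The function names, the example coordinates and the threshold value are hypothetical. A pair is treated as a contact (value 1) when its 2-norm distance is below t, following the inequality in Section 2.2; the abstract and Flow Chart 1 state the opposite orientation, so the comparison should be flipped if that convention is intended. How MinSeqSep is applied (pairs closer than MinSeqSep in the sequence are set to -1) is also an assumption.

```python
import math


def two_norm_distance(ri, rj):
    """2-norm (Euclidean) distance between two (X, Y, Z) coordinate triples."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(ri, rj)))


def contact_map(coords, t, min_seq_sep=0):
    """Build the symmetric M x M contact map from alpha-carbon coordinates.

    Entry (i, j) is 1 when D[i][j] < t and |i - j| >= min_seq_sep,
    otherwise -1. Both conventions are assumptions (see the text above).
    """
    m = len(coords)  # plays the role of SeqLgt
    cmap = [[-1] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            far_enough_in_sequence = abs(i - j) >= min_seq_sep
            close_in_space = two_norm_distance(coords[i], coords[j]) < t
            if far_enough_in_sequence and close_in_space:
                # The map is symmetric, so fill both halves at once.
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Made-up alpha-carbon coordinates for a four-residue chain (illustration only).
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.0, 0.0)]

cmap = contact_map(ca_coords, t=5.0, min_seq_sep=2)
for row in cmap:
    print(row)
```

In a real run the coordinates would be read from the alpha-carbon records of the input protein file rather than typed in by hand, and t would be chosen for the contact type under study.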
