Application of Bioinformatics on Protein Structure Prediction
Total Page:16
File Type:pdf, Size:1020Kb
2015 3rd AASRI Conference on Computational Intelligence and Bioinformatics (CIB 2015) ISBN: 978-1-60595-308-3 Application of Bioinformatics on Protein Structure Prediction Xiaodan Liu Jilin Agriculture Science and Technology College, Jilin, China ABSTRACT: With the advent of the information age, bioinformatics has been rapid development in many fields to be fully applied. This paper describes the establishment and development of common database of protein, protein structure analysis, protein secondary and tertiary structure prediction; The application of bioinformatics in protein structure analysis is comprehensively summarized, which provides theoretical basis for the wide application of bioinformatics in protein engineering. 1. INTRODUCTION 2. PROTEIN STRUCTURE DATABASE Bioinformatics provides a shortcut for the protein According to the data in the database based on the function prediction and structural analysis by analysis of the structure and function of the natural directly using gene or protein sequences. Which is protein, the spatial structure and biological function formed by the interaction of biological and computer of a certain amino acid sequence can be predicted, science and applied mathematics. It is processed by and the amino acid sequence and spatial structure of the data acquisition, processing, storage, retrieval the protein can be designed according to the specific and analysis, to explain the biological significance biological function. of the data. As the genome and proteome research The basic three-dimensional structure of the provides a very rich data, we need to make a high protein database is Protein Data Bank((PDB; degree of automation of these data management, http://www.pdb.org/), founded by Brookhaven National establish a strict database and write the Laboratory in New York in 1971,which is the corresponding software. This work greatly promoted database of the most important biological the development of bioinformatics, and macromolecules (proteins, nucleic acids and sugars) bioinformatics will also play a special role in the three- dimensional structure (Sussman et al.,1998; research of structure prediction (Hagen, 2000). Berman et al., 2000; Westbrook et al., 2003).The The emergence of bioinformatics has made the database is accurate coordinate data collected in the protein project into a new era. In the future, more protein structure by X-ray diffraction and nuclear detailed knowledge of protein structure and function, magnetic resonance (NMR) experiment measured. as well as advancements in high throughput Protein structure data in PDB are widely used in technology, may greatly expand the capabilities of studies of protein function and evolution, and they protein structure prediction. serve as a basis for protein structure prediction (He Protein structure prediction is an important task et al., 2014). in bioinformatics. To a large extent, the biological PDB Finder database (http://www.cmbi.kun.nl/ function of protein depends on its spatial structure, swift/pdbfinder/) is built on the basis of PDB, DSSP, so it is very important to predict the structure of HSSP, the two level library, which contains the PDB proteins. Sometimes a possible new gene cannot be sequence, the author, R factor, resolution, Secondary found by searching for any homologous sequence, Structure, etc. these information is not easy to read namely orphan gene, and for proteins encoded by directly from the PDB, with the PDB library every those genes, bioinformatics technology can be used time a new version, PDBFinder automatically to search for homologous genes based on the generated in EBI. homologous comparison of structure or predict its Homology Derived Secondary Structure of Proteins advanced structure to infer its function directly. (HSSP;http://www.sander.embl-heidelberg.de/hssp/) is the secondary structure database based on the homology of the two protein structure. Each PDB project has a corresponding HSSP file (Dodge et al., 1998). The database also provides the homology of all protein sequences in the SWISS-PROT database. 109 The Structural Classification of Proteins data- assigning regions of the amino acid sequence as base(SCOP; http://scop.mrc-lmb.cam.ac.uk/scop/)is a likely alpha helices, beta strands (often noted as comprehensive ordering of all proteins of known "extended" conformations), or turns. The different structure, according to their evolutionary and amino acid residues have different tendencies for the structural relationships. which is obtained by manual formation of different secondary structures. The classification, in which the proteins of known success of a prediction is determined by comparing structure are hierarchical classified (Murzin et al., it to the results of the DSSP algorithm (or similar e.g. 1995; Andreeva et al., 2004).This resource allows STRIDE) applied to the crystal structure of the users to analyze whether the query protein and protein. Specialized algorithms have been developed known structural protein have similarities. Protein for the detection of specific well-defined patterns domains in SCOP are hierarchically classified into such as transmembrane helices and coiled coils in families, superfamilies, folds and classes. proteins (Mount, 2004). The prediction accuracy of Molecular Modeling Database (MMDB; a single sequence of secondary structure is about http://www.ncbi.nih.gov/Structure/MMDB/mmdb. shtml), 60% ~80% (Petersen et al., 2000). The best modern This is a three-dimensional structure database used methods of secondary structure prediction in by the Entrez search tool (Wang et al., 2000). ASN,1 proteins reach about 80% accuracy (Pirovano format, which reflects the structure and sequence &Heringa,2010); this high accuracy allows the use data of PDB Library. Meanwhile, a complete set of of the predictions as feature improving fold 3D structure display program Cn3D is provided by recognition and ab initio protein structure prediction, NCBI. classification of structural motifs, and refinement of MUFOLD-DB(http://mufold.org/mufolddb.php), a sequence alignments.10 kinds of important and web-based database, to collect and process the effective methods were tested and analyzed using weekly PDB files thereby providing users with non- the unified standard, the comparison results show: redundant, cleaned and partially predicted structure the prediction effect of APSSP2, SsPro2.0 and data. For each of the non-redundant sequences, the PSIPRED method is better, the accuracy of the SCOP domain classification and predict structures prediction is more than 80% (Zhang et al.,2003). of missing regions are annotated by loop modelling. So far, more than 20 different secondary structure In addition, evolutional information, secondary stru- prediction methods have been developed. First cture, disorder region, and processed three- algorithms was Chou-Fasman method, which relies dimensional structure are computed and visualized predominantly on probability parameters determined tohelp users better understand the protein(He et al., from relative frequencies of each amino acid's 2014). appearance in each type of secondary structure(Chou et al.,1974). The next notable program was the GOR method, is an information 3. PROTEIN STRUCTURE PREDICTION theory-based method. Which employs the more powerful probabilistic technique of Bayesian Protein structure prediction is the prediction of the inference(Garnier et al.,1978).Another big step three-dimensional structure of a protein from its forward, was using machine learning methods. First amino acid sequence-that is, the prediction of its artificial neural networks methods were used. As a folding and its secondary, tertiary, and quaternary training sets they use solved structures to identify structure from its primary structure. Structure common sequence motifs associated with particular prediction is fundamentally different from the arrangements of secondary structures. These inverse problem of protein design. Protein structure methods are over 70% accurate in their predictions, prediction is one of the most important goals although beta strands are still often underpredicted pursued by bioinformatics and theoretical chemistry; due to the lack of three-dimensional structural it is highly important in medicine (for example, in information that would allow assessment of drug design) and biotechnology (for example, in the hydrogen bonding patterns that can promote design of novel enzymes) (https://en.wikipedia. formation of the extended conformation required for org/wiki/Protein_structure_pre-diction). the presence of a complete beta sheet (Mount, 2004). PSIPRED and JPRED are some of the most known Protein secondary structure prediction 3.1 programs based on neural networks for protein Protein secondary structure prediction is a set of secondary structure prediction (Cuff, 2000). techniques in bioinformatics that aim to predict the The predicted secondary structural states are not local secondary structures of proteins based only on cross validated by any of the existing servers. Hence, knowledge of their amino acid sequence only. information on the level of accuracy for every Which plays a vital role and acts as an intermediate sequence is not reported by the existing servers. This in solving tertiary structures; which provides an was overcome by NNvPDB, which not only insight in to protein function (Kloczkowski et reported greater Q3 but also validates every al.,2002). For proteins, a prediction consists of prediction with the homologous PDB entries.