Spatial Encoding Scheme for Protein Supporting Structure Discovery
2009 Ninth IEEE International Conference on Bioinformatics and Bioengineering
978-0-7695-3656-9/09 $25.00 © 2009 IEEE. DOI 10.1109/BIBE.2009.51

Yu-Feng Huang
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 106, Republic of China
[email protected]

Chia-Jui Yang / Yi-Wei Yang / Chien-Kang Huang*
Department of Engineering Science and Ocean Engineering, National Taiwan University, Taipei, Taiwan 106, Republic of China
{r95525053, r94525054, ckhuang}@ntu.edu.tw
(*corresponding author)

Abstract: Protein function is highly correlated with three-dimensional conformation, which allows a protein to interact with other proteins, ligands, substrates, inhibitors, etc. Binding sites and protein function have been studied to understand the mechanism of protein activity, in which protein stability and flexibility play different roles. In this work, we propose a framework to discover protein supporting structures, which employs the signatures of local structures and mines the rigid regions from the same or similar proteins. Through experiments and discussions, we heuristically determined the effective encoding method. We then apply this encoding method to discover protein supporting structures by identifying stable regions. Our results reveal that supporting structures can be discovered in selected enzyme families. Moreover, the reasons for the existence of supporting structures vary between protein families.

Keywords: spatial encoding scheme; neighborhood residues sphere; protein supporting structure discovery

I. INTRODUCTION

Recognizing similarity between a pair of proteins is a fundamentally important issue in modern molecular biology. A protein structure comparison algorithm identifies the maximum number of equivalent Cα (alpha carbon) atoms upon which to optimally align the three-dimensional structures of a protein pair. As protein structure comparison is an NP-hard problem [1], most researchers propose heuristic approaches to approximate the optimal solution. Therefore, in order to detect fold similarity and the functional or evolutionary relationship between proteins, most global structure comparison algorithms use heuristics in the initial alignment phase to quickly sieve out good candidates and then refine these candidate solutions by iterative dynamic programming to obtain optimized solutions [2]. However, as the number of protein structures increases, applying a general protein structure comparison tool to a one-against-all search of the whole Protein Data Bank (PDB) [3] becomes a great challenge.

Indexing techniques that handle the whole protein structure dataset to reduce comparison time have been investigated [4-8]. With the fast growth of protein structures in PDB (56,217 structures as of March 3, 2009), protein structure database search is becoming more and more critical for protein structure analysis [9]. Even now, a search against the whole PDB takes more than 30 minutes with the VAST [10] online search service (http://structure.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml), or about 3 to 6 minutes with EBI-SSM [11] running on multiple computing nodes (http://www.ebi.ac.uk/msd-srv/ssm/). Like a Google web page search, a protein structure database search attempts to report which protein structures are similar to the query structure (discovery) and how similar they are (ranking).

Based on functional annotations such as Gene Ontology (GO) [12] and ENZYME [13], proteins can be grouped by their main biochemical reactions or functionality. The GO project provides three structured controlled vocabularies to describe gene products in terms of their associated biological process, cellular component, and molecular function. ENZYME provides a hierarchical classification that clusters proteins by the biochemical reactions they catalyze and labels these proteins with Enzyme Commission (EC) numbers. The EC number is a hierarchical set of four digits, similar to an IP address: the first digit refers to the enzyme class; the second digit refers to the type of bond or functional group acted on; the last two digits refer to specific details about reactions and substrates. These functional annotations provide good material for studying the relationship between protein structures and protein functions.

Jones' work reveals that the tertiary structures of a protein family are much more conserved than the sequences of the proteins within the family [14]. As protein function is highly correlated with protein structure, protein function analysis has focused on local structure [15, 16]. Previous research pointed out that protein function is associated with local structure. Binkowski et al. applied a computational approach to locate pockets or voids of protein structures, including the surface template of CASTp [17, 18]. They also provide the online service pvSOAR [19] to identify similar surface regions in three-dimensional protein structures. Porter et al. manually curated functional site residues of enzymes as the template library of the Catalytic Site Atlas (CSA) [20]; furthermore, they applied PSI-BLAST to identify functional residues based on the collected dataset and thereby expand the template library semi-automatically. The evidence shows that protein function is highly correlated with protein structure, especially the local structure at the protein surface.

In this article, we propose a framework to discover the protein supporting structures of an enzyme family by encoding the 3D positions of amino acids in a local structure into one-dimensional structural signatures. Different encoding schemes were analyzed to find the most appropriate digest for fast similar structure search. A supporting structure is composed of conserved structural signatures, that is, the signatures most common among the proteins. The connections between supporting structure, protein function, protein stability, and protein flexibility can be observed from proteins in the same enzyme family. Moreover, the mechanism of protein activity could be inferred from the discovered supporting structures.

Figure 1. Flowchart of supporting structure discovery. [Steps shown, in order: A Set of Protein Structures; Neighborhood Residues Sphere Recognition; 1D Structural Signature Encoding; Signature Clustering & Supporting Structure Discovery]

II. MATERIALS AND METHOD

In this article, we discover protein supporting structures by identifying rigid structures among proteins grouped by enzyme classification, one of the hierarchical functional classifications. The materials for this framework are protein structures collected from PDB according to enzyme classification; they are described in detail in the following subsections.
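Because the materials are organized by enzyme classification, a small sketch may help make the four-field EC hierarchy concrete. The function names (parse_ec, group_by_ec_class) and the chain labels are hypothetical and are not taken from the authors' implementation; the sketch only assumes that each protein chain carries one EC number written as four dot-separated fields.

```python
from collections import defaultdict


def parse_ec(ec_number):
    """Split an EC number such as '3.4.21.4' into its four hierarchy fields.

    Fields are kept as strings because incomplete EC numbers may use '-'
    in place of a digit.
    """
    fields = ec_number.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return tuple(fields)


def group_by_ec_class(entries):
    """Group (chain_label, ec_number) pairs by the first EC field (enzyme class)."""
    groups = defaultdict(list)
    for chain_label, ec_number in entries:
        enzyme_class = parse_ec(ec_number)[0]
        groups[enzyme_class].append(chain_label)
    return dict(groups)


# Hypothetical chain labels; the EC numbers are only illustrative inputs.
toy_entries = [
    ("chain_a", "1.1.1.1"),
    ("chain_b", "3.4.21.4"),
    ("chain_c", "3.1.1.1"),
]
grouped = group_by_ec_class(toy_entries)
```

Grouping on a longer prefix of the tuple returned by parse_ec (for example, all four fields) would give finer-grained families; which level the authors used for a "family" is not specified here, so that choice is left open.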
Next, we describe our framework for discovering the supporting structures of proteins in an enzyme family. As shown in Figure 1, the overall framework consists of three major components: (1) local structure construction; (2) 1D structural signature encoding; (3) signature clustering and supporting structure discovery.

A. Dataset

According to the ENZYME Data Bank, we first collect proteins labeled with EC numbers from the SWISS-PROT database [25]. For each protein, structure information can be traced to PDB via the structure annotations in the database cross-references of the SWISS-PROT data. Therefore, the mapping among EC number, SWISS-PROT ID, and PDB ID can be constructed from the cross-reference information across multiple databases. In this article, we focus on enzyme families of EC class 1, oxidoreductases, and EC class 3, hydrolases, because these two classes make up the majority among all six EC classes. For each family, we select high-quality structures with resolution better than 2.0 Å, and only enzyme families with more than three protein chains are selected for supporting structure discovery.

B. Local Structure Construction

We now define local structure representation for protein three-dimensional structure. Our original idea comes from

Figure 2. Neighborhood residues sphere in two-dimensional (2D) sketch. The gray part is the area within 10 Å radius surrounding a central residue G. [Sketch labels: N-terminal, C-terminal, residues I, L, T, W, G, Y, C, A]

From the historical lock-and-key model to the modern concept of induced fit, the connection between protein function and protein flexibility has been established. In order to achieve multiple functions, a protein remains flexible so that it can interact with different substrates or ligands. On the other hand, in order to support the protein functional site, rigid regions exist close to the functional residues and act as a supporting structure. Scheeff et al. discovered evidence of structural evolution in the kinase-like protein superfamily, and they found that