Network Analysis of Protein Structures: the Comparison of Three Topologies

Current Bioinformatics, 2016, Vol. 11, No. 3, 000-000

RESEARCH ARTICLE

Wenying Yan1,2, Guang Hu1,* and Bairong Shen1,*

1Center for Systems Biology, Soochow University, Suzhou 215006, China and 2Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou 215006, China

ARTICLE HISTORY
Received: September 17, 2014
Revised: August 10, 2015
Accepted: October 20, 2015
DOI: 10.2174/1574893611666160602124707

Abstract: Topology plays a central role in the structure of a protein, and network theoretical methods are increasingly applied to investigate protein topology. In this paper, amino acid contact energy networks (AACENs) are constructed for globular, transmembrane and toroidal proteins. The effects of topology on proteins are investigated through the differences in various network parameters among the three kinds of protein topologies. Globular proteins are found to have the highest network density, average closeness and system vulnerability, while toroidal proteins have the lowest values of these parameters. Transmembrane proteins are found to have significantly higher assortativity values than globular and toroidal proteins. AACENs are also constructed and compared for proteins with different secondary structure compositions, whose influences on biological functions are discussed in terms of topological descriptors. Extracting sub-networks that include only the interfacial residues between different chains may provide a simple and straightforward method to identify hot spots of toroidal proteins. This network study offers new insight into the overall topology and structural organization of different types of proteins.

Keywords: Amino acid network, contact energy, protein topology, secondary structure, symmetry, toroidal proteins.

*Address correspondence to this author at the Center for Systems Biology, Soochow University, Suzhou 215006, China; E-mails: [email protected]; [email protected]

1574-8936/16 $58.00+.00 © 2016 Bentham Science Publishers

1. INTRODUCTION

Topology has attracted the attention of mathematicians since ancient times. Recent advances have refreshed this area by investigating the structural properties of living systems, from biological molecules to networks. Proteins are biological macromolecules consisting of one or more linear polymer chains of amino acids, typically folded into globular or other forms by packing different secondary structures [1-3]. The topological analysis of protein structures is essential to fully understand their biological functions. Current bioinformatics has benefited greatly from applying the methods of network theory to characterize protein structures [4-10]. Amino acid networks regard a protein as a network in which the nodes represent the amino acids and the edges represent different types of interactions among them. This representation not only provides a powerful framework to uncover the general organizing principles of protein structures [11], but also enhances our understanding of various functions such as protein folding [12, 13] and stability [14, 15], identification of functional residues [16], discrimination of protein decoys [17], and protein-protein interactions [18, 19].

The construction and analysis of amino acid networks have mostly focused on globular proteins, but they have recently also been applied to transmembrane proteins. Bagler and Sinha [20] first showed that globular proteins exhibit small-world behavior regardless of their structural class. Hydrophobic, hydrophilic and other types of amino acid networks have been delineated to further understand the physico-chemical properties of globular proteins [21-23]. Network approaches have been extended to transmembrane proteins by elucidating the packing topology of structurally important residues [24]. More recently, Emerson and Gothandam [25] explored the topological differences between transmembrane and globular proteins in terms of network parameters, including assortativity values and closeness centrality. They found that residue centrality is a good descriptor for predicting functionally important residues [26]. Toroidal proteins are the third type of proteins of importance [1]. Due to their oligomerisation status and overall packing topology, toroidal proteins provide ideal models to investigate protein-protein interactions. However, the network analysis of toroidal proteins is still in its infancy, and poses an imminent challenge in structural proteomics [27-29].

In this study, amino acid contact energy networks (AACENs) [30], where edges are established according to environment-dependent residue-residue contact energies, are constructed for globular, transmembrane and toroidal proteins. The investigation focuses not only on their small-world properties, but also on the comparison of their global topologies using four topological parameters, namely network density, average closeness, system vulnerability, and assortative-type mixing behavior. The biological importance of each parameter and the network differences based on secondary structure composition are discussed. In addition, we propose a simple method of extracting sub-networks from AACENs to detect possible hot spots of toroidal proteins. We hope that such comparative network analysis sheds light on the architectural organization of proteins.

2. MATERIALS AND METHODS

2.1. Datasets

The three-dimensional structures of globular and transmembrane proteins were selected from the structures used by Emerson et al. [25], which contain 61 globular proteins and 61 transmembrane proteins, respectively. In our analysis, we set up a dataset composed of 45 toroidal protein structures from the PDB database, which includes oligomers with 2 to 14 chains. A dataset of hot spots for one type of interface of toroidal proteins was predicted with the HotRegion server [31]. The full list of PDB ids of the three protein sets and the hot spots of toroidal proteins is given in the Supplementary Material.

2.2. Amino Acid Contact Energy Networks

The three-dimensional structures of proteins were modeled as amino acid contact energy networks (AACENs), a type of graph whose vertices are amino acids and whose edges are established between residues when the environment-dependent residue-residue contact energy (ERCE) is smaller than zero. The ERCE between residues i and j was defined as [32]

$$ e_{ij} = -\ln\left(\frac{N_{ij}\,N_{00}\,C_{i0}\,C_{j0}}{N_{i0}\,N_{j0}\,C_{ij}\,C_{00}}\right) \tag{1} $$

where $N_{ij}$, $N_{i0}$, $N_{j0}$, and $N_{00}$ are the contact numbers from the known structures, and $C_{ij}$, $C_{i0}$, $C_{j0}$, and $C_{00}$ are the corresponding parameters expected in a reference state. Accordingly, AACENs based on ERCEs take into account the type of secondary structure of each residue in a protein. The contact energies of each protein were calculated by RankViaContact [33, 34].

Sub-networks were also constructed for toroidal proteins based on contact energy. For an interface between chains A and B, a pair of nodes is connected when a residue on chain A has a direct connection with a residue on chain B. As such, a sub-network can be represented as a bipartite graph with two node sets, extracted from chains A and B.

2.3. Calculation of Network Metrics

The clustering coefficient C is a fundamental network metric for describing the hierarchical structure of proteins. It normalizes the number of edges between the first neighbors of a vertex by dividing it by the maximal number of such edges. Formally, C is computed as

$$ C = \frac{1}{N}\sum_{k=1}^{N}\frac{n_k}{N_k(N_k-1)/2} \tag{2} $$

where, for the k-th node, $N_k$ is the number of its neighbors and $n_k$ is the number of contacts among them.

The characteristic path length L is defined as the number of edges in the shortest path between two nodes, averaged over all pairs of nodes:

$$ L = \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} L_{ij} \tag{3} $$

where N denotes the number of nodes and $L_{ij}$ is the shortest path length between nodes i and j.

The network density δ is defined as the ratio between the number of edges in a graph and the maximum possible number of edges, which can be obtained by [35]

$$ \delta = \frac{2M}{N(N-1)} \tag{4} $$

where M denotes the number of edges of the network.

The closeness centrality $C_i$ of node i is the reciprocal of its average shortest path length, which can be calculated by:

$$ C_i = \frac{N-1}{\sum_{j \in U,\, j \neq i} L_{ij}} \tag{5} $$

where U is the set of all nodes. The average closeness <C> is calculated by averaging the closeness centrality over all vertices.

The system vulnerability V of a network is the maximal value of the pointwise vulnerability V(i), which can be calculated by [36]

$$ V(i) = \frac{E - E(i)}{E} \tag{6} $$

$$ E = \frac{1}{N(N-1)}\sum_{i \neq j}\frac{1}{L_{ij}} \tag{7} $$

where E is the efficiency of the network and E(i) is the network efficiency after removal of the i-th vertex and all its edges.

The assortativity of a network can be described by the assortative mixing coefficient r, which is given by [37, 38]

$$ r = \frac{1}{\sigma_q^{2}}\sum_{jk} jk\,\left(e_{jk} - q_j q_k\right) \tag{8} $$

where j and k are the degrees of nodes, $q_j$ and $q_k$ are the remaining degree distributions, $e_{jk}$ is the joint distribution of the remaining degrees of the two nodes at either end of a randomly chosen link, and $\sigma_q^{2}$ is the variance of the distribution $q_k$.

2.4. Measurements of Prediction Performance

The performance assessments of hot spot prediction are based on the following definitions:

$$ \text{Sensitivity: } S = \frac{TP}{TP + FN} \tag{9} $$

$$ \text{Specificity: } C = \frac{TN}{FP + TN} \tag{10} $$

$$ \text{Precision: } P = \frac{TP}{TP + FP} \tag{11} $$

$$ \text{Accuracy: } A = \frac{TP + TN}{TP + FP + TN + FN} \tag{12} $$

where TP, TN, FP and FN stand for the numbers of true positives, true negatives, false positives and false negatives, respectively.

[...] and β-barrel proteins, which both adopt a cylindrical topology. The most important type of α-helical proteins are ion channels, which are key components in a wide variety of biological processes. β-barrels composed of a single polypeptide chain, or several polypeptide chains such as G-protein coupled receptor (GPCR). GPCRs contain seven transmembrane helices that are connected by three intracellular and three extracellular loops; they are involved in many diseases and are the most privileged drug targets.
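To make the definitions of Section 2.3 concrete, the following is a minimal sketch of the clustering coefficient, characteristic path length, network density, closeness centrality and system vulnerability (Eqs. 2 to 7) on a toy network. It is an illustration only and not the authors' implementation: the four-node graph, the function names, and two conventions are assumptions, namely that nodes with fewer than two neighbors contribute zero to Eq. 2 and that E(i) is evaluated on the reduced network of N-1 nodes.

```python
from collections import deque
from itertools import combinations


def bfs_lengths(adj, source):
    """Shortest path lengths (edge counts) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def density(adj):
    """Eq. (4): 2M / (N(N-1))."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def clustering(adj):
    """Eq. (2): mean over nodes of n_k / (N_k(N_k-1)/2)."""
    total = 0.0
    for nb in adj.values():
        if len(nb) < 2:
            continue  # assumption: fewer than 2 neighbours contributes 0
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += links / (len(nb) * (len(nb) - 1) / 2)
    return total / len(adj)


def path_length(adj):
    """Eq. (3); the ordered-pair sum counts each unordered pair twice."""
    n = len(adj)
    ordered_sum = sum(sum(bfs_lengths(adj, s).values()) for s in adj)
    return ordered_sum / (n * (n - 1))


def closeness(adj, i):
    """Eq. (5): (N-1) / sum_j L_ij, assuming a connected network."""
    return (len(adj) - 1) / sum(bfs_lengths(adj, i).values())


def efficiency(adj):
    """Eq. (7); unreachable pairs contribute 0."""
    n = len(adj)
    total = sum(1 / d for s in adj for d in bfs_lengths(adj, s).values() if d > 0)
    return total / (n * (n - 1))


def vulnerability(adj):
    """Eq. (6): returns (max_i V(i), {i: V(i)})."""
    e_full = efficiency(adj)
    scores = {}
    for i in adj:
        # Remove vertex i and all its edges (assumption: E(i) uses N-1 nodes).
        reduced = {u: nb - {i} for u, nb in adj.items() if u != i}
        scores[i] = (e_full - efficiency(reduced)) / e_full
    return max(scores.values()), scores


# Hypothetical toy network: a triangle (0, 1, 2) with pendant node 3 on node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

avg_closeness = sum(closeness(adj, i) for i in adj) / len(adj)
v_max, v_by_node = vulnerability(adj)
print(density(adj), clustering(adj), path_length(adj), avg_closeness, v_max)
```

In this toy network node 2 is the most vulnerable vertex, because removing it disconnects the pendant node and leaves only one edge; a real AACEN would be built from the ERCE criterion of Section 2.2 rather than from a hand-written edge list.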
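The assortative mixing coefficient of Eq. 8 can be sketched in the same spirit. The version below computes it as the Pearson correlation between the degrees found at the two ends of each edge, which should agree with Eq. 8 because shifting degrees to remaining degrees (degree minus one) does not change a correlation. The function name and the example edge lists are hypothetical, and the function is undefined (division by zero) for networks in which all nodes have the same degree.

```python
def assortativity(edges):
    """Degree assortativity r of an undirected network given as an edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Each undirected edge is counted in both orientations so that the two
    # "ends" of an edge are treated symmetrically.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean * mean
    var = sum(x * x for x in xs) / n - mean * mean
    return cov / var


# Hypothetical examples: a star (hub joined only to leaves) and a triangle
# with one pendant node.
star = [(0, 1), (0, 2), (0, 3)]
triangle_with_tail = [(0, 1), (1, 2), (0, 2), (2, 3)]
r_star = assortativity(star)
r_tail = assortativity(triangle_with_tail)
print(r_star, r_tail)
```

A negative r indicates that high-degree nodes tend to connect to low-degree nodes (disassortative mixing), as in the star, whereas positive values indicate that nodes of similar degree tend to be linked.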
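The performance measures of Section 2.4 (Eqs. 9 to 12) can be illustrated with a short sketch. The residue numbers, set names and function names below are hypothetical and are not taken from the paper's datasets; the sketch only assumes that predicted and reference hot spots are both given as sets of residue identifiers drawn from a common set of interface residues.

```python
def confusion_counts(predicted, actual, universe):
    """Return (TP, TN, FP, FN) for predicted vs. reference hot spots."""
    predicted, actual, universe = set(predicted), set(actual), set(universe)
    tp = len(predicted & actual)             # predicted and truly hot
    fp = len(predicted - actual)             # predicted but not hot
    fn = len(actual - predicted)             # hot but missed
    tn = len(universe - predicted - actual)  # correctly left out
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Eqs. (9)-(12): sensitivity, specificity, precision, accuracy."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical interface of 20 residues with made-up hot spot labels.
interface_residues = range(1, 21)
reference_hot_spots = {2, 5, 7, 11, 13}
predicted_hot_spots = {2, 5, 7, 12, 13, 17}

counts = confusion_counts(predicted_hot_spots, reference_hot_spots, interface_residues)
scores = performance(*counts)
print(counts, scores)
```

Reporting all four measures together is useful here because hot spots are usually a small fraction of the interface residues, so accuracy alone can look high even when sensitivity is poor.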
