
Current Bioinformatics, 2016, 11, 000-000

RESEARCH ARTICLE

Network Analysis of Structures: The Comparison of Three Topologies

Wenying Yan1,2, Guang Hu1,* and Bairong Shen1,*

1Center for Systems Biology, Soochow University, Suzhou 215006, China and 2Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou 215006, China

Abstract: Topology plays a central role in the structure of a protein. Network theoretical methods are being increasingly applied to investigate protein topology. In this paper, amino acid contact energy networks (AACENs) are constructed for globular, transmembrane and toroidal proteins. The effects of topology on proteins are investigated through the differences in various network parameters among the three kinds of protein topologies. Globular proteins are found to have the highest network density, average closeness and system vulnerability, while toroidal proteins have the lowest values of these parameters. Transmembrane proteins are found to have significantly higher assortativity values than globular and toroidal proteins. AACENs are also constructed and compared for proteins with different secondary structure compositions, whose influences on biological functions are discussed in terms of topological descriptors. Extracting sub-networks that include only the interfacial residues between different chains may provide a simple but straightforward method to identify hot spots of toroidal proteins. This network study offers new insight into the overall topology and structural organization of different types of proteins.

ARTICLE HISTORY
Received: September 17, 2014
Revised: August 10, 2015
Accepted: October 20, 2015
DOI: 10.2174/1574893611666160602124707

Keywords: Amino acid network, contact energy, protein topology, secondary structure, symmetry, toroidal proteins.

1. INTRODUCTION

Topology has attracted the attention of mathematicians since ancient times. Recent advances, however, have refreshed this area by investigating structural properties in living systems, from biological molecules to networks. Proteins are biological macromolecules consisting of one or more linear polymer chains of amino acids, typically folded into globular or other forms by packing different secondary structures [1-3]. The topological analysis of protein structures is essential to fully understand their biological functions. Current bioinformatics has benefited greatly from applying the methods of network theory to characterize protein structures [4-10]. Amino acid networks regard a protein as a network in which the nodes represent the amino acids and the edges represent different types of interactions among them. This representation not only provides a powerful framework to uncover the general organizing principles of protein structures [11], but also enhances our understanding of various functions such as protein folding [12, 13] and stability [14, 15], functional residue identification [16], protein decoy discrimination [17], and protein-protein interactions [18, 19].

The construction and analysis of amino acid networks have mostly focused on globular proteins, but have recently also been applied to transmembrane proteins. Bagler and Sinha [20] first showed that globular proteins exhibit small-world behavior regardless of their structural class. Hydrophobic, hydrophilic and other types of amino acid networks have been delineated to further understand the physico-chemical properties of globular proteins [21-23]. Network approaches have been extended to transmembrane proteins by elucidating the packing topology of structurally important residues [24]. More recently, Emerson and Gothandam [25] explored the topological differences between transmembrane and globular proteins in terms of network parameters, including assortativity values and closeness centrality. They found that the residue centrality is a good descriptor for predicting functionally important residues [26].

Toroidal proteins are the third important type of proteins [1]. Owing to their oligomerisation status and overall packing topology, toroidal proteins provide ideal models to investigate protein-protein interactions. However, the network analysis of toroidal proteins is still in its infancy, and poses an imminent challenge in structural proteomics [27-29].

In this study, amino acid contact energy networks (AACENs) [30], where edges are established according to environment-dependent residue-residue contact energies, are constructed for globular, transmembrane and toroidal proteins. This investigation focuses not only on their small-world properties, but also on a comparison of their global topologies using four topological parameters, namely network density, average closeness, system vulnerability, and assortative-type mixing behavior. The biological importance of each parameter and the network differences based on secondary structure composition are discussed. In addition, we propose a simple method of extracting sub-networks from AACENs to detect possible hot spots of toroidal proteins. We hope that such a comparative network analysis sheds light on the architectural organization of proteins.

*Address correspondence to this author at the Center for Systems Biology, Soochow University, Suzhou 215006, China; E-mails: [email protected]; [email protected]

1574-8936/16 $58.00+.00 © 2016 Bentham Science Publishers
Current Bioinformatics, 2016, Vol. 11, No. 3
Yan et al., AACENs for Proteins with Three Topologies

2. MATERIALS AND METHODS

2.1. Datasets

The three-dimensional structures of globular and transmembrane proteins were selected from the structures used by Emerson et al. [25], which contain 61 globular proteins and 61 transmembrane proteins, respectively. In our analysis, we carefully set up a dataset composed of 45 toroidal protein structures from the PDB database, which includes oligomers from 2 chains to 14 chains. A dataset of hot spots for one type of interface of toroidal proteins was predicted by the HotRegion server [31]. The full list of PDB ids of the three protein sets and the hot spots of toroidal proteins is given in the Supplementary Material.

2.2. Amino Acid Contact Energy Networks

The three-dimensional structures of proteins were modeled as amino acid contact energy networks (AACENs), a type of graph whose vertices are amino acids and whose edges are established between residues when the environment-dependent residue-residue contact energy (ERCE) is smaller than zero. The ERCE between residues i and j was defined as [32]

$$e_{ij} = -\ln\left(\frac{N_{ij}\,N_{00}\,C_{i0}\,C_{j0}}{N_{i0}\,N_{j0}\,C_{ij}\,C_{00}}\right) \qquad (1)$$

where N_ij, N_i0, N_j0, and N_00 are the contact numbers from the known structures, and C_ij, C_i0, C_j0, and C_00 are the corresponding parameters expected in a reference state. Accordingly, AACENs based on ERCEs take into account the type of secondary structure of each residue in a protein. The contact energies of each protein were calculated by RankViaContact [33, 34].

Sub-networks were also constructed for toroidal proteins based on contact energy. For an interface between chains A and B, a pair of nodes is connected when a residue on chain A has a direct connection with a residue on chain B. As such, a sub-network can be represented as a bipartite graph with two node sets, which are extracted from chain A and chain B.

2.3. Calculation of Network Metrics

The clustering coefficient C is a fundamental network metric for describing the hierarchical structure of proteins. It normalizes the number of edges between the first neighbors of a vertex by dividing it by the maximal number of such edges. Formally, C is computed as

$$C = \frac{1}{N}\sum_{k=1}^{N}\frac{n_k}{N_k(N_k-1)/2} \qquad (2)$$

where, for the kth node, N_k is the number of its neighbors and n_k is the number of contacts among them.

The characteristic path length L is defined as the number of edges in the shortest path between two nodes, averaged over all pairs of nodes:

$$L = \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} L_{ij} \qquad (3)$$

where N denotes the number of nodes and L_ij is the shortest path length between nodes i and j.

The network density δ is defined as the ratio between the number of edges in a graph and the maximum possible number of edges, which can be obtained by [35]

$$\delta = \frac{2M}{N(N-1)} \qquad (4)$$

where M denotes the number of edges in the network.

The closeness centrality C_i of node i is the reciprocal of its average shortest path length, which can be calculated by

$$C_i = \frac{N-1}{\sum_{j \in U,\, j \neq i} L_{ij}} \qquad (5)$$

where U is the set of all nodes. The average closeness <C> is calculated by averaging the closeness centrality over all vertices.

The system vulnerability V of a network is the maximal value of the pointwise vulnerability V(i), which can be calculated by [36]

$$V(i) = \frac{E - E(i)}{E} \qquad (6)$$

$$E = \frac{1}{N(N-1)}\sum_{i \neq j}\frac{1}{L_{ij}} \qquad (7)$$

where E is the network efficiency and E(i) is the network efficiency after removal of the ith vertex and all its edges.

The assortativity of a network can be described by the assortative mixing coefficient r, which is given by [37, 38]

$$r = \frac{1}{\sigma_q^{2}}\sum_{jk} jk\,(e_{jk} - q_j q_k) \qquad (8)$$

where j and k are the degrees of nodes, q_j and q_k are the remaining degree distributions, e_jk is the joint distribution of the remaining degrees of the two nodes at either end of a randomly chosen link, and σ_q^2 is the variance of the distribution q_k.

2.4. Measurements of Prediction Performance

The performance assessments of hot spot prediction are based on the following definitions:

$$\text{Sensitivity: } S = \frac{TP}{TP + FN} \qquad (9)$$

$$\text{Specificity: } C = \frac{TN}{FP + TN} \qquad (10)$$

$$\text{Precision: } P = \frac{TP}{TP + FP} \qquad (11)$$

$$\text{Accuracy: } A = \frac{TP + TN}{TP + FP + TN + FN} \qquad (12)$$

where TP, TN, FP and FN stand for the numbers of true positives, true negatives, false positives and false negatives, respectively. Here, a positive is a residue that is a hot spot. Sensitivity, specificity, precision and accuracy have been calculated for all interfacial residues between chain A and chain B of the toroidal proteins in the dataset (except for one protein with PDB code 3p87).

3. RESULTS AND DISCUSSIONS

3.1. Three Topologies of Proteins

Proteins perform biological functions by folding linear polymer chains into specific spatial topologies, which adopt globular, cylindrical and toroidal shapes.

Globular forms are the most common three-dimensional structures of proteins. Globular proteins can be classified into structural classes according to their secondary structures [20]. The α proteins are composed predominantly of α-helices, and the β proteins of β-sheets. The α + β and α/β proteins have a mixed composition of α-helices and β-sheets. The difference is that the α + β proteins mainly have anti-parallel β-sheets, whereas the α/β proteins consist mainly of parallel β-sheets. As an example, the three-dimensional structure of a globular protein and its AACEN are shown in Fig. 1.

Transmembrane proteins are membrane proteins that span the entirety of the biological membrane to which they are permanently attached. They play a central role in mediating a broad range of fundamental cellular activities such as signal transduction, cell trafficking and photosynthesis. Transmembrane proteins consist of α-helical and β-barrel proteins, both of which adopt a cylindrical topology. The most important type of α-helical proteins are ion channels, which are key components in a wide variety of biological processes. β-barrels are composed of a single polypeptide chain or of several polypeptide chains. G-protein coupled receptors (GPCRs) contain seven transmembrane helices connected by three intracellular and three extracellular loops; they are involved in many diseases and are the most privileged drug targets. The three-dimensional structure of a GPCR and its AACEN are shown in Fig. 2.

Toroidal proteins perform numerous important functions in biological systems [1]. These donut-shaped structures are quite common forms for enzymes, which embrace DNA or RNA in their central holes. The ring shape has the advantage of generating multiple identical binding sites on the hole around the DNA or RNA. Toroidal proteins are known to exist as dimers and higher oligomers, from 2-mers to 14-mers in our study. They comprise many different types of polypeptide chains and topologically different quaternary associations. In contrast to the classification of globular and transmembrane proteins by their composition of α-helices and β-sheets, toroidal proteins were divided into two sub-types: DNA sliding clamps and RNA polymerases. For example, the E. coli DNA Polymerase III β subunit (β clamp) is a homodimer that forms a β-wheel lined by 12 α-helices on the inner surface and six β-sheets on the outer surface [39], while the Trp RNA-binding attenuation protein (TRAP) is an 11-mer, with each sheet containing 4 antiparallel β-strands from one monomer and 3 β-strands from the adjacent monomer in the ring [40]. These two kinds of oligomeric assemblies, however, share a similar interface geometry made by the association of two individual β-strands. The three-dimensional structure of the β clamp and its AACEN are shown in Fig. 3.

Fig. (1). The three-dimensional structure of a globular protein (pdb code: 1avg) and its AACEN. In the AACEN representation, red nodes indicate α-helices and blue nodes indicate β-sheets. The edges across the circle represent long-range contacts, while the edges along the circle boundary denote short-range contacts in the protein tertiary structure. The ring graph representation of the AACEN was generated by RINalyzer.

Fig. (2). The three-dimensional structure of a transmembrane protein (pdb code: 2z73) and its AACEN.

Fig. (3). The three-dimensional structure of a toroidal protein (pdb code: 2pol) and its AACEN.

3.2. General Small-world Properties

We have calculated L and C for each type of protein and for the related random networks, as shown in Fig. 4. Our network construction method is environment-dependent, so we expect it to distinguish the local micro-environment (secondary structures) better than other known methods, such as those based on a cut-off distance. We have further compared L and C among the subtypes in each topological group, as listed in Table 1. The comparison of the four structural classes of globular proteins (α, β, α+β, α/β) is shown in Fig. 4(a), and that of the two structural classes of transmembrane proteins in Fig. 4(b). It should be noted that toroidal proteins cannot be classified according to secondary structural elements, so Fig. 4(c) shows the comparison of the L–C plots for DNA sliding clamps and RNA polymerases.

The results show that all types of amino acid networks have a significantly higher clustering coefficient than their random counterparts, but a similar average shortest path length. The clustering coefficients for AACENs of globular, transmembrane and toroidal proteins are 0.51±0.04, 0.49±0.04 and 0.48±0.02, whereas those of their random counterparts are 0.04±0.02, 0.02±0.01 and 0.01, respectively. Thus, AACENs of globular, transmembrane and toroidal proteins belong to "small-world" networks [41]. Statistically significant differences in the clustering coefficient were also found among sub-classes of both globular and transmembrane proteins. The average of the distribution of C is 0.57 ± 0.02 for α globular protein networks and 0.48 ± 0.02 for β globular protein networks, with a p-value of 6.25E-06. The average of the distribution of C is 0.53 ± 0.02 for α transmembrane networks and 0.46 ± 0.02 for β transmembrane networks, with a p-value of 5.52E-13. Therefore, with the proposed network approach, α-helices are found to have a statistically significantly higher clustering coefficient than β-sheets. This is caused by the denser packing of amino acids in α-helical structures than in β-barrel shaped structures. For toroidal proteins, statistical significance was found not for the clustering coefficient but for the average shortest path length. The average of the distribution of L is 10.84 ± 1.00 for DNA sliding clamps and 9.39 ± 2.01 for RNA polymerases, with a p-value of 0.0032 (<0.005). This is because DNA sliding clamps need a larger hole to bind the larger DNA molecules.

Table 1. Clustering coefficient C, characteristic path length L, density δ, average closeness <C>, system vulnerability V and assortative mixing coefficient r of amino acid networks for subclasses of globular, transmembrane and toroidal proteins.

| Protein topology | Subclass | C | L | δ | <C> | V | r |
|---|---|---|---|---|---|---|---|
| Globular proteins | alpha | 0.57±0.02 | 5.44±1.21 | 0.05±0.01 | 0.19±0.03 | 0.02±0.006 | 0.37±0.09 |
| Globular proteins | beta | 0.48±0.02 | 4.93±0.58 | 0.05±0.01 | 0.21±0.03 | 0.02±0.006 | 0.24±0.06 |
| Globular proteins | P value | 6.25E-06 | 0.4411 | 0.2119 | 0.4411 | 0.402 | 2.17E-03 |
| Transmembrane proteins | alpha | 0.53±0.02 | 7.09±1.76 | 0.02±0.01 | 0.15±0.03 | 0.008±0.003 | 0.38±0.07 |
| Transmembrane proteins | beta | 0.46±0.02 | 7.62±1.56 | 0.02±0.01 | 0.14±0.03 | 0.007±0.004 | 0.27±0.09 |
| Transmembrane proteins | P value | 5.52E-13 | 0.1116 | 0.1258 | 0.105 | 0.1963 | 1.46E-06 |
| Toroidal proteins | DNA-binding | 0.48±0.02 | 10.84±1.00 | 0.009±0.01 | 0.09±0.01 | 0.003±0.000 | 0.30±0.05 |
| Toroidal proteins | RNA-binding | 0.47±0.03 | 9.39±2.01 | 0.011±0.004 | 0.11±0.02 | 0.004±0.001 | 0.24±0.06 |
| Toroidal proteins | P value | 0.4336 | 0.0032 | 0.01026 | 0.00294 | 0.2761 | 0.00051 |

Fig. (4). L–C plots for (a) globular, (b) transmembrane and (c) toroidal proteins. (d) Mean and standard deviation of their L–C plots. Random controls are indicated by an arrow in the figures.

The degree distribution is another important property of network topology, which is also used to distinguish different classes of networks. The degree distribution of a random network follows a Poisson distribution, while scale-free networks follow a power-law distribution of degrees [42]. Fig. 5 shows the degree distributions of globular, transmembrane and toroidal proteins. These distributions have a Gaussian shape, and most nodes have degrees ranging from 5 to 10. In comparison with random and scale-free networks, the degree distributions of AACENs are more uniform, with values lying mostly around the mean degree. These results may also support our observation above that the AACENs of the three kinds of proteins belong to "small-world" networks.
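The small-world comparison above rests on Eqs. (2) and (3). The following is a minimal Python sketch of those two quantities and of a random counterpart with the same numbers of nodes and edges. It is not the authors' pipeline: the ring-lattice toy network, the function names, and the conventions for low-degree nodes and disconnected pairs are assumptions made only for illustration.

```python
import random
from collections import deque


def bfs_lengths(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def clustering_coefficient(adj):
    """Eq. (2): average over nodes of n_k / (N_k (N_k - 1) / 2)."""
    total = 0.0
    for nbrs in adj.values():
        n_nb = len(nbrs)
        if n_nb < 2:
            continue  # assumed convention: such nodes contribute 0
        n_k = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += n_k / (n_nb * (n_nb - 1) / 2)
    return total / len(adj)


def characteristic_path_length(adj):
    """Eq. (3), restricted to connected pairs (same value for a connected graph)."""
    total, pairs = 0, 0
    for i in adj:
        for j, d in bfs_lengths(adj, i).items():
            if i < j:
                total += d
                pairs += 1
    return total / pairs


def ring_lattice(n, k):
    """Hypothetical toy network: each node linked to its k nearest ring neighbors."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def random_counterpart(n, m, seed=0):
    """Random graph with the same numbers of nodes (n) and edges (m)."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    adj = {i: set() for i in range(n)}
    for i, j in rng.sample(pairs, m):
        adj[i].add(j)
        adj[j].add(i)
    return adj


toy = ring_lattice(30, 4)
n_edges = sum(len(nb) for nb in toy.values()) // 2
rand = random_counterpart(len(toy), n_edges)

c_toy, l_toy = clustering_coefficient(toy), characteristic_path_length(toy)
c_rand, l_rand = clustering_coefficient(rand), characteristic_path_length(rand)
print(f"toy:    C = {c_toy:.2f}, L = {l_toy:.2f}")
print(f"random: C = {c_rand:.2f}, L = {l_rand:.2f}")
```

The ring lattice only exercises the functions: it has a high C but also a long L, so it is not itself small-world. In the sense used in this section, a network is small-world when its C is much larger than that of the random counterpart while its L stays comparable.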

Fig. (5). Degree distributions for (a) globular, (b) transmembrane and (c) toroidal proteins.
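The remaining descriptors of Section 2.3, Eqs. (4) to (8), can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the star-shaped toy graph is hypothetical, E(i) is assumed to be normalized by the node count of the reduced network, and Eq. (8) is evaluated in its equivalent form as the Pearson correlation of remaining degrees at the two ends of each edge.

```python
from collections import deque


def bfs_lengths(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def density(adj):
    """Eq. (4): 2M / (N (N - 1))."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def average_closeness(adj):
    """Mean of Eq. (5) over all nodes; assumes a connected network."""
    n = len(adj)
    return sum((n - 1) / sum(bfs_lengths(adj, i).values()) for i in adj) / n


def efficiency(adj):
    """Eq. (7): mean of 1 / L_ij over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = sum(1.0 / d for i in adj for d in bfs_lengths(adj, i).values() if d > 0)
    return total / (n * (n - 1))


def remove_node(adj, node):
    """Copy of the network without `node` and its edges."""
    return {u: nb - {node} for u, nb in adj.items() if u != node}


def system_vulnerability(adj):
    """Maximum over nodes of Eq. (6), V(i) = (E - E(i)) / E."""
    e = efficiency(adj)
    # Assumption: E(i) is normalized by the size of the reduced network.
    return max((e - efficiency(remove_node(adj, i))) / e for i in adj)


def assortativity(adj):
    """Eq. (8) as the Pearson correlation of remaining degrees across edge ends."""
    xs, ys = [], []
    for u, nb in adj.items():
        for v in nb:  # every edge is visited in both directions
            xs.append(len(adj[u]) - 1)
            ys.append(len(adj[v]) - 1)
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return cov / var  # undefined (division by zero) for regular graphs


# Hypothetical toy graph: one hub (node 0) connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
delta = density(star)
mean_closeness = average_closeness(star)
vulnerability = system_vulnerability(star)
r = assortativity(star)
print(delta, mean_closeness, vulnerability, r)
```

On the star graph the hub is the single point of failure, so the system vulnerability reaches its maximum, and every edge joins a high-degree node to a low-degree one, so the graph is fully disassortative.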

3.3. Topological Descriptors

We have analyzed four network parameters as topological descriptors, namely the network density (δ), the average closeness (<C>), the system vulnerability (V) and the assortative mixing coefficient (r), for globular, transmembrane and toroidal proteins. These results are shown in Fig. 6. In order to compare the role of their secondary structure elements, we also calculated these four network parameters for all sub-classes, as tabulated in Table 1.

Fig. (6). Statistical evaluation of four network parameters: (a) density, (b) average closeness, (c) system vulnerability and (d) assortative mixing coefficient for globular, cylindrical and toroidal proteins.

3.3.1. Network Density

The network density is the normalized version of the average number of neighbors, which indicates the average connectivity of a node in the network. The value of the network density lies between 0 and 1, and it reflects how densely the network is populated with edges.

Fig. 6(a) shows that the average density of globular proteins is 0.04, while it is 0.02 and 0.01 for transmembrane and toroidal proteins, respectively. The largest density of globular proteins indicates that they have the most compact packing. The smaller density of transmembrane proteins means that their residues, distributed around ion channels, are less densely connected with each other. The smallest density, observed for toroidal proteins, is due to the large central cavity within the torus, which plays a key role in binding nucleic acids. These results indicate that neither cylindrical nor toroidal topologies are favorable for close packing. There is no statistically significant difference in network density between α-helical structures and β-barrel shaped structures, which means that the network density of proteins is independent of their secondary structure. However, there is a difference in network density between DNA-binding (0.009) and RNA-binding (0.011) toroidal proteins (p = 0.01026, Table 1). This also supports the view that the packing density of toroidal proteins is mainly determined by the size of the central cavity.

3.3.2. The Average Closeness

Centrality is of vital importance in network topology for identifying essential nodes. Closeness is a centrality measure of nodal importance that quantifies how long it takes to spread information from a node to all other nodes sequentially. The calculation of closeness centrality takes into account pathways that connect residues over the whole protein, and is thus a global measure based on the network topology. It has been found that functionally important residues always have high closeness centrality values [16].

The average of the closeness centralities of all nodes is used to compare the centralization of AACENs. The comparison of the average closeness centrality for the three types of proteins is shown in Fig. 6(b). It can be seen that the value of <C> is 0.20 ± 0.04 for globular proteins, 0.15 ± 0.03 for transmembrane proteins and 0.10 ± 0.02 for toroidal proteins. The lowest L values of the globular proteins contribute to the largest values of <C>, which indicates that the spherical packing of the globular topology is conducive to signal transduction within a protein. On the other hand, the larger and largest L values of transmembrane and toroidal proteins help to explain their lower and lowest values of <C>. Owing to the elongated structure of transmembrane proteins and the oligomerization of toroidal proteins, they need to maintain long-range communication between distal regions for controlling allosteric regulation. In Table 1, we observed that the differences in average closeness between α-helix and β-sheet structures are not statistically significant (p > 0.005). Accordingly, the <C> of amino acid networks is determined by shortest path lengths, regardless of secondary structure. The <C> is reminiscent of the network density, in that networks with larger values of <C> have larger packing densities.

3.3.3. The System Vulnerability

The vulnerability of a node in a complex network is another centrality measure, whose value describes the robustness of the network against the removal of that node and its corresponding edges. This pointwise vulnerability V(i) describes the importance of a node i in the network hierarchy, and the maximal value of the pointwise vulnerability, V, can be used to quantify the symmetry of networks.

Fig. 6(c) shows that the average vulnerability value V is 0.02 ± 0.008 for globular proteins, 0.008 ± 0.003 for transmembrane proteins and 0.004 ± 0.001 for toroidal proteins. Globular proteins have significantly higher values than both transmembrane proteins and toroidal proteins. For all subtypes of the three kinds of protein topologies, there is no statistically significant difference in vulnerability (Table 1). Therefore, the system vulnerability is also independent of the secondary structure environment. It is known that the system vulnerability is related to the hierarchy and symmetry of complex networks [30]. Hierarchy means that different parts of the system have different impacts on the system performance, which is a common property of biological networks. A higher vulnerability means that a network has a more evident hierarchical property and lower symmetry.

We now turn to the symmetry of proteins. Understanding symmetry can provide direct insight into protein structures and functions [43]. In comparison with AACENs of globular proteins, AACENs of cylindrical and toroidal proteins are graphs with transitive isometry group actions. Cylindrical and toroidal proteins are multi-subunit oligomeric complexes that are assembled symmetrically from two or more identical proteins. Cylindrical proteins belong to the cyclic symmetry groups Cn (n is the number of subunits), which have one rotational axis of symmetry, for example, C7 symmetry for G protein-coupled receptors (GPCRs). Toroidal proteins have the higher symmetry of the dihedral groups Dn, and they contain additional perpendicular axes of two-fold symmetry at interfaces [44]. Breaking the symmetry of the AACENs of cylindrical and toroidal proteins leads to the appearance of hierarchy and modularity, as in globular protein networks [45]. Thus, globular proteins exhibit high modularity, suggesting that this topology favors allosteric properties.

3.3.4. Assortative Mixing Coefficient

The assortative mixing coefficient r is a global quantitative measure of degree correlations in a network, and takes values ranging from -1 to 1. Positive values of r denote assortativity, in which high-degree nodes prefer to attach to other high-degree nodes, whereas negative values denote disassortativity, in which high-degree nodes tend to attach to low-degree ones.

In Fig. 6(d), we compare r for globular, transmembrane and toroidal proteins. Globular and toroidal proteins are observed to have similar average assortative coefficient values, 0.28 ± 0.08 and 0.28 ± 0.06, respectively. Transmembrane proteins are found to have significantly higher average assortative coefficient values, 0.32 ± 0.10. The positive values of the average assortative coefficient suggest that they all possess assortative mixing characteristics. The higher assortative coefficient of transmembrane proteins may be caused by their polypeptide chains crossing the biological membrane several times. Newman [31] found that assortative networks are more robust to vertex removal. This structural character further confirms the robustness of transmembrane protein networks, as expected for the most regular form of cylindrical symmetry.

We also evaluated the statistical significance of the mixing behavior of nodes in different subclasses (Table 1). Low p-values (p < 0.005) indicate that there is a significant difference in the r values between α and β structures of globular and transmembrane proteins, whereas a significant difference is not found between DNA sliding clamps and RNA polymerases. Assortativity is also a very relevant descriptor in protein folding. The assortative mixing of amino acid networks contributes positively to the folding speed of proteins [37]. Protein folding is the process by which an amino acid sequence forms its native state structure through non-covalent interactions. It has been shown that short-range interactions contribute more in α-helix structures, whereas long-range interactions are more important in β-sheet structures. Here, we show that α-helical structures have a larger assortative mixing coefficient, which may indirectly indicate that short-range interactions play an important role in the folding process of a protein and help in determining the topology of its tertiary structure.

3.4. Hot Spots Prediction for Toroidal Proteins

Protein-protein interactions are crucial for carrying out many biological processes and regulating cellular and signaling pathways, and they also form the basis of a large number of diseases. The study of residues located at protein-protein interfaces can give fine details of the interaction and guide further drug design. Toroidal proteins are protein-protein complexes that result from the oligomerization of multiple protein chains. Therefore, the investigation of the protein-protein interfaces of toroidal proteins is of considerable importance. Amino acid networks have already been employed to analyze hot spots in protein-protein interfaces. del Sol et al. [18] applied a small-world network approach to protein complexes and found that hot spots usually have high betweenness. Brinda et al. [19] used a graph representation of oligomeric proteins and applied spectral analysis to the amino acid networks to predict hot spots.

However, an AACEN uses a different definition of amino acid networks, whose connections are based on contact energy. On the other hand, hot spots are defined as critical interfacial residues that contribute most to the binding energy. Given the similarity of the two definitions, we suggest that the AACEN may provide a straightforward way to detect hot spots. The method adopted here is somewhat similar to the O-ring theory used by Li and Liu [46], and the minimum cut tree algorithms used by Tuncbag et al. [47]. In order to test this hypothesis, we first used the well-established server HotRegion [31] to predict hot spots of toroidal proteins as a reference data set. Secondly, we constructed sub-networks based on AACENs for toroidal proteins, which only include interfacial residues between different chains (Table S3 in Supplementary Material).

Fig. (7). (a) An example of a protein interface (2polAB) and (b) its bipartite graph based on the AACEN. The hot spots are shown in VDW representation, and the two chains are shown in different colors. Nodes in the bipartite graph are colored according to their chains.
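The interface sub-network of Section 2.2 and the performance measures of Eqs. (9) to (12) can be sketched as follows. This is a minimal illustration, not the authors' code: the contact list, residue labels, reference set and candidate set are hypothetical, and treating every node of the sub-network as a predicted hot spot is an assumption about how the comparison with a reference set is set up.

```python
def interface_subnetwork(edges, chain_of):
    """Keep only the contacts whose two residues belong to different chains."""
    return [(u, v) for u, v in edges if chain_of[u] != chain_of[v]]


def performance(predicted, reference, candidates):
    """Eqs. (9)-(12), where a positive is a residue labelled as a hot spot."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(candidates - predicted - reference)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical contacts (not from any PDB entry); a label is chain letter + index.
edges = [("A1", "A2"), ("A2", "B1"), ("A3", "B1"),
         ("A3", "B2"), ("A3", "A4"), ("B2", "B3")]
chain_of = {res: res[0] for edge in edges for res in edge}

# Bipartite sub-network: only inter-chain contacts survive.
sub_edges = interface_subnetwork(edges, chain_of)
predicted = {res for edge in sub_edges for res in edge}
degree = {res: sum(res in edge for edge in sub_edges) for res in predicted}

# Hypothetical reference hot spots and the set of residues being classified.
reference = {"A3", "B1", "B3"}
candidates = {"A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"}
scores = performance(predicted, reference, candidates)
print(sorted(predicted), degree, scores)
```

The degree of a node in the bipartite graph counts its inter-chain contacts, which is the quantity used in the text to rank interface residues.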

Lastly, a direct comparison was carried out between the HotRegion data set and the sub-network nodes. It should be noted that all the analyses were performed on only one type of interface (such as the interface between chain A and chain B), because the multiple interfaces of toroidal proteins are almost identical. The performance values of the sub-network nodes are S = 66%, C = 74%, P = 62% and A = 71% for sensitivity, specificity, precision and accuracy, respectively. Although no experimental information is available, the relatively high specificity and accuracy show that these two computational methods are comparable.

As an example, the interface between chain A and chain B of the β clamp is analyzed. HotRegion predicted nine hot spot residues, as shown in Fig. 7(a), of which ILE78, LEU82, PHE106, and LEU108 are located in chain A, and ARG269, ILE272, LEU273, GLU300, and GLU304 are located in chain B. The bipartite graph for this interface was also constructed and is shown in Fig. 7(b). This sub-network contains 13 nodes, 7 of which overlap with the HotRegion prediction. Thus, 7 out of 9 hot spots have been predicted successfully by the AACEN method. PHE106 has the largest degree, so a mutation of this residue is the most likely to destabilize the interface. As such, there is a good correlation between the hot spots obtained by the two methods. However, how to correctly detect hot spot information buried by the toroidal topology is still a challenge, which needs further study.

4. CONCLUSIONS

In this paper, we have constructed a new type of amino acid network based on contact energy, AACENs, for globular, transmembrane and toroidal proteins. Clustering coefficients and characteristic path lengths have been calculated to describe their general properties. AACENs of proteins show significantly higher clustering coefficients than their random counterparts, and similar values of the characteristic path length. Besides, the degree distributions of AACENs are found to be Gaussian. These two properties indicate that all proteins with different topologies belong to "small-world" networks [41].

In addition, four important topological parameters, namely the network density, the average closeness, the system vulnerability and the assortative mixing coefficient, have been calculated to explain the topological differences of AACENs among the three groups of proteins. These four topological parameters can be classified into two groups. First, the network density and the average closeness are two global parameters, which are both determined by shortest path lengths. The results indicate that networks with larger average closeness values have larger packing densities. The network density and closeness centrality showed that globular proteins have the largest packing density, while toroidal proteins are packed most loosely. Second, the system vulnerability and the assortative mixing coefficient are both related to the robustness of networks. However, the system vulnerability is a path-based measure and the assortative mixing coefficient is a degree-based measure. Similar to residue centrality, the pointwise vulnerability V(i) could be considered a local measure for predicting key residues whose removal most affects the robustness of networks. The system vulnerability, however, is defined as the maximal value of V(i), which is more related to the symmetry and hierarchy of the global network topology. On the other hand, assortative mixing is a global measure for describing network robustness. In assortatively mixed networks, high-degree nodes tend to be clustered together in a core group, and thus their removal is somewhat redundant for destroying the network topology. The system vulnerability indicated that toroidal proteins have maximum symmetry, while symmetry breaking occurs in globular proteins. The largest assortative mixing coefficient of transmembrane proteins showed that their AACENs are the most robust systems.

The identification of structural and functional key residues in proteins is another challenge. To this end, several centrality parameters based on amino acid networks have been developed. Betweenness centrality was found useful in detecting nucleation centers for protein folding [13] and hot spots in different protein complexes [18]. Functionally important residues, including active sites, ligand-binding and evolutionarily conserved residues, typically have high closeness values [16]. Residues with high residue centrality are supposed to interrupt the allosteric communications among different monomers [29]. The residue centrality has been extended to study key residues of transmembrane proteins [26]. In addition, the packing topology of these structurally important residues in membrane proteins has been investigated [24]. In our recent paper, we also used residue centrality and betweenness to identify key residues in the DNA clamp, a typical toroidal protein [28]. In the current work, we focused on hot spots as key residues of toroidal proteins. Through the comparison with hot spots predicted by HotRegion, we have proposed that nodes in the sub-network of interfacial residues may model hot spots of toroidal proteins. This method is not so powerful, but may provide a very quick tool to detect critical residues for protein-protein interactions.

Finally, it is interesting to compare the current topological invariants with other centrality measures. The closeness centrality can be used to distinguish the different roles played by different nodes, at the level of individual nodes. The average closeness can be considered a global descriptor to distinguish different global network topologies. Pointwise vulnerability, similar to residue centrality, describes the importance of each node by quantifying the changes caused by its removal. The system vulnerability, in contrast, can be used to quantify the symmetry of a network, which may relate to the system fragility. AACENs provide a quantitative way to compare protein structures from local to global levels [48]. Therefore, the comparative network analysis of proteins among three topologies could give a deeper understanding of protein folding and oligomerization principles. However, its implication in the elucidation of protein allostery represents an imminent task [49].

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (21203131 and 91230117), the Natural

Science Foundation of the Jiangsu Higher Education Institutions (12KJB180014).

REFERENCES

[1] Hingorani MM, O'Donnell M. Toroidal proteins: Running rings around DNA. Curr Biol 1998; 8: R83-R6.
[2] Levitt M, Chothia C. Structural patterns in globular proteins. Nature 1976; 261: 552-8.
[3] Tusnady GE, Dosztanyi Z, Simon I. Transmembrane proteins in the Protein Data Bank: identification and classification. Bioinformatics 2004; 20: 2964-72.
[4] Bode C, Kovacs IA, Szalay MS, Palotai R, Korcsmaros T, Csermely P. Network analysis of protein dynamics. FEBS Lett 2007; 581: 2776-82.
[5] Csermely P, Korcsmaros T, Kiss HJ, London G, Nussinov R. Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review. Pharmacol Ther 2013; 138: 333-408.
[6] Greene LH. Protein structure networks. Brief Funct Genomics 2012; 11: 469-78.
[7] Hu G, Zhou JH, Yan WY, Chen JJ, Shen BR. The Topology and Dynamics of Protein Complexes: Insights from Intra-Molecular Network Theory. Curr Protein Pept Sci 2013; 14: 121-32.
[8] Krishnan A, Zbilut JP, Tomita M, Giuliani A. Proteins as networks: Usefulness of graph theory in protein science. Curr Protein Pept Sci 2008; 9: 28-38.
[9] Yan WY, Zhou JH, Sun MM, Chen JJ, Hu G, Shen BR. The construction of an amino acid network for understanding protein structure and function. Amino Acids 2014; 46: 1419-39.
[10] Doncheva NT, Klein K, Domingues FS, Albrecht M. Analyzing and visualizing residue networks of protein structures. Trends Biochem Sci 2011; 36: 179-82.
[11] Di Paola L, De Ruvo M, Paci P, Santoni D, Giuliani A. Protein Contact Networks: An Emerging Paradigm in Chemistry. Chem Rev 2013; 113: 1598-613.
[12] Paci P, Di Paola L, Santoni D, De Ruvo M, Giuliani A. Structural and Functional Analysis of and Serum Albumin Through Protein Long-Range Interaction Networks. Curr Proteomics 2012; 9: 160-6.
[13] Vendruscolo M, Dokholyan NV, Paci E, Karplus M. Small-world view of the amino acids that play a key role in protein folding. Phys Rev E 2002; 65: 061910.
[14] Brinda KV, Vishveshwara S. A network representation of protein structures: Implications for protein stability. Biophys J 2005; 89: 4159-70.
[15] Ding Y, Wang X, Mou Z. Effect of Hubs in Amino Acid Network on Iron Superoxide Dismutase Stability. Curr Bioinform 2015; 10: 232-9.
[16] Amitai G, Shemesh A, Sitbon E, et al. Network analysis of protein structures identifies functional residues. J Mol Biol 2004; 344: 1135-46.
[17] Zhou J, Yan W, Hu G, Shen B. SVR_CAF: An integrated score function for detecting native protein structures among decoys. Proteins 2014; 82: 556-64.
[18] del Sol A, O'Meara P. Small-world network approach to identify key residues in protein-protein interaction. Proteins 2005; 58: 672-82.
[19] Brinda KV, Vishveshwara S. Oligomeric protein structure networks: insights into protein-protein interactions. BMC Bioinformatics 2005; 6: 296.
[20] Bagler G, Sinha S. Network properties of protein structures. Physica A 2005; 346: 27-33.
[21] Aftabuddin M, Kundu S. Hydrophobic, hydrophilic, and charged amino acid networks within protein. Biophys J 2007; 93: 225-31.
[22] Sengupta D, Kundu S. Do topological parameters of amino acids within protein contact networks depend on their physico-chemical properties? Physica A 2012; 391: 4266-78.
[23] Sengupta D, Kundu S. Role of long- and short-range hydrophobic, hydrophilic and charged residues contact network in protein's structural organization. BMC Bioinformatics 2012; 13: 142.
[24] Pabuwal V, Li ZJ. Comparative analysis of the packing topology of structurally important residues in helical membrane and soluble proteins. Protein Eng Des Sel 2009; 22: 67-73.
[25] Emerson IA, Gothandam KM. Network analysis of transmembrane protein structures. Physica A 2012; 391: 905-16.
[26] Emerson IA, Gothandam KM. Residue centrality in alpha helical polytopic transmembrane protein structures. J Theor Biol 2012; 309: 78-87.
[27] Feverati G, Achoch M, Zrimi J, Vuillon L, Lesieur C. Beta-Strand Interfaces of Non-Dimeric Protein Oligomers Are Characterized by Scattered Charged Residue Patterns. PLoS One 2012; 7: e32558.
[28] Hu G, Yan WY, Zhou JH, Shen BR. Residue interaction network analysis of Dronpa and a DNA clamp. J Theor Biol 2014; 348: 55-64.
[29] del Sol A, Fujihashi H, Amoros D, Nussinov R. Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol Syst Biol 2006; 2: 2006.0019.
[30] Yan WY, Hu G, Zhou JH, et al. Amino acid contact energy networks impact and evolution. J Theor Biol 2014; 355: 95-104.
[31] Cukuroglu E, Gursoy A, Keskin O. HotRegion: a database of predicted hot spot clusters. Nucleic Acids Res 2012; 40: D829-D33.
[32] Zhang C, Kim SH. Environment-dependent residue contact energies for proteins. Proc Natl Acad Sci U S A 2000; 97: 2550-5.
[33] Shen BR, Vihinen M. RankViaContact: ranking and visualization of amino acid contacts. Bioinformatics 2003; 19: 2161-2.
[34] Yang Y, Chen B, Tan G, Vihinen M, Shen B. Structure-based prediction of the effects of a missense variant on protein stability. Amino Acids 2013; 44: 847-55.
[35] Gaci O, Balev S. In: Lazinica A, Ed. A study of protein structure using amino acid interaction networks, Croatia, I-Tech Education and Publishing. 2010; 19-46.
[36] Gol’dshtein V KG, Surdutovich G. Vulnerability and Hierarchy of Complex Networks. Arxiv Prepr Cond-Mater 2004: 0409298.
[37] Bagler G, Sinha S. Assortative mixing in Protein Contact Networks and protein folding kinetics. Bioinformatics 2007; 23: 1760-7.
[38] Newman ME. Assortative mixing in networks. Phys Rev Lett 2002; 89: 208701.
[39] Hu G, Michielssens S, Moors SL, Ceulemans A. Normal Mode Analysis of Trp RNA Binding Attenuation Protein: Structure and Collective Motions. J Chem Inf Model 2011; 51: 2361-71.
[40] Hu G, Michielssens S, Moors SL, Ceulemans A. The harmonic analysis of cylindrically symmetric proteins: A comparison of Dronpa and a DNA sliding clamp. J Mol Graph Model 2012; 34: 28-37.
[41] Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature 1998; 393: 440-2.
[42] Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature 2000; 406: 378-82.
[43] Matsunaga Y, Koike R, Ota M, Tame JR, Kidera A. Influence of Structural Symmetry on Protein Dynamics. PLoS One 2012; 7: e50011.
[44] Pinsky M, Zait A, Bonjack M, Avnir D. Continuous symmetry analyses: Cnv and Dn measures of molecules, complexes, and proteins. J Comput Chem 2013; 34: 2-9.
[45] Tasdighian S, Di Paola L, De Ruvo M, et al. Modules Identification in Protein Structures: The Topological and Geometrical Solutions. J Chem Inf Model 2014; 54: 159-68.
[46] Li JY, Liu Q. 'Double water exclusion': a hypothesis refining the O-ring theory for the hot spots at protein interfaces. Bioinformatics 2009; 25: 743-50.
[47] Tuncbag N, Salman FS, Keskin O, Gursoy A. Analysis and network representation of hotspots in protein interfaces using minimum cut trees. Proteins 2010; 78: 2283-94.
[48] Vuillon L, Lesieur C. From local to global changes in proteins: a network view. Curr Opin Struct Biol 2015; 31: 1-8.
[49] Di Paola L, Giuliani A. Protein contact network topology: a natural language for allostery. Curr Opin Struct Biol 2015; 31: 43-8.

DISCLAIMER: The above article has been published in Epub (ahead of print) on the basis of the materials provided by the author. The Editorial Department reserves the right to make minor modifications for further improvement of the manuscript.