
Mol. Based Math. Biol. 2016; 4:36–46

Research Article Open Access

Tingyi Cao, Yongxiao Yang, and Xinqi Gong*

Multiple dimensional space for protein interface residue characterization

DOI 10.1515/mlbmb-2016-0004 Received August 2, 2016; accepted December 7, 2016

Abstract: Proteins interact to perform biological functions through specific interface residues. Correctly understanding the mechanisms of interface recognition and predicting interface residues is important for many aspects of life science studies. Here, we report a novel architecture for studying protein interface residues. In our method, a multiple dimensional space is built on several meaningful features. We then divide the space into regions and place every surface residue into a region according to its feature values. Interestingly, interface residues were found to prefer certain grids that cluster together. We obtained excellent results on a public, verified benchmark data set. Our approach not only opens up a new line of thought for interface residue prediction, but will also help to understand protein interactions more deeply.

Keywords: Protein-protein interaction, Interface residue, Multiple dimensional space, Clustering

1 Introduction

The biological functions of proteins are often closely related to the interactions between proteins. It is estimated that each protein interacts with five other proteins on average. The interface plays an important role in the interaction between proteins [1]. Because protein interaction interfaces are so closely tied to biological function, it is important to study the characteristics of protein interfaces in order to characterize interface residues correctly. Interface residues may differ from non-interface residues in many properties. To date, there have been many theoretical studies of the geometry, physics, chemistry, evolutionary conservation and energy of protein interfaces. These studies mainly involve geometric, physical and chemical properties. From the geometric and physical points of view, the quantities usually considered are the interface area (the size of the protein-protein interaction interface), planarity (the degree to which the interface atoms deviate from the plane closest to them), shape (most protein-protein interaction interfaces are round or nearly round), symmetry, complementarity (such as the shape correlation index, gap index and packing density), electrical conductivity and so on [2]. The chemical properties include electrostatic potential and hydrophobicity. Interface amino acids on the protein surface are more hydrophobic than the other surface amino acids [3]. In addition to the properties mentioned above, evolutionary conservation and energy are also important. Moreover, several methods have been developed to predict protein interface residues. For example, machine learning methods such as neural networks, deep learning and support vector machines (SVM) train a model on a training set by taking some of the above features into account.
However, these approaches share a common problem: the models perform worse on the test set than on the training set [4]. Here we introduce a new multiple dimensional feature space method for interface residues. We know that interface residues and non-interface residues differ in their geometric, physical or chemical properties. At the same time, even the same kind of residue takes different property values in different proteins. This phenomenon should be taken into account in the statistics. In statistical methods, normalization serves to eliminate differences in measurement units and in the variability of the data. As a simple example, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging [5]. Here we used standard normalization. Thus, in order to reduce the differences between surface residues in different proteins, we normalized the feature values. In addition, we paid attention to subspace clustering, which combines traditional techniques with clustering algorithms and can cluster effectively in high-dimensional data spaces [6]. By combining computational algorithms with mathematical statistics, we can differentiate interface and non-interface residues using the above residue features.

*Corresponding Author: Xinqi Gong: Institute for Mathematical Sciences, Renmin University of China, E-mail: [email protected]
Tingyi Cao, Yongxiao Yang: Institute for Mathematical Sciences, Renmin University of China
© 2016 Tingyi Cao et al., licensee De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

2 Materials and Methods

2.1 Features

Five kinds of features are used to describe a surface residue: Absolute Exterior solvent accessible Area (absEA), Relative Exterior solvent accessible Area (R(EA)) [7], Exterior Contact area with other residues (EC), Exterior Void area (EV) and Interior Contact area (IC) [8]. R(EA) is the ratio between the observed value and the theoretical maximum of the absolute exterior solvent accessible area for a given amino acid, as shown in Eq. (1).

$$
R(EA)(k, i) = \frac{ASA(k)}{MaxASA^{(i)}} \times 100\%, \quad k = 1, \cdots, N, \quad i = 1, \cdots, 20. \tag{1}
$$

Here $ASA(k)$ is the solvent accessible surface area of the residue numbered k, and $MaxASA^{(i)}$ is the theoretical maximum solvent accessible surface area for the i-th of the 20 amino acid types [7]. From the physical and geometric viewpoints, a surface residue interacts with other residues and is accessible to solvent. In addition, surface residues also have areas that are inaccessible to both solvent and other residues. We therefore used EC, EV and IC to characterize these properties, which are illustrated in Figure 1.
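As a small worked illustration of Eq. (1), the sketch below computes R(EA) from an absolute area and a reference maximum. The function names `relative_ea` and `implied_max_asa` are hypothetical helpers introduced here, not part of NACCESS. Because the reference maxima are not listed in this paper, the example recovers the maximum implied by one row of Table 1 (absEA = 103.35, R(EA) = 77) instead of assuming a value.

```python
def relative_ea(asa: float, max_asa: float) -> float:
    """Eq. (1): relative exterior solvent accessible area, in percent."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return asa / max_asa * 100.0


def implied_max_asa(asa: float, rea_percent: float) -> float:
    """Invert Eq. (1) to get the MaxASA implied by one (absEA, R(EA)) pair."""
    return asa / (rea_percent / 100.0)


# Row (Pro_No=1, R/L=1, resSeq=1) of Table 1: absEA = 103.35, R(EA) = 77.
max_asa = implied_max_asa(103.35, 77.0)
print(round(max_asa, 1))                       # reference area implied by that row
print(round(relative_ea(103.35, max_asa), 1))  # applying Eq. (1) gives back 77.0
```

Note that R(EA) is not capped at 100%: Table 1 contains a residue with R(EA) = 119.7.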

Figure 1: Schematic overview of the geometric features. Blue and green spheres represent surface residues in the same protein. The gray sphere represents water. The red outline marks the Exterior Contact area with other residues (EC). The pink and yellow outlines denote the Exterior Void area (EV) and the Interior Contact area (IC), respectively.

2.2 Data set

There are 143 dimers in docking benchmark version 5.0 [9]. Among them, nine proteins contain residues that were measured repeatedly at one site, and one protein has missing backbone atoms. These 10 proteins (PDB codes: 1AVX, 1D6R, 1EAW, 1FQJ, 1OPH, 1PPE, 1YVB, 2H7V, 2IDO and 3SGQ) did not meet the requirements, so 133 dimers, or 266 monomers, were used for the calculations.

Table 1: Example data

Pro_No  R/L  resSeq  absEA   R(EA)  EC       EV       IC       Flag
1       1    1       103.35  77     93.958   43.932   162.698  0
1       1    2       0.62    0.8    97.439   92.341   54.84    0
1       2    9       57.52   49.4   102.879  65.051   132.676  0
1       2    10      5.63    2.8    201.292  110.128  246.028  0
2       1    1       232.39  119.7  52.222   15.998   293.579  1
2       1    2       111     73.3   102.178  50.112   193.834  0

The 133 dimers include 52782 residues, of which 7481 are interface residues. Table 1 lists several residues as examples of the actual feature values of surface residues. Pro_No is the protein identifier. R/L indicates receptor or ligand, with 1 and 2 standing for receptor and ligand, respectively. resSeq is the sequence number of the amino acid. absEA, R(EA), EC, EV and IC are the five feature values of the amino acid. Flag 0 means that the amino acid is a non-interface residue, and 1 means that it is an interface residue. We can thus use five features to describe a surface residue. In other words, every surface residue corresponds to a point in five-dimensional space.
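To make the "residue as a point in five-dimensional space" idea concrete, the following sketch stores the six example rows of Table 1 in a small record type. The class name `SurfaceResidue` and its field names are hypothetical choices made for this illustration; only the numbers come from Table 1 and from the counts quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceResidue:
    pro_no: int    # protein identifier (Pro_No)
    chain: int     # R/L column: 1 = receptor, 2 = ligand
    res_seq: int   # residue sequence number
    abs_ea: float
    r_ea: float
    ec: float
    ev: float
    ic: float
    flag: int      # 1 = interface residue, 0 = non-interface residue

    def point(self) -> tuple:
        """The residue as a point in the five-dimensional feature space."""
        return (self.abs_ea, self.r_ea, self.ec, self.ev, self.ic)


# The six example rows of Table 1.
TABLE_1 = [
    SurfaceResidue(1, 1, 1, 103.35, 77.0, 93.958, 43.932, 162.698, 0),
    SurfaceResidue(1, 1, 2, 0.62, 0.8, 97.439, 92.341, 54.84, 0),
    SurfaceResidue(1, 2, 9, 57.52, 49.4, 102.879, 65.051, 132.676, 0),
    SurfaceResidue(1, 2, 10, 5.63, 2.8, 201.292, 110.128, 246.028, 0),
    SurfaceResidue(2, 1, 1, 232.39, 119.7, 52.222, 15.998, 293.579, 1),
    SurfaceResidue(2, 1, 2, 111.0, 73.3, 102.178, 50.112, 193.834, 0),
]

interface = [r for r in TABLE_1 if r.flag == 1]
print(len(interface), interface[0].point())

# Share of interface residues in the whole data set (7481 of 52782), in percent.
interface_share = 7481 / 52782 * 100
print(round(interface_share, 2))
```

The last line shows that interface residues are a minority class in this data set, roughly one residue in seven.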

2.3 Multiple dimensional feature space method

We found that feature values vary between different proteins. Therefore, we first normalize the values of each feature at the monomer level. In our data, every surface residue has five features, which are equivalent to five-dimensional coordinates. We normalized the five features and mapped the coordinates into a new space, forming grids and the multiple dimensional feature space. We then computed the ratio of interface residues in every grid. Next, we set a threshold k and clustered the grids meeting the threshold using DBSCAN [10]. The specific steps are shown in Figure 2. In this figure, residues in different monomers are marked with different colors. First, we normalized the values of each feature for every residue. After this step the differences between proteins are reduced, so all residues are marked with the same color. Then we divided the normalized values of the first feature into ten parts, and did the same for the other four features. This gave five new coordinates for each residue and formed the regions. Finally, we placed the amino acids into the new multiple dimensional feature space and clustered the regions that constitute it.

Figure 2: The process of forming the multiple dimensional space. Each axis represents a feature. Amino acids in different environments (proteins) are marked in different colors. Their differences are reduced by step 1, data normalization, after which the residues are shown in the same color. In step 2, coordinate mapping, the values of every feature are divided into ten parts to form the regions. In step 3, these regions form a new multiple dimensional space.

Data normalization. For the same kind of amino acid, every feature takes different values in different monomers. To reduce this difference, each feature value of an amino acid is normalized at the monomer level. The normalization formulas are given in Eq. (2) and Eq. (3).

$$
x_{ij}^{k*} = \frac{x_{ij}^{k} - \bar{x}_{i}^{k}}{\sqrt{\sum_{j=1}^{N} \left( x_{ij}^{k} - \bar{x}_{i}^{k} \right)^{2}}} \tag{2}
$$

$$
\bar{x}_{i}^{k} = \frac{\sum_{j=1}^{N} x_{ij}^{k}}{N} \tag{3}
$$

Here the five features absEA, R(EA), EC, EV and IC are denoted $x^{k}$ ($k = 1, 2, \cdots, 5$), $x_{ij}^{k}$ ($i = 1, 2, \cdots, 266$) is the value of feature k for the j-th residue of the i-th monomer, and N is the number of residues in that monomer. For example, $x_{23}^{1}$ represents the absEA value of the third residue in the second monomer.

Coordinate mapping. Over all amino acids, we found the maximum and minimum of the first normalized feature and divided [minimum, maximum] into ten intervals, numbered 1 to 10. The other four features were treated in the same way. The original five coordinates were thus mapped to a new five-dimensional space in which every coordinate component is an integer from 1 to 10.

Calculating the quantities for each region. After step 2, residues of different types may fall into the same region. We therefore calculated Nir, Rir, Nr, Nim, Rim and Nm. Nir is the number of interface residues and Nr is the number of residues in the region; Rir(%) is the proportion of the residues in the region that are interface residues. Nim is the number of monomers containing interface residues and Nm is the number of monomers in the region; Rim(%) is the proportion of those monomers that contain interface residues. Rir(%) and Rim(%) are obtained as

$$
R_{ir} = \frac{N_{ir}}{N_{r}} \times 100\%, \qquad R_{im} = \frac{N_{im}}{N_{m}} \times 100\%.
$$
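The three steps above (data normalization, coordinate mapping and per-region counting) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the function names, the input layout `(monomer_id, features, flag)` and the toy data are hypothetical. Eq. (2) is implemented as written, scaling by the root of the summed squared deviations; a conventional z-score would additionally divide that sum by N. Nm is taken to be the number of monomers that have at least one residue in the region.

```python
import math
from collections import defaultdict


def normalize_monomer(values):
    """Eq. (2)-(3): centre one feature within one monomer and scale it."""
    mean = sum(values) / len(values)
    scale = math.sqrt(sum((v - mean) ** 2 for v in values))
    if scale == 0.0:  # constant feature: assumption, map everything to 0
        return [0.0 for _ in values]
    return [(v - mean) / scale for v in values]


def to_bins(values, n_bins=10):
    """Map values to integers 1..n_bins over equal-width parts of [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum itself would land in bin n_bins + 1, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def build_regions(residues):
    """residues: list of (monomer_id, feature_tuple, flag) -> region statistics."""
    n_feat = len(residues[0][1])
    by_monomer = defaultdict(list)
    for idx, (mono, _, _) in enumerate(residues):
        by_monomer[mono].append(idx)

    # Step 1: normalise every feature at the monomer level.
    norm = [[0.0] * n_feat for _ in residues]
    for idxs in by_monomer.values():
        for k in range(n_feat):
            column = normalize_monomer([residues[i][1][k] for i in idxs])
            for i, v in zip(idxs, column):
                norm[i][k] = v

    # Step 2: bin each normalised feature over all residues.
    bins = [to_bins([row[k] for row in norm]) for k in range(n_feat)]
    coords = [tuple(bins[k][i] for k in range(n_feat)) for i in range(len(residues))]

    # Step 3: count residues and monomers per region.
    raw = defaultdict(lambda: {"Nr": 0, "Nir": 0, "mono": set(), "if_mono": set()})
    for (mono, _, flag), c in zip(residues, coords):
        s = raw[c]
        s["Nr"] += 1
        s["mono"].add(mono)
        if flag == 1:
            s["Nir"] += 1
            s["if_mono"].add(mono)
    return {
        c: {"Nir": s["Nir"], "Nr": s["Nr"], "Rir": 100.0 * s["Nir"] / s["Nr"],
            "Nim": len(s["if_mono"]), "Nm": len(s["mono"]),
            "Rim": 100.0 * len(s["if_mono"]) / len(s["mono"])}
        for c, s in raw.items()
    }


# Toy data: two monomers whose raw feature scales differ by a factor of 10 to 100.
toy = [
    ("A", (1.0, 10.0), 0), ("A", (2.0, 20.0), 0), ("A", (3.0, 30.0), 1),
    ("B", (100.0, 1.0), 1), ("B", (200.0, 2.0), 0), ("B", (300.0, 3.0), 0),
]
regions = build_regions(toy)
for coord, stats in sorted(regions.items()):
    print(coord, stats)
```

In the toy data the two monomers have very different raw scales, yet after monomer-level normalization their residues fall pairwise into the same three regions, which is the effect the normalization step is meant to achieve.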

Table 2: Representative regions with statistical results

region  absEA  R(EA)  EC  EV  IC  Nir  Rir(%)  Nr   Nim  Rim(%)  Nm
55656   6      6      7   6   6   112  21.92   511  95   46.34   205
66556   7      7      6   6   6   105  18.75   560  87   40.28   216
76557   8      7      6   6   7   92   19.83   464  78   39.20   199
65657   7      6      7   6   7   112  16.62   674  91   40.09   227
77447   8      8      5   5   7   113  18.08   625  81   37.85   214
77446   8      8      5   5   6   93   18.27   509  74   34.10   217
77547   8      8      6   5   7   88   16.83   523  72   35.82   201
65667   7      6      7   7   7   82   17.23   476  70   35.53   197
54767   6      5      8   7   7   67   19.65   341  59   36.42   162

After the five-dimensional coordinate mapping, we obtained 1388 regions. Table 2 gives examples of these regions together with their statistics. The region column is the region's code. The values of absEA, R(EA), EC, EV and IC are the region's coordinates in the five-dimensional feature space.

Clustering by DBSCAN. Regions with a low ratio of interface residues are not useful for clustering, so we set a threshold value to filter them out. For the grids meeting the threshold, we used the DBSCAN [10] clustering algorithm to find clusters in the new five-dimensional space.

Calculating the quantities for each cluster. For each cluster, we used the following statistical indicators for evaluation: Nir, Rir, Nr, Nim, Rim, Nm and Ave. The first six are defined as for a region, but are counted over all residues in the cluster. We also calculated the average number of amino acids per monomer, written Ave:

$$
\mathrm{Ave} = \frac{N_{r}}{N_{m}}.
$$
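The filtering and clustering step can be illustrated with a compact, textbook-style DBSCAN written in plain Python. This is a sketch under assumptions rather than the parallel implementation of [10]: Euclidean distance between region coordinates is assumed, a region "meets" the threshold when its Rir is at least k, and the threshold of 18% is chosen only so that the nine regions of Table 2 split into kept and rejected ones. The function names `dbscan` and `cluster_regions` are hypothetical.

```python
import math


def dbscan(points, eps=1.0, min_pts=1):
    """Minimal DBSCAN; returns {point: cluster_id}, with 0 meaning noise."""
    points = list(points)

    def neighbours(p):
        # The neighbourhood includes p itself, as in the usual definition.
        return [q for q in points if math.dist(p, q) <= eps]

    labels = {}
    cluster_id = 0
    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = 0              # noise, may later become a border point
            continue
        cluster_id += 1
        labels[p] = cluster_id
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q, 0) != 0:
                continue               # already assigned to a cluster
            labels[q] = cluster_id
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:   # q is a core point: keep expanding
                queue.extend(r for r in q_neighbours if labels.get(r, 0) == 0)
    return labels


def cluster_regions(region_rir, k, eps=1.0, min_pts=1):
    """Cluster regions with Rir >= k; regions below the threshold get class 0."""
    kept = [coord for coord, rir in region_rir.items() if rir >= k]
    labels = dbscan(kept, eps, min_pts)
    return {coord: labels.get(coord, 0) for coord in region_rir}


# Coordinates and Rir(%) of the nine regions listed in Table 2.
TABLE_2_RIR = {
    (6, 6, 7, 6, 6): 21.92, (7, 7, 6, 6, 6): 18.75, (8, 7, 6, 6, 7): 19.83,
    (7, 6, 7, 6, 7): 16.62, (8, 8, 5, 5, 7): 18.08, (8, 8, 5, 5, 6): 18.27,
    (8, 8, 6, 5, 7): 16.83, (7, 6, 7, 7, 7): 17.23, (6, 5, 8, 7, 7): 19.65,
}
classes = cluster_regions(TABLE_2_RIR, k=18.0)
for coord, cls in classes.items():
    print(coord, cls)
```

With a radius of 1 on an integer grid, only regions that differ by one step along a single feature are joined, so in this example the regions (8, 8, 5, 5, 7) and (8, 8, 5, 5, 6) end up in the same class while the other kept regions stay separate.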

Figure 3: Summary of our method. First, we decide which features are the most effective and organize the original data. We then normalize each feature and map the coordinates to form a multiple dimensional space. Finally, we set the threshold on the ratio of interface residues in a region and cluster the qualifying regions by DBSCAN.

In our method, every surface residue has five features. We can use all of them or a subset, such as the first four features or any three features, and carry out the same calculations, as shown in Figure 3. First we must decide which features will be used. We then input the related data and normalize each feature at the protein level, which partly reduces the differences between proteins. Residues from different proteins may then fall into the same region. We count the related quantities for every grid. In particular, we set a threshold k on the ratio of interface residues. If a grid meets the threshold, it participates in clustering by DBSCAN; otherwise the grid's class is marked as 0. Finally, we calculate the related quantities in every class for evaluation.

3 Results and Discussion

3.1 Interface amino acids prefer small grids

Residues of the same kind have different feature values in different monomers. Data normalization puts the data on the same standard. Figure 4 shows that normalizing the features partly reduces the differences between proteins.

Figure 4: Original and normalized data. Every color represents the mean value of one kind of feature. (a) The original data of the five features. (b) The normalized data of the five features.

Coordinate mapping can put surface residues from different monomers into the same grid, forming regions. Figure 5 illustrates the process with two features, absEA and R(EA). The colored points are surface residues from different proteins. We normalized the absEA and R(EA) values at the monomer level and then divided the normalized values of each of the two features into 10 equal parts, labeled 1, 2, ..., 10. Some residues fall into the same grid, and some grids contain no residues.

[Figure 5 plot: surface residues from the example proteins PDB:1ACB, PDB:1A7N and PDB:1AK4; x-axis: normalized absEA; y-axis: normalized relEA; each axis is divided into 10 equal parts numbered 1 to 10.]

Figure 5: Coordinate mapping. Dots in different colors or shapes are residues from different proteins. The first feature (absEA) and the second feature (R(EA)) are used as an example. The normalized values of absEA and R(EA) are divided into 10 equal parts (red numbers). Each grid formed in this way can contain many points (surface residues).

As in Figure 5, the five feature values of each surface residue are mapped to new five-dimensional coordinates, which form the regions. Each coordinate is an integer from 1 to 10, so $10^5$ regions are possible. However, only 1388 regions actually contain residues, and 839 of these contain interface residues. Table 3 indicates that interface residues have a certain preference for small regions, and that most residues fall into regions containing both interface and non-interface residues.

Table 3: Region categories

Region class                    Region number  Amino acid number
Only interface amino acid       92             105
Both types of amino acid        747            51173
Only non-interface amino acid   549            1504
Sum                             1388           52782
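The claim about small regions can be checked directly from Table 3 by dividing the amino acid number by the region number in each category. The short sketch below does this arithmetic; the variable names are illustrative, and the numbers are those of Table 3.

```python
# Table 3: region class -> (number of regions, number of amino acids)
TABLE_3 = {
    "only interface": (92, 105),
    "both types": (747, 51173),
    "only non-interface": (549, 1504),
}

total_regions = sum(regions for regions, _ in TABLE_3.values())
total_residues = sum(residues for _, residues in TABLE_3.values())
print(total_regions, total_residues)  # agrees with the "Sum" row of Table 3

occupancy = {name: residues / regions for name, (regions, residues) in TABLE_3.items()}
for name, value in occupancy.items():
    print(f"{name}: {value:.2f} residues per region")

# Share of the 10**5 possible regions that actually contain residues.
print(f"{total_regions / 10**5:.3%}")
```

Regions that contain only interface residues hold barely more than one residue each on average, whereas mixed regions hold several dozen, which is what "a preference for small regions" refers to. The 92 interface-only and 747 mixed regions together give the 839 regions that contain interface residues.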

3.2 Clustering

We sorted the clustering results according to Ave. By adjusting the threshold k (the ratio of interface amino acids in the region), good clustering results can be found. We consider a clustering result good when it satisfies the following requirements.
(1) The number of monomers is more than 133.
(2) The ratio of monomers containing interface residues is the highest.
(3) The number of residues is not too small.

Table 4: Example cluster result

features  threshold k  class  Nir  Rir(%)  Nr    Nim  Rim(%)  Nm   Ave
12345     28.889%      2      197  41.30   477   138  65.09   212  2
12345     26.923%      1      188  36.02   522   134  63.51   211  2
12345     31.111%      1      144  44.44   324   110  62.86   175  2
12345     27.473%      2      304  35.93   846   178  72.06   247  3
12345     27.778%      2      278  36.92   753   170  71.43   238  3
12345     28.000%      2      266  37.46   710   165  70.82   233  3
12345     23.387%      1      865  31.35   2759  255  96.23   265  10
12345     23.448%      1      836  31.73   2635  254  95.85   265  10

Table 4 shows the clustering results for different thresholds k in the five-dimensional space, sorted by Ave. In Table 4, threshold k is the minimum ratio of interface residues that a region must reach to take part in clustering, class is the label of one cluster in the clustering result, and features lists which features were used in the calculation. Nir, Rir, Nr, Nim, Rim, Nm and Ave are the cluster quantities defined in Section 2.3. For example, the first row shows that in the five-dimensional space (absEA, R(EA), EC, EV, IC) with clustering threshold k = 28.889%, the second cluster contains 477 amino acids, of which 197 are interface residues, from 212 monomers, of which 138 have interface residues in the cluster. In other words, the method succeeds for 138 of the 212 monomers.

The method also applies when surface residues are described with four features, in which case the space is four-dimensional. We sorted the clustering results in both the five- and four-dimensional spaces by Ave and then picked out the classes with more than 133 monomers. For each value of Ave, the class with the highest Rim was taken as the good clustering result, as Table 5 shows.
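The cluster quantities and the selection rule can be reproduced from the counts in Table 4. The sketch below is illustrative only: the helper names are hypothetical, and Ave is rounded to the nearest integer, which is an inference from the values listed in the tables rather than something stated in the text.

```python
from collections import defaultdict

# Rows of Table 4: (threshold k, class, Nir, Nr, Nim, Nm)
TABLE_4 = [
    ("28.889%", 2, 197, 477, 138, 212),
    ("26.923%", 1, 188, 522, 134, 211),
    ("31.111%", 1, 144, 324, 110, 175),
    ("27.473%", 2, 304, 846, 178, 247),
    ("27.778%", 2, 278, 753, 170, 238),
    ("28.000%", 2, 266, 710, 165, 233),
    ("23.387%", 1, 865, 2759, 255, 265),
    ("23.448%", 1, 836, 2635, 254, 265),
]


def cluster_metrics(nir, nr, nim, nm):
    """Rir and Rim in percent, and Ave rounded to the nearest integer (assumed)."""
    return {"Rir": 100.0 * nir / nr, "Rim": 100.0 * nim / nm, "Ave": round(nr / nm)}


def best_per_ave(rows, min_monomers=133):
    """For each Ave, keep the class with the highest Rim among those with Nm > min_monomers."""
    groups = defaultdict(list)
    for k, cls, nir, nr, nim, nm in rows:
        if nm > min_monomers:
            m = cluster_metrics(nir, nr, nim, nm)
            groups[m["Ave"]].append((m["Rim"], k, cls))
    return {ave: max(candidates) for ave, candidates in sorted(groups.items())}


first = cluster_metrics(197, 477, 138, 212)
print({name: round(value, 2) for name, value in first.items()})

best = best_per_ave(TABLE_4)
for ave, (rim, k, cls) in best.items():
    print(ave, k, cls, round(rim, 2))
```

Applied to the rows of Table 4 alone, this rule selects k = 28.889% for Ave = 2, k = 27.473% for Ave = 3 and k = 23.387% for Ave = 10; Table 5 applies the same idea across both the four- and five-dimensional spaces.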

Table 5: Good cluster

features  threshold k  class  Nir  Rir(%)  Nr    Nim  Rim(%)  Nm   Ave
1345      29.167%      1      207  43.86   472   147  70.33   209  2
1345      28.000%      1      307  37.44   820   183  77.54   236  3
1345      27.473%      1      388  35.99   1078  204  81.93   249  4
1235      25.333%      1      462  34.22   1350  226  87.26   259  5
1235      23.944%      1      547  32.85   1665  238  90.15   264  6
12345     25.581%      1      642  33.45   1919  238  91.89   259  7
1345      24.138%      1      687  32.24   2131  247  93.92   263  8
12345     23.810%      1      799  32.18   2483  252  95.45   264  9
1345      22.500%      1      825  30.45   2709  255  96.59   264  10

Table 5 presents the good clustering results for different k in the four- or five-dimensional space. The columns have the same meaning as in Table 4. For example, the third row shows that in the four-dimensional space 1345, formed by the first (absEA), third (EC), fourth (EV) and fifth (IC) features, with clustering threshold k = 27.473%, cluster 1 contains 1078 amino acids, of which 388 are interface residues, from 249 monomers, of which 204 have interface residues. In other words, the method succeeds for 204 of the 249 monomers when about 4 residues are kept per monomer.

3.3 Comparison

To assess the method described above, we also considered data without normalization. For the original data, we directly divided the values of every feature into ten parts, numbered 1 to 10, to form the regions. We then set the threshold k on the ratio of interface residues in the region and clustered the regions satisfying k by DBSCAN. In the multiple dimensional space method, we used Rir (the ratio of interface residues) as the threshold k. For each region we also calculated the ratio of monomers that have interface residues, so Rim (the ratio of monomers containing interface residues) can likewise serve as the threshold k for choosing which regions are clustered by DBSCAN. We therefore compared our method (normalized data with k defined by Rir) with two other methods: the first uses data without normalization and k defined by Rir, and the second uses normalized data and k defined by Rim. The three methods are compared in Figure 6.
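The difference between the two threshold definitions can be seen on the nine regions of Table 2. In the sketch below, `select_regions` is a hypothetical helper, the comparison "at least k" is an assumption, and the cut-offs of 19% for Rir and 40% for Rim are arbitrary values chosen only to keep three regions in each case.

```python
# Table 2: region code -> (Rir %, Rim %)
TABLE_2 = {
    "55656": (21.92, 46.34),
    "66556": (18.75, 40.28),
    "76557": (19.83, 39.20),
    "65657": (16.62, 40.09),
    "77447": (18.08, 37.85),
    "77446": (18.27, 34.10),
    "77547": (16.83, 35.82),
    "65667": (17.23, 35.53),
    "54767": (19.65, 36.42),
}


def select_regions(table, k, by="Rir"):
    """Return the codes of regions whose chosen ratio is at least k."""
    column = {"Rir": 0, "Rim": 1}[by]
    return sorted(code for code, ratios in table.items() if ratios[column] >= k)


by_rir = select_regions(TABLE_2, 19.0, by="Rir")
by_rim = select_regions(TABLE_2, 40.0, by="Rim")
print(by_rir)
print(by_rim)
print(sorted(set(by_rir) & set(by_rim)))
```

Only region 55656 passes both illustrative cut-offs, so the two definitions of k pass different sets of regions to DBSCAN even before any clustering is done.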

[Figure 6 plot: Rim (y-axis, 0.55 to 1) against Ave (x-axis, 3 to 10) for three series: normalized data and k defined by Rir; data without normalization and k defined by Rir; normalized data and k defined by the Rim in region.]

Figure 6: Comparison with two other methods. The first method uses data without normalization and threshold k defined by Rir (the ratio of interface residues among the residues). The second uses normalized data and threshold k defined by Rim (the ratio of monomers containing interface residues among the monomers). The x-axis is Ave (the average number of amino acids per monomer). The y-axis is Rim.

As Figure 6 shows, data normalization and the definition of the threshold k both change the results. All results were calculated in the five-dimensional feature space. Black indicates our method, with normalized data and k defined by Rir in the region. Red represents data without normalization and k defined by Rir in the region. Blue indicates normalized data with k defined by Rim in the region. For the black and red curves, the more residues retained per monomer, the higher the Rim. The black curve performs better than the others in most cases, which suggests that regions with low Rir disturb the final clustering result.

3.4 Discussion

In this paper, we employed five features to describe an amino acid, although four or three of them could also be used to characterize amino acids. We divided the interval of each feature into ten parts, but more or fewer parts may give a better result. In other words, if the number of features and the number of parts per feature are regarded as parameters of the method, tuning them may be a way to obtain better results. For the clustering method, DBSCAN, we set both parameters, the number of objects in the neighborhood of an object and the neighborhood radius, to 1. Other parameter values may give different results. Finally, the definition of the threshold k is not unique; we used only two definitions (Rir or Rim).
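To show why the neighborhood radius matters, the sketch below counts how many grid cells lie within a given radius of one region in the five-dimensional integer grid. It assumes Euclidean distance, which the text does not state, and `grid_neighbours` is a hypothetical helper written for this illustration.

```python
import math
from itertools import product


def grid_neighbours(cell, eps=1.0, n_bins=10):
    """Grid cells within Euclidean distance eps of `cell`, excluding the cell itself."""
    reach = math.floor(eps)
    found = []
    for offset in product(range(-reach, reach + 1), repeat=len(cell)):
        candidate = tuple(c + o for c, o in zip(cell, offset))
        if candidate == cell:
            continue
        if not all(1 <= c <= n_bins for c in candidate):
            continue  # outside the 1..n_bins grid
        if math.dist(cell, candidate) <= eps:
            found.append(candidate)
    return found


interior = (5, 5, 5, 5, 5)
corner = (1, 1, 1, 1, 1)
print(len(grid_neighbours(interior, eps=1.0)))  # cells one step away along a single axis
print(len(grid_neighbours(corner, eps=1.0)))
print(len(grid_neighbours(interior, eps=1.5)))  # also reaches cells one step away along two axes
```

With a radius of 1, an interior region can only be linked to the 10 regions that differ by one step along a single feature. Raising the radius to 1.5 would also link regions that differ along two features, giving 50 candidate neighbors, so this parameter directly controls how easily regions merge into one cluster.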

4 Conclusion

According to Tables 4 and 5, the four-dimensional space method may give better clustering results than the five-dimensional space method. With the four-dimensional feature space method, the accuracy reaches 77.54% when three residues are retained per monomer. With the five-dimensional feature space method, the accuracy reaches 91.89% when seven residues are retained per monomer. This method will help to understand the mechanism that differentiates protein interface residues from non-interface ones and will be useful for protein engineering applications.

Acknowledgement: This research was supported by the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China, and the State Key Laboratory of Membrane Biology to Xinqi Gong. Experiments were run on the Renda Xing Cloud, which currently has 64 physical nodes.

Conflict of interest: Authors state no conflict of interest

References

[1] Rafael A Jordan, Yasser EL-Manzalawy, Drena Dobbs, et al. Predicting protein-protein interface residues using local surface structural similarity[J]. BMC Bioinformatics, 2012, 13(1):1-14.
[2] Ming-min Li, Shi-de Liang, Jian Zhang, et al. Structural Characterisation and Prediction of Protein-protein Interfaces[J]. Progress in Modern Biomedicine, 2006:91-103.
[3] Philip Lijnzaad, Herman J C Berendsen, Patrick Argos. Hydrophobic patches on the surfaces of protein structures[J]. Proteins-structure Function & Bioinformatics, 1996, 25(3):389-397.
[4] Changhui Yan, Drena Dobbs, Vasant Honavar. A two-stage classifier for identification of protein-protein interface residues[J]. Bioinformatics, 2004, 20(suppl_1):371-378.
[5] Catriel Beeri, Philip A Bernstein, Nathan Goodman. A Sophisticate’s Introduction to Database Normalization Theory[C]. International Conference on Very Large Data Bases, September 13-15, 1978, West Berlin, Germany. 1989:468-479.
[6] Lance Parsons, Ehtesham Haque, Huan Liu. Subspace clustering for high dimensional data: a review[J]. Acm Sigkdd Explorations Newsletter, 2004, 6(1):90-105.
[7] Simon J Hubbard, M Thornton. NACCESS. 2.1.1: Department of and [J]. 1993.
[8] Tiffany B Fischer, J Bradley Holmes, Ian R Miller, et al. Assessing methods for identifying pair-wise atomic contacts across binding interfaces[J]. Journal of Structural Biology, 2006, 153(2):103-12.
[9] Thom Vreven, Iain H Moal, Anna Vangone, et al. Updates to the Integrated Protein-Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2[J]. Journal of Molecular Biology, 2015, 427(19):3031-41.
[10] Md Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, et al. A new scalable parallel DBSCAN algorithm using the disjoint-set data structure[J]. 2012:1-11.
[11] Zhi-wei Sun, Zheng Zhao. A Clustering Algorithm Based on Grid and Density with Random Sampling[J]. Journal of Tianjin University, 2006, 39(5):621-626.
[12] Yanay Ofran, Burkhard Rost. Predicted protein-protein interaction sites from local sequence information[J]. FEBS Letters, 2003, 544(1-3):236-239.
[13] Pietro Di Lena, Ken Nagata, Pierre Baldi. Deep architectures for protein contact map prediction[J]. Bioinformatics, 2012, 28(19):2449-57.
[14] Burkhard Rost, Chris Sander. Combining evolutionary information and neural networks to predict protein secondary structure[J]. Proteins-structure Function & Bioinformatics, 1994, 19(1):55-72.
[15] Alessandro Vullo, Paolo Frasconi. Prediction of Protein Coarse Contact Maps[J]. Journal of Bioinformatics & Computational Biology, 2012, 1(2):411-431.
[16] Jonathan C Fuller, Nicholas J Burgoyne, Richard M Jackson. Predicting druggable binding sites at the protein-protein interface[J]. Drug Discovery Today, 2009, 14(3-4):155-61.
[17] Gabrielle A Reeves, Karen Eilbeck, Michele Magrane, et al. The Protein Feature Ontology: a tool for the unification of protein feature annotations[J]. Bioinformatics, 2008, 24(23):2767-2772.
[18] Agnieszka S Juncker, Lars J Jensen, Andrea Pierleoni, et al. Sequence-based feature prediction and annotation of proteins[J]. Genome Biology, 2009, 10(2):1-6.
[19] Jianping Lin, Pi Liu, Hua-Zheng Yang, et al. Factors That Affect the Computational Prediction of Hot Spots in Protein-Protein Complexes[J]. Computational Molecular Bioscience, 2012, 02(1):23-34.