Distance Adjacency Matrix with Shortest Path and Protein Similarity/Dissimilarity

Journal of Xi'an University of Architecture & Technology ISSN No : 1006-7930

D. Vijayalakshmi Assistant Professor, Department of Mathematics, (SCSVMV),

K. Divya M.Phil Scholar, Department of Mathematics, (SCSVMV),

Abstract

Proteins are represented as protein graphs. For each protein graph, the distance adjacency matrix DA(s), based on the shortest path between vertices, is constructed. In this novel method, the modulus of the second eigenvalue of the DA(s) matrix plays the key role in determining the similarity/dissimilarity between proteins. The results obtained are compared with the results from the BLAST sequence site.

Keywords – Distance adjacency matrix, Shortest path, Similarity/Dissimilarity

I. INTRODUCTION

Protein study is a wide area of research. Within it, the study of similarity/dissimilarity occupies an important place, as it allows the nature of a new protein whose primary and secondary structure is known to be inferred. One common method for determining the biological function of a protein is the basic local alignment search tool (BLAST), a sequence-based search for similarities in arrangement with known sequences. In this paper, proteins are converted into graphs; hence the similarity study of proteins becomes a graph problem.

In this part, we see how the graph, its matrix, and the eigenvalues of the matrix play a vital role in the study of proteins. In [1], the stability of the graph that represents a protein is verified using graph entropy. It is also suggested that graph entropy helps in protein graph modeling. In [2], for the graph that represents a protein, the normalized Laplacian representation of the adjacency matrix is considered. To measure the similarity between graphs, the Euclidean distance between the spectral vectors of the two graphs is calculated.

In [3], a novel method based on graph indexing, called closure tree, annotates the information of the underlying graphs. In [4], [5], the normalized Laplacian matrix is used as a tool to classify the different domains in proteins. In [6], the largest eigenvalue of the normalized Laplacian shows the synchronization of graphs. [7], [8], [9] show clearly that the normalized Laplacian matrix is a very useful tool for analyzing the qualitative aspects of complex networks. In [10], the Degree-Distance matrix of a protein graph is determined, and the similarity between proteins is measured using the least positive eigenvalue of the Degree-Distance matrix. In this paper, proteins are represented by graphs, and the Distance-Adjacency matrix DA(s) based on the shortest path is constructed for each protein graph. The second eigenvalue of DA(s) is used to measure the similarity/dissimilarity between proteins.

II. METHOD

DA matrix and protein similarity/dissimilarity

The graph for the protein is constructed using the following steps.

1. The secondary structure elements are considered as vertices.

2. Each vertex is represented by the centroid of the corresponding secondary structure element. Each centroid is the average of the 3-D coordinates of the central carbon atoms of the amino acids in that structure.


3. The distance between the centroid of a vertex and the centroid of each remaining vertex is calculated. Edges are drawn between the vertex and its two nearest vertices. This step is repeated for every vertex to obtain the graph of the protein.
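The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `element_centroid` and `build_protein_graph` and the sample coordinates are hypothetical, Euclidean distance between centroids is assumed, and ties are broken by vertex index.

```python
import numpy as np

def element_centroid(ca_coords):
    """Average the 3-D coordinates of the central carbon atoms of one element."""
    return np.asarray(ca_coords, dtype=float).mean(axis=0)

def build_protein_graph(centroids, k=2):
    """Link every vertex to its k nearest centroids; return sorted undirected edges."""
    pts = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances between all centroids.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a vertex is never its own neighbour
    edges = set()
    for i in range(len(pts)):
        nearest = np.argsort(dist[i], kind="stable")[:k]
        for j in nearest:
            edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)

# Hypothetical centroids of four secondary structure elements (not real data).
centroids = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (7, 0, 0)]
edges = build_protein_graph(centroids)
centre = element_centroid([(0, 0, 0), (2, 4, 6)])
```

Because an edge is added whenever either endpoint selects the other, a vertex can end up with more than two edges in the resulting graph.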

In this paper we consider nine proteins: 1jxt, 1jxy, 1jxw, 1jxx, 1ccn, 1jxu, 3u7t, 1ab1 and 1wuw.

2.1 Proteins and Protein graphs

[Figure 1 panels: 1wuw, 1jxt, 1ccn, 1jxy, 1jxu, 1ab1, 3u7t, 1jxx, 1jxw]

Figure 1 Proteins with structure, description and graph

2.2 Shortest Path & Modulus of Second Eigen Value (SP&MSEV)

Distance Matrix

The distance matrix is also known as the path length matrix, as it gives the distance between vertices. In this paper, we construct the distance matrix from the shortest distance between each pair of vertices.

\[
Dist(i,j) =
\begin{cases}
\text{shortest path between } v_i \text{ and } v_j & \text{if } i \neq j \\
0 & \text{otherwise}
\end{cases}
\]

Adjacency Matrix

In the adjacency matrix $A_G$ of the graph $G(V, E)$, the entries $a_{ij}$ are

\[
a_{ij} =
\begin{cases}
1 & \text{if } (i, j) \in E \\
0 & \text{otherwise}
\end{cases}
\]


The distance adjacency matrix is defined as the difference between the distance matrix and the adjacency matrix of a graph, that is, $DA(s) = Dist - A_G$. In particular, we consider the shortest distance between each pair of vertices.
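The construction just described can be sketched as follows. The sketch assumes that the shortest path is counted in edges (hop count) on an unweighted, connected graph, which the text does not state explicitly. The function names and the small path graph are hypothetical.

```python
from collections import deque

def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on n vertices."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def shortest_path_matrix(A):
    """Hop-count shortest paths, found by breadth-first search from each vertex."""
    n = len(A)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if A[u][v] and v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            D[s][v] = d  # unreachable vertices keep 0 (connected graph assumed)
    return D

def distance_adjacency_matrix(A):
    """Entry-wise difference Dist - A, as in the definition above."""
    D = shortest_path_matrix(A)
    n = len(A)
    return [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]

# Hypothetical path graph 0 - 1 - 2 - 3 (not one of the nine proteins).
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
DA = distance_adjacency_matrix(A)
```

Under the hop-count assumption, adjacent vertices have distance 1 and adjacency 1, so their entries cancel; only pairs at distance two or more contribute non-zero entries.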

The distance adjacency matrix is denoted by DA(s). Let $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ be the eigenvalues of the DA(s) matrix. They satisfy the condition

\[
\sum_{i=1}^{n} \lambda_i = 0
\]

The distance adjacency matrix DA(s) is

• a non-singular matrix
• a real symmetric matrix
• a matrix whose diagonal entries are zero
• an indefinite matrix
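The zero-sum condition on the eigenvalues follows from the zero diagonal. A short derivation, assuming a simple graph without self-loops (so that $a_{ii} = 0$) and using only the definitions above:

```latex
\[
\sum_{i=1}^{n} \lambda_i
  = \operatorname{tr}\bigl(DA(s)\bigr)
  = \sum_{i=1}^{n} \bigl( Dist(i,i) - a_{ii} \bigr)
  = \sum_{i=1}^{n} (0 - 0)
  = 0
\]
```

Since DA(s) is real and symmetric, all its eigenvalues are real. Whenever DA(s) is not the zero matrix, eigenvalues that sum to zero must include at least one positive and one negative value, which is consistent with the matrix being indefinite.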

The Modulus of Second Eigen Value (MSEV) of the DA(s) matrix is used to measure the similarity/dissimilarity of proteins.

First, the DA(s) matrix is calculated for every protein graph. Second, its eigenvalues are determined, and the modulus of the second eigenvalue is used to measure the similarity between proteins. The modulus of the second eigenvalue for each protein is listed below.
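A minimal sketch of the MSEV computation is given here. The text does not say how the eigenvalues are ordered, so the sketch assumes descending order and takes the second entry; that ordering, the function name `msev` and the example matrix are assumptions for illustration only.

```python
import numpy as np

def msev(da_matrix):
    """Modulus of the second eigenvalue of a real symmetric DA(s) matrix."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(da_matrix, dtype=float))
    ordered = np.sort(eigenvalues)[::-1]  # assumption: largest eigenvalue first
    return abs(ordered[1])

# DA(s) of a hypothetical 4-cycle 0 - 1 - 2 - 3 - 0 with hop-count distances:
# adjacent pairs cancel (1 - 1 = 0), opposite pairs have distance 2.
DA = [[0, 0, 2, 0],
      [0, 0, 0, 2],
      [2, 0, 0, 0],
      [0, 2, 0, 0]]
value = msev(DA)  # eigenvalues are 2, 2, -2, -2, so the value is 2 up to rounding
total = float(np.linalg.eigvalsh(np.asarray(DA, dtype=float)).sum())
```

`numpy.linalg.eigvalsh` is used because DA(s) is real and symmetric, which guarantees real eigenvalues.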

Table -1 Modulus of Second Eigen Value

S.No  Protein  MSEV |λ_2|
1     1JXT     2.2484
2     1JXY     2.2484
3     1JXW     2.2484
4     1JXX     2.3170
5     1AB1     2.6303
6     3U7T     2.4777
7     1JXU     2.9525
8     1CCN     2.9585
9     1WUW     1.1231

The similarity/dissimilarity between proteins is measured as follows. For a given protein, the difference between its MSEV and the MSEV of each remaining protein is calculated, and the percentage of similarity is assigned according to this difference value. The difference values and the corresponding percentages of similarity are given in the following table.

Table -2 Difference Value and Similarity Percentage

S.No  Difference Value (DV)  Percentage of Similarity
1     0 - 0.20               100%
2     0.20 - 0.40            97% - 100%
3     0.40 - 0.60            90% - 97%
4     0.60 - 0.80            80% - 90%
5     0.80 - 1               70% - 80%
6     1 - 1.20               65% - 70%
7     1.20 - 1.40            60% - 65%
8     1.40 - 1.60            55% - 60%
9     1.60 - 1.80            50% - 55%
10    1.80 - 2               45% - 50%

We consider the protein 1jxx; its difference values when compared with the remaining proteins are given below.

Table -3 1jxx and Difference Value

Protein compared with 1jxx  Difference value
1ccn, 1jxu                  0.6355
1ab1                        0.3133
1jxt, 1jxy, 1jxw            0.0686
3u7t                        0.1607
1wuw                        1.1939

The percentage of similarity of the 1jxx protein with the remaining proteins under consideration is as follows.

Table -4 1jxx and Similarity Percentage

Protein name      % Similarity
1ccn, 1jxu        80-90
1ab1              97-100
1jxt, 1jxy, 1jxw  100
3u7t              100
1wuw              65-70
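The lookup from difference value to similarity range can be sketched in Python using the MSEV values of Table 1 and the ranges of Table 2. The ranges in Table 2 share their boundary values, so the sketch assumes that a boundary value belongs to the lower range; the names `MSEV`, `BINS`, `similarity` and `compare` are illustrative.

```python
# MSEV values copied from Table 1.
MSEV = {
    "1jxt": 2.2484, "1jxy": 2.2484, "1jxw": 2.2484, "1jxx": 2.3170,
    "1ab1": 2.6303, "3u7t": 2.4777, "1jxu": 2.9525, "1ccn": 2.9585,
    "1wuw": 1.1231,
}

# (upper bound of the difference value, similarity range in %) from Table 2.
BINS = [
    (0.20, "100"), (0.40, "97-100"), (0.60, "90-97"), (0.80, "80-90"),
    (1.00, "70-80"), (1.20, "65-70"), (1.40, "60-65"), (1.60, "55-60"),
    (1.80, "50-55"), (2.00, "45-50"),
]

def similarity(dv):
    """Map a difference value to its range; boundaries go to the lower range (assumption)."""
    for upper, label in BINS:
        if dv <= upper:
            return label
    return None  # Table 2 lists no range beyond a difference of 2

def compare(reference):
    """Difference value and similarity range of every other protein against `reference`."""
    result = {}
    for name, value in MSEV.items():
        if name != reference:
            dv = round(abs(MSEV[reference] - value), 4)
            result[name] = (dv, similarity(dv))
    return result

result = compare("1jxx")
```

Run on 1jxx, this gives the same similarity ranges as Table 4. For 1ccn the computed difference is 0.6415 rather than the 0.6355 that Table 3 lists jointly for 1ccn and 1jxu; both values fall in the 80-90 range.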

III. RESULT

Based on the DV, the similarity percentage between each pair of proteins is calculated and compared with the BLAST sequence results. The details are shown below.

Table -5 Similarity/Dissimilarity percentage of proteins

Protein  Method  1jxt  1jxy  1jxw  1jxx  1ccn   1jxu   1ab1    3u7t    1wuw
1jxt     DA(s)   0     100   100   100   80-90  80-90  97-100  97-100  65-70
1jxt     BLAST   -     100   100   100   95-98  95-98  100     100     56-58
1jxy     DA(s)   -     0     100   100   80-90  80-90  97-100  97-100  65-70
1jxy     BLAST   -     -     100   100   95-98  95-98  100     100     56-58
1jxw     DA(s)   -     -     0     100   80-90  80-90  97-100  97-100  65-70
1jxw     BLAST   -     -     -     100   95-98  95-98  100     100     56-58
1jxx     DA(s)   -     -     -     0     80-90  80-90  97-100  100     65-70
1jxx     BLAST   -     -     -     -     95-98  95-98  100     100     55
1ccn     DA(s)   -     -     -     -     0      100    97-100  90-97   45-50
1ccn     BLAST   -     -     -     -     -      100    95-97   95-97   50-54
1jxu     DA(s)   -     -     -     -     -      0      97-100  90-97   45-50
1jxu     BLAST   -     -     -     -     -      -      95-97   95-97   50-54
1ab1     DA(s)   -     -     -     -     -      -      0       100     55-60
1ab1     BLAST   -     -     -     -     -      -      -       100     56-58
3u7t     DA(s)   -     -     -     -     -      -      -       0       60-65
3u7t     BLAST   -     -     -     -     -      -      -       -       55
1wuw     DA(s)   -     -     -     -     -      -      -       -       0

For each protein, the row marked DA(s) gives the result obtained by our method (printed in bold in the original table) and the row marked BLAST below it gives the result from the BLAST sequence site. A dash marks an entry that the table does not list.

IV. CONCLUSION

The method presented in this paper is a novel and simple way of measuring similarity/dissimilarity between proteins. In this method, similarity is measured from the distance, that is, the neighbourhood, of the vertices. Even though the method is simple, the results obtained are close to the BLAST sequence results, which shows the efficiency of the method.

REFERENCES

[1] Sheng-Lung Peng, Yu-Wei Tsay, "Adjusting Protein Graphs Based on Graph Entropy", BMC Bioinformatics (2014), 15 (Suppl 15): S6. http://www.biomedcentral.com/1471-2105/15/S15/S6.

[2] Do Phuc and Nguyen Thi Kim Phung, "Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures", (2009), World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol. 3, No. 8.

[3] Huahai He, Ambuj K. Singh, "Closure-Tree: An Index Structure for Graph Queries", (2006), ICDE.

[4] A. Banerjee, "The Spectrum of the Graph Laplacian as a Tool for Analyzing Structure and Evolution of Networks", (2008), PhD thesis, Fakultät für Mathematik und Informatik der Universität Leipzig.

[5] A. Banerjee and J. Jost, "Laplacian spectrum and protein-protein interaction networks", (2008), arXiv:0705.3373.

[6] J. Jost and M. P. Joy, "Spectral properties and synchronization in coupled map lattices", (2001), Phys. Rev. E, 65, 016201.

[7] A. Banerjee and J. Jost, "Spectral plots and the representation and interpretation of biological data", (2007), Theory in Biosciences, 126, 15-21.

[8] A. Banerjee and J. Jost, "The spectral characterization of network structures and dynamics", in N. Ganguly, A. Deutsch, and A. Mukherjee, editors, Dynamics on and of Complex Networks, (2009), 117-132. Birkhäuser Boston.

[9] A. Banerjee and J. Jost, "Graph spectra as a systematic tool in computational biology", (2009), Discrete Appl. Math. 157, 2425-2431.

[10] D. Vijayalakshmi, K. Srinivasa Rao, "DD Matrix with Least Positive Eigen Value and Protein Similarity", (2017), International Journal of Pure and Applied Mathematics, Volume 115, No. 9, 331-342. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (online version).

Volume XII, Issue V, 2020 Page No: 2478