Journal of Chemical Technology and Metallurgy, 55, 4, 2020, 714-718

CHEMICAL STRUCTURE COMPUTER MODELLING

Radoslava Topalska, Fatima Sapundzhi

South-West University “Neofit Rilski”, 66 Ivan Michailov str., 2700 Blagoevgrad, Bulgaria
E-mail: [email protected]

Received 11 January 2019
Accepted 30 July 2019

ABSTRACT

The root-mean-square deviation of atomic positions (RMSD) is one of the most commonly used measures in bioinformatics. It measures the average distance between the atoms of superimposed proteins. The present study describes a program that calculates RMSD between two structures. The software detects the surfaces of two molecular structures, a convex and a concave one, and the area of their interaction by calculating RMSD between them. The program uses fragments of files in Protein Data Bank format. A Python implementation of RMSD computation based on the Kabsch algorithm is proposed.

Keywords: computer modelling, RMSD, Python, PDB, ligand-receptor interactions, bioinformatics.

INTRODUCTION

Protein structure prediction generally involves comparing the predicted structure with the experimentally determined one obtained by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The degree of similarity is often expressed as a Root Mean Square Deviation (RMSD) measure, which represents the distance between the corresponding atoms in each molecule. It is useful as a measure of the accuracy of a model when a crystal structure of the protein is available for comparison. RMSD calculations can also be applied to non-protein molecules such as small organic molecules [1].

The algorithm of Kabsch is a popular method for calculating the optimal rotation that minimizes RMSD between two paired sets of points. This algorithm is widely used in bioinformatics for comparing protein structures, in cheminformatics for comparing molecular structures, etc. [2, 3]. However, the RMSD threshold for what is considered a good fit increases with the size of the protein. Whereas an RMSD of 10 Å would be considered a poor fit for a small protein, it might be considered excellent for a longer protein with several hundred amino acids.

Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB) [4]. The search for a structure is typically based on the protein name or ID [5]. The objective of this research is to present a program that (i) detects two surfaces of two structures, a convex and a concave one, and the area of their interaction and (ii) calculates RMSD.

EXPERIMENTAL

Python

Python is an object-oriented, open source computer programming language. It is commonly used for both standalone programs and scripting applications in a wide variety of domains. Python is designed to optimize developer productivity, software quality and program portability. Python programs run on most commonly used platforms, including Windows, Linux, Java and .NET, and more [6].

RMSD

The root-mean-square deviation (RMSD) is a measure of the differences between the values predicted by a model and the values actually observed in the object being modeled or estimated (Eq. 1):

\[ \mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\delta_i^{2}} \qquad (1) \]

where $\delta_i$ is the distance between atom $i$ and either a reference structure or a mean position of the $n$ equivalent atoms.
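As a minimal illustration of Eq. 1, the following Python sketch computes RMSD from a list of per-atom distances. The function name `rmsd_from_distances` and the sample distances are illustrative assumptions and are not taken from the program described in this paper.

```python
import math


def rmsd_from_distances(deltas):
    """Eq. 1: square root of the mean of the squared per-atom distances."""
    n = len(deltas)
    if n == 0:
        raise ValueError("at least one distance is required")
    return math.sqrt(sum(d * d for d in deltas) / n)


# Hypothetical per-atom distances in angstroms (not data from the paper).
example_distances = [3.0, 4.0]
print(rmsd_from_distances(example_distances))
```

Because the distances are squared before averaging, a few large deviations raise the RMSD more than many small ones do.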


Normally a rigid superposition which minimizes RMSD is performed, and this minimum is returned. Given two sets of $n$ points $v$ and $w$, with $v_j = (x_{1,j}, y_{1,j}, z_{1,j})$ and $w_j = (x_{2,j}, y_{2,j}, z_{2,j})$, RMSD is defined in accordance with Eq. 2:

\[ \mathrm{RMSD}(v, w) = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left[(x_{1,j}-x_{2,j})^{2} + (y_{1,j}-y_{2,j})^{2} + (z_{1,j}-z_{2,j})^{2}\right]} \qquad (2) \]

The atomic coordinates of proteins are generally expressed in Å (where 1 Å = 10^-10 m = 0.1 nm). RMSD has units of length and is therefore also expressed in Å [7, 8].

UCSF Chimera

UCSF Chimera 1.12 software is used to generate high-quality images. It is a program for interactive visualization and analysis of molecular structures and related data including density maps, supramolecular assemblies, sequence alignments, docking results, etc. The program can be downloaded free of charge for academic, government, non-profit, and personal use [9].

RESULTS AND DISCUSSION

The developed program is based on the Kabsch algorithm and is implemented in the Python programming language. The tool superimposes two molecular structures in xyz format and calculates RMSD between them. RMSD is calculated between two sets of atomic coordinates, in this case one for the crystallographic structure (Table 1) taken from a PDB file and another for the atomic coordinates of the ligand (Table 2).

Table 1. A fragment of the 4DKL.xyz file data used.

C  -18.687  18.589  -1.665
C  9367  7640  8730
N  -19.331  17.879  -2.770
N  9001  7490  8520
C  -19.216  18.216  -4.051
C  8657  7044  8299
N  -18.473  19.257  -4.403
N  8544  6605  8184
N  -19.844  17.511  -4.981
N  8198  6785  7937
N  -12.650  17.723  -2.621
N  10424  9873  11513
C  -11.638  17.687  -3.667
C  10117  9785  11668
C  -11.784  16.370  -4.426
C  9158  9210  10910
O  -12.134  15.349  -3.833
O  8539  8832  10227
C  -10.221  17.815  -3.068
C  10776  10747  12732
O  -9.239  17.516  -4.067
O  11173  11464  13570
C  -10.051  16.866  -1.891
C  10474  10804  12436
N  -11.538  16.389  -5.747
N  9212  9274  11159
C  -11.721  15.184  -6.569
C  8973  9337  11061
C  -10.806  14.020  -6.179
C  8911  9825  11405
O  -11.131  12.867  -6.479
O  9022  10154  11539
C  -11.397  15.672  -7.988
C  8792  9003  10988
C  -10.598  16.916  -7.799
C  8685  8681  10987
C  -11.142  17.551  -6.560
C  9004  8731  10982
N  -9.688  14.312  -5.520
N  8882  10011  11688
…  …  …  …

The mathematical calculation of RMSD for two sets of xyz coordinates of n particles is given by Eq. 1 [1, 2]. This calculation alone does not take into account that the two molecules could be identical and merely displaced in space. Solving this problem requires positioning the molecules at an identical center and rotating one onto the other. The centroid of each molecule is found first, and both molecules are then translated to the center of the coordinate system. The Kabsch algorithm is then used to align the molecules by rotation. The procedure described is in fact a method for calculating the optimal rotation matrix that minimizes RMSD between two paired sets of points. The program returns the centroid of each coordinate matrix as an [x y z] vector and translates the two matrices so that their centroids coincide with the origin of the coordinate system.
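The centring and rotation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the use of NumPy, the function names (`centroid`, `kabsch_rotation`, `rmsd`, `kabsch_rmsd`) and the four test points are all hypothetical, and the two point sets are assumed to be paired atom by atom and of equal length.

```python
import numpy as np


def centroid(coords):
    """Return the centroid of an (n, 3) array as an [x y z] vector."""
    return coords.mean(axis=0)


def kabsch_rotation(p, q):
    """Return the 3x3 rotation r that minimises RMSD between p @ r.T and q.

    Both inputs are (n, 3) arrays of paired points already centred on the origin.
    """
    h = p.T @ q                          # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def rmsd(a, b):
    """Eq. 2: RMSD between two paired (n, 3) coordinate arrays."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def kabsch_rmsd(p, q):
    """Centre both point sets, rotate p onto q, and return the minimised RMSD."""
    p_centred = p - centroid(p)
    q_centred = q - centroid(q)
    r = kabsch_rotation(p_centred, q_centred)
    return rmsd(p_centred @ r.T, q_centred)


# Hypothetical test data: q is p rotated by 90 degrees about z and then shifted.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
q = p @ rot_z.T + np.array([5.0, -2.0, 1.0])

print(rmsd(p, q))         # RMSD without superposition (non-zero)
print(kabsch_rmsd(p, q))  # close to zero after centring and rotation
```

The determinant check guards against the singular value decomposition returning a reflection rather than a rotation, which would superimpose a mirror image of the molecule.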


The Kabsch algorithm [2, 3] solves the constrained orthogonal Procrustes problem. This problem refers to the comparison of two (or more) shapes. To this end, the shapes must be optimally superimposed by translation, rotation and scaling. In the constrained problem, only rotations of the matrices are allowed.

Proteins that have been widely studied, both theoretically and experimentally, are used as a test set. The working algorithm is illustrated with an excerpt of the PDB file of the human μ-opioid receptor (MOR). Fig. 1 shows a part of the structure of MOR [4].

The atomic coordinates are used to construct a reference matrix which, together with another matrix of coordinates (constructed in the same way), provides the input data for the algorithm (Fig. 2) [11].

The program is written in the Python programming language and uses two files of atomic coordinates, < mol1.xyz > and < mol2.xyz >.

Table 2. A fragment of the MET-enkephalin.xyz file (< mol2.xyz >). MET-enkephalin is a pentapeptide with morphine-like activity (PubChem CID: 443363).

N  -1.347  0.242  -1.290
C  0.058  -0.100  -0.952
C  0.541  -1.375  -1.688
C  -0.407  -2.515  -1.332
N  -0.004  -3.808  -1.498
C  -0.871  -4.926  -1.094
C  -1.964  -5.309  -2.111
N  -2.283  -4.329  -3.025
C  -3.293  -4.539  -4.043
C  -4.655  -4.025  -3.586
O  -5.148  -3.004  -4.056
O  -2.553  -6.388  -2.088
O  -1.586  -2.242  -1.105
C  2.011  -1.666  -1.347
C  3.029  -0.652  -1.834
C  4.240  -0.531  -1.134
C  5.234  0.348  -1.569
C  5.030  1.110  -2.710
O  6.024  1.956  -3.096
C  3.850  0.996  -3.433
C  2.860  0.105  -3.005
H  -1.623  1.085  -0.786
H  -1.967  -0.532  -1.051
H  -1.418  0.414  -2.293
H  0.154  -0.234  0.105
H  0.674  0.712  -1.276
H  0.514  -1.244  -2.750
H  0.878  -3.995  -1.892
H  -1.328  -4.697  -0.154
H  -0.228  -5.779  -1.028
H  -1.814  -3.465  -2.986
H  -3.358  -5.579  -4.285
H  -3.006  -3.988  -4.915
H  2.273  -2.632  -1.725
H  2.061  -1.611  -0.280
H  4.397  -1.099  -0.287
H  6.119  0.430  -1.043
H  6.811  2.016  -2.592
H  3.704  1.565  -4.282
H  1.994  0.004  -3.557
…  …  …  …

The program uses a function that reads the atomic coordinates from a PDB file and returns the non-hydrogen coordinates as arrays [5 - 9]. The .pdb format contains a lot of information about the molecules investigated, such as the name of each amino acid, the atomic coordinates, the hydrogen bonds that link them, etc.

The function loops through each line of the PDB file and appends the atomic coordinates to three lists, named x, y and z. The x-, y- and z-coordinates are separated by two blank characters and are stored as floating-point values (Table 1 and Table 2). An advantage of .pdb files is that their structure can be visualized again in Chimera.

The base function calculates RMSD based on Eq. 1 and returns this value as a float. In this case the value of RMSD is smaller than 3 Å (RMSD = 2.359 Å).

The application uses specifically formatted output files in .xyz format. They have the .xyz extension and are usually obtained by some processing with Python. These files may be visualized using Chimera software (Fig. 1).

Fig. 1 shows a graphical representation of the two detected surfaces, a convex and a concave one (in this case a receptor (4dkl.xyz) and a ligand (Met-enkephalin.xyz)), and the area of their interaction, which is in fact the purpose of the experiments. The convex surface in this case is a fragment of the PDB file of the protein (4dkl.xyz) (Table 1), while the concave surface is a fragment of the ligand (PubChem CID: 443363) (Table 2) [10 - 13]. Met-enkephalin is an endogenous opioid peptide that has opioid effects of a relatively short duration [10]. It is a potent agonist of the δ-opioid receptor and, to a lesser extent, of the μ-opioid receptor. The drug exerts its analgesic and antidepressant-like effects [14 - 17] through them.

Fig. 1. Different views of the graphical representation of a convex (4dkl.xyz) and a concave (Met-enkephalin.xyz) surface and the area of their interaction. The pictures are generated by Chimera.

Although RMSD is one of the most commonly cited measures of structural similarity, it is less useful for comparing distant structures, where extensive embellishments or secondary structure shifts can often mask the underlying similarity. RMSD is equal to 0 for identical structures, and its value increases as the two structures start to differ. RMSD calculations between models of the delta-opioid receptor (DOR), the mu-opioid receptor and cannabinoid receptors are presented in refs. [13 - 22]. RMSD values between three models of DOR are calculated in Chimera [8]. They are, respectively: Model B (a model of DOR obtained by homology modeling) with DOR (PDB ID: 1ozc), RMSD = 1.960; Model B with DOR (PDB ID: 4ej4), RMSD = 1.660; DOR (PDB ID: 1ozc) with DOR (PDB ID: 4ej4), RMSD = 1.874.

Representing a protein by its molecular surface facilitates the study of protein folding, the prediction of biomolecular recognition, the detection of drug-binding ‘cavities’ and molecular graphics [19 - 22]. One of the advantages of the molecular surface description is its ability to visualize the shape complementarity at interfaces.

The research reported provides a consistent framework for further investigation of RMSD calculations and for detecting two surfaces, a convex and a concave one, and the area of their interaction. The RMSD calculator is used to calculate RMS distances between the molecules. The atoms to be used in the calculation are specified in the upper left corner of the menu. The atom selection text is entered in the input field, typed exactly as it would be in the Graphics form.

This paper deals with a fundamental problem, RMSD calculation, which is very important for protein structure analysis. A Python implementation of the Kabsch algorithm enables RMSD computation. The developed tool provides an estimate of RMSD between two protein 3-D structures. Faster or more flexible query algorithms remain to be developed.

Acknowledgements

This paper is partially supported by SWU “N. Rilski” Project RPY-B4/19; RP-B7/20; Project BNSF Н27/36; National Scientific Program “Information and Communication Technologies for a Single Digital Market in Science, Education and Security (ICTinSES)”, financed by the Ministry of Education and Science.

REFERENCES

1. B. Bergeron, Bioinformatics computing, Pearson Education, USA, 2003.
2. W. Kabsch, A solution for the best rotation to relate two sets of vectors, Acta Crystallographica Section A, 32, 5, 1976, 922-923.
3. W. Kabsch, A discussion of the solution for the best rotation to relate two sets of vectors, Acta Crystallographica Section A, 34, 5, 1978, 827-828.
4. RCSB Protein Data Bank, www.rcsb.org
5. B. Lee, F. Richards, The interpretation of protein structures: estimation of static accessibility, J. Mol. Biol., 55, 1971, 379-400.
6. M. Lutz, Python Pocket Reference: Python In Your Pocket, O’Reilly, Canada, 2014.
7. https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions, 2019.
8. https://www.cgl.ucsf.edu/chimera, 2019.
9. https://pubchem.ncbi.nlm.nih.gov, 2019.
10. T. Dzimbova, F. Sapundzhi, N. Pencheva, P. Milanov, Computer modeling of human mu-opioid receptor, Journal of Peptide Science, 18 (S1), S84, 2012, P072.
11. F. Sapundzhi, Scoring functions and modeling of structure-activity relationships for cannabinoid receptors, International Journal of Online and Biomedical Engineering, 15 (11), 2019, 139-145.
12. F. Sapundzhi, T. Dzimbova, N. Pencheva, P. Milanov, Exploring the interactions of enkephalin and dalargin analogues with the mu-opioid receptor, Bulgarian Chemical Communications, 2, 2015, 613-618.
13. F. Sapundzhi, T. Dzimbova, N. Pencheva, P. Milanov, Comparative evaluation of four scoring functions with three models of delta opioid receptor using molecular docking, Der Pharma Chemica, 8, 2016, 118-124.
14. F. Sapundzhi, T. Dzimbova, N. Pencheva, P. Milanov, Modeling the relationship between biological activity of delta-selective enkephalin analogues and docking results by polynomials, Bulgarian Chemical Communications, 49, 4, 2017, 768-774.
15. F. Sapundzhi, T. Dzimbova, N. Pencheva, P. Milanov, Molecular docking experiments of cannabinoid receptor, Bulgarian Chemical Communications, 50, Special Issue B, 2018, 44-48.
16. F. Sapundzhi, M. Popstoilov, Optimization algorithms for finding the shortest paths, Bulgarian Chemical Communications, 50, Special Issue B, 2018, 115-120.
17. F. Sapundzhi, K. Prodanova, M. Lazarova, Survey of the scoring functions for protein-ligand docking, AIP Conference Proceedings, 2172, 2019, 100008 1-6.
18. F. Sapundzhi, Computer modelling and optimization of the structure-activity relationship by using surface fitting methods, Bulgarian Chemical Communications, 51(4), 2019, 569-579.
19. F. Sapundzhi, T. Dzimbova, A study of QSAR based on polynomial modeling in Matlab, International Journal of Online and Biomedical Engineering, 15(15), 2019, 39-56.
20. V. Kralev, R. Kraleva, Visual analysis of actions performed with big graphs, International Journal of Innovative Technology and Exploring Engineering, 9(1), 2019, 2740-2744.
21. F. Sapundzhi, M. Popstoilov, C# implementation of the maximum flow problem, 27th National Conference with International Participation: The Ways to Connect the Future, TELECOM 2019 – Proceedings, 62-65.
22. F. Sapundzhi, T. Dzimbova, Computer modelling of the CB1 receptor by molecular operating environment, Bulgarian Chemical Communications, 50, Special Issue B, 2018, 15-19.
