Chemical Structure Computer Modelling
Journal of Chemical Technology and Metallurgy, 55, 4, 2020, 714-718

Radoslava Topalska, Fatima Sapundzhi
South-West University “Neofit Rilski”, 66 Ivan Michailov str., 2700, Blagoevgrad, Bulgaria
E-mail: [email protected]
Received 11 January 2019
Accepted 30 July 2019

ABSTRACT

The root-mean-square deviation of atomic positions (RMSD) is one of the most commonly used measures in bioinformatics. It measures the average distance between the atoms of superimposed proteins. The present study describes a program that calculates RMSD between two structures. The software detects the surfaces of two molecular structures, a convex one and a concave one, and the area of their interaction by calculating RMSD between them. The program uses fragments of files in Protein Data Bank format. A Python implementation of the RMSD computation based on the Kabsch algorithm is proposed.

Keywords: computer modelling, RMSD, Python, PDB, ligand-receptor interactions, bioinformatics.

INTRODUCTION

Protein structure prediction generally relies on the juxtaposition of the predicted structure and the experimentally determined one, obtained by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The degree of similarity is often expressed as a Root Mean Square Deviation (RMSD) measure, which represents the distance between the corresponding atoms in each molecule. It is useful as a measure of the accuracy of a model when a crystal structure of the protein is available for comparison. RMSD calculations can also be applied to non-protein molecules such as small organic molecules [1].

The Kabsch algorithm is a popular method for calculating the optimal rotation matrix that minimizes RMSD between two paired sets of points. It is widely used in bioinformatics for comparing protein structures, in cheminformatics for comparing molecular structures, etc. [2, 3]. However, as the size of the protein increases, the RMSD threshold for what is considered a good fit also increases. Whereas an RMSD of 10 Å would be considered a poor fit for a small protein, it might be considered excellent for a longer protein with several hundred amino acids.

Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB) [4]. The search for a structure is typically based on the protein name or ID [5].

The objective of this research is to present a program that (i) detects two surfaces of two structures, a convex one and a concave one, and the area of their interaction, and (ii) calculates RMSD.

EXPERIMENTAL

Python

Python is an object-oriented, open source programming language. It is commonly used for both standalone programs and scripting applications in a wide variety of domains. Python is designed to optimize developer productivity, software quality and program portability. Python programs run on most commonly used platforms, including Windows, Linux, Java and .NET, and more [6].

RMSD

The root-mean-square deviation (RMSD) is a measure of the differences between the values predicted by a model and the values actually observed in the object being modeled or estimated (Eq. 1):

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\delta_i^{2}} \qquad (1)$$

where $\delta_i$ is the distance between atom i and either a reference structure or a mean position of n equivalent atoms.

Normally a rigid superposition which minimizes RMSD is performed, and this minimum is returned. Given two sets of n points, v and w, RMSD is defined in accordance with Eq. 2:

$$\mathrm{RMSD}(v, w) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(x_{1,i}-x_{2,i})^{2} + (y_{1,i}-y_{2,i})^{2} + (z_{1,i}-z_{2,i})^{2}\right]} \qquad (2)$$

where $(x_{1,i}, y_{1,i}, z_{1,i})$ and $(x_{2,i}, y_{2,i}, z_{2,i})$ are the coordinates of the i-th point of the first and the second set, respectively.

The atomic coordinates of proteins are generally expressed in Å (where 1 Å = 10^-10 m = 0.1 nm). Since RMSD has units of length, RMSD values are also expressed in Å [7, 8].

UCSF Chimera

UCSF Chimera 1.12 software is used to generate high-quality images. It is a program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, etc. The program can be downloaded free of charge for academic, government, non-profit, and personal use [9].

RESULTS AND DISCUSSION

The developed program is based on the Kabsch algorithm and is implemented in the Python programming language. The tool superimposes two molecular structures in xyz format and calculates RMSD between them. RMSD is calculated between two sets of atomic coordinates: in this case, one for the crystallographic structure from the PDB file (Table 1) and another for the atomic coordinates of the ligand (Table 2).

Table 1. A fragment of the 4DKL.xyz file (<mol1.xyz>) data used.
C  -18.687  18.589  -1.665
C  9367  7640  8730
N  -19.331  17.879  -2.770
N  9001  7490  8520
C  -19.216  18.216  -4.051
C  8657  7044  8299
N  -18.473  19.257  -4.403
N  8544  6605  8184
N  -19.844  17.511  -4.981
N  8198  6785  7937
N  -12.650  17.723  -2.621
N  10424  9873  11513
C  -11.638  17.687  -3.667
C  10117  9785  11668
C  -11.784  16.370  -4.426
C  9158  9210  10910
O  -12.134  15.349  -3.833
O  8539  8832  10227
C  -10.221  17.815  -3.068
C  10776  10747  12732
O  -9.239  17.516  -4.067
O  11173  11464  13570
C  -10.051  16.866  -1.891
C  10474  10804  12436
N  -11.538  16.389  -5.747
N  9212  9274  11159
C  -11.721  15.184  -6.569
C  8973  9337  11061
C  -10.806  14.020  -6.179
C  8911  9825  11405
O  -11.131  12.867  -6.479
O  9022  10154  11539
C  -11.397  15.672  -7.988
C  8792  9003  10988
C  -10.598  16.916  -7.799
C  8685  8681  10987
C  -11.142  17.551  -6.560
C  9004  8731  10982
N  -9.688  14.312  -5.520
N  8882  10011  11688
…  …  …  …

Table 2. A fragment of the MET-enkephalin.xyz file (<mol2.xyz>). MET-enkephalin is a pentapeptide with morphine-like activity (PubChem CID: 443363).
N  -1.347  0.242  -1.290
C  0.058  -0.100  -0.952
C  0.541  -1.375  -1.688
C  -0.407  -2.515  -1.332
N  -0.004  -3.808  -1.498
C  -0.871  -4.926  -1.094
C  -1.964  -5.309  -2.111
N  -2.283  -4.329  -3.025
C  -3.293  -4.539  -4.043
C  -4.655  -4.025  -3.586
O  -5.148  -3.004  -4.056
O  -2.553  -6.388  -2.088
O  -1.586  -2.242  -1.105
C  2.011  -1.666  -1.347
C  3.029  -0.652  -1.834
C  4.240  -0.531  -1.134
C  5.234  0.348  -1.569
C  5.030  1.110  -2.710
O  6.024  1.956  -3.096
C  3.850  0.996  -3.433
C  2.860  0.105  -3.005
H  -1.623  1.085  -0.786
H  -1.967  -0.532  -1.051
H  -1.418  0.414  -2.293
H  0.154  -0.234  0.105

The mathematical calculation of RMSD for two sets of xyz coordinates of n particles is given by Eq. 2 [1, 2]. Applied directly, this calculation does not take into account that the two molecules could be identical and merely displaced relative to each other in space. Solving this problem requires positioning the molecules at a common center and rotating one onto the other. The centroid of each molecule is found first, and both molecules are then translated to the center of the coordinate system. The Kabsch algorithm is then used to align the molecules by rotation. It is a method for calculating the optimal rotation matrix that minimizes RMSD between two paired sets of points. The program returns the centroid of a coordinate matrix as an [x y z] vector and translates the two matrices so that their centroids coincide with the origin of the coordinate system.

The Kabsch algorithm [2, 3] solves the constrained orthogonal Procrustes problem. The Procrustes problem refers to the comparison of two (or more) shapes, which must be optimally superimposed by translating, rotating and scaling. In the constrained version, only rotations of the matrices are allowed.

Proteins that have been widely studied, both theoretically and experimentally, are used as a test set. The working algorithm is illustrated with an excerpt of the PDB file of the human μ-opioid receptor (MOR). Fig. 1 shows a part of the structure of MOR [4].

The atomic coordinates are used to construct a reference matrix which, together with another matrix of coordinates constructed in the same way, provides the input data for the algorithm (Fig. 2) [11].

The program is written in the Python programming language and uses two files of atomic coordinates, <mol1.xyz> and <mol2.xyz>.

The program uses a function that reads the atomic coordinates from a PDB file and returns the non-hydrogen coordinates as arrays [5-9]. The .pdb format contains a lot of information about the molecules investigated, such as the name of each amino acid, the atomic coordinates, the hydrogen bonds that link the amino acids to each other, etc.

The function loops through each line of the PDB file and assigns the atomic coordinates to three lists, named x, y and z. The x-, y- and z-coordinates are separated from each other by two blank characters and are stored in floating-point format (Table 1 and Table 2). An advantage of .pdb files is that their structure can be visualized again in Chimera.
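The reading loop described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `read_heavy_atoms` and `rmsd_no_fit` are hypothetical, and the sketch assumes whitespace-separated rows of the form "element x y z", as in the Table 2 fragment. Rows that do not fit this layout and hydrogen rows are skipped. The second function applies Eq. 2 directly, without any superposition.

```python
import math


def read_heavy_atoms(lines):
    """Collect x, y, z lists from 'element x y z' rows, skipping hydrogens.

    Hypothetical helper: assumes whitespace-separated fields.
    """
    x, y, z = [], [], []
    for line in lines:
        fields = line.split()
        # Skip rows that do not have exactly four fields, and hydrogen atoms.
        if len(fields) != 4 or fields[0].upper() == "H":
            continue
        try:
            cx, cy, cz = (float(value) for value in fields[1:])
        except ValueError:
            continue  # not numeric, e.g. an ellipsis row
        x.append(cx)
        y.append(cy)
        z.append(cz)
    return x, y, z


def rmsd_no_fit(a, b):
    """Eq. 2 applied to two (x, y, z) list triples with paired atoms."""
    n = len(a[0])
    if n == 0 or n != len(b[0]):
        raise ValueError("both structures need the same non-zero atom count")
    total = sum(
        (p - q) ** 2
        for axis_a, axis_b in zip(a, b)
        for p, q in zip(axis_a, axis_b)
    )
    return math.sqrt(total / n)


# Three rows taken from Table 2; the hydrogen row is dropped.
rows = [
    "N -1.347 0.242 -1.290",
    "C 0.058 -0.100 -0.952",
    "H -1.623 1.085 -0.786",
]
x, y, z = read_heavy_atoms(rows)

# A copy shifted by 1 Å along x has an RMSD of about 1 Å before any alignment.
shifted = ([value + 1.0 for value in x], y, z)
print(rmsd_no_fit((x, y, z), shifted))
```

Because no centering or rotation is applied, a pure translation of an identical molecule already yields a non-zero value, which is the limitation that the superposition step is meant to remove.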
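The centering and rotation steps described in this section (find both centroids, translate both structures to the origin, rotate one onto the other with the Kabsch algorithm, then evaluate RMSD) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the program from the paper: the function names are hypothetical, the coordinates are stored as (n, 3) arrays with one atom per row, and the atoms of the two structures are assumed to be paired row by row.

```python
import numpy as np


def centroid(P):
    """Return the centroid of an (n, 3) coordinate array as an [x y z] vector."""
    return P.mean(axis=0)


def kabsch_rotation(P, Q):
    """Rotation R minimizing the RMSD between P @ R and Q.

    P and Q are assumed to be centered at the origin already.
    """
    H = P.T @ Q                      # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def kabsch_rmsd(P, Q):
    """RMSD after translating both sets to the origin and rotating P onto Q."""
    P = P - centroid(P)
    Q = Q - centroid(Q)
    R = kabsch_rotation(P, Q)
    diff = P @ R - Q
    return np.sqrt((diff ** 2).sum() / len(P))


# First five heavy atoms of the Table 2 fragment.
P = np.array([
    [-1.347,  0.242, -1.290],
    [ 0.058, -0.100, -0.952],
    [ 0.541, -1.375, -1.688],
    [-0.407, -2.515, -1.332],
    [-0.004, -3.808, -1.498],
])

# The same atoms rotated by 90 degrees about z and shifted in space.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])

print(kabsch_rmsd(P, Q))  # close to zero: the two structures are identical
```

The determinant check guards against the case where the best orthogonal transformation is a reflection, which would otherwise map a molecule onto its mirror image.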