
Mini-Reviews in Medicinal Chemistry, 2014, 14, 35-55

Recent Advances in Multidimensional QSAR (4D-6D): A Critical Review

Manoj G. Damale1, Sanjay N. Harke1, Firoz A. Kalam Khan2, Devanand B. Shinde3 and Jaiprakash N. Sangshetti*2

1Department of Bioinformatics, MGM’s Institute of Biosciences and Technology, Aurangabad (MS) India-431003; 2Y.B. Chavan College of Pharmacy, Dr. Rafiq Zakaria Campus, Rauza Baugh, Aurangabad (MS) India-431001; 3Department of Chemical Technology, Dr. B.A.M. University, Aurangabad (MS) India-431004

Abstract: The quantitative structure-activity relationship (QSAR) study is one of the most cited and reliable computational techniques. It has been used for decades to relate the physicochemical properties of substituents to biological activity. The concept of QSAR has developed step by step from 0D to 2D. These models suffer from various limitations, which led to the development of 3D-QSAR. A large body of literature is available on the utility of 3D-QSAR. Three-dimensional properties and non-covalent interactions serve as an important tool in selecting the bioactive conformation of compounds. With this view, 3D-QSAR has been explored through different advancements such as CoMFA, CoMSA and CoMMA. Some reports also highlight the limitations of 3D-QSAR. To overcome these limitations, more advanced QSAR approaches such as 4D-, 5D- and 6D-QSAR have evolved. In the present review we focus on the present and future of more predictive QSAR models. The review outlines the basics of 3D- to 6D-QSAR and mainly emphasizes the advantages of each dimension over the others. It covers almost all recent reports on these multidimensional QSAR approaches, which are new paradigms in drug discovery.

Keywords: Biological activity, Molecular descriptors, Multidimensional QSAR, Physicochemical property, QSAR.

*Address correspondence to this author at the Dr. Rafiq Zakaria Campus, Y.B. Chavan College of Pharmacy, Aurangabad-431001 (M.S.), India; Tel/Fax: +91-240-23801129

1875-5607/14 © 2014 Bentham Science Publishers

INTRODUCTION

The history of rational drug design begins with the discovery of leads, either by trial and error or by screening libraries of compounds [1]. Today, quantitative structure-activity relationship (QSAR) analysis is widely used in the high-throughput screening of combinatorial libraries of small molecules. It is also used to estimate the activity of diverse sets of designed small compounds [2]. QSAR is a knowledge-based method in which a statistical model relates biological activity to the molecular descriptors of the compounds. A QSAR study aims to evaluate biological activity with computational methods, mainly to reduce the failure rate in the drug development process [3]. Historically, QSAR studies aimed to predict a specific biological activity for a series of test compounds. Today, their main objective is to predict the biological activity of in silico-designed compounds on the basis of already synthesized compounds [4].

In a QSAR study, molecules are characterized by molecular descriptors. These are mostly calculated from physicochemical properties of the ligand such as logP, pKa, molecular weight, logD, molecular refractive index, molecular surface area and molecular interaction fields. The resulting mathematical model expresses the association between molecular descriptors and biological activity. It is validated internally and externally to assess its predictive power (Fig. 1). These models are interpreted by various methods such as pattern recognition, machine learning and artificial intelligence.

The first structure-activity relationship study was conducted in the late nineteenth century by Crum-Brown and Fraser, who studied different alkaloids. They demonstrated that alkylation of the basic nitrogen of a ring system converts a tertiary amine into a quaternary ammonium compound. The product differs from the parent base and shows a significant change in biological action. Since then, a variety of QSAR studies have been reported to predict the cytotoxic, depressant and antibacterial activities of chemical compounds [5-7]. Thousands of QSAR equations have been formulated to validate and elucidate QSAR hypotheses about the mechanism of action of drugs at the molecular level. They have also given a more complete understanding of physicochemical phenomena such as hydrophobicity. In 1962, Hansch and Muir published their landmark study relating the activity of plant growth regulators to Hammett constants and hydrophobicity [2]. The present review covers all recent developments in the field of QSAR.

SCHEME OF QSAR STUDY

A QSAR study is generally carried out in three steps.
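The internal and external validation mentioned in the introduction (Fig. 1) can be illustrated with a small sketch. It assumes a multiple linear regression (MLR) model built from two descriptors, logP and a Hammett σ. The code fits the model by ordinary least squares and estimates internal predictivity by leave-one-out cross-validation (q²). It then estimates external predictivity (R²pred) on held-out compounds. All numbers are invented toy data, and the function names are hypothetical. A real study would normally rely on an established statistics library.

```python
# Toy QSAR validation sketch (pure Python). Descriptors per compound: [logP, sigma].
# All data are invented for illustration; function names are hypothetical.

def solve(a, b):
    """Solve a x = b (square system) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [b0, b1, ..., bk]."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def r2_against(y_obs, y_pred, ref_mean):
    """1 - (sum of squared prediction errors) / (sum of squares around ref_mean)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - ref_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss


def loo_q2(X, y):
    """Internal validation: refit without each compound, then predict it back."""
    preds = [predict(fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:]), X[i])
             for i in range(len(y))]
    return r2_against(y, preds, sum(y) / len(y))


TRAIN_X = [[1.0, 0.0], [2.0, 0.2], [3.0, -0.1], [1.5, 0.5],
           [2.5, -0.2], [3.5, 0.3], [0.5, 0.1], [4.0, 0.0]]
TRAIN_Y = [1.32, 1.83, 3.03, 1.08, 2.77, 2.93, 0.80, 3.68]
TEST_X = [[2.2, 0.1], [3.2, 0.4], [1.2, -0.3]]
TEST_Y = [2.15, 2.56, 1.84]

COEF = fit_mlr(TRAIN_X, TRAIN_Y)
Q2_LOO = loo_q2(TRAIN_X, TRAIN_Y)
# External R^2_pred uses the training-set mean as the reference value.
R2_PRED = r2_against(TEST_Y, [predict(COEF, row) for row in TEST_X],
                     sum(TRAIN_Y) / len(TRAIN_Y))

print("coefficients:", [round(c, 3) for c in COEF])
print("q2(LOO) = %.3f, R2pred = %.3f" % (Q2_LOO, R2_PRED))
```

The model is refitted inside the leave-one-out loop on purpose. Predicting each compound with a model that has already seen it would overstate q².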

Fig. (1). Schematic overview of the QSAR process.

i. Understanding and Selection of Potential Molecular Descriptors from Set of Biologically Active Conformers

Understanding and selecting potential molecular descriptors from a set of biologically active conformers is the most critical step in QSAR model generation. It clarifies the nature of the descriptors before the actual model is built, which helps to reduce unnecessary error in the study data. The desirable attributes listed in Table 1 are used to select potential molecular descriptors, such as physicochemical, quantum-chemical, geometrical and topological descriptors. Once suitable descriptors are selected, the necessary biological information can be obtained from them. One of the earliest manual approaches to descriptor selection was to inspect a two-dimensional plot of important descriptors of the bioactive conformers. Several methods have been developed since then. The first and most important computational method was cluster analysis, introduced by Hansch, which made it easier to select compounds bearing diverse substituents [7]. The selection of relevant molecular descriptors is covered under the 1D-QSAR model.

Table 1. List of desirable attributes of molecular descriptors for use in QSAR studies.

Sr. No.   Desirable Features Associated with Descriptors
1         Structural interpretation
2         Show good correlation with at least one property
3         Preferably allow for the discrimination of isomers
4         Applicable to local structure
5         Generalizable to "higher" descriptors
6         Independence
7         Simplicity
8         Not to be based on properties
9         Not to be trivially related to other descriptors
10        Allow for efficient construction
11        Use familiar structural concepts
12        Show the correct size dependence
13        Show gradual change with gradual change in structures

1D-QSAR

Various parameters are used to select the potential molecular descriptors that define specific molecular properties of a conformer, namely electronic, hydrophobic and steric constraints.

a. Electronic Constraints

Electronic effects are calculated mainly to understand the inter- and intramolecular interactions that contribute significantly to biological action. The constants commonly used in QSAR equations here are the Hammett constants. These are used together with quantum-chemical indices such as the energy of the lowest unoccupied molecular orbital, the energy of the highest occupied molecular orbital and polarizability. Hammett was the first to study the electronic nature of chemical compounds, using the ionization of benzoic acids in water to determine the energy change (ΔG). He studied substitution at the meta and para positions with electron-withdrawing and electron-donating groups. The analysis of both reaction series showed that a substituent shifts the reaction rate in a direction set by whether it donates or withdraws electrons. From these observations, a change in the electronic nature of a substituent can be meaningfully correlated with the change in activation energy.

As QSAR methodology developed, largely through computational methods, the electronic nature of compounds came to be studied through the wave function, using quantum-chemical descriptors and semi-empirical methods. The quantum-chemical descriptor approach uses quantities such as:

- net atomic charges
- the energies of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO)
- frontier orbital electron densities
- superdelocalizabilities

These quantities are then correlated with biological activity.

b. Hydrophobic Constraints

The hydrophobicity of a compound describes its physicochemical behaviour, mostly in relation to the solvent. The first QSAR study of hydrophobicity was carried out by Hansch on plant growth regulators. This property is studied in many areas, including ligand-receptor interactions and solvent interactions such as detergency, coagulation and membrane permeation. The hydrophobicity of a solute is usually quantified by its partition coefficient P. P is the ratio of the solute concentration in a non-polar phase (typically n-octanol) to that in a polar phase (water). The shake-flask method is commonly used to measure P, which is expressed as logP and typically ranges from -3 to 6. Manual measurements of logP are inconsistent, so they have largely been replaced by calculated values such as ClogP.

c. Steric Constraints

A steric effect arises because a molecule occupies a certain volume of space, owing to its charge, shape or size. Steric properties are important for studying the transport of chemical moieties across biomembranes. R.W. Taft was the first to study the steric effect of substituents and expressed it as the parameter Es. Molar refraction (MR) and molecular volume capture steric effects more effectively. Molar refraction is derived from the refractive index and measures the overall bulk of a compound. However, several studies have shown that molar refraction cannot distinguish between some shapes of alkyl substituents, so it has been replaced by the STERIMOL parameters. These parameters describe the dimensions of a substituent along several fixed axes, such as the length of the substituent or the bond angle between substituents. The three-dimensional steric properties of both the substituent and the parent compound can be studied through parameters such as bond length and bond order. Including the length of the substituent and the bond length between parent and substituent allows the compound to be described three-dimensionally. Molecular volume describes bulkiness, which affects the transport of a moiety across the cellular membrane [2].

ii. Analysis of Potential Molecular Descriptors in the Context of Analysis of Activity

Biological activity was originally correlated with physicochemical properties manually, by fitting a linear relationship between them [8]. The effects of Hammett and Taft constants on biological activity were studied in this way. Equations are generated so that the descriptors correlate significantly with activity. The number of descriptors is kept appropriately small relative to the size of the dataset. Once a set of molecular descriptors has been selected, its most informative subsets can be used for the study.

iii. Mapping the Specified Value Obtained for Each Descriptor and Feeding as Independent Variable to Correlate With Its Biological Activity

The selected descriptors are mapped onto activity using linear or nonlinear mapping techniques. The quantified value of each descriptor is treated as an independent variable, and activity is expressed as a function of these values. The mapping methods usually use information from a training set to obtain the optimal function [7].

2D-QSAR

2D-QSAR is based mostly on the molecular fragments that make up a compound. The descriptors used include constitutional, topological, topological polar surface area, electrostatic, quantum-chemical, geometrical and molecular fingerprint descriptors [9, 10].

a. Constitutional Descriptors

Constitutional descriptors reflect the composition of a compound without regard to its connectivity or geometry. They include, among many others [7]:

- molecular weight
- the numbers of hydrogen, carbon, oxygen and nitrogen atoms
- the number of ring systems
- the numbers of single, double, triple and aromatic bonds

b. Topological Descriptors

Topological descriptors describe how the atoms of a compound are arranged. They encode information such as the orientation of internal bonds, molecular size, shape, branching and the presence of heteroatoms [7]. The topology of a compound is expressed as a 2D graph of nodes and edges. Several indices express the molecular connectivity of a compound; they are categorized as topochemical and topostructural. They include the Wiener index, the Randić index, the Balaban J index, the Schultz index, the Kier and Hall indices, the Galvez topological charge indices, the BCUT eigenvalue indices and the E-state indices [7, 11]. With atoms as nodes and bonds as edges, these indices describe:

- overall connectivity
- average distance
- average valence
- net charge transfer
- bond polarizability
- electronic and topological organization

Topological indices are also very helpful for structure and substructure searching in chemical data mining. There, structural similarity and diversity across a set of structures are measured using precise computational algorithms [12].

c. Topological Polar Surface Area

The topological polar surface area (TPSA) is one of the most convenient descriptors for predicting the ADME properties of compounds. TPSA can also measure the relative propensity of ligand molecules for polar interactions with a specific receptor, although it is less commonly exploited in this role. Methods such as multiple linear regression have been used to study TPSA. The regression coefficients reveal which polar fragments of a compound favour or disfavour biological activity [13].

d. Quantum Chemical Descriptors

Quantum chemistry methods, combined with recently developed efficient computational algorithms, are used to perform quantum mechanical calculations. These calculations focus mainly on the electronic and geometrical properties of molecules, which is why electrostatic and quantum-chemical descriptors are usually studied together. The quantum-chemical descriptors most used in QSAR studies include:

- atomic charges
- molecular orbital energies
- frontier orbital densities
- atom-atom polarizabilities
- molecular polarizability
- dipole moment and polarity indices
- total energy

Quantum-chemical methods are fast and accurate. They usually neglect solvent effects, but their results are still considered valid and can be applied directly in correlations with biological activity [7].

e. Geometrical Descriptors/Molecular Fingerprint Descriptors

Molecular fingerprint descriptors represent the molecular properties of a compound. Fingerprints can be calculated in two ways: from a fragment dictionary or by hashing (binary bit string). In the hashed method, each bit represents a pattern characteristic of the molecule, such as a structural fragment, its connectivity or a pharmacophoric feature, and is encoded in binary form. Molecular fingerprints are mostly used to search for molecules "similar" to a query molecule.

For example, the Daylight Chemical Information Systems fingerprint is based on a path-based subgraph search. The algorithm enumerates every unique connection path (subgraph) in a molecule, up to paths of eight atoms, and hashes each path into a large bit string [7].

In 1982, Molecular Design Limited (MDL) started a program for storing and retrieving chemical reaction information. MDL keys are a compact, predefined set of structural patterns, and the resulting fingerprint records which of these patterns match a molecule. The predefined keys were derived from a large dataset of compounds in the MDL databank (320 keys from their 966 sets).

Barnard Chemical Information Systems (BCI) offers another dictionary-based fingerprint, which first defines the fingerprint and then encodes features such as the presence or absence of particular molecular fragments. Typically, BCI generates a number of key fingerprints corresponding to molecular features such as nearest-neighbour atom pairs, bonded atom sequences and specific ring fragments [14].

1D vs. 2D-QSAR

1D-QSAR is the "classical" form of QSAR. It focuses mostly on macroscopic properties of compounds, represented in a one-dimensional linear form such as the molecular formula [15]. Predefined variables are calculated by hand or with a pocket calculator, following the classical approach of Hansch. A number of additive properties and indicator variables are used to construct the QSAR model [2, 16].

2D-QSAR is superior to the classical approach [17]. Topologically encoded information is sufficient to construct the model, and only a limited number of additive and indicator variables is used [7]. The encoded information about molecular architecture is easy and fast to calculate [12]. 2D-QSAR is a knowledge-based approach that requires some constitutional information about the compound when the model is constructed [18]. Linear and nonlinear methods such as multiple linear regression, genetic algorithms (GA), GA-PLS and PLS are used to build the QSAR model [19]. Stereochemical information is not required, because the model is built on 2D properties of the compounds. It gives a better correlation coefficient than actual QSAR model prediction.

The Organization for Economic Co-operation and Development (OECD) has set out five principles for QSAR validation. 2D-QSAR satisfies all of them except mechanistic interpretation. 2D-QSAR also lacks interpretability and recognition ability when searching for active compounds, and it has a limitation in the prediction of … of a training dataset selected in the study [20].

3D-QSAR

Three-dimensional QSAR considers all properties of the atoms in a compound that can be represented as descriptors; these correspond mainly to the spatial representation of the molecule [21]. In the early 1980s, a novel structure-activity approach was put forward in which the molecular properties of compounds are studied in a 3D grid box. These properties are then correlated with biological activity using a technique called DYLOMMS (dynamic lattice-oriented molecular modeling system) [22]. The properties are intrinsic, arise from the molecular framework, and are mainly geometrical, electrostatic and quantum-chemical in nature. They were first studied by Hansch et al. using a multilinear regression equation.

Biological activity is best predicted from detailed information about both the receptor and the ligand. In many cases, however, the 3D structure of the receptor is not known, and the indirect approach to 3D-QSAR is followed instead. The indirect approach uses information about the ligands, such as the molecular alignment of atoms, pharmacophores, volumes or fields, to generate a virtual receptor [23].

A. 3D-QSAR DESCRIPTORS

The 3D-QSAR approach is an advanced, well-established and widely exploited technique for structure-activity relationship studies. It predicts biological activity from the 3D properties of lead compounds using a series of linear and nonlinear predictive methods. The 3D-QSAR approach is simpler than the traditional structure-activity relationship study. The major objective of a 3D-QSAR study is to improve the activity of a lead compound through optimization and structural modification [24].

The first step in a 3D-QSAR study is to collect or design representative starting 3D structures of the ligands and then refine them on the basis of geometry and energy [7]. The methods most used for energy optimization are semi-empirical, molecular mechanics and quantum mechanics methods [10]. Next, a database of conformers of the lead molecules is created. Because these molecules are flexible, they exist in multiple conformations, and these multiple forms are included in the QSAR study. The specific bioactive conformers are then searched for among them and selected for the study. This selection uses knowledge-based methods, experimental or theoretical, that measure binding affinity towards the receptor. The selected bioactive compounds are then aligned uniformly with a computational tool.

Once all the conformers are aligned, 3D properties of each molecule, such as steric and electrostatic properties, are calculated by placing a probe at different points of a lattice. At each probe point, the molecular structure is characterized by a set of numbers called descriptors [24], which mostly represent physicochemical and biological properties of the molecule. Descriptors can be calculated with (alignment-dependent) or without (alignment-independent) alignment of the bioactive conformers [7].

i. Alignment Dependent Descriptor Methods

Several methods align the molecules before calculating 3D descriptors. These methods calculate descriptors by mapping receptor atoms, ligand atoms or receptor-ligand complexes. Alignment-dependent descriptor methods include [7, 25, 26]:

- Comparative Molecular Field Analysis (CoMFA)
- Comparative Molecular Similarity Indices Analysis (CoMSIA)
- Genetically Evolved Receptor Modeling (GERM)
- Comparative Binding Energy Analysis (CoMBINE)
- Adaptation of the Fields for Molecular Comparison (AFMoC)
- Hint Interaction Field Analysis (HIFA)
- Comparative Residue Interaction Analysis (CoRIA)

a. Comparative Molecular Field Analysis (CoMFA)

Comparative Molecular Field Analysis (CoMFA) is a molecular field-based method for building quantitative structure-activity relationships, developed by Cramer in 1988 [4]. It is more selective than the traditional classical QSAR methods. The first CoMFA study was carried out on steroids [27]. CoMFA focuses on ligand properties (steric and electrostatic) and on favourable and unfavourable ligand-receptor interactions.

As noted above, CoMFA is an alignment-dependent descriptor method. All aligned ligands are placed in an energy grid, and a probe placed at each lattice point is used to calculate the interaction energy. The energy at each grid point has electrostatic (Coulombic) and steric (van der Waals) components, and these values serve as descriptors for further analysis. They are then correlated with biological activity using a linear regression method such as partial least squares (PLS). The PLS results identify favourable and unfavourable electrostatic and steric regions and relate them to biological activity [24].

b. Comparative Molecular Similarity Indices Analysis (CoMSIA)

Comparative Molecular Similarity Indices Analysis (CoMSIA) is a more recent modification of CoMFA. The two approaches are similar, except that CoMSIA additionally calculates molecular similarity [22]. CoMFA depends heavily on molecular alignment. This can introduce errors through alignment sensitivity and complicate the interpretation of electrostatic and steric potentials. CoMSIA therefore uses Gaussian-type potentials, which are much "softer" than the CoMFA functions.

A regular grid box is constructed and similar probes are placed throughout the grid lattice. A solvent-dependent molecular entropic term that describes hydrophobicity is also included in the study. To analyze the properties of the dataset atoms, a common probe is placed and the similarity at each grid point is calculated. This is done mostly for steric, electrostatic, hydrophobic and hydrogen-bonding properties. All of these properties are calculated at regularly spaced grid points, each corresponding to a particular descriptor, and they are important in the correlation with biological activity [24].

c. Genetically Evolved Receptor Modeling (GERM)

Genetically Evolved Receptor Modeling (GERM) is a theoretical knowledge-based method. When no experimental structure from X-ray crystallography or NMR spectroscopy is available, a 3D structure of the receptor active site is constructed by homology modeling [28]. As an initial step in GERM 3D-QSAR, a reasonable structure-activity series is selected and its bioactive conformers are aligned. The aligned conformers are enclosed in the receptor active site and surrounded by a shell of atoms. The atoms of this shell are assigned explicit atom types (aliphatic H, aliphatic C, polar H) that match those found in the receptor active site.

The shell is first built from aliphatic carbons as a uniform sphere around the aligned training set. The positions of these model carbon atoms relative to the aligned ligands are then adjusted to maximize van der Waals interactions. Once identified, each position can be occupied by another atom type or left empty. Because every position can take any atom type, or none, a very large number of possible models can be generated. To deal with this problem, and to generate possible ligand conformers in the receptor active site, a genetic algorithm is used together with a suitable docking program. The intermolecular interaction between ligand and receptor is calculated with a force field such as CHARMM, which mainly computes electrostatic and van der Waals interactions. The calculated binding energies are then correlated with biological activity [29]. The GERM technique is well suited to de novo drug design, where the approach described above is followed [30].

d. Comparative Binding Energy Analysis (COMBINE)

COMBINE is an empirical knowledge-based method in which experimentally determined receptor-ligand complexes are used to derive molecular properties that are correlated with biological activity [31, 32]. As the name suggests, the technique focuses on the binding free energy of the ligands, which is calculated with a molecular mechanics force field. Each complex in the series is analyzed for intermolecular interactions, such as electrostatic and van der Waals interactions, per residue of the active site.

In an alternative approach, the ligands are divided into similar fragments, a non-essential dummy fragment is incorporated into each ligand, and the intermolecular interactions are calculated. The energy is calculated for every pair of receptor active-site residue and ligand atom using a distance-based method. Significant descriptors are retained and the others are eliminated. A statistical technique such as PLS is then used to build a QSAR model that quantifies which interaction energies matter most for predicting activity [24].

e. Adaptation of the Fields for Molecular Comparison (AFMoC)

AFMoC is a recently developed QSAR technique put forward by Klebe et al. [4]. Also called "inverted CoMFA", it is derived from a knowledge-based potential scoring function (DrugScore). Its methodology is similar to that of CoMFA and CoMSIA, with the additional advantage that the protein environment is included in the study. Protein-specific potential fields are generated in the binding site and used to predict binding affinity [24, 33]. The AFMoC methodology consists of three steps.

i. Potential Field Calculation and Ligand Alignment

DrugScore is a recently developed knowledge-based scoring function based on distance-dependent pair potentials. Atom-by-atom pairwise potentials are calculated between the ligand and protein environments, using a suitable probe at each point of a grid constructed around the binding pocket.

ii. Interaction Field Calculations

The potential field map generated for the ligand-receptor complexes is used as an interaction field. From it, atom-type-specific, distance-dependent interactions (3D Gaussian functions) are calculated for each ligand atom at each grid point.

iii. Making Correlation between Interaction Field Value Calculations and Binding Affinity Prediction

The theoretically calculated interaction field values are correlated with the experimentally determined binding affinities of the ligands using PLS analysis [24, 34].

f. Hint Interaction Field Analysis (HIFA)

HIFA is a newly developed program, an extension of CoMFA, that calculates empirical hydrophobic interactions. Adding a hydrophobicity calculation to CoMFA has increased its predictive power for QSAR models. HIFA calculates key hydrophobic features, which are atom-based analogues of the fragment constant, and uses already-published data to predict hydrophobic field interactions. As in CoMFA, the ligands are aligned and placed in a grid, the hydrophobic field interactions are calculated, and the net sum of hydrophobic interactions is interpreted.

g. Comparative Residue Interaction Analysis (CoRIA)

Comparative Residue Interaction Analysis (CoRIA) is a recent development in 3D-QSAR. It has several advanced variants, such as reverse-CoRIA (rCoRIA) and mixed-CoRIA (mCoRIA) [35]. A CoRIA study calculates and analyzes the receptor-ligand complex and then predicts its binding affinity. Non-bonded binding energies, such as van der Waals and Coulombic interactions, are calculated; they describe the thermodynamic events involved in ligand binding to the receptor. These interactions are correlated with biological activity using the G/PLS method. G/PLS is an extension of PLS that additionally includes variables such as lipophilicity, molar refractivity, surface area, molecular volume, Jurs descriptors and strain energy [36-38].

ii. Alignment-Independent Descriptor Methods

Conventional alignment-based methods have several limitations: they are time-consuming, can introduce user bias and may affect the sensitivity of the resulting model. To overcome these limitations, a class of methods has been adopted that needs no alignment and whose descriptors are unaffected by rotation or translation of the molecule. This category includes:

- Comparative Molecular Moment Analysis (CoMMA)
- COMPASS
- Hologram QSAR (HQSAR)
- Weighted Holistic Invariant Molecular descriptors (WHIM)
- Comparative Spectral Analysis (CoSA)
- Grid-Independent Descriptors (GRIND)

a. Comparative Molecular Moment Analysis (CoMMA)

CoMMA is an alignment-independent descriptor method. It focuses on the 3D spatial arrangement of molecular fragments and calculates molecular moments with respect to the centre of mass, the centre of dipole and the centre of charge. CoMMA descriptors are of three orders:

- zero-order: molecular weight and the moments of inertia about the centre of mass;
- first-order: the magnitude of the dipole moment (μ);
- second-order: the quadrupole moment (Q) with respect to molecular charge.

The components of the dipole moment along the principal inertial axes (x, y, z) are also calculated, together with the displacement d between the centre of mass and the centres of dipole and charge. All of these quantities are expressed in the principal inertial reference frame, so the descriptors do not depend on the molecule's initial orientation. Each value obtained in this frame for the centre of mass, centre of dipole and centre of charge is used as a descriptor. Finally, the descriptors are correlated with biological activity by PLS analysis [7, 39].

b. COMPASS

QSAR often uses nonlinear correlation methods because they usually predict activity more accurately than linear methods. COMPASS is based on an artificial neural network and focuses on features such as the molecular surface and conformation selection for alignment. The selection of

It is mostly used to represent 3D descriptor properties in the form of indices, which contain information about ligand size, shape, symmetry and atom distribution. These indices are calculated from the Cartesian coordinates (x, y, z) of the energy-minimized ligand structure. A weighting scheme is applied to these coordinates so that the indices are described within a unitary conceptual framework. The indices are then used to search for a suitable QSAR model. G-WHIM and MS-WHIM are modifications of the WHIM method. G-WHIM (grid-weighted holistic invariant molecular) descriptors are calculated on a grid: the coordinates of each grid point are set and the probe interaction energy is calculated at each point. MS-WHIM (molecular surface weighted holistic invariant molecular) indices are theoretical descriptors that capture information such as the size, shape and electrostatic distribution of a molecule [4, 7].
proper ligand shape is more significant because it mainly e. Comparative Spectral Analysis (CoSA) flexible in nature and may adopt many conformations with slight change in geometry. The alignment of conformation is Comparative Spectral Analysis (CoSA) has been done and corresponding similar part is found. As said above recently developed and not yet explored fluently except the conformation will be aligned in many ways and which few applications. In this technique molecular spectroscopy help in predicting biological activity. There is automatic methods have been used for determination of three dimensional selection of bioactive conformation and creating a model of molecular descriptors of chemical compounds in 3D-QSAR alignment for each conformer. study. The molecular spectra used to predict biological activity of three dimensional structures. The spectroscopic method The algorithm on which compass work is divided into mostly includes are Proton (H)-NMR, carbon C13NMR, IR and three phases, the initial phase is called as alignment of pose . The data generated through spectroscopic where alignment of different conformation is done to find studies are converted into matrices values with the help of bioactive one. Second phase is called as bioactive model appropriate tool and then correlated with biological activity construction; this phase is focusing on facts like selection of by using PLS analysis. The comparative study of CoSA and bioactive conformation and constructing statistical model CoMFA has carried out to evaluate predicative nature of which uses properties of molecular features. The quantitative CoSA and fortunately it gives better correlation values as relationship is established between molecular surface properties compared to CoMFA studies [43]. and biological activity. The third phase is called as molecular display, the relationship between molecular properties and f. 
Grid Independent Descriptors (GRIND) biological activity displayed and which will help in molecular Grid Independent Descriptors (GRIND) is first method modeling [39-41]. developed as an alternative for method like CoMFA. The c. Holo-QSAR (H-QSAR) basic approach of GRIND and CoMFA are same. Both methods are grid based methods where probe is placed at It is recently developed QSAR technique by Heritage and particular gird lattice around the three dimensional structure Lowis (1997), which mostly focus on molecular fragments, of macromolecule complex. The non bonded interactions are exploring the chemical and biological data of chemical also calculated like electrostatic, steric and van deer Waals. compounds. Each molecule is broken into the specified This technique has several advantages as it uses “DRY Probe” unique form and there after these fragments are used to form to calculate HBD, HBA and Hydrophobic interaction. a Holo-gram. The molecular fingerprints have been defined Additionally it utilizes most softer/smoother potential in a pattern of fragments that binds in predefined order of function method to calculate Van der Waals interaction. The array. These fingerprints are used to define nature and type values of each potential probing point can be correlated with of molecular fragments. The three dimensional property of biological activities by using multilinear regression like PLS the molecule are used to define hybridization and chirality. analysis [44, 7]. Each corresponding molecular fragment has its peculiar B. STATISTICAL APPROACHES FOR SELECTION physicochemical properties and that will be correlated with OF RELEVANT MOLECULAR DESCRIPTORS corresponding biological activities by using PLS analysis in to order construct H-QSAR model [42]. Structure activity relationship study concepts mainly d. 
Weighted Holistic Invariant Molecular Descriptors focus on various chemical characteristics of chemical data and from that it mostly focuses on the retrieval of a desired (WHIM) set of information. To do this, wild spectrum of data analysis Weighted Holistic Invariant Molecular Descriptors method has been used, which works on various statistical (WHIM) is recently developed and slightly different approach correlations. These data analysis methodologies are mostly of 3D-QSAR technique as compared to conventional approach. meant for recovery of primary and secondary information. 42 Mini-Reviews in Medicinal Chemistry, 2014, Vol. 14, No. 1 Damale et al.
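To make the idea of correlation-based descriptor selection concrete, the sketch below screens each descriptor by its Pearson correlation coefficient with activity and keeps those above a cut-off. This univariate filter is an illustrative assumption, not a procedure described by the authors; the function names (`pearson_r`, `screen_descriptors`), the descriptor names, the data and the 0.5 threshold are all hypothetical. Rows of `X` are compounds, columns are descriptors, and `y` holds the activities.

```python
# Hypothetical sketch: univariate, correlation-based descriptor screening.
# X has one row per compound and one column per descriptor; y holds the
# measured activities. Names and data are invented for illustration only.
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column (or constant activity) carries no linear signal.
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_descriptors(
    X: List[List[float]],
    y: List[float],
    names: List[str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep descriptors with |r| >= threshold, strongest correlation first."""
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        r = pearson_r(column, y)
        if abs(r) >= threshold:
            kept.append((name, r))
    # sorted() is stable, so ties keep their original column order.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    names = ["logP", "MR", "noise"]  # hypothetical descriptor names
    X = [
        [1.0, 5.0, 1.0],
        [2.0, 4.0, 3.0],
        [3.0, 3.0, 2.0],
        [4.0, 2.0, 3.0],
        [5.0, 1.0, 1.0],
    ]
    y = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical activities
    print(screen_descriptors(X, y, names))  # prints [('logP', 1.0), ('MR', -1.0)]
```

Because it scores each descriptor in isolation, such a filter ignores correlations among the descriptors themselves; that is one reason multivariate methods such as PLS, GA-PLS or GFA are normally preferred for building the final model.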

In a correlation study, the variables concerned are the dependent variable, i.e. activity (y-variable), and the independent variables, i.e. molecular descriptors (x-variables) [24].

i. Linear Regression Analysis

Linear regression is the earliest type of regression analysis and is widely applied as a practical predictive method for describing the relationship between a dependent and an independent variable (y and x). Simple linear regression can be expressed by the following equation:

$$y = a + bx$$

where a is the intercept, b is the regression coefficient, x is the molecular descriptor (also called the explanatory variable; when more than one descriptor is used, the model becomes a multiple linear regression) and y is the dependent variable, which in a QSAR study corresponds to activity. The observed values of x and y are fitted to this equation, and the resulting linear regression can be used for predictive analysis [24].

ii. Multiple Linear Regression (MLR)

Whereas simple linear regression describes the relationship between two variables, MLR is used to determine the quantitative relationship between activity and several descriptors. It is also referred to as the linear free energy relationship method and is an extension of simple linear regression [45]. In simple linear regression, the relationship between x and y is two-dimensional, and the aim is to find the values of a and b that give the best prediction of y from x. Several methods are available to assess the resulting correlation, such as Student's t-test, the standard deviation and the multiple correlation coefficient, or independent validation methods such as leave-one-out cross-validation. In MLR, the simple linear regression equation is extended by adding terms for further descriptors:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_m x_m + e$$

where $b_0$ is the intercept, $b_1, \ldots, b_m$ are the estimated regression coefficients of the descriptors $x_1, \ldots, x_m$, and $e$ is the residual error, which quantifies the deviation from the linear relationship and is minimized when the coefficients are fitted. The significance of the correlation is judged by calculating the correlation coefficient and cross-validating the model [24].

iii. Multivariate Data Analysis

The chemical data used in QSAR analysis are usually multidimensional, in that the features of a chemical compound are defined by many data components [46]. Multivariate techniques are used specifically to reduce the number of components in the data; they include principal component analysis (PCA), partial least squares (PLS), genetic algorithms (GA) and genetic algorithm-partial least squares (GA-PLS). For statistical analysis, the data are represented as a matrix whose rows and columns represent the chemical compounds and their properties and features [24].

iv. Partial Least Squares (PLS)

PLS is an improved QSAR modeling technique introduced by Herman Wold and Svante Wold, and it is the most commonly used technique in QSAR model analysis. It yields more useful QSAR models because it can handle more realistic and complex SAR data for biological activity. It is also called projection to latent structures. In PLS, a large number of descriptors is transformed into a small number of new orthogonal terms, called latent variables, which are then used to model the dependent variable. The main aim of the technique is to relate two matrices (descriptors and properties). SIMPLS is the most commonly used algorithm in the PLS technique, and PLS is mostly used as the predictive model in 3D-QSAR.

v. Genetic Function Approximation (GFA)

Genetic function approximation (GFA) is a computational algorithm derived from the G/SPLINES algorithm developed by Rogers [46]. Its advantages over conventional QSAR modeling are that it builds models with higher predictivity and can address problems that standard regression methods cannot solve. It can also generate multiple models and select multiple features for model construction. It includes higher-order polynomial and spline functions, with the main focus on creating many nonlinear models. The initial models are evolved by a process modeled on natural selection, and an automatic method is then applied to remove outliers. GFA uses the lack-of-fit (LOF) score to measure the error of each QSAR model. Each model is routinely evaluated for fitness and for the number of features required to construct an accurate model, and GFA also provides checks for outliers and over-fitted data [47].

vi. Pattern Recognition

Pattern recognition is mainly focused on the classification of data [24]. The basic difference between the classical QSAR approach and pattern recognition is that the latter uses a large number of variables and studies the relationships between them, for example how diverse or how close the data points are. The most commonly used pattern recognition methods are cluster analysis, artificial neural networks (ANN) and k-nearest neighbor (k-NN) [24, 48, 49].

vii. Cluster Analysis

Cluster analysis is the basic method for classifying data; it is also called statistical pattern recognition or the distance-based approach, because the data are grouped according to their proximity to one another. Four key steps are involved: generating the key features of each compound in the data set, calculating the similarity or diversity within the data set, clustering the data with a clustering program, and finally selecting one representative from each cluster. The similarity between the data within each class is calculated using a similarity coefficient, and each representative compound is encoded using a binary set of descriptors. Most clustering techniques are non-overlapping, and these methods are divided into two classes: hierarchical and non-hierarchical [50].

viii. Artificial Neural Networks (ANN)

Artificial neural networks (ANN), associated with the work of Teuvo Kohonen, are widely used in QSAR because they are considered transparent and easy to interpret. An ANN is a data analysis and data processing technique that mimics the function of the biological nervous system. Its architecture resembles that of the nervous system: artificial neurons are connected to one another, each neuron is a processing unit, and the neurons work in parallel, which is why ANNs are also used to design parallel computational systems. The network consists of a series of layers of nodes connected by edges. The first layer is the input layer, which receives the input supplied by the user (in QSAR, mostly independent variables such as properties of molecular fragments); the second is the middle, or hidden, layer; and the last is the output layer [51].

ix. k-Nearest Neighbor (k-NN)

k-Nearest Neighbor (k-NN) is one of the simplest and most widely used pattern recognition methods. Here k is a small positive integer, and the objective of the method is to classify an object (i.e. a chemical compound). k-NN uses the Euclidean distance metric to identify data in close proximity: the distance, which serves as a similarity index based on molecular descriptors, is calculated between the compounds in the data set, i.e. between the training compounds and the unknown compound [51], and the unknown compound is classified by a majority vote of its k nearest neighbors. These calculations are also used to determine the value of k. The technique is cross-validated, and the model with the highest leave-one-out (LOO) cross-validated correlation coefficient is selected on the basis of r² and q² [52].

C. LIMITATIONS OF 3D-QSAR

Although 3D-QSAR is a newer approach than 2D-QSAR, it still has drawbacks. For example, very large data sets (thousands of chemical structures) cannot be handled by this approach, and high-throughput virtual screening methods are used instead [53]. Identifying the active conformation of the flexible compounds in a study set is critical, followed by specifying the molecular alignment used to construct the 3D-QSAR model [54]. Intermolecular interactions with the receptor also need to be studied, because different types of interaction take place at different sites; these intermolecular interactions between the molecules and the receptor are called the interaction pharmacophore. Topological descriptors used in 3D-QSAR work on the same connectivity principle as in 2D-QSAR. The 3D orientation of the ligands also needs to be studied, as it may affect the selection of the correct bioactive conformation [55].

2D vs. 3D-QSAR

The primary goal of 3D-QSAR is to establish the relationship between biological activity and the spatial properties of chemical compounds, such as steric, electrostatic and lipophilic properties. 3D-QSAR is mostly applied to a series of chemical compounds to identify their pharmacophoric features, by analyzing and optimizing the features that can increase biological activity [24]. 3D-QSAR studies focus on how changes in the 3D structural features of a chemical moiety result in changes in biological activity. They provide information about the structure–activity relationship in graphical form for easy understanding, which makes 3D-QSAR the most attractive method [22]. Structurally diverse sets of compounds are also more easily studied than with the traditional 2D-QSAR approach [17]. The prime difference between 2D- and 3D-QSAR lies in the use of statistically more robust methods for selecting molecular descriptors, such as simple linear regression, multiple linear regression (MLR), principal component analysis (PCA), principal component regression (PCR), PLS analysis, GFA, cluster analysis, artificial neural networks and the k-nearest neighbor method, which help in the quantitative prediction of a diverse set of 3D properties of chemical compounds [19].

0D-3D QSAR

The simplest way to represent a chemical compound is by its molecular formula, i.e. its constitutional properties. Pérez-Garrido et al. [89] developed a multiple linear regression QSAR model for β-cyclodextrin (CD). The aim of the study was to establish a correlation between the CD binding constant and substituent changes in CD structure. The software package DRAGON was used to compute 0D to 3D descriptors: 1600 molecular descriptors were calculated for 233 chemical compounds and divided into 0D, 1D, 2D and 3D classes. This separation was performed with k-means cluster analysis. The relevant descriptors (independent variables) were selected by a genetic algorithm, and the resulting QSAR models satisfied internal predictive accuracy in cross-validation. The models were further validated on external data using a genetic simulation approach, which supports the statistical prediction of external data. Considerable variation was observed among many of the 0D, 1D and 2D models, but there was a good correlation between the CD binding constant and substituent changes in the case of hydrophobic and steric (3D) descriptors.

4D-QSAR

Recent studies have suggested that 3D-QSAR techniques such as CoMFA share the same limitations in predictive quality. To overcome them, the description or representation of the molecules, the alignment of the compounds and the statistics used in activity prediction need to be improved [57-58]. A new QSAR approach, 4D-QSAR, has therefore evolved, in which the modeling statistics are improved to give better predictive quality.

Recent advances in QSAR have produced two approaches, namely receptor-independent (RI-QSAR) and receptor-dependent (RD-QSAR) [59]. These approaches involve a number of steps, such as the generation of multiple conformations, alignment, and the consideration of multiple substructure groups [60]. Determining the free energy of binding is very important for receptor-ligand interaction, since it mainly corresponds to the loss of free energy, which also involves the exclusion of some solvent molecules from the active site of the receptor. These factors contribute significantly to the binding affinity between receptor and ligand [61]. All of these events in the receptor active site lead to a change in topology, a phenomenon commonly called induced fit. The topological change also alters features such as hydrophobic, hydrophilic, electrostatic, dielectric or steric properties and solvent accessibility [62]. Induced fit and the flexibility of the receptor binding pocket in response to an individual ligand topology are studied intensively in multidimensional QSAR, and the QSAR approach quantifies all of these multidimensional properties in order to predict the QSAR model adequately.

The fourth dimension of QSAR analysis is also called "ensemble sampling" [55]. This new dimension of quantitative structure–activity relationship studies mainly addresses some of the problems raised in 3D-QSAR. 4D-QSAR is an extension of molecular shape analysis (MSA) introduced by Hopfinger et al. [62, 63], who found that the 4D-QSAR model was more robust and more predictive than conventional 3D-QSAR approaches such as CoMFA. The main focus of this approach is to screen libraries of bioactive conformations, to examine their structure–activity relationships and to relate them to biological activity. The principle behind 4D-QSAR is that of structure-based drug design (SBDD), in which issues such as ligand conformational flexibility and the exploitation of multiple alignments have been addressed [55]. Several parameters are used in 4D-QSAR analysis, such as the grid cell size (s), the molecular dynamics simulation of the reference molecules (R), the temperature (T), the size of the initial ensemble sample (Es), the number of alignments (Na) and the number of descriptors in the initial basis set (Nd) [55, 62, 64-67]. As discussed above, there are two main approaches.

A. RECEPTOR INDEPENDENT 4D-QSAR

Receptor-independent 4D-QSAR has a significant impact on rational drug design. RI 4D-QSAR is mostly applied when the researcher wants either to find the pharmacophoric features of the ligand molecules or to evaluate projected changes in ligand structure [67]. The ultimate aim of RI 4D-QSAR is to obtain maximum structural information from the structure–activity relationship study. The advantages of RI over RD 4D-QSAR are that it can design and construct pharmacophoric features for a limited number of substituents, design and map a rational basis for substituent placement on a scaffold, and provide a pharmacophore model that can be used as an initial filter in virtual screening. RI 4D-QSAR has been implemented successfully in studies of TMPKmt inhibitors and isoniazid derivatives [67]. There are ten principal steps involved in RI 4D-QSAR.

i. Initiation of Reference Grid for 3D Models of Training Set

This is the foundation step in RI 4D-QSAR, analogous to CoMFA in 3D-QSAR, in which a reference grid box is specified around the 3D structures of the training set. It is one of the important parameters of an RI 4D-QSAR study. In 4D-QSAR, the initial 3D structures are the starting point for the conformational ensemble sampling of the training set. Training set conformations with minimum free energy and common torsion angles are selected; these also provide reference points for the 4D-QSAR analysis.

ii. Selection of Interaction Pharmacophore Elements (IPE)

Each atom of each molecule is classified into one of five to six classes, such as all atoms of the molecule (IPE)a, polar atoms (IPE)p, nonpolar atoms (IPE)n, hydrogen bond donors (IPE)hbd, hydrogen bond acceptors (IPE)hba, user-defined IPE types (IPE)x, aromatic carbon and hydrogen. This classification helps to analyze and understand the different interactions involved at each pharmacophoric site.

iii. Creation of Conformational Ensemble Profile (CEP)

Conformational ensemble sampling is performed on the training data set to help identify the active conformation. Molecular dynamics simulation (MDS) is an advanced approach routinely used to create an ensemble for each training set molecule, and the CEP uses Boltzmann sampling techniques. The objective of this step is achieved with a systematic or a stochastic conformational search technique: a large number of conformers are explored and the correct conformational state is selected. As mentioned above, Boltzmann sampling is commonly used because of the following advantages:
1) It is independent of sample size.
2) Different starting states lead to the same sample distribution.
3) Different sampling schemes lead to the same optimized three-dimensional structures.
4) The average rate of change in energy with change in state is almost zero.

iv. Selection of the Trial Alignment

Molecular alignment of the training data set is a major problem in 4D-QSAR and can be solved by rapidly evaluating trial alignments. The rapid evaluation of a trial alignment is performed by searching and sampling operations analogous to those used for the CEP. In general, this is achieved by RI 4D-QSAR algorithms that support alignment analysis by decoupling the conformational analysis, followed by rapid analysis of the conformations to investigate the effect of molecular alignment on the molecular descriptors. The CEP of each compound in the training data set is evaluated for molecular alignment, which largely determines the significance of the 4D-QSAR model. The molecular alignment places a unique model of every compound in the grid box, resulting in an occupancy distribution for the given CEP.

v. Construction of Grid Cell Occupancy Profile (GCOP) and Calculation of Grid Cell Occupancy Descriptors (GCOD)

Each conformation is placed in the cubic reference grid lattice, with the cell spacing set according to the trial alignment. The GCOP of each compound is calculated on the basis of the five to six classes of interaction pharmacophore elements, and these IPEs are used in the trial alignment for the 4D-QSAR descriptors. The occupancy of each grid cell is recorded after aligning each IPE atom in a grid cell, which results in a unique set of IPE occupancies; this set is called the grid cell occupancy descriptors (GCODs). Three types of grid cell occupancy are calculated for each IPE. The absolute occupancy (Ao) is computed for the grid cell with Cartesian indices i, j, k over the ensemble-generation time t for each IPE atom. The joint occupancy (Jo) is an important measurement and is obtained by placing a reference compound (R) in the grid. Lastly, the self-occupancy is generally expressed relative to the grid cell occupancy of the reference compound (R). There is no official guideline on which occupancy descriptors should be included in or left out of a particular study. Reference compounds are helpful, but they introduce a bias into the 4D-QSAR model, which may be useful in steering the model towards template-based properties. Reference compounds are chosen so that the most potent members of the series are selected, giving potent activity features a strong influence on the model. A number of studies suggest using joint occupancy descriptors when the study set contains a small group of compounds, and absolute occupancy descriptors when it contains a large number of compounds.

vi. PLS Analysis to Reduce the Number of GCODs Against the Biological Activity Measures

A large number of grid cell occupancy descriptors is generated for several reasons: the extensive trial alignment, the number of grid cells, the five to six IPEs and the three possible occupancy types. The data reduction step in 4D-QSAR is similar to that in CoMFA, with the slight difference that the complete set of grid cell occupancy descriptors is included in the study. The all-atom IPE CEPs are generated in the grid cells, and the location of the molecular shape with respect to the IPE is calculated using a plot of Boltzmann-averaged occupancy. In this plot, the joint occupancy (Jo) is measured by mapping each grid cell location onto a single location index m; the plot is also known as the molecular shape spectrum (MSS). A difference in the biological activity of two compounds is reflected in a difference in their molecular shape spectra, and this step tries to identify where such IPE CEPs are located in the grid. The MSS may depend on several factors, such as the grid cell size and the coordinate positioning of the compound, and this dependence can be evaluated by shifting the coordinates of the compounds or changing the cell size. PLS regression is used to remove unnecessary GCODs by establishing the relationship between the experimental biological activity and the occupancy values of the GCODs. The PLS regression assigns a specific weight to each GCOD, and the quantitative relationship between activity and a small group of selected GCODs can be represented graphically. In general, a small group (<15) of GCODs that are significant for the 4D-QSAR model is selected.

vii. 4D-QSAR Model Building, Optimization, Comparison and Evaluation

The highly weighted grid cell occupancy descriptors from the previous step are selected, and a genetic algorithm (GA) and genetic function approximation (GFA) are then used for RI 4D-QSAR model building, optimization, comparison and evaluation. A basic requirement of this algorithm is that only linear terms are used, through multiple linear regression. Model optimization is carried out by analyzing the resulting 4D-QSAR models using operations such as crossover and linear crossover to relate the GCODs to biological activity. Finally, the models are compared and evaluated using measures such as the leave-one-out cross-validated correlation coefficient, cross-correlation, the correlation coefficient and lack of fit.

viii. Retrial of Trial Alignments, Creation of IPE CEPs and Selection of GCODs for GA Analysis

For each trial alignment, a model is constructed and evaluated, and a composite set of 4D-QSAR models is built by repeating steps iv to vii; this can also be achieved by sampling the trial alignments. Once the desired set of trial alignments has been included in the construction of 4D-QSAR models, all of the models are optimized, compared and evaluated.

ix. Identification of the "Best" 4D-QSAR Model With Respect to Trial Alignment

In this step, the 4D-QSAR models in the constructed population are assessed and evaluated in detail, the basic objective being to find the "best" optimized 4D-QSAR model. The best trial alignment is selected on the basis of the highest goodness of fit (regression coefficient) and q². Generally, to choose the best trial alignment with a high q², a cross-correlation matrix of the residual errors is computed for pairs of the top 4D-QSAR models. This shows which models express similar and which express distinct structure–activity relationships, so that the unique best 4D-QSAR models in the study can be identified.

x. Proposing the Hypothesis about Active Ligand Conformations

The aim of this step is to identify the active state among the conformers generated and sampled for each compound in the conformational search, typically those belonging to the lowest energy level E (global minimum) of the CEP. All of the selected lowest-energy conformers are evaluated using q² for the best 4D-QSAR model. Conformations with maximum grid cell occupancy, conformations consistent with the lowest-energy conformation, or conformations from molecular dynamics ensemble sampling of the GCODs are also used to evaluate activity with the 4D-QSAR model. A hypothesis is then made that the predicted lowest-energy conformation is the one with high activity. This hypothesized most active conformation is used as a template structure for structure-based drug design, and the molecular geometry of the template together with the receptor-ligand interactions is the best source for structure-based drug design.

B. RECEPTOR DEPENDENT 4D-QSAR

Receptor-dependent 4D-QSAR is a newer approach in QSAR in which experimental techniques such as X-ray crystallography and NMR spectroscopy, or comparative (homology) modeling, are used to determine the 3D structures of macromolecules. Databases such as PDB, SCOP and CATH serve as repositories for this structural information. Once the 3D structure has been determined, the ligand binding site is predicted, which reveals the binding and alignment modes of the ligands. The basic aim of an RD 4D-QSAR study is to map the ligand-receptor interaction mode. Binding-mode information from a series of ligands allows site occupancy-weighted 3D pharmacophore models to be constructed, in which the weights of the pharmacophore are assigned according to the binding interactions. The relative occupancy of the pharmacophore is evaluated thermodynamically by assessing the conformational states of the receptor-ligand complex. The receptor-ligand complex is analyzed, as stated above, for binding mode and alignment, and a number of methods are used to study it, such as grid cell based hot spot analysis (GCB-HSA), ensemble simultaneous search (ESS) and composite crystal-field analysis (CC-FA). In SBDD, two approaches are best suited to finding novel ligands: virtual screening, in which ligand shape and molecular features play an important role in the search for novelty, and de novo design, in which the molecular interactions between receptor and ligand are significant in designing novel inhibitors. SBDD has four common limitations in finding novel hits:
a) Ligand conformational flexibility is not taken into consideration; only a single conformation is used, which may create ambiguity because the free ligand conformation differs from the actual bound conformation in the active site. Multiple conformations in the form of ensembles are considered, instead of a single conformation, to address conformational flexibility.
b) Bonded and non-bonded interactions between the receptor and the ligand may change the conformation of the active site. This is usually called the induced fit effect, and it may play a role in the recognition of a novel ligand. Most SBDD studies have not considered it, treating binding instead as a rigid fit (the rigid fit effect). In recent studies

Fig. (2). Methodology of RD 4D-QSAR analysis. Recent Advances in Multidimensional QSAR (4D-6D): A Critical Review Mini-Reviews in Medicinal Chemistry, 2014, Vol. 14, No. 1 47

multiple protein conformations (MPS) are used in ensemble docking to capture the induced fit effect on the binding mode.

c) Many scoring functions, based on various force fields, are used to calculate the free energy of binding between receptor and ligand. The force fields mostly account for the non-bonded interactions between receptor and ligand, but to date no scoring function has been designed that calculates binding affinity correctly.

d) Solvation terms such as desolvation and resolvation are generally not considered, although they can affect receptor-ligand binding. Solvation effects have a direct impact on the binding strength of receptor and ligand, and neglecting them leads to incorrect estimates of the biological activity of the ligand. Solvation generally changes both the geometry and the free energy of the complex.

RD 4D-QSAR is specifically designed to address these SBDD problems, reducing them to the lowest level so that a proper quantitative relationship can be established between the features of the ligand molecule and its biological activity [68]. The basic methodology and analysis of RD 4D-QSAR are shown in Fig. 2.

Several receptor-dependent 4D-QSAR studies have been carried out, for example on glucose analogue inhibitors [68], enoyl-ACP reductase inhibitors [65], 4-hydroxy-5,6-dihydropyrones as HIV-1 protease inhibitors [85-86], p38 mitogen-activated protein kinase [87] and 14α-lanosterol demethylase [88].

C. LQTA 4D-QSAR

LQTA 4D-QSAR, developed at the Laboratório de Quimiometria Teórica e Aplicada (LQTA), is a new conception of 4D-QSAR in which conformational flexibility is the main focus [69]. In a routine 4D-QSAR study, IPEs and GCODs are calculated to predict the biological activity of a compound. In LQTA 4D-QSAR, a number of conformations are generated and then optimized to calculate 3D descriptors. The basic methodology of LQTA 4D-QSAR is the same as that of RI 4D-QSAR: conformations are generated and their interaction pharmacophore energies are calculated in the grid. Molecular dynamics and molecular simulation packages are used to generate the ensembles, and the simulations are carried out in the presence of solvent, which better represents natural conditions [69, 70]. The methodology of LQTA 4D-QSAR is divided into four steps, explained in Fig. 3.

Fig. (3). Methodology of LQTA 4D-QSAR analysis.

i. Design or Retrieval of Ligands

The molecular structures of the ligands are sketched with molecular modeling packages, energy-minimized and converted to 3D conformations using suitable force fields. Molecular dynamics programs such as GROMACS, or online servers such as DYNAPOCKET, are useful for generating the topology and coordinates of the ligand atoms [70, 71].

ii. Generation of Ensembles

Among molecular dynamics and molecular simulation packages, GROMACS is the most commonly used, because it is open-access and fast in computing molecular ensembles with a number of different force fields. The .gro file is a GROMACS coordinate file, which can also serve as a trajectory, in which the coordinates of the ensemble are stored. The accompanying .top or .itp files contain the topology, including the parameters for the Coulombic and van der Waals energy terms. The .gro output and the .top/.itp files are used as input for calculating the 3D interaction-energy descriptors in the grid. To calculate these interaction energies uniformly, a grid is defined that encloses all the conformations, and the interaction at each grid point is calculated by placing a probe at that point [70, 71].

iii. Computation of 3D Properties

The probes placed at the grid points are generally ions or functional groups bearing a positive or negative charge, used to compute the energy at regularly spaced points of the 3D grid. As defined earlier, the energy, in the form of Coulombic and van der Waals terms, is calculated for each atom of the CEP using various force fields. Both the Coulombic and the van der Waals energies are calculated for all n atoms, and the mean of these energies over all copies of the ligand in the ensemble is assigned to each grid point [70, 71].

iv. Correlation of Descriptors with Biological Activities

The computation of 3D properties yields a matrix in which each column holds the energy at one grid point (the descriptors) and each row corresponds to one compound in the study. This matrix is used to select the relevant molecular descriptors. A multivariate regression approach (PLS/PCR) is then used to construct the QSAR model relating the dependent variable, the biological activity, to the independent variables, the 3D properties at each grid point (the "descriptors") [70, 71].

D. SOM 4D-QSAR

The focus of 4D-QSAR is to model the 3D properties of chemical compounds and to construct and analyze the conformational profiles of ligands. The self-organizing map (SOM) is an unsupervised machine-learning method that classifies data according to their similarity. It is a type of artificial neural network (ANN) developed by Teuvo Kohonen, and it is widely used in QSAR because it is transparent and easily interpretable, mostly for classification and for testing the toxicity of chemical compounds. A SOM/Kohonen network (SOM/KN) consists of a 2D grid of connected neurons and a layer of multidimensional vectors, each dimension of which represents a descriptor. The network learns in two steps: first, the neuron most similar to the input (the winning neuron) is selected; second, the weights of the winning neuron and of its neighbors are moved toward the input vector, so that neighboring neurons come to hold similar values and the self-organizing map is set up. Once all the input has been presented to the network and the learning process is over, the network becomes stable; each layer of vectors represents the values of one descriptor, and the map tries to preserve that information [51]. A QSAR study can include many descriptors (logP, molecular weight, Kier and Hall indices), and SOM/KN has no difficulty including them; hence, in 3D-QSAR it is used mostly when molecular shapes, spectra or other encodings of 3D features are employed. The statistical results of a SOM/KN model are compared with those of other methods, such as linear or MLR models. As discussed above, multiple conformations of the ligands are generated in a 4D-QSAR model, so the method is suited to problems in which the active bound conformation is sought while taking conformational flexibility into account. The basic methodology of SOM 4D-QSAR is divided into six steps [72, 73].

i. 3D Model Building and Ensemble Searching

SOM 4D-QSAR begins with the generation of 3D conformations of the selected training-set ligands. The 3D structures of the ligands are minimized, and active conformations are searched for by ensemble sampling.

ii. Trial Alignment or Superposition

The selected active conformations of the ligands are subjected to trial alignment or superimposition. The procedure is based on the active conformer, and at least three atoms are selected for superposition in the alignment.

iii. Finding of Interaction Pharmacophore Elements (IPE)

The aim of this step is to find and select the groups of atoms that play an important role in the interaction, by placing the selected conformer in the grid and fragmenting the ligand. Interaction groups such as aliphatic, aromatic, hydrogen-bond donor, hydrogen-bond acceptor, and polar and non-polar charged groups are identified by placing probes at regularly spaced points of the 3D grid.

iv. Creation of Conformational Ensemble Profile (CEP)

The training set selected in step i is energy-minimized and subjected to molecular dynamics, and the molecular ensembles created for each ligand are used as its conformational ensemble profile. Molecular dynamics also supports the comparative analysis of ligand conformations. A semi-empirical method such as AM1 is used to calculate the partial charges of the compounds.

v. Construction of Comparative 2D-SOM Map

At this step, 2D self-organizing maps are constructed from the CEP of each ligand after its Cartesian coordinates and partial charges have been calculated. At the training stage, the values for each ligand, namely its grid cell occupancy profile (GCOP) and partial charges, are provided to the neurons as input data, yielding occupancy profile maps or partial charge maps.

vi. Data Reduction and Model Validation

In this final step of SOM 4D-QSAR, specific occupancy and partial charge groups are selected for PLS analysis, and variables are eliminated. In 4D-QSAR, relationships are built using leave-one-out (LOO) and cross-validation (CV) procedures in association with iterative variable elimination (IVE-PLS). The model is validated with an external test set by measuring its predictive ability; for this, a variety of training and test sets are sampled and monitored with a stochastic model validation (SMV) scheme.

4D vs. 5D-QSAR

Comparing 4D-QSAR with 5D-QSAR helps to measure the performance of these multidimensional QSAR approaches in drug discovery and development. The performance of 5D-QSAR is evaluated by analyzing the residuals $|\Delta G^{\circ}_{pred} - \Delta G^{\circ}_{exp}|$ [74].
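The two-step learning rule described for SOM/KN networks (selection of the winning neuron, then adjustment of the winner and its neighbours toward the input) can be illustrated with a minimal plain-Python sketch. It is not the implementation used in the cited SOM 4D-QSAR work [72, 73]: the 3 x 3 map, the lattice initialization, the linearly decaying learning rate (`lr0`), the Gaussian neighbourhood (`sigma0`) and the descriptor values are illustrative assumptions.

```python
import math


def winner(weights, x):
    """Index of the neuron whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=1.5):
    """Train a small rows x cols Kohonen map on descriptor vectors scaled to [0, 1].

    Weights start on a regular lattice over the first two descriptors (other
    descriptors start at 0.5) so that this sketch is deterministic.
    """
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = []
    for r, c in grid:
        w = [0.5] * dim
        w[0] = r / max(rows - 1, 1)
        if dim > 1:
            w[1] = c / max(cols - 1, 1)
        weights.append(w)

    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01  # neighbourhood radius shrinks
        for x in data:
            # Step 1 (competition): select the winning neuron.
            wr, wc = grid[winner(weights, x)]
            # Step 2 (cooperation): pull the winner and its map neighbours
            # toward x, weighted by a Gaussian of their distance on the grid.
            for k, (r, c) in enumerate(grid):
                h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2.0 * sigma ** 2))
                weights[k] = [w + lr * h * (v - w) for w, v in zip(weights[k], x)]
    return weights


# Hypothetical, pre-scaled two-descriptor vectors for six ligands.
ligands = [[0.05, 0.10], [0.10, 0.05], [0.00, 0.00],
           [0.95, 0.90], [0.90, 0.95], [1.00, 1.00]]
som = train_som(ligands, rows=3, cols=3)
print(winner(som, [0.05, 0.05]), winner(som, [0.95, 0.95]))
```

Ligands with similar descriptor vectors end up on the same or neighbouring neurons, which is the property used when occupancy-profile or partial-charge maps of different ligands are compared.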

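The grid cell occupancy profile (GCOP) used as SOM input, like the GCODs of RI 4D-QSAR, records how often atoms of a given IPE type fall into each grid cell across a ligand's conformational ensemble. The sketch below computes such an occupancy under stated assumptions: cubic cells of a chosen `spacing`, a set of discrete conformers with relative energies in kcal/mol, and Boltzmann weighting exp(-E/RT) of each conformer. For frames taken from an MD trajectory, which are already Boltzmann-distributed, each frame would instead be weighted equally. The function names and the data are hypothetical.

```python
import math
from collections import defaultdict

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann weights for relative conformer energies (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(energies)  # shift by the minimum to avoid overflow in exp()
    raw = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(raw)
    return [r / total for r in raw]


def grid_cell_occupancy(conformers, energies, spacing=1.0, temperature=300.0):
    """Boltzmann-weighted occupancy of cubic grid cells by one IPE type.

    conformers: one list of (x, y, z) coordinates per conformer, holding the
    atoms of the chosen IPE after trial alignment.
    Returns a dict mapping integer cell indices (i, j, k) to occupancy.
    """
    occupancy = defaultdict(float)
    for coords, weight in zip(conformers, boltzmann_weights(energies, temperature)):
        for x, y, z in coords:
            cell = (math.floor(x / spacing), math.floor(y / spacing),
                    math.floor(z / spacing))
            occupancy[cell] += weight
    return dict(occupancy)


# Hypothetical ensemble: two conformers placing one IPE atom in neighbouring
# 1 Å cells; the second conformer is 1 kcal/mol higher in energy.
occ = grid_cell_occupancy([[(0.2, 0.2, 0.2)], [(1.5, 0.2, 0.2)]], [0.0, 1.0])
print(occ)
```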
5D-QSAR

Quantitative structure-activity relationship analysis is a well-established research field concerned with building computational models of chemical compounds and correlating their structures with biological activity. The basic idea behind 3D- and 4D-QSAR models is that differences in biological activity between compounds arise mainly from differences in their molecular fields or in their bioactive conformations. The hypothetical models built in this way are mostly used to predict binding affinity and the pharmacokinetic and toxic properties of chemical compounds. More recently developed QSAR approaches also address receptor flexibility. The inherent flexibility of the receptor has often been overlooked in drug design, despite recent accounts of its importance in the design of several inhibitors. The receptor-ligand complex is a multifaceted object: the ligand binds to the receptor in a low-energy conformational state, and to accommodate it the receptor structure distorts to engulf the ligand molecule. To understand the distorted state of the receptor that results from complex formation, two receptor states are considered: apo (unbound) and holo (bound) (Fig. 4).

Fig. (4). (a) Explicit simulation of induced-protein fit in 5D-QSAR; (b) Explicit simulation of induced fit by a dual-shell representation of the three-dimensional binding-site model [57].

Receptor flexibility and its consequences are generally represented in the form of induced fit and energy-level transitions, and may involve events such as receptor refolding and the formation of compact hydrogen units (CHUs). In receptor-ligand binding, several forces determine the selective binding pose, such as hydrogen bonds, covalent metal bonds and electrostatic interactions [56]. Binding interactions are mostly studied by molecular docking, in which a single ligand molecule binds to the receptor, whereas a QSAR study considers a series of aligned ligand molecules to evaluate biological activity [75]. The simultaneous binding of such a series to the receptor is studied with an atomistic (proxy) receptor. The actual flexibility of the receptor conformation is introduced into the atomistic or proxy receptor by allowing flexibility in the hydrogen-bond donor and acceptor groups directed complementarily toward the ligand, including hydrogen-bond flip-flop. The atomistic or proxy receptor model is created using the Quasar concept, introduced by Angelo Vedani et al. (Fig. 5).

Fig. (5). Quasi-atomistic receptor model for the receptor system generated by the software Quasar [76].

The receptor surface model gives an idea of the proxy or hypothetical receptor binding site, in which 3D features are mapped as properties projected onto the receptor surface. The nature, shape, size and location of these surface features help to reveal the steric, electrostatic or bulk character of the mapped binding surface. Related properties of the binding site include hydrophobicity, hydrogen-bond donors, hydrogen-bond acceptors and partial charges. An atomistic (proxy) receptor surface is evaluated by validating two properties involved in the simulation of receptor atoms: induced fit and H-bond flip-flop. H-bonding residues such as Ser, Thr, Tyr, Cys, His, Asn and Gln in the binding surface play an important role in induced fit and in H-bond flip-flop toward the ligand molecule. When building the atomistic model, the main aim is to establish an averaged simulated model that interacts with a whole series of ligand molecules. Representing each ligand in multiple forms (conformations, stereoisomers, protonation states and orientations) is generally regarded as the new dimension of 4D-QSAR; representing multiple induced-fit scenarios as well is referred to as 5D-QSAR. These multiple representations are used as ensembles in 4D- and 5D-QSAR, in which bioactive conformations are selected by molecular simulation and Boltzmann-weighted criteria and evolved genetically from a reservoir of conformations. In the Quasar concept, local induced fit of the atomistic receptor is simulated by "pharmacophore equilibration", which maps the training-set ligands as an "inner" and a "mean" envelope. The envelope is first created by mapping all training ligands at van der Waals distance, and the inner envelope is then created by snugly accommodating each individual ligand molecule. The mapping of the "inner" and "mean" envelopes is performed first isotropically (linearly), second anisotropically (scaled by steric, electrostatic, H-bond or lipophilicity potential) and third through energy minimization. As discussed above, a fifth-dimension induced-fit model cannot be created without considering the actual receptor; to approximate it, a number of simulations and induced-fit hypotheses are considered. Several techniques have been used to construct the fifth-dimension model, such as building it on the basis of the mean envelope. In Quasar, the local induced fit of the atomistic receptor is governed by an adjustable degree of freedom ranging from 0 to 1, where 0 indicates no mobility of the receptor and 1 a perfect local induced fit. The resulting atomistic receptor model yields a receptor binding surface on which 3D receptor models are created by choosing specific surface properties. The construction of this 3D pharmacophoric model allows the receptor surface and the ligand to adapt to each other conformationally, including H-bond-mediated flip-flop of hydroxyl-containing amino acids toward the well-adapted ligand structures. A set of such receptor surfaces is created with a genetic algorithm and validated using cross-validation protocols. The Quasar approach with site-directed H-bond flip-flop has also been used to study the solvation term [76-80].

A. METHODOLOGY OF CONSTRUCTION IN QUASAR

i. Building of Receptor "Inner" and "Mean" Envelopes

The local induced fit of the atomistic receptor model is constructed for each ligand of the training and test sets by creating an inner envelope at van der Waals distance from the topology of that ligand. The inner envelope of the receptor is built by mapping significant three-dimensional features. The protocol is first to map the surface and create the inner envelope, and then to snugly accommodate all the training ligands. This yields accurate induced-fit three-dimensional models, first isotropically (linearly), second anisotropically (scaled by steric, electrostatic, H-bond or lipophilicity potential) and third through energy minimization. The magnitude of the induced fit is measured as the root-mean-square deviation from the mean envelope (0.4-2.5 Å). The free energy of receptor-ligand association is calculated by means of the inner envelope (0.2-6.0 kcal/mol).

ii. Generation of an Initial Set of Atomistic Receptor Surfaces (Parent Structures)

The significant three-dimensional properties of the atomistic receptor surface are randomly distributed. Points on the receptor surface are populated with atomistic properties, with a default minimum distance of 2.4 Å between two populated properties. These properties are derived from the functional groups of the most potent ligands in the series, and the more potent ligands are identified by their intrinsic affinity toward the receptor surface. The position and geometry of the receptor surface remain fixed throughout the simulation. A vector-support method is used to identify the binding cavity on the receptor surface, yielding the best positions for ligand functional groups to form non-covalent bonds (directional H-bonds) with the receptor surface. This procedure is followed to generate n atomistic receptor surfaces (parent structures). The effect of solvent on the receptor binding pocket must also be studied, because part of the envelope represents the solvent-accessible area. This region is left open as a void, is hydrophobic in nature and remains static during the simulation.

iii. Genetic Evolution of the Set of Atomistic Receptor Surfaces (Parent Structures)

A genetic algorithm is a class of model optimization methods based on Darwinian evolution. Genetic algorithms are widely used in computational chemistry and chemoinformatics, mostly in protein-ligand docking and QSAR studies. A genetic algorithm creates a population of n candidate solutions by crossover or mutation and then evaluates them to find the best members of the population. The evolution depends on a fitness function, according to which the population is ranked. For crossover, the fittest parents are selected, as judged by their cross-validated q2, and the cross-validated q2 of the resulting offspring is then calculated as their fitness measure. The lack of fit (LoF) compares predicted and experimental values for the parent structures within the population and is used to validate the induced fit, taking three penalties into account: protonation state, selectivity of the ensembles and property differences between models:

$$\mathrm{LoF} = \frac{\mathrm{rms}\left(\Delta G^{\circ}_{pred} - \Delta G^{\circ}_{exp}\right)}{1.0 - \left(p_{part} + p_{diff} + p_{sele}\right)/3.0}$$

iv. Measurement of Free Energy of Binding

As mentioned above, both the solvation energy and the entropy are taken into consideration when measuring the free energy of binding between ligand and receptor:

$$E_{binding} \approx E_{lig\text{-}rec} - T\Delta S_{binding} - \Delta G_{solv,lig} + E_{int,lig} + E_{ind.fit,lig}$$

The solvation term in this equation is needed to account for the change in internal energy on binding to the receptor surrogate; the internal energy of the ligand is at its maximum when the ligand is appropriately accommodated by the receptor, where the ligand-receptor interaction is optimal. The solvation energy of ligands bound to the receptor is described by Blaney's approximation, which states that all ligands are buried in the receptor with equal propensity, so that all receptor-ligand complexes carry equal solvation energy [81]. However, ligands that occupy different fractions of the binding site are exposed to solvent to different degrees, so defining the correct exposed portion of the binding site is important. The receptor-ligand interaction energy is calculated with directional force fields (Vedani and Huhta, 1990; Vedani et al., 1995). The predicted free energy of ligand binding is then obtained by linear regression over the training set:

$$\Delta G^{\circ}_{pred} = |a| \cdot E_{binding} + b$$

The slope and intercept of this equation, determined for a given receptor model, are used to predict the binding energies of compounds outside the training set.

B. ANALYSIS OF RECEPTOR MODEL FAMILY

The receptor model is evaluated and validated using an external data set; the criterion for evaluation is the model's prediction of the relative free energy of binding. Other stringent criteria, such as the cross-validated q2 and the lack of fit, are also applied [46]. In addition, a uniformity test of the envelope distribution is performed, for example for the SAS or the hydrophobic pocket. To select diverse training compounds from the available biological data set, a 'minimum distance' method is used. The 'minimum distance' algorithm was developed to select the most diverse compounds from the biological data set in the study. The minimum distance between two compounds is calculated with a weighted function of electrostatic and van der Waals interactions determined at common points of interaction between them. This algorithm helps to select the most potent compounds from a dissimilar biological data set, which can later be used as the training set in a QSAR study.

Local induced fit is considered in the Quasar technology, and an ensemble induced-fit hypothesis is evaluated to minimize bias. In recent methodology, some modifications have been made to account for the adaptation mechanism:
1) a linear mode, scaled from 0.0 to 1.0, where 0.0 indicates no induced fit and 1.0 maximum topological adaptation;
2-4) steric, electrostatic and H-bond field adaptation;
5) energy minimization based on the steric potential;
6) topological adaptation proportional to the lipophilicity potential of the molecule.
The induced-fit modes are thus proportional to the steric, electrostatic and H-bond fields and to the lipophilicity potential, acting directionally on the mean envelope of the model. Each adaptation is followed by a constrained energy minimization that keeps a defined separation between neighboring points of the envelope grid. The linear mode of local induced fit behaves isotropically, whereas the modes that depend on molecular properties (shape, steric potential, electrostatic potential, hydrogen-bond donor or acceptor properties, or lipophilicity potential) yield an anisotropic induced-fit model (Fig. 6). The parent induced-fit model structures are evaluated using the lack-of-fit criterion, and the models with the lowest LoF values are selected.

6D-QSAR

Adding a further dimension improves the QSAR analysis: the sixth dimension is included to account for solvation [82]. It is an extension of the Quasar technology in which the new dimension allows simulations with different solvation models [83]. The conceptualization of solvation (solute and solvent) is very important because it governs the non-covalent interactions between ligand and receptor when they combine.
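The regression step $\Delta G^{\circ}_{pred} = |a| \cdot E_{binding} + b$ described under iv can be sketched as an ordinary least-squares fit over the training set, after which the slope and intercept are applied to compounds outside it. That a and b come from ordinary least squares is an assumption (the text only says linear regression), and the energies below are hypothetical. The sketch raises an error for a non-positive slope rather than silently taking its absolute value, because with a least-squares intercept the two are not interchangeable.

```python
def fit_binding_regression(e_binding, dg_exp):
    """Least-squares fit of dG_exp = a * E_binding + b over the training set."""
    n = len(e_binding)
    mean_e = sum(e_binding) / n
    mean_g = sum(dg_exp) / n
    sxx = sum((e - mean_e) ** 2 for e in e_binding)
    sxy = sum((e - mean_e) * (g - mean_g) for e, g in zip(e_binding, dg_exp))
    a = sxy / sxx
    if a <= 0:
        # Refuse to flip the sign of a negative slope silently.
        raise ValueError("expected a positive slope between E_binding and dG_exp")
    b = mean_g - a * mean_e
    return a, b


def predict_dg(e_binding, a, b):
    """Predicted free energy of binding, dG_pred = |a| * E_binding + b."""
    return abs(a) * e_binding + b


# Hypothetical training set (kcal/mol): binding energies from a receptor model
# and experimental free energies of binding.
train_e = [-40.0, -30.0, -20.0]
train_g = [-10.0, -8.0, -6.0]
a, b = fit_binding_regression(train_e, train_g)
print(a, b, predict_dg(-25.0, a, b))  # -25.0: a compound outside the training set
```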

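The lack-of-fit criterion used to rank receptor models in the genetic evolution step, $\mathrm{LoF} = \mathrm{rms}(\Delta G^{\circ}_{pred} - \Delta G^{\circ}_{exp}) / \{1.0 - (p_{part} + p_{diff} + p_{sele})/3.0\}$, translates directly into code. In this sketch the three penalties are assumed to be supplied as numbers in [0, 1), and the model names, predictions and penalty values are hypothetical. As described above, the model with the lowest LoF is retained.

```python
import math


def lack_of_fit(dg_pred, dg_exp, p_part, p_diff, p_sele):
    """LoF = rms(dG_pred - dG_exp) / (1.0 - (p_part + p_diff + p_sele) / 3.0)."""
    n = len(dg_pred)
    rms = math.sqrt(sum((p - e) ** 2 for p, e in zip(dg_pred, dg_exp)) / n)
    denom = 1.0 - (p_part + p_diff + p_sele) / 3.0
    if denom <= 0:
        raise ValueError("penalties too large: denominator must stay positive")
    return rms / denom


# Hypothetical receptor models: predicted dG values followed by p_part, p_diff, p_sele.
dg_exp = [-8.0, -8.0, -9.5]
models = {
    "model_A": ([-7.0, -9.0, -9.5], 0.1, 0.0, 0.2),
    "model_B": ([-7.5, -8.5, -9.0], 0.3, 0.3, 0.3),
}
lofs = {name: lack_of_fit(pred, dg_exp, *pens) for name, (pred, *pens) in models.items()}
best = min(lofs, key=lofs.get)  # the lowest LoF is retained
print(lofs, best)
```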
Fig. (6). Stereo view of the envelope selection (induced fit hypotheses) [78].
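The 'minimum distance' selection of diverse training compounds described under B. Analysis of Receptor Model Family can be approximated by a greedy max-min (farthest-point) selection. This is a swapped-in, commonly used diversity-selection technique, not the algorithm of the source: it uses plain Euclidean distance between hypothetical descriptor vectors instead of the weighted electrostatic and van der Waals function described in the text, and starting from a chosen compound (for example the most potent one) is an assumption.

```python
import math


def maxmin_select(descriptors, k, start=0):
    """Greedy max-min (farthest-point) selection of k mutually distant compounds.

    descriptors: list of equal-length descriptor vectors (one per compound).
    start: index of the first compound, e.g. the most potent one.
    """
    k = min(k, len(descriptors))
    selected = [start]
    # Distance from each compound to its nearest already-selected compound.
    nearest = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(selected) < k:
        candidates = [i for i in range(len(descriptors)) if i not in selected]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(nearest[i], math.dist(descriptors[i], descriptors[nxt]))
                   for i in range(len(descriptors))]
    return selected


# Hypothetical two-descriptor vectors for five compounds.
compounds = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0), (9.8, 0.3)]
print(maxmin_select(compounds, k=3))
```

Each new pick is the compound farthest from everything already chosen, so near-duplicates such as the second and fifth compounds are only added once the distinct regions of descriptor space are covered.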

The simulation of solvation is implemented by mapping, either implicitly or explicitly, features of the receptor-ligand complex such as surface area (size and shape) and solvation properties, using genetic algorithms. The solvation terms considered are ligand desolvation and solvent stripping, which are measured directly when different surrogate model families are used to estimate binding affinity. These surrogate model families of the true receptor binding pocket comprise features such as hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic and hydrophilic regions. Measuring the binding affinity of a surrogate model family gives an idea of the solvent-accessible area (SAA). As mentioned above, features of the receptor-ligand complex such as the SAA and solvation properties are weighted and evolved over the course of the genetic simulation. As in 4D- and 5D-QSAR, 3D structures are generated and simulated under moderate evolutionary pressure. The inherent characteristics of the ligands are screened without bias against receptor properties such as hydrophobicity, binding-pocket characteristics and SAA; this is called QSAR-based receptor modeling. The compounds in the study comprise both a training and a test set and, as defined above, a surrogate model family of the receptor is used in 6D-QSAR. The surrogate model family is evolved under the genetic algorithm to reproduce the true biological activities of the compounds in the study [84]. The Quasar-simulated model is evolved, and both cross-validated and predictive correlation values are calculated to predict biological activity. The biological activities of the training and test compounds are predicted from the characteristic properties of the true receptor binding pocket, and true positives are well identified by the receptor surrogate family. The results of these studies are also cross-validated against experimental values.

3D to 6D-QSAR

A structurally diverse set of 106 estrogen receptor ligands was selected and divided into a training set (88) and a test set (18). The three-dimensional structures of the ligands were first energy-optimized with the AMBER force field in the MacroModel software. Atomic charges and solvation energies were then calculated with the AMSOL program. Several conformers were generated for each ligand (4D) using the Quasar tool, and the lowest-energy conformers were retained. The 5D model evaluates different induced-fit scenarios simultaneously and is therefore well suited to the study of induced fit; finally, different solvation scenarios were studied (6D). The advantage of each QSAR approach over the others is shown by comparing the cross-validated and predictive r2 values [84] (Table 2).

Table 2. Comparison of the predictive ability of 3D- to 6D-QSAR models.

QSAR Model    Cross-validated r2    Predictive r2
3D-QSAR       0.821                 0.563
4D-QSAR       0.810                 0.788
5D-QSAR       0.872                 0.790
6D-QSAR       0.903                 0.885
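Table 2 compares models by their cross-validated and predictive r2. Under the conventional definitions, the leave-one-out value is $q^2 = 1 - \mathrm{PRESS}/\sum_i (y_i - \bar{y})^2$, and the external predictive value is computed the same way over the test set, with deviations taken about the training-set mean. The sketch below implements both for a one-descriptor least-squares model, which stands in for the multivariate (e.g. PLS) models discussed above. The exact formulas used in [84] are not given in the text, and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx


def loo_q2(x, y):
    """Leave-one-out q2 = 1 - PRESS / SS for a one-descriptor linear model."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict it.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a * x[i] + b)) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)


def predictive_r2(x_train, y_train, x_test, y_test):
    """External r2_pred = 1 - PRESS_test / SS_test, SS taken about the training mean."""
    a, b = fit_line(x_train, y_train)
    mean_train = sum(y_train) / len(y_train)
    press = sum((yt - (a * xt + b)) ** 2 for xt, yt in zip(x_test, y_test))
    return 1.0 - press / sum((yt - mean_train) ** 2 for yt in y_test)


# Hypothetical descriptor/activity values, for illustration only.
x_tr, y_tr = [1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.2, 3.8, 5.1]
x_te, y_te = [1.5, 4.5], [1.6, 4.4]
print(round(loo_q2(x_tr, y_tr), 3), round(predictive_r2(x_tr, y_tr, x_te, y_te), 3))
```

Because each LOO prediction comes from a model that never saw the compound, q2 is lower than the fitted r2, and the external r2_pred tests generalization to compounds outside the training set altogether.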

CONCLUSIONS

QSAR is among the most reliable computational techniques and has been used for decades to gain insight into the relationship between the physicochemical properties of substituents and biological activity. Numerous developments have taken place, and different constitutional descriptors have been explored, each advance addressing the limitations of the previous one. Since the introduction of 3D-QSAR, advanced tools such as CoMFA, CoMSA and WHIM have been used extensively to study multiple 3D descriptors of chemical compounds and to establish correlations between structure and biological activity. More advanced approaches, 4D-, 5D- and 6D-QSAR, have evolved to overcome the limitations of 3D-QSAR. A large number of 4D-QSAR studies of penicillin analogs exploiting Hopfinger's 4D-QSAR scheme have appeared recently, and there are reports of further advances such as the LQTA-QSAR approach. 5D-QSAR studies have been described extensively with tools such as those of the Biographics Laboratory 3R, Quasar and Raptor. To account for solvation, a new dimension known as 6D-QSAR has been introduced, which allows simulations with different solvation models; in this model, the non-covalent interactions between ligand and receptor are studied. In light of recent reports of novel approaches, multidimensional QSAR may serve as an important tool in drug discovery.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

The authors are thankful to Mrs. Fatima Rafiq Zakaria, Chairman, Maulana Azad Educational Trust, Aurangabad; Mahatma Gandhi Mission Trust, Aurangabad; and Dr. M.H.G. Dehghan, Principal, Y.B. Chavan College of Pharmacy, Dr. Rafiq Zakaria Campus, Aurangabad 431 001 (M.S.), for constant support.

REFERENCES

[1] Ojha, L.K.; Sharma, R.; Bhawsar, M.R. Modern drug design with advancement in QSAR: A review. Int. J. Res. Biosciences, 2013, 2(1), 1-12.
[2] Selassie, C.D. In: History of Quantitative Structure Activity Relationships; Eds.; Burger's Medicinal Chemistry and Drug Discovery: New York, 2003, 1st ed, 1-48.
[3] Liew, C.Y.; Yap, C.W. In: Current Modeling Methods Used in QSAR/QSPR. Statistical Modelling of Molecular Descriptors in QSAR/QSPR; Eds.; Wiley-Blackwell: Weinheim, 2012, 1st ed, 1-32.
[4] Mark, T.D.C. In: Quantitative Structure-Activity Relationships (QSARs)-Applications and Methodology; Eds.; Springer Dordrecht Heidelberg: New York, 2010, 8, 16-24.
[9] Sharma, M.C.; D. Kohli, V.; Sahu, N.K.; Sharma, S.; Chaturvedi, S.C. 2D-QSAR studies of some 1, 3, 4-thidiazole-2yl azetidine one as antimicrobial activity. Dig. J. Nanomater. Bios., 2009, 4(2), 339-347.
[10] Siavoush, D.; Maryam, H.M.; Karim, A.Z. Comparison of different 2D and 3D-QSAR methods on activity prediction of histamine H3 receptor antagonists. Iran. J. Pharm. Res., 2012, 11(1), 97-108.
[11] Hong, H.; Xie, Q.; Ge, W.; Qian, F.; Fang, H.; Shi, L.; Su, Z.; Perkins, R.; Tong, W. Mold(2), molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J. Chem. Inf. Model., 2008, 48(7), 1337-1344.
[12] Prasanna, S.; Doerksen, R.J. Topological polar surface area: a useful descriptor in 2D QSAR. Curr. Med. Chem., 2009, 16(1), 21-41.
[13] Helguera, A.M.; Combes, R.D.; González, M.P.; Cordeiro, M.N. Applications of 2D descriptors in drug design: a DRAGON tale. Curr. Top. Med. Chem., 2008, 8(18), 1628-1655.
[14] William, L.C. Chemoinformatics: past, present and future. J. Chem. Inf. Model., 2006, 46(6), 2230-2255.
[15] Aliuska, M.H.; Natalia, S.C.; Maykel, P.G.; Miguel, A.C.P.; Reinaldo, M.R.; Yunierkis, P.C. QSAR modeling for predicting carcinogenic potency of nitroso-compounds using 0D-2D molecular descriptors. 11th International electronic conference on synthetic organic chemistry, 2007, 1-30.
[16] Todeschini, R.; Consonni, V.; Gramatica, P. Comprehensive chemometrics in QSAR, Oxford, Elsevier, Italy, 2009, 3, 129-164.
[17] Hoffman, B.T.; Kopajtic, T.; Katz, J.L.; Newman, A.H. 2D QSAR Modeling and preliminary database searching for dopamine transporter inhibitors using genetic algorithm variable Selection of Molconn Z descriptors. J. Med. Chem., 2000, 43(22), 4151-4159.
[18] Livia, B.S.; Adriano, D.A. Fragment-based QSAR: perspectives in drug design. Mol. Divers., 2009, 13(3), 277-285.
[19] Jaroslaw, P.; Andrzej, B.; Rafal, G.; Tomasz, M. Modeling robust QSAR. J. Chem. Inf. Model., 2006, 46(6), 2310-2318.
[20] Lea da, S.V.; Masamoto, A.; Kimito, F.; Yuji, T. 2D and 3D QSAR studies of the receptor binding affinity of Progestins. J. Braz. Chem. Soc., 2010, 21(5), 872-881.
[21] Scior, T.; J.L. Medina, F.; Do, Q.T.; Martinez, M.K.; Yunes, R.J.A.; Bernard, P. How to recognize and workaround pit falls in QSAR studies: a critical review. Curr. Med. Chem., 2009, 16(32), 4297-4313.
[22] Hugo, K. In: QSAR and 3D QSAR in Drug Design Part 1: Methodology; Eds.; Elsevier Science: Germany, 1997, 2, 1-11.
[23] Ovidiu, I. In: QSPR/QSAR Studies by Molecular Descriptors; Eds.; M.V. Diudea, Nova Science: Huntington, New York, 2001, pp. 213-231.
[24] Jitender, V.; Vijay, M.K.; Evans, C.C. 3D-QSAR in drug design-a review. Curr. Top. Med. Chem., 2010, 10(1), 95-115.
[25] Christian, L.; Thomas, L. Computational methods for the structural alignment of molecules. J. Comput. Aided Mol. Des., 2000, 9(3), 215-232.
[26] Stefan, D.; Armin, B. Improved alignment by weighted field fit in CoMFA of histamine H2 receptor agonistic imidazolylpropylguanidines. Quant. Struct.-Act. Relat., 1999, 18(4), 29-341.
[27] Coats, E.A. In: The Comfa Steroids as a Benchmark Dataset for Development of 3D QSAR Methods. In: 3D QSAR in Drug Design - Recent Advances. Kluwer Academic Publishers: New York, 1998; pp. 117-134.
[28] Walters, D.E.; Hinds, R.M. Genetically evolved receptor models: a computational approach to construction of receptor models. J. Med. Chem., 1994, 37(16), 2527-2536.
[29] Walters, D.E.; Muhammad, T.D.
In: Genetically Evolved Receptor [5] Chanin, N.; Chartchalerm, I.; Thanakorn, N.; Virapong, P. A Models (GERM): A Procedure for Construction of Atomic-Level practical overview of quantitative structure-activity relationship. Receptor Site Models in the Absence of a Receptor Crystal Excli. J., 2009, 8, 74-88. st Structure. In: Genetic Algorithms in Molecular Modeling; [6] Hugo, K. In: QSAR: Hansch Analysis and Related Approaches, 1 Devillers, J., Eds.; Academic Press: London, 1996; pp. 193-210. eds.; VCH: Weinheim & New York, 1993. [30] GERM. Walters, D.E. http://www.finchcms.edu/biochem/Walters/ [7] Arkadiusz, Z.D.; Tomasz, A.; Jorge G. Computational methods in germ.html (Accessed on 1st April, 2009). developing quantitative structure-activity relationships (QSAR): A [31] Lushington, G.H.; Guo, J.X.; Wang, J.L. Whither combine? New review. Combin. Chem. High Throughput Screen., 2006, 9(3), 213- opportunities for receptor-based QSAR. Curr. Med. Chem., 2007, 228. 14(17), 1863-1877. [8] Kumar, U.C.; Munipalli, H.; Mahmood, S. 2D QSAR, Pharmacophore [32] Ortiz, A.R.; Pisabarro, M.T.; Gago, F.; Wade, R.C. Prediction of and docking studies of Mycobacterium tuberculosis enoyl acyl drug binding affinities by comparative binding energy analysis. J. carrier protein reductase inhibitors. J. Global Pharma Technol., Med. Chem., 1995, 38(14), 2681-2691. 2010, 2(5), 73-89. 54 Mini-Reviews in Medicinal Chemistry, 2014, Vol. 14, No. 1 Damale et al.

[33] Silber, K.; Heidler, P.; Kurz, T.; Klebe, G. AFMoC enhances predictivity of 3D QSAR: a case study with DOXP-reductoisomerase. J. Med. Chem., 2005, 48(10), 3547-3563.
[34] Mark, T.D.C. In: Quantitative Structure-Activity Relationships (QSARs)-Applications and Methodology, Recent Advances in QSAR Studies; Eds.; Springer: Netherland, 2010, 8, pp.3-11.
[35] Verma, J.; Khedkar, V.M.; Prabhu, A.S.; Khedkar, S.A.; Malde, A.K.; Coutinho, E.C. A comprehensive analysis of the thermodynamic events involved in ligand-receptor binding using CoRIA and its variants. J. Comput. Aided Mol. Des., 2008, 22(2), 91-104.
[36] Datar, P.A.; Khedkar, S.A.; Malde, A.K.; Coutinho, E.C. Comparative residue interaction analysis (CoRIA): a 3D-QSAR approach to explore the binding contributions of active site residues with ligands. J. Comput. Aided Mol. Des., 2006, 20(6), 343-360.
[37] Khedkar, S.A.; Malde, A.K.; Coutinho, E.C. Design of inhibitors of the MurF enzyme of Streptococcus pneumoniae using docking, 3D-QSAR, and de novo design. J. Chem. Inf. Model., 2007, 47(5), 1839-1846.
[38] Dhaked, D.K.; Verma, J.; Saran, A.; Coutinho, E.C. Exploring the binding of HIV-1 integrase inhibitors by comparative residue interaction analysis (CoRIA). J. Mol. Model., 2009, 15(3), 233-245.
[39] Silverman, B.D.; Platt, D.E. Comparative molecular moment analysis (CoMMA): 3D-QSAR without molecular superposition. J. Med. Chem., 1996, 39(11), 2129-2140.
[40] Ajay, N.J.; Koile, K.; David, C. Compass: predicting biological activities from molecular surface properties. Performance comparisons on a steroid benchmark. J. Med. Chem., 1994, 37(15), 2315-2327.
[41] Samuli, P.K. FLUFF-BALL, a Fuzzy Superposition and QSAR Technique, PhD Thesis, Department of Biosciences/ Chemistry University of Kuopio, Finland.
[42] Livia, B.S.; Adriano, D.A. Fragment-based QSAR: perspectives in drug design. Mol. Divers., 2009, 13(3), 277-285.
[43] Asikainen, A.; Ruuskanen, J.; Tuppurainen, K. Spectroscopic QSAR methods and self-organizing molecular field analysis for relating molecular structure and estrogenic activity. J. Chem. Inf. Comput. Sci., 2003, 43(6), 1974-1981.
[44] Pastor, M.; Cruciani, G.; McLay, I.M.; Pickett, S.D.; Clementi, S. GRid-Independent Descriptors (GRIND): a novel class of alignment-independent three-dimensional molecular descriptors. J. Med. Chem., 2000, 43(17), 3233-3243.
[45] Berk, R.A. In: The Formalities of Multiple Regression. In: Regression Analysis: A Constructive Critique; Berk, R.A., Eds.; SAGE Publications Ltd: London, 2003, pp. 21-38.
[46] Rogers, D.; Hopfinger, A.J. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci., 1994, 34(4), 854-866.
[47] Materials Studio 5.0 Manual.
[48] Davies, E.K.; Glick, M.; Harrison, K.N.; Richards, W.G. Pattern recognition and massively distributed computing. J. Comput. Chem., 2002, 23(16), 1544-1550.
[49] Hyde, R.M.; Livingstone, D.J. Perspectives in QSAR: computer chemistry and pattern recognition. J. Comput. Aided Mol. Des., 1988, 2(2), 145-155.
[50] Andrew, R.L.; Valerie, J.G. In: An Introduction to Chemoinformatics; Springer: Netherlands, 2007; pp 2.
[51] Keith, L.P. In: Artificial Neural Networks and Their Use in Chemistry; Eds.; Wiley-VCH, John Wiley and Sons, Inc: New York, 2000, 16, 53-140.
[52] Zhiyan, X.; Shikha, V.; Yun-De, X.; Alexander, T. Modeling of p38 mitogen-activated protein kinase inhibitors using the catalyst HypoGen and k-nearest neighbor QSAR methods. J. Mol. Graphics. Model., 2004, 23(2), 129-138.
[53] Anardhan, S.J.; Srivani, P.; Narahari, S.G. 2D and 3D quantitative structure-activity relationship studies on a series of bis-pyridinium compounds as choline kinase inhibitors. QSAR Comb. Sci., 2006, 25(10), 860-872.
[54] Ersin, Y.; Emin, S.; Kader, S.; Nazmiye, G.; Fatih, C. 4D-QSAR analysis and pharmacophore modeling: electron conformational-genetic algorithm approach for penicillins. Bio. Med. Chem., 2011, 19(7), 2199-2210.
[55] Hopfinger, A.J.; Wang, S.; Tokarski, J.S.; Jin, B.; Albuquerque, M.; Madhav, P.J.; Duraiswami, C. Construction of 3D-QSAR Models Using the 4D-QSAR Analysis Formalism, J. Am. Chem. Soc., 1997, 119(43), 10509-10524.
[56] Sandip, S.; Farooqui, N.A.; Easwari, T.S.; Bishwabara, R. CoMFA-3D QSAR approch in drug design. Int. J. Res. Dev. Pharm. Life Sci., 2012, 1(4), 167-175.
[57] Markus, A.L. Multi-dimensional QSAR in drug Discovery. Drug Discovery Today, 2007, 12(23-24), 1013-1018.
[58] Carolina, H.A.; Kerly, F.M.P.; Elizabeth, I.F.; Anton, J.H. 4D-QSAR: perspectives in drug design. Molecules, 2010, 15(5), 3281-3294.
[59] Jaroslaw, P.; Andrzej, B. Modeling steric and electronic effects in 3D and 4D-QSAR schemes: predicting benzoic pKa values and steroid CBG binding affinities. J. Chem. Inf. Comput. Sci., 2003, 43(6), 2081-2092.
[60] Venkatarangan, P.; Hopfinger, A.J. Prediction of ligand-receptor binding free energy by 4D-QSAR analyses: application to a set of glucose inhibitors of glycogen phosphorylase. J. Chem. Inf. Comput. Sci., 1999, 39(6), 1141-1150.
[61] Simon, J.T. Implications of protein flexibility for drug discovery. Nat. Rev. Drug discovery, 2003, 2(7), 527.
[62] Hopfinger, A.J. QSAR investigation of dihydrofolate-reductase inhibition by baker triazines based upon molecular shape-analysis. J. Am. Chem. Soc., 1980, 102(24), 7196-7206.
[63] Hopfinger, A.J.; Inhibition of dihydrofolate reductase: structure activity correlations of 2,4-diamino-5-benzylpyrimidines based upon molecular shape analysis. J. Med. Chem., 1981, 24(7), 818-822.
[64] Matthew, D.K.; Xuan, H.; Hopfinger, A.J.; Neil, L.H. 4D-QSAR analysis of a set of propofol analogs: mapping binding sites for an anesthetic phenol on the GABAA receptor. J. Med. Chem., 2002, 45(15), 3210-3221.
[65] Pasqualoto, K.F.M.; Ferreira, E.I.; Ferreira, O.A.; Santos, F.; Hopfinger, A.J. Rational design of new antituberculosis agents: receptor-independent four-dimensional quantitative structure activity relationship analysis of a set of isoniazid derivatives. J. Med. Chem., 2004, 47(15), 3755-3764.
[66] Osvaldo, A.S.; Hopfinger, A.J. A search for sources of drug resistance by the 4D-QSAR analysis of a set of antimalarial dihydrofolate reductase inhibitors. J. Comput. Aided Mol. Des., 2001, 15(1), 1-12.
[67] Carolina H.A.; Kerly, F.M.; Pasqualoto, E.; Ferreira, I.; Hopfinger, A.J. 3D-Pharmacophore mapping of thymidine-based inhibitors of TMPK as potential antituberculosis agents. J. Comput. Aided Mol. Des., 2010, 24(2), 157-172.
[68] Dahua P.; Jianzhong L.; Craig S.; Hopfinger, A.J.; Yufeng T. Characterization of a ligand-receptor binding event using receptor dependent four-dimensional quantitative structure-activity relationship analysis. J. Med. Chem., 2004, 47(12), 3075-3088.
[69] Martins, J.P.A.; Barbosa, E.G.; Pasqualoto, K.F.M.; Ferreira, M.M.C. In: LQTA QSAR A New 4D-QSAR Methodology In: Proceedings of the 5th international symposium on computational methods in toxicology and integrating internet resources, Istanbul, Turkey, September 4-8, Turkey, 2009.
[70] De Melo, E.B.; Ferreira, M.M.C. Four-dimensional structure activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. J. Chem. Inf. Model., 2012, 52(7), 1722-1732.
[71] Martins, J.P.A.; Barbosa, E.G.; Pasqualoto, K.F.M.; Ferreira, M.M.C. LQTA-QSAR: a new 4D-QSAR methodology. J. Chem. Inf. Model., 2009, 49(6), 1428-1436.
[72] Andrzej, B.; Jaroslaw, P. A 4D-QSAR study on anti-HIV HEPT analogs. Bio. Med. Chem., 2006, 14(1), 273-279.
[73] Andrzej, B.; Jaroslaw, P. Modeling robust QSAR 3: SOM-4D-QSAR with iterative variable elimination IVE-PLS: application to steroid, azo dye and benzoic acid series. J. Chem. Inf. Model., 2007, 47(4), 1469-1480.
[74] Louise, B.; Christopher, W.M.; Michael, J.H.; Ian, J.T.; Marcel, L.V. Sensitivity of molecular docking to induced fit effects in influenza virus neuraminidase. J. Comput. Aided Mol. Des., 2002, 16(12), 855-869.
[75] Nibha, M.; Arijit, B.; Venkatesan, J.; Ashoke, S.; Mahua, B.; Kiran, K.P. Structure based virtual screening of GSK-3β: importance of protein flexibility and induced fit. Bio. Med. Chem. Lett., 2009, 19(19), 5582-5585.

[76] Angelo, V.; Peter, Z.; Quasi-atomistic receptor modeling: a bridge between 3D-QSAR and receptor fitting. Pharmaceutica. Acta. Helvetiae., 1998, 73(1), 11-18.
[77] Christoph, O.; Thomas, J.S.; Bernhard, W. 5D-QSAR for spirocyclic σ1 receptor ligands by Quasar receptor surface modeling. Eur. J. Med. Chem., 2010, 45(7), 3116-3124.
[78] Angelo, V.; Max, D.; 5D-QSAR: the key for simulating induced fit. J. Med. Chem., 2002, 45(11), 2139-2149.
[79] Angelo, V.; Max, D.; Horst, D.; Kai-Malte, H.; Franz, B.; Markus, A.L. Novel ligands for the chemokine receptor-3 (CCR3): a receptor-modeling study based on 5D-QSAR. J. Med. Chem., 2005, 48(5), 1515-1527.
[80] Sylvie, D.; Grant, M.; Nicholas, J.L.; James, P.S. Quantitative structure-activity relationship (5D-QSAR) study of combretastatin like analogs as inhibitors of tubulin assembly. J. Med. Chem., 2005, 48(2), 457-465.
[81] Blaney, J.M.; Weiner, P.K.; Dearing, A.; Kollman, P.A.; Jorgensen, E.C.; Oatley, S.J.; Burridge, J.M.; Blake, J.F. Molecular mechanics simulation of protein-ligand interactions: binding of thyroid analogs to prealbumin. J. Am. Chem. Soc., 1982, 104(23), 6424-6434.
[82] Angelo, V.; Anne-Verene, D.; Morena, S.; Beat, E. Predicting the toxic potential of drugs and chemicals in silico: a model for the peroxisome proliferator-activated receptor (PPARγ). Toxicol. Lett., 2007, 173(1), 17-23.
[83] Jaroslaw, P. Receptor dependent multidimensional QSAR for modeling drug-receptor interactions. Cur. Med. Chem., 2009, 16(25), 3243-3257.
[84] Angelo, V.; Max, D.; Markus, A. L. Combining protein modeling and 6D-QSAR simulating the binding of structurally diverse ligands to the estrogen receptor. J. Med. Chem., 2005, 48(11), 3700-3703.
[85] Santos-Filho, O.; Hopfinger, A. Structure-Based QSAR Analysis of a Set of 4-Hydroxy-5,6-dihydropyrones as Inhibitor of HIV-1 Protease: An Application of the Receptor Dependent (RD) 4D-QSAR Formalism. J. Chem. Inf. Model., 2006, 119(43), 345-354.
[86] Senese, C.L.; Duca, J.; Pan, D.; Hopfinger, A.J.; Tseng, Y.J. 4D-fingerprints, universal QSAR and QSPR descriptors. J. Chem. Inf. Comput. Sci., 2004, 44(55), 1526-1539.
[87] Romeiro, N.C.; Albuquerque, M.G.; de Alencastro, R.B.; Ravi, M.; Hopfinger, A.J. Construction of 4D-QSAR models for use in the design of novel p38-MAPK inhibitors. J. Comput. Aided Mol. Des., 2005, 19(6), 385-400.
[88] Liu, J.; Pan, D.; Tseng, Y.; Hopfinger, A.J. 4D-QSAR analysis of a series of antifungal p450 inhibitors and 3D-pharmacophore comparisons as a function of alignment. J. Chem. Inf. Comput. Sci., 2003, 43(6), 2170-2179.
[89] Alfonso, P.; Aliuska M.H.; Adela, A.G.; Natalia, D.S.C.; Amalio, G.E. Convenient QSAR model for predicting the complexation of structurally diverse compounds with β-cyclodextrins. Bioorg. Med. Chem., 2009, 17(2), 896-904.

Received: September 13, 2013 Revised: October 29, 2013 Accepted: December 01, 2013

DISCLAIMER: The above article has been published in Epub (ahead of print) on the basis of the materials provided by the author. The Editorial Department reserves the right to make minor modifications for further improvement of the manuscript.

PMID: 24195665