
This whitepaper provides an overview of the process, types, software used, successes and challenges of Computer-Aided Drug Designing (CADD).

Authors: Jagmohan Verma, Anjaly Maria VRB Analytics Pvt Ltd

21st April 2020 Version 1.0

21st April, 2020 Computer Aided Drug Design

A binding pocket for a new class of drugs to treat AIDS was discovered computationally, taking the flexibility of the receptor into account. This information led to the discovery of the orally available HIV integrase inhibitor raltegravir (Isentress®), which was approved by the FDA in 2007 and received approval for paediatric use in 2011. This is one of the many success stories of the CADD approach.

What is Computer-Aided Drug Designing?

The traditional drug designing process takes 10 years or more and costs more than 1 billion dollars in total. Several technologies have been used to reduce the time and cost of discovering a new drug, one of which is Computer-Aided Drug Designing (CADD).

[Figure 1 shows the pipeline: Drug Discovery (10,000 compounds) → Preclinical Phase (250 compounds) → Clinical Phase 1-4 (5 compounds) → FDA Approved (1 drug). Overall: 10-14 years and >1 billion dollars.]

Figure 1: Traditional Drug Discovery Process

In simple terms, computational drug designing can be explained as a modern drug discovery technique that uses theoretical and computational approaches to design a new drug molecule. CADD approaches can reduce the cost of drug discovery and development by up to 50%.

[Figure 2 shows five steps: create a new drug molecule candidate; dock the molecule to the target protein; analyze molecular interactions; estimate binding strength; estimate drug-like properties.]

Figure 2: Basic Principle of CADD

Major Types of Approaches in CADD

CADD approaches are mainly of two types.

• Structure-based drug design / direct approach
• Ligand-based drug design / indirect approach

Figure 3: Schematic Overview of CADD Process

What is structure-based approach?

The structure-based approach, or direct approach, is exactly what the term indicates: it depends on the known 3D structure of the target protein. The basic principle is to predict whether a given small molecule will bind to the chosen protein target and, if so, how strong this molecular recognition will be.

The first step in this process is molecular docking, which is the cornerstone of structure-based drug design. To carry out docking studies accurately, one requires a high-resolution X-ray, NMR or homology-modelled structure of the biomolecule with a known or predicted binding site. Molecular docking predicts the most probable geometry and position of a small molecule at the surface of a protein by optimizing the interactions between the two molecular partners. Many docking programs are freely available and can be used for educational purposes, including web-based tools such as SwissDock.ch and downloadable programs such as AutoDock and AutoDock Vina. About 162,529 structures were available in the Protein Data Bank (PDB) at the time of writing.
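Docking starts from the atomic coordinates of the target, which the Protein Data Bank distributes in a fixed-column text format. The following is a minimal sketch of reading ATOM and HETATM records from such a file. The names `Atom`, `parse_atoms` and `SAMPLE` are hypothetical, and the sample record is assembled by hand for illustration; real docking programs use their own, more complete readers.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(
                Atom(
                    name=line[12:16].strip(),      # columns 13-16
                    res_name=line[17:20].strip(),  # columns 18-20
                    chain=line[21],                # column 22
                    res_seq=int(line[22:26]),      # columns 23-26
                    x=float(line[30:38]),          # columns 31-38
                    y=float(line[38:46]),          # columns 39-46
                    z=float(line[46:54]),          # columns 47-54
                )
            )
    return atoms


# Illustrative record, assembled field by field so the column widths are explicit.
SAMPLE = (
    "ATOM  " + "1".rjust(5) + " " + " N  " + " " + "MET" + " " + "A"
    + "1".rjust(4) + " " * 4
    + "27.340".rjust(8) + "24.430".rjust(8) + "2.614".rjust(8)
)

print(parse_atoms(SAMPLE))
```

The parser ignores every other record type, so header and remark lines in a full PDB file are skipped.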

The next step is to determine the strength of binding of the small molecule to the protein, which can be achieved using a binding free energy estimator. Several computer-aided approaches are available for this purpose, generally based on high-level methods involving concepts from physical chemistry and statistical physics. In practice, docking software is used to estimate the binding free energy. The docking process involves two interrelated steps: first, sampling conformations of the ligand in the active site of the protein; then, ranking these conformations via a scoring function.
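The two steps, sampling and then ranking, can be sketched in a few lines. This is a toy illustration only: a pose is reduced to the position of the ligand centre, and `toy_score` (squared distance from a pocket centre placed at the origin) is a hypothetical stand-in for a real scoring function. All names here are assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple

Pose = Tuple[float, ...]  # hypothetical pose: ligand centre (x, y, z)


def sample_poses(n: int, box: float, rng: random.Random) -> List[Pose]:
    """Step 1: sample candidate poses uniformly inside a cubic search box."""
    return [tuple(rng.uniform(-box, box) for _ in range(3)) for _ in range(n)]


def toy_score(pose: Pose) -> float:
    """Toy score: squared distance from the pocket centre (lower is better)."""
    return sum(c * c for c in pose)


def dock(n: int, box: float, score: Callable[[Pose], float], seed: int = 0) -> List[Pose]:
    """Step 2: rank the sampled poses with the scoring function, best first."""
    rng = random.Random(seed)
    return sorted(sample_poses(n, box, rng), key=score)


ranked = dock(n=100, box=5.0, score=toy_score)
print("best pose:", ranked[0], "score:", toy_score(ranked[0]))
```

Keeping sampling and scoring as separate functions mirrors the text: either part can be replaced without touching the other.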

Sampling algorithm: A sampling algorithm should be able to reproduce the experimental binding mode. There is a huge number of possible binding modes between two molecules, arising from the degrees of freedom of both the ligand and the protein, so generating all possible conformations computationally would be very expensive. Various sampling algorithms have therefore been developed and are widely used in molecular docking software. They are classified by the number of degrees of freedom they ignore.
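A short calculation shows why exhaustive sampling is impractical. The sketch below assumes that only the ligand's rotatable bonds are sampled, each on a regular angular grid; the function name and the 30-degree step are illustrative choices, not values taken from any docking program.

```python
def count_conformations(rotatable_bonds: int, step_degrees: int = 30) -> int:
    """Number of torsion combinations when every bond is sampled on a regular grid."""
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must be a positive divisor of 360")
    return (360 // step_degrees) ** rotatable_bonds


for bonds in (2, 5, 10):
    print(bonds, "rotatable bonds:", count_conformations(bonds), "conformations")
```

The count grows exponentially with the number of rotatable bonds, and this is before ligand position, ligand orientation or protein flexibility are considered.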

Molecular dynamics (MD): MD is widely used as a powerful simulation method. In the context of docking, by moving each atom separately in the field of the remaining atoms, MD simulation represents the flexibility of both the ligand and the protein more effectively than other algorithms.


Rigid-body docking: The simplest algorithms treat the two molecules as rigid bodies, thereby reducing the degrees of freedom to just six (three translational and three rotational). Examples: DOCK, LibDock, LIDAEUS, SANDOCK
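The six degrees of freedom of a rigid body can be made concrete as three translations and three rotation angles applied to every atom of the ligand. The sketch below is illustrative only: the function names and the rotation order (about x, then y, then z) are assumptions, and docking programs may use other parameterisations such as quaternions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotation_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_rigid_pose(coords: Sequence[Point], pose: Sequence[float]) -> List[Point]:
    """Move a rigid ligand using six numbers: (tx, ty, tz, rx, ry, rz)."""
    tx, ty, tz, rx, ry, rz = pose
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        moved.append((
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        ))
    return moved


ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(apply_rigid_pose(ligand, (5.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)))
```

Because the transformation is rigid, all interatomic distances within the ligand are unchanged; only its position and orientation relative to the receptor vary.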

Incremental construction: The ligand is fragmented at its rotatable bonds into several segments. One of the segments is anchored to the receptor surface. The anchor is generally the fragment that shows the most interactions with the receptor surface, has the fewest alternative conformations and is fairly rigid, such as a ring system. The remaining fragments are then added step by step. Examples: DOCK 4.0, FlexX, SLIDE

Monte Carlo (MC): A ligand is modified gradually by bond rotation and by translation or rotation of the entire ligand. More than one parameter can be changed at a time to obtain a particular conformation. That conformation is then evaluated at the binding site with a molecular mechanics energy calculation and is accepted or rejected for the next iteration according to a Boltzmann probability. Examples: DockVision 1.0.3, FDS, GlamDock, ICM, MCDOCK
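The accept-or-reject step can be sketched with the Metropolis criterion, a standard form of Boltzmann-weighted acceptance. Whether a particular docking program uses exactly this rule is an assumption. The example is deliberately reduced to a single torsion angle, and `toy_energy` is a hypothetical energy function with its minimum at 2.0.

```python
import math
import random
from typing import Callable, Tuple


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def toy_energy(angle: float) -> float:
    """Hypothetical one-dimensional energy with its minimum at angle = 2.0."""
    return (angle - 2.0) ** 2


def monte_carlo_search(
    energy: Callable[[float], float],
    start: float,
    steps: int,
    step_size: float,
    kT: float,
    seed: int = 0,
) -> Tuple[float, float]:
    """Random-walk search that returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = current + rng.uniform(-step_size, step_size)  # small random change
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


print(monte_carlo_search(toy_energy, start=10.0, steps=2000, step_size=0.5, kT=0.6))
```

Occasionally accepting uphill moves is what allows the search to leave a local minimum; a larger `kT` makes such moves more likely.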

Genetic algorithm (GA): GA is quite similar to the MC method and is used to find the global minimum. It is inspired by Darwin’s theory of evolution. A GA maintains a population of ligands, each with an associated fitness determined by the scoring function. Each ligand represents a potential hit. The GA alters the ligands of the population by mutation or crossover. Examples: AutoDock 4.0, DARWIN, DIVALI, FITTED, FLIPDock
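The population, fitness, mutation and crossover ideas can be sketched as follows. This is not how any of the programs listed above is implemented: the genome encoding (one torsion angle per rotatable bond), the toy score, the truncation selection and all parameter values are assumptions chosen to keep the example short.

```python
import random
from typing import Callable, List

Genome = List[float]  # hypothetical encoding: one torsion angle (degrees) per rotatable bond


def toy_score(genome: Genome) -> float:
    """Hypothetical score: squared deviation from an ideal angle of 60 degrees (lower is fitter)."""
    return sum((angle - 60.0) ** 2 for angle in genome)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(genome: Genome, rate: float, sigma: float, rng: random.Random) -> Genome:
    """Perturb each gene with probability `rate` by Gaussian noise of width `sigma`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in genome]


def evolve(
    score: Callable[[Genome], float],
    n_genes: int,
    pop_size: int = 30,
    generations: int = 40,
    seed: int = 0,
) -> Genome:
    """Evolve a population and return the fittest genome found (n_genes must be >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # truncation selection: keep the fitter half
        children = [pop[0]]             # elitism: the current best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), 0.2, 10.0, rng))
        pop = children
    return min(pop, key=score)


best = evolve(toy_score, n_genes=4)
print(best, toy_score(best))
```

Keeping the best individual in every generation (elitism) guarantees that the best score never gets worse from one generation to the next.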

Hierarchical algorithm: The low-energy conformations of the ligand are pre-computed and aligned. The populations of pre-generated ligand conformations are merged into a hierarchy such that similar conformations are positioned adjacent to each other. Afterwards, when the ligand is rotated or translated, the docking program makes use of this hierarchical data structure and thus reduces the number of outcomes to evaluate. Example: GLIDE

Figure 4: Types of Algorithm

Scoring function: Scoring is done to distinguish correct poses from incorrect poses, or binders from inactive compounds, in a reasonable computation time. However, scoring functions estimate, rather than calculate, the binding affinity between the protein and the ligand.

• Classical force-field-based scoring function: assesses the binding energy by calculating the sum of the non-bonded (electrostatic and van der Waals) interactions. E.g. DOCK, AutoDock scoring functions
• Empirical scoring function: the binding energy is decomposed into several energy components, such as hydrogen bonds, ionic interactions, hydrophobic effect and binding entropy. Each component is multiplied by a coefficient and then summed to give the final score. E.g. LUDI, ChemScore
• Knowledge-based scoring function: the score is calculated by favouring preferred contacts and penalizing repulsive interactions between each atom in the ligand and the protein within a given cutoff. E.g. DrugScore, BLEEP
• Consensus scoring function: combines several different scores to assess docking conformations. E.g. CScore combines the DOCK, ChemScore, PMF, GOLD and FlexX scoring functions
• Physics-based scoring function: MM-PB/SA and MM-GB/SA are used in rescoring or lead optimization to improve the accuracy of binding affinity prediction

Figure 5: Types of Scoring Methods
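Two of the scoring ideas above lend themselves to a short sketch: an empirical score as a weighted sum of energy components, and a consensus ranking that merges several scoring functions. The term names, weights and scores below are invented for illustration, and averaging ranks is only one possible way to build a consensus; it is not claimed to be how CScore or any other package works.

```python
from typing import Dict, List


def empirical_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of energy components: each term is multiplied by its coefficient."""
    return sum(weights[name] * value for name, value in terms.items())


def consensus_rank(scores_by_function: Dict[str, Dict[str, float]]) -> List[str]:
    """Order poses by the sum of their ranks across scoring functions (lower score = better)."""
    total_rank: Dict[str, int] = {}
    for scores in scores_by_function.values():
        ordered = sorted(scores, key=scores.get)  # best pose first for this function
        for rank, pose in enumerate(ordered):
            total_rank[pose] = total_rank.get(pose, 0) + rank
    return sorted(total_rank, key=total_rank.get)


# Hypothetical component values and coefficients.
terms = {"hbond": -2.0, "ionic": -1.0, "hydrophobic": -3.0, "entropy": 1.5}
weights = {"hbond": 1.0, "ionic": 0.5, "hydrophobic": 0.2, "entropy": 1.0}
print(empirical_score(terms, weights))

# Hypothetical scores of three poses under three scoring functions.
scores = {
    "function_a": {"pose1": -5.0, "pose2": -3.0, "pose3": -1.0},
    "function_b": {"pose1": -2.0, "pose2": -4.0, "pose3": -1.0},
    "function_c": {"pose1": -6.0, "pose2": -1.0, "pose3": -2.0},
}
print(consensus_rank(scores))
```

Ranks are combined rather than raw scores because different scoring functions report values on different scales.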

Ligand Based Approach: Ligand-based drug design is used in the absence of 3D information about the receptor and relies on knowledge of molecules that bind to the biological target of interest. 3D quantitative structure-activity relationships (3D QSAR) and pharmacophore modelling are the most important and widely used tools in ligand-based drug design. Pharmacophore models are derived from known molecules to define the structural characteristics necessary for binding to the biological target. Predictive power is one of the major characteristics of a QSAR model and may be defined as the capability of the model to accurately predict the biological activity of compounds that were not used for model development.

Virtual screening methods: Virtual screening (VS) is a computational approach for the discovery of new drugs that has successfully complemented High Throughput Screening (HTS) for hit detection. The objective is to use a computational approach for rapid, cost-effective evaluation of large virtual databases of chemical compounds to find novel leads that can be synthesized and examined experimentally for their biological activity.

Structure-based virtual screening (SBVS) encompasses a sequence of computational phases, including target and database preparation, docking, post-docking analysis and prioritization of compounds for biological testing. SBVS is employed when the 3D structure of the target protein is known. Programs that support SBVS include GLIDE, FlexX and GOLD.

Pharmacophore-based virtual screening (PBVS) uses a pharmacophore modelling approach to screen large databases for molecules with the desired biological effects. To accomplish this, a query (pharmacophore model) is created that encodes the most likely 3D organization of the required interaction pattern. Different options are available for constructing a pharmacophore model (query), depending on the information available for the particular protein target. Examples of programs that perform pharmacophore-based searches include UNITY, MACCS-3D, Catalyst and PHASE.
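A pharmacophore query can be thought of as a small set of typed features with a required 3D arrangement. The sketch below checks whether a molecule contains features of the right types whose pairwise distances agree with the query within a tolerance. It is a deliberately simplified illustration: the feature types, coordinates, tolerance and function name are assumptions, and the programs named above use far more elaborate matching.

```python
import itertools
import math
from typing import List, Tuple

Feature = Tuple[str, Tuple[float, float, float]]  # (feature type, 3D position)


def matches_pharmacophore(query: List[Feature], molecule: List[Feature], tol: float = 1.0) -> bool:
    """True if some assignment of molecule features reproduces the query's types and distances."""
    pairs = list(itertools.combinations(range(len(query)), 2))
    for combo in itertools.permutations(molecule, len(query)):
        if any(q[0] != m[0] for q, m in zip(query, combo)):
            continue  # feature types must agree position by position
        if all(
            abs(math.dist(query[i][1], query[j][1]) - math.dist(combo[i][1], combo[j][1])) <= tol
            for i, j in pairs
        ):
            return True
    return False


# Hypothetical three-point query and two hypothetical database molecules.
query = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]
hit = [
    ("hydrophobic", (1.0, 1.0, 1.0)),
    ("donor", (10.0, 10.0, 10.0)),
    ("acceptor", (13.0, 10.0, 10.0)),
    ("aromatic", (10.0, 14.0, 10.0)),
]
miss = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (8.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]

print(matches_pharmacophore(query, hit), matches_pharmacophore(query, miss))
```

Comparing distances rather than coordinates makes the check independent of how each molecule happens to be positioned and oriented in space.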

Software Available for CADD

Table 1: Commonly Used Software for CADD

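| Category | SL.No | Software Name | Major Use |
| --- | --- | --- | --- |
| Pharmacokinetic parameters | 1 | DDDPlus | Dissolution and disintegration study |
| Pharmacokinetic parameters | 2 | MapCheck | Compare dose or fluence measurement |
| Pharmacokinetic parameters | 3 | GastroPlus | In vitro and in vivo correlation for various formulations |
| Ligand interactions and molecular dynamics | 4 | AutoDock | Evaluate the ligand-protein interaction |
| Ligand interactions and molecular dynamics | 5 | Schrodinger | Ligand-receptor docking |
| Ligand interactions and molecular dynamics | 6 | GOLD | Protein-ligand docking |
| Ligand interactions and molecular dynamics | 7 | BioSuite | Genome analysis and sequence analysis |
| Molecular modeling and structure-activity relationship | 8 | Maestro | Molecular modeling analysis |
| Molecular modeling and structure-activity relationship | 9 | ArgusLab | Molecular docking calculations and molecular modeling package |
| Molecular modeling and structure-activity relationship | 10 | GRAMM | Protein-protein docking and protein-ligand docking |
| Molecular modeling and structure-activity relationship | 11 | SYBYL-X Suite | Molecular modeling and ligand-based design |
| Molecular modeling and structure-activity relationship | 12 | Sanjeevini | Predict protein-ligand binding affinity |
| Molecular modeling and structure-activity relationship | 13 | PASS | Creation and analysis of SAR models |
| Image analysis and visualizers | 14 | AMIDE (A Medical Image Data Examiner) | Medical image analysis in molecular imaging |
| Image analysis and visualizers | 15 | ® Visualizer | Viewing and analyzing protein data |
| Image analysis and visualizers | 16 | Imaging Software Scge-Pro | Cytogenetic and DNA damage analysis |
| Image analysis and visualizers | 17 | Xenogen Living Image Software | In vivo imaging display and analysis |
| Data analysis | 18 | GeneSpring | Identify variation across a set of samples and for correction method in samples |
| Data analysis | 19 | QSARPro | Protein-protein interaction study |
| Data analysis | 20 | REST 2009 Software | Analysis of gene expression data |
| Behavioural study | 21 | Ethowatcher | Behavior analysis |
| Behavioural study | 22 | MARS (Multimodal Animal Rotation System) | Animal activity tracking, enzyme activity, nanoparticle tracking and delivery study |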

Paid and Open Source Software

Free and Open Source Software in Drug Designing

Open source refers to any program that has its source code available for use or modification as users or other developers see fit. It is usually developed collaboratively. Free software is often free only for academic institutions, while commercial enterprises are required to pay a fee.

Advantages of Open Source Software

Scientists usually have limited funding to invest in their projects, and commercial software often comes with an expensive licence fee that must be renewed every year.

• Using open source or free software helps the scientist because rapid implementation is possible, i.e. a program can be downloaded directly and immediately from the internet
• There are no licence fees, or the cost is lower
• There is flexibility and the option to customize the software for a particular project

Disadvantages of Open Source Software

Though free and open source software is attractive in terms of the absence of licence fees, there are several pitfalls that users need to be aware of.

• Software programs are not always well written or their use well documented, which might present a problem for the average end user
• Especially in the field of chemistry, commercialization has been a driving force, sometimes making it difficult to convince experts in a field to contribute in their ‘spare’ time to these open source projects
• Many of these programs do not come with easy installation; most need to be compiled with a language-specific compiler (C++, Fortran, Java) or run from the command line, as they are developed by graduates or students majoring in computational drug design
• Most such software is not user friendly, which can result in a bench scientist spending more time trying to install or manage the program than using it

Table 2: Examples of free and/or open-source software packages for computational and molecular modelling

| Application | Program name | Website |
| --- | --- | --- |
| Visualization | Rasmol | http://www.openrasmol.org/ |
| Visualization | MolVis | http://molvis.sdsc.edu/visres |
| Visualization | PyMol | http://pymol.sourceforge.net/ |
| Visualization | DeepView | http://us.expasy.org/spdbv/ |
| Visualization | Jmol | http://jmol.sourceforge.net/ |
| Visualization | gOpenMol | http://www.csc.fi/gopenmol/ |
| Visualization | AstexViewer | http://www.astex-therapeutics.com/ |
| Docking | ArgusDock | http://www.arguslab.com/ |
| Docking | Dock | http://dock.compbio.ucsf.edu/ |
| Docking | FRED | http://www.eyesopen.com/ |
| Docking | eHITS | http://www.simbiosys.ca/ |
| Docking | AutoDock | http://www.scripps.edu/ |
| Docking | FTDock | http://www.bmm.icnet.uk/docking/ftdock.html |
| Energy minimization | GAMESS | http://www.bmm.icnet.uk/docking/ftdock.html |
| Energy minimization | Ghemical | http://www.uku.fi/~thassine/ghemical/ |
| Energy minimization | PSI3 | http://www.psicode.org/ |
| Energy minimization | TINKER | http://dasher.wustl.edu/tinker |
| QSAR descriptors | SoMFA | http://bellatrix.pcl.ox.ac.uk/ |
| QSAR descriptors | GRID | http://www.moldiscovery.com/ |
| QSAR descriptors | E-Dragon 1.0 | http://146.107.217.178/lab/edragon/ |
| QSAR descriptors | ALOGPS 2.1 | http://146.107.217.178/lab/alogps/ |
| QSAR descriptors | Marvin Beans | http://www.chemaxon.com/ |
| Chemical drawing | ACD/Labs ChemSketch | http://www.acdlabs.com/ |
| Chemical drawing | ISISDraw | http://www.mdli.com/ |
| Chemical drawing | XDrawChem | http://xdrawchem.sourceforge.net/ |
| Chemical drawing | JME Editor | http://www.molinspiration.com/jme/ |
| Software libraries | Chemistry Development Kit | http://almost.cubic.uni-koeln.de/cdk/ |
| Software libraries | Molecular Modeling Toolkit | http://starship.python.net/crew/hinsen/MMTK/ |
| Software libraries | PerlMol | http://www.perlmol.org/ |
| Software libraries | JOELib | http://www-ra.informatik.uni-tuebingen.de/software// |
| Software libraries | OpenBabel | http://openbabel.sourceforge.net/ |

Table 3: Examples of Paid Software Packages

| Application | Program name | Website |
| --- | --- | --- |
| Molecular docking | GOLD | https://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold/ |
| Molecular docking | GLIDE | https://www.schrodinger.com/glide |
| Molecular docking | FlexX | https://www.schrodinger.com/glide |
| Molecular docking | ICM | http://www.molsoft.com/docking.html |
| Molecular docking | Surflex-Dock | https://sites.google.com/view/biocomp-uenf/mdr-surflexdock |
| Pharmacophore modelling | PHASE | |
| ADMET prediction | QIKPROP | https://www.schrodinger.com/qikprop |
| QSAR | SYBYL | https://sybyl.com/ |

Current Challenges in the Field of CADD

Though CADD has resulted in tremendous progress in the field of drug discovery, there are still formidable challenges that need to be overcome which limit the effective applications of current computational methods.

• One of the major challenges is that it is not possible to copy and simulate the complete biological system on a computer
• Current molecular docking scoring functions rank compound collections with inherently poor prediction accuracy in drug discovery for novel targets whose function has only recently been unravelled
• Traditional docking algorithms fail to take full account of complicating factors such as protein flexibility, solvation, entropy and the dynamic inclusion of water molecules
• Even though epigenetic enzymes have been actively pursued as potential drug targets, there is still a conspicuous lack of potent chemical probes for a large number of difficult targets such as HATs and epigenetic protein-protein interactions, which need to be explored further
• The bioactivities of identified inhibitors vary considerably because different labs use different assay platforms
• Ligand flexibility, permutations and combinations of stereoisomers, and possible protonation states pose additional challenges to the docking problem. The flexibility of the system is a major challenge in the search for the correct pose
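The last challenge above can be illustrated with a simple enumeration. The sketch assumes that every stereocentre can be R or S and that every titratable site can be protonated or not, which gives an upper bound on the number of ligand forms to dock; symmetry (for example meso compounds) and chemically unreasonable states would reduce it. The function name is hypothetical.

```python
import itertools
from typing import List, Tuple

State = Tuple[Tuple[str, ...], Tuple[bool, ...]]  # (stereo labels, protonation flags)


def enumerate_ligand_states(stereocentres: int, titratable_sites: int) -> List[State]:
    """List every combination of stereo configuration and protonation pattern."""
    protonations = list(itertools.product((True, False), repeat=titratable_sites))
    return [
        (stereo, protonation)
        for stereo in itertools.product("RS", repeat=stereocentres)
        for protonation in protonations
    ]


states = enumerate_ligand_states(stereocentres=3, titratable_sites=2)
print(len(states), "forms of one ligand, before conformers are considered")
```

Each of these forms then has its own set of conformations, so the search space for a single compound multiplies quickly.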

Success Story of CADD

Table 4: List of Drugs Developed via CADD Approach

| Drug | Trade name | Pharmaceutical company | Year of approval / status | Therapeutic action |
| --- | --- | --- | --- | --- |
| Captopril | Capoten® | Bristol Myers-Squibb | 1981 | Antihypertensive |
| Saquinavir | Invirase® | Hoffmann-La Roche | 1995 | HIV inhibitor |
| Dorzolamide | Trusopt® | Merck | 1995 | Carbonic anhydrase inhibitor |
| Indinavir | Crixivan | Merck | 1996 | HIV inhibitor |
| Ritonavir | Norvir | AbbVie | 1996 | HIV inhibitor |
| Tirofiban | Aggrastat | Merck | 1998 | Fibrinogen antagonist |
| Zanamivir | Relenza® | Gilead Sciences | 1999 | Neuraminidase inhibitor |
| Oseltamivir | Tamiflu | Roche | 1999 | Active against influenza A and B viruses |
| Raltegravir | Isentress | Merck | 2007 | HIV inhibitor |
| Aliskiren | Tekturna® | Novartis | 2007 | Human renin inhibitor |
| TMI-005 | | | Phase II clinical trials | Rheumatoid arthritis |
| LY-517717 | | Lilly/Protherics | Phase II clinical trials | Serine protease inhibitor |
| Boceprevir | | Schering-Plough | Phase III clinical trials | HCV inhibitor |
| Nolatrexed | Thymitaq® | Agouron | Phase III clinical trials | Liver cancer |
| NVP-AUY922 | | Novartis | Phase I clinical trials | Inhibitor for HSP90 |
| Rupintrivir (AG7088) | | Agouron | Development stopped after phase II/III trials | Anti-viral agent |

Recent Advances in CADD Approach

Small molecule databases: A variety of repositories of biologically interesting small molecules and their physicochemical properties have been compiled into databases. These databases comprise chemical compounds, drugs, carbohydrates, enzymes, reactants, natural products and natural-product-derived compounds. They are the backbone of computer-aided drug discovery and provide information that can be used to build knowledge-based models for discovering and designing drug molecules.

Table 5: List of Small Molecule Databases

| Name | Number of compounds available | URL |
| --- | --- | --- |
| PubChem | 40 million small molecules and 19 million unique structures | http://pubchem.ncbi.nlm.nih.gov/ |
| ACD | >571000 purchasable compounds; Screening Compounds Directory stores over 4.5 million unique structures | http://www.mdli.com/ |
| ZINC | 21 million compounds available for virtual screening, annotated with biologically relevant properties | http://zinc.docking.org/ |
| LIGAND | 15395 chemical compounds, 8031 drugs, 10966 carbohydrates, 5043 enzymes, 7826 chemical reactions and 11113 reactants | http://www.genome.jp/ligand/ |
| DrugBank | 6712 drug entries, including 1448 FDA-approved small molecule drugs, 131 FDA-approved biotech (protein/peptide) drugs, 85 nutraceuticals and 5080 experimental drugs | http://www.drugbank.ca/ |
| ChemDB | Nearly 5 million commercially available compounds | http://cdb.ics.uci.edu/ |

Biological Databases: Sequencing of the human genome and the genomes of other model organisms has produced increasingly large amounts of data relevant to the study of human disease. The international collaboration of GenBank, the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL) serves as a worldwide repository for nucleotide sequences of diverse origins.

Table 6: List of Biological Databases

| Type | Name | URL |
| --- | --- | --- |
| DNA sequences | GenBank | http://www.ncbi.nlm.nih.gov/Genbank/ |
| DNA sequences | DDBJ | http://www.ddbj.nig.ac.jp/ |
| DNA sequences | EMBL | http://www.embl-heidelberg.de/ |
| Protein sequences | Swiss-Prot | http://www.expasy.ch/sprot/ |
| Protein sequences | PIR | http://pir.georgetown.edu/ |
| Protein structures | PDB | http://www.rcsb.org/pdb |
| Gene expression | ArrayExpress | http://www.ebi.ac.uk/microarray-as/ae/ |
| Gene expression | GEO | http://www.ncbi.nlm.nih.gov/geo/ |
| Gene expression | CIBEX | http://cibex.nig.ac.jp/index.jsp |
| 2D gel electrophoresis | SWISS-2DPAGE | http://www.expasy.ch/ch2d/ |
| 2D gel electrophoresis | GELBANK | http://gelbank.anl.gov/ |
| Mass spectrometry | OPD | http://bioinformatics.icmb.utexas.edu/OPD/ |
| Mass spectrometry | GPMDB | http://www.thegpm.org/GPMDB/index.html |
| Metabolomics | HMDB | http://www.metabolomics.ca/ |
| Metabolomics | MDL Metabolite database | http://www.mdl.com/products/predictive/metabolite/index.jsp |
| Metabolomics | METLIN | http://metlin.scripps.edu/ |
| Protein-protein interactions | BIND | http://www.bind.ca/ |
| Protein-protein interactions | HPRD | http://www.hprd.org/ |
| Protein-protein interactions | IntAct | http://www.ebi.ac.uk/intact/ |
| Transcriptional regulation | TRANSFAC | http://www.biobase-international.com/pages/index.php?id |
| Transcriptional regulation | TRED | rulai.cshl.edu/TRED/ |
| Post translational modification | dbPTM | dbptm.mbc.nctu.edu.tw/ |
| Post translational modification | RESID | http://www.ebi.ac.uk/RESID/ |
| Biological pathways | KEGG | http://www.genome.jp/kegg/ |
| Biological pathways | BioCarta | http://www.biocarta.com/ |

Virtual Combinatorial Libraries: Combinatorial chemistry comprises chemical synthetic methods that make it possible to prepare a large number (tens to thousands or even millions) of compounds in a single process. These compound libraries can be made as mixtures, as sets of individual compounds or as chemical structures generated by computer software. Combinatorial chemistry can be used for the synthesis of small molecules and of peptides.

Future of CADD

A clear understanding and advanced knowledge of CADD methods will improve research quality and facilitate the identification of new chemical entities, leading to the development of useful drugs. Extensive use of computational approaches with higher accuracy could reduce the overall cost and failure rate of drug designing.

VRB Analytics provides HEOR solutions, market access solutions, RWE solutions, data analytics solutions, clinical trial solutions and medical writing solutions.
