Organic & Biomolecular Chemistry

Organic & Biomolecular Chemistry Accepted Manuscript This is an Accepted Manuscript, which has been through the Royal Society of Chemistry peer review process and has been accepted for publication. Accepted Manuscripts are published online shortly after acceptance, before technical editing, formatting and proof reading. Using this free service, authors can make their results available to the community, in citable form, before we publish the edited article. We will replace this Accepted Manuscript with the edited and formatted Advance Article as soon as it is available. You can find more information about Accepted Manuscripts in the Information for Authors. Please note that technical editing may introduce minor changes to the text and/or graphics, which may alter content. The journal’s standard Terms & Conditions and the Ethical guidelines still apply. In no event shall the Royal Society of Chemistry be held responsible for any errors or omissions in this Accepted Manuscript or any consequences arising from the use of any information it contains. www.rsc.org/obc Page 1 of 6 CREATED USING THE RSC ARTICLEOrganic TEMPLATE & Biomolecular (VER. 3.0 OOo) - SEE WWW.RSC.ORG/ELECTRONICFILESChemistry FOR DETAILS www.rsc.org/xxxxxx | XXXXXXXX Expanding DP4: Application to drug compounds and automation Kristaps Ermanis,a Kevin E. B. Parkes,b Tatiana Agbackb and Jonathan M. Goodman*c Received (in XXX, XXX) Xth XXXXXXXXX 200X, Accepted Xth XXXXXXXXX 200X First published on the web Xth XXXXXXXXX 200X DOI: 10.1039/b00000000x The DP4 parameter, which provides a confidence level for NMR assignment, has been widely used to help assign the structures of many stereochemically-rich molecules. We present an improved version of the procedure, which can be downloaded as Python script instead of running within a web-browser, and which analyses output from open-source molecular modelling programs (TINKER and NWChem) in addition to being able to use output from commercial Manuscript packages (Schrodinger's Macromodel and Jaguar; Gaussian). The new open-source workflow incorporates a method for the automatic generation of diastereomers using InChI strings and has been tested on a range of new structures. This improved workflow permits the rapid and convenient computational elucidation of structure and relative stereochemistry. Introduction characterized by higher number of heteroatoms and aromatic systems. While DP4 should in principle be applicable to any Computational methods for the prediction of NMR spectra are organic compound, its performance in drug compounds is all Accepted well established and have become an invaluable tool in the but untested. structure elucidation.1 When DFT calculations are used to So far in our investigations we have been using predict the NMR spectra of candidate structures, the next step MacroModel6 for conformational searches and Jaguar7 or is to decide which of the predictions match the experimental Gaussian8 for DFT and GIAO calculations. However, there NMR data best. Many different ways to quantitatively exist open-source alternatives for both conformational search evaluate goodness of fit have been developed, including mean and for DFT calculations. Incorporation and validation of this absolute error, corrected mean absolute error and correlation software in the DP4 process would significantly increase its coefficient. In addition to these, we have developed CP3 and accessibility. DP4 statistical parameters which help choosing the correct Chemistry structure when several sets or just one set of experimental Results and discussion NMR data are available, respectively.2,3 Both of these parameters generally appear to give higher confidence in the Application to drug molecules correct structure than other parameters. DP4 in particular has seen wide use in structure elucidation of many complex DP4 was originally developed with a focus on natural natural products4 and also synthetic compounds.5 products and their fragments.3 We wanted to explore the A typical NMR prediction process starts with a application of the DP4 workflow in a less familiar chemical conformational search on all of the isomers. DFT GIAO space. Drug compounds occupy one such region – in general, calculations are then run on all low-energy conformers, the they contain less stereocentres, but more heteroatoms and predicted shifts are combined using Boltzmann weighing and aromatic systems when compared to natural products. A large finally, the DP4 analysis is employed to decide which portion of drug compounds also contain basic nitrogen atoms structure is the most likely to be correct. DP4 analysis or systems with several possible tautomeric states, which can achieves this by first empirically scaling all the predicted make the prediction of NMR spectra challenging. For this Biomolecular chemical shifts to remove any systematic errors. Errors are study a selection of stereochemically rich drug compounds then calculated for each of the signals. Then, assuming that from the Top100 drugs was chosen (Figure 1).9 The selected & each error follows a normal or t distribution with known compounds all had 2-64 plausible diastereomers and had a parameters, the probability of encountering each error in a well defined tautomeric state. Nucleoside analogs in particular correct structure is calculated. Next, all of the probabilities for were not included in this study because of the potential signals in a candidate structure are multiplied to give the tautomeric and protonated states. overall absolute probability, that the candidate structure is the The compounds and their diastereomers were submitted to correct one. Finally, these resulting probabilities are converted MacroModel conformational search, all conformers with to a set of relative probabilities that each candidate structure is energy less than 10 kJ/mol relative to the global minimum Organic the correct one using Bayes' theorem. were then submitted to DFT single point calculation and NMR DP4 was originally developed for the elucidation of the GIAO calculation10 using Gaussian Quantum chemistry relative stereochemistry of natural products. Drug compounds software. Our earlier studies demonstrated that this protocol is another other class of compounds that also exhibit diverse provides an effective balance of precision and speed.3 Finally, stereochemistry. They inhabit a distinct chemical space the predicted NMR shifts were submitted to DP4 analysis. For This journal is © The Royal Society of Chemistry [year] Journal Name, [year], [vol], 00–00 | 1 Organic & Biomolecular Chemistry Page 2 of 6 Manuscript Fig. 1 Stereochemically rich drug molecules selected for DP4 study. Diastereomer count shown. Diastereomers labelled with letters a-bl, a being the Accepted correct diastereomer. Configuration was varied at chiral centres marked with '*'. solvent in the DFT calculations reduced the errors of the chemical shift prediction and improved the identification of the correct diastereomer. Therefore the PCM solvent model11 was used for all NMR GIAO calculations. In most cases DP4 is able to correctly determine the relative stereochemistry of these compounds with good confidence and shows the generality and robustness of the DP4 approach. Even highly flexible structures like rosuvastatin, 2, and formoterol, 1, Chemistry were correctly identified. In the case of ezetimibe, 8, however DP4 favoured the structure epimeric at the remote OH centre as the most likely with 98% confidence, with the correct structure as a remote second most likely candidate. Similarly in the case of darunavir, 6, the most likely structure was Fig. 2 Overall DP4 probabilities assigned to drug compound identified as the one with epimeric OH centre. However, the diastereomers using MacroModel and Gaussian. Only diastereomers with relative stereochemistry of the remaining four stereocentres probabilities greater than 0.1% shown. was identified correctly, despite the flexible nature of the molecule. The case of simvastatin, 5, was particulary mometasone, 4, and testosterone, 3, only the structures challenging as this compound is Quite flexible and has six diastereomeric at the peripheral stereocentres were stereocentres and 64 possible diastereomers. Unfortunately considered. In the case of tiotropium, 7, the epoxide chiral this particular problem appears to be too complex for the Biomolecular centres were varied in concert, as the trans isomers would be current DP4 methodology and the diastereomers identified as too strained to exist. For the rest of the compounds all & likely candidates were completely different from the correct possible diastereomers were considered. The results of this structure. However, the DP4 conclusion that four study are shown in Figure 2. diastereomers have significant probability may suggest the Ideally, the DP4 parameter will give 100 % confidence for challenge of this highly flexible structure to the user and thus the correct diastereoisomer, but all certainty beyond a random would likely not lead to incorrect conclusions. selection is useful, and certainty on some stereocentres is Several of our previous studies have shown that helpful. In the figure, the colours correspond to the utility of optimization of the geometries at the DFT level provides only the assignment. Dark blue is the probability of the completely limited improvement at high computational cost. We decided Organic correct structure. The other colours relate to the utility – six to test if this still holds true on the compounds in the current out of seven

Organic & Biomolecular Chemistry

20 Years of Polarizable Force Field Development

Integrated Tools for Computational Chemical Dynamics

An Efficient Hardware-Software Approach to Network Fault Tolerance with Infiniband

Computer-Assisted Catalyst Development Via Automated Modelling of Conformationally Complex Molecules

Melissa Gajewski & Jonathan Mane

BOSS, Version 4.6 Biochemical and Organic Simulation System User’S Manual for UNIX, Linux, and Windows

Dr. David Danovich

Tinker-HP : Accelerating Molecular Dynamics Simulations of Large Complex Systems with Advanced Point Dipole Polarizable Force Fields Using Gpus and Multi-Gpus Systems

Multiscale Friction Simulation of Dry Polymer Contacts: Reaching Experimental Length Scales by Coupling Molecular Dynamics and Contact Mechanics

Quantum Chemistry (QC) on Gpus Feb

Software Modules and Programming Environment SCAI User Support

Macromodel Quick Start Guide