
NEW FRAGMENTATION METHOD TO ENHANCE STRUCTURE-BASED

IN SILICO MODELING OF CHEMICALLY-INDUCED TOXICITY

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Darshan Mehta, M.S.

Graduate Program in Chemical Engineering

The Ohio State University

2016

Dissertation Committee:

James Rathman, Advisor

Chihae Yang, Co-advisor

Aravind Asthagiri

Bhavik Bakshi

Copyright by

Darshan Mehta

2016

ABSTRACT

Evaluating the potential toxicity of chemical compounds is an important step in the development of new products, ranging from drugs to food ingredients to materials in consumer products. This evaluation is necessary for obtaining approval from regulatory agencies and for minimizing safety concerns. Current methods for assessing toxicity largely rely on experimental techniques that are time-consuming and resource-intensive. The development of computational methods is therefore of great interest, as it helps to reduce and prioritize the experimental tests. Principles of chemical informatics (also called chemoinformatics), where chemical descriptors are used to capture and represent structural information, are widely used to predict the toxicity of chemicals based on their molecular structure. These descriptors abstract the structural information of a molecule into a mathematical quantity, which can then be related to a particular question, such as the similarity between two molecules, or used as factors in a computational model. In spite of developments in identifying newer descriptors, there is still a need for more efficient methods that can reduce the high-dimensional descriptor space, give meaningful descriptors, and yield better toxicity prediction results.

In this research, we developed novel chemical descriptors that were able to overcome some of the above-mentioned limitations. These novel descriptors are linear subgraphs of chemical structures that are annotated with atom-based features such as atom identity and partial charge.

These features provide flexibility in defining chemical fragments, and thus allow one to explore different levels of structural detail from the same chemical fragment. Even though the fragments are linear in composition, they also capture branched structural information through the annotating features. Our particular interest in linear fragments was due to the potential of building Markov chain models for prediction purposes. Markov models have proved useful in bioinformatics for the sequence analysis of DNA, RNA, and other nucleotide sequences. We developed a similar model tailored to analyzing linear chemical fragments that helped to quantify the relationship between these fragments and chemically-induced toxicity.

We evaluated the performance of annotated linear fragments using datasets on two toxicity endpoints, namely skin sensitization and Ames mutagenicity. The first part of this evaluation was to explore the fragment space to see whether descriptors related to these two toxicity endpoints could be identified. We were able to identify 15 unique descriptors corresponding to structural features well known to cause skin-sensitizing effects and 12 descriptors known to cause mutagenic effects. The second part of the evaluation was to explore the performance of Markov chain models for predicting the mutagenicity of chemicals. We developed several models using different annotation schemes and fragment lengths and explored their predictive performances. These models performed significantly better than, or comparably to, other non-parametric approaches reported in the literature. We also explored the performance of kNN models using annotated linear fragments to substantiate the effectiveness of our novel descriptors.


DEDICATION

To the presiding deities at Columbus Krishna House:

Sri Sri Gaura Nitai, Sri Sri Radha Natabara, and Sri Sri Jagannath Baladev Subhadra

To my spiritual master, His Holiness Radhanath Swami

And to all innocent creatures used for experimental testing of chemicals


ACKNOWLEDGMENTS

I would sincerely like to acknowledge the guidance and support that I received from my advisors, Drs. James Rathman and Chihae Yang. Without their expert leadership, clear vision, and constructive feedback, it would not have been possible for me to complete this monumental task. I would also like to acknowledge the help that I received from my lab members through their regular feedback and advice. In particular, I would like to thank Aleksandra Mostrag-Szlichtyng, Dimitar Hristozov, and Bryan Hobocienski. I am especially grateful to Aleksandra for compiling the skin sensitization dataset and to Dimitar for helping me set up the RDKit package in Python. I also sincerely thank the professors in the Department of Statistics for teaching me the fundamentals of statistical data analysis. This played a significant role during the course of my research and eventually led me to pursue a Master's degree in Applied Statistics.

Finally, I would like to acknowledge the emotional and spiritual support that I received from my parents, brother, wife, and all the wonderful devotees at Columbus Krishna House. My stay in Columbus wouldn't have been the same without them.


VITA

2008……………………………………B. Chem. Engg. (Bachelor of Chemical Engineering), Institute of Chemical Technology.

2008 to 2010……………………………Manager, Central Technical Services Dept., Reliance Industries Limited, Hazira, India.

2010 to present…………………………Graduate Research/Teaching Associate, Department of Chemical and Biomolecular Engineering, The Ohio State University.

2013……………………………………M.A.S. (Master of Applied Statistics), The Ohio State University.

2014……………………………………M.S. Chemical Engineering, The Ohio State University.

FIELDS OF STUDY

Major Field: Chemical Engineering


TABLE OF CONTENTS

Abstract ...... ii

Dedication ...... iv

Acknowledgments...... v

Vita ...... vi

List of Tables ...... xi

List of Figures ...... xvi

CHAPTER 1: INTRODUCTION ...... 1

1.1 QSAR Modeling ...... 7

CHAPTER 2: BACKGROUND ...... 11

2.1 Representing Chemical Structures ...... 11

2.2 Structural Descriptors ...... 15


2.3 Application of QSAR Methods ...... 19

2.4 Research Problem ...... 23

2.5 Proposed Solution ...... 24

CHAPTER 3: DATASETS AND COMPUTATIONAL TOOLS ...... 27

3.1 Training Datasets ...... 27

3.2 Computational Tools ...... 32

CHAPTER 4: METHODS ...... 35

4.1 Generation of Linear Fragments ...... 35

4.2 Chemical Annotations ...... 38

4.3 Compound-Fragment Data Matrix ...... 41

4.4 Developing Markov Chain Models ...... 44

4.5 Evaluating Model Performance ...... 53

CHAPTER 5: RESULTS ON IDENTIFICATION OF STRUCTURAL ALERTS ...... 55

5.1 Distinguishing Structurally Similar Compounds ...... 55


5.2 Identifying Structural Alerts for Skin Sensitization ...... 59

5.3 Comparison of Different Annotation Schemes ...... 71

5.4 Identifying Structural Alerts for Ames Mutagenicity ...... 73

CHAPTER 6: RESULTS ON CLASSIFICATION OF COMPOUNDS USING MARKOV CHAIN MODELS ...... 84

6.1 Markov Chain Models Based on One-step Connections ...... 84

6.2 Markov Chain Models Using Different Annotation Schemes ...... 94

6.3 Markov Chain Models Using Fragments of Longer Lengths ...... 98

6.4 Markov Chain Models with Improved Specificity ...... 104

6.5 Additional Results Using Annotated Linear Fragments Coupled with kNN Models ...... 109

CHAPTER 7: CONCLUDING REMARKS ...... 125

7.1 Summary ...... 125

7.2 Future Work ...... 127

7.3 Conclusion ...... 129


REFERENCES ...... 131

Appendix A. Computer Programs: Algorithms and Python Scripts ...... 139

Appendix B. Statistical Results for Identification of Structural Alerts ...... 147

Appendix C. Results for 5-Fold Cross-Validation of Benchmark Dataset for Ames Mutagenicity ...... 164


LIST OF TABLES

Table 1: Compound counts for the 5-fold cross-validation (CV) scheme ...... 30

Table 2: Iteration steps of depth-first search algorithm for atom #1 of m-ethyl phenol ... 37

Table 3: Atom symbols using different annotation schemes for m-ethyl phenol ...... 39

Table 4: Example of count matrix ...... 45

Table 5: Example of transition probability matrix (generated using count matrix data in Table 4) ...... 45

Table 6: Example of connection vector with counts and probabilities (generated using count matrix data in Table 4) ...... 47

Table 7: Connection count vector and corresponding probabilities for training set with binary toxicity outcome (generated using data from Table 6) ...... 48

Table 8: General 2x2 confusion matrix ...... 54

Table 9: Properties of two isomers, β-Terpinene and β-Phellandrene ...... 56

Table 10: Tanimoto distances between the two isomers ...... 58

Table 11: Contingency table for fragment ['10-', '30+', '10-'] ...... 64

Table 12: Summary statistics for skin sensitization dataset...... 66

Table 13: Significant fragments in skin sensitization dataset along with the annotation schemes that were able to identify them ...... 67

Table 14: General 2x2 contingency table ...... 76

Table 15: Summary statistics for Ames mutagenicity dataset ...... 79

Table 16: Significant fragments identified in Ames mutagenicity dataset ...... 80

Table 17: Important toxicophores for Ames mutagenicity and corresponding fragments 83

Table 18: Unique symbols identified using training set for CV 1 ...... 85

Table 19: One-step connections with counts and probabilities using training set for CV 1 (partial output) ...... 86

Table 20: Calculation of overall log-likelihood for 3-nitro-o-xylene molecule ...... 90

Table 21: Confusion matrix for test set of CV 1 using {AI, nC, nH} annotation scheme 91

Table 22: Performance parameters for 5-fold cross-validation of Hansen dataset using {AI, nC, nH} annotation scheme and one-step connection probabilities ...... 93

Table 23: Comparison of performance with other non-parametric methods (averaged over 5-fold cross-validation splits) ...... 93

Table 24: Performance parameters for Hansen dataset using Markov chain models with different annotation schemes ...... 95

Table 25: 5-bin classification of partial charge (PC) annotation feature ...... 96

Table 26: Performance parameters for Hansen dataset using different annotation schemes (with PC binned into 5 categories and ring atom (RA) annotation added) ...... 97

Table 27: Calculation of overall log-likelihood for 3-nitro-o-xylene molecule using fragments of length 3 ...... 100


Table 28: Performance parameters for Hansen dataset using Markov chain models with fragments of longer lengths ...... 101

Table 29: 5-level categorization of the ring atom (RA) annotation feature ...... 107

Table 30: Five nearest neighbors identified for the 3-nitro-o-xylene molecule from the training set for CV 1 ...... 112

Table 31: Performance parameters for 5-fold cross-validation of Hansen dataset using kNN modeling method with 5 nearest neighbors and fragments of length 3 generated using {AI, nC, nH} annotation scheme ...... 113

Table 32: Averaged performance parameters for Hansen dataset using kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 114

Table 33: Tanimoto distance and mutagenic activity of 5 nearest neighbors identified in the training set for some hypothetical test compound ...... 118

Table 34: Averaged performance parameters for Hansen dataset using weighted kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 119

Table 35: Averaged performance parameters for Hansen dataset using weighted kNN modeling method with 7 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 122


Table 36: Significant and positively correlated fragments for skin sensitization dataset using {AI} annotation scheme with χ2 and γ statistic values ...... 147

Table 37: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC} annotation scheme with χ2 and γ statistic values ...... 148

Table 38: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC, nH} annotation scheme with χ2 and γ statistic values ...... 150

Table 39: Significant and positively correlated fragments for skin sensitization dataset using {nC, nH, PC} annotation scheme with χ2 and γ statistic values ...... 154

Table 40: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC, nH, PC} annotation scheme with χ2 and γ statistic values ...... 159

Table 41: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using preliminary Markov chain models based on one-step connection probabilities (14 sub-tables for 14 different annotation schemes) ...... 164

Table 42: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using Markov models based on longer fragment lengths (9 sub-tables with 3 different annotation schemes and 3 different fragment lengths) ...... 169

Table 43: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using preliminary Markov chain models based on one-step connection probabilities with different annotation schemes and classification criteria (4 sub-tables) ...... 172

Table 44: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using kNN modeling methods with 5 nearest neighbors (9 sub-tables with 3 different annotation schemes and 3 different fragment lengths) ...... 174

Table 45: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using weighted kNN models with 5 nearest neighbors (9 sub-tables with 3 different annotation schemes and 3 different fragment lengths) ...... 177

Table 46: Performance parameters for 5-fold cross-validation of benchmark dataset for Ames mutagenicity using weighted kNN models with 7 nearest neighbors (9 sub-tables with 3 different annotation schemes and 3 different fragment lengths) ...... 181


LIST OF FIGURES

Figure 1: In silico approach ...... 2

Figure 2: Drug discovery pipeline ...... 3

Figure 3: Antidiabetic drug molecules ...... 6

Figure 4: Role of structural descriptors ...... 8

Figure 5: Flow diagram of QSAR-based in silico approach ...... 10

Figure 6: Representation of chemical structure ...... 12

Figure 7: m-ethyl phenol molecule ...... 12

Figure 8: SD file for m-ethyl phenol molecule ...... 14

Figure 9: Number of compounds in different categories in skin sensitization dataset ..... 28

Figure 10: Distribution of molecular weights for skin sensitization dataset ...... 28

Figure 11: Number of compounds in different categories in the benchmark dataset for Ames mutagenicity ...... 29

Figure 12: Distribution of molecular weights for compounds in the benchmark dataset for Ames mutagenicity ...... 30

Figure 13: Compound counts for training sets in the 5-fold CV scheme ...... 31

Figure 14: Compound counts for test sets in the 5-fold CV scheme ...... 31

Figure 15: Typical example using MarvinSketch software ...... 32

Figure 16: Typical example using MarvinView software ...... 33

Figure 17: Linear fragments using graph theory and depth-first search algorithm ...... 35

Figure 18: Unique fragments generated using different annotation schemes (partial output) ...... 40

Figure 19: Typical compound-fragment data matrix ...... 41

Figure 20: Sample training set ...... 42

Figure 21: Compound-fragment data matrix (partial) using {AI} annotation scheme ..... 42

Figure 22: Compound-fragment data matrix (partial) using {AI, nC, nH} annotation scheme...... 42

Figure 23: Excel interface for generating compound-fragment data matrix using annotated linear chemical fragments ...... 43

Figure 24: One-step connection probabilities (based on data in Table 7) ...... 49

Figure 25: One-step connection probability ratios (based on data in Table 7) ...... 49

Figure 26: Overview of classification strategy ...... 52

Figure 27: Two isomers: β-Terpinene and β-Phellandrene ...... 55

Figure 28: Activity cliff phenomenon for chemical properties ...... 57

Figure 29: Possible explanation of activity cliff phenomenon using annotated linear chemical fragments ...... 58

Figure 30: Total fragments generated from skin sensitization dataset with fragment lengths between 2 and 15 using 5 different annotation schemes ...... 59


Figure 31: Fragment distribution for skin sensitization dataset using 5 different annotation schemes ...... 60

Figure 32: Total fragments generated from skin sensitization after removing singletons/doubletons ...... 61

Figure 33: Fragment distribution for skin sensitization dataset after removing singletons/doubletons ...... 62

Figure 34: Histogram of fragments using {nC, nH, PC} annotation scheme (before removing singletons/doubletons) ...... 63

Figure 35: Histogram of fragments using {nC, nH, PC} annotation scheme (after removing singletons/doubletons) ...... 63

Figure 36: Total fragments generated from Ames mutagenicity dataset with path lengths between 3 and 7 using 5 different annotation schemes ...... 73

Figure 37: Fragment distribution for Ames mutagenicity dataset before removing singleton/doubleton fragments ...... 74

Figure 38: Fragment distribution for Ames mutagenicity dataset after removing singleton/doubleton fragments ...... 74

Figure 39: Histogram of fragments using {AI, nC, nH} annotation scheme (before removing singleton/doubleton fragments) ...... 75

Figure 40: Histogram of fragments using {AI, nC, nH} annotation scheme (after removing singleton/doubleton fragments) ...... 75

Figure 41: One-step connection probabilities obtained using the training set for CV 1 (partial output) ...... 87

Figure 42: One-step connection probability ratios obtained using the training set for CV 1 (partial output) ...... 87

Figure 43: 3-nitro-o-xylene molecule from test set for CV 1 ...... 89

Figure 44: 2D molecular graph of 3-nitro-o-xylene using {AI, nC, nH} annotation scheme...... 89

Figure 45: ROC plot comparing performance of Markov chain models with different annotation schemes and other non-parametric approaches reported in literature ...... 95

Figure 46: ROC plot comparing performance of Markov chain models with different annotation schemes (PC binned into 5 categories and ring atom (RA) annotation added) ...... 97

Figure 47: Graphical comparison of performance parameters for Hansen dataset obtained using Markov chain models with fragments of longer lengths ...... 102

Figure 48: Distribution of log-likelihood values for compounds predicted as false positives using a preliminary Markov chain model on the test set for CV 1 and using {nC, nH, PC, RA} annotation scheme ...... 105

Figure 49: Distribution of log-likelihood values for compounds predicted as true positives using a preliminary Markov chain model on the test set for CV 1 and using {nC, nH, PC, RA} annotation scheme ...... 105

Figure 50: ROC plot comparing performance of the best Markov chain model developed using {nC, nH, PC, RA} annotation scheme with a previous Markov chain model developed using {AI, nC, nH} annotation scheme and other non-parametric approaches reported in the literature ...... 108

Figure 51: Distribution of Tanimoto distances of 3-nitro-o-xylene from all compounds in the training set for CV 1...... 111

Figure 52: Graphical comparison of performance parameters for Hansen dataset obtained using kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 115

Figure 53: Graphical comparison of performance parameters for Hansen dataset obtained using weighted kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 120

Figure 54: Graphical comparison of performance parameters for Hansen dataset obtained using weighted kNN modeling method with 7 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes ...... 123

Figure 55: ROC plot comparing the performance of best kNN model and best Markov chain model developed using annotated linear fragments with other non-parametric approaches reported in the literature ...... 124


CHAPTER 1: INTRODUCTION

Chemical informatics (also commonly referred to as chemoinformatics) is a branch of molecular informatics that applies computational and informational techniques to solve chemical problems. It emerged as a separate discipline in the late 1960s and early 1970s in order to track the increasing amount of information available about chemical compounds and their properties.

Presently, there are more than 100 million compounds known in the CAS (Chemical Abstracts Service) registry1,2 and the amount of information is constantly increasing. Every year, more than 1 million new compounds are synthesized and more than 700,000 publications contribute in some way to chemical information3. Thus, chemoinformatics fills the need for a specialized discipline at the interface of chemistry and information technology that helps to leverage the vast amount of chemical information to inform and guide future work.

Chemoinformatics methods have traditionally been used for performing substructure search operations and for retrieving specific compounds from large databases. They have also been frequently used for the analysis of relational databases and for data mining. The field has grown ever since and has found applications in areas such as predictive modeling, drug discovery, toxicity and risk assessment, and structure-based formulations. Of particular interest in this research is the application of chemoinformatics methods for evaluating the potential toxicity of chemical compounds. This is important in the development of all new products of regulatory interest, ranging from drugs to food ingredients to materials in consumer products.


It is important to note the fundamental difference between informatics modeling and molecular modeling approaches. Molecular modeling is based on an atomistic-level description of molecular systems and uses first principles such as Newtonian mechanics and quantum mechanics to model the physical properties of molecules. Using such methods, one can accurately predict properties like the dipole moment and polarizability of any given molecule. However, these first principles generally fail to predict a molecule's distribution in the environment (partitioning) or its toxic properties. For example, given an ensemble of drug molecules, it is not possible to predict from first principles their activity against a particular target receptor. Similarly, given the sequence of nucleotides in a gene, it is not possible to predict the biological function of the protein coded for by the gene. In such situations, informatics methods are very useful.

The goal of chemoinformatics methods is to transform chemical data into information, and then information into knowledge, to help make better and faster decisions as efficiently as possible4,5. They involve the development of computational models (also commonly referred to as "in silico" models) that use the results obtained from in vitro and/or in vivo experiments to intelligently predict the properties of compounds for which experimental data is not available.

Figure 1 shows a schematic representation of this approach.

Figure 1: In silico approach


As seen in Figure 1, chemical data is used in the first step in conjunction with experimental results to develop a predictive model. This model is then used to predict properties of virtual compounds or to generate virtually safe compounds. In the third step, the resultant predictions are used to synthesize these compounds and new experiments are performed to validate the predictions.

Thus, in silico models help to develop causal relationships between chemical structures and their properties based on inductive methods of learning. They are especially helpful for identifying possible problems early in the discovery process and for prioritizing experimental work. For example, consider the drug discovery pipeline shown in Figure 2.

Figure 2: Drug discovery pipeline1

1 Image taken from the book Stem Cell Biology in Normal Life and Diseases: Chapter 6 Stem Cell Predictive Hemotoxicology by Holli Harper and Ivan Rich. DOI: 10.5772/54430.

As seen in Figure 2, the drug discovery process generally begins by collecting a repository of all possible drug molecules (chemical structures) that are expected to be active against the target disease. These are referred to as potential drug candidates. They can be hypothetical chemical structures and need not be existing or already-synthesized chemicals. These drug candidates are then passed through a series of coarse and fine screening steps that screen out the non-active and potentially harmful or toxic drug molecules. The remaining drug candidates are then optimized for lead identification and sent for pre-clinical animal testing.

One of the important steps during the screening stages is to check for the potential toxicity of drug candidates and their associated chemical impurities. Typical toxicity outcomes tested are genetic toxicity (mutagenicity), carcinogenicity, and skin sensitization. Current methods for assessing toxicity largely rely on experimental measurements. However, these methods have several disadvantages. First, it is not possible to conduct a thorough experimental study of every candidate compound, as it is too expensive and time-consuming. Second, a drug molecule has to be synthesized before it can be tested, which largely limits the number of candidate drug molecules. Also, the experimental results for one drug cannot be generalized to other drugs. Lastly, experimental methods are generally based on animal testing, a concern that has gained considerable attention in recent years. Thus, computational methods are favored to reduce and prioritize the experimental tests.

This approach of using chemoinformatics-based computational models for predicting adverse health effects of chemicals is called computational toxicology. The US Environmental Protection Agency (EPA) defines computational toxicology as "the application of mathematical and computer models to predict adverse effects and to better understand the single or multiple mechanisms through which a given chemical induces harm." Computational toxicology is a vibrant and rapidly developing discipline6.


There are several advantages of using computational methods for toxicity prediction. With the availability of large chemical datasets in recent years, it has been possible to develop computational models with a high degree of accuracy and robustness. These models are generally very fast and reliable in their predictive ability, and their predictions can easily be generalized over a large range of compounds. Thus, computational methods significantly reduce the time and money required for screening candidate compounds. They also help to reduce the need for animal testing of chemicals. It should be noted, however, that the goal is not to replace experimental methods, but to complement them so that the overall process becomes more efficient.

Computational methods can also help to identify potentially toxic drug candidates that may not be detected during the experimental screening and pre-clinical trial phases. Such drugs cause harm through mechanisms generally known as idiosyncratic drug reactions. In these cases, problems arise only after the drug has been approved for use and thousands of patients have been treated.

One famous example is the antidiabetic drug Troglitazone, used for the treatment of Type 2 diabetes. Troglitazone was the first antidiabetic agent of its class approved for clinical use, in 1997. However, it had to be withdrawn from the market in 2000 because it caused serious idiosyncratic hepatotoxicity (liver toxicity) in patients. It was replaced by two new drug molecules, rosiglitazone and pioglitazone, which have a similar mechanism of action but without the inherent side effect of liver toxicity. Figure 3 shows the chemical structures of the three drug molecules under consideration.

Several mechanisms of action have been proposed using in vitro and in vivo studies to explain the liver toxicity caused by troglitazone7–9. However, it remains a complex phenomenon, and many interacting factors contribute to the observed idiosyncratic toxicity. This strongly justifies the need for chemoinformatics-based computational models, as they can help to predict toxicity outcomes based solely on the chemical structural information of the drug candidates.


Figure 3: Antidiabetic drug molecules


1.1 QSAR Modeling

Chemoinformatics methods help to model complex chemical phenomena such as chemically-induced toxicity through a combined application of chemistry, information science, computational methods, and statistical data analysis. They help to leverage historical experimental data by building models that relate chemical structures and/or physicochemical properties to observed biological activities. These models are called Structure-Activity Relationship (SAR) models. SAR models that quantify these relationships using data mining and statistical tools are called Quantitative Structure-Activity Relationship (QSAR) models. Thus, a QSAR is defined as a statistical model that relates structural or property descriptors of a compound to its chemical or biological activity10.

In this approach, chemical structures are treated as independent predictor variables and descriptors that are suitable for mathematical modeling are extracted from them. Since it is generally not possible to directly correlate the structure of a compound to its activity, it is necessary to generate structure-based descriptors that can be used to calculate or estimate the toxicity outcome. This is shown graphically in Figure 4. Statistical models are then developed that correlate these structural descriptors to the toxicity of chemicals.

Most descriptors differ in the way they capture chemical structural information. A key assumption in this approach is that the descriptors appropriately capture features and properties of a compound responsible for its biological activity. This is where chemistry plays a central role by identifying the relevant structural features. A wise choice of the descriptor space based on chemically relevant features helps to make the model mechanistically interpretable and distinguishes it from the “black box” approach.
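As a concrete illustration of descriptor extraction, the sketch below derives a few crude count-based descriptors from a SMILES string with a deliberately naive tokenizer. This is a hypothetical simplification for illustration only; real work would use a chemoinformatics toolkit such as RDKit, and the descriptor names chosen here are not from any standard set.

```python
import re

# Naive SMILES tokenizer -- illustration only. It recognizes two-letter
# halogens (Cl, Br), common one-letter elements, and lowercase aromatic
# atoms; it ignores bonds, charges, stereochemistry, and bracket atoms.
ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def simple_descriptors(smiles):
    """Return crude count-based descriptors for a SMILES string."""
    atoms = ATOM_RE.findall(smiles)
    counts = {}
    for a in atoms:
        counts[a.capitalize()] = counts.get(a.capitalize(), 0) + 1
    return {
        "atom_counts": counts,
        "heavy_atoms": len(atoms),
        "has_ring": any(ch.isdigit() for ch in smiles),  # ring-closure digits
        "has_halogen": any(a in ("F", "Cl", "Br", "I") for a in atoms),
    }

# 2-chlorophenol as a small worked example
print(simple_descriptors("Clc1ccccc1O"))
# → {'atom_counts': {'Cl': 1, 'C': 6, 'O': 1}, 'heavy_atoms': 8,
#    'has_ring': True, 'has_halogen': True}
```

Descriptors of this kind form the columns of the compound-descriptor data matrix that a statistical model is then fit against.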

7

Figure 4: Role of structural descriptors

A typical example of representing molecular structures in a finite descriptor space is the use of MACCS keys. The MACCS keys are a set of questions about chemical structures that have a binary outcome. For example: are there fewer than 3 occurrences of a given atom in the compound? Is there a ring of size 4? Is there a halogen atom (F, Cl, Br, I) present? Thus, every compound has a corresponding binary value (0 or 1) associated with each question. These binary values are arranged in a linear sequence to form the "bitstring" or "fingerprint" for that compound. Considering the 3 example questions described above, compounds can have different fingerprints such as 111, 110, 101, 011, etc. These fingerprints help to characterize each compound and can be used to perform similarity search calculations. The original version of MACCS keys had 166 bits, though newer versions with 320 bits are also available.
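The bitstring idea and the similarity calculation it enables can be sketched in a few lines. The three yes/no questions below are illustrative stand-ins, not actual MACCS key definitions, and the `props` dictionary of precomputed structural facts is a hypothetical schema:

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length binary fingerprints:
    (bits set in both) / (bits set in either)."""
    both = sum(a & b for a, b in zip(fp1, fp2))
    either = sum(a | b for a, b in zip(fp1, fp2))
    return both / either if either else 0.0

def fingerprint(props):
    """Build a toy 3-bit fingerprint from yes/no structural questions."""
    questions = [
        props["n_oxygen"] < 3,     # key 1: fewer than 3 oxygen atoms?
        props["has_ring_size_4"],  # key 2: ring of size 4 present?
        props["has_halogen"],      # key 3: any halogen (F, Cl, Br, I)?
    ]
    return tuple(int(q) for q in questions)

a = fingerprint({"n_oxygen": 1, "has_ring_size_4": False, "has_halogen": True})
b = fingerprint({"n_oxygen": 4, "has_ring_size_4": False, "has_halogen": True})
print(a, b, tanimoto(a, b))  # → (1, 0, 1) (0, 0, 1) 0.5
```

Real fingerprints work the same way, only with hundreds of keys, which is why two structurally similar compounds end up with a high Tanimoto similarity.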

The other popular example is the CACTVS substructure keys used by PubChem. This set of keys has 881 bits in total, divided into seven sections depending on the questions being addressed. For example, the first section has 115 bits that characterize hierarchic element counts and test for the presence or count of individual chemical elements. It should be noted that these substructure-based keys form a class of descriptors that typically uses 1D and/or 2D chemical information.

There are many other descriptors that have been proposed, developed, and reported in the literature. These will be described in detail in the Structural Descriptors section (2.2) of the Background chapter.

Summarizing the discussion so far: principles of chemoinformatics are used in the development of computational models for quick and reliable predictions of toxicity outcomes. This approach is especially important in the initial phase of the product development pipeline, where it is necessary to eliminate potentially toxic or hazardous candidates. To accomplish this, chemical structures are represented as a function of structural descriptors, and statistical methods are developed that relate these descriptors to the desired toxicity outcome.

The first step in this process is to compile a set of chemical structures for which the toxicity outcome is known from previous experimental studies; this is called the training set. This set is used to train the statistical model and optimize its parameters. The optimized model is then used to predict the toxicity outcomes of compounds for which experimental data are not available; this set of compounds is called the test set. The performance of any proposed set of descriptors and statistical model is judged by the number of test compounds that are correctly classified: the lower the prediction error rate on the test compounds, the better the descriptors and model. A graphical illustration of the QSAR-based in silico approach is shown in Figure 5.

Apart from being used to predict toxicity outcomes, the QSAR-based in silico approach can also be used to identify common descriptors observed in toxic compounds. Thus, for a given toxicity outcome, a set of descriptors can be generated from the training set that contains only those descriptors prominently observed in the toxic compounds. These descriptors are called structural alerts, and they can be used to identify and highlight potentially toxic descriptors in test compounds. This approach has tremendous potential and can lead to the discovery of new structural alerts, thus shedding light on the mechanisms by which chemicals induce toxicity.


Figure 5: Flow diagram of QSAR-based in silico approach

This concludes the introduction to the general discipline of chemoinformatics and its applications. The remaining chapters in this dissertation are organized as follows.

Chapter 2 provides a background of the fundamental concepts in chemoinformatics and describes the research problem with an outline of the proposed solution. Chapter 3 describes computational tools and toxicity datasets used in the course of this research. Chapter 4 describes the algorithm used for generating annotated linear chemical fragments and subsequent Markov chain models.

Chapter 5 presents results on the identification of relevant descriptors and structural alerts. Chapter 6 presents results on the Markov chain modeling approach, with detailed analysis of different annotation schemes and fragment lengths; classification results obtained using the kNN modeling method coupled with annotated linear fragments are also presented. Chapter 7 summarizes and highlights the significant accomplishments of this research and describes potential research that could stem from this work.


CHAPTER 2: BACKGROUND

As chemoinformatics is a relatively new field of research for chemical engineers, it is necessary to get acquainted with some background information and technical jargon specific to this discipline. This section briefly describes different methods of representing chemical structures in electronic form, types of structural descriptors, QSAR methods, and an overview of Markov chain based statistical models. It also describes the problem statement addressed in this research along with an outline of the proposed solution.

2.1 Representing Chemical Structures

In order to generate structural descriptors from chemical structures, it is very important to define a common nomenclature for representing all compounds in a machine-readable form. Consider the case of generating MACCS keys, for example. Here, it is a fundamental requirement that the machine be able to recognize and process chemical structures so that it can extract relevant information such as whether there are fewer than three of a given atom in the molecule, whether there is a ring of size 4, and whether a halogen atom is present.

This representation of chemical structures has various levels of refinement. It starts from linear notations and 2D chemical graphs and culminates in 3D structure representations and molecular surfaces3. This is shown graphically in Figure 6.


Figure 6: Representation of chemical structure2

This section describes the most widely used forms for representing chemical structures using m-ethyl phenol as an example. The structure of m-ethyl phenol is shown in Figure 7.

Figure 7: m-ethyl phenol molecule

2 Image taken from the book Chemoinformatics: A Textbook by Johann Gasteiger (page 17).

This molecule can be represented in different forms as follows:

1. Molecular Formula:

The molecular formula for m-ethyl phenol is C8H10O. This is the most general way of representing any molecule. It gives the count of the different atoms present in the molecule, and it is mainly used as a shorthand notation in chemical and biological databases.

2. SMILES:

SMILES is an acronym for Simplified Molecular Input Line Entry Specification. SMILES notation was created in 1986 by David Weininger for chemical data processing11 and it has found widespread use as a universal chemical nomenclature system3. It is a way of representing chemical structures using a line notation that describes the atoms and their connectivity. The SMILES notation for m-ethyl phenol is c1c(CC)cccc1O. Aliphatic atoms are denoted by capital letters and aromatic atoms by lowercase letters. Branching atoms are enclosed in parentheses. There can be many possible SMILES notations for a molecule depending on the position of the starting atom.
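The aromatic/aliphatic case convention can be illustrated with a minimal sketch. This is not a SMILES parser: it only tallies single-letter organic-subset symbols plus Cl and Br, and ignores digits, bonds, and parentheses, which is enough to show the lowercase-vs-uppercase rule at work.

```python
# Minimal sketch: count aromatic (lowercase) vs aliphatic (uppercase) atoms
# in a simple SMILES string. Handles only single-letter organic-subset
# symbols plus "Cl"/"Br"; it is NOT a general SMILES parser.

def atom_counts(smiles):
    aromatic, aliphatic = 0, 0
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):  # two-letter aliphatic atoms
            aliphatic += 1
            i += 2
            continue
        ch = smiles[i]
        if ch in "bcnops":                   # aromatic subset symbols
            aromatic += 1
        elif ch in "BCNOPSFI":               # aliphatic subset symbols
            aliphatic += 1
        i += 1                               # digits, bonds, ( ) are skipped
    return aromatic, aliphatic

# m-ethyl phenol: 6 aromatic ring carbons; ethyl carbons and hydroxyl
# oxygen are written in uppercase
print(atom_counts("c1c(CC)cccc1O"))  # -> (6, 3)
```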

3. XML:

XML stands for eXtensible Markup Language and provides a web-friendly way of representing chemical structures; the major XML-based format is CML (Chemical Markup Language). Closely related is InChI (International Chemical Identifier), a line notation designed for web use: one can search for a molecule on the internet by typing its InChI into a search engine. The InChI for m-ethyl phenol is InChI=1S/C8H10O/c1-2-7-4-3-5-8(9)6-7/h3-6,9H,2H2,1H3.


4. Structure-Data Files:

Structure-Data (SD) file formats store molecular information using connection tables. The SD file format was designed by MDL (Molecular Design Limited) and contains information about atoms, atomic coordinates, bonds, connections, and associated properties. A typical SD file for m-ethyl phenol is shown in Figure 8.

Figure 8: SD file for m-ethyl phenol molecule

An SD file can store information about many compounds. Each compound’s information is contained in a separate block called the Mol file, and the Mol files are separated from each other by the ‘$$$$’ delimiter. The first step in this method is numbering all the heavy atoms. This numbering can be arbitrary and can start at any atom, though canonical numbering systems can be helpful to number the atoms uniformly. Hydrogen atoms can be defined explicitly and included in the connection table, or they can be defined implicitly. The structure of the Mol file is quite rigid, and every blank space and line has some associated significance. The first block is called the Header block and contains the chemical identifier of the molecule. The next block is the Counts block and contains the number of atoms and bonds in the molecule. The Atom block contains the spatial coordinates of the atoms along with the identity of each atom; it may also contain additional data such as mass difference and charge. The Bond block is the most important block and contains information about the connectivity between all the atoms. For example, the first line in this block is read as ‘Atom #1 is connected to Atom #2 by a single bond’; the second line is read as ‘Atom #1 is connected to Atom #6 by a double bond’. This block can also include information about aromatic bonds and stereochemistry. The ‘M END’ line at the end of the Bond block indicates the end of the connection table. A Property block may follow and can contain information like IUPAC name, molecular weight, total charge, hydrogen bond acceptors, etc.; this block can be as long as desired. We will use the SD file format as the standard method of representing chemical information in this research.
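The layout described above (Header, Counts, Atom, and Bond blocks) can be sketched as a small reader. This is a simplification, not a full reader for the format: real Mol files use fixed-width columns, while the sketch below splits on whitespace and assumes a minimal, well-formed V2000 block; the formaldehyde example is hypothetical.

```python
# Sketch of reading a V2000 connection table: line 4 (the Counts line)
# gives the number of atoms and bonds; each Atom-block line carries
# x, y, z coordinates then the element symbol; each Bond-block line reads
# "atom1 atom2 bond_order ...". Whitespace splitting is a simplification
# -- real Mol files use fixed-width columns.

def parse_molblock(text):
    lines = text.splitlines()
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atom_block = lines[4:4 + n_atoms]
    bond_block = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    atoms = [ln.split()[3] for ln in atom_block]                    # element symbol
    bonds = [tuple(int(x) for x in ln.split()[:3]) for ln in bond_block]
    return atoms, bonds

# Hypothetical minimal Mol block for formaldehyde (hydrogens implicit)
mol = """formaldehyde
  editor
comment
  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0
    1.2000    0.0000    0.0000 O   0  0
  1  2  2  0
M  END
"""
atoms, bonds = parse_molblock(mol)
print(atoms)   # -> ['C', 'O']
print(bonds)   # -> [(1, 2, 2)]  i.e. atom 1 double-bonded to atom 2
```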

2.2 Structural Descriptors

Structural descriptors help to abstract structural information into a mathematical quantity. They are a useful way of generating secondary information from the primary information contained in a molecule's structure. Structural descriptors can be broadly classified into two categories: steric and electronic. Steric descriptors are related to the connectivity and shape of a molecule, and they characterize structures according to their size, degree of branching, and overall shape. They capture information like distances between atoms, torsion angles, radius of gyration, and molecular diameter. These descriptors are typically characterized by a single value based on the topology of the molecule; several such indices, including the Branching index, Wiener index, and Zagreb index, are in common use.
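As a concrete example of a single-value topological descriptor, the Wiener index is the sum of shortest-path distances (counted in bonds) over all pairs of heavy atoms. A minimal BFS-based sketch on an adjacency-list molecular graph:

```python
from collections import deque

# Wiener index: sum of shortest-path distances (in bonds) over all
# unordered pairs of heavy atoms, computed here by breadth-first search
# from every atom of an adjacency-list graph.

def wiener_index(adj):
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice

# n-butane carbon skeleton: 0-1-2-3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(butane))  # -> 10

# isobutane: central atom 0 bonded to 1, 2, 3
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(isobutane))  # -> 9
```

The comparison shows how such an index encodes branching: the more branched isomer has the smaller Wiener index.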

Electronic descriptors are related to the electronic configuration of atoms and bonds. These descriptors characterize the atoms depending on their charge and presence of lone pairs, and they characterize the bonds by classifying them as sigma and pi bonds. They also classify atoms based on their electronegativity, dipole moment, polarizability, and partial charges.

Structural descriptors can also be classified based on their dimensions. 3D descriptors are those that capture the shape, size, and orientation of a molecule as it might actually exist in the system of interest. Examples of these include mean dipole moment, surface area/volume, and moment of inertia. These can be calculated from quantum mechanical calculations. 2D descriptors are fragments or substructures of molecules that are generated from a graphical representation of molecules. These are typically used for substructure search and similarity calculations. 1D descriptors include different features of a molecule such as its molecular weight, number of rotatable bonds, hydrogen bond donors, hydrogen bond acceptors, and rings.

These descriptors can be used individually for modeling, or two or more descriptors can complement each other to form a more robust predictive model. Different descriptors can also be used sequentially for step-by-step screening of query molecules. It should be noted that all these types of descriptors have pros and cons associated with them. For example, 3D descriptors have interpretable meanings and help to capture molecular structures as they might actually exist in a particular 3D configuration; however, they are difficult to generate and can be computationally expensive. 2D descriptors, on the other hand, are relatively easy to generate but tend to give rise to a large number of redundant fragments or fragments that cannot be well interpreted. It should, however, be noted that 2D fragments form the building blocks of most QSAR applications and can perform as well as 3D descriptors for modeling purposes12–14. These 2D fragments are of particular interest in this research.


2D fragments can be further classified into two categories: predefined and dynamic.

Predefined fragments are those that are derived based on knowledge from human experts. In this approach, descriptor libraries are generated using the experience of domain experts15, and guidelines relating structural alerts to different toxic endpoints are established as rules of thumb. The ChemoTyper16,17 application recently developed by Molecular Networks GmbH is a pertinent example: it contains a library of fragments for genotoxic carcinogen rules and allows searching for different structural fragments and highlighting them in large datasets of molecules. Other examples include the Leadscope Predictive Data Miner (LPDM) by Leadscope Inc.18 and a web-based platform developed by Sushko et al.19 for collecting toxicological structural alerts from the literature. Although these predefined fragments help to make sound judgments, they require considerable time, effort, and detailed analysis of relevant literature15.

Dynamic fragments, on the other hand, are data-dependent and are generated on-the-fly from a dataset of chemical structures. These fragments automate the knowledge discovery process and help in the discovery of new structural alerts, thus complementing the predefined fragments approach. In order to generate dynamic fragments, each chemical structure is considered as a topological graph with atoms as nodes and bonds as edges. The complexity of the chemical structure is then assessed and desired descriptors are extracted by atom-based, bond-based, or circular fragmentation methods. Thus, there are different types of dynamic fragments such as linear, radial, dendritic, etc.
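The linear variant of dynamic fragmentation can be sketched as a depth-first enumeration of simple paths in the molecular graph. The sketch below is a minimal illustration of the idea, not this dissertation's actual algorithm: it labels atoms only by element symbol (no annotations) and keeps one canonical copy of each path and its reverse.

```python
# Sketch of dynamic linear-fragment generation: enumerate all simple
# (non-repeating) paths up to a maximum length by depth-first search,
# keeping one canonical copy of each path/reverse pair.

def linear_fragments(atoms, adj, max_len):
    """atoms: index -> element symbol; adj: index -> neighbor indices."""
    found = set()

    def dfs(path):
        labels = tuple(atoms[i] for i in path)
        found.add(min(labels, labels[::-1]))  # path and reverse are the same fragment
        if len(path) == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:               # simple paths only
                dfs(path + [nxt])

    for start in adj:
        dfs([start])
    return sorted("-".join(f) for f in found)

# acetic acid heavy atoms: C(0)-C(1), with O(2) and O(3) both on atom 1
atoms = {0: "C", 1: "C", 2: "O", 3: "O"}
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(linear_fragments(atoms, adj, 3))
# -> ['C', 'C-C', 'C-C-O', 'C-O', 'O', 'O-C-O']
```

Replacing the plain element labels with annotated atom types (partial charge, connectivity, etc.) is what turns such paths into the annotated linear fragments developed later in this work.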

Some of these descriptors are proprietary while others are open source, with algorithms for generating them freely available online or reported in the literature. Sykora et al.20 describe in detail the development of the Chemical Descriptors Library (CDL), a generic and open source software library for chemical informatics. Klopman21 describes an algorithm for generating labeled linear subgraphs for use as descriptors. Rogers et al.13 have developed an algorithm for generating circular fragments, also called extended-connectivity fingerprints, a novel class of topological descriptors specifically developed for structure-activity relationship modeling. Faulon et al.22 describe another algorithm for generating a new signature descriptor based on extended valence sequence.

Several computer programs have also been developed to automate the calculation of molecular descriptors. Examples include PaDEL-Descriptor23, an open source software package for calculating molecular descriptors and fingerprints, Dragon24, and Mold225. Many proprietary software programs are also available, for example, MC4PC by MultiCASE, Inc.26,27, CORINA Symphony28 by Molecular Networks GmbH, and MolConnZ29.

In their paper14, Duan et al. analyze and compare the predictive performance of eight different types of 2D fragments using the chemoinformatics package Canvas30 on a well-validated dataset with five targets. The fragments generated were Linear, Dendritic, Radial, MACCS, MOLPRINT2D, Pairwise, Triplet, and Torsion. They conclude that most fragments have similar retrieval rates on average and there is no single best method for all target classes; however, for a given target and query there were significant differences between the methods. The MACCS fragments had the lowest retrieval rate: they are not sufficiently discriminating because they code only the presence or absence of substructures, without any connectivity information between the features. The pairwise and triplet fragments perform better when the active molecules being sought are similar in size to the query molecules; these fragments encode information from all pairs or triplets of distances and are therefore sensitive to molecular size. The MOLPRINT2D and radial fragments gave the best performance. The overall success of radial fragments suggests that discriminating power is increased not just by considering a fragment, but also the manner in which that fragment connects to its surroundings.


2.3 Application of QSAR Methods

Extensive studies have been carried out to correlate the biological properties of many important pharmaceutical drugs to their chemical structure. QSAR models have been used to investigate adverse health effects of drugs used for thyroid31, tumor32, and tuberculosis33 treatments. QSAR models have also been used to predict toxic endpoints of chemicals such as cytotoxicity34, hepatotoxicity35, cardiac toxicity36, and carcinogenicity37,38. Many statistical methods and software programs have been developed39,40 and their performances have been compared in the literature41,42.

One example of an open source QSAR tool is VEGA43 (Virtual models for property Evaluation of chemicals within a Global Architecture). VEGA was developed as part of the European legislation for chemical substances called REACH (Registration, Evaluation, Authorization, and Restriction of Chemicals). It provides models for the following endpoints and properties: BCF (bio-concentration factor), skin sensitization, mutagenicity, carcinogenicity, developmental toxicity, LC50 aquatic toxicity, ready biodegradability, and LogP. It has a very good graphical user interface and comes with precise documentation and guidelines. The following two case studies highlight the procedure generally used for developing QSAR models.

Case study 1: Carcinogenicity

In his paper, Gilles Klopman21 describes a method for identifying significant biophores that help to explain the carcinogenicity of N-nitrosamines in rats. The effect of N-nitrosamines in rats has been extensively studied in the literature and their mechanism of action is well documented. Klopman used his technique to reproduce these results and to identify known relevant descriptors. He developed a model using a training set consisting of 39 compounds, with 27 categorized as active carcinogens and 12 as inactive. He then used his Computer Automated Structure Evaluation (CASE) program to generate linear fragments (called subunits) containing between 3 and 12 interconnected heavy atoms. Each atom had two labels assigned to it: one to indicate the multiplicity of bonds and the other to indicate the presence of a side chain. A statistical analysis of the fragment distribution was then made. A binomial distribution was assumed, and a fragment was classified as active if its probability of occurrence in active compounds as compared to inactive compounds was more than 0.95. This led to the identification of three significant fragments. The likelihood of a compound being active was then computed based on the weights of all of its fragments. This approach gave correct predictions for all 27 active compounds and for 10 out of the 12 inactive compounds. The predictive power of the program was tested by applying it to 4 compounds that were withheld from the training set, and the program predicted correct results for all 4 of them.
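The binomial significance criterion in this case study can be sketched as follows. This is a simplified reconstruction, not Klopman's actual CASE code: as an assumption, the null hypothesis takes each occurrence of a fragment to be equally likely in an active or an inactive compound (p0 = 0.5), ignoring the 27/12 class imbalance and CASE's fragment weighting.

```python
from math import comb

# Sketch of a binomial fragment-significance test. A fragment seen in
# k_active of n_total compounds is flagged as "active" when a split this
# skewed toward actives would arise by chance less than 5% of the time
# under the (assumed) null p0 = 0.5.

def binomial_tail(k, n, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k, n + 1))

def is_active_fragment(k_active, n_total, confidence=0.95):
    return binomial_tail(k_active, n_total) < 1 - confidence

print(is_active_fragment(8, 8))  # fragment seen only in actives -> True
print(is_active_fragment(5, 8))  # 5 of 8 occurrences in actives -> False
```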

Case study 2: Developmental toxicity

In their paper31, Cunningham et al. describe the development of a QSAR model for the prediction of developmental toxicity of anti-thyroid drugs. The model was developed from data for 323 compounds, with 130 categorized as developmental toxicants and 193 as non-toxicants. The authors generated 2D fragments using the Tripos Sybyl HQSAR module, which lets the user specify attributes for fragment determination such as atom counts, atomic connections, bond types, hydrogen atoms, chirality, and hydrogen bond donor and acceptor groups. The authors chose to specify fragments between three and seven atoms in size and considered atoms, atomic connections, and bond types. Upon completion of this in silico fragmentation, a compound-fragment data matrix was generated with compounds in rows and fragments in columns. This data matrix was subsequently analyzed with the cat-SAR (Categorical Structure-Activity Relationship) expert system in order to identify structural features associated with the toxic and non-toxic classes of compounds. The model was validated using self-fit, leave-one-out (LOO), and multiple leave-many-out (LMO) cross-validation methods. Average concordance, sensitivity, and specificity values were computed and documented. Using this model, 13 fragments were identified for the three anti-thyroid medications used to treat hyperthyroidism; 9 were associated with developmental toxicants and 4 with non-toxicants. Based on structural analysis, it was concluded that all three drugs available for treating hyperthyroidism were capable of producing developmental toxicity, emphasizing the need to develop new molecules with structural attributes that suppress thyroid function while minimizing the risk of developmental toxicity.
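The three validation statistics mentioned in this case study are simple functions of the classification tallies. The counts below are hypothetical, chosen only to match the 130-toxicant/193-non-toxicant class sizes; they are not the paper's actual results.

```python
# Sketch of the three validation statistics, computed from counts of true
# positives (tp), true negatives (tn), false positives (fp), and false
# negatives (fn).

def validation_stats(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)                   # toxicants correctly flagged
    specificity = tn / (tn + fp)                   # non-toxicants correctly cleared
    concordance = (tp + tn) / (tp + tn + fp + fn)  # overall agreement
    return concordance, sensitivity, specificity

# hypothetical tallies for 130 toxicants and 193 non-toxicants
c, se, sp = validation_stats(tp=110, tn=160, fp=33, fn=20)
print(round(c, 2), round(se, 2), round(sp, 2))  # -> 0.84 0.85 0.83
```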

In order to effectively use the results from QSAR models, we need improved statistical methods and data mining techniques. In recent times, the fundamental paradigm of statistical analysis has shifted from “system identification” to “predictive modeling”. System identification aims to reconstruct the true underlying probability distributions, while predictive modeling uses simple probability distributions, though not necessarily correct ones, to build models with the highest predictive performance44.

Initially, QSAR models focused on the use of multiple linear regression, though in recent times a wide variety of statistical algorithms have been developed. These include support vector machines (SVM), decision trees, random forests, k-nearest neighbors (kNN), and artificial neural networks (ANN). A brief description of these algorithms is given by Xu et al.45 in their paper to compare the performance of different machine learning methods. Many of these methods have been implemented in open source machine learning packages such as AZOrange46 and KNIME. This is of great use to researchers lacking extensive machine learning knowledge as it helps them to create flexible applications using a graphical programming environment.


Specialized software packages have also been developed that combine the generation of structural descriptors and/or a pre-defined library of structural alerts with statistical algorithms for machine learning. Examples include DEREK47 (Deductive Estimation of Risk from Existing Knowledge) by Lhasa Ltd., CASE Ultra48 by MultiCASE, Inc., Leadscope Model Applier by Leadscope, Inc.18, and Toxtree49 by Ideaconsult Ltd. (developed under terms of contract with the European Commission Joint Research Center).

In June 2014, the International Conference on Harmonization of technical requirements for registration of pharmaceuticals for human use (ICH) released the M7 guideline50 for the assessment and control of mutagenic impurities in pharmaceuticals in order to limit potential carcinogenic risk. The ICH M7 guideline states that manufacturers of pharmaceutical products must submit a report that includes a detailed analysis of the mutagenic potential of the active pharmaceutical ingredient (API) as well as the associated impurities in order to obtain regulatory clearance. Under this guideline, manufacturers are allowed to submit QSAR model results in lieu of in vitro testing; however, QSAR submissions must include both rule-based expert methods (predefined fragments) and structure-based statistical methods (dynamic fragments).

QSAR models should also be OECD (Organisation for Economic Co-operation and Development) compliant51. According to OECD recommendations, QSAR models should have a defined endpoint, an unambiguous algorithm, a defined domain of applicability, appropriate measures of predictive performance, and, where possible, a mechanistic interpretation. This has given a tremendous boost to efforts for building structure-based QSAR models with high reliability and predictive performance. Software packages such as those developed by Lhasa, MultiCASE, and Leadscope have integrated both the rule-based and structure-based approaches in their products. These packages are widely used and accepted as a standard by both pharmaceutical companies and regulatory agencies.


2.4 Research Problem

As discussed above, there is a great need for the development of novel structure-based QSAR models. Several types of structural descriptors and statistical models have been proposed, developed, and reported in the literature. The 3D descriptors seem most promising, as they can capture the interactions of chemicals in biological systems by identifying regions in a molecule that are likely to be involved in binding to a receptor site. The drawback, however, is that the calculations for 3D descriptors are computationally expensive, and not all 3D descriptors generated are good from the modeling standpoint.

On the other hand, 2D descriptors have been found to have predictive performance as good as or even better than 3D descriptors in certain cases. Thus, many methods for generating 2D descriptors have been developed, such as the linear, radial, and dendritic fragments discussed previously in section 2.2. However, since chemically-induced toxicity is a very complex phenomenon, 2D descriptors cannot be expected to do complete justice to the problem at hand: they fail to take into account factors such as the route by which a chemical enters the human body, the rate of its absorption into the bloodstream, and chemical transformations and reactions within the body. In any case, there is a great deal of information contained in the 2D structure of a chemical, and we need to explore this information to the fullest extent possible by developing novel descriptors. This will help to wisely narrow down the list of potential candidates for further rigorous screening steps.

Most of the currently available methods for generating 2D descriptors suffer from several drawbacks, such as the extremely large number of descriptors generated and the difficulty in interpreting them. For example, consider the circular fragments developed by Rogers et al. These fragments have been found to be very effective in SAR modeling, with consistently better prediction accuracies for a wide variety of datasets and toxicity endpoints. However, this method yields more than 200,000 circular fragments of radius 4 from a library of 50,000 compounds extracted from the Derwent World Drug Index13. This is a much larger set of fragments than is common for other methods, and they cannot be decoded back into chemical structures to be interpreted in a meaningful way. Thus, although the predictive performance of circular fragments is very high, the results cannot be translated back into a mechanistic interpretation. There is a need to develop better 2D descriptors that can reduce the high-dimensional descriptor space, give more meaningful results, and yield better prediction accuracies at the same time.

2.5 Proposed Solution

To address this need, we propose a novel method for dynamic generation of linear descriptors. These descriptors are linear subgraphs of chemical structures that can be annotated with atom-based features such as atom identity, connectivity, and partial charge. The main advantage of this approach is that although these descriptors are 2D linear fragments, they can be annotated with features that capture “superior” 3D information such as partial charge and stereochemical information. Thus, they help to incorporate steric as well as electronic information about the atoms within the same descriptor. These descriptors can be flexibly defined depending on the information desired to be extracted from them.

We are specifically interested in developing linear descriptors so that we can use them to develop Markov chain based statistical models for classification purposes. The Markov chain model is a very useful statistical tool for modeling linear sequences of events, and it has been used with good success in bioinformatics methods for the analysis of nucleic acids and proteins. We have developed a similar model that is specifically tailored for analyzing chemical structures.
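The core idea can be sketched in a few lines. The sketch below is a toy illustration of the general technique, not this dissertation's model: first-order transition probabilities over a small, assumed atom alphabet are estimated separately from "toxic" and "non-toxic" linear fragment sets (with add-one smoothing), and a query sequence is scored by its log-likelihood ratio; the training sequences are invented.

```python
from collections import defaultdict
from math import log

# Sketch of a first-order Markov chain classifier over linear atom
# sequences, in the spirit of sequence models used in bioinformatics.

ALPHABET = ["C", "N", "O"]  # toy atom alphabet (an assumption)

def train(sequences):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values()) + len(ALPHABET)  # add-one smoothing
        probs[a] = {b: (counts[a][b] + 1) / total for b in ALPHABET}
    return probs

def log_likelihood(seq, probs):
    return sum(log(probs[a][b]) for a, b in zip(seq, seq[1:]))

# invented training fragments for illustration
toxic = [["C", "N", "O"], ["N", "O", "C"], ["C", "N", "N"]]
nontoxic = [["C", "C", "C"], ["C", "C", "O"], ["O", "C", "C"]]
p_tox, p_non = train(toxic), train(nontoxic)

query = ["C", "N", "O"]
score = log_likelihood(query, p_tox) - log_likelihood(query, p_non)
print("toxic-like" if score > 0 else "non-toxic-like")  # -> toxic-like
```

A positive log-likelihood ratio means the query fragment is better explained by the chain trained on the toxic class.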


It should be noted that other methods of generating descriptors might appear more useful because they can capture branched structural information as well, thus accounting for more structural detail. However, as mentioned in the research problem, there are several disadvantages associated with them. One is that they give a huge number of descriptors, of which only a few might be useful from the modeling standpoint. Also, nonlinear fragments cannot be used to apply sequence analysis techniques or to build Markov chain models. The linear fragments developed in this research, on the other hand, have been shown to capture branched structural information as well, due to the provision of annotating features. These novel descriptors combine most of the advantages of other approaches while simultaneously reducing the dimension of the descriptor space and providing a mechanistic interpretation of chemical fragments. Thus, I focus my research on the development of annotated linear chemical fragments for use in the identification of structural alerts and in the development of novel QSAR methods based on Markov chain models.

Specific tasks undertaken in this project were as follows.

1. Development of search algorithms:

The initial phase of the research involved the development of search algorithms for generating linear subgraphs from a database of chemical structures. Principles of graph theory and a depth-first search algorithm were used.

2. Specification of annotation features:

Chemical annotations were incorporated in the algorithm to provide flexibility in defining the criteria used for generating unique fragments. Only atom types were annotated; bond type annotations are implicit in the atom types.


3. Generation of compound-fragment data matrix:

After identifying the unique fragments, a compound-fragment data matrix was generated with compounds in rows and fragments in columns. Each entry in this table is ‘1’ if the fragment is present in the compound and ‘0’ otherwise.

4. Identification of relevant descriptors:

Relevant descriptors and structural alerts were identified using datasets on skin sensitization and Ames mutagenicity. Several statistical tests were employed to identify significant fragments that helped to distinguish the toxic class of compounds from the non-toxic class.

5. Development of statistical models:

Many statistical models can be developed for classifying query compounds of interest. We focus our efforts on developing Markov chain models and compare them with results obtained using kNN (k-nearest neighbors) models. The performance of the different models was compared using cross-validation methods.
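Tasks 3 and 5 can be tied together in a short sketch. The fragment sets, vocabulary, and labels below are invented for illustration, and the kNN distance is taken to be 1 minus the Tanimoto similarity of the binary matrix rows; this is a generic baseline, not the dissertation's tuned models.

```python
from collections import Counter

# Sketch: build a binary compound-fragment matrix from per-compound
# fragment sets (task 3), then classify a query compound by majority vote
# among its k nearest neighbors under Tanimoto similarity (task 5).

def matrix(frag_sets, vocabulary):
    return [[1 if f in s else 0 for f in vocabulary] for s in frag_sets]

def tanimoto(row1, row2):
    both = sum(a & b for a, b in zip(row1, row2))
    either = sum(a | b for a, b in zip(row1, row2))
    return both / either if either else 0.0

def knn_predict(query_row, rows, labels, k=3):
    ranked = sorted(zip(rows, labels),
                    key=lambda rl: -tanimoto(query_row, rl[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# invented fragment vocabulary and training data
vocab = ["C-C", "C-O", "C-N", "N-O"]
train_frags = [{"C-C", "C-O"}, {"C-C"}, {"C-N", "N-O"}, {"C-N"}]
labels = ["nontoxic", "nontoxic", "toxic", "toxic"]
rows = matrix(train_frags, vocab)

query = matrix([{"C-N", "N-O", "C-C"}], vocab)[0]
print(knn_predict(query, rows, labels, k=3))  # -> toxic
```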


CHAPTER 3: DATASETS AND COMPUTATIONAL TOOLS

3.1 Training Datasets

This section briefly describes the training sets that were used in this research for the identification of relevant structural descriptors and for modeling purposes. Two particular endpoints were considered, namely skin sensitization and mutagenicity.

Skin sensitization is an important toxicity endpoint and it has been very widely studied52,53. Several QSAR models have been developed to predict this endpoint43,54. The skin sensitization potency of chemicals is characterized using LLNA (local lymph node assay) EC3 values, the dose required to give a 3-fold response between the treatment and control groups. The dataset used in our study is compiled from several sources55,56 and contains 467 compounds classified into five groups based on their relative measure of sensitization potency. The five LLNA categories are: non-sensitizer, weak sensitizer, moderate sensitizer, strong sensitizer, and extreme sensitizer. The last two categories are grouped together, and the resulting four categories, denoted NON, WEAK, MOD, and STR, have 138, 106, 128, and 95 compounds respectively. This is shown in Figure 9. The distribution of molecular weights for these compounds is shown in Figure 10, giving an indication of the structural diversity of compounds in this dataset.


Figure 9: Number of compounds in different categories in skin sensitization dataset

Figure 10: Distribution of molecular weights for skin sensitization dataset


For mutagenicity, the Ames test data was used. The Ames test has been widely used for initial screening of new chemicals and drugs57. It is an in vitro test performed on strains of

Salmonella typhimurium bacteria and it classifies chemicals into two categories: positive and negative. Hansen et al.58 have collected a benchmark dataset comprising 6512 chemicals (3503 Ames positive and 3009 Ames negative) for in silico prediction of Ames mutagenicity. They have also specified a 5-fold cross-validation scheme along with the chemical structures to be used in each fold of the cross-validation analysis. This allows for direct comparison of different modeling methods, making it a good benchmark for optimizing Ames mutagenicity prediction.

Figure 11 shows the counts of Ames positive and negative compounds in this dataset and

Figure 12 shows their molecular weight distribution. Table 1 shows the compound counts to be used in the training and test sets for the 5-fold cross-validation scheme. This is shown graphically in Figure 13 and Figure 14. The actual compound numbers to be used in the cross-validation analysis can be found in the Supporting Information section of the paper by Hansen et al58.

Figure 11: Number of compounds in different categories in the benchmark dataset for Ames mutagenicity

Figure 12: Distribution of molecular weights for compounds in the benchmark dataset for Ames mutagenicity

             Training set                  Test set
       Ames POS  Ames NEG  Total    Ames POS  Ames NEG  Total
CV 1     2919      2609     5528      584       400      984
CV 2     2933      2594     5527      570       415      985
CV 3     2932      2596     5528      571       413      984
CV 4     2932      2593     5525      571       416      987
CV 5     2930      2595     5525      573       414      987

Table 1: Compound counts for the 5-fold cross-validation (CV) scheme


Figure 13: Compound counts for training sets in the 5-fold CV scheme

Figure 14: Compound counts for test sets in the 5-fold CV scheme


3.2 Computational Tools

This section describes the different programming languages and software packages that were used during the course of this research. For converting chemical structures into SD file format, a software package called MarvinSketch was used. MarvinSketch is an advanced chemical editor from ChemAxon for drawing chemical structures, queries, and reactions59. The most important feature of MarvinSketch used in this research was its ability to convert a SMILES or InChI notation of any molecule into a Mol file. In cases where the SMILES or InChI notation for a molecule was not available, the GUI of MarvinSketch was used for drawing the chemical structure.

Different Mol files were then concatenated into a single SD file with *.sdf extension. Figure 15 shows a typical example of how chemical structures are displayed in MarvinSketch.

Figure 15: Typical example using MarvinSketch software

In order to view the chemical structures in an SD file, a software package called MarvinView was used. MarvinView is an advanced chemical viewer from ChemAxon that supports viewing of a large number of molecules in a spreadsheet or matrix layout60. Additional fields such as a molecule’s name, CAS number, and SMILES string can also be displayed. Figure 16 shows a typical example of how chemical structures are displayed in MarvinView.

Figure 16: Typical example using MarvinView software


The Python61 programming language was used as the main scripting language in this research. Python is an open source programming language whose design philosophy emphasizes code readability62. Its focus on readability, coherence, and software quality in general sets it apart from other tools in the scripting world63. Chemical information handling in Python was simplified by using the RDKit64 package. RDKit is a chemical informatics and machine learning package that handles operations such as numbering of atoms, identifying neighboring atoms and ring atoms, and calculation of partial charges.

Other programming languages such as Visual Basic, MATLAB, and R were also used frequently. Visual Basic is a third-generation event-driven programming language65 that can be used in conjunction with Microsoft Excel to yield user-friendly outputs and displays. MATLAB is a numerical computing environment and fourth-generation programming language developed by MathWorks. It allows matrix manipulations, plotting of functions and data, and implementation of algorithms66. R67 is an open source programming language and software environment for statistical computing and graphics.


CHAPTER 4: METHODS

4.1 Generation of Linear Fragments

As stated in the Proposed Solution section (2.5) of the Background chapter, the first task in this research was to develop search algorithms for the dynamic generation of linear fragments from any given set of compounds. To generate linear fragments from chemical structures, we used principles of graph theory and the depth-first search algorithm. Graph theory is the study of mathematical structures that are used to model pair-wise relations between objects from a certain collection. A graph in this context refers to a collection of nodes that are connected by edges. In our case, we consider each heavy atom in a compound as a node and each bond as an edge. Starting at one particular node, we traverse the molecular graph using a depth-first search and trace all possible longest paths from that node. This is shown in Figure 17, where the longest linear paths are extracted from atom #1 of the m-ethyl phenol molecule.

Figure 17: Linear fragments using graph theory and depth-first search algorithm

The depth-first search algorithm is designed such that it also captures information about paths of shorter lengths. The algorithm is described below using m-ethyl phenol as an example.

1. For each heavy atom in the compound, do the following:

a. Identify its connections. For atom #1, these are atoms 2, 6, and 9.

b. Create new arrays and store the connections. For atom #1, three arrays will be

created and they will be [1, 2], [1, 6], and [1, 9].

c. For each connection identified, do the following:

i. Identify the next connected heavy atom.

ii. If the connected atom has no branching, append it to the array.

iii. If the connected atom has branching, create new arrays and append

corresponding connections.

iv. Stop if a terminal atom is encountered or the end of a ring is detected.

d. Repeat the above steps for each old and new array created.

2. Repeat the above steps for all heavy atoms in the compound.

Table 2 shows how the arrays proceed with each iteration for atom #1. As seen in Table 2, the algorithm terminates after 11 iterations and yields 5 arrays containing the longest paths. These are [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7, 8], [1, 6, 5, 4, 3, 2], [1, 6, 5, 7, 8], and [1, 9]. Fragments of smaller lengths like [1, 2], [1, 2, 3], etc. are then extracted from this information.

Iteration Arrays

1 [1, 2] [1, 6] [1, 9]

2 [1, 2, 3] [1, 6] [1, 9]

3 [1, 2, 3, 4] [1, 6] [1, 9]

4 [1, 2, 3, 4, 5] [1, 6] [1, 9]

5 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7] [1, 6] [1, 9]

6 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6] [1, 9]

7 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6, 5] [1, 9]

8 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6, 5, 4] [1, 6, 5, 7] [1, 9]

9 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6, 5, 4, 3] [1, 6, 5, 7] [1, 9]

10 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6, 5, 4, 3, 2] [1, 6, 5, 7] [1, 9]

11 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 7, 8] [1, 6, 5, 4, 3, 2] [1, 6, 5, 7, 8] [1, 9]

Table 2: Iteration steps of depth-first search algorithm for atom #1 of m-ethyl phenol

After generating linear fragments of different lengths from all heavy atoms, the next step is to identify and remove the redundant ones. For example, one of the paths generated from atom #1 will be [1, 2, 3]. Similarly, one of the paths generated from atom #3 will be [3, 2, 1]. Since chemical fragments have no inherent directionality, these two paths are redundant. One of them needs to be removed in order to reduce the dimension of the descriptor space and avoid confounding due to perfectly correlated descriptors. This is done by comparing the fragments in both forward and reverse directions; if two fragments are found to be identical, the one that begins with the lowest number is retained. So, in the above case, the fragment [1, 2, 3] will be retained and the fragment [3, 2, 1] will be discarded.
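The traversal and redundancy-removal steps described above can be sketched in Python as follows. This is a minimal illustration on the m-ethyl phenol graph, not the exact script from Appendix A; the atom numbering follows Figure 17.

```python
def all_linear_paths(adj):
    """Enumerate all simple linear paths (2 or more atoms) in a
    molecular graph given as an adjacency list {atom: [neighbors]},
    keeping one canonical orientation per path (fragments have no
    inherent direction)."""
    paths = set()

    def dfs(path):
        if len(path) >= 2:
            # a fragment read forward or backward is the same fragment;
            # keep the orientation that begins with the lower atom number
            paths.add(min(tuple(path), tuple(reversed(path))))
        for nbr in adj[path[-1]]:
            if nbr not in path:      # stop at ring closure / visited atoms
                dfs(path + [nbr])

    for atom in adj:                 # repeat from every heavy atom
        dfs([atom])
    return paths

# m-ethyl phenol, numbered as in Figure 17: benzene ring 1-6,
# ethyl branch 7-8 on atom 5, hydroxyl oxygen 9 on atom 1
adj = {1: [2, 6, 9], 2: [1, 3], 3: [2, 4], 4: [3, 5],
       5: [4, 6, 7], 6: [5, 1], 7: [5, 8], 8: [7], 9: [1]}
frags = all_linear_paths(adj)
```

Because every recursive call records its current path, the prefixes of the longest paths (e.g., [1, 2] and [1, 2, 3]) are captured automatically, and the `min(path, reversed path)` step implements the lowest-starting-number rule used for removing redundant fragments.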

4.2 Chemical Annotations

Atom-based selection features called as chemical annotations are then introduced to remove chemically redundant fragments. In the m-ethyl phenol molecule, the fragments [2, 1, 6] and [6, 5, 4] are identified as unique because they are numbered differently. However, referring back to the chemical structure, we can see that these two fragments are identical. They are fragments of length 3 that are part of an aromatic ring with one single bond and one double bond

(or aromatic bonds). Chemical annotations help to detect this redundancy by specifying the resolution of information desired to be extracted from chemical structures. The algorithm was initially developed with four annotation options as follows:

1. Atom identity (AI): This is the chemical identity of the atom. This annotation can be a

single character as in C and O or a two-character variable as in Cl and Br.

2. Number of connections (nC): This is the number of heavy atoms that are connected to the

atom. This annotation is a single digit variable and can take values from 0 to 9.

3. Number of hydrogens (nH): This is the number of hydrogen atoms attached to the atom.

This includes both implicitly and explicitly defined hydrogen atoms. This annotation is a

single digit variable and can take values from 0 to 9.

4. Partial charge (PC): This is the partial charge on an atom calculated using the

GasteigerCharge feature in RDKit64. Since partial charge is a continuous variable, we bin

it into three categories for annotation purposes. If the partial charge is less than -0.05, it is

classified as negative and denoted by ‘-’. If the partial charge is between -0.05 and 0.05, it

is classified as neutral and denoted by ‘0’. If the partial charge is greater than 0.05, it is

classified as positive and denoted by ‘+’.

An annotation scheme is then defined by selecting any possible combination of the annotation options. Based on the annotation scheme selected, each heavy atom in the chemical structure is identified by a unique atom symbol. For example, if ‘AI’ and ‘nC’ features are selected, then the annotation scheme is defined as {AI, nC} and the atom symbols are C2, O1, etc. Table 3 shows these atom symbols for the 9 heavy atoms in m-ethyl phenol molecule using 5 different annotation schemes.

Atom {AI} {AI, nC} {AI, nC, nH} {nC, nH, PC} {AI, nC, nH, PC}

1 C C3 C30 30+ C30+

2 C C2 C21 210 C210

3 C C2 C21 21- C21-

4 C C2 C21 21- C21-

5 C C3 C30 300 C300

6 C C2 C21 210 C210

7 C C2 C22 220 C220

8 C C1 C13 13- C13-

9 O O1 O11 11- O11-

Table 3: Atom symbols using different annotation schemes for m-ethyl phenol
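The symbol-building logic can be sketched as follows. This is a simplified stand-in for the RDKit-based implementation: the per-atom properties are supplied directly rather than computed, and the partial-charge value used for atom #1 is illustrative, not an actual Gasteiger charge.

```python
def bin_charge(q):
    """Bin a partial charge into '-', '0', or '+' using the
    -0.05 / +0.05 cutoffs described above."""
    if q < -0.05:
        return '-'
    if q > 0.05:
        return '+'
    return '0'

def atom_symbol(props, scheme):
    """Build an annotated atom symbol from per-atom properties.
    props: dict with keys 'AI', 'nC', 'nH', 'PC';
    scheme: ordered subset of those keys, e.g. ('AI', 'nC', 'nH')."""
    parts = []
    for opt in scheme:
        if opt == 'PC':
            parts.append(bin_charge(props['PC']))
        else:
            parts.append(str(props[opt]))
    return ''.join(parts)

# atom #1 of m-ethyl phenol: aromatic carbon bearing the OH group
# (the 0.12 charge is an illustrative value, not a computed one)
a1 = {'AI': 'C', 'nC': 3, 'nH': 0, 'PC': 0.12}
atom_symbol(a1, ('AI', 'nC', 'nH', 'PC'))   # 'C30+', as in Table 3
```

Running the same atom through smaller schemes such as ('AI',) or ('AI', 'nC') reproduces the remaining columns of Table 3.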

As seen in Table 3, only atom types are annotated; the bond types (e.g., single bond, double bond) are implicit in the atom annotations. It should be noted that once the annotation scheme is specified, the same annotations are used for all compounds in the training set. In other words, it is not possible to specify different annotation schemes for different compounds. However, the algorithm can be executed multiple times with different annotation schemes and the results can be used to compare their performance. Figure 18 shows a partial output of the fragments generated when the algorithm was run twice for m-ethyl phenol; the first time using the {AI} and the second time using the

{AI, nC, nH} annotation schemes.

Figure 18: Unique fragments generated using different annotation schemes (partial output)

As seen in Figure 18, increasing the number of annotations in the annotation scheme helps to better distinguish between different fragments. For example, when the {AI} annotation scheme was used, the carbon-carbon connection was identified only as [‘C’, ‘C’], but using the {AI, nC, nH} annotation scheme, the same connection was identified by 4 different fragments: [‘C30’, ‘C21’], [‘C21’, ‘C21’], [‘C30’, ‘C22’], and [‘C22’, ‘C13’]. Thus, selecting more annotation options provides better resolution and helps to capture structural information in greater detail. However, the drawback is that using more annotations is computationally expensive and usually yields many more fragments than desired. These additional fragments may not necessarily capture more details about the chemical structure.

4.3 Compound-Fragment Data Matrix

After identifying the unique fragments from all compounds in a training set, a compound-fragment data matrix is generated with compounds in rows and fragments in columns. Each entry in this table is ‘1’ if the fragment is present in the compound and ‘0’ otherwise. Thus, each compound is uniquely represented as a binary vector over all the fragments, also called the fingerprint of the compound. Figure 19 shows a typical compound-fragment data matrix for a training set containing m compounds and n fragments.

Figure 19: Typical compound-fragment data matrix
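Construction of this matrix can be sketched as follows; the two-compound training set and its {AI}-annotated fragments are hypothetical, chosen only to illustrate the layout.

```python
def fragment_matrix(compound_frags):
    """Build a binary compound-fragment data matrix. Rows are
    compounds, columns are the union of unique fragments; an entry
    is 1 if the fragment occurs in the compound and 0 otherwise."""
    columns = sorted(set().union(*compound_frags.values()))
    rows = {name: [1 if frag in frags else 0 for frag in columns]
            for name, frags in compound_frags.items()}
    return columns, rows

# hypothetical two-compound training set with {AI}-annotated fragments
training = {'compound_1': {('C', 'C'), ('C', 'O')},
            'compound_2': {('C', 'C'), ('C', 'N')}}
columns, rows = fragment_matrix(training)
# columns: [('C', 'C'), ('C', 'N'), ('C', 'O')]
```

Each row of the result is the fingerprint of one compound over the shared fragment columns.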

This algorithm was executed for a training set containing three compounds as shown in

Figure 20. A part of the actual output generated is shown in Figure 21 and Figure 22. In total, 33 unique fragments were generated when the {AI} annotation scheme was used and 90 unique fragments were generated when the {AI, nC, nH} annotation scheme was used. The output shown here is only a small subset of the fragments generated.


Figure 20: Sample training set

Figure 21: Compound-fragment data matrix (partial) using {AI} annotation scheme

Figure 22: Compound-fragment data matrix (partial) using {AI, nC, nH} annotation scheme

The scripts for generating annotated linear fragments and the compound-fragment data matrix are included in Appendix A. We integrated these Python scripts into Microsoft Excel and developed a seamless workflow with a user-friendly interface. This interface allows the user to specify names for the input SD file and the output Excel or text (.tsv format) file for storing the compound-fragment data matrix, select annotation options and fragment lengths, remove singleton and doubleton fragments (sparsely observed fragments), and generate histogram outputs displaying the distribution of fragment lengths. A snapshot of this Excel interface is shown in Figure 23.


Figure 23: Excel interface for generating compound-fragment data matrix using annotated linear chemical fragments


4.4 Developing Markov Chain Models

Markov chains are very useful for modeling linear sequences of events. A Markov chain model assumes that the probability of each event depends only on the outcome of the preceding event and not the entire previous sequence. In formal terms, the Markov property is that

Pr{X_{n+1} = j | X_0 = i_0, …, X_{n-1} = i_{n-1}, X_n = i_n} = Pr{X_{n+1} = j | X_n = i_n}

for all time points n and all states i_0, …, i_{n-1}, i, j.

The probability of X_{n+1} being in state j given that X_n is in state i is called the one-step transition probability and is denoted by P_{i,j}^{n,n+1}. When the one-step transition probabilities are independent of the time variable n, the Markov chain is said to have stationary transition probabilities; thus, P_{i,j}^{n,n+1} = P_{i,j}. A stationary Markov chain is completely defined by its transition probability matrix and the specification of the initial probability distribution of the states.

Markov chains have found some use in chemoinformatics applications such as the generation of new molecules using computer-assisted design tools68.

Before beginning the discussion on the development of Markov chains for modeling toxicity, it is important to note the difference between global and local classification models. Global models are those that make use of all relevant information from the training set in order to make prediction on a test compound, whereas local models are those that use information from compounds that are identified to be similar using fingerprint or descriptor similarity calculations69.

A typical example of a global model is the standard least squares regression model, and a typical example of a local model is the k-nearest neighbors (kNN) model. The Markov chain models developed in this research fall under the category of global models, where information from all compounds in the training set is used to build the model for classification of the test set.

We developed Markov chain models for predicting toxicity and classifying test compounds by using the chemically annotated atom symbols. As a first step, all unique symbols from the training set are identified based on the annotation scheme selected. A count matrix is then generated by counting the number of transitions from one symbol to another using structural information contained in the SD file. The transition probability matrix (TPM) is then obtained by summing the counts in each row and dividing each element of the count matrix by the corresponding row sum.

Table 4 and Table 5 show a typical example of count matrix and TPM for a hypothetical training set with four unique annotated symbols – C1, C2, C3, and O1.

C1 C2 C3 O1

C1 0 10 12 6

C2 10 2 5 3

C3 12 5 1 7

O1 6 3 7 1

Table 4: Example of count matrix

C1 C2 C3 O1

C1 0.00 0.36 0.43 0.21

C2 0.50 0.10 0.25 0.15

C3 0.48 0.20 0.04 0.28

O1 0.35 0.18 0.41 0.06

Table 5: Example of transition probability matrix (generated using count matrix data in Table 4)
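The row-normalization step described above translates directly into code; applying it to the Table 4 counts reproduces the Table 5 values.

```python
def transition_probability_matrix(count_matrix):
    """Convert a symbol-symbol count matrix into a transition
    probability matrix by dividing each entry by its row sum,
    so that every row sums to 1."""
    return [[c / sum(row) for c in row] for row in count_matrix]

# count matrix from Table 4 (symbols C1, C2, C3, O1)
counts = [[0, 10, 12, 6],
          [10, 2, 5, 3],
          [12, 5, 1, 7],
          [6, 3, 7, 1]]
tpm = transition_probability_matrix(counts)
# e.g. tpm[1][0] = 10/20 = 0.50, as in Table 5
```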


It should be noted here that the count matrix is always symmetric but the TPM is not. The main criterion for a TPM is that all its rows should sum to 1. Now, in order to develop a working

Markov chain model, we had to overcome several challenges. First, chemical paths do not have any inherent direction. Thus, the Markov probability for any given sequence must be the same regardless of whether the sequence is read left-to-right or right-to-left. This is generally not the case because the TPM is not a symmetric matrix.

Secondly, it should be considered that the test set might contain several fragments that were never observed in the training set. Thus, we had to find a way to assign finite probabilities to all possible transitions. Lastly, the test set might contain new symbols that were not present in the training set.

Thus, we had to find an effective way to incorporate new symbols in the analysis. These constraints also helped to define the domain of applicability for the models.

In order to overcome these challenges, we came up with a preliminary Markov chain model based on one-step connection probabilities. Since the word ‘transition’ inherently implies direction, we used the idea of ‘connection’ probability between two symbols. As the word suggests, connection probability is the probability of observing any two symbols connected to each other in the 2D molecular graph. Thus, it takes into account only fragments of length 2, or in other words, two symbols that are at a distance of one bond length from each other. These are referred to as one- step connections. This approach reduces the TPM into a vector of length n(n+1)/2, where n is the total number of unique symbols. Corresponding probabilities are then obtained by summing all the counts and dividing each element by the sum. Table 6 shows an example of connection count vector and corresponding probabilities.


Connection count Connection probability

C1-C1 0 0

C1-C2 10 0.21

C1-C3 12 0.26

C1-O1 6 0.13

C2-C2 2 0.04

C2-C3 5 0.11

C2-O1 3 0.06

C3-C3 1 0.02

C3-O1 7 0.15

O1-O1 1 0.02

Table 6: Example of connection vector with counts and probabilities (generated using count matrix data in Table 4)
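Collapsing the count matrix into the undirected connection vector can be sketched as follows; with the Table 4 counts (total 47), it reproduces the Table 6 probabilities.

```python
from itertools import combinations_with_replacement

def connection_probabilities(count_matrix, symbols):
    """Collapse a symmetric count matrix into undirected one-step
    connection counts and probabilities: a vector of length
    n(n+1)/2 for n unique symbols."""
    counts = {}
    for i, j in combinations_with_replacement(range(len(symbols)), 2):
        counts[(symbols[i], symbols[j])] = count_matrix[i][j]
    total = sum(counts.values())
    probs = {pair: c / total for pair, c in counts.items()}
    return counts, probs

symbols = ['C1', 'C2', 'C3', 'O1']
cm = [[0, 10, 12, 6],        # count matrix from Table 4
      [10, 2, 5, 3],
      [12, 5, 1, 7],
      [6, 3, 7, 1]]
counts, probs = connection_probabilities(cm, symbols)
# e.g. probs[('C1', 'C3')] = 12/47 ≈ 0.26, as in Table 6
```

Only the upper triangle of the matrix is visited, which is what reduces the TPM to a vector of length n(n+1)/2.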

Now, suppose we know a binary toxicity outcome (positive (POS)/ negative (NEG)) for all compounds in this training set. We can then split the connection counts and corresponding probabilities in Table 6 into two sets depending on which chemicals are assigned to each category.

This is shown in Table 7, where the probability ratios are calculated as P_POS/P_NEG. The data are shown graphically in Figure 24 and Figure 25.


          Connection count        Connection probability
          Positive  Negative     Positive  Negative   Probability
          (C_POS)   (C_NEG)      (P_POS)   (P_NEG)    ratio

C1-C1 0 0 0 0 NaN

C1-C2 6 4 0.22 0.2 1.1

C1-C3 10 2 0.37 0.1 3.7

C1-O1 4 2 0.15 0.1 1.5

C2-C2 1 1 0.04 0.05 0.8

C2-C3 2 3 0.07 0.15 0.47

C2-O1 1 2 0.04 0.10 0.4

C3-C3 1 0 0.04 0 NaN

C3-O1 2 5 0.07 0.25 0.28

O1-O1 0 1 0 0.05 0

Table 7: Connection count vector and corresponding probabilities for training set with binary toxicity outcome (generated using data from Table 6)


Figure 24: One-step connection probabilities (based on data in Table 7)

Figure 25: One-step connection probability ratios (based on data in Table 7)


As seen here, some one-step connections such as C1-C1, C1-O1, and C2-C2 are observed with roughly equal probabilities in both the POS and NEG class of compounds. These connections do not have much discriminating power and are not of much use from the modeling standpoint.

There are other connections such as C2-C3, C2-O1, and C3-O1 that are mainly observed in the

NEG class of compounds. These connections can be considered as safe and their presence in compounds increases their likelihood of being non-toxic. On the other hand, there are connections such as C1-C3 that are mainly observed in the POS class of compounds. These connections are considered as alerts and their presence in compounds increases their likelihood of being toxic.

After calculating the connection counts and probabilities, we calculate the sequence probability for a fragment having length greater than 2. Consider the fragment [‘C2’, ‘C1’, ‘O1’] for example. There are 2 one-step connections in this fragment, viz. [‘C2’, ‘C1’] and [‘C1’, ‘O1’]. It should be noted here that the connection [‘C2’, ‘C1’] is the same as [‘C1’, ‘C2’]. The sequence probability of this fragment is called its likelihood and is calculated as follows:

likelihood = p(C1-C2) × p(C1-O1)

A log-likelihood is then calculated by taking the natural log of both sides. We then calculate the difference in log-likelihood under the POS and NEG Markov models. Thus, we get a log-likelihood for each fragment.

log-likelihood = log(p(C1-C2)) + log(p(C1-O1))

(log-likelihood)_{fragment i} = (log-likelihood)_i^{POS} − (log-likelihood)_i^{NEG}


This information can then be used to identify important structural alerts. For fragments with positive log-likelihood values, the higher the value, the greater the probability that the fragment is a potentially toxic structural alert. Similarly, for fragments with negative log-likelihood values, the higher the absolute value, the greater the probability that the fragment is potentially safe. We repeat this analysis for all the fragments in the compound-fragment data matrix. An overall log-likelihood for the whole compound is then calculated as:

(log-likelihood)_compound = Σ_{i=1}^{m} (log-likelihood)_{fragment i}

where m is the number of fragments in the compound-fragment data matrix observed in the compound of interest.

If this overall log-likelihood is calculated to be greater than or equal to zero, then the compound is classified as POS. If it is less than zero, then the compound is classified as NEG. An overview of this classification strategy is shown graphically in Figure 26.

If a test compound contains atom symbols that were not observed in the training set, then that test compound is considered to be out of the model’s applicability domain and is classified as

‘NP’ (not predicted). If a test compound contains at least one connection observed in only POS class and at least one connection observed in only NEG class, then that test compound is out of the model’s applicability domain and is classified as ‘NP’. If a test compound contains atom symbols that were all observed in the training set, but contains certain one-step connections that were not observed in the training set, then those connections are skipped and prediction is made on the test compound using the remaining one-step connections.



Figure 26: Overview of classification strategy


4.5 Evaluating Model Performance

We can explore different Markov chain models with different ranges of fragment lengths and annotation schemes. In order to choose the best model, we need to evaluate the performance of each model. The performance of any model is determined by its prediction error, i.e., how well the model makes predictions about new data. This is called external validation of the model, where predictions are made on a completely different set of compounds that were in no way part of the training set used to train the model. This measure of performance is called the Mean Squared Prediction Error (MSPE).

A statistical technique called k-fold cross-validation was used to determine the MSPE. In this technique, the dataset is split into k roughly equal-sized parts. The model is trained using data from k−1 parts and is validated on the remaining part. This is repeated for all k parts and an average MSPE is calculated. This average MSPE is used as the measure of performance for evaluating the efficiency of the model. Three terms are important in calculating the model performance: sensitivity, specificity, and concordance. Sensitivity measures the proportion of true positives and specificity measures the proportion of true negatives. Concordance is the proportion of true predictions and measures the overall accuracy of the model. Thus,

Sensitivity = Pr(Y_pred = 1 | Y = 1)

Specificity = Pr(Y_pred = 0 | Y = 0)

In order to calculate these performance parameters, the classification results are generally compiled into a 2x2 confusion matrix. This matrix classifies the results into 4 categories, namely true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as shown in

Table 8. The sensitivity, specificity, and concordance values are then calculated using the following equations.

                     Predicted
                 Positive   Negative
Actual Positive     TP         FN
       Negative     FP         TN

Table 8: General 2x2 confusion matrix

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Concordance = (TP + TN) / (TP + FP + TN + FN)

An effective way to display this information graphically is a receiver operating characteristic (ROC) curve. The ROC curve plots sensitivity as a function of (1 − specificity) for the outcomes predicted from different models. As a rule of thumb, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the model.
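The three performance equations translate directly into code; the confusion-matrix counts in the example below are made up for illustration.

```python
def performance(tp, fn, fp, tn):
    """Sensitivity, specificity, and concordance from the four cells
    of the 2x2 confusion matrix (TP, FN, FP, TN)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, concordance

# hypothetical results: 40 TP, 10 FN, 20 FP, 30 TN
sens, spec, conc = performance(40, 10, 20, 30)   # → 0.8, 0.6, 0.7
```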


CHAPTER 5: RESULTS ON IDENTIFICATION OF STRUCTURAL ALERTS

5.1 Distinguishing Structurally Similar Compounds

We first explain the results obtained using two specific compounds, namely β-Terpinene and β-Phellandrene. This analysis demonstrates how the annotated linear chemical fragments help to distinguish between apparently similar-looking compounds based on the annotation options selected. The chemical structures of β-Terpinene and β-Phellandrene are shown in Figure 27.

Figure 27: Two isomers: β-Terpinene and β-Phellandrene

In chemoinformatics methods, the general assumption is that chemicals with similar structures have similar properties. Although this assumption holds true most of the time, it is not always the case, especially when complex endpoints such as chemically-induced toxicity are considered70. The two isomers shown above have similar properties for most endpoints such as density, boiling point, vapor pressure, and refractivity. However, they differ significantly in their

skin sensitization potencies and bio-concentration factors (BCF). The properties of these two isomers were found using ChemSpider71 and are shown in Table 9.

                            β-Terpinene      β-Phellandrene     Units

Density                     0.8 ± 0.1        0.8 ± 0.1          g/cc

Boiling point               173.5            175                °C

Vapor pressure              1.7 ± 0.1        1.6 ± 0.1          mmHg

Flash point                 46.5 ± 15.2      44.0 ± 13          °C

Refractive index            1.467            1.467

Surface tension             25.4 ± 5         25.4 ± 5           dyne/cm

LogP                        4.47             4.35

BCF (at pH 5.5)             893.99           700.94

Skin sensitization potency  Non-sensitizer   Strong sensitizer

Table 9: Properties of two isomers, β-Terpinene and β-Phellandrene

As seen in Table 9, β-Terpinene is a non-sensitizer to skin, whereas β-Phellandrene is a strong skin sensitizer. Similarly, the BCF of β-Terpinene at pH 5.5 is 893.99, whereas the BCF of β-Phellandrene at pH 5.5 is 700.94. This phenomenon, where a set of similar chemicals show similar properties for most endpoints but differ significantly with respect to some other endpoints, is commonly referred to as an activity cliff72,73. A simplistic representation of this phenomenon is depicted in Figure 28 using properties of the two isomers. Thus, it is not always trivial to predict all the properties of compounds from their chemical structures, especially properties that are determined by complex mechanisms of action.

Figure 28: Activity cliff phenomenon for chemical properties

The annotated linear chemical fragments developed in this research have been found to provide some explanation for this phenomenon for the two isomers under consideration. In order to explain this, we calculated a similarity metric between these two compounds based on Tanimoto distances. Tanimoto distance (TD) between two compounds is defined as

TD = 1 − c / (a + b − c)

where a = number of on-bits (1) in compound 1;
b = number of on-bits (1) in compound 2;
c = number of on-bits (1) in common between compounds 1 and 2.
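With fingerprints represented as sets of on-bits, this definition is a short function; the bit names in the example are hypothetical.

```python
def tanimoto_distance(fp1, fp2):
    """Tanimoto distance between two binary fingerprints given as
    sets of on-bits: TD = 1 - c/(a + b - c)."""
    c = len(fp1 & fp2)                  # on-bits in common
    return 1 - c / (len(fp1) + len(fp2) - c)

# hypothetical fragment fingerprints: a=3, b=3, c=2
tanimoto_distance({'f1', 'f2', 'f3'}, {'f2', 'f3', 'f4'})   # → 0.5
```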

Tanimoto distances range from 0 to 1, where 0 means completely similar and 1 means completely dissimilar. We calculated Tanimoto distances between the two isomers using different annotation schemes. Table 10 shows the results obtained when fragments of different lengths were considered in this analysis.


Path lengths {AI} {AI, nC} {AI, nC, nH}

2-6 0 0 0.89

3-5 0 0 0.91

5-6 0 0 1

Table 10: Tanimoto distances between the two isomers

It can be seen from Table 10 that the two isomers are indistinguishable when the {AI} and {AI, nC} annotation schemes are used. However, when the {AI, nC, nH} annotation scheme is used, the two isomers are found to be very dissimilar. Thus, it can be concluded that for most properties such as density, boiling point, and vapor pressure, only the atom identity (AI) and number of connections (nC) are important, whereas for complex properties such as skin sensitization and BCF, the positioning of the hydrogen atoms (nH) is extremely important. This explanation for the activity cliff phenomenon is shown in Figure 29. Thus, the annotated linear fragments help to identify subtle differences in chemical structures and can hint at possible reaction mechanisms for complex phenomena such as skin sensitization and BCF.

Figure 29: Possible explanation of activity cliff phenomenon using annotated linear chemical fragments

5.2 Identifying Structural Alerts for Skin Sensitization

This section describes the analysis of different annotation schemes and fragment lengths in the identification of relevant structural alerts from the skin sensitization dataset. As mentioned in the Training Datasets section (3.1) of the Datasets and Computational Tools chapter, the skin sensitization dataset contains 467 compounds classified into five categories based on their relative skin sensitizing potencies. The objective of this evaluation is to explore the fragment space to see whether descriptors related to skin sensitizing effects can be identified.

In order to do so, we analyzed and compared the performance of five different annotation schemes: {AI}, {AI, nC}, {AI, nC, nH}, {nC, nH, PC}, and {AI, nC, nH, PC}. For each annotation scheme, we first processed all the compounds through the search algorithm and extracted linear fragments of lengths 2 to 15. Figure 30 shows the total number of fragments generated using the five annotation schemes discussed above.
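The extraction step amounts to enumerating simple paths in the molecular graph. The depth-first sketch below captures the idea under simplified assumptions (an adjacency-dict encoding and a small hypothetical molecule); it is not the dissertation's actual search algorithm:

```python
def linear_fragments(adjacency, labels, min_len=2, max_len=15):
    """Enumerate annotated linear fragments (simple paths) of min_len..max_len
    atoms from a molecular graph given as an adjacency dict. Each fragment is
    canonicalized as the lexicographically smaller of its two read directions,
    so a path and its reverse count once."""
    found = set()

    def extend(path):
        if len(path) >= min_len:
            symbols = tuple(labels[i] for i in path)
            found.add(min(symbols, symbols[::-1]))
        if len(path) == max_len:
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:          # simple paths only: no revisits
                extend(path + [nxt])

    for start in adjacency:
        extend([start])
    return found

# Hypothetical 4-atom chain O-C-C-C, annotated with {AI} labels only
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "O", 1: "C", 2: "C", 3: "C"}
frags = linear_fragments(adj, labels, min_len=2, max_len=4)
```

For this chain the five canonical fragments are (C,O), (C,C), (C,C,O), (C,C,C), and (C,C,C,O).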

Figure 30: Total fragments generated from skin sensitization dataset with fragment lengths between 2 and 15 using 5 different annotation schemes


As expected, Figure 30 shows that more fragments are generated when more annotation options are selected. This is because selecting more annotation options increases the resolution for distinguishing between different fragments, thus generating more fragments in the process. It is also observed that more fragments are generated when PC annotation is used instead of AI (comparing annotation schemes 3 and 4), even though the PC annotation has only three possible states. Figure 31 shows the distribution of fragment lengths using the five annotation schemes.

Figure 31: Fragment distribution for skin sensitization dataset using 5 different annotation schemes

This chart gives an idea of the structural diversity of chemicals in the dataset and can be used to characterize groups of chemicals. It can be seen that these distributions are unimodal. This is because there are fewer unique fragments at the extreme path lengths: a large number of fragments are generated at smaller path lengths, but most of them are redundant, while only a few fragments are generated at longer path lengths, as these are limited by the size of compounds in the dataset.

It can also be seen that the number of fragments generated is quite large. For example, using the {nC, nH, PC} annotation scheme, 34,925 fragments are generated in total. Thus, the next step is to reduce the dimension of this fragment space further. This is done by removing those fragments that are observed in fewer than 3 compounds out of the 467 compounds. These fragments are called singletons and doubletons. They are observed very sparsely, and it is assumed that they cannot help in explaining the sensitization potency of compounds. Figure 32 shows the total number of fragments generated and Figure 33 shows the corresponding fragment distribution after the singletons and doubletons have been removed.
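The singleton/doubleton filter can be sketched as follows; the fragment names and the five-compound dataset are hypothetical:

```python
from collections import Counter

def remove_rare_fragments(compound_fragments, min_support=3):
    """Drop fragments observed in fewer than min_support compounds
    (singletons and doubletons for min_support=3).
    compound_fragments: one set of fragments per compound."""
    support = Counter()
    for frags in compound_fragments:
        support.update(set(frags))   # count each fragment once per compound
    keep = {f for f, n in support.items() if n >= min_support}
    return [frags & keep for frags in compound_fragments], keep

# Hypothetical dataset of five compounds: fragment A appears in 3 compounds,
# B in 3, C in 2 (doubleton), D in 1 (singleton).
dataset = [{"A", "B"}, {"A", "C"}, {"A", "B"}, {"B", "D"}, {"C"}]
filtered, kept = remove_rare_fragments(dataset)
```

Only A and B survive the filter; the last compound ends up with an empty fragment set.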

Figure 32: Total fragments generated from skin sensitization after removing singletons/doubletons

As seen from Figure 32, the total number of fragments generated reduces drastically. After removing singletons and doubletons, the total number of fragments generated using {nC, nH, PC} annotation scheme reduces to 2,914.


Figure 33: Fragment distribution for skin sensitization dataset after removing singletons/doubletons

It can be seen from Figure 33 that for path lengths less than 6, the number of fragments increases with increasing annotation options; while for path lengths greater than 6, they gradually decrease. This fragment space is further reduced by considering a smaller range of path lengths.

The analysis that follows is based on fragments with path lengths 3 to 7. Figure 34 and Figure 35 show histograms of the fragments categorized by their length before and after removing singletons/doubletons using the {nC, nH, PC} annotation scheme. It can be seen from Figure 34 that the total number of fragments generated increases with increasing fragment length. Also, the proportion of singletons and doubletons increases as the fragment length increases. After removing the singletons and doubletons, it can be seen from Figure 35 that the total number of fragments generated drops drastically, especially for fragments of higher lengths. The total number of fragments generated thus forms a unimodal distribution with respect to fragment length.


Figure 34: Histogram of fragments using {nC, nH, PC} annotation scheme (before removing singletons/doubletons)

Figure 35: Histogram of fragments using {nC, nH, PC} annotation scheme (after removing singletons/doubletons)


In order to identify distinguishing fragments, we generate a contingency table for each fragment. A contingency table contains counts of compounds in which the fragment is present or absent. These compound counts are classified by LLNA category. Table 11 shows the contingency table for fragment [`10-', `30+', `10-'] generated using {nC, nH, PC} scheme.

             NON        WEAK       MOD        STR       Total
Present      6          3          8          17        34
(Expected)   (10.05)    (7.72)     (9.32)     (6.92)
Absent       132        103        120        78        433
(Expected)   (127.95)   (98.28)    (118.68)   (88.08)
Total        138        106        128        95        467

Table 11: Contingency table for fragment [`10-', `30+', `10-']

As seen from this table, the fragment [`10-', `30+', `10-'] is present in 6 compounds in the NON category, 3 compounds in the WEAK category, 8 compounds in the MOD category, and 17 compounds in the STR category. The null hypothesis (H0) is that the presence/absence of the fragment has no effect on the degree of skin sensitization. Thus,

$$p_{NON} = p_{WEAK} = p_{MOD} = p_{STR} = \frac{34}{467} = 0.0728$$

This is the proportion of compounds in which we expect to see the fragment assuming that H0 is true. We then compute the counts in each category using this expected proportion. These expected counts are shown in parentheses in Table 11. We then compute a χ² test statistic as follows:


$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{4}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where O is the observed count, E is the expected count, i indexes the presence/absence of the fragment, and j indexes the LLNA category. This test statistic is called Pearson's χ² statistic, and it is distributed as χ²(3) when H0 is true. Assuming a significance level (α) of 0.01, a fragment is considered to be significant if its χ² statistic is greater than 11.3. Fragments with higher χ² values tend to be more distinguishing. The χ² statistic for [`10-', `30+', `10-'] is found to be 20.91.

Thus, this fragment is statistically significant and it helps to distinguish between strong and weak skin sensitizing compounds.
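The χ² computation for Table 11 can be reproduced with a short plain-Python sketch; the dissertation reports 20.91, and rounding in the expected counts accounts for the difference in the last digit:

```python
def pearson_chi2(observed_rows):
    """Pearson chi-square statistic for an r x c contingency table
    given as a list of row lists."""
    row_totals = [sum(r) for r in observed_rows]
    col_totals = [sum(c) for c in zip(*observed_rows)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed_rows):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand   # expected count
            chi2 += (obs - exp) ** 2 / exp
    return chi2

# Counts from Table 11 for fragment ['10-', '30+', '10-']
present = [6, 3, 8, 17]       # NON, WEAK, MOD, STR
absent = [132, 103, 120, 78]
chi2 = pearson_chi2([present, absent])
print(round(chi2, 2))  # 20.92, well above the 11.3 cutoff for alpha = 0.01
```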

We then prepare a list of all fragments sorted by their χ² values. This helped us to identify 174 significant fragments with path lengths between 3 and 7. We are now interested in only those fragments that are positively correlated with skin sensitization. Thus, we calculate a γ-statistic for each fragment and check whether the correlation is positive. The γ-statistic measures the correlation between two ordinal variables and is calculated as

 −   = 퐶 퐷 퐶 + 퐷

where C and D are the number of concordant and discordant pairs respectively. We then sort the fragments by their respective  values. Fragments with  values greater than 0.35 were considered to be positively correlated with skin sensitization. This narrowed down the list to 131 fragments that are significant (2 > 11.3) as well as positively correlated ( > 0.35).


Table 12 shows the counts of total number of fragments generated, significant fragments, and positively correlated fragments for each annotation scheme. The complete results (for significant and positively correlated fragments) with actual statistics are included in Appendix B.

Annotation scheme    Total fragments   After removing           Significant fragments   Positively correlated fragments
                                       singletons/doubletons    (χ² > 11.3)             (χ² > 11.3 and γ > 0.35)
{AI}                 585               299                      35                      17
{AI, nC}             4895              1548                     107                     75
{AI, nC, nH}         8812              1956                     163                     103
{nC, nH, PC}         12684             2233                     174                     131
{AI, nC, nH, PC}     13330             2232                     174                     135

Table 12: Summary statistics for skin sensitization dataset

The list of fragments thus obtained is hierarchical by nature. In other words, many different fragments refer to the same descriptor. For example, we find that the fragment [`10-', `30+', `30+', `21+', `30+'] is significant, and thus, many smaller fragments that are contained within this fragment are also found to be significant. In order to eliminate this confounding, we manually go through each fragment in the narrowed-down list and identify the descriptor it corresponds to. These unique descriptors are shown in Table 13 along with the annotation schemes that were able to identify them. The ‘Descriptor’ column lists the fragments identified along with the descriptors that they correspond to. The ‘Representative structure’ column depicts these fragments, where they are highlighted in red within a representative compound structure.

Table 13: Significant fragments in skin sensitization dataset along with the annotation schemes that were able to identify them. (In the original table, check marks indicate which of the five annotation schemes identified each fragment, and each fragment is highlighted in red within a representative compound structure.)

Fragment 1: [N,C,C,C,N,O] (Aromatic dinitro)
Fragment 2: [O,N,O] (Nitro)
Fragment 3: [O,C,O,C,O] (Ring esters)
Fragment 4: [Cl,C,C,C,C,N] (p-chloro anilines)
Fragment 5: [O1,C3,C3,C2,C3,O1] (Phenolic esters)
Fragment 6: [N1,C3,C3,C2,C3,C2] (o- and p-substituted anilines)
Fragment 7: [N1,C3,C2,C2,C3,N1] (p-phenylene diamines)
Fragment 8: [Cl1,C3,C3,C2,C3] (o- and p-substituted chlorobenzenes)
Fragment 9: [C2,N3,C3,C2,C3] (tert-aromatic amines)
Fragment 10: [O11,C30,C21,C30,C21] (m-substituted phenols)
Fragment 11: [O11,C30,C30,C30,O11] (Diphenols)
Fragment 12: [C21,C21,C21,O10] (α,β-unsaturated aldehydes)
Fragment 13: [O10,C30,C21,C21] (α,β-unsaturated ketones)
Fragment 14: [30+,21+,20-] (5-member ring containing N)
Fragment 15: [10-,220,300,21-] (Benzyl halide; Br can be replaced by other halogen atoms)

These results show that the annotated linear chemical fragments are able to capture important descriptors responsible for imparting high skin sensitization potential to chemicals. Most of the descriptors identified in Table 13 are known to cause skin sensitizing effects, and their mechanism of action is well-known. In their paper, Aptula et al.52 describe the common reaction mechanisms by which chemicals induce skin sensitization and classify compounds based on their reaction mechanism. In the following paragraph, we discuss the five classes of compounds described by them and see how our annotated fragments correspond to them.

The first class of compounds is the Michael and pro-Michael type acceptors. These are mainly α,β-unsaturated aldehydes and ketones. As seen in Table 13, we were able to identify these using the last three annotation schemes (fragments 12 and 13). The second class of compounds is the SNAr electrophiles. These are mainly chlorinated dinitrobenzenes. We were able to identify these using all five annotation schemes (fragment 1). The third class of compounds is the SN2 electrophiles. These are mainly substituted 5-member ring compounds containing N and S. We were able to identify these using annotation scheme 4 (fragment 14). The fourth class of compounds is the Schiff base formers. These are mainly aliphatic aldehydes and activated ketones. We were able to identify these somewhat indirectly, as seen in fragments 8 and 11. These fragments correspond to substituted chlorobenzenes and diphenols, but they have a carbonyl group attached to an aromatic ring that might be responsible for their skin sensitizing activity. The fifth class of compounds is the acylating agents. These are mainly esters of acidic alcohols such as phenols and carboxylic anhydrides. We were able to identify these using almost all of the five annotation schemes (fragments 3 and 5). Apart from these, we were also able to identify several other fragments such as diamines (fragment 7) and aromatic tertiary amines (fragment 9). These fragments might hint at a different reaction mechanism by which chemicals induce skin sensitization, thus potentially leading to the development of newer structural alerts. This analysis shows that the annotated chemical fragments are effective in capturing important structural descriptors from a given set of chemical structures with a defined toxicity endpoint.


We can also see from Table 12 that the total number of fragments generated is relatively low, especially after removing the singleton and doubleton fragments. This is very helpful in reducing computational time and in identifying only those fragments that are highly probable to cause skin sensitizing effects. These fragments are also easily convertible to the structural fragments that they correspond to in the actual chemical structures. This makes them more meaningful and interpretable by chemists for determining their mechanism of action. And even though the fragments are linear in composition, they are capable of capturing branched fragments as well due to the power of annotating features. For example, consider the o- and p-substituted anilines (fragment 6). The fragment identified in this case is the benzene ring and attached –NH2 group. However, because of the ‘nC’ and ‘nH’ annotating features, we were able to capture the ortho- and para- substituted ring structure as well. This makes our method more powerful and enables it to capture higher dimensional information.

5.3 Comparison of Different Annotation Schemes

As seen in Table 13, 15 unique descriptors were identified by using different combinations of the annotating features provided. The first three descriptors (nitro compounds, dinitro compounds, and ring esters) were identified by all five annotation schemes. This suggests that these descriptors are quite obvious and can be identified relatively easily from a sufficiently large set of compounds. The next descriptor (p-chloro anilines) was identified by the {AI} scheme alone. When more annotating features are selected, the structural resolution increases and the information regarding the descriptors under consideration can get diffused, making it more difficult to identify them. Thus, these descriptors could be identified using the ‘AI’ feature alone, suggesting that they might be responsible for weak or moderate sensitizing effects.


When the ‘nC’ feature was added to ‘AI’, we were able to identify five more descriptors (phenolic esters, anilines, dianilines, chlorobenzenes, and aromatic tertiary amines). Then, adding the ‘nH’ feature, we were able to identify four more descriptors (phenols, diphenols, and unsaturated aldehydes and ketones). It can be seen that the {AI, nC, nH} scheme was able to identify almost all the descriptors described in Table 13. Thus, we can conclude that this annotation scheme has the right combination of annotating features that gives the optimum resolution for a set of compounds with skin sensitization as the toxicity endpoint. When the ‘PC’ feature was added, we were able to identify two more descriptors (5-member ring compounds containing N, and benzyl halides). Thus, partial charge helped in capturing additional information that was not previously captured by atom identity. In the benzyl halide case, the bromine atom could be substituted by other halogen atoms. This shows that the identity of the atom in this case is not important. What is important is that a negatively charged halide atom is attached to a benzyl group.

However, there is also a downside to using the PC feature. As seen in Table 13, annotation scheme {nC, nH, PC} was not able to identify diamines, chlorobenzenes, and aromatic tertiary amines (fragments 7 - 9). Thus, specifying partial charge can sometimes over-specify structural information, thereby missing some important descriptors in the dataset.

This is clearly seen when the fifth annotation scheme {AI, nC, nH, PC} was used. The simultaneous use of ‘AI’ and ‘PC’ features over-specified the atomic structural information and thus failed to identify five of the important descriptors described in Table 13. This shows that an annotation scheme should be wisely chosen depending on the dataset and toxicity endpoint under consideration. An optimized annotation scheme will help to provide the best resolution for capturing structural information as well as reduce the computational time by avoiding the generation of less meaningful fragments.


5.4 Identifying Structural Alerts for Ames Mutagenicity

This section describes the identification of structural alerts from the Ames mutagenicity dataset using different annotation schemes and fragment lengths. As discussed in the Training Datasets section (3.1) of the Datasets and Computational Tools chapter, Hansen et al.58 have compiled a dataset for Ames mutagenicity containing 6512 compounds. A part of this dataset containing 984 compounds (test set for CV 1) was used in this analysis.

As a first step, we extracted fragments with path lengths between 3 and 7 from this dataset using the five annotation schemes mentioned earlier. Figure 36 shows the total number of fragments generated. Figure 37 and Figure 38 show the distribution of fragment lengths before and after removing the singleton/doubleton fragments respectively. Figure 39 and Figure 40 show histograms of fragments categorized by their length before and after removing singletons/doubleton fragments respectively using the {AI, nC, nH} annotation scheme.

Figure 36: Total fragments generated from Ames mutagenicity dataset with path lengths between 3 and 7 using 5 different annotation schemes

Figure 37: Fragment distribution for Ames mutagenicity dataset before removing singleton/doubleton fragments

Figure 38: Fragment distribution for Ames mutagenicity dataset after removing singleton/doubleton fragments


Figure 39: Histogram of fragments using {AI, nC, nH} annotation scheme (before removing singleton/doubleton fragments)

Figure 40: Histogram of fragments using {AI, nC, nH} annotation scheme (after removing singleton/doubleton fragments)


In order to identify distinguishing fragments, we generate a 2x2 contingency table for each fragment. We use Fisher’s exact test for analysis and compute positive predictive value (PPV), negative predictive value (NPV), and odds ratio (OR) statistics for each fragment. Table 14 shows a 2x2 contingency table for a general fragment ‘F1’.

           Mutagenic   Non-mutagenic   Total
Present    N11         N12             n1.
Absent     N21         N22             n2.
Total      n.1         n.2             n..

Table 14: General 2x2 contingency table

Fisher’s exact test is developed by conditioning on row and column totals. The key result used is as follows.

$$\text{If } N_{11} \sim \mathrm{Bin}(n_{\cdot 1}, p) \text{ and } N_{12} \sim \mathrm{Bin}(n_{\cdot 2}, p),$$

$$\text{then } N_{11} \mid (N_{11} + N_{12} = t) \sim \mathrm{Hypergeometric}$$

It should be noted that this result is valid only when N11 and N12 are independent. In mathematical terms,

$$\Pr\{N_{11} = w \mid N_{11} + N_{12} = n_{1\cdot}\} = \frac{\binom{n_{\cdot 1}}{w}\binom{n_{\cdot 2}}{n_{1\cdot} - w}}{\binom{n_{\cdot\cdot}}{n_{1\cdot}}}$$


Let pM and pN denote the probabilities of observing the fragment in mutagenic and non-mutagenic compounds respectively. The hypotheses are then

$$H_0: p_M = p_N \qquad\qquad H_1: p_M > p_N$$

The test statistic is N11 and it has a hypergeometric distribution. Thus,

$$p\text{-value} = \Pr\{N_{11} > w \mid N_{11} + N_{12} = n_{1\cdot}\}$$

Since n.1 and n.2 are very large for our dataset, we can use the normal approximation to Fisher’s exact test. Thus,

$$N_{11} \sim N\!\left(n_{\cdot 1}\, p_1,\; n_{\cdot 1}\, p_1 (1 - p_1)\right)$$

$$N_{12} \sim N\!\left(n_{\cdot 2}\, p_2,\; n_{\cdot 2}\, p_2 (1 - p_2)\right)$$

This gives rise to the following test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_{\cdot 1}} + \frac{1}{n_{\cdot 2}}\right)}}$$

$$\text{where } \hat{p}_1 = \frac{N_{11}}{n_{\cdot 1}}; \quad \hat{p}_2 = \frac{N_{12}}{n_{\cdot 2}}; \quad \hat{p} = \frac{N_{11} + N_{12}}{n_{\cdot 1} + n_{\cdot 2}}$$


When H0 is true, this z-statistic is distributed as N(0, 1). Assuming a significance level (α) of 0.01, a fragment is considered to be statistically significant if its z-statistic value is greater than 2.33. Fragments with higher z-statistic values tend to be more distinguishing. A list of fragments is then prepared, sorted by their z-statistic values. We then compute the PPV, NPV, and OR statistics as follows.

$$PPV = \frac{N_{11}}{n_{1\cdot} + 2}$$

$$NPV = \frac{N_{22}}{n_{2\cdot} + 2}$$

$$OR = \frac{(N_{11} + 1)(N_{22} + 1)}{(N_{12} + 1)(N_{21} + 1)}$$
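These statistics can be computed together for any fragment's 2x2 table. The counts below are hypothetical, chosen only to illustrate a fragment that clears all three cutoffs:

```python
from math import sqrt

def fragment_stats(n11, n12, n21, n22):
    """z statistic (normal approximation to Fisher's exact test) plus
    smoothed PPV, NPV, and odds ratio for a fragment's 2x2 table.
    Rows: fragment present/absent; columns: mutagenic/non-mutagenic."""
    nc1, nc2 = n11 + n21, n12 + n22              # column totals n.1, n.2
    p1, p2 = n11 / nc1, n12 / nc2
    p = (n11 + n12) / (nc1 + nc2)                # pooled proportion
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / nc1 + 1 / nc2))
    ppv = n11 / (n11 + n12 + 2)                  # row total n1. plus 2
    npv = n22 / (n21 + n22 + 2)                  # row total n2. plus 2
    odds = (n11 + 1) * (n22 + 1) / ((n12 + 1) * (n21 + 1))
    return z, ppv, npv, odds

# Hypothetical counts for illustration only
z, ppv, npv, odds = fragment_stats(40, 5, 460, 479)
```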

Table 15 shows the counts of total number of fragments generated, significant fragments, fragments with PPV > 0.75, and fragments with OR > 4 for each annotation scheme.


Annotation scheme    Total fragments   After removing           Significant fragments   Fragments with   Fragments with
                                       singletons/doubletons    (z-stat > 2.33)         PPV > 0.75       OR > 4
{AI}                 1457              725                      44                      48               35
{AI, nC}             16272             5721                     257                     215              232
{AI, nC, nH}         30999             7999                     360                     361              411
{nC, nH, PC}         48858             10961                    443                     530              717
{AI, nC, nH, PC}     53369             11357                    460                     561              807

Table 15: Summary statistics for Ames mutagenicity dataset

As it was observed with the skin sensitization dataset, the list of fragments obtained here is also hierarchical by nature. We manually go through this list of fragments and identify the unique descriptors that they correspond to. These unique descriptors are structural alerts that help in the identification of potentially mutagenic compounds. Table 16 shows these unique descriptors along with the annotation schemes that were able to identify them. The ‘Descriptor’ column lists the fragments identified along with the descriptors that they correspond to. The ‘Representative structure’ column depicts these fragments, where they are highlighted in red within a representative compound structure.


Table 16: Significant fragments identified in Ames mutagenicity dataset. (In the original table, check marks indicate which of the five annotation schemes identified each fragment, and each fragment is highlighted in red within a representative compound structure.)

Fragment 1: [O,N,O] (Nitro)
Fragment 2: [O,N,C,C,C,N] (Aromatic dinitro)
Fragment 3: [O,C,O,N,C,O] (CO & NO groups)
Fragment 4: [O,C,N,N,O] (Nitroso)
Fragment 5: [C,N,N] (Azo-type)
Fragment 6: [N1,C3,C2,C2,C3] (Aromatic amines)
Fragment 7: [C3,C3,C3,C2,C2,C2,C3] (Polycyclic aromatic system)
Fragment 8: [N2,C3,S2,C2] (5-member ring containing N and S)
Fragment 9: [C2,C2,Cl1] (Aliphatic halides)
Fragment 10: [C3,N2,O1] (Hydroxyl amines)
Fragment 11: [C30,C30,C30,C31,O20] (Epoxides)
Fragment 12: [N30,C30,N20,C30,C30,N20] (Fused rings with alternating N atoms)

As seen in Table 16, the descriptors identified are quite different from the ones identified for the skin sensitization dataset. These results support the fact that the annotated linear fragments can capture meaningful descriptors responsible for inducing mutagenicity as well. Thus, depending on the dataset supplied and the toxicity endpoint of interest, the linear annotated fragments are able to capture relevant descriptors accordingly. This is of crucial importance for dynamically generated fragments because they are data-dependent and automatic.

Most of the descriptors identified in Table 16 are known to cause mutagenic effects and their mechanism of action is well-known. In their paper, Kazius et al.74 derive and validate toxicophores for mutagenicity prediction using a dataset of 4337 compounds. They identified 8 major classes of toxicophores, which were further expanded and characterized into a set of 29 specific descriptors. This set of descriptors enabled them to classify and predict mutagenicity of different compounds with error percentages as low as 15%. Thus, we used their toxicophores as a benchmark and analyzed how the annotated linear fragments compared with them. Table 17 shows the 8 classes of toxicophores identified by Kazius et al. and the annotated linear fragments that correspond to them (refer to Table 16 for chemical fragment numbers).

As seen in Table 17, the annotated linear fragments were able to identify all of the toxicophore classes. As our dataset had only 984 compounds, we might have missed some of the structural diversity that was contained in the larger dataset (4337 compounds) used by Kazius et al. In any case, we were still able to identify all of the eight major toxicophore classes. Thus, our approach is very promising in identifying relevant structural alerts.

Toxicophore name                                 Fragment number
Aromatic nitro                                   1, 2
Aromatic amine                                   6
Three-membered heterocycle                       11
Nitroso                                          4
Unsubstituted hetero-atom bonded hetero-atom     3, 10
Azo type                                         5
Aliphatic halide                                 9
Polycyclic aromatic system                       7

Table 17: Important toxicophores for Ames mutagenicity and corresponding fragments

Apart from these important toxicophores, we were also able to identify other fragments such as 5-member rings containing nitrogen and sulfur (fragment 8), and fused rings with alternating nitrogen atoms (fragment 12). These fragments might hint at a different reaction mechanism by which chemicals induce mutagenicity. We can also see that the annotation scheme {AI, nC, nH} captured most of the descriptors in Table 16. Thus, {AI, nC, nH} can be considered to be the optimized annotation scheme for modeling Ames mutagenicity. It should also be noted that the annotated linear fragments were able to capture even non-linear information such as polycyclic ring systems (fragment 7) due to the power of annotating features.


CHAPTER 6: RESULTS ON CLASSIFICATION OF COMPOUNDS USING MARKOV CHAIN MODELS

6.1 Markov Chain Models Based on One-step Connections

This section describes the predictive performance of Markov chain models developed using the one-step connection approach as described in the Developing Markov Chain Models section (4.4) of the Methods chapter. The complete dataset consisting of 6512 compounds compiled by Hansen et al. is used in this analysis. The Markov chain models are analyzed by calculating the performance parameters of sensitivity, specificity, and concordance for the 5-fold cross-validation scheme specified earlier.

For each cross-validation fold, we first identified the unique symbols in the corresponding training set and generated connection probability vectors. We then analyzed the predictive performance on the test sets by using the five different annotation schemes considered previously. In the discussion that follows, data from the first cross-validation fold (CV 1) and the {AI, nC, nH} annotation scheme have been used for illustration purposes. As seen in Table 1 in the Training Datasets section (3.1) of the Datasets and Computational Tools chapter, there are 5528 compounds in the training set for CV 1, where 2919 are Ames positive and 2609 are Ames negative. Using the structural information of compounds in each of these classes, we identified 38 unique symbols present in the entire training set. These symbols are shown in Table 18.


B30 Br10 C11 C12 C13 C20 C21 C22 C30 C31

C40 Cl10 Cl40 F10 I10 N10 N11 N12 N20 N21

N30 N40 O10 O11 O20 P30 P31 P40 P41 S10

S11 S20 S21 S30 S31 S40 Se20 Si40

Table 18: Unique symbols identified using training set for CV 1

We then processed each structure in both classes of the training set and generated a connection vector that contained all the observed one-step connections and their corresponding counts. The counts within each class were summed up and a probability for each connection was calculated by dividing the count for that connection by the total sum. Using the training data in CV 1, 214 one-step connections were observed, with a total count of 53,628 for the Ames POS class and 45,539 for the Ames NEG class. The probability for each connection i under the POS and NEG scenarios was then calculated as follows. Table 19 shows data for a selected set of 10 connections along with their counts and probabilities for both POS and NEG classes. This is shown graphically in Figure 41 and Figure 42.

$$P_i^{POS} = \frac{Count_i^{POS}}{53{,}628}$$

$$P_i^{NEG} = \frac{Count_i^{NEG}}{45{,}539}$$
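Building the per-class connection probability vector amounts to counting bonds and normalizing. A minimal sketch, with hypothetical compounds:

```python
from collections import Counter

def connection_probabilities(bond_lists):
    """Estimate one-step connection probabilities for one class.
    bond_lists: one list of (symbol, symbol) bonds per compound.
    Each bond is stored in a canonical sorted order so that 'A-B'
    and 'B-A' pool into the same connection."""
    counts = Counter()
    for bonds in bond_lists:
        for a, b in bonds:
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {conn: n / total for conn, n in counts.items()}

# Hypothetical POS-class compounds described by their annotated bonds
pos_compounds = [
    [("N30", "O10"), ("N30", "O10"), ("N30", "C30")],
    [("C30", "C30"), ("N30", "O10")],
]
p_pos = connection_probabilities(pos_compounds)
# 3 of the 5 observed bonds are N30-O10, so its probability is 0.6
```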


Connection    Count (POS)   Count (NEG)   Probability (PPOS)   Probability (PNEG)   Probability ratio
C13 – C30     848           762           0.0158               0.0167               0.945
C21 – C30     12865         8068          0.2399               0.1772               1.354
C21 – C21     8202          5490          0.1529               0.1206               1.269
C30 – C30     8236          4268          0.1536               0.0937               1.639
C30 – N30     1578          664           0.0294               0.0146               2.018
N30 – O10     1700          338           0.0317               0.0074               4.271
N20 – C30     1505          870           0.0281               0.0191               1.469
C31 – O11     416           560           0.0078               0.0123               0.631
C30 – O10     1451          1855          0.0271               0.0407               0.664
C40 – Cl10    63            113           0.0012               0.0025               0.474

Table 19: One-step connections with counts and probabilities using training set for CV 1 (partial output)


Figure 41: One-step connection probabilities obtained using the training set for CV 1 (partial output)

Figure 42: One-step connection probability ratios obtained using the training set for CV 1 (partial output)


As seen from this data, there are certain connections that are predominantly observed in the Ames POS class of compounds. For example, the ‘N30 – O10’ connection is more than 4 times as likely to be observed in the Ames POS class. Thus, it can be inferred that the presence of such connections in compounds increases their likelihood of being mutagenic. On the other hand, there are other connections such as ‘C30 – O10’ and ‘C40 – Cl10’ that are predominantly observed in the Ames NEG class. The presence of these connections in compounds increases their likelihood of being non-mutagenic. There are also connections such as ‘C13 – C30’ that are roughly equally observed in both POS and NEG classes. These connections are not very helpful in distinguishing between the two classes of compounds.

After generating the connection vector along with their respective counts and probabilities, we then begin processing compounds in the test set. For each compound in the test set, we go through all the one-step connections observed in that compound and calculate a log-likelihood value based on the above-stated probabilities.

$$(\text{log-likelihood})_{compound} = \sum_{i=1}^{n}\log\!\left(P_i^{POS}\right) - \sum_{i=1}^{n}\log\!\left(P_i^{NEG}\right)$$

where n is the total number of one-step connections in the compound. If this value is calculated to be greater than or equal to 0, then the compound is predicted as Ames POS, and if it is less than 0, then the compound is predicted as Ames NEG. This calculation is shown below using the example of the 3-nitro-o-xylene molecule from the test set for CV 1.


Figure 43: 3-nitro-o-xylene molecule from test set for CV 1

As a first step, we annotate this molecule using the {AI, nC, nH} annotation scheme. After annotating the atom types, the 2D graph of this molecule looks as shown below.

Figure 44: 2D molecular graph of 3-nitro-o-xylene using {AI, nC, nH} annotation scheme

In this molecule, there are 11 bonds in total, and thus 11 one-step connections as well. These connections are numbered in red in Figure 44. For each of these one-step connections, we calculate the natural logarithm of the respective connection probabilities for both POS and NEG classes. We then add all the log-probabilities for each class and calculate the overall log-likelihood for the compound using the equation above. This is illustrated in Table 20, where the training data as shown in Table 19 has been used for calculation purposes.

No.   Connection   PPOS      PNEG      log(PPOS)   log(PNEG)
1     O10 – N30    0.0317    0.0074    -3.451      -4.906
2     N30 – O10    0.0317    0.0074    -3.451      -4.906
3     N30 – C30    0.0294    0.0146    -3.527      -4.227
4     C30 – C21    0.2399    0.1772    -1.428      -1.730
5     C21 – C21    0.1529    0.1206    -1.878      -2.115
6     C21 – C21    0.1529    0.1206    -1.878      -2.115
7     C21 – C30    0.2399    0.1772    -1.428      -1.730
8     C30 – C30    0.1536    0.0937    -1.873      -2.368
9     C30 – C30    0.1536    0.0937    -1.873      -2.368
10    C30 – C13    0.0158    0.0167    -4.148      -4.092
11    C30 – C13    0.0158    0.0167    -4.148      -4.092
                             Sum       -29.083     -34.649
                   Overall log-likelihood: 5.566

Table 20: Calculation of overall log-likelihood for 3-nitro-o-xylene molecule


As seen from Table 20, the overall log-likelihood for the 3-nitro-o-xylene molecule is calculated to be 5.566. Since this number is greater than 0, the compound is predicted to belong to the Ames POS class, and thus mutagenic. Referring back to the test set, it is found that this compound is actually mutagenic, so our prediction is correct in this case. These calculations are repeated for all molecules in the test set, and the model’s performance is evaluated by generating its confusion matrix and calculating the sensitivity, specificity, and concordance parameter values.
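The Table 20 calculation can be reproduced directly from the Table 19 probabilities. Because the probabilities below are the rounded values printed in that table, the result differs from 5.566 in the third decimal place:

```python
from math import log

# Connection probabilities from Table 19 (training set for CV 1)
P_POS = {"O10-N30": 0.0317, "N30-C30": 0.0294, "C30-C21": 0.2399,
         "C21-C21": 0.1529, "C30-C30": 0.1536, "C30-C13": 0.0158}
P_NEG = {"O10-N30": 0.0074, "N30-C30": 0.0146, "C30-C21": 0.1772,
         "C21-C21": 0.1206, "C30-C30": 0.0937, "C30-C13": 0.0167}

# The 11 one-step connections of 3-nitro-o-xylene (Table 20), with each
# connection reordered to match the canonical keys above
connections = (["O10-N30"] * 2 + ["N30-C30"] + ["C30-C21"] * 2 +
               ["C21-C21"] * 2 + ["C30-C30"] * 2 + ["C30-C13"] * 2)

loglik = (sum(log(P_POS[c]) for c in connections)
          - sum(log(P_NEG[c]) for c in connections))
print(round(loglik, 2))  # 5.57: positive, so the compound is predicted Ames POS
```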

The confusion matrix for the test set for CV 1 is shown in Table 21.

                      Predicted
                      Positive   Negative
Actual   Positive     435        149
         Negative     173        227

Table 21: Confusion matrix for test set of CV 1 using {AI, nC, nH} annotation scheme

∴ Sensitivity = TP / (TP + FN) = 435/584 = 0.745

Specificity = TN / (TN + FP) = 227/400 = 0.568

Concordance = (TP + TN) / (TP + FP + TN + FN) = 662/984 = 0.673
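These three performance parameters can be computed directly from the confusion-matrix counts. The following sketch, using the CV 1 counts from Table 21, is illustrative:

```python
def confusion_metrics(tp, fn, tn, fp):
    """Performance parameters from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)                    # true positive rate
    specificity = tn / (tn + fp)                    # true negative rate
    concordance = (tp + tn) / (tp + fp + tn + fn)   # overall accuracy
    return sensitivity, specificity, concordance

# CV 1 counts from Table 21: TP = 435, FN = 149, TN = 227, FP = 173
se, sp, co = confusion_metrics(435, 149, 227, 173)
```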


It should be noted here that if a certain atom symbol in a test compound is not observed in the training set, then that test compound is classified as being outside the model’s applicability domain and is labeled ‘NP’ (not predicted). In certain cases, a particular connection in a test compound may not be observed in the training set, but as long as both of its symbols are present in the training set, a prediction will still be made: the unobserved connection is ignored and the prediction is made using the probability values of the other connections in the molecule. It might also happen that certain connections in a test compound are observed only in Ames POS compounds in the training set. In such cases, the test compound is immediately predicted as Ames POS without taking into account the probability values of the other connections in the compound. The same applies if a test compound has a connection that is observed only in Ames NEG compounds in the training set; the test compound is then immediately predicted as Ames NEG. Finally, a test compound might have some connections observed only in Ames POS compounds and others observed only in Ames NEG compounds in the training set. In such cases, the log-likelihood cannot be computed for the compound, and it is classified as ‘NP’.
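These applicability-domain rules can be summarized in a short sketch. The function below is a hypothetical illustration (dictionary-based probability tables, hyphen-separated connection strings), not the actual implementation:

```python
import math

def predict_with_domain_rules(connections, p_pos, p_neg, train_symbols):
    """Apply the applicability-domain rules before the log-likelihood test."""
    # Rule 1: any atom symbol never seen in training -> 'NP'
    for conn in connections:
        if any(sym not in train_symbols for sym in conn.split("-")):
            return "NP"
    pos_only = [c for c in connections if c in p_pos and c not in p_neg]
    neg_only = [c for c in connections if c in p_neg and c not in p_pos]
    # Rule 2: one-sided evidence forces an immediate prediction,
    # unless both sides occur, in which case the compound is 'NP'
    if pos_only and neg_only:
        return "NP"
    if pos_only:
        return "POS"
    if neg_only:
        return "NEG"
    # Rule 3: connections unseen in training (but with known symbols)
    # are simply skipped; classify on the remaining shared connections
    shared = [c for c in connections if c in p_pos and c in p_neg]
    ll = sum(math.log(p_pos[c]) - math.log(p_neg[c]) for c in shared)
    return "POS" if ll >= 0 else "NEG"
```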

Using these rules, calculations were performed for all 5 folds of the Hansen dataset. Table 22 shows the sensitivity, specificity, and concordance values for each cross-validation fold. Note that the sensitivity value for CV 1 is 0.746 (and not 0.745 as calculated from the confusion matrix above) because one Ames POS compound was not predicted. The sensitivity was therefore calculated as follows.

Sensitivity = 435 / (584 − 1) = 435/583 = 0.746


          Sensitivity   Specificity   Concordance   No. of compounds
                                                    not predicted (NP)
CV 1         0.746         0.568         0.673              1
CV 2         0.765         0.559         0.678              0
CV 3         0.755         0.533         0.662              0
CV 4         0.781         0.498         0.662              0
CV 5         0.752         0.542         0.664              2
Average      0.760         0.540         0.668

Table 22: Performance parameters for 5-fold cross-validation of Hansen dataset using {AI, nC, nH} annotation scheme and one-step connection probabilities

As seen from these data, the preliminary results are quite promising. We were able to correctly predict 76% of the Ames POS compounds and 54% of the Ames NEG compounds, giving an overall accuracy, or concordance, of 66.8%. Though these results are not exceptionally good, they are comparable to those reported in the literature. Hansen et al. have reported the performance of three non-parametric approaches on this Ames benchmark dataset [58]. These results are shown in Table 23.

                  Sensitivity   Specificity
Pipeline Pilot       0.84          0.64
MultiCASE            0.78          0.57
DEREK                0.73          0.50

Table 23: Comparison of performance with other non-parametric methods (averaged over 5-fold cross-validation splits)

As discussed earlier, DEREK and MultiCASE are commercial software packages that are widely used in industry. The DEREK software used by Hansen et al. was based on a static set of rules derived from a largely unknown dataset and expert knowledge. The MultiCASE software (AZ2 module) used by them was based on a fixed set of mainly 2D descriptors. Pipeline Pilot, on the other hand, is a graphical scientific workflow tool developed by BIOVIA [75] with machine learning capabilities. It is also equipped with a chemical fingerprint technology that can generate different dynamic descriptors. Hansen et al. used the Bayesian categorization model in Pipeline Pilot combined with extended connectivity fingerprints (circular fragments) to obtain classification results.

The results in Table 22 show that our approach based on a preliminary version of Markov chain model using one-step connection probabilities gave better prediction results than DEREK software and comparatively similar results to those obtained using MultiCASE software.

6.2 Markov Chain Models Using Different Annotation Schemes

We then explored different annotation schemes and analyzed the performance results. The results averaged over the 5-fold cross-validation splits are shown in Table 24; the individual results for the 5 folds are included in Appendix C. Figure 45 shows an ROC plot comparing the results obtained using Markov chain models with different annotation schemes and the commercial software packages. As mentioned earlier, the closer a point is to the upper left corner of the ROC plot, the better its overall predictive performance. As seen in Figure 45, Pipeline Pilot gives the best predictive performance, followed by MultiCASE and the Markov chain models with the {AI, nC} and {AI, nC, nH, PC} annotation schemes; the predictive performance of these three methods is broadly similar.

Annotation Scheme       Sensitivity   Specificity   Concordance
{AI}                       0.653         0.616         0.638
{AI, nC}                   0.770         0.565         0.685
{AI, nC, nH}               0.760         0.540         0.668
{nC, nH, PC}               0.767         0.535         0.671
{AI, nC, nH, PC}           0.756         0.588         0.686

Table 24: Performance parameters for Hansen dataset using Markov chain models with different annotation schemes

Figure 45: ROC plot comparing performance of Markov chain models with different annotation schemes and other non-parametric approaches reported in literature


In order to further improve the performance of our preliminary Markov chain models, we adopted two different strategies. First, we increased the number of bin categories for the partial charge annotation feature to 5. Thus, partial charge (PC) was classified into 5 categories as follows.

PC bin                 Classified as        Denoted as
PC < -0.15             Strongly negative       ‘-’
-0.15 < PC < -0.05     Mildly negative         ‘n’
-0.05 < PC < 0.05      Neutral                 ‘o’
0.05 < PC < 0.15       Mildly positive         ‘p’
PC > 0.15              Strongly positive       ‘+’

Table 25: 5-bin classification of partial charge (PC) annotation feature

This helped to increase the amount of information captured by the PC annotation feature. Since partial charge is a “superior” feature, it was important to classify it into a larger number of bins so that it could capture information at higher structural resolution. The second strategy was to increase the number of annotation options available. We executed this by introducing a new annotation feature called ‘ring atom (RA)’. This annotation captures whether a given atom is part of a ring or a chain substructure. It is a single-digit binary variable with value 0 for a chain atom and 1 for a ring atom. Thus, with the new categorization of partial charge and the addition of the ring atom annotation, atom symbols under the {AI, nC, nH, PC, RA} annotation scheme are denoted as C30p1, C21n0, O11-0, etc. The predictive performance on the Ames benchmark dataset was then calculated using several different annotation schemes. The results are tabulated in Table 26 and the corresponding ROC plot is shown in Figure 46.
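The construction of such atom symbols can be illustrated with a small sketch. The function names are hypothetical, and the handling of values exactly on a bin boundary is an assumption (Table 25 uses strict inequalities, leaving boundary values unspecified):

```python
def pc_bin(pc):
    """5-bin partial-charge code from Table 25."""
    if pc < -0.15: return "-"   # strongly negative
    if pc < -0.05: return "n"   # mildly negative
    if pc < 0.05:  return "o"   # neutral
    if pc < 0.15:  return "p"   # mildly positive
    return "+"                  # strongly positive

def atom_symbol(element, n_carbon, n_hydrogen, pc, in_ring):
    """Atom symbol under the {AI, nC, nH, PC, RA} scheme, e.g. 'C30p1':
    element, carbon-neighbour count, hydrogen count, PC bin, ring flag."""
    return f"{element}{n_carbon}{n_hydrogen}{pc_bin(pc)}{int(in_ring)}"
```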


Annotation Scheme         Sensitivity   Specificity   Concordance
{AI, RA}                     0.753         0.532         0.661
{AI, nC, RA}                 0.765         0.559         0.679
{AI, nH, RA}                 0.751         0.580         0.679
{nC, nH, PC}                 0.765         0.592         0.693
{AI, nC, nH, PC}             0.760         0.609         0.697
{AI, nC, PC, RA}             0.760         0.628         0.705
{AI, nH, PC, RA}             0.757         0.622         0.701
{nC, nH, PC, RA}             0.789         0.603         0.711
{AI, nC, nH, PC, RA}         0.795         0.601         0.714

Table 26: Performance parameters for Hansen dataset using different annotation schemes (with PC binned into 5 categories and ring atom (RA) annotation added)

Figure 46: ROC plot comparing performance of Markov chain models with different annotation schemes (PC binned into 5 categories and ring atom (RA) annotation added)

As seen from these results, the predictive performance of the Markov chain models improved substantially. In particular, the {nC, nH, PC, RA} and {AI, nC, nH, PC, RA} annotation schemes yielded better prediction results than those obtained using the DEREK and MultiCASE software packages (refer Table 23). Thus, we can conclude that it was beneficial to categorize partial charge into 5 categories and to add a new ring annotation option.

It should be noted here that although the predictions obtained using the two annotation schemes mentioned above are quite similar, considerably fewer compounds were left unpredicted (NP) using the {nC, nH, PC, RA} annotation scheme (refer Appendix C for the actual results). Thus, the {nC, nH, PC, RA} scheme appears to be the best for modeling Ames mutagenicity data, with very few compounds falling outside the model’s applicability domain.

6.3 Markov Chain Models Using Fragments of Longer Lengths

We then explored Markov chain models with fragments of longer lengths in an effort to increase the predictive performance of our models. These models are still based on one-step connection probabilities; however, they also include information about longer fragments from the compound-fragment data matrix. The following calculations explain how a prediction is made on a test compound, using 3-nitro-o-xylene (refer Figure 43) as an example.

As a first step, we generate a compound-fragment data matrix using all compounds in the test set, depending on the fragment lengths desired and the annotation scheme selected by the user. Suppose we want to generate fragments of length 3 using the {AI, nC, nH} annotation scheme. Then, for the 3-nitro-o-xylene molecule, 10 unique fragments are identified. One of these fragments is [‘C30’, ‘C30’, ‘C21’]. The sequence probability of this fragment is called its likelihood.

The likelihood of this fragment is the product of its one-step connection probabilities; taking the natural logarithm gives the log-likelihood:

likelihood = p(C30 – C30) × p(C30 – C21)

log-likelihood = log p(C30 – C30) + log p(C30 – C21)

Referring back to the one-step connection probabilities calculated using the training set (Table 19), we obtain the log-likelihood of this fragment under both the Ames POS and NEG scenarios:

(log-likelihood)POS = log(0.1536) + log(0.2399) = −3.301

Similarly,

(log-likelihood)NEG = log(0.0937) + log(0.1772) = −4.098

The overall log-likelihood of the fragment is then calculated as the difference:

(log-likelihood)fragment = (log-likelihood)POS − (log-likelihood)NEG = −3.301 − (−4.098) = 0.797
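This fragment-level calculation can be sketched as follows, using the probabilities quoted above; the data structures (a fragment as a list of atom symbols, probabilities keyed by hyphen-separated connection strings) are illustrative:

```python
import math

def fragment_log_likelihood(fragment, p_pos, p_neg):
    """Overall log-likelihood of a linear fragment, treated as a chain
    of one-step connections (first-order Markov assumption)."""
    steps = [f"{a}-{b}" for a, b in zip(fragment, fragment[1:])]
    ll_pos = sum(math.log(p_pos[s]) for s in steps)
    ll_neg = sum(math.log(p_neg[s]) for s in steps)
    return ll_pos - ll_neg

# Fragment 1 of Table 27, with probabilities from Table 19:
ll = fragment_log_likelihood(["C30", "C30", "C21"],
                             {"C30-C30": 0.1536, "C30-C21": 0.2399},
                             {"C30-C30": 0.0937, "C30-C21": 0.1772})
```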

This calculation is repeated for all the unique fragments identified in the test compound. The overall log-likelihood for the compound is then calculated by summing the log-likelihoods of all its fragments. If this overall log-likelihood is greater than or equal to 0, the compound is predicted as Ames POS; otherwise it is predicted as Ames NEG. Table 27 shows this calculation for the 3-nitro-o-xylene molecule using the 10 unique fragments identified.

                              Log-likelihood          Overall
No.   Fragment              Positive    Negative    log-likelihood
 1    C30 - C30 - C21        -3.301      -4.098         0.797
 2    C13 - C30 - C21        -5.576      -5.822         0.246
 3    C13 - C30 - C30        -6.021      -6.460         0.439
 4    C30 - C21 - C21        -3.306      -3.845         0.539
 5    C30 - C30 - C30        -3.746      -4.736         0.990
 6    C21 - C21 - C21        -3.756      -4.230         0.474
 7    C21 - C30 - N30        -4.955      -5.957         1.002
 8    C30 - N30 - O10        -6.978      -9.133         2.155
 9    C30 - C30 - N30        -5.400      -6.595         1.195
10    O10 - N30 - O10        -6.902      -9.812         2.910
      Overall log-likelihood                           10.747

Table 27: Calculation of overall log-likelihood for 3-nitro-o-xylene molecule using fragments of length 3

As seen from Table 27, the overall log-likelihood for the 3-nitro-o-xylene molecule is calculated to be 10.747. Since this number is greater than 0, the compound is predicted to be Ames POS. Referring back to the calculations done using one-step connections in Table 20, the overall log-likelihood for 3-nitro-o-xylene was found to be 5.566. Thus, using fragments of longer lengths gives a higher overall likelihood, and thus much stronger evidence for classifying a compound into the Ames POS category. Table 28 shows the prediction results using fragments of lengths 2, 3, and 4 for a selected set of three annotation schemes. Figure 47 shows a graphical comparison of the prediction results.

Annotation Scheme       Sensitivity   Specificity   Concordance

Fragment length 2
{AI, nC, nH}               0.640         0.683         0.658
{AI, nC, nH, PC}           0.739         0.638         0.697
{nC, nH, PC, RA}           0.757         0.648         0.711

Fragment length 3
{AI, nC, nH}               0.724         0.556         0.653
{AI, nC, nH, PC}           0.770         0.574         0.688
{nC, nH, PC, RA}           0.785         0.582         0.700

Fragment length 4
{AI, nC, nH}               0.762         0.486         0.646
{AI, nC, nH, PC}           0.793         0.532         0.684
{nC, nH, PC, RA}           0.807         0.531         0.692

Table 28: Performance parameters for Hansen dataset using Markov chain models with fragments of longer lengths


Figure 47: Graphical comparison of performance parameters for Hansen dataset obtained using Markov chain models with fragments of longer lengths


As seen from these results, considering fragments of longer lengths does not necessarily improve the predictive performance of the Markov chain models. In fact, there is a slight deterioration in the overall performance: the sensitivity increases slightly with increasing fragment length, but the specificity drops significantly, decreasing the overall predictive performance of the model.

The prominent reason for this seems to be the overlap between different fragments of longer lengths. For example, considering fragments of length 3, there were 10 unique fragments identified in the 3-nitro-o-xylene molecule. This gives rise to a total of 30 one-step connections, as opposed to the 11 obtained using the purely one-step connection approach (refer Table 20). This increases the redundant information and biases the prediction of a molecule towards the Ames POS category. Thus, more and more compounds that should have been predicted as Ames NEG begin to be predicted as Ames POS. It was also found that the models developed using fragments of longer lengths failed to yield a prediction for more compounds (classified as ‘NP’) because some compounds in the test set are relatively small.

Thus, it can be concluded that the preliminary model developed using the one-step connection approach yields the best predictive performance within the family of Markov chain models. This is because it takes each one-step connection in a molecule into account exactly once; there is no overlap between fragments and thus no redundant or repetitive information. It was also found that the choice of annotation scheme had a significant impact on the performance of the model: the models developed using the {AI, nC, PC, RA} and {nC, nH, PC, RA} annotation schemes performed exceedingly well (refer Table 26). Finally, we found that considering fragments of longer lengths in the model was not of much help from the modeling standpoint; in fact, the overall performance of the model deteriorated slightly with increasing fragment length.


6.4 Markov Chain Models with Improved Specificity

This section focuses on the efforts taken to increase the specificity of the preliminary Markov chain models developed using the one-step connection approach. As observed in the results of the previous sections, the sensitivity of a model is significantly greater than its specificity. One of the main reasons for this is presumed to be the natural bias in the training sets used to develop these models. As described in the Training Datasets section (3.1) of the Datasets and Computational Tools chapter, the training sets contained significantly more compounds in the Ames POS class than in the Ames NEG class. Thus, it is likely that the model will predict Ames POS compounds in the test sets with greater accuracy than Ames NEG compounds.

Since we were already able to achieve reasonably high sensitivity values (true positives), it would help tremendously if we could adjust some classification parameters of the model to improve its specificity (true negatives). We therefore developed a new classification strategy. The current strategy classifies a compound as Ames POS if its overall log-likelihood is greater than or equal to 0, and as Ames NEG otherwise. If we change this criterion so that a compound is classified as Ames POS only if its overall log-likelihood is greater than or equal to some value x, and as Ames NEG otherwise, then we might be able to increase the specificity of the model with minimal penalty on sensitivity. In order to determine an optimum value for x, we analyzed the distribution of log-likelihood values for those compounds that were incorrectly predicted as Ames POS (false positives) and correctly predicted as Ames POS (true positives). Figure 48 and Figure 49 show the distributions of log-likelihood values obtained using a preliminary Markov chain model on the first cross-validation (CV 1) fold of the Hansen dataset with the {nC, nH, PC, RA} annotation scheme.
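The modified decision rule amounts to a one-line change of the classification cut-off, sketched here for illustration:

```python
def classify_with_threshold(log_likelihood, x=0.0):
    """Decision rule with an adjustable cut-off x: raising x above 0
    turns borderline POS calls into NEG, trading a little sensitivity
    for higher specificity."""
    return "POS" if log_likelihood >= x else "NEG"
```

For example, a compound with an overall log-likelihood of 0.3 is predicted POS under the original rule (x = 0) but NEG under x = 0.5.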


Figure 48: Distribution of log-likelihood values for compounds predicted as false positives using a preliminary Markov chain model on the test set for CV 1 and using {nC, nH, PC, RA} annotation scheme

Figure 49: Distribution of log-likelihood values for compounds predicted as true positives using a preliminary Markov chain model on the test set for CV 1 and using {nC, nH, PC, RA} annotation scheme

As seen in Figure 48, there were 12 compounds falsely predicted as Ames positive with a log-likelihood value of less than 0.25, and 23 compounds falsely predicted as Ames positive with a cumulative log-likelihood value of less than 0.5. Similarly, from Figure 49 we can see that the corresponding numbers of compounds truly predicted as Ames positive were 7 and 13, respectively. Thus, if the x value for classification is chosen to be 0.5, the model will predict 23 additional compounds as Ames NEG at the penalty of only 13 compounds falsely predicted as Ames NEG.

We therefore executed the model with the new classification criterion (x = 0.5) over the 5-fold cross-validation splits of the Hansen dataset. The average values of sensitivity, specificity, and concordance were calculated to be 0.766, 0.635, and 0.711, respectively. Compared with the classification results obtained previously (refer Table 26), there was an increase in the specificity of the model with a slight penalty on its sensitivity. However, since the model results were averaged over the 5-fold cross-validation splits, the gain in specificity balanced the loss in sensitivity, and the overall concordance of the model remained the same.

We performed a similar analysis for Markov chain models developed using different annotation schemes. With {AI, nC, nH, PC, RA} annotation scheme, we did find an improvement in the overall predictive performance of the model. The sensitivity, specificity, and concordance values for the new model were 0.772, 0.637, and 0.716 respectively as compared to 0.795, 0.601, and 0.714 obtained previously (refer Table 26). Thus, there was a slight overall improvement in the model’s performance due to a higher gain in specificity as compared to the loss in sensitivity.

In any case, since this improvement in performance is not very significant and many test compounds are found to be outside the model’s applicability domain, we decided to retain the model developed previously using the {nC, nH, PC, RA} annotation scheme with a classification criterion of x = 0.

We then developed additional Markov chain models by increasing the resolution of the information captured by the ring atom (RA) annotation option. Initially, RA was a binary variable assigned a value of ‘1’ if the atom was part of a ring, and ‘0’ otherwise. Even though this information is helpful by itself, we felt the need to incorporate additional information, such as whether the atom is part of a 5-membered ring, a 6-membered ring, or both. In order to capture this information, we classified the RA annotation option into 5 categories as follows.

Condition                                         Value
If chain atom                                       0
If atom part of 5-membered ring                     5
If atom part of 6-membered ring                     6
If atom part of both 5- and 6-membered rings        7
If atom part of ring of any other size              4

Table 29: 5-level categorization of the ring atom (RA) annotation feature
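This categorization can be expressed as a small function. How an atom belonging to, say, both a 5- and a 7-membered ring should be coded is not specified above, so the precedence used below (5- and 6-membered rings take priority over "other") is an assumption:

```python
def ra_category(ring_sizes):
    """5-level ring-atom code of Table 29.

    ring_sizes: set of the sizes of the rings containing the atom
    (empty set for a chain atom).
    """
    if not ring_sizes:
        return 0                      # chain atom
    in5, in6 = 5 in ring_sizes, 6 in ring_sizes
    if in5 and in6:
        return 7                      # in both a 5- and a 6-membered ring
    if in5:
        return 5
    if in6:
        return 6
    return 4                          # ring of any other size
```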

Preliminary Markov chain models based on the one-step connection approach were then developed using this updated annotation option. The performance parameters on the Hansen dataset using the {nC, nH, PC, RA} annotation scheme were found to be 0.793 (sensitivity), 0.623 (specificity), and 0.723 (concordance). Thus, there was a significant improvement in performance compared with the classification results obtained previously using the binary RA annotation option (refer Table 26). It was therefore beneficial to categorize the RA annotation option into 5 categories as described above. Figure 50 shows an ROC plot comparing the performance of this best Markov chain model with the results of other non-parametric approaches reported in the literature by Hansen et al. (refer Table 23).


Figure 50: ROC plot comparing performance of the best Markov chain model developed using {nC, nH, PC, RA} annotation scheme with a previous Markov chain model developed using {AI, nC, nH} annotation scheme and other non-parametric approaches reported in the literature

As seen here, the performance of the Markov chain model using the {nC, nH, PC, RA} annotation scheme exceeds that of the DEREK and MultiCASE software packages. This performance is also much better than that obtained using the first Markov chain model with the {AI, nC, nH} annotation scheme (refer Table 22). Comparing the two models, we find an improvement of 0.033 (4.34%) in sensitivity and 0.083 (15.37%) in specificity, with an overall improvement of 0.055 (8.23%) in concordance. It should also be noted that very few chemicals are not predicted (NP) using the Markov chain approach. In fact, the proportion of compounds not predicted (due to being outside the model’s applicability domain) for the {nC, nH, PC, RA} annotation scheme is well below 2%, giving a coverage of at least 98% for all test sets of the 5-fold cross-validation splits.


6.5 Additional Results Using Annotated Linear Fragments Coupled with kNN Models

In this section, we describe the results obtained on the Hansen dataset using annotated linear fragments coupled with kNN (k-nearest neighbors) modeling methods. kNN is a classical modeling technique that has been successfully applied on various datasets with a wide variety of descriptors. The application of kNN models provides a means to validate the performance of our novel descriptors and allows the use of fragments of longer lengths in the analysis.

In order to apply kNN models, we first generate a compound-fragment data matrix using all compounds in the training set, based on the annotation scheme selected and the fragment lengths chosen. We then go through each compound in the test set, generate its corresponding fingerprint vector, and map it onto the fragment space generated from the training set. We then use the Tanimoto distance metric to compute the distances of the test compound from all compounds in the training set. Based on these distances, the k (typically 5 or 7) training compounds with the smallest distance to the test compound are identified. These k compounds are called the nearest neighbors and are the most similar to the test compound in the descriptor space generated (in our case, the descriptor space depends on the annotation scheme and fragment lengths chosen for analysis).

The activity of the test compound is then predicted by considering the activities of its k nearest neighbors. The simplest approach is for each nearest neighbor to “vote” for a certain outcome, and the test compound is classified according to the majority vote. Suppose k is 5: if 3 or more nearest neighbors are mutagenic (Ames POS), the test compound is predicted as Ames POS; similarly, if 3 or more neighbors are non-mutagenic (Ames NEG), the test compound is predicted as Ames NEG.
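The distance calculation and majority vote can be sketched as follows. Fingerprints are represented here as sets of "on" fragment indices, which is an illustrative choice rather than the actual implementation:

```python
def tanimoto_distance(a, b):
    """Tanimoto distance between two binary fingerprints, represented
    as sets of 'on' fragment indices."""
    union = len(a | b)
    return 1.0 if union == 0 else 1.0 - len(a & b) / union

def knn_predict(test_fp, train_fps, train_labels, k=5):
    """Majority vote over the k nearest training compounds;
    labels are 1 for Ames POS and 0 for Ames NEG."""
    dists = [(tanimoto_distance(test_fp, fp), lab)
             for fp, lab in zip(train_fps, train_labels)]
    dists.sort(key=lambda t: t[0])            # nearest first
    votes = sum(lab for _, lab in dists[:k])  # count POS votes among the k
    return 1 if votes > k // 2 else 0
```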


It should be noted here that the kNN method is computationally more demanding than the Markov chain approach described earlier. For each test compound, we need to calculate its pairwise distance to all compounds in the training set and then select the k closest neighbors. Thus, if there are m compounds in the training set and n compounds in the test set, a total of m × n pairwise distances must be calculated. These distances are then sorted and the k nearest neighbors identified. Typically, m is very large compared to k (m is generally in the thousands, while k is 5 or 7), rendering most of the calculated distances of no further use.

In the discussion that follows, we provide an example of how the kNN model was applied in our analysis. We used the Ames mutagenicity dataset compiled by Hansen et al. with the 5-fold cross-validation scheme provided by them for analyzing the performance of our novel descriptors.

We consider compounds in the first cross-validation (CV 1) fold for demonstration purposes along with an annotation scheme of {AI, nC, nH} and fragments of length 3.

As a first step, we generate a compound-fragment data matrix using all compounds in the training set. The CV 1 fold in the Hansen dataset has 5528 compounds assigned to the training set (refer Table 1). Using this structural information, 1086 unique fragments of length 3 were identified. We then removed the singleton and doubleton fragments, which reduced the number of unique fragments to 681. Thus, the descriptor space generated has a dimension of 681, with each dimension taking a binary value of either 0 or 1.

Now, consider the 3-nitro-o-xylene molecule in the test set for CV 1. In order to make a prediction on this compound, we first generate its fingerprint vector. As discussed in a previous section, using an annotation scheme of {AI, nC, nH}, there are 10 unique fragments of length 3 in the 3-nitro-o-xylene molecule. These fragments are identified in the 681-dimension space and the corresponding bits are marked as 1. This enables the calculation of Tanimoto distance metric between 3-nitro-o-xylene and all compounds in the training set.
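The fingerprint construction can be sketched as follows; the mapping from fragment to bit position (`fragment_index`) is an illustrative stand-in for the 681-dimensional fragment space built from the training set:

```python
def fingerprint(fragments, fragment_index):
    """Binary fingerprint of a compound over the training-set fragment
    space; fragments not observed in training are simply ignored."""
    bits = [0] * len(fragment_index)
    for frag in fragments:
        idx = fragment_index.get(frag)
        if idx is not None:           # skip fragments unseen in training
            bits[idx] = 1
    return bits
```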


It should be noted here that if any fragment in the test compound is not observed in the descriptor space generated by the training set, that fragment is simply ignored. Thus, only those fragments in a test compound that were actually observed in the training set are considered for the distance calculations. All these distances were calculated for the 3-nitro-o-xylene molecule, and their distribution is shown in Figure 51.

Figure 51: Distribution of Tanimoto distances of 3-nitro-o-xylene from all compounds in the training set for CV 1

As seen from this figure, most compounds in the training set are very distant from the 3-nitro-o-xylene molecule in the test set. However, we are interested in identifying only the k nearest neighbors, i.e., those with the least distance from the test molecule under consideration. Table 30 shows the 5 nearest neighbors identified for the 3-nitro-o-xylene molecule from the training set for CV 1, along with their chemical structures, Tanimoto distances, and mutagenic activities.


Nearest Neighbor (NN)   Tanimoto distance   Mutagenic activity
NN #1                        0.0                    1
NN #2                        0.0                    1
NN #3                        0.0                    1
NN #4                        0.091                  1
NN #5                        0.091                  1

(The chemical structures of the test compound and its neighbors are depicted in the original table.)

Table 30: Five nearest neighbors identified for the 3-nitro-o-xylene molecule from the training set for CV 1

It can be seen from Table 30 that the nearest neighbors identified from the training set have chemical structures very similar to that of the test compound. In this case, all 5 nearest neighbors have a mutagenic activity of ‘1’, indicating that they are all Ames positive. Since the number of mutagenic nearest neighbors is greater than 3, we classify the test compound as belonging to the Ames POS category. Referring back to the test set, we find that the prediction is correct in this case.

This procedure is then repeated for all compounds in the test set, and a prediction is made for each test compound. A 2×2 confusion matrix is then generated, and the performance parameters of sensitivity, specificity, and concordance are computed as described earlier. Using fragments of length 3 and the {AI, nC, nH} annotation scheme, the performance parameters for the CV 1 fold were found to be 0.793, 0.626, and 0.725, respectively. We then performed these calculations for all 5 folds of the cross-validation scheme and calculated the average performance parameters. These values are shown in Table 31.

          Sensitivity   Specificity   Concordance   No. of compounds
                                                    not predicted (NP)
CV 1         0.793         0.626         0.725              8
CV 2         0.820         0.676         0.759              8
CV 3         0.836         0.662         0.763              6
CV 4         0.820         0.636         0.742              1
CV 5         0.823         0.652         0.751              6
Average      0.818         0.650         0.748

Table 31: Performance parameters for 5-fold cross-validation of Hansen dataset using kNN modeling method with 5 nearest neighbors and fragments of length 3 generated using {AI, nC, nH} annotation scheme

It should be noted here that several test compounds were classified as ‘NP’ due to being outside the model’s applicability domain. A compound was considered as such if any one of its nearest neighbors was at a distance of 1.0 from it; thus, a prediction was made on a test compound only if all its nearest neighbors were at a distance of less than 1.0. This exercise was then repeated with different annotation schemes and fragment lengths. Table 32 shows the results obtained using a selected set of 3 annotation schemes and 3 fragment lengths. The actual results for the 5-fold cross-validation splits are included in Appendix C. Figure 52 shows a graphical comparison of the prediction results.
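The applicability-domain check described above reduces to a single condition, sketched here for illustration:

```python
def knn_in_domain(neighbor_distances):
    """A prediction is made only if every one of the k nearest neighbors
    lies at a Tanimoto distance strictly below 1.0; otherwise the test
    compound is labeled 'NP' (not predicted)."""
    return all(d < 1.0 for d in neighbor_distances)
```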

Annotation Scheme         Sensitivity   Specificity   Concordance

Fragment length 2
{AI, nC, nH}                 0.788         0.663         0.736
{AI, nC, nH, PC}             0.800         0.651         0.738
{AI, nC, nH, PC, RA}         0.799         0.662         0.741

Fragment length 3
{AI, nC, nH}                 0.818         0.650         0.748
{AI, nC, nH, PC}             0.805         0.666         0.747
{AI, nC, nH, PC, RA}         0.799         0.669         0.745

Fragment length 4
{AI, nC, nH}                 0.819         0.640         0.745
{AI, nC, nH, PC}             0.800         0.651         0.738
{AI, nC, nH, PC, RA}         0.800         0.664         0.744

Table 32: Averaged performance parameters for Hansen dataset using kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes

Figure 52: Graphical comparison of performance parameters for Hansen dataset obtained using kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes


As seen from these results, the performance of the kNN models does not improve significantly when more annotation options are added to the annotation scheme. The overall concordance of the model stays around 0.74 for all 3 annotation schemes shown above. Thus, the addition of the PC and RA annotation features is not particularly helpful from the kNN modeling standpoint. It can also be seen from the above results that considering fragments of longer lengths in the analysis does not significantly improve the predictive performance of the model.

In fact, kNN models with fragments of length 4 were observed to perform consistently worse than those developed with fragments of length 3. The model developed using fragments of length 3 with the {AI, nC, nH} annotation scheme gave the best overall predictive performance, with a sensitivity of 0.818, specificity of 0.650, and concordance of 0.748. Comparing these results to those obtained using the best Markov chain model (refer section 6.4), we find that the kNN models give better sensitivities as well as specificities for this classification problem.

It should be noted that even though the kNN models yield better predictive performance, they are computationally much more demanding than the Markov chain models developed in this research. This is because kNN is a local method, in which a model is effectively rebuilt around each test compound, whereas Markov chains are global models that are pre-built from the structural information contained in the training set. Consider the CV 1 fold of the Hansen dataset as an example. In the Markov chain approach, the model is first developed by processing the 5528 compounds in the training set; a prediction is then made on each test compound by calculating its likelihood from the one-step connection probabilities. In the kNN method, on the other hand, the pairwise distance from each test compound to every compound in the training set must be calculated before the nearest neighbors can be identified. Thus, for the CV 1 fold of the Hansen dataset with 984 compounds in the test set, a total of 5528 × 984 (≈ 5.44 million) distances need to be calculated before predictions can be made on all the test compounds.
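The distance step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's actual implementation: each compound is assumed to be represented by the set of its annotated linear fragments, and the Tanimoto distance (1 − |A∩B| / |A∪B|) is used, consistent with the Tanimoto distances reported in Table 33. All function and variable names are hypothetical.

```python
def tanimoto_distance(frags_a, frags_b):
    """Tanimoto distance between two fragment sets: 1 - |A∩B| / |A∪B|."""
    if not frags_a and not frags_b:
        return 0.0
    inter = len(frags_a & frags_b)
    union = len(frags_a | frags_b)
    return 1.0 - inter / union

def nearest_neighbors(test_frags, training, k=5):
    """Return the k (distance, activity) pairs closest to the test compound.

    `training` is a list of (fragment_set, activity) pairs; for the
    CV 1 fold of the Hansen dataset this loop runs over all 5528
    training compounds for each of the 984 test compounds.
    """
    dists = [(tanimoto_distance(test_frags, frags), act)
             for frags, act in training]
    return sorted(dists)[:k]
```

Running `nearest_neighbors` once per test compound is where the roughly 5.44 million distance calculations per CV fold come from.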


This is computationally very intensive, especially when fragments of longer lengths are considered. For fragments of length 2 with the {AI, nC, nH} annotation scheme, the processing time for the kNN models was about 5 minutes per CV fold; with the additional annotation options of the {AI, nC, nH, PC, RA} scheme, it increased to about 10 minutes. For fragments of length 3, the corresponding times were about 15 and 45 minutes, and for fragments of length 4, about 25 and 90 minutes, respectively. Thus, for the kNN models developed using fragments of length 4 and the {AI, nC, nH, PC, RA} annotation scheme, it took about 450 minutes (7.5 hours) to obtain prediction results for all 5 folds of the cross-validation scheme.

The Markov chain approach, on the other hand, gave prediction results for all 5 folds in under 30 minutes for the same fragment length and annotation scheme.

We then focused on improving the predictive performance of the kNN models. The results above showed that adding annotation options and considering longer fragments did not help much, which ran counter to our expectation that performance would improve with richer annotation. We therefore developed weighted kNN models in which the "vote" of each nearest neighbor was weighted by its distance to the test compound under consideration. We expected that this weighting would bias the vote toward closer neighbors and thus better capture the effect of additional annotation options and fragment lengths.

For example, consider a hypothetical test compound with 5 nearest neighbors identified in the training set. Table 33 shows the distances of these neighbors along with their mutagenic activities.


            NN #1   NN #2   NN #3   NN #4   NN #5
Distance     0.10    0.15    0.85    0.90    0.95
Activity        0       0       1       1       1

Table 33: Tanimoto distance and mutagenic activity of the 5 nearest neighbors identified in the training set for a hypothetical test compound

As seen here, 3 of the 5 nearest neighbors are mutagenic, so a standard (unweighted) kNN model would predict the test compound as Ames POS. In the weighted-distance approach, however, a score is calculated as follows: if the score is greater than 0.5, the compound is classified as Ames POS, and Ames NEG otherwise.

Score = [ Σ_{i=1..k} (Activity_i / Distance_i) ] / [ Σ_{i=1..k} (1 / Distance_i) ]

For the neighbors in Table 33:

Score = (0/0.10 + 0/0.15 + 1/0.85 + 1/0.90 + 1/0.95) / (1/0.10 + 1/0.15 + 1/0.85 + 1/0.90 + 1/0.95)
      = 3.34 / 20.01
      = 0.167

Since this score is less than 0.5, the test compound is classified as Ames NEG. Thus, even though only 2 of the neighbors are non-mutagenic, because they are much closer to the test compound than the 3 mutagenic neighbors, the prediction is biased toward the non-mutagenic category.
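The worked example above can be reproduced with a short function. This is a sketch of the scoring rule as stated in the text, including the 0.05 floor applied to zero distances mentioned later in this section; the function name is hypothetical.

```python
def weighted_knn_score(distances, activities, min_dist=0.05):
    """Distance-weighted vote: Score = sum(a_i/d_i) / sum(1/d_i).

    Distances of 0.0 are floored at `min_dist` (0.05) to prevent
    division by zero, as described in the text.
    """
    inv = [1.0 / max(d, min_dist) for d in distances]
    num = sum(a * w for a, w in zip(activities, inv))
    return num / sum(inv)

# Worked example from Table 33: two close non-mutagenic neighbors
# outvote three distant mutagenic ones.
score = weighted_knn_score([0.10, 0.15, 0.85, 0.90, 0.95], [0, 0, 1, 1, 1])
prediction = "Ames POS" if score > 0.5 else "Ames NEG"  # score ≈ 0.167 → "Ames NEG"
```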


Table 34 shows the classification results obtained using the weighted kNN approach for 3 different fragment lengths and a selected set of 3 different annotation schemes. These results are shown graphically in Figure 53.

Annotation Scheme            Sensitivity   Specificity   Concordance

Fragment length 2
{AI, nC, nH}                    0.787         0.670         0.738
{AI, nC, nH, PC}                0.801         0.668         0.746
{AI, nC, nH, PC, RA}            0.802         0.684         0.753

Fragment length 3
{AI, nC, nH}                    0.815         0.655         0.748
{AI, nC, nH, PC}                0.799         0.676         0.748
{AI, nC, nH, PC, RA}            0.797         0.688         0.751

Fragment length 4
{AI, nC, nH}                    0.818         0.646         0.747
{AI, nC, nH, PC}                0.797         0.658         0.739
{AI, nC, nH, PC, RA}            0.803         0.672         0.749

Table 34: Averaged performance parameters for Hansen dataset using weighted kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes


Figure 53: Graphical comparison of performance parameters for Hansen dataset obtained using weighted kNN modeling method with 5 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes


As seen from these results, the weighted approach showed a considerable increase in performance with increasing annotation options, although no significant improvement was observed with increasing fragment lengths. The best performance was obtained with fragments of length 2 and the {AI, nC, nH, PC, RA} annotation scheme, giving sensitivity, specificity, and concordance values of 0.802, 0.684, and 0.753 respectively. These results are considerably better than those obtained using the unweighted kNN models. Note that if a neighbor was found at a distance of 0.0 from the test compound, the distance was corrected to 0.05 to prevent division by zero.

We then built weighted kNN models with 7 nearest neighbors, expecting that considering more neighbors would improve the predictive performance of the model. The same classification criterion was used: if the metric score was greater than 0.5, the compound was predicted as Ames POS, and Ames NEG otherwise. If any of the 7 nearest neighbors was at a distance of 1.0 from the test compound, the test compound was considered to be outside the model's applicability domain and was predicted as 'NP'. Table 35 shows the classification results obtained using the weighted kNN approach with 7 nearest neighbors for 3 different fragment lengths and a selected set of 3 different annotation schemes. These results are shown graphically in Figure 54.

As seen from these results, there was no significant improvement in predictive performance with increasing annotation options or fragment lengths; the concordance of all models remained around 0.745. These results are also comparable to those obtained using 5 nearest neighbors, so considering 7 nearest neighbors offers no significant benefit. It also increased the number of test compounds that fell outside the model's applicability domain and were therefore predicted as 'NP'.
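The 7-neighbor variant with the applicability-domain check can be sketched as follows. This is an illustrative reconstruction of the rules stated in the text (distance-weighted vote, 0.5 threshold, 0.05 floor on zero distances, and an 'NP' prediction when any neighbor is at Tanimoto distance 1.0); the names are hypothetical.

```python
def predict_weighted_knn(neighbors, min_dist=0.05):
    """Classify a test compound from its (distance, activity) neighbor pairs.

    Returns 'NP' (not predicted) if any neighbor is at Tanimoto
    distance 1.0, i.e. the compound lies outside the applicability
    domain; otherwise applies the distance-weighted vote with a
    0.5 threshold.
    """
    if any(d >= 1.0 for d, _ in neighbors):
        return "NP"
    inv = [1.0 / max(d, min_dist) for d, _ in neighbors]
    num = sum(act * w for (_, act), w in zip(neighbors, inv))
    score = num / sum(inv)
    return "Ames POS" if score > 0.5 else "Ames NEG"
```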


Annotation Scheme            Sensitivity   Specificity   Concordance

Fragment length 2
{AI, nC, nH}                    0.797         0.673         0.745
{AI, nC, nH, PC}                0.813         0.655         0.747
{AI, nC, nH, PC, RA}            0.812         0.668         0.752

Fragment length 3
{AI, nC, nH}                    0.825         0.643         0.749
{AI, nC, nH, PC}                0.811         0.665         0.750
{AI, nC, nH, PC, RA}            0.807         0.674         0.752

Fragment length 4
{AI, nC, nH}                    0.820         0.643         0.746
{AI, nC, nH, PC}                0.800         0.662         0.743
{AI, nC, nH, PC, RA}            0.806         0.659         0.745

Table 35: Averaged performance parameters for Hansen dataset using weighted kNN modeling method with 7 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes


Figure 54: Graphical comparison of performance parameters for Hansen dataset obtained using weighted kNN modeling method with 7 nearest neighbors and fragments of different lengths generated using a selected set of 3 different annotation schemes


Thus, it can be concluded that the best kNN model was the one developed with fragments of length 2 generated using the {AI, nC, nH, PC, RA} annotation scheme, weighting the 5 nearest neighbors by their distance from the test compound. Figure 55 shows an ROC plot summarizing the results obtained with the best kNN model and the best Markov chain model and compares them to other non-parametric approaches reported in the literature. The kNN and Markov chain models are denoted kNN-ALF and Markov-ALF respectively, where ALF stands for annotated linear fragments.

As seen from these results, both the kNN-ALF and Markov-ALF models perform significantly better than DEREK and MultiCASE. The kNN-ALF model also gives considerably higher specificity than Pipeline Pilot, although its sensitivity is slightly lower. Thus, both the kNN-ALF and Markov-ALF models give results that are significantly better than, or comparable to, other non-parametric approaches reported in the literature.

Figure 55: ROC plot comparing the performance of the best kNN model and the best Markov chain model developed using annotated linear fragments with other non-parametric approaches reported in the literature

CHAPTER 7: CONCLUDING REMARKS

7.1 Summary

We began this research with the goal of developing novel linear descriptors for use in chemoinformatics applications and subsequent Markov chain modeling methods for classification. Over the course of the research, we successfully developed novel linear descriptors that capture not only the linear connectivity between atoms but also "superior" atomic information such as partial charge and ring annotation. These descriptors were found to reduce the dimension of the descriptor space compared to some other methods reported in the literature. They also gave meaningful interpretations, as it was possible to convert them back into chemical paths and identify them in actual chemical structures.

The final version of the algorithm for generating these descriptors was equipped with five annotation options: atom identity (AI), number of heavy-atom connections (nC), number of attached hydrogen atoms (nH), partial charge on the atom (PC), and ring annotation (RA). The Python scripts for this algorithm were written to facilitate the easy addition of new annotation features, which simplified the later addition of the RA option.

The Python scripts were also seamlessly integrated into an Excel interface for user-friendly display and output. This facilitated the generation of the compound-fragment data matrix and its storage in an Excel or text file, and allowed repeated processing of a dataset with different annotation schemes and fragment lengths.


Using these novel descriptors and several statistical tests, we were able to identify potential structural alerts from sets of compounds with known toxicity outcomes. We identified 15 structural alerts from the skin sensitization dataset containing 467 compounds, whose compounds were classified into 4 categories depending on their relative skin sensitizing potency. Most of the identified alerts were already known to cause skin sensitizing effects and had been reported in the literature, demonstrating the efficacy of our novel descriptors. Similarly, we identified 12 structural alerts from the Ames mutagenicity dataset containing 984 compounds with a binary toxicity outcome.

We then developed Markov chain models to explore the information contained in the annotated linear fragments for modeling Ames mutagenicity. These models were initially based on one-step connection probabilities and were later extended to include fragments of longer lengths. Analysis of the performance of these models on the benchmark dataset compiled by Hansen et al. showed that using longer fragments did not help; rather, the predictive performance deteriorated slightly with increasing fragment length.

The best Markov model was found to be the one developed using preliminary one-step connection probability approach with {nC, nH, PC, RA} annotation scheme. This model gave sensitivity, specificity, and concordance values of 0.789, 0.603, and 0.711 respectively.
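The one-step connection idea behind these models can be illustrated schematically. This sketch is not the Chapter 6 implementation: it simply estimates one-step transition probabilities from training sequences of annotated atomic symbols and scores a test sequence by its summed log-probability, skipping connections never observed in training (the same skipping rule noted in Section 7.2). In practice, class-conditional models (mutagenic vs. non-mutagenic) would be compared; all names here are hypothetical.

```python
from collections import defaultdict
import math

def train_transition_model(sequences):
    """Estimate one-step connection probabilities P(b | a) from
    sequences of annotated atomic symbols along fragments."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        model[a] = {b: n / total for b, n in nxt.items()}
    return model

def log_likelihood(model, seq):
    """Sum log P(b | a) over the observed one-step connections;
    connections unseen in training are simply skipped."""
    ll = 0.0
    for a, b in zip(seq, seq[1:]):
        p = model.get(a, {}).get(b)
        if p is not None:
            ll += math.log(p)
    return ll
```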

We also applied kNN models for predicting Ames mutagenicity to test the effectiveness of our novel descriptors, building several models with different annotation schemes and fragment lengths. The best kNN model was developed using the weighted approach with 5 nearest neighbors, the {AI, nC, nH, PC, RA} annotation scheme, and fragments of length 2. This model gave sensitivity, specificity, and concordance values of 0.802, 0.684, and 0.753 respectively. This result corroborates that annotated linear fragments are able to capture the structural diversity of chemicals and to identify molecules with similar chemical structures.


7.2 Future Work

The results obtained from this research are very promising, and several useful lines of research can stem from it. One immediate extension would be to test the Markov chain models developed here on different toxicity endpoints such as skin sensitization and carcinogenicity. An analysis of those results would help to validate the effectiveness of annotated linear fragments as well as the subsequent Markov chain models.

There is also a need to explore more annotation options that would take into account additional steric and electronic properties of atoms. One possibility would be to add the stereochemistry annotation option, which would help to distinguish between different stereoisomers.

It would also be helpful to analyze a range of fragment lengths while developing the predictive models. In our study, we considered one fragment length for each model; it is quite possible that combining a range of fragment lengths would improve predictive performance, and an important exercise would be to determine the optimal range of lengths to use. This leads to the next possibility of exploring the co-occurrence and proximity of different fragments in a molecule. Chemicals can induce toxicity through more than one mechanism of action, so several structural features may be responsible for imparting toxic properties. In some cases, certain fragments might not induce toxicity when considered alone but, in the presence of other structural features, might act through a different mechanism and induce toxicity. It would therefore be of crucial importance to quantify and understand the simultaneous presence of structural features in a molecule that lead to an observed toxicity outcome.

There is also a need to define the applicability domain for the Markov chain models more rigorously. In this study, we considered a test molecule to be outside the model's applicability domain if it contained atomic symbols not observed in the training set; as long as its atomic symbols were observed in the training set, a prediction was made. If certain one-step connections in the test compound were not observed in the training set, those connections were simply skipped in the calculations. More rigorous applicability-domain criteria could improve the performance parameters of the model at the expense of decreased coverage of test set predictions. There is also a need for more stringent classification criteria. Generally, classifying a toxic compound as non-toxic (a false negative) is a more costly mistake than classifying a non-toxic compound as toxic (a false positive), so the cost of misclassification needs to be incorporated in the analysis.
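One simple way to incorporate misclassification cost is a cost-weighted decision threshold; this is a hypothetical sketch, not a method from this research. Under standard decision theory, if a false negative costs fn_cost and a false positive costs fp_cost, the expected-cost-minimizing rule predicts positive whenever the score exceeds fp_cost / (fp_cost + fn_cost) rather than 0.5.

```python
def classify_with_cost(score, fn_cost=3.0, fp_cost=1.0):
    """Lower the decision threshold when false negatives (toxic
    predicted as non-toxic) are costlier than false positives.

    Threshold = fp_cost / (fp_cost + fn_cost); with equal costs
    this reduces to the usual 0.5 cutoff. The 3:1 default cost
    ratio is an arbitrary illustration.
    """
    threshold = fp_cost / (fp_cost + fn_cost)
    return "Ames POS" if score > threshold else "Ames NEG"
```

For example, a compound with a weighted-kNN score of 0.30 would be called Ames NEG under the plain 0.5 cutoff but Ames POS under a 3:1 false-negative cost ratio (threshold 0.25).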

Another possible extension of this research could be to develop innovative Markov chain models for prediction purposes. The fundamental challenge in developing Markov models for analyzing chemical structures is to tackle the inherent non-directionality of chemical paths. In our study, we accomplished this by considering one-step connections between the different atomic symbols. There is a need to develop better ways to incorporate this non-directionality constraint in sequence probability calculations. There is also a need to develop sequence analysis techniques similar to the ones used in bioinformatics methods to discuss the similarity between chemical structures in terms of their alignment scores. This would be a novel way to define and interpret similarity between different molecules.

Finally, there is a need to develop better quality datasets for training and testing new models. In this study, we had to manually correct several chemical structures in the benchmark Ames mutagenicity dataset before processing the training and test sets through our algorithm, largely because of charge violations in the structural specifications of the original SD file. Thus, there is a need for chemically correct and reliable datasets covering a wide variety of toxicity endpoints.


7.3 Conclusion

In today’s information age, there is a great need to utilize the wealth of abundant information generated by experimental studies to inform and guide future work. This need is especially realized in the pharmaceutical industry where it is becoming increasingly important to develop computational models for predicting toxicity endpoints of candidate drug molecules.

Newer methods are being developed for generating relevant structural descriptors and advances are being made towards statistical models with high predictive performances. The research described in this dissertation is a significant step and yet a humble contribution to these ongoing efforts.

In this research, we developed a novel algorithm for the dynamic generation of linear fragments from chemical structures. These fragments carry annotated atom types that provide flexibility in defining them. Although the fragments are linear in composition, the annotation features allow them to capture branched structural information as well as polycyclic ring systems, making them a powerful tool for identifying relevant descriptors in chemoinformatics applications. Using these novel descriptors, we identified several important and well-known alerts for two toxicity endpoints, namely skin sensitization and Ames mutagenicity. This showed that our method captures meaningful descriptors and might lead to the discovery of new structural alerts, thus complementing the predefined-fragments approach. We were also able to reduce the dimension of the descriptor space compared to some other methods reported in the literature.

From the modeling standpoint, the innovative development and application of Markov chain models for predicting Ames mutagenicity was quite successful. Different annotation schemes and fragment lengths were explored and the models gave considerably high prediction accuracies.

These were significantly better than, or comparable to, those obtained using other descriptors and non-parametric modeling methods with the same training and test sets. The high predictive performance of the kNN models developed using the annotated linear fragments as descriptors further demonstrated the efficacy of our novel descriptors.

Thus, the work conducted in this research gave very promising results and it has tremendous potential for growth and applications. We hope that future efforts will refine this method further and give rise to better descriptors and statistical models. I would like to conclude this section by stating some of the benefits that could be realized from this research. First, this research can assist in expediting the process of bringing new products to market while minimizing the associated safety concerns. Second, it can significantly lower the costs required to screen candidate compounds and reduce the need for animal testing of chemicals. And lastly, it can help in the identification of new structural alerts, thus increasing our understanding of the mechanisms by which chemicals induce toxicity.


REFERENCES

1. CAS, Chemical Abstracts Service Home Page. http://www.cas.org/. Accessed October 30, 2015.

2. CAS Assigns the 100 Millionth CAS Registry Number to a Substance Designed to Treat Acute Myeloid Leukemia. http://www.cas.org/news/media-releases/100-millionth-substance. Accessed October 30, 2015.

3. Engel T. Basic Overview of Chemoinformatics. J Chem Inf Model. 2006;46(6):2267-2277. doi:10.1021/ci600234z.

4. Brown FK. Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery. In: Bristol JA, ed. Vol 33. Annual Reports in Medicinal Chemistry. Academic Press; 1998:375-384. doi:10.1016/S0065-7743(08)61100-8.

5. Brown FK. Editorial Opinion: Chemoinformatics – a ten year update. Curr Opin Drug Discov Dev. 2005;8(3):296-302.

6. Reisfeld B, Mayeno AN. Computational Toxicology. 2012;929:3-7. doi:10.1007/978-1-62703-050-2.

7. He K, Talaat RE, Pool WF, et al. Metabolic activation of troglitazone: identification of a reactive metabolite and mechanisms involved. Drug Metab Dispos. 2004;32(6):639-646. doi:10.1124/dmd.32.6.639.

8. Julie NL, Julie IM, Kende AI, Wilson GL. Mitochondrial dysfunction and delayed hepatotoxicity: another lesson from troglitazone. Diabetologia. 2008;51(11):2108-2116. doi:10.1007/s00125-008-1133-6.


9. Hu D, Wu C, Li Z, et al. Characterizing the mechanism of thiazolidinedione- induced hepatotoxicity: An in vitro model in mitochondria. Toxicol Appl Pharmacol. 2015;284(6):134-141. doi:10.1016/j.taap.2015.02.018.

10. Burello E, Worth A. Computational nanotoxicology: Predicting toxicity of nanoparticles. Nat Nanotechnol. 2011;6(3):138-139. doi:10.1038/nnano.2011.27.

11. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Model. 1988;28(1):31-36.

12. Khashan R, Zheng W, Tropsha A. The Development of Novel Chemical Fragment-Based Descriptors Using Frequent Common Subgraph Mining Approach and Their Application in QSAR Modeling. Mol Inform. 2014;33(3):201-215. doi:10.1002/minf.201300165.

13. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742-754. doi:10.1021/ci100050t.

14. Duan J, Dixon SL, Lowrie JF, Sherman W. Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods. J Mol Graph Model. 2010;29(2):157-170. doi:10.1016/j.jmgm.2010.05.008.

15. Sherhod R, Gillet VJ, Judson PN, Vessey JD. Automating knowledge discovery for toxicity prediction using jumping emerging pattern mining. J Chem Inf Model. 2012;52(11):3074-3087. doi:10.1021/ci300254w.

16. ChemoTyper Community Website. https://chemotyper.org/. Accessed October 30, 2015.

17. Yang C, Tarkhov A, Marusczyk J, et al. New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model. 2015;55(3):510-528. doi:10.1021/ci500667v.

18. Leadscope, Inc.: Leadscope - Chemoinformatics Platform for Drug Discovery. http://leadscope.com/. Accessed October 30, 2015.

19. Sushko I, Salmina E, Potemkin VA, Poda G, Tetko I V. ToxAlerts: A Web Server of Structural Alerts for Toxic Chemicals and Compounds with Potential Adverse Reactions. J Chem Inf Model. 2012;52(8):2310-2316. doi:10.1021/ci300245q.

20. Sykora VJ, Leahy DE. Chemical Descriptors Library (CDL): a generic, open source software library for chemical informatics. J Chem Inf Model. 2008;48(10):1931-1942. doi:10.1021/ci800135h.

21. Klopman G. Artificial Intelligence Approach to Structure-Activity Studies. Computer Automated Structure Evaluation of Biological Activity of Organic Molecules. J Am Chem Soc. 1984;106:7315-7321.

22. Faulon J-L, Visco DP, Pophale RS. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J Chem Inf Comput Sci. 2003;43(3):707-720. doi:10.1021/ci020345w.

23. Yap CW. PaDEL-Descriptor : An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J Comput Chem. 2011;32(7):1466-1474. doi:10.1002/jcc.

24. Todeschini R, Consonni V, Wiese M. Handbook of Molecular Descriptors. Wiley- VCH, Weinheim, Germany; 2001.

25. Hong H, Xie Q, Ge W, et al. Mold 2 , Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics. J Chem Inf Model. 2008;2(48):1337-1344.

26. MultiCASE High quality software for in-silico ICH M7 safety assessment. http://multicase.com/. Accessed October 30, 2015.

27. Klopman G, Ivanov JM, Saiakhov RD, Chakravarti SK. MC4PC - An artificial intelligence approach to the discovery of quantitative structure-toxic activity relationship. In: Helma C, ed. Predictive Toxicology. Boca Raton FL, USA: CRC Press; 2005:423-457.

28. CORINA Symphony - Managing and Profiling Molecular Datasets | Inspiring Chemical Discovery. https://www.molecular-networks.com/products/corinasymphony. Accessed October 30, 2015.

29. Molconn-Z(TM) 4.00. http://www.edusoft-lc.com/molconn/. Accessed February 5, 2015.

30. Canvas- Product Features. http://www.schrodinger.com/Canvas/. Accessed July 21, 2015.

31. Cunningham AR, Carrasquer CA, Mattison DR. A categorical structure-activity relationship analysis of the developmental toxicity of antithyroid drugs. Int J Pediatr Endocrinol. 2009;2009:936154. doi:10.1155/2009/936154.

32. Farag AM, Mayhoub AS, Eldebss TMA, et al. Synthesis and structure-activity relationship studies of pyrazole-based heterocycles as antitumor agents. Arch Pharm (Weinheim). 2010;343(7):384-396. doi:10.1002/ardp.200900176.

33. Moraski GC, Chang M, Villegas-Estrada A, Franzblau SG, Möllmann U, Miller MJ. Structure-activity relationship of new anti-tuberculosis agents derived from oxazoline and benzyl esters. Eur J Med Chem. 2010;45(5):1703-1716. doi:10.1016/j.ejmech.2009.12.074.

34. Pillon NJ, Soulère L, Vella RE, et al. Quantitative structure-activity relationship for 4-hydroxy-2-alkenal induced cytotoxicity in L6 muscle cells. Chem Biol Interact. 2010;188(1):171-180. doi:10.1016/j.cbi.2010.06.015.

35. Greene N, Fisk L, Naven RT, Note RR, Patel ML, Pelletier DJ. Developing structure-activity relationships for the prediction of hepatotoxicity. Chem Res Toxicol. 2010;23(7):1215-1222. doi:10.1021/tx1000865.

36. Frid AA, Matthews EJ. Prediction of drug-related cardiac adverse effects in humans--B: use of QSAR programs for early detection of drug-induced cardiac toxicities. Regul Toxicol Pharmacol. 2010;56(3):276-289. doi:10.1016/j.yrtph.2009.11.005.

37. Patlewicz G, Rodford R, Walker JD. Quantitative structure-activity relationships for predicting mutagenicity and carcinogenicity. Environ Toxicol Chem. 2003;22(8):1885-1893. doi:10.1897/01-461.

38. Helguera AM, Cabrera Pérez MA, González MP, Ruiz RM, González Díaz H. A topological substructural approach applied to the computational prediction of rodent carcinogenicity. Bioorg Med Chem. 2005;13(7):2477-2488. doi:10.1016/j.bmc.2005.01.035.

39. Klopman G, Chakravarti SK, Zhu H, Ivanov JM, Saiakhov RD. ESP: a method to predict toxicity and pharmacological properties of chemicals using multiple MCASE databases. J Chem Inf Comput Sci. 2004;44(2):704-715. doi:10.1021/ci030298n.

40. Fjodorova N, Vracko M, Novic M, Roncaglioni A, Benfenati E. New public QSAR model for carcinogenicity. Chem Cent J. 2010;4 Suppl 1(Suppl 1):S3. doi:10.1186/1752-153X-4-S1-S3.

41. Benigni R. The first US National Toxicology Program exercise on the prediction of rodent carcinogenicity: definitive results. Mutat Res. 1997;387(1):35-45.

42. Benfenati E, Benigni R, Demarini DM, et al. Predictive models for carcinogenicity and mutagenicity: frameworks, state-of-the-art, and perspectives. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2009;27(2):57-90. doi:10.1080/10590500902885593.

43. VEGA | Virtual models for property Evaluation of chemicals within a Global Architecture. http://www.vega-qsar.eu/. Accessed February 5, 2015.

44. Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model. 2012;52(6):1413-1437. doi:10.1021/ci200409x.

45. Xu C, Cheng F, Chen L, et al. In silico prediction of chemical Ames mutagenicity. J Chem Inf Model. 2012;52(11):2840-2847. doi:10.1021/ci300400a.

46. Stalring JC, Carlsson LA, Almeida P, Boyer S. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment. J Cheminform. 2011;3(1):28. doi:10.1186/1758-2946-3-28.

47. Sanderson DM, Earnshaw CG. Computer Prediction of Possible Toxic Action from Chemical Structure; The DEREK System. Hum Exp Toxicol. 1991;10:261-273.


48. CASE Ultra Models: High quality in-silico toxicity QSAR models. http://multicase.com/case-ultra-models. Accessed October 13, 2015.

49. Toxtree — EURL ECVAM. https://eurl-ecvam.jrc.ec.europa.eu/laboratories- research/predictive_toxicology/qsar_tools/toxtree. Accessed October 13, 2015.

50. ICH M7 - Genotoxic Impurities - Assessment and Control of DNA Reactive (Mutagenic) Impurities to Limit Potential Carcinogenic Risk. Guideline. 2014:30.

51. OECD. Report on the Regulatory Uses and Applications in OECD Member Countries of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models in the Assessment of New and Existing Chemicals. Paris, France: Organisation for Economic Co-operation and Development; 2006. doi:ENV/JM/MONO(2007)10.

52. Aptula A, Patlewicz G, Roberts D. Skin sensitization: reaction mechanistic applicability domains for structure-activity relationships. Chem Res Toxicol. 2005;18(9):1420-1426.

53. Alves VM, Muratov E, Fourches D, et al. Predicting chemically-induced skin reactions. Part I: QSAR models of skin sensitization and their application to identify potentially hazardous compounds. Toxicol Appl Pharmacol. January 2015. doi:10.1016/j.taap.2014.12.014.

54. Home - QSAR. http://www.qsartoolbox.org/. Accessed February 5, 2015.

55. Gerberick GF, Ryan CA, Kern PS, et al. Compilation of historical local lymph node data for evaluation of skin sensitization alternative methods. Dermatitis. 2005;16(4):157-202.

56. Kern PS, Gerberick GF, Ryan CA, Kimber I, Aptula A, Basketter DA. Local Lymph Node Data for the Evaluation of Skin Sensitization Alternatives: A Second Compilation. Dermatitis. 2010;21(1):8-32. doi:10.2310/6620.2009.09038.

57. Mortelmans K, Zeiger E. The Ames Salmonella/microsome mutagenicity assay. Mutat Res. 2000;455(1-2):29-60.

58. Hansen K, Mika S, Schroeter T, et al. Benchmark data set for in silico prediction of Ames mutagenicity. J Chem Inf Model. 2009;49(9):2077-2081. doi:10.1021/ci900161g.

59. MarvinSketch – advanced chemical drawing software « ChemAxon – cheminformatics platforms and desktop applications. http://www.chemaxon.com/products/marvin/marvinsketch/. Accessed October 20, 2015.

60. MarvinView, a generic 2D/3D molecule renderer « ChemAxon – cheminformatics platforms and desktop applications. https://www.chemaxon.com/products/marvin/marvinview/. Accessed October 20, 2015.

61. Python.org. https://www.python.org/. Accessed October 30, 2015.

62. Python (programming language). Wikipedia. https://en.wikipedia.org/wiki/Python_(programming_language).

63. Lutz M. Learning Python. O’Reilly Media; 2009.

64. Landrum G. RDKit: Open-source cheminformatics. http://rdkit.org/. Accessed January 30, 2015.

65. Visual Basic. Wikipedia. https://en.wikipedia.org/wiki/Visual_Basic.

66. MATLAB. Wikipedia. https://en.wikipedia.org/wiki/MATLAB.

67. R. Wikipedia. https://en.wikipedia.org/wiki/R.

68. Kutchukian PS, Lou D, Shakhnovich EI. FOG: Fragment Optimized Growth Algorithm for the de Novo Generation of Molecules occupying Druglike Chemical Space. J Chem Inf Model. 2009;49(7):1630-1642. doi:10.1021/ci9000458.

69. Helgee EA, Carlsson L, Boyer S, Norinder U. Evaluation of quantitative structure-activity relationship modeling strategies: local and global models. J Chem Inf Model. 2010;50:677-689.


70. Martin YC, Kofron JL, Traphagen LM. Do structurally similar molecules have similar biological activity? J Med Chem. 2002;45(19):4350-4358. doi:10.1021/jm020155c.

71. ChemSpider | Search and share chemistry. http://www.chemspider.com/. Accessed July 20, 2015.

72. Stumpfe D, Bajorath J. Exploring activity cliffs in medicinal chemistry. J Med Chem. 2012;55(7):2932-2942. doi:10.1021/jm201706b.

73. Maggiora GM. On outliers and activity cliffs - Why QSAR often disappoints. J Chem Inf Model. 2006;46(4):1535. doi:10.1021/ci060117s.

74. Kazius J, McGuire R, Bursi R. Derivation and validation of toxicophores for mutagenicity prediction. J Med Chem. 2005;48(1):312-320. doi:10.1021/jm040835a.

75. BIOVIA Pipeline Pilot | Scientific Workflow Authoring Application for Data Analysis. http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/. Accessed October 22, 2015.


APPENDIX A. COMPUTER PROGRAMS: ALGORITHMS AND PYTHON SCRIPTS

1. Generation of annotated linear fragments and compound-fragment data matrix

From any given set of compounds, annotated linear fragments are extracted and the corresponding compound-fragment data matrix is generated using the following algorithm.

a. Inputs taken: name and directory location of the SD file containing the compounds, annotation options (different combinations of annotation options give different annotation schemes), and the minimum and maximum lengths of fragments to be considered.
b. For each compound in the SD file, relevant information such as the number of atoms, number of bonds, and compound name is first collected.
c. Then, for each atom in the compound, its identity, number of hydrogens, number of heavy-atom connections, partial charge, and ring information is identified and stored.
d. The longest linear paths are then identified for each compound using a depth-first search algorithm. These paths are simply sequences of numbers, where each number represents a heavy atom in the compound.
e. From the information contained in the longest linear paths, all smaller sub-paths are then identified. Path redundancy is checked in both forward and reverse directions and only unique paths are retained.
f. Chemical annotations are then used to identify and remove chemically redundant paths. The linear paths are referred to as fragments from this point onwards because they now capture and represent actual chemical structural information.
g. An array is then created that contains all fragments identified in the entire dataset of compounds. Fragment redundancy is checked in both forward and reverse directions and only unique fragments are retained.
h. A matrix is then created with dimensions m by n, where m is the total number of compounds in the dataset and n is the number of unique fragments identified (of the desired path lengths). We then go through each compound in the dataset and mark the corresponding bits as '1' if the fragments are actually observed in that compound.
i. An option is also provided to export this compound-fragment data matrix to an Excel file or a text file.
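The matrix-building steps (g and h) can be sketched in a few lines. The following is a minimal Python 3 illustration (the dissertation scripts below are Python 2); the fragment lists are made up for the example, not output of the actual pipeline:

```python
# Sketch of steps g and h: collect unique fragments, treating a fragment
# and its reverse as identical, then build the compound-fragment 0/1 matrix.
def build_matrix(per_compound_fragments):
    unique = []
    for frags in per_compound_fragments:
        for f in frags:
            # forward/reverse redundancy check (step g)
            if f not in unique and f[::-1] not in unique:
                unique.append(f)
    matrix = []
    for frags in per_compound_fragments:
        # set a bit when the fragment (or its reverse) occurs (step h)
        row = [1 if (u in frags or u[::-1] in frags) else 0 for u in unique]
        matrix.append(row)
    return unique, matrix

compounds = [
    [['C', 'N'], ['C', 'C', 'N']],   # compound 1
    [['N', 'C'], ['C', 'O']],        # compound 2
]
unique, matrix = build_matrix(compounds)
print(unique)   # [['C', 'N'], ['C', 'C', 'N'], ['C', 'O']]
print(matrix)   # [[1, 1, 0], [1, 0, 1]]
```

Note how ['N', 'C'] in compound 2 does not create a new column: its reverse ['C', 'N'] is already present, so only the bit in the existing column is set.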

The following Python script was written to accomplish this task. It relies on several function files, whose scripts are also provided below.

A. Parent script – fingerprint_table.py

# Function to generate linear annotated chemical fragments
# 5 annotation features provided (AI, nC, nH, PC, RA)
# This function returns the fingerprint table ('uni_sympath' and 'finger' lists)
#
# INPUTS:
#   fName: The name of the input SD file
#   ann:   Annotation scheme
#   minL:  Starting path length
#   maxL:  Ending path length

def fingerprint_table2(fName, ann, minL, maxL):
    import sys
    # Raw string so that backslashes in the Windows path are not read as escapes
    sys.path.append(r"C:\Users\Darshan Mehta\Documents\Research\Programs\Graph Theory\Markov Modeling")
    import pSearch_det2, subPaths, sym_path3, uni_sym, fingerprint
    from rdkit import Chem
    from rdkit.Chem import AllChem

    training = Chem.SDMolSupplier('C:\\Users\\Darshan Mehta\\Documents\\Research\\Programs\\Graph Theory\\Markov Modeling\\Ames_Benchmark\\' + fName + '.sdf')
    iComp, nAtom, nBond, all_Iden = [], [], [], []
    counter, nHyd, cTable, all_sympath, activity, cas_no = [], [], [], [], [], []

    print '\nNumber of compounds in SD file = ', len(training)

    # Bin each Gasteiger partial charge into one of five symbols
    def classify(charge):
        if charge < -0.15:
            charge_classify = '-'
        elif charge < -0.05:
            charge_classify = 'n'
        elif charge < 0.05:
            charge_classify = 'o'
        elif charge < 0.15:
            charge_classify = 'p'
        else:
            charge_classify = '+'
        return charge_classify

    all_symbol = []
    mol_count = 0  # Keeps track of molecule count in sd file
    for mol_i in training:
        mol = Chem.RemoveHs(mol_i)  # Remove explicit hydrogens
        iComp.append(mol.GetProp("_Name"))
        activity.append(int(mol.GetProp("Activity")))
        cas_no.append(mol.GetProp("CAS_NO"))
        nAtom.append(mol.GetNumAtoms())
        nBond.append(mol.GetNumBonds())
        iD = [atom.GetSymbol() for atom in mol.GetAtoms()]
        all_Iden.append(iD)
        maxx = nAtom[mol_count]

        # Partial-charge annotation (PC)
        if int(ann[3]) == 1:
            AllChem.ComputeGasteigerCharges(mol)
            gCharge = [round(float(mol.GetAtomWithIdx(k).GetProp('_GasteigerCharge')), 4)
                       for k in range(0, maxx, 1)]
            gCharge_classify = [classify(gCharge[l]) for l in range(0, maxx, 1)]
        else:
            gCharge_classify = []

        # Ring-atom annotation (RA)
        if int(ann[4]) == 1:
            ringinfo = [int(mol.GetAtomWithIdx(k).IsInRing()) for k in range(0, maxx, 1)]
            ringinfo2 = ringinfo
            for m in range(0, len(ringinfo), 1):
                if ringinfo[m] != 0:
                    ring1 = int(mol.GetAtomWithIdx(m).IsInRingSize(5))
                    ring2 = int(mol.GetAtomWithIdx(m).IsInRingSize(6))
                    if ring1 == ring2 == 1:
                        ringinfo2[m] = 7
                    elif ring1 == 1:
                        ringinfo2[m] = 5
                    elif ring2 == 1:
                        ringinfo2[m] = 6
                    elif ring1 == ring2 == 0:
                        ringinfo2[m] = 4
        else:
            ringinfo2 = []

        from numpy import zeros, arange
        cTab_Mat = zeros((maxx, maxx))

        conn, hyd, neighbor = [], [], []  # 'neighbor' is eq to 'relation' in fPathSearch_Call
        for j in range(0, maxx, 1):
            atom = mol.GetAtomWithIdx(j)
            conn.append(len(atom.GetNeighbors()))
            hyd.append(atom.GetTotalNumHs())
            neighbor.append([x.GetIdx() for x in atom.GetNeighbors()])
            for k in range(0, conn[j], 1):
                temp = neighbor[j][k]
                cTab_Mat[j][temp] = 1

        longpath = pSearch_det2.pSearch_det2(cTab_Mat, arange(1, maxx+1), arange(1, maxx+1), [], [], int(maxL))
        path = subPaths.subPaths(longpath)
        symbol, sympath = sym_path3.sym_path3(iD, conn, hyd, gCharge_classify, ringinfo2, path, ann)
        all_symbol.append(symbol)
        sympath_dL = []
        for i in range(0, len(sympath), 1):
            if len(sympath[i]) >= minL:
                sympath_dL.append(sympath[i])
        all_sympath.append(sympath_dL)
        mol_count += 1

    uni_sympath = uni_sym.uni_sym(all_sympath)
    finger = fingerprint.fingerprint(all_sympath, uni_sympath)

    return activity, cas_no, uni_sympath, finger

B. Function for depth-first search algorithm – pSearch_det2.py

def pSearch_det2(iMat, iD, iNum, iPath, mRoad, maxL):

    # Function for finding all possible paths using a deterministic
    # depth-first search algorithm.
    # Function modified to find longest possible paths up to a specified length 'maxL'

    # Variables:
    # INPUT
    #   iMat  - Matrix form of connection table for compound i
    #   iD    - Node numbers (from 1 to maxx)
    #   iNum  - Path-determining node numbers (typically same as iD)
    #   iPath - Path history
    #   mRoad - Empty list (paths will be added as function runs multiple times)
    #   maxL  - Maximum path length desired
    # OUTPUT
    #   mRoad - List of all longest possible paths from nodes specified in iNum

    # numpy.delete is used here; the scipy alias of the same function
    # has been removed from modern SciPy releases
    from numpy import argwhere, ones, delete
    for i in range(0, len(iNum), 1):
        j = int(argwhere(iD == iNum[i]))  # for node 1, j will be 0
        k = ones(len(iMat))
        k[j] = 0
        m = iD[iMat[j, :] != 0]
        if len(m) != 0 and len(iPath) < (maxL - 1):
            temp1 = delete(iMat, j, 0)
            temp2 = delete(temp1, j, 1)
            pSearch_det2(temp2, iD[k != 0], m, iPath + [iNum[i]], mRoad, maxL)
        else:
            mRoad.append(iPath + [iNum[i]])

    return mRoad
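The recursion above can be expressed more compactly against an adjacency list instead of repeatedly deleting rows and columns of the connection-table matrix. This Python 3 sketch (the names `all_paths` and `adj`, and the 3-node chain, are illustrative, not from the thesis code) records a path exactly when the walk dead-ends or reaches `maxL`, matching the behavior of `pSearch_det2`:

```python
# Enumerate all maximal simple paths (up to length maxL) from a start node.
def all_paths(adj, node, path, out, maxL):
    path = path + [node]
    nxt = [n for n in adj[node] if n not in path]  # unvisited neighbors
    if nxt and len(path) < maxL:
        for n in nxt:
            all_paths(adj, n, path, out, maxL)
    else:
        out.append(path)  # dead end or length limit reached
    return out

# Linear 3-atom graph: 1-2-3
adj = {1: [2], 2: [1, 3], 3: [2]}
paths = []
for start in adj:
    all_paths(adj, start, [], paths, maxL=3)
print(paths)  # [[1, 2, 3], [2, 1], [2, 3], [3, 2, 1]]
```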

C. Function for identifying sub-paths – subPaths.py

def subPaths(mRoad):

    # Function for breaking all longest paths into unique sub-paths

    # Variables:
    # INPUT
    #   mRoad - List of all longest possible paths
    # OUTPUT
    #   road  - List of all unique sub-paths

    road = []
    for i in range(0, len(mRoad), 1):
        for j in range(0, len(mRoad[i]), 1):
            if (mRoad[i][:j+1] not in road) and (mRoad[i][:j+1][::-1] not in road):
                road.append(mRoad[i][:j+1])

    def bylength(word1, word2):
        return len(word1) - len(word2)
    road.sort(cmp=bylength)

    return road
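The same prefix expansion with forward/reverse deduplication can be sketched in Python 3 (where the `cmp` argument to `sort` no longer exists and `key=len` is used instead); `sub_paths` is an illustrative rewrite, not the original function:

```python
# Every prefix of every longest path is collected, keeping one
# representative per forward/reverse pair, sorted shortest-first.
def sub_paths(longest):
    road = []
    for p in longest:
        for j in range(len(p)):
            sp = p[:j + 1]
            if sp not in road and sp[::-1] not in road:
                road.append(sp)
    road.sort(key=len)  # Python 3 replacement for sort(cmp=bylength)
    return road

print(sub_paths([[1, 2, 3], [3, 2, 1]]))
# [[1], [3], [1, 2], [3, 2], [1, 2, 3]]
```

Note that [3, 2, 1] itself is discarded because its reverse [1, 2, 3] was already collected, while the shorter prefixes [3] and [3, 2] survive.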

D. Function for identifying symbolic paths using chemical annotations – sym_path3.py

def sym_path3(iD, conn, hyd, gchg, ring, path, ann):

    # Function for converting node-based paths into symbol-based paths

    # Variables:
    # INPUT
    #   iD   - Identity of each heavy atom in compound i
    #   conn - Connectivity of all heavy atoms in compound i
    #   hyd  - #H-atoms connected to each heavy atom in compound i
    #   gchg - Gasteiger charge (partial charge) on each heavy atom in compound i
    #   ring - Ring info (ring atom or not) about each heavy atom in compound i
    #   path - List of all unique sub-paths
    #   ann  - List of user-specified annotations
    #          1. Match atom identity (0/1)
    #          2. Match atom connectivity (0/1)
    #          3. Match atom nHyd (0/1)
    #          4. Match atom partial charge (0/1)
    #          5. Match atom ring info (0/1)
    # OUTPUT
    #   symbol  - List of symbolic notation for each heavy atom in compound i
    #   symPath - List of all unique symbolic sub-paths

    if not any(ann):
        print '\n ERROR: Select at least one criterion'
        return []

    # Build one composite symbol per atom by concatenating the values of the
    # selected annotations (e.g. identity + connectivity + nHyd -> 'C30')
    mega = [iD, conn, hyd, gchg, ring]
    k = 0
    for i in range(0, len(ann)):
        if ann[i] != 0 and k == 1:
            temp = ''.join([str(item) for item in mega[i]])
            symbol = map(''.join, zip(symbol, temp))
        if ann[i] != 0 and k == 0:
            if any(ann[i+1:]):
                if i == 0:
                    symbol = mega[i]
                    k = 1
                else:
                    temp = ''.join([str(item) for item in mega[i]])
                    symbol = temp
                    k = 1
            else:
                symbol = mega[i]

    symPath = []
    for i in range(0, len(path), 1):
        sym1 = []
        for j in range(0, len(path[i]), 1):
            sym1.append(symbol[path[i][j] - 1])
        if sym1 not in symPath and sym1[::-1] not in symPath:
            symPath.append(sym1)

    return symbol, symPath
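For a concrete sense of the symbols this produces, the following Python 3 sketch composes per-atom symbols from identity, connectivity, and hydrogen count. The atom values are invented for the example, but the resulting symbols ('C13', 'C22', ...) match the style of the {AI, nC, nH} fragments in Appendix B:

```python
# Per-atom annotation values for a hypothetical 3-atom compound
iD   = ['C', 'C', 'N']   # AI: atom identity
conn = [1, 2, 1]         # nC: heavy-atom connections
hyd  = [3, 2, 2]         # nH: attached hydrogens

# One composite symbol per atom: identity + connectivity + hydrogen count
symbol = [a + str(c) + str(h) for a, c, h in zip(iD, conn, hyd)]
print(symbol)  # ['C13', 'C22', 'N12']

# A node path [1, 2, 3] then becomes the annotated fragment:
print([symbol[n - 1] for n in [1, 2, 3]])  # ['C13', 'C22', 'N12']
```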

E. Function for collecting all unique symbolic paths – uni_sym.py

def uni_sym(all_sympath):

    # Function for collecting unique symbol-based sub-paths from all compounds

    # Variables:
    # INPUT
    #   all_sympath - List of symbolic paths for all compounds in sd file
    # OUTPUT
    #   uni_sympath - List of all unique symbolic paths

    uni_sympath = []

    for i in range(0, len(all_sympath), 1):
        for j in range(0, len(all_sympath[i]), 1):
            if all_sympath[i][j] not in uni_sympath and all_sympath[i][j][::-1] not in uni_sympath:
                uni_sympath.append(all_sympath[i][j])

    def bylength(word1, word2):
        return len(word1) - len(word2)

    uni_sympath.sort(cmp=bylength)

    return uni_sympath


F. Function for generating compound-fragment data matrix – fingerprint.py

def fingerprint(iSympath, uniSympath):

    # Function to create a list of fingerprints (0s and 1s)

    # Variables:
    # INPUT
    #   iSympath   - List of symbolic paths for all compounds in sd file
    #   uniSympath - List of all unique symbolic paths
    # OUTPUT
    #   fPrint     - List of fingerprints for all compounds

    from numpy import zeros

    fPrint = []

    for i in range(0, len(iSympath), 1):
        tempPrint = [0]*len(uniSympath)
        for j in range(0, len(uniSympath), 1):
            # Following code revised after a bug was detected (10/03/2013)
            if uniSympath[j] in iSympath[i] or uniSympath[j][::-1] in iSympath[i]:
                tempPrint[j] = 1
        fPrint.append(tempPrint)

    return fPrint


APPENDIX B. STATISTICAL RESULTS FOR IDENTIFICATION OF STRUCTURAL ALERTS

1. Skin sensitization

Number   Fragment   χ2-statistic   γ-statistic
1   ['N', 'C', 'C', 'C', 'N']   24.525   0.601
2   ['N', 'C', 'C', 'C', 'C', 'C', 'N']   23.308   0.566
3   ['O', 'N', 'O']   20.911   0.466
4   ['N', 'C', 'C', 'C', 'N', 'O']   20.575   0.723
5   ['C', 'N', 'O']   18.508   0.390
6   ['C', 'C', 'N', 'O']   18.508   0.390
7   ['O', 'N', 'C', 'C', 'C', 'N', 'O']   17.793   0.853
8   ['N', 'C', 'C', 'C', 'C', 'N']   17.259   0.667
9   ['N', 'C', 'C', 'N']   16.260   0.656
10   ['C', 'N', 'C']   15.626   0.373
11   ['N', 'C', 'C', 'N', 'O']   14.597   0.834
12   ['C', 'C', 'S']   12.039   0.361
13   ['Cl', 'C', 'C', 'C', 'C', 'N']   11.934   0.758
14   ['N', 'C', 'S', 'C', 'C', 'N']   11.830   1.000
15   ['O', 'C', 'O', 'C', 'O']   11.540   0.900
16   ['C', 'N', 'C', 'C', 'C', 'Cl']   11.540   0.900
17   ['C', 'N', 'C', 'C', 'C', 'N', 'C']   11.485   0.641
Table 36: Significant and positively correlated fragments for skin sensitization dataset using {AI} annotation scheme with χ2 and γ statistic values
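The two statistics in these tables can be reproduced from a 2×2 fragment/activity contingency table. The sketch below assumes χ2 is Pearson's chi-square without continuity correction and γ is the Goodman-Kruskal gamma, which for a 2×2 table reduces to (ad − bc)/(ad + bc); the counts are illustrative, not taken from the dissertation data:

```python
# a, b: actives/inactives containing the fragment
# c, d: actives/inactives lacking the fragment
def chi2_gamma(a, b, c, d):
    n = a + b + c + d
    # Pearson's chi-square for a 2x2 table, no continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Goodman-Kruskal gamma; for 2x2 this equals Yule's Q
    gamma = (a * d - b * c) / (a * d + b * c)
    return chi2, gamma

chi2, gamma = chi2_gamma(15, 5, 40, 150)
print(round(chi2, 3), round(gamma, 3))
```

Under this reading, the γ = 1.000 rows above correspond to fragments whose off-diagonal product bc is zero, e.g. a fragment that never occurs in an inactive compound.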


Number   Fragment   χ2-statistic   γ-statistic
1   ['C3', 'C2', 'C3', 'N3']   37.356   0.678
2   ['C2', 'C3', 'N3']   32.580   0.545
3   ['C2', 'C3', 'C2', 'C3', 'N3']   31.304   0.721
4   ['C2', 'C2', 'C3', 'C2', 'C3', 'N3']   27.670   0.770
5   ['C3', 'C2', 'C2', 'C3', 'C3', 'N3']   24.425   0.755
6   ['C3', 'C2', 'C2', 'C3', 'C2', 'C3', 'N3']   24.425   0.755
7   ['C2', 'C3', 'C2', 'C2', 'C3', 'C3', 'N3']   24.425   0.755
8   ['C2', 'C2', 'C3', 'C3', 'N3']   22.426   0.610
9   ['C2', 'C3', 'C2', 'C3', 'N3', 'O1']   21.720   0.666
10   ['O1', 'N3', 'O1']   20.911   0.466
11   ['C2', 'C3', 'N3', 'O1']   19.865   0.489
12   ['O1', 'N3', 'C3', 'C2', 'C3', 'N3', 'O1']   19.776   0.892
13   ['C2', 'N3', 'C3', 'C2', 'C3']   19.160   0.930
14   ['C3', 'C3', 'C2', 'C3', 'N3']   18.774   0.621
15   ['C2', 'C2', 'C3', 'C3', 'N3', 'O1']   18.746   0.644
16   ['C3', 'C2', 'C2', 'C3', 'C3', 'N3', 'O1']   18.448   0.716
17   ['C2', 'C2', 'C3', 'C2', 'C3', 'N3', 'O1']   18.448   0.716
18   ['C3', 'N3', 'O1']   18.416   0.447
19   ['C2', 'C3', 'C3', 'C2', 'C3', 'N3']   18.290   0.638
20   ['C2', 'C3', 'C3', 'C2', 'C2', 'C3', 'N3']   18.290   0.638
21   ['C2', 'C2', 'C3', 'C3', 'C2', 'C3', 'N3']   18.290   0.638
22   ['C2', 'C2', 'C3', 'N3']   18.208   0.468
23   ['C3', 'C2', 'C3', 'N3', 'O1']   17.946   0.564
24   ['N3', 'C3', 'C2', 'C3', 'N3', 'O1']   17.793   0.853
25   ['N3', 'C3', 'C2', 'C3', 'N3']   17.793   0.853
26   ['C2', 'C3', 'C3', 'N3']   17.789   0.505
27   ['C3', 'C3', 'N3', 'O1']   16.859   0.505
28   ['C3', 'C3', 'N3']   16.508   0.445
29   ['N1', 'C3', 'C3', 'C2', 'C3']   16.378   0.783
Continued
Table 37: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC} annotation scheme with χ2 and γ statistic values


Table 37 continued

Number   Fragment   χ2-statistic   γ-statistic
30   ['C3', 'C3', 'C2', 'C2', 'C3', 'N3']   16.292   0.537
31   ['N1', 'C3', 'C3', 'C2', 'C3', 'C2', 'C2']   16.190   0.877
32   ['N1', 'C3', 'C3', 'C2', 'C3', 'C2']   16.190   0.877
33   ['N1', 'C3', 'C2', 'C2', 'C3', 'C2', 'C3']   16.190   0.877
34   ['O1', 'C3', 'C2', 'C3', 'C2', 'C3', 'O1']   15.889   1.000
35   ['O1', 'C3', 'C2', 'C3', 'C2', 'C3', 'C3']   15.889   1.000
36   ['O1', 'C3', 'C2', 'C3', 'C2', 'C3']   15.889   1.000
37   ['N1', 'C3', 'C3', 'C2', 'C3', 'N1']   15.889   1.000
38   ['C1', 'C2', 'N3', 'C3', 'C2', 'C3']   15.889   1.000
39   ['N2', 'C3', 'C2', 'C2', 'C3']   15.859   0.554
40   ['C2', 'C2', 'N3', 'C3', 'C2', 'C3']   15.288   0.918
41   ['N2', 'C3', 'C2', 'C3', 'C3', 'C2']   15.243   0.703
42   ['N2', 'C3', 'C2', 'C3', 'C3']   15.243   0.703
43   ['C2', 'C3', 'C3', 'N3', 'O1']   14.047   0.519
44   ['N1', 'C3', 'C2', 'C2', 'C3', 'C2']   13.971   0.405
45   ['C2', 'C3', 'C3', 'C3', 'C3', 'C3', 'C2']   13.294   0.680
46   ['O1', 'C3', 'C3', 'C3', 'C2', 'C3', 'C2']   12.882   0.857
47   ['N1', 'C3', 'C3', 'C2']   12.690   0.594
48   ['C3', 'N2', 'C2']   12.272   0.493
49   ['S2', 'C3', 'C3', 'C2', 'C3', 'C2', 'C2']   11.830   1.000
50   ['S2', 'C3', 'C3', 'C2', 'C3', 'C2']   11.830   1.000
51   ['S2', 'C3', 'C3', 'C2', 'C3']   11.830   1.000
52   ['S2', 'C3', 'C2', 'C2', 'C3', 'C2']   11.830   1.000
53   ['S2', 'C3', 'C2', 'C2', 'C3']   11.830   1.000
54   ['O2', 'C3', 'C3', 'C2', 'C3', 'O1']   11.830   1.000
55   ['N3', 'C3', 'C3', 'S2']   11.830   1.000
56   ['N3', 'C3', 'C2', 'C2', 'C3', 'C3', 'C1']   11.830   1.000
57   ['Cl1', 'C3', 'C3', 'C2', 'C3', 'C2']   11.830   1.000
58   ['C3', 'C3', 'C2', 'C3', 'N3', 'C2', 'C1']   11.830   1.000
59   ['C3', 'C2', 'C3', 'C2', 'C2', 'C3', 'S2']   11.830   1.000
Continued


Table 37 continued

Number   Fragment   χ2-statistic   γ-statistic
60   ['C2', 'O2', 'C3', 'C3', 'C2', 'C3', 'O1']   11.830   1.000
61   ['C2', 'N3', 'C3', 'C2', 'C3', 'C2', 'C2']   11.830   1.000
62   ['C2', 'N3', 'C3', 'C2', 'C3', 'C2']   11.830   1.000
63   ['C2', 'N3', 'C3', 'C2', 'C3', 'C1']   11.830   1.000
64   ['C2', 'N2', 'C3', 'C2', 'C3', 'C3', 'C2']   11.830   1.000
65   ['C2', 'N2', 'C3', 'C2', 'C3', 'C3']   11.830   1.000
66   ['C2', 'N2', 'C3', 'C2', 'C3']   11.830   1.000
67   ['C2', 'C2', 'N3', 'C3', 'C2', 'C3', 'C1']   11.830   1.000
68   ['C1', 'C3', 'C2', 'C3', 'N3', 'C2', 'C1']   11.830   1.000
69   ['N3', 'C3', 'C2', 'C2', 'C3', 'C3', 'N3']   11.647   0.810
70   ['O1', 'C3', 'O2', 'C3', 'O1']   11.540   0.900
71   ['N1', 'C3', 'C2', 'C2', 'C3', 'N1']   11.540   0.900
72   ['Cl1', 'C3', 'C3', 'C2', 'C3']   11.540   0.900
73   ['O1', 'C3', 'C3', 'C2', 'C3', 'O1']   11.485   0.641
74   ['N2', 'C3', 'C2', 'C3', 'C3', 'C2', 'C2']   11.485   0.641
75   ['C3', 'C3', 'C2', 'C3', 'N3', 'O1']   11.430   0.543

Number   Fragment   χ2-statistic   γ-statistic
1   ['C30', 'C21', 'C30', 'N30']   38.794   0.683
2   ['C21', 'C30', 'C21', 'C30', 'N30']   34.360   0.797
3   ['C21', 'C30', 'N30']   32.580   0.545
4   ['C30', 'C30', 'C21', 'C30', 'C21']   32.449   0.364
5   ['C21', 'C30', 'C30', 'C21', 'C30']   29.490   0.376
6   ['C21', 'C30', 'C30', 'C21', 'C30', 'C21']   28.543   0.365
7   ['C30', 'C30', 'C21', 'C30']   28.310   0.357
8   ['O11', 'C30', 'C21', 'C30']   28.088   0.638
Continued
Table 38: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC, nH} annotation scheme with χ2 and γ statistic values

Table 38 continued

Number   Fragment   χ2-statistic   γ-statistic
9   ['O11', 'C30', 'C21', 'C30', 'C21']   27.976   0.788
10   ['C21', 'C21', 'C30', 'C21', 'C30', 'N30']   27.670   0.770
11   ['C30', 'C21', 'C21', 'C30', 'C30', 'N30']   24.425   0.755
12   ['C30', 'C21', 'C21', 'C30', 'C21', 'C30', 'N30']   24.425   0.755
13   ['C21', 'C30', 'C21', 'C30', 'N30', 'O10']   24.425   0.755
14   ['C21', 'C30', 'C21', 'C21', 'C30', 'C30', 'N30']   24.425   0.755
15   ['C21', 'C21', 'C30', 'C30', 'N30']   22.426   0.610
16   ['O10', 'N30', 'O10']   20.911   0.466
17   ['C30', 'C30', 'N30']   20.676   0.495
18   ['C21', 'C30', 'N30', 'O10']   19.865   0.489
19   ['O10', 'N30', 'C30', 'C21', 'C30', 'N30', 'O10']   19.776   0.892
20   ['C22', 'N30', 'C30', 'C21', 'C30']   19.160   0.930
21   ['C30', 'C30', 'C21', 'C30', 'N30']   18.774   0.621
22   ['C21', 'C21', 'C30', 'C30', 'N30', 'O10']   18.746   0.644
23   ['C30', 'C21', 'C21', 'C30', 'C30', 'N30', 'O10']   18.448   0.716
24   ['C21', 'C21', 'C30', 'C21', 'C30', 'N30', 'O10']   18.448   0.716
25   ['C30', 'N30', 'O10']   18.416   0.447
26   ['C21', 'C30', 'C30', 'C21', 'C30', 'N30']   18.290   0.638
27   ['C21', 'C30', 'C30', 'C21', 'C21', 'C30', 'N30']   18.290   0.638
28   ['C21', 'C21', 'C30', 'C30', 'C21', 'C30', 'N30']   18.290   0.638
29   ['C21', 'C21', 'C30', 'N30']   18.208   0.468
30   ['C30', 'C21', 'C30', 'N30', 'O10']   17.946   0.564
31   ['N30', 'C30', 'C21', 'C30', 'N30', 'O10']   17.793   0.853
32   ['N30', 'C30', 'C21', 'C30', 'N30']   17.793   0.853
33   ['C21', 'C30', 'C30', 'N30']   17.789   0.505
34   ['C30', 'C30', 'N30', 'O10']   16.859   0.505
35   ['N12', 'C30', 'C30', 'C21', 'C30']   16.378   0.783
36   ['C30', 'C30', 'C21', 'C21', 'C30', 'N30']   16.292   0.537
37   ['C21', 'C30', 'C21', 'C21', 'C30', 'C30', 'C13']   16.275   0.722
38   ['N12', 'C30', 'C30', 'C21', 'C30', 'C21', 'C21']   16.190   0.877
Continued


Table 38 continued

Number   Fragment   χ2-statistic   γ-statistic
39   ['N12', 'C30', 'C30', 'C21', 'C30', 'C21']   16.190   0.877
40   ['N12', 'C30', 'C21', 'C21', 'C30', 'C21', 'C30']   16.190   0.877
41   ['O11', 'C30', 'C30', 'C30', 'C21', 'C30', 'C21']   15.889   1.000
42   ['O11', 'C30', 'C21', 'C30', 'C21', 'C30', 'O11']   15.889   1.000
43   ['O11', 'C30', 'C21', 'C30', 'C21', 'C30', 'C30']   15.889   1.000
44   ['O11', 'C30', 'C21', 'C30', 'C21', 'C30']   15.889   1.000
45   ['N12', 'C30', 'C30', 'C21', 'C30', 'N12']   15.889   1.000
46   ['C13', 'C22', 'N30', 'C30', 'C21', 'C30']   15.889   1.000
47   ['O11', 'C30', 'C30', 'C30', 'C21', 'C30']   15.415   0.803
48   ['O11', 'C30', 'C30', 'C30', 'O11']   15.322   0.628
49   ['C22', 'C22', 'N30', 'C30', 'C21', 'C30']   15.288   0.918
50   ['O11', 'C30', 'C21', 'C30', 'C21', 'C21', 'C30']   15.227   0.714
51   ['C30', 'C30', 'C21', 'C30', 'O11']   14.666   0.519
52   ['O11', 'C30', 'C30', 'O11']   14.610   0.757
53   ['O11', 'C30', 'C21', 'C30', 'C21', 'C21']   14.515   0.700
54   ['C21', 'C30', 'C30', 'N30', 'O10']   14.047   0.519
55   ['N12', 'C30', 'C21', 'C21', 'C30', 'C21']   13.971   0.405
56   ['C21', 'C21', 'C21', 'O10']   13.392   0.440
57   ['C13', 'C21', 'C21', 'C30']   13.388   0.503
58   ['C21', 'C30', 'C30', 'C30', 'C21', 'C30']   13.294   0.446
59   ['N20', 'C30', 'C21']   13.237   0.487
60   ['N20', 'C30', 'C21', 'C21']   13.035   0.450
61   ['C13', 'C30', 'C21', 'C30', 'C21', 'C21', 'C30']   12.916   0.688
62   ['N12', 'C30', 'C30', 'C21']   12.690   0.594
63   ['C13', 'C21', 'C21']   12.352   0.464
64   ['C21', 'C21', 'C21', 'C30', 'C21', 'C21', 'C30']   12.178   0.611
65   ['C21', 'C30', 'C30', 'C30', 'C21']   12.129   0.377
66   ['C21', 'C30', 'C30', 'C30', 'C30', 'C30', 'C21']   11.934   0.758
67   ['C21', 'C21', 'C30', 'C30', 'C30', 'C30', 'C30']   11.934   0.758
68   ['S20', 'C30', 'C30', 'C21', 'C30', 'C21', 'C21']   11.830   1.000
Continued


Table 38 continued

Number   Fragment   χ2-statistic   γ-statistic
69   ['S20', 'C30', 'C30', 'C21', 'C30', 'C21']   11.830   1.000
70   ['S20', 'C30', 'C30', 'C21', 'C30']   11.830   1.000
71   ['S20', 'C30', 'C21', 'C21', 'C30', 'C21']   11.830   1.000
72   ['S20', 'C30', 'C21', 'C21', 'C30']   11.830   1.000
73   ['O20', 'C30', 'C30', 'C21', 'C30', 'O11']   11.830   1.000
74   ['N30', 'C30', 'C30', 'S20']   11.830   1.000
75   ['N30', 'C30', 'C21', 'C21', 'C30', 'C30', 'C13']   11.830   1.000
76   ['N20', 'C30', 'N20']   11.830   1.000
77   ['N20', 'C30', 'C21', 'C30', 'C30', 'C21']   11.830   1.000
78   ['N20', 'C30', 'C21', 'C30', 'C30']   11.830   1.000
79   ['Cl10', 'C30', 'C30', 'C21', 'C30', 'C21']   11.830   1.000
80   ['C30', 'C30', 'O20', 'C22', 'C22', 'C22', 'C22']   11.830   1.000
81   ['C30', 'C30', 'C21', 'C30', 'N30', 'C22', 'C13']   11.830   1.000
82   ['C30', 'C21', 'C30', 'C21', 'C21', 'C30', 'S20']   11.830   1.000
83   ['C22', 'O20', 'C30', 'C30', 'C21', 'C30', 'O11']   11.830   1.000
84   ['C22', 'O20', 'C30', 'C30', 'C21', 'C30', 'C30']   11.830   1.000
85   ['C22', 'N30', 'C30', 'C21', 'C30', 'C21', 'C21']   11.830   1.000
86   ['C22', 'N30', 'C30', 'C21', 'C30', 'C21']   11.830   1.000
87   ['C22', 'N30', 'C30', 'C21', 'C30', 'C13']   11.830   1.000
88   ['C22', 'C22', 'N30', 'C30', 'C21', 'C30', 'C13']   11.830   1.000
89   ['C13', 'C30', 'C21', 'C30', 'N30', 'C22', 'C13']   11.830   1.000
90   ['C30', 'C30', 'C21', 'C21', 'C30', 'C30', 'C21']   11.786   0.670
91   ['C30', 'C21', 'C21', 'C30', 'C30', 'C13']   11.769   0.624
92   ['C30', 'C30', 'C21', 'C30', 'C30', 'C30', 'O11']   11.697   0.762
93   ['N30', 'C30', 'C21', 'C21', 'C30', 'C30', 'N30']   11.647   0.810
94   ['O10', 'C30', 'O20', 'C30', 'O10']   11.540   0.900
95   ['O10', 'C30', 'C30', 'C21', 'C30', 'C30', 'O11']   11.540   0.900
96   ['N12', 'C30', 'C21', 'C21', 'C30', 'N12']   11.540   0.900
97   ['Cl10', 'C30', 'C30', 'C21', 'C30']   11.540   0.900
98   ['O10', 'C30', 'C21', 'C21']   11.529   0.404
Continued


Table 38 continued

Number   Fragment   χ2-statistic   γ-statistic
99   ['O10', 'C30', 'C30', 'C21', 'C30', 'O11']   11.485   0.641
100   ['C21', 'C30', 'C30', 'C30', 'C21', 'C30', 'C30']   11.485   0.641
101   ['C30', 'C30', 'C21', 'C30', 'N30', 'O10']   11.430   0.543
102   ['C13', 'C30', 'C30', 'C21', 'C21']   11.430   0.543
103   ['C30', 'C21', 'C30', 'C21', 'C30', 'C30']   11.341   0.566

Number   Fragment   χ2-statistic   γ-statistic
1   ['30+', '30+', '21+']   43.150   0.921
2   ['10-', '30+', '30+', '21+']   35.534   0.908
3   ['30+', '30+', '30+']   25.377   0.408
4   ['30+', '30+', '21+', '30+']   24.563   0.880
5   ['30+', '21+', '30+']   24.563   0.880
6   ['21+', '30+', '30+', '210']   24.563   0.880
7   ['10-', '30+', '30+', '21+', '30+']   24.563   0.880
8   ['11-', '30+', '30+', '30+', '210']   23.373   0.644
9   ['300', '210', '210', '300', '210']   23.005   0.854
10   ['10-', '30+', '10-']   20.911   0.466
11   ['210', '30+', '30+', '30+']   20.831   0.364
12   ['30+', '30+', '21+', '30+', '30+']   19.776   0.892
13   ['10-', '30+', '30+', '21+', '30+', '30+', '10-']   19.776   0.892
14   ['10-', '30+', '30+', '21+', '30+', '30+']   19.776   0.892
15   ['210', '210', '300', '210', '300', '130']   19.721   1.000
16   ['210', '210', '300', '210', '300']   19.721   1.000
Continued
Table 39: Significant and positively correlated fragments for skin sensitization dataset using {nC, nH, PC} annotation scheme with χ2 and γ statistic values


Table 39 continued

Number   Fragment   χ2-statistic   γ-statistic
17   ['12-', '300', '210', '210', '300', '210']   19.721   1.000
18   ['30+', '210', '210', '30+', '30+', '30+']   19.596   0.837
19   ['11-', '30+', '30+', '30+', '11-']   19.121   0.747
20   ['300', '210', '210', '300']   19.038   0.815
21   ['30+', '30+', '30+', '10-']   18.961   0.386
22   ['30+', '30+', '210', '210', '30+', '21+']   17.793   0.853
23   ['30+', '30+', '21+', '30+', '210', '210']   17.793   0.853
24   ['30+', '30+', '21+', '30+', '210']   17.793   0.853
25   ['30+', '210', '210', '30+', '21+', '30+', '30+']   17.793   0.853
26   ['30+', '210', '210', '30+', '21+', '30+']   17.793   0.853
27   ['30+', '210', '210', '30+', '21+']   17.793   0.853
28   ['30+', '21+', '30+', '30+', '210', '210']   17.793   0.853
29   ['30+', '21+', '30+', '30+', '210']   17.793   0.853
30   ['30+', '21+', '30+', '210', '210']   17.793   0.853
31   ['30+', '21+', '30+', '210']   17.793   0.853
32   ['210', '30+', '21+', '30+', '30+', '210']   17.793   0.853
33   ['21+', '30+', '30+', '210', '210', '30+']   17.793   0.853
34   ['21+', '30+', '30+', '210', '210']   17.793   0.853
35   ['21+', '30+', '210', '210', '30+', '30+', '30+']   17.793   0.853
36   ['21+', '30+', '210', '210']   17.793   0.853
37   ['21+', '30+', '210']   17.793   0.853
38   ['10-', '30+', '30+', '21+', '30+', '210', '210']   17.793   0.853
39   ['10-', '30+', '30+', '21+', '30+', '210']   17.793   0.853
40   ['210', '210', '30+', '30+', '30+']   17.362   0.662
41   ['11-', '30+', '210', '300']   16.928   0.665
42   ['30+', '210', '210', '30+', '30+', '30+', '10-']   16.574   0.820
43   ['21-', '300', '300', '300', '300', '21-']   15.889   1.000
44   ['21-', '300', '21-', '21-', '300', '300']   15.889   1.000
45   ['21-', '21-', '300', '300', '300', '300', '21-']   15.889   1.000
46   ['21-', '21-', '300', '300', '300', '21-', '21-']   15.889   1.000
Continued


Table 39 continued

Number   Fragment   χ2-statistic   γ-statistic
47   ['11-', '30+', '210', '300', '21-']   15.889   1.000
48   ['300', '210', '210', '300', '12-']   15.288   0.918
49   ['210', '300', '210', '300', '130']   15.288   0.918
50   ['300', '210', '300', '210']   15.227   0.714
51   ['210', '210', '30+', '30+', '30+', '10-']   15.084   0.640
52   ['11-', '30+', '30+', '11-']   14.610   0.757
53   ['30+', '30+', '210', '210', '30+', '30+', '30+']   14.597   0.834
54   ['21-', '210', '21+', '10-']   13.392   0.440
55   ['21-', '210', '21+']   13.392   0.440
56   ['30+', '210', '210', '30+']   12.936   0.352
57   ['210', '30+', '300', '210']   12.915   0.464
58   ['210', '30+', '30+', '21+', '30+', '30+', '10-']   12.882   0.857
59   ['210', '30+', '30+', '21+', '30+', '30+']   12.882   0.857
60   ['210', '210', '30+', '30+', '21+', '30+', '30+']   12.882   0.857
61   ['21+', '30+', '30+', '210', '210', '30+', '30+']   12.882   0.857
62   ['300', '300', '300', '21-', '21-', '21-']   12.728   0.481
63   ['300', '300', '300', '21-', '21-']   12.728   0.481
64   ['300', '300', '300', '21-']   12.728   0.481
65   ['21-', '210', '30+', '10-']   12.567   0.492
66   ['10-', '30+', '30+', '30+', '10-']   12.497   0.551
67   ['30+', '30+', '210', '210', '30+', '30+', '10-']   12.259   0.570
68   ['30+', '30+', '210', '210', '30+', '30+']   12.259   0.570
69   ['300', '300', '300', '21-', '21-', '300', '300']   11.830   1.000
70   ['300', '300', '210', '300', '210', '210']   11.830   1.000
71   ['300', '300', '210', '210', '300', '210']   11.830   1.000
72   ['300', '300', '210', '210', '300']   11.830   1.000
73   ['300', '300', '21-', '21-', '300', '300', '21-']   11.830   1.000
74   ['300', '300', '21-', '21-', '300', '300']   11.830   1.000
75   ['300', '210', '300', '300', '12-']   11.830   1.000
76   ['300', '210', '300', '30-', '220', '130']   11.830   1.000
Continued


Table 39 continued

Number   Fragment   χ2-statistic   γ-statistic
77   ['300', '210', '300', '210', '210', '300', '12-']   11.830   1.000
78   ['300', '210', '210', '300', '300', '130']   11.830   1.000
79   ['300', '210', '210', '300', '210', '300', '130']   11.830   1.000
80   ['300', '210', '210', '300', '210', '300']   11.830   1.000
81   ['300', '21-', '21-', '300', '300', '300', '300']   11.830   1.000
82   ['300', '21-', '21-', '300', '300', '300']   11.830   1.000
83   ['300', '21-', '21-', '300', '300', '21-', '21-']   11.830   1.000
84   ['30-', '300', '210', '300', '130']   11.830   1.000
85   ['30-', '300', '210', '300']   11.830   1.000
86   ['30+', '30+', '210', '30+', '30+', '30+', '11-']   11.830   1.000
87   ['30+', '30+', '21+', '20-']   11.830   1.000
88   ['30+', '30+', '20-', '22+', '220', '22-', '22-']   11.830   1.000
89   ['30+', '21+', '20-']   11.830   1.000
90   ['220', '30-', '300', '210', '300', '130']   11.830   1.000
91   ['220', '30-', '300', '210', '300']   11.830   1.000
92   ['220', '22+', '20-', '30+', '30+', '210', '30+']   11.830   1.000
93   ['22+', '20-', '30+', '30+', '210', '30+', '30+']   11.830   1.000
94   ['22+', '20-', '30+', '30+', '210', '30+', '11-']   11.830   1.000
95   ['22+', '20-', '30+', '30+', '210', '30+']   11.830   1.000
96   ['210', '300', '300', '210', '300', '210']   11.830   1.000
97   ['210', '300', '300', '210', '210', '300']   11.830   1.000
98   ['210', '300', '210', '300', '300', '12-']   11.830   1.000
99   ['210', '300', '210', '210', '300', '300', '130']   11.830   1.000
100   ['210', '30+', '200']   11.830   1.000
101   ['210', '210', '300', '300', '210', '300']   11.830   1.000
102   ['210', '210', '300', '210', '300', '300', '12-']   11.830   1.000
103   ['21-', '300', '300', '300', '300', '300', '21-']   11.830   1.000
104   ['21-', '300', '300', '300', '300', '300']   11.830   1.000
105   ['21-', '300', '300', '300', '21-', '21-', '300']   11.830   1.000
Continued


Table 39 continued

Number   Fragment   χ2-statistic   γ-statistic
106   ['21-', '300', '21-', '21-', '300', '300', '300']   11.830   1.000
107   ['21-', '21-', '300', '300', '300', '300', '300']   11.830   1.000
108   ['21-', '21-', '300', '21-', '21-', '300', '300']   11.830   1.000
109   ['21-', '21-', '21-', '300', '21-', '21-', '300']   11.830   1.000
110   ['20-', '30+', '30+', '210', '30+', '30+', '30+']   11.830   1.000
111   ['20-', '30+', '30+', '210', '30+', '11-']   11.830   1.000
112   ['130', '300', '210', '300', '30-', '220', '130']   11.830   1.000
113   ['12-', '300', '210', '300']   11.830   1.000
114   ['11-', '30+', '30+', '30+', '210', '30+', '210']   11.830   1.000
115   ['11-', '30+', '30+', '30+', '210', '30+']   11.830   1.000
116   ['11-', '30+', '30+', '210', '21-', '300', '210']   11.830   1.000
117   ['11-', '30+', '30+', '210', '21-', '300']   11.830   1.000
118   ['11-', '30+', '210', '300', '21-', '210', '30+']   11.830   1.000
119   ['11-', '30+', '210', '300', '21-', '210']   11.830   1.000
120   ['11-', '30+', '210', '30+', '210', '30+', '30+']   11.830   1.000
121   ['11-', '30+', '210', '30+', '210', '30+', '11-']   11.830   1.000
122   ['11-', '30+', '210', '30+', '210', '30+']   11.830   1.000
123   ['10-', '30+', '30+', '21+', '20-']   11.830   1.000
124   ['10-', '220', '300', '21-']   11.830   1.000
125   ['10-', '220', '300']   11.830   1.000
126   ['10-', '30+', '20-', '30+', '10-']   11.540   0.900
127   ['300', '21-', '21-', '300', '300', '21-']   11.485   0.641
128   ['21-', '300', '300', '300', '21-', '21-', '21-']   11.485   0.641
129   ['21-', '300', '300', '300', '21-', '21-']   11.485   0.641
130   ['21-', '300', '300', '300', '21-']   11.485   0.641
131   ['11-', '30+', '30+', '30+']   11.390   0.378


Number   Fragment   χ2-statistic   γ-statistic
1   ['C30+', 'C30+', 'C21+']   35.534   0.908
2   ['C21+', 'C30+', 'N30+', 'O10-']   31.763   0.900
3   ['C21+', 'C30+', 'N30+']   31.763   0.900
4   ['C30+', 'C30+', 'C21+', 'C30+']   24.563   0.880
5   ['C30+', 'C21+', 'C30+', 'N30+', 'O10-']   24.563   0.880
6   ['C30+', 'C21+', 'C30+', 'N30+']   24.563   0.880
7   ['C30+', 'C21+', 'C30+']   24.563   0.880
8   ['C21+', 'C30+', 'C30+', 'C210']   24.563   0.880
9   ['O11-', 'C30+', 'C30+', 'C30+', 'C210']   23.373   0.644
10   ['C300', 'C210', 'C210', 'C300', 'C210']   23.005   0.854
11   ['O10-', 'N30+', 'O10-']   22.494   0.508
12   ['C210', 'C210', 'C30+', 'C30+', 'N30+', 'O10-']   21.589   0.826
13   ['C210', 'C210', 'C30+', 'C30+', 'N30+']   21.589   0.826
14   ['C30+', 'C30+', 'N30+', 'O10-']   20.771   0.636
15   ['C30+', 'C30+', 'N30+']   20.771   0.636
16   ['C30+', 'N30+', 'O10-']   19.865   0.489
17   ['O10-', 'N30+', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']   19.776   0.892
18   ['N30+', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']   19.776   0.892
19   ['N30+', 'C30+', 'C21+', 'C30+', 'N30+']   19.776   0.892
20   ['C30+', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']   19.776   0.892
21   ['C30+', 'C30+', 'C21+', 'C30+', 'N30+']   19.776   0.892
22   ['N12-', 'C300', 'C210', 'C210', 'C300', 'C210']   19.721   1.000
23   ['C210', 'C210', 'C300', 'C210', 'C300', 'C130']   19.721   1.000
24   ['C210', 'C210', 'C300', 'C210', 'C300']   19.721   1.000
25   ['O11-', 'C30+', 'C30+', 'C30+', 'O11-']   19.121   0.747
26   ['C300', 'C210', 'C210', 'C300']   19.038   0.815
27   ['C30+', 'C30+', 'C210', 'C210', 'C30+', 'C21+']   17.793   0.853
28   ['C30+', 'C30+', 'C21+', 'C30+', 'C210', 'C210']   17.793   0.853
29   ['C30+', 'C30+', 'C21+', 'C30+', 'C210']   17.793   0.853
30   ['C30+', 'C210', 'C210', 'C30+', 'C30+', 'N30+', 'O10-']   17.793   0.853
Continued
Table 40: Significant and positively correlated fragments for skin sensitization dataset using {AI, nC, nH, PC} annotation scheme with χ2 and γ statistic values

Table 40 continued

Number  Fragment  χ2 statistic  γ statistic
31  ['C30+', 'C210', 'C210', 'C30+', 'C30+', 'N30+']  17.793  0.853
32  ['C30+', 'C210', 'C210', 'C30+', 'C21+', 'C30+', 'N30+']  17.793  0.853
33  ['C30+', 'C210', 'C210', 'C30+', 'C21+', 'C30+']  17.793  0.853
34  ['C30+', 'C210', 'C210', 'C30+', 'C21+']  17.793  0.853
35  ['C30+', 'C21+', 'C30+', 'C30+', 'C210', 'C210']  17.793  0.853
36  ['C30+', 'C21+', 'C30+', 'C30+', 'C210']  17.793  0.853
37  ['C30+', 'C21+', 'C30+', 'C210', 'C210']  17.793  0.853
38  ['C30+', 'C21+', 'C30+', 'C210']  17.793  0.853
39  ['C210', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']  17.793  0.853
40  ['C210', 'C30+', 'C21+', 'C30+', 'N30+']  17.793  0.853
41  ['C210', 'C30+', 'C21+', 'C30+', 'C30+', 'C210']  17.793  0.853
42  ['C210', 'C210', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']  17.793  0.853
43  ['C210', 'C210', 'C30+', 'C21+', 'C30+', 'N30+']  17.793  0.853
44  ['C21+', 'C30+', 'C30+', 'C210', 'C210', 'C30+']  17.793  0.853
45  ['C21+', 'C30+', 'C30+', 'C210', 'C210']  17.793  0.853
46  ['C21+', 'C30+', 'C210', 'C210', 'C30+', 'C30+', 'N30+']  17.793  0.853
47  ['C21+', 'C30+', 'C210', 'C210']  17.793  0.853
48  ['C21+', 'C30+', 'C210']  17.793  0.853
49  ['O11-', 'C30+', 'C210', 'C300']  16.928  0.665
50  ['C210', 'C30+', 'C30+', 'N30+', 'O10-']  16.007  0.597
51  ['C210', 'C30+', 'C30+', 'N30+']  16.007  0.597
52  ['C21-', 'C300', 'C300', 'C300', 'C300', 'C21-']  15.889  1.000
53  ['C21-', 'C300', 'C21-', 'C21-', 'C300', 'C300']  15.889  1.000
54  ['C21-', 'C21-', 'C300', 'C300', 'C300', 'C300', 'C21-']  15.889  1.000
55  ['C21-', 'C21-', 'C300', 'C300', 'C300', 'C21-', 'C21-']  15.889  1.000
56  ['C300', 'C210', 'C210', 'C300', 'N12-']  15.288  0.918
57  ['C210', 'C300', 'C210', 'C300', 'C130']  15.288  0.918
58  ['C300', 'C210', 'C300', 'C210']  15.227  0.714
59  ['O11-', 'C30+', 'C30+', 'O11-']  14.610  0.757
60  ['C21-', 'C210', 'C21+', 'O10-']  13.392  0.440
Continued

Table 40 continued

Number  Fragment  χ2 statistic  γ statistic
61  ['C21-', 'C210', 'C21+']  13.392  0.440
62  ['C21-', 'C210', 'C30+', 'O10-']  13.074  0.537
63  ['C30+', 'C210', 'C210', 'C30+']  12.936  0.352
64  ['C210', 'C30+', 'C300', 'C210']  12.915  0.464
65  ['N30+', 'C30+', 'C210', 'C210', 'C30+', 'C30+', 'N30+']  12.882  0.857
66  ['C210', 'C30+', 'C30+', 'C21+', 'C30+', 'N30+', 'O10-']  12.882  0.857
67  ['C210', 'C30+', 'C30+', 'C21+', 'C30+', 'N30+']  12.882  0.857
68  ['C210', 'C210', 'C30+', 'C30+', 'C21+', 'C30+', 'N30+']  12.882  0.857
69  ['C21+', 'C30+', 'C30+', 'C210', 'C210', 'C30+', 'N30+']  12.882  0.857
70  ['C300', 'C300', 'C300', 'C21-', 'C21-', 'C21-']  12.728  0.481
71  ['C300', 'C300', 'C300', 'C21-', 'C21-']  12.728  0.481
72  ['C300', 'C300', 'C300', 'C21-']  12.728  0.481
73  ['O20-', 'C30+', 'C30+', 'C210', 'C30+', 'O11-']  11.830  1.000
74  ['O20-', 'C30+', 'C30+', 'C210', 'C30+', 'C30+', 'C30+']  11.830  1.000
75  ['O11-', 'C30+', 'C30+', 'C30+', 'C210', 'C30+', 'C210']  11.830  1.000
76  ['O11-', 'C30+', 'C30+', 'C30+', 'C210', 'C30+']  11.830  1.000
77  ['O11-', 'C30+', 'C30+', 'C210', 'C21-', 'C300', 'C210']  11.830  1.000
78  ['O11-', 'C30+', 'C30+', 'C210', 'C21-', 'C300']  11.830  1.000
79  ['O11-', 'C30+', 'C210', 'C300', 'C21-', 'C210', 'C30+']  11.830  1.000
80  ['O11-', 'C30+', 'C210', 'C300', 'C21-', 'C210']  11.830  1.000
81  ['O11-', 'C30+', 'C210', 'C300', 'C21-']  11.830  1.000
82  ['O11-', 'C30+', 'C210', 'C30+', 'C210', 'C30+', 'O11-']  11.830  1.000
83  ['O11-', 'C30+', 'C210', 'C30+', 'C210', 'C30+', 'C30+']  11.830  1.000
84  ['O11-', 'C30+', 'C210', 'C30+', 'C210', 'C30+']  11.830  1.000
85  ['N30-', 'C300', 'C210', 'C300', 'C130']  11.830  1.000
86  ['N30-', 'C300', 'C210', 'C300']  11.830  1.000
87  ['N20-', 'C30+', 'N20-']  11.830  1.000
88  ['N20-', 'C30+', 'C300', 'C130']  11.830  1.000
89  ['N20-', 'C30+', 'C210', 'C30+']  11.830  1.000
90  ['N12-', 'C300', 'C210', 'C300']  11.830  1.000
Continued

Table 40 continued

Number  Fragment  χ2 statistic  γ statistic
91  ['C300', 'C300', 'C300', 'C21-', 'C21-', 'C300', 'C300']  11.830  1.000
92  ['C300', 'C300', 'C210', 'C300', 'C210', 'C210']  11.830  1.000
93  ['C300', 'C300', 'C210', 'C210', 'C300', 'C210']  11.830  1.000
94  ['C300', 'C300', 'C210', 'C210', 'C300']  11.830  1.000
95  ['C300', 'C300', 'C21-', 'C21-', 'C300', 'C300', 'C21-']  11.830  1.000
96  ['C300', 'C300', 'C21-', 'C21-', 'C300', 'C300']  11.830  1.000
97  ['C300', 'C210', 'C300', 'N30-', 'C220', 'C130']  11.830  1.000
98  ['C300', 'C210', 'C300', 'C300', 'N12-']  11.830  1.000
99  ['C300', 'C210', 'C300', 'C210', 'C210', 'C300', 'N12-']  11.830  1.000
100  ['C300', 'C210', 'C210', 'C300', 'Cl10-']  11.830  1.000
101  ['C300', 'C210', 'C210', 'C300', 'C300', 'C130']  11.830  1.000
102  ['C300', 'C210', 'C210', 'C300', 'C210', 'C300', 'C130']  11.830  1.000
103  ['C300', 'C210', 'C210', 'C300', 'C210', 'C300']  11.830  1.000
104  ['C300', 'C21-', 'C21-', 'C300', 'C300', 'C300', 'C300']  11.830  1.000
105  ['C300', 'C21-', 'C21-', 'C300', 'C300', 'C300']  11.830  1.000
106  ['C300', 'C21-', 'C21-', 'C300', 'C300', 'C21-', 'C21-']  11.830  1.000
107  ['C30+', 'C30+', 'O20-', 'C22+', 'C220', 'C22-', 'C22-']  11.830  1.000
108  ['C30+', 'C30+', 'C30+', 'C21+']  11.830  1.000
109  ['C30+', 'C30+', 'C210', 'C30+', 'C30+', 'C30+', 'O11-']  11.830  1.000
110  ['C30+', 'C30+', 'C210', 'C30+', 'C30+', 'C30+', 'C210']  11.830  1.000
111  ['C220', 'N30-', 'C300', 'C210', 'C300', 'C130']  11.830  1.000
112  ['C220', 'N30-', 'C300', 'C210', 'C300']  11.830  1.000
113  ['C220', 'C22+', 'O20-', 'C30+', 'C30+', 'C210', 'C30+']  11.830  1.000
114  ['C22+', 'O20-', 'C30+', 'C30+', 'C210', 'C30+', 'O11-']  11.830  1.000
115  ['C22+', 'O20-', 'C30+', 'C30+', 'C210', 'C30+', 'C30+']  11.830  1.000
116  ['C22+', 'O20-', 'C30+', 'C30+', 'C210', 'C30+']  11.830  1.000
117  ['C210', 'C300', 'C300', 'C210', 'C300', 'C210']  11.830  1.000
118  ['C210', 'C300', 'C300', 'C210', 'C210', 'C300']  11.830  1.000
119  ['C210', 'C300', 'C210', 'C300', 'C300', 'N12-']  11.830  1.000
120  ['C210', 'C300', 'C210', 'C210', 'C300', 'C300', 'C130']  11.830  1.000
Continued

Table 40 continued

Number  Fragment  χ2 statistic  γ statistic
121  ['C210', 'C210', 'C300', 'C300', 'C210', 'C300']  11.830  1.000
122  ['C210', 'C210', 'C300', 'C210', 'C300', 'C300', 'N12-']  11.830  1.000
123  ['C21-', 'C300', 'C300', 'C300', 'C300', 'C300', 'C21-']  11.830  1.000
124  ['C21-', 'C300', 'C300', 'C300', 'C300', 'C300']  11.830  1.000
125  ['C21-', 'C300', 'C300', 'C300', 'C21-', 'C21-', 'C300']  11.830  1.000
126  ['C21-', 'C300', 'C21-', 'C21-', 'C300', 'C300', 'C300']  11.830  1.000
127  ['C21-', 'C21-', 'C300', 'C300', 'C300', 'C300', 'C300']  11.830  1.000
128  ['C21-', 'C21-', 'C300', 'C21-', 'C21-', 'C300', 'C300']  11.830  1.000
129  ['C21-', 'C21-', 'C21-', 'C300', 'C21-', 'C21-', 'C300']  11.830  1.000
130  ['C130', 'C300', 'C210', 'C300', 'N30-', 'C220', 'C130']  11.830  1.000
131  ['O10-', 'C30+', 'O20-', 'C30+', 'O10-']  11.540  0.900
132  ['C300', 'C21-', 'C21-', 'C300', 'C300', 'C21-']  11.485  0.641
133  ['C21-', 'C300', 'C300', 'C300', 'C21-', 'C21-', 'C21-']  11.485  0.641
134  ['C21-', 'C300', 'C300', 'C300', 'C21-', 'C21-']  11.485  0.641
135  ['C21-', 'C300', 'C300', 'C300', 'C21-']  11.485  0.641

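The χ2 and γ statistics reported for each fragment above can be computed from the fragment's 2×2 contingency table (fragment present/absent versus active/inactive compounds). The sketch below uses the standard Pearson χ2 formula and Goodman-Kruskal γ (which reduces to Yule's Q for a 2×2 table); the counts are illustrative, not taken from the dataset.

```python
def chi2_and_gamma(a, b, c, d):
    """Association statistics for a fragment's 2x2 contingency table.

    a: active compounds containing the fragment
    b: inactive compounds containing the fragment
    c: active compounds lacking the fragment
    d: inactive compounds lacking the fragment
    """
    n = a + b + c + d
    # Pearson chi-squared statistic for a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Goodman-Kruskal gamma; for a 2x2 table this is Yule's Q
    gamma = (a * d - b * c) / (a * d + b * c)
    return chi2, gamma

# Illustrative counts: fragment present in 10 active and 2 inactive
# compounds, absent from 40 active and 60 inactive compounds
chi2, gamma = chi2_and_gamma(10, 2, 40, 60)
```

A positive γ, as in the tables above, indicates that the fragment's presence is positively associated with activity.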

APPENDIX C. RESULTS FOR 5-FOLD CROSS-VALIDATION OF BENCHMARK DATASET FOR AMES MUTAGENICITY

1. Preliminary Markov chain models with one-step connection probabilities
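Before the tables, a minimal sketch of the idea behind these models: class-specific one-step connection (transition) probabilities between annotated atom types are estimated from training compounds, and a query is assigned to the class under which its atom sequence is more likely. The atom labels, the unseen-transition floor, and the decision rule below are illustrative assumptions, not the dissertation's exact implementation.

```python
from collections import defaultdict
from math import log

def one_step_probs(sequences):
    """Estimate one-step connection probabilities between annotated
    atom types from a set of atom-type sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nbrs.values()) for b, n in nbrs.items()}
            for a, nbrs in counts.items()}

def log_likelihood(seq, probs, floor=1e-6):
    """Log-likelihood of a sequence under a one-step transition model;
    transitions unseen in training get a small floor probability."""
    return sum(log(probs.get(a, {}).get(b, floor))
               for a, b in zip(seq, seq[1:]))

# Illustrative {AI}-annotated sequences for the two training classes
active = [['C', 'N', 'O'], ['C', 'N', 'N', 'O']]
inactive = [['C', 'C', 'C', 'O'], ['C', 'C', 'O']]
p_act, p_inact = one_step_probs(active), one_step_probs(inactive)

# Classify a query by comparing class-conditional log-likelihoods
query = ['C', 'N', 'O']
prediction = ('active' if log_likelihood(query, p_act) >
              log_likelihood(query, p_inact) else 'inactive')
```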

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.611         0.618         0.614         0
CV 2          0.646         0.607         0.629         0
CV 3          0.641         0.600         0.624         0
CV 4          0.688         0.623         0.661         0
CV 5          0.681         0.633         0.661         0
Average       0.653         0.616         0.638
A. Annotation scheme: {AI}

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.765         0.527         0.669         0
CV 2          0.761         0.593         0.690         0
CV 3          0.771         0.579         0.690         0
CV 4          0.800         0.553         0.696         0
CV 5          0.752         0.575         0.678         0
Average       0.770         0.565         0.685
B. Annotation scheme: {AI, nC}
Continued

Table 41: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using preliminary Markov chain models based on one-step connection probabilities (14 sub-tables, one for each annotation scheme)


Table 41 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.746         0.568         0.673         1
CV 2          0.765         0.559         0.678         0
CV 3          0.755         0.533         0.662         0
CV 4          0.781         0.498         0.662         0
CV 5          0.752         0.542         0.664         2
Average       0.760         0.540         0.668
C. Annotation scheme: {AI, nC, nH}

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.771         0.520         0.669         0
CV 2          0.784         0.536         0.680         1
CV 3          0.755         0.557         0.672         0
CV 4          0.776         0.520         0.668         1
CV 5          0.750         0.544         0.664         2
Average       0.767         0.535         0.671
D. Annotation scheme: {nC, nH, PC} (3-level PC)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.749         0.594         0.686         4
CV 2          0.768         0.604         0.700         3
CV 3          0.739         0.578         0.671         1
CV 4          0.792         0.570         0.699         5
CV 5          0.733         0.594         0.675         6
Average       0.756         0.588         0.686
E. Annotation scheme: {AI, nC, nH, PC} (3-level PC)
Continued

Table 41 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.731         0.500         0.637         0
CV 2          0.767         0.557         0.678         0
CV 3          0.764         0.542         0.671         0
CV 4          0.779         0.507         0.665         0
CV 5          0.724         0.553         0.652         0
Average       0.753         0.532         0.661
F. Annotation scheme: {AI, RA} (binary RA annotation)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.760         0.521         0.663         1
CV 2          0.754         0.589         0.685         1
CV 3          0.772         0.567         0.686         0
CV 4          0.793         0.524         0.680         0
CV 5          0.747         0.594         0.683         0
Average       0.765         0.559         0.679
G. Annotation scheme: {AI, nC, RA} (binary RA annotation)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.735         0.583         0.673         0
CV 2          0.761         0.598         0.692         0
CV 3          0.762         0.567         0.680         0
CV 4          0.774         0.541         0.676         0
CV 5          0.721         0.611         0.675         0
Average       0.751         0.580         0.679
H. Annotation scheme: {AI, nH, RA} (binary RA annotation)
Continued

Table 41 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.756         0.584         0.686         4
CV 2          0.781         0.608         0.708         2
CV 3          0.751         0.596         0.686         0
CV 4          0.795         0.586         0.707         1
CV 5          0.740         0.588         0.676         1
Average       0.765         0.592         0.693
I. Annotation scheme: {nC, nH, PC} (5-level PC)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.756         0.615         0.699         8
CV 2          0.775         0.619         0.709         4
CV 3          0.736         0.599         0.678         5
CV 4          0.788         0.599         0.708         7
CV 5          0.745         0.614         0.690         7
Average       0.760         0.609         0.697
J. Annotation scheme: {AI, nC, nH, PC} (5-level PC)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.729         0.596         0.675         6
CV 2          0.754         0.652         0.711         2
CV 3          0.757         0.654         0.711         4
CV 4          0.803         0.608         0.721         6
CV 5          0.758         0.631         0.705         5
Average       0.760         0.628         0.705
K. Annotation scheme: {AI, nC, PC, RA} (5-level PC and binary RA)
Continued

Table 41 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.730         0.632         0.690         2
CV 2          0.763         0.632         0.708         2
CV 3          0.759         0.630         0.705         2
CV 4          0.787         0.591         0.704         7
CV 5          0.746         0.627         0.696         6
Average       0.757         0.622         0.701
L. Annotation scheme: {AI, nH, PC, RA} (5-level PC and binary RA)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.778         0.581         0.698         4
CV 2          0.788         0.626         0.720         3
CV 3          0.782         0.610         0.710         2
CV 4          0.819         0.599         0.726         4
CV 5          0.777         0.598         0.702         1
Average       0.789         0.603         0.711
M. Annotation scheme: {nC, nH, PC, RA} (5-level PC and binary RA)

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.779         0.598         0.705         10
CV 2          0.801         0.613         0.722         7
CV 3          0.792         0.602         0.712         8
CV 4          0.818         0.591         0.722         8
CV 5          0.784         0.599         0.707         9
Average       0.795         0.601         0.714
N. Annotation scheme: {AI, nC, nH, PC, RA} (5-level PC and binary RA)

2. Markov chain models with fragments of longer lengths
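The fragment lengths in the tables below count bonds spanned by a linear fragment. As background, a sketch of enumerating such linear (path) fragments from a molecular graph follows; it treats a fragment and its reverse as distinct, which is an assumption — the dissertation may canonicalize direction.

```python
def linear_fragments(adjacency, labels, length):
    """Enumerate linear (path) fragments spanning `length` bonds from a
    molecular graph given as an adjacency dict and per-atom labels."""
    frags = set()

    def extend(path):
        if len(path) == length + 1:          # `length` bonds -> length+1 atoms
            frags.add(tuple(labels[i] for i in path))
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:              # simple paths only, no revisits
                extend(path + [nbr])

    for atom in adjacency:
        extend([atom])
    return frags

# Illustrative 4-atom chain C-C-N-O with {AI} labels
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: 'C', 1: 'C', 2: 'N', 3: 'O'}
frags2 = linear_fragments(adj, labels, 2)    # fragments spanning 2 bonds
```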

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.611         0.713         0.652         1
CV 2          0.649         0.680         0.662         0
CV 3          0.643         0.663         0.651         0
CV 4          0.665         0.666         0.666         0
CV 5          0.634         0.692         0.658         1
Average       0.640         0.683         0.658
A. Annotation scheme: {AI, nC, nH} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.734         0.636         0.694         3
CV 2          0.744         0.639         0.700         2
CV 3          0.716         0.630         0.680         1
CV 4          0.773         0.612         0.705         4
CV 5          0.729         0.671         0.705         2
Average       0.739         0.638         0.697
B. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 2
Continued

Table 42: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using Markov chain models based on longer fragment lengths (9 sub-tables: 3 annotation schemes × 3 fragment lengths)


Table 42 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.748         0.632         0.701         2
CV 2          0.756         0.658         0.715         3
CV 3          0.736         0.644         0.698         2
CV 4          0.784         0.645         0.725         4
CV 5          0.759         0.661         0.718         1
Average       0.757         0.648         0.711
C. Annotation scheme: {nC, nH, PC, RA} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.722         0.601         0.672         4
CV 2          0.718         0.553         0.648         1
CV 3          0.716         0.530         0.638         1
CV 4          0.737         0.502         0.638         0
CV 5          0.726         0.596         0.671         1
Average       0.724         0.556         0.653
D. Annotation scheme: {AI, nC, nH} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.771         0.551         0.682         6
CV 2          0.760         0.592         0.689         3
CV 3          0.759         0.579         0.683         2
CV 4          0.799         0.557         0.697         4
CV 5          0.759         0.591         0.688         2
Average       0.770         0.574         0.688
E. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 3
Continued

Table 42 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.780         0.554         0.688         5
CV 2          0.770         0.608         0.702         4
CV 3          0.771         0.593         0.696         3
CV 4          0.817         0.575         0.715         4
CV 5          0.787         0.579         0.700         1
Average       0.785         0.582         0.700
F. Annotation scheme: {nC, nH, PC, RA} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.758         0.506         0.656         10
CV 2          0.761         0.506         0.653         4
CV 3          0.756         0.483         0.641         5
CV 4          0.760         0.429         0.621         7
CV 5          0.774         0.505         0.661         7
Average       0.762         0.486         0.646
G. Annotation scheme: {AI, nC, nH} and Fragment length: 4

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.793         0.527         0.685         12
CV 2          0.803         0.533         0.689         6
CV 3          0.776         0.537         0.676         6
CV 4          0.817         0.511         0.689         11
CV 5          0.775         0.554         0.682         8
Average       0.793         0.532         0.684
H. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 4
Continued

Table 42 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.798         0.510         0.681         11
CV 2          0.803         0.546         0.695         7
CV 3          0.795         0.541         0.689         7
CV 4          0.835         0.522         0.704         11
CV 5          0.805         0.534         0.692         7
Average       0.807         0.531         0.692
I. Annotation scheme: {nC, nH, PC, RA} and Fragment length: 4

3. Markov chain models with improved specificity
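The classification criteria x = 0.5 and x = 0 in the sub-tables below are thresholds on the model's class scores. One plausible reading, shown here as an assumption rather than the dissertation's exact rule, is to call a compound active only when its active-class score exceeds its inactive-class score by more than x, and to leave a compound unscored (NP) when no score can be computed:

```python
def classify(score_active, score_inactive, x):
    """Threshold-based classification sketch (hypothetical rule).

    Predicts 'active' only when the active score exceeds the inactive
    score by more than x; compounds with no computable score (e.g. all
    fragments unseen in training) are 'not predicted' (NP).
    """
    if score_active is None or score_inactive is None:
        return 'NP'
    return 'active' if score_active - score_inactive > x else 'inactive'

# Raising x makes the 'active' call more conservative, trading
# sensitivity for specificity
call_lenient = classify(1.2, 1.0, 0.0)    # 'active'
call_strict = classify(1.2, 1.0, 0.5)     # 'inactive'
```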

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.756         0.639         0.708         4
CV 2          0.768         0.641         0.715         3
CV 3          0.756         0.646         0.710         2
CV 4          0.801         0.633         0.730         4
CV 5          0.749         0.617         0.694         1
Average       0.766         0.635         0.711
A. Annotation scheme: {nC, nH, PC, RA} and Classification criteria: x = 0.5
Continued

Table 43: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using preliminary Markov chain models based on one-step connection probabilities, with different annotation schemes and classification criteria (4 sub-tables)


Table 43 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.758         0.659         0.718         10
CV 2          0.769         0.635         0.713         7
CV 3          0.765         0.644         0.714         8
CV 4          0.806         0.627         0.730         8
CV 5          0.764         0.619         0.703         9
Average       0.772         0.637         0.716
B. Annotation scheme: {AI, nC, nH, PC, RA} and Classification criteria: x = 0.5

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.776         0.602         0.706         15
CV 2          0.804         0.636         0.734         12
CV 3          0.780         0.645         0.723         5
CV 4          0.822         0.612         0.735         14
CV 5          0.784         0.620         0.715         4
Average       0.793         0.623         0.723
C. Annotation scheme: {nC, nH, PC, RA} (5-level RA) and Classification criteria: x = 0

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.787         0.615         0.718         24
CV 2          0.809         0.635         0.736         19
CV 3          0.798         0.647         0.734         13
CV 4          0.821         0.599         0.728         16
CV 5          0.787         0.613         0.715         12
Average       0.800         0.622         0.726
D. Annotation scheme: {AI, nC, nH, PC, RA} (5-level RA) and Classification criteria: x = 0

4. kNN models
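The kNN and weighted kNN results below classify a query by the majority vote of its k most similar training compounds. The sketch below assumes Tanimoto (Jaccard) similarity on fragment sets and similarity-weighted votes for the weighted variant; the dissertation's exact similarity measure and weighting may differ.

```python
from collections import Counter

def tanimoto(frags_a, frags_b):
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    if not frags_a and not frags_b:
        return 0.0
    return len(frags_a & frags_b) / len(frags_a | frags_b)

def knn_predict(query, train, k=5, weighted=False):
    """Classify by the k most similar training compounds.

    train: list of (fragment_set, label) pairs.
    weighted=True weights each neighbor's vote by its similarity
    to the query, as in a weighted kNN model.
    """
    ranked = sorted(train, key=lambda t: tanimoto(query, t[0]),
                    reverse=True)[:k]
    votes = Counter()
    for frags, label in ranked:
        votes[label] += tanimoto(query, frags) if weighted else 1
    return votes.most_common(1)[0][0]

# Illustrative fragment sets (1 = mutagenic, 0 = non-mutagenic)
train = [({'CN', 'NO'}, 1), ({'CN', 'NN'}, 1), ({'CC', 'CO'}, 0),
         ({'CC', 'CCl'}, 0), ({'NO', 'NN'}, 1)]
pred = knn_predict({'CN', 'NO', 'NN'}, train, k=3)
```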

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.789         0.658         0.736         3
CV 2          0.774         0.694         0.740         3
CV 3          0.786         0.668         0.737         1
CV 4          0.811         0.601         0.722         0
CV 5          0.780         0.695         0.744         1
Average       0.788         0.663         0.736
A. Annotation scheme: {AI, nC, nH} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.785         0.632         0.723         5
CV 2          0.791         0.682         0.745         4
CV 3          0.805         0.666         0.746         2
CV 4          0.832         0.614         0.740         1
CV 5          0.789         0.660         0.735         2
Average       0.800         0.651         0.738
B. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 2
Continued

Table 44: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using kNN models with 5 nearest neighbors (9 sub-tables: 3 annotation schemes × 3 fragment lengths)


Table 44 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.780         0.650         0.727         5
CV 2          0.813         0.694         0.763         5
CV 3          0.793         0.666         0.739         2
CV 4          0.811         0.641         0.739         1
CV 5          0.798         0.658         0.739         2
Average       0.799         0.662         0.741
C. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.793         0.626         0.725         8
CV 2          0.820         0.676         0.759         8
CV 3          0.836         0.662         0.763         6
CV 4          0.820         0.636         0.742         1
CV 5          0.823         0.652         0.751         6
Average       0.818         0.650         0.748
D. Annotation scheme: {AI, nC, nH} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.796         0.646         0.735         11
CV 2          0.821         0.705         0.772         11
CV 3          0.802         0.683         0.752         8
CV 4          0.810         0.638         0.738         7
CV 5          0.798         0.656         0.738         9
Average       0.805         0.666         0.747
E. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 3
Continued

Table 44 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.788         0.640         0.728         15
CV 2          0.793         0.708         0.757         14
CV 3          0.800         0.674         0.747         10
CV 4          0.817         0.654         0.748         10
CV 5          0.799         0.668         0.744         14
Average       0.799         0.669         0.745
F. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.808         0.619         0.732         35
CV 2          0.828         0.686         0.768         21
CV 3          0.835         0.635         0.751         25
CV 4          0.806         0.607         0.723         27
CV 5          0.817         0.655         0.749         26
Average       0.819         0.640         0.745
G. Annotation scheme: {AI, nC, nH} and Fragment length: 4

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.788         0.625         0.723         48
CV 2          0.780         0.692         0.743         37
CV 3          0.817         0.672         0.755         36
CV 4          0.810         0.606         0.725         37
CV 5          0.807         0.662         0.746         37
Average       0.800         0.651         0.738
H. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 4
Continued

Table 44 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.782         0.616         0.716         65
CV 2          0.772         0.729         0.754         53
CV 3          0.814         0.682         0.759         52
CV 4          0.817         0.631         0.740         52
CV 5          0.815         0.661         0.750         47
Average       0.800         0.664         0.744
I. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 4

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.791         0.676         0.744         3
CV 2          0.786         0.689         0.745         3
CV 3          0.796         0.678         0.747         1
CV 4          0.800         0.608         0.719         0
CV 5          0.764         0.700         0.737         1
Average       0.787         0.670         0.738
A. Annotation scheme: {AI, nC, nH} and Fragment length: 2
Continued

Table 45: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using weighted kNN models with 5 nearest neighbors (9 sub-tables: 3 annotation schemes × 3 fragment lengths)


Table 45 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.790         0.665         0.740         5
CV 2          0.796         0.684         0.749         4
CV 3          0.807         0.678         0.753         2
CV 4          0.828         0.631         0.745         1
CV 5          0.784         0.682         0.741         2
Average       0.801         0.668         0.746
B. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.784         0.673         0.739         5
CV 2          0.817         0.709         0.771         5
CV 3          0.796         0.695         0.754         2
CV 4          0.813         0.655         0.746         1
CV 5          0.801         0.687         0.753         2
Average       0.802         0.684         0.753
C. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.784         0.629         0.721         8
CV 2          0.818         0.674         0.757         8
CV 3          0.832         0.676         0.767         6
CV 4          0.828         0.639         0.748         1
CV 5          0.811         0.659         0.747         6
Average       0.815         0.655         0.748
D. Annotation scheme: {AI, nC, nH} and Fragment length: 3
Continued

Table 45 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.789         0.658         0.736         11
CV 2          0.814         0.727         0.777         11
CV 3          0.797         0.683         0.749         8
CV 4          0.803         0.646         0.737         7
CV 5          0.792         0.668         0.740         9
Average       0.799         0.676         0.748
E. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.790         0.660         0.737         15
CV 2          0.796         0.732         0.769         14
CV 3          0.793         0.686         0.748         10
CV 4          0.813         0.671         0.753         10
CV 5          0.793         0.690         0.750         14
Average       0.797         0.688         0.751
F. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.805         0.627         0.733         35
CV 2          0.819         0.695         0.767         21
CV 3          0.837         0.635         0.752         25
CV 4          0.819         0.617         0.734         27
CV 5          0.812         0.658         0.747         26
Average       0.818         0.646         0.747
G. Annotation scheme: {AI, nC, nH} and Fragment length: 4
Continued

Table 45 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.782         0.647         0.729         48
CV 2          0.776         0.697         0.743         37
CV 3          0.817         0.672         0.755         36
CV 4          0.809         0.606         0.724         37
CV 5          0.801         0.668         0.745         37
Average       0.797         0.658         0.739
H. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 4

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.787         0.658         0.736         65
CV 2          0.778         0.716         0.752         53
CV 3          0.813         0.682         0.758         52
CV 4          0.821         0.637         0.744         52
CV 5          0.815         0.668         0.753         47
Average       0.803         0.672         0.749
I. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 4


           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.796         0.678         0.748         4
CV 2          0.805         0.689         0.757         3
CV 3          0.795         0.685         0.749         1
CV 4          0.806         0.613         0.724         0
CV 5          0.782         0.702         0.748         1
Average       0.797         0.673         0.745
A. Annotation scheme: {AI, nC, nH} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.801         0.625         0.729         5
CV 2          0.817         0.694         0.766         4
CV 3          0.822         0.661         0.755         2
CV 4          0.841         0.631         0.753         1
CV 5          0.784         0.664         0.734         3
Average       0.813         0.655         0.747
B. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 2
Continued

Table 46: Performance parameters for 5-fold cross-validation of the benchmark dataset for Ames mutagenicity using weighted kNN models with 7 nearest neighbors (9 sub-tables: 3 annotation schemes × 3 fragment lengths)


Table 46 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.789         0.637         0.727
CV 2          0.815         0.704         0.768         5
CV 3          0.814         0.678         0.757         2
CV 4          0.835         0.639         0.753         1
CV 5          0.808         0.681         0.755         3
Average       0.812         0.668         0.752
C. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 2

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.807         0.624         0.733         10
CV 2          0.809         0.674         0.752         8
CV 3          0.840         0.659         0.764         6
CV 4          0.837         0.623         0.747         2
CV 5          0.830         0.637         0.749         6
Average       0.825         0.643         0.749
D. Annotation scheme: {AI, nC, nH} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.810         0.649         0.745         13
CV 2          0.817         0.707         0.771         11
CV 3          0.811         0.663         0.749         8
CV 4          0.813         0.637         0.740         8
CV 5          0.804         0.668         0.747         10
Average       0.811         0.665         0.750
E. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 3
Continued

Table 46 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.802         0.640         0.736         17
CV 2          0.800         0.727         0.769         15
CV 3          0.814         0.676         0.756         11
CV 4          0.824         0.645         0.749         14
CV 5          0.797         0.680         0.748         15
Average       0.807         0.674         0.752
F. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 3

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.813         0.620         0.736         37
CV 2          0.815         0.686         0.760         22
CV 3          0.844         0.642         0.758         29
CV 4          0.810         0.627         0.734         29
CV 5          0.819         0.639         0.743         28
Average       0.820         0.643         0.746
G. Annotation scheme: {AI, nC, nH} and Fragment length: 4

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.798         0.627         0.730         50
CV 2          0.792         0.702         0.754         39
CV 3          0.810         0.671         0.751         41
CV 4          0.797         0.624         0.725         40
CV 5          0.802         0.684         0.753         41
Average       0.800         0.662         0.743
H. Annotation scheme: {AI, nC, nH, PC} and Fragment length: 4
Continued


Table 46 continued

           Sensitivity   Specificity   Concordance   Not predicted (NP)
CV 1          0.803         0.636         0.737         72
CV 2          0.784         0.708         0.752         56
CV 3          0.821         0.677         0.760         62
CV 4          0.815         0.623         0.736         60
CV 5          0.807         0.650         0.742         53
Average       0.806         0.659         0.745
I. Annotation scheme: {AI, nC, nH, PC, RA} and Fragment length: 4
