© 2016 Marlene T. Kim ALL RIGHTS RESERVED

© 2016 Marlene T. Kim ALL RIGHTS RESERVED DEVELOPING ADVANCED RULES AND TOOLS TO IMPROVE IN VIVO QSAR MODELS by MARLENE T. KIM A dissertation submitted to the Graduate School-Camden Rutgers, the State University of New Jersey In partial fulfillment of the requirements For the degree of Doctor of Philosophy Graduate Program in Computational and Integrative Biology Written under the direction of Hao Zhu And approved by ________________________________ Hao Zhu ________________________________ Menghang Xia ________________________________ Jongmin Nam ________________________________ Desmond Lun Camden, New Jersey May 2016 ABSTRACT OF THE DISSERTATION Developing Advanced Rules and Tools to Improve In Vivo QSAR Models by MARLENE T. KIM Dissertation Director: Hao Zhu Together oral bioavailability and hepatotoxicity determine the fate and failure of a new drug in clinical trials. A promising drug candidate that has little to no oral bioavailability could be considered an ineffective treatment. If the drug does have high oral bioavailability, yet exhibits severe hepatotoxicity, the drug will be withdrawn from clinical trials. Thus, oral bioavailability and hepatotoxicity account for a substantial number of drugs eliminated from therapies and withdrawn from the market. Not only is this a severe financial loss for a pharmaceutical company; it is also a loss for patients who will no longer be able to benefit from the therapeutic effects. The most common approach to testing oral bioavailability and hepatotoxicity before clinical trials is through animal testing. The information learned from animal testing is highly valuable, but is expensive, time consuming, and has low throughput. Cell based assays developed to study specific biochemical mechanisms and toxicity are heavily used as an alternative; however, correlations between complex in vitro and in vivo endpoints are not very clear. Another alternative is Quantitative Structure Activity Relationship (QSAR) approach. A QSAR model could evaluate millions of chemicals without requiring them to be synthesized, which saves money, time, and has high throughput. Many predictive QSAR models have been developed since the 1960’s, yet predictive in vivo QSAR models are ii difficult to build and rare to find. Developing new methods to integrate mechanistic information and improve in vitro-in vivo correlation(s) for QSAR modeling purposes have been shown to be beneficial and are the focus of this thesis. The complex in vivo endpoints oral bioavailability and hepatotoxicity were used as example endpoints to model. Methods to curate biological information and in vitro data, specifically high- throughput screening data, and incorporate it into the QSAR models are discussed in great detail. The performance of the resulting models, as well as their shortcomings is also discussed. Overall, incorporating biological information into the QSAR modeling workflow greatly improved predictions from both the oral bioavailability and hepatotoxicity models. The new techniques can be adapted to model other complex biological endpoints. iii ACKNOWLEDGEMENTS To my advisor and friend, Hao Zhu, thank you for everything. I have learned so much over the years about research and life. Without you, I would not be where I am today. To my committee members, Jongmin Nam and Menghang Xia, thank you for your generous support and guidance. To my friend and mentor, Alexander Sedykh, thank you so much to for all your help, inspiration, and guidance. The magnitude of your influence is greater than you can imagine. I cannot thank you enough. To the CCIB systems administrator, Kevin Abbey, without you none of this would be possible. All your help was greatly appreciated. Thank you. To my lab members and life-long friends, Abena Boison, Dan Russo, Dan Pinolini, and Wenyi Wang, thank you for making graduate school an enjoyable experience. Your expertise was greatly valued and always improved my work. Together, we accomplished great projects that would not have been realized had we did it ourselves. I will miss spending time with you all. iv DEDICATION To my dad, Paul Kim, you are my hero. You have encouraged me to pursue my passion when know else believed I could. To my mom, Anh Kim, your resilience taught me to never give up and give my all; there is always a solution, no excuses. To my family, our family is huge. Therefore, I cannot acknowledge you all, but thank you for your unconditional love. You all have taught me the importance of sharing, giving, and cooperating; especially when I did not want to. These skills are essential, and I attribute them to my success in graduate school. To my dogs Lilo and Stitch, you are my inspiration to work. Each day, my love for the both of you grows and increases my passion to advocate for alternatives to animal testing. To my best friends, Hetal Vasavada and Sojaita “Jenny” Mears, I would not be here without either of you, literally. Hetal, you are the nicest person in the world. Thank you for being my conscience and showing me there is a huge, loving world outside of Camden. By the way, thank you for nagging me to go to grad school. Jenny, you are my “big sister.” You devote your free time to community members and bear the burden of others, when you do not have to. Your example encourages me to do the same for others, especially when I am exhausted. To my high school science teachers, Mr. Grous, Ms. Cook, and Mr. Snyder, thank you for making me fall in love with chemistry. You all said a career in the sciences is not easy, the road will be tough, and despite that, everything will turn out fine. And it did! Mr. Snyder, thank you for teaching me that, “Perfect practice makes perfect.” v TABLE OF CONTENTS Chapter 1. Introduction 1 Chapter 2. Curating and Preprocessing High Throughput Screening Data for Quantitative Structure Activity Relationship Modeling 4 Chapter Overview 4 Introduction 4 Methods 6 Materials 6 Procedure 6 Summary 19 Chapter 3. Critical Evaluation of Human Oral Bioavailability for Pharmaceutical Drugs by Using Various Cheminformatics Approaches 20 Chapter Overview 20 Introduction 21 Methods 26 Human Oral Bioavailability Dataset 26 Chemical Descriptors 28 Modeling Approaches 29 Random Forest (RF) 29 Support Vector Machine (SVM) 30 k-Nearest Neighbor (kNN) 30 CASE Ultra 31 Combinatorial QSAR Modeling Workflow 31 vi Universal Statistical Figures of Merit for All Models 33 Integrating Human Intestinal Transporters Interactions of Compounds Into Oral Bioavailability Predictions 34 Results 34 Overview of the Dataset 34 Category Models 37 Continuous Models 40 Integrating Human Intestinal Transporter Parameters into the CNT-%F Bioavailability Model 43 Discussion 50 Interpretation of QSAR Models 50 Conclusion 55 Chapter 4. Mechanism Profiling of Hepatotoxicity Caused by Oxidative Stress Using the Antioxidant Response Element Reporter Gene Assay Models and Big Data 57 Chapter Overview 57 Introduction 59 Methods 61 qHTS ARE-bla Dataset 61 In Vivo Hepatotoxicity Dataset 63 Chemical Structure Curation 63 Measures of Quality and Reliability 64 Workflow for Profiling the Mechanisms of Liver Toxicants 65 vii Automated biological response profiling 66 QSAR Modeling of the ARE-bla Pathway 67 Chemical IVIVC Evaluation 68 Results 69 Overview of qHTS ARE-bla Dataset 69 qHTS ARE-bla Combinatorial QSAR Models 69 Liver Toxicants Profile and Its IVIVC 74 Discussion 79 Conclusions 85 Appendix 87 References 197 viii 1 Chapter 1. Introduction Animal toxicity testing is utilized in the pharmaceutical industry, academia and regulatory agencies to evaluate the potential risks and safety of chemical substances. The information learned from animal testing is highly valuable, but comes at a high cost, is time consuming, and has low throughput. Sometimes the data from animal testing could not predict compound toxicity in human due to the species difference. Cell based assays developed to study specific cellular and biochemical mechanisms are commonly used as an alternative; however, correlations between in vitro and complex in vivo endpoints, such as hepatotoxicity and bioavailability, are not very clear (Kim et al. 2014, 2015). Another alternative is Quantitative Structure Activity Relationship (QSAR) approach, which explores the relationships between the biological activity of a chemical and its sub- structural features and/or physicochemical properties. The unique advantage of using a QSAR model, unlike experimental models which require all chemicals to be evaluated to also be synthesized, a chemical could be evaluated without being synthesized. QSAR saves time, money, and resources. Thus many predictive QSAR models have been developed since its inception in the 1960’s; however, not many scientists have been able to develop predictive in vivo QSAR models (Cherkasov et al. 2014; Ekins 2014). In the 1960’s, Corwin Hansch evaluated two experimentally based physicochemical properties such as Hammett constant and water partition coefficient and found that these parameters were highly correlated with biological activity in plants (Hansch et al. 1963). His studies, as well many other early QSAR studies, analyzed how the change of functional groups and atoms affect the physicochemical properties and biological activity of a chemical (Fujita et al. 1964; Klopman 1984). Hansch later 2 discovered that these parameters had general applicability in insects, rats, and bacteria as well (Hansch and Fujita 1964). These studies suggested that using experimental results as descriptors (quantitative variables based on chemical

Load more