A Semantically Enabled Framework for Small Molecule Metabolic Fate Prediction: the Web As a Biochemical Reactor
Total Page:16
File Type:pdf, Size:1020Kb
A Semantically Enabled Framework for Small Molecule Metabolic Fate Prediction: the Web as a Biochemical Reactor by Leonid Chepelev A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology Carleton University Ottawa, Ontario © 2011, Leonid Chepelev Library and Archives Bibliotheque et 1*1 Canada Archives Canada Published Heritage Direction du Branch Patrimoine de I'edition 395 Wellington Street 395, rue Wellington OttawaONK1A0N4 OttawaONK1A0N4 Canada Canada Your file Votre reference ISBN: 978-0-494-83248-6 Our file Notre r6f6rence ISBN: 978-0-494-83248-6 NOTICE: AVIS: The author has granted a non L'auteur a accorde une licence non exclusive exclusive license allowing Library and permettant a la Bibliotheque et Archives Archives Canada to reproduce, Canada de reproduire, publier, archiver, publish, archive, preserve, conserve, sauvegarder, conserver, transmettre au public communicate to the public by par telecommunication ou par I'lnternet, preter, telecommunication or on the Internet, distribuer et vendre des theses partout dans le loan, distribute and sell theses monde, a des fins commerciales ou autres, sur worldwide, for commercial or non support microforme, papier, electronique et/ou commercial purposes, in microform, autres formats. paper, electronic and/or any other formats. The author retains copyright L'auteur conserve la propriete du droit d'auteur ownership and moral rights in this et des droits moraux qui protege cette these. Ni thesis. Neither the thesis nor la these ni des extraits substantiels de celle-ci substantial extracts from it may be ne doivent etre imprimes ou autrement printed or otherwise reproduced reproduits sans son autorisation. without the author's permission. In compliance with the Canadian Conformement a la loi canadienne sur la Privacy Act some supporting forms protection de la vie privee, quelques may have been removed from this formulaires secondaires ont ete enleves de thesis. cette these. While these forms may be included Bien que ces formulaires aient inclus dans in the document page count, their la pagination, il n'y aura aucun contenu removal does not represent any loss manquant. of content from the thesis. 1*1 Canada Abstract Prediction and dynamical characterization of metabolic fate of small molecules is essential to understanding chemical toxicity, activity, and mechanisms of bioaccumulation, degradation, and industrial "green" synthesis. Unfortunately, experimental methods for metabolic fate characterization are prone to missing transient metabolites, experience difficulty in adjusting to variations in metabolic networks, and are expensive in the face of the enormous chemical space that requires characterization. The quantitative structure-activity relationship-based methods employed currently to predict metabolic fate are insufficient since they often ignore the dynamics of metabolism or are limited in scope and applicability. First principle-based approaches currently require expensive manual coordination of tools and formalisms to develop a cohesive dynamical model. An integrative framework to support these efforts is vital for metabolomics research to progress efficiently while taking full advantage of the expanding biochemical information and improving computational resources. In this work, I investigate the feasibility of constructing such a framework to integrate the computational and informational resources required for automated metabolic fate prediction of small molecules. To this end, I demonstrate a framework, based on semantic web technologies, which attempts to connect the highly fragmented world of chemical information at the levels of data representation and computational resource interoperability. This modular prototype framework is reconfigurable and adjustable for future advancements in computational chemistry and improvements in computational power. Within this framework, enzymes are distributed as semantically annotated web ii services, each with logically defined molecular input and output classes. A machine client that is capable of automatically classifying small molecules strings together biochemical pathways by calling upon the appropriate distributed enzyme services and obtaining the kinetic parameters and reaction specifications for each reaction. Kinetic reaction analysis is then carried out on the resultant pathway and the toxicity of each chemical entity is assessed using semantically encoded toxicity decision trees. Although the framework developed here does not aim to address all aspects of metabolism prediction, or to cover any appreciable portion of all human enzymes, it demonstrates the feasibility of transforming the Web into a unified, reconfigurable, distributed biochemical reactor and can open new possibilities for interdisciplinary research. m Acknowledgements First and foremost, I would like to thank Dr. Michel Dumontier for creating a supportive, opportunity-rich environment for my research and for allowing me considerable freedom in carrying out this investigation. Truly, I shall always kindly remember the years spent on my doctoral thesis as a member of Dr. Dumontier's laboratory and working in close collaboration with Michel. I would like to thank Dr. Alain St-Amant and Dr. Maria De Rosa, the committee members who have assessed my progress and helped with constructive feedback. I would also like to thank my family for unwavering support and understanding. I would like to thank the faculty of Carleton University for providing considerable support, and the various donors of the scholarships and awards I have received throughout my studies. I am not certain whether this work would have been possible without this help. I would also like to thank the National Sciences and Engineering Research Council for their extensive funding through the Canada Graduate Scholarships program and the Michael Smith foreign studies award that has allowed me to work with some of the leaders in my field at the European Bioinformatics Institute in Cambridge, United Kingdom. I am also thankful to my lab mates for all the fun times and conversations we have had together. In particular, I would like to thank Dana Klassen for his collaboration on decision trees. Finally, I would like to thank the group of Dr. Mark Wilkinson at the University of British Columbia and the group of Dr. Christopher Baker at the University of New Brunswick for their work on some of the Semantic Web technologies which form the foundation of some parts of this work. iv Preface The vast majority of this work was carried out by me, and this thesis was written solely by me with critical input from Dr. Michel Dumontier. I would like to thank Dr. Michel Dumontier for the supervision of my work as well as valuable and insightful comments. The majority of this thesis is a result of my original contributions (which I assess at about 90%) and Dr. Dumontier's contributions in the form of suggestions and critical assessments (contributing to 10% of all work). Parts of this thesis, however, are a reflection of collaborative studies. Thus, I would like to thank Dana Klassen for his participation in the semantic decision tree project and for his work on the OWL encoding on decision trees, all reported in Section 3.3.6. For this session, I approximate that Dana Klassen has carried out 30% of the work, with me carrying out 60% and Dr. Dumontier the remaining 10%. I would also like to thank Dr. Christopher Baker's group for their collaboration in demonstrating the formalization of chemical annotation and classification with SADI services, discussed in Section 4.3.7. Finally, I would like to thank the curators at the European Biochemical Institute, and in particular, Marcus Ennis, for some of their assessment of the chemical classification outcomes for the structural hierarchy construction reported in Chapter 3. v Table of Contents Abstract .. ii Acknowledgements iv List of Tables xi List of Figures xii List of Appendices xv List of Abbreviations xvi 1 Chapter: Introduction 1 1.1 General Introduction 2 1.2 Xenobiotic Metabolism 7 1.2.1 Phase I Xenobiotic Metabolism 8 1.2.2 Phase II of Xenobiotic Metabolism 12 1.2.3 Non-Classical Transformations 14 1.3 Representation and Cataloguing of Metabolic Information 15 1.3.1 Pathway Map Databases 19 1.3.2 Enzyme Kinetics Databases 21 1.4 Metabolic Fate Prediction 25 1.4.1 Graph Theory and Expert-Based Approaches 27 1.4.2 Statistical Learning and Quantitative Structure-Activity Relationships 29 1.4.3 First Principles-Based Approaches 31 1.4.4 Commercial Packages for Metabolic Fate Prediction 35 1.5 Semantic Web Technologies 37 1.6 Semantic Web for Chemical Knowledge Representation 43 1.7 An Integrated Metabolic Fate Prediction Framework 50 vi 2 Chapter: Chemical Entity Semantic Specification... 53 2.1 Abstract............. 54 2.2 Introduction 56 2.2.1 Consensus Chemical Entity Identifiers 56 2.2.2 Common Chemical Information Representation Platform 58 2.2.3 Chemical Knowledge Integration 61 2.2.4 Overview. 63 2.3 Results and Discussion 64 2.3.1 CHESS Representation Overview 64 2.3.2 Molecular Specification 66 2.3.3 Semantic Web-Enabled Cheminformatics: Chemical Searching 70 2.3.4 Chemical Descriptor Specification... 77 2.3.5 Chemical Information Integration 82 2.3.6 Variable-Level Granularity Semantically Enriched Annotations and Queries.... 84 2.3.7 Representing and Querying Reactive Transformations 89 2.3.8 Supporting Tools and Interfaces 94 2.4 Conclusions