Prednisolone-Induced Differential Gene Expression in Mouse Liver
Total Page:16
File Type:pdf, Size:1020Kb
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/106938 Please be advised that this information was generated on 2021-10-05 and may be subject to change. Text mining and information extraction for the life sciences: an enhanced science approach Proefschrift ter verkrijging van de graad van doctor aan de Radboud Universiteit Nijmegen op gezag van de rector magnificus prof. mr. S.C.J.J. Kortmann, volgens besluit van het college van decanen in het openbaar te verdedigen op donderdag 16 mei 2013 om 10.30 uur precies door Wilco Wilhelmus Maria Fleuren geboren op 8 april 1978 te Nijmegen Promotor Prof. dr. J. de Vlieg Copromotor Dr. W.B.L. Alkema (NIZO food research BV) Manuscript-commissie Prof. dr. S.S. Wijmenga Prof. dr. H.G. Brunner Prof. dr. A.J. van Gool This work was performed within the NBIC BioRange project “A Systems Bioinformatics Approach For Evaluating And Translating Drug-Target Effects In Disease Related Pathways” (BR4.2) and continued within the Netherlands eScience project “Creation of food specific ontologies for food focused text mining and knowledge management studies” (027.011.306). Title: Text mining and information extraction for the life sciences: an enhanced science approach. Copyright © 2013 Wilco Fleuren, Nijmegen, The Netherlands ISBN/EAN: 978-90-9027438-6 Cover design: Walter van Rooij, Buro Brandstof Printed by: CPI Wöhrmann Print Service, Zutphen "Vorsprung durch Technik", Audi AG TABLE OF CONTENTS Chapter 1 General introduction 6 Chapter 2 CoPub update: CoPub 5.0 a text mining system to answer 22 biological questions Chapter 3 Identification of new biomarker candidates for glucocorticoid 34 induced insulin resistance using literature mining Chapter 4 Prednisolone-induced changes in gene-expression 56 profiles in healthy volunteers Chapter 5 Prednisolone induces the Wnt signaling pathway in 3T3-L1 80 adipocytes Chapter 6 Prednisolone-induced differential gene expression in mouse liver 106 carrying wild type or a dimerization-defective glucocorticoid receptor Chapter 7 Identification of T lymphocyte signal transduction pathways 134 Chapter 8 Functional annotation of microorganisms using text mining 166 Chapter 9 General discussion 190 Summary 200 Samenvatting 206 Dankwoord 213 Curriculum vitae 217 Bibiliography 219 CHAPTER 1 General introduction General introduction 9 General introduction Products for human consumption and intake need to be safe and should not be harmful. Hence the pharmaceutical and food industry spent millions of dollars on research and 1 Chapter development (R&D) to develop better and safer products. The safety of drugs and food products is tested in experiments before they are marketed. These experiments result in enormous amounts of biological data that need to be analyzed in order to draw conclusions, to initiate follow-up studies, and to generate new hypothesis. This so-called data driven research is becoming more and more important in these industries and requires specialists which are able to extract useful information out of these biological data. Results from experiments are published in scientific papers and reports for which the abstracts often are deposited in the publicly available Medline database. Medline contains over 22 million of journal citations and abstracts from biomedical literature and is growing exponentially each year. Extraction of useful knowledge from these abstracts in order to annotate results from experiments requires methods which can do this extraction automatically. Text mining and Information extraction Text mining (TM) and automatic information extraction (IE) has evolved over the years into a sophisticated and specialized discipline in the bio medical sciences. The process of TM and IE involves high quality information extraction from literature and subsequently linking this information to experimental data. Currently, the most used approaches to retrieve knowledge from text are by co- occurrences and by natural language processing (NLP) [1]. Co-occurrence-based methods Co-occurrence based methods use the simple concept that two keywords that co-occur in the same body of text often are functionally related. For example the co-occurrence of retinol-binding protein 4 (RBP4) and insulin resistance suggests a functional relationship between the gene and the disease [2, 3]. However, also many keywords co-occur in text without being functionally related. Therefore these methods often use a statistically scoring based on frequency of occurrence of both keywords, to add a degree of significance to the relationship. Co-occurrence based methods do not provide information about the type of relation between keywords but are therefore able to detect any type of relationship. 10 Chapter 1 NLP-based methods NLP based methods use prior knowledge to retrieve information out of text. This prior knowledge can be for instance how language is structured or specific knowledge on how biological information is mentioned in the literature. NLP is defined as "the processing of natural language, i.e. human languages by computers". In contrast to co-occurrence based systems, NLP systems are phrase based and include information about the relationship between two keywords, e.g. gene A inhibits gene B or gene C is involved in disease Y. NLP based systems are very suitable to search for specific relationships. However these systems are limited to predefined relationships and are often very computationally intensive. Drug discovery The first part of this thesis describes the application of bioinformatics and especially IE and TM to study drug (side) effects. The development of a new drug is very complex, time-consuming and expensive. On average the development of a drug takes about twelve years and costs are in the order of 1 billion US dollars. Already in the United States only, approximately just one out of six drugs that enter the clinical testing pipeline will eventually obtain approval to be marketed [4]. Despite the increasing investment in research and development (R&D), the attrition rate of drug candidates increases. Therefore early detection of failure of drug candidates is essential to reduce the drug development costs. Traditionally, the drug development process can be divided into distinct phases starting with the target discovery phase and eventually leading to the successful market approval of a drug candidate (Figure 1). Target discovery In the first phase possible new therapeutic targets will be identified. Information about Figure 1 A schematic representation of the drug development pipeline. The process ranges from the identification of potential drug targets in the Target discovery stage to the successful introduction of the drug on the market in the Marketing & sales stage. the function and the role of these targets in the development of the disease will be gathered and subsequently used to get an overview of the pathways and genes/proteins that are involved. This step is very important since it gives insight into the biological context of the target and its relation to the disease. Typically in this phase information General introduction 11 about the target will be retrieved from earlier studies and from the scientific literature. in vitro in vivo Promising targets will be validated in both and experiments in which the therapeutic effect and the relation of the target to the disease will be further investigated. Chapter 1 Chapter Lead discovery and lead optimization Once possible targets have been detected and verified, screening of large compound libraries starts to identify so called lead compounds, which are able to modulate the biological activity of targets. Subsequently in the lead optimization phase these identified compounds will be chemically modified to improve their potency, selectivity and efficacy. The efficacy of compounds will be determined in genome-wide association studies using microarray technology, in which changes in gene expression are measured as a result of compound treatment. These expression profiles can be compared with expression profiles from other compounds and can also be used to get an insight into the mechanisms and pathways affected. In the target discovery, lead discovery and lead optimization phases is it important to gather as much as possible information about the target, its biologcal surrounding and the compound to predict (side) effects. TM and IE are important techniques to retrieve this information from the scientific literature. Pre-clinical development Compounds which showed a promising effect on the biological target will move to the pre-clinical development phase were they will be further tested for efficacy and safety. Compounds are tested on toxicity in animal models. Furthermore in pharmacokinetics experiments the compounds are tested for their absorption, distribution, metabolism, and excretion (ADME) properties in the body. The pre-clinical development phase is important because the ultimate safety profile of compounds will determined. Based on this safety profile decisions will be made to move the compounds to the next phase in which they will be tested in humans. Approximately two third of the compounds fail in the pre-clinical development phase because of poor toxicity and/or pharmacokinetics properties. Clinical trials In the last phase compounds also called drug candidates will be tested in humans in so called clinical trials. Clinical