Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Disease Therapies
Total Page:16
File Type:pdf, Size:1020Kb
Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Disease Therapies Dissertation zur Erlangung des akademischen Grades doctor rerum naturalium (Dr. rer. nat.) vorgelegt dem Rat der Biologisch-Pharmazeutischen Fakultät der Friedrich-Schiller-Universität Jena von Diplom-Bioinformatiker Michael Hecker geboren am 6. April 1982 in Erfurt Die vorgelegte Arbeit, finanziert durch das Bundesministerium für Bildung und Forschung (Grant 0313692D), wurde am Leibniz-Institut für Naturstoff-Forschung und Infektionsbiologie e.V. - Hans-Knöll-Institut (HKI) unter der Leitung von PD Dr. Reinhard Guthke (Abteilung Systembiologie / Bioinformatik) im Zeitraum November 2006 bis Januar 2010 angefertigt. Table of contents I Table of contents Abbreviations.................................................................................................................III 1. Introduction......................................................................................................................1 1.1. Gene regulatory network modeling.............................................................................1 1.2. Autoimmune diseases..................................................................................................4 1.3. Objectives and experimental approach........................................................................8 2. Overview of manuscripts...............................................................................................11 3. Manuscript I....................................................................................................................16 Hecker M, Lambeck S, Töpfer S, van Someren E, Guthke R. Gene regulatory network inference: data integration in dynamic models - a review. Biosystems 2009, 96(1):86-103. 4. Manuscript II..................................................................................................................35 Hecker M, Goertsches RH, Engelmann R, Thiesen HJ, Guthke R. Integrative modeling of transcriptional regulation in response to antirheumatic therapy. BMC Bioinformatics 2009, 10:262. 5. Manuscript III.................................................................................................................54 Goertsches RH, Hecker M, Koczan D, Serrano-Fernández P, Möller S, Thiesen HJ, Zettl UK. Long-term genome-wide blood RNA expression profiles yield novel molecular response candidates for IFN-beta-1b treatment in relapsing remitting MS. Pharmacogenomics 2010, 11(2):147-161. 6. Manuscript IV.................................................................................................................70 Hecker M, Goertsches RH, Fatum C, Koczan D, Thiesen HJ, Guthke R, Zettl UK. Network analysis of transcriptional regulation in response to intramuscular interferon-beta-1a multiple sclerosis treatment. Pharmacogenomics J. submitted January 25, 2010. 7. Discussion......................................................................................................................104 Table of contents II 7.1. Discussion of main results.......................................................................................104 7.2. Discussion of methods.............................................................................................107 7.2.1. Experimental approach......................................................................................107 7.2.2. Microarray data preprocessing..........................................................................108 7.2.3. Integrative network inference............................................................................109 7.2.4. Evaluation of inference performance.................................................................111 7.3. Open issues and outlook..........................................................................................113 7.3.1. Prediction of clinical responses.........................................................................113 7.3.2. Further development of TILAR.........................................................................114 7.4. Concluding remarks.................................................................................................116 8. Summary.......................................................................................................................118 9. Zusammenfassung........................................................................................................120 References......................................................................................................................122 Appendix........................................................................................................................127 Danksagung...................................................................................................................141 Ehrenwörtliche Erklärung...........................................................................................142 Tabellarischer Lebenslauf...........................................................................................143 Abbreviations III Abbreviations ACPA anti-citrullinated protein/peptide antibody ACR American College of Rheumatology CDF chip definition file CSF cerebrospinal fluid DAS disease activity score DMARD disease-modifying antirheumatic drug DREAM dialogue on reverse-engineering assessment and methods EDSS expanded disability status scale GO Gene Ontology GRN gene regulatory network HLA human leukocyte antigen IFN-β interferon-beta IRF IFN regulatory factor LARS least angle regression Lasso least absolute shrinkage and selection operator MAID MA-plot-based signal intensity-dependent fold-change criterion MHC major histocompatibility complex MRI magnetic resonance imaging MS multiple sclerosis OLS ordinary least squares PBMC peripheral blood mononuclear cells PPI protein-protein interaction RA rheumatoid arthritis RF rheumatoid factor RNAP RNA polymerase ROC recall-precision curve RPC receiver operating characteristic TF transcription factor TFBS transcription factor binding site TILAR TFBS-integrating least angle regression TNF-α tumor necrosis factor-alpha Introduction 1 1. Introduction At the heart of multicellular life are the complex interactions between genes, proteins and metabolites. These interactions give rise to the function and behavior of biological systems. To study and understand such systems as a whole abstractions are needed such as the concept of networks, in which molecules are represented as nodes and interactions or causal influences are represented by edges. The reconstruction of biomolecular networks from experimental data and subsequent network analysis is a challenging and active field of research. The major focus of the present dissertation is on the inference of gene regulatory networks (GRNs). 1.1. Gene regulatory network modeling Gene expression is mainly regulated at the level of DNA transcription by proteins called transcription factors (TFs). These TFs specifically bind short DNA sequence motifs at the regulatory region of their target genes. In doing so, they control the recruitment of RNA polymerase, which reads the DNA and transcribes it into RNA. However, gene regulation is a far more complex multi-layered process. Any step of gene expression may be modulated, from the RNA synthesis to the post-translational modification of proteins. Many genes are (directly oder indirectly) involved in these gene regulatory mechanisms, and therefore it is reasonable to regard genes as nodes in a network of mutual regulatory interactions. The introduction of DNA microarrays in the mid-1990s offered the possibility to simultaneously measure the levels of thousands of RNA transcripts in a single sample of cells or tissues. Since then, researchers utilized the growing amount of large-scale gene expression data as input for algorithms to infer, or "reverse-engineer", the regulatory interaction structure of genes [1-3]. When inferring models of transcriptional regulation solely from gene expression data, one typically seeks for influences between RNA transcripts. In this case, the expression levels of each gene are explained by the expression levels of other genes. By construction, such GRN models do not generally describe physical interactions as transcripts rather exert their regulatory effects indirectly through the action of proteins, metabolites and effects on the cell environment (figure 1). Therefore, these models can be difficult to interpret in terms of real physical interactions, and the implicit description of hidden regulatory factors may limit the reliability of the inference results. To overcome these issues and support the network reconstruction it is necessary to integrate additional biological information. Diverse types of data (e.g. protein-protein and protein- Introduction 2 Figure 1. Gene regulation is carried out by interactions of RNA molecules, proteins and metabolites. (A) Simplified illustration of an example GRN where gene "3" encodes a membrane-bound metabolite transporter protein (green shape). The metabolite (blue triangle) that is imported by this protein binds a TF (orange shape). The activated TF binds the DNA and together with RNA polymerase (RNAP) initiates the transcription of gene "2". Hence, the expression of gene "2" is influenced by the other two gene transcripts (red lines). (B) A graph model of the network in (A). Because the model is inferred from measurements of RNA transcripts only, it implicitly captures the regulatory mechanisms at the