Stochastic Modeling of RNA Polymerase Predicts Transcription Factor Activity

Stochastic modeling of RNA polymerase predicts transcription factor activity
by Joseph Gaspare Azofeifa
B.A., Vassar College

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science
2017

This thesis entitled: Stochastic modeling of RNA polymerase predicts transcription factor activity, written by Joseph Gaspare Azofeifa, has been approved for the Department of Computer Science.
Prof. Robin Dowell
Prof. Aaron Clauset
Prof. Elizabeth Bradley
Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Azofeifa, Joseph Gaspare (Ph.D., Computer Science)
Stochastic modeling of RNA polymerase predicts transcription factor activity
Thesis directed by Prof. Robin Dowell

Seventy-six percent of disease-associated variants occur in non-genic sites of open chromatin, suggesting that the regulation of gene expression plays a crucial role in human health. Nucleosome-free with flanking chromatin modifications, these regulatory loci are optimal platforms for transcription binding and, in fact, recruit RNA Polymerase. The subsequent transcription of these sites is an unintuitive discovery, as these regulatory loci do not harbor an open reading frame. The role these enhancer RNAs (eRNA) play in downstream gene regulation remains an open and exciting question. However, fast RNA degradation rates challenge eRNA identification, requiring non-traditional sequencing technologies. Global Run-on followed by sequencing (GRO-seq) detects non-genic transcription and thus, in theory, eRNA presence.
Yet GRO-seq is not without noise and bias; predictive modeling of both the sequencing error and the stochastic nature of RNA polymerase itself is required to discover enhancer RNA transcripts. In short, this thesis asks: what regulates eRNA transcription? To answer this question, I first develop two novel probabilistic models to unbiasedly determine eRNA location. A regression method was constructed to quickly identify all transcribed regions in GRO-seq. Based on the known enzymatic stages of RNA polymerase, a subsequent latent variable model was built to infer the precise location of eRNA initiation. With the relevant technology developed, I undertake a massive data integration project and show strong contextual relationships between TF-binding events, epigenetics and eRNA transcription. I conclude by showing that enhancer RNAs can unbiasedly quantify transcription factor activity and predict cell type.

Dedication

To my family.

Acknowledgements

First, I would like to thank my PhD committee, Dr. Aaron Clauset, Dr. Elizabeth Bradley, Dr. Michael Mozer, Dr. Katerina Kechris and my thesis advisor, Dr. Robin Dowell. Apart from shaping many of the ideas we ultimately published, Robin also trusted my work ethic and independence, which gave me the freedom I needed to thrive in graduate school. I would like to thank Dr. Manuel Lladser for some of the preliminary work on the mixture model and also for giving me the confidence to pursue more rigorous mathematics. Dr. Tim Read's endless phone calls helped me deeply in developing the MD-Score and correctly framing the story of eRNAs. I am very indebted to Josephina Hendrix, who performed all the short read alignments (>700 datasets) for the MD-score publication. Finally, I would like to formally thank Dr. Mary Allen. Mary and I sat next to each other for 5 years and I can say, with little doubt, that I learned all my molecular biology through her.
Apart from acknowledgments related directly to my thesis, I would like to thank the IQ Biology program and the BioFrontiers Institute, not only for funding support but also for providing a PhD training program that really fostered an atmosphere of creativity, collaboration and interdisciplinary research. Specifically, I would like to thank Dr. Thomas Cech, Dr. Jana Watson-Capps, Amber McDonell, Kim Little, Kim Kelly and, most of all, Dr. Andrea Stith. I would like to thank Sam Way and Ryan Langendorf, whom I feel honored to call friends; I hope to continue our lively scientific debates in the future. Finally, I would like to thank my family, Andie and Katherine Azofeifa, without whose support none of this would have been possible. I am not sure my mother realized she'd raise two scientists, but I am extremely grateful for such a strong and loving family.

Contents

Chapter
1 Introduction
  1.1 Biological Setup
  1.2 Regulatory Element Identification
    1.2.1 ChIP-seq
    1.2.2 Chromatin State and Epigenetics
    1.2.3 Regulatory Element Identification
    1.2.4 GRO-seq and eRNAs
  1.3 Thesis Outline
    1.3.1 Overview
    1.3.2 Chapter 2: Transcribed Region Annotation
    1.3.3 Chapter 3: Stochastic models of RNA Polymerase
    1.3.4 Chapter 4: eRNA Profiles Predict Transcription Factor Activity
2 Transcribed Region Annotation
  2.1 Introduction
  2.2 Nascent Transcript Model
    2.2.1 Description
    2.2.2 Parameter Estimation
    2.2.3 Software Design
  2.3 Model Accuracy
    2.3.1 Datasets
    2.3.2 Sensitivity to depth of data
    2.3.3 Benchmarking FStitch & Vespucci
  2.4 Biological Analysis
    2.4.1 Annotation Comparisons
    2.4.2 Characterizing bidirectional RNA Activity
    2.4.3 Differential transcription at annotated genes: a comparison of FStitch to Allen et al.
    2.4.4 Differential transcription using all FStitch active calls
  2.5 Conclusions
3 Stochastic models of RNA Polymerase
  3.1 Introduction
  3.2 Modeling RNA Polymerase
    3.2.1 Double Geometric Distribution
    3.2.2 Exponentially Modified Gaussian
    3.2.3 Poisson Point Process
    3.2.4 Mixture Models
    3.2.5 Bayesian Extensions
  3.3 Applications to GRO-seq
    3.3.1 Numerical confirmation of model inference by simulation
    3.3.2 Predicting enzymatic changes of RNAP following Experimental Perturbation
    3.3.3 RNAP model accurately predicts marks of regulatory elements
    3.3.4 Three dimensionally paired loci display centrality and associativity based on bidirectional transcription
4 eRNA Profiles Predict Transcription Factor Activity
  4.1 Introduction
  4.2 Enhancer RNAs originate from transcription factor binding sites
  4.3 Enhancer RNA origins mark sites of regulatory TF binding
  4.4 eRNA origins co-localize with TF-binding motifs
  4.5 Motif displacement scores quantify TF activity
  4.6 MD-scores predict TF activity across cell types
  4.7 Conclusion
5 Looking Forward
  5.1 Mixture Models
    5.1.1 Model Selection
    5.1.2 Integration of other Data Types
  5.2 TF Activity Inference Models
  5.3 Predicting Enhancer to Gene Interactions
    5.3.1 Network Structure Prediction
    5.3.2 Correlation Networks
    5.3.3 Bayesian Networks
    5.3.4 A 3D Genome
  5.4 Thesis Conclusions

Bibliography

Appendix
A Supplementary Material to Chapter 2
B Supplementary Material to Chapter 3
  B.1 Seeding the EM
  B.2 Datasets
  B.3 CTCF ChIA-PET network construction
  B.4 Software Package: Tfit
  B.5 Numerical confirmation of model inference by simulation
C Supplementary Material to Chapter 4
  C.1 eRNA origins
  C.2 Genomic Feature Data Integration
  C.3 Nascent transcription data processing
  C.4 Tfit parameters and bidirectional prediction
    C.4.1 Template Matching
    C.4.2 EM Algorithm and Bidirectional Origin estimation
    C.4.3 Footprint Estimation
  C.5 Computation of Bimodality, ∆BIC
  C.6 Motif Curation and Motif Scanning
  C.7 MD-score Hypothesis Testing
    C.7.1 The Motif Displacement score
    C.7.2 MD-score significance under stationary model
    C.7.3 MD-score significance under a non-stationary background model
    C.7.4 MD-score significance between experiments
  C.8 Cell type and TF enrichment analysis
  C.9 Associated File Types
  C.10 IPython Notebook

Chapter 1

Introduction

1.1 Biological Setup

A central goal in genetics is to understand how genotype (the unique ordering of DNA) translates to phenotype (observable qualities like height or eye color). Although some phenotypic traits are innocuous, one's genotype may influence cancer susceptibility, a predisposition to alcoholism or cognitive disabilities [47, 21, 62]. For advancements in human medicine, as well as a fundamental understanding of biology, genetics remains an exciting and active area of research.

A long way from Mendel's pea plants, whole genome sequencing makes possible the complete identification of an organism's genotype. Resolving the human genome's nearly 3.2 billion nucleotides, we now know that alterations in the gene sequence of p53, kvlqt1 and adam19 correlate with incidences of cancer, Type II diabetes and heart disease, respectively [14, 173, 183]. Although genome-wide association studies (GWAS) [25] successfully link genotypic variants to phenotype, they require hundreds or even thousands of genomic samples to achieve significant correlations [88]. Furthermore, GWAS is unable to predict the phenotypic consequences of a novel genetic variant that is not yet associated with a specific phenotype.

In contrast, the study of gene expression (the biochemical or molecular process by which a genotype renders a phenotype) promises to uncover why certain genotypes result in specific phenotypes. To summarize briefly, gene expression begins with the enzyme.