ENCODE: Understanding the Genome Michael Snyder May 8, 2013


Conflicts: Personalis, Genapsys, Illumina
Slides from Ewan Birney, Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)
• NHGRI-funded consortium
• Goal: delineate all “functional” elements in the human genome
• Wide array of experimental assays
• Three phases: 1) Pilot 2) Scale Up 1.0 3) Scale Up 2.0
The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 2012
Project website: http://encodeproject.org

The ENCODE Consortium Phase 2
Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt
... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

The ENCODE Consortium Phase 3
Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
Jim Kent (David Haussler, Kate Rosenbloom)
Mike Cherry
John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
Rick Myers (Barbara Wold)
Scott Tenenbaum (Luiz Penalva)
Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
Zhiping Weng (Nathan Trinklein, Rick Myers)
Brenton Graveley (John Rinn, others)
... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
ChIP-seq (180 TFs + histone marks; 1770 data sets)
RNA-seq (418)
DNase-seq (318)

RNA-Sequencing
Wang et al. 2009, Nat. Rev. Genet.

Functional data: ChIP-seq
[Diagram labels: Antibody; Transcription Factor; Immunoprecipitation; Sequence and align; ChIP-seq peak (300-500 bp); Motif (8-12 bp); ChIP-exo; Histone Marks]

Functional data: DNase-seq
[Diagram labels: DNaseI hypersensitivity; DNaseI; Transcription Factor; Histone; Region of open chromatin; Sequence and align; peak]

Functional data: DNase footprints
[Diagram labels: DNaseI; Transcription Factor; Histone; Region of open chromatin; Sequence and align; Footprint]

[Figure, panels a-e. Recoverable labels: Phenotype-associated SNPs; Random sampling of matched SNPs; Genotyped SNPs; Genomes; Personal genomes; Fraction of SNPs that overlap feature; Fold enrichment or depletion; DNase I peaks, TF, TSS, CTCF; GWAS enrichment (log p-value); GO immune response genes above threshold. Panels d-e are genome browser views (Human, GRCh/hg assembly, position 40000000) showing the genes C9, PTGER4, TTC33, DAB2 and PRKAA1, the GWAS Catalog track with SNPs labelled Crohn's disease, ulcerative colitis and multiple sclerosis, and HUVEC GATA, HUVEC cFOS and HUVEC Input tracks.]

Examples of Signal Tracks (Ross Hardison, Belinda Giardine)
TF tracks: K562MaxV0416102, Hepg2Foxa2, Hepg2P300V0416101, Hepg2Foxa1c20, HuvecCtcf, Helas3Ctcf, Hepg2Jund
DNase tracks: CACO2.DS8235, HUVEC.all, HepG2.all, Jurkat.DS12659, hTH1.all, hTH2.DS7842, CD34.DS16814

[Table: GWAS phenotypes versus TF occupancy. The per-factor counts are interleaved in the extraction and are not reproduced; the first two columns are listed below.]
Per-factor columns: Gm12878Ebf, Gm12878Pol24h8, Gm12878Pol2, Helas3CebpbIggrab, K562Ctcf, Gm12878Pu1, Gm12878Mef2a, K562Pol2V0416101, Gm12878Pax5c20, Gm12878NfkbIggrab, Hepg2Ctcf, HuvecGata2Ucd, Gm12878Elf1sc631V0416101, Gm12878Egr1V0416101, Hepg2Mafkab50322Iggrab, HuvecCfosUcd, Gm12878Bcl, Gm12878Irf4, Gm12878Batf, Gm12878, Hepg2Fosl2, K562

Phenotype                                   SNP-Pheno associations   Overlap any TF occupancy
TOTAL                                       4860                     600
Height                                      204                      34
Systemic_lupus_erythematosus                62                       10
Crohn's_disease                             105                      20
Ulcerative_colitis                          85                       11
Multiple_sclerosis                          71                       15
Rheumatoid_arthritis                        57                       11
LDL_cholesterol                             45                       8
Bone_mineral_density                        65                       9
Coronary_heart_disease                      107                      17
Chronic_lymphocytic_leukemia                17                       8
Prostate_cancer                             56                       8
Triglycerides                               48                       10
Celiac_disease                              54                       11
Colorectal_cancer                           18                       5
Hematological_parameters                    85                       12
HIV-1_control                               55                       10
Protein_quantitative_trait_loci             48                       7
Alzheimer's_disease                         42                       5
HDL_cholesterol                             55                       8
Cholesterol                                 16                       6
Longevity                                   30                       5
Attention_deficit_hyperactivity_disorder    102                      9
Cognitive_performance                       111                      8
Type_2_diabetes                             97                       13
Conduct_disorder                            38                       5
Type_1_diabetes                             67                       7
Dialysis-related_mortality                  26                       6
Bipolar_disorder                            110                      6
Body_mass                                   98                       5
C-reactive_protein                          34                       7
Menarche_and_menopause                      62                       6
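
The slides tabulate, per phenotype, how many SNP-phenotype associations overlap any TF occupancy. A minimal sketch of how such an overlap count could be computed is given below; it is not the consortium's pipeline. It assumes peaks are half-open (start, end) intervals grouped by chromosome, and the function names and toy coordinates are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval when the new one touches or overlaps it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlapping_snps(snps, peaks_by_chrom):
    """Count SNPs, given as (chrom, pos), that fall inside any peak."""
    merged = {c: merge_intervals(p) for c, p in peaks_by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}
    hits = 0
    for chrom, pos in snps:
        intervals = merged.get(chrom)
        if not intervals:
            continue
        # Index of the last merged interval whose start is <= pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits


# Hypothetical toy data, not taken from the slides.
peaks = {"chr5": [(100, 400), (350, 600), (1000, 1300)]}
snps = [("chr5", 120), ("chr5", 599), ("chr5", 600), ("chr5", 1200), ("chr9", 50)]
n_hits = count_overlapping_snps(snps, peaks)
print(n_hits, "of", len(snps), "SNPs overlap a peak")
```

Merging the peaks first means a single binary search per SNP is enough, since the merged intervals are disjoint and sorted. Comparing the resulting count against counts for randomly sampled matched SNPs would give a fold enrichment of the kind the figure labels mention.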