ENCODE: Understanding the Genome

Michael Snyder
November 6, 2012
Conflicts: Personalis, Genapsys, Illumina
Slides from Ewan Birney, Marc Schaub, Alan Boyle

Encyclopedia of DNA Elements (ENCODE)
• NHGRI-funded consortium
• Goal: delineate all functional elements in the human genome
• Wide array of experimental assays
• Three phases: 1) Pilot 2) Scale Up 1.0 3) Scale Up 2.0
The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 2012
Project website: http://encodeproject.org

The ENCODE Consortium
• Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)
• Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)
• Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)
• Jim Kent (David Haussler, Kate Rosenbloom)
• John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)
• Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)
• Rick Myers (Barbara Wold)
• Scott Tenenbaum (Luiz Penalva)
• Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)
• Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)
• Zhiping Weng (Nathan Trinklein, Rick Myers)
Additional ENCODE participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt, and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups
NHGRI: Elise Feingold, Mike Pazin, Peter Good

Experimental Assays
• ChIP-seq (165 TFs + histone marks)
• RNA-seq (292)
• DNase-seq (~200)

RNA-Sequencing
Wang et al. 2009, Nat. Rev. Genet.

Functional data: ChIP-seq
[Diagram labels: immunoprecipitation, antibody, transcription factor; sequence and align; ChIP-seq peak (300-500 bp); motif (8-12 bp); ChIP-exo; histone marks]

Functional data: DNase-seq
[Diagram labels: DNaseI hypersensitivity; DNaseI, transcription factor, histones, region of open chromatin; sequence and align; peak]

Functional data: DNase footprints
[Diagram labels: DNaseI, transcription factor, histones, region of open chromatin; sequence and align; footprint]

[Figure: GWAS SNP enrichment in GM12878. Legend: phenotype-associated SNPs; random sampling of matched SNPs. SNP sets: genotyped SNPs, 1000 Genomes, 24 personal genomes. Categories include DNaseI peaks and TF. Panel c: GWAS enrichment, -log10 p-value, genes above threshold; GO:0006955 immune response.]

Examples of Signal Tracks (Ross Hardison, Belinda Giardine)
[Genome browser view: Human Feb. 2009 (GRCh37/hg19), chr5:39,274,501-40,819,500 (1,545,000 bp). Labelled genes include PTGER4, C9, TTC33, DAB2, PRKAA1 and BC026261; GWAS Catalog track shown.]
[Zoomed view: chr5:40,390,001-40,440,000 (50,000 bp). Crohn's disease SNPs: rs4613763, rs17234657, rs11742570, rs6896969, rs1373692, rs9292777. Ulcerative colitis SNP: rs1992660.]

[Table: SNP counts per phenotype; column headers not recoverable. Rows: TOTAL, Height, Systemic lupus erythematosus, Crohn's disease, Ulcerative colitis, Multiple sclerosis, Rheumatoid arthritis, LDL cholesterol, Bone mineral density, Coronary heart disease, Chronic lymphocytic leukemia.]
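
The enrichment figure compares phenotype-associated SNPs with a random sampling of matched SNPs against DNaseI peaks. A minimal Python sketch of that kind of comparison follows. It is an illustration only, not the consortium's pipeline: the function names, the half-open interval convention, the assumption of non-overlapping peaks, and every coordinate are hypothetical.

```python
from bisect import bisect_right


def build_index(peaks):
    """Sort half-open (start, end) peaks; assumes the peaks do not overlap."""
    ordered = sorted(peaks)
    starts = [start for start, _ in ordered]
    ends = [end for _, end in ordered]
    return starts, ends


def in_peak(pos, starts, ends):
    """Return True if pos lies inside some peak, i.e. start <= pos < end."""
    i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
    return i >= 0 and pos < ends[i]


def fraction_in_peaks(positions, peaks):
    """Fraction of SNP positions that fall inside any peak."""
    starts, ends = build_index(peaks)
    hits = sum(in_peak(p, starts, ends) for p in positions)
    return hits / len(positions)


# Hypothetical toy data on a single chromosome (not real ENCODE coordinates).
peaks = [(100, 400), (1000, 1300), (5000, 5350)]
gwas_snps = [150, 1200, 3000, 5100]      # stand-in for phenotype-associated SNPs
matched_snps = [50, 450, 1100, 6000]     # stand-in for matched random SNPs

gwas_frac = fraction_in_peaks(gwas_snps, peaks)
matched_frac = fraction_in_peaks(matched_snps, peaks)
fold = gwas_frac / matched_frac          # fold enrichment over the matched set

print(gwas_frac, matched_frac, fold)
```

A real analysis would repeat the matched sampling many times and report a p-value. This sketch shows only the overlap fraction and the fold ratio.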