Giggle: a Search Engine for Large-Scale Integrated Genome


BRIEF COMMUNICATIONS

GIGGLE: a search engine for large-scale integrated genome analysis

Ryan M Layer¹,², Brent S Pedersen¹,², Tonya DiSera¹,², Gabor T Marth¹,², Jason Gertz³ & Aaron R Quinlan¹,²,⁴

¹Department of Human Genetics, University of Utah, Salt Lake City, Utah, USA. ²USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA. ³Department of Oncological Sciences, University of Utah, Huntsman Cancer Institute, Salt Lake City, Utah, USA. ⁴Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA. Correspondence should be addressed to R.M.L. ([email protected]) or A.R.Q. ([email protected]).

RECEIVED 5 JULY 2017; ACCEPTED 6 DECEMBER 2017; PUBLISHED ONLINE 8 JANUARY 2018; DOI:10.1038/NMETH.4556

NATURE METHODS | ADVANCE ONLINE PUBLICATION

© 2018 Nature America, Inc., part of Springer Nature. All rights reserved.

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.

The results from genome-wide assays such as ChIP-seq, RNA-seq, and variant calling are often interpreted by comparing experimentally identified genomic loci to other known genomic features such as open chromatin, enhancers, and transcribed regions. Large-scale functional genomics projects have greatly empowered this type of analysis by characterizing the genomic regions associated with a wide range of genomic processes. However, interpretation is complicated by the size of these data set collections, which consist of thousands of results that span hundreds of different tissue types, assays, and biological conditions. Effectively integrating these large, complex, and heterogeneous resources requires the ability to rapidly search the full data set and identify the most statistically relevant features. While existing software such as BEDTOOLS¹ and TABIX² identify regions that are common to genome interval files, these methods were designed to investigate a limited number of files. More recent methods³,⁴ describe improved statistical measures, yet they do not scale to the vast amount of data that is now available.

We introduce GIGGLE, a fast and highly scalable genomic interval searching strategy that, much like web search engines did for the Internet, provides users with the ability to conduct large-scale comparisons of their results with thousands of reference data sets and genome annotations in seconds. GIGGLE enables the identification of novel and unexpected relationships among local data sets as well as the vast amount of publicly available genomics data. It works through command line and web interfaces, as well as APIs in the C, Go, and Python programming languages.

GIGGLE is based on a temporal indexing scheme⁵ that uses a B+ tree to create a single index of the genome intervals from thousands of annotations and genomic data files (Fig. 1a). Each interval in an indexed file is represented by two keys in the tree that correspond to the interval's bounds (start and end + 1). Each key in a leaf node contains a list of intervals that either start at a chromosomal position (indicated by a "+") or have ended (indicated by a "−") just before that position. We give an example (Fig. 1a) in which position 7 corresponds to a key in the second leaf node with the list [+T2, −B2]. This indicates that at chromosomal position 7, the second interval in the "Transcripts" file (T2) has started, and the second interval in the "TF binding sites" file (B2) has ended. To find the intervals in the index that intersect a query interval (e.g., [1,5] in Fig. 1a), the tree is searched for the query start and end, the keys within that range are scanned, and intervals in the lists of those keys are identified as intersecting the query interval (see Supplementary Fig. 1 and Online Methods for complete algorithmic details).

GIGGLE's potential for high scalability is based on two factors. First, identifying the number of overlaps between a query and any given annotation file is determined entirely within the unified index, thus eliminating the inefficiencies of existing methods, which must instead open and inspect the underlying data files. Second, the B+ tree structure minimizes disk reads; this is vital to performance since databases of this scale will grow beyond the capacity of main memory and must be stored on disk. To measure GIGGLE's query performance (Supplementary Software), we created an index of the ChromHMM⁶ annotations curated by the Roadmap Epigenomics Project (Roadmap) from 127 tissues and cell lines. Each genome was segmented into 15 genomic states, yielding over 55 million intervals in the resulting GIGGLE index (2.2 GB index, indexed in 80 s). When testing query performance with a range of 10 to 1,000,000 query intervals, GIGGLE was 2,336× faster than TABIX and 25× faster than BEDTOOLS (Fig. 1b; see Supplementary Data 1 for the data used to create Fig. 1) for the largest comparison. Similarly, using an index of 5,603 annotation files for the human genome (GRCh37, a total of 6.9 billion intervals) from the UCSC Genome browser (554 GB index, indexed in 269 min), GIGGLE was up to 345× faster than TABIX and 8× faster than BEDTOOLS (Fig. 1c).

[Figure 1 image not reproduced. Panel a: three interval files (TF binding sites B1, B2; promoters P1, P2; transcripts T1, T2), the B+ tree keys with their +/− lists, and the query result Search(1,5) = [P1, B1, T1, B2, P2]. Panels b and c: runtime (s) against number of query intervals for GIGGLE, BEDTOOLS, and TABIX. Panel d: MC P value against Fisher's exact P value (GIGGLE). Panel e: MC observed/expected against odds ratio (GIGGLE).]

Figure 1 | Indexing, searching, performance, and score calibration. (a) A set of three genomic intervals files (transcription factor (TF) binding sites, promoters, and transcripts) (left, black) is indexed using a single (simplified) B+ tree (right). Intervals among the annotations overlapping a query interval (left, red) are found by searching the tree for the query start and end (right, boxed red) and scanning the keys between these positions (right, boxed red). (b) Runtimes for GIGGLE, BEDTOOLS, and TABIX considering random query sets with between 10 and 1 million random 100-base-pair intervals against the ChromHMM

Speed is essential for searching data of this scale, but, as with internet searches, it is arguably more important to rank results by their relevance to the set of query intervals. Ranking requires a metric that quantifies the degree of similarity between the query intervals and each interval file in the GIGGLE index. Monte Carlo (MC) simulations⁷,⁸ are commonly used in genomics analyses to compare the observed number of intersections to a null distribution of intersections obtained by randomly shuffling intervals thousands of times and testing the number of intersections in each trial. While MC simulations are an effective method for pairs of interval sets, they are computationally intractable for large-scale data sets since thousands of permutations are required for each interval file.

GIGGLE eliminates this complexity by estimating the significance and enrichment between the query intervals and each indexed interval file with a Fisher's Exact two-tailed test and the odds ratio of a 2 × 2 contingency table containing the number of intervals that are in (i) both the query and indexed file, (ii) solely the query file, (iii) solely the indexed file, and (iv) neither the query file nor the indexed file. The first three values are directly computed with a GIGGLE search, and the last value is estimated by the difference between the union of the two sets and the quotient of the mean interval size of both sets and the genome size. These estimates are well correlated with the MC results (Fig. 1d,e) and have the favorable property of near-instant computation.

GIGGLE ranks query results by a composite of the product of −log₁₀(P value) and log₂(odds ratio). This 'GIGGLE score' avoids some of the issues that arise when using only P values to select top hits⁹. In MC simulations, the proportion of values that are more extreme than the observation (i.e., the P value) is highly dependent on the variance of the trials. When the variance of the MC distribution is low, observations that are only marginally larger than the expected value may be significant, yet not interesting biologically. For example, one result from a search of MyoD (a muscle differentiation transcription factor) ChIP-seq peaks against Roadmap had a low enrichment (1.7×), but the variance of the MC simulations was also low, making the observation significant (P < 0.001). Similarly, when the MC distribution variance is high, large enrichments may not reach significance.
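The indexing and search scheme described in the text (two keys per interval, at start and end + 1, each key carrying a "+"/"−" list) can be illustrated with a small sketch. This is not GIGGLE's implementation: a sorted Python list stands in for the B+ tree, the function names `build_index` and `search` are hypothetical, and the interval coordinates are invented so that they agree with the two facts the text states (key 7 holds [+T2, −B2], and a search for [1,5] returns P1, B1, T1, B2, P2). The sketch also replays all keys up to the query start to learn which intervals are still open, which a real disk-based index would presumably avoid.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical inclusive coordinates, chosen only to agree with the text.
FILES = {
    "B": [(2, 3), (4, 6)],  # "TF binding sites": B1, B2
    "P": [(1, 2), (5, 7)],  # "Promoters": P1, P2
    "T": [(3, 4), (7, 9)],  # "Transcripts": T1, T2
}


def build_index(files):
    """Return (sorted keys, {key: ["+X", "-Y", ...]}) over all files."""
    events = defaultdict(list)
    for prefix, intervals in files.items():
        for number, (start, end) in enumerate(intervals, start=1):
            label = f"{prefix}{number}"
            events[start].append("+" + label)    # interval starts at this position
            events[end + 1].append("-" + label)  # interval ended just before this position
    return sorted(events), dict(events)


def search(keys, events, q_start, q_end):
    """Return the labels of all intervals that intersect [q_start, q_end]."""
    # Keys at or before the query start: replay them to find intervals still open.
    first_after_start = bisect_right(keys, q_start)
    active = set()
    for key in keys[:first_after_start]:
        for event in events[key]:
            if event[0] == "+":
                active.add(event[1:])
            else:
                active.discard(event[1:])
    # Keys inside (q_start, q_end]: every "+" entry is an interval starting in the query.
    first_after_end = bisect_right(keys, q_end)
    started = {
        event[1:]
        for key in keys[first_after_start:first_after_end]
        for event in events[key]
        if event[0] == "+"
    }
    return active | started


KEYS, EVENTS = build_index(FILES)
HITS = search(KEYS, EVENTS, 1, 5)
```

With these made-up coordinates, `EVENTS[7]` contains the entries `+T2` and `-B2`, and `HITS` is the set of five labels given for Search(1,5) in the figure. Using end + 1 as the closing key is what lets a query that starts exactly on an interval's last base still see that interval as open.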
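The significance estimate and ranking score described in the text can also be sketched with the standard library alone. Several things here are assumptions rather than statements from the source: the function names are hypothetical, the "neither" cell is read as genome size divided by mean interval size, minus the three observed cells (one interpretation of the sentence in the text), the two-tailed P value sums all tables no more probable than the observed one (a common convention), and the score is taken to be the plain product of −log₁₀(P value) and log₂(odds ratio). The counts in the example are invented, and the sketch assumes no cell that would make the odds ratio or P value zero.

```python
import math


def log_table_probability(a, b, c, d):
    """Log hypergeometric probability of the 2x2 table [[a, b], [c, d]]."""
    return (
        math.lgamma(a + b + 1) + math.lgamma(c + d + 1)
        + math.lgamma(a + c + 1) + math.lgamma(b + d + 1)
        - math.lgamma(a + 1) - math.lgamma(b + 1)
        - math.lgamma(c + 1) - math.lgamma(d + 1)
        - math.lgamma(a + b + c + d + 1)
    )


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P value: sum over tables no likelier than observed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    observed = log_table_probability(a, b, c, d)
    p_value = 0.0
    for x in range(max(0, row1 + col1 - total), min(row1, col1) + 1):
        candidate = log_table_probability(x, row1 - x, col1 - x, total - row1 - col1 + x)
        if candidate <= observed + 1e-9:  # small tolerance for floating-point ties
            p_value += math.exp(candidate)
    return min(1.0, p_value)


def estimate_neither(n_both, n_query_only, n_file_only, genome_size, mean_interval_size):
    """Assumed reading: interval-sized slots in the genome minus the observed union."""
    slots = int(genome_size / mean_interval_size)
    return max(0, slots - (n_both + n_query_only + n_file_only))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def combined_score(p_value, ratio):
    """Assumed form of the score: -log10(P value) * log2(odds ratio)."""
    return -math.log10(p_value) * math.log2(ratio)


# Invented counts: 8 intervals in both sets, 2 query-only, 12 file-only.
N_NEITHER = estimate_neither(8, 2, 12, genome_size=1000, mean_interval_size=10)
P_VALUE = fisher_two_tailed(8, 2, 12, N_NEITHER)
RATIO = odds_ratio(8, 2, 12, N_NEITHER)
SCORE = combined_score(P_VALUE, RATIO)
```

Multiplying the two terms means a result needs both a small P value and a large enrichment to rank highly, which is the property the text attributes to the score: a marginal enrichment with a tiny P value contributes a small log₂(odds ratio) and is pulled down the ranking.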