Improving Multitask Transcription Factor Binding Site Prediction with Base-Pair Resolution

Total Page:16

File Type:pdf, Size:1020Kb

Improving Multitask Transcription Factor Binding Site Prediction with Base-Pair Resolution bioRxiv preprint doi: https://doi.org/10.1101/2021.05.29.446316; this version posted May 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. NetTIME: Improving Multitask Transcription Factor Binding Site Prediction with Base-pair Resolution a a,b, a,b,c,d, Ren Yi , Kyunghyun Cho ⇤, Richard Bonneau ⇤ aDepartment of Computer Science, New York University, New York, NY 10011, USA bCenter for Data Science, New York University, New York, NY 10011, USA cDepartment of Biology, New York University, New York, NY 10003, USA dCenter for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA Abstract Motivation: Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly more accurate thanks to the increased availability of next-generation sequenc- ing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because avail- able binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here we propose NetTIME, a multitask learning framework for predicting cell-type-specific transcription factor binding sites with base- pair resolution. Results: We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a prob- ability threshold and reduces classification noise. We compare our method’s predictive performance with several state-of-the-art methods, including DeepBind, BindSpace, and Catchitt, and show that our method outperforms previous methods under both supervised and transfer learning settings. Availability: NetTIME is freely available at https://github.com/ryi06/NetTIME Contact: [email protected] 1. Introduction DNA in vivo [1]. Experimentally determining cell- type-specific TF binding sites is made possible Genome-wide modeling of non-coding DNA se- through high-throughput techniques such as chro- quence function is among the most fundamental matin immunoprecipitation followed by sequencing and yet challenging tasks in biology. Transcrip- (ChIP-seq) [2]. Due to experimental limitations, tional regulation is orchestrated by transcription however, it is infeasible to perform ChIP-seq (or factors (TFs), whose binding to DNA initiates a related single-TF-focused experiments) on all TFs series of signaling cascades that ultimately deter- across all cell types and organisms [1]. Therefore, mine both the rate of transcription of their tar- computational methods for accurately predicting in get genes and the underlying DNA functions. Both vivo TF binding sites are essential for studying TF the cell-type-specific chromatin state and the DNA functions, especially for less well-known TFs and sequence a↵ect the interactions between TFs and cell types. ⇤To whom correspondence should be addressed. Preprint submitted to Elsevier bioRxiv preprint doi: https://doi.org/10.1101/2021.05.29.446316; this version posted May 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Multiple community crowdsourcing challenges role. This multitude of factors changes substan- have been organized by the DREAM Consortium1 tially from cell type to cell type. In vivo TF bind- to find the best computational methods for predict- ing site prediction, therefore, requires cell-type- ing TF binding sites in both in vitro and in vivo specific data such as chromatin accessibility and settings [3, 4]. These challenges also set the com- histone modifications. Convolutional neural net- munity standard for both processing data and eval- works as well as TF- and cell-type-specific embed- uating methods. However, top-performing methods ding vectors have both been used to learn cell-type- from these challenges have revealed key limitations specific TF binding profiles from DNA sequences in the current TF binding prediction community. and DNase-seq data [19]. The DREAM Consor- Generalizing predictions beyond the training panels tium also initiated the ENCODE-DREAM chal- of cell types and TFs can potentially benefit from lenge to systematically evaluate methods for pre- multitask learning and increased prediction reso- dicting in vivo TF binding sites [4]. Apart from lution. However, many existing methods still use carefully designed model architectures, top-ranking shallow single-task models. Predictions generated methods in this challenge also rely on extensively from these methods typically have low resolution, curated feature sets. One such method, called and they cannot achieve competitive performance Catchitt [20], achieves top performance by lever- for prediction of binding regions shorter than 50 aging a wide range of features including DNA se- base pairs (bp), although the actual TF binding quences, genome annotations, TF motifs, DNase- sites are considerably shorter [5]. seq, and RNA-seq. 1.1. Related work 1.2. Current limitations Early TF binding prediction methods such as Compendium databases such as ENCODE [21] MEME [6, 7] focused on deriving interpretable TF and Remap [22] have collected ChIP-seq data for a motif position weight matrices (PWMs) that char- large collection of TFs in a handful of well-studied acterize TF sequence specificity. Amid rapid ad- cell types and organisms [1]. Within a single or- vancement in machine learning, however, growing ganism, however, the ENCODE TF ChIP-seq col- evidence has suggested that sequence specificity lection is highly skewed towards only a few TFs in a can be more accurately captured through more ab- small collection of well-characterized cell lines and stract feature extraction techniques. For example, primary cell types (Figure. S1). Knowledge trans- a method called DeepBind [8] used a convolutional fer from well-known cell types and TFs is crucial neural network to extract TF binding patterns from for understanding less-studied cell types and TFs. DNA sequences. Several modifications to Deep- Top-performing methods from the ENCODE- Bind subsequently improved model architecture [9] DREAM Challenge typically adopt the single-task as well as prediction resolution [10]. Yuan et al. learning approach. For example, Catchitt [20] developed BindSpace, which embeds TF-bound se- trains one model per TF and cell type. Cross cell- quences into a common high-dimensional space. type transfer predictions are achieved by provid- Embedding methods like BindSpace belong to a ing a trained model with input features from a class of representation learning techniques com- new cell type. This approach can be highly un- monly used in natural language processing [12] and reliable as the chromatin landscapes between the genomics [13, 14] for mapping entities to vectors of trained and predicted cell types can be drastically real numbers. New methods also explicitly model di↵erent [23] and these di↵erences can be function- protein binding sites with multiple binding mode ally important [24]. Alternatively, each model can predictors [15], and the e↵ect of sequence variants be trained on one TF using cell-type-specific data on non-coding DNA functions at scale [16, 17, 18]. across multiple cell types of interest [25]. However, In general, the DNA sequence at a potential TF such models tend to assign high binding probabil- binding site is just the beginning of the full DNA- ities to common binding sites among training cell function picture, and the state of the surrounding types without proper mechanisms to distinguish cell chromosome, the TF and TF-cofactor expression, types. Very few methods have adopted the multi- and other contextual factors play an equally large task learning approach in which data from multiple cell types and TFs are trained jointly in order to improve the overall model performance. One mul- 1http://dreamchallenges.org/about-dream/ titask solution [16, 17, 18] involves training a multi- 2 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.29.446316; this version posted May 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. class classifier on DNA sequences, where each class To collect target labels that cover a wide range represents the occurrence of binding sites for one of cellular conditions and binding patterns, we first TF in one cell type. This solution is suboptimal as select 7 cell types (including 3 cancer cell types, 3 it cannot generalize predictions beyond the training normal cell types, and 1 stem cell type) and 22 TFs TF and cell-type pairs. (including 17 TFs from 7 TF protein families as Sequence context a↵ects TF binding affinity [26], well as 5 functionally related TFs.). Conserved and and increasing context size can improve TF binding relaxed peak sets are collected from 71 ENCODE site prediction [16]. TF binding sites are typically replicated ChIP-seq experiments conducted on our only 4-20 bp long [5]; increasing the prediction res- cell types and TFs of interest.
Recommended publications
  • Autism Multiplex Family with 16P11.2P12.2 Microduplication Syndrome in Monozygotic Twins and Distal 16P11.2 Deletion in Their Brother
    European Journal of Human Genetics (2012) 20, 540–546 & 2012 Macmillan Publishers Limited All rights reserved 1018-4813/12 www.nature.com/ejhg ARTICLE Autism multiplex family with 16p11.2p12.2 microduplication syndrome in monozygotic twins and distal 16p11.2 deletion in their brother Anne-Claude Tabet1,2,3,4, Marion Pilorge2,3,4, Richard Delorme5,6,Fre´de´rique Amsellem5,6, Jean-Marc Pinard7, Marion Leboyer6,8,9, Alain Verloes10, Brigitte Benzacken1,11,12 and Catalina Betancur*,2,3,4 The pericentromeric region of chromosome 16p is rich in segmental duplications that predispose to rearrangements through non-allelic homologous recombination. Several recurrent copy number variations have been described recently in chromosome 16p. 16p11.2 rearrangements (29.5–30.1 Mb) are associated with autism, intellectual disability (ID) and other neurodevelopmental disorders. Another recognizable but less common microdeletion syndrome in 16p11.2p12.2 (21.4 to 28.5–30.1 Mb) has been described in six individuals with ID, whereas apparently reciprocal duplications, studied by standard cytogenetic and fluorescence in situ hybridization techniques, have been reported in three patients with autism spectrum disorders. Here, we report a multiplex family with three boys affected with autism, including two monozygotic twins carrying a de novo 16p11.2p12.2 duplication of 8.95 Mb (21.28–30.23 Mb) characterized by single-nucleotide polymorphism array, encompassing both the 16p11.2 and 16p11.2p12.2 regions. The twins exhibited autism, severe ID, and dysmorphic features, including a triangular face, deep-set eyes, large and prominent nasal bridge, and tall, slender build. The eldest brother presented with autism, mild ID, early-onset obesity and normal craniofacial features, and carried a smaller, overlapping 16p11.2 microdeletion of 847 kb (28.40–29.25 Mb), inherited from his apparently healthy father.
    [Show full text]
  • Leveraging Mouse Chromatin Data for Heritability Enrichment Informs Common Disease Architecture and Reveals Cortical Layer Contributions to Schizophrenia
    Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Research Leveraging mouse chromatin data for heritability enrichment informs common disease architecture and reveals cortical layer contributions to schizophrenia Paul W. Hook1 and Andrew S. McCallion1,2,3 1McKusick-Nathans Department of Genetic Medicine, 2Department of Comparative and Molecular Pathobiology, 3Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA Genome-wide association studies have implicated thousands of noncoding variants across common human phenotypes. However, they cannot directly inform the cellular context in which disease-associated variants act. Here, we use open chro- matin profiles from discrete mouse cell populations to address this challenge. We applied stratified linkage disequilibrium score regression and evaluated heritability enrichment in 64 genome-wide association studies, emphasizing schizophrenia. We provide evidence that mouse-derived human open chromatin profiles can serve as powerful proxies for difficult to ob- tain human cell populations, facilitating the illumination of common disease heritability enrichment across an array of hu- man phenotypes. We demonstrate that signatures from discrete subpopulations of cortical excitatory and inhibitory neurons are significantly enriched for schizophrenia heritability with maximal enrichment in cortical layer V excitatory neu- rons. We also show that differences between schizophrenia and bipolar disorder are concentrated in excitatory neurons in cortical layers II-III, IV, and V, as well as the dentate gyrus. Finally, we leverage these data to fine-map variants in 177 schiz- ophrenia loci nominating variants in 104/177. We integrate these data with transcription factor binding site, chromatin in- teraction, and validated enhancer data, placing variants in the cellular context where they may modulate risk.
    [Show full text]
  • Identification of Genetic Variants in CFAP221 As a Cause of Primary Ciliary Dyskinesia
    HHS Public Access Author manuscript Author ManuscriptAuthor Manuscript Author J Hum Genet Manuscript Author . Author manuscript; Manuscript Author available in PMC 2020 April 21. Published in final edited form as: J Hum Genet. 2020 January ; 65(2): 175–180. doi:10.1038/s10038-019-0686-1. Identification of Genetic Variants in CFAP221 as a Cause of Primary Ciliary Dyskinesia Ximena M. Bustamante-Marin1,#, Adam Shapiro2,#, Patrick R. Sears1, Wu-Lin Charng3, Donald F. Conrad3,4, Margaret W. Leigh5, Michael R. Knowles1, Lawrence E. Ostrowski1,*, Maimoona A. Zariwala6,* 1Department of Medicine, Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina., NC 27599, USA. 2Department of Pediatrics, Division of Pediatric Respiratory Medicine, McGill University Health Centre Research Institute, Montreal, Quebec. 3Department of Genetics and Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA 4Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, 97006, USA 5Department of Pediatrics, Marsico Lung Institute, University of North Carolina, Chapel Hill, North Carolina, 27599. 6Department of Pathology and Laboratory Medicine, Marsico Lung Institute University of North Carolina, Chapel Hill, NC 27599, USA. Abstract Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:http://www.nature.com/authors/editorial_policies/license.html#terms * Corresponding Author: Lawrence E. Ostrowski, 104 Manning Dr Room 6021, [email protected], Phone: 919-8437177, Maimoona A. Zariwala, 7219 A Marsico Hall, [email protected], Phone: 919-966-7050, Fax: 919-966-5178. #Ximena M. Bustamante-Marin and Adam Shapiro are co-first authors.
    [Show full text]
  • A KMT2A-AFF1 Gene Regulatory Network Highlights the Role of Core Transcription Factors and Reveals the Regulatory Logic of Key Downstream Target Genes
    Downloaded from genome.cshlp.org on October 7, 2021 - Published by Cold Spring Harbor Laboratory Press Research A KMT2A-AFF1 gene regulatory network highlights the role of core transcription factors and reveals the regulatory logic of key downstream target genes Joe R. Harman,1,7 Ross Thorne,1,7 Max Jamilly,2 Marta Tapia,1,8 Nicholas T. Crump,1 Siobhan Rice,1,3 Ryan Beveridge,1,4 Edward Morrissey,5 Marella F.T.R. de Bruijn,1 Irene Roberts,3,6 Anindita Roy,3,6 Tudor A. Fulga,2,9 and Thomas A. Milne1,6 1MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom; 2MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom; 3MRC Molecular Haematology Unit, MRC Weatherall Institute of Molecular Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 9DS, United Kingdom; 4Virus Screening Facility, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, United Kingdom; 5Center for Computational Biology, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, United Kingdom; 6NIHR Oxford Biomedical Research Centre Haematology Theme, University of Oxford, Oxford, OX3 9DS, United Kingdom Regulatory interactions mediated by transcription factors (TFs) make up complex networks that control cellular behavior. Fully understanding these gene regulatory networks (GRNs) offers greater insight into the consequences of disease-causing perturbations than can be achieved by studying single TF binding events in isolation. Chromosomal translocations of the lysine methyltransferase 2A (KMT2A) gene produce KMT2A fusion proteins such as KMT2A-AFF1 (previously MLL-AF4), caus- ing poor prognosis acute lymphoblastic leukemias (ALLs) that sometimes relapse as acute myeloid leukemias (AMLs).
    [Show full text]
  • A Rare Duplication on Chromosome 16P11.2 Is Identified in Patients with Psychosis in Alzheimer’S Disease
    A Rare Duplication on Chromosome 16p11.2 Is Identified in Patients with Psychosis in Alzheimer’s Disease Xiaojing Zheng1,7*, F. Yesim Demirci2, M. Michael Barmada2, Gale A. Richardson3,6, Oscar L. Lopez4,5, Robert A. Sweet3,4,5, M. Ilyas Kamboh2,3,5, Eleanor Feingold1,2 1 Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 2 Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 3 Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 4 Department of Neurology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 5 VISN 4 Mental Illness Research, Education and Clinical Center, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America, 6 Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 7 Department of Pediatrics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, United States of America Abstract Epidemiological and genetic studies suggest that schizophrenia and autism may share genetic links. Besides common single nucleotide polymorphisms, recent data suggest that some rare copy number variants (CNVs) are risk factors for both disorders. Because we have previously found that schizophrenia and psychosis in Alzheimer’s disease (AD+P) share some genetic risk, we investigated whether CNVs reported in schizophrenia and autism are also linked to AD+P. We searched for CNVs associated with AD+P in 7 recurrent CNV regions that have been previously identified across autism and schizophrenia, using the Illumina HumanOmni1-Quad BeadChip.
    [Show full text]
  • Positional Specificity of Different Transcription Factor Classes Within Enhancers
    Positional specificity of different transcription factor classes within enhancers Sharon R. Grossmana,b,c, Jesse Engreitza, John P. Raya, Tung H. Nguyena, Nir Hacohena,d, and Eric S. Landera,b,e,1 aBroad Institute of MIT and Harvard, Cambridge, MA 02142; bDepartment of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139; cProgram in Health Sciences and Technology, Harvard Medical School, Boston, MA 02215; dCancer Research, Massachusetts General Hospital, Boston, MA 02114; and eDepartment of Systems Biology, Harvard Medical School, Boston, MA 02215 Contributed by Eric S. Lander, June 19, 2018 (sent for review March 26, 2018; reviewed by Gioacchino Natoli and Alexander Stark) Gene expression is controlled by sequence-specific transcription type-restricted enhancers (active in <50% of the cell types) and factors (TFs), which bind to regulatory sequences in DNA. TF ubiquitous enhancers (active in >90% of the cell types) (SI Ap- binding occurs in nucleosome-depleted regions of DNA (NDRs), pendix, Fig. S1C). which generally encompass regions with lengths similar to those We next sought to infer functional TF-binding sites within the protected by nucleosomes. However, less is known about where active regulatory elements. In a recent study (5), we found that within these regions specific TFs tend to be found. Here, we char- TF binding is strongly correlated with the quantitative DNA acterize the positional bias of inferred binding sites for 103 TFs accessibility of a region. Furthermore, the TF motifs associated within ∼500,000 NDRs across 47 cell types. We find that distinct with enhancer activity in reporter assays in a cell type corre- classes of TFs display different binding preferences: Some tend to sponded closely to those that are most enriched in the genomic have binding sites toward the edges, some toward the center, and sequences of active regulatory elements in that cell type (5).
    [Show full text]
  • 2020.05.10.087254V1.Full
    Edinburgh Research Explorer Identification of MAZ as a novel transcription factor regulating erythropoiesis Citation for published version: Deen, D, Butter , F, Holland, ML, Samara, V, Sloane-Stanley, JA, Ayyub, H, Mann, M, Garrick, D & Vernimmen, D 2020 'Identification of MAZ as a novel transcription factor regulating erythropoiesis'. https://doi.org/10.1101/2020.05.10.087254, https://doi.org/10.1101/2020.05.10.087254 Digital Object Identifier (DOI): 10.1101/2020.05.10.087254 10.1101/2020.05.10.087254 Link: Link to publication record in Edinburgh Research Explorer Document Version: Publisher's PDF, also known as Version of record General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 08. Oct. 2021 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.10.087254; this version posted May 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Identification of MAZ as a novel transcription factor regulating erythropoiesis Darya Deen1, Falk Butter2, Michelle L.
    [Show full text]
  • 16P11.2 Transcription Factor MAZ Is a Dosage-Sensitive Regulator Of
    16p11.2 transcription factor MAZ is a dosage-sensitive PNAS PLUS regulator of genitourinary development Meade Hallera,b,c,1, Jason Aub,c, Marisol O’Neilla,b,c, and Dolores J. Lamba,b,c aDepartment of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030; bCenter for Reproductive Medicine, Baylor College of Medicine, Houston, TX 77030; and cDepartment of Urology, Baylor College of Medicine, Houston, TX 77030 Edited by Frank Costantini, Columbia University, New York, NY, and accepted by Editorial Board Member Kathryn V. Anderson January 12, 2018 (received for review September 12, 2017) Genitourinary (GU) birth defects are among the most common yet novo or inherited, and their allele frequency is influenced by both least studied congenital malformations. Congenital anomalies of the the rate of their spontaneous occurrence as well as their compati- kidney and urinary tract (CAKUTs) have high morbidity and mortality bility with life and with fertility (6). While many CNVs are consid- rates and account for ∼30% of structural birth defects. Copy number ered benign (7), specific CNVs have well-documented associations variation (CNV) mapping revealed that 16p11.2 is a hotspot for GU with a wide variety of birth defects (8). The specific congenital development. The only gene covered collectivelybyallofthemapped malformationsassociatedwitheachpathogenicCNVdependonthe GU-patient CNVs was MYC-associated zinc finger transcription factor specific genes disrupted within the genomic loci and whether the (MAZ), and MAZ CNV frequency is enriched in nonsyndromic GU- CNV is a duplication or a deletion. A CNV is considered syndromic abnormal patients. Knockdown of MAZ in HEK293 cells results in if multiple organs fail to develop normally in affected patients.
    [Show full text]
  • The Myc-Associated Zinc Finger Protein (MAZ) Works Together with CTCF to Control Cohesin Positioning and Genome Organization
    The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organization Tiaojiang Xiaoa, Xin Lia, and Gary Felsenfelda,1 aLaboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD 20892-0540 Contributed by Gary Felsenfeld, December 30, 2020 (sent for review November 6, 2020; reviewed by Peter Fraser and Tom Maniatis) The Myc-associated zinc finger protein (MAZ) is often found at interaction contributes to the effect of MAZ binding on CTCF genomic binding sites adjacent to CTCF, a protein which affects affinity for DNA. There are a relatively small number of genomic large-scale genome organization through its interaction with sites where MAZ is bound independently of CTCF but is asso- cohesin. We show here that, like CTCF, MAZ physically interacts ciated with Rad21. Depletion of MAZ results in loss of Rad21 with a cohesin subunit and can arrest cohesin sliding indepen- from these sites. dently of CTCF. It also shares with CTCF the ability to indepen- These results suggest that MAZ plays a role in genome or- dently pause the elongating form of RNA polymerase II, and ganization that supports and complements that of CTCF. In many consequently affects RNA alternative splicing. CTCF/MAZ double genomic locations, it binds to sites adjacent to CTCF, stabilizes sites are more effective at sequestering cohesin than sites occu- CTCF binding, and provides an additional contribution to the pied only by CTCF. Furthermore, depletion of CTCF results in pref- arrest of cohesin. Consistent with an ability to interfere with such erential loss of CTCF from sites not occupied by MAZ.
    [Show full text]
  • Development and Utilization of Human Decidualization Reporter Cell Line Uncovers New Modulators of Female Fertility
    Development and utilization of human decidualization reporter cell line uncovers new modulators of female fertility Meade Hallera,1, Yan Yina,1, and Liang Maa,2 aDepartment of Internal Medicine, Division of Dermatology, Washington University in St. Louis, St. Louis, MO 63110 Edited by R. Michael Roberts, University of Missouri, Columbia, MO, and approved August 19, 2019 (received for review May 2, 2019) Failure of embryo implantation accounts for a significant percent- Decidualization involves the rapid proliferation, then differ- age of female infertility. Exquisitely coordinated molecular pro- entiation of fibroblast-like endometrial stromal cells into epithelioid- grams govern the interaction between the competent blastocyst like decidual cells, some of which become large and polyploid or and the receptive uterus. Decidualization, the rapid proliferation multinuclear. These cells become part of the decidual tissue that and differentiation of endometrial stromal cells into decidual cells, surrounds the implanting conceptus (2, 9). The maternal de- is required for implantation. Decidualization defects can cause cidual tissue plays a crucial role in the establishment of preg- poor placentation, intrauterine growth restriction, and early nancy (11, 12). Accompanying the transformation of uterine parturition leading to preterm birth. Decidualization has not yet stromal cells to decidual cells are changes occurring in the en- been systematically studied at the genetic level due to the lack of a dometrium that include extensive extracellular matrix remodel- suitable high-throughput screening tool. Herein we describe the ing, vascular remodeling, angiogenesis, and apoptosis. While generation of an immortalized human endometrial stromal cell line these are happening, the conceptus enlarges and placental de- that uses yellow fluorescent protein under the control of the prolactin velopment occurs (2, 9).
    [Show full text]
  • A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes
    bioRxiv preprint doi: https://doi.org/10.1101/058826; this version posted June 14, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A cell type-specific expression signature predicts haploinsufficient autism-susceptibility genes Chaolin Zhang1,2,3,6,*, Yufeng Shen1,4,5,6* 1 Department of Systems Biology, 2 Department of Biochemistry and Molecular Biophysics, 3 Center for Motor Neuron Biology and Disease, 4 Department of Biomedical Informatics, 5 JP Sulzberger Genome Center Columbia University, New York NY 10032, USA 6 Equal contribution * To whom correspondence should be addressed: [email protected] (C.Z.); [email protected] (Y.S.) 1 bioRxiv preprint doi: https://doi.org/10.1101/058826; this version posted June 14, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Recent studies have identified many genes with rare de novo mutations in autism, but a limited number of these have been conclusively established as disease-susceptibility genes due to lack of recurrence and confounding background mutations. Such extreme genetic heterogeneity severely limits recurrence–based statistical power even in studies with a large sample size. In addition, the cellular contexts in which these genomic lesions confer disease risks remain poorly understood.
    [Show full text]
  • A Cpg Island Promoter Drives the CXXC5 Gene Expression
    www.nature.com/scientificreports OPEN A CpG island promoter drives the CXXC5 gene expression Pelin Yaşar1,6,8*, Gizem Kars1,8, Kerim Yavuz1,8, Gamze Ayaz1,7, Çerağ Oğuztüzün2, Ecenaz Bilgen3, Zeynep Suvacı3, Özgül Persil Çetinkol3, Tolga Can4 & Mesut Muyan1,5* CXXC5 is a member of the zinc-fnger CXXC family that binds to unmethylated CpG dinucleotides. CXXC5 modulates gene expressions resulting in diverse cellular events mediated by distinct signaling pathways. However, the mechanism responsible for CXXC5 expression remains largely unknown. We found here that of the 14 annotated CXXC5 transcripts with distinct 5′ untranslated regions encoding the same protein, transcript variant 2 with the highest expression level among variants represents the main transcript in cell models. The DNA segment in and at the immediate 5′-sequences of the frst exon of variant 2 contains a core promoter within which multiple transcription start sites are present. Residing in a region with high G–C nucleotide content and CpG repeats, the core promoter is unmethylated, defcient in nucleosomes, and associated with active RNA polymerase-II. These fndings suggest that a CpG island promoter drives CXXC5 expression. Promoter pull-down revealed the association of various transcription factors (TFs) and transcription co-regulatory proteins, as well as proteins involved in histone/chromatin, DNA, and RNA processing with the core promoter. Of the TFs, we verifed that ELF1 and MAZ contribute to CXXC5 expression. Moreover, the frst exon of variant 2 may contain a G-quadruplex forming region that could modulate CXXC5 expression. DNA methylation is one of the mechanisms of gene silencing and primarily occurs in CpG dinucleotides of the genome.
    [Show full text]