Initial Sequencing and Analysis of the Human Genome

International Human Genome Sequencing Consortium*

* A partial list of authors appears on the opposite page. Affiliations are listed at the end of the paper.

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

The rediscovery of Mendel's laws of heredity in the opening weeks of the 20th century [1–3] sparked a scientific quest to understand the nature and content of genetic information that has propelled biology for the last hundred years. The scientific progress made falls naturally into four main phases, corresponding roughly to the four quarters of the century. The first established the cellular basis of heredity: the chromosomes. The second defined the molecular basis of heredity: the DNA double helix. The third unlocked the informational basis of heredity, with the discovery of the biological mechanism by which cells read the information contained in genes and with the invention of the recombinant DNA technologies of cloning and sequencing by which scientists can do the same.

The last quarter of a century has been marked by a relentless drive to decipher first genes and then entire genomes, spawning the field of genomics. The fruits of this work already include the genome sequences of 599 viruses and viroids, 205 naturally occurring plasmids, 185 organelles, 31 eubacteria, seven archaea, one fungus, two animals and one plant.

Here we report the results of a collaboration involving 20 groups from the United States, the United Kingdom, Japan, France, Germany and China to produce a draft sequence of the human genome. The draft genome sequence was generated from a physical map covering more than 96% of the euchromatic part of the human genome and, together with additional sequence in public databases, it covers about 94% of the human genome. The sequence was produced over a relatively short period, with coverage rising from about 10% to more than 90% over roughly fifteen months. The sequence data have been made available without restriction and updated daily throughout the project. The task ahead is to produce a finished sequence, by closing all gaps and resolving all ambiguities. Already about one billion bases are in final form and the task of bringing the vast majority of the sequence to this standard is now straightforward and should proceed rapidly.

The sequence of the human genome is of interest in several respects. It is the largest genome to be extensively sequenced so far, being 25 times as large as any previously sequenced genome and eight times as large as the sum of all such genomes. It is the first vertebrate genome to be extensively sequenced. And, uniquely, it is the genome of our own species.

Much work remains to be done to produce a complete finished sequence, but the vast trove of information that has become available through this collaborative effort allows a global perspective on the human genome. Although the details will change as the sequence is finished, many points are already clear.

• The genomic landscape shows marked variation in the distribution of a number of features, including genes, transposable elements, GC content, CpG islands and recombination rate. This gives us important clues about function. For example, the developmentally important HOX gene clusters are the most repeat-poor regions of the human genome, probably reflecting the very complex coordinate regulation of the genes in the clusters.

• There appear to be about 30,000–40,000 protein-coding genes in the human genome, only about twice as many as in worm or fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products.

• The full set of proteins (the 'proteome') encoded by the human genome is more complex than those of invertebrates. This is due in part to the presence of vertebrate-specific protein domains and motifs (an estimated 7% of the total), but more to the fact that vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.

• Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements.

• Although about half of the human genome derives from transposable elements, there has been a marked decline in the overall activity of such elements in the hominid lineage. DNA transposons appear to have become completely inactive and long-terminal repeat (LTR) retroposons may also have done so.

• The pericentromeric and subtelomeric regions of chromosomes are filled with large recent segmental duplications of sequence from elsewhere in the genome. Segmental duplication is much more frequent in humans than in yeast, fly or worm.

• Analysis of the organization of Alu elements explains the long-standing mystery of their surprising genomic distribution, and suggests that there may be strong selection in favour of preferential retention of Alu elements in GC-rich regions and that these 'selfish' elements may benefit their human hosts.

• The mutation rate is about twice as high in male as in female meiosis, showing that most mutation occurs in males.

• Cytogenetic analysis of the sequenced clones confirms suggestions that large GC-poor regions are strongly correlated with 'dark G-bands' in karyotypes.

• Recombination rates tend to be much higher in distal regions (around 20 megabases (Mb)) of chromosomes and on shorter chromosome arms in general, in a pattern that promotes the occurrence of at least one crossover per chromosome arm in each meiosis.

• More than 1.4 million single nucleotide polymorphisms (SNPs) in the human genome have been identified. This collection should allow the initiation of genome-wide linkage disequilibrium mapping of the genes in the human population.

In this paper, we start by presenting background information on the project and describing the generation, assembly and evaluation of the draft genome sequence. We then focus on an initial analysis of the sequence itself: the broad chromosomal landscape; the repeat elements and the rich palaeontological record of evolutionary and biological processes that they provide; the human genes and proteins and their differences and similarities with those of other

860 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com

Genome Sequencing Centres (Listed in order of total genomic sequence contributed, with a partial list of personnel. A full list of contributors at each centre is available as Supplementary Information.)

Whitehead Institute for Biomedical Research, Center for Genome Research: Eric S. Lander1*, Lauren M. Linton1, Bruce Birren1*, Chad Nusbaum1*, Michael C. Zody1*, Jennifer Baldwin1, Keri Devon1, Ken Dewar1, Michael Doyle1, William FitzHugh1*, Roel Funke1, Diane Gage1, Katrina Harris1, Andrew Heaford1, John Howland1, Lisa Kann1, Jessica Lehoczky1, Rosie LeVine1, Paul McEwan1, Kevin McKernan1, James Meldrim1, Jill P. Mesirov1*, Cher Miranda1, William Morris1, Jerome Naylor1, Christina Raymond1, Mark Rosetti1, Ralph Santos1, Andrew Sheridan1, Carrie Sougnez1, Nicole Stange-Thomann1, Nikola Stojanovic1, Aravind Subramanian1 & Dudley Wyman1

The Sanger Centre: Jane Rogers2, John Sulston2*, Rachael Ainscough2, Stephan Beck2, David Bentley2, John Burton2, Christopher Clee2, Nigel Carter2, Alan Coulson2, Rebecca Deadman2, Panos Deloukas2, Andrew Dunham2, Ian Dunham2, Richard Durbin2*, Lisa French2, Darren Grafham2, Simon Gregory2, Tim Hubbard2*, Sean Humphray2, Adrienne Hunt2, Matthew Jones2, Christine Lloyd2, Amanda McMurray2, Lucy Matthews2, Simon Mercer2, Sarah Milne2, James C. Mullikin2*, Andrew Mungall2, Robert Plumb2, Mark Ross2, Ratna Shownkeen2 & Sarah Sims2

Washington University Genome Sequencing Center:

Biotechnology: Andre Rosenthal12, Matthias Platzer12, Gerald Nyakatura12, Stefan Taudien12 & Andreas Rump12

Beijing Genomics Institute/Human Genome Center: Huanming Yang13, Jun Yu13, Jian Wang13, Guyang Huang14 & Jun Gu15

Multimegabase Sequencing Center, The Institute for Systems Biology: Leroy Hood16, Lee Rowen16, Anup Madan16 & Shizen Qin16

Stanford Genome Technology Center: Ronald W. Davis17, Nancy A. Federspiel17, A. Pia Abola17 & Michael J. Proctor17

Stanford Human Genome Center: Richard M. Myers18, Jeremy Schmutz18, Mark Dickson18, Jane Grimwood18 & David R. Cox18

University of Washington Genome Center: Maynard V. Olson19, Rajinder Kaul19 & Christopher Raymond19

Department of Molecular Biology, Keio University School of Medicine: Nobuyoshi Shimizu20, Kazuhiko Kawasaki20 & Shinsei Minoshima20

University of Texas Southwestern Medical Center at Dallas: Glen A. Evans21†, Maria Athanasiou21 & Roger Schultz21

University of Oklahoma's Advanced Center for Genome Technology: Bruce A. Roe22, Feng Chen22 & Huaqin Pan22

Max Planck Institute for Molecular Genetics: Juliane