Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine

Total Page:16

File Type:pdf, Size:1020Kb

Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine biomolecules Review Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine Ryuji Hamamoto 1,2,*, Masaaki Komatsu 1,2, Ken Takasawa 1,2 , Ken Asada 1,2 and Syuzo Kaneko 1 1 Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan; [email protected] (M.K.); [email protected] (K.T.); [email protected] (K.A.); [email protected] (S.K.) 2 Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan * Correspondence: [email protected]; Tel.: +81-3-3547-5271 Received: 1 December 2019; Accepted: 27 December 2019; Published: 30 December 2019 Abstract: To clarify the mechanisms of diseases, such as cancer, studies analyzing genetic mutations have been actively conducted for a long time, and a large number of achievements have already been reported. Indeed, genomic medicine is considered the core discipline of precision medicine, and currently, the clinical application of cutting-edge genomic medicine aimed at improving the prevention, diagnosis and treatment of a wide range of diseases is promoted. However, although the Human Genome Project was completed in 2003 and large-scale genetic analyses have since been accomplished worldwide with the development of next-generation sequencing (NGS), explaining the mechanism of disease onset only using genetic variation has been recognized as difficult. Meanwhile, the importance of epigenetics, which describes inheritance by mechanisms other than the genomic DNA sequence, has recently attracted attention, and, in particular, many studies have reported the involvement of epigenetic deregulation in human cancer. So far, given that genetic and epigenetic studies tend to be accomplished independently, physiological relationships between genetics and epigenetics in diseases remain almost unknown. Since this situation may be a disadvantage to developing precision medicine, the integrated understanding of genetic variation and epigenetic deregulation appears to be now critical. Importantly, the current progress of artificial intelligence (AI) technologies, such as machine learning and deep learning, is remarkable and enables multimodal analyses of big omics data. In this regard, it is important to develop a platform that can conduct multimodal analysis of medical big data using AI as this may accelerate the realization of precision medicine. In this review, we discuss the importance of genome-wide epigenetic and multiomics analyses using AI in the era of precision medicine. Keywords: epigenetics; precision medicine; DNA methylation; histone modifications; machine learning; deep learning 1. Introduction Barack Obama, the 44th president of the United States, stated his intention to fund an amount of $215 million to the “Precision Medicine Initiative” in his 2015 State of the Union Address [1]. Since then, precision medicine has frequently been used as a term that contains concepts of personalized medicine worldwide. Generally, precision medicine refers to a medical model that proposes the customization of healthcare with medical decisions, treatments, practices or products tailored to individual patients. Biomolecules 2020, 10, 62; doi:10.3390/biom10010062 www.mdpi.com/journal/biomolecules Biomolecules 2020, 10, 62 2 of 21 Biomolecules 2020, 10, x 2 of 20 individual patients. In this model, diagnostic testing is often employed for selecting appropriate and In this model, diagnostic testing is often employed for selecting appropriate and optimal therapies optimal therapies based on the context of a patient’s genetic content or other molecular or cellular basedBiomolecules on the context2020, 10, x of a patient’s genetic content or other molecular or cellular analyses [2].2 Toof 20 date, analyses [2]. To date, most precision medicine interventions consist of genetic profiling, including the most precision medicine interventions consist of genetic profiling, including the detection of predictive detectionindividual of predictive patients. biomarkers In this model, [3]. Itdia hasgnostic been testing repeatedly is often reported employed that for this selecting may identify appropriate patients and biomarkers [3]. It has been repeatedly reported that this may identify patients at risk for a specific at risk foroptimal a specific therapies disease based or ona severe the context variant of aof patient’s a disease genetic and allow content for or preventive other molecular interventions or cellular to disease or a severe variant of a disease and allow for preventive interventions to reduce the burden of reduce analysesthe burden [2]. ofTo disease date, most and precision improve medicine quality interventionsof life. However, consist it hasof genetic also been profiling, reported including that only the diseasedetection and improveof predictive quality biomarkers of life. [ However,3]. It has been it has repeatedly also been reported reported that thatthis may only identify a small patients number of a small number of patients benefit from current precision medicine, and it is of no benefit for most patientsat risk benefit for a specific from current disease precisionor a severe medicine, variant of anda disease it is ofand no allow benefit for preventive for most tumor interventions patients to [ 4,5]. tumor patients [4,5]. In addition, it has been stated that the MD Anderson Cancer Center found that In addition,reduce the it hasburden been of stateddisease that and the improve MD Anderson quality of life. Cancer However, Center it foundhas also that been the reported gene sequencing that only of the gene sequencing of 2600 people only benefited 6.4% of them through the use of targeted drugs. 2600a peoplesmall number only benefited of patients 6.4% benefit of themfrom current through precision the use medicine, of targeted and drugs. it is of Accordingno benefit for to most the data According to the data about matching plans of the National Cancer Institute, only 2% people can abouttumor matching patients plans [4,5]. ofIn addition, the National it has Cancerbeen stated Institute, that the only MD 2%Anderson people Cancer can benefit Center fromfound targeted that benefit from targeted drugs [4,6]. These results indicate that we definitely need to explore the drugsthe [ 4gene,6]. Thesesequencing results of indicate2600 people that only we definitelybenefited 6.4% need of to them explore through the possibility the use of targeted that more drugs. patients possibility that more patients can benefit from precision medicine. To extend precision medicine, not canA benefitccording from to the precision data about medicine. matching To plans extend of precisionthe National medicine, Cancer Institute, not only only genomic 2% people data butcan also only genomic data but also other omics data, such as epigenetic and proteomics data, should be otherbenefit omics from data, targeted such as epigeneticdrugs [4,6] and. These proteomics results indicate data, should that we be involved,definitely andneed integrated to explore analyses the involved,possibility and integrated that more analyses patients ofcan different benefit from type precisions of omics medicine. data are To considered extend precision to be ofmedicine, paramount not of different types of omics data are considered to be of paramount importance. In this review article, importance.only genomicIn this review data but article, also weother highlight omics data, the current such as knowledge epigenetic andof the proteomics importance data, of epigeneticshould be we highlight the current knowledge of the importance of epigenetic data in precision medicine by data in involvedprecision, and medicine integrated by analysesdescribing, of different in part typesicular, of the omics integrated data are consideredanalysis of to multiomics be of paramount data, describing, in particular, the integrated analysis of multiomics data, including epigenetic data, using includingimportance. epigenetic In thisdata, review using article, machine we highlight learning the and current deep knowledgelearning technologies. of the importance of epigenetic machinedata in learning precision and medicine deep learning by describing, technologies. in particular, the integrated analysis of multiomics data, including epigenetic data, using machine learning and deep learning technologies. 2. Characteristics2. Characteristics of Epigenetics of Epigenetics and and Technologies Technologies for forEpigenetics Epigenetics Analysis Analysis 2. Characteristics of Epigenetics and Technologies for Epigenetics Analysis 2.1.2.1. General General Characteristics Characteristics of Epigenetics of Epigenetics In 2.1.principle,In principle,General epi-geneticsCharacteristics epi-genetics is of the Epigenetics is thestudy study of heri of heritabletable phenotype phenotype changes changes without without altering altering the theDNA DNA sequencesequence [7].In [ 7Theprinciple,]. The Greek Greek epi prefix-genetics prefix epi- epi- is (the (ἐ studyπι πι- “above”)- “above”)of heritable in in epi-genetics phenotype epi-genetics changes implies implies without features features altering that that are the “onare DNA “on top of” topor of” “insequence or addition “in addition [7].to” The the Greek to” traditional the prefix traditional epi genetic- (ἐπι- “geneticabove basis” for)basis in inheritanceepi -forgenetics inheritance implies [8]. Over features[8]. the
Recommended publications
  • 13 Genomics and Bioinformatics
    Enderle / Introduction to Biomedical Engineering 2nd ed. Final Proof 5.2.2005 11:58am page 799 13 GENOMICS AND BIOINFORMATICS Spencer Muse, PhD Chapter Contents 13.1 Introduction 13.1.1 The Central Dogma: DNA to RNA to Protein 13.2 Core Laboratory Technologies 13.2.1 Gene Sequencing 13.2.2 Whole Genome Sequencing 13.2.3 Gene Expression 13.2.4 Polymorphisms 13.3 Core Bioinformatics Technologies 13.3.1 Genomics Databases 13.3.2 Sequence Alignment 13.3.3 Database Searching 13.3.4 Hidden Markov Models 13.3.5 Gene Prediction 13.3.6 Functional Annotation 13.3.7 Identifying Differentially Expressed Genes 13.3.8 Clustering Genes with Shared Expression Patterns 13.4 Conclusion Exercises Suggested Reading At the conclusion of this chapter, the reader will be able to: & Discuss the basic principles of molecular biology regarding genome science. & Describe the major types of data involved in genome projects, including technologies for collecting them. 799 Enderle / Introduction to Biomedical Engineering 2nd ed. Final Proof 5.2.2005 11:58am page 800 800 CHAPTER 13 GENOMICS AND BIOINFORMATICS & Describe practical applications and uses of genomic data. & Understand the major topics in the field of bioinformatics and DNA sequence analysis. & Use key bioinformatics databases and web resources. 13.1 INTRODUCTION In April 2003, sequencing of all three billion nucleotides in the human genome was declared complete. This landmark of modern science brought with it high hopes for the understanding and treatment of human genetic disorders. There is plenty of evidence to suggest that the hopes will become reality—1631 human genetic diseases are now associated with known DNA sequences, compared to the less than 100 that were known at the initiation of the Human Genome Project (HGP) in 1990.
    [Show full text]
  • Genomics and Its Impact on Science and Society: the Human Genome Project and Beyond
    DOE/SC-0083 Genomics and Its Impact on Science and Society The Human Genome Project and Beyond U.S. Department of Energy Genome Research Programs: genomics.energy.gov A Primer ells are the fundamental working units of every living system. All the instructions Cneeded to direct their activities are contained within the chemical DNA (deoxyribonucleic acid). DNA from all organisms is made up of the same chemical and physical components. The DNA sequence is the particular side-by-side arrangement of bases along the DNA strand (e.g., ATTCCGGA). This order spells out the exact instruc- tions required to create a particular organism with protein complex its own unique traits. The genome is an organism’s complete set of DNA. Genomes vary widely in size: The smallest known genome for a free-living organism (a bac- terium) contains about 600,000 DNA base pairs, while human and mouse genomes have some From Genes to Proteins 3 billion (see p. 3). Except for mature red blood cells, all human cells contain a complete genome. Although genes get a lot of attention, the proteins DNA in each human cell is packaged into 46 chro- perform most life functions and even comprise the mosomes arranged into 23 pairs. Each chromosome is majority of cellular structures. Proteins are large, complex a physically separate molecule of DNA that ranges in molecules made up of chains of small chemical com- length from about 50 million to 250 million base pairs. pounds called amino acids. Chemical properties that A few types of major chromosomal abnormalities, distinguish the 20 different amino acids cause the including missing or extra copies or gross breaks and protein chains to fold up into specific three-dimensional rejoinings (translocations), can be detected by micro- structures that define their particular functions in the cell.
    [Show full text]
  • Genetic Effects on Microsatellite Diversity in Wild Emmer Wheat (Triticum Dicoccoides) at the Yehudiyya Microsite, Israel
    Heredity (2003) 90, 150–156 & 2003 Nature Publishing Group All rights reserved 0018-067X/03 $25.00 www.nature.com/hdy Genetic effects on microsatellite diversity in wild emmer wheat (Triticum dicoccoides) at the Yehudiyya microsite, Israel Y-C Li1,3, T Fahima1,MSRo¨der2, VM Kirzhner1, A Beiles1, AB Korol1 and E Nevo1 1Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel; 2Institute for Plant Genetics and Crop Plant Research, Corrensstrasse 3, 06466 Gatersleben, Germany This study investigated allele size constraints and clustering, diversity. Genome B appeared to have a larger average and genetic effects on microsatellite (simple sequence repeat number (ARN), but lower variance in repeat number 2 repeat, SSR) diversity at 28 loci comprising seven types of (sARN), and smaller number of alleles per locus than genome tandem repeated dinucleotide motifs in a natural population A. SSRs with compound motifs showed larger ARN than of wild emmer wheat, Triticum dicoccoides, from a shade vs those with perfect motifs. The effects of replication slippage sun microsite in Yehudiyya, northeast of the Sea of Galilee, and recombinational effects (eg, unequal crossing over) on Israel. It was found that allele distribution at SSR loci is SSR diversity varied with SSR motifs. Ecological stresses clustered and constrained with lower or higher boundary. (sun vs shade) may affect mutational mechanisms, influen- This may imply that SSR have functional significance and cing the level of SSR diversity by both processes. natural constraints.
    [Show full text]
  • Gene Prediction and Genome Annotation
    A Crash Course in Gene and Genome Annotation Lieven Sterck, Bioinformatics & Systems Biology VIB-UGent [email protected] ProCoGen Dissemination Workshop, Riga, 5 nov 2013 “Conifer sequencing: basic concepts in conifer genomics” “This Project is financially supported by the European Commission under the 7th Framework Programme” Genome annotation: finding the biological relevant features on a raw genomic sequence (in a high throughput manner) ProCoGen Dissemination Workshop, Riga, 5 nov 2013 Thx to: BSB - annotation team • Lieven Sterck (Ectocarpus, higher plants, conifers, … ) • Yao-cheng Lin (Fungi, conifers, …) • Stephane Rombauts (green alga, mites, …) • Bram Verhelst (green algae) • Pierre Rouzé • Yves Van de Peer ProCoGen Dissemination Workshop, Riga, 5 nov 2013 Annotation experience • Plant genomes : A.thaliana & relatives (e.g. A.lyrata), Poplar, Physcomitrella patens, Medicago, Tomato, Vitis, Apple, Eucalyptus, Zostera, Spruce, Oak, Orchids … • Fungal genomes: Laccaria bicolor, Melampsora laricis- populina, Heterobasidion, other basidiomycetes, Glomus intraradices, Pichia pastoris, Geotrichum Candidum, Candida ... • Algal genomes: Ostreococcus spp, Micromonas, Bathycoccus, Phaeodactylum (and other diatoms), E.hux, Ectocarpus, Amoebophrya … • Animal genomes: Tetranychus urticae, Brevipalpus spp (mites), ... ProCoGen Dissemination Workshop, Riga, 5 nov 2013 Why genome annotation? • Raw sequence data is not useful for most biologists • To be meaningful to them it has to be converted into biological significant knowledge
    [Show full text]
  • Small Variants Frequently Asked Questions (FAQ) Updated September 2011
    Small Variants Frequently Asked Questions (FAQ) Updated September 2011 Summary Information for each Genome .......................................................................................................... 3 How does Complete Genomics map reads and call variations? ........................................................................... 3 How do I assess the quality of a genome produced by Complete Genomics?................................................ 4 What is the difference between “Gross mapping yield” and “Both arms mapped yield” in the summary file? ............................................................................................................................................................................. 5 What are the definitions for Fully Called, Partially Called, Half-Called and No-Called?............................ 5 In the summary-[ASM-ID].tsv file, how is the number of homozygous SNPs calculated? ......................... 5 In the summary-[ASM-ID].tsv file, how is the number of heterozygous SNPs calculated? ....................... 5 In the summary-[ASM-ID].tsv file, how is the total number of SNPs calculated? .......................................... 5 In the summary-[ASM-ID].tsv file, what regions of the genome are included in the “exome”? .............. 6 In the summary-[ASM-ID].tsv file, how is the number of SNPs in the exome calculated? ......................... 6 In the summary-[ASM-ID].tsv file, how are variations in potentially redundant regions of the genome counted? .....................................................................................................................................................................
    [Show full text]
  • Ecological Developmental Biology and Disease States CHAPTER 5 Teratogenesis: Environmental Assaults on Development 167
    Integrating Epigenetics, Medicine, and Evolution Scott F. Gilbert David Epel Swarthmore College Hopkins Marine Station, Stanford University Sinauer Associates, Inc. • Publishers Sunderland, Massachusetts U.S.A. © Sinauer Associates, Inc. This material cannot be copied, reproduced, manufactured or disseminated in any form without express written permission from the publisher. Brief Contents PART 1 Environmental Signals and Normal Development CHAPTER 1 The Environment as a Normal Agent in Producing Phenotypes 3 CHAPTER 2 How Agents in the Environment Effect Molecular Changes in Development 37 CHAPTER 3 Developmental Symbiosis: Co-Development as a Strategy for Life 79 CHAPTER 4 Embryonic Defenses: Survival in a Hostile World 119 PART 2 Ecological Developmental Biology and Disease States CHAPTER 5 Teratogenesis: Environmental Assaults on Development 167 CHAPTER 6 Endocrine Disruptors 197 CHAPTER 7 The Epigenetic Origin of Adult Diseases 245 PART 3 Toward a Developmental Evolutionary Synthesis CHAPTER 8 The Modern Synthesis: Natural Selection of Allelic Variation 289 CHAPTER 9 Evolution through Developmental Regulatory Genes 323 CHAPTER 10 Environment, Development, and Evolution: Toward a New Synthesis 369 CODA Philosophical Concerns Raised by Ecological Developmental Biology 403 APPENDIX A Lysenko, Kammerer, and the Truncated Tradition of Ecological Developmental Biology 421 APPENDIX B The Molecular Mechanisms of Epigenetic Change 433 APPENDIX C Writing Development Out of the Modern Synthesis 441 APPENDIX D Epigenetic Inheritance Systems:
    [Show full text]
  • Early-Life Experience, Epigenetics, and the Developing Brain
    Neuropsychopharmacology REVIEWS (2015) 40, 141–153 & 2015 American College of Neuropsychopharmacology. All rights reserved 0893-133X/15 REVIEW ............................................................................................................................................................... www.neuropsychopharmacologyreviews.org 141 Early-Life Experience, Epigenetics, and the Developing Brain 1 ,1 Marija Kundakovic and Frances A Champagne* 1Department of Psychology, Columbia University, New York, NY, USA Development is a dynamic process that involves interplay between genes and the environment. In mammals, the quality of the postnatal environment is shaped by parent–offspring interactions that promote growth and survival and can lead to divergent developmental trajectories with implications for later-life neurobiological and behavioral characteristics. Emerging evidence suggests that epigenetic factors (ie, DNA methylation, posttranslational histone modifications, and small non- coding RNAs) may have a critical role in these parental care effects. Although this evidence is drawn primarily from rodent studies, there is increasing support for these effects in humans. Through these molecular mechanisms, variation in risk of psychopathology may emerge, particularly as a consequence of early-life neglect and abuse. Here we will highlight evidence of dynamic epigenetic changes in the developing brain in response to variation in the quality of postnatal parent–offspring interactions. The recruitment of epigenetic pathways for the
    [Show full text]
  • From 1957 to Nowadays: a Brief History of Epigenetics
    International Journal of Molecular Sciences Review From 1957 to Nowadays: A Brief History of Epigenetics Paul Peixoto 1,2, Pierre-François Cartron 3,4,5,6,7,8, Aurélien A. Serandour 3,4,6,7,8 and Eric Hervouet 1,2,9,* 1 Univ. Bourgogne Franche-Comté, INSERM, EFS BFC, UMR1098, Interactions Hôte-Greffon-Tumeur/Ingénierie Cellulaire et Génique, F-25000 Besançon, France; [email protected] 2 EPIGENEXP Platform, Univ. Bourgogne Franche-Comté, F-25000 Besançon, France 3 CRCINA, INSERM, Université de Nantes, 44000 Nantes, France; [email protected] (P.-F.C.); [email protected] (A.A.S.) 4 Equipe Apoptose et Progression Tumorale, LaBCT, Institut de Cancérologie de l’Ouest, 44805 Saint Herblain, France 5 Cancéropole Grand-Ouest, Réseau Niches et Epigénétique des Tumeurs (NET), 44000 Nantes, France 6 EpiSAVMEN Network (Région Pays de la Loire), 44000 Nantes, France 7 LabEX IGO, Université de Nantes, 44000 Nantes, France 8 Ecole Centrale Nantes, 44300 Nantes, France 9 DImaCell Platform, Univ. Bourgogne Franche-Comté, F-25000 Besançon, France * Correspondence: [email protected] Received: 9 September 2020; Accepted: 13 October 2020; Published: 14 October 2020 Abstract: Due to the spectacular number of studies focusing on epigenetics in the last few decades, and particularly for the last few years, the availability of a chronology of epigenetics appears essential. Indeed, our review places epigenetic events and the identification of the main epigenetic writers, readers and erasers on a historic scale. This review helps to understand the increasing knowledge in molecular and cellular biology, the development of new biochemical techniques and advances in epigenetics and, more importantly, the roles played by epigenetics in many physiological and pathological situations.
    [Show full text]
  • The Economic Impact and Functional Applications of Human Genetics and Genomics
    The Economic Impact and Functional Applications of Human Genetics and Genomics Commissioned by the American Society of Human Genetics Produced by TEConomy Partners, LLC. Report Authors: Simon Tripp and Martin Grueber May 2021 TEConomy Partners, LLC (TEConomy) endeavors at all times to produce work of the highest quality, consistent with our contract commitments. However, because of the research and/or experimental nature of this work, the client undertakes the sole responsibility for the consequence of any use or misuse of, or inability to use, any information or result obtained from TEConomy, and TEConomy, its partners, or employees have no legal liability for the accuracy, adequacy, or efficacy thereof. Acknowledgements ASHG and the project authors wish to thank the following organizations for their generous support of this study. Invitae Corporation, San Francisco, CA Regeneron Pharmaceuticals, Inc., Tarrytown, NY The project authors express their sincere appreciation to the following indi- viduals who provided their advice and input to this project. ASHG Government and Public Advocacy Committee Lynn B. Jorde, PhD ASHG Government and Public Advocacy Committee (GPAC) Chair, President (2011) Professor and Chair of Human Genetics George and Dolores Eccles Institute of Human Genetics University of Utah School of Medicine Katrina Goddard, PhD ASHG GPAC Incoming Chair, Board of Directors (2018-2020) Distinguished Investigator, Associate Director, Science Programs Kaiser Permanente Northwest Melinda Aldrich, PhD, MPH Associate Professor, Department of Medicine, Division of Genetic Medicine Vanderbilt University Medical Center Wendy Chung, MD, PhD Professor of Pediatrics in Medicine and Director, Clinical Cancer Genetics Columbia University Mira Irons, MD Chief Health and Science Officer American Medical Association Peng Jin, PhD Professor and Chair, Department of Human Genetics Emory University Allison McCague, PhD Science Policy Analyst, Policy and Program Analysis Branch National Human Genome Research Institute Rebecca Meyer-Schuman, MS Human Genetics Ph.D.
    [Show full text]
  • Epigenetics: the Biochemistry of DNA Beyond the Central Dogma
    Epigenetics: The Biochemistry of DNA Beyond the Central Dogma Since the Human Genome Project was completed, the newly emerging field of epigenetics is providing a basis for understanding how heritable changes, other than those in the DNA sequence, can influence phenotypic variation. Epigenetics, essentially, affects how genes are read by cells and subsequently how they produce proteins. The interest in epigenetics has led to new findings about the relationship between epigenetic changes and a host of disorders including cancer, immune disorders, and neuropsychiatric disorders. The field of epigenetics is quickly growing and with it the understanding that both the environment and lifestyle choices can directly interact with the genome to influence epigenetic change. This unit is designed to provide a detailed look at the influence of epigenetics on gene expression through developmental stages, tissue-specific needs, and environmental impacts. Students will complete this unit with the understanding that gene regulation and expression is truly “above the genome” and well beyond the simplified mechanism of the Central Dogma of Biology. University of Florida Center for Precollegiate Education and Training www.cpet.ufl.edu 1 EPIGENETICS: THE BIOCHEMISTRY OF DNA www.cpet.ufl.edu 2 Author: Susan Chabot This curriculum was developed as part of Biomedical Explorations: Bench to Bedside, which is supported by the Office Of The Director, National Institutes Of Health of the National Institutes of Health under Award Number R25 OD016551. The content
    [Show full text]
  • CUT&Runtools: a Flexible Pipeline for CUT&RUN Processing and Footprint Analysis
    Zhu et al. Genome Biology (2019) 20:192 https://doi.org/10.1186/s13059-019-1802-4 SOFTWARE Open Access CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis Qian Zhu1, Nan Liu2, Stuart H. Orkin2,3* and Guo-Cheng Yuan1* Abstract We introduce CUT&RUNTools as a flexible, general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&RUN primary cleavage data. CUT&RUNTools extracts endonuclease cut site information from sequences of short-read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&RUN. CUT&RUNTools is available at https://bitbucket.org/qzhudfci/cutruntools/. Introduction Results Mapping the occupancy of DNA-associated proteins, in- Overview cluding transcription factors (TFs) and histones, is central CUT&RUNTools takes paired-end sequencing read to determining cellular regulatory circuits. Conventional FASTQ files as the input and performs a set of analytical ChIP sequencing (ChIP-seq) relies on the cross-linking of steps: trimming of adapter sequences, alignment to the target proteins to DNA and physical fragmentation of reference genome, peak calling, estimation of cut matrix chromatin [1]. In practice, epitope masking and insolubil- at single-nucleotide resolution, de novo motif searching, ity of protein complexes may interfere with the successful motif footprinting analysis, direct binding site identifica- use of conventional ChIP-seq for some chromatin- tion, and data visualization (Fig. 1b). The outputs of the associated proteins [2–4]. CUT&RUN is a recently de- pipeline (Fig. 1c) are (1) an aggregate footprint capturing scribed native endonuclease-based method based on the the characteristics of chromatin-associated protein bind- binding of an antibody to a chromatin-associated protein ing (Fig.
    [Show full text]
  • Epigenetics for Dummies
    On reflection Postgrad Med J: first published as 10.1136/postgradmedj-2016-133993 on 25 February 2016. Downloaded from fur instead of brown. In another much Epigenetics for dummies quoted experiment, researchers claimed to show that mice bred from fathers who John Launer had been trained to become afraid of a particular smell also showed avoidance of the same smell – although it is hard to In the last few years, epigenetics has mechanisms themselves are very common understand how such a specific outcome seized the public imagination. Magazines in nature, quite aside from environmental could be achieved through known New Scientist fl like are publishing articles in uences, and they govern gene expres- molecular mechanisms.9 “ with titles such as How to change your sion in all kinds of ways, including As far as human beings are concerned, ” 1 genes by changing your lifestyle . A turning a stem cell into a liver or a skin everyone has known for a long time that book called “The Epigenetic Revolution” cell, or a bee larva into a worker or a 2 maternal behaviour during pregnancy has become a bestseller. The popular queen bee. One hypomethylating agent, affects the growing fetus, with lifelong press regularly reports epigenetic research, decitabine, is an established treatment for effects. The most familiar example is including a recent study purporting to myelodysplastic disorders. probably foetal alcohol syndrome, where show that severe psychological trauma can The question then arises: do environ- a mother’s addiction can lead to irrepar- ’ 3 be passed down through one s genes. mental effects have to happen anew in able damage to her infant.
    [Show full text]