DNA Sequencing with Chain-Terminating Inhibitors (DNA Polymerase/Nucleotide Sequences/Bacteriophage 4X174) F

Total Page:16

File Type:pdf, Size:1020Kb

DNA Sequencing with Chain-Terminating Inhibitors (DNA Polymerase/Nucleotide Sequences/Bacteriophage 4X174) F Proc. Nati. Acad. Sci. USA Vol. 74, No. 12, pp. 5463-5467, December 1977 Biochemistry DNA sequencing with chain-terminating inhibitors (DNA polymerase/nucleotide sequences/bacteriophage 4X174) F. SANGER, S. NICKLEN, AND A. R. COULSON Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 2QH, England Contributed by F. Sanger, October 3, 1977 ABSTRACT A new method for determining nucleotide se- a stereoisomer of ribose in which the 3'-hydroxyl group is ori- quences in DNA is described. It is similar to the "plus and ented in trans position with respect to the 2'-hydroxyl group. minus" method [Sanger, F. & Coulson, A. R. (1975) J. Mol. Biol. 94,441-4481 but makes use of the 2',3'-dideoxy and arabinonu- The arabinosyl (ara) nucleotides act as chain terminating in- cleoside analogues ofthe normal deoxynucleoside triphosphates, hibitors of Escherichia coli DNA polymerase I in a manner which act as specific chain-terminating inhibitors of DNA comparable to ddT (4), although synthesized chains ending in polymerase. The technique has been applied to the DNA of 3' araC can be further extended by some mammalian DNA bacteriophage 4bX174 and is more rapid and more accurate than polymerases (5). In order to obtain a suitable pattern of bands either the plus or the minus method. from which an extensive sequence can be read it is necessary to have a ratio of terminating triphosphate to normal triphos- The "plus and minus" method (1) is a relatively rapid and phate such that only partial incorporation of the terminator simple technique that has made possible the determination of occurs. For the dideoxy derivatives this ratio is about 100, and the sequence of the genome of bacteriophage 4X174 (2). It for the arabinosyl derivatives about 5000. depends on the use of DNA polymerase to transcribe specific regions of the DNA under controlled conditions. Although the method is considerably more rapid and simple than other METHODS available techniques, neither the "plus" nor the "minus" Preparation of the Triphosphate Analogues. The prepa- method is completely accurate, and in order to establish a se- ration of ddTTP has been described (6, 7), and the material is quence both must be used together, and sometimes confirma- now commercially available. ddA has been prepared by tory data are necessary. W. M. Barnes (J. Mol. Biol., in press) McCarthy et al. (8). We essentially followed their procedure has recently developed a third method, involving ribo-substi- and used the methods of Tener (9) and of Hoard and Ott (10) tution, which has certain advantages over the plus and minus to convert it to the triphosphate, which was then purified on method, but this has not yet been extensively exploited. DEAE-Sephadex, using a 0.1-1.0 M gradient of triethylamine Another rapid and simple method that depends on specific carbonate at pH 8.4. The preparation of ddGTP and ddCTP chemical degradation of the DNA has recently been described has not been described previously; however we applied the by Maxam and Gilbert (3), and this has also been used exten- same method as that used for ddATP and obtained solutions sively for DNA sequencing. It has the advantage over the plus having the requisite terminating activities. The yields were very and minus method that it can be applied to double-stranded low and this can hardly be regarded as adequate chemical DNA, but it requires a strand separation or equivalent frac- characterization. However, there can be little doubt that the tionation of each restriction enzyme fragment studied, which activity was due to the dideoxy derivatives. makes it somewhat more laborious. The starting material for the ddGTP was N-isobutyryl-5'- This paper describes a further method using DNA poly- O-monomethoxytrityldeoxyguanosine prepared by F. E. merase, which makes use of inhibitors that terminate the newly Baralle (11). After tosylation of the 3'-OH group (12) the synthesized chains at specific residues. compound was converted to the 2',3'-didehydro derivative with Principle of the Method. Atkinson et al. (4) showed that the sodium methoxide (8). The isobutyryl group was partly re- inhibitory activity of 2',3'-dideoxythymidine triphosphate moved during this treatment and removal was completed by (ddTTP) on DNA polymerase I depends on its being incorpo- incubation in NH3 (specific gravity 0.88) overnight at 45°. The rated into the growing oligonucleotide chain in the place of didehydro derivative was reduced to the dideoxy derivative (8) thymidylic acid (dT). Because the ddT contains no 3'-hydroxyl and converted to the triphosphate as for the ddATP. The mo- group, the chain cannot be extended further, so that termination nophosphate was purified by fractionation on a DEAE-Se- occurs specifically at positions where dT should be incorporated. phadex column using a triethylamine carbonate gradient If a primer and template are incubated with DNA polymerase (0.025-0.3 M) but the triphosphate was not purified. in the presence of a mixture of ddTTP and dTTP, as well as the ddCTP was prepared from N-anisoyl-5'-O-monomethoxy- other three deoxyribonucleoside triphosphates (one of which trityldeoxycytidine (Collaborative Research Inc., Waltham, is labeled with 32p), a mixture of fragments all having the same MA) by the above method but the final purification on 5' and with ddT residues at the 3' ends is obtained. When this DEAE-Sephadex was omitted because the yield was very low mixture is fractionated by electrophoresis on denaturing and the solution contained the required activity. The solution acrylamide gels the pattern of bands shows the distribution of was used directly in the experiments described in this paper. dTs in the newly synthesized DNA. By using analogous ter- An attempt was made to prepare the triphosphate of the minators for the other nucleotides in separate incubations and intermediate didehydrodideoxycytidine because Atkinson et running the samples in parallel on the gel, a pattern of bands is obtained from which the sequence can be read off as in the Abbreviations: The symbols C, T, A, and G are used for the deoxyri- other rapid techniques mentioned above. bonucleotides in DNA sequences; the prefix dd is used for the 2',3'- Two types of terminating triphosphates have been used-the dideoxy derivatives (e.g., ddATP is 2',3'-dideoxyadenosine 5'-tri- dideoxy derivatives and the arabinonucleosides. Arabinose is phosphate); the prefix ara is used for the arabinose analogues. 5463 Downloaded by guest on September 30, 2021 5464 Biochemistry:' Sanger et al. Proc. Natl. Acad. Sci. USA 74 (1977) A14 (-) A 12d(- G A T C G A T C :.. S: _ | < .. _ .. C,-t;ri. | 0*.. T aA 4040 I I AxA GAC~ ~ 40320 /t -'~ (-AG-(" QiT t e _ P-t..A'AAAIT 35,10 4Ozc20C1 1-Ta G-A- 2:e. IT AT A 3500 401A0A1(A OFO A UAs4A33)40 44 -1A;AT c At A ( $48( jij~~~C.'G 4000 T(,1 Q ( 1 04-A-A-- - 3999)(;001(oT I-1A 1 A- - ; jA 7T-3- by C 34980C0 39601AT -C A 3ox4810 ~~~~~~~( AC 34 7 T A 39T0 -t it1-to3430 A A A 3460 3950Q( ACs. iAA o A A _ Ad GAT 34150 3940 GrT AC AC 34 300 -CA 3440 3930 T A 4.m. C A A 3420 A AA 3 FIG. 1. Autoradiograph of the acrylamide gel from the sequence determination using restriction fragments A12d and A14 as primers on the complementary strand of IX174 DNA. The inhibitors used were (left to right) ddGTP, ddATP,~ACTGAddTTP, and araCTP. Electrophoresis was on a 12%6 acrylamide gel at 40 mA for 14 hr. The top 10 cm of the gel is not shown. The DNA sequence is written from left to right and upwards beside the corresponding bands on the radioautograph. The numbering is as given in ref. 2. al. (4) have shown that the didehydrodideoxy-TTP is also active ddG: 0.1 mM dCTP, 0.1 mM dTTP, 0.005 mM dGTP, 0.5 as a terminator. However, we were unsuccessful in this. These mM ddGTP compounds seem much less stable than the dideoxy deriva- ddC: 0.1 mM dGTP, 0.1 mM dTTP, 0.005 mM dCTP, tives. approximately 0.25 mM ddCTP araATP and araCTP were obtained from P-L Biochemicals (The concentration of the ddCTP was uncertain because there Inc., Milwaukee, WI. was insufficient yield to determine it, but the required dilution Sequencing Procedure. Restriction enzyme fragments were of the solution was determined experimentally.) obtained from OX174 replicative form and separated by elec- araC: 0.1 mM dGTP, 0.1 mM dTTP, 0.005 mM dCTP, 12.5 trophoresis on acrylamide gels. The material obtained from 5 mM araCTP ,ug of OX174 replicative form in 5 ,Al of H20 was mixed with Incubation was at room temperature for 15 min. Then 1 Ml 1 Al of viral or complementary strand 4X174 DNA (0.6 jg) and of 0.5 mM dATP was added and incubation was continued for 1 Mi of H X 10 buffer (13) and sealed in a capillary tube, heated a further 15 min. If this step (chase) was omitted some termi- to 1000 for 3 min, and then incubated at 670 for 30 min. The nation at A residues occurred in all samples due to the low solution was diluted to 20 Al with H buffer and 2 Al samples concentration of the [a-32PldATP. With small primers, where were taken for each incubation and mixed with 2 Ml of the ap- it was unnecessary to carry out a subsequent splitting (as in the propriate "mix" and 1 il of DNA polymerase (according to experiment shown in Fig.
Recommended publications
  • DNA Sequencing and Sorting: Identifying Genetic Variations
    BioMath DNA Sequencing and Sorting: Identifying Genetic Variations Student Edition Funded by the National Science Foundation, Proposal No. ESI-06-28091 This material was prepared with the support of the National Science Foundation. However, any opinions, findings, conclusions, and/or recommendations herein are those of the authors and do not necessarily reflect the views of the NSF. At the time of publishing, all included URLs were checked and active. We make every effort to make sure all links stay active, but we cannot make any guaranties that they will remain so. If you find a URL that is inactive, please inform us at [email protected]. DIMACS Published by COMAP, Inc. in conjunction with DIMACS, Rutgers University. ©2015 COMAP, Inc. Printed in the U.S.A. COMAP, Inc. 175 Middlesex Turnpike, Suite 3B Bedford, MA 01730 www.comap.com ISBN: 1 933223 71 5 Front Cover Photograph: EPA GULF BREEZE LABORATORY, PATHO-BIOLOGY LAB. LINDA SHARP ASSISTANT This work is in the public domain in the United States because it is a work prepared by an officer or employee of the United States Government as part of that person’s official duties. DNA Sequencing and Sorting: Identifying Genetic Variations Overview Each of the cells in your body contains a copy of your genetic inheritance, your DNA which has been passed down to you, one half from your biological mother and one half from your biological father. This DNA determines physical features, like eye color and hair color, and can determine susceptibility to medical conditions like hypertension, heart disease, diabetes, and cancer.
    [Show full text]
  • The Bio Revolution: Innovations Transforming and Our Societies, Economies, Lives
    The Bio Revolution: Innovations transforming economies, societies, and our lives economies, societies, our and transforming Innovations Revolution: Bio The Executive summary The Bio Revolution Innovations transforming economies, societies, and our lives May 2020 McKinsey Global Institute Since its founding in 1990, the McKinsey Global Institute (MGI) has sought to develop a deeper understanding of the evolving global economy. As the business and economics research arm of McKinsey & Company, MGI aims to help leaders in the commercial, public, and social sectors understand trends and forces shaping the global economy. MGI research combines the disciplines of economics and management, employing the analytical tools of economics with the insights of business leaders. Our “micro-to-macro” methodology examines microeconomic industry trends to better understand the broad macroeconomic forces affecting business strategy and public policy. MGI’s in-depth reports have covered more than 20 countries and 30 industries. Current research focuses on six themes: productivity and growth, natural resources, labor markets, the evolution of global financial markets, the economic impact of technology and innovation, and urbanization. Recent reports have assessed the digital economy, the impact of AI and automation on employment, physical climate risk, income inequal ity, the productivity puzzle, the economic benefits of tackling gender inequality, a new era of global competition, Chinese innovation, and digital and financial globalization. MGI is led by three McKinsey & Company senior partners: co-chairs James Manyika and Sven Smit, and director Jonathan Woetzel. Michael Chui, Susan Lund, Anu Madgavkar, Jan Mischke, Sree Ramaswamy, Jaana Remes, Jeongmin Seong, and Tilman Tacke are MGI partners, and Mekala Krishnan is an MGI senior fellow.
    [Show full text]
  • Genomic Sequencing of Infectious Pathogens
    Science, Technology Assessment, MARCH 2021 and Analytics WHY THIS MATTERS Genomic sequencing reveals the genetic code of an infectious disease pathogen, such as SARS-CoV-2, the SCIENCE & TECH SPOTLIGHT: virus that causes COVID-19. Newer, faster, and less costly sequencing can now be used to more quickly track GENOMIC SEQUENCING OF transmission, detect new variants, and develop vaccines and other countermeasures. However, challenges such INFECTIOUS PATHOGENS as high startup costs and privacy concerns remain. /// THE TECHNOLOGY How mature is it? First-generation sequencing is often used to confirm results of NGS. NGS is relatively new to the public health field, but is used What is it? Genomic sequencing technologies decode a pathogen’s to augment surveillance (i.e., data collection and analysis) for SARS- genetic material by identifying the order of chemical “letters” of its DNA CoV-2 in the U.S. and overseas. New technologies are allowing greater (or RNA, its chemical equivalent in some viruses). Each of four letters access to sequencing capabilities by making NGS portable, faster, and represents a chemical unit called a base. The sequence of the bases can more affordable (the cost of one sequence run is now one-millionth of reveal useful information for combatting disease. For example, sequencing what it was two decades ago). Genomic sequencing technologies enable of SARS-CoV-2 is used to track the spread of various strains, known as many different areas of infectious disease study. For example, they enable variants (see fig. 1). genomic epidemiology—the science of using pathogen genomic data to determine the distribution and spread of an infectious disease in a group How does it work? Technologies for sequencing genomes of pathogens of people or animals, and the application of this information to respond to use chemicals to break up the DNA or RNA into small fragments.
    [Show full text]
  • The Genomics Era: the Future of Genetics in Medicine - Glossary
    The Genomics Era: the Future of Genetics in Medicine - Glossary The glossary below provides a list of key terms used throughout the course. You do not need to read them all now; we’ll be linking back to the main glossary step wherever these terms appear, so you may refer back to this list if you are unsure of the terminology being used. Term Definition The process of matching reads back to their original Alignment position in the reference genome. An allele is one of a number of alternative forms of the same gene or genetic locus. We inherit one copy Allele of our genetic code from our mother and one copy of our genetic code from our father. Each copy is known as an allele. Microarray based genomic comparative hybridisation. This is a technique used to detect chromosome imbalances by comparing patient and control DNA and comparing differences between the two sets. It is Array CGH a useful technique for detecting small chromosome deletions and duplications which would not have been detected with more traditional karyotyping techniques. A unit of DNA. There are four bases which form the Base cross links (or rungs) of the DNA double helix: adenine (A), thymine (T), guanine (G) and cytosine (C). Capture see Target enrichment. The process by which a cell becomes specialized in Cell differentiation order to perform a specific function. Centromere The point at which the sister chromatids are joined. #1 FutureLearn A structure located in the nucleus all living cells, comprised of DNA bound around proteins called histones. The normal number of chromosomes in each Chromosome human cell nucleus is 46 and is composed of 22 pairs of autosomes and a pair of sex chromosomes which determine gender: males have an X and a Y chromosome whilst females have two X chromosomes.
    [Show full text]
  • DNA Sequencing) Time Binding Sites and Other Functional Sites Using Sequencing As a Means of Evaluating Chromatin Immunoprecipitation (Chip) Elaine R
    COMMENTARY for re-sequencing genomes. With next- generation-based applications, we rapidly A brief history of will begin to annotate the human genome with the positions of regulatory protein- (DNA sequencing) time binding sites and other functional sites using sequencing as a means of evaluating chromatin immunoprecipitation (ChIP) Elaine R. Mardis experiments. A variant of ChIP will eluci- date how genome-wide patterns of histone Each spring, I co-teach a course to under- binding and DNA methylation relate to graduates that aims to introduce them to gene-expression regulation in various cell the brief history of genome sequencing states, such as differentiation or disease. and then immerses them in its current The short read-lengths of next-generation practices. Looking out at their eager faces, platforms also are ideal for discovering the I try (perhaps in vain) to capture for them sequences of non-coding RNA genes that in a 1 hour lecture what has been my life’s cannot be elucidated by in silico methods. focus for the past 20 years or so — DNA Bioinformatics-based analysis of genome- sequencing. If forced to summarize this wide DNA copy number, DNA-sequence time period succinctly, I would say, “Never variation and RNA-expression data sets, all a dull moment!” This is largely owing to the produced by next-generation sequencers, technological advances that have catapulted will be integrated to stitch together the genome sequencing from a cottage industry pieces of the biological ‘story’ being told to a high-throughput enterprise, akin to by genomic data. These stories will, in a factory.
    [Show full text]
  • Cloning, DNA Amplification.Pdf
    Molecular Biology DNA structure and function Recombinant DNA technology DNA amplification DNA sequencing Arthur Günzl, PhD Dept. of Genetics & Genome Sciences University of Connecticut Health Center Bioinformatics and Computational Biology Course (BME5800) September 13, 2016 Deoxyribonucleic Acid (DNA) 5’ 3’ 2’ The haploid human genome consists of ~3,000,000,000 bp The combined a-helical length is ~2 m A human nucleus’ diameter is ~10 mm, containing 46 chromosomes From Alberts et al., Molecular Biology of the Cell Which one is the reverse primer ? 5’-TTGGGAAGCTCCTTGTCA-3’ |||||||||||||||||| 3’-AACCCTTCGAGGAACAGT-5’ a. 5’-ACTGTTCCTCGAAGGGTT-3’ b. 5’-AACCCTTCGAGGAACAGT-3’ c. 5’-TGACAAGGAGCTTCCCAA-3’ d. 5’-GGTCCTGGAGAAAAGTCT-3’ e. None of the above Recombinant DNA Technology Recombination Process in which DNA molecules are broken and the fragments are rejoined in new combinations Recombinant DNA Any DNA molecule formed by joining DNA segments from different sources Recombinant DNA Technology Key Enzymes Restriction enzyme One of a large number of endonucleases that can cleave a DNA molecule at any site where a specific short nucleo- tide sequence occurs 5’-...GTACCTAGAATTCTT CTAG...-3’ ||||||||||||||||| 3’-...CATGGATCTTAAGTT GATC...-5’ Eco RI (sticky end) 5’-...GTACCTAG AATTCCTAGTG...-3’ |||||||| ||||||| 3’-...CATGGATCTTAA GGATCAC...-5’ SmaI CCC GGG Serratia marcescens Recombinant DNA Technology Key Enzymes DNA ligase Enzyme that joins the ends of two DNA strands together with a covalent bond to make a continuous DNA strand 5’-...GTACCTACCC-3’ 5’-pGGGCTAGTG...-3’ |||||||||| ||||||||| 3’-...CATGGATGGGp-5’ 3’-CCCGATCAC...-5’ T4 DNA ligase + ATP 5’-...GTACCTACCCGGGCTAGTG...-3’ ||||||||||||||||||| 3’-...CATGGATGGGCCCGATCAC...-5’ Recombinant DNA Technology plasmid cloning vector Multiple Cloning Site Gene cloning gene plasmid donor DNA vector RESTRICTION DIGEST DNA LIGASE (ATP) recombinant plasmid Transformation of E.
    [Show full text]
  • Guide to Interpreting Genomic Reports: a Genomics Toolkit
    Guide to Interpreting Genomic Reports: A Genomics Toolkit A guide to genomic test results for non-genetics providers Created by the Practitioner Education Working Group of the Clinical Sequencing Exploratory Research (CSER) Consortium Genomic Report Toolkit Authors Kelly East, MS, CGC, Wendy Chung MD, PhD, Kate Foreman, MS, CGC, Mari Gilmore, MS, CGC, Michele Gornick, PhD, Lucia Hindorff, PhD, Tia Kauffman, MPH, Donna Messersmith , PhD, Cindy Prows, MSN, APRN, CNS, Elena Stoffel, MD, Joon-Ho Yu, MPh, PhD and Sharon Plon, MD, PhD About this resource This resource was created by a team of genomic testing experts. It is designed to help non-geneticist healthcare providers to understand genomic medicine and genome sequencing. The CSER Consortium1 is an NIH-funded group exploring genomic testing in clinical settings. Acknowledgements This work was conducted as part of the Clinical Sequencing Exploratory Research (CSER) Consortium, grants U01 HG006485, U01 HG006485, U01 HG006546, U01 HG006492, UM1 HG007301, UM1 HG007292, UM1 HG006508, U01 HG006487, U01 HG006507, R01 HG006618, and U01 HG007307. Special thanks to Alexandria Wyatt and Hugo O’Campo for graphic design and layout, Jill Pope for technical editing, and the entire CSER Practitioner Education Working Group for their time, energy, and support in developing this resource. Contents 1 Introduction and Overview ................................................................ 3 2 Diagnostic Results Related to Patient Symptoms: Pathogenic and Likely Pathogenic Variants . 8 3 Uncertain Results
    [Show full text]
  • Glossary of Terms
    GLOSSARY OF TERMS Table of Contents A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z A Amino acids: any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein and hence protein function are determined by the genetic code. From http://www.geneticalliance.org.uk/glossary.htm#C • The building blocks of proteins, there are 20 different amino acids. From https://www.yourgenome.org/glossary/amino-acid Antisense: Antisense nucleotides are strings of RNA or DNA that are complementary to "sense" strands of nucleotides. They bind to and inactivate these sense strands. They have been used in research, and may become useful for therapy of certain diseases (See Gene silencing). From http://www.encyclopedia.com/topic/Antisense_DNA.aspx. Antisense and RNA interference are referred as gene knockdown technologies: the transcription of the gene is unaffected; however, gene expression, i.e. protein synthesis (translation), is lost because messenger RNA molecules become unstable or inaccessible. Furthermore, RNA interference is based on naturally occurring phenomenon known as Post-Transcriptional Gene Silencing. From http://www.ncbi.nlm.nih.gov/probe/docs/applsilencing/ B Biobank: A biobank is a large, organised collection of samples, usually human, used for research. Biobanks catalogue and store samples using genetic, clinical, and other characteristics such as age, gender, blood type, and ethnicity. Some samples are also categorised according to environmental factors, such as whether the donor had been exposed to some substance that can affect health.
    [Show full text]
  • Profiling DNA Methylation Based on Next-Generation Sequencing Approaches
    G C A T T A C G G C A T genes Review Profiling DNA Methylation Based on Next-Generation Sequencing Approaches: New Insights and Clinical Applications Daniela Barros-Silva 1 ID , C. Joana Marques 2,3 ID , Rui Henrique 1,4,5 ID and Carmen Jerónimo 1,5,* ID 1 Cancer Biology and Epigenetics Group, IPO Porto Research Center (CI-IPOP), Portuguese Oncology Institute of Porto (IPO Porto), Rua António Bernardino Almeida, 4200-072 Porto, Portugal; [email protected] (D.B.-S.); [email protected] (R.H.) 2 Genetics, Department of Pathology, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal; [email protected] 3 I3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal 4 Department of Pathology, Portuguese Oncology Institute of Porto (IPO Porto), 4200-072 Porto, Portugal 5 Department of Pathology and Molecular Immunology, Institute of Biomedical Sciences Abel Salazar (ICBAS)—University of Porto, 4050-313 Porto, Portugal * Correspondence: [email protected]; Tel.: +351-225084000; Fax: +351-225084199 Received: 23 May 2018; Accepted: 20 August 2018; Published: 23 August 2018 Abstract: DNA methylation is an epigenetic modification that plays a pivotal role in regulating gene expression and, consequently, influences a wide variety of biological processes and diseases. The advances in next-generation sequencing technologies allow for genome-wide profiling of methyl marks both at a single-nucleotide and at a single-cell resolution. These profiling approaches vary in many aspects, such as DNA input, resolution, coverage, and bioinformatics analysis. Thus, the selection of the most feasible method according with the project’s purpose requires in-depth knowledge of those techniques.
    [Show full text]
  • DNA Methylation Data by Sequencing: Experimental Approaches and Recommendations for Tools and Pipelines for Data Analysis
    Rauluseviciute et al. Clinical Epigenetics (2019) 11:193 https://doi.org/10.1186/s13148-019-0795-x REVIEW Open Access DNA methylation data by sequencing: experimental approaches and recommendations for tools and pipelines for data analysis Ieva Rauluseviciute1* , Finn Drabløs1 and Morten Beck Rye1,2 Abstract Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and development of diseases, such as cancer. Increasing numbers of DNA methylation sequencing datasets from human genome are produced using various platforms—from methylated DNA precipitation to the whole genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of outcoming data is still a challenge among the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable to complete all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions.
    [Show full text]
  • Generations of Sequencing Technologies: from First to Next
    nd M y a ed g ic lo i o n i e B Kchouk et al., Biol Med (Aligarh) 2017, 9:3 DOI: 10.4172/0974-8369.1000395 ISSN: 0974-8369 Biology and Medicine Review Article Open Access Generations of Sequencing Technologies: From First to Next Generation Mehdi Kchouk1,3*, Jean-François Gibrat2 and Mourad Elloumi3 1Faculty of Sciences of Tunis (FST), Tunis El-Manar University, Tunisia 2Research Unit Applied Mathematics and Computer Science, from Genomes to the Environment' (MaIAGE), Jouy en Josas, France 3Laboratory of Technologies of Information and Communication and Electrical Engineering, National Superior School of Engineers of Tunis, University of Tunis, Tunisia *Corresponding author: Mehdi Kchouk, Faculty of Sciences of Tunis (FST), Tunis El-Manar University, Tunisia, Tel: +216 71 872 253; E-mail: [email protected] Received date: January 31, 2017; Accepted date: February 27, 2017; Published date: March 06, 2017 Copyright: © 2017 Kchouk M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited. Abstract DNA sequencing process utilizes biochemical methods in order to determine the correct order of nucleotide bases in a DNA macromolecule using sequencing machines. Ten years ago, sequencing was based on a single type of sequencing that is Sanger sequencing. In 2005, Next Generation Sequencing Technologies emerged and changed the view of the analysis and understanding of living beings. Over the last decade, considerable progress has been made on new sequencing machines. In this paper, we present a non-exhaustive overview of the sequencing technologies by beginning with the first methods history used by the commonly used NGS platforms until today.
    [Show full text]
  • To Get the Whole Picture, Sequence the Whole Genome
    APPLICATION NOTE APPLICATION NOTE Switch from Exome to To Get the Whole Picture, Whole Genome Studies Sequence the Whole Genome • Affordable: Low cost per Comparing whole genome sequencing and exome sequencing genome for large studies • High-quality: Better quality Introduction exomes than by selection Next-generation DNA sequencing empowers scientists to identify genetic variations as- methods sociated with human disease at higher resolution and greater sensitivity than previously possible. Two approaches are commonly employed -- exome sequencing and whole • Comprehensive: Non-coding genome sequencing. Exome sequencing targets protein-coding regions comprising and coding variants includ- ing SV and CNV approximately 1% of the human genome, while whole genome sequencing uses an unbiased approach to investigate the majority of the human genome, including a com- • Easy-to-use: Tables of prehensive survey of coding and non-coding regions. research-ready annotated coding and non-coding vari- Researchers perform exome sequencing for two main reasons. First, targeted se- ants quencing may cost less than whole genome sequencing. Secondly, at the moment, protein-coding variants are more easily interpreted than non-coding variants. However, • Fast: Large studies com- prising hundreds of whole by targeting only a specified list of protein coding sequences, DNA variations outside genomes can be done in a of these regions are missed. Moreover, exome capture by hybridization can introduce few months considerable coverage variability which may impact comparative analyses and there- fore limit discovery efforts. In addition, some types of events such as copy number and • Informative: Need fewer structural variations (CNVs and SVs), as well as some insertions, deletions and block samples per study: Whole genomes contain more substitutions, may be difficult to detect in exome capture data.
    [Show full text]