Published online 11 November 2013
Nucleic Acids Research, 2014, Vol. 42, Database issue D865–D872
doi:10.1093/nar/gkt1059

Current status and new features of the Consensus Coding Sequence database

Catherine M. Farrell1, Nuala A. O’Leary1, Rachel A. Harte2, Jane E. Loveland3, Laurens G. Wilming3, Craig Wallin1, Mark Diekhans2, Daniel Barrell3, Stephen M. J. Searle3, Bronwen Aken3, Susan M. Hiatt1, Adam Frankish3, Marie-Marthe Suner3, Bhanu Rajput1, Charles A. Steward3, Garth R. Brown1, Ruth Bennett3, Michael Murphy1, Wendy Wu1, Mike P. Kay3, Jennifer Hart1, Jeena Rajan3, Janet Weber1, Catherine Snow3, Lillian D. Riddick1, Toby Hunt3, David Webb1, Mark Thomas3, Pamela Tamez1, Sanjida H. Rangwala1, Kelly M. McGarvey1, Shashikant Pujar1, Andrei Shkeda1, Jonathan M. Mudge3, Jose M. Gonzalez3, James G. R. Gilbert3, Stephen J. Trevanion3, Robert Baertsch2, Jennifer L. Harrow3, Tim Hubbard3, James M. Ostell1, David Haussler2,4 and Kim D. Pruitt1,*

1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
2 Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
4 Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA

Received September 12, 2013; Revised September 30, 2013; Accepted October 12, 2013

*To whom correspondence should be addressed. Tel: +1 301 435 5898; Fax: +1 301 480 2918; Email: [email protected]

Published by Oxford University Press 2013. This work is written by US Government employees and is in the public domain in the US.

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

INTRODUCTION

Biological and biomedical research has come to rely on accurate and consistent annotation of genes and their products on genome assemblies. High-quality genome assemblies, such as the human and mouse reference genome assemblies that are maintained by the Genome Reference Consortium (GRC) (1), are particularly amenable to high-definition gene annotation. Reference annotation of these genomes is available from various sources, including the National Center for Biotechnology Information (NCBI) (2), Ensembl (3), the Vertebrate Genome Annotation database (Vega) (4) and GENCODE (www.gencodegenes.org). Each annotation group has independent goals and policies, which result in some annotation variation. Nevertheless, much of the annotation provided by these individual groups is identical, thus providing users with a higher degree of confidence in the accuracy of these annotations.
The Consensus Coding Sequence (CCDS) project (5) has been established to identify a gold standard set of protein-coding gene annotations that are identically annotated on the human and mouse reference genome assemblies by the participating annotation groups. The participating members are from NCBI, the European Bioinformatics Institute (EBI), the Wellcome Trust Sanger Institute (WTSI) and the University of California Santa Cruz (UCSC). Manual annotation is provided by the Reference Sequence (RefSeq) group at NCBI (2) and by the Human and Vertebrate Analysis and Annotation (HAVANA) group at WTSI (4). A combination of manual and automated genome annotations provided by NCBI and Ensembl (which incorporates manual HAVANA annotations) are compared to identify annotations with matching genomic coordinates; NCBI and Ensembl-coding region annotations must be identical at the CDS start and stop codons and at every splice site to be considered for the CCDS database. Each matching annotation is assigned a unique identifier known as a CCDS ID, which is tracked and reported in the database. Matching CCDS sequences and related metadata are available to users via a public FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/) or a web-based interface (www.ncbi.nlm.nih.gov/CCDS/) that includes individual report pages for each CCDS ID. Information on how to access CCDS data was described previously (5), including details on CCDS report page features and how to access CCDS data tracks on various genome browsers. In addition, the README file (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/README) on the FTP site provides full descriptions of the various download files that are available.

CCDS matching annotations indicate concordance between different annotation groups with diverse policies, and they thus represent high-quality annotations that can be used as standards. The integrity of the CCDS dataset is maintained through stringent quality assurance (QA) testing and on-going manual curation (6). QA testing includes checks for possible conflicts within the coding sequence and its annotated structure, protein-coding potential, possible problems with the genome sequence, and assessing the quality of matched NCBI and Ensembl annotation. Curatorial updates to existing CCDS IDs require unanimous agreement by all collaborators. A process flow has been established to facilitate review of CCDS IDs that have been flagged by any member of the collaboration for update or withdrawal, with the voting members for curation updates being RefSeq, HAVANA and UCSC. The CCDS dataset is an integral part of the GENCODE gene annotation project (7) and it is used as a standard for high-quality coding exon definition in various research fields, including clinical studies (8,9), large-scale epigenomic studies (10), exome projects (11) and exon array design (12). Due to the consensus annotation of CCDS exons by the independent annotation groups, exome projects in particular have regarded CCDS coding exons as reliable targets for downstream studies

GROWTH OF THE CCDS DATASET

The CCDS datasets for human and mouse are periodically reanalyzed and publicly distributed as CCDS releases (Table 1). Historically, CCDS releases occurred following coordinated whole-genome annotation runs by both NCBI and Ensembl. This policy has been changed to allow CCDS updates on a more regular basis following annotation updates by either NCBI or Ensembl. Human updates are expected to occur roughly every 6 months and mouse releases will be yearly. Additional CCDS updates will occur for both species following assembly updates at which time we still require both groups to have generated updated annotation on the new assembly. Reducing the time interval between releases allows for the CCDS dataset to represent more updated annotations. However, because genomic annotation is updated by NCBI and Ensembl at different times, CCDS content may not yet reflect the most recent manual annotation curation efforts. Figure 1A indicates that the number of CCDS IDs for both human and mouse continues to increase with each new CCDS analysis. Since 2011, the human and mouse CCDS dataset sizes have increased by 1279 and 906 CCDS IDs, respectively.

Most of the recent growth in the CCDS dataset comes from an increase in the number of genes with more than one splice variant obtaining a CCDS ID. Since 2011, the number of human genes in the CCDS database increased by 135, whereas the number of human genes having more than one splice variant with a CCDS ID increased by 479. A similar pattern was observed in mouse (Figure 1B), bringing the total number of genes with more than one CCDS ID to 7752, of which over 100 have more than six CCDS IDs. Therefore, the CCDS dataset is increasingly representing more alternative splicing events with each new release. The CCDS dataset is expected to continue to grow in the next few years as a result of targeted curation initiatives (see below), which will allow for an increase in the representation of protein-coding genes and more protein isoforms per gene. Also, as both curation groups begin to integrate RNAseq data available from the Illumina Human Body Map 2.0 project (NCBI GEO accession GSE30611), ENCODE (13) and other sources such as GTEx data (14) into their pipelines, the number of alternatively spliced transcripts will increase.

NEW FEATURES

The CCDS database has incorporated several new features to improve user browsing, to add useful biological information that was either previously internal or not categorized, and to increase gene representation in the dataset.

Updated reporting of CCDS and annotation release information

The CCDS web resource and FTP site now specifies which
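
The matching rule stated in the Introduction (coding regions must be identical at the CDS start and stop codons and at every splice site) can be illustrated with a short sketch. This is not the CCDS pipeline itself. The record type `CdsAnnotation`, the helper names and the example coordinates are all hypothetical, and the sketch assumes each annotation has already been reduced to its coding segments on a single chromosome and strand. A match here only makes an annotation a candidate; the QA tests described in the text are a separate step.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CdsAnnotation:
    """Hypothetical minimal record for one annotated coding region."""
    chrom: str
    strand: str
    # Coding segments as (start, end) genomic coordinates; UTRs are excluded.
    segments: Tuple[Tuple[int, int], ...]


def cds_signature(ann: CdsAnnotation):
    """Reduce an annotation to the features the matching rule compares."""
    if not ann.segments:
        raise ValueError("annotation has no coding segments")
    segs = sorted(ann.segments)
    cds_start = segs[0][0]   # first coding base in genomic order
    cds_end = segs[-1][1]    # last coding base in genomic order
    # Each intron is bounded by two splice sites: the end of one coding
    # segment and the start of the next.
    splice_sites = tuple(
        (segs[i][1], segs[i + 1][0]) for i in range(len(segs) - 1)
    )
    return (ann.chrom, ann.strand, cds_start, cds_end, splice_sites)


def is_ccds_candidate(a: CdsAnnotation, b: CdsAnnotation) -> bool:
    """True if both annotations agree at CDS start, CDS end and every splice site."""
    return cds_signature(a) == cds_signature(b)


# Made-up coordinates for illustration only.
ncbi = CdsAnnotation("chr1", "+", ((100, 200), (300, 400), (500, 650)))
ensembl = CdsAnnotation("chr1", "+", ((500, 650), (100, 200), (300, 400)))
shifted = CdsAnnotation("chr1", "+", ((100, 200), (303, 400), (500, 650)))

print(is_ccds_candidate(ncbi, ensembl))  # same coding structure
print(is_ccds_candidate(ncbi, shifted))  # one splice site differs by 3 bases
```

Because only coding segments enter the signature, two transcripts that differ in their untranslated regions would still compare as equal in this sketch, which is consistent with the text's focus on coding region annotation.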