An Overview of Synergistic Data Tools for Biological Scrutiny

Total Page:16

File Type:pdf, Size:1020Kb

An Overview of Synergistic Data Tools for Biological Scrutiny Review DOI: 10.1002/ijch.201200094 An Overview of Synergistic Data Tools for Biological Scrutiny Tsviya Olender,*[a] Marilyn Safran,[a] Ron Edgar,[b] Gil Stelzer,[a] Noam Nativ,[a] Naomi Rosen,[a] Ronit Shtrichman,[b] Yaron Mazor,[b] Michael D. West,[b] Ifat Keydar,[a] Noa Rappaport,[a] Frida Belinky,[a] David Warshawsky,[b] and Doron Lancet[a] Abstract: A network of biological databases is reviewed, the realm of disease scrutiny, we portray MalaCards, a novel supplying a framework for studies of human genes and the comprehensive database of human diseases and their anno- association of their genomic variations with human pheno- tations. Also shown is GeneKid, a tool aimed at generating types. The network is composed of GeneCards, the human novel kidney disease biomarkers using systems biology, as gene compendium, which provides comprehensive informa- well as Xome, a database for whole-exome next-generation tion on all known and predicted human genes, along with DNA sequences for human diseases in the Israeli popula- its suite members GeneDecks and GeneLoc. Two databases tion. Finally, we show LifeMap Discovery, a database of em- are shown that address genes and variations focusing on ol- bryonic development, stem cell research and regenerative factory reception (HORDE) and transduction (GOSdb). In medicine, which links to both GeneCards and MalaCards. Keywords: bioinformatics · databases · gene sequencing · genetic disease · olfaction 1 Introduction 2 GeneCards A main focus of our research is the use of genomics and The individual scientist, seeking research knowledge bioinformatics tools to obtain novel biological insights. about a gene of interest, can be overwhelmed by the These approaches involve the application of computation- deluge of data from worldwide genome projects. The la- al strategies to biological information management and borious task of sifting through hundreds of thousands of analysis. We develop data compendia that assist in obtain- records can be reduced by the use of integrated flexible ing a better understanding of how genes and their varia- and user-friendly databases. For over 15 years, we have tions underlie different human phenotypes. We focus on developed and expanded GeneCards (www.genecards several parallel threads, including human olfaction, Men- .org), a comprehensive, authoritative compendium of an- delian and complex diseases, embryonic development and notative information about human genes, accessed yearly stem cell research. In the spirit of systems biology, we by more than two million unique users. Its gene-centric help generate genome-wide views to facilitate gene-based content is automatically mined and integrated from over inquiry. One of our long-time fortes is a gene-centric view 100 electronic sources, including HGNC, NCBI, Ensembl, of the human genome universe, as embodied in Gene- UniProt, UCSC, REACTOME, and BioGPS, resulting in Cards. More recently, we generated in parallel the human web-based cards with sections encompassing a variety of disease compendium MalaCards. Both are richly annotat- topics (e.g., genomic location, gene function, transcrip- ed from numerous web resources and provide focused in- tion, disorder, literature, and pathways) for each of more formation to external data structures (Figure 1). A special than 122,000 human gene entries.[1] GeneCards also fea- liaison is between the Weizmann data structure and the LifeMap Discovery database, which provides a unique [a] T. Olender, M. Safran, G. Stelzer, N. Nativ, N. Rosen, I. Keydar, angle on embryonic development and stem cell research. N. Rappaport, F. Belinky, D. Lancet Department of Molecular Genetics Such a network of databases helps our own teams as well Weizmann Institute of Science as numerous external users worldwide carry out genome Rehovot 76100 (Israel) and bioinformatics research, creating a live digital “eco- e-mail: [email protected] system” that stores and expands knowledge. This paper [b] R. Edgar, R. Shtrichman, Y. Mazor, M. D. West, D. Warshawsky presents an overview of each of these databases, and ex- LifeMap Sciences amples of interconnections and synergistic benefits. HaNehoshet 10, Tel Aviv 69710 (Israel) Isr. J. Chem. 2013, 53, 185 – 198 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 185 Review T. Olender et al. tures comprehensive searches of all mined data, allowing fields and receive tabulated output. Recent research fo- one to generate data relations that would otherwise be cuses include revamped gene expression visualization, in- concealed. Also shown are links to gene-related research cluding RNA-seq data from the Illumina BodyMap proj- reagents such as antibodies, recombinant proteins, DNA ect, and an expanded 76-tissue repertoire from BioGPS[2] clones and inhibitory RNAs, was well as human embryon- and CGAP Serial Analysis of Gene Expression (SAGE)[3] ic stem and progenitor cell lines. The GeneCards suite data. member GeneALaCart provides batch query support, whereby users submit a gene list (e.g., from a microarray 2.1 ncRNA Genes experiment) along with desired GeneCards annotation Non-coding RNA (ncRNA) genes in the human genome have been increasingly studied in the past few years, and Dr. Tsviya Olender has a PhD in physi- new ncRNA classes have been discovered.[4] However, cal chemistry from the Weizmann Insti- tute. In 1999, she joined the group of until recently, no comprehensive non-redundant database Prof. Lancet as a postdoc and contin- for all ncRNA human genes was available. We have re- ued on as a research associate. Her re- cently completed a major unification effort of ncRNA search focuses on the genetics of gene entries within GeneCards, integrated from 15 differ- human olfaction and the genetics of ent data sources by mining four secondary sources[5] and rare diseases using computational several additional primary sources. An integration algo- tools. Together with Prof. Lancet, she rithm based on mapping to genomic coordinates em- has played a key role in discovering ol- ployed, among others, another GeneCards suite member, factory receptor genes from humans GeneLoc[6] (Section 2.3). This was used to cluster overlap- and other mammals, and developed ping entries at unified locations, thereby merging parallel the human OR data explorer (HORDE) versions of the same gene, as well as precursors with their database. She is also involved in developing tools for deciphering [1a] disease-causing mutations, including the Xome database described mature transcripts. A total of ~64,000 new human here. ncRNA genes were added to GeneCards, resulting in a total of almost 80,000 entries belonging to 14 RNA Marilyn Safran is the head of develop- classes, including 22,000 piwi-interacting RNAs (piRNAs) ment for the multi-disciplinary Gene- and 17,000 long non-coding RNAs (LncRNAs). Gene- Cards/MalaCards team in the Lancet Cards V3.09 now contains ~122,500 gene entries, with lab at the Weizmann Institute of Sci- a greater than fivefold enhancement of the ncRNA ence in Israel. Prior to joining in 1997, count, compared to the pre-unification version 3.07 (Fig- she did software development and ure 2A). Among the annotations provided is a quality management at Ubique Inc., the CS score, reflecting the degree of confidence that the entry is department at Weizmann, Bell Labora- indeed a bona fide gene, based on functional annotations tories, MIT’s Lincoln Laboratory, and as well as expression. Current work in progress includes the Memorial Sloan Kettering Cancer Center, in the fields of bioinformatics, a probabilistic categorization, indicating source-related web applications, compilers, databas- ambiguities in gene classification (Figure 2B). es, and medicine. Marilyn received a BA in Math with Honors from Queens College of CUNY in 1974, 2.2 GeneDecks and an MS in Computer Science from Boston University in 1978. Enrichment analysis of high-throughput data is a key Prof. Lancet, head of the Weizmann In- method used to glean new biological insights from experi- stitute’s Crown Genome Center, has mental results.[7] GeneDecks, a GeneCards suite member a PhD from Weizmann and postdoctor- (www.genecards.org/GeneDecks), is an annotation-based al training from Harvard and Yale. He enrichment analysis tool for human genes and gene sets. pioneered research on the biochem- It constitutes a systems biology facilitator, leveraging istry, genetics and evolution of olfac- GeneCards unique wealth of combinatorial annotations. tion, and investigates human rare dis- ease genes. Lancet and his team devel- In its Set Distiller mode, GeneDecks exposes commonali- oped GeneCards, a world-renowned ties within a list of genes, shedding functional light upon web compendium of human genes, the constituent individual genes (Figure 3A). Descriptors and more recently initiated a compan- that are enriched in the query gene set are sorted first by ion database MalaCards, a comprehen- their statistical significance and then by their shared gene sive web tool for human diseases. count. GeneDecks strength lies in the broad range of at- Lancet was awarded the Takasago and Wright Awards in the USA tributes available for use in its analyses, coming from and the Landau Prize in Israel. He is a member of EMBO and (until eight categories, such as disorders, compounds, pathways, 2012) of the HUGO Council. Gene Ontology (GO) terms and expression. It produces 186 www.ijc.wiley-vch.de 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Isr. J. Chem. 2013, 53, 185 – 198 Synergistic Data Tools for Biological Scrutiny Figure 1. Data integration and information flow. One hundred varied external sources provide detailed information to the GeneCards inte- grated database of human genes, along with its suite members, GeneLoc exon-based genomic map and GeneDecks sets analysis. Gene- Cards has recently greatly enhanced its ncRNA gene coverage. GeneCards empowers specific databases as shown, covering tools for olfac- tory genes as well as disease-specific genes and biomarkers. Two novel medically focused databases, MalaCards and LifeMap Discovery, to- gether with GeneCards, form a powerful trio of mutually enriching bioinformatics tools.
Recommended publications
  • Visual Exploration Across Biomedical Databases
    1 Visual Exploration Across Biomedical Databases Michael D. Lieberman∗†‡ Sima Taheri∗ Huimin Guo∗ Fatemeh Mir-Rashed∗ [email protected] [email protected] [email protected] [email protected] Inbal Yahav§ Aleks Aris∗ Ben Shneiderman∗† [email protected] [email protected] [email protected] Abstract—Though biomedical research often draws on knowl- mentioned in d. d may also have associations with other edge from a wide variety of fields, few visualization methods PubMed documents that cite d as a reference, as well as for biomedical data incorporate meaningful cross-database ex- associations to the PubMed documents that d itself cites. ploration. A new approach is offered for visualizing and explor- ∈ ing a query-based subset of multiple heterogeneous biomedical Furthermore, each gene g G could have associations with databases. Databases are modeled as an entity-relation graph the proteins for which g codes, or the DNA sequences in containing nodes (database records) and links (relationships which g’s code appears. Usually, the various types of records between records). Users specify a keyword search string to in these databases also have many attributes associated with retrieve an initial set of nodes, and then explore intra- and inter- them. For example, PubMed documents might be annotated database links. Results are visualized with user-defined semantic substrates to take advantage of the rich set of attributes usually with the date of publication, authors, and general topics, while present in biomedical data. Comments from domain experts gene records could be annotated with the relevant species, indicate that this visualization method is potentially advantageous location on chromosome, or function.
    [Show full text]
  • A Novel Transmembrane Glycoprotein Cancer Biomarker Present in the X Chromosome ANA PAULA DELGADO, SHEILIN HAMID, PAMELA BRANDAO and RAMASWAMY NARAYANAN
    CANCER GENOMICS & PROTEOMICS 11 : 81-92 (2014) A Novel Transmembrane Glycoprotein Cancer Biomarker Present in the X Chromosome ANA PAULA DELGADO, SHEILIN HAMID, PAMELA BRANDAO and RAMASWAMY NARAYANAN Department of Biological Sciences, Charles E. Schmidt College of Science, Florida Atlantic University, Boca Raton, FL, U.S.A. Abstract. Background: The uncharacterized proteins of the genome are of unknown nature (10). These proteins and the human proteome offer an untapped potential for cancer non-coding RNAs are called the ‘dark matter’ of the genome biomarker discovery. Numerous predicted open reading (11-13). Whereas in the past most gene discovery has frames (ORFs) are present in diverse chromosomes. The revolved around known genes due to the ease of follow-up mRNA and protein expression data, as well as the mutational studies (14-17), the dark matter of the genome offers an and variant information for these ORF proteins are available untapped potential (18). Realizing the importance of this area, in the cancer-related bioinformatics databases. Materials the US National Cancer Institute has recently announced a and Methods: ORF proteins were mined using bioinformatics major initiative called illuminating the dark matter for and proteomic tools to predict motifs and domains, and druggable targets (http://commonfund.nih.gov/idg/index). cancer relevance was established using cancer genome, Establishing cancer relevance for novel or uncharacterized transcriptome and proteome analysis tools. Results: A novel proteins is a crucial first step in lead discovery. The cancer testis-restricted ORF protein present in chromosome X called genomes of patients from around the world can be readily CXorf66 was detected in the serum, plasma and neutrophils.
    [Show full text]
  • BIOINFORMATICS ORIGINAL PAPER Doi:10.1093/Bioinformatics/Btu524
    CORE Metadata, citation and similar papers at core.ac.uk Provided by Open Repository and Bibliography - LuxembourgBioinformatics Advance Access published August 26, 2014 2014, pages 1–8 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btu524 Data and text mining Advance Access publication August 6, 2014 BioTextQuest+: a knowledge integration platform for literature mining and concept discovery Nikolas Papanikolaou1,y, Georgios A. Pavlopoulos1,y, Evangelos Pafilis2, Theodosios Theodosiou1,ReinhardSchneider3, Venkata P. Satagopam3, Christos A. Ouzounis4,5, Aristides G. Eliopoulos1,6, Vasilis J. Promponas7 and Ioannis Iliopoulos1,* 1Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, 2Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, 3Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, 4Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Downloaded from Thessalonica, Greece, 5Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, 6Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and 7Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus http://bioinformatics.oxfordjournals.org/ Associate Editor: Alfonso Valencia ABSTRACT Contact: [email protected] or georgios.pavlopoulos@esat. Summary: The iterative process of finding relevant information in kuleuven.be biomedical literature and performing bioinformatics analyses might Supplementary information: Supplementary data are available at result in an endless loop for an inexperienced user, considering the Bioinformatics online.
    [Show full text]
  • New Insights Into the Immune System Using Dirty Mice Sara E
    New Insights into the Immune System Using Dirty Mice Sara E. Hamilton, Vladimir P. Badovinac, Lalit K. Beura, Mark Pierson, Stephen C. Jameson, David Masopust and This information is current as Thomas S. Griffith of September 27, 2021. J Immunol 2020; 205:3-11; ; doi: 10.4049/jimmunol.2000171 http://www.jimmunol.org/content/205/1/3 Downloaded from References This article cites 100 articles, 17 of which you can access for free at: http://www.jimmunol.org/content/205/1/3.full#ref-list-1 http://www.jimmunol.org/ Why The JI? Submit online. • Rapid Reviews! 30 days* from submission to initial decision • No Triage! Every submission reviewed by practicing scientists • Fast Publication! 4 weeks from acceptance to publication by guest on September 27, 2021 *average Subscription Information about subscribing to The Journal of Immunology is online at: http://jimmunol.org/subscription Permissions Submit copyright permission requests at: http://www.aai.org/About/Publications/JI/copyright.html Email Alerts Receive free email-alerts when new articles cite this article. Sign up at: http://jimmunol.org/alerts The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852 Copyright © 2020 by The American Association of Immunologists, Inc. All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. New Insights into the Immune System Using Dirty Mice ,†,‡,x {,‖,# Sara E. Hamilton,* Vladimirx P. Badovinac,x Lalit K. Beura,** Mark Pierson,*x xx Stephen C. Jameson,*,†,‡, David Masopust,†,‡, ,†† and Thomas S. Griffith†,‡, ,‡‡, The mouse (Mus musculus) is the dominant organism then is how to improve the value of experimental rodent re- used to investigate the mechanisms behind complex search to increase the success rate of translating preclinical immunological responses because of their genetic sim- findings to the clinic.
    [Show full text]
  • Structure and Function of the Human Genome
    Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press Perspective Structure and function of the human genome Peter F.R. Little School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney 2074, New South Wales, Australia The human genome project has had an impact on both biological research and its political organization; this review focuses primarily on the scientific novelty that has emerged from the project but also touches on its political dimensions. The project has generated both anticipated and novel information; in the later category are the description of the unusual distribution of genes, the prevalence of non-protein-coding genes, and the extraordinary evolutionary conservation of some regions of the genome. The applications of the sequence data are just starting to be felt in basic, rather than therapeutic, biomedical research and in the vibrant human origins and variation debates. The political impact of the project is in the unprecedented extent to which directed funding programs have emerged as drivers of basic research and the organization of the multidisciplinary groups that are needed to utilize the human DNA sequence. The past decade in biological research has surely been the decade If the biomedical goals of the HGP are in the future, the of genome research—from the scientific perspective, in the pub- immediate outcomes expected by the scientific community were lic imagination, and even in the minds of international politi- perhaps more pragmatic; a description of the gross structure of cians. It is therefore timely to use this 10th anniversary of Genome the human DNA sequence, the number of genes, and the pro- Research to take stock of where we are and where we might be in teins these might encode.
    [Show full text]
  • Protein Sequence Databases Rolf Apweiler1,, Amos Bairoch2 and Cathy H Wu3
    Protein sequence databases Rolf Apweiler1,Ã, Amos Bairoch2 and Cathy H Wu3 A variety of protein sequence databases exist, ranging from information provided by the genome projects and to a simple sequence repositories, which store data with little or no range of new technologies in protein science. For exam- manual intervention in the creation of the records, to expertly ple, mass spectrometry approaches are being used in curated universal databases that cover all species and in which protein identification and in determining the nature of the original sequence data are enhanced by the manual post-translational modifications [1]. These and other addition of further information in each sequence record. methods make it possible to quickly identify large num- As the focus of researchers moves from the genome to the bers of proteins, to map their interactions, to determine proteins encoded by it, these databases will play an even their location within the cell [2] and to analyse their more important role as central comprehensive resources of biological activities. Protein sequence databases play a protein information. Several the leading protein sequence vital role as a central resource for storing the data gen- databases are discussed here, with special emphasis on the erated by these and more conventional efforts, and mak- databases now provided by the Universal Protein ing them available to the scientific community. Knowledgebase (UniProt) consortium. To exploit the various resources fully, it is essential to Addresses distinguish between them
    [Show full text]
  • Convergent and Distributed Effects of the 3Q29 Deletion on the Human Neural Transcriptome ✉ Esra Sefik1,2,7, Ryan H
    Translational Psychiatry www.nature.com/tp ARTICLE OPEN Convergent and distributed effects of the 3q29 deletion on the human neural transcriptome ✉ Esra Sefik1,2,7, Ryan H. Purcell 3,4,7, The Emory 3q29 Project*, Elaine F. Walker2, Gary J. Bassell3,4 and Jennifer G. Mulle 1,5 © The Author(s) 2021 The 3q29 deletion (3q29Del) confers high risk for schizophrenia and other neurodevelopmental and psychiatric disorders. However, no single gene in this interval is definitively associated with disease, prompting the hypothesis that neuropsychiatric sequelae emerge upon loss of multiple functionally-connected genes. 3q29 genes are unevenly annotated and the impact of 3q29Del on the human neural transcriptome is unknown. To systematically formulate unbiased hypotheses about molecular mechanisms linking 3q29Del to neuropsychiatric illness, we conducted a systems-level network analysis of the non-pathological adult human cortical transcriptome and generated evidence-based predictions that relate 3q29 genes to novel functions and disease associations. The 21 protein-coding genes located in the interval segregated into seven clusters of highly co-expressed genes, demonstrating both convergent and distributed effects of 3q29Del across the interrogated transcriptomic landscape. Pathway analysis of these clusters indicated involvement in nervous-system functions, including synaptic signaling and organization, as well as core cellular functions, including transcriptional regulation, posttranslational modifications, chromatin remodeling, and mitochondrial metabolism. Top network-neighbors of 3q29 genes showed significant overlap with known schizophrenia, autism, and intellectual disability-risk genes, suggesting that 3q29Del biology is relevant to idiopathic disease. Leveraging “guilt by association”, we propose nine 3q29 genes, including one hub gene, as prioritized drivers of neuropsychiatric risk.
    [Show full text]
  • The Mouse Genome
    Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press Perspective The mouse genome Jean Louis Guénet Département de Biologie du Développement, Institut Pasteur, 75724 Paris Cedex 15, France The house mouse has been used as a privileged model organism since the early days of genetics, and the numerous experiments made with this small mammal have regularly contributed to enrich our knowledge of mammalian biology and pathology, ranging from embryonic development to metabolic disease, histocompatibility, immunology, behavior, and cancer. Over the past two decades, a number of large-scale integrated and concerted projects have been undertaken that will probably open a new era in the genetics of the species. The sequencing of the genome, which will allow researchers to make comparisons with other mammals and identify regions conserved by evolution, is probably the most important project, but many other initiatives, such as the massive production of point or chromosomal mutations associated with comprehensive and standardized phenotyping of the mutant phenotypes, will help annotation of the ∼25,000 genes packed in the mouse genome. In the same way, and as another consequence of the sequencing, the discovery of many single nucleotide polymorphisms and the development of new tools and resources, like the Collaborative Cross, will contribute to the development of modern quantitative genetics. It is clear that mouse genetics has changed dramatically over the last 10–15 years and its future looks promising. Because of its many advantages as an animal model, geneticists sands of new mutations with chemicals or by gene trapping, the have used the house mouse since the early days of genetics.
    [Show full text]
  • Gocats: a Tool for Categorizing Gene Ontology Into Subgraphs of User-Defined Concepts
    University of Kentucky UKnowledge UKnowledge Molecular and Cellular Biochemistry Faculty Publications Molecular and Cellular Biochemistry 6-11-2020 GOcats: A Tool for Categorizing Gene Ontology into Subgraphs of User-Defined Concepts Eugene Waverly Hinderer III University of Kentucky, [email protected] Hunter N. B. Moseley University of Louisville, [email protected] Follow this and additional works at: https://uknowledge.uky.edu/biochem_facpub Part of the Biochemistry, Biophysics, and Structural Biology Commons, and the Cell and Developmental Biology Commons Right click to open a feedback form in a new tab to let us know how this document benefits ou.y Repository Citation Hinderer, Eugene Waverly III and Moseley, Hunter N. B., "GOcats: A Tool for Categorizing Gene Ontology into Subgraphs of User-Defined Concepts" (2020). Molecular and Cellular Biochemistry Faculty Publications. 173. https://uknowledge.uky.edu/biochem_facpub/173 This Article is brought to you for free and open access by the Molecular and Cellular Biochemistry at UKnowledge. It has been accepted for inclusion in Molecular and Cellular Biochemistry Faculty Publications by an authorized administrator of UKnowledge. For more information, please contact [email protected]. GOcats: A Tool for Categorizing Gene Ontology into Subgraphs of User-Defined Concepts Notes/Citation Information Published in PLOS ONE, v. 15, no. 6, 0233311, p.1-29. © 2020 Hinderer III, Moseley. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Digital Object Identifier (DOI) https://doi.org/10.1371/journal.pone.0233311 This article is available at UKnowledge: https://uknowledge.uky.edu/biochem_facpub/173 PLOS ONE RESEARCH ARTICLE GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts 1 1,2,3,4,5 Eugene W.
    [Show full text]
  • Fundoscopy-Directed Genetic Testing to Re-Evaluate Negative Whole
    Cho et al. Orphanet Journal of Rare Diseases (2020) 15:32 https://doi.org/10.1186/s13023-020-1312-1 RESEARCH Open Access Fundoscopy-directed genetic testing to re- evaluate negative whole exome sequencing results Ahra Cho1,2,3, Jose Ronaldo Lima de Carvalho Jr1,3,4,5, Akemi J. Tanaka6, Ruben Jauregui1,3,7, Sarah R. Levi1,3, Alexander G. Bassuk8, Vinit B. Mahajan9,10 and Stephen H. Tsang1,3,6* Abstract Background: Whole exome sequencing (WES) allows for an unbiased search of the genetic cause of a disease. Employing it as a first-tier genetic testing can be favored due to the associated lower incremental cost per diagnosis compared to when using it later in the diagnostic pathway. However, there are technical limitations of WES that can lead to inaccurate negative variant callings. Our study presents these limitations through a re-evaluation of negative WES results using subsequent tests primarily driven by fundoscopic findings. These tests included targeted gene testing, inherited retinal gene panels, whole genome sequencing (WGS), and array comparative genomic hybridization. Results: Subsequent genetic testing guided by fundoscopy findings identified the following variant types causing retinitis pigmentosa that were not detected by WES: frameshift deletion and nonsense variants in the RPGR gene, 353- bp Alu repeat insertions in the MAK gene, and large exonic deletion variants in the EYS and PRPF31 genes. Deep intronic variants in the ABCA4 gene causing Stargardt disease and the GUCY2D gene causing Leber congenital amaurosis were also identified. Conclusions: Negative WES analyses inconsistent with the phenotype should raise clinical suspicion. Subsequent genetic testing may detect genetic variants missed by WES and can make patients eligible for gene replacement therapy and upcoming clinical trials.
    [Show full text]
  • Structural Variation in the Human Genome
    WHITEPAPER STRUCTURAL VARIATION IN THE HUMAN GENOME THE LEADER IN LONG-READ SEQUENCING The past quarter century has brought tremendous progress in the detection of single nucleotide variants (SNVs), but intermediate-sized (50 bp to 50 kb) structural variants (SV) remain a challenge. Such variants are too small to detect with cytogenetic methods, but too large to reliably discover HUMAN BIOMEDICAL RESEARCH with short-read DNA sequencing. Recent high-quality genome assemblies using PacBio long-read sequencing have revealed that each human genome has approximately 20,000 structural variants, spanning 10 million base pairs, more than twice the number of bases affected by SNVs1,2,3,4. Base pairs affected 5 Mb 3 Mb 10 Mb SNVs Indels Structural variants 1 bp <50 bp >50 bp Figure 1. Variation between two human genomes, by number of base pairs impacted. SNVs = Single nucleotide variants; indels = insertions and deletions4. www.pacb.com/sv Structural variants of all types are known to cause Mendelian disease and contribute to complex disease. All of these variants can be most robustly detected by PacBio Single Molecule, Real-Time (SMRT®) Sequencing. Genomics studies have shown that While these disorders are individually rare, the insertions, deletions, duplications, they collectively affect ~350 million people translocations, inversions, and tandem globally9, and the lengthy diagnostic repeat expansions in the structural variant odysseys these patients face has a size range can cause Mendelian disease, significant economic impact on society. 5 including Carney Complex , Potocki-Lupski Structural variants also contribute to 6 7 syndrome , and Smith-Magenis syndrome . complex disorders, including However, robust detection of structural autism10,11, cancer12,13, Alzheimer’s variants remains a hurdle.
    [Show full text]
  • Emerging Roles of Protein Kinase D1 in Cancer
    Published OnlineFirst June 16, 2011; DOI: 10.1158/1541-7786.MCR-10-0365 Molecular Cancer Subject Review Research Emerging Roles of Protein Kinase D1 in Cancer Vasudha Sundram1, Subhash C. Chauhan1,2,3, and Meena Jaggi1,2,3 Abstract Protein kinase D1 (PKD1) is a serine-threonine kinase that regulates various functions within the cell, including cell proliferation, apoptosis, adhesion, and cell motility. In normal cells, this protein plays key roles in multiple signaling pathways by relaying information from the extracellular environment and/or upstream kinases and converting them into a regulated intracellular response. The aberrant expression of PKD1 is associated with enhanced cancer phenotypes, such as deregulated cell proliferation, survival, motility, and epithelial mesenchymal transition. In this review, we summarize the structural and functional aspects of PKD1 and highlight the pathobiological roles of this kinase in cancer. Mol Cancer Res; 9(8); 985–96. Ó2011 AACR. Introduction ing pathways, and to disseminate information from the PKC signaling pathway as well. Further, PKD family Protein kinase D1 (PKD1) is a ubiquitous serine-threo- members exhibit diverse subcellular localizations (such as nine protein kinase that is involved in a multitude of cytosol, membrane, nucleus, Golgi, and mitochondria), functions in both normal and diseased states (1–3). This which allow them to influence different signaling pathways evolutionarily well-conserved protein, with homologs in (5–7). mice (Mus musculus), rats (Rattus norvegicus), flies (Droso- The PKD family contains 3 members that are homo- phila melanogaster), worms (Caenorhabditis elegans), and logous in structure and function, namely, PKD1, PKD2, yeast (Saccharomyces cerevisiae), was initially recognized as and PKD3 (5–9).
    [Show full text]