Mouse Genome Database (MGD): Knowledgebase for Mouse–Human Comparative Biology
Judith A. Blake *, Richard Baldarelli, James A. Kadin, Joel E. Richardson, Cynthia L. Smith, Carol J. Bult and the Mouse Genome Database Group

The Jackson Laboratory, Bar Harbor, ME, USA

Nucleic Acids Research, 2021, Vol. 49, Database issue D981–D987
doi: 10.1093/nar/gkaa1083
Published online 24 November 2020
Received September 15, 2020; Revised October 18, 2020; Editorial Decision October 19, 2020; Accepted November 22, 2020

*To whom correspondence should be addressed. Tel: +1 207 288 6248; Email: [email protected]

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

The Jackson Laboratory, The Mouseion at the JAXlibrary, Faculty Research 2021, 1-8-2021: https://mouseion.jax.org/stfb2021

ABSTRACT

The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism knowledgebase for the laboratory mouse, a widely used animal model for comparative studies of the genetic and genomic basis for human health and disease. MGD is the authoritative source for biological reference data related to mouse genes, gene functions, phenotypes and mouse models of human disease. MGD is the primary source for official gene, allele, and mouse strain nomenclature based on the guidelines set by the International Committee on Standardized Nomenclature for Mice. MGD’s biocuration scientists curate information from the biomedical literature and from large and small datasets contributed directly by investigators. In this report we describe significant enhancements to the content and interfaces at MGD, including (i) improvements in the Multi Genome Viewer for exploring the genomes of multiple mouse strains, (ii) inclusion of many more mouse strains and new mouse strain pages with extended query options and (iii) integration of extensive data about mouse strain variants. We also describe improvements to the efficiency of literature curation processes and the implementation of an information portal focused on mouse models and genes for the study of COVID-19.

INTRODUCTION

As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the emergence of genomic medicine. The Mouse Genome Database (MGD) (1) has a unique and strategic role as a community resource for facilitating the use of the laboratory mouse for understanding the genomics underlying human biology and disease. MGD serves three major user communities: (i) biomedical researchers who use mouse experimentation to investigate genetic and molecular principles of biology and disease processes, (ii) translational scientists who use the laboratory mouse to model human disease and (iii) bioinformaticians/computational biologists who use the rich integrated data MGD provides to develop algorithms and bioinformatics tools for data analysis and interpretation. MGD processes and data support and comply with FAIR principles (2) with a variety of formats for data distribution.

MGD maintains a comprehensive catalog of mouse genes and genome features connected to genomic sequence data and biological annotations. As the community model organism knowledgebase for the laboratory mouse, MGD contains comprehensive information about mouse gene function, genotype-to-phenotype annotations, and mouse models of human disease. Annotations include: (i) molecular function, biological process and cellular location of gene products using terms and relations from the Gene Ontology (GO) (3), (ii) mouse mutations, variants and human disease models with genotypes annotated to the Mammalian Phenotype Ontology (MP) (4) and the Disease Ontology (DO) (5) and (iii) standardized nomenclature and identifiers for mouse gene names, symbols, alleles and strains. The rigorous application of nomenclature and annotation standards in MGD ensures that the information in the resource is curated consistently to support robust and comprehensive data retrieval for accurate identification of genes that share biological properties and data mining for knowledge discovery (Table 1).

Table 1. Data for which MGD serves as the authoritative source

Data type | Community relationship
Unified genome feature catalog | MGD compares/integrates predictions from Ensembl, NCBI, Havana/Vega, produces unified catalog used by NCBI, IMPC.
Gene Ontology (GO) annotations for mouse | MGD does primary curation, integrates data from others, provides definitive mouse GO annotation sets to GO site.
Mouse Phenotype annotations | MGD does primary curation & integrates data from publications & large scale projects.
Mouse models of human diseases | MGD does primary mouse model curation using disease terms & human gene associations from OMIM and NCBI.
Gene-to-nucleotide sequence association | Co-curation with MGA (Mouse Genome Annotation) group.
Gene-to-protein sequence association | Co-curation with UniProt and Protein Ontology.
Mammalian Phenotype (MP) Ontology | MGD develops & distributes MP. MP is actively used by many groups, e.g., RGD, MRC Harwell, Sanger, IMPC, etc.
Symbols & names for genes & genome features | MGD provides access to International Nomenclature guides, implements policies, coordinates with human and rat groups.
Strain designations | MGD assigns official nomenclature; provides to repositories.

MGD is a core resource within the Mouse Genome Informatics (MGI) consortium (http://www.informatics.jax.org). Other MGI consortium databases include the Gene Expression Database (GXD) (6), the Mouse Models of Human Cancer Database (MMHC; formerly the Mouse Tumor Biology database) (7), the Gene Ontology project (3), MouseMine (8), the International Mouse Strain Resource (IMSR) (9) and the CrePortal database of recombinase expressing mice (http://www.informatics.jax.org/home/recombinase). Data included in all resources hosted at the MGI website are obtained through a combination of expert curation of the biomedical literature and automated or semi-automated processing of data sets downloaded from more than fifty other data resources. A summary of the current content of MGD is presented in Table 2.

In this report we describe significant enhancements to MGD, including two new graphical user interfaces: (i) improvements to the Multiple Genome Viewer for exploring the genomes of multiple mouse strains, (ii) enhancement and inclusion of information of now over 64 000 strains of mice including collaborative cross strains and (iii) incorporation of SNP data with extensive updates in query and visualization of variant × strain data. Other improvements include improvements to literature curation processes, and specific access to data about mouse models for the study of coronavirus infections.

NEW FEATURES AND CURATION WORKFLOW ENHANCEMENTS

Enhancements to the multiple sequence viewer

The Multiple Genome Viewer (MGV, http://www.informatics.jax.org/mgv) is a web-based browser designed for navigating and comparing multiple related genomes simultaneously. The first release of MGV in

with use of ‘canonical’ identifiers to assert homology between genes in strains within a species. The interface directly supports viewing and navigating complex orthology relationships.
• Uses orthology relationships to infer paralogs, which are used to return additional genes in certain contexts, and to draw additional connections between displayed genes. Users can turn this feature on/off.
• The user has many controls to adjust the display, such as specifying which gene labels are displayed, whether to display all transcripts or just one representative per gene, the ability to download the current display as a PNG or SVG file, and more.

New mouse strain data pages

New mouse strain data pages are now available, including information for over 64 000 inbred, mutant congenic and co-isogenic strains, Collaborative Cross strains, recombinant inbred and other mouse strain types (Figure 2). The newly revamped Strain and SNPs mini-home page (http://www.informatics.jax.org/home/strain) includes a search form for mouse strains by nomenclature or by strain attribute, quick links to lists of strain collections such as the Collaborative Cross, and links to tools and strain resources.

Strain detail pages show standardized nomenclature, unique identifiers,