Guest Editorial the Human Gene Mutation Database: Providing a Comprehensive Central Mutation Database for Molecular Diagnostics and Personalised Genomics
Total Page:16
File Type:pdf, Size:1020Kb
GUEST EDITORIAL Guest Editorial The Human Gene Mutation Database: Providing a comprehensive central mutation database for molecular diagnostics and personalised genomics Reading the recent Editorial in the Journal by valuable extra features, including an expanded Richard Cotton,1 which publicised the proposed search engine, genomic coordinates, additional lit- new Human Variome Project, one could be for- erature references, Human Genome Variation given for thinking that no central repository for Society (HGVS) nomenclature (http://www.hgvs. inherited human gene mutations of pathological org/mutnomen) and a suite of advanced search significance currently exists. In practice, however, tools that greatly enhance the utility of the database. the Human Gene Mutation Database (HGMDw; Together with BIOBASE, we are working toward http://www.hgmd.org)2 already constitutes a com- being able to make all HGMD data and search prehensive collection of single base-pair substi- tools available to the academic community free of tutions in coding (missense and nonsense), charge and in a timely fashion, with the costs of regulatory and splicing-relevant regions of human upkeep being borne primarily by industry and nuclear genes, micro-deletions and micro- commerce. We believe that this funding model insertions, indels, repeat expansions, as well as gross should not only guarantee the financial viability of gene lesions (deletions, insertions and duplications) HGMD, but also allow this unique resource to be and complex gene rearrangements. This unique sustainable into the long term, to the benefit of the resource currently contains in excess of 96,000 scientific community. different germline mutations and disease-associated/ Although HGMD has now become the de facto functional polymorphisms—in a total of over 3,600 central disease-associated mutation database, there nuclear genes (December 2009 release)—causing or are several other valuable sources of human associated with human inherited disease. mutation data available. Online Mendelian HGMD currently provides free access to the Inheritance in Man (OMIM;3 http://www.ncbi. bulk of its mutation data to over 30,000 registered nlm.nih.gov/sites/entrez?db=omim) is a formidable academic/non-profit users worldwide. In the absence genotype–phenotype knowledge base, but it is not of any public funding, HGMD is maintained cour- comprehensive with respect to mutation data on tesy of a subscription-based version (HGMD account of its policy of providing only specimen Professional), distributed through BIOBASE examples of allelic variants (in a total of 2,328 GmbH (http://www.biobase-international.com). human genes). In addition to inherited pathological HGMD Professional not only provides access to mutations, it should be noted that OMIM also the very latest mutation data, but also contains contains some somatic lesions, neutral # HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 4. NO 2. 69–72 DECEMBER 2009 69 GUEST EDITORIAL Stenson et al. polymorphisms, disease-associated single nucleotide genes (including 290 known cancer genes) and polymorphism (SNP) haplotypes and data from which represents the somatic equivalent of HGMD. genome-wide association studies (GWAS), none of The above resources notwithstanding, HGMD which are included in HGMD. The HGVS website Professional remains the only comprehensive data- (http://www.hgvs.org/dblist/glsdb.html) provides a base of germline mutations in nuclear genes under- list of URLs for the 670 internet-accessible lying or associated with human inherited disease. It locus-specific mutation databases (LSDBs). can be used to search for newly identified gene Although curated by experts in specific genes/pro- lesions to determine whether or not they are novel. teins, these databases cannot be screened in combi- It can be searched on a gene-wise basis to obtain nation and, even if they could, they would only an overview of the known mutational spectrum for represent a fraction of the mutational lesions listed a given gene (via a dynamic mutation viewer in HGMD. Moreover, as Cotton1 himself readily which depicts coding region mutations superim- admits, many of these LSDBs are incomplete, out posed on the cDNA sequence of a gene). It can of date, inaccessible, moribund or otherwise non- also be searched for other examples of a specific functional. The National Center for Biotechnology type of mutation in a specific location (eg at pos- Information (NCBI)’s Single Nucleotide ition þ5 to a donor splice site) in order to garner Polymorphism Database (dbSNP; http://www.ncbi. evidence for the pathological authenticity of a nlm.nih.gov/projects/SNP/) has recently begun to given lesion. include variants of clinical significance from The ‘advanced search’ facility, available as part of OMIM, from some LSDBs and by direct sub- HGMD Professional, is a suite of software tools mission, thereby extending its original remit of which has been designed to enhance mutation listing only human polymorphisms. These variants searching, viewing and retrieval. Currently, two of are currently listed under ‘Clinical/LSDB vari- the main types of mutation listed in HGMD (single ation’. Although dbSNP currently only contains nucleotide substitutions and small micro-lesions, 8,669 such entries, it is anticipated that this accounting for .90 per cent of all entries) may be number will steadily increase in the future. For interrogated with this toolset. The datasets for each now, however, utility is somewhat limited owing to mutation type may be readily combined (eg micro- the difficulty in accessing phenotypic information. deletions, micro-insertions and indels) in order to Finally, the Pharmacogenomics Knowledge Base allow more powerful searching across comparable (PharmGKB;4 http://www.pharmgkb.org) displays types of mutation. When utilising the advanced some overlap with HGMD’s own 4,600 func- search, users may tailor their queries with more tional and disease-associated polymorphism entries specific criteria, including amino acid exchange, (http://www.functional-polymorphism.org). nucleotide substitution, micro-deletion/insertion/ PharmGKB contains only 1,575 annotated SNPs, indel size and composition, motif searching (both however, and much of the data in this knowledge- created and abolished), dbSNP number and article base would appear to relate to GWAS data and title/abstract keywords. The Mutation human genetic variants that confer differential Interpretation Software offered by Alamut (http:// responsiveness to drugs, neither of which are www.interactive-biosoftware.com/alamut.html) is covered by HGMD. Complementing the above similar to some of the tools provided by HGMD germline mutation databases are those that focus Professional; the utility of the Alamut software will exclusively on somatic mutations, usually in associ- soon be greatly enhanced by the provision of links ation with tumorigenesis. The most comprehensive to the mutation data present in HGMD of these is the Catalogue of Somatic Mutations in Professional. Cancer (COSMIC;5 http://www.sanger.ac.uk/gen- HGMD/HGMD Professional will be continu- etics/CGP/cosmic/), which currently (v. 43) con- ally improved by the inclusion of novel mutation tains over 88,000 mutations in 2,927 different data and new informational or software features. 70 # HENRY STEWART PUBLICATIONS 1479–7364. HUMAN GENOMICS. VOL 4. NO. 2. 69–72 DECEMBER 2009 The Human Gene Mutation Database GUEST EDITORIAL Indeed, the provision of further supplementary additional costs inherent in organising, coordinat- information—including additional clinical pheno- ing, monitoring and remunerating this number of types observed with a given mutation, fully anno- experts. By contrast, HGMD has already managed tated genomic sequences for all HGMD genes, to build a central mutation database for a small genomic coordinates for as many mutations as fraction of this proposed budget and is financially possible, multiple additional references for each self-sustaining. mutation, gene and disease ontologies and in vitro The first of the ten stated ‘key objectives’ of the characterisation data—is already well underway. Human Variome Project is to ‘capture and archive Further developments in the pipeline include all human gene variation associated with human tools for the in silico annotation of mutational disease in a central location’.7 We u r g e t h e o r g a n - information and the curation of in vitro/in vivo isers of the Human Variome Project to abandon functional data to allow the identification of any plans they might have to coordinate centrally specific types of gene lesion from a functional the various (largely independent and dissimilarly standpoint. This should allow the rapid identifi- funded) existing entities such as dbSNP, European cation of, for example: all characterised/predicted Bioinformatics Institute (EBI) (http://www.ebi.ac. mutations in microRNA binding sites, exon splice uk), GEN2PHEN (http://www.gen2phen.org; enhancers or polyadenylation sites; or all charac- one of whose partners is BIOBASE), University terised/predicted mutations giving rise to the gain of California Santa Cruz (UCSC) (http://genome. or loss of a glycosylation or phosphorylation site ucsc.edu), HGMD and OMIM1 in favour of a in the protein product; or (courtesy of BIOBASE’s ‘light touch’ umbrella role. Too much emphasis transcription factor database, TRANSFACw; http:// on central control is likely to inflate adminis- www.biobase-international.com/index.php?id=trans- tration, stifle initiative and hinder