HGNC: Nomenclature of the Horse Genome

1/22/2018

Beth Yates – Scientific Programmer, HGNC, EMBL-EBI, Hinxton, UK
https://vertebrate.genenames.org

Who are the HGNC and what do we do?
• HUGO Gene Nomenclature Committee.
• Responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes.
• Maintain http://www.genenames.org
• Collaborate with
  • CCDS: RefSeq, HAVANA, Ensembl, MGI, UCSC
  • Other databases, e.g. GeneCards, UniProt
  • Specialist resources, e.g. IMGT, IUPHAR, miRBase, GtRNAdb
  • Specialist advisors, e.g. HORDE (ORs), SLCS
  • GO Consortium, Human Genome Variation Society
  • Transforming genetic medicine initiative
  • Other species' nomenclature groups, e.g. MGNC, RGNC, XGNC, ZNC, AGNC, CGNC, FlyBase

Other Vertebrate Gene Naming Groups
• Currently only six vertebrate model organisms have naming groups: MGNC, RGNC, CGNC, XGNC, AGNC, ZNC
• Homologous genes are named based on HGNC human nomenclature

What about Vertebrate Species with no Nomenclature Group?
• Unofficial names from
• Inconsistencies between resources
• Duplication of symbols within resources
• Gene names may be incorrect or uninformative

Vertebrate Gene Nomenclature Committee
• Gene Nomenclature Across Species meeting held in Cambridge, UK, in October 2009
• Gene nomenclature should, where possible, reflect homologous relationships across vertebrate species
• Consensus naming, based on human gene nomenclature, should be extended beyond the species with nomenclature groups
• Care must be taken when assigning gene names to incomplete genomes and to avoid humanisation of non-human genomes
• Automated naming should initially focus on consensus 1:1 orthologs as identified by at least two comprehensive orthology resources
• Complex families will require expert manual curation for cross-species nomenclature
• VGNC is a new initiative extending the role of the HUGO Gene Nomenclature Committee (HGNC) to coordinate gene naming across vertebrates
• Data will be distributed to genomic databases and model organism databases
• The VGNC website will become a portal for all official vertebrate nomenclature

HCOP – HGNC Comparison of Orthology Predictions tool

Automated human nomenclature transferal
Orthology assertion between Gene A and Gene B:
• ✓ ✓ ✓ ✓ = Human nomenclature automatically transferred to orthologous gene if additional rules are passed
• ✓ ✗ ✓ ✓ = Gene flagged for investigation by curators

Rules for gene nomenclature transferal to fail
• CYP*/OR* or HLA* in gene symbol
• HIST* or H[1-9] in symbol, plus "histone" in gene name
• Name contains ( AND ) / opposite / *stream / neighbour / X-linked / Y-linked / region / syndrome / candidate / autosomal / virus / integration / cancer / leukemia / *oma
• Human gene locus type is not protein coding
• Locus type is not protein coding in both the NCBI gene and Ensembl gene model for the orthologous gene
• Gene is not on the main assembly in human and/or the other species
• There is no HGNC approved symbol for the human gene

Mapping Tool

Quick Curate Tool
• Quick Curate tool summarises the information we have for a gene and its 1:1 human ortholog
• Displays the reason why the human nomenclature was not automatically transferred
• Links to Mapping tool which allows you to view Ensembl and NCBI gene models in a single genome browser
www.vertebrate.genenames.org

Current status of VGNC gene naming
Columns 3 and 4 give the number of 1:1 orthologs as total approved (awaiting approval).

Species  Assembly         Consensus in 4 resources  Consensus in 3 of the 4 resources  Total protein coding genes (Ensembl 91 / NCBI gene)
Chimp    GCA_000001515.5  12,129 (565)              2,952 (517)                        23,534 / 23,702
Cow      GCA_000003055.5  10,821 (527)              1,639 (1,860)                      19,994 / 23,289
Dog      GCA_000002285.2  11,897 (493)              142 (2,132)                        19,856 / 20,424
Horse    GCA_000002305.1  10,142 (454)              935 (2,598)                        20,449 / 20,502

Gene search

Gene Symbol Report

Data Download

Dissemination of VGNC nomenclature data
• JSON and TSV format files for gene nomenclature data are available from the VGNC website
• REST API is currently in development to allow retrieval of
  • Gene nomenclature information
  • Orthologous genes
  • Gene family information

Collaborations with expert curators
• Cytochrome P450s
  • David Nelson, University of Tennessee
  • Jed Goldstone, Woods Hole Oceanographic Institution
• Olfactory receptors
  • Doron Lancet & Tsviya Olender, HORDE database, Weizmann Institute of Science
• UDP glucuronosyltransferases (UGTs) and Glutathione S-transferases (GSTs)
  • Jed Goldstone, Woods Hole Oceanographic Institution
• ELAs
• Zinc Fingers
• Histone genes

Olfactory receptor naming
• Our olfactory receptor expert advisor has identified OR genes and pseudogenes in cow that do not have an existing gene model in either Ensembl or NCBI RefSeq
• Some existing OR gene models in Ensembl are classified as protein coding but manual curation identifies them as pseudogenes
• We will feed this information back to RefSeq and Ensembl to allow them to build new or correct existing gene models
• Examples of novel OR genes discovered in cow include
  • OR9S31, olfactory receptor family 9 subfamily S member 31
  • OR11C1P, olfactory receptor family 11 subfamily C member 31 pseudogene

Naming keratins across species
Rows marked † list fewer than five entries on the slide; the column of the missing cell is not recoverable, so the entries are given in their original order.

Keratin gene cluster I
Human (c17)   Chimp (c17)   Cow (c19)     Horse (c11)   Dog (c9)
KRT222        KRT222        KRT222        KRT222        KRT222
KRT24         KRT24         KRT24         KRT24         KRT24
KRT223P       -             -             -             -
KRT25         KRT25         KRT25         KRT25         KRT25
KRT26         KRT26         KRT26         KRT26         KRT26
KRT27         KRT27         KRT27         KRT27         KRT27
KRT28         KRT28         KRT28         KRT28         KRT28
KRT10         KRT10         KRT10         KRT10A        KRT10
-             -             -             KRT10B        -
KRT12         KRT12         KRT12         KRT12         KRT12
KRT20         KRT20         KRT20         KRT20         KRT20
KRT23         KRT23         KRT23         KRT23         KRT23
KRT39         KRT39         KRT39         KRT39         KRT39
KRT40         KRT40         KRT40         KRT40         KRT40
KRT33A        KRT33A        KRT33A        KRT33A        KRT33A
KRT33B        KRT33B        KRT33B        KRT33B        KRT33B
KRT34         KRT34         KRT34         KRT34         KRT34
KRT31         KRT31         KRT31         KRT31         KRT31
† KRT41P KRT41 KRT41P KRT41
KRT37         KRT37         LOC515000     KRT37         KRT37
KRT38         KRT38         LOC107131523  -             KRT38
† KRT43P - - -
KRT32         KRT32         KRT32         KRT32         KRT32
KRT35         KRT35         KRT35         KRT35         KRT35
KRT36         KRT36         KRT36         KRT36         KRT36
KRT13         KRT13         KRT13         KRT13         KRT13
KRT15         KRT15         KRT15         KRT15         KRT15
KRT19         KRT19         KRT19         KRT19         KRT19
KRT9          KRT9          KRT9          KRT9P         KRT9
KRT14         KRT14         KRT14         KRT14         KRT14
KRT16         KRT16         KRT16         KRT16         KRT16
KRT17         KRT17         KRT17         KRT17         KRT17
KRT42P        KRT42         KRT42         KRT42         KRT42
† LOC100336907

Keratin gene cluster II
Human (c12)   Chimp (c12)     Cow (c5)      Horse (c6)    Dog (c27)
KRT80         KRT80           KRT80         KRT80         KRT80
KRT7          KRT7            KRT7          KRT7          KRT7
KRT87P        KRT87           LOC507184     KRT87P        KRT87P
KRT88P        KRT88P          LOC787600     KRT88         KRT88
† - LOC615451 KRT129P -
KRT81         KRT81           KRT81         KRT81         KRT81
KRT86         KRT86           KRT86         KRT86         KRT86
KRT83         KRT83           KRT83         KRT83         KRT83
KRT89P        KRT89P          KRT89         KRT89         KRT89
KRT85         KRT85           KRT85         KRT85         KRT85
KRT84         KRT84           KRT84         KRT84         KRT84
KRT82         KRT82           KRT82         KRT82         KRT82
KRT90P        KRT90P/KRT124P  KRT90/KRT124  KRT124        KRT124
KRT75         KRT75           KRT75         KRT75         KRT75
-             -               -             KRT6P         -
KRT6B         KRT6B           KRT6B         KRT6B         KRT6B
KRT6C         KRT6C           KRT6C         KRT6C         -
KRT6A         KRT6A           KRT6A         KRT6A         KRT6A
KRT5          KRT5            KRT5          KRT5          KRT5
KRT71         KRT71           KRT71         KRT71         KRT71
KRT74         KRT74           KRT74         KRT74         KRT74
KRT72         KRT72           KRT72         KRT72         KRT72
KRT73         KRT73           KRT73         KRT73         KRT73
KRT128P       -               -             KRT128        -
KRT2          KRT2            KRT2          KRT2A         KRT2
-             -               -             KRT2B         -
† - - KRT2C -
KRT1          KRT1            KRT1          KRT1          KRT1
KRT77         KRT77           KRT77         KRT77         KRT77
KRT126P       -               -             KRT126P       KRT126
KRT127P       -               -             -             KRT127
KRT125P       -               -             -             -
KRT76         KRT76           KRT76         KRT76         KRT76
KRT3          KRT3            KRT3          KRT3          KRT3
KRT4          KRT4            KRT4          KRT4          KRT4
KRT79         KRT79           KRT79         KRT79         KRT79
KRT78         KRT78           KRT78         KRT78         KRT78
KRT8          KRT8            KRT8          KRT8          KRT8
KRT18         KRT18           KRT18         KRT18         KRT18

Key (colours and text styling are not preserved in this text version)
• Named automatically by VGNC
• Named manually by VGNC
• Name unresolved
• bold: VGNC name differs from Ensembl or NCBI Gene
• red text: Pseudogenes

Example gene family display from HGNC

Future plans with VGNC gene naming
• Continue work on naming the remaining chimp, cow, dog and horse protein coding genes
• VGNC will prioritize species for naming based on
  • Quality of genome sequence and annotations
  • Value of the species in terms of research
  • Level of interest/input from the research community
• Potential next species to consider include pig and rhesus macaque
• We will also investigate sheep when Ensembl and NCBI gene have both annotated the latest genome assembly
• Gene family curation: specific gene families to be curated across multiple species at the same time

Acknowledgements
HGNC/VGNC Team
• Elspeth Bruford
• Ruth Seal
• Bethan Yates (Poster P0051)
• Susan Tweedie
• Kristian Gray
• Bryony Braschi
• Paul Denny
• Tamsin Jones
Contact us at [email protected]
http://vertebrate.genenames.org
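
The slide "Rules for gene nomenclature transferal to fail" reads as a filter applied to a human gene and its 1:1 ortholog. The following is a minimal Python sketch of that filter, not the VGNC pipeline itself. The record types `HumanGene` and `Ortholog`, their field names, the `"protein-coding"` label and the reason strings are all hypothetical. Several readings are assumptions: `CYP*`, `OR*` and `HLA*` are treated as literal symbol prefixes, `*stream` and `*oma` as word endings, and the ortholog rule as "fail unless both the NCBI and the Ensembl model are protein coding".

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional

PROTEIN_CODING = "protein-coding"  # hypothetical label for the locus type


@dataclass
class HumanGene:  # hypothetical record, not a VGNC data structure
    symbol: Optional[str]  # HGNC approved symbol; None if there is none
    name: str
    locus_type: str
    on_main_assembly: bool = True


@dataclass
class Ortholog:  # hypothetical record for the gene in the other species
    ncbi_locus_type: str
    ensembl_locus_type: str
    on_main_assembly: bool = True


# Assumption: the slide's wildcards are plain prefix globs on the symbol.
FAMILY_GLOBS = ("CYP*", "OR*", "HLA*")
HISTONE_SYMBOL = re.compile(r"^(HIST|H[1-9])")
# Assumption: "*stream" and "*oma" mean whole words ending in those letters.
NAME_TERMS = re.compile(
    r"\b(opposite|\w*stream|neighbou?r|[XY]-linked|region|syndrome|candidate"
    r"|autosomal|virus|integration|cancer|leukemia|\w+oma)\b",
    re.IGNORECASE,
)


def transfer_failures(human: HumanGene, ortholog: Ortholog) -> List[str]:
    """Return the reasons automatic transferal fails; an empty list means pass."""
    reasons = []
    symbol = human.symbol or ""
    if not symbol:
        reasons.append("no HGNC approved symbol")
    if any(fnmatchcase(symbol, glob) for glob in FAMILY_GLOBS):
        reasons.append("CYP/OR/HLA symbol")
    if HISTONE_SYMBOL.match(symbol) and "histone" in human.name.lower():
        reasons.append("histone gene")
    if "(" in human.name and ")" in human.name:
        reasons.append("name contains parentheses")
    if NAME_TERMS.search(human.name):
        reasons.append("name contains excluded term")
    if human.locus_type != PROTEIN_CODING:
        reasons.append("human locus type not protein coding")
    if not (ortholog.ncbi_locus_type == PROTEIN_CODING
            and ortholog.ensembl_locus_type == PROTEIN_CODING):
        reasons.append("ortholog not protein coding in both NCBI and Ensembl")
    if not (human.on_main_assembly and ortholog.on_main_assembly):
        reasons.append("not on main assembly")
    return reasons


if __name__ == "__main__":
    ortholog = Ortholog(PROTEIN_CODING, PROTEIN_CODING)
    gene = HumanGene("KRT5", "keratin 5", PROTEIN_CODING)
    print(transfer_failures(gene, ortholog))
```

Returning a list of reasons instead of a single boolean mirrors what the Quick Curate tool is described as doing: showing curators why the human nomenclature was not transferred automatically. The literal prefix reading is deliberately conservative, so a symbol that merely starts with "OR" would be sent to a curator instead of being named automatically.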
