1/22/2018

Vertebrate Gene Nomenclature Committee (VGNC): Standardizing Gene Names in Vertebrates

Susan Tweedie
HUGO Gene Nomenclature Committee (HGNC)
EMBL-EBI, Hinxton, UK

Who are the HGNC and what do we do?

• HUGO Gene Nomenclature Committee
• Approve unique symbols and names for human loci, including coding, ncRNA genes and pseudogenes
• Maintain http://www.genenames.org
• Collaborate with
  • CCDS: RefSeq, HAVANA, Ensembl, MGI, UCSC
  • Other databases, e.g. GeneCards, UniProt, RNAcentral
  • Specialist resources, e.g. IMGT, IUPHAR, miRBase, GtRNAdb
  • Specialist advisors, e.g. HORDE (ORs), SLCs
  • Transforming genetic medicine initiative
  • Other species’ nomenclature groups

Vertebrate Gene Naming Groups

Six vertebrate model organisms have naming groups:

MGNC (mouse), RGNC (rat), CGNC (chicken), XGNC (Xenopus), AGNC (Anolis), ZNC (zebrafish)

Homologous genes are named based on HGNC human nomenclature


Example: HPF1, ‘histone PARylation factor 1’ (previous symbol → symbol based on human nomenclature)

• RGD1311747 → Hpf1
• 2700029M09Rik → Hpf1
• C4orf27 → HPF1
• c4orf27 → hpf1
• zgc:101819 → hpf1
• LOC100567991 → hpf1*
• C4H4orf27 → HPF1

Vertebrate Gene Nomenclature Committee

• Extends role of the HGNC to coordinate gene naming across vertebrates
• Gene nomenclature reflects homologous relationships across species; names based on human nomenclature
• Data shared with genomic and model organism databases (MODs)

What about Vertebrate Species with no Nomenclature Group?

• Unofficial names from [sources not recoverable]
• Inconsistencies between resources
• Duplication of symbols within resources
• Gene names may be incorrect or uninformative

HCOP: HGNC Comparison of Orthology Predictions tool

[Pipeline diagram]
• Inputs: 11 external orthology resources, e.g. Ensembl, Panther etc.
• Inputs: gene based resources, HGNC, NCBI Gene etc.
• HCOP pipeline → HCOP orthologs database
• Outputs: HCOP web tool and data file downloads
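The aggregation step in the pipeline above can be sketched roughly as follows. This is a minimal illustration, not HCOP's actual code: the function name, the tuple layout and the chimp gene identifiers are all hypothetical, and only four of the eleven resources are shown.

```python
from collections import defaultdict


def aggregate_predictions(predictions):
    """Map each (human_gene, ortholog) pair to the set of resources asserting it."""
    support = defaultdict(set)
    for resource, human_gene, ortholog in predictions:
        support[(human_gene, ortholog)].add(resource)
    return dict(support)


# Hypothetical input rows: (resource, human symbol, chimp gene identifier).
predictions = [
    ("Ensembl", "HPF1", "chimp_gene_1"),
    ("NCBI", "HPF1", "chimp_gene_1"),
    ("OMA", "HPF1", "chimp_gene_1"),
    ("Panther", "HPF1", "chimp_gene_1"),
    ("Ensembl", "KRT5", "chimp_gene_2"),
    ("NCBI", "KRT5", "chimp_gene_3"),
]

support = aggregate_predictions(predictions)
for (human_gene, ortholog), resources in sorted(support.items()):
    print(human_gene, ortholog, sorted(resources))
```

Keeping the set of supporting resources per gene pair, rather than a bare count, makes it possible to see later which resource disagreed.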


HCOP: HGNC Comparison of Orthology Predictions tool

Gene naming across vertebrates

Aim: semi-automated naming of consensus 1:1 orthologs

Pilot project on chimpanzee (Pan troglodytes)
• high similarity to human
• well assembled genome

Consensus orthologs in chimp as identified by 4 comprehensive orthology resources:

[Venn diagram of the Ensembl, Panther, NCBI and OMA ortholog sets; region counts as extracted: 5837, 375, 545, 96, 296, 2915, 10,834, 1153, 4555, 100, 92, 68, 229]

Integrated mapping tool


Issues identified in chimp pilot study

• Multi-copy gene families: exclude olfactory receptors, cytochrome P450s, HLA, histones
• Symbols mapped to non-syntenic pseudogenes
• Identifier mapping issues
• Locus type conflicts
• Annotation differences (missing, concatenated etc.)
• Assembly limitations
• BUT comparatively few genuinely incorrect orthology calls

Automated human nomenclature transferal

[Diagram: orthology assertion Gene A = Gene B, checked in four resources]
• ✓ ✓ ✓ ✓ : human nomenclature automatically transferred to orthologous gene if extra checks are passed
• ✓ x ✓ ✓ : gene flagged for investigation by curators
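The transfer rule in the diagram can be written as a small decision function. This is a sketch under stated assumptions rather than the VGNC pipeline itself: the function name, the family labels and the return values are hypothetical, the "extra checks" are reduced to a single boolean, and sending a consensus gene that fails those checks to curators is an assumption the slide does not spell out.

```python
# The four resources compared in the chimp pilot.
RESOURCES = ("Ensembl", "NCBI", "OMA", "Panther")

# Multi-copy families excluded from automated naming (hypothetical labels).
EXCLUDED_FAMILIES = {"olfactory receptor", "cytochrome P450", "HLA", "histone"}


def naming_decision(assertions, family=None, extra_checks_passed=True):
    """Return 'exclude', 'transfer' or 'flag' for one human/ortholog gene pair.

    assertions maps a resource name to True if that resource asserts the
    1:1 orthology; a missing resource counts as no assertion.
    """
    if family in EXCLUDED_FAMILIES:
        return "exclude"
    consensus = all(assertions.get(name, False) for name in RESOURCES)
    if consensus and extra_checks_passed:
        return "transfer"
    # Assumption: anything short of full consensus plus passed checks
    # goes to curators for investigation.
    return "flag"


all_agree = {name: True for name in RESOURCES}
one_disagrees = dict(all_agree, NCBI=False)

print(naming_decision(all_agree))
print(naming_decision(one_disagrees))
print(naming_decision(all_agree, family="histone"))
```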

VGNC gene names for cow, dog and horse

• Consensus 1:1 orthologs added to VGNC in August 2017
• Naming non-consensus orthologs ongoing
• Nightly updates of pipeline
• Additional genes named automatically
• Gene names change in line with human

Current status of VGNC gene naming

Numbers of 1:1 orthologs are given as total approved (awaiting approval).

Species  Assembly         Consensus in      Consensus in 3 of   Total protein coding genes
                          4 resources       the 4 resources     (Ensembl 91 / NCBI Gene)
Chimp    GCA_000001515.5  12,129 (565)      2,952 (517)         23,534 / 23,702
Cow      GCA_000003055.5  10,821 (527)      1,639 (1,860)       19,994 / 23,289
Dog      GCA_000002285.2  11,897 (493)      142 (2,132)         19,856 / 20,424
Horse    GCA_000002305.1  10,142 (454)      935 (2,598)         20,449 / 20,502
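As a rough reading of the status figures, the approved counts can be expressed as a share of each species' protein coding genes. The sketch below assumes that the two consensus columns do not overlap and that the Ensembl 91 total is a reasonable denominator; neither assumption is stated on the slide, and the function name is hypothetical.

```python
# (approved with consensus in 4 resources,
#  approved with consensus in 3 of 4 resources,
#  Ensembl 91 protein coding genes), copied from the status table.
status = {
    "Chimp": (12129, 2952, 23534),
    "Cow": (10821, 1639, 19994),
    "Dog": (11897, 142, 19856),
    "Horse": (10142, 935, 20449),
}


def named_fraction(approved_4, approved_3, total):
    """Fraction of protein coding genes with an approved consensus-based name."""
    return (approved_4 + approved_3) / total


# Total approved consensus orthologs per species.
named = {species: a4 + a3 for species, (a4, a3, _) in status.items()}

for species, counts in status.items():
    share = 100 * named_fraction(*counts)
    print(f"{species}: {named[species]} approved, {share:.1f}% of Ensembl 91 total")
```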

vertebrate.genenames.org

Identification of unnamed human genes

Published as “Familial obliterative portal venopathy (FOPV)”


Collaborations with experts:

• Cytochrome P450s
  • David Nelson, University of Tennessee
  • Jed Goldstone, Woods Hole Oceanographic Institution
• Olfactory receptors
  • Doron Lancet & Tsviya Olender, HORDE database, Weizmann Institute of Science
• UDP glucuronosyltransferases (UGTs) and Glutathione S-transferases (GSTs)
  • Jed Goldstone, Woods Hole Oceanographic Institution
• Zinc Fingers
• Histone genes

Keratins named by VGNC automatic pipeline

67 human keratin genes in 2 clusters:
• c17: 28 protein coding; 4 pseudogenes
• c12: 27 protein coding; 8 pseudogenes

Named by VGNC pipeline:
• chimp 42
• cow 29
• horse 24
• dog 20

Legend: shading marks symbols named automatically by VGNC; red text marks pseudogenes. Horse and dog symbols from Balmer et al. PLoS, 2017.

In both tables each row lists symbols in the column order given. Blank cells were lost in extraction, so rows with fewer than five entries cannot be aligned to species.

Keratin gene cluster I
Columns: Human (c17), Chimp (c17), Cow (c19), Horse (c11), Dog (c9)

KRT222 KRT222 KRT222
KRT24 KRT24 KRT24 KRT24 KRT24
KRT223P - -
KRT25 KRT25 KRT25 KRT25 KRT25
KRT26 KRT26 KRT26 KRT26 KRT26
KRT27 KRT27 KRT27 KRT27 KRT27
KRT28 KRT28 KRT28 KRT28 KRT28
KRT10 KRT10 KRT10 KRT10A KRT10
- KRT10B -
KRT12 KRT12 KRT12 KRT12 KRT12
KRT20 KRT20 KRT20 KRT20
KRT23 KRT23 KRT23 KRT23
KRT39 KRT39 KRT39 KRT39 KRT39
KRT40 KRT40 KRT40 KRT40 KRT40
KRT33A KRT33A KRT33A KRT33A
KRT33B KRT33B KRT33B
KRT34 KRT34 KRT34
KRT31 KRT31 KRT31
KRT41P KRT41P KRT41
KRT37 KRT37 KRT37 KRT37
KRT38 KRT38 - KRT38
KRT43P - -
KRT32 KRT32 KRT32 KRT32 KRT32
KRT35 KRT35 KRT35 KRT35 KRT35
KRT36 KRT36 KRT36 KRT36 KRT36
KRT13 KRT13 KRT13 KRT13
KRT15 KRT15 KRT15 KRT15
KRT19 KRT19 KRT19 KRT19 KRT19
KRT9 KRT9 KRT9 KRT9P KRT9
KRT14 KRT14 KRT14 KRT14
KRT16 KRT16 KRT16 KRT16
KRT17 KRT17 KRT17 KRT17
KRT42P KRT42 KRT42

Keratin gene cluster II
Columns: Human (c12), Chimp (c12), Cow (c5), Horse (c6), Dog (c27)

KRT80 KRT80 KRT80 KRT80 KRT80
KRT7 KRT7 KRT7 KRT7 KRT7
KRT87P KRT87P KRT87P
KRT88P KRT88 KRT88
- KRT129P -
KRT81 KRT81 KRT81 KRT81
KRT86 KRT86 KRT86 KRT86
KRT83 KRT83 KRT83 KRT83
KRT89P KRT89 KRT89
KRT85 KRT85 KRT85 KRT85
KRT84 KRT84 KRT84 KRT84 KRT84
KRT82 KRT82 KRT82 KRT82 KRT82
KRT90P KRT124 KRT124
KRT75 KRT75 KRT75 KRT75
- KRT6P -
KRT6B KRT6B KRT6B
KRT6C KRT6C -
KRT6A KRT6A KRT6A KRT6A
KRT5 KRT5 KRT5
KRT71 KRT71 KRT71 KRT71 KRT71
KRT74 KRT74 KRT74 KRT74 KRT74
KRT72 KRT72 KRT72 KRT72 KRT72
KRT73 KRT73 KRT73 KRT73 KRT73
KRT128P KRT128 -
KRT2 KRT2 KRT2A KRT2
- KRT2B -
- KRT2C -
KRT1 KRT1 KRT1 KRT1
KRT77 KRT77 KRT77 KRT77
KRT126P KRT126P KRT126
KRT127P - KRT127
KRT125P - -
KRT76 KRT76 KRT76 KRT76
KRT3 KRT3 KRT3 KRT3
KRT4 KRT4 KRT4 KRT4 KRT4
KRT79 KRT79 KRT79 KRT79 KRT79
KRT78 KRT78 KRT78 KRT78 KRT78
KRT8 KRT8 KRT8 KRT8
KRT18 KRT18 KRT18 KRT18 KRT18


Nomenclature inconsistency within and between resources

Bos_taurus_UMD_3.1.1 NCBI Gene annotation release 105

VGNC symbols now: KRT33A KRT33B KRT34 KRT31

Bos_taurus_UMD_3.1 Ensembl release 91

Annotation inconsistencies identified

• Chimp KRT5 and KRT6A concatenated in Ensembl
• Cow KRT13 not annotated in Ensembl

Pseudogenes require manual curation

[Figure: human and cow loci compared; label "Now KRT89"]

Naming across species

• Majority of cow and chimp KRTs now have VGNC symbols
• Some unresolved orthology
• Assembly/annotation issue
• Clashes with agreed naming rules: contact us before publishing

Legend: shading distinguishes symbols named automatically by VGNC, named manually by VGNC, and name unresolved; bold marks a VGNC name that differs from Ensembl or NCBI Gene; red text marks pseudogenes.

Rows marked * have one or more blank cells that were lost in extraction; their entries are listed in source order and may not sit under the correct species.

Keratin gene cluster I

Human     Chimp     Cow            Horse     Dog
(c17)     (c17)     (c19)          (c11)     (c9)
KRT222    KRT222    KRT222         KRT222    KRT222
KRT24     KRT24     KRT24          KRT24     KRT24
KRT223P   -         -              -         -
KRT25     KRT25     KRT25          KRT25     KRT25
KRT26     KRT26     KRT26          KRT26     KRT26
KRT27     KRT27     KRT27          KRT27     KRT27
KRT28     KRT28     KRT28          KRT28     KRT28
KRT10     KRT10     KRT10          KRT10A    KRT10
-         -         -              KRT10B    -
KRT12     KRT12     KRT12          KRT12     KRT12
KRT20     KRT20     KRT20          KRT20     KRT20
KRT23     KRT23     KRT23          KRT23     KRT23
KRT39     KRT39     KRT39          KRT39     KRT39
KRT40     KRT40     KRT40          KRT40     KRT40
KRT33A    KRT33A    KRT33A         KRT33A    KRT33A
KRT33B    KRT33B    KRT33B         KRT33B    KRT33B
KRT34     KRT34     KRT34          KRT34     KRT34
KRT31     KRT31     KRT31          KRT31     KRT31
KRT41P KRT41 KRT41P KRT41 *
KRT37     KRT37     LOC515000      KRT37     KRT37
KRT38     KRT38     LOC107131523   -         KRT38
KRT43P - - - *
KRT32     KRT32     KRT32          KRT32     KRT32
KRT35     KRT35     KRT35          KRT35     KRT35
KRT36     KRT36     KRT36          KRT36     KRT36
KRT13     KRT13     KRT13          KRT13     KRT13
KRT15     KRT15     KRT15          KRT15     KRT15
KRT19     KRT19     KRT19          KRT19     KRT19
KRT9      KRT9      KRT9           KRT9P     KRT9
KRT14     KRT14     KRT14          KRT14     KRT14
KRT16     KRT16     KRT16          KRT16     KRT16
KRT17     KRT17     KRT17          KRT17     KRT17
KRT42P    KRT42     KRT42          KRT42     KRT42
LOC100336907 *

Keratin gene cluster II

Human     Chimp            Cow            Horse     Dog
(c12)     (c12)            (c5)           (c6)      (c27)
KRT80     KRT80            KRT80          KRT80     KRT80
KRT7      KRT7             KRT7           KRT7      KRT7
KRT87P    KRT87            LOC507184      KRT87P    KRT87P
KRT88P    KRT88P           LOC787600      KRT88     KRT88
- LOC615451 KRT129P - *
KRT81     KRT81            KRT81          KRT81     KRT81
KRT86     KRT86            KRT86          KRT86     KRT86
KRT83     KRT83            KRT83          KRT83     KRT83
KRT89P    KRT89P           KRT89          KRT89     KRT89
KRT85     KRT85            KRT85          KRT85     KRT85
KRT84     KRT84            KRT84          KRT84     KRT84
KRT82     KRT82            KRT82          KRT82     KRT82
KRT90P    KRT90P/KRT124P   KRT90/KRT124   KRT124    KRT124
KRT75     KRT75            KRT75          KRT75     KRT75
-         -                -              KRT6P     -
KRT6B     KRT6B            KRT6B          KRT6B     KRT6B
KRT6C     KRT6C            KRT6C          KRT6C     -
KRT6A     KRT6A            KRT6A          KRT6A     KRT6A
KRT5      KRT5             KRT5           KRT5      KRT5
KRT71     KRT71            KRT71          KRT71     KRT71
KRT74     KRT74            KRT74          KRT74     KRT74
KRT72     KRT72            KRT72          KRT72     KRT72
KRT73     KRT73            KRT73          KRT73     KRT73
KRT128P   -                -              KRT128    -
KRT2      KRT2             KRT2           KRT2A     KRT2
-         -                -              KRT2B     -
- - KRT2C - *
KRT1      KRT1             KRT1           KRT1      KRT1
KRT77     KRT77            KRT77          KRT77     KRT77
KRT126P   -                -              KRT126P   KRT126
KRT127P   -                -              -         KRT127
KRT125P   -                -              -         -
KRT76     KRT76            KRT76          KRT76     KRT76
KRT3      KRT3             KRT3           KRT3      KRT3
KRT4      KRT4             KRT4           KRT4      KRT4
KRT79     KRT79            KRT79          KRT79     KRT79
KRT78     KRT78            KRT78          KRT78     KRT78
KRT8      KRT8             KRT8           KRT8      KRT8
KRT18     KRT18            KRT18          KRT18     KRT18


Future plans with VGNC gene naming

• Name remaining chimp, cow, dog and horse protein coding genes
• Curate names for gene families across multiple species at the same time
• Prioritize new species for naming based on:
  • Quality of genome sequence and annotations
  • Value of the species in terms of research
  • Level of interest/input from the research community

Contact us at [email protected]

Acknowledgements

HGNC/VGNC Team
• Elspeth Bruford
• Ruth Seal
• Bethan Yates (Poster P0051)
• Susan Tweedie
• Kristian Gray
• Bryony Braschi
• Paul Denny
• Tamsin Jones

Contact us at [email protected]

http://vertebrate.genenames.org
