Standardizing Gene Product Nomenclature—A Call To
OPINION

Standardizing gene product nomenclature—a call to action

Kenji Fujiyoshi (a,b), Elspeth A. Bruford (c,d,1), Pawel Mroz (e), Cynthe L. Sims (f,g,h), Timothy J. O’Leary (i,j,1), Anthony W. I. Lo (k), Neng Chen (l), Nimesh R. Patel (m), Keyur Pravinchandra Patel (n), Barbara Seliger (o), Mingyang Song (a,p,q,r), Federico A. Monzon (s), Alexis B. Carter (t), Margaret L. Gulley (u), Susan M. Mockus (v), Thuy L. Phung (w), Harriet Feilotter (x,y), Heather E. Williams (z,aa), and Shuji Ogino (a,bb,cc,dd,1)

Author contributions: S.O. conceived and designed the overall project. K.F. and S.O. made the initial draft and created the table. All authors edited the manuscript, provided constructive feedback, and approved the final version. K.F., E.A.B., P.M., and C.L.S. contributed equally as co-first authors. S.M.M., T.L.P., H.F., H.E.W., and S.O. contributed equally as co-last authors.

Published under the PNAS license.

Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and have not been endorsed by the National Academy of Sciences. The opinions are those of the authors and are not to be construed as official or as representing the views of their institutions or governments.

Use of Standardized Official Symbols: We use HGNC (HUGO Gene Nomenclature Committee) approved symbols and root symbols for genes and gene families, including ABL1, ACE, ACE2, ACTA2, ARHGEF7, ARPC5, ASCC1, BCR, CD2, CD14, CD40, CDKN1A, CDKN1B, CDKN2A, CKAP4, CKB, CKM, CLN6, CLN8, COX8A, DCTN2, EDEM, EREG, ESR1, ESR2, FLNB, H3P16, IFI27, IL2RA, INS, ISG20, KLK3, KMT, KMT2B, KMT2D, KRT, MT-CO2, MTTP, NFKB1, NKX2-1, NPEPPS, NSG1, NXF1, PDCD1, PGR, PIK3CA, POLD2, PRH2, PSAT1, PSMD9, PTGS2, SEC14L2, SMN1, SNCA, SPATA2, STAT3, TAP1, TCEAL1, TMEM37, TPRG1, TP63, TTF1, USO1, and ZNRD2; all of which are described at www.genenames.org. The official gene symbols are italicized to differentiate from nonitalicized gene product names, gene root/stem symbols, and nonofficial names.

1. To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2025207118/-/DCSupplemental.

Published January 6, 2021. PNAS 2021 Vol. 118 No. 3 e2025207118. https://doi.org/10.1073/pnas.2025207118

The current lack of a standardized nomenclature system for gene products (e.g., proteins) has resulted in a haphazard counterproductive system of labeling. Different names are often used for the same gene product; the same name is sometimes used for unrelated gene products. Such ambiguity causes not only potential harm to patients, whose treatments increasingly rely on laboratory tests for multiple gene products, but also miscommunication and inefficiency, both of which hinder progress of broad scientific fields. To mitigate this confusion, we recommend standardizing human protein nomenclature through the use of a Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) gene symbol accompanied by its unique HGNC ID. We call for action across all biomedical communities and scientific and medical journals to standardize nomenclature of gene products using HGNC gene symbols to enhance accuracy in scientific and public communication.

Figure: We call on all biomedical communities and scientific and medical journals to standardize nomenclature of gene products to enhance accuracy in scientific and public communication. Image credit: Shutterstock/greenbutterfly.

Use of gene symbols designated by the HGNC [www.genenames.org (1)] is nearly universal. DNA- and RNA-level sequence variation nomenclature has been standardized to use HGNC gene symbols, the Single Nucleotide Polymorphism database (dbSNP) IDs, and genetic variant nomenclature designated by the Human Genome Variation Society (HGVS) to unambiguously designate variants. In striking contrast to the use of universal identifiers for genes and gene variants, there are no universal identifiers for the peptides and proteins that these genes encode.

Many gene products have multiple nomenclatures in widespread use, and many common nomenclatures are used for multiple gene products. For example, the symbol “PD-1” is shared by multiple unrelated gene products and is used to describe PDCD1, SNCA, and SPATA2 gene products. The PDCD1 (PD-1) protein is a well-known target for cancer immunotherapy. However, the term “anti–PD-1” could mean a therapeutic antibody against products from PDCD1, SNCA, or SPATA2. For another example, “TTF-1” (an abbreviation of “thyroid transcription factor 1”) is used to describe a protein encoded by the NKX2-1 gene, causing confusion with a different gene having the official gene symbol TTF1 (transcription termination factor 1). Using “TTF-1” to describe the NKX2-1 gene product is confusing and potentially harmful, particularly if a drug specifically targeting either of these gene products becomes available.

Clearly there are instances in which use of nonstandardized nomenclature may lead to miscommunication (see Table 1 and SI Appendix, Table 1). At best the current situation results in inefficient communication; at worst it creates harm from misunderstanding test results and making inappropriate therapeutic choices. Names matter. Clearly, a better naming convention that eliminates this type of ambiguity is necessary.

Our goal is to prevent the harms caused by ambiguous gene product nomenclature. The National Academy of Medicine (2, 3) and the Institute for Safe Medication Practices [ISMP (4)] report alarming data on the cause and impact of medical errors in the United States, with more than 100,000 medical errors reported annually. To mitigate errors caused by misreading medical abbreviations, ISMP and Davis et al. are working on recommendations to avoid uncommon or ambiguous abbreviations (5, 6). We assert that standardizing nomenclature of gene products will synergize with the efforts for reducing medical errors.

Nomenclature for Genes and Gene Products

The 1979 official guidelines for human gene naming remain the basis of gene nomenclature assigned by the HGNC today (7, 8). Their universal, unambiguous nomenclature system for genes was widely adopted with immense value in facilitating communication. In parallel, the International Union of Immunological Societies subcommittees are responsible for naming, for example, Cluster of Differentiation (CD) molecules, immunoglobulins, T cell receptors, and interleukins (9). The Enzyme Commission [EC (10)] designates “accepted names” for enzymes, and the Nomenclature Committee of the International Union of Basic and Clinical Pharmacology (NC-IUPHAR) names biological targets (11). The UniProt Knowledgebase (UniProtKB) (12) includes recommended protein [...] cellular localizations are discovered. Although HGNC actively collaborates with all of these protein naming committees when assigning gene nomenclature, the absence of a single universally accepted protein nomenclature, combined with multiple independent groups creating varied naming systems based on their preferences, is both startling and potentially dangerous. We believe it is critical that protein nomenclature avoid this significant pitfall.

Official Gene Symbols

Several groups, both private (www.biosciencewriters.com/Guidelines-for-Formatting-Gene-and-Protein-Names.aspx) and public (www.ncbi.nlm.nih.gov/genome/doc/internatprot_nomenguide/), have previously recommended using HGNC-approved gene symbols for gene product identification. HGNC further recommends the usage of italics for symbols denoting genes, mRNAs, and alleles to differentiate them from proteins. This approach, which has not been universally adopted, is attractive for several reasons.

HGNC gene symbols are already widely adopted across all major genomic resources. These symbols are standardized by HGNC rules, and HGNC collaborates with UniProt and all of the external protein nomenclature resources mentioned herein (www.genenames.org/help/symbol-report/). There is a one-to-one pairing of a given gene and its official symbol with a unique HGNC ID, with no ambiguity or confusion. The central dogma of biology dictates that coding DNA gives rise to RNA, then to peptide and protein. Although this central dogma does not cover all biological truths, it provides a strong rationale to name gene products using HGNC-approved gene symbols. HGNC IDs remain stable, even if HGNC gene symbols are updated. Using HGNC gene symbols with common colloquial names (if any) and HGNC ID in parenthesis (e.g., PDCD1 protein [PD-1; HGNC: 8760]) can eliminate nomenclature ambiguity.

For a specific gene product isoform, a supplementary unique UniProt ID can be added after the HGNC ID (e.g., PDCD1 protein [PD-1; HGNC: 8760; UniProt: Q15116-1]). UniProt assigns each isoform a unique ID composed of the primary UniProt accession plus a dash and a number. This strict combination of the gene symbol accompanied by HGNC ID and UniProt ID avoids confusion with a small effort. This approach takes advantage of the efforts of the HGNC to assure unique gene identification. In this scenario, italics signifies a gene symbol, whereas nonitalics signifies the encoded protein(s). Thus, the italic term PDCD1 indicates the PDCD1 gene, whereas the nonitalic term PDCD1 indicates the PDCD1 protein, which may be accompanied by nonofficial names such as PD-1 in parenthesis. To eliminate ambiguity, nonofficial names such as PD-1 should not be used without official symbols.
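
The proposed label (official gene symbol, optional colloquial name, HGNC ID, and optional UniProt isoform ID) is regular enough to be built and checked by software. The following is a minimal sketch in Python, not part of the original article: the class name `GeneProductLabel`, the function `parse_label`, and the regular expression are hypothetical illustrations, and the pattern is deliberately loose rather than an official HGNC or UniProt grammar. The only data used are the PDCD1 examples given in the text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Loose pattern for "SYMBOL protein [colloquial; HGNC: n; UniProt: ACC-k]".
# Assumption: an illustrative grammar only, not an official HGNC/UniProt one.
LABEL_RE = re.compile(
    r"^(?P<symbol>[A-Za-z0-9-]+) protein \["
    r"(?:(?P<colloquial>[^;\]]+); )?"                  # optional nonofficial name
    r"HGNC: (?P<hgnc_id>\d+)"                          # required HGNC ID
    r"(?:; UniProt: (?P<isoform>[A-Z0-9]+-\d+))?"      # optional accession-dash-number
    r"\]$"
)


@dataclass(frozen=True)
class GeneProductLabel:
    """Hypothetical container for the label format proposed in the text."""
    symbol: str                             # HGNC-approved gene symbol, e.g. "PDCD1"
    hgnc_id: int                            # numeric HGNC ID, e.g. 8760
    colloquial: Optional[str] = None        # nonofficial name, e.g. "PD-1"
    uniprot_isoform: Optional[str] = None   # isoform ID, e.g. "Q15116-1"

    def __str__(self) -> str:
        parts = []
        if self.colloquial:
            parts.append(self.colloquial)
        parts.append(f"HGNC: {self.hgnc_id}")
        if self.uniprot_isoform:
            parts.append(f"UniProt: {self.uniprot_isoform}")
        return f"{self.symbol} protein [" + "; ".join(parts) + "]"


def parse_label(text: str) -> Optional[GeneProductLabel]:
    """Return a label if `text` follows the proposed format, else None."""
    match = LABEL_RE.match(text.strip())
    if match is None:
        return None
    return GeneProductLabel(
        symbol=match["symbol"],
        hgnc_id=int(match["hgnc_id"]),
        colloquial=match["colloquial"],
        uniprot_isoform=match["isoform"],
    )


if __name__ == "__main__":
    label = GeneProductLabel("PDCD1", 8760, "PD-1", "Q15116-1")
    print(label)               # PDCD1 protein [PD-1; HGNC: 8760; UniProt: Q15116-1]
    print(parse_label("PD-1")) # None: a nonofficial name alone is rejected
```

In this sketch a bare nonofficial name such as "PD-1" fails to parse, which mirrors the recommendation that nonofficial names should not appear without the official symbol and HGNC ID.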