
OPINION

Standardizing gene product nomenclature—a call to action

Kenji Fujiyoshi (a,b), Elspeth A. Bruford (c,d,1), Pawel Mroz (e), Cynthe L. Sims (f,g,h), Timothy J. O’Leary (i,j,1), Anthony W. I. Lo (k), Neng Chen (l), Nimesh R. Patel (m), Keyur Pravinchandra Patel (n), Barbara Seliger (o), Mingyang Song (a,p,q,r), Federico A. Monzon (s), Alexis B. Carter (t), Margaret L. Gulley (u), Susan M. Mockus (v), Thuy L. Phung (w), Harriet Feilotter (x,y), Heather E. Williams (z,aa), and Shuji Ogino (a,bb,cc,dd,1)

The current lack of a standardized nomenclature system for gene products (e.g., proteins) has resulted in a haphazard counterproductive system of labeling. Different names are often used for the same gene product; the same name is sometimes used for unrelated gene products. Such ambiguity causes not only potential harm to patients, whose treatments increasingly rely on laboratory tests for multiple gene products, but also miscommunication and inefficiency, both of which hinder progress of broad scientific fields. To mitigate this confusion, we recommend standardizing human protein nomenclature through the use of a Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) gene symbol accompanied by its unique HGNC ID. We call for action across all biomedical communities and scientific and medical journals to standardize nomenclature of gene products using HGNC gene symbols to enhance accuracy in scientific and public communication.

Use of gene symbols designated by the HGNC [www.genenames.org (1)] is nearly universal. DNA- and RNA-level sequence variation nomenclature has been standardized to use HGNC gene symbols, the Single Nucleotide Polymorphism database (dbSNP) IDs, and genetic variant nomenclature designated by the Human Genome Variation Society (HGVS) to unambiguously designate variants. In striking contrast to the use of universal identifiers for genes and gene variants,

We call on all biomedical communities and scientific and medical journals to standardize nomenclature of gene products to enhance accuracy in scientific and public communication. Image credit: Shutterstock/greenbutterfly.

Author contributions: S.O. conceived and designed the overall project. K.F. and S.O. made the initial draft and created the table. All authors edited the manuscript, provided constructive feedback, and approved the final version. K.F., E.A.B., P.M., and C.L.S. contributed equally as co-first authors. S.M.M., T.L.P., H.F., H.E.W., and S.O. contributed equally as co-last authors.

Published under the PNAS license.

Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and have not been endorsed by the National Academy of Sciences. The opinions are those of the authors and are not to be construed as official or as representing the views of their institutions or governments.

Use of Standardized Official Symbols: We use HGNC (HUGO Gene Nomenclature Committee) approved symbols and root symbols for genes and gene families, including ABL1, ACE, ACE2, ACTA2, ARHGEF7, ARPC5, ASCC1, BCR, CD2, CD14, CD40, CDKN1A, CDKN1B, CDKN2A, CKAP4, CKB, CKM, CLN6, CLN8, COX8A, DCTN2, EDEM, EREG, ESR1, ESR2, FLNB, H3P16, IFI27, IL2RA, INS, ISG20, KLK3, KMT, KMT2B, KMT2D, KRT, MT-CO2, MTTP, NFKB1, NKX2-1, NPEPPS, NSG1, NXF1, PDCD1, PGR, PIK3CA, POLD2, PRH2, PSAT1, PSMD9, PTGS2, SEC14L2, SMN1, SNCA, SPATA2, STAT3, TAP1, TCEAL1, TMEM37, TPRG1, TP63, TTF1, USO1, and ZNRD2, all of which are described at www.genenames.org. The official gene symbols are italicized to differentiate from nonitalicized gene product names, gene root/stem symbols, and nonofficial names.

1 To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2025207118/-/DCSupplemental.

Published January 6, 2021.

PNAS 2021 Vol. 118 No. 3 e2025207118 https://doi.org/10.1073/pnas.2025207118

there are no universal identifiers for the proteins that these genes encode.

Many gene products have multiple names in widespread use, and many common nomenclatures are used for multiple gene products. For example, the symbol “PD-1” is shared by multiple unrelated gene products and is used to describe PDCD1, SNCA, and SPATA2 gene products. The PDCD1 (PD-1) protein is a well-known target for cancer immunotherapy. However, the term “anti–PD-1” could mean a therapeutic antibody against products from PDCD1, SNCA, or SPATA2. For another example, “TTF-1” (an abbreviation of “thyroid transcription factor 1”) is used to describe a protein encoded by the NKX2-1 gene, causing confusion with a different gene having the official gene symbol TTF1 (transcription termination factor 1). Using “TTF-1” to describe the NKX2-1 gene product is confusing and potentially harmful, particularly if a drug specifically targeting either of these gene products becomes available.

Clearly there are instances in which use of nonstandardized nomenclature may lead to miscommunication (see Table 1 and SI Appendix, Table 1). At best the current situation results in inefficient communication; at worst it creates harm from misunderstanding test results and making inappropriate therapeutic choices. Names matter. Clearly, a better naming convention that eliminates this type of ambiguity is necessary.

Our goal is to prevent the harms caused by ambiguous gene product nomenclature. The National Academy of Medicine (2, 3) and the Institute for Safe Medication Practices [ISMP (4)] report alarming data on the cause and impact of medical errors in the United States, with more than 100,000 medical errors reported annually. To mitigate errors caused by misreading medical abbreviations, ISMP and Davis et al. are working on recommendations to avoid uncommon or ambiguous abbreviations (5, 6). We assert that standardizing nomenclature of gene products will synergize with the efforts for reducing medical errors.

Nomenclature for Genes and Gene Products

The 1979 official guidelines for human gene naming remain the basis of gene nomenclature assigned by the HGNC today (7, 8). Their universal, unambiguous nomenclature system for genes was widely adopted with immense value in facilitating communication. In parallel, the International Union of Immunological Societies subcommittees are responsible for naming, for example, Cluster of Differentiation (CD) molecules, immunoglobulins, T cell receptors, and interleukins (9). The Enzyme Commission [EC (10)] designates “accepted names” for enzymes, and the Nomenclature Committee of the International Union of Basic and Clinical Pharmacology (NC-IUPHAR) names biological targets (11). The UniProt Knowledgebase (UniProtKB) (12) includes recommended protein names as well as functional, taxonomic, and structural information.

However, current protein naming systems based on different aspects of structure, function, and cellular localization readily conflict, and they may make less sense if new functions, alternative structures, or novel tissue and cellular localizations are discovered. Although HGNC actively collaborates with all of these protein naming committees when assigning gene nomenclature, the absence of a single universally accepted protein nomenclature, combined with multiple independent groups creating varied naming systems based on their preferences, is both startling and potentially dangerous. We believe it is critical that protein nomenclature avoid this significant pitfall.

Official Gene Symbols

Several groups, both private (www.biosciencewriters.com/Guidelines-for-Formatting-Gene-and-Protein-Names.aspx) and public (www.ncbi.nlm.nih.gov/genome/doc/internatprot_nomenguide/), have previously recommended using HGNC-approved gene symbols for gene product identification. HGNC further recommends the usage of italics for symbols denoting genes, mRNAs, and alleles to differentiate them from proteins. This approach, which has not been universally adopted, is attractive for several reasons.

HGNC gene symbols are already widely adopted across all major genomic resources. These symbols are standardized by HGNC rules, and HGNC collaborates with UniProt and all of the external protein nomenclature resources mentioned herein (www.genenames.org/help/symbol-report/). There is a one-to-one pairing of a given gene and its official symbol with a unique HGNC ID, with no ambiguity or confusion. The central dogma of molecular biology dictates that coding DNA gives rise to RNA, then to protein. Although this central dogma does not cover all biological truths, it provides a strong rationale to name gene products using HGNC-approved gene symbols. HGNC IDs remain stable, even if HGNC gene symbols are updated. Using HGNC gene symbols with common colloquial names (if any) and HGNC ID in parenthesis (e.g., PDCD1 protein [PD-1; HGNC: 8760]) can eliminate nomenclature ambiguity.

For a specific gene product isoform, a supplementary unique UniProt ID can be added after the HGNC ID (e.g., PDCD1 protein [PD-1; HGNC: 8760; UniProt: Q15116-1]). UniProt assigns each isoform a unique ID composed of the primary UniProt accession plus a dash and a number. This strict combination of the gene symbol accompanied by HGNC ID and UniProt ID avoids confusion with a small effort. This approach takes advantage of the efforts of the HGNC to assure unique gene identification. In this scenario, italics signifies a gene symbol, whereas nonitalics signifies the encoded protein(s). Thus, the italic term PDCD1 indicates the PDCD1 gene, whereas the nonitalic term PDCD1 indicates the PDCD1 protein, which may be accompanied by nonofficial names such as PD-1 in parenthesis. To eliminate ambiguity, nonofficial names such as PD-1 should not be used without official symbols.

Furthermore, the Vertebrate Gene Nomenclature Committee (VGNC, vertebrate.genenames.org/), which is a sister project to the HGNC, is responsible for assigning standardized nomenclature to genes in key vertebrate species that lack a nomenclature authority. The VGNC coordinates closely with all existing

vertebrate gene nomenclature committees, namely the mouse, rat, chicken, xenopus, and zebrafish committees (see SI Appendix, Table 2), to ensure that vertebrate genes are named in line with their human homologs. The VGNC additionally approves gene nomenclature for other selected vertebrates (vertebrate.genenames.org/about/species-list/). Additionally, invertebrates, including important model species such as Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode), Saccharomyces cerevisiae (baker’s yeast), and Schizosaccharomyces pombe (fission yeast), also have established naming committees associated with their databases (see SI Appendix, Table 2). Therefore, the concept of using the approved gene symbol, along with a database ID (and UniProt ID where required), for unambiguous gene product naming is clearly applicable to numerous other species and should be encouraged wherever possible.

Fig. 1. Roadmap to universal implementation of a gene product nomenclature system. Steps to implement an advanced gene product nomenclature system: (i) form a working group consisting of experts from diverse fields, (ii) develop a standardized nomenclature system and usage guidelines, (iii) implement the system, (iv) receive feedback, and (v) improve the standardized nomenclature system and usage guidelines.

Gene Symbol Versus Gene Name

Each gene has one unique long “gene name” and one unique short “gene symbol,” both of which are officially approved by the HGNC. Routinely, the gene symbol is an abbreviation of the gene name.

We recommend use of symbols over names for a multitude of reasons. First, gene symbols are much easier to remember than gene names (e.g., PIK3CA is the symbol for the name “phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha”). Second, changes to gene names have historically occurred more often than symbol changes, and continuity is a valuable means of reducing confusion. Third, a space, hyphen, or comma in a gene name may create ambiguity over where the name ends. Fourth, names may contain words with differential spellings in British and American English; e.g., “oestrogen receptor 1” (nonofficial name) versus “estrogen receptor 1” (official gene name); however, there is only one HGNC-approved symbol: ESR1. Lastly, gene names may be mistranslated by computer software and artificial intelligence. For all of these reasons, the use of official HGNC gene symbols to represent protein products promotes accurate, efficient communication.

More Nomenclature Challenges

The biology of peptides and proteins is very complicated. There are many protein complexes consisting of multiple different peptides, often with other biomolecules. One gene may encode multiple isoforms, which may have structural variants or undergo posttranscriptional/posttranslational modifications. Several peptides are derived from “INS (HGNC: 6081)”; however, UniProt IDs do not currently discriminate between these different entities. Furthermore, the use of prefixes on established symbols, such as “sCD14” to denote “soluble CD14” or “pSTAT3” to denote “phosphorylated STAT3,” can cause confusion. Although dealing with this complexity requires effort beyond the simple use of gene symbols, naming multiple products derived from one gene based on the gene symbol provides a feasible starting point for an unequivocal nomenclature system. We propose developing rules for the unique identification of gene product variants as the next step to standardize gene product nomenclature.

Although some scientists, healthcare providers, and patients may be unfamiliar with certain gene symbols, the widespread use of gene symbols to describe gene products in the literature and medical records should eventually promote their familiarity. Endorsement of a standard nomenclature system by the World Association of Medical Editors and the International Committee of Medical Journal Editors will help educate authors, reviewers, and readers. It is also crucial that this standard is implemented and enforced by journals through their editorial boards and copyeditors. Considering the impact of published scientific data on the scientific process, one could argue that editors have an enhanced obligation to assist with the process of diminishing confusion and misstatements in scientific findings. The potential harms in not adopting such a system far outweigh the short-term discomfort of learning a new naming convention. Additionally, we assert that implementing standardized gene product nomenclature will accelerate clinical and epidemiological research using electronic health records and other healthcare information systems.

Seeking Solutions

We propose the universal use of HGNC-approved gene symbols, along with HGNC IDs, UniProt IDs, and common colloquial names (when appropriate), to unambiguously identify gene products and reduce the potential for serious harm arising from mistaken identification of gene products. The lack of a single unifying protein nomenclature system continues to confuse scientists, clinicians, and the public. This confusion impedes communication, data sharing, and scientific progress. Given the wealth of data on alterations of specific gene products in various diseases and increasing importance of treatment approaches targeting proteins and peptides, the use of nonstandardized

Table 1. Examples of gene products with common ambiguous and/or confusing nomenclature

| HGNC-approved official symbol | HGNC ID | UniProt ID | HGNC-approved gene name | Other names | Other genes and gene products that have similar (or the same) official symbols and/or colloquial names* | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ACE2 | 13557 | Q9BYF1 | angiotensin I converting enzyme 2 | ACE | ACE (angiotensin I converting enzyme) | The term “ACE inhibitor” has been loosely used for an inhibitor of ACE2, especially in the context of the COVID-19 pandemic. Currently, the term “ACE inhibitor” can mean either an ACE2 inhibitor (e.g., C16) or a conventional ACE inhibitor (e.g., captopril or lisinopril). However, ACE2 and ACE are different gene products. |
| ERBB2 | 3430 | P04626 | erb-b2 receptor tyrosine kinase 2 | HER2, HER-2, NEU | NEU1 (neuraminidase 1, NEU), NEU (root symbol for neuraminidases) | Trastuzumab is a monoclonal antibody targeting the ERBB2 (HER2) protein for treatment of certain cancers. |
| ESR1 | 3467 | P03372 | estrogen receptor 1 | estrogen receptor alpha, ER | EREG (epiregulin, ER), CLN6 (CLN6 transmembrane ER protein), CLN8 (CLN8 transmembrane ER and ERGIC protein), EDEM (ER degradation enhancing alpha-mannosidase like protein) | “ER” can stand for “estrogen receptor” and “endoplasmic reticulum.” The “ER” term is commonly used in clinical practice. “ESR1” and “ESR2” should be distinguished to avoid ambiguity. |
| KLK3 | 6364 | P07288 | kallikrein related peptidase 3 | prostate specific antigen, PSA | NPEPPS (aminopeptidase puromycin sensitive, PSA), PSAT1 (phosphoserine aminotransferase 1, PSA) | The term “PSA test” is a commonly used test for prostate cancer in the literature. |
| NKX2-1 | 11825 | P43699 | NK2 homeobox 1 | thyroid transcription factor 1, TTF-1, TTF1 | TTF1 (transcription termination factor 1) | “TTF-1” or “TTF1” are commonly used to represent “NKX2-1” in surgical pathology reports. |
| PDCD1 | 8760 | Q15116 | programmed cell death 1 | PD1, PD-1 | SNCA (synuclein alpha, PD-1), SPATA2 (spermatogenesis associated 2, PD-1) | “PD1,” “PD-1,” “anti-PD1,” and “anti-PD-1” are widely used in immunology, oncology, and pathology. |

*In this column, HGNC-approved gene symbols are primarily used with HGNC-approved gene names and/or nonofficial names in a parenthesis.
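The collisions listed in Table 1 can be checked mechanically. The following is a minimal sketch, not part of the authors' proposal: it indexes a few colloquial names from Table 1 by the gene symbols that use them, reports names shared by more than one gene, and builds a label in the "SYMBOL protein [colloquial name; HGNC: ID; UniProt: accession-isoform]" pattern that the article illustrates with PDCD1. The dictionary layout and the function names (`build_alias_index`, `ambiguous_aliases`, `format_gene_product`) are hypothetical and chosen only for illustration; the symbols, IDs, and names are copied from Table 1.

```python
from collections import defaultdict

# Illustrative subset of Table 1 (hypothetical layout):
# official symbol -> (HGNC ID, primary UniProt accession, colloquial names)
TABLE_1 = {
    "PDCD1": (8760, "Q15116", ["PD-1", "PD1"]),
    "KLK3": (6364, "P07288", ["PSA"]),
    "NKX2-1": (11825, "P43699", ["TTF-1", "TTF1"]),
}

# Other genes from Table 1 whose products share a colloquial name.
SHARED_ALIASES = {
    "SNCA": ["PD-1"],
    "SPATA2": ["PD-1"],
    "NPEPPS": ["PSA"],
    "PSAT1": ["PSA"],
}


def build_alias_index():
    """Map each colloquial name to the set of official symbols that use it."""
    index = defaultdict(set)
    for symbol, (_hgnc_id, _accession, aliases) in TABLE_1.items():
        for alias in aliases:
            index[alias].add(symbol)
    for symbol, aliases in SHARED_ALIASES.items():
        for alias in aliases:
            index[alias].add(symbol)
    return index


def ambiguous_aliases(index):
    """Return only the colloquial names claimed by more than one gene symbol."""
    return {
        alias: sorted(symbols)
        for alias, symbols in index.items()
        if len(symbols) > 1
    }


def format_gene_product(symbol, isoform=None):
    """Build a label such as 'PDCD1 protein [PD-1; HGNC: 8760]'.

    When an isoform number is given, the UniProt isoform ID (primary
    accession, a dash, and the number) is appended after the HGNC ID.
    """
    hgnc_id, accession, aliases = TABLE_1[symbol]
    parts = []
    if aliases:
        parts.append(aliases[0])  # first colloquial name, if any
    parts.append(f"HGNC: {hgnc_id}")
    if isoform is not None:
        parts.append(f"UniProt: {accession}-{isoform}")
    return f"{symbol} protein [{'; '.join(parts)}]"
```

With this subset, `ambiguous_aliases(build_alias_index())` flags "PD-1" and "PSA" as shared by three genes each, while `format_gene_product("PDCD1", isoform=1)` reproduces the article's own example label. Keying every record on the official symbol and treating colloquial names as secondary annotations mirrors the recommendation that nonofficial names never appear without the official symbol.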

names for gene products must end. We believe that regular and strict use of gene symbols along with nonofficial legacy names will provide clarity with minimal difficulty.

Ultimately, a more complex protein nomenclature system must be developed to unequivocally distinguish gene product variants emanating from one gene, as well as complexes of biomolecules containing multiple gene products. We propose to form a working group of experts from diverse fields to develop a robust and complete gene product nomenclature system and guidelines for its use (Fig. 1). Support from journals and publishers, together with clinical and public health leaders, is critical for effective implementation. Standardized use of the HGNC-approved gene symbols along with HGNC IDs to identify gene products is an important first step to achieve this ambitious goal.

Acknowledgments

This work was supported in part by grants from the US National Institutes of Health (U24 HG003345 to E.A.B., R35 CA197735 to S.O., R01 CA151993 to S.O., R21 CA230873 to S.O., R01 CA248857 to S.O.) and from the Wellcome Trust UK (208349/Z/17/Z to E.A.B.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

(a) Program in MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02215
(b) Department of Surgery, Kurume University, Kurume, Fukuoka 8300011, Japan
(c) HUGO Gene Nomenclature Committee (HGNC), EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
(d) Department of Haematology, University of Cambridge, Cambridge CB2 0XY, UK
(e) Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN 55455
(f) Torreyana Corp, San Diego, CA 92126
(g) Blackhawk Genomics, Concord, CA 94518
(h) Advagenix, Rockville, MD, 20850
(i) Office of Research and Development, Veterans Health Administration, Washington, DC 20420
(j) Department of Pathology, University of Maryland School of Medicine, Baltimore, MD 21201
(k) Molecular Pathology Laboratory, Division of Anatomical Pathology, Queen Mary Hospital, Hong Kong Special Administrative Region, People’s Republic of China
(l) Department of Molecular , Quest Diagnostics, San Juan Capistrano, CA 92690
(m) Department of Pathology, Rhode Island Hospital and Alpert Medical School at Brown University, Providence, RI 02903
(n) Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
(o) Institute of Medical Immunology, Martin Luther University Halle-Wittenberg, Halle 06112, Germany
(p) Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02215
(q) Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02215
(r) Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA 02215
(s) Castle Biosciences, Friendswood, TX 77546
(t) Pathology and Laboratory Medicine, Children’s Healthcare of Atlanta, Atlanta, GA 30322
(u) Department of Pathology, University of North Carolina, Chapel Hill, NC 27599
(v) Precision Biomarker Laboratories, Cedars-Sinai Medical Center, Los Angeles, CA 90048
(w) Department of Pathology, University of South Alabama, Mobile, AL 36617
(x) Department of Pathology and Molecular Medicine, Queen’s University, Kingston, Ontario, K7L 3N6, Canada
(y) Molecular Diagnostics, Kingston Health Sciences Centre, Kingston, Ontario, K7L 2V7, Canada
(z) Cytogenetics Laboratory, Haematological Malignancy Diagnostic Centre, Viapath at King’s College Hospital, London SE5 9RS, UK
(aa) Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY 10032
(bb) Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115
(cc) Cancer Immunology and Cancer Epidemiology Programs, Dana-Farber Harvard Cancer Center, Boston, MA 02115
(dd) Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142

Competing interest statement: C.L.S. is employed through Torreyana Corp, San Diego, CA; Blackhawk Genomics, Concord, CA; and Advagenix, Rockville, MD. Volunteer activities of C.L.S. include those for College of American Pathologists, Northfield, IL, and Clinical Laboratory Standards Institute, Wayne, PA. T.J.O. is a Member, Scientific Advisory Committee, MioDx and Integrated Nano-Technologies. A.W.I.L.’s laboratory receives sponsorships from AstraZeneca (HK) Ltd. and MSD (HK) Ltd. for providing selected companion diagnostic tests free to public patients. N.C. is an employee at Quest Diagnostics, Inc. F.A.M. is an employee and stock option holder at Castle Biosciences, Inc. A.B.C. is paid teaching faculty for the American Medical Informatics Association Clinical Informatics Board Review Course and receives small honoraria as well as travel reimbursement to speak at multiple scientific and professional medical society meetings. T.L.P. has been consulted (compensated) for Bio-Rad, Inc. and is Director of Pathology Strategies for the Sturge Weber Foundation (compensated). H.E.W. has employment through Viapath, a majority National Health Service-owned independent pathology service provider; has been a paid faculty member at Kingston University, accepted paid accommodation and subsistence as an invited speaker to Cytocell User Group meeting for the United Kingdom and Ireland, and accepted paid event registration as an invited speaker to Digital Pathology/Global Engage meeting. H.E.W.’s laboratory received scholarship funds from The International Council for Standardization in Haematology for providing JAK2 testing. The other authors (K.F., E.A.B., P.M., N.R.P., K.P.P., B.S., M.S., M.L.G., S.M.M., H.F., and S.O.) do not have any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this article.

1. Conventional wisdom. Nat. Genet. 42, 363 (2010).
2. L. T. Kohn, J. M. Corrigan, M. S. Donaldson, Eds., To Err is Human: Building a Safer Health System (National Academies Press, Washington, DC, 2000).
3. Consumer Reports Health, To err is human-to delay is deadly. https://advocacy.consumerreports.org/wp-content/uploads/2013/05/safepatientproject.org-ToDelayIsDeadly.pdf (2009). Accessed 13 November 2020.
4. The Institute for Safe Medication Practices, Medication errors 2018: The year in review. https://www.pharmacypracticenews.com/Review-Articles/Article/10-18/Medication-Errors-2018-The-Year-in-Review/53076?sub=F09D4E1AEB1935236B7DD88EBF3511796624DD9967A417E04A63FAD9FB86&enl=true (2018). Accessed 13 November 2020.
5. N. M. Davis, 1700 medical abbreviations: Convenience at the expense of communications and safety. Hosp. Pharm. 18, 175, 180–182, 186–187 passim (1983).
6. N. M. Davis, MedAbbrev.com. https://medabbrev.com/ (2020). Accessed 13 November 2020.
7. E. A. Bruford et al., Guidelines for human gene nomenclature. Nat. Genet. 52, 754–758 (2020).
8. B. Braschi et al., Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res. 47 (D1), D786–D792 (2019).
9. Proceedings of the 9th International Workshop on Human Leukocyte Differentiation Antigens. March 2010. Barcelona, Spain. Immunol Lett. 134, 103–187 (2011).
10. A. G. McDonald, K. F. Tipton, Fifty-five years of enzyme classification: Advances and difficulties. FEBS J. 281, 583–592 (2014).
11. S. P. H. Alexander et al.; CGTP Collaborators, The concise guide to pharmacology 2019/20: Introduction and other protein targets. Br. J. Pharmacol. 176 (suppl. 1), S1–S20 (2019).
12. UniProt Consortium, UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 47 (D1), D506–D515 (2019).
