DEVELOPMENT

The Evolution of PharmVar Andrea Gaedigk1,2 , Katrin Sangkuhl3, Michelle Whirl-Carrillo3, Greyson P. Twist2,4, Teri E. Klein3 and Neil A. Miller2,4 on behalf of the PharmVar Steering Committee

The Pharmacogene Variation Consortium (PharmVar) continues Each has its own page with important information (Figure 1); the mission of the Human (CYP) Nomenclature this section is followed by the listing of allelic variants defined to date. database to provide the clinical and research communities a repos- The impact of sequence variations (SVs; amino acid changing and itory and standardized nomenclature of pharmacogene variation. splicing) is provided along with a selection of references. Functional To improve the utility of this resource, PharmVar has established information is cross-­referenced from functionality tables available a database and gene expert panels and expanded beyond CYPs to through the PharmGKB (https://www.pharmgkb.org/page/pgxGe- include other clinically important pharmacogenes. This report de- neRef ). However, function can be substrate specific, and generaliza- scribes the evolution from static CYP variation tables to a resource tions to other substrates should be made with caution. with growing functionality describing pharmacogene variation. As a gene transitions from its original table format into the PharmVar database, revisions may be required to standardize dis- WHERE PHARMVAR STARTED play (e.g., some coordinates for insertions/deletions have shifted The Human Cytochrome P450 (CYP) Nomenclature data- because these were right aligned or because mapping now uses the base provided static Web-­based tables listing CYP genetic vari- current RefSeq). Original comments and notes are removed, which ation (archived at https://www.pharmvar.org/htdocs/archive/ may trigger revisions for allele definitions (e.g., a note stating that index_original.htm). Although universally accepted and widely “12662A>G is likely part of all CYP2C19*2 alleles” was removed, used, these tables are now outdated. The Pharmacogene Variation and 12662A>G was included in all *2 definitions). Inadvertent Consortium (PharmVar) was launched in September 2017 with errors are being corrected (e.g., c.1425A>T was added to all the mission to continue providing haplotype information and CYP2C9*3 haplotypes because it was omitted when this allele was transforming the original tables into a modern and interactive first defined ≈20 years ago). In addition, intronic single-­nucleotide resource.1 polymorphisms have been removed from several CYP2C9 and To that end, PharmVar established a state-­of-­the-­art database CYP2C19 alleles; none of these are known to affect function and and gene expert panels from its membership base to facilitate the only represented some of many intronic single-­nucleotide polymor- review of submissions, develop standard operating procedures, phisms. Finally, some haplotypes may be redesignated during the standardize the submission/review process, and designate haplo- curation process. For instance, the nonfunctional CYP2D6*14A types. Table 1 itemizes the that have been transitioned to the allele was revised to CYP2D6*114, whereas the decreased function database and details PharmVar member demographics and site use. CYP2D6*14B allele was retained as CYP2D6*14. Changes are de- Close partnership with the Pharmacogenomics Knowledge Base tailed in each gene’s CHANGE LOG document. (PharmGKB) ensures consistency of data display, use, and dissemi- The PharmVar database also offers several download options, nation across other databases and stakeholders, such as the Clinical which greatly simplifies integration of allelic data into informatics Pharmacogenetics Implementation Consortium (CPIC). workflows. The user can download the entire PharmVar database content or selected variants of interest in sequence (fasta or vcf file) THE PHARMVAR DATABASE or tab-­separated value formats. Furthermore, the database is ver- PharmVar database features are built with the goal of facilitating sioned and dated. PharmVar members have the option to receive access to important and often-­used information in a meaningful email notifications when a new version is released, and automated and consistent way. To that end, PharmVar displays allelic data change log reports will be available in the future. consistently across genes (see STANDARDS document under PharmVar works with the National Center for Biotechnology GENES menu) and maps each gene to current versions of genomic Information (NCBI) to develop locus reference genomic (LRG) and transcript reference sequences (RefSeq) and genome builds sequences (https://www.lrg-sequence.org/) for pharmacogenes. GRCh37 and GRCh38. The variant view shows coordinates LRGs are stable references that are developed for reporting se- across all references for two count modes, from the sequence start quence variants with clinical implications. The benefit of using or the ATG start codon. The variant page also lists all haplotypes LRGs is that these will not change over time (as opposed to on which the variant can be found and provides the rs identifica- RefSeqs), thereby providing the field with “gold standard” refer- tion (ID) and link to the dbSNP database. ences for test reporting. PharmVar also seeks to update RefSeqs

1Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children’s Mercy Kansas City, Kansas City, Missouri, USA; 2School of Medicine, University of Missouri–Kansas City, Kansas City, Missouri, USA; 3Department of Biomedical Data Science, Stanford University, Stanford, California, USA; 4Center for Pediatric Genomic Medicine, Children’s Mercy Kansas City, Kansas City, Missouri, USA. Correspondence: Andrea Gaedigk ([email protected]) Received 23 August 2018; accepted 25 October 2018; advance online publication 00 Month 2018. doi:10.1002/cpt.1275

CLINICAL PHARMACOLOGY & THERAPEUTICS | VOLUME 0 NUMBER 0 | Month 2018 1 DEVELOPMENT

Table 1 PharmVar gene and membership demographics Haplotypes per Genes Star alleles genea PharmVar members PharmVar usersb PharmVar site visitsb CYP2A13 *1-*10 10 Total CYP2C8 *1-*14 14 95c 2,204 3,779 CYP2C9 *1-61d 61d United States and Canada CYP2C19 *1-*35 33e 59 847 1,486 CYP2D6 *1-*122c 106d,e Europe CYP2F1 *1-*6 6 18 449 812 CYP2J2 *1-*10 10 Asia CYP2R1 *1, *2 2 9 717 1,172 CYP2S1 *1-*5 5 Africa CYP2W1 *1-*6 6 3 21 34 CYP3A7 *1-*3 3 South America CYP3A43 *1-*3 3 4 94 154 CYP4F2 *1-*3 3 Australia and New Zealand 2 50 81 NUDT15 *1-*19 19 Gene and PharmVar member data are as of October 8, 2018. Only those genes transferred into the PharmVar database are listed. The following genes (number of alleles in parentheses) will be transferred: CYP1A1 (13), CYP1A2 (21), CYP1B1 (26), CYP2A6 (412), CYP2B6 (38), CYP2E1 (8), CYP3A4 (322), CYP3A5 (92), while CYP4A11, CYP4A22, CYP4B1, TBXAS1 (CYP5A1), PTGIS (CYP8A1), CYP19A1, CYP21A2 and CYP26A1 will receive legacy status. PharmVar membership demographics are provided by major geographical region. Website statistics are collected with Google Analytics and represent use over the period of 1 month (September 8 through October 8, 2018). PharmVar, Pharmacogene Variation Consortium. aNot counting suballeles. bExcluding hits generated by Children’s Mercy (PharmVar) and Stanford (PharmGKB). cIncluding 18 Steering Committee members and pending applications. dAdditional alleles have been defined and are awaiting release. eOne or more star alleles may have been removed or consolidated; the count of alleles does not correspond to the latest star allele number. in certain instances (e.g., the NCBI created separate RefSeqs for PHARMVAR INTRODUCES EVIDENCE LEVELS CYP1A1 and CYP11A2, which were originally located in opposite For smaller genes (10–15 kb) including CYP2D6, haplotypes directions on a single RefSeq). can be verified experimentally using allele-­specific long-­range polymerase chain reaction or NGS-­based single-­molecule se- NEW PHARMVAR HAPLOTYPE VERSIONING SYSTEM quencing techniques. In contrast, definitions for haplotypes The Human CYP Nomenclature webpages initially defined sub- for larger genes, such as CYP2C9 or CYP2C19, are often alleles using letters (A, B, C, etc.), but eventually stopped accept- based on computationally inferred predictions. Although ing these submissions. This triggered authors to publish variants Sanger sequencing is still considered the gold standard, ad- using a suffix (e.g., *2var1 and *2var2), making it difficult, if not vances in NGS technology are yielding increasingly robust impossible, to keep track of allelic variation. Suballeles denote data. PharmVar accepts NGS-­based submissions, given that haplotypes that carry one or more SVs in addition to those de- certain quality control criteria are fulfilled. Regardless of the fining a haplotype. Although these SVs are known or predicted sequencing method used, haplotype definitions can be based to be inconsequential, they may interfere with genotyping assays on “definitive” or “moderate” evidence (e.g., computational and may lead to inaccurate result interpretations.2 Furthermore, inference). Many haplotypes defined before PharmVar may a comprehensive catalog of allelic variation that includes sub- lack crucial pieces of information now required for haplotype alleles will facilitate next-­generation sequencing (NGS)–based definition; these are considered of “limited” evidence. The data interpretation using bioinformatic algorithms, such as evidence level of a haplotype may be important information Astrolabe3 or Stargazer,4 that assign diplotypes using star no- for investigators and in clinical implementation. Detailed in- menclature. To provide the field with the most comprehensible formation regarding the methods used for haplotype determi- data repository, PharmVar has resumed cataloging suballeles. nation is collected in the submission process and detailed in To accommodate potentially large numbers of suballeles of the ALLELE DESIGNATION AND EVIDENCE LEVEL major haplotypes (e.g., CYP2D6*2 or CYP2D6*4), a subal- CRITERIA document. PharmVar also accepts supporting lele numbering system has been introduced that replaces the submissions of the same haplotype to build a stronger body of use of letters. For example, CYP2D6*2A and CYP2D6*2B are evidence in support of a haplotype. This type of information, now displayed as CYP2D6*2.001 and CYP2D6*2.002, respec- not systematically captured before PharmVar, is used to assign tively. Details can be found in the ALLELE DESIGNATION definitive, moderate, and limited evidence levels to each haplo- AND EVIDENCE LEVEL CRITERIA document under the type, reflecting the strength of evidence on which a definition SUBMISSION menu. is based.

2 VOLUME 0 NUMBER 0 | Month 2018 | www.cpt-journal.com DEVELOPMENT

Variants window (opens upon clicking on variant) important documents

choose sequence for display choose count mode

Custom display features

Link to db SNP Click to list all haplotypes containing this SNP

Figure 1 PharmVar gene page view for CYP2C19. The top portion of each gene page provides general information and links to important documents, including the READ ME and CHANGE LOG documents (some genes also have documentation for structural variation). In the middle section, the user can choose to visualize positions of sequence variations on the genomic or cDNA reference sequences (NG_008384.3 and NM_000769.1) or on the GRCh37 or GRCh38 genome build (information buttons link each to NCBI). The user also has the option between two count modes (start of the sequence or the ATG start codon). In the lower section, all defined alleles and suballeles are listed with their respective haplotype designation, legacy label, PharmVar ID, and sequence variations (those with rs IDs are highlighted in blue), followed by their impact (e.g., splicing defect or amino acid change) in parentheses, function (if available through the PharmGKB), evidence level, and selected reference(s). Download options include to download all gene data or individual haplotypes. On clicking a sequence variant of interest, the variant window (shown to the right) slides in “at a glance” information, including its rs ID and link to dbSNP.

INTRODUCTION OF THE PHARMVAR ID EXPANDING THE REPERTOIRE OF GENES The challenge of allele naming arises from the conflict between Building nomenclature for new genes faces numerous challenges, the desire to have consistent, stable identifiers for alleles and including the existence of star-­designated alleles in the absence of having meaningful names, allowing allele grouping by major a formal nomenclature system, designations that do not conform allele, function, or relevant variant. Users expect an assigned al- to PharmVar criteria, the use of designations without a publication lele name to have meaning on its own (i.e., all suballeles have the record, and insufficient published information, among others. same or similar function). Given that it is impossible to know Some of these challenges were encountered for NUDT15, which all of the eventual functional implications of each allele at the was chosen as the first PharmVar non-­CYP gene because of its time a name is assigned, those two goals are not easily and si- contribution to thiopurine toxicity.5 NUDT15 genetic testing is multaneously achieved. Thus, we either accept that allele names increasingly being performed in concert with TPMT and, thus, do not always represent the best familial grouping for an allele, nomenclature is a prerequisite for standardized reporting. Other or we accept that in the face of new knowledge, an allele name CPIC genes for which no formal nomenclature exists (e.g., DPYD will sometimes have to change. PharmVar offers a solution to and SLCO1B1) will follow. Each gene will likely present unique this dilemma by splitting those duties (stable identification vs. challenges, which will be addressed by expert gene panelists and functional/familial grouping). Identification will be handled Steering Committee input. through the introduction of the PharmVar ID, a numeric iden- tifier similar to the dbSNP’s rs ID, while continuing to use star PHARMVAR KEEPS EVOLVING allele names, which might not be permanent, to show the func- Finally, PharmVar continues to evolve by improving display, devel- tional grouping. This approach allows the identification of an oping new features (e.g., a comparative allele viewer tool and cus- allele to be fixed while its functional interpretation can change tom data displays), and adding complementary information (e.g., over time as more information becomes available. We recog- linking to Coriell samples known to contain a particular haplo- nize that it is unrealistic to expect the entire community to type). PharmVar welcomes feedback and suggestions to further change how alleles are referred to, especially in the short-­term. the evolution of this resource. However, because PharmVar needs to be able to revise allele des- ignations in some cases to reflect a more appropriate classifica- ACKNOWLEDGMENTS tion, we suggest citing PharmVar IDs alongside star allele names We thank all Pharmacogene Variation Consortium Steering Committee moving forward (e.g., CYP2D6*46.002 (PV00182)). members for their critical review of this article and all experts serving

CLINICAL PHARMACOLOGY & THERAPEUTICS | VOLUME 0 NUMBER 0 | Month 2018 3 DEVELOPMENT

on gene panels. We also thank Scott Casey for website design and development. 1. Gaedigk, A. et al. The Pharmacogene Variation (PharmVar) Consortium: incorporation of the Human Cytochrome P450 (CYP) Allele Nomenclature database. Clin. Pharm. Ther. 103, 399–401 (2018). FUNDING 2. Nofziger, C. & Paulmichl, M. Accurately genotyping CYP2D6: not for This work was funded by the National Institutes of Health for the the faint of heart. Pharmacogenomics 19, 999–1002 (2018). Pharmacogene Variation Consortium (R24GM123930) and PharmGKB (R24GM61374). 3. Twist, G. et al. Constellation: a tool for rapid, automated phenotype assignment of a highly polymorphic pharmacogene, CYP2D6, from whole-­genome sequence. Genomic Med. (2016). http://www.nature. CONFLICT OF INTEREST com/articles/npjgenmed20157. The authors declared no competing interests for this work. 4. Lee, S.B. et al. Stargazer: a software tool for calling star alleles from next-­generation sequencing data using CYP2D6 as a model. Genet. Med. https://doi.org/10.1038/s41436-018-0054-0. [e-pub ahead of print]. © 2018 The Authors Clinical Pharmacology & Therapeutics 5. Yang, J.J. et al. Pharmacogene variation consortium gene introduc- published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics tion: NUDT15. Clin. Pharm. Ther. https://doi.org/10.1002/cpt.1268. [e-pub ahead of print]. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and dis- tribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

4 VOLUME 0 NUMBER 0 | Month 2018 | www.cpt-journal.com