Reports from the Fifth Edition of CAGI: the Critical Assessment of Genome Interpretation
Total Page:16
File Type:pdf, Size:1020Kb
Received: 16 July 2019 | Accepted: 19 July 2019 DOI: 10.1002/humu.23876 OVERVIEW Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation Gaia Andreoletti1 | Lipika R. Pal2 | John Moult2,3 | Steven E. Brenner1,4 1Department of Plant and Microbial Biology, University of California, Berkeley, California Abstract 2Institute for Bioscience and Biotechnology Interpretation of genomic variation plays an essential role in the analysis of Research, University of Maryland, Rockville, cancer and monogenic disease, and increasingly also in complex trait disease, with Maryland 3Department of Cell Biology and Molecular applications ranging from basic research to clinical decisions. Many computational Genetics, University of Maryland, College impact prediction methods have been developed, yet the field lacks a clear consensus Park, Maryland on their appropriate use and interpretation. The Critical Assessment of Genome 4Center for Computational Biology, University of California, Berkeley, California Interpretation (CAGI, /'kā‐jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. Correspondence John Moult, Institute for Bioscience and CAGI participants are provided genetic variants and make blind predictions of Biotechnology Research, University of resulting phenotype. Independent assessors evaluate the predictions by comparing Maryland, 9600 Gudelsky Drive, Rockville, MD 20850. with experimental and clinical data. Email: [email protected] CAGI has completed five editions with the goals of establishing the state of art in Steven E. Brenner, Department of Plant and genome interpretation and of encouraging new methodological developments. This Microbial Biology, University of California, special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises Berkeley, CA 94720. Email: [email protected] reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 was comprised of 14 challenges and engaged Funding information National Human Genome Research Institute hundreds of participants from a dozen countries. This edition had a notable increase and National Cancer Institute, Grant/Award in splicing and expression regulatory variant challenges, while also continuing Number: U41 HG007346 and R13 HG006650 challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org. KEYWORDS CAGI, Critical Assessment of Genome Interpretation, genomics, genetic variation, cancer genetics, variant impact predictors, SNP 1 | INTRODUCTION The Critical Assessment of Genome Interpretation (CAGI, pronounced /'kā‐jē/) addresses this need by providing an objective Interpretation of genome sequence variation plays a major and evaluation of the state‐of‐the‐art in relating human genetic variation increasing role in both basic research and in clinical medicine. to phenotype, particularly health. Each edition of CAGI provides These applications necessitate robust and reliable computational about a dozen challenges to understand performance of prediction approaches to aid in determining the phenotypic impact of methods in a given scenario. Participants in a challenge are provided variants. Many such methods have been developed (see genetic variants and make predictions of resulting molecular, cellular, Hu et al., 2019 in this issue). However, the appropriate use or organismal phenotypes. Data sets for the challenges include and accuracy of most methods have not been objectively germline and somatic cancer variation, rare disease, common disease, determined. and pharmacogenomics. The scale of challenges ranges from single Human Mutation. 2019;40:1197–1201. wileyonlinelibrary.com/journal/humu © 2019 Wiley Periodicals, Inc. | 1197 1198 | ANDREOLETTI ET AL. nucleotides to whole genomes, as well as complementary multiomics CAGI4 SickKids challenge, also described in the assessment paper and environmental information. Variant types include those affecting here. expression, splicing, and amino acid sequence, and may be single base Over all CAGI editions, the plurality of challenges have been on changes, insertions or deletions, and structural variation. the interpretation of isolated missense variants, and CAGI5 The CAGI cycle commences with the organizers soliciting data continues that trend. There are assessment, data provider, and sets from researchers and clinicians, whose studies demonstrate participant papers for the prediction of the destabilizing effect of relationships between genotype and phenotype. Such data sets must missense mutations in a cancer‐relevant protein (Frataxin, with be fully developed (“ripe”) but not publicly available (“spoiled”) until biophysical measurements of protein stability; Petrosino et al., 2019; after the CAGI prediction season. Data providers and organizers Savojardo, Petrosino et al., 2019; Strokach, Corbi‐Verge, & Kim, work together to develop these data sets into prediction challenges 2019); on the effect of missense changes in a human calmodulin, with well‐defined data sets and goals. The CAGI Ethics Forum assayed using a high‐throughput yeast complementation assay evaluates and adjusts these challenges for suitability and sensitivity. (Zhang et al., 2019); the effect of missense mutations related to In parallel, predictors are enrolled, bound by the CAGI Data Use schizophrenia in human Pericentriolar Material 1 (PCM1), using a Agreement [https://genomeinterpretation.org/data‐use‐agreement], zebrafish development model (Miller, Wang, & Bromberg, 2019; and vetted for access to the CAGI experiment with tiers reflecting Monzon et al., 2019); the effect of missense mutations in two the sensitivity of the data. The prediction season launches with cancer‐related proteins, PTEN and TPMT, on intracellular protein release of the challenges and concludes with the submission of levels, measured in a high‐throughput assay (Pejaver et al., 2019); predictions from predictors. For the fifth edition of CAGI (CAGI5), and the effect of missense changes in a monogenic disease related the prediction season extended until May 2018. The submitted protein, acid alpha‐glucosidase (GAA), with measurements of total predictions are evaluated by independent assessors. Each CAGI intracellular enzyme activity (Adhikari, 2019). Three participant edition culminates in a conference to discuss the outcome; the papers describe results on all the missense challenges (Garg & Pal, CAGI5 conference was held from 5 to 7 July 2018, for which videos 2019; Katsonis & Lichtarge, 2019; Savojardo, Babbi et al., 2019). The and slide sets are publicly available to registered CAGI members. issue also contains assessment articles from two earlier missense [https://genomeinterpretation.org/content/5‐conference] challenges on monogenic disease related proteins: N‐acetyl‐glucosa- Results from the previous CAGI edition (CAGI4) were published minidase (NAGLU; Clark et al., 2019), with total intracellular enzyme in a special issue of this journal (Hoskins et al., 2017). The present activity measured; and cystathionine beta‐synthase (CBS), using the issue contains assessment and participant papers primarily from the metric of yeast growth in a complication assay (Kasak, Bakolitsa most recent experiment, CAGI5, which included 14 challenges and et al., 2019). attracted participants from 12 countries. In addition to the other cancer‐related challenges outlined above, Notable in this CAGI edition is the increasing representation of there are two that required prediction of the pathogenicity of regulatory and noncoding variant challenges. Previously, CAGI has germline variants in cancer‐related proteins: one for breast cancer had just one small splicing challenge [https://genomeinterpretation. risk from variants in BRCA1 and BRCA2 as characterized by the org/content/Splicing‐2012]. CAGI5 included two full‐scale splicing ENIGMA consortium (Cao et al., 2019; Cline et al., 2019; Padilla et al., challenges (Mount et al., 2019) and these have resulted in five papers 2019; Parsons et al., 2019), and the other for cancer risk of variants from participants (Chen, Lu, Zhao, & Yang, 2019; Cheng, Çelik, in CHEK2 in Latina breast cancer cases and ancestry matched Nguyen, Avsec, & Gagneur, 2019; Gotea, Margolin, & Elnitski, 2019; controls (Voskanian et al., 2019). Naito, 2019; Wang, Wang, & Hu, 2019). The issue also contains an CAGI5 will continue to bear fruit. This edition introduced a time‐ overview paper from one of the splicing data providers (Rhine et al., based challenge in association with dbNSFP, whereby CAGI accepted 2019). There had also been only one previous expression regulatory predictions for all possible missense variants in the genome. The variant challenge, in CAGI4 (Kreimer et al., 2017). CAGI5 has a new results are to be vetted periodically in the future as the impact of expression regulatory challenge (Shigaki et al., 2019), and there are some of these variants are experimentally or clinically established. also two participant papers (Dong & Boyle, 2019; Kreimer, Yan, Additional papers on the challenges and other aspects of CAGI are Ahituv, & Yosef, 2019). presently in development and will be added to the CAGI5 papers CAGI5 continued the emphasis on the interpretation of clinically collection at Human Mutation (https://onlinelibrary.wiley.com/toc/