Research Brief

Large-scale Identification of Clonal Hematopoiesis and Mutations Recurrent in Blood Cancers

Julie E. Feusier1,2, Sasi Arunachalam1,3,4, Tsewang Tashi3,5,6, Monika J. Baker1, Chad VanSant-Webb1, Amber Ferdig1, Bryan E. Welm3,7, Juan L. Rodriguez-Flores8, Christopher Ours1, Lynn B. Jorde2, Josef T. Prchal3,5,6, and Clinton C. Mason1

abstract Clonal hematopoiesis of indeterminate potential (CHIP) is characterized by detect- able hematopoietic-associated mutations in a person without evidence of hematologic malignancy. We sought to identify additional cancer-presenting mutations usable for CHIP detection by performing a data mining analysis of 48 somatic mutation landscape studies report- ing mutations at diagnoses of 7,430 adult and pediatric patients with leukemia or other hematologic malignancy. Following extraction of 20,141 -altering mutations, we identified 434 significantly recurrent mutation hotspots, 364 of which occurred at loci confidently assessable for CHIP. We then performed an additional large-scale analysis of whole-exome sequencing data from 4,538 persons belonging to three noncancer cohorts for clonal mutations. We found the combined cohort prevalence of CHIP with mutations identical to those reported at blood cancer mutation hotspots to be 1.8%, and that some of these CHIP mutations occurred in children. Our findings may help to improve CHIP detec- tion and precancer surveillance for both children and adults.

Significance: This study identifies frequently occurring mutations across several blood cancers that may drive hematologic malignancies and signal increased risk for cancer when detected in healthy persons. We find clonal mutations at these hotspots in a substantial number of individuals from noncan- cer cohorts, including children, showcasing potential for improved precancer surveillance.

See related commentary by Spitzer and Levine.

1Division of Pediatric Hematology and Oncology, Department of Pedi- J.E. Feusier and S. Arunachalam contributed equally to this article. 2 atrics, University of Utah, Salt Lake City, Utah. Department of Human Corresponding Author: Clinton C. Mason, University of Utah, 417 Wakara 3 Genetics, University of Utah, Salt Lake City, Utah. Huntsman Cancer Insti- Way, Room 3117, Salt Lake City, UT 84108. Phone: 801–587–3633; E-mail: 4 tute, University of Utah, Salt Lake City, Utah. Department of Oncological [email protected] Sciences, University of Utah, Salt Lake City, Utah. 5Division of Hematology and Hematologic Malignancies, University of Utah, Salt Lake City, Utah. Blood Cancer Discov 2021;2:1–12 6VA Medical Center, Salt Lake City, Utah. 7Department of Surgery, Univer- doi: 10.1158/2643-3230.BCD-20-0094 8 sity of Utah, Salt Lake City, Utah. Department of Genetic Medicine, Weill ©2021 American Association for Cancer Research. Cornell Medical College, New York, New York. Note: Supplementary data for this article are available at Blood Cancer Discovery Online (https://bloodcancerdiscov.aacrjournals.org/).

May 2021 blood CANCER DISCOVERY | OF1

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Copyright 2021 by AssociationAmerican for Association Cancer Research. for Cancer Research. RESEARCH BRIEF Feusier et al.

INTRODUCTION and many are not currently included in other mutation databases, our analysis provides a valuable compiled list of Somatic mutations at hotspots (genetic loci observed to be recurrent hematologic cancer mutations at a protein-coding frequently mutated across patients with cancer) often drive or level with smaller genome-wide bias and larger sample size contribute to cancer pathogenesis (1, 2). Many somatic land- than previously reported. We then performed a large analy- scape studies have now been performed with next-generation sis of CHIP, focused on finding clonal mutations identical sequencing (NGS), typically for a single cancer type in small- to those reported at these hotspots in the whole-exome to medium-sized patient cohorts, identifying mutations that sequencing (WES) data of 4,538 persons from three noncan- recur frequently. Large single-study and pan-cancer analyses cer cohorts, which included the widely used 1000 Genomes of both pediatric and adult cohorts have identified cancer Project (1KG) and more than 400 children. Our findings may and mutation hotspots having even lower but signifi- lead to the improved prognostic and clinical utility of precan- cant recurrence rates. For example, over 140 driver genes were cer surveillance in children and adults. identified in an analysis of children with six cancer types (3), and more than 450 hotspots were found in an analysis of 41 RESULTS predominantly nonhematologic cancers (2). However, the number of included leukemic or other hematologic cancer Somatic Mutation Studies samples in both pan-cancer and single-study analyses reporting By systematic literature search, we identified 48 somatic hotspots at a codon level has generally been insufficient to iden- mutation landscape studies of blood cancers (3, 11–57) that tify blood cancer–specific hotspots of low recurrence. Existing could be used to determine recurrence of mutations reported databases that accumulate mutations across many individual at amino acid positions in protein-coding genes (Table 1). studies [most notable in size, the Catalogue Of Somatic Muta- Focusing on diseases that may have been preceded by CHIP tions In Cancer (COSMIC; https://cancer.sanger.ac.uk/cosmic); (58), we limited our investigation to studies that assessed ref. 4] are extremely valuable for identifying reported muta- patients by NGS at diagnosis of one of seven hematologic tions within their included studies; yet, current versions of malignancies: acute lymphoid leukemia (ALL), acute myeloid these databases lack many exome/genome-wide blood cancer leukemia (AML), chronic myeloid leukemia (CML), chronic studies that could improve detection of mutations recurrent myelomonocytic leukemia (CMML), juvenile myelomonocytic at low frequencies, and some are difficult to filter for specific leukemia (JMML), MDS, and MPN (Fig. 1). After evaluation, cancers. Hence, many hematologic malignancy mutation hot- filtering, and harmonization of 58,177 reported mutations spots of lesser recurrence and their constituent mutations assessed in 7,430 diagnostic patients, we determined a total that may drive blood cancers are likely still unidentified or of 20,141 mutations that altered an amino acid or splice site. unappreciated. Clonal hematopoiesis of indeterminate potential (CHIP), Mutational Hotspots in Hematologic Cancers which is detected as an expansive clonal somatic mutation We found 434 amino acid or splice-site positions occurring in a person currently free from hematologic malignancy, car- across 85 genes that met our criteria for hotspots (Methods; ries a greatly increased risk (HR >10) for future blood cancer Supplementary Table S1). The most common hotspots were development (5, 6). CHIP increases sharply in prevalence with well-known and observed frequently even within individual advanced age (5–9), and backward extrapolation of trends in studies (e.g., mutations at NRAS p.G12 were reported for adults could suggest that CHIP in childhood is extremely 345 persons within 35 studies; Fig. 2A). Yet, 79 hotspots rare. However, no exome-wide CHIP study has included large observed in three to eight persons had only a single mutation numbers of healthy children to directly address this issue. within any given study, highlighting the benefit of multistudy The mutations in hematopoiesis-regulating genes used to evaluations. Nearly all of the hotspots were recurrently listed identify CHIP include hotspot mutations of high recur- across combined cancers in the most recent COSMIC data- rence, but also include mutations lacking driver or recurrent base, which includes many blood cancer studies, the majority evidence at specific amino acid positions. Regardless of neo- having been published between 1992 and 2010. Yet this ver- plastic evidence, CHIP mutations are frequently identified in sion was lacking 32 of the 48 NGS studies (comprising 51% persons singly (5–10); therefore, most are likely responsible of total patients) used in our assessment. When restricting for the clonal expansion that makes their detection possible. COSMIC mutation data to primary samples in the seven Yet CHIP mutations identical to those reported at blood can- hematologic malignancies of our focus, the majority (n = 222) cer mutation hotspots may be found to carry increased risk of our identified hotspots were observed in less than three for future neoplastic transformation. individuals of this COSMIC subset (Supplementary Table S2). With the dual aims of identifying mutational hotspots of Some of our identified mutation hotspots were unique hematologic malignancies and using these hotspots to detect to a specific hematologic malignancy (Fig. 2B; Supplemen- CHIP having potentially greater risk for affecting cancer tary Table S1). With the proportion of assessed patients transformation, we performed a recurrent mutation analysis being 49.4% AML, 31.0% ALL, and 19.6% other malignancies, independently from existing databases. We systematically we found that mutations at IDH2 p.R172 (n = 59), KIT extracted and analyzed reported data from 48 studies meet- p.N822 (n = 31), and KIT p.Y418 (n = 18) were reported ing inclusion criteria, which detected mutations at diagnoses only in patients with AML, whereas RPL10 p.R98 (n = 25), of patients with acute leukemias, myeloproliferative neo- NOTCH1 p.L1678 (n = 24), FBXW7 p.R479 (n = 23), NOTCH1 plasms (MPN), and myelodysplastic syndrome (MDS). As p.L1600 (n = 22), and NOTCH1 p.L1585 (n = 19) mutations the majority of these studies interrogated the entire exome were observed only in patients with ALL. We also observed

OF2 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Clonal Hematopoiesis and Recurrent Blood Cancer Mutations RESEARCH BRIEF

Table 1. The 48 studies included in the hotspot mutation analysis to identify recurrent mutations in hematologic malignancies

Main hematologic Patients Mutations Protein-altering Study malignancy assessed (N)a assessed (N)b mutations (N)c Andersson et al. (2015; ref. 11) ALL 65 250 146 Chen et al. (2018; ref. 12) ALL 36 302 294 De Keersmaecker et al. (2013; ref. 13) ALL 211 538 470 Holmfeldt et al. (2013; ref. 14) ALL 40 706 366 Liu et al. (2016; ref. 15) ALL 203 2,437 1,418 Liu et al. (2017; ref. 16) ALL 264 4,165 3,426 Oshima et al. (2016; ref. 17) ALL 55 1,845 397 Papaemmanuil et al. (2014; ref. 18) ALL 55 795 549 Paulsson et al. (2015; ref. 19) ALL 51 459 376 Russell et al. (2017; ref. 20) ALL 8 218 86 Ryan et al. (2016; ref. 21) ALL 42 45 30 Stengel et al. (2014; ref. 22) ALL 625 110 98 Ma et al. (2018; ref. 3) ALL 122 574 254 Huether et al. (2014; ref. 23) ALL 525 412 167 Bolouri et al. (2018; ref. 24) AML 684 1,983 584 de Rooij et al. (2017; ref. 25) AML 113 268 99 Dolnik et al. (2012; ref. 26) AML 50 171 153 Eisfeld et al. (2017; ref. 27) AML 177 258 199 Eisfeld et al. (2017; ref. 28) AML 10 36 36 Eisfeld et al. (2016; ref. 29) AML 23 68 56 Faber et al. (2016; ref. 30) AML 165 849 588 Farrar et al. (2016; ref. 31) AML 20 208 126 Garg et al. (2015; ref. 32) AML 67 443 250 Greif et al. (2018; ref. 33) AML 50 558 440 Hirsch et al. (2016; ref. 34) AML 53 257 246 Lavallée et al. (2015; ref. 35) AML 29 55 53 CGARN et al. (2013; ref. 36) AML 200 22,633 1,574 Madan et al. (2016; ref. 37) AML 153 210 126 Papaemmanuil et al. (2016; ref. 38) AML 1,540 3,902 3,341 Sehgal et al. (2015; ref. 39) AML 152 52 52 Sood et al. (2016; ref. 40) AML 13 180 94 Thol et al. (2017; ref. 41) AML 171 603 415 Kim et al. (2017; ref. 42) CML 100 68 40 Togasaki et al. (2017; ref. 43) CML 24 191 181 Mason et al. (2016; ref. 44) CMML 69 507 479 Merlevede et al. (2016; ref. 45) CMML 17 8,077 143 Palomo et al. (2016; ref. 46) CMML 56 262 255 Patnaik et al. (2017; ref. 47) CMML 261 16 15 Caye et al. (2015; ref. 48) JMML 118 187 107 Stieglitz et al. (2015; ref. 49) JMML 71 128 128 Haferlach et al. (2014; ref. 50) MDS 102d 300 281 Pastor et al. (2017; ref. 51) MDS 50 25 18

(continued)

May 2021 blood CANCER DISCOVERY | OF3

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. RESEARCH BRIEF Feusier et al.

Table 1. The 48 studies included in the hotspot mutation analysis to identify recurrent mutations in hematologic malignancies (Continued)

Main hematologic Patients Mutations Protein-altering Study malignancy assessed (N)a assessed (N)b mutations (N)c Walter et al. (2013; ref. 52) MDS 150 322 254 Yoshida et al. (2011; ref. 53)e MDS 22 268 185 Churpek et al. (2015; ref. 54)f MDS 16 183 51 Schwartz et al. (2017; ref. 55)g MDS 54 288 195 Lundberg et al. (2014; ref. 56) MPN 197 267 236 Nangalia et al. (2013; ref. 57) MPN 151 1,498 1,064 Total 7,430 58,177 20,141

Note: Studies assessed somatic mutations in diagnostic patient samples with NGS and met additional inclusion criteria. Abbreviation: CGARN, Cancer Genome Atlas Research Network. aNumber of unique diagnostic patients assessed in the original study meeting inclusion criteria (hence, the total sample size of the original study may have been larger). bNumber of mutations from the original study assessed in this hotspot mutation analysis. cNumber of determined protein-altering or splice-site mutations used in recurrence tallies. dSample size estimated for the random selection of mutations provided. eSome samples from patients classified as CMML were included in this study. fSome patients were also classified with AML in this study. gSome patients were also classified with AML or MPN/JMML in this study.

that some genes contain mostly point or nonframeshift hot- from 26 different populations with sample sizes ranging spots (e.g., KRAS, NRAS, PTPN11, SF3B1), mostly nonsense or from N = 61 to N = 113 (median N = 99). The Qatari frameshift hotspots (e.g., ASXL1, NF1, STAG2), or had distinct Genome (QTRG) and Simons Simplex Collection (SSC) groupings of mutation types by location within the coding cohorts included children and adults. We found the ages, region (e.g., CEBPA, NOTCH1). sequencing depths, and methods utilized in each study to vary considerably (Supplementary Fig. S1A), with potential Hotspot Mutations Evaluable for CHIP to influence the rate of clonal mutation identification. The We identified 70 of the 434 hotspots to have heightened means of the average depths of coverage across the 364 con- potential for false positives if used to infer CHIP from muta- fident hotspots were 78.5×, 93.3×, and 53.6×, with averages tions in blood samples alone (Supplementary Table S3). This of 27.5%, 32.7%, and 10.3% of hotspots covered at a depth resulted in 364 hematologic cancer mutation hotspots and ≥100× in the 1KG, SSC, and QTRG cohorts, respectively 755 constituent mutations that could be confidently used (Supplementary Fig. S1B). in screening for CHIP with prevalent NGS methods (Sup- All detected variants were evaluated for reliability (which plementary Table S4). Only 45 (12.4%) of these 364 loci were led to the exclusion of eight outlier samples) and for meeting identified as hotspots in a previous pan-cancer analysis (2), the criteria for CHIP at hotspots, resulting in the identifica- and only 35 (9.6%) were observed in three or more patients tion of 83 individuals (1.83% of the 4,530 included) each of a sizable MDS investigation not included in our assess- having a single CHIP mutation at one of 62 hotspots across ment (59). Likely because many of the exome-wide studies 23 genes (Supplementary Fig. S2; Supplementary Table S6). that identified mutations at these hotspots were published The prevalence rate of CHIP at hotspots for each cohort was around or after a previous landmark CHIP study (10), 350 of as follows: QTRG: 0.73% (9/1,231), 1KG: 2.48% (62/2,503), and the 755 specific mutations identified in our hotspot muta- SSC 1.51% (12/796)—with subgroup rates of 0.77% (3/388) for tion assessment were not present in the list of queried CHIP SSC children and 2.21% (9/408) for SSC adults (Fig. 3A). CHIP variants of that study. Of these, 134 were reported in ≥3 mutations at hotspots were most common in DNMT3A, TET2, patients at diagnosis of a hematologic malignancy (Supple- and TP53 (Fig. 3B). Clonal size was not significantly associated mentary Table S5). Many of these unique mutations occur with reported frequency in our recurrent mutation analysis within NOTCH1, FLT3, CEBPA, KIT, and RUNX1. (P = 0.56; Supplementary Fig. S3). However, CHIP detection was associated with frequency of hotspot recurrence [OR, 1.75 per CHIP in Noncancer Cohorts Identical log of reported mutations; 95% confidence interval (CI), 1.33– to Hotspot Mutations 2.29; P < 0.0001]. Still, while half of the 12 hotspots (includ- We next assessed three large noncancer cohorts (60–62) ing those of lesser confidence for CHIP evaluation) reported for CHIP at our identified hotspots within their combined in more than 100 patients of the recurrence analysis were 4,538 individuals. The 1KG cohort included 2,503 adults observed with identical clonal mutations in the noncancer

OF4 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Clonal Hematopoiesis and Recurrent Blood Cancer Mutations RESEARCH BRIEF

Recurrent mutation (hotspot) analysis CHIP analysis

Initial extraction PubMed search for NGS 1KG cohort Simons Simplex Qatari cohort evaluation mutation studies of blood (noncancer) Collection (noncancer) cancers (noncancer) 14 studies 172 studies 2,503 persons 804 persons 1,231 persons

4,538 persons with paired-end whole-exome Single transcript sequencing data (8 persons later excluded) determination

Studies evaluated for providing: Data processed • NGS assessment of diagnostic pt. sample Data processed to identify CHIP • Unambiguous genetic coordinates to identify CHIP not exclusive to • Consistent genetic mapping at hotspots hotspots 48 usable studies (as per ref. 10)

CHIP mutations identified in: All mutations harmonized to single transcript; identified protein altering and splice-site • 83 persons with CHIP at hotspots (82 at mutations with locus reported in three or confident loci, 1 in recovery analysis of high more patients potential for artifact loci) 434 mutation hotspots • 183 persons with CHIP not exclusive to hotspots (6 persons with 2 CHIP mutations)

364 hotspots 70 hotspots with identifiable for high potential CHIP for artifactual (confident loci) CHIP calls

Figure 1. Flowchart of mutational hotspot and clonal hematopoiesis of indeterminate potential (CHIP) analyses. Hematologic cancer hotspot muta- tions identified in the recurrent mutation analysis were assessed for clonal presence in persons from noncancer cohorts as part of the CHIP analysis. pt., patient. cohorts, six were not. These included NPM1 p.W288 and FLT3 the child. The child with a RUNX1 p.S322X clonal mutation p.D835, both of which may rapidly accelerate neoplastic onset, also had an apparent germline SNP at RUNX1 p.L56S (vari- making them more difficult to observe in healthy persons (9). ant present in 5/10 reads) that was inherited from the child’s Together, these observations suggest that stratifying CHIP father. This SNP is present in the gnomAD version 2.1.1 mutations by significant recurrence in hematologic malignan- database (63) at a frequency of 1.2% yet was also reported in cies may prove prognostic in studies of blood cancer risk. 4/56 (7.1%) patients diagnosed with CMML (46), suggesting a possibility that this variant helped accelerate clonal growth CHIP at Mutation Hotspots by Age Groups under the two-hit hypothesis (64). Still, the clonal size and We detected CHIP mutations at the leukemic hotspots in 3 number of variant reads at each hotspot in these children of 388 (0.77%) children of the SSC cohort (two autism spec- were insufficient to completely rule out the possibility of arti- trum disorder probands and one unaffected sibling). These facts, and hence these findings are preliminary. mutations occurred at hematologic hotspots in DNMT3A The QTRG cohort having participants with age ranging [p.R882S, variant allele frequency (VAF) = 2.5%] and RUNX1 from 0 to 85 years had a much lower sequencing depth, so [p.G170 splice site (c.509–1G>T), VAF = 1.3%; p.S322X, VAF = we explored the effect of relaxing the requirement for detect- 1.1%], with all variant reads passing manual review and being ing CHIP at hotspots to include mutations with only two detected on both strands (Supplementary Table S6; Supple- variant reads. While such a threshold would result in a high mentary Fig. S4). We assessed the parent–child trios for each false-positive rate in deeply sequenced data, a higher rate of of these children for expected SNP inheritance (to preclude true positives is likely to be present in lower depth data. This the possibility of sample mix-up of children and adult sam- assessment found an additional 41 clonal mutations for ples or data) and confirmed that all three samples were from this cohort (Supplementary Table S7) and found age to be

May 2021 blood CANCER DISCOVERY | OF5

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. RESEARCH BRIEF Feusier et al.

A 3) = 403) = 189) = 40 = 3,670) N = 2,302) N = 408) = 2,302) = 408) N N = 189) = 124) N N N = 348) N

AML (N ALL (N MDS ( CMML ( MPN (N =JMML 348) (N CML (N =N 124) AML (N = ALL3,670) ( MDS ( CMML ( MPN ( JMML ( CML ( N DNMT3A p.R882 479 31013111 517 RUNX1 p.R201 20 122 25 >10% NPM1 p.W288 408 3 411 NOTCH1 p.L1678 24 24 10% NRAS p.G12 251 52 912 2 19 345 FBXW7 p.R479 23 23 JAK2 p.V617 18 1 8 8 247 282 WT1 p.R462 21 1 1 23 NRAS p.Q61 194 7 1 1 7 210 NOTCH1 p.L1600 22 22 8% IDH2 p.R140 177 4 11 8 1 201 RUNX1 p.R166 18 2 1 1 22 6% IDH1 p.R132 170 4 6 1 4 185 TP53 p.R248 11 6 4 21 FLT3 p.D835 174 3 177 JAK3 p.M511 2 18 20 SRSF2 p.P95 98 15 57 2 172 RUNX1 p.R162 15 2 1 1 1 20 4% NRAS p.G13 113 27 7 1 10 158 WT1 p.S381 15 3 2 20 KRAS p.G12 88 30 3 5 9 135 ASXL1 p.R693 10 2 3 3 1 19 2% KIT p.D816 105 1 1 1 108 FLT3 p.V491 15 4 19 NPM1 p.L287 70 5 3 1 79 FLT3 p.I836 18 1 19 0% U2AF1 p.S34 49 2 17 3 1 72 KRAS p.Q61 15 1 3 19 PTPN11 p.E76 28 2 6 31 67 NOTCH1 p.L1585 19 19 KRAS p.G13 40 14 11 65 PTPN11 p.S502 17 1 1 19 IDH2 p.R172 59 59 KIT p.Y418 18 18 SF3B1 p.K700 18 33 3 1 55 KRAS p.A146 7 6 1 4 18 PTPN11 p.A72 30 3 2 10 45 PTPN11 p.G60 13 2 3 18 FBXW7 p.R465 2 42 44 SETBP1 p.D868 2 5 5 6 18 PTPN11 p.D61 24 1 4 8 37 FBXW7 p.R505 1 16 17 ASXL1 p.H630 19 4 1 4 5 33 FLT3 p.D839 15 2 17 KIT p.N822 31 31 RUNX1 p.R204 14 1 1 1 17 U2AF1 p.Q157 9 13 6 3 31 WHSC1 p.E1099 17 17 ASXL1 p.G643 6 20 1 27 WT1 p.R370 12 5 17 SF3B1 p.K666 19 7 1 27 NF1 p.T676 1 3 12 16 TP53 p.R273 11 10 3 1 2 27 SRSF2 p.S101 16 16 FLT3 p.N676 23 2 1 26 TP53 p.R175 7 8 1 16 MYC p.P74 22 3 25 WT1 p.R458 15 1 16 PTPN11 p.G503 19 3 3 25 RUNX1 p.D198 12 2 1 15 RPL10 p.R98 25 25 SETBP1 p.G870 1 6 5 3 15

Only nonsense Only point mutations Mixture of nonsense/frameshift B Splice site or frameshift or nonframeshift and point/nonframeshift

AKT1 KRAS ARIH1 MED12 ASXL1 MPL ASXL2 MYC BCL11B MYCN BCOR NF1 BRAF NOTCH1 CALR NPM1 CBL NRAS CCND2 NSD1 CCND3 PAX5 CEBPA PHF6 CHD4 PIK3CA CNOT3 PIK3CD CREBBP PIK3R1 CRLF2 PPP4C CSF3R PTEN CTCF PTPN11 DDX41 RAD21 DHX15 RIT1 DNM2 RPL10 DNMT3A RUNX1 EP300 SETBP1 ETV6 SETD2 EZH2 SF3B1 FBXO28 SMARCA2 FBXW7 SMC1A FLT3 SMC3 GATA2 SPI1 GATA3 SRSF2 GIGYF2 STAG2 GNAS STAT5B GNB1 TET2 HDAC7 TP53 IDH1 TRIM24 IDH2 TRRAP IKZF1 U2AF1 JAK1 USP7 JAK2 WHSC1 JAK3 WT1 KDM6A ZEB2 KIT ZRSR2 KMT2C 0210 030405060708090 100 0210 030405060708090 100 Coding region (%) Coding region (%)

Figure 2. Mutation hotspots in hematologic malignancies. A, Most frequently reported mutation hotspots at blood cancer diagnoses. Top axes: total patients assessed for each hematologic malignancy. Numbers within cells: total number of patients having a reported mutation. Color intensity scale: percentage of persons having a reported mutation for each malignancy. As not all patients were assessed for mutations at each locus, these frequencies likely represent lower-bound estimates. Yellow boxes indicate mutations identified in only a single hematologic malignancy. B, Location in coding region for the 434 hotspots occurring in 85 genes. Horizontal axes: normalized location of each mutation locus within the coding sequence. Color of triangles: types of mutations reported at each locus (purple indicates only nonsense or frameshift, blue indicates only point mutations or nonframeshift, orange indicates a mixture of nonsense/frameshift and point/nonframeshift, and pink indicates splice site). Utilized transcripts are provided in Supplementary Table S1.

OF6 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Clonal Hematopoiesis and Recurrent Blood Cancer Mutations RESEARCH BRIEF

A 0.09 9 0.08 0.07 6 0.06 5 0.05 5 0.04 4 4 3 3 at hotspots 0.03 62 9 2 2 2 2 2 2 2 0.02 12 1 Proportion with CHIP 1 1 1 1 1 1 1 0.01 3 9 9 1 0.00

= 25) = 96) = 61) = 86) = 93) = 94) = 99) = 99) = 91) = 99) = 98) = 85) = 64) = 96) = 408) = 388) N N N N N N = 99) = 103) = 105)N N N N = 103) = 113) = 102) = 104)N N N N N = 85)N = 104) = 102) = 107) = 108) N N = 1,231) = 1,206) = 2,503) N N N N N = 107)N N N N N N N N N

1KG-FIN ( 1KG-PJL ( 1KG-ACB1KG-ASW (1KG-BEB (1KG-CDX (1KG-CEU ( ( 1KG-CLM1KG-ESN ( (1KG-GBR1KG-GIH ( ( 1KG-IBS1KG-ITU (1KG-JPT ( 1KG-KHV ( 1KG-LWK (1KG-MSL (1KG-MXL ( 1KG-PEL ( ( 1KG-TSI1KG-YRI ( ( 1KG-CHB1KG-CHS ( ( 1KG-GWD ( 1KG-PUR1KG-STU ( ( SSC overallSSC SSCparents (N children= 796) ( ( Qatari children ( Qatari adults1KG ( overall ( Qatari overall ( Cohorts

B C 0.07 Qatari cohort 16 0.06 14 Child mutations 12 0.05 Adult mutations 10 0.04 8 0.03 6 at hotspots

at hotspots 0.02 4 2 0.01 Number of CHIP mutations

0 Rate of relaxed CHIP mutations 0.00 0–29 30–39 40–49 50–59 60–69 70–85 KIT WT1 CBL FLT3 IDH1 Age category (years) TET2TP53 NRAS EZH2 BCOR GNAS JAK2 PHF6 RUNX1 ASXL1FBXW7 KDM6A CCND2 SETD2SF3B1 SRSF2 DNMT3A SMARCA2 N with relaxed CHIP: 3 6 9115 05 Genes Total N: 120 241 296 336 166 72

Figure 3. CHIP detection at hotspots in noncancer cohorts. A, Proportion of each cohort with identified CHIP mutations identical to mutations reported at 364 confident mutation hotspots (black colored bars indicate cohort aggregates). ACB, African Caribbean in Barbados; SW,A Americans of African Ancestry in SW, USA; BEB, Bengali in Bangladesh; CDX, Chinese Dai in Xishuangbanna, China; CEU, Utah Residents with Northern and Western European Ancestry; CHB, Han Chinese in Beijing, China; CHS, Han Chinese South; CLM, Colombians from Medellin, Colombia; ESN, Esan in Nigeria; FIN, Finnish in Finland; GBR, British in England and Scotland; GIH, Gujarati Indian from Houston, Texas; GWD, Gambian in Western Divisions in the Gambia; IBS, Iberian Population in Spain; ITU, Indian Telugu in the UK; JPT, Japanese in Tokyo, Japan; KHV, Kinh in Ho Chi Minh City, Vietnam; LWK, Luhya in Webuye, Kenya; MSL, Mende in Sierra Leone; MXL, Mexican Ancestry from Los Angeles, USA; PEL, Peruvians from Lima, Peru; PJL, Punjabi from Lahore, Pakistan; PUR, Puerto Ricans in Puerto Rico; STU, Sri Lankan Tamil from the UK; TSI, Toscani in Italia; YRI, Yoruba in Ibadan, Nigeria. B, Genes with CHIP mutations identical to mutations reported at hotspots across the three cohorts (N = 4,530). Black color indicates mutation identified in a child sample. C, Rate of relaxed CHIP (mutations having ≥2 variant reads) in the Qatari cohort by age groups. associated with relaxed CHIP prevalence in this call set (OR, (5) and Jaiswal and colleagues (10), respectively, were identi- 1.025 per year of age; 95% CI, 1.004–1.048; P = 0.022; Fig. 3C). cal to mutations in our hotspot list. Our general CHIP analysis identified six persons having CHIP Not Exclusive to Hotspots two mutations (five of whom had a mutation ofDNMT3A ), We also investigated the noncancer cohorts for the none with more than two, and the remaining 177 persons prevalence of CHIP that included seldom or never-before having only a single CHIP mutation—yielding a general CHIP reported mutations at diagnosis of blood cancers, but prevalence rate of 4.04% (183/4,530) across the cohorts. which may still be involved with clonal expansion due to Including identified clonal mutations having a VAF> 4% at their occurrence in genes with known hematopoietic func- the additional hotspots not previously utilized by Jaiswal tion. Across the specified mutations and domains of the and colleagues (10), 193/4,530 (4.26%) persons had quali- 74 allowed hematologic genes of Jaiswal and colleagues fying CHIP. As in previous studies, CHIP not exclusive to (10), and using their criteria, we observed 189 mutations hotspots was most frequent for TET2 and DNMT3A (Sup- not exclusive to hotspots within the 4,530 persons of our plementary Table S8). However, we found mutations in analyzed cohorts (Supplementary Table S8), 40 of which SETD2, EP300, and KMT2A/D to be more common in our (21.2%) were identical to mutations reported at confi- cohort, whereas ASXL1 and JAK2 mutations were noticeably dent hotspots in three or more diagnostic patients in the less frequent. Variability of gene prevalence for CHIP muta- 48 somatic landscape publications we assessed (i.e., those tions in DNMT3A, TET2, ASXL1, and, most strikingly, JAK2 listed in Supplementary Table S4). For comparison, we also between control and cardiovascular disease cohorts was pre- determined that 76 of the 224 (33.9%) and 298 of the 805 viously observed (10). As the cohorts we assessed likely had (37.0%) reported CHIP mutations in Jaiswal and colleagues fewer underlying cases of cardiovascular disease and younger

May 2021 blood CANCER DISCOVERY | OF7

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. RESEARCH BRIEF Feusier et al. average age than other CHIP studies using WES data, some sion. Future studies of large size and long follow-up will be of our gene prevalence rates may reflect these differences. required to determine the degree to which CHIP at hotspots Further exploration identified sevenASXL1 variants having a may carry an increased risk for aiding neoplastic transforma- VAF >2%, yet only one reached the threshold of 4% required tion, as well as for apparently unrelated nonhematologic for this general CHIP analysis. No additional JAK2 variants cardiovascular complications of CHIP (10). with three or more supporting reads were observed at any Our analysis of CHIP included for the first time a large frequency. assessment of children free from blood disorders [please note Potential CHIP at non-hotspots with a VAF >4% was that CHIP has been observed in children having aplastic ane- observed in only four children having variant reads detected mia (67) and that postzygotic mutations of generally higher on both strands (Supplementary Table S8). In one child, the mutation frequencies in children have also been previously variant (DNMT3A p.V665L) had been previously identified assessed (65)]. Of great interest, we detected CHIP at mutation and reported as a de novo mutation by Lim and colleagues hotspots in three of the children in our cohorts. This novel, (65), although it could possibly have been a CHIP mutation preliminary detection of CHIP in children unselected for blood that had reached near-complete saturation (the VAF was disorders is important because although only a small number 47.3%; see Methods). This particular variant was not observed of children were observed to have these clonal mutations, this in 55 individuals having Tatton-Brown–Rahman syndrome, rate may be higher than anticipated from past analyses of adult a congenital condition due to germline DNMT3A mutations cohorts (e.g., only 1/1,039 adults aged 20–39 years was detected (66), one of whom developed AML in childhood. Yet the with CHIP in a previous analysis of WES data; ref. 5). The well- clinical association of autistic spectrum disorder in 20 (36%) designed Simons Simplex Collection study sequenced children of their study participants could be related to this detection and parental samples simultaneously at nearly identical depths of a DNMT3A mutation in an SSC proband. Also intriguing, (60). That CHIP at hotspots in these children occurred at 35% of another of the four children with CHIP at non-hotspots was the overall parental frequency (2.2%) encourages future studies one of the three children we initially identified to have a hot- to investigate this phenomenon more thoroughly in children. spot CHIP mutation, making the child’s nonhotspot TET2 Abelson and colleagues (7) found that AML mutations p.A1863S variant (VAF = 6.5%) a potentially synergistic CHIP reported with higher frequency in the COSMIC database (4) mutation with their previously mentioned RUNX1 p.G170 were also observed as CHIP more frequently. We similarly splice-site hotspot variant (Supplementary Fig. S4). This found an association between frequency of reported diagnos- observation may once again reflect an increased likelihood tic mutations across the seven hematologic malignancies and of observing CHIP in children when two genetic insults are their detection as clonal mutations in the noncancer cohorts, present, as without an additional factor, single clonal muta- strengthening the conjecture that mutations at hotspot loci tions in younger persons may have lacked sufficient time to may increase risk for affecting transition from CHIP to hema- expand to detectable levels. tologic cancer (6). Some of the hotspot mutations we list may have been overlooked in previous targeted designs for CHIP assessment. These mutations may now receive heightened DISCUSSION priority when clinically observed as well as further investiga- Owing to the work of many researchers who published the tion for functional significance and therapeutic potential. findings of their somatic mutation studies (3, 11–57), we were Much remains to be discovered regarding most individual able to compile a novel list of recurrent hematologic cancer CHIP mutations, including their prevalence by ethnicity, risk mutations. The majority of these NGS studies assessed the for onset of nonhematologic diseases, and concomitant factors entire exome/genome, allowing for a less biased and larger-scale associated with their size and clonal dynamics. Using high- assessment of diagnostic mutation recurrence in blood cancers depth WGS and RNA sequencing, future work may expand to than has previously been available. While significant recurrence include the identification of clonal fusions in CHIP prevalence alone does not imply a driver or even contributory role in cancer and risk assessments. While CHIP was previously thought to be development, overall, the hotspots we identified will likely be pertinent to adults only, our analysis of noncancer cohorts iden- enriched for pathogenic blood cancer mutations. Importantly, tified clonal mutations at both hotspots and non-hotspots in these identified hotspots also increase the number of mutations children, indicating a need to further explore CHIP in younger- having documented recurrence in primary hematologic malig- age cohorts. Our hematologic cancer–focused hotspot list will nancies that can now be used to identify CHIP. allow for subgroup analyses of CHIP in future studies, with Using these recurrent mutation loci, we analyzed three promise to improve its prognostic performance and future large noncancer cohorts, finding clones with mutations iden- blood cancer prevention research efforts. tical to those observed at blood cancer mutation hotspots in 1.83% of persons across the combined cohorts. Without METHODS restricting to hotspots, our estimate of CHIP prevalence Somatic Mutation Studies coincides with that of previous WES studies of CHIP, 4% An initial feasibility exercise was performed on 14 somatic mutation to 5%, with the proportion of CHIP identical to diagnostic landscape studies to discover possible extraction-related problems and mutations at hotspots ranging from 21% to 37% within these necessary steps to confidently identify and harmonize reported muta- studies. As the vast majority of all persons with identified tions usable for hotspot determination. This preliminary work was sup- CHIP are found to have only a single clonal mutation in plemented with a formal PubMed search on studies published before hematopoiesis-regulating genes, most detected hotspot or July 2018 with the following search criteria: (“genomic landscape,” nonhotspot mutations have likely driven the clonal expan- “somatic landscape,” “genomic profile,” “mutational landscape,” OR

OF8 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Clonal Hematopoiesis and Recurrent Blood Cancer Mutations RESEARCH BRIEF

“mutation landscape”) AND (“leukemia” OR “leukaemia”). This search splice sites tallied separately. As most studies did not provide silent or returned a total of 172 papers, of which 61 were found to be review arti- nonexonic mutations in their lists, a formal driver analysis was not possi- cles reporting on no new patients. We assessed studies focused on any ble, which may be considered a weakness of this work. Hence, we sought of seven disorders: ALL, AML, CML, CMML, JMML, MPN, and MDS. to determine minimalist criteria for our designation of hotspot based on deviance from expected recurrence in binomial distribution modeling Mutation Evaluation that utilized the set of reported protein-altering mutations (Supplemen- Each study was assessed for whether hematologic neoplasms had tary Methods; Supplementary Fig. S5). Various models were assessed, been investigated with NGS and for providing information that across which a threshold of at least three times recurrence was mark- could accurately identify the genetic position of each alteration and edly pronounced for deviation from expectation. Thus, our minimalist the effect on protein coding. Studies that did not utilize NGS or criteria for hotspot designation was defined as three or more patients reported mutations that could not be determined without ambigu- having such mutations at the same amino acid or splice-site position, ity (owing to multiple transcripts for many genes, mutations from with 434 loci meeting that criteria. To determine hotspots that could studies providing only an amino acid substitution without either an be confidently used in identifying CHIP, we additionally determined accompanying transcript identifier or genomic positions were gener- the mappability of the genetic sequence surrounding each locus with ally deemed ambiguous) were not included. For remaining studies, all multiple approaches including an extensively repeated sequence search. substitutions and all deletions were assessed separately for provided This allowed us to exclude hotspots that occurred at difficult-to-map reference alleles to determine the combination of factors that were regions in CHIP analyses. Reported mutations at regions of homopoly- used in reporting mutations, specifically, cDNA or gDNA, 0-bp or mer repeats or identical to nonrare germline SNPs were also classified as 1-bp coordinate system, and reference version. If less confident loci for CHIP calling and excluded. These additional filter- only a trivial number of reference allele discrepancies were observed, ing steps resulted in a final total of 364 hotspot loci having increased these were discarded and the remaining variants were incorporated; likelihood for a driver or contributory role in hematologic malignancies, otherwise, the entire group of substitutions, indels, or entire study was which could be confidently used for detecting CHIP. discarded. While translocations and large structural rearrangements are Noncancer Cohorts common for many cancers (68), these were not included in this investi- gation as our primary purpose was the identification of recurrent events Three large cohorts unselected for cancer and having publicly occurring entirely within exonic regions, so that these could be queried available paired-end WES data were analyzed for CHIP-associated for CHIP in WES data (in addition, less than one third of the studies pro- mutations. (i) 1KG: We assessed CHIP from the FASTQ data of vided breakpoints of large structural rearrangements at a level). 2,535 WES samples coming from the 26 populations of the phase 3 Many somatic landscape studies reported having performed validation of 1000 Genomes Project submitted to the European Nucleotide their listed mutations; uniformly, we relied on the mutations as reported Archive (ENA) as PRJNA262923 by the Wellcome Trust Sanger by the original authors rather than requesting the original NGS data for Institute. Persons in this cohort were generally assumed to be unse- reanalysis. Overall, our approach was designed to enrich for reported lected for disease and ≥18 years old. One sample with unavailable mutations having high confidence for accurate extraction rather than to paired-end FASTQ data files as well as samples that were poorly collect increased numbers of mutations with less overall confidence in sequenced or had other issues previously identified for exclusion (61) their authenticity. were not utilized, resulting in a total of N = 2,503 persons whose samples were deemed satisfactory. DNA had been extracted from Mutation Filtering and Harmonization lymphoblastoid cell lines (LCL) for most participants (61). Also as per 1000 Genomes Project Consortium and colleagues (61, 69) and In studies that included mutations from relapse or secondary leuke- The International Genome Sample Resource (www.internationalge- mia samples in addition to primary samples, mutations in the primary nome.org), the populations assessed consist of African Caribbean diagnostic samples were extracted only if they could be clearly distin- in Barbados (ACB); Americans of African Ancestry in SW, USA guished from those in relapse or secondary leukemia samples. If there (ASW); Bengali in Bangladesh (BEB); Chinese Dai in Xishuangbanna, was no distinction, the entire study was excluded. For a few studies China (CDX); Utah Residents (from CEPH families) with Northern assessing samples that had been investigated in previous studies, muta- and Western European Ancestry (CEU); Han Chinese in Beijing, tions from patients reported in the older analysis were not included. China (CHB); Han Chinese South (CHS); Colombians from Medel- Mutations in genes known for having high false-positive rates were lin, Colombia (CLM); Esan in Nigeria (ESN); Finnish in Finland frequently filtered by authors, and we likewise removed mutations (FIN); British in England and Scotland (GBR); Gujarati Indian reported in such genes from studies that had not performed such from Houston, Texas (GIH); Gambian in Western Divisions in the filtering. Similarly, a few studies reported all observed variants with Gambia (GWD); Iberian Population in Spain (IBS); Indian Telugu little or no read/frequency filtering for mutation calling, and variants in the UK (ITU); Japanese in Tokyo, Japan (JPT); Kinh in Ho Chi having minor VAFs were removed. Finally, a single RefSeq transcript Minh City, Vietnam (KHV); Luhya in Webuye, Kenya (LWK); Mende was determined for each gene based on its study-utilized frequency and in Sierra Leone (MSL); Mexican Ancestry from Los Angeles, USA length, and each mutation was assessed for its effect on this transcript (MXL); Peruvians from Lima, Peru (PEL); Punjabi from Lahore, (listed in Supplementary Table S1). Only splice-site or protein-altering Pakistan (PJL); Puerto Ricans in Puerto Rico (PUR); Sri Lankan mutations were retained. A total of 48 studies (3, 11–57) satisfying all Tamil from the UK (STU); Toscani in Italia (TSI); and Yoruba in of these stringent criteria were included in the hotspot mutation analy- Ibadan, Nigeria (YRI). (ii) SSC: We assessed the FASTQ data of 804 sis. Comparison with COSMIC data was performed with the most paired-end WES samples from Simons Simplex families deposited recent COSMIC database (version 92, August 2020). in ENA as PRJNA167318. These included 413 parents and 391 children. Of the children, 205 were diagnosed with autism and 186 Hotspot Determination were unaffected siblings. DNA had been extracted from whole blood Similar to Chang and colleagues (2), we assessed mutational recur- as reported in Sanders and colleagues (60). Of note, this cohort has rence at amino acid and splice-site positions in protein-coding genes, been previously analyzed for postzygotic mutations of substantial yet also included indels in addition to substitutions. The number of frequency (65), and while not specifically looking for CHIP, Lim and reported protein-altering and splice-site substitutions and indels across colleagues (65) did identify two of the general (and none of the hot- the final set of 48 somatic mutation landscape studies was tallied at each spot) CHIP mutations we list in Supplementary Table S8. (iii) QTRG: amino acid locus, with indels tallied at the locus at which they began and We assessed paired-end WES FASTQ data from the samples of 1,231

May 2021 blood CANCER DISCOVERY | OF9

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. RESEARCH BRIEF Feusier et al. persons of the QTRG for precision medicine deposited in ENA as Testing for association of variant allele fraction and reported recur- PRJNA290484. These included 510 persons diagnosed with type II rence groups was computed with the Wilcoxon–Mann–Whitney test. diabetes and 270 persons designated as controls. The age spectrum A two-sided Fisher exact test was used to determine the significance of this cohort ranged from 0 to 85 years, with 25 persons being of differences in prevalence for the different variant thresholds and <18 years of age. DNA had been extracted from blood as reported in cohort pairings assessed. Additional statistical details are provided Fakhro and colleagues (62). While both this QTRG cohort and the in the Supplementary Methods. All statistical computations were SSC cohort assessed DNA from blood, a possibility for infrequent performed with SAS software, version 9.4 (SAS Institute). LCL-specific mutations could be present in the data of the 1KG samples. However, we found the frequency of CHIP mutations in the Authors’ Disclosures 1KG samples to be similar to the SSC parent cohort, implying that J.L. Rodriguez-Flores is a full-time employee of Regeneron Pharma- any LCL-specific effect may be small in magnitude. ceuticals Inc. C.C. Mason reports other from Intermountain Health- care Foundation (funding for the Pediatric Cancer Program) and CHIP Assessment at Hotspots Primary Children’s Hospital Foundation (funding for the Pediatric We aligned the paired-end FASTQ data files for each of the 4,538 Cancer Program), and grants from Primary Children’s Center for Per- samples in the noncancer cohorts to the human reference genome sonalized Medicine during the conduct of the study. No disclosures hg19 and performed subsequent deduplication, realignment, recali- were reported by the other authors. bration, filtering, and variant calling (additional details provided in Supplementary Methods). CHIP at hotspots classification required Disclaimer somatic mutations to have ≥3 variant reads of ≥20 total reads and to The content and expressed viewpoints are those of the authors only. occur at one of the 364 confident amino acid hotspots identified in the recurrent mutation analysis (hence at a locus reported in at least Authors’ Contributions three patients and with confident mapping). As the total number of loci at which CHIP could be called was reduced >100-fold from that J.E. Feusier: Data curation, software, validation, investigation, writing– used in other studies assessing CHIP across large domains (1,092 original draft, writing–review and editing. S. Arunachalam: Data cura- bases compared with >160,000 bases in Jaiswal and colleagues; ref. tion, software, validation, investigation, writing–review and editing. 10), we did not incorporate an additional lower-bound cutoff at T. Tashi: Conceptualization, writing–review and editing. M.J. Baker: Data hotspots. In addition, such mutations were required to yield the curation, writing–review and editing. C. VanSant-Webb: Data cura- identical amino acid substitution or insertion/deletion effect (e.g., tion. A. Ferdig: Data curation, writing–review and editing. B.E. Welm: frameshift) as had been previously reported in at least one diagnostic Funding acquisition, writing–review and editing. J.L. Rodriguez-Flores: tumor sample, without additional filtering for predicted benign/ Validation, writing–review and editing. C. Ours: Conceptualization, deleterious effect on protein. Only mutations meeting all of these writing–review and editing. L.B. Jorde: Conceptualization, funding criteria were used to identify CHIP at hotspots. CHIP at the 70 hot- acquisition, writing–review and editing. J.T. Prchal: Conceptualization, spots with greater potential for artifacts were also assessed, and a writing–review and editing. C.C. Mason: Conceptualization, resources, recovery analysis was performed to arbitrate any excluded mutations data curation, software, formal analysis, supervision, funding acqui- for inclusion. While the coverage statistics at the 364 hotspots were sition, validation, investigation, visualization, methodology, writing– plausibly similar to those of other WES studies analyzed for CHIP, original draft, project administration, writing–review and editing. coverage was insufficient to detect small- to medium-sized clones at many hotspots. In a separate analysis, a classification of relaxed CHIP Acknowledgments at hotspots was used with the sole modification of allowing≥ 2 vari- The authors thank the many researchers who published the data ant reads for identification. and details of their studies as well as the participants in those stud- ies, and acknowledge their original publications and the European General CHIP Assessment Nucleotide Archive as sources of the primary data. They also thank We utilized the specified mutations and domains provided in Allyson Mower and Michele Ballantyne for invaluable advice, and Jaiswal and colleagues (10) to assess general CHIP (clonal mutations the many publishers who granted text and data mining permission. not exclusive to hotspot mutations previously reported in hemato- C.C. Mason acknowledges Pediatric Cancer Program funding, which logic cancers, although in genes or domains that have been associ- is supported by the Intermountain Healthcare and Primary Children’s ated with hematopoiesis). We performed filtering for mappability, Hospital Foundations as well as the Department of Pediatrics and sequencing artifacts, and recurrence as we did for CHIP at hotspots, Division of Pediatric Hematology/Oncology at the University of with an additional imposed VAF threshold requirement of 4%. We Utah. Some work by J.E. Feusier was supported by National Center also assessed sensitivity of CHIP prevalence by gene for lower VAF for Advancing Translational Sciences (NCATS)/NIH (UL1TR002538/ threshold levels. We calculated the proportion of hotspots in our TL1TR002540). Some work by S. Arunachalam was funded by a general CHIP calls at a gene level and overall level, and similarly com- U.S. Department of Defense grant (W81XWH-14-1-0417; to B.E. puted these rates on the data of previously published cohorts (5, 10). Welm). L.B. Jorde acknowledges funding from NIH (GM118335/ GM059290). The support and resources from the Center for High Statistical Analysis Performance Computing at the University of Utah are gratefully acknowledged. We used logistic regression analysis to determine the association of year of age with relaxed CHIP mutation occurrence, with age Received June 5, 2020; revised December 21, 2020; accepted Febru- being mean centered. We also used logistic regression to assess the ary 24, 2021; published first March 3, 2021. association of frequency of reported mutations at hotspots and binary detection of CHIP at those loci. As the frequency of reported mutations was right skewed, the natural logarithm of this variable was used. Sensitivity analyses using unlogged data as well as log base References 10 showed similar results. Deviance from expected recurrence fre- 1. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, quencies for mutations based on binomial distributions was used to Weerasinghe A, et al. Comprehensive characterization of cancer driver determine the effect of various potential hotspot threshold criteria. genes and mutations. Cell 2018;173:371–85.

OF10 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Clonal Hematopoiesis and Recurrent Blood Cancer Mutations RESEARCH BRIEF

2. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. 23. Huether R, Dong L, Chen X, Wu G, Parker M, Wei L, et al. The land- Identifying recurrent mutations in cancer reveals widespread lineage scape of somatic mutations in epigenetic regulators across 1,000 diversity and mutational specificity. Nat Biotech 2016;34:155–63. paediatric cancer genomes. Nat Commun 2014;5:3630. 3. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. 24. Bolouri H, Farrar JE, Triche T Jr, Ries RE, Lim EL, Alonzo TA, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric The molecular landscape of pediatric acute myeloid leukemia reveals leukaemias and solid tumours. Nature 2018;555:371–6. recurrent structural alterations and age-specific mutational interac- 4. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. tions. Nat Med 2018;24:103–12. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic 25. de Rooij JD, Branstetter C, Ma J, Li Y, Walsh MP, Cheng J, et al. Acids Res 2019;47:D941–7. Pediatric non-Down syndrome acute megakaryoblastic leukemia is 5. Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, characterized by distinct genomic subsets with varying outcomes. et al. Age-related clonal hematopoiesis associated with adverse out- Nat Genet 2017;49:451–6. comes. N Engl J Med 2014;371:2488–98. 26. Dolnik A, Engelmann JC, Scharfenberger-Schmeer M, Mauch J, 6. Genovese G, Kähler AK, Handsaker RE, Lindberg J, Rose SA, Kelkenberg-Schade S, Haldemann B, et al. Commonly altered Bakhoum SF, et al. Clonal hematopoiesis and blood-cancer risk genomic regions in acute myeloid leukemia are enriched for somatic inferred from blood DNA sequence. N Engl J Med 2014;371: mutations involved in chromatin remodeling and splicing. Blood 2477–87. 2012;120:e83–92. 7. Abelson S, Collord G, Ng SWK, Weissbrod O, Mendelson Cohen N, 27. Eisfeld AK, Kohlschmidt J, Schwind S, Nicolet D, Blachly JS, Orwick S, Niemeyer E, et al. Prediction of acute myeloid leukaemia risk in et al. Mutations in the CCND1 and CCND2 genes are frequent events healthy individuals. Nature 2018;559:400–4. in adult patients with t(8;21)(q22;q22) acute myeloid leukemia. 8. Xie M, Lu C, Wang J, McLellan MD, Johnson KJ, Wendl MC, et al. Age- Leukemia 2017;31:1278–85. related mutations associated with clonal hematopoietic expansion 28. Eisfeld AK, Kohlschmidt J, Mrózek K, Volinia S, Blachly JS, Nicolet D, and malignancies. Nat Med 2014;20:1472–8. et al. Mutational landscape and gene expression patterns in adult 9. McKerrell T, Park N, Moreno T, Grove CS, Ponstingl H, Stephens J, acute myeloid leukemias with monosomy 7 as a sole abnormality. et al. Leukemia-associated somatic mutations drive distinct patterns Cancer Res 2017;77:207–18. of age-related clonal hemopoiesis. Cell Rep 2015;10:1239–45. 29. Eisfeld AK, Kohlschmidt J, Mrózek K, Blachly JS, Nicolet D, Kroll K, 10. Jaiswal S, Natarjan P, Silver AJ, Gibson CJ, Bick AG, Shvartz E, et al. et al. Adult acute myeloid leukemia with trisomy 11 as the sole abnor- Clonal hematopoiesis and risk of atherosclerotic cardiovascular dis- mality is characterized by the presence of five distinct gene muta- ease. N Engl J Med 2017;377:111–21. tions: MLL-PTD, DNMT3A, U2AF1, FLT3-ITD and IDH2. Leukemia 11. Andersson AK, Ma J, Wang J, Chen X, Gedman AL, Dang J, et al. The 2016;30:2254–8. landscape of somatic mutations in infant MLL-rearranged acute 30. Faber ZJ, Chen X, Gedman AL, Boggs K, Cheng J, Ma J, et al. The lymphoblastic leukemias. Nat Genet 2015;47:330–7. genomic landscape of core-binding factor acute myeloid leukemias. 12. Chen B, Jiang L, Zhong ML, Li JF, Li BS, Peng LJ, et al. Identifica- Nat Genet 2016;48:1551–6. tion of fusion genes and characterization of transcriptome features 31. Farrar JE, Schuback HL, Ries RE, Wai D, Hampton OA, Trevino LR, in T-cell acute lymphoblastic leukemia. Proc Natl Acad Sci U S A et al. Genomic profiling of pediatric acute myeloid leukemia reveals 2018;115:373–8. a changing mutational landscape from disease diagnosis to relapse. 13. De Keersmaecker K, Atak ZK, Li N, Vicente C, Patchett S, Girardi T, Cancer Res 2016;76:2197–205. et al. Exome sequencing identifies mutation in CNOT3 and riboso- 32. Garg M, Nagata Y, Kanojia D, Mayakonda A, Yoshida K, Haridas Keloth S, mal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia. et al. Profiling of somatic mutations in acute myeloid leukemia with Nat Genet 2013;45:186–90. FLT3-ITD at diagnosis and relapse. Blood 2015;126:2491–501. 14. Holmfeldt L, Wei L, Diaz-Flores E, Walsh M, Zhang J, Ding L, et al. 33. Greif PA, Hartmann L, Vosberg S, Stief SM, Mattes R, Hellmann I, The genomic landscape of hypodiploid acute lymphoblastic leuke- et al. Evolution of cytogenetically normal acute myeloid leuke- mia. Nat Genet 2013;45:242–52. mia during therapy and relapse: an exome sequencing study of 50 15. Liu YF, Wang BY, Zhang WN, Huang JY, Li BS, Zhang M, et al. patients. Clin Cancer Res 2018;24:1716–26. Genomic profiling of adult and pediatric B-cell acute lymphoblastic 34. Hirsch P, Zhang Y, Tang R, Joulin V, Boutroux H, Pronier E, et al. leukemia. EBioMedicine 2016;8:173–83. Genetic hierarchy and temporal variegation in the clonal history of 16. Liu Y, Easton J, Shao Y, Maciaszek J, Wang Z, Wilkinson MR, et al. acute myeloid leukaemia. Nat Commun 2016;7:12475. The genomic landscape of pediatric and young adult T-lineage acute 35. Lavallée VP, Baccelli I, Krosl J, Wilhelm B, Barabé F, Gendron P, et al. lymphoblastic leukemia. Nat Genet 2017;49:1211–8. The transcriptomic landscape and directed chemical interrogation of 17. Oshima K, Khiabanian H, da Silva-Almeida AC, Tzoneva G, Abate F, MLL-rearranged acute myeloid leukemias. Nat Genet 2015;47:1030–7. Ambesi-Impiombato A, et al. Mutational landscape, clonal evolution 36. Cancer Genome Atlas Research Network; Ley TJ, Miller C, Ding L, patterns, and role of RAS mutations in relapsed acute lymphoblastic Raphael BJ, Mungall AJ, et al. Genomic and epigenomic landscapes of leukemia. Proc Natl Acad Sci U S A 2016;113:11306–11. adult de novo acute myeloid leukemia. N Engl J Med 2013;368:2059–74. 18. Papaemmanuil E, Rapado I, Li Y, Potter NE, Wedge DC, Tubio J, et al. 37. Madan V, Shyamsunder P, Han L, Mayakonda A, Nagata Y, Sundaresan J, RAG-mediated recombination is the predominant driver of onco- et al. Comprehensive mutational analysis of primary and relapse acute genic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. promyelocytic leukemia. Leukemia 2016;30:1672–81. Nat Genet 2014;46:116–25. 38. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, 19. Paulsson K, Lilljebjörn H, Biloglav A, Olsson L, Rissler M, Castor A, Roberts ND, et al. Genomic classification and prognosis in acute et al. The genomic landscape of high hyperdiploid childhood acute myeloid leukemia. N Engl J Med 2016;374:2209–21. lymphoblastic leukemia. Nat Genet 2015;47:672–6. 39. Sehgal AR, Gimotty PA, Zhao J, Hsu JM, Daber R, Morrissette JD, 20. Russell LJ, Jones L, Enshaei A, Tonin S, Ryan SL, Eswaran J, et al. et al. DNMT3A mutational status affects the results of dose-escalated Characterisation of the genomic landscape of CRLF2-rearranged acute induction therapy in acute myelogenous leukemia. Clin Cancer Res lymphoblastic leukemia. Genes Cancer 2017;56:363–72. 2015;21:1614–20. 21. Ryan SL, Matheson E, Grossmann V, Sinclair P, Bashton M, Schwab C, 40. Sood R, Hansen NF, Donovan FX, Carrington B, Bucci D, Maskeri B, et al. The role of the RAS pathway in iAMP21-ALL. Leukemia 2016; et al. Somatic mutational landscape of AML with inv(16) or t(8;21) 30:1824–31. identifies patterns of clonal evolution in relapse leukemia. Leukemia 22. Stengel A, Schnittger S, Weissmann S, Kuznia S, Kern W, Kohlmann A, 2016;30:501–4. et al. TP53 mutations occur in 15.7% of ALL and are associated with 41. Thol F, Klesse S, Köhler L, Gabdoulline R, Kloos A, Liebich A, et al. MYC-rearrangement, low hypodiploidy, and a poor prognosis. Blood Acute myeloid leukemia derived from lympho-myeloid clonal hemat- 2014;124:251–8. opoiesis. Leukemia 2017;31:1286–95.

May 2021 blood CANCER DISCOVERY | OF11

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. RESEARCH BRIEF Feusier et al.

42. Kim T, Tyndel MS, Kim HJ, Ahn JS, Choi SH, Park HJ, et al. Spectrum 55. Schwartz JR, Ma J, Lamprecht T, Walsh M, Wang S, Bryant V, et al. of somatic mutation dynamics in chronic myeloid leukemia follow- The genomic landscape of pediatric myelodysplastic syndromes. Nat ing tyrosine kinase inhibitor therapy. Blood 2017;129:38–47. Commun 2017;8:1557. 43. Togasaki E, Takeda J, Yoshida K, Shiozawa Y, Takeuchi M, Oshima M, 56. Lundberg P, Karow A, Nienhold R, Looser R, Hao-Shen H, Nissen I, et al. Frequent somatic mutations in epigenetic regulators in newly et al. Clonal evolution and clinical correlates of somatic mutations in diagnosed chronic myeloid leukemia. Blood Cancer J 2017;7:e559. myeloproliferative neoplasms. Blood 2014;123:2220–8. 44. Mason CC, Khorashad JS, Tantravahi SK, Kelley TW, Zabriskie MS, 57. Nangalia J, Massie CE, Baxter EJ, Nice FL, Gundem G, Wedge DC, Yan D, et al. Age-related mutations and chronic myelomonocytic et al. Somatic CALR mutations in myeloproliferative neoplasms with leukemia. Leukemia 2016;30:906–13. nonmutated JAK2. N Engl J Med 2013;369:2391–405. 45. Merlevede J, Droin N, Qin T, Meldi K, Yoshida K, Morabito M, et al. 58. Steensma DP, Bejar R, Jaiswal S, Lindsley RC, Sekeres MA, Hasserjian RP, Mutation allele burden remains unchanged in chronic myelomono- et al. Clonal hematopoiesis of indeterminate potential and its distinc- cytic leukaemia responding to hypomethylating agents. Nat Commun tion from myelodysplastic syndromes. Blood 2015;126:9–16. 2016;7:10767. 59. Papaemmanuil E, Gerstung M, Malcovati L, Tauro S, Gundem G, 46. Palomo L, Garcia O, Arnan M, Xicoy B, Fuster F, Cabezón M, et al. Van Loo P, et al. Clinical and biological implications of driver muta- Targeted deep sequencing improves outcome stratification in chronic tions in myelodysplastic syndromes. Blood 2013;122:3616–27. myelomonocytic leukemia with low risk cytogenetic features. Onco- 60. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, target 2016;7:57021–35. Willsey AJ, et al. De novo mutations revealed by whole-exome sequenc- 47. Patnaik MM, Barraco D, Lasho TL, Finke CM, Hanson CA, Ketterling RP, ing are strongly associated with autism. Nature 2012;485:237–41. et al. DNMT3A mutations are associated with inferior overall and 61. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, leukemia-free survival in chronic myelomonocytic leukemia. Am Huddleston J, et al. An integrated map of structural variation in 2,504 J Hematol 2017;92:56–61. human genomes. Nature 2015;526:75–81. 48. Caye A, Strullu M, Guidez F, Cassinat B, Gazal S, Fenneteau O, 62. Fakhro KA, Staudt MR, Ramstetter MD, Robay A, Malek JA, Badii R, et al. Juvenile myelomonocytic leukemia displays mutations in com- et al. The Qatar genome: a population-specific tool for precision ponents of the RAS pathway and the PRC2 network. Nat Genet medicine in the Middle East. Hum Genome Var 2016;3:16016. 2015;47:1334–40. 63. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, 49. Stieglitz E, Taylor-Weiner AN, Chang TY, Gelston LC, Wang YD, et al. The mutational constraint spectrum quantified from variation Mazor T, et al. The genomic landscape of juvenile myelomonocytic in 141,456 humans. Nature 2020;581:434–43. leukemia. Nat Genet 2015;47:1326–33. 64. Knudson AG Jr. Mutation and cancer: statistical study of retinoblas- 50. Haferlach T, Nagata Y, Grossman V, Okuno Y, Bacher U, Nagae G, toma. Proc Natl Acad Sci U S A 1971;68:820–3. et al. Landscape of genetic lesions in 944 patients with myelodysplas- 65. Lim ET, Uddin M, De Rubeis S, Chan Y, Kamumbu AS, Zhang X, et al. tic syndromes. Leukemia 2014;28:241–7. Rates, distribution and implications of postzygotic mosaic mutations 51. Pastor V, Hirabayashi S, Karow A, Wehrle J, Kozyra EJ, Nienhold R, in autism spectrum disorder. Nat Neurosci 2017;20:1217–24. et al. Mutational landscape in children with myelodysplastic syn- 66. Tatton-Brown K, Zachariou A, Loveday C, Renwick A, Mahamdallie S, dromes is distinct from adults: specific somatic drivers and novel Aksglaede L, et al. The Tatton-Brown-Rahman syndrome: a clinical germline variants. Leukemia 2017;31:759–62. study of 55 individuals with de novo constitutive DNMT3A variants. 52. Walter MJ, Shen D, Shao J, Ding L, White BS, Kandoth C, et al. Clonal Wellcome Open Res 2018;3:46. diversity of recurrently mutated genes in myelodysplastic syndromes. 67. Yoshizato T, Dumitriu B, Hosokawa K, Makishima H, Yoshida K, Leukemia 2013;27:1275–82. Townsley D, et al. Somatic mutations and clonal hematopoiesis in 53. Yoshida K, Sanada M, Shiraishi Y, Nowak D, Nagata Y, Yamamoto R, aplastic anemia. N Engl J Med 2015;373:35–47. et al. Frequent pathway mutations of splicing machinery in myelod- 68. Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, ysplasia. Nature 2011;478:64–9. et al. Patterns of somatic structural variation in human cancer genomes. 54. Churpek JE, Pyrtel K, Kanchi KL, Shao J, Koboldt D, Miller CA, Nature 2020;578:112–21. et al. Genomic analysis of germ line and somatic variants in 69. 1000 Genomes Project Consortium; Auton A, Brooks LD, Durbin RM, familial myelodysplasia/acute myeloid leukemia. Blood 2015;126: Garrison EP, Kang HM, et al. A global reference for human genetic 2484–90. variation. Nature 2015;526:68–74.

OF12 | blood CANCER DISCOVERY May 2021 AACRJournals.org

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research. Large-scale Identification of Clonal Hematopoiesis and Mutations Recurrent in Blood Cancers

Julie E. Feusier, Sasi Arunachalam, Tsewang Tashi, et al.

Blood Cancer Discov Published OnlineFirst March 3, 2021.

Updated version Access the most recent version of this article at: doi: 10.1158/2643-3230.BCD-20-0094

Supplementary Access the most recent supplemental material at: Material http://bloodcancerdiscov.aacrjournals.org/content/suppl/2021/02/26/2643-3230.BCD-20-0094.DC 1

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://bloodcancerdiscov.aacrjournals.org/content/early/2021/03/31/2643-3230.BCD-20-0094.1. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 28, 2021. Copyright 2021 American Association for Cancer Research.