Transcriptional landscape of B cell precursor acute lymphoblastic leukemia based on an international study of 1,223 cases

Jian-Feng Lia,1, Yu-Ting Daia,1, Henrik Lilljebjörnb,1, Shu-Hong Shenc, Bo-Wen Cuia, Ling Baia, Yuan-Fang Liua, Mao-Xiang Qiand, Yasuo Kubotae, Hitoshi Kiyoif, Itaru Matsumurag, Yasushi Miyazakih, Linda Olssonb, Ah Moy Tani, Hany Ariffinj, Jing Chenc, Junko Takitak, Takahiko Yasudal, Hiroyuki Manom, Bertil Johanssonb,n, Jun J. Yangd,o, Allen Eng-Juh Yeohp, Fumihiko Hayakawaq, Zhu Chena,r,s,2, Ching-Hon Puio,2, Thoas Fioretosb,n,2, Sai-Juan Chena,r,s,2, and Jin-Yan Huanga,s,2

aState Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, National Research Center for Translational Medicine, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200025 Shanghai, China; bDepartment of Laboratory Medicine, Division of Clinical Genetics, Lund University, 22184 Lund, Sweden; cKey Laboratory of Pediatric Hematology and Oncology, Ministry of Health, Department of Hematology and Oncology, Shanghai Children’s Medical Center, Shanghai Jiao Tong University School of Medicine, 200127 Shanghai, China; dDepartment of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105; eDepartment of Pediatrics, Graduate School of Medicine, The University of Tokyo, 1138654 Tokyo, Japan; fDepartment of Hematology and Oncology, Nagoya University Graduate School of Medicine, 4668550 Nagoya, Japan; gDivision of Hematology and Rheumatology, Kinki University Faculty of Medicine, 5778502 Osaka, Japan; hDepartment of Hematology, Atomic Bomb Disease Institute, Nagasaki University, 8528521 Nagasaki, Japan; iDepartment of Paediatrics, KK Women’s & Children’s Hospital, 229899 Singapore; jPaediatric Haematology-Oncology Unit, University of Malaya Medical Centre, 59100 Kuala Lumpur, Malaysia; kDepartment of Pediatrics, Graduate School of Medicine, Kyoto University, 6068501 Kyoto, Japan; lClinical Research Center, Nagoya Medical Center, National Hospital Organization, 4600001 Nagoya, Japan; mNational Center Research Institute, 1040045 Tokyo, Japan; nDepartment of Clinical Genetics, University and Regional Laboratories, Region Skåne, Lund 22185, Sweden; oDepartment of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105; pCentre for Translational Research in Acute Leukaemia, Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, 119228 Singapore; qDepartment of Pathophysiological Laboratory Sciences, Nagoya University Graduate School of Medicine, 4618673 Nagoya, Japan; rKey Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China; and sPôle de Recherches Sino-Français en Science du Vivant et Génomique, Laboratory of Molecular Pathology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China

GENETICS

Contributed by Zhu Chen, October 17, 2018 (sent for review August 29, 2018; reviewed by Christine J. Harrison and Patrick Tan)

Most B cell precursor acute lymphoblastic leukemia (BCP ALL) can be classified into known major genetic subtypes, while a substantial proportion of BCP ALL remains poorly characterized in relation to its underlying genomic abnormalities. We therefore initiated a large-scale international study to reanalyze and delineate the transcriptome landscape of 1,223 BCP ALL cases using RNA sequencing. Fourteen BCP ALL subgroups (G1 to G14) were identified. Apart from extending eight previously described subgroups (G1 to G8 associated with MEF2D fusions, TCF3–PBX1 fusions, ETV6–RUNX1–positive/ETV6–RUNX1–like, DUX4 fusions, ZNF384 fusions, BCR–ABL1/Ph–like, high hyperdiploidy, and KMT2A fusions), we defined six additional gene expression subgroups: G9 was associated with both PAX5 and CRLF2 fusions; G10 and G11 with mutations in PAX5 (p.P80R) and IKZF1 (p.N159Y), respectively; G12 with IGH–CEBPE fusion and mutations in ZEB2 (p.H1038R); and G13 and G14 with TCF3/4–HLF and NUTM1 fusions, respectively. In pediatric BCP ALL, subgroups G2 to G5 and G7 (51 to 65/67 chromosomes) were associated with low risk, G7 (with ≤50 chromosomes) and G9 were intermediate-risk, whereas G1, G6, and G8 were defined as high-risk subgroups. In adult BCP ALL, G1, G2, G6, and G8 were associated with high risk, while G4, G5, and G7 had relatively favorable outcomes. This large-scale transcriptome sequence analysis of BCP ALL revealed distinct molecular subgroups that reflect discrete pathways of BCP ALL, informing disease classification and prognostic stratification. The combined results strongly advocate that RNA sequencing be introduced into the clinical diagnostic workup of BCP ALL.

BCP ALL | RNA-seq | subtypes | gene fusion | gene mutation

Significance

In BCP ALL, molecular classification is used for risk stratification and influences treatment strategies. We reanalyzed the transcriptomic landscape of 1,223 BCP ALLs and identified 14 subgroups based on their transcriptional profiles. Eight of these (G1 to G8) are previously well-known subgroups, harboring specific genetic abnormalities. The sample size allowed the identification of six previously undescribed subgroups, consisting of cases harboring PAX5 or CRLF2 fusions (G9), PAX5 (p.P80R) mutations (G10), IKZF1 (p.N159Y) mutations (G11), either ZEB2 (p.H1038R) mutations or IGH–CEBPE fusions (G12), HLF rearrangements (G13), or NUTM1 rearrangements (G14). In addition, this study allowed us to determine the prognostic impact of several recently defined subgroups. This study suggests that RNA sequencing should be a valuable tool in the routine diagnostic workup for ALL.

Author contributions: Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. designed research; J.-F.L., Y.-T.D., H.L., S.-H.S., B.-W.C., L.B., Y.-F.L., M.-X.Q., Y.K., H.K., I.M., Y.M., L.O., A.M.T., H.A., J.C., J.T., T.Y., H.M., B.J., J.J.Y., A.E.-J.Y., F.H., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. performed research; S.-H.S., Y.-F.L., J.C., J.J.Y., and F.H. collected the samples and clinical data; J.-F.L., Y.-T.D., H.L., B.-W.C., L.B., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. analyzed data; Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. wrote the paper; and J.-F.L., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. critically revised the manuscript.
Reviewers: C.J.H., Newcastle University; and P.T., Duke–NUS Medical School.
The authors declare no conflict of interest.
Published under the PNAS license.
1J.-F.L., Y.-T.D., and H.L. contributed equally to this work.
2To whom correspondence may be addressed. Email: [email protected], ching-hon.pui@stjude.org, [email protected], [email protected], or [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1814397115/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1814397115 PNAS Latest Articles

B cell precursor acute lymphoblastic leukemia (BCP ALL), the most common childhood cancer, is a highly heterogeneous malignant hematological disorder (1). Previous genome- and/or transcriptome-wide analyses of BCP ALLs have greatly improved our understanding of the pathogenesis and prognostic impact of many molecular abnormalities in BCP ALL (2, 3). Structural chromosomal alterations as well as sequence mutations are common in childhood and adult BCP ALL. In the last four decades, most of the recurring chromosomal abnormalities, including aneuploidy, chromosomal rearrangements/fusions (e.g., ETV6–RUNX1, BCR–ABL1, and TCF3–PBX1), and rearrangements of KMT2A (previously MLL), were identified by cytogenetics and fluorescence in situ hybridization. Subsequently, gene expression profiling revealed that these cytogenetic subgroups displayed specific gene expression patterns (3–5). With the advent of genome sequencing technology, several groups discovered a large number of novel gene mutations and fusions, such as those involving ZNF384, MEF2D, and DUX4 rearrangements (6–11), among those cases with no defining chromosomal abnormalities, termed “B-other-ALL.”

However, it remained unknown whether additional novel BCP ALL subtypes could be detected by integrated analysis of pooled

[Figure 1: gene expression heatmaps (color key: low to high). Annotation tracks: age (adult, paediatric); gender (male, female); MEF2D fusions; TCF3-PBX1; ETV6-RUNX1; ETV6-RUNX1-like; DUX4 fusions; ZNF384/ZNF362 fusions; BCR-ABL1; Ph-like; CRLF2 fusions; hyperdiploidy; KMT2A fusions; PAX5 fusions; TCF3/4-HLF; NUTM1 fusions; IGH-CEBPE; PAX5 (p.P80R); ZEB2 (p.H1038R); IKZF1 (p.N159Y). Subgroups: G1 (MEF2D fusions), G2 (TCF3-PBX1), G3 (ETV6-RUNX1/-like), G4 (DUX4 fusions), G5 (ZNF384 fusions), G6 (BCR-ABL1/Ph-like), G7 (hyperdiploidy), G8 (KMT2A fusions), G9 (PAX5 and CRLF2 fusions), G10 [PAX5 (p.P80R) mutation], G11 [IKZF1 (p.N159Y) mutation], G12 [ZEB2 (p.H1038R)/IGH-CEBPE], G13 (TCF3/4-HLF), G14 (NUTM1 fusions).]

Fig. 1. Two-step unsupervised hierarchical clustering of the global gene expression profile from 1,223 BCP ALL patients. In the gene expression subgroups of G1 to G7 (Left) and G8 to G14 (Right), columns indicate 1,223 BCP ALL patients and rows represent gene expression levels or genetic features for each patient. Genes showing over- and underexpression in the heatmap are shown in red and blue, respectively. The first box above the heatmap indicates genotypes and fusion genes, followed by a box including three clusters of hotspot sequence mutations defined in this analysis. The first row below the heatmap specifies the 14 BCP ALL subgroups identified on the basis of gene expression profiles. In the unsupervised hierarchical clustering heatmap of G10 to G14 (Lower Right), columns represent patients and rows are top variance genes in G10 to G14. The box below the heatmap indicates the five gene expression subgroups, gender, and genotypes of the G10 to G14 clusters.
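The clustering idea behind Fig. 1 can be illustrated with a small, self-contained Python sketch. It is not the authors' pipeline: the choice of top-variance gene selection, a 1 − Pearson correlation distance, average linkage, and all names (`top_variance_genes`, `correlation_distance`, `cluster_samples`) and toy values are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, pvariance


def top_variance_genes(expr, k):
    """Return the k genes with the largest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]


def correlation_distance(x, y):
    """1 - Pearson correlation; constant profiles get the neutral distance 1."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1.0 if sx == 0 or sy == 0 else 1.0 - cov / (sx * sy)


def cluster_samples(expr, samples, k_genes, n_clusters):
    """Average-linkage agglomerative clustering on the top-variance genes."""
    genes = top_variance_genes(expr, k_genes)
    profiles = {s: [expr[g][i] for g in genes] for i, s in enumerate(samples)}
    dist = {frozenset(p): correlation_distance(profiles[p[0]], profiles[p[1]])
            for p in combinations(samples, 2)}
    clusters = [frozenset([s]) for s in samples]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Toy expression matrix (hypothetical values): gene -> one value per sample.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
expr = {
    "geneA": [10, 11, 9, 1, 2, 1],
    "geneB": [1, 2, 1, 10, 9, 11],
    "geneC": [5, 5, 5, 5, 5, 5],
    "geneD": [3, 8, 4, 7, 3, 8],
    "geneE": [9, 10, 10, 2, 1, 2],
}
clusters = cluster_samples(expr, samples, k_genes=3, n_clusters=2)
```

A second clustering step, in the spirit of the G10 to G14 panel, could call `cluster_samples` again on only the samples of one first-step cluster, so that the top-variance genes are recomputed within that subset.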

datasets from studies with otherwise relatively small sample sizes. We hypothesized that the versatility provided by RNA-seq (sequencing) would uncover otherwise undetected genetic abnormalities in BCP ALL, providing that sufficient numbers of cases were analyzed. Thus, through the formation of an international consortium of five major study groups, we have delineated the transcriptomic landscape of BCP ALL and at the same time identified new subgroups of biological and clinical importance.

Results

Identification of BCP ALL Subgroups with Distinctive Gene Expression Profiles and Genomic Aberrations. To comprehensively identify BCP ALL subtypes, we first systematically classified gene expression profiles, gene fusions, and gene mutations from RNA-seq data of 1,223 BCP ALL cases from five significant patient cohorts (Table 1 and SI Appendix, Fig. S2 and Dataset S2). Based on a consecutive two-step unsupervised clustering, 14 distinct subgroups based on their gene expression signatures were identified (G1 to G14) (Fig. 1 and Table 2). Most of these gene expression subgroups segregated with well-known genetic abnormalities. TCF3–PBX1 fusions were present among the G2 subgroup (n = 76, 6%); ETV6–RUNX1 fusions belonged to G3; BCR–ABL1 (Ph) and BCR–ABL1–like (Ph-like, including a cluster with CRLF2 fusions) comprised G6 (n = 167, 14%); and cases with a hyperdiploid karyotype formed the subgroup G7 (n = 408, 33%). Three subgroups recently identified among B-other-ALL cases were those with MEF2D fusions (G1; n = 39, 3%), DUX4 rearrangements (G4; n = 63, 5%), and ZNF384 fusions (G5; n = 74, 6%) (6–11). These recently described subgroups formed distinctive gene expression-based clusters, consistent with prior reports (6, 7, 10, 11). The most recently defined BCP ALL ETV6–RUNX1–like cluster, characterized by the absence of ETV6–RUNX1 fusions but with similar gene expression profiles to ETV6–RUNX1–positive BCP ALLs (6), was also found among our combined datasets. In concordance with previous findings (6), both fusions involving ETV6 and fusions involving IKZF1 were common in these ETV6–RUNX1–like cases (Dataset S2). However, all ETV6–RUNX1–negative cases exhibiting a gene expression profile similar to ETV6–RUNX1–positive cases were defined as ETV6–RUNX1–like. Together, ETV6–RUNX1–positive/ETV6–RUNX1–like BCP ALL constituted G3 (n = 161, 13%). KMT2A-rearranged cases formed a distinct subgroup (G8; n = 56, 5%).

Notably, six previously undescribed gene expression subgroups (G9 to G14) with distinct genomic abnormalities were identified. G9 (n = 111, 9%) was associated with PAX5 fusions and “Ph-like” ALL with CRLF2 fusions (12). G10 (n = 23, 2%) and G11 (n = 6, <1%) were characterized by two hotspot mutations in PAX5 (p.P80R) (21/22, 96%) and IKZF1 (p.N159Y) (6/6, 100%), respectively. The subgroup G12 (n = 8, <1%) was enriched for the hotspot mutation in ZEB2 (p.H1038R) (5/8, 63%) and IGH–CEBPE fusions (3/8, 38%). G13 (n = 11, <1%) and G14 (n = 20, 2%) were associated with TCF3/4–HLF (7/11, 64%) and NUTM1 (6/20, 30%) rearrangements, respectively.

Nonsilent Sequence Mutation Profile. We next analyzed nonsilent sequence variants in available whole exome sequencing (WES) and RNA-seq data, based on in-house analysis criteria from previous studies (6, 8, 11, 13). We identified 44 genes that were recurrently mutated in at least 1% of the cases (12/1,223 cases). Nonsilent variants in NRAS, KRAS, FLT3, KMT2D, PAX5, PTPN11, CREBBP, and TP53 exhibited the highest mutation frequencies (3 to 14%) (SI Appendix, Figs. S5 and S6A). The mutated genes (>1%) were functionally divided into five categories: signaling molecules, transcription factors, epigenetic factors, cell cycle, and others (Dataset S3). Distinct gene mutation categories showed different levels of enrichment among the gene expression subgroups G1 to G14. Gene mutations among signaling molecules were enriched in subgroups G5, G7, G9, and G10, while G4, G10, G11, and G12 harbored a higher number of variants in genes. HIST family (HIST1H2AG and HIST1H2AI) point mutations located in the histone H2A type 1 domain (SI Appendix, Fig. S7A) were highly correlated with G2 (TCF3–PBX1), while WHSC1 (NSD2) point mutations (p.E1099K) in the SET domain (SI Appendix, Fig. S7B) were significantly associated with G3 (ETV6–RUNX1–positive/ETV6–RUNX1–like; SI Appendix, Figs. S5–S7).

Co-occurrence or mutual exclusivity of mutations was also evaluated using two-sided Fisher’s exact test. A total of 36 gene pairs (for example, TP53 and ) exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6B). Along with the novel subgroups defined in this study (G9 to G14), 13 gene pairs (for example, PAX5 and PTPN11, and ZEB2 and NRAS) exhibited significant co-occurrence (SI Appendix, Fig. S6 C and D). In G9, four gene pairs, namely PAX5 and IKZF1, JAK1 and SETD2, SH2B3 and ASXL1, and CDKN2A and ARID1B, exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6D).

Enrichment of certain mutations differed between pediatric and adult BCP ALL patients. Transcription factor mutations, such as in RUNX1, were more frequent in adult ALL, while signaling molecule and epigenetic factor WHSC1 mutations were more prevalent in pediatric BCP ALL (Datasets S5 and S6).

ZNF362 Fusions Cluster with ZNF384 Rearrangements (G5) and Display Activation of the JAK-STAT Pathway. Four cases harbored previously undescribed ZNF362 rearrangements (n = 4), including SMARCA2–ZNF362 (n = 3) and TAF15–ZNF362 (n = 1). These cases clustered within the G5 subgroup, otherwise associated with ZNF384 fusions (Fig. 2 and SI Appendix, Figs. S8A and S9). ZNF384 and ZNF362 are homologous C2H2-type zinc-finger transcription factors containing six zinc fingers that belong to the zinc-finger 384/nuclear matrix transcription factor 4 (ZFAM4) gene family (14). Of note, the zinc-finger domains were retained in both fusions (SI Appendix, Fig. S8B), and both clusters showed similar gene expression profiles with an activated JAK-STAT signaling pathway (SI Appendix, Fig. S8C). Moreover, the fusion partners of ZNF362, namely TAF15 and SMARCA2, were also found to fuse to ZNF384, with similar breakpoints.

Previously Undescribed Subgroups Associated with Different Gene Fusions/Sequence Mutations.

G9: PAX5 and CRLF2 fusions are representative of this subgroup. According to the gene expression profiles, 46 cases with PAX5 fusions and 13 cases with CRLF2 fusions (accounting for 41 and 12%, respectively) clustered together in G9 (n = 111). Previous work identified CRLF2 fusions in Down syndrome ALL and Ph-like BCP ALL, each of them accounting for approximately half of the cases (12, 15). In our study, 30% of the cases with CRLF2 fusions (13/44) were found in G9 and 57% (25/44) in the BCR–ABL1/Ph-like subgroup (G6), with the remaining cases present in G7 and G10 (Fig. 1). Notably, all 13 CRLF2 fusions in G9 were P2RY8–CRLF2 fusions, in contrast to those in G6, in which the fusion partners of CRLF2 were either P2RY8 or IGH. Of the 13 CRLF2 fusion cases in G9, seven coexisted with PAX5 fusions. Signaling molecule mutations were also significantly enriched in G9 (P < 0.001; SI Appendix, Fig. S5 and Dataset S5), a feature reminiscent of Down syndrome ALL (12). Compared with the CRLF2 fusion clusters in G6, the PI3K-Akt signaling (e.g., FLT4 and EGF), cytokine–cytokine receptor interaction (e.g., CCL17 and IL2RA), and hematopoietic cell lineage (e.g., CD33 and CD34) pathways were significantly down-regulated in the CRLF2 fusion-positive cases in G9 (SI Appendix, Fig. S10), whereas a B cell-specific member of the tumor necrosis factor (TNF) receptor superfamily, TNFRSF13B, was up-regulated

[Figure 2, panels A–I. (A, D, F) Mutation maps of PAX5 (392 aa; PAX and DNA-binding domain, Pax2_C), IKZF1 (520 aa; C2H2 Zn fingers), and ZEB2 (1,215 aa; C2H2 Zn fingers, homeodomain); legend: missense, frameshift, nonsense, protein insertion, protein deletion. Starred hotspots: 21/22 cases with exon3:c.C239G (p.P80R) cluster in G10; 6/6 cases with exon5:c.A475T (p.N159Y) cluster in G11; 5/15 cases with exon10:c.A3113G (p.H1038R) cluster in G12. (B, G) Structure predictions for PAX5 p.P80R and ZEB2 p.H1038R versus wild type. (C, E) Violin plots (y axis: expression level, FPKM), volcano plots (x axis: log2 fold change; y axis: −log10 adjusted P value), and GSEA plots (y axis: enrichment score) for HALLMARK PI3K_AKT_MTOR signaling, KEGG cell adhesion molecules (CAMs), KEGG B-cell receptor signaling pathway, and KEGG JAK-STAT signaling pathway. (H) Read coverage of CEBPE (chr14:23,587,000–23,589,000; exons 1 and 2; bZIP_1 domain). (I) Volcano plot for ZEB2 (p.H1038R)/IGH–CEBPE (G12).]

Fig. 2. Schematic representation of identified PAX5 (p.P80R) (G10), IKZF1 (p.N159Y) (G11), and ZEB2 (p.H1038R)/IGH–CEBPE (G12) subgroups in BCP ALL. (A, D, and F) Protein domain plots and the positions of amino acid substitutions in distinct domains of the PAX5, IKZF1, and ZEB2 proteins. Hotspot mutations enriched in BCP ALL subgroups are marked with a red star (G10 to G12). (B and G) Structure prediction of the PAX5 and ZEB2 point mutations. The crystal structures of both the PAX5 and ZEB2 proteins were generated based on the Protein Data Bank using homology modeling. (C) Gene expression levels and gene set enrichment analysis (GSEA) of PAX5 (p.P80R) mutated cases. The violin plot (Left) shows the comparison of PAX5 expression levels between clusters of PAX5 (p.P80R)-positive samples, other PAX5 mutations, and all other cases. The mean and 25th and 75th percentiles are presented in the middle box of violin plots. The volcano plot (Right) shows differentially expressed genes between PAX5 (p.P80R)-positive (G10) patients and other patients. The x axis represents log2-transformed fold-change values, while the y axis is a −log10-transformed P value. Significantly up-regulated and down-regulated genes are shown in red and blue, respectively. GSEA plot of B-lymphocyte maturation and cell-adhesion molecules in PAX5 (p.P80R)-positive (G10) patients and other cases. P values were calculated by 1,000-gene set two-sided permutation tests. ns, not significant; *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. (E) Gene expression levels and GSEA of cases showing the IKZF1 (p.N159Y) mutation (G11). The violin plot (Left) shows the comparison of IKZF1 expression levels between the cluster of IKZF1 (p.N159Y) cases (G11), cluster of other IKZF1 mutations, and other patients. The P values were calculated using Student’s t test. The volcano plot (Right) shows differentially expressed genes between IKZF1 (p.N159Y)-positive (G11) and -negative cases. 
GSEA plot of B cell receptor and the JAK-STAT signaling pathway in IKZF1 (p.N159Y)-positive (G11) and -negative cases. (H) Sequencing read coverage of CEBPE in four cases with IGH–CEBPE–positive BCP ALL (three cases are clustered in G12). The blue arrows indicate the fusion breakpoints. (I) Gene expression volcano plot of ZEB2 (p.H1038R)/IGH–CEBPE (G12) cases. The volcano plot (Right) shows differentially expressed genes between ZEB2 (p.H1038R)/IGH–CEBPE (G12) cases and negative cases. FPKM, fragments per kilobase of transcript per million mapped reads; ns, not significant; NES, normalized enrichment score; *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
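The volcano-plot axes described in this legend (log2 fold change against a −log10-transformed P value) can be computed as in the following sketch. The Benjamini–Hochberg adjustment, the pseudocount, the cutoffs, and the names `benjamini_hochberg` and `volcano_points` are assumptions for illustration; the legend does not state which adjustment or thresholds were used, and the input values are invented.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_points(genes, mean_case, mean_other, pvalues,
                   pseudocount=1.0, lfc_cut=1.0, alpha=0.05):
    """Return (gene, log2 fold change, -log10 adjusted P, label) per gene."""
    adjusted = benjamini_hochberg(pvalues)
    points = []
    for gene, a, b, q in zip(genes, mean_case, mean_other, adjusted):
        lfc = math.log2((a + pseudocount) / (b + pseudocount))
        if q < alpha and lfc >= lfc_cut:
            label = "up"
        elif q < alpha and lfc <= -lfc_cut:
            label = "down"
        else:
            label = "ns"
        # Guard against log10(0) for extremely small adjusted P values.
        points.append((gene, lfc, -math.log10(max(q, 1e-300)), label))
    return points


# Hypothetical mean FPKM values in a subgroup versus all other cases.
points = volcano_points(
    ["gene_up", "gene_down", "gene_flat"],
    mean_case=[31.0, 1.0, 10.0],
    mean_other=[3.0, 15.0, 10.0],
    pvalues=[0.001, 0.002, 0.9],
)
```

Each returned tuple corresponds to one point of the plot, with the label deciding whether it would be drawn in red ("up"), blue ("down"), or as not significant.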

among those cases with CRLF2 fusions in G9 (SI Appendix, Fig. S10C) (16). However, the expression patterns of cytokine receptor and tyrosine kinase signaling genes (CRLF2, PDGFRB, JAK1, JAK2, JAK3, and RAS) were similar in the CRLF2 fusion-positive cases in G9 and G6.

G10: PAX5 (p.P80R) point mutation is strongly associated with a distinct gene expression profile. PAX5 encodes the B cell lineage-specific activator protein that is normally expressed at the early stage of B cell differentiation (17). It has previously been reported that PAX5 haploinsufficiency is central to ALL pathogenesis (17). In the present study, 64 cases harbored PAX5 sequence mutations, including p.P80R (n = 22), p.V26G (n = 10), p.L58F/L58P (n = 4), and others. PAX5 (p.P80R), located at the DNA-binding domain, was correlated with increased expression of PAX5 (P < 0.001) compared with other BCP ALLs without PAX5 mutations (Fig. 2 A–C and SI Appendix, Fig. S11A). Previous studies have described heterozygous deletions of CDKN2A/B, IKZF1, and PAX5 in PAX5 (p.P80R)-positive BCP ALL patients (18). Notably, 21 of the 22 PAX5 (p.P80R) cases clustered in subgroup G10, with no other known driver gene abnormalities detected, except for one case with a P2RY8–CRLF2 fusion (C184) (Fig. 1). PAX5 (p.P80R)-positive cases showed up-regulation of PI3K/Akt/mTOR signaling and down-regulation of cell-adhesion molecules (Fig. 2C). As in G9, TNFRSF13B gene up-regulation was seen in this subgroup (SI Appendix, Figs. S10C and S12A).

G11: IKZF1 (p.N159Y) point mutation associated with a distinct gene expression profile and increased SALL1 expression. Inherited or somatic sequence mutations of IKZF1 have previously been described in BCP ALL (19–21). In the present series, 26 cases with IKZF1 sequence abnormalities were found, with mutations commonly located in its DNA-binding domain (Fig. 2D and SI Appendix, Fig. S11B). Notably, IKZF1 (p.N159Y) cases (n = 6) formed a gene expression subgroup (G11) without other detectable genomic rearrangements (Fig. 1). Pathway analysis showed down-regulation of B cell receptor signaling and JAK-STAT signaling, including FLT3 (P < 0.001) and STAT5A (P < 0.001) (Fig. 2E). We also found that spalt-like transcription factor 1 (SALL1) was overexpressed (P < 0.001) in G11 (SI Appendix, Fig. S12B). Previous studies have reported that SALL1 can recruit histone deacetylase (HDAC) to mediate transcriptional repression and that its promoter is often methylated in BCP ALL (22, 23).

G12: hotspot point mutations in ZEB2 (p.H1038R) and IGH–CEBPE fusion. ZEB2 is a member of the Zfh1 family of two-handed zinc-finger/homeodomain proteins. We and others have previously reported mutations of ZEB2 in BCP ALL (9, 10, 24). Here, we showed that ZEB2 was recurrently mutated (n = 25), with the p.H1038R hotspot mutation (n = 15) being located within the DNA-binding domain (Fig. 2 F and G and SI Appendix, Fig. S11C). Based on unsupervised clustering of gene expression, cases with ZEB2 (p.H1038R) (n = 5) clustered closely with cases with IGH–CEBPE fusions (n = 3). The remaining 10 cases with ZEB2 (p.H1038R) mutations mostly coexisted with other known gene fusions, such as TCF3–PBX1 (n = 1), DUX4 fusions (n = 1), ZNF384 fusions (n = 5), and ZNF362 fusions (n = 1). A significant enrichment of NRAS mutations (5/8) was also found in the G12 cases. All four cases with IGH–CEBPE fusion exhibited a truncation of the 3′ UTR region of CEBPE (Fig. 2H). The known ALL driver gene LMO1 was up-regulated in G12 (Fig. 2I and SI Appendix, Fig. S12C).

G13: TCF3/4–HLF fusion. TCF3–HLF is a rare (<1%) fusion associated with high-risk BCP ALL and PAX5 haploinsufficiency from allelic deletion. It has been shown that TCF3–HLF–positive cases may respond to the BCL2 inhibitor venetoclax (25). It has also been shown that the homologous TCF4 may compensate for TCF3 in a conditional knockout mouse model (26). Herein, we identified one case with a TCF4–HLF fusion, which clustered with six cases of TCF3–HLF in G13 (Fig. 1 and SI Appendix, Fig. S13). Both TCF3–HLF and TCF4–HLF retained part of the HLF bZIP_2 domain (Fig. 3A) and displayed significantly up-regulated expression of HLF (Fig. 3 A–C). Down-regulation of the JAK-STAT and up-regulation of the NOTCH signaling pathways were also noted (Fig. 3D). Four cases with low expression of HLF, but lacking TCF3/4–HLF fusions, were assigned to this cluster (Fig. 1), based on evidence of expression signatures similar to TCF3/4–HLF fusion (e.g., BCL2, PAX5, JAK2, and STAT5), suggesting that alternative genetic alterations may elicit the same transcriptional program.

G14: NUTM1 fusions with aberrantly high expression of NUTM1. NUTM1 is a chromatin regulator that functions to recruit p300, leading to increased local histone acetylation (27). NUTM1 is normally only expressed in testis, but is frequently involved in NUT midline carcinoma (27). We found nine cases with distinct NUTM1 fusions (SI Appendix, Fig. S13), six of them clustering into the G14 subgroup (Fig. 1). The predicted protein structure showed that all NUTM1 fusions retained part of the NUT domain (Fig. 3 E and F). Furthermore, increased expression of NUTM1 resulting from the fusion was found (Fig. 3G), possibly leading to a global change in chromatin acetylation. We also noted up-regulation of ZYG11A, a cell-cycle regulator, and HOXA family genes (Fig. 3H), which were slightly down-regulated in the three NUTM1 fusion-positive cases which did not cluster in G14, especially ZYG11A and HOXA9 (Fig. 3I). In addition, gene set enrichment analysis showed a higher expression level of the NOTCH pathway and a down-regulation of genes in the Hedgehog pathway among the G14 subgroup (Fig. 3J).

Prognostic Impact of Gene Expression Subgroups in BCP ALL. We were able to retrieve clinical follow-up data on 380 BCP ALL cases (31%), allowing us to investigate the prognostic impact of the different gene expression subgroups. As these patients were treated on different protocols, we used BCR–ABL1–positive cases (n = 35) as a reference group for “high-risk” and ETV6–RUNX1–positive cases (n = 96) as a reference group for “low-risk” BCP ALL. We then compared the outcomes in terms of 5-y overall survival and relapse-free survival rates of the other subtypes against these two reference groups and classified them into low-, intermediate-, or high-risk groups. Due to the small sample sizes with available clinical data in subgroups G10 to G14, only cases in G1 to G9 were analyzed for treatment outcome. In pediatric BCP ALL, no deaths occurred in G2 (TCF3–PBX1), ETV6–RUNX1–like (a part of G3), G5 (ZNF384 fusions), and high hyperdiploidy (G7; 51 to 65/67 chromosomes) (SI Appendix, Fig. S14). In addition to these subtypes, G4 (DUX4 fusions) was also considered as low-risk, as no significant difference in overall survival was found in comparison with G3 (n = 46, P = 0.476; SI Appendix, Fig. S14). PAX5 and CRLF2 fusions (n = 33) and other cases in G7 (≤50 chromosomes, n = 14), however, were classified into the intermediate-risk group due to an inferior 5-y overall survival compared with that of G3 (ETV6–RUNX1–positive/ETV6–RUNX1–like) (P < 0.05; SI Appendix, Fig. S14). In contrast, G1 (MEF2D fusions) and G8 (KMT2A fusions) were associated with high risk. Taken as a whole, among 295 pediatric patients, the RNA-seq–based subgroups stratified 193 (65%) as low-risk, 47 (16%) as intermediate-risk, and 55 (19%) as high-risk groups. Based on the Cox proportional-hazards model, the hazard ratio for overall survival between low and intermediate risk was 10.7 [95% confidence interval (CI) 3.3 to 34.1, P < 0.001] and between low and high risk was 14.52 (4.8 to 44.1, P < 0.001) (Fig. 4). For 5-y relapse-free survival, the hazard ratio between low and intermediate risk was 2.1 (95% CI 1.0 to 4.5, P = 0.04) and between low and high risk was 3.6 (1.9 to 6.8, P < 0.001). In adult BCP ALL, in the absence of G3 cases, the BCR–ABL1–positive subgroup (G6) was used as the only reference, denoting high-risk BCP ALL. In this regard, G1 (MEF2D fusions), G2 (TCF3–PBX1), and G8 (KMT2A fusions) were associated with poor prognosis, while G4 (DUX4 fusions), G5 (ZNF384 fusions), and G7 (high hyperdiploidy) were associated with an intermediate prognosis (SI Appendix, Fig. S15). Overall, in adult BCP ALL, 47 (55%) of the


Fig. 3. Schematic representation of identified TCF3/4–HLF and NUTM1 fusions in BCP ALL. (A) Protein structure of TCF3, TCF4, HLF, and their fusion proteins. The dotted red lines represent the joining points in the fusion proteins. (B) Violin plot of gene expression levels of HLF and NOTCH2 in TCF3/4–HLF fusion-positive and -negative patients. (C) Volcano plot of differentially expressed genes between TCF3/4–HLF fusion-positive and -negative patients. (D) GSEA plots of the JAK-STAT and NOTCH pathways in TCF3/4–HLF fusion-positive and -negative patients. (E) Protein structure of wild-type NUTM1 and distinct fusion partners. (F) Protein structure of each NUTM1 fusion protein. Red lines represent the joining points of the fusion proteins. (G) Violin plot of gene expression levels of NUTM1 in NUTM1 fusion-positive and -negative cases. (H) Volcano plot of differentially expressed genes between NUTM1 fusion-positive and -negative cases. (I) Violin plot of gene expression levels of ZYG11A and HOXA9 in NUTM1 fusion-positive and -negative patients, excluding KMT2A fusions. (J) GSEA plot of the NOTCH signaling and Hedgehog signaling pathways in NUTM1 fusion-positive (G14) and -negative patients excluding KMT2A fusions (G8) cases. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.

[Fig. 4 graphic: Kaplan–Meier curves of overall survival (%) and relapse-free survival (%) over 0 to 60 months, with numbers at risk (number censored), for pediatric patients (A and B; low-, intermediate-, and high-risk) and adult patients (C and D; intermediate- and high-risk). Pediatric overall survival: intermediate- vs. low-risk, HR 10.7 (95% CI 3.3 to 34.1), P < 0.001; high- vs. low-risk, HR 14.52 (95% CI 4.8 to 44.1), P < 0.001. Pediatric relapse-free survival: intermediate- vs. low-risk, HR 2.1 (95% CI 1.0 to 4.5), P = 0.04; high- vs. low-risk, HR 3.6 (95% CI 1.9 to 6.8), P < 0.001. Adult overall survival: high- vs. intermediate-risk, HR 3.2 (95% CI 1.5 to 6.9), P = 0.003. Adult relapse-free survival: high- vs. intermediate-risk, HR 3.0 (95% CI 1.5 to 6.3), P = 0.003.]

Fig. 4. Overall survival rates of pediatric and adult BCP ALL patients. Five-year overall survival (OS) curves (A) and 5-y relapse-free survival (RFS) curves (B) of pediatric patients with low, intermediate, and high risk. Five-year OS curves (C) and 5-y RFS curves (D) of adult patients with intermediate and high risk. The ranges of hazard ratios (HRs) between low and intermediate risk, and low and high risk, are presented below the survival curves. Survival curves were estimated with the Kaplan–Meier method and compared by two-sided log-rank test. Note: In pediatric cases, TCF3–PBX1 (G2), ETV6–RUNX1–like (G3), DUX4 fusions (G4), ZNF384 fusions (G5), and high hyperdiploidy (G7; 51 to 65/67 chromosomes) subgroups displayed a low risk, other cases in G7 (≤50 chromosomes) and PAX5 and CRLF2 fusions (G9) showed an intermediate risk, whereas MEF2D fusions (G1), BCR–ABL1 (G6), and KMT2A fusions (G8) defined high-risk subgroups. In adult cases, MEF2D fusions (G1), TCF3–PBX1 (G2), BCR–ABL1 (G6), and KMT2A fusions (G8) were associated with high risk, while DUX4 fusions (G4), ZNF384 fusions (G5), and hyperdiploidy (G7) had relatively favorable outcomes. (A–D) Numbers listed on the x-axis are in months.
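As an illustration of the Kaplan–Meier method named in the legend, the following is a minimal Python sketch of the product-limit estimator. It is not the authors' code (the analyses in the paper were run in R); the function name `kaplan_meier` and the toy follow-up data are hypothetical and invented solely for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  -- follow-up time per patient, in months
    events -- 1 if the event (death or relapse) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events plus censorings at each time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Toy data, invented for illustration only (not taken from the study).
toy_times = [6, 12, 12, 20, 30, 45, 60, 60]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]

km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"month {t:>2}: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the plots report the number censored alongside the number at risk.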

85 patients were classified as intermediate-risk and 38 (45%) as high-risk.

Discussion

In this comprehensive analysis of the transcriptomic landscape of 1,223 BCP ALL cases, we identified 14 subgroups of BCP ALL based on their gene expression profiles. Of these, eight were previously well-known subgroups, harboring specific genetic abnormalities (MEF2D fusions, TCF3–PBX1, ETV6–RUNX1–positive/ETV6–RUNX1–like, DUX4 fusions, ZNF384 fusions, BCR–ABL1 and Ph-like, high hyperdiploidy, and KMT2A fusions). Notably, the large sample size allowed us to identify six additional subgroups (G9 to G14), harboring distinct genetic alterations including gene fusions and/or sequence mutations. The number of cases for some of the candidate leukemogenic abnormalities identified, such as ZNF362 fusions, NUTM1 fusions, and PAX5/CRLF2 fusions, and hotspot mutations of PAX5 (p.P80R), IKZF1 (p.N159Y), and ZEB2 (p.H1038R), was relatively small, which may explain the lack of detection of such cases in previous studies.

We addressed survival among the various BCP ALL subtypes in relation to pediatric and adult patients. Although the outcome data originated from different study groups, we validated the prognostic impact of all previously known major subgroups of BCP ALL and were able to ascertain the prognostic impact of some of the newly defined subgroups. Among the pediatric cohort in this study, TCF3–PBX1 (G2), ETV6–RUNX1–positive/ETV6–RUNX1–like (G3), DUX4 fusions (G4), ZNF384 fusions (G5), and high hyperdiploidy (G7; 51 to 65/67 chromosomes) were defined as low-risk, PAX5 and CRLF2 fusions (G9) and other cases in G7 (≤50 chromosomes) were intermediate-risk, while MEF2D fusions (G1), BCR–ABL1/Ph-like (G6), and KMT2A fusions (G8) were defined as high-risk groups. In adults, MEF2D fusions (G1), TCF3–PBX1 (G2), BCR–ABL1/Ph-like (G6), and KMT2A fusions (G8) were high-risk, as previously described, while DUX4 fusions (G4), ZNF384 fusions (G5), and high hyperdiploidy (G7) showed relatively favorable outcomes, albeit inferior to those of their pediatric counterparts. Even though this is a large study, the numbers of patients with novel subgroups (G10 to G14) are small and the treatments are heterogeneous, and thus more cases need to be analyzed in independent studies in the future to validate their prognostic impact.

Notably, fusion genes were generally mutually exclusive, suggesting their role as drivers in the leukemogenic process. In contrast, while some hotspot gene mutations, such as PAX5 (p.P80R) and IKZF1 (p.N159Y), were independent abnormalities suggestive of their function as leukemia drivers, co-occurrence of many of the other point mutations indicated their potential cooperative role in leukemogenesis. A schematic summary of the major gene expressional/structural aberrations identified in our analysis is provided in Fig. 5. These alterations are functionally located within distinct and wide-ranging cellular compartments, from cell-surface receptors to cytosolic signaling pathways, to transcription factors/cofactors for transcriptional regulation essential for B-precursor development, and molecules involved in epigenetic regulation.

A large body of evidence suggests that many BCP ALL subgroups have a unifying gene expression signature driven by similar but not identical gene fusion/mutation events. Hence these genetic abnormalities molecularly “phenocopy” each other and point to a convergent signaling pathway related to pathogenesis with the same transcriptomic subgroup. For example, the similarities in gene expression profiles and genetic aberrations between the Ph-like and the BCR–ABL1 subtypes indicate that these are phenocopies of each other; a similar relationship appears to exist between the ETV6–RUNX1 and ETV6–RUNX1–like subtypes. This large dataset has allowed us to systematically identify such previously undescribed molecular phenocopies. ZNF384 fusions were the predominant fusions in the G5 subgroup, while ZNF362 fusions displayed the same expression signature and thus plausibly the same pathogenetic process. Also, the rare TCF4–HLF fusion evidently phenocopies TCF3–HLF rearrangements. Intriguingly, the hotspot point mutation ZEB2 (p.H1038R) appeared to phenocopy the IGH–CEBPE fusion, although the molecular relationship between these genetic aberrations is less obvious than in the previous examples. All of these observations point to a common theme in BCP ALL: There are likely a limited number of pathways leading to leukemogenesis in BCP ALL, and each is identified by a distinct gene expression pattern. However, there are presumably several factors, such as complex genetic backgrounds, coexisting genetic abnormalities, alternative partner genes of fusions, and different cells of origin, that all contribute to determine the dominating pathway in a single case, which can partially explain why cases sometimes present outside of the expected cluster.

In conclusion, we additionally defined six gene expression subgroups. These six subgroups included cases characterized by PAX5 and CRLF2 fusions; point mutations in PAX5 (p.P80R); point mutations in IKZF1 (p.N159Y); IGH–CEBPE fusion or mutations in ZEB2 (p.H1038R); TCF3/4–HLF fusion; and NUTM1 fusions. We have also demonstrated that transcriptome profiling by RNA sequencing allows the identification of distinct gene expression subgroups in BCP ALL with characteristic gene fusions and/or sequence mutations that can be readily called using the integrative

[Fig. 5 graphic: schematic of a BCP ALL cell with labeled cell-surface receptors, signaling molecules (JAK-STAT, RAS, and B-cell receptor signaling), cell-cycle regulators, transcription factors and cofactors, epigenetic factors (writers, readers, erasers, bind writers, bind erasers, and remodel chromatin), the three hotspot mutations IKZF1 (p.N159Y), ZEB2 (p.H1038R), and PAX5 (p.P80R), and the gene fusions identified.]

Fig. 5. Schematic figure of gene expression alterations and structural aberrations identified in this study. Representation of the various molecular abnormalities that lead to leukemogenesis in BCP ALL. Known and novel gene fusions and their subcellular localizations are schematically represented. Three hotspot mutations, ZEB2 (p.H1038R), IKZF1 (p.N159Y), and PAX5 (p.P80R), that define distinct BCP ALL subgroups are located in the DNA-binding domains of each protein. Identified mutations in epigenetic regulators, such as KMT2D and WHSC1, are colored in green and shown as a pentagon in the nucleus. Additionally, transcription factor mutations such as IKZF1 and PAX5 are depicted at the left in the nucleus near the DNA chain, and mutations in cell-cycle regulators are depicted at the top left of the nucleus. Mutations found in signaling pathways such as JAK-STAT, RAS, and B cell receptor are depicted below the cell-surface membrane. Note: The epigenetic regulatory genes that covalently modify histones are classified as writers, erasers, readers, and remodel. Writers: proteins that can add epigenetic modifications; erasers: proteins that erase epigenetic modifications; readers: proteins that can recognize epigenetic modifications; bind writers: proteins that can bind the writers; bind erasers: proteins that can bind the erasers. Remodel chromatin: proteins that are functionally relevant to chromatin remodeling. MLL fusions are also known as KMT2A fusions.

Table 1. Clinical characteristics and major genetic features of the BCP ALL cohorts included in the analysis

| Characteristics | Cohort 1 (SIH, n = 166) | Cohort 2 (LUH, n = 182) | Cohort 3 (JALSG, n = 71) | Cohort 4 (MaSpore, n = 194) | Cohort 5 (TARGET/COG, n = 394) | Cohort 6 (TARGET/COG, n = 216) | Total (n = 1,223) |
|---|---|---|---|---|---|---|---|
| Age at diagnosis | | | | | | | |
| Mean, y | 19.41 | 5.04 | 15.42 | 5.84 | 17.96 | 7.87 | 12.60 |
| Median, y | 15.01 | 4.00 | 17.00 | 4.56 | 13.00 | 6.44 | 7.95 |
| <18 y | 91 (55) | 182 (100) | 39 (55) | 194 (100) | 248 (63) | 152 (70) | 906 (74) |
| ≥18 y | 75 (45) | 0 | 32 (45) | 0 | 145 (37) | 6 (3) | 258 (21) |
| Not available | 0 | 0 | 0 | 0 | 1 | 58 (27) | 59 (5) |
| Gender | | | | | | | |
| Male | 95 (57) | 107 (59) | 30 (42) | 111 (57) | 209 (53) | 105 (49) | 657 (54) |
| Female | 71 (43) | 75 (41) | 41 (58) | 83 (43) | 185 (47) | 111 (51) | 566 (46) |
| Fusions | | | | | | | |
| BCR–ABL1 | 27 (16) | 5 (3) | NA | 9 (5) | 12 (3) | 6 (3) | 59 (5) |
| ETV6–RUNX1 | 19 (11) | 45 (25) | 2 (3) | 36 (19) | 18 (5) | 14 (6) | 134 (11) |
| TCF3–PBX1 | 17 (10) | 13 (7) | 6 (8) | 13 (7) | 11 (3) | 16 (7) | 76 (6) |
| KMT2A | 8 (5) | 14 (8) | 2 (3) | 7 (4) | 9 (2) | 6 (3) | 46 (4) |
| DUX4 | 9 (5) | 8 (4) | 10 (14) | 23 (12) | NA | 2 (1) | 52 (4) |
| MEF2D | 7 (4) | 1 (1) | 7 (10) | 2 (1) | 18 (5) | 5 (2) | 40 (3) |
| ZNF384 | 15 (9) | 2 (1) | 10 (14) | 11 (6) | 11 (3) | 17 (8) | 66 (5) |

Data are years or no. of patients (%). Percentages might not add up to 100% because of rounding. Note: Cohort 3 (JALSG) does not include BCR–ABL patients. NA, not available.

analysis described in this study. Apart from providing information on perturbed transcriptional programs/signaling pathways that may be amenable to therapeutic targeting, the identified gene expression subgroups are likely important for improved disease stratification and prognostication of BCP ALL. Hence, our combined results of this collaborative study strongly advocate for RNA-seq being applied in the clinical diagnostic workup of BCP ALL.

Materials and Methods

Patients. Transcriptome (RNA-seq) and other genomic data of all patients analyzed in this study are listed in Dataset S1. All of the included datasets have been analyzed as part of previous publications (3, 6–8, 10, 11, 18, 19). Basic clinical characteristics and genetic types of collected BCP ALL cohorts are shown in Table 1 and Dataset S2. The Lund University Hospital (LUH) cohort (cohort 2) (6) and the Singapore and Malaysia MaSpore cohort (MaSpore; cohort 4) (7) included only childhood BCP ALL cases. The vast majority of cohort-2 patients were treated according to the Nordic Society of Paediatric Haematology and Oncology (NOPHO) ALL 1992, 2000, or 2008 protocols (6), and cohort-4 patients were enrolled on the MaSpore frontline ALL protocols (7). The Japan Adult Leukemia Study Group (JALSG) cohort (cohort 3) (8) comprised adolescents and young adults with Philadelphia-negative ALL who were treated with the JALSG ALL202-U (adults) and TCCSG L04-16 (pediatric) protocols (8, 28). BCP ALL patients in the Chinese cohort (cohort 1) enrolled in this study were diagnosed and/or treated in the Multicenter Hematology-Oncology Protocols Evaluation System (M-HOPES) by the Shanghai Institute of Hematology (SIH)-based hospital network. Adult patients were enrolled in an SIH trial (Chinese Clinical Trial Registry; no. ChiCTR-ONRC-14004968), which was basically a modification of the vincristine, daunorubicin, L-asparaginase, cyclophosphamide, and prednisone regimen. Pediatric patients in the Chinese cohort were enrolled in the Shanghai Children’s Medical Center ALL-2005 protocol (Chinese Clinical Trial Registry; no. ONC-14005003) (10). There were two TARGET/COG (Therapeutically Applicable Research to Generate Effective Treatments/Children’s Oncology Group) cohorts (cohort 5 and

Table 2. Proposed BCP ALL subgroups based on gene expression and gene fusion/sequence mutation patterns

| RNA-seq data-based subgroups | Frequency in the study cohort (n = 1,223), no. of patients (%) | Most frequently mutated genes (%) |
|---|---|---|
| MEF2D fusions (G1) | 39 (3) | MEF2D–BCL9 (67), MEF2D–HNRNPUL1 (21), NRAS (13), KMT2A (10) |
| TCF3–PBX1 (G2) | 76 (6) | TCF3–PBX1 (100), TP53 (8) |
| ETV6–RUNX1/–like (G3) | 161 (13) | ETV6–RUNX1 (82), WHSC1 (9), KRAS (7), NRAS (6) |
| DUX4 fusions (G4) | 63 (5) | DUX4–IGH (78), NRAS (30), MYC (11), TP53 (11), PTPN11 (11), KMT2D (11), CTCF (8), FLT3 (8), PAX5 (8) |
| ZNF384 fusions (G5) | 74 (6) | EP300–ZNF384 (53), TCF3–ZNF384 (12), TAF15–ZNF384 (11), SMARCA2–ZNF362 (4), NRAS (14), KRAS (12), FLT3 (14), PTPN11 (14), SETD1B (9), ZEB2 (8), EZH2 (8), KMT2D (7) |
| BCR–ABL1/Ph–like (G6) | 167 (14) | BCR–ABL1 (31), IGH–CRLF2 (10), JAK2 fusions (10), ABL1 fusions (7), IGH–EPOR (7), P2RY8–CRLF2 (5), KRAS (6), JAK2 (7), RUNX1 (5) |
| Hyperdiploidy (G7) | 408 (33) | NRAS (19), KRAS (18), FLT3 (13), PTPN11 (8), KMT2D (7), CREBBP (6) |
| KMT2A fusions (G8) | 56 (5) | KMT2A–AFF1 (29), KMT2A–MLLT1 (25), KMT2A–MLLT3 (13), KRAS (13), NRAS (14), FLT3 (7) |
| PAX5 and CRLF2 fusions (G9) | 111 (9) | P2RY8–CRLF2 (12), PAX5–NOL4L (8), PAX5–AUTS2 (6), NRAS (23), KRAS (23), PAX5 (12), FLT3 (11), JAK1 (8) |
| PAX5 (p.P80R) mutation (G10) | 23 (2) | PAX5 (96), PTPN11 (26), NRAS (22), KRAS (17), FLT3 (13), IL7R (9), SETD2 (9) |
| IKZF1 (p.N159Y) mutation (G11) | 6 (<1) | IKZF1 (100), KRAS (17), KMT2D (17) |
| ZEB2 (p.H1038R)/IGH–CEBPE (G12) | 8 (<1) | ZEB2 (75), NRAS (62), KMT2D (25), KRAS (12), KMT2A (12), CDKN2A (12) |
| TCF3/4–HLF (G13) | 11 (<1) | TCF3/4–HLF (64), KRAS (18), NRAS (9), ZEB2 (9), ASXL2 (9) |
| NUTM1 fusions (G14) | 20 (2) | NUTM1 fusions (30), TP53 (15), KRAS (10), CREBBP (15), KMT2D (10), SETD1B (10) |

cohort 6), with the data accession nos. EGAS00001001952 and phs000463/phs000464, respectively (3, 11, 18, 29). Informed consent was obtained from all patients, and the study was approved by the ethics committee of Rui-jin Hospital. The clinical outcome data of the TARGET/COG cohorts were not available. The comparability of the clinical data from the different cohorts was supported by very similar survival curves for the favorable genetic subtype (ETV6–RUNX1) and the unfavorable genetic subtype (BCR–ABL1/Ph-like cases) of ALL among these cohorts (SI Appendix, Fig. S1).

RNA-Seq Data Analyses. Read pairs were aligned to reference genomes hg38 (fusion gene analysis) and hg19 (gene expression and gene mutation calling). Principal component analysis was applied on the RNA-seq data of the 1,223 BCP ALL cases, and batch effects were adjusted by the SVA package (30) (SI Appendix, Fig. S3). To investigate the bias of different cohorts, age, gender, and race on gene expression, we checked the distribution of well-known biomarkers in the gene expression clusters. No obvious bias based on cohort, age, gender, and race was found. The patients mainly clustered based on the different gene expression profiles related to underlying genetic abnormalities. Procedures of read pair alignment, mutation calling from RNA-seq data, and gene expression/pathway analysis are listed in SI Appendix, Materials and Methods.

Statistical Analyses. We tested mutual exclusivity and co-occurrence of mutations for the 44 most frequently mutated genes (>1%). For gene pairs, we completed the two-sided Fisher’s exact test according to their mutation status (positive or negative). The R package QVALUE (v2.10.1) (31) was used to control for multiple testing. Comparisons of categorical variables were ascertained by Pearson’s χ² test or Fisher’s exact test. Overall survival was calculated from time of diagnosis to death, while relapse-free survival was calculated from time of complete remission to relapse. The Kaplan–Meier method, log-rank test, and Cox proportional-hazards model were used to calculate estimates of survival probabilities and hazard ratios. Two-sided P values are reported, and the significance level was set to less than 0.05. Analyses were performed with the use of R (v3.4.4).

ACKNOWLEDGMENTS. We thank TARGET/COG and St. Jude Children’s Research Hospital for providing the RNA-seq data in this analysis. The RNA-seq dataset and clinical information for the TARGET/COG ALL project used in this study are available in the database of Genotypes and Phenotypes (dbGaP) under accession phs000218.v20.p7 and European Genome-phenome Archive, accessions EGAS00001000654 and EGAS00001001952. This work was supported by Mega-Projects of Scientific Research for the 12th Five-Year Plan (2013ZX09303302); National Natural Science Foundation of China (Grants 81570122 and 81570122); Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (Grant 20161303); Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (Grant QD2015005); Fine Classification and Standardized Treatment of Children with Acute Leukemia of Multi Center Clinical Research (Grant 14411950600), Shanghai Municipal Science and Technology Commission; National Key Research and Development Program (Grant 2016YFC0902800); Practical Research for Innovative Cancer Control from the Japan Agency for Medical Research and Development; US National Institutes of Health Grants CA21765, CA36401, and U01 GM115279; American Lebanese Syrian Associated Charities (ALSAC); Swedish Cancer Society, Swedish Childhood Cancer Foundation, Swedish Research Council, Knut and Alice Wallenberg Foundation, and Governmental (ALF) Funding of Clinical Research within the Swedish National Health Service; NMRC/CSA/0053/2013; VIVA Foundation for Children with Cancer; Goh Foundation; Children’s Cancer Foundation, Singapore Totalisator Board; Samuel Waxman Cancer Research Foundation; and Center for HPC at Shanghai Jiao Tong University.
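The pairwise test described under Statistical Analyses (two-sided Fisher’s exact test on the mutation status of two genes) can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' R code; the function name `fisher_exact_two_sided` and the toy counts are hypothetical, and the multiple-testing step (QVALUE) is not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    a -- cases with both genes mutated
    b -- cases with only gene 1 mutated
    c -- cases with only gene 2 mutated
    d -- cases with neither gene mutated
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x double-mutant cases, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Toy counts, invented for illustration only (not taken from the study):
# 100 cases, gene 1 mutated in 10, gene 2 mutated in 12, never together.
p_toy = fisher_exact_two_sided(0, 10, 12, 78)
expected_both = (0 + 10) * (0 + 12) / 100  # co-mutated cases expected by chance
print(f"P = {p_toy:.3f}; expected co-mutated cases under independence = {expected_both:.1f}")
```

An observed double-mutant count below the expectation under independence points toward mutual exclusivity and a count above it toward co-occurrence; the P value alone does not give the direction.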

1. Pui C-H, Yang JJ, Bhakta N, Rodriguez-Galindo C (2018) Global efforts toward the cure of childhood acute lymphoblastic leukaemia. Lancet Child Adolesc Health 2:440–454.
2. Holmfeldt L, et al. (2013) The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet 45:242–252.
3. Roberts KG, et al. (2012) Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia. Cancer Cell 22:153–166.
4. Den Boer ML, et al. (2009) A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: A genome-wide classification study. Lancet Oncol 10:125–134.
5. Andersson A, et al. (2007) Microarray-based classification of a consecutive series of 121 childhood acute leukemias: Prediction of leukemic and genetic subtype as well as of minimal residual disease status. Leukemia 21:1198–1203.
6. Lilljebjörn H, et al. (2016) Identification of ETV6-RUNX1-like and DUX4-rearranged subtypes in paediatric B-cell precursor acute lymphoblastic leukaemia. Nat Commun 7:11790.
7. Qian M, et al. (2017) Whole-transcriptome sequencing identifies a distinct subtype of acute lymphoblastic leukemia with predominant genomic abnormalities of EP300 and CREBBP. Genome Res 27:185–195.
8. Yasuda T, et al. (2016) Recurrent DUX4 fusions in B cell acute lymphoblastic leukemia of adolescents and young adults. Nat Genet 48:569–574.
9. Zhang J, et al.; St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (2016) Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat Genet 48:1481–1489.
10. Liu YF, et al. (2016) Genomic profiling of adult and pediatric B-cell acute lymphoblastic leukemia. EBioMedicine 8:173–183.
11. Gu Z, et al. (2016) Genomic analyses identify recurrent MEF2D fusions in acute lymphoblastic leukaemia. Nat Commun 7:13331.
12. Schwartzman O, et al. (2017) Suppressors and activators of JAK-STAT signaling at diagnosis and relapse of acute lymphoblastic leukemia in Down syndrome. Proc Natl Acad Sci USA 114:E4030–E4039.
13. Chen B, et al. (2018) Identification of fusion genes and characterization of transcriptome features in T-cell acute lymphoblastic leukemia. Proc Natl Acad Sci USA 115:373–378.
14. Seetharam A, Bai Y, Stuart GW (2010) A survey of well conserved families of C2H2 zinc-finger genes in Daphnia. BMC Genomics 11:276.
15. Mullighan CG, et al. (2009) Rearrangement of CRLF2 in B-progenitor- and Down syndrome-associated acute lymphoblastic leukemia. Nat Genet 41:1243–1246.
16. Salzer U, et al. (2009) Relevance of biallelic versus monoallelic TNFRSF13B mutations in distinguishing disease-causing from risk-increasing TNFRSF13B variants in deficiency syndromes. Blood 113:1967–1976.
17. Dang J, et al. (2015) PAX5 is a tumor suppressor in mouse mutagenesis models of acute lymphoblastic leukemia. Blood 125:3609–3617.
18. Roberts KG, et al. (2014) Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N Engl J Med 371:1005–1015.
19. Churchman ML, et al. (2015) Efficacy of retinoids in IKZF1-mutated BCR-ABL1 acute lymphoblastic leukemia. Cancer Cell 28:343–356.
20. Churchman ML, et al. (2018) Germline genetic IKZF1 variation and predisposition to childhood acute lymphoblastic leukemia. Cancer Cell 33:937–948.e8.
21. Olsson L, et al. (2015) Cooperative genetic changes in pediatric B-cell precursor acute lymphoblastic leukemia with deletions or mutations of IKZF1. Genes Chromosomes Cancer 54:315–325.
22. Ma C, et al. (2018) SALL1 functions as a tumor suppressor in breast cancer by regulating cancer cell senescence and metastasis through the NuRD complex. Mol Cancer 17:78.
23. Kuang SQ, et al. (2008) Genome-wide identification of aberrantly methylated promoter associated CpG islands in acute lymphocytic leukemia. Leukemia 22:1529–1538.
24. Ma X, et al. (2018) Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555:371–376.
25. Fischer U, et al. (2015) Genomics and drug profiling of fatal TCF3-HLF-positive acute lymphoblastic leukemia identifies recurrent mutation patterns and therapeutic options. Nat Genet 47:1020–1029.
26. Nguyen H, et al. (2009) Tcf3 and Tcf4 are essential for long-term homeostasis of skin epithelia. Nat Genet 41:1068–1075.
27. Alekseyenko AA, et al. (2015) The oncogenic BRD4-NUT chromatin regulator drives aberrant transcription within large topological domains. Genes Dev 29:1507–1523.
28. Takahashi H, et al. (2018) Treatment outcome of children with acute lymphoblastic leukemia: The Tokyo Children’s Cancer Study Group (TCCSG) study L04-16. Int J Hematol 108:98–108.
29. Pui CH, et al. (2015) Childhood acute lymphoblastic leukemia: Progress through collaboration. J Clin Oncol 33:2938–2948.
30. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882–883.
31. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440–9445.

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1814397115 Li et al. Downloaded by guest on September 24, 2021