Transcriptional landscape of B cell precursor acute lymphoblastic leukemia based on an international study of 1,223 cases
Jian-Feng Li (a,1), Yu-Ting Dai (a,1), Henrik Lilljebjörn (b,1), Shu-Hong Shen (c), Bo-Wen Cui (a), Ling Bai (a), Yuan-Fang Liu (a), Mao-Xiang Qian (d), Yasuo Kubota (e), Hitoshi Kiyoi (f), Itaru Matsumura (g), Yasushi Miyazaki (h), Linda Olsson (b), Ah Moy Tan (i), Hany Ariffin (j), Jing Chen (c), Junko Takita (k), Takahiko Yasuda (l), Hiroyuki Mano (m), Bertil Johansson (b,n), Jun J. Yang (d,o), Allen Eng-Juh Yeoh (p), Fumihiko Hayakawa (q), Zhu Chen (a,r,s,2), Ching-Hon Pui (o,2), Thoas Fioretos (b,n,2), Sai-Juan Chen (a,r,s,2), and Jin-Yan Huang (a,s,2)
(a) State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, National Research Center for Translational Medicine, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200025 Shanghai, China
(b) Department of Laboratory Medicine, Division of Clinical Genetics, Lund University, 22184 Lund, Sweden
(c) Key Laboratory of Pediatric Hematology and Oncology, Ministry of Health, Department of Hematology and Oncology, Shanghai Children’s Medical Center, Shanghai Jiao Tong University School of Medicine, 200127 Shanghai, China
(d) Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105
(e) Department of Pediatrics, Graduate School of Medicine, The University of Tokyo, 1138654 Tokyo, Japan
(f) Department of Hematology and Oncology, Nagoya University Graduate School of Medicine, 4668550 Nagoya, Japan
(g) Division of Hematology and Rheumatology, Kinki University Faculty of Medicine, 5778502 Osaka, Japan
(h) Department of Hematology, Atomic Bomb Disease Institute, Nagasaki University, 8528521 Nagasaki, Japan
(i) Department of Paediatrics, KK Women’s & Children’s Hospital, 229899 Singapore
(j) Paediatric Haematology-Oncology Unit, University of Malaya Medical Centre, 59100 Kuala Lumpur, Malaysia
(k) Department of Pediatrics, Graduate School of Medicine, Kyoto University, 6068501 Kyoto, Japan
(l) Clinical Research Center, Nagoya Medical Center, National Hospital Organization, 4600001 Nagoya, Japan
(m) National Cancer Center Research Institute, 1040045 Tokyo, Japan
(n) Department of Clinical Genetics, University and Regional Laboratories, Region Skåne, Lund 22185, Sweden
(o) Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105
(p) Centre for Translational Research in Acute Leukaemia, Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, 119228 Singapore
(q) Department of Pathophysiological Laboratory Sciences, Nagoya University Graduate School of Medicine, 4618673 Nagoya, Japan
(r) Key Laboratory of Systems Biomedicine, Ministry of Education, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
(s) Pôle de Recherches Sino-Français en Science du Vivant et Génomique, Laboratory of Molecular Pathology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
GENETICS
Contributed by Zhu Chen, October 17, 2018 (sent for review August 29, 2018; reviewed by Christine J. Harrison and Patrick Tan)

Most B cell precursor acute lymphoblastic leukemia (BCP ALL) can be classified into known major genetic subtypes, while a substantial proportion of BCP ALL remains poorly characterized in relation to its underlying genomic abnormalities. We therefore initiated a large-scale international study to reanalyze and delineate the transcriptome landscape of 1,223 BCP ALL cases using RNA sequencing. Fourteen BCP ALL gene expression subgroups (G1 to G14) were identified. Apart from extending eight previously described subgroups (G1 to G8 associated with MEF2D fusions, TCF3–PBX1 fusions, ETV6–RUNX1–positive/ETV6–RUNX1–like, DUX4 fusions, ZNF384 fusions, BCR–ABL1/Ph–like, high hyperdiploidy, and KMT2A fusions), we defined six additional gene expression subgroups: G9 was associated with both PAX5 and CRLF2 fusions; G10 and G11 with mutations in PAX5 (p.P80R) and IKZF1 (p.N159Y), respectively; G12 with IGH–CEBPE fusion and mutations in ZEB2 (p.H1038R); and G13 and G14 with TCF3/4–HLF and NUTM1 fusions, respectively. In pediatric BCP ALL, subgroups G2 to G5 and G7 (51 to 65/67 chromosomes) were associated with low risk, G7 (with ≤50 chromosomes) and G9 were intermediate-risk, whereas G1, G6, and G8 were defined as high-risk subgroups. In adult BCP ALL, G1, G2, G6, and G8 were associated with high risk, while G4, G5, and G7 had relatively favorable outcomes. This large-scale transcriptome sequence analysis of BCP ALL revealed distinct molecular subgroups that reflect discrete pathways of BCP ALL, informing disease classification and prognostic stratification. The combined results strongly advocate that RNA sequencing be introduced into the clinical diagnostic workup of BCP ALL.

BCP ALL | RNA-seq | subtypes | gene fusion | gene mutation

Significance
In BCP ALL, molecular classification is used for risk stratification and influences treatment strategies. We reanalyzed the transcriptomic landscape of 1,223 BCP ALLs and identified 14 subgroups based on their transcriptional profiles. Eight of these (G1 to G8) are previously well-known subgroups, harboring specific genetic abnormalities. The sample size allowed the identification of six previously undescribed subgroups, consisting of cases harboring PAX5 or CRLF2 fusions (G9), PAX5 (p.P80R) mutations (G10), IKZF1 (p.N159Y) mutations (G11), either ZEB2 (p.H1038R) mutations or IGH–CEBPE fusions (G12), HLF rearrangements (G13), or NUTM1 rearrangements (G14). In addition, this study allowed us to determine the prognostic impact of several recently defined subgroups. This study suggests that RNA sequencing should be a valuable tool in the routine diagnostic workup for ALL.

Author contributions: Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. designed research; J.-F.L., Y.-T.D., H.L., S.-H.S., B.-W.C., L.B., Y.-F.L., M.-X.Q., Y.K., H.K., I.M., Y.M., L.O., A.M.T., H.A., J.C., J.T., T.Y., H.M., B.J., J.J.Y., A.E.-J.Y., F.H., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. performed research; S.-H.S., Y.-F.L., J.C., J.J.Y., and F.H. collected the samples and clinical data; J.-F.L., Y.-T.D., H.L., B.-W.C., L.B., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. analyzed data; Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. wrote the paper; and J.-F.L., Z.C., C.-H.P., T.F., S.-J.C., and J.-Y.H. critically revised the manuscript.
Reviewers: C.J.H., Newcastle University; and P.T., Duke–NUS Medical School.
The authors declare no conflict of interest.
Published under the PNAS license.
1 J.-F.L., Y.-T.D., and H.L. contributed equally to this work.
2 To whom correspondence may be addressed. Email: [email protected], ching-hon.pui@stjude.org, [email protected], [email protected], or [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1814397115/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1814397115 | PNAS Latest Articles

B cell precursor acute lymphoblastic leukemia (BCP ALL), the most common childhood cancer, is a highly heterogeneous malignant hematological disorder (1). Previous genome- and/or transcriptome-wide analyses of BCP ALLs have greatly improved our understanding of the pathogenesis and prognostic impact of many molecular abnormalities in BCP ALL (2, 3). Structural chromosomal alterations as well as sequence mutations are common in childhood and adult BCP ALL. In the last four decades, most of the recurring chromosomal abnormalities, including aneuploidy, chromosomal rearrangements/gene fusions (e.g., ETV6–RUNX1, BCR–ABL1, and TCF3–PBX1), and rearrangements of KMT2A (previously MLL), were identified by cytogenetics and fluorescence in situ hybridization. Subsequently, gene expression profiling revealed that these cytogenetic subgroups displayed specific gene expression patterns (3–5). With the advent of genome sequencing technology, several groups discovered a large number of novel gene mutations and fusions, such as those involving ZNF384, MEF2D, and DUX4 rearrangements (6–11), among those cases with no defining chromosomal abnormalities, termed “B-other-ALL.”
However, it remained unknown whether additional novel BCP ALL subtypes could be detected by integrated analysis of pooled
[Fig. 1 heatmap. Annotation tracks: age (adult, paediatric); gender (male, female); MEF2D fusions; TCF3-PBX1; ETV6-RUNX1; ETV6-RUNX1-like; DUX4 fusions; ZNF384/ZNF362 fusions; BCR-ABL1; Ph-like; CRLF2 fusions; hyperdiploidy; KMT2A fusions; PAX5 fusions; TCF3/4-HLF; NUTM1 fusions; IGH-CEBPE; PAX5 (p.P80R); ZEB2 (p.H1038R); IKZF1 (p.N159Y). Color key: low to high expression. Subgroup legend: G1 (MEF2D fusions), G2 (TCF3-PBX1), G3 (ETV6-RUNX1/-like), G4 (DUX4 fusions), G5 (ZNF384 fusions), G6 (BCR-ABL1/Ph-like), G7 (Hyperdiploidy), G8 (KMT2A fusions), G9 (PAX5 and CRLF2 fusions), G10 [PAX5 (p.P80R) mutation], G11 [IKZF1 (p.N159Y) mutation], G12 [ZEB2 (p.H1038R)/IGH-CEBPE], G13 (TCF3/4-HLF), G14 (NUTM1 fusions). Inset heatmap for G10 to G14 with tracks: subgroup, age, gender, PAX5 (p.P80R), IKZF1 (p.N159Y), IGH-CEBPE, ZEB2 (p.H1038R), TCF3/4-HLF, NUTM1 fusions.]
Fig. 1. Two-step unsupervised hierarchical clustering of the global gene expression profile from 1,223 BCP ALL patients. In the gene expression subgroups of G1 to G7 (Left) and G8 to G14 (Right), columns indicate 1,223 BCP ALL patients and rows represent gene expression levels or genetic features for each patient. Genes showing over- and underexpression in the heatmap are shown in red and blue, respectively. The first box above the heatmap indicates genotypes and fusion genes, followed by a box including three clusters of hotspot sequence mutations defined in this analysis. The first row below the heatmap specifies the 14 BCP ALL subgroups identified on the basis of gene expression profiles. In the unsupervised hierarchical clustering heatmap of G10 to G14 (Lower Right), columns represent patients and rows are top variance genes in G10 to G14. The box below the heatmap indicates the five gene expression subgroups, gender, and genotypes of the G10 to G14 clusters.
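The two-step clustering named in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline. The distance metric (Euclidean), the linkage (average), the choice to re-cluster the largest first-step cluster, the function names, and the toy expression values are all assumptions made for illustration. The point it shows is that genes are re-ranked by variance within the residual group before the second step, so that signals masked in the full cohort can separate small subgroups.

```python
from math import dist
from statistics import pvariance


def top_variance_genes(expr, samples, k):
    """Indices of the k genes with the highest variance across `samples`."""
    n_genes = len(expr[samples[0]])
    variances = [pvariance([expr[s][g] for s in samples]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]


def average_linkage(expr, samples, genes, n_clusters):
    """Agglomerative clustering (Euclidean distance, average linkage)."""
    points = {s: [expr[s][g] for g in genes] for s in samples}
    clusters = [[s] for s in samples]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(points[x], points[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def two_step_clusters(expr, k1, n1, k2, n2):
    """Step 1 clusters all samples; step 2 re-clusters the largest step-1
    cluster (an assumption) using genes re-ranked within that cluster."""
    samples = list(expr)
    step1 = average_linkage(expr, samples, top_variance_genes(expr, samples, k1), n1)
    residual = max(step1, key=len)
    step2 = average_linkage(expr, residual, top_variance_genes(expr, residual, k2), n2)
    return [c for c in step1 if c is not residual] + step2


# Invented toy data: sample -> expression of three genes.
expr = {
    "a1": [10.0, 1.0, 0.0], "a2": [11.0, 1.2, 0.0],
    "b1": [0.0, 0.0, 0.0], "b2": [0.5, 0.1, 0.0],
    "c1": [0.0, 2.0, 0.0], "c2": [0.5, 2.1, 0.0],
}
groups = two_step_clusters(expr, k1=1, n1=2, k2=1, n2=2)
print(sorted(sorted(g) for g in groups))
```

In this toy case the first gene dominates the cohort-wide variance and only separates the `a` samples; the `b` and `c` samples split in the second step, once the second gene becomes the top-variance gene within the residual group.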
datasets from studies with otherwise relatively small sample sizes. We hypothesized that the versatility provided by RNA-seq (sequencing) would uncover otherwise undetected genetic abnormalities in BCP ALL, provided that sufficient numbers of cases were analyzed. Thus, through the formation of an international consortium of five major study groups, we have delineated the transcriptomic landscape of BCP ALL and at the same time identified new subgroups of biological and clinical importance.

Results
Identification of BCP ALL Subgroups with Distinctive Gene Expression Profiles and Genomic Aberrations. To comprehensively identify BCP ALL subtypes, we first systematically classified gene expression profiles, gene fusions, and gene mutations from RNA-seq data of 1,223 BCP ALL cases from five significant patient cohorts (Table 1 and SI Appendix, Fig. S2 and Dataset S2). Using a consecutive two-step unsupervised clustering, we identified 14 distinct subgroups (G1 to G14) based on their gene expression signatures (Fig. 1 and Table 2). Most of these gene expression subgroups segregated with well-known genetic abnormalities. TCF3–PBX1 fusions were present among the G2 subgroup (n = 76, 6%); ETV6–RUNX1 fusions belonged to G3; BCR–ABL1 (Ph) and BCR–ABL1–like (Ph-like, including a cluster with CRLF2 fusions) comprised G6 (n = 167, 14%); and cases with a hyperdiploid karyotype formed the subgroup G7 (n = 408, 33%). Three subgroups recently identified among B-other-ALL cases were those with MEF2D fusions (G1; n = 39, 3%), DUX4 rearrangements (G4; n = 63, 5%), and ZNF384 fusions (G5; n = 74, 6%) (6–11). These recently described subgroups formed distinctive gene expression-based clusters, consistent with prior reports (6, 7, 10, 11). The most recently defined BCP ALL ETV6–RUNX1–like cluster, characterized by the absence of ETV6–RUNX1 fusions but with similar gene expression profiles to ETV6–RUNX1–positive BCP ALLs (6), was also found among our combined datasets. In concordance with previous findings (6), both fusions involving ETV6 and fusions involving IKZF1 were common in these ETV6–RUNX1–like cases (Dataset S2). However, all ETV6–RUNX1–negative cases exhibiting a gene expression profile similar to ETV6–RUNX1–positive cases were defined as ETV6–RUNX1–like. Together, ETV6–RUNX1–positive/ETV6–RUNX1–like BCP ALL constituted G3 (n = 161, 13%). KMT2A-rearranged cases formed a distinct subgroup (G8; n = 56, 5%). Notably, six previously undescribed gene expression subgroups (G9 to G14) with distinct genomic abnormalities were identified. G9 (n = 111, 9%) was associated with PAX5 fusions and “Ph-like” ALL with CRLF2 fusions (12). G10 (n = 23, 2%) and G11 (n = 6, <1%) were characterized by two hotspot mutations in PAX5 (p.P80R) (21/22, 96%) and IKZF1 (p.N159Y) (6/6, 100%), respectively. The subgroup G12 (n = 8, <1%) was enriched for the hotspot mutation in ZEB2 (p.H1038R) (5/8, 63%) and IGH–CEBPE fusions (3/8, 38%). G13 (n = 11, <1%) and G14 (n = 20, 2%) were associated with TCF3/4–HLF (7/11, 64%) and NUTM1 (6/20, 30%) rearrangements, respectively.

Nonsilent Sequence Mutation Profile. We next analyzed nonsilent sequence variants in available whole exome sequencing (WES) and RNA-seq data, based on in-house analysis criteria from previous studies (6, 8, 11, 13). We identified 44 genes that were recurrently mutated in at least 1% of the cases (12/1,223 cases). Nonsilent variants in NRAS, KRAS, FLT3, KMT2D, PAX5, PTPN11, CREBBP, and TP53 exhibited the highest mutation frequencies (3 to 14%) (SI Appendix, Figs. S5 and S6A). The mutated genes (>1%) were functionally divided into five categories: signaling molecules, transcription factors, epigenetic factors, cell cycle, and others (Dataset S3). Distinct gene mutation categories showed different levels of enrichment among the gene expression subgroups G1 to G14. Gene mutations among signaling molecules were enriched in subgroups G5, G7, G9, and G10, while G4, G10, G11, and G12 harbored a higher number of variants in transcription factor genes. HIST family (HIST1H2AG and HIST1H2AI) point mutations located in the histone H2A type 1 domain (SI Appendix, Fig. S7A) were highly correlated with G2 (TCF3–PBX1), while WHSC1 (NSD2) point mutations (p.E1099K) in the SET domain (SI Appendix, Fig. S7B) were significantly associated with G3 (ETV6–RUNX1–positive/ETV6–RUNX1–like; SI Appendix, Figs. S5–S7).
Co-occurrence or mutual exclusivity of mutations was also evaluated using a two-sided Fisher’s exact test. A total of 36 gene pairs (for example, TP53 and MYC) exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6B). Along with the novel subgroups defined in this study (G9 to G14), 13 gene pairs (for example, PAX5 and PTPN11, and ZEB2 and NRAS) exhibited significant co-occurrence (SI Appendix, Fig. S6 C and D). In G9, four gene pairs, namely PAX5 and IKZF1, JAK1 and SETD2, SH2B3 and ASXL1, and CDKN2A and ARID1B, exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6D).
Enrichment of certain mutations differed between pediatric and adult BCP ALL patients. Transcription factor mutations, such as in RUNX1, were more frequent in adult ALL, while mutations in signaling molecules and in the epigenetic factor WHSC1 were more prevalent in pediatric BCP ALL (Datasets S5 and S6).

ZNF362 Fusions Cluster with ZNF384 Rearrangements (G5) and Display Activation of the JAK-STAT Pathway. Four cases harbored previously undescribed ZNF362 rearrangements: SMARCA2–ZNF362 (n = 3) and TAF15–ZNF362 (n = 1). These cases clustered within the G5 subgroup, otherwise associated with ZNF384 fusions (Fig. 2 and SI Appendix, Figs. S8A and S9). ZNF384 and ZNF362 are homologous C2H2-type zinc-finger transcription factors containing six zinc fingers that belong to the zinc-finger protein 384/nuclear matrix transcription factor 4 (ZFAM4) gene family (14). Of note, the zinc-finger domains were retained in both fusion proteins (SI Appendix, Fig. S8B), and both clusters showed similar gene expression profiles with an activated JAK-STAT signaling pathway (SI Appendix, Fig. S8C). Moreover, the fusion partners of ZNF362, namely TAF15 and SMARCA2, were also found to fuse to ZNF384, with similar breakpoints.

Previously Undescribed Subgroups Associated with Different Gene Fusions/Sequence Mutations.
G9: PAX5 and CRLF2 fusions are representative of this subgroup. According to the gene expression profiles, 46 cases with PAX5 fusions and 13 cases with CRLF2 fusions (accounting for 41 and 12%, respectively) clustered together in G9 (n = 111). Previous work identified CRLF2 fusions in Down syndrome ALL and Ph-like BCP ALL, each of them accounting for approximately half of the cases (12, 15). In our study, 30% of the cases with CRLF2 fusions (13/44) were found in G9 and 57% (25/44) in the BCR–ABL1/Ph-like subgroup (G6), with the remaining cases present in G7 and G10 (Fig. 1). Notably, all 13 CRLF2 fusions in G9 were P2RY8–CRLF2 fusions, in contrast to those in G6 in which the fusion partners of CRLF2 were either P2RY8 or IGH. Of the 13 CRLF2 fusion cases in G9, seven also carried PAX5 fusions. Signaling molecule mutations were also significantly enriched in G9 (P < 0.001; SI Appendix, Fig. S5 and Dataset S5), a feature reminiscent of Down syndrome ALL (12). Compared with the CRLF2 fusion clusters in G6, the PI3K-Akt signaling (e.g., FLT4 and EGF), cytokine–cytokine receptor interaction (e.g., CCL17 and IL2RA), and hematopoietic cell lineage (e.g., CD33 and CD34) pathways were significantly down-regulated in the CRLF2 fusion-positive cases in G9 (SI Appendix, Fig. S10), whereas a B cell-specific member of the tumor necrosis factor (TNF) receptor superfamily, TNFRSF13B, was up-regulated
[Figure panel residue, only partly recoverable. Map of PAX5 sequence mutations along the 392-residue protein, by type (missense, frameshift, nonsense, protein insertion, protein deletion), with the "PAX and DNA binding" and "Pax2_C" domains marked; exon3:c.C239G (p.P80R), with 21/22 cases clustering in G10. Plots of up-regulated and down-regulated genes for PAX5 (p.P80R) (G10) and IKZF1 (p.N159Y) (G11); axis: adjusted P-value.]