Author Manuscript Published OnlineFirst on April 23, 2021; DOI: 10.1158/1078-0432.CCR-20-4375 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.

Integration of genomic and transcriptomic markers improves the prognosis prediction of acute promyelocytic leukemia

Authors and Affiliations:

Xiaojing Lin1, Niu Qiao1, Yang Shen1, Hai Fang1, Qing Xue1, Bowen Cui1, Li Chen1, Hongming Zhu1, Sujiang Zhang1, Yu Chen1, Lu Jiang1, Shengyue Wang1, Junmin Li1, Bingshun Wang2, Bing Chen1, Zhu Chen1, Saijuan Chen1

L.X.J., Q.N., S.Y. contributed equally to this work.

1 Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine (Shanghai), Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200025, Shanghai, China

2 Department of Biostatistics and Clinical Research Center, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China

Running head: Integrated genomic and transcriptomic model in APL

Keywords: APL, NRAS, genomics, transcriptomics, prognosis

Corresponding authors:

Saijuan Chen, State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, National Research Center for Translational Medicine (Shanghai), Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Rui-Jin Er Road, Shanghai, 200025, China
Tel: +86-21-33562658
Fax: +86-21-64743206
E-mail: [email protected]

Zhu Chen, MD, PhD
Shanghai Institute of Hematology,
Rui-Jin Hospital,
197 Rui-Jin Er Road,
Shanghai 200025, China
Tel: +86-21-33562658
Fax: +86-21-64743206
E-mail: [email protected]

Yang Shen, MD, PhD
Shanghai Institute of Hematology,
Rui-Jin Hospital,
197 Rui-Jin Er Road,
Shanghai 200025, China
Tel: +86-21-33562658
Fax: +86-21-64743206
E-mail: [email protected]

Competing interests

All authors declare no competing interest.

Downloaded from clincancerres.aacrjournals.org on September 25, 2021. © 2021 American Association for Cancer Research.


Statement of translational relevance

Acute promyelocytic leukemia (APL), once one of the deadliest forms of cancer, has gradually become highly curable. However, in a subset of APL patients, early death and relapse remain challenges for treatment. Moreover, as the treatment strategy has evolved from ATRA and chemotherapy (ATRA–chemotherapy)-based to ATRA and ATO (ATRA–ATO)-based synergistic targeted therapy, the classic risk stratification needs further improvement. In this study, we established an integrated system that offers a feasible and reliable way to stratify APL patients into high-risk (HR) and standard-risk (SR) groups. The revised HR group displayed significantly worse outcomes than the revised SR group. In summary, this system could better define risk in APL, add predictive value to the choice among currently available therapeutic tools, and make clinical decisions more convenient and clearer.


Abstract

Purpose: For the past two decades, the stratification system for acute promyelocytic leukemia (APL) has been based on the white blood cell (WBC) and platelet counts (i.e., the Sanz score). However, the borderlines among different risk groups are sometimes ambiguous, and for some patients, early death and relapse remain challenges. Moreover, as the treatment strategy has evolved from ATRA–chemotherapy to ATRA–ATO-based synergistic targeted therapy, precise risk stratification with molecular markers is needed.

Patients and Methods: This study performed a systematic analysis of APL genomics and transcriptomics to identify genetic abnormalities in 348 patients, mainly from the APL2012 trial (NCT01987297), to illustrate the potential molecular background of the Sanz score and to further optimize it. The least absolute shrinkage and selection operator (LASSO) algorithm was used to analyze gene expression in 323 cases to establish a scoring system (i.e., the APL9 score).

Results: By combining NRAS mutations and the APL9 score with WBC count, 321 cases could be stratified into two groups with significantly different outcomes. The estimated 5-year overall (P = 0.00031), event-free (P < 0.0001), and disease-free (P = 0.001) survival rates in the revised standard-risk group (95.6%, 93.8%, and 98.1%, respectively) were significantly better than those in the revised high-risk group (82.9%, 77.4%, and 88.4%, respectively), a finding that could be validated using The Cancer Genome Atlas dataset.

Conclusions: We have proposed a two-category system that improves prognosis prediction in APL patients. Molecular markers identified in this study may also provide genomic insights into the disease mechanism for improved therapy.


Introduction

Acute promyelocytic leukemia (APL) is a unique subtype of acute myeloid leukemia (AML), characterized by the expansion and accumulation of leukemic cells blocked at the promyelocytic stage of granulocytic differentiation and by the presence of a specific disease driver fusion gene encoding the PML–RARA oncoprotein. In the last two decades, the all-trans-retinoic acid (ATRA) and arsenic trioxide (ATO)-based combination (ATRA–ATO) has shown a considerable advantage over the traditional ATRA and chemotherapy combination (ATRA–chemotherapy)-based protocol and has become the standard core treatment regimen for APL (1-3). Meanwhile, the Sanz score has been widely used to classify APL patients into three risk categories according to white blood cell (WBC) and platelet (PLT) counts (low-risk: WBC count ≤ 10 × 10^9/L and PLT count > 40 × 10^9/L; intermediate-risk: WBC count ≤ 10 × 10^9/L and PLT count ≤ 40 × 10^9/L; and high-risk: WBC count > 10 × 10^9/L) (4).
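The three Sanz categories quoted above reduce to two threshold checks. The following minimal sketch in Python is for illustration only; the function name `sanz_risk` and its argument names are assumptions introduced here, and both counts are taken to be expressed in units of 10^9/L.

```python
def sanz_risk(wbc: float, plt: float) -> str:
    """Classify a patient by the Sanz thresholds quoted in the text.

    wbc: white blood cell count, in 10^9/L
    plt: platelet count, in 10^9/L
    """
    if wbc > 10:
        # WBC above 10 x 10^9/L is high-risk regardless of platelet count.
        return "high"
    if plt > 40:
        # WBC <= 10 and PLT > 40 x 10^9/L
        return "low"
    # WBC <= 10 and PLT <= 40 x 10^9/L
    return "intermediate"
```

The sketch also makes the boundary behaviour explicit: a WBC count of exactly 10 × 10^9/L stays out of the high-risk group, which is the kind of borderline case the next paragraph discusses.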

While the Sanz score is a very useful first-generation tool for stratifying APL, we propose to further enrich the current APL risk evaluation system by taking several factors into consideration. First, although the hemogram criteria for APL risk seem easy to apply, they are not so straightforward for some patients, especially because WBC and PLT levels fluctuate with infection and drugs. Notably, ATRA, which is now routinely used to treat APL, often causes leukocytosis syndrome (5). The inclusion of reliable molecular markers can be of great value in this respect. Second, the data used to generate the Sanz risk score came mostly from clinical trials of ATRA–chemotherapy, which showed significantly inferior results compared with current ATRA–ATO treatment in the low- and intermediate-risk categories (6). Third, the identification of biological markers related to distinct disease outcomes may allow not only a better understanding of the molecular mechanisms underlying different clinical phenotypes, such as those captured by the Sanz score, but also a more precise use of novel therapeutic methods, such as targeted drugs or immune therapies.

Therefore, we generated gene mutation and expression profiles of 348 APL patients, most of whom were enrolled in the APL2012 trial (NCT01987297), the largest cohort ever reported (7). Integration of genomics and transcriptomics allowed us to clarify panoramically the importance of gene mutations and expression profiles for risk assessment and prognosis in APL.


Patients and Methods

Patients

A total of 348 APL patients with available bone marrow (BM) samples were included in this study. The majority (n = 304) came from the APL2012 clinical trial (NCT01987297), enrolled from December 6, 2012 to December 31, 2017, and a small group (n = 44) were historical cases from June 1, 2012 to November 31, 2012 who received similar treatment (Table 1). Notably, randomization in the APL2012 trial into two groups, with or without ATO, took place at the consolidation phase, whereas all patients received ATRA–ATO treatment for remission induction and maintenance. As a result, there were no statistically significant differences in 3-year disease-free survival (DFS) or in estimated 7-year DFS and overall survival (OS) between the two groups (7). When the 348 cases investigated in the present work were scrutinized for their treatment and outcome data, there were 22 cases of early death during remission induction and two cases lost during follow-up. Among the remaining 324 cases, no significant differences were observed in 5-year OS, event-free survival (EFS), or DFS between the two groups receiving consolidation with (n = 172) or without (n = 152) ATO (Table S1 and Figure S1). The collection and preservation of the samples were approved by the Institutional Review Boards of all participating centers, and written informed consent for sample collection and research was obtained in accordance with the Declaration of Helsinki (8).

Whole-exome sequencing (WES), whole-genome sequencing (WGS), and RNA sequencing (RNA-seq)

The pretreatment of BM samples, genomic DNA extraction, library preparation, and detailed analysis for WES, WGS, and RNA-seq are described in the Supplementary Materials and Methods.

Establishment of a scoring system with the penalized least absolute shrinkage and selection operator (LASSO) method to linearize the profiles

After excluding the 61 patients whose samples were used in the differential analysis that identified a list of 389 differentially expressed genes (DEGs), 262 patients with RNA-seq data remained in our cohort. Using the createDataPartition function in the caret package (version 6.0-84) (9), we divided these 262 APL patients randomly into a training set (n = 158; three-fifths) and a validation set (n = 104; two-fifths). No significant difference was observed between the two sets in terms of Sanz risk group distribution (P = 0.697) or other clinical characteristics (Table 1).

To identify a core subset of DEGs relevant to distinct patient groups, we used a regularized regression technique based on the LASSO algorithm as implemented in the glmnet package (version 2.0-16) (10), choosing 10-fold cross-validation to fit a binomial regression model with optimized parameters. We repeated the process 10 times to test the stability of this method. A scoring model was built using the training set and tested in the validation set. The receiver operating characteristic (ROC) curve, computed with the pROC package (version 1.13.0) (11), was used to identify the best threshold of the score model for separating patients into two subgroups.
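As an illustration of how a "best" ROC threshold can be chosen, the sketch below scans candidate cutoffs and keeps the one that maximizes Youden's J (sensitivity + specificity − 1). This is a plain-Python sketch rather than the pROC implementation, and the use of Youden's J is an assumption: the text does not state which optimality criterion was applied. The function name `best_threshold` is hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    scores: one model score per patient
    labels: 1 for the positive class, 0 otherwise
    A patient is called positive when score > threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")

    # Candidate thresholds: midpoints between consecutive distinct scores.
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("at least two distinct scores are required")

    best = None
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1  # Youden's J statistic
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]
```

Other criteria (for example, the point closest to the top-left corner of the ROC plot) can select a different cutoff, so the criterion should be reported alongside the threshold.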

The definition of clinical outcome endpoints

We considered three endpoints: OS, DFS, and EFS. OS was defined for all patients and measured from the date of study entry until death from any cause. Relapse included morphologic relapse, defined as the reappearance of blasts in peripheral blood (PB) or bone marrow after complete response, and molecular or cytogenetic relapse, defined as the reappearance of a molecular or cytogenetic abnormality (12). DFS was defined as the time from complete response (CR) to any relapse, persistent positivity of reverse transcriptase-polymerase chain reaction (RT-PCR) for PML–RARA transcripts after consolidation therapy, secondary AML, or death from any cause. EFS was defined as the time from diagnosis until an event (i.e., induction failure, relapse, or death from any cause) or last follow-up. Early death was defined as death occurring either before treatment initiation or during induction. Since the systematic introduction of ATRA in modern regimens, most early deaths have been recorded within the first 2–3 weeks. Patients who were lost to follow-up were censored at the time of last contact. OS, EFS, and DFS were evaluated using the log-rank test and the Cox proportional hazards (CPH) model. Violations of the CPH assumption were checked by examining the Schoenfeld residuals.

Statistical analysis

The analysis workflow is charted in Figure 1. The multivariate Cox analysis was performed using all significant univariate predictors, following the TRIPOD recommendations (13). The risk-revised categories were generated using X-tile (version 3.6.1) (14), a tool that selects marker cutoff points according to the minimal P value. Bias-corrected internal calibration was performed at 1, 2, and 5 years using 1,000 bootstrap resamples, fitting the relationship between the observed and the estimated survival probabilities with the rms package (version 5.1-4) (15). Bias-corrected internal validation was also performed by evaluating Harrell's concordance index (C-index) over 1,000 bootstrap resamples with the boot package (version 1.3-24) (16). All survival analyses were performed using the survival package (version 2.42-6) (17), with survival curves visualized using the survminer package (version 0.4.3) (18). Net reclassification improvement (NRI) (19) was calculated using the nricens package (version 1.6) (20).
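For readers unfamiliar with Harrell's C-index, the following plain-Python sketch shows the pairwise definition for right-censored data. It is an illustration only, not the code used in this study: it omits the bootstrap optimism correction described above, and the function name `harrell_c_index` is an assumption introduced here.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    risks:  predicted risk per patient (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk failed earlier
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by predicted risk.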

Supplementary materials and methods

Detailed materials, methods, statistical analyses, and additional extended data display items are presented in the Supplementary Materials and Methods and Tables S2–S8.

Availability of data and materials

RNA-seq data have been deposited in the Gene Expression Omnibus (GEO) database under accession number GSE172057. Detailed information on gene mutations from the WES data is summarized in Table S9. All data can also be viewed in The National Omics Data Encyclopedia (http://www.biosino.org/node) by pasting the accession (OEP001919) or through the URL: http://www.biosino.org/node/project/detail/OEP001919.

Results

Clinical characteristics and genome-wide profiling of APL patients

In our cohort of 348 APL patients, the median age at diagnosis was 38 years, and the median follow-up time was 49.0 months (interquartile range [IQR] = 32.7–61.0). Early death and relapse were recorded in 22 and 13 patients, respectively. According to the Sanz risk model, 110, 173, and 65 patients were classified as high-, intermediate-, and low-risk, respectively (Table 1).

Among the 348 patients, 202 and 2 were subjected to WES and WGS, respectively, to detect genomic abnormalities, and 189 of these 204 patients had their own matched control samples (complete remission BM samples). A total of 324 patients were subjected to RNA-seq for transcriptome analysis. However, one case was excluded from further analysis because of the co-occurrence of PML–RARA and BCR–ABL1 fusion genes (Table 1).


Gene mutation landscape of de novo APL

From the above-mentioned patients, we identified 1,481 non-silent somatic mutations involving 940 genes, including 1,295 single-nucleotide variations (SNVs) and 186 insertions/deletions (indels), with a median of 4 mutations (IQR, 2–7) per patient (Figure S2 and Table S9). Recurrent mutations were observed in 81.03% (282/348) of patients. As illustrated in Figure 2A, the most common APL-related variations were found in FLT3 (33%, including internal tandem duplication [ITD: 13%] and point mutations in the tyrosine kinase domain [TKD: 20%]) and WT1 (19%), followed by NRAS (9%), ARID1A (7%), and RREB1 (6%). The full list of recurrent gene mutations is detailed in Table S10. In addition to the well-known genes, we also detected genes not previously reported as mutated in APL, including RREB1, EBF3, and UPF3B; they were detected at a mutation frequency of > 2% and were likely loss-of-function in nature (frameshift insertions/deletions and nonsense mutations; see Figure S3A). By comparison, mutations of these genes were seldom seen in non-APL AML or other diseases (Table S11).

Notably, FLT3-ITD was significantly associated with high-risk patients as compared to the two other groups (high-risk vs. intermediate-risk: P = 0.0036; high-risk vs. low-risk: P < 0.0001). While NRAS mutations were equally distributed in Sanz low-, intermediate-, and high-risk groups, KRAS mutations showed a significant difference between high- and low-risk patients (P = 0.027). ARID1A, USP9X, and ARID1B were mutated more frequently in low-risk patients than in high-risk patients (P = 0.040, 0.013, and 0.011, respectively; Figure S3B).

The analysis of prognostic relevance of NRAS subclonal status

The genomic mutations mentioned above were all assessed in univariate analysis for OS, EFS, and DFS. Of note, only NRAS mutations (n = 32; Figure 2B) were associated with poor prognosis, as reflected by OS (HR = 4.01, 95% CI = 1.675–9.602, P = 0.0018), EFS (HR = 3.049, 95% CI = 1.389–6.692, P = 0.0055), and DFS (HR = 4.519, 95% CI = 1.416–14.420, P = 0.011).

The variant allele fraction (VAF) of NRAS was analyzed to identify the impact of subclonal status on prognosis. First, the VAF of NRAS was treated as a numeric variable and added to the univariate Cox analysis. As shown in Table S12, it was not related to OS, EFS, or DFS. Second, we divided APL patients into three groups: NRAS wild-type, NRAS mutation with VAF less than the median, and NRAS mutation with VAF greater than the median. To evaluate the impact of VAF, we treated NRAS mutation with VAF less than the median as the reference and performed the univariate Cox analysis. As shown in Table S12, the VAF of NRAS mutation was again not related to OS, EFS, or DFS.

Analysis of functional gene categories among different Sanz risk groups

The distributions of gene mutations among the low-, intermediate-, and high-risk patients were assessed by comparing known functional gene groups (Table S13). Notably, mutations of activated signaling-related genes were clearly enriched in high-risk patients (high-risk vs. intermediate-risk: P < 0.0001; high-risk vs. low-risk: P < 0.0001), whereas those of epigenetic modifiers tended to be associated with low-risk patients (low-risk vs. intermediate-risk: P = 0.052; low-risk vs. high-risk: P = 0.036; Figure S4A).

In univariate Cox analysis, only mutations of the activated signaling pathway were related to clinical prognosis, specifically DFS (HR = 6.362, 95% CI = 1.424–28.431, P = 0.015), in this study (Table S14).

Pairwise co-occurrence and mutual exclusivity calculation

Genes with a mutation frequency of more than 2% were included in pairwise co-occurrence and mutual exclusivity calculations (Figure S4B). Mutations in SF3B1 (n = 7) and CALR (n = 6), which are commonly detected in myelodysplastic syndrome and myeloproliferative neoplasms (21), showed a pattern of co-occurrence (P < 0.0001). Meanwhile, SMARCB1 and FLT3-ITD, as well as EBF3 and SF3B1, tended to be co-mutated (P = 0.033 and P = 0.024, respectively).

Identification of distinct gene expression patterns

To gain a panoramic view of gene expression profiles, we first performed unsupervised clustering analysis focusing on a set of 1,090 genes with the highest variance across 323 patients (Table S15). As illustrated in the heatmap (Figure S5), we observed a group of genes displaying distinct expression patterns associated with Sanz low-risk (left dotted box) and high-risk patients (right dotted box). This observation motivated us to further identify the gene expression patterns of different APL phenotypic features. DEGs were obtained from BM samples of 34 high-risk and 27 low-risk (Sanz score) patients who had no or only short exposure to ATRA (less than 48 hours). A heatmap was generated from the 389 DEGs that met the criteria (|Log2FC| > 2, adjP < 0.01, and basemean > 100; Figure 3A and Table S16). Although identified from the 61 patients mentioned above, these 389 DEGs were mainly grouped into two gene clusters according to the expression pattern seen in the remaining 262 patients (Figure 3A). Genes in cluster A were largely involved in erythrocytic functions, differentiation, and homeostasis (Figure 3B and Table S17). Genes in cluster B were mostly of functional relevance to T cell activation, leukocyte differentiation, and lymphocyte activation (Figure 3C and Table S17). GSEA also showed a number of gene sets or pathways at significantly lower expression levels in high-risk patients compared with low-risk patients (Figure S6).
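The DEG selection criteria stated above (|Log2FC| > 2, adjP < 0.01, and basemean > 100) amount to a three-condition filter over a differential expression results table. The sketch below is a minimal illustration in Python; the record layout, the key names, and the placeholder gene identifiers are assumptions introduced here and do not come from the study data.

```python
def is_deg(log2_fc: float, adj_p: float, base_mean: float) -> bool:
    """Apply the three thresholds quoted in the text to one gene."""
    return abs(log2_fc) > 2 and adj_p < 0.01 and base_mean > 100


def select_degs(rows):
    """Return the names of genes passing all three thresholds.

    rows: iterable of dicts with the (hypothetical) keys
          'gene', 'log2_fc', 'adj_p', and 'base_mean'.
    """
    return [
        r["gene"]
        for r in rows
        if is_deg(r["log2_fc"], r["adj_p"], r["base_mean"])
    ]


# Placeholder records, for illustration only (not real results).
example = [
    {"gene": "GENE_A", "log2_fc": 3.1, "adj_p": 0.001, "base_mean": 850.0},
    {"gene": "GENE_B", "log2_fc": -2.4, "adj_p": 0.004, "base_mean": 120.0},
    {"gene": "GENE_C", "log2_fc": 2.8, "adj_p": 0.03, "base_mean": 400.0},
    {"gene": "GENE_D", "log2_fc": -3.0, "adj_p": 0.0005, "base_mean": 60.0},
]
selected = select_degs(example)
```

Because the fold-change test uses the absolute value, both up- and down-regulated genes pass the filter, while the basemean threshold removes genes with low overall expression.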

Two distinct subcohorts, namely subcohort 1 (SC1) and subcohort 2 (SC2), could be defined according to the gene expression pattern without artificial intervention. SC1 (n = 128; 48.9%) comprised 21 Sanz low-, 90 intermediate-, and 17 high-risk patients. Most genes in cluster A were highly expressed only in SC1. SC2 (n = 134; 51.1%) consisted of 14 Sanz low-, 67 intermediate-, and 53 high-risk patients. Patients with a WBC count > 50 × 10^9/L were primarily observed in SC2 (P = 0.0018). In terms of gene mutations, FLT3-ITD was significantly enriched in SC2 (P = 0.0024), whereas WT1 and NRAS mutations were distributed equally between the two subcohorts (P = 0.085 and 0.539, respectively).

To investigate whether these DEGs were modulated by the same upstream regulator or the same pathway, we employed Ingenuity Pathway Analysis (IPA), which identified GATA binding protein 1 (GATA1; P < 0.0001; Table S18) as the most significant upstream regulator that might explain the different gene expression patterns observed between low- and high-risk patients. Low GATA1 expression was significantly associated with the high-risk group (Figure 3D) and was accompanied by a tendency toward increased WBC count and decreased PLT count (Figure 3E–F). We then analyzed the correlation between GATA1 expression and APL blasts in 323 patients. Similarly, GATA1 expression was inversely associated with the blast percentage (P < 0.0001; R = −0.386; Figure S7A). We further classified the patients according to the quartile interval of GATA1 expression (< Q1, Q1–Q2, Q2–Q3, ≥ Q3), and a similar result was obtained (Figure S7B). In addition, we counted all forms of megakaryocytes in BM. The median number per smear in patient samples was 1, with an interquartile range (25th–75th percentile) of 0–6. As shown in Figure S7C, the number of megakaryocytes was increased in patients with elevated GATA1 expression (P < 0.0001; R = 0.333). This became much clearer in the quartile analysis, which showed that elevated GATA1 expression was correlated with the number of megakaryocytes per smear (Figure S7D). In support of our finding, negative correlations were also observed between DNA methylation and the gene expression level of GATA1 in APL patients from The Cancer Genome Atlas (TCGA) dataset (Figure S8).

Potential of APL9 score in determining risks in APL

Starting with the 389 DEGs, we performed a sparse regression analysis to stratify patients into molecularly low- or high-risk groups by a unified score (see workflow chart in Figure 1). In brief, the remaining cohort (n = 262, excluding the 61 patients used to define the DEGs) was randomly divided into a training set (three-fifths, n = 158) and a validation set (two-fifths, n = 104). In the training phase, the LASSO algorithm was applied to the 389 DEGs with SC2 membership as the endpoint, which yielded a signature of 9 genes. The linear combination of these 9 genes weighted by their regression coefficients was then calculated for each patient (i.e., the APL9 score; Figure 4A). To test the stability of the APL9 score, we repeated the above division 10 times to establish 10 models, one for each of the 10 training sets. The ROC curves of these 10 models in each training set and validation set are illustrated in Figure S9, showing the robustness of our proposed method.

The cut-off point of the APL9 score (0.58) was determined from the ROC curve by maximizing the separation of the two subcohorts (i.e., separating SC1 and SC2 in the training set only). We classified patients with an APL9 score higher than 0.58 into the molecular high-risk group (i.e., GII) and all others into the molecular low-risk group (i.e., GI; Figure 4B).
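The scoring and grouping steps described above can be sketched as a weighted sum followed by a threshold. In the Python sketch below, only the cut-off of 0.58 and the GI/GII labels come from the text; the function names, the gene identifiers, and the coefficient values are hypothetical placeholders, since the actual 9-gene signature and its weights are given in Figure 4A rather than here.

```python
APL9_CUTOFF = 0.58  # cut-off reported in the text


def apl9_score(expression, coefficients):
    """Linear combination of gene expression values weighted by coefficients.

    expression:   dict mapping gene name -> expression value
    coefficients: dict mapping gene name -> regression coefficient
    """
    return sum(coefficients[g] * expression[g] for g in coefficients)


def molecular_group(score, cutoff=APL9_CUTOFF):
    """GII (molecular high-risk) if the score is higher than the cut-off."""
    return "GII" if score > cutoff else "GI"


# Hypothetical two-gene example (placeholder names and weights).
coefs = {"GENE_X": 0.5, "GENE_Y": -0.25}
patient = {"GENE_X": 2.0, "GENE_Y": 1.0}
score = apl9_score(patient, coefs)   # 0.5 * 2.0 - 0.25 * 1.0
group = molecular_group(score)
```

Note that a score exactly equal to 0.58 falls into GI under the "higher than" rule stated in the text.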

The GII patients had 5-year OS, EFS, and DFS rates of 87.1% (95% CI = 81.9%–92.6%), 82.7% (95% CI = 77.0%–88.9%), and 87.7% (95% CI = 88.2%–97.0%), respectively (Figure 4C). Their outcome was significantly worse than that observed in GI patients, who had 5-year OS, EFS, and DFS rates of 96.9% (95% CI = 94.2%–99.6%), 95.6% (95% CI = 92.4%–98.8%), and 98.7% (95% CI = 96.8%–100.0%), respectively (Figure 4C). We also found that a higher APL9 score was significantly associated with worse prognosis (P = 0.0023, 0.00035, and 0.012 for OS, EFS, and DFS, respectively). The related clinical characteristics and survival information are listed in Tables S19 and S20 and Figure S10A–D. The areas under the curve (AUCs) of the APL9 score were 0.986 and 0.980 in the training and validation sets, respectively (Figure S10E).

Although the APL9 score was established by LASSO regression without considering the biological functions of its genes, it was interesting to find that some of these genes have functions related to hematological or immune features or diseases (Table S21).

In addition, in an attempt to explore a simpler, faster, and less costly method for obtaining the gene expression score in the future, we randomly selected 30 GI and 30 GII patients and, for each patient, used real-time RT-PCR to quantify the expression of all 9 genes in the APL9 score (Figure S11). The expression levels of these genes were concordant between the real-time RT-PCR and RNA-seq results.

Establishment of a new prognosis model combining clinical characteristics, APL9 score, and NRAS mutations

We assessed clinical characteristics, genomic mutations, and the APL9 score in univariate Cox analysis for OS, EFS, and DFS (Figure 5A). The characteristics used for the univariate Cox analysis are summarized in Table S22. A high WBC level (> 10 × 10^9/L) remained significant as a classical outcome predictor (OS: HR = 2.419, 95% CI = 1.104–5.302, P = 0.027; EFS: HR = 2.548, 95% CI = 1.324–4.902, P = 0.0051; DFS: HR = 4.396, 95% CI = 1.473–13.117, P = 0.0080). In addition to the NRAS mutation mentioned above, a high APL9 score also predicted a poor outcome (OS: HR = 4.087, 95% CI = 1.534–10.890, P = 0.005; EFS: HR = 4.056, 95% CI = 1.766–9.315, P = 0.001; DFS: HR = 5.629, 95% CI = 1.233–25.693, P = 0.026).

In univariate Cox analysis, WBC remained an important predictive factor, whereas PLT did not show significant value in outcome analysis. Hence, the WBC count, instead of the Sanz score, was used in the multivariate Cox regression to predict EFS (β coefficient = 0.406, HR = 1.501, 95% CI = 0.723–3.116, P = 0.276). The other significant univariate predictors of EFS were all included in the multivariate Cox model (APL9 score GII: β coefficient = 1.173, HR = 3.230, 95% CI = 1.311–7.959, P = 0.011; NRAS mutation: β coefficient = 1.056, HR = 2.875, 95% CI = 1.297–6.375, P = 0.009). The β coefficients of the model were multiplied by two and rounded to the nearest integer to form the points for each predictor. The revised risk score was formulated as follows: [WBC > 10 × 10^9/L] × 1 + [APL9 score GII] × 2 + [NRAS mutation] × 2, where each bracketed term equals 1 if the condition is met and 0 otherwise (Figure 5B).
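The derivation of the point values can be checked directly from the reported β coefficients: 0.406 × 2 = 0.812, which rounds to 1; 1.173 × 2 = 2.346, which rounds to 2; and 1.056 × 2 = 2.112, which rounds to 2. The Python sketch below reproduces this arithmetic and the resulting score. The β values are those reported in the text; the dictionary keys, function names, and argument names are assumptions introduced for illustration.

```python
# Multivariate Cox beta coefficients for EFS, as reported in the text.
BETAS = {"wbc_high": 0.406, "apl9_gii": 1.173, "nras_mut": 1.056}


def points_from_betas(betas):
    """Multiply each coefficient by two and round to the nearest integer."""
    return {name: round(beta * 2) for name, beta in betas.items()}


def revised_risk_score(wbc, apl9_group, nras_mutated):
    """Revised risk score for one patient.

    wbc:          white blood cell count, in 10^9/L
    apl9_group:   'GI' or 'GII'
    nras_mutated: True if an NRAS mutation was detected
    """
    pts = points_from_betas(BETAS)
    score = 0
    if wbc > 10:
        score += pts["wbc_high"]
    if apl9_group == "GII":
        score += pts["apl9_gii"]
    if nras_mutated:
        score += pts["nras_mut"]
    return score
```

With these point values the score ranges from 0 (no risk factor) to 5 (all three present), and a patient with a low WBC count can still reach a score of 4 through the two molecular markers alone.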


We quantified the agreement between the predicted and the actual EFS of the new scoring system using 1000-sample bootstrapped calibration plots at 1, 2, and 5 years (Figure S12). The C-index was 0.700 (95% CI = 0.613–0.787) and the optimism-corrected C statistic was 0.704 (95% CI = 0.659–0.753), indicative of good discrimination. The AUCs of the revised risk score for predicting OS, EFS, and DFS status were 0.721, 0.712, and 0.743, respectively (Figure 5C). APL patients with sufficient follow-up data were separated by X-tile (14) into three groups, analogous to the Sanz score, on the basis of the revised risk score (Figure S13A): revised high-risk (revised HR, n = 93, score = 3–5), revised intermediate-risk (revised MR, n = 81, score = 2), and revised low-risk (revised LR, n = 147, score = 0–1) groups (Figure S13B). The revised HR group had significantly poorer outcomes than the revised LR and MR groups, but the difference between MR and LR was not significant (Figure S13C); this finding was similar to the Sanz stratification (Figure S13D). For this reason, we combined the revised LR and MR groups into a single group, called the revised standard-risk group (revised SR, n = 228, score = 0–2). As a result, 2 and 5 patients of the Sanz low- and intermediate-risk groups, respectively, who had both a high APL9 score and an NRAS mutation were reclassified into the revised HR group (Figure 5D). These 7 patients showed markedly inferior OS (P = 0.0049) and EFS (P = 0.02) compared with the other 210 patients (Figure S14A). Eighteen patients of the Sanz high-risk group with neither a high APL9 score nor an NRAS mutation were assigned to the revised SR group (Figure 5D). These 18 crossover patients had superior EFS (P = 0.038) and a trend toward superior OS (P = 0.085) compared with the other patients (Figure S14B).
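The score cut-offs reported above translate directly into a lookup. The sketch below is illustrative only (the function names are hypothetical): it maps a revised risk score of 0 to 5 to the three X-tile groups and then to the final two-category grouping.

```python
def three_group(score):
    """Three-level grouping reported in the text: 0-1 low, 2 intermediate, 3-5 high."""
    if not 0 <= score <= 5:
        raise ValueError("revised risk score must be between 0 and 5")
    if score <= 1:
        return "revised LR"
    if score == 2:
        return "revised MR"
    return "revised HR"


def two_group(score):
    """Final grouping: LR and MR merged into the standard-risk group (score 0-2)."""
    return "revised HR" if three_group(score) == "revised HR" else "revised SR"
```

Under this mapping, a patient with WBC ≤ 10 × 10^9/L who carries both a high APL9 score and an NRAS mutation scores 4 and is placed in the revised HR group, whereas a patient with WBC > 10 × 10^9/L and neither molecular marker scores 1 and is placed in the revised SR group. This is consistent with the two directions of reclassification described above.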

Net reclassification improvement (NRI) is commonly used to quantify whether adding a new predictor to an existing model is of benefit (22-25). We combined the Sanz low- and intermediate-risk groups into a non-high-risk group (Figure S15A-C) and calculated the NRI of the revised model relative to the combined Sanz model. The 1000-sample bootstrapped NRI values for OS, EFS, and DFS were 12.16% (95% CI = 0.00% to 35.00%), 10.27% (95% CI = 1.41% to 30.47%), and 4.27% (95% CI = 0.00% to 25.31%), respectively, in favor of the revised model. In addition, a 2000-sample bootstrap resampling strategy was used to compare the ROC curves of the Sanz score and the revised score (Figure S15D-F).
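For readers unfamiliar with the NRI, the two-category form can be sketched as follows. This is a simplified illustration under stated assumptions: it treats the outcome as binary and ignores censoring, and it does not reproduce the bootstrapped, time-to-event calculation used in this study. The function name and the toy cohort are hypothetical.

```python
def categorical_nri(old_high, new_high, event):
    """Two-category net reclassification improvement.

    old_high, new_high: True if the old/new model puts the patient in the
    high-risk group. event: True if the patient had the event.
    NRI = [P(up | event) - P(down | event)]
        + [P(down | no event) - P(up | no event)]
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, ev in zip(old_high, new_high, event):
        moved_up = bool(new) and not old
        moved_down = bool(old) and not new
        if ev:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Hypothetical toy cohort of 10 patients (not data from the study).
event = [True, True, True, True, False, False, False, False, False, False]
old_high = [False, False, True, True, True, True, False, False, False, False]
new_high = [True, False, True, True, False, False, False, False, True, False]
toy_nri = categorical_nri(old_high, new_high, event)
```

In the toy cohort, one of four patients with an event moves up, and among six patients without an event two move down and one moves up, so the NRI is 1/4 + 1/6. A positive value means that the new grouping reclassifies patients in the correct direction more often than in the wrong one.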

Using our newly introduced system, we next evaluated 321 patients in accordance with the three key prognostic parameters, namely, NRAS mutation, APL9 score, and WBC count. The revised SR category, comprising 228 patients, had 5-year OS, EFS, and DFS rates of 95.6% (95% CI = 93.0%–98.3%), 93.8% (95% CI = 90.7%–97.0%), and 98.1% (95% CI = 96.3%–100.0%), respectively (Table S20). The revised HR category, comprising 91 patients, had 5-year OS, EFS, and DFS rates of 82.9% (95% CI = 75.3%–91.3%), 77.4% (95% CI = 69.1%–86.8%), and 88.4% (95% CI = 81.1%–96.4%), respectively, indicating worse outcomes. The revised SR category had superior OS (P = 0.00031), EFS (P < 0.0001), and DFS (P = 0.001) compared with the revised HR group (Figure 5E). The characteristics of the revised groups are shown in Table S23.
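Survival rates at a fixed time point, such as the 5-year rates above, are commonly obtained from a Kaplan–Meier curve. The sketch below shows the product-limit calculation on a hypothetical toy data set. It is an assumption that the reported rates were derived this way, and the confidence intervals are not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up time per patient.
    events: True if the event occurred, False if censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += bool(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup of the survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up times in years (not data from the study).
toy_times = [1, 2, 2, 3, 4, 5, 6, 6]
toy_events = [True, False, True, False, True, False, False, False]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_s5 = survival_at(toy_curve, 5)
```

In the toy data the survival estimate drops at years 1, 2, and 4, giving a 5-year estimate of 7/8 × 6/7 × 3/4 = 0.5625. Censored patients leave the risk set without lowering the curve.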

The correlation of minimal residual disease (MRD) and other forms of relapse to the revised score

In APL, the most important MRD endpoint is the achievement of PCR negativity for PML–RARA at the end of consolidation treatment. During the period of treatment, any detection of PML–RARA (the lower limit of quantification of the quantitative PML–RARA assay is 1:10000) would be considered as relapse (26,27). In this study, relapse was defined as the occurrence of molecular, hematological, or CNS relapse. As shown in Table S24, of the 11 relapsed patients, 3 showed hematological relapse, 5 had CNS relapse, and the remaining 3 had only detectable PML–RARA, which was considered molecular relapse.

With our final revised score, relapse was less frequent in the revised SR group (n = 4, 1.754%) than in the revised HR group (n = 7, 7.527%; HR = 5.117, 95% CI = 1.498–17.48, P = 0.009). Because the number of relapses in our study was small, much larger validation sets will be needed to confirm our findings.
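As a worked check, the crude relapse proportions follow from the group sizes given earlier (228 revised SR and 93 revised HR patients):

```latex
\[
\frac{4}{228} \approx 1.754\%, \qquad
\frac{7}{93} \approx 7.527\%, \qquad
\frac{7/93}{4/228} = \frac{1596}{372} \approx 4.29
\]
```

The crude ratio of about 4.29 is not the same quantity as the reported hazard ratio of 5.117. A hazard ratio is estimated from time-to-event data and, unlike a ratio of proportions, takes the time at risk and censoring into account.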

External validation using publicly available datasets

The detailed search process for available external validation datasets is described in the Supplementary Appendix. APL is a rare disease, and only two datasets were available for external validation: one from the TCGA dataset (28) and the other from GSE6891 (29). In the TCGA dataset, gene expression profiles were available for 14 APL patients, together with information on WBC count, NRAS mutation status, and survival. This dataset was then used for validation (Table S25). Applying the APL9 score to the TCGA data generated GI (n = 9) and GII (n = 5) groups (Figure S16A). Patients in the GII group had significantly inferior OS to patients in the GI group (P = 0.025; Figure S16B). Furthermore, when the revised risk model was applied to the TCGA patients, the AUC for OS was 0.850 (Figure S16C). Patients were divided into the revised SR (n = 11) and revised HR (n = 3) groups. The revised HR group showed a remarkably inferior OS to the revised SR group (P < 0.0001; Figure S16D), supporting our revised risk model. Similarly, using the GSE6891 dataset, we demonstrated the predictive ability of the APL9 score (n = 8; Table S26 and Figure S17A). These 8 patients were stratified into GII (n = 3) and GI (n = 5) groups, and a higher APL9 score (GII) was associated with shorter OS (P = 0.022; Figure S17B).

Discussion

Since the early APL trials, investigators have noticed that APL should not be regarded as a single homogeneous entity but as a group of heterogeneous diseases with distinct clinical behaviors and risk burdens. Various models have been used to identify risk factors in APL, the best known being the Sanz risk stratification (4). More recent efforts have examined the value of gene mutations and expression in addition to PML–RARA, including FLT3, epigenetic modifier genes (30), and the ISAPL model (31). These systems aim to discriminate effectively, at disease onset, between dangerous cases and cases with a good outcome. The advantage of the Sanz risk stratification is its convenience, as it requires only the WBC and PLT counts; however, its molecular mechanism and background are far from being understood. Establishing molecular models that illustrate the genomic and transcriptomic differences between groups of APL patients with different prognoses and, more ambitiously, optimize the current system is therefore of research interest.

Therefore, this genomic and transcriptomic analysis was performed to identify new biomarkers for a better classification of risk groups. Our study is the largest integrated analysis in APL to date. In the panoramic view of somatic mutations, some mutations were associated with the Sanz risk groups. Of note, NRAS mutations are highly prevalent in hematologic malignancies, where they promote self-renewal and invasiveness of leukemia cells (32) and induce a spectrum of fatal hematologic disorders (33). Previous studies have demonstrated that NRAS mutations bear potential prognostic value in AML (34) and acute lymphoblastic leukemia (35). In this work, despite their relatively low number (n = 32; 9%), NRAS mutations showed a strong correlation with early death and relapse events, resulting in an extremely poor prognosis. Notably, NRAS mutations were nearly equally distributed among the Sanz groups and should be considered an independent molecular marker unrelated to clinical phenotypes.

Evidence suggests that the phenotypic features of leukemia are strongly associated with the gene expression signature of malignant cells (36,37). By analyzing the gene expression of high- and low-risk APL, we generated an APL9 score to predict the outcome of the disease. The functions of the 9 genes involved in this score are shown in Table S21. Interestingly, highly expressed erythroid- and megakaryocyte-specific genes tended to be enriched in low-risk APL patients, including relatively high expression of GATA1, a well-known master erythroid transcription factor that regulates almost all erythroid-specific genes through a dual zinc finger domain (38). Previous studies have shown that some erythroid elements are derived from PML–RARA-expressing progenitor cells (39). As suggested by our data, high expression of GATA1 might promote the proliferation of megakaryocytes while decreasing the differentiation of common myeloid progenitors toward the granulocyte/monocyte lineage (40).

We used the LASSO algorithm to translate transcriptome features into a useful tool for prognosis. This algorithm was chosen for its ability to prevent over-fitting and to create a simple but practical model. It has been successfully used in the development of prognostic biomarkers for AML (41) and T-lymphoblastic lymphoma (42).
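The way LASSO yields a compact model can be illustrated with a small coordinate-descent sketch. For simplicity it solves a penalized least-squares problem rather than the penalized Cox model a survival analysis would need, so it illustrates the shrinkage principle only and is not the procedure used to derive the APL9 score. The two-feature data set, the penalty value, and all names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                # Prediction from every feature except j.
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Hypothetical toy data: the response depends strongly on feature 0
# and only weakly on feature 1.
toy_X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
toy_y = [2.1, -1.9, 1.9, -2.1]
toy_beta = lasso_coordinate_descent(toy_X, toy_y, lam=0.5)
```

In the toy example the weak feature receives a coefficient of exactly zero and is dropped, while the strong feature is kept with a shrunken coefficient (1.5 instead of 2.0). The same mechanism lets a penalized model reduce many candidate genes to a short signature.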

We demonstrated that the APL9 score reliably predicted outcomes in terms of OS and EFS, whereas for DFS, the NRAS mutation was the only independent predictor. Combining the APL9 score, NRAS, and WBC into a final model allowed us to predict all three clinical outcomes (OS, EFS, and DFS) and could enrich the current stratification. Firstly, it provided an understanding of the molecular mechanisms underlying the Sanz score. Secondly, NRAS, which was identified as an important biomarker in this study, may serve as a novel potential biological target for refractory and/or relapsed APL patients.

We simplified the system into a two-category classification to fit the survival distribution and clinical meaning: a larger revised SR category (n = 228), in which ATRA–ATO without chemotherapy consolidation seems to be the ideal and safe treatment to minimize the toxicity of chemotherapy, and a smaller revised HR category (n = 93), in which the combination of ATRA, traditional chemotherapy, prolonged ATO, or other novel therapeutic options should be used to reduce relapse rates and more efficient supportive care should be provided to prevent early death.

We recognize that it remains challenging to widely employ techniques such as RNA-seq in daily clinical practice. In contrast to gene mutations and fusions, which can be easily reproduced using different genetic assays, gene expression levels may vary across platforms. For instance, compared with RNA-seq (43,44), microarrays are unable to precisely detect genes with very low expression levels. However, we believe that, with further cost reduction and the emergence of next-generation technologies, gene expression patterns may be found to be robustly associated with the treatment outcome of the disease. A composite multi-gene signature would strongly supplement the current guideline, which mainly considers cytogenetic and gene mutational factors. We would also like to emphasize the importance of the Sanz score regarding its feasibility and clinical value. In addition, we plan to simplify the technique in the future by using real-time RT-PCR for the gene expression score, which would be more convenient, less time-consuming, and less costly.

Patients (n = 304) with available BM samples in the APL2012 clinical trial (NCT01987297) and a small group of 44 historical cases treated with a similar protocol were all enrolled in this study to make it representative by including as many samples as possible. Even though the distribution of patients in this study showed no difference from that of the patients in the trial, caution is still warranted regarding potential selection bias. However, the number of APL patients in publicly available datasets is very small, and we report the largest cohort. We also expect that our findings can be further confirmed or extended by the scientific community when larger external validation sets become available. The present work is an attempt to integrate molecular markers (WES/WGS for gene mutation detection and RNA-seq for transcriptomic analysis) and clinical parameters in a large series of APL patients for better prognosis prediction. The intriguing correlation between the molecular and clinical heterogeneities of APL may also provide genomic insights into this disease model for improved therapy.


Authors' contributions

SJC and ZC initiated, designed, and coordinated the study. XJL and NQ performed bioinformatics and statistical analysis and experiments and interpreted the results. BC and YS analyzed the data. XJL, NQ, YS, BC, HF, SJC, and ZC wrote the manuscript. SYW and LJ performed the next-generation sequencing experiment. XJL and QX participated in the Sanger sequencing validation. BWC provided help on the RNA-seq analysis. BSW performed the statistical analysis. YC, SJZ, JML, LC, and HMZ provided clinical information.

Acknowledgments

This work was supported by the Center for High-performance Computing at Shanghai Jiao Tong University. We thank all members of the Shanghai Institute of Hematology and National Research Center for Translational Medicine at Shanghai. This study was funded by the National High-tech Research and Development [863] Program of China (2012AA02A505), the Overseas Expertise Introduction Project for Discipline Innovation (111 Project, B17029), the Double First-Class Project (WF510162602) of Shanghai Jiao Tong University and Shanghai Collaborative Innovation Program on Regenerative Medicine and Stem Cell Research (2019CXJQ01), the National Natural Science Foundation of China (81670137, 81770141, 81890994), the National Key R&D Program of China (2016YFE0202800), Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (20152501, 20161406), Innovative research team of high-level local universities in Shanghai and Shanghai Guangci Translational Medical Research Development Foundation.


References

1. Shen ZX, Shi ZZ, Fang J, Gu BW, Li JM, Zhu YM, et al. All-trans retinoic acid/As2O3 combination yields a high quality remission and survival in newly diagnosed acute promyelocytic leukemia. Proc Natl Acad Sci U S A 2004;101(15):5328-35 doi 10.1073/pnas.0400053101.
2. Hu J, Liu YF, Wu CF, Xu F, Shen ZX, Zhu YM, et al. Long-term efficacy and safety of all-trans retinoic acid/arsenic trioxide-based therapy in newly diagnosed acute promyelocytic leukemia. Proc Natl Acad Sci U S A 2009;106(9):3342-7 doi 10.1073/pnas.0813280106.
3. Lo-Coco F, Avvisati G, Vignetti M, Thiede C, Orlando SM, Iacobelli S, et al. Retinoic acid and arsenic trioxide for acute promyelocytic leukemia. N Engl J Med 2013;369(2):111-21 doi 10.1056/NEJMoa1300874.
4. Sanz MA, Coco FL, Martín G, Avvisati G, Rayón C, Barbui T, et al. Definition of relapse risk and role of nonanthracycline drugs for consolidation in patients with acute promyelocytic leukemia: a joint study of the PETHEMA and GIMEMA cooperative groups. Blood 2000;96(4):1247.
5. Sanz MA, Fenaux P, Tallman MS, Estey EH, Lowenberg B, Naoe T, et al. Management of acute promyelocytic leukemia: updated recommendations from an expert panel of the European LeukemiaNet. Blood 2019;133(15):1630-43 doi 10.1182/blood-2019-01-894980.
6. Burnett AK, Russell NH, Hills RK, Bowen D, Kell J, Knapper S, et al. Arsenic trioxide and all-trans retinoic acid treatment for acute promyelocytic leukaemia in all risk groups (AML17): results of a randomised, controlled, phase 3 trial. The Lancet Oncology 2015;16(13):1295-305 doi 10.1016/s1470-2045(15)00193-x.
7. Chen L, Zhu HM, Li Y, Liu QF, Hu Y, Zhou JF, et al. Arsenic trioxide replacing or reducing chemotherapy in consolidation therapy for acute promyelocytic leukemia (APL2012 trial). Proc Natl Acad Sci U S A 2021;118(6) doi 10.1073/pnas.2020382118.
8. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA 2013;310(20):2191-4 doi 10.1001/jama.2013.281053.
9. Kuhn M. Building Predictive Models in R Using the caret Package. Journal of Statistical Software 2008;28(5) doi 10.18637/jss.v028.i05.
10. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010;33(1):1-22.
11. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77 doi 10.1186/1471-2105-12-77.
12. Cheson BD, Bennett JM, Kopecky KJ, Buchner T, Willman CL, Estey EH, et al. Revised recommendations of the International Working Group for Diagnosis, Standardization of Response Criteria, Treatment Outcomes, and Reporting Standards for Therapeutic Trials in Acute Myeloid Leukemia. J Clin Oncol 2003;21(24):4642-9 doi 10.1200/JCO.2003.04.036.
13. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162(1):W1-73 doi 10.7326/M14-0698.
14. Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res 2004;10(21):7252-9 doi 10.1158/1078-0432.CCR-04-0713.
15. Harrell F. rms: Regression Modeling Strategies. 2019.
16. Canty A, Ripley B. boot: Bootstrap R (S-Plus) Functions. 2019.
17. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer; 2000.
18. Kassambara A, Kosinski M, Biecek P. survminer: Drawing Survival Curves using 'ggplot2'. 2019.
19. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and Calibration of Clinical Prediction Models: Users' Guides to the Medical Literature. JAMA 2017;318(14):1377-84 doi 10.1001/jama.2017.12126.
20. Inoue E. nricens: NRI for Risk Prediction Models with Time to Event and Binary Response Data. 2018.
21. Deininger MWN, Tyner JW, Solary E. Turning the tide in myelodysplastic/myeloproliferative neoplasms. Nat Rev Cancer 2017;17(7):425-40 doi 10.1038/nrc.2017.40.
22. Pencina MJ, D'Agostino RB, Sr., Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med 2011;30(1):11-21 doi 10.1002/sim.4085.
23. Pencina MJ, D'Agostino RB, Vasan RS. Statistical methods for assessment of added usefulness of new biomarkers. Clin Chem Lab Med 2010;48(12):1703-11 doi 10.1515/CCLM.2010.340.
24. Pencina MJ, D'Agostino RB, Pencina KM, Janssens AC, Greenland P. Interpreting incremental value of markers added to risk prediction models. Am J Epidemiol 2012;176(6):473-81 doi 10.1093/aje/kws207.
25. Pencina MJ, D'Agostino RB, Sr., D'Agostino RB, Jr., Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27(2):157-72; discussion 207-12 doi 10.1002/sim.2929.
26. Grimwade D, Jovanovic JV, Hills RK, Nugent EA, Patel Y, Flora R, et al. Prospective minimal residual disease monitoring to predict relapse of acute promyelocytic leukemia and to direct pre-emptive arsenic trioxide therapy. J Clin Oncol 2009;27(22):3650-8 doi 10.1200/JCO.2008.20.1533.
27. Dohner H, Estey E, Grimwade D, Amadori S, Appelbaum FR, Buchner T, et al. Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel. Blood 2017;129(4):424-47 doi 10.1182/blood-2016-08-733196.
28. Cancer Genome Atlas Research Network, Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368(22):2059-74 doi 10.1056/NEJMoa1301689.
29. Verhaak RG, Wouters BJ, Erpelinck CA, Abbas S, Beverloo HB, Lugthart S, et al. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling. Haematologica 2009;94(1):131-4 doi 10.3324/haematol.13299.
30. Shen Y, Fu YK, Zhu YM, Lou YJ, Gu ZH, Shi JY, et al. Mutations of Epigenetic Modifier Genes as a Poor Prognostic Factor in Acute Promyelocytic Leukemia Under Treatment With All-Trans Retinoic Acid and Arsenic Trioxide. EBioMedicine 2015;2(6):563-71 doi 10.1016/j.ebiom.2015.04.006.
31. Lucena-Araujo AR, Coelho-Silva JL, Pereira-Martins DA, Silveira DM, Koury LCA, Melo RAM, et al. Combining gene mutation with gene expression analysis improves outcomes prediction in acute promyelocytic leukemia. Blood 2019 doi 10.1182/blood.2019000239.
32. Sachs Z, LaRue RS, Nguyen HT, Sachs K, Noble KE, Mohd Hassan NA, et al. NRASG12V oncogene facilitates self-renewal in a murine model of acute myelogenous leukemia. Blood 2014;124(22):3274-83 doi 10.1182/blood-2013-08-521708.
33. Li Q, Haigis KM, McDaniel A, Harding-Theobald E, Kogan SC, Akagi K, et al. Hematopoiesis and leukemogenesis in mice expressing oncogenic NrasG12D from the endogenous locus. Blood 2011;117(6):2022-32 doi 10.1182/blood-2010-04-280750.
34. Schlenk RF, Dohner K, Krauter J, Frohling S, Corbacioglu A, Bullinger L, et al. Mutations and treatment outcome in cytogenetically normal acute myeloid leukemia. N Engl J Med 2008;358(18):1909-18 doi 10.1056/NEJMoa074306.
35. Perentesis JP, Bhatia S, Boyle E, Shao Y, Shu XO, Steinbuch M, et al. RAS oncogene mutations and outcome of therapy for childhood acute lymphoblastic leukemia. Leukemia 2004;18(4):685-92 doi 10.1038/sj.leu.2403272.
36. Li JF, Dai YT, Lilljebjorn H, Shen SH, Cui BW, Bai L, et al. Transcriptional landscape of B cell precursor acute lymphoblastic leukemia based on an international study of 1,223 cases. Proc Natl Acad Sci U S A 2018;115(50):E11711-E20 doi 10.1073/pnas.1814397115.
37. Liu YF, Wang BY, Zhang WN, Huang JY, Li BS, Zhang M, et al. Genomic Profiling of Adult and Pediatric B-cell Acute Lymphoblastic Leukemia. EBioMedicine 2016;8:173-83 doi 10.1016/j.ebiom.2016.04.038.
38. Ferreira R, Ohneda K, Yamamoto M, Philipsen S. GATA1 function, a paradigm for transcription factors in hematopoiesis. Mol Cell Biol 2005;25(4):1215-27 doi 10.1128/MCB.25.4.1215-1227.2005.
39. Grignani F, Valtieri M, Gabbianelli M, Gelmetti V, Botta R, Luchetti L, et al. PML/RARα fusion protein expression in normal human hematopoietic progenitors dictates myeloid commitment and the promyelocytic phenotype. Blood 2000;96(4):1531-7 doi 10.1182/blood.V96.4.1531.
40. Graf T, Enver T. Forcing cells to change lineages. Nature 2009;462(7273):587-94 doi 10.1038/nature08533.
41. Ng SW, Mitchell A, Kennedy JA, Chen WC, McLeod J, Ibrahimova N, et al. A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 2016;540(7633):433-7 doi 10.1038/nature20598.
42. Tian XP, Huang WJ, Huang HQ, Liu YH, Wang L, Zhang X, et al. Prognostic and predictive value of a microRNA signature in adults with T-cell lymphoblastic lymphoma. Leukemia 2019 doi 10.1038/s41375-019-0466-0.
43. Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, et al. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 2014;32(9):926-32 doi 10.1038/nbt.3001.
44. Peters TJ, French HJ, Bradford ST, Pidsley R, Stirzaker C, Varinli H, et al. Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements. Bioinformatics 2019;35(4):560-70 doi 10.1093/bioinformatics/bty675.


Table 1. Clinical and molecular characteristics of 348 APL patients in this study.

Characteristics | All patients (n = 348) | WES + WGS (n = 204) | RNA-seq (n = 324)† | Training cohort (n = 158) | Validation cohort (n = 104) | Training vs. validation comparison, P*
Sex | | | | | | 0.408
  Women | 159 (45.7%) | 92 (45.1%) | 151 (46.6%) | 68 (43.0%) | 51 (49.0%) |
  Men | 189 (54.3%) | 112 (54.9%) | 173 (53.4%) | 90 (57.0%) | 53 (51.0%) |
Age (at diagnosis, year) | | | | | | 0.066
  <40 | 193 (55.5%) | 113 (55.4%) | 180 (55.6%) | 88 (55.7%) | 57 (54.8%) |
  40–60 | 138 (39.7%) | 80 (39.2%) | 127 (39.2%) | 62 (39.2%) | 44 (42.3%) |
  ≥60 | 17 (4.9%) | 11 (5.4%) | 17 (5.2%) | 8 (5.1%) | 3 (2.9%) |
White blood cell count (× 10^9/L) | 3.3 (1.3–16.6) | 3.2 (1.4–18.7) | 3.4 (1.4–16.9) | 3.4 (1.3–12.9) | 2.3 (1.4–13.5) | 0.664
Platelet count (× 10^9/L) | 28.0 (16.0–41.2) | 28.0 (17.0–43.5) | 28.0 (16.0–43.0) | 28.0 (17.0–38.0) | 26.0 (13.0–37.8) | 0.507
Hemoglobin level (g/L) | 87.0 (68.0–106.2) | 89.0 (70.5–110.0) | 87.0 (68.0–107.2) | 84.0 (66.0–107.0) | 84.0 (69.0–105.0) | 0.975
Bone marrow blast count (%) | 87.0 (80.5–91.5) | 87.3 (81.1–91.3) | 87.5 (81.2–91.5) | 88.5 (82.5–91.8) | 85.5 (79.2–90.0) | 0.012
Activated partial thromboplastin time (s) | 30.2 (27.2–36.4) | 28.9 (26.5–34.5) | 30.3 (27.3–36.4) | 30.3 (27.0–36.4) | 29.7 (27.3–38.0) | 0.903
Prothrombin time (s) | 14.5 (12.6–16.5) | 13.9 (12.4–16.0) | 14.5 (12.6–16.5) | 14.6 (12.6–16.3) | 14.2 (12.4–16.7) | 0.687
Fibrinogen (g/L) | 1.4 (1.0–2.0) | 1.3 (1.0–2.0) | 1.4 (1.0–2.0) | 1.4 (1.0–2.0) | 1.3 (0.9–2.0) | 0.787
Sanz risk group | | | | | | 0.697
  Low-risk group | 65 (18.7%) | 42 (20.6%) | 63 (19.4%) | 19 (12.0%) | 16 (15.4%) |
  Intermediate-risk group | 173 (49.7%) | 92 (45.1%) | 157 (48.5%) | 95 (60.1%) | 62 (59.6%) |
  High-risk group | 110 (31.6%) | 70 (34.3%) | 104 (32.1%) | 44 (27.8%) | 26 (25.0%) |
Consolidation strategy | | | | | | 0.288
  without ATO | 152 (43.7%) | 95 (46.6%) | 142 (43.8%) | 71 (44.9%) | 41 (39.4%) |
  with ATO | 172 (49.4%) | 103 (50.5%) | 160 (49.4%) | 72 (45.6%) | 57 (54.8%) |
PML–RARA transcripts‡ | | | | | | 0.451
  L-type | 209 (60.1%) | 107 (52.5%) | 209 (64.5%) | 100 (63.3%) | 70 (67.3%) |
  S-type | 89 (25.6%) | 60 (29.4%) | 89 (27.5%) | 41 (25.9%) | 28 (26.9%) |
  V-type | 24 (6.9%) | 11 (5.4%) | 24 (7.4%) | 16 (10.1%) | 6 (5.8%) |
  Missing data | 26 (7.5%) | 26 (12.7%) | 2 (0.6%) | 1 (0.6%) | 0 |
Early death | 22 (6.3%) | 6 (2.9%) | 22 (6.8%) | 15 (9.5%) | 6 (5.8%) | 0.355
Relapse | 13 (3.7%) | 13 (6.4%) | 11 (3.4%) | 5 (3.2%) | 4 (3.8%) | 0.744

Data are median (IQR) or n (%).
*Chi-square or Fisher's exact tests were performed to compare categorical variables, and Wilcoxon tests were performed to compare continuous variables between cohorts.
†One patient was identified with PML–RARG fusion, and one patient was detected with PML–RARA and BCR–ABL1 fusions and excluded from the subsequent analysis.
‡The PML–RARA transcript isoforms were identified in accordance with the breakpoint of the PML gene. The isoform information of 24 patients was missing due to lack of RNA-seq data.


Figure Legends

Figure 1. The multistep workflow chart of the integrated prognosis model establishment.

*Of the 323 patients with RNA-seq data, two without survival data were excluded from the Cox

regression. LASSO, least absolute shrinkage and selection operator; WBC, white blood cell

count; NRI, net reclassification improvement; C-index, concordance index.

Figure 2. Non-silent recurrent gene mutations in primary APL. (A) Genes mutated recurrently in three or more patients or altered twice at the same site are shown and color coded by variant classification. Genes in rows are displayed in descending order of mutational frequency, except for FLT3-ITD. Other recurrent genes with two mutational events at different sites are compressed into one row for improved graphic presentation. Sanger sequencing was applied for validation, and the detailed information is listed in Table S9. Samples are arranged in columns to accentuate gene co-occurrence and mutual exclusivity. Bars on the right panel indicate the number of mutations in each gene. (B) Kaplan–Meier estimates of overall survival, event-free survival, and disease-free survival according to NRAS mutation status. P-values are calculated using the log-rank test. ITD, internal tandem duplication; TKD, point mutation in the tyrosine kinase domain.
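As a rough illustration of how a Kaplan–Meier estimate such as those in panel B is obtained, the sketch below implements the standard product-limit estimator in plain Python. The function name `kaplan_meier` and its inputs are hypothetical and are not taken from the authors' analysis; the log-rank test itself is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate for right-censored data.

    times  -- follow-up time for each patient
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Each step of the curve multiplies the running survival probability by the fraction of at-risk patients who did not have the event at that time, which is why censored patients lower the number at risk without lowering the curve.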

Figure 3. Semi-supervised hierarchical clustering identified two distinct subcohorts (SC1 and SC2) of APL patients. (A) A total of 389 differentially expressed genes (DEGs) are clustered using the ward.D method in 262 patients with APL. Distances are calculated on the basis of Pearson’s correlation coefficient. Samples are clustered into two subcohorts (SC1 and SC2). Columns indicate patients with APL and rows indicate genes. The first bottom panel shows the clinical features and outcomes of patients with APL. The second bottom panel depicts patients’ white blood cell (WBC) count, platelet (PLT) level, and hemoglobin (Hb) level at diagnosis. The third bottom panel displays the Sanz risk groups and the GATA1 expression level, and the last bottom panel presents the distribution of the three most frequently altered genes. (B) Gene Ontology (GO) enrichment analyses on 389 DEGs of gene cluster A. (C) GO enrichment analyses on 389 DEGs of gene cluster B. (D) GATA1 expression according to Sanz risk group. (E) GATA1 expression according to quantile of WBC count. (F) GATA1 expression according to PLT count.
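The legend states that clustering distances are based on Pearson's correlation coefficient but does not give the exact transform. A minimal sketch, assuming the common choice d = 1 − r (an assumption, not confirmed by the source), is shown below; `pearson` and `correlation_distance` are hypothetical helper names, and the ward.D linkage step is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Assumed distance transform: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated profiles."""
    return 1.0 - pearson(x, y)
```

Under this transform, two expression profiles that rise and fall together are close regardless of their absolute levels, which is the usual motivation for correlation-based distances in expression heatmaps.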

Figure 4. Establishment of the APL9 score and the systemic stratification model. (A) APL9 score model in the entire cohort (n = 323). The left panel shows the regression coefficients of the 9 signature genes in the least absolute shrinkage and selection operator (LASSO) model. The upper right panel shows the expression patterns of these 9 genes across samples. The bottom panel shows a bar plot of the APL9 score in ascending order. The expression values of the 9 genes, weighted by their regression coefficients, are summed as follows: APL9 score = (GATA1 × −1.066) + (CLCN4 × −1.234) + (CACNA2D2 × −0.544) + (OPTN × −0.308) + (KRT1 × −0.245) + (LTF × −0.205) + (S100A12 × −0.192) + (LEF1 × −0.169) + (LGALS1 × 0.111) + 27.163. The GI and GII subgroups are divided at a cutoff of 0.58 determined by the receiver operating characteristic (ROC) curve (dotted line). (B) Sankey plot for reclassification from the Sanz risk groups to the APL9 score groups in the entire cohort. (C) Kaplan–Meier estimates of overall survival, event-free survival, and disease-free survival according to the APL9 score groups in the entire cohort. P-values are calculated using the log-rank test.
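The APL9 formula in the legend can be written as a short function. This is a sketch only: the coefficients, intercept, and 0.58 cutoff are taken from the legend, but the expression scale expected as input is not stated here, and the rule that scores above the cutoff fall in GII (rather than GI) is an assumption based on GII being the higher-risk group. The names `apl9_score` and `apl9_group` are hypothetical.

```python
# Coefficients and intercept as given in the Figure 4 legend.
APL9_COEFFICIENTS = {
    "GATA1": -1.066,
    "CLCN4": -1.234,
    "CACNA2D2": -0.544,
    "OPTN": -0.308,
    "KRT1": -0.245,
    "LTF": -0.205,
    "S100A12": -0.192,
    "LEF1": -0.169,
    "LGALS1": 0.111,
}
APL9_INTERCEPT = 27.163
APL9_CUTOFF = 0.58


def apl9_score(expression):
    """Weighted sum of the 9 gene expression values plus the intercept.

    expression -- dict mapping gene symbol to an expression value on the
    same scale used to fit the model (scale not specified in the legend).
    """
    weighted = sum(
        coef * expression[gene] for gene, coef in APL9_COEFFICIENTS.items()
    )
    return weighted + APL9_INTERCEPT


def apl9_group(score, cutoff=APL9_CUTOFF):
    """Assumption: scores above the cutoff are assigned to GII."""
    return "GII" if score > cutoff else "GI"
```

Because eight of the nine coefficients are negative, higher expression of genes such as GATA1 and CLCN4 lowers the score in this formulation.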

Figure 5. Forest plot exhibiting the univariate and multivariate Cox regression of OS, EFS, and DFS. (A) Forest plots of multivariable Cox proportional hazards models showing WBC, APL9 score, and NRAS as independent prognostic factors of OS, EFS, and DFS. Hazard ratios and 95% confidence intervals (CIs) are listed next to each variable. Within the forest plot, the hazard ratio for each variable is depicted as a box and the 95% CI is shown as a horizontal line. A 95% CI that crosses the vertical line at 1 indicates a non-statistically significant effect; hazard ratios greater than 1 indicate worse outcomes. (B) The chart shows the cumulative score of the revised risk model. In each cell of the chart, the risk for a patient is estimated from the values of each predictor according to the multivariate Cox model. The cells of the chart are colored according to risk status: 0–2 and 3–5. (C) Receiver operating characteristic (ROC) curves of the revised risk score for predicting OS, EFS, and DFS status in the entire cohort. (D) Sankey plot for reclassification from the Sanz risk groups to the revised risk model. (E) Kaplan–Meier estimates of OS, EFS, and DFS according to the revised risk groups in the entire cohort. P-values are calculated using the log-rank test. AUC, area under the curve; WBC, white blood cell count; HR, high-risk; SR, standard-risk; OS, overall survival; EFS, event-free survival; DFS, disease-free survival.
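The cumulative score in panel B can be sketched as a simple point system. The point values below (1 for WBC > 10 × 10^9/L, 2 for APL9 group GII, 2 for NRAS mutation) are read from the cell values of the Figure 5B chart, and the mapping of 0–2 to revised standard-risk and 3–5 to revised high-risk is an assumption consistent with the legend's two score ranges. The function names are hypothetical.

```python
def revised_risk_score(wbc_count, apl9_group, nras_mutated):
    """Cumulative score of the revised risk model (sketch).

    wbc_count    -- white blood cell count in units of 10^9/L
    apl9_group   -- "GI" or "GII"
    nras_mutated -- True if an NRAS mutation is present
    """
    score = 0
    if wbc_count > 10:
        score += 1  # point value read from the Figure 5B chart
    if apl9_group == "GII":
        score += 2
    if nras_mutated:
        score += 2
    return score


def revised_risk_group(score):
    """Assumption: 0-2 is revised standard-risk, 3-5 is revised high-risk."""
    return "revised SR" if score <= 2 else "revised HR"
```

For example, a patient with a WBC count of 20 × 10^9/L, APL9 group GII, and wild-type NRAS would score 3 under these assumptions and fall in the higher-risk range.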


Figure 1

[Flowchart. Steps shown: (1) 34 high-risk vs. 27 low-risk patients, generating 389 DEGs; (2) total SC1 and SC2 (n = 262*), split three-fifths into a training set (n = 158) and two-fifths into a validation set (n = 104); (3) logistic LASSO regression producing the APL9 score and the GI and GII groups; (4) calibration plots (1-, 2-, and 5-year), discriminative ability (Harrell’s C-index), and internal validation (1000 bootstrap samples); (5) univariate and multivariate Cox analysis (n = 321*) of the transcriptomic analysis (APL9 score), genomic analysis (NRAS mutation), and clinical characteristics (WBC), yielding the revised risk score; (6) X-tile; combined two-category system compared with the Sanz three-category system.]


Figure 2

[Panel A: mutation landscape plot. Altered in 282 (81.03%) of 348 samples. Mutation frequency by gene: FLT3-ITD 13%, FLT3-TKD 20%, WT1 19%, NRAS 9%, ARID1A 7%, RREB1 6%, KRAS 4%, USP9X 4%, EBF3 4%, ARID1B 3%, CALR 2%, SF3B1 2%, SMARCB1 2%, UPF3B 2%, EP300 2%, CCDC168 1%, PEG3 1%, CSMD3 1%, ETV6 1%, HK3 1%, HUWE1 1%, JAG2 1%, NSD1 1%, ZFP36L2 1%, ABL1 1%, TRRAP 1%, other recurrents 42%. Annotation tracks: Sanz group (high-, intermediate-, low-risk), sex, relapse, early death, germline control, RNA-seq. Mutation types: FLT3-ITD, frameshift insertion, in-frame deletion, nonsense, frameshift deletion, nonstop, missense, multi-hit.]

[Panel B: Kaplan–Meier curves by NRAS mutation status; x-axis, time (years).
Overall survival: NRAS WT (314; events = 18), NRAS Mut (32; events = 7); P = 0.00074.
Event-free survival: NRAS WT (314; events = 28), NRAS Mut (32; events = 8); P = 0.0034.
Disease-free survival: NRAS WT (291; events = 10), NRAS Mut (28; events = 4); P = 0.0052.
Number at risk at 0, 1, 2, 3, 4, 5, and 6 years:
Overall survival: WT 314, 289, 260, 206, 152, 83, 20; Mut 32, 27, 25, 16, 11, 5, 1.
Event-free survival: WT 314, 287, 255, 202, 149, 82, 20; Mut 32, 24, 21, 14, 10, 4, 0.
Disease-free survival: WT 291, 280, 244, 192, 138, 70, 11; Mut 28, 24, 21, 14, 10, 3, 0.]


Figure 3

[Panel A: heatmap of expression Z-scores (scale −5.7 to 5.7) with gene clusters A and B in rows and subcohorts SC1 and SC2 in columns. Annotation tracks: sex (women, men); age (18 to 65); early death (yes, no); relapse (yes, no); PML–RARA isoform (L, S, V); WBC group (×10^9/L: [0,10), [10,50), [50,+)); PLT group (×10^9/L: [0,20), [20,40), [40,+)); Hb group (g/L: [30,60), [60,90), [90,110), [110,+)); Sanz group (high-, intermediate-, low-risk); GATA1 expression (low, high); FLT3 mutation type (FLT3-ITD, TKD-835, TKD-others); gene mutation (WT1, NRAS).]

[Panel B: GO biological process terms enriched in gene cluster A, with count and adjusted P-value: homeostasis of number of cells; erythrocyte homeostasis; cofactor catabolic process; erythrocyte differentiation; gas transport; hydrogen peroxide catabolic process; hydrogen peroxide metabolic process; antibiotic catabolic process; oxygen transport; erythrocyte development.]

[Panel C: GO biological process terms enriched in gene cluster B, with count and adjusted P-value: T cell activation; immune response-activating cell surface receptor signaling; immune response-activating signal transduction; lymphocyte differentiation; antigen receptor-mediated signaling pathway; T cell differentiation; regulation of leukocyte cell-cell adhesion; positive regulation of cell-cell adhesion; positive regulation of leukocyte cell-cell adhesion; T cell selection.]

[Panels D, E, and F: GATA1 expression by Sanz risk group (low-, intermediate-, high-risk), by quantile of white blood cell count, and by quantile of platelet count. Kruskal–Wallis, P < 0.0001 in each panel.]


Figure 4

[Panel A: APL9 coefficients: LGALS1 (0.111), LEF1 (−0.169), S100A12 (−0.192), LTF (−0.205), KRT1 (−0.245), OPTN (−0.308), CACNA2D2 (−0.544), GATA1 (−1.066), CLCN4 (−1.234); heatmap of expression Z-scores; bar plot of the APL9 score with cut point 0.58 separating GI (n = 160) and GII (n = 163).]

[Panel B: Sankey plot from Sanz risk to APL9 score group. Low (n = 62): 49 to GI, 13 to GII. Intermediate (n = 157): 92 to GI, 65 to GII. High (n = 104): 19 to GI, 85 to GII.]

[Panel C: Kaplan–Meier curves by APL9 score group; x-axis, time (years).
Overall survival: GI (159; events = 5), GII (162; events = 20); P = 0.0023.
Event-free survival: GI (159; events = 7), GII (161; events = 27); P = 0.00035.
Disease-free survival: GI (152; events = 2), GII (142; events = 10); P = 0.012.
Number at risk at 0, 1, 2, 3, 4, 5, and 6 years:
Overall survival: GI 159, 153, 134, 105, 76, 39, 10; GII 162, 139, 129, 97, 69, 36, 6.
Event-free survival: GI 159, 153, 133, 105, 76, 39, 10; GII 162, 135, 122, 92, 66, 35, 5.
Disease-free survival: GI 152, 151, 130, 101, 73, 33, 6; GII 142, 130, 114, 86, 58, 28, 3.]


Figure 5

[Panel A: forest plot table. Hazard ratio axis ticks: 1, 5, 10.]

Characteristics | OS hazard ratio (95% CI)† | OS P-value‡ | EFS hazard ratio (95% CI)† | EFS P-value‡ | DFS hazard ratio (95% CI)† | DFS P-value‡
Univariate Cox analysis | | | | | |
Men | 1.063 (0.483–2.342) | 0.879 | 1.327 (0.679–2.593) | 0.408 | 2.089 (0.655–6.662) | 0.213
Age | 1.006 (0.974–1.039) | 0.726 | 0.998 (0.971–1.026) | 0.907 | 0.973 (0.929–1.019) | 0.244
WBC count (> 10 × 10^9/L) | 2.419 (1.104–5.302) | 0.027 | 2.548 (1.324–4.902) | 0.0051 | 4.396 (1.473–13.117) | 0.008
Platelet count (≥ 40 × 10^9/L) | 0.692 (0.260–1.844) | 0.462 | 0.664 (0.291–1.516) | 0.331 | 1.096 (0.344–3.496) | 0.877
Sanz risk group | | 0.441 | | 0.020 | | 0.040
Low-risk group | | | 1 [Reference] | | 1 [Reference] |
Intermediate-risk group | 1 [Reference] | | 6.248 (0.825–47.304) | | 1.286 (0.134–12.363) |
High-risk group | 1.669 (0.761–3.657) | | 11.703 (1.562–87.673) | | 5.690 (0.712–45.497) |
Consolidation (with ATO) | 0.448 (0.041–4.946) | 0.513 | 0.672 (0.233–1.936) | 0.462 | 0.685 (0.238–1.973) | 0.483
FLT3-ITD mutation | 0.266 (0.036–1.970) | 0.195 | 0.788 (0.279–2.229) | 0.654 | 1.731 (0.483–6.205) | 0.400
FLT3-TKD mutation | 0.349 (0.082–1.482) | 0.154 | 0.645 (0.251–1.658) | 0.362 | 1.099 (0.307–3.941) | 0.884
WT1 mutation | 1.337 (0.534–3.347) | 0.536 | 0.855 (0.356–2.053) | 0.725 | 0.333 (0.044–2.547) | 0.290
NRAS mutation | 4.010 (1.675–9.602) | 0.0018 | 3.049 (1.389–6.692) | 0.0055 | 4.519 (1.416–14.420) | 0.011
ARID1A mutation | 0.539 (0.073–3.985) | 0.545 | 0.766 (0.184–3.188) | 0.714 | 0.979 (0.128–7.487) | 0.984
RREB1 mutation | 2.528 (0.756–8.449) | 0.132 | 1.686 (0.517–5.498) | 0.386 | |
KRAS mutation | 1.025 (0.139–7.579) | 0.981 | 0.713 (0.098–5.205) | 0.739 | |
EBF3 mutation | 1.026 (0.139–7.586) | 0.980 | 0.702 (0.096–5.128) | 0.728 | |
PML–RARA transcripts§ | | 0.906 | | 0.927 | | 0.973
L-type | 1 [Reference] | | 1 [Reference] | | 1 [Reference] |
S-type | 0.826 (0.326–2.096) | | 0.951 (0.438–2.065) | | 0.864 (0.229–3.258) |
V-type | 1.083 (0.250–4.688) | | 1.240 (0.371–4.143) | | 1.046 (0.131–8.372) |
APL9 score (GII)§ | 4.087 (1.534–10.890) | 0.005 | 4.056 (1.766–9.315) | 0.001 | 5.629 (1.233–25.693) | 0.026
Multivariate Cox analysis | | | | | |
WBC count (> 10 × 10^9/L)§ | 1.379 (0.589–3.228) | 0.459 | 1.501 (0.723–3.116) | 0.276 | 2.820 (0.746–10.656) | 0.126
NRAS mutation§ | 3.444 (1.432–8.287) | 0.006 | 2.875 (1.297–6.375) | 0.009 | 5.017 (1.499–16.792) | 0.009
APL9 score (GII)§ | 3.298 (1.141–9.529) | 0.028 | 3.230 (1.311–7.959) | 0.011 | 3.170 (0.588–17.086) | 0.179

[Panel B: cumulative score chart. WBC ≤ 10 × 10^9/L: NRAS WT, GI = 0 and GII = 2; NRAS Mut, GI = 2 and GII = 4. WBC > 10 × 10^9/L: NRAS WT, GI = 1 and GII = 3; NRAS Mut, GI = 3 and GII = 5.]

[Panel C: ROC curves for OS, EFS, and DFS status; axes, sensitivity (%) vs. 100 − specificity (%). AUC values shown: 0.721, 0.712, and 0.743.]

[Panel D: Sankey plot from Sanz risk to the revised risk model. Low (n = 62): 60 to revised SR, 2 to revised HR. Intermediate (n = 155): 150 to revised SR, 5 to revised HR. High (n = 104): 18 to revised SR, 86 to revised HR. Revised SR, n = 228; revised HR, n = 93.]

[Panel E: Kaplan–Meier curves by revised risk group; x-axis, time (years).
Overall survival: revised SR (228; events = 10), revised HR (93; events = 15); P = 0.00031.
Event-free survival: revised SR (228; events = 14), revised HR (93; events = 20); P < 0.0001.
Disease-free survival: revised SR (216; events = 4), revised HR (78; events = 8); P = 0.001.
Number at risk at 0, 1, 2, 3, 4, 5, and 6 years:
Overall survival: revised SR 228, 215, 192, 150, 109, 57, 13; revised HR 93, 77, 71, 52, 36, 18, 3.
Event-free survival: revised SR 228, 214, 189, 148, 108, 57, 13; revised HR 93, 74, 66, 49, 34, 17, 2.
Disease-free survival: revised SR 216, 211, 184, 142, 102, 49, 7; revised HR 78, 70, 60, 45, 29, 12, 2.]


Integration of genomic and transcriptomic markers improves the prognosis prediction of acute promyelocytic leukemia

Xiao Jing Lin, Niu Qiao, Yang Shen, et al.

Clin Cancer Res Published OnlineFirst April 23, 2021.

Updated version: Access the most recent version of this article at doi:10.1158/1078-0432.CCR-20-4375

Supplementary Material: Access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2021/04/22/1078-0432.CCR-20-4375.DC1
