Supplemental Information Somatic mutations drive specific, but reversible epigenetic heterogeneity states in AML
Sheng Li*,#,1,2,3,4, Xiaowen Chen#,1, Jiahui Wang1, Cem Meydan5, Jacob L. Glass6, Alan H. Shih7, Ruud Delwel8, Ross L. Levine9, Christopher E. Mason5, and Ari M. Melnick*,10
Author Affiliation:
1The Jackson Laboratory for Genomic Medicine, Farmington, CT; 2The Jackson Laboratory Cancer Center, Bar Harbor, ME; 3The Department of Genetics and Genomic Sciences, The University of Connecticut Health Center, Farmington, CT; 4Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA; 5Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY; 6Leukemia Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY; 7Center for Hematologic Malignancies, Memorial Sloan Kettering Cancer Center, New York, NY; 8Department of Hematology, Erasmus University Medical Center and Oncode Institute, Rotterdam, The Netherlands; 9Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY; 10Division of Hematology/Oncology, Weill Cornell Medicine, New York, NY
*Corresponding Authors:
Sheng Li, PhD The Jackson Laboratory for Genomic Medicine Farmington, CT 06032, USA Tel: 860-837-2119 Email: [email protected]
Ari M. Melnick, MD Department of Medicine Division of Hematology and Medical Oncology Weill Cornell Medicine New York, NY 10021, USA Tel: 212-746-7643 Email: [email protected]
# These authors contributed equally.
Supplementary Methods
Supplementary Figures 1-16
Supplementary Table 1
Supplementary Table 4
Supplementary Table 5
Supplementary Table 6
Supplementary Table 7
Supplementary Table 9
Supplementary Methods
Microarray data analysis. Gene expression in primary AML patient samples was profiled on Affymetrix Human Genome U133 Plus 2.0 GeneChips; these data have been published previously(1,2) (GEO accession number: GSE6891). Raw expression microarray data were processed using the HGU133PLUS2.0 BrainArray annotation version 17.0.0(3). Gene expression levels were transformed to log2 values. Differentially expressed genes were called using limma (FDR-adjusted P value ≤ 0.05 and a fold change of at least 1.5).
RNA sequencing and analysis. We collected gene expression profiles of LSK cells from wild-type (4 samples), Idh2R140Q (2 samples) and Idh2R140Q;Flt3ITD (5 samples) mice, from Idh2R140Q;Flt3ITD mice treated with AG-221 (4 samples) or vehicle (4 samples), and from Tet2-/-;Flt3ITD mice treated with AzaC (4 samples) or vehicle (4 samples) (Table S9)(4). Briefly, RNA libraries were prepared and sequenced on the Illumina HiSeq platform. RNA sequence data were aligned to the mouse genome (mm9) using STAR(5), and read counts were assigned to genes in the GENCODE reference annotation using the Subread package(6). Gene expression data were normalized using the transcripts per million (TPM) method. We then evaluated transcriptional heterogeneity using these gene expression data. Specifically, the coefficient of variation of each gene across samples was calculated from its TPM values. Genes were ranked in descending order by their average TPM across all samples, and the top 75% were included when determining the coefficient of variation.
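The coefficient-of-variation step can be illustrated with a minimal sketch. It assumes TPM values are held in a plain dictionary and that the sample standard deviation is used; the function name and the input layout are hypothetical and not taken from the original analysis code.

```python
from statistics import mean, stdev

def coefficient_of_variation(tpm_by_gene, keep_fraction=0.75):
    """Per-gene CV across samples for the most highly expressed genes.

    tpm_by_gene: dict mapping gene name -> list of TPM values (one per sample).
    keep_fraction: fraction of genes retained after ranking by mean TPM.
    """
    # Rank genes by average TPM across all samples, highest first.
    ranked = sorted(tpm_by_gene, key=lambda g: mean(tpm_by_gene[g]), reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    cv = {}
    for gene in ranked[:n_keep]:
        values = tpm_by_gene[gene]
        mu = mean(values)
        if mu > 0:  # CV is undefined when the mean is zero
            cv[gene] = stdev(values) / mu  # assumption: sample SD (n - 1)
    return cv

example = {"A": [10, 20, 30], "B": [5, 5, 5], "C": [1, 2, 3], "D": [0, 0, 0.3]}
cv = coefficient_of_variation(example)
```

In this toy input, gene D has the lowest mean TPM and is therefore excluded before the CV is computed.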
Whole Genome Bisulfite sequencing (WGBS) data collection and analysis
WGBS data from total bone marrow cells of AML patients with and without DNMT3AR882H/C mutations were obtained from Spencer et al.(7). Five normal karyotype AML samples with DNMT3AR882H/C mutations and five normal karyotype AML samples with no DNMT3A mutations were used to investigate the contribution of DNMT3A mutations to DNA methylation heterogeneity in AML. WGBS was performed using the EpiGnome/TruSeq DNA Methylation Kit (Illumina). Reads were aligned to the reference genome (hg19) using biscuit, after which methylation values were calculated with MethylDackel version 0.3.0.
For the above WGBS datasets, we estimated intra-sample DNA methylation heterogeneity using the M score approach(8). The M score assesses epigenetic heterogeneity as the percentage of CpG sites harboring intermediate methylation levels (from 0.45 to 0.55). An intermediate methylation level reflects the presence of balanced hypo- and hyper-methylated CpGs within the cells of a sample and thus high intra-sample variation.
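As an illustration of the M score definition, the following sketch counts CpG sites whose methylation level falls in the intermediate window. Treating both bounds as inclusive is an assumption, and the function name is hypothetical.

```python
def m_score(methylation_levels, low=0.45, high=0.55):
    """Percentage of CpG sites with intermediate methylation.

    methylation_levels: iterable of per-CpG methylation fractions in [0, 1].
    low, high: bounds of the intermediate window (assumed inclusive).
    """
    levels = list(methylation_levels)
    if not levels:
        raise ValueError("at least one CpG site is required")
    n_intermediate = sum(1 for m in levels if low <= m <= high)
    return 100.0 * n_intermediate / len(levels)

score = m_score([0.0, 0.5, 1.0, 0.45])
```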
Calculation of epigenetic allele diversity. In this study, we used three local epigenetic allele diversity measurements: PDR, Epipolymorphism and Shannon entropy. All three measurements consider the variability of DNA methylation patterning in discrete sets of four adjacent CpGs that can be captured in individual sequencing reads. Any four consecutive CpGs can give up to 16 different allelic patterns. The three methods differ in how the variance of these patterns is calculated. Specifically, the proportion of discordant reads (PDR) is a measure of locally disordered DNA methylation(9). In aligned ERRBS data, a read at a given locus with four adjacent CpG sites was classified as either a concordant read or a discordant read. Here, a concordant read is one that shows either an unmethylated or a methylated state at all CpG sites of a given locus; i.e., four unmethylated cytosines or four methylated cytosines at a given locus. A discordant read is one that shows different methylated and unmethylated states at a given locus, such as one methylated cytosine followed by three unmethylated cytosines. PDR at each locus is defined as $d/(c+d)$, where $c$ and $d$ are the numbers of concordant and discordant reads at that locus, i.e. the proportion of discordant reads compared to the total number of reads from that locus. Epipolymorphism of a given locus in the cell population was defined as the probability that epialleles randomly sampled from the locus differ from each other(10). Epipolymorphism was calculated as $1 - \sum_{i=1}^{16} p_i^2$, where $p_i$ is the fraction of each epiallele $i$ in the cell population. Shannon entropy was defined as the chance that two randomly chosen epialleles of a given locus have different methylated states(11). Shannon entropy of a given locus was calculated as $-\sum_{i=1}^{16} p_i \log p_i$, where $p_i$ is the fraction of each epiallele $i$ in the cell population. Each measurement employs different calculations and equations and is based on slightly different assumptions.
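The three measures can be made concrete with a small sketch that computes them for a single four-CpG locus. It assumes each read is encoded as a four-character string of '0' (unmethylated) and '1' (methylated), and it uses a base-2 logarithm for the entropy; the encoding, the log base, and the function name are illustrative assumptions.

```python
from collections import Counter
from math import log2

def epiallele_diversity(reads):
    """Return (PDR, epipolymorphism, Shannon entropy) for one 4-CpG locus.

    reads: list of 4-character strings such as '0000' or '0101',
           one string per sequencing read covering the locus.
    """
    total = len(reads)
    counts = Counter(reads)
    # A read is discordant when it mixes methylated and unmethylated CpGs.
    discordant = sum(n for pattern, n in counts.items() if len(set(pattern)) > 1)
    pdr = discordant / total
    # p_i: fraction of reads carrying each observed epiallele pattern.
    fractions = [n / total for n in counts.values()]
    epipolymorphism = 1.0 - sum(p * p for p in fractions)
    entropy = -sum(p * log2(p) for p in fractions)  # assumption: log base 2
    return pdr, epipolymorphism, entropy

pdr, epi, ent = epiallele_diversity(["0000", "0000", "1111", "0101"])
```

In this toy locus only one of four reads is discordant, so PDR is low, while Epipolymorphism and entropy also register the split between fully methylated and fully unmethylated reads.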
Each assay has distinct strengths and weaknesses. The strength of the PDR approach is its simplicity in dividing reads into two categories, discordant vs concordant, which allows for simple distinctions that are easy to integrate with other biological or genomic parameters. The weakness is that it does not consider the frequency or types of distinct 4-CpG patterns occurring at each locus and hence misses potentially important information on the type of discordance occurring at each locus. PDR also does not distinguish fully methylated from fully unmethylated alleles. In contrast, Epipolymorphism and Shannon entropy capture these kinds of differences. Both of these measurements capture information on all 16 possible allele variants that can be manifested at an individual locus. However, they differ in that Epipolymorphism is designed to capture the probability of sampling two distinct patterns at a time, whereas Shannon entropy considers the proportion of all sixteen possible patterns together. They are more closely related to each other than to PDR. Epipolymorphism is a statistical measurement of variance, whereas Shannon entropy is based on information theory assumptions and examines chaos in organized systems. The value in comparing and contrasting them is to determine whether these different types of equation yield similar results, which provides a more rigorous analysis than using only one measurement.
In contrast to the above three methods, which infer epigenetic allele diversity from the basic state of any given population of cells, epiallele shifts per million loci (EPM), which was described previously(12,13), is a normalized measure of a sample’s global epiallele changes between two states or time points. Briefly, compositional changes in epiallele patterns between AML patient samples and normal bone marrow samples were examined using methclone(12) to calculate the combinatorial entropy change ($\Delta S$) of epialleles at each locus. Then, eloci were defined as those loci that underwent a significant epiallele shift ($\Delta S < -70$). EPM was calculated as the number of eloci per million loci covered by sequencing data. Here, we required that each locus be covered by at least 60 sequencing reads. Loci shared by all 119 patients were considered in this analysis. The “no DMR” EPM was calculated after excluding DMRs. Differentially methylated regions (DMRs) were defined as regions with total methylation differences of more than 25%.
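The EPM normalization can be sketched as follows, assuming the per-locus combinatorial entropy change has already been computed (for example by methclone) and is supplied together with read coverage. The dictionary layout and function name are hypothetical.

```python
def epm(delta_s_by_locus, coverage_by_locus, delta_s_cutoff=-70.0, min_reads=60):
    """Epiallele shifts per million loci.

    delta_s_by_locus: dict locus id -> combinatorial entropy change (delta S).
    coverage_by_locus: dict locus id -> number of reads covering the locus.
    """
    # Keep only loci that meet the minimum read coverage.
    covered = [
        locus
        for locus, depth in coverage_by_locus.items()
        if depth >= min_reads and locus in delta_s_by_locus
    ]
    if not covered:
        return 0.0
    # An elocus is a covered locus with a significant epiallele shift.
    n_eloci = sum(1 for locus in covered if delta_s_by_locus[locus] < delta_s_cutoff)
    return 1e6 * n_eloci / len(covered)

delta_s = {"a": -80, "b": -10, "c": -90, "d": -75}
coverage = {"a": 60, "b": 100, "c": 59, "d": 200}
value = epm(delta_s, coverage)
```

Locus "c" is dropped for insufficient coverage, so two of the three remaining loci count as eloci.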
DNA methylation heterogeneity by HELP assay
Bone marrow was obtained from patients with myelodysplastic syndromes (MDS) and MDS-associated AML at baseline and at days 15 and 29 during the first cycle of treatment with the DNMT inhibitor (DNMTi) 5’-azacytidine, given subcutaneously at 30, 40, or 50 mg/m2 per day for 10 consecutive days. DNA methylation had been profiled in previous work using the HELP (HpaII tiny fragment enrichment by ligation-mediated PCR) assay, and the data are available in the GEO database (accession number: GSE17328)(14). One patient was removed because no significant changes in DNA methylation profile were found for this patient after treatment. Our final dataset comprised HELP assays of six patients at baseline and at day 15 or day 29 of 5’-azacytidine treatment. Because of the nature of the platform, we were unable to perform allele diversity analyses and instead used the M score approach(8). The cutoff for defining the intermediate score, [0.3, 0.85], was transformed from the MassArray methylation levels [0.45, 0.55] using the previously described regression coefficient R = -0.77(14).
Data analysis. Unsupervised hierarchical clustering was performed using Euclidean distance. Specifically, in human AML patients, the 50% most variable loci across all patients, ranked by the standard deviation of epigenetic allele diversity, were used in clustering of PDR, Epipolymorphism, and Shannon entropy (20,948 loci) and in clustering of epiallele shifts (20,764 loci). In mouse strains, the union of the eloci was used in hierarchical clustering of PDR and Epipolymorphism (2,255 eloci for PDR and 2,145 eloci for Epipolymorphism in Idh2R140Q&Flt3ITD mice; 143 eloci for PDR and 19 eloci for Epipolymorphism in Tet2-/-&Flt3ITD mice). The 10% most variable loci across all samples, ranked by the standard deviation of Shannon entropy, were used in clustering of Shannon entropy (12,234 loci in Idh2R140Q&Flt3ITD mice; 22,711 loci in Tet2-/-&Flt3ITD mice). Dendrograms were constructed using the Ward.D2 method.
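The locus selection and distance computation that precede clustering can be sketched as below. This covers only the first two steps; the Ward linkage itself is not implemented here and would be applied to the resulting distance matrix by a clustering routine. The input layout and function names are assumptions for illustration.

```python
from math import dist
from statistics import stdev

def top_variable_loci(matrix, fraction=0.5):
    """Loci with the largest standard deviation across samples.

    matrix: dict locus id -> list of diversity values, one per sample.
    """
    ranked = sorted(matrix, key=lambda locus: stdev(matrix[locus]), reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]

def sample_distances(matrix, loci):
    """Pairwise Euclidean distances between samples over the selected loci."""
    n_samples = len(next(iter(matrix.values())))
    profiles = [[matrix[locus][s] for locus in loci] for s in range(n_samples)]
    return [[dist(a, b) for b in profiles] for a in profiles]

pdr_matrix = {
    "l1": [0.0, 0.0, 1.0],
    "l2": [0.5, 0.5, 0.5],
    "l3": [0.0, 0.3, 0.0],
    "l4": [0.2, 0.2, 0.21],
}
loci = top_variable_loci(pdr_matrix)
d = sample_distances(pdr_matrix, loci)
```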
To visualize the sample diversity of different AML subtypes and mouse model mutations in a two-dimensional scatter plot, the t-distributed stochastic neighbor embedding (t-SNE) algorithm(15) was run using the Barnes-Hut approximation(16) implemented in the R package Rtsne with 5,000 iterations. We applied this unsupervised technique to the same subset of loci used for hierarchical clustering. After dimensionality reduction, each point on the resulting two-dimensional scatter plot was colored according to its AML subtype for human samples and its mutation status for mouse strains.
Assessment of similarity between two AML subtypes based on eloci was performed using the Jaccard similarity coefficient. For any two AML subtypes i and j, the Jaccard similarity coefficient was calculated as $|s_i \cap s_j| / |s_i \cup s_j|$, where $s_i$ and $s_j$ are the sets of eloci in AML subtypes i and j, respectively. Here, to avoid the potential bias caused by patients having more than one AML genetic or epigenetic lesion used for subtyping, we focused on eloci that undergo a significant epiallele diversity change between patients unique to a given AML subtype and NBM controls. Finally, 10 of 13 AML subtypes were considered in the assessment of the Jaccard similarity coefficient.
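A minimal sketch of the Jaccard similarity coefficient for two sets of eloci follows. The locus identifiers are invented for illustration, and returning 0 for two empty sets is a convention chosen here rather than something stated in the text.

```python
def jaccard(eloci_i, eloci_j):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of eloci."""
    a, b = set(eloci_i), set(eloci_j)
    union = a | b
    # Convention (assumption): two empty sets have similarity 0.
    return len(a & b) / len(union) if union else 0.0

similarity = jaccard({"chr1:100", "chr1:200", "chr2:50"},
                     {"chr1:200", "chr2:50", "chr3:10"})
```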
A two-sample Welch’s t-test, an adaptation of the classical Student’s t-test, was performed to compare the means of two independent groups. Welch’s t-test does not assume homogeneity of variance. Specifically, Welch’s t-test is more reliable than Student’s t-test when the two samples have unequal variances and unequal sample sizes(17,18). Furthermore, Welch’s t-test gives the same result as Student’s t-test when sample sizes and variances are equal.
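For reference, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed as in the sketch below. Only the statistic and degrees of freedom are returned; converting them to a P value requires a t-distribution routine that is not included here. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group (sample variances).
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
_, df_equal = welch_t([1, 2, 3], [2, 3, 4])
```

With equal group sizes and equal variances, as in the second call, the degrees of freedom reduce to n1 + n2 - 2, which is the Student's t-test case.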
Genomic location annotation. The GENCODE gene models were downloaded from the UCSC Genome Browser and used to annotate loci and eloci with the R package GenomicRanges. Promoters were defined as regions within 1 kb upstream or downstream of transcription start sites in the reference annotation (hg19 for human and mm9 for mouse). Gene neighborhoods were defined as regions up to 50 kb away from transcription start sites or transcription end sites.
Region set enrichment analysis. The Molecular Signatures Database (MSigDB v6.1, http://software.broadinstitute.org/gsea/msigdb) is a widely used database of gene sets for performing gene set enrichment analysis. We obtained the curated gene sets and the motif gene sets from MSigDB v6.1. For each AML subtype, we performed a hypergeometric gene set enrichment analysis for genes containing eloci within their promoters with FDR adjusted P-value ≤ 0.05. As background, we used the genes containing loci shared by 119 AML patients and 14 NBM controls within their promoters.
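The enrichment test can be sketched as a one-sided hypergeometric tail probability followed by an FDR adjustment. The Benjamini-Hochberg procedure is assumed here as the FDR method, since the text does not name one; function and parameter names are illustrative.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes; K: background genes in the gene set;
    n: genes with eloci in their promoters; k: of those, genes in the gene set.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def benjamini_hochberg(p_values):
    """FDR-adjusted P values (assumption: Benjamini-Hochberg procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```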
Additionally, high-quality ChIP-seq data for seven hematopoietic transcription factors (FLI1, ERG, GATA2, RUNX1, SCL, LYL1 and LMO2) in primary human CD34+ blood stem and progenitor cells were obtained from the BloodChIP database(19). For each AML subtype, we performed peak region enrichment analysis for eloci with FDR-adjusted P value ≤ 0.05. As background, we used all the loci shared by 119 AML patients and 14 NBM controls.
Association with clinical outcome. A multivariable Cox proportional hazards regression model was fitted to investigate the association between event-free survival (EFS; event denotes failure to achieve complete remission) and epigenetic allele diversity, controlling for age, gender, t(15;17), inv(16), Del(5;7q), t(v;11q23), t(8;21), EVI1, CEBPA-dm, NPM1, IDH1/2, IDH1/2 + DNMT3A, DNMT3A, FLT3ITD, TET2 and CEBPA-sil (R package survival). We divided the primary AML patients into two groups: high epiallele diversity and low epiallele diversity. Here, an optimal cut-off for epiallele diversity was selected based on the maximal log-rank statistic(20,21). Kaplan-Meier survival curves were drawn and compared among subgroups using log-rank tests.
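The idea of choosing a cut-off by the maximal log-rank statistic can be sketched as a scan over candidate thresholds. This simplified version uses the plain two-group log-rank chi-square rather than the standardized statistic of the cited maximally selected rank statistics methods, and the minimum group size is an arbitrary assumption.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_high[i])
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_cutoff(scores, times, events, min_group=2):
    """Cut-off on scores that maximizes the log-rank statistic."""
    best = (None, -1.0)
    for cut in sorted(set(scores)):
        in_high = [s > cut for s in scores]
        n_high = sum(in_high)
        # Skip splits that leave either group too small (assumed rule).
        if n_high < min_group or len(scores) - n_high < min_group:
            continue
        stat = logrank_chi2(times, events, in_high)
        if stat > best[1]:
            best = (cut, stat)
    return best

diversity = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
efs_time = [20, 18, 19, 17, 3, 2, 4, 1]
efs_event = [1, 1, 1, 1, 1, 1, 1, 1]
cutoff, statistic = best_cutoff(diversity, efs_time, efs_event)
```

Because many cut-offs are tested, the P value at the selected threshold needs a multiplicity correction, which is the purpose of the cited maximally selected statistics approach.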
Development of an epiallele-based prognostic model for event-free survival in AML
To develop a prognostic biomarker model that is predictive of AML event-free survival, we used the supervised principal components (SuperPC) method(22). We assembled a cohort of 256 AML patients by combining the current 119 patients with the 137 patients from our 2016 Nature Medicine paper cohort(13). Two patients were removed because they lacked information on time to relapse. We then calculated the PDR values of the 635 loci shared by all 254 patients and used their clinical annotation to build an epiallele diversity prognostic classifier. The complete cohort was then randomly divided into a training set (n=151), a test set (n=65), and a validation set (n=38). We applied three steps to develop and validate our epiallele prognostic model.
Step 1. Epiallele Prognostic Model Training: We used the supervised principal components (SuperPC) method(22) on the training set of patients to train a Cox proportional hazards regression model for event-free survival (event denotes relapse). Specifically, univariate Cox model regression coefficients were computed for each locus. The threshold for selecting prognosis-related loci was chosen to maximize performance, using a 10-fold cross-validation procedure on the training set. This procedure yielded a set of 26 loci that segregated patients well (based on the likelihood ratio test statistic) according to their event-free survival times. Then, we fit a SuperPC model to calculate the first principal component of the reduced PDR matrix of 26 prognosis-related loci. We confirmed the robustness of the selected threshold (set to 1.7381) and first principal component by performing 1000 additional random splits of the data set into training sets of 151 patients and test sets of 103 patients and running the SuperPC algorithm for each of them, which were validated with a P value of <0.05 in a Cox proportional hazards regression model. The first principal component was subsequently used to calculate continuous prognostic scores (i.e. SuperPC scores) on the test cohort and validation cohort as follows.
Step 2. Model Testing: The SuperPC model was then applied to our test set of 65 patients. We fit the SuperPC scores to their event-free survival using a Cox proportional hazards model to investigate whether the SuperPC scores could classify the test set patients into high-risk and low-risk groups.
Step 3. Model Validation: The above procedure was applied to the 38-patient validation cohort as well as the combined test and validation cohort to confirm the robustness of the prognostic model derived from the training set.
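The scoring part of the three steps can be sketched as follows. The sketch assumes that univariate Cox scores per locus have already been computed elsewhere, selects loci whose absolute score exceeds the threshold, estimates the first principal component on the training PDR matrix by power iteration, and projects new samples onto it. It is an illustration of the general idea, not the SuperPC package implementation; all names are hypothetical, and the sign of a principal component is arbitrary.

```python
from math import sqrt

def first_principal_component(rows, n_iter=200):
    """Feature means and unit loading vector of the first PC (power iteration)."""
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    centered = [[r[j] - means[j] for j in range(n_features)] for r in rows]
    v = [1.0 / (j + 1) for j in range(n_features)]  # arbitrary start vector
    for _ in range(n_iter):
        # Apply X^T X to v without forming the covariance matrix.
        proj = [sum(c[j] * v[j] for j in range(n_features)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(len(rows)))
             for j in range(n_features)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return means, v

def superpc_scores(train_rows, new_rows, cox_scores, threshold):
    """Select loci by |univariate score| > threshold and score new samples."""
    keep = [j for j, s in enumerate(cox_scores) if abs(s) > threshold]
    reduced_train = [[r[j] for j in keep] for r in train_rows]
    means, loading = first_principal_component(reduced_train)
    scores = []
    for row in new_rows:
        reduced = [row[j] for j in keep]
        scores.append(sum((x - m) * w for x, m, w in zip(reduced, means, loading)))
    return keep, scores

train = [[0.0, 0.0, 5.0], [1.0, 1.0, 1.0], [2.0, 2.0, 9.0], [3.0, 3.0, 2.0]]
keep, scores = superpc_scores(train, [[3.0, 3.0, 0.0], [0.0, 0.0, 0.0]],
                              cox_scores=[2.0, -1.9, 0.3], threshold=1.7381)
```

The resulting continuous scores would then be entered into a Cox model on the test and validation cohorts, as described in Steps 2 and 3.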
Finally, to determine the significance of this 26-locus prognostic classifier as a putative biomarker, we performed a multivariate analysis considering the 26-locus SuperPC score, age, gender, ELN risk stratification(23), FLT3ITD, CEBPA-dm, t(8;21), NPM1, and inv(16) as covariates (these were the only variables that were available to us in all patients).
Supplementary Figures