Supplemental Information Somatic Mutations Drive Specific, but Reversible Epigenetic Heterogeneity States in AML

Supplemental Information Somatic mutations drive specific, but reversible epigenetic heterogeneity states in AML Sheng Li*,#,1,2,3,4, Xiaowen Chen#,1, Jiahui Wang1, Cem Meydan5, Jacob L. Glass6, Alan H. Shih7, Ruud Delwel8, Ross L. Levine9, Christopher E. Mason5, and Ari M. Melnick*,10 Author Affiliation: 1The Jackson Laboratory for Genomic Medicine, Farmington, CT; 2The Jackson Laboratory Cancer Center, Bar Harbor, ME; 3The Department of Genetics and Genomic Sciences, The University of Connecticut Health Center, Farmington, CT; 4Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA; 5Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY; 6Leukemia Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York,NY; 7Center for Hematologic Malignancies, Memorial Sloan Kettering Cancer Center, New York, NY; 8Department of Hematology, Erasmus University Medical Center and Oncode Institute, Rotterdam, The Netherlands; 9Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY; 10Division of Hematology/Oncology, Weill Cornell Medicine, New York, NY *Corresponding Authors: Sheng Li, PhD The Jackson Laboratory for Genomic Medicine Farmington, CT 06032, USA Tel: 860-837-2119 Email: [email protected] Ari M. Melnick, MD Department of Medicine Division of Hematology and Medical Oncology Weill Cornell Medicine New York, NY 10021, USA Tel: 212-746-7643 Email: [email protected] # These authors contribute equally. 1 Supplementary Methods Supplementary Figures 1-16 Supplementary Table 1 Supplementary Table 4 Supplementary Table 5 Supplementary Table 6 Supplementary Table 7 Supplementary Table 9 2 Supplementary Methods Microarray data analysis. Gene expression data of primary AML patients was profiled by Affymetrix Human Genome 133 Plus2.0 GeneChips, which has been published previously(1,2) (GEO accession number: GSE6891). Raw expression microarray data were processed using the HGU133PLUS2.0 BrainArray annotation version 17.0.0(3). Gene expression levels were transformed to log2 values. Differentially expressed genes were called using limma (FDR adjusted P value ≤ 0.05, a fold change of 1.5). RNA sequencing and analysis. We collected gene expression profiles of LSK cells from wild- type (4 samples), Idh2R140Q (2 samples) and Idh2R140Q;Flt3ITD (5 samples) mice, from Idh2R140Q;Flt3ITD mice treated with AG-221 (4 samples) or vehicle (4 samples), and from Tet2-/-; Flt3ITD mice treated with AzaC (4 samples) or vehicle (4 samples) (Table S9)(4). Briefly, the RNA library was generated using Illumina HiSeq. RNA sequence data were aligned to the mouse genome (mm9) using STAR(5), and RNAseq counts were annotated to GENCODE reference annotation using the Subread package(6). Gene expression data were normalized using the transcripts per million (TPM) method. Then we evaluated transcriptional heterogeneity using the above gene expression data. Specifically, the coefficient of variation per gene across samples was calculated using TPM. The top 75% of genes ranked in descending order by the average TPM of all samples were included in determining the coefficient of variation. Whole Genome Bisulfite sequencing (WGBS) data collection and analysis WGBS data of total bone marrow cells from AML patients with and without of DNMT3AR882H/C mutations were obtained from Spencer, D.H. et al.’s work(7). Five normal karyotype AML samples with DNMT3AR882H/C mutations and five normal karyotype AML samples with no DNMT3A mutations were used to investigate the contribution of DNMT3A mutations to DNA methylation heterogeneity in AML. WGBS was performed using either the EpiGnome/TruSeq DNA methylation Kit (illumina). Alignment was performed to the reference genome (hg19) using biscuit, after which methylation values were calculated with MethylDackel version 0.3.0. To the above WGBS datasets, we estimated intra-sample DNA methylation heterogeneity using the M score approach(8). M-score assesses epigenetic heterogeneity by the percentage of CpG sites harboring intermediate methylation levels (from 0.45 to 0.55). An intermediate score reflects the presence of balanced hypo- and hyper-methylated CpGs within the cells of a sample and thus high intra-sample variation. 3 Calculation of epigenetic allele diversity. In this study, we used three local epigenetic allele diversity measurements, including PDR, Epipolymorphism and Shannon entropy. All the three measurements considered variability of DNA methylation patterning occurring in discrete sets of four adjacent CpGs which were able to be captured in individual sequencing reads. Any four consecutive CpGs can give up to 16 different allelic patterns. The three methods differ in how variance of these patterns is calculated. Specifically, the proportion of discordant reads (PDR) is a measure of locally disordered DNA methylation(9). In aligned ERRBS data, a read at one given locus with four adjacent CpG sites was classified as a concordant read or a discordant read. Here, a concordant read is one that shows either an unmethylated or a methylated state at all CpG sites of a given locus; i.e., four unmethylated cytosines or four methylated cytosines at a given locus. A discordant read is one that shows different methylated and unmethylated states at a given locus, such as one methylated cytosine followed by three unmethylated cytosines. PDR at each locus !"#$%&'()* &,(' )-./,& is defined as , i.e. the proportion of discordant reads compared to the total 0%*(1 )-./,& %2 &,('# number of reads from that locus. Epipolymorphism of a given locus in the cell population was defined as the probability that epialleles randomly sampled from the locus differ from each 16 2 other(10). Epipolymorphism was calculated as1- p , where pi is the fraction of each epiallele åi=1 i i in the cell population. Shannon entropy was defined as the chance that two randomly chosen epialleles of a given locus have different methylated states(11). Shannon entropy of a given locus 16 was calculated as - p log p , where pi is the fraction of each epiallele i in the cell population. åi=1 i i Each measurement employs different calculations and equations and is based on slightly different assumptions. Each assay has distinct strengths and weaknesses. The strength of the PDR approach is its simplicity in dividing loci into two categories: discordant vs concordant, which allows for simple distinctions that are easy to integrate with other biological or genomic parameters. The weakness is that it does not consider the frequency or types of distinct 4-CpG patterns occurring at each locus and hence misses potentially important information on the type of discordance occurring at each locus. PDR also does not distinguish fully methylated vs fully un-methylated alleles. In contrast, Epipolymorphism and Shannon entropy capture these kinds of differences. Both of these measurements capture information on all 16 possible allele variants that can be manifested at an individual locus. However, they differ in that Epipolymorphism is designed to capture the probability of sampling two distinct patterns at a time, whereas Shannon entropy considers the 4 proportion of all sixteen possible patterns together. They are more closely related to each other than PDR. Epipolymorphism is a statistical measurement of variance, whereas Shannon entropy is based on information theory assumptions and examines chaos in organized systems. The value in comparing and contrasting them is to determine whether these highly orthogonal and different types of equation yield similar results, which provides a far more rigorous analysis than using only one assay. In contrast to the above three methods, which infer epigenetic allele diversity from the basic state of any given population of cells, epiallele shifts per million loci (EPM), which was described previously(12,13), is a normalized measure of a sample’s global epiallele changes at different time points. Briefly, the epiallele patterns of compositional changes between AML patient samples and normal bone marrow samples were examined using methclone(12) to calculate the combinatorial entropy change (∆�) of epialleles at each locus. Then, eloci were defined as those loci that underwent a significant epiallele shift ( DS < -70 ). EPM was calculated as the number of eloci per million loci covered by sequencing data. Here, we required that each locus was covered by at least 60 sequencing reads. Loci shared by all 119 patients were considered in this analysis. No DMR EPM was calculated by excluding DMR. Differentially methylated regions (DMRs) were defined as regions with total methylation differences of more than 25%. DNA methylation heterogeneity by HELP assay Bone marrow was obtained from patients with myelodysplastic syndromes (MDSs) and MDS- associated AML using the HELP assay at baseline and at days 15 and 29 during the first cycle of treatment with DNMTi 5’-azacytidine at concentrations of 30, 40, or 50 mg/m2 per day subcutaneously for 10 consecutive days. DNA methylation was profiled using the HELP (HpaII tiny fragment enrichment by ligation mediated PCR) assay in previous work, which was stored in the GEO database (accession number: GSE17328)(14). One patient was removed because no significantly changes in DNA methylation profile were found for this patient after treatment. Finally, our study used HELP assays of six patients at both baseline and at days 15 or 29 days of 5’- azacytidine treatment. Because of the nature of the platform,

Supplemental Information Somatic Mutations Drive Specific, but Reversible Epigenetic Heterogeneity States in AML

MCG10 (PCBP4) (NM 033008) Human Recombinant Protein Product Data

Genome-Wide DNA Methylation Map of Human Neutrophils Reveals Widespread Inter-Individual Epigenetic Variation

Spermatozoal Gene KO Studies for the Highly Present Paternal Transcripts

Identification of Inherited Retinal Disease-Associated Genetic Variants in 11 Candidate Genes

Genome-Wide Linkage and Association Study Implicates the 10Q26 Region As a Major Genetic Contributor to Primary Nonsyndromic

DEAH-Box Polypeptide 32 Promotes Hepatocellular Carcinoma Progression Via Activating the Β-Catenin Pathway

DHX32 Expression Suggests a Role in Lymphocyte Differentiation

Potential Sperm Contributions to the Murine Zygote Predicted by in Silico Analysis

A Causal Gene Network with Genetic Variations Incorporating Biological Knowledge and Latent Variables

Mice Deficient in Poly(C)-Binding Protein 4 Are Susceptible to Spontaneous Tumors Through Increased Expression of ZFP871 That Targets P53 for Degradation

Anti-PCBP4 Antibody (ARG64279)

DEAH)/RNA Helicase a Helicases Sense Microbial DNA in Human Plasmacytoid Dendritic Cells