Leukemia (2014) 28, 1617–1626 © 2014 Macmillan Publishers Limited All rights reserved 0887-6924/14 www.nature.com/leu

ORIGINAL ARTICLE Epigenetic regulation of GATA2 and its impact on normal karyotype acute myeloid leukemia

M Celton1,2,3, A Forest1,2, G Gosse1,2, S Lemieux1,4, J Hebert1,5, G Sauvageau1,5,6 and BT Wilhelm1,2

GATA2 encodes a zinc-finger transcription factor that acts as a master regulator of normal hematopoiesis. Mutations in GATA2 have been implicated in the development of myelodysplastic syndrome and acute myeloid leukemia (AML). Using RNA sequencing we now report that GATA2 is either mutated with a functional consequence, or expressed at low levels in the majority of normal karyotype AML (NK-AML). We also show that low-GATA2-expressing specimens (GATA2low) exhibit allele-specific expression (ASE) (skewing) in more than half of AML patients examined. We demonstrate that the hypermethylation of the silenced allele can be reversed by exposure to demethylating agents, which also restores biallelic expression of GATA2. We show that GATA2low AML lack the prototypical R882 mutation in DNMT3A frequently observed in NK-AML patients and that The Cancer Genome Atlas AML specimens with DNMT3A R882 mutations are characterized by CpG hypomethylation of GATA2. Finally, we validate that several known missense single-nucleotide polymorphisms in GATA2 are actually loss-of-function variants, which, when combined with ASE, represent the equivalent of homozygous GATA2 mutations. From a broader perspective, this work suggests for the first time that determinants of ASE likely have a key role in leukemia.

Leukemia (2014) 28, 1617–1626; doi:10.1038/leu.2014.67

1Department of Medicine, Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Québec, Canada; 2Laboratory for High-Throughput Genomics, Montréal, Québec, Canada; 3INRA, UMR1083, Sciences Pour l'Oenologie, Montpellier, France; 4Laboratory for Functional and Structural Bioinformatics, Montréal, Québec, Canada; 5Leukemia Cell Bank of Quebec and Division of Hematology, Maisonneuve-Rosemont Hospital, Montréal, Québec, Canada and 6Laboratory for Molecular Genetics of Stem Cells, Montréal, Québec, Canada. Correspondence: Dr B Wilhelm, Department of Medicine, Institute for Research in Immunology and Cancer, Université de Montréal, PO Box 6128, Station Centre-Ville, Montréal H3C 3J7, Québec, Canada. E-mail: [email protected]
Received 14 October 2013; revised 29 January 2014; accepted 3 February 2014; accepted article preview online 11 February 2014; advance online publication, 7 March 2014

INTRODUCTION

Acute myeloid leukemia (AML) is a heterogeneous disease that is characterized by the uncontrolled proliferation of progenitor cells that lack the ability to terminally differentiate. AML is typically classified into various subgroups, defined by the specific chromosomal aberrations that are present; however, a substantial proportion of AML patients (45% of adult and 20% of pediatric AML cases1) do not have cytogenetic abnormalities and are classified as normal karyotype AML (NK-AML). For such patients, identifying novel genetic determinants of the disease has presented challenges.

In the past few years, the use of next-generation DNA sequencing technology has been instrumental in helping to identify novel recurrent mutations in AML.1,2 Despite the discovery of these novel candidates, however, a comprehensive understanding of the genetic and epigenetic mechanisms involved in the disease remains elusive. An example is provided by the DNMT3A gene, which has been found to be recurrently mutated in ~30% of all NK-AML patient samples.2 Although DNMT3A has been shown to function as a DNA methyltransferase,3 AML patients with mutations in this gene do not show changes in bulk DNA methylation, nor recurrent changes in DNA methylation at specific regions of their genomes.2 Subsequent studies in Dnmt3a knockout mice have demonstrated that hematopoietic stem cell self-renewal can be affected in these mice, but only after serial transplantation.4 The subtle effects of this recurrent mutation underline the need for greater functional studies to better define the roles of candidate driver mutations in AML. The rarity of cell line models of NK-AML, however, has made the investigation of candidate mutations difficult.

We performed whole-genome sequencing and transcriptome sequencing (RNA sequencing (RNA-seq)) on a novel AML cell line, CG-SH,5 recently derived from a patient with NK-AML and identified a number of mutations, including one in the GATA2 gene, previously reported to be mutated in AML.6 In addition, the GATA2 gene in CG-SH was found to have a partially duplicated and rearranged last exon. GATA2 is highly expressed in hematopoietic stem cells and has a central function in their maintenance and proliferation.7,8 It contains two zinc fingers (ZF1 and ZF2) in its DNA-binding domain and, recently, mutations within a highly conserved threonine repeat located in ZF2 have been identified as the cause of familial myelodysplastic syndrome/AML.9 Unexpectedly, an examination of the GATA2 locus in CG-SH revealed that the novel mutation, along with a number of other single-nucleotide variants, exhibited allele-specific expression (ASE). Because the only expressed version of GATA2 in CG-SH has a mutation and a rearranged last exon, this results in a complete loss of GATA2 function. When examining our RNA-seq data for NK-AML patients and normal blood progenitor cells (CD34+), we further observe ASE of GATA2 in only the leukemic samples, suggesting that it is a novel, disease-specific mode of transcriptional regulation.

The expression of specific alleles of imprinted genes is associated with changes in DNA methylation patterns that are established and maintained by three DNA methyltransferase enzymes: DNMT1, DNMT3A and DNMT3B. It has been demonstrated, for instance, that the offspring of Dnmt3a knockout mice lack DNA methylation and ASE at all maternally imprinted loci.10 Because of its known role in transcriptional silencing and correlation with allelic imbalance in gene expression,11,12 we examined the impact of DNA methylation on GATA2 in CG-SH. We first examined the mutational status of various epigenetic enzymes, and DNMT3A specifically, and found a mutational bias in DNMT3A in patients who exhibit ASE of GATA2. We then demonstrated that DNA methylation is required for ASE of GATA2 in CG-SH, and that AML patients with mutations in DNMT3A have lower levels of DNA methylation and higher GATA2 expression than wild-type patients. We also examined the impact of altered GATA2 expression levels and found that the expression of known GATA2 target genes in NK-AML samples is highly correlated to GATA2 expression. This indicates that downstream targets can also be directly impacted by the ASE of GATA2. To understand the full biological impact of this phenomenon, we lastly investigated known missense single-nucleotide polymorphisms (SNPs) that are present and expressed in the GATA2 alleles of NK-AML patients. None of these variants have previously been classified as mutations; however, GATA2 shows an extremely high level (98%) of sequence conservation between humans and mice. As a result, functional predictions of these SNPs by SIFT13 and PolyPhen14 both indicate that some of these are likely to affect GATA2 activity. We validated a number of these predictions, showing that several known SNPs, with defined population frequencies, can cause a complete loss of GATA2 function. Together, our results demonstrate that the mutation rate of GATA2 is significantly higher than previously appreciated and, in combination with ASE, represents a novel mechanism for leukemogenesis.

MATERIALS AND METHODS

Cell culture
THP1, HL60 and KG1a cells were grown in RPMI medium with 10% inactivated fetal bovine serum, penicillin (100 U/ml) and streptomycin (100 μg/ml) (at 37 °C with 5% CO2). CG-SH cells were provided by Munker et al.5 and cultured in Iscove's modified Dulbecco's medium, also with fetal bovine serum, penicillin and streptomycin in the same proportion as above, supplemented with cytokines as previously described.

RNA and genomic DNA high-throughput sequencing
The patient samples used in this study were collected by the Leukemia Cell Bank of Quebec with informed consent and project approval by the Research Ethics Board of the Maisonneuve-Rosemont Hospital and Université de Montréal. Normal human cord blood CD34+ samples were collected by HemaQuebec and sorted at the flow cytometry platform at IRIC (details in Supplementary Table 2). Cultured cells were harvested and either resuspended in 1 ml TRIzol reagent (Invitrogen, Burlington, ON, Canada) (5 × 10⁶ cells) or directly purified using the PureLink Genomic DNA Mini Kit (Invitrogen). RNA that was extracted was further purified with the RNeasy Plus Mini Kit (Invitrogen) and then reverse-transcribed using a Superscript III Reverse Transcriptase (Invitrogen). RNA/DNA paired-end barcoded libraries were prepared according to the TruSeq RNA/DNA Sample Prep High Throughput (HT) protocol (Illumina, San Diego, CA, USA). Genomic DNA was either used for selective exome enrichment (Illumina TruSeq Exome Enrichment kit) or used directly for whole-genome sequencing on an Illumina HiSeq 2000 at the IRIC genomics core facility.

RNA-seq analysis
Alignments of sequence reads to the human genome (hg19) and estimates of reads per kilobase of exon model per million mapped reads (RPKM) were obtained using Casava Ver.1.8. Sequence variants in AML samples relative to the reference genome were called by SAMtools15 and included filters for both read mapping quality (-q 250) and single-base quality scores (-Q 30). Lower coverage BodyMap 2.0 data were also filtered for mapping quality (-q 100) and single-base quality scores (-Q 20). All heterozygous sites in GATA2 were manually inspected using an Integrative Genomic Viewer16 and visualizations of read counts at these positions were generated using custom Perl and R scripts. Publicly available raw sequence data for normal tissues (for example, Illumina BodyMap 2.0) was remapped with variant calls performed as described above. All next-generation DNA sequencing data reported here have been submitted to GEO under the accession numbers GSE48173 and GSE40199.

Global analysis of ASE and GATA2 allele usage in CG-SH
Genome-wide analysis of ASE in CG-SH cells was performed using a custom Perl script, which identified heterozygous positions (coverage ≥20 and representation of both alleles between 30 and 70%) in the whole-genome sequence for the cell line where transcriptome coverage showed a strong bias (coverage ≥50 where a single allele shows ≥90% usage). All putative regions (~250) were subjected to manual validation in the UCSC genome browser to eliminate false positives owing to the presence of repetitive or duplicated regions of the genome. Different genomic regions within the GATA2 gene containing heterozygous SNPs (see primers used in table below) were validated with an ABI 3730 automatic DNA sequencer (Applied Biosystems, Burlington, ON, Canada).

Analysis of TCGA data
Publicly available processed data from The Cancer Genome Atlas (TCGA) analysis of DNA methylation in AML samples on the Illumina Infinium human methylation 450 platform was analyzed. Associated clinical data were used to select NK-AML patients that had percentages of 'blast_cell_outcome_percentage' and 'bone_marrow_blast_cell_outcome_percent' over 50% and without any other cytogenetic abnormalities at a frequency of >10% and that had either wild-type DNMT3A or R882 mutations (29 samples in total). Average probe beta signals for DNA methylation in wild-type samples (22) or R882 mutants (7) were obtained from level 3 processed files.

TCGA patient samples used by an identifier
DNMT3A-R882: 2975, 2825, 2981, 2853, 2974, 2965, 2809
DNMT3A-WT: 3008, 2984, 2866, 2977, 2921, 2863, 2947, 2903, 2880, 2979, 3011, 2826, 2970, 2909, 2922, 2990, 2805, 2839, 2976, 2919, 2896, 2992

DAC treatment of CG-SH
CG-SH cells were cultured as described above and seeded at 0.5 × 10⁶ cells/ml. The following day they were treated with 0.1 mM 5-aza-2′-deoxycytidine (DAC; Sigma, St Louis, MO, USA) once per day for 3 days. Cells were left to recover for 48 h before harvesting. Cells without DAC treatment were used as a control. Cells (4 × 10⁶) were used for both RNA and gDNA extractions (as described above).

Bisulfite treatment of DNA to analyze methylation status
Genomic DNA from CG-SH cells was digested with BglII or SacI and purified with the PCR purification Kit (Invitrogen). DNA (200 ng) was used with the EpiTect Bisulfite Kit (Qiagen, Toronto, ON, Canada) for sodium bisulfite conversion of unmethylated cytosine following the manufacturer's instructions for low concentration. Unmethylated strand-specific primers were designed to amplify a 285-bp fragment in the CpG-rich region of the GATA2 promoter flanking an informative SNP (chr3: 128 206 618). PCR products were gel purified and cloned into a pGemT easy vector, transformed into electrocompetent DH10B bacteria and individual colonies were selected for sequencing.

GATA2 cDNA sequencing after DAC treatment
Single-stranded complementary DNA (cDNA) was produced starting with 1 μg of mRNA from cells with and without DAC treatment and using the Superscript II Reverse Transcriptase (Invitrogen), then purified using a PCR purification kit (Invitrogen). Nested PCR amplification of GATA2 was performed coupled with HindIII and EcoRI restriction sites. PCR products were separated by agarose gel electrophoresis. Individual PCR products were purified, cloned into pcDNA3.1 vectors and sequenced.

Luciferase assays for GATA2 sequence variants
An hg19 reference GATA2 cDNA was PCR amplified from the cDNA of an AML patient. The CG-SH isoform of GATA2 and the rs34870876 SNP were
PCR amplified using the same primers from CG-SH cDNA and AML patient 04H112 cDNA, respectively. Other sequence variants were generated by site-directed mutagenesis on the wild-type cDNA cloned above. All GATA2 variants were cloned into pcDNA3.1, and published pcDNA3 PU.1 and the GATA2-responsive promoters (LYL1 and CSF1R) in PGL4.12 vectors were kindly provided by the Scott lab (Centre for Cancer Biology, Adelaide, SA, Australia).9 Cos-7 cells were transfected at 90% confluence with Lipofectamine 2000 and were harvested and used for luciferase assays after 20 h using a Veritas Microplate Luminometer (Turner Biosystem, Madison, WI, USA). The luciferase assay results were reported as relative light units of firefly luciferase activity.

Computational analysis of GATA2 network, regulation and SNPs
The GATA2 expression correlation network was generated using Gephi 0.8.1 beta software.17 For the positive network, the 25 genes with the most correlated expression (Pearson's correlation >0.82) with GATA2 were selected along with all genes correlated in turn with these 30 genes having a Pearson value >0.8. For the negative network, the 30 genes whose expression was most anticorrelated with GATA2 were selected along with all genes correlated in turn with these 30 genes having an absolute Pearson value >0.8. Sizes of the nodes reflect the number of genes connected to each node and the edge width is proportional to the absolute value of the correlation. Node coloring is determined by a heuristic method of differentiating a community substructure within the network.18
Gene set enrichment analysis software19 from the Broad Institute was used along with rank-ordered genes based on their Pearson correlation values with GATA2 expression. Significance was determined using an empirical phenotype-based permutation test as described by Subramanian et al.19 The over-representation of transcription factor binding sites for the co-expression GATA2 network genes (correlated (red) and anti-correlated (green)) was predicted using matrices from the JASPAR database using the Pscan software (Dipartimento di Scienze Biomolecolari e Biotecnologie, University of Milan, Milan, Italy). The enrichment of transcription factor-binding sites was determined by a Fisher exact test (P-value ≤5%) and at the indicated stringency at a z-score ≥1.64. Ensembl Variant Effect Predictor software (http://useast.ensembl.org/tools.html)20 was used to predict the biological impact for known and novel variants on GATA2.

RESULTS

Expression of GATA2 is low in NK-AML patients
After identifying mutations in the GATA2 gene in CG-SH cells by RNA-seq, we examined the expression level of GATA2 in a cohort of 49 NK-AML patients as well as CD34+ cells from healthy individuals. Relative to CD34+ cells (from 19 single healthy donors), GATA2 expression levels were significantly lower in the vast majority of NK-AML patients (Figure 1a). Interestingly, patient samples and the CG-SH cell line that had mutations in GATA2 were exceptions to this trend in low GATA2 expression. The GATA2 expression level did not differ based on the CD34 expression level or NPM1 mutation status of AML samples (Table 1).

GATA2 expression exhibits allelic skewing in NK-AML patients
In analyzing the GATA2 RNA-seq and genomic sequence data from CG-SH, we were surprised to see that the novel GATA2 mutation and other heterozygous SNPs within GATA2 clearly displayed ASE (Figure 1b), which was verified by standard sequencing (Figure 1c; Supplementary Figure 1). We next examined GATA2 allele usage in the informative patients (36) of our NK-AML cohort (Supplementary Table 1) along with RNA-seq data from normal tissues either publicly available (Illumina Bodymap 2.0) or analyzed by our lab (normal CD34+ cells, monocytes; Supplementary Table 2; Figure 1d). As expected, the stochastic selection of transcript fragments during the RNA-seq procedure results in variation of allele representation; however, no normal tissue/cells showed an average bias >~64%. Strikingly, only five NK-AML patients were below this threshold, whereas 86% were above. Because the average blast percentage in our samples is high (>85%) (Supplementary Table 1) the expression data for GATA2 is almost certainly tumor specific, although small amounts of contaminating non-leukemic cells (which might not have shown ASE of GATA2) could contribute to the range of biased expression we observed. Together these results indicate that the range of biased expression of GATA2 seen in NK-AML samples is neither a sampling artifact nor a normal feature of GATA2 expression but is specific to AML samples. We did also see evidence for biased GATA2 expression in 15 other adult AML patients of various cytogenetic subgroups (Supplementary Table 3); however, because the numerous and complex differences between karyotypes of these samples could easily obscure directly related effects, we focused our analysis here on only the homogeneous group of NK-AML patient samples. We then plotted the combined GATA2 expression data from informative NK-AML samples along with allele-specific usage (Figure 1e). Our NK-AML patients showed a significant inverse correlation between GATA2 expression and single-allele usage; however, as noted earlier, three of the five largest outliers from this trend have mutations in GATA2.

Only a limited number of genes in the CG-SH cell line exhibit ASE
In order to identify other genes that exhibited ASE, we performed a genome-wide analysis of CG-SH RNA-seq data and detected 324 individual nucleotide positions that were heterozygous in the genomic DNA and showed ASE in the transcriptome (>90% single-allele usage). A majority of these variants were clustered within processed pseudogenes (177) and/or other duplicated regions of the human genome (256), suggesting that these sites likely represent artifacts of the read mapping process. After removing these regions, ASE was identified in H19, MEG3 and SNRPN, all known imprinted genes,21 as well as GALNT7, NAPRT1, RPS23, LIPA and MYO1D. Of these, only MEG3, SNRPN and H19 also showed evidence of ASE in any of our NK-AML patient samples, although they were generally poorly expressed. Expression of other members of the GATA gene family was not detected in CG-SH cells or NK-AML patient specimens. Taken together, these observations suggest a surprisingly limited extent of ASE in the CG-SH cells.

DNA methylation is required for ASE of GATA2
We next sought to investigate the molecular mechanisms involved in the allele-specific regulation of GATA2. To determine whether DNA methylation was required for ASE of GATA2, we performed bisulfite sequencing on CG-SH cells before and after treatment with an inhibitor of DNA methylation (DAC). By tracking a heterozygous SNP in the 5′ untranslated region of GATA2 (Figure 2a) we demonstrated that, before DAC treatment, the expressed allele shows almost no DNA methylation, whereas the silenced allele is heavily methylated at CpG positions throughout the region (Figure 2b). Direct sequencing of the entire cDNA of the expressed allele of GATA2 in CG-SH revealed a 355-bp portion of the GATA2 locus (containing part of the last exon and ) that has been duplicated and reinserted into the last exon of GATA2 (Supplementary Figure 1). After DAC treatment, we demonstrate that the DNA methylation of the silenced allele is reduced (Figure 2b) and that this decrease is sufficient to allow the induction of transcription from the silent wild-type allele of GATA2 (which, in CG-SH, lacks the 355-bp insertion in the last exon) (Figure 2c). The identity of the bands seen in the gel was verified through standard DNA sequencing.

NK-AML DNMT3AMut patients do not show high levels of allelic bias
Because of the demonstrated involvement of DNA methylation in the ASE of GATA2, we next examined the mutational status and


Figure 1. Allele usage of GATA2 in normal and NK-AML cells. (a) Bar graph of GATA2 expression in AML patient samples and the CG-SH cell line. Expression range of GATA2 in normal CD34+ cells is shown in gray with 95% confidence interval. Samples or cell lines with mutations in the GATA2 gene are shown in red. (b) Integrated Genome Viewer screenshot illustrating the read coverage of a novel heterozygous SNP in GATA2 in the DNA-seq and RNA-seq data from CG-SH cells. Numbers in adjacent locations indicate the number of mapped reads containing either allele. Right part of panel shows Sanger sequencing validation of the SNP position. (c) Allele usage in GATA2 at other SNP positions in CG-SH (transcribed from right to left). Individual columns represent SNP positions within GATA2 and are identified by their hg19 genomic position along chromosome 3 and the associated dbSNP rsID number. The sequence coverage (read counts) is shown for both DNA and RNA within each box from the CG-SH cell line containing either the reference allele (white box) or alternate allele (gray box). (d) Allele bias in normal tissues versus leukemia. Individual samples within each category are denoted by filled circles overlaid on boxplots representing the distribution of values. The limit of observed variation in allele representation of normal cells/tissues owing to the stochastic process of fragment selection during sequencing is shown in the dotted gray box. A total of 36 NK-AML patient samples were informative for variations. (e) Scatterplot of GATA2 gene expression versus single-allele usage. Limits for normal GATA2 expression and allele usage are shown in gray boxes. Informative patient samples or cell lines with mutations in GATA2 are shown in rectangular boxes. For samples whose GATA2 expression exceeds the y-axis limits (95% confidence interval for GATA2 expression in CD34+ cells), their names are shown at the top of the graph with the actual expression value below the circle. The Pearson correlation between GATA2 expression and allele usage in AML patient samples is −0.497 (P-value = 0.002) excluding two patients with mutated GATA2 (04H112, 06H028).
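The per-allele read counts shown in panels (b) and (c) lend themselves to a simple computational check. What follows is a minimal Python sketch, not the authors' custom Perl script, of how a single site could be flagged as showing ASE. It assumes the thresholds given in the Methods are meant as "at least" (DNA coverage of at least 20 with both alleles between 30 and 70%, RNA coverage of at least 50 with a single allele at 90% or more). The class, function names and read counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Reads supporting the reference and alternate allele at one position."""
    ref: int
    alt: int

    @property
    def coverage(self) -> int:
        return self.ref + self.alt

    @property
    def major_fraction(self) -> float:
        """Fraction of reads carrying the more abundant allele."""
        return max(self.ref, self.alt) / self.coverage


def is_heterozygous(dna: SiteCounts, min_cov: int = 20,
                    low: float = 0.30, high: float = 0.70) -> bool:
    """Genomic site counts as heterozygous if well covered and balanced."""
    if dna.coverage < min_cov:
        return False
    return low <= dna.ref / dna.coverage <= high


def shows_ase(dna: SiteCounts, rna: SiteCounts,
              min_rna_cov: int = 50, min_major: float = 0.90) -> bool:
    """Heterozygous in DNA, but one allele dominates the transcriptome."""
    return (is_heterozygous(dna)
            and rna.coverage >= min_rna_cov
            and rna.major_fraction >= min_major)


# Hypothetical read counts, for illustration only.
skewed = shows_ase(SiteCounts(ref=22, alt=25), SiteCounts(ref=3, alt=140))
balanced = shows_ase(SiteCounts(ref=22, alt=25), SiteCounts(ref=60, alt=55))
print(skewed, balanced)
```

Checking coverage before computing any fraction keeps low-coverage sites from producing spurious calls, which mirrors the order of the filters described in the Methods.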

expression of DNA methylation genes in our NK-AML patients (Table 1). For all three DNA methyltransferase enzymes (DNMT1, DNMT3A and DNMT3B) we saw no correlation with gene expression patterns that could explain the pattern of ASE, nor did we observe any obvious differences in DNMT3A exon usage that might indicate alternative isoform usage.22 Informative patients (with heterozygous SNPs and GATA2 expression (RPKM >1)) did, however, show a bias in the distribution of mutations in either the R882 residue of DNMT3A (representing ~60% of all recurrent DNMT3A mutations in NK-AML2) or other damaging (for example, frameshift/nonsense) mutations (Table 1). We calculated the average heterozygosity of all SNP locations within

Table 1. Mutational status and gene expression level of DNA methylation enzymes and select genes, surface markers in NK-AML patients

Patient ID | Ave main allele % (sites) | Mutations (DNMT3A-882, DNMT3A-Other, DNMT1, DNMT3B) | NPM1 | CD34 | CD34 FACS (%) | CD38 | GATA2 | DNMT1 | DNMT3A | DNMT3B
03H119 | 99 (3) | - | Wt | 292.85 | 98 | 23.41 | 5.50 | 37.92 | 12.53 | 4.75
09H113 | 99 (2) | G890S | Wt | 338.89 | 96 | 30.43 | 22.53 | 15.69 | 13.20 | 22.72
09H115 | 98 (3) | S | Wt | 4.22 | 3 | 12.68 | 19.19 | 33.54 | 7.36 | 5.37
12H030 | 97 (2) | S | Wt | 175.37 | 94 | 27.24 | 7.13 | 58.06 | 20.31 | 3.99
06H045 | 96 (2) | - | Wt | 125.67 | 70 | 58.02 | 11.79 | 16.83 | 9.06 | 4.06
03H041 | 95 (3) | - | Wt | 44.66 | 28 | 35.05 | 14.47 | 45.62 | 7.76 | 7.13
06H028 | 93 (6) | - | Mut | 1.30 | 2 | 19.47 | 124.02 | 18.28 | 98.21 | 13.64
11H142 | 92 (5) | - | Mut | 0.18 | 0 | 13.93 | 44.88 | 15.19 | 15.20 | 10.18
11H058 | 90 (3) | - | Mut | 0.44 | 0 | 67.57 | 26.17 | 49.60 | 13.62 | 15.00
11H095 | 90 (1) | - | Wt | 0.58 | 0 | 27.42 | 2.81 | 55.62 | 7.35 | 1.55
10H056 | 89 (1) | R326H | Wt | 135.17 | 97 | 26.82 | 7.77 | 42.56 | 6.98 | 12.28
11H151 | 88 (4) | R736C | Mut | 2.52 | 5 | 3.20 | 86.36 | 10.64 | 12.08 | 6.1
10H038 | 88 (1) | S | Wt | 277.18 | 99 | 99.15 | 6.11 | 22.87 | 9.77 | 7.25
07H135 | 86 (2) | G708D | Wt | 20.14 | 2 | 5.70 | 36.64 | 32.41 | 17.73 | 17.86
07H042 | 85 (4) | C587Y | Mut | 3.23 | 7 | 29.50 | 42.01 | 25.82 | 13.49 | 21.25
10H052 | 84 (5) | - | Wt | 1.61 | 2 | 5.70 | 15.19 | 14.72 | 9.71 | 1.52
02H053 | 84 (3) | - | Mut | 0.05 | 0 | 24.01 | 10.19 | 22.54 | 8.55 | 6.82
11H072 | 83 (4) | R882C | Mut | 2.19 | 0 | 21.35 | 25.59 | 15.84 | 8.82 | 11.72
05H050 | 80 (2) | R882C, S, S | Wt | 35.66 | 10 | 20.87 | 10.30 | 20.46 | 4.97 | 2.63
04H024 | 78 (5) | S304fs | Mut | 9.26 | 8 | 25.78 | 32.99 | 39.02 | 5.83 | 7.70
09H043 | 77 (3) | L567*, S | Mut | 51.27 | 8 | 20.99 | 55.00 | 41.09 | 12.94 | 25.35
11H021 | 76 (3) | - | Wt | 255.42 | 97 | 3.29 | 18.38 | 12.39 | 12.31 | 11.63
11H083 | 76 (2) | R882H | Mut | 1.27 | 0 | 34.54 | 38.79 | 28.14 | 11.14 | 13.40
06H144 | 73 (3) | H355fs | Mut | 0.26 | 2 | 3.69 | 38.61 | 11.77 | 6.25 | 3.44
05H163 | 72 (4) | - | Wt | 232.25 | 96 | 90.97 | 37.65 | 28.20 | 13.59 | 10.22
04H112 | 70 (6) | R882C | Mut | 45.44 | 36 | 25.12 | 201.12 | 26.52 | 12.54 | 34.84
04H133 | 70 (4) | G674S | Mut | 0.43 | 0 | 40.31 | 43.99 | 40.74 | 17.43 | 11.22
11H006 | 70 (1) | R882H | Mut | 0.43 | 0 | 29.11 | 4.87 | 30.90 | 14.17 | 19.86
10H101 | 67 (5) | R882C, S | Mut | 1.42 | 1 | 10.08 | 43.46 | 19.21 | 9.87 | 9.71
11H009 | 67 (2) | R882C | Wt | 9.40 | 6 | 13.45 | 56.17 | 24.87 | 5.33 | 11.60
10H166 | 67 (1) | R882C | Mut | 5.87 | 1 | 18.86 | 27.83 | 34.10 | 9.27 | 17.29
10H092 | 59 (3) | R882H | Mut | 1.12 | 0 | 33.85 | 54.46 | 26.44 | 13.05 | 9.81
11H160 | 58 (5) | - | Wt | 89.12 | 56 | 46.74 | 19.54 | 36.55 | 8.54 | 5.80
09H031 | 58 (4) | - | Mut | 18.40 | 10 | 24.43 | 102.37 | 31.98 | 8.33 | 8.42
05H149 | 56 (4) | R882C | Wt | 267.03 | 98 | 10.10 | 44.47 | 16.00 | 18.19 | 6.43
08H048 | 52 (3) | - | Wt | 39.40 | 41 | 170.24 | 47.28 | 52.73 | 20.07 | 8.27

Abbreviations: FACS, fluorescence-activated cell sorting; Mut, mutant; NK-AML, normal karyotype acute myeloid leukemia; Wt, wild type. Mutations in genes are noted either by specific residue locations or with an S (substitution) where locations are not specified. Frameshift mutations are denoted as 'fs' while nonsense mutations are denoted with a '*'. RPKM values for DNMT1, DNMT3A, DNMT3B and DNMT3L in NK-AML patients are shown. NPM1 mutational status along with RPKM values for GATA2, CD34 and CD38 are also shown along with CD34 FACS staining results. Samples above the threshold of 85% allelic skewing (where DNMT3A R882 mutations or nonsense mutations are not seen) or below are represented in italics or bold, respectively.
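The expression values in Table 1 are given as RPKM. As a reminder of what that unit measures, here is a minimal Python sketch of the standard RPKM formula: reads mapped to a gene, scaled per kilobase of exon model and per million mapped reads. It is not the Casava implementation used in the study, and the function name and the numbers in the example are hypothetical.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = gene_reads / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and library size must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads over a 2 kb exon model in a 50-million-read library.
example_rpkm = rpkm(gene_reads=500, exon_length_bp=2_000,
                    total_mapped_reads=50_000_000)
print(example_rpkm)
```

Because the value is normalized for both gene length and library size, the per-patient columns of the table can be compared across samples sequenced to different depths.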

GATA2 and compared these levels for patients with either wild-type DNMT3A or R882 mutations. We saw a statistically significant lower level of heterozygosity in patients with DNMT3A mutations (Figure 3a), indicating DNMT3A activity can influence DNA methylation at the GATA2 locus. The reduced allelic skewing for GATA2, which we detect in our AML patients with DNMT3A mutations, was also reflected in their RNA-seq data (Supplementary Figure 2) and could also be validated through standard sequencing (Supplementary Figure 3). We also identified mutations at other locations in DNMT3A as previously reported,2 although it remains unclear whether any of these have the same functional consequences as the R882 mutation. Other mutations likely to lead to a loss of function (for example, nonsense and frameshift) were not seen in patients with high levels (>80%) of ASE.

NK-AML DNMT3AMut patients show reduced GATA2 DNA methylation
In order to examine the influence of DNMT3A on DNA methylation at the GATA2 locus, we then used publicly available data from the TCGA Illumina DNA methylation arrays to examine the GATA2 locus in NK-AML patients. Using similar constraints for the inclusion of samples (minimum blast percentage (>50%), lack of other cytogenetic abnormalities) we compared DNA methylation patterns for 22 DNMT3A wild-type patients and 7 R882 mutants (Figure 3b). Interestingly, we observed statistically significant lower levels of DNA methylation in and around a defined CpG island in the GATA2 promoter region in patients with R882 mutations. At this same level of significance (P<0.001), we did not see differences in a variety of other expressed or imprinted genes examined except for ESRP2 (Supplementary Figure 4), which had been previously described in DNMT3A mutants.2

DNMT3A mutations do not affect ASE of imprinted genes
Our observations, along with DNA methylation data from NK-AML patients, suggest that one consequence of DNMT3A mutations might be the loss of DNA methylation at the GATA2 locus. This hypothesis is supported by published genome-wide studies of Dnmt3a-null mice that showed the GATA2 promoter region was


Figure 2. Allelic imbalance linked to hypermethylation of GATA2 promoter. (a) A cartoon representation of the 5′ end of the human GATA2 gene (transcribed from right to left) is shown. A heterozygous SNP in the 5′ untranslated region of GATA2 (chr3: 128 206 618), which displays an allelic imbalance, was used to assess DNA methylation status. (b) Bisulfite genomic DNA sequencing analysis of CG-SH cells with and without DAC treatment. Black and white circles represent methylated and unmethylated CpG positions, respectively, within a ~300-bp region of a heterozygous SNP that were determined after treatment, PCR amplification and sequencing. The expressed allele of GATA2 in CG-SH contains a thymine nucleotide at this SNP compared with a guanine in the silent allele. (c) RT-PCR products amplified with primers for the full CDS of GATA2 from CG-SH cells treated without and with DAC (0.1 mM for 3 days) and visualized by agarose gel electrophoresis (0.8% agarose). The smaller band indicated by a * corresponds to the shorter wild-type transcript of GATA2 (lacking the last exon insertion), which is induced after DAC treatment and hypomethylation of the locus. All bands were extracted and sequenced to verify identity.
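The per-allele comparison in panel (b) comes down to grouping bisulfite clones by the base found at the heterozygous SNP and counting methylated CpG positions within each group. A minimal Python sketch of that tally follows. The clone records are hypothetical, and the representation (one string of 'M' or 'U' calls per clone) is an assumption made for illustration rather than the authors' pipeline.

```python
from collections import defaultdict


def methylation_by_allele(clones):
    """Return the fraction of methylated CpG calls for each SNP allele.

    clones: iterable of (snp_base, cpg_calls) pairs, where cpg_calls is a
    string with one character per CpG position: 'M' methylated, 'U' not.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for snp_base, cpg_calls in clones:
        methylated[snp_base] += cpg_calls.count("M")
        total[snp_base] += len(cpg_calls)
    # Skip alleles with no CpG calls to avoid dividing by zero.
    return {base: methylated[base] / total[base]
            for base in total if total[base] > 0}


# Hypothetical clones: T marks the expressed allele, G the silent allele.
clones = [
    ("T", "UUUUU"),
    ("T", "UUMUU"),
    ("G", "MMMMM"),
    ("G", "MMUMM"),
]
fractions = methylation_by_allele(clones)
print(fractions)
```

Running the same tally on clones from untreated and DAC-treated cells would give the before and after methylation levels for the silent allele that the panel displays as filled and open circles.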

Figure 3. Mutational status of DNMT3A has an impact on the methylation level and regulation of allele-specific expression. (a) DNMT3A-R882 mutations are associated with a lower bias of allele-specific expression. A box plot is shown of the average major allele usage for informative patients with or without DNMT3A R882 mutations. The distributions of allele usage at all heterozygous sites in both groups (see Table 1) were used for a Welch two-sample t-test, which indicates that there is a statistically significant difference (P<0.02) between the extent of the allelic bias seen in patients with or without the R882 mutation. (b) Differential methylation levels in the GATA2 gene locus between TCGA normal karyotype AML patients. Each point represents the average signal of 7 (DNMT3A R882 mutant) or 22 (wild-type) probes on an Infinium Methylation 450 array. In the promoter region of GATA2, the methylation rate is significantly lower in the case of DNMT3A mutants (P<0.001, marked with *). GATA2 exons are shown as green rectangles, the translational start and direction are marked with an arrow, and CpG islands defined in UCSC are shown.
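The Welch two-sample t-test named in panel (a) compares two group means without assuming equal variances. The following minimal Python sketch computes the Welch statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The allele-usage values are hypothetical and are not taken from the study; converting the statistic to a P-value requires a t-distribution, which this sketch leaves out.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical major-allele usage (%) for two patient groups.
without_r882 = [99, 97, 95, 90, 88, 84]
with_r882 = [83, 76, 70, 67, 59, 56]
t_stat, dof = welch_t(without_r882, with_r882)
print(t_stat, dof)
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the P-value.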

one of the most strongly hypomethylated loci in hematopoietic stem cells of these mice.4 How DNMT3A could be specifically targeted to the GATA2 locus and why its effects would be so limited are not clear, although this is consistent with extensive initial characterizations of DNMT3A mutations, which did not show bulk changes in DNA methylation levels or methylation differences in large sets of genes in genome-wide analyses.2 In our DNMT3A mutant patients, we found no skewed ASE of heterozygous sites within classical imprinted genes (for example, H19) (Supplementary Figure 5a) despite the fact that we could also reverse the ASE of these genes with DAC treatment as well (Supplementary Figure 5b). These results together suggest that the transcriptional regulation of individual alleles of GATA2 in AML is different than other classical examples of allele-specific expression.
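Skewed allele usage at a heterozygous site can be quantified, for example, by the fraction of RNA-seq reads that carry the major allele, combined with an exact binomial test against balanced expression. The sketch below assumes this approach; the read counts, the thresholds (`min_fraction`, `alpha`) and the function names are hypothetical and do not reproduce the pipeline used in this study.

```python
from math import comb


def major_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads from the more abundant allele (needs >= 1 read)."""
    return max(ref_reads, alt_reads) / (ref_reads + alt_reads)


def binomial_two_sided_p(ref_reads, alt_reads):
    """Exact two-sided binomial test against balanced (p = 0.5) expression."""
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


def is_skewed(ref_reads, alt_reads, min_fraction=0.8, alpha=0.01):
    """Flag a site only if the skew is both large and statistically supported."""
    return (
        major_allele_fraction(ref_reads, alt_reads) >= min_fraction
        and binomial_two_sided_p(ref_reads, alt_reads) < alpha
    )


# Hypothetical (reference, alternate) RNA-seq read counts per heterozygous site.
sites = {"site_1": (48, 2), "site_2": (27, 23)}
skewed = [name for name, (ref, alt) in sites.items() if is_skewed(ref, alt)]
```

Requiring both an effect-size cutoff and a significance cutoff is a design choice of this sketch: it avoids calling skew at deeply covered sites where a small imbalance is significant but biologically minor.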

Allelic skewing of GATA2 impacts overall expression level and known downstream targets

If DNMT3A activity regulates the usage of one or both alleles of GATA2, we reasoned that there should be a difference in the expression level in patients with or without mutations in DNMT3A. We therefore examined gene expression of GATA2 in patients with and without DNMT3A mutations and found that patients with DNMT3A mutations (including R882) did have an almost twofold higher average expression level.

We then examined potential effects on downstream GATA2 targets by ranking average gene expression in our AML patients based on their correlation with GATA2 expression. We observed that one of the most positively correlated genes was Wilms tumor 1 (WT1), which is recurrently mutated in AML,23 whereas one of the genes most negatively correlated with GATA2 expression was peroxisome proliferator-activated receptor gamma (PPARG), which regulates the differentiation of adipocytes24 (Figure 4b). Because both WT1 (ref. 25) and PPARG (ref. 26) have been shown to be direct targets of GATA2, the list of correlated genes likely contains other GATA2 targets of relevance in AML. To generate a more comprehensive view of the transcriptional network, a list of all genes ranked by their correlation with GATA2 expression was used to create non-redundant positive- (Supplementary Figure 6a) and negative- (Supplementary Figure 6b) correlation networks. Within the positive network we identified genes implicated in biological processes critical to AML such as cell cycle control (for example, TFDP2; ref. 27), hematopoietic differentiation (for example, CEBPB; ref. 28) or otherwise reported to be involved in AML (TIE1; ref. 29). Interestingly, a second DNA methyltransferase, DNMT3B, is also part of the positive network along with MSI2, which has recently been implicated in regulating hematopoietic stem cell activity.30 Gene set enrichment analysis of GATA2 correlated genes also identified enriched signatures consistent with leukemic growth and underrepresented signatures associated with markers of myeloid differentiation

Figure 4. Allele-specific expression mechanism involved in expression level of GATA2 regulatory network. (a) Gene expression correlation between GATA2 and WT1 or PPARG. For normal karyotype adult AML patients, the expression (RPKM value from RNA-seq analysis) of WT1 or PPARG is highly positively or negatively correlated, respectively, with the expression of GATA2. Pearson correlation coefficients for each pair of genes are shown. (b) Gene set enrichment analysis (GSEA) conducted on rank-ordered gene lists based on correlation with GATA2. The top panels show that the genes whose expression is correlated with GATA2 are significantly enriched in MLL, NPM1 and NUMA1-associated signatures, whereas anticorrelated genes show matches to signatures associated with markers of myeloid differentiation (CD33, CD14) or genes that can antagonize GATA2 activity (SPI1).
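The rank-ordered gene list that feeds the correlation plots and GSEA can be sketched as follows: compute the Pearson correlation coefficient between GATA2 and every other gene across patients, then sort by that coefficient. This is a minimal illustration only; the gene names `GENE_A` to `GENE_C`, the RPKM values and the helper names are hypothetical and are not taken from the study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_by_correlation(expression, anchor="GATA2"):
    """Return (gene, r) pairs sorted from most positive to most negative r."""
    reference = expression[anchor]
    scores = {
        gene: pearson(reference, values)
        for gene, values in expression.items()
        if gene != anchor
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical RPKM values; each list position is one patient.
expression = {
    "GATA2": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.1, 6.2, 7.9],  # rises with GATA2
    "GENE_B": [5.0, 5.1, 4.9, 5.0],  # essentially flat
    "GENE_C": [8.0, 6.0, 4.5, 2.0],  # falls as GATA2 rises
}
ranked = rank_by_correlation(expression)
```

The head of `ranked` corresponds to candidates for the positive-correlation network and the tail to candidates for the negative one; the full ordered list is the kind of input a preranked enrichment analysis expects.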

(Figure 4c). We further examined the transcription factor-binding sites that were enriched within the networks (Supplementary Table 4) or subnetworks (Supplementary Figure 7), which again highlighted a number of genes known to be relevant in AML. Taken together, these results suggest that the disruption of normal GATA2 expression levels could potentially alter a critical balance of this cell fate regulator with consequences for leukemic growth.

Known missense SNPs in GATA2 are loss of function mutations

Our data from CG-SH cells demonstrate that the combination of mutations and ASE can create the equivalent of homozygous loss of function mutations of GATA2. As human and mouse GATA2 are very highly conserved31 (98% identical at the protein level), we hypothesized that other missense mutations, including known SNPs, might also behave as pathogenic mutations. Software tools such as SIFT (ref. 13) and PolyPhen (ref. 14) predict that a majority of known GATA2 missense SNPs would deleteriously affect protein function (Supplementary Table 5). We selected known SNPs to validate some of these predictions, including those jointly predicted to be damaging by SIFT and PolyPhen (three plus two novel mutations (in CG-SH and patient 04H112)), benign (two) or which had conflicting predictions (two). We then individually introduced these variants into a GATA2 cDNA that matched the hg19 reference sequence exactly, which was already in an expression vector. These were then coexpressed with a published LYL1-luciferase promoter construct.9 As expected, the promoter assays demonstrated that the rearranged and mutated form of GATA2 from CG-SH had essentially a complete loss of activity compared with WT GATA2 (Figure 5a). Remarkably, however, two of the three putatively damaging sequence variants (including a known SNP present in one of our NK-AML patients) also resulted in almost the same complete loss of function. Interestingly, although many of the putative 'benign' SNPs had no effect, one of the known SNPs with conflicting predictions actually showed increased activity (~50%) over wild-type GATA2. Using the simultaneous coexpression of SPI1/PU.1, we again saw a significant loss of function, in this case for all three 'damaging' SNPs along with the novel mutation in one patient and the CG-SH GATA2 cDNA (Figure 5b). These results show that a variety of known missense variants, in different regions of the GATA2 gene, are sufficient to markedly alter its activity in vitro.

DISCUSSION

The underlying genetics of AML has been the subject of intense investigation for many years. Recent whole-genome sequencing studies on NK-AML patient samples have identified a number of recurrent (DNMT3A, IDH1 and IDH2; refs 2,32–35) and nonrecurrent somatic mutations, demonstrating the very high level of genetic heterogeneity seen in AML patients.1,2 In addition, the precise molecular mechanisms of the mutations published are unclear, as a large number of them are present as heterozygous changes in patients. The explanation generally proposed is that these mutations result in either haploinsufficiency or a dominant negative effect for the protein. This was proposed, for example, for the role of familial GATA2 mutations in AML.36 A previous transcriptional analysis highlighted the downregulation of GATA2 as a crucial step in leukemia progression.37 Furthermore, the enforced expression of GATA2 in a leukemic cell line showed significantly reduced growth, indicating that a high expression level of GATA2 acts in opposition to the maintenance of leukemic growth.37 Using a novel AML cell line with a normal karyotype, we show that the ASE of GATA2 in combination with a heterozygous mutation can produce the functional equivalent of a homozygous null mutation. This observation is also supported by a recent report in which three MonoMAC patients had reduced or absent transcription from one allele at the GATA2 locus38 leading to a reduction in GATA2 transcript levels. ASE of GATA2 is therefore not restricted to AML but is involved in other disease phenotypes.

To survey the extent of ASE in the CG-SH cell line, we performed a genome-wide analysis, comparing heterozygous positions identified in whole-genome DNA sequencing to RNA-seq data, to identify where expressed alleles showed skewing. In contrast to other studies of ASE, which showed the existence of a large number of genes subject to parent of origin effects,39 our observations suggest a surprisingly limited extent of ASE in the CG-SH cells. Aside from three previously known examples of imprinted genes21 that showed ASE in CG-SH cells or NK-AML patient specimens, only a handful of other candidate genes

Figure 5. Loss-of-function mutations in GATA2. (a) Transcriptional activity of wild-type (WT) and mutated forms of GATA2. Luciferase activity for various GATA2 cDNA expression vectors containing putatively damaging missense SNPs is shown. The predicted consequences of SNPs are indicated (D, damaging; B, benign and U, unknown) below each variant tested. Where the results of the two prediction algorithms (SIFT and PolyPhen) are conflicting, the letter 'C' is indicated. Reporter experiments were performed in triplicate. Statistically significant differences for pairwise comparisons with WT are shown for both a and b (***P<0.001, **P<0.01, *P<0.05; n = 3). (b) Transcriptional activity of GATA2 WT and mutated forms in conjunction with PU.1 transactivation via the CSF1R promoter. Triplicate luciferase assays were also performed using a mix of wild-type and mutated forms of GATA2. For further details see Materials and Methods.

could be detected in CG-SH. Although the number of genes exhibiting ASE is less than might be expected, there remains some controversy regarding the global extent of ASE.40 Given our manual inspection and filtering of putative ASE locations for duplicated regions within the human genome, we do not believe that bona fide candidate genes were excluded as a result.

Because of the known role of DNA methylation in establishing and maintaining ASE, and given the recurrent mutations in AML of the DNA methyltransferase enzyme DNMT3A,2 we examined these enzymes in AML samples. Using the CG-SH cell line, we show that following treatment with inhibitors of DNA methylation, the allele silencing at the level of the GATA2 promoter can be reversed, along with ASE at the classically imprinted gene H19 (Supplementary Figure 5b). We also demonstrate that there is a mutational bias in DNMT3A in the AML patients of our cohort who exhibit allelic skewing, something which we do not see for any other DNA methyltransferases. In examining published TCGA AML patient data, we see a significant difference in the level of DNA methylation at the GATA2 locus between patients with the most frequent R882 mutation in DNMT3A and patients without this mutation. Finally, recently published genome-wide data from Dnmt3a knockout mice identified the Gata2 locus as one of the most differentially hypomethylated loci.4 Taken together, these data strongly implicate DNMT3A in the methylation changes involved in the GATA2 allele silencing, although it is not yet clear how or when (during development) DNMT3A might be targeted (or mis-targeted) to the GATA2 locus.

The precise molecular mechanisms responsible for establishing the ASE of GATA2 observed are unclear. A recent report has suggested that mutations in an enhancer element in intron 5 of the GATA2 gene could be responsible for allelic imbalance. For five AML patient samples with high levels of GATA2 allelic imbalance in our cohort (03H119, 09H113, 06H028, 11H142 and 11H095), we did not observe any mutations in this region (data not shown). Therefore, despite the demonstration of the function of this enhancer in vivo, its connection to the ASE of GATA2 that we observed is still not completely defined.

With respect to protein activity, we demonstrate that known missense SNPs in GATA2 (which are typically ignored in the analysis of AML patients) can abrogate its activity, indicating that the effective mutation rate of GATA2 in AML is higher than previously appreciated.41 Importantly, although the population frequency of the missense SNPs we have examined is quite low (for example, 0.00022 for rs148942346 in the NHLBI-ESP cohort populations), this is still roughly five times higher (22 versus 4.5 per 100 000) than the overall incidence of AML in North America according to SEER statistics (4.5 per 100 000 for males). This suggests that even if uncommon, the presence of such a damaging SNP could potentially increase the risk of developing AML.

Given that we do not observe ASE of GATA2 in a variety of normal cells, this mode of regulation appears to be linked to the biology of leukemia. Allele-specific expression of GATA2 cannot be essential for AML growth, however, since not all of our NK-AML samples show this effect. It is possible that the establishment of aberrant DNA methylation at the GATA2 locus could be a rare stochastic event, occurring during the differentiation of myeloid progenitor cells, potentially leading to pre-leukemic growth. Such a scenario, although speculative, would be consistent with models of tumor evolution in AML.42,43 Further experimental work will therefore be required to define the origin of this phenomenon.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENTS

We would like to thank Jana Krosl and Josette-Renée Landry for helpful discussions and comments on the manuscript and Manish Goel, Patrick Gendron, Pierre Chagnon, Marianne Arteau and Raphaëlle Lambert for excellent technical assistance. We also wish to thank Dr Hamish Scott and the Louisiana State University Health Sciences Center at Shreveport for the generous donation of reagents and we acknowledge the TCGA for making data from their project publicly available. This research was supported by funding from the Cole Foundation and FRSQ (BTW and MC), NSERC (BTW) and Genome Quebec (BTW, SL, JH and GS). Support from the BCLQ is also gratefully acknowledged and we also wish to acknowledge the contribution of all of the courageous patients who provided samples used in this study. Correspondence and requests for materials should be addressed to BTW ([email protected]).

AUTHOR CONTRIBUTIONS

BTW designed and supervised the experimental work with help from MC while MC and AF performed the experimental work presented; GG, MC and BTW analyzed the data with help from SL; MC and GG drafted the paper with revisions by BTW, GS, JH and SL.

REFERENCES

1 Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med 2009; 361: 1058–1066.
2 Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med 2010; 363: 2424–2433.
3 Okano M, Xie S, Li E. Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nat Genet 1998; 19: 219–220.
4 Challen GA, Sun D, Jeong M, Luo M, Jelinek J, Berg JS et al. Dnmt3a is essential for hematopoietic stem cell differentiation. Nat Genet 2011; 44: 23–31.
5 Munker R, Nordberg ML, Veillon D, Williams BJ, Roggero A, Kern W et al. Characterization of a new myeloid leukemia cell line with normal karyotype (CG-SH). Leukemia Res 2009; 33: 1405–1408.
6 Greif PA, Dufour A, Konstandin NP, Ksienzyk B, Zellmeier E, Tizazu B et al. GATA2 zinc finger 1 mutations associated with biallelic CEBPA mutations define a unique genetic entity of acute myeloid leukemia. Blood 2012; 120: 395–403.
7 Tsai F-Y, Keller G, Kuo FC, Weiss M, Chen J, Rosenblatt M et al. An early haematopoietic defect in mice lacking the transcription factor GATA-2. Nature 1994; 371: 221.
8 Tsai F-Y, Orkin SH. Transcription factor GATA-2 is required for proliferation/survival of early hematopoietic cells and mast cell formation, but not for erythroid and myeloid terminal differentiation. Blood 1997; 89: 3636–3643.
9 Hahn CN, Chong C-E, Carmichael CL, Wilkins EJ, Brautigan PJ, Li X-C et al. Heritable GATA2 mutations associated with familial myelodysplastic syndrome and acute myeloid leukemia. Nat Genet 2011; 43: 1012–1017.
10 Kaneda M, Okano M, Hata K, Sado T, Tsujimoto N, Li E et al. Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature 2004; 429: 900–903.
11 Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW. Allelic variation in human gene expression. Science 2002; 297: 1143–1143.
12 Heap GA, Yang JH, Downes K, Healy BC, Hunt KA, Bockett N et al. Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum Mol Genet 2010; 19: 122–134.
13 Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res 2001; 11: 863–874.
14 Sunyaev S, Ramensky V, Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet 2000; 16: 198–200.
15 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009; 25: 2078–2079.
16 Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G et al. Integrative genomics viewer. Nat Biotechnol 2011; 29: 24–26.
17 Bastian M, Heymann S, Jacomy M. Gephi: An Open Source Software for Exploring and Manipulating Networks. International AAAI Conference on Weblogs and Social Media. AAAI Press: Menlo Park, CA, USA, 2009.
18 Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech 2008; 2008: P10008.
19 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005; 102: 15545–15550.
20 McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 2010; 26: 2069–2070.
21 Benetatos L, Hatzimichael E, Dasoula A, Dranitsaris G, Tsiara S, Syrrou M et al. CpG methylation analysis of the MEG3 and SNRPN imprinted genes in acute myeloid leukemia and myelodysplastic syndromes. Leukemia Res 2010; 34: 148–153.
22 Chen T, Ueda Y, Xie S, Li E. A novel Dnmt3a isoform produced from an alternative promoter localizes to euchromatin and its expression correlates with active de novo methylation. J Biol Chem 2002; 277: 38746–38754.

23 King-Underwood L, Pritchard-Jones K. Wilms' tumor (WT1) gene mutations occur mainly in acute myeloid leukemia and may confer drug resistance. Blood 1998; 91: 2961–2968.
24 Tong Q, Tsai J, Tan G, Dalgin G, Hotamisligil GS. Interaction between GATA and the C/EBP family of transcription factors is critical in GATA-mediated suppression of adipocyte differentiation. Mol Cell Biol 2005; 25: 706–715.
25 Furuhata A, Murakami M, Ito H, Gao S, Yoshida K, Sobue S et al. GATA-1 and GATA-2 binding to 3' enhancer of WT1 gene is essential for its transcription in acute leukemia and solid tumor cell lines. Leukemia 2009; 23: 1270–1277.
26 Tong Q, Dalgin G, Xu H, Ting C-N, Leiden JM, Hotamisligil GS. Function of GATA transcription factors in preadipocyte-adipocyte transition. Science 2000; 290: 134–138.
27 Zhang Y, Chellappan SP. Cloning and characterization of human DP2, a novel dimerization partner of E2F. Oncogene 1995; 10: 2085.
28 Scott L, Civin C, Rorth P, Friedman A. A novel temporal expression pattern of three C/EBP family members in differentiating myelomonocytic cells. Blood 1992; 80: 1725–1735.
29 Verstovsek S, Estey E, Manshouri T, Keating M, Kantarjian H, Giles FJ et al. High expression of the receptor tyrosine kinase Tie-1 in acute myeloid leukemia and myelodysplastic syndrome. Leuk Lymphoma 2001; 42: 511–516.
30 Hope KJ, Cellot S, Ting SB, MacRae T, Mayotte N, Iscove NN et al. An RNAi screen identifies Msi2 and Prox1 as having opposite roles in the regulation of hematopoietic stem cell activity. Cell Stem Cell 2010; 7: 101–113.
31 Lowry JA, Atchley WR. Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain. J Mol Evol 2000; 50: 103–115.
32 Verhaak RGW, Goudswaard CS, van Putten W, Bijl MA, Sanders MA, Hugens W et al. Mutations in nucleophosmin (NPM1) in acute myeloid leukemia (AML): association with other gene abnormalities and previously established gene expression signatures and their favorable prognostic significance. Blood 2005; 106: 3747–3754.
33 Nakao M, Yokota S, Iwai T, Kaneko H, Horiike S, Kashima K et al. Internal tandem duplication of the flt3 gene found in acute myeloid leukemia. Leukemia 1996; 10: 1911.
34 Abdel-Wahab O, Mullally A, Hedvat C, Garcia-Manero G, Patel J, Wadleigh M et al. Genetic characterization of TET1, TET2, and TET3 alterations in myeloid malignancies. Blood 2009; 114: 144–147.
35 Dang L, Jin S, Su SM. IDH mutations in glioma and acute myeloid leukemia. Trends Mol Med 2010; 16: 387–397.
36 Gimelbrant A, Hutchinson JN, Thompson BR, Chess A. Widespread monoallelic expression on human autosomes. Science 2007; 318: 1136–1140.
37 Bonadies N, Foster SD, Chan W-I, Kvinlaug BT, Spensberger D, Dawson MA et al. Genome-wide analysis of transcriptional reprogramming in mouse models of acute myeloid leukaemia. PloS One 2011; 6: e16330.
38 Hsu AP, Johnson KD, Falcone EL, Sanalkumar R, Sanchez L, Hickstein DD et al. GATA2 haploinsufficiency caused by mutations in a conserved intronic element leads to MonoMAC syndrome. Blood 2013; 121: 3830–3837.
39 Gregg C, Zhang J, Weissbourd B, Luo S, Schroth GP, Haig D et al. High-resolution analysis of parent-of-origin allelic expression in the mouse brain. Science 2010; 329: 643–648.
40 DeVeale B, van der Kooy D, Babak T. Critical evaluation of imprinted gene expression by RNA-Seq: a new perspective. PLoS Genet 2012; 8: e1002600.
41 Fasan A, Eder C, Haferlach C, Grossmann V, Kohlmann A, Dicker F et al. GATA2 mutations are frequent in intermediate-risk karyotype AML with biallelic CEBPA mutations and are associated with favorable prognosis. Leukemia 2012; 27: 482–485.
42 Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 2012; 481: 506–510.
43 Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K et al. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med 2012; 366: 1090–1098.

Supplementary Information accompanies this paper on the Leukemia website (http://www.nature.com/leu)
