Acquired copy number alterations in adult acute myeloid leukemia genomes

Matthew J. Walter^{a,b,c,1,2}, Jacqueline E. Payton^{d,1}, Rhonda E. Ries^{a,b,1}, William D. Shannon^{a}, Hrishikesh Deshmukh^{d}, Yu Zhao^{a,b}, Jack Baty^{e}, Sharon Heath^{a,b}, Peter Westervelt^{a,b,c}, Mark A. Watson^{c,d}, Michael H. Tomasson^{a,b,c}, Rakesh Nagarajan^{c,d}, Brian P. O’Gara^{a,b}, Clara D. Bloomfield^{f,g}, Krzysztof Mrózek^{f,g}, Rebecca R. Selzer^{h}, Todd A. Richmond^{h}, Jacob Kitzman^{h}, Joel Geoghegan^{h}, Peggy S. Eis^{h}, Rachel Maupin^{i}, Robert S. Fulton^{i}, Michael McLellan^{i}, Richard K. Wilson^{i}, Elaine R. Mardis^{i}, Daniel C. Link^{a,b,c}, Timothy A. Graubert^{a,b,c}, John F. DiPersio^{a,b,c}, and Timothy J. Ley^{a,b,c}

^{a}Department of Medicine, ^{b}Division of Oncology, ^{c}Siteman Cancer Center, ^{d}Department of Pathology and Immunology, and ^{e}Division of Biostatistics, Washington University School of Medicine, St. Louis, MO 63110; ^{f}Division of Hematology and Oncology, Department of Medicine, Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210; ^{g}Cancer and Leukemia Group B, Chicago, IL 60601; ^{h}Roche NimbleGen, Inc., Madison, WI 53719; and ^{i}The Genome Center, Washington University School of Medicine, St. Louis, MO 63110

Edited by Janet D. Rowley, University of Chicago Medical Center, Chicago, IL, and approved May 18, 2009 (received for review March 23, 2009)

Cytogenetic analysis of acute myeloid leukemia (AML) cells has accelerated the identification of genes important for AML pathogenesis. To complement cytogenetic studies and to identify genes altered in AML genomes, we performed genome-wide copy number analysis with paired normal and tumor DNA obtained from 86 adult patients with de novo AML using 1.85 million feature SNP arrays. Acquired copy number alterations (CNAs) were confirmed using an ultra-dense array comparative genomic hybridization platform. A total of 201 somatic CNAs were found in the 86 AML genomes (mean, 2.34 CNAs per genome), with French-American-British system M6 and M7 genomes containing the most changes (10–29 CNAs per genome). Twenty-four percent of AML patients with normal cytogenetics had CNA, whereas 40% of patients with an abnormal karyotype had additional CNA detected by SNP array, and several CNA regions were recurrent. The mRNA expression levels of 57 genes were significantly altered in 27 of 50 recurrent CNA regions <5 megabases in size. A total of 8 uniparental disomy (UPD) segments were identified in the 86 genomes; 6 of 8 UPD calls occurred in samples with a normal karyotype. Collectively, 34 of 86 AML genomes (40%) contained alterations not found with cytogenetics, and 98% of these regions contained genes. Of 86 genomes, 43 (50%) had no CNA or UPD at this level of resolution. In this study of 86 adult AML genomes, the use of an unbiased high-resolution genomic screen identified many genes not previously implicated in AML that may be relevant for pathogenesis, along with many known oncogenes and tumor suppressor genes.

AML | array CGH | genomics | SNP array

Author contributions: M.J.W., J.E.P., R.E.R., and T.J.L. designed research; M.J.W., J.E.P., R.E.R., R.R.S., T.A.R., J.K., J.G., P.S.E., R.M., R.S.F., and M.M. performed research; R.R.S., T.A.R., J.K., J.G., P.S.E., R.M., R.S.F., and M.M. contributed new reagents/analytic tools; M.J.W., J.E.P., R.E.R., W.D.S., H.D., Y.Z., J.B., S.H., P.W., M.A.W., M.H.T., R.N., B.P.O., C.D.B., K.M., R.R.S., T.A.R., J.K., J.G., P.S.E., R.M., R.S.F., M.M., R.K.W., E.R.M., D.C.L., T.A.G., J.F.D., and T.J.L. analyzed data; and M.J.W., J.E.P., R.E.R., and T.J.L. wrote the paper.

Conflict of interest: R.R.S., T.A.R., J.G., and J.K. are employees of Roche NimbleGen, Inc., which supplied the arrays and hybridization services for the research.

Freely available online through the PNAS open access option.

^{1}M.J.W., J.E.P., and R.E.R. contributed equally to this work.

^{2}To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0903091106/DCSupplemental.

MEDICAL SCIENCES

12950–12955 | PNAS | August 4, 2009 | vol. 106 | no. 31 | www.pnas.org/cgi/doi/10.1073/pnas.0903091106

Acute myeloid leukemia (AML) is a heterogeneous group of diseases currently classified by abnormalities in bone marrow morphology, karyotype, acquired gene mutations, and alterations in gene expression (1–3). Although the identification of specific gene mutations has resulted in improved treatments and outcomes for some AML patients (4), enormous clinical heterogeneity exists and may reflect the presence of as-yet undetected initiating and cooperating mutations. Therefore, the discovery of somatic mutations in the genomes of AML patients with normal and abnormal karyotypes will advance our understanding of the genetics underlying AML and should lead to more specific therapies and better patient classification schemes.

The discovery of previously uncharacterized genes mutated in acute lymphoblastic leukemia (ALL) was recently reported using SNP array technology for DNA copy number analysis (5). SNP array platforms can detect genomic amplifications, deletions, SNP loss of heterozygosity (LOH), and regions of uniparental disomy (UPD) (copy-neutral LOH events) in cancer cells. Early studies using SNP arrays and array comparative genomic hybridization (CGH) platforms have suggested that both copy number alterations (CNAs) and UPD are common in AML genomes (6–12). However, these studies used low-resolution arrays, often used reference DNA that was not obtained from the same patient’s normal cells, and did not routinely validate copy number changes with independent platforms. These limitations made it difficult to distinguish between acquired (somatic) CNA and inherited copy number variants (CNVs) that exist in all individuals; furthermore, secondary validation methods are required to distinguish between true events and false-positive findings, which are extremely common using the current platforms. To overcome these limitations and to definitively identify genes that are somatically altered in AML genomes, we used the Affymetrix Genome-Wide Human SNP Array 6.0 platform (containing 1.85 million probes, median interprobe spacing 680 bp) to screen paired tumor and normal DNA samples obtained from 86 adult patients with de novo AML, and validated putative CNA using an independent, ultra-dense custom Roche NimbleGen CGH 12 × 135K array (median interprobe spacing 245 bp). We identified a mean of 2.34 CNAs per genome, and 76% of the CNAs involved a known cancer-related gene. We identified 50 recurrent CNAs <5 megabases (Mb) in size in the 86 genomes, and 32 of these 50 regions contained genes not previously implicated in AML. UPD was more common in normal karyotype samples. Fifty percent of the AML genomes tested in this study had no detectable CNAs or UPD, indicating that other approaches, including whole-genome sequencing, may be required to discover the remaining genetic changes that contribute to AML pathogenesis.

Results

Patient Characteristics. A total of 86 adult patients (aged >18 years) with de novo AML were chosen for study on the basis of the availability of high-quality, abundant, paired bone marrow (tumor) and skin (normal) DNA samples. Paired samples allowed us to distinguish acquired CNA from inherited CNV. Cases were classified in accordance with the French-American-British (FAB) system upon diagnosis and banking of their bone marrow specimens. The patients include FAB M0–M7, with a median blast count of 64% (range, 30–100%) [supporting information (SI) Table S1 and Table S2].

Acquired CNA. We identified 201 acquired CNAs in the 86 AML genomes using the SNP arrays (Fig. 1 and Table S1 and Table S2). The 201 CNAs occurred in 38 of 86 AML genomes, spanned from 35 kb (34 probes) to 250 Mb (146,524 probes) in size (median, 9.15 Mb), and involved every chromosome at least once. There was a mean of 2.34 CNAs per AML genome (range, 0–30; median, 0), and deletions were more common than amplifications (1.23:1). The 201 CNAs were distributed across all FAB subtypes, with M6 and M7 subtypes containing significantly more CNAs per genome compared with all other subtypes (mean 21 vs. 1.4, respectively; P = 0.02) (Table S1).

Fig. 1. Copy number and UPD heatmap for 86 AML genomes. The results of copy number and UPD (copy-neutral LOH) analysis of 86 paired tumor and normal DNA samples assayed on the Affymetrix Genome-Wide SNP 6.0 arrays are shown. Each of the 86 genomes is represented by 2 columns: copy number, as the log2 ratio of tumor/normal DNA, is shown on the left and UPD on the right. Copy number is designated by a color range from white (deletion) to red (amplification), with pink indicating a normal copy number. The presence of UPD is shown in blue and the normal non-UPD state in gray. The y axis represents the chromosome number, with chromosome 1 at the top and Y on the bottom. The x axis displays samples grouped by common cytogenetic abnormalities. The patient number labels correspond to the patient numbers in Table S1. See Table S2 for a complete listing of miscellaneous cytogenetics.

Of the 201 CNAs, 125 (62%) corresponded to changes detected by cytogenetics, and 76 of 201 (38%) were detected by SNP array only (Fig. S1). Of the 76 CNAs that were detected by SNP array only, 32 (42%) were <1 Mb in size. Twenty-six of these 32 CNAs (81%) were validated using an independent custom NimbleGen CGH 12 × 135K array (Roche NimbleGen). Two of the 32 CNAs <1 Mb in size occurred at a known translocation breakpoint. Four CNAs that were not independently assessed on the custom array CGH platform had a minimum size of 300 kb and involved at least 100 probes. All putative CNAs >200 kb in size that were detected on the SNP array were validated on the custom array CGH platform (see SI Results and Fig. S1 for a complete description).

Of the 201 CNAs, 198 (99%) contained known genes, and 154 of 201 loci (77%) contained at least 1 gene that had previously been associated with cancer or AML/myelodysplastic syndromes (MDS) (13) (Table S2). Of CNAs <5 Mb in size (the lower limit of detection by cytogenetics), 38% (33 of 88) contained at least 1 cancer- or AML/MDS-associated gene (52 total cancer- or AML/MDS-associated genes in 88 segments), which is significantly more than the 31 genes expected to occur in 88 size-matched segments randomly distributed across the genome (1,000 permutations; P = 0.009) (Fig. 2). CNAs <5 Mb were also significantly enriched for all annotated genes, cancer genes alone, and AML/MDS-associated genes alone (P = 0.001, P = 0.02, and P < 0.001, respectively). CNAs <1 Mb in size (n = 45) were enriched for AML/MDS-associated genes and the combination of cancer and AML/MDS genes, but not for cancer genes alone or all annotated genes (P < 0.001, P = 0.02, P = 0.16, and P = 0.058, respectively). There was no enrichment for microRNA genes in CNAs <5 Mb or <1 Mb in size.
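The size-matched permutation test behind these enrichment P values can be illustrated with a minimal sketch. It is not the authors' code: it assumes a single linear genome coordinate, treats each gene as one point position, and uses hypothetical names (`genes_hit`, `permutation_p_value`) and toy inputs. The empirical P value is taken as the share of random placements whose gene count is at least as large as the observed count, with the usual +1 correction.

```python
import bisect
import random


def genes_hit(segments, gene_positions):
    """Count distinct genes (sorted point positions) inside any (start, end) segment."""
    hit = set()
    for start, end in segments:
        lo = bisect.bisect_left(gene_positions, start)
        hi = bisect.bisect_right(gene_positions, end)
        hit.update(range(lo, hi))  # indices of genes with start <= position <= end
    return len(hit)


def permutation_p_value(segments, gene_positions, genome_length, n_perm=1000, seed=0):
    """Return (observed gene count, empirical one-sided P value).

    Each permutation places segments of the same lengths at uniformly random
    positions (an assumption of this sketch) and recounts the genes they cover.
    """
    rng = random.Random(seed)
    observed = genes_hit(segments, gene_positions)
    lengths = [end - start for start, end in segments]
    at_least_observed = 0
    for _ in range(n_perm):
        shuffled = []
        for length in lengths:
            start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((start, start + length))
        if genes_hit(shuffled, gene_positions) >= observed:
            at_least_observed += 1
    # +1 in numerator and denominator keeps the estimate away from exactly zero.
    return observed, (at_least_observed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 100 genes clustered in a 1-kb window of a 1-Mb "genome".
    toy_genes = list(range(1000, 2000, 10))
    toy_cnas = [(1000, 1999)]
    print(permutation_p_value(toy_cnas, toy_genes, 1_000_000))
```

With real data the gene list would hold annotated gene intervals rather than points, and placements would respect chromosome boundaries; both are simplified away here.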

Fig. 2. CNAs (deletions and amplifications) include 1 or more genes and demonstrate significant regions of recurrence. Log2 ratio dot plots of paired tumor and normal DNA samples from the same patient were generated from data obtained from the Affymetrix Genome-Wide SNP 6.0 arrays (top plot of each panel) and custom NimbleGen CGH 12 × 135K array data (bottom plot of each panel). Solid horizontal lines indicate gene locations with selected gene names provided. The y axis is the log2 ratio of paired tumor/normal DNA, and the x axis represents the chromosomal megabase position for both array platforms. (A) Deletion of a 1.9-Mb region of chromosome 17, including the NF1 gene. (B) Deletion of a 57-kb region of chromosome X, including the STAG2 gene. (C) Partial tandem duplication of a 35.6-kb region of MLL on chromosome 11. (D) GISTIC analysis of genomic regions of deletion. Chromosome positions are indicated along the y axis and the false-discovery rate q values on the x axis, with the significance threshold indicated by the green line at 0.25. Deletion regions that surpass the significance threshold include 3p14.1, 5q31.1, 7q31.31, 12p12.3, 16q22.1, 17p13.1, 17q11.2, and 18p11.31. (E) GISTIC analysis of genomic regions of amplification. Amplification regions that surpass the significance threshold include chromosomes 8q23.2, 11q23.3, 19q13.43, and 21q22.2.
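As a reading aid for the log2 ratio axis, the sketch below computes the signal expected when only a fraction of the cells in a sample carry a copy number change. It rests on assumptions that are not stated in the paper: normal cells and the matched normal DNA are diploid at the locus, and the blast percentage can stand in for the fraction of altered cells. The function name `expected_log2_ratio` is hypothetical.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(tumor/normal) signal for a locus in a mixed sample.

    tumor_copies   -- copy number of the locus in the altered cells
    tumor_fraction -- fraction of cells carrying the alteration (0 to 1)
    normal_copies  -- copy number in the matched normal DNA (2 for autosomes)
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Average copy number across the mixture of altered and unaltered cells.
    mixed = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    if mixed <= 0:
        return float("-inf")  # homozygous deletion in a pure sample
    return math.log2(mixed / normal_copies)


if __name__ == "__main__":
    # 0.30, 0.64, and 1.00 mirror the reported blast range and median.
    for copies in (1, 2, 3):
        for fraction in (0.30, 0.64, 1.00):
            print(copies, fraction, round(expected_log2_ratio(copies, fraction), 3))
```

Under these assumptions a single-copy deletion gives exactly -1 only in a pure sample; at lower blast fractions the signal moves toward 0, which is one reason small CNAs call for validation on an independent platform.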

We identified 12 chromosomal regions (8 deletions and 4 amplifications) from the 201 CNAs that were significantly altered in multiple AML genomes by using the Genomic Identification of Significant Targets in Cancer (GISTIC) (14) algorithm (Fig. 2 and Table S3). Most of these regions contain at least 1 gene previously implicated in cancer and/or AML/MDS (deletions of 3p14.1: FHIT, 5q31.1: CTNNA1, 12p12.3: ETV6, 16q22.1: CBFB, 17p13.1: TP53, 17q11.2: NF1, and amplifications of 8q23.2: MYC, 11q23.3: MLL, and 21q22.2: ETS2). Chromosomal regions 17q11.2 and 21q22.2 were altered in at least 5 patients (the majority having complex karyotypes) and were associated with worse overall survival (CNAs spanning NF1 and ETS2, respectively; P ≤ 0.012; Fig. S2). All 12 recurrent regions displayed mRNA expression levels for the entire region that were significantly altered in a gene dose-dependent manner, compared with samples without CNAs (P value range, 0.02–2.06E-16) (Table S4). In addition, we identified 18 recurrent CNA regions (10 deletions and 8 amplifications, 6 of 18 loci contained within 1 of the 12 GISTIC-identified loci) in the 201 CNAs that contained a cancer or AML/MDS-associated gene and involved at least 1 sample with a CNA <5 Mb in size. The mRNA expression levels of BCL2L1, BMI1, ETS2, ETV6, FUS, MLL, NF1, PML, RUNX1, SHMT1, SMAD2, SMAD4, TP53, and TOP3A genes, located in 12 of these 18 regions (67%), were significantly altered compared with samples without changes (P value range, 0.047–1.93E-8) (Table S5). There were an additional 32 recurrent CNA regions (19 deletions and 13 amplifications, 6 of 32 loci contained within 1 of the 12 GISTIC-identified loci) that contained genes not previously implicated in cancer or AML/MDS and involved at least 1 sample with a CNA <5 Mb in size. The mRNA expression levels of 43 genes, located in 15 of these 32 regions (47%), were significantly altered compared with samples without changes (P value range, 0.049–8.37E-11) (Table S5).

Favorable, intermediate, and adverse cytogenetic categories at diagnosis, defined by Cancer and Leukemia Group B (CALGB) (1), were predictive of overall and event-free survival in our AML patients, as expected (Fig. S2). However, the total number of CNAs per patient (for all patients or only patients with normal cytogenetics), identified by SNP arrays and cytogenetics or by SNP arrays only, was not predictive of overall or event-free survival, independent of cytogenetic classification (see SI Results for a complete description of the analysis).

Acquired CNAs Containing ≤3 Genes. Of the 201 CNAs detected in the 86 genomes, 21 (10%) contained ≤3 genes (all <1 Mb in size). Of these 21 segments, 6 (29%) included known cancer-associated genes, including a CNA involving the PML gene in a patient with a known t(15;17) translocation involving PML, a deletion of the ARF tumor suppressor gene, a partial tandem duplication of MLL, and a known fusion partner of MLL (Table 1). Three of the 21 CNAs contained no known genes but did contain regions of high sequence conservation (>75%) across the chimp, rhesus, horse, dog, and mouse genomes (Table 1). To identify small (<5 Mb) recurrent CNAs involving genes in these 21 regions, we evaluated an additional set of 38 independent AML samples using the same Affymetrix 6.0 platform and methods (SI Methods and Fig. S3). Three recurrent CNAs were detected. One included the MYB

Table 1. CNAs with ≤3 genes in 86 AML genomes

| UPN | Total no. of CNA | No. of UPD | Chromosome | Start | End | Length, kb | Gain/loss | Gene symbol | Cancer gene symbol | UPN* | Start* | End* | Gain/loss* | Length, kb* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 269542 | 15 | | 2 | 193,794,995 | 193,868,741 | 74 | Loss | LOC645314 | | | | | | |
| 938150 | 23 | | 3 | 88,674,967 | 89,301,036 | 626 | Loss | EPHA3, LOC643869 | | | | | | |
| 346190 | 2 | | 3 | 163,995,351 | 164,109,297 | 114 | Gain | | | 335640 | 163,993,156 | 164,107,632 | Loss | 114 |
| 575512 | 8 | | 6 | 114,621,527 | 115,238,795 | 617 | Loss | hCG1820801, LOC643999 | | | | | | |
| 692900 | 5 | 1 | 6 | 135,555,189 | 135,763,913 | 209 | Gain | MYB, AHI1 | MYB | 816067 | 135,577,953 | 135,578,798 | Loss | 1 |
| 575512 | 8 | | 7 | 46,289,583 | 46,726,654 | 437 | Loss | | | | | | | |
| 291696 | 4 | | 8 | 36,155,250 | 36,725,290 | 570 | Gain | LOC642855 | | | | | | |
| 291696 | 4 | | 8 | 40,076,675 | 40,876,632 | 800 | Gain | ZMAT4, C8orf4 | | | | | | |
| 957664 | 3 | | 9 | 21,890,124 | 21,990,770 | 101 | Loss | C9orf53, CDKN2A | CDKN2A (ARF) | | | | | |
| 986000 | 29 | | 9 | 120,372,717 | 120,423,964 | 51 | Gain | | | | | | | |
| 775109 | 1 | | 10 | 21,893,278 | 22,027,081 | 134 | Loss | MLLT10 | MLLT10 | | | | | |
| 380949 | 2 | | 11 | 116,297,361 | 116,448,564 | 151 | Loss | KIAA0999 | | | | | | |
| 831711 | 1 | | 11 | 117,820,656 | 117,856,311 | 36 | Gain | MLL | MLL† | | | | | |
| 964886 | 1 | | 15 | 72,016,248 | 72,105,212 | 89 | Gain | LOXL1, STOML1, PML‡ | PML | | | | | |
| 869586 | 4 | | 16 | 3,785,706 | 3,845,786 | 60 | Gain | CREBBP | CREBBP | | | | | |
| 327929 | 22 | | 16 | 83,343,173 | 83,535,839 | 193 | Gain | USP10, CRISPLD2, LOC123862 | | | | | | |
| 327929 | 22 | | 18 | 24,478,963 | 25,445,625 | 967 | Loss | ARIH2P, LOC440490 | | | | | | |
| 986000 | 29 | | 21 | 18,228,199 | 18,808,385 | 580 | Loss | CHODL, PRSS7, LOC643081 | | | | | | |
| 986000 | 29 | | 21 | 46,850,222 | 46,921,385 | 71 | Gain | PRMT2 | | | | | | |
| 312340 | 2 | | X | 45,752,722 | 45,872,338 | 120 | Loss | LOC392454 | | 750152 | 44,708,962 | 46,659,952 | Loss | 1,951 |
| 808642 | 1 | | X | 122,947,030 | 123,004,111 | 57 | Loss | STAG2 | | | | | | |

Start = first location in the CNA region. End = last base pair location in the CNA region.
*Data from an independent set of 38 AML samples.
†MLL partial tandem duplication identified by SNP arrays.
‡t(15;17) involving the PML gene was identified using cytogenetics.
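The "Length, kb" column of Table 1 can be cross-checked against the Start and End coordinates. The snippet below is only an illustrative check, assuming the length is End minus Start rounded to the nearest kilobase (whether the original count included both end positions does not change the rounded values here). The helper names are hypothetical; the coordinates are copied from Table 1.

```python
def length_kb(start, end):
    """Segment length in kilobases, rounded to the nearest kb."""
    if end < start:
        raise ValueError("end must not precede start")
    return round((end - start) / 1000)


# (label, start, end, length in kb as printed in Table 1)
TABLE1_ROWS = [
    ("LOC645314", 193_794_995, 193_868_741, 74),
    ("MYB, AHI1", 135_555_189, 135_763_913, 209),
    ("MLL", 117_820_656, 117_856_311, 36),
    ("STAG2", 122_947_030, 123_004_111, 57),
    ("LOC392454 (independent set)", 44_708_962, 46_659_952, 1951),
]


def mismatches(rows):
    """Return labels of rows whose printed length disagrees with the coordinates."""
    return [label for label, start, end, printed in rows
            if length_kb(start, end) != printed]


if __name__ == "__main__":
    print(mismatches(TABLE1_ROWS))
```

The MLL row spans 35,655 bp, which agrees with the 35.6-kb partial tandem duplication described in the Fig. 2 legend.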

oncogene, and another contained LOC392454 [a pseudogene similar to proliferating cell nuclear antigen (PCNA)] (Table 1).

Identification of Translocations and Gene Mutations. There were 23 cytogenetically defined, balanced rearrangements that occurred in >25% of metaphases in 22 patients with noncomplex karyotypes. Using SNP arrays, we detected regions of CNA at the breakpoints of patients with cytogenetically defined balanced rearrangements [1 with t(15;17)(q22;q21), and 1 with inv(16)(p13q22)], indicating that these rearrangements are not balanced (Fig. S4) (15). We also identified a deletion endpoint located at a common translocation breakpoint in the NUP98 gene on chromosome 11p15.5, although there were no cytogenetic translocations detected in this patient. Cryptic translocations between NUP98 and NSD1 on chromosome 5q35.3 have been described in AML and can be missed by cytogenetics (16, 17). Therefore, we screened this sample for evidence of a translocation involving NUP98 and NSD1. Using RT-PCR, we detected a NUP98-NSD1 fusion transcript involving exon 12 in NUP98 and exon 6 in NSD1, but we did not detect the reciprocal NSD1-NUP98 fusion transcript owing to the deletion of the remaining NUP98 gene. We screened an additional 179 AML samples for the NUP98-NSD1 fusion transcript and identified 2 additional samples containing cryptic NUP98-NSD1 and NSD1-NUP98 fusion transcripts. We also identified a small CNA amplification in a patient with a normal karyotype that involved a common translocation breakpoint in the CREBBP gene (18), suggesting that this CNA also marks a cryptic translocation (Fig. S4).

Recurrent CNA regions in ALL genomes often contain genes that are frequently mutated in samples without a CNA. To test whether this paradigm occurs in AML, we examined a focal CNA on chromosome 12p12.3 that occurred in 3 AML patients. This region contains the ETV6 (TEL) gene, which is known to be mutated and translocated in patients with AML (19, 20). We identified 3 of 180 samples with nonsynonymous single-nucleotide variants in ETV6 in the absence of a CNA (P4A, R105Q, and R202Q); none of these samples had CNAs at 12p12.3. Normal skin DNA was available from 1 of 3 patients (2 CALGB samples had no normal DNA available) and confirmed that R105Q was an acquired (somatic) mutation not previously described. We also identified mutations in WT1, and sequencing results from 37 additional genes have previously been reported (Table S1 and SI Methods) (21, 22).

Partial Uniparental Disomy. Using paired normal and tumor DNA from the same patient, we could definitively identify regions of SNP LOH in tumor samples. LOH in the absence of a CNA (copy-neutral LOH) is consistent with UPD. We identified 8 regions of UPD in 7 of 86 samples. UPD occurred more often in cytogenetically normal AML genomes (15% vs. 3.8%, respectively; P = 0.08). All regions of UPD extended to the end of the affected chromosome and varied in size from 11 to 95 Mb. UPD of chromosomes 11p, 16p, and the entire chromosome 13 each involved at least 2 patients (Fig. 1), and both patients with chromosome 13 UPD had homozygous FLT3 mutations, consistent with previous reports (23, 24). One patient with 16p UPD had a potential cryptic translocation of CREBBP. Chromosome 6p UPD (present in 1 of 86 AML genomes) was also identified in 1 genome from a second set of 38 independent AML samples (1p and 11p UPD were found in 1 case each, and 21q UPD in 2 cases from the 38 new samples; Fig. S3).

Discussion

In this study we identified 201 CNAs in 86 de novo AML genomes (average of 2.34 CNAs per genome). Known cancer- and leukemia-related genes were significantly enriched in the

CNA loci, and their mRNA expression levels were often altered, suggesting that these genes may contribute to AML pathogenesis. We identified 50 recurrent CNAs <5 Mb in size in the 86 genomes and 3 recurrent CNA regions <210 kb in size in an independent set of 38 AML genomes. UPD occurred infrequently in de novo AML (7 of 86 genomes) and was more common in AML genomes with a normal karyotype. CNAs identified exclusively on the SNP arrays (i.e., not detected by cytogenetics) did not predict overall or event-free survival in this study, independent of cytogenetically defined CNAs. Collectively, these results indicate that AML genomes are not inherently unstable at this level of resolution (≈35 kb) and suggest that there is tremendous heterogeneity in the genes that are mutated in each AML genome. It is also possible that epigenetic alterations may contribute to the development and progression of AML, which was not addressed in the present study.

Because there are only a small number of CNAs in most AML genomes, we hypothesize that the genes in these regions may be important for AML pathogenesis (i.e., they are drivers rather than random passenger mutations). The genes present in very small CNAs (e.g., STAG2, PRMT2, USP10, C8orf4, and more) are excellent candidates for evaluation in AML pathogenesis (Table 1 and Table S5). None of these genes have been implicated in AML pathogenesis and would only have been discovered in an unbiased high-resolution screen of the genome. As more AML genomes are interrogated using this technology (using the appropriate experimental design to allow definitive CNA identification), a comprehensive catalogue of genes that may contribute to AML pathogenesis will be created.

Although previous studies of AML genomes have implicated several large CNA regions that we identified here (5q31.1, 7q, 8q24, 11q23, 16q23, 17q11.2, 18p11, and 21q22), there are many discrepancies between our findings and prior studies due to 3 fundamental experimental design differences (6, 8, 10–12). First, the array we used (median interprobe spacing of 680 bp) has 4–10-fold higher resolution than the platforms used in previous studies. Second, we used paired normal and tumor DNA from the same individuals for all samples. Finally, we used independent, orthogonal platforms to validate all of the small CNAs, which have a high false-positive call rate. These differences allowed us to unequivocally identify true acquired (somatic) CNA and UPD regions in the AML genomes, which would be impossible to distinguish from inherited copy number variants and/or false-positive calls without the use of paired DNA samples and independent validation studies. In contrast to several previous studies, our data suggest that CNA and UPD are not as common in AML as in ALL (5) and provide a strong rationale for pursuing even higher-resolution genomic studies to define all of the recurring genetic alterations in AML. Nonetheless, cataloging genes located in recurrent CNA regions may accelerate the discovery of mutated genes in AML genomes without CNAs, such as point mutations in ETV6 (3 CNA, 3 point mutations); furthermore, CNAs can also uncover cryptic translocations (NUP98-NSD1). Many known cancer/AML genes and genes not previously implicated in AML were identified in recurrent CNA regions that displayed concordant gene dosage changes (Table S4 and Table S5), a mechanism known to be important in AML pathogenesis. The identification of submegabase-sized CNAs (that contain ≤3 genes; Table 1) will allow us to prioritize the study of genes that are likely to contribute to AML pathogenesis. For example, the MYB oncogene was altered in 2 very small CNAs, which may represent a cryptic chromosomal translocation or duplication, as previously described in T-ALL (25–27). Finally, array-based studies can sometimes detect copy number changes not identified by cytogenetics despite being of adequate size (>5 Mb), presumably owing to technical failures during the expansion of cells for cytogenetic analysis or loss of a clone with a cytogenetic abnormality during in vitro propagation.

Identification of the important mutations for AML pathogenesis will require knowing the complete sequence of a large number of AML genomes (13). Unbiased complete genome sequencing using "next-generation" technologies may ultimately allow us to detect every genetic alteration that exists in a cancer cell, but it is currently cost-prohibitive to screen large numbers of genomes. Despite the ability of next-generation sequencing platforms to accurately identify amplifications, deletions, and translocations (13, 28), a multiplatform approach (traditional cytogenetics, FISH, SNP array, array CGH, and targeted gene resequencing) will continue to be the most practical approach to study AML genomes at most institutions for the time being. Using this multiplatform approach (karyotype, SNP array, array CGH, and gene resequencing), we identified at least 1 genetic abnormality in 95% (82 of 86) of the AML genomes included in this study (Table S1), including many known and SNP array-specific CNAs (Fig. 3). As more AML genomes have their definitive acquired (somatic) CNAs cataloged, it is likely that specific CNAs will also contribute to the growing list of genetic alterations that impact treatment choices and outcomes of patients with AML.

Fig. 3. Summary of genetic alterations in AML genomes. Pie chart demonstrates the relative proportions of AML samples with an abnormal (red, n = 50) and normal (white, n = 36) karyotype, with (+) and without (-) CNA and UPD detected by SNP arrays. (Two patients with failed cytogenetics are included with the normal karyotype data: UPN 295 had no CNAs or UPD detected by SNP arrays, and UPN 327929 had 22 CNAs detected.) The number of patients in each group is listed in parentheses. Of 36 patients with a normal karyotype (including the 2 patients with failed cytogenetics), 13 (36%) had a CNA or UPD (identified as SNP Array Specific CNA or UPD). Of 50 patients with an abnormal karyotype, 21 (42%) had an SNP array-specific CNA (not seen by cytogenetics) or UPD detected by SNP arrays (identified as SNP Array Specific CNA or UPD). Forty-three of 86 patients (50%) had no CNA or UPD detected by SNP arrays.

Methods

Copy Number and LOH Analysis. Tumor (bone marrow cells) and normal (skin) DNA (0.5 μg) was prepared and hybridized to Affymetrix Genome-Wide Human SNP 6.0 Array GeneChip microarrays, and CEL files were created using Affymetrix GeneChip Command Console operating software and Genotyping Console 2.1, according to the manufacturer’s protocols (Affymetrix). The data have been deposited in dbGaP (Study accession: phs000201.v1.p1) (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000201.v1.p1). The Partek Genomics Suite was used for both copy number and LOH analysis. Regions of CNA were detected using a Hidden Markov Model algorithm in the standard Partek workflow for paired samples. Putative CNAs were manually reviewed and agreed upon by 3 independent investigators, who analyzed the log2 ratio of adjacent segments, the variability of probes within a CNA, and the identification of CNA segment boundaries. LOH analysis used genotyping results from paired normal and tumor samples and the Partek LOH workflow. All LOH segments were manually reviewed and compared with the copy number data for the segment to differentiate UPD from a deletion. Gene annotation and overlap were determined using National Center for Biotechnology Information build 36.1 and University of California, Santa Cruz (UCSC) hg18. Validation of CNA was performed using a high-resolution custom NimbleGen CGH 12 × 135K array (135,000 probes per sample) covering putative CNA loci (≈1,900 probes per locus) containing isothermal oligomers (Tm = 76 °C) (Roche NimbleGen). For the custom arrays, genomic DNA samples were prepared, hybridized, and scanned using the method of Selzer et al. (29). Data extraction of scanned images was performed using NimbleScan software (v. 2.0) (see SI Methods for a complete description of sample processing, LOH detection, and CNA detection).

mRNA Expression Analysis, RT-PCR, and Gene Resequencing. Total RNA was prepared from the unfractionated bone marrow aspirates using TRIzol reagent (Invitrogen), labeled, and hybridized to Affymetrix Human Genome U133 Plus 2.0 Array GeneChip microarrays using standard protocols (Affymetrix). The data have been deposited in Gene Expression Omnibus (GSE10358) (22). Standard reverse transcription was performed using 1 μg of total RNA. Exonic gene resequencing was performed as previously described (primer sequences available upon request) (21, 22).

Statistical Analysis. For analysis of differences between patients, a χ² test was used to compare CNA frequencies and a 2-tailed Student’s t test applying the Bonferroni correction for multiple comparisons to evaluate mRNA expression levels. Gene enrichment analysis was performed by randomly permuting the location of CNAs throughout the genome 1,000 times to obtain the null distribution, which was compared with the gene counts within the actual CNA locations (see SI Methods for cancer and AML/MDS-associated genes). The estimation of overall and event-free survival was performed using the Kaplan-Meier method, and differences between groups were analyzed using the log-rank statistic (see SI Methods for a detailed description of the permutation and survival analyses).

ACKNOWLEDGMENTS. We thank Xia Li and Elena Deych for assistance with statistical analysis; Jon Armstrong for array CGH technical assistance; Jaime Garcia-Heras and Shashikant Kulkarni for cytogenetic reviews; Patrick Cahan for gene enrichment analysis; and Gretchen Carnoske and Nancy Reidelberger for expert editorial assistance. This work was supported by the National Institutes of Health Grants 2P01CA101937 (to T.J.L.), U10CA101140 (to C.D.B.), T32-HL07088 and K08HL083012 (to M.J.W.) and by the Barnes-Jewish Foundation (T.J.L.).

1. Byrd JC, et al. (2002) Pretreatment cytogenetic abnormalities are predictive of induction success, cumulative incidence of relapse, and overall survival in adult patients with de novo acute myeloid leukemia: Results from Cancer and Leukemia Group B (CALGB 8461). Blood 100:4325–4336.
2. Mrózek K, Marcucci G, Paschka P, Whitman SP, Bloomfield CD (2007) Clinical relevance of mutations and gene-expression changes in adult acute myeloid leukemia with normal cytogenetics: Are we ready for a prognostically prioritized molecular classification? Blood 109:431–448.
3. Schlenk RF, et al. (2008) Mutations and treatment outcome in cytogenetically normal acute myeloid leukemia. N Engl J Med 358:1909–1918.
4. Fenaux P, et al. (1993) Effect of all transretinoic acid in newly diagnosed acute promyelocytic leukemia. Results of a multicenter randomized trial. Blood 82:3241–3249.
5. Mullighan CG, et al. (2007) Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446:758–764.
6. Akagi T, et al. (2009) Hidden abnormalities and novel classification of t(15;17) APL based on genomic alterations. Blood 113:1741–1748.
7. Gorletta TA, et al. (2005) Frequent loss of heterozygosity without loss of genetic material in acute myeloid leukemia with a normal karyotype. Genes Chromosomes Cancer 44:334–337.
8. Paulsson K, et al. (2006) High-resolution genome-wide array-based comparative genome hybridization reveals cryptic chromosome changes in AML and MDS cases with trisomy 8 as the sole cytogenetic aberration. Leukemia 20:840–846.
9. Raghavan M, et al. (2005) Genome-wide single nucleotide polymorphism analysis reveals frequent partial uniparental disomy due to somatic recombination in acute myeloid leukemias. Cancer Res 65:375–378.
10. Rücker FG, et al. (2006) Disclosure of candidate genes in acute myeloid leukemia with complex karyotypes using microarray-based molecular characterization. J Clin Oncol 24:3887–3894.
11. Suela J, et al. (2007) DNA profiling analysis of 100 consecutive de novo acute myeloid leukemia cases reveals patterns of genomic instability that affect all cytogenetic risk groups. Leukemia 21:1224–1231.
12. Tyybäkinoja A, Elonen E, Piippo K, Porkka K, Knuutila S (2007) Oligonucleotide array-CGH reveals cryptic gene copy number alterations in karyotypically normal acute myeloid leukemia. Leukemia 21:571–574.
13. Ley TJ, et al. (2008) DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456:66–72.
14. Beroukhim R, et al. (2007) Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. Proc Natl Acad Sci USA 104:20007–20012.
15. Kolomietz E, et al. (2001) Primary chromosomal rearrangements of leukemia are frequently accompanied by extensive submicroscopic deletions and may lead to altered prognosis. Blood 97:3581–3588.
16. Romana SP, et al. (2006) NUP98 rearrangements in hematopoietic malignancies: A study of the Groupe Francophone de Cytogenetique Hematologique. Leukemia 20:696–706.
17. Cerveira N, et al. (2003) Frequency of NUP98-NSD1 fusion transcript in childhood acute myeloid leukaemia. Leukemia 17:2244–2247.
18. Panagopoulos I, et al. (2003) Genomic characterization of MOZ/CBP and CBP/MOZ chimeras in acute myeloid leukemia suggests the involvement of a damage-repair mechanism in the origin of the t(8;16)(p11;p13). Genes Chromosomes Cancer 36:90–98.
19. Barjesteh van Waalwijk van Doorn-Khosrovani S, et al. (2005) Somatic heterozygous mutations in ETV6 (TEL) and frequent absence of ETV6 in acute myeloid leukemia. Oncogene 24:4129–4137.
20. Cools J, et al. (1999) Fusion of a novel gene, BTL, to ETV6 in acute myeloid leukemias with a t(4;12)(q11–q12;p13). Blood 94:1820–1824.
21. Link DC, et al. (2007) Distinct patterns of mutations occurring in de novo AML versus AML arising in the setting of severe congenital neutropenia. Blood 110:1648–1655.
22. Tomasson MH, et al. (2008) Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia. Blood 111:4797–4808.
23. Fitzgibbon J, et al. (2005) Association between acquired uniparental disomy and homozygous gene mutation in acute myeloid leukemias. Cancer Res 65:9152–9154.
24. Griffiths M, et al. (2005) Acquired isodisomy for chromosome 13 is common in AML, and associated with FLT3-itd mutations. Leukemia 19:2355–2358.
25. Clappier E, et al. (2007) The C-MYB locus is involved in chromosomal translocation and genomic duplications in human T-cell acute leukemia (T-ALL), the translocation defining a new T-ALL subtype in very young children. Blood 110:1251–1261.
26. Murati A, et al. (2009) Genome profiling of acute myelomonocytic leukemia: Alteration of the MYB locus in MYST3-linked cases. Leukemia 23:85–94.
27. O’Neil J, et al. (2007) Alu elements mediate MYB gene tandem duplication in human T-ALL. J Exp Med 204:3059–3066.
28. Campbell PJ, et al. (2008) Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet 40:722–729.
29. Selzer RR, et al. (2005) Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer 44:305–319.

Walter et al. PNAS | August 4, 2009 | vol. 106 | no. 31 | 12955