High throughput digital quantification of mRNA abundance in primary human acute myeloid leukemia samples

Jacqueline E. Payton, … , Mark A. Watson, Timothy J. Ley

J Clin Invest. 2009;119(6):1714-1726. https://doi.org/10.1172/JCI38248.

Technical Advance Hematology

Acute promyelocytic leukemia (APL) is characterized by the t(15;17) chromosomal translocation, which results in fusion of the retinoic acid receptor α (RARA) gene to another gene, most commonly promyelocytic leukemia (PML). The resulting fusion protein, PML-RARA, initiates APL, which is a subtype (M3) of acute myeloid leukemia (AML). In this report, we identify a gene expression signature that is specific to M3 samples; it was not found in other AML subtypes and did not simply represent the normal gene expression pattern of primary promyelocytes. To validate this signature for a large number of genes, we tested a recently developed high throughput digital technology (NanoString nCounter). Nearly all of the genes tested demonstrated highly significant concordance with our microarray data (P < 0.05). The validated gene signature reliably identified M3 samples in 2 other AML datasets, and the validated genes were substantially enriched in our mouse model of APL, but not in a cell line that inducibly expressed PML-RARA. These results demonstrate that nCounter is a highly reproducible, customizable system for mRNA quantification using limited amounts of clinical material, which provides a valuable tool for biomarker measurement in low-abundance patient samples.

Find the latest version: https://jci.me/38248/pdf

Technical advance

High throughput digital quantification of mRNA abundance in primary human acute myeloid leukemia samples

Jacqueline E. Payton,1 Nicole R. Grieselhuber,2 Li-Wei Chang,1 Mark Murakami,2 Gary K. Geiss,3 Daniel C. Link,1,2 Rakesh Nagarajan,1 Mark A. Watson,1 and Timothy J. Ley2,4

1Department of Pathology and Immunology, Division of Laboratory and Genomic Medicine, 2Department of Medicine, Division of Oncology, Washington University Medical School, St. Louis, Missouri, USA. 3NanoString Technologies, Seattle, Washington, USA. 4Department of Genetics, Washington University Medical School, St. Louis, Missouri, USA.

Introduction

Here we describe what is, to our knowledge, the first use of a high-throughput digital system to assay the expression of a large number of genes in primary clinical samples from patients with acute myeloid leukemia (AML). This technology captures and counts individual mRNA transcripts without enzymatic reactions or bias and is notable for its high levels of sensitivity, linearity, multiplex capability, and digital readout (1). The nCounter system (NanoString) is capable of detecting as little as 0.5 fM of a specific mRNA, making it a valuable tool for expression signature validation, diagnostic testing, and large translational studies, all of which often are limited by the very small amounts of clinical material available.

In this study, our primary clinical focus is on acute promyelocytic leukemia (APL), a subtype (M3) of AML that is unique in its morphology and its defining molecular initiating event. (Throughout this manuscript, we refer to human APL as M3 AML and the mouse models as murine APL.) Morphologically, the leukemic cells are abnormal promyelocytes, which nevertheless retain many of the structural and immunophenotypic characteristics of normal promyelocytes. M3 AML is further characterized by fusion of the retinoic acid receptor α (RARA) gene to another gene, most commonly the promyelocytic leukemia (PML) gene, through a balanced translocation of chromosomes 17 and 15, respectively. The resulting fusion protein, PML-RARA, has been shown to initiate APL in several mouse models (2–5). Unlike most other AML subtypes, the initiating event of M3 AML is known, making it an attractive model for the study of mechanisms of pathogenesis and progression.

Several recent gene expression profiling studies used microarray technologies to compare subclasses of AML and have reported specific expression signatures for individual morphologic or molecular subtypes (6–18). Although a subset of these studies included normal whole bone marrow or purified myeloid precursor CD34+ cells, none of them included fractionated primary hematopoietic cells from multiple discrete stages of myeloid differentiation (6, 7, 9, 11, 13–15, 17, 18). Because different subtypes of AML represent various arrested developmental stages of hematopoiesis (e.g., M3 versus normal promyelocytes), differences in expression may result from these developmental stages rather than a fundamental difference in pathogenesis or progression. The inclusion of normal, primary fractionated myeloid precursors, including promyelocytes, could mitigate this potential pitfall.

Another shortcoming of many gene expression profiling studies, including the AML studies above, is that only a small number of genes have been validated in a small number of samples, due to limiting amounts of clinical material available and the labor-intensive and costly nature of quantitative RT-PCR–based (qRT-PCR–based) validation. In this study, we have overcome these limitations with a digital RNA quantitation system, which allowed triplicate measurements of the expression levels of 46 genes, using only 100 ng of total RNA (the amount obtained from approximately 40,000 myeloid cells) in a multiplex reaction. Thus, the confidence of our M3-specific signature is substantially increased by such extensive validation.

In the current study, we compare M3 cell expression patterns with those of other AML subtypes and of normal CD34+ cells, promyelocytes, and neutrophils purified from independent healthy human bone marrow samples using high-speed flow cytometry. Using these data, we define a unique expression signature of M3 malignant promyelocytes, which is distinct both from other subtypes of AML and from normal promyelocytes. A subset of the most highly dysregulated genes in this signature were extensively validated using both conventional (qRT-PCR) and innovative (NanoString nCounter system; ref. 1) methodologies.

We further used gene set enrichment analysis (GSEA) (19, 20) to evaluate our validated gene set in 3 other datasets: a published set of 325 human AML samples (18), a mouse model of APL (5), and the PR-9 cell line (21), which is commonly used in studies of PML-RARA activity. Both the human M3 AML and murine APL samples demonstrated significant enrichment of the validated gene set. However, the PR-9 cells failed to show significant enrichment of this gene set after induction of the PML-RARA transgene. Importantly, the validated genes reliably identified bona fide M3 samples (PML-RARA fusion gene positive), separating them from other FAB subtypes in 3 independent AML datasets.

Authorship note: Jacqueline E. Payton and Nicole R. Grieselhuber contributed equally to this work.

Conflict of interest: G.K. Geiss is an employee of NanoString and owns stock in that company.

Nonstandard abbreviations used: AML, acute myeloid leukemia; APL, acute promyelocytic leukemia; ATRA, all-trans retinoic acid; FDR, false discovery rate; GSEA, gene set enrichment analysis; PCA, principal component analysis; qRT-PCR, quantitative RT-PCR.

Citation for this article: J. Clin. Invest. 119:1714–1726 (2009). doi:10.1172/JCI38248.

The Journal of Clinical Investigation http://www.jci.org Volume 119 Number 6 June 2009

Figure 1 Isolation and expression profiling of myeloid cells. (A) High-speed cell sorting of bone marrow aspirates from healthy donors. FSC, forward scatter; PMNs, polymorphonuclear cells; Pros, promyelocytes; SSC, side scatter. (B) May Grunwald/Giemsa–stained cytospins of sorted promyelocytes (left; average purity, 80% promyelocytes, 11% myelocytes) and neutrophils (right; average purity, 74% mature granulocytes with segmented nuclei, 21% bands [immediate precursor stage prior to the mature granulocyte, characterized by horseshoe-shaped nuclei]). Original magnification, ×100. (C) Microarray signal intensity data demonstrate the expected stage-specific expression of early, middle, and late developmental myeloid genes in each fraction, with minimal expression in other fractions. Data are mean ± SD. (D) Heat map of microarray data demonstrates a progression of expression from less differentiated to terminally differentiated myeloid cells. Red indicates relatively upregulated expression. Green indicates relatively downregulated expression.

Table 1 Clinical characteristics of patients and de novo AML samples

Parameter | No. | %ᴬ
Cytogenetic subgroup | |
Normal | 28 | 36
t(15;17) only | 12 | 16
t(15;17) + other | 2 | 3
t(8;21) only | 1 | 1
inv(16) only | 2 | 3
Trisomy 8 only | 4 | 5
5q–/–5 only | 1 | 1
7q–/–7 only | 1 | 1
Complex karyotype | 7 | 9
Other | 19 | 25
FAB subtype | |
M0 | 6 | 8
M1 | 18 | 23
M2 | 20 | 26
M3 | 15 | 19
M4 | 18 | 23
Sex | |
Male | 47 | 61
Female | 30 | 39
Ethnicity | |
Asian | 1 | 1
African-American | 8 | 10
White | 67 | 87
Hispanic | 1 | 1
Other | 0 | 0

Median age of patients was 54 years (range, 18–81). ᴬPercentages are from all samples (n = 77) or all patients (n = 77).

Results

In order to identify genes that are specifically dysregulated in M3 AML cells, we compared the gene expression patterns of M3 samples to those of normal myeloid cells at various stages of differentiation. We collected bone marrow from healthy donors and immediately fractionated it into CD34+ cells, promyelocytes, or neutrophils. CD34+ cells were isolated after incubation with an anti-CD34 antibody and separation on a Miltenyi Biotec MACS column, resulting in greater than 90% purity, as validated by flow cytometry (data not shown). To ensure a high-quality expression analysis of normal promyelocytes, we refined a previously described flow cytometry–based methodology (22) to obtain a large number of highly enriched cells. After red cell lysis, whole bone marrow was incubated with antibodies to CD9, CD14, CD15, and CD16. Washed cells were sorted and collected on a Dako MoFlo flow cytometer as follows: CD9–, CD14–, CD15+, and CD16lo (for promyelocytes) and CD9–, CD14–, CD15+, and CD16hi (for neutrophils). (See Methods for details; Figure 1A for flow cytometric plots; and Figure 1B for photomicrographs of sorted cells.) Cell purity for all myeloid cell fractions was high: the average promyelocyte purity exceeded 80%, and neutrophil and band purity was greater than 95%, as determined by manual differentials performed on cytospin samples. RNA isolated from purified cells was analyzed on Affymetrix U133+2 microarrays.

To confirm that each myeloid cell fraction contained cells with gene expression patterns consistent with the predominant cell type, we compared the RNA expression levels of several developmentally regulated myeloid genes (Figure 1C). The “early” hematopoietic genes (associated with primitive myeloid precursor cells) CD34, FLT3, and KIT demonstrated much higher expression in the CD34+ cell fraction than in the other 2 fractions. Conversely, the “late” genes (associated with neutrophils) CTSS, FPR1, IL8RB, and NCF2 were most highly expressed in the neutrophil fraction. Most importantly for this study, the “mid-myeloid,” promyelocyte-specific azurophil granule genes CTSG, ELA2, MPO, and PRTN3 displayed very high expression in the promyelocyte fraction, which decreased by an order of magnitude or more in neutrophils. Further analysis identified genes specifically expressed in each of the 3 fractions. The heat map in Figure 1D illustrates a progression of gene expression from less differentiated to terminally differentiated myeloid cells. The patterns of expression described above support the flow cytometric and morphologic data, demonstrating that each fraction is highly enriched for the target population. Collection of these fractions was essential for a robust comparison of malignant promyelocytes with normal myeloid cells at different stages of differentiation.

For this study, we analyzed 77 de novo AML bone marrow samples obtained at diagnosis. The characteristics of the patients from which these samples were obtained are summarized in Table 1 and have previously been described (Discovery set, FAB subtypes M0–M4; ref. 23). Of these samples, 15 were diagnosed as M3; only samples with t(15;17) confirmed by cytogenetics and/or FISH were included in the M3 analysis set (24). The remaining 62 samples consisted of FAB subtypes M0, M1, M2, and M4 with 2 or fewer cytogenetic abnormalities (these FAB subtypes were chosen because they represent the most common AML subtypes and because there were insufficient numbers of M5, M6, or M7 patient samples available for analysis). RNA was prepared from snap-frozen cell pellets of the bone marrow cells and analyzed on Affymetrix U133+2 expression microarrays. We did not fractionate the AML samples for the following reasons: (a) the bone marrow blast percentage for all samples, including M3 abnormal promyelocytes, was high (median >70%), (b) we have previously observed that AML bone marrow aspirates subjected to Ficoll separation of mononuclear cells, compared with unfractionated snap-frozen cell pellets, demonstrated no significant differences in expression by microarray analysis (our unpublished observations), and (c) as of yet, there is no standard cell surface marker that can reliably separate malignant AML cells from normal human hematopoietic cells.

Defining the M3-specific dysregulome. To identify genes specific to M3 AML, we compared M3 samples to other FAB subtypes and to normal myeloid cells at various stages of differentiation. We established a series of criteria for M3-specific genes: significant differences in expression when compared with non-M3 AML or normal promyelocytes, including up- or downregulation, high expression similar to that of CD34+ myeloid precursor cells, and/or high expression of genes that are not expressed in any of the normal myeloid cells tested. We first performed significance analysis of microarrays (25), using a false discovery rate (FDR) cutoff of 0.05.

Figure 2 Identification of the M3-specific dysregulome: genes with significantly different expression in M3 compared with other AML subtypes and normal promyelocytes. (A) Heat map of microarray data demonstrates clear separation of 2,023 significantly up- or downregulated genes in M3 samples compared with other AML subtypes, although some genes were expressed at similar levels in normal and malignant (M3) promyelocytes (markers of promyelocyte differentiation). (B) The genes from A were filtered, by comparison with normal myeloid cells (including normal promyelocytes), to retain only those genes with M3-specific expression (510 genes).
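
The filtering summarized in panel B can be sketched as a simple predicate over per-probe-set statistics. This is a minimal illustration only, assuming the criteria stated in the Results: an FDR below 0.05 against other AML subtypes, plus at least one of three further conditions. The `ProbeSetStats` fields and the `is_m3_specific` function are hypothetical names introduced here and are not the authors' analysis code; criterion (c) is assumed to reuse the same FDR cutoff.

```python
from dataclasses import dataclass


@dataclass
class ProbeSetStats:
    """Hypothetical summary statistics for one microarray probe set."""
    fdr_vs_other_aml: float             # FDR for M3 vs. other AML subtypes
    fdr_vs_promyelocytes: float         # FDR for M3 vs. normal promyelocytes
    cd34_stage_specific: bool           # in the normal CD34+ stage signature?
    m3_to_cd34_ratio: float             # M3 / CD34+ fraction expression ratio
    m3_to_normal_ratio: float           # M3 / normal cell fraction ratio
    absent_call_fraction_normal: float  # fraction of MAS 5 "absent" calls


def is_m3_specific(p: ProbeSetStats, fdr_cutoff: float = 0.05) -> bool:
    """Return True if the probe set passes the sketched M3-specific filter."""
    # Prerequisite: significantly different from other AML subtypes.
    if p.fdr_vs_other_aml >= fdr_cutoff:
        return False
    # (a) CD34+ stage specific but persistently expressed in M3 (ratio >= 1:1).
    criterion_a = p.cd34_stage_specific and p.m3_to_cd34_ratio >= 1.0
    # (b) Significantly different from normal promyelocytes.
    criterion_b = p.fdr_vs_promyelocytes < fdr_cutoff
    # (c) Upregulated in M3 (ratio > 2:1) and not expressed in normal cells
    #     (more than 75% absent calls).
    criterion_c = (p.m3_to_normal_ratio > 2.0
                   and p.absent_call_fraction_normal > 0.75)
    return criterion_a or criterion_b or criterion_c


# Illustrative values only; these are not data from the study.
example = ProbeSetStats(0.01, 0.50, False, 0.2, 3.0, 0.90)
print(is_m3_specific(example))
```

Writing the prerequisite as an early return keeps the "one or more of the following" logic separate from the comparison with other AML subtypes, which every retained gene had to satisfy.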

Table 2 M3-specific signature’s most dysregulated genes: average microarray expression, fold change, and FDR

Probe set | Gene symbol | Gene name | M3 average | oAML average | M3/oAML FC | FDR (M3 vs. oAML) | Pros average | M3/Pros FC | FDR (M3 vs. Pros)
204537_s_at | GABRE | γ-Aminobutyric acid (GABA) A receptor, ε | 7,663 | 216 | 35.4 | 0 | 423 | 18.1 | 0.02
205110_s_at | FGF13 | Fibroblast growth factor 13 | 8,086 | 330 | 24.5 | 0 | 4,471 | 1.8 | 0.02
210997_at | HGF | Hepatocyte growth factor | 22,667 | 1,066 | 21.3 | 0 | 406 | 55.9 | <0.01
203074_at | ANXA8 | Annexin A8 | 14,843 | 698 | 21.3 | 0 | 2,231 | 6.7 | <0.01
219225_at | PGBD5ᴬ | PiggyBac transposable element derived 5 | 1,266 | 68 | 18.7 | 0 | 252 | 5.0 | <0.01
234224_at | PTPRG | Protein tyrosine phosphatase, receptor type, G | 1,267 | 75 | 16.9 | 0 | 359 | 3.5 | <0.01
206634_at | SIX3 | SIX homeobox 3 | 4,143 | 256 | 16.2 | 0 | 294 | 14.1 | <0.01
209686_at | S100B | S100 calcium binding protein B | 4,686 | 336 | 14.0 | 0 | 36 | 132.0 | <0.01
202237_at | NNMT | Nicotinamide N-methyltransferase | 5,814 | 425 | 13.7 | 0 | 927 | 6.3 | 0.03
212187_x_at | PTGDS | Prostaglandin D2 synthase 21 kDa | 20,218 | 1,666 | 12.1 | 0 | 548 | 36.9 | <0.01
229459_at | FAM19A5 | Family with sequence similarity 19 (chemokine-like), member A5 | 1,434 | 118 | 12.1 | 0 | 145 | 9.9 | <0.01
214228_x_at | TNFRSF4 | Tumor necrosis factor receptor superfamily, member 4 | 2,476 | 206 | 12.0 | 0 | 78 | 31.9 | <0.01
202718_at | IGFBP2 | Insulin-like growth factor binding protein 2 | 20,887 | 1,840 | 11.4 | 0 | 701 | 29.8 | <0.01
236787_at | SYNE1ᴬ | Spectrin repeat containing, nuclear envelope 1/CDNA FLJ35091 fis | 3,848 | 345 | 11.1 | 0 | 1,118 | 3.4 | <0.01
208510_s_at | PPARG | Peroxisome proliferator activated receptor γ | 1,616 | 145 | 11.1 | 0 | 32 | 49.8 | <0.01
213943_at | TWIST1 | Twist homolog 1 | 3,424 | 309 | 11.1 | 0 | 157 | 21.8 | <0.01
38487_at | STAB1 | Stabilin 1 | 39,862 | 4,080 | 9.8 | 0 | 301 | 132.6 | <0.01
233422_at | EBF3 | Early B cell factor 3 | 692 | 71 | 9.7 | 0 | 201 | 3.4 | 0.03
202994_s_at | FBLN1 | Fibulin 1 | 837 | 91 | 9.2 | 0 | 107 | 7.8 | <0.01
244889_at | FUT4ᴬ | Fucosyltransferase 4 | 3,658 | 411 | 8.9 | 0 | 967 | 3.8 | 0.02
1552553_a_at | NLRC4 | NLR family, CARD domain containing 4 | 214 | 1,489 | 0.14 | 0.02 | 18,349 | 0.01 | 0
202917_s_at | S100A8 | S100 calcium binding protein A8 | 17,142 | 130,368 | 0.13 | <0.01 | 106 | 0.02 | 0
208438_s_at | FGR | V-fgr oncogene homolog | 1,229 | 9,511 | 0.13 | <0.01 | 77,507 | 0.02 | 0
203508_at | TNFRSF1B | Tumor necrosis factor receptor superfamily, member 1B | 635 | 5,030 | 0.13 | 0.02 | 5,444 | 0.12 | 0
209949_at | NCF2 | Neutrophil cytosolic factor 2 | 1,401 | 11,929 | 0.12 | 0.02 | 55,034 | 0.03 | 0
205681_at | BCL2A1 | BCL2-related protein A1 | 766 | 6,680 | 0.11 | 0.02 | 27,417 | 0.03 | 0
1555349_a_at | ITGB2 | Integrin, β2 | 2,158 | 19,282 | 0.11 | <0.01 | 55,854 | 0.04 | 0
1559502_s_at | LRRC25 | Leucine rich repeat containing 25 | 238 | 2,353 | 0.10 | <0.01 | 2,507 | 0.09 | 0
228094_at | AMICA1 | Adhesion molecule, interacts with CXADR antigen 1 | 373 | 3,714 | 0.10 | 0.01 | 10,562 | 0.04 | 0
225639_at | SKAP2 | Src kinase associated phosphoprotein 2 | 947 | 9,550 | 0.10 | <0.01 | 14,829 | 0.06 | 0
202599_s_at | NRIP1 | Nuclear receptor interacting protein 1 | 1,153 | 12,784 | 0.09 | <0.01 | 6,371 | 0.18 | 0
210784_x_at | LILRB2/B3/A6 | Leukocyte immunoglobulin-like receptor, subfamily B2/B3/A6 | 177 | 2,021 | 0.09 | 0.01 | 9,711 | 0.02 | 0
219593_at | SLC15A3 | Solute carrier family 15, member 3 | 76 | 998 | 0.08 | 0.02 | 3,179 | 0.02 | 0
217078_s_at | CD300A | CD300a molecule | 151 | 2,125 | 0.07 | <0.01 | 2,822 | 0.05 | 0
212192_at | KCTD12 | Potassium channel tetramerisation domain-containing 12 | 973 | 13,842 | 0.07 | 0.02 | 9,173 | 0.11 | 0
203535_at | S100A9 | S100 calcium binding protein A9 | 5,257 | 77,048 | 0.07 | <0.01 | 821,839 | 0.01 | 0
205936_s_at | HK3 | Hexokinase 3 (white cell) | 138 | 2,424 | 0.06 | 0.04 | 60,843 | 0.00 | 0
220005_at | P2RY13 | Purinergic receptor P2Y, G-protein coupled, 13 | 129 | 2,862 | 0.04 | 0.02 | 28,442 | 0.00 | 0
224356_x_at | MS4A6A | Membrane-spanning 4 domains, subfamily A, member 6A | 609 | 14,977 | 0.04 | <0.01 | 3,430 | 0.18 | 0
205844_at | VNN1 | Vanin 1 | 130 | 4,219 | 0.03 | 0.02 | 12,620 | 0.01 | 0

oAML, other AML subtypes; FC, fold change; Pros, promyelocytes. ᴬAffymetrix annotations for these probe sets changed after microarray experiments were completed and before validation. ENSEMBL alignment of probe set target sequences showed that probe sets 236787_at and 244889_at did not map to the SYNE1 or FUT4 gene regions.
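
The fold change columns in Table 2 are ratios of group averages. As a minimal sketch, assuming FC is simply the M3 average divided by the comparison-group average, the calculation looks like the following. The function name `fold_change` is introduced here for illustration, and the inputs are the rounded averages printed in the table, so recomputed ratios may differ slightly from the tabulated values.

```python
def fold_change(m3_average: float, reference_average: float) -> float:
    """Ratio of the M3 average signal to a reference-group average signal."""
    if reference_average <= 0:
        raise ValueError("reference average must be positive")
    return m3_average / reference_average


# Rounded averages for HGF (probe set 210997_at) as printed in Table 2.
hgf_vs_other_aml = fold_change(22667, 1066)   # M3 / oAML
hgf_vs_promyelocytes = fold_change(22667, 406)  # M3 / Pros

# Rounded averages for VNN1 (probe set 205844_at), a downregulated gene.
vnn1_vs_other_aml = fold_change(130, 4219)

print(round(hgf_vs_other_aml, 1), round(vnn1_vs_other_aml, 2))
```

Values above 1 indicate higher expression in M3 than in the comparison group, and values below 1 indicate lower expression, which is why the downregulated half of the table lists ratios such as 0.03.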

Figure 3 Validation of NanoString nCounter system performance by comparison with microarray results for calibration genes. A total of 28 AML (11 M3, 17 other AML subtypes), 2 CD34+, 5 promyelocyte, and 2 neutrophil samples were analyzed. Expression is plotted as a percentage ([sample signal/signal of index group] × 100) because the microarray and nCounter system data were expressed in different units. Asterisks indicate the signal index group for each graph. The NanoString results showed the expected pattern of expression for all 6 genes. (A) Expression of early myeloid-specific hematopoietic genes in CD34+ cells, promyelocytes, neutrophils, M3 AML, and other FAB subtypes (oAML) as measured by the Affymetrix microarray (red) and NanoString nCounter system (green). (B) Promyelocyte-specific genes. (C) Late myeloid–specific genes.
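
The percentage scale used in Figure 3 can be illustrated with a short sketch. It assumes that each group's signal is divided by the signal of a chosen index group and multiplied by 100, as the caption states; the function name `percent_of_index` and the numbers are hypothetical and are not data from the study.

```python
def percent_of_index(group_signals: dict, index_group: str) -> dict:
    """Express each group's signal as a percentage of the index group's signal."""
    index_signal = group_signals[index_group]
    if index_signal <= 0:
        raise ValueError("index group signal must be positive")
    return {group: 100.0 * signal / index_signal
            for group, signal in group_signals.items()}


# Hypothetical mean signals for one gene on one platform (arbitrary units).
signals = {"CD34+": 200.0, "Pros": 50.0, "PMNs": 10.0}
print(percent_of_index(signals, index_group="CD34+"))
```

Because both platforms are rescaled to their own index group, the resulting percentages can share one axis even though the raw microarray intensities and nCounter counts are in different units.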

This analysis identified 2,023 annotated genes (3,787 probe sets) whose expression was significantly up- or downregulated in M3 compared with other AML subtypes, as demonstrated by the clear separation of the two groups in the expression heat map (Figure 2A). We observed that some of the genes were expressed at similar levels in both normal and transformed (M3) promyelocytes. Therefore, to exclude genes that are simply markers of the normal promyelocyte developmental stage, and conversely to retain genes that represent aberrant expression of developmentally regulated genes, we filtered these genes based on a comparison with the specific expression signatures identified for normal myeloid cells at 3 stages of differentiation (see Methods and Supplemental Figure 1; supplemental material available online with this article; doi:10.1172/JCI38248DS1). In addition to showing significant (FDR < 0.05) differences in expression compared with other AML subtypes, these genes, which we call the M3-specific dysregulome, fulfilled one or more of the following criteria: (a) were CD34+ precursor stage specific but persistently expressed at similar levels in M3 cells (M3/CD34+ precursor fraction ratio, ≥1:1), (b) showed significantly different expression (FDR < 0.05) from that of normal promyelocytes, and/or (c) showed significantly upregulated M3 expression (FDR < 0.05; M3/normal cell fraction ratio, >2:1) and were not expressed in any of the normal myeloid cells tested (greater than 75% absent calls, determined by the MAS 5 algorithm in Affymetrix expression analysis software). The normal promyelocyte filter removed approximately 1,800 probe sets, and the remaining criteria each removed approximately equal numbers of probe sets. The heat map in Figure 2B demonstrates clear separation of these 510 up- or downregulated genes in malignant versus normal promyelocytes (listed in Supplemental Table 1).

Many of the genes in the M3-specific signature exhibited dramatic differences in expression level when compared with other AML subtypes or normal promyelocytes. A subset of genes with the greatest level of differential expression is shown in Table 2. To investigate genes that may be activated or repressed in M3 AML, equal numbers of up- and downregulated genes were selected for further study and validation. Some of these highly dysregulated genes (such as HGF, FGF13, and PPARG) have been documented in previous reports (6–8, 12, 14). There are also many genes in this list that have not been previously reported to be dysregulated in M3, including BCL2A1, TWIST1, and TNFRSF1B. Of the 40 genes selected for further study, 17 have not previously been reported in other M3 AML expression studies (6–8, 12, 14).

Validation of M3-specific dysregulome. To validate the findings of the microarray analysis, we selected the 40 genes with the largest average fold changes (both up and down) between M3 and the other AML subtypes. Due to limited sample abundance, we used a high-throughput methodology for gene validation that allowed us to perform triplicate measurements of expression of 46 genes (the 40 with the highest fold changes, plus 6 developmentally regulated myeloid genes for calibration) with only 100 ng of RNA per replicate. Based on an average RNA yield of 25 μg per 10⁷ cells from the bone marrow aspirate samples used in this study, we estimated that 100 ng corresponded to approximately 40,000 cells. The NanoString nCounter Analysis System uses digital technology based on direct multiplexed measurement of gene expression and offers high levels of sensitivity (500 attomolar, i.e., <1 copy per cell), precision, and reproducibility (1). The technology uses molecular barcodes and single-molecule imaging to detect and count hundreds of unique mRNAs in a single reaction (see Methods) (1). In this study the full capacity of the nCounter system was not utilized; up to 500 genes can be assayed in 1 multiplex reaction (1). To confirm the performance of this technology, we selected 6 “calibration” genes known to be differentially expressed in each myeloid cell fraction and in M3 samples. A total of 28 AML (11 M3 and 17 other AML subtypes), 2 CD34+, 5 promyelocyte, and 2 neutrophil samples were analyzed. The NanoString results showed the expected pattern of expression for all 6 calibration genes (compare Figure 3, A–C with Figure 1C). As shown in Table 3, 37 of the 40 M3-specific dysregulome genes were also assayed. The remaining 3 genes, SYNE1, FUT4, and PGBD5, could not be analyzed due to either inaccurate (SYNE1 and FUT4) or ambiguous (PGBD5) mapping of Affymetrix probe set target sequences to the genome (see footnote in Table 2). Data from both methods are plotted for 2 examples each of up- or downregulated genes (HGF and FAM19A5, NRIP1 and TNFRSF1B, respectively) in Figure 4. Data from nCounter and microarray analyses demonstrate similar patterns of expression in M3 samples relative to other AML subtypes and normal promyelocytes (see Table 3 and Supplemental Table 2).

To more directly compare the nCounter and microarray methods, which have different units of measurement, we transformed

Figure 4 Validation of the M3-specific signature by the NanoString nCounter system. (A–D) Expression of (A) HGF, (B) FAM19A5, (C) NRIP1, and (D) TNFRSF1B as measured by the Affymetrix Hu133+2 microarray (left panels) and the nCounter system (right panels). The same samples are plotted as in Figure 3. Each data point represents 1 patient sample. The horizontal line indicates the mean of each group. For microarray plots, each data point represents 1 sample (either patient or sorted normal cells) and indicates signal intensity for a single probe set on 1 microarray. For nCounter plots, each data point represents the mean normalized counts for 3 technical replicate measurements of 1 sample (either patient or sorted normal cells).

each AML data point as a proportion of maximum signal for each ference in expression (M3 vs. other subtypes and promyelocytes), probe set (microarray) or probe (nCounter) for all samples. These 33 genes were validated by the nCounter system. proportions were then plotted on a graph. As depicted in Figure We also performed qRT-PCR for 9 of the 40 M3-specific dysregu- 5, A and B, and Table 3, the correlation coefficients between the lome genes in parallel with the nCounter method. For 7 of the 9 results of the 2 methods were r > 0.7 and statistically significant genes, qRT-PCR confirmed the significant fold change expression (P < 0.05) for all but 1 gene, CD300A, which may be due to differ- differences among M3, other FAB subtypes, and normal promyelo- ential targeting of the microarray and nCounter probes (middle cytes, as determined by the NanoString and microarray datasets and 5′, respectively) to an mRNA with 3 isoforms. Three other (Table 3). Due to the limited abundance of many of our samples, genes (AMICA1, SLC15A3, and HK3) demonstrated similar fold we were unable to perform qRT-PCR assays on all 40 M3-specific change values and high correlation coefficients compared with the dysregulome genes. However, the nCounter method has previously microarray data but did not achieve significance when comparing demonstrated strong correlation with qRT-PCR for 21 genes whose expression in M3 with other AML subtypes in the nCounter assay. expression was measured in quadruplicate at 7 time points (1). This result may be due to the low overall expression signals shown To determine whether the 33 validated genes are similarly dys- by both methods (Table 3). We also compared fold change ratio regulated in AML samples from other studies, we used GSEA (19, measurements (M3/other FAB subtypes) of all genes assayed by 20) to evaluate a published dataset (18) (GSE6891). Expression both microarray and NanoString. 
As demonstrated in Figure 5C, in this set of 325 primary AML samples demonstrated a highly the correlation between the 2 platforms was very high (r = 0.963, significant enrichment of the validated 33-gene set (FDR q value P < 0.05). Based on the stringent criteria of a significantly high cor- = 0.0; Figure 6A). In addition, GSEA analysis of expression in one relation coefficient, similar fold change values, and significant dif- of our mouse models of APL (5) demonstrated significant enrich-

1720 The Journal of Clinical Investigation http://www.jci.org Volume 119 Number 6 June 2009 technical advance

Table 3 M3-specific signature’s most dysregulated genes: comparison of microarray and nCounter fold changes and nCounter average signal, and qRT-PCR validation

Gene symbol Gene name Microarray nCounter data qRT-PCR data M3/oAML M3/oAML M3 oAML P r Validated? P FC FC avg avg GABRE γ-Aminobutyric acid (GABA) A receptor, e 35.4 7.8 14.8 1.9 0.001 0.75 ND ND FGF13 Fibroblast growth factor 13 24.5 20.0 22.1 1.1 <0.001 0.94 No 0.206 HGF Hepatocyte growth factor 21.3 24.4 1,311 53.7 0.001 0.93 Yes <0.001 ANXA8 Annexin A8 21.3 39.0 100.4 2.6 0.003 0.96 Yes <0.001 PTPRG Protein tyrosine phosphatase, 16.9 11.4 14.3 1.3 <0.001 0.8 ND ND receptor type, G SIX3 SIX homeobox 3 16.2 12.0 36.7 3.1 0.003 0.98 ND ND S100B S100 calcium binding protein B 14.0 17.9 123.3 6.9 0.003 0.97 ND ND NNMT Nicotinamide N-methyltransferase 13.7 8.7 27.7 3.2 0.031 0.96 ND ND PTGDS Prostaglandin D2 synthase 21 kDa 12.1 51.6 38.6 0.7 0.003 0.97 ND ND FAM19A5 Family with sequence similarity 19 12.1 19.3 17.9 0.9 0.005 0.97 ND ND (chemokine-like), member A5 TNFRSF4 Tumor necrosis factor receptor 12.0 17.0 27.1 1.6 0.01 0.92 Yes <0.001 superfamily, member 4 IGFBP2 Insulin-like growth factor binding protein 2 11.4 16.5 396.0 24.0 <0.001 0.97 Yes <0.001 PPARG Peroxisome proliferator activated receptor γ 11.1 8.1 25.7 3.2 <0.001 0.96 Yes 0.0033 TWIST1 Twist homolog 1 11.1 9.2 23.4 2.5 <0.001 0.9 ND ND STAB1 Stabilin 1 9.8 10.5 671.5 63.8 <0.001 0.85 Yes <0.001 EBF3 Early B cell factor 3 9.7 9.3 33.0 3.5 0.005 0.72 ND ND FBLN1 Fibulin 1 9.2 6.8 4.1 0.6 0.012 0.86 ND ND NLRC4 NLR family, CARD domain containing 4 0.14 0.2 3.0 13.9 0.005 0.94 ND ND S100A8 S100 calcium binding protein A8 0.13 0.06 103.7 1,780 0.006 0.78 ND ND FGR V-fgr oncogene homolog 0.13 0.06 17.0 268.3 0.017 0.94 ND ND TNFRSF1B Tumor necrosis factor receptor 0.13 0.06 8.5 139.1 0.02 0.99 ND ND superfamily, member 1B NCF2 Neutrophil cytosolic factor 2 0.12 0.24 36.8 152.0 0.046 0.92 No 0.215 BCL2A1 BCL2-related protein A1 0.11 0.25 15.6 63.0 0.035 0.95 ND ND ITGB2 Integrin, β2 0.11 0.12 47.2 387.2 0.002 0.91 ND ND LRRC25 Leucine rich repeat containing 25 0.10 0.11 2.9 27.3 0.041 0.95 ND ND AMICA1 
Adhesion molecule, interacts with 0.10 0.63 0.6 0.9 0.203 0.8 ND ND CXADR antigen 1 SKAP2 Src kinase associated phosphoprotein 2 0.10 0.15 3.3 21.5 <0.001 0.86 ND ND NRIP1 Nuclear receptor interacting protein 1 0.09 0.14 4.7 32.9 <0.001 0.9 Yes 0.048 LILRB2/B3/A6 Leukocyte Ig-like receptor 0.09 0.13 14.2 111.7 0.014 0.9 ND ND subfamily B2/B3/A6 SLC15A3 Solute carrier family 15, member 3 0.08 0.24 3.4 13.9 0.075 0.82 ND ND CD300A CD300a molecule 0.07 0.69 6.6 9.5 0.303 0.19 ND ND KCTD12 Potassium channel tetramerisation 0.07 0.13 4.5 35.7 0.042 0.92 ND ND domain-containing 12 S100A9 S100 calcium binding protein A9 0.07 0.05 50.1 992.5 0.012 0.8 ND ND HK3 3 (white cell) 0.06 0.36 4.6 13.0 0.11 0.825 ND ND P2RY13 Purinergic receptor P2Y, G-protein 0.04 0.11 6.2 54.3 0.013 0.95 ND ND coupled, 13 MS4A6A Membrane-spanning 4 domains, 0.04 0.12 2.5 20.1 0.007 0.89 ND ND subfamily A, member 6A VNN1 Vanin 1 0.03 0.18 2.2 12.4 0.005 0.81 ND ND

ND, not determined; avg, average. Pearson’s correlation coefficient (r) comes from a comparison of NanoString and microarray results.
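As a worked check on how the table's columns relate, the sketch below recomputes the nCounter fold change as the ratio of the M3 average to the other-AML (oAML) average. That the FC column is exactly this ratio is an assumption inferred from the tabulated values, not something stated in the text; the function name `fold_change` is illustrative only.

```python
from math import log2

# (M3 average, other-AML average) nCounter values copied from the table.
ncounter_avg = {
    "GABRE": (14.8, 1.9),
    "HGF": (1311.0, 53.7),
    "NLRC4": (3.0, 13.9),
}


def fold_change(m3_avg, oaml_avg):
    """Ratio of the M3 average to the other-AML average (assumed definition)."""
    return m3_avg / oaml_avg


for gene, (m3, oaml) in ncounter_avg.items():
    fc = fold_change(m3, oaml)
    # A log2 fold change is positive for upregulated genes, negative for downregulated ones.
    print(f"{gene}: FC = {fc:.2f}, log2 FC = {log2(fc):+.2f}")
```

For these three genes the recomputed ratios round to the tabulated 7.8, 24.4, and 0.2. Other rows can differ slightly (for example, ANXA8 gives 100.4/2.6 ≈ 38.6 against a tabulated 39.0), presumably because the printed averages are themselves rounded.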

ment (FDR q value = 0.034; Figure 6B) of the murine orthologs of the validated genes. Finally, we tested expression of these validated genes in the PR-9 cell line (21), a commonly used model of M3 AML. Zn2+ treatment massively increased expression of the PML-RARA fusion gene (see Supplemental Figure 2). GSEA analysis failed to demonstrate significant enrichment of the validated gene set in PR-9 cells expressing high levels of PML-RARA (FDR q value = 0.956; Figure 6C).

The Journal of Clinical Investigation http://www.jci.org Volume 119 Number 6 June 2009 1721 technical advance

Figure 5 Comparison plots of NanoString nCounter with Affymetrix GeneChip data for M3-specific genes. (A and B) Scatter plots show the percentage of maximum expression per probe/probe set in all samples for microarray data versus that for nCounter data. Correlation coefficients demonstrate significant correlation between the microarray and nCounter data. (A) Upregulated genes (HGF and FAM19A5), (B) downregulated genes (NRIP1 and TNFRSF1B), and (C) log2 (M3/other AML) fold change ratios as measured by Affymetrix arrays (x axis) and NanoString assay (y axis) for 37 highly dysregulated genes. The linear fit of the ratios in both assays yielded a correlation coefficient of 0.963.
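The comparison described in this caption can be sketched as follows: each platform's values for one gene are rescaled to a percentage of that gene's maximum across samples, and Pearson's r is then computed between the two rescaled series. This is a minimal illustration, not the authors' analysis code; the sample values are hypothetical, and the function names are invented for the example.

```python
from math import sqrt


def percent_of_max(values):
    """Rescale one probe's values to a percentage of its maximum across samples."""
    peak = max(values)
    return [100.0 * v / peak for v in values]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-sample values for a single gene on each platform.
microarray_signal = [120.0, 950.0, 400.0, 60.0, 1800.0]
ncounter_counts = [15.0, 130.0, 70.0, 9.0, 260.0]

r = pearson_r(percent_of_max(microarray_signal), percent_of_max(ncounter_counts))
print(f"r = {r:.3f}")
```

Because Pearson's r is unchanged by rescaling either axis by a positive constant, the percentage-of-maximum transform does not alter the per-gene coefficient; its main effect is to place both platforms on a common 0 to 100 scale for plotting.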

Classification of M3 samples using the NanoString-validated gene set. We next tested the ability of the 33 validated genes to identify M3 samples using unsupervised principal component analysis (PCA; ref. 26). In our dataset, all M3 samples positive for the PML-RARA rearrangement separated from the other samples (Figure 7). Notably, 1 sample diagnosed morphologically as M3 AML, but with normal cytogenetics and negative FISH, did not cluster with those positive for the PML-RARA fusion gene (Figure 7A). The patient from whom this sample was taken also failed all-trans retinoic acid (ATRA) therapy and died 2 months after induction. PCA of the primary NanoString expression data also clearly separated 11/11 M3 t(15;17)-positive samples from other FAB subtypes, as expected (Figure 7B). We next tested the validated gene set on a set of 325 M0–M4 AML samples (18) that contained several potentially ambiguous diagnoses. The PCA plot from this analysis shows that all 20 M3 t(15;17)-positive samples clustered separately from the other FAB subtypes, as expected (Figure 7C). In addition, 4 morphologically diagnosed M3 samples, in which the t(15;17) was missed by routine cytogenetics, clustered appropriately. Another sample in which morphological diagnosis (M2) conflicted with routine cytogenetics [t(15;17)] was also appropriately identified. Figure 7D demonstrates the ability of the validated gene set to separate all but 1 of the M3 t(15;17)-positive samples from other FAB subtypes; 19 of 20 M3 samples were appropriately identified in a dataset of 93 AML M0–M4 samples obtained from CALGB and analyzed in our microarray facility (see Methods).

Discussion
We have demonstrated the use of an innovative high-throughput methodology, the NanoString nCounter Analysis System, to quantify the mRNA abundance of a large number of genes from an expression signature using very small amounts of clinical material. Through the use of a large number of clinical samples and normal, primary myeloid cells, we defined the unique expression signature of M3 AML, which is distinct from other subtypes of AML and from normal promyelocytes. We then validated the M3-specific signature using the NanoString nCounter Analysis System, which enabled us to quantitate the relative expression of 46 genes with only 300 ng of total RNA, using multiplexed reactions. We determined that the validated genes were also significantly dysregulated in M3 AML samples from another large clinical study and from a mouse model of APL, but not from a commonly used tissue culture model of PML-RARA function. Finally, the validated genes reliably identify bona fide t(15;17)-positive M3 samples, separating them from other FAB subtypes in 3 independent AML datasets.

The findings presented here demonstrate the power of including a large number of de novo AML samples and normal human myeloid samples to define malignancy-specific expression signatures. Comparison of 14 M3 samples with 62 samples of other AML subtypes and 15 samples of normal primary myeloid cells allowed us to identify expression patterns that were both unique and highly reproducible for M3 AML. Comparison with normal enriched promyelocyte samples enabled us to filter out genes that were simply markers of the promyelocyte stage of myeloid development. Although a previous study (7) compared M3 and promyelocyte expression patterns, these were derived from CD34+ PBMCs cultured for 7 days with G-CSF, IL-3, and GM-CSF. Our analysis


Figure 6 The validated 33-gene M3-specific signature is consistently dysregulated in other AML datasets and a mouse model of APL, but not in a PML-RARA+ cell line. The top portion of each GSEA plot shows the running enrichment score for the validated M3-specific genes as the analysis moves down the ranked list. The peak score for each plot is the enrichment score for the gene set. The bottom portion of each plot shows the value of the ranking metric as it moves down the list of ranked genes. The FDR is an expression of the significance level of the enrichment, after multiple test correction. (A) GSEA plot of 325 M0–M4 AML samples (GSE6891), comparing M3 with other FAB subtypes, demonstrates significant enrichment. (B) GSEA plot of mCG-PML-RARA murine APL cells (20, 24) compared with day 2 wild-type murine myeloid cells (mostly promyelocytes) demonstrates significant enrichment. (C) GSEA plot of uninduced versus PML-RARA–induced PR-9 cells demonstrates no enrichment of the M3-specific genes at any time point.
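The running enrichment score described in this caption can be illustrated with a simplified sketch of the weighted running-sum statistic from the GSEA method (ref. 20). This is not the GSEA software used in the study and omits permutation testing and FDR estimation; the gene names, ranking metric values, and the function name `running_enrichment` are hypothetical.

```python
def running_enrichment(ranked_genes, metric, gene_set, p=1.0):
    """Walk down a ranked gene list and return (running scores, enrichment score).

    Genes in the set raise the score in proportion to |metric|**p;
    genes outside the set lower it by a constant step.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_weights = [abs(m) ** p if hit else 0.0 for m, hit in zip(metric, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, score = [], 0.0
    for weight, hit in zip(hit_weights, in_set):
        score += weight / total_hit if hit else -1.0 / n_miss
        running.append(score)

    # The enrichment score is the running value with the largest deviation from zero.
    enrichment = max(running, key=abs)
    return running, enrichment


# Hypothetical ranked list (most M3-upregulated first) and ranking metric.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [4.0, 3.0, 1.5, -0.5, -2.0, -3.0]
running, es = running_enrichment(genes, metric, {"g1", "g2", "g4"})
print([round(v, 3) for v in running], round(es, 3))
```

When the set members are concentrated at the top of the ranked list, the running score peaks early and the enrichment score is large and positive, which is the pattern the caption describes for panels A and B.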

showed that the majority of the genes reported in that study (7) were filtered out by our comparison of M3 with CD34+ cells, primary promyelocytes, neutrophils, and other AML subtypes.

The NanoString nCounter system allowed us to quantify and validate (in triplicate) the expression of 42 of 46 genes in 28 AML and 11 normal myeloid samples, using approximately 1/10th of the RNA that qRT-PCR would have required. The nCounter system performed with a high level of precision and reproducibility, using only 100 ng of RNA (the RNA content of ~40,000 AML cells) per replicate. Expression signal values demonstrated significant correlation with microarray expression data. The coefficient of variation, a ratio of the standard deviation to the mean and a measure of reproducibility, was consistent with that of conventional qRT-PCR (data not shown).

We have shown that the validated M3 gene set was significantly enriched in M3 samples in another large clinical study. Experimental and in silico validation of the signature allowed us to examine 2 models of APL, one a knockin mouse model of the disease, the other a myeloid cell line with inducible expression of PML-RARA. Analysis of expression in APL cells derived from mCG-PML-RARA mice (27) demonstrated that the validated gene set was significantly dysregulated when compared with wild-type promyelocytes. Our previous work with this mouse model demonstrated that only 3 of 116 genes in the murine APL dysregulome were dysregulated in PML-RARA–expressing preleukemic promyelocytes, suggesting that in mice the genes that are dysregulated in APL are not downstream targets of the transgene (27). Similarly, the validated gene set identified in the current study was not altered in PR-9 cells, suggesting that many of the dysregulated genes in primary M3 AML samples are not direct targets of PML-RARA.

We have further demonstrated that the validated 33-gene set identifies M3 samples from within AML microarray expression datasets from other large studies, reliably separating those with t(15;17) and/or the PML-RARA fusion gene from those that were morphologically ambiguous. Only 1 of 60 M3 samples analyzed by PCA failed to segregate with the other M3s. Unsupervised hierarchical clustering of global expression analysis using the CALGB samples demonstrated that this outlier M3 sample segregated with a large mixed group of FAB subtypes and not with the other M3s (data not shown). There was no difference between the survival of the patient from whom this sample was taken and the survival of other M3 patients in the CALGB group. This evidence suggests that in a small number of patients there may be secondary mutations that can modify expression phenotypes without altering response to ATRA.

Figure 7 The NanoString-validated, 33-gene M3-specific signature reliably identifies M3 samples, including those with normal cytogenetics and/or ambiguous morphology. PCA plots of the validated gene expression data demonstrate a clear separation of M3 t(15;17)-positive samples (red) from other FAB subtypes (gray). (A) Data from the Washington University AML discovery set, including 15 M3 and 62 M0, M1, M2, and M4 AML samples (1). The PCA plot shows clustering of all M3 samples with the PML-RARA rearrangement, but not of 1 sample with an M3 morphological diagnosis, normal cytogenetics, and negative FISH that did not respond to ATRA therapy (blue). (B) NanoString nCounter expression data were sufficiently robust to separate 11/11 M3 t(15;17)-positive samples from other FAB subtypes. (C) M3 samples from a published dataset (GSE6891) formed a distinct cluster separate from other FAB subtypes (total of 325). M3s with t(15;17) that were missed by routine cytogenetics (yellow) and a t(15;17)-positive sample morphologically classified as M2 (green) were also assigned appropriately to the M3 cluster. (D) A total of 19/20 M3s with t(15;17) from a CALGB sample set clustered separately from 73 other FAB subtypes.

Importantly, the NanoString dataset itself was sufficient to reliably identify all M3 samples in an unsupervised PCA of the 33-gene validated set. In both our study and one other that we analyzed (18), the results of routine cytogenetics conflicted with the morphologic diagnosis in several samples, but all PML-RARA+ samples were appropriately clustered by PCA. This conflict is important since ATRA is critical for the proper treatment of M3 AML patients. When treated with ATRA, patients with M3 AML have a significantly higher survival rate than those with other FAB subtypes (28). Moreover, due to the risk of disseminated intravascular coagulation in M3 AML, rapid, highly accurate diagnosis and initiation of therapy are crucial for optimal patient outcomes. Although most cases of M3 AML can be diagnosed using routine methods, a substantial number present atypically (i.e., without

evidence for the reciprocal translocation [ref. 29], with atypical morphology, or with normal cytogenetics [ref. 18]).

The findings of this study may have implications for other types of cancer as well. For example, the diagnosis of solid tumors is often made from fine needle biopsies, which retrieve a small amount of tissue containing relatively few tumor cells. Depending upon the type of needle used and the ratio of tumor cells to stroma, a few hundred thousand to one million cells are typically extracted (30). Given the typical RNA yield of a metabolically active tumor cell, fine needle biopsies from solid tumors would likely provide sufficient RNA for nCounter assays of hundreds of genes.

In summary, we have identified and validated a set of genes that is significantly and specifically dysregulated in M3 relative to other subtypes of AML and normal myeloid cells, including promyelocytes. The nCounter method used to validate this signature is precise, sensitive, and automated and requires only 300 ng of RNA to assay up to several hundred genes in triplicate. The manufacturer provides custom probes, and the assay can be performed on site in a research or clinical lab. With the advent of this innovative technology, a more extensive validation of microarray-based signatures in precious clinical samples is now attainable. Extensive validation of dysregulated genes from clinical samples will allow us to more confidently assess the gene expression profiles of parallel studies performed in different laboratories and to more precisely evaluate model systems for human cancers. Furthermore, use of the nCounter method to assay the M3-specific signature provides a valuable diagnostic tool and offers the potential to assay the expression of hundreds of genes in very small clinical samples.

Methods
Human AML and normal sorted bone marrow samples. Seventy-seven de novo adult AML bone marrow aspirates, including 14 M3 samples, were analyzed. Patient selection has been described previously (23); patient characteristics are summarized in Table 1. Bone marrow aspirates were also obtained from healthy adult donors. This study was approved by the Human Research Protection Office at Washington University School of Medicine after patients and donors provided informed consent in accordance with the Declaration of Helsinki. Isolation of normal promyelocytes and neutrophils was performed as described previously (22). Briefly, high-speed cell sorting isolated CD9–, CD14–, CD15hi, and CD16lo promyelocytes and CD9–, CD14–, CD15hi, and CD16hi neutrophils (Figure 1, A and B). MACS sorting was performed to isolate normal CD34+ cells according to the manufacturer’s instructions (Miltenyi Biotec). For all samples, sufficient cells were collected to perform the standard 1-cycle 3ʹ in vitro transcription protocol; this strategy avoids the bias introduced by linear amplification (2-cycle) required for small amounts of RNA. Sorted cells were lysed in Trizol reagent (Invitrogen) and stored at –80°C until RNA purification. RNA from AML bone marrow aspirates was prepared from unfractionated snap-frozen cell pellets using Trizol reagent. RNA from all samples was quantified using UV spectroscopy (Nanodrop Technologies) and qualitatively assessed using a BioAnalyzer 2100 and RNA NanoChip assay (Agilent Technologies). An additional 93 de novo AML bone marrow samples, described previously (23), were obtained from C. Bloomfield and M. Caligiuri, both of The Ohio State University Comprehensive Cancer Center and James Cancer Hospital & Solove Research Institute, Columbus, Ohio, USA; J. Vardiman of the University of Chicago, Chicago, Illinois, USA; and the Cancer and Leukemia Group B Tumor Bank, Chicago, Illinois, and were processed using the same methods as the samples from Washington University School of Medicine. Samples were labeled and hybridized to Affymetrix Human Genome U133 Plus 2.0 Array GeneChip microarrays (Affymetrix) using standard protocols from the Laboratory for Clinical Genomics (http://www.pathology.wustl.edu/research/lcgoverview.php; ref. 27). Profiling data for all samples have been deposited in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/; accession no. GSE12662).

Analysis of AML and normal myeloid datasets. To find genes that are differentially expressed in M3 in comparison with M0, M1, M2, and M4 subtypes, all probe sets with fewer than 10% present calls in both groups and a coefficient of variation of less than 0.5 across all samples were eliminated from further analysis. The remaining probe sets were analyzed using significance analysis of microarrays, 2-class analysis; 3,787 probe sets were significant at an FDR of 0.05. The normal myeloid developmental signature was defined by probe sets that were significantly different among CD34+, promyelocytes, and neutrophils at an ANOVA-adjusted P < 0.05 after multiple test correction. Probe sets specific to each developmental class were defined as having a significantly higher average expression in one class relative to both other classes (adjusted P < 0.05), yielding 2,622 CD34+-specific, 371 promyelocyte-specific, and 601 neutrophil-specific probe sets.

Cell lines. NB4 cells were obtained from ATCC. PR-9 cells were a gift of P. Pelicci of the European Institute of Oncology, Milan, Italy (21). All cell lines were maintained in RPMI 1640 with 10% fetal calf serum. PR-9 cells were induced in 100 μM ZnSO4 diluted in medium. Cell lysates were collected at 0, 2, 4, 8, 16, and 24 hours after induction. RNA was isolated, quantified, and hybridized to microarrays as described above.

Western blots. Prior to lysis, 2 × 10⁶ cells were incubated in the presence of 100 μM diisopropyl-fluorophosphate (Sigma-Aldrich), then lysed in 100 μl 2% SDS/PBS. Total protein (20 μg) was electrophoresed and Western blotting performed as previously described (5).

NanoString nCounter assay. Details of the nCounter Analysis System (NanoString Technologies) were reported previously (1). In brief, 2 sequence-specific probes were constructed for each gene of interest (Supplemental Table 3). The probes were complementary to a 100-base region of the target mRNA. One probe was covalently linked to an oligonucleotide containing biotin (the capture probe), and the other was linked to a color-coded molecular tag that provided the signal (the reporter probe; see ref. 1). The nCounter CodeSet for these studies contained probe pairs for 73 test and control genes. Forty-six probe pairs were specific for Homo sapiens genes, and 28 corresponded to various nCounter system controls, including a standard curve. Detailed sequence information for the target regions, capture probes, and reporter probes is listed in Supplemental Table 3. Each sample was hybridized in triplicate with 100 ng of total RNA in each reaction. All 46 genes and controls were assayed simultaneously in multiplexed reactions (for details, see ref. 1). To account for slight differences in hybridization and purification efficiency, the raw data were normalized to the standard curve generated via the nCounter system spike-in controls present in all reactions.

qRT-PCR. One-step qRT-PCR was performed on 20 ng total RNA using the QuantiTect SYBR Green RT-PCR kit and QuantiTect Primer assays (Qiagen) on a Prism 7300 real-time PCR system (Applied Biosystems) according to the manufacturer’s instructions. All reactions were performed in triplicate. Expression was normalized to GAPDH using the ΔCt method.

Analysis software. For a significance analysis of microarrays, depending on the sample set, 2-class or multi-class analysis was performed. An FDR cutoff of 0.05 was used (http://www-stat.stanford.edu/~tibs/SAM/) (25). For GSEA, depending on the sample size, phenotype or gene set permutation analysis with ratio-of-classes or signal-to-noise gene ranking was performed using GSEA (http://www.broad.mit.edu/gsea) (19, 20). Spotfire DecisionSite 8.2 software (TIBCO) was used for PCA and Ward’s hierarchical clustering.

Statistics. P values for NanoString nCounter and qRT-PCR data were calculated using a Student’s 2-tailed t test and were considered significant

when P < 0.05. Correlation coefficients for comparison of nCounter and microarray data were calculated as follows: each patient data point was transformed to a percentage of the maximum value for that particular probe set (microarray) or probe (nCounter). Microarray and nCounter percentages were plotted against each other (see Figure 5) and the Pearson’s correlation coefficient R calculated. Correlation coefficients were considered significant if greater than 0.374, which corresponds to P < 0.05. Note that the T statistic for R is calculated by the formula T = R√(n – 2) / √(1 – R²). T = 2.056 when P = 0.05 with a 2-tailed distribution. Using T = 2.056 and n = 28 (the total number of AML samples assayed by both Affymetrix microarrays and nCounter), the equation was solved for R (R = 0.374), meaning that any R value greater than 0.374 was statistically significant (P < 0.05).

Acknowledgments
This work was supported by grants from the NIH (RO1 CA083962 and PO1 CA0101937) and the Barnes-Jewish Hospital Foundation (to T.J. Ley). The authors thank Clara Bloomfield, Michael Caligiuri, and James Vardiman, and the Cancer and Leukemia Group B (CALGB) Tumor Bank for providing the CALGB AML samples used in this study. The authors are grateful to the High Speed Cell Sorter Core, the Laboratory for Clinical Genomics, the Tissue Procurement Core, and the Bioinformatics Core of the Siteman Cancer Center for their invaluable support. Nancy Reidelberger provided expert editorial assistance. Richard Perrin provided expert microscopy assistance. Matthew Walter critically read the paper and provided helpful advice.

Received for publication December 5, 2008, and accepted in revised form March 25, 2009.

Address correspondence to: Timothy J. Ley, Division of Oncology, Washington University Medical School, Section of Stem Cell Biology, Campus Box 8007, 660 South Euclid Avenue, St. Louis, Missouri 63110, USA. Phone: (314) 362-8831; Fax: (314) 362-9333; E-mail: [email protected].
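The significance threshold quoted in the Statistics paragraph (R > 0.374 for n = 28) follows from rearranging T = R√(n – 2) / √(1 – R²) to R = T / √(T² + n – 2). The sketch below reproduces that arithmetic; the critical value T = 2.056 is taken from the text rather than computed, and the function names are illustrative.

```python
from math import sqrt


def critical_r(t_crit, n):
    """Smallest Pearson R that reaches the critical T for n paired samples.

    Obtained by solving T = R * sqrt(n - 2) / sqrt(1 - R**2) for R.
    """
    return t_crit / sqrt(t_crit ** 2 + n - 2)


def t_from_r(r, n):
    """T statistic for a Pearson correlation R with n paired samples."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Values from the Statistics paragraph: T = 2.056 (2-tailed, P = 0.05), n = 28.
r_min = critical_r(2.056, 28)
print(round(r_min, 3))  # 0.374
```

Substituting the result back into `t_from_r` returns 2.056, which confirms that the rearrangement is consistent with the formula given in the text.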

1. Geiss, G.K., et al. 2008. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat. Biotechnol. 26:317–325.
2. Grisolano, J.L., Wesselschmidt, R.L., Pelicci, P.G., and Ley, T.J. 1997. Altered myeloid development and acute leukemia in transgenic mice expressing PML-RAR alpha under control of cathepsin G regulatory sequences. Blood. 89:376–387.
3. Brown, D., et al. 1997. A PMLRARalpha transgene initiates murine acute promyelocytic leukemia. Proc. Natl. Acad. Sci. U. S. A. 94:2551–2556.
4. He, L.Z., et al. 1997. Acute leukemia with promyelocytic features in PML/RARalpha transgenic mice. Proc. Natl. Acad. Sci. U. S. A. 94:5302–5307.
5. Westervelt, P., et al. 2003. High-penetrance mouse model of acute promyelocytic leukemia with very low levels of PML-RARalpha expression. Blood. 102:1857–1865.
6. Bullinger, L., et al. 2004. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N. Engl. J. Med. 350:1605–1616.
7. Casorelli, I., et al. 2006. Identification of a molecular signature for leukemic promyelocytes and their normal counterparts: Focus on DNA repair genes. Leukemia. 20:1978–1988.
8. Debernardi, S., et al. 2003. Genome-wide analysis of acute myeloid leukemia with normal karyotype reveals a unique pattern of homeobox gene expression distinct from those with translocation-mediated fusion events. Genes Chromosomes Cancer. 37:149–158.
9. Gutierrez, N.C., et al. 2005. Gene expression profile reveals deregulation of genes with relevant functions in the different subclasses of acute myeloid leukemia. Leukemia. 19:402–409.
10. Haferlach, T., et al. 2005. Global approach to the diagnosis of leukemia using gene expression profiling. Blood. 106:1189–1198.
11. Haferlach, T., et al. 2005. AML M3 and AML M3 variant each have a distinct gene expression signature but also share patterns different from other genetically defined AML subtypes. Genes Chromosomes Cancer. 43:113–127.
12. Kohlmann, A., et al. 2003. Molecular characterization of acute leukemias by use of microarray technology. Genes Chromosomes Cancer. 37:396–405.
13. Lee, S., et al. 2006. Gene expression profiles in acute myeloid leukemia with common translocations using SAGE. Proc. Natl. Acad. Sci. U. S. A. 103:1030–1035.
14. Valk, P.J., et al. 2004. Prognostically useful gene-expression profiles in acute myeloid leukemia. N. Engl. J. Med. 350:1617–1628.
15. Ross, M.E., et al. 2004. Gene expression profiling of pediatric acute myelogenous leukemia. Blood. 104:3679–3687.
16. Schoch, C., et al. 2002. Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proc. Natl. Acad. Sci. U. S. A. 99:10008–10013.
17. Wilson, C.S., et al. 2006. Gene expression profiling of adult acute myeloid leukemia identifies novel biologic clusters for risk classification and outcome prediction. Blood. 108:685–696.
18. Verhaak, R.G., et al. 2009. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling. Haematologica. 94:131–134.
19. Mootha, V.K., et al. 2003. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34:267–273.
20. Subramanian, A., et al. 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102:15545–15550.
21. Grignani, F., et al. 1993. The acute promyelocytic leukemia-specific PML-RARα fusion protein inhibits differentiation and promotes survival of myeloid precursor cells. Cell. 74:423–431.
22. Grenda, D.S., et al. 2007. Mutations of the ELA2 gene found in patients with severe congenital neutropenia induce the unfolded protein response and cellular apoptosis. Blood. 110:4179–4187.
23. Link, D.C., et al. 2007. Distinct patterns of mutations occurring in de novo AML versus AML arising in the setting of severe congenital neutropenia. Blood. 110:1648–1655.
24. Jaffe, E.S., Harris, N.L., Stein, H., and Vardiman, J.W. 2001. Pathology and genetics of tumours of haematopoietic and lymphoid tissues. IARC Press. Lyon, France. 352 pp.
25. Tusher, V.G., Tibshirani, R., and Chu, G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U. S. A. 98:5116–5121.
26. Jolliffe, I.T. 2002. Principal component analysis. 2nd edition. Springer. New York, New York, USA. 502 pp.
27. Yuan, W., et al. 2007. Commonly dysregulated genes in murine APL cells. Blood. 109:961–970.
28. Mrozek, K., Heerema, N.A., and Bloomfield, C.D. 2004. Cytogenetics in acute leukemia. Blood Rev. 18:115–136.
29. Lafage-Pochitaloff, M., et al. 1995. Acute promyelocytic leukemia cases with nonreciprocal PML/RARa or RARa/PML fusion genes. Blood. 85:1169–1174.
30. Welker, L., Akkan, R., Holz, O., Schultz, H., and Magnussen, H. 2007. Diagnostic outcome of two different CT-guided fine needle biopsy procedures. Diagn. Pathol. 2:31.
