Molecular Cancer Research, Vol. 1, 346–361, March 2003

Identification of a Global Expression Signature of B-Chronic Lymphocytic Leukemia

Diane F. Jelinek,1 Renee C. Tschumper,1 Gustavo A. Stolovitzky,4 Stephen J. Iturria,2 Yuhai Tu,4 Jorge Lepre,4 Nigam Shah,4 and Neil E. Kay3

1Departments of Immunology, 2Biostatistics, and 3Internal Medicine, Mayo Graduate and Medical Schools, Mayo Clinic, Rochester, MN; and 4IBM, T.J. Watson Research Center, Yorktown Heights, New York, NY

Abstract
B-chronic lymphocytic leukemia (B-CLL) is an adult-onset leukemia characterized by significant accumulation of apoptosis-resistant monoclonal B lymphocytes. In this study, we performed gene expression profiling on B cells obtained from 10 healthy age-matched individuals and CLL B cells from 38 B-CLL patients to identify key genetic differences between CLL and normal B cells. In addition, we leveraged recent independent studies to assess the reproducibility of our molecular B-CLL signature. We used a novel combination of several methods of data analysis including our own software and identified 70 previously unreported genes that differentiate leukemic cells from normal B cells, as well as confirmed recently reported B-CLL specific expression levels of an additional 10 genes. Importantly, many of these genes have previously been linked with other cancers, thus lending further support to their importance as candidate genes leading to B-CLL pathogenesis. We have also validated a subset of these genes using independent methodologies. Moreover, we show that our genes can be used to create a diagnostics signature that performs with perfect sensitivity and specificity in an independent cohort of 21 B-CLL and 20 normal subjects, thus strongly validating the informative nature of our set of genes. Finally, we identified a group of 31 genes that distinguish between low (Rai stage 0) and high (Rai stage 4) risk patients, suggesting that there may also be a gene expression signature that associates with disease progression.

Introduction
B-chronic lymphocytic leukemia (B-CLL) is the most common leukemia in the United States and is a significant cause of morbidity and mortality in the older adult population (1, 2). The hallmark feature is the presence of elevated numbers of circulating clonal leukemic B cells that typically express CD19, CD23, CD5, and low levels of surface immunoglobulin (Ig) (3). B-CLL is not characterized by highly proliferative cells but rather by the presence of leukemic cells with significant resistance to apoptosis (4, 5).

The underlying cause of B-CLL remains unknown. In this regard, gene expression profiling has been used successfully by a variety of investigators to discover genetic differences between tumor cells and normal counterpart cells and to aid in subclassification of human cancers. This information is proving to be very useful in understanding tumor cell biology. This methodology has also been applied toward achieving a better understanding of B-CLL by other investigators. Plate et al. (6) used a cDNA expression array system to identify genes up-regulated by Fludarabine; however, the analysis was performed using an array limited to apoptosis-related genes, only one patient sample was studied, and there was no independent verification of the array data. Similarly, Aalto et al. (7) used a focused array approach (406 genes) to identify patterns associated with an 11q23 deletion in 34 B-CLL samples, but there was no independent verification of differentially expressed genes. Stratowa et al. (8) used a microarray that allowed assessment of 1024 genes on 54 B-CLL samples. They identified three genes that permitted a distinction between high versus low Rai stages. Only one of these genes was assessed by alternative methodologies and only on two patient samples. Finally, two studies used microarray platforms, which assess much larger numbers of genes to better understand the global molecular genetics of B-CLL (9, 10). Both groups used gene expression profiling as a tool to subclassify disease and to identify genetic differences between B-CLL cells and normal B-lymphocyte subsets. Klein et al. (9) concluded that B-CLL was more related to memory B cells than other B-cell subsets and Rosenwald et al. (10) concluded that the subset of B-CLL patients, the Ig variable region of which was germline (GL) in sequence, was more related to activated B cells than to resting B cells. Both groups identified a number of interesting genes that appeared to be under- or overexpressed in B-CLL; however, with one exception, there was no independent validation of the overall level of gene expression. In addition, there were significant differences in the nature of the differentially expressed genes between the two groups, perhaps reflecting two different array platforms and technologic noise. It should be noted, however, that it is difficult to thoroughly compare these two data sets without a complete listing of significant genes in addition to the top-ranked differentially expressed genes. Moreover, some of the apparent differences may instead reflect the use of slightly different non-diseased B-cell controls in each study.

Received 10/2/02; revised 2/14/03; accepted 2/14/03.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Grant support: Mayo Comprehensive Cancer Center, National Cancer Institute CA91542 (awarded to N.E. Kay), and generous philanthropic support provided by Edson Spencer.
Requests for reprints: Diane F. Jelinek, Department of Immunology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905. Phone: (507) 284-5617; Fax: (507) 266-0981. E-mail: [email protected]
Copyright © 2003 American Association for Cancer Research.

Downloaded from mcr.aacrjournals.org on September 27, 2021. © 2003 American Association for Cancer Research. Molecular Cancer Research 347

Clearly, gene expression profiling has the potential to provide a better understanding of human disease. However, the value of the data emerging from this type of analysis is critically dependent on a number of factors, including reproducibility (reviewed in Ref. 11). For example, whereas similarities in differentially expressed genes in a given tumor have been reported by independent groups (12–14), there have also been reports that fail to observe even a single common gene (15, 16).

In this study, we performed global gene expression profiling on 38 B-CLL patients and peripheral blood B cells from 10 healthy, age-matched patients (60 years) to determine the extent of genetic differences between B-CLL cells and normal B cells and whether differences exist between low and high risk B-CLL patients. Because we wanted to use gene expression profiling but were aware of the technical challenges of this methodology, we employed a rigorous experimental design and innovative methods of data analysis. This was accompanied by experiments designed to verify the expression patterns of a subset of informative genes. Finally, because we used the same microarray platform as that employed by Klein et al. (9), we also discuss the reproducibility of differentially expressed genes in B-CLL. Our independent validation of genes with data generated in an independent laboratory is an important contribution in that it underscores the validity of the involvement of these key genes in the biology of this disease. Thus, we describe the discovery of 70 novel genes that clearly discriminate between B-CLL and normal peripheral blood B cells. Of interest, this collection is highly enriched for genes known to be under- or overexpressed in other cancers, suggesting many new areas that can be exploited in the study of the transformation of B-CLL.

Results

Expression Profiling of B-CLL Cells versus Normal B Cells
Gene expression profiling was performed on patient tumor cells obtained from 38 B-CLL patients and on CD19+ B cells purified from the peripheral blood of healthy, age-matched individuals. In this study, we wished to answer two questions. First, what are the genes that best differentiate between a cohort of CLL patients and a control group in our data? Second, what confidence do we have that these genes are not the result of statistical or technical artifacts? For analysis of the primary data, we generated two lists of differentially expressed genes. One of the lists was generated using the fold change method with very stringent significance thresholds. The other list was generated with a second and novel combination of methods consisting of the union of the genes found using the concatenation of filters approach and the multivariate analysis.

Fold Change Analysis of Differential Gene Expression
To identify genes differentially expressed between B-CLL cells and normal counterpart B cells, we first performed a gene-by-gene analysis of the difference in mean log expression between the two groups. Briefly, the mean log expression for each gene on the Affymetrix GeneChip was determined for the B-CLL patients and compared with the mean log expression for the same gene in the cohort of normal controls, and the difference was calculated. This difference is simply the logarithm (base 10, as the thresholds quoted below imply) of the familiar fold change of gene expression in B-CLL patients and control subjects, or more specifically, the log of the ratio of the geometric means of the expression level in case versus control.

A potential hazard of gene expression profiling analyses is that the measurement of a sufficiently large number of genes on a relatively small number of patient samples can lead to the identification of genes that appear to discriminate well between patient sample types, but in fact are simply chance results. To protect ourselves against false discoveries, we determined appropriately stringent significance thresholds of experiment-wide false-positive rates. The significance thresholds were determined by randomizing the data set 1000 times, that is, by randomly reassigning the patient labels such that some B-CLL samples are now labeled as control B cells and vice versa, determining the maximum (over all the genes) difference in mean log expression (i.e., the maximum log fold change) for the randomized data sets (in both directions), and determining the distribution for the maximum values. The significance threshold was chosen as a high percentile (in the range of 90–99%) of the distribution for the maximum values.

Table 1 indicates that for a significance threshold corresponding to an experiment-wide 5% false-positive rate, there were only a small number of informative genes. Thus, to classify a gene as being overexpressed in B-CLL relative to control B cells, the log fold change had to be greater than 1.99, that is, the fold change had to be greater than 97.7. To classify a gene as being overexpressed in control B cells relative to CLL B cells, its log fold change had to be greater than 1.94, which corresponds to a fold change greater than 87. It is noteworthy that these fold changes are substantially larger than the more typical 2-fold changes that are commonly used as thresholds in microarray experiments. Using this method, 11 gene probes were identified as being significantly overexpressed in B-CLL relative to control and 10 gene probes were significantly overexpressed in control B cells relative to B-CLL. The identity of these genes is listed in Table 2. Of note, however, three of the genes that appear to be overexpressed in control B cells versus B-CLL cells are Ig-related. This may reflect patient-to-patient heterogeneity in light chain usage, that is, κ versus λ light chain, or the known fact that B-CLL cells tend to underexpress Ig when compared to their normal counterpart B cells.

Table 1. Upper Quantiles for Maxima of 1000 Randomized Data Sets

Comparisons | Upper percentile 10 | Upper percentile 5 | Upper percentile 1 | Upper percentile 0.1
B-CLL > Control | 1.89 (17) | 1.99 (11) | 2.20 (8) | 2.39 (5)
Control > B-CLL | 1.84 (11) | 1.94 (10) | 2.21 (5) | 2.70 (2)

Note: Entries are thresholds on the difference in mean log expression. Using the fold change analysis method, significance thresholds for 0.1–10% false-positive rates were determined. Numbers in parentheses indicate the number of genes in the non-permuted data that exceeded the mean log expression thresholds.
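The label-randomization procedure described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-of-lists data layout, and the simple empirical quantile rule are all hypothetical, and the logarithm is taken to be base 10 because a threshold of 1.99 is reported as a fold change of 97.7.

```python
import random
from statistics import mean


def mean_log_diff(log_expr, labels):
    """Per-gene difference in mean log10 expression, case minus control.

    log_expr: list of genes, each a list of log10 values (one per sample).
    labels:   list of booleans, True for B-CLL (case), False for control.
    """
    diffs = []
    for gene in log_expr:
        case = [v for v, is_case in zip(gene, labels) if is_case]
        ctrl = [v for v, is_case in zip(gene, labels) if not is_case]
        diffs.append(mean(case) - mean(ctrl))
    return diffs


def max_stat_thresholds(log_expr, labels, n_perm=1000, rate=0.05, seed=0):
    """Return (case > control, control > case) thresholds at an
    experiment-wide false-positive rate, from the permutation distribution
    of the maximum difference taken over all genes."""
    rng = random.Random(seed)
    shuffled = list(labels)
    max_up, max_down = [], []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # reassign sample labels
        diffs = mean_log_diff(log_expr, shuffled)
        max_up.append(max(diffs))                  # largest case > control
        max_down.append(max(-d for d in diffs))    # largest control > case

    def upper_quantile(values):
        # Simple empirical quantile (an assumption; other rules exist).
        ordered = sorted(values)
        index = min(len(ordered) - 1, int((1.0 - rate) * len(ordered)))
        return ordered[index]

    return upper_quantile(max_up), upper_quantile(max_down)


def log_to_fold(log10_difference):
    """Convert a difference in mean log10 expression to a fold change."""
    return 10.0 ** log10_difference
```

Because the threshold is taken from the distribution of the maximum over all genes, a gene that exceeds it in the real data is unlikely to be a chance result at the chosen experiment-wide rate. With this conversion, the reported thresholds of 1.99 and 1.94 correspond to fold changes of roughly 97.7 and 87, matching the values quoted in the text.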


Table 2. Discriminatory Genes Identified Using the Fold Change Method at a Significance Level of 5%

Category | Mean Log Difference | Fold Change | Gene | Gene Description
B-CLL > Control | 3.34 | 2188 | FMOD | Fibromodulin
B-CLL > Control | 3.03 | 1071 | Unknown | Androgen-dependent like-gene (clone 413+6+B2)
B-CLL > Control | 2.92 | 832 | LEF1 | Lymphoid enhancer factor-1
B-CLL > Control | 2.58 | 380 | COL9A2 | α 2 type IX
B-CLL > Control | 2.45 | 282 | ABCA6 | ATP-binding cassette
B-CLL > Control | 2.29 | 195 | TDO2 | Tryptophan oxygenase
B-CLL > Control | 2.26 | 182 | NUMA1 | Nuclear mitotic apparatus 1
B-CLL > Control | 2.23 | 170 | PTPN7 | Tyrosine phosphatase
B-CLL > Control | 2.17 | 148 | RASGRF1 | Guanine nucleotide releasing factor
B-CLL > Control | 2.15 | 141 | Unknown | Highly similar to chaperonin GroEl precursor; hsp60
B-CLL > Control | 2.14 | 138 | IGFBP4 | IGFBP4
Control > B-CLL | 2.88 | 759 | ZNF145 | PLZF
Control > B-CLL | 2.88 | 759 | IGL | Light chain (IgVmAb59)
Control > B-CLL | 2.35 | 224 | MADH3 | Mothers against decapentaplegic homologue 3
Control > B-CLL | 2.30 | 200 | NT5E | CD73
Control > B-CLL | 2.27 | 186 | GS3955 | Serine/threonine kinase
Control > B-CLL | 2.19 | 155 | JUNB | jun B
Control > B-CLL | 2.15 | 141 | IGJ | Ig J polypeptide
Control > B-CLL | 2.03 | 107 | IL6 | IFN-β 2a/interleukin 6
Control > B-CLL | 1.95 | 89 | UMOD | Uromodulin
Control > B-CLL | 1.94 | 87 | Unknown | Kappa Ig pseudogene

Concatenation of Filters Analysis of Differential Gene Expression
We next applied alternative univariate analysis methodologies that make use of three analytical filters that were also designed to protect ourselves against false discoveries as discussed above. As in the fold change analysis, each gene is treated independently. To be deemed differentially expressed in the two classes, a gene has to successively pass the experimental noise (EN), the present/absent (P/A), and the signal-to-noise (S/N) filters. With respect to the EN filter, the degree of assay variability is depicted in Fig. 1 (see "Materials and Methods"). Thus, each filter eliminates the genes that do not meet the filter's criterion. Starting with the 12,625 gene segments present in the Affymetrix HG-U95Av2 GeneChip, we first applied the EN filter, and obtained 54 gene probes (plus symbols in Fig. 1). Of note, there was only one gene that passed the fold change analysis (depicted by the dashed lines in Fig. 1) but did not pass the EN filter (indicated by the arrow in Fig. 1). However, there were more than 30 genes that passed the EN filter but did not pass the fold change test. Of these 54 gene probes, only 47 passed the P/A filter with parameter setting of 0.2, and of these 47, only 37 passed the S/N filter. That is, 37 gene probes appear to differentiate the CLL patients from the control group according to this method of univariate analysis.

Multivariate Analysis of Differential Gene Expression With the Genes@Work Software
We next wished to combine the power of the fold change test and concatenation of filters with a third method based on a multivariate analysis, in which genes are considered not individually, but rather in groups. In this method, the consistency of the expression of a given group of genes across the phenotype set confers the statistical significance. At the core of this multivariate analysis is the algorithm Genes@Work, described in the "Materials and Methods." After application of Genes@Work, we obtained 59 gene probes, of which only 54 passed the P/A filter. Therefore, the multivariate analysis yielded 54 gene probes.
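The successive filtering described under "Concatenation of Filters Analysis" can be illustrated with a small sketch. The pipeline shape (each filter removes the probes that fail its criterion, in order) follows the text, but everything else here is an assumption made for illustration: the probe fields, the predicate definitions, and the cutoffs are hypothetical stand-ins, since the actual EN, P/A, and S/N criteria are given in the "Materials and Methods" and are not reproduced in this section.

```python
def concatenate_filters(probes, filters):
    """Apply named filters in order.

    Returns the surviving probes and the probe count after each stage,
    so the attrition at each step can be reported.
    """
    surviving = list(probes)
    counts = [("start", len(surviving))]
    for name, keep in filters:
        surviving = [p for p in surviving if keep(p)]
        counts.append((name, len(surviving)))
    return surviving, counts


# Toy probes with hypothetical, precomputed summary fields.
toy_probes = [
    {"id": "a", "outside_noise_envelope": True, "present_fraction": 0.9, "snr": 2.0},
    {"id": "b", "outside_noise_envelope": False, "present_fraction": 0.9, "snr": 2.0},
    {"id": "c", "outside_noise_envelope": True, "present_fraction": 0.1, "snr": 2.0},
    {"id": "d", "outside_noise_envelope": True, "present_fraction": 0.5, "snr": 0.4},
    {"id": "e", "outside_noise_envelope": True, "present_fraction": 0.3, "snr": 1.5},
]

# Stand-in predicates. Reading the P/A "parameter setting of 0.2" as a
# minimum fraction of Present calls is an assumption, as is the S/N cutoff.
toy_filters = [
    ("EN", lambda p: p["outside_noise_envelope"]),
    ("P/A", lambda p: p["present_fraction"] >= 0.2),
    ("S/N", lambda p: p["snr"] >= 1.0),
]

survivors, stage_counts = concatenate_filters(toy_probes, toy_filters)
```

In the study itself the corresponding counts were 12,625 probes at the start, 54 after the EN filter, 47 after the P/A filter, and 37 after the S/N filter.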

FIGURE 1. Variability of replica data. Each dot corresponds to a gene, whose coordinates f1 and f2 are the gene expression level in the replicate hybridizations of the same biological sample. The solid convergent lines constitute an envelope of the EN. Data points outside of these convergent lines pass the EN filter. The abscissa of each plus-symbol data point reflects the geometric mean of the expression values in the CLL panel, and the ordinate reflects the geometric mean of the expression values in the control B cells. Dashed lines, thresholds of significance of the fold change test. Arrow, indicates the only gene that passed the fold change test but not the EN filter.
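The caption above places each plus symbol at the geometric means of the two groups, and the fold change test is stated in terms of the difference in mean log expression. The two descriptions are equivalent, which the following sketch checks numerically. The function names and the expression values are made up for illustration, and base 10 logarithms are assumed.

```python
import math
from statistics import geometric_mean, mean


def mean_log10_difference(case, control):
    """Difference in mean log10 expression, case minus control."""
    return mean(math.log10(v) for v in case) - mean(math.log10(v) for v in control)


def log10_ratio_of_geometric_means(case, control):
    """log10 of (geometric mean of case / geometric mean of control)."""
    return math.log10(geometric_mean(case) / geometric_mean(control))


# Illustrative expression values only, not data from the study.
example_case = [1200.0, 800.0, 1500.0]
example_control = [10.0, 14.0, 9.0]
```

Both functions return the same value up to floating-point error, because the mean of the logarithms is the logarithm of the geometric mean.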


There are only 10 gene probes in common between the 37 identified using the concatenation of filters analysis and the 54 using multivariate analyses. Thus, the number of total gene probes after application of these rigorous gene selection criteria is 81. Because of duplicate probes for some genes, the number of unique genes after this analysis was 72 (summarized in Tables 3 and 4). Among these 72 genes, Klein et al. have identified 10 of them previously as being differentially expressed between normal and CLL B cells. These 10 genes (corresponding to 11 gene probes) are grouped at the bottom of Fig. 2 and the genes that are novel are grouped above (n = 62 unique genes). Out of these 72 genes, 13 were also identified using the fold change analysis (indicated by boldface type in Tables 3 and 4). It is noteworthy, however, that a number of genes are not present on both lists. The Eisen plot of these 81 genes (72 unique genes) is shown at the left of the yellow line in Fig. 2, where both the genes and the patients are ordered according to the hierarchical clustering scheme described in the "Materials and Methods." The dendrogram of the patient samples shows that these genes clearly separate the control from the CLL patients in our data. The question that we address next is whether these same genes also allow for a separation in a completely independent data set.

Table 3. Identification of Genes Overexpressed in B-CLL Using a Concatenation of Data Filters and the Genes@Work Algorithm

Category | Gene | Gene Description | Lit. Link to Cancer
Unknown | KIAA1010 | Unknown |
Unknown | KIAA0275 | Unknown |
Unknown | KIAA0672 | Unknown |
Unknown | KIAA0747 | Unknown |
Unknown | Clone413H6+B2 | Unknown |
Unknown | DKFZP586F2423 | Unknown |
Unknown | EST | Highly similar to chaperonin GroEl precursor; hsp60 | +
Binding proteins | ABCA6 | ATP-binding cassette* | +
Binding proteins | PBP | Prostatic binding protein |
Binding proteins | STARD7 | START domain-containing protein |
Cytokine | IL-24 | Melanoma diff assoc gene 7 | +
Extracellular matrix | COL9A3 | Collagen type IX α 3* |
Extracellular matrix | FMOD | Fibromodulin* |
Metabolism | ALDH3A2 | Aldehyde dehydrogenase 3 family | +
Metabolism | SORL1 | Sortilin-related receptor* |
Receptor | CELSR1 | Cadherin, seven-pass G-type receptor |
Receptor | TNFRSF7 | receptor 7 | +
Receptor | TGFBR3 | Transforming growth factor β receptor III* | +
Receptor | TRA | T-cell receptor α chain |
Signal transduction | ARHC | Ras homologue gene family, member C | +
Signal transduction | CSNK1E | Casein kinase 1 epsilon | +
Signal transduction | IGFBP4 | IGF binding protein 4* | +
Signal transduction | MYC | Human v-myc viral oncogene homologue* | +
Signal transduction | PTPN7 | Protein tyrosine phosphatase non-rec type 7 |
Signal transduction | RASGRF1 | Ras guanyl-nucleotide exchange factor* | +
Small molecule transport | KCNN4 | Potassium intermediate/small conductance Ca-activated channel subfamily |
Structural protein | NUMA1 | Nuclear mitotic apparatus protein 1 | +
Structural protein | TTN | Actin cytoskeleton associated titin | +
Transcription factor | HIVEP2 | HIV enhancer binding protein 2 | +
Transcription factor | ID3 | Inhibitor of DNA binding 3 | +
Transcription factor | LEF1 | Lymphoid enhancer binding factor 1* | +

Note: Boldface type identifies genes also identified using the fold change test approach. Genes marked with an asterisk were also identified by Klein et al. (9).

Table 4. Identification of Genes Underexpressed in B-CLL Using a Concatenation of Data Filters and the Genes@Work Algorithm

Category | Gene | Gene Description | Lit. Link to Cancer
Unknown | FLJ10374 | Unknown |
Unknown | GS3955 | Unknown |
Unknown | KIAA0763 | Unknown |
Unknown | KIAA0350 | Unknown |
Unknown | KIAA0916 | Large myc-binding protein |
Cytokine | IL15 | Interleukin 15 | +
Cytokine | LTB | Lymphotoxin β | +
Adhesion molecule | ITGA4 | Integrin α 4, antigen CD49D | +
Adhesion molecule | LAMR1 | Ribosome assoc laminin receptor 1 | +
Adhesion molecule | ADAM19 | A disintegrin and metalloproteinase domain 19 | +
Binding | OSBPL8 | Oxysterol binding protein-like 8 |
Ig-related | IgVV4-31 | Ig variable region |
Ig-related | Ig | Ig VDJ region |
Ig-related | IGHM | IgM |
Ig-related | IGHG3 | IgG3 |
Ig-related | IGJ | Ig J polypeptide |
Ig-related | IGL | Ig L light chain |
Ig-related | SNC73 | Peptide in IgA constant region |
Metabolism/enzymatic | CYBB | Cytochrome B-245, β polypeptide |
Metabolism/enzymatic | GLRX | Glutaredoxin | +
Metabolism/enzymatic | PIK3C2B | Phosphoinositide-3-kinase, class 2 |
Metabolism/enzymatic | TKT | Transketolase* | +
Metabolism/enzymatic | TXNIP | Thioredoxin interacting protein |
Receptor | CD22 | B-cell signaling molecule | +
Receptor | CD79A | Ig associated α chain | +
Receptor | CD81 | Target of antiproliferative antibody 1 | +
Receptor | CXCR4 | Chemokine receptor | +
Receptor | MS4A1 | CD20 | +
Receptor | P2RX5 | Purinergic receptor, ligand-gated channel | +
Signal transduction | DGKD | Diacylglycerol kinase δ |
Signal transduction | DUSP | Dual specificity phosphatase 1 | +
Signal transduction | P114-RHO-GEF | U guanine exchange factor | +
Signal transduction | SIPA | Signal-induced proliferation associated gene |
Signal transduction | SNX2 | Sorting nexin 2 | +
Transcription factor | JUNB | Jun B proto-oncogene | +
Transcription factor | MEF2A | MADS box transcription factor enhancer factor 2 |
Transcription factor | MLLT2 | Myeloid/lymphoid or mixed lineage leukemia | +
Transcription factor | OGT | O-linked N-acetylglucosamine GlcNAc transferase |
Transcription factor | NRIP1 | Transcription co-activator |
Transcription factor | ZFP36L2 | Zinc finger protein/butyrate response factor 2 | +
Transcription factor | TRAF1 | TNF receptor associated factor 1 | +

Note: Boldface type identifies genes also identified using the fold change test approach. Genes marked with an asterisk were also identified by Klein et al. (9).
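The probe counts reported in this section follow from inclusion-exclusion on the two probe lists: 37 from the concatenation of filters, 54 from the multivariate analysis, and 10 shared. The sketch below reproduces that arithmetic with placeholder integer identifiers; the identifiers are hypothetical and stand in for Affymetrix probe IDs.

```python
def union_size(n_first, n_second, n_common):
    """Size of the union of two lists that share n_common members."""
    return n_first + n_second - n_common


# Placeholder identifiers only: 37 probes, 54 probes, 10 in common.
filter_probes = set(range(0, 37))
multivariate_probes = set(range(27, 81))

combined_probes = filter_probes | multivariate_probes
shared_probes = filter_probes & multivariate_probes

# 72 unique genes, of which 10 were reported previously by Klein et al.
novel_genes = 72 - 10
```

The union has 81 probes, which the text reports as collapsing to 72 unique genes once duplicate probes for the same gene are merged, leaving 62 genes not previously reported.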


The Set of Genes That Differentiate B-CLL versus Control B Cells in Our Data Also Separate CLL From Control in an Independent Data Set
In a recent paper, Klein et al. (9) reported on similar data pertaining to a cohort of CLL patients and a control group. We therefore compared the results from that paper with the results arising from our data. This comparison can be made in a rather quantitative way because these investigators also employed Affymetrix HG-U95Av2 GeneChips. The data of Klein et al. for the 20 controls and 21 CLL patients are plotted alongside our data in Fig. 2. It is remarkable that the overall picture that emerges from Fig. 2 is that the majority of the genes that overexpress in our data also do so in the data of Klein et al., and vice versa. This is not only a biological validation that the genes that arose from our analyses are truly differentially expressed in CLL with respect to control, but also serves to validate the reproducibility of the technology described here. To make the comparisons between the two data sets more quantitative, we applied a nearest neighbor classifier (see "Materials and Methods"), which is trained using our data. When presented with a previously unseen sample, our classifier can give three kinds of answers based on the expression of the 81 probes (72 genes): the sample can be deemed to be as coming from a CLL patient; or coming from a control (CTRL) patient; or undecided. Fig. 3 shows the results of this classification when the new samples are taken to be the data from the Klein et al. study. According to this classifier, CLL samples should fall below the lower dotted line, whereas

FIGURE 2. Identification of genes differentially expressed in B-CLL and normal B cells. The data on the left-hand side of the figure (left of the yellow line) are the supervised cluster analyses of 10 control B-cell samples (columns 1–10) and 38 B-CLL samples (columns 11–48) using the concatenation of filters analysis and Genes@Work software. Rows correspond to genes and color changes within a row indicate expression levels relative to the average of the sample population. The data right of the yellow line are the 20 controls and 21 B-CLL patients studied by Klein et al. (9). Cluster analysis indicates that the genes identified in our study that discriminate between normal B and B-CLL also discriminate normal B from B-CLL in an independent data set. The leftmost and rightmost red arrows at the bottom of the figure correspond to two chips whose number of Present calls was below the average minus 2 SDs. The red arrow in the center corresponded to a chip, the 3′/5′ ratio of both GAPDH and β-actin of which was >3.

FIGURE 3. The gene expression profiles distinguishing B-CLL from normal B cells predict a similar distinction in an independent data set. To make the comparisons between the two data sets more quantitative, we applied a nearest neighbor classifier (see "Materials and Methods"), which was trained using our data. This figure shows the results of this classification when the new samples are taken to be the data from the Klein et al. (9) study. According to this classifier, CLL samples should fall below the lower dotted line, whereas CTRL samples (NM, naïve and memory B cells; GC, germinal center B cells) should lie above the upper dotted lines. Samples that fall in-between the dotted lines cannot be classified and are deemed as undecided.
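A three-way decision of the kind shown in Fig. 3 (CLL, CTRL, or undecided) can be sketched as follows. This is only an illustration under explicit assumptions: the score used here (distance to the nearest CLL training sample minus distance to the nearest control training sample), the Euclidean metric, and the `margin` parameter are hypothetical choices, and the classifier actually used is the one defined in the "Materials and Methods."

```python
import math


def nearest_distance(sample, training_samples):
    """Euclidean distance from a sample to its nearest training sample."""
    return min(math.dist(sample, t) for t in training_samples)


def classify(sample, cll_training, ctrl_training, margin=0.5):
    """Return 'CLL', 'CTRL', or 'undecided'.

    The score is negative when the sample lies closer to a CLL training
    sample and positive when it lies closer to a control. Scores within
    +/- margin fall between the two decision lines and are left undecided.
    """
    score = nearest_distance(sample, cll_training) - nearest_distance(
        sample, ctrl_training
    )
    if score < -margin:
        return "CLL"
    if score > margin:
        return "CTRL"
    return "undecided"


# Toy two-gene expression profiles, for illustration only.
toy_cll = [(2.0, 0.0), (2.2, 0.1)]
toy_ctrl = [(0.0, 2.0), (0.1, 2.1)]
```

For example, `classify((2.1, 0.0), toy_cll, toy_ctrl)` lands on the CLL side, `classify((0.0, 1.9), toy_cll, toy_ctrl)` on the control side, and a point equidistant from both groups such as `(1.0, 1.0)` is left undecided. Keeping an explicit undecided band means an ambiguous sample is never forced into either class.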


CTRL samples should lie above the upper dotted lines. Samples that fall in between the dotted lines cannot be classified and are deemed as undecided. Fig. 3 demonstrates that we do not err at all when we classify all the samples from Klein et al. (9). It is interesting to observe that although germinal center control B cells are correctly classified, these B cells tend to be closer to the upper dotted lines than the naïve and memory B cells. This may be explained by the fact that the control B cells used to train our classifier were peripheral blood B cells, which consist primarily of naïve B cells with memory B cells comprising only a minor subpopulation.

Finally, the relevance of this analysis is that if we take these 72 genes as diagnostic markers, the sensitivity and specificity of determining the health state, that is, normal versus B-CLL, of a previously unseen case appears to be perfect. This is even more remarkable because the patients were collected in a different site, different technicians did the RNA processing, and the hybridizations were done in different core facilities. Because this kind of validation and reproducibility can be achieved, this suggests that gene expression technology including data analysis is becoming mature, and may be usable in a diagnostic context.

Literature Exploration of Genes Differentially Expressed in CLL and Control
The analysis described above suggests that the set of 72 unique genes extracted from our study is predictive of whether B cells are obtained from healthy individuals or from patients with B-CLL. The predictive ability of these data suggests that this set of genes may be involved in the biology of normal versus transformed human B cells. In an effort to approach this concept, we systematically explored the existing literature on this set of genes and Tables 3 and 4 summarize the results. Taking into account that some of the 81 Affymetrix probes correspond to the same genes, the actual number of unique genes identified was 72. Moreover, seven of the genes underexpressed in B-CLL were Ig-related. Because B-CLL cells are characterized as being dim surface IgM positive, some of the Ig-related genes validate the methodology. For example, CLL cells underexpressed IgM, IgG3, J chain, and IgA. In addition, CLL cells have also been shown to express low levels of CD22 and CD79a, both of which were found to be underexpressed in CLL cells relative to control B cells (Table 4).

Out of this set of differentially expressed genes, 31 genes were overexpressed in B-CLL cells. Of these 31 genes, 16 were previously identified as related in one way or another with an oncogenic transformation. For example, LEF-1, a transcription factor involved in development, has been shown to be involved in many cancers, most notably in colon cancer. Thus, β-catenin-sensitive isoforms of lymphoid enhancer factor-1 are selectively expressed in colon cancer (17). We also identified eight known genes that were overexpressed in B-CLL that have not been previously shown to be associated with any cancer including the gene, FMOD (fibromodulin). FMOD was one of the genes with the highest fold change in the comparison between CLL and CTRL in Klein et al. (9), both when the control cells were memory B cells, and when the control was an assortment of memory and naïve B cells, centroblasts, and centrocytes. Of the 24 genes with known function that were overexpressed in our CLL samples, 16 genes (67%) have been reported in the literature to also be overexpressed in some other cancer. This observation suggests that a high percentage of the known genes that have not previously been linked with cancer, as well as the seven unknown genes overexpressed in B-CLL, may be good targets for further study (see below).

With respect to the genes repressed in B-CLL, we found that of the 36 known genes underexpressed in CLL (Table 4), 10 of them (27%) reflected products of genes already known to be expressed at lower levels in B-CLL cells (Ig-related sequences, CD79a, CD20, CD22). Of the remaining 26 genes, 17 (65%) have been shown to be underexpressed in other cancers. Those genes that have not yet been linked to cancer may include a number of candidate tumor suppressor genes.

Further Gene Validation
Further investigation of any of these genes differentially expressed in our array assays first requires that the profiling data be verified using alternative methodologies. Although it was not possible in this study to verify all of these genes, we have focused on verifying two subsets of potentially informative genes. We focused first on verifying a subset of genes that were also identified in the Klein et al. study (9) as being differentially expressed in normal B cells relative to CLL B cells (IGFBP4, RASGRF1, LEF-1, ABCA6, FMOD, and TGFBR3). In addition, the majority of these genes have known links to human cancer (Tables 3 and 4). Graphical depictions of how the expression levels of these genes varied across the initial set of patient and control samples are shown in Fig. 4. Our own gene expression profiling data indicated that each of these six genes was overexpressed in B-CLL cells relative to normal B cells.

Fig. 5A displays the reverse transcription (RT)-PCR results of mRNA expression levels of each of these genes and demonstrates that we were able to verify that six of six genes were consistently overexpressed in B-CLL versus control B cells when eight new B-CLL samples and five new normal B-cell samples were examined. RT-PCR verification of gene profiling data does not necessarily mean that the protein is differentially expressed in a specific cell type. LEF-1, because of its role in the Wnt/β-catenin pathway and the availability of commercial reagents, was chosen for further study at the protein level. Of interest, the data shown in Fig. 5B clearly demonstrate that LEF-1 is also expressed at the protein level in B-CLL cells, but was undetectable in normal B cells.

We were next interested in further studying genes uniquely identified in this study as being part of a global B-CLL signature. Nine genes were chosen for further analysis by RT-PCR using independent patient samples and the graphical depictions of how the expression levels of these genes varied across the initial set of patient and control samples are shown in Fig. 6. Fig. 7A displays the RT-PCR results regarding NUMA1 expression levels. Of note, NUMA1 only appeared to be underexpressed in three of five normal B-cell samples; however, there was also one outlier in the initial group of 10


[Figure 4 panels: IGFBP4, RASGRF1, LEF-1, ABCA6, Fibromodulin, TGFBR3. Vertical axis: signal strength. Horizontal axis: B-CLL versus control (CONT) samples.]

FIGURE 4. Primary gene expression data on a subset of differentially expressed genes. Each of the graphs depicts the signal strength for a given gene from each of the 38 B-CLL and 10 normal B-cell samples studied by gene chip analysis. Negative (Affymetrix average difference) gene expression values were clipped off at 0.

control B-cell samples (Fig. 6). Regarding CD73 (Fig. 7B), this gene was identified using the fold change method but did not pass our concatenation of filters method. Indeed, the CD73 graph in Fig. 6 indicates that whereas the majority of control B cells were positive for CD73, several were called absent by the Affymetrix software, leading to this particular gene's failure to pass the P/A filter. Nevertheless, we wished to explore this gene further and the data shown in Fig. 7B verify the trend toward overexpression of CD73 in normal B cells relative to B-CLL. Moreover, analysis of cell surface CD73 expression by FACS demonstrated that most B-CLL cells are indeed CD73 negative.¹ Finally, GS3955 was a gene identified in both methods of analysis and shown in Fig. 2 to also behave differentially in the data set of Klein et al. However, it is clear from the RT-PCR results in Fig. 7B that we were unable to verify the differential expression of this gene.

Fig. 7C and D display the RT-PCR results for KIAA1010, KIAA0275, TNFRSF7 (CD27), P2RX5, ITGA4, and JunB. Although Klein et al. (9) did not identify this subset of genes, scrutiny of the data shown in Fig. 2 to the right of the yellow line for each of these genes suggests that these genes are also informative in an independent data set. The RT-PCR results also suggest that the trend for over- or underexpression of these genes in B-CLL cells relative to peripheral blood B cells is similar when new patient samples were studied in this manner. Collectively, the data presented in this section highlight both the value of employing multiple non-overlapping methods in gene expression data analysis and the need to verify gene expression profiling data using independent methodologies and cell samples.

Identification of a Putative Disease Progression Signature

The results discussed above demonstrate the utility of this technology at identifying genes that are likely involved in disease pathogenesis. Having already demonstrated the power of our data analysis methods and validation via multiple methods, we next wished to further exploit the richness of the data set studied in this report. Thus, we next wished to determine whether we could discern genetic differences between low risk (Rai 0) and high risk (Rai 4) CLL patients. The rationale for this particular comparison is that the survival curves for these two risk categories are clearly distinct (18). Using the combined concatenation of filters and Genes@Work, we were indeed able to find 31 unique genes (34 probes) that appear to discriminate between low risk and high risk B-CLL (Fig. 8). Of some interest, there was one low risk patient that appeared to segregate with the high risk group. However, this particular patient was also diagnosed with lupus at the time the blood sample was used in the analysis. It is also interesting to note that there is little overlap between this signature and the global gene signature described above. We believe that these genes may represent a molecular signature of disease progression and therefore merit further study. Validation of these genes at either the RNA or protein level is currently under investigation.

¹Pittner and Jelinek, unpublished observations.


FIGURE 5. Gene validation. A. RNA was isolated from eight new B-CLL patients and five new normal, age-matched controls, and tested by RT-PCR for IGFBP4, RASGRF1, LEF-1, ABCA6, FMOD, and TGFBR3 (β-actin shown as the loading control; lanes: MW standards, B-CLL 1–8, CONT 1–5). B. Western blot analysis was performed to assess whether LEF-1 was also differentially expressed between CLL B cells and normal B cells at the protein level (lanes: CONT 1–2, B-CLL 1–4, Jurkat; blots: LEF-1 and MAPK). Jurkat T cells were included as a positive control.

Discussion

In this study, we have rigorously identified a set of genes that are differentially expressed between B-CLL and normal B cells. Importantly, we have also identified a non-overlapping set of genes that discriminate between low risk and high risk B-CLL. Because many of them have been shown to be involved in the biology of other human malignancies, it is likely that at least a subset of these genes underlies the biology of the tumor cell in B-CLL. Our study also describes the utility of several stringent methods of analysis of gene expression profiling data as a measure to ward against false-positive results. Finally, we have been able to strengthen the potential role of these genes in this disease by showing the ability of our data to correctly discriminate B-CLL from normal B cells in the independent data set reported by Klein et al. (9). Studies of this nature add a unique dimension of gene validation that is not typically possible within one study carried out at one institution.

Other groups have previously employed gene expression profiling as a means to better understand this disease (6–10). Our study can be distinguished from these latter studies by our experimental design, which featured a balanced collection of primary data, use of age-matched healthy controls, and our methods of data analysis. Thus, because early studies have revealed the danger of unbalanced data collection, for example, grouping disease samples separate from controls or collection of primary data over long periods of time, Affymetrix GeneChips from the same lot were used in all of our analyses. The chip analyses were performed over a restricted period of time and our experimental groups intentionally consisted of heterogeneous B-CLL samples, for example, CD38 positive, CD38 negative, GL and somatic mutation (SM) type B-CLL; and control B cells in every experimental run. Indeed, hierarchical cluster analysis of samples grouped by assay date failed to reveal relationships defined by assay date (results not shown). As a final measure to control variability introduced by time, using the same RNA, we obtained primary data from two separate samples. Rather than running these duplicate samples within the same run, we chose to separate these analyses by time, and thereby use a more stringent filter that estimated maximal variation, rather than underestimating this important source of variability using this methodology (19).

Our decision to use peripheral blood B cells from healthy, age-matched controls rather than B cells obtained from separate anatomic compartments largely reflected the likelihood that the circulating tumor cell in B-CLL may be fundamentally distinct from tumor cells that may reside within the node or bone marrow. Indeed, there is evidence for small foci of leukemic B-cell proliferation centers in lymph nodes from B-CLL patients; however, CLL blood B cells are notoriously quiescent with respect to the cell cycle (20, 21). Therefore, because peripheral blood B cells from normal individuals are typically not cycling, we reasoned that our goal to identify genes that would discriminate B-CLL from normal B cells would be best accomplished by comparing blood B cells from age-matched controls with CLL B cells. Because there are noted differences between cord blood CD5+ B cells and CD5+ B cells from adults (22), we chose not to include cord blood B cells in our current analysis.

In this study, we employed three general methods of data analysis. It is generally known that large data sets with many genes and few patients can lead to the identification of genes that appear to discriminate well between patient sample types, but in fact are simply chance results. To prevent this, we used stringent significance thresholds in all of our analyses. Even though a high significance threshold for each method of data analysis will necessarily imply missing a number of biologically relevant but statistically non-significant genes, the combination of three data analysis methods allows us to ameliorate this effect, as a false negative in one method may be a true positive in another method, particularly because different methods interrogate the data in different ways.

Our first analytical method interrogated the data in terms of the fold change between case and control patients. A gene passed this fold change test if its ratio of expression in case versus control (or vice versa) was larger than the 95th percentile of the distribution of the maximum fold change (over all the genes) in the null hypothesis (where cases and controls were randomly permuted). The fold change analysis discussed above is only one measure of the distribution of gene transcription. For example, a fold change between case and control can be nonsignificant based on our fold change criterion, but the distribution of expression between case and control may be significantly different. We interrogated the data using this latter perspective coupled with other strategies that required a minimum number of present calls and a fold difference in expression that exceeded the replicate noise level. For this, we designed a concatenation of three filters (the EN, P/A, and S/N filters) that subsequently excludes genes that do


FIGURE 6. Primary gene expression data on novel members of the B-CLL global gene signature. Each of the graphs depicts the signal strength for a given gene from each of the 38 B-CLL and 10 normal B-cell samples studied by gene chip analysis. Panels: NUMA1, CD73, GS3955, KIAA1010, KIAA0275, TNFRSF7, P2RX5, ITGA4, JunB, and ACTIN; y-axis, signal strength; x-axis, B-CLL versus CONT.

not pass the corresponding criterion. It is interesting to mention that the set of genes that resulted from the concatenation of filters was the same if we replaced the S/N filter (with threshold 1, see "Materials and Methods") by a t test at Bonferroni-corrected P of 0.05, showing a measure of robustness of this concatenation of filters approach.

Yet a third way of interrogating the gene expression data corresponds to the scenario where a group of genes becomes deregulated in CLL but is tightly regulated in normal counterpart B cells. This would result in a pattern of expression the signature of which was not necessarily captured by the previous two analyses. To discover these kinds of patterns, we applied Genes@Work, a gene expression pattern discovery software downloadable from www.research.ibm.com/FunGen/index.html. The Genes@Work software, but not the other analytical tools described and employed in this study, has been successfully used by Klein et al. (9) to identify genes that could discriminate CLL from normal B-cell subsets.

The combination of three different analytical methods that interrogate the data from different perspectives allowed us to recover some of the genes that were missed as false negatives in one of the methods. In this study, we uniquely combined the fold change test, the concatenation of filters, and the Genes@Work software and we were able to identify a number of interesting genes that were either over- or underexpressed in CLL B cells relative to normal B cells. Indeed, a number of genes, particularly those that were in the underexpressed category, primarily served to validate the methodology as they included genes the products of which are known to be underexpressed in B-CLL, for example, Ig, CD79a, and CD22.

Although there have been two previous wide-scale gene profiling studies of B-CLL, our ability to identify a significant number of additional novel genes that discriminate B-CLL from normal B cells emphasizes the power of our approach, that is, patient population and data analysis methodologies. In addition, our studies directly demonstrate the advantages provided by multiple, independent microarray studies. We were able to obtain additional data on nine of the genes uniquely identified in this study. As noted above, a number of these genes were Ig-related, and are consistent with the known levels of Ig expression on B-CLL cells. Other gene examples that are noteworthy include TNFRSF7 (also known as CD27), KIAA1010, and KIAA0275. Thus, CD27 is highly expressed on all B-CLL cells, regardless of Ig mutation status (Ref. 23, unpublished observations) and this gene was part of the signature of overexpressed genes in B-CLL. We were also able to validate by RT-PCR the overexpression of the two currently unknown genes in five of five new B-CLL patient samples: KIAA1010 and KIAA0275. Although KIAA1010 has been described as an SH3 domain-containing protein and KIAA0275 has some homology to serine protease inhibitors, the function of these two genes remains unknown, as does their relevance to B-CLL.

It is also important to note that there were genes that were difficult to clearly validate given the trends suggested by our primary expression profile data. Noteworthy in this regard were NUMA, CD73, GS3955, and JunB because some patient samples clearly expressed these genes and others did not. Importantly, JunB gene expression was recently shown to be inactivated by methylation in chronic myeloid leukemia (24),


FIGURE 7. RT-PCR validation of novel, differentially expressed genes. RNA was isolated from eight new B-CLL patients and five normal, age-matched individuals. RT-PCR was carried out for each of the genes shown in panels A–D. Panels: A, NUMA1; B, CD73 and GS3955; C, KIAA1010, KIAA0275, and TNFRSF7; D, P2RX5, ITGA4, and JunB; β-actin is shown in each panel.

suggesting the possibility that the absent expression of JunB in a subset of B-CLL patients may occur through a similar mechanism. Indeed, abnormal methylation patterns have been described in B-CLL (25, 26). The pattern of JunB expression did not, however, correlate with either CD38 expression levels or Ig mutation status (results not shown). With respect to NUMA, CD73, and GS3955, our analytical methods suggested that NUMA (which passed the fold change test and the concatenation of filters, but not the Genes@Work criterion) was overexpressed in CLL B cells and GS3955 (which passed all our tests) was underexpressed, but CD73 was only identified by the fold change analysis. Inspection of the original data for these three genes does indicate some heterogeneity of expression among patient samples, thereby perhaps explaining our RT-PCR results. Because the absolute differences in expression of these three genes may be less, a more quantitative RT-PCR may be required to validate the differences, if indeed they exist. CD73 is an ecto-5′-nucleotidase that we have also studied at the protein level.¹ In preliminary studies, although normal B cells are uniformly positive for this marker, some B-CLL patients are completely negative for CD73 in a manner consistent with the gene profiling data; however, some B-CLL patients clearly express this molecule. Moreover, there is evidence in the literature that B-CLL patients have decreased ecto-5′-nucleotidase activity and that this is secondary to decreased numbers of CD73+ cells (27). CD73 expression may therefore be an important marker of disease heterogeneity. Collectively, our experience with these three genes underscores the necessity of gene validation before extensive further investigation or designation as a component of a global gene signature versus inclusion in a disease subset signature.

A significant number of genes differentially expressed by CLL B cells were genes that have been previously linked to other cancers. This fact supports the possibility that these gene products may be directly or indirectly linked to the transformation process in B-CLL. For example, the LEF-1 transcription factor was identified by Klein et al. (9) as being overexpressed in B-CLL cells. This gene was also part of our signature, and we have shown for the first time that this gene is indeed overexpressed in B-CLL cells at both the RNA and protein levels. This gene is an attractive candidate for further study for a variety of reasons. First, LEF-1 has been implicated

FIGURE 8. Identification of genes differentially expressed in low risk versus high risk B-CLL. The data are the supervised cluster analyses of 14 low risk and 8 high risk B-CLL samples using the concatenation of filters approach and Genes@Work software. Rows correspond to genes and color changes within a row indicate expression levels relative to the average of the sample population.
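The row coloring mentioned in the Figure 8 caption follows the color-matrix transformation given under Graphic Representations of Gene Expression in Materials and Methods. The Python sketch below is one possible reading of that transformation, not the authors' software. It assumes that the garbled formula has a minus sign in the numerator, that saturation occurs at +2 (red) and -2 (blue), and that the SD is the sample SD; the function names `log_expression` and `color_value` are hypothetical.

```python
import math
from statistics import mean, stdev


def log_expression(avg_diff, floor=1.0):
    """log10 of an average-difference value, after clipping small or
    negative values to the cutoff (1 in the Data Processing section)."""
    return math.log10(max(avg_diff, floor))


def color_value(m, phenotype_logs, control_logs, saturation=2.0):
    """Map one log-transformed measurement m to a pseudo-color value.

    Assumed form: f = (m - (mu_P + mu_C)/2) / ((sd_P + sd_C)/2),
    clipped to [-saturation, +saturation]. Sample SD is an assumption;
    the source does not say which SD estimator was used.
    """
    center = (mean(phenotype_logs) + mean(control_logs)) / 2
    scale = (stdev(phenotype_logs) + stdev(control_logs)) / 2
    f_ge = (m - center) / scale
    return max(-saturation, min(saturation, f_ge))


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    phenotype = [log_expression(x) for x in (1200.0, 2500.0, 900.0)]
    control = [log_expression(x) for x in (150.0, -20.0, 300.0)]
    for m in phenotype + control:
        print(round(color_value(m, phenotype, control), 3))
```

Under this reading, a value of 0 would be drawn black, positive values in progressively brighter red, and negative values in progressively brighter blue.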


in Wnt and β-catenin signaling (reviewed in Refs. 28, 29), there are reports linking LEF-1 with colon cancer (17), and there is also literature regarding the role of LEF-1 in lymphocyte differentiation (reviewed in Refs. 30, 31). Of interest, LEF-1 is only expressed in developing pro-B cells, but not in mature B cells (32). In this regard, Wnt signaling regulates B-lymphocyte proliferation through a LEF-1-dependent mechanism (33). There is also evidence that Wnt proteins may function as hematopoietic growth factors (34). It is therefore plausible that reactivation of LEF-1 expression in B-CLL may facilitate the tremendous B-cell clonal expansion that is observed in this disease. This possibility is currently under investigation in our laboratory.

Our study was additionally able to validate elevated expression of IGFBP4, FMOD, RASGRF1, ABCA6, and TGFBR3. Each of these genes, with the exception of fibromodulin, has been linked with other cancers. Moreover, Klein et al. (9) also identified each of these genes as being overexpressed in CLL B cells relative to normal memory B cells. There are a number of interesting features of each of these genes including fibromodulin, which has not been previously linked with cancer. Thus, fibromodulin binds TGF-β and typically negatively modulates its activity (35). This observation, coupled with reports that B-CLL cells respond abnormally to TGF-β (36), raises the interesting possibility that CLL overexpression of fibromodulin may underlie the abnormal response to TGF-β. Similarly, the TGF-β receptor III (TGFBR3 or β-glycan) is expressed in leukemic B-cell precursors (37), overexpressed in small cell lung cancer cells (38), and modified in colon cancer cells overexpressing oncogenic Ki-ras (39). Regarding the link between TGFBR3 and ras, it is interesting that we also observed overexpression of the guanine nucleotide exchange factor, RASGRF1. This protein has been shown to be oncogenic in NIH 3T3 cells (40) and to increase active ras and MAPK in NIH 3T3 cells engineered to overexpress RASGRF1 (41). There are also potentially interesting links between IGFBP4 expression and the suppression of normal humoral immunity seen in B-CLL. Thus, IGFBPs have been suggested to modulate the action of IGF-I or IGF-II on lymphopoiesis (42), raising the possibility that this may be one mechanism by which the normal humoral immune compartment becomes variably depressed in B-CLL patients. Finally, ABCA6 is a member of the ATP-binding cassette (ABC) transporter family. Recent studies have begun to demonstrate a link between ABC transporter function in normal and malignant hematopoiesis (reviewed in Ref. 43). In addition, members of this gene family have also been shown to be overexpressed in breast cancer (44). Given the ability of some members of this family to confer drug resistance on tumor cells (45), it is interesting to speculate that the recently discovered ABCA6 transporter protein may be playing a critical role in B-CLL. Nevertheless, it is important to acknowledge that definitive evidence linking these genes to the biology of B-CLL is currently lacking. Given that these genes have now been identified in two independent studies (this report; Ref. 9), there is a strong rationale to further investigate these genes in B-CLL.

In this study, we also tested the ability of our gene signature to correctly classify independent data, that is, discriminate between normal and CLL B cells. In this regard, it is striking that we did not err at all. As noted above, this is remarkable because of differences in patient populations studied and because of the variety and number of factors that can limit the reproducibility of this technology (reviewed in Refs. 19, 46). Because this kind of validation and reproducibility can be achieved, this suggests that gene expression technology is coming to maturity, and may be usable in a diagnostic context, perhaps even for disease subset definition.

Finally, we also present evidence that there may be a gene expression signature that differentiates low risk from high risk disease (Fig. 8). In this regard, we have previously suggested that there may be subsets of disease defined on the basis of CD38 and Ig mutation levels (47). Although Stratowa et al. (8) studied expression differences between low (0, 1) and high (2, 3, 4) Rai stages, this analysis was limited to a restricted number (1024) of selected genes. Our study suggests some interesting genetic differences between the two stages of disease. For example, IgM expression was higher in the high risk versus the low risk patients. This result is consistent with other observations that B-CLL patients, the tumor cells of which express GL Ig variable regions, have more aggressive disease and display a more activated phenotype consistent with that displayed by B cells activated via the BCR (48). In addition, MAP2K3, also activated through the BCR, was also more highly expressed in the high risk patients. Because a global signature could be identified using a heterogeneous mix of early, intermediate, and late stage patients, the global signature is likely to be more closely linked with disease pathophysiology, whereas the stage specific signature is likely to be more closely linked with progression events. Clearly, it will be critical to verify the informative nature of these putative stage-specific genes before embarking on future studies linking these genes with disease biology. In this regard, it is important to acknowledge that this analysis was carried out on a limited number of patient samples (n = 22), raising the possibility that some of these genes may reflect inaccuracies resulting from overfitting of the data. However, our success in identifying a global B-CLL gene signature using similar methods predicts that the majority, or at least a subset of these genes will be valid candidates for further study.

In conclusion, we describe new non-duplicative methods of data analysis that were useful in identifying a genetic signature of CLL B cells. Moreover, we present novel evidence that this technology may also permit the discrimination by gene profiling of low risk from high risk B-CLL. Genes linked to this signature may uniquely play a significant role in disease progression. Our methods of analysis were designed to maximally exploit the richness of the data that accompanies this high-throughput technology and simultaneously apply data filters and statistical tools that would minimize the false-positive results while recovering false negatives. Because of the ability of our signature to accurately discriminate leukemic cells from normal B cells in an independent data set, we predict that the majority of the genes identified in this study are genuine markers of disease. We believe that this relatively small group of genes should constitute a new focused CLL B-cell probe array. In addition, we provide new data that validate the differential expression of a number of interesting genes included in our gene signature of B-CLL. Our initial validation


data on some of the differentially expressed genes in B-CLL continue to support future work that links altered gene expression with functional downstream molecules that are critical to the biology of this disease.

Materials and Methods

Patient Cohort

The diagnosis of B-CLL required persistent lymphocytosis (>5000 lymphocytes/mm³) and a CD5+, dim surface Ig expression, and monoclonal κ or λ expression immunophenotype. Peripheral blood was collected from previously untreated B-CLL patients encompassing all Rai stages (N = 38) after informed consent. Consented healthy blood donors, 60 years of age or older, served as age-matched controls (n = 10) for this study.

Cell Isolation

Peripheral blood mononuclear cells (PBMCs) from B-CLL patients were separated from heparinized venous blood by Ficoll (Gallard-Schlesinger, Garden City, NY) density gradient centrifugation. B-CLL patient PBMCs used in the gene expression verification experiments that did not exceed 85% CD19+ cells were further purified using CD19 magnetic beads (Miltenyi Biotec, Auburn, CA). PBMCs were suspended with CD19 beads in PBS, 0.5% FCS, and 2 mM EDTA for 15 min at 4°C, before washing, resuspension in the same buffer at 5–10 × 10⁶ cells/ml, and passage through the AutoMacs Magnetic Cell sorter (Miltenyi Biotec) to collect the CD19+ cells. The normal control samples were received as enriched buffy coats and CD19+ cells were isolated from the PBMCs as described above. Following bead selection, the purity of the CD19+ cells typically exceeded 95%.

RNA Isolation and Microarray Hybridization

Total RNA was isolated from cells using the Trizol reagent (Life Technologies, Inc., Grand Island, NY). RNA was quantitated by reading the absorbance at 260/280 and the RNA quality was further tested using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). Double-stranded cDNA was generated from 10 μg total RNA using the Superscript Choice system (Life Technologies) with T7-(dT)24 oligomer. cDNA was purified by phenol/chloroform extraction and precipitation. Biotin-labeled cRNA was prepared using the Enzo BioArray HighYield RNA Transcript labeling kit (Affymetrix, Inc., Santa Clara, CA). The biotinylated probe was cleaned of unincorporated NTPs using an RNeasy kit (Qiagen, Valencia, CA). Quality of cRNA and labeling efficiency was verified on the GeneChip Test3 array (Affymetrix). Fifteen micrograms of quality fragmented cRNA and GeneChip Eukaryotic Hybridization Control reagents (Affymetrix) were hybridized to each HG-U95Av2 GeneChip (Affymetrix) in hybridization buffer with final concentrations of 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20. After hybridization and washing according to the manufacturer's protocol, the GeneChips were scanned on an Agilent GeneArray Scanner. All experiments were scaled to an arbitrary value of 1500 as recommended by the manufacturer. We flagged as dubious those chips for which the 3′/5′ ratio of either GAPDH or β-actin was >3 or the percentage of Present calls was smaller than the mean percentage of Present calls in our data set minus 2 SDs. These criteria flagged three chips. We therefore performed all the analyses with and without these three chips. As the differences were minimal, we opted for keeping these three chips, which are nonetheless marked with arrows in Fig. 2. To minimize experimental variability, all of the HG-U95Av2 Affymetrix chips used in this study (n = 50) had a matched synthesis date. In addition, there was uniform technical effort and replicate analysis was performed on two of the samples, that is, uniform RNA but sample preparation times were separated by 2 months. Finally, the 50 patient and normal control samples were analyzed in 10 groups of 5 samples. Thus, to minimize spurious clustering that may result from the day of analysis, each group of 5 samples contained one normal control RNA, and four B-CLL RNA samples from B-CLL patients that were heterogeneous with respect to CD38 and Ig mutation status.

RT-PCR

Sense and antisense primers to genes of interest were designed based on the probesets used by Affymetrix for synthesis of the GeneChip (primer sequences in Table 5). The sequences for β-actin primers have been described previously (49). Total RNA was converted to cDNA using the Pharmacia 1st Strand cDNA Synthesis kit (Piscataway, NJ). Two microliters of cDNA was amplified using the Qiagen HotStarTaq kit in a total volume of 50 μl, which included 20 pmol of each primer, 200 μmol/l of each dNTP, and 2.5 units of HotStarTaq. Amplification was carried out in a Perkin-Elmer PE9600 thermocycler (Perkin-Elmer, Boston, MA) as follows: denaturation at 95°C for 15 min; 35 cycles of 30 s at 94°C, 1 min at 55–60°C, and 1 min at 72°C; and a final cycle of 10 min at 72°C. The annealing temperature was dependent on the primer pair used. Amplified products were electrophoresed on a 1.5% agarose gel and visualized by ethidium bromide staining.

Western Blot Analysis

Freshly isolated cells (20 × 10⁶) were lysed for 20 min on ice in lysis buffer containing 10 mM Tris (pH 7.4), 150 mM NaCl, 1% Triton X-100, 0.5% deoxycholate, 0.1% SDS, 5 mM EDTA, 200 mM phenylmethylsulfonyl fluoride, 10 μg/ml aprotinin, 10 μg/ml leupeptin, 5 μg/ml pepstatin, 1 mM DTT, and 1 mM sodium Na3VO4. An equal volume of 2× SDS sample buffer was added before boiling for 5 min. Lysates were electrophoresed through a 10% Tris-HCl polyacrylamide gel using Tris-Glycine-SDS buffer. Proteins were transferred to PVDF membranes (Millipore, Bedford, MA) and blocked with 25 mM Tris-HCl (pH 7.2), 150 mM NaCl, 0.1% Tween 20 (TBST) plus 5.0% Blotto (ISC BioExpress, Kaysville, UT). The blot was probed with LEF-1 antibody (Santa Cruz Biotechnology, Santa Cruz, CA) at a 1:2000 dilution. After three washes with TBST, donkey anti-goat HRP secondary (Amersham, Piscataway, NJ) was used at a 1:2500 dilution. After washing, SuperSignal (Pierce, Rockford, IL) Chemiluminescent substrate was used to detect proteins. The blot was stripped in 7 M guanidine HCl for


Table 5. Oligonucleotide Primers Used in PCR Verification of Genes of Interest

Gene        Sense                                 Antisense
IGFBP4      5′ CATGTACCTTGACCATCGTC 3′            5′ GGTCTTCTCCATTAGGCAC 3′
NUMA1       5′ CTGGAGCATGGACAGTGTG 3′             5′ AAGAGCATCCTCTGGCAG 3′
RASGRF1     5′ GACGTGGCTCAAAGTCTCT 3′             5′ CGGAGAGAAGACTCGTAGAG 3′
GS3955      5′ GGTGACACGTGGACTCTAG 3′             5′ CATGCATGTCACCAACCTC 3′
LEF-1       5′ GCTTGTCTGGTAAGTGGC 3′              5′ AAGTAATAACAATTCAAGTCGTGG 3′
ABCA6       5′ CACACTGAGATTCTGAAGC 3′             5′ CACAGGTTTGCTATCACC 3′
FIBROMOD    5′ CACATCTCTGAGCTATATCC 3′            5′ CAGGTCTGGAGCCAAGAAC 3′
TGFBR3      5′ GGAACCACAGATTTGTGC 3′              5′ GGCAATCCAAATGAGTGC 3′
CD73        5′ CTGCCTCCAAATCTGAACAG 3′            5′ CCACTTCACAATACACACAC 3′
KIAA1010    5′ CCTCCCAGTCTCAGGAAGATATTTG 3′       5′ TGCATTACTGCCTTCTGACCACGAC 3′
KIAA0275    5′ GAGCTTTGATACCTGGGGTCG 3′           5′ GCTGGATGACTCACGGGACTC 3′
TNFRSF7     5′ GTACACGTGACAGAGTGCCTTTTCG 3′       5′ ATTTTGCCCGTCTTGTAGCATGTGG 3′
P2RX5       5′ CGTCAGTTCTTCCTTTCTCCG 3′           5′ TCCACTGGAGAGTATGCTCTG 3′
ITGA4       5′ GAGTGCAATGCAGACCTTGA 3′            5′ TGGATTTGGCTCTGGAAAAC 3′
JUNB        5′ GGACGATCTGCACAAGATGA 3′            5′ AGGTAGCTGATGGTGGTCGT 3′
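The RT-PCR section states that the annealing temperature (55–60°C) depended on the primer pair but does not say how it was chosen. As a purely illustrative sketch, the Python snippet below computes the length, GC fraction, and a rough melting temperature for two primer pairs from Table 5. It uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is only a coarse estimate for short oligonucleotides and is an assumption here, not the authors' stated method; the helper names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule:
    2 * (A + T) + 4 * (G + C). A coarse rule of thumb only."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Sense and antisense sequences copied from Table 5.
PRIMERS = {
    "ITGA4": ("GAGTGCAATGCAGACCTTGA", "TGGATTTGGCTCTGGAAAAC"),
    "JUNB": ("GGACGATCTGCACAAGATGA", "AGGTAGCTGATGGTGGTCGT"),
}

if __name__ == "__main__":
    for gene, pair in PRIMERS.items():
        for label, seq in zip(("sense", "antisense"), pair):
            print(gene, label, len(seq),
                  round(gc_fraction(seq), 2), wallace_tm(seq))
```

A real primer-design workflow would normally use a nearest-neighbor thermodynamic model and account for salt and primer concentration; the rule above is shown only because it can be checked by hand against the table.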

10 min, washed three times with TBST, and reprobed with a MAPK antibody (New England Biolabs, Beverly, MD) at a 1:1000 dilution and then with goat anti-rabbit HRP secondary antibody (1:2500 dilution).

Data Processing

Both the Affymetrix average-difference expression data and the P/A calls were used in the analyses. We used Affymetrix Microarray Analysis Suite MAS 4.0 to process the microarray information. (More detail about the Affymetrix image analysis algorithms is available at http://www.affymetrix.com/products/software/specific/mas.affx.) Unless otherwise stated, the logarithm to the base 10 of the average difference was chosen as the variable for subsequent analyses. In these cases, and before taking the logarithm, the small and negative expression levels were clipped off to be equal to a cutoff value arbitrarily chosen as 1.

Dendrograms. The hierarchical clustering algorithm used to generate the dendrograms in this paper is based on the average-linkage method (50, 51). For the dendrograms, the log-transformed expression value of each selected gene is normalized to have zero mean and unit SD across all the samples. The similarity between two individual samples is calculated by the Pearson correlation.

Graphic Representations of Gene Expression (Color Matrices). Columns represent individual experiments (or in this case, samples) and rows represent individual genes present on the expression microarray. To generate a pseudo-color map, first a gene- and experiment-specific change of variables, from the log-transformed measurement $m$ into $f_{ge}$, is computed using the formula:

$$ f_{ge} = \frac{m - (\mu_P + \mu_C)/2}{(\sigma_P + \sigma_C)/2}, $$

where $\mu_P$ and $\sigma_P$ are respectively the mean and SD computed from the gene expression log-transformed values for that gene in the phenotype group, and $\mu_C$ and $\sigma_C$ are their corresponding values computed from the control group. The value of this function is then plotted using a pseudo-color map that represents $f_{ge} = 0$ as black, $f_{ge} > 0$ as progressively brighter hues of red, and $f_{ge} < 0$ as progressively brighter levels of blue. $f_{ge} = 2$ and $f_{ge} = -2$ correspond to complete saturation of the red and the blue, respectively. The resulting pseudo-color map associates the same colors to measurements that are off by the same number of SDs from their expected value.

Gene Expression Data Analyses

Differentially expressed genes between two sets of subjects C1 and C2 (or alternatively called the phenotype set P and the control set C) are identified using a mixture of statistical schemes that can be broadly characterized as univariate methods and multivariate methods.

Univariate Analysis of Gene Expression Data. In the following schemes, each gene is treated independently to assess whether it is differentially expressed in the two sets under consideration.

(1) Fold change analysis. One of the customary and simple analyses for gene expression data is to accept a gene as differentially expressing in two groups if its fold change is above a significance threshold. We choose this threshold as a relatively high percentile of the distribution of the maximum fold change (over all the genes) under the null hypothesis that case and controls are chosen at random. We do not use the variance of the fold change distribution to determine the significance threshold, but rather the distribution of the fold change extreme values, which we sample by permuting the labels of cases and controls. For the comparison of primary interest, genes were first sorted by the difference in mean log expression values between comparison groups (CLL B cell versus normal B cell), where gene expression was defined as the average difference generated by the Affymetrix Microarray Analysis Suite software. This difference in mean log expression is a measure of the log fold change between the two groups. The genes whose difference in mean log expression is beyond a significance threshold are deemed to be differentially expressing in the two groups. The threshold is chosen in a very stringent way using a method that allows us to estimate the differences in mean log signal that could arise simply by chance in an analysis of more than 12,600 genes. We therefore mixed the classes (creating a null hypothesis ensemble), so that the fold change that is computed for each gene is actually independent of the real classes. Thus, out of all the genes in

Downloaded from mcr.aacrjournals.org on September 27, 2021. © 2003 American Association for Cancer Research. Molecular Cancer Research 359

one permutation, we chose the gene(s) with the maximum fold of a gene in sets C1 and C2, respectively. Thus, when applying change, and added that fold change to our statistics. By the EN filter, we take each gene and compute the geometric performing 1000 permutations, we generated 1000 maximum average of the expression value in the CLL class and plot it as (log) fold changes, and took as our threshold the 95th percentile the abscissa for that gene, whereas the ordinate corresponds to of the distribution of these 1000 values. In this way, we can the geometric mean in the CTRL group. Only genes that fall interpret that the probability of having the maximum fold outside the solid, convergent lines in the figure are plotted (plus change greater than that threshold in the null hypothesis is 5%. symbols, Fig. 1). Therefore, there are 54 plus symbols, which (2) Filter strategies. In this strategy, to be deemed differ- are the 54 genes that passed the EN filter. Some of these genes entially expressed in the two classes, a gene has to successively will be further filtered out by the S/N. Finally, we plotted the pass three filters: the P/A filter; the EN filter; and the S/N filter. two lines representing the thresholds of the fold change test (the (2a) P/A filter. Each gene in an Affymetrix GeneChip is first of our methods) on the same figure. The upper dashed line assigned a status of either Present (P) or Absent (A) or Marginal is the one representing a fold change of 87. The lower dashed (M). When the call for a gene is A, the corresponding numerical line represents a fold change of 98. value of the average difference is very unreliable. In our (2c) S/N filter. When a gene passes the EN filter, the average analyses, we only accept a gene for further analysis if there is a of this gene in C1 can be considered as different beyond EN minimum number of subjects for which that gene’s call was from the average of the same gene in C2. However, the question either P or M. 
If we denote by #C1 and #C2 the number of remains as to whether this difference in means is relevant when subjects in classes C1 and C2, respectively, then only genes compared with the natural variability of expression around the whose percentage of A to P and M calls is less than min (#C1, mean. In other words, the distribution of a gene in C1 may #C2)/(#C1+#C2) will be further processed. overlap considerably with the distribution of the same gene in (2b) EN filter. We performed two hybridizations of a sample C2. A simple measure of this overlap in distributions is the coming from one patient to assess the inherent assay variability. ratio of the difference in means divided by the average l l C1 C2 The result from one of these duplicated experiments is plotted SD: zg ¼j j: Our not too stringent S/N filter, weeds ðrC1 þrC2 Þ=2 in Fig. 1. In this figure, the dots correspond to genes measured out the genes with zg <1. from the same sample in two different hybridizations. The Multivariate Analysis of Gene Expression Data. To be expression level for a given gene in hybridization 1 is shown in deemed differentially expressed in our multivariate analysis, a the abscissa (/1), and the expression for the same gene in the gene has to belong to at least one of the patterns discovered by second hybridization is shown in the ordinate (/2). If the Genes@Work, and then has to pass the P/A filter. measurements had been unaffected by noise, the points in Fig. 1 (1) Genes@Work. We used the gene-expression pattern would be organized on a straight diagonal line, that is, /1 = /2. discovery algorithm Genes@Work (47) as our basic tool to Thus, the scatter of the points is an indication of experimental perform multivariate gene selection. A Genes@Work pattern (non-biological) variability. As a measure of this variability consists of a subset of genes that express differentially in a pertaining to the cloud of dots in Fig. 
1, let us observe that the subset of the phenotype set with respect to the control set. percentage of genes, the reproducibility fold change of which Differential expression is determined as follows: for each gene was above 2, 3, 10, and 50, was respectively 30%, 16%, 3%, g, a gene expression probability density pg(e) is computed and 0.5%. The solid, empirically determined, convergent lines empirically from the control set. The algorithm discovers all indicate the envelope of the cloud of points and were designed the subgroups of patients within the phenotype group for to only leave 1% of the points outside of the two lines. For which there is a subset of genes with the following property: points in Fig. 1, 99% of them are bounded between the two for every gene g in this subset of genes, the integral of pg(e) lines, or what is equivalent, verify the empirical equation over the expression range in the subgroup of patients is less jlog10(/1) log10(/2)j2.8 0.25[log10(/1) + log10(/2)]; than a user-defined threshold d (0 < d < 1). In other words, the equality in this formula represents the equation for the solid each gene in a pattern expresses at a similar level in the lines. This approach is similar to the Use-Fold algorithm patient subset, the level of similarity given by d. The number described in Ref. (52). These convergent lines also emphasize of patients (genes) that compose a pattern is called its patient that the variability of a given gene is influenced by its overall (gene) support. The statistical significance of a Genes@Work level of expression with lowly expressed genes displaying more pattern is given by its P, defined as the probability of variability than highly expressed genes. This observation can be observing one or more similar patterns (i.e., a pattern with the used to decide whether the difference of expression of a given same number of support patients and support genes) in a set gene in two subjects is of biological origin. 
Suppose a gene is of random patients whose genes g’s are distributed according measured in patients 1 and 2 yielding expression values /1 and to the same probability density pg(e) (the null hypothesis, /2, respectively. If these two values verify the equation above, Ref. 53). The input parameters for Genes@Work are the we cannot assert that the difference in expression for this gene minimum gene and patient support of a pattern, the degree of in these two patients is not due to the assay noise. However, if similarity of gene expression for each gene d, and the the equation above is not met, which occurs only 1% of the maximum P. time if there is no biological effect at play, then that difference (2) Gene selection with Genes@Work. Given two sets of can be ascribed to a biological effect. The probability that we subjects, C1 and C2, Genes@Work is run twice, with either set err in that assessment is thus 1%. When we have two sets of chosen in turn as phenotype and the other as the control. The subjects C1 and C2, the same criterion is applied, where /1 and minimum patient support is made to be 70% of the number of /2 are interpreted to be the geometric average of the expression subjects in the phenotype set; the minimum gene support is

Downloaded from mcr.aacrjournals.org on September 27, 2021. © 2003 American Association for Cancer Research. 360 Gene Expression Profiling of B-CLL

chosen to be 4 in all the cases; d is chosen to be 0.01, and the Online Supplemental Material maximum P was chosen to be 1010. A gene is selected as Data pertaining to this work and the algorithm Genes@Work differentially expressed if it belongs to at least one of the can be downloaded (http://www.research.ibm.com/FunGen/ patterns discovered by Genes@Work. For a more technical index.html) from our web site. discussion, see Ref. 53. Classifier and Classification Method. Both pattern discovery Acknowledgment and a concatenation of filter methods are used to identify a set of We thank Terri Felmlee for her assistance in the preparation of this manuscript. ng genes (gene features) that express differentially in two sets of subjects C and C . C and C constitute the training set. Given 1 2 1 2 References a new subject (the query), we can ask the question of whether 1. Foon, K. A., Rai, K. R., and Gale, R. P. Chronic lymphocytic leukemia: new this subject belongs in class C1 or C2. A classifier is a decision insights into biology and therapy. Ann. Intern. Med., 113: 525 – 539, 1990. function that, when presented with the query, yields a result that 2. Rozman, C. and Montserrat, E. Chronic lymphocytic leukemia. N. Engl. J. Med., 333: 1052 – 1057, 1995. allows us to assign the query to either C1 or C2. For a given sample i in the training set, we define a gene vector v the j-th 3. Garand, R. and Robillard, N. Immunophenotypic characterization of acute i leukemias and chronic lymphoproliferative disorders: practical recommendations component vi( j) of which is the logarithm (to the base 10, and and classifications. Hematol. Cell Ther., 38: 471 – 486, 1996. after clipping-off at 1) of the average difference of the j-th gene 4. Osorio, L. M. and Aguilar-Santelises, M. Apoptosis in B-chronic lymphocytic in our gene features. We define a measure of the distance leukaemia. Med. Oncol., 15: 234 – 240, 1998. between one gene vector v and another q,as 5. Reed, J. C. 
Molecular biology of chronic lymphocytic leukemia. Semin. Oncol., 25: 11 – 18, 1998. n 6. Plate, J. M. D., Petersen, K. S., Buckingham, L., Shahidi, H., and Schofield, 1 Xg distanceðv; qÞ¼ ½vð jÞqð jފ2: C. M. Gene expression in chronic lymphocytic leukemia B cells and changes n during induction of apoptosis. Exp. Hematol., 28: 1214 – 1224, 2000. g j¼0 7. Aalto, Y., El-Rifai, W., Vilpo, L., Ollila, J., Nagy, B., Vihinen, M., Vilpo, J., and Knuutila, S. Distinct gene expression profiling in chronic lymphocytic The distance D(C1, q) between class C1 and the query, is leukemia with 11q23 deletion. Leukemia, 15: 1721 – 1728, 2001. defined as the minimum of the distances between q and all the 8. Stratowa, C., Loffler, G., Lichter, P., Stilgenbauer, S., Haberl, P., Schweifer, subjects in C , that is N., Dohner, H., and Wilgenbus, K. K. cDNA microarray gene expression analysis 1 of B-cell chronic lymphocytic leukemia proposes potential new prognostic markers involved in lymphocyte trafficking. Int. J. Cancer, 91: 474 – 480, 2001. DðC1; qÞ¼min fdistanceðv; qÞg all v in C1 9. Klein, U., Tu, Y., Stolovitzky, G. A., Mattioli, M., Cattoretti, G., Husson, H., Freedman, A., Inghirami, G., Cro, L., Baldini, L., Neri, A., Califano, A., and Dalla-Favera, R. Gene expression profiling of B cell chronic lymphocytic For a given query q, we further define a score as: leukemia reveals a homogeneous phenotype related to memory B cells. J. Exp. Med., 194: 1625 – 1638, 2001. 1 1 ScoreðqÞ¼ 10. Rosenwald, A., Alizadeh, A. A., Widhopf, G., Simon, R., Davis, E., Yu, X., DðC1; qÞ DðC2; qÞ Yang, L., Pickeral, O. K., Rassenti, L. Z., Powell, J., Botstein, D., Byrd, J. C., Grever, M. R., Cheson, B. D., Chiorazzi, N., Wilson, W. H., Kipps, T. J., Brown, P. O., and Staudt, L. M. Relation of gene expression phenotype to The more positive the score is, the more likely it is that the immunoglobulin mutation genotype in B cell chronic lymphocytic leukemia. 
query is related to class C1; conversely, the more negative the J. Exp. Med., 194: 1639 – 1647, 2001. score is, the more likely it is that the sample belongs to class 11.Bustin,S.A.andDorudi,S.Thevalueofmicroarraytechniques for quantitative gene profiling in molecular diagnostics. Trends Mol. Med., 8: C2. A classifier can thus be derived from the score defined 269 – 272, 2002. above simply as 12. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and ( Levine, A. J. Broad patterns of gene expression revealed by clustering analysis C1 if ScoreðqÞ > s95% of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96: 6745 – 6750, 1999. ClassðqÞ¼ C2 if ScoreðqÞ < s5% undecided otherwise 13. Kitahara, O., Furukawa, Y., Tanaka, T., Kihara, C., Ono, K., Yanagawa, R., Nita, M. E., Takagi, T., Nakamura, Y., and Tsunoda, T. Alterations of gene expression during colorectal carcinogenesis revealed by cDNA microarrays after This classifier is known as a nearest neighbor classifier, laser-capture microdissection of tumor tissues and normal epithelia. Cancer Res., because it assigns q to the class to which the sample in the 61: 3544 – 3549, 2001. training set that is closest to the query belongs, as long as the 14. Takemasa, I., Higuchi, H., Yamamoto, H., Sekimoto, M., Tomita, N., Nakamori, S., Matoba, R., Monden, M., and Matsubara. K. Construction of score is beyond certain thresholds. The thresholds s5% and s95% preferential cDNA microarray specialized for human colorectal carcinoma: correspond to the significance levels of the classification. To molecular sketch of colorectal cancer. Biochem. Biophys. Res. Commun., 285: compute these thresholds, permutations of the components of 1244 – 1249, 2001. each sample in the training set is performed. Each permutated 15. Hegde, P., Qi, R., Gaspard, R., Abernathy, K., Dharap, S., Earle-Hughes, J., Gay, C., Nwokekeh, N. U., Chen, T., Saeed, A. I., Sharov, V., Lee, N. 
H., sample is considered as a query and its score is computed. All Yeatman, T. J., and Quackenbush, J. Identification of tumor markers in models of the samples in the training set are used in this way under many human colorectal cancer using a 19,200-element complementary DNA micro- different permutations, so as to create a probability density array. Cancer Res., 61: 7792 – 7797, 2001. function of scores. In this way, all the scores that are between 16. Yanagawa, R., Furukawa, Y., Tsunoda, T., Kitahara, O., Kameyama, M., Murata, K., Ishikawa, O., and Nakamura, Y. Genome-wide screening of genes the 5% percentile and the 95% percentile (s5% and s95%, showing altered expression in liver metastases of human colorectal cancers by respectively) are considered as undecided. However, if a query cDNA microarray. Neoplasia, 3: 395 – 401, 2001. 17. Hovanes, K., Li, T. W. H., Munguia, J. E., Truong, T., Molovanovic, T., yields a score above s95% (resp. below s5%), the probability that Marsh, J. L., Holcombe, R. F., and Waterman, M. L. h-catenin-sensitive isoforms it comes from a random query is less than 5%, and it will be of lymphoid enhancer factor-1 are selectively expressed in colon cancer. Nat. assigned to C1 (resp. C2). Genet., 28: 53 – 57, 2001.


18. Rai, K. R. and Montserrat, E. Prognostic factors in chronic lymphocytic leukemia. Semin. Hematol., 24: 252–256, 1987.
19. Lander, E. S. Array of hope. Nat. Genet., 21: 3–4, 1999.
20. Schmid, C. and Isaacson, P. G. Proliferation centers in B-cell malignant lymphoma, lymphocytic (B-CLL): an immunophenotypic study. Histopathology, 24: 445–451, 1994.
21. Granziero, L., Ghia, P., Circosta, P., Gottardi, D., Strola, G., Geuna, M., Montagna, L., Piccoli, P., Chilosi, M., and Caligaris-Cappio, F. Survivin is expressed on CD40 stimulation and interfaces proliferation and apoptosis in B-cell chronic lymphocytic leukemia. Blood, 97: 2777–2783, 2001.
22. Bhat, N. M., Kantor, A. B., Bieber, M. M., Stall, A. M., Herzenberg, L. A., and Teng, N. N. The ontogeny and functional characteristics of human B-1 (CD5+ B) cells. Int. Immunol., 4: 243–252, 1992.
23. Damle, R. N., Ghiotti, F., Valetto, A., Albesiano, E., Fais, F., Yan, X. J., Sison, C. P., Allen, S. L., Kolitz, J., Schulman, P., Vinciguerra, V. P., Budde, P., Frey, J., Rai, K. R., Ferrarini, M., and Chiorazzi, N. B cell chronic lymphocytic leukemia cells express a surface membrane phenotype of activated, antigen-experienced B lymphocytes. Blood, 99: 4087–4093, 2002.
24. Yang, M-Y., Liu, T-C., Chang, J-G., Lin, P-M., and Lin, S-F. JunB gene expression is inactivated by methylation in chronic myeloid leukemia. Blood First Edition; DOI 10.1182/blood-2002-05-1598, 2003.
25. Mertens, D., Wolf, S., Schroeter, P., Schaffner, C., Dohner, H., Stilgenbauer, S., and Lichter, P. Down-regulation of candidate tumor suppressor genes within chromosome band 13q14.3 is independent of the DNA methylation pattern in B-cell chronic lymphocytic leukemia. Blood, 99: 4116–4121, 2002.
26. Wahlfors, J., Hiltunen, H., Heinonen, K., Hamalainen, E., Alhonen, L., and Janne, J. Genomic hypomethylation in human chronic lymphocytic leukemia. Blood, 80: 2074–2080, 1992.
27. Rosi, F., Tabucchi, A., Carlucci, F., Galieni, P., Lauria, F., Zanoni, L., Guerranti, R., Marinello, E., and Pagani, R. 5′-nucleotidase activity in lymphocytes from patients affected by B-cell chronic lymphocytic leukemia. Clin. Biochem., 31: 269–272, 1998.
28. Novak, A. and Dedhar, S. Signaling through β-catenin and Lef/Tcf. Cell. Mol. Life Sci., 56: 523–537, 1999.
29. Polakis, P. Wnt signaling and cancer. Genes Dev., 14: 1837–1851, 2000.
30. Reya, T., Okamura, R., and Grosschedl, R. Control of lymphocyte differentiation by the LEF-1/TCF family of transcription factors. Cold Spring Harbor Symp. Quant. Biol., LXIV: 133–140, 1999.
31. Van de Wetering, M., de Lau, W., and Clevers, H. WNT signaling and lymphocyte development. Cell, 109: S13–S19, 2002.
32. Travis, A., Amsterdam, A., Belanger, C., and Grosschedl, R. LEF-1, a gene encoding a lymphoid-specific protein with an HMG domain regulates T-cell receptor enhancer function. Genes Dev., 5: 880–894, 1991.
33. Reya, T., O'Riordan, M., Okamura, R., Devaney, E., Willert, K., Nusse, R., and Grosschedl, R. Wnt signaling regulates B lymphocyte proliferation through a LEF-1 dependent mechanism. Immunity, 13: 15–24, 2000.
34. Van Den Berg, D. J., Sharma, A. K., Bruno, E., and Hoffman, R. Role of members of the Wnt gene family in human hematopoiesis. Blood, 92: 3189–3202, 1998.
35. Hildebrand, A., Romaris, M., Rasmussen, L. M., Heinegard, D., Twardzik, D. R., Border, W. A., and Ruoslahti, E. Interaction of the small interstitial proteoglycans biglycan, decorin, and fibromodulin with transforming growth factor β. Biochem. J., 302: 527–534, 1994.
36. Lagneaux, L., Delforge, A., Bernier, M., Stryckmans, P., and Bron, D. TGF-β activity and expression of its receptors in B-cell chronic lymphocytic leukemia. Leuk. Lymphoma, 31: 99–106, 1998.
37. Buske, C., Becker, D., Feuring-Buske, M., Hannig, H., Griesinger, F., Hiddemann, W., and Wormann, B. TGF-β and its receptor complex in leukemic B-cell precursors. Exp. Hematol., 26: 1155–1161, 1998.
38. Damstrup, L., Rygaard, K., Spang-Thomsen, M., and Skovgaard Poulsen, H. Expression of transforming growth factor β (TGFβ) receptors and expression of TGFβ1, TGFβ2 and TGFβ3 in human small cell lung cancer cell lines. Br. J. Cancer, 67: 1015–1021, 1993.
39. Yan, Z., Deng, X., and Friedman, E. Oncogenic Ki-ras confers a more aggressive colon cancer phenotype through modification of transforming growth factor-β receptor III. J. Biol. Chem., 276: 1555–1563, 2001.
40. Chevallier-Multon, M. C., Schweighoffer, F., Barlat, I., Baudouy, N., Fath, I., Duchesne, M., and Tocque, B. Saccharomyces cerevisiae CDC25 (1028–1589) is a guanine nucleotide releasing factor for mammalian ras proteins and is oncogenic in NIH3T3 cells. J. Biol. Chem., 268: 11113–11118, 1993.
41. Zippel, R., Balestrini, M., Lomazzi, M., and Sturani, E. Calcium and calmodulin are essential for Ras-GRF1-mediated activation of the Ras pathway by lysophosphatidic acid. Exp. Cell Res., 258: 403–408, 2000.
42. Grellier, P., Yee, D., Gonzalez, M., and Abboud, S. L. Characterization of insulin-like growth factor binding proteins (IGFBP) and regulation of IGFBP-4 in bone marrow stromal cells. Br. J. Haematol., 90: 249–257, 1995.
43. Bunting, K. D. ABC transporters as phenotypic markers and functional regulators of stem cells. Stem Cells, 20: 11–20, 2002.
44. Bera, T. K., Iavarone, C., Kumar, V., Lee, S., Lee, B., and Pastan, I. MRP9, an unusual truncated member of the ABC transporter superfamily, is highly expressed in breast cancer. Proc. Natl. Acad. Sci. USA, 99: 6997–7002, 2002.
45. Ross, D. D. Novel mechanisms of drug resistance in leukemia. Leukemia, 14: 467–473, 2000.
46. Nadon, R. and Shoemaker, J. Statistical issues with microarrays: processing and analysis. Trends Genet., 18: 265–271, 2002.
47. Jelinek, D. F., Tschumper, R. C., Geyer, S. M., Bone, N. D., Dewald, G. W., Hanson, C. A., Stenson, M. J., Witzig, T. E., Tefferi, A., and Kay, N. E. Analysis of clonal B cell CD38 and immunoglobulin variable region sequence status in relation to clinical outcome for B-chronic lymphocytic leukemia. Br. J. Haematol., 115: 854–861, 2001.
48. Lanham, S., Hamblin, T., Oscier, D., Ibbotson, R., Stevenson, F., and Packham, G. Differential signaling via surface IgM is associated with VH gene mutational status and CD38 expression in chronic lymphocytic leukemia. Blood, 101: 1087–1093, 2003.
49. Arora, T. and Jelinek, D. F. Differential myeloma cell responsiveness to interferon-α correlates with differential induction of p19INK4d and cyclin D2 expression. J. Biol. Chem., 273: 11799–11805, 1998.
50. Hartigan, J. A. Clustering Algorithms. New York: Wiley, 1975.
51. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95: 14863–14868, 1998.
52. Tu, Y., Stolovitzky, G., and Klein, U. Quantitative noise analysis for gene expression microarray experiments. Proc. Natl. Acad. Sci. USA, 99: 14031–14036, 2002.
53. Califano, A., Stolovitzky, G., and Tu, Y. Analysis of gene expression microarrays for phenotype classification. Proc. Int. Conf. Intell. Syst. Mol. Biol., 8: 75–85, 2000.
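
Illustrative sketch (not part of the original analysis software). The univariate filters and the nearest neighbor score described under Gene Expression Data Analyses can be sketched in a few lines of Python. All function names below are hypothetical, and several choices are assumptions that the text does not fix: the sample SD is used, the fold change statistic is taken as the absolute difference in mean log10 expression, and the percentile is read off a sorted list without interpolation.

```python
import math
import random
from statistics import mean, stdev


def max_abs_diff(data, labels):
    """Largest |mean(class 1) - mean(class 2)| over all genes.

    data: one list of log10 expression values per gene (one value per sample).
    labels: class label (1 or 2) for each sample.
    """
    best = 0.0
    for gene in data:
        g1 = [v for v, lab in zip(gene, labels) if lab == 1]
        g2 = [v for v, lab in zip(gene, labels) if lab == 2]
        best = max(best, abs(mean(g1) - mean(g2)))
    return best


def fold_change_threshold(data, labels, n_perm=1000, pct=0.95, seed=0):
    """Percentile of the maximum log fold change under label permutation."""
    rng = random.Random(seed)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # mix the classes (null hypothesis ensemble)
        maxima.append(max_abs_diff(data, shuffled))
    maxima.sort()
    return maxima[min(len(maxima) - 1, int(pct * len(maxima)))]


def passes_en_filter(phi1, phi2):
    """True if (phi1, phi2) falls outside the empirical noise envelope."""
    a, b = math.log10(phi1), math.log10(phi2)
    return abs(a - b) > 2.8 - 0.25 * (a + b)


def signal_to_noise(c1, c2):
    """z_g: |difference in means| divided by the average SD (sample SD assumed)."""
    return abs(mean(c1) - mean(c2)) / ((stdev(c1) + stdev(c2)) / 2)


def distance(v, q):
    """Mean squared difference between two gene vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(v, q)) / len(v)


def nn_score(class1, class2, q):
    """Score(q) = 1/D(C1, q) - 1/D(C2, q); undefined if q equals a training vector."""
    d1 = min(distance(v, q) for v in class1)
    d2 = min(distance(v, q) for v in class2)
    return 1.0 / d1 - 1.0 / d2


def classify(score, s5, s95):
    """Assign C1, C2, or undecided from the score and the two thresholds."""
    if score > s95:
        return "C1"
    if score < s5:
        return "C2"
    return "undecided"
```

In this sketch a gene would be kept by the S/N filter when `signal_to_noise(...) >= 1`, and the thresholds passed to `classify` would come from scoring permuted training samples, as the text describes.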

Identification of a Global Gene Expression Signature of B-Chronic Lymphocytic Leukemia¹

¹ Mayo Comprehensive Cancer Center, National Cancer Institute CA91542 (awarded to N.E. Kay), and generous philanthropic support provided by Edson Spencer.

Diane F. Jelinek, Renee C. Tschumper, Gustavo A. Stolovitzky, et al.

Mol Cancer Res 2003;1:346-361.

Updated version Access the most recent version of this article at: http://mcr.aacrjournals.org/content/1/5/346

Cited articles This article cites 48 articles, 19 of which you can access for free at: http://mcr.aacrjournals.org/content/1/5/346.full#ref-list-1


Downloaded from mcr.aacrjournals.org on September 27, 2021. © 2003 American Association for Cancer Research.