View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

FEBS Letters 580 (2006) 2774–2778

Integrated genome-wide expression map and high-resolution analysis of aberrant chromosomal regions in squamous cell lung cancer

Xin-Yu Zhanga,b,c, Ying Hud, Yong-Ping Cuie, Xiao-Ping Miaoe, Feng Tiana,b,c, Yong-Jing Xiaa,b,c, Yi-Qing Wua,b,c, Xiangjun Liua,b,c,* a Department of Biological Science and Biotechnology, Tsinghua University, Beijing 100084, PR China b Institutes of Biomedicine, Tsinghua University, Beijing 100084, PR China c Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, PR China d NCI Center for Bioinformatics, Laboratory of Population Genetics, National Cancer Institute MSC8302, United States e Department of Etiology and Carcinogenesis, Cancer Institute, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100021, PR China

Received 3 March 2006; revised 6 April 2006; accepted 12 April 2006

Available online 25 April 2006

Edited by Takashi Gojobori

lack et al. [2] used microarray containing more than 3000 Abstract The recognition of recurrent aberrant regions in can- cer is important to the discovery of candidate cancer related cDNA probes to localize the genomic aberrations in cancer. . Here we first constructed a genome-wide High throughput gene expression analysis techniques were also map of squamous lung carcinoma from the Stanford Microarray used to determine the aberrant regions in cancer. Takeshi Fujii Database. High-resolution detection of aberrant chromosomal et al. [3] constructed a transcriptome map based on the serial regions was performed by using moving-median method. 84% analysis of gene expression (SAGE) data to identify the geno- (27 of 32) of our results were consistent with the previous studies mic alteration regions in NSCLC. However, the SAGE results of comparative genomic hybridization or loss of heterozygosity. were subject to the size and reliabilities of the SAGE libraries. One overrepresented region in Xq28 was newly discovered to be In this study, we will introduce a bioinformatics approach for related to squamous cell lung carcinoma. These observations the generation of gene expression map and detection of the could be of great interest for further studies. aberrant chromosomal regions in SCC based on public micro- Ó 2006 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies. array datasets.

Keywords: Microarray; Aberrant chromosomal region; Squamous cell carcinoma; Gene expression map; Moving-median method 2. Materials and methods

2.1. Data collection A normalized microarray dataset, generated in a previous study by Arindam Bhattacharjee et al. [4], was obtained from http://www.geno- 1. Introduction me.wi.mit.edu/MPR. The dataset contained gene expression data of lung specimens hybridized to Affymetrix human U95Av2 oligonucleo- Lung cancer is one of the leading causes of cancer death in tide arrays, including 21 SCC samples and 17 normal ones. the world (Death rate extrapolations in USA for lung cancer: 160,439 per year). Non-small cell lung cancer (NSCLC), the 2.2. Statistical analysis To identify the differentially expressed genes, significance analysis of major form of lung cancer, accounts for nearly 80% of the microarrays (SAM) [5] was applied as a method for two-sample t-test disease and consists of three subtypes, squamous cell lung with the estimation of false discovery rate (FDR). Bioconductor carcinoma (SCC), lung adenocarcinoma and large cell carci- (http://www.bioconductor.org) module, siggene, and the R software noma [1]. Through comparative genomic hybridization (ver. 2.1.1) [6] were used to perform SAM analysis, and the parameter delta was set to 1.0. Those genes with FDR < 0.06 were selected for (CGH) and loss of heterozygosity (LOH) analyses, it was further analysis. found that the of lung tumor tissues often have recurrent abnormal regions. Consequently, the gene expression 2.3. Expression ratio calculation levels in these regions are usually upregulated or downregu- The gene expression ratios of tumor over normal or vice versa were lated. Definition of such aberrant regions is one important calculated as follows. First, the hybridization signal intensities of all route to the identification of genes that play a role in NSCLC. cases were termed X1, X2 ...Xr, whose mean was X ¼ðX 1 þ ... CGH, however, has a limited (20 Mb) mapping resolution. þXrÞ=r. Fold change for upregulated genes was calculated as ER ¼ X =X and fold change for down-regulated genes was Meanwhile, high-throughput LOH and fluorescence in situ tumor normal calculated as ER ¼ X normal=X tumor. hybridization (FISH), as higher-resolution techniques, are prohibitively labour-intensive on a genomic-wide scale. 2.4. Gene mapping Recently a microarray technique, called arrayCGH, has A MySQL database was created on Red Hat Linux System (ver. 9.0) been used to determine the DNA copy number variation. Pol- containing the NCBI (National Center for Biotechnology Information) public databases for genes (ftp://ftp.ncbi.nih.gov/gene/DATA, ftp:// ftp.ncbi.nih.gov/repository/UniGene, dated October 2005) and UCSC *Corresponding author. GoldenPath hg17 genome database (http://genome.ucsc.edu/golden- E-mail address: [email protected] (X. Liu). Path/hg17/bigZips). Each gene’s physical map position in the genome

0014-5793/$32.00 Ó 2006 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies. doi:10.1016/j.febslet.2006.04.043 X.-Y. Zhang et al. / FEBS Letters 580 (2006) 2774–2778 2775

Fig. 1. The flowchart of the pipeline of this analysis. was computed by mapping its reference mRNA sequence to the hg17 3.2. Differentially expressed chromosomal regions detected in by using the blat algorithm (http://genome.ucsc.edu/ squamous cell carcinoma cgi-bin/hgBlat), with parameters ‘‘-noHead -out=psl -fine -oneOff=1 - The moving median tends to reduce variability in genes’ t=dna -q=rna -tileSize=11 minIdentity=95’’. Splicing variants were uni- fied to one unique gene by using the gene2 accession file from the NCBI expression ratios, making the expression ratios trend easier gene database. UCSC databases (http://hgdownload.cse.ucsc.edu/gold- to analyze along each . Parameter K was set as enPath/hg17/database) were used to obtain the structure information of a threshold value to filter the gene expression ratios. A relative each genome element. The gene expression map was drawn by using the lower K value, 1.2, was used because all the genes were already pdf modules of the Perl language (ver. 5.8.1). identified significantly expressed by the t-test analysis. The size of moving window was set at 3 so that in a defined aberrant 2.5. Detection of differentially expressed chromosomal regions A modified moving-median ratio method [2,3] was used to survey region there were not two consecutive genes having opposite the gene expression map and to identify the overrepresented or under- expression patterns to most of the remaining genes, and there- represented chromosomal regions in SCC. The median expression ratio fore increasing the specificity. When the parameter R was set was calculated for a window size (W) of three positionally consecutive to 8, 9 or 10, a total of 43, 31 or 24 regions could be gained, gene loci. A clustering of differentially expressed genes was defined as respectively, for SCC. We applied a mediate R value, 9, in or- nine or more runs (R) of consecutive moving-medians with a threshold cutoff (K) of 1.2 times the genomic median. If the physical map dis- der to optimize both the specificity and accuracy of the result. tance of two ordinal genes was more than 5M base pairs, it was re- Table S1 (see supplementary file TableS1.xls) shows the garded that the distance overstepped the allowed cluster size and the identified chromosomal regions in SCC. The genes that were computation of clusters would be renewed from the next three adjacent located within the regions and significantly expressed in SCC genes. The flowchart of this analysis was shown in Fig. 1. were also defined. Of the 32 identified aberrant regions, 20 were underrepresented while 12 overrepresented. The data of lung adenocarcinoma was also analyzed but failed to present meaningful result (data not shown). Of the 27 aberrant regions 3. Results and discussion detected in lung adenocarcinoma, all appeared to be overrep- resented. Similar result had been reported by Takeshi Fujii 3.1. Statistical analysis and the gene expression map generation et al. [3]. The reasons might include: (i) the expression profile In total 3409 transcripts were filtered from the 12,600 array of adenocarcinoma was very similar to normal lung, and had probes through the two-sample t-test SAM analysis. From been testified in hierarchical analysis of the same datasets by those screened transcripts, 2721 unique genes were successfully previous study [4]; (ii) the subclasses of adenocarcinoma were mapped to the human genome. A high-resolution genome- so various that it was difficult to find the distinct expression wide human gene expression map covering these genes was profiles without further classification of adenocarcinoma. constructed (see supplementary file ArrayScc.pdf). For these reasons, our discussion was focused on SCC. 2776 X.-Y. Zhang et al. / FEBS Letters 580 (2006) 2774–2778

In SCC, deletion and amplification were often reported on 17q21 contained both overrepresented and underrepresented chromosome arms 3p and 3q, respectively. As expected, the regions, and the former mainly contained the KRT family expression bars were green for most genes on 3p (representing genes, seven of which were upregulated in SCC according to downregulation), while on 3q they are mostly red (representing our result. Similar results were also reported previously upregulation) (Fig. 2). Similar scenes can be observed in other [23,34]. In the remaining five results, 1 region, 7q21–q22, had frequently aberrant chromosomal regions such as 8p21 and been contrarily reported for to be underrepresented and over- 20p12. Our analysis might not include some less significant represented from different groups; 3 regions, 1p35, 1q23 and aberrant regions because: (i) the 12,600 expressed transcripts 12q13.11–13.13, were inconsistent with the previous reports, representing approximately 9000 unique genes cannot cover and this may be due to the different resolution between tradi- all the expressed genes; (ii) some aberrant regions may contain tional methods and our analysis. less differentially expressed genes. It could be improved by One new overrepresented region, Xq28, was firstly discov- using larger-scale microarray containing more probes or ered in SCC from our analysis. Xq28 had been reported having reducing the parameters R with sacrifice of specificity. relation to tumor carcinogenesis. High-level DNA amplifica- Previous CGH, FISH and LOH studies from published re- tions in this region were identified in mantle cell lymphomas ports were used to roughly verify our results of aberrant re- [35], human breast cancer [36] and adenoid cystic carcinoma gions in SCC. About 84% (27 of 32) of our results are of the salivary gland [37]. Another report indicated that consistent with the recurrent aberrant regions, namely, 1p36 Xq28 was one of the regions of clustered expression of groups [7], 1p35 [8],3p[9], 4q22 [10],5q[11], 6p21.3 [12], 8p21–22 of adjacent genes, which were co-regulated by transcriptional [13], 9p21 [10,14], 10q22 [15], 11p15.5 [16], 11q13 [17], 11q22 activation in major epithelial malignancies in human [38]. [18], 12p13 [19], 15q [20,21], 16p [22], 17p13 [23,24], 17q12– In this study, a part of Xq28 (Fig. 3), containing 4,418,626 21 [23],17q23 [25] and 19p13.3 [26] are often underrepresented, base pairs, was disclosed as a significantly overrepresented re- while 1q21 [27], 2p13 [28], 2p14–16 [22], 2q24–q31 [29],3q[30], gion in SCC. It includes 14 known differentially expressed 7p15–14 [31], 7q11 [20], 20q12–13 [32] and 22q11–13 [33], are genes, of which 12 were over-expressed. The expression profiles frequently overrepresented. Because of the low resolutions of of the 12 genes in normal lung and SCC tissues were shown in the previously developed techniques, those reported aberrant Fig. S1 (See supplementary file FigS1.xls). Four of the 12 over- regions normally have large sizes. Our work narrows down expressed genes, namely MAGEA1, MAGEA2, MAGEA4, these regions, and the significantly differentially expressed and MAGEA12, were in the MAGEA family. The MAGEA genes in the regions are identified. Our results showed that family consists of 12 genes (MAGEA1 to MAGEA12) and

Fig. 2. Gene expression map of chromosome arms 3p and 3q in squamous cell lung carcinoma. We have added an inset picture to show the enlarged view of a small part of the picture to demonstrate the condensation of the picture, and meanwhile, the complete high-resolution picture is shown in a supplementary data file, ‘‘FigS1.pdf’’. The aberrant regions and the information of chromosomal position, accession numbers, symbols, cytogenetic band and the expression ratio and pattern of genes were shown orderly from left side to right side. X.-Y. Zhang et al. / FEBS Letters 580 (2006) 2774–2778 2777

Fig. 3. Gene expression map of cytogenetic region Xq28 in squamous cell lung carcinoma. they are all located at Xq28. MAGEA1, MAGEA2, MA- References GEA3, MAGEA4, MAGEA6 and MAGEA12 were reported to be expressed in melanomas and other cancers but silent in [1] Nacht, M. et al. (2001) Molecular characteristics of non-small cell normal tissues [39]. The results of our studies were highly con- lung cancer. Proc. Natl. Acad. Sci. USA 98, 15203–15208. [2] Pollack, J.R. et al. (1999) Genome-wide analysis of DNA copy- sistent with those from previous reports, and strongly suggest number changes using cDNA microarrays. Nat. Genet. 23, 41–46. that this overrepresented region in Xq28 have significant cor- [3] Fujii, T. et al. (2002) A preliminary transcriptome map of non- relation with lung tumor carcinogenesis and may be of great small cell lung cancer. Cancer Res. 62, 3340–3346. value for oncologists’ further investigation. [4] Bhattacharjee, A. et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct ade- nocarcinoma subclasses. Proc. Natl. Acad. Sci. USA 98, 13790– 13795. Acknowledgements: The authors thank for the help and documentation [5] Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance of Li Tiantian and Zhang Ke. This work is supported by the Key Pro- analysis of microarrays applied to the ionizing radiation response. ject of Chinese Ministry of Education (No. 104232), Trans-Century Proc. Natl. Acad. Sci. USA 98, 5116–5121. Training Programe Foundation for the Talents by the Ministry of [6] Abruzzo, L.V. et al. (2005) Biological validation of differentially Education, and Tsinghua-Yue-Yuen Medical Sciences Fund. expressed genes in chronic lymphocytic leukemia identified by applying multiple statistical methods to oligonucleotide micro- arrays. J. Mol. Diagn. 7, 337–345. [7] Heijmans, B.T., Boer, J.M., Suchiman, H.E., Cornelisse, C.J., Appendix A. Supplementary data Westendorp, R.G., Kromhout, D., Feskens, E.J. and Slagboom, P.E. (2003) A common variant of the methylenetetrahydrofolate reductase gene (1p36) is associated with an increased risk of Supplementary data associated with this article can be cancer. Cancer Res. 63, 1249–1253. found, in the online version, at doi:10.1016/j.febslet.2006. [8] Thorstensen, L., Qvist, H., Heim, S., Liefers, G.J., Nesland, J.M., 04.043. Giercksky, K.E. and Lothe, R.A. (2000) Evaluation of 1p losses in 2778 X.-Y. Zhang et al. / FEBS Letters 580 (2006) 2774–2778

primary carcinomas, local recurrences and peripheral metastases small cell lung carcinoma. Zhonghua Jie He He Hu Xi Za Zhi 22, from colorectal cancer patients. Neoplasia 2, 514–522. 743–745. [9] Wikman, H. et al. (2000) hOGG1 polymorphism and loss of [24] Konishi, H., Sugiyama, M., Mizuno, K., Saito, H., Yatabe, Y., heterozygosity (LOH): significance for lung cancer susceptibility Takahashi, T., Osada, H. and Takahashi, T. (2003) Detailed in a caucasian population. Int. J. Cancer 88, 932–937. characterization of a homozygously deleted region corresponding [10] Petersen, I. et al. (2000) Chromosomal imbalances in brain to a candidate tumor suppressor locus at distal 17p13.3 in human metastases of solid tumors. Brain Pathol. 10, 395–401. lung cancer. Oncogene 22, 1892–1905. [11] Mendes-da-Silva, P., Moreira, A., Duro-da-Costa, J., Matias, D. [25] Cave-Riant, F. et al. (2002) Association of genetic defects in and Monteiro, C. (2000) Frequent loss of heterozygosity on primary resected lung adenocarcinoma revealed by targeted allelic chromosome 5 in non-small cell lung carcinoma. Mol. Pathol. 53, imbalance analysis. Am. J. Respir. Cell Mol. Biol. 27, 495–502. 184–187. [26] Zhu, Y., Harrison, D.J. and Bader, S.A. (2004) Genetic and [12] Virmani, A.K., Fong, K.M., Kodagoda, D., McIntire, D., Hung, epigenetic analyses of MBD3 in colon and lung cancer. Br. J. J., Tonk, V., Minna, J.D. and Gazdar, A.F. (1998) Allelotyping Cancer 90, 1972–1975. demonstrates common and distinct patterns of chromosomal loss [27] Goeze, A., Schluns, K., Wolf, G., Thasler, Z., Petersen, S. and in human lung cancer types. Genes Chromosomes Cancer 21, Petersen, I. (2002) Chromosomal imbalances of primary and 308–319. metastatic lung adenocarcinomas. J. Pathol. 196, 8–16. [13] Shao, S., Sun, W., Zhang, J., An, Q., Xiao, T., Yang, P., Cheng, [28] Sy, S.M. et al. (2004) Distinct patterns of genetic alterations in S. and Gao, Y. (2002) Fine deletion mapping on chromosome adenocarcinoma and squamous cell carcinoma of the lung. Eur. J. 8p21–8p22 in adenocarcinoma of lung. Zhonghua Yi Xue Za Zhi Cancer 40, 1082–1094. 82, 740–742. [29] Berrieman, H.K., Ashman, J.N., Cowen, M.E., Greenman, J., [14] Sanz-Ortega, J., Saez, M.C., Sierra, E., Torres, A., Balibrea, J.L., Lind, M.J. and Cawkwell, L. (2004) Chromosomal analysis of Hernando, F., Sanz-Esponera, J. and Merino, M.J. (2001) 3p21, non-small-cell lung cancer by multicolour fluorescent in situ 5q21, and 9p21 allelic deletions are frequently found in normal hybridisation. Br. J. Cancer 90, 900–905. bronchial cells adjacent to non-small-cell lung cancer, while they [30] Petersen, I. et al. (1997) Patterns of chromosomal imbalances in are unusual in patients with no evidence of malignancy. J. Pathol. adenocarcinoma and squamous cell carcinoma of the lung. 195, 429–434. Cancer Res. 57, 2331–2335. [15] Petersen, S., Rudolf, J., Bockmuhl, U., Gellert, K., Wolf, G., [31] Ubagai, T., Matsuura, S., Tauchi, H., Itou, K. and Komatsu, K. Dietel, M. and Petersen, I. (1998) Distinct regions of allelic (2001) Comparative genomic hybridization analysis suggests a imbalance on chromosome 10q22–q26 in squamous cell carcino- gain of chromosome 7p associated with lymph node metastasis in mas of the lung. Oncogene 17, 449–454. non-small cell lung cancer. Oncol. Rep. 8, 83–88. [16] Bepler, G. and Garcia-Blanco, M.A. (1994) Three tumor-sup- [32] Luk, C., Tsao, M.S., Bayani, J., Shepherd, F. and Squire, J.A. pressor regions on chromosome 11p identified by high-resolution (2001) Molecular cytogenetic analysis of non-small cell lung deletion mapping in human non-small-cell lung cancer. Proc. carcinoma by spectral karyotyping and comparative genomic Natl. Acad. Sci. USA 91, 5513–5517. hybridization. Cancer Genet. Cytogenet. 125, 87–99. [17] McKeeby, J.L. et al. (2001) Multiple leiomyomas of the esoph- [33] Michelland, S., Gazzeri, S., Brambilla, E. and Robert-Nicoud, M. agus, lung, and uterus in multiple endocrine neoplasia type 1. Am. (1999) Comparison of chromosomal imbalances in neuroendo- J. Pathol. 159, 1121–1127. crine and non-small-cell lung carcinomas. Cancer Genet. Cyto- [18] Rasio, D., Negrini, M., Manenti, G., Dragani, T.A. and Croce, genet. 114, 22–30. C.M. (1995) Loss of heterozygosity at chromosome 11q in lung [34] Shibata, T. et al. (2005) Genetic classification of lung adenocar- adenocarcinoma: identification of three independent regions. cinoma based on array-based comparative genomic hybridization Cancer Res. 55, 3988–3991. analysis: its association with clinicopathologic features. Clin. [19] Sundareshan, T.S. and Augustus, M. (1996) Cytogenetics of non- Cancer Res. 11, 6177–6185. small cell lung cancer: simple technique for obtaining high quality [35] Bea, S. et al. (1999) Increased number of chromosomal imbal- chromosomes by fine needle aspirate cultures. Cancer Genet. ances and high-level DNA amplifications in mantle cell lym- Cytogenet. 91, 53–60. phoma are associated with blastoid variants. Blood 93, 4365– [20] Balsara, B.R., Sonoda, G., du Manoir, S., Siegfried, J.M., 4374. Gabrielson, E. and Testa, J.R. (1997) Comparative genomic [36] Wallden, B., Emond, M., Swift, M.E., Disis, M.L. and Swisshelm, hybridization analysis detects frequent, often high-level, over- K. (2005) Antimetastatic gene expression profiles mediated by representation of DNA sequences at 3q, 5p, 7p, and 8q in retinoic acid receptor beta 2 in MDA-MB-435 breast cancer cells. human non-small cell lung carcinomas. Cancer Res. 57, 2116– BMC Cancer 5, 140. 2120. [37] Kasamatsu, A. et al. (2005) Identification of candidate genes [21] Kee, H.J., Shin, J.H., Chang, J., Chung, K.Y., Shin, D.H., Kim, associated with salivary adenoid cystic carcinomas using com- Y.S., Kim, S.K. and Kim, S.K. (2003) Identification of tumor bined comparative genomic hybridization and oligonucleotide suppressor loci on the long arm of chromosome 15 in primary microarray analyses. Int. J. Biochem. Cell Biol. 37, 1869–1880. small cell lung cancer. Yonsei Med. J. 44, 65–74. [38] Glinsky, G.V., Ivanova, Y.A. and Glinskii, A.B. (2003) Common [22] Chujo, M., Noguchi, T., Miura, T., Arinaga, M., Uchida, Y. and malignancy-associated regions of transcriptional activation Tagawa, Y. (2002) Comparative genomic hybridization analysis (MARTA) in human prostate, breast, ovarian, and colon cancers detected frequent overrepresentation of chromosome 3q in are targets for DNA amplification. Cancer Lett. 201, 67–77. squamous cell carcinoma of the lung. Lung Cancer 38, 23–29. [39] De Plaen, E. et al. (1994) Structure, chromosomal localization, [23] Guo, X., Ni, P. and Li, L. (1999) Studies of microsatellite and expression of 12 genes of the MAGE family. Immunogenetics instability and loss of heterozygosity at chromosome 17 in non- 40, 360–369.