Research Article

Seizure 6-Like (SEZ6L) and Risk for Lung Cancer

Ivan P. Gorlov,1 Peter Meyer,2,3,4 Triantafillos Liloglou,6 Jonathan Myles,7 Melanie Barbara Boettger,2,3 Adrian Cassidy,6 Luc Girard,9 John D. Minna,9 Reiner Fischer,8 Stephen Duffy,7 Margaret R. Spitz,1 Karl Haeussinger,2 Stefan Kammerer,5 Charles Cantor,5 Rainer Dierkesmann,8 John K. Field,6 and Christopher I. Amos1

1Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas; 2Department for Respiratory Medicine, Asklepios Specialist Hospitals, Munich-Gauting, Germany; 3Genefinder Technologies Ltd.; 4Institute of Molecular Medicine, Munich, Germany; 5Sequenom, Inc., San Diego, California; 6Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool, United Kingdom; 7Cancer Research UK Centre for Epidemiology, Mathematics and Statistics Wolfson Institute of Preventive Medicine, London, United Kingdom; 8Department for Respiratory Medicine, Schillerho¨he Specialist Hospital, Stuttgart-Gerlingen, Germany;and 9Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas

Abstract small cell lung cancer (SCLC; n = 22) cell lines by using Affymetrix HG-U133A and HG-U133B GeneChips. We found DNA pooling in combination with high-throughput sequenc- that the average expression level of SEZ6L in NSCLC cell lines ing was done as a part of the Sequenom-Genefinder project. In was almost two times higher and in SCLC cell lines more than the pilot study, we tested 83,715 single nucleotide poly- six times higher when compared with normal lung epithelial morphisms (SNP), located primarily in gene-based regions, cell lines. Using the National Center for Biotechnology to identify polymorphic susceptibility variants for lung cancer. Information Gene Expression Omnibus database, we found a For this pilot study, 369 male cases and 287 controls of both f2-fold elevated and statistically significant (P = 0.004) level sexes (white Europeans of Southern German origin) were of SEZ6L expression in tumor samples compared with normal analyzed. The study identified a candidate region in 22q12.2 lung tissues. In conclusion, the results of these studies that contained numerous SNPs showing significant case- representing 906 cases compared with 811 controls indicate control differences and that coincides with a region that was aroleoftheSEZ6L Met430Ile polymorphic variant in shown previously to be frequently deleted in lung cancer cell increasing lung cancer risk. lines. The candidate region overlies the seizure 6-like (SEZ6L) [Cancer Res 2007;67(17):8406–11] gene. The pilot study identified a polymorphic Met430Ile substitution in the SEZ6L gene (SNP rs663048) as the top Introduction candidate for a variant modulating risk of lung cancer. Two Although up to 90% of lung cancers are attributable to smoking, replication studies were conducted to assess the association of only a small fraction of smokers develop lung cancer over their SNP rs663048with lung cancer risk. The M. D. Anderson lifetimes (1), suggesting that genetic variation may contribute to Cancer Center study included 289 cases and 291 controls lung cancer susceptibility (2). Results of segregation analyses matched for gender, age, and smoking status. The Liverpool suggest that rare autosomal dominant polymorphisms may explain Lung Project (a United Kingdom study) included 248cases and susceptibility to early-onset lung cancer;however, only a minority 233 controls. Both replication studies showed an association of lung cancer cases can be explained by the presence of such of the rs663048with lung cancer risk. The homozygotes for the variants (3–6). There is also a rapidly expanding body of literature variant allele had more than a 3-fold risk compared with the on the association of common, low-penetrance with lung wild-type homozygotes [combined odds ratio (OR), 3.32; 95% cancer risk (5–8). According to the latest update of Cancer Genetics confidence interval (95% CI), 1.81–7.21]. Heterozygotes also Web database,10 more than forty genes may be involved in had a significantly elevated risk of lung cancer from the susceptibility to, and progress of lung cancer. combined replication studies with an OR of 1.15 (95% CI, 1.04– Identification of genetic factors modulating lung cancer risk 1.59). The effect remained significant after adjusting for age, requires a combination of effective genotyping technologies with gender, and pack-years of tobacco smoke. We also compared an appropriate and efficient study design. Sequenom (San Diego, expression of SEZ6L in normal human bronchial epithelial CA) has developed a DNA analysis platform, capable of high- cells (n = 7), non–small cell lung cancer (NSCLC; n = 52), and throughput genotyping with pooled DNA allele frequency analysis. Using this approach, Sequenom implemented a Genetics Discovery platform with dense genome-wide single nucleotide polymorphism (SNP) markers (7, 8). A hypothesis-free approach using allele Note: Supplementary data for this article are available at Cancer Research Online frequency estimates of many thousands (for lung cancer, 83,715 (http://cancerres.aacrjournals.org/) and at Cancer Genetics Web database: http:// SNPs) of SNPs was used as a first step (pilot study) in identifying www.cancerindex.org/geneweb/X1501.htm. potentially relevant genetic variants. Significant SNPs identified in I.P. Gorlov, P. Meyer, R. Dierkesmann, J.K. Field, and C.I. Amos contributed equally to this work. this first step were then individually genotyped and validated in Requests for reprints: Christopher I. Amos, Department of Epidemiology, The replication studies using independent samples. The efficiency of University of Texas M. D. Anderson Cancer Center, Box 189, 1515 Holcombe Boulevard, Houston, TX 77030. Phone: 713-563-49-51;Fax: 49-8106-23273;E-mail: camos@ this strategy has been shown by the rediscovery of genes shown mail.mdanderson.org or Peter Meyer, Genefinder Technologies Ltd., Sperberstrasse 2, 81827 Munich, Germany. Phone: 49-171-2838333;Fax: 49-8106-23273;E-mail: Peter. [email protected]. I2007 American Association for Cancer Research. doi:10.1158/0008-5472.CAN-06-4784 10 http://www.cancerindex.org/geneweb/index.htm

Cancer Res 2007; 67: (17). September 1, 2007 8406 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. SEZ6L and Lung Cancer previously to be involved in several common diseases (8–10). The smoke. Smoking status was defined as in the U.S. study. In total, 248 lung purpose of the current study was to implement this strategy to cancer cases and 233 controls were included in this analysis. Table 1 identify genetic variation modulating lung cancer risk. provides a description of cases and controls used in the LLP study. SNP markers and genotyping. Genomic DNA was extracted from blood peripheral leukocytes by using the Qiagen DNA blood mini kit (Qiagen) according to the manufacturer’s instruction. DNA pools were formed by Materials and Methods combining equimolar amounts of individual samples as described Sequenom-Genefinder pilot study (Southern Germany). Lung cancer elsewhere (13). For the pilot study, one pool of 369 cases and one pool of cases for the pilot study were recruited from the Departments for 287 controls, respectively, were constructed. For assays carried out on Respiratory Medicine and Thoracic Surgery, Schillerho¨he Specialist Hospital sample pools, 25 ng of a 5-ng/L pool were used for PCRs. All PCR and (Stuttgart-Gerlingen, Germany) and the Department for Respiratory MassEXTEND reactions were conducted using standard conditions. Relative Medicine, Asklepios Specialist Hospitals (Munich-Gauting, Germany). allele frequency estimates were derived from calculations based on the area Controls were sampled from patients with nonmalignant disease at the under the peak of mass spectrometry measurements from four analyte same hospitals. The final sample consisted of 369 male cases and 287 aliquots (14). Tests of association between disease status and each SNP were controls of both genders, all with a positive history of tobacco smoke carried out as previously discussed (15). When three or more replicate exposure. A total of 83,715 SNPs, mostly in gene-based regions, were used measurements of a SNP were available, the corresponding variance in the analysis. Epidemiologic data were collected by personal interview. component was estimated from the data. Otherwise, the following historical Table 1 provides a description of cases and controls used in the pilot study. laboratory averages were used to calculate sources of variability: pool À À The M. D. Anderson Cancer Center replication study (United formation, 5.0 Â 10 5;PCR/mass extension, 1.7 Â 10 4;and chip À States). Cases and controls for the M. D. Anderson Cancer Center (MDACC) measurement, 1.0 Â 10 4. The same procedure was used for individual study were recruited from an ongoing lung cancer case-control study genotyping except 2.5 ng DNA was used and only one mass spectrometry enrolling patients with newly diagnosed, untreated lung cancer at The measurement was taken. The following gene-specific primers were used to University of Texas MDACC (Houston, TX). The study has been described in genotype rs663048: the forward PCR primer was 5¶-TGGGCTATGAGCTC- detail previously (11). Control subjects were recruited from the largest CAGGG-3¶;the reverse PCR primer was 5 ¶-TGCGGCTTGGAGGCATTGAT-3¶; private multispecialty physician group in the Houston metropolitan area. and extend primer was 5¶-GAGCTCCAGGGCGCTAAGAT-3¶. The controls did not have a previous diagnosis of any type of cancer and The Sequenom-Genefinder pilot study included 83,715 SNPs selected were frequency matched to the cases on age (F5 years), sex, ethnicity, and based on their location within a gene region (including the coding region smoking status (current, former, and never). There were no never smokers plus additional 10 kb at the both ends) and minor allele frequency (MAF) in this subgroup (defined as a person who smoked fewer than 100 cigarettes from a total of 125,799 experimentally validated polymorphic variations in his/her lifetime). A former smoker had quit smoking for at least 1 year (7, 8, 10). In the first step, one PCR and primer extension reaction was before the interview. Pack-years were defined as the number of cigarettes carried out for 83,715 SNPs on each pool (case and control). In the second smoked per day divided by 20 and then multiplied by the number of years step, 4,293 SNPs (f5%) with the most statistically significant associations smoked. Epidemiologic data were collected by personal interview. In total, were remeasured in triplicate on each DNA pool. In the third step, the 301 289 lung cancer cases and 291 controls were included in the analysis. most significant SNPs (f7%) from step two were individually genotyped in Table 1 provides a description of cases and controls used in the MDACC. each sample. A total of 160 SNP markers were identified with statistically The Liverpool Lung Project replication study (United Kingdom). The significant differences between cases and controls (P < 0.05) after individual lung cancer case-control data were derived from the Liverpool Lung Project genotyping in the German pilot study and were then genotyped in the (LLP) from an ongoing molecular epidemiologic study of lung cancer in MDACC and LLP replication samples. Liverpool, United Kingdom (12). Histologically or cytologically confirmed Expression of the seizure 6-like gene. Besides analyzing the effect of lung cancer cases with primary tumors were recruited from participating the Met430Ile variant on lung cancer risk, we also compared the seizure chest clinics. Population controls were selected from registers of General 6-like (SEZ6L) expression level in normal versus cancer cell lines (our data) Practitioners in Liverpool to ensure similar age-sex distributions to the and in primary tumors versus normal lung tissues [data from the National cases. In all studies, a standardized questionnaire was used to determine Center for Biotechnology Information (NCBI) Gene Expression Omnibus basic demographic characteristics in addition to details on smoking history, (GEO) database]. Affymetrix HG-U133A and HG-U133B high-density lifetime residence and occupation, history of lung diseases, family history of oligonucleotide microarrays were also used to evaluate expression of the cancer in first-degree relatives, and exposure to environmental tobacco SEZ6L gene in cell lines. Gene expression profiling was done on a panel of

Table 1. Description of the cases and controls from the pilot and the two replication studies

Sequenom-Genefinder pilot study MDACC (United States) LLP (United Kingdom) (Southern Germany)

Cases, Controls, Cases, Controls, Cases, Controls, n = 369 (%) n = 287 (%) n = 289 (%) n = 291 (%) n = 248 (%) n = 233 (%)

Gender Male 369 (100) 196 (68.29) 164 (56.75) 158 (54.30) 159 (64.11) 158 (67.81) Female 0 91 (31.71) 125 (43.25) 133 (45.70) 89 (35.89) 75 (32.19) Smoking status Never 0 0 0 0 13 (5.24) 66 (28.33) Former 0 0 168 (58.13) 173 (59.45) 105 (42.34) 121 (51.93) Current 368 (100) 287 (100) 121 (41.87) 118 (40.55) 130 (52.42) 46 (19.74) Age (y) Mean (SD) 65.4 (6.3) 63.0 (8.3) 62.7 (9.94) 62.2 (10.09) 65.9 (9.2) 65.4 (8.4)

www.aacrjournals.org 8407 Cancer Res 2007; 67: (17). September 1, 2007

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. Cancer Research

Figure 1. Lung cancer susceptibility region of the 22q12.2. A total of 35 SNPs were genotyped in this region with median spacing f1 kb. Dark line, Àlog10 (P value) averaged across five neighboring SNPs. Blue bar, deletion identified by Nishioka et al. (18); orange bars, positions of the SEZ6L and MYO18B genes. Diamonds, positions of SNPs that were genotyped in the pilot study.

52 non–small cell lung cancer (NSCLC) and 22 small cell lung cancer (SCLC) We found that the position and size of the candidate region cell lines. As a control, we used seven normal human bronchial epithelial coincides well with the position and size of the SEZ6L gene. The cell lines immortalized with cyclin-dependent kinase 4 and hTert or with candidate region occupied the distal part of the 428-kb region E6/E7 with or without hTert (16), and two unimmortalized lung cell lines frequently deleted in lung cancer cell lines (18). The deletion (NHBEC and SAEC). The list of the cell lines used in the study can be found SEZ6L MYO18B SEZ6L contains two genes: and (18). No significant SNPs in Supplementary Table S1. Four probes were used for . Signals were MYO18B median normalized and log 2 transformed. Averaging of the signal across were detected in the region (Fig. 1). the five probes was used to estimate signal intensity for each cell line as well The MDACC replication study. SNP markers with allelic as for controls. Expression of the SEZ6L was detected in all cell lines. frequencies found to be significantly different between cases and For comparing the SEZ6L expression in normal tissues and primary lung controls in the pilot study were validated in the MDACC replication tumors, we used the GDS619 data set with microarray gene expression data study. The total number of SNP markers genotyped in the MDACC in SCLC (19 samples), non–small lung cancer (12 samples), and normal lung study was 147. Thirteen SNPs exhibited significant (P < 0.01) tissue (20 samples). Tumor samples were obtained from the patients deviation from Hardy-Weinberg equilibrium in controls and were undergoing surgery at the Cancer Institute Hospital (Tokyo, Japan). The excluded from the analysis. The remaining 133 SNPs were analyzed control samples were obtained by bronchial brushes from unrelated healthy to estimate their association with lung cancer. Analysis of all individuals. Normalized and log-transformed gene expression data were histologic types together enabled an identification of six cancer- downloaded and the SEZ6L expression in normal and tumor tissues was compared (see description for GDS619 data set from the GEO database).11 associated SNP markers. In NSCLC, the same SNP markers were Statistical analysis. The distributions of the demographic variables identified as showing significant associations with lung cancer as between cases and controls were compared using the m2 test. For well as two additional SNPs (Table 2). categorical variables (sex, ethnicity, and smoking status), two-sample To further prioritize the significant SNPs listed in Table 2, we Student’s t test was used. A goodness-of-fit m2 test was used to determine annotated the SNPs by using available data that potentially can whether the polymorphisms were in Hardy-Weinberg equilibrium. The help to identify causal SNPs (Table 3). The SNP rs663048 was the adjusted odds ratios (OR) were calculated using multiple logistic regression only nonsynonymous (Met430Ile) SNP in the list. Sorting Intolerant to control for age, sex, and intensity of smoking (pack-year), to estimate from Tolerant (SIFT;ref. 19) and Polymorphism Phenotyping effect of SNPs on lung cancer risk. To estimate the overall ORs from the (PolyPhen;ref. 20) software were used to predict the functional replication studies, we used the Mantel-Haenszel test (17). All statistical effect of the Met430Ile amino acid substitution. Both algorithms analyses were done using STATISTICA (StatSoft, Inc.). predict that the Met430Ile is a functional ( disturbing) variant;therefore, our further analysis was concentrated on Results rs663048 polymorphism. The Sequenom-Genefinder pilot study. The pilot study To estimate the relative risk associated with the significant SNPs, identified a candidate region located at 22q12.2. The size of the we used a logistic regression model. Table 4 shows the predicted OR candidate region was f37 kb. The region contains a cluster of 17 adjusted for age, intensity of smoking (pack-year), and gender. There of 25 SNPs significantly associated with lung cancer risk (P < 0.05). was no significant deviation from Hardy-Weinberg equilibrium in Most of the SNPs in the region were in linkage disequilibrium both replication samples neither in controls nor in cases. Homo- (minimum r2 z 0.8). Based on moderate or low r2 V 0.6 levels of zygotes for the more frequent allele were used as reference groups. OR linkage disequilibrium, six significant SNPs were chosen for values ranged from 1.29 to 1.72, with four of them being statistically genotyping in two replication sample sets. significant. The OR for the SNP rs663048 was 1.48 (P = 0.03) with rare genotypes being associated with increased risk for NSCLC. The LLP replication study (United Kingdom). The LLP 11 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gds replication sample was used to further validate the association

Cancer Res 2007; 67: (17). September 1, 2007 8408 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. SEZ6L and Lung Cancer

Table 2. SNP markers found to be significant by analysis of the all histologic types or NSCLCs (MDACC)

SNP ID Chromosome Alleles Controls Cases m2 P

n MA MAF SE n MA MAF SE

rs1037023 13 C/T 288 C 0.2 0.02 250 C 0.14 0.02 8.47 0.004 rs1906520 3 G/T 278 T 0.16 0.02 241 T 0.12 0.01 6.37 0.012 rs1890697 14 A/C 286 A 0.08 0.01 251 A 0.12 0.01 6.06 0.014 rs3803 3 C/T 273 T 0.23 0.02 245 T 0.17 0.02 5.63 0.018 rs663048 22 G/T 287 T 0.21 0.02 253 T 0.27 0.02 5.33 0.021 rs422679 17 C/T 281 T 0.39 0.02 250 T 0.32 0.02 5.32 0.021 rs765707* 12 C/T 277 T 0.06 0.01 247 T 0.1 0.01 5.18 0.023 rs1057050* 10 G/A 288 A 0.05 0.01 254 A 0.08 0.01 4.79 0.029

Abbreviation: MA, minor allele. *SNPs were significant only in NSCLCs.

between rs663048 and lung cancer risk. In this study, we also found not show differential expression between lung cancer and normal a significant association between Met430Ile polymorphism and lung cell lines. SNP genotypes were not available for this analysis. lung cancer with cases having a higher frequency of the variant T/T Expression of the SEZ6L in primary tumor and normal lung genotype (8.9 F 1.9%) than the controls [3.0 F 1.1%; m2 = 7.4; tissues. We used the NCBI GEO database containing microarray degrees of freedom (df )=1;P = 0.006]. We observed a 3.8-fold data on the genome wide assessment of gene expression. Using the increased risk of lung cancer in individuals who carried the key words ‘‘lung AND cancer OR tumor,’’ we have identified several rs663048 null genotype T/T compared with those who carried the entries with data on gene expression. The GDS619 data set rs663048 more common (wild-type) genotype G/G (95% CI, 1.40– (platform GPL962: CHUGAI41K) was most appropriate for our goal 10.42) after adjusting for age, sex, and smoking (Table 4). to compare SEZ6L expression in normal and tumor tissues. This Expression of the SEZ6L in normal and lung tumor cell lines. data set contains data on gene expression in SCLC, adenocarci- Supplementary Fig. S1 shows the expression level of SEZ6L in 9 nomas, and normal tissues. We found that the average log normal lung cell lines, 54 NSCLC, and 22 SCLC cell lines. The 2-transformed SEZ6L expression value was significantly higher in average expression signal in controls was 6.2 (n = 9) compared adenocarcinoma compared with normal lung tissues (0.34 F 0.13 with the average expression signal in the NSCLC cell lines of 7 versus À0.38 F 0.15;two sided t test = 3.1; df = 29; P = 0.004). The (n = 54) and in SCLC cell lines of 8.8 (n = 22). A nonparametric expression of SEZ6L was also insignificantly higher in SCLC Mann-Whitney U test was significant for all pairwise compar- compared with normal tissues (Student’s t test = 1.4; df = 43; isons. The smallest (but still very significant) difference was P = 0.15). The variance of expression values in NSCLC sample was found between the normal lung cell lines and NSCLC cell lines significantly higher compared with the normal tissues (Var = 0.07 (Z = À3.3; P = 0.001). We also found significant differences in among controls versus Var = 0.42 among SCLCs; F = 6.2; P = 0.001). expression level of GATA2, with the expression being higher in This result together with our analysis of the SEZ6L expression in cancer lines compared with controls (Mann-Whitney U test, cell lines suggests an increased level of the SEZ6L expression in Z = À3.9; P = 0.0005). Other candidate genes listed in Table 3 did lung tumor compared with normal lung tissues.

Table 3. Annotations of the SNPs that have been shown to be significantly associated with lung cancer risk

SNP ID SNP location Chromosome Chromosome position Gene Description Cancer-related*

rs1037023 Intron 13q14 24854588 FLJ10094 Hypothetical protein No rs1906520 7 kb Downstream CD200 3q13 113580534 CD200 Immunoglobulin No rs1890697 2 kb Upstream SETP2 14q22 50961888 SETP2 Pseudogene No rs3803 5¶-UTR 3q21 129682070 GATA2 Zinc finger, hematopoiesis No rs663048 Missense 22q12 25025077 SEZ6L Transmembrane protein Yes rs422679 Intron 17p13 8226123 RPL26 Ribosomal protein L26 No rs765707 2 kb Downstream OR7A19P 12q12 45288613 OR7A19P Olfactory receptor No rs1057050 3¶-UTR 10q24 104132284 GBF1 Guanine exchange factor No

Abbreviation: UTR, untranslated region. *An exhaustive search of literature was conducted to estimate potential relationship of the SNP to cancer.

www.aacrjournals.org 8409 Cancer Res 2007; 67: (17). September 1, 2007

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. Cancer Research

Table 4. Risk estimates for lung cancer associated with rs663048 genotype in MDACC and LLP replication samples

Sample Genotype Case, n (%) Control, n (%) OR* (95% CI) P

MDACC GG 137 (51.7) 176 (62.4) 1 (Reference) TG 102 (38.5) 100 (34.8) 1.18 (0.87–2.04) 0.27 TT 26 (9.8) 11 (3.8) 2.84 (1.36–8.54) 0.008 LLP GG 125 (50.4) 133 (57.1) 1 (Reference) TG 91 (36.7) 85 (36.5) 1.23 (0.78–1.96) 0.40 TT 22 (8.9) 7 (3.0) 3.81 (1.40–10.42) 0.009

*ORs were adjusted for age, pack-year, and gender.

Discussion protein structure and evolutionary conservation) was associated There are two major approaches used to identify cancer-related with increased risk for lung cancer, suggesting that loss of normal SEZ6L genes: (a) candidate gene approach and (b) genome-wide scan function may be a risk factor for lung cancer. This is SEZ6L approach. The candidate gene approach is based on prior data consistent with the finding that region is often deleted in indicating that a gene(s) is involved in cancer. For example, it has lung cancer cell lines (18). On the other hand, we found that SEZ6L been shown that increased lung cancer risk correlates with expression of is elevated in lung cancer cell lines. One SEZ6L decreased nucleotide excision repair (NER) capacity (21–23). explanation might be that is both a tumor marker and a Targeting genes involved in NER has identified several poly- variant affecting lung cancer susceptibility. Loss of normal SEZ6L morphisms in NER genes associated with elevated lung cancer risk function is a risk factor for development of lung cancer;however, (24–26). The advantage of the candidate gene approach is that the when lung cancer is caused by factors other than loss-of-SEZ6L number of candidate genes is limited;therefore, the number of function, expression of the SEZ6L is adaptively up-regulated to false positives among significant associations is also expected to be suppress tumorigenesis. SEZ6L low. A disadvantage of the approach is that genes without prior Several lines of evidence support the hypothesis that data on associations will not be included in the analysis and, might modulate lung cancer risk. First, frequent allelic losses on therefore, cannot be identified by the candidate gene approach. In 22q in NSCLCs have been reported, indicating the presence of the genome-wide scan approach, potentially all genes in the tumor suppressor gene(s) on that chromosome arm (18). Cloning of SEZ6L genome are targeted, allowing the identification of novel cancer- the breakpoints revealed a 400-kb deletion containing the MYO18B related genes. A disadvantage of the genome-wide scan is the large and genes (18, 27). A study conducted by Suzuki at al. (28) number of independent tests so that a large sample size is required suggests that SEZ6L gene may also influence development and to distinguish between true- and false-positive associations. progression of colorectal cancer. The authors found that SEZ6L In our study, we have combined elements of both approaches. At was one of the few genes highly hypermethylated in primary the first step, more than 83,000 SNP markers covering the coding colorectal tumors. region of the whole genome were analyzed. This analysis yielded In the pilot study, we populated the candidate region with 35 many candidate SNPs showing significant associations with lung SNPs and found that markers located in the SEZ6L gene region cancer. Significant SNPs identified in the pilot study are a mix of show a strong association with lung cancer;however, no significant false positives and true associations. For selected SNPs in SEZ6L associations were found in the neighboring MYO18B gene. Applying candidate region, two independent replication studies were then a sliding window of five neighboring SNPs revealed a peak of À P SEZ6L conducted to identify true associations. The MDACC replication log10 ( values) that coincides with the position of the study yielded eight SNPs showing associations with lung cancer gene. We found that the principal contributor to the peak was risk. We then used additional SNP-related information to identify rs663048. The association of this SNP with lung cancer risk was most promising genes. As a result of this analysis, the SEZ6L gene verified in two independent replication studies. The rs663048 SNP emerged as a top candidate gene to be associated with lung cancer. is a Met430Ile amino acid substitution that has been predicted to It is interesting that based only on the significance of the m2 test, be functional by both SIFT and PolyPhen, suggesting that this this gene was number five in the list. The LLP replication study amino acid substitution is protein disturbing. further validated the association of SNP rs663048 in SEZ6L with Nishioka et al. (18) found that 95% (43 of 45) of primary tumor lung cancer risk. samples carry the Met430Ile mutation. The authors did not An analysis of the expression of the SEZ6L gene showed different estimate the frequency of the variant in controls. We found that expression of SEZ6L in normal and NSCLC SCLC cell lines. 38% of controls and 48% of cases carry at least one variant allele. Interestingly, we found that the two histologic types of lung cancer This suggests that f40% of tumors may carry a somatic Met430Ile had different levels of expression of SEZ6L. The average expression mutation. If we consider that according to the HapMap the signal in NSCLC was 7.0 F 0.1 and in SCLC 8.8 F 0.2 (Mann- frequency of the Met430Ile polymorphism is lower in Japanese than Whitney U test, Z = 6.0; P < 0.001). It needs to be noted that there is in Caucasians, the percentage of accumulated somatic Met430Ile an apparent inconsistency between the results of the analysis of the may actually be higher. expression of SEZ6L and the results of the association studies. In We found that homozygotes for the variant allele had 3-fold the MDACC and United Kingdom samples, the variant allele (that higher lung cancer risk compared with the normal variant was predicted to be protein disturbing based on analysis of the homozygotes. Lung cancer risk was also significantly elevated in

Cancer Res 2007; 67: (17). September 1, 2007 8410 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. SEZ6L and Lung Cancer heterozygotes. According to our estimates, the frequency of the of lung cancer. Although the strongest evidence from our study variant allele for rs663048 is 22%, which is very similar to the indicates this SNP, it is possible that another closely located SNP 20% reported for Caucasians by the HapMap database. We found plays a dominant role in promoting lung cancer risk and that the that 36% of Caucasian controls are hetorozygotes and f4% are Met340Ile variant is a marker in linkage disequilibrium with the homozygotes for the risk allele, making the portion of Caucasians underlying causal variant. However, further studies, especially those having at least one risk allele f40%. Results from combined, implementing functional assays, are warranted to provide more Mantel-Haenszel, analysis yielded ORs of 1.15 [95% confidence conclusive evidences on causal association between the Met430Ile interval (95% CI), 1.04–1.59] for heterozygotes and 3.32 (95% CI, and lung cancer risk. 1.81–7.21) for homozygotes. The population attributable risk À Â P À Â P Â percentage [PAR% = (OR 1) / [(OR 1) +1] 100, Acknowledgments where P is the risk genotype frequency in the controls] was 7.5 f Received 12/27/2006;revised 4/27/2007;accepted 6/20/2007. for homozygotes and 8.3 for heterozygotes, suggesting 16% of Grant support: National Cancer Institute grants R01 CA55769 and CA 70907, excess risk in lung cancer cases is due to the presence of the Specialized Programs of Research Excellence grant P50CA70907, and Flight Attendant variant allele. Medical Research Institute. The costs of publication of this article were defrayed in part by the payment of page In conclusion, our data together with published studies suggest charges. This article must therefore be hereby marked advertisement in accordance that the Met430Ile variant might be a causal variant affecting risk with 18 U.S.C. Section 1734 solely to indicate this fact.

References the PDCD5 locus with lung cancer risk and prognosis in 20. Ramensky V, Bork P, Sunyaev S. Human non- smokers. J Clin Oncol 2006;24:1672–8. synonymous SNPs: server and survey. Nucleic Acids 1. American Cancer Society I. Cancer facts and 11. Wei Q, Cheng L, Amos CI, et al. Repair of tobacco Res 2002;30:3894–900. figures 2005. Atlanta (GA): American Cancer Society; carcinogen-induced DNA adducts and lung cancer risk: 21. Friedberg EC, Bond JP, Burns DK, et al. Defective 2005. a molecular epidemiologic study. J Natl Cancer Inst nucleotide excision repair in xpc mutant mice and its 2. Amos CI, Xu W, Spitz MR. Is there a genetic basis for 2000;92:1764–72. association with cancer predisposition. Mutat Res 2000; lung cancer susceptibility? Recent Results Cancer Res 12. FieldJK,SmithDL,DuffyS,CassidyA.TheLiverpool 459:99–108. 1999;151:3–12. Lung Project research protocol. Int J Oncol 2005;27:1633–45. 22. Ide F, Iida N, Nakatsuru Y, Oda H, Tanaka K, 3. Sellers TA, Chen PL, Potter JD, Bailey-Wilson JE, 13. Buetow KH, Edmonson M, MacDonald R, et al. High- Ishikawa T. Mice deficient in the nucleotide excision Rothschild H, Elston RC. Segregation analysis of throughput development and characterization of a repair gene XPA have elevated sensitivity to benzo[- smoking-associated malignancies: evidence for Mende- genomewide collection of gene-based single nucleotide a]pyrene induction of lung tumors. Carcinogenesis lian inheritance. Am J Med Genet 1994;52:308–14. polymorphism markers by chip-based matrix-assisted 2000;21:1263–5. 4. Sellers TA, Bailey-Wilson JE, Elston RC, et al. Evidence laser desorption/ionization time-of-flight mass spectro- 23. Cheng L, Spitz MR, Hong WK, Wei Q. Reduced for Mendelian inheritance in the pathogenesis of lung metry. Proc Natl Acad Sci U S A 2001;98:581–4. expression levels of nucleotide excision repair genes in cancer. J Natl Cancer Inst 1990;82:1272–9. 14. Bansal A, van den Boom D, Kammerer S, et al. lung cancer: a case-control analysis. Carcinogenesis 5. Yang P, Schwartz AG, McAllister AE, Aston CE, Association testing by DNA pooling: an effective initial 2000;21:1527–30. Swanson GM. Genetic analysis of families with non- screen. Proc Natl Acad Sci U S A 2002;99:16871–4. 24. Zienolddiny S, Campa D, Lind H, et al. Polymor- smoking lung cancer probands. Genet Epidemiol 1997; 15. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, phisms of DNA repair genes and risk of non-small cell 14:181–97. Clayton DG. Identification of the sources of error in lung cancer. Carcinogenesis 2006;27:560–7. 6. Xu H, Spitz MR, Amos CI, Shete S. Complex allele frequency estimations from pooled DNA indicates 25. Wu X, Zhao H, Wei Q, et al. XPA polymorphism segregation analysis reveals a multigene model for lung an optimal experimental design. Ann Hum Genet 2002; associated with reduced lung cancer risk and a cancer. Hum Genet 2005;116:121–7. 66:393–405. modulating effect on nucleotide excision repair capa- 7. Nelson MR, Marnellos G, Kammerer S, et al. Large- 16. Ramirez RD, Sheridan S, Girard L, et al. Immortali- city. Carcinogenesis 2003;24:505–9. scale validation of single nucleotide polymorphisms in zation of human bronchial epithelial cells in the absence 26. Shen M, Berndt SI, Rothman N, et al. Polymorphisms gene regions. Genome Res 2004;14:1664–8. of viral oncoproteins. Cancer Res 2004;64:9027–34. in the DNA nucleotide excision repair genes and lung 8. Kammerer S, Roth RB, Reneland R, et al. Large-scale 17. Mantel N, Haenszel W. Statistical aspects of the cancer risk in Xuan Wei, China. Int J Cancer 2005;116: association study identifies ICAM gene region as breast analysis of data from retrospective studies of disease. 768–73. and prostate cancer susceptibility locus. Cancer Res J Natl Cancer Inst 1959;22:719–48. 27. Nishioka M, Kohno T, Tani M, et al. MYO18B, a 2004;64:8906–10. 18. Nishioka M, Kohno T, Takahashi M, et al. Identifica- candidate tumor suppressor gene at chromosome 9. Kammerer S, Burns-Hamuro LL, Ma Y, et al. Amino tion of a 428-kb homozygously deleted region disrupting 22q12.1, deleted, mutated, and methylated in human acid variant in the kinase binding domain of dual- the SEZ6L gene at 22q12.1 in a lung cancer cell line. lung cancer. Proc Natl Acad Sci U S A 2002;99:12269–74. specific A kinase-anchoring protein 2: a disease Oncogene 2000;19:6251–60. 28. Suzuki H, Gabrielson E, Chen W, et al. A genomic susceptibility polymorphism. Proc Natl Acad Sci U S A 19. Ng PC, Henikoff S. SIFT: predicting amino acid screen for genes upregulated by demethylation and 2003;100:4066–71. changes that affect protein function. Nucleic Acids Res histone deacetylase inhibition in human colorectal 10. Spinola M, Meyer P, Kammerer S, et al. Association of 2003;31:3812–4. cancer. Nat Genet 2002;31:141–9.

www.aacrjournals.org 8411 Cancer Res 2007; 67: (17). September 1, 2007

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research. Seizure 6-Like (SEZ6L) Gene and Risk for Lung Cancer

Ivan P. Gorlov, Peter Meyer, Triantafillos Liloglou, et al.

Cancer Res 2007;67:8406-8411.

Updated version Access the most recent version of this article at: http://cancerres.aacrjournals.org/content/67/17/8406

Supplementary Access the most recent supplemental material at: Material http://cancerres.aacrjournals.org/content/suppl/2007/08/28/67.17.8406.DC1

Cited articles This article cites 27 articles, 8 of which you can access for free at: http://cancerres.aacrjournals.org/content/67/17/8406.full#ref-list-1

Citing articles This article has been cited by 4 HighWire-hosted articles. Access the articles at: http://cancerres.aacrjournals.org/content/67/17/8406.full#related-urls

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://cancerres.aacrjournals.org/content/67/17/8406. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from cancerres.aacrjournals.org on October 1, 2021. © 2007 American Association for Cancer Research.