Oncogene (2011) 30, 1117–1126
© 2011 Macmillan Publishers Limited All rights reserved 0950-9232/11
www.nature.com/onc

ORIGINAL ARTICLE

Genomic characterization of asymptomatic CT-detected lung cancers

E Belloni1,2,9, G Veronesi3,9, C Micucci1,2, S Javan1,2, SP Minardi2,4, E Venturini2,4, P Maisonneuve5, S Volorio1,4, M Riboni1,4, M Bellomi6, P Scanagatta3, G Taliento3, G Pelosi7, S Pece1,2,8, L Spaggiari3,8 and PG Pelicci1,2,8

1Department of Experimental Oncology, European Institute of Oncology, Milan, Italy; 2IFOM-IEO Campus, Milan, Italy; 3European Institute of Oncology, Division of Thoracic Surgery, Milan, Italy; 4Cogentech, Milan, Italy; 5European Institute of Oncology, Division of Epidemiology and Biostatistics, Milan, Italy; 6European Institute of Oncology, Division of Radiology, Milan, Italy; 7European Institute of Oncology, Division of Pathology, Milan, Italy and 8Universita’ degli Studi di Milano, Milan, Italy

Computed tomography (CT) screening of lung cancer allows the detection of early tumors. The objective of our study was to verify whether initial asymptomatic lung cancers, identified by high-resolution low-dose CT (LD-CT) on a high-risk population, show genetic abnormalities that could be indicative of the early events of lung carcinogenesis. We analyzed 78 tumor samples: 21 (pilot population) from heavy smokers with asymptomatic non-screening detected early-stage lung cancers and 57 from 5203 asymptomatic heavy smoker volunteers, who underwent a LD-CT screening study. During surgical resection of the detected tumors, tissue samples were collected and short-term cultures were started for karyotype evaluation. Samples were classified according to the normal (NK) or aneuploid (AK) karyotype. The NK samples were further analyzed by the Affymetrix single-nucleotide polymorphisms (SNPs) technology. Metaphase spreads were obtained in 73.0% of the selected samples: 80.7% showed an AK. A statistically significant correlation was found between presence of vascular invasion and abnormal karyotype. A total of 10 NK samples were suitable for SNPs analysis. Subtle genomic alterations were found in eight tumors, the remaining two showing no evidence to date of chromosomal aberrations anywhere in the genome. Two common regions of amplification were identified at 5p and 8p11. Mutation analysis by direct sequencing was conducted for the K-RAS, TP53 and EGFR genes, confirming data already described for heavy smokers. We show that: (i) the majority of screening-detected tumors are aneuploid; (ii) early-stage tumors tend to harbor a less abnormal karyotype; (iii) whole genome analysis of NK tumors allows for the detection of common regions of copy number variation (such as amplifications at 5p and 8p11), highlighting genes that might be considered candidate markers of early events in lung carcinogenesis.
Oncogene (2011) 30, 1117–1126; doi:10.1038/onc.2010.478; published online 25 October 2010

Keywords: whole genome analysis; early diagnosis; cancer

Correspondence: Dr E Belloni, Department of Experimental Oncology, European Institute of Oncology, Via Adamello 16, Milan 20139, Italy. E-mail: [email protected] or Dr G Veronesi, European Institute of Oncology, Division of Thoracic Surgery, Via Ripamonti 435, Milan 20141, Italy. E-mail: [email protected]
9These authors contributed equally to this work.
Received 23 March 2010; revised 30 August 2010; accepted 12 September 2010; published online 25 October 2010

Introduction

In the last 20 years, the incidence of lung cancer has significantly grown, especially when related to tobacco usage. Of the over 1 million new cases identified worldwide per year, more than 900 000 (about 90%) will result in death. The highest rates are currently observed in Europe and North America. With over 160 000 deaths from lung cancer, this disease now accounts, per annum, for about 30% of all cancer deaths in the United States alone (Jemal et al., 2008), and there has been a 600% increase in the incidence among women, following increased prevalence of smoking, in the last 80 years (Patel et al., 2004).

After smoking cessation, the risk of lung cancer does not drop promptly, with the result that it has become the leading cause of death among current and former smokers. Recently, some success in the controlling of tobacco consumption has been achieved in several developed countries: nevertheless, global consumption of tobacco continues to grow (Enstrom and Heath, 1999; Proctor, 2001; Jemal et al., 2008).

One of the major difficulties in the management of lung cancer is late diagnosis, whereas the lung cancer patients who are cured are typically those with localized disease that can be surgically removed, but is hardly identified because it is asymptomatic (Horner et al., 2009). It stands to reason that the development of early non-invasive screening procedures is needed (Henschke et al., 1999; Diederich et al., 2004; Humphrey et al., 2004; Mulshine and Sullivan, 2005), these offering the unique opportunity to successfully achieve lung cancer early detection and increase survival. Amongst existing screening techniques, low-dose spiral CT (LD-CT) has been proven to be highly sensitive and specific (Veronesi et al., 2007) in detecting the disease at its initial stages (stage IA-IB, characterized by absence of lymph nodes involvement as well as absence of distant metastases).

The Continuous Observation of Smoking Subjects (COSMOS) study recruited 5203 heavy smoker volunteers at the European Institute of Oncology, to be annually screened by LD-CT, and has identified, in 4 years, 160 lung cancer early and/or asymptomatic cases (Veronesi et al., 2007, 2008). This population is exceptionally interesting, because the continuous monitoring of high-risk subjects enhances the possibility of identifying and studying lung cancer in its initial stages. Thus the series of asymptomatic tumors we had detected represented a great opportunity to search for the presence of early genetic events that could plausibly illustrate which specific defects affect the tumoral genome at the early steps of tumor formation.

Conventional cytogenetics or comparative genomic hybridization studies evidenced the presence of several chromosomal alterations in lung cancers, involving the loss or gains of entire chromosome arms, such as the common losses of 3p, 6q, 8p, 9p, 9q, 13q, 17p, 18q, 19p, 21q and 22q, as well as gains of 1p, 1q, 3q, 5p, 7p, 7q, 11q and 12q, detected in non-small cell lung cancers (Mitsuuchi and Testa, 2002). Chromosome numerical alterations (aneuploidies) have also been described. The analysis of over 300 non-small cell lung cancers cases, retrieved through the Mitelman database of chromosome aberrations in cancer (publicly available on-line at the website http://cgap.nci.nih.gov/Chromosomes/Mitelman), clearly shows that chromosome losses dominate over gains. The most common losses detected in the analyzed cases involve loss of chromosome Y, 9 or 14, whereas supernumerary chromosomes 7, 12 or 20 represent the most frequent gains. More recently, high-resolution genome-wide approaches, such as array-comparative genomic hybridization and whole-genome single-nucleotide polymorphisms (SNPs) analysis, have been applied to the study of large collections of lung cancer patients (Tonon et al., 2005; Zhao et al., 2005; Weir et al., 2007), in order to verify alterations also at a submicroscopic level, which could lead to a more complete characterization of the lung cancer genome. In another study, 623 human genes were searched for somatic mutations by sequencing a total of 247 Mb in 188 primary adenocarcinoma (ADK) genomes (Ding et al., 2008). Such an effort resulted in the identification of 26 significantly mutated genes in this specific type of lung cancer. Although these results mostly identified recurrent genomic lesions, there are no studies attempting to define those abnormalities that could specifically typify the early genetic events in lung carcinogenesis.

Notably, the study of aneuploid karyotypes (AKs), particularly when a high degree of abnormalities is present, would be of no help in defining one or few specific alterations and/or genetic defects, marking the initial steps of the tumorigenic processes, which could be instead emphasized by a detailed investigation of macroscopically normal tumoral genomes. We decided to address this issue by using two consecutive strategies, with the aim to (i) highlight the presence of lung cancers with a normal karyotype and (ii) extensively examine such tumors for subtle genetic alterations. Therefore, we first performed a macroscopic analysis, in order to verify the presence of tumor samples carrying evident chromosome alterations (AKs, which we did not study further, because of the high extent of genetic defects identifiable in such genomes), or of tumor samples with a normal numerical content (normal karyotypes (NKs)). Subsequently we performed a high-resolution genomic analysis exclusively on NK samples, in order to investigate for the presence of genomic alterations, possibly indicating early genetic events in lung tumorigenesis.

Our results show that the vast majority (80%) of the screening-detected tumors had an abnormal genome content (AKs); however, a restricted number of cases did carry a macroscopically normal genome (NKs), with subtle defects.

Results

A total of 78 tumor samples were collected from 57 male and 21 female patients. The mean age was 63 years. The mean pack-years was 50.5 (53 for males and 43.7 for females). All patients underwent surgical operation, with radical intent, that was achieved in 76 out of 78 cases (97.4%). We identified 55 ADK (70.5%), 13 squamous-cell carcinomas (16.7%), 3 small-cell lung cancers (3.8%) and 7 other type tumors (9%). Stage IA cases represented more than half of the total (44/78 = 56.4%). More specifically, the tumors mean size was 19.8 mm overall, 15.2 mm among stage IA, 30.5 mm among stage IB, 25.6 mm among stage II, and 18 mm among stages III and IV samples. The mean size of ADKs was 19 mm and the mean size of other types of tumors was 21.6 mm. For detailed data see Supplementary Table 1.

Genomic macroscopic analysis
Metaphase spreads were obtained for 57/78 (73.0%) tumors. Aneuploidy was detected in the vast majority of the samples (46/57 = 80.7%) despite the prevalent early stage of disease. Table 1 reports the data on the clinical parameters characterizing these samples. Statistical analysis showed a significant correlation between AK and vascular invasion. In particular, NK was present in 9/21 (42.9%) samples without vascular invasion and in only 1/20 (5.0%) of tumors with vascular invasion (P<0.01). No significant correlation between tumor karyotypes and age, histology, nodal status, tumor stage, tumor grade, necrosis or clinical status was observed. However, none of the NK samples (0/11 = 0%) showed lymph nodes involvement, which was instead evident in 5/44 (11.4%) AK samples. The proportion of NK samples was higher in stage I tumors (10/47 = 21.3%) compared with stage II–IV tumors (1/10 = 10.0%), and among lesions with maximum diameter <15 mm (8/27 = 29.6%) compared with larger lesions (3/30 = 10.0%), even though the difference was not statistically significant.

Conversely, tumors of any grade could be aneuploid, including G1. In our ADK subgroup, metaphase

spreads were available for 18 ADK with bronchioloalveolar component (BAC-ADK) and 23 non-BAC-ADK samples (pure BACs were not present in our collection). Karyotype evaluation showed that the rate of AK samples was similar in both groups (14/18 = 77.8% and 17/23 = 73.9% for BAC-ADK and non-BAC-ADK respectively).

High-resolution genomic analysis
Among the 57 cases analyzed, 11 NK tumors were detected. One DNA sample had to be excluded, as there was no frozen tumor tissue and the quality of the DNA extracted from the correspondent paraffin-embedded sample did not match the criteria required for the HuSNPs Chip analysis (case #42 in the tables). The remaining 10 samples, along with the matched normal controls, were subjected to SNPs genotyping. Table 2 summarizes the results obtained listing all the abnormalities found as well as the genomic size of each alteration. Two cases showed a great variety of genomic alterations (group I in Table 2), two harbored no abnormalities (group II in Table 2), and six showed a limited number of alterations (not more than 2–3 in the same sample) with both focal and large-scale events (group III in Table 2). The different typologies of alterations identified with CNAG (Copy Number Analyzer for Affymetrix GeneChip Mapping) were validated by means of real-time quantitative PCR (RQ–PCR), in selected regions, as described in Materials and methods. Cases 3 to 8, characterized by a reduced number of alterations, are particularly interesting. Indeed, the low number of genomic defects, concomitantly present in each case, might indicate events marking the early steps of the tumorigenic transformation in each one of these samples. In addition, by comparing the anomalies found amongst all the different samples, we noted that three appeared more than once (bolded in Table 2): these consist of two amplifications (one large-scale and one focal) and one deletion (loss of heterozygosity (LOH), large-scale). We decided to focus on the two common amplifications, involving the chromosomes 5 and 8. The amplification at 5p involved the entire short arm and was detected in 3/12 cases, two in association with several other abnormalities and one together with a second amplification on chromosome 16q. The second common event, instead, involved a smaller genomic region (focal event), of about 1 Mb at 8p11. This defect was found in two cases, either in association with a large number of additional anomalies, or together with the amplification of 22q only. We performed genomic RQ–PCR with 8p11-specific primers on a total of 44 additional lung tumor samples and detected 4/44 cases carrying the amplification. Moreover, RQ–PCR walking on such cases, using suitable primer pairs covering this region (see Figure 1), allowed us to define precisely the extent of the amplified area. As shown in Figure 1, representing the two most significant samples, the minimal amplified region at 8p11 extended for about 1 Mb and included the ADAM18, ADAM2, INDO, INDOL1 and ZMAT4 loci. Interestingly, in 5/44 cases genomic RQ–PCR revealed the presence of 8p11 LOH, involving also the 1 Mb above defined area.

Table 1 Correlation between clinicopathological features and karyotypes of the 57 cases analyzed

Variable                All   Aneuploid    Normal      P-value
Patients                57    46 (80.7)    11 (19.3)

Age
  Mean                  63.1  63.5         61.5        0.40 (a)
  s.d.                  7.0   7.4          4.4

Pack-years(a)
  <40                   20    18 (90.0)    2 (10.0)
  40–60                 21    18 (85.7)    3 (14.3)
  60+                   16    10 (62.5)    6 (37.5)    0.05 (c)

Tumor stage
  Stage I               47    37 (78.7)    10 (21.3)
  Stage II–IV           10    9 (90.0)     1 (10.0)    0.67 (b)

Nodal status
  pN−                   50    39 (78.0)    11 (22.0)
  pN+                   5     5 (100)      0 (0)       0.57 (b)

Tumor grade
  G1                    8     6 (75.0)     2 (25.0)
  G2                    20    17 (85.0)    3 (15.0)
  G3                    26    21 (80.8)    5 (19.2)    0.87 (c)
  Unknown               3

Tumor size (mm)
  ≤15 mm                27    19 (70.4)    8 (29.6)
  >15 mm                30    27 (90.0)    3 (10.0)    0.09 (b)

Necrosis
  Absent                26    18 (69.2)    8 (30.8)
  <50% focal            4     3 (75.0)     1 (25.0)
  <50%                  8     8 (100)      0 (0)
  >50%                  3     2 (66.7)     1 (33.3)    0.28 (c)

Vascular invasion
  Absent                21    12 (57.1)    9 (42.9)
  Present               20    19 (95.0)    1 (5.0)     0.009 (b)

BAC
  No                    23    17 (73.9)    6 (26.1)
  Yes                   18    14 (77.8)    4 (22.2)    1.00 (b)
  Missing               16

BAC pct
  None                  23    17 (73.9)    6 (26.1)
  <50%                  10    9 (90.0)     1 (10.0)
  50–99                 8     5 (62.5)     3 (37.5)    0.47 (c)

Outcome
  ALIVE                 50    40 (80.0)    10 (20.0)
  AWD                   4     3 (75.0)     1 (25.0)
  DOD                   1     1 (100)      0 (0)
  DOC                   2     2 (100)      0 (0)       1.00 (b)

Abbreviations: AWD, alive with disease; DOC, dead of other causes; DOD, dead of disease.
The distribution of AK vs NK genomes is analyzed within the different categories, which characterize the clinical and pathological variables. Percentages are reported in brackets.
(a) Pack-years: number of cigarettes smoked per day × number of years smoked/20.
The Fisher exact test and the Mantel–Haenszel χ²-test for trend were used to assess the association between categorical or ordinal variables and karyotypes. The Student's t-test was used to assess differences in the mean age among patients with aneuploid or normal karyotype.
Footnotes: (a) Continuous variables (age) were compared with use of Student's t-test; (b) categorical variables (tumor stage, nodal status, tumor size, vascular invasion, BAC, outcome) were compared with use of the Fisher's exact test; (c) ordinal variables (pack-years, tumor grade, necrosis, BAC pct) were compared with use of the Mantel–Haenszel test for trend. Statistically significant values are indicated in bold.
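As an illustration of the Fisher exact test named in the Table 1 footnotes, the vascular invasion row can be recomputed from its four counts (absent: 12 AK, 9 NK; present: 19 AK, 1 NK). The sketch below is not the authors' statistical code; the function name `fisher_exact_two_sided` is an assumption introduced here, and the two-sided P-value is taken, by a common convention, as the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper written for illustration only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Counts taken from the vascular invasion rows of Table 1.
p_value = fisher_exact_two_sided(12, 9, 19, 1)
print(f"P = {p_value:.3f}")
```

Under these assumptions the result rounds to the value reported in Table 1 for vascular invasion.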

Table 2 SNPs whole genome and mutation analyses results on the identified NK tumors

Group  Tumor   Tumor  Tumor  Whole genome analysis  Interval        Mutation analysis
       sample  type   stage                                         K-RAS  TP53   EGFR

I      39      ADK    IA     loh2p24.2–24.1         17.9–23.9 M     None   V272M  None
                             loh2p22.2–21           37–44.5 M
                             amp2p22.1–16.3         45.9–49.5 M
                             amp5ptel-q11           Entire p arm
                             amp6p25–23             Tel-14.9 M
                             loh8ptel-p12           Tel-33 M
                             amp8p11                See Figure 1
                             loh9                   Whole chrom
                             amp11q13.2–13.4        68–70 M
                             amp13q14.3–21.32       52.4–68 M
                             amp14q13.1–21.2        33–45.8 M
                             amp17q23.2–23.3        57–58.8 M
                             amp18q11.2             18.3–21.7 M
                             amp19q11-tel           23M-tel
                             amp20                  Whole chrom
                             ampXq25-tel            122M-tel
       65      SCC    IA     amp1p12–21.1           118.9–143 M     None   None   None
                             amp1q25.3–31.1         179.9–183.6 M
                             loh2p25.3–31.1         5.4–15.7 M
                             loh3p                  Whole p arm
                             amp3q11.2–26.2         95.7–171.6 M
                             amp3q26.2-tel          171M-tel
                             loh4                   Whole chrom
                             amp5ptel–q11.2         Tel-54.3 M
                             loh5q11.2-tel          54M-tel
                             amp7p11.2              53.7–56.5 M
                             loh9ptel–21.1          Tel-28.4 M
                             upd15q15.1-tel         40.4M-tel
                             loh16q11-tel           31M-tel
                             upd17                  Whole chrom
                             loh21                  Whole chrom
                             amp21ptel–q21.1        Tel-20.8 M
II     58      ADK    IA     Normal                 -               None   None   None
       77      LCC    IA     Normal                 -               None   None   None
III    15      SCC    IA     amp8p11                39.7–40.6 M     None   None   None
                             amp22q                 Whole q arm
       24      ADK    IB     amp12p11.1–12.3        16–34 M         G13C   None   None
                             amp18p                 Whole p arm
                             ampXp11.3–11.23        44–48.5 M
       30      ADK    IA     amp1q21.1–23.3         143–162 M       G12C   None   None
       48      CAR    IV     -X                     Whole chrom     None   None   None
       53      SCC    IA     amp5p15.32-cent        5M-cent         None   None   None
                             amp16q12.1–21          47.8–61.9 M
       73      ADK    IB     amp3q26.1–26.2         165–172 M       G12C   None   None
                             loh8ptel-p11           Tel-47 M

Abbreviations: ADK, adenocarcinoma; CAR, carcinoid; LCC, large cell carcinoma; SCC, squamous cell carcinoma. Common regions of alteration among different patients are bolded. NB: Sample #42 (ADK IA), harbouring the TP53 mutation P151S, was not included in this table, because the available material did not match the criteria required for the whole-genome analysis. SNPs analysis was performed according to the standard Affymetrix protocol. Briefly, genomic DNA was quantified by NanoDrop spectrophotometer and run on agarose gel to exclude sample degradation. 250 ng of genomic DNA was digested with either XbaI or HindIII restriction enzymes, depending on the GeneChip Mapping array to be used. Adapter sequences were ligated to digested DNA fragments and the ligated fragments were amplified by PCR under limiting conditions. PCR parameters were designed to favor the amplification of DNA fragments 250–1000 bp in size, achieving an up to 10-fold reduction in genome complexity. PCR products were then purified, fragmented and terminally labeled with biotin. The hybridization cocktail was then applied to the specific GeneChip Mapping array and hybridization took place in a rotisserie oven for 16 h at 48 °C and 60 r.p.m. speed. Mapping arrays were washed and stained using the Fluidic Station 450 and scanned using the GeneChip scanner 3000 7G according to manufacturer's specifications. For the sequence analysis, each forward and reverse primer was designed in such a way to include also a 5′ universal tail, consisting of a PE-21 and a M13rev sequence, respectively. This strategy allows sequencing of all the different PCR fragments with only two sequencing primers. Primers were also designed with a similar Tm, so that all the different regions could be amplified at the same time in isothermal conditions. Following the PCR reactions, products were checked on 1% agarose gel, quantified and purified, by removing free PCR primers and dNTPs. An appropriate amount of each amplicon was used in the sequencing reactions. Each region was covered 2×, using both PE-21 and M13rev primers. The sequencing reactions were set up using BigDye v3.1 chemistry from Applied Biosystems (Sanger's method). Purification from unincorporated terminators was performed before loading the reactions.
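The three-group split of the NK samples described in the Results (group I, many alterations; group II, none; group III, a limited number) can be sketched as a simple counting rule. This is an illustrative reading rather than the authors' procedure: the per-sample counts below were tallied from the rows of Table 2, and the cut-off `limit=3` is an assumption taken from the phrase "not more than 2–3 in the same sample".

```python
# Number of alterations per NK sample, tallied from the rows of Table 2.
alterations_per_sample = {
    39: 16, 65: 16,                            # many alterations
    58: 0, 77: 0,                              # no alterations detected
    15: 2, 24: 3, 30: 1, 48: 1, 53: 2, 73: 2,  # few alterations
}


def assign_group(n_alterations, limit=3):
    """Return the Table 2 group label for a sample.

    `limit` is an assumed cut-off ("not more than 2-3"), not a value
    stated as a formal threshold in the article.
    """
    if n_alterations == 0:
        return "II"
    if n_alterations <= limit:
        return "III"
    return "I"


groups = {"I": [], "II": [], "III": []}
for sample, n_alterations in alterations_per_sample.items():
    groups[assign_group(n_alterations)].append(sample)

for name in ("I", "II", "III"):
    print(name, sorted(groups[name]))
```

With this rule the samples fall into the same groups as listed in Table 2: two in group I, two in group II and six in group III.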

We also extended our analysis by searching for the presence of sequence variations in genes that have been found frequently mutated in lung cancer, namely K-RAS, TP53 and EGFR. The full coding sequence, complete of intron–exon junctions, of the K-RAS and TP53 genes, was analyzed, whereas only exons 18–21 of EGFR were


[Figure 1: map of chromosome 8p11 from 38 to 40.5 Mb, with panels (a) genomic position, (b) patients, (c) Q-PCR primers 1–8 and (d) genes (ADAM18, ADAM2, ADAM32, ADAM9, FGFR1, HTRA4, INDO, INDOL1, PLEKHA2, TACC1, TM2D2, ZMAT4); the minimal amplified region is marked.]

Figure 1 Study of the genomic amplification at 8p11. Results of the RQ–PCR mapping are shown. (a) Genomic position within 8p11 is given in Mb. (b) Horizontal bars represent the two most significant patients, showing 8p11 genomic amplification and defining the minimal amplified region. Dashed lines indicate the boundary between the last amplified and the first non-amplified primer pair. (c) RQ–PCR primer pairs, used to assess the extent of the genomic amplification in each patient, are reported and correspond to the following genomic regions at 8p11: 1 = 38; 2 = 38.4; 3 = 38.53; 4 = 39.5; 5 = 39.7; 6 = 39.8; 7 = 40.4; 8 = 40.6 Mb. (d) Genes located in this region are reported, according to the 8p11 physical map available at the UCSC web site (http://genome.ucsc.edu/), and represented by horizontal gray bars. 5′–3′ gene orientation is indicated by the black arrows. Gene names are reported underneath.

taken into consideration, given the fact that most of the lung cancer associated mutations described to date lie within this portion of the gene. We decided to perform this part of the study on all the NK cases (11 in total), and on 20 aneuploid tumors (chosen as representative of our population). The results are summarized in Tables 2 and 3 and reported in Supplementary Tables 2, 3 and 4.

Table 3 Summary of mutation frequencies found in the three analyzed genes

Gene    % mutation NK   % mutation AK   % mutation total
K-RAS   27 (3/11)       40 (8/20)       35.5 (11/31)
TP53    18 (2/11)       40 (8/20)       33.3 (10/30)
EGFR    -               -               -
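The percentages in Table 3 follow directly from the mutated/analyzed counts given in brackets. The short sketch below simply converts those counts, as printed in the table, into percentages; the dictionary layout and the helper name `percent` are assumptions made for illustration.

```python
# (mutated, analyzed) counts exactly as printed in Table 3.
table3_counts = {
    "K-RAS": {"NK": (3, 11), "AK": (8, 20), "total": (11, 31)},
    "TP53": {"NK": (2, 11), "AK": (8, 20), "total": (10, 30)},
}


def percent(mutated, analyzed):
    """Mutation frequency as a percentage of the analyzed samples."""
    return 100 * mutated / analyzed


for gene, columns in table3_counts.items():
    cells = ", ".join(
        f"{column}: {percent(*counts):.1f}% ({counts[0]}/{counts[1]})"
        for column, counts in columns.items()
    )
    print(f"{gene}: {cells}")
```

The NK and AK columns of Table 3 are these values rounded to whole numbers, and the total column is rounded to one decimal place.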

Discussion

The presence of genomic alterations in tumor DNA is a recognized hallmark of human cancers. Although it is accepted that other events, for example, those epigenetically acting on and affecting either chromatin structure or transcription, may be active players in tumor growth (Shames et al., 2006; Esteller, 2008; Ebi et al., 2009), the complete characterization of each cancer genome remains a key step towards a definitive understanding of the tumorigenic process. At the same time, the identification of those defects defining the early steps of cancer development will result in a better knowledge of the very initial molecular alterations triggering cancer onset. One major effect of such an accomplishment would be the possibility of setting up new diagnostic tools, which could allow detection of the disease even in the asymptomatic stages. Prevention and treatment will be certainly improved by a better understanding of the molecular origins and evolution of the disease. This is of particular relevance for lung cancer, characterized by a very low survival rate, because of late diagnosis and lack of successful criteria for early disease detection. Therefore, the COSMOS screening, which is based on the continuous observation of a high-risk population, represented an extraordinary resource for the analysis of the genomic status of asymptomatic lung cancers at various stages of progression (early to more advanced stages). Importantly, other studies have already analyzed lung tumors in terms of copy number alterations (Tonon et al., 2005; Zhao et al., 2005; Weir et al., 2007; Beroukhim et al., 2010), in order to assess the presence of genetic defects (genomic amplifications and/or deletions). Nevertheless, these works did not take into consideration the possibility of distinguishing between highly rearranged (AK) and apparently normal genomes (NK). This is an important distinction, because it can provide an indication of the putative genetic events marking the early steps of lung tumorigenesis. Therefore, we firstly focused on the assessment of the genomic status, by analyzing metaphase spreads from the surgically resected tumor tissue samples (macroscopic analysis). We found that the vast majority of the samples, despite being frequently in the early stages, carried an aneuploid karyotype, indicating that a high degree of genomic abnormalities can be evidenced even in screening-detected asymptomatic tumors. Our data did not evidence a specific and statistically significant association between presence of aneuploidy and several of the clinical and pathological parameters analyzed.

A major criticism against high-risk population screening-based protocols for lung cancer detection is an overdiagnosis bias, according to which CT-detected cancers include a significant percentage of relatively slow-growing, indolent cancers, whose apparent longer survival is likely due to the non-aggressive nature of such cancers, more than to a real benefit of the screening itself. However, as CT-detected cancers can carry a high degree of aneuploidy, commonly evidencing the presence of genomic instability, which is a distinctive tumor hallmark, we think that our results, although they need to be confirmed in a larger cohort, contribute to support the validity of a screening-based approach. In reality, our analysis reveals that the vast majority of CT-detected lesions carry abnormal genomes, providing evidence against the risk of an overdiagnosis bias due to the high sensitivity of the imaging techniques used in the screening program.

Additional data come in support of this conclusion. Aneuploidy was detected in the majority of grade 1 tumors, which are considered less aggressive. Moreover, the proportion of AKs in tumors characterized by a BAC component (usually a sign of less aggressive behavior) resembles that identified in tumors not showing this feature, indicating that also less aggressive tumors might harbor a very unstable genome. In this sense, our data confirm recently reported conclusions underlining that screening-detected stage I lung cancers share most of the pathological features (Pelosi et al., 2008) and gene expression profiles (Bianchi et al., 2004) of fully malignant tumors.

At the same time, we found a statistically significant correlation between AK and vascular invasion, thus a normal karyotype may indicate a lower degree of metastatic potentiality. Interestingly, absence of macroscopic genomic alterations was more common, even though not statistically significant, among stage I and smaller (<15 mm) tumors, as well as those not showing necrosis. In addition, none of the lymph node-positive tumors carried an NK. Nevertheless, very small lesions can also present a completely abnormal genome. Indeed, our results confirm that among LD-CT-detected tumors, even at early stages, an abnormal chromosome content is a common feature, further indicating the importance of distinguishing between AK and NK tumors. Overall, our screening approach can greatly enhance the possibility to identify lung cancers with a macroscopically normal genome, as demonstrated by the 23.3% (10/43) NK tumors detected within the COSMOS study, versus the 7.1% (1/14) among the pilot non-screening collection. The difference between AK and NK lung tumors, if confirmed on larger cohorts, might translate in evaluations on the tumor prognosis. However, at present, it is not possible to correlate karyotypes to prognostic outcomes because of both the limited follow-up data and the small number of lung cancer deaths among our patients. Surprisingly, we could show a correlation at the limit of statistical significance between the number of pack-years (more than 60) and the presence of a NK. The clinical significance of this correlation, if any, is not clear and needs further confirmation.

The identification of early stage lung tumor samples without macroscopic genomic alterations (NK samples) prompted us to conduct a whole-genome analysis on such samples, in order to search for the presence of subtle genomic alterations, which could be a sign of early stages of lung tumorigenesis (microscopic analysis). A total of 11 NK tumor samples were found in our population, and 10 were eligible for whole-genome analysis, resulting in the genomic characterization summarized in Table 2. We included in our analysis the search for mutations in three genes among those mostly involved in lung cancer (Ding et al., 2008). On the basis of the results of our analysis we separated our NK samples in three groups: (I) two tumor samples with several genomic alterations. The genome of these samples presented a high number of copy-number alterations, in various chromosome regions, and from a genetic point of view can be considered, similarly to AK samples, less relevant, because they do not allow the identification of early events. (II) Two tumor samples with no evidence of subtle genomic alterations (although it is possible that a similar analysis at a higher level of resolution, for example, using the Affymetrix 250K array (Affymetrix, Santa Clara, CA, USA), could identify sub-microscopic genomic abnormalities, eluded by this study). As for point mutations in genes specifically associated to lung cancer (Ding et al., 2008), we could not detect any in the three selected genes we analyzed; nevertheless, we cannot exclude the presence of sequence alterations in other genes, not screened in this study. (III) Six samples with a limited number of genomic abnormalities. These are particularly interesting because they allowed the identification of few defects that may characterize the genome of early-stage lung cancers. Interestingly, five of these six cases are stage I (either A or B) non-small cell lung cancers: only one sample is a multifocal, indolent stage IV carcinoid, harboring the deletion of one chromosome X (−X), whereas the other five cases show copy-number alterations involving specific intra-chromosomal regions. Three of these also carry a K-RAS mutation (G12C in two and G13C in one). This is not surprising, as mutations in this gene have already been described as early events in lung carcinogenesis (Herbst et al., 2008). We can then suppose that amplifications at 1q21.1–23.3, 3q26.1–26.2, 5p15.32-cent, 8p11, 12p11.1–12.3, 16q12.1–21, 18p, 22q, Xp11.3–11.23 and LOH at 8ptel-p11 might all represent early genetic events in non-small cell lung cancers onset. In particular, amplifications at 5p and 8p11, together with LOH at 8p, were detected in more than one of the analyzed cases. Considering the known difficulties in assessing the presence of LOH, due to the unavoidable presence of genomic DNA from normal cells in tumor tissue samples, we concentrated our analysis on the regions of amplification. We identified three large scale and six focal events. Among them, those affecting 1q21.2–23.3, 5p15.32-cent, 8p11 and 12p11.1–12.3 have already been previously described (Tonon et al., 2005; Zhao et al., 2005; Weir et al., 2007; Beroukhim et al., 2010): the remaining five have never been involved in lung cancer whole-genome studies. Interestingly, as already reported, the amplifications of 1q21.2–23.3 and 12p11.1–12.3 encompass the ARNT and K-RAS genes, respectively. Although the role of KRAS in lung cancer has been largely known and described (reviewed in Herbst et al., 2008), ARNT (aryl hydrocarbon nuclear translocator) is known to interact with AHR (aryl hydrocarbon receptor), giving rise to a transcriptional regulatory heterodimer implicated in the carcinogenic action of cigarette smoke (Puppala et al., 2007). Recently, it was reported to be involved, albeit indirectly through the aryl hydrocarbon receptor repressor

(AHRR), in multiple human cancers, including lung (Zudaire et al., 2008), and to be associated with acute myeloid leukemia through the TEL-ARNT fusion protein (Nguyen-Khac et al., 2006).

As for the two amplified regions of chromosomes 5 and 8, these have been detected in more than one of our NK samples (amp5p in three and amp8p11 in two cases). 5p amplification has already been described in previous studies as a quite common event in lung cancers. More specifically, in the study performed by Weir et al. (2007), who characterized the lung ADK genome in over 500 samples, the copy-number gain of chromosome 5p is described as the most common genomic alteration, found in 60% of all samples. Moreover, the data obtained in that study on focal copy-number variations show that a 1 Mb region at 5p also containing the TERT gene (along with nine additional genes) is found amplified in eight of the studied samples, strongly indicating that a gene important for lung carcinogenesis might reside in this specific region. Although our results are consistent with the definition of this amplification as a common event in lung cancer, they also suggest that this alteration might have a role in the early steps of the tumorigenic process, as it is found in NK lung cancers and, at least in one case, together with only one additional genomic abnormality (16q12.1–21 amplification). Similarly, the 8p11 amplification we detected has been described in previous studies. In the study by Tonon et al. (2005), the amplified area was restricted by genomic RQ–PCR to a region including the WHSC1L1 and LETM2 genes. A second study, by Zhao et al. (2005), defined an amplified area of about 1 Mb, located just telomeric to that previously identified. Our fine mapping results indicate that the common amplicon among our cases partially overlaps that reported by Zhao et al. (2005), starting from its terminal portion and further extending for about 1 Mb. Five genes have been mapped to this area: ADAM18, ADAM2, INDO, INDOL1 and ZMAT4. No data are available on the possible involvement in cancer for ADAM18 and ADAM2, two members of the ADAM (a disintegrin and metalloprotease domain) family, as well as ZMAT4 (a gene encoding a zinc-finger protein). More is known about INDO and INDOL1, which encode two proteins involved in tryptophan catabolism. Several studies, especially regarding INDO, have demonstrated their role in inducing immune tolerance, leading to the hypothesis that these proteins might trigger a mechanism of immune tolerance against tumors (Uyttenhove et al., 2003; Metz et al., 2007). INDO expression has been described in several tumors, and studies in animal models showed that tumors expressing INDO resist immune rejection and expand, leading to death (Uyttenhove et al., 2003). Besides, the growth of subcutaneously induced tumors in mice was significantly slowed down, although not prevented, by treatment with 1-methyl-L-tryptophan, a competitive inhibitor of INDO (Uyttenhove et al., 2003). Further studies are needed to investigate a possible involvement of these genes in lung carcinogenesis. Nevertheless, our data account for the hypothesis that this common genomic alteration too, although less frequent than the 5p amplification, might represent an important event among those marking the early steps of lung tumor development.

Concerning the mutation analysis we performed, assaying the K-RAS, TP53 and EGFR genes, our results are in agreement with what has already been reported (reviewed in Herbst et al., 2008): (i) K-RAS mutations are more frequently found in lung cancers arising in smokers, and all of our patients are heavy current or former smokers; (ii) none of the patients we analyzed, either NK or AK, carry EGFR mutations, which, in fact, are more frequently found in non-smokers. As a matter of fact, EGFR mutations represent the first specific genetic alteration associated with never-smokers, and increasing smoke exposure negatively correlates with sequence variations within this gene (Sun et al., 2007); (iii) we could detect TP53 mutations, one of the most common changes in human tumor cells, commonly seen in both smokers and never-smokers.

Interestingly, the frequency of sequence abnormalities in the three genes we examined is higher in AK than NK samples (Table 3), as one would expect in a more unstable genome and as also described by Ding et al. (2008), showing that gene mutations recurred more frequently in association with copy-number variations in the genomes they analyzed.

Recently, Beroukhim et al. (2010) conducted the largest analysis of high-resolution copy-number evaluation in 26 different cancer types (over 3000 cancer specimens). This study identified 357 regions of focal copy-number alterations (either amplifications or deletions), including some events common to several cancer types (158) and others specific to individual cancer subgroups (199). An emerging challenge for the scientific community will be the definition of a catalog of cancer-related copy-number alterations, both common to all major cancer types and typical of specific categories. The final expected result will be that of identifying specific cancer-related genes, which could become targets for therapeutic interventions. This work already provided two examples with the MCL1 and BCL2L1 genes, which appear to be amplification targets. One requisite to fulfill this goal resides in the availability of data collected on the highest possible number of cancer samples. Similar studies and others specifically related to single tumor types, such as the one we are presenting, can contribute to the gathering of all the information required to associate specific genetic events to the onset of cancer (common as well as type-specific defects).

In summary, in our work, we highlight the importance of studying a screened high-risk population to investigate if tumors with a normal chromosome complement might harbor a limited number of subtle genetic defects. Such events, when arising in tumors at early stages, can provide new indications on the putative initial genetic modifications, which mark the early steps of tumor formation in the lung tissue, and thus offer new opportunities for research directions and improved diagnosis and prognosis.
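As an illustration of the kind of 2x2 comparison discussed above (10/43 NK tumors in the COSMOS series versus 1/14 in the pilot collection), the following is a minimal sketch of a two-sided Fisher exact test, the test named in the statistical methods. It is not the authors' SAS analysis: the function name is hypothetical, the two-sided rule (summing all tables no more probable than the observed one) is an assumption about the convention used, and the AK counts (33 and 13) are simply derived from the quoted fractions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the hypergeometric probabilities of all tables
    with the same margins that are no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows: screening-detected (COSMOS) and pilot tumors; columns: NK and AK
p_nk_frequency = fisher_exact_two_sided(10, 33, 1, 13)
```

With groups this small, any such p-value should be read with caution, in line with the authors' remark that the karyotype differences need confirmation on larger cohorts.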

Materials and methods

Patients
A total of 5203 asymptomatic volunteers, heavy smokers or former smokers with a smoking history of more than 20 pack-years and age ≥50 years, were enrolled in the COSMOS study (a spiral low-dose CT scan screening approach) started at the European Institute of Oncology in 2004 (Veronesi et al., 2007, 2008). At the end of the second year, among 89 screening-identified patients with lung cancers, 57 cases with clinically localized lung malignancy were evaluated for the present analysis. These, together with 21 other asymptomatic patients (pilot population), carrying non-screening-detected lung cancers, with characteristics similar to those of the screened population (asymptomatic, smokers, older than 50 years, with an incidentally detected localized cancer), constituted our study population (78 cases). We excluded cases that underwent preoperative or definitive chemotherapy or with no adequate tissue availability for the genomic analysis.

Lung tumor samples
At the time of surgical resection, a portion of the tumor tissue was collected, snap-frozen and stored at −80 °C for subsequent evaluation and DNA extraction. Tumor cellularity was determined by analyzing hematoxylin–eosin stained sections of the frozen samples. The evaluation was carried out by a pathologist of the Molecular Pathology Unit, at the IFOM-IEO Campus. Only NK samples with a tumor cellularity of at least 70% were used for the high-resolution genomic analysis: this allowed us to limit contamination of the tumor genomes by normal cells. DNA extraction was performed on 10–15 sections of 4 µm, cut from the frozen tumor samples, using the GenomicPrep Cells and Tissue DNA isolation kit (GE-Healthcare, Chalfont St Giles, UK). Patients’ and tumors’ characteristics are listed in Supplementary Table 1.

Normal controls
Peripheral blood samples from each patient were collected before surgical resection, as a source of tumor-matched normal DNA, obtained according to standard methodologies (Miller et al., 1988).

Ploidy evaluation
Following diagnosis, during surgical resection, fresh tumor tissue samples were collected to start short-term cultures. Samples were mechanically dissociated and cultures were started for 24, 48 or 72 h (depending on the sample size) in Ham’s F10 medium with 10% fetal bovine serum, NA, Pen/Strep and Glutamine. Karyomax Colcemid Solution (Invitrogen, Carlsbad, CA, USA) was then added and left for 2 h at 37 °C. Cells were centrifuged and resuspended in 0.075 M KCl for 30 min at 37 °C and then fixed with a 3:1 Methanol:Acetic Acid solution. The resulting nuclei suspension was used to prepare slides for metaphase spreads evaluation. By simply counting metaphase chromosomes, diploidy (46 chromosomes) or aneuploidy (more or less than 46 chromosomes) was established for each sample. Importantly, as the analyzed metaphase spreads could also belong to normal cells, always present in the tumor tissue samples, although in variable amounts, we examined at least 10 normal metaphase spreads for each given genome, before classifying it as normal.

SNPs array experiments
In all those tumors showing a normal genomic content, SNPs genotyping was performed with the Affymetrix GeneChip Human Mapping 100K array set (Mapping 50K Hind 240 Array or Mapping 50K Xba 240 Array) according to the manufacturer’s instructions. For details, see the legend to Table 2. The tumor DNA content was compared with that of a normal control (see below). A coverage of one SNP every 60 kb is guaranteed, allowing the identification of genomic alterations not detectable by conventional cytogenetic analysis. Array data have been posted at the ArrayExpress Archive database and are publicly available (www.ebi.ac.uk/arrayexpress; ArrayExpress accession: E-MEXP-2450).

SNPs data analysis
Data were analyzed with the CNAGv3.0 software package (Nannya et al., 2005), which implements an algorithm for copy number detection and is freely available to academic users. Paired analysis was carried out for each tumor-normal control matched pair, in order to identify all the possible abnormalities that CNAG is able to define: LOH, amplification and uniparental disomy. Unpaired analysis was also performed for each tumor sample, in order to avoid missing possible germline DNA alterations (Reference Genomic DNA: Affymetrix internal normal genomic control, 103 Human genomic DNA control).

RQ–PCR
SYBR Green. SNPs results were validated by RQ–PCR on the tumor versus normal genomic DNA of each sample. RQ–PCR was performed using the SYBR Green technology on an ABI PRISM 7700 sequence detection system (PE Applied Biosystems, Foster City, CA, USA). PCR reactions were prepared in a final volume of 25 µl (1× SYBR Green PCR master mix; 5 ng DNA). Results were normalized to a control genomic sequence, corresponding either to the MYB or the HOXa9 genes. The ratio of each SNP genomic region to the MYB or HOXa9 amplicons (test/normalizer) for the studied cases was also normalized to the same ratio concomitantly measured in a total human DNA control sample (Total Human DNA, Roche Applied Science, Indianapolis, IN, USA), as a calibrator. This same approach was used for RQ–PCR walking experiments, defining the focal region of amplification at 8p11 (Figure 1). Sequences of primer pairs are available upon request.

TaqMan. Real-time PCR to determine the 8p11 amplification frequency was carried out on the ABI/Prism 7900 HT Sequence Detector System (Applied Biosystems), using a pre-PCR step of 10 min at 95 °C, followed by 40 cycles of 15 s at 95 °C and 60 s at 60 °C. DNA (5 ng) was amplified (in triplicate) in a reaction volume of 15 µl containing the following reagents: 7.5 µl of TaqMan PCR Mastermix 2× No UNG (Applied Biosystems), 0.75 µl of TaqMan Gene expression assay 20× (Applied Biosystems). Samples were amplified with gene-specific primers and normalized to a control genomic sequence, corresponding to the HOXa9 gene (assay sequences available upon request). Our analysis was extended to a total of 44 cases, including advanced lung tumor samples, not identified through the COSMOS screening.

Direct sequencing analysis
The regions of interest were amplified using exon-specific primers, covering the coding sequence from the start to the stop codon (except for the EGFR gene). Each primer pair was designed on intronic regions, in order to include the splicing junctions. For details, see the caption to Table 2. The sequencing reactions were loaded into the capillaries of the 3730xl sequencer (Applied Biosystems) and analyzed with the

Mutation Surveyor v3.24 software (from SoftGenetics, State College, PA, USA).

Statistical analysis
The Fisher exact test and the Mantel–Haenszel χ²-test for trend were used to assess the association between categorical or ordinal variables and karyotypes. The Student’s t-test was used to assess differences in the mean age among patients with aneuploid or NK. Analyses were performed with the SAS software version 8.02 (Cary, NC, USA). All P-values were two-sided. See the caption to Table 1 for additional information.

Conflict of interest
The authors declare no conflict of interest.

Acknowledgements
This study was supported by AIRC (Associazione Italiana per la Ricerca sul Cancro), FUV (Fondazione Umberto Veronesi) and Eredita’ Benilde Viotti. We would like to thank Daniela Brambilla, Raffaella Bertolotti and Giovanna Ciambrone for data managing; Antonio De Vito and Loris Bernard for technical assistance; the Real Time PCR Service at the IFOM-IEO campus for TaqMan assays; Marco Bianchi, Giovanna Maria Jodice, Paolo Nuciforo and the Molecular Pathology Unit at IFOM for samples characterization; Elvira Gerbino for samples preparation; the staff of the IEO Thoracic Surgery and Radiology Departments for clinical assistance; Paola Dalton for manuscript editing; Francesca Toffalorio for helpful discussion.
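The two-step normalization described for the RQ–PCR validation in Materials and methods (test/normalizer ratio in the tumor sample, then division by the same ratio in the calibrator DNA) can be sketched as follows. The text does not give the formula used, so this assumes the common comparative threshold-cycle (Ct) calculation with a fixed amplification efficiency of 2 per cycle; the function name, parameter names and Ct values are hypothetical and serve only as an illustration.

```python
def relative_copy_number(ct_test, ct_normalizer,
                         ct_test_calibrator, ct_normalizer_calibrator,
                         efficiency=2.0):
    """Copy number of a test region relative to a calibrator DNA.

    Assumption: comparative Ct calculation, with the same amplification
    efficiency for the test amplicon and the normalizer amplicon
    (for example MYB or HOXa9).
    """
    # Step 1: test versus normalizer within each DNA sample
    delta_sample = ct_test - ct_normalizer
    delta_calibrator = ct_test_calibrator - ct_normalizer_calibrator
    # Step 2: sample ratio versus calibrator ratio
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the test region crosses threshold one cycle
# earlier in the tumor than in the calibrator, relative to the normalizer
ratio = relative_copy_number(24.0, 25.0, 25.0, 25.0)
```

Under these assumptions, a ratio near 1 corresponds to the calibrator copy number and larger values suggest a gain; real assays would also need replicate averaging and a measured efficiency for each amplicon.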

References

Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature 463: 899–905.
Bianchi F, Hu J, Pelosi G, Cirincione R, Ferguson M, Ratcliffe C et al. (2004). Lung cancers detected by screening with spiral computed tomography have a malignant phenotype when analyzed by cDNA microarray. Clin Cancer Res 10: 6023–6028.
Diederich S, Thomas M, Semik M, Thomas M, Lenzen H, Roos N et al. (2004). Screening for early lung cancer with low dose spiral computed tomography: results of annual follow-up examinations in asymptomatic smokers. Eur Radiol 14: 691–702.
Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K et al. (2008). Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455: 1069–1075.
Ebi H, Sato T, Sugito N, Hosono Y, Yatabe Y, Matsuyama Y et al. (2009). Counterbalance between RB inactivation and miR-17–92 overexpression in reactive oxygen species and DNA damage induction in lung cancers. Oncogene 28: 3371–3379.
Enstrom JE, Heath Jr CW. (1999). Smoking cessation and mortality trends among 118 000 Californians, 1960–1997. Epidemiology 10: 500–512.
Esteller M. (2008). Epigenetics in cancer. N Engl J Med 358: 1148–1159.
Henschke CI, McCauley DI, Yankelevitz DF, Naidich DP, McGuinness G, Miettinen OS et al. (1999). Early lung cancer action project: overall design and findings from baseline screening. Lancet 354: 99–105.
Herbst RS, Heymach JV, Lippman SM. (2008). Lung cancer. N Engl J Med 359: 1367–1380.
Horner MJ, Ries LAG, Krapcho M, Neyman N, Aminou R, Howlader N et al. (2009). SEER Cancer Statistics Review, 1975–2005. National Cancer Institute: Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/, based on November 2008 SEER data submission, posted to the SEER web site, 2009.
Humphrey LL, Teutsch S, Johnson M. (2004). Lung cancer screening with sputum cytologic examination, chest radiography, and computed tomography: an update for the U.S. Preventive Task Force. Ann Intern Med 140: 740–753.
Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. (2008). Cancer statistics. CA Cancer J Clin 58: 71–96.
Metz R, Duhadaway JB, Kamasani U, Laury-Kleintop L, Muller AJ, Prendergast GC. (2007). Novel tryptophan catabolic IDO2 is the preferred biochemical target of the antitumor indoleamine 2,3-dioxygenase inhibitory compound D-1-methyl-tryptophan. Cancer Res 67: 7082–7087.
Miller SA, Dykes DD, Polesky HF. (1988). A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res 16: 1215.
Mitsuuchi Y, Testa JR. (2002). Cytogenetics and molecular genetics of lung cancer. Am J Med Genet 115: 183–188.
Mulshine JL, Sullivan DC. (2005). Clinical practice: lung cancer screening. N Engl J Med 352: 2714–2720.
Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A et al. (2005). A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphisms genotyping arrays. Cancer Res 65: 6071–6079.
Nguyen-Khac F, Della Valle V, Lopez RG, Ravet E, Mauchauffé M, Friedman AD et al. (2006). Functional analyses of the TEL-ARNT fusion protein underscores a role for oxygen tension in hematopoietic cellular differentiation. Oncogene 25: 4840–4847.
Patel JD, Bach PB, Kris MG. (2004). Lung cancer in US women: a contemporary epidemic. JAMA 291: 1763–1768.
Pelosi G, Sonzogni A, Veronesi G, De Camilli E, Maisonneuve P, Spaggiari L et al. (2008). Pathologic and molecular features of screening low-dose computed tomography (LDCT)-detected lung cancer: a baseline and 2-year repeat study. Lung Cancer 62: 202–214.
Proctor RN. (2001). Tobacco and the global lung cancer epidemic. Nat Rev Cancer 1: 82–86.
Puppala D, Gairola CG, Swanson HI. (2007). Identification of kaempferol as an inhibitor of cigarette smoke-induced activation of the aryl hydrocarbon receptor and cell transformation. Carcinogenesis 28: 639–647.
Shames DS, Girard L, Gao B, Sato M, Lewis CM, Shivapurkar N et al. (2006). A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Med 3: 2244–2263.
Sun S, Schiller JH, Gazdar AF. (2007). Lung cancer in never smokers—a different disease. Nat Rev Cancer 7: 778–790.
Tonon G, Wong KK, Maulik G, Brennan C, Feng B, Zhang Y et al. (2005). High-resolution genomic profiles of human lung cancer. Proc Natl Acad Sci USA 102: 9625–9630.
Uyttenhove C, Pilotte L, Theate I, Stroobant V, Colau D, Parmentier N et al. (2003). Evidence for a tumoral immune resistance mechanism based on tryptophan degradation by indoleamine 2,3-dioxygenase. Nat Med 9: 1269–1274.
Veronesi G, Bellomi M, Veronesi U, Paganelli G, Maisonneuve P, Scanagatta P et al. (2007). Role of positron emission tomography scanning in the management of lung nodules detected at baseline computed tomography screening. Ann Thorac Surg 84: 959–965.
Veronesi G, Bellomi M, Mulshine JL, Pelosi G, Scanagatta P, Paganelli G et al. (2008). Lung cancer screening with low-dose computed tomography: a non-invasive diagnostic protocol for baseline lung nodules. Lung Cancer 61: 340–349.
Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R et al. (2007). Characterizing the cancer genome in lung adenocarcinoma. Nature 450: 893–898.
Zhao X, Weir BA, LaFramboise T, Lin M, Beroukhim R, Garraway L et al. (2005). Homozygous deletions and chromosome amplifications in human lung carcinomas revealed by single nucleotide polymorphism array analysis. Cancer Res 65: 5561–5570.
Zudaire E, Cuesta N, Murty V, Woodson K, Adams L, Gonzalez N et al. (2008). The aryl hydrocarbon receptor repressor is a putative tumor suppressor gene in multiple human cancers. J Clin Invest 118: 640–650.

Supplementary Information accompanies the paper on the Oncogene website (http://www.nature.com/onc)
