Comprehensive Molecular Profiling of Lung Adenocarcinoma
Total Page:16
File Type:pdf, Size:1020Kb
OPEN ARTICLE doi:10.1038/nature13385 Comprehensive molecular profiling of lung adenocarcinoma The Cancer Genome Atlas Research Network* Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investi- gations of lung adenocarcinoma molecular pathogenesis. Lung cancer is the most common cause of global cancer-related mor- of passenger events per tumour genome11–13. Our efforts focused on com- tality, leading to over a million deaths each year and adenocarcinoma is prehensive, multiplatform analysis of lung adenocarcinoma, with atten- its most common histological type. Smoking is the major cause of lung tion towards pathobiology and clinically actionable events. adenocarcinoma but, as smoking rates decrease, proportionally more cases occur in never-smokers (defined as less than 100 cigarettes in a life- Clinical samples and histopathologic data time). Recently, molecularly targeted therapies have dramatically improved We analysed tumour and matched normalmaterial from230 previously treatment for patients whose tumours harbour somatically activated onco- untreated lung adenocarcinoma patients who provided informed con- genes such as mutant EGFR1 or translocated ALK, RET, or ROS1 (refs 2–4). sent (Supplementary Table 1). All major histologic types of lung ade- Mutant BRAF and ERBB2 (ref. 5) are also investigational targets. How- nocarcinoma were represented: 5% lepidic, 33% acinar, 9% papillary, ever,mostlung adenocarcinomas either lack an identifiable driver onco- 14% micropapillary, 25% solid, 4% invasive mucinous, 0.4% colloid and gene, or harbour mutations in KRAS and are therefore still treated with 8% unclassifiable adenocarcinoma (Supplementary Fig. 1)14. Median conventional chemotherapy. Tumour suppressor gene abnormalities, follow-up was 19 months, and 163 patients were alive at the time of last such as those in TP53 (ref. 6), STK11 (ref. 7), CDKN2A8, KEAP1 (ref. 9), follow-up. Eighty-onepercent of patients reported pastorpresent smok- and SMARCA4 (ref. 10) are also common but are not currently clinically ing. Supplementary Table 2 summarizes demographics. DNA, RNA and actionable. Finally, lung adenocarcinoma shows high rates of somatic protein were extracted from specimens and quality-control assessments mutation and genomic rearrangement, challenging identification of all were performed as described previously15. Supplementary Table 3 sum- but the most frequent driver gene alterations because of a large burden marizes molecular estimates of tumour cellularity16. a b Transversion high Transversion low Figure 1 | Somatic mutations in lung Gender Number of mutations Number of mutations adenocarcinoma. a, Co-mutation plot from whole Smoking status 150 100 50 0 0204060 Male Female NA Ever-smoker Never-smoker TP53 exome sequencing of 230 lung adenocarcinomas. TP53 46 KRAS KRAS 33 EGFR Data from TCGA samples were combined with KEAP1 17 STK11 12 STK11 17 KEAP1 previously published data for statistical analysis. EGFR 14 NF1 NF1 SMARCA4 Co-mutation plot for all samples used in the 11 Frequency (%) RBM10 BRAF 10 SETD2 PIK3CA statistical analysis (n 5 412) can be found in 9 RB1 RBM10 8 U2AF1 Supplementary Fig. 2. Significant genes with a MGA 8 ERBB2 MET 7 corrected P value less than 0.025 were identified ARID1A 7 PIK3CA 7 c using the MutSig2CV algorithm and are ranked SMARCA4 6 Males Females RB1 4 Number of mutations Number of mutations in order of decreasing prevalence. b, c,The CDKN2A 4 204060 0 0 20 40 60 U2AF1 3 differential patterns of mutation between samples RIT1 2 EGFR STK11 classified as transversion high and transversion low Missense Splice site Frameshift Nonsense In-frame indel SMARCA4 samples (b) or male and female patients (c) are 100 RBM10 80 shown for all samples used in the statistical analysis 60 (n 5 412). Stars indicate statistical significance 40 Q < 0.05 Missense Frameshift 20 P < 0.05 Splice site In-frame indel using the Fisher’s exact test (black stars: q , 0.05, Percentage 0 Nonsense Other non-synonymous Transversions Transitions Indels, other grey stars: P , 0.05) and are adjacent to the sample set with the higher percentage of mutated samples. *A list of authors and affiliations appears at the end of the paper. 00 MONTH 2014 | VOL 000 | NATURE | 1 ©2014 Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE Somatically acquired DNA alterations MDM2,KRAS,EGFR,MET,CCNE1,CCND1,TERCand MECOM (Sup- We performed whole-exome sequencing (WES) on tumour and germ- plementary Table 6), as previously described24,8q24nearMYC, and a lineDNA,withameancoverageof97.63 and 95.83, respectively, as per- novel peak containing CCND3 (Supplementary Table 6). The CDKN2A formed previously17. The mean somatic mutation rate across the TCGA locus was the most significant deletion (Supplementary Table 6). Sup- cohort was 8.87 mutations per megabase (Mb) of DNA (range: 0.5–48, plementary Table 7 summarizes molecular and clinical characteristics median: 5.78). The non-synonymous mutation rate was 6.86 per Mb. by sample. Low-pass whole-genome sequencing on a subset (n 5 93) of MutSig2CV18 identified significantly mutated genes among our 230 the samples revealed an average of 36 gene–gene and gene–inter-gene cases along with 182 similarly-sequenced, previously reported lung adenocarcinomas12. Analysis of these 412 tumour/normal pairs high- lighted 18 statistically significant mutated genes (Fig. 1a shows co-mutation a Exon Exon plot of TCGA samples (n 5 230), Supplementary Fig. 2 shows co-mutation 13 20 EML4–ALK plot of all samples used in the statistical analysis (n 5 412) and Sup- 620 plementary Table 4 contains complete MutSig2CV results, which also EML4–ALK appear on the TCGA Data Portal along with many associated data files 620 (https://tcga-data.nci.nih.gov/docs/publications/luad_2014/). TP53 was EML4–ALK commonly mutated (46%). Mutations in KRAS (33%) were mutually 11 12 exclusive with those in EGFR (14%). BRAF was also commonly mutated TRIM33–RET (10%), as were PIK3CA (7%), MET (7%) and the small GTPase gene, RIT1 112 (2%). Mutations in tumour suppressor genes including STK11 (17%), CCDC6–RET 10 34 KEAP1 (17%), NF1 (11%), RB1 (4%) and CDKN2A (4%) were observed. EZR–ROS1 Mutations in chromatin modifying genes SETD2 (9%), ARID1A (7%) and 634 SMARCA4 (6%) and the RNA splicing genes RBM10 (8%) and U2AF1 CD74–ROS1 (3%) were also common. Recurrent mutations in the MGA gene (which 31 35 encodes a Max-interacting protein on the MYC pathway19) occurred in CLTC–ROS1 8% of samples. Loss-of-function (frameshift and nonsense) mutations 14 32–34 inMGA were mutually exclusive withfocal MYC amplification (Fisher’s SLC34A2–ROS1 exact test P 5 0.04), suggesting a hitherto unappreciated potential mech- Portion of original transcripts not in fusion transcript: anism of MYC pathway activation. Coding single nucleotide variants and indel variants were verified by resequencing at a rate of 99% and 100%, Normalized, exonic mRNA expression: Low High b respectively (Supplementary Fig. 3a, Supplementary Table 5). Tumour Normalized RNA-seq read coverage Exon 14 skipping Number of samples TCGA-99-7458 purity was not associated with the presence of false negatives identified 29 None in the validation data (P 5 0.31; Supplementary Fig. 3b). (0% skipping) 199 0 0 0 Past or present smoking associated with cytosine to adenine (C .A) 0 nucleotide transversions as previously described both in individual genes TCGA-75-6205 111 12,13 Intermediate and genome-wide .C. A nucleotide transversion fraction showed (60–80% skipping) 0 1 1 1 two peaks; this fraction correlated with total mutation count (R2 5 0.30) 0 and inversely correlated with cytosine to thymine (C . T)transition fre- TCGA-44-6775 27 quency (R2 5 0.75) (Supplementary Fig. 4). We classified each sample Full (Supplementary Methods) into one of two groups named transversion- (90–100% skipping) 0 5 1 0 0 5 5 * high (TH, n 269), and transversion-low (TL, n 144). The transversion- Y1003 , WT high group was strongly associated with past or present smoking (P ss del ss mut MET 2.2 3 10216), consistent with previous reports13. The transversion-high Y1003 13 14 15 mutations c and transversion-low patient cohorts harboured different gene mutations. Observed splicing across all tumours Whereas KRAS mutations were significantly enriched in the transversion- (total events = 29,867) high cohort (P 5 2.13 10213), EGFR mutations were significantly enriched 26 Associated with U2AF1 S34F mutation * in the transversion-low group (P 5 3.3 3 10 ). PIK3CA and RB1 muta- (total events = 129; q value < 0.05 ) tions were likewise enriched in transversion-low tumours (P , 0.05). 0.0 0.2 0.4 0.6 0.8 1.0 Additionally, the transversion-low tumours were specifically enriched Proportion for in-frame insertions in EGFR and ERBB2 (ref. 5) and for frameshift Cassette exon Coordinate cassette exons Mutually exclusive exon *P < 0.001 indels in RB1 (Fig.