© 2016 Nature America, Inc. All rights reserved. Correspondence should Correspondence be addressed to M.M. ( Medicine, Washington University School of Medicine, St. Louis, Missouri, USA. Pathology, General Massachusetts Hospital, Boston, USA. Massachusetts, California, San Francisco, San Francisco, California, USA. of and Bioinformatics Biology,Computational University of Texas MD Anderson Cancer Center, Houston, Texas, USA. Laboratory, ITMO University, St. Petersburg, Russia. Institute, Boston, USA. Massachusetts, 1 actionable clinically be may that driver cancer new to discover and SqCCs ADCs in lung teome pro and transcriptome, epigenome, genome, the in found changes NTRK1 the in alterations activating against directed are gation and genomic in including genes, kinase in the corresponding alterations activating harboring responses ADCs lung dramatic with to patients of lead subsets (RTKs) kinases tyrosine receptor never-smokers observed in are ADCs lung of 10–15% approximately subtypes, both for factor risk major the is smoking whereas example, For teristics. charac histopathological and clinical shared and unique both have subtypes NSCLC two These SqCCs. lung mostly and ADCs NSCLCs lung can comprise (SCLC). lung cancer lung non-small-cell small-cell and are (NSCLC) cer classes histological major two The in United the in (ref. States alone 2015 occurred from lung cancer world the around cancer from death of cause leading the remains cancer Lung targeted Regarding Lung CREBBP New pairs. carcinogenesis, To Matthew Meyerson Cancer Genome Atlas Research Network Roel G W Verhaak Aruna Ramachandran Marcin Imielinski Pedamallu Sekhar Chandra Joshua D Campbell adenocarcinomas and squamous cell carcinomas Distinct patterns of somatic genome alterations in lung Nature Ge Nature Received 4 September 2015; accepted 12 April 2016; published online 9 May 2016; Cancer Cancer Program, Eli and Edythe L. Broad Institute of Harvard and MIT, Cambridge, USA. Massachusetts,

Recent efforts haveRecent efforts on focused the comprehensively characterizing compare ROS1

significantly ADCs

Recurrent , ,

NTRK2 therapies in 1 (ref. (ref.

. An estimated 221,000 new cases and 158,000 deaths deaths 158,000 and cases new 221,000 estimated An . neoantigens, n

both lacking etics lung

4 , , we

). Other targeted therapies under current investi current under therapies targeted Other ). tumor alterations

ERBB2 3 adenocarcinoma

. Molecularly targeted therapies directed against against directed therapies targeted Molecularly . mutated

for

examined receptor ADVANCE ONLINE PUBLICATION ONLINE ADVANCE 1

6 lung ,

1 1 2 , , Catherine J Wu types.

, and and , , 47% , , 2 2 5 , , , Anton Alexandrov , Xin , HuXin 12

1 genes ADC

, in

2 tyrosine BRAF

, , Eric A Collisson

of

the New lung

6–

3 the and 1

Department Department of Pathology and Immunology, Washington University, St. Louis, Missouri, USA.

included 8 exome ,

2 . Identifying new cancer-related cancer-related new Identifying . kinases

(ADC) amplification SqCCs , , Sachet A Shukla

6 lung

, , Shiyun Ling SqCC kinase–Ras–Raf [email protected]

ADC sequences

and

4 5 were

PPP3CA 1 are , Molecular Molecular Pathology Unit, General Massachusetts Hospital, USA. Charlestown, Massachusetts, 5 , 2 . , , Peter S Hammerman

lung

and largely 8

more Department Department of Medicine, Brigham and Women’s Hospital, Boston, USA. Massachusetts, 10

3 peaks 7

, , , Maxim N Artyomov

53% 4 , , , David J Kwiatkowski squamous

, , Jaegil Kim and DOT1L 6

pathway

, , Rehan Akbani similar EGFR distinct, MET

encompassed

1 of copy , 2

, Guangwu , Guo Guangwu the , , , , , 10

ALK

RET and

to

A A full list of members and affiliations appears at the end of the paper. number cell

alterations lung immunotherapies 2

those ). ). 12

- - - - , , FTSJD1

1 Department Department of Pathology, Harvard Medical School, Boston, USA. Massachusetts, carcinoma , , Jeremiah Wala

SqCC ) ) or R.G. ( mal paired exome sequences (including 274 previously unpublished unpublished previously 274 (including sequences exome paired mal ADC–nor lung 660 studied we alterations, genetic new identify to and SqCC lung and ADC lung of profiles somatic the Tocompare Comparison RESULTS type. cancer each treat to used be can gies strate therapeutic immunological and targeted distinct or common in differences carcinogenesis between the and diseases and elucidate the degree to which similarities the into insights yield may Such analyses performed. been not has SqCC lung and ADC lung in found genes altered recurrently of comparison comprehensive a addition, mutations passenger with genes other than at a lower frequency mutated genes important distinguish to necessary power statistical additional the providing by problem, this overcome help can types a of tumor across and within samples tumor samples type combining deaminases cytidine APOBEC of activity such and as from carcinogens processes aberrant mutagenic inherent tobacco to exposure prolonged from accumulate can that mutations passenger of number large the of because challenging be can genes

MIR21 of doi:10.1038/ng.356 profiles 1

6 other ,

, , Mara Rosenberg

2 in tumors had , , Andrew D Cherniack

3 [email protected] lung

, Robert , Schreiber Robert 1 in (SqCC) 1

mutations of , , squamous 8 2

lung , , Michael S Lawrence , , Angela N Brooks

ADC, 660

may

had of

ADC,

somatically and

lung

1 2 at aid

Department Department of Medical Oncology, Dana-Farber Cancer , 4 RASA1 2

, , Alice H Berger

least

in

carcinomas to in

ADC

MIR205 SOS1

identify treatment

five

7 1 in Department Department of Medicine, University of

, Carrie , Cibulskis Carrie and ). ).

, lung

altered

predicted VAV1 3

in , , Ramaswamy Govindan

484 new

1 than

1 lung SqCC,

, for , 2 2 , , , Gad Getz

, Bradley , A Bradley Murray

lung

1

RASA1 drivers genes

, , John N Weinstein both

to SqCC, 1

4 neoepitopes. ,

Computer Computer Technologies 9

2 and

alterations . Profiling larger numbers numbers larger Profiling . SqCC ,

subtypes. ,

of

KLF5

and and

lung tumor–normal 1 s e l c i t r A 1

, ,

ARHGAP35 9 MAPK1 11 ,

, 9 EP300

Department Department of 6 Department Department of in

Department Department

Although

lung

, in

and 1

ADCs. 6 ,

11 2 both. , 1 . ,

0

,

. . In  - -

© 2016 Nature America, Inc. All rights reserved. that the somatic drivers of carcinogenesis may be largely distinct in in distinct largely be may carcinogenesis of drivers somatic the that fragile sites (shown in green in putative are peaks these of five peaks, deletion focal several (FDR (CRC) cancer and colorectal (GBM) in glioblastoma to those similar most were ADC lung in genes mutated significantly the In contrast, HNSC (HPV)-negative papillomavirus human in alterations for genes, lapping overlap; (>25% smoking with associations epidemiological with cancers epithelial other two and head in (BLCA), cancer and bladder altered (HNSC) carcinoma cell squamous neck genes the resembled closely most SqCC lung analyses pan-cancer overlap; overlap; (FDR) (>13% rate discovery types false tumor other from genes mutated sig nificantly of lists with overlap greater had SqCC lung and ADC lung for TCGA from types tumor ( types tumor types ( tumor both in altered as identified were peaks amplification ( tumors SqCC lung in frequency mutation higher a significantly had these, of and, types, tumor both in ARID1A 0.1; < MutSig2CV genes using 20 SqCC lung and in mutated ADC lung significantly as in mutated significantly as genes 38 fied and Methods median Online lower SqCCs; with genes (log excluding expression and After ADCs lung respectively. for SqCCs, mutations/Mb 9.7 and mutations/Mb 8.7 of studies previous TCGA from cases described previously 176 and cases unpublished previously 308 ing et al. Atlas (TCGA) Genome Cancer The from cases described previously 227 and cases significant. not NS, labeled. are cancer lung in role characterized a previously with genes to corresponding points Only labels. green with highlighted are sites fragile putative within located Deletions significant. be to considered ( ( type. tumor respective the against plotted is cohort ADC lung with Genes SqCCs. lung 484 and ADCs lung 660 across 1 Figure s e l c i t r A  SqCC. lung and ADC lung b a P Fig. 1 Fig.

) and deletions ( deletions ) and

< 0.01, Fisher’s exact test; test; exact Fisher’s 0.01, < Lung ADC q value 10 10 10 10 NS 8 –1 –2 –4 –8 ) and 484 lung SqCC–normal paired exome sequences (includ exome sequences paired ) and 484 lung SqCC–normal Supplementary Tables 5 Tables Supplementary q b value < 0.1). Although lung ADC and lung SqCC did share share did SqCC lung and ADC lung Although 0.1). < value

NS ), and 13 of 50 focal deletion peaks were altered in both both in altered were peaks deletion focal 50 of 13 and ), P Distinct somatic alterations in lung ADC and lung SqCC. ( SqCC. lung and ADC lung in alterations somatic Distinct , , CDKN2A = 0.105; = 0.105; SMAD4 CTNNB1 MAP2K1 NRAS RIT1 U2AF1 ME MG SETD2 BRAF ERBB2 APC ARID2 AT RAF1 10 b , M –1 T c A 6 Fig. 1 Fig. ) The GISTIC 2.0 algorithm was used to identify significantly recurrent copy number gains and losses. The The losses. and gains number copy recurrent significantly identify to used was algorithm 2.0 GISTIC ) The together with 159 cases from the cohort in Imielinski in Imielinski from cohort the 159 cases with together TP53 Significantly mutatedgenes 2 (FPKM) <6.16 for lung ADCs and <6.27 for lung lung for <6.27 and ADCs lung for <6.16 (FPKM) c 6 ) in the lung ADC cohort are plotted against the respective respective the against plotted are cohort ADC lung the ) in HRAS , KEAP1 KRAS RBM10 SMARCA4 STK11 EGFR 7 Supplementary Fig. 2 Fig. Supplementary Lung SqCC , we observed median somatic mutation rates rates mutation somatic median observed we , , , 7 10 1 , , PIK3CA c ; ; 1 –2 CDKN2A ). Interestingly, when compared to 19 other other 19 to compared when Interestingly, ). . Recurrently mutated and amplified genes in in genes amplified and mutated Recurrently . upeetr Tbe 1 Tables Supplementary KDM6A Fig. Fig. 1 Supplementary Fig. 2 Fig. Supplementary PIK3CA 1 0 q , the lists of significantly mutated genes genes mutated significantly of lists the , ARID1A q value < 0.1) than with each other (12% (12% other each with than 0.1) < value 10 , and , and v c Fig. 1 Fig. NO –4 alue ). ). Taken together, results these suggest upeetr Fg 1 Fig. Supplementary and and , and and , TCH1 NF1 NF1 a MLL2 CDKN2A 6 ). Likewise, only 11 of 42 focal focal 42 of 11 only Likewise, ). FA 10 TP53 FAT1 PTEN ). Only six genes— six Only ). TP53 NFE2L2 T1 —were significantly mutated mutated significantly —were –8 RB1 ), consistent with previous previous with consistent ), , , are specifically enriched enriched specifically are CDKN2A q value in the lung SqCC cohort. The majority of significantly mutated genes were unique to either either to unique were genes mutated significantly of majority The cohort. SqCC lung the in value b ). Among these over ). Among these Lung ADC q value 10 10 10 10 10 10 10 – NS –16 –32 –64 4 q –1 –2 –4 –8 ). Similarly to to Similarly ). values <0.1 were considered to be significantly mutated. The The mutated. significantly be to considered were <0.1 values , and and , NS , e identi we ), TP53 MECOM–TERC NKX2−1 ME KRAS CDK4 MCL1 1 0 MYCL1 ( PIK3CA 10 19p13.2 T q a –1 value value , , ) The MutSig2CV algorithm MutSig2CV ) The RB1 10 18q11.2 ERBB2 Focal amplifications BCL2L1 –2 1 Lung SqCC 2 MDM2 - - - - 10 . ,

–4 CCNE1 CDK6 Mutational never-smokers displayed the highest fraction of estimated mutations of estimated fraction the highest displayed never-smokers ADCs lung all ( to compared when megabase per mutations SI5 of rates overall higher significantly displayed SqCCs Lung signatures. all from mutations estimated of sum the by signature that for tions muta of for mutations number the a by estimated signature dividing ( were who never-smokers were from patients lung ADCs version-low test; low version ( percent of lung from ADCs never-smokers were as categorized trans Eighty-seven inaccurate. be the may SqCC for lung with statuses never-smokers 18 smoking the that suggesting 0.62), = (AUC SqCC 0.87; = (AUC) curve the under (area ADC lung in better substantially ever-smokers versus never- from those into tumors classify to able was megabase per mutations ( SqCC lung in not but ADC in pattern lung a displayed bimodal megabase mutations per related) each signature in each tumor. The estimated by number of SI4 contributed (smoking- mutations of number the estimates also NMF tures, ( COSMIC signature 5 (SI5) with putative to ‘molecular correlation clock’ properties moderate a T with signature at final a and SI2), changes and (SI13 C>T or C>G of signatures related G at (MMR) changes C>T of mismatch-repair signature a (SI4), transversions C>A of signature T C at changes C>T of signature UV-related a included These base data (COSMIC) Cancer in Mutations Somatic of Catalogue the in signatures defined previously with correlated strongly are which of (NMF) Methods), we identified factorization six mutational signatures in matrix this cohort, many non-negative Using set. sample of our power larger statistical improved the through findings cases smoking and non-smoking with associated signatures have identified genomes of cancer lung studies tumors in observed patterns mutational the to contribute processes cancer-related and Variouscarcinogenic 10 q Supplementary Fig. 5 Supplementary Fig. 2 Fig. P NFE2L2 values in the lung SqCC cohort. Peaks with with Peaks cohort. SqCC lung the in values C –8 < 0.001, Wilcoxon rank-sum test). However, lung ADCs from from ADCs lung However, test). rank-sum Wilcoxon 0.001, < PDGFRA–KIT–KDR TER C C sites signature (COSMIC 7, SI7), abbreviated a smoking-related q 10 13 v Fig. Fig. 2 EGFR –16 T alue WHSC1L1–FGFR1 REL–BCL11A , 1 b 7 10 ). For each tumor, we also derived the fraction of estimated estimated of fraction the derived also we tumor, each For ). MY ( –32 Supplementary Figs. 3 Figs. Supplementary b C and 10 CCND1

signatures ≤ –64 0.696 SI4 mutations/Mb; SI4 mutations/Mb; 0.696 10 SO a 1 Supplementary Fig. 6 Supplementary 0 DVANCE ONLINE PUBLICATION ONLINE DVANCE –128 X2 was used to identify significantly mutated genes genes mutated significantly identify to used was c

). In addition to identifying mutational ). signa mutational In to addition identifying Lung ADC q value

in 10 10 10 10 10 10 10 10 –128

NS –16 –32 –64 lung –1 –2 –4 –8 Fig. 2 Fig. NS Supplementary Fig. 6 Fig. Supplementary C

cancer SMAD4 G sites (SI15/SI6), two APOBEC- two (SI15/SI6), sites G – 5 10 a 6q22.31 and and 21q21.1 ). Furthermore, the rate of SI4 SI4 of rate the Furthermore, ). –1 ). ). However, only 45% of trans q KDM6A P 10 6 value for each in the the in gene each for value WWO = 8.5 × 10 , q 8 –2 Focal deletions Lung SqCC 13 q Supplementary Table 7 Table Supplementary values for amplifications amplifications for values , 1 values <0.25 were were <0.25 values 10 5 , 1 ; here we extend these these extend we here ; X 4 –4 ZMYND11 NF1 Xp22.2 . Previous large-scale large-scale Previous . PTPRD 10

B2M –8 Nature Ge Nature 10 C FA q −37 RB1 PDE4D v 4q22.1 –16 o T or T T1 alue , , Fisher’s exact 10 ) than in lung lung in than ) 13 FO –32 , 1 XP1 PTEN 10 6 CDKN2A CSMD1 (Online (Online LRP1B C –64 n A sites sites A C 10 etics C or or C –128 1 ). ). 8 - - - - -

© 2016 Nature America, Inc. All rights reserved. Comparing the significantly mutated genes to those in other tumor tumor other in those to genes mutated significantly the Comparing New lung. in signature this for etiology potential gene MMR the of levels expression ( data expression with tumors rates of both SSNVs and short indels when compared to all other lung ( (MSI) instability microsatellite with CRCs in observed commonly (SI15/SI6) nature sig an MMR-like exhibited SqCCs) lung three and ADCs lung (four tumors seven another for profiles mutational The lung. the to static meta carcinomas skin cell squamous represent also may signature the skin to the lung had occurred. The other two lung SqCCs with this carcinoma in the forehead, raising the possibility that metastasis from cell basal of history previous a had (TCGA-18-3409) patients these ( tumors lung other the to compared when (DNPs) polymorphisms variants somatic (SSNVs) single-nucleotide of and somatic dinucleotide rate mutation higher significantly a displayed and melanoma in exhibited a pattern of UV-related mutations (SI7) commonly observed ( stage with SI5 activity and total mutation rate also observed moderate associations of tumor 7 Fig. Supplementary ( average on signature this from values. maximum minimum– or range interquartile the times 1.5 demark whiskers Boxplot box). the of (top quartile third and box), the of (bottom quartile first bar), (middle median the shows ** * tests: rank-sum Wilcoxon from levels ( of levels lower as well as ( indels short and SSNVs both of rates higher significantly had tumors These CRCs. MSI in observed commonly signature MMR-like an exhibited tumors seven another ( tumors lung other all to compared when DNPs and SSNVs of rate overall higher a significantly displayed tumors These melanoma. in observed commonly signature a UV-associated from mutations estimated of number a high had ( below). discussed tumors MMR-high and UV-high the (excluding SqCCs lung and ADCs lung both for (CS) smokers current and (SFS), smokers former shorter-term (LFS), smokers former longer-term (NS), never-smokers lifelong across averaged were (bottom) signature each for mutations estimated of fraction the and (top) megabase per signature ( annotated lifelong never-smokers ( clinically for enriched were low transversion as ( high. transversion TV-H, low; TV-L, transversion (red). ADC lung in pattern a bimodal displayed tumor each in megabase per mutations (smoking-related) SI4 ( types. mutation distinct 192 on NMF using identified were signatures mutational Six cancer. lung in 2 Figure Nature Ge Nature types from TCGA Pan-Cancer study c P P P P ) The estimated number of mutations for each each for mutations of number estimated ) The = 0.011). Asterisks indicate significance significance indicate Asterisks = 0.011). ( < 0.01). P < 0.01; 0.01; < < 0.01) but not higher rates of indels ( indels of rates higher not but 0.01) < The mutational profiles of three lung SqCCs (~1% of lung SqCCs) of lung SqCCs) (~1% lung SqCCs of three profiles mutational The < 0.01, *** < 0.01,

significantly

Comparison of mutational signatures signatures mutational of Comparison Supplementary Fig. 8 Fig. Supplementary n e ) The mutational profiles for for profiles mutational ) The etics P a < 0.001. Each boxplot boxplot Each < 0.001. ) The estimated number of number estimated ) The b Fig. 2 Fig.

) Lung ADCs categorized categorized ADCs ) Lung

mutated ADVANCE ONLINE PUBLICATION ONLINE ADVANCE MHL1 ). In lung SqCC, we we SqCC, lung In ). e d ) ) Three lung SqCCs SqCCs lung ) Three 1 3 . These tumors had significantly higher higher significantly had tumors These . expression expression

P genes P P = 8.5 × 10 < 0.001), < 0.001), < 0.001). They also displayed lower lower displayed also They 0.001). < ). Fig. 2 Fig. P 1 MLH1 < 0.05, < 0.05, 0 showed that there were several

c

and and

−37

P ( ). > 0.05; 0.05; > P

= 0.011), suggesting a suggesting 0.011), = e b a d MMR estimated UV estimated Transversion Fig. 2 Fig.

mutations/Mb mutations/Mb 100 status 10 20 30 40 50 20 40 60 80 0 0 Number of tumors Probability density ( ( n 10 20 30 40 n d 0.1 0.2 0.3 0.4 0.5 0.6 =660) ADC =660) ADC ). One of of One ). 0 0 0 0 0 0 TV-L 3 2 1 0 TV-L mutations/Mb +1(log) ( ( Lung ADC n n SqCC SqCC =484) = 484) TV- Estimated SI4 - -

H TV-H class was the H3K79 methyltransferase methyltransferase H3K79 the was class ferase genes included methyltrans mutated Significantly pathways. signaling calcineurin ( itory domain also tended to co-occur with activating alterations ( terminus of the , that suggesting they may be gain-of-function domain the tered autoinhibitory near in encoding the the C sequence The mutations in calcineurin. phosphatase, dependent types tumor other in included altered not are which and ADC lung to sive TableSupplementary 5 and p.Arg24Leu, encoding mutation recurrent AKT1 included types tumor other in or cancer lung in altered be to ously previ observed been have that ADC lung in significance statistical BLCA) and ( HNSC (excluding types cancer other in not but SqCC RASA1 includ ADC, q lung in exclusively ing mutated significantly genes P Fig. 3 Fig. value < 0.1;

= 0.033), suggesting a potential relationship between the K-Ras and SSNVs/Mb SSNVs/Mb 100 120 100 120 STK11 20 60 80 40 60 80 20 40 TV-L Lung SqCC Lung ADC Lung SqCC 0 0 , , with a recurrent mutation p.Glu17Lys,encoding b , , Ever-smoker Never-smoker ( ( n n and and NOTCH1 PPP3CA =1,137) =1,141) MMR 4 TV-H lo lo UV , , Fig. 4 w w RBM10 Fig. Fig. 3 *** Supplementary Table 6 Table Supplementary ** ( ( MMR n n high high UV =3) =7) a , which encodes the catalytic subunit for the calcium- ). In addition, mutations mapping to the autoinhib , and and , a c and , , Smoking status MLL3 KEAP1 ). ). The mutated new significantly genes exclu Supplementary TableSupplementary 5 HRAS Short indels/Mb DNPs/Mb Fraction of estimated Estimated signature 3 4 5 6 0 1 2 3 4 5 0 1 2

( signature mutations mutations/Mb ( ( KMT2C n n , , 0.2 0.4 0.6 0.8 1.0 10 15 20 =1,137) =1,141) MMR n 0 0 5 RAF1 lo lo were significantly mutated in lung lung in mutated significantly were UV w w *** NS 92 ** , , ( ( ) ) and MMR n n high high LFS 13 UV =7) =3) RIT1 ). Genes that reached modest modest reached that Genes ). 1 1 DOT1L SF 163 SETD2 C S , and and , 43 MLH1 expression (log2) Indels/Mb S DNMT3A 10 11 , which was mutated mutated was which , 3 4 5 7 8 9 0 1 2 ). ). . . A new gene in this MET s e l c i t r A NFE2L2 ( n ( NS 18 Signature n =1,141) KRAS MMR =965) lo lo UV w w LFS (MutSig2CV (MutSig2CV PPP3CA 76 CDK4 * ( mutations SF 242 ( ( , , P MMR n n high high KDM6A UV =6) =3) < 0.005; < 0.005; C S SI SI SI13 SI15/SI6 SI SI , , with a 12 5 2 4 7 clus S 7  ------,

© 2016 Nature America, Inc. All rights reserved. zinc-finger domain, which was observed in both ADCs and SqCCs SqCCs and ADCs both in observed was which domain, zinc-finger opment pathways. related Many of these genes are involved in epigenetic regulation or immune- ( type tumor individual either in mutated significantly not were that types. We found 14 genes significantly mutated in the pan-lung cohort to common tumor both pathway alterations somatic recurrent tional lung SqCC tumor cohorts (into a pan-lung and cohort)ADC lung the would combining identifythat addi hypothesized We therefore both. activated protein (MAP) kinase signaling are often altered similarly in two may tumor pathways between be types, such distinct as mitogen- etiologies and/or gin even if events, the tumor are of types tissues from different ori vastly low-frequency detect to power statistical additional yield can types mutation between survival. and status associations significant additional testing identify not multiple-hypothesis for correction ( after stage tumor or survival patient and status mutation between associations nificant q whereas females, in cohort SqCC lung the significance in statistical reached also KEAP1, of partner interaction p120GAP ( mutations for frameshift that were in mutated genes enriched lung cantly SqCC ( reported previously as males, in Table Supplementary 8 and ( in mutations recurrent found and class mutations frameshift for ( enriched and mutated significantly was methyltransferase, cap a set, data in RNA-binding the protein identified been have mutations loss-of-function and as such factors splicing in reported mutations in lung haveADC previously been ( for mutations enrichment truncating with ADCs lung of 3% in SqCC. lung and/or ADC lung in not but type cancer another in mutated significantly genes represent quadrants lower-right the in points Black Methods). (Online test exact a Fisher’s (–log mutations ( (–log MutSig2CV by defined as clustering mutation for enrichment indicates point the of color The gene. the in mutations of frequency the to proportional is point each of size The BLCA. and HNSC excluding types, tumor other the from those to compared similarly were ( study a pan-cancer from types tumor best the against plotted is cohort ADC lung the in gene ( types. cancer other to compared as cancer 3 Figure s e l c i t r A  ( P b a Fig. 4 q Supplementary Fig. 9 Fig. Supplementary Fig. 4 Fig. upeetr Tbe 10 Tables Supplementary value < 0.1; 0.1; < value ) The ) The ) The ) The CL value < 0.1; Previous studies have shown that joint analysis of different tumor tumor different of analysis joint that shown have studies Previous )) and/or enrichment for loss-of-function loss-of-function for enrichment and/or )) SMARCA4 c a q q ). A regulator of

2 ). We also examined genes for other known in this this in proteins known other for genes examined We also ). q value for each significantly mutated mutated significantly each for value values from the lung SqCC cohort cohort SqCC lung the from values Significantly mutated genes in lung lung in genes mutated Significantly 2 value for the same gene in 19 other other 19 in gene same the for value , contained a new recurrent mutation mapping to the the to mapping mutation recurrent new a contained , 2 0 ( P < 0.001) included included 0.001) < i. 4 Fig. Supplementary Fig. Supplementary 10 10 mutations were in enriched males (FDR upeetr Tbe 9 Table Supplementary ( P KLF5 value)) as determined with with determined as value)) b PASK 1 ). ). 0 . Additionally, although the individual drivers drivers individual the . Additionally, although KLF5 , a transcription factor critical for lung devel critical factor , a transcription ). ). CUL3 ). ). i. 4 Fig. 2 EGFR RBM10 1 mutations were exclusive to males (FDR (FDR males to exclusive were mutations ( – Fig. 4 Fig. , the E3 ubiquitin ligase 13 , whose protein product is a known known a is product protein whose , a mutations were enriched in females, females, in enriched were mutations ). Recurrent Recurrent ). ). Controlling for tumor stage did did stage tumor for Controlling ). mutations were modestly enriched mutations enriched were modestly RASA1 q 1 FTSJD1 b 0 value = 0.219) = value . ). ). U2AF1

and 10 RB1 RBM10 SF3B1

. e i nt bev sig observe not did We ). , whose protein product is is product protein whose , Supplementary TableSupplementary 14 (also known as as known (also mutations were enriched enriched were mutations , (ref. (ref. (ref. (ref. a 6

. The new signifi new The . Lung ADC q value 8 1 FBXW7 10 10 10 10 ). ). In current the 9 NS –1 –2 –4 –8 ) and and ) q NS value < 0.1; RIT1 MET FTSJD STK11 KEAP1 RBM10 CMTR2 SNRPD3 (ref. 10 –1 1 MGA RAF1 KARS STIM1 KLHL5 DOT1L PTPR FANCM MAP2K1 PPP3CA ARID2 ATF7IP ARHGEF1 2 3 Best Pan-cancer ), ), ). ). ), ATM - - - - - Lung 10 U onl

SMARCA4 –2 y 2 Other tumor and peaks at at peaks and peaks around around peaks amplification additional identified analysis number copy pan-lung to similar alteration specific lineage- a represent may (miRNA) microRNA this of amplification to distinguish lung SqCCs from other NSCLC types and kinase, included SqCC ADC for early-stage cancer cation peak near CCND3 KDM5A KAT6A included tumors lung in ( characterized less but types tumor other in SqCC lung for significant Table Supplementary 16 most the among were TERT most The set. significantly this focally amplified from genes in lung ADC cancers were lung the of subset a included that types tumor 11 across analysis number copy pan-cancer a in peak same the by gene examining target likely most the we inferred genes, amplifications and deletions. For some peaks that still contained many new detect to of genes focal target putative the resolution and ascertain changes copy number better had we size, sample larger a With New an 88%. of rate validation observed SSNV we >95%), (power analysis RNA-seq the in depth ing sequenc sufficient For with sites proteins. two for these overlapping domain were this non- outside alterations and loss-of-function other (HAT) mutations mapping All domain. missense to the HAT domain a region mapping hotspot mutational to the histone acetyltransferase recurrently mutated in BLCA in by HNSC our group reported tion associated with increased but did not co-occur with was also significantly mutated in the lung SqCC and pan-lung cohorts type onl CTNNB1 ERBB2 10

somatic , , y –4 q 2 MCL1 value 6 , , Lung andother was specifically amplified in lung ADC, whereas an amplifi whereas in lung ADC, amplified specifically was . . , , ZNF217 tumor type MIR21 PTP4A1 PIK3CA ARID1A NRAS MIR205 MLL3 CDKN2A 10 SETD2 , and and , BRAF SOX2 MAPK1

–8 U2AF1 copy APC SMAD4 a NF1 RB1 TP53 KRAS EGFR expression has shown expression to been be factor a prognostic YES1 MIR21 , and and , DVANCE ONLINE PUBLICATION ONLINE DVANCE – MDM2 , , PHF3 27 ( Loss-of-function number CCND1 Fig. 5 Fig. , encoding a Src family non-receptor protein protein non-receptor family Src a encoding , , (

2 enrichment b MYCL1 8 – Fig. 5a Fig. 5 4 3 2 1 0

. Likewise, new amplification peaks for lung peaks amplification new . Likewise, Lung SqCC q value Cluster enrichment TUBD1 ). Amplification peaks previously described described previously peaks ). Amplification KLF5 0 , and and , 10 10 10 10 NS ( 2 1 –1 –2 –4 –8 d Fig. 5 Fig. 2 ). Expression of of Expression ). , , SOX2

5 KLF5 alterations WHSC1L1 . . The paralogs NS – mutations. A super-enhancer duplica for lung ADC ( ADC lung for 3 2 d MYCL1 ( IRF CUL3 NSD1 PASK HRAS RASA1 KDM6A NFE2L2 NOTCH1 4 4 10 and and , , and Fig. Fig. 5 a amplification. Finally, combined combined Finally, amplification. 5 –1 6 expression has also recently been and and Supplementary Table 17 Supplementary (excluding HNSCandBLCA) KLF5 c Frequency (%) Best Pan-cancer o ln SC ( SqCC lung for ) ) was in also observed breast Supplementary Table 15 Table Supplementary – 10 FGFR1 <1 1−2 2−3 3−5 5−10 >10 –2 has been reported to be reported has been EP300 FAT1 MIR205 Fig. 5 Fig.

, , Nature Ge Nature 10 2 and

9 MLL2 –4 ARHGAP35 q , , suggesting that valu NKX2-1 has been used used been has a ) and and ) i. 5 Fig. CREBBP e , and and , ARID1A 10 TP5 RB i. 5 Fig. PTE CDKN2A PIK3CA FBXW –8 NF n IGF1R 1 , , 3 N b 1 EGFR etics MYC and and ). 7 had b 2 ). ). ), ), 6 - - - , ,

© 2016 Nature America, Inc. All rights reserved. another focal deletion peak around q types tumor both ( in mutations loss-of-function for enriched was ponent of the MHC complex, was focally deleted in both tumor types, included Supplementary Table 19 types tumor other CREBBP in observed deletions focal SqCC lung New mutations. loss-of-function for enriched and 18 Table Supplementary genes fier including hotspots, with genes mutated significantly additional identified PPP3CA 4 Figure Nature Ge Nature HNSC in reported also was which P value = 0.006). Combined pan-lung copy number analysis identified 0 0 0 0 b a 0 Lung SqCC Lung ADC < 0.01), and was significantly mutated in the pan-lung analysis (FDR Focal deletion peaks in lung ADC included the modi chromatin the included ADC lung in peaks deletion Focal p.G9G p.R5I

p.A40G Required forinteractionwithnucleosomesandDNA S-adenosyl- p.N49D p.K90N Calcineurin Bbindingsite2potential Calcineurin Bbindingsite1potential Catalytic p.P25P p.S53R

200 , , New significantly mutated genes in lung cancer. ( cancer. lung in genes mutated significantly New 100 , , DOT1L 100 ROBO1

n p.E248* SMARCA4 p.Q113E 200 100

etics p.V85F L p.E145* -methionine binding

400 p.E348*

, and , and p.N155I p.G175R p.S250* 200

200 p.G412fs

, , p.S260C p.I187fs

p.S199* p.N522fs USP22 p.R283C p.A529P ADVANCE ONLINE PUBLICATION ONLINE ADVANCE FTSJD1 p.E238* p.W317* p.E246D p.L551F p.E243K p.L330L 600

and and p.A591T p.S255* p.V368I p.Y256* p.V174L 400 p.C637G p.257_258CE>* 200 300 ). ). p.P403A 300 , , and p.P664A

), which were also significantly mutated mutated significantly also were which ), p.P403S p.G309* B2M p.A678V

for lung ADC ( ADC lung for p.L321F p.NN406fs p.G309V p.R681R 800 ARID2 p.H327Y

p.L482L PPP3CA RASA1 p.E682* DOT1L FTSJD1 CUL KDM6A p.W490* p.Q839* p.F239fs ( p.A371A 400 400 β p.R512* p.G386* p.E843*

p.S389L 3 TRAF3 2 2 microglobulin), encoding a com p.R537fs p.E391* p.S911L 1 1,000

600 p.S423fs p.G936C 2 ( p.E424K 300 . In general, mRNA expression expression mRNA general, In . p.E451*

upeetr Fg 11 Fig. Supplementary p.D457N p.S1035* (

p.G477fs p.T1068fs Inhibitory domain Calmodulin bindingpotential Supplementary Fig. 11 Fig. Supplementary 500 500 (

a p.G484A 1,200 Supplementary Table 20 ) and ) and p.R709* p.L497fs p.M718I p.C511fs p.G528* p.D782Y p.K536* 800 SAM-binding motif SAM-binding motif SAM-binding motif 400 RASA1 p.E783A p.E418D 600 600 p.M802I 1,400 p.M431I p.K610fs p.E836fs p.E453G

p.G875E p.D477H p.V643fs p.G1443V p.D477Y

a p.R903* and and p.G1471A

, p.V677F p.I905N p.R478* b p.A1522T 1,600 ZMYND11 700 p.R699I 700 p.R478L

) Alteration profiles are shown for new genes specific to each lung tumor type, including including type, tumor lung each to specific genes new for shown are profiles ) Alteration p.910_911LN>H p.KK715fs p.S717* p.N480I

1,000 p.Q938*

CUL3 p.Q719* p.K731M p.LS991del 500 p.N480K p.K742N p.S1006* p.M727I 3 2 1 p.E481* p.R751T p.Q1020H p.K732* p.E481K

and and p.R756L p.L742V p.R487G and for lung SqCC ( SqCC lung for ), c - - , KLF5

Pan-lung translocations for patients with lung ADC harboring those as such trials clinical in inhibitors RTK to observed been have that of responses the dramatic because interest tions are of particular pathway. signaling in components of altera These the RTK–Ras–Raf characterized have been alterations exclusive In mutually lung ADC, Identifying 14 Figs. ( data variation number copy or expression the mRNA either in types tumor across or within effects batch substantial ( genes target ( associated significantly was

and two paralogs, paralogs, two and Lung ADC Lung SqCC Lung ADC Lung SqCC Lung ADC Lung SqCC 0 p.Q136H p.A15A p.A13A p.E3* p.G142R p.E3D

b p.G29F ZZ KIX ZF-TA and and ZZ KIX ZF-TA Interaction withWWP

). ( ). p.G174R Splice site Missens Silent p.R218I p.T228A p.A171A p.G152C 50 c p.P275Q p.V304V

Z ) Combined (pan-lung) analysis of both tumor types types tumor both of analysis (pan-lung) ) Combined p.Y257D Z

RTK–Ras–Raf p.P383R

15 p.Q301H

Supplementary Figs. 12 Figs. Supplementary p.M493I p.P332L 3

e p.N305D 0 ). p.E349* p.Q503* . However, many lung ADCs do not exhibit a known a known exhibit not do . However, ADCs many lung p.P616R p.G446C p.P81H p.R378Q 100

p.S646I Creb binding HA Bromo

EP300 p.T396R p.V701V Creb binding HA Bromo p.N530I T

Frameshift indel Nonsens p.D618N p.G556G p.G777G

T p.E628Q p.T125T

p.P814L 1 p.L833L p.Q816fs p.M749I

and and p.G837V p.G818G p.S772L p.P156T e p.D1049D

p.G896G p.I161M drivers p.E1054K p.Q943* p.G1069D CREBBP P p.P975fs p.P182L p.W1151L 200 < 0.05) with copy number levels for for levels number copy with 0.05) < p.R1140Q p.Q1036* p.H196N p.R1173R p.R1173Q CREBBP p.K1045N p.E202Q EP300 p.D1224Y p.RKV1202fs p.Q1223* KLF5 p.C1385Y p.D1273N p.Y1298C in p.E1285G p.Y1198F p.L1398P

EGFR p.D1309H 250 . Splice-site Indel In-frame indel p.K1367N p.K1327* p.T1300T p.D1399N lung and and p.Y1414C p.E1400* p.C1408Y p.I1366fs p.E1416Q p.G1411E p.G1443V p.W1472C p.Q1464L p.D1445G mutation or p.D1435Y p.D1485Y p.P1452L 13 p.E1550K ADC p.R1446C p.E1505* p.E1550Q p.R1446H p.Y1467F p.P301S

). We did not observe observe not We did ). p.R1602F p.D1507N p.A1448fs p.R1627Q p.R314T p.P309S p.C1710S p.Y1450D p.W1509R p.E1724K p.L1563L p.H1517L s e l c i t r A p.Q1756* p.FV1633fs p.L1631L 350

Supplementary Supplementary p.G1757C p.E1766K p.R1868L p.S1767L p.Q1904P

ALK p.R1875L p.R1768C p.P2036A p.A1824T p.A2044S p.Q2043H p.G2123W p.R395S p.Q2230R p.H2125Q or p.L2164L p.D418Y p.D418N p.M2301I p.M2225I p.P2163P p.E419K p.L2321L p.E419Q

ROS1 p.Q2261Q p.S2315C p.E419Q p.F433F p.R2353L p.L2330R 450 p.G2401W p.H2384Q p.D2429Y  -

© 2016 Nature America, Inc. All rights reserved. pathway components components pathway Ras known the including samples, oncogene-negative altera among for tions enriched significantly were genes 15 total, In test. exact enriched for alterations in oncogene-negative samples using a Fisher’s ( and analyses MutSig2CV the of any in mutated significantly were that genes whether determined we way, pathway.Ras–Raf To identify additional potential drivers in this path mutually exclusive with alterations in other genes related to the RTK– entirely not are gene this in mutations as group, oncogene-positive include not did we analysis, this of poses ( negative’ ‘oncogene considered were ADCs ( positive’ ‘oncogene ignated driver RTK–Ras–Raf known a affecting fusion gene reported in in tions duplication activating an harbored 16 Fig. Supplementary in pathway ( this alterations activating known other without of levels ( Interestingly, alterations. another activating known other without tumors KIF5B reported Previously ( of in insertion in-frame in recurrent a included alterations genes pathway known New mutually alterations. with exclusive genes new identified then and members pathway known among tions altera of pathway,this we characterized first To the somatic landscape understand further quency somatic events are yet to be identified. possibility that additional genes with low-fre activating mutation in the pathway, raising the MAPK1 ( ADC lung in include that peaks amplification new for ratios number copy focal ( SqCC. lung and/or ADC lung in not but type cancer another in amplifications by altered significantly genes indicate quadrants lower-right the in points Black analysis. number copy pan-lung combined the from or types tumor 11 across analysis number copy pan-cancer from inferred was gene target likely most the that indicate names gene around Brackets amplification. focal of frequency the to proportional is point each of size The BLCA. and HNSC excluding types, tumor other seven against compared are SqCC lung in peaks amplification for types tumor non-lung q best the against plotted is ADC lung in peak ( cancer. 5 Figure s e l c i t r A  components Fig. Fig. Fig. Fig. value for the same region across nine other other nine across region same the for value Lung ADCs that had an activating SSNV, indel, amplification, or or amplification, SSNV, indel, activating an had that ADCs Lung MET EGFR 14 6 6 – and and ) ) or that are in important of regulation the pathway Ras c in lung SqCC ( SqCC lung in and and

with its neighboring gene, gene, neighboring its with a MET , or or KRAS d Significant amplifications in lung lung in amplifications Significant NTRK2 6 ) The ) The MET , ) is plotted against against plotted is expression ) Gene 8 , 3 3 Supplementary Table 21 Supplementary MET 2 c 3 . . By manual review, we found additional canonical muta fusions ) and ) and ( VAV1 upeetr Tbe 21 Table Supplementary q and and , , Supplementary Table 22 Supplementary value for each amplification amplification each for value EGFR fusion with with fusion in 11 tumors, some of which have been previously previously been have which of some tumors, 11 in CCND3 YES1 and and 2 ERBB2 6 d . . ( TRIM24 SOS1 , or or , ). 3 MAP2K1 1 b , , EGFR were observed in in observed were , , ) The ) The MIR205 ARHGAP35 ) MIR21 6 ERBB2 . A single lung ADC (TCGA-49-4512) (TCGA-49-4512) ADC lung single A . amplification were enriched in tumors tumors in enriched were amplification and and n alteration resulting in kinase domain domain kinase in resulting alteration TP63 q – = 418), whereas the remaining lung lung remaining the whereas 418), = values values and a fusion fusion a and NTRK2 , and , and RASA1 , and , and in 17 tumors and complex indels indels complex and tumors 17 in was also found in a lung SqCC SqCC lung a in found also was CAPZA2 MAPK1 ). As observed previously, high high previously, observed As ). ( upeetr Tbe 5 Tables Supplementary q and the Rho kinase pathway pathway kinase Rho the and and and ). value < 0.1; 0.1; < value ) NF1 3 1 - - .

n -altered tumors in the the in tumors -altered = 242). For the pur the For 242). = a

d c Lung ADC q value Lung SqCC Lung ADC 10 6 10 10 10 , normalized expression normalized expression 10 10 10 10 34 –128 i. 7a Fig. NS –16 –32 –64 10 11 12 13 10 11 12 13 14 –1 –2 –4 –8 , 8 9 9 3 5 −1

were des were NS P MAPK1 CCND3 NKX2−1 2 1 0 < 0.01; 3 , 6 c

Best pan-cancertumor 10 were 3 2 1 0

and and –1 , 6

CCND3 10 –2 YES1 - - - - - , WHSC1L1–FGFR1

10 –4 TERT ME EGFR ( tions pathway. RTK–Ras–Raf the in alteration analysis fusion for available data RNA-seq had and review pathological expert secondary underwent driver genes ( RTK–Ras–Raf putative or known in alteration an displayed ( set tumor MAPK1 near mutations ( function Ras and Rho respectively,kinases, and were each forenriched loss-of- the for (GAPs) proteins GTPase-activating (p190RhoGAP) overall activity increase GEF to shown been has site and this domains, affecting PH and mutagenesis Ac, CH, the of interface the near located is domain of Dbl homology the catalytic for are domains autoinhibition (PH) important homology pleckstrin and (Ac), acidic (CH), homology calponin the between Interactions 17 Fig. same in Noonan syndrome the reported has been region in substitution p.Asp309Tyr the and ADCs, lung four in p.Asn233Tyr substitution in the autoinhibitory domain (DH) of SOS1 proteins Ras of activation in the RTK and to assists the complex bound (GEF) factor 23 Table Supplementary KRAS T

10 ZNF217 New co-occurrences included included New co-occurrences –8 KAT6A [ MYCL1 3

FGFR1 10 –16 amplification significantly overlapped with activating activating with overlapped significantly amplification MECOM–TERC P Fig. 7 MIR21–TUBD1 ). Similarly, VAV1 is a GEF for the Rho family GTPases. GTPases. family Rho the for GEF a is VAV1 Similarly, ). (22q11) that were only significant in the oncogene-negative oncogene-negative the in significant only were that (22q11) = 0.019; 0.019; = 10 MCL1 –32 ] <2.5 PDGFRA–KIT–KDR CCNE q value q 16 17 18 19 10 12 14 16 MDM2 – 10 4 6 8 c Lung SqCCfocalcopynumberratio value < 0.25; 0.25; < value –64 CDK4 4 Lung ADCfocalcopynumberratio WHSC1L1 ). Moreover, 193 of 227 (85%) lung ADCs that previously 1 0 2.5–5

( 10 −0.5

3 –128 a MY Supplementary Fig. 17 Fig. Supplementary ERBB2 7 EGFR CCND1 DVANCE ONLINE PUBLICATION ONLINE DVANCE . Recurrent mutations were observed encoding a encoding observed were mutations Recurrent . Supplementary Fig. 16 Fig. Supplementary C 2 1 0 5–10 0 P < We 0.01). peaks amplification identified also MIR205 MIR21 0.5 b (8p11.21), (8p11.21), 10–25

). SOS1 is a guanine-nucleotide-exchange guanine-nucleotide-exchange a is SOS1 ). Lung SqCC q value Fig. 7b Fig. 10 10 10 10 1. 10 10 10 10 –128 1 0 NS –16 –32 –64 –1 –2 –4 –8 >2 5 .5 MET 3 NS , 6 c harbored a predicted activating activating predicted a harbored MIR205 YES REL–BCL11A [ ). In total, 499 (76%) lung ADCs ADCs lung (76%) 499 total, In ). PDGFRA MAPK1 2. PTP4A1–PHF3 0 4 Best pan-cancertumor 1 amplifications and amplifications 10 (excluding HNSCandBLCA)

0 –1 . . The p.Ser67Tyr substitution [ 10 11 12 13 14 11 12 13 14 ] AKT

). ). 10 ). Additionally, high-level high-level Additionally, ).

−1 –2 −1 1 FOXA WHSC1L1–FGFR RASA1 ] NFE2L2

– 10 [

–4 KDM5A KIT TERT 38 1 3 2 1 0 3 2 1 0

10 BCL2L1 ,

3 –8 Nature Ge Nature PDGFRA–KIT–KD CDK 9 – ] and and ( 10 KDR MAPK1 MAPK1 –16 Supplementary Supplementary 6 [ GF1 1

10 MYCL1

–32 SOX2

CCNE ARHGAP35 (4q12), and (4q12), q R NF1 ] valu

10 [ MCL1

–64 MDM2 1 CCND1 ERBB2 R n

e 10

muta –128 EGFR ] etics MY EGFR C -

© 2016 Nature America, Inc. All rights reserved. patient, we evaluated the ability of the each protein For sequence resulting landscape. from mutational the of properties immunogenic tial inhibitors in lung cancer checkpoint immune of use the in interest increasing the of Because Assessment samples. cancer these in mutation ognized ( harbored pathway RTK–Ras–Raf the for negative 7 Fig. activating with overlapped cantly ( mutations box). gray the by indicated (as transcript fusion putative the in not exons of expression the than higher relatively was transcript fusion putative the in retained exons of expression the fusion, each For SqCC. a lung in found also was Another signaling. RTK–Ras–Raf in alterations fusions 5 3 the in resulting duplication tandem gene, neighboring its with one including RTK domain, the encoding sequence the retained that identified 6 Figure Nature Ge Nature Fig. 7 Fig. PDGFRA–KIT–KDR ′ end of of end c a Oncogene-negative enrichment (Fisher's exact LOR) c ARHGAP35 3 c −0.5 −1.5 −1.0 ) 1

), suggesting the possibility of an additional, hitherto unrec hitherto additional, an of possibility the suggesting ), 0.5 1.0 1.5 2.0 43 MAP2K were observed in lung ADCs without other known activating activating known other without ADCs lung in observed were Fusions involving involving Fusions KRAS CAPZA2 MAPK1 0 ERBB2 ERBB2 RASA FGFR1 NTRK STK11 , HRAS NRAS BRAF ROS1 KRAS EGFR EGFR ARAF SOS1 4 RAF1 VAV n RIT1 MET MET MET 4 RE AL NF 0 Fraction of mutated oncogene-negative tumor oncogene-negative mutated of Fraction . Furthermore, 28 lung ADCs that remain oncogene oncogene remain that ADCs lung 28 Furthermore, .

P KARS etics of K T 2 1 1 1 1 19 10 × 1.9 = -related genome alteration complementary to to complementary alteration genome -related

Co-occurrences suppressors Tumor Amplifications Fusion indels or SSNVs Gain-of-function neoantigen RBM10 . Previously reported reported . Previously 0.05

VAV1 RASA1 ATF7IP SOS1 ADVANCE ONLINE PUBLICATION ONLINE ADVANCE s LATS1 ARHGAP35 0.10 COL5A2 CREBBP 45 MET , 4 Mutations FANCM 6 −8 0.15 , we comprehensively analyzed the poten SMARCA4 CAPZA2

High-level amplification High-level Fusion skipping exon or mutation Splice-site mutation Missense load and and ) 41 , 4 0.20 MLL3 2

NTRK2 ′ , and and , and KRAS NF1 end of of end TRIM24 . This fusion most likely arose via via arose likely most fusion . This KEAP1 0.25

recurrence . Two fusions of . Twoof fusions mutations ( mutations STK11 MET TP53 TP53 NTRK2 s – 0.67 NTRK2 being fused with the the with fused being uain signifi mutations b Oncogene-negative q value fusion with with fusion 10 10 STK11 and and 10 10 10 10 NS

–16 –32 –1 –2 –4 –8 P = 1.1 × 10 × 1.1 = MET KIF5B NS RNA-seq NA RNA-seq mutation nonsense or indel Frameshift indel In-frame PTP4A1–PHF PDGFRA–KIT–KDR MAPK FGFR1–WHSC1L1 mutations mutations were were 1 TP63 STK11 – 10 MET n –1 = 49 = Oncogene-positive Oncogene-positive

MIR21–TUBD1 3 −6

10

2 9 - - - ; –2

Amplification 8 MECOM–TERC 10 lung cancer. Both nonsynonymous mutation and neoepitope counts counts neoepitope and mutation nonsynonymous Both cancer. lung in observed neoepitopes common most the and identified acteristics mutations (resulting in neoepitopes or neoantigens) and clinical char immunogenic of number the between association the assessed then alleles HLA patient-specific to the of one any by cells presented immune and processed be to mutation missense somatic each were not significantly different between lung ADCs and lung SqCCs SqCCs lung and ADCs lung between different significantly not were 133 TRIM24 –4 CCND1 10 EGFR s q –8 valu KIF5B 10 KAT6A CCND3 ERBB2 CCNE MET MCL1 e TERT –16 KRAS MDM2 CAPZA2 CDK 1 NKX2−1 10 Exon 12 MYC 4 –32 3 16 2 4 11 <1 1 1 2 2 <1 <1 <1 1 1 <1 <1 <1 <1 1 1 1 2 2 2 6 13 32 TP63

Exon 2 y Frequenc Exon 6 associated with overall smoking history history smoking overall with associated test; Wilcoxon rank-sum ( ever-smokers from to lung ADCs comparison in never-smokers from ADCs lung in lower significantly were counts these ( ever-smokers from pathway but have a mutation in in a mutation have but pathway this in alteration an harbor not do that tumors ( pathway this in alteration putative or a known harbor that tumors separate lines Dashed panels. all in red in labeled are pathway RTK–Rho/Ras–Raf the of members are that tumors oncogene-negative among alterations for enriched genes New included. were evidence functional experimental previous with sites or sites mutated recurrently only indels, or SSNVs gain-of-function with genes For 1. than greater log a total had they if gene a given for amplification high-level have to Tumorsconsidered pathway. the were of activators new and known for plot mutation ( 2.0 GISTIC using set tumor oncogene-negative the in found only were WHSC1L1 ( set. oncogene-negative the in higher was mutations of frequency the that 0 indicates than greater (LOR) ratio odds A log-transformed FDR test, exact (Fisher’s tumors oncogene-negative among mutations for enriched significantly ( otherwise. negative oncogene as classified were and components pathway characterized previously in alteration recurrent or activating a known harbored they if positive oncogene as classified were ADCs Lung ADC. lung in pathway Raf 7 Figure RNA-seq data were not available. not were data RNA-seq or pathway this in alteration an harbor not do that tumors and 28), Exon 23 b ) Significant amplification peaks near near peaks amplification ) Significant LUAD-TCGA-55-8091 LUSC-TCGA-O2-A52V q LUAD-TCGA-93-A4JN value < 0.1; < 0.1; value LUAD-TCGA-78-7220

New alterations in the RTK–Rho/Ras– the in alterations New , , a PDGFRA ) Fifteen genes (red points) were were points) (red genes ) Fifteen 2 -transformed copy number ratio ratio number copy -transformed STK11 Low S Normalized exonicexpression – upplementary upplementary KIT Exon 12 Exon 12 –3 q value < 0.25). ( < 0.25). value i. 8a Fig. – ( KDR Exon 14 n Exon 15 Fig. 8a Fig. = 133). NA, NA, = 133). s e l c i t r A , and , and 0 STK11 , NTRK2 NTRK2 b ). However, However, ). , T n b able 23 able MAPK1 = 499), = 499), ) ) and were P ( 47 < 0.001, < 0.001, c 3 FGFR1 n , ) ) Co-

High

4 = 8

ME ME . We .

). ). T T

 – -

© 2016 Nature America, Inc. All rights reserved. rent mutations in known and new pathway components. Similar Similar components. pathway new and known in mutations rent rare, recur additional pathway,to find we may yet be underpowered RTK–Ras–Raf the in alteration detectable known, a have not do still or fusions oncogenic of rates the underestimating be may we tumor, every for data RNA-seq matching have not did we Because events. rare detect of sample size increasing the to usefulness highlights mutants further muta recurrent detect in to tions able not were tumors lung of numbers such as genes identified newly including pathway, RTK–Ras–Raf the oncogenic alterations in noncoding genes or regulatory elements. genomes from lung cancer may betterbe suited for ofdiscovery other regulatory elements. Future studies examining large numbers of whole or genesmutations noncoding examine in to able not were we genes, protein-coding of sequencing whole-exome on focused study this in that results in increased recentlyalso reported the duplication of a noncoding super-enhancer miRNA near geneswere ( or contained that peaks two found also we Interestingly, includinggenes, coding studies other in done been has as heterogeneity, intratumoral analyze to able not were we patient, per sample tumor one only had we As tissue. defined cally similar more be an anatomi within of cells origin from different arising will cancers than tissues different across origin of cells similar developmentally from arising cancers Thus, contexts. cellular ferent that can somatic alterations in have potential dif oncogenic different types data molecular five of clustering BLCAs was of also observed when 12 subset tumor a types were and reclassified using HNSCs, SqCCs, lung between similarity The types. cancer lung two the for distinct largely are alterations number copy somatic recurrent and genes mutated both that showed comparison expression gene of studies with Consistent SqCC. lung and lung cancers to explore similarities and differences between lung ADC We examined the exome sequences and copy number profiles of 1,144 DISCUSSION immunotherapy. for potential great lung SqCC samples had at least five predicted neoepitopes, suggesting predicted neoepitope properties ( has which p.Gln311Glu, encoding mutation recurrent a cancer, lung in cated p.Gly154Val, p.Val157Phe, p.Arg175Gly, and including p.Pro278Ala ( TP53, in alterations and several p.Gly719Ala, EGFR p.Gly466Val, B-Raf p.Glu79Gln, NFE2L2 p.Glu542Lys, PIK3CA included tumors four least at in neoepitopes ( SqCCs lung test; not but ADCs lung in p.Gln311Glu. C3orf59 and p.Glu542Lys, PIK3CA p.Val157Phe, TP53 included neoepitopes be to predicted alterations common most (*** ever-smokers from ADCs lung in than never-smokers from ADCs lung in lower significantly were counts these ( SqCCs lung and ADCs lung from ever-smokers ( ( data. RNA-seq available with tumor each in alleles HLA inferring after predicted was mutation missense 8 Figure s e l c i t r A  example, For pathways. other for relevant be may considerations a ) and neoepitope counts ( counts neoepitope ) and Our study has uncovered multiple significantly mutated genes in in genes mutated significantly multiple uncovered has study Our protein- containing peaks amplification focal new Several RASA1 upeetr Fg 18 Fig. Supplementary

SOS1 Neoepitope load in lung cancer. The immunogenicity of each each of immunogenicity The cancer. lung in load Neoepitope , , MET SOS1 (refs. (refs. MIR21 exon 14 skipping events. As 15–25% of lung ADCs ADCs lung of 15–25% As events. skipping 14 exon , and and , 8 in lung ADC and 49 , 3 C3orf59 Fig. 8 , 9 MYC 5 VAV1 MAPK1 0 ). The fact that we were able to detect these these detect to able were we that fact The ). b . ) were not significantly different between between different significantly not ) were a c expression , . Previous studies examining smaller smaller examining studies Previous . b ). ). Overall, 47% of lung ADC and 53% of (also known as as known (also ). Alterations predicted to generate generate to predicted Alterations ). ) Nonsynonymous mutation counts counts mutation ) Nonsynonymous Fig. Fig. 8 , , YES1 c MIR205 ). ). A gene not previously impli P , and , 1 2 < 0.001). ( < 0.001). 1 P 4 . These differences suggest suggest differences These . . As the mutational analyses < 0.001, Kruskal–Wallis Kruskal–Wallis 0.001, < P CCND3 > 0.05). However, However, > 0.05). in lung SqCC). We have MB21D2 c ) Some of the the of ) Some , were identified. wereidentified. , ), harbored harbored ), 1 1 , this this , - - - - -

Indelocator, Indelocator, algorithm, The The Cancer Genome Atlas project: U24CA143845, U24CA143867, U24CA126546, This work was supported by grants from the National Cancer Institute as part of online version of the pape Note: Any Supplementary Information and Source Data files are available in the URLs). (see Portal Data TCGA the via accessed be can 2 Tablein UUIDs the using Hub Genomics Cancer Cruz ples codes. Accession the in ver available are references associated any and Methods M Tumor Portal, Hub, edu Genomics Cancer Cruz Santa California mathworks.com/m http://www.tumo calculations, power cancer/cga pipeline, Firehose Institute Broad URLs. customized and inhibitors therapies. vaccine checkpoint immune to responses cal clini and may candidates these studies between relationship the Future unravel further neoepitopes. in result to predicted were muta tions recurrent highly Some status. smoking as such variables cal clini and rates, mutation nonsynonymous overall loads, neoepitope between association the fully more understand to mutations sense EP300 we identified new epigenetic modifier mutations in in mutations modifier epigenetic new identified we A c a ckn et ial, e xmnd h imngnct o idvda mis individual of immunogenicity the examined we Finally, Nonsynonymous count sion of the pape the of sion / 1,024 2,048 6 ; ; TCGA Data Portal, h , 12 25 51 7 C3orf59 p.Q311E PIK3CA p.E542 16 32 64 o NFE2L2 p.E79Q can be downloaded from the University of California Santa Santa California of University the from downloaded be can , previously shown in SCLC in shown previously , Picard tools, tools, Picard EGFR p.G719 8 6 2 0 1 2 4 8 ods B-Raf p.G466V TP53 p.R175G TP53 p.G154V TP53 p.P278A wledgmen TP53 p.V157 . Additional clinical and molecular data for TCGA samples samples TCGA for data molecular and clinical Additional . / ( smoker Never- n ; Oncotator, Oncotator, ; ADCs =93) http://www.broadins http:// http://pub a A K F *** 0 DVANCE ONLINE PUBLICATION ONLINE DVANCE rfusions.org Binary alignment (BAM) files for all TCGA sam TCGA all for files (BAM) alignment Binary r ( t atlabcentral/fileexc n smoker . ADCs r Ever- =440) htt s . http:// www.broadinstitute.o p://www.tumorportal. http: s.broadinstitute.org http NS broadinstitute.githu ( n SqCCs 5 = 468) //tcga-data.nci.nih. ://www.broadinstitut Al / Number ofrecurrentneoepitopes ; mutational signatures, signatures, mutational ; l titute.org/cancer/cg 5 1 b . http:/ Neoepitope count 16 32 64 hange/3872 0 1 2 4 8 10 /www.broadinstitute. rg/cancer/cga/MutSi ( smoker Never- /panlung n ADCs =72) org b.io/picard

gov/tcga / Nature Ge Nature *** http://c ; PRADA fusions, fusions, PRADA ; e.org/oncotator 4 15 Supplementary Supplementary ( ; University of of University ; n smoker ADCs Ever- =405) / a/indelocato CREBBP . Lung SqCC Lung ADC http:/ / ghub.ucsc. ; ; Pan-Lung NS / ; MutSig MutSig ; ( n n SqCCs /www. online online =460) etics Al and and 20 org/ l g r / - - - - - ; ; ;

© 2016 Nature America, Inc. All rights reserved. Martin Martin L Ferguson Jean C Zenklusen Collaborators: 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. reprints/ at online available is information permissions and Reprints version of t The authors declare competing interests:financial details are available in the conceived and the designed study and wrote the manuscript. J.N.W., P.S.H., and D.J.K. contributed to manuscript preparation. andR.G. M.M. contributed to sample coordination and quality control. A.D.C.,A.R., E.A.C., and ABSOLUTE analyses. S.A.S. and C.J.W. performed HLA genotyping. C.C. mutation calling and analyses. B.A.M. and A.D.C. contributed to copy number indel identification. M.I., M.R., M.S.L., and G. Getz contributed algorithms for performed batch analyses. effect G. Guo contributed to events using RNA-seq. X.H. and R.G.W.V. generated fusion S.L. calls. and R.A. C.S.P. generated the pan-lung A.N.B. portal. identified A.H.B. contributed to oncogene-negative analysis and manuscript preparation. signature analyses. J.W. contributed to A.A., M.N.A., and generatedR.S. neoantigen J.K.calls. contributed to mutational characterization, identification of comparison of recurrently altered genes, mutational signature identification and analysis of tumors from the cohort of Imielinski J.D.C. performed sample quality control, mutation calling and review, ABSOLUTE Award (M.M.), and National Cancer Institute grant (M.M.). R35CA197568 W81XWH-12-1-0269 (M.M.), the American Research Cancer ProfessorSociety government of the Russian Federation (A.A.), US Department of Defense contract National Cancer Institute grant (P.S.H.),K08CA163677 grant 074-U01 from the and U24CA126544, Additionally,U24CA143883. this work was funded by Nature Ge Nature Sudha Chudamani C AU O

T MPE Quesada, V.Quesada, L.B. Alexandrov, S.A. Forbes, M.R. Stratton, & P.J. Campbell, D.C., Wedge, S., Nik-Zainal, L.B., Alexandrov, R. Govindan, M.S. Lawrence, Alexandrov,L.B. head of characterization genomic Comprehensive Network. Atlas Genome Cancer K.A. Hoadley, M.S. Lawrence, S.A. Roberts, M. Imielinski, characterization genomic Comprehensive Network. Research Atlas Genome Cancer lung of profiling molecular Comprehensive Network. Research Atlas Genome Cancer A. Vaishnavi, of treatment on changes genomic of impact The B.E. Johnson, & S. Cardarella, J.M. Samet, 2015. statistics, Cancer A. Jemal, & K.D. Miller, R.L., Siegel, C.P. Wild, & B.W. Stewart, (2012). factor Genet. Nat. cancer. cancer.human in human in operative processes mutational Reports Cell of signatures Deciphering never-smokers. and genes. cancer-associated 500 carcinomas. cell squamous neck and oforigin. (2014). tissues across and within classification molecular types. tumour cancers. human in widespread sequencing. parallel cancers. lung cell squamous of adenocarcinoma. cancer. cancer. lung factors. risk environmental 65 2014). Cancer, on Research H , 5–29 (2015). 5–29 , O , 415–421 (2013). 415–421 , index.htm T R R C ING FINANCIAL IN FINANCIAL ING SF3B1 he pape Nat. Med. Nat. O n N

etics 47 et al. et et al. et 3 Am. J. Respir. Crit. Care Med. Care Crit. Respir. J. Am. T ee n hoi lmhctc leukemia. lymphocytic chronic in gene et al. et t al. et t al. et et al. et , 246–259 (2013). 246–259 , Nature l RIBU , 1402–1407 (2015). 1402–1407 , t al. et . r et al. et . et al. et et al. et t al. et Nature Exome sequencing identifies recurrent mutations of the splicing the of mutations recurrent identifies sequencing Exome t al. et

COSMIC: exploring the world’s knowledge of somatic mutations somatic of knowledge world’s the exploring COSMIC: Nucleic Acids Res. Acids Nucleic 19 Genomic landscape of non–small cell lung cancer in smokers in cancer lung cell non–small of landscape Genomic noei ad drug-sensitive and Oncogenic

Mapping the hallmarks of lung adenocarcinoma with massively with adenocarcinoma lung of hallmarks the Mapping ug acr n ee soes ciia eieilg and epidemiology clinical smokers: never in cancer Lung Cell Cell

n PBC yiie emns mtgnss atr is pattern mutagenesis deaminase cytidine APOBEC An T Discovery and saturation analysis of cancer genes across 21 across genes cancer of analysis saturation and Discovery , 1469–1472 (2013). 1469–1472 , Mutational heterogeneity in cancer and the search for new for search the and cancer in heterogeneity Mutational Signatures of mutational processes in human cancer.human in processes mutational of Signatures ADVANCE ONLINE PUBLICATION ONLINE ADVANCE 505 I lc-ie uainl rcse i hmn oai cells. somatic human in processes mutational Clock-like utpafr aayi o 1 cne tps reveals types cancer 12 of analysis Multiplatform 13

O

150 511 15 13 150 Nature NS , 495–501 (2014). 495–501 , , , Jiashan Zhang Clin. Cancer Res. Cancer Clin. , , Jia Liu , , Roy ol Cne Rpr 2014 Report Cancer World , 1121–1134 (2012). 1121–1134 , , 543–550 (2014). 543–550 , , 1107–1120 (2012). 1107–1120 , T EGFR ERES Nat. Genet. Nat.

Nature 499 EGFR T , 214–218 (2013). 214–218 ,

complex indels, and manuscript writing. T Nature 43

arnuzzer S 489 15 , D805–D811 (2015). D805–D811 , complex indel characterization. , Charlie , SunCharlie

, 519–525 (2012). 519–525 , 45 188

517 15 et al. , 970–976 (2013). 970–976 , , 5626–5645 (2009). 5626–5645 , , 770–775 (2013). 770–775 , , 576–582 (2015). 576–582 , 13 NTRK1 , , identification and MET 13 , , Ina Felau MET , , Carolyn M Hutter Itrainl gny for Agency (International a. Genet. Nat. exon 14 skipping eragmns n lung in rearrangements http://www. exon 14 complex

Cell 14 CA Cancer J. Clin. J. Cancer CA

, Rashi , NareshRashi 158 13

nature.com/ 44 929–944 , , , John A Demchok 47–52 , online Nature

13 32. 31. 30. 29. 28. 27. 26. 25. 24. 23. 22. 21. 20. 51. 50. 49. 48. 47. 46. 45. 44. 43. 42. 41. 40. 39. 38. 37. 36. 35. 34. 33. , , Heidi J Sofia 14

aln, J.N. Gallant, of landscape The C. Lengauer, & J.L. Kim, S., Schalm, E., Cerami, N., Stransky, non-small-cell in oncogenes P.A.targetable Jänne, New & Binder, A. G.R., Oxnard, Lebanony, D. M. Saito, I. Akagi, T.I.Zack, Cancer Genome Atlas Research Network. Comprehensive molecular characterization X. Zhang, targets suppressor tumor Fbw7 The C. Chen, & Z. Zhou, H.Q., Zheng, D., Zhao, H. Wan, GTPase superfamily Ras putative B.E. Hast, of survey A galore! GAPs A. Bernards, Peifer, M. Peifer, E.C. Bruin, de J. Zhang, M.M. Gubin, S.A. Shukla, N.A. Rizvi, J. Brahmer, H.S. Kim, Y. Liu, L.M. Sholl, Shan, L. Yu,B. K.D. Swanson, Lepri, their F. and oncogenes Ras S. Albert, & U.R. Rapp, R., Schreck, K., Rajalingam, in back Ras Dragging F. McCormick, & R.K. Bagni, D., Esposito, A.G., Stephen, genome. cancer lung the at away Chipping K.E. Hutchinson, W.& Pao, cancer. lung non-small-cell in mutations driver New N. Girard, & W. Pao, K. Ye, noei die i ln cne ta i ciial rsosv t afatinib. to responsive clinically is Discov. that cancer lung in driver oncogenic cancer. in fusions kinase cancer. lung (2009). 2030–2037 carcinoma. lung non-small-cell nonsquamous from squamous retrospective a cohorts. adenocarcinoma: three of lung analysis cell non–small early-stage, in progression (2013). 3821–3832 adenocarcinoma. lung I stage in classifier prognostic robust a as 45 carcinoma. bladder urothelial of cancers. epithelial human in Res. Cancer proliferation. cell breast suppresses and degradation ubiquitin-mediated for KLF5 function. and ubiquitination. not (2003). and man in proteins activating of small-cell lung cancer. lung small-cell of evolution. cancer lung defines sequencing. multiregion by delineated antigens. mutant specific genes. HLA I class in cancer. lung cell non–small cancer. lung small-cell cancer. lung non-small-cell in vulnerabilities in (2013). target a as kinase never-smokers. in features (2009). 8341–8348 molecular and clinicopathologic Cancer Lung patients. adenocarcinoma lung in therapy EGFR-TKI from benefit survival better a Vav1. of activation patients. (2008). syndrome Noonan for 32 correlations. genotype–phenotype and effects, pathogenic on insights targets. downstream ring. the 18 Oncol. cancers. , Laxmi , Lolla Laxmi , 1134–1140 (2013). 1134–1140 , , 760–772 (2011). 760–772 , (2012). 349–351 ,

et al. et

t al. et 12 et al. et 5 et al. et al. Nat. Med. Nat. t al. et , 1155–1163 (2015). 1155–1163 , Cancer Cell Cancer et al. et , 175–180 (2011). 175–180 , t al. et et al. et et al. et 13 et al. et t al. et t al. et et al. et Structural and energetic mechanisms of cooperative autoinhibition and autoinhibition cooperative of mechanisms energetic and Structural

ytmtc icvr o cmlx netos n dltos n human in deletions and insertions complex of discovery Systematic Metabolic and functional genomic studies identify deoxythymidylate identify studies genomic functional and Metabolic 70 t al. et t al. et J. Clin. Oncol. Clin. J. et al. et t al. et et al. 89

, , Liming Yang Concurrence of t al. et SOS1 Development rpe-ie atr i rqie fr eiaa ln morphogenesis lung perinatal for required is 5 factor Kruppel-like t al. et et al. et Pan-cancer patterns of somatic copy number alteration. number copy somatic of patterns Pan-cancer et al. et Identification of focally amplified lineage-specific super-enhancers lineage-specific amplified focally of Identification , 4728–4738 (2010). 4728–4738 , obnto o poen oig n nnoig ee expression gene noncoding and coding protein of Combination Integrative genome analyses identify key somatic driver mutations driver somatic key identify analyses genome Integrative Cancer-derived mutations in mutations Cancer-derived , 337–342 (2015). 337–342 , Mutational landscape determines sensitivity to PD-1 blockade in blockade PD-1 to sensitivity determines landscape Mutational h ascain f irRA xrsin ih rgoi and prognosis with expression microRNA of association The nrtmr eeoeet i lclzd ug adenocarcinomas lung localized in heterogeneity Intratumor Diagnostic assay based on hsa-miR-205 expression distinguishes Comprehensive analysis of cancer-associated somatic mutations somatic cancer-associated of analysis Comprehensive ioua vru dctxl n dacd qaoscl non– squamous-cell advanced in docetaxel versus Nivolumab ug dncrioa with adenocarcinoma Lung

Cancer Res. Cancer Cell hcpit lcae acr muohrp tres tumour- targets immunotherapy cancer blockade Checkpoint Systematic identification of molecular subtype–selective molecular of identification Systematic 13 mutations in Noonan syndrome: molecular spectrum, structural Spatial and temporal diversity in genomic instability processes instability genomic in diversity temporal and Spatial 22

GR iae oan ulcto (GRKD i a novel a is (EGFR-KDD) duplication domain kinase EGFR

Biochim. Biophys. Acta Biophys. Biochim. SOS1 15 25 , , Nat. Biotechnol. Nat. , 97–104 (2016). 97–104 ,

N. Engl. J. Med. J. Engl. N. LKB1 140 T , 272–281 (2014). 272–281 , , , Ye Wu Nat. Commun. Nat. Clin. Cancer Res. Cancer Clin. Nature odd Pihl odd Nat. Genet. Nat. mutations are rare in human malignancies: implications malignancies: human in rare are mutations

135 , 246–256 (2010). 246–256 , 31 Science EGFR Nat. Genet. Nat. mtn ln cancer. lung -mutant Science , 1097–1104 (2013). 1097–1104 ,

Drosophila , 2563–2572 (2008). 2563–2572 , 74

Nature 515 , 808–817 (2014). 808–817 , amplification and sensitizing mutations indicate 13 15

ee Crmsm Cancer Chromosom. Genes 348 , 577–581 (2014). 577–581 ,

, , Zhining Wang

44

, , Chad J Creighton Science

346 14 33

507

373 , 124–128 (2015). 124–128 , 5 , 1104–1110 (2012). 1104–1110 , 48 . , 1152–1158 (2015). 1152–1158 , , , Yunhu Wan , 4846 (2014). 4846 ,

, 251–256 (2014). 251–256 , Biochim. Biophys. Acta Biophys. Biochim.

17 , 315–322 (2014). 315–322 , , 176–182 (2016). 176–182 , 1773 , 123–135 (2015). 123–135 , Cell KEAP1

, 1875–1882 (2011). 1875–1882 , 346 EGFR

155 , 1177–1195 (2007). 1177–1195 , , 256–259 (2014). 256–259 , impair NRF2 degradation but degradation NRF2 impair acr Discov. Cancer , 552–566 (2013). 552–566 , amplification has distinct has amplification s e l c i t r A 13 14 . ln Oncol. Clin. J. acr Res. Cancer ,

, Cancer Res. Cancer

47

16 1603 3 Hum. Mutat. Hum. 253–259 , 870–879 , , Nat. Genet. Nat. Nat. Med. Nat.

, 47–82 , Cancer Lancet

69 27 73  , , ,

© 2016 Nature America, Inc. All rights reserved. Systems Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA. Center, University of Pittsburgh Cancer Institute, Pittsburgh, USA. Pennsylvania, California, Los Angeles, California, USA. Texas, USA. Medicine, Harvard Medical School, Boston, USA. Massachusetts, Molecular Medicine Cologne, Cologne, Germany. Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, USA. Massachusetts, USA. German Centre for Lung Research, Heidelberg, Germany. Cancer Centre, Toronto, Ontario, Canada. 27 North Carolina Tissue Procurement Facility, Chapel Hill, North Carolina, USA. Hill, Chapel Hill, North Carolina, USA. Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Hill, Chapel Hill, North Carolina, USA. Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Pathology and Laboratory Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Maryland, USA. 13 SheltonCandace Johanna Gardner Stephen B Baylin Daniel J Weisenberger Stacey B Gabriel Sougnez Carrie David Kwiatkowski Nilsa C Ramirez Peter Park Dianne Chadwick Scott Frazer Hailei Zhang Matthew D Wilkerson T Piotr A Mieczkowski Katherine A Hoadley Rathmell W Kimryn s e l c i t r A 1 0 ara ara Skelly Department Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas, USA. National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA.

41 Center Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, USA. 31 21 16 , 1 32 Baylor Baylor College of Medicine, Houston, Texas, USA. , , Matthew G Soloway , , Jaegil Kim 1 , , David I Heiman , Jay Bowen 33 1 Research Institute at Nationwide Children’s Hospital, Columbus, Ohio, USA. 33 1 , , Steven E Schumacher 45 43 , , Matthew Meyerson 45 28 , , , 34 , , James G Herman , , Kevin Lau 1 T , , , , Lisa , WiseLisa 17 8 21 T roy Shelton 19 , , Joshua D Campbell 21 42 , , J homas Muley , Lisle , E Lisle Mose , 21 , Lori , Boice Lori , , David J Van Den Berg 1 T , , Alan P Hoyle 33 , , Michael S Lawrence odd Aumanodd 24 21 , Julie M Gastier-Foster 43 Research Research Computing Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. 29 Sidney Sidney Kimmel Cancer Comprehensive Center, Johns Hopkins University, Baltimore, Maryland, USA. Thoraxklinik Thoraxklinik am Heidelberg, Heidelberg, Germany.Universitätsklinikum 45 33 1 , , Juok Cho , , David Mallery 45 , Erik , Zmuda Erik 38 19 , , Mark Sherman Department Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, USA. Massachusetts, 17 , , Donghui 29 19 44 , 1 25 , 17 , Charles , M Charles Perou 30 , , , Ludmila Danilova 31 1 2 , , Mei Huang 19 , , , , , Michael Meister Harvard Harvard Medical School, Boston, USA. Massachusetts, 38 18 12 , , Corbin D Jones , , Juliann Shih 1 , , Saianand Balu , , Lauren A Byers 1 , , , Nils Gehlenborg 2 , Bradley , A Bradley Murray 40 Department Department of Thoracic Head and Neck Medical Oncology, MD Anderson Cancer Center, Houston, T 33 1 17 42 , , Lynda Chin an 45 University University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. , , Josh Stuart , , Moiz S Bootwalla , , Scott Morris 33 21 45 , 26 , , Junyuan Wu 17 34 , , Peggy Yena Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA. , , Mark Gerken 25 45 1 , Leigh B , Leigh 18 International Genomics International Consortium, Phoenix, Arizona, USA. , 2 29 19 , , Rameen , Beroukhim Rameen 22 43 23 19 40 , , , 30 Carolina Carolina Center for Genome Sciences, University of North Carolina at Chapel , , Leslie , Cope Leslie , 35 1 23 14 , , 21 T 1 , , , Hendrik Dienemann 27 T SRA International, SRA Fairfax, International, Virginia, USA. om Bodenheimer , , Eric Collisson , , , Stuart R Jefferys , , Jeffrey Roach 26 36 20 anja Davidsen 45 , , Ming-Sound 1 Department Department of Translational Genomics, Cologne, Germany. , , Gordon Saksena Department Department of Internal Medicine, University of North Carolina at Chapel , 45 2 , , Joeseph Paulauskis T 19 , , Andrew D Cherniack & Gordon B Mills horne 34 , , Umadevi Veluvolu 42 42 33 The Ohio State University, Columbus, Ohio, USA. Norris Comprehensive Norris Cancer Comprehensive Center, University of Southern , , Phillip H Lai , Kristen M Leraas a DVANCE ONLINE PUBLICATION ONLINE DVANCE 43 18 28 , , Daniel J Crain , University University Health Network and Princess Margaret 19 , 24 7 13 25 T 32 , , Martin Peifer 30 , , Yan Shi , , Peter W Laird sao , , Gad Getz Brigham Brigham and Women’s Hospital, Boston, 1 19 19 Translational Lung Research Centre Heidelberg, , 2 , , D Neil Hayes , , Shaowu Meng , 1 39 28 , , Doug Voet 29 42 , , , , Frances Allison 46 T , , , Dennis 45 30 ravis ravis I Zack 21 , Robert , PennyRobert 19 , , Raju Kucherlapati 33 1 , , Joel S Parker , , Janae V Simons , , 1 2 45 15 T , , , Alice H Berger 9 19 ara M Lichtenberg Leidos Leidos Biomedical, Rockville, , , Erin Curley , , Michael S Noble Lineberger Lineberger Comprehensive 36 T 41 , 1 37 19 , , Pei Lin Maglinte ,

39 19 ,

, 22 46 44 20

1 Nature Ge Nature Department Department of , , Department Department of Department Department of Hillman Hillman Cancer 38

, 2 18

5 28 , University University of 35 45 Department Department of

19 , Department of , 1

37 , , 21 45 42 Center Center for

19 , , 1 31 , n , , 2 etics

, , 32 1 33 , , ,

© 2016 Nature America, Inc. All rights reserved. Identification of recurrent copy number changes. genes. other all across mutations other to mutations loss-of-function of ratio the than higher significantly was gene given a for mutations other to and frameshift, nonsense, ing was used to determine whether the ratio of loss-of-function mutations test (includ Fisher’sexact one-sided A data. RNA-seq matching in reads supporting and had as (<1.5%) no of all positive, mutations had fractions these low allelic a false likely was that deletion frameshift a recurrent showed mutations of its gene, combined MutSig2CV the of correction multiple-hypothesis in considered were expression higher with cluster the to belonging of ability distributions was fit in R using the mclust package v4.2. Genes with 95% prob ( data RNA-seq available and >50% of ABSOLUTE from estimate purity a had that SqCCs 238 and ADCs 185 log for obtained was gene each median for value (FPKM) The purity. high relatively with tumors across expression low hypotheses tested in the MutSig2CV analysis, we excluded genes that exhibited mutations/Mb 9.7 of rate patients, mutation of a 5% assuming in mutated genes detect to power 41% and patients of 10% Mb mutations/ 8.7 of rate mutation a assuming patients, of 5% in mutated detect genes to power 73% and patients of 10% in mutated genes detect to power ( sites conserved evolutionarily at ( gene a within mutations of clustering ( rate mutation background the to relative frequency mutational high combines which MutSig2CV, using identified were genes. mutated significantly of Identification URLs). Tumor (see Pan-Lung the Portal in genes described ously as previ exomes, if in found normal a of panel over 2,900 were excluded calls the tumor to the matched normal sample to exclude germline variants. Somatic MuTect using were called and indels SSNVs TCGA TCGA. from lung SqCCs from reported newly SqCCs lung described previously 176 with together Imielinski of cohort the from ADCs lung 159 and TCGA, from ADCs TCGA from ADCs lung described previously 227 included set sample final The analysis. final the in considered were solution an available SNP6.0 array for copy and number analysis, a ABSOLUTE valid exomes was estimated using ContEst indels as small well as of recalibration quality base duplicates, and Toolkitthe Genome Analysis (GATK) for around realignment pipeline Picard the using processed were Reads sequencing. paired-end Illumina by followed kit 50Mb Exon All Human SureSelect Agilent the using performed calling. mutation and alignment, sequencing, DNA BCR. at the performed were control quality studies in previous Center) Cancer Kettering Sloan W. by (Memorial Travis led committee pathology expert an lung from bySqCCs review TCGA had histological also undergone additional ADCs from Imielinski of 159 lung site. the source tissue of original the classification histological the confirm to eosin and hematoxylin with stained and cut were sections frozen one or (BCR), two additional the Core Resource to tissue the of Biospecimen frozen shipment After classification. histological initial an given was tumor each where sites, source tissue contributing the at performed was review cal neoadju ( resection received before treatment vant who ADC lung with patients three and SqCC lung with boards. All patients were naive to treatment with the exception of four patients review institutional relevant the from approval with and consent appropriate described previously Imielinski the for performed review. were sequencing pathology and collection Sample ONLINE doi: SNP6.0 Affymetrix arrays, and signal intensities were normalized as previously Supplementary Fig. 1 Supplementary 1 10.1038/ng.3564 0 . For 484 lung SqCCs, we had 100% power to detect genes mutated in in mutated genes detect to power 100% had we SqCCs, lung 484 For . TRERF1 5 6 5 . . This pipeline uses BWA for read alignment, Picard tools for marking and Indelocator (see URLs), respectively. These algorithms compare

MET , was excluded from the final results because closer inspection inspection closer because results final the from excluded was , 1 H 0 . Coding mutation patterns can be viewed for individual individual for viewed be can patterns mutation Coding . ODS 6– et et al. ). ). For each tumor a type, mixture model of two normal 8 . All specimens were obtained from patients with with patients from obtained were specimens All . , , 289 of the lung ADCs from TCGA, and 213 of the de novo de Supplementary Table 2 Supplementary P 6– FN 5 8 3 . Nucleic acid extraction and molecular and molecular extraction . acid Nucleic . . Only tumors with <5% contamination, ) out-of-frame start codon mutations) mutations) codon start out-of-frame 1 0 P . For 660 lung ADCs, we had 100% 100% had we ADCs, lung 660 For . CL ), and enrichment of mutations mutations of enrichment and ), ape olcin n DNA and collection Sample 1 Significantly mutated genes genes mutated Significantly et al. et 0 5 . To reduce the number of of number the Toreduce . 6 2 DNA was hybridized onto , 274 newly reported lung lung reported newly 274 , . Contamination . in Contamination tumor P and TCGA cohorts as as cohorts TCGA and values from tests for for tests from values Exome capture was was capture Exome ). Initial pathologi Initial ). P values. One One values. 7 and 308 308 and et al. et P CV 5 8 ), ), 2 4 - - - - - ,

transcript annotations used for this analysis included included analysis this for used annotations transcript were gene a for exons all z described previously as values, RPKM to normalized and measured were levels expression exon transcripts, pipeline PRADA the with tified obtained from previous studies with juncBASE was identified log were data all and 1, to set were described previously as estimates, sion Mapsplice with assembly to data RNAthe from hg19 seq genome TCGA. aligned were reads generated, ined in this study, 495 lung ADCs and 476 lung SqCCs had corresponding RNA-RNA sequencing for expression and fusion analyses. contained fewer than 25 genes and were than smaller 10 Mb were considered. that peaks only criterion, second Forgene.the each of locations end and start the to Mb 1 adding after overlapped peaks the of location genomic the (ii) or same across tumor types if (i) the known target gene of each peak was the same and ploidy were estimated using ABSOLUTE Purity respectively. −0.1, than less or 0.1 than greater was ratio number copy focal 2.0–estimated GISTIC the if tumor a within deleted or amplified focally (ref. 2.0 GISTIC using identified were tion altera number copy somatic focal for peaks Recurrent segment. each of tude algorithm described mutation were considered. were mutation each for epitopes best the be to predicted peptides only this, After proteins. I MHC called the of each for performed were steps previous the dependent, is HLA binding epitope Because tumor). in each genes expressed highly most the gene in was expressed the RNA-seq data for that tumor (among the 15,000 value affinity score a processing displayed peptide mutant the if considered of the mutant versus peptide sequences wild-type immunogenicities relative the of comparator reliable a be to found was value This peptide. wild-type the for value affinity median the by divided peptide for value pair as the mutantaffinity the peptide median mutant and wild-type ite of Wemeasure strength. binding ratio for the neoepitope each defined also for and each peptide used the median value across as all algorithms a compos SMM using the NetChop program previously as described molecules known to be the possible lengths for peptides presented by human MHC class I and type mutant of peptides 8, 9, 10, and 11 amino acids in as arelength, these of wild- consisting were generated lists Separate alterations. missense single- residue and alleles HLA called confidently between interaction considering For for were exomes. bytumor, made lung each all cancer predictions epitope immunogenicity. Predicting signatures. all from mutations estimated the of sum the by signature that for mutations estimated of number the dividing by derived was signature a for mutations estimated of fraction the tumor, each Within described previously as bootstrapping and via ten, assessed to was one stability signature from varied was signatures possible of number The states. contexts nucleotide and for strands, 2 a transcriptional total of 192 mutational tri different 16 with types Toolbox. mutation We 6 used Statistics MATLAB from MATLAB URLs) (see and Central run using the from nnmf function the processes) mutational estimated of number exomes, cancer lung of an and esses K signatures. mutational of Identification E E for score transformed across all tumors within each tumor type. Subsequently, Subsequently, type. tumor each within tumors all across transformed score N N × × S S MET G T T 6 0 0 matrix of mutation catalogs into a a into catalogs mutation of matrix 7 0 0 , , and SMMPMBEC 0 0 , , 6 0 5 0 6 . Segmentation was performed using the Circular Binary Segmentation E 6 0 3 0 followed by Ziggurat Deconstruction to infer the length and ampli . . We MHC then predicted for binding affinity each of the peptides N 3 3 N 5 2 S ≥ 4 3 × × T 0.01, and a neoepitope ratio ratio neoepitope a and 0.01, 6 1 0 0 1 G 0 0 5 0 for for matrix of mutational exposures (where (where exposures mutational of matrix for for 0 0 3 4 TP63 6 8 K . First, a proteasome processing score was calculated . score was calculated First, a processing proteasome 1 5 z NTRK2 is the number of mutational states, and and states, mutational of number the is 8 1 6 score transformed again within each tumor. The The tumor. each within again transformed score , and normalized with RSEM with normalized and , 8 8 6 methods to predict MHC binding affinity values values to MHC affinity binding predict methods . 3 4 6 . . Then, we used the NetMHC HLA alleles were called with POLYSOLVER with called were alleles HLA for for 0 6 6 as described previously , 31 . Expression for an individual exon was first first was exon individual an for Expression . 6 , 2 , , 6 . To plot the exonic expression of fusion fusion of expression exonic the plot To . CAPZA2 1 2 . . Fusions for additional tumors were iden E transformed. Skipping of of Skipping transformed. N 6 S . Expression values less than 1 FPKM 1 FPKM than less values . Expression T 0 K NMF was used to deconvolute a deconvolute to used was NMF 0 5 5 , , × × 1 7 4 0 ≥ 6 E . Two peaks were considered the ). A peak was considered to be be to considered was peak A ). 0 1 and the mRNA transcript of of transcript mRNA the and 1 . Code for NMF was obtained obtained was NMF for Code . N N 0 4 3 matrix of mutational proc mutational of matrix S 8 4 T . . Peptide pairs were further 3 Of the 1,144 tumors exam 0 5 0 2 0 6 6 0 . Lists . of Lists were fusions for for Nature Ge Nature 0 5 ENST0000039775 3 9 6 to FPKM expres FPKM to 0 5 2 , , NetMHCpan G 4 is the number number the is ≥ TRIM24 1 MET 0.7, a median a median 0.7, 8 for for exon 14 14 exon N n KIF5B is the the is etics , and and , 1 6 4 6 6 2 7 ------, . ,

© 2016 Nature America, Inc. All rights reserved. 55. 54. 53. 52. accession under (dbGaP) Imielinski review. pathology and collection Sample the with performed procedure. was Benjamini–Hochberg testing multiple-hypothesis for Correction stage. tumor for controlling without and with status, mutation and survival patient between associations examine to used was model hazards proportional Cox The TCGA. from SqCC lung with patients 473 and ADC lung with patients In variables. egorical data total, longitudinal were on available for survival 481 cat two comparing when used was test exact Fisher’s The noted. unless otherwise variables continuous for used were groups) two than more between test (comparison two between groups) or the Kruskal–Wallis test (comparison comparisons. Statistical Nature Ge Nature

Cibulskis, K. Cibulskis, S.L. Carter, K. Cibulskis, M.A. DePristo, heterogeneous cancer samples. cancer heterogeneous cancer. data. sequencing (2011). next-generation in samples data. sequencing DNA (2011). next-generation using Nat. Biotechnol. Nat. et al et n etics t al. et . et al. et 8 are available in the database of Genotypes and Phenotypes Phenotypes and Genotypes of database the in available are t al. et t al. et Absolute quantification of somatic DNA alterations inhuman DNA alterations somatic of quantification Absolute Sensitive detection of somatic point mutations in impure and impure in mutations point somatic of detection Sensitive ots: siaig rs-otmnto o human of cross-contamination estimating ContEst: faeok o vrain icvr ad genotyping and discovery variation for framework A Nonparametric tests such as the Wilcoxon rank-sum

30 phs000488.v1.p , 413–421 (2012). 413–421 , Nat. Biotechnol. Nat. 1 Clinical and molecular data from from data molecular and Clinical .

31 Bioinformatics , 213–219 (2013). 213–219 , a. Genet. Nat.

27 43 2601–2602 , 491–498 , -

68. 67. 66. 65. 64. 63. 62. 61. 60. 59. 58. 57. 56.

Kim, Y., Sidney, J., Pinilla, C., Sette, A. & Peters, B. Derivation of an amino acid amino an of Derivation B. Peters, & A. Sette, C., Pinilla, Y.,J., Sidney,Kim, sequence the describing models quantitative Generating A. Sette, & B. Peters, I. Hoof, M. Nielsen, in proteasome the Kes¸mir,of & role O. The Lund, C. C., Lundegaard, M., Nielsen, B. Alberts, W. Torres-García, K. Yoshihara, A.N. Brooks, data RNA-Seq from quantification transcript accurate RSEM: Dewey,C.N. & B. Li, K. Wang, C.H. Mermel, Olshen, A.B., Venkatraman, E.S., Lucito, R. & Wigler, M. Circular binary segmentation BMC Bioinformatics BMC prior. Bayesian a as application its and binding MHC peptide: matrixmethod. for matrix similarity thestabilized with processes Bioinformatics biological of specificity humans. representations. sequence novel cleavage. proteasomal of predictions improved from T-cellobtained cytotoxic insights generating epitopes: Bioinformatics fusions. transcript mammals. genome. reference a without or with discovery. 12 agt o fcl oai cp-ubr leain n ua cancers. human in alteration copy-number somatic focal of targets (2004). data. number copy DNA array-based of analysis the for , R41 (2011). R41 , et al. et Immunogenetics et al. et Nucleic Acids Res. Acids Nucleic Genome Res. Genome Molecular Biology of the Cell the of Biology Molecular et al. et et al. et et al. et NetMHCpan, a method for MHC class I binding prediction beyond prediction binding I class MHC for method a NetMHCpan, et al. et

MapSplice: accurate mapping of RNA-seq reads for splice junction splice for reads RNA-seq of mapping accurate MapSplice: 6 30 , 132 (2005). 132 , Reliable prediction of T-cell epitopes using neural networks with networks neural using T-cellepitopes of prediction Reliable , 2224–2226 (2014). 2224–2226 , Conservation of an RNA regulatory map between map regulatory RNA an of Conservation Oncogene t al. et GISTIC2.0 facilitates sensitive and confident localization of the of localization confident and sensitive facilitates GISTIC2.0 The landscape and therapeutic relevance of cancer-associated of relevance therapeutic and landscape The

10 Immunogenetics , 394 (2009). 394 ,

RD: ieie o RA eunig aa analysis. data sequencing RNA for pipeline PRADA: 21

61 , 193–202 (2011). 193–202 ,

, 1–13 (2009). 1–13 , 34

38 , 4845–4854 (2015). 4845–4854 , Protein Sci. Protein , e178 (2010). e178 , BMC Bioinformatics BMC

57 (Garland Science, 2002). Science, (Garland , 33–41 (2005). 33–41 ,

12 , 1007–1017 (2003). 1007–1017 , Biostatistics

12 doi: , 323 (2011). 323 , 10.1038/ng.3564 Drosophila eoe Biol. Genome

5 , 557–572 , BMC and