
Oncogene (2014) 33, 5078–5089 © 2014 Macmillan Publishers Limited All rights reserved 0950-9232/14 www.nature.com/onc

ORIGINAL ARTICLE
A comparative survey of functional footprints of EGFR pathway mutations in human cancers

A Lane1,4, A Segura-Cabrera1,4 and K Komurov1,2,3

Genes functioning in epidermal growth factor receptor (EGFR) signaling pathways are among the most frequently activated in human cancers. We have conducted a comparative analysis of functional footprints (that is, effect on signaling and transcriptional landscapes in cells) associated with oncogenic and tumor suppressor mutations in the EGFR pathway in human cancers. We have found that mutations in the EGFR pathway differentially affect signaling and metabolic pathways in cells in a mutation- and tissue-selective manner. For example, although signaling and metabolic profiles of breast tumors with PIK3CA or AKT1 mutations are, as expected, highly similar, they display markedly different, sometimes even opposite, profiles to those with ERBB2 or EGFR amplifications. On the other hand, although low-grade gliomas and glioblastomas, both brain cancers, driven by EGFR amplifications are highly functionally similar, their functional footprints are significantly different from lung and breast tumors driven by EGFR or ERBB2. Overall, these observations argue that, contrary to expectations, the mechanisms of tumorigenicity associated with mutations in different genes along the same pathway, or in the same gene across different tissues, may be highly different. We present evidence that oncogenic functional footprints in cancer cell lines have significantly diverged from those in tumor tissues, which potentially explains the discrepancy of our findings with the current knowledge. Nevertheless, our analyses reveal a common inflammatory response signature in EGFR-driven human cancers of different tissue origins. Our results may have implications in the design of therapeutic strategies in cancers driven by these oncogenes.

Oncogene (2014) 33, 5078–5089; doi:10.1038/onc.2013.452; published online 28 October 2013

Keywords: oncogenic networks; functional footprints; EGFR oncogenes

INTRODUCTION
Receptor tyrosine kinases of the epidermal growth factor receptor (EGFR) family, such as EGFR and ERBB2, are frequently activated in human cancers. Upon stimulation by their ligands, these receptor tyrosine kinases activate a number of downstream signaling pathways that promote proliferation, growth and survival in cells.1,2 Some of the most notable of such pathways include the Ras/mitogen-activated protein kinase (MAPK) and phosphoinositide 3-kinase/Akt signaling pathways, whose role in promoting growth, survival and tumor progression has been well characterized.3 Accordingly, activating mutations in the Ras/MAPK pathway, that is, in NRAS, KRAS and BRAF oncogenes, are very common in melanomas and in lung and colon cancers, whereas mutations in PIK3CA and AKT1 genes are frequently observed in breast cancers. Similarly, inactivating mutations in NF1 and phosphatase and tensin homolog (PTEN) genes, inhibitors of Ras and PI-3K/Akt signaling, respectively, are also frequently observed in different cancers.

Most of the current targeted therapy strategies in cancers driven by mutations in the EGFR pathway genes focus on targeting of the driver oncogene, such as targeting EGFR with gefitinib or erlotinib in EGFR-mutated lung cancers4 or targeting BRAF with vemurafenib in BRAF-mutant melanomas.5 In addition, there is an effort on developing inhibitors to target some of the common downstream nodes along the EGFR pathway, such as Akt, MAPK or mammalian target of rapamycin (mTOR) in cancers with activating mutations in the EGFR pathway.3 However, the proto-oncogenes of the EGFR pathway have multiple roles in the cell that are independent of their 'most famous' targets. For example, EGFR has almost 300 direct interacting partners listed in public protein–protein interaction databases (not shown). Similarly, Ras is known to have multiple signaling targets independent of its most studied downstream targets, Raf and Akt,6 and the Raf kinase has multiple functions that are independent of its most studied downstream target MEK.7 Therefore, it is possible that these oncogenes engage different mechanisms in promoting tumorigenesis, which may have significant implications in the design of therapeutic strategies against cancers driven by these oncogenes.

Here, by using the genome-wide data on mRNA and protein expressions, as well as somatic mutations and genomic copy numbers for different human cancers collected by The Cancer Genome Atlas (TCGA; https://tcga-data.nci.nih.gov/tcga/), we have conducted a comparative analysis of functional footprints of the oncogenes and tumor suppressors of the EGFR pathway. We define a functional footprint of an oncogene as the landscape of functional changes, at multiple levels of organization, directly or indirectly induced by the oncogenic mutation in the tumor. Our analyses reveal similarities and differences in functional footprints among gene mutations within and across tissue types. However,

1Division of Experimental Hematology and Cancer Biology, Cancer and Blood Diseases Institute, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA; 2Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA and 3Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA. Correspondence: Dr K Komurov, Division of Experimental Hematology and Cancer Biology, Cancer and Blood Diseases Institute, Cincinnati Children’s Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229, USA. E-mail: [email protected]
4These authors contributed equally to this work.
Received 13 August 2013; revised 13 September 2013; accepted 20 September 2013; published online 28 October 2013

most notably, despite some expected similarities in downstream pathway activations, such as activation of the MAPK pathway by KRAS mutations, or of the Akt pathway by PIK3CA or PTEN mutations, we find that the functional footprints often show significant tissue selectivity. To explore tissue selectivity of functional footprints of an oncogene, we present a detailed analysis of EGFR and ERBB2 activations in breast, glioblastoma (GBM), lower-grade glioma (LGG) and lung tumor samples. However, despite the extensive differences in global functional profiles, we find that tumors driven by the EGFR oncogene often display an inflammatory response signature characterized by interferon, interleukin or Toll-like receptor (TLR) signaling, depending on the tissue type. This study, to our knowledge, is the first comprehensive analysis of global pathway profiles associated with EGFR pathway-driven human cancers.

RESULTS
Heterogeneity of transcriptional footprints in EGFR pathway mutations
We chose to study the seven oncogenes (EGFR, ERBB2, NRAS, KRAS, BRAF, PIK3CA and AKT1) and two tumor suppressors (PTEN and NF1) implicated in the EGFR signaling network that are most frequently mutated in human cancers (Figure 1a). To gain insight into global molecular profiles of human cancers associated with these mutations, we set out to identify functional footprints (transcriptional or signaling) of these mutations. We define the transcriptional footprint of a mutation as the vector of individual impact scores (t-values) of the mutation on the expression of each gene in the transcriptome based on a multiple linear regression (MLR) model (Figure 1b and see Materials and methods). Use of MLR in our case allows for scoring of causal impacts of mutations on the mRNA levels of a gene, as it is unlikely that the causal direction will go the other way around (that is, mRNA → mutation). In addition, MLR allows for isolation of individual effects of each mutation by accounting for indirect confounding effects of other mutations in patients.

In contrast to a transcriptional footprint, a signaling footprint is defined as the vector of individual impact scores of the mutation on each post-translational modification as measured by reverse-phase protein arrays (RPPAs) in the TCGA data sets. We chose to study the functional footprints of the EGFR pathway genes in breast carcinoma (BRCA), colon adenocarcinoma (COAD), GBM, LGG, lung adenocarcinoma (LUAD) and skin melanoma (SKCM) samples where these mutations are most frequently observed (see Supplementary Information for mutation frequencies). For analyses, we used the extensive collection of RNAseq, gene copy number, RPPA and somatic mutation data sets for these cancers in TCGA.

We obtained transcriptional footprints for each mutation in each cancer type. We asked whether oncogenic aberrations of the EGFR pathway members have similar footprints. To answer this question, we performed all pairwise correlations of each transcriptional footprint in every cancer type. For each pairwise correlation, we only included t-values that had absolute values of >2 in at least one of the pair (that is, P<0.05 for correlation significance, see Materials and methods). As expected, based on prior knowledge, some of the most similar footprints belong to genes whose products are proximally located on a linear pathway, such as PIK3CA and AKT1 in BRCA, or KRAS and BRAF in LUAD and COAD, or NRAS and BRAF in SKCM (Figures 1c and d). In addition, weaker but still strong similarities were observed between pairs of other mutations (for example, NRAS–KRAS in COAD, and EGFR and NF1 in LGG) that are expected to impact similar downstream pathways. However, this correlation map also reveals some unexpected observations. First, transcriptional footprints of some gene mutations, such as ERBB2 or EGFR amplifications in BRCA or LUAD, show no similarity to footprints of genes that function downstream of these receptors, such as PIK3CA or KRAS (see Figure 1c). For example, the footprints of ERBB2 amplification had no similarity to those of PIK3CA, AKT1, NF1 mutations, or even of EGFR amplifications, in BRCA (Figure 1c). Similarly, EGFR amplification footprints in LUAD had minimal similarity to KRAS, BRAF, NF1 or even EGFR mutations, in LUAD (Figure 1c), indicating that these mutations lead to different downstream events. Another surprising observation was a minimal correlation among transcriptional footprints across cancers based on tissue of origin, even in the case of the same oncogene (see Figure 1c). Indeed, generally, there is only weak, if any, correlation in transcriptional footprints of EGFR, PIK3CA, KRAS, BRAF or NF1 gene mutations across the cancer types. A notable exception to this is the high similarity of transcriptional footprints of EGFR in GBM and LGG, which are both tumors of the brain (see Figure 1c). These observations suggest that although there is an overall similarity in the transcriptional footprints of EGFR pathway mutations, there is a significant heterogeneity across tissue types for the same oncogene, and also within a tissue type for certain oncogenes.

Heterogeneity of signaling footprints of EGFR pathway mutations
Next, we compared signaling footprints of EGFR pathway mutations based on RPPA measurements of certain phosphorylated signaling proteins known to be involved downstream of the EGFR pathway mutations (see Figure 2a). We chose these signaling outputs due to their known roles in EGFR signaling and the quality of the readings in RPPA data sets as assessed by at least one strong expected correlation (for example, phospho-MAPK (pMAPK) correlation with KRAS mutation, phospho-EGFR correlation with EGFR mutation or amplification). An analysis of signaling footprints of EGFR pathway mutations also reveals expected similarities as well as unexpected differences (Figure 2b). As no RPPA data were available for LGG at the time of this analysis, it was excluded from this portion of our study.

Expected correlations. Most consistent correlations of signaling events included some of the expected observations of high levels of tyrosine-phosphorylated EGFR and ERBB2 in tumors that carry EGFR or ERBB2 amplifications or EGFR mutations, high phospho-Akt levels (both S473 and T308 sites) in tumors with PIK3CA, AKT1 or PTEN mutations, and high phospho-MEK and pMAPK levels in tumors with KRAS or NRAS mutations (Figure 2b). KRAS mutations in LUAD lead to the activation of HER3, Src, Raf/Mek/MAPK and mTOR/S6 pathways, but not Akt. However, KRAS mutations in COAD only lead to the activation of Src and Mek/MAPK/RSK/S6 pathways.

PTEN loss in SKCM and COAD. Interestingly, PTEN mutations in SKCM or COAD did not reveal any correlation with Akt activation, despite loss of PTEN expression in both. In case of COAD, this may be attributed to a concomitant downregulation of Akt protein (see Figure 2b), which also correlates with a downregulation of AKT2 and AKT3 mRNA expression (not shown), indicating a possible feedback regulation of Akt activity in PTEN-negative colon cancers through a transcriptional mechanism. PTEN loss in SKCM did, however, show a significant correlation with phosphorylation of PKCα, which can also be activated by phosphoinositide 3-kinase signaling and inhibited by PTEN.8 Therefore, PTEN loss in melanomas may serve to activate protein kinase C (PKC) signaling, rather than the Akt pathway, which seems to be the major target of PTEN mutations in other cancers (see Figure 2b). Interestingly, PTEN mutations/loss in SKCM also correlates with ERBB2/HER2 overexpression. Such a correlation of PTEN loss with ERBB2/HER2 overexpression has not been previously reported. As ERBB2 is not amplified in SKCM (not shown) and PTEN mutations/loss does not correlate with increased ERBB2


Figure 1. Transcriptional heterogeneity in EGFR pathway-driven cancers. (a) EGFR pathway genes frequently mutated in human cancers and their presumed signaling interplay. (b) A diagram showing derivation of transcriptional footprints for mutations: several data types, including mutation statuses and a gene expression matrix, are provided for patient samples. Mutations can mutually overlap, necessitating a controlled approach to score their individual impacts on gene expression. An MLR scores individual impacts of each mutation on the expression of each gene, which results in a new matrix of t-values for each gene for each mutation. (c) A heatmap of correlations of transcriptional footprints of EGFR pathway mutations across cancers of different tissues. (d) A representative scatter plot of transcriptional footprints of AKT1 and PIK3CA mutations in breast carcinoma (BRCA).
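The derivation summarized in Figure 1b can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with an intercept and one 0/1 indicator per mutation, and it is not the authors' implementation (their exact model is specified in Materials and methods). The function names `invert`, `mlr_t_values` and `footprint_correlation`, and the toy data, are hypothetical and introduced only for illustration. The last function applies the filter described in the text, keeping a gene only if its t-value exceeds 2 in absolute value in at least one of the two footprints.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mlr_t_values(sample_mutations, expression):
    """t-value of each mutation indicator for one gene (assumed OLS model).

    sample_mutations: one list of 0/1 indicators per sample.
    expression: expression of a single gene, one value per sample.
    """
    n = len(expression)
    # Design matrix: intercept column followed by one column per mutation.
    X = [[1.0] + [float(m) for m in row] for row in sample_mutations]
    p = len(X[0])
    xtx = [[sum(X[s][i] * X[s][j] for s in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[s][i] * expression[s] for s in range(n)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [expression[s] - sum(X[s][j] * beta[j] for j in range(p))
             for s in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance
    # Skip the intercept; return one t-value per mutation.
    return [beta[j] / math.sqrt(sigma2 * xtx_inv[j][j]) for j in range(1, p)]

def footprint_correlation(fp_a, fp_b, threshold=2.0):
    """Pearson r over genes with |t| > threshold in at least one footprint."""
    pairs = [(a, b) for a, b in zip(fp_a, fp_b)
             if abs(a) > threshold or abs(b) > threshold]
    if len(pairs) < 3:
        return None  # too few informative genes to correlate
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Made-up toy data: 8 samples, two overlapping mutations (A, B).
samples = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]]
gene_expr = [1.1, 0.9, 4.1, 3.9, 0.9, 1.1, 3.9, 4.1]  # driven by A only
t_a, t_b = mlr_t_values(samples, gene_expr)
print("t(A) =", round(t_a, 2), " t(B) =", round(t_b, 2))
```

Repeating `mlr_t_values` for every gene and collecting the t-values of one mutation gives that mutation's transcriptional footprint. Because both mutation indicators enter the same model, a gene driven by mutation A alone receives a large t-value for A and a near-zero t-value for a co-occurring mutation B.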

mRNA levels (not shown), overexpression of ERBB2/HER2 protein in melanomas with PTEN loss must be through a post-transcriptional mechanism.

Other correlations. Activation levels of Src, STAT3 (signal transducer and activator of transcription 3), A- or c-Raf, mTOR, RSK, S6, stress-activated MAPKs (p38 and JNK) and of PKC were more variable. For example, mTOR and RSK activity, as measured by phospho levels of their downstream targets 4E-BP1 and S6, was strongly associated with ERBB2 amplifications and PTEN mutations in BRCA, KRAS and BRAF mutations in COAD, KRAS mutations in LUAD, and relatively weakly with PTEN mutations in GBM and EGFR amplifications in BRCA. On the other hand, stress-activated MAPKs p38 and JNK were activated by AKT1 mutations in BRCA and KRAS mutations in LUAD, whereas PKCα and PKCδ were activated by EGFR mutations in LUAD and by PTEN mutations in SKCM. STAT3 activation as measured by its Y705 phosphorylation was only observed in LUAD with EGFR mutations and, to a lesser extent, in those with KRAS mutations.

Unexpected correlations. In addition to expected results, the analysis of signaling footprints also revealed surprising observations. For example, NRAS mutation in COAD is not associated with higher levels of activated MEK/MAPK or Akt, although KRAS, BRAF and NF1 mutations in COAD are (Figures 2b and c). In addition, although NRAS in SKCM is strongly associated with MEK/MAPK activity, BRAF in SKCM is only associated with higher phospho-MEK, but not MAPK (Figures 2b and c). Interestingly, BRAF mutations correlate strongly with Akt T308 phosphorylation, a target of PDK1, in LUAD and SKCM, but not with S473. Akt activity has been found to correlate with T308, but not S473, phosphorylation in several contexts,9,10 indicating that BRAF mutations in LUAD and SKCM are also strongly associated with Akt activity. Yet another surprising observation is an almost complete lack of correlation of EGFR/ERBB2 amplification or mutation with Akt or MAPK activation in BRCA, GBM and LUAD, although a correlation exists between ERBB2 amplification and A-Raf/MEK pathway activation, and between Akt phosphorylation on T308 and EGFR amplification in BRCA (see Figure 2b). These results suggest that

Figure 2. Characterization of signaling footprints of EGFR pathway mutations. (a) A signaling diagram showing some of the known signaling connections in the EGFR pathway. Proteins in orange are measured in RPPA, and their phospho levels are included in the analysis in b. (b) A heatmap of t-values of each signaling protein for each of the mutations in respective cancers. White areas in the heatmap represent missing data. (c) Representative box plots of the indicated RPPA measurements. WT in each plot indicates tumor samples with no mutations/amplifications in any of the genes included in this analysis, whereas the others (for example, KRAS and BRAF) indicate patient samples only with the given oncogene (for example, KRAS means only KRAS mutation and no EGFR, BRAF or NF1 mutations). Numbers above some of the boxes in plots indicate significance P-values as calculated by MLR. (d) Box plots of RPPA measurements of pAKT at S473 and pMAPK in COAD accounting for double mutations as indicated below plots. The values above each box indicate significant P-values based on MLR with statistical interactions between mutations (see Materials and methods for MLR with interactions).
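The idea of an MLR with a statistical interaction between two mutations (Figure 2d) can be sketched for the simplest case. The sketch assumes a saturated two-mutation model, y ~ A + B + A:B, for which the least-squares interaction estimate reduces to a contrast of the four cell means. The authors' actual model is given in Materials and methods and may include further covariates. The function name `interaction_t` and the values below are made up for illustration and are not TCGA measurements.

```python
import math
from statistics import mean

def interaction_t(values_by_cell):
    """Interaction estimate and t-value for the model y ~ A + B + A:B.

    values_by_cell maps (a, b), with a and b in {0, 1}, to the list of
    measurements from samples with that combination of mutation statuses.
    """
    cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
    means = {c: mean(values_by_cell[c]) for c in cells}
    n_total = sum(len(values_by_cell[c]) for c in cells)
    # Pooled residual variance; the saturated model has 4 parameters.
    rss = sum((v - means[c]) ** 2 for c in cells for v in values_by_cell[c])
    sigma2 = rss / (n_total - 4)
    # Non-additive part: double mutant minus the sum of single effects.
    effect = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]
    se = math.sqrt(sigma2 * sum(1.0 / len(values_by_cell[c]) for c in cells))
    return effect, effect / se

# Hypothetical phospho-protein levels: neither mutation alone changes the
# readout, but the double mutant does (a non-additive effect).
levels = {
    (0, 0): [1.0, 1.2, 0.8],  # wild type
    (1, 0): [1.1, 0.9, 1.0],  # mutation A only
    (0, 1): [0.9, 1.0, 1.1],  # mutation B only
    (1, 1): [3.0, 3.2, 2.8],  # A:B double mutant
}
effect, t_value = interaction_t(levels)
print("interaction effect:", round(effect, 3), "t:", round(t_value, 2))
```

A large interaction t-value alongside near-zero single-mutation effects corresponds to the pattern reported in the text for double mutations that are associated with a readout when neither mutation alone is.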

signaling pathway activations in vivo may not always be in concordance with the known functions of respective oncogenes and tumor suppressors.

Statistical interactions between mutations. One possible explanation may be the confounding effects of interactions between some pairs of mutations, as some cancers, especially COAD and LUAD, contain a significant amount of co-occurrences in some mutations (see Supplementary Text). For example, in COAD, mutations in PTEN frequently co-occur with KRAS, BRAF or NF1, whereas PTEN mutations in SKCM frequently co-occur with BRAF. Accounting for double mutations in SKCM did not change the overall profile of MAPK and Akt phosphorylation across the mutations (see Supplementary Figure 1), indicating that the observed correlation of BRAF mutations with pAkt instead of pMAPK is not due to masking by indirect effects. Similarly, accounting for double mutations of NRAS, KRAS or BRAF with PTEN (KRAS:PTEN, BRAF:PTEN and NRAS:PTEN) in COAD did not improve significance of correlation of Akt or MAPK phosphorylation with any of these mutations (not shown). However, accounting for double mutations with NF1 did reveal some surprising effects in COAD. Interestingly, NRAS:NF1 double mutations, but not either of them alone, are associated with a significant Akt phosphorylation at S473 (Figure 2d), but not at T308 (not shown), whereas PTEN:NF1 double mutations, but neither alone, are significantly associated with pMAPK levels (Figure 2d). In fact, NF1 mutations alone have a negative impact on pMAPK in COAD (Figure 2d). These observations show that multiple mutations in the EGFR pathway can elicit combinatorial non-additive effects on the signaling pathways.

Many of the observations above are in stark contrast to the well-accepted notion of signaling impacts of these mutations. For example, the effect of NRAS/BRAF mutations on the MAPK pathway activity has been extensively reported in the literature,5 and EGFR overexpression both in vitro and in vivo is known to strongly activate Akt, STAT and MAPK pathways.2 However, it is important to note that most of our current understanding of the role of these proteins comes from in vitro

studies on tissue culture cells. Our analyses are done on tumor tissues from human patients and, therefore, may be more representative of bona fide effects of these mutations on signaling pathways. Moreover, some past studies on human tumor samples have identified similar conflicting results, such as lack of correlation of MAPK activity with BRAF mutations in the melanocytic nevi11 and lack of correlation of EGFR activity with downstream pathway activations.12 Therefore, it is possible that negative feedback mechanisms frequently act to restrict oncogenic pathways in these tumors. Accordingly, lack of Akt activation in ERBB2-amplifying BRCA, despite high expression of Akt mRNA (not shown) and protein, may be attributable to high levels of PTEN (see Figure 2b), which may constitute a negative feedback to inhibit Akt in these tumors. In addition, these mutations may act through alternative downstream pathways, such as in the case of PTEN mutations in SKCM, where they lead to the activation of PKC rather than the better-known target Akt (see above).

Heterogeneity of pathway footprints in breast cancer
Transcriptional footprints of EGFR pathway genes in BRCA showed a significant level of heterogeneity between receptors (ERBB2 and EGFR) and PIK3CA and AKT1 mutations (see Figure 1c). We asked whether a similar heterogeneity is also present at a pathway level. For this purpose, we defined a pathway footprint to be a vector of individual impact scores of a mutation on the expression of predefined molecular pathways. Pathway footprints are derived from transcriptional footprints by NetWalk,13 a computational network analysis procedure previously characterized by us and implemented in the software suite NetWalker.14 Briefly, NetWalk is a random-walk method that integrates gene-centric values (such as gene expression or t-value) with a network of a priori molecular interactions to derive interaction-centric values (edge flux (EF) values) based on combined assessment of gene values and their interconnectivity. The EF values are then used to derive a unique log-likelihood score for each pathway/process (process flux value, or PF value) as defined in the NCBI BioSystems database (see Materials and methods). The PF-values have a high correlation with traditional hypergeometric enrichment scores (not shown). However, although hypergeometric enrichment is used on a predefined list of genes, PF-values are calculated by considering the whole genomic distribution and, therefore, can identify processes with consistent, albeit subtle, distribution of input values for constituent genes.

Figure 3a shows a heatmap of PF-values for most significant processes for respective mutations in BRCA. In accordance with transcriptional footprint analyses, PIK3CA and AKT1 mutations have highly similar pathway footprints, whereas ERBB2 and EGFR have highly different pathway footprints that often were in opposite direction to each other or to PIK3CA/AKT1 footprints. For example, although breast cancers with ERBB2 amplifications have an extensive upregulation of the fatty acid and cholesterol biosynthesis machinery, PIK3CA or AKT1 mutations were mainly associated with fatty acid breakdown through β-oxidation in the peroxisome and mitochondria (Figures 3b and c). Therefore,

Figure 3. Comparative analysis of pathway footprints in BRCA. (a) A heatmap of process flux values for most significant processes/pathways for each mutation/amplification in BRCA. (b) Pathway plots of select processes from the heatmap in a. (i) Fatty acid β-oxidation in the peroxisome and mitochondria, where nodes are colored by t-values in PIK3CA-mutant BRCA. (ii) Fatty acid and cholesterol biosynthesis, where nodes are colored by t-values in ERBB2-amplified BRCA. (iii) Cellular extravasation, where nodes are colored by t-values in EGFR-amplified BRCA. (c) Box plots of mRNA levels of representative genes from the network plots in b. ‘Normal’ in each box plot indicates mRNA expression in the normal mammary tissue, whereas the rest are as in Figure 2c.
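The text compares process flux values with traditional hypergeometric enrichment scores. For reference, that comparator can be sketched as the upper-tail probability of drawing at least the observed number of pathway genes in a selected gene list. This is only the standard enrichment calculation, not the NetWalk procedure, and the function name `hypergeom_enrichment_p` and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_genome genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_genome, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_genome - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Made-up example: 10 genes in total, 4 in the pathway, 3 selected genes,
# 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print("enrichment P-value:", p_value)
```

This calculation depends on a cutoff that defines the selected gene list, which is the limitation the text contrasts with PF-values computed from the whole genomic distribution.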

although ERBB2-amplified tumors mainly switch on anabolic lipid metabolism, consistent with their high mTOR activity15 (see Figure 2b), PIK3CA and AKT1 mutant tumors mainly engage in oxidative lipid catabolism. On the other hand, EGFR-amplified breast cancers did not have specific activations of any of these pathways, but were mainly characterized by an extensive and selective upregulation of the innate immunity pathways, particularly the tumor necrosis factor, cellular extravasation and TLR pathways (Figures 3a–c). These observations demonstrate the heterogeneity of pathway activations impacted by different EGFR pathway mutations in breast cancer.

Heterogeneity of pathway footprints among EGFR-driven human cancers
Next, we wanted to conduct a comparative analysis of pathway footprints associated with activating mutations or amplifications of EGFR or ERBB2 in BRCA, GBM, LGG and LUAD. As expected from the transcriptional footprint analyses, EGFR amplifications in GBM and LGG have a high similarity of pathway footprints that is not shared by any other tissue pair (Figure 4a). As GBM and LGG are both brain tumors, this strengthens the conclusion that oncogenic functional footprints in cancers are highly tissue specific. However, we have observed a common effect among EGFR-related pathway footprints (see below).

Interestingly, both GBM and LGG pathway footprints are dominated by processes involved in the major histocompatibility complex (MHC) class I antigen presentation pathway. The MHC class I antigen presentation pathway is a ubiquitous pathway for the presentation of intracellular antigens on MHC class I receptors on the cell surface to CD8+ T cells. This is an important innate immunity mechanism whereby cytotoxic T cells recognize foreign antigens in virus-infected and transformed cells.16,17 It is striking that genes functioning at almost every step of this pathway, starting with the proteasomal degradation, endoplasmic reticulum calnexin/calreticulin system for recognition and loading of peptides, ATP-dependent peptide transporting machinery and endosomal/vacuolar delivery17 are also overexpressed along with MHC class I complex proteins in GBM and LGG (see Figures 4a and b). Although the connection between EGFR amplification and innate immunity has not been reported in the literature, a role for MHC class I antigens in immune evasion in GBM has been proposed,18 possibly suggesting a similar functional role for this pathway in EGFR+ GBM and LGG.

Strikingly, although MHC class I pathway seems to be unique to EGFR-amplifying GBMs and LGGs, LUADs with EGFR mutations, but

Figure 4. Comparative analysis of pathway footprints in EGFR-driven human cancers. (a) A heatmap of most significant process flux values in EGFR-driven cancers. (b) A diagram of network plots of different steps of MHC class I and II antigen presentation pathways in GBM/LGG and LUAD. Nodes are colored according to t-values for EGFR-amplifying LGG (for interferon, proteasome, antigen loading on MHC class I and MHC class I complex) and EGFR-mutated LUAD (lysosomal degradation, MHC class II complex). (c) A network plot of TLR signaling colored by t-values for EGFR-amplifying breast cancer. (d) An mRNA expression profile of IRAK1 (interleukin-receptor activated kinase 1) gene in EGFR-amplifying (EGFR+) BRCA. The orange circle in the plot shows the EGFR+ population. Each dot in this plot represents a patient sample colored by their relative expression of IRAK1 according to the color key. (e) Left: same as in d for SOCS2 mRNA expression in GBM. Right: same, but colored by methylation β-value (a high β-value indicates hypermethylation, a low value indicates hypomethylation). Network plots of additional pathways for BRCA, LGG and LUAD are provided in Supplementary Figure 3. [Figure legends: heatmap columns EGFR.BRCA, EGFR.LUAD.amp, EGFR.GBM, EGFR.LGG, EGFR.LUAD.m and ERBB2.BRCA; process flux value color key, −3 to 3; correlation t-value color key, −5 to 5; scatter plot axes, relative EGFR copy number versus EGFR mRNA expression (log).]
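The t-values used to color the network nodes in Figure 4 come from the multiple linear regression (MLR) described in Materials and Methods, where each t-value is a regression coefficient divided by its standard error. The following is a minimal sketch of that calculation using ordinary least squares in NumPy. It is not the authors' code: the function name `mutation_t_values`, the binary encoding of mutations and the simulated data are assumptions made only for illustration.

```python
import numpy as np


def mutation_t_values(mutations, expression):
    """Return t = beta / s.e.(beta) for every (mutation, gene) pair.

    mutations:  (n_samples, n_mutations) matrix of 0/1 mutation calls (assumed encoding).
    expression: (n_samples, n_genes) matrix of expression values.
    Result:     (n_mutations, n_genes) matrix of t-values (intercept dropped).
    """
    M = np.asarray(mutations, dtype=float)
    Y = np.asarray(expression, dtype=float)
    n, n_mut = M.shape
    X = np.column_stack([np.ones(n), M])        # intercept plus one column per mutation
    xtx_inv = np.linalg.inv(X.T @ X)            # requires linearly independent columns
    beta = xtx_inv @ X.T @ Y                    # least-squares coefficients, (n_mut + 1, n_genes)
    residuals = Y - X @ beta
    dof = n - (n_mut + 1)                       # residual degrees of freedom
    sigma2 = (residuals ** 2).sum(axis=0) / dof  # per-gene residual variance
    se = np.sqrt(np.outer(np.diag(xtx_inv), sigma2))
    return (beta / se)[1:]                      # drop the intercept row


# Hypothetical toy data: mutation 0 raises the expression of gene 0 only.
rng = np.random.default_rng(0)
n_samples = 200
M = (rng.random((n_samples, 2)) < 0.3).astype(float)
Y = rng.normal(size=(n_samples, 2))
Y[:, 0] += 2.0 * M[:, 0]

t = mutation_t_values(M, Y)
print(t.round(2))
```

Each column of the returned matrix is one gene and each row is one mutation, so a row read across all genes corresponds to what the paper calls a transcriptional footprint.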

not amplifications, seem to overexpress components of MHC class II machinery (see Figure 4a). In contrast to MHC class I, MHC class II is mainly involved in the presentation of extracellular antigens. Therefore, instead of the proteasome–endoplasmic reticulum–Golgi route in MHC class I presentation, MHC class II involves phagocytosis of extracellular antigens, digestion in the lysosome, packaging into MHC class II complexes and recycling back to the cell surface.17 Accordingly, in addition to MHC class II complex genes, EGFR-driven lung cancers also have significant overexpression of the lysosomal degradation machinery (see Figures 4a and b). Consistent with increased innate immunity signaling, LUADs, GBMs and LGGs driven by EGFR seem to have upregulated interferon signaling. Thus, it appears that a common innate immunity signaling machinery that triggers the production of genes involved in MHC class I and II presentation pathways is activated in brain and lung cancers driven by EGFR (Figure 4b).

Although ERBB2-amplifying breast cancers do not display any inflammatory response signature, EGFR-amplifying breast cancers have a significant upregulation of the tumor necrosis factor, TLR and interleukin pathway signaling, possibly converging on nuclear factor-κB, interleukin-receptor activated kinase 1–2 and receptor-interacting serine/threonine kinase pathways (Figures 4a, c and d). Therefore, it is likely that EGFR-driven human cancers are commonly characterized by an innate immunity signature; however, the pathways associated with innate immunity are tissue specific. At least in GBMs, specific activation of these pathways was associated with significant hypomethylation of respective genes, as demonstrated in Figure 4e for a representative gene SOCS2 (suppressor of cytokine signaling 2), a major target of cytokine signaling.

Effects of the tumor microenvironment on the functional footprints

The presence of non-tumor cells in a tissue sample can be a significant confounding factor, and our findings of a strong inflammatory phenotype in EGFR-driven cancers may suggest an indirect effect of infiltrating immune cells or of tissue necrosis. TCGA provides information on tumor cell percentage, lymphocyte and monocyte infiltration, as well as necrosis percentages for the analyzed tumor specimens, which are among the primary factors that could elicit an inflammatory phenotype. We first tested whether EGFR amplification (in BRCA and GBM) or mutations (in LUAD) correlate with these factors. It should be noted that in the case of LGG, the percentage of tumor necrosis was 0 (zero) for all but one measured sample (consistent with their low-grade status), whereas the other parameters were mostly not indicated, suggesting that at least tumor necrosis was not responsible for the inflammatory phenotype in these tumors. Tumor necrosis and lymphocyte infiltration did not significantly correlate with EGFR amplification status in BRCA or GBM, but EGFR mutations in LUAD significantly correlated with tumor necrosis (Figure 5a). This suggests that at least for LUAD, the inflammatory phenotype associated with EGFR mutations may be due to tumor necrosis. To test for the role of tumor necrosis and lymphocyte infiltration (monocyte infiltration was not adequately annotated for tumor samples) in the expression of respective inflammatory pathways, we included these parameters in our MLR model. As expected, tumor necrosis was a significant independent predictor of increased tumor necrosis factor and TLR2 pathway genes in BRCA, and lymphocyte infiltration was a significant independent predictor of MHC class II complex expression in LUAD (Figure 5b). However, EGFR amplifications in BRCA and GBM, and EGFR mutations in LUAD were still strongly correlated with the expression of respective inflammatory pathways after controlling for tumor necrosis and lymphocyte infiltration (Figure 5b). This indicates that the inflammatory phenotypes of EGFR-driven human cancers are likely to be tumor cell specific, rather than an artifact of immune cell infiltration.

Figure 5. Effect of tumor lymphocyte infiltration (TLI) and tumor necrosis (TN) on the inflammatory phenotype in EGFR-driven cancers. (a) A heatmap showing correlation strength (regression t-value) of EGFR amplification (GBM and BRCA) or mutation (LUAD) with TLI or TN. (b) Heatmaps showing partial t-values of correlation of EGFR amplification (BRCA and GBM) or mutation (LUAD), as well as TN and TLI with the respective inflammatory pathway genes. The t-values were calculated using MLR including all three terms (EGFR, TN and TLI). [Figure legends: regression t-value color keys, −4 to 4 in (a) and −5 to 5 in (b); genes shown include VCAM1, IRAK2, IRAK1, TLR2 and TNF for BRCA, IRF1, HLA-A, TAP1 and SOCS2 for GBM, and CTSH, HLA-DPA1, HLA-DRB6 and HLA-DMA for LUAD.]

Variable levels of genome instability among EGFR-driven human cancers

Despite significant heterogeneity of pathway footprints, there is a consistently high expression of genes involved in positive regulation of cyclin-dependent kinase activity, indicating a common proliferative phenotype of EGFR-driven human cancers (Figure 4a). However, interestingly, PLK1 signaling, DNA replication checkpoint and nucleotide-excision repair machinery seem to be highly expressed only in LGG (Figures 6a and b), but expressed at a low level in LUAD (Figure 6a). We asked whether such a pattern of expression of DNA repair machinery reflects cellular response to a high mutational load. To answer this question, we ranked patient samples in each data set by the number of their non-synonymous mutations and assigned them a normalized 'relative mutational load' score from 0 to 1, such that 0 indicates the patient sample with the least number of non-synonymous mutations for a given tissue and 1 indicates the patient sample with the highest number of non-synonymous mutations. We have found that, consistent with the expression of DNA repair and checkpoint machinery, although EGFR-driven LGG and EGFR/ERBB2-driven BRCA are associated with a significantly high mutational load, EGFR-driven LUADs have a lower mutational load relative to other LUADs (Figure 6c). Therefore, in addition to pathway footprint heterogeneity,

EGFR-driven cancers of different origins also display different levels of genomic instability, which may explain some of the observed pathway heterogeneity.

Discrepancy of functional footprints between cancer cell lines and tissues

Cancer cell lines are extensively used to study oncogenic mechanisms and their targetable vulnerabilities. With the availability of extensive genomic data for a large panel of cancer cell lines, it is now possible to conduct comparative analyses of functional architectures in cancer cell lines and tissues.19,20 Cancer cell lines of at least some tissue origins have been shown to closely resemble respective primary cancers in situ in terms of genomic make-up and overall transcriptional profiles.21 However, to our knowledge, no analyses have been done to test for consistency of genotype-specific functional architectures in cell lines. As our analyses reveal a disconnect between EGFR pathway activation signatures and current knowledge, which is mainly based on in vitro studies, we decided to conduct a comparative

Figure 6. Genomic instability in EGFR-driven cancers. (a) A network plot of ATM/CHEK signaling, where nodes are colored by their t-values in EGFR-amplifying LGG. (b) mRNA expression profiles of WEE1 and CHEK1 in LGG. Each dot is a patient sample colored by WEE1 or CHEK1 mRNA expression, respectively, according to the color key. The orange circle indicates the EGFR-amplifying (EGFR+) population. (c) Box plots of relative mutational loads of EGFR-driven cancers. Relative mutational load is such that 1 indicates the patient sample with the highest number of non-synonymous mutations for a given cancer type (for example, GBM) and 0 indicates the patient sample with the least number of non-synonymous mutations for the given cancer type. [Figure legends: correlation t-value color key, −5 to 5; scatter plot axes, relative EGFR copy number versus EGFR mRNA expression (log); box plot groups EGFR.LGG, EGFR.GBM, EGFR.BRCA, ERBB2.BRCA, EGFR.LUAD.m and EGFR.LUAD.amp, with relative mutational load from 0.0 to 1.0.]
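The relative mutational load in panel (c) is described as a rank-based score scaled from 0 to 1 within each cancer type. A minimal sketch of one way to compute such a score is given below. The function name `relative_mutational_load`, the use of average ranks for tied samples and the example counts are assumptions for illustration, since the text does not say how ties were handled.

```python
import numpy as np


def relative_mutational_load(mutation_counts):
    """Rank samples by non-synonymous mutation count and rescale ranks to [0, 1].

    The sample with the fewest mutations scores 0 and the one with the most
    scores 1. Tied samples share the average of their ranks (an assumption).
    """
    counts = np.asarray(mutation_counts, dtype=float).ravel()
    values, inverse, freq = np.unique(counts, return_inverse=True, return_counts=True)
    if values.size < 2:
        raise ValueError("need at least two distinct mutation counts")
    first_rank = np.cumsum(freq) - freq          # 0-based rank of the first tied sample
    average_rank = first_rank + (freq - 1) / 2.0
    ranks = average_rank[inverse]
    return (ranks - ranks.min()) / (ranks.max() - ranks.min())


# Hypothetical mutation counts for five samples of one cancer type.
scores = relative_mutational_load([12, 40, 40, 310, 95])
print(scores)
```

Because the score depends only on rank order, a single hypermutated sample does not compress the scores of the remaining samples, which is one reason a rank-based scale may be preferable to min-max scaling of the raw counts.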

Figure 7. Analyses of consistencies among oncogenic footprints in situ and in vitro. (a) A correlation heatmap of transcriptional footprints associated with the indicated oncogenes in respective cancer cell lines. In case of the colon, KRAS mutations in the G12 and other positions were separated due to sufficient number of samples. In the lung, all of the mutations were at the G12 position. (b) Correlation heatmaps of transcriptional footprints associated with the given oncogenes between two independent TCGA data sets and cancer cell lines. The color of each oncogene name in the heatmaps indicates the source data set for the calculation of its corresponding footprint (red: TCGA set 1, blue: TCGA set 2 and black: genomics of drug sensitivity cell line set). Only the oncogenes with sufficient number of corresponding samples in the cell line data set (genomics of drug sensitivity) were included in this analysis. [Figure legends: tissues Breast, Colon, Lung and Melanoma; footprints shown include PTEN, PIK3CA, ERBB2, BRAF, KRAS, KRAS.G12, KRAS.other, NRAS and EGFR.m; Pearson's r color key, −0.8 to 0.8.]
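The heatmaps in Figure 7 report Pearson correlations between footprints, and Materials and Methods states that each pairwise correlation used only t-values with an absolute value above 2 in either of the two footprints. A minimal sketch of that filtering step follows. The function name `footprint_correlation` and the example vectors are hypothetical, and the sketch is an illustration of the stated rule rather than the authors' implementation.

```python
import numpy as np


def footprint_correlation(t_a, t_b, threshold=2.0):
    """Pearson's r between two footprints (vectors of per-gene t-values).

    Only genes with |t| > threshold in at least one footprint are used,
    which removes genes that neither mutation appears to affect.
    Returns (r, number_of_genes_used).
    """
    t_a = np.asarray(t_a, dtype=float)
    t_b = np.asarray(t_b, dtype=float)
    if t_a.shape != t_b.shape:
        raise ValueError("footprints must cover the same genes in the same order")
    keep = (np.abs(t_a) > threshold) | (np.abs(t_b) > threshold)
    if keep.sum() < 3:
        raise ValueError("too few informative genes to correlate")
    r = np.corrcoef(t_a[keep], t_b[keep])[0, 1]
    return float(r), int(keep.sum())


# Hypothetical t-values for five genes in two footprints.
footprint_1 = [3.0, -4.0, 0.5, 2.5, -0.1]
footprint_2 = [2.8, -3.5, 0.2, 1.0, -2.2]
r, n_used = footprint_correlation(footprint_1, footprint_2)
print(n_used, round(r, 3))
```

Here the third gene is dropped because its t-value is below the threshold in both footprints, whereas the last two genes are retained because each passes the threshold in one of the two.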

analysis of EGFR pathway functional footprints in cancer cell lines. Using the genomic data sets from the 'Genomics of drug sensitivity' project19 (the only one that has somatic mutation and copy number change data for select oncogenes/tumor suppressors and microarray gene expression data for cell lines), we calculated transcriptional footprints of cancer cell lines of the breast, colon, lung and malignant melanoma for mutations that had sufficient number of samples. Correlation profiles of oncogenic transcriptional footprints among cancer cell lines were generally consistent with those of tissues, with strongest correlations being observed between KRAS and BRAF footprints in the colon, and NRAS and BRAF footprints in melanoma (Figure 7a). In accordance with the clinical data sets, intertissue correlations for most footprints were generally weaker, although the BRAF footprint in melanoma cells showed a strong similarity to the BRAF and KRAS footprints in the colon, but not the lung cancer cell lines.

Next, we sought to test for consistency of EGFR pathway footprints between clinical and cell line data sets. To allow for higher-confidence correlations, we calculated two independent transcriptional footprints for each mutation in each cancer type using two independent sets of patient samples in the respective clinical data sets in TCGA. Thus, our source data sets consisted of two independent sets of clinical and one set of cell line data sets. A correlation profile of oncogenic footprints in clinical and cell line samples reveals a high consistency between clinical footprints, which attests to their reproducibility in independent data sets (Figure 7b). However, oncogenic footprints in cell lines generally correlated weakly, albeit still generally statistically significantly, with their corresponding footprints in clinical data sets. For example, in breast cancer data sets, transcriptional footprints of ERBB2 amplification or PIK3CA mutations are highly similar in the two clinical data sets, but they are less similar to the respective footprints in cell lines (Figure 7b). A particularly interesting profile emerges in the colon cancer data sets, where although the transcriptional footprints of KRAS and BRAF mutations seem to have diverged in cell lines, they have retained their similarities to each other. Overall, the high consistency of results between independent clinical data sets suggests that our calculated transcriptional footprints likely reflect true functional impacts of the respective mutational events in situ. However, our results also reveal a high level of divergence of cellular functional make-up during in vitro propagation of cell lines, which could explain the high level of disconnect between our findings and the current knowledge of EGFR pathway mechanisms in tumorigenesis.

DISCUSSION

Most of our current understanding of the tumorigenic mechanisms stems from in vitro studies on tissue culture and in vivo studies on mouse models of cancer. Although the signaling pathways operating downstream of receptor tyrosine kinases and the mechanistic details of many downstream interactions are currently well characterized, we do not know how relevant these mechanisms are in the clinical setting. As most of the current therapeutic efforts rely on these mechanisms,3–5 it is vital that we dissect clinically relevant mechanisms from those induced by experimental artifacts. Although past studies have reported on global classification of tumor types based on their transcriptional profiles,22 to our knowledge, the current study is the first comprehensive analysis of the TCGA data sets to delineate mutation-specific transcriptional, signaling and pathway profiles across clinical human cancer specimens. Despite several correlations consistent with the current view, our analyses reveal a significant discrepancy of in vivo functional profiles of tumors with the currently accepted model of EGFR pathway-driven tumorigenesis.

MLR to identify functional footprints

Correlating alterations at the DNA level to changes at the mRNA and phenotypic levels to infer causal relationships has been done previously under different contexts.23,24 For example, Akavia et al.24 used Bayesian approaches modified from Segal et al.25 to identify driver pathways downstream of copy number alterations in melanoma. Our approach in this study makes use of MLR to score the individual impact of each EGFR pathway mutation on the expression of every gene (transcriptional footprint) or protein state (signaling footprint). Such an approach allows for derivation of unique scores (t-values), explaining the relative impact of a gene mutation or copy number gain on a functional trait, which can be integrated with a priori molecular networks to derive pathways of interest. In contrast to the approach of Akavia et al.,24 we do not predefine modules of coregulated genes, but rather elect to evaluate the whole distribution of t-values (a footprint). This approach may have advantages over module-based approaches when expressions of genes in a pathway may not necessarily correlate. For example, we have found that VCP and SYVN1, central members of the endoplasmic reticulum-associated degradation pathway, are both significantly upregulated in ERBB2-amplifying breast cancers, although their expressions in ERBB2-amplifying breast cancers show a slight mutually exclusive pattern. NetWalk analysis of MLR results identifies this pathway as significantly upregulated in ERBB2+ breast cancers, and we showed that this pathway has a requisite role in the survival of ERBB2+ breast cancer cells (manuscript under consideration elsewhere). A module-based approach, such as that of Akavia et al.,24 would probably have missed this correlation owing to the lack of correlation of expressions among endoplasmic reticulum-associated degradation member genes.

Tissue and oncogene specificity of functional footprints

Our findings suggest that the global transcriptional and signaling programs activated by driver mutations of the EGFR pathway can be both tissue and mutation specific. Even in the case of the same oncogene, EGFR, its activating mutations in LUADs can have markedly different functional footprints from those of EGFR amplifications at transcriptional, signaling and pathway levels. This finding may not be surprising, given that patients with EGFR mutations in non-small cell lung cancers respond differently to treatment with the EGFR inhibitor gefitinib compared with those with EGFR copy number gains,26,27 implying different cellular functional roles of the two mechanisms of EGFR activation.

EGFR mutations or amplifications also have markedly different profiles across different tissues (Figures 1–4). Although EGFR amplifications in GBM and LGGs, both brain cancers, had highly similar functional footprints, they were significantly different from EGFR mutations or amplifications in other cancers. Moreover, strikingly, EGFR activations were not generally associated with the activation of downstream Akt and MAPK pathways in cancers. This and other observations (for example, no correlation of BRAF mutation with MAPK activation) are in contrast to the currently widely accepted mechanisms of signal transduction downstream of these oncogenes.3,5,6

At least in the case of rare genotypes (for example, EGFR amplifications in breast cancer), some of the discrepancy could be credited to the low statistical power due to small sample sizes, which may prevent accurate measurement of functional footprints. Another possible explanation for this discrepancy is that our analyses are entirely based on analyses of tumor samples, with no normal control, for which no RPPA data are available in TCGA. Therefore, it is possible that although MAPK levels are consistently activated in the majority of tumor samples, our analysis interprets less-pronounced activation of MAPK by one mutation (for example, EGFR amplification in GBM) as a lack of activation due to the higher overall baseline across all tumor samples. However, we argue

that the panel of tumor samples in each data set, where no mutations in either of these tested EGFR pathway genes exist, could serve as an internal control for our purposes, at least in this case. Therefore, our conclusion that a given oncogenic activation does not significantly further contribute to the activation of MAPK or AKT pathways is valid in light of these data. Accordingly, MAPK levels in EGFR-mutated or -amplified lung cancers are not significantly higher than in patients that have no activating mutations/amplifications in EGFR, KRAS, BRAF or NF1 (Supplementary Figure 2).

In addition, it is important to note that the bulk of studies on EGFR signaling pathways were conducted under in vitro conditions, often involving growth factor stimulations after prolonged cellular starvation, and therefore may be affected by tissue culture artifacts. Accordingly, we show that the oncogenic functional footprints show a high level of divergence in cell lines, despite retaining some similarities to the tumor tissue. Although large-scale RPPA data for cancer cell lines to allow for a comparative analysis of signaling footprints have not been reported (to our knowledge), discrepancies between tumor tissue and cell lines at the signaling level have been reported. For example, Hegi et al.12 reported that although downstream pathway activation in EGFR-amplifying GBM cell lines in vitro was EGFR dependent, this dependency was not present in GBM patient tumors with EGFR amplifications. Such divergence of the functional make-up of tumor cells in culture is not unexpected, as the tumor stroma may have an important role in the shaping of tumor molecular architecture in situ. Therefore, in vivo roles of EGFR pathway mutations are complex, are likely confounded by feedback and alternative pathway activations in an oncogene- and tissue-specific manner, and may be highly dissimilar to their counterparts in vitro. These results may invite skepticism about the suitability of cell line models of tumorigenesis and, more importantly, about the feasibility of targeting common nodes (for example, Akt, mTOR, MEK and MAPK) downstream of different mutations in cancers, which is currently a major trend in cancer-targeted therapeutics.

Despite the significant heterogeneity of functional footprints, EGFR-driven brain, lung and breast cancers showed a common activation of inflammatory response pathways, although the individual pathways differed in a tissue-specific manner. For example, EGFR-driven GBM and LGG were characterized by activation of the interferon–MHC class I antigen presentation pathway, whereas EGFR-driven LUAD was characterized by the interferon–MHC class II antigen presentation pathway. EGFR-amplifying breast cancers, on the other hand, had a significant activation of TLR and pathways converging on nuclear factor-κB and interleukin-receptor activated kinase signaling. This finding of a common denominator in EGFR-driven cancers may have some therapeutic implications, especially given that EGFR-targeting drugs have not yet demonstrated significant efficacy in EGFR-driven cancers of tissues other than the lung.28–30

MATERIALS AND METHODS

Functional footprint

A functional footprint is a vector of individual impact scores of a mutation on the expression of each mRNA (transcriptional footprint) or protein (signaling footprint). To score individual impacts of mutations $m_k$, $k = 1 \ldots l$, on the expression of a gene $i$, $x_i$, we construct a linear model of the form

$$x_i = \beta_{i0} + \sum_{k=1}^{l} \beta_{ik} m_k + \varepsilon_i,$$

where $\beta_{ik}$ represents the regression coefficient of mutation $k$ for the expression of gene $i$, and the $\varepsilon_i$ are assumed to be independent, normally distributed error terms. The $\beta_{ik}$ values reflect the effects of mutations on the expression of a gene $i$ (because the other way around is unlikely). For example, a highly positive $\beta_{ik}$ value would indicate that the mutation $k$ positively contributes to $x_i$; for example, ERBB2 amplification leads to high ERBB2 mRNA expression; whereas a highly negative $\beta_{ik}$ would indicate a negative impact of the mutation on $x_i$. Note that the $\beta_{ik}$ values indicate the magnitude of impact, but not the significance of the impact, due to the error associated with $\beta_{ik}$. Therefore, for our analyses, we use the significance scores for $\beta$, or t-scores:

$$t_{ik} = \frac{\beta_{ik}}{s(\beta_{ik})},$$

where $s(\beta_{ik})$ represents the s.e. associated with $\beta_{ik}$. The t-values directly reflect the significance of the impact of each mutation (for example, $t > 1.96$ indicates a positive impact with a P-value of < 0.05).

We construct MLR with statistical interactions between pairs of mutations as

$$x_i = \beta_{i0} + \sum_{k=1}^{l} \beta_{ik} m_k + \sum_{j \neq k}^{l} \beta_{ijk} m_j m_k + \varepsilon_i,$$

where $\beta_{ijk}$ represents the regression coefficient for the impact of interaction between mutations $j$ and $k$ on the expression of gene $i$.

Correlations of footprints

For correlations of the transcriptional footprints in Figures 1c and 7, we did all pairwise correlations of the corresponding t-values. However, for each pairwise correlation, we only considered t-values that had an absolute value > 2, which corresponds to a P-value of < 0.05, in either of the two samples. The degrees of freedom for any of the pairwise correlations in such a manner were always > 500.

Data sets

Copy number (SNP6 platform), gene expression (RNAseq, Agilent), RPPA, methylomics and somatic mutations data sets (MAF files, level 2) were obtained from the TCGA (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp). The gene expression and mutation/amplification data for cancer cell lines were obtained from The Genomics of Drug Sensitivity project web site (http://www.cancerrxgene.org/). For analyses in Figure 7, patient samples in each of the respective cancer types (BRCA, COAD, LUAD and SKCM) were randomly divided into two sample sets to obtain two independent transcriptional footprints for each mutational event in each cancer type.

Defining mutations

For activating mutations in oncogenes, following the criteria of Vogelstein et al.31 for an activating mutation, we only considered mutations that occurred at a relatively higher frequency (> 10%) in the same data set. This criterion for activating mutations did not significantly affect our results (not shown), as such mutations accounted for an overwhelming majority of oncogenic mutations in each data set (see Supplementary Text). To map genomic coordinates in MAF files to corresponding positions in proteins, we used ANNOVAR.32 For inactivating mutations in tumor suppressor genes, we considered all mutations, regardless of their relative frequency. For amplifications, we considered a relative copy number value of > 1 to indicate genomic amplification.

Scoring tumor lymphocyte infiltration and tumor necrosis

In TCGA, tumor lymphocyte infiltration and tumor necrosis are provided as percentage scores using at least two slides for each tumor sample. For our analyses, we chose the highest percentage of tumor lymphocyte infiltration or tumor necrosis in any of the slides for any given sample to serve as the tumor lymphocyte infiltration or tumor necrosis score.

Networks

A knowledgebase of functional interactions between gene products was assembled from online databases. Protein–protein interactions, including signaling relationships, were obtained from HPRD,33 MINT,34 Reactome,35 BIND,36 BioGRID,37 Nature Pathway Database (http://pid.nci.nih.gov/), Biocarta (http://www.biocarta.com/) and PathwayCommons;38 transcription factor–gene target relationships were obtained from TRANSFAC,39 ORegAnno,40 ENCODE41 and MSigDB.42 Metabolic relationships between gene products were defined such that genes whose products catalyze consecutive reactions (that is, product of the reaction catalyzed by one is used as a reactant in the reaction catalyzed by the other gene product)

were assigned an interaction; metabolic reactions catalyzed by human gene products were obtained from HMDB,43 BiGG44 and KEGG.45 To increase the coverage of our knowledgebase, we also assigned interactions between pairs of genes if they shared GeneRIFs assigned to them in Gene.46 Overall, our knowledgebase consisted of 444 828 unique interactions among 18 722 genes, which is available from authors upon request.

Network analyses

To identify pathways/networks associated with transcriptional footprints (pathway footprints), we used NetWalk, a random-walk method for the scoring of functional pathways and network interactions. The individual impact scores (t-values) are used as weights ($w = e^{t}$: weights must be positive) in the transition probability matrix $P$ in NetWalk:

$$p_{ij} = \frac{w_j}{\sum_{k \in N_i} w_k},$$

where $N_i$ is the set of network neighbors of node $i$, $j \in N_i$, and $p_{ij}$ is the transition probability from node $i$ to node $j$. We define visitation probabilities of nodes, $\mathbf{p}$, in the random walk as the solution to the eigenvector equation:

$$\mathbf{p} = \mathbf{p}\left(P(1 - q) + \frac{q}{\sum w}\,\mathbf{1}_n \mathbf{w}^{T}\right),$$

where $P = \{p_{ij}\}_{n \times n}$ is the transition probability matrix, $q$ is the restart probability for the random walk and $\mathbf{1}_n$ is a unit vector of length $n$ (total number of network nodes). The second term on the right-hand side is a matrix with rank one that (1) adds a restart probability to the random walker depending on the weights of nodes and (2) ensures that the equation converges to a unique $\mathbf{p}$. Visitation probability of the network interaction between nodes $i$ and $j$, $\mu_{ij}$, is defined as

$$\mu_{ij} = p_i\,p_{ij}.$$

The vector $\boldsymbol{\mu}$ reflects the probabilities of the interactions at the end of the random walk process, and each $\mu_{ij}$ reflects the weights (t-values) of immediate nodes $i$ and $j$, and the weights and connectivity of nodes in the local network neighborhood. To control for topological bias in the network, we also calculate $\mu_{ij}^{0}$, which is calculated by setting all $w = 1$ (that is, all $t = 0$). Final EF values are defined as the log-likelihood

$$EF_{ij} = \log\frac{\mu_{ij}}{\mu_{ij}^{0}}.$$

As a pathway or a functional process is technically a set of interactions, we define probability of a pathway/functional process $k$, $\varphi_k$, as the cumulative probability of its constituent interactions:

$$\varphi_k = \sum_{ij \in k} \mu_{ij}.$$

Finally, a PF is the log-likelihood

$$PF_k = \log\frac{\varphi_k}{\varphi_k^{0}}.$$

NetWalk analyses were performed in NetWalker, a stand-alone software suite for network-based genomic data analyses.14

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENTS

This work was supported in part by Marlene-Ride Cincinnati Breast Cancer Foundation Award and Cincinnati Children's Trustee Award to KK. We thank Biplab DasGupta for helpful discussions of the results. ASG acknowledges CONACYT-México for support from Estancias Posdoctorales al Extranjero (grant number 203863).

REFERENCES

1 Oda K, Matsuoka Y, Funahashi A, Kitano H. A comprehensive pathway map of epidermal growth factor receptor signaling. Mol Syst Biol 2005; 1: 0010.
2 Yarden Y, Sliwkowski MX. Untangling the ErbB signalling network. Nat Rev Mol Cell Biol 2001; 2: 127–137.
3 … implications for therapeutic approaches. Expert Opin Ther Targets 2012; 16(Suppl 2): S17–S27.
4 Ciardiello F, De Vita F, Orditura M, Tortora G. The role of EGFR inhibitors in nonsmall cell lung cancer. Curr Opin Oncol 2004; 16: 130–135.
5 Salama AK, Flaherty KT. BRAF in melanoma: current strategies and future directions. Clin Cancer Res 2013; 19: 4326–4334.
6 Karnoub AE, Weinberg RA. Ras oncogenes: split personalities. Nat Rev Mol Cell Biol 2008; 9: 517–531.
7 Pearson G, Bumeister R, Henry DO, Cobb MH, White MA. Uncoupling Raf1 from MEK1/2 impairs only a subset of cellular responses to Raf activation. J Biol Chem 2000; 275: 37303–37306.
8 Parekh DB, Katso RM, Leslie NR, Downes CP, Procyk KJ, Waterfield MD et al. Beta1-integrin and PTEN control the phosphorylation of . Biochem J 2000; 352(Pt 2): 425–433.
9 Vincent EE, Elder DJ, Thomas EC, Phillips L, Morgan C, Pawade J et al. Akt phosphorylation on Thr308 but not on Ser473 correlates with Akt protein kinase activity in human non-small cell lung cancer. Br J Cancer 2011; 104: 1755–1761.
10 Gallay N, Dos Santos C, Cuzin L, Bousquet M, Simmonet Gouy V, Chaussade C et al. The level of AKT phosphorylation on threonine 308 but not on serine 473 is associated with high-risk cytogenetics and predicts poor overall survival in acute myeloid leukaemia. Leukemia 2009; 23: 1029–1038.
11 Uribe P, Andrade L, Gonzalez S. Lack of association between BRAF mutation and MAPK ERK activation in melanocytic nevi. J Invest Dermatol 2006; 126: 161–166.
12 Hegi ME, Diserens AC, Bady P, Kamoshima Y, Kouwenhoven MC, Delorenzi M et al. Pathway analysis of glioblastoma tissue after preoperative treatment with the EGFR tyrosine kinase inhibitor gefitinib--a phase II trial. Mol Cancer Ther 2011; 10: 1102–1112.
13 Komurov K, White MA, Ram PT. Use of data-biased random walks on graphs for the retrieval of context-specific networks from genomic data. PLoS Comput Biol 2010; 6: e1000889.
14 Komurov K, Dursun S, Erdin S, Ram PT. NetWalker: a contextual network analysis tool for functional genomics. BMC Genomics 2012; 13: 282.
15 Laplante M, Sabatini DM. mTOR signaling in growth control and disease. Cell 2012; 149: 274–293.
16 Cresswell P, Ackerman AL, Giodini A, Peaper DR, Wearsch PA. Mechanisms of MHC class I-restricted antigen processing and cross-presentation. Immunol Rev 2005; 207: 145–157.
17 Vyas JM, Van der Veen AG, Ploegh HL. The known unknowns of antigen processing and presentation. Nat Rev Immunol 2008; 8: 607–618.
18 Wolpert F, Roth P, Lamszus K, Tabatabai G, Weller M, Eisele G. HLA-E contributes to an immune-inhibitory phenotype of glioblastoma stem-like cells. J Neuroimmunol 2012; 250: 27–34.
19 Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012; 483: 570–575.
20 Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012; 483: 603–607.
21 Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006; 10: 515–527.
22 Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010; 17: 98–110.
23 Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ et al. Variations in DNA elucidate molecular networks that cause disease. Nature 2008; 452: 429–435.
24 Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC et al. An integrated approach to uncover drivers of cancer. Cell 2010; 143: 1005–1017.
25 Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003; 34: 166–176.
26 Fukuoka M, Wu YL, Thongprasert S, Sunpaweravong P, Leong SS, Sriuranpong V et al. Biomarker analyses and final overall survival results from a phase III, randomized, open-label, first-line study of gefitinib versus carboplatin/paclitaxel in clinically selected patients with advanced non-small-cell lung cancer in Asia (IPASS). J Clin Oncol 2011; 29: 2866–2874.
27 Dahabreh IJ, Linardou H, Siannis F, Kosmidis P, Bafaloukos D, Murray S. Somatic EGFR mutation and gene copy gain as predictive biomarkers for response to tyrosine kinase inhibitors in non-small cell lung cancer. Clin Cancer Res 2010; 16: 291–303.
3 De Luca A, Maiello MR, D’Alessio A, Pergameno M, Normanno N. The RAS/RAF/ 28 Troiani T, Martinelli E, Capasso A, Morgillo F, Orditura M, De Vita F et al. Targeting MEK/ERK and the PI3K/AKT signalling pathways: role in cancer pathogenesis and EGFR in pancreatic cancer treatment. Curr Drug Targets 2012; 13: 802–810.

29 Taylor TE, Furnari FB, Cavenee WK. Targeting EGFR for treatment of glioblastoma: molecular basis to overcome resistance. Curr Cancer Drug Targets 2012; 12: 197–209.
30 Saxena R, Dwivedi A. ErbB family receptor inhibitors as therapeutic agents in breast cancer: current status and future clinical perspective. Med Res Rev 2010; 32: 166–215.
31 Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz Jr. LA, Kinzler KW. Cancer genome landscapes. Science 2013; 339: 1546–1558.
32 Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010; 38: e164.
33 Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P et al. Human protein reference database--2006 update. Nucleic Acids Res 2006; 34(Database issue): D411–D414.
34 Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L et al. MINT: the molecular INTeraction database. Nucleic Acids Res 2007; 35(Database issue): D572–D574.
35 Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 2005; 33(Database issue): D428–D432.
36 Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW. BIND--the biomolecular interaction network database. Nucleic Acids Res 2001; 29: 242–245.
37 Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M et al. The BioGRID interaction database: 2008 update. Nucleic Acids Res 2008; 36(Database issue): D637–D640.
38 Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 2011; 39(Database issue): D685–D690.
39 Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V et al. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000; 28: 316–319.
40 Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S et al. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res 2008; 36(Database issue): D107–D113.
41 Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C et al. Architecture of the human regulatory network derived from ENCODE data. Nature 2012; 489: 91–100.
42 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27: 1739–1740.
43 Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 2009; 37(Database issue): D603–D610.
44 Schellenberger J, Park JO, Conrad TM, Palsson BO. BiGG: a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 2010; 11: 213.
45 Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000; 28: 27–30.
46 Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2007; 35(Database issue): D26–D31.
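
Illustrative sketch of the NetWalk scoring
The random-walk scoring described under Network analyses can be illustrated on a toy network. The sketch below is not the NetWalker implementation; it is a minimal NumPy reading of the equations in that section, under several assumptions that are ours rather than the authors': the function names (`netwalk`, `edge_flux`, `pathway_flux`) are hypothetical, the restart probability is set to q = 0.01 because no value is stated in this section, the logarithm is taken as the natural log, the stationary vector is found by simple power iteration, and the four-node network and t-values are invented for illustration.

```python
import numpy as np

def netwalk(adj, t, q=0.01, tol=1e-12, max_iter=100_000):
    """Return node (p) and interaction (mu) visitation probabilities.

    adj : symmetric 0/1 adjacency matrix; every node needs >= 1 neighbor.
    t   : per-node impact scores (t-values).
    q   : restart probability (assumed value; not given in the text).
    """
    adj = np.asarray(adj, dtype=float)
    w = np.exp(np.asarray(t, dtype=float))   # w = e^t keeps weights positive
    n = len(w)
    P = adj * w                              # P[i, j] = adj[i, j] * w[j]
    P /= P.sum(axis=1, keepdims=True)        # normalise over neighbors N_i
    # Rank-one restart term: every row equals q * w / sum(w).
    M = (1.0 - q) * P + (q / w.sum()) * np.outer(np.ones(n), w)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):                # power iteration for p = p M
        nxt = p @ M
        done = np.abs(nxt - p).sum() < tol
        p = nxt
        if done:
            break
    mu = p[:, None] * P                      # mu_ij = p_i * p_ij
    return p, mu

def edge_flux(adj, t, q=0.01):
    """EF_ij = log(mu_ij / mu0_ij); mu0 uses all t = 0 (all w = 1)."""
    _, mu = netwalk(adj, t, q)
    _, mu0 = netwalk(adj, np.zeros(len(t)), q)
    mask = np.asarray(adj) > 0
    ef = np.full(mu.shape, np.nan)           # NaN where no interaction exists
    ef[mask] = np.log(mu[mask] / mu0[mask])
    return ef, mu, mu0

def pathway_flux(mu, mu0, edges):
    """PF_k = log(phi_k / phi0_k), with phi_k the sum of mu over edges in k."""
    rows, cols = zip(*edges)
    rows, cols = list(rows), list(cols)
    return float(np.log(mu[rows, cols].sum() / mu0[rows, cols].sum()))

# Toy network (invented): edges 0-1, 0-2, 1-2, 2-3; node 2 has a high t-value.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
t = np.array([0.0, 0.0, 3.0, 0.0])

ef, mu, mu0 = edge_flux(adj, t)
pf = pathway_flux(mu, mu0, [(1, 2), (2, 3)])
print("EF(1->2):", ef[1, 2], "EF(0->1):", ef[0, 1], "PF:", pf)
```

In this toy case, interactions that touch the highly weighted node receive positive EF values, whereas the interaction between the two unweighted nodes receives a negative EF value. When all t-values are zero, every EF is zero by construction, which is the topological control described in the text.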

Supplementary Information accompanies this paper on the Oncogene website (http://www.nature.com/onc)
