ARTICLE

Received 16 Sep 2015 | Accepted 18 Jul 2016 | Published 30 Aug 2016 | DOI: 10.1038/ncomms12601 | OPEN

Cross-species identification of genomic drivers of squamous cell carcinoma development across preneoplastic intermediates

Vida Chitsazzadeh1,2,*, Cristian Coarfa3,*, Jennifer A. Drummond4, Tri Nguyen5, Aaron Joseph6, Suneel Chilukuri7, Elizabeth Charpiot7, Charles H. Adelmann1,2, Grace Ching1,2, Tran N. Nguyen1, Courtney Nicholas1, Valencia D. Thomas2, Michael Migden2, Deborah MacFarlane2, Erika Thompson8, Jianjun Shen9, Yoko Takata9, Kayla McNiece10, Maxim A. Polansky10, Hussein A. Abbas11, Kimal Rajapakshe3, Adam Gower12, Avrum Spira12, Kyle R. Covington3,4, Weimin Xiao13, Preethi Gunaratne13, Curtis Pickering14, Mitchell Frederick14, Jeffrey N. Myers14, Li Shen15, Hui Yao15, Xiaoping Su15, Ronald P. Rapini2,10, David A. Wheeler3,4, Ernest T. Hawk16, Elsa R. Flores11 & Kenneth Y. Tsai1,2

Cutaneous squamous cell carcinoma (cuSCC) comprises 15–20% of all skin cancers, accounting for over 700,000 cases in the USA annually. Most cuSCC arise in association with a distinct precancerous lesion, the actinic keratosis (AK). To identify potential targets for molecularly targeted chemoprevention, here we perform integrated cross-species genomic analysis of cuSCC development through the preneoplastic AK stage using matched human samples and a solar ultraviolet radiation-driven Hairless mouse model. We identify the major transcriptional drivers of this progression sequence, showing that the key genomic changes in cuSCC development occur in the normal skin to AK transition. Our data validate the use of this ultraviolet radiation-driven mouse cuSCC model for cross-species analysis and demonstrate that cuSCC bears deep molecular similarities to multiple carcinogen-driven SCCs from diverse sites, suggesting that cuSCC may serve as an effective, accessible model for multiple SCC types and that common treatment and prevention strategies may be feasible.

1 Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 2 Department of Dermatology, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 3 Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA. 4 Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA. 5 Northwest Diagnostic Clinic, Houston, Texas 77090, USA. 6 Skin and Laser Surgery Associates, Pasadena, Texas 77505, USA. 7 Bellaire Dermatology, Bellaire, Texas 77030, USA. 8 Sequencing and Microarray Facility, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 9 Next Generation Sequencing Facility, Smithville, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 10 Department of Dermatology, University of Texas Medical School at Houston, Houston, Texas 77030, USA. 11 Department of Biochemistry and Molecular Biology, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 12 Department of Medicine, Boston University School of Medicine, Boston, Massachusetts 02215, USA. 13 Department of Biology and Biochemistry University of Houston, Houston, Texas 77204, USA. 14 Department of Head & Neck Surgery, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 15 Department of Bioinformatics & Computational Biology, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. 16 Department of Clinical Cancer Prevention, University of Texas MD Anderson Cancer Center Houston, Houston, Texas 77030, USA. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to K.Y.T. (email: [email protected]).

NATURE COMMUNICATIONS | 7:12601 | DOI: 10.1038/ncomms12601 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms12601

Actinic keratoses (AK) are likely the most common precancerous lesion in humans, affecting up to 5.5% of women and 13.9% of men in the USA, accounting for 5.2 million outpatient visits per year at an estimated annual cost of over $1 billion1,2. AKs are scaly lesions, often readily appreciated on sun-exposed skin. Histologically, they are characterized by epidermal dysmaturation and partial thickness basal and spinous layer atypia. In time, this atypia may extend to the full thickness of the epidermis (AK3/squamous cell carcinoma in-situ) or beyond, culminating in invasive cutaneous squamous cell carcinoma (cuSCC). Ultraviolet radiation (UVR) is the main aetiological factor implicated in AK and cuSCC pathogenesis.

Approximately 0.6% of clinically diagnosed AKs are estimated to progress to cuSCC within 1 year and 2.6% are estimated to progress within 4 years3. Thus, the standard practice of destroying these lesions is well founded, but there is still no rationally designed way of preventing their progression, and there are still up to 700,000 cases of cuSCC in the USA every year4. Destructive therapies are effective but management of high-risk populations such as organ transplant recipients is challenging and systemic compounds such as retinoids have substantial adverse reactions2,5. AKs are almost always treated, usually quickly and easily on an outpatient basis; however, the morbidity and economic burden of multiple treatments is high2,5. Understanding the genetic alterations that dictate AK formation and progression to cuSCC forms the molecular basis for rationally designed targeted cancer chemoprevention for an extremely common skin cancer.

To date, molecular genetic studies of AK have largely centred on known tumour suppressor genes. TP53, RAS, CDKN2A mutations and loss of CDKN2A and p53 expression have been identified in AK6–8, as well as extensive loss-of-heterozygosity and chromosomal aberrations9. What dictates whether or not AKs progress to cuSCC is inadequately understood as these genetic lesions are also commonly found in cuSCC. Amplifications of epidermal growth factor receptor (EGFR) and c- have been identified in AK and cuSCC10,11. Loss of INPP5A and CKS1B amplification have been demonstrated in human cuSCC and smaller proportions of AKs12,13. Gene signatures that distinguish SCC from AK or irradiated skin have been identified, but they have not been refined to identify a mechanistic basis for progression14–16. Few of the multiple attempts at genome-wide analysis of AK and cuSCC15,17–21 have used matched histologically validated lesions from individual patients16 and all have employed several platforms known to have potentially high annotation error rates22. Given these challenges, it is not surprising that it has been difficult to identify drivers of progression when comparing tumour tissue to their normal counterparts, or when comparing unmatched samples.

In this study, we sought to identify important genetic events that drive squamous cell carcinoma (SCC) development through combined analysis of next-generation sequencing of matched patient samples with a UVR-driven mouse model to identify key pathways. Our approach minimizes the impact of inter-individual variability and annotation errors, while enabling identification of the most biologically significant pathways through cross-species analysis. We compared non-lesional, chronically UVR-exposed skin (normal skin, ‘NS’ in human, ‘CHR’ in mouse) to preneoplastic AK (human)/papilloma (mouse) and subsequently to cuSCC using successive pairwise comparisons as well as progression models to highlight potential targets for cancer prevention.

Results

Patient samples and mouse model. A total of 27 tissue samples were isolated from nine patients who were treated for invasive cuSCC with Mohs surgery (Table 1). cuSCC tumour cores were extracted before Mohs surgery with matched samples of peri-tumoural clinically normal skin within 1 cm of the tumour removed in the course of reconstruction. For most patients, a distinct AK was also isolated, often from the same general field (Fig. 1a–c,g,h).

In parallel, we established chronically UVR-irradiated SKH-1E Hairless mice using solar simulators (Oriel) as a highly relevant model for UVR-induced human cuSCC23,24 (Fig. 1d–f,i). SKH-1E hairless mice are highly susceptible to UVR-induced skin tumours, UVR-induced immunosuppression and DNA damage23. Solar simulators more accurately simulate terrestrial UVR exposure than do fluorescent ultraviolet bulbs24. Thus our model ensures a useful platform in which we can test chemoprevention approaches. Tumours in these mice develop (ref. 25), RAS (ref. 26) and CDKN2A (ref. 27) mutations in similar proportions to those in human cuSCC, along with copy number variations that map to ones reported in human cuSCC (3p, 11p and 9q) (refs 28–30). Serial Analysis of Gene Expression (SAGE) mRNA data from this model, comparing UVR-induced cuSCC to NS epidermis, shows substantially similar patterns of changes to our human data including overexpression of matrix metalloproteinases and hyperproliferative keratins31. Importantly, these mice develop precancerous papillomas (PAP) and cuSCC following chronic low-dose UVR exposure23.

Six littermate female Hairless mice were chronically irradiated with 12.5 kJ m−2 of ultraviolet B (UVB) weekly for 100 days and were killed 14 days following cessation of irradiation, at which time chronically irradiated skin (CHR), PAP and cuSCC were isolated. All papillomas were grade 1 or grade 2 (not grade 3) and all cuSCC were grade 1 or grade 2 (ref. 23). All human and mouse samples were histologically validated with estimated 80% tumour cellularity for AK/PAPs and cuSCCs. The chronically UVR-exposed samples from both patients (NS) and mice (CHR) exhibited clear histologic evidence of solar damage including elastosis, fibrosis and chronic inflammation (Fig. 1).

Mutational analysis. Exome sequencing (Illumina Hi-Seq) was performed on a subset (Table 1) of collected samples with an average coverage of 135× ± 22 (mean ± s.d.). The mutational load varied widely across our cohort of well-differentiated primary cuSCC, averaging 2,927 somatic variants (range 385–9,156) or 45.7 variants per Mb (Fig. 2a), which is congruent with previously reported results of about 50 mutations per Mb for cuSCC32–35, keeping in mind that some AKs and cuSCCs were referenced to UVR-exposed peri-tumoural NS and not germline (Table 1). AKs had substantially fewer variants, with an average of 1,186 variants (range 290–1,873) or 18.5 per Mb.

To our surprise, the clinically normal, chronically UVR-exposed skin of patients harboured an average of 372 somatic variants (range 23–1,264) across the exome when referenced to germline DNA samples obtained from saliva, corresponding to an average of 5.8 variants per Mb (Fig. 2a). This indicates that the skin sustains substantial mutagenic insults in the course of chronic UVR exposure, as recently reported in UVR-exposed eyelid skin36. TP53 mutations have been described before in UVR-exposed skin; however, it was not known if this represented ongoing selection specifically for TP53 mutation37. Hi-depth targeted sequencing of 74 genes has demonstrated an estimated five mutations per Mb in chronically UVR-exposed eyelid skin, with a strong preponderance of NOTCH1-3, TP53 and FGFR3 mutations suggesting positive selection for these mutations36.

The spectrum of mutations is very strongly dominated by transitions between cytosine and thymine, in particular from


Table 1 | Clinico-pathological characteristics of patient cohort and sequencing performed.

Patient | Sample name (NS/AK/SCC) | Gender | Location | Age (years) | Exome seq, RNA seq, microRNA seq
Total | | | | | 29, 26, 27
1 | Saliva | M | | 56 | 1
1 | NS | | Left medial neck | | 1 1 1
1 | AK | | Left dorsal forearm | | 1 1
2 | NS | M | Left anterior shoulder | 78 | 1 1 1
2 | AK | | Left temple | | 1 1 1
2 | SCC | | Left anterior shoulder | | 1 1 1
3 | NS | M | Left infraorbital cheek | 70 | 1 1 1
3 | AK | | Right dorsal forearm | | 1 1 1
3 | SCC | | Left infraorbital cheek | | 1 1 1
4 | Saliva | F | | 60 | 1
4 | NS | | Left ulnar forearm | | 1 1 1
4 | AK1 | | Right anterior shoulder | | 1 1
4 | AK2 | | Left chest sternum | | 1 1 1
4 | SCC1 | | Left ulnar forearm | | 1 1 1
4 | SCC2 | | Left dorsal hand | | 1 1 1
5 | Saliva | M | | 75 | 1
5 | NS | | Crown of scalp | | 1 1 1
5 | AK | | Scalp | | 1 1 1
5 | SCC | | Crown of scalp | | 1 1 1
6 | Saliva | M | | 82 | 1
6 | NS | | Left sternocleidomastoid | | 1 1 1
6 | AK | | Left zygomatic arch | | 1 1
6 | SCC | | Left sternocleidomastoid | | 1 1 1
8 | Saliva | M | | 83 | 1
8 | NS | | Right temple | | 1 1 1
8 | AK | | Right temple | | 1 1
8 | SCC | | Right temple | | 1 1 1
10 | Saliva | M | | 79 | 1
10 | NS | | Left temporal hairline | | 1 1 1
10 | AK | | Right scalp | | 1 1 1
10 | SCC | | Left temporal hairline | | 1 1
12 | Saliva | F | | 74 | 1
12 | AK | | Right knee | | 1 1 1
12 | SCC | | Right pretibia | | 1 1

AK, actinic keratosis; NS, normal skin; SCC, squamous cell carcinoma.

cytosine to thymine (C→T) (Fig. 2b and Supplementary Figs 1,2a,b). The proportion of dinucleotide variants that are CC→TT is 90% for both AK and cuSCC and 84% for NS (Supplementary Fig. 1), also highly reflective of UVB exposure. Given the relative statistical rarity of CpG islands across the human genome, the high proportion of C→T transitions at CpG sites reflects the enhanced susceptibility of methylated CpG to deamination and to photoproduct formation38.

By using non-negative matrix factorization (NMF)-based spectral deconvolution, 21 mutation signatures were derived from over 6,000 specimens across 32 cancer types profiled in TCGA39. Three signatures predominated in our samples, which were strongly enriched for C→T transitions (Fig. 2b and Supplementary Fig. 2a,b). AK and cuSCC are clearly driven by UVR exposure, with substantial enrichment for the classic UVB C→T transition signature at dipyrimidines38 in a manner that correlated with increasing mutational load (Fig. 2b). NS samples had more evenly represented mutation signatures, including those associated with liver toxin exposure, temozolomide (Tem) exposure and CpG sites (Fig. 2b and Supplementary Fig. 2a,b).

To identify significantly mutated genes (SMG), we identified those that were recurrently mutated in at least seven pairings and that were either previously implicated in cuSCC or have COSMIC frequencies over 400 (Fig. 2c and Supplementary Data 1). These included genes found to be mutated in metastatic and aggressive cuSCC (Fig. 2c)32–35, most prominently, TP53, NOTCH1-2, FAT1 and MLL2. We also identified a rare KNSTRN missense mutation (resulting in p.P28S) in only two pairings (one cuSCC, AK), which appears to be within the same functional domain as a previously reported hotspot p.S25F (ref. 40). Importantly, AKs not only have mutations in all of the known SMGs, but AKs have the ‘greatest’ proportion of SMGs represented. This is consistent not only with the notion that AKs have acquired the mutational events necessary for cuSCC formation, but that AKs may harbour multiple clones that have the capacity to ultimately give rise to cuSCC (Fig. 3a). There is significant overlap in these variant allele frequency profiles, as all lesion types span a continuum of distributions with a trend of increasing mutational load reflecting increasing monoclonality in NS and AK samples (Fig. 3a). Of note, the NS sample (patient 1) with high variant allele frequencies has evidence for three TP53 mutations (Fig. 3b), highlighting how significant mutational loads can be acquired even in the absence of evident dysplasia. Given that mutations exist in hundreds of clones within UVR-exposed skin36 (Fig. 2a), our data suggest that dominant clones may be emerging in AK and cuSCC, particularly in the latter (Fig. 3a and Supplementary Fig. 3), although this sample size was limited.

To assess whether a pattern of mutationally driven progression could be ascertained, we then probed whether mutations overlapped between the three groups of samples. Globally, the number of site-specific mutational overlaps between any two sample classes was extremely low and not likely to be significantly different from chance, though the AK to cuSCC comparison had the majority, averaging 3.4 overlaps versus NS to AK (0.75) and


[Figure 1, panels a–i. Panel h, matched samples by patient: Patient 1: 1A, 1N; Patient 2: 2A, 2S; Patient 3: 3A, 3S; Patient 4, set 1: 4A1, 4S1; Patient 4, set 2: 4A2, 4S2; Patient 5: 5A, 5S; Patient 6: 6A, 6S; Patient 8: 8A, 8S; Patient 10: 10A, 10S; Patient 12: 12A, 12S.]

Figure 1 | Anatomic distribution and histology of representative tissues isolated from human patients and Hairless mice. (a–c) Normal (peri-tumoural) skin, actinic keratosis and invasive cuSCC, respectively, are shown from human patients. Scale bar, 50 μm. Human samples were processed following combined RNAlater and formalin fixation, resulting in significant cytoplasmic shrinkage. (d–f) Normal (peri-tumoural) skin, papillomas and invasive cuSCC, respectively, are shown from Hairless mice. (g) Anatomic locations of matched samples from human patients. (h) Tabular list of matched samples from human patients: (S) denotes the cuSCC with adjacent NS, (N) denotes normal skin and (A) denotes the AK. For patient 1, only NS and AK were available for analysis. (i) Representative skin samples from Hairless mice are shown, which include smaller papillomas and a smaller number of invasive carcinomas.

NS to cuSCC (1.8) (Table 2). The most significant degree of site-specific mutational overlap occurred between two cuSCCs from patient 4 (Table 2), which were in close physical proximity (Fig. 1g; forearm versus hand), strongly suggesting that these physically distinct tumours arose at least in part from a common clone. When viewed within patients, functionally significant genes were mutated in multiple samples, including TP53 (four patients), FAT1 (three patients) and MLL3 (three patients) (Supplementary Fig. 4 and Supplementary Table 1). Overlaps in AK-cuSCC were the most common even among these genes, suggesting that they are specifically targeted in the development of cuSCC.

An illustrative example is provided by TP53, in which multiple mutations were present (Fig. 3b). When placed in the context of overall mutational loads and site-specific overlap (Fig. 3b), it is evident that the degree of overlap did not closely correlate with physical proximity (Fig. 1g: patients 1, 3 and 4 with distant AK versus NS/cuSCC and patients 5, 6, 8 and 10 with nearby AK versus NS/cuSCC). The two cuSCCs from patient 4, which shared the greatest overlap (195), shared a TP53 W23X mutant (Fig. 3b). Among all the human tumour samples sequenced, only amino acid R248, which is a mutational hotspot in cuSCC41, was multiply altered within two patients (patients 1 and 4). These findings are consistent with the concept that mutations that inactivate tumour suppressor genes are often distributed across the entire coding region and that there has not been a strongly dominant oncogenic mutation identified in cuSCC.

Transcriptomic analysis. RNA and miRNA-seq were performed on the Illumina Hi-Seq platform to yield an average of 64 million and 6.1 million reads, respectively. No significantly expressed

[Figure 2, panels a–c. (a) Total number of variants and percent of variants in category (indel, DNV, SNV) per sample. (b) Signature counts versus total counts for NS, AK and SCC (UVB, Tem, liver toxin, CpG). (c) Total mutations, tissue type (skin, AK, SCC) and most damaging mutation type (missense SNV, frame-shift indel, splice-site SNV, nonsense SNV) per sample for TP53, FAT1, MLL3, FAT3, NEB, MLL2, SPHKAP, NOTCH1, NOTCH2, NF1, FAM135B, NYAP2 and FLT1.]

Figure 2 | The spectrum of mutations in human samples is dominated by C to T transitions and shows increasing mutation burden across cuSCC development. (a) cuSCC display very high mutational loads of 45.7 variants per Mb, with NS and AK samples harbouring an average of 5.8 and 18.5 variants per Mb, respectively. These are strongly dominated by single nucleotide variants. (b) NMF-derived orthogonal mutational profiles derived from over 6,000 human cancers confirm a strong enrichment for CpG-associated C→T transitions classically associated with UVB exposure, particularly for AK and cuSCC. For each lesion, the proportion of mutations attributable to a specific signature is plotted as a function of total mutation counts. This correlates strongly with the increasing mutational burden observed in AK and cuSCC, suggesting that the increase is attributable solely to UVB exposure. Three other profiles dominated by C→T transitions are significantly represented in the mutational data and relatively enriched in NS, including ones first described in the context of temozolomide exposure (Tem), one attributed to liver toxins, and one associated with CpG sites. These latter two signatures likely reflect background mutational processes in this context. (c) SMGs from this cohort match those identified in previously studied cohorts of cuSCC.
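The per-megabase figures quoted in panel (a) follow from dividing each variant count by the size of the sequenced target. That target size is not stated in this section, so the sketch below back-calculates it from the reported count and rate pairs (an assumption, which comes out near 64 Mb) and reuses it to convert counts to rates. The names `REPORTED`, `ASSUMED_TARGET_MB` and `variants_per_mb` are illustrative only and are not part of the authors' pipeline.

```python
# Mean somatic variant counts and per-Mb rates as reported in the text.
REPORTED = {
    "NS": (372, 5.8),
    "AK": (1186, 18.5),
    "cuSCC": (2927, 45.7),
}

# ASSUMPTION: the exome target size is not given in this section, so it is
# inferred from each (count, rate) pair and averaged across lesion types.
implied_sizes = [count / rate for count, rate in REPORTED.values()]
ASSUMED_TARGET_MB = sum(implied_sizes) / len(implied_sizes)


def variants_per_mb(variant_count, target_mb=ASSUMED_TARGET_MB):
    """Convert a raw variant count into variants per megabase of target."""
    return variant_count / target_mb


if __name__ == "__main__":
    print(f"implied target size: {ASSUMED_TARGET_MB:.1f} Mb")
    for lesion, (count, rate) in REPORTED.items():
        # Compare the recomputed rate with the rate quoted in the text.
        print(lesion, round(variants_per_mb(count), 1), "reported:", rate)
```

The same helper can be applied to a single sample count from Table 2, with the caveat that the inferred target size is only as precise as the rounded rates it was derived from.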


[Figure 3, panels a,b. (a) Density of variants (a.u.) versus tumour variant allele fraction for skin, AK and SCC samples (1-NS, 3-AK, 2-AK, 5-SCC, 3-SCC, 2-SCC; total mutation counts in parentheses). (b) TP53 point mutations per patient (PT1, PT3, PT4, PT5, PT6, PT8, PT10) in NS, AK and SCC; circle diameter proportional to mutational load.]

Figure 3 | Significant mutational heterogeneity and overlap exists between AK and cuSCC. (a) Histograms of variant allele frequencies in NS, AK and cuSCC show that NS have a large number of low-frequency variants. AK and cuSCC have a more heterogeneous distribution of variant frequencies, with higher-frequency variants, indicative of a general trend towards the emergence of dominant clones. Outliers with high frequencies of mutations in the NS and AK groups (labelled by patient #) are annotated with mutation frequencies in parentheses to show how these frequencies correlate with increasing monoclonality. (b) Point mutations in TP53 compared with overall mutation counts and site-specific overlaps. Specific mutations are indicated in text, size of the circle indicates the total number of mutations of that sample, with overlaps shown between lesions. R248 is the only amino acid changed in multiple samples within patients 1 and 4 (ref. 41). Notes regarding specific findings: 1: complex variant at R248/9 hotspot; 2: splice-site variant; 3: appears in both SCC-1 and SCC-2; 4: base change differs between AK and SCC-1 (c.GG741AA in AK, and c.C742T in SCC-1).

fusion or viral transcripts were detected. A correlation matrix using mRNAs differentially expressed in at least one pairwise comparison demonstrated a clear distinction between NS and cuSCC with AKs interspersed across the spectrum (Fig. 4a), a pattern also observed in the corresponding gene-sample heatmap and principal component analysis (PCA) plots across all genes (Supplementary Figs. 5a,b and Supplementary Data 2).

This shows that even given uniform histological criteria, there is a spectrum of AK that transcriptomically resembles UVR-exposed NS versus some that resemble cuSCC. Global unsupervised clustering of all expressed genes revealed that across all patient samples, six out of the eight complete sets (with tissue from all three lesion types) show that AK segregate with cuSCC by Pearson correlation, a pattern substantiated by the corresponding gene expression heatmaps (Fig. 4b). AK and cuSCC are both significantly enriched to a similar degree over peri-tumoural NS for a 70-gene chromosomal instability signature derived from multiple cancers (Fig. 4c), again reinforcing the idea that AKs have acquired key genomic features of invasive cuSCC42.

RNA-seq results performed on six sets of samples from Hairless mice corroborated this concept even more strongly, even though they are an outbred strain. Here, the correlation matrix of


Table 2 | Site-specific mutational overlap.

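PT | NS total | AK total | SCC total | NS/AK | AK/SCC | NS/SCC | NS/AK/SCC | SCC1 versus SCC2 | NS/AK/SCC1 | NS/AK/SCC2 | NS/AK/SCC1/SCC2
1 | 1,264 | 449 | - | 2 | - | - | - | - | - | - | -
2 | - | 1,631 | 1,332 | - | 4 | - | - | - | - | - | -
3 | - | 1,555 | 9,156 | - | 6 | - | - | - | - | - | -
4 | 257 | 1,056 | 385 | 0 | 1 | 0 | - | - | 1 | - | 1
4 (SCC2) | - | - | 578 | - | 0 | 1 | - | 195 | - | 0 | 1
5 | 24 | 1,447 | 4,924 | 1 | 6 | 0 | 1 | - | - | - | -
6 | 347 | - | 3,614 | - | - | 6 | - | - | - | - | -
8 | 319 | - | 503 | - | - | 2 | - | - | - | - | -
10 | 23 | 1,873 | - | 0 | - | - | - | - | - | - | -
12 | - | 290 | - | - | - | - | - | - | - | - | -
Average | | | | 0.75 | 3.4 | 1.8 | | | | |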

AK, actinic keratosis; cuSCC, cutaneous squamous cell carcinoma; NS, normal skin. The overlap of site-specific variants in NS, AK and cuSCC shows that the greatest amount of overlap is between AK and cuSCC. These average 3.4 between AK and cuSCC, 0.75 between NS and AK, and 1.8 between NS and cuSCC. The most overlap occurred between two SCCs from patient 4 (195), which were from two distinct lesions in close physical proximity, suggesting that they may have arisen at least partially from a common clone.

differentially expressed genes in at least one pairwise comparison clearly delineates CHR away from PAP and cuSCC (Fig. 4d), a pattern also observed in the corresponding gene-sample heatmap and PCA plots across all genes (Supplementary Figs. 5a,c and Supplementary Data 3).

Next we sought to identify transcription factors that could be responsible for the gene expression changes observed in the course of AK and cuSCC development. Using TRANSFAC-based motif analysis, we identified significantly overrepresented motifs and their associated transcription factors for all pairwise comparisons in both human and mouse. Although this method is not unbiased and does not definitively identify biochemical mechanisms of regulation, we surmised that cross-species analysis would enable us to identify significant drivers. First, significant candidate factors were ranked by their enrichment scores (Supplementary Data 4) and selected only if their targets were enriched in gene sets that changed in the same direction for both species in any one of the adjacent pairwise comparisons: NS/CHR to AK/PAP or AK/PAP to cuSCC. These changes also had to be significantly enriched in the NS/CHR to cuSCC comparison. This analysis yielded a total of 17 transcription factors, 11 of which were significant early, and six of which were significant late (Fig. 5a). Four factors were globally important across all pairwise comparisons: ETS2, SP1, FREAC2 (FOXF2) and AP1. Gene interaction networks assembled from the overlap of genes predicted to be regulated by multiple transcription factors reveal that these transcription factors are highly interconnected, potentially co-regulating up to several hundred genes in concert (Fig. 5b). Though these interactions must be functionally validated in detail, our data suggest that a small core of transcriptional regulators drives cuSCC development.

Importantly, some transcription factors, such as ETS2, showed largely unidirectional target modulation, but others such as TCF3 and LEF1 had significant target modulation in opposite directions, reflecting transition-specific changes (Supplementary Fig. 6a–i). Profiles of the four globally important transcription factors showed significant upregulation of ETS2 and SP1 and downregulation of FREAC2 (FOXF2) and AP1 targets across the entire progression sequence (Fig. 5a and Supplementary Data 4). ETS2 (Supplementary Fig. 6a) is pro-oncogenic in multiple other contexts, and is downstream of the ERK MAP kinase signalling module43, which is known to be important in sporadic and BRAF-inhibitor-induced cuSCC development44,45. The global upregulation of SP1 targets (Supplementary Fig. 6b) may reflect its ability to partner with a number of transcription factors in cancer46.

As expected, β-catenin/Wnt signalling plays an important role in cuSCC pathogenesis and this is reflected in the importance in this analysis of both TCF3 and LEF1 activity. TCF3 can function as a transcriptional repressor and activator and has been implicated in skin homoeostasis and wound healing47. Our data suggest an important role in cuSCC development as well (Supplementary Fig. 6c). LEF1 appears to be activated across the spectrum of samples, with targets significantly downregulated early and upregulated late (Supplementary Fig. 6d), and is an established effector of β-catenin/Wnt signalling48. The downregulation of NFAT targets early (Supplementary Fig. 6e) may reflect an inhibition of keratinocyte differentiation programs that may be modulated by NOTCH signalling and further compromise TP53-dependent tumour suppression49.

Similarly, the predicted downregulation of AP1 target genes across the continuum of cuSCC development (Supplementary Fig. 6f) suggests a compromise of normal epidermal differentiation50. FREAC2 (FOXF2) targets are downregulated across the development sequence (Supplementary Fig. 6g), and although it has not been specifically implicated in skin cancer, downregulation of its expression promotes epithelial–mesenchymal transition in basal breast cancer51.

Consistent with our global transcriptomic data, most of the transcriptional drivers are predicted to act early in the NS/CHR to AK/PAP transition, including NFY, E2F and ELK1, with some, including MYC activation, occurring late (Fig. 5a and Supplementary Fig. 6h,i). Both E2F and MYC have been implicated in cuSCC development44. Although using individual pairwise analysis suggested key transcriptional regulators, we also used a linear mixed effects (LME) model previously employed to identify differentially expressed genes in matched normal, premalignant and tumour samples from patients with lung SCC (LUSC)52. Following cross-species overlap of mouse and human data, this model also clearly demonstrated that the majority of changes occur in the earliest transition from NS to AK/PAP versus the subsequent transition to cuSCC (Fig. 5c and Supplementary Data 5).

In GATHER-based analysis of TRANSFAC motifs within concordantly differentially expressed genes53 by this analysis, E2F and NFY (Supplementary Fig. 6h,i and Supplementary Data 6) were again highlighted in the early stage transition from NS to AK/PAP in both mouse and human. ELK1, which is likewise regulated by the ERK MAPK pathway, was also significant by both analyses (Fig. 5a and Supplementary Data 6). No enriched signatures were identified for the late-stage AK/PAP to cuSCC transition using the LME model.
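The cross-species selection rule described in the text for candidate transcription factors (targets enriched in the same direction in both species for at least one adjacent transition, and also enriched in the NS/CHR to cuSCC comparison) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the factor names `TF_A` to `TF_D` and their values are hypothetical, directions are encoded as +1 (up), -1 (down) or 0 (not significant), and the overall comparison is assumed to require the same direction in both species.

```python
EARLY = "NS/CHR->AK/PAP"
LATE = "AK/PAP->cuSCC"
OVERALL = "NS/CHR->cuSCC"


def concordant(human, mouse, comparison):
    """True if targets are significantly enriched in the same direction in both species."""
    h = human.get(comparison, 0)
    m = mouse.get(comparison, 0)
    return h != 0 and h == m


def classify_factor(human, mouse):
    """Return the stages ('early', 'late') at which a factor passes the filter."""
    # ASSUMPTION: the NS/CHR to cuSCC comparison must also agree between species.
    if not concordant(human, mouse, OVERALL):
        return []
    stages = []
    if concordant(human, mouse, EARLY):
        stages.append("early")
    if concordant(human, mouse, LATE):
        stages.append("late")
    return stages


# Hypothetical enrichment calls: factor -> (human directions, mouse directions).
TOY_FACTORS = {
    "TF_A": ({EARLY: 1, OVERALL: 1}, {EARLY: 1, OVERALL: 1}),
    "TF_B": ({EARLY: 1, OVERALL: 1}, {EARLY: -1, OVERALL: 1}),
    "TF_C": ({EARLY: -1, LATE: 1, OVERALL: 1}, {EARLY: -1, LATE: 1, OVERALL: 1}),
    "TF_D": ({LATE: 1}, {LATE: 1}),
}

if __name__ == "__main__":
    for name, (human, mouse) in TOY_FACTORS.items():
        print(name, classify_factor(human, mouse))
```

In this toy input, `TF_C` changes direction between the early and late transitions yet still passes at both stages, which mirrors the transition-specific behaviour the text reports for factors such as TCF3 and LEF1.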


Figure 4 | mRNA profiling across AK/papilloma and cuSCC development. (a) Correlation matrix of mRNAs differentially expressed in at least one signature in human samples shows that AKs span the spectrum of NS to cuSCC samples. (b) Unsupervised clustering of all genes across patient samples with complete sets (all three lesion types) demonstrates that in 6/8 sets, AK more closely resemble cuSCC. The Pearson correlation matrix is shown on top with the underlying heat map shown below. (c) A 70-gene signature of chromosomal instability derived from human cancers is highly enriched in AK and cuSCC to a similar degree, but not NS. (d) Correlation matrix of mRNAs differentially expressed in at least one signature in mouse samples demonstrates that PAPs much more closely resemble cuSCC than CHR.
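The correlation matrices in Fig. 4 are built from pairwise Pearson correlation coefficients between sample expression profiles. A minimal sketch of that calculation is given below, assuming each sample is represented by a list of expression values over the same ordered set of genes. The sample labels and values are invented for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Return {sample: {sample: r}} for every pair of samples."""
    names = list(profiles)
    return {
        a: {b: pearson(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Toy profiles (hypothetical values over four genes).
profiles = {
    "NS": [1.0, 2.0, 3.0, 4.0],
    "AK": [3.5, 3.0, 2.5, 1.0],
    "SCC": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(profiles)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy case the AK profile correlates positively with SCC and negatively with NS, which is the kind of pattern a correlation heat map makes visible at a glance.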

On the basis of these sets of differentially expressed genes, Ingenuity Pathway Analysis of both human and mouse data was performed individually (Supplementary Table 2) and identified strongly overlapping key pathways including progression, mitotic roles of polo-like kinase and DNA damage checkpoint functions, as well as upstream regulators , , CDK4, TP53, RABL6 and ERBB2, the latter two of which may be novel targets for intervention.

The transcriptional networks identified in both the pairwise analyses and the progression model analysis implicate pathways important in cuSCC development, mostly early in the NS/CHR to AK/PAP transition, notably E2F, ELK1 and NFY. ERK signalling


Figure 5 | Cross-species transcription factor motif analysis reveals major drivers of cuSCC development. (a) Global view of transcription factors with target genes enriched across the entire NS/CHR to AK/PAP to cuSCC progression sequence. Directionality reflects the significant upregulation (above the line) or downregulation (below the line) of predicted targets of the listed transcription factors. Some factors have targets that are enriched in opposite directions across distinct transitions. The transcription factors highlighted in red were identified in both TRANSFAC and LME-based analyses. (b) Network analysis demonstrates that core transcriptional drivers are highly interconnected in both human (left) and mouse (right). The bolded lines delineate connections that are significant by Fisher exact test (P<10⁻⁴). (c) The LME model of mRNA expression changes across cuSCC development in both species demonstrates that the vast majority of significant gene expression changes occur in the early transition from NS/CHR to AK/PAP.

through ETS2, β-catenin signalling through TCF3 and LEF1, and pathways regulated by NFAT and AP1 that likely impinge upon keratinocyte differentiation, were implicated globally across the entire development sequence (Fig. 5).

MicroRNA sequencing and integrated analysis. Relative to mRNA, clustering of microRNAs differentially expressed in at least one pairwise comparison among the matched human samples showed a better ability to distinguish the three sample classes (Fig. 6a). When recurrent statistically significant changes occurring in at least two out of three pairwise comparisons are used, it is clear that AKs now define a pattern of microRNA expression that is intermediate between NS and cuSCC, with improved discrimination between sample types (Fig. 6b, Supplementary Data 7 and Supplementary Fig. 7). Whereas unsupervised clustering of mRNA expression in the Hairless mouse model failed to distinguish papillomas from cuSCC (Fig. 4d), unsupervised clustering of microRNA expression more clearly separates all three sample types (Fig. 6c,d, Supplementary Data 8 and Supplementary Fig. 7), suggesting that microRNAs have more discriminatory power as compared with mRNA in segregating the sample types.
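The selection of microRNAs with recurrent changes, that is, those significant in at least two of the three pairwise comparisons, can be sketched as a simple filter over per-comparison P values. The snippet below is a hedged illustration only: the function name, the comparison labels, the microRNA names and the P values are all hypothetical, and the 0.05 cutoff is taken from the Fig. 6 legend.

```python
def recurrent_features(pvalues, alpha=0.05, min_hits=2):
    """Return the features that are significant (P < alpha) in at least
    `min_hits` of the supplied pairwise comparisons.

    pvalues: {feature_name: {comparison_name: p_value}}
    """
    return sorted(
        name
        for name, by_comparison in pvalues.items()
        if sum(p < alpha for p in by_comparison.values()) >= min_hits
    )


# Hypothetical P values for three made-up microRNAs.
toy_pvalues = {
    "miR-A": {"NS_vs_AK": 0.010, "AK_vs_SCC": 0.200, "NS_vs_SCC": 0.001},
    "miR-B": {"NS_vs_AK": 0.300, "AK_vs_SCC": 0.040, "NS_vs_SCC": 0.500},
    "miR-C": {"NS_vs_AK": 0.030, "AK_vs_SCC": 0.020, "NS_vs_SCC": 0.004},
}
print(recurrent_features(toy_pvalues))  # ['miR-A', 'miR-C']
```

Requiring recurrence across comparisons trades sensitivity for robustness: a microRNA that changes in only one comparison (miR-B in the toy data) is excluded from the clustering input.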


Figure 6 | microRNA profiling across AK/papilloma and cuSCC development. (a) Correlation matrix of microRNAs differentially expressed in at least one signature in human samples shows that significantly improved discrimination between three sample types is achieved as compared with mRNA profiles. (b) Using only microRNAs differentially expressed in at least two out of three pairwise comparisons (P<0.05), robust discrimination is achieved between NS and cuSCC with most AKs occupying an intermediate expression pattern. (c) Hierarchical clustering of microRNAs differentially expressed in at least one signature in mouse samples shows distinct patterns among the three sample types as compared with mRNA profiles. (d) Using only microRNAs differentially expressed in at least two of three pairwise comparisons (P<0.05), CHR and cuSCC are very strongly segregated with an intermediate group dominated by PAP.

Given that microRNAs can negatively regulate their target mRNA expression by Watson-Crick pairing over a 7–8 nucleotide seed region, we performed functional pair analysis by identifying miRNA–mRNA pairs predicted to be linked by a seed sequence in the 3′UTR of the mRNA and that were significantly anti-correlated in expression changes54. Functional pairs in both species were ranked by all possible pairwise comparisons in which they were significantly differentially expressed (rather than by stage), and an integrated network map was generated to identify several key microRNAs with multiply targeted mRNAs (Fig. 7a,b). The microRNAs significantly upregulated in this cross-species analysis included miRs-15a/b, 17, 20a, 21, 31, 200a and 340b


Figure 7 | Functional pair analysis identifies highly interconnected microRNA/mRNA regulatory networks conserved across species. This occurs for both (a) upregulated microRNAs/downregulated target mRNAs and for (b) downregulated microRNAs/upregulated target mRNAs. The figures only show those microRNAs with q<10⁻⁸ and log2-fold change of >1.15, conserved between species. Validation of select microRNA/mRNA pairs using a distinct set of human matched samples demonstrates robustness of the findings. (c) miR-21 is upregulated between NS and cuSCC, whereas predicted targets ARHGAP24 and TIMP3 are downregulated. (d) miR-31 is upregulated between NS and cuSCC, whereas predicted targets PTPN14 and FAM134B are downregulated.
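The two criteria behind a functional pair, a seed match in the 3′UTR and anti-correlated expression, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of ref. 54: the seed is taken as microRNA nucleotides 2–8, a site is an exact reverse-complement match, anti-correlation is judged by a Pearson coefficient at or below a hypothetical cutoff of −0.5, and all sequences and expression values are made up.

```python
from math import sqrt

# RNA complement table used to derive the target site from the seed.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna_seq, start=1, length=7):
    """Reverse complement of the microRNA seed (nucleotides 2-8 by default)."""
    seed = mirna_seq.upper()[start:start + length]
    return seed.translate(COMPLEMENT)[::-1]


def has_seed_match(mirna_seq, utr_seq):
    """True if the 3'UTR contains an exact match to the seed site."""
    utr_rna = utr_seq.upper().replace("T", "U")  # accept DNA or RNA letters
    return seed_site(mirna_seq) in utr_rna


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_functional_pair(mirna_seq, utr_seq, mirna_expr, mrna_expr, r_cutoff=-0.5):
    """Seed match in the 3'UTR plus anti-correlated expression
    (r_cutoff is a hypothetical threshold chosen for illustration)."""
    return (
        has_seed_match(mirna_seq, utr_seq)
        and pearson(mirna_expr, mrna_expr) <= r_cutoff
    )


# Made-up sequences and expression values across NS, AK and SCC.
toy_mirna = "UCAGGCAUAGCUUGAACGUACU"
toy_utr = "GGCAAUGCCUGAAUCC"
print(seed_site(toy_mirna))  # AUGCCUG
print(is_functional_pair(toy_mirna, toy_utr, [1.0, 2.0, 4.0], [8.0, 4.0, 1.0]))
```

A production analysis would rely on a curated target prediction resource and a formal significance test for the anti-correlation; the sketch only shows how the two filters combine.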

(Fig. 7a), and those significantly downregulated included let-7 family members and miRs-30a and 125b (Fig. 7b). Some of these microRNAs are confirmed in our data to be significant in cuSCC. miR-21 and miR-31 have been shown to be upregulated in a number of cancers including cuSCC55–57, although the identity of the most relevant mRNA targets is not known. In many instances, functionally paired mRNAs were predicted to be targeted by multiple microRNAs.

We validated the expression levels of miR-21 and miR-31 and of selected predicted target genes in additional sets (n = 21) of matched human samples of NS, AK, and cuSCC (Fig. 7c,d and Supplementary Data 9). miR-21 was substantially upregulated by 8.9±3.1-fold (mean±s.e.m.; n = 8 sets) across the progression sequence, with respective downregulation of the predicted targets ARHGAP24 (2.9±0.6-fold; mean±s.e.m.; n = 6 sets) and TIMP3 (3.0±0.4-fold; mean±s.e.m.; n = 6 sets) (Fig. 7c). ARHGAP24 is a RAC1 GAP (ref. 58), and TIMP3 suppresses metalloproteinase function, both of which are consistent with tumour suppressive functions. miR-31 was also upregulated (16.2±6.8-fold; mean±s.e.m.; n = 7 sets) across cuSCC development and predicted to downregulate PTPN14 (3.2±0.4-fold; mean±s.e.m.; n = 5 sets), a which downregulates YAP signalling59 (Fig. 7d). In addition, FAM134B, a putative tumour suppressor60, was identified as a potential target of miR-31 (down 4.3±1.4-fold; mean±s.e.m.; n = 5 sets; Fig. 7d). The let-7 family of microRNAs is prominently represented among downregulated microRNAs, consistent with a tumour suppressor role, and one of the predicted targets, HMGA2, was substantially upregulated in a validation cohort by 45.4±14.5-fold (mean±s.e.m.; n = 4 sets). These specific pairs need to be functionally validated to provide mechanistic links, but these data show that the cross-species functional pair analysis is robust.

Relationship of cuSCC to other SCC. Many types of SCC arising in diverse sites, such as lung, oesophagus, bladder, cervix and head and neck SCC (HNSCC) have now been genomically profiled, in addition to exome sequencing of cutaneous SCC32–35. These combined efforts have collectively identified common pathway alterations for many SCCs. TP53 mutations occur at over 70% frequency in all SCCs; NOTCH family genes are mutated in over 70% of cuSCC32,33, 10–20% of HNSCC61–63, 13% of LUSC64 and 10% of oesophageal SCC (ESCA SCC)65; and amplification is a common lineage-specific driver of SCC66. To test the hypothesis that mRNA expression profiles would reveal molecular commonalities between all SCC, we profiled our NS/cuSCC signature against cancers in the TCGA using gene set enrichment analysis (GSEA). Furthermore, we surmised that there would be major differences between carcinogen-driven versus virally driven SCC, as observed in HNSCC61–63. We found

that cuSCC is most similar to HNSCC, followed by LUSC, basal (triple-negative) and HER2 breast cancer and ESCA SCC, but not closely related to cervical SCC, which is overwhelmingly human papillomavirus (HPV)-driven (Fig. 8a and Supplementary Data 10). This establishes transcriptomic evidence that deep molecular commonalities exist between carcinogen-driven SCCs of multiple tissue sites, a concept supported by the mutational data, and suggests that common molecular strategies for prevention and treatment of multiple types of SCCs may exist.

Given that HNSCC is consistently the most closely related tumour to cuSCC, we asked whether cuSCC-derived signatures from our cohort of well-differentiated tumours could also be used to predict outcomes in carcinogen-driven (non-HPV) HNSCCs, which have TP53 mutations and high mutational loads63. When we restricted the cuSCC signatures to early genes identified in the LME progression model for either human or mouse independently (Fig. 5c), both signatures were significantly predictive of overall survival (Fig. 8b, top row). Similarly, when we used only the cross-species intersection of these two signatures of early genes, this was again significantly predictive of overall survival (Fig. 8b, bottom left; Supplementary Fig. 8). Most importantly, a 309-gene signature derived solely from the cross-species microRNA functional pair analysis (Fig. 7a,b) had significant predictive power for overall survival (Fig. 8b, bottom right), showing that these microRNA target genes likely regulate not only important processes in cuSCC development but also drivers of disease outcome in HNSCC.

Discussion

Our analysis is the first comprehensive characterization of genomic changes that drive the development of cuSCC through its preneoplastic intermediate, the AK, employing the combination of matched human patient samples, next-generation sequencing, and cross-species analysis. Despite the clear clinical and histological distinctions between cuSCC, AK and perilesional UVR-damaged skin, AK/PAP are most closely related to cuSCC, by many measures including mRNA expression (unsupervised clustering and LME model), transcription factor motif analysis, mutational signatures and overlap, chromosomal instability signature expression, and microRNA–mRNA functional pair analysis (Fig. 8c).

Our data confirm the high mutational burden of these skin cancers32–35. Importantly, high mutational loads have recently been described in chronically UVR-exposed blepharoplasty samples36, and, for the first time, we show the large degree of mosaicism present across the entire exome in non-lesional UVR-exposed peri-tumoural skin, with quantitatively similar mutational loads (Fig. 2). The SMGs identified, including TP53, NOTCH1-2, FAT1 and MLL2, are ones likely to be important in cuSCC pathogenesis (Fig. 2c)32–35. Conversely, one of the most frequently mutated genes previously identified in sun-exposed skin, FGFR3, was not found to be mutated, suggesting that this genetic lesion, frequently found in seborrheic keratoses, may be specifically critical for benign keratoses and not non-melanoma skin cancers36. Our findings are consistent with the notion that tumour suppressor genes, which represent the largest class of cancer genes known to be recurrently targeted in cuSCC, can often be inactivated by mutational insults spread across their entire coding regions.

The overwhelming preponderance of epidemiological, clinical and biological data suggest that UVR exposure is the main driver of sporadic cuSCC development. The subsequent expansion in overall mutational load, which occurs in progression to AK and

NS to AK to cuSCC, although our sample size is small (Fig. 3a). The lower relative representation of the UVB signature in the NS samples may be due to low sensitivity for detecting small subclones induced by UVR exposure at a given sequencing depth (Fig. 2b). It is possible that initiated clones persist longer than normal keratinocytes, thereby enabling expansion or accumulation of further UVB-mediated DNA damage, a notion consistent with the early appearance of TP53 mutations (Figs 2c and 3b)36, and dramatically illustrated in the NS sample from patient 1 (Figs 2a,b and 3b). Other contributory mechanisms could include compromised DNA repair, compromised photoprotection by melanocytes, or the generation of these types of mutations in the absence of UVR exposure. Finally, the overlap in specific mutations was low (Fig. 3b, Table 2) and while this may also be due to the inability to detect small subclones particularly in the NS and AK samples, all the lesions (particularly AKs) were physically distinct. Nevertheless, the dramatic overlap in mutations between the two separate SCCs in patient 4 strongly suggests that clones of UVR-initiated keratinocytes can populate large areas of epidermis (Fig. 3b and Table 2).

SCCs arise at interfaces with the environment, thus making them susceptible to sustained carcinogenic insults. Our data support the notion that cuSCC, HNSCC, LUSC and ESCA SCC share deep molecular commonalities (Fig. 8a) at the mutational and transcriptional levels, and include deregulation of key pathways such as those driven by altered RB1, TP53 and TP63 function32,34,35,63–65. Therefore, for the subsets of these SCCs driven by UVR, alcohol and tobacco exposure, common molecular treatment and prevention strategies may potentially be developed and modelled on cuSCCs, which are substantially more accessible and common. The unexpected molecular similarity of cuSCC to specific subtypes of breast cancer may also highlight similar molecular vulnerabilities such as ERBB2/HER2 (Fig. 8a and Supplementary Table 2). Interestingly, there was much less similarity between cuSCC and cervical SCC (Fig. 8a). cuSCC does not appear to require HPV transcription for tumour maintenance, whereas cervical SCC is overwhelmingly driven by high-risk α-HPV infection67.

While AKs appear already to harbour the majority of events that are retained in cuSCC, at least two alternative explanations are possible: (1) consistent mutational or transcriptomic events that separate AKs versus cuSCC could be present, and/or (2) there are distinct molecular classes of AKs with different risks of progression to cuSCC. Either of these explanations would require much larger numbers of samples to demonstrate. Nevertheless, our data show that microRNA expression distinguishes the three sample classes and may potentially serve as a basis for distinguishing different types of AKs (Fig. 6). In addition, our cross-species functional pair analysis has identified a handful of highly interconnected microRNA–mRNA networks that drive cuSCC development through preneoplastic AKs (Fig. 7), highlighting specific microRNA targets for potential intervention.

Nevertheless, given the many significant similarities between AK and cuSCC, our data suggest that non-lesional carcinogen-exposed fields of tissue may represent the most effective point of intervention for molecularly targeted chemoprevention. The development of cancer through a preneoplastic intermediate has been studied extensively in Barrett’s oesophagus and oesophageal adenocarcinoma68,69. Barrett’s oesophagus that evolves into adenocarcinoma can be largely indistinguishable from carcinoma, consistent with our conclusion that for a subset of patients, field-based treatment is most appropriate.
AKs are cuSCC correlates with a significant enrichment for UVB-signa- typically too small to allow for repeated sampling, thus removing ture mutations (Fig. 2b). It is possible that this is reflective of a the possibility for longitudinal follow-up and clinical discernment progressively more clonal structure across the continuum from between AKs that ultimately progress and those that do not.


Nevertheless, it is likely that AKs are polyclonal reservoirs from which cuSCCs can arise (Fig. 3a), and it is unclear whether catastrophic genomic instability drives this late transition.

Our data substantiate the power of cross-species analysis to identify biologically important pathways in cuSCC development and suggest that the solar UVR-exposed Hairless mouse model is a useful testbed for chemopreventive and treatment modalities. Other mouse models of cuSCC have also been extensively studied, most prominently the DMBA/TPA model70; in this model, tumorigenesis is overwhelmingly driven by HrasQ61 mutations which are rarely found in UVR-driven human cuSCC and in the UVR-driven Hairless mouse model (Supplementary

[Figure 8, panels a–c. (a) CIRCOS plot of normalized enrichment scores (scale –10 to 10) for the TCGA cohorts PAAD, KIRC, HNSC, LUSC, THCA, BRCA Basal, PRAD, BRCA Her2, LIHC, ESCA SCC, BLCA, KIRP, CESC, COAD, ESCA AD, STAD, BRCA LumB, BRCA LumA, LUAD, UCEC and READ. (b) Survival curves, fraction surviving versus time (months, 0–140), with curves labelled Top 25% (n=47) and Bottom 25% (n=47), for four signatures: Early (P<0.0015, log-rank), Early (P<0.0163, log-rank), Conserved early (P<0.0070, log-rank) and Conserved miRNA targets (P<0.0201, log-rank). (c) Summary diagram with the labels: mutational load, miRNA profiles, SMG, mutation signatures, mutational overlap, CIN signature, mRNA and TF profiles, functional pairs.]


Data 1)24,32–35. The key transcriptional drivers we identified in both pairwise signatures and the LME model are important early in the NS/CHR to AK/PAP transition and include E2F, ELK1 and NFY. Globally, we identified ERK signalling through ETS2 and ELK1, β-catenin signalling through TCF3 and LEF1, and possible differentiation pathways regulated by NFAT and AP1 as potentially important drivers of cuSCC development (Fig. 5).

The significant enrichment of signatures from both patient samples and the UVR-driven Hairless model in multiple carcinogen-driven SCCs arising in diverse sites is important in that it establishes the concept that these SCCs are molecularly closely related (Fig. 8a). Furthermore, early cuSCC signatures were able to predict survival in TP53-mutant (non-HPV) HNSCC (Fig. 8b). This suggests that a common set of biological processes underlies the development of multiple SCC types, and that cuSCC may serve as an accurate and extremely accessible model for exploring pathogenesis and testing interventions.

Methods

Human tissue samples. All human tissues were studied under an MD Anderson Cancer Center IRB-approved protocol (LAB08-0750). All human tissues were obtained from patients who provided written informed consent and who had no history of immunosuppression. These samples were validated by histological analysis and processed using standard methods to yield both high-quality DNA and RNA (RIN>8.0).

Mouse model of UVR-driven cuSCC. All mouse studies were conducted under MD Anderson Cancer Center IACUC-approved protocol (ACUF 00001396-RN00). Mice were obtained from Charles River Laboratories. To model UVR-driven cuSCC under controlled conditions, we exposed female SKH1-E Hairless mice to chronic low-dose UVR (12.5 kJ m−2 UVB total weekly divided in three doses M, W, F) using solar simulators (Oriel) starting at 3 months of age. In this strain, 5.0 kJ m−2 UVB is ~0.5–1 mean erythemal dose71. These mice lack the Hairless gene, and they are highly susceptible to UVR-induced skin tumours, UVR-induced immunosuppression and DNA damage23. In our studies, we have used solar simulators with well-characterized spectra at 2.5–5.0 kJ m−2 UVB 3 days a week for a total of 12.5 kJ m−2 per week (275–325 nm) (ref. 71). These doses were verified by broadband UVB and ultraviolet A (UVA) measurements (ILT1700/ILT73B) in experimental conditions before this experiment. Doses of UVB and UVA averaged 12.5 and 145 kJ m−2 weekly, respectively. In this model of UVR-driven cuSCC development, we irradiated the mice for 100 days. Papillomas were typically observed within this time and were histologically well-differentiated. A minority of these lesions progressed to invasive well-differentiated cuSCC.

Preparation of RNA for Illumina sequencing. Tissue specimens (50–200 mg) were homogenized with an Omni rotor stator homogenizer in TRIZOL (Invitrogen Cat # 15596018). Total RNA was extracted according to the manufacturer's instructions. RNA purification was carried out with the Purelink RNA kit (Invitrogen Cat #12183018A). 4–10 μg of RNA per sample was submitted for 76 nt paired-end sequencing by Illumina HiSeq 2000. The same samples were submitted to the laboratory of Preethi Gunaratne, PhD (University of Houston, Biology & Biochemistry) for small RNA sequencing.

DNA isolation and exome seq. PureLink Genomic DNA mini kits were used to extract DNA from tissue samples (Cat # K1820-01). Briefly, tissue specimens were minced and incubated overnight (55 °C) in PureLink genomic digestion buffer and proteinase K. The manufacturer’s protocol was followed for purification using the spin columns. Two micrograms of DNA per sample was submitted to the MD Anderson DNA Analysis Facility for sequencing (Illumina HiSeq 2000, 76 nt PE). We used DNA Genotek ORAgene saliva collection kits and followed the manufacturer’s collection and storage instructions (catalogue # OG-500). Genomic DNA was isolated from saliva samples using DNA Genotek prepITC2D (PT-C2D) extraction columns and the manufacturer’s protocol.

Exome analysis. For any given patient, if their saliva samples were available, they were used as the paired control for the mutation detection. Otherwise, NS samples were used as controls. Four precapture libraries were pooled together and hybridized according to the manufacturer’s protocol NimbleGen SeqCap EZ Exome Version 3. Exomes were sequenced on an Illumina HiSeq 2000 platform to an average coverage of 135X. Sequencing runs generated approximately 300–400 million successful reads on each lane of a flow cell, yielding 9–12 Gb per sample. Initial sequence analysis was performed using the HGSC Mercury analysis pipeline (https://www.hgsc.bcm.edu/software/mercury). First, the primary analysis software on the instrument produces .bcl files that are transferred off-instrument into the HGSC analysis infrastructure by the HiSeq Real-time Analysis module. Next, the vendor’s primary analysis software (CASAVA) demultiplexes pooled samples and generates sequence reads and base-call confidence values (qualities). Reads are mapped to the GRCh37 Human reference genome (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/) using the Burrows-Wheeler aligner (BWA, http://bio-bwa.sourceforge.net/), producing a BAM file. Finally, quality is recalibrated (GATK, http://www.broadinstitute.org/gatk/), and separate sequence-event BAMs are merged into a single-sample-level BAM. BAM sorting, duplicate read marking, and realignment to improve in/del discovery all occur at this step. Mutations were identified by using the HGSC Cancer Genomics pipeline, which includes variant calling using Atlas-SNP, Atlas-INDEL, and PInDel on each of the BAM files and then merging the variant calls and performing allele lookups. Merged variant files were annotated using dbSNP, COSMIC and Annovar and then split into somatic and germline calls based on variant quality and segregation. Variant calls were then filtered using a cohort filtering approach and filtered variants were analysed. DNPs were collapsed with their neighbours and annotated by using Provean (provean.jcvi.org). Identification of SMGs essentially paralleled our previously established pipeline34.

Non-negative matrix factorization-based mutation signatures. NMF-based spectral deconvolution was used to derive signatures based upon mutations placed in trinucleotide context. The method used is mathematically similar to one recently applied to similar cancer data sets72, but has the advantage of removing the partial overlap between the reported signatures that resulted in high correlations between some of them39. Our strategy employed non-smooth NMF, a variant which approximates the data using the basis and coefficient matrices as above with the addition of a third smoothing matrix which serves to absorb noise within the data, thus driving the coefficient and basis matrices to increase sparseness. The resulting basis matrix generated k = 21 signatures from a diverse set of over 6,000 cancers39, reducing correlations between signatures originally derived by Alexandrov et al.72, thereby increasing orthogonality. This enhanced orthogonality has the potential advantage of suggesting biological mechanisms of mutation generation with greater specificity.

RNA-Seq analysis. RNA sequencing (Illumina Hi-Seq) yielded 30–40 million read pairs for each sample. The mRNA-seq paired-end reads were aligned to the human reference genome, GRCh37/hg19, using the MOSAIK alignment software73. The mRNA-Seq mouse sample reads were aligned onto the mouse genome build UCSC mm10 (NCBI 38). The overlaps between aligned reads and annotated genomic features, such as genes/exons, were counted using the HTSeq software platform (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). The counts were normalized using the scaling factor method. To perform further analysis, the normalized counts were transformed by the variance-stabilizing transformation method and were corrected for experimental batch effects. Specifically, the median expression of each batch was scaled to the same value per gene. Hierarchical clustering analysis, using the Pearson correlation coefficient as the distance metric,

Figure 8 | cuSCC is molecularly related to carcinogen-driven SCCs of multiple sites. (a) GSEA analysis of all significant pairwise comparisons in both mouse (CHR versus PAP, CHR versus cuSCC) and human (NS versus AK, AK versus cuSCC, NS versus cuSCC) represented as a CIRCOS plot. For all cancers profiled in the TCGA, normalized enrichment scores for each signature were determined and cancer types ranked by descending order (clockwise) of the sum of squares of all the scores with a penalty. By this measure, cuSCC is most closely related to HNSC, LUSC, basal and HER2+ subtypes of breast cancer (BRCA) and ESCA SCC. (b) Given that HNSC is most closely related to cuSCC by this measure, we show that cuSCC signatures can predict outcome (overall survival) in HNSC with TP53 mutation, used here as a proxy for identifying tumours that do not express high-risk HPV. The cross-species early signatures derived from the linear mixed effects model and the cross-species microRNA functional analysis all predict survival in HNSCC for the top and bottom 25% of outcomes with statistical significance. Multiple hypothesis testing was performed and all of the plots shown are significant with the stated P-values and false discovery rate-adjusted q-values of <0.1 (q = 0.021 human early, 0.070 mouse early, 0.049 conserved early and 0.070 conserved miRNA targets). (c) Taken together, our data show that AKs have acquired many of the properties of cuSCC as assessed by SMG, mutational overlap, mutational signatures, chromosomal instability signature, mRNA and transcription factor profiles and functional pair analysis, although overall mutational load and unsupervised microRNA clustering do enable separation of the three sample types.
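The signature scoring that underlies panel b is described in the Methods as a per-gene z-score within the cohort, summed over upregulated genes and subtracted for downregulated genes, with specimens then sorted by that sum. A minimal Python sketch of that arithmetic follows. It is illustrative only and is not the authors' code: the function names `signature_scores` and `split_extremes`, the use of the population standard deviation, the skipping of constant genes, and the assumption that the top and bottom 25% groups are taken from the sorted scores are all choices made here for the example. The log-rank comparison itself, which the Methods attribute to the R package Survival, is not reproduced.

```python
from statistics import mean, pstdev


def signature_scores(expr, up_genes, down_genes):
    """Sum of within-cohort z-scores per sample: plus for up genes, minus for down genes.

    expr maps a gene name to a list of expression values, one per sample,
    with samples in the same order for every gene (hypothetical input layout).
    """
    n_samples = len(next(iter(expr.values())))
    scores = [0.0] * n_samples
    for gene, values in expr.items():
        if gene in up_genes:
            sign = 1.0
        elif gene in down_genes:
            sign = -1.0
        else:
            continue  # gene is not part of the signature
        mu = mean(values)
        sd = pstdev(values)  # assumption: population SD; the source does not specify
        if sd == 0:
            continue  # assumption: a constant gene contributes nothing
        for i, value in enumerate(values):
            scores[i] += sign * (value - mu) / sd
    return scores


def split_extremes(scores, fraction=0.25):
    """Return (top, bottom) sample indices for the given fraction of the sorted scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    return order[-k:], order[:k]


if __name__ == "__main__":
    # Toy cohort of four samples and two signature genes (made-up values).
    toy = {"UP1": [1.0, 2.0, 3.0, 4.0], "DOWN1": [4.0, 3.0, 2.0, 1.0]}
    toy_scores = signature_scores(toy, {"UP1"}, {"DOWN1"})
    top, bottom = split_extremes(toy_scores)
    print(toy_scores, top, bottom)
```

The two index lists returned by `split_extremes` would then be passed, together with follow-up times and event indicators, to whichever survival routine is used for the log-rank comparison.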

complete linkage, and PCA were performed using the R statistical system (https://www.r-project.org). Genes significantly differentially expressed between the NS/CHR, AK/PAP, and cuSCC stages were determined using the R package DESeq2 (ref. 74). Since multiple genes were tested simultaneously, the Benjamini-Hochberg method was used to control false discovery rate. For further integration of mRNAs and miRNAs, and detection of enriched transcription factor targets, we used a cutoff of Q-value<0.25 and fold-change exceeding 1.25×.

Viral and fusion transcripts were assessed using the VirusSeq algorithm75 given that mapping viral integration sites is computationally similar to identifying fusion events. In brief, non-human reads were aligned to viral sequences from the Genome Information Broker for Viruses (http://gib-v.genes.nig.ac.jp/) with overall counts registered. Discordant paired-end reads that support a fusion event were clustered and the candidates reported using supporting pairs (at least four) and junction spanning reads (at least one) as the cut-offs.

TRANSFAC-based analysis (http://www.gene-regulation.com/index2.html) was performed by first identifying significantly overrepresented motifs and their associated transcription factors for all pairwise comparisons in both human and mouse. First, significant candidate factors were ranked by their enrichment scores (Supplementary Data 4) and selected only if their targets were enriched in gene sets that changed in the same direction for both species in any one of the adjacent pairwise comparisons: NS/CHR to AK/PAP or AK/PAP to cuSCC. These changes also had to be significantly enriched in the NS/CHR to cuSCC comparison.

To quantify chromosomal instability (CIN), the CIN70 score was calculated by summing up the normalized counts of all CIN70 genes42. To adjust for the effects of multiple samples per patient and experimental batches, CIN70 scores were first transformed by the variance-stabilizing transformation method implemented in DESeq2 and fit into a linear model with patients and experimental batches as covariates. The residuals of the linear model were treated as adjusted CIN70 scores.

Cross-species linear mixed effects model. This analysis was confined to human samples that had complete matched sets of lesion types (21 samples from six patients). Samples from all six mice were used. For each gene, LME models were constructed in which normalized gene expression (as above) was modelled as a function of sample type (fixed effect) while correcting for the patient or mouse source (random effect). The models were created twice, using either NS/CHR or AK/PAP as the reference group to compute coefficients, and t tests were performed to assign significance to each gene with respect to each coefficient in each model. From those genes for which both iterations of the model were successfully fitted to the data (numerically stable), a set of genes with nominal P<0.05 for the cuSCC versus NS/CHR coefficient was identified. Each of these sets was then filtered to identify genes for which the AK/PAP versus NS/CHR or the cuSCC versus AK/PAP coefficient also had a nominal P<0.05 and a sign matching that of the cuSCC versus NS/CHR coefficient; these genes were designated as early and late, respectively, or stepwise (both early and late). LME modelling and t tests were carried out using the R package v.3.1-97 (https://www.r-project.org).

For cross-species analysis, the NCBI HomoloGene resource (version 68) (http://www.ncbi.nlm.nih.gov/homologene) was used as follows. First, HUGO identifiers were converted to Entrez Gene identifiers. Most (>95%) of the identifiers could be translated in this manner (16,155 out of 16,952 human features and 14,084 out of 14,542 mouse features); features that did not map to an Entrez Gene ID were discarded. Next, the resulting Entrez Gene identifiers were matched between species by common HomoloGene identifier, and any feature from either data set that did not map to exactly one homologue in the other data set was discarded. Finally, the LME results from the two data sets were merged by HomoloGene identifier to allow for direct comparison.

GATHER (Gene Annotation Tool to Help Explain Relationships)53 was used to identify TRANSFAC identifiers that were significantly overrepresented in the early (NS/CHR to AK/PAP), late (AK/PAP to cuSCC), and stepwise (both early and late) gene sets, with a Bayes Factor >3 considered significant.

smallRNA-Seq analysis. This work was performed in collaboration with the laboratory of Dr Preethi Gunaratne, PhD (University of Houston, Biology & Biochemistry). Illumina small RNA adapter sequences were trimmed from the reads, and reads of length below 10 nt or ending in homopolymers of length 9 nt or above were discarded. Total usable number of reads for each sample was calculated. The reads were mapped to the miRBase reference76 (http://www.mirbase.org/) using BLAST; the abundance of each expressed microRNA was quantified as a fraction of the usable reads, and expressed as parts per million. To reduce potential batch effects due to sample collection and preparation at different times, the ComBat normalization algorithm77 (http://www.bu.edu/jlab/wp-assets/ComBat/Abstract.html) was applied for the human data. We determined differentially expressed microRNAs imposing a fold-change of 1.5× and t-test comparison (P<0.05) using the R statistical system. We employed PCA to examine sample structure; further visualization of microRNAs significant in one or multiple comparisons was carried out using the R statistical system.

Integrative mRNA-miRNA functional pair analysis. We determined enriched miRNA–mRNA pairs using the SigTerms methodology. Essentially, by applying a one-sided Fisher exact test and using TargetScan78 (http://www.targetscan.org/vert_71/) predicted microRNA targets, we determined the miRNAs for which the gene targets are significantly enriched (false discovery rate-adjusted q<0.25; fold-change>1.25×) in the gene signature, separately for the human specimens and the mouse samples. Finally, we determined the enriched miRNAs conserved alongside the SCC progression model, and the miRNA–mRNA pairs conserved alongside the SCC progression model. Conserved enriched microRNA–mRNA pairs were visualized using the Cytoscape software (http://www.cytoscape.org/).

Quantitative real-time PCR validation analysis. Separate cohorts of matched samples from patients consisting of NS, AK and cuSCC were processed as above. Commercially available Taqman (Life Technologies) probes were acquired for human miR-21 (000397), miR-31 (002279), PTPN14 (Hs00193643_m1), FAM134B (Hs00375273_m1), HMG2A (Hs00171569_m1), TIMP3 (Hs00165949_m1) and ARHGAP24 (Hs01097580_m1) and used in qRT-PCR based quantitation of expression in these tissues, as benchmarked to RNU6B (001093, microRNA) and 18S rRNA (Hs99999901, mRNA).

Gene set enrichment analysis for TCGA tumour signatures. GSEA was carried out using the GSEA software package79 (http://software.broadinstitute.org/gsea/index.jsp) to assess the degree of similarity among the studied gene signatures. For each of the human or the mouse SCC progression transcriptome responses, all genes were ranked by the fold change alongside the SCC progression model. To assess the relative association with multiple tumour progression signatures, we downloaded from the Cancer Genome Atlas (TCGA) (https://tcga-data.nci.nih.gov/tcga/) gene expression data for 19 cancer cohorts, performed quantile normalization using the R statistical analysis system, and then inferred tumour progression transcriptome signatures by imposing a fold change exceeding 2 (P<0.05). We utilized separately the downregulated genes and the upregulated genes. Normalized Enrichment Score (NES) and adjusted q-values (q<0.25) were computed utilizing the GSEA method, based on 1,000 random permutations of the ranked genes. We visualized combined NES scores for all the TCGA tumour development gene signatures and for our human and mouse SCC progression signatures using the Circos software80 (http://circos.ca/).

Survival analysis for TCGA HNSCC with TP53 mutations. We evaluated the survival prognostic power of cuSCC progression-associated gene signatures using human specimen cohorts from the Cancer Genome Atlas (TCGA) (https://tcga-data.nci.nih.gov/tcga/), specifically HNSCC which are TP53-mutant. We first replaced the gene expression of each gene with the z-score within the cohort, then we computed the sum of z-scores for each sample by adding the z-score for upregulated genes and subtracting the z-score for downregulated genes. Specimens were sorted according to the sum z-score of the respective SCC progression gene signature; association with survival (P<0.05, log-rank test) was evaluated by using the package Survival in the R statistical system (https://cran.r-project.org/web/packages/survival/index.html).

Data availability. Raw RNA and small RNA sequencing data from all human and mouse samples that were used to support the findings of this study have been deposited in NCBI/GEO with SuperSeries accession code GSE84194. The exome sequencing data that support the findings of this paper may be made available upon request from the corresponding author (K.Y.T.). These exome data are not publicly available because they contain information that could compromise research participant privacy/consent. The pan-cancer GSEA tumour signature analysis used publicly available Cancer Genome Atlas (TCGA) gene expression data, which can be accessed through the websites (https://tcga-data.nci.nih.gov/tcga/) and (https://gdc-portal.nci.nih.gov/).

References

1. Rosen, T. & Lebwohl, M. G. Prevalence and awareness of actinic keratosis: barriers and opportunities. J. Am. Acad. Dermatol. 68, S2–S9 (2013).
2. Warino, L. et al. Frequency and cost of actinic keratosis treatment. Dermatol. Surg. 32, 1045–1049 (2006).
3. Criscione, V. D. et al. Actinic keratoses: natural history and risk of malignant transformation in the Veterans Affairs Topical Tretinoin Chemoprevention Trial. Cancer 115, 2523–2530 (2009).
4. Karia, P. S., Han, J. & Schmults, C. D. Cutaneous squamous cell carcinoma: estimated incidence of disease, nodal metastasis, and deaths from disease in the United States, 2012. J. Am. Acad. Dermatol. 68, 957–966 (2013).
5. Neidecker, M. V., Davis-Ajami, M. L., Balkrishnan, R. & Feldman, S. R. Pharmacoeconomic considerations in treating actinic keratosis. Pharmacoeconomics 27, 451–464 (2009).
6. Zaravinos, A., Kanellou, P. & Spandidos, D. A. Viral DNA detection and RAS mutations in actinic keratosis and nonmelanoma skin cancers. Br. J. Dermatol. 162, 325–331 (2010).


7. Kanellou, P. et al. Genomic instability, mutations and expression analysis of the tumour suppressor genes p14(ARF), p15(INK4b), p16(INK4a) and p53 in actinic keratosis. Cancer. Lett. 264, 145–161 (2008).
8. Pacifico, A. et al. Loss of CDKN2A and p14ARF expression occurs frequently in human nonmelanoma skin cancers. Br. J. Dermatol. 158, 291–297 (2008).
9. Rehman, I., Takata, M., Wu, Y. Y. & Rees, J. L. Genetic change in actinic keratoses. 12, 2483–2490 (1996).
10. Toll, A. et al. Epidermal growth factor receptor gene numerical aberrations are frequent events in actinic keratoses and invasive cutaneous squamous cell carcinomas. Exp. Dermatol. 19, 151–153 (2010).
11. Toll, A. et al. MYC gene numerical aberrations in actinic keratosis and cutaneous squamous cell carcinoma. Br. J. Dermatol. 161, 1112–1118 (2009).
12. Salgado, R. et al. CKS1B amplification is a frequent event in cutaneous squamous cell carcinoma with aggressive clinical behaviour. Genes Chromosomes Cancer 49, 1054–1061 (2010).
13. Sekulic, A. et al. Loss of inositol polyphosphate 5-phosphatase is an early event in development of cutaneous squamous cell carcinoma. Cancer Prev. Res. (Phila.) 3, 1277–1283 (2010).
14. Padilla, R. S., Sebastian, S., Jiang, Z., Nindl, I. & Larson, R. Gene expression patterns of normal human skin, actinic keratosis, and squamous cell carcinoma: a spectrum of disease progression. Arch. Dermatol. 146, 288–293 (2010).
15. Nindl, I. et al. Identification of differentially expressed genes in cutaneous squamous cell carcinoma by microarray expression profiling. Mol. Cancer 5, 30 (2006).
16. Hameetman, L. et al. Molecular profiling of cutaneous squamous cell carcinomas and actinic keratoses from organ transplant recipients. BMC Cancer 13, 58 (2013).
17. Dooley, T. P., Reddy, S. P., Wilborn, T. W. & Davis, R. L. Biomarkers of human cutaneous squamous cell carcinoma from tissues and cell lines identified by DNA microarrays and qRT-PCR. Biochem. Biophys. Res. Commun. 306, 1026–1036 (2003).
18. Haider, A. S. et al. Genomic analysis defines a cancer-specific gene expression signature for human squamous cell carcinoma and distinguishes malignant hyperproliferation from benign hyperplasia. J. Invest. Dermatol. 126, 869–881 (2006).
19. Kathpalia, V. P. et al. Genome-wide transcriptional profiling in human squamous cell carcinoma of the skin identifies unique tumor-associated signatures. J. Dermatol. 33, 309–318 (2006).
20. Ra, S. H., Li, X. & Binder, S. Molecular discrimination of cutaneous squamous cell carcinoma from actinic keratosis and normal skin. Mod. Pathol. 24, 963–973 (2011).
21. Hudson, L. G. et al. Microarray analysis of cutaneous squamous cell carcinomas reveals enhanced expression of epidermal differentiation complex genes. Mol. Carcinog. 49, 619–629 (2010).
22. Harbig, J., Sprinkle, R. & Enkemann, S. A. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res. 33, e31 (2005).
23. Benavides, F., Oberyszyn, T. M., VanBuskirk, A. M., Reeve, V. E. & Kusewitt, D. F. The hairless mouse in skin research. J. Dermatol. Sci. 53, 10–18 (2009).
24. Vin, H. et al. BRAF inhibitors suppress through off-target inhibition of JNK signaling. Elife 2, e00969 (2013).
25. van Kranen, H. J. et al. Frequent p53 alterations but low incidence of ras mutations in UV-B-induced skin tumors of hairless mice. Carcinogenesis 16, 1141–1147 (1995).
26. Khan, S. G. et al. Mutations in ras : rare events in ultraviolet B radiation-induced mouse skin tumorigenesis. Mol. Carcinog. 15, 96–103 (1996).
27. Soufir, N. et al. INK4a-ARF mutations in skin carcinomas from UV irradiated hairless mice. Mol. Carcinog. 39, 195–198 (2004).
28. Ashton, K. J., Weinstein, S. R., Maguire, D. J. & Griffiths, L. R. Chromosomal aberrations in squamous cell carcinoma and solar keratoses revealed by comparative genomic hybridization. Arch. Dermatol. 139, 876–882 (2003).
29. Popp, S. et al. Genetic characterization of a human skin carcinoma progression model: from primary tumor to metastasis. J. Invest. Dermatol. 115, 1095–1103 (2000).
30. Dworkin, A. M. et al. Chromosomal aberrations in UVB-induced tumors of immunosuppressed mice. Genes Chromosomes Cancer 48, 490–501 (2009).
31. Rundhaug, J. E. et al. SAGE profiling of UV-induced mouse skin squamous cell carcinomas, comparison with acute UV irradiation effects. Mol. Carcinog. 42, 40–52 (2005).
32. South, A. P. et al. NOTCH1 mutations occur early during cutaneous squamous cell carcinogenesis. J. Invest. Dermatol. 134, 2630–2638 (2014).
33. Wang, N. J. et al. Loss-of-function mutations in Notch receptors in cutaneous and lung squamous cell carcinoma. Proc. Natl Acad. Sci. USA 108, 17761–17766 (2011).
34. Pickering, C. R. et al. Mutational landscape of aggressive cutaneous squamous cell carcinoma. Clin. Cancer Res. 20, 6582–6592 (2014).
35. Li, Y. Y. et al. Genomic analysis of metastatic cutaneous squamous cell carcinoma. Clin. Cancer Res. 21, 1447–1456 (2015).
36. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
37. Jonason, A. S. et al. Frequent clones of p53-mutated keratinocytes in normal human skin. Proc. Natl Acad. Sci. USA 93, 14025–14029 (1996).
38. Brash, D. E. UV signature mutations. Photochem. Photobiol. 91, 15–26 (2015).
39. Covington, K. R., Shinbrot, E. & Wheeler, D. A. Mutation signatures reveal biological processes in human cancer, bioRxiv. http://dx.doi.org/10.1101/036541 (2016).
40. Lee, C. S. et al. Recurrent point mutations in the kinetochore gene KNSTRN in cutaneous squamous cell carcinoma. Nat. Genet. 46, 1060–1062 (2014).
41. Ziegler, A. et al. Mutation hotspots due to sunlight in the p53 gene of nonmelanoma skin cancers. Proc. Natl Acad. Sci. USA 90, 4216–4220 (1993).
42. Carter, S. L., Eklund, A. C., Kohane, I. S., Harris, L. N. & Szallasi, Z. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043–1048 (2006).
43. Kar, A. & Gutierrez-Hartmann, A. Molecular mechanisms of ETS transcription factor-mediated tumorigenesis. Crit. Rev. Biochem. Mol. Biol. 48, 522–543 (2013).
44. Einspahr, J. G. et al. Functional pathway activation mapping of the progression of normal skin to squamous cell carcinoma. Cancer Prev. Res. (Phila.) 5, 403–413 (2012).
45. Su, F. et al. RAS mutations in cutaneous squamous-cell carcinomas in patients treated with BRAF inhibitors. N. Engl. J. Med. 366, 207–215 (2012).
46. Kumar, A. P. & Butler, A. P. Enhanced Sp1 DNA-binding activity in murine keratinocyte cell lines and epidermal tumors. Cancer Lett. 137, 159–165 (1999).
47. Miao, Q. et al. Tcf3 promotes cell migration and wound repair through regulation of lipocalin 2. Nat. Commun. 5, 4088 (2014).
48. Bhatia, N. & Spiegelman, V. S. Activation of Wnt/beta-catenin/Tcf signaling in mouse skin . Mol. Carcinog. 42, 213–221 (2005).
49. Mammucari, C. et al. Integration of Notch 1 and calcineurin/NFAT signaling pathways in keratinocyte growth and differentiation control. Dev. Cell. 8, 665–676 (2005).
50. Eckert, R. L. et al. AP1 transcription factors in epidermal differentiation and skin cancer. J. Skin Cancer 2013, 537028 (2013).
51. Wang, Q. S., Kong, P. Z., Li, X. Q., Yang, F. & Feng, Y. M. FOXF2 deficiency promotes epithelial-mesenchymal transition and metastasis of basal-like breast cancer. Breast Cancer Res. 17, 30 (2015).
52. Ooi, A. T. et al. Molecular profiling of premalignant lesions in lung squamous cell carcinomas identifies mechanisms involved in stepwise carcinogenesis. Cancer Prev. Res. (Phila) 7, 487–495 (2014).
53. Chang, J. T. & Nevins, J. R. GATHER: a systems approach to interpreting genomic signatures. Bioinformatics 22, 2926–2933 (2006).
54. Gunaratne, P. H., Coarfa, C., Soibam, B. & Tandon, A. miRNA data analysis: next-gen sequencing. Methods Mol. Biol. 822, 273–288 (2012).
55. Bruegger, C. et al. MicroRNA expression differs in cutaneous squamous cell carcinomas and healthy skin of immunocompetent individuals. Exp. Dermatol. 22, 426–428 (2013).
56. Dziunycz, P. et al. Squamous cell carcinoma of the skin shows a distinct microRNA profile modulated by UV radiation. J. Invest. Dermatol. 130, 2686–2689 (2010).
57. Wang, A. et al. MicroRNA-31 is overexpressed in cutaneous squamous cell carcinoma and regulates cell motility and colony formation ability of tumor cells. PLoS One 9, e103206 (2014).
58. Akilesh, S. et al. Arhgap24 inactivates Rac1 in mouse podocytes, and a mutant form is associated with familial focal segmental glomerulosclerosis. J. Clin. Invest. 121, 4127–4137 (2011).
59. Wang, W. et al. PTPN14 is required for the density-dependent control of YAP1. Genes Dev. 26, 1959–1971 (2012).
60. Kasem, K. et al. JK1 (FAM134B) represses cell migration in colon cancer: a functional study of a novel gene. Exp. Mol. Pathol. 97, 99–104 (2014).
61. Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).
62. Agrawal, N. et al. Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science 333, 1154–1157 (2011).
63. Cancer Genome Atlas, N. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582 (2015).
64. Hammerman, P. S. et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
65. Lin, D. C. et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat. Genet. 46, 467–473 (2014).
66. Bass, A. J. et al. SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat. Genet. 41, 1238–1242 (2009).


67. Arron, S. T., Ruby, J. G., Dybbro, E., Ganem, D. & Derisi, J. L. Transcriptome sequencing demonstrates that human papillomavirus is not active in cutaneous squamous cell carcinoma. J. Invest. Dermatol. 131, 1745–1753 (2011).
68. Ross-Innes, C. S. et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett’s esophagus and esophageal adenocarcinoma. Nat. Genet. 47, 1038–1046 (2015).
69. Stachler, M. D. et al. Paired exome analysis of Barrett’s esophagus and adenocarcinoma. Nat. Genet. 47, 1047–1055 (2015).
70. Nassar, D., Latil, M., Boeckx, B., Lambrechts, D. & Blanpain, C. Genomic landscape of carcinogen-induced and genetically induced mouse skin squamous cell carcinoma. Nat. Med. 21, 946–954 (2015).
71. Nghiem, D. X., Walterscheid, J. P., Kazimi, N. & Ullrich, S. E. Ultraviolet radiation-induced immunosuppression of delayed-type hypersensitivity in mice. Methods 28, 25–33 (2002).
72. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
73. Lee, W. P. et al. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 9, e90581 (2014).
74. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
75. Chen, Y. et al. VirusSeq: software to identify viruses and their integration sites using next-generation sequencing of human cancer tissue. Bioinformatics 29, 266–267 (2013).
76. Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014).
77. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
78. Agarwal, V., Bell, G. W., Nam, J. W. & Bartel, D. P. Predicting effective microRNA target sites in mammalian mRNAs. Elife 4, 1–38 doi:10.7554/eLife.05005 (2015).
79. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
80. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

Acknowledgements
We thank the residents of the University of Texas Houston Dermatology Residency program, Humaira Khan, Patricia Sheffield, Yvette Ruiz and Chandra Goodman for helping to process tissue for the project. K.Y.T. acknowledges funding support from the American Skin Association, American Cancer Society IRG, Duncan Family Institute Seed Grant Program, AACR Landon Foundation INNOVATOR Award in Cancer Prevention Research, NCI-1R01CA194617, T. Boone Pickens Endowment for Early Prevention of Cancer, DXB Biosciences Research Fund, and institutional funds (MDACC). V.C. acknowledges support from the UT CCTS TL1 award. C.C. acknowledges support from an Alkek Foundation for Molecular Discovery pilot project grant. CCSG NCI CA016672 and CPRIT RP#120348 are acknowledged for support of the MD Anderson Sequencing and Microarray Facility and the NGS Facility at Smithville, Texas, USA, respectively.

Author Contributions
V.C. led the processing of human specimens, overall analysis and validation of microRNAs. C.C. led the bioinformatics analysis of RNAseq and microRNAseq data. J.A.D., K.R.C. and D.W. performed the exome sequencing analysis, including NMF analysis and mutation calling. T.N.N., A.J., S.C., E.C., M.M., D.M., V.D.T., J.N.M. and R.P.R. procured and assisted in procuring human samples. C.H.A., T.N.N., G.C., C.N., K.M. and M.A.P. assisted in processing tissue and performing sample analysis. E.T., J.S. and Y.T. performed the quality control analysis and sequencing. H.A.A., K.R., W.X. and P.G. assisted in the analysis of RNAseq and microRNAseq data. L.S., H.Y. and X.S. performed RNAseq and microRNAseq data analysis including CIN signature analysis. A.G. and A.S. performed the linear mixed effects model analysis. C.P., M.F. and J.N.M. shared data before publication and scientific insight regarding exome sequencing of cuSCC. E.T.H. and E.R.F. provided critical scientific support and conceptual insight for the project. K.Y.T. conceived the study, wrote the paper and led the overall analysis.

Additional information
Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Competing financial interests: The authors declare no competing financial interest.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

How to cite this article: Chitsazzadeh, V. et al. Cross-species identification of genomic drivers of squamous cell carcinoma development across preneoplastic intermediates. Nat. Commun. 7:12601 doi: 10.1038/ncomms12601 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

© The Author(s) 2016
