ARTICLE doi:10.1038/nature21036

Translation from unconventional 5′ start sites drives tumour initiation

Ataman Sendoel1, Joshua G. Dunn2, Edwin H. Rodriguez2, Shruti Naik1, Nicholas C. Gomez1, Brian Hurwitz1, John Levorse1, Brian D. Dill3, Daniel Schramek1†, Henrik Molina3, Jonathan S. Weissman2 & Elaine Fuchs1

We are just beginning to understand how translational control affects tumour initiation and malignancy. Here we use an epidermis-specific, in vivo ribosome profiling strategy to investigate the translational landscape during the transition from normal homeostasis to malignancy. Using a mouse model of inducible SOX2, which is broadly expressed in oncogenic RAS-associated cancers, we show that despite widespread reductions in translation and protein synthesis, certain oncogenic mRNAs are spared. During tumour initiation, the translational apparatus is redirected towards unconventional upstream initiation sites, enhancing the translational efficiency of oncogenic mRNAs. An in vivo RNA interference screen of translational regulators revealed that depletion of conventional eIF2 complexes has adverse effects on normal but not oncogenic growth. Conversely, the alternative initiation factor eIF2A is essential for cancer progression, during which it mediates initiation at these upstream sites, differentially skewing translation and protein expression. Our findings unveil a role for the translation of 5′ untranslated regions in cancer, and expose new targets for therapeutic intervention.

Translational control is a key determinant of protein abundance, which in turn defines cellular states1. Its impact may intensify during the transition from homeostasis to malignancy, as revealed by the surprisingly low correlations between mRNA and protein levels in genome-wide human cancer databases2. Moreover, oncogenic drivers, such as mTOR, c-MYC and RAS, can influence the activity of eukaryotic initiation factors (eIFs) and ribosomal proteins3–6. Thus, by generating aberrant downstream networks of translational regulators, oncogenes might impose altered protein synthesis programs that become the driving force for tumour formation and malignant progression.

Here we test this hypothesis by focusing on squamous cell carcinomas (SCCs), which are among the most common and life-threatening cancers worldwide. In mice, the RAS–MAPK pathway is essential for benign tumours and SCCs7. Downstream of RASG12V–MAPK is SOX2, an essential transcription factor induced by SCC-initiating (stem) cells8–11. Notably, SOX2 is also recurrently amplified in human SCCs of the lung, head and neck, oesophagus and cervix12. Given its broad effect on these cancers, we used an established, inducible SOX2 mouse model to interrogate its effects on translational regulation of skin epidermis, at a time preceding overt phenotypic and proliferative changes associated with tumorigenesis. Our studies led us to an unexpected shift to unconventional translation that functions crucially in tumour initiation.

Translational landscapes

Embryonic epidermis is an excellent model for studying a rapidly growing tissue that relies on a fine-tuned balance between proliferation and differentiation13,14. To assess how SOX2 perturbs this balance, we crossed R26-LSL-Sox2-IRES-eGFPfl/fl and K14-cre+/wt mice15, yielding newborn litters with unaffected (green fluorescent protein (GFP)-negative; GFP−) or SOX2-expressing (GFP+) epidermal progenitors. We focused on postnatal days 0–4 (P0–P4) when SOX2+ epidermis was pre-phenotypic (Fig. 1a, b, Extended Data Fig. 1a).

To determine the effect of SOX2 on translation, we adopted a strategy previously used to map ribosome-protected mRNA sequences in yeast and cultured cell lines16–18. After treating freshly isolated skins with translation elongation inhibitor (cycloheximide), we prepared an epidermal suspension enriched for basal progenitors and then isolated and sequenced ribosome-protected mRNA fragments to generate a genome-wide in vivo epidermal translational landscape (Extended Data Fig. 1b). In parallel, we performed ribosome profiling on HRASG12V-transformed SCCs and primary keratinocytes, and we also mapped initiation sites using harringtonine, which blocks initiation during the first round of elongation18. Ribosome-protected fragment-length assays generated a 31-nucleotide peak (Fig. 1c). Across deep-sequencing replicates, our in vivo ribosomal profiling was highly reproducible (R² values > 0.9, Extended Data Fig. 1b, c).

By conducting matched transcriptional RNA sequencing (RNA-seq) and ribosome profiling, we contrasted genome-wide transcriptional and translational differences. As confirmed by immunofluorescence, some mRNAs, such as Krt5 and Krt14, showed SOX2-independent translation, whereas others, such as Krt6, Krt16 and Sox2, displayed increased translation in SOX2+ epidermis19 (Fig. 1b, d–f). Notably, 573 translatome-only differences were found, which included mRNAs encoding proteins such as Cd44, Fos, Erbb3 and Irs2 that have well-known functions in tumorigenesis. By performing parallel, comparative mass spectrometry, we found that ribosome profiling correlated well with SOX2-induced protein differences (Spearman correlation coefficient (rs) = 0.85, Extended Data Figs 2a, 10b).

Somatic stem cells and cancer-propagating cells have low protein synthesis rates, a feature implicated in driving stemness20,21. Therefore, we sought to determine whether SOX2 and HRASG12V also affected the cell-based rate of protein synthesis. We first used O-propargyl-puromycin (OPP) incorporation as a proxy for total protein synthesis20. OPP led to a robust increase in fluorescence, which could be largely blocked by prior cycloheximide treatment. On the basis of this assay, pre-malignant and SCC SOX2+ keratinocytes both showed lower protein synthesis rates than their wild-type counterparts (Fig. 2a, b).

To explore further how SOX2 functions in translational regulation, we next focused on translational efficiency by determining the reads

1Robin Chemers Neustein Laboratory of Mammalian Development and Cell Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA. 2Department of Cellular and Molecular Pharmacology, Howard Hughes Medical Institute, University of California, San Francisco, California 94158, USA. 3Proteomics Resource Center, The Rockefeller University, New York, New York 10065, USA. †Present address: The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto M5G 1X5, Canada.

00 MONTH 2017 | VOL 000 | NATURE | 1 © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Figure 1 | The translational landscape of the epidermis during premalignant transformation. a, Transgene and mating used to induce Sox2 in E15.5 embryonic skin epidermis. b, Representative P4 skin sections of littermates. Scale bars, 30 μm and 60 μm (haematoxylin and eosin (H&E) images). ITGB4 denotes epithelial marker β4-integrin. WT, wild type. c, Ribosome-protected fragment length (in nucleotides) in P4 epidermal samples from randomly selected 10^5 reads (n = 3 per genotype). d, Transcriptional and translational changes comparing SOX2 with wild-type P4 epidermis. Colour-coded are genes with the adjusted P value < 0.05 (DESeq2; ref. 43; n = 3 per group for ribosome profiling; n = 2 for RNA-seq). e, Venn diagram depicts total number of SOX2-dependent transcriptional and translational changes. f, Ribosome density profile on Krt14 and Krt6a transcripts in the epidermis (n = 3 per genotype).

per kilobase of transcript per million mapped reads (RPKM) of coding sequences (CDS) in ribosome profiling versus the RPKM in exons of RNA-seq (RPKMribosome profiling/RPKMRNA-seq) in wild-type and premalignant states. Overall, translational efficiency in SOX2+ epidermis was markedly lower than in wild-type epidermis (Fig. 2c). Translational efficiency was also reduced in cultured SOX2-transformed keratinocytes, indicating that this difference was intrinsic to SOX2 levels (Extended Data Fig. 2b). Moreover, the cells grew comparably in vivo, ruling out proliferation rates as the root of these differences (Extended Data Fig. 2c). We also found no differences in the expression of phospho-eIF4E-BP1 or NSUN2, which could have contributed to reduced translational efficiency21 (Extended Data Fig. 2d, e).

Reduced protein synthesis in pre-malignancy correlated well with reduced translational efficiency. Notably, however, a small cohort of mRNAs deviated from this trend. In particular, while increased transcripts were enriched for stress pathways, the cohort of efficiently translated mRNAs in SOX2+ epidermis displayed a notable association with cancer-related pathways (Fig. 2d, Extended Data Fig. 3). Thus, in the face of global translational reduction during the early stages of malignancy, a translationally controlled subset of cancer genes escaped this suppression by maintaining efficient translation.

Tumour-induced shifts in translation initiation

In addition to these differences in translational efficiency, there were also qualitative differences in the patterns of ribosomal occupancies. Oncogene-dependent differential skewing of ribosome occupancy was particularly evident within 5′ untranslated regions (UTRs). We identified distinct SOX2-induced patterns of translated upstream open reading frames (uORFs), reflected in their triplet periodicity22, that were 5′ of annotated CDS (Fig. 3a, b, Extended Data Figs 4, 5a). Quantification of 5′ UTR translation of 1,830 mRNAs revealed that uORF translation increased substantially in SOX2+ compared to wild-type epidermis (median ratio 1.84), as measured by the ratio of ribosome profiling reads within the 5′ UTR relative to reads within the annotated downstream CDS.

Notably, some extensions in ribosomal coverage along 5′ UTRs in pre-malignant epidermis also generated peptide diversity (Extended Data Figs 4, 5). Previous studies underscored the difficulties in detecting uORF peptides by mass spectrometry23, and indeed, even when we specifically enriched for N-terminal fragments (terminal amine isotopic labelling of substrates, TAILS), we only detected 13 uORF peptides. Independent of whether uORFs are translated into stable peptides or N-terminal extensions, or represent stalled or poised ribosomes,

Figure 2 | Overall protein synthesis and translational efficiency are decreased in premalignant and SCC states. a, b, O-propargyl-puromycin (OPP) incorporation was assessed 1 h after administration in vitro. Representative histograms are shown for unstained, cycloheximide/OPP-treated (CHX) or OPP-treated keratinocytes. Data are mean ± s.d. (WT n = 11, SOX2 n = 7, SCC n = 7 independent experiments). **P < 0.01, ***P < 0.001, two-tailed Student's t-test. MFI, mean fluorescence intensity. c, Differential translational efficiency (TE = RPKMribosome profiling/RPKMRNA-seq) SOX2 versus wild type. d, Pathway analysis of genes transcribed differentially or translated efficiently in SOX2 versus wild type.
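The translational efficiency defined in this legend (TE = RPKMribosome profiling/RPKMRNA-seq) can be illustrated with a minimal Python sketch. The function names and all read counts below are hypothetical and chosen only for illustration; they are not data or code from this study, and the sketch omits the replicate handling and statistical testing that a real analysis would need.

```python
import math


def rpkm(read_count, feature_length_nt, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    kilobases = feature_length_nt / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


def translational_efficiency(ribo_rpkm, rna_rpkm):
    """TE = RPKM (ribosome profiling, CDS) / RPKM (RNA-seq, exons)."""
    if rna_rpkm <= 0:
        raise ValueError("RNA-seq RPKM must be positive to compute TE")
    return ribo_rpkm / rna_rpkm


def log2_te_fold_change(te_condition, te_reference):
    """log2 of the TE ratio between a condition and its reference."""
    return math.log2(te_condition / te_reference)


# Hypothetical counts for one gene (assumed values, not from the paper).
te_wt = translational_efficiency(
    rpkm(400, 2_000, 10_000_000),  # footprints on a 2 kb CDS
    rpkm(500, 2_500, 20_000_000),  # RNA-seq reads on 2.5 kb of exons
)
te_sox2 = translational_efficiency(
    rpkm(200, 2_000, 10_000_000),  # half as many footprints
    rpkm(500, 2_500, 20_000_000),  # unchanged mRNA level
)
print(te_wt, te_sox2, log2_te_fold_change(te_sox2, te_wt))
```

In this toy case the mRNA level is unchanged while ribosome footprints halve, so the log2 TE fold change is negative, which is the direction reported for most mRNAs in SOX2+ epidermis.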


Figure 3 | Translation is shifted towards upstream open reading frames in premalignant and SCC states. a, Relative uORF translation in P4 epidermis. Histogram shows distribution of log2 fold changes in relative 5′ UTR translation (n = 3 per genotype). Red line, median. b, Metagene analysis of ribosome density for the 1,830 5′ UTRs quantified in a. Normalized ribosome densities denotes the ratio of ribosome profiling reads. c, Minimum free energy for the top 10% of wild-type and SOX2-regulated 5′ UTRs (n = 183 per group, from 3 independent experiments, two-sided Wilcoxon test). d, Codon usage of 300 uORFs preferentially translated in SOX2+ epidermis. e, Pathway analysis for the top and bottom 10% of genes with differential uORF usage in SOX2+ versus wild type (n = 183 per group). ES cell, embryonic stem cell. f, Normalized ribosome densities in 5′ UTR and ORF of Npm1 in P4 epidermis (average of n = 3 per genotype). g, Relative uORF translation in HrasG12V;Tgfbr2-null versus control keratinocytes in vitro. Red line, median.

they still could serve an important regulatory role. Moreover, upon scrutinizing 5′ UTRs of the top 10% of efficiently translated uORFs in SOX2+ epidermis, we discovered a marked correlation between enhanced translation, increased length and decreased minimum free energy of 5′ UTRs (Extended Data Fig. 6a, b). When normalized for length, reduced folding energy still correlated with increased translation (Fig. 3c), indicating that SOX2-dependent re-direction towards uORF translation may target 5′ UTRs with increased secondary structure.

Pathway analysis revealed that highly translated uORFs in SOX2+ epidermis have downstream ORFs involved in mechanisms of cancer, stem-cell pluripotency and Wnt/β-catenin signalling. This contrasted with wild-type epidermis, where eIF2 signalling was the primary target of differential uORF usage (Fig. 3e). Additionally, significant overlap existed between mRNAs with high uORF translation and mRNAs refractory to oncogenic suppression of translational efficiency (Extended Data Fig. 6c). Many of these SOX2-induced, ribosome-protected uORFs within 5′ UTRs displayed CUG, rather than the conventional AUG initiation site of canonical ORFs (Fig. 3d). For instance, nucleophosmin (Npm1), which is frequently mutated or translocated in cancer24, exhibited increased ribosome occupancy in the 5′ upstream CUG uORF (Fig. 3f).

These findings suggest that even before visible signs of tumorigenesis, the translational initiation apparatus is redirected towards upstream ORFs of a cohort of cancer-related mRNAs. If this hypothesis is correct, then we should see differential uORF usage in established cancers. We therefore examined uORF usage in a tumour allograft model, in which oncogenic HRASG12V in combination with loss of TGFβ receptor II rapidly progresses in a SOX2-dependent manner into SCCs25 (Extended Data Fig. 6d). Indeed, when compared to keratinocytes from normal epidermis, SCC keratinocytes displayed an approximately 1.7-fold increase in uORF translation of mRNAs, similar to that seen with premalignant epidermis (Fig. 3g). Moreover, a notable overlap existed between uORFs regulated by SOX2 in vivo and HRASG12V in vitro (Extended Data Fig. 6e). Together, these findings underscore the existence of a mechanism to redirect the translational initiation machinery specifically in SCCs for the purpose of enhancing the relative 5′ UTR translational efficiency of the uORFs of certain cancer-related mRNAs.

In vivo RNAi screen exposes an eIF2 regulatory node

Recognition of upstream and conventional ORF start codons is thought to involve a similar set of initiation factors (eIFs) and ribosomal subunits26. Therefore, we posited that differential activity or abundance of either eIFs or ribosomal proteins might be responsible for SOX2-induced changes of the translational landscape.

To find such regulatory nodes, we designed an RNA interference (RNAi)-based screen to determine whether knockdown of any of those genes exhibits a differential growth phenotype in SOX2 versus wild-type epidermis. To this end, we pooled 750 short hairpin RNA (shRNA)-containing lentiviruses targeting translation-related genes and then introduced this library in utero into the amniotic sacs of living embryonic day (E)9.5 embryos (Fig. 4a, Extended Data Fig. 7a). This technique selectively and efficiently transduces skin epidermal progenitors14. To prevent multi-infections and still ensure a coverage of more than 400 (cells/shRNA), we used a 10% infection rate on 27–49 embryos of each genotype and assessed shRNA representation in P0 epidermis.

As expected, most cells containing and Eif shRNAs were depleted in skins of both genotypes, with similar overall distribution of enriched and depleted shRNAs (Fig. 4b, c). However, by analysing differential shRNA representation, we began to unveil regulatory nodes downstream of SOX2 (Fig. 4d). As judged by the RIGER algorithm27, Eif2s1, Eif2s2 and Eif5 were among the top six genes in which the shRNAs were specifically enriched in SOX2+ epidermis (Fig. 4e, f, Extended Data Fig. 7b, c).

eIF2α (encoded by Eif2s1) and eIF2β (Eif2s2) are part of the eIF2 ternary complex, which initiates canonical translation together with methionine initiator-tRNA; eIF5 is required for recycling this complex to launch subsequent rounds of translation5,28 (Fig. 4g). On the basis of our screen, we posited that pre-malignant SOX2-expressing epidermal cells may be using a translational program independent of canonical eIF2-mediated translation initiation. Interestingly, both SOX2+ and HrasG12V;Tgfbr2-null SCCs displayed higher eIF2α-Ser51 phosphorylation (Fig. 4h), a modification also seen in the integrated stress response (ISR), where it diminishes eIF2 activity29.

eIF2A is an essential factor for tumour initiation

To understand how SCC translation is coordinated independently of eIF2, we first asked whether alternative initiation factors coordinate translation in the context of tumour formation. Four alternative initiation factors, MCT1–DENR, eIF2D (also known as Lgtn or ligatin), eIF5B and eIF2A, are known to deliver tRNAs to P sites of the ribosome30–34. Eif5b was not differentially depleted in our screen (Extended Data Fig. 7b), but Mct1-Denr, Eif2d and Eif2a shRNAs had not been included in our initial library. We focused on eIF2A as it is a crucial factor for uORF translation35, it has been implicated in leucine-tRNA recruitment34 and may be directly involved in initiating CUG (leucine) uORF start codons34,35, which showed the highest prevalence in SOX2+ epidermis18 (Fig. 3e).

To test whether SOX2+ epidermis has enhanced dependency on eIF2A, we first performed an in vivo mini-screen with now available Eif2a shRNAs. Relative to 35 control shRNAs, Eif2a shRNAs were


Figure 4 | An epidermal-specific in vivo RNAi screen identifies the eIF2 ternary complex as a regulatory node in premalignancy. a, Schematic of the screening strategy, in which the lentivirus library was injected into the amniotic sacs of E9.5 wild-type and SOX2+ embryos. The library consisted of 715 shRNAs targeting 138 eukaryotic initiation factors and ribosomal proteins, and 35 non-targeting control shRNAs. 49 wild-type and 27 SOX2+ embryos were transduced. Total coverage: wild type, 782×; SOX2+, 432×. b–d, shRNA abundance ratios were calculated as the number of reads at P0 divided by the number of reads in the initial library (mean of n = 3, 601 shRNAs quantified above threshold). Numbers in parentheses denote the shRNA ID targeting a particular gene. e, Analysis of relative shRNA representation reveals Eif2s1, Eif2s2 and Eif5 shRNAs among the top hits with higher representation in SOX2+ versus wild-type epidermis using the RIGER algorithm. f, Heatmap showing examples of shRNAs with higher and lower relative representation in SOX2+ versus wild-type epidermis. g, Schematic of the eIF2 pathway that initiates canonical translation. h, Representative western blot from 3 independent experiments of keratinocytes in vitro. i, In vivo CRISPR/Cas9 strategy to knockout Eif2a in wild-type and SOX2+ epidermis by in utero injection of lentiviruses containing cre and single-guide RNAs (sgRNAs). Relative Eif2a sgRNA representation was analysed by quantitative PCR (n = 4 per genotype). Data are mean ± s.d.

under-represented in SOX2+ epidermis, correlating to an extent with mRNA-knockdown efficiency (Extended Data Fig. 7d). Similar results were obtained by using in utero CRISPR/Cas9-mediated gene targeting to ablate Eif2a fully (Fig. 4i). Thus, whereas translation in wild-type epidermis depended on eIF2α, SOX2+ epidermis showed increased reliance on eIF2A.

To test whether eIF2A is required for translation in SCCs, we used CRISPR/Cas9 to establish three independent Eif2a-null clones of HrasG12V;Tgfbr2-null SCCs (Fig. 5a, Extended Data Fig. 7e). In serum-rich media, Eif2a-ablation did not affect proliferation, total protein synthesis or the translational landscape of SCC cells (Extended Data Fig. 7f–h). However, whereas control SCCs formed tumours within 1–2 weeks of engraftment, Eif2a-null SCCs rarely developed tumours even after 2 months (Fig. 5b). Moreover, in limiting dilution assays, tumour-initiating SCC cells were roughly 100 times fewer in isogenic Eif2a-null versus control SCC lines (Fig. 5c). Importantly, after reintroducing eIF2A into our Eif2a-null lines, tumour formation was rescued (Fig. 5d), thereby establishing an essential and hitherto unappreciated role for eIF2A-mediated translation in tumour initiation.

eIF2A, protein synthesis and SCC prognosis

In performing ribosome profiling, we found a pronounced decline in the ratio of uORF to ORF translation in Eif2a-null compared with control SCC keratinocytes (Fig. 6a). Having established a direct role for eIF2A in mediating genome-wide translation from 5′ UTRs, we next monitored the consequences of eIF2A-dependent uORF usage on protein production. Since eIF2A loss had no appreciable effects on translation in rich media (Extended Data Fig. 7h), we added arsenite to phosphorylate and inhibit eIF2α, thus increasing eIF2A dependency. To track newly synthesized proteins in control and Eif2a-null SCCs, we used a pulsed SILAC (stable isotope labelling with amino acids in cell culture) approach, administering the eIF2α block concomitant with the switch from light (L)- to heavy (H)-labelled SILAC medium.

Figure 5 | eIF2A controls tumour formation. a, HrasG12V;Tgfbr2-null SCC keratinocytes were infected with lentiviruses containing Cas9 and sgRNAs to establish Eif2a-knockout (KO) and non-targeting control lines. Western blot confirms Eif2a ablation by two different sgRNAs (sg1 and sg3). 4M denotes control clone; 5A, 5B and 6D denote Eif2a knockout clones. b, Eif2a ablation abrogates tumour growth. Plotted is mean tumour volume ± s.e.m. after subcutaneous injection of 10^5 cells (n = 16 control, n = 14 knockout clone 5A, n = 12 clones 5B and 6D). Representative tumours 25 days after injection are shown. Asterisk denotes terminated owing to tumour size. Scale bar, 1 cm. c, Limiting dilution assay. Graphs show percentage of tumour-free mice 4 weeks after injection and estimated tumour-initiating cells (TIC) (n = 24 grafts per dilution and genotype). d, Re-introducing eIF2A by pCMV-Eif2a transformation rescues Eif2a-knockout defective tumour initiation. Western blot confirms eIF2A restoration. Data are mean ± s.e.m. (n = 16 control, n = 14 clone 5A, n = 12 clone 5B, n = 16 clone 5A; pCMV::Eif2a, n = 16 clone 5B; pCMV::Eif2a). Scale bar, 1 cm.

Of 2,045 proteins detected by mass spectrometry, 368 showed an eIF2A-dependent, temporal increase in H-labelled peptides in SCCs (Fig. 6b, Extended Data Fig. 7i). This cohort included cancer-associated proteins such as NRAS, CD44, KRT17, Ki67, NRP1 and


Figure 6 | eIF2A promotes translation of select cancer genes and leads to poor prognosis in human SCC. a, eIF2A controls genome-wide uORF translation. Histogram shows distribution of log2 fold changes in relative 5′ UTR translation of uORF genes in Eif2a-knockout compared with control SCCs. Red line, median. b, A pulsed SILAC strategy reveals eIF2A-dependent protein synthesis under 5 μM arsenite stress. A cluster of 368 of the quantified 2,045 proteins were selectively reduced in synthesis in Eif2a-knockout SCCs. c, eIF2A-targeted uORF genes show increased protein synthesis in 72 h pulsed SILAC in control versus Eif2a-knockout SCCs. Kernel density plot shows eIF2A-dependent change in heavy-labelled protein synthesis of eIF2A-targeted uORF and non-uORF genes under stress. P value, two-sample Kolmogorov–Smirnov test. d, eIF2A-targeted uORF cancer genes are preferentially translated in early tumorigenesis. SCCs were subjected to ribosome profiling and RNA-seq directly or after 5 days transplantation in vivo (n = 2). Kernel density plots show changes in translation and translational efficiency (TE) of tumours in vivo versus SCCs in vitro at day 5. eIF2A-targeted (n = 716) versus non-eIF2A-targeted uORF (n = 746) genes refer to changes in uORF usage in control versus Eif2a-null SCCs (a). P values, two-sample Kolmogorov–Smirnov test. s.c., subcutaneous. e, Model of the switch towards eIF2A-dependent translation during tumorigenesis. tRNA-i, initiator-tRNA. f, Kaplan–Meier analysis comparing overall (left) and disease-free (right) survival of TCGA patients with head and neck SCC, which were stratified according to EIF2A mRNA expression z-score > 1.75 (27% of patients) versus the rest. Median survival and disease-free survival values are given.

RAC1. Moreover, eIF2A-targeted uORF genes showed a significant eIF2A-dependent overall increase in H-labelled proteins when SCCs were cultured under stress (Fig. 6c). Thus, under conditions of eIF2α phosphorylation, eIF2A specifically augments protein synthesis of uORF-containing mRNAs.

To address whether this principle pertains to early stages of tumorigenesis in vivo, we performed transplantation assays (Fig. 6d, Extended Data Fig. 8a–c). As early as day 5 after transplantation, SCC mRNAs containing eIF2A-targeted uORFs exhibited a marked increase in both translation and translational efficiency in the absence of appreciable transcriptional differences. These data suggest that during early tumorigenesis, cancer-associated transcripts that contain eIF2A-targeted uORFs selectively increase their translational efficiency. In agreement, mutations in the eIF2A-regulated uORF of Ctnnb1, a key oncogene for SCC progression, resulted in diminished tumorigenic potential (Extended Data Fig. 8d).

Analysis of The Cancer Genome Atlas (TCGA)36 revealed that the human EIF2A locus is amplified in 29% of patients with lung SCC, 15% of patients with head and neck SCC and 15% of patients with oesophageal carcinoma (Extended Data Fig. 9). Notably, although initiation factors are typically regulated post-translationally, higher EIF2A mRNA levels correlated significantly with shorter overall survival and shorter disease-free survival (Fig. 6f).

Discussion

Translational control allows a cell to orchestrate rapid changes in protein synthesis and tailor newly synthesized proteins to its specific needs, thereby ensuring that cellular resources are conserved37. Our studies not only add to evidence that suggests that eIF2α phosphorylation and inactivation may be induced during tumour formation38, but also further indicate that oncogenes can direct the translational machinery towards eIF2A-dependent uORF translation in tumorigenesis.

We found that whereas eIF2α phosphorylation globally reduces protein synthesis, translation is not inhibited on all mRNAs, but rather reprogrammed to translate selected eIF2A-dependent cancer-associated mRNAs efficiently. eIF2A competes, albeit very poorly, with eIF2α for delivery of the initiator tRNA to the 40S ribosomal subunit32,39,40, thus accentuating its relevance only when eIF2α is inhibited. Our findings support a model in which tumour initiation factors and/or cellular stressors38,41 lead to eIF2α inhibition early in tumorigenesis, freeing eIF2A of its competitive disadvantage and orchestrating its preferential translation of genes important for malignant progression.

eIF2A could mediate tumorigenesis in several ways (Extended Data Fig. 10a). First, if a preferred 5′ upstream start codon is in-frame, eIF2A could generate an N-terminally extended protein, and as described for the Pten tumour suppressor42, this could yield a function distinct from its parent ORF. Second, as shown by our proteomics experiments, uORF translation can generate small peptides, which if bioactive, could directly impact cellular behaviour. Third, uORF translation could have a regulatory role by amplifying or diminishing translation of downstream ORFs. This becomes particularly important in stem cells and the early stages of tumorigenesis, where overall protein synthesis is suppressed20,21 and yet marked changes in protein production must occur. Indeed, we identified a subset of oncogenic mRNAs that contain eIF2A-targeted uORFs and which are preferentially translated at early stages of tumorigenesis.

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

In summary, we discovered that during tumour initiation, oncogene-induced changes in eukaryotic initiation factors profoundly impact the translational landscape. By introducing shifts in the translation of uORFs, they can render specific cohorts of cancer-related mRNAs refractory to global reductions in protein synthesis that often accompany a stem-like and/or cancerous state. Given the poor prognosis associated with EIF2A mRNA levels in human cancers, our findings form a foundation for future investigations into whether eIF2A-mediated translation and/or translational regulation by uORFs can be exploited for therapeutic interventions.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 16 June; accepted 7 December 2016.
Published online 11 January 2017.

1. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
2. Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387 (2014).
3. Mamane, Y., Petroulakis, E., LeBacquer, O. & Sonenberg, N. mTOR, translation initiation and cancer. Oncogene 25, 6416–6422 (2006).
4. Pelletier, J., Graff, J., Ruggero, D. & Sonenberg, N. Targeting the eIF4F translation initiation complex: a critical nexus for cancer development. Cancer Res. 75, 250–263 (2015).
5. Sonenberg, N. & Hinnebusch, A. G. Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell 136, 731–745 (2009).
6. Hsieh, A. C. et al. The translational landscape of mTOR signalling steers cancer initiation and metastasis. Nature 485, 55–61 (2012).
7. Huang, P. Y. & Balmain, A. Modeling cutaneous squamous carcinoma development in the mouse. Cold Spring Harb. Perspect. Med. 4, a013623 (2014).
8. Okubo, T., Pevny, L. H. & Hogan, B. L. M. Sox2 is required for development of taste bud sensory cells. Genes Dev. 20, 2654–2659 (2006).
9. Que, J. et al. Multiple dose-dependent roles for Sox2 in the patterning and differentiation of anterior foregut endoderm. Development 134, 2521–2531 (2007).
10. Boumahdi, S. et al. SOX2 controls tumour initiation and cancer stem-cell functions in squamous-cell carcinoma. Nature 511, 246–250 (2014).
11. Rudin, C. M. et al. Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nat. Genet. 44, 1111–1116 (2012).
12. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
13. Beronja, S. et al. RNAi screens in mice identify physiological regulators of oncogenic growth. Nature 501, 185–190 (2013).
14. Beronja, S., Livshits, G., Williams, S. & Fuchs, E. Rapid functional dissection of genetic networks via tissue-specific transduction and RNAi in mouse embryos. Nat. Med. 16, 821–827 (2010).
15. Liu, K. et al. Sox2 cooperates with inflammation-mediated Stat3 activation in the malignant transformation of foregut basal progenitor cells. Cell Stem Cell 12, 304–315 (2013).
16. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
17. Brar, G. A. et al. High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science 335, 552–557 (2012).
18. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).
19. Tyner, A. L. & Fuchs, E. Evidence for posttranscriptional regulation of the keratins expressed during hyperproliferation and malignant transformation in human epidermis. J. Cell Biol. 103, 1945–1955 (1986).
20. Signer, R. A. J., Magee, J. A., Salic, A. & Morrison, S. J. Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).
21. Blanco, S. et al. Stem cell function and stress response are controlled by protein synthesis. Nature 534, 335–340 (2016).
22. Calviello, L. et al. Detecting actively translated open reading frames in ribosome profiling data. Nat. Methods 13, 165–170 (2016).
23. Slavoff, S. A. et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64 (2013).
24. Falini, B. et al. Translocations and mutations involving the nucleophosmin (NPM1) gene in lymphomas and leukemias. Haematologica 92, 519–532 (2007).
25. Yang, H. et al. ETS family transcriptional regulators drive chromatin dynamics and malignancy in squamous cell carcinomas. eLife 4, e10870 (2015).
26. Morris, D. R. & Geballe, A. P. Upstream open reading frames as regulators of mRNA translation. Mol. Cell. Biol. 20, 8635–8642 (2000).
27. Luo, B. et al. Highly parallel identification of essential genes in cancer cells. Proc. Natl Acad. Sci. USA 105, 20380–20385 (2008).
28. Hinnebusch, A. G. The scanning mechanism of eukaryotic translation initiation. Annu. Rev. Biochem. 83, 779–812 (2014).
29. Harding, H. P. et al. Regulated translation initiation controls stress-induced gene expression in mammalian cells. Mol. Cell 6, 1099–1108 (2000).
30. Skabkin, M. A. et al. Activities of Ligatin and MCT-1/DENR in eukaryotic translation initiation and ribosomal recycling. Genes Dev. 24, 1787–1801 (2010).
31. Dmitriev, S. E. et al. GTP-independent tRNA delivery to the ribosomal P-site by a novel eukaryotic translation factor. J. Biol. Chem. 285, 26779–26787 (2010).
32. Zoll, W. L., Horton, L. E., Komar, A. A., Hensold, J. O. & Merrick, W. C. Characterization of mammalian eIF2A and identification of the yeast homolog. J. Biol. Chem. 277, 37079–37087 (2002).
33. Terenin, I. M., Dmitriev, S. E., Andreev, D. E. & Shatsky, I. N. Eukaryotic translation initiation machinery can operate in a bacterial-like mode without eIF2. Nat. Struct. Mol. Biol. 15, 836–841 (2008).
34. Starck, S. R. et al. Leucine-tRNA initiates at CUG start codons for protein synthesis and presentation by MHC class I. Science 336, 1719–1723 (2012).
35. Starck, S. R. et al. Translation from the 5′ untranslated region shapes the integrated stress response. Science 351, aad3867 (2016).
36. Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517, 576–582 (2015).
37. Liu, B. & Qian, S.-B. Translational reprogramming in cellular stress response. WIREs RNA 5, 301–315 (2014).
38. Koromilas, A. E. Roles of the translation initiation factor eIF2α serine 51 phosphorylation in cancer formation and treatment. Biochim. Biophys. Acta 1849, 871–880 (2015).
39. Komar, A. A. et al. Novel characteristics of the biological properties of the yeast Saccharomyces cerevisiae eukaryotic initiation factor 2A. J. Biol. Chem. 280, 15601–15611 (2005).
40. Reineke, L. C., Cao, Y., Baus, D., Hossain, N. M. & Merrick, W. C. Insights into the role of yeast eIF2A in IRES-mediated translation. PLoS One 6, e24492 (2011).
41. Holcik, M. Could the eIF2α-independent translation be the achilles heel of cancer? Front. Oncol. 5, 264 (2015).
42. Liang, H. et al. PTENα, a PTEN isoform translated through alternative initiation, regulates mitochondrial function and energy metabolism. Cell Metab. 19, 836–848 (2014).
43. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank J. Que for the R26-Sox2-IRES-eGFP mice, D. Xu and the Cornell Genomics Facility for sequencing support, the Rockefeller Proteomics Facility for protein/peptide analyses, members of the Fuchs laboratory for discussions, and L. Polak and M. Sribour for their support with tumorigenesis studies. We thank E. Heller for bioinformatics support, L. Calviello for support with RiboTaper, and F. Garcia-Quiroz and M. Jovanovic for critical reading of the manuscript. The Rockefeller University Proteomics Resource Center acknowledges funding from the Leona M. and Harry B. Helmsley Charitable Trust and Sohn Conferences Foundation for mass spectrometer instrumentation. The results published here are in part based upon data generated by the TCGA Research Network (http://cancergenome.nih.gov/). A.S. was supported by the Human Frontier Science Program Organization (HFSP, LT000639-2013) and is currently supported by the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7 under REA grant agreement no. 629861. S.N. is a Damon Runyon Fellow (DRG-2183-14). B.H. was supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health under award number T32GM007739 to the Weill Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program. E.F. and J.S.W. are Investigators of the Howard Hughes Medical Institute. This work was supported by grants to E.F. from the National Institutes of Health (R37-AR27883) and NYSTEM CO29559.

Author Contributions A.S. and E.F. conceived the project, designed the experiments and wrote the manuscript. A.S. and B.H. performed the experiments, and collected and analysed data. J.G.D., E.H.R., J.S.W. and N.C.G. contributed to ribosome profiling data analysis. D.S. contributed to control shRNA library generation and established HrasG12V; Tgfbr2-null cell lines. S.N. contributed to OPP experiments. J.L. carried out in utero lentiviral injections. H.M. and B.D.D. performed proteomics experiments and analysed proteomics data. E.F. supervised the project. All authors discussed the results and edited the manuscript.

Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to E.F. ([email protected]).


METHODS

Ribosome profiling. In vivo sample preparation: mouse skins of P4 mice were collected immediately after euthanization and placed in ice-cold PBS supplemented with 8 mg ml−1 cycloheximide (CHX, in dimethyl sulfoxide) for 5 min. This concentration of CHX has been shown to reduce potential CHX-related artefacts44 and ensures rapid inhibition of translation within the skin. Skins were then placed into 2 ml of dispase (Corning 354235) supplemented with 8 mg ml−1 CHX for 20 min at 37 °C. Epidermis was separated from dermis under a dissection scope using fine forceps. Epidermis was then placed immediately in 4 ml trypsin supplemented with 4 mg ml−1 CHX and incubated for 12 min at 37 °C. The resulting cell suspension, enriched for basal epidermal keratinocytes, was then filtered through a 40-μm cell strainer, spun down and resuspended in lysis buffer as reported previously45. Cells were lysed on ice for 10 min, centrifuged at 16,000g at 4 °C for 10 min and supernatant was flash-frozen in liquid nitrogen. RNA concentration was determined using the Quant-it Ribogreen assay kit (R11490, Thermo Fisher). In most experiments, three P4 mice were pooled per sample. A sample was taken for RNA-seq analysis for comparison to ribosome profiling and calculation of translational efficiency.

In vitro sample preparation: keratinocytes were treated with Harringtonine (Abcam, 141941) for 5 min (2 μg ml−1) to block initiation-specific translation, CHX (Sigma) for 1 min (100 μg ml−1) to block translational elongation or no drug, as indicated. Cell lysis was performed in the dish. A sample was taken for RNA-seq analysis.

In vivo tumour sample preparation: tumours were collected immediately after euthanization and placed in ice-cold PBS supplemented with 8 mg ml−1 CHX (in dimethyl sulfoxide) for 5 min. Tumours were broken up into small pieces using a scalpel and incubated for 20 min in 0.5% collagenase supplemented with 8 mg ml−1 CHX (in dimethyl sulfoxide). Samples were spun down and resuspended in 4 ml trypsin supplemented with 4 mg ml−1 CHX for 12 min at 37 °C. The resulting cell suspension was then filtered through a 40-μm cell strainer, spun down and resuspended in lysis buffer.

The ribosomal profiling technique was carried out as reported previously45, with a few modifications as described below. Lysates were treated with 2.5 U RNase I (Ambion) per microgram RNA. Ribosome-protected fragments were isolated using sephacryl S400 columns (GE Healthcare) in TE buffer. rRNAs were removed using the Ribo-zero magnetic kit (Illumina, MRZH11124). Finally, ribosome-protected fragments were amplified in 7–9 PCR cycles using Phusion polymerase (NEB). Resulting amplicons were run on an 8% acrylamide non-denaturing gel, excised and incubated in 300 mM NaCl, 10 mM Tris (pH 8) and 1 mM EDTA-containing buffer overnight at 20 °C. The ribosome-protected fragments library was then precipitated in 2 μl of glycogen and 700 μl isopropanol. Samples were analysed on a bioanalyzer before sequencing. Libraries were sequenced on HiSeq2500 and HiSeq4000 platforms.

For parallel RNA-seq, RNA was purified using Direct-zol RNA MiniPrep kit (Zymo Research) per manufacturer’s instructions. Quality of the RNA was determined using Agilent 2100 Bioanalyzer, with all samples passing the quality threshold of RNA integrity numbers (RIN) > 8. Library preparation using Illumina TruSeq mRNA sample preparation kit was performed at the Weill Cornell Medical College Genomic Core facility, and cDNA was sequenced on Illumina HiSeq 2500. Reads were mapped to mm10 build of the mouse genome using TopHat2, and differential expression was determined using DESeq2.

Sequencing alignment and mapping. Sequencing reads were clipped and trimmed using fastx_clipper and fastx_trimmer from the Hannon laboratory. Fragments derived from rRNA were removed using Bowtie2 (ref. 46), aligning to the 45S pre-rRNA sequence. The remaining non-ribosomal reads were then aligned to the mm10 mouse genome using TopHat2 (ref. 47).

Gene expression quantitation. For all analyses we used GRCm38.78 of the mouse genome transcript annotation from Ensembl. Sets of genes whose transcripts shared one or more exact exons were collapsed to ‘merged’ genes. Genomic coordinates occupied by more than one merged gene on the same strand were excluded from analysis. Within each merged gene, the remaining nucleotide positions were then divided into the following classes: ‘exon’ was the union of all positions across all transcripts belonging to the merged gene; for coding genes, ‘5′ UTR’ contained all positions that were uniquely labelled as 5′ UTR in all transcripts; ‘CDS’ contained all positions that were uniquely labelled as CDS in all transcripts; and ‘3′ UTR’ contained positions labelled as 3′ UTR in all transcripts.

The total number of mRNA fragments and ribosome footprints aligning to each class of positions was tabulated for each merged gene. Gene merging, position classification, and expression counting were performed using the cs script from the Plastid toolkit48.

These count data were then taken into DESeq2, an R package designed for the analysis of Illumina sequencing-based assays, which estimates and accounts for biological variability in a statistical test based on the negative binomial distribution43.

Ribosome occupancy was quantified for mRNAs with more than an average of 128 reads over all replicates. 5′ UTR translation was quantified for mRNAs with an average of more than 16 reads over all replicates. Throughout our study, we use 5′ UTR translation as a proxy for uORF translation. Relative 5′ UTR translation was calculated for each gene using the formula: relative 5′ UTR translation = counts in the 5′ UTR/counts in the CDS.

Translation efficiency (TE) was computed for each merged gene using the formula:

TE = RPKM of CDS in ribosome profiling/RPKM in exon of RNA-seq.

Only RNAs with more than 256 reads were included for translation efficiency calculations.

For metagene analysis, annotated 5′ UTR from the list of quantified uORFs, translated mRNAs were selected from the ENSEMBL database. 5′ UTRs were then scaled to equal number of windows and average signal was plotted. For start codon usage (Fig. 3d), the top 5% of SOX2-regulated uORFs from P4 epidermis were quantified.

eIF2A-regulated uORFs were defined as ratio of 5′ UTR translation in SCC control/5′ UTR translation in SCC Eif2a knockout >4 (n = 716). The remaining mRNAs with a ratio <4 were defined as non-eIF2A regulated (n = 746).

All calculations were made in R, some graphs were plotted using ggplot2 in R. Gene lists were imported into the Ingenuity Pathway Analysis software (Ingenuity Systems), and analyses and graphic outputs of relative enrichment in functional gene categories were performed as recommended.

Western blotting. Total cell lysates were prepared using RIPA (20 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.5% deoxycholate, 0.1% SDS) supplemented with protease inhibitors (Complete mini, Roche) and Halt phosphatase inhibitor cocktail (Thermo Fisher, 1862495). The protein concentrations of clarified supernatants were measured by using BCA Protein Assay Kit (Thermo Scientific). Gel electrophoresis was performed using 4–12% NuPAGE Bis-Tris gradient gels (Life Technologies), transferred to nitrocellulose membranes (GE Healthcare, 0.45 μm). Membranes were blocked for 90 min in 2% BSA in TBS with 0.1% Tween 20 (TBST). Membranes were then incubated with primary antibodies in the blocking buffer overnight at 4 °C. After washing with TBST, membranes were incubated with secondary antibodies in the blocking buffer at room temperature for 45 min. The membranes were washed in TBST, then incubated for 1 min with ECL western blotting detection reagent (GE Healthcare, RPN2209). Chemiluminescent protein bands were analysed using CL-XPosure film (Thermo Scientific, 34091). The following primary antibodies were used: eIF2A (Proteintech, 11233-1-AP, 1:1,000), 4E-BP1 (53H11, Cell Signaling, 1:1,000), p-4E-BP1 (236B4, Cell Signaling, 1:1,000), p-eIF2α (119A11, Cell Signaling, 1:1,000), eIF2α (D7D3, Cell Signaling, 1:1,000), β-actin (8H10D10, 1:1,000), GAPDH (14C10, Cell Signaling, 1:1,000), SOX2 (Abcam, 92494, 1:1,000).

Immunofluorescence, immunohistochemistry and imaging. For immunofluorescence microscopy, skin was embedded in optimal cutting temperature compound (OCT) (VWR). Cryosections were cut at a thickness of 12 μm on a Leica cryostat and mounted on SuperFrost Plus slides (VWR). Sections were incubated in 2% paraformaldehyde for 10 min, washed and blocked for 1 h in blocking buffer (5% normal donkey serum, 1% BSA, 2% fish gelatin, 0.3% Triton X-100 in PBS). Slides were incubated at 4 °C overnight in primary antibodies diluted in blocking buffer. The following primary antibodies were used: SOX2 (Abcam, 92494, 1:100), integrin β4 (ITGB4; rat, 1:100, BD Pharmingen), K5 (guinea-pig, 1:500, Fuchs laboratory). After washing with PBS, sections were treated for 1 h at room temperature with secondary antibodies conjugated with Alexa 488, Alexa 594, or Alexa 647 (Life Technologies). Slides were washed, counterstained with 4′,6-diamidino-2-phenylindole (DAPI), and mounted in Prolong Gold (Life Technologies). Images were acquired with an Axio Observer Z1 epifluorescence microscope equipped with a Hamamatsu ORCA-ER camera (Hamamatsu Photonics), and with an ApoTome.2 (Carl Zeiss) slider that reduces the light scatter in the fluorescent samples, using a 20× objective, controlled by Zen software (Carl Zeiss). RGB images were assembled using Imaris. Panels were labelled in Adobe Illustrator CS5. For immunohistochemistry, samples were processed, embedded in paraffin, and sectioned at 4 μm. Immunohistochemistry was performed on a Bond Rx autostainer (Leica Biosystems) with heat-mediated antigen retrieval using standard protocols. Antibodies used were rabbit monoclonal primary antibodies for phospho-4E-BP-1. Bond Polymer Refine Detection (Leica Biosystems) was used according to manufacturer’s protocol. Sections were then counterstained with haematoxylin, dehydrated and film coverslipped using a TissueTek-Prisma and Coverslipper (Sakura). Whole slide scanning (40×) was performed on an Aperio AT2 (Leica Biosystems).

Statistics. Data were analysed and statistics performed in Prism6 (GraphPad) and R. Significant differences between two groups were noted by asterisks or actual P values (*P < 0.05; **P < 0.01; ***P < 0.001). Replicates (n) in this study refer to biological replicates.
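The per-gene ratios defined in the gene expression quantitation paragraphs above (relative 5′ UTR translation, translation efficiency, and the fourfold cut-off used to call eIF2A-regulated uORFs) can be illustrated with a minimal sketch. The authors state that their calculations were made in R; the Python below is not their code, and every function and parameter name is hypothetical. It assumes that the read-count filters described above (more than 128, 16 or 256 reads, depending on the measure) have already been applied, so denominators are non-zero, and it assumes the eIF2A cut-off is applied to the same 5′ UTR translation measure in both conditions.

```python
"""Illustrative sketch of the per-gene ratios described in Methods.

All names here are hypothetical; this is not the authors' R code.
Genes are assumed to have passed the read-count filters already,
so none of the denominators below is zero.
"""


def rpkm(read_count: float, region_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads (standard definition)."""
    return read_count * 1e9 / (region_length_nt * total_mapped_reads)


def relative_utr5_translation(utr5_counts: float, cds_counts: float) -> float:
    """Footprint counts in the 5' UTR divided by footprint counts in the CDS."""
    return utr5_counts / cds_counts


def translation_efficiency(cds_rpkm_footprints: float, exon_rpkm_rnaseq: float) -> float:
    """TE = RPKM of CDS in ribosome profiling / RPKM in exon of RNA-seq."""
    return cds_rpkm_footprints / exon_rpkm_rnaseq


def is_eif2a_regulated(utr5_control: float, utr5_knockout: float,
                       threshold: float = 4.0) -> bool:
    """True when 5' UTR translation in control exceeds `threshold` times
    the value in the Eif2a knockout (ratio > 4 in the text)."""
    return utr5_control / utr5_knockout > threshold


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    footprint_rpkm = rpkm(read_count=500, region_length_nt=1000,
                          total_mapped_reads=10_000_000)
    print("CDS footprint RPKM:", footprint_rpkm)
    print("TE:", translation_efficiency(footprint_rpkm, exon_rpkm_rnaseq=25.0))
    print("relative 5' UTR translation:", relative_utr5_translation(16, 128))
    print("eIF2A-regulated:", is_eif2a_regulated(8.0, 1.0))
```

Keeping the ratios as small pure functions makes the thresholds explicit and easy to vary; in practice they would be applied column-wise to a table of merged-gene counts rather than gene by gene.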


Analysis of human HNSCC patient data. We analysed the publicly available datasets of The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/). The cBioPortal for Cancer Genomics developed and maintained by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center was used to mine the publicly available TCGA dataset on HNSCC12.

Lentivirus production and transduction. Production of vesicular stomatitis virus (VSV-G) pseudotyped lentivirus was performed by calcium phosphate transfection of 293FT cells (Invitrogen) with pLKO.1 and helper plasmids pMD2.G and psPAX2 (Addgene plasmid 12259 and 12260). Viral supernatant was collected 46 h after transfection and filtered through a 0.45-μm filter. For lentiviral infections in culture, cells were plated in 6-well plate at 1.0 × 10^5 cells per well and incubated with viruses in the presence of polybrene (20 mg ml−1) for 30 min, and then plates were spun at 1,100g for 30 min at 37 °C in a Thermo IEC CL40R centrifuge. Infected cells were selected with puromycin.

For in vivo lentiviral transduction, the same viral supernatant as above was filtered (0.45-μm filter) and concentrated by ultracentrifugation. Final viral particle was resuspended in viral resuspension buffer (20 mM Tris pH 8.0, 250 mM NaCl, 10 mM MgCl2, 5% sorbitol) and 0.5 μl was in utero injected into E9.5 embryos, as described before49. For knockdown experiments, we used clones from the Broad Institute’s Mission TRC mouse library. For shRNA screen, virus was diluted in order to obtain a 10% infection rate.

Sample preparation and pre-amplification of shRNA screen. shRNA library was injected in E9.5 embryos and skins were collected from P0 postnatal animals. To distinguish between SOX2 and wild type, P0 pups were assessed for GFP expression (SOX2+). After euthanization, back and head skin was collected and epidermal cells were isolated from P0 mouse skin using previously established procedures14. For t = 0, keratinocytes were infected in vitro in three independent experiments and gDNA was isolated 24 h later.

Cells from individual embryos were used for genomic DNA isolation with the DNeasy Blood & Tissue Kit (Qiagen). gDNAs from transduced embryos of independent experiments were pooled (3 independent experiments in wild type and SOX2, total coverage of >400×). Typically two litters were pooled and processed separately as independent experiments (n = 3). Total DNA was used as template in pre-amplification reaction with 25 cycles and Phusion High-Fidelity DNA Polymerase (NEB). Per embryo, 4 μg of DNA was amplified. PCR products were then run on an 8% TBE gel and a clean ~240-bp band was isolated using DNA resuspension buffer (300 mM NaCl, 10 mM Tris (pH 8) and 1 mM EDTA) and incubated overnight at 20 °C. Samples were tested using the Agilent bioanalyzer. Final samples were then sent for Illumina HiSeq 2500 sequencing.

Statistical analysis of relative shRNA abundance. For each genotype, pooled DNAs from three independent experiments were sequenced independently. Illumina reads were trimmed to the 21-nucleotide hairpin sequence using the fastx toolkit and aligned to the TRC 2 library with Bowtie using a maximum edit distance of 3. Only shRNAs that showed at t = 0 more than 150 reads over three independent experiments were included for further analysis, resulting in a total of 601 shRNAs. shRNA abundance was then normalized and representation of shRNAs was compared against a control library with 35 non-targeting shRNAs (ratio of shRNA to average behaviour of control library). Representation of shRNAs was then analysed using the RIGER algorithm27, which ranks shRNAs according to their differential effects between two classes of samples, then identifies the genes targeted by the shRNAs at the top of the list. Please also see Extended Data Fig. 7 for additional methods and metrics.

Mouse strains. Rosa26-CAG-loxP-stop-loxP-Sox2-IRES-eGFP (R26-LSL-Sox2-IRES-eGFP+/+) mice were donated by J. Que and were maintained in a B6/svev129 mixed background. K14-cre(tg) mice were maintained in a CD-1-ICR background. Nude mice (Nu/Nu) for tumour cell grafts were from Charles River. Rosa26-CAG-loxP-stop-loxP-Cas9 mice were obtained from The Jackson Laboratory50. For ribosome profiling studies, females and males were used; for tumour allografts, only females were used.

Mice were housed and cared for in an AAALAC-accredited facility, and all animal experiments were conducted in accordance with IACUC-approved protocols. Sample sizes to ensure adequate power were chosen using Gpower software51. No animals were excluded from analysis. Randomization and blinding were not used in this study.

Tumour formation. For allograft transplantation, 1.0 × 10^5 mouse primary tumour cells were subcutaneously injected with growth-factor reduced Matrigel (Corning, 356231) in nude female mice (6–8 weeks old). Tumour size was measured every 5 days and calculated using the formula V = π/6 × length × width^2. For limit-dilution transplantation, 1.0 × 10^2–1.0 × 10^4 mouse SCC cells were subcutaneously injected with growth-factor reduced Matrigel (Corning, 356231) in nude mice and tumour formation was assessed 4 weeks after injection. Estimated percentage of tumour-initiating cells was analysed by ELDA (extreme limiting dilution analysis)52. The maximal tumour size allowed as per IACUC committee was 20 mm, or smaller if tumour necrosis or bleeding occurred.

In vitro cell culture experiments. Newborn, primary mouse epidermal keratinocytes from R26-Sox2-IRES-eGFPfl/+ and K14-cre; R26-Sox2-IRES-eGFPfl/+ were cultured on 3T3-S2 feeder layer in 0.05 mM Ca2+ E-media supplemented with 15% serum53. HrasG12V; Tgfbr2 knockout cell line was generated previously in the Fuchs laboratory25. Cell lines were cultured in E medium with 15% FBS and 50 mM CaCl2. Cell lines were not tested for mycoplasma infection.

CRISPR in HrasG12V; Tgfbr2 knockout SCC cells. Three sgRNAs against Eif2a were selected from the GeCKO library and cloned into a lenti-CRISPR v2 vector (v2 vector was a gift from F. Zhang, Addgene plasmid 52961)54. Non-targeting control sgRNA was used side-by-side with sgRNAs against target sites throughout the experiments to rule out phenotypic changes due to nonspecific editing. HrasG12V; Tgfbr2 knockout mice were infected and selected with puromycin (3 μg ml−1) to obtain stable Eif2a knockout pools. To obtain clonal cell lines, single-cell sorting into 96-well plates was performed. Genomic DNAs from single clones were isolated, from which the targeted Eif2a locus was PCR amplified and Sanger sequenced to confirm editing. Three frameshift mutations were selected (targeted by sg1 and sg3). Knockout was confirmed by western blot.

Sequences of the guides. Eif2a sg1: 5′-TATAATCAATGTCGCTAACA-3′; Eif2a sg2: 5′-TGTAAGGCTGCCACGTTGCC-3′; Eif2a sg3: 5′-ACCGTGCTTTCTGTGAAGTG-3′; non-targeting: 5′-GCGAGGTATTCGGCTCCGCG-3′; Ctnnb1 uORF: 5′-GGCCTCCTGCACTGACGGCT-3′.

Measurement of protein synthesis. For protein synthesis analysis, 10^4–10^5 cells were plated in 12-well plates. OPP (Jena Biosciences) was added for 1 h at a concentration of 50 μM. Cells were then removed from wells and washed with PBS. The cell suspension was fixed in 0.5 ml of 1% paraformaldehyde in PBS for 15 min on ice and permeabilized in PBS containing 0.1% saponin (Sigma) for 5 min at room temperature. At this point the azide–alkyne cycloaddition reaction (using Alexa Fluor 594 azide, Life Technologies) was allowed to proceed for 30 min in the dark at room temperature using the Click-iT Cell Reaction Buffer Kit (Life Technologies). Cells were then washed three times in PBS supplemented with 2% FBS, resuspended in PBS, and immediately analysed by flow cytometry. As additional negative control, translational elongation inhibitor 100 μg ml−1 CHX (Sigma) was added 30 min before OPP. OPP fluorescence signal was compared between different genotypes after subtracting fluorescence from control (no OPP) samples.

Quantification of total proteome and pulsed SILAC labelling. Protein was extracted from mouse epidermis tissue for total proteome profiling and from HrasG12V; Tgfbr2 or HrasG12V; Tgfbr2; Eif2a knockout cells pulsed with heavy arginine (R10) and lysine (K6) for 0, 12, 24, 48 and 72 h for pulsed SILAC to measure newly translated proteins. Cell pellets were disrupted in 8 M urea and passed through a 20-gauge needle for 10 cycles and placed in a bath sonicator on ice for 30 min. Fifty micrograms of protein for three replicates from each condition was processed in parallel as follows: cysteines were reduced with dithiothreitol (Sigma) before alkylation with iodoacetamide (Sigma). Proteins were digested with LysC (Wako Chemicals) followed by trypsin (Promega) and desalted with Empore C18 StageTips (3M)55.

One microgram of total protein was injected for nano-LC–MS/MS analysis. For the total proteome analysis, peptides were separated using a 12 cm × 75 μm C18 column (Nikkyo Technos) at a flow rate of 200 nl min−1, with a 5–40% gradient over 160 min (buffer A 0.1% formic acid, buffer B 0.1% formic acid in acetonitrile), and a Q-Exactive Plus (ThermoScientific) was operated in data-dependent mode with a top 20 method. For the pulse SILAC analysis, peptides were separated with an EasySpray 50 cm column (ThermoScientific) over a 3 h gradient, at a flow rate of 300 nl min−1 and analysed by an Orbitrap Lumos operated in ‘top speed’ mode, with HCD/ion trap MS/MS scans.

Mass spectrometry data were analysed using MaxQuant (version 1.5) and Perseus software (version 1.4), searching against a Uniprot Mus musculus database (downloaded July 2014), allowing oxidation of methionine and protein N-terminal acetylation, and filtering at a 1% false discovery rate at the peptide and protein level. Proteins were quantified using LFQ values for the total proteome analysis; proteins were deemed significantly changing by t-test, using a corrected FDR less than 5%. For the pulse SILAC analysis, LFQ values from the heavy channel were used for quantitation.

Identification of uORF peptides. To identify proteins expressed from the untranslated region, a peptidome enrichment from epidermis tissue and protein N-terminus enrichment from P4 epidermis and SCCs was conducted before LC–MS/MS analysis and searching against a 5′ UTR database. The database was generated by including all 5′ UTRs with an average of more than 16 reads over all P4 in vivo epidermis samples (3 wild type and 3 SOX2).

The peptidome was enriched by passing 100 μg of undigested protein through a 30 kDa molecular mass cut-off filter (Millipore). Flow-through was further enriched by C18 reversed-phase Oasis solid-phase extraction columns (Waters), eluting in 50% acetonitrile, 0.1% TFA. The enriched peptidome was either analysed directly, or tryptic digested as above, to cover both short and long uORF peptides.

Protein N-terminal peptides were enriched by the TAILS method56. In brief, after protein denaturation, free amines were blocked by dimethyl labelling, digested, and internal (neo-N-terminal unblocked) peptides were depleted by crosslinking to HPG-ALD polymer. Enriched N termini were analysed by an Orbitrap Lumos operated in top speed mode, with alternating CID/ion trap and HCD/Orbitrap MS/MS scans.

Data were searched with Proteome Discoverer 1.4 using the Mascot search engine. Spectra not matched to the mouse proteome database with at least medium confidence (Percolator, confidence >95%) were exported for search of the 5′ UTR database, using 6-frame translation, semitryptic constraints, and peptide N-terminal acetylation.

44. Gerashchenko, M. V. & Gladyshev, V. N. Translation inhibitors cause abnormalities in ribosome profiling experiments. Nucleic Acids Res. 42, e134 (2014).
45. Ingolia, N. T., Brar, G. A., Rouskin, S., McGeachy, A. M. & Weissman, J. S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat. Protocols 7, 1534–1550 (2012).
46. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
47. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
48. Dunn, J. G. & Weissman, J. S. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data. BMC Genomics 17, 958 (2016).
49. Beronja, S. & Fuchs, E. RNAi-mediated gene function analysis in skin. Methods
Data from the TAILS experiment were searched with semi-ArgC Mol. Biol. 961, 351–361 (2013). constraints with lysine dimethylation as a stable modification and N-terminal 50. Platt, R. J. et al. CRISPR-Cas9 knockin mice for genome editing and cancer dimethylation as a variable modification. High confidence (>99%)​ peptides within modeling. Cell 159, 440–455 (2014). 5 p.p.m. accuracy were submitted to BLAST to confirm that they did not match a 51. Faul, F., Erdfelder, E., Buchner, A. & Lang, A.-G. Statistical power analyses using predicted protein or contaminant. Manual analysis of MS/MS matches confirmed G*​Power 3.1: tests for correlation and regression analyses. Behav. Res. Methods 13 uORF peptide sequences (Extended Data Fig. 5). 41, 1149–1160 (2009). 52. Hu, Y. & Smyth, G. K. ELDA: extreme limiting dilution analysis for comparing Minimum free energy calculation. Ribosome footprint reads were counted relative depleted and enriched populations in stem cell and other assays. J. Immunol. to a consensus 5′​ UTR sequence for each gene as annotated by plastid. These Methods 347, 70–78 (2009). genomic coordinates were used to build a library of 5′​ UTR sequences using bed- 53. Blanpain, C., Lowry, W. E., Geoghegan, A., Polak, L. & Fuchs, E. Self-renewal, tools. These sequences were input for the RNAfold algorithm of the ViennaRNA multipotency, and the existence of two cell populations within an epithelial Package57. The algorithm computes base pairing probabilities of nucleotides within stem cell niche. Cell 118, 635–648 (2004). 54. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide an RNA sequence to determine a single thermodynamically favoured structure as libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014). well as its free energy. This library of structural data was used to evaluate genes 55. Ishihama, Y., Rappsilber, J. & Mann, M. Modular stop and go extraction tips with 5′​ UTRs differentially regulated by SOX2. 
with stacked disks for parallel and multidimensional peptide fractionation in RiboTaper analysis. The Ribotaper22 pipeline was used to predict translated proteomics. J. Proteome Res. 5, 988–994 (2006). regions in ribosome profiling data from SOX2 keratinocytes (sample number 16). 56. Kleifeld, O. et al. Identifying and quantifying proteolytic events and the natural N terminome by terminal amine isotopic labeling of substrates. Nat. Protocols Data availability. The data that support the findings of this study have been depo­ 6, 1578–1611 (2011). sited in the Gene Expression Omnibus (GEO) repository with the accession code 57. Hofacker, I. L. RNA secondary structure analysis using the Vienna GSE83332. All other data are available from the corresponding author(s) upon RNA package. Curr. Protoc. Bioinformatics Chapter 12, Unit12.2 reasonable request. (2009).
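The tumour volume formula given in the Tumour formation methods, V = π/6 × length × width^2, can be illustrated with a minimal Python sketch. The function name and the caliper values below are hypothetical and are not taken from the study; units are assumed to be millimetres, giving mm^3.

```python
import math


def tumour_volume(length_mm: float, width_mm: float) -> float:
    """Ellipsoid approximation V = pi/6 * length * width^2.

    Inputs are assumed to be caliper measurements in millimetres,
    so the result is in cubic millimetres.
    """
    if length_mm < 0 or width_mm < 0:
        raise ValueError("dimensions must be non-negative")
    return math.pi / 6.0 * length_mm * width_mm ** 2


# Hypothetical measurements (illustration only, not data from the study).
print(round(tumour_volume(10.0, 8.0), 1))  # 335.1
```

Because width enters as a square, a measurement error in width affects the estimate more strongly than the same error in length.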


Extended Data Figure 1 | Consequence of SOX2 expression in the epidermis and correlation between in vivo ribosome profiling experiments. a, Papilloma formation induced by SOX2 overexpression in the skin. SOX2 expression in tamoxifen-inducible K14-creER; R26-Sox2-IRES-eGFPfl/fl mice results in hyperplasia and papilloma formation. Representative H&E sections are shown. Animals develop severe skin lesions in the ventral epidermis 6–8 weeks after tamoxifen injection and require euthanasia. b, Experimental strategy to perform in vivo epidermis-specific ribosome profiling. In vivo epidermis-specific ribosome profiling strategy results in highly reproducible quantifications. Plots show RPKM correlations between the 3 independent replicate experiments in wild-type P4 epidermis. Quantified in this study were mRNAs with >128 reads. c, N-terminal extension of the translated Swi5 mRNA in wild-type and SOX2 P4 epidermis in vivo. Tracks show ribosome profiling reads along the Swi5 mRNA for replicate samples of each genotype. The final track shows harringtonine-treated ribosome profiling reads of wild-type keratinocytes in vitro. Harringtonine blocks ribosomes at the translational start site and allows translation start site mapping18. Red arrow indicates direction of translation. Green bar marks the annotated CDS. Blue bar denotes the actual translated coding sequence based upon ribosomal profiling.
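The replicate correlations described in panel b can be sketched as a squared Pearson correlation between the RPKM values of two replicates. This is a minimal sketch: whether the original analysis used raw or log-transformed RPKM is not stated here, so the function simply takes the values as given, and the function name and input values are hypothetical.

```python
import math


def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    return r * r


# Hypothetical RPKM values for four genes in two replicates.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.0, 4.0, 6.0, 8.0]
print(pearson_r_squared(replicate_1, replicate_2))
```

In practice the read-count filter from the caption (mRNAs with more than 128 reads) would be applied before computing the correlation.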


Extended Data Figure 2 | Global translational efficiency is decreased upon SOX2 expression in keratinocytes in vitro. a, Ribosome profiling data correlates with proteomics data. At a stage when proliferation and morphology were similar, freshly isolated, basally enriched keratinocytes from wild-type and SOX2-expressing P4 skins were subjected to in vivo ribosome profiling and to a label-free proteomics strategy. Plotted are proteomics fold changes compared to ribosome profiling fold changes. Comparisons are made in SOX2 versus wild-type samples for significantly changed proteins (false discovery rate (FDR) < 0.05). b, Translational efficiency is markedly reduced in SOX2-expressing premalignant keratinocytes in vitro. Keratinocytes were isolated from R26-Sox2-IRES-eGFPfl/+ (WT) or K14-cre; R26-Sox2-IRES-eGFPfl/+ (SOX2+) P0 animals and cultured in vitro. Quantified were genes with more than 256 reads in RNA-seq data. Histogram shows distribution of differential translational efficiency (TE = RPKM_ribosome profiling/RPKM_RNA-seq). Data are shown from 4 independent ribosome profiling experiments and 2 independent RNA-seq experiments. c, Wild-type and SOX2 keratinocytes have similar proliferation rates in vivo. Basal epidermal EdU incorporation in P0, P2 and P4 mice (n = 442/496, 385/449 and 903/841 cells from duplicate wild-type/SOX2 animals) 1 h after injection. Data are mean ± s.d. d, Phospho-4E-BP1 immunohistochemistry in the epidermis shows no difference in levels upon SOX2 induction. e, NSUN2 transcript, translation, and protein levels in SOX2 versus WT P4 epidermis.
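The translational efficiency defined in panel b (ribosome profiling RPKM divided by RNA-seq RPKM) can be illustrated as follows. This is a minimal sketch: the RPKM helper uses the standard reads-per-kilobase-per-million definition, and the function names, read counts, gene length and library sizes are hypothetical values chosen for illustration.

```python
import math


def rpkm(read_count: float, gene_length_nt: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_nt * total_mapped_reads)


def translational_efficiency(rpkm_ribo: float, rpkm_rna: float) -> float:
    """TE = RPKM (ribosome profiling) / RPKM (RNA-seq)."""
    if rpkm_rna <= 0:
        raise ValueError("RNA-seq RPKM must be positive")
    return rpkm_ribo / rpkm_rna


def log2_te_change(te_sox2: float, te_wt: float) -> float:
    """log2 fold change in TE, SOX2 relative to wild type."""
    return math.log2(te_sox2 / te_wt)


# Hypothetical gene: 2,000 nt long, measured in two libraries.
ribo = rpkm(500, 2000, 10_000_000)    # 25.0
rna = rpkm(1000, 2000, 10_000_000)    # 50.0
te = translational_efficiency(ribo, rna)
print(te)                             # 0.5
print(log2_te_change(te, 1.0))        # -1.0
```

A negative log2 change indicates lower translational efficiency in SOX2 cells than in wild type for that gene.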


[Figure: Extended Data Figure 3. Ingenuity pathway analysis bar charts (x axis, P value) listing the top enriched pathways in eight panels: transcriptome upregulated and downregulated in SOX2; translatome upregulated and downregulated in SOX2; translational efficiency upregulated and downregulated in SOX2; relative uORF usage upregulated and downregulated in SOX2.]

Extended Data Figure 3 | Translationally controlled genes show a shift towards cancer-related pathways. Pathway analyses reveal a shift towards cancer-related pathways in mRNAs resistant to the SOX2-mediated global decrease in translational efficiency and in mRNAs with preferential uORF translation in premalignant SOX2+ P4 epidermis. Ingenuity pathway analysis (IPA) was used to analyse genes differentially regulated at four levels of control: transcriptional, translational, translational efficiency, and uORF/ORF ratio level. Included were the top 10% most upregulated and top 10% most downregulated genes at the levels of the transcriptome, translatome, and uORF usage. The translational efficiency list was restricted to the top and bottom-most 500 genes, corresponding to 6.6% of all genes. Total number of genes quantified: 4,725 transcriptome, 4,725 translatome, 7,605 translational efficiency, 1,830 uORFs.


[Figure: Extended Data Figure 4. Panels a–c, ribosome profiling read tracks (WT replicates 1–3, SOX2 replicates 1–3 and harringtonine-treated WT in vitro) across the 5′ UTR and ORF of Myc (labelled uORF1, GTG and uORF2, CTG), Eif4g2 (uORF, ATG) and Btg1 (uORF, ATG), with higher-magnification views of the uORF start codons. Annotated ratios of reads in 5′ UTR to reads in CDS: Myc SOX2:WT = 2.7; Eif4g2 SOX2:WT = 2.9; Btg1 SOX2:WT = 3.9. Panel d, histogram of uORF usage in harringtonine-treated samples; x axis, log2((SOX2 Harr. 5′ UTR/CHX ORF)/(WT Harr. 5′ UTR/CHX ORF)); y axis, count.]

Extended Data Figure 4 | Translation from 5′ UTRs. a, 5′ UTR translation of Myc in the P4 epidermis in vivo. As described previously, Myc mRNA contains several translated uORFs18. Right panels show higher magnification of the uORF start codons. b, 5′ UTR translation of Eif4g2 (encoding eIF4γ2) in the P4 epidermis in vivo. Harringtonine track shows main translation initiation site in the 5′ UTR. Right panels show higher magnification of the uORF start codons. c, 5′ UTR translation of Btg1 in the P4 epidermis in vivo. Harringtonine track shows main translation initiation site in the 5′ UTR. Right panels show higher magnification of the uORF start codons. d, Relative uORF translation in SOX2 versus wild-type keratinocytes using Harringtonine-treated samples. For each gene, the ratio of ribosome footprints in 5′ UTR of Harringtonine samples versus CHX-treated ORFs (for normalization) was calculated. Histogram shows distribution of log2 fold changes in relative 5′ UTR translation.
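The quantity plotted in panel d, log2((SOX2 Harr. 5′ UTR/CHX ORF)/(WT Harr. 5′ UTR/CHX ORF)), can be written out as a short Python sketch. The function names and read counts below are hypothetical, and the sketch assumes the counts have already been normalized for sequencing depth, which the caption does not specify.

```python
import math


def relative_utr_translation(harr_utr_reads: float, chx_orf_reads: float) -> float:
    """5' UTR footprints (harringtonine) normalized to ORF footprints (CHX)."""
    if chx_orf_reads <= 0:
        raise ValueError("ORF read count must be positive")
    return harr_utr_reads / chx_orf_reads


def log2_relative_uorf_change(sox2_utr, sox2_orf, wt_utr, wt_orf) -> float:
    """log2 of the SOX2 ratio over the wild-type ratio for one gene."""
    sox2_ratio = relative_utr_translation(sox2_utr, sox2_orf)
    wt_ratio = relative_utr_translation(wt_utr, wt_orf)
    if sox2_ratio <= 0 or wt_ratio <= 0:
        raise ValueError("both ratios must be positive to take a logarithm")
    return math.log2(sox2_ratio / wt_ratio)


# Hypothetical gene: fourfold higher relative 5' UTR translation in SOX2 cells.
print(log2_relative_uorf_change(80, 400, 20, 400))
```

Genes with no 5′ UTR reads in either condition give an undefined logarithm; a real analysis would need a read-count filter or pseudocount, neither of which is specified in the caption.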


[Figure: Extended Data Figure 5. RiboTaper pipeline applied to SOX2 keratinocytes, with boxplots of length in nucleotides for 215 AUG uORFs (median = 54 nt) and 9,261 main ORFs (median = 1,176 nt); TAILS workflow schematic (blocking of primary amines, trypsin digest, reaction with HPG-ALD polymer, ultrafiltration, enriched N-terminally blocked peptides); pre- and post-TAILS counts; annotated MS/MS spectra (intensity in counts versus m/z); ribosome profiling tracks (WT replicates 1–3, SOX2 replicates 1–3, harringtonine-treated WT in vitro) across the 5′ UTRs of Gltp and Hdgf with the encoded peptides EAAGGDSPGADALGAGWGITK (Gltp) and AAPELASGAGIEAGAAR (Hdgf).]

Pre- and post-TAILS counts for samples S1–S4:

                              S1       S2       S3       S4
Pre-TAILS peptide count
  non-N-term                9,378    7,685    9,832    9,528
  N-term                      540      431      521      512
  % N-term                   5.4%     5.3%     5.0%     5.1%
Pre-TAILS PSMs (peptide-spectrum matches)
  non-N-term               13,739   11,047   14,623   14,029
  N-term                      812      625      782      764
  % N-term                   5.6%     5.4%     5.1%     5.2%
Post-TAILS peptide count
  non-N-term                3,098    3,275    2,773    3,110
  N-term                    2,799    2,412    2,499    2,343
  % N-term                  47.5%    42.4%    47.4%    43.0%
Post-TAILS PSMs (peptide-spectrum matches)
  non-N-term                6,758    7,388    6,533    7,118
  N-term                    7,614    6,042    6,869    6,166
  % N-term                  52.9%    44.9%    51.2%    46.4%

Spectrum headers (peptide with modified residues in parentheses; precursor m/z; mass error; charge; Mascot ion score):

A(Dim)APELASGAGIEAGAAR; m/z 770.4096; −0.46 ppm; +2; Mascot 92
A(Acetyl)APELASGAGIEAGAAR; m/z 777.4007; 1.42 ppm; +2; Mascot 73
A(Dim)AGGATAALEVWLGR; m/z 735.9054; −2.0 ppm; +2; Mascot 34
A(Dim)SAGGGSDGGAAAGGR; m/z 623.7928; 0.91 ppm; +2; Mascot 92
M(Acet, Ox)VTPAMAEVLSAGPESVAGCR; m/z 1096.0148; 1.3 ppm; +2; Mascot 32
M(Acet)VTPAMAEVLSAGPESVAGCR; m/z 725.6790; −0.96 ppm; +3; Mascot 74
V(Acet)TPAMAEVLSAGPESVAGCR; m/z 681.9981; −2.1 ppm; +3; Mascot 34
A(Acet)SATGDSASERDSAAPAAAPTAEAPPPPSVITRPEPQALPSSVIR; m/z 1479.4183; 1.7 ppm; +3; Mascot 29
A(Acet)SATGDSASER; m/z 547.2420; 0.97 ppm; +2; Mascot 49
(M)A(Dim)QQLGLPQLRAVR; m/z 541.6454; 3.5 ppm; +3; Mascot 36
I(Acet)VEDVWLLQNVLR; m/z 819.9631; −1.5 ppm; +2; Mascot 20
A(Acet)TAK(Dim)AMASK(Dim)LLR; m/z 453.6104; −1.6 ppm; +3; Mascot 34
EAAGGDSPGADALGAGWGITK; m/z 950.9545; −0.99 ppm; +2; Mascot 85

Identified peptides:

Sequence                                         Gene     Modifications
EAAGGDSPGADALGAGWGITK                            Gltp     None (global workflow)
AAPELASGAGIEAGAAR                                Hdgf     N-Term(Acetyl)
ASATGDSASER                                      Ddx17    N-Term(Acetyl)
ASATGDSASERDSAAPAAAPTAEAPPPPSVITRPEPQALPSSVIR    Ddx17    N-Term(Acetyl)
IVEDVWLLQNVLR                                    Ruvbl2   N-Term(Acetyl)
VTPAMAEVLSAGPESVAGCR                             Exosc3   N-Term(Acetyl); C19(Carbamidomethyl)
MVTPAMAEVLSAGPESVAGCR                            Exosc3   N-Term(Acetyl); C20(Carbamidomethyl)
ATAKAMASKLLR                                     Ak4      N-Term(Acetyl); K4(Dimethyl); K9(Dimethyl)
MVTPAMAEVLSAGPESVAGCR                            Exosc3   N-Term(Acetyl); M1(Oxidation); C20(Carbamidomethyl)
AAGGATAALEVWLGR                                  Agap1    N-Term(Dimethyl)
AAPELASGAGIEAGAAR                                Hdgf     N-Term(Dimethyl)
ASAGGGSDGGAAAGGR                                 Fmr1     N-Term(Dimethyl)
(M)AQQLGLPQLRAVR                                 Zfp617   N-Term(Met addition, Acetyl)


Extended Data Figure 5 | Proteomic detection of 13 peptides produced from 5′ UTRs. a, The RiboTaper analysis pipeline22 was used to annotate upstream ORFs computationally in an in vitro SOX2 keratinocyte sample. RiboTaper exploits the triplet periodicity of ribosomal footprints to predict bona fide translated regions. Boxplot shows the distribution of length of 215 uORFs which start with an AUG codon with a median length of 54 nucleotides. As a comparison, the right panel shows boxplot with the distribution of length of main ORFs predicted by RiboTaper. b, Schematic overview of the terminal amine isotopic labelling of substrates (TAILS) strategy to specifically enrich for N-terminal fragments. c, Overview of the pre- and post-TAILS peptide counts and peptide-spectrum matches. d, MS/MS spectra for identified uORF peptides. Representative MS/MS spectra for identified uORF peptides with monoisotopic (m/z), parent mass error (parts per million), charge state, and Mascot ion score. Matched y-ion fragments are shown in blue, b-ions in red, and unfragmented parents in green. Peptide N termini were identified as naturally N-terminally acetylated or unmodified. Owing to the protein-level primary amine blocking step in the TAILS workflow, naturally unmodified N termini and all lysines carry a dimethyl chemical modification. e, Ribosome profiling tracks showing translated uORF in P4 epidermis of the hepatoma-derived growth factor Hdgf and of the glycolipid transfer Gltp gene. Encoded peptides identified by high-resolution/high-mass-accuracy mass spectrometry using proteomics and peptidomics are shown. Red amino acids correspond to the identified peptides, yellow nucleotides mark potential initiation sites.
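The parent mass error in parts per million reported for each spectrum follows the usual definition, (observed − theoretical)/theoretical × 10^6. A minimal sketch is given below; the function names are hypothetical, the m/z values are invented for illustration, and the 5 p.p.m. default mirrors the accuracy threshold stated in the Methods.

```python
PROTON_MASS = 1.007276466  # mass of a proton in daltons


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """m/z of a positively charged ion carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    """True if the absolute mass error does not exceed `tol_ppm`."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Invented values: an observed precursor 0.0005 m/z units above theory.
print(round(ppm_error(500.0005, 500.0), 3))  # 1.0
print(within_tolerance(500.0005, 500.0))     # True
print(within_tolerance(500.0050, 500.0))     # False (about 10 ppm)
```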


[Figure: Extended Data Figure 6, panels a–e. a, Boxplots of 5′ UTR length and minimum free energy for all 5′ UTRs, WT-upregulated 5′ UTRs and SOX2-upregulated 5′ UTRs (P < 0.001 for the marked comparisons). b, Cumulative fraction versus minimum free energy/5′ UTR length for WT-upregulated and SOX2-upregulated 5′ UTRs (P < 0.001). c, Venn diagram of the top 10% of genes resistant in translational efficiency and genes with a higher relative uORF/ORF ratio in premalignant SOX2 versus WT epidermis (segment sizes 620, 122 and 1,521, with 122 mRNAs in the overlap; P = 1.32e-19), with a pathway bar chart (x axis, P value) for the overlap: regulation of the epithelial–mesenchymal transition pathway, human embryonic stem cell pluripotency, axonal guidance signalling, molecular mechanisms of cancer, adipogenesis pathway. d, Tumour volume (mm^3) versus days post-injection for HRASG12V shScramble and shSox2 cells. e, Venn diagram of premalignant SOX2-regulated uORFs in vivo and malignant HRASG12V-regulated uORFs in vitro (more than twofold increase in uORF usage; segment sizes 794, 246 and 634, with 246 in the overlap; P = 9.0e-126).]

Extended Data Figure 6 | SOX2-targeted 5′ UTRs are highly structured. a, Genes with increased SOX2-regulated 5′ UTR translation also have longer 5′ UTRs and are more structured. The 10% of genes with the most increased 5′ UTR translation in SOX2 cells were evaluated relative to the 10% of genes with the largest decrease in 5′ UTR translation (error bars indicate range, n = 183 each, two-sided Wilcoxon test). b, Cumulative distribution plot showing length to structure comparison, an assessment for each gene's 5′ UTR structure relative to its length. Analysis showing that SOX2-upregulated 5′ UTRs tend to have more favourable free energy at each length, suggesting that SOX2-regulated 5′ UTR secondary structures are more stable even when normalized for length (error bars indicate range, n = 183 each, two-sided Wilcoxon test). c, mRNAs showing preferential uORF translation in premalignant SOX2-expressing epidermis significantly overlap with mRNAs that are most resistant to the reduction in translational efficiency in premalignant SOX2 versus wild-type epidermis (hypergeometric test, P < 0.001). All mRNAs with a relative increase in the uORF/ORF ratio in SOX2 versus wild type were compared to the top 10% of mRNAs most resistant to reduction in translational efficiency in P4 SOX2 versus wild-type epidermis. Pathway analysis for the overlapping 122 genes (ingenuity pathway analysis) revealed epithelial–mesenchymal transition (EMT), stem-cell pluripotency, and axonal guidance as the top most enriched pathways. d, HrasG12V; Tgfbr2 knockout SCC tumour growth is dependent on SOX2 signalling. As shown in Fig. 4h, the HrasG12V; Tgfbr2 knockout is sufficient to upregulate SOX2 levels. Graph showing SCC tumour growth post-injection of 10^5 cells. Data are mean ± s.e.m. (n = 8 for each genotype). e, Overlap between uORF translation that occurs preferentially in premalignant SOX2-expressing P4 epidermis in vivo and uORF translation of malignant SOX2-expressing, HRAS-regulated SCC in vitro. Included were all mRNAs with twofold difference SOX2 versus wild-type P4 epidermis in vivo and twofold difference HrasG12V; Tgfbr2 knockout SCC versus wild-type keratinocytes in vitro. Hypergeometric test, P < 0.001.
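The hypergeometric overlap tests cited in panels c and e ask how likely an overlap at least as large as the observed one would be if two gene sets were drawn independently from a common background. A minimal sketch using only the Python standard library follows. The function name is hypothetical, and the background (universe) size used for the published P values is not stated in the caption, so the numbers below are a toy example rather than a reproduction of those values.

```python
from math import comb


def hypergeom_overlap_pvalue(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b).

    universe: number of genes in the background
    size_a:   number of genes in the first set
    size_b:   number of genes in the second set
    overlap:  number of genes shared by both sets
    """
    if max(size_a, size_b) > universe or overlap > min(size_a, size_b):
        raise ValueError("inconsistent set sizes")
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, sets of 4 and 3 genes sharing 2.
print(hypergeom_overlap_pvalue(2, 4, 3, 10))  # 0.333... (40/120)
```

Exact integer binomials keep the tail sum accurate even for very small P values, at the cost of speed for large gene sets.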


Extended Data Figure 7 | An shRNA screen reveals regulatory nodes in premalignancy. a, The 138 ribosomal genes and initiation factors targeted in our shRNA screen. b, Overview of the shRNA screen in wild-type versus premalignant SOX2-expressing epidermis from P0 mice. The RIGER algorithm27 was used with the following methods and metrics to convert hairpins to genes and to rank top hits. From left to the right: weighted sum, signal to noise (median); second best rank, signal to noise (median); weighted sum, signal to noise; weighted sum, fold change. c, Western blot shows knockdown efficiency of shRNAs targeting the top hit in our screen, Eif2s1. Note that the knockdown efficiency correlates well with the degree of shRNA depletion in the screen. d, Intra-amniotic injection of lentivirus library of Eif2a and 35 control shRNAs. This sub-library was injected into E9.5 embryos and representation of shRNAs was quantified in wild-type and SOX2 P0 skin. Top, knockdown efficiency; bottom, relative representation of Eif2a shRNAs (normalized against the control shRNA library) in wild-type versus SOX2 epidermis. e, DNA sequence of Eif2a knockout clonal cell lines used in our study. PAM region is highlighted in red, CRISPR target region in blue. f, g, Proliferation rates and overall protein synthesis rates are unchanged in Eif2a knockout cells under serum-rich conditions in vitro. SCC control and Eif2a knockout cells were quantified for EdU and OPP incorporation 1 h after administration. Data are mean ± s.d. of 4 independent experiments. h, The translational landscape is unchanged in Eif2a knockout cells under serum-rich conditions in vitro. SCC control and Eif2a knockout cells subjected to ribosome profiling and reads within the main ORFs were quantified and tested for differential expression using DESeq2. As opposed to 5′ UTR translation, no significant differences (adjusted P < 0.1 DESeq2) in ORF translation were found between SCC control and SCC Eif2a knockout cells (n = 2 SCC control, n = 2 SCC Eif2a knockout). Right panels show two representative H&E sections of squamous cell carcinomas formed 25 days after subcutaneous injection of SCC cells. i, eIF2A-dependent changes of heavy-labelled peptides of CTNNB1 and CD44 under stress in pulsed SILAC. SCC control and SCC Eif2a knockout cells were grown in light-labelled medium and switched to heavy-labelled medium supplemented with 5 μM arsenite. Graphs show the relative difference between SCC control and SCC Eif2a knockout cells during the pulsed SILAC time course.
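The normalization of shRNA representation against the control library, as described for panel d and in the Methods, can be sketched as follows. This is only one plausible reading: the exact normalization is not specified in the text, so the sketch assumes library-size normalization followed by division by the mean fold change of the control shRNAs. The function name, shRNA identifiers and read counts are hypothetical; the 150-read threshold at t = 0 is the one stated in the Methods.

```python
from statistics import mean


def relative_representation(counts_t0, counts_end, control_ids, min_t0_reads=150):
    """Fold change of each shRNA relative to the mean control fold change.

    counts_t0, counts_end: dicts mapping shRNA id to read count
    control_ids:           set of non-targeting control shRNA ids
    min_t0_reads:          shRNAs need more than this many reads at t = 0
    """
    kept = {s for s, n in counts_t0.items() if n > min_t0_reads}
    total_t0 = sum(counts_t0.values())
    total_end = sum(counts_end.values())

    def fold_change(shrna):
        # Fraction of the library at the end point over fraction at t = 0.
        return (counts_end.get(shrna, 0) / total_end) / (counts_t0[shrna] / total_t0)

    control_changes = [fold_change(s) for s in control_ids if s in kept]
    if not control_changes:
        raise ValueError("no control shRNA passed the read-count filter")
    control_mean = mean(control_changes)
    return {s: fold_change(s) / control_mean for s in kept if s not in control_ids}


# Hypothetical counts: shA is depleted twofold relative to controls,
# shB fails the t = 0 read-count filter.
t0 = {"ctrl1": 200, "ctrl2": 200, "shA": 200, "shB": 100}
end = {"ctrl1": 300, "ctrl2": 300, "shA": 150, "shB": 50}
print(relative_representation(t0, end, {"ctrl1", "ctrl2"}))
```

Gene-level ranking with RIGER would then operate on such per-shRNA values; that step is not reproduced here.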


[Extended Data Figure 8 image. a, Experimental scheme: SCC cells in vitro, subcutaneous (s.c.) injection, tumours at day 5 and day 10 post-injection; ribosome profiling and RNA-seq at each stage, with DESeq2 comparisons of day 5 in vivo versus in vitro and day 10 versus in vitro. b, c, Density plots of log2 fold change at day 5 and day 10 for translational changes, transcriptional changes and translational efficiency, comparing eIF2A-regulated uORF genes with non-eIF2A-regulated uORF genes, genes without uORFs and all genes. Labelled genes: Ctnnb1, Hif1a, Cd44, Eef1a1, Lgals3, Rac1, Cdk1, Cdc42, Msn, Stat6. d, Ctnnb1 5′ UTR with its GTG uORF start site, ribosome profiling tracks, the CRISPR target and PAM, clone 2B sequences (24-nt and 1-nt deletions) and tumour volume (mm³) versus time post-injection (days) for non-targeting control sgRNA (C-4C) and Ctnnb1 uORF KO (C-2B).]

Extended Data Figure 8 | mRNAs containing eIF2A-targeted uORFs are preferentially translated during tumorigenesis. a–c, Genes that contain eIF2A-targeted uORFs maintain increased translation and translational efficiency of their downstream ORFs during early stages of tumorigenesis. HrasG12V; Tgfbr2 knockout SCCs were subcutaneously injected into nude mice. Day 5 or day 10 tumours were analysed by ribosome profiling and RNA-seq. Changes in transcription, translation and translational efficiency were assessed by comparing in vivo against SCC in vitro data and are represented as fold changes in kernel density plots. Either only the significantly changed genes (DESeq2) or all genes were assessed. eIF2A-targeted versus non-eIF2A-targeted uORF genes refer to changes in uORF usage in SCC control versus SCC Eif2a knockout in Fig. 6a (n = 2 SCC in vitro, n = 2 SCC day 5, n = 2 SCC day 10, two-sample Kolmogorov–Smirnov test). d, Mutations in the eIF2A-regulated uORF of Ctnnb1, a key oncogene for SCC progression, diminish its tumorigenic potential. Tracks show ribosome profiling reads in the 5′ UTR of Ctnnb1 and in the uORF GUG start codon. HrasG12V; Tgfbr2 knockout SCC keratinocytes were infected with either non-targeting control sgRNA or Ctnnb1 uORF sgRNAs. Clonal lines were sequenced and a uORF mutant clonal line was established. Data are mean ± s.e.m. following subcutaneous injection of 10⁵ cells (n = 8 control, n = 14 clone 2B).
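The legend above names a two-sample Kolmogorov–Smirnov test for comparing fold-change distributions between gene groups. As a minimal sketch of what that comparison computes, and not the authors' actual pipeline, the following pure-Python function returns the KS statistic D and an approximate p-value from the asymptotic Kolmogorov series. The function name `ks_two_sample`, the variable names and the fold-change values are all assumptions made up for illustration.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov test; returns (D, approximate p-value)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # D is the largest vertical gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = 0.0
    for v in xs + ys:
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)

    # Asymptotic Kolmogorov series with a small-sample correction
    # (an approximation, not an exact permutation p-value).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return d, 1.0  # the series is unreliable here; p is effectively 1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical log2 fold changes (in vivo versus in vitro); invented for illustration.
eif2a_uorf_lfc = [0.8, 1.1, 0.4, 1.6, 0.9, 1.3]
other_uorf_lfc = [-0.2, 0.1, -0.6, 0.3, 0.0, -0.4]

d_stat, p_value = ks_two_sample(eif2a_uorf_lfc, other_uorf_lfc)
print(f"D = {d_stat:.2f}, approximate p = {p_value:.3g}")
```

For real data sets with thousands of genes, an established statistics library would normally be preferred over this hand-rolled approximation; the sketch only makes explicit that the test compares whole distributions rather than means.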


[Extended Data Figure 9 image. a, Bar chart of EIF2A alteration frequency (0–30%) across cancer types and data sets, broken down into mutation, deletion, amplification and multiple alterations, with the availability of mutation and copy-number alteration (CNA) data indicated for each data set. b, Clinical annotation tracks for patients with head and neck squamous cell carcinoma (including disease stage, HPV status, ICD-10 classification, therapy and outcome, gender, histologic type, metastasis stage, angiolymphatic invasion and diagnosis age), shown alongside EIF2A mRNA upregulation (27%).]

Extended Data Figure 9 | Human EIF2A is frequently amplified in human cancer. a, Summary of cross-cancer alterations for EIF2A in human cancers12. 29% of patients with lung carcinoma and 15% of patients with head and neck SCCs show an amplification of the EIF2A locus. b, Summary of the clinical information accompanying TCGA patients with head and neck SCC.


[Extended Data Figure 10 image. a, Schematic of 5′ UTR translation: uORFs and in-frame N-terminal extensions of the main ORF, with 5′ UTR translation used as a proxy for uORF translation. Main possible outcomes: 1, downregulates main ORF; 2, upregulates main ORF; 3, peptide serves function. Counts shown: 1830 uORFs quantified in WT and SOX2 epidermis; 1481 uORFs quantified in SCC and SCC Eif2a KO; 13 translated 5′ UTR peptides detected by mass spectrometry; 215 AUG uORFs annotated by Ribotaper; 716 eIF2A-dependent uORFs; 753 eIF2A-independent uORFs; median length 54 nucleotides; preferential translation of main ORF during tumour initiation. b, Proteome changes in SOX2 versus WT P4 epidermis.]

Extended Data Figure 10 | Overview of 5′ UTR translation and proteomics analyses. a, Summary of different analyses of 5′ UTR translation in this study. b, Significantly changed proteins in SOX2 versus wild-type P4 epidermis in vivo (FDR < 0.05).
