A Short ORF-Encoded Transcriptional Regulator
Minseob Koh(a,b), Insha Ahmad(a), Yeonjin Ko(a), Yuxiang Zhang(a), Thomas F. Martinez(c), Jolene K. Diedrich(c), Qian Chu(c), James J. Moresco(c,1), Michael A. Erb(a), Alan Saghatelian(c,2), Peter G. Schultz(a,2), and Michael J. Bollong(a,2)

(a) Department of Chemistry, The Scripps Research Institute, La Jolla, CA 92037; (b) Department of Chemistry, Pusan National University, Busan 46241, Republic of Korea; and (c) Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA 92037

Contributed by Peter G. Schultz, December 15, 2020 (sent for review October 21, 2020; reviewed by Peng R. Chen and Kai Johnsson)

Recent technological advances have expanded the annotated protein-coding content of mammalian genomes, as hundreds of previously unidentified, short open reading frame (ORF)-encoded peptides (SEPs) have now been found to be translated. Although several studies have identified important physiological roles for this emerging protein class, a general method to define their interactomes is lacking. Here, we demonstrate that genetic incorporation of the photo-crosslinking noncanonical amino acid AbK into SEP transgenes allows for the facile identification of SEP cellular interaction partners using affinity-based methods. From a survey of seven SEPs, we report the discovery of short ORF-encoded histone binding protein (SEHBP), a conserved microprotein that interacts with chromatin-associated proteins, localizes to discrete genomic loci, and induces a robust transcriptional program when overexpressed in human cells. This work affords a straightforward method to help define the physiological roles of SEPs and demonstrates its utility by identifying SEHBP as a short ORF-encoded transcription factor.

short open reading frame-encoded peptide | expanded genetic code | photo-crosslinking | transcriptional regulation

Significance

The small size of short ORF-encoded peptides (SEPs) has made it difficult to identify their cellular binding partners using standard methods. Here, we show that the photo-crosslinking amino acid AbK can be incorporated into overexpressed, epitope-tagged SEP transgenes, allowing covalent bond formation between SEPs and their interactors in live cells. From an AP-MS–based screen of conserved mammalian SEPs, we identified short ORF-encoded histone binding protein (SEHBP), a micropeptide which acts as a transcriptional regulator, capable of modulating more than 15% of the active transcriptome. The methodology described herein will likely be of broad utility in annotating the essential physiological roles of SEPs.

Author contributions: M.K., I.A., Y.K., Y.Z., T.F.M., Q.C., J.J.M., M.A.E., A.S., P.G.S., and M.J.B. designed research; M.K., I.A., Y.K., Y.Z., T.F.M., J.K.D., Q.C., J.J.M., and M.J.B. performed research; M.K., I.A., Y.K., Y.Z., T.F.M., J.K.D., J.J.M., M.A.E., A.S., and M.J.B. analyzed data; and M.K., I.A., A.S., P.G.S., and M.J.B. wrote the paper.
Reviewers: P.R.C., College of Chemistry and Molecular Engineering, Peking University; and K.J., Max Planck Institute for Medical Research.
The authors declare no competing interest.
Published under the PNAS license.
1 Present address: Center for Genetics of Host Defense, UT Southwestern Medical Center, Dallas, TX 75390-8505.
2 To whom correspondence may be addressed. Email: [email protected], schultz@scripps.edu, or [email protected].
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2021943118/-/DCSupplemental.
Published January 18, 2021. PNAS 2021 Vol. 118 No. 4 e2021943118, https://doi.org/10.1073/pnas.2021943118

Advances in proteomics and next generation sequencing have indicated that, in addition to the ∼19,000 nuclear-encoded human genes, hundreds to thousands of short open reading frames (sORFs, also called smORFs) likely contribute to the functional proteome (1–3). With a high level of ribosomal occupancy, sORFs give rise to an abundant protein class, sORF-encoded peptides (SEPs, often called microproteins), which are present at tens to thousands of molecules per cell (2). Historically, SEPs have evaded detection by virtue of their small size (<150 codons) and due to the stringent criteria employed by gene prediction algorithms (1, 4). Translation of SEPs is often initiated at non-AUG start sites (5), and sORFs can be located within annotated reference open reading frames (ORFs) as well as within 5′ UTRs and in noncoding RNAs (6). SEPs have been identified in most organisms studied (1) and frequently display a high degree of conservation among higher eukaryotes (6–8). A handful of studies have recently revealed essential cellular functions for SEPs, including roles in metabolism (9), muscle performance (10), organismal development (11, 12), apoptosis (13), and DNA repair (14, 15). Despite the growing importance of SEPs, a general, robust strategy to identify their cellular functions is lacking.

With a median length of less than 50 amino acids, SEPs themselves likely do not possess enzymatic activity, but instead act by modulating the functions of their client proteins. As such, a key step in annotating the roles of SEPs is to first define their cellular interactomes. Although a number of methods have been used to elucidate the targets of SEPs, including appending an engineered peroxidase to biotinylate potential interactors (APEX-tagging) and using CRISPR-Cas to edit an ORF's native locus (10, 16, 17), identification of SEP interactors remains a challenge. Here, we describe an enhanced affinity purification mass spectrometry (AP-MS)–based mapping strategy to covalently fix SEP interactors in living cells, which was inspired by our previous work in deconvoluting the cellular targets of biologically active small molecules identified from phenotypic screens (18, 19). These approaches typically use a bifunctional photo-activatable affinity probe molecule bearing a small photo-crosslinking group (e.g., diazirine, aryl azide) to covalently link target to probe in live cells and an affinity or reactivity handle (e.g., biotin, alkyne) for isolation and detection. Analogously, genetic encoding of a photo-crosslinking noncanonical amino acid (ncAA) into an epitope-tagged SEP transgene would allow for covalent bond formation between a SEP and its interactor in situ and would provide a means to detect and enrich SEP complexes (Fig. 1A). In contrast to methods involving the appendage of a much larger fusion protein (e.g., APEX, eGFP), such a strategy would, in principle, allow for enhanced enrichment of low affinity or transient interaction complexes while minimally interfering with the small binding surface of the SEP.

Here, we demonstrate that a photo-crosslinking amino acid can be incorporated into SEPs at high levels and subsequently used to enrich putative binding partners in pull-down experiments. From an AP-MS–based screen of uncharacterized SEPs, we demonstrate the utility of this approach with the discovery of an sORF-encoded peptide that binds chromatin-associated proteins and regulates transcription in human cells.

Results

A Genetically Encoded Photo-Crosslinking Strategy for Identifying the Cellular Targets of SEPs. We first sought to determine the feasibility of this mapping methodology in the context of a SEP with defined binding partners. The nuclear-localized SEP MRI-2 (also called CYREN) determines at which phases of the cell cycle nonhomologous end joining repairs double-stranded DNA breaks by binding the XRCC6 and XRCC5 heterodimer (X-ray Repair Cross Complementing 6/5; often called Ku70/Ku80, respectively) (14). To introduce a photo-crosslinker into MRI-2 in mammalian cells, we used an evolved mutant of the Methanosarcina barkeri pyrrolysyl aminoacyl transfer RNA (tRNA) synthetase/tRNA pair (expressed from the vector pCMV-AbK) to genetically encode the diazirine-functionalized ncAA N6-[[2-(3-Methyl-3H-diazirin-3-yl)ethoxy]carbonyl]-L-lysine (AbK) (20). Photoactivation leads to a highly reactive carbene species, which can insert into a wide range of C-H and unsaturated C-C bonds. We next generated a transient mammalian expression vector encoding a FLAG-tagged [...]

[Fig. 1 image. Panel A: workflow schematic (transfect plasmids and add AbK; in situ UV cross-linking; anti-FLAG IP of covalent interaction complexes; anti-FLAG Western blot and AP-MS-based identification of interactors), with the structure of AbK in the inset. Panel B: heatmap with bait rows MRI-2 (CYREN), SEP 4, SEP 8, SEHBP, SEP 15, SEP 16, SEP 22, SEP 24, mScarlet, and FLAG peptide, each with −UV and +UV, against prey protein columns (including XRCC5 and XRCC6).]

Fig. 1. A genetically encoded photo-crosslinker identifies cellular interactors of unexplored SEPs. (A) Schematic depicting the strategy used in this work to introduce the photo-crosslinking amino acid AbK (structure within inset circle) into SEP transgenes and identify their protein targets in cells. (B) Heatmap of the relative spectral count enrichment corresponding to protein preys identified from AP-MS experiments with the indicated SEP baits in the presence or absence of UV (mean of three biological replicates).

Together, these results suggested our mapping strategy provides a useful method for detecting cellular binding partners of SEPs.

A Proteomics-Based Screen to Identify the Targets of Conserved Mammalian SEPs. We next applied this methodology in an unbiased fashion to deconvolute the targets of previously uncharacterized SEPs. We used a recently described approach for integrating de novo transcriptome assembly and Ribo-sequencing (Ribo-seq), which resulted in the annotation of more than 1,000 sORFs with high translational potential in mammalian cells (3).
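
As an illustration of the kind of quantity plotted in Fig. 1B, the following minimal Python sketch scores each prey by comparing its mean spectral count across replicates with and without UV. The text does not state how relative enrichment was calculated, so the log2 ratio, the pseudocount, the function names, and the prey labels and counts below are all assumptions chosen for illustration; they are not the authors' pipeline or data.

```python
from math import log2
from statistics import mean


def log2_enrichment(plus_uv, minus_uv, pseudocount=1.0):
    """Return log2 of (mean +UV counts) over (mean -UV counts) for one prey.

    The pseudocount is an assumed convention that avoids division by zero
    when a prey is never detected in one condition.
    """
    if not plus_uv or not minus_uv:
        raise ValueError("each condition needs at least one replicate")
    return log2((mean(plus_uv) + pseudocount) / (mean(minus_uv) + pseudocount))


def rank_preys(counts, pseudocount=1.0):
    """Rank preys from most to least UV-dependent enrichment.

    `counts` maps a prey name to a pair (plus_uv_replicates, minus_uv_replicates).
    """
    scores = {
        prey: log2_enrichment(plus_uv, minus_uv, pseudocount)
        for prey, (plus_uv, minus_uv) in counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical spectral counts for three biological replicates per condition.
toy_counts = {
    "PREY_A": ([7, 7, 7], [1, 1, 1]),  # enriched with UV
    "PREY_B": ([3, 3, 3], [3, 3, 3]),  # unchanged by UV
    "PREY_C": ([0, 0, 0], [1, 1, 1]),  # depleted with UV
}

ranked = rank_preys(toy_counts)
for prey, score in ranked:
    print(f"{prey}\t{score:+.2f}")
```

Under this scoring, a prey that is captured only after UV-induced crosslinking receives a high positive value, whereas a prey that binds the affinity resin regardless of UV scores near zero. Other normalizations, for example scaling by total spectra per run, would be equally reasonable.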