Quantification and Discovery of Sequence Determinants of Protein-Per-Mrna Amount in 29 Human Tissues
Total Page:16
File Type:pdf, Size:1020Kb
Article Quantification and discovery of sequence determinants of protein-per-mRNA amount in 29 human tissues Basak Eraslan1,2,†, Dongxue Wang3,† , Mirjana Gusic4,5, Holger Prokisch4,5 ,Bjorn€ M Hallstrom€ 6, Mathias Uhlen6 , Anna Asplund7, Frederik Ponten7 , Thomas Wieland3, Thomas Hopf3, Hannes Hahne8,* , Bernhard Kuster3,9,** & Julien Gagneur1,*** Abstract See also: D Wang et al (February 2019) Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein- Introduction to-mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across Unraveling how gene regulation is encoded in genomes is central to 29 human tissues using matched transcriptomes and proteomes. delineating gene regulatory programs and to understanding predis- We estimated by regression the contribution of known sequence positions to diseases. Although transcript abundance is a major determinants of protein synthesis and degradation in addition to determinant of protein abundance, substantial deviations between 45 mRNA and 3 protein sequence motifs that we found by associa- mRNA and protein levels of gene expression exist (Liu et al, 2016). tion testing. While PTR ratios span more than 2 orders of magni- These deviations include a much larger dynamic range of protein tude, our integrative model predicts PTR ratios at a median abundances (Garcıa-Martınez et al, 2007; Lackner et al, 2007; precision of 3.2-fold. A reporter assay provided functional support Schwanhausser€ et al, 2011; Wilhelm et al, 2014; Csardi et al, 2015) for two novel UTR motifs, and an immobilized mRNA affinity and poor mRNA–protein correlations for important gene classes competition-binding assay identified motif-specific bound proteins across cell types and tissues (Fortelny et al, 2017; Franks et al, for one motif. Moreover, our integrative model led to a new metric 2017). Moreover, deviations between mRNA and protein abun- of codon optimality that captures the effects of codon frequency dances are emphasized in non-steady-state conditions driven by on protein synthesis and degradation. Altogether, this study shows gene-specific protein synthesis and degradation rates (Peshkin et al, that a large fraction of PTR ratio variation in human tissues can be 2015; Jovanovic et al, 2016). Therefore, it is important to consider predicted from sequence, and it identifies many new candidate regulatory elements determining the number of protein molecules post-transcriptional regulatory elements. per mRNA molecule when studying the gene regulatory code. Decades of single-gene studies have revealed numerous sequence Keywords codon usage; mRNA sequence motifs; proteomics; transcriptomics; elements affecting initiation, elongation, and termination of transla- translational control tion as well as protein degradation. Eukaryotic translation is canoni- Subject Categories Genome-Scale & Integrative Biology; Methods & cally initiated after the ribosome, which is scanning the 50 UTR from Resources; RNA Biology the 50 cap, recognizes a start codon. Start codons and secondary DOI 10.15252/msb.20188513 | Received 21 June 2018 | Revised 22 January structures in 50 UTR can interfere with ribosome scanning (Kozak, 2019 | Accepted 23 January 2019 1984; Kudla et al, 2009). Also, the sequence context of the start Mol Syst Biol. (2019) 15:e8513 codon plays a major role in start codon recognition (Kozak, 1986). 1 Computational Biology, Department of Informatics, Technical University of Munich, Garching, Munich, Germany 2 Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universitat€ Munchen,€ Munich, Germany 3 Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany 4 Institute of Human Genetics, Technical University of Munich, Munich, Germany 5 Institute of Human Genetics, Helmholtz Zentrum Munchen,€ Neuherberg, Germany 6 Science for Life Laboratory, KTH - Royal Institute of Technology, Stockholm, Sweden 7 Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden 8 OmicScouts GmbH, Freising, Germany 9 Center For Integrated Protein Science Munich (CIPSM), Munich, Germany *Corresponding author. Tel: +49 8161 9762892; E-mail: [email protected] **Corresponding author. Tel: +49 8161 71 5696; E-mail: [email protected] ***Corresponding author. Tel: +49 89 289 19411; E-mail: [email protected] †These authors contributed equally to this work ª 2019 The Authors. Published under the terms of the CC BY 4.0 license Molecular Systems Biology 15:e8513 | 2019 1 of 25 Molecular Systems Biology Sequence determinants of protein per RNA Basak Eraslan et al The translation elongation rate is determined by the rate of decoding Here, we exploited matched proteome and transcriptome expres- each codon of the coding sequence (Sorensen et al, 1989; Gardin sion levels for 11,575 genes across 29 human tissues (Fig 1A, Wang et al, 2014; Hanson & Coller, 2018). It is understood that the low et al, 2019) to predict protein-to-mRNA ratios (PTR ratios) from abundance of some tRNAs leads to longer decoding time of their sequence. To interpret our findings related to mRNA degradation cognate codons (Varenne et al, 1984), which in turn can lead to (Radhakrishnan & Green, 2016), translation, and protein degrada- repressed translation initiation consistent with a ribosome traffic tion, we included mRNA half-life measurements (Tani et al, 2012; jam model (reviewed in Hanson & Coller, 2018). However, esti- Schueler et al, 2014; Schwalb et al, 2016), in addition to human mates of codon decoding times in human cells and their overall ribosome profiling of 17 independent studies (Dana & Tuller, 2015; importance for determining human protein levels are highly debated O’Connor et al, 2016) as well as protein half-life measurements (Plotkin & Kudla, 2011; Quax et al, 2015; Hanson & Coller, 2018). from immortal and primary cell lines (Zecha et al, 2018; Mathieson Secondary structure of the coding sequence and chemical properties et al, 2018; Fig 1A). We considered known post-transcriptional of the nascent peptide chain can further modulate elongation rates regulatory elements and identified novel candidates in the 50 UTR, (Qu et al, 2011; Artieri & Fraser, 2014; Sabi & Tuller, 2017; Dao Duc coding sequence, and 30 UTR, by means of systematic association & Song, 2018). Translation termination is triggered by the recogni- testing. We also modeled the effect of codons on protein-to-mRNA tion of the stop codon. The sequence context of the stop codon can ratio, leading to a new quantitative measure of codon optimality modulate its recognition, whereby non-favorable sequences can lead which we compared to existing metrics. Our integrative model esti- to translational read-through (Bonetti et al, 1995; McCaughan et al, mates the contribution of all these elements on protein-to-mRNA 1995; Poole et al, 1995; Tate et al, 1996). Furthermore, numerous ratio and predicts tissue-specific PTR ratios of individual genes at a RNA binding proteins (RBPs) and microRNAs (miRNAs) can be relative median error of 3.2-fold. Finally, we are providing initial recruited to mRNAs by binding to sequence-specific binding sites experimental results to assess the functional relevance of the novel and can further regulate various steps of translation (Baek et al, potentially regulatory elements. 2008; Selbach et al, 2008; Guo et al, 2010; Gerstberger et al, 2014; Hudson & Ortlund, 2014; Cottrell et al, 2017). However, not only predicting the binding of miRNAs and RBPs from sequence is still Results difficult, but the role of few of these binding events in translation is well understood. Matched transcriptomic and proteomic analysis of 29 Complementary to translation, protein degradation also plays an human tissues important role in determining protein abundance. Degrons are protein degradation signals which can be acquired or are inherent to Using label-free quantitative proteomics and RNA-Seq, we profiled protein sequences (Geffen et al, 2016). The first discovered degron the proteomes and transcriptomes of adjacent cryo-sections of 29 inherent to protein sequence was the N-terminal amino acid histologically healthy tissue specimens collected by the Human (Bachmair et al, 1986). However, the exact mechanism and its impor- Protein Atlas project (Fagerberg et al, 2014) that represent major tance are still debated, with recent data in yeast indicating a more human tissues (Wang et al, 2019). To facilitate data analysis, we general role of hydrophobicity of the N-terminal region on protein modeled every gene with a single transcript isoform because there stability (Kats et al, 2018). Further protein-encoded degrons include was little evidence for widespread expression of multiple isoforms several linear and structural protein motifs (Ravid & Hochstrasser, and to avoid practical difficulties of calling and quantifying isoform 2008; Geffen et al, 2016; Maurer et al, 2016), or phosphorylated abundance consistently at mRNA and protein levels. The number of motifs that are recognized by ubiquitin ligases (Meszaros et al,2017). genes with multiple quantified isoforms on protein level was small Altogether, numerous mRNA and protein-encoded sequence features (10% of the 13,664 genes with a protein detected in at least in one contribute to determining how many protein molecules per mRNA tissue). Also, for 5,636 (43%) genes the same isoform was the most molecule