Modeling Translation Initiation in Eukaryotes Based on TCP-Seq Data
Tamar Neumann, Tamir Tuller
Introduction

Translation regulation, and specifically its initiation step, is fundamental to gene expression control. However, various aspects of the biophysics of translation initiation in eukaryotes are currently not well understood or modeled.

A novel protocol named Translation Complex Profile Sequencing (TCP-seq) was developed and implemented on S. cerevisiae [1]. This protocol provides the footprints of the small subunit (SSU) of the ribosome (possibly with additional factors) across the entire transcriptome.

In this study, based on the TCP-seq data, we develop for the first time a quantitative model of the effect of transcript features on the dynamics of the SSU.

Methods

Translation Complex Profile Sequencing (TCP-seq) [1]

Yeast cells were crosslinked using formaldehyde, which efficiently stabilizes the preinitiation complexes. The complexes were isolated and digested with RNase I to generate protected mRNA fragments, as in ribosome profiling. Ribosome and SSU fractions were then separated by sedimentation velocity. The footprints were mapped to the yeast genome.

[Figure: TCP-seq protocol. Source: Archer et al. (2016)]

TCP-seq Data

The read counts (RC) of the SSU depend on:
1. The location of the SSU in the 5'UTR.
2. Footprint (FP) length: the scanning SSU is accompanied by initiation factors, which are responsible for the scanning motion. They are located behind the SSU, where they interact with the mRNA and push the SSU in the 5'→3' direction.

[Figure: schematic of the SSU and initiation factors on the mRNA relative to the AUG]

Detecting complex statistical relations between variables using MIC

MIC (Maximal Information Coefficient) is a measure based on the mutual information of two variables. It enables detecting various relations (of any type) between pairs of variables in large data sets [2].

Discretization process: in order to use the MIC algorithm, the data must be discretized. Several discretization processes were tested:
• Discretization using an optimization process: the threshold of each row (= FP length) was set to the value that maximizes the MIC score of the signal.
• Discretization using different manually selected parameters: the threshold of each row was set to the mean (m), 2*m, mean + standard deviation, etc., in order to show the robustness of the results.

Using linear regression to predict the density of the SSU

1. Creating sliding windows of size 30 nt.
2. Normalizing RC by mRNA levels.
3. Creating features for each window.
4. 20 repetitions, dividing the data randomly into groups.
5. Creating a linear regressor based on the training data.
6. Calculating the correlation between predictions and SSU density.

Features

• Derived from the gene itself:
  • 5'UTR length and 5'UTR length / ORF length.
• Derived from the sliding window:
  • Various local mRNA folding energy features
  • Pairs/groups of three nucleotides
  • GC content
  • Various AUG features, including the number of AUGs in the window, average distance to the start codon, average and maximal value of the AUG context score, etc.

[Figure: gene schematic (5'UTR, ORF, 3'UTR) with a 30 nt sliding window in the 5'UTR]

Results

Discretization Process

[Figure: two panels, "AUGs in 5'UTR Signal" and "Discretized Signal of AUGs in 5'UTR", both for the highest 10% of AUG context score values. x-axis: position relative to AUG codon [nt], from -50 to +50; y-axis: footprint length [nt]]

MIC Score of AUGs in 5'UTR with High/Low Context Score

                                    MIC Score   P Value (1000 Permutations)
AUG Start Codon                     0.8795      <0.001
AUGs With High AUG Context Score    0.5230      <0.001
AUGs With Low AUG Context Score     0.1389      0.08

• The results stay robust when using manually selected parameters.
• These results show that AUGs in the 5'UTR with a high context score (i.e., similar to the context of the main AUG in highly expressed genes) induce an RC distribution similar to the RC surrounding the main AUG, meaning that such AUGs delay the SSU during its scan.

MIC Score of AUGs in 5'UTR with Short/Long Distance to the Nearest STOP Codon

• The analyses were performed in each frame separately (frames 0, 1, 2).
• In frame 0, the results show that longer uORFs induce an RC distribution similar to the RC surrounding the main AUG.

Linear Regression

[Figure: two panels, "Number of Sliding Windows Greater than Zero vs. Footprint Length" and "Correlations vs. Footprint Length". x-axis: footprint length [nt], from 20 to 100; y-axes: number of sliding windows greater than zero, and Spearman correlation]

• The partial Spearman correlation between each feature and SSU RC was calculated, while the other features and mRNA levels were controlled.

[Figure: partial correlation and P value vs. FP length [nt] for two features: folding energy of a sliding window at a distance of 30 nt, and 5'UTR length]

• These results can reveal features that affect the dynamics of the SSU. An interesting hypothesis derived from the directionality of the partial correlation of the folding energy features is that when the SSU reaches a strongly folded area, it detaches from the mRNA.

Conclusions & Future Work

• Based on the TCP-seq data we can reveal features in the transcript that affect the scanning of the SSU and, as a result, the initiation step, allowing better understanding, modeling, and engineering of translation initiation in eukaryotes.
• Future plans include using MIC to detect additional features that affect the SSU scan, testing different feature selection processes, and examining sliding windows of different sizes.

[1] Archer, S. K., Shirokikh, N. E., Beilharz, T. H., & Preiss, T. (2016). Dynamics of ribosome scanning and recycling revealed by translation complex profiling.
[2] Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., ... & Sabeti, P. C. (2011). Detecting novel associations in large data sets.
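The manually selected discretization thresholds (mean, 2*m, mean + standard deviation) and the permutation-based P values reported with the MIC scores can be illustrated with a minimal sketch. It is not the poster's implementation: the function names, the use of the population standard deviation, and the `agreement` score are assumptions made for illustration, and `agreement` is only a stand-in statistic, not MIC.

```python
import random
import statistics


def threshold(row, rule="mean"):
    """Per-row threshold; each row is assumed to hold the signal of one FP length."""
    m = statistics.fmean(row)
    if rule == "mean":
        return m
    if rule == "2*mean":
        return 2 * m
    if rule == "mean+std":
        # Assumption: population standard deviation; the poster does not say which.
        return m + statistics.pstdev(row)
    raise ValueError(f"unknown rule: {rule}")


def discretize(matrix, rule="mean"):
    """Binarize every row against its own threshold (1 = above threshold)."""
    result = []
    for row in matrix:
        t = threshold(row, rule)
        result.append([1 if v > t else 0 for v in row])
    return result


def agreement(x, y):
    """Stand-in association score (NOT MIC): fraction of equal positions."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def permutation_p_value(score, x, y, n_perm=1000, seed=0):
    """Share of shuffles of y whose score reaches the observed score."""
    rng = random.Random(seed)
    observed = score(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    signal = [[1, 1, 1, 9], [2, 8, 2, 8]]  # toy read counts, two FP lengths
    print(discretize(signal, rule="mean"))
    x = [0] * 10 + [1] * 10
    print(permutation_p_value(agreement, x, list(x)))
```

Swapping `agreement` for a real MIC implementation would leave the permutation logic unchanged, since it only needs a function that maps two vectors to a score.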
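The partial Spearman correlation used in the Results can be sketched for the simplest case of a single control variable. This is a simplified illustration under stated assumptions: the poster controls for all other features and for mRNA levels at once, whereas the sketch below uses the first-order partial correlation formula applied to rank correlations, and all function and variable names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y with one control variable z removed."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Toy data: z stands for mRNA level, x for a feature, y for SSU read counts.
    z = [1, 2, 3, 4, 5, 6]
    x = [2, 1, 4, 3, 6, 5]
    y = [1, 3, 2, 5, 4, 6]
    print(spearman(x, y), partial_spearman(x, y, z))
```

In the toy data both `x` and `y` follow `z`, so the raw correlation is positive while the partial correlation is negative, which is the kind of sign change that controlling for mRNA levels is meant to expose. Controlling for several variables at once would need a regression on the ranks instead of this closed-form expression.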