1 maps of general eukaryotic RNA degradation factors

2 Salma Sohrabi-Jahromi1†, Katharina B. Hofmann2†, Andrea Boltendahl2, Christian Roth1, 3 Saskia Gressel2, Carlo Baejen2, Johannes Soeding1*, and Patrick Cramer2*

4 Affiliations: 5 1Max-Planck-Institute for Biophysical Chemistry, Quantitative and Computational Biology, 6 Am Fassberg 11, 37077 Göttingen, Germany. 2Max-Planck-Institute for Biophysical 7 Chemistry, Department of , Am Fassberg 11, 37077 Göttingen, 8 Germany. †These authors contributed equally. *Correspondence to: Johannes Soeding 9 ([email protected]) and Patrick Cramer ([email protected]).

10 Abstract:

11 RNA degradation pathways enable RNA processing, the regulation of RNA levels, and 12 the surveillance of aberrant or poorly functional in cells. Here we provide 13 transcriptome-wide RNA-binding profiles of 30 general RNA degradation factors in the 14 yeast . The profiles reveal the distribution of degradation 15 factors between different RNA classes. They are consistent with the canonical 16 degradation pathway for closed-loop forming mRNAs after deadenylation. Modeling 17 based on mRNA half-lives suggests that most degradation factors bind intact mRNAs, 18 whereas decapping factors are recruited only for mRNA degradation, consistent with 19 decapping being a rate-limiting step. Decapping factors preferentially bind mRNAs with 20 non-optimal codons, consistent with rapid degradation of inefficiently translated mRNAs. 21 Global analysis suggests that the nuclear surveillance machinery, including the 22 complexes Nrd1/Nab3 and TRAMP4, targets aberrant nuclear RNAs and processes 23 snoRNAs.

24 Impact Statement:

25 Mapping of 30 general RNA degradation factors onto the yeast transcriptome provides 26 the global distribution of factors for RNA turnover and surveillance in a eukaryotic cell.

27 Introduction:

28 The abundance of the different eukaryotic RNA species controls cell type and cell fate, 29 and is determined by the balance between RNA synthesis and RNA degradation. Multiple 30 mechanisms exist for RNA degradation (Parker, 2012). RNAs are generally exported to 31 the , where they are degraded with an RNA-specific rate. Such canonical 32 turnover is critical for RNA homeostasis. However, RNAs that are defective with respect 33 to their processing, folding, assembly into RNA- particles, or their ability to be 34 translated, are identified and rapidly degraded by surveillance pathways. Surveillance 1

35 occurs both in the nucleus (Schmid and Jensen, 2018) and in the cytoplasm (Zinder and 36 Lima, 2017). In the nucleus, aberrant RNAs resulting from upstream antisense 37 transcription are quickly degraded. Moreover, some non-coding RNAs (ncRNAs) require 38 processing by the degradation machinery (Houseley et al., 2006). Cytosolic RNAs also 39 vary in their life-time, with mRNAs encoding cell cycle regulators or transcription factors 40 having reported life-times in the range of minutes (Geisberg et al., 2014; Miller et al., 41 2011), whereas ribosomal RNAs live for days (Turowski and Tollervey, 2015). Therefore, 42 RNA degradation kinetics need to be actively regulated to ensure the optimal life time for 43 each transcript.

44 RNA degradation can occur from both ends of a transcript, and these processes 45 are often coupled. During canonical mRNA turnover, degradation is thought to be initiated 46 by shortening of the 3´ poly-adenylated (polyA) tail through two major deadenylation 47 complexes, the Pan2/Pan3 complex and the multi-subunit Ccr4/Not complex (Ccr4, Not1, 48 Pop2, Caf40) (Wolf and Passmore, 2014). Specific mRNAs can recruit selected 49 deadenylating factors after loss of the polyadenylate-binding protein (Pab1) that protects 50 the mRNA from degradation on the 3′ end (Finoux and Seraphin, 2006; Goldstrohm et 51 al., 2006; Semotok et al., 2005), but the choice of deadenylation pathway remains 52 unclear. Studies have pointed out a direct link between termination and mRNA 53 degradation (reviewed in Huch and Nissan, 2014), in particular deadenylation, which is 54 dependent on Pab1 and Ccr4 (Webster et al., 2018). A proposed stepwise model for 55 deadenylation suggests that first the average yeast polyA tail length of 90 (nt) 56 is reduced to 50 nt by the Pan2/Pan3 complex, before further shortening via the Ccr4/Not 57 complex (Beilharz and Preiss, 2007; Brown and Sachs, 1998; Tucker et al., 2001). When 58 the polyA tail reaches a length of 10-12 nt, the mRNA is decapped (Chowdhury et al., 59 2007; Tharun and Parker, 2001), or subjected to catalyzed degradation 60 (Bonneau et al., 2009).

61 The second step in mRNA degradation is thought to be the removal of the 5´ cap 62 by the decapping complex (Dcp1, Dcp2, Dcs1). The cap protects the mRNA from 63 degradation by the 5´→3´ Xrn1, which requires a 5´ monophosphate at the 64 terminal residue (Stevens and Poole, 1995). Decapping is highly regulated by decapping 65 enhancers such as the DEAD box Dhh1, Edc2 and Edc3. Different potential 66 mechanisms to trigger decapping include interference with translation initiation factors, 67 facilitated assembly of the decapping machinery, and stimulation of Dcp2 catalytic 68 function. Assembly of the decapping machinery occurs mainly after shortening of the 69 polyA tail, which triggers decapping complex formation on the deadenylated 3´ end of 70 mRNA and opening of the mRNA closed-loop structure (Caponigro and Parker, 1995; 71 Morrissey et al., 1999). In this closed-loop model, the 5′ and 3′ ends of the mRNA are

2

72 thought to be in close proximity by forming a complex between translation initiation factors 73 binding to the 5´ cap and Pab1 associated with the 3´ end, thereby contributing to mRNA 74 expression regulation (Vicens et al., 2018; Wells et al., 1998).

75 An alternative pathway of mRNA degradation after deadenylation is 3′→5′ 76 degradation by the exosome and its auxiliary factors (Anderson and Parker, 1998). The 77 exosome is a multi-subunit complex that consists of 10 core factors, comprising six 78 members of the RNase PH protein family (Rrp43, Rrp45, Rrp42, Mtr3, Rrp41, Rrp46), 79 three small RNA-binding (Csl4, Rrp40, Rrp4) (Allmang et al., 1999), and the 80 Rrp44/Dis3 protein, which harbors an exonuclease and an endonuclease domain 81 (Lebreton and Seraphin, 2008; Schaeffer et al., 2009). In addition to its functions in the 82 cytoplasm, the exosome fulfills multiple roles in nuclear RNA processing and degradation 83 (Lykke-Andersen et al., 2009; Ogami et al., 2018) for which it is additionally bound by 84 Rrp6, another 3′→5′ exonuclease, Rrp47, and Mpp6 (Milligan et al., 2008; Mitchell et al., 85 2003; Synowsky et al., 2009).

86 For RNA degradation by the exosome, RNA first passes through either the TRAMP 87 or the Ski complex. TRAMP is a nuclear poly-adenylation complex (Houseley and 88 Tollervey, 2009) that is involved in many of the RNA maturation and degradation 89 processes and exists in two isoforms, TRAMP4 (Trf4, Air2 and Mtr4) and TRAMP5 (Trf5, 90 Air1 and Mtr4). These complexes harbor a pA polymerase (Trf4 or Trf5), a zinc-knuckle 91 putative RNA-binding protein (Air1 or Air2), and an RNA helicase (Mtr4). Defective 92 nuclear RNAs are tagged with a short polyA tail by TRAMP, making them a more 93 favorable substrate for the exosome core (Vaňáčová et al., 2005). The Ski complex is 94 required for cytoplasmic exosomal degradation. The Ski7 protein is stably bound to the 95 cytoplasmic exosome through the Ski4 subunit (van Hoof et al., 2002). The Ski2, Ski3, 96 and Ski8 proteins form a subcomplex interacting with Ski7, which is required for 3′→5′ 97 degradation of mRNAs (Araki et al., 2001; Brown et al., 2000; Wang et al., 2005). The 98 Ski2 protein is an ATPase of the RNA helicase family that generates energy by ATP 99 hydrolysis to unwind secondary structures and dissociate bound proteins to deliver the 100 RNA to the exosome.

101 To degrade eukaryotic mRNAs that are defective in translation, cytoplasmic quality 102 control mechanisms exist (Doma and Parker, 2007). Normal and aberrant mRNAs can 103 be discriminated by the translation machinery, and translationally defective mRNAs are 104 guided to a degradation pathway. mRNAs with aberrant translation termination due to a 105 premature translation termination codon are subjected to nonsense-mediated decay 106 (NMD) (Losson and Lacroute, 1979). Substrates for NMD are identified by the Upf1 107 protein interacting with the translation termination complex followed by binding of the Upf2 108 and Upf3 proteins, which enhances the helicase activity of Upf1 (Baker and Parker, 2004;

3

109 Chakrabarti et al., 2011). During NMD, the mRNA can be subjected to enhanced 110 deadenylation, deadenylation-independent decapping and rapid 3′→5′ degradation (Cao 111 and Parker, 2003; Mitchell and Tollervey, 2003; Muhlrad and Parker, 1994).

112 The large variety of different RNA degradation factors poses the question how 113 RNA degradation pathways are selected and whether the RNA sequence can influence 114 this selection. Answering this question requires a systematic analysis of the RNA-binding 115 profiles of the involved protein factors. Although several transcriptome profiles of the RNA 116 degradation factors Xrn1, Rrp44, Csl4, Rrp41, Rrp6, Mtr4, Trf4, Air2 and Ski2 have been 117 reported (Delan-Forino et al., 2017; Milligan et al., 2016; Schneider et al., 2012; Tuck and 118 Tollervey, 2013), we lack transcriptome-wide binding profiles for components of the 119 deadenylation, decapping, and NMD machineries, as well as other subunits of the 120 . Thus, the task of systematically analyzing the binding of subunits from 121 all known factors involved in RNA degradation to a eukaryotic transcriptome 122 (‘transcriptome mapping’) has not been accomplished thus far.

123 Here we used photoactivatable ribonucleoside-enhanced crosslinking and 124 (PAR-CLIP) to systematically generate transcriptome-wide protein 125 binding profiles for 30 general RNA degradation factors in the yeast Saccharomyces 126 cerevisiae (S. cerevisiae). In-depth bioinformatic analysis and comparisons with 127 previously reported PAR-CLIP data provide factor enrichment on different RNA classes 128 and the binding behavior for mRNAs and their associated antisense transcripts. The 129 results also give insights into how the various degradation complexes, and also different 130 subunits in these complexes, may be involved in the degradation of different RNA 131 species. Several conclusions can be drawn with respect to degradation pathway 132 selection, new functions for known factors can be proposed, and several hypotheses 133 emerge that can be tested in the future. Finally, our dataset provides a rich resource for 134 future studies of eukaryotic RNA degradation pathways, mechanisms, and the integration 135 of mRNA metabolism.

136

137 Results:

138 Transcriptome maps for 30 RNA degradation factors 139 In order to get a better understanding of RNA processing and degradation in a eukaryotic 140 cell, we measured transcriptome-wide binding locations of 30 RNA degradation factors 141 involved in mRNA deadenylation, decapping, exosome-mediated degradation, and in 142 RNA surveillance pathways including nuclear RNA surveillance and cytoplasmic 143 nonsense-mediated decay (NMD) (Figure 1A,B). We performed PAR-CLIP in S. 144 cerevisiae using our published protocol (Battaglia et al., 2017), with minor modifications

4

145 (Methods). The high reproducibility of these PAR-CLIP experiments is revealed by a 146 comparison of two independent biological replicates that we collected for all 30 147 degradation factors (Figure 1—figure supplement 1), with Spearman correlations 148 between 0.87 and 1.00 (mean: 0.94). We typically obtained tens of thousands of verified 149 factor-RNA cross-link sites with p-values ≤ 0.005 (Figure 1B). These transcriptome maps 150 represent an extensive, high-confidence dataset of in vivo RNA-binding sites for factors 151 involved in RNA degradation. 152 153 Degradation factors exhibit transcript class specificity 154 We first compared degradation factor binding over different RNA classes. These included 155 messenger RNA (mRNA), where we distinguished the 5´ untranslated region (5´ UTR), 156 the coding sequence (CDS), introns, and the 3´ untranslated region (3´ UTR). We also 157 included several classes of ncRNAs: ribosomal (r), transfer (t), small nucleolar (sno), and 158 small nuclear (sn) RNAs, as well as stable unannotated transcripts (SUTs), cryptic 159 unstable transcripts (CUTs), and Nrd1-unterminated transcripts (NUTs) (Neil et al., 2009; 160 Pelechano et al., 2013; Schulz et al., 2013) (Figure 2). 161 A first analysis revealed that most PAR-CLIP sequencing reads fall into the mRNA 162 transcript class, although many of the factors also show a considerable number of 163 sequencing reads in ncRNAs, in particular rRNAs (Figure 2A). To obtain a more 164 quantitative comparison, we defined log enrichment scores that reflect the preferences of 165 factors in binding to a specific transcript class in comparison to other factors and classes. 166 To correct for the different sizes of classes and different numbers of measured factor 167 binding sites, we normalized the log enrichment scores by subtracting class- and factor- 168 specific offsets, such that the mean for each class and each factor vanishes (Figure 2B, 169 Methods). This analysis highlights differences between degradation factors with respect 170 to binding to various transcript classes, as will be discussed in detail below. 171 172 RNA end-processing complexes differ in their targets 173 The catalytic subunit Pop2 and the core subunits Not1 and Caf40 of the deadenylase 174 complex Ccr4/Not have similar binding preferences for the 5´ UTR, the CDS and 3´ UTR 175 of mRNAs, for rRNAs, tRNAs, snoRNAs, and snRNAs (Figure 2B, highlighted in red). 176 Compared to other deadenylation factors of the Ccr4/Not complex, the catalytically active 177 subunit Ccr4 has different binding preferences, and is strongly enriched at mRNA introns. 178 The second deadenylation complex, Pan2/Pan3, shows a similar binding preference as 179 the Ccr4/Not complex (except for the Ccr4 subunit), consistent with its dominant role in 180 yeast mRNA deadenylation (Boeck et al., 1996). Pan3 shows a strong binding preference 181 for rRNAs and tRNAs.

5

182 For all decapping-related factors we observed similar binding preferences among 183 each other (Figure 2B, highlighted in green). They show the strongest enrichment at 184 SUTs and at mRNAs compared to the other transcript classes. Decapping factors bind 185 preferentially to CDS and 3´ UTR as well as SUTs. This is consistent with previous 186 findings that SUTs are degraded via Dcp2-dependent pathways in the cytoplasm 187 (Marquardt et al., 2011; Smith et al., 2014; Thompson and Parker, 2007). Dcp2, which 188 harbors the hydrolase activity that removes the 5´ cap, and the decapping activator Edc3, 189 additionally bind to NUTs. The 5´ exonuclease Xrn1 shows a similar binding preference 190 as the decapping factors (Figure 2B, highlighted in orange). Taken together, complexes 191 and enzymes that are known to target mRNA ends for 3´ deadenylation and 5´ decapping 192 and degradation show remarkably distinct binding specificities to different transcript 193 classes. 194 195 The exosome and surveillance factors 196 For the exosome we also observed binding to different RNA classes (Figure 2B, 197 highlighted in royal blue). The core exosome subunits Csl4 and Rrp40 show similar cross- 198 linking to rRNAs, tRNAs, snoRNAs, and snRNAs. The catalytic exosome subunit Rrp44 199 and the core subunit Rrp4 binds to introns of mRNAs, but preferentially to the short-lived, 200 nuclear CUTs and NUTs. Rrp6, a subunit that is exclusively present in the nuclear 201 exosome complex, shows binding to rRNAs, snoRNAs, snRNAs, CUTs and NUTs. This 202 is consistent with the suggestion that the factor is needed for nuclear processing of such 203 non-coding transcripts and degradation of short-lived nuclear transcripts (Heo et al., 2013; 204 Vasiljeva and Buratowski, 2006). This complex distribution of cross-links for different 205 exosome subunits to different RNA classes reflects the distinct functions of the exosome 206 in nuclear RNA surveillance, processing of stable ncRNAs, and cytoplasmic mRNA 207 degradation (Zinder and Lima, 2017). 208 The two TRAMP complexes TRAMP4 and TRAMP5 show clearly distinct cross- 209 linking patterns (Figure 2B, highlighted in light blue). TRAMP4 subunits (Mtr4, Air2, Trf4) 210 are enriched in introns, consistent with a function on mRNAs, and on SUTs, CUTs, and 211 NUTs. The TRAMP5 complex (Mtr4, Air1, Trf5) shows binding enrichment for introns, 212 rRNAs, tRNAs, snRNAs, and snoRNAs. This is in agreement with previous data, which 213 showed rRNA binding for Mtr4 and exosome subunits (Delan-Forino et al., 2017; 214 Schneider and Tollervey, 2013). Moreover, the TRAMP complex cooperates with the 215 Nrd1/Nab3 complex and the nuclear exosome complex during the maturation and 3´ pre- 216 processing of snoRNAs (Grzechnik and Kufel, 2008). To distinguish binding upon 217 degradation and binding in order to pre-process snoRNAs, we investigated metagene 218 profiles of TRAMP subunits along snoRNA genes (Figure 2—figure supplement 2). 219 Air1/Trf5 bind almost exclusively to the gene body whereas Air2/Trf4 bind downstream of

6

220 the 3´ end. This suggests that TRAMP5 is mainly involved in snoRNA degradation, 221 whereas TRAMP4 may work together with the Nrd1/Nab3 machinery to pre-process 222 snoRNAs (Figure 2—figure supplement 2) and to target NUTs, SUTs, and CUTs for 223 degradation (Figure 2B, highlighted in light blue). 224 The cross-linking preferences of subunits of the Ski complex differ only slightly 225 from each other (Figure 2B, highlighted in cyan). All Ski complex subunits bind the 226 5´ UTR, CDS, and 3´ UTR of mRNAs, rRNAs, tRNAs, snoRNAs, and snRNAs. The Ski2 227 subunit preferentially binds to the CDS of mRNAs, consistent with its function as a 228 helicase to detach bound proteins from the mRNAs (Houseley and Tollervey, 2009; 229 Lebreton and Seraphin, 2008). The exosome adaptor subunit Ski7 preferentially binds 230 rRNAs and tRNAs. These patterns are consistent with the model that the exosome 231 cooperates with distinct accessory complexes and factors to target different transcript 232 classes. Finally, we observed similar cross-linking patterns for all NMD factors with strong 233 binding to SUTs and NUTs (Figure 2B, highlighted in yellow). Upf2 shows an additional 234 binding preference to introns and CUTs. Upf3 also binds to the 5´ UTR, CDS, and 3´ UTR 235 of mRNAs, and Nmd4 binds to introns and 3´ UTRs of mRNAs. 236 237 Distinct factor distribution along mRNA 238 We next focused on degradation factor distribution on mRNAs. We prepared metagene 239 profiles showing the average occupancy of each factor around the mRNA transcription 240 start sites (TSS) and the poly-adenylation (pA) sites, respectively (Figure 3). The 241 Pan2/Pan3 deadenylase complex and the Ccr4/Not subunits Pop2, Not1, and Caf40 all 242 cross-link upstream of the 3´ end of mRNA with the highest enrichment at the pA site, as 243 expected from their function in shortening the polyA tail. The catalytic subunit Ccr4 binds 244 strongly in the 5´ region of mRNAs. All 5´ decapping factors bind upstream of the pA site, 245 and all but the catalytically active subunit Dcp2 show increasing occupancy towards the 246 3´ end of mRNAs. These patterns can be explained if decapping factors are pre-bound to 247 mRNAs that form a closed loop that holds the RNA ends in proximity. In contrast, Dcp2 248 binds almost exclusively at the pA site, suggesting that it might be recruited only upon 249 active mRNA degradation. The cytoplasmic 5´ exonuclease Xrn1 has the highest 250 occupancy towards the 3´ end, similar to the previously published crosslinking and cDNA 251 analysis (CRAC) data (Tuck and Tollervey, 2013), thereby resembling the binding profiles 252 of the decapping factors. Comparison of the binding profiles aligned at the pA site or 253 alternatively with profiles aligned at the translation stop codon shows that the binding 254 preference indeed lies at the end of the 3´ UTR independent of the stop codon position 255 (Figure 3—figure supplement 1B,C). 256 The exosome core subunits (Csl4, Rrp40, and Rrp4) and the catalytically active 257 subunits (nuclear: Rrp6, cytoplasmic: Rrp44) cross-link to the 5´ end of the transcript

7

258 (Figure 3), possibly because the exosome binds to the 5´ end while digesting the 3´ end, 259 or more likely because the exosome slows down towards the remaining 5´ end of mRNAs 260 after rapid degradation from the 3´ end. Both TRAMP complexes bind mainly in the 5´ 261 region of mRNAs near the TSS, as previously observed for Mtr4 and Trf4 (Tuck and 262 Tollervey, 2013). 263 The Ski complex components Ski7 and Ski8 occupy the entire mRNA with 264 increasing occupancy towards the pA site, whereas Ski2 and Ski3 show more discrete 265 binding towards the polyA tail (Figure 3). The NMD factors Upf1 and Upf3 show binding 266 over the entire mRNA with highest occupancy at the pA site, consistent with their role in 267 scanning for premature stop codons in mRNAs and remodeling of the 3´ end of protein- 268 RNA complexes and completion of mRNA decay (Franks et al., 2010). In addition, Upf2 269 and Nmd4 show strongest binding near the 3´ ends of mRNAs. Taken together, the 270 distribution of cross-links along mRNA transcripts differs between degradation complexes 271 and in some cases also between their subunits. 272 273 Surveillance of aberrant nuclear ncRNA 274 Pervasive transcription of the genome leads to many short-lived aberrant RNAs that must 275 be rapidly detected and degraded in the nucleus. We previously reported that the RNA 276 surveillance factors Nrd1 and Nab3 strongly cross-link to aberrant upstream antisense 277 RNA that stems from bidirectional transcription (Schulz et al., 2013). In order to find 278 factors cross-linking to aberrant ncRNAs, we plotted the occupancy of all 30 investigated 279 factors on the antisense strand of known mRNAs (Figure 4). For comparison, we plotted 280 the published Nrd1 and Nab3 profiles in the first two lanes of Figure 4. The factors are 281 involved in processing and degradation of Nrd1-unterminated transcripts, or NUTs 282 (Schulz et al., 2013), and are expected to show similar binding to upstream antisense 283 RNA as Nrd1 and Nab3. Indeed, we observed a similar binding pattern for all exosome 284 subunits (Rrp6, Csl4, Rrp40, Rrp4, Rrp44) and subunits of the TRAMP4 complex (Mtr4, 285 Air2, Trf4). Consistent with this, these factors also bind strongly to previously annotated 286 NUTs and CUTs (Figure 2) and show strong enrichment of Nrd1 and Nab3 motifs (GTAG, 287 CTTG) around their cross-link sites (Figure 4—figure supplement 1). 288 It has been shown that Nrd1 is involved in terminating transcripts upstream of the 289 TSS. We also observe a strong signal for binding upstream of TSS on the sense strand 290 for Air2 and Mtr4 (Figure 3). This suggests that the TRAMP4 complex is involved in 291 degradation of those Nrd1-regulated upstream sense transcripts. To investigate this 292 hypothesis, we compared the binding profiles around the TSS of 459 protein-coding 293 genes, previously annotated as having upstream Nrd1-unterminated transcripts, or NUTs 294 (Schulz et al., 2013), with the profiles obtained for all mRNAs (Figure 3—figure 295 supplement 2A,B). TRAMP4 and the exosome subunits show a strong preference for

8

296 binding to the upstream promoter region of genes that are controlled by the Nrd1/Nab3 297 complex. To further confirm that this upstream signal originates from NUTs and CUTs, 298 we excluded cross-link sites that fall within such previously annotated regions. We then 299 compared the binding profiles generated from the remaining binding sites on mRNAs 300 (Figure 3—figure supplement 3). Upon filtering, the signal upstream of the TSS for Air2 301 and Mtr4 decreases, showing that Nrd1-mediated regulation is the primary cause for this 302 upstream signal. Comparison of the observed antisense profiles (Figure 4) with those 303 obtained after excluding cross-link sites in previously annotated NUT and CUT regions 304 (Figure 4—figure supplement 2) confirms that most of the signal originates from 305 transcripts that are targeted by the Nrd1/Nab3 machinery. 306 These results are consistent with the idea that the nuclear RNA surveillance 307 machinery involves, in addition to Nrd1 and Nab3, the TRAMP4 complex and the nuclear 308 exosome. Indeed, it was reported that TRAMP4 can add a short polyA tail on aberrant 309 RNAs (Wyers et al., 2005), which may trigger degradation by the nuclear exosome. It was 310 also recently shown that Nrd1 and Trf4 interact, providing a basis for coupling 311 surveillance-mediated termination to RNA degradation (Tudek et al., 2014). 312 313 Interactions between RNA processing machineries 314 To find out which groups of factors can work together in degrading transcripts, we 315 analyzed their tendency to co-occupy the same transcripts by calculating the Pearson 316 correlation of their occupancy across all transcripts (Figure 5A). We also analyzed their 317 co-localization, that is the tendency of a factor to bind near to another factor’s binding 318 sites using a range of ±40 nt from each cross-link site (Figure 5B). To relate these profiles 319 to those of other factors, we included previously published PAR-CLIP profiles from our 320 lab (Supplementary File 1). Profiles were available for factors that function in nuclear 321 RNA surveillance (Nrd1, Nab3), cap binding (Cbc2), mRNA transcript elongation (Bur1, 322 Bur2, Ctk1, Ctk2, Cdc73, Ctr9, Leo1, Paf1, Rtf1, Set1, Set2, Dot1, Spt5, Spt6, Rpb1), 323 pre-mRNA splicing (Ist3, Nam8, Mud1, Snp1, Luc7, Mud2, Msl5), pre-mRNA 3´ 324 processing (Pab1, Pub1, Rna15, Mpe1, Cft2; Yth1), transcription termination (Rat1, Rai1, 325 Rtt103, Pcf11), and mRNA export (Hrp1, Tho2, Gbp2, Hrb1, Mex67, Sub2, Yra1, Nab2, 326 Npl3) (Baejen et al., 2017, 2014; Battaglia et al., 2017; Schulz et al., 2013). 327 Co-occupancy and co-localization plots for all factors can be found in Figure 5— 328 figure supplement 1 and 2, respectively. A two-dimensional embedding of co- 329 occupancy profiles between all these processing factors is shown in Figure 5C. It 330 represents the degree of similarities between co-occupancy of transcripts (Figure 5A) in 331 terms of the distance in two dimensions. The two-dimensional embedding of the co- 332 localization matrix in Figure 5B shows a similar clustering (Figure 5—figure supplement 333 3). This extensive global analysis suggests which factors reside in functional complexes

9

334 and which functional complexes may interact during RNA processing and degradation. 335 The analysis recovers several established interactions between subunits of known 336 complexes and between different complexes, providing a positive control. For example, 337 all factors of the decapping complex show very high co-occupancy and co-localization, 338 as do Air2 and Mtr4, which reside in the TRAMP4 complex. 339 The analysis contains a lot of new information, forcing us to focus here on a few 340 interesting, novel findings (Figure 5C). First, the largest cluster is formed by the 341 previously analyzed factors involved in transcription elongation by RNA polymerase II 342 (cluster 1) and in co-transcriptional pre-mRNA processing, including cap-binding complex 343 (Cbc2), 3´ processing, transcription termination, and RNA export. The degradation factors 344 Ccr4 and Air1 also reside in this cluster, maybe reflecting the role of Ccr4 in transcription 345 elongation (Kruk et al., 2011). A second cluster is formed by splicing factors (cluster 2). 346 Factors involved in nuclear and cytoplasmic exosomal degradation (Rrp6, Csl4, Rrp4, 347 Rrp40 and Rrp44) form a third cluster (cluster 3). Close to cluster 3, we find the TRAMP4 348 complex subunit Trf4, the elongation factors Dot1, Paf1, Leo1, and the termination factors 349 Pcf11 and Rai1. Rai1 has been shown to detect and remove incomplete 5´ cap structures, 350 to subject aberrant pre-mRNAs to nuclear degradation (Jiao et al., 2010). 351 A fourth cluster is formed by mRNA deadenylation factors together with polyA tail 352 binding proteins (Pab1 and Pub1), Ski7, Ski8, Trf5, and the export factor Yra1 (cluster 4). 353 This is consistent with coupled mRNA deadenylation and subsequent degradation from 354 its 3´ end by the exosome with the Ski or TRAMP complex as adaptors. The fifth cluster 355 is formed by mRNA decapping factors, which cluster together with Xrn1, suggesting a 356 coupling of mRNA decapping with degradation from the 5´ end by Xrn1 (cluster 5). The 357 NMD-involved factors Upf1, Upf2, Upf3 and Nmd4, and Ski2 and Ski3 are also found in 358 cluster 5. The high correlation between Xrn1 and Ski2 has been reported in a CRAC 359 experiment (Tuck and Tollervey, 2013). The elongation factor Ctr9, the 3´ processing 360 factor Mpe1 and the export factors Tho2, Mex67 and Nab2 are also found in cluster 5. A 361 last cluster (cluster 6) is formed by factors involved in nuclear RNA surveillance, including 362 Air2, Mtr4 and the Nrd1/Nab3 complex. Taken together, these findings are consistent with 363 known functional associations and physical interactions between factors and suggest 364 intriguing new associations to be investigated in future work. 365

366 5´ degradation machinery senses translation efficiency

367 To study the link between cytosolic mRNA translation and degradation, we compared the 368 occupancy of degradation factors on mRNAs to their average codon-optimality score 369 (‘transcript optimality’) (Figure 6A, Figure 6—figure supplement 2-8A). We found that 370 the 5´ decapping machinery and Xrn1 preferentially bind transcripts with low transcript

10

371 optimality. In contrast, the 3´ deadenylation machinery and the exosome bind more 372 strongly to optimal transcripts. We asked whether this correlation with codon optimality is 373 introduced by only a few differentially bound codons or by global enrichment/depletion of 374 optimal codons. For this purpose, we introduced a ‘codon enrichment score’, which 375 measures a codon’s enrichment in the set of transcripts bound by the factor relative to 376 the yeast mRNA pool. For Dcp2 this enrichment score is high on non-optimal codons, 377 and low on optimal codons, whereas the opposite trend is observed for Ccr4 and most 378 degradation factors (Figure 6B, Figure 6—figure supplement 2-8B) This is consistent 379 with a model that stalling on translationally inefficient codons can lead to 380 recruitment of Dcp2 and Xrn1 and subsequent 5´ degradation of the transcript (Heck and 381 Wilusz, 2018).

382 To investigate the significance of the correlation between transcript optimality and 383 binding of the 5´ degradation machinery, we compared the contribution of several mRNA 384 features in explaining the occupancy patterns retrieved from PAR-CLIP experiments. 385 Since mRNA expression, half-life, and translation optimality are inter-correlated (Figure 386 6—figure supplement 1), a causative effect of one of these features on binding strength 387 may lead to correlations with all three features. To better distinguish correlation from 388 causation, we used linear regression analysis to explore whether correlations between 389 factor binding and optimality are better explained with other mRNA features (Figure 5— 390 figure supplement 9). We assessed the significance of features via the likelihood ratio 391 test on the multi-variate linear regression model for occupancy. The likelihood ratio test 392 calculates the significance of a feature from the change of the likelihood (quantifying the 393 prediction quality) upon removal of that feature from the regression model. For decapping 394 enhancers (Edc2, Edc3, and Dhh1) and Xrn1, low codon optimality is the most 395 determining feature for binding (Figure 6C). The same is true for NMD factors Upf1 and 396 Upf3, which are known to bind non-optimal transcripts (Celik et al., 2017). This result 397 confirms the importance of the translation efficiency for the stability of cytosolic mRNAs 398 and strengthens our finding that transcripts with low average codon optimality are 399 preferentially targeted by the decapping machinery and degraded from the 5´ end.

400

401 Decapping factors are enriched upon RNA degradation

402 Although decapping occurs at the 5´ end of mRNAs, decapping factors show a strong 403 occupancy near the 3´ end (Figure 3). To investigate this further, we compared metagene 404 profiles of decapping factors between stable (top 25%) and unstable (bottom 25%) 405 transcripts, using mRNA half-life estimates (Figure 7A, Methods). On both stable and 406 unstable mRNAs, Dcp1, Edc2, Edc3, and Dhh1 show increased binding near the 3´ end, 407 but unstable RNAs show a higher occupancy in the transcript body. The catalytically 11

408 active subunit Dcp2 binds almost exclusively at the 3´ end and has a higher occupancy 409 on unstable transcripts. Moreover, A-rich 4-mers are abundant around the proximity (8 410 nt) of Dcp2-cross-link sites (Figure 7C), indicating a binding preference of Dcp2 for A- 411 rich RNA sequences. Overall, these binding patterns suggest that decapping factors are 412 bound in transcript bodies and near the 3´ end of transcripts, and that through closed- 413 loop formation of the mRNA they are in close proximity to the 5´ end. Decapping factors 414 might also travel with the 5´→3´ exonuclease Xrn1 upon RNA degradation.

415 Decapping factors may bind to complete mRNAs or to transcripts that are in the 416 process of being degraded. To quantify these two behaviors, we combined our PAR-CLIP 417 occupancy data with RNA half-life estimates (Methods). We modeled the occupancy of 418 factors on mRNA as the sum of binding to all transcripts (b) and surplus binding to 419 transcripts that are in the process of degradation ( ). Therefore, we can model / 420 occupancy as a function of half-life with a linear equation (occupancy = + b). In cases / 421 where there is no surplus binding upon active degradation, i.e., the occupancy is the same 422 as in intact RNAs, ‘a’ will be zero. For 5´ decapping factors, this model closely fits the 423 occupancy patterns retrieved from our experiments (Figure 7B), other degradation 424 factors also follow this pattern to varying degrees (Figure 6—figure supplement 2-8). In 425 particular, Dcp2 shows a very high a/b ratio, revealing that it cross-links preferentially to 426 transcripts that are being degraded. This analysis strongly suggests that the 5´ decapping 427 machinery, although present to some extent on complete mRNAs, is enriched when 428 mRNAs are degraded.

429

430 Discussion:

431 Here we report transcriptome-wide binding maps for 30 RNA degradation factors in yeast. 432 A detailed bioinformatics analysis of these maps revealed how degradation factors vary 433 in their binding specificities for different classes of RNAs and with respect to their 434 preferred locations on RNA transcripts. Global comparisons of the profiles alongside 435 previously published profiles of other RNA-binding factors revealed clusters of factors that 436 co-occupy RNAs or co-localize on RNAs. Our data are consistent with a large body of 437 published results and extend these to a global scale. In addition, the results revealed 438 several unexpected, novel findings, which we discuss below. Although our data reflect 439 factor cross-linking signal and measure occupancy on transcripts, and do not directly 440 reveal function, the correlations of occupancies between factors and with transcript 441 properties indicate functional aspects and suggest functional associations between 442 factors that may guide future studies.

12

443 With respect to canonical mRNA turnover in the cytoplasm, the Pan2/Pan3 444 deadenylation complex and subunits of the Ccr4/Not complex bound preferentially to the 445 pA site, reflecting a function in polyA tail shortening at early stages of RNA degradation. 446 In contrast, the Ccr4/Not complex subunit Ccr4 bound also strongly in the 5´ region of 447 transcripts. This pattern may reflect functional differences between Ccr4 and Pop2 during 448 deadenylation (Webster et al., 2018) or an additional function of Ccr4 in transcription 449 elongation (Kruk et al., 2011). Ccr4/Not subunits also show variations in their RNA- 450 binding specificities, suggesting several isoforms of the complex that vary in composition 451 and function, or RNA-specific conformational rearrangements. Factors involved in 452 decapping show higher cross-linking near the RNA 3´ end, consistent with previously 453 proposed binding near the pA site (Chowdhury et al., 2007). The profiles of decapping 454 factors resemble those of Xrn1, suggesting formation of a complex with this 5´→3´ 455 exonuclease or a fast mRNA decay by Xrn1 from 5´→3´ and slowing down towards the 456 3´ end. Complex formation of Xrn1 and the decapping factors is consistent with a function 457 of Xrn1 in the buffering of mRNA levels in cells (Sun et al., 2013), which may be explained 458 if Xrn1 is a regulatory component of the decapping complex.

459 The preferred localization of 5´ decapping and 3´ deadenylation factors near the 460 mRNA 3´ and 5´ end, respectively, seems counterintuitive, but may be explained by 461 formation of an mRNA closed-loop structure by messenger ribonucleoproteins (mRNPs), 462 where 5´ and 3´ ends are in proximity (Gallie, 1991). It is possible that mature mRNPs 463 carry decapping factors near their 3´ end, and that upon polyA tail shorting the decapping 464 complex is activated, leading to decapping and rapid RNA degradation from the 5´ end. 465 In this model, decapping would open the RNA closed-loop structure, providing access for 466 and allowing for rapid RNA removal. Further, our result that decapping 467 factors are enriched on translationally non-optimal codons agrees with previous findings 468 that suggest a link between translation and RNA degradation from the 5´ end through the 469 decapping enhancer Dhh1 (Radhakrishnan et al., 2016). We note however that our 470 approach does not detect binding events within the polyA tail, limiting further insights.

471 Comparison of our data with previous profiles of the nuclear surveillance factors 472 Nrd1 and Nab3 reveals many factors that show a similar binding to aberrant non-coding 473 nuclear RNAs, in particular antisense RNA upstream of known promoters, suggesting that 474 these factors are part of the nuclear surveillance machinery. These results are consistent 475 with published data (Schmid and Jensen, 2018) and with the following model for nuclear 476 surveillance. First, the Nrd1/Nab3 complex recognizes aberrant, antisense RNAs (Schulz 477 et al., 2013). These RNAs then get adenylated by the TRAMP4 complex, consistent with 478 an interaction of the Nrd1/Nab3 complex and the TRAMP subunit Trf4 (Tudek et al., 479 2014). The short polyA tail targets the RNA for degradation by the nuclear exosome. We

13

480 note that the degradation of introns and ncRNAs upstream of mRNAs on the same strand, 481 which were annotated as NUTs and CUTs, is likely to use the same degradation 482 mechanism because these are bound by the same factors (Figure 2, Figure 3—figure 483 supplement 2). Thus, our results are consistent with the idea that degradation of short- 484 lived ncRNAs in the nucleus involves Nrd1/Nab3, TRAMP4, and the exosome.

485 The exosome is involved in 3´→5´ cytoplasmic mRNA degradation, and in the 486 processing and degradation of long-lived transcripts such as tRNAs, rRNAs and 487 sn(o)RNAs. The differences in exosome co-factor binding to different RNA classes 488 (Figure 2) support the hypothesis that these factors confer specificity for processing and 489 degradation of various RNA species (Delan-Forino et al., 2017). The exosome co-factor 490 Ski2 shows cross-linking towards the 3´ end of mRNAs (Figure 3). This indicates that the 491 subunit is important for initial RNA degradation by using its helicase activity to dissolve 492 secondary structures and allowing the exosome to start degradation of the transcript 493 (Schneider and Tollervey, 2013). Like the exosome complex, the exosome co-factor 494 TRAMP5 binds mRNAs towards the 5´ end of transcripts, and thus some mRNAs may be 495 targeted by TRAMP5 for exosomal degradation. The catalytic exosome subunits Rrp44 496 and Rrp6 and the exosome core cross-link near the mRNA 5´ end, probably because the 497 exosome moves rapidly from 3´→5´ and then slows down, causing extensive cross- 498 linking. In summary, our resource of transcriptome-binding profiles for 30 RNA 499 degradation factors reveals several hypotheses for the function of these factors that can 500 be tested case by case in the future.

501

502 Materials and Methods:

503 Key Resources Table: Reagent type Designatio Additional (species) or Source or reference Identifiers n information resource strain, strain Ccr4_TAP C-terminally tagged SGD: background (S. gene (Open S000000019 cerevisiae, Biosystems, BY4741) Germany). strain, strain Pop2_TAP C-terminally tagged SGD: background (S. gene (Open S000005335 cerevisiae, Biosystems, BY4741) Germany). strain, strain Not1_TAP C-terminally tagged SGD: background (S. gene (Open S000000689 cerevisiae, Biosystems, BY4741) Germany). 14

strain, strain Caf40_TAP C-terminally tagged SGD: background (S. gene (Open S000005232 cerevisiae, Biosystems, BY4741) Germany). strain, strain Pan2_TAP C-terminally tagged SGD: background (S. gene (Open S000003062 cerevisiae, Biosystems, BY4741) Germany). strain, strain Pan3_TAP C-terminally tagged SGD: background (S. gene (Open S000001508 cerevisiae, Biosystems, BY4741) Germany). strain, strain Dcp1_TAP C-terminally tagged SGD: background (S. gene (Open S000005509 cerevisiae, Biosystems, BY4741) Germany). strain, strain Dcp2_TAP C-terminally tagged SGD: background (S. gene (Open S000005062 cerevisiae, Biosystems, BY4741) Germany). strain, strain Edc2_TAP C-terminally tagged SGD: background (S. gene (Open S000000837 cerevisiae, Biosystems, BY4741) Germany). strain, strain Edc3_TAP C-terminally tagged SGD: background (S. gene (Open S000000741 cerevisiae, Biosystems, BY4741) Germany). strain, strain Dhh1_TAP C-terminally tagged SGD: background (S. gene (Open S000002319 cerevisiae, Biosystems, BY4741) Germany). strain, strain Xrn1_TAP C-terminally tagged SGD: background (S. gene (Open S000003141 cerevisiae, Biosystems, BY4741) Germany). strain, strain Rrp6_TAP C-terminally tagged SGD: background (S. gene (Open S000005527 cerevisiae, Biosystems, BY4741) Germany). strain, strain Csl4_TAP C-terminally tagged SGD: background (S. gene (Open S000005176 cerevisiae, Biosystems, BY4741) Germany).

15

strain, strain Rrp40-TAP C-terminally tagged SGD: background (S. gene (Open S000005502 cerevisiae, Biosystems, BY4741) Germany). strain, strain Rrp4_TAP C-terminally tagged SGD: background (S. gene (Open S000001111 cerevisiae, Biosystems, BY4741) Germany). strain, strain Rrp44_TAP C-terminally tagged SGD: background (S. gene (Open S000005381 cerevisiae, Biosystems, BY4741) Germany). strain, strain Air1_TAP C-terminally tagged SGD: background (S. gene (Open S000001341 cerevisiae, Biosystems, BY4741) Germany). strain, strain Trf5_TAP C-terminally tagged SGD: background (S. gene (Open S000005243 cerevisiae, Biosystems, BY4741) Germany). strain, strain Mtr4_TAP C-terminally tagged SGD: background (S. gene (Open S000003586 cerevisiae, Biosystems, BY4741) Germany). strain, strain Air2_TAP C-terminally tagged SGD: background (S. gene (Open S000002334 cerevisiae, Biosystems, BY4741) Germany). strain, strain Trf4_TAP C-terminally tagged SGD: background (S. gene (Open S000005475 cerevisiae, Biosystems, BY4741) Germany). strain, strain Ski2_TAP C-terminally tagged SGD: background (S. gene (Open S000004390 cerevisiae, Biosystems, BY4741) Germany). strain, strain Ski3_TAP C-terminally tagged SGD: background (S. gene (Open S000006393 cerevisiae, Biosystems, BY4741) Germany). strain, strain Ski7_TAP C-terminally tagged SGD: background (S. gene (Open S000005602 cerevisiae, Biosystems, BY4741) Germany).

16

strain, strain Ski8_TAP C-terminally tagged SGD: background (S. gene (Open S000003181 cerevisiae, Biosystems, BY4741) Germany). strain, strain Upf1_TAP C-terminally tagged SGD: background (S. gene (Open S000004685 cerevisiae, Biosystems, BY4741) Germany). strain, strain Upf2_TAP C-terminally tagged SGD: background (S. gene (Open S000001119 cerevisiae, Biosystems, BY4741) Germany). strain, strain Upf3_TAP C-terminally tagged SGD: background (S. gene (Open S000003304 cerevisiae, Biosystems, BY4741) Germany). strain, strain Nmd4_TAP C-terminally tagged SGD: background (S. gene (Open S000004355 cerevisiae, Biosystems, BY4741) Germany). antibody IgG Sigma-Aldrich Cat#: I5006, IP: 0.1 mg RRID:AB_11 per IP 63659 antibody PAP anti- Sigma Aldrich Cat#: P1291, WB TAP RRID:AB_10 (1:2000) 79562 Commercial Dynabeads Invitrogen Cat#: 10003D 330 µl per assay or kit Protein G IP Commercial RNase T1 Thermo Fisher Cat#: assay or kit Scientific EN0541 Commercial Antarctic NEB Cat#: assay or kit Phosphata M0289S se Commercial RNase Invitrogen Cat#: assay or kit OUT 10777019 Commercial T4 Invitrogen Cat#: assay or kit Polynucleot EK0032 ide Kinase Commercial T4 RNA NEB Cat#: assay or kit ligase 2, M0373S truncated KQ Commercial T4 RNA NEB Cat#: assay or kit ligase 1 M0437M

17

Commercial Proteinase NEB Cat#: assay or kit K P8107S Commercial SuperScript Thermo Fisher Cat#: assay or kit III RT Scientific 18080093 Commercial Phusion Thermo Fisher Cat#: F531S assay or kit High- Scientific Fidelity PCR Master Mix chemical 4-thiouracil Carbosynth Cat#: 1 mM final compound, drug 591-28-6 conc. software, mockinbird Roth and Torkler, algorithm 2018; https://github.com/so edinglab/Degradation _scripts software, UMI-tools Smith et al, 2017; algorithm doi:10.1101/gr.20960 1.116 software, Skewer Jiang et al, 2014; doi: algorithm 10.1186/1471-2105- 15-182. software, Bowtie Langmead et al, algorithm 2009; doi: 10.1186/gb-2009-10- 3-r25 software, tSNE Van Der Maaten and algorithm Hinton, 2008; doi: 10.1007/s10479-011- 0841-3 504 505 S. cerevisiae strain verification 506 Saccharomyces cerevisiae BY4741 strains harboring C-terminally tagged genes (Open 507 Biosystems, Germany) were tested for the correctly inserted tag by Western Blotting 508 using the Peroxidase Anti-Peroxidase (PAP; Sigma) antibody and Pierce™ ECL Western 509 Blotting Substrate (Thermo Fisher Scientific, USA) (data not shown).

510 511 PAR-CLIP experiments of S. cerevisiae proteins 512 PAR-CLIP was performed as described (Baejen et al., 2014; Battaglia et al., 2017). 513 Briefly, TAP-tagged protein expressing yeast cells were grown in minimal medium (CSM 514 mixture, Formedium, UK) supplemented with 89 µM uracil, 50-100 µM 4-thiouracil (4tU) 515 and 2% glucose at 30°C to OD600 = 0.5. Cells were labeled in 1 mM 4tU final concentration 516 for 4 h. After labeling, cells were harvested, resuspended in ice-cold PBS and UV 18

517 irradiated with 10-12 J/cm2 at a wavelength of 365 nm on ice and continuous shaking. 518 Lysis was performed in lysis buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.5% sodium 519 deoxycholate, 0.1% SDS, 0.5% NP-40) by bead beating (FastPrep−24 Instrument, MP 520 Biomedicals, LLC., France) using silica-zirconium beads (Roth, Germany). The cleared 521 lysate was used for immunoprecipitation with rabbit IgG-conjugated Protein G magnetic 522 beads (Invitrogen, Germany) on a rotating wheel for 4 h or overnight at 4°C. Beads were 523 washed in wash buffer (50 mM Tris-HCl pH 7.5, 1 M NaCl, 0.5% sodium deoxycholate, 524 0.1% SDS, 0.5% NP-40). IP efficiency was controlled with part of the sample by Western 525 Blot as shown in Figure 1 – figure supplement 2. Partial digest of the cross-linked RNA 526 was performed with 50 U RNase T1 per mL for 15-25 min at 25 °C. The 527 dephosphorylation reaction was performed in antarctic phosphatase reaction buffer 528 (NEB, Germany) supplemented with 1 U/µL of antarctic phosphatase and 1 U/µL of 529 RNase OUT (Invitrogen) at 37 °C for 30 min. For rephosphyorylation, beads were 530 incubated in T4 PNK reaction buffer A (Invitrogen) with a final concentration of 1 U/µL T4 531 PNK, 1 U/µL RNase OUT and 1 mM ATP for 1 h at 37 °C. 3´ adapter ligation was 532 performed in T4 RNA ligase buffer (NEB) with 10 U/µL T4 RNA ligase 2 (KQ) (NEB), 533 10 μM 3′ adapter (5′ 5rApp-TGGAATTCTCGGGTGCCAAGG-3ddC 3′ (IDT), 1 U/µL 534 RNase OUT, and 15% (w/v) PEG 8000 overnight at 16 °C. 5´ adapter was ligated to the 535 RNA using T4 RNA ligase buffer (NEB) with 6 U/µL T4 RNA ligase 1 (NEB), 10 μM 5′ 536 adapter (5′ 5InvddT-GUUCAGAGUUCUACAGUCCGACGAUCNNNNN 3′, IDT), 1 mM 537 ATP, 1 U/µL RNase OUT, 5 % (v/v) DMSO, and 10 % (w/v) PEG 8000 for 4 h at 25 °C 538 and 1 h at 37 °C. Beads were boiled in proteinase K buffer (50 mM Tris-HCl pH 7.5, 539 6.25 mM EDTA, 75 mM NaCl, 1% SDS) at 95 °C for 5 min. Proteinase K digest was 540 performed with 1.5 mg/mL proteinase K (NEB) for 2 h at 55 °C. RNA was recovered by 541 acidic phenol/chloroform extraction followed by ethanol precipitation in presence of 0.5 µL 542 GlycoBlue (Invitrogen) and 100 μM RT primer (5′ CCTTGGCACCCGAGAATTCCA 3′, 543 IDT). SuperScript III RTase was used for reverse transcription for 1 h at 44 °C and 1 h at 544 55 °C. NEXTflex barcode primer and universal primer were added to cDNA by PCR 545 amplification with Phusion HF master mix (NEB). After PCR amplification, cDNA was 546 purified and size-selected on a 4% E-Gel EX Agarose Gel (Invitrogen). Quantification on 547 an Agilent 2200 TapeStation instrument (Agilent Technologies, Germany) and 50-75 nt 548 single-end sequencing was performed on Illumina sequencers (HiSeq1500, HiSeq2500 549 and NextSeq550). 550 551 PAR-CLIP data pre-processing 552 Reads from PAR-CLIP experiments with replicates were merged after making sure that 553 all samples showed high Spearman correlation values comparing binding occupancies of 554 replicates on different genes (Figure 2—figure supplement 1). Mapping and statistical

19

555 evaluation of PAR-CLIP experiments was performed using our in-house software 556 mockinbird (Roth and Torkler, 2018). In summary, the UMI is removed from the 5´ end 557 with UMI-tools (Smith et al., 2017), and the 3´ adapter is trimmed with Skewer (Jiang et 558 al., 2014). Reads with traces of the 5´ adapter are discarded. The preprocessed reads 559 are then mapped to the S. cerevisiae genome (sacCer3, version 64.2.1). After mapping 560 PCR duplicates are removed with UMI-tools. 561 We used two alternative approaches for mapping reads using Bowtie (Langmead 562 et al., 2009): For all analyses except the ‘transcript class enrichment analysis’ in Figure 563 2, reads are uniquely mapped with up to one mismatch. We discard alignments shorter 564 than 20 nt. This stringent mapping ensures that our high confidence PAR-CLIP cross-link 565 sites are originating from correctly mapped reads on the reference genome. For Figure 566 2, unique mapping would cause the loss of most reads that fall into rRNAs and tRNAs 567 because of duplicated rRNA genes and tRNA isodecoders. For Figure 2, we therefore 568 allowed Bowtie multi-mapping in two regions with –best, –starra options and discarded 569 reads shorter than 30 nt. 570 T→C transitions directly at the edge of the reads or with a Phred quality score 571 lower than 20 are not considered as signature of protein binding as they suffer from higher 572 technical noise. To obtain high confidence cross-link sites, we set a stringent cutoff of 573 0.005 for the p-value of cross-link sites and require a minimum coverage of 2 per site. 574 Moreover, if we see the same transition in at least 75% of reads in the input library control 575 (SRA: SRX532381) (Baejen et al., 2014), we annotate it as a single 576 polymorphism of our lab strain with respect to the genomic reference and remove such 577 sites from our analysis. Finally, the occupancy of a factor on a verified cross-link site is 578 defined as the number of transitions obtained from our PAR-CLIP experiments divided by 579 the concentration of RNAs covering the cross-link site according to the input library 580 control. This control coverage is measured under comparable conditions to PAR-CLIP 581 experiments (Baejen et al., 2014). Occupancy values are capped at the 95th percentile. 582 Subsequent analyses were performed using in-house python scripts. Mockinbird 583 configuration files as well as the analysis scripts can be found at 584 https://github.com/soedinglab/Degradation_scripts. 585 586 Transcript class enrichment 587 We analyzed the distribution of reads from high confidence cross-link sites over the 588 genome (Figure 2A). We presented the sum of reads from 5´ and 3´ UTRs, coding 589 sequences, and introns as the value for mRNAs. Reads that fall within genomic regions 590 not annotated as categories analyzed here are shown with gray. These annotated 591 transcript classes have comparable U-content, making the comparison between fractions 592 of cross-link sites in each category possible (Figure 2—figure supplement 2).

20

593 For each factor studied here, we defined enrichment scores that represent their 594 preferences for binding to various transcript classes c, in comparison to all other factors. 595 We use annotations for rRNA, tRNA, snoRNA, snRNA, coding sequences (CDS), from S. 596 cerevisiae genome sacCer3, version 64.2.1. Untranslated regions around coding 597 boundaries (5´ and 3´ UTRs) were annotated based on TIF-seq experiment (Pelechano 598 et al., 2013). We selected the most strongly expressed isoform for each gene. We then 599 assigned boundaries to 3´ and 5´ UTRs based on annotated CDS of the same gene. We 600 furthermore used annotations for stable, unannotated transcripts (SUTs), cryptic unstable 601 transcripts (CUTs), and Nrd1-unterminated transcripts (NUTs) (Neil et al., 2009; 602 Pelechano et al., 2013; Schulz et al., 2013). We removed overlapping annotations with 603 the following priority list: rRNA, tRNA, snRNA, snoRNA, intron, CDS, UTR, SUT, CUT, 604 NUT. For each factor, we counted the number of high confidence reads falling in each 605 transcript class. We then used the log2-transformed matrix and normalized it in the 606 following way for both rows and columns to get log enrichment values that sum to zero in 607 both rows and columns. The row- and sum-normalized enrichment score is defined as

608 follows, where �, is the number of high confidence reads for factor f that fall into

609 transcript class c, and �′, = log �, (Figure 2B): 610 Xf,0 X0 ,c ˜ Xf,c0 = Xf,c0 X0 AAACPHicbZDNS8MwGMbT+TXrV9Wjl+AQPehoRdCLMPTicaL7gLWMNE23sPSDJBVG6R/mxT/CmycvHhTx6tm0q6ibLwR+eZ73JXkfN2ZUSNN80ipz8wuLS9VlfWV1bX3D2NxqiyjhmLRwxCLedZEgjIakJalkpBtzggKXkY47usz9zh3hgkbhrRzHxAnQIKQ+xUgqqW/c2JIyj6Td/X7qH+Isg+fwm+ERtD2fI1y6NqYcZzkXlHenP7eJm+m63jdqZt0sCs6CVUINlNXsG4+2F+EkIKHEDAnRs8xYOinikmJGMt1OBIkRHqEB6SkMUUCEkxbLZ3BPKR70I65OKGGh/p5IUSDEOHBVZ4DkUEx7ufif10ukf+akNIwTSUI8echPGJQRzJOEHuUESzZWgDCn6q8QD5GKS6q88xCs6ZVnoX1ct8y6dX1Sa1yUcVTBDtgFB8ACp6ABrkATtAAG9+AZvII37UF70d61j0lrRStntsGf0j6/AF7rrPw= , 611 612

613 We defined the row and sum averages of ��,� ,

614 C 615 1 , Xf,0 = Xf,c0 616 C

AAACHHicbVDLSgMxFM3UV62vqks3wSK6kDKjgm4KxW5cVrAP6NQhc5tpQzMPkoxQwnyIG3/FjQtF3LgQ/BvTx0JbDwQO55zLzT1+wplUtv1t5ZaWV1bX8uuFjc2t7Z3i7l5TxqkA2oCYx6LtE0k5i2hDMcVpOxGUhD6nLX9YG/utByoki6M7NUpoNyT9iAUMiDKSVzxvH3s6OHWBCchwBbu9QBDQTqZrmSvT0NNQcbL7mp7mIMsKBa9Yssv2BHiRODNSQjPUveKn24shDWmkgBMpO46dqK4mQjHgNCu4qaQJgSHp046hEQmp7OrJcRk+MkoPB7EwL1J4ov6e0CSUchT6JhkSNZDz3lj8z+ukKrjqahYlqaIRTBcFKccqxuOmcI8JCoqPDCEgmPkrhgEx7SjT57gEZ/7kRdI8Kzt22bm9KFWvZ3Xk0QE6RCfIQZeoim5QHTUQoEf0jF7Rm/VkvVjv1sc0mrNmM/voD6yvH+gLoJw= c=1 X 617 F 618 1 , X0 ,c = Xf,c0 619 F AAACHHicbVDLSgMxFM34rONr1KWbYBFdSJlRQTeFolBcVrAP6IxDJs20oZkHSUYoYT7Ejb/ixoUiblwI/o2ZtgttPRA4nHMuN/cEKaNC2va3sbC4tLyyWloz1zc2t7atnd2WSDKOSRMnLOGdAAnCaEyakkpGOiknKAoYaQfD68JvPxAuaBLfyVFKvAj1YxpSjKSWfOusc+QrF1OOT3AOq9DthRxh5eSqnrsii3wVVp38vq6KXKgzuWn6Vtmu2GPAeeJMSRlM0fCtT7eX4CwiscQMCdF17FR6CnFJMSO56WaCpAgPUZ90NY1RRISnxsfl8FArPRgmXL9YwrH6e0KhSIhRFOhkhORAzHqF+J/XzWR46Skap5kkMZ4sCjMGZQKLpmCPcoIlG2mCMKf6rxAPkG5H6j6LEpzZk+dJ67Ti2BXn9rxcu5rWUQL74AAcAwdcgBq4AQ3QBBg8gmfwCt6MJ+PFeDc+JtEFYzqzB/7A+PoB8jOgog== fX=1 620 F C 621 1 , X0 , = Xf,c0 622 FC c=1 AAACLXicbZDLSgMxFIYzXmu9VV26CRbRhZQZEXQjFCvFpYK1hc44ZE4zGsxcSDJCCfNCbnwVEVxUxK2vYaadhVoPJHz85/wk5w9SzqSy7ZE1Mzs3v7BYWaour6yurdc2Nm9kkgmgHUh4InoBkZSzmHYUU5z2UkFJFHDaDR5aRb/7SIVkSXythin1InIXs5ABUUbya+e9PV+7wAQcjO8cn2J3EAoC2sl1u5W7Mot8HZ46+W17wlBwSxfG8ADyvFr1a3W7YY8LT4NTQh2VdenXXt1BAllEYwWcSNl37FR5mgjFgNO86maSpgQeyB3tG4xJRKWnx9vmeNcoAxwmwpxY4bH606FJJOUwCsxkRNS9/NsrxP96/UyFJ55mcZopGsPkoTDjWCW4iA4PmKCg+NAAAcHMXzHcExOVMgEXITh/V56Gm8OGYzecq6N686yMo4K20Q7aRw46Rk10gS5RBwF6Qi9ohN6tZ+vN+rA+J6MzVunZQr/K+voGfvOnow== f=1 623 X X

21

624 F is the number of factors and C is the number of transcript classes (Figure 2B). 625 The normalization can be interpreted as subtracting from the log enrichment matrix X´ the 626 first singular component of its singular-value decomposition. 627 628 Metagene analysis 629 We used the most abundant TIF-annotated isoform for mRNAs (Pelechano et al., 2013) 630 as a reference. Transcripts longer than 1500 bases are chosen and aligned at their TSS 631 or pA sites. The average occupancy per nucleotide is then calculated based on high 632 confidence cross-link sites of each PAR-CLIP experiment. The profiles are smoothed by 633 a moving average in a 41 nt window and the 95% confidence interval is estimated by 634 1500 bootstrap sampling iterations over the transcripts. To further denoise the profiles, 635 the cross-link sites falling in snRNAs, rRNAs, and tRNAs are removed. Furthermore, to 636 avoid ambiguous results, we made sure that the profile comes solely from the central 637 gene. To do so, we performed the metagene analysis around the TSS on the sense strand 638 on TIF-annotated mRNAs that have no other mRNA up to 700 bp upstream of their TSS 639 (3193 transcripts in total). Analogously, for sense-strand pA site profiles we used mRNAs 640 that have no nearby genes downstream of their pA site up to 700 bases on the same 641 strand (3193 transcripts in total). For the antisense strand profiles, we applied the same 642 criteria on the opposite strand which left us with 3076 and 3193 transcripts filtered around 643 TSS and pA site respectively. This ensures that the observed antisense binding does not 644 originate from neighboring or overlapping transcripts on the antisense strand. In both 645 cases we looked at the average occupancy in a window of [±700 nt] around TSS and 646 around pA sites. Occupancies were normalized to the maximum value, which is the 647 background binding level for antisense profiles with no significant cross-linking to the 648 antisense strand (Figure 3, Figure 4, Figure 3—figure supplement 3, and Figure 4— 649 figure supplement 2). The same procedure was followed to plot metagene occupancies 650 centered around protein-coding regions and snoRNAs from S. cerevisiae genome 651 sacCer3, version 64.2.1 (Figure 3—figure supplement 1 and Figure 2—figure 652 supplement 2). Similarly, CRAC coverage profiles of Xrn1, Mtr4, Trf4, and Ski2 (pre- 653 processed as described in Tuck and Tollervey, 2013) were aligned to TIF-annotated 654 transcripts in the same approach as described here (Figure 1—figure supplement 1). 655 656 Co-occupancy 657 Co-occupancy measures the tendency of two factors to bind to the same transcripts. 658 Occupancy of a factor on a transcript is defined as the sum of occupancies for all high 659 confidence cross-link sites falling within this transcript. Co-occupancy of two factors is 660 defined as the Pearson correlation over all transcripts between the occupancies of these 661 factors (Figure 5A). We used these correlation values between all pairs of RNA

22

662 processing factors to assign distances to each pair and used tSNE (Van Der Maaten and 663 Hinton, 2008) to visualize the two-dimensional nonlinear embedding of co-occupancies 664 for all RNA-binding proteins in our dataset (Figure 5C). 665 666 Co-localization 667 Co-localization measures how likely two factors are to bind near each other in the 668 transcriptome. More precisely, we first calculate the occupancy of a factor f ∈{1,…,F} 669 around the cross-link sites of another factor f´ ([-40 nt, +40 nt] excluding the centered T). 670 We then normalize according to the total occupancy values, 671

nf 1 40 0 z = Occ + Occ ff0 0 ff0,i,j ff0,i,j1 i=1 j= 40 j=1 672 AAACUnicbVJNS8MwGM7m95w69eglOGQTdbQy0MtA9OLNCU4H6yxplm7Z0rQkb4VZ+hsF8eIP8eJBzT4Ep74QeHg+SPIkXiS4Bst6zWTn5hcWl5ZXcqv5tfWNwubWrQ5jRVmDhiJUTY9oJrhkDeAgWDNSjASeYHfe4GKk3z0wpXkob2AYsXZAupL7nBIwlFvgj27i+6UU17Cj48BNeM1O7xNp2FKaOoL5UJ4I/dpR1TLSkZ3iK0rHsUN+2E/xAf52jKLGNKs7ind7sO8WilbFGg/+C+wpKKLp1N3Cs9MJaRwwCVQQrVu2FUE7IQo4FSzNObFmEaED0mUtAyUJmG4n40pSvGeYDvZDZZYEPGZ/JhISaD0MPOMMCPT0b21E/qe1YvBP2wmXUQxM0slGfiwwhHjUL+5wxSiIoQGEKm7OimmPKELBvELOlGD/vvJfcHtcsa2KfV0tnp1P61hGO2gXlZGNTtAZukR11EAUPaE39IE+My+Z96z5JRNrNjPNbKOZyea/ACJRsq8=sha1_base64="hP+6LrUf2d3tZaldqaQQvEKMXyw=">AAAB2XicbZDNSgMxFIXv1L86Vq1rN8EiuCozbnQpuHFZwbZCO5RM5k4bmskMyR2hDH0BF25EfC93vo3pz0JbDwQ+zknIvSculLQUBN9ebWd3b/+gfugfNfzjk9Nmo2fz0gjsilzl5jnmFpXU2CVJCp8LgzyLFfbj6f0i77+gsTLXTzQrMMr4WMtUCk7O6oyaraAdLMW2IVxDC9YaNb+GSS7KDDUJxa0dhEFBUcUNSaFw7g9LiwUXUz7GgUPNM7RRtRxzzi6dk7A0N+5oYkv394uKZ9bOstjdzDhN7Ga2MP/LBiWlt1EldVESarH6KC0Vo5wtdmaJNChIzRxwYaSblYkJN1yQa8Z3HYSbG29D77odBu3wMYA6nMMFXEEIN3AHD9CBLghI4BXevYn35n2suqp569LO4I+8zx84xIo4sha1_base64="ZOZy2ChzUQ7GgzxZINKin2EFaXg=">AAACR3icbVFNSwMxFHyt3/WrevUSLKKill0R9FIQvHhTwbZCty7ZNNumZrNL8laoy/5GQbz4Q7x4UNNWQasDIcPMG5JMgkQKg47zUihOTc/Mzs0vlBaXlldWy2tLDROnmvE6i2WsbwJquBSK11Gg5DeJ5jQKJG8Gd2dDv3nPtRGxusZBwtsR7SoRCkbRSn5ZPPhZGG7npEY8k0Z+Jmpufpspq27nuSd5iDtjo187OHKsdeDm5IKxUWxf7Pdzske+J4ZRO/Tb97To9nDXL1ecqjMC+UvcL1KBL1z65SevE7M04gqZpMa0XCfBdkY1CiZ5XvJSwxPK7miXtyxVNOKmnY0qycmWVTokjLVdCslI/ZnIaGTMIArsZESxZya9ofif10oxPGlnQiUpcsXGB4WpJBiTYb+kIzRnKAeWUKaFvSthPaopQ/sLJVuCO/nkv6RxWHWdqnvlwDxswCbsgAvHcArncAl1YPAIr/AOH4XnwluxMK7re4d1+IXi1CeD5bHxsha1_base64="1LjyKLhDWU+7nEPQnSYPmJ3IcV4=">AAACUnicbVJNS8MwGM7m15xTpx69BIeouI1WBL0Mhl68OcE5YZ0lzdItW5qW5K0wS3+jIF78IV48qNmH4NcLgYfngyRP4kWCa7Csl0x2bn5hcSm3nF8prK6tFzc2b3QYK8qaNBShuvWIZoJL1gQOgt1GipHAE6zlDc/HeuueKc1DeQ2jiHUC0pPc55SAodwif3AT399LcQ07Og7chNfs9C6Rht1LU0cwH/anwqBWObaMVLFTfEnpJFbm5UGKD/GXYxw1pp+6o3ivDwdusWRVrcngv8CegRKaTcMtPjndkMYBk0AF0bptWxF0EqKAU8HSvBNrFhE6JD3WNlCSgOlOMqkkxbuG6WI/VGZJwBP2eyIhgdajwDPOgEBf/9bG5H9aOwb/tJNwGcXAJJ1u5McCQ4jH/eIuV4yCGBlAqOLmrJj2iSIUzCvkTQn27yv/BTdHVduq2ldWqX42qyOHttEO2kc2OkF1dIEaqIkoekSv6B19ZJ4zb1nzS6bWbGaW2UI/Jlv4BCERsqs= X X X @ zff0 A co-localization(f,f0)= zff zff 673 f 0 f 0 0 674 P P 675 Where, nf is the number of cross-link sites for factor f, and Occff´,i,j is the occupancy of f at th 676 position j around the i cross-link site from factor f´ (Occff´,i,j = 0 if no verified cross-link 677 sites exist). To improve signal-to-noise, we compute from the resulting matrix of co- 678 localizations between all RNA-processing factors Cf,f´, the matrix of Pearson correlations 679 between the rows of Cf,f´, (Figure 5B). 680 681 Codon-enrichment analysis 682 To search for possible links between translation efficiency and RNA degradation, we 683 checked if some degradation factors preferentially bind to translationally efficient/non- 684 efficient transcripts. To do so we adapted the proposed normalized translation efficiency 685 scale (Pechmann and Frydman, 2013). The authors generate a normalized optimality 686 score for codons that incorporates the competition between supply and demand of tRNAs. 687 The coding region for each transcript was extracted according to ORFs annotated by 688 SGD. The codon optimality score was averaged over the whole reading frame (Figure 689 6A, more detailed explanation in the next section). 690 We then checked whether mRNAs that bind to each factor are enriched or depleted 691 in some codons compared to all mRNAs. To achieve this, we defined the following score 692 for codon enrichment that represents deviations from average frequencies in all mRNAs, 693

23

T occ(t) t=1 T Fc,t occ(t0) ⇥ ! codon enrichment = t0=1 P 1 T P t=1 Fc,t

694 AAACeHicbVFda9swFJW9ry77aLo99kUsK0lgBLsMNhiFssHYYwdJW4gzI1/LiagsGem6EIx+w/7b3vZD9rKnyY4Ha7sLgsM550r3HmWVFBaj6GcQ3rv/4OGjvceDJ0+fPd8fHrw4t7o2wBegpTaXGbNcCsUXKFDyy8pwVmaSX2RXn1r94pobK7Sa47biq5KtlSgEMPRUOvwOOtcq+cCVEbApuUJ6QpO8MAyaxNZl2uBJ7L7NaSJ5gZNe0QATnLq/jnFraeaOdvx46miCouSWfk4beIMuMWK96fy79th5843be6MbpMNRNIu6ondB3IMR6essHf5Icg11OzlIZu0yjipcNcygAMndIKktrxhcsTVfeqiYn2vVdME5euSZnBba+OM379h/OxpWWrstM+8sGW7sba0l/6ctayzerxqhqhq5gt1DRS0patr+As2F4YBy6wEDI/ysFDbMZ4P+r9oQ4tsr3wXnx7M4msVf345OP/Zx7JFD8opMSEzekVPyhZyRBQHyKzgMXgdHwe+QhuNwurOGQd/zktyo8PgPuuLATQ== T 695 P 696 Here T is the number of mRNA transcripts, ��,� is the fraction of the codon c in transcript

697 t, and ���(�) is the total occupancy of the factor on transcript t. 90% confidence intervals 698 were generated by bootstrapping: we sampled with replacement 1000 times the same 699 number of mRNAs from the total set as in total, and for each set we recalculated the 700 codon enrichment score. We colored the bars based on the previously ranked optimality 701 of codons (Pechmann and Frydman, 2013) (Figure 6B, Figure 6—figure supplement 702 2-8). 703 704 Relating occupancies to various transcript features 705 We analyzed the correlation of the occupancy of all factors with transcript length, codon 706 enrichment of the transcript, expression level, transcript stability, and polyA tail length. 707 For expression, we used an RNA-seq experiment of wild-type yeast (SRA: SRX532381) 708 (Baejen et al., 2017) and mapped the reads to mRNAs. We present the average number 709 of reads per base as an estimate for . For half-life calculations, we used 710 published yeast 4tU-seq (GEO: GSM2199309) and RNA-seq experiments (SRA: 711 SRX532381) (Baejen et al., 2017). Transcript half-life is estimated with an optimized 712 method that will be published elsewhere (Hofmann et al., unpublished). 713 Since there are only few transcripts with very low or very high half-life, codon 714 optimality, and expression (Figure 6—figure supplement 1), we performed the analysis 715 on a subset of mRNAs where the transcript property lies between the 5% and 95% 716 quantiles. We then compared the total occupancy of degradation factors on each mRNA 717 relative to such transcript features (Figure 6A, 7B, and Figure 6—figure supplement 2- 718 8). We show 95% confidence intervals generated by bootstrapping mRNAs in gray shade. 719 We checked whether such correlations originate from the feature of interest or 720 merely shows up due to correlations between this feature and others (Figure 6—figure 721 supplement 1). We used a multivariate linear regression to model total occupancy as a 722 linear function of these four features: 723 occupancy0(t) length + optimality + expression + halflife 724 ⇠ 725 726 In cases where the correlation is a direct effect from our feature of interest, we 727 expect to lose significantly on our prediction when this variable is taken out of the 24

728 equation. Therefore, we use p-values representing the importance of each feature in this 729 linear regression as a score representing the significance of its contribution in explaining 730 the final occupancies. Occupancy correlated strongly with transcript length, which 731 dominated as explanatory variable in this regression, trivially because most factors bind 732 along the entire transcript. To eliminate this trivial dependency, we used occupancy per 733 nucleotide, denoted ���������′, as the target variable in our regression (Figure 6C). 734 735 Motif enrichment analysis 736 To find sequence preferences for binding events of degradation factors, we counted 4- 737 mers in a window of [±5 nt] intervals around high confidence cross-link sites of PAR-CLIP 738 experiments. Based on this count table, the enrichment score for each 4-mer was 739 calculated using the following formula, 740 n4-mer,i +1 enrichment(4-mer,i)= 4 N P4-mer[j] 741 AAACUnicbVJBSxtBFJ6ktWpqa2yPXoaGgqINuxLQSyDUS08SwaiQ3S6zs2/N6OzsMvNWGob5jYXipT+klx6qk5iD0T4Y+Pi+93jffDNpJYXBIPjdaL56vfJmdW299Xbj3fvN9taHc1PWmsOIl7LUlykzIIWCEQqUcFlpYEUq4SK9OZ7pF7egjSjVGU4riAt2pUQuOENPJW0BSgs+KUDhToTwA23vSwHa7Ytd2qdRlmvGrUrssuboHg2dPaERigIMjSpdZom97ofuu+05OlweGF/HziXtTtAN5kVfgnABOmRRw6T9K8pKXs+sccmMGYdBhbFlGgWX4FpRbaBi/IZdwdhDxbyT2M4jcfSzZzKal9ofhXTOPp2wrDBmWqS+s2A4Mc+1Gfk/bVxjfhRboaoaQfHHRXktKZZ0li/NhAaOcuoB41p4r5RPmE8R/Su0fAjh8yu/BOcH3TDohqe9zuDrIo41sk0+kR0SkkMyIN/IkIwIJz/JH/KP3DfuGn+b/pc8tjYbi5mPZKmaGw85mLST ⇥ j=1 742 Q 743 Here N is the number of cross-link sites below the cut-off p-value (we used a maximum

744 of 5000 cross link sites), �4���, � is the number of observed 4-mers at position i in the set 745 of binding sequences aligned at their cross-link site i=0, 4-mer[j] is the base at the j’th 746 position of the 4-mer, and Pb is the probability of observing base b. We used the 747 probabilities: PA = PT = 0.31 and PC = PG = 0.19 based on frequencies in yeast genome 748 and corrected for the T bias at the cross-link site (Figure 4—figure supplement 1).

749

750 Acknowledgments:

751 We would like to thank Helmut Blum (LAFUGA, LMU Munich), Stefan Krebs (LAFUGA, 752 LMU Munich), Kerstin Maier and Petra Rus (Cramer laboratory) for sequencing. We thank 753 Gabriel Villamil and Bjoern Schwalb (Cramer laboratory) for sharing RNA half-life 754 calculations prior to publication. PC was funded by the Advanced Grant 755 TRANSREGULON of the European Research Council and by the Volkswagen 756 Foundation. This work was supported by the DFG SPP1935 grant CR 117/6-1.

757

758 Competing interests:

759 The authors declare that no competing interests exist.

760

761 Accession code: 25

762 The data discussed in this publication have been deposited in NCBI's Gene Expression 763 Omnibus (Edgar et al., 2002) and are accessible through GEO Series accession number 764 GSE128312. 765

26

766 References: 767 768 Allmang C, Petfalski E, Podtelejnikov A, Mann M, Tollervey D, Mitchell P. 1999. The yeast 769 exosome and human PM-Scl are related complexes of 3’ --> 5’ exonucleases. Genes 770 Dev 13:2148–2158. 771 Anderson JS, Parker RP. 1998. The 3’ to 5’ degradation of yeast mRNAs is a general 772 mechanism for mRNA turnover that requires the SKI2 DEVH box protein and 3’ to 773 5’ exonucleases of the exosome complex. EMBO J 17:1497–1506. 774 doi:10.1093/emboj/17.5.1497 775 Araki Y, Takahashi S, Kobayashi T, Kajiho H, Hoshino SI, Katada T. 2001. Ski7p G 776 protein interacts with the exosome and the ski complex for 3′-to-5′ mRNA decay in 777 yeast. EMBO J 20:4684–4693. doi:10.1093/emboj/20.17.4684 778 Baejen C, Andreani J, Torkler P, Battaglia S, Schwalb B, Lidschreiber M, Maier KC, 779 Boltendahl A, Rus P, Esslinger S, Söding J, Cramer P. 2017. Genome-wide Analysis 780 of RNA Polymerase II Termination at Protein-Coding Genes. Mol Cell 38–49. 781 doi:10.1016/j.molcel.2017.02.009 782 Baejen C, Torkler P, Gressel S, Essig K, Söding J, Cramer P. 2014. Transcriptome maps 783 of mRNP biogenesis factors define pre-mRNA recognition. Mol Cell 745–757. 784 doi:10.1016/j.molcel.2014.08.005 785 Baker KE, Parker R. 2004. Nonsense-mediated mRNA decay: terminating erroneous 786 gene expression. Curr Opin Cell Biol 16:293–299. doi:10.1016/j.ceb.2004.03.003 787 Battaglia S, Lidschreiber M, Baejen C, Torkler P, Vos SM, Cramer P. 2017. RNA- 788 dependent association of transcription elongation factors and Pol II CTD 789 kinases 1–26. doi:10.7554/eLife.25637 790 Beilharz TH, Preiss T. 2007. Widespread use of poly(A) tail length control to accentuate 791 expression of the yeast transcriptome. RNA 13:982–997. doi:10.1261/.569407 792 Boeck R, Tarun SJ, Rieger M, Deardorff JA, Muller-Auer S, Sachs AB. 1996. The yeast 793 Pan2 protein is required for poly(A)-binding protein-stimulated poly(A)-nuclease 794 activity. J Biol Chem 271:432–438. 795 Bonneau F, Basquin J, Ebert J, Lorentzen E, Conti E. 2009. The yeast exosome functions 796 as a macromolecular cage to channel RNA substrates for degradation. Cell 139:547– 797 559. doi:10.1016/j.cell.2009.08.042 798 Brown CE, Sachs AB. 1998. Poly(A) tail length control in Saccharomyces cerevisiae 799 occurs by message-specific deadenylation. Mol Cell Biol 18:6548–6559. 800 Brown JT, Bai X, Johnson AW. 2000. The yeast antiviral proteins Ski2p, Ski3p, and Ski8p 801 exist as a complex in vivo. Rna 6:449–457. doi:10.1017/S1355838200991787 802 Cao D, Parker R. 2003. Computational modeling and experimental analysis of nonsense- 803 mediated decay in yeast. Cell 113:533–545.

27

804 Caponigro G, Parker R. 1995. Multiple functions for the poly(A)-binding protein in mRNA 805 decapping and deadenylation in yeast. Genes Dev 9:2421–2432. 806 Celik A, Baker R, He F, Jacobson A. 2017. High-resolution profiling of NMD targets in 807 yeast reveals translational fidelity as a basis for substrate selection. Rna 23:735– 808 748. doi:10.1261/rna.060541.116 809 Chakrabarti S, Jayachandran U, Bonneau F, Fiorini F, Basquin C, Domcke S, Le Hir H, 810 Conti E. 2011. Molecular mechanisms for the RNA-dependent ATPase activity of 811 Upf1 and its regulation by Upf2. Mol Cell 41:693–703. 812 doi:10.1016/j.molcel.2011.02.010 813 Chowdhury A, Mukhopadhyay J, Tharun S. 2007. The decapping activator Lsm1p-7p- 814 Pat1p complex has the intrinsic ability to distinguish between oligoadenylated and 815 polyadenylated RNAs. RNA 13:998–1016. doi:10.1261/rna.502507 816 Delan-Forino C, Schneider C, Tollervey D. 2017. Transcriptome-wide analysis of 817 alternative routes for RNA substrates into the exosome complex. PLoS Genet 818 13:e1006699. doi:10.1371/journal.pgen.1006699 819 Doma MK, Parker R. 2007. RNA quality control in . Cell 131:660–668. 820 doi:10.1016/j.cell.2007.10.041 821 Finoux A-L, Seraphin B. 2006. In vivo targeting of the yeast Pop2 deadenylase subunit to 822 reporter transcripts induces their rapid degradation and generates new decay 823 intermediates. J Biol Chem 281:25940–25947. doi:10.1074/jbc.M600132200 824 Franks TM, Singh G, Lykke-Andersen J. 2010. Upf1 ATPase-dependent mRNP 825 disassembly is required for completion of nonsense- mediated mRNA decay. Cell 826 143:938–950. doi:10.1016/j.cell.2010.11.043 827 Gallie DR. 1991. The cap and poly(A) tail function synergistically to regulate mRNA 828 translational efficiency. Genes Dev 5:2108–2116. doi:10.1101/gad.5.11.2108 829 Geisberg J V, Moqtaderi Z, Fan X, Ozsolak F, Struhl K. 2014. Global analysis of mRNA 830 isoform half-lives reveals stabilizing and destabilizing elements in yeast. Cell 831 156:812–824. doi:10.1016/j.cell.2013.12.026 832 Goldstrohm AC, Hook BA, Seay DJ, Wickens M. 2006. PUF proteins bind Pop2p to 833 regulate messenger RNAs. Nat Struct Mol Biol 13:533–539. doi:10.1038/nsmb1100 834 Grzechnik P, Kufel J. 2008. Linked to Transcription Termination Directs 835 the Processing of snoRNA Precursors in Yeast. Mol Cell 32:247–258. 836 doi:10.1016/j.molcel.2008.10.003 837 Heck AM, Wilusz J. 2018. The Interplay between the RNA Decay and Translation 838 Machinery in Eukaryotes. Cold Spring Harb Perspect Biol a032839. 839 doi:10.1101/cshperspect.a032839 840 Heo D, Yoo I, Kong J, Lidschreiber M, Mayer A, Choi B-Y, Hahn Y, Cramer P, Buratowski 841 S, Kim M. 2013. The RNA polymerase II C-terminal domain-interacting domain of

28

842 yeast Nrd1 contributes to the choice of termination pathway and couples to RNA 843 processing by the nuclear exosome. J Biol Chem 288:36676–36690. 844 doi:10.1074/jbc.M113.508267 845 Houseley J, LaCava J, Tollervey D. 2006. RNA-quality control by the exosome. Nat Rev 846 Mol Cell Biol 7:529–539. doi:10.1038/nrm1964 847 Houseley J, Tollervey D. 2009. The Many Pathways of RNA Degradation. Cell 136:763– 848 776. doi:10.1016/j.cell.2009.01.019 849 Huch S, Nissan T. 2014. Interrelations between translation and general mRNA 850 degradation in yeast. Wiley Interdiscip Rev RNA 5:747–763. doi:10.1002/wrna.1244 851 Jiang H, Lei R, Ding S-W, Zhu S. 2014. Skewer: a fast and accurate adapter trimmer for 852 next-generation sequencing paired-end reads. BMC Bioinformatics 15:182. 853 doi:10.1186/1471-2105-15-182 854 Jiao X, Xiang S, Oh C, Martin CE, Tong L, Kiledjian M. 2010. Identification of a quality- 855 control mechanism for mRNA 5’-end capping. Nature 467:608–611. 856 Kruk JA, Dutta A, Fu J, Gilmour DS, Reese JC. 2011. The multifunctional Ccr4-Not 857 complex directly promotes transcription elongation. Genes Dev 25:581–593. 858 doi:10.1101/gad.2020911 859 Lebreton A, Seraphin B. 2008. Exosome-mediated quality control: substrate recruitment 860 and molecular activity. Biochim Biophys Acta 1779:558–565. 861 doi:10.1016/j.bbagrm.2008.02.003 862 Losson R, Lacroute F. 1979. Interference of nonsense mutations with eukaryotic 863 messenger RNA stability. Proc Natl Acad Sci U S A 76:5134–5137. 864 Lykke-Andersen S, Brodersen DE, Jensen TH. 2009. Origins and activities of the 865 eukaryotic exosome. J Cell Sci 122:1487–1494. doi:10.1242/jcs.047399 866 Marquardt S, Hazelbaker DZ, Buratowski S. 2011. Distinct RNA degradation pathways 867 and 3’ extensions of yeast non-coding RNA species. Transcription 2:145–154. 868 doi:10.4161/trns.2.3.16298 869 Miller C, Schwalb B, Maier K, Schulz D, Dumcke S, Zacher B, Mayer A, Sydow J, 870 Marcinowski L, Dolken L, Martin DE, Tresch A, Cramer P. 2011. Dynamic 871 transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol 872 Syst Biol 7:458. doi:10.1038/msb.2010.112 873 Milligan L, Decourty L, Saveanu C, Rappsilber J, Ceulemans H, Jacquier A, Tollervey D. 874 2008. A Yeast Exosome Cofactor, Mpp6, Functions in RNA Surveillance and in the 875 Degradation of Noncoding RNA Transcripts. Mol Cell Biol 28:5446–5457. 876 doi:10.1128/MCB.00463-08 877 Milligan L, Huynh‐Thu VA, Delan‐Forino C, Tuck A, Petfalski E, Lombraña R, Sanguinetti 878 G, Kudla G, Tollervey D. 2016. Strand‐specific, high‐resolution mapping of modified 879 RNA polymerase II. Mol Syst Biol 12:874. doi:10.15252/msb.20166869

29

880 Mitchell P, Petfalski E, Houalla R, Podtelejnikov A, Mann M, Tollervey D. 2003. Rrp47p 881 is an exosome-associated protein required for the 3’ processing of stable RNAs. Mol 882 Cell Biol 23:6982–6992. 883 Mitchell P, Tollervey D. 2003. An NMD pathway in yeast involving accelerated 884 deadenylation and exosome-mediated 3’-->5’ degradation. Mol Cell 11:1405–1413. 885 Morrissey JP, Deardorff JA, Hebron C, Sachs AB. 1999. Decapping of stabilized, 886 polyadenylated mRNA in yeast pab1 mutants. Yeast 15:687–702. 887 doi:10.1002/(SICI)1097-0061(19990615)15:8<687::AID-YEA412>3.0.CO;2-L 888 Muhlrad D, Parker R. 1994. Premature translational termination triggers mRNA 889 decapping. Nature 370:578–581. doi:10.1038/370578a0 890 Neil H, Malabat C, d’Aubenton-Carafa Y, Xu Z, Steinmetz LM, Jacquier A. 2009. 891 Widespread bidirectional promoters are the major source of cryptic transcripts in 892 yeast. Nature 457:1038–1042. doi:10.1038/nature07747 893 Ogami K, Chen Y, Manley J. 2018. RNA Surveillance by the Nuclear RNA Exosome: 894 Mechanisms and Significance. Non-Coding RNA 4:8. doi:10.3390/ncrna4010008 895 Parker R. 2012. RNA degradation in Saccharomyces cerevisae. Genetics 191:671–702. 896 doi:10.1534/genetics.111.137265 897 Pechmann S, Frydman J. 2013. Evolutionary conservation of codon optimality reveals 898 hidden signatures of cotranslational folding. Nat Struct Mol Biol 20:237–243. 899 doi:10.1038/nsmb.2466 900 Pelechano V, Wei W, Steinmetz LM. 2013. Extensive transcriptional heterogeneity 901 revealed by isoform profiling. Nature 497:127–131. doi:10.1038/nature12121 902 Radhakrishnan A, Chen YH, Martin S, Alhusaini N, Green R, Coller J. 2016. The DEAD- 903 Box Protein Dhh1p Couples mRNA Decay and Translation by Monitoring Codon 904 Optimality. Cell 167:122-132.e9. doi:10.1016/j.cell.2016.08.053 905 Roth C, Torkler P. 2018. soedinglab/mockinbird: Degradation PAR-CLIP. 906 doi:10.5281/ZENODO.1342555 907 Schaeffer D, Tsanova B, Barbas A, Reis FP, Dastidar EG, Sanchez-Rotunno M, Arraiano 908 CM, van Hoof A. 2009. The exosome contains domains with specific 909 endoribonuclease, and cytoplasmic mRNA decay activities. Nat 910 Struct Mol Biol 16:56–62. doi:10.1038/nsmb.1528 911 Schmid M, Jensen TH. 2018. Controlling nuclear RNA levels. Nat Rev Genet 19:518– 912 529. doi:10.1038/s41576-018-0013-2 913 Schneider C, Kudla G, Wlotzka W, Tuck A, Tollervey D. 2012. Transcriptome-wide 914 Analysis of Exosome Targets. Mol Cell 48:422–433. 915 doi:10.1016/j.molcel.2012.08.013 916 Schneider C, Tollervey D. 2013. Threading the barrel of the RNA exosome. Trends 917 Biochem Sci 38:485–493. doi:10.1016/j.tibs.2013.06.013 30

918 Schulz D, Schwalb B, Kiesel A, Baejen C, Torkler P, Gagneur J, Soeding J, Cramer P. 919 2013. XTranscriptome surveillance by selective termination of noncoding RNA 920 synthesis. Cell 155:1075–1087. doi:10.1016/j.cell.2013.10.024 921 Semotok JL, Cooperstock RL, Pinder BD, Vari HK, Lipshitz HD, Smibert CA. 2005. 922 Smaug recruits the CCR4/POP2/NOT deadenylase complex to trigger maternal 923 transcript localization in the early Drosophila embryo. Curr Biol 15:284–294. 924 doi:10.1016/j.cub.2005.01.048 925 Smith JE, Alvarez-Dominguez JR, Kline N, Huynh NJ, Geisler S, Hu W, Coller J, Baker 926 KE. 2014. Translation of small open reading frames within unannotated RNA 927 transcripts in Saccharomyces cerevisiae. Cell Rep 7:1858–1866. 928 doi:10.1016/j.celrep.2014.05.023 929 Smith T, Heger A, Sudbery I. 2017. UMI-tools: modeling sequencing errors in Unique 930 Molecular Identifiers to improve quantification accuracy. Genome Res 27:491–499. 931 doi:10.1101/gr.209601.116 932 Stevens A, Poole TL. 1995. 5’-exonuclease-2 of Saccharomyces cerevisiae. Purification 933 and features of activity with comparison to 5’-exonuclease-1. J Biol 934 Chem 270:16063–16069. 935 Sun M, Schwalb B, Pirkl N, Maier KC, Schenk A, Failmezger H, Tresch A, Cramer P. 936 2013. Global analysis of Eukaryotic mRNA degradation reveals Xrn1-dependent 937 buffering of transcript levels. Mol Cell 52:52–62. doi:10.1016/j.molcel.2013.09.010 938 Synowsky SA, van Wijk M, Raijmakers R, Heck AJR. 2009. Comparative multiplexed 939 mass spectrometric analyses of endogenously expressed yeast nuclear and 940 cytoplasmic exosomes. J Mol Biol 385:1300–1313. doi:10.1016/j.jmb.2008.11.011 941 Tharun S, Parker R. 2001. Targeting an mRNA for decapping: displacement of translation 942 factors and association of the Lsm1p-7p complex on deadenylated yeast mRNAs. 943 Mol Cell 8:1075–1083. 944 Thompson DM, Parker R. 2007. Cytoplasmic decay of intergenic transcripts in 945 Saccharomyces cerevisiae. Mol Cell Biol 27:92–101. doi:10.1128/MCB.01023-06 946 Tuck AC, Tollervey D. 2013. A transcriptome-wide atlas of RNP composition reveals 947 diverse classes of mRNAs and lncRNAs. Cell 154:996–1009. 948 doi:10.1016/j.cell.2013.07.047 949 Tucker M, Valencia-Sanchez MA, Staples RR, Chen J, Denis CL, Parker R. 2001. The 950 transcription factor associated Ccr4 and Caf1 proteins are components of the major 951 cytoplasmic mRNA deadenylase in Saccharomyces cerevisiae. Cell 104:377–386. 952 doi:10.1016/S0092-8674(01)00225-2 953 Tudek A, Porrua O, Kabzinski T, Lidschreiber M, Kubicek K, Fortova A, Lacroute F, 954 Vanacova S, Cramer P, Stefl R, Libri D. 2014. Molecular basis for coordinating 955 transcription termination with noncoding RNA degradation. Mol Cell 55:467–481. 956 doi:10.1016/j.molcel.2014.05.031

31

957 Turowski TW, Tollervey D. 2015. Cotranscriptional events in eukaryotic ribosome 958 synthesis. Wiley Interdiscip Rev RNA 6:129–139. doi:10.1002/wrna.1263 959 Van Der Maaten LJP, Hinton GE. 2008. Visualizing high-dimensional data using t-sne. J 960 Mach Learn Res 9:2579–2605. doi:10.1007/s10479-011-0841-3 961 van Hoof A, Frischmeyer PA, Dietz HC, Parker R. 2002. Exosome-mediated recognition 962 and degradation of mRNAs lacking a termination codon. Science 295:2262–2264. 963 doi:10.1126/science.1067272 964 Vaňáčová Š, Wolf J, Martin G, Blank D, Dettwiler S, Friedlein A, Langen H, Keith G, Keller 965 W. 2005. A New Yeast Poly(A) Polymerase Complex Involved in RNA Quality 966 Control. PLOS Biol 3:e189. 967 Vasiljeva L, Buratowski S. 2006. Nrd1 interacts with the nuclear exosome for 3’ 968 processing of RNA polymerase II transcripts. Mol Cell 21:239–248. 969 doi:10.1016/j.molcel.2005.11.028 970 Vicens Q, Kieft JS, Rissland OS. 2018. Revisiting the Closed-Loop Model and the Nature 971 of mRNA 5′–3′ Communication. Mol Cell 72:805–812. 972 doi:https://doi.org/10.1016/j.molcel.2018.10.047 973 Wang L, Lewis MS, Johnson AW. 2005. Domain interactions within the Ski2/3/8 complex 974 and between the Ski complex and Ski7p. RNA 11:1291–1302. 975 doi:10.1261/rna.2060405 976 Webster MW, Chen Y, Stowell JAW, Graveley BR, Coller J, Passmore LA, Webster MW, 977 Chen Y, Stowell JAW, Alhusaini N, Sweet T, Graveley BR. 2018. mRNA 978 Deadenylation Is Coupled to Translation Rates by the Differential Activities of Ccr4- 979 Not Nucleases Article mRNA Deadenylation Is Coupled to Translation Rates by the 980 Differential Activities of Ccr4-Not Nucleases. Mol Cell 70:1089-1100.e8. 981 doi:10.1016/j.molcel.2018.05.033 982 Wells SE, Hillner PE, Vale RD, Sachs AB. 1998. Circularization of mRNA by eukaryotic 983 translation initiation factors. Mol Cell 2:135–140. doi:10.1016/S1097-2765(00)80122- 984 7 985 Wolf J, Passmore LA. 2014. Deadenylation by Pan2 / Pan3. Biochem Soc Trans 42:184– 986 187. doi:10.1042/BST20130211.mRNA 987 Wyers F, Rougemaille M, Badis G, Rousselle JC, Dufour ME, Boulay J, Régnault B, 988 Devaux F, Namane A, Séraphin B, Libri D, Jacquier A. 2005. Cryptic Pol II transcripts 989 are degraded by a nuclear quality control pathway involving a new poly(A) 990 polymerase. Cell 121:725–737. doi:10.1016/j.cell.2005.04.030 991 Zinder JC, Lima CD. 2017. Targeting RNA for processing or destruction by the eukaryotic 992 RNA exosome and its cofactors. Genes Dev 31:88–100. 993 doi:10.1101/gad.294769.116 994 995 32

996 Figures and supplementary files: 997 Figure 1. Overview of PAR-CLIP experiments performed in this study. 998 (A) Overview of degradation pathways studied. 999 (B) Number of high-confidence PAR-CLIP cross-link sites obtained for each factor after 1000 merging data from replicates. 1001

1002 1003

33

1004 Figure 2. Distribution of degradation factor cross-link sites over the yeast 1005 transcriptome. 1006 (A) Fractions of high confidence PAR-CLIP sequencing reads of 30 yeast degradation 1007 factors fall into various transcript classes. Depicted classes are the following: messenger 1008 RNA (mRNA) in turquoise (n=4,928), ribosomal RNA (rRNA) in antique pink (n=24), 1009 transfer RNA (tRNA) in dark blue (n=299), small nucleolar RNA (snoRNA) in yellow 1010 (n=77), small nuclear RNA (snRNA) in green (n=6), stable unannotated transcripts 1011 (SUTs) in red (n=318), cryptic unstable transcripts (CUTs) in light brown (n=637), Nrd1- 1012 unterminated transcripts (NUTs) in dark brown (n=298)(Methods). 1013 (B) Enrichment z-scores of high confidence PAR-CLIP cross-link sites of 30 yeast 1014 degradation factors (rows) in various segments of mRNA transcripts (left columns; UTR: 1015 untranslated region; intron; CDS: coding sequence), or other transcript classes as in A 1016 (other columns). The color-coded enrichment score shows the column and row 1017 normalized enrichment values of binding preferences of each factor for each transcript 1018 class (color encoded, depleted in blue and enriched in red). The coefficient of variation 1019 on top is the standard deviation divided by the mean for each transcript class.

34

AB

mRNA mRNA rRNA 2 tRNA snoRNA

snRNA cientof

SUT ffi variation

CUT coe NUT 0 others SUT CUT CDS NUT tRNA rRNA intron 3' UTR 5' UTR 0 100% snRNA snoRNA

Ccr4 deadenylation Pop2 Not1 Caf40 Pan2 Pan3

Dcp2 decapping Dcp1 Edc2 Edc3 Dhh1

Xrn1

Rrp6 exosome Csl4 Rrp40 Rrp4 Rrp44

Air1

Trf5 TRAMP Mtr4 Air2 Trf4 ski complex ski Ski2 Ski3 Ski7 Ski8

Upf1 NMD NMD Upf2 Upf3 Nmd4

en richm en t score enrichment score

2 1 0 1 2 1020 35

1021 Figure 3. Metagene analysis of degradation factor binding on mRNAs. 1022 Averaged occupancy profiles of degradation factors over mRNAs aligned around their 1023 transcription start site (TSS) (n=3,193, left) and around their poly-adenylation (pA) site 1024 (n=3,193, right) in a window of [±700 nt]. Regions that have neighboring transcripts on 1025 the same strand were removed to avoid contaminating profiles (Methods). Factors are 1026 grouped according to their functional role; from top to bottom: deadenylation, decapping, 1027 Xrn1, exosome, TRAMP complex, Ski complex, or NMD. The color code shows the 1028 average occupancy normalized between the minimum (blue) and maximum (red) values 1029 per profile.

1030

36

1031 Figure 4. Surveillance of aberrant nuclear antisense RNAs by the exosome and the 1032 TRAMP4 complex. 1033 Averaged occupancy profiles of degradation factors binding to transcripts antisense of 1034 mRNAs aligned around transcription start site (TSS) (n=3,076, left) and around their poly- 1035 adenylation (pA) site (n=2,705, right) in a window of [±700 nt]. Regions with annotated 1036 genes on the antisense strand are removed to avoid contaminating the profiles (Methods). 1037 The color code shows the average occupancy normalized between the minimum (blue) 1038 and maximum (yellow) values per profile. On top, previously published PAR-CLIP profiles 1039 for Nrd1 and Nab3 are included for comparison (Schulz et al., 2013).

1040 1041

37

1042 Figure 5. Global co-occupancy and co-localization analysis reveals unexpected 1043 cooperation between factors from different complexes and pathways. 1044 (A) Matrix of pairwise correlation coefficients of factor occupancies evaluated over all 1045 transcripts. (B) Matrix of co-localization based on the enrichment of factor x binding within 1046 40 nt of the cross-link site of factor x´ (Methods). (C) Two-dimensional embedding of the 1047 co-occupancies in (A) analyzed for 74 RNA processing factors with tSNE, including 30 1048 factors from this study (highlighted in bold), and 44 factors from previous studies (Baejen 1049 et al., 2017, 2014; Battaglia et al., 2017; Schulz et al., 2013) (Supplementary File 1). 1050 Factors that are plotted in close proximity show a preference for binding to the same 1051 transcripts. Clusters present factors involved in RNA synthesis (1), splicing (2), 1052 3´ processing (3), deadenylation (4), decapping (5), nuclear ncRNA processing (6), and 1053 surveillance (7).

38

1054

39

1055 Figure 6. Binding preferences reveal a link between decapping-mediated 1056 degradation and translation. 1057 (A) Total occupancy per mRNA (according to TIF-seq annotation) for six factors as a 1058 function of the average mRNA codon optimality (transcript optimality). The occupancy of 1059 factors from the 5´→3´ degradation machinery (decapping and Xrn1, left) decreases with 1060 increasing transcript optimality, whereas the occupancy of factors from the 3´→5´ 1061 degradation machinery (Ccr4 and Caf40 deadenylation complex subunits and exosome 1062 subunit Rrp44, right) increases with increasing average codon optimality. (Gray shading: 1063 95% confidence intervals generated by bootstrapping mRNAs). (B) Codon enrichment in 1064 transcripts bound by Dcp2 and Ccr4 compared to the average frequency over all mRNAs. 1065 The bar colors represent codon optimality, with highly optimal codons shown in dark red. 1066 (Thin gray lines: 90% confidence intervals generated by bootstrapping coding 1067 sequences.) (C) Significance of correlations between the binding strength of degradation 1068 factors and transcript length, transcript optimality (Pechmann and Frydman, 2013), 1069 expression level (Baejen et al., 2017), and half-life derived by multivariate linear 1070 regression analysis (Methods). Bars are separated according to the direction of 1071 correlation with positive correlation marked by a red background and negative correlation 1072 marked by a blue background.

40

1073 1074

41

1075 Figure 7. Location and recruitment of the decapping complex Dcp1/Dcp2 and 1076 decapping enhancers Edc3, Dhh1, and Edc2. 1077 (A) Smoothed, transcript-averaged PAR-CLIP occupancy profiles aligned at TSS and pA 1078 sites [±750 nt] of unstable and stable transcripts (first and fourth quantile of half-life 1079 distribution, respectively). (B) Dependence of total occupancy of factors on the transcripts 1080 half-life. The fitting function is plotted in red and the fitted value for b is marked with a 1081 dashed gray line. (Gray shade: 95% confidence intervals generated by bootstrapping 1082 transcripts). (C) Sequence binding preference for the catalytically active subunit of 1083 decapping complex (Dcp2), illustrated with the 5 most enriched and the 3 most depleted 1084 4-mers. The color code shows the log2 enrichment factor of 4-mers around PAR-CLIP 1085 cross-link sites [±8 nt]. Dark red represents strong enrichment and dark blue shows strong 1086 depletion of a 4-mer. Infeasible combinations are shown with gray. The most highly 1087 enriched field is binding AAAAU with the cross-link at the U, which is enriched over 1088 random expectation approximately 22.3 = 5-fold.

42

A B

unstable transcripts fitted function: a stable transcripts occ = + b t 1 -450 TSS +450 -450 pA +450 2

a/b = 18.7 (a.u.) Dcp2 occupancy b 0

a/b = 7.4 Dcp1 (a.u.)

occupancy b 0

a/b = 4.4 Edc2 (a.u.)

occupancy b 0

a/b = 9.1 Edc3 (a.u.)

occupancy b 0

a/b = 4.7 Dhh1 (a.u.) b occupancy 0 -450 TSS +450 -450 pA +450 20 40 transcript half-life C Dcp2 2.4 AAAA CAAA 1.6 GAAA 0.8 GCCA CCAA 0.0

CCTA 2 0.8

ACGT log (enrichment) 1.6 CTAG -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 1089

43

1090 Supplementary File 1. Overview of RNA processing factors and their respective 1091 PAR-CLIP experiments used in this study.

High confidence Pathway Factor Details Source Accession number cross-link sites :ArrayExpress Nrd1 187164 Schulz et al., 2013 E-MTAB-1766 Surveillance Nrd1/Nab3 complex :ArrayExpress Nab3 91762 Schulz et al., 2013 E-MTAB-1766 Cap binding Cbc2 95530 Baejen et al., 2014 GEO: GSE59676 complex Bur1 232237 Battaglia et al., 2017 GEO: GSE81822 BUR kinase complex Bur2 169544 Battaglia et al., 2017 GEO: GSE81822 Ctk1 257668 Battaglia et al., 2017 GEO: GSE81822 CTDK-I coomplex Ctk2 209835 Battaglia et al., 2017 GEO: GSE81822 Cdc73 234224 Battaglia et al., 2017 GEO: GSE81822 Ctr9 5361 Battaglia et al., 2017 GEO: GSE81822 Leo1 PAF1 complex 87202 Battaglia et al., 2017 GEO: GSE81822 Elongation Paf1 72871 Battaglia et al., 2017 GEO: GSE81822 Rtf1 52570 Battaglia et al., 2017 GEO: GSE81822 Set1 321575 Battaglia et al., 2017 GEO: GSE81822 methyl Set2 90276 Battaglia et al., 2017 GEO: GSE81822 transferase Dot1 88966 Battaglia et al., 2017 GEO: GSE81822 Spt5 DSIF 193823 Baejen et al. ,2017 GEO: GSE79222 Spt6 Nucleosome remodeling 204781 Battaglia et al., 2017 GEO: GSE81822 Rpb1 RNA polymerase 153593 Baejen et al. ,2017 GEO: GSE79222 Ist3 U2 snRNP 3899 Baejen et al., 2014 GEO: GSE59676 Nam8 7097 Baejen et al., 2014 GEO: GSE59676 Mud1 8512 Baejen et al., 2014 GEO: GSE59676 U1 snRNP Splicing Snp1 6773 Baejen et al., 2014 GEO: GSE59676 Luc7 33471 Baejen et al., 2014 GEO: GSE59676 Mud2 76190 Baejen et al., 2014 GEO: GSE59676 BBP/ U2AF65 Msl5 81742 Baejen et al., 2014 GEO: GSE59676 Pab1 Poly(A) 33422 Baejen et al., 2014 GEO: GSE59676 Pub1 Poly(U) 72015 Baejen et al., 2014 GEO: GSE59676 Rna15 CFIA 92792 Baejen et al., 2014 GEO: GSE59676 3' processing Mpe1 3743 Baejen et al., 2014 GEO: GSE59676 Cft2 CPF 32758 Baejen et al., 2014 GEO: GSE59676 Yth1 6506 Baejen et al., 2014 GEO: GSE59676 Rat1 Exoribonuclease 122259 Baejen et al. ,2017 GEO: GSE79222 Rai1 23092 Baejen et al. ,2017 GEO: GSE79222 Termination Rtt103 51954 Baejen et al. ,2017 GEO: GSE79222 Pcf11 1432 Baejen et al. ,2017 GEO: GSE79222 Hpr1 24616 Baejen et al., 2014 GEO: GSE59676 THO Tho2 3282 Baejen et al., 2014 GEO: GSE59676 Gbp2 151941 Baejen et al., 2014 GEO: GSE59676 Hrb1 132503 Baejen et al., 2014 GEO: GSE59676 TREX Export Mex67 10917 Baejen et al., 2014 GEO: GSE59676 Sub2 39981 Baejen et al., 2014 GEO: GSE59676 Yra1 46438 Baejen et al., 2014 GEO: GSE59676 Nab2 Adaptors 99934 Baejen et al., 2014 GEO: GSE59676 Npl3 131428 Baejen et al., 2014 GEO: GSE59676

1092 44

1093 Figure 1—figure supplement 1. Biological replicate PAR-CLIP experiments have 1094 high correlation. 1095 (A) Total transcript occupancy of factors in replicate experiments are plotted (in log2 1096 space) and Spearman correlation values are shown for each pair. Each dot corresponds 1097 to a transcript. The color indicates dot density. (B) Comparison of coverage profiles 1098 obtained from CRAC experiments of Xrn1, Mtr4, Trf4, and Ski2 in S. cerevisiae (Tuck and 1099 Tollervey, 2013) with occupancy profiles from our PAR-CLIP experiments highlights 1100 reproducibility of transcriptome profiles across different methods. These profiles show the 1101 averaged binding of degradation factors over mRNAs (sense strand: left and anti-sense 1102 strand: right) in a window of [±700 nt] around their transcription start site (TSS) and their 1103 poly-adenylation (pA) site in a window of [±700 nt]. Regions that have neighboring 1104 transcripts on the same strand were removed to avoid contaminating profiles (Methods). 1105

45

A

r = 0.90 r = 0.92 r = 0.93 r = 0.94 r = 0.94 r = 0.93 0 0 0 0 0 0

5 5 5 5 5 5

10 10 10 10 10 10 Ccr4 Rep2 Not1 Rep2 Pan2 Rep2 Pan3 Rep2 Pop2 Rep2 Caf40 Rep2

15 15 15 15 15 15

10 0 15 10 5 0 15 10 5 0 15 10 5 0 15 10 5 0 15 10 5 0 Ccr4 Rep1 Pop2 Rep1 Not1 Rep1 Caf40 Rep1 Pan2 Rep1 Pan3 Rep1

5 r = 0.96 r = 0.95 r = 0.94 r = 0.93 r = 0.91 r = 0.90 0 0 0 0 0 0

5 5 5 5 5 5

10 10 10 Xrn1 Rep2 10 Edc2 Rep2 Edc3 Rep2 10 Dcp2 Rep2 Dcp1 Rep2 10 Dhh1 Rep2

15 15 15 15 15 15

10 0 15 10 5 0 10 0 15 10 5 0 10 0 15 10 5 0 Dcp2 Rep1 Dcp1 Rep1 Edc2 Rep1 Edc3 Rep1 Dhh1 Rep1 Xrn1 Rep1

r = 0.89 r = 0.87 r = 0.94 r = 0.91 r = 0.93 r = 0.91 0 0 0 0 0 0

5 5 5 5 5 5

10 10

10 Air1 Rep2 10 Csl4 Rep2 10

Rrp6 Rep2 Rrp4 Rep2 10 Rrp40 Rep2 Rrp44 Rep2

15 15 15 15 15 15

15 10 5 0 15 10 5 0 15 10 5 0 10 0 15 10 5 0 15 10 5 0 Rrp6 Rep1 Csl4 Rep1 Rrp40 Rep1 Rrp4 Rep1 Rrp44 Rep1 Air1 Rep1

r = 0.98 r = 1.00 r = 0.91 r = 0.92 5 r = 0.96 r = 0.96 0 0 0 0 0 0 5 5 5 5 5 5

10 Trf5 Rep2 Trf4 Rep2 Air2 Rep2 Ski2 Rep2 Ski3 Rep2 Mtr4 Rep2 10 10 10 10 10

15 15 15 15 15 15

15 10 5 0 15 10 5 0 10 0 15 10 5 0 15 10 5 0 10 0 Trf5 Rep1 Mtr4 Rep1 Air2 Rep1 Trf4 Rep1 Ski2 Rep1 Ski3 Rep1

r = 0.94 r = 0.93 5 r = 0.98 r = 0.95 r = 0.97 r = 0.93 0 5 5 0 0 0 0 0 5 5 5 5 5 5 10 Ski7 Rep2 Ski8 Rep2 Upf1 Rep2 10 Upf2 Rep2 Upf3 Rep2 10 10 10 10 Nmd4 Rep2

15 15 15 15 15 15

10 0 15 10 5 0 10 0 10 0 10 0 15 10 5 0 Ski7 Rep1 Ski8 Rep1 Upf1 Rep1 Upf2 Rep1 Upf3 Rep1 Nmd4 Rep1

B

sense strand anti-sense strand

Xrn1 (CRAC) Xrn1(CRAC) Xrn1 (PAR-CLIP) Xrn1 (PAR-CLIP)

Mtr4 (CRAC) Mtr4 (CRAC) Mtr4 (PAR-CLIP) Mtr4 (PAR-CLIP)

Trf4 (CRAC) Trf4 (CRAC) Trf4 (PAR-CLIP) Trf4 (PAR-CLIP)

Ski2 (CRAC) Ski2 (CRAC) Ski2 (PAR-CLIP) Ski2 (PAR-CLIP) -700 bp TSS gene body pA +700 bp -700 bp TSS gene body pA +700 bp

occupancy occupancy low high low high 1106 1107

46

1108 Figure 1—figure supplement 2. Western Blot analysis for all degradation factors 1109 analyzed in this study show IP efficiency. 1110 IP using the TAP-tag is detected by Western Blot analysis with tag specific antibody to 1111 show the IP quality of the different PAR-CLIP experiments representative for one replicate 1112 per factor. The factors are sorted according to their complexes: deadenylation (Ccr4, 1113 Pop2, Not1, Caf40, Pan3, and Pan3), decapping (Dcp1, Dcp2, Edc2, Edc3, and Dhh1), 1114 5´→3´ exonuclease (Xrn1), exosome (Rrp6, Csl4, Rrp40, Rrp4, and Rrp44), TRAMP 1115 (Air1, Trf5, Mtr4, Air2, and Trf4), Ski (Ski2, Ski3, Ski7, and Ski8), and NMD (Upf1, Upf2, 1116 Upf3, and Nmd4). The molecular weight including the weight of the TAP tag (in kDa) is 1117 indicated for each factor. The band at ~25 kDa is caused by cross-reactivity of the light 1118 chain of the used antibodies for IP and Western Blot. A shift to higher molecular weight 1119 than indicated can be caused by UV-crosslinking of proteins to RNA. 1120

47

Ccr4 Pop2 Not1 Caf40 Pan2 Pan3 Dcp1 Dcp2 Edc2 Edc3 Dhh1 Xrn1 (114) (70) (260) (61) (147) (96) (46) (128) (36) (81) (77) (195) MW kDa 140- 115- 80- 65-

55 -

40 - 30 - 25 -

Rrp6 Csl4 Rrp40 Rrp4 Rrp44 Air1 Trf5 Mtr4 Air2 Trf4 MW (104) (51) (46) (59) (133) (61) (94) (142) (59) (86) kDa 140 - 115 - 80- 65-

50-

40 - 30 - 25 -

Ski2 Ski3 Ski7 Ski8 Upf1 Upf2 Upf3 Nmd4 MW (166) (183) (104) (64) (129) (146) (64) (45) kDa 140- 115- 80- 65-

50-

40 - 30- 25 -

1121 48

1122 Figure 2—figure supplement 1. Different transcript classes have comparable U- 1123 content. 1124 Fraction of U over all bases in transcript classes studied in Figure 1 (untranslated region 1125 (UTR); intron; coding sequence (CDS), ribosomal RNA (rRNA), transfer RNA (tRNA), 1126 small nucleolar RNA (snoRNA), small nuclear RNA (snRNA), stable unannotated 1127 transcripts (SUTs), cryptic unstable transcripts (CUTs), Nrd1- unterminated transcripts 1128 (NUTs)). 1129

0.35 0.30 0.25 0.20 0.15 U fraction fraction U 0.10 0.05 0.00 CDS SUTs tRNA CUTs rRNA NUTs intron 5' UTR 5' UTR 3' snRNA

1130 snoRNA

49

1131 Figure 2—figure supplement 2. Metagene profiles for subunits of the TRAMP 1132 complexes on snoRNA genes. 1133 Transcript averaged PAR-CLIP occupancy profiles are shown for Air1, Trf5, Mtr4, Air2, 1134 and Trf4. snoRNA genes are aligned either at their 5´ end or at their 3´ end (n=77). 1135 Occupancy profiles are shown over the range of ±35 nt.

5' snoRNA 3'

-35 5'-end +35 -35 3'-end +35 Air1 Trf5 Mtr4 Air2 Trf4

-35 5'-end +35 -35 3'-end +35 1136

50

1137 Figure 3—figure supplement 1. Metagene profiles of yeast RNA degradation factors 1138 centered on translation start and stop sites in comparison to TIF-annotated TSS 1139 and pA sites. 1140 Transcript-averaged PAR-CLIP occupancy profiles are shown for RNA degradation 1141 factors involved in (A) deadenylation, (B) decapping, (C) 5´→3´ exonuclease Xrn1, (D) 1142 exosome, (E) TRAMP, (F) Ski, and (G) NMD. Transcripts are aligned either at transcript 1143 start site (TSS) and poly-adenylation (pA) site (marked with blue) or at their start and stop 1144 codons (marked with green). TIF-seq based annotation is shown in blue (n=3,193 for TSS 1145 and pA site profiles) (Pelechano et al., 2013). Open reading frames (ORF) annotated in 1146 the SGD (version 64.2.1) are shown in green (n=4,012 for TSS, and n=3,965 for pA site 1147 selected transcripts). To avoid contaminating signals from neighboring genes, we filtered 1148 out regions that had annotations upstream and downstream of the centered gene (up to 1149 700 nt) (Methods). Shaded areas (in blue TIF-seq annotation, or in green for ORF 1150 annotation) depict 95% confidence intervals derived from bootstrapping genes. 1151 Comparison between these two profiles highlights preferences for end binding 1152 degradation factors in binding to untranslated regions at the two sides of the transcript. 1153

51

TIF annotation A ORF annotation B D TSS pA TSS pA TSS pA -200 AUG +200 -200 stop +200 -200 AUG +200 -200 stop +200 -200 AUG +200 -200 stop +200 Ccr4 Rrp6 Dcp2 Csl4 Pop2 Dcp1 Not1 Edc2 Rrp40 Rrp4 Edc3 Caf40 Pan2 Dhh1 Rrp44

-200 TSS +200 -200 pA +200 -200 TSS +200 -200 pA +200 AUG stop AUG stop

Pan3 C TSS pA -200 AUG +200 -200 stop +200 -200 TSS +200 -200 pA +200 AUG stop Xrn1

-200 TSS +200 -200 pA +200 AUG stop E F G

TSS pA TSS pA TSS pA -200 AUG +200 -200 stop +200 -200 AUG +200 -200 stop +200 -200 AUG +200 -200 stop +200 Air1 Ski2 Upf1 Trf5 Ski3 Upf2 Ski7 Mtr4 Upf3 Air2 Ski8 Nmd4

-200 TSS +200 -200 pA +200 -200 TSS +200 -200 pA +200 AUG stop AUG stop Trf4

-200 TSS +200 -200 pA +200 AUG stop 1154 52

1155 Figure 3—figure supplement 2. Comparison of binding profiles on genes 1156 containing annotated upstream sense NUTs with all mRNAs. 1157 (A) Binding enrichment of degradation factors around the TSS of genes with an upstream 1158 sense NUT. Enrichment is defined as the ratio of the average occupancy in the interval 1159 [±300 nt] of the TSS on these genes that contain an upstream NUT (n=459) (Schulz et 1160 al., 2013) divided by the average occupancy on all genes. (B) Transcript-averaged PAR- 1161 CLIP occupancy profiles for all mRNAs (black) is compared to patterns derived from 1162 genes with upstream sense NUTs (blue). Transcripts were aligned at their TSS and 1163 averaged over the interval of [±600 nt]. We compared Nrd1 and Nab3 profiles, known to 1164 process NUTs, with subunits of the TRAMP complex. 95% confidence intervals obtained 1165 from bootstrapping genes are shown with gray and blue shades. 1166

1167 1168

53

1169 Figure 3—figure supplement 3. Metagene analysis of degradation factor binding on 1170 mRNAs after removing signals from known NUTs and CUTs. 1171 Cross-link sites were filtered to exclude regions that were previously annotated as NUTs 1172 and CUTs (Neil et al., 2009; Schulz et al., 2013). Averaged occupancy profiles of 1173 degradation factors are then shown over mRNAs aligned around their transcription start 1174 site (TSS) (n=3,193, left) and around their poly-adenylation (pA) site (n=3,193, right) in a 1175 window of [±700 nt]. Regions that have neighboring transcripts on the same strand were 1176 removed to avoid contaminating profiles (Methods). Factors are grouped according to 1177 their functional role; from top to bottom: deadenylation, decapping, Xrn1, exosome, 1178 TRAMP complex, Ski complex, or NMD. The color code shows the average occupancy 1179 normalized between the minimum (blue) and maximum (red) values per profile. 1180

54

1181 1182

55

1183 Figure 4—figure supplement 1. Motif enrichment analysis shows enrichment of 1184 Nrd1/Nab3 motifs for the TRAMP4 and the exosome complex. 1185 Motif analysis was performed for all degradation factors in this study. Nrd1 and Nab3 are 1186 included for comparison (Schulz et al., 2013). Occurrences of Nrd1 motif (GTAG) and 1187 Nab3 motif (CTTG) are highlighted with red. The color code shows the log2 enrichment 1188 factor of top 5 enriched and top 3 depleted 4-mers around PAR-CLIP cross-link sites 1189 [±8 nt]. Dark red represents strong enrichment and dark blue shows strong depletion of a 1190 4-mer. Infeasible combinations are shown with gray.

56

Nrd1 Nab3 Ccr4 Pop2 4.5 GTAG CTTG CTGC ATAT 2 4.5 3.0 GTAA TCTT CCGC TATA 3.0 1 TGTA 3.0 CTTC GCTG GGCG 1.5 CTTG 1.5 CTCT 1.5 TGCT CACA 0 TTGT GTAG CGCT CCAC 0.0 CGAA 0.0 ATTA 0.0 CCTA TAGG 1 CTAG 1.5 TTTA CGAC AAGT 1.5 1.5 2 TGAG TTAA TTAG CTAG 3.0 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Not1 Caf40 Pan2 Pan3 2.4 2 GGCG GGCG GGCG 1.6 GCCA 1.6 CACA 1.6 TATA CCAA CCAA 1 0.8 TGGC ATAT GCCA 0.8 AAAA 0.8 CCAC CCAC 0 CCGC ATAT 0.0 0.0 GGCC 0.0 CTGC GTGG GCAG 0.8 TAGG GAGT 1 CCTA TAGG 0.8 0.8 TAAG TGAG TAGG CGAC 1.6 2 AGTC 1.6 TAGG TTAG 1.6 CGTA 2.4 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Dcp2 Dcp1 Edc2 Edc3 2.4 AAAA CCAA GGCA GCCA 1.6 1.6 1.6 CAAA 1.6 GCCA GCCA CCAA 0.8 0.8 GAAA 0.8 CAAA 0.8 GCCG GCAA GCCA GCAA CCAA 0.0 GCGG 0.0 0.0 CCAA 0.0 GGCA CGCC GCAG 0.8 0.8 CCTA 0.8 TAGG 0.8 TTAG TAAG ACGT TTAG TAGT 1.6 CGTA 1.6 1.6 CTAG CGAC 1.6 TAAG TTAA 2.4 2.4 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Dhh1 Xrn1 Rrp6 Csl4 2.4 2 2.4 GCGG GCCA 1.6 GGCG GGCG GATG GCAG GCGG 1.6 GGCC 1.6 1 GGTG CCGC 0.8 GCGC TGGC 0.8 0.8 CCGC 0 GGCC 0.0 GTGG CTTG GGCG GGCA GGCC 0.0 GTGG 0.8 0.0 TAAG 1 TTAG ATTA CTAA 0.8 TTAG AGTC CTAA CTAG 0.8 2 1.6 TTAA TAGG TTTA 1.6 ATTA 2.4 1.6 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Rrp40 Rrp4 Rrp44 Air1 GGCG 2 CTTG 3 CTTG 3.0 GGCG CTTG GGCG GTAG GCCA 1.6 1 2 TGGC GTAG GGCG 1.5 GTGG 0.8 GGCC 0 TGGC 1 GCGC GCGC GTGG GTGG GTGG 0.0 GGCC 0.0 0 CCTA 1 CTAA CCTA TAAG 0.8 CTAA ATTA 1 ATTA 1.5 TAAC 2 ATTA ACGA TACG AGTC 2 1.6 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Trf5 Mtr4 Air2 Trf4 2 GGCG 2.4 CTTG CTTG CTTG 2 2.4 TGGC GTAG TGGC TGGC 1.6 1.6 1 GGCC TCTT 1 GCAG GGCA GCGC 0.8 CTTC GTAG 0.8 GTGG 0 0 GTGG TGGC TCTT GCAG 0.0 0.0 TAGG CCTA 1 ATTA CTAA 1 TTAG 0.8 ATTA TACG 0.8 ATTA 2 TAAG ACGA 2 ACTA CCTA 1.6 1.6 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Ski2 Ski3 Ski7 Ski8 GCCA GCCA 2 GCCA 2 GCCA 2 2 GCAA GGCG GGCG GGCG 1 1 1 GGCC 1 TGGC CCAA CCAC 0 CAAA CCAA GCAG 0 GGCC 0 0 AAAA CACA GGCC GTGG 1 AGTC 1 TTAG 1 TTAG 1 TAGG TAGG TAAG TAGG TAAG 2 2 2 TTAG AGTC 2 CGTA CTAG 3 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5

Upf1 Upf2 Upf3 Nmd4 3 2 GCCA GCCA GCAG GCCA 1.6 1.6 2 GCTG GGCC GGCG GGCA 1 0.8 0.8 CTGC GCTG GGCA 1 GGCG GGCC CTGC GCCA GCGC 0 0.0 0.0 TGGC CCGC GCAC 0 TGGC 0.8 1 TTAG TAGG 0.8 TAAG 1 CTTA TAAG 1.6 GAGT CTTA TTAA 1.6 2 2 TTAA AGTT TTAA AGAC 2.4 1191 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 -8 -7 -6 -5 -4 -3 -2 -1 U 1 2 3 4 5 1192

57

1193 Figure 4—figure supplement 2. The aberrant nuclear ncRNAs bound by 1194 components of the exosome and the TRAMP4 complex are primarily NUTs and 1195 CUTs. 1196 Cross-link sites were filtered to exclude regions that were previously annotated as NUTs 1197 and CUTs (Neil et al., 2009; Schulz et al., 2013). Averaged occupancy profiles of 1198 degradation factors are shown on transcripts antisense of mRNAs aligned around the 1199 transcription start site (TSS) (n=3,076, left) and around their poly-adenylation (pA) site 1200 (n=2,705, right) in a window of [±700 nt]. Regions with annotated genes on the antisense 1201 strand are removed to avoid contaminating the profiles (Methods). The color code shows 1202 the average occupancy normalized between the minimum (blue) and maximum (yellow) 1203 values per profile. On top, previously published PAR-CLIP profiles for Nrd1 and Nab3 are 1204 included for comparison (Schulz et al., 2013). 1205

58

1206 1207

59

1208 Figure 5—figure supplement 1. Co-occupancy for 74 RNA processing factors. 1209 Matrix of pairwise correlation coefficients of factor occupancies evaluated over all 1210 transcripts. Analysis for 74 RNA processing factors, including 30 factors from this study, 1211 and 44 factors from previous studies (Baejen et al., 2017, 2014; Battaglia et al., 2017; 1212 Schulz et al., 2013) (see Supplementary File 1). Dark blue represents high correlation in 1213 binding across all transcripts. Factors are sorted and color coded (left and upper border) 1214 according to their general function.

surveillance Nrd1 cap binding complex Nab3 Cbc2 elongation factor Bur1 splicing Bur2 Ctk1 3' processing Ctk2 termination Cdc73 Leo1 export Paf1 deadenylation Rtf1 decapping Set1 Set2 Xrn1 Dot1 exosome and associated factors Spt5 Spt6 NMD Rpb1 Ctr9 1.0 Ist3 Nam8 0.8 Mud1 0.6 Snp1 0.4 Luc7 Mud2 0.2 Msl5 0.0 Pab1 Pub1 Rna15 Mpe1 Cft2 Yth1 Pcf11 Rat1 Rai1 Rtt103 Hpr1 Tho2 Gbp2 Hrb1 Mex67 Sub2 Yra1 Nab2 Npl3 Ccr4 Pop2 Not1 Caf40 Pan2 Pan3 Dcp2 Dcp1 Edc2 Edc3 Dhh1 Xrn1 Rrp6 Csl4 Rrp40 Rrp4 Rrp44 Air1 Trf5 Mtr4 Air2 Trf4 Ski2 Ski3 Ski7 Ski8 Upf1 Upf2 Upf3 Nmd4 Ist3 Air1 Trf5 Air2 Trf4 Rtf1 Cft2 Ski2 Ski3 Ski7 Ski8 Csl4 Ctr9 Paf1 Rai1 Yra1 Yth1 Set1 Set2 Ccr4 Msl5 Npl3 Spt5 Spt6 Mtr4 Ctk1 Ctk2 Rat1 Upf1 Upf2 Upf3 Xrn1 Bur1 Bur2 Luc7 Rrp6 Rrp4 Not1 Dot1 Leo1 Nrd1 Hpr1 Hrb1 Edc2 Edc3 Pop2 Pan2 Pan3 Pab1 Tho2 Pub1 Cbc2 Snp1 Sub2 Dcp2 Dcp1 Rpb1 Nab3 Nab2 Dhh1 Gbp2 Mpe1 Mud1 Mud2 Pcf11 Caf40 Nam8 Nmd4 Rrp40 Rrp44 Cdc73 Rna15 Mex67 Rtt103 1215 1216

60

1217 Figure 5—figure supplement 2. Co-localization coefficients for all 74 RNA 1218 processing factors. 1219 Pairwise correlation between normalized co-localization profiles of factors in a window of 1220 40 nt centered at PAR-CLIP cross-link sites. Analysis for 74 RNA processing factors, 1221 including 30 factors from this study, and 44 factors from previous studies (Baejen et al., 1222 2017, 2014; Battaglia et al., 2017; Schulz et al., 2013) (see Supplementary File 1). High 1223 co-localization represents binding to the same position on transcripts (marked with dark 1224 red). Factors are sorted and color coded (left and upper border) according to their general 1225 function.

surveillance Nrd1 cap binding complex Nab3 Cbc2 elongation factor Bur1 splicing Bur2 Ctk1 3' processing Ctk2 termination Cdc73 Leo1 export Paf1 deadenylation Rtf1 decapping Set1 Set2 Xrn1 Dot1 exosome and associated factors Spt5 Spt6 NMD Rpb1 Ctr9 Ist3 0.8 Nam8 0.4 Mud1 0.0 Snp1 Luc7 0.4 Mud2 0.8 Msl5 Pab1 Pub1 Rna15 Mpe1 Yth1 Cft2 Rat1 Pcf11 Rai1 Rtt103 Hpr1 Tho2 Gbp2 Mex67 Nab2 Npl3 Hrb1 Sub2 Yra1 Ccr4 Pop2 Not1 Caf40 Pan2 Pan3 Dcp2 Dcp1 Edc2 Edc3 Dhh1 Xrn1 Rrp6 Csl4 Rrp40 Rrp4 Rrp44 Air1 Trf5 Mtr4 Air2 Trf4 Ski2 Ski3 Ski7 Ski8 Upf1 Upf2 Upf3 Nmd4 Ist3 Air1 Trf5 Air2 Trf4 Rtf1 Cft2 Ski2 Ski3 Ski7 Ski8 Csl4 Ctr9 Paf1 Rai1 Yra1 Yth1 Set1 Set2 Ccr4 Msl5 Npl3 Spt5 Spt6 Mtr4 Ctk1 Ctk2 Rat1 Upf1 Upf2 Upf3 Xrn1 Bur1 Bur2 Luc7 Rrp6 Rrp4 Not1 Dot1 Leo1 Nrd1 Hpr1 Hrb1 Edc2 Edc3 Pop2 Pan2 Pan3 Pab1 Tho2 Pub1 Cbc2 Snp1 Sub2 Dcp2 Dcp1 Rpb1 Nab3 Nab2 Dhh1 Gbp2 Mpe1 Mud1 Mud2 Pcf11 Caf40 Nam8 Nmd4 Rrp40 Rrp44 Cdc73 Rna15 Mex67 Rtt103 1226 1227

61

1228 Figure 5—figure supplement 3. Two-dimensional embedding of co-localization 1229 between 74 RNA processing factors. 1230 tSNE plot visualizes similarities in co-localization profiles between RNA processing 1231 factors (factors are color-coded based on their function). Analysis for 74 RNA processing 1232 factors, including 30 factors from this study (marked in bold), and 44 factors from previous 1233 studies (Baejen et al., 2017, 2014; Battaglia et al., 2017; Schulz et al., 2013) (see 1234 Supplementary File 1).

surveillance Set1 cap binding complex Rat1 Spt5 elongation factor splicing Rpb1 3' processing Rna15 Gbp2 termination export deadenylation Npl3 decapping Nab2 Nab3 Tho2 5' 3' exonuclease Nrd1 Ctr9 Dcp2 exosome and associated factors Air2 Mtr4 Ski2 Mex67 NMD

Rrp44 Upf3 Trf4 Ski3 Rrp4 Upf2

Luc7 Rrp40 Upf1 Mud2 Snp1

tSNE component 2 Edc3 Msl5 Mud1 Rrp6 Csl4 Dcp1 Edc2 Ist3 Rtt103 Nmd4 Xrn1 Nam8 Dot1 Mpe1 Trf5 Yth1 Dhh1 Rai1 Paf1 Ccr4 Ctk1 Ski8 Leo1 Pub1 Ctk2 Rtf1 Cbc2 Bur1 Pcf11 Pab1 Ski7 Cdc73 Air1 Pan3 Bur2 Caf40 Hrb1 Cft2 Pan2 Hpr1 Pop2 Spt6 Set2 Sub2 Not1 Yra1 tSNE component 1 1235 1236

62

1237 Figure 6—figure supplement 1. Distributions of transcript length, half-life, 1238 expression level and transcript optimality for yeast mRNAs. 1239 Histograms on the diagonal show distributions of length, half-life (Methods), expression 1240 level (Baejen et al., 2017) and transcript optimality (Pechmann and Frydman, 2013). 1241 Pairwise comparisons of features are shown as scatter plots (top right) and kernel density 1242 estimates (KDEs) of bivariate densities are shown in the bottom with Pearson correlation 1243 values (r) (Methods).

5000

4000

3000 length 2000

1000

0

30 r = -0.04

20

half-life 10

0

20 r = -0.02 r = 0.40

15

10

expression 5

0

0.30 r = -0.06 r = 0.34 r = 0.33

0.25

0.20 transcript optimality transcript 0.15 0 2000 4000 0 10 20 30 0 10 20 0.15 0.20 0.25 0.30 length half-life expression transcript optimality 1244 1245

63

1246 Figure 6—figure supplement 2. Occupancies of deadenylation factors (Ccr4, Pop2, 1247 Not1, Caf40, Pan2, and Pan3) compared to transcript length, optimality, expression 1248 level, and half-life. 1249 (A) To understand binding specificity of deadenylation factors, the total occupancy of 1250 each factor on a transcript is plotted against various transcript features (Gray shading: 1251 95% confidence intervals generated by bootstrapping transcripts). (B) Same analysis as 1252 in Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts 1253 bound by a degradation factor compared to each codon’s frequency on all coding 1254 sequences. Each bar is colored according to its codon-optimality with highly optimal 1255 codons in dark red and highly non-optimal codons in dark blue. (Gray lines: 90% 1256 confidence intervals generated by bootstrapping coding sequences).

codon optimality A B low high

Ccr4 Pop2

0.16 0.10 Ccr4 (a.u.)

occupancy r = 0.45 0.00 0 0.00

-0.23 -0.22 (a.u.) Pop2 log2(codon enrichment) log2(codon enrichment) occupancy TTT TTT TAT TTA ATT TAT TTA ATT CTT TTC TCT CTT TTC TCT ATA AAT ATA AAT TCA CTA CAT ACT TAC ATC TCA CAT CTA ACT TAC ATC TGT GTT TTG TGT GTT TTG CTC CCT TCC CTC CCT TCC AAA AAA ACA CAA AAC ACA CAA AAC GTA AGT ATG GAT GTA AGT GAT ATG ACC CCA CAC ACC CCA CAC TGC TCG CTG CGT GTC GCT TGC TCG CTG CGT GTC GCT CCC CCC GAA AGA AAG GAA AGA AAG CGA AGC GCA ACG CAG GAC CGA AGC ACG GCA CAG GAC TGG GTG GGT TGG GTG GGT CGC CCG GCC CGC CCG GCC GGA AGG GAG GGA AGG GAG CGG GCG GGC r = 0.55 CGG GCG GGC GGG GGG 0

Not1 Caf40

0.10 0.13 (a.u.) Not1

occupancy r = 0.58 0.00 0.00 0

-0.22 -0.28 (a.u.) Caf40 log2(codon enrichment) log2(codon enrichment)

occupancy r = 0.48 TTT TTT TAT TTA ATT CTT TTC TCT TAT TTA ATT CTT TTC TCT ATA AAT TCA CTA CAT ACT TAC ATC TGT GTT TTG CTC TCC CCT ATA AAT TCA CTA CAT ACT TAC ATC TGT GTT TTG AAA CTC CCT TCC ACA CAA AAC GTA AGT ATG GAT ACC CAC CCA TGC TCG CTG CGT GCT GTC CCC AAA ACA CAA AAC GTA AGT ATG GAT ACC CCA CAC TGC TCG CTG CGT GTC GCT GAA AGA AAG CCC CGA AGC GCA ACG CAG GAC TGG GTG GGT CGC CCG GCC GAA AGA AAG CGA AGC ACG GCA CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG CGG GCG GGC GGA AGG GAG CGG GCG GGC GGG 0.0 GGG

Pan2 Pan3

0.11 (a.u.) Pan2 0.11

occupancy r = 0.54 0.00 0.0 0.00

-0.25 -0.20 (a.u.) Pan3 log2(codon enrichment) log2(codon enrichment)

occupancy r = 0.33 TTT TTT TAT TTA ATT TAT TTA ATT CTT TTC TCT CTT TTC TCT ATA AAT ATA AAT TCA CAT CTA ACT ATC TAC TCA CAT CTA ACT ATC TAC TGT GTT TTG TGT GTT TTG CTC TCC CCT CTC CCT TCC AAA AAA ACA CAA AAC ACA AAC CAA GTA AGT ATG GAT GTA AGT GAT ATG ACC CAC CCA CAC ACC CCA TGC TCG CTG CGT GTC GCT TGC TCG CTG CGT GTC GCT CCC CCC AGA GAA AAG AGA GAA AAG CGA AGC ACG GCA CAG GAC CGA AGC GCA ACG CAG GAC TGG GTG GGT TGG GTG GGT CGC CCG GCC CGC CCG GCC GGA AGG GAG GGA AGG GAG CGG GCG GGC CGG GCG GGC 0.0 GGG GGG 100020003000 0.200 0.225 5 10 25 50 length optimality expression half-life 61 codons (sorted by enrichment) 61 codons (sorted by enrichment)

1257 1258

64

1259 Figure 6—figure supplement 3. Occupancies of decapping factors (Dcp2, Dcp1, 1260 Edc2, Edc3, and Dhh1) compared to transcript length, optimality, expression level, 1261 and half-life. 1262 (A) To understand binding specificity of decapping factors, the total occupancy of each 1263 factor on a transcript is plotted against various transcript features (Gray shading: 95% 1264 confidence intervals generated by bootstrapping transcripts). (B) Same analysis as in 1265 Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts bound 1266 by a degradation factor compared to each codon’s frequency on all coding sequences. 1267 Each bar is colored according to its codon-optimality with highly optimal codons in dark 1268 red and highly non-optimal codons in dark blue. (Gray lines: 90% confidence intervals 1269 generated by bootstrapping coding sequences).

codon optimality A B low high

Dcp2 Dcp1

0.11 0.06

0.00 (a.u.) Dcp2 0.00

occupancy r = 0.61 0 -0.10

-0.15 log2(codon enrichment) log2(codon enrichment) TTT TTT ATT TAT TTA TAT TTA ATT TTC TCT CTT CTT TTC TCT ATA AAT ATA AAT ATC TAC ACT CTA CAT TCA TCA ATC CTA CAT ACT TAC GTT TGT TTG TGT GTT TTG TCC CTC CCT CTC TCC CCT AAA AAA AAC CAA ACA ACA AAC CAA ATG GTA GAT AGT ATG GTA AGT GAT ACC CCA CAC CCA CAC ACC GTC GCT CGT TGC CTG TCG TGC GCT GTC CGT CTG TCG CCC CCC AAG AGA GAA AGA GAA AAG GAC GCA AGC ACG CAG CGA CGA GCA ACG AGC GAC CAG GGT TGG GTG TGG GGT GTG GCC CGC CCG CGC GCC CCG GGA GAG AGG GGA AGG GAG GGC CGG GCG CGG GCG GGC GGG GGG (a.u.) Dcp1 Edc2 Edc3 occupancy r = 0.51 0 0.07 0.08

0.00 0.00 (a.u.) Edc2

occupancy r = 0.71 0 -0.15 -0.08 log2(codon enrichment) log2(codon enrichment) TTT TTT TAT TTA ATT ATT TTA TAT CTT TTC TCT CTT TTC TCT ATA AAT ATA AAT TCA CTA CAT ATC TAC ACT ATC CTA TAC ACT TCA CAT TGT GTT TTG TGT GTT TTG CTC TCC CCT CTC TCC CCT AAA AAA ACA CAA AAC ACA CAA AAC GTA ATG AGT GAT ATG GTA GAT AGT ACC CAC CCA ACC CCA CAC TGC CTG CGT TCG GTC GCT CGT TGC GTC GCT CTG TCG CCC CCC AGA AAG GAA AGA AAG GAA CGA AGC ACG GCA CAG GAC CGA GCA GAC ACG AGC CAG TGG GTG GGT TGG GGT GTG CGC CCG GCC CGC GCC CCG GGA AGG GAG GGA AGG GAG CGG GCG GGC CGG GCG GGC GGG GGG (a.u.) Edc3 Dhh1 61 codons (sorted by enrichment)

occupancy r = 0.56 0 0.07

0.00 (a.u.) Dhh1

occupancy r = 0.60 -0.11 0

1000 3000 0.200 0.225 5 10 25 50 log2(codon enrichment) TTT TAT TTA ATT CTT TTC TCT ATA AAT TCA CAT TAC ACT ATC CTA TGT GTT TTG CTC TCC CCT AAA ACA CAA AAC ATG GTA AGT GAT ACC CCA CAC TGC CGT GTC GCT TCG CTG CCC AGA AAG GAA CGA AGC GCA ACG CAG GAC TGG GGT GTG CGC GCC CCG GGA AGG GAG CGG GCG GGC length optimality expression half-life GGG 61 codons (sorted by enrichment) 1270 1271

65

1272 Figure 5—figure supplement 4. Occupancy of Xrn1 compared to transcript length, 1273 optimality, expression level, and half-life. 1274 (A) To understand binding specificity of Xrn1 on various mRNAs, the total occupancy of 1275 Xrn1 on a transcript is plotted against various transcript features (Gray shading: 95% 1276 confidence intervals generated by bootstrapping transcripts). (B) Same analysis as in 1277 Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts bound 1278 by a degradation factor compared to each codon’s frequency on all coding sequences. 1279 Each bar is colored according to its codon-optimality with highly optimal codons in dark 1280 red and highly non-optimal codons in dark blue. (Gray lines: 90% confidence intervals 1281 generated by bootstrapping coding sequences).

A (a.u.) Xrn1

occupancy r = 0.71 0 0 0 0 100020003000 0.200 0.225 5 10 25 50 length optimality expression half-life

codon optimality low high B

0.06

0.00

-0.12 log2(codon enrichment) TTT TAT TTA ATT CTT TTC TCT ATA AAT TCA CTA CAT ATC TAC ACT TGT GTT TTG CTC TCC CCT AAA ACA CAA AAC GTA ATG AGT GAT ACC CCA CAC TGC CGT GTC GCT TCG CTG CCC AGA AAG GAA CGA AGC GCA ACG CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG 61 codons (sorted by enrichment)

1282 1283

66

1284 Figure 6—figure supplement 5. Occupancies of exosome components (Rrp6, Csl4, 1285 Rrp40, Rrp4, and Rrp44) compared to transcript length, optimality, expression 1286 level, and half-life. 1287 (A) To understand binding specificity of exosome components, the total occupancy of 1288 each factor on a transcript is plotted against various transcript features (Gray shading: 1289 95% confidence intervals generated by bootstrapping transcripts). (B) Same analysis as 1290 in Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts 1291 bound by a degradation factor compared to each codon’s frequency on all coding 1292 sequences. Each bar is colored according to its codon-optimality with highly optimal 1293 codons in dark red and highly non-optimal codons in dark blue. (Gray lines: 90% 1294 confidence intervals generated by bootstrapping coding sequences).

A B codon optimality low high

Rrp6 Csl4

0.13 0.18 (a.u.) Rrp6

0.00 occupancy r = 0.50 0.00 0

-0.21 -0.31 log2(codon enrichment) log2(codon enrichment) Csl4 (a.u.) TTT TAT TTA ATT CTT TTC TCT ATA AAT TCA CTA CAT ACT ATC TAC TGT GTT TTG CTC CCT TCC TTT AAA ACA CAA AAC GTA AGT GAT ATG CAC CCA ACC TGC TCG CTG CGT GCT GTC CCC TAT TTA ATT CTT TCT TTC GAA AGA AAG CGA AGC ACG GCA CAG GAC TGG GTG GGT CGC CCG GCC ATA AAT TCA CAT CTA ACT ATC TAC TGT GTT TTG CTC CCT TCC GGA AGG GAG CGG GCG GGC AAA ACA CAA AAC GTA AGT GAT ATG CCA CAC ACC TGC TCG CTG CGT GTC GCT CCC GGG GAA AGA AAG CGA AGC ACG GCA CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG

occupancy r = 0.48 0 Rrp40 Rrp4

0.14 0.12

0.00

(a.u.) 0.00 Rrp40

occupancy r = 0.55 0 -0.15 -0.26 log2(codon enrichment) log2(codon enrichment) TTT TAT TTA ATT CTT TTC TCT TTT ATA AAT TCA CTA CAT ACT ATC TAC TGT GTT TTG CTC CCT TCC TAT TTA ATT AAA CTT TTC TCT ACA CAA AAC GTA AGT GAT ATG CCA ACC CAC TGC CTG TCG CGT GCT GTC CCC ATA AAT TCA CAT CTA ACT TAC ATC TGT GTT TTG GAA AGA AAG CTC CCT TCC CGA AGC GCA ACG CAG GAC TGG GTG GGT CGC CCG GCC AAA ACA CAA AAC GTA AGT GAT ATG CAC CCA ACC TGC TCG CTG CGT GTC GCT GGA AGG GAG CCC CGG GCG GGC GAA AGA AAG CGA AGC ACG GCA CAG GAC TGG GTG GGT CGC CCG GCC GGG GGA AGG GAG CGG GCG GGC GGG (a.u.) Rrp4 Rrp44 61 codons (sorted by enrichment)

occupancy r = 0.50 0 0.0 0.09

0.00 (a.u.) Rrp44 -0.14

occupancy r = 0.53 0 log2(codon enrichment)

1000 3000 0.200 0.225 5 10 25 50 TTT TTA TAT ATT CTT TTC TCT ATA AAT TCA CTA CAT ACT ATC TAC TGT GTT TTG CTC CCT TCC AAA ACA CAA AAC GTA AGT GAT ATG CCA CAC ACC TGC CTG TCG CGT GTC GCT CCC GAA AGA AAG CGA AGC GCA ACG CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG length optimality expression half-life 61 codons (sorted by enrichment) 1295 1296

67

1297 Figure 6—figure supplement 6. Occupancies for components of the TRAMP 1298 complex (Air1, Trf5, Mtr4, Air2, and Trf4) compared to transcript length, optimality, 1299 expression level, and half-life. 1300 (A) To understand binding specificity of TRAMP components, the total occupancy of each 1301 factor on a transcript is plotted against various transcript features (Gray shading: 95% 1302 confidence intervals generated by bootstrapping transcripts). (B) Same analysis as in 1303 Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts bound 1304 by a degradation factor compared to each codon’s frequency on all coding sequences. 1305 Each bar is colored according to its codon-optimality with highly optimal codons in dark 1306 red and highly non-optimal codons in dark blue. (Gray lines: 90% confidence intervals 1307 generated by bootstrapping coding sequences).

A B codon optimality low high

Air1 Trf5 0.14

0.08 0.00 Air1 (a.u.) 0.00

occupancy r = 0.57 0

-0.17 -0.29 log2(codon enrichment) log2(codon enrichment) TTT TAT TTA ATT CTT TTC TCT ATA AAT TCA CAT CTA ACT TAC ATC TGT GTT TTG CTC CCT TCC AAA ACA CAA AAC GTA AGT GAT ATG CAC ACC CCA TGC CTG TCG CGT GTC GCT CCC GAA AGA AAG CGA AGC ACG GCA CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG TTT TAT TTA ATT CTT TTC TCT ATA AAT TCA CAT CTA TAC ATC ACT TGT GTT TTG CTC CCT TCC AAA ACA CAA AAC GTA AGT GAT ATG CAC ACC CCA TGC TCG CTG CGT GTC GCT CCC AGA GAA AAG CGA AGC GCA ACG CAG GAC TGG GTG GGT CGC CCG GCC GGA AGG GAG Trf5 CGG GCG GGC GGG (a.u.)

occupancy r = 0.60 Air2 Trf4 0

0.09 0.17

0.00 Mtr4 (a.u.)

occupancy r = 0.07 0.00 0 -0.11 -0.10 log2(codon enrichment) log2(codon enrichment) TTT TTT TTA TAT ATT TTA ATT TAT CTT TTC TCT TCT TTC CTT ATA AAT AAT ATA TCA CTA CAT ACT ATC TAC TCA CTA ACT CAT ATC TAC TGT GTT TTG GTT TTG TGT CCT TCC CTC CTC CCT TCC AAA AAA ACA CAA AAC ACA CAA AAC GTA AGT GAT ATG GAT GTA AGT ATG CCA ACC CAC CCA CAC ACC GCT GTC CTG CGT TCG TGC TGC CTG GCT TCG CGT GTC CCC CCC GAA AGA AAG GAA AGA AAG CGA AGC GCA ACG CAG GAC GCA GAC CGA ACG CAG AGC GGT TGG GTG TGG GTG GGT GCC CCG CGC CCG CGC GCC GGA AGG GAG GGA GAG AGG GCG GGC CGG CGG GCG GGC GGG GGG Air2 (a.u.) Mtr4 61 codons (sorted by enrichment)

occupancy r = 0.19 0 0.23

0.00 Trf4 (a.u.)

occupancy r = 0.56 -0.19 0 log2(codon enrichment)

1000 3000 0.200 0.225 5 10 25 50 TTT TTA ATT TAT CTT TCT TTC AAT ATA CTA TCA CAT ATC ACT TAC TTG GTT TGT CCT TCC CTC AAA ACA CAA AAC GAT GTA AGT ATG CCA ACC CAC CGT CTG GCT GTC TCG TGC CCC GAA AAG AGA GCA GAC ACG CAG CGA AGC TGG GTG GGT CCG GCC CGC GGA GAG AGG GGC GCG CGG GGG length optimality expression half-life 61 codons (sorted by enrichment)

1308 1309

68

1310 Figure 6—figure supplement 7. Occupancies for components of the Ski complex 1311 (Ski2, Ski3, Ski7, and Ski8) compared to transcript length, optimality, expression 1312 level, and half-life. 1313 (A) To understand binding specificity of factors in the Ski complex, the total occupancy of 1314 each factor on a transcript is plotted against various transcript features (Gray shading: 1315 95% confidence intervals generated by bootstrapping transcripts). (B) Same analysis as 1316 in Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts 1317 bound by a degradation factor compared to each codon’s frequency on all coding 1318 sequences. Each bar is colored according to its codon-optimality with highly optimal 1319 codons in dark red and highly non-optimal codons in dark blue. (Gray lines: 90% 1320 confidence intervals generated by bootstrapping coding sequences).

A B codon optimality low high

Ski2 Ski3

0.08 0.07 Ski2 (a.u.) 0.00 occupancy r = 0.57 0.00 0

-0.13 -0.17 Ski3 (a.u.) log2(codon enrichment) log2(codon enrichment) TTT TTT TAT ATT TTA TAT TTA ATT TTC CTT TCT CTT TTC TCT ATA AAT ATA AAT occupancy TAC ATC ACT CTA CAT TCA CTA TCA CAT ATC TAC ACT TGT TTG GTT TGT GTT TTG CTC TCC CCT CTC TCC CCT AAA AAA ACA CAA AAC ACA CAA AAC ATG GTA AGT GAT GTA ATG AGT GAT ACC CAC CCA CAC CCA ACC TGC GTC GCT TCG CGT CTG TGC CTG TCG CGT GTC GCT CCC CCC AGA AAG GAA AGA GAA AAG CGA AGC GCA ACG GAC CAG CGA AGC ACG GCA CAG GAC TGG GGT GTG TGG GTG GGT GCC CGC CCG CGC CCG GCC GGA AGG GAG GGA AGG GAG CGG GGC GCG CGG GCG GGC r = 0.71 GGG GGG 0

Ski7 Ski8 0.10 0.11 Ski7 (a.u.) 0.00 occupancy r = 0.60 0.00 0

-0.20 -0.23 Ski8 (a.u.) log2(codon enrichment) log2(codon enrichment) TTT TTT TAT TTA ATT TAT TTA ATT CTT TTC TCT CTT TTC TCT ATA AAT ATA AAT TCA CAT CTA TAC ATC ACT TCA CTA CAT ACT TAC ATC TGT GTT TTG TGT GTT TTG CTC TCC CCT CTC CCT TCC AAA AAA ACA AAC CAA ACA CAA AAC GTA AGT ATG GAT GTA AGT ATG GAT CAC ACC CCA ACC CAC CCA TGC TCG CTG GCT CGT GTC TGC TCG CTG CGT GTC GCT CCC CCC GAA AGA AAG AGA GAA AAG CGA AGC ACG GCA CAG GAC CGA AGC ACG GCA CAG GAC TGG GTG GGT TGG GTG GGT CGC CCG GCC CGC CCG GCC GGA AGG GAG GGA AGG GAG CGG GCG GGC CGG GCG GGC GGG GGG

occupancy r = 0.69 0 1000 3000 0.200 0.225 5 10 25 50 61 codons (sorted by enrichment) 61 codons (sorted by enrichment) 1321 length optimality expression half-life 1322

69

1323 Figure 6—figure supplement 8. Occupancies for components of the NMD pathway 1324 (Upf1, Upf2, Upf3, and Nmd4) compared to transcript length, optimality, expression 1325 level, and half-life. 1326 (A) To understand binding specificity of factors in the NMD pathway, the total occupancy 1327 of each factor on a transcript is plotted against various transcript features (Gray shading: 1328 95% confidence intervals generated by bootstrapping transcripts). (B) Same analysis as 1329 in Figure 5C: Codon enrichment shows deviations in codon frequencies of transcripts 1330 bound by a degradation factor compared to each codon’s frequency on all coding 1331 sequences. Each bar is colored according to its codon-optimality with highly optimal 1332 codons in dark red and highly non-optimal codons in dark blue. (Gray lines: 90% 1333 confidence intervals generated by bootstrapping coding sequences).

A B codon optimality low high

Upf1 Upf2 (a.u.) Upf1 0.06 0.07

occupancy r = 0.66 0 0.00 0.00

-0.10 (a.u.) Upf2 -0.08 occupancy

r = 0.27 log2(codon enrichment) log2(codon enrichment)

0 TTT ATT TTA TAT TTC TCT CTT AAT ATA TCA ATC ACT CTA CAT TAC GTT TTG TGT CCT TCC CTC AAA AAC ACA CAA ATG GTA GAT AGT CCA CAC ACC GCT TGC TCG GTC CGT CTG CCC AGA AAG GAA GCA ACG GAC CGA CAG AGC TGG GGT GTG CGC GCC CCG TTT GGA AGG GAG CGG GGC GCG TTA TAT ATT CTT TTC TCT GGG ATA AAT CTA TCA ATC TAC CAT ACT TGT GTT TTG CTC TCC CCT AAA ACA CAA AAC ATG GTA AGT GAT CCA ACC CAC TGC CGT GTC GCT CTG TCG CCC AGA AAG GAA CGA GCA AGC ACG CAG GAC TGG GGT GTG CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG

Upf3 Nmd4 (a.u.) Upf3 0.06 0.06

occupancy r = 0.76 0.00 0.00 0

-0.13 -0.14 (a.u.) Nmd4 occupancy log2(codon enrichment)

r = 0.29 log2(codon enrichment) TTT TTA TAT ATT CTT TTC TCT ATA AAT TCA CAT CTA ATC ACT TAC TGT GTT TTG 0 CTC CCT TCC AAA ACA AAC CAA GTA ATG AGT GAT CAC CCA ACC TGC CGT TCG CTG GTC GCT CCC AGA GAA AAG CGA GCA ACG AGC CAG GAC TGG GTG GGT CGC CCG GCC TTT AGG GGA GAG CGG GCG GGC TAT TTA ATT CTT TTC TCT GGG ATA AAT ATC TCA TAC CTA CAT ACT TGT GTT TTG CTC TCC CCT AAA ACA CAA AAC ATG GTA AGT GAT ACC CAC CCA TGC CGT GTC GCT TCG CTG CCC AGA AAG GAA CGA AGC GCA ACG CAG GAC TGG GGT GTG CGC CCG GCC GGA AGG GAG CGG GCG GGC GGG 1000 3000 0.200 0.225 5 10 25 50 length optimality expression half-life 61 codons (sorted by enrichment) 61 codons (sorted by enrichment) 1334 1335

70

1336 Figure 6—figure supplement 9. Correlation between binding to degradation 1337 factors and transcript length, codon-optimality, expression, and half-life. 1338 Pearson correlation values between the binding strength of degradation factors (total 1339 occupancy over each transcript) and transcript length, transcript optimality (Pechmann 1340 and Frydman, 2013), expression level (Baejen et al., 2017), and half-life derived by 1341 multivariate linear regression analysis (Methods).

length optimality expression half-life 0.8

0.6

0.4

0.2

0.0

Pearson correlation 0.2

0.4 Air1 Trf5 Air2 Trf4 Ski2 Ski3 Ski7 Ski8 Csl4 Ccr4 Mtr4 Upf1 Upf2 Upf3 Xrn1 Rrp6 Rrp4 Not1 Edc2 Edc3 Pop2 Pan2 Pan3 Dcp2 Dcp1 Dhh1 Caf40 Nmd4 Rrp40 Rrp44 1342

71