Supplementary Experimental Procedures S1-DRIP detailed protocol 3x1010 cells of exponentially growing cells (in 1.2 L of YEP+ 2% glucose) were pelleted at 5500 rpm in a Sorvall RC-5B centrifuge, and pellets were either frozen at -80ºC or used directly for genomic DNA extraction. DNA extractions were done with Qiagen tip 500/G following standard protocol, with three exceptions: 30ul of 100T 20mg/ml zymolyase was used for spheroplasting, RNase A was not added to the G2 buffer, and only 200ul of Proteinase K was used in the deproteinization step. DNA is resuspended in 1.5ml nuclease-free TE buffer and nutated overnight at 4ºC. 3x1010 cells should yield ~200- 300ug of DNA, all of which is used for an IP. Note that despite the omission of RNase A treatment, no RNA contamination is detectable in the genomic preps as assayed by ethidium gel and 260/280 ratio. For the S1 nuclease treatment, S1 nuclease is diluted fresh 1000x in dilution buffer (to 1U/ul), and 2.5ml reaction set up with final 1X S1 Buffer, 0.3M NaCl, and 300U S1 nuclease (Thermo Fisher, Waltham, MA). Reactions were conducted at 30ºC for 50 minutes, and stopped with an equal volume of TE-50 (0.2M Tris pH 8.0, 50mM EDTA pH 8.0), and 670ul of 3M NaCl. Following precipitation DNA was resuspended in 500ul TE, and sonicated with Covaris™ (9 cycles with settings at duty cycle: 20%, intensity: 10, cycles/burst: 200, and 30s rest in between). For the IP, 35ug of S9.6 antibody was pre-bound to 160ul of magnetic Protein A Dynabeads (Thermo Fisher) for 1 hr at 4ºC. Sonicated DNA in final 1X FA (1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS, 50 mM HEPES, 150 mM NaCl, 1 mM EDTA) was incubated with the S9.6—protein A magnetic beads for 2 hrs at 4ºC. Beads were then washed and the DNA eluted according to standard ChIP protocols: 5 successive washes with 1X FA, 1X FA with 0.5M NaCl, Lithium chloride detergent (0.25M LiCl, 0.5% NP-40, 0.5% Sodium deoxycholate, 1mM EDTA, and 10mM Tris-HCl), and finally twice with TE buffer. DNA was then eluted from the beads with 300ul of Elution buffer (1% SDS, 0.1M NAHCO3). To ensure antibody is fully removed prior to library prep, samples are treated with 1ul proteinase K (20mg/ml) for 1 hour at 42° while shaking at 1000 rpm, extracted with phenol:chloroform, and ethanol precipitated. Final DNA pellet was resuspended in 50ul TE buffer. For RNase H treated controls, S1 nuclease-treated gDNA was treated with 50 units of RNase H (NEB, Ipswich, MA) overnight at 37ºC and re-precipitated prior to sonication and IP.

Spike-in Components of the spike-in (listed in Supplemental Table 8) were mixed at equimolar ratio in annealing buffer (50mM NaCl, 10mM Tris ph8, 1mM EDTA). The oligos were annealed by incubating the mix at 95° for 2 minutes and slowly decreasing to room temperature (~ 2 hours). 1.2x109 spike-in molecules were used in each S1-DRIP-seq reaction.

Wild-type specific peaks. The wild-type only hybrid-prone features listed in Supplemental Table 4 were identified by cross referencing the hybrid-prone features common to at least three of the four wild-type replicates with the hybrid-prone features identified in each individual rnh1∆rnh201∆ replicate using the intersectBed tool. Hybrid- prone features only found in wild-type samples, along with coordinates of the hybrid- prone region in the feature are reported. If multiple hybrid-prone regions existed within one feature, coordinates of only one region was reported.

Supplementary Figure Legends Figure S1.1. Development of S1-DRIP. (A) Representative α-DNA:RNA hybrid dot blot of gDNA from wild-type and rnh1∆rnh201∆ at 30s and 2min of exposure. (B) Quantitation of dot blot in Fig. 1B. Signals at each spot were quantified with ImageJ, and represented as % recovered relative to the untreated control. (C) Effect of S1 amount on DNA:RNA hybrids. Excess use of S1 nuclease (10U/ug) led to destruction of R-loops in gDNA prior to sonication.

Figure S1.2. Spike-in sequencing. (A) Expected spike-in sequencing results based on library prep protocol. During End repair, the combined action of Klenow and T4 DNA polymerase allows for strand displacement/filling in of the RNA region in the coding strand. This results in a full length dsDNA molecule for which reads can be obtained from both the coding and template strands. (B) Schematic of the spike-in and sequencing results obtained. Reads mapped to the spike-in were distributed 40%-60% to the coding and template strands, respectively. The start sites of the reads, and the proportion starting at the 5’-most ends of the coding and template strands (the expected start site) are indicated. (C) Models explaining the spike-in reads mapping to unexpected start site. Approximately 34% of total reads mapped to unexpected start sites. Potential explanations are: breakage at the site of DNA:RNA hybrid resulting in two separate dsDNA molecules for sequencing (left panel), and variability of 5’ end start site as a result of incomplete oligonucleotide synthesis. Expected percent of full-length products was obtained from IDT website.

Figure S1.3. DRIP results and sample correlations. (A) Average read counts recovered from various samples. (B) Spearman correlation coefficients of the 4 wild-type, 4 rnh1∆rnh201∆ IP samples, and 2 totals (input DNA).

Figure S1.4. Expected sequencing results of genomic DNA:RNA hybrids. (A) Expected results based on library prep protocol, when hybrid size is smaller than fragmentation size (<200-300 bp), indicated by grey dotted lines. Fragments resulting from DRIP of sonicated DNA, and individual reads mapping to the plus (pink) and minus (blue) strands are depicted. For smaller hybrids, all fragments recovered after sonication should contain a hybrid and hence be recognized by the S9.6 antibody, as well as a dsDNA region that can be extended. Note that based on the specificity of the antibody (Hu et al., 2006) fragments with longer hybrids would be preferentially bound. (B) Expected results when hybrid size is larger than fragmentation size (>200-300 bp), indicated by grey dotted lines. If hybrid region was greater than fragmentation size, then a subset of fragments IP’ed are expected to not contain a dsDNA region and hence will be poor substrates for library construction. This would lead to a different distribution of sequenced tags than in (A), with a large internal region of the hybrid-prone region with poor coverage.

Figure S1.5. S1-DRIP-seq results. (A) Comparison of S1-DRIP-seq, ChIP-seq, and DRIP-chip. Chromosomal snapshots of Chr X (top) and Chr III (bottom) of normalized coverage tracks from S1-DRIP-seq and ChIP-seq (El Hage et al., 2014), and BED tracks of called hybrid-prone regions from Chan et al.’s DRIP-chip study (2014). (B) Quantitation of hybrid-prone regions in S1-DRIP-seq. For all 781 hybrid prone regions mapped in rnh1∆rnh201∆, the read counts from wild-type, rnh1∆rnh201∆, and RNase H-treated rnh1∆rnh201∆ normalized to spike-in are shown. Hybrid prone regions are ranked along the x-axis by read counts in the rnh1∆rnh201∆ in descending order.

Figure S1.6. S1-DRIP-qPCR validation. (A) S1-DRIP-qPCR of RPL15A in wild-type, rnh1∆rnh201∆, and RNase-H treated rnh1∆rnh201∆. X-axis coordinates are plotted relative to RPL15A ORF coordinates. (B) S1-DRIP-qPCR of hybrid-prone regions with strong, mid- and low level peaks, and control regions in wild-type and rnh1∆rnh201∆. To the right of the graph, representative snapshots of strong, mid-, low and no peaks from rnh1∆rnh201∆ are shown.

Figure S2.1. S1-DRIP-seq peaks distribution among features.

Figure S2.2. Snapshots of hybrid-enriched loci. (A) Representaive snapshots from rnh1∆rnh201∆ and wild-type show relative R-loop levels in the two strains. The loci are from left to right, top to bottom: rDNA (RDN37-1, chr XII 450k-460k), Ty element (YDRCTy1-1, chr IV 645k-652k), telomere (TEL01L-TR, chr I 0-400bp), snoRNA (SNR128 and SNR190, chr X 139k-140.5k), tRNA (tA(AGC)F, chr VI 204.5k-205.5k), and protein-coding ORF (TPI1, chr IV 555k-557k). (B) Blow up of the RDN37-1 locus from rnh1∆rnh201∆. Dotted lines indicate the borders of the spacer regions that are in the primary transcript, but are rapidly eliminated for the production of the three final mature transcripts.

Figure S2.3. S1-DRIP-qPCR of DNA:RNA hybrids at Ty1 elements. (A) qPCR of all Ty1 elements in wild-type, rnh1∆rnh201∆, and RNase-H treated rnh1∆rnh201∆. X-axis coordinates are plotted relative to TSS of Ty1 elements (bp). (B) qPCR to detect hybrids only at genomic Ty repeats and not in Ty1 cDNA. Specific Ty1 elements in wild-type and rnh1∆rnh201∆ using qPCR primer sets spanning the 5’ (left panel) and 3’ (right panel) junctions of unique DNA and Ty1 elements.

Figure S2.4. RE-DRIP-qPCR of DNA:RNA hybrids at Ty1 elements. (A) qPCR of all Ty1 elements in wild-type and rnh1∆rnh201∆. X-axis coordinates are plotted relative to TSS of Ty1 elements, and restriction sites are indicated below Ty1 schematic. (B) qPCR of specific Ty1 elements using qPCR primer sets spanning the 5’ (left panel) and 3’ (right panel) junctions of unique DNA and Ty1 elements.

Figure S2.5. S1-DRIP-qPCR of rDNA locus. qPCR of the rDNA locus in wild-type, rnh1∆rnh201∆, and RNase-H treated rnh1∆rnh201∆. X-axis coordinates are plotted relative to 35S rRNA TSS.

Figure S2.6 Distribution of hybrids at tRNAs and sn/snoRNAs. tRNAs and sn/snoRNAs are aligned from TSS to TES and plotted ±1 kb. Metagene plots display the median read counts for each feature from an rnh1∆rnh201∆ DRIP library (top) as well as control rnh1∆rnh201∆ input DNA library (bottom). The heatmaps display the signals obtained from individual features, sorted according to total signal strength.

Figure S2.7. Mitochondrial S1-DRIP-seq. Snapshot of mitochondrial genome showing hybrid formation pattern in wild-type, rnh1∆rnh201∆ and the corresponding RNase-H treated samples.

Figure S3. RNA induction in galactose. The relative change in RNA levels for GAL7 and SMC3 after Induction of expression with galactose was measured using quantitative reverse-transcriptase PCR.

Figure S4.1. Hybrids at ORFs. (A) Metagene plot and heatmap of signals at hybrid-prone ORFs from control rnh1∆rnh201∆ input DNA. (B) Size distribution of the ORF-associated hybrid-prone regions. (C) Overlap distribution of ORF-associated hybrid-prone regions. (D) Comparison of the size of ORFs with the % of ORF hybrid-prone. For hybrid-prone regions largely overlapping ORFs (80-100%), ORFs size tend to smaller, while for hybrid-prone regions that only minorly overlap ORFs (1-20%) ORFs size tend to larger.

Figure S4.2. Expression-binned hybrid signal. Metagene plot and heatmap of signals at hybrid-prone ORFs ±1 kb from rnh1∆rnh201∆, binned by expression. Expression categories are: 1, 2, 3-4, 5-8, 9-12, 13-16 and 17-20.

Figure S5.1. Analysis of sequence features of hybrid-prone ORFs. (A) GC content of hybrid-prone and non-hybrid prone regions. Only hybrid-prone genes in the top 5% of expression are significantly more GC-rich than non-hybrid prone genes. P-values were calculated with Wilcoxon rank-sum test, and significant p-values indicated. (B) GC and AT skew of telomeric hybrids. Both GC [(G-C)/(G+C)] and AT [(A-T)/(A+T)] skew were high at telomeric hybrid regions. (C) GC skew of hybrid-prone regions. No significant GC skew was observed at hybrid- prone regions in the various expression categories.

Figure S5.2. Inverse correlation of positive AT skew and expression level at yeast ORFs. Metagene plots and heatmaps of ORFs ±1 kb of AT skew binned by expression level pointed to the increased positive AT skew across ORFs at lower expressed ORFs. AT skew was calculated over a 50bp sliding window.

Figure S5.3. AT skew at hybrid-prone ORFs. The AT skew over hybrid-prone ORFs is more positive than over non-hybrid prone ORFs in low expression categories (Bins 9- 20). P-values were calculated with Wilcoxon rank-sum test, with one asterisks indicating p=0.05 and two asterisks p< 0.005. Figure S5.4. AT skew of hybrid and non-hybrid regions. Within hybrid-prone ORFs, the regions with hybrids had significantly more positive AT skew than non-hybrid regions in expression bins 5-20. P-values were calculated with Wilcoxon rank-sum test.

Figure S5.5. PolyA tracts and hybrid formation. (A) Fraction of polyA tracts overlapping hybrid regions. Imperfect polyA tracts also correlated with hybrid formation but at longer lengths: 25 bp for 1 mismatch, and 27 bp for 2 mismatches. (B) The absolute number of polyA tracts used for the analysis in S5.5A. (C) AT skew of hybrid regions with and without polyA tracts. Hybrid-prone regions that contained a polyA tract ≥ 21 bp had significantly lower positive AT skew than hybrid- prone regions without long polyA tracts. P-values were calculated with Wilcoxon rank- sum test, with one asterisks indicating p<0.015 and two asterisks p< 0.01.

Figure S6.1. PolyA-associated hybrid regions. (A) Snapshot of representative ORF with a polyA and hybrid prone region from rnh1∆ rnh201∆. The site of the polyA, and distribution of uniquely mapped reads in the region are shown. Based on the reads distribution, the predicted site of the hybrid is indicated, with the darker color corresponding to the strongest hybrid formation. (B) PolyA-associated hybrid regions are similar in size distribution to non-polyA hybrids.

Figure S6.2. Effect of PolyA deletions on hybrid formation. (A) Outline of the deletion of polyA. PolyA tracts are deleted using PCR-based seamless gene editing technique (Horecka and Davis, 2014) in both the wild-type and rnh1∆rnh201∆ background. The cassette used for knockout of polyA contains the 50 bp upstream of a polyA followed by the 50 bp downstream of the polyA, the URA3 selectable marker, and finally a second copy of the 50 bp downstream of the polyA. Following selection of integrants, recombination of the two copies of the 50 bp downstream of the polyA pops out the marker and leaves behind a scar-free deletion (pA∆). All deletions were validated with Sanger sequencing. (B) % hybrid signal in polyA (solid bars) and pA∆ (hatched bars) in wild-type (light green) and rnh1∆rnh201∆ (dark green) strains. Deletion of polyAs led to significant decrease in hybrid signal in both wt and ∆∆ in 3 of 4 regions tested. (C) Effect of pA∆ on mRNA expression levels. No significant decrease in RNA levels were detected in either wt (light green) or rnh1∆rnh201∆ (dark green). RNA levels were quantified with quantitative reverse-transcriptase PCR. Data presented is the average of two experiments, and error bars represent their standard deviation.

Supplemental Sheet S1. Coordinates of hybrid-prone regions in rnh1∆rnh201∆. Supplemental Sheet S2. Coordinates of hybrid-prone regions in wild-type. Supplemental Sheet S3. Features overlapped with hybrid-prone regions in rnh1∆rnh201∆. Supplemental Sheet S4. Features with an overlapped hybrid-prone region specific to wild-type. Supplemental Sheet S5. ORFs in expression bin 1 and 2 with hybrid-prone regions identified in 2 of 4 rnh1∆rnh201∆ replicates. Supplemental Sheet S6. MEME analysis of hybrid-prone regions. Supplemental Sheet S7. Perfect polyA tracts ≥21 bp, overlapping hybrid-prone region and features associated with the hybrid-prone region. Supplemental Sheet S8. Primers and strains used in this study.

A) ∆ B) hybrid rnh201

∆ - S1 100

In vitro In rnh1 wildtype + S1 80 6.5ng 500ng 60 250ng

40 125ng

20

0 % (Sonicated/unsonicated DNA) %(Sonicated/unsonicated 500ng 250ng 125ng Amount of gDNA spotted 0.2ng 30 sec

Figure S1.1 2 min Dot Blot and S1 Dot blot C)

Units of S1 (per ug DNA): 0 0.1 0.2 1 5 10

500ng S1 treatment 250ng

125ng

S1 treatment + sonication A) B) Hybrid Spike-in (b) left 65mer RNA right 65mer P7 Actual Results, SR50 5’ Coding Strand (+) P5 (Read1) 3’ Template Strand (-) P5 (Read1) 150 bp

Synthesis/strand displacement 5’ 3’ 3’ 5’ End Repair (Klenow &T4) 5’ 3’ 3’ 5’

5’ A 3’ A-Tailing 52% of 3’ A 5’ template 88% of strand reads coding 3% of coding start at strand strand reads 5’-most reads start end at 5’-most Adapter T A end Ligation A T

40% of reads map to 60% of reads map to Denature & Amplify

Figure S1.2 Theoretical Results, SR50 Spike-in sequencing

C) (2) CleavageCleavage ofof ssDNAssDNA region/ region/adapter adapter ligation ligation to ssDNA.to ssDNA Variability at 5’ end due to incomplete oligonucleotide synthesis. Expected % full-length product for 65mers: 60-65% Expected % full-length for 150mer: 45-50% 5’ 3’ 3’ 5’ 5’ 5’ 3’ 3’ 5’ 5’ 3’ End Repair Reaction 5’ 3’ 5’ 5’ synthesis 3’ 5’ 5’ 3’ 5’ chew back End Repair Reaction 5’ 5’ 5’ end of template strand

starts +65->+85 (n=73) 5’ end of coding will 5’ end of coding strand at +1 consistently start at +86, 5’ end of template strand at +150 A)

Peaks

B) Figure S1.3 DRIP results and correlation

Spearman Correlation Coefficients Figure S1.4 (a) genome hybrid * Hybrid size < fragmentation size Coding Strand 5’ 3’ sequencing 3’ Template Strand 5’ Fragmentation

Fragments with longer hybrids may be preferentially IP’ed.

End repair, A-Tailing, Adapter ligation and Amplification P7 P5 (Read1)

Synthesis T A Synthesis A T T A A T 5’ end of fragments are sequenced, T A A T SR50 chew back

≥50bp

Distribution of tags ≥50bp

Hybrid size > fragmentation size 5’ 3’ 3’ 5’

End repair, A-Tailing, Adapter ligation and Amplification P7 P5 (Read1)

Synthesis T T A A Fragmentation A T

T A A T chew back

5’ end of fragments are sequenced

Distribution of tags Figure S1.5 comparison with other methods

A)

100 kb 110 120 130 140

[0 - 0.02]

WT

[0 - 0.02] Wahba et al. (S1-DRIP-seq) H1∆ H201∆

Hybrid regions

[0 - 6.00] WT El Hage et al. (ChIP-seq) [0 - 6.00] H1∆ H201∆

Chan et al. WT (DRIP-chip) H1∆ H201∆

KRE9 CPS1 YJL169W ERG20 HAL5 TPK1 JJJ2 tE FMP33 HSP150 FAR1 SSY5 FBP26 VPS35 INO1SNA3 SNR190

20 kb 40 60 80

[0 - 0.02] WT

Wahba et al. [0 - 0.02] (S1-DRIP-seq) H1∆ H201∆

Hybrid regions

[0 - 6.00] WT El Hage et al. [0 - 6.00] (ChIP-seq) H1∆ H201∆

Chan et al. WT (DRIP-chip) H1∆ H201∆

MRC1 FYV5PRD1 KAR4 SPB1 LRE1 SPS22 EMC1 PDI1 GLK1 GID7 SRO9 GRX1 STE50 HIS4 BIK1 FUS1 FRM2 YCL023C B)

300 rnh1∆rnh201∆

wildtype

250 rnh1∆rnh201∆ + RNase H treat.

200

150

100

Numberreadsof mapped peakto (x100) 50 # of reads perMACS#of mapped peak 0 0 100 200 300 400 500 600 700 800

MACS Peak

Figure S1.5B S1-DRIP-seq signal A) RPL15A

0.4 0.4 rnh1∆rnh201∆ wildtype rnh1∆rnh201∆ 0.3 0.3 + RNase H treat.

0.2 0.2 %hybrid signal 0.1 0.1

0.0 0.0 -100 0 100 200 300 400 500 600 700 800 9000 1 2 3 4 Base Pairs from ORF Start GAL7 BUD3 Chr III ChrIII YLR149C Intergenic Figure S1.6 DRIP-qPCR B) validation

0.40.4 0.4 0.40.40.4 0.4 0.4 HSP150 wildtype

rnh1∆rnh201∆ 0.30.3 0.3 0.30.30.3 0.3 0.3 LSR1

0.20.2 0.2 0.20.20.2 0.2 0.2 hybridsignal

% ENO1 0.10.1 0.1 0.10.10.1 0.1 0.1

0.00.0 0.0 0.00.00.0 0.0 0.0 RNA170 HSP150HSP150CIS3HSP150CIS3MTL1HSP150MTL1CIS3HSP150HSP150LSR1LSR1MTL1LSR1 CIS35'HSP150LSR1 CIS35'CIS3 internalLSR1MTL1 internalMTL1LSR1 RPR1CIS3MTL15' LSR1RPR1 internalLSR1LSR1MTL1SCR1LSR1 5' SCR1RPR1LSR1 LSR15' 5' internalLSR1NME1 internal internalNME1SCR1LSR1RPR1 5' RPR1SNR14RPR1 internalSNR14NME1SCR1SCR1RPR1SCR1SED1SNR14SED1NME1NME1SCR1NME1RPL3SED1SNR14RPL3SNR14SNR14NME1ENO1ENO1RPL3SED1SNR14SED1SED1ENO1RPL3RPL3SED1RPL3ENO1ENO1RPL3ENO1RNA170ENO1NUM1 VTC3 CIS3 MTL1 LSR1 SCR1 SED1 ENO1 RPL3 NUM1 VTC3 HSP150 NME1 SNR14 RNA170 Strong Peaks Mid Peaks Low Peaks No Peaks A)

TOTAL%781%peaks%

Ty/Delta,&19& NotA Mitochondria,&2& annotated,& 96& Telomeres,&18&

Structural& RNAs,&147& Genes,&499&

Figure S2.1 DRIP distribution A)

rDNA Ty 452 kb 456 460 646kb 648 650 [0 - 4] [0 - 0.5] WT

[0 - 4] RDN37 RDN5 [0 - 0.5] YDRCTy1-1 H1∆ H201∆

Tel snoRNA tRNA ORF 0 bp 200 bp 400 bp 140kb 205kb 556kb [0 - 0.03] [0 - 0.03] WT [0 - 0.03] [0 - 0.03]

H1∆ SNR128 SNR190 [0 - 0.03] TEL01L-TR [0 - 0.03] tA(AGC)F [0 - 0.03] TPI1 H201∆ [0 - 0.03]

Figure S2.2 Hybrid-forming regions WT vs H1∆H201∆

B)

H1∆ H201∆

RDN37 A)

5.0

4.5 rnh1∆rnh201∆ 4.0 wildtype 3.5 rnh1∆rnh201∆ 3.0 + RNase H treat. 2.5

2.0

%hybrid signal 1.5

1.0

0.5

0.0 0 1000 2000 3000 4000 5000 6000

Delta TyA TyB Delta B)

0.6 0.6 0.6 wildtype 0.5 0.5 0.5 rnh1∆rnh201∆

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2 (% Total Input) Total (%

0.1 0.1 0.1

0 0 0

GAL7

YARCTy1-1YDRCTy1-2 YNLWTy1-2 YMRCTy1-3 YHRCTy1-1 YMLWTy1-1 YOLWTy1-1 YARCTy1-1YDRCTy1-2 YARCTy1-1YNLWTy1-2 YDRCTy1-2 YMRCTy1-3 YNLWTy1-2 YHRCTy1-1 YMRCTy1-3 YMLWTy1-1 YHRCTy1-1 YOLWTy1-1YMLWTy1-1

Figure S2.3 sonication and DRIP-qPCR A)

70 rnh1∆rnh201∆ 60 wildtype

50

40

30

%hybrid signal Figure S2.4 20 Restriction 10 digestion and 0 0 1000 2000 3000 4000 5000 6000 DRIP-qPCR

Delta TyA TyB Delta

wildtype B) 18 18 18.0

16 rnh1∆rnh201∆ 16 16.0

14 14 14.0

12 12 12.0

10 10 10.0

8 8 8.0

6 6 6.0 (% Total Input) Total (% 4 4 4.0

2 2 2.0

0 0 0.0

GAL7 GAL7

YLRCTy2-2 YARCTy1-1 YOLWTy1-1 YLRCTy2-2 YARCTy1-1YNLWTy1-2 YOLWTy1-1YLRWTy2-1YDRWTy2-2 YARCTy1-1YNLWTy1-2YNLWTy1-2YMRCTy1-3 YOLWTy1-1 YLRWTy2-1YLRWTy2-1YDRWTy2-2YDRWTy2-2 YLRCTy2-2 YARCTy1-1YNLWTy1-2YMRCTy1-3YMRCTy1-3YOLWTy1-1 YLRWTy2-1YDRWTy2-2YLRCTy2-2 A)

RDN37 5S 5’ ETS IGS1 IGS2 0.7 rnh1∆rnh201∆ wildtype 0.6 rnh1∆rnh201∆ + RNase H treat. 0.5

0.4

0.3 %hybrid signal

0.2

0.1

0.0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000

Figure S2.5 rDNA DRIP- qPCR tRNAs sno/snRNAs A)

300 200

Signal from DRIP

Figure S2.6 0 0 DRIP and

tRNAs sno/snRNAs TOTALS

16 18

Signal from TOTAL

0 0

Spike-in Normalized Read Coverage A) [0-0.15] [0-0.15] [0-0.015] [0-0.015] [0-0.15] MITOCHONDRIA mitochondria mitochondria Figure S Figure DRIP 2.7 ∆∆ wt+ RNase H wt+ RNase + RNase H + RNase ∆∆ ∆∆ wt Figure S3 RNA levels upon GAL induction A) 1800

1500

1200

900

ACT1 normalized 600

300 RNA fold induc-on (GAL vs DEX)

0 GAL7 SMC3 B)

200 A) ORFs TOTAL 150

100

50 16

NumberRegionsof 0

300 500 700 900 1100 1300 1500 2000 2500 3000 3500 4000 Hybrid-prone Region Size (Base Pairs)

C) 200

150

100

50

0 NumberRegionsof 0

10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Overlap of Hybrid Region with ORF (% of ORF)

D)

100 % of ORF Hybrid-prone 1-20% 80 80-100% Figure S4.1 60 ORF and 40 hybrids 20 numberORFs of 0

0 300 500 700 900 1100 1300 1500 2000 2500 3000 3500 4000 15000>4000 Size of ORF (Base Pairs) Figure S4.2 expression levels heat maps

A)

ORFs normalized read coverage read normalized Genes

Expression Level A) Non-Hybrid genes Hybrid-prone genes

60% 60% 60% 60% 60% 60% p=3.53e-07 55% 55% 55% 55% 55% 55% 50% 50% 50% 50% 50% 50% 45% 45% 45% 45% 45% 45% 40% 40% 40% 40% 40% 40% 35% 35% 35% 35% 35% 35% 30% 30% 30% 30% 30% 30% GC ContentGC (%) GC ContentGC (%) 25% 25% 25% 25% 25% 25% ContentGC (%) 20% 20% 20% 20% 20% 20% 1 Expression1 Bin 1 Expression1 Bin 2 Expression Bins 17-20

telomeres B) 1.0

-1.0 0.0 1.0 AT Skew AT

-1.0 GC Skew

1.00 C) 0.80 0.8 Figure S5.1

0.60 0.6 GC content,

0.40 0.4 GC skew, AT skew 0.20 0.2

0.00 0.0

-0.20 -0.2

-0.40 -0.4 GC SkewGC (Hybrid-prone regions)

-0.60 -0.6 Expression Level A) AT Skew Genes

Expression Bins (1-10) Figure S5.2 AT skew heat map AT Skew Genes

Expression Bins (11-20) A)

Non-hybrid prone ORFs 0.800.80 0.800.80 0.80 Hybrid-prone ORFs

0.600.60 0.600.60 0.60 ** *

0.400.40 0.400.40 0.40 **

0.200.20 0.200.20 0.20

0.000.00 0.000.00 0.00

-0.20-0.20 -0.20-0.20 -0.20

-0.40-0.40 -0.40Skew (Full ORF) AT -0.40 -0.40

-0.60-0.60 -0.60-0.60 -0.60

-0.80-0.80 -0.80-0.80 -0.80 Expression Level

Figure S5.3 AT skew hybrid ORFs vs allORFs Expression Bin(s): 1 2 3-4 Non-hybrid

Hybrid

Expression Bin(s): 5-8 9-12 13-16 Non-hybrid

p=1.03e-6 p=6.25e-3 p=1.99e-3

Hybrid

Expression Bin(s): 17-20

Figure S5.4 Non-hybrid AT skew

p=6.60e-3

Hybrid A)

100% Perfect polyA/T 75% 1 non-A 2 non-A 50%

25%

0% atleast3regions (%total) Fraction of polyA tracts with tracts FractionpolyA of 21-221 23-242 25-263 27+4 Length of polyA tract (bp)

B) 250 Perfect polyA/T 1 non-A 2 non-A Figure S5.5 200 polyA

150 mutations, polyA and 100 ATskew

50 Number of polyA tracts NumberpolyA of

0 21-221 23-242 25-263 27+4 Length of polyA tract (bp) C) 0.60.60.60.60.6 0.60.60.60.60.6 ** ** * * polyA no polyA 0.40.40.40.40.4 0.40.40.40.40.4

0.20.20.20.20.2 0.20.20.20.20.2

0.00.00.00.00.0 0.00.00.00.00.0 AT skew AT

-0.2-0.2-0.2-0.2-0.2 -0.2-0.2-0.2-0.2-0.2

-0.4-0.4-0.4-0.4-0.4 -0.4-0.4-0.4-0.4-0.4

-0.6-0.6-0.6-0.6-0.6 -0.6-0.6-0.6-0.6-0.6 Expression 3-4 5-8 9-12 13-16 17-19 Bins: A) 100 bp 200 300 400 500 600 polyA

rnh1∆ rnh201∆

DNA sequence

rnh1∆ rnh201∆ reads distribution

DNA-RNA hybrid location

Figure S6.1 polyA peaks

B) 35% polyA 30% no polyA 25% 20% 15% 10%

ProportionHybrid-of 5% prone regions (% Total) proneregions (% 0% 100 200 300 400 500 600 1000

Length of Hybrid-prone Region (bp) A) Seamless genomic polyA deletion Figure S6.2 polyA ORF polyA polyA deletion URA3

URA3 ORF

5-FOA ORF pA∆

B) 1.40 polyA 1.20 pA∆ 1.00

0.80

0.60 % hybrid signal 0.40

0.20

H1∆ H1∆ H1∆ H1∆ H1∆ H1∆ WT WT WT WT WT WT H201∆ H201∆ H201∆ H201∆ H201∆ H201∆ LEU9 (5’) RPA135(5’) SPC1 (3’) RPL26B (intron) positive control negative control polyA containing genes

C) mRNA expression pA∆ vs polyA 500% 418% WT rnh1 rnh201 400% 343%

300%

200% 84% 82%

104% 102%

% expression relative to polyA 100% 72% 68%

0% LEU9 5'polyA RPA135 5'polyA SPC1 3'polyA RPL26B intronic polyA