[Supplemental Fig. S1 image. (A) Schematic of a CAGE tag spanning an exon-exon junction (EEJ): ~20 nt from the 3’ end of exon 1 joined to ~20 nt from the 5’ end of exon 2. (B) Plot of CAGE tag 5’ termini distribution relative to exon-exon junctions (x-axis, -50 nt to +30 nt; y-axis, CAGE tags, total count, 0 to 3000), with one series for CAGE tags mapping to genome and one for CAGE tags mapping across exon-exon junction.]

Supplemental Fig. S1. Evidence of post-transcriptional cleavage from CAGE tag analysis. (A) Schematic diagram illustrating the mapping strategy employed to discern CAGE tags that span exon-exon junctions (EEJs). (B) Distribution of mouse CAGE tags mapping proximal to RefSeq EEJs. CAGE tags that map uniquely and exactly to the genome but do not span EEJs (blue) are under-represented adjacent to exon boundaries. This under-representation can be reconciled with CAGE tags that map uniquely and exactly across EEJs (red). The distribution also shows a distinct peak of enrichment of CAGE tags immediately downstream of EEJs (green arrow).

[Supplemental Fig. S2 image. Nine panels, each plotting average tag frequency against position relative to CAGE tag (nucleotides; nt) from -2000 nt through +1 to +2000 nt, with separate curves for promoter CAGE and exon CAGE. Panel titles: A. H3K4me1 (human CD4+ T cells); B. H3K4me2 (human CD4+ T cells); C. H3K4me3 (human CD4+ T cells); D. H3K9ac (human CD4+ T cells); E. H2Az (human CD4+ T cells); F. RNA Polymerase 2 (human CD4+ T cells); G. H3K4me2 (mouse brain); H. H3K4me3 (mouse brain); I. H3K27me3 (mouse brain).]

Supplemental Fig. S2. Frequency distribution for modified chromatin immunoprecipitated tags relative to CAGE sites associated with promoters and exons. (A-I) The average frequency of DNA tags immunoprecipitated within a 4 kb window centered on CAGE tags mapping to promoters (blue) and coding exons (red) with antibodies to H3K4me1 (A), H3K4me2 (B), H3K4me3 (C), H3K9ac (D), H2Az (E) and RNA polymerase II (F) in human CD4+ T cells (Wang et al., 2008) and H3K4me2 (G), H3K4me3 (H) and H3K27me3 (I) in adult mouse brain (Mikkelsen et al., 2007).
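The windowed averages described for Supplemental Fig. S2 (average tag frequency within a 4 kb window centered on CAGE tags) can be sketched roughly as below. This is an illustration only, not the authors' pipeline: the function name `average_profile`, the input layout (a list of CAGE sites and a per-chromosome table of tag counts) and the strand handling are all assumptions.

```python
from collections import Counter


def average_profile(anchors, tag_positions, flank=2000):
    """Average tag count at each offset in [-flank, +flank] around anchor sites.

    anchors: list of (chrom, pos, strand) tuples for CAGE tag sites (hypothetical layout).
    tag_positions: dict mapping chrom -> {position: immunoprecipitated tag count}.
    Returns a list of length 2 * flank + 1; index `flank` is the anchor itself.
    """
    totals = [0.0] * (2 * flank + 1)
    for chrom, pos, strand in anchors:
        counts = tag_positions.get(chrom, {})
        for offset in range(-flank, flank + 1):
            # On the minus strand, flip the offset so that positive offsets
            # always point downstream of the CAGE site.
            genomic = pos + offset if strand == "+" else pos - offset
            totals[offset + flank] += counts.get(genomic, 0)
    n = len(anchors)
    return [t / n for t in totals] if n else totals


# Toy usage with a 5 nt window (flank=2) instead of 4 kb (flank=2000).
toy_anchors = [("chr1", 100, "+"), ("chr1", 200, "-")]
toy_tags = {"chr1": Counter({101: 2, 199: 4})}
toy_profile = average_profile(toy_anchors, toy_tags, flank=2)
```

With `flank=2000` the same function yields a 4 kb window; running it separately on promoter and exonic CAGE sites would give the two curves per panel.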

[Supplemental Fig. S3 image. Sequence logos (scale bar, 1 bit) over positions -30 to +30 relative to the CAGE tag site (+1). A. single-peak: i. promoter (TATA box indicated), ii. coding exon. B. broad-peak: i. promoter, ii. coding exon. C. i. conserved, ii. non-conserved. D. logo after removal of the terminal 5’ nucleotide.]

Supplemental Fig. S3. Nucleotide motifs associated with exonic CAGE tags. (A) Sequence composition for a 60 nt window centered on single-peak CAGE tag clusters mapping to promoter (i) or coding exons (ii). (B) Sequence composition for a 60 nt window centered on broad-peak CAGE tag clusters mapping to promoter (i) or coding exons (ii). (C) Sequence composition for a 60 nt window centered on exonic CAGE tags that are conserved (i) or non-conserved (ii) between human and mouse. (D) Sequence composition for a 60 nt window centered on CAGE tag clusters after removal of the terminal 5’ nucleotide from CAGE tags to control for potential non-templated 5’ guanine addition.
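The "bits" scale used for the sequence composition in Supplemental Fig. S3 is conventionally the per-position information content of the aligned windows. A minimal sketch of that calculation follows, assuming equal-length DNA windows and no small-sample correction; the function name `information_content` is hypothetical, and the authors' logo software may apply corrections not shown here.

```python
import math
from collections import Counter


def information_content(sequences, alphabet="ACGT"):
    """Per-column information content in bits for equal-length sequences.

    Computed as log2(len(alphabet)) minus the Shannon entropy of each column.
    No small-sample correction is applied (an assumption of this sketch).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    max_bits = math.log2(len(alphabet))
    bits = []
    for column in zip(*sequences):
        # Ignore characters outside the alphabet (for example N).
        counts = Counter(base for base in column if base in alphabet)
        total = sum(counts.values())
        if total == 0:
            bits.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits


# Toy usage: column 1 is invariant, column 2 is uniform over A, C, G, T.
toy_bits = information_content(["AC", "AG", "AT", "AA"])
```

For the figure, each input sequence would be a 60 nt window centered on a CAGE tag cluster, so the returned list would have 60 entries.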

[Supplemental Fig. S4 image. A. genes encoded on mitochondrial genome and B. genes encoded on nuclear genome: total frequency distribution (tpm) of PARE tags and CAGE tags against position relative to transcription start site (-50, +1, +50). C. frequency distribution (tpm) of PARE tags and CAGE tags across the gene body (RefSeq; 100 deciles). D. scatter plot of exonic PARE tag frequency (tpm) against exonic CAGE tag frequency (tpm), r² = 0.29, with Scgb1a1, Alb and ApoE labeled. E. genome browser tracks for CAGE (Liver) and PARE (Liver) across Alb (chr5: 90891000 to 90905000), with a detailed view (chr5: 90903590 to 90903710).]

Supplemental Fig. S4. Comparative distribution of PARE and CAGE tags. (A,B) Frequency distribution of PARE (red) and CAGE (blue) tags across a 100 nt window centered on transcription start sites in the mitochondrial (A) and nuclear (B) genome. (C) Frequency distribution of PARE (red) and CAGE (blue) tags across the gene body. (D) Scatter plot comparing CAGE and PARE tag frequencies mapping to RefSeq gene exons. (E) Genome browser view of the Alb gene (top panel) with detail shown (bottom panel). The Alb gene is prevalently tiled with both CAGE and PARE tags.

[Supplemental Fig. S5 image. Scatter plot on logarithmic axes (0.1 to 10000) of Mouse RefSeq Genes (y-axis) against Human RefSeq Genes (x-axis); r² = 0.31.]

Supplemental Fig. S5. Conservation of cleavage frequency between human and mouse. Scatter-plot graph for cleavage frequency of homologous genes between human and mouse (Pearson’s correlation r² = 0.31).
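The squared Pearson correlation quoted for Supplemental Fig. S5 can be computed from paired per-gene frequencies as in the sketch below. The function name `pearson_r_squared` and the toy inputs are hypothetical. The legend does not state whether frequencies were log-transformed before correlation, so this sketch works on whatever values it is given.

```python
import math


def pearson_r_squared(x, y):
    """Squared Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_x * var_y)
    return r * r


# Toy usage: hypothetical cleavage frequencies for four homologous gene pairs.
human = [1.0, 2.0, 3.0, 4.0]
mouse = [2.0, 4.0, 6.0, 8.0]
r2_perfect = pearson_r_squared(human, mouse)
```

The same calculation applies wherever the legends quote r² for a scatter plot of two tag frequencies.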

[Supplemental Fig. S6 image. A. mouse embryonic stem cell small RNAs: mES small RNAs (total count, 0 to 500) against small RNA 5’ termini distribution relative to exon-exon junctions, with spliced and nonspliced series. B. nuclear small RNAs and C. cytoplasmic small RNAs: small RNA raw tag frequency distribution against position (nt) relative to exonic CAGE tag site (-150 nt to +150 nt), with small RNAs and random control series. D. spliced small RNAs: spliced small RNA normalized tag frequency distribution against position (nt) relative to exon/exon junction, with cytoplasmic and nuclear series.]

Supplemental Fig. S6. Evidence that post-transcriptional cleavage generates small RNAs. (A) Schematic diagram illustrating the mapping strategy employed to discern small RNAs that span EEJs. Distribution of small RNA tags derived from mouse embryonic stem cells (Babiarz et al. 2008) mapping proximal to RefSeq EEJs. Small RNAs that map uniquely to the genome (blue) are under-represented adjacent to EEJs. This under-representation may be reconciled with small RNA tags that map across EEJs (red). Average frequency distribution for small RNA tags derived from nuclear (B) and cytoplasmic (C) preparations of human THP1 cells (Taft et al., 2009) within a 200 nt window relative to exonic CAGE tags (blue) and random nucleotides within matched exons (red). (D) Frequency distribution of small RNAs derived from THP1 cytoplasmic (red) and nuclear (blue) preparations across EEJs.
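The EEJ mapping strategy referred to in Supplemental Figs. S1 and S6 (tags that match the genome exactly versus tags that match only across a spliced junction) can be sketched as follows. This is a simplified illustration under stated assumptions: forward strand only, exact string matching rather than a real aligner, and a junction library built from roughly 20 nt on each side of the junction. The names `build_junction_library`, `count_occurrences` and `classify_tag` are hypothetical.

```python
def build_junction_library(exon_pairs, flank=20):
    """Join the last `flank` nt of the upstream exon to the first `flank` nt
    of the downstream exon.

    exon_pairs: iterable of (junction_id, upstream_exon_seq, downstream_exon_seq).
    Returns {junction_id: (junction_sequence, boundary_index)}, where
    boundary_index is the index of the first downstream-exon base.
    """
    library = {}
    for junction_id, upstream, downstream in exon_pairs:
        left = upstream[-flank:]
        right = downstream[:flank]
        library[junction_id] = (left + right, len(left))
    return library


def count_occurrences(sequence, tag):
    """Count exact matches of tag in sequence, including overlapping ones."""
    count, start = 0, sequence.find(tag)
    while start != -1:
        count += 1
        start = sequence.find(tag, start + 1)
    return count


def classify_tag(tag, genome_seqs, library):
    """Return 'genome', 'junction' or 'unmapped' under exact, unique matching."""
    genome_hits = sum(count_occurrences(seq, tag) for seq in genome_seqs.values())
    if genome_hits == 1:
        return "genome"
    if genome_hits > 1:
        return "unmapped"  # multi-mapping tags are discarded in this sketch
    junction_hits = 0
    for sequence, boundary in library.values():
        start = sequence.find(tag)
        while start != -1:
            # Require at least one base on each side of the junction.
            if start < boundary < start + len(tag):
                junction_hits += 1
            start = sequence.find(tag, start + 1)
    return "junction" if junction_hits == 1 else "unmapped"


# Toy usage: one two-exon gene with a short intron.
exon1, intron, exon2 = "GATTACAGATTACA", "GTAAGTTTTTTTAG", "CCGGTTAACCGGTT"
toy_genome = {"chr1": exon1 + intron + exon2}
toy_library = build_junction_library([("gene1_j1", exon1, exon2)], flank=8)
```

A tag such as `TTACACCGGT` is absent from the unspliced toy genome but present across the junction, so it is classified as `junction`; a real analysis would use an aligner and both strands.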

[Supplemental Fig. S7 image. A. CAGE tag frequency distribution tracks at the mmu-miR-124-1 hairpin (miR and miR* marked; host transcripts AK044422 and BC030402): Visual Cortex 28 tpm, Somatic Cortex 10 tpm, Cerebellum 5 tpm, Hippocampus 1 tpm, Embryo 5 tpm. B. mmu-mir-122a hairpin structure; 142 tags (liver).]

Supplemental Fig. S7. Secondary capping of microRNA precursors. (A) Genome browser view of the mmu-miR-124-1 hairpin (light red bar) showing CAGE tags immediately downstream of the star sequence (dark red bar) in mouse tissues (blue panels). The mmu-miR-124-1 hairpin is hosted within full-length RNA transcripts (black bars). (B) Schematic diagram of the mmu-miR-122a hairpin showing the mature microRNA sequence (red) and CAGE tag locations (blue) in mouse liver.

[Supplemental Fig. S8 image. A. M. musculus embryonic stem cells: wild-type, Dcr-1 null and Dgcr8 null; small RNA (average count) against size (nt), 16 to 26 nt, with genomic and spliced series. B. D. melanogaster ovaries: wild-type, Dcr null and loqs null; small RNA (average count) against size (nt), 15 to 29 nt. Spliced tag totals annotated on the panels, in extraction order: A, 0.12% or 1872 tags, 0.13% or 1914 tags, 0.08% or 897 tags; B, 0.07% or 231 tags, 0.04% or 164 tags, 0.1% or 498 tags.]

Supplemental Fig. S8. Characterization of exon-derived small RNAs. (A) Average small RNA length frequency from wild-type, Dcr-1 null and Dgcr8 null mutant mouse embryonic stem cells (Babiarz et al. 2008) that map to the genome (blue) or across EEJs (red; total count indicated). (B) Average small RNA length frequency from wild-type, Dcr null or loqs null mutant flies that map to the genome (blue) or across EEJs (red; total count indicated).

[Supplemental Fig. S9 image. Bar chart of relative expression level (fold enrichment over Tubd, 0 to 3) for gene and poly-T primers at Pnpla2, Calm1, Bptf and Pol2ra.]

Supplemental Fig. S9. Detection of polyadenylated transcripts upstream of exonic cleavage sites. Quantitative PCR using gene-specific reverse primers (gene) and poly-T primers (poly-T) indicates the absence of an upstream polyadenylated transcript for exonic cleavage sites within the Pnpla2, Calm1, Bptf and Pol2ra genes. Fold enrichment of gene relative to Tubd is indicated with gene-specific primers (error bars indicate standard deviation).

[Supplemental Fig. S10 image. A. cumulative RNAseq total (tpm) across a window relative to exonic CAGE site (-50 nt to +50 nt) for the size fractions 2 - 3.5 kb, 0.5 to 2 kb and less than 0.5 kb. B. MALAT1 (8.7 kb) tracks, labeled in extraction order: 4.2 tpm > 6.5 kb; 7.2 tpm 3.5 to 6.5 kb; 6.2 tpm 2 to 3.5 kb; 6.7 tpm 0.5 to 2 kb; 37 tpm < 0.5 kb; 41 tpm CAGE; hES small RNA. C. mouse CAGE tracks across Malat1 and Neat1 (chr19, 5800000 to 5850000): 11 tpm Visual Cortex; 27 tpm Cerebellum; 15 tpm Macrophages; 49 tpm Lung; 71 tpm Embryo; 17 tpm Liver.]

Supplemental Fig. S10. Size-fractionated RNA deep sequencing analysis. (A) The sum total of RNAseq tags for 50 nt windows centered around exonic CAGE tags that occur within genes larger than 2 kb shows progressive enrichment for downstream windows in smaller size fractions. (B) The MALAT1 ncRNA encompasses prevalent exonic CAGE tags (orange panel) and is expressed in all size-fractionated RNAseq preparations (gray panels). Furthermore, recent small RNA libraries derived from human embryonic stem cells indicate numerous sense small RNAs associated with the MALAT1 transcript (Lister et al. 2009). (C) Both the Malat1 ncRNA and the associated Neat1 ncRNAs (black bars) are subject to prevalent cleavage as indicated by exonic CAGE tags in mouse tissues (blue panel).

[Supplemental Fig. S11 image. A. exonic CAGE tag frequency (logarithmic axis, 0.1 to 100000) for spliced genes and unspliced genes; no significant difference (p value = 0.7). B. exonic CAGE tag frequency against exon number (logarithmic axes), r² = 0.02. C. average tag count across a splice junction, 5’ to 3’, with sequence logo (bits) and exon marked. D. average tag count for sense and antisense series. E. average tag count for Ser5 phosphorylated and unphosphorylated series. F. average tag count for sense and antisense series. Panels D to F carry x-axis marks -1500 nt, +1, +1000 nt and -1000 nt, +1, +1500 nt.]

Supplemental Fig. S11. Enrichment of CAGE tags at 5’ end of exons. (A) Comparison of exonic CAGE tag frequency for spliced versus unspliced genes indicates no significant difference (p-value = 0.7, unpaired t-test). (B) Scatter-plot indicates no correlation (r² = 0.02) between number of splicing events and exonic CAGE tag frequency. (C) Average CAGE tag frequency distribution within a 20 nt window centered on splice junctions with accompanying nucleotide composition. (D) Average GRO-seq sense (black) and antisense (red) frequency distribution within a 2.5 kb window centered on splice junctions (Core et al. 2008). (E) The average frequency of DNA tags immunoprecipitated with antibodies to phosphorylated (red) and unphosphorylated (blue) RNA polymerase II across splice junctions. (F) The average frequency of MNase-digested sense (red) and antisense (blue) DNA tags shows nucleosome positioning across splice junctions (Wang et al. 2008).

[Supplemental Fig. S12 image. Scatter plots of exon CAGE frequency against promoter CAGE frequency (x-axis, 0 to 2500). A. human (THP1 differentiation), R² = 0.131. B. mouse (tissue), R² = 0.093.]

Supplemental Fig. S12. Correlation between promoter and exonic CAGE tags. (A,B) Comparison of CAGE tags mapping to promoter relative to coding exons for the top 500 selected genes in human THP1 cell differentiation (A) or mouse tissue (B). Low Pearson’s correlation (r²) values suggest little correlation between promoter expression and cleavage levels.

[Supplemental Fig. S13 image. Clustered heat maps. Two heat maps, labeled exon and promoter, have rows labeled with the gene symbols of the 500 selected genes (for example HNRNPA2B1, C14orf106 and ZMPSTE24) and columns 0 h, 1 h, 4 h, 12 h, 24 h and 96 h. A third heat map (C) has rows labeled with CAGE cluster identifiers (for example C19F214A338) and columns Embryo, Liver, Lung, Macrophage, Hippocampus, Cerebellum, SomCortex and VisCortex.]

Supplemental Fig. S13. Differential post-transcriptional cleavage during THP-1 cell differentiation. (A) Cluster analysis of the expression of the 500 genes containing the highest exonic CAGE tag frequency shows the up (red) and down (green) regulation of mRNA cleavage in human THP-1 cell differentiation. Expression level was determined as the normalized CAGE tag frequency mapping to coding exons. (B) The diverse ratios of CAGE tags mapping to promoter relative to exons for each time point indicate the relative independence of the phenomenon. (C) Cluster analysis of the expression of the 500 CAGE tag clusters (Valen et al. 2009) shows differential regulation of mRNA cleavage across mouse tissues.

[Supplemental Fig. S14 image. A. bar chart of number of CAGE tag clusters (0 to 100) against number of tissues (1 to 8); legend: tags that initiate from same 5’ nucleotide in 1+ tissues, 2+ tissues, 4+ tissues, all tissues. B. Foxb2 locus (coordinates 16939730 to 16939680) with tag count by tissue: hippocampus 92, cerebellum 31, lung 11, embryo 11, somatosensory cortex 4, liver 2, macrophages 2, visual cortex 1.]

Supplemental Fig. S14. Tissue specificity of post-transcriptional cleavage. (A) Proportion of CAGE tag clusters (Valen et al. 2009) in which the nucleotide receiving the highest frequency of CAGE tags is preferred across more than one tissue. (B) CAGE tags from eight different tissues map to a single nucleotide (red arrow) within the mouse Foxb2 gene.
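The proportion described for Supplemental Fig. S14A (clusters whose preferred 5’ nucleotide is shared across tissues) could be tallied along the lines of the sketch below. The data layout, the tie-breaking rule and the function names `dominant_position`, `tissues_sharing_dominant` and `proportion_shared` are assumptions made for illustration; the legend does not specify how ties or empty tissues were handled.

```python
from collections import Counter


def dominant_position(position_counts):
    """Position with the most tags; ties go to the lowest coordinate (assumption)."""
    return min(position_counts, key=lambda p: (-position_counts[p], p))


def tissues_sharing_dominant(cluster):
    """Largest number of tissues that share one dominant position.

    cluster: dict mapping tissue name -> {position: tag count} for one CAGE cluster.
    Tissues with no tags in the cluster are skipped.
    """
    dominants = Counter(
        dominant_position(counts) for counts in cluster.values() if counts
    )
    return max(dominants.values()) if dominants else 0


def proportion_shared(clusters, min_tissues):
    """Fraction of clusters whose dominant position is shared by >= min_tissues."""
    if not clusters:
        return 0.0
    shared = sum(1 for c in clusters if tissues_sharing_dominant(c) >= min_tissues)
    return shared / len(clusters)


# Toy usage with two hypothetical clusters.
toy_clusters = [
    {"liver": {10: 5, 11: 1}, "lung": {10: 3}, "embryo": {12: 2}},
    {"liver": {20: 1}, "lung": {21: 4}},
]
half = proportion_shared(toy_clusters, min_tissues=2)
```

Calling `proportion_shared` with thresholds of 1, 2, 4 and the total number of tissues would mirror the "1+ tissues" through "all tissues" categories in the panel legend.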

[Supplemental Fig. S15 image. A. liver CAGE tracks across Alb, labeled in extraction order: 169 tpm unspliced; 55 tpm spliced. B. XIST tracks, labeled in extraction order: 27.2 tpm > 6.5 kb; 90.2 tpm 4 to 6.5 kb; 64 tpm 2 to 4 kb; 23.3 tpm 0.5 to 2 kb; 4.67 tpm < 0.5 kb; 4 tpm CAGE.]

Supplemental Fig. S15. Examples of small RNA genesis by post-transcriptional cleavage. (A) Liver CAGE tags (green panel) map prevalently to the mouse Alb gene coding exons. (B) The XIST ncRNA encompasses prevalent exonic CAGE tags (orange panel) and is expressed in all size-fractionated RNAseq preparations (gray panels).

[Supplemental Fig. S16 image. A. Met peptide tag frequency distribution (from PRIDE database), y-axis 0 to 450. B. All peptide tag frequency distribution (from PRIDE database), y-axis 1.9 to 2.3. Both panels: x-axis, position (nt) relative to exonic CAGE tag site, -50 nt to +50 nt.]

Supplemental Fig. S16. Enrichment of start codons (Met) downstream of post-transcriptional cleavage sites. (A) Frequency distribution of peptide tags initiating with a methionine (Met) residue across a 100 nt window centered on exonic CAGE tags (blue) and random exonic nucleotides (red, control). (B) Frequency distribution of peptide tags across a 100 nt window centered on exonic CAGE tags (blue) and random exonic nucleotides (red, control). Peptide tags were downloaded from the PRIDE database (www.ebi.ac.uk/pride/).
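The random exonic control used in Supplemental Fig. S16 (and the matched-exon control in Supplemental Fig. S6) could be generated as sketched below: for each exonic CAGE site, draw one position uniformly from the same exon. The function name `matched_random_controls`, the input layout and the one-draw-per-site choice are assumptions for illustration, not the authors' documented procedure.

```python
import random


def matched_random_controls(sites, exons, seed=0):
    """Draw one random control position per CAGE site from the same exon.

    sites: list of (exon_id, position) for exonic CAGE tag sites.
    exons: dict mapping exon_id -> (start, end), inclusive coordinates.
    seed: fixed seed so the control set is reproducible.
    """
    rng = random.Random(seed)
    controls = []
    for exon_id, _position in sites:
        start, end = exons[exon_id]
        # randint is inclusive at both ends, matching the inclusive exon bounds.
        controls.append((exon_id, rng.randint(start, end)))
    return controls


# Toy usage with two hypothetical exons.
toy_exons = {"exonA": (100, 250), "exonB": (1000, 1100)}
toy_sites = [("exonA", 120), ("exonA", 200), ("exonB", 1050)]
toy_controls = matched_random_controls(toy_sites, toy_exons, seed=42)
```

The control positions can then be passed through the same windowed frequency calculation as the real CAGE sites to give the comparison curve.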