Supplemental Fig. S1. Evidence of Post-Transcriptional Cleavage from CAGE Tag Analysis
Total Page:16
File Type:pdf, Size:1020Kb
A. ~20 nt exon-exon junction (EEJ) ~20 nt exon 1 3’ 5’ exon 2 CAGE tag B. 3000 CAGE tags mapping to genome CAGE tags mapping across exon-exon junction 2000 1000 CAGE tags (total count) tags (total CAGE -50 nt 3’ 5’ +30 nt CAGE tag 5’ termini distribution relative to exon-exon junctions Supplemental Fig. S1. Evidence of post-transcriptional cleavage from CAGE tag analysis. (A) Schematic diagram illustrating the mapping strategy employed to discern CAGE tags that span exon-exon junctions (EEJs). (B) Distribution of mouse CAGE tags mapping proximal to RefSeq EEJs. The frequency of CAGE tags that map uniquely and exactly to the genome but do not span EEJs (blue) are under-represented adjacent to exon boundaries. This under-representation can be reconciled with CAGE tags that map uniquely and exactly across EEJs (red). The distribution also shows a distinct peak of enrichment of CAGE tags immediately downstream of EEJs (green arrow). A. H3K4me1 (human CD4+ T cells) B. H3K4me2 (human CD4+T cells) C. H3K4me3 (human CD4+ T cells) exon CAGE promoter CAGE 0.05 0.25 0.04 average tag frequency average -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) D. H3K9ac (human CD4+ T cells) E. H2Az (human CD4+ T cells) F. RNA Polymerase 2 (human CD4+ T cells) 0.1 0.06 0.04 average tag frequency average -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) G. H3K4me2 (mouse brain) H. H3K4me3 (mouse brain) I. H3K27me3 (mouse brain) 0.22 1 1.8 average tag frequency average -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt -2000 nt +1 +2000 nt position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) position relative to CAGE tag (nucleotides; nt) Supplemental Fig. S2. Frequency distribution for modied chromatin immunoprecipitated tags relative to CAGE sites associated with gene promoters and exons. (A-I) The average frequency of DNA tags immunoprecipitated within a 4 kb window centered on CAGE tags mapping to promoters (blue) and coding exons (red) with antibodies to H3K4me3 (A), H3K2me2 (B), H3K4me3 (C), H3K9ac (D), H2Az (E) and RNA polymerase II (F) in human CD4+ T cells (Wang et al., 2008) and H3K4me2 (G), H3K4me3 (H) and H4K27me3 (I) in adult mouse brain (Mikkelsen et al., 2007). A. single-peak i. promoter 1 bit TATA box A A G G T AAT A GGGGGCC GGCCC CG CCGGC G C CG GG G AG C G G G GG CC C G CCGGG GC TG C C CC TCCTTGACT TA TG C ii. coding exon 1 bit CG G C GGG G G C G G G GGG C C C T CG GG G G CGG C C G G A G C GC GCC C C GGT C GAC GAT TC TTC CTCTG T A A C C C TC T AAC G AACT A TATGA C A C C ACA -30 -20 -10 +1 +10 +20 +30 B. broad-peak i. promoter 1 bit GGGG GGG GGGG C GG GG G G GGGGGG CAGC GT GGGGGGGGGGGGGGGGGGGGGGGGG C CCCCCCCCCCCCCCCCCC CCGTG GCC CCCCCCCCCCCCCCCCCCCCCCCC GTCTTA ii. coding exon 1 bit CG G G G G AGAGC G GGG G GG GGGG G G C C C GTA C C C CC C CT C -30 -20 -10 +1G +10 +20 +30 C. i. conserved 1 bit G GGAGT G ii. non-conserved G 1 bit C G -30 -20 -10 +1 +10 +20 +30 D. 1 bit CG TAG -30 -20 -10 +1 +10 +20 +30 Supplemental Fig. S3. Nucleotide motifs associated with exonic CAGE tags. (A) Sequence composition for a 60 nt window centered on single-peak CAGE tag clusters mapping to promoter (i) or coding exons (ii). (B) Sequence composition for a 60 nt window centered on broad-peak CAGE tag clusters mapping to promoter (i) or coding exons (ii). (C) Sequence composition for a 60 nt window centred on exonic CAGE tags that are conserved (i) or nonconserved (ii) between human and mouse. (D) Sequence composition for a 60 nt window centred on CAGE tag cluster after removal of terminal 5’ nucleotide from CAGE tags to control for potential 5’ guanine non-template nucleotide addition. A. genes encoded on mitochondrial genome B. genes encoded on nuclear genome 400000 300000 PARE tags CAGE tags 350000 250000 300000 200000 250000 150000 200000 150000 100000 100000 total frequencytotal distribution (tpm) 50000 frequencytotal distribution (tpm) 50000 0 0 -50 +1 +50 -50 +1 +50 position relative to transcription start site position relative to transcription start site C. D. 10000 1000000 Scgb1a1 100000 1000 Alb ApoE 10000 100 1000 100 PARE tags 10 10 frequency distribution (tpm) CAGE tags exonic PARE tag frequency (tpm) r2 = 0.29 1 1 1 100 1 10 100 1000 10000 Gene body (RefSeq; 100 deciles) exonic CAGE tag frequency (tpm) E. 5_ 0_ CAGE (Liver) CAGE 5_ 0_ PARE (Liver) PARE Alb chr5: 90891000 90893000 90895000 90897000 90899000 90901000 90903000 90905000 136 _ 1_ CAGE (Liver) CAGE 126 _ 1_ PARE (Liver) PARE T A L A E L V K H K P K A T A E Q L K T V M D D F A Q F L D T C C K A A D K D T C F S T E chr5: 90903590 90903610 90903630 90903650 90903670 90903690 90903710 Supplemental Fig S4. Comparative distribution of PARE and CAGE tags. (A,B). Frequency distri- bution of PARE (red) and CAGE (blue) tags across a 100 nt window centred on transcription start sites in the mitochondrial (A) and nuclear (B) genome. (C) Frequency distribution of PARE (red) and CAGE (blue) tags across gene body. (D) Scatter plot comparing CAGE and PARE tag frequencies mapping to RefSeq gene exons. (E) Genome browser view of the Alb gene (top panel) with detail shown (bottom panel). The Alb gene is prevalently tiled with both CAG and PARE tags. 10000 1000 100 10 Mouse RefSeq Genes 1 r2=0.31 0.1 0.1 1 10 100 1000 10000 Human RefSeq Genes Supplemental Fig. S5. Conservation of cleavage frequency between human and mouse. Scatter-plot graph for cleavage frequency of homologous genes between human and mouse (Pearson’s correlation r 2 = 0.31). A. mouse embryonic stem cell small RNAs C. cytoplasmic small RNAs 0.018 500 small RNAs 0.016 spliced random control 200 nonspliced 0.014 0.012 300 0.01 200 0.008 small RNA raw tag 0.006 frequency distibution mES small RNAs (total count) (total mES small RNAs 100 0.004 0.002 -40-30nt nt 3’ 3’5’ +20 nt 5’ 0 -150 nt 0 +150 nt small RNA 5’ termini distribution relative to exon-exon junctions position (nt) relative to exonic CAGE tag site B. nuclear small RNAs D. spliced small RNAs 0.03 small RNAs 0.16 random control 0.025 0.14 cytoplasmic 0.12 nuclear 0.02 0.1 0.015 0.08 0.01 0.06 small RNA raw tag frequency distibution 0.04 0.005 tag frequncy distribution spliced small RNA normalized 0.02 0 0 -150 nt 0 +150 nt -30 nt +1 nt position (nt) relative to exon/exon junction position (nt) relative to exonic CAGE tag site Supplemental Fig. S6. Evidence that post-transcriptional cleavage generates small RNAs. (A) Schematic diagram illustrating the mapping strategy employed to discern small RNAs that span EEJs. Distribution of small RNA tags derived from mouse embryonic stem cells (Babiarz et al. 2008) mapping proximal to RefSeq EEJs. The frequency of small RNAs that map uniquely to the genome (blue) are under-represented adjacent to EEJs. This under-representation may be reconciled with small RNA tags that map across EEJs (red). Average frequency distribution for small RNA tags derived from human THP1 cells nuclear (B) and cytoplasmic (C) preparations (Taft et al., 2009) within a 200 nt window relative to exonic CAGE tags (blue) and random nucleotides within matched exons (red). (D) Frequency distribution of small RNAs derived from THP1 cytoplasmic (red) and nuclear (blue) across EEJs. A. CAGE tag frequency distribution Visual Cortex 28tpm Somatic Cortex 10 tpm Cerebellum 5tpm Hippocampus 1tpm Embryo 5tpm miR miR* mmu-miR-124-1 AK044422 BC030402 mmu-mir-122a: B. gg c - uc agcugu aguguga aaugguguuug ug c |||||| ||||||| ||||||||||| || a ucgaua ucacacu uuaccgcaaac ac a ugccuaacggaucguga aa a u ca 142 tags (liver) Supplemental Fig. S7. Secondary capping of microRNA precursors. (A) Genome browser view of mmu-miR-124-1 hairpin (light red bar) showing CAGE tags immediately downstream of star sequence (dark red bar) in mouse tissues (blue panels). The mmu-miR-124-1 hairpin is hosted within full-length RNA transcripts (black bars). (B) Schematic diagram of the mmu-miR-122a hairpin showing mature microRNA sequence (red) and CAGE tag locations (blue) in mouse liver. A. M. musculus embryonic stem cells wild-type Dcr-1 null Dgcr8 null 0.25 0.12 0.12 genomic spliced (0.12% or 1872 tags) (0.13% of 1914 tags) 0.2 0.1 0.1 0.08 0.08 0.15 0.06 0.06 0.1 0.04 0.04 0.05 (0.08% or 897 tags) 0.02 0.02 0 small RNA (average count) small RNA (average 0 16 17 18 19 20 21 22 23 24 25 26 16 17 18 19 20 21 22 23 24 25 26 16 17 18 19 20 21 22 23 24 25 26 size (nt) size (nt) size (nt) B.