Distinct Patterns of Epigenetic Marks and Transcription Factor Binding Sites Across Promoters of Sense-Intronic Long Noncoding RNAs
Supplementary data: Distinct patterns of epigenetic marks and transcription factor binding sites across promoters of sense-intronic long noncoding RNAs
Sourav Ghosh, Satish Sati, Shantanu Sengupta and Vinod Scaria
Journal of Genetics, J. Genet. 94, 17–25

[Figure 1 panels: GENCODE V9 lncRNA genes: 11004; known lncRNA: 1175; novel lncRNA: 5898; putative lncRNA: 3931; scale: Min to Max]
Figure 1. Distribution patterns of different histone modifications and TFBSs across the TSS of known, novel and putative lncRNA genes. Histone modification peak summit counts in the H1 cell line were calculated in 100 bp sliding windows within 5 kb upstream and downstream of the TSS. Each bin count was then normalized by dividing it by the total number of genes in the corresponding category.

Figure 2. RNA pol II and pol III occupancy around the TSS, represented as heat maps. (a) RNA pol III peak summits in K562 cells, plotted across the TSS (5 kb upstream and downstream) in 100 bp nonoverlapping sliding windows, for all protein-coding genes, all lncRNA genes and the lncRNA subclasses. (b) RNA pol III peak summit counts in GM12878 cells, calculated in 100 bp nonoverlapping sliding windows within 5 kb upstream and downstream of the TSS, for all protein-coding genes, all lncRNA genes and the lncRNA subclasses. (c) RNA pol II peak summit counts in GM12878 cells, calculated in 100 bp sliding windows within 5 kb upstream and downstream of the TSS, for all protein-coding genes, all lncRNA genes and the lncRNA subclasses. Counts were normalized by dividing each count by the total number of genes in the corresponding category.

Figure 3. Distribution of different histone modifications across the TSS of antisense-intronic lncRNA genes. Peak summit counts for the different histone modifications in the H1 cell line were calculated in 100 bp sliding windows within 5 kb upstream and downstream of the TSS.
The peak summit counts were normalized in the same way, by dividing each count by the total number of genes.

[Figure 4 panels: GAS5, HOTAIR, HULC, MALAT1, PCGEM1, H19, TUG1, UCA1, XIST, MEG3]
Figure 4. Distribution of different histone modification marks and TFBSs across the promoters of 10 well-established lncRNA loci (GAS5, HOTAIR, HULC, MALAT1, PCGEM1, H19, TUG1, UCA1, XIST and MEG3). Histone modification distributions are shown in the UCSC Genome Browser, using uploaded wig files from adult liver tissue.

Figure 5. Distribution of TFBSs across the distal promoter region. TFBS counts were calculated in 100 bp sliding windows within 20 kb upstream and downstream of the TSS for all protein-coding genes, lncRNA genes and lncRNA subclasses. Each count was normalized by dividing it by the total number of genes in that category, and the normalized counts were plotted as a heat map.

Figure 6. Distribution of the H3K4me3 mark across the TSS of highly expressed and lowly expressed protein-coding and lncRNA genes in the H1 cell line. H3K4me3 peak summit counts were calculated in 100 bp sliding windows within 5 kb upstream and downstream of the TSS. Counts were normalized by dividing by the number of genes in each category. For protein-coding and lncRNA genes, the counts were plotted as line plots; for the four lncRNA subclasses, they were plotted as a heat map.

Table 1. LncRNAs annotated with protein-coding genes.
Sense lncRNA
lncRNA gene                                                         |  Protein-coding gene
ENSG-ID            chr   Strand  Start     Stop      Gene name      |  ENSG-ID             chr   Strand  Start     Stop      Gene name
ENSG00000224813.1  chr1  +       329784    453948    RP4-669L17.4   |  ENSG00000235249.1   chr1  +       367640    368634    OR4F29
ENSG00000230021.1  chr1  −       536816    659930    RP5-857K21.4   |  ENSG00000185097.2   chr1  −       621059    622053    OR4F16
ENSG00000237491.2  chr1  +       714162    740255    RP11-206L10.6  |  ENSG00000197049.2   chr1  +       721320    722513    AL669831.1
ENSG00000242590.1  chr1  +       990413    991496    RP11-54O7.14   |  ENSG00000188157.8   chr1  +       955503    991496    AGRN
ENSG00000230415.1  chr1  +       1210603   1215800   RP5-902P8.10   |  ENSG00000162572.14  chr1  +       1214447   1227409   SCNN1D
ENSG00000240731.1  chr1  −       1252961   1254069   RP5-890O3.9    |  ENSG00000127054.12  chr1  −       1246965   1260071   CPSF3L
ENSG00000250188.1  chr1  −       1337454   1339205   RP4-758J18.5   |  ENSG00000242485.1   chr1  −       1337288   1342693   MRPL20
ENSG00000243558.1  chr1  −       2143610   2144013   RP11-181G12.5  |  ENSG00000162585.11  chr1  −       2115903   2144159   C1orf86
ENSG00000243558.1  chr1  −       2143610   2144013   RP11-181G12.5  |  ENSG00000203301.2   chr1  −       2143360   2145620   AL590822.1
ENSG00000234396.2  chr1  −       2143962   2152175   RP11-181G12.4  |  ENSG00000162585.11  chr1  −       2115903   2144159   C1orf86
ENSG00000234396.2  chr1  −       2143962   2152175   RP11-181G12.4  |  ENSG00000203301.2   chr1  −       2143360   2145620   AL590822.1
ENSG00000226944.1  chr1  −       6264900   6265840   RP1-120G22.11  |  ENSG00000116251.5   chr1  −       6241329   6269449   RPL22
ENSG00000237402.1  chr1  +       7429003   7430331   CAMTA1-IT1     |  ENSG00000171735.12  chr1  +       6845384   7827916   CAMTA1
ENSG00000225820.1  chr1  −       8066074   8066784   ERRFI1-IT1     |  ENSG00000116285.8   chr1  −       8064464   8086368   ERRFI1
ENSG00000236269.1  chr1  −       8935847   8938066   ENO1-IT1       |  ENSG00000074800.8   chr1  −       8921061   8939308   ENO1
ENSG00000228150.1  chr1  +       10002981  10010032  RP11-84A14.4   |  ENSG00000173614.9   chr1  +       10003486  10045559  NMNAT1
ENSG00000241326.1  chr1  +       10043199  10044626  RP11-807G9.2   |  ENSG00000173614.9   chr1  +       10003486  10045559  NMNAT1
ENSG00000242349.1  chr1  +       11901074  11908136  NPPA-AS1       |  ENSG00000011021.17  chr1  +       11866207  11903201  CLCN6
ENSG00000255275.1  chr1  −       19180707  19247618  RP13-279N23.2  |  ENSG00000179002.5   chr1  −       19166093  19186176  TAS1R2
ENSG00000255275.1  chr1  −       19180707  19247618  RP13-279N23.2  |  ENSG00000159423.11  chr1  −       19197926  19229275  ALDH4A1
ENSG00000255275.1  chr1  −       19180707  19247618  RP13-279N23.2  |  ENSG00000169991.5   chr1  −       19230774  19283180  IFFO2
ENSG00000117242.7  chr1  −       20969150  20978686  PINK1-AS1      |  ENSG00000244038.2   chr1  −       20978270  20988000  DDOST
ENSG00000230068.1  chr1  +       22385690  22390692  CDC42-IT1      |  ENSG00000070831.11  chr1  +       22379120  22419437  CDC42
ENSG00000237200.1  chr1  +       22843967  22846201  ZBTB40-IT1     |  ENSG00000184677.11  chr1  +       22778344  22857650  ZBTB40
ENSG00000240553.1  chr1  −       23346640  23414551  RP1-184J9.2    |  ENSG00000169641.9   chr1  −       23410516  23504301  LUZP1
ENSG00000232557.1  chr1  −       24233601  24234375  RP11-4M23.3    |  ENSG00000188822.6   chr1  −       24197016  24285549  CNR2
ENSG00000240284.1  chr1  −       24382586  24410622  RP11-293P20.3  |  ENSG00000142661.12  chr1  −       24382525  24438665  MYOM3
ENSG00000227985.1  chr1  −       25579798  25594376  RP11-335G20.5  |  ENSG00000117616.11  chr1  −       25568740  25594327  C1orf63
ENSG00000255054.2  chr1  +       26140464  26150235  RP1-317E23.6   |  ENSG00000162430.12  chr1  +       26126667  26144713  SEPN1
ENSG00000255054.2  chr1  +       26140464  26150235  RP1-317E23.6   |  ENSG00000117640.13  chr1  +       26145131  26159432  FAM54B
ENSG00000228172.1  chr1  −       26143240  26146263  RP1-317E23.3   |  ENSG00000223474.2   chr1  −       26145212  26147288  AL020996.1
ENSG00000225375.1  chr1  +       26575392  26580855  RP11-231P20.3  |  ENSG00000130695.8   chr1  +       26560691  26605299  CEP85
ENSG00000251020.1  chr1  −       27218269  27219052  RP1-50O24.8    |  ENSG00000198746.7   chr1  −       27216979  27226962  GPATCH3
ENSG00000237895.1  chr1  −       28356225  28357423  EYA3-IT1       |  ENSG00000158161.11  chr1  −       28296855  28415207  EYA3
ENSG00000242125.1  chr1  +       28832492  28837404  SNHG3          |  ENSG00000180198.9   chr1  +       28832455  28865708  RCC1
ENSG00000250135.1  chr1  +       32636334  32642169  RP4-622L5.2    |  ENSG00000025800.8   chr1  +       32573639  32642169  KPNA6
ENSG00000254553.1  chr1  +       32930658  33066393  RP1-27O5.3     |  ENSG00000215897.4   chr1  +       32930670  32953461  ZBTB8B
ENSG00000254553.1  chr1  +       32930658  33066393  RP1-27O5.3     |  ENSG00000160062.9   chr1  +       33005028  33066591  ZBTB8A
ENSG00000255811.1  chr1  −       35227027  35253698  RP1-34M23.5    |  ENSG00000163866.7   chr1  −       35178338  35325417  C1orf212
ENSG00000241014.1  chr1  −       35439957  35450914  RP11-244H3.1   |  ENSG00000163867.11  chr1  −       35447134  35497569  ZMYM6
ENSG00000241014.1  chr1  −       35439957  35450914  RP11-244H3.1   |  ENSG00000243749.1   chr1  −       35447136  35450954  ZMYM6NB
ENSG00000225306.1  chr1  +       36057977  36059256  RP5-983H21.3   |  ENSG00000116819.6   chr1  +       36038971  36060929  TFAP2E
ENSG00000225627.1  chr1  +       40747986  40749177  RP1-39G22.5    |  ENSG00000084073.4   chr1  +       40723779  40759856  ZMPSTE24
ENSG00000227527.1  chr1  −       42801057  42804047  RP11-223A3.1   |  ENSG00000198815.4   chr1  −       42642210  42801548  FOXJ3
ENSG00000233602.1  chr1  −       44709080  44709945  ERI3-IT1       |  ENSG00000117419.9   chr1  −       44686742  44820932  ERI3
ENSG00000248321.1  chr1  +       45190179  45191263  RP4-678E16.4   |  ENSG00000198520.5   chr1  +       45140364  45191263  C1orf228
ENSG00000246279.1  chr1  −       45769863  45771290  AL359540.1     |  ENSG00000162415.6   chr1  −       45482071  45771881  ZSWIM5
ENSG00000230896.1  chr1  −       46160356  46162747  RP11-767N6.7   |  ENSG00000197429.6   chr1  −       46159996  46216322  IPP
ENSG00000250719.1  chr1  −       46641158  46642150  RP11-322N21.2  |  ENSG00000117461.10  chr1  −       46505812  46642160  PIK3R3
ENSG00000225623.1  chr1  −       49839873  49937757  AGBL4-IT1      |  ENSG00000186094.11  chr1  −       48998527  50489585  AGBL4
ENSG00000236434.1  chr1  +       51730588  51731705  RP11-296A18.6  |  ENSG00000123091.4   chr1  +       51701943  51739127  RNF11
ENSG00000230272.1  chr1  +       51731525  51732716  RP11-296A18.7  |  ENSG00000123091.4   chr1  +       51701943  51739127  RNF11
ENSG00000242391.1  chr1  +       53346888  53347402