Reddy, Timothy E. et al. Supp. Figures and Tables for Differential allelic TF occupancy and expression

Supplementary Figures and Tables for “The effects of genome sequence on differential allelic transcription factor occupancy and gene expression”

Timothy E. Reddy1,2, Jason Gertz1, Florencia Pauli1, Katerina S. Kucera2, Katherine E. Varley1, Kimberly M. Newberry1, Georgi K. Marinov3, Ali Mortazavi3,4, Brian A. Williams3, Lingyun Song2, Gregory E. Crawford2, Barbara Wold3, Huntington F. Willard2, Richard M. Myers1*

1 HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
2 Duke Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA
3 Department of Biology, California Institute of Technology, Pasadena, CA, USA
4 Developmental & Cell Biology, University of California, Irvine, CA, USA
* To whom correspondence should be addressed ([email protected])

Supplementary Fig. 1:
[Scatter plot. x-axis: log2(paternal/maternal), SNP 1; y-axis: log2(paternal/maternal), SNP 2; both axes range from -10 to 10.]
Intra-binding site correlation of differential allelic occupancy between SNPs. First, all binding sites that covered multiple SNPs with high coverage (at least 20 aligned reads) were identified. Then, the log2 ratio of paternal to maternal reads was calculated for each SNP and plotted for all pairs of SNPs in each binding site. Allelic biases between intra-peak SNPs were correlated with ρ = 0.65.

Supplementary Fig. 2:
[Scatter plot. x-axis: Fraction Maternal Occupancy, Replicate 1; y-axis: Fraction Maternal Occupancy, Replicate 2; both axes range from 0.00 to 1.00; annotation: ρ = 0.91.]
Reproducibility of allelic biases in occupancy.
For each site of differential allelic occupancy, the fraction of reads aligning to the maternal allele is plotted for two biological replicates. Magenta line indicates linear regression between the replicates (ρ = 0.91). Black dots are autosomal sites (ρ = 0.91), and blue circles are X chromosomal sites (ρ = 0.70).

Supplementary Fig. 3:
[Bar plot. x-axis: Chromosome (1 to 22 and X); y-axis: Significant / Assayed, ranging from 0.0 to 0.5.]
Per-chromosome distribution of allele-biased occupancy. For each chromosome (x-axis), the fraction of binding sites with significant allele-biased occupancy (y-axis) is plotted. Notably, binding sites on the X chromosome are far more likely to have allele-biased occupancy, as expected due to skewed X inactivation in GM12878 cells.

Supplementary Fig. 4:
[Histogram. x-axis: Fraction Maternal Occupancy, ranging from 0.0 to 1.0; y-axis: Fraction of Binding Sites, ranging from 0.0 to 0.5; legend: Autosomes, X.]
Histogram of maternal bias for all sites of differential allelic occupancy. White bars are autosomal sites, and black bars are X chromosomal sites.

Supplementary Fig. 5:
[Log-log histogram. x-axis: # of TFs binding at locus, ranging from 1 to 10; y-axis: # of loci, ranging from 1 to 1000; annotation: power-law exponent -3.3.]
Overlap structure of differential allelic occupancy across the genome. Overlapping binding sites for different factors were clustered together on the genome. Each cluster of overlapping binding sites was referred to as a locus. Plotted is a histogram of the number of TFs binding in each locus. The dashed line indicates a power-law distribution fit using maximum likelihood. The power-law fit was significant according to a Kolmogorov-Smirnov goodness-of-fit test with p = 0.40.
Likelihood ratio tests to rule out fits to closely related distributions suggest that the distribution is more likely to be a power law than an exponential distribution (p = 0.03) or a Poisson distribution (p = 0.009), but did not have sufficient power to distinguish it from a Weibull distribution (p = 0.4) or a log-normal distribution (p = 0.43). Fitting the power-law distribution, goodness-of-fit tests, and likelihood ratio tests were all performed as described in (Clauset et al. 2009) using code provided by the authors.

Supplementary Fig. 6:
(Included as external image due to size)
Scatter-plot of allele-biased occupancy at co-bound SNPs for all pairs of transcription factors. Color indicates the amount of correlation, if significant, using the same color bar as in Fig. 1b. White indicates no significant correlation (p > 0.05). Matrix is organized as for Fig. 1b.

Supplementary Fig. 7:
Distribution of coordinated differential allelic TF occupancy. For all pairs of factors with significant (p < 0.05) correlation in differential allelic occupancy, the number of such pairs (y-axis) is plotted as a function of the Spearman correlation coefficient (x-axis).

Supplementary Fig. 8:
Distribution of differences in TF binding motif similarity between bound and unbound alleles (plotted as log scale on the x-axis) for differentially bound (red) and equally bound (white) sites. Overall, there is more similarity on the bound allele, even when the difference in allelic occupancy between the two alleles was not significant. However, when allelic differences in binding were significant, the difference was overall larger. Note that the x-axis is on a log scale and that subtle differences between red and white bars are indeed substantial.

Supplementary Fig. 9:
[Scatter plot. x-axis: Allele-biased Expression, Replicate 1; y-axis: Allele-biased Expression, Replicate 2; both axes range from 0.0 to 1.0; annotation: r² = 0.88.]
Reproducibility of measurement of differential allelic expression. For each gene with differential allelic expression, the fraction of maternal expression was plotted for two biological replicates (x- and y-axes). Magenta line indicates linear regression between the replicates.

Supplementary Fig. 10:
Validation of differential allelic expression. To validate differential allelic expression from RNA-seq, we used PCR to amplify fragments of genomic DNA and RT-PCR to amplify the same fragments of expressed mature RNAs. We then cloned fragments into a sequencing plasmid, transformed the plasmid into E. coli, and grew the transformed E. coli on selective media. We picked colonies, isolated the plasmids, and sequenced the cloned inserts. Plotted is the fraction of maternal expression determined by RNA-seq (x-axis) against the fraction of maternal expression determined by cloning (y-axis). Differential allelic expression of all six genes tested matched differential allelic expression by RNA-seq. For five of the six genes, the differential allelic bias was significant compared to the genomic DNA sequencing (p < 0.5, two-sided binomial test). For the sixth gene (ZNF132), 11 of 16 colonies matched the paternal allele, and additional sequencing may provide the statistical power needed to show significance.

Supplementary Fig. 11:
[Bar plot. x-axis: Chromosome (1 to 22 and X); y-axis: Percent ABE, ranging from 0.00 to 1.00.]
Per-chromosome distribution of allele-biased expression. For each chromosome (x-axis), the fraction of heterozygous genes with allele-biased expression is indicated on the y-axis. As expected, allele-biased expression is far more prevalent on the X chromosome.

Supplementary Fig. 12:
[Plot with error bars. x-axis: Chromosome (1 to 22 and X); y-axis: Maternal Bias, ranging from 0.00 to 1.00.]
Per-chromosome distribution of maternal bias (i.e. the fraction of expression arising from the maternal allele) for all genes with allele-biased expression. For each chromosome (x-axis), the median maternal bias (y-axis) is close to 0.5, indicating that as much biased expression arises from the maternal allele as from the paternal allele. For the X chromosome, however, the majority of expression arises from the maternal (predominantly active) allele. Error bars indicate the full range of the data.

Supplementary Fig. 13:
[Horizontal bar plot. x-axis: Maternal Bias, ranging from 0.0 to 1.0; y-axis: LOC654433, LOC550643, LOC388152, LOC253039, LOC84740, XIST, KCNQ1OT1.]
Fraction of maternal expression (x-axis) for all long non-coding RNAs with allele-biased expression (y-axis). XIST and KCNQ1OT1 are well known to have allele-biased expression. The others are novel.

Supplementary Fig. 14:
Comparison of differential allelic expression of X chromosomal, non-pseudoautosomal (i.e. subject to inactivation) genes between clonal isolates of GM12878 with the maternal X inactivated (x-axis) and isolates with the paternal X inactivated (y-axis).
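
Illustrative note on the two-sided binomial test mentioned for Supplementary Fig. 10. The following is a minimal Python sketch, not the authors' code, of an exact two-sided binomial test applied to the ZNF132 colony counts given in that caption (11 of 16 colonies matching the paternal allele). The null fraction of 0.5 is an assumption made here for illustration: the caption compares against genomic DNA sequencing, whose counts are not given in this section. The function names `binom_pmf` and `binom_test_two_sided` are hypothetical helpers.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count under the null hypothesis.
    """
    observed = binom_pmf(k, n, p_null)
    # Small relative tolerance so outcomes tied with the observed one are included.
    threshold = observed * (1.0 + 1e-7)
    total = sum(
        prob
        for prob in (binom_pmf(i, n, p_null) for i in range(n + 1))
        if prob <= threshold
    )
    return min(1.0, total)


# ZNF132: 11 of 16 colonies matched the paternal allele (counts from the caption).
# ASSUMPTION: a null paternal fraction of 0.5; the caption's actual reference is
# the genomic DNA sequencing, whose counts are not available here.
znf132_p = binom_test_two_sided(11, 16, p_null=0.5)
print(f"ZNF132 two-sided p-value under a 0.5 null: {znf132_p:.3f}")
```

Under this assumed null, the p-value comes out at roughly 0.21, which illustrates why 16 colonies may be too few to show a bias of this size and why additional sequencing could help.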