bioRxiv preprint doi: https://doi.org/10.1101/2020.03.12.989228; this version posted March 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

HisTrader: A Tool to Identify Nucleosome Free Regions from ChIP-Seq of Histone Post-Translational Modifications

Yifei Yan1,2, Ansley Gnanapragasam2,3 and Swneke Bailey1,2,3*

Affiliations: 1. Department of Surgery, Division of Thoracic and Upper Gastrointestinal Surgery, McGill University, Montreal, Canada. 2. The Cancer Research Program, Research Institute of the McGill University Health Centre (RI-MUHC), Montreal, Canada. 3. Department of Human Genetics, McGill University, Montreal, Canada.

*Corresponding author. Contact: [email protected]


ABSTRACT

Motivation: Chromatin immunoprecipitation sequencing (ChIP-Seq) of histone post-translational modifications coupled with de novo motif elucidation and enrichment analyses can identify factors responsible for orchestrating transitions between cell and disease states. However, the identified regulatory elements can span several kilobases (kb) in length, which complicates motif-based analyses. Restricting the length of the target DNA sequence(s) can reduce false positives. Therefore, we present HisTrader, a computational tool to identify the regions accessible to transcription factors, nucleosome free regions (NFRs), within histone modification peaks to reduce the DNA sequence length required for motif analyses.

Results: HisTrader accurately identifies NFRs from H3K27Ac ChIP-seq profiles of the lung cancer cell line A549, which are validated by the presence of DNaseI hypersensitivity. In addition, HisTrader reveals that multiple NFRs are common within individual regulatory elements, an easily overlooked feature that should be considered to improve the sensitivity of motif analyses using histone modification ChIP-seq data.

Availability and implementation: The HisTrader script is open-source and available on GitHub (https://github.com/SvenBaileyLab/Histrader) under the GNU General Public License (GPLv3). HisTrader is written in Perl and can be run on any platform with Perl installed.


Introduction

Cell-type specific transcription factors (TFs) govern cell fate decisions [1] and their aberrant expression can result in the development of diseases, such as cancer [2,3]. TFs control transcriptional programs by binding distinct DNA recognition sequences within noncoding regulatory elements. These regulatory elements are flanked by nucleosomes harbouring epigenetic post-translational modifications (PTMs) to their histone proteins. For example, nucleosomes with histone H3 acetylated at lysine 27 (H3K27Ac) border TF-accessible DNA at active gene promoters and enhancers [4]. These regulatory regions, determined using chromatin immunoprecipitation sequencing (ChIP-Seq), often span several kilobases of DNA and include multiple modified nucleosomes together with TF binding sites. TFs bind to the nucleosome free region (NFR) located between the modified nucleosomes, and the location of bound TFs can be inferred from the presence of a depression or valley in the ChIP-seq signal profile [5]. Therefore, ChIP-Seq of histone PTMs can simultaneously identify active regulatory elements and, through motif analyses, the TFs occupying them.

TF DNA recognition motifs are short DNA sequences that can occur by chance within long DNA sequences. Reducing the length of the DNA sequences used in motif analyses can reduce false positive motif occurrences and computational complexity. Using only TF-accessible regions, NFRs, provides a means to reduce the length of the target DNA sequence(s). For example, Ramsey et al. [6] demonstrated that the mapping of motifs within NFRs more accurately predicted TF occupancy, and Ziller et al. [7] used NFRs to identify TFs governing cell fate transitions in a model of neuronal development. In cancer, TF motif enrichment within NFRs revealed master transcriptional regulators active within medulloblastoma subtypes [8]. These findings illustrate that acquired and lost regulatory elements observed in different cell and disease states can be interrogated for the presence of recognition motifs to elucidate the TFs responsible for a given cellular or disease state.
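To make the chance-occurrence argument concrete, the following minimal sketch estimates how many exact matches of a single fixed motif would be expected in random DNA of a given length. It is an illustration only and not part of HisTrader: the function name is hypothetical, the model assumes independent and equiprobable bases (which real genomes violate), and counting both strands double-counts palindromic motifs. The two lengths used are the average broad-peak and NFR lengths reported in the Results.

```python
def expected_chance_matches(seq_len: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected number of exact matches of one fixed motif in random DNA.

    Assumption: every base is independent with probability 1/4, so a motif
    of length k matches a given start position with probability (1/4)**k.
    """
    positions = max(seq_len - motif_len + 1, 0)  # possible start positions
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_len


if __name__ == "__main__":
    # Lengths taken from the Results: average broad peak vs. average NFR.
    for label, length in [("average broad peak", 1245), ("average NFR", 212)]:
        print(f"{label} ({length} bp): {expected_chance_matches(length, 6):.2f}")
```

Under these simplifying assumptions a 6 bp motif is expected about 0.61 times by chance in a 1,245bp peak but only about 0.10 times in a 212bp NFR, which is the motivation for restricting motif searches to NFRs.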

Several tools can be used and adapted to identify NFRs within histone PTM ChIP-Seq peaks. For example, the Homer [9] peak calling algorithm can centre peaks on the most likely NFR. However, this approach is limited to one NFR per peak region. PeakSplitter, within PeakAnalyzer [10], can identify multiple subpeaks within broad ChIP-seq peaks, but the subpeaks are directly adjacent and touching one another. In addition, neither approach identifies the boundaries between nucleosome occupied regions (NORs) and NFRs. FindPeaks [11] can be used to identify subpeaks, but requires a user-specified minimum read depth. Finally, BINOCh can identify changes in nucleosome occupancy between a control and a treatment condition [12]. However, a tool that identifies the location and boundaries between NORs and NFRs is currently unavailable. Therefore, we designed HisTrader, a new tool capable of distinguishing NFRs and NORs within histone PTM ChIP-Seq signal profiles independently of peak height or signal threshold.

Methods

HisTrader adopts differencing and moving averages, two methods from time series data analysis, to identify local minima, NFRs, and local maxima, NORs, within the ChIP-Seq signal of histone PTMs (Figure 1A). First, HisTrader converts the ChIP-Seq signal profile into equal sized base pair (bp) intervals, which is analogous to equally spaced time measurements in time series data. Next, HisTrader applies a narrow and a broad moving average to the ChIP-Seq signal, which correspond to a short-term and a long-term moving average, respectively. Points where the two moving averages intersect denote transitions between NORs and NFRs. HisTrader also applies second-order differencing to the signal profile to identify changes in its curvature. Positive values indicate a downward curvature and negative values indicate upward curvature of the signal. HisTrader merges the positive and negative intervals together to call NFRs and NORs, respectively. Finally, HisTrader reports the consensus between the results of the two approaches. HisTrader requires the broad peak regions and the ChIP-Seq signal profile, which can be generated using ChIP-Seq peak calling algorithms, such as MACS2 [13]. To facilitate downstream motif analyses, HisTrader can extract the DNA sequences for both the identified NFRs and NORs.
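The procedure above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the HisTrader implementation (which is a Perl script): all function names are hypothetical, the bin size and the window widths (3 and 9 bins) are arbitrary choices, binning uses a length-weighted mean, the moving averages are centred, and a bin is treated as a valley when the narrow average lies below the broad average and the second difference is positive.

```python
from typing import List, Tuple


def to_equal_bins(intervals, start: int, end: int, step: int) -> List[float]:
    """Length-weighted mean of bedGraph-like (start, end, value) intervals per bin."""
    bins = []
    for b_start in range(start, end, step):
        b_end = min(b_start + step, end)
        total = 0.0
        for s, e, v in intervals:
            overlap = min(e, b_end) - max(s, b_start)
            if overlap > 0:
                total += overlap * v
        bins.append(total / (b_end - b_start))
    return bins


def moving_average(values: List[float], window: int) -> List[float]:
    """Centred moving average; the window shrinks at the profile edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def second_difference(values: List[float]) -> List[float]:
    """Discrete second-order difference; the two end bins are set to 0."""
    diff = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        diff[i] = values[i + 1] - 2 * values[i] + values[i - 1]
    return diff


def runs(mask: List[bool]) -> List[Tuple[int, int]]:
    """Merge consecutive True bins into half-open (first_bin, last_bin + 1) ranges."""
    out, start = [], None
    for i, flag in enumerate(list(mask) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    return out


def call_regions(signal: List[float], narrow: int = 3, broad: int = 9):
    """Return (NFR runs, NOR runs) as the consensus of the two approaches."""
    fast = moving_average(signal, narrow)   # short-term average
    slow = moving_average(signal, broad)    # long-term average
    curvature = second_difference(signal)
    nfr = [f < s and d > 0 for f, s, d in zip(fast, slow, curvature)]
    nor = [f > s and d < 0 for f, s, d in zip(fast, slow, curvature)]
    return runs(nfr), runs(nor)


if __name__ == "__main__":
    # Synthetic profile: two nucleosome-like bumps separated by a valley at bin 8.
    profile = [0, 1, 3, 6, 8, 6, 3, 1, 0, 1, 3, 6, 8, 6, 3, 1, 0]
    nfrs, nors = call_regions(profile)
    print("NFR bins:", nfrs)
    print("NOR bins:", nors)
```

On the synthetic profile the central valley is called as one NFR flanked by two NORs. The outer flanks of the profile are also flagged as valleys in this naive version, so a practical implementation would need a rule for peak edges; how HisTrader handles them is not described here.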

Results

We ran HisTrader on H3K27Ac ChIP-Seq data provided by the Encyclopedia of DNA Elements (ENCODE) project [14] for the A549 lung cancer cell line. Briefly, we aligned the reads to the human genome (GRCh38) using BWA [15] and called broad peaks with MACS2 [13] (Supplementary Methods). As expected, the H3K27Ac peaks tended to span several nucleosomes and had an average length of 1,245bp across all three replicates. HisTrader revealed that 67% of the H3K27Ac broad peaks had more than one NFR (Figure 1B). The average NFR length was 212bp and the average distance between NFR midpoints was 456bp. Next, we assessed whether the identified NFRs corresponded to open chromatin by examining DNaseI hypersensitivity sequencing (DNaseI-seq) data from the same A549 cell line. As expected, the average signal at H3K27Ac peaks with a single NFR corresponded to two acetylated nucleosomes flanking a single DNaseI hypersensitive site (DHS) region (Figure S1). Interestingly, the average H3K27Ac profile at sites where HisTrader identified multiple NFRs was less clear (Figures S2, S3 & S4). To unravel this apparent discrepancy, we centred the DNaseI signal on each NFR within these sites and found the DNaseI hypersensitivity signal to be clearly multimodal, with a leptokurtic peak centred on each NFR and neighbouring platykurtic shoulder(s). The broad shoulders of the distribution are consistent with variability in nucleosome positioning and variation in nucleosome spacing (Figure 1B, inset). This variability affects our ability to visualize multiple NFRs. For example, if we restrict the analysis to sites where the NFRs are equally spaced, between 350bp and 550bp apart, multiple adjacent acetylated nucleosomes and DHS regions become readily apparent (Figures 1C, S2, S3 & S4). Multiple NFRs tended to correspond with promoter regions (Figure 1D), and the deepest depression (NFR) is not necessarily associated with the most accessible region (Figures S5 & S6). Finally, HisTrader can distinguish between NFRs and NORs in very large regions (Figure S5).
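The spacing analysis described above can be illustrated with a short sketch that groups NFRs by their parent broad peak, measures the distance between adjacent NFR midpoints, and keeps only peaks whose NFRs are 350bp to 550bp apart. The function names are hypothetical, regions are assumed to be BED-style (chrom, start, end) tuples, the spacing is assumed to be measured between midpoints, and the coordinates in the example are invented for illustration.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def group_nfrs_by_peak(peaks: List[Region], nfrs: List[Region]) -> Dict[Region, List[Region]]:
    """Assign each NFR to the first peak on the same chromosome that contains it."""
    grouped: Dict[Region, List[Region]] = {peak: [] for peak in peaks}
    for nfr in nfrs:
        for peak in peaks:
            if nfr[0] == peak[0] and peak[1] <= nfr[1] and nfr[2] <= peak[2]:
                grouped[peak].append(nfr)
                break
    return grouped


def midpoint_gaps(nfrs: List[Region]) -> List[float]:
    """Distances between adjacent NFR midpoints, in bp."""
    mids = sorted((start + end) / 2 for _, start, end in nfrs)
    return [b - a for a, b in zip(mids, mids[1:])]


def is_evenly_spaced(nfrs: List[Region], min_gap: int = 350, max_gap: int = 550) -> bool:
    """True when there are at least two NFRs and every adjacent gap is in range."""
    gaps = midpoint_gaps(nfrs)
    return len(gaps) > 0 and all(min_gap <= g <= max_gap for g in gaps)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    peaks = [("chr1", 1000, 2500), ("chr1", 5000, 7000)]
    nfrs = [("chr1", 1200, 1400), ("chr1", 1650, 1850),
            ("chr1", 5100, 5300), ("chr1", 6400, 6600)]
    for peak, members in group_nfrs_by_peak(peaks, nfrs).items():
        print(peak, midpoint_gaps(members), is_evenly_spaced(members))
```

In this toy input the first peak has two NFRs whose midpoints are 450bp apart and is retained, whereas the second peak, with a 1,300bp gap, is excluded.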

In conclusion, HisTrader effectively identifies the boundaries between NORs and NFRs and can extract and trim the corresponding DNA sequences for use in motif-based analyses. In addition, HisTrader reveals that multiple NFRs within individual regulatory elements are common and should be considered in motif analyses using histone PTM ChIP-seq data.


Funding/Acknowledgements

The Cancer Research Society (CRS), Canadian Institutes of Health Research (CIHR) and Montreal General Foundation (MGF) supported this research. SDB is supported by a Thomlinson award from McGill University. SDB received the Dr. Ray Chiu Distinguished Scientist in Surgical Research award from the MGF and a Dr. Henry R. Shibata fellowship from the Cedars Cancer Foundation. YY is supported by a Research Institute of the McGill University Health Centre (RI-MUHC) postdoctoral fellowship. We thank Drs. Livia Garzia, Xiaoyang Zhang and James C. Engert for their insightful and helpful comments.


Figure Legends.

Figure 1: Trading histones. (A) Procedure implemented in HisTrader. i) An example H3K27Ac broad peak in A549 cells is shown. ii) HisTrader converts the ChIP-Seq signal profile into equally sized bins. iii) HisTrader applies a narrow and a broad moving average to detect trends in the signal. Crossover points indicate boundaries between NFRs and nucleosome occupied regions (NORs). iv) HisTrader applies second-order differencing to identify changes in the curvature of the signal profile. Transitions from positive to negative values indicate boundaries between NFRs and NORs. Values are multiplied by -1. The consensus between the two methods is reported. v) The NFRs called using each method and their consensus are displayed and compared to the corresponding DNaseI hypersensitivity signal from the A549 cell line. (B) The number of H3K27Ac peaks with multiple HisTrader called NFRs and the distribution of the distances between NFRs (inset). (C) Average H3K27Ac (blue) and DNaseI hypersensitivity signal (red) at sites with two HisTrader called NFRs spaced 350-550bp apart, centred on the first (i) and second NFR (ii). Heatmaps of H3K27Ac (iii) and DNaseI hypersensitivity (iv) signals at sites with two equally spaced NFRs, centred on the first and second NFR. (D) Example sites with (i) one, (ii) two and (iii) three HisTrader identified NFRs. Blue = H3K27Ac, Red = DNaseI hypersensitivity.


References

1. Stadhouders, R., Filion, G.J. & Graf, T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature 569, 345-354 (2019).
2. Bushweller, J.H. Targeting transcription factors in cancer - from undruggable to reality. Nat Rev Cancer 19, 611-624 (2019).
3. Lee, T.I. & Young, R.A. Transcriptional regulation and its misregulation in disease. Cell 152, 1237-51 (2013).
4. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108-12 (2009).
5. He, H.H. et al. Nucleosome dynamics define transcriptional enhancers. Nature Genetics 42, 343-7 (2010).
6. Ramsey, S.A. et al. Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites. Bioinformatics 26, 2071-5 (2010).
7. Ziller, M.J. et al. Dissecting neural differentiation regulatory networks through epigenetic footprinting. Nature 518, 355-359 (2015).
8. Lin, C.Y. et al. Active medulloblastoma enhancers reveal subgroup-specific cellular origins. Nature 530, 57-62 (2016).
9. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-89 (2010).
10. Salmon-Divon, M., Dvinge, H., Tammoja, K. & Bertone, P. PeakAnalyzer: genome-wide annotation of chromatin binding and modification loci. BMC Bioinformatics 11, 415 (2010).
11. Fejes, A.P. et al. FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology. Bioinformatics 24, 1729-30 (2008).
12. Meyer, C.A., He, H.H., Brown, M. & Liu, X.S. BINOCh: binding inference from nucleosome occupancy changes. Bioinformatics 27, 1867-8 (2011).
13. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
14. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74 (2012).
15. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).

Figure 1: Trading histones. [Figure panels A-D. Recovered labels: (A) i-v: Raw Signal, Equal Steps, Moving Averages, -(2nd order difference), and tracks for H3K27Ac Signal, DNaseI, Broad Peak, Moving Averages, Differencing and Consensus; (B) Number of Broad Peaks versus Number of NFRs, with inset Distance (BP); (C) Average H3K27Ac Signal and DNaseI Signal versus Distance (Kb), centred on the 1st NFR and 2nd NFR; (D) i-iii: NUC and NFR tracks at loci labelled WDR70, GTF2H1, HPS5 and FANCC.]

Supplementary Methods

Alignments and Peak Calling

Reads were aligned to the human genome (hg38) using BWA [1]. Broad peaks were called using MACS2 [2] with a q-value of 0.01 and a broad peak cutoff of 0.1, with --nomodel specified. A fold enrichment cut-off of 2 was also used. Extremely large enriched regions were called separately with the local lambda turned off (i.e. --nolambda). In addition, we used MACS2 to create normalized signal files, signal per million reads, for both the H3K27Ac and DNaseI datasets using --SPMR. Since the DNaseI dataset was paired-end, the BAMPE format was specified for this dataset when running MACS2. Heatmaps were created using deepTools [3].
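As a rough guide to how these settings map onto a MACS2 invocation, the sketch below assembles a `macs2 callpeak` argument list in Python. It is a reconstruction under assumptions rather than the exact command used: the helper name, the BAM file names, the `-g hs` genome size and the `-n` output prefix are hypothetical, `-B` is included because MACS2 needs it to write the bedGraph that `--SPMR` normalizes, and the fold enrichment cut-off of 2 is not encoded because the text does not say how it was applied.

```python
import shlex
from typing import List


def macs2_callpeak_args(bam: str, name: str, broad: bool = True,
                        paired_end: bool = False, local_lambda: bool = True) -> List[str]:
    """Build a `macs2 callpeak` argument list mirroring the stated settings."""
    cmd = [
        "macs2", "callpeak",
        "-t", bam,
        "-f", "BAMPE" if paired_end else "BAM",  # BAMPE for paired-end data
        "-g", "hs",                              # assumed human genome size preset
        "-n", name,                              # hypothetical output prefix
        "-q", "0.01",                            # q-value threshold
        "-B", "--SPMR",                          # signal per million reads bedGraph
    ]
    if broad:
        cmd += ["--broad", "--broad-cutoff", "0.1", "--nomodel"]
    if not local_lambda:
        cmd.append("--nolambda")                 # used for extremely large regions
    return cmd


if __name__ == "__main__":
    # File names are placeholders, not the ENCODE accessions.
    print(shlex.join(macs2_callpeak_args("h3k27ac.bam", "A549_H3K27Ac")))
    print(shlex.join(macs2_callpeak_args("dnasei.bam", "A549_DNaseI",
                                         broad=False, paired_end=True)))
```

The list form can be passed directly to `subprocess.run` on a system with MACS2 installed; the printed strings are only for inspection.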

Datasets

H3K27Ac ChIP (ENCFF141ICG, ENCFF561LMW, ENCFF648QLO) and DNaseI (ENCFF533OBQ and ENCFF073FEQ) sequencing reads for the A549 lung cancer cell line were downloaded from the ENCODE website (https://www.encodeproject.org/) [4].

Tutorial

The HisTrader script is open-source and available on GitHub (https://github.com/SvenBaileyLab/Histrader).

1. To call nucleosome free and nucleosome occupied regions, NFRs and NORs respectively, issue the following command:

perl Histrader.pl --bedGraph --peaks --out

After issuing this command, HisTrader will generate an index file, named Histrader.idx, and three output files, named Histrader.nfr.bed, Histrader.nuc.bed and Histrader.missing.bed, which correspond to the NFRs, the nucleosomes or NORs and the peaks where no NFRs were detected, respectively.

2. To simultaneously extract the DNA sequences of NFRs and NORs a genome fasta file must be specified and it should match the genome used for aligning the ChIP-Seq data.

perl Histrader.pl --bedGraph --peaks --out --genome

This command will create two additional files, Histrader.nfr.fa and Histrader.nuc.fa, which contain the DNA sequences for each of the sites in the Histrader.nfr.bed and Histrader.nuc.bed output files in fasta format. These files can be used in downstream motif-based analyses.

3. If the downstream motif analyses require equal length DNA sequences, HisTrader can also trim DNA sequences around the midpoint of the NFRs using the --trim and --trimSize options.

For example, to extract 100bp sequences centred on the midpoint of each NFR issue the following command:

perl Histrader.pl --bedGraph --peaks --out --genome --trim --trimSize 100
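To illustrate what sequence extraction and midpoint trimming amount to, the following Python sketch reads a FASTA file into memory, cuts out BED-style intervals and builds a fixed-size window centred on an interval midpoint. It is not HisTrader's code: the function names and the FASTA header format of the output are hypothetical, intervals are assumed to be 0-based and half-open, and the rounding of odd midpoints and the handling of chromosome ends in HisTrader are not documented here, so the choices below are assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence name: sequence}; suitable for small inputs."""
    seqs: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def trim_to_midpoint(start: int, end: int, size: int) -> Tuple[int, int]:
    """Window of `size` bp centred on the interval midpoint (integer division)."""
    mid = (start + end) // 2
    new_start = max(mid - size // 2, 0)  # clamp at the chromosome start only
    return new_start, new_start + size


def extract(genome: Dict[str, str], regions) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs for (chrom, start, end) regions."""
    for chrom, start, end in regions:
        yield f">{chrom}:{start}-{end}", genome[chrom][start:end]


if __name__ == "__main__":
    genome = read_fasta([">chr1 toy", "ACGTACGT", "TTGGCCAA"])
    window = trim_to_midpoint(2, 12, 4)            # midpoint 7 -> window (5, 9)
    for header, seq in extract(genome, [("chr1",) + window]):
        print(header)
        print(seq)
```

For whole genomes an indexed reader would be more practical than loading every chromosome into a dictionary; the in-memory version is used here only to keep the example self-contained.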

Case Example

A sample region, chr11:12286115-12289133 (hg38), from the H3K27Ac ChIP-seq of A549 is provided in the TEST_DATA folder. To call NFRs and NORs with HisTrader, run the following command:

perl Histrader.pl --bedGraph \
test.region.histrader.chr11_12286115_12289133.H3K27AC.bdg --peaks \
test.region.histrader.chr11_12286115_12289133.H3K27AC.broadPeak --out \
test.region.histrader.chr11_12286115_12289133.H3K27AC

This command will generate the following output files:

test.region.histrader.chr11_12286115_12289133.H3K27AC.nfr.bed (Nucleosome Free Regions)
test.region.histrader.chr11_12286115_12289133.H3K27AC.nuc.bed (Nucleosome Occupied Regions)
test.region.histrader.chr11_12286115_12289133.H3K27AC.missing.bed (empty)

The expected results are shown in Figure S6 and can be visualized using a genome browser, such as the Integrative Genomics Viewer (IGV) [5]. Please note that only the chr11:12286115-12289133 (hg38) region is present.

The files test.region.histrader.chr11_12286115_12289133.DNASEI.bdg and test.region.histrader.chr11_12286115_12289133.DNASEI.narrowPeak contain the corresponding DNaseI signal and called narrowPeaks, respectively.
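One simple way to compare called NFRs with the DNaseI narrowPeaks is to compute the fraction of NFRs that overlap at least one DNaseI peak. The sketch below shows this under assumptions: the function names are hypothetical, only the first three columns (chrom, start, end) of each file are used, and the coordinates in the example are invented rather than taken from the test data.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def read_bed3(lines: Iterable[str]) -> List[Region]:
    """Read the first three columns of BED/narrowPeak lines, skipping headers."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps(a: Region, b: Region) -> bool:
    """True when two half-open regions on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(nfrs: List[Region], dhs: List[Region]) -> float:
    """Fraction of NFRs overlapping at least one DNaseI peak (0.0 if no NFRs)."""
    if not nfrs:
        return 0.0
    hits = sum(any(overlaps(n, d) for d in dhs) for n in nfrs)
    return hits / len(nfrs)


if __name__ == "__main__":
    # Invented coordinates inside the sample region, for illustration only.
    nfrs = read_bed3(["chr11\t12286500\t12286700", "chr11\t12288000\t12288200"])
    dhs = read_bed3(["chr11\t12286600\t12286900\tpeak1\t100\t."])
    print(fraction_overlapping(nfrs, dhs))
```

The quadratic comparison is adequate for a single region; genome-wide comparisons would normally use sorted intervals or a dedicated interval tool.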

References

1. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).
2. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
3. Ramirez, F., Dundar, F., Diehl, S., Gruning, B.A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187-91 (2014).
4. Davis, C.A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 46, D794-D801 (2018).
5. Robinson, J.T. et al. Integrative genomics viewer. Nat Biotechnol 29, 24-6 (2011).

Figure S1: H3K27Ac and DNaseI hypersensitivity signal at sites with one HisTrader defined NFR. The average signal intensity (top) across all sites. Heatmap of signal intensity (bottom) for all sites. Blue = H3K27Ac, Red = DNaseI hypersensitivity.

Figure S2: H3K27Ac and DNaseI hypersensitivity signal at sites with two HisTrader defined NFRs. The average signal intensity (top) across all sites with two NFRs. Heatmap of signal intensity (bottom) for all sites with two NFRs. Blue = H3K27Ac, Red = DNaseI hypersensitivity.

Figure S3: H3K27Ac and DNaseI hypersensitivity signal at sites with three HisTrader defined NFRs. (A) The average signal intensity (top) across all sites with three NFRs. Heatmap of signal intensity (bottom) for all sites with three NFRs. Blue = H3K27Ac, Red = DNaseI hypersensitivity. (B) Sites with three NFRs selected to have equally spaced NFRs (350bp-550bp apart). The average signal intensity (top) and heatmaps of signal intensity at each site. Blue = H3K27Ac, Red = DNaseI hypersensitivity.

Figure S4: H3K27Ac and DNaseI hypersensitivity signal at sites with four HisTrader defined NFRs. (A) The average signal intensity (top) across all sites with four NFRs. Heatmap of signal intensity (bottom) for all sites with four NFRs. Blue = H3K27Ac, Red = DNaseI hypersensitivity. (B) Sites with four NFRs selected to have equally spaced NFRs (350bp-550bp apart). The average signal intensity (top) and heatmaps of signal intensity at each site. Blue = H3K27Ac, Red = DNaseI hypersensitivity.

Figure S5: HisTrader results using H3K27Ac in large regions. The RHOF (A) and NCOR2 (B) loci are used as examples. Tracks shown: H3K27Ac Signal, NUC, DNaseI Signal and NFR. Blue = H3K27Ac, Red = DNaseI hypersensitivity. NFR = nucleosome free regions; NUC = nucleosome occupied regions.

Figure S6: Example HisTrader result using H3K27Ac ChIP-Seq from A549 cells. The MICALCL locus (chr11:12286115-12289133) is used as an example. Tracks shown: H3K27Ac Signal, MACS2 Broad Peak, HisTrader NORs, HisTrader NFRs, DNaseI Signal and MACS2 DNaseI Narrow Peaks. Blue = H3K27Ac, Red = DNaseI hypersensitivity. NOR = nucleosome occupied regions; NFR = nucleosome free regions.