Oh et al. Epigenetics & Chromatin (2019) 12:30 https://doi.org/10.1186/s13072-019-0277-6 Epigenetics & Chromatin

METHODOLOGY Open Access XL‑DNase‑seq: improved footprinting of dynamic transcription factors Kyu‑Seon Oh1†, Jisu Ha1†, Songjoon Baek2 and Myong‑Hee Sung1*

Abstract Background: As the cost of high-throughput sequencing technologies decreases, genome-wide chromatin acces‑ sibility profling methods such as the assay of transposase-accessible chromatin using sequencing (ATAC-seq) are employed widely, with data accumulating at an unprecedented rate. However, accurate inference of occupancy requires higher-resolution footprinting analysis where major hurdles exist, including the sequence bias of nucleases and the short-lived chromatin binding of many transcription factors (TFs) with consequent lack of footprints. Results: Here we introduce an assay termed cross-link (XL)-DNase-seq, designed to capture chromatin interactions of dynamic TFs. Mild cross-linking improved the detection of DNase-based footprints of dynamic TFs but interfered with ATAC-based footprinting of the same TFs. Conclusions: XL-DNase-seq may help extract novel regulatory circuits involving previously undetectable TFs. The DNase-seq and ATAC-seq data generated in our systematic comparison of various cross-linking conditions also represent an unprecedented-scale resource derived from activated mouse macrophage-like cells which share many features of infammatory macrophages. Keywords: footprinting, Chromatin accessibility, High-throughput sequencing, Bioinformatics, Epigenomics, Genomic footprinting, DNase-seq, ATAC-seq, Computational genomics, Transcription factor binding motifs

Introduction of cells, whereas typical ChIP protocols require 1–20 mil- Understanding of tissue-specifc transcriptional regu- lion cells. Sonication and immunoprecipitation with anti- lome requires the knowledge about DNA sequence- bodies also need to be optimized and validated for each specifc chromatin interactions of transcription factors application of ChIP, and chromatin from diferent cell which are active in the given cell context. ChIP-seq has types often call for reoptimization of these steps. While been widely adopted and currently the most common ChIP-exo and ChIP-nexus were developed to produce method of profling the regulatory landscape of a tran- precise locations of TF binding sites [10, 30], commonly scription factor [16]. Related methods include ChIP-exo, used ChIP-seq methods can only localize TF binding sites ChIP-nexus, ChIA-PET, and HiChIP [7, 10, 21, 30]. How- with about 200 bp resolution. Another caveat with ChIP- ever, numerous issues preclude their use in a systematic based methods is that the antibody often recognizes one manner for a given cell type. Many in vivo cell subsets of subunit of multi-protein complexes. For example, many interest or patient samples come with a limited number TFs exist as heterodimers: AP-1 as c-Jun:c-Fos, NF-κB as RelA:p50; or homodimers: p50:p50, STAT1:STAT1. Since many of the subunits switch their partners in diferent *Correspondence: [email protected] †Kyu-Seon Oh, Jisu Ha are Co-frst authors cellular contexts, detecting one subunit does not distin- 1 Laboratory of Molecular Biology and Immunology, National Institute guish which specifc dimer species may occupy the site. on Aging, National Institutes of Health, 251 Bayview Boulevard, Baltimore, Perhaps the most serious problems may arise from batch- MD 21224, USA Full list of author information is available at the end of the article to-batch variability of ChIP-seq, refecting technical and

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat​iveco​mmons​.org/licen​ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creat​iveco​mmons​.org/ publi​cdoma​in/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 2 of 12

biological variability from several sources including those DNA in chromatin (see [33] for a compilation of - mentioned above. evant data). Tis is an independent and major hurdle, Genomic TF footprinting, an alternate method for since it violates the basis for inferring protein occupancy identifying occupancy of a large number of TFs in one through stable binding of TFs to their cognate motif ele- DNase-seq or ATAC-seq sample, had the potential to ments. Consequently, little or no footprints are detect- circumvent many of these issues with ChIP-based assays able from dynamic TFs in the cut or insertion count [4, 11, 23]. Chromatin accessibility assays rely on the profles. No solution has been proposed to address the ability of DNA-acting enzymes to distinguish protected difculty of detecting signifcant footprints from such sites from accessible sites in the chromatin regard- TFs. Correcting for the nuclease sequence biases pro- less of their specifc DNA sequence content. Although duces a marginal increase in the accuracy of TF binding these methods robustly identify cell state-specifc regu- predictions [33, 34]. Here, we aimed to improve existing latory regions which are 150 bps or larger, footprint- TF footprinting protocols by introducing a cross-linking ing eforts to infer transcription factor (TF) occupancy step to capture highly dynamic interactions of TFs with from nucleotide-level DNase cut count (or transposase target DNA elements in chromatin [27] (Fig. 1a). We insertion count in case of ATAC-seq) profles face difer- asked whether there is an optimal cross-linking proce- ent challenges. Generating cut/insertion count data for dure which preserves and enhances the footprint depths footprinting of a large-size genome requires ultra-deep of most TFs. We present a new footprinting protocol, sequencing of DNase-seq/ATAC-seq libraries. Never- termed XL(cross-link)-DNase-seq, with a mild formal- theless, the improvement in sequencing technology and dehyde cross-linking procedure that can easily be incor- the decreasing cost have made genomic TF footprinting porated prior to the steps in the conventional DNase-seq feasible for many laboratories with proper computational protocols. resources and expertise. However, serious limitations have dampened early Results enthusiasms for using the chromatin accessibility analysis Mild cross‑linking prior to DNase‑seq preserves chromatin methods to identify TF occupancy in an unbiased high- accessibility and generates diferential footprints throughput manner. First, the enzymes used to probe We sought to systematically assess the efects of vari- chromatin (DNase I and Tn5) in these assays were found ous cross-linking procedures on the genomic footprints to have non-negligible DNA sequence preferences for of dynamic TFs in the same chromatin material. For a their reaction, which complicates the assumption that fxed source of chromatin, we chose a cell state in which these nucleases non-specifcally sample accessible nucle- numerous TFs are directly interacting with chromatin in otides [9, 17, 34]. To address this issue, computational a cascade of gene regulatory actions. Since the chroma- algorithms have been developed to take such sequence tin sample is prepared from a cell population containing biases into account when putative footprints are called [1, snapshots of these dynamic interactions, we reasoned 8, 26, 34]. Increasing the depth of sequencing can further that this would be a rich platform to assess changes in mitigate this artifact, for example, with a more accurate footprint depths of many TFs simultaneously. To this adjustment for the enzyme-inherent sequence prefer- end, immortalized mouse macrophage-like RAW264.7 ences directly from the data [9, 34]. cells were used, where many dynamic TFs, including Another problem with genomic TF footprinting NF-κB and AP-1, are activated in response to bacterial was the lack of sufcient protection of DNA from TFs products such as lipopolysaccharide (LPS). Tis cell con- whose chromatin binding is short-lived (referred to as text allows a large number of TFs occupying the chroma- “dynamic TFs” in this study) [9, 33, 34]. Independent live tin, thereby providing an ideal platform for assessing TF cell microscopy analyses and cross-linking analysis have footprint characteristics. We chose this cell system also established that such TFs have high exchange rates or because of the rich information about TF regulatory net- show short chromatin residence times, which together works that the new data will help uncover in a physiologi- suggest short-lived interactions of these TFs with target cally important innate immune cell type. RAW264.7 cells

(See fgure on next page.) Fig. 1 XL-DNase-seq is designed to capture short-lived binding of TFs and preserve chromatin accessibility profles. a Rationale for introducing a cross-linking step prior to digestion by DNase I. b Near-identical DNase-seq density profles across cross-linking conditions. XL-DNase-seq preserves the chromatin accessibility profle as observed by the native DNase-seq. The genome browser shot of the Rel locus shows DNase-seq signal tracks from all cross-linking conditions. c Fragment density normalized to 10 million mapped reads. Genome-wide comparison of DNase-seq fragment density profles across the cross-linking conditions. d A browser shot of cross-linked raw cut count tracks. The top track shows the locations of TF binding motif elements obtained by FIMO. Reference genome: mm9. See also Additional fle 2: Fig. S1 Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 3 of 12

a Native Formaldehyde crosslinked (concentration X, duration Y)

Dynamic TF DNase I TF binding DNA sequence TF binding Unprotected from cleavage Protected from cleavage motif elements

No footprint Deeper footprint?

DNase-seq PE fragment density Number of cuts per bp 0.1%_30sec b DNase-seq c n mm9 chr11: 23,650,000 23,660,000 23,670,000 80 - Native 0.1%_2mi 1 _ 80 - n 0.1%_30sec 1 _ 80 -

0.1%_2min 0.1%_5mi 1 _ 80 - 0.1%_5min 1 _ 80 -

0.1%_10min 0.1%_10min 1 _ 80 - 1%_30sec 1 _ Rel 1%_30sec

Native

d 90,910,600 90,910,700 90,910,800 mm9 chr2: TF motifs Elf4 Elf4 CTCF JunB Ikaros CTCF JunD Ikaros Native 0.1%_30sec 0.1%_2min 0.1%_5min 0.1%_10min 1%_30sec (y range: 0 - 100) Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 4 of 12

have chromatin profles which are similar to those of pri- to specifc TFs purely based on DNA sequence motifs mary macrophages (data not shown) [24], which allows [33], subsequent analyses focused on well-characterized for the discovery of functionally relevant gene regulatory TF motifs and footprints at genomic sites of their occur- mechanisms [13, 18, 19, 35]. rences. Not only did the various XL-DNase-seq samples With the same chromatin material from LPS-stimu- generate diferent numbers of footprints, but the foot- lated RAW264.7, we varied the duration and the con- print strengths and detectability also depended on the centration of the cross-linking agent formaldehyde to cross-linking procedure (Fig. 2b, Additional fle 2: Fig. determine the cross-linking parameters which may afect S2). footprinting characteristics of dynamic TFs (Fig. 1). Based on previous reports on the dominant efect of XL‑DNase‑seq captures more TF footprints with improved cross-linking duration over concentration, we focused accuracy on varying the duration of formaldehyde cross-linking. We analyzed the efects of cross-linking on footprint- A lower formaldehyde concentration of 0.1% was probed ing of NF-κB/RelA, a TF which is known to have short with various cross-linking durations, because a cross- DNA binding residence time over cognate DNA ele- linking kinetics study [27] and our pilot study indicated ments [3]. We assessed its footprint depths at individual that 1% is a saturating concentration for cross-linking κB motif sites across open chromatin in LPS-activated and may potentially interfere with nuclease reactions. macrophages, using cell state-matching RelA ChIP-seq We have performed the modifed DNase-seq, termed peaks as gold standard for TF binding. Receiver Operator cross-link (XL)-DNase-seq and generated a panel of Characteristic (ROC) analysis was performed to quan- sequencing libraries. Te enrichment, complexity, and tify the predictability of putative footprints from each quality of each library were confrmed, and all the librar- cross-link-DNase-seq cut count data (Fig. 3a). Statistical ies were subject to ultra-deep paired-end read sequenc- comparison of the ROC curves revealed that NF-κB foot- ing (Additional fle 1: Table S1). We frst verifed that the prints obtained from 0.1% formaldehyde 30 s cross-link- chromatin accessibility profle is generated independently ing (green curve) were more predictive of actual NF-κB of the mild cross-linking procedure, as observed by the binding in comparison to the footprints from the native reproducibility of DNase-seq fragment density across DNase-seq protocol (Fig. 3b, left). Tis improvement was samples from various cross-linking conditions (Fig. 1b, c). not apparent when assessed by the aggregate cut count Tis was an important frst checkpoint, because excessive profles (Additional fle 2: Fig. S3), underscoring the need cross-linking may induce capture of too many non-spe- to evaluate individual sites rather than average signals cifc factors onto the chromatin [2] and hinder sampling over heterogeneous sites [8, 29]. 1% Formaldehyde 30 s of chromatin by the nuclease (DNase). Generation of cross-linking (purple curve) negatively afected binding a DNase-seq peak relies on the ability of the enzyme to predictions. Interestingly, when we examined another access the hypersensitive site preferentially relative to the TF Ikaros (which lacks microscopy data on DNA bind- fanking region. Our cross-linking procedure was likely ing residence times), all XL-DNase-seq, but the 1% for- mild enough to allow sufciently diferential sampling of maldehyde sample showed improved predictions over chromatin which is refected in the well-preserved acces- the native DNase-seq (Fig. 3b, right). While the observed sibility profles (Fig. 1b, c). improvement in accuracy seems small in the ROC Te total number of putative footprints depended on plots, it is based on gaining several thousands of accu- the cross-linking procedure (Figs. 1d, 2a). While the exact rately predicted binding sites among 10­ 4–105 motif sites numbers of detected footprints difer between results found within open chromatin, therefore a substantial from the diferent methods of correcting the DNase bias improvement. (dimers, tetramers, etc.), the rank order of various cross- In all the ROC analyses, adjusting for the sequence bias linking samples was invariant. 0.1% 30 s XL-DNase-seq of DNase cleavage did not improve the auROC values footprints produced the largest set of footprints, while or change the overall results (Additional fle 2: Fig. S4). the native DNase-seq produced the least number of foot- Some studies have reported improvement of TF binding prints in both sets of biological replicates. Cross-linking prediction from correcting the sequence bias of DNase generally preserved the footprints observed in native [8], while others observed no improvement [14, 34]. Te DNase-seq, and revealed additional footprints (Addi- discrepant results arise probably because these stud- tional fle 2: Fig. S2A). Since the number of footprints ies used diferent computational detection methods to may change as a net result of detecting and missing true call putative footprints. Some detection methods rely on footprints, more insight can be obtained from examin- bias-prone cleavage signatures (shape of cut count pro- ing specifc TF footprints against a set of known bind- fle) and may see signifcant improvement after remov- ing sites. Because it is difcult to assign most footprints ing the sequence bias. Our method DNase2TF does not Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 5 of 12

No crosslinking a Form 0.1% 30s Form 0.1% 2m Form 0.1% 5m Form 0.1% 10m Form 1% 30s 14,000,000 s

12,000,000

10,000,000 Number of footprint

0 Rep1 Rep2 Rep1 Rep2 Rep1 Rep2 Rep1 Rep2 UnadjustedDimer-adjustedTetramer-adjustedHexamer-adjusted

b No crosslinking CTCF footprint z score (tetramer-adjusted)

−4 Form 0.1% 30sec −1 0

Form 0.1% 2min

4 Form 0.1% 5min −1 0−

Form 0.1% 10min

−4 Form 1% 30sec −1 0 −10−4−10 −4 −10−4 Fig. 2 Total number of detected footprints across the cross-linking conditions. a Total number of FDR 1% putative footprints called by DNase2TF was averaged over fve rounds of subsampling. Subsampling of reads was performed to keep the number of uniquely mapped reads the same across conditions for fair comparison (see “Methods”). b Footprint z score comparison across the conditions at CTCF motif sites in open chromatin. See also Additional fle 2: Fig. S2

use the cleavage signatures directly in detecting putative LPS-activated RAW264.7 cells (Fig. 4a). Again, we frst footprints, and therefore may have no further improve- compared the ATAC-seq fragment density profles ment from bias correction. obtained from all the XL-ATAC-seq samples and con- frmed that the introduction of a cross-linking step does XL‑ATAC‑seq does not improve TF footprinting over native not negatively afect assaying chromatin accessibility, ATAC‑seq with only a slight reduction in signal intensity from the Since the widely used ATAC-seq can also generate high- samples cross-linked with 1% formaldehyde (Additional resolution footprinting data [4], we examined whether fle 2: Fig. S5A). However, unlike XL-DNase-seq, analysis cross-linking can have similar efects as described above of the insertion count data revealed that the largest num- for DNase-seq. While the efect of short DNA bind- ber of putative footprints was detected from the native ing residence times was shown in DNase-seq data, it is ATAC-seq samples, and the numbers declined with the likely to produce similar problems for ATAC-seq. Tis is duration and concentration of formaldehyde used for because dynamic TFs allow relatively long durations of cross-linking (Fig. 4b). Tis pattern is observed regardless dissociation from target DNA which would then be vul- of whether we adjusted for the DNA sequence bias of the nerable to enzymatic attacks. We generated a panel of transposase Tn5 (used in ATAC-seq). Similarly to XL- ATAC-seq libraries from mild formaldehyde cross-link- DNase-seq, aggregate insertion count analyses did not ing (XL-ATAC-seq) using the same biological material, reveal gross diferences in footprint depth (Additional Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 6 of 12

a Evaluation of predictions from individual footprints by ROC analysis

DNase I cut count

Binding motif elements for factor X

Prediction for X Bound Unbound Bound Unbound

ChIP-seq of X in same cell state

Outcome True True False False positive negative positive negative

Count genome-wide occurrence rates for each score threshold b 10 0 80 60 NF-ĸB/RelA Ikaros 40 Sensitivity (% ) auROC auROC No crosslinking (MS07) 0.893869 No crosslinking (MS07) 0.847303 Form 0.1% 30s (MS08) 0.909254 p = 0.015117 Form 0.1% 30s (MS08) 20 0.870025 p = 0.000000 Form 0.1% 2m (MS09) 0.894808 p = 0.886232 Form 0.1% 2m (MS09) 0.857496 p = 0.000432 Form 0.1% 5m (MS10) 0.890809 p = 0.663749 Form 0.1% 5m (MS10) 0.860569 p = 0.000019 Form 0.1% 10m (MS11) 0.896403 p = 0.715691 Form 0.1% 10m (MS11) 0.858163 p = 0.000319 Form 1% 30s (MS12) p = 0.109453 0 0.883438 Form 1% 30s (MS12) 0.845484 p = 0.555298

100806040200100806040200 Specificity (%) Specificity (%) Fig. 3 Cross-linking improves DNase-seq footprint-based prediction of TF binding. a Evaluation procedure in ROC analysis. b ROC curves for all the cross-link-DNase-seq samples are plotted using dimer-adjusted cut count data. The area under the ROC curve (auROC) values are shown along with the p values calculated by the ‘pROC’ package of R Bioconductor. Reference binding set was obtained from ChIP-seq data of LPS-activated macrophages. ROC analysis was performed for NF-κB (left: n 99,089) and Ikaros (right: n 165,392). See also Additional fle 2: Figs. S3, S4 = =

fle 2: Fig. S6A), and footprint detectability and strength formaldehyde (Fig. 4a, Additional fle 2: Fig. S6C). Puta- varied across XL-ATAC-seq samples (Additional fle 2: tive footprints detected in the native ATAC-seq data (red Fig. S6B). curves) produced the most accurate predictions of NF-κB To gain a more detailed insight, we performed an ROC and Ikaros binding to their cognate motif elements in analysis for NF-κB/RelA and Ikaros with XL-ATAC-seq open chromatin. Te best-performing ATAC-seq (native) footprints, in the same manner as for XL-DNase-seq nevertheless had lower auROC values compared to the footprints. Te prediction accuracy, as indicated by the XL-DNase-seq data (Additional fle 2: Fig. S7A). Consist- shape of ROC curves and quantifed by auROCs, gen- ent with this observation, several studies indicated that erally declined with the duration and concentration of TF footprinting based on ATAC-seq performs poorly in Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 7 of 12

a ATAC-seq 90,910,600 90,910,700 90,910,800 mm9 chr2: TF motifs Elf4 Elf4 CTCF JunB Ikaros CTCF JunD Ikaros Native 0.1%_30sec 0.1%_2min 0.1%_5min 0.1%_10min 1%_30sec (y range: 0 - 50) b 30,000,000 No crosslinking

s Form 0.1% 30s 25,000,000 Form 0.1% 2m Form 0.1% 5m 20,000,000 Form 0.1% 10m 15,000,000 Form 1% 30s

10,000,000

Number of footprint 5,000,000

0 Rep1 Rep2 Rep1 Rep2 Unadjusted Hexamer-adjusted c 00 01 ) 08 NF-ĸB/RelA Ikaros 06 Sensitivity (%

auROC auROC No crosslinking (MS25-26) No crosslinking (MS25-26) 0.847288 04 0.888492 Form 0.1% 30s (MS27-28) 0.863174 p = 0.009731 Form 0.1% 30s (MS27-28) 0.818197 p = 0.000000 Form 0.1% 2m (MS29-30) 0.851871 p = 0.000039 Form 0.1% 2m (MS29-30) 0.817869 p = 0.000000 Form 0.1% 5m (MS31-32) 0.866186 p = 0.011900 Form 0.1% 5m (MS31-32) 0.821036 p = 0.000000 Form 0.1% 10m (MS33-34) 0.839632 p = 0.000000 Form 0.1% 10m (MS33-34) 0.809093 p = 0.000000 02 Form 1% 30s (MS35-36) 0.785111 p = 0.000000 Form 1% 30s (MS35-36) 0.738732 p = 0.000000

100806040200100806040200 Specificity (%) Specificity (%) Fig. 4 Cross-linking hinders ATAC-seq footprint-based prediction of dynamic TF binding. a A browser shot of cross-linked raw insertion count tracks. The top track shows the locations of TF binding motif elements obtained by FIMO. Reference genome: mm 9. b Total number of FDR 1% putative footprints called by ATAC2TF (DNase2TF modifed for ATAC-seq) was averaged over fve rounds of subsampling. Subsampling of reads was performed to keep the number of uniquely mapped reads the same across conditions for fair comparison (see “Methods”). c ROC curves for all the cross-link-ATAC-seq samples are plotted using unadjusted insertion count data for RelA and Ikaros. Adjusting for the sequence bias of Tn5 insertion did not improve the predictions. The area under the ROC curve (auROC) values are shown along with the p values calculated by the ‘pROC’ package of R Bioconductor. Reference binding set was obtained from RelA ChIP-seq data of primary macrophages. See also Additional fle 2: Figs. S5, S6 Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 8 of 12

comparison to DNase-seq [15, 28, 29]. Te cross-linking- reproducible regulatory connections from PU.1 to many dependent pattern of poor predictions from XL-ATAC- macrophage/immune-relevant : Ncoa3 (a.k.a. Src- seq footprints was highly reproducible in technical and 3, involved in defense against bacteria) [5], Hcst (a.k.a. biological replicates. Tese results indicate that even the DAP10, which induces osteoclastogenic signaling in mye- mild cross-linking conditions may interfere with the abil- loid cells) [12], Atrx (a heterochromatin silencer), Mier1 ity of Tn5 to sample the nucleotide base pairs with high (a HDAC-binding transcriptional corepressor), Arid1a insertion efciency. (a.k.a. BAF250a, a component of SWI/SNF). In addi- We chose ROC as our main measure of comparison, tion, well-known regulatory targets of RelA/NF-κB such because of its wide usage for evaluating predictions as Nfkbia (a negative feedback gene), Rel (an immune- from footprints and the simple representation of a ran- specifc subunit of NF-κB), and Nfkb2 (an alternative dom baseline as the diagonal. Another measure, preci- dimer subunit of NF-κB) were robustly detected. RelA sion-recall (PR) curves may also be useful especially for was also a putative regulator of Tfe3 (involved in mac- unbalanced binary outcome data. Te PR curves gener- rophage autophagy and cytokine response, also detected ally preserved the top-performing samples (0.1% 30 s as a putative target of PU.1 in multiple networks) [25] XL-DNase-seq among DNase samples and native among and Tal2 (a known target of PU.1) [6]. Tese regulatory ATAC-seq samples), with the intermediate rankings var- connections were based on footprints detectable in both ying somewhat over diferent recall ranges (Additional native and XL-DNase-seq data. fle 2: Fig. S7B). For an overall comparison of all the TF networks con- structed separately, we generated a pairwise similarity Construction of TF regulatory networks and reproducibility matrix and found that XL-DNase-seq-derived networks An ultimate goal of TF footprinting is to discover novel have a stronger consensus, i.e., more regulatory edges mechanisms in TF regulatory networks. A large-scale were repeatedly detected in multiple cross-linking con- study in constructing such networks solely based on ditions, than XL-ATAC-seq-derived networks (Fig. 6). TF footprints and TF binding motif databases revealed Despite the lower consensus among XL-ATAC-seq- intriguing fndings [22, 32], but lack of footprints for derived networks, XL-ATAC-seq networks were still many TFs and other technical issues raised questions more similar to each other than to TF networks derived about the signifcance of such eforts [8, 9, 33, 34]. Given from XL-DNase-seq (Fig. 6, lower right square block). the remaining technical challenges (including assign- Notably, there were recurrent regulatory edges that are ment of TF identity to motifs), we chose a conservative reproduced in multiple cross-linking conditions and even approach aiming to construct higher-confdence TF reg- between DNase-seq and ATAC-seq protocols, which ulatory networks. To this end, we generated TF regula- probably represent highest-confdence TF regulatory tory networks using stringent criteria on TF motifs and events. footprint detection thresholds. We also focused only on a Importantly, networks derived from footprints in select set of TFs for footprinting and limited the analysis cross-linked samples had more uniquely detected edges on TF-encoding genes which have footprints of the select (Fig. 5b, black edges) than that from the native DNase-seq TFs. Each TF network was constructed by compiling (Additional fle 2: Fig. S8, black edges), suggesting that putative regulatory edges (Fig. 5a, A → B, i.e., TF A regu- cross-linking helps capture novel binding events. Among lates TF B). A regulatory edge indicates that one or more the connections identifed as novel regulatory relation- footprints of A were detected within 5 kb of transcription ships were: Tln1 (a.k.a. Talin, involved in mechanical start site of B, from the given XL-DNase-seq sample. responses and EMT) has a Nfkb1 footprint only from Comparison of six TF networks from the cross-linking XL-DNase. Nfkbie, a known target of NF-κB in a negative conditions revealed a substantial set of shared regulatory feedback loop, has RelA footprint only from XL-DNase. edges (Fig. 5b, Additional fle 2: Fig. S8). Such reproduc- Etv3, induced during macrophage diferentiation [31] ibility in regulatory relationships was surprising, given had RelA and Nfkb1 footprints only in XL-DNase. We the variability observed in the footprint Z score profles note that Tln1, Nfkbie, and Etv3 are all likely direct tar- and the diferential detectability of footprints across the gets of NF-κB/RelA because their transcripts are imme- XL-DNase-seq samples (Fig. 3, Additional fle 2: Fig. S2). diately induced by LPS treatment in macrophages [24]. A closer look provided a reason for this robust consen- analysis indicated that shared as well as sus among the independently constructed TF networks: distinct functional categories were enriched among the a shared edge is often supported by multiple redun- genes found in newly detected regulatory relationships dant footprints and detection of at least one is sufcient (Additional fle 2: Fig. S9). Tese results suggest that TF for reproducing the edge in a network derived from a regulatory networks constructed from XL-DNase-seq given XL-DNase-seq sample. For example, we found contain both robust regulatory relationships and novel Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 9 of 12

a Footprint of TF A

Regulatory connection AB TSS of TF B gene

-5 kb 5 kb Wbp4

Xbp1 Zfp668 Skil Rchy1 Sp3 b Atrx

Zfp148 Max Thap1 Mta3 Klf3 Zfp143 Zfp444 Rc3h1 Jun Hmga1b Csrnp1 Irf2 Hmga1 Tfap4 Nfkbil1 Ncoa3

Emg1 Plagl2 Aebp2 Mtf1 Zfp438 Nfatc3 Sp2 Rara Spi1 Patz1 Cct6b Zfp362 Ikzf1 Zfp296 Zfp512 Smarca2 Nfkbie Zfp598 Thtpa

Smarca4 Zfp90 Zfp384 Zfp605 Zbtb7b Atf6b Hcst Zfp607a Zbtb5 Ncoa4 Atp5j Foxn2 Smarcad1 Dnmt3l Smarcd1 E030011O05Rik Fosl2 Nr1h3 Nr1d2 Irf5 Arid4b Arid1a Cebpd Zfp592 Foxm1 Noc3l Zfp523 Tln1 Stat5a Tal1 Zfp652 Tapbp Mier1 Zscan29 Nfat5 Atg9a Bclaf1 Sp3os Mxd3 Ssrp1 Nfkbia Sp100 Gm15417 Nfkb2 Zfp53 Rela

4833438C02Rik Zfp317 Etv3 Tfe3 Nfya Trp53 Nfkbib Zfp407 Irf9 Zfp429 Mir5098 Zfp260 Rel Tal2

Topors Rreb1 Gtf2i Nfkb1 Irf3 Maz Arid5a Ubp1 Foxd2 Ccdc77 Fli1 Rpl41 Klf6 Kat5 Shared by native DNase-seq Unique in XL-DNase-seq Fig. 5 TF regulatory networks from XL-DNase-seq have reproducible connections as well as new connections. a De novo construction of transcriptional regulatory networks. FDR 1% footprints within 5 kb of TSSs were used to obtain regulatory relationships. b TF regulatory network derived from 0.1% formaldehyde 10 min XL-DNase-seq data (MS5 and MS11, merged) after correcting for the dimer bias of DNase. Regulatory connections which were also detected in the network from native DNase-seq are marked in blue. See also Additional fle 2: Fig. S8 Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 10 of 12

Jaccard Form 0.1% 10m Index 0.350 0.5 0.4 Form 0.1% 5m 0.403 0.354 0.3 Higher network similarity 0.2 (more shared edges) 0.1 AC-seq Form 0.1% 2m 0.374 0.371 0.355 T 0.0

XL-A Form 0.1% 30s 0.393 0.413 0.382 0.337 TF networks from

Native 0.439 0.405 0.401 0.366 0.348

Form 1% 30s 0.240 0.247 0.238 0.294 0.206 0.252

Form 0.1% 10m 0.431 0.306 0.316 0.309 0.316 0.227 0.288

Form 0.1% 5m 0.549 0.443 0.304 0.292 0.327 0.340 0.253 0.318

Form 0.1% 2m 0.570 0.548 0.444 0.278 0.283 0.303 0.289 0.239 0.315 XL-DNase-seq

TF networks from Form 0.1% 30s 0.500 0.472 0.442 0.471 0.250 0.258 0.239 0.269 0.184 0.248

Native 0.452 0.529 0.586 0.536 0.407 0.289 0.281 0.315 0.281 0.241 0.285

s

Native

Form 1% 30s Form 1% 30s Form 0.1% Form2m 0.1% 5m Form 0.1% Form2m 0.1% 5m Form 0.1% 30s Form 0.1% 10m Form 0.1% 30 Form 0.1% 10m

TF networks from TF networks from XL-DNase-seq XL-ATAC-seq Fig. 6 Comparison of TF regulatory networks derived from TF footprints in XL-DNase-seq and XL-ATAC-seq data. The similarity between all the pairs of TF regulatory networks was quantifed using Jaccard Index (ranging from 0 to 1), and the results are represented in a heatmap. The number in each component indicates the Jaccard Index of the given pair network wiring involving dynamic TFs whose footprints produce an improvement for ATAC-seq. Tese diver- are missed in native DNase-seq. gent outcomes may be related to the diferent enzy- matic reactions employed by DNase and Tn5 and their diferential efciency in attacking cross-linked chro- Discussion matin. ATAC-seq has been reported to produce TF Here we present a systematic comparison, designed footprints with lower binding prediction accuracy in to investigate whether introduction of a mild cross- comparison with DNase-seq by several groups using linking step helps capture footprints of dynamic TFs in diferent datasets [15, 28, 29], except for one recently DNase-seq and ATAC-seq data. We generated techni- published study [20]. It will be interesting to re-assess cal replicates and biological replicates of LPS-activated ATAC-seq footprinting with the newly introduced macrophage cell line, reaching an unprecedented repli- computational approach [20]. Finally, the numerous cate sampling and sequencing depths. Cumulative read replicates allowed us to examine reproducible features number from all DNase-seq and ATAC-seq replicates of TF footprinting, not only at the level of aggregate are 7.5 billion reads and 5.8 billion reads, respectively. signals, but also at the level of individual footprints and Statistically signifcant prediction improvement was resulting TF regulatory network wiring. achieved in XL-DNase-seq, but cross-linking did not Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 11 of 12

For a higher-confdence evaluation result, we have We have determined the efects of various cross- focused our analysis on a set of dynamic TFs with well- linking protocols on TF binding predictions based on characterized sequence motifs and available ChIP-seq footprints in XL-ATAC-seq or XL-DNase-seq data, in data in the same cell type. We found that the use of ChIP- quantitative and objective terms. We demonstrate that seq from matching cell states is important, as the predic- XL-DNase-seq improves footprintability of certain TFs, tion accuracy was decreased when ChIP-seq was taken while the same analysis revealed that XL-ATAC-seq from a diferent time point in a stimulation time course fails to enhance footprinting, compared to the native even with the same cell type (data not shown). Tis and protocol. Tese fndings will critically inform investi- other pitfalls known to afect footprinting analysis [33] gators as they utilize the methodologies. For example, suggest that a defnitive result cannot be obtained for a while ATAC-seq is widely popular due to its simplicity large number of TFs due to their uncertain motifs and and small-scale cell yields required for the assay, XL- limited (cell state-matching) ChIP-seq data among public DNase-seq may ofer unique advantages if the study databases. Terefore, we could not rely on a large-scale calls for higher-resolution occupancy information of analysis involving hundreds of TF motifs in our attempt TFs. In addition, our dataset represents an unprece- to accurately assess the efects of cross-linking proce- dented-scale resource for the epigenetics and immu- dures on dynamic TFs. nology communities. Previous studies have pinpointed nuclear receptors, such as GR and ER, as a special class of TFs because their chromatin binding produces “anti-footprints”, i.e., a slight enhancement of (instead of protection from) cleavage at Additional fles the binding motif elements [8, 9, 34]. It would be inter- esting to perform our analysis using cells with ligand- Additional fle 1. Supplementary table and methods. activated nuclear receptors and to determine whether Additional fle 2. Supplementary fgures. cross-linking enhances the depth of their footprints. However, because of the largely one-to-one relation- Acknowledgements ship between ligands and nuclear receptors, a separate We thank Stephen Smale for providing ChIP-seq data for ATF3, CEBPs, CREB, chromatin sample must be prepared for each nuclear Fos, JunB, JunD and Ana Pombo for helpful discussions. We also wish to thank Gordon Hager who provided funding for sequencing early pilot samples. , making analyses of multiple nuclear recep- Computational tasks were performed using the US National Institutes of tors prohibitively expensive. Even though we could not Health (NIH) Biowulf cluster, a GNU-Linux parallel processing system. We address nuclear receptors with our cell system, these thank the NIH High Performance Computing staf for the management of the Biowulf system. dynamic TFs may also beneft from a carefully chosen cross-linking procedure, given that they can be efciently Authors’ contributions immobilized onto chromatin by formaldehyde in primary M-HS conceived the project. K-SO performed all the chromatin work. JH, SB, and M-HS analyzed the data. M-HS supervised the project and wrote the immune cells [36]. manuscript with input from all the authors. All authors read and approved the We have focused on interrogating cross-linking dura- fnal manuscript. tion to identify an optimal cross-linking condition, Funding because a previous study of cross-linking kinetics found This work was supported by the Intramural Research Program of the NIH at that TF occupancy, observed by ChIP, largely depends National Institute on Aging and National Cancer Institute. Sequencing of on the cross-linking duration but is rather insensitive to DNase-seq and ATAC-seq was supported by the NIA Scientifc Director’s Chal‑ lenge Award and the NIH Deputy Director of Intramural Research Innovation the formaldehyde concentration [27]. Hence, it is some- Award, respectively. what surprising that we did not observe a clearly best- performing cross-linking duration which seems optimal Ethics approval and consent to participate Not applicable. for the TFs examined in our cross-link-DNase-seq analy- sis, in so far as the formaldehyde concentration is not too Consent for publication high. On the other hand, this result may make the assay Not applicable. more straightforward for other investigators to imple- Competing interests ment, because they may not need to reoptimize the dura- The authors declare that they have no competing interests. tion of the 0.1% formaldehyde cross-linking for their Author details cell types and TFs of interest. Improved footprinting of 1 Laboratory of Molecular Biology and Immunology, National Institute dynamic TFs with cross-link-DNase-seq, together with on Aging, National Institutes of Health, 251 Bayview Boulevard, Baltimore, MD 2 ever-decreasing cost of sequencing, will facilitate eforts 21224, USA. Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, National Institutes of Health, 41 Library Drive, Bethesda, MD to discover novel TF regulatory mechanisms without the 20892, USA. need to pre-select targeting antibodies. Oh et al. Epigenetics & Chromatin (2019) 12:30 Page 12 of 12

Received: 1 March 2019 Accepted: 17 May 2019 18. Li N, Katz S, Dutta B, Benet ZL, Sun J, Fraser ID. Genome-wide siRNA screen of genes regulating the LPS-induced NF-kappaB and TNF-alpha responses in mouse macrophages. Sci Data. 2017;4:170008. 19. Li N, Sun J, Benet ZL, Wang Z, Al-Khodor S, John SP, Lin B, Sung MH, Fraser ID. Development of a cell system for siRNA screening of pathogen References responses in human and mouse macrophages. Sci Rep. 2015;5:9559. 1. Baek S, Goldstein I, Hager GL. Bivariate genomic footprinting detects 20. Li Z, Schulz MH, Look T, Begemann M, Zenke M, Costa IG. Identifca‑ changes in transcription factor activity. Cell Rep. 2017;19:1710–22. tion of transcription factor binding sites using ATAC-seq. Genome Biol. 2. Baranello L, Kouzine F, Sanford S, Levens D. ChIP bias as a function of 2019;20:45. cross-linking time. Res. 2016;24:175–81. 21. Mumbach MR, Satpathy AT, Boyle EA, Dai C, Gowen BG, Cho SW, Nguyen 3. Bosisio D, Marazzi I, Agresti A, Shimizu N, Bianchi ME, Natoli G. A ML, Rubin AJ, Granja JM, Kazane KR, et al. Enhancer connectome in hyper-dynamic equilibrium between promoter-bound and nucleo‑ primary human cells identifes target genes of disease-associated DNA plasmic dimers controls NF-kappaB-dependent gene activity. EMBO J. elements. Nat Genet. 2017;49:1602–12. 2006;25:798–810. 22. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stama‑ 4. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition toyannopoulos JA. Circuitry and dynamics of human transcription factor of native chromatin for fast and sensitive epigenomic profling of open regulatory networks. Cell. 2012;150:1274–86. chromatin, DNA-binding and nucleosome position. Nat Meth‑ 23. Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thur‑ ods. 2013;10:1213–8. man RE, John S, Sandstrom R, Johnson AK, et al. An expansive human 5. Chen W, Lu X, Chen Y, Li M, Mo P, Tong Z, Wang W, Wan W, Su G, Xu J, et al. regulatory lexicon encoded in transcription factor footprints. Nature. Steroid receptor coactivator 3 contributes to host defense against enteric 2012;489:83–90. bacteria by recruiting neutrophils via upregulation of CXCL2 expression. J 24. Oh KS, Patel H, Gottschalk RA, Lee WS, Baek S, Fraser IDC, Hager GL, Sung Immunol. 2017;198:1606–15. MH. Anti-infammatory chromatinscape suggests alternative mechanisms 6. Courtial N, Mucke C, Herkt S, Kolodziej S, Hussong H, Lausen J. The T-cell of action. Immunity. 2017;47(298–309):e295. oncogene Tal2 Is a Target of PU.1 and upregulated during osteoclas‑ 25. Pastore N, Brady OA, Diab HI, Martina JA, Sun L, Huynh T, Lim JA, Zare togenesis. PLoS ONE. 2013;8:e76637. H, Raben N, Ballabio A, et al. TFEB and TFE3 cooperate in the regulation 7. Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, of the innate immune response in activated macrophages. Autophagy. Ho A, Mei PH, et al. An oestrogen-receptor-alpha-bound human chroma‑ 2016;12:1240–58. tin interactome. Nature. 2009;462:58–64. 26. Pique-Regi R, Degner JF, Pai AA, Gafney DJ, Gilad Y, Pritchard JK. Accurate 8. Gusmao EG, Allhof M, Zenke M, Costa IG. Analysis of computational inference of transcription factor binding from DNA sequence and chro‑ footprinting methods for DNase sequencing experiments. Nat Methods. matin accessibility data. Genome Res. 2011;21:447–55. 2016;13:303–9. 27. Poorey K, Viswanathan R, Carver MN, Karpova TS, Cirimotich SM, McNally 9. He HH, Meyer CA, Hu SS, Chen MW, Zang C, Liu Y, Rao PK, Fei T, Xu H, JG, Bekiranov S, Auble DT. Measuring chromatin interaction dynamics on Long H, et al. Refned DNase-seq protocol and data analysis reveals the second time scale at single-copy genes. Science. 2013;342:369–72. intrinsic bias in transcription factor footprint identifcation. Nat Methods. 28. Pranzatelli TJF, Michael DG, Chiorini JA. ATAC2GRN: optimized ATAC-seq 2014;11:73–8. and DNase1-seq pipelines for rapid and accurate genome regulatory 10. He Q, Johnston J, Zeitlinger J. ChIP-nexus enables improved detec‑ network inference. BMC Genom. 2018;19:563. tion of in vivo transcription factor binding footprints. Nat Biotechnol. 29. Quach B, Furey TS. DeFCoM: analysis and modeling of transcription factor 2015;33:395–401. binding sites using a motif-centric genomic footprinter. Bioinformatics. 11. Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, 2017;33:956–63. Thurman RE, Neph S, Kuehn MS, Noble WS, et al. Global mapping of 30. Rhee HS, Pugh BF. Comprehensive genome-wide protein-DNA interac‑ protein-DNA interactions in vivo by digital genomic footprinting. Nat tions detected at single-nucleotide resolution. Cell. 2011;147:1408–19. Methods. 2009;6:283–9. 31. Sawka-Verhelle D, Escoubet-Lozach L, Fong AL, Hester KD, Herzig S, 12. Inui M, Kikuchi Y, Aoki N, Endo S, Maeda T, Sugahara-Tobinai A, Fujimura Lebrun P, Glass CK. PE-1/METS, an antiproliferative Ets repressor factor, S, Nakamura A, Kumanogoh A, Colonna M, et al. Signal adaptor DAP10 is induced by CREB-1/CREM-1 during macrophage diferentiation. J Biol associates with MDL-1 and triggers osteoclastogenesis in cooperation Chem. 2004;279:17772–84. with DAP12. Proc Natl Acad Sci USA. 2009;106:4816–21. 32. Stergachis AB, Neph S, Sandstrom R, Haugen E, Reynolds AP, Zhang 13. John SP, Sun J, Carlson RJ, Cao B, Bradfeld CJ, Song J, Smelkinson M, M, Byron R, Canfeld T, Stelhing-Sun S, Lee K, et al. Conservation of Fraser IDC. IFIT1 exerts opposing regulatory efects on the infammatory trans-acting circuitry during mammalian regulatory evolution. Nature. and interferon gene programs in LPS-activated human macrophages. 2014;515:365–70. Cell Rep. 2018;25(95–106):e106. 33. Sung MH, Baek S, Hager GL. Genome-wide footprinting: ready for prime 14. Kahara J, Lahdesmaki H. BinDNase: a discriminatory approach for tran‑ time? Nat Methods. 2016;13:222–8. scription factor binding prediction using DNase I hypersensitivity data. 34. Sung MH, Guertin MJ, Baek S, Hager GL. DNase footprint signa‑ Bioinformatics. 2015;31:2852–9. tures are dictated by factor dynamics and DNA sequence. Mol Cell. 15. Karabacak Calviello A, Hirsekorn A, Wurmus R, Yusuf D, Ohler U. Repro‑ 2014;56:275–85. ducible inference of transcription factor footprints in ATAC-seq and 35. Sung MH, Li N, Lao Q, Gottschalk RA, Hager GL, Fraser ID. Switching of the DNase-seq datasets using protocol-specifc bias modeling. Genome Biol. relative dominance between feedback mechanisms in lipopolysaccha‑ 2019;20:42. ride-induced NF-kappaB signaling. Sci Signal. (2014b);7:ra6. 16. Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, 36. Taves MD, Mittelstadt PR, Presman DM, Hager GL, Ashwell JD. Single-cell Bernstein BE, Bickel P, Brown JB, Cayting P, et al. ChIP-seq guidelines and resolution and quantitation of targeted glucocorticoid delivery in the practices of the ENCODE and modENCODE consortia. Genome Res. thymus. Cell Rep. 2019;26(3629–3642):e3624. 2012;22:1813–31. 17. Lazarovici A, Zhou T, Shafer A, Dantas Machado AC, Riley TR, Sandstrom R, Sabo PJ, Lu Y, Rohs R, Stamatoyannopoulos JA, et al. Probing DNA shape Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in pub‑ and methylation state on a genomic scale with DNase I. Proc Natl Acad lished maps and institutional afliations. Sci USA. 2013;110:6376–81.