
Hit and run transcriptional repressors are difficult to catch in the act

Manan Shah¹, Alister P.W. Funnell¹,², Kate G.R. Quinlan¹ and Merlin Crossley¹

¹School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia

²Altius Institute for Biomedical Sciences, Seattle, Washington, USA

*Corresponding Author: [email protected]

Abstract

Transcriptional silencing may not depend on the continuous residence of a sequence-specific repressor at a control element and may instead act via a ‘hit and run’ mechanism. Because of limitations in assays that detect transcription factor (TF) binding, such as ChIP-Seq, this phenomenon may be challenging to detect, and its prevalence may therefore be underappreciated. To explore this possibility, we analysed erythroid promoters that are repressed directly by GATA1 in an inducible system. We found that many repressed genes are bound immediately after induction of GATA1, but that GATA1 residency decreases over time, particularly at repressed genes. Furthermore, we show that the repressive mark H3K27me3 is seldom associated with bound repressors, whereas the active histone mark H3K4me3 is overwhelmingly associated with TF binding. We hypothesise that during cellular differentiation and development, certain genes are silenced by repressive TFs that subsequently vacate the region. Catching such repressor TFs in the act of silencing with assays such as ChIP-Seq is thus a temporally challenging prospect. The use of inducible systems, epitope tags and alternative techniques may provide opportunities for detecting elusive ‘hit and run’ transcriptional silencing.

Keywords hit and run, repressor, ChIP-Seq, transcriptional repression, transcriptional regulation, GATA1

Abbreviations ChIP - chromatin immunoprecipitation; ChIP-Seq - chromatin immunoprecipitation followed by high throughput sequencing; CUT&RUN - Cleavage Under Targets and Release Using Nuclease; FOG - Friend of GATA1; LacI - bacterial lac repressor; PIC - pre-initiation complex; PRC2 - polycomb repressive complex 2; REST/NRSF - RE1-Silencing Transcription Factor/Neuron-Restrictive Silencer Factor; TALENs - transcription activator-like effector nucleases; TF - transcription factor; TSS - transcription start site; ZFNs - zinc finger nucleases

Introduction

Turning on the expression of a gene typically requires the binding of activator transcription factors (TFs) to regulatory DNA sequence elements, such as promoters and distal enhancers. TF binding subsequently leads to the recruitment of activating cofactors, histone modifying enzymes, and ultimately RNA polymerase II and the general TFs, which form the pre-initiation complex (PIC) [1]. Repeated rounds of transcriptional initiation are achieved by sustained occupancy or repeated rebinding of the activating TFs.

Gene silencing, on the other hand, is different in that it is not necessarily dependent on the continuous residence of a sequence-specific repressor at a control element. This is an important distinction, because assays that detect genome-wide TF binding, such as ChIP-Seq (Chromatin Immunoprecipitation followed by high throughput sequencing), typically provide a static snapshot of occupancy as it stands at the time point assayed. For this reason, ChIP-Seq does not necessarily lend itself to the detection of some modes of transcriptional repression. If overlooked, this can skew our understanding of the gene regulatory mechanisms at play in a given cell type.

Here we propose that activator TFs, and certain repressor TFs that operate via sustained or frequent binding at a nucleosome-depleted region, are readily amenable to detection by assays such as ChIP-Seq. In contrast, transcriptional repressors that elicit stable, long-term silencing by a ‘hit and run’ mechanism will be more challenging to detect. Detecting this latter category will require dynamic, temporally controlled systems that allow such repressors to be ‘caught in the act’ during their window of silencing.

In other words, repressors can act differently to activators. To use an analogy, once a gene is locked down, the gaoler can sometimes, but not always, throw away the key. In contrast, where gene activation is concerned, some sort of key is likely to be evident every time the gene is transcribed.

Figure 1. Models of transcriptional repression. a) Sustained TF binding represses the gene and is amenable to detection via ChIP or other TF binding assays. b) A ‘hit and run’ mechanism: the TF is bound initially and may recruit histone modifying enzymes/chromatin remodellers, which package the locus, preventing transcription and further TF binding. Binding can be detected initially but not once the locus is tightly packed (heterochromatin).

Transcriptional repression and activation may occur through different mechanisms

A classic example of a transcriptional repressor is the bacterial lac repressor (LacI) [2–4]. In the absence of lactose, it resides at the promoters of genes required for lactose metabolism and blocks engagement by RNA polymerase. Upon lactose exposure, LacI undergoes allosteric changes that abrogate its ability to bind these promoters, and subsequently, RNA polymerase is recruited and transcription commences. LacI is thus an example of a continuously binding repressor that renders the lac operon poised to dynamically respond to fluctuating metabolic requirements.

Similarly, a textbook example of a mammalian transcriptional repressor is REST/NRSF (RE1-Silencing Transcription Factor/Neuron-Restrictive Silencer Factor). REST resides at the control elements of neuronal genes in non-neuronal cells and represses their transcription [5–7]. In a very broad sense, REST thus silences in a similar fashion to LacI: it remains bound to its target genes for the duration of silencing, a characteristic that makes it readily detectable. As such, REST is often invoked as a paradigmatic example of a mammalian transcriptional repressor.

Yet how representative is REST of the mechanism of action of mammalian repressors in general? Do all sequence-specific transcriptional repressors remain bound to the control elements of the genes they silence throughout the life of the cell? Or are we simply prone to preferentially detecting repressors that operate through this mode with assays such as ChIP-Seq?

In somatic eukaryotic cells, only a subset of genes is selectively expressed [8,9], and this in turn defines cell type identity. For any given cell type, thousands of genes are silent or repressed, a state that may be preserved through rounds of cell division or differentiation. Some of these may be continuously repressed by resident TFs akin to REST. It is tempting to speculate that these genes are in effect reversibly repressed, or poised to respond to specific stimuli, and that displacement of the resident TF could lead to their activation.

Other genes, however, are stably repressed by general mechanisms of chromatin silencing. For instance, they may be silenced by local Polycomb-mediated H3K27 trimethylation and formation of facultative heterochromatin [10]. Alternatively, nucleosomal occlusion following remodelling may prevent the access of requisite activating TFs that have limited or no pioneering activity. Indeed, an important distinction is that in eukaryotes, as opposed to prokaryotes, the baseline state of chromatin is transcriptionally restrictive [11,12]. That is to say, in vivo, eukaryotic core promoters are intrinsically inert and are occluded by nucleosomes. Transcriptional initiation requires the coordinated action of transcriptional activators to displace nucleosomes and recruit the PIC. The ground state is thus one of transcriptional inactivity, and the repressive capacity of nucleosomes alone or of facultative heterochromatin can maintain gene silencing long after the departure of the repressor(s) that elicited the initial silencing. While sequence-specific repressors may be required to initially silence an active gene, they may not be essential for durable propagation of the repressed state. This mechanism of action is described as ‘hit and run’.

These stable repression mechanisms are particularly important during development, enabling perpetuation of gene expression patterns through cell divisions and cellular differentiation. The patterning of Hox gene expression in early Drosophila embryogenesis is well studied and provides useful insights. During early embryogenesis, locally expressed gap gene products (transcriptional activators and repressors) bind directly to cis-acting regulatory regions in Hox genes and regulate transcription. The Hox genes that are initially repressed remain stably repressed for the rest of development, even when the gap repressors are no longer present. This repression is maintained by factors of the Polycomb group (PcG) [13–15]. Removal of most PcG proteins from proliferating cells causes de-repression. Interestingly, resupplying these proteins within a certain window of time can cause re-repression [16]. This suggests that a mechanism beyond simple nucleosomal occlusion may be required for true long-term repression across multiple cell divisions or development. The maintenance of repression even after the initial transcriptional repressors are no longer present is consistent with a ‘hit and run’ mechanism.

Where might one expect to detect evidence for ‘hit and run’ repressors?

Many TFs appear to function as either activators or repressors depending on cellular and promoter/enhancer context. The milieu of available factors and cofactors in a cell or at a specific control region can determine whether the TF activates or represses. This is revealed in experiments where TFs are knocked out in cell or animal models: the immediate changes in expression, which presumably represent directly regulated genes, are not unidirectional.

The assumption that transcriptional activation typically requires the sustained binding of a TF, whereas repression may be achieved by a ‘hit and run’ mechanism, sets up the following prediction: ChIP-seq peaks will be found at or proximal to activated genes, whereas, over time, peaks will become rarer at genes that are repressed by the TF.

We decided to test this prediction by considering the genes regulated by the well-characterised erythroid master regulator GATA1, which is known to both activate and repress transcription [17]. While we hypothesise that the ‘hit and run’ mechanism may also act via more distally located elements, such as enhancers, these elements are harder to unequivocally link to the genes they regulate, so in the current analysis we have focussed on promoters and proximal GATA1 binding and regulation.

Loci with evidence of ‘hit and run’ repression

GATA1 – a repressor and an activator

GATA1 is an essential erythroid TF that is known to have both activating and repressing functions [17]. As an activator, it binds control elements in conjunction with the TAL1/LDB1/LMO2 complex [18]. Alternatively, GATA1 can both activate and repress gene expression through interactions with Friend of GATA1 (FOG1) and the NuRD complex [19]. GATA1 has been well studied using the G1E cell model, a Gata1-null murine erythroid progenitor cell line. A daughter cell line, G1E-ER4, in which GATA1 is fused to an estrogen receptor ligand-binding domain (Gata1-ER) and constitutively expressed, allows for inducible rescue of GATA1 nuclear activity, promoting erythroid differentiation [20]. In this system, treatment with estrogen or an estrogen analogue such as tamoxifen or estradiol allows tight control of GATA1 nuclear localisation and therefore analysis of transcriptional kinetics. Once induced, nuclear GATA1-ER expression is similar to levels of endogenous GATA1 found in murine erythroleukemia (MEL) cells, and consistent expression is maintained for at least 24 hours after induction [17].

We examined RNA-seq and GATA1 ChIP-seq data from G1E-ER4 cells generated as part of the ENCODE project [21–23]. Full details of the datasets used and methodology are available in the Supplementary Material. The cells were either not induced or induced with β-estradiol for 3, 7, 14, 24 or 30 hours, and RNA-seq and GATA1 ChIP-seq were performed at each stage. To see the pattern of GATA1 activation and repression in these cells, we first performed differential gene expression analysis at each stage of induction. At most time points, the numbers of activated and repressed genes are approximately equal, confirming GATA1’s dual function (Fig 2). It should be noted that because nuclear localisation of GATA1-ER upon induction causes differentiation and multiple changes in gene expression, some of the changes at later time points may be due to indirect effects.
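As a minimal sketch of the classification step, assuming a per-gene differential expression table with a log2 fold change (induced versus uninduced) and an adjusted p-value, the Python below splits significant genes into activated and repressed sets at the p.adj < 0.1 cutoff used here. The `DEResult` type, the `split_by_direction` function and the example genes are hypothetical illustrations, not the pipeline described in the Supplementary Material.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DEResult:
    """One row of a hypothetical differential expression table."""
    gene: str
    log2fc: float  # log2 fold change, induced vs uninduced (0 h)
    padj: float    # multiple-testing adjusted p-value


def split_by_direction(
    results: List[DEResult], padj_cutoff: float = 0.1
) -> Tuple[List[str], List[str]]:
    """Return (activated, repressed) gene names among significant genes."""
    significant = [r for r in results if r.padj < padj_cutoff]
    activated = [r.gene for r in significant if r.log2fc > 0]
    repressed = [r.gene for r in significant if r.log2fc < 0]
    return activated, repressed


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the ENCODE data.
    table = [
        DEResult("GeneA", 2.1, 0.001),
        DEResult("GeneB", -1.4, 0.02),
        DEResult("GeneC", 0.3, 0.6),  # not significant, ignored
        DEResult("GeneD", -0.5, 0.09),
    ]
    activated, repressed = split_by_direction(table)
    print(len(activated), "activated;", len(repressed), "repressed")
```

Counting the two lists at each time point gives tallies of the kind plotted in Fig 2; genes with a log2 fold change of exactly zero fall into neither set.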

Figure 2. GATA1 acts as both an activator and repressor in G1E-ER4s. The number of activated (grey) and repressed genes (blue) (p.adj < 0.1) after induction of G1E-ER4s with β-estradiol for 3 h (685 differentially expressed genes), 7 h (599), 14 h (1753), 24 h (4912) or 30 h (4987). This analysis was done using published data from the ENCODE project. Full details of the datasets used and methodology are available in the supplementary material.

To search for evidence of ‘hit and run’ transcriptional regulation, we determined the set of proximal GATA1 ChIP-Seq peaks that appear within 1 kb of the transcription start site (TSS) of either activated or repressed genes at any time point. GATA1 occupancy at target promoters is highest at the earliest time point following β-estradiol induction (3 h) (Fig 3). At all subsequent time points (t = 7, 14, 24, 30 h), repressed genes exhibited a more rapid and pronounced reduction in GATA1 promoter occupancy over time than activated genes. For example, at 30 h following induction, only 19% of repressed promoters that were bound by GATA1 at 3 h remain bound, compared to 45% of activated promoters (Supplementary Fig 1). These observations are consistent with ‘hit and run’ repression at a marked proportion of target genes and departure of GATA1 following silencing. It must also be considered that some ‘hit and run’ targets may already have been missed by 3 hours, as other studies have demonstrated that TF binding and regulation can occur within as little as 15 minutes [24,25]. Notably, even among activated genes the number of GATA1-bound promoters declines over time, suggesting that ‘hit and run’ may also operate in activation, albeit to a lesser extent than in repression.
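The promoter-occupancy comparison can be sketched as a series of interval overlaps. The minimal Python below assumes peaks are (chromosome, start, end) tuples in closed coordinates and TSSs are (chromosome, position) pairs. It drops induced peaks that overlap uninduced (0 h) peaks, finds genes with a peak within ±1 kb of the TSS, and reports what fraction of the genes bound at 3 h are still bound at each later time point. The function names and toy coordinates are hypothetical, and a genome-scale analysis would normally use a dedicated interval tool rather than these quadratic loops.

```python
from typing import Dict, List, Set, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates
Tss = Tuple[str, int]        # (chromosome, TSS position)


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def remove_background(peaks: List[Peak], background: List[Peak]) -> List[Peak]:
    """Drop induced-sample peaks that overlap any uninduced (0 h) peak."""
    return [p for p in peaks if not any(overlaps(p, b) for b in background)]


def bound_genes(genes: Set[str], tss: Dict[str, Tss],
                peaks: List[Peak], window: int = 1000) -> Set[str]:
    """Genes with at least one peak within TSS +/- window bp."""
    bound = set()
    for gene in genes:
        chrom, pos = tss[gene]
        region = (chrom, pos - window, pos + window)
        if any(overlaps(region, p) for p in peaks):
            bound.add(gene)
    return bound


def retention(genes: Set[str], tss: Dict[str, Tss],
              peaks_by_hour: Dict[int, List[Peak]], background: List[Peak],
              window: int = 1000, ref_hour: int = 3) -> Dict[int, float]:
    """Fraction of genes bound at ref_hour that remain bound at each later hour."""
    cleaned = {h: remove_background(p, background) for h, p in peaks_by_hour.items()}
    ref = bound_genes(genes, tss, cleaned[ref_hour], window)
    if not ref:
        return {}
    return {
        h: len(ref & bound_genes(genes, tss, cleaned[h], window)) / len(ref)
        for h in sorted(cleaned) if h > ref_hour
    }


if __name__ == "__main__":
    # Toy data: two genes bound at 3 h, only one still bound at 30 h.
    tss = {"GeneA": ("chr1", 10_000), "GeneB": ("chr1", 50_000)}
    peaks = {
        3: [("chr1", 9_800, 10_200), ("chr1", 49_900, 50_300)],
        30: [("chr1", 9_900, 10_100)],
    }
    print(retention({"GeneA", "GeneB"}, tss, peaks, background=[("chr2", 1, 500)]))
```

Running this separately for the activated and repressed gene sets gives the kind of comparison summarised above, and passing `window=5000` corresponds to the wider ±5 kb search described in the text.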

Figure 3. Repressed genes lose promoter GATA1 occupancy more profoundly and rapidly over time than activated genes. The promoters (+/- 1 kb TSS) of differentially expressed genes (p.adj < 0.1) (activated genes in grey, repressed in blue) at (A) 7 hours, (B) 14 hours, (C) 24 hours, and (D) 30 hours after induction of G1E-ER4s with β-estradiol were checked for GATA1 ChIP-Seq peaks across all induction time points. The peaks were normalised to the number of peaks at the 3 hour ChIP time point, which was set to 1.0. Background peaks that were present in the uninduced (0 h) ChIP-seq were removed from all induced time points. This analysis was done using published data from the ENCODE project. Full details of the datasets used and methodology are available in the supplementary material.

We also considered that repressors (or activators) could be acting through more distally located elements and that this might skew our interpretation. To account for this, we extended the window and searched for peaks +/- 5 kb from the TSS. The wider window resulted in a higher proportion of regulated genes having peaks (Supplementary Fig 2). However, the trends remained the same: GATA1 occupancy was highest at the first time point after β-estradiol induction, and repressed genes exhibited a more rapid and pronounced reduction in occupancy at subsequent time points (Fig 4). Also, in general a higher proportion of activated genes have peaks (ranging from 26-61% at 3 h) than repressed genes (16-36%) (Supplementary Fig 2). This is consistent with previous studies [26] and with the hypothesis that GATA1 resides at genes that it activates but may operate via a ‘hit and run’ mechanism at genes it represses.

Figure 4. Repressed genes lose local GATA1 occupancy more profoundly and rapidly over time than activated genes. The promoters (+/- 5 kb TSS) of all differentially expressed genes (p.adj < 0.1) (activated genes in grey, repressed genes in blue) at (A) 7 hours, (B) 14 hours, (C) 24 hours, and (D) 30 hours after induction of G1E-ER4s with β-estradiol were checked for GATA1 ChIP-Seq peaks across all induction time points. The peaks were normalised to the number of peaks at the 3 hour ChIP time point, which was set to 1.0. Background peaks that were present in the uninduced (0 h) ChIP-seq were removed from all induced time points. This analysis was done using published data from the ENCODE project. Full details of the datasets used and methodology are available in the supplementary material.

We then examined individual repressed genes that had ChIP-seq peaks proximal to their TSS. At these selected repressed genes, clear GATA1 ChIP-seq enrichment is seen at 3 hours post-induction, decreasing at later time points (Fig 5). The genes remained repressed at these later time points despite the lack of a peak. This indicates that repression is maintained after loss of TF binding, consistent with a ‘hit and run’ mechanism.

Figure 5. GATA1 binds early post-induction but loses binding across later time points at repressed genes. GATA1 ChIP-seq profiles at 0, 3, 7, 14, 24 and 30 h following β-estradiol induction at the following loci: (A) Slfn2; (B) Fbxo4; (C) Arap3; (D) Six1; (E) ; and (F) Rrp15. The y-axis shows reads per million. The log2 fold change (FC) in expression at each time point compared to uninduced (0 h), derived from RNA-seq data, is shown in red. * = not significant (p.adj > 0.1). Values less than 0 indicate repression of the respective gene compared to the expression level before induction (0 h time point). This analysis was done using published data from the ENCODE project. Full details of the datasets used and methodology are available in the supplementary material.

Comprehensive analysis of all GATA1 target genes was not possible, partly because it is difficult to identify relevant GATA1 peaks functionally associated with the expression of nearby genes. Nevertheless, the analysis of a subset of GATA1 targets that are repressed throughout the induction timecourse and have a clear peak proximal to their TSS lends support to the idea that GATA1 can repress through a ‘hit and run’ mechanism.

Activating but not repressive transcription factors co-localise with histone marks

We next took a different perspective and considered the distribution of particular histone marks surrounding TF-bound sites. Specifically, we analysed the levels of active (H3K4me3) and repressive (H3K27me3) histone modifications immediately surrounding the binding sites of a wide range of TFs that are considered to be canonical activators, repressors, or both activators and repressors. For a transcriptional repressor whose constitutive presence is required for the stable silencing of a gene, one might expect to detect a local enrichment of H3K27me3. Conversely, a transcriptional repressor that operates by a ‘hit and run’ mechanism would not be detected in association with repressive histone marks, as it would have departed the locus once repression was established.

We first looked at the levels of H3K4me3 and H3K27me3 in G1E-ER4s around GATA1 ChIP-Seq peaks in promoter regions (+/- 2.5 kb TSS). Strikingly, given that GATA1 is a dual activator/repressor, we see only H3K4me3 (active mark) enrichment around GATA1 peaks, with minimal H3K27me3 (repressive mark) enrichment (Fig 6a). This holds regardless of whether the GATA1 peaks are at the promoters of activated genes or at the few repressed genes that remain bound by GATA1 at 24 h post-induction (Supplementary Fig 3). In other words, GATA1 is found associated with active chromatin marks, but despite the fact that it is known to directly repress gene expression, there is little observable association between GATA1 and repressive histone marks. Interestingly, a previous study has shown by co-IP experiments that GATA1 associates with Suz12 and EZH2, components of PRC2 (polycomb repressive complex 2), which mediates H3K27me3 deposition [27].
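An average (meta) profile of the kind summarised in Fig 6a can be sketched as follows. This minimal Python assumes per-base read coverage is available as a list for each chromosome and that peak summits are given as (chromosome, position). Coverage is scaled to reads per million mapped reads and then averaged in fixed-width bins across ±2.5 kb of each summit. The 500 bp bin size, the function names and the toy data are illustrative assumptions, not the settings used for the figure; in practice, signal would usually be read from bigWig tracks with dedicated tools.

```python
from typing import Dict, List, Tuple


def to_rpm(coverage: List[float], total_mapped_reads: int) -> List[float]:
    """Scale raw per-base read counts to reads per million mapped reads."""
    scale = 1e6 / total_mapped_reads
    return [c * scale for c in coverage]


def average_profile(
    coverage: Dict[str, List[float]],
    summits: List[Tuple[str, int]],
    flank: int = 2500,
    bin_size: int = 500,
) -> List[float]:
    """Mean signal per bin across summit - flank .. summit + flank, over usable peaks."""
    n_bins = (2 * flank) // bin_size
    sums = [0.0] * n_bins
    used = 0
    for chrom, summit in summits:
        track = coverage.get(chrom)
        start = summit - flank
        # Skip peaks on unknown chromosomes or whose window runs off the track.
        if track is None or start < 0 or summit + flank > len(track):
            continue
        for i in range(n_bins):
            window = track[start + i * bin_size : start + (i + 1) * bin_size]
            sums[i] += sum(window) / bin_size
        used += 1
    return [s / used for s in sums] if used else sums


if __name__ == "__main__":
    # Toy 10 kb chromosome with a 200 bp block of signal centred on position 5000.
    raw = [0.0] * 10_000
    raw[4_900:5_100] = [1.0] * 200
    cov = {"chr1": to_rpm(raw, total_mapped_reads=1_000_000)}
    print(average_profile(cov, [("chr1", 5_000)]))
```

Computing one such profile per histone mark around the same set of TF summits, and plotting them together, gives the side-by-side comparison of active and repressive signal.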

Figure 6. An active (H3K4me3, grey) but not repressive (H3K27me3, blue) histone mark is enriched around TF ChIP-seq peaks. a) The average enrichment for both H3K4me3 (ENCFF035MWY and ENCFF750EGB) and H3K27me3 (ENCFF254JHP and ENCFF211VNX) around GATA1 ChIP-seq peaks in promoter regions (+/- 2500 bp) in G1E-ER4s after induction with β-estradiol for 24 hours was plotted. The signal is normalised to reads per million. b) Overlap of TF ChIP-seq peaks with active (H3K4me3) or repressive (H3K27me3) histone marks. This analysis was done using published data from the ENCODE project. Full details of the datasets used and methodology are available in the supplementary material.

We extended this analysis further by looking at all available TF ChIP-seq data in K562 cells from the ENCODE project and show a selection of results in Fig 6b. Strikingly, TF binding is overwhelmingly associated with active marks (94% overlap on average; a full list is available in Supplementary Table 1). Repressive marks show only 10% overlap on average, and even REST, which is generally considered a dedicated repressor, shows only 13% overlap. Consistent with the results in G1E-ER4s, GATA1 displays 96% overlap with H3K4me3 and only 7% overlap with H3K27me3.
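The overlap percentages can be illustrated with a minimal sketch, assuming TF and histone-mark peaks are (chromosome, start, end) tuples in closed coordinates, with a TF peak counted as overlapping a mark if it shares at least one base with any peak of that mark. The function name and toy intervals are hypothetical; at genome scale, `bedtools intersect` with its `-u` option reports each TF peak once if it has any overlap, which is the same counting rule.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates


def fraction_overlapping(tf_peaks: List[Interval], mark_peaks: List[Interval]) -> float:
    """Fraction of TF peaks that overlap at least one histone-mark peak."""
    if not tf_peaks:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in tf_peaks
        if any(c == chrom and m_start <= end and start <= m_end
               for c, m_start, m_end in mark_peaks)
    )
    return hits / len(tf_peaks)


if __name__ == "__main__":
    # Toy intervals for illustration only.
    tf = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 80), ("chr3", 10, 20)]
    h3k4me3 = [("chr1", 150, 400), ("chr1", 1050, 1060), ("chr2", 0, 60)]
    h3k27me3 = [("chr3", 0, 5), ("chr3", 15, 30)]
    print("active:", fraction_overlapping(tf, h3k4me3))
    print("repressive:", fraction_overlapping(tf, h3k27me3))
```

Because the active and repressive fractions are computed independently, they need not sum to 100%: a TF site can overlap both marks or neither, which is why averages such as 94% and 10% can coexist.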

Considering these data, we suggest that persistent residency of TFs in facultative heterochromatin is either not a general mechanism of transcriptional silencing, or at the least, is not easily detectable by ChIP-Seq. While ‘hit and run’ repression provides an attractive explanation for this observation, it is possible that in some instances, continuous occupancy and repression does in fact occur but that this may require the application of more sensitive occupancy assays for detection. This will be discussed later.

Is ‘hit and run’ repression under-appreciated?

To understand why this mechanism may have been under-appreciated, we must look at how TF binding is detected. ChIP, and especially the coupling of ChIP to high throughput sequencing (ChIP-seq), has been, and remains, the most prevalent method used to detect genome-wide TF binding in vivo. These experiments generate large datasets in which several thousand peaks, indicating TF binding sites, are detected.

Inferring the gene(s) whose expression might be regulated by the binding of a TF at any given site is a notoriously challenging problem. It is confounded by the fact that any individual regulatory element can operate over very large distances (up to Mb scale), and can act positively or negatively on one or more genes that may or may not be contiguous with the element. Studies of changes in gene expression must be carefully designed to distinguish directly affected genes from indirect targets whose expression changes because of secondary, flow-on transcriptional cascades.

Another aspect of the complexity of this problem is evident in Supplementary Fig 2: there is often an imperfect correlation between regulation and TF binding. Put simply, even when one looks at genes that are known to be regulated by GATA1, one does not always see binding at the promoter region of the regulated gene. For instance, in Fig 5, GATA1 is seen at many genes only early in the time course, and in other cases, not captured in the Figure, there is no clear GATA1 peak in the proximity of GATA1-dependent genes. While there are many explanations for such discrepancies, these anomalies are not usually, or easily, pursued further. The relevant functional site by which a gene is regulated may lie neither in the proximal promoter nor in a nearby enhancer, but in a more distant regulatory element. As such, the failure to identify a key peak, or smoking gun, does not mean the TF does not regulate the gene of interest.

In the case of transcriptional repressors, when a researcher is confronted by the lack of a ChIP-seq peak at a repressed locus where the TF is hypothesised to bind, several explanations may be invoked. One possibility is that the TF binds further away and acts at a distance via looping. However, this is not easily pursued because there may be multiple candidate distal peaks and determining which are functional is not simple. Alternatively, one might propose that the TF is not actually a direct player in repression, and that it is operating indirectly, by regulating some unknown TF that directly binds the gene in question. In this case the issue is seldom pursued further because identifying the secondary factor may be challenging and analysing its function could represent as much work as the original study. Epitope masking may also be at play, and may also explain the absence of a ChIP-seq TF peak at a gene that it regulates. However, epitope masking is also a challenging problem that can only be overcome by extensive epitope tagging experiments, so it is also often not pursued further.

In summary, if a TF peak is not observed close to a putative target gene, the matter is unlikely to be investigated further, so evidence that a TF may bind transiently and operate via a ‘hit and run’ mechanism will not emerge.

Additionally, because ChIP-Seq datasets tend to contain a multitude of TF peaks, one invariably identifies some peaks within, or at least close to, interesting target genes. This finding may be reported without further work to functionally validate individual sites and show that they are instrumental in the regulation of the target genes. Being visible at the ‘scene of the crime’ is often enough to report that a TF is regulating an expected target. This is because it is difficult to generate data on functional regulation. Until the relatively recent advent of genome engineering by zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the more readily adopted CRISPR/Cas9, it was very difficult to mutate candidate control elements. To further confound matters, if there are many candidate elements, as there often are, there may be functional redundancy among them, making it difficult to pin down regulatory effects. In this case, it may simply be assumed that the transcriptional regulator is operating directly, and the matter is not investigated further. This means that in some cases the only peaks observed near the gene may be non-functional sites, and the possibility that functional elements no longer bound by a repressor are in fact operative will never be considered.

More generally, the remarkable abundance of peaks in most ChIP-Seq datasets has meant that the most pressing question is to explain what is seen, rather than to speculate about what might have been missed by taking a snapshot of a TF’s residency profile across the genome at a single time point. Efforts in both the experimental procedure and downstream analysis have focused on eliminating false positives rather than on uncovering false negatives. Finally, because most ChIP-Seq datasets are effectively snapshots rather than dynamic time courses generated in inducible systems, the idea that the profile of TF binding changes over time during cellular differentiation and maturation is not often discussed.

How does one detect binding at targets where the TF operates via ‘hit and run’?

A key challenge is finding the appropriate window of time when the TF is present and is actively mediating the repression of the gene. Our hypothesis is that factors will operate at specific periods during cellular differentiation and maturation and possibly at only specific stages of the cell cycle. ChIP-Seq is a technique that requires a reasonably large number of cells and can be compromised by low signal-to-noise ratios. If cell cycle stages are not synchronised or cellular differentiation or maturation varies across the population (perhaps stochastically) then transient binding events may not be detected. While single cell ChIP-Seq could conceivably circumvent such heterogeneity, it remains challenging due to the background caused by non-specific antibody binding [28–30].

One strategy could be to assay for binding at specific stages of the cell cycle, perhaps immediately following mitosis, when vacated TFs re-engage chromatin. To this end, small molecules that arrest the cell cycle at different stages (such as nocodazole, thymidine and lovastatin) can be employed to synchronise populations. More generally, working with cell populations that are more epigenetically homogeneous may also enhance sensitivity to transient binding events. This can be achieved by deriving clonal populations, sorting or enriching for cells at a specific stage of differentiation, and so forth.

Another strategy is to consider techniques that may be more sensitive and detect lower signals (or infrequent binding), as might be observed with ‘hit and run’ repressors. Techniques that rely on enzymatic marking, such as DamID-seq [31,32], in which the TF is fused to the bacterial DNA adenine methyltransferase (Dam) so that adenines in GATC motifs near its binding sites become methylated, may be particularly sensitive. DamID has other advantages: it does not rely on antibodies, thus avoiding epitope masking, and it records any binding events that occur during the timeframe of the experiment, so it is not as limited by the ‘snapshot’ problem of ChIP-seq and other antibody-based techniques.

Similarly, CUT&RUN (Cleavage Under Targets and Release Using Nuclease) is a novel technique for detection of genome-wide occupancy in situ [33,34]. It employs antibody-targeted micrococcal nuclease to liberate small DNA fragments that are subject to high-throughput sequencing. This method requires fewer cells (single cell CUT&RUN has now also been performed [35]) and exhibits a better signal-to-noise ratio, as well as better resolution, than ChIP-Seq. Additionally, this method may be capable of detecting transient binding, as cuts may be made even if the TF is not resident at its target site for long periods.

Controlling when the TF is expressed, and thus when the ‘hit’ occurs, is another potential way to detect binding. This has often been achieved by fusing the TF to a modified estrogen receptor ligand-binding domain, which sequesters the TF in the cytoplasm until the cells are treated with estrogen or an analogue, as described above for GATA1. Such a system may allow detection of ‘hit and run’ regulators, as one can perform ChIP-seq (or other methods) immediately after treating the cells with the inducer.

Indeed, a recent elegant study using such a fusion system for the Ikaros TF has revealed a fine scale temporal map of transcriptional regulatory dynamics in pre-B cells [24]. Nuclear translocation of Ikaros-ER resulted in rapid binding at target gene promoters within 5 minutes, displacement of RNA polymerase II shortly thereafter, followed by nucleosome invasion and transcriptional cessation. Ultimately, stable silencing mediated by histone deacetylation occurred. Importantly, nuclear induction of Ikaros resulted in reduced accessibility of its target sites genome-wide, and this in turn impeded its own binding, consistent with the hypothesis of ‘hit and run’ repression. Further in support of this, disabling its chromatin remodelling capacity (by knocking down CHD4) restored the ability of Ikaros to bind to its target sites. Orthogonally, another recent study that deployed an inducible NuRD complex (ER-Mbd3b-ER) observed rapid expulsion of TFs and RNA pol II at control elements followed by transcriptional silencing [36]. Together, these studies highlight the difficulty in detecting transient repressive activity in static systems and the utility of temporally controlled models.

One caveat with such fusion systems is that the introduction of a tag or inducible domain may hinder interactions of the TF with its cofactors and prevent complex formation. Functional preservation should therefore be assessed relative to the native TF. However, such functional disruption may have an inadvertent advantage: it may facilitate genome-wide detection of ‘hit and run’ repressors. The ectopic domain could prevent epitope masking, or it could impair TF function, preventing chromatin condensation and repression so that the TF remains bound at the locus where it can be detected.

Indeed, our own work and that of others on BCL11A and its direct repression of the fetal γ-globin (HBG1 and HBG2) gene promoters may be instructive [37,38]. Although BCL11A was established as a potent repressor of the fetal γ-globin genes a decade ago [39], direct promoter binding was not detected until recently [37,38]. Whereas conventional ChIP had failed to demonstrate the interaction [40], CUT&RUN proved sufficiently sensitive [38]. Similarly, we detected BCL11A at the fetal γ-globin gene promoters using a molecule tagged with an inducible estrogen receptor ligand-binding domain as well as a V5 epitope tag [37]. The tagging appeared to interfere with the repressive function of the fusion protein, such that the fetal globin genes were not completely silenced even though BCL11A binding was readily detected. It is not currently clear in these studies whether the detected BCL11A was present at the silenced locus or in a fraction of cells that maintained open chromatin and gene expression. The results are consistent with a model in which BCL11A represses by a ‘hit and run’ mechanism, but they do not rule out other mechanisms in which it remains bound at the locus but is undetectable due to epitope masking.

Conclusions and outlook

The ‘hit and run’ hypothesis has been proposed several times, but primarily in the context of transcriptional activation [25,41–44]. There is, however, also growing interest in epigenetic co-repressors, both natural and artificial, that operate via ‘hit and run’ mechanisms [45,46]. We suggest that the ‘hit and run’ mechanism may be most prevalent in transcriptional repression. Some transcriptional repressors bind and then recruit histone-modifying enzymes or complexes that package the locus in such a way that it is no longer accessible. This prevents the gene from being transcribed and may also mean that the repressing TF itself is no longer required to bind (or binds very infrequently), or is even unable to bind (Fig. 1b). Indeed, this forms the basis of stable epigenetic programs that persist through cell division and development to control gene expression. Further, the idea of ‘hit and run’, stable repression has been exploited as a tool for targeted gene silencing: artificial DNA-binding domains (TALEs, dCas9) have been fused to epigenetic modifier domains, and these fusions require only transient expression to achieve stable, heritable gene silencing [45,46]. In some cases it may be that the gaoler’s key can be thrown away and lifelong gene repression ensues.

Currently, failure to detect a transcription factor at a particular site is most likely to be interpreted as suggesting that it is not relevant, or at least not directly involved in regulating the locus. However, failing to see the criminal at the scene does not mean that the transcription factor has not already silenced the gene and fled. It may be that many, if not most, repressors operate via ‘hit and run’ mechanisms. Sensitive assays for TF occupancy, combined with inducible systems that allow temporal sampling, should together reveal the extent of this phenomenon and inform our understanding of the dynamic, sequential steps required for gene repression.

Acknowledgements

This work is supported by a grant from the Australian Research Council (DP170101786) to MC and KGRQ. MS is supported by an Australian Government Research Training Program (RTP) Scholarship. KGRQ is supported by a Scientia Fellowship from UNSW Sydney. The authors have no conflicts of interest to declare.

References

[1] V. Haberle, A. Stark, Nat. Rev. Mol. Cell Biol. 2018, 19, 621.
[2] F. Jacob, J. Monod, J. Mol. Biol. 1961, 3, 318.
[3] W. Gilbert, B. Müller-Hill, Proc. Natl. Acad. Sci. USA 1966, 56, 1891.
[4] M. Lewis, C. R. Biol. 2005, 328, 521.
[5] J. A. Chong, J. Tapia-Ramírez, S. Kim, J. J. Toledo-Aral, Y. Zheng, M. C. Boutros, Y. M. Altshuller, M. A. Frohman, S. D. Kraner, G. Mandel, Cell 1995, 80, 949.
[6] C. J. Schoenherr, D. J. Anderson, Science 1995, 267, 1360.
[7] Z. F. Chen, A. J. Paquette, D. J. Anderson, Nat. Genet. 1998, 20, 136.
[8] D. Ramsköld, E. T. Wang, C. B. Burge, R. Sandberg, PLoS Comput. Biol. 2009, 5, e1000598.
[9] G. K. Marinov, B. A. Williams, K. McCue, G. P. Schroth, J. Gertz, R. M. Myers, B. J. Wold, Genome Res. 2014, 24, 496.
[10] U. Grossniklaus, R. Paro, Cold Spring Harb. Perspect. Biol. 2014, 6, a019331.
[11] K. Struhl, Cell 1999, 98, 1.
[12] M. Ptashne, J. Biol. Chem. 2014, 289, 5417.
[13] M. Bienz, J. Müller, Bioessays 1995, 17, 775.
[14] R. Paro, Curr. Opin. Cell Biol. 1993, 5, 999.
[15] B. Schuettengruber, H.-M. Bourbon, L. Di Croce, G. Cavalli, Cell 2017, 171, 34.
[16] D. Beuchle, G. Struhl, J. Müller, Development 2001, 128, 993.
[17] J. J. Welch, J. A. Watts, C. R. Vakoc, Y. Yao, H. Wang, R. C. Hardison, G. A. Blobel, L. A. Chodosh, M. J. Weiss, Blood 2004, 104, 3136.
[18] P. Vyas, M. A. McDevitt, A. B. Cantor, S. G. Katz, Y. Fujiwara, S. H. Orkin, Development 1999, 126, 2799.
[19] A. Miccio, Y. Wang, W. Hong, G. D. Gregory, H. Wang, X. Yu, J. K. Choi, S. Shelat, W. Tong, M. Poncz, G. A. Blobel, EMBO J. 2010, 29, 442.
[20] M. J. Weiss, C. Yu, S. H. Orkin, Mol. Cell. Biol. 1997, 17, 1642.
[21] D. Jain, T. Mishra, B. M. Giardine, C. A. Keller, C. S. Morrissey, S. Magargee, C. M. Dorman, M. Long, M. J. Weiss, R. C. Hardison, Genom. Data 2015, 4, 1.
[22] ENCODE Project Consortium, Nature 2012, 489, 57.
[23] C. A. Davis, B. C. Hitz, C. A. Sloan, E. T. Chan, J. M. Davidson, I. Gabdank, J. A. Hilton, K. Jain, U. K. Baymuradov, A. K. Narayanan, K. C. Onate, K. Graham, S. R. Miyasato, T. R. Dreszer, J. S. Strattan, O. Jolanki, F. Y. Tanaka, J. M. Cherry, Nucleic Acids Res. 2018, 46, D794.
[24] Z. Liang, K. E. Brown, T. Carroll, B. Taylor, I. F. Vidal, B. Hendrich, D. Rueda, A. G. Fisher, M. Merkenschlager, Elife 2017, 6, DOI 10.7554/eLife.22767.
[25] A. Para, Y. Li, A. Marshall-Colón, K. Varala, N. J. Francoeur, T. M. Moran, M. B. Edwards, C. Hackley, B. O. R. Bargmann, K. D. Birnbaum, W. R. McCombie, G. Krouk, G. M. Coruzzi, Proc. Natl. Acad. Sci. USA 2014, 111, 10371.
[26] Y. Cheng, W. Wu, S. A. Kumar, D. Yu, W. Deng, T. Tripic, D. C. King, K.-B. Chen, Y. Zhang, D. Drautz, B. Giardine, S. C. Schuster, W. Miller, F. Chiaromonte, Y. Zhang, G. A. Blobel, M. J. Weiss, R. C. Hardison, Genome Res. 2009, 19, 2172.
[27] M. Yu, L. Riva, H. Xie, Y. Schindler, T. B. Moran, Y. Cheng, D. Yu, R. Hardison, M. J. Weiss, S. H. Orkin, B. E. Bernstein, E. Fraenkel, A. B. Cantor, Mol. Cell 2009, 36, 682.
[28] A. Rotem, O. Ram, N. Shoresh, R. A. Sperling, A. Goren, D. A. Weitz, B. E. Bernstein, Nat. Biotechnol. 2015, 33, 1165.
[29] S. J. Clark, H. J. Lee, S. A. Smallwood, G. Kelsey, W. Reik, Genome Biol. 2016, 17, 72.
[30] G. Kelsey, O. Stegle, W. Reik, Science 2017, 358, 69.
[31] B. van Steensel, S. Henikoff, Nat. Biotechnol. 2000, 18, 424.
[32] O. J. Marshall, T. D. Southall, S. W. Cheetham, A. H. Brand, Nat. Protoc. 2016, 11, 1586.
[33] P. J. Skene, S. Henikoff, Elife 2017, 6, DOI 10.7554/eLife.21856.
[34] P. J. Skene, J. G. Henikoff, S. Henikoff, Nat. Protoc. 2018, 13, 1006.
[35] S. J. Hainer, A. Boskovic, O. J. Rando, T. G. Fazzio, BioRxiv 2018, DOI 10.1101/286351.
[36] S. Bornelöv, N. Reynolds, M. Xenophontos, S. Gharbi, E. Johnstone, R. Floyd, M. Ralser, J. Signolet, R. Loos, S. Dietmann, P. Bertone, B. Hendrich, Mol. Cell 2018, 71, 56.
[37] G. E. Martyn, B. Wienert, L. Yang, M. Shah, L. J. Norton, J. Burdach, R. Kurita, Y. Nakamura, R. C. M. Pearson, A. P. W. Funnell, K. G. R. Quinlan, M. Crossley, Nat. Genet. 2018, 50, 498.
[38] N. Liu, V. V. Hargreaves, Q. Zhu, J. V. Kurland, J. Hong, W. Kim, F. Sher, C. Macias-Trevino, J. M. Rogers, R. Kurita, Y. Nakamura, G.-C. Yuan, D. E. Bauer, J. Xu, M. L. Bulyk, S. H. Orkin, Cell 2018, 173, 430.
[39] V. G. Sankaran, T. F. Menne, J. Xu, T. E. Akie, G. Lettre, B. Van Handel, H. K. A. Mikkola, J. N. Hirschhorn, A. B. Cantor, S. H. Orkin, Science 2008, 322, 1839.
[40] J. Xu, V. G. Sankaran, M. Ni, T. F. Menne, R. V. Puram, W. Kim, S. H. Orkin, Genes Dev. 2010, 24, 783.
[41] V. Charoensawan, C. Martinho, P. A. Wigge, Bioessays 2015, 37, 748.
[42] K. Varala, Y. Li, A. Marshall-Colón, A. Para, G. M. Coruzzi, Bioessays 2015, 37, 851.
[43] W. Schaffner, Nature 1988, 336, 427.
[44] J. Doidy, Y. Li, B. Neymotin, M. B. Edwards, K. Varala, D. Gresham, G. M. Coruzzi, BMC Genomics 2016, 17, 92.
[45] A. Amabile, A. Migliara, P. Capasso, M. Biffi, D. Cittaro, L. Naldini, A. Lombardo, Cell 2016, 167, 219.
[46] T. Mlambo, S. Nitsch, M. Hildenbeutel, M. Romito, M. Müller, C. Bossen, S. Diederichs, T. I. Cornu, T. Cathomen, C. Mussolino, Nucleic Acids Res. 2018, 46, 4456.

Supplementary Material

Accession Numbers

G1E-ER4 RNA-seq

Time point | ENCODE accession number(s) | File type
0 h | ENCFF119XQN, ENCFF138KEF | BAM
3 h | ENCFF098TJE, ENCFF805YUR | BAM
7 h | ENCFF770CFX, ENCFF479JSH | BAM
14 h | ENCFF242GCO, ENCFF936GQS | BAM
24 h | ENCFF951VAI, ENCFF612DLW | BAM
30 h | ENCFF033FKS, ENCFF633LAO | BAM

GATA1 ChIP-seq in G1E-ER4s (peak files)

Time point | ENCODE accession number(s) | File type
0 h | ENCFF001YFO | BED
3 h | ENCFF001YFM | BED
7 h | ENCFF001YFN | BED
14 h | ENCFF001YFJ | BED
24 h | ENCFF724FOF | BED
30 h | ENCFF001YFL | BED

GATA1 ChIP-seq in G1E-ER4s (for signal files)

Time point | ENCODE accession number(s) | File type
0 h | ENCFF145AOI | BAM
3 h | ENCFF054PYK | BAM
7 h | ENCFF242LAJ | BAM
14 h | ENCFF503ZWH | BAM
24 h | ENCFF181MQO, ENCFF033GTE | BAM
30 h | ENCFF944TDI | BAM

H3K4me3 and H3K27me3 ChIP-seq in G1E-ER4s

Target | ENCODE accession number(s) | File type
H3K4me3 | ENCFF035MWY, ENCFF750EGB | BAM
H3K27me3 | ENCFF254JHP, ENCFF211VNX | BAM

H3K4me3 and H3K27me3 in K562s

All ENCODE accession numbers are listed in Supplementary Table 1.

Methods

G1E-ER4 RNA-seq

Aligned BAM files were quantified using featureCounts [1] with GENCODE M17 as the reference annotation. Differential expression analysis was then performed with DESeq2 [2].
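The split of differentially expressed genes into activated and repressed sets, used in all supplementary figures at p.adj < 0.1, can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the DESeq2 results were exported to CSV with a gene-identifier column named `gene` (a hypothetical name) alongside DESeq2's standard `log2FoldChange` and `padj` columns, and that fold changes are expressed as induced over uninduced.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

PADJ_CUTOFF = 0.1  # significance threshold quoted in the supplementary figure legends


def classify_genes(
    rows: Iterable[Dict[str, str]], cutoff: float = PADJ_CUTOFF
) -> Tuple[List[str], List[str]]:
    """Split DESeq2-style result rows into activated and repressed gene lists.

    Assumes each row has 'gene' (hypothetical column name), 'log2FoldChange'
    and 'padj' fields, with fold change expressed as induced over uninduced.
    """
    activated: List[str] = []
    repressed: List[str] = []
    for row in rows:
        padj = row["padj"]
        # DESeq2 leaves padj as NA for genes removed by independent filtering
        # or flagged as count outliers; treat these as not significant.
        if padj in ("", "NA") or float(padj) >= cutoff:
            continue
        lfc = float(row["log2FoldChange"])
        if lfc > 0:
            activated.append(row["gene"])
        elif lfc < 0:
            repressed.append(row["gene"])
    return activated, repressed


# Hypothetical table in the shape of a DESeq2 results export (not real data).
example = io.StringIO(
    "gene,log2FoldChange,padj\n"
    "gene_a,3.2,0.001\n"
    "gene_b,-2.1,0.004\n"
    "gene_c,-1.5,0.2\n"
    "gene_d,-0.8,NA\n"
)
up, down = classify_genes(csv.DictReader(example))
print(up, down)  # ['gene_a'] ['gene_b']
```

Genes with an NA adjusted p-value are simply dropped here; whether the original analysis handled them identically is not stated in the Methods.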

GATA1 ChIP-seq in G1E-ER4s

Peak files (BED files) were first converted to mm10 genome coordinates where necessary using UCSC LiftOver. Any peaks present in the 0 h (uninduced) sample were removed from all induced samples using bedtools [3]. To count the genes with promoter peaks, promoter lists were first generated for activated and repressed genes by extending the +1 (TSS) position of every transcript of each activated or repressed gene by the specified promoter window (±1 kb or ±5 kb around the TSS). These lists were then intersected with the peak files using bedtools intersect. Duplicate gene instances (arising from multiple peaks or multiple transcripts) were consolidated so that each gene was counted only once. For Supplementary Fig. 1, the peaks present 3 h after induction were tracked across subsequent time points; that is, we recorded whether each of these peaks was still present, and no new peaks were considered.
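The promoter-peak counting steps above (background removal, TSS windows, intersection and per-gene consolidation) can be sketched in plain Python. This is an illustrative stand-in for the bedtools workflow, not the pipeline actually used: it assumes BED-style 0-based half-open coordinates, treats a 0 h peak as "present" in an induced sample if the two intervals overlap by at least one base (as `bedtools intersect -v` would), and uses toy coordinates and hypothetical gene names.

```python
from collections import namedtuple
from typing import Iterable, List, Set

# BED-style interval: 0-based, half-open [start, end)
Interval = namedtuple("Interval", "chrom start end")
# One transcript of a gene; tss is the 0-based coordinate of the +1 nucleotide
Transcript = namedtuple("Transcript", "gene chrom tss")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def remove_background(peaks: Iterable[Interval], background: List[Interval]) -> List[Interval]:
    """Drop induced peaks overlapping any uninduced (0 h) peak."""
    return [p for p in peaks if not any(overlaps(p, b) for b in background)]


def genes_with_promoter_peaks(
    transcripts: Iterable[Transcript], peaks: List[Interval], flank: int
) -> Set[str]:
    """Genes with at least one peak within +/- flank bp of any of their TSSs.

    Collecting gene names in a set consolidates duplicates arising from
    multiple transcripts or multiple peaks, so each gene is counted once.
    """
    hits: Set[str] = set()
    for t in transcripts:
        window = Interval(t.chrom, max(0, t.tss - flank), t.tss + flank + 1)
        if any(overlaps(window, p) for p in peaks):
            hits.add(t.gene)
    return hits


def percent_with_peaks(all_genes: Set[str], hits: Set[str]) -> float:
    """Percentage of a gene set (e.g. all repressed genes) with promoter peaks."""
    return 100.0 * len(hits & all_genes) / len(all_genes)


# Toy coordinates for illustration only.
transcripts = [
    Transcript("gene_a", "chr1", 10_000),
    Transcript("gene_a", "chr1", 10_500),  # second transcript of gene_a
    Transcript("gene_b", "chr2", 50_000),
    Transcript("gene_c", "chr3", 80_000),
]
peaks_0h = [Interval("chr3", 79_900, 80_100)]
peaks_3h = [
    Interval("chr1", 10_200, 10_400),
    Interval("chr3", 79_950, 80_050),  # also present at 0 h, so treated as background
    Interval("chr2", 53_000, 53_200),
]
induced = remove_background(peaks_3h, peaks_0h)
hits_1kb = genes_with_promoter_peaks(transcripts, induced, 1_000)
hits_5kb = genes_with_promoter_peaks(transcripts, induced, 5_000)
genes = {t.gene for t in transcripts}
print(sorted(hits_1kb), sorted(hits_5kb))  # ['gene_a'] ['gene_a', 'gene_b']
print(round(percent_with_peaks(genes, hits_5kb), 1))  # 66.7
```

The all-pairs overlap test is quadratic and only suits toy inputs; genome-scale data would go through bedtools (or an interval tree), as in the Methods.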

Signal files (bigWig files) were generated from the GATA1 ChIP-seq alignment (BAM) files and normalised to counts per million (CPM) using deepTools [4]. For the 24 h time point, the two replicate files were first merged and then normalised. These files were visualised using the Integrative Genomics Viewer (IGV).
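CPM scaling, and the merge-then-normalise treatment of the 24 h replicates, amount to the arithmetic below. This is a minimal sketch of the normalisation only, not of deepTools itself: it assumes per-bin read counts have already been computed over identical bins for each replicate, and that merging BAM files simply pools reads, so merged counts and library sizes are sums.

```python
from typing import List, Sequence


def cpm(bin_counts: Sequence[int], total_mapped_reads: int) -> List[float]:
    """Counts per million: scale per-bin read counts by 1e6 / total mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1e6 / total_mapped_reads
    return [count * scale for count in bin_counts]


def merge_then_cpm(replicates: Sequence[Sequence[int]], totals: Sequence[int]) -> List[float]:
    """Pool replicates (sum per-bin counts and library sizes), then normalise once.

    Assumes all replicates were binned identically; zip() would silently
    truncate to the shortest replicate otherwise.
    """
    merged = [sum(bin_values) for bin_values in zip(*replicates)]
    return cpm(merged, sum(totals))


# Two hypothetical replicates binned over the same three genomic bins.
rep1, rep2 = [10, 0, 30], [20, 10, 10]
profile = merge_then_cpm([rep1, rep2], [2_000_000, 3_000_000])
print([round(v, 6) for v in profile])  # [6.0, 2.0, 8.0]
```

Pooling before normalising weights each replicate by its sequencing depth, so the result generally differs from averaging two separately normalised CPM tracks (here that average would be about [5.83, 1.67, 9.17]).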

Histone mark analysis

For G1E-ER4s, signal files were generated from the H3K4me3 and H3K27me3 BAM files as described above. A peak file was generated containing only those peaks from the 24 h GATA1 ChIP-seq time point that fell within promoter regions. An enrichment profile around these peaks was then generated using deepTools.
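The enrichment profile is essentially an average of binned signal in a fixed window around each peak. The sketch below is a simplified, pure-Python analogue of a reference-point profile, not a reimplementation of deepTools: it assumes a per-base signal track for each chromosome, uses peak centres as reference points, and skips peaks whose window runs off the chromosome end (deepTools has its own options for such edge cases and for missing data). The flank and bin sizes are illustrative parameters, not the values used in the analysis.

```python
from typing import Dict, List, Sequence, Tuple


def average_profile(
    signal: Dict[str, Sequence[float]],
    centers: Sequence[Tuple[str, int]],
    flank: int,
    bin_size: int,
) -> List[float]:
    """Mean signal in fixed-width bins from center - flank to center + flank,
    averaged over all peaks whose window lies fully inside the chromosome."""
    n_bins = (2 * flank) // bin_size
    totals = [0.0] * n_bins
    used = 0
    for chrom, center in centers:
        track = signal[chrom]
        start = center - flank
        if start < 0 or center + flank > len(track):
            continue  # window runs off the chromosome end; skip this peak
        for b in range(n_bins):
            lo = start + b * bin_size
            totals[b] += sum(track[lo:lo + bin_size]) / bin_size
        used += 1
    return [t / used for t in totals] if used else totals


# Toy per-base signal: a single 2 bp block of signal at positions 9-10 of chr1.
track = [0.0] * 20
track[9] = track[10] = 8.0
profile = average_profile({"chr1": track}, [("chr1", 10), ("chr1", 2)], flank=4, bin_size=2)
print(profile)  # [0.0, 4.0, 4.0, 0.0]
```

The second reference point (position 2) is dropped because its window would start before the chromosome, so the average is taken over one peak only.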

For histone mark localisation analysis in K562s, all TF ChIP-seq peak files available for K562s were downloaded from ENCODE (accession numbers are listed in Supplementary Table 1) and overlapped with either H3K4me3 or H3K27me3 ChIP-seq peaks. Only peaks in the proximal promoter (±1 kb of the TSS) were considered.
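Each percentage in Supplementary Table 1 is the fraction of a factor's promoter peaks that overlap a histone-mark peak. A minimal sketch of that calculation follows, assuming BED-style half-open intervals, a one-base overlap criterion, and toy coordinates; it is not the script used to build the table.

```python
from collections import namedtuple
from typing import List, Sequence, Tuple

Interval = namedtuple("Interval", "chrom start end")  # BED-style, 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoter_peaks(
    peaks: Sequence[Interval], tss_list: Sequence[Tuple[str, int]], flank: int = 1_000
) -> List[Interval]:
    """Keep peaks overlapping +/- flank bp of any TSS given as (chrom, position)."""
    windows = [Interval(c, max(0, pos - flank), pos + flank + 1) for c, pos in tss_list]
    return [pk for pk in peaks if any(overlaps(pk, w) for w in windows)]


def percent_overlap(peaks: Sequence[Interval], mark_peaks: Sequence[Interval]) -> float:
    """Percentage of peaks overlapping at least one histone-mark peak."""
    if not peaks:
        raise ValueError("no promoter peaks to evaluate")
    hit = sum(1 for pk in peaks if any(overlaps(pk, m) for m in mark_peaks))
    return 100.0 * hit / len(peaks)


# Toy coordinates for illustration only.
tss_list = [("chr1", 5_000), ("chr2", 9_000)]
tf_peaks = [
    Interval("chr1", 4_800, 5_100),
    Interval("chr1", 20_000, 20_200),  # distal, excluded by the promoter filter
    Interval("chr2", 9_500, 9_700),
]
k4me3 = [Interval("chr1", 4_500, 5_500), Interval("chr2", 8_900, 9_600)]
k27me3 = [Interval("chr2", 9_650, 12_000)]
prom = promoter_peaks(tf_peaks, tss_list)
print(percent_overlap(prom, k4me3), percent_overlap(prom, k27me3))  # 100.0 50.0
```

Because each mark is tested independently, a peak can count towards both columns, so the two percentages need not sum to 100% (as in the EZH2 row of Supplementary Table 1).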

Results

Supplementary Figure 1. Repressed genes lose the promoter GATA1 occupancy detected 3 h after induction more profoundly and more rapidly than activated genes. The promoters (±1 kb of the TSS) of genes differentially expressed (p.adj < 0.1) at (A) 7 h, (B) 14 h, (C) 24 h and (D) 30 h after induction of G1E-ER4s with β-estradiol were assessed for GATA1 ChIP-seq peaks 3 h after β-estradiol induction. Promoters with peaks were then tracked across later ChIP-seq time points for continued GATA1 occupancy. Peak counts at subsequent time points were normalised to the number of peaks at the 3 h ChIP time point (set to 1.0). Background peaks present in the uninduced (0 h) ChIP-seq were removed from all induced time points.
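The normalisation in Supplementary Figure 1 (3 h peaks tracked forward, with the 3 h count set to 1.0 and no new peaks considered) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a 3 h peak is counted as still present at a later time point if it overlaps any peak called at that time point, time points are keyed by hours, and the coordinates are toy values. The calculation would be run separately for the activated and repressed gene sets.

```python
from collections import namedtuple
from typing import Dict, List

Interval = namedtuple("Interval", "chrom start end")  # BED-style, 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def retention(peaks_by_time: Dict[int, List[Interval]], reference_time: int = 3) -> Dict[int, float]:
    """Fraction of reference-time peaks still detected at each later time point.

    The reference time point is set to 1.0. Peaks that appear only later are
    ignored, because only the reference peaks are looked up at each time point.
    """
    ref = peaks_by_time[reference_time]
    if not ref:
        raise ValueError("no peaks at the reference time point")
    result: Dict[int, float] = {}
    for t in sorted(peaks_by_time):
        if t < reference_time:
            continue
        kept = sum(1 for p in ref if any(overlaps(p, q) for q in peaks_by_time[t]))
        result[t] = kept / len(ref)
    return result


# Toy promoter peaks for one gene class (e.g. repressed genes); hours as keys.
p = [Interval("chr1", i * 1_000, i * 1_000 + 200) for i in range(4)]
new_peak = Interval("chr5", 100, 300)  # appears at 14 h only; not counted
series = {3: p, 7: p[:3], 14: p[:2] + [new_peak], 24: p[:1]}
print(retention(series))  # {3: 1.0, 7: 0.75, 14: 0.5, 24: 0.25}
```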

Supplementary Figure 2. A higher proportion of activated genes than repressed genes have GATA1 peaks. Promoter regions (±1 kb and ±5 kb of the TSS) of all differentially expressed genes (p.adj < 0.1) at (A) 7 h, (B) 14 h, (C) 24 h and (D) 30 h after induction of G1E-ER4s with β-estradiol were checked for GATA1 ChIP-seq peaks across all induction time points. The y-axis shows the percentage of all activated or repressed genes with peaks in those regions. Peaks present in the uninduced (0 h) ChIP-seq were removed from all time points.

Supplementary Figure 3. An active (H3K4me3, green) but not a repressive (H3K27me3, blue) histone mark is enriched around GATA1 ChIP-seq peaks at both activated and repressed genes. Average enrichment of H3K4me3 (ENCFF035MWY and ENCFF750EGB) and H3K27me3 (ENCFF254JHP and ENCFF211VNX) around GATA1 ChIP-seq peaks in the promoters of genes that are (A) activated or (B) repressed after 24 h of induction with β-estradiol. Signal is normalised to reads per million.

Supplementary Table 1. Overlap of TF ChIP-seq peaks in promoters (within 1 kb of the TSS) with active (H3K4me3) or repressive (H3K27me3) histone marks

ENCODE accession | Target | Overlap with H3K4me3 (ENCFF314GKM) | Overlap with H3K27me3 (ENCFF145UOC)
ENCFF002CEK | CHD1 | 99% | 10%
ENCFF002CEL | CTCF | 81% | 17%
ENCFF002CEM | EZH2 | 54% | 77%
ENCFF002CEN | HDAC1 | 98% | 11%
ENCFF002CEO | HDAC2 | 95% | 13%
ENCFF002CEP | HDAC6 | 91% | 21%
ENCFF002CEQ | EP300 | 94% | 9%
ENCFF002CER | PHF8 | 98% | 11%
ENCFF002CES | KDM5B | 99% | 9%
ENCFF002CEU | RBBP5 | 99% | 9%
ENCFF002CEV | SAP30 | 99% | 10%
ENCFF002CLN | ATF3 | 96% | 9%
ENCFF002CLP | BCLAF1 | 99% | 7%
ENCFF002CLQ | CBX3 | 96% | 9%
ENCFF002CLR | CEBPB | 92% | 10%
ENCFF002CLS | CTCF | 82% | 18%
ENCFF002CLT | CTCFL | 93% | 13%
ENCFF002CLU | E2F6 | 94% | 14%
ENCFF002CLV | EGR1 | 91% | 15%
ENCFF002CLW | ELF1 | 96% | 11%
ENCFF002CLX | ETS1 | 99% | 10%
ENCFF002CLY | FOSL1 | 89% | 12%
ENCFF002CLZ | GABPA | 97% | 10%
ENCFF002CMA | GATA2 | 89% | 9%
ENCFF002CMB | HDAC2 | 95% | 8%
ENCFF002CMC | MAX | 94% | 13%
ENCFF002CMD | MEF2A | 96% | 9%
ENCFF002CME | NR2F2 | 95% | 9%
ENCFF002CMF | REST | 92% | 11%
ENCFF002CMG | PML | 99% | 8%
ENCFF002CMH | POLR2AphosphoS5 | 98% | 8%
ENCFF002CMI | POLR2A | 98% | 9%
ENCFF002CMJ | SPI1 | 90% | 13%
ENCFF002CMK | RAD21 | 81% | 17%
ENCFF002CML | SIN3A | 98% | 10%
ENCFF002CMM | SIX5 | 98% | 9%
ENCFF002CMN | SP1 | 98% | 10%
ENCFF002CMO | SP2 | 98% | 10%
ENCFF002CMP | SRF | 98% | 9%
ENCFF002CMQ | STAT5A | 96% | 7%
ENCFF002CMR | TAF1 | 99% | 9%
ENCFF002CMS | TAF7 | 99% | 9%
ENCFF002CMT | TEAD4 | 94% | 9%
ENCFF002CMU | THAP1 | 99% | 9%
ENCFF002CMV | USF1 | 92% | 11%
ENCFF002CMW | YY1 | 98% | 10%
ENCFF002CMX | YY1 | 97% | 10%
ENCFF002CMY | ZBTB33 | 95% | 9%
ENCFF002CMZ | ZBTB7A | 95% | 13%
ENCFF002CVL | ARID3A | 95% | 5%
ENCFF002CVM | ATF1 | 95% | 10%
ENCFF002CVN | ATF3 | 99% | 8%
ENCFF002CVO | BACH1 | 94% | 9%
ENCFF002CVP | BDP1 | 90% | 7%
ENCFF002CVQ | BHLHE40 | 96% | 11%
ENCFF002CVR | BRF1 | 100% | 8%
ENCFF002CVS | BRF2 | 86% | 7%
ENCFF002CVT | SMARCA4 | 99% | 6%
ENCFF002CVU | CCNT2 | 98% | 11%
ENCFF002CVV | CEBPB | 87% | 11%
ENCFF002CVW | FOS | 95% | 10%
ENCFF002CVX | CHD2 | 98% | 9%
ENCFF002CWC | JUN | 94% | 9%
ENCFF002CWH | MYC | 98% | 10%
ENCFF002CWI | MYC | 99% | 8%
ENCFF002CWJ | RCOR1 | 95% | 9%
ENCFF002CWK | RCOR1 | 92% | 11%
ENCFF002CWL | CTCF | 83% | 16%
ENCFF002CWM | | 99% | 9%
ENCFF002CWN | E2F6 | 97% | 12%
ENCFF002CWO | ELK1 | 100% | 9%
ENCFF002CWP | GATA1 | 96% | 7%
ENCFF002CWQ | GATA2 | 92% | 9%
ENCFF002CWR | GTF2B | 100% | 7%
ENCFF002CWS | GTF2F1 | 99% | 7%
ENCFF002CWT | HMGN3 | 98% | 11%
ENCFF002CWU | SMARCB1 | 97% | 9%
ENCFF002CWZ | JUND | 93% | 10%
ENCFF002CXB | MAFF | 85% | 11%
ENCFF002CXC | MAFK | 85% | 10%
ENCFF002CXD | MAX | 96% | 11%
ENCFF002CXE | MAZ | 95% | 12%
ENCFF002CXF | MXI1 | 98% | 10%
ENCFF002CXG | NELFE | 100% | 9%
ENCFF002CXH | NFE2 | 96% | 8%
ENCFF002CXI | NFYA | 97% | 10%
ENCFF002CXJ | NFYB | 92% | 12%
ENCFF002CXK | NRF1 | 99% | 8%
ENCFF002CXL | EP300 | 95% | 8%
ENCFF002CXQ | POLR2A | 98% | 8%
ENCFF002CXR | POLR2A | 99% | 9%
ENCFF002CXS | POLR2AphosphoS2 | 97% | 8%
ENCFF002CXT | POLR3G | 100% | 11%
ENCFF002CXU | RAD21 | 68% | 17%
ENCFF002CXV | RFX5 | 95% | 9%
ENCFF002CXW | POLR3A | 89% | 8%
ENCFF002CXX | SETDB1 | 92% | 6%
ENCFF002CXY | SETDB1 | 97% | 9%
ENCFF002CXZ | SIRT6 | 98% | 6%
ENCFF002CYA | SMC3 | 81% | 17%
ENCFF002CYH | TAL1 | 90% | 9%
ENCFF002CYI | TBL1XR1 | 99% | 8%
ENCFF002CYJ | TBL1XR1 | 98% | 7%
ENCFF002CYK | TBP | 99% | 9%
ENCFF002CYL | GTF3C2 | 98% | 7%
ENCFF002CYM | NR2C2 | 99% | 7%
ENCFF002CYN | UBTF | 98% | 11%
ENCFF002CYO | UBTF | 97% | 12%
ENCFF002CYP | USF2 | 98% | 9%
ENCFF002CYQ | YY1 | 99% | 8%
ENCFF002CYR | ZNF143 | 91% | 12%
ENCFF002CYS | ZNF263 | 91% | 16%
ENCFF002CYT | ZNF274 | 0% | 0%
ENCFF002CYU | ZNF274 | 90% | 9%
ENCFF002DBC | MYC | 99% | 9%
ENCFF002DBD | CTCF | 80% | 18%
ENCFF002DBE | POLR2A | 99% | 9%
ENCFF002DDJ | CTCF | 81% | 17%
ENCFF002DDU | POLR2AphosphoS2 | 80% | 6%
ENCFF002DDV | eGFP-FOS | 82% | 12%
ENCFF002DDW | eGFP-GATA2 | 91% | 8%
ENCFF002DDX | eGFP-HDAC8 | 96% | 8%
ENCFF002DDY | eGFP-JUNB | 91% | 10%
ENCFF002DDZ | eGFP-JUND | 93% | 10%
ENCFF003ZTL | LEF1 | 97% | 7%
ENCFF006PKD | eGFP-PTTG1 | 87% | 8%
ENCFF008RCX | eGFP-FOXJ2 | 96% | 8%
ENCFF017HXV | eGFP-MAFG | 75% | 13%
ENCFF026WOK | ETV6 | 96% | 8%
ENCFF027LWL | eGFP-GTF2E2 | 99% | 8%
ENCFF029TIT | RUNX1 | 98% | 10%
ENCFF030HWZ | FLAG-ATF1 | 97% | 11%
ENCFF050XKT | MIER1 | 91% | 14%
ENCFF058XGG | eGFP-PYGO2 | 94% | 9%
ENCFF066VZJ | ARNT | 99% | 8%
ENCFF083MJM | DPF2 | 94% | 8%
ENCFF087ZGG | RNF2 | 98% | 10%
ENCFF088OVM | SUZ12 | 98% | 10%
ENCFF096MVD | DEAF1 | 99% | 8%
ENCFF098MPD | eGFP-ELF1 | 98% | 10%
ENCFF115MIG | PKNOX1 | 91% | 11%
ENCFF138OPO | FOXK2 | 97% | 8%
ENCFF141LQN | FOXK2 | 97% | 9%
ENCFF144NEA | eGFP-ZNF584 | 94% | 9%
ENCFF150FFT | HMBOX1 | 93% | 7%
ENCFF152RVC | TFDP1 | 98% | 9%
ENCFF185QPE | HDAC1 | 96% | 10%
ENCFF187WOY | MTA2 | 97% | 9%
ENCFF191LHZ | eGFP-HDAC8 | 98% | 10%
ENCFF191QSX | SP1 | 98% | 10%
ENCFF205JDZ | eGFP-ZKSCAN8 | 92% | 8%
ENCFF223QFO | JUNB | 95% | 10%
ENCFF228PLL | ZBED1 | 99% | 7%
ENCFF242XCO | eGFP-USF2 | 94% | 10%
ENCFF245EGO | eGFP-DIDO1 | 99% | 9%
ENCFF251YTK | eGFP-HINFP | 95% | 11%
ENCFF256CGM | eGFP-ID3 | 97% | 10%
ENCFF262XNJ | eGFP-ELK1 | 98% | 10%
ENCFF269EMM | FLAG-PBX2 | 95% | 11%
ENCFF269TDL | KDM4B | 99% | 8%
ENCFF269XTC | RNF2 | 96% | 11%
ENCFF275TCO | NR2F1 | 92% | 10%
ENCFF279SKQ | eGFP-ADNP | 90% | 11%
ENCFF284BAX | IRF2 | 98% | 9%
ENCFF287OTO | eGFP-ZNF740 | 96% | 11%
ENCFF300ITN | CBX1 | 98% | 10%
ENCFF321MEV | IKZF1 | 90% | 11%
ENCFF348DPQ | eGFP-ZNF589 | 95% | 9%
ENCFF350MYQ | eGFP-TSC22D4 | 67% | 28%
ENCFF354OCA | HDGF | 96% | 9%
ENCFF371SXN | HES1 | 96% | 10%
ENCFF373UWQ | ZNF318 | 96% | 8%
ENCFF379DQD | ZNF24 | 96% | 9%
ENCFF386WDF | MYNN | 95% | 10%
ENCFF386XKI | eGFP-ILK | 97% | 4%
ENCFF401XIR | eGFP-BACH1 | 92% | 7%
ENCFF406HKZ | RUNX1 | 99% | 7%
ENCFF414QTG | NFRKB | 99% | 7%
ENCFF424QDH | CBX5 | 96% | 9%
ENCFF454OVP | NRF1 | 93% | 13%
ENCFF468PSZ | L3MBTL2 | 94% | 12%
ENCFF469COY | ZKSCAN1 | 96% | 9%
ENCFF480XEU | eGFP-DDX20 | 90% | 7%
ENCFF484LPR | eGFP-IRF9 | 92% | 8%
ENCFF484MTT | eGFP-NFE2L1 | 75% | 12%
ENCFF493VJJ | RAD51 | 93% | 11%
ENCFF514BVL | eGFP-KLF13 | 98% | 9%
ENCFF520ZGU | eGFP-TFDP1 | 98% | 10%
ENCFF559BMJ | SIN3A | 99% | 9%
ENCFF572OBM | eGFP-RELA | 97% | 8%
ENCFF576OQS | eGFP-ZFX | 98% | 10%
ENCFF585RPI | eGFP-CUX1 | 97% | 7%
ENCFF591BIT | TRIM28 | 85% | 8%
ENCFF592KOB | eGFP-NR2C2 | 96% | 10%
ENCFF601MGP | TCF7 | 98% | 9%
ENCFF616KSA | NFE2 | 90% | 10%
ENCFF616LFC | MITF | 98% | 7%
ENCFF625IZQ | eGFP-CEBPB | 83% | 12%
ENCFF629BDQ | eGFP-ZNF83 | 96% | 8%
ENCFF630QZR | MLLT1 | 98% | 7%
ENCFF639KYI | eGFP-ATF1 | 95% | 11%
ENCFF644WTS | MLLT1 | 97% | 9%
ENCFF653DKS | NBN | 98% | 8%
ENCFF654RTP | RFX1 | 85% | 14%
ENCFF657OBR | eGFP-KLF1 | 98% | 9%
ENCFF657YIC | NRF1 | 94% | 13%
ENCFF664FFU | NRF1 | 95% | 12%
ENCFF672PKR | eGFP-ZBTB11 | 98% | 9%
ENCFF674CGC | eGFP-FOSL1 | 90% | 10%
ENCFF676RKB | eGFP-CEBPG | 83% | 13%
ENCFF680ONH | CEBPZ | 99% | 7%
ENCFF715PKK | eGFP-ZNF175 | 95% | 8%
ENCFF731KHF | DDX20 | 98% | 8%
ENCFF733UCH | SMAD2 | 97% | 14%
ENCFF740BMT | BMI1 | 95% | 13%
ENCFF757YKS | eGFP-CREB3 | 96% | 9%
ENCFF760JPV | TARDBP | 98% | 9%
ENCFF760RIB | ZEB2 | 95% | 10%
ENCFF769ZCV | ZEB2 | 95% | 10%
ENCFF774VNG | eGFP-ZNF644 | 96% | 9%
ENCFF793UHR | CREM | 95% | 11%
ENCFF794EJN | eGFP-TEAD2 | 91% | 9%
ENCFF794RZT | eGFP-ZNF639 | 98% | 9%
ENCFF797YEO | eGFP-TAF7 | 99% | 8%
ENCFF800IBF | TAL1 | 88% | 9%
ENCFF803BGX | SMARCA4 | 94% | 8%
ENCFF815MOM | eGFP-ETV1 | 96% | 10%
ENCFF817DXG | ZBTB33 | 65% | 12%
ENCFF819AMI | eGFP-ZNF197 | 97% | 8%
ENCFF822NPI | NCOA1 | 97% | 9%
ENCFF835ALK | eGFP-ZNF24 | 93% | 12%
ENCFF851DBO | SMAD5 | 99% | 9%
ENCFF852AAZ | eGFP- | 97% | 9%
ENCFF858QMI | TRIM28 | 98% | 7%
ENCFF864VDB | eGFP-ZNF766 | 95% | 10%
ENCFF872EAZ | SREBF1 | 99% | 10%
ENCFF873EGZ | eGFP-GABPA | 96% | 10%
ENCFF876YGA | TARDBP | 99% | 8%
ENCFF912CHJ | eGFP-IRF1 | 90% | 14%
ENCFF922AJZ | MTA2 | 95% | 9%
ENCFF932RXH | ESRRA | 96% | 10%
ENCFF983HOG | eGFP-PTRF | 94% | 7%

References

[1] Y. Liao, G. K. Smyth, W. Shi, Bioinformatics 2014, 30, 923.
[2] M. I. Love, W. Huber, S. Anders, Genome Biol. 2014, 15, 550.
[3] A. R. Quinlan, I. M. Hall, Bioinformatics 2010, 26, 841.
[4] F. Ramírez, F. Dündar, S. Diehl, B. A. Grüning, T. Manke, Nucleic Acids Res. 2014, 42, W187.