
Jakobsen et al. - Supplemental Material

Table of contents

- Supplemental Data
- Supplemental Methods
- Supplemental References
- Supplemental Figure (S1-S7) and Table (S1-S10) legends
- Supplemental Figure S1-S7
- Supplemental Table S1, S2
- Supplemental Table S3-S10 available at http://genome.cshlp.org/

Supplemental Data

To investigate whether CEBPA and CEBPB occupy the same genomic positions simultaneously, we performed sequential ChIP, taking advantage of antibodies that we have comprehensively tested to be specific for either CEBPA or CEBPB (Supplemental Fig. S1 and Supplemental Methods). Our data show that in the quiescent (0 hr) condition, the two transcription factors bind simultaneously to identical regions belonging to both the A and C cluster binding patterns (Supplemental Fig. S6A).

This outcome was used as a platform for performing sequential ChIP with other candidate factors.

In Supplemental Fig. S6B, we display the enrichment ratios of a list of binding target regions using ‘single-round’ ChIP to establish bona fide targets of the panel of candidate CEBP interaction partners (HNF1, ONECUT1 (HNF6), MAFB, EGR1, ). Primer sequences and names can be found in Supplemental Table S10. Further analysis of our unfiltered CEBPA and CEBPB binding data supported binding of these factors to the same genomic sites by demonstrating very proximally located binding peak summits (Supplemental Fig. S7A and S7B). This was true for A as well as C cluster regions, and for both interrogated time points.


Supplemental Methods

Mouse strains

For the generation of livers deleted for either Cebpa or Cebpb, used to examine antibody specificity or EGR1 dependency on CEBP, we generated Cebpafl/fl; Mx1-Cre and Cebpbfl/fl; Mx1-Cre mice by crossing the Cebpafl (Lee et al. 1997) and Cebpbfl (Lopez et al. 2009) alleles, respectively, onto the Mx1-Cre deleter strain (Kuhn et al. 1995). Excision of Cebpa and Cebpb was achieved by subjecting 10-12-week-old Cebpafl/fl; Mx1-Cre and Cebpbfl/fl; Mx1-Cre mice to three injections of polyinosinic-polycytidylic acid (pIpC) as described previously (Weischenfeldt et al. 2008). Livers were harvested 3 weeks after deletion. Cebpb-deficient livers were a kind gift from Agnes Zay and Claus Nerlov, University of Edinburgh.

Western blots

Liver material was boiled for 5 min in SDS loading buffer, treated with Benzonase (Sigma E1014-5KU) for 20 min on ice and centrifuged cold for 20 min at 20,000 g. Antibodies were identical to those used for ChIP (loading control: anti-Histone H3, Ab10799, Abcam). NuPAGE precast 4-12% Bis-Tris gels (Invitrogen) were used for separation, and blotting was done following the Cell Signaling protocol (http://www.cellsignal.com/support/protocols/Western.html). ImageJ was used for quantification, following the program guidelines (http://rsb.info.nih.gov/ij/).

Sequencing and mapping

Sequencing was carried out on the Illumina HiSeq (EGR1) and Illumina Genome Analyzer IIx (all other samples) platforms at the EMBL-Heidelberg Genomics Core facility, BGI-Shenzhen, Hong Kong, or the Danish National High-throughput DNA Sequencing Centre, and reads were mapped to the NCBI37/mm9 (Mus musculus) genome assembly using Bowtie v. 0.12.7 (Langmead et al. 2009) with standard settings. An overview of sequencing and mapping statistics is presented in Supplemental Table S1 and Supplemental Fig. S2. External datasets (FOXA2 and ONECUT1 (HNF6)) were downloaded as raw FASTQ sequence files from the NCBI short read archive (http://www.ncbi.nlm.nih.gov/sra/) (Hoffman et al. 2010; Laudadio et al. 2012) and processed in the same way as data generated in-house. E2F3, and ChIP-seq peak positions were retrieved directly from Chen et al. (2008) and remapped to mm9 using the UCSC remapping tool (http://genome.ucsc.edu/).

Normalization and visualization

For each CEBPA, CEBPB and RNA Polymerase II (POL2) sample, genome-wide read coverage was normalized to sequencing depth (number of mapped reads), grouped by antibody. Normalized coverage tracks were extended by 200 bp (sonication fragment length), converted to BigWig coverage tracks and uploaded to the UCSC Genome Browser (Kent et al. 2002) for visualization.
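As an illustration of the depth normalization described above, the following is a minimal sketch and not the pipeline actually used. It assumes that each sample is scaled to the mean mapped read count of its antibody group; the function names and the choice of the group mean as target depth are assumptions. The two read counts are taken from Supplemental Table S1.

```python
from statistics import mean

def depth_scale_factors(mapped_reads, antibody_of):
    """Per-sample factors that bring every sample to the mean depth of its antibody group.

    Assumption: the target depth is the group mean; the source only states that
    coverage was normalized to the number of mapped reads, grouped by antibody.
    """
    groups = {}
    for sample, n_reads in mapped_reads.items():
        groups.setdefault(antibody_of[sample], []).append(n_reads)
    target = {antibody: mean(counts) for antibody, counts in groups.items()}
    return {s: target[antibody_of[s]] / n for s, n in mapped_reads.items()}

def normalize_coverage(coverage, factor):
    """Scale a per-base coverage vector by a depth factor."""
    return [c * factor for c in coverage]

# Mapped read counts from Supplemental Table S1 (two samples only, for brevity).
mapped = {"CEBPA 0h": 8545044, "CEBPA 3h": 12030701}
antibody = {"CEBPA 0h": "CEBPA", "CEBPA 3h": "CEBPA"}
factors = depth_scale_factors(mapped, antibody)
scaled = normalize_coverage([0, 10, 25, 10, 0], factors["CEBPA 0h"])
```

With this convention, shallowly sequenced samples are scaled up and deeply sequenced samples are scaled down, so that coverage values are comparable within one antibody series.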

Peak calling, filtering and gene assignment

For peak calling, we used USeq (ScanSeqs settings: window 177 bp, peak shift 131 bp; EnrichedRegionMaker settings: FDR = 1% and log2 ratio 1.3) (Nix et al. 2008) for samples with strong signal and low background (CEBPA, CEBPB) and MACS (standard settings) (Zhang et al. 2008) for samples with a higher noise-to-signal ratio (EGR1 and FOXA2). For all runs, a pooled time point IgG mock ChIP sample was used as control. All peaks were filtered for repeat content and amplification artifacts by removing peaks with a RepeatMasker (www.repeatmasker.org) track overlap of > 0.2, and a peak length of > 1200 bp. CEBPA and CEBPB peaks were further filtered as described below. For an overview of peak calling statistics, refer to Supplemental Table S2.

Genes from a repository of RefSeq gene models (Pruitt et al. 2005), using only the longest model for each unique gene symbol, were assigned to peaks by overlapping gene regions, including 3000 bp upstream and 1000 bp downstream of the gene body, or by selecting the nearest TSS, in that order of priority. Peaks with ambiguous gene assignments were removed from further gene-centric analyses. See Supplemental Table S4 for peak/gene assignment. GO terms (Ashburner et al. 2000) were obtained from the Gene Ontology website (http://www.geneontology.org).
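The filtering and gene assignment rules above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: a peak is treated as removed if it fails either the repeat or the length criterion, a peak overlapping more than one extended gene region is treated as ambiguous, and the peak midpoint stands in for the summit in the nearest-TSS step. The gene names and coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    symbol: str
    chrom: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"

    @property
    def tss(self):
        return self.start if self.strand == "+" else self.end

def passes_filters(peak_len, repeat_fraction, max_repeat=0.2, max_len=1200):
    """Keep a peak only if repeat overlap <= 0.2 and length <= 1200 bp (assumed reading)."""
    return repeat_fraction <= max_repeat and peak_len <= max_len

def extended_region(gene, upstream=3000, downstream=1000):
    """Gene body plus 3000 bp upstream and 1000 bp downstream, strand-aware."""
    if gene.strand == "+":
        return gene.start - upstream, gene.end + downstream
    return gene.start - downstream, gene.end + upstream

def assign_gene(chrom, peak_start, peak_end, genes):
    """Return a gene symbol, or None if the assignment is ambiguous or impossible."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    hits = []
    for g in same_chrom:
        lo, hi = extended_region(g)
        if peak_start <= hi and peak_end >= lo:  # closed-interval overlap
            hits.append(g)
    if len(hits) == 1:
        return hits[0].symbol
    if len(hits) > 1 or not same_chrom:
        return None  # assumption: several overlapping genes count as ambiguous
    midpoint = (peak_start + peak_end) // 2  # stand-in for the peak summit
    return min(same_chrom, key=lambda g: abs(g.tss - midpoint)).symbol

# Hypothetical gene models, for illustration only.
genes = [Gene("GeneA", "chr1", 10000, 20000, "+"),
         Gene("GeneB", "chr1", 50000, 60000, "-")]
```

The priority order is explicit in `assign_gene`: overlap with an extended gene region is tried first, and the nearest TSS is used only when no region overlaps.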

POL2 coverage processing

After normalization to the mapped read count of each of the eight samples, the longest RefSeq gene model of each gene was assigned a POL2 gene body coverage, with the gene body defined as the region from 500 bases downstream of the TSS to the end of the gene. This value was used as a measure of gene activity when comparing individual genes across the time course. See Supplemental Table S4 for assigned CEBP targets and Supplemental Table S7 for all listed genes.
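The gene body definition can be made concrete with a short sketch. It assumes that coverage is summed over the gene body (the text does not say whether a sum or a mean was used), that coordinates are 0-based and half-open, and that genes no longer than the 500-base offset receive no coverage value; all function names are illustrative.

```python
def gene_body_interval(start, end, strand, offset=500):
    """Half-open interval from TSS + offset to the gene end, or None for short genes."""
    if end - start <= offset:
        return None
    if strand == "+":
        return start + offset, end
    # On the minus strand the TSS is at the rightmost coordinate.
    return start, end - offset

def body_coverage(coverage, start, end, strand, offset=500):
    """Sum normalized per-base coverage over the gene body (assumed summary statistic)."""
    interval = gene_body_interval(start, end, strand, offset)
    if interval is None:
        return 0.0
    lo, hi = interval
    return float(sum(coverage[lo:hi]))

# Toy chromosome with uniform coverage of 1.0 per base.
toy_coverage = [1.0] * 2000
plus_value = body_coverage(toy_coverage, 0, 2000, "+")
minus_value = body_coverage(toy_coverage, 0, 2000, "-")
```

Excluding the first 500 bases keeps promoter-proximal POL2 signal out of the gene activity measure.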

Gene ontology analysis

Genes assigned to each cluster (A, B or C) were binned in an UP or DOWN group when exceeding a cut-off of 2-fold change in gene activity (POL2 gene body counts), up or down, comparing each time point to 0 hrs. Color scaling was done for each cluster separately. For functional gene ontology (GO) analysis, genes were filtered for differential expression with a cut-off of 1.55-fold change up or down for the time points 0, 3, 24 and 36 hrs. Gene activity changes for the time point pairs 0-3, 3-24 and 24-36 hrs were examined. For all GO analyses, the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang da et al. 2009) (http://david.abcc.ncifcrf.gov/) was used. Full analysis output is listed in Supplemental Table S5.
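The fold-change binning can be illustrated with a small sketch. The pseudocount that guards against division by zero is an assumption and is not mentioned in the text, and the function name and example counts are hypothetical. The cut-off is 2 for the UP/DOWN grouping and 1.55 for the GO analysis.

```python
def classify_change(cov_later, cov_earlier, cutoff=2.0, pseudocount=1.0):
    """Return 'UP', 'DOWN' or 'UNCHANGED' for a pair of POL2 gene body counts.

    Assumption: a pseudocount is added to both values before taking the ratio.
    """
    ratio = (cov_later + pseudocount) / (cov_earlier + pseudocount)
    if ratio > cutoff:
        return "UP"
    if ratio < 1.0 / cutoff:
        return "DOWN"
    return "UNCHANGED"

# Hypothetical counts: compare a later time point to 0 hrs with the 2-fold cut-off,
# then a consecutive time point pair with the 1.55-fold cut-off used for GO analysis.
group_vs_0h = classify_change(300, 99)
group_for_go = classify_change(159, 99, cutoff=1.55)
```

Using a symmetric cut-off on the ratio means that a gene is called DOWN only when its activity falls below 1/cut-off of the reference value.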


Motif discovery

To achieve a comprehensive sequence analysis, we condensed data from three large public databases (see below). To assess PWM motif representation in our three clusters, we employed ASAP (Marstrand et al. 2008) (http://asap.binf.ku.dk/Asap/). For each cluster, sequences were retrieved from the UCSC sequence repository in windows of +/- 100 bp centered on peak summits and tested for motif occurrences (using condensed PWMs, see below) against a background set of 50,000 randomly sampled 200 bp sequences drawn with the same frequencies of genomic locations (intergenic, exonic, intronic, promoter). Z-values were used for hierarchical clustering using MeV on all consensus PWMs with a Z-score above 10 in any cluster and at least 400 hit instances in total (A, B and C). All motif discovery data are available in Supplemental Table S6. EGR1 and CEBP genomic hit UCSC tracks were generated using the statistical software R (http://www.r-project.org/) and condensed PWMs for the two factors; see Supplemental Fig. S5.

Position Weight Matrix (PWM) consensus database construction

For constructing the TF binding sequence consensus database, we downloaded PWMs from three repositories: 130 PWMs from JASPAR 2008v (Portales-Casamar et al. 2010) (http://jaspar.genereg.net/), 423 PWMs from TRANSFAC v.10.4 (Matys et al. 2006; Bryne et al. 2008) (http://www.gene-regulation.com/pub/databases.html), and 386 PWMs from UniPROBE (Newburger and Bulyk 2009) (http://the_brain.bwh.harvard.edu/uniprobe/), resulting in 939 PWMs. Duplicates were removed, models were trimmed at both ends for low information content (IC < 0.5 bit) and models with too low total IC (< 8 bits) were removed, resulting in 832 PWMs. PWMs were scored against each other using the Pearson correlation coefficient and clustered by hierarchical clustering, using the statistical software R. By maximizing the ratio between inter-cluster and intra-cluster variance, a hierarchical tree threshold was chosen, producing 500 clusters of PWMs. PWMs within each cluster were aligned by pairwise Smith-Waterman local alignment and re-trimmed for low flanking IC (< 0.5 bit), and the resulting ‘condensed’ PWM was extracted.
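The information content criteria can be illustrated as follows. This sketch assumes a uniform background over A, C, G and T, so that the IC of a column is 2 + sum(p * log2 p) bits, and it computes the Pearson correlation only for two PWMs of equal length that are already aligned; how PWMs of different length were compared is not stated in the text. All function names and the toy PWM are hypothetical.

```python
import math

def column_ic(probs):
    """Information content (bits) of one PWM column, assuming a uniform background."""
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0)

def trim_low_ic(pwm, min_ic=0.5):
    """Remove flanking columns with IC below min_ic from both ends."""
    lo, hi = 0, len(pwm)
    while lo < hi and column_ic(pwm[lo]) < min_ic:
        lo += 1
    while hi > lo and column_ic(pwm[hi - 1]) < min_ic:
        hi -= 1
    return pwm[lo:hi]

def total_ic(pwm):
    """Summed IC over all columns; models below 8 bits were discarded in the text."""
    return sum(column_ic(col) for col in pwm)

def pearson(pwm_a, pwm_b):
    """Pearson correlation of two equal-length, pre-aligned PWMs (flattened)."""
    x = [p for col in pwm_a for p in col]
    y = [p for col in pwm_b for p in col]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy PWM: two uninformative flanking columns around two fully conserved positions.
uniform = [0.25, 0.25, 0.25, 0.25]
toy_pwm = [uniform, [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], uniform]
trimmed = trim_low_ic(toy_pwm)
```

A fully conserved position contributes 2 bits, so the 8-bit threshold corresponds to at least four such positions or a larger number of partially conserved ones.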

Peak overlaps and summit distances

To assess overlaps between peaks of CEBP A, B or C clusters and peaks of other transcription factors (EGR1, FOXA2, ONECUT1 (HNF6), KLF4, MYC, ), we required a minimum overlap of 1 nt. External data were acquired from: FOXA2 (Hoffman et al. 2010), ONECUT1 (HNF6) (Laudadio et al. 2012) and , KLF4, MYC (Chen et al. 2008). Overlap P-values of peaks from these data sets and CEBP A versus C regions were calculated using a one-tailed Fisher's exact test (hypergeometric test) in the statistics package R 2.15. Distances between CEBP C and A versus EGR1-36hrs peaks were calculated as summit-to-summit distances using R. The seqMINER software (Ye et al. 2011) was used for visualization of CEBP-peak proximal EGR1-36hrs binding, centering on CEBP A or C peak summits. Full lists of genes assigned to overlapping/non-overlapping sets of CEBP versus FOXA2 or EGR1 can be found in Supplemental Table S9.
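The overlap criterion, the one-tailed test and the summit distances can be sketched in a few lines. The original analysis was done in R; this Python version is only an illustration. The way the contingency table is set up (all A and C regions as the population, regions overlapping a factor as successes, C regions as the draws) is an assumption, and the counts in the example are hypothetical.

```python
from bisect import bisect_left
from math import comb

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least 1 nt."""
    return a_start <= b_end and b_start <= a_end

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    This equals the p-value of a one-tailed Fisher's exact test for enrichment.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

def nearest_summit_distance(summit, sorted_summits):
    """Distance from one summit to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_summits, summit)
    candidates = sorted_summits[max(i - 1, 0):i + 1]
    return min(abs(summit - s) for s in candidates)

# Hypothetical counts: 1000 regions in total, 200 of them overlapping a factor,
# 300 regions in one cluster, 90 of which overlap the factor.
p_value = hypergeom_upper_tail(90, 200, 300, 1000)
distance = nearest_summit_distance(105, [10, 100, 200])
```

Python integers are unbounded, so the binomial coefficients are computed exactly, and the final division yields a floating-point p-value.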

Transcription factor network

The 120 CEBP targeted genes with the highest expression (POL2 coverage) were extracted using the GO-term for biological process: ‘regulation of transcription, DNA-dependent’. Genes were displayed as nodes with node size relative to summed POL2 coverage during all eight time points (normalized for gene lengths), while the color intensity (blue) was used to denote the number of separate CEBP bound regions for each node/gene. See Supplemental Table S8 for full list and expression levels.


CEBP ChIP set validation

Basewise conservation scores using 59 vertebrate genomes versus mouse (phylo-P) (Hubisz et al. 2011) were downloaded from UCSC (http://genome.ucsc.edu/) for our 11,314 consensus regions and displayed using R, centering on the CEBP motif (Supplemental Fig. S4). For de novo motif discovery, we used MEME-ChIP (http://meme.nbcr.net/meme) v. 4.9.0 with standard parameters (Machanick and Bailey 2011), searching against the JASPAR and TRANSFAC motif databases with TOMTOM.


Supplemental References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1): 25-29.

Bryne JC, Valen E, Tang MH, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A. 2008. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res 36(suppl 1): D102-106.

Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J et al. 2008. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133(6): 1106-1117.

Hoffman BG, Robertson G, Zavaglia B, Beach M, Cullum R, Lee S, Soukhatcheva G, Li L, Wederell ED, Thiessen N et al. 2010. Locus co-occupancy, nucleosome positioning, and H3K4me1 regulate the functionality of FOXA2-, HNF4A-, and PDX1-bound loci in islets and liver. Genome Res 20(8): 1037-1051.

Huang da W, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1): 44-57.

Hubisz MJ, Pollard KS, Siepel A. 2011. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 12(1): 41-51.

Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The human genome browser at UCSC. Genome Res 12(6): 996-1006.

Kuhn R, Schwenk F, Aguet M, Rajewsky K. 1995. Inducible gene targeting in mice. Science 269(5229): 1427-1429.

Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3): R25.

Laudadio I, Manfroid I, Achouri Y, Schmidt D, Wilson MD, Cordi S, Thorrez L, Knoops L, Jacquemin P, Schuit F et al. 2012. A feedback loop between the liver-enriched transcription factor network and miR-122 controls hepatocyte differentiation. Gastroenterology 142(1): 119-129.

Lee YH, Sauer B, Johnson PF, Gonzalez FJ. 1997. Disruption of the c/ebp alpha gene in adult mouse liver. Mol Cell Biol 17(10): 6014-6022.

Lopez RG, Garcia-Silva S, Moore SJ, Bereshchenko O, Martinez-Cruz AB, Ermakova O, Kurz E, Paramio JM, Nerlov C. 2009. CEBPA and beta couple interfollicular keratinocyte proliferation arrest to commitment and terminal differentiation. Nat Cell Biol 11(10): 1181-1190.

Machanick P, Bailey TL. 2011. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27(12): 1696-1697.


Marstrand TT, Frellsen J, Moltke I, Thiim M, Valen E, Retelska D, Krogh A. 2008. Asap: a framework for over-representation statistics for transcription factor binding sites. PLoS One 3(2): e1623.

Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K et al. 2006. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34(suppl 1): D108-110.

Newburger DE, Bulyk ML. 2009. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res 37(suppl 1): D77-82.

Nix DA, Courdy SJ, Boucher KM. 2008. Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks. BMC Bioinformatics 9: 523.

Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. 2010. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res 38(suppl 1): D105-110.

Pruitt KD, Tatusova T, Maglott DR. 2005. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33(suppl 1): D501-504.

Weischenfeldt J, Damgaard I, Bryder D, Theilgaard-Monch K, Thoren LA, Nielsen FC, Jacobsen SE, Nerlov C, Porse BT. 2008. NMD is essential for hematopoietic stem and progenitor cells and for eliminating by-products of programmed DNA rearrangements. Genes Dev 22(10): 1381-1396.

Ye T, Krebs AR, Choukrallah MA, Keime C, Plewniak F, Davidson I, Tora L. 2011. seqMINER: an integrated ChIP-seq data interpretation platform. Nucleic Acids Res 39(6): e35.

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al. 2008. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9): R137.


Supplemental material legends

Figure S1. Test of CEBPA and CEBPB antibody specificity. Left hand panel shows chromatin immunoprecipitations with CEBPA or CEBPB antibodies, using either Cebpa or Cebpb KO mouse livers. Enrichment is shown as qPCR ratios of a positive (Pepck) over a negative (Mamstr) primer set. N=3, biological replicates. Right hand panel shows ChIP with the same antibodies and Mock IgG, as well as abrogation of pulldowns with epitope-specific blocking peptides (p61 and p150). N=3 or 4, biological replicates. Error bars show SEM. Primer sequences in Supplemental Table S10.

Figure S2. Overview of genomic positions of mapped reads for each time point and antibody used. Gene models were acquired from the RefSeq (Pruitt et al. 2005) repository, defining proximal 5’ as 3000 bp upstream and 3’ as 1000 bp downstream.

Figure S3. qPCR validation of CEBPA and CEBPB ChIP consistency. Shown are IP ratios of the four key time points (0, 3, 24, 36 hrs), comparing pull-down levels for each indicated region vs. a negative region (Mamstr-primer set). All ratios are normalized to Cmbl vs. Mamstr ratios as a standard (Cmbl ratio is constant throughout the time course). Two series of three individual ChIPs (independent biological replicates) are shown, blue designates original ChIPs used for sequencing, green represents a new partial hepatectomy experiment performed independently, to demonstrate consistency. N=3 each series, Error bars show SEM. Primer sequences in Supplemental Table S10.

Figure S4. Quality assessment of CEBPA and CEBPB ChIP-sequencing data sets. (A) Phylo-P conservation scores of regions (+/-50 bp, left panel; +/-300 bp right panel) centered on the CEBP motif (as in Supplemental Fig. S4), delineated by green lines. (B) De novo motif search by MEME, depicting the top identified motif and the top match (all MA0102.2 from JASPAR), for the listed data sets. The lower four sets were not normalized to mapped reads, but were required to have a read coverage of more than 50 at each maximum peak height.

Figure S5. PWM sequence hits in A and C cluster enhancers. (A) Condensed db PWM logos (see Methods) for CEBPs and EGR1. (B) Sequence positions of PWM hits in the four enhancers (Smg7, Palm, Fbxo9 and Mart1) depicted in Fig. 5D. CEBP hits are red and EGR1 are yellow. (C) Seqminer visualization of EGR1 binding (36hrs set) proximal to A and C CEBP peak summits (centers), +/-5 kb position, intensity of blue denotes EGR1 max peak height.

Figure S6. Confirmation of CEBPA and CEBPB co-occupancy and test of positive sites for sequential ChIP. (A) Depiction of enrichments for CEBPA (first round) and CEBPB (second) sequential ChIP and enrichments, qPCR, selected sites representing A and C cluster regions (sc-61 and sc-150). Enrichments equal ratios of quantification with denoted primers to Sfi2, normalized to enrichment ratios using IgG as second round antibody. (B) ‘Single’-ChIP tests using antibodies utilized in Fig. 4F for CEBP-transcription factor X sequential ChIP. Enrichments as in (A). N=2-4, Error bars show SEM. Primer sequences in Supplemental Table S10.

Figure S7. Co-binding of CEBPA and CEBPB assessed by peak summit distances. (A) Distances between CEBPA A cluster peaks and the nearest CEBPB peak determined by peak calls at 0 and 24 hrs. Distances from CEBPA peaks to a random genomic position are included to illustrate the background distribution. (B) As above, with CEBPA C cluster peaks.

Table S1. Read and mapping statistics for all ChIPs.

Table S2. Peak calling statistics, before and after filtering. Post-consensus: all significant peaks were used to build a sum track for all possible CEBP binding locations during liver regeneration. All positions with a coverage above 50 (normalized value) were included in the downstream analysis.

Table S3. List of all 11,314 CEBP peaks defined from the consensus track, after maximal coverage filtering (> 50). Listed are: mm9 genomic positions, region size in bp, maximal read coverage (normalized) for all eight time points for CEBPA and CEBPB ChIPs, maximal summed peak position, assigned gene feature and CEBP cluster relationship (A, B, C or none).

Table S4. RNA polymerase II (POL2) coverage data for all putative CEBP target genes, eight time points. Normalized across all time points. Listed are: gene symbol, CEBP binding cluster assignment, GO-terms, gene position and POL2 coverage data.

Table S5. Full list of GO-analysis results. The Biological Process term analysis output from DAVID is shown. The number of genes and the gene symbols associated with the listed GO-terms are indicated, with percentage of total, and raw and Benjamini-corrected P values for multiple testing. All analyses were performed on genes with activity change (POL2 coverage up or down >1.55 fold, time points 0-3, 3-24 and 24-36 hrs). Gene sets analyzed: A, B or C cluster associated.

Table S6. TF binding motif discovery results using the ASAP tool (Marstrand et al. 2008). The first entry indicates the TFs, the second the names of the motifs used for constructing the consensus PWMs. Indicated are P values, Z scores, and the number of regions with or without a sequence hit (positive or negative) for each cluster (A, B or C) and for the background set (bg).

Table S7. POL2 gene body read coverage for all genes in the RefSeq repository covered. Gene symbols are listed, with gene position, strand, and normalized (mapped read numbers) coverage for all eight time points.

Table S8. List of all CEBP A, B or C cluster target genes with the Biological Process GO-term ‘Regulation of transcription, DNA-dependent’. Gene symbol, cluster association, all GO-terms for Biological Process, Cellular Component and Molecular Function, total body POL2 coverage counts and counts/reads per kilobase (RPK) values are shown. Cut-off of >10,000 POL2 gene body read coverage.

Table S9. List of all identified EGR1/FOXA2 bound regions, separated in groups of regions overlapping with the CEBP A or C cluster regions or not overlapping. Position of each region is given, with associated gene feature, and POL2 body coverage.

Table S10. List of sequences of used primer sets for qPCR of CEBPA, CEBPB, EGR1 and POL2 ChIPs or sequential ChIPs of CEBP (primary antibody) and EGR1, HNF1, ONECUT1 (HNF6), MAFB, E2F3 (secondary antibody).


Jakobsen_FigS1

Left panel: ChIP with KO mice (n=3). Right panel: ChIP with specific blocking peptides (n>3): no peptide, p61 (alpha-block) and p150 (beta-block). Bars: CEBPA, CEBPB and Mock-IgG antibodies. Y-axis: Enrichment.

Jakobsen_FigS2

Pie charts of the genomic distribution of mapped reads (Exonic, Intronic, 5’ proximal, 3’ proximal, Intergenic) for CEBPA, CEBPB and POL II at 0, 3, 8, 16, 24, 36, 48 and 168 h, and for IgG Mock, EGR1 (24 h and 36 h), FOXA2 (Hoffman et al. 2010) and ONECUT1 (HNF6) (Laudadio et al. 2012).

Jakobsen_FigS3

Bar charts of qPCR enrichment for CEBPA (a) and CEBPB (b) ChIPs at the p21, Ece2, Cdk19 and Elf1 regions at 0, 3, 24 and 36 hrs. Series: qPCR-nIP and qPCR-org. Y-axis range: 0-120.

Jakobsen_FigS4

A Conservation scan, +/- 50 bp (left) and +/- 300 bp (right). X-axis: CEBP-motif offset, bp. Y-axis: phyloP-score.

B De novo motif search, MEME top output, and TOMTOM top motif match, for:

11,314 regions, consensus set used for temporal clustering
CEBPA_0 hrs, 10146 regions
CEBPB_0 hrs, 13858 regions
CEBPA_36 hrs, 12575 regions
CEBPB_36 hrs, 22546 regions

Jakobsen_FigS5

A PWM logos for CEBP and EGR1. X-axis: Position. Y-axis: Information content, bits.

B

Smg7-enh:
1 AAGCGCGAGT GTGAGGCGCT ATGACGTATG GGCGATTGCA GCAGTACAAT GGCCCCTATC TTCCACCAGC GCCAACGCAC CGCCCACTCA CGGGAAAGAG
101 AGCCACCTAG TGAGAAACT

Palm-enh:
1 CTGAGCCGCC CTGGCCGCTG GGTGACCTTG GGCCTGTTGC ATAACCTCTC TGGGCCTTGC AAAAACAAAA CTGGAAGCCA AGGCTTGGGC TGGGCTGTCA
101 CTGGCTTCCC ATTATCTCAT CTGGAGCTGA ATTATCTCAG GAGCTGGGAG ATGAGGCCCT CCCTGGTCCT CAACCGCCCA CACAAAGCAG CAAAGGAGCC
201 CAGATCCCCC ACTGGG

Mart1-enh:
1 ATCTGAGTGA CACACAAACT TGTGAAATAC AAACAGGTTT TTATCTAGTA AAAGGCCAGT CAGTACACAG CAAACATTTG CACAATGCTC TAGGATTTCC
101 AAGCAGTTAC ACAATAACTA TTCCCTCTGA CGTCACATGG TTTGTGTGTT GGGAGAAATC CCTTGACCTG TGTTTACCTG GTAGAGGTGG GGTGGGGCTG
201 TCCCACC

Fbxo9-enh:
1 TCAGTACATT AAGTGACTAA AATGAATCCT CAGGAAAATA AAGTATTGCC TGGTTCAGTC TGTTAGCTAA TAACTTATCA GCACGGGCAG CTTGCTCAAC
101 ACTCAGTGTG GTCACTCCTA CACAGAAATG GCAATGAGGA AATGACCCTG GCAGTTAGCT AGGACTAATA ACAAATGGAT TTTATTGCCT CTGGTCTCTA
201 CACACAT

C EGR1 - CEBP proximal binding for cluster A and cluster C, +/- 5 kb.

Jakobsen_FigS6

A

Sequential ChIP: CEBPA (first) - CEBPB (second). Y-axis: Enrichment normalized to IgG. Series: CEBPA - CEBPB and CEBPA - IgG (mock). X-axis: primer sets for C-peaks and A-peaks.

B

ChIP: CEBP cooperative binding TFs. Series: ONECUT1, MAFB, HNF1, EGR1 and E2F3. Y-axis (log scale, 1-100): relative enrichment to negative detector. X-axis: primer sets for C-peaks and A-peaks.

Jakobsen_FigS7

A A-peaks. Series: 0 hrs, 24 hrs, and 0 hrs CEBPA to random. X-axis: Distance, CEBPA to CEBPB peak summit, bp (0-50). Y-axis: Frequency / bp.

B C-peaks. Series: 0 hrs, 24 hrs, and 0 hrs CEBPA to random. X-axis: Distance, CEBPA to CEBPB peak summit, bp (0-50). Y-axis: Frequency / bp.

Jakobsen_Table S1

Sample        Reads sequenced   Reads mapped   Mapped %
CEBPA 0h      11952975          8545044        71.49
CEBPA 3h      19122522          12030701       62.91
CEBPA 8h      15159577          9450972        62.34
CEBPA 16h     30838814          17771409       57.63
CEBPA 24h     30673479          19820835       64.61
CEBPA 36h     19585954          12040493       64.47
CEBPA 48h     19423238          12864212       66.23
CEBPA 168h    21204717          14523997       68.49
CEBPB 0h      14182252          10072689       71.02
CEBPB 3h      18707253          8781215        46.94
CEBPB 8h      25227738          16936261       67.13
CEBPB 16h     30748925          18385441       59.79
CEBPB 24h     30368800          20093505       66.16
CEBPB 36h     27889354          18344909       72.04
CEBPB 48h     26974052          18131540       67.21
CEBPB 168h    19799456          13375728       67.56
POL2 0h       7011135           5398049        76.99
POL2 3h       18923733          13556771       71.64
POL2 8h       18631553          13261404       71.17
POL2 16h      29021889          22419004       77.25
POL2 24h      26894791          20509980       76.26
POL2 36h      18068693          14150575       78.32
POL2 48h      20568740          14948104       72.67
POL2 168h     17501829          12977312       74.15
IgG Mock      12080146          7856971        65.04
EGR1 24h      76564977          50513140       65.97
EGR1 36h      44186504          41030531       92.85


Jakobsen_Table S2

Sample                            Peaks called   # peaks after filtering
CEBPA 0h                          29385          Applied post-consensus
CEBPA 3h                          31005          Applied post-consensus
CEBPA 8h                          31269          Applied post-consensus
CEBPA 16h                         28011          Applied post-consensus
CEBPA 24h                         29849          Applied post-consensus
CEBPA 36h                         27063          Applied post-consensus
CEBPA 48h                         20063          Applied post-consensus
CEBPA 168h                        25974          Applied post-consensus
CEBPB 0h                          54576          Applied post-consensus
CEBPB 3h                          35258          Applied post-consensus
CEBPB 8h                          36260          Applied post-consensus
CEBPB 16h                         28722          Applied post-consensus
CEBPB 24h                         34100          Applied post-consensus
CEBPB 36h                         31660          Applied post-consensus
CEBPB 48h                         20610          Applied post-consensus
CEBPB 168h                        25783          Applied post-consensus
Cluster A                         NA             3549
Cluster B                         NA             2818
Cluster C                         NA             3034
EGR1_36h                          9895           3182
Onecut1 (Laudadio et al. 2012)    34227          NA/Mapping redone in-house
KLF4 (Chen et al. 2008)           10875          NA/Repositioned to mm9
MYC (Chen et al. 2008)            3422           NA/Repositioned to mm9
E2F1 (Chen et al. 2008)           20699          NA/Repositioned to mm9
FOXA2 (Hoffman et al. 2010)       17435          NA/Mapping redone in-house
