Supplementary Material - Combining binding affinities with open-chromatin data for accurate expression prediction

Florian Schmidt 1,2, Nina Gasparoni 3, Gilles Gasparoni 3, Kathrin Gianmoena 4, Cristina Cadenas 4, Julia K. Polansky 5, Peter Ebert 2,6, Karl Nordstr¨om 3, Matthias Barann 7, Anupam Sinha 7, Sebastian Fr¨ohler 8, Jieyi Xiong 8, Azim Dehghani Amirabad 1,2,6, Fatemeh Behjati Ardakani 1,2, Barbara Hutter 9, Gideon Zipprich 10, B¨arbel Felder 10, J¨urgenEils 10, Benedikt Brors 9, Wei Chen 8, Jan G. Hengstler 4, Alf Hamann 6, Thomas Lengauer 2, Philip Rosenstiel 7, J¨ornWalter 3, and Marcel H. Schulz 1,2

1 Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarbr¨ucken, 66123, Germany. 2 Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbr¨ucken, 66123, Germany. 3 Department of Genetics, University of Saarland, Saarbr¨ucken, Germany. 4 Leibniz Research Centre for Working Environment and Human Factors IfADo, Ardeystrasse 67, Dortmund, 44139, Germany. 5 Experimental Rheumatology, German Rheumatism Research Centre, Berlin, Germany. 6 International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbr¨ucken, 66123, Germany. 7 Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany. 8 Berlin Institute for Medical Systems Biology, Max-Delbr¨uck Center for Molecular Medicine, Berlin, Germany. 9 Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, Germany. 10 Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, Germany.

* To whom correspondence should be addressed. Email: [email protected]

1 Data 1

Within this study, we used data from the the DEEP project and from ENCODE. An 2

overview is provided in Supplementary Table 1. 3

4

1/30 Table 1. Overview on the data used within this study. Sample DEEP ID Manuscript ID 01 HepG2 LiHG Ct1 HepG2 41 Hf01 LiHe Ct LiHe1 41 Hf02 LiHe Ct LiHe2 41 Hf03 LiHe Ct LiHe3 51 Hf03 BlTN Ct T1 51 Hf04 BlTN Ct T2 51 Hf03 BlCM Ct T3 51 Hf04 BlCM Ct T4 51 Hf03 BLEM Ct T5 51 Hf04 BLEM Ct T6 DEEP File ID Data type 01 HepG2 LiHG Ct1 mRNA K 1.LXPv1.20150508 .fpkm tracking Quantified mRNA 41 Hf01 LiHe Ct mRNA K 1.LXPv1.20150530 genes.fpkm tracking Quantified mRNA 41 Hf02 LiHe Ct mRNA K 1.LXPv1.20150530 genes.fpkm tracking Quantified mRNA 41 Hf03 LiHe Ct mRNA K 1.LXPv1.20150530 genes.fpkm tracking Quantified mRNA 51 Hf03 BlCM Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 51 Hf04 BlCM Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 51 Hf03 BlEM Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 51 Hf04 BlEM Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 51 Hf03 BlTN Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 51 Hf04 BlTN Ct mRNA M 1.LXPv1.20150708 genes.fpkm tracking Quantified mRNA 01 HepG2 LiHG Ct1 DNase S 1.bwa.20140719.bam Dnase-1 seq 41 Hf01 LiHe Ct DNase S 1.bwa.20131216.bam Dnase-1 seq 41 Hf02 LiHe Ct DNase S 1.bwa.20131216.bam Dnase-1 seq 41 Hf03 LiHe Ct DNase S 1.bwa.20150120.bam Dnase-1 seq 51 Hf03 BlCM Ct NOMe S 1.NCSv2.20150513.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal 51 Hf04 BlCM Ct NOMe S 1.NCSv2.20150609.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal 51 Hf03 BlEM Ct NOMe S 1.NCSv2.20150513.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal 51 Hf04 BlEM Ct NOMe S 1.NCSv2.20150609.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal 51 Hf03 BlTN Ct NOMe S 1.NCSv2.20150513.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal 51 Hf04 BlTN Ct NOMe S 1.NCSv2.20150729.GRCh37.cpg.filtered.GCH.peaks.fdr001.bed NOMe signal ENCODE Accession Number Data type ENCFF000DYC Quantified mRNA of K562 ENCFF000SVN Dnase-1 seq of K562 ENCFF000CZF Quantified mRNA of GM12878 ENCFF000SKV Dnase-1 seq of GM12878 ENCFF000SKW Dnase-1 seq of GM12878 ENCFF000SKZ Dnase-1 seq of GM12878 ENCFF000SLB Dnase-1 seq of GM12878 ENCFF000SLD Dnase-1 seq of GM12878 ENCFF000DHQ Quantified mRNA of H1-hESC ENCFF000DHS Quantified mRNA of H1-hESC ENCFF000DHU Quantified mRNA of H1-hESC ENCFF000DHW Quantified mRNA of H1-hESC ENCFF000SOA Dnase-1 seq of H1-hESC ENCFF000SOC Dnase-1 seq of H1-hESC ENCFF000BFC Signal of H3K27ac in HepG2 ENCFF001SWK Peaks of H3K27ac in HepG2 ENCFF000BGB Signal of H3K4me3 in HepG2 ENCFF001SWO Peaks of H3K4me3 in HepG2 ENCFF000BWY Signal of H3K27ac in K562 ENCFF001SZE Peaks of H3K27ac in K562 ENCFF000BYB Signal of H3K4me3 in K562 ENCFF001SZJ Peaks of H3K4me3 in K562 ENCFF000ASJ Signal of H3K27ac in GM12878 ENCFF001SUG Peaks of H3K27ac in GM12878 ENCFF001EXX Signal of H3K4me3 in GM12878 ENCFF001WYJ Peaks of H3K4me3 in GM12878 ENCFF000AWK Signal of H3K27ac in H1-hESC ENCFF001SUX Peaks of H3K27ac in H1-hESC ENCFF000AXO Signal of H3K4me3 in H1-hESC ENCFF001SVC Peaks of H3K4me3 in H1-hESC ENCODE Accession Number TF ChIP in K562 ENCSR000BNU ATF3 ENCSR000BRT CBX3 ENCSR000BRQ CEBPB ENCSR077DKV CREM ENCSR000DWE CTCF ENCSR000BLI E2F6 ENCSR000BNE EGR1 ENCSR000BMD ELF1 ENCSR000BKQ ETS1 ENCSR000BMV FOSL1 ENCSR000BLO GABPA ENCSR000BKM GATA2 ENCSR000EFV MAX ENCSR000BNV MEF2A ENCSR000BRS NR2F2 ENCSR000BQY PML ENCSR000BKV RAD21 ENCSR000BMW REST ENCSR920BLG SIN3A ENCSR000BGX SIX5 ENCSR000FCD SMAD5 ENCSR000BKO SP1 ENCSR000BGW SPI1 ENCSR000BLK SRF ENCSR000BRR STAT5A ENCSR000BKS TAF1 ENCSR863KUB TCF7 ENCSR000BRK TEAD4 ENCSR000BNN THAP1 ENCSR000BKT USF1 ENCSR000BKU YY1 ENCSR000BKF ZBTB33 ENCSR000BME ZBTB7A

2/30 ENCODE Accession Number TF ChIP in HepG2 ENCFF002CTS ARID3A ENCSR000BID BHLHE40 ENCFF002CTU BRCA1 ENCFF002CTV CEBPB ENCSR000DUG CTCF ENCSR000BMZ ELF1 ENCFF002CUA ESRRA ENCSR000ARI EZH2 ENCSR000BHP FOSL2 ENCSR000BMO FOXA1 ENCSR000BNI FOXA2 ENCSR000BJK GABPA ENCSR000BMC HDAC2 ENCSR000BLF HNF4A ENCSR000BNJ HNF4G ENCFF002CUD HSF1 ENCFF002CTY JUN ENCSR000BGK JUND ENCFF002CUG MAFF ENCFF002CUI MAFK ENCFF002CUJ MAX ENCSR000BQX NFIC ENCFF002CUY NR2C2 ENCFF002CUM NRF1 ENCSR000BOT REST ENCFF002CUT RFX5 ENCSR00BHU RXRA ENCSR000BJX SP1 ENCSR000BOU SP2 ENCFF002CUV SREBF1 ENCFF001VLB SREBF2 ENCSR000BLV SRF ENCSR000BJN TAF1 ENCFF002CUW TBP ENCSR200BJG TCF12 ENCFF002CUX TCF7L2 ENCSR000BGM USF1 ENCFF002CUZ USF2 ENCSR000BHR ZBTB33 ENCODE Accession Number TF ChIP in H1-hESC ENCFF002CIR ATF2 ENCFF002CIS ATF3 ENCFF002CQP BACH1 ENCFF002CIT BCL11A ENCFF002CQQ BRCA1 ENCFF002CQR CEBPB ENCFF002CQS CHD1 ENCFF002CQT CHD2 ENCFF002CQW CTBP2 ENCFF002CIU CTCF ENCFF002CIV EGR1 ENCFF002CJC EP300 ENCFF002CDT EZH2 ENCFF002CIW FOSL1 ENCFF002CIX GABPA ENCFF002CQX GTF2F1 ENCFF002CIY HDAC2 ENCFF002CQU JUN ENCFF002CQY JUND ENCFF002CDU KDM5A ENCFF002CQZ MAFK ENCFF002CRA MAX ENCFF002CRB MXI1 ENCFF002CQV ENCFF002CJA NANOG ENCFF002CRC NRF1 ENCFF002CJE POLR2A ENCFF002CJF POU5F1 ENCFF002CRD RAD21 ENCFF002CJG RAD21 ENCFF002CDV RBBP5 ENCFF002CJB REST ENCFF002CRE RFX5 ENCFF002CJH RXRA ENCFF002CRF SIN3A ENCFF002CJJ SIX5 ENCFF002CJK SP1 ENCFF002CJL SP2 ENCFF002CJM SP4 ENCFF002CJN SRF ENCFF002CRG SUZ12 ENCFF002CJO TAF1 ENCFF002CJP TAF7 ENCFF002CRH TBP ENCFF002CJQ TCF12 ENCFF002CJR TEAD4 ENCFF002CJS USF1 ENCFF002CRI USF2 ENCFF002CJT YY1 ENCFF002CRJ ZNF143

3/30 ENCODE Accession Number TF ChIP in GM12878 ENCFF002CGO ATF2 ENCFF002CGP ATF3 ENCFF002CGQ BATF ENCFF002CGR BCL11A ENCFF002CGS BCL3 ENCFF002CGT BCLAF1 ENCFF809BIO CBFB ENCFF002CGU CEBPB ENCFF804OVD CREM ENCFF002CGV EBF1 ENCFF515PNJ EED ENCFF002CGW EGR1 ENCFF002CGX ELF1 ENCFF002CHI EP300 ENCFF002CGY ETS1 ENCFF191HSP ETV6 ENCFF002CGZ FOXM1 ENCFF002CHA GABPA ENCFF002CHB IRF4 ENCFF939TZS JUNB ENCFF002CHC MEF2A ENCFF002CHD MEF2C ENCFF002CHE MTA3 ENCFF002CHF NFATC1 ENCFF002CHG NFIC ENCFF002CHJ PAX5 ENCFF002CHK PAX5 ENCFF002CHL PBX3 ENCFF002CHM PML ENCFF002CHO POLR2A ENCFF002CHP POU2F2 ENCFF002CHR RAD21 ENCFF002CHH REST ENCFF002CHS RUNX3 ENCFF002CHT RXRA ENCFF002CHU SIX5 ENCFF374VLY SMAD5 ENCFF002CHV SP1 ENCFF002CHQ SPI1 ENCFF002CHW SRF ENCFF002CHX STAT5A ENCFF002CHY TAF1 ENCFF002CHZ TCF12 ENCFF002CIA TCF3 ENCFF144PGS TCF7 ENCFF002CIB USF1 ENCFF002CIC YY1 ENCFF694OTE ZBED1 ENCFF002CID ZBTB33 ENCFF002CIE ZEB1

4/30 2 Experimental Procedures 5

2.1 Isolation of primary human hepatocytes 6

Primary human hepatocytes (PHH) were isolated from healthy liver tissue resected from 7

patients suffering primary or secondary liver tumors. Informed consent of the patients 8

for the use of tissue for research purposes was obtained and experiments were approved 9

by the local ethical committees Isolation of PHH was performed by a two-step 10

EDTA/collagenase perfusion technique as described in [25]. The PHHs were separated 11

from the non parenchymal cells (NPC)-containing fraction by centrifugation at 50g, 12 ◦ 5min, 4 C and non-viable cells were removed by a 25% Percoll density gradient 13 ◦ centrifugation at 1250g, 20min, 4 C without brake [25]. After washing with PBS, PHH 14

were resuspended in Hepacult cold storage solution for overnight shipping of 15

hepatocytes on ice from the surgical department to the corresponding facility. Upon 16

arrival cells were counted in a Neubauer counting chamber by Trypan blue staining to 17

control viability. For DNA preparation 5 x 106 PHH were pelleted, snap frozen in liquid 18 ◦ N2 and stored at −80 C until further processing. For RNA preparation 5 x 106 19 ◦ hepatocytes were homogenized in 1mL Trizol, snap frozen and stored at −80 C until 20

further processing. 10 x 106 cells were used for nuclei isolation and subsequent DNaseI 21

digestion. Nuclei were obtained after treatment with IGEPAL at a final concentration 22 ◦ of 0.1% for 15 min. DNaseI treatments were performed for 3min at 37 C with 0, 40, 50, 23

60, 80U/mL. The reaction was stopped by adding stop buffer containing proteinase K 24 ◦ ◦ and incubating at 55 C for 1h. Samples were stored at 4 C until further processing. 25

2.2 DNaseI analysis of primary human hepatocytes 26

DNaseI-sequencing was performed according to a publicly available ENCODE protocol 27

(https://genome.ucsc.edu/ENCODE/protocols/cell/mouse/TReg- 28

Activated Stam protocol.pdf) with some modifications. Briefly, 1 x 107 cells (HepG2, 29

liver hepatocytes (LiHe)) were resuspended in 2ml buffer A (60mM KCl, 15mM 30

Tris-HCl (pH 8.0), 15mM NaCl, 1mM EDTA (pH 8.0), 0.5mM EGTA (pH 8.0), 31

0.5mM spermidine free base), supplemented with complete protease inhibitor cocktail 32

(Roche, Basel, Switzerland). Nuclei were extracted by adding an equal volume of buffer 33

A with 0.1% -0.2% of NP-40 and incubation on ice for 10-15min. Nuclei were washed in 34 ◦ buffer A and aliquots of 2.3 x 106 nuclei were digested for 3min at 37 C in DNase 35

buffer (13.5mM Tris-HCl pH 8.0, 88.5mM NaCl, 54mM KCl, 6mM CaCl2, 0.9mM 36

EDTA, 0.45mM EGTA, 0.45mM spermidine) with different amounts of DNase I 37

(Roche; 40 − 80U/ml). The reactions were stopped by adding an equal volume of stop 38

buffer (50mM Tris-Cl (pH 8.0), 100mM NaCl, 0.1% SDS, 100mM EDTA (pH 8.0), 39

1mM spermidine and 0.3mM spermine) with proteinase K (50µg/ml) and incubated at 40 ◦ 55 C for 1h. DNA was then purified using phenol chloroform extraction and quality 41

controls (agarose gels, qPCRs) were performed to determine the optimal digestion level 42

per sample. Double-hit fragments of 100bp-500bp were selected using either 43

gel-electrophoresis followed by electro-elution (HepG2, LiHe1, LiHe2) or sequential 44

purifications with Agencourt AMPure XP Beads (Beckman Coulter, Brea, USA; LiHe3). 45

The sequencing libraries were prepared from 8ng of purified DNA using the TruSeq 46

ChIP Library Preparation kit (Illumina, San Diego, USA) according to the 47

manufacturers protocol and sequenced on HiSeq v3 paired-end flow cells (HiSeq2500 48

system). The resulting sequences were processed according to the DEEP GAL v1 49

process (https://bioinf.mpi-inf.mpg.de/apex/f?p=901:3:::NO:::). Alignments were 50

produced with BWA [52], sorted with samtools [53] and duplicated reads were marked 51

with Picard tools (http://broadinstitute.github.io/picard). 52

5/30 2.3 T-cell isolation from human peripheral blood 53

PBMC from peripheral blood of healthy female donors were isolated from leukocyte 54

filters obtained from the blood bank of the Charit´eUniversity Hospital, Institute of 55

Transfusion Medicine as part of the DEEP project. Sampling and analysis of human T 56

cell population was approved by the ethics committee of the Charite 57

Universitaetsmedizin Berlin (Ethikkommission, Ethikausschuss 1 am Campus Charite 58

Mitte) under the application numbers EA1/116/13. PBMCs were isolated by density 59

gradient centrifugation using Lymphocyte Separation Medium LSM 1077 (PAA). 60

Remaining erythrocytes were lysed in Erythrocyte-Lysis-Buffer (Buffer EL, Qiagen). 61

CD4+ T lymphocytes were enriched by MACS-technology and human CD4 MicroBeads 62

(Miltenyi Biotec) on an AutoMACS-instrument (Miltenyi Biotec) and subsequently 63

sorted to high purity (> 95%) by using flow cytometry cell sorting (BD Bioscience). 64 ◦ Each sample was split into 4 fractions, snap-frozen and stored as a cell pellet at −80 C. 65

For RNA-seq, one aliquot was pelleted, resuspended and homogenized in QIAzol Lysis 66

Reagent (Qiagen) using a 21G needle before snap-freezing. Aliquots of several donors 67

were pooled for the analysis of genome-wide epigenetic signatures. Pools for ChIPseq 68

and NOMe-seq were fixed with 1% methanol-free formaldehyde for 5min at RT, 69

quenched with 0.125M glycine for 5min RT and washed with PBS. 70

2.4 NOMe analysis of T-cells 71

Nuclei from formaldehyde fixed cells were extracted using nuclei extraction buffer 72

(60mM KCl; 15mM Tris-HCl, pH 8.0; 15mM NaCl; 1mM EDTA, pH 8.0; 0.5mM 73

EGTA, pH 8.0; 0.5mM spermidine free base) supplemented with complete protease 74

inhibitor cocktail (Roche, Basel, Switzerland) and 0.1% NP40 (Sigma-Aldrich, St. Louis, 75

USA) and incubated on ice for 30min. During incubation each sample was dounced 76

10-20 times with a douncing pistil (Qiagen, Hilden, Germany). Nuclei were centrifuged 77 ◦ (500g, 4 C, 8min) and the pellet was washed using the same buffer without NP-40. 78

After another centrifugation the pellet was gently resuspended in 90µl of 1x GpC-buffer 79

(NEB,Ipswich, USA) followed by addition of 70µl of NOMe reaction mix (7µl 10x GpC 80

buffer (NEB), 1.5µl of 32mM SAM (NEB), 45µl of 1M sucrose and 60U of M. CviPI 81 ◦ (NEB). The reaction was incubated 3h at 37 C and 0.5µl of SAM were added after one 82

and two hours. The reaction was stopped by addition of 160µl NOMe stop buffer 83

(20mM Tris-HCl, pH 8.0; 600mM NaCl; 1% SDS, 10mM EDTA) plus 10µl proteinase 84

K (20mg/ml, Sigma-Aldrich) followed by extraction of genomic DNA (classic phenol 85

chloroform purification). Next, 100ng of DNA were bisulfite converted with the EZ 86

DNA Methylation-Gold kit (Zymo, irvine , USA) and then subjected to NGS library 87

preparation using the TruSeq DNA Methylation Kit (Illumina, San Diego, USA) 88

according to the manufacturer’s protocol. All libraries were checked for adapter dimers 89

and fragment distribution on a Bioanalyzer HS chip (Agilent Technologies, USA). Each 90

sample was sequenced on a HiSeq2500 system using 1-2 lanes of a HiSeq v3 paired-end 91

flow cell. Raw read data was adapter trimmed (Trim Galore!), mapped using bwa [52] 92

and then duplicate reads were removed. Methylation levels of CpGs and GpCs 93

respectively were called using Bis-SNP [58]. Cytosines in a GCG context were excluded 94

from further analysis. Bisulfite conversion efficiency was checked genome-wide with all 95

cytosines in an HCH-context. Next, NOMe-seq peaks (nucleosome-depleted regions, 96

NDRs) were called with in-house scripts: After an initial segmentation with a two-state 97

binomial HM the genome into putative putative NDRs and background regions the 98

former was contrasted to flanking background regions with Fisher’s exact test. Finally 99

p-values were adjusted by calculating empirical false-discovery rates (eFDR) by 100

contrasting to NDRs based on shuffled GCH methylation values. Only NDRs with 101

eFDR below 0.01 were kept. 102

6/30 2.5 RNA-seq incl. RNA isolation 103

Pooled samples for RNAseq were homogenized in QIAzol Lysis (Qiagen) using a 21G 104

needle. After addition of chloroform and phase separation by centrifugation, total RNA 105

was extracted from the aqueous phase by using the miRNeasy Micro Kit (Qiagen) 106

following the manufacturers recommendations. Starting from 2x500ng totalRNA of RIN 107

9.6, one stranded total RNA and one stranded mRNA library were prepared according 108

to the manufacturer’s instructions (Illumina). Both libraries were sequenced for 2x101nt 109

on an Illumina HiSeq 2000, yielding about 100 million paired-end reads for each library. 110

2.6 Mapping of RNA data and quantification 111

BAM files of RNA-Seq reads were produced with TopHat 2.0.11 [102], with Bowtie 112 2.2.1 [50], and NCBI build 37.1 in --library-type fr-firststrand and 113 --b2-very-sensitive setting. Gene expression has been quantified using Cufflinks 114 version 2.0.2 [103], the hg19 reference genome and with the options 115 frag-bias-correct, multi-read-correct, and compatible-hits-norm enabled. 116

3 Gene expression prediction setup 117

In Supplementary Figure 1 the general machine learning setup used within this article is 118 illustrated. Using the full data set, we randomly divide the data 10 times using 20% as

Figure 1. General learning setup used in this study.

119 test and 80% as training data. On the training data, we optimise the α parameter of 120

the elastic net (between 0.0 and 1.0 using a stepsize of 0.01) and determine model 121

7/30 parameters in a 6-fold cross validation. Finally, the model performance is assessed on 122 the test data. The reported mean correlation values cavg are computed as the mean 123 over the test correlation of the outer folds. 124

4 Overview on Learning Results 125

The correlation values between predicted and measured gene expression for all samples 126 and all tested setups are shown in Supplementary Table 2. The reported correlation is 127 the mean correlation over 10 independent runs. As we found the variance to be very 128 −4 small (≤ 10 ), it is neglected in the table. 129

5 Supplementary Analysis 130

Within this section, we present several additional analysis, which were just briefly 131 mentioned in the main paper. 132

5.1 Hocomoco pwms 133

In addition to the 439 position weight matrices (pwms) from JASPAR [68] and 134

Uniprobe [37], we used 641 pwms from Hocomoco, version 10 [47], which is the complete 135 set of human mononucleotide profiles available on Hocomoco. We compared both sets in 136 gene expression learning using the samples HepG2, LiHe1, LiHe2, and LiHe3. As shown 137 in Supplementary Figure 2, the Hocomoco set performed constantly worse then the 138 other one. Thus, we decided to use JASPAR and Uniprobe pwms for all analysis in this 139 study.

Figure 2. Barplots showing the mean test correlation of the gene expression learning for four samples (HepG2, LiHe1, LiHe2, and LiHe3), all four introduced annotation setups and two sets of pwms (Jaspar/Uniprobe and Hocomoco). Overall, the Hocomoco PWM set performs worse than the Jaspar/Uniprobe pwms.

140

8/30 Table 2. Overview on gene expression learning results. Sample TEPIC Peaks using JAMM 3kb 50kb 3kb-S 50kb-S K562 0.55 0.60 0.67 0.69 GM12878 0.40 0.56 0.47 0.58 H1-hESC 0.43 0.49 0.46 0.52 HepG2 0.55 0.65 0.61 0.68 LiHe1 0.49 0.57 0.59 0.63 LiHe2 0.42 0.50 0.56 0.58 LiHe3 0.53 0.58 0.58 0.61 T1 0.43 0.58 0.45 0.59 T2 0.38 0.57 0.41 0.58 T3 0.41 0.58 0.43 0.59 T4 0.46 0.56 0.47 0.56 T5 0.44 0.60 0.47 0.62 T6 0.44 0.57 0.46 0.59 Sample TEPIC using H3K4me3 peaks 3kb 50kb 3kb-S 50kb-S HepG2 0.55 0.63 0.61 0.67 K562 0.52 0.64 0.59 0.66 H1-hESC 0.48 0.55 0.53 0.58 GM12878 0.61 0.67 0.65 0.69 Sample TEPIC using H3K27ac peaks 3kb 50kb 3kb-S 50kb-S HepG2 0.31 0.55 0.36 0.54 K562 0.41 0.61 0.47 0.60 H1-hESC 0.37 0.52 0.38 0.49 GM12878 0.46 0.62 0.50 0.63 Sample TEPIC using HINTBC Footprints 50kb 3kb 50kb 3kb-S 50kb-S HepG2 0.37 0.48 0.54 0.58 K562 0.55 0.66 0.61 0.67 H1-hESC 0.42 0.53 0.44 0.52 GM12878 0.50 0.65 0.54 0.64 TEPIC using HINTBC Footprints 24kb 3kb 50kb 3kb-S 50kb-S HepG2 0.31 0.45 0.53 0.58 K562 0.54 0.65 0.59 065. H1-hESC 0.36 0.49 0.42 0.51 GM12878 0.48 0.64 0.51 0.63 Sample FIMO + Peaks 3kb 50kb HepG2 0.48 0.60 K562 0.52 0.52 H1-hESC 0.35 0.40 GM12878 0.35 0.52 LiHe1 0.45 0.55 LiHe2 0.39 0.48 LiHe3 0.46 0.54 Sample FIMO Prior 3kb 50kb HepG2 0.61 0.63 K562 0.64 0.56 H1-hESC 0.46 0.43 GM12878 0.61 0.53 LiHe1 0.59 0.63 LiHe2 0.53 0.59 LiHe3 0.57 0.61 Sample ChIP ChIP-Restricted 3kb 50kb 3kb 50kb HepG2 0.63 0.71 0.55 0.66 K562 0.64 0.71 0.58 0.65 H1-hESC 0.588 0.657 0.32 0.52 GM12878 0.62 0.71 0.56 0.68 Sample TEPIC using TF expression filtering 3kb 50kb 3kb-S 50kb-S LiHe1 0.48 0.57 0.60 0.62 LiHe2 0.41 0.49 0.56 0.58 LiHe3 0.53 0.58 0.58 0.61 T1 0.42 0.57 0.45 0.60 T2 0.39 0.57 0.41 0.57 T3 0.40 0.57 0.42 0.59 T4 0.46 0.55 0.47 0.56 T5 0.43 0.59 0.47 0.62 T6 0.43 0.58 0.46 0.59 #Peaks in HepG2 Influence of peak number 3kb 50kb 3kb-S 50kb-S 10000 0.15 0.33 0.20 0.34 50000 0.26 0.52 0.32 0.54 100000 0.34 0.58 0.41 0.60 200000 0.40 0.61 0.50 0.64 300000 0.45 0.63 0.52 0.66 400000 0.48 0.64 0.55 0.66 500000 0.50 0.64 0.56 0.67 600000 0.52 0.64 0.58 0.67 700000 0.52 0.65 0.59 0.67 800000 0.54 0.65 0.60 0.68 900000 0.54 0.64 0.61 0.68 1023463 (All) 0.55 0.65 0.61 0.68 Sample TEPIC using MACS2 3kb 50kb 3kb-S 50kb-S HepG2 0.37 0.61 0.44 0.62 LiHe1 0.34 0.37 0.37 0.37 LiHe2 0.42 0.41 0.40 0.42 LiHe3 0.30 0.26 0.26 0.27 Sample TEPIC using Hocomoco pwms 3kb 50kb 3kb-S 50kb-S HepG2 0.51 0.63 0.55 0.64 LiHe1 0.47 0.56 0.57 0.62 LiHe2 0.40 0.50 0.53 0.57 LiHe3 0.45 0.55 0.50 0.56

9/30 5.2 Comparison of Peak Callers 141

In the main paper, we used the peak caller JAMM [38] to identify DNase hypersensitive 142

sites (DHS). To learn about the influence of peak calling on TEPIC, we compared the 143

gene expression learning between TEPIC with JAMM peaks and TEPIC with 144 MACS2 [122] peaks. We used the following MACS2 options: --keep-dup all 145 --genomesize 2900000000 --nomodel --shift -100 --extsize 200 --qvalue 146 0.05 For this comparison, we used the samples HepG2, LiHe1, LiHe2, and LiHe3. 147 The learning results are visualised in Supplementary Figure 3. In general, we observe 148

that JAMM peaks lead to a better correlation between predicted and measured gene 149

expression than MACS2 peaks. This general statement might be explained by the 150

difference between the overall number of called peaks, as JAMM computes far more 151

DHS sites, than MACS2. On HepG2, JAMM calls 1023463 DHS sites, while MACS2 152

calls only 65497 peaks. As we found that the overall number of peaks also influences the 153

learning (Main Figure 2b), this might be a explanation for the poor performance of 154

MACS2 peaks. Interestingly, using MACS2 peaks, scaling the TF score using the DNase 155

signal within a peak does not necessarily improve the learning result, e.g. for LiHe3 the 156

correlation drops form 0.3 to 0.26 comparing the 3kb and the 3kb-S approach (see 157

Supplementary Table 2 for all data). This could be due to the fact, that JAMM peaks 158

are rather sharp compared to MACS2 peaks, e.g. the mean peak with in HepG2 for 159

JAMM is 96bp while it is 534bp for MACS2. Thus, the average per signal intensity 160 might be more informative for JAMM peaks.

Figure 3. Comparison of gene expression learning results between MACS2 (blue) and JAMM (red) using the samples HepG2, LiHe1, LiHe2, and LiHe3 and all four annotation setups. We observe that MACS2 peaks generally lead to poorer results than JAMM peaks. In addition, the scaling of TF affinities using the average DNase-1 signal within the peaks does not necessarily work using MACS2 peaks.

161

10/30 5.3 Extension of Footprints 162

We were provided the footprint predictions in HepG2 and K562 by the authors of 163

HINT(BC) [31]. In HepG2, 452281 footprints with an average length of 18bp were 164 predicted. In K562, there are 738707 footprints, also with an average length of 18bp. As 165 we required the footprint regions to be at least as longs as the longest pwms, we 166 extended the footprint regions by centring a window in the middle of the footprints. We 167 tested two different window sizes: 24bp and 50bp. These windows were annotated using 168

TEPIC. 169

In Supplementary Figure 4, we show the results of the gene expression learning using 170 the 24bp (blue) and 50bp (orange) windows. Except for the 50kb-S case, the larger 171 windows outperform the smaller ones. The increase in performs is especially remarkable 172 in the 3kb and 50kb case in HepG2. There are various possible hypotheses to explain 173 this observation. It could be, that the footprints are just not very well placed and thus, 174 the smaller windows can not capture the actual binding sites. However, it is also 175 possible that there are additional binding sites around the footprint, which are just not 176 covered in the smaller window.

Figure 4. Comparison of gene expression learning results between two different footprint extensions: 24bp and 50bp. Footprints called in HepG2 and K562 were replaced by windows with length 24bp and 50bp centered at the middle of the footprint. Overall, we found the 50bp window to work slightly better than the 24bp window.

177

5.4 Comparing affinities and hit-based TF annotation in a HM 178

based segmentation 179

We conducted gene expression prediction experiments using TF gene scores calculated 180 by TEPIC in HM-peaks and using Fimo scores calculated in the same regions. We 181 considered two HMs, H3K4me3 and H3K27ac, and used the samples HepG2, K562, 182

GM12878, and H1-hESC. As shown in Supplementary Figure 5, the affinity-based scores 183 perform better than the hit-based scores computed by Fimo. 184

11/30 Figure 5. Here, the gene expression learning results using affinity and hit-based TF annotations based on either H3K4me3 or H3K27ac peaks for segmentation are shown. Affinity-based scores outperform Hit-based measures on all samples.

5.5 Reduced PWM sets versus ChIP-seq 185

In addition to the ChIP-seq comparison presented in the main paper, we conducted an 186 additional ChIP-seq study using only those TF ChIP-seq sets for which a PWM was 187 available and vice versa. By this restriction, we could use 36 pwms/TF-ChIP-sets for 188

HepG2, 20 pwms/TF-ChIP-sets for K562, 22 pwms/TF-ChIP-sets for GM12878, and 25 189 pwms/TF-ChIP-sets H1-hESC. The learning results are shown in Supplementary Figure 190

6. The overall performance decreased compared to the full set of pwms and compared to 191 all available ChIP-seq data sets. But, the performance of the TEPIC scores is still 192 reasonably well. The difference between the ChIP data and the TEPIC scores might be 193 explainable by some general drawbacks of computational TF annotation. Although we 194 use TF affinities, it could be that factors for which the binding site is not represented 195 properly by its pwm, are captured in the ChIP data only. In addition, using ChIP-seq 196

TFs can also be captured when they are part of a TF complex. This is not easily 197 possible using computational methods.

Figure 6. Gene expression learning results for reduced PWM and TF-ChIP-seq sets. For HepG2, 36 PWM/TF-ChIP-sets were used, 20 for K562, 22 for GM12878, and 25 for H1-hESCs.

198

5.6 Comparing models across samples 199

To determine whether the learned models to predict gene expression are tissue specific 200 or not, we conducted a across sample comparison. To this end, a model was learned on 201 data of a distinct sample and tested on all available samples. If the learned models are 202

12/30 tissue specific, we would assume that a model learned on T-Cell data performs well on 203 the remaining T-Cells but less accurately on the other tissues and cell-lines. In 204

Supplementary Figure 7, the results of this experiment are shown. The heatmap is 205 clustered according to model performance across samples. The clustering indicates that

Figure 7. Heatmap showing the mean test correlation of models learned on data indicated on the y-axis, tested on data indicated on the x-axis. The clustering combines the models according to overall performance similarities.

206 our models do unravel tissue specific features, e.g. the T-Cell samples cluster together 207 with the GM12878 samples, which is similar to the observation we made in the PCA 208 analysis presented in the main paper. In addition, HepG2 clusters together with 209 primary human hepatocytes. In concordance to our finding in the main paper regarding 210 the gene expression prediction performance of LiHe2, this samples seems to be an 211 outlier in this analysis too, which is probably due to the fewer DNase1-seq peaks in 212

LiHe2 compared to the other replicates. 213

5.7 Comparison to TF ChIP-seq data 214

As explained in the main paper, we conducted a ChIP-seq based evaluation of our TF 215 binding predictions. In Supplementary Figures 8, 9, 10, and 11 Precision-Recall Area 216

Under the Curve (PR-AUC) values are shown all for tested TFs. The mean PR-AUC 217 values in the main paper are computed using the scores shown in these Figures. 218

5.8 Expression of TFs 219

We were interested in the question, whether the full model (using all pwms) rather 220 selects expressed TFs than not expressed TFs. Note that the expression of TFs itself is 221 not included as a feature in the model. To answer this, we computed the mean 222 expression of all selected and all not selected TFs in the same set of samples mentioned 223 above. This is shown in Supplementary Figures 12 and 13 for the primary human 224 hepatocytes and the T-cells, respectively. The boxplots show the log of the mean 225 average expression of all TFs splitted up into the group of selected and not selected TFs. 226

227

It can be seen that the average expression of the selected TFs is higher than the 228 average expression of the not selected TFs. The absolute magnitude of the difference 229 between both classes varies; it can be rather strong, e.g for T3 and T6 in Supplementary 230

Figure 13 but also rather low, e.g. for LiHe2 in Supplementary Figure 12. 231

Motivated by this analysis, we reduced the number of features in the gene expression 232 learning by applying TF gene expression filtering. Within this step we removed all TFs 233

13/30 Figure 8. Lineplots representing PR-AUC in a TF-ChIP-seq comparison for HepG2.

Figure 9. Lineplots representing PR-AUC in a TF-ChIP-seq comparison for K562.

with an expression < 1.0 FPKM and additionally removed all TFs which could not be 234 mapped to human ensemble gene IDs. Using this reduced feature set, we performed the 235 learning for all primary cell samples: LiHe1, LiHe2, LiH3, T1, T2, T3, T4 T5, and T6. 236

The bar plots in Supplementary Figure 14 show the learning results for the 50kb-S 237 setup using either all TFs or the filtered set. Overall both sets perform highly similar. 238

Thus, this feature reduction step seems to be a sensible way of pre reducing the feature 239 space to simplify the interpretation of the model coefficients. The observation we made, 240 that the expression of selected TFs is higher than that of not selected TFs in the full 241 model, might explain why the models with and without filtering perform so similar. 242

14/30 Figure 10. Lineplots representing PR-AUC in a TF-ChIP-seq comparison for GM12878.

Figure 11. Lineplots representing PR-AUC in a TF-ChIP-seq comparison for H1-hESC.

Figure 12. Boxplots representing the log of the mean expression value of TFs splitted into TFs that were either selected as a feature in the elastic net models for the primary human hepatocytes or not.

15/30 Figure 13. Boxplots representing the log of the mean expression value of TFs splitted into TFs that were either selected as a feature in the elastic net models for the T-Cells or not.

Figure 14. Barplots showing the result of the gene expression learning using either all TFs (orange) as features, or only TFs that are expressed (grey). Throughout all samples, both sets perform very similar.

5.9 Analysis of primary human hepatocytes 243

As described in the main paper, we decided to use the gene expression models learned 244 on the 3kb-S and 50kb-S annotation for biological interpretation. In Supplementary 245

Table 3, we provide the feature coefficients for all selected features in both models as 246 well as an overview whether the corresponding factor is known to be liver related. 247

In Supplementary Figure 15a, the feature overlap between the three hepatocyte 248

16/30 Table 3. Features in the primary human hepatocytes

50000bp Intersection TF Score Related Source TCF7L2 0,1324316145 Yes [4, 115] TBP 0,0388126535 Yes [1, 34] PPARG::RXRA 0,0259175593 Yes [82] NFE2l2 0,0235156349 Yes [10] CEBPA 0,0224927361 Yes [12, 14] EGR1 0,0220044673 Yes [81] ELK4 0,0209802209 Yes [19] HNF1A 0,0206758677 Yes [12, 100] YY1 0,0195248486 Yes [61, 62] GATA4 0,0163353733 Yes [6] FOSL2 0,014689792 Yes [96] HIF1A::ARNT 0,0145326159 Yes [111] CEBPB 0,0134923036 Yes [12, 14] NFYA 0,0132833189 Yes [63] SRF 0,0132796748 Yes [98] SOX6 0,0132207154 Yes [30] AR 0,0132137683 Yes [22] NKX3-1 0,0127119621 No SOX5 0,0126338172 Yes [107] HNF4G 0,0122823459 Yes [12] ARNT 0,0109715299 Yes [111] SREBF1 0,0105844412 Yes [29] STAT1 0,0097921517 Yes [23] STAT2::STAT1 0,0097296335 Yes [23] MAFF 0,0092077541 Yes [43] ZFP161 0,0087134727 No ESRRA 0,0080757375 Yes [2] MAFK 0,0078615274 Yes [43] TCF12 0,0071552846 Yes [4, 115] ETS1 0,0070703966 Yes [90] STAT3 0,0070242676 Yes [32, 97] MYC 0,0069673712 Yes [56] NR5A2 0,0069016053 No REL 0,0058698492 Yes [24] SPI1 0,0050937084 No ZFX 0,0049297876 Yes [48] SOX13 0,0046325899 Yes [66] FOXA1 0,0044434002 Yes [12] ESR1 0,0042255442 Yes [115] TCF7L2 0,0023408456 Yes [75] NFIL3 0,0020449749 Yes [69, 101] FOXJ3 0,0015251978 No MAX 0,000005441 Yes [88] FOXP1 0,000001522 Yes [123] ZBTB33 -0,0003362672 No ZFP187 -0,0005472743 No ETV1 -0,0016160511 Yes [70, 108] DDIT3::CEBPA -0,0050241319 Yes [65, 77] JUND -0,0057180963 Yes [96] ZNF263 -0,0065129283 No PPARG -0,0065244593 Yes [121] -0,0095442846 Yes [54] HINFP -0,0095443561 No E2F6 -0,0108375651 No REST -0,0130289925 No TCF3 -0,0136221887 No CTCF -0,0140836489 Yes [28, 36] NRF1 -0,0146143934 Yes [114] JUN -0,0167329666 Yes [96] NR2C2 -0,0180692669 Yes [42] ZNF143 -0,0183869851 Yes [27, 39] RFX1 -0,0188795599 No ARNT::AHR -0,0290829512 Yes [111] NFYB -0,0510911612 Yes [3, 21, 26] SP1 -0,054073586 Yes [71] 3000bp Intersection TF Score Related Source NR2F2 0,0533551239 Yes [46, 79] TBP 0,0422406946 Yes [1, 34] ARID5A 0,0291251714 No EGR1 0,0257482247 Yes [81] PPARG::RXRA 0,021359751 Yes [82] CEBPA 0,0182935482 Yes [12, 14] SRF 0,0179912777 Yes [98] HSF1 0,0177514147 Yes [55] Arnt..Ahr 0,017294949 Yes [111] NFE2I2 0,0158341959 Yes [10] HNF4G 0,0150538484 Yes [12] HLF 0,0150535681 Yes [17, 20] HNF1A 0,0146969969 Yes [12, 100] STAT1 0,0133908806 Yes [23] GATA4 0,0131922235 Yes [6] SREBF1 0,012826621 Yes [29] STAT2::STAT1 0,012345449 Yes [23] CEBPB 0,0110754029 Yes [14] ETS1 0,01065977 Yes [90] NR5A2 0,0105825719 No EGR2 0,0105041662 Yes [41] SOX6 0,0101218474 Yes [30] ZFP281 0,0087242196 No SOX5 0,0075345091 Yes [107] ZFP740 0,0052149846 No NFIL3 0,0039272158 Yes [69, 101] NR3C1 0,002939117 Yes [73] TCF7L2 0,0027865269 Yes [75] FOXO1 0,0020635299 Yes [87] EWSR1-FLI1 0,0013408675 Yes [76]

17/30 TCF12 0,000892008 Yes [4, 115] NFE2L1::MafG 0,0008351026 No FOXP1 0,0007172305 Yes [123] ZFP187 -0,0003232196 No MAX -0,0009493691 Yes [88] ZBTB33 -0,0017353254 No DDIT3::CEBPA -0,002032019 Yes [65, 77] ZFP410 -0,0036966614 No NFKB1 -0,0039982392 Yes [86] IRF1 -0,0042283527 Yes [9, 40] RREB1 -0,0047828011 No THAP1 -0,0050645346 No NR1H2::RXRA -0,0092038599 Yes [5] CTCF -0,0098416089 Yes [28, 36] ZEB1 -0,0108784155 No NRF1 -0,011446867 Yes [114] PPARG -0,0118073828 Yes [121] REST -0,0127368744 No ARNT::AHR -0,0138593738 Yes [111] USF1 -0,0149067325 Yes [11] NR2C2 -0,0150924028 Yes [42] RFX1 -0,0217842402 No ZNF143 -0,0219362479 Yes [27, 39] JUN -0,0238801745 Yes [96] Klf7 -0,0271065176 No NFYB -0,0354297411 Yes [3, 21] Elf2 -0,0443044154 No RFXDC2 -0,0484289964 No SP1 -0,0537070776 Yes [71] GMEB1 -0,0958298307 No HBP1 -0,1238455613 No

samples for the 3kb-S setup is shown. Overall, 61(35.3%) of the features overlap 249

between the replicates and only very few features are selected uniquely in one replicate 250

(13, 17, and 3). Within Supplementary Figure 15b the feature coefficients for the top 10 251

positive and negative features in the 3kb-S setup, are shown. The effect of having both 252

a strong negative coefficient and a strong positive coefficient for the same feature 253

between two replicates, as it occurred for Arid5a in LiHe2 and LiHe3, could be avoided 254 by considering the replicate information in the learning itself.

a b

* Top 10 positive features

… *

Top 10 * negative features * *

* *

Feature value

Figure 15. (a) Venn diagram showing the feature overlap between the hepatocyte samples for the models using the 3kb-S annotation. (b) Heatmap representing the top 10 positive and top 10 negative features in the liver samples using the 3kb-S annotation.

255

18/30 5.10 Analysis of T-Cells 256

As described in the main paper, we decided to use the gene expression models learned 257 on the 3kb-S and 50kb-S annotation for biological interpretation. In Supplementary 258

Table 4, we provide the feature coefficients for all selected features in both models as 259 well as an overview whether the corresponding factor is known to be T-cell related. 260

In Supplementary Figure 16a, the feature overlaps between the T-cell replicates for 261 the 3kbs-S setup are shown. Overall, 32.7% of the features are shared between the 262

T-cell samples. Within Supplementary Figure 16b the Feature coefficients for the top 10 263 positive and negative features in the 3kb-S setup, are shown. TF names labelled with a 264

∗ indicate that we could not find literature proof for those factors to be involved with 265 T-cells. For the remaining factors, we were able to do so, see Table 4.

a b

Top 10 positive features *

* … * * Top 10 * * negative * * features * * * * *

Overlapping features Feature value

Figure 16. (a) Heatmap showing the feature overlap between the T-cell replicates using the 3kb-S annotation. (b) Heatmap representing the top 10 positive and top 10 negative features in the T-cell samples using the 3kb-S annotation.

266

19/30 Table 4. Overview on T cell coefficients 50.000 Intersection TF Score Related Source SP100 0,2663187508 Yes [78] GMEB1 0,1504942689 Yes [44] ETV3 0,0884743483 Yes [112] YY1 0,0548309734 Yes [89] ETS1 0,039425311 Yes [16] ZBTB33 0,034082116 No NRF1 0,03367 Yes [80] USF2 0,0335490723 No STAT2::STAT1 0,02477 Yes [104] ATF1 0,0244918905 Yes [99] IRF1 0,0243213952 Yes [59] ZFX 0,0217099626 Yes [92] ZFP281 0,021425008 No HSF1 0,0187627656 Yes [7] ELF1 0,0186152951 Yes [84] RUNX2 0,01536321 Yes [105] BACH1::MAFK 0,01439117 Yes [93] ZNF143 0,0142456572 No SREBF1 0,0119006074 Yes [45] MAFF 0,0115383581 Yes [83] JUN::FOS 0,0114034787 Yes [85] RUNX1 0,011220547 Yes [113] FOXP1 0,00768 Yes [18] GABPA 0,0075433539 Yes [119] GATA3 0,00739 Yes [35] KLF5 0,0071428059 Yes [116] SREBF2 0,00372 Yes [45] NFKB1 -0,00023 Yes [91] ZFP187 -0,0004499639 No SRF -0,0008219395 Yes [74] NR3C1 -0,00351 Yes [8] MTF1 -0,0042663939 No EWSR1-FLI1 -0,00453 Yes [15] ZFP161 -0,0056638874 No BCL6 -0,0069983464 Yes [57] RXRA -0,0089238113 Yes [95] STAT6 -0,0107655277 Yes [118] ZEB1 -0,01233 Yes [109] JUN -0,0145969529 No [85] NFATC2 -0,0155255515 Yes [64] GFI1 -0,019625925 Yes [120] RFX1 -0,0198494733 No SP4 -0,0207154248 Yes [117] POU2F2 -0,0211863143 Yes [72] FOXN2 -0,0221967058 Yes [51] USF1 -0,02229 Yes [49] ZNF354C -0,0253099623 No REST -0,0295456703 No CTCF -0,0323316077 Yes [33] THAP1 -0,0325213617 Yes [60] ZBTB7B -0,0557226792 Yes [110] ZFP691 -0,0604547386 No ARID5A -0,11383 Yes [67] 3000 Intersection TF Score Related Source YY1 0,0552683541 Yes [89] ETV3 0,045038283 Yes [112] ETS1 0,042191038 Yes [16] ETV6 0,037862848 Yes [106] ATF1 0,0373995968 Yes [99] NRF1 0,0367107599 Yes [80] ZBTB33 0,0365038969 No IRF1 0,0333219928 Yes [59] STAT2::STAT1 0,0236258969 Yes [104] ZFP281 0,0236115045 No ZFX 0,0211128774 Yes [92] ELF1 0,0174644186 Yes [84] ZNF143 0,0166663103 No HSF1 0,0133327944 Yes [7] TBP 0,0123855409 Yes [94] FOXK1 0,0121519368 No GATA3 0,0105954815 Yes [35] RUNX2 0,0103983124 Yes [105] GABPA 0,0067778528 Yes [119] BCL6 -0,0047744007 Yes [57] POU6f1 -0,0098964579 No ZEB1 -0,0133389686 Yes [109] GFI1 -0,0189466837 Yes [120] JUN -0,0194625822 Yes [85] SP4 -0,0206092665 Yes [117] CTCF -0,0213387738 Yes [33] ZBTB3 -0,0227562281 No ARID3A -0,0264594638 No RFX1 -0,0288191047 No REST -0,0313675717 No ZBTB7B -0,033026699 Yes [110] ZFP691 -0,0462631363 No

20/30 5.11 Runtime Fimo-Prior versus TEPIC 267

As mentioned in the main paper, the runtime of Fimo using an epigenetic prior [13] is 268 extensive compared to TEPIC. This might be due to the fact that Fimo is not 269 parallelised. In Supplementary Table 5, we present an overview of the runtimes of 270

Fimo-Prior compared to TEPIC. The runtime of TEPIC is composed of the time 271 required for peak calling using JAMM, and of the actual runtime of TEPIC. The 272 runtime of Fimo-Prior is composed of the runtime of Fimo and the generation of the 273 gene scores using a part of TEPIC. We generated the scores for both the 3kb-S and 274 50kb-S approach. The time was measured using the linux time command. The runtime 275 tests were conducted on a compute server equipped with Intel Xeon CPU E7-8837 276 processors and 1TB of main memory. For TEPIC 16 cores were used.

Table 5. Runtime TEPIC versus Fimo-Prior HepG2 K562 GM12878 H1-hESC LiHe1 LiHe2 LiHe3 TEPIC 16h 23.1h 6.4 9.1h 18.2h 16.7h 26.8h Fimo-Prior 158h 115.7h 134.1h 109.6h 171.2h 175.7h 175.1h

277

References

1. S. Ahn, C. L. Huang, E. Ozkumur, X. Zhang, J. Chinnala, A. Yalcin, S. Bandyopadhyay, S. Russek, M. S. Unlu, C. DeLisi, and R. J. Irani. TATA binding proteins can recognize nontraditional DNA sequences. Biophys. J., 103(7):1510–1517, Oct 2012. 2. M. Ahrens, O. Ammerpohl, W. von Schonfels, J. Kolarova, S. Bens, T. Itzel, A. Teufel, A. Herrmann, M. Brosch, H. Hinrichsen, W. Erhart, J. Egberts, B. Sipos, S. Schreiber, R. Hasler, F. Stickel, T. Becker, M. Krawczak, C. Rocken, R. Siebert, C. Schafmayer, and J. Hampe. DNA methylation analysis in nonalcoholic fatty liver disease suggests distinct disease-specific and remodeling signatures after bariatric surgery. Cell Metab., 18(2):296–302, Aug 2013. 3. P. Benatti, D. Dolfini, A. Vigano, M. Ravo, A. Weisz, and C. Imbriano. Specific inhibition of NF-Y subunits triggers different cell proliferation defects. Nucleic Acids Res., 39(13):5356–5368, Jul 2011.

4. S. Benhamouche, T. Decaens, C. Godard, R. Chambrey, D. S. Rickman, C. Moinard, M. Vasseur-Cognet, C. J. Kuo, A. Kahn, C. Perret, and S. Colnot. Apc tumor suppressor gene is the ”zonation-keeper” of mouse liver. Dev. Cell, 10(6):759–770, Jun 2006.

5. M. Boergesen, T. A. Pedersen, B. Gross, S. J. van Heeringen, D. Hagenbeek, C. Bindesb?ll, S. Caron, F. Lalloyer, K. R. Steffensen, H. I. Nebb, J. A. Gustafsson, H. G. Stunnenberg, B. Staels, and S. Mandrup. Genome-wide profiling of liver X , , and peroxisome proliferator-activated receptor alpha in mouse liver reveals extensive sharing of binding sites. Mol. Cell. Biol., 32(4):852–867, Feb 2012.

6. M. J. Borok, V. E. Papaioannou, and L. Sussel. Unique functions of Gata4 in mouse liver induction and heart development. Dev. Biol., 410(2):213–222, Feb 2016. 7. E. W. Brenu, D. R. Staines, L. Tajouri, T. Huth, K. J. Ashton, and S. M. Marshall-Gradisnik. Heat shock proteins and regulatory T cells. Autoimmune Dis, 2013:813256, 2013.

21/30 8. J. A. Brewer, B. Khor, S. K. Vogt, L. M. Muglia, H. Fujiwara, K. E. Haegele, B. P. Sleckman, and L. J. Muglia. T-cell is required to suppress COX-2-mediated lethal immune activation. Nat. Med., 9(10):1318–1322, Oct 2003. 9. S. Burghardt, A. Erhardt, B. Claass, S. Huber, G. Adler, T. Jacobs, A. Chalaris, D. Schmidt-Arras, S. Rose-John, K. Karimi, and G. Tiegs. Hepatocytes contribute to immune regulation in the liver by activation of the Notch signaling pathway in T cells. J. Immunol., 191(11):5574–5582, Dec 2013.

10. S. S. Chambel, A. Santos-Goncalves, and T. L. Duarte. The Dual Role of Nrf2 in Nonalcoholic Fatty Liver Disease: Regulation of Antioxidant Defenses and Hepatic Lipid Metabolism. Biomed Res Int, 2015:597134, 2015. 11. H. Chen, S. Lu, J. Zhou, Z. Bai, H. Fu, X. Xu, S. Yang, B. Jiao, and Y. Sun. An integrated approach for the identification of USF1-centered transcriptional regulatory networks during liver regeneration. Biochim. Biophys. Acta, 1839(5):415–423, May 2014. 12. R. H. Costa, V. V. Kalinichenko, A. X. Holterman, and X. Wang. Transcription factors in liver development, differentiation, and regeneration. Hepatology, 38(6):1331–1347, Dec 2003.

13. G. Cuellar-Partida, F. A. Buske, R. C. McLeay, T. Whitington, W. S. Noble, and T. L. Bailey. Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics, 28(1):56–62, Jan 2012. 14. A. M. Diehl. Roles of CCAAT/enhancer-binding proteins in regulation of liver regenerative growth. J. Biol. Chem., 273(47):30843–30846, Nov 1998.

15. C. H. Evans, F. Liu, R. M. Porter, R. P. O’Sullivan, T. Merghoub, E. P. Lunsford, K. Robichaud, F. Van Valen, S. L. Lessnick, M. C. Gebhardt, and J. W. Wells. EWS-FLI-1-targeted cytotoxic T-cell killing of multiple tumor types belonging to the Ewing sarcoma family of tumors. Clin. Cancer Res., 18(19):5341–5351, Oct 2012.

16. S. Eyquem, K. Chemin, M. Fasseu, and J. C. Bories. The Ets-1 transcription factor is required for complete pre-T cell receptor function and allelic exclusion at the T cell receptor beta locus. Proc. Natl. Acad. Sci. U.S.A., 101(44):15712–15717, Nov 2004.

17. E. Falvey, F. Fleury-Olela, and U. Schibler. The rat hepatic leukemia factor (HLF) gene encodes two transcriptional activators with distinct circadian rhythms, tissue distributions and target preferences. EMBO J., 14(17):4307–4317, Sep 1995. 18. X. Feng, H. Wang, H. Takata, T. J. Day, J. Willen, and H. Hu. Transcription factor Foxp1 exerts essential cell-intrinsic regulation of the quiescence of naive T cells. Nat. Immunol., 12(6):544–550, Jun 2011. 19. A. Fernandez-Alvarez, M. Soledad Alvarez, C. Cucarella, and M. Casado. Characterization of the human insulin-induced gene 2 (INSIG2) promoter: the role of Ets-binding motifs. J. Biol. Chem., 285(16):11765–11774, Apr 2010.

20. J. M. Ferrell and J. Y. Chiang. Circadian rhythms in liver metabolism and disease. Acta Pharm Sin B, 5(2):113–122, Mar 2015.

22/30 21. J. D. Fleming, G. Pavesi, P. Benatti, C. Imbriano, R. Mantovani, and K. Struhl. NF-Y coassociates with FOS at promoters, enhancers, repetitive elements, and inactive chromatin regions, and is stereo-positioned with growth-controlling transcription factors. Genome Res., 23(8):1195–1209, Aug 2013. 22. B. Gao, L. Jiang, and G. Kunos. Transcriptional regulation of alpha(1b) adrenergic receptors (alpha(1b)AR) by nuclear factor 1 (NF1): a decline in the concentration of NF1 correlates with the downregulation of alpha(1b)AR gene expression in regenerating liver. Mol. Cell. Biol., 16(11):5997–6008, Nov 1996.

23. B. Gao, H. Wang, F. Lafdil, and D. Feng. STAT proteins - key regulators of anti-viral responses, inflammation, and tumorigenesis in the liver. J. Hepatol., 57(2):430–441, Aug 2012. 24. R. G. Gieling, A. M. Elsharkawy, J. H. Caamano, D. E. Cowie, M. C. Wright, M. R. Ebrahimkhani, A. D. Burt, J. Mann, P. Raychaudhuri, H. C. Liou, F. Oakley, and D. A. Mann. The c-Rel subunit of nuclear factor-kappaB regulates murine liver inflammation, wound-healing, and hepatocyte proliferation. Hepatology, 51(3):922–931, Mar 2010. 25. P. Godoy, N. J. Hewitt, U. Albrecht, M. E. Andersen, N. Ansari, S. Bhattacharya, J. G. Bode, J. Bolleyn, C. Borner, J. Bottger, et al. Recent advances in 2D and 3D in vitro systems using primary hepatocytes, alternative hepatocyte sources and non-parenchymal liver cells and their use in investigating mechanisms of hepatotoxicity, cell signaling and ADME. Arch. Toxicol., 87(8):1315–1530, Aug 2013. 26. F. Goeman, I. Manni, S. Artuso, B. Ramachandran, G. Toietta, G. Bossi, G. Rando, C. Cencioni, S. Germoni, S. Straino, M. C. Capogrossi, S. Bacchetti, A. Maggi, A. Sacchi, P. Ciana, and G. Piaggio. Molecular imaging of nuclear factor-Y transcriptional activity maps proliferation sites in live animals. Mol. Biol. Cell, 23(8):1467–1474, Apr 2012. 27. C. E. Grossman, Y. Qian, K. Banki, and A. Perl. ZNF143 mediates basal and tissue-specific expression of human transaldolase. J. Biol. Chem., 279(13):12190–12205, Mar 2004. 28. S. Guibert, Z. Zhao, M. Sjolinder, A. Gondor, A. Fernandez, V. Pant, and R. Ohlsson. CTCF-binding sites within the H19 ICR differentially regulate local chromatin structures and cis-acting functions. Epigenetics, 7(4):361–369, Apr 2012.

29. H. Guillou, P. G. Martin, and T. Pineau. Transcriptional regulation of hepatic metabolism. Subcell. Biochem., 49:3–47, 2008. 30. X. Guo, M. Yang, H. Gu, J. Zhao, and L. Zou. Decreased expression of SOX6 confers a poor prognosis in hepatocellular carcinoma. Cancer Epidemiol, 37(5):732–736, Oct 2013.

31. E. Gusmao, M. Allhoff, M. Zenke, and I. Costa. Analysis of computational footprinting methods for DNase sequencing experiments. Nature Methods, 2016. 32. G. He and M. Karin. NF-ˆIoB and STAT3 - key players in liver inflammation and cancer. Cell Res., 21(1):159–168, Jan 2011.

23/30 33. H. Heath, C. Ribeiro de Almeida, F. Sleutels, G. Dingjan, S. van de Nobelen, I. Jonkers, K. W. Ling, J. Gribnau, R. Renkawitz, F. Grosveld, et al. CTCF regulates cell cycle progression of alphabeta T cells in the thymus. EMBO J., 27(21):2839–2850, Nov 2008. 34. N. Hernandez. TBP, a universal eukaryotic transcription factor? Genes Dev., 7(7B):1291–1308, Jul 1993. 35. I. C. Ho, T. S. Tai, and S. Y. Pai. GATA3 and the T-cell lineage: essential functions before and after T-helper-2-cell differentiation. Nat. Rev. Immunol., 9(2):125–135, Feb 2009. 36. S. J. Holwerda and W. de Laat. CTCF: the protein, the binding partners, the binding sites and their chromatin loops. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 368(1620):20120369, 2013.

37. M. A. Hume, L. A. Barrera, S. S. Gisselbrecht, and M. L. Bulyk. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res., 43(Database issue):D117–122, Jan 2015. 38. M. M. Ibrahim, S. A. Lacadie, and U. Ohler. JAMM: a peak finder for joint analysis of NGS replicates. Bioinformatics, 31(1):48–55, Jan 2015.

39. H. Ishiguchi, H. Izumi, T. Torigoe, Y. Yoshida, H. Kubota, S. Tsuji, and K. Kohno. ZNF143 activates gene expression in response to DNA damage and binds to cisplatin-modified DNA. Int. J. Cancer, 111(6):900–909, Oct 2004. 40. B. Jaruga, F. Hong, W. H. Kim, and B. Gao. IFN-gamma/STAT1 acts as a proinflammatory signal in T cell-mediated hepatitis via induction of multiple chemokines and adhesion molecules: a critical role of IRF-1. Am. J. Physiol. Gastrointest. Liver Physiol., 287(5):G1044–1052, Nov 2004. 41. R. L. Jirtle, editor. Liver Regeneration and Carcinogenesis. Elsevier, 1995. 42. H. S. Kang, K. Okamoto, Y. S. Kim, Y. Takeda, C. D. Bortner, H. Dang, T. Wada, W. Xie, X. P. Yang, G. Liao, and A. M. Jetten. Nuclear TAK1/TR4-deficient mice are protected against obesity-linked inflammation, hepatic steatosis, and . Diabetes, 60(1):177–188, Jan 2011. 43. M. B. Kannan, V. Solovieva, and V. Blank. The small MAF transcription factors MAFF, MAFG and MAFK: current knowledge and perspectives. Biochim. Biophys. Acta, 1823(10):1841–1846, Oct 2012. 44. K. Kawabe, D. Lindsay, M. Braitch, A. J. Fahey, L. Showe, and C. S. Constantinescu. IL-12 inhibits glucocorticoid-induced T cell apoptosis by inducing GMEB1 and activating PI3K/Akt pathway. Immunobiology, 217(1):118–123, Jan 2012. 45. Y. Kidani, H. Elsaesser, M. B. Hock, L. Vergnes, K. J. Williams, J. P. Argus, B. N. Marbois, E. Komisopoulou, E. B. Wilson, T. F. Osborne, et al. Sterol regulatory element-binding proteins are essential for the metabolic programming of effector T cells and adaptive immunity. Nat. Immunol., 14(5):489–499, May 2013.

24/30 46. E. Ktistaki and I. Talianidis. Chicken ovalbumin upstream promoter transcription factors act as auxiliary cofactors for hepatocyte nuclear factor 4 and enhance hepatic gene expression. Mol. Cell. Biol., 17(5):2790–2797, May 1997. 47. I. V. Kulakovskiy, I. E. Vorontsov, I. S. Yevshin, A. V. Soboleva, A. S. Kasianov, H. Ashoor, W. Ba-Alawi, V. B. Bajic, Y. A. Medvedeva, F. A. Kolpakov, et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res., 44(D1):D116–125, Jan 2016.

48. K. P. Lai, J. Chen, M. He, A. K. Ching, C. Lau, P. B. Lai, K. F. To, and N. Wong. Overexpression of ZFX confers self-renewal and chemoresistance properties in hepatocellular carcinoma. Int. J. Cancer, 135(8):1790–1799, Oct 2014. 49. E. Lamber. Structural studies on Ets1 and Usf1 transcription factor complexes with DNA. PhD thesis, Universit¨atHeidelberg, 2005. 50. B. Langmead and S. L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nat. Methods, 9(4):357–359, Apr 2012. 51. C. Li, A. J. Lusis, R. Sparkes, S. M. Tran, and R. Gaynor. Characterization and chromosomal mapping of the gene encoding the cellular DNA binding protein HTLF. Genomics, 13(3):658–664, Jul 1992. 52. H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14):1754–1760, Jul 2009. 53. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, Aug 2009. 54. Q. Li, Y. Gao, Z. Jia, L. Mishra, K. Guo, Z. Li, X. Le, D. Wei, S. Huang, and K. Xie. Dysregulated Kr¨uppel-like factor 4 and signaling contribute to progression of hepatocellular carcinoma. Gastroenterology, 143(3):799–810, Sep 2012. 55. S. Li, W. Ma, T. Fei, Q. Lou, Y. Zhang, X. Cui, X. Qin, J. Zhang, G. Liu, Z. Dong, Y. Ma, Z. Song, and Y. Hu. Upregulation of 1 transcription activity is associated with hepatocellular carcinoma progression. Mol Med Rep, 10(5):2313–2321, Nov 2014.

56. T. Liu, Y. Zhou, K. S. Ko, and H. Yang. Interactions between Myc and Mediators of Inflammation in Chronic Liver Diseases. Mediators Inflamm., 2015:276850, 2015. 57. X. Liu, X. Yan, B. Zhong, R. I. Nurieva, A. Wang, X. Wang, N. Martin-Orozco, Y. Wang, S. H. Chang, E. Esplugues, R. A. Flavell, Q. Tian, and C. Dong. Bcl6 expression specifies the T follicular helper cell program in vivo. J. Exp. Med., 209(10):1841–1852, Sep 2012. 58. Y. Liu, K. D. Siegmund, P. W. Laird, and B. P. Berman. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biol., 13(7):R61, 2012.

59. M. Lohoff and T. W. Mak. Roles of interferon-regulatory factors in T-helper-cell differentiation. Nat. Rev. Immunol., 5(2):125–135, Feb 2005.

25/30 60. C. Lu, J. Y. Li, Z. Ge, L. Zhang, and G. P. Zhou. Par-4/THAP1 complex and Notch3 competitively regulated pre-mRNA splicing of CCAR1 and affected inversely the survival of T-cell acute lymphoblastic leukemia cells. Oncogene, 32(50):5602–5613, Dec 2013. 61. Y. Lu, Z. Ma, Z. Zhang, X. Xiong, X. Wang, H. Zhang, G. Shi, X. Xia, G. Ning, and X. Li. Yin Yang 1 promotes hepatic steatosis through repression of farnesoid X receptor in obese mice. Gut, 63(1):170–178, Jan 2014. 62. Y. Lu, X. Xiong, X. Wang, Z. Zhang, J. Li, G. Shi, J. Yang, H. Zhang, G. Ning, and X. Li. Yin Yang 1 promotes hepatic gluconeogenesis through upregulation of glucocorticoid receptor. Diabetes, 62(4):1064–1073, Apr 2013. 63. R. Luo, S. A. Klumpp, M. J. Finegold, and S. N. Maity. Inactivation of CBF/NF-Y in postnatal liver causes hepatocellular degeneration, lipid deposition, and endoplasmic reticulum stress. Sci Rep, 1:136, 2011.

64. F. Macian. NFAT proteins: key regulators of T-cell development and function. Nat. Rev. Immunol., 5(6):472–484, Jun 2005. 65. H. Malhi and R. J. Kaufman. Endoplasmic reticulum stress in liver disease. J. Hepatol., 54(4):795–809, Apr 2011.

66. V. Marfil, M. Moya, C. E. Pierreux, J. V. Castell, F. P. Lemaigre, F. X. Real, and R. Bort. Interaction between Hhex and SOX13 modulates Wnt/TCF activity. J. Biol. Chem., 285(8):5726–5737, Feb 2010. 67. K. Masuda, B. Ripley, R. Nishimura, T. Mino, O. Takeuchi, G. Shioi, H. Kiyonari, and T. Kishimoto. Arid5a controls IL-6 mRNA stability, which contributes to elevation of IL-6 level in vivo. Proc. Natl. Acad. Sci. U.S.A., 110(23):9409–9414, Jun 2013. 68. A. Mathelier, O. Fornes, D. J. Arenillas, C. Y. Chen, G. Denay, J. Lee, W. Shi, C. Shyr, G. Tan, R. Worsley-Hunt, et al. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res., 44(D1):D110–115, Jan 2016. 69. S. Mitsui, S. Yamaguchi, T. Matsuo, Y. Ishida, and H. Okamura. Antagonistic role of E4BP4 and PAR proteins in the circadian oscillatory mechanism. Genes Dev., 15(8):995–1006, Apr 2001. 70. D. Monte, L. Coutte, J. L. Baert, I. Angeli, D. Stehelin, and Y. de Launoit. Molecular characterization of the ets-related human transcription factor ER81. Oncogene, 11(4):771–779, Aug 1995. 71. C. Mounier and B. I. Posner. Transcriptional regulation by insulin: from the receptor to the gene. Can. J. Physiol. Pharmacol., 84(7):713–724, Jul 2006. 72. K. Mueller, J. Quandt, R. B. Marienfeld, P. Weihrich, K. Fiedler, M. Claussnitzer, H. Laumen, M. Vaeth, F. Berberich-Siebelt, E. Serfling, T. Wirth, and C. Brunner. Octamer-dependent transcription in T cells is mediated by NFAT and NF-ˆIoB. Nucleic Acids Res., 41(4):2138–2154, Feb 2013. 73. K. M. Mueller, M. Themanns, K. Friedbichler, J. W. Kornfeld, H. Esterbauer, J. P. Tuckermann, and R. Moriggl. Hepatic growth hormone and glucocorticoid receptor signaling in body growth, steatosis and metabolic liver cancer development. Mol. Cell. Endocrinol., 361(1-2):1–11, Sep 2012.

26/30 74. A. Mylona, R. Nicolas, D. Maurice, M. Sargent, D. Tuil, D. Daegelen, R. Treisman, and P. Costello. The essential function for in T-cell development reflects its specific coupling to extracellular signal-regulated kinase signaling. Mol. Cell. Biol., 31(2):267–276, Jan 2011. 75. L. Norton, X. Chen, M. Fourcaudot, N. K. Acharya, R. A. DeFronzo, and S. Heikkinen. The mechanisms of genome-wide target gene regulation by TCF7L2 in liver cells. Nucleic Acids Res., 42(22):13646–13661, Dec 2014. 76. T. E. Oakland, K. J. Haselton, and G. Randall. EWSR1 binds the hepatitis C virus cis-acting replication element and is required for efficient viral replication. J. Virol., 87(12):6625–6634, Jun 2013. 77. R. C. Pereira, A. M. Delany, and E. Canalis. CCAAT/enhancer binding protein homologous protein (DDIT3) induces osteoblastic cell differentiation. Endocrinology, 145(4):1952–1960, Apr 2004.

78. J. Pitkanen and P. Peterson. : from loss of function to autoimmunity. Genes Immun., 4(1):12–21, Jan 2003. 79. J. Planchais, M. Boutant, V. Fauveau, L. D. Qing, L. Sabra-Makke, P. Bossard, M. Vasseur-Cognet, and J. P. Pegorier. The role of chicken ovalbumin upstream promoter transcription factor II in the regulation of hepatic fatty acid oxidation and gluconeogenesis in newborn mice. Am. J. Physiol. Endocrinol. Metab., 308(10):E868–878, May 2015. 80. E. E. Prieschl, V. Novotny, R. Csonga, D. Jaksche, A. Elbe-Burger, W. Thumb, M. Auer, G. Stingl, and T. Baumruker. A novel splice variant of the transcription factor Nrf1 interacts with the TNFalpha promoter and stimulates transcription. Nucleic Acids Res., 26(10):2291–2297, May 1998. 81. M. T. Pritchard and L. E. Nagy. Ethanol-induced liver injury: potential roles for egr-1. Alcohol. Clin. Exp. Res., 29(11 Suppl):146S–150S, Nov 2005. 82. C. Qi, Y. Zhu, and J. K. Reddy. Peroxisome proliferator-activated receptors, coactivators, and downstream targets. Cell Biochem. Biophys., 32 Spring:187–204, 2000. 83. A. Rani, B. Afzali, A. Kelly, L. Tewolde-Berhan, M. Hackett, A. S. Kanhere, I. Pedroza-Pacheco, H. Bowen, S. Jurcevic, R. G. Jenner, D. J. Cousins, J. A. Ragheb, P. Lavender, and S. John. IL-2 regulates expression of C-MAF in human CD4 T cells. J. Immunol., 187(7):3721–3729, Oct 2011. 84. B. L. Rellahan, J. P. Jensen, T. K. Howcroft, D. S. Singer, E. Bonvini, and A. M. Weissman. Elf-1 regulates basal expression from the T cell antigen receptor zeta-chain gene promoter. J. Immunol., 160(6):2794–2801, Mar 1998. 85. L. Riera-Sans and A. Behrens. Regulation of alphabeta/gammadelta T cell development by the activator protein 1 transcription factor c-Jun. J. Immunol., 178(9):5690–5700, May 2007. 86. S. M. Robinson and D. A. Mann. Role of nuclear factor kappaB in liver health and disease. Clin. Sci., 118(12):691–705, Jun 2010. 87. L. Rui. Energy metabolism in the liver. Compr Physiol, 4(1):177–197, Jan 2014.

27/30 88. J. A. Sanders and P. A. Gruppuso. Coordinated regulation of c-Myc and Max in rat liver development. Am. J. Physiol. Gastrointest. Liver Physiol., 290(1):G145–155, Jan 2006. 89. G. T. Schwenger, R. Fournier, L. M. Hall, C. J. Sanderson, and V. A. Mordvinov. Nuclear factor of activated T cells and YY1 combine to repress IL-5 expression in a human T-cell line. J. Allergy Clin. Immunol., 104(4 Pt 1):820–827, Oct 1999. 90. X. B. Shen, L. Huang, S. H. Zhang, D. P. Wang, Y. L. Wu, W. N. Chen, S. H. Xu, and X. Lin. Transcriptional regulation of the apolipoprotein F (ApoF) gene by ETS and C/EBPalpha in hepatoma cells. Biochimie, 112:1–9, May 2015. 91. E. M. Smith, M. Gregg, F. Hashemi, L. Schott, and T. K. Hughes. Corticotropin Releasing Factor (CRF) activation of NF-kappaB-directed transcription in leukocytes. Cell. Mol. Neurobiol., 26(4-6):1021–1036, 2006. 92. M. Smith-Raska. Role of the Transcription Factor Zfx in T Cell . PhD thesis, Columbia University, 2011. 93. A. Y. So, Y. Garcia-Flores, A. Minisandram, A. Martin, K. Taganov, M. Boldin, and D. Baltimore. Regulation of APC development, immune response, and autoimmunity by Bach1/HO-1 pathway in mice. Blood, 120(12):2428–2437, Sep 2012. 94. A. Son, H. Nakamura, H. Okuyama, S. Oka, E. Yoshihara, W. Liu, Y. Matsuo, N. Kondo, H. Masutani, Y. Ishii, and others. Dendritic cells derived from TBP-2-deficient mice are defective in inducing T cell responses. Eur. J. Immunol., 38(5):1358–1367, May 2008. 95. C. B. Stephensen, A. D. Borowsky, and K. C. Lloyd. Disruption of Rxra gene in thymocytes and T lymphocytes modestly alters lymphocyte frequencies, proliferation, survival and T helper type 1/type 2 balance. Immunology, 121(4):484–498, Aug 2007. 96. E. Stepniak, R. Ricci, R. Eferl, G. Sumara, I. Sumara, M. Rath, L. Hui, and E. F. Wagner. c-Jun/AP-1 controls liver regeneration by repressing /p21 and p38 MAPK activity. Genes Dev., 20(16):2306–2314, Aug 2006. 97. A. Subramaniam, M. K. Shanmugam, E. Perumal, F. Li, A. Nachiyappan, X. Dai, S. N. Swamy, K. S. Ahn, A. P. Kumar, B. K. Tan, K. M. Hui, and G. Sethi. Potential role of signal transducer and activator of transcription (STAT)3 signaling pathway in inflammation, survival, proliferation and invasion of hepatocellular carcinoma. Biochim. Biophys. Acta, 1835(1):46–60, Jan 2013. 98. K. Sun, M. A. Battle, R. P. Misra, and S. A. Duncan. Hepatocyte expression of serum response factor is essential for liver function, hepatocyte proliferation and survival, and postnatal body growth in mice. Hepatology, 49(5):1645–1654, May 2009. 99. D. A. Thomas and J. Massague. TGF-beta directly targets cytotoxic T cell functions during tumor evasion of immune surveillance. Cancer Cell, 8(5):369–380, Nov 2005. 100. J. Timsit, C. Saint-Martin, D. Dubois-Laforgue, and C. Bellanne-Chantelot. Searching for Maturity-Onset Diabetes of the Young (MODY): When and What for? Can J Diabetes, Apr 2016.

28/30 101. X. Tong, M. Muchnik, Z. Chen, M. Patel, N. Wu, S. Joshi, L. Rui, M. A. Lazar, and L. Yin. Transcriptional repressor E4-binding protein 4 (E4BP4) regulates metabolic hormone fibroblast growth factor 21 (FGF21) during circadian cycles and feeding. J. Biol. Chem., 285(47):36401–36409, Nov 2010. 102. C. Trapnell, L. Pachter, and S. L. Salzberg. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25(9):1105–1111, May 2009. 103. C. Trapnell, B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, and L. Pachter. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol., 28(5):511–515, May 2010. 104. G. Vahedi, H. Takahashi, S. Nakayamada, H. W. Sun, V. Sartorelli, Y. Kanno, and J. J. O’Shea. STATs shape the active enhancer landscape of T cell populations. Cell, 151(5):981–993, Nov 2012.

105. F. Vaillant, K. Blyth, L. Andrew, J. C. Neil, and E. R. Cameron. Enforced expression of Runx2 perturbs T cell development at a stage coincident with beta-selection. J. Immunol., 169(6):2866–2874, Sep 2002. 106. P. Van Vlierberghe, A. Ambesi, I. Rigo, E. Paietta, J. Racevskis, P. H. Wiernik, J. M. Rowe, M. Rue, and A. A. Ferrando. ETV6 is an early T-cell progenitor (ETP) specific tumor suppressor gene in adult T-ALL, 2011. 53rd ASH Annual Meeting and Exposition, Oral and Poster Abstracts. 107. D. Wang, S. Han, X. Wang, R. Peng, and X. Li. SOX5 promotes epithelial-mesenchymal transition and cell invasion via regulation of Twist1 in hepatocellular carcinoma. Med. Oncol., 32(2):461, Feb 2015.

108. H. C. Wang, T. Y. Li, Y. J. Chao, Y. C. Hou, Y. S. Hsueh, K. H. Hsu, and Y. S. Shan. KIT Exon 11 Codons 557-558 Deletion Mutation Promotes Liver Metastasis Through the CXCL12/CXCR4 Axis in Gastrointestinal Stromal Tumors. Clin. Cancer Res., Mar 2016.

109. J. Wang, S. Lee, C. E. Teh, K. Bunting, L. Ma, and M. F. Shannon. The transcription repressor, ZEB1, cooperates with CtBP2 and HDAC1 to suppress IL-2 gene activation in T cells. Int. Immunol., 21(3):227–235, Mar 2009. 110. L. Wang, K. F. Wildt, E. Castro, Y. Xiong, L. Feigenbaum, L. Tessarollo, and R. Bosselut. The transcription factor Zbtb7b represses CD8-lineage gene expression in peripheral CD4+ T cells. Immunity, 29(6):876–887, Dec 2008.

111. X. L. Wang, R. Suzuki, K. Lee, T. Tran, J. E. Gunton, A. K. Saha, M. E. Patti, A. Goldfine, N. B. Ruderman, F. J. Gonzalez, and C. R. Kahn. Ablation of ARNT/HIF1beta in liver alters gluconeogenesis, lipogenic gene expression, and serum ketones. Cell Metab., 9(5):428–439, May 2009.

112. E. N. Wardle. Guide to Signal Pathways in Immune Cells. Springer Science and Business Media, 2009. page 233. 113. W. F. Wong, K. Kohu, A. Nakamura, M. Ebina, T. Kikuchi, R. Tazawa, K. Tanaka, S. Kon, T. Funaki, A. Sugahara-Tobinai, et al. Runx1 deficiency in CD4+ T cells causes fatal autoimmune inflammatory lung disease due to spontaneous hyperactivation of cells. J. Immunol., 188(11):5408–5420, Jun 2012.

29/30 114. Z. Xu, L. Chen, L. Leung, T. S. Yen, C. Lee, and J. Y. Chan. Liver-specific inactivation of the Nrf1 gene in adult mouse leads to nonalcoholic steatohepatitis and hepatic neoplasia. Proc. Natl. Acad. Sci. U.S.A., 102(11):4120–4125, Mar 2005. 115. J. Yang, L. E. Mowry, K. N. Nejak-Bowen, H. Okabe, C. R. Diegel, R. A. Lang, B. O. Williams, and S. P. Monga. beta-catenin signaling in murine liver zonation and regeneration: a Wnt-Wnt situation! Hepatology, 60(3):964–976, Sep 2014. 116. X. O. Yang, R. T. Doty, J. S. Hicks, and D. M. Willerford. Regulation of T-cell receptor D beta 1 promoter by KLF5 through reiterated GC-rich motifs. Blood, 101(11):4492–4499, Jun 2003. 117. P. Ye and D. E. Kirschner. Reevaluation of T cell receptor excision circles as a measure of human recent thymic emigrants. J. Immunol., 168(10):4968–4979, May 2002.

118. C. R. Yu, R. M. Mahdi, S. Ebong, B. P. Vistica, J. Chen, Y. Guo, I. Gery, and C. E. Egwuagu. Cell proliferation and STAT6 pathways are negatively regulated in T cells by STAT1 and suppressors of cytokine signaling. J. Immunol., 173(2):737–746, Jul 2004.

119. S. Yu, D. M. Zhao, R. Jothi, and H. H. Xue. Critical requirement of GABPalpha for normal T cell development. J. Biol. Chem., 285(14):10179–10188, Apr 2010. 120. R. Yucel, H. Karsunky, L. Klein-Hitpass, and T. Moroy. The transcriptional repressor Gfi1 affects development of early, uncommitted c-Kit+ T cell progenitors and CD4/CD8 lineage decision in the thymus. J. Exp. Med., 197(7):831–844, Apr 2003.

121. E. M. Zardi, L. Navarini, G. Sambataro, P. Piccinni, F. M. Sambataro, C. Spina, and A. Dobrina. Hepatic PPARs: their role in liver physiology, fibrosis and treatment. Curr. Med. Chem., 20(27):3370–3396, 2013. 122. Y. Zhang, T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nusbaum, R. M. Myers, M. Brown, W. Li, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol., 9(9):R137, 2008. 123. Y. Zhang, S. Zhang, X. Wang, J. Liu, L. Yang, S. He, L. Chen, and J. Huang. Prognostic significance of FOXP1 as an oncogene in hepatocellular carcinoma. J. Clin. Pathol., 65(6):528–533, Jun 2012.

30/30