Supplementary material

Olgert Denas September 24, 2014

List of Figures

1 Workflow of data and experiments
2 The number of TFos for all human and mouse cell types mapped by each of the considered whole genome alignments
3 For each human celltype-factor pair we tested whether the SeqCons rate (of all TFos how many are SeqCons) is higher than expected by a Binomial test
4 For each mouse celltype-factor pair we tested whether the SeqCons rate (of all TFos how many are SeqCons) is higher than expected by a Binomial test
5 The distribution of mouse mappable TFos across cell types
6 The distribution of human mappable TFos across cell types
7 The distribution of mouse mappable TFos across transcription factors
8 The distribution of human mappable TFos across transcription factors
9 The distribution of mouse mappable TFos nucleotides across cell types
10 The distribution of human mappable TFos nucleotides across cell types
11 The distribution of mouse mappable TFos nucleotides across transcription factors
12 The distribution of human mappable TFos nucleotides across transcription factors
13 For each celltype-factor pair we tested whether the (log) binding signal over FunctCons or FunctActive elements was significantly different from that over SeqCons elements
14 For each celltype-factor pair we tested whether the (log) binding signal over FunctCons or FunctActive elements was significantly different from that over SeqCons elements
15 Classification of mappable elements for all three analogous cell types
16 Classification of mappable elements for all three analogous cell types
17 The plot shows the number of FunctActive elements from a query assay as a function of the subset size
18 The plot shows the number of FunctActive elements from a query assay with respect to a subset of assays from the other species
19 We modeled the situation of a set of assays that have no co-association, thus overlap with an exclusive set of TFos on the other species
20 A mouse Rad21 occupancy site mapped on human chromosome 20. Mapping is guided by the human-mouse whole genome alignments which report 5 insertions in human. We classified this mouse TFos as FunctCons, as its mapped version in human overlaps with Rad21 occupancy sites in K562.

List of Tables

1 Peak signal statistics by peak classes
2 Analogous cells
3 Counts of mappable, functionally active, and total TFos. The table reports both the element count and the coverage in mega bases
4 Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length.
5 Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length. Continues
6 Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length.

1 Supplementary figures

Figure 1: Workflow of data and experiments.

[Figure 2 plot. y-axis: Cell type (hg19 and mm9); x-axis: element count (0e+00 to 6e+05); legend item_class: epo, shared, ucsc.]

Figure 2: The number of TFos for all human and mouse cell types mapped by each of the considered whole genome alignments. Here we show only those TFos that overlap a DHS peak by at least half their length. See Supplementary Table 4 for more details.

[Figure 3 plot. y-axis: 0.00 to 1.00; x-axis: 1:nrow(a) (0 to 400); legend significant: TRUE, FALSE.]

Figure 3: For each human celltype-factor pair we tested whether the SeqCons rate (of all TFos how many are SeqCons) is higher than expected by a Binomial test. The vertical lines show the estimated 99% confidence interval and the black line indicates the expected rate. The color of the vertical lines indicates whether the interval contains the expected value.
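The test and interval described in this caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the exact one-sided binomial tail and the Wilson score interval are assumptions (the caption does not say how the 99% interval was estimated), and the counts and expected rate are hypothetical values chosen for the example.

```python
from math import comb, sqrt

def binom_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): exact one-sided binomial p-value."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def wilson_interval(k, n, z=2.5758):
    """Wilson score interval for a proportion; z = 2.5758 gives about 99%."""
    phat = k / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical celltype-factor pair: 70 SeqCons elements out of 100 TFos,
# compared against an assumed expected SeqCons rate of 0.5.
k, n, p0 = 70, 100, 0.5
p_value = binom_upper_tail(k, n, p0)
low, high = wilson_interval(k, n)
# Mirrors the coloring rule in the caption: the pair is flagged when the
# interval does not contain the expected rate.
significant = not (low <= p0 <= high)
```

The direct summation is only practical for small n; for the TFos counts reported in the tables a statistics library routine such as `scipy.stats.binomtest` would be the more robust choice.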

[Figure 4 plot. y-axis: 0.4 to 0.8; x-axis: 1:nrow(a) (0 to 120); legend significant: TRUE.]

Figure 4: For each mouse celltype-factor pair we tested whether the SeqCons rate (of all TFos how many are SeqCons) is higher than expected by a Binomial test. The vertical lines show the estimated 99% confidence interval and the black line indicates the expected rate. The color of the vertical lines indicates whether the interval contains the expected value.

[Figure 5 plot. y-axis: Cell (mouse cell types); x-axis: Fraction of elements (0.7 to 0.9).]

Figure 5: The distribution of mouse mappable TFos across cell types. The box-plot for each cell type summarizes the distribution of values for the fraction of elements that can be mapped on the other species. Occupied segments for each cell type contribute one value to the distribution.

[Figure 6 plot. y-axis: Cell (human cell types); x-axis: Fraction of elements (0.5 to 0.9).]

Figure 6: The distribution of human mappable TFos across cell types. The box-plot for each cell type summarizes the distribution of values for the fraction of elements that can be mapped on the other species.

[Figure 7 plot. y-axis: Tf (mouse transcription factors); x-axis: Fraction of elements (0.6 to 0.9).]

Figure 7: The distribution of mouse mappable TFos across transcription factors. The box-plot for each transcription factor summarizes the distribution of values for the fraction of elements that can be mapped on the other species.

[Figure 8 plot. y-axis: Tf (human transcription factors); x-axis: Fraction of elements (0.25 to 0.75).]

Figure 8: The distribution of human mappable TFos across transcription factors. The box-plot for each transcription factor summarizes the distribution of values for the fraction of elements that can be mapped on the other species.

[Figure 9 plot. y-axis: Cell (mouse cell types); x-axis: Fraction of bases (0.60 to 0.85).]

Figure 9: The distribution of mouse mappable TFos nucleotides across cell types. The box-plot for each cell type summarizes the distribution of values for the fraction of nucleotides that can be mapped on the other species.

[Figure 10 plot. y-axis: Cell (human cell types); x-axis: Fraction of bases (0.5 to 0.7).]

Figure 10: The distribution of human mappable TFos nucleotides across cell types. The box-plot for each cell type summarizes the distribution of values for the fraction of nucleotides that can be mapped on the other species.

[Figure 11 plot. y-axis: Tf (mouse transcription factors); x-axis: Fraction of bases (0.5 to 0.8).]

Figure 11: The distribution of mouse mappable TFos nucleotides across transcription factors. The box-plot for each transcription factor summarizes the distribution of values for the fraction of nucleotides that can be mapped on the other species.

[Figure 12 plot. y-axis: Tf (human transcription factors); x-axis: Fraction of bases (0.2 to 0.8).]

Figure 12: The distribution of human mappable TFos nucleotides across transcription factors. The box-plot for each transcription factor summarizes the distribution of values for the fraction of nucleotides that can be mapped on the other species.

[Figure 13 plot (hg19). Grid of cell (y-axis) by tf (x-axis), colored by pval (0.00 to 0.75); '*' marks individual celltype-factor pairs.]

Figure 13: For each celltype-factor pair we tested whether the (log) binding signal over FunctCons or FunctActive elements was significantly different from that over SeqCons elements. Cases with a Bonferroni corrected error rate of 1% are marked with a ’*’; those for which there were not enough elements to perform the test are labeled with gray; white positions are missing data.
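The Bonferroni marking used in this caption can be sketched as below. This is an illustrative assumption-laden example: the caption does not name the statistical test that produced the per-pair p-values, so the p-values here are hypothetical inputs, and `bonferroni_significant` is a helper name introduced only for this sketch.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Flag each test whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {pair: p < threshold for pair, p in p_values.items()}

# Hypothetical p-values keyed by (cell type, factor); not taken from the data.
p_values = {
    ("K562", "Ctcf"): 1e-6,
    ("K562", "Max"): 0.004,
    ("Hepg2", "Rad21"): 0.02,
}
# With three tests the per-test threshold is 0.01 / 3, so only the first
# pair would receive a '*' in a plot like the one described above.
stars = bonferroni_significant(p_values)
```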

[Figure 14 plot (mm9). Grid of cell (y-axis) by tf (x-axis), colored by pval (0.25 to 0.75); '*' marks individual celltype-factor pairs.]

Figure 14: For each celltype-factor pair we tested whether the (log) binding signal over FunctCons or FunctActive elements was significantly different from that over SeqCons elements. Cases with a Bonferroni corrected error rate of 1% are marked with a ’*’; those for which there were not enough elements to perform the test are labeled with gray; white positions are missing data.

[Figure 15 plot (erythroleukemia). Panels: mm9, hg19; y-axis: fraction of elements (0.0 to 0.8); x-axis: TF; legend cls: FunctCons, FunctActive, SeqCons.]

Figure 15: Classification of mappable elements for all three analogous cell types

[Figure 16 plot (lymphoblastoid). Panels: mm9, hg19; y-axis: fraction of elements (0.0 to 0.8); x-axis: TF; legend cls: FunctCons, FunctActive, SeqCons.]

Figure 16: Classification of mappable elements for all three analogous cell types

[Figure 17 plot. Panel titles: Mel Max, Mel Bhl, Mel Usf, Mel Chd2, Mel Ctcf, Mel Cmyc, Mel Corest, Mel Ets1, Mel Max, Mel Gata1, Mel Jund, Mel Mafk, Mel Mxi, Mel Maz, Mel P300, Mel Rad21, Mel Sin3.]

Figure 17: The plot shows the number of FunctActive elements from a query assay as a function of the subset size. We perform multiple countings for each number of assays from the other species.

[Figure 18 plot. Panel titles: Mel Bhl, Mel Chd2, Mel Cmyc, Mel Corest, Mel Ctcf, Mel Gata1, Mel Jund, Mel Mafk, Mel Max, Mel Maz, Mel Mxi, Mel P300, Mel Rad21, Mel Sin3, Mel Smc3, Mel Tal1, Mel Tbp, Mel Znf, Mel Ubf, Mel Usf.]

Figure 18: The plot shows the number of FunctActive elements from a query assay with respect to a subset of assays from the other species. We perform multiple countings for each number of assays from the other species.

[Figure 19 plot. y-axis ticks: 0 to 1000; x-axis ticks: 1 to 113.]

Figure 19: We modeled the situation of a set of assays that have no co-association, thus overlap with an exclusive set of TFos on the other species. The figure shows a simulation for 100 assays covering 1000 TFos on the other species such that the sets of covered TFos are pairwise disjoint. We perform multiple countings for each number of assays from the other species.
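The simulation described in this caption can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the original code: we assume the 1000 TFos are split evenly among the 100 assays (10 each), that a "counting" is one random draw of a subset of assays, and that 5 repeats per subset size are enough for illustration. The function name and its parameters are hypothetical.

```python
import random

def simulate_disjoint(n_assays=100, n_tfos=1000, repeats=5, seed=0):
    """Count TFos covered by random subsets of assays with pairwise disjoint sets."""
    rng = random.Random(seed)
    per_assay = n_tfos // n_assays  # assumption: equal-sized disjoint sets
    assays = [set(range(i * per_assay, (i + 1) * per_assay))
              for i in range(n_assays)]
    counts = {}
    for k in range(1, n_assays + 1):
        counts[k] = []
        for _ in range(repeats):
            chosen = rng.sample(assays, k)    # random subset of k assays
            covered = set().union(*chosen)    # TFos covered by the subset
            counts[k].append(len(covered))
    return counts

counts = simulate_disjoint()
```

Because the sets are disjoint, every draw of k assays covers exactly k times the per-assay set size, so the count grows linearly with the subset size. Co-associated assays would instead share TFos and show a curve that flattens.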

Figure 20: A mouse Rad21 occupancy site mapped on human chromosome 20. Mapping is guided by the human-mouse whole genome alignments which report 5 insertions in human. We classified this mouse TFos as FunctCons, as its mapped version in human overlaps with Rad21 occupancy sites in K562.

2 Supplementary tables

                     Min.  1st Qu.  Median   Mean  3rd Qu.    Max.
FunctActive (mm9)     5.4     38.3    74.0  131.1    164.5  7960.0
FunctCons (mm9)      14.6     74.9   153.7  227.4    302.5  6875.0
NonMapped (mm9)       5.4     41.2    83.1  135.2    174.9  8899.0
SeqCons (mm9)         5.4     38.5    74.0  121.2    157.0  4958.0
FunctActive (hg19)    9.8     34.7    65.9   98.4    137.9  1370.0
FunctCons (hg19)     14.1     41.3    77.9   98.6    139.1   829.9
NonMapped (hg19)      9.8     33.1    59.5   85.6    114.1  1588.0
SeqCons (hg19)        9.7     30.9    51.0   74.9     95.8  1239.0

Table 1: Peak signal statistics by peak classes

    Analogous cell pair (human, mouse)  Transcription factor
 1  leuk (K562, Mel)                    Gata1
 2  leuk (K562, Mel)                    Pol2s2
 3  leuk (K562, Mel)                    Usf2
 4  leuk (K562, Mel)                    Ctcfb
 5  leuk (K562, Mel)                    Ets1
 6  leuk (K562, Mel)                    Tbp
 7  leuk (K562, Mel)                    Max
 8  leuk (K562, Mel)                    Mafkab50322
 9  leuk (K562, Mel)                    Chd2ab68301
10  leuk (K562, Mel)                    P300
11  leuk (K562, Mel)                    Jund
12  leuk (K562, Mel)                    Smc3ab9263
13  leuk (K562, Mel)                    Mxi1af4185
14  leuk (K562, Mel)                    Ctcf
15  leuk (K562, Mel)                    Cmyc
16  leuk (K562, Mel)                    Rad21
17  leuk (K562, Mel)                    Pol2
18  lymph (Gm12878, Ch12)               Pol2s2
19  lymph (Gm12878, Ch12)               Sin3anb6001263
20  lymph (Gm12878, Ch12)               Usf2
21  lymph (Gm12878, Ch12)               Ets1
22  lymph (Gm12878, Ch12)               Tbp
23  lymph (Gm12878, Ch12)               Max
24  lymph (Gm12878, Ch12)               Chd2ab68301
25  lymph (Gm12878, Ch12)               Jund
26  lymph (Gm12878, Ch12)               Smc3ab9263
27  lymph (Gm12878, Ch12)               Corestsc30189
28  lymph (Gm12878, Ch12)               Mazab85725
29  lymph (Gm12878, Ch12)               E2f4
30  lymph (Gm12878, Ch12)               Ctcf
31  lymph (Gm12878, Ch12)               Rad21
32  lymph (Gm12878, Ch12)               Pol2

Table 2: Analogous cells

   Species  Counting  Total Count  Function conserved  Sequence conserved
1  mm9      elements       727680              397846              503710
2  hg19     elements      5330864             2065660             3850414
3  mm9      Mega-nt            32                  16                  22
4  hg19     Mega-nt           121                  25                  83

Table 3: Counts of mappable, functionally active, and total TFos. The table reports both the element count and the coverage in mega bases

    cell          epo     ucsc     both  species
 1  Cbellum     13339    16961    12301  mm9
 2  Erythrobl    1472     1919     1370  mm9
 3  Cortex      17450    21978    16072  mm9
 4  Ch12        65349    84997    60454  mm9
 5  Mel        111971   143033   102328  mm9
 6  Bmarrow     13758    17662    12558  mm9
 7  G1eer4e2     6610     8332     6007  mm9
 8  Limb        10018    12634     9243  mm9
 9  Heart       22077    28145    20571  mm9
10  G1eer4       7543     9520     6872  mm9
11  Ese14        3738     4795     3401  mm9
12  Wbrain      10163    12937     9416  mm9
13  Mef         15423    19776    14283  mm9
14  Lung        18262    23237    16796  mm9
15  Bmdm         1503     1944     1353  mm9
16  Testis      10113    12846     9209  mm9
17  Liver       16232    20546    14857  mm9
18  Spleen       1979     2770     1874  mm9
19  Smint        8595    11012     7880  mm9
20  Megakaryo     633      850      603  mm9
21  Thymus       4530     6123     4234  mm9
22  G1e         10300    13041     9353  mm9
23  Olfact       3385     4535     3198  mm9
24  Kidney      14969    18594    13616  mm9
25  Esb4        18483    23375    16845  mm9

Table 4: Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length.
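The mappability criterion stated in this caption can be sketched as follows. This is a minimal illustration under explicit assumptions: coordinates are half-open intervals on a single chromosome, the half-length requirement is applied separately to the DHS overlap and to the aligned (shared human-mouse) DNA overlap, and the function names are hypothetical. The caption does not say whether the two conditions are evaluated separately or on their intersection.

```python
def covered_length(start, end, intervals):
    """Number of bases in [start, end) covered by the union of the intervals."""
    covered = 0
    cursor = start  # rightmost position already counted
    for s, e in sorted(intervals):
        s, e = max(s, cursor), min(e, end)
        if s < e:
            covered += e - s
            cursor = e
    return covered

def is_mappable(element, dhs, aligned, min_fraction=0.5):
    """True if DHS and aligned DNA each cover at least min_fraction of the element."""
    start, end = element
    need = min_fraction * (end - start)
    return (covered_length(start, end, dhs) >= need
            and covered_length(start, end, aligned) >= need)

# Hypothetical 100 bp element with 60 bp of DHS and 70 bp of aligned DNA.
element = (100, 200)
dhs = [(90, 160)]
aligned = [(120, 150), (140, 190)]
mappable = is_mappable(element, dhs, aligned)
```

Sorting the intervals and tracking a cursor avoids double counting when the DHS peaks or alignment blocks overlap each other.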

    cell          epo     ucsc     both  species
 1  A549       145435   192624   136593  hg19
 2  Helas3     191883   257474   181513  hg19
 3  Gm12878    323618   434881   303925  hg19
 4  U2os         1085     1529     1013  hg19
 5  H1hesc     179399   238861   168954  hg19
 6  Pbde         5011     7039     4773  hg19
 7  Mcf10aes   182743   232988   171603  hg19
 8  Gm12891     39863    54887    37533  hg19
 9  Ecc1        28441    37530    26925  hg19
10  K562       424430   579307   398262  hg19
11  Hepg2      318167   423487   298333  hg19
12  Nb4         32771    44031    30571  hg19
13  Gm06990     12317    15951    11439  hg19
14  Huvec       51885    68268    49157  hg19
15  Gm12872     12756    16409    11787  hg19
16  Wi38         5526     7106     5123  hg19
17  T47d        35984    46540    33810  hg19
18  Mcf7        23622    31147    22094  hg19
19  Hcm         11542    14928    10751  hg19
20  Hasp        16039    20502    14901  hg19
21  Nhek        10680    13814     9941  hg19
22  Hpaf        16072    20530    14939  hg19
23  Hmf         16625    21272    15498  hg19
24  Hct116      33158    45045    31284  hg19
25  Pfsk1       11124    15420    10651  hg19
26  Gm12892     25328    36539    24239  hg19
27  Hek293      25114    33422    23552  hg19
28  Hl60         4840     6146     4455  hg19
29  Rptec       14744    18840    13675  hg19
30  Panc1        7341    10445     7004  hg19
31  Imr90       62159    79214    58040  hg19
32  Hee         12157    15463    11285  hg19
33  Caco2       18754    23826    17375  hg19
34  Hek293t      4898     6799     4695  hg19
35  Hffmyc       8425    10735     7806  hg19
36  Ag09319     16126    20650    15035  hg19
37  Sknshra     88506   113500    83167  hg19
38  Be2c        14342    18389    13315  hg19
39  Nhlf        10561    13521     9825  hg19
40  Sknsh      115218   149442   108746  hg19
41  Hrpe        18594    23642    17257  hg19
42  Gm15510      7589    10884     7242  hg19
43  Werirb1     19255    24568    17815  hg19
44  Gm12864     13074    16900    12118  hg19
45  Hbmec       17228    22036    15998  hg19
46  Saec        14583    18802    13537  hg19
47  Aoaf        19868    25343    18479  hg19
48  Gm18526      5204     7519     4981  hg19
49  Gm12865     14135    18284    13104  hg19
50  Gm18951      8262    11677     7868  hg19

Table 5: Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length. Continues ...

    cell          epo     ucsc     both  species
51  Nt2d1        2431     3320     2382  hg19
52  Gm19099      7419    10467     7078  hg19
53  Gm18505      6578     9431     6282  hg19
54  Hmec        18667    24058    17411  hg19
55  Nhdfneo     12788    16358    11901  hg19
56  Sknmc        8670    11862     8255  hg19
57  Hvmf        12829    16426    11886  hg19
58  Bj          13007    16754    12095  hg19
59  Hpf         15975    20473    14891  hg19
60  Hcpe        18749    23982    17440  hg19
61  Raji         4356     6256     4145  hg19
62  Hac         11025    14109    10283  hg19
63  Ag04450     13815    17771    12862  hg19
64  Hcfaa       14202    18203    13216  hg19
65  Hre         13388    17115    12457  hg19
66  Gm10847      4313     6156     4121  hg19
67  Ag04449     24796    31787    23088  hg19
68  U87          8224    11328     7859  hg19
69  Ag10803     17574    22431    16350  hg19
70  Shsy5y       4937     6263     4720  hg19
71  Gm12875     12409    16123    11517  hg19
72  Gm19193      6284     9018     6028  hg19
73  Gm12801       862     1089      797  hg19
74  Gm08714         1        7        1  hg19
75  Pbdefetal     258      348      248  hg19
76  Gm12873     14378    18541    13320  hg19
77  Hff         17984    22979    16688  hg19
78  Gm12874     12534    16237    11610  hg19
79  Ag09309     11504    14748    10680  hg19

Table 6: Number of TFos mapped by each alignment for select assays. An element can be mapped if it overlaps DHS elements and shared human mouse DNA by half of its length.
