Transcriptomics-guided design of synthetic promoters for a mammalian system

Joseph K. Cheng1 and Hal S. Alper1,2,*

1Department of Chemical Engineering, The University of Texas at Austin, 200 E Dean Keeton St. Stop C0400, Austin, Texas 78712
2Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2500 Speedway Avenue, Austin, Texas 78712
*Correspondence and requests for materials should be addressed to H.S.A. ([email protected])

Supporting Information

Accession codes of nucleotide sequences used in this manuscript

Human cytomegalovirus: M60321.1

Simian Virus 40: J02400.1

Human genome, GRCh38.p2 assembly: NC_000001.11, NC_000002.12, NC_000003.12, NC_000004.12, NC_000005.10, NC_000006.12, NC_000007.14, NC_000008.11, NC_000009.12, NC_000010.11, NC_000011.10, NC_000012.12, NC_000013.11, NC_000014.9, NC_000015.10, NC_000016.10, NC_000017.11, NC_000018.10, NC_000019.10, NC_000020.11, NC_000021.9, NC_000022.11, NC_000023.11, NC_000024.10

Equations

From the derived Gaussian mixture model (GMM), we can determine the probabilities of several key quantities: 1) the false-positive probability at a particular (log-transformed) threshold expression value, i.e., the probability of assigning a value to the high-expression group when it does not belong to it; 2) the false-negative probability at a particular (log-transformed) threshold expression value, i.e., the probability of failing to assign a value that does belong to the high-expression group; and 3) the probability of observing a (log-transformed) expression value X less than or equal to a specified expression value (e.g., the median expression value) if X belongs to the high-expression group.
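Once the mixture has been fitted, each of the three probabilities above reduces to a normal tail area. The sketch below is a minimal illustration, not the authors' code: it assumes a two-component GMM on the log-transformed scale, with the high-expression component described by `mu_h`, `sigma_h` and the background component by `mu_b`, `sigma_b` (hypothetical names). In practice these could be read from a fitted model such as scikit-learn's `GaussianMixture` (`means_`, `covariances_`; note the latter holds variances, not standard deviations). The numbers in the usage lines are made up.

```python
import math


def norm_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def false_positive(t: float, mu_b: float, sigma_b: float) -> float:
    """P(X >= t | X from background component): a background value called 'high'."""
    return 1.0 - norm_cdf(t, mu_b, sigma_b)


def false_negative(t: float, mu_h: float, sigma_h: float) -> float:
    """P(X < t | X from high component): a high-expression value missed by threshold t."""
    return norm_cdf(t, mu_h, sigma_h)


def prob_at_or_below(x_m: float, mu_h: float, sigma_h: float) -> float:
    """P(X <= x_m | X from high component), e.g. with x_m set to the median value."""
    return norm_cdf(x_m, mu_h, sigma_h)


if __name__ == "__main__":
    # Hypothetical component parameters on a log-transformed expression scale.
    mu_h, sigma_h = 3.0, 0.5
    mu_b, sigma_b = 1.0, 0.8
    t = 2.0
    print(f"FP({t}) = {false_positive(t, mu_b, sigma_b):.4f}")  # FP(2.0) = 0.1056
    print(f"FN({t}) = {false_negative(t, mu_h, sigma_h):.4f}")  # FN(2.0) = 0.0228
```

Raising the threshold trades false positives for false negatives; evaluating both functions over a grid of `t` values is one way to pick a cutoff.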

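$$P_{\mathrm{FP}}(t) = \int_{t}^{\infty} \phi(x;\,\mu_{b},\sigma_{b})\,dx = 1 - \Phi\!\left(\frac{t-\mu_{b}}{\sigma_{b}}\right)$$

$$P_{\mathrm{FN}}(t) = \int_{-\infty}^{t} \phi(x;\,\mu_{h},\sigma_{h})\,dx = \Phi\!\left(\frac{t-\mu_{h}}{\sigma_{h}}\right)$$

$$P(X \le x_{m} \mid X \in \text{high}) = \int_{-\infty}^{x_{m}} \phi(x;\,\mu_{h},\sigma_{h})\,dx = \Phi\!\left(\frac{x_{m}-\mu_{h}}{\sigma_{h}}\right)$$

where t is the log-transformed threshold, x_m is the specified log-transformed expression value (e.g., the median), φ(x; μ, σ) is the normal probability density with mean μ and standard deviation σ, Φ is the standard normal cumulative distribution function, and (μ_h, σ_h) and (μ_b, σ_b) are the parameters of the high-expression and background components of the GMM, respectively.

Supporting Tables and Graphics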

Nucleotides considered from ref. file (GRCh38.p2 Primary Assembly)
Gene      Chromosome  2000-bp region               500-bp region
EEF1A1*   6           73,521,382 - 73,520,027      73,521,074 - 73,520,027
CLUAP1    16          3,498,945 - 3,500,944        3,500,445 - 3,500,944
TPT1      13          45,343,284 - 45,341,285      45,341,784 - 45,341,285
TUBA1B    12          49,133,521 - 49,131,522      49,132,021 - 49,131,522
GGA1      22          37,606,385 - 37,608,384      37,607,885 - 37,608,384
LAIR1     19          54,372,553 - 54,370,554      54,371,053 - 54,370,554
UBC*      12          124,915,549 - 124,913,772    124,915,081 - 124,913,772
VIM       10          17,225,935 - 17,227,934      17,227,435 - 17,227,934
ITIH5     10          7,668,998 - 7,666,999        7,667,498 - 7,666,999
F2R       5           76,714,043 - 76,716,042      76,715,543 - 76,716,042
TMEM158   3           45,228,322 - 45,226,323      45,226,822 - 45,226,323
MSH3      5           80,652,648 - 80,654,647      80,654,148 - 80,654,647
CCR6      6           167,109,807 - 167,111,806    167,111,307 - 167,111,806
ACTG1     17          81,514,866 - 81,512,867      81,513,366 - 81,512,867
FTL       19          48,963,309 - 48,965,308      48,964,809 - 48,965,308
TMSB10    2           84,903,639 - 84,905,638      84,905,139 - 84,905,638
GNB2L1    5           181,245,906 - 181,243,907    181,244,406 - 181,243,907
ROCK2     2           11,346,585 - 11,344,586      11,345,085 - 11,344,586
HSP90AA1  14          102,141,749 - 102,139,750    102,140,249 - 102,139,750
EIF4A1*   17          7,571,970 - 7,572,705        7,572,206 - 7,572,705

Table S1a. Details of the 20 most highly expressed genes. * denotes an alternative promoter length considered for annotation.

Nucleotides considered from ref. file (GRCh38.p2 Primary Assembly)
Gene      Chromosome  2000-bp region               500-bp region
SUMO4     6           149,398,359 - 149,400,358    149,399,859 - 149,400,358
MAGED4B   23, X       52,071,272 - 52,069,273      52,069,572 - 52,069,273
ARTN      1           43,931,320 - 43,933,319      43,932,820 - 43,933,319
DUSP23    1           159,778,940 - 159,780,939    159,780,440 - 159,780,939
KLC4      6           43,057,594 - 43,059,593      43,059,094 - 43,059,593
NR1I2     3           119,778,484 - 119,780,483    119,779,984 - 119,780,483
PRIM2     6           57,312,805 - 57,314,804      57,314,305 - 57,314,804
MAP2      2           209,422,009 - 209,424,008    209,423,509 - 209,424,008
SMN1      5           70,922,941 - 70,924,940      70,924,441 - 70,924,940
COL25A1   4           109,304,643 - 109,302,644    109,303,143 - 109,302,644
NPTX1     17          80,478,604 - 80,476,605      80,477,104 - 80,476,605
NPSR1     7           34,656,285 - 34,658,284      34,657,785 - 34,658,284
LRRIQ1    12          85,034,321 - 85,036,320      85,035,821 - 85,036,320
MED31     17          6,653,799 - 6,651,800        6,652,299 - 6,651,800
CACNB4    2           152,101,079 - 152,099,080    152,099,579 - 152,099,080
HDAC9     7           18,084,942 - 18,086,941      18,086,442 - 18,086,941
SHC2      19          462,996 - 460,997            461,496 - 460,997
POMC      2           25,170,690 - 25,168,691      25,169,190 - 25,168,691
AGTR1     3           148,695,871 - 148,697,870    148,697,371 - 148,697,870
FGL1      8           17,912,365 - 17,910,366      17,910,865 - 17,910,366

Table S1b. Details of 20 genes with median expression.

Nucleotides considered from ref. file (GRCh38.p2 Primary Assembly)
Gene      Chromosome  2000-bp region               500-bp region
ALB       4           73,402,255 - 73,404,254      73,403,755 - 73,404,254
ALG8      11          78,141,660 - 78,139,661      78,140,160 - 78,139,661
ARHGEF11  1           157,048,640 - 157,046,641    157,047,140 - 157,046,641
ASPSCR1   17          81,975,550 - 81,977,549      81,977,050 - 81,977,549
CLIP1     12          122,424,632 - 122,422,633    122,423,132 - 122,422,633
CRNN      1           152,416,274 - 152,414,275    152,414,774 - 152,414,275
DRAM      12          101,875,327 - 101,877,326    101,876,827 - 101,877,326
FAM134B   5           16,619,058 - 16,617,059      16,617,558 - 16,617,059
HDHD3     9           113,379,009 - 113,377,010    113,377,509 - 113,377,010
KRTAP4-2  17          41,180,208 - 41,178,209      41,178,708 - 41,178,209
LYSMD4    15          99,735,458 - 99,733,459      99,733,958 - 99,733,459
MUC17     7           101,018,076 - 101,020,075    101,019,576 - 101,020,075
OR13F1    9           104,502,263 - 104,504,262    104,503,763 - 104,504,262
PLSCR4    3           146,253,179 - 146,251,180    146,251,679 - 146,251,180
POLR3H    22          41,546,606 - 41,544,607      41,545,106 - 41,544,607
POP2      3           191,459,163 - 191,461,162    191,460,663 - 191,461,162
SFI1      22          31,486,688 - 31,488,687      31,488,188 - 31,488,687
TAF9      5           69,372,013 - 69,370,014      69,370,513 - 69,370,014
TCP1      6           159,791,703 - 159,789,704    159,790,203 - 159,789,704
TFDP3     23, X       133,220,348 - 133,218,349    133,218,848 - 133,218,349

Table S1c. Details of 20 randomly selected genes.

Nucleotides considered from ref. file
Promoter  Chromosome  2000-bp region               500-bp region
CMV IE    viral       1 - 2,105                    534 - 1,193
SV40      viral       5,176 - 346                  5,176 - 346
HSV-TK    viral       48,633 - 47,881              48,633 - 47,881
ACTB      7           5,532,044 - 5,529,628        -
PGK1      23, X       78,102,169 - 78,104,168      78,103,669 - 78,104,168
GAPDH     12          6,532,405 - 6,534,404        6,533,905 - 6,534,404
COL1A2    7           94,392,561 - 94,394,560      94,394,061 - 94,394,560

Table S1d. Details of some common viral-derived and endogenous promoters. The human cytomegalovirus immediate-early (CMV IE) promoter was taken from M60321.1; the Simian Virus 40 (SV40) promoter from J02400.1 (reverse complement); and the Herpes Simplex Virus thymidine kinase (HSV-TK) promoter from NC_001806.1 (human herpesvirus 1, reverse complement). Endogenous sequences were taken from the GRCh38.p2 primary assembly.

Transcription factor  Consensus sequence
AR                    ARGAACANNNTGTNC
BATF::JUN             RNWATGASTCA
BRCA1                 NCAACMS
CDX2                  NNGYMATAAAA
CEBPA                 ATTGCAYAAYN
CEBPB                 KATTGCAYMAY
CREB (half)           CGTCA
CREB1                 TGACGTCA
CREB1 (other)         BNBGRTGACGYN
CTCF                  YNRCCASYAGRKGGCRSYN
CTCF (short)          CCGCGNGGNGGCAG
DUX4                  TAAYYYAATCA
E2F1                  NGGGCGGGARV
E2F4                  NGGCGGGARRN
E2F6                  RGGCGGGARRN
EBF1                  NYCCCCWGGGA
EGR1                  NCNCCGCCCCCKCN
EHF                   MCTTCCTS
ELF1                  NRANCMGGAAGTG
ELK1                  NNNCCGGAAR
ELK4                  NCRCTTCCGGN
ESR1                  NNNNNAGGTCACCCTGACCY
ESR2                  AGGTCASNNTGNCCY
ESRRA                 YCAAGGTCACN
ETS1                  YWTCCK
EWSR1-FLI1            GGAAGGAAGGAAGGAAGG
FEV                   CAGGAART
FLI1                  RCAGGAAGTGR
FOS                   NNTGASTCATN
FOSL1                 RRTGASTCAKN
FOSL2                 NRRTGASTCAB
FOXA1                 TGTTTRCWYWN
FOXC1                 NNNNNGTA
FOXD1                 GTAAACAN
FOXF2                 NNANSGTAAACAAN
FOXH1                 NNNAATCCACA
FOXI1                 NNNTRTTTRTTT
FOXL1                 WNNANATA
FOXO3                 KGTAAACA
FOXP1                 NNNANGTAAACAAAN
FOXP2                 NWGTAAACARN
GABPA                 ACCGGAAGNS
GATA2                 NNNTTCTTATCTSN
GATA3                 AGAGAAGA
HIF1A::ARNT           VNACGTGN
HINFP                 NAACGTCCGC
HLF                   NRTTACRYAATN
HNF1A                 GGTTAATNATTANC
HNF1B                 YYAATRWTTAAC
HNF4A                 NTGRACTTTGNNCYN
HNF4G                 NRRGNNCAAAGKYCA
HOXA5                 CDBWAATK
HSF1                  NTTCTRGAANNTTCY
INSM1                 TGYCWGGGGGCR
IRF1                  NNNYRSTTTCACTTTCNNTTT
IRF2                  SGAAAGYGAAASCNWWWM
JUN                   NNRATGATGTMAT
JUN (var.2)           NNNNRRTGASTCAN
JUN::FOS              TGACTCA
JUNB                  NRRTGASTCAK
JUND                  NRTGASTCATN
JUND (var.2)          NNNRATGABGTCATN
KLF5                  GCCCCDCCCH
MAFF                  NCTNASTCAGCANWWWNN
MAFK                  MNNASTCAGCANWWW
MAX                   RRGCACATGK
MEF2A                 NKCTAAAAATAGMNN
MEF2C                 NNNCYAAAAATAGMN
MYC::MAX              RASCACGTGGT
MZF1_1-4              NGGGGA
MZF1_5-13             NKAGGGGKAR
NF1 (consensus)       TTGGCNNNNNNCC
NFATC2                TTTTCCA
NFE2::MAF             ATGACTCAGCANWWN
NFE2L2                RTGACWMAGCA
NFIC                  TTGGCN
NFIL3                 TTAYGTAAYNN
NFKB1                 KGGRNTTTCCM
NFYA                  AGNSYKCTGATTGGTNNR
NFYB                  NNNYNRRCCAATCAG
NHLH1                 NCGCAGCTGCGN
NKX3-1                ATACTTA
NR1H2::RXRA           AAAGGTCAAAGGTCAAC
NR2C2                 NGRGGTCARAGGTCA
NR2F1                 TGAMCTTTGVMCHT
NR4A2                 AAGGTCAC
NRF1                  GCGCNTGCGCR
PAX5                  NNGNKCAGYSRAGCRTGAC
Pax6                  TTCACGCWTSANTK
PBX1                  NCATCAATCAAW
PLAG1                 GGGGCCCWAGGG
POU2F2                NNNATTTGCATRW
PPARG                 STRGGTCACNGTGACCYANT
PPARG::RXRA           NNWGRGGTCAAAGGTCANNN
PRDM1                 NRAAAGTGAAAGTNN
REL                   BGGRNWTTCC
RELA                  GGGRRTTTCC
REST                  TTCAGCACCATGGACAGCKCC
RFX2                  GTYNCCATGGCAACNRNNN
RFX5                  YNSCMTRGCAACAGN
RORA_1                AWNNAGGTCA
RORA_2                NWWAWNTAGGTCAN
RREB1                 CCCCMAAMCAMCCMCMMMCN
RUNX1                 WWYTGYGGTWW
RUNX2                 KNKNNYTGTGGTYTK
RXR::RAR_DR5          RGKTCANNNRSAGGTCA
RXRA::VDR             GGGTCAWNGRGTTCA
SMAD2::SMAD3::SMAD4   NTGTCTGNCACCT
SOX10                 CWTTGT
SOX9                  CYATTGTTN
SP1                   NCCCCKCCCCC
SP1 (short)           CCKCCY
SP2                   GYCCCGCCYCYNNNN
SPI1                  AGGAAGT
SPIB                  WSMGGAA
SREBF1                VTCACCCCAY
SREBF2                RTGGGGTGAY
SRF                   NNTKNCCAWATAWGGNAA
SRY                   NWWAACAAT
STAT1                 NTTCYRGGAAA
STAT2::STAT1          TNAGTTTCNNTTTCY
STAT3                 NTTCYKGGAAN
TAL1::GATA1           NTTATCWNNNNNNNNCAG
TAL1::TCF3            NNAMCATCTGKT
TBP                   STATAWAWRNNNNNN
TCF7L2                NNASWTCAAAGNNN
TEAD1                 YACATTCCWSNG
TFAP2A                NNNNGCCYSAGGGCA
TFAP2C                NNNNSCCYCAGGSCN
THAP1                 YTGCCCNNA
TLX1::NFIC            TGGCASSRNGCCAA
TP53                  RCATGYCCAGACATG
TP63                  NNRCAWGYNCARRCWTGYNN
USF1                  NNCAYGTGACC
USF2                  RYCAYGTGACY
YY1                   CAARATGGCNGC
ZBTB33                NTCTCGCGAGANYTN
ZEB1                  NCWCACCTG
ZNF263                GGAGGAGGRRGRGGRGGRRRR
ZNF354C               MTCCAC

Table S2. Consensus sequences of putative TFBSs used in annotation of promoter and other genomic regions, based on the JASPAR database¹.
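The consensus strings in Table S2 use IUPAC ambiguity codes (R = A/G, Y = C/T, N = any base, and so on). One simple way to annotate a sequence with such strings, offered only as an illustration and not as the JASPAR tooling or the pipeline used in this study (JASPAR itself distributes position frequency matrices, and matrix-based scanning is the usual approach), is to turn each code into a regular-expression character class and search both strands. The function names are hypothetical; reverse-strand hits are reported in forward-strand coordinates, analogous to the "rev:" entries in Tables S4b-S4d.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string (e.g. 'CCKCCY') into a regex."""
    return "".join(IUPAC[code] for code in consensus.upper())


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan(seq: str, consensus: str) -> list:
    """Return sorted 1-based (start, end, strand) hits of `consensus` in `seq`.

    The zero-width lookahead lets overlapping hits be reported. Reverse-strand
    hits are found on the reverse complement and mapped back to forward coordinates.
    """
    seq = seq.upper()
    k = len(consensus)
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    hits = [(m.start() + 1, m.start() + k, "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        start = len(seq) - (m.start() + k) + 1
        hits.append((start, start + k - 1, "-"))
    return sorted(hits)


# The palindromic CREB1 site is reported once per strand at the same position.
print(scan("TTGACGTCATT", "TGACGTCA"))  # [(2, 9, '+'), (2, 9, '-')]
```

Because degenerate consensus strings such as FOXC1 (NNNNNGTA) match very often, exact-match scanning of this kind tends to over-call short, low-information sites.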

Set: TFBSs
high U CMV U SV40: ETS1, FOXC1, MZF1_1-4, NFKB1, REL, SP1 (short), SPIB
high U CMV: CREB (half), CREB1, FOXL1, HIF1A::ARNT, HOXA5, JUN::FOS, NF1, NFATC2, NFIC, RELA, RORA_1, SOX10, TBP, ZNF354C
high U SV40: SP2, ZEB1
CMV U SV40: none
high only: BRCA1, CEBPB, CREB1 (other), E2F1, E2F4, E2F6, EGR1, EHF, ELK1, ESRRA, EWSR1-FLI1, FEV, FOS, FOSL1, FOSL2, FOXD1, FOXF2, FOXI1, FOXO3, FOXP1, FOXP2, HINFP, JUN (var.2), JUNB, KLF5, MZF1_5-13, NFE2L2, NKX3-1, NR4A2, NRF1, POU2F2, RUNX1, RUNX2, SOX9, SP1, SPI1, SREBF1, SRY, STAT1, STAT3, TFAP2C, THAP1, YY1
CMV only: NFIL3
SV40 only: TEAD1

Table S3a. List of potentially overlapping TFBSs across the high-expression group and two commonly used viral-derived promoters, CMV and SV40, based on the larger, 2000-bp region annotation.

Set: TFBSs

high U CMV U SV40: ETS1, FOXC1, MZF1_1-4, NFKB1, REL, SP1 (short)
high U CMV: CREB (half), CREB1, FOXL1, HOXA5, JUN::FOS, NF1, NFIC, RORA_1, TBP, ZNF354C
high U SV40: SP2, SPIB, ZEB1
CMV U SV40: none
high only: BRCA1, CREB1 (other), E2F1, E2F4, E2F6, EGR1, EHF, ELK1, FOSL2, FOXD1, FOXI1, FOXO3, FOXP2, HIF1A::ARNT, HINFP, JUN (var.2), KLF5, MZF1_5-13, NFATC2, NRF1, SOX10, SP1, STAT3, TFAP2C, THAP1, YY1
CMV only: NFIL3, RELA
SV40 only: TEAD1

Table S3b. List of potentially overlapping TFBSs across the high-expression group and two commonly used viral-derived promoters, CMV and SV40, based on the smaller, 500-bp region annotation.

Set: TFBSs

high U EEF1A1 U UbC: BRCA1, CREB (half), ETS1, FOXC1, HIF1A::ARNT, HOXA5, KLF5, MZF1_1-4, NFIC, REL, SOX10, SP1 (short), SPIB, ZNF354C
high U EEF1A1: E2F1, FOSL2, JUN (var.2), JUN::FOS, MZF1_5-13, NFKB1, SP1, TBP
high U UbC: CREB1, EHF, FOXL1, NFATC2, THAP1, YY1
EEF1A1 U UbC: none
high only: CEBPB, CREB1 (other), E2F4, E2F6, EGR1, ELK1, ESRRA, EWSR1-FLI1, FEV, FOS, FOSL1, FOXD1, FOXF2, FOXI1, FOXO3, FOXP1, FOXP2, HINFP, JUNB, NF1, NFE2L2, NKX3-1, NR4A2, NRF1, POU2F2, RELA, RORA_1, RUNX1, RUNX2, SOX9, SP2, SPI1, SREBF1, SRY, STAT1, STAT3, TFAP2C, ZEB1
EEF1A1 only: none
UbC only: none

Table S3c. List of potentially overlapping TFBSs across the high-expression group and two commonly used endogenous promoters, EEF1A1 and UbC, based on the larger, 2000-bp region annotation.

Set: TFBSs

high U EEF1A1 U UbC: BRCA1, ETS1, FOXC1, HOXA5, KLF5, MZF1_1-4, NFIC, SP1 (short), SPIB, ZNF354C
high U EEF1A1: FOSL2, HIF1A::ARNT, JUN (var.2), JUN::FOS, MZF1_5-13, NFKB1, REL, SOX10, SP1, TBP
high U UbC: CREB (half), CREB1, FOXL1, THAP1, YY1
EEF1A1 U UbC: none
high only: CREB1 (other), E2F1, E2F4, E2F6, EGR1, EHF, ELK1, FOXD1, FOXI1, FOXO3, FOXP2, HINFP, NF1, NFATC2, NRF1, RORA_1, SP2, STAT3, TFAP2C, ZEB1
EEF1A1 only: none
UbC only: none

Table S3d. List of potentially overlapping TFBSs across the high-expression group and two commonly used endogenous promoters, EEF1A1 and UbC, based on the smaller, 500-bp region annotation.
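Tables S3a-S3d partition the TFBSs of the high-expression group and two reference promoters into the seven regions of a three-set Venn diagram. The sketch below shows that partition with Python set operations. It assumes each row is an exclusive region (e.g., "high U CMV" holds TFBSs shared by the high group and CMV but absent from SV40), which is our reading of the tables; the function name is hypothetical and the input sets are a small excerpt consistent with Table S3a, not the full annotation.

```python
def venn_regions(high: set, a: set, b: set, name_a: str, name_b: str) -> dict:
    """Split three TFBS sets into the seven mutually exclusive 3-set Venn regions.

    Labels mimic Tables S3a-S3d; each region excludes members of the set(s) not named.
    """
    regions = {
        f"high U {name_a} U {name_b}": high & a & b,
        f"high U {name_a}": (high & a) - b,
        f"high U {name_b}": (high & b) - a,
        f"{name_a} U {name_b}": (a & b) - high,
        "high only": high - a - b,
        f"{name_a} only": a - high - b,
        f"{name_b} only": b - high - a,
    }
    return {label: sorted(members) for label, members in regions.items()}


# A small excerpt consistent with Table S3a (2000-bp annotation).
high = {"ETS1", "SP2", "CREB1", "YY1"}
cmv = {"ETS1", "CREB1", "NFIL3"}
sv40 = {"ETS1", "SP2", "TEAD1"}

for label, members in venn_regions(high, cmv, sv40, "CMV", "SV40").items():
    print(f"{label}: {', '.join(members) or 'none'}")
```

Every TFBS lands in exactly one region, so the region sizes always add up to the size of the union of the three sets.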

Count of TFBS in synthetic promoter
TFBS            E.synth v1  E.synth v2  E.synth v3
CREB (half)     1           1           0
CREB1 (other)   1           1           0
EGR1            2           3           0
EHF             1           0           0
ELF1            1           0           0
ELK1            1           1           0
ELK4            1           0           0
ETS1            2           2           0
FOS             2           2           0
FOSL1           2           2           0
FOSL2           2           3           0
FOXC1           0           0           29
FOXL1           0           0           16
HIF1A::ARNT     1           1           0
HOXA5           0           0           1
JUN (var.2)     2           3           0
JUN::FOS        2           2           0
JUNB            2           3           0
JUND            2           2           0
KLF5            4           3           0
MZF1_1-4        4           3           0
NKX3-1          0           0           3
SOX10           0           0           1
SP1             4           5           0
SP1 (short)     13          11          0
SP2             2           4           0
STAT3           1           1           0
TBP             0           0           2
Total           53          53          52

Table S4a. Actual TFBSs present in synthetic promoters E.synth v1, v2, and v3, based on JASPAR annotation. Color coding follows the convention of Table 2: green denotes a TFBS enriched in the high-expression group and red denotes a TFBS enriched in the background.

TFBS            Location
JUN (var.2)     1..14
FOSL2           4..14
JUNB            4..14
FOSL1           5..15
FOS             rev:5..15
JUND            rev:5..15
JUN::FOS        7..13
HIF1A::ARNT     15..22
ELF1            22..34
ELK1            23..32
ELK4            rev:25..35
ETS1            rev:27..32
MZF1_1-4        33..38
SP1 (short)     39..44
SP1             42..52
EGR1            42..55
SP1 (short)     45..50
SP1 (short)     51..56
STAT3           rev:56..66
EHF             rev:60..67
ETS1            rev:61..66
JUN (var.2)     64..77
FOSL2           67..77
JUNB            67..77
FOSL1           68..78
FOS             rev:68..78
JUND            rev:68..78
JUN::FOS        70..76
KLF5            78..87
SP1             78..88
SP2             78..92
SP1 (short)     81..86
MZF1_1-4        89..94
SP1 (short)     95..100
MZF1_1-4        101..106
SP1 (short)     107..112
SP1 (short)     113..118
SP1 (short)     117..122
KLF5            119..128
SP1 (short)     122..127
KLF5            124..133
SP1 (short)     127..132
KLF5            129..138
SP1             129..139
EGR1            129..142
SP2             129..143
SP1 (short)     132..137
SP1             135..145
SP1 (short)     138..143
MZF1_1-4        rev:140..145
CREB1 (other)   144..155
CREB (half)     rev:149..153
SP1 (short)     156..161

Table S4b. TFBSs present in synthetic promoter synth.v1 between the EcoRI and NotI sites, based on JASPAR annotation (Fig. S2d).

TFBS            Location
SP1 (short)     1..6
STAT3           rev:6..16
ETS1            rev:11..16
SP1 (short)     17..22
MZF1_1-4        23..28
KLF5            29..38
SP1             29..39
SP2             29..43
SP1 (short)     32..37
CREB1 (other)   40..51
CREB (half)     rev:45..49
SP2             49..63
SP1 (short)     52..57
SP1             55..65
EGR1            55..68
SP1 (short)     58..63
SP1             61..71
EGR1            61..74
SP1 (short)     64..69
SP1 (short)     70..75
HIF1A::ARNT     76..83
ELK1            84..93
ETS1            rev:88..93
JUN (var.2)     91..104
FOSL2           94..104
JUNB            94..104
FOS             rev:95..105
FOSL1           rev:95..105
JUND            rev:95..105
FOSL2           rev:96..106
JUNB            rev:96..106
JUN (var.2)     rev:96..109
JUN::FOS        97..103
MZF1_1-4        105..110
JUN (var.2)     111..124
FOSL2           114..124
JUNB            114..124
FOSL1           115..125
FOS             rev:115..125
JUND            rev:115..125
JUN::FOS        117..123
KLF5            125..134
SP1             125..135
SP2             125..139
SP1 (short)     128..133
SP1 (short)     135..140
MZF1_1-4        141..146
KLF5            147..156
SP1             147..157
EGR1            147..160
SP2             147..161
SP1 (short)     150..155
SP1 (short)     156..161

Table S4c. TFBSs present in synthetic promoter synth.v2 between the EcoRI and NotI sites, based on JASPAR annotation (Fig. S2e).

TFBS            Location
FOXC1           1..8
FOXC1           21..28
FOXC1           25..32
FOXC1           29..36
FOXC1           33..40
FOXC1           37..44
FOXC1           41..48
FOXC1           45..52
FOXC1           49..56
FOXC1           61..68
FOXC1           65..72
FOXC1           69..76
FOXC1           73..80
FOXC1           77..84
FOXC1           81..88
FOXC1           93..100
FOXC1           97..104
FOXC1           123..130
FOXC1           127..134
FOXC1           139..146
FOXC1           143..150
FOXC1           rev:11..18
FOXC1           rev:19..26
FOXC1           rev:59..66
FOXC1           rev:91..98
FOXC1           rev:103..110
FOXC1           rev:113..120
FOXC1           rev:121..128
FOXC1           rev:133..140
FOXL1           5..12
FOXL1           9..16
FOXL1           11..18
FOXL1           17..24
FOXL1           53..60
FOXL1           57..64
FOXL1           85..92
FOXL1           89..96
FOXL1           111..118
FOXL1           113..120
FOXL1           119..126
FOXL1           rev:7..14
FOXL1           rev:15..22
FOXL1           rev:55..62
FOXL1           rev:87..94
FOXL1           rev:117..124
HOXA5           135..142
NKX3-1          rev:3..9
NKX3-1          rev:51..57
NKX3-1          rev:83..89
SOX10           105..110
TBP             rev:7..21
TBP             rev:109..123

Table S4d. TFBSs present in synthetic promoter synth.v3 between the EcoRI and NotI sites, based on JASPAR annotation (Fig. S2f).
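The per-promoter counts in Table S4a can be recovered by tallying the location lists in Tables S4b-S4d. The sketch below does this for synth.v1 using the 53 entries transcribed from Table S4b. It assumes the "rev:" prefix marks a site on the reverse strand (our reading, not stated in the tables); the helper names are hypothetical.

```python
import re
from collections import Counter

# Location strings as written in Tables S4b-S4d, e.g. "42..52" or "rev:5..15".
LOCATION = re.compile(r"^(rev:)?(\d+)\.\.(\d+)$")


def parse_location(loc: str) -> tuple:
    """Return (start, end, strand); 'rev:' is read as the reverse strand."""
    m = LOCATION.match(loc.strip())
    if m is None:
        raise ValueError(f"unrecognized location: {loc!r}")
    strand = "-" if m.group(1) else "+"
    return int(m.group(2)), int(m.group(3)), strand


def parse_table(text: str) -> list:
    """Parse 'TFBS location; TFBS location; ...' into (tfbs, (start, end, strand))."""
    records = []
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, loc = entry.rsplit(" ", 1)  # TFBS names may contain spaces
        records.append((name, parse_location(loc)))
    return records


# All annotations of synth.v1 transcribed from Table S4b, in location order.
S4B = """
JUN (var.2) 1..14; FOSL2 4..14; JUNB 4..14; FOSL1 5..15; FOS rev:5..15;
JUND rev:5..15; JUN::FOS 7..13; HIF1A::ARNT 15..22; ELF1 22..34; ELK1 23..32;
ELK4 rev:25..35; ETS1 rev:27..32; MZF1_1-4 33..38; SP1 (short) 39..44; SP1 42..52;
EGR1 42..55; SP1 (short) 45..50; SP1 (short) 51..56; STAT3 rev:56..66; EHF rev:60..67;
ETS1 rev:61..66; JUN (var.2) 64..77; FOSL2 67..77; JUNB 67..77; FOSL1 68..78;
FOS rev:68..78; JUND rev:68..78; JUN::FOS 70..76; KLF5 78..87; SP1 78..88;
SP2 78..92; SP1 (short) 81..86; MZF1_1-4 89..94; SP1 (short) 95..100; MZF1_1-4 101..106;
SP1 (short) 107..112; SP1 (short) 113..118; SP1 (short) 117..122; KLF5 119..128;
SP1 (short) 122..127; KLF5 124..133; SP1 (short) 127..132; KLF5 129..138; SP1 129..139;
EGR1 129..142; SP2 129..143; SP1 (short) 132..137; SP1 135..145; SP1 (short) 138..143;
MZF1_1-4 rev:140..145; CREB1 (other) 144..155; CREB (half) rev:149..153; SP1 (short) 156..161
"""

counts = Counter(name for name, _ in parse_table(S4B))
print(sum(counts.values()))  # 53, the synth.v1 total in Table S4a
```

The same tally applied to Tables S4c and S4d would give the E.synth v2 and v3 columns.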

Chromosome  Region 1     Region 2     Region 3     Region 4     Region 5     Region 6     Region 7     Region 8     Region 9     Region 10
1           123,538,000  234,274,000  182,424,000  13,812,000   179,166,000  100,146,000  118,380,000  8,664,000    144,020,000  161,550,000
2           9,302,000    58,686,000   84,580,000   181,398,000  103,370,000  238,200,000  598,000      138,464,000  214,326,000  144,620,000
3           155,568,000  123,586,000  59,284,000   166,384,000  154,964,000  47,380,000   77,718,000   45,840,000   160,474,000  34,706,000
4           61,020,000   37,852,000   39,078,000   13,808,000   92,042,000   29,462,000   33,854,000   85,652,000   186,330,000  164,652,000
5           43,898,000   100,458,000  12,896,000   50,696,000   34,220,000   153,998,000  40,088,000   178,868,000  90,054,000   42,306,000
6           82,638,000   39,482,000   68,114,000   99,904,000   11,014,000   79,998,000   53,682,000   84,000,000   63,220,000   82,526,000
7           1,856,000    10,002,000   6,048,000    30,992,000   90,556,000   78,810,000   106,712,000  20,200,000   65,218,000   117,196,000
8           112,936,000  51,868,000   46,246,000   90,226,000   69,518,000   101,142,000  80,774,000   21,512,000   48,778,000   50,966,000
9           80,812,000   8,880,000    67,006,000   22,522,000   43,388,000   20,944,000   75,268,000   11,002,000   135,830,000  125,144,000
10          120,450,000  118,730,000  31,082,000   10,880,000   86,832,000   48,718,000   125,292,000  96,550,000   4,170,000    113,462,000
11          131,076,000  123,472,000  104,350,000  88,426,000   116,392,000  36,502,000   15,560,000   20,810,000   47,036,000   47,448,000
12          44,500,000   28,336,000   56,986,000   58,418,000   29,176,000   40,874,000   45,640,000   79,524,000   37,348,000   26,556,000
13          112,768,000  80,774,000   43,668,000   74,554,000   58,856,000   45,810,000   19,020,000   57,410,000   35,976,000   16,602,000
14          20,530,000   61,128,000   84,662,000   81,042,000   32,086,000   106,832,000  86,082,000   84,848,000   47,294,000   48,374,000
15          100,770,000  55,212,000   47,686,000   41,552,000   45,100,000   52,910,000   19,572,000   65,386,000   66,868,000   27,464,000
16          56,636,000   53,650,000   14,920,000   72,748,000   50,320,000   76,592,000   32,596,000   89,402,000   13,894,000   77,620,000
17          22,742,000   34,644,000   76,288,000   81,014,000   51,530,000   39,012,000   45,092,000   44,988,000   19,580,000   56,348,000
18          64,404,000   32,910,000   62,476,000   44,876,000   29,066,000   47,416,000   33,590,000   9,994,000    37,936,000   18,988,000
19          40,304,000   32,378,000   41,508,000   28,964,000   2,020,000    184,000      25,802,000   27,424,000   15,384,000   27,100,000
20          23,118,000   21,928,000   37,842,000   18,850,000   9,958,000    31,844,000   1,446,000    55,314,000   56,356,000   3,070,000
21          40,192,000   7,770,000    13,736,000   39,610,000   29,198,000   5,546,000    33,340,000   23,394,000   29,232,000   19,538,000
22          45,748,000   24,990,000   23,564,000   35,128,000   27,518,000   50,766,000   21,184,000   33,288,000   44,094,000   42,740,000
X           21,204,000   114,822,000  48,034,000   52,092,000   92,730,000   120,208,000  41,854,000   143,504,000  63,248,000   126,678,000
Y           17,386,000   15,446,000   7,908,000    20,036,000   17,858,000   1,434,000    21,390,000   18,902,000   26,230,000   10,488,000

Chromosome  Region 11    Region 12    Region 13    Region 14    Region 15    Region 16    Region 17    Region 18    Region 19    Region 20
1           169,118,000  104,800,000  166,864,000  108,424,000  152,366,000  242,986,000  19,636,000   230,400,000  244,108,000  228,512,000
2           26,932,000   89,862,000   119,464,000  76,640,000   193,108,000  113,048,000  38,046,000   188,410,000  170,018,000  19,208,000
3           13,746,000   152,204,000  70,210,000   69,510,000   28,864,000   192,984,000  145,716,000  157,130,000  25,826,000   176,346,000
4           104,012,000  8,832,000    175,668,000  82,978,000   18,110,000   119,186,000  4,898,000    167,500,000  48,674,000   114,624,000
5           19,862,000   17,160,000   10,392,000   77,112,000   21,414,000   162,232,000  31,326,000   3,872,000    173,898,000  40,720,000
6           70,236,000   100,200,000  106,724,000  85,534,000   85,224,000   38,072,000   50,290,000   4,838,000    33,200,000   67,510,000
7           36,814,000   60,330,000   85,018,000   30,726,000   41,348,000   81,308,000   122,242,000  101,580,000  47,234,000   25,764,000
8           35,028,000   4,574,000    76,836,000   40,392,000   80,162,000   91,156,000   55,380,000   35,918,000   123,418,000  69,706,000
9           27,614,000   18,596,000   19,906,000   102,566,000  130,894,000  131,172,000  135,616,000  121,146,000  82,188,000   127,818,000
10          50,284,000   58,464,000   91,298,000   57,658,000   61,166,000   117,204,000  41,094,000   87,148,000   92,222,000   52,058,000
11          107,266,000  71,596,000   73,870,000   103,334,000  94,574,000   13,102,000   33,274,000   63,668,000   102,784,000  78,574,000
12          16,700,000   65,300,000   100,380,000  84,674,000   91,634,000   90,196,000   101,606,000  60,542,000   91,728,000   101,830,000
13          113,644,000  27,668,000   62,978,000   113,412,000  104,534,000  21,806,000   56,368,000   44,890,000   64,526,000   71,398,000
14          51,166,000   55,106,000   56,376,000   39,950,000   99,924,000   20,704,000   40,938,000   91,928,000   103,716,000  64,790,000
15          27,794,000   34,476,000   54,564,000   74,214,000   31,240,000   25,202,000   98,530,000   100,372,000  70,202,000   68,356,000
16          17,562,000   85,240,000   52,010,000   64,862,000   66,326,000   3,584,000    62,842,000   88,658,000   36,688,000   -
17          12,880,000   50,844,000   31,788,000   50,266,000   31,012,000   76,356,000   71,736,000   36,960,000   -            -
18          25,572,000   28,002,000   68,362,000   34,582,000   38,890,000   41,014,000   58,850,000   -            -            -
19          44,024,000   27,450,000   12,808,000   -            -            -            -            -            -            -
20          8,082,000    57,696,000   10,702,000   63,948,000   -            -            -            -            -            -
21          -            -            -            -            -            -            -            -            -            -
22          48,692,000   -            -            -            -            -            -            -            -            -
X           15,950,000   91,294,000   141,066,000  88,826,000   45,948,000   52,070,000   63,660,000   83,046,000   22,358,000   43,124,000
Y           14,632,000   15,294,000   -            -            -            -            -            -            -            -

Chromosome  Region 21    Region 22    Region 23    Region 24    Region 25    Region 26    Region 27    Region 28    Region 29    Region 30
1           48,494,000   56,674,000   167,100,000  6,916,000    90,966,000   97,806,000   163,886,000  193,932,000  70,420,000   145,312,000
2           168,648,000  6,516,000    94,284,000   180,142,000  68,290,000   126,298,000  175,322,000  117,544,000  9,568,000    169,238,000
3           91,596,000   194,050,000  110,106,000  168,508,000  54,350,000   57,896,000   161,846,000  83,130,000   37,450,000   6,956,000
4           164,010,000  31,534,000   17,232,000   182,774,000  2,762,000    164,464,000  164,886,000  34,146,000   172,772,000  117,476,000
5           90,350,000   175,822,000  71,578,000   79,136,000   112,876,000  155,270,000  44,704,000   38,274,000   83,922,000   17,674,000
6           24,790,000   29,302,000   56,454,000   145,654,000  4,430,000    37,240,000   59,778,000   12,948,000   55,296,000   86,252,000
7           66,844,000   139,348,000  19,192,000   138,122,000  116,182,000  105,324,000  10,446,000   110,718,000  143,950,000  4,146,000
8           94,452,000   114,644,000  4,474,000    126,910,000  17,816,000   53,116,000   105,152,000  6,866,000    81,494,000   74,332,000
9           3,520,000    122,234,000  13,320,000   31,862,000   118,670,000  119,828,000  110,832,000  62,270,000   50,876,000   86,032,000
10          9,098,000    90,958,000   117,756,000  54,680,000   13,290,000   42,948,000   109,912,000  16,580,000   31,032,000   -
11          2,234,000    7,878,000    53,944,000   45,808,000   53,742,000   115,390,000  2,854,000    53,740,000   48,914,000   -
12          41,188,000   58,820,000   30,332,000   50,640,000   73,104,000   74,480,000   108,428,000  46,196,000   105,518,000  -
13          44,378,000   47,976,000   102,946,000  88,970,000   -            -            -            -            -            -
14          101,630,000  27,074,000   19,392,000   -            -            -            -            -            -            -
15          29,572,000   65,448,000   -            -            -            -            -            -            -            -
16          -            -            -            -            -            -            -            -            -            -
17          -            -            -            -            -            -            -            -            -            -
18          -            -            -            -            -            -            -            -            -            -
19          -            -            -            -            -            -            -            -            -            -
20          -            -            -            -            -            -            -            -            -            -
21          -            -            -            -            -            -            -            -            -            -
22          -            -            -            -            -            -            -            -            -            -
X           132,678,000  6,076,000    55,438,000   109,752,000  52,836,000   31,196,000   51,638,000   129,170,000  50,120,000   11,930,000
Y           -            -            -            -            -            -            -            -            -            -

Chromosome  Region 31    Region 32    Region 33    Region 34    Region 35    Region 36    Region 37    Region 38    Region 39    Region 40
1           173,824,000  105,438,000  5,618,000    8,208,000    162,966,000  204,848,000  49,232,000   93,108,000   18,022,000   150,548,000
2           185,208,000  57,814,000   239,566,000  104,004,000  58,316,000   130,672,000  86,532,000   15,484,000   106,766,000  75,766,000
3           75,236,000   70,660,000   128,202,000  92,448,000   169,856,000  68,072,000   147,250,000  57,660,000   114,344,000  176,862,000
4           32,952,000   161,966,000  165,340,000  88,902,000   130,052,000  6,840,000    70,940,000   15,998,000   141,974,000  99,482,000
5           116,574,000  34,958,000   41,200,000   137,946,000  102,854,000  36,994,000   1,650,000    137,094,000  151,050,000  -
6           42,722,000   109,422,000  82,038,000   36,598,000   37,426,000   55,820,000   2,260,000    -            -            -
7           72,386,000   53,334,000   49,132,000   109,890,000  -            -            -            -            -            -
8           138,530,000  -            -            -            -            -            -            -            -            -
9           -            -            -            -            -            -            -            -            -            -
10          -            -            -            -            -            -            -            -            -            -
11          -            -            -            -            -            -            -            -            -            -
12          -            -            -            -            -            -            -            -            -            -
13          -            -            -            -            -            -            -            -            -            -
14          -            -            -            -            -            -            -            -            -            -
15          -            -            -            -            -            -            -            -            -            -
16          -            -            -            -            -            -            -            -            -            -
17          -            -            -            -            -            -            -            -            -            -
18          -            -            -            -            -            -            -            -            -            -
19          -            -            -            -            -            -            -            -            -            -
20          -            -            -            -            -            -            -            -            -            -
21          -            -            -            -            -            -            -            -            -            -
22          -            -            -            -            -            -            -            -            -            -
X           36,830,000   113,924,000  -            -            -            -            -            -            -            -
Y           -            -            -            -            -            -            -            -            -            -

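Chromosome  Region 41    Region 42    Region 43    Region 44    Region 45    Region 46    Region 47    Region 48    Region 49    Region 50
1           63,906,000   26,146,000   20,836,000   221,576,000  62,676,000   149,674,000  112,516,000  184,648,000  189,870,000  237,782,000
2           22,996,000   152,858,000  4,224,000    28,088,000   239,298,000  6,074,000    113,860,000  148,856,000  6,576,000    77,462,000
3           86,168,000   20,424,000   -            -            -            -            -            -            -            -
4           164,316,000  -            -            -            -            -            -            -            -            -
5           -            -            -            -            -            -            -            -            -            -
6           -            -            -            -            -            -            -            -            -            -
7           -            -            -            -            -            -            -            -            -            -
8           -            -            -            -            -            -            -            -            -            -
9           -            -            -            -            -            -            -            -            -            -
10          -            -            -            -            -            -            -            -            -            -
11          -            -            -            -            -            -            -            -            -            -
12          -            -            -            -            -            -            -            -            -            -
13          -            -            -            -            -            -            -            -            -            -
14          -            -            -            -            -            -            -            -            -            -
15          -            -            -            -            -            -            -            -            -            -
16          -            -            -            -            -            -            -            -            -            -
17          -            -            -            -            -            -            -            -            -            -
18          -            -            -            -            -            -            -            -            -            -
19          -            -            -            -            -            -            -            -            -            -
20          -            -            -            -            -            -            -            -            -            -
21          -            -            -            -            -            -            -            -            -            -
22          -            -            -            -            -            -            -            -            -            -
X           -            -            -            -            -            -            -            -            -            -
Y           -            -            -            -            -            -            -            -            -            -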

Chromosome  Region 51    Region 52    Region 53
1           63,456,000   123,310,000  165,492,000
2           36,716,000   45,800,000   -
3           -            -            -
4           -            -            -
5           -            -            -
6           -            -            -
7           -            -            -
8           -            -            -
9           -            -            -
10          -            -            -
11          -            -            -
12          -            -            -
13          -            -            -
14          -            -            -
15          -            -            -
16          -            -            -
17          -            -            -
18          -            -            -
19          -            -            -
20          -            -            -
21          -            -            -
22          -            -            -
X           -            -            -
Y           -            -            -

Table S5. Chromosomal regions annotated for putative TFBSs with the JASPAR database. Each value is the end coordinate of a 2000-bp region.
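The coordinates in Tables S1a-S1c appear to follow one convention: the 2000-bp and 500-bp windows share the same TSS-proximal end, which is listed last, so minus-strand genes are written high-to-low. Table S5 likewise gives only the end of each 2000-bp region. The helper below reconstructs full windows under stated assumptions: 1-based inclusive coordinates, and, for Table S5, that each listed value is the higher coordinate of a forward-oriented region. The function names are hypothetical.

```python
def window(anchor: int, length: int, strand: str = "+") -> tuple:
    """Coordinates (listed_first, listed_last) of a `length`-bp window ending at `anchor`.

    Assumes 1-based, inclusive coordinates. For '+' the window extends to lower
    coordinates; for '-' it extends to higher ones and is written high-to-low.
    """
    if length < 1:
        raise ValueError("length must be a positive number of base pairs")
    if strand == "+":
        return anchor - length + 1, anchor
    if strand == "-":
        return anchor + length - 1, anchor
    raise ValueError("strand must be '+' or '-'")


def span(first: int, last: int) -> int:
    """Length in bp of an inclusive interval written in either orientation."""
    return abs(last - first) + 1


print(window(3_500_944, 2000))        # CLUAP1, Table S1a: (3498945, 3500944)
print(window(45_341_285, 500, "-"))   # TPT1, Table S1a: (45341784, 45341285)
print(window(123_538_000, 2000))      # chromosome 1, Region 1, Table S5
```

Running `span` over each row is also a quick way to confirm that a listed window has the stated length.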

Primer                              Sequence
Base vector region 1 - F            ACATTTCCCCGAAAAGTGCCACCTGACGTCGATATCAATAAA
Base vector region 1 - R            TTGCGGCCGCTTTTTTCCTTCGGAATTCCGCCTTAATTAAG
Base vector region 2 - F            ATTTCTCTCCTTAATTAAGGCGGAATTCCGAAGGAAAAAAG
Base vector region 2 - R            TGGCCTTTTGCTGGCCTTTTGCTCACATGTTACCACATTTGTAGAGG
Base vector region 3 - F            AGTAAAACCTCTACAAATGTGGTAACATGTGAGCAAAAGGCCAGCAA
Base vector region 3 - R            AAATAAAGATATTTTATTGATATCGACGTCAGGTGGCACTTTTCG
hrGFP between XbaI/NsiI - F         GCTCTAGAATGGTGAGCAAGCAGATCCTGAAGAA
hrGFP between XbaI/NsiI - R         AACTGCAGAACCAATGCATTTACACCCACTCGTGCAGGCTG
IRES between PmeI/XbaI - F          AGCTTTGTTTAAACCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAA
IRES between PmeI/XbaI - R          GCTCTAGACATTATCATCGTGTTTTTCAAAGGAAAACCACGTCC
SEAP between NheI/PmeI - F          CTAGCTAGCATGCTGCTGCTGCTGCTGCTGCTGGGCCTGA
SEAP between NheI/PmeI - R          AGCTTTGTTTAAACTCATGTCTGCTCGAAGCGGCCG
CMV promoter variant (gBlock) - F   AAGGAAAAAAGCGGCCGCT
CMV promoter variant (gBlock) - R   CTAGCTAGCCGTGTCAAGGACGGTGAGTCACTCTTGGCACGGGGAAT
Core CMV promoter - F               ATAAGAATGCGGCCGCAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCG
Core CMV promoter - R               CTAGCTAGCCGTGTCAAGGACGGTGAGTCACTCTTGGCACGGGGAAT
EEF1A1 (EF1a) full promoter - F     ATAAGAATGCGGCCGCGAGTAATTCATACAAAAGGACTCGCCCCTGC
EEF1A1 (EF1a) minimal promoter - F  ATAAGAATGCGGCCGCGGGGGAGAACCGTATATAAGTGCAGTAGTC
EEF1A1 (EF1a) promoter - R          CTAGCTAGCTTTGGCTTTTAGGGGTAGTTTTCACGAC
synthetic variant 1 - F             AATTAAGGCGGAATTCAGGAGATGACTCATGGACGTGCGAGCCGGAAGTGGGGACCGCCCCCGCCCCCGCCCTTCCAGGAAGGGATGACTCATGCCCCGCCCCCTGGGGA
synthetic variant 1 - R             TATAGACCTGCGGCCGCGGGCGGGACGTCACCAGGGGGAGGGGGCGGGGCGGGGCGGGGCGGGCGGGGGCGGTCCCCAGGGCGGTCCCCAGGGGGCGGGGCATGAGTCAT
synthetic variant 2 - F             AATTAAGGCGGAATTCCCGCCCTTCCAGGAAGCCGCCCTGGGGAGCCCCGCCCCCCCTGGTGACGTCCCGCCCCCGCCCCCGCCCCCGCCCGGACGTGCGAGCCGGAAGG
synthetic variant 2 - R             TATAGACCTGCGGCCGCGGGAGGGGGCGGGGCTCCCCAGGGCGGGGGGCGGGGCATGAGTCATCTCCTTCCCCAATGAGTCATCCCTTCCGGCTCGCACGTCCGGGCGGG
synthetic variant 3 - F             AATTAAGGCGGAATTCGGTAAGTATATACATATATACATAGGTAAGTAGGTAAGTAGGTAAGTAGGTAAGTATATACATAGGTAAGTAGGTAAGTAGGTAAGTATATACATAGGTAAGTACTTTGTTATACATATATACATAGGTAAGTACACTAATTGGTAAGTAGCGGCCGCAGGTCTATA
synthetic variant 3 - R             TATAGACCTGCGGCCGCTACTTACCAATTAGTGTACTTACCTATGTATATATGTATAACAAAGTACTTACCTATGTATATACTTACCTACTTACCTACTTACCTATGTATATACTTACCTACTTACCTACTTACCTACTTACCTATGTATATATGTATATACTTACCGAATTCCGCCTTAATT

Table S6. Primers used in this study.
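For the synthetic variant 3 oligos in Table S6, the reverse primer is the exact reverse complement of the forward primer, and both carry the EcoRI (GAATTC) and NotI (GCGGCCGC) recognition sites that bound the annotated region in Tables S4b-S4d. The short check below makes that explicit. It assumes the wrapped sequence lines of the original layout join in reading order; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Synthetic variant 3 oligos from Table S6, wrapped lines joined in reading order.
V3_F = (
    "AATTAAGGCGGAATTCGGTAAGTATATACATATATACATAGGTAAGTAGGTAAGTAGGTAAGTAGGTAAG"
    "TATATACATAGGTAAGTAGGTAAGTAGGTAAGTATATACATAGGTAAGTACTTTGTTATACATATATACA"
    "TAGGTAAGTACACTAATTGGTAAGTAGCGGCCGCAGGTCTATA"
)
V3_R = (
    "TATAGACCTGCGGCCGCTACTTACCAATTAGTGTACTTACCTATGTATATATGTATAACAAAGTACTTAC"
    "CTATGTATATACTTACCTACTTACCTACTTACCTATGTATATACTTACCTACTTACCTACTTACCTACTT"
    "ACCTATGTATATATGTATATACTTACCGAATTCCGCCTTAATT"
)
ECORI, NOTI = "GAATTC", "GCGGCCGC"  # restriction enzyme recognition sites


def check_pair(fwd: str, rev: str) -> dict:
    """Report whether `rev` fully complements `fwd` and which sites `fwd` carries."""
    return {
        "exact_complement": reverse_complement(fwd) == rev,
        "has_EcoRI": ECORI in fwd,
        "has_NotI": NOTI in fwd,
    }


print(check_pair(V3_F, V3_R))
```

The variant 1 and 2 primer pairs, by contrast, overlap only at their 3' ends, so the same check would report `exact_complement` as False for them.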

Fig. S1a. Distribution of TFBSs in decreasing order of frequency in the top promoters, averaged across top promoters (blue), median promoters (teal), randomly selected promoters (orange), and randomly selected regions of the human genome (yellow), based on annotation of a 2000-bp region (exceptions shown in Supporting Information Table S1a). This subset comprises the top 20% of TFBSs by frequency in the top promoters. TFBS frequency is defined as the number of times a given TFBS is found within the 2000-bp sequence window divided by the total number of TFBSs found in the same window for a given annotated set (top, median, random, chromosomal region).

Fig. S1b. Distribution of TFBSs in decreasing order of frequency in the top promoters, averaged across top promoters (blue), median promoters (teal), randomly selected promoters (orange), and randomly selected regions of the human genome (yellow), based on annotation of a 500-bp region (exceptions shown in Supporting Information Table S1a). This subset comprises the top 20% of TFBSs by frequency in the top promoters. TFBS frequency is defined as the number of times a given TFBS is found within the 500-bp sequence window divided by the total number of TFBSs found in the same window for a given annotated set (top, median, random, chromosomal region).
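The frequency definition in the Fig. S1a and S1b captions can be written out directly. A minimal sketch follows, assuming frequencies are computed per window and then averaged over the windows of a set (pooling all counts before dividing would be another plausible reading) and that "top 20%" rounds up. The function names and the counts in the example are hypothetical.

```python
import math
from collections import Counter


def tfbs_frequency(counts: Counter) -> dict:
    """Frequency of each TFBS in one window: its hits divided by all TFBS hits there."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tf: n / total for tf, n in counts.items()}


def average_frequency(windows: list) -> dict:
    """Mean per-window frequency across an annotated set (an absent TFBS counts as 0)."""
    per_window = [tfbs_frequency(c) for c in windows]
    names = set().union(*per_window)
    n = len(per_window)
    return {tf: sum(f.get(tf, 0.0) for f in per_window) / n for tf in names}


def top_fraction(freqs: dict, fraction: float = 0.2) -> list:
    """TFBS names in decreasing frequency (ties broken by name), top `fraction` kept."""
    ranked = sorted(freqs, key=lambda tf: (-freqs[tf], tf))
    return ranked[: math.ceil(len(ranked) * fraction)]


# Made-up counts for two "top" promoter windows, for illustration only.
top_windows = [Counter({"SP1": 3, "ETS1": 1}), Counter({"SP1": 1, "KLF5": 1})]
avg = average_frequency(top_windows)
print(top_fraction(avg))  # ['SP1']
```

To build the figure panels, the TFBS list chosen from the top promoters would then be looked up in the averaged frequencies of the median, random-promoter, and random-region sets.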

GAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCT GACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCC CCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGG GAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGT GTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACA CGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCT TGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTC GGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCA GATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAA ACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGT TTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTC AGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTAC CATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCA GCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGC TAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGT CGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAA GCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGC ACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCT GAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACT TTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTC GATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAG GAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATAT TATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGG GGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTCGATATCAATAAAATATCTTTATTTTCATTACATCTGT GTGTTGGTTTTTTGTGTGAATCGATAGTACTAACATACGCTCTCCATCAAAACAAAACGAAACAAAACAAACTAGCA 
AAATAGGCTGTCCCCAGTGCAAGTGCAGGTGCCAGAACATTTCTCTCCTTAATTAAGGCGGAATTCCGAAGGAAAAA AGCGGCCGCAAAAGGAAAACTAGCTAGCTAGAGCTTTGTTTAAACGGCGCGCCGGGCTCTAGAGCCCAATGCATCAG ACATGATAAGATACATTGATGAGTTTGGACAAACCACAACTAGAATGCAGTGAAAAAAATGCTTTATTTGTGAAATT TGTGATGCTATTGCTTTATTTGTAACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTTAT GTTTCAGGTTCAGGGGGAGGTGTGGGAGGTTTTTTAAAGCAAGTAAAACCTCTACAAATGTGGTAACATGT

Fig. S2a. Base expression vector generated by Gibson Assembly, used to construct the dual-reporter vector.

GAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCT GACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCC CCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGG GAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGT GTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACA CGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCT TGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTC GGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCA GATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAA ACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGT TTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTC AGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTAC CATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCA GCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGC TAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGT CGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAA GCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGC ACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCT GAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACT TTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTC GATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAG GAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATAT TATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGG GGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTTTGGCGCGCCAATAAAATATCTTTATTTTCATTACATCTGTGT GTTGGTTTTTTGTGTGAATCGATAGTACTAACATACGCTCTCCATCAAAACAAAACGAAACAAAACAAACTAGCAAA 
ATAGGCTGTCCCCAGTGCAAGTGCAGGTGCCAGAACATTTCTCTCCTTAATTAAGGCGGAATTCCGAAGGAAAAAAG CGGCCGCAAAAGGAAAACTAGCTAGCATGCTGCTGCTGCTGCTGCTGCTGGGCCTGAGGCTACAGCTCTCCCTGGGC ATCATCCCAGTTGAGGAGGAGAACCCGGACTTCTGGAACCGCGAGGCAGCCGAGGCCCTGGGTGCCGCCAAGAAGCT GCAGCCTGCACAGACAGCCGCCAAGAACCTCATCATCTTCCTGGGCGATGGGATGGGGGTGTCTACGGTGACAGCTG CCAGGATCCTAAAAGGGCAGAAGAAGGACAAACTGGGGCCTGAGATACCCCTGGCCATGGACCGCTTCCCATATGTG GCTCTGTCCAAGACATACAATGTAGACAAACATGTGCCAGACAGTGGAGCCACAGCCACGGCCTACCTGTGCGGGGT CAAGGGCAACTTCCAGACCATTGGCTTGAGTGCAGCCGCCCGCTTTAACCAGTGCAACACGACACGCGGCAACGAGG TCATCTCCGTGATGAATCGGGCCAAGAAAGCAGGGAAGTCAGTGGGAGTGGTAACCACCACACGAGTGCAGCACGCC TCGCCAGCCGGCACCTACGCCCACACGGTGAACCGCAACTGGTACTCGGACGCCGACGTGCCTGCCTCGGCCCGCCA GGAGGGGTGCCAGGACATCGCTACGCAGCTCATCTCCAACATGGACATTGACGTGATCCTAGGTGGAGGCCGAAAGT ACATGTTTCGCATGGGAACCCCAGACCCTGAGTACCCAGATGACTACAGCCAAGGTGGGACCAGGCTGGACGGGAAG AATCTGGTGCAGGAATGGCTGGCGAAGCGCCAGGGTGCCCGGTATGTGTGGAACCGCACTGAGCTCATGCAGGCTTC CCTGGACCCGTCTGTGACCCATCTCATGGGTCTCTTTGAGCCTGGAGACATGAAATACGAGATCCACCGAGACTCCA CACTGGACCCCTCCCTGATGGAGATGACAGAGGCTGCCCTGCGCCTGCTGAGCAGGAACCCCCGCGGCTTCTTCCTC TTCGTGGAGGGTGGTCGCATCGACCATGGTCATCATGAAAGCAGGGCTTACCGGGCACTGACTGAGACGATCATGTT CGACGACGCCATTGAGAGGGCGGGCCAGCTCACCAGCGAGGAGGACACGCTGAGCCTCGTCACTGCCGACCACTCCC ACGTCTTCTCCTTCGGAGGCTACCCCCTGCGAGGGAGCTCCATCTTCGGGCTGGCCCCTGGCAAGGCCCGGGACAGG AAGGCCTACACGGTCCTCCTATACGGAAACGGTCCAGGCTATGTGCTCAAGGACGGCGCCCGGCCGGATGTTACCGA GAGCGAGAGCGGGAGCCCCGAGTATCGGCAGCAGTCAGCAGTGCCCCTGGACGAAGAGACCCACGCAGGCGAGGACG TGGCGGTGTTCGCGCGCGGCCCGCAGGCGCACCTGGTTCACGGCGTGCAGGAGCAGACCTTCATAGCGCACGTCATG GCCTTCGCCGCCTGCCTGGAGCCCTACACCGCCTGCGACCTGGCGCCCCCCGCCGGCACCACCGACGCCGCGCACCC GGGTTACTCTAGAGTCGGGGCGGCCGGCCGCTTCGAGCAGACATGAGTTTAAACCCCCTCTCCCTCCCCCCCCCCTA ACGTTACTGGCCGAAGCCGCTTGGAATAAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCT TTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAA AGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAG 
CGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACA CCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAG CGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTGCACAT GCTTTACATGTGTTTAGTCGAGGTTAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAA CACGATGATAATGTCTAGAATGGTGAGCAAGCAGATCCTGAAGAACACCGGCCTGCAGGAGATCATGAGCTTCAAGG TGAACCTGGAGGGCGTGGTGAACAACCACGTGTTCACCATGGAGGGCTGCGGCAAGGGCAACATCCTGTTCGGCAAC CAGCTGGTGCAGATCCGCGTGACCAAGGGCGCCCCCCTGCCCTTCGCCTTCGACATCCTGAGCCCCGCCTTCCAGTA CGGCAACCGCACCTTCACCAAGTACCCCGAGGACATCAGCGACTTCTTCATCCAGAGCTTCCCCGCCGGCTTCGTGT ACGAGCGCACCCTGCGCTACGAGGACGGCGGCCTGGTGGAGATCCGCAGCGACATCAACCTGATCGAGGAGATGTTC GTGTACCGCGTGGAGTACAAGGGCCGCAACTTCCCCAACGACGGCCCCGTGATGAAGAAGACCATCACCGGCCTGCA GCCCAGCTTCGAGGTGGTGTACATGAACGACGGCGTGCTGGTGGGCCAGGTGATCCTGGTGTACCGCCTGAACAGCG GCAAGTTCTACAGCTGCCACATGCGCACCCTGATGAAGAGCAAGGGCGTGGTGAAGGACTTCCCCGAGTACCACTTC ATCCAGCACCGCCTGGAGAAGACCTACGTGGAGGACGGCGGCTTCGTGGAGCAGCACGAGACCGCCATCGCCCAGCT GACCAGCCTGGGCAAGCCCCTGGGCAGCCTGCACGAGTGGGTGTAAATGCATCAGACATGATAAGATACATTGATGA GTTTGGACAAACCACAACTAGAATGCAGTGAAAAAAATGCTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTG TAACCATTATAAGCTGCAATAAACAAGTTAACAACAACAATTGCATTCATTTTATGTTTCAGGTTCAGGGGGAGGTG TGGGAGGTTTTTTAAAGCAAGTAAAACCTCTACAAATGTGGTAACATGT

Fig. S2b. Sequence of the dual-reporter vector used to evaluate promoter sequences, derived by subcloning the reporter genes and the EMCV IRES into the base expression vector in Fig. S2a. EcoRI, NotI and NheI restriction sites are underlined. SEAP: 2029..3588, EMCV IRES: 3597..4171, hrGFP: 4178..4897, SV40 late terminator/poly(A): 4904..5125.
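The feature coordinates above are read here as GenBank-style 1-based, inclusive ranges, an assumption based on the start..end notation. The minimal Python sketch below, with hypothetical helper names, converts such a range into a slice of the printed sequence after the line-wrapping spaces are removed. It also checks that the two coding reporters each span a whole number of codons.

```python
# Feature coordinates from the Fig. S2b caption, read as 1-based, inclusive
# "start..end" ranges (assumed GenBank-style numbering).
FEATURES = {
    "SEAP": (2029, 3588),
    "EMCV IRES": (3597, 4171),
    "hrGFP": (4178, 4897),
    "SV40 late terminator/poly(A)": (4904, 5125),
}
CODING = {"SEAP", "hrGFP"}  # the two protein-coding reporters


def clean(raw: str) -> str:
    """Remove the spaces/newlines used to wrap the printed sequence."""
    return "".join(raw.split()).upper()


def extract(seq: str, start: int, end: int) -> str:
    """Subsequence for a 1-based, inclusive range (start..end)."""
    return seq[start - 1:end]


def summarize(features=FEATURES):
    """(name, length in bp, codon-multiple check or None) for each feature."""
    rows = []
    for name, (start, end) in features.items():
        length = end - start + 1
        whole_codons = (length % 3 == 0) if name in CODING else None
        rows.append((name, length, whole_codons))
    return rows


if __name__ == "__main__":
    for name, length, whole_codons in summarize():
        print(f"{name}: {length} bp, whole codons: {whole_codons}")
```

`summarize()` gives 1560 bp for SEAP, 575 bp for the EMCV IRES, 720 bp for hrGFP and 222 bp for the SV40 late terminator/poly(A). A length divisible by three is a consistency check only, not proof of an intact open reading frame. `extract` gives the annotated features only if the printed sequence begins at position 1 of the same numbering.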

GCGGCCGCGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGA ACACAGgtaagtgccgtgtgtggttcccgcgggcctggcctctttacgggttatggcccttgcgtgccttgaattac ttccacgcccctggctgcagtacgtgattcttgatcccgagcttcgggttggaagtgggtgggagagttcgaggcct tgcgcttaaggagccccttcgcctcgtgcttgagttgaggcctggcttgggcgctggggccgccgcgtgcgaatctg gtggcaccttcgcgcctgtctcgctgctttcgataagtctctagccatttaaaatttttgatgacctgctgcgacgc tttttttctggcaagatagtcttgtaaatgcgggccaagatctgcacactggtatttcggtttttggggccgcgggc ggcgacggggcccgtgcgtcccagcgcacatgttcggcgaggcggggcctgcgagcgcggccaccgagaatcggacg ggggtagtctcaagctggccggcctgctctggtgcctggcctcgcgccgccgtgtatcgccccgccctgggcggcaa ggctggcccggtcggcaccagttgcgtgagcggaaagatggccgcttcccggccctgctgcagggagctcaaaatgg aggacgcggcgctcgggagagcgggcgggtgagtcacccacacaaaggaaaagggcctttccgtcctcagccgtcgc ttcatgtgactccacggagtaccgggcgccgtccaggcacctcgattagttctcgagcttttggagtacgtcgtctt taggttggggggaggggttttatgcgatggagtttccccacactgagtgggtggagactgaagttaggccagcttgg cacttgatgtaattctccttggaatttgccctttttgagtttggatcttggttcattctcaagcctcagacagtggt tcaaagtttttttcttccatttcagGTGTCGTGAAAACTACCCCTAAAAGCCAAAGCTAGC

Fig. S2c. Sequence of the “minimal” EEF1A1 promoter variant cloned between the underlined NotI and NheI restriction sites of the dual-reporter vector found in Fig. S2b. Lowercase nucleotides denote the annotated intron 1 of the EEF1A1 gene, which is flanked by exons 1 and 2.
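The intron is marked only by letter case, so it can be recovered programmatically. The sketch below uses hypothetical function names. It splits a sequence into runs of uppercase and lowercase bases with 1-based coordinates, then checks each lowercase run against the common GT-AG splice-site rule.

```python
from itertools import groupby


def case_segments(raw: str):
    """Split a sequence into runs of lowercase/uppercase bases.

    Returns (case, start, end, bases) tuples with 1-based, inclusive
    coordinates; line-wrapping whitespace is removed first.
    """
    seq = "".join(raw.split())
    segments, pos = [], 1
    for is_lower, run in groupby(seq, key=str.islower):
        bases = "".join(run)
        case = "lower" if is_lower else "upper"
        segments.append((case, pos, pos + len(bases) - 1, bases))
        pos += len(bases)
    return segments


def follows_gt_ag(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (canonical splice sites)."""
    s = intron.upper()
    return s.startswith("GT") and s.endswith("AG")


if __name__ == "__main__":
    toy = "GCGGCCGCACAG gtaagtttcag GTGGCTAGC"  # hypothetical toy sequence
    segments = case_segments(toy)
    for case, start, end, bases in segments:
        print(case, start, end, bases)
    introns = [bases for case, _, _, bases in segments if case == "lower"]
    print(all(follows_gt_ag(i) for i in introns))
```

Uppercase runs in Fig. S2c include the NotI and NheI sites as well as exonic sequence. For that reason the sketch labels runs by case rather than calling them exons.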

GAATTCAGGAGATGACTCATGGACGTGCGAGCCGGAAGTGGGGACCGCCCCCGCCCCCGCCCTTCCAGGAAGGGATG ACTCATGCCCCGCCCCCTGGGGACCGCCCTGGGGACCGCCCCCGCCCGCCCCGCCCCGCCCCGCCCCCTCCCCCTGG TGACGTCCCGCCCGCGGCCGC

Fig. S2d. Sequence of synth.v1 variant cloned between the underlined EcoRI and NotI restriction sites of the dual-reporter vector found in Fig. S2b.

GAATTCCCGCCCTTCCAGGAAGCCGCCCTGGGGAGCCCCGCCCCCCCTGGTGACGTCCCGCCCCCGCCCCCGCCCCC GCCCGGACGTGCGAGCCGGAAGGGATGACTCATTGGGGAAGGAGATGACTCATGCCCCGCCCCCCGCCCTGGGGAGC CCCGCCCCCTCCCGCGGCCGC

Fig. S2e. Sequence of synth.v2 variant cloned between the underlined EcoRI and NotI restriction sites of the dual-reporter vector found in Fig. S2b.

GAATTCGGTAAGTATATACATATATACATAGGTAAGTAGGTAAGTAGGTAAGTAGGTAAGTATATACATAGGTAAGT AGGTAAGTAGGTAAGTATATACATAGGTAAGTACTTTGTTATACATATATACATAGGTAAGTACACTAATTGGTAAG TAGCGGCCGC

Fig. S2f. Sequence of synth.v3 variant cloned between the underlined EcoRI and NotI restriction sites of the dual-reporter vector found in Fig. S2b.
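For the synth.v1 to synth.v3 inserts, a quick sanity check is that each one starts with the EcoRI site (GAATTC), ends with the NotI site (GCGGCCGC), and contains each site only once. A single occurrence means digestion does not cut inside the insert. The following is a minimal sketch with hypothetical names. EcoRI, NotI and NheI sites are palindromic, so searching one strand is sufficient.

```python
ECORI = "GAATTC"
NOTI = "GCGGCCGC"
NHEI = "GCTAGC"


def count_sites(seq: str, site: str) -> int:
    """Number of (possibly overlapping) occurrences of `site` in `seq`.

    EcoRI, NotI and NheI sites are palindromic, so one strand suffices.
    """
    count, i = 0, seq.find(site)
    while i != -1:
        count += 1
        i = seq.find(site, i + 1)
    return count


def check_insert(raw: str, five_site: str = ECORI, three_site: str = NOTI) -> dict:
    """Flanking-site checks for an insert printed together with its cloning sites."""
    seq = "".join(raw.split()).upper()
    return {
        "starts_with_5prime_site": seq.startswith(five_site),
        "ends_with_3prime_site": seq.endswith(three_site),
        "n_5prime_sites": count_sites(seq, five_site),
        "n_3prime_sites": count_sites(seq, three_site),
        # bases between the two recognition sequences
        "insert_len": len(seq) - len(five_site) - len(three_site),
    }


def is_clean_directional(report: dict) -> bool:
    """True if each site occurs exactly once and sits at the expected end."""
    return (report["starts_with_5prime_site"] and report["ends_with_3prime_site"]
            and report["n_5prime_sites"] == 1 and report["n_3prime_sites"] == 1)
```

The same function applies to the EEF1A1 variant in Fig. S2c and the pCMV variants in Figs. S2g to S2i. For those, call `check_insert(seq, five_site=NOTI, three_site=NHEI)`.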

GCGGCCGCTGGAAAAAGCACCTGGGTTTTCCAGGTCCCGAGAGGCACATGGCCAATAAGGTGAATTTGGCAATTAGC CGTGTGTAAAGTTGGCTATTGGCCTCATAAAGCTTTGGCTCATGTCCCCGCACTGATCCGCCCAAGTGTAACACCGC CCCGAAGAAGCGTGACGTCAGGACTCACTGTGACGTATGCCGGGATAGGGGACTTTCCACTAGAAATCTGACGTCAG GGTAGGATAGTGGAGCCTGTGTACATTGGCAAGGTTAAGGCTGCCAATAGATACTTCTGACGTCAAAGACTGTACTG ACGGTGAACGTGAGGGACTTTCCTATAGTATAATTTGGCATACGAGTAATTTGGCAGTACACCCCTTAATTGAGTGG ATTCTTCAGGCGTGACTCAGTCGGATATAGGGGATTTCCCAACTGTGTTCTCCACTTCTGCTTTCTGACGTCATACC TACTGCTTGGCAATCGCAGACACCGCCCTTTACACGCCTGACGCTTTACAAATGGGCGGTAGGCGTGTACGGTGGGA GGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATA GAAGACACCGGGACCGATCCAGCCTCCGCGGCCGGGAACGGTGCATTGGAACGCGGATTCCCCGTGCCAAGAGTGAC TCACCGTCCTTGACACGGCTAGC

Fig. S2g. Sequence of synthetic “native” pCMV variant cloned between the underlined NotI and NheI restriction sites of the dual-reporter vector found in Fig. S2b.

GCGGCCGCTGGAAAAAGTTTCGGGCTTTTCCAAGAGAGAAGCGGCACATGGCCAAAAGTGAACAATTGGCAATTAGC CTATCCGAAAGTTGGCTATTGGCCTCCCTTCTCCTTGGCTCATGTCCATATTAAACATTGGCAGTACACCTATTAGT ATACCGCCCCTGTCATAAACCGCCCAAATACATACCCGCCCTTCTGCTCCCGGGCGGCCCTGTTTTTTGACGTCACT AAGCTTGCTGACGTCAACCCCTTCACTGACGTCAACAAAAATCGTGACGTCACCAGGGTTCCTGACGTATTAGTCCT GACTGACGATTAGGTAACTGACGCGCTGCCATTGGGACTTTCCACAATCTGCCGGGACTTTCCACGCTCACCGGTGG AGTCAGGCGCTAGTGGATTCCAATCAGGCTCCACACCTTTATAGTTGGCATTGCAGTACCTGCCAAATAAACCCAAT TGGCATCACCTGTATTTGGCAAGAAACGGTTTGACTCAAGCCACAAATGGGGATTTCCTAGGCGTGTACGGTGGGAG GTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAG AAGACACCGGGACCGATCCAGCCTCCGCGGCCGGGAACGGTGCATTGGAACGCGGATTCCCCGTGCCAAGAGTGACT CACCGTCCTTGACACGGCTAGC

Fig. S2h. Sequence of synthetic “sequential” pCMV variant cloned between the underlined NotI and NheI restriction sites of the dual-reporter vector found in Fig. S2b.

GCGGCCGCTGCCAAATCAAACGTATTGGCAGGTTTGCAACTGACGCCCTCACCTCTGACGTCAGCGCTATCGCTGGA AAATGAATACGAATTGGCTCATGTCCTTGACATAGACTCCACAGTCTGTGAGTTGGCAGTACACCTTAGAAACGAGG GGATTTCCGGTTTTCGTGGGCACATGGCCAATAGACACTGCTGACGTTAGTACACTGTGGATTTGTACCCTGTTGGC TATTGGCCGCGGGCGCACTTGGCAAGCGAATTACTGACGTCAGACAAATCAATTGGCAATTAGCCGTATAGTTGATT TTCCATTGCTGAATCCCGCCCCTGAAGAATCTGACGTCAATATCTTTGCTGACGTATAAGTAATCTCGTGGAGGTAA TAGAAAGGGCGGGTGAAATTCCTGACTCATCAAAGTTCGCCGCCCGTGAAGTTCTGGGACTTTCCTGTTAACTGGCC GCCCCAGGGGTTGGTGACGTCATTAAGACCCCTTGGCAGGCTCCAAATGGGACTTTCCTAGGCGTGTACGGTGGGAG GTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATAG AAGACACCGGGACCGATCCAGCCTCCGCGGCCGGGAACGGTGCATTGGAACGCGGATTCCCCGTGCCAAGAGTGACT CACCGTCCTTGACACGGCTAGC

Fig. S2i. Sequence of synthetic “random” pCMV variant cloned between the underlined NotI and NheI restriction sites of the dual-reporter vector found in Fig. S2b.
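By visual comparison, the three synthetic pCMV variants in Figs. S2g to S2i appear to end in the same 3' segment and to differ only upstream of it. That segment contains the TATATAA element and runs to the NheI site. The sketch below uses hypothetical helper names. It extracts the longest 3' segment common to a set of sequences, along with the variable upstream regions. `os.path.commonprefix` compares strings character by character despite its module name.

```python
import os


def _clean(raw: str) -> str:
    """Remove line-wrapping whitespace and normalize case."""
    return "".join(raw.split()).upper()


def common_suffix(seqs):
    """Longest 3' segment shared by every sequence."""
    reversed_seqs = [_clean(s)[::-1] for s in seqs]
    # A common prefix of the reversed strings is a common suffix of the originals.
    return os.path.commonprefix(reversed_seqs)[::-1]


def variable_regions(seqs):
    """Each sequence with the shared 3' segment removed."""
    shared = common_suffix(seqs)
    cleaned = [_clean(s) for s in seqs]
    return [s[: len(s) - len(shared)] for s in cleaned]
```

If the visual comparison holds, applying `common_suffix` to the three printed pCMV sequences should return a shared core ending in GCTAGC. `variable_regions` would then isolate the upstream segments in which the “native”, “sequential” and “random” designs differ.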

Fig. S3. Sample histograms of GFP fluorescence intensity from HT1080 cells transiently transfected with the dual-reporter expression vector driven by each promoter variant in this work, corresponding to data found in Fig. 4c. Each histogram shows live cells from one transfection replicate of the given promoter variant; the red dashed line approximately delineates the GFP- and GFP+ subpopulations.

Supporting Information References

[1] Mathelier, A., Zhao, X., Zhang, A. W., Parcy, F., Worsley-Hunt, R., Arenillas, D. J., Buchman, S., Chen, C. Y., Chou, A., Ienasescu, H., Lim, J., Shyr, C., Tan, G., Zhou, M., Lenhard, B., Sandelin, A., and Wasserman, W. W. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles, Nucleic Acids Res 42, D142-147.