How do we define a eukaryotic promoter?
Shifra Ben-Dor

Stop thinking protein-centric!
• mRNA does NOT start with ATG! Translation starts with ATG.
• In more and more genes, the first exon is short and non-coding. The first intron tends to be long (longer than most prediction programs can handle).
• Most cloning methods today are based on cDNA, using polyT as the primer.
• This creates genes with complete 3’ ends. Generally, when we see ATG, we stop looking to see if there is anything else.
• Make sure you have the complete 5’ end of your gene.

Promoter elements
• Core Promoter
• Proximal Promoter
• Distal Promoter
• Silencers, Boundary/Insulators

Core Promoter
• Start Site (TSS)
• Approximately -35 to +35
• General Transcription Factor Binding Sites

Proximal Promoter
• Approximately -250 to +250
• Specific Transcription Factor Binding Sites

Distal Promoter
• Anything further downstream, still within the gene area
• Specific Transcription Factors

Enhancers
• Anything further away
• Specific Transcription Factors
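The approximate windows for the core and proximal promoter can be written down as a small lookup. This is only a sketch: the function name `classify_position` and the label `"distal/enhancer"` are hypothetical, and the slides give no numeric boundary between the distal promoter and enhancers, so both are lumped together here.

```python
# Approximate windows from the slides, relative to the TSS (+1).
CORE = (-35, 35)
PROXIMAL = (-250, 250)


def classify_position(pos):
    """Return a region label for a position relative to the TSS.

    Hypothetical helper: anything outside the proximal window is
    reported as "distal/enhancer" because the slides do not give a
    numeric boundary between those two classes.
    """
    if CORE[0] <= pos <= CORE[1]:
        return "core"
    if PROXIMAL[0] <= pos <= PROXIMAL[1]:
        return "proximal"
    return "distal/enhancer"
```

Because the core window sits inside the proximal window, the core check has to come first.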

Various conserved sequences
• TATA box
• Inr Box
• DPE
• BRE
• CAAT box
• CpG Islands

Figure: core promoter elements and their positions relative to the TSS: BRE (-37 to -32), TATA (-31 to -26), INR (-2 to +4), DPE (+28 to +32). Modified from: Butler and Kadonaga, Genes & Development 16:2583-2592 2002

TATA BOX
• Consensus sequence: TATAAA(A)
• Binds TFIID subunit TBP, the TATA Binding Protein (and related factors)
• Position -25 to -30 from TSS (transcription start site)
• In Yeast, from -40 to -100

TATA BOX
• In Drosophila, 43% of a test set of 205 core promoters have one
• In Humans, 32% of 1031 potential promoter regions have one
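A quick way to check a candidate promoter for a TATA box is to look for the consensus TATAAA at the expected distance upstream of the TSS. The sketch below is illustrative only: `find_tata`, `tss_index` (the 0-based index of the +1 base) and the default search window of -35 to -20 are assumptions, chosen to give a little slack around the -25 to -30 position quoted above. It also matches only the strict consensus, which will miss the many real TATA boxes that deviate from it.

```python
def find_tata(seq, tss_index, start=-35, end=-20, motif="TATAAA"):
    """Return positions (relative to the TSS) where `motif` begins.

    tss_index is the 0-based index of the +1 base in seq. There is no
    position 0, so position -n is at index tss_index - n.
    """
    seq = seq.upper()
    hits = []
    for rel in range(start, end + 1):
        i = tss_index + rel
        if i >= 0 and seq[i:i + len(motif)] == motif:
            hits.append(rel)
    return hits
```

For yeast, the same function could be called with a wider window (for example `start=-100, end=-40`), following the range given on the slide.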

Inr (Initiator) Box
• Consensus Sequence:
• Mammalian: Py Py (C) A(+1) N T/A Py Py
• Drosophila: T C A(+1) G/T T C/T
• The A is the transcription start site
• Present in both TATA and TATAless promoters

Inr Box
• Binds TFIID subunits: TAFII250 and TAFII150
• Also binds RNA polymerase II directly, has been shown active in absence of TAFs
• Other proteins (TFII-I ……)

DPE
• Downstream Promoter Element
• Consensus Sequence: A/G(+28), G, A/T, C/T, G/A/T; minor preference A at +24
• Present in ~30% of promoters in Drosophila (same frequency as TATA)

DPE
• Probably interacts with TAFII60 and TAFII40
• Helps position TFIID together with Inr - always at +28
• Must have Inr
• Generally present only in TATAless promoters
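Since the DPE is described as requiring an Inr and sitting at a fixed distance from it, both can be checked together at fixed offsets from the TSS. The following is a minimal sketch, not a validated predictor: the consensus strings are the Drosophila Inr and the DPE as transcribed from the slides, rewritten in IUPAC codes (TCAKTY and RGWYD), and `has_inr_and_dpe` and `tss_index` are hypothetical names.

```python
import re

# IUPAC codes needed for the two consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "K": "[GT]",
    "D": "[AGT]", "N": "[ACGT]",
}


def to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus))


INR = to_regex("TCAKTY")  # Drosophila Inr, A at +1, spans -2 to +4
DPE = to_regex("RGWYD")   # DPE as given on the slide, spans +28 to +32


def has_inr_and_dpe(seq, tss_index):
    """True if seq has an Inr at the TSS and a DPE starting at +28.

    tss_index is the 0-based index of the +1 base.
    """
    seq = seq.upper()
    inr_start = tss_index - 2   # position -2
    dpe_start = tss_index + 27  # position +28 (the TSS base is +1)
    if inr_start < 0:
        return False
    inr = seq[inr_start:inr_start + 6]
    dpe = seq[dpe_start:dpe_start + 5]
    return bool(INR.fullmatch(inr)) and bool(DPE.fullmatch(dpe))
```

Using fixed offsets rather than a free search reflects the "always at +28" spacing; a free search for a five-base degenerate motif would match far too often to be informative.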

BRE
• B Recognition Element
• Binds TFIIB
• Consensus sequence: G/C, G/C, G/A, C, G, C, C
• 3’ end of BRE followed by 5’ of TATA
• 5/7 found in 12% of 315 TATA containing promoters

CCAAT box
• Consensus sequence: CCAAT
• Generally upstream, about -60 to -100
• In TATAless promoters, can be close to Inr (even downstream)
• Found in ~50% of vertebrate promoters

CCAAT box
• Binds TFIIB (indirectly ?)
• CTF (CCAAT-binding transcription factor; also called nuclear factor–I, or NF-I)
• CBF (CCAAT-box–binding factor; also called nuclear factor–Y, or NF-Y).
• CCAAT/enhancer binding proteins - c/EBP family with 6 members

CpG Islands
• Present only in vertebrate promoters
• Present in about 50% of promoters, mostly housekeeping genes.
• Generally no TATA or DPE
• Bound by Sp1 (also helps maintain hypomethylation)
• Multiple weak start sites (distributed)
• Sp1 + Inr sufficient for transcription initiation
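CpG islands are usually located by comparing the number of CG dinucleotides observed in a window with the number expected from that window's base composition, and sliding the window along the sequence. The sketch below assumes the common definition expected = (number of C × number of G) / window length; the function names and the default `window` and `step` values are hypothetical and should be tuned to the data.

```python
def cpg_obs_exp(window_seq):
    """Observed/expected CpG ratio for one window.

    Expected count is assumed to be (#C * #G) / length.
    """
    window_seq = window_seq.upper()
    n = len(window_seq)
    observed = window_seq.count("CG")
    expected = window_seq.count("C") * window_seq.count("G") / n if n else 0.0
    return observed / expected if expected else 0.0


def scan_cpg(seq, window=200, step=50):
    """Return (start, ratio) for windows taken every `step` bases."""
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        results.append((start, cpg_obs_exp(seq[start:start + window])))
    return results
```

In bulk vertebrate DNA the ratio is well below 1 because CpG is depleted, so windows with a ratio close to 1 stand out as island candidates.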

CpG Islands
• CpG generally underrepresented because of methylation/deamination to TpG
• Measured as percent CG dinucleotides over a given window, with x distance between them, and compared to the expected number of CG in the given window

Proximal Promoters
• Several transcription factor binding sites spread out over a (relatively) large area

Families
• Transcription factors and binding sites come in groups - where many factors can recognize the same site, or consensus site
• Various combinations of family members can bind, often in a cell type or cell cycle dependent manner
• For example - AP1 requires members of the Fos and Jun families

• We have to remember that the fact that a binding site exists does NOT mean that it is bound.
• The sites are short and degenerate, and so appear many times at random in the genome (Inr every 512 bp and TATA every 120 bp)
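How often a short, degenerate site turns up by chance can be estimated directly from its consensus. The sketch below assumes equal base frequencies and independent positions, and counts one strand only; `expected_spacing` and `find_on_both_strands` are hypothetical names. Because many binding sites function in either orientation, the second helper reports exact matches to a motif on both the plus and minus strands.

```python
from fractions import Fraction

# Number of bases each IUPAC code allows.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_spacing(consensus):
    """Mean distance in bp between chance matches on one strand."""
    p = Fraction(1)
    for code in consensus.upper():
        p *= Fraction(DEGENERACY[code], 4)
    return 1 / p


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_on_both_strands(seq, motif):
    """Return sorted (start, strand) hits for an exact motif.

    Minus-strand hits are found by searching for the motif's reverse
    complement; start is always a plus-strand coordinate.
    """
    seq = seq.upper()
    hits = []
    for strand, target in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = seq.find(target)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(target, start + 1)
    return sorted(hits)
```

Reading the mammalian Inr consensus with the optional C included (YYCANWYY in IUPAC codes), `expected_spacing("YYCANWYY")` gives 512, matching the Inr figure quoted on the slide. Scanning both strands roughly doubles the number of chance hits for a non-palindromic motif.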

• It’s also important to remember that many transcription factor binding sites are active regardless of the orientation (plus strand or minus strand)

Enhancers
• A subset of enhancers can be specific to the core promoter type (TATA, DPE)
• This can help with recognition (over distances, gene clusters)

Mechanisms of Enhancer Function
• Protein-Protein Contacts
• Covalent Modification of Proteins (Phosphorylation/Acetylation)
• Chromatin Structure
• Remodeling
• Superhelical Tension
• Nuclear Localization

Structural Observations
• Low sequence similarity test set
• Low bendability upstream of TATA
• High bendability downstream of TSS
• Downstream: periodic sequence and bendability patterns in phase with DNA helical pitch, similar to
• Length of bendable area similar to length binding region

Structural Assumptions
• Maybe positioning of nucleosome right at TSS (allows for regulation?)
• Inr - when all possible sequences that fit the Inr consensus are checked, there is a maximum bendability at the A+1 - so Inr may have a structural role

MAR/SAR
• Matrix/Scaffold Attachment Regions
• They are physical domain boundaries
• Unclear what role they play in transcriptional regulation (functional domain boundaries?)
• Possibly present in flanking regions of genes

Boundary Elements/Insulators
• Blocks the action of an enhancer on a promoter when placed between them. Insulates transgenes from positive and negative position effects.
• May also have other functions (promoter/enhancer) for other genes.