Downloaded from genesdev.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

OUTLOOK

Finding the start site: redefining the human initiator element

Jennifer F. Kugel and James A. Goodrich Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA

Transcription by RNA polymerase II (Pol II) is dictated in −1 to +1 (where, R = A/G, and +1 is underlined) (Carninci part by core elements, which are DNA sequenc- et al. 2006; Frith et al. 2008). Other studies have also de- es flanking the start site (TSS) that help di- fined somewhat different consensus sequences for the rect the proper initiation of transcription. Taking Inr; however, all have an A at +1 in common (for review, advantage of recent advances in genome-wide sequencing see Kadonaga 2012). approaches, Vo ngoc and colleagues (pp. 6–11) identified Kadonaga and colleagues (Vo ngoc et al. 2017) devised transcripts with focused sites of initiation and found and implemented a novel multistep approach that com- that many were transcribed from promoters containing a bines experimental and computational methods to rein- new consensus sequence for the human initiator (Inr) vestigate the human Inr consensus sequence. First, they core promoter element. generated two 5′-GRO-seq (5′ end-selected global run-on followed by sequencing) libraries with human MCF-7 cells to identify the 5′ ends of nascent capped transcripts. Second, they developed a peak-calling algorithm named Defining the proper initiation of RNA polymerase II (Pol ′ II) transcription requires a complex interplay of proteins, FocusTSS to find transcripts in the 5 -GRO-seq data sets DNA elements, and RNA that work together to dictate that were initiated at a focused position on the genome, where on the genome transcription begins. This entails hence identifying clear TSSs to enable analysis of Inr se- the regulated assembly of large multisubunit nucleopro- quences. FocusTSS identified 7678 TSSs that were in tein complexes containing Pol II and many accessory fac- both data sets. Third, to identify sequence motifs enriched tors; the platform for forming these large complexes is the among the focused TSSs, they used the HOMER motif dis- core promoter. The core promoter in human genes is the covery tool (Heinz et al. 2010), which yielded an Inr-like − region from −40 to +40 and flanks the transcription start consensus sequence of BBCABW from 3 to +3 (where, site (TSS) at +1. Although no single core promoter element B = C/G/T, W = A/T, and +1 is underlined). Forty percent is contained in all human promoters, many contain one or of the focused TSSs contained a perfect match to the more of the following core elements (Fig. 1): the TATA BBCABW consensus Inr. Similar computational analyses box, initiator (Inr), TFIIB recognition elements (BREu performed with data sets from three other human cell and BREd), polypyrimidine initiator (TCT), motif ten ele- lines yielded the same Inr consensus sequence. Interest- ment (MTE), and downstream core promoter element ingly, their analyses also revealed that Inr-containing pro- (DPE) (for review, see Danino et al. 2015). Of these, the moters are less likely to have a TATA box than promoters Inr element encompasses the TSS and is thought to be lacking an Inr and that there is no correlation between the the most common core promoter element, with previous presence of BBCABW Inr elements and CpG islands. studies estimating that ∼50% of human core promoters The importance of the sequence at individual positions contain an Inr (Gershenzon and Ioshikhes 2005; Yang in the BBCABW consensus Inr sequence was tested us- et al. 2007). The commonly used consensus sequence for ing in vitro transcription assays (Vo ngoc et al. 2017). the human Inr, which was derived from mutational anal- Two native core promoters that each contained a consen- yses, is YYANWYY from −2 to +5 (where, Y = C/T, W = A/ sus Inr were used, and single-point mutations were made − T, N = A/C/G/T, and +1 is underlined) (Javahery et al. at each position from 3 to +3 that took the sequence − 1994; Lo and Smale 1996). More recently, analysis of ge- away from consensus. The sequences at positions 1to nome-wide CAGE (cap analysis gene expression) data +3 were the most important for setting levels of basal tran- led to the considerably shorter Inr consensus of YR from scription, with mutations at +1 and +3 showing the largest reductions in transcription levels. In addition, 12 natural

[Keywords: RNA polymerase II; initiator; core promoter; transcription © 2017 Kugel and Goodrich This article is distributed exclusively by start site; focused transcription] Cold Spring Harbor Laboratory Press for the first six months after the Corresponding authors: [email protected], jennifer.kugel@ full-issue publication date (see http://genesdev.cshlp.org/site/misc/ colorado.edu terms.xhtml). After six months, it is available under a Creative Commons Article is online at http://www.genesdev.org/cgi/doi/10.1101/gad.295980. License (Attribution-NonCommercial 4.0 International), as described at 117. http://creativecommons.org/licenses/by-nc/4.0/.

GENES & DEVELOPMENT 31:1–2 Published by Cold Spring Harbor Laboratory Press; ISSN 0890-9369/17; www.genesdev.org 1 Downloaded from genesdev.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Kugel and Goodrich

the effects on the position of the TSS, level of transcrip- tion, and occupancy of factors at the core promoter. Final- ly, this work was limited to the analysis of core promoters with focused TSSs. Although much more complicated, it will be important to extend this new approach to promot- ers with nonfocused start sites to investigate whether such promoters contain Inr elements. This study illus- Figure 1. Relative locations of select human core promoter ele- trates that, despite years of research, much remains to ments and the Inr consensus sequence found in promoters with be learned about core promoters and how they set start focused TSSs. The promoter elements depicted include BREu site positions and levels of transcription at human genes. (the upstream TFIIB recognition element), TATA (the TATA box), BREd (the downstream TFIIB recognition element), Inr (new consensus sequence shown), MTE, and DPE.

References core promoters were chosen that each differed from con- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, sensus at one position; these positions were mutated to − Ponjavic J, Semple CA, Taylor MS, Engström PG, Frith MC, create the Inr consensus. Mutating positions 1 to +3 to- et al. 2006. Genome-wide analysis of mammalian promoter ward consensus increased transcriptional activity, and, architecture and evolution. Nat Genet 38: 626–635. again, the mutations at positions +1 and +3 had the great- Danino YM, Even D, Ideses D, Juven-Gershon T. 2015. The core est effect. promoter: At the heart of gene expression. Biochim Biophys This work provides a substantial step forward in under- Acta 1849: 1116–1131. standing core promoter sequences, establishes a new ap- Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin proach to defining TSSs, and raises many interesting A. 2008. A code for transcription initiation in mammalian ge- questions that will guide future research. For example, al- nomes. Genome Res 18: 1–12. though the Inr is enriched at promoters with focused tran- Gershenzon NI, Ioshikhes IP. 2005. Synergy of human Pol II core scriptional start sites, it is also found randomly promoter elements revealed by statistical sequence analysis. Bioinformatics 21: 1295–1300. distributed throughout the genome. Hence, a consensus Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng Inr alone does not constitute a promoter. The data also JX, Murre C, Singh H, Glass CK. 2010. Simple combinations of showed that promoters with consensus Inr sequences lineage-determining transcription factors prime cis-regulato- are relatively deficient in TATA boxes. It will be interest- ry elements required for macrophage and B cell identities. ing to determine the interplay between other core promot- Mol Cell 38: 576–589. er elements and the Inr at promoters with focused TSSs. Javahery R, Khachi A, Lo K, Zenzie-Gregory B, Smale ST. 1994. Although this work defines a clear correlation between DNA sequence requirements for transcriptional initiator ac- the presence of consensus Inr sequences and focused tivity in mammalian cells. Mol Cell Biol 14: 116–127. TSSs, the extent to which the Inr itself causes start sites Kadonaga JT. 2012. Perspectives on the RNA polymerase II core to be focused remains to be determined. In addition, the promoter. Wiley Interdiscip Rev Dev Biol 1: 40–51. role of specific Inr positions in controlling cellular tran- Lo K, Smale ST. 1996. Generality of a functional initiator consen- Gene – scription warrants further investigation. For example, sus sequence. 182: 13 22. Vo ngoc L, Cassidy CJ, Huang CY, Duttke SHC, Kadonaga JT. C− and A were found most frequently in Inr sequences 1 +1 2017. The human initiator is a distinct and abundant element identified in cells, but mutating C− away from consensus 1 that is precisely positioned in focused core promoters. Genes did not have a strong effect on transcription in vitro. The Dev (this issue). doi: 10.1101/gad.293837.116. investigators suggest there is an additional constraint for Yang C, Bolotin E, Jiang T, Sladek FM, Martinez E. 2007. Preva- the use of C−1 in cells. Many of the questions raised by lence of the initiator over the TATA box in human and yeast this study could be answered by changing the sequences genes and identification of DNA motifs enriched in human of core promoters in the human genome to determine TATA-less core promoters. Gene 389: 52–65.

2 GENES & DEVELOPMENT Downloaded from genesdev.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press

Finding the start site: redefining the human initiator element

Jennifer F. Kugel and James A. Goodrich

Genes Dev. 2017, 31: Access the most recent version at doi:10.1101/gad.295980.117

Related Content The human initiator is a distinct and abundant element that is precisely positioned in focused core promoters Long Vo ngoc, California Jack Cassidy, Cassidy Yunjing Huang, et al. Genes Dev. January , 2017 31: 6-11

References This article cites 10 articles, 3 of which can be accessed free at: http://genesdev.cshlp.org/content/31/1/1.full.html#ref-list-1

Articles cited in: http://genesdev.cshlp.org/content/31/1/1.full.html#related-urls

Creative This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first Commons six months after the full-issue publication date (see License http://genesdev.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the top Service right corner of the article or click here.

© 2017 Kugel and Goodrich; Published by Cold Spring Harbor Laboratory Press