Transcriptional Regulatory Elements in the Human Genome
Total Page:16
File Type:pdf, Size:1020Kb
ANRV285-GG07-02 ARI 8 August 2006 1:29 Transcriptional Regulatory Elements in the Human Genome Glenn A. Maston, Sara K. Evans, and Michael R. Green Howard Hughes Medical Institute, Programs in Gene Function and Expression and Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605; email: [email protected], [email protected], [email protected] Annu. Rev. Genomics Hum. Genet. 2006. Key Words 7:29–59 bioinformatics, functional genomics, transcription factors First published online as a Review in Advance on May 23, 2006 Abstract The Annual Review of Genomics and Human Genetics is online at The faithful execution of biological processes requires a precise and genom.annualreviews.org carefully orchestrated set of steps that depend on the proper spa- by Stanford University Robert Crown Law Lib. on 04/03/07. For personal use only. tial and temporal expression of genes. Here we review the various This article’s doi: 10.1146/annurev.genom.7.080505.115623 classes of transcriptional regulatory elements (core promoters, prox- Annu. Rev. Genom. Human Genet. 2006.7:29-59. Downloaded from arjournals.annualreviews.org imal promoters, distal enhancers, silencers, insulators/boundary el- Copyright c 2006 by Annual Reviews. All rights reserved ements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that in- 1527-8204/06/0922-0029$20.00 teracts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcrip- tional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regu- latory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome. 29 ANRV285-GG07-02 ARI 8 August 2006 1:29 INTRODUCTION expression and highlight diseases that result from their alteration. Finally, we review the The faithful execution of biological processes methods currently used to identify transcrip- LCR: locus control such as development, proliferation, apopto- tional regulatory elements, both experimen- region sis, aging, and differentiation requires a pre- tally and through bioinformatics approaches. Combinatorial cise and carefully orchestrated set of steps that control: the depend on the proper spatial and temporal ex- concerted action of pression of genes. As a result, deregulation of combinations of EUKARYOTIC gene expression can often lead to disease. The multiple TRANSCRIPTION: transcriptional completion of the human genome sequence AN OVERVIEW regulatory elements and its annotation using computational and The expression of eukaryotic protein-coding and their cognate comparative genomic methods has led to the genes (also called class II or structural genes) transcription factors cataloging of ∼20,000–25,000 protein-coding can be regulated at several steps, including genes (39). Key questions now relate to un- transcription initiation and elongation, and derstanding how these genes and their prod- mRNA processing, transport, translation, and ucts function, as well as how their spatial and stability. Most regulation, however, is believed temporal expression patterns are established to occur at the level of transcription initiation. at both the cellular and organismal level. In eukaryotes, transcription of protein-coding To understand the molecular mechanisms genes is performed by RNA polymerase II. that govern specific expression patterns on a Genes transcribed by RNA polymerase II global scale, it is important to identify the typically contain two distinct families of - transcriptional regulatory elements associated cis acting transcriptional regulatory DNA ele- with each predicted gene. Moreover, the abil- ments: ( ) a promoter, which is composed of ity to identify such elements is an impor- a a core promoter and nearby (proximal) regu- tant step toward understanding how gene latory elements, and ( ) distal regulatory el- expression is altered in pathological condi- b ements, which can be enhancers, silencers, tions. Thus, one of the main emerging chal- insulators, or locus control regions (LCR) lenges for genomics research is to identify all (Figure 1). These -acting transcriptional functional elements in the genome, includ- cis regulatory elements contain recognition sites ing those that regulate gene expression. The for -acting DNA-binding transcription availability of the complete human genome trans factors, which function either to enhance or sequence, in combination with genome-wide repress transcription. expression data, will facilitate the comprehen- The structure of human gene promot- sive identification of these transcriptional reg- ers can be quite complex, typically con- ulatory elements. In addition, these resources sisting of multiple transcriptional regulatory serve as a starting point for studying transcrip- by Stanford University Robert Crown Law Lib. on 04/03/07. For personal use only. elements. The need for this complexity be- tion regulation of human genes on a global comes clear when one considers that although scale, and provide information regarding the Annu. Rev. Genom. Human Genet. 2006.7:29-59. Downloaded from arjournals.annualreviews.org the human genome contains ∼20,000–25,000 establishment of spatial and temporal gene genes, each of which may have a unique spa- expression patterns and the mechanisms re- tial/temporal expression pattern, it encodes quired for their establishment. only ∼1850 DNA-binding transcription Here we review the various classes of tran- factors—presumably far less than the number scriptional regulatory elements and the cur- of expression patterns that must be generated rent understanding of how they function. We (183). The presence of multiple regulatory el- begin with an overview of the eukaryotic tran- ements within promoters confers combinato- scription process and the molecular machin- rial control of regulation, which exponentially ery that drives it. We then focus on the role increases the potential number of unique ex- of transcriptional regulatory elements in gene pression patterns. The challenge now is to 30 Maston · Evans · Green ANRV285-GG07-02 ARI 8 August 2006 1:29 understand how different permutations of the Distal regulatory elements same regulatory elements alter gene expres- sion. An understanding of how the combina- Locus control region Insulator torial organization of a promoter encodes reg- Silencer Enhancer ulatory information first requires an overview of the proteins that constitute the transcrip- tional machinery. Proximal Core THE EUKARYOTIC promoter promoter TRANSCRIPTIONAL elements MACHINERY Promoter ( 1 kb) Factors involved in the accurate transcrip- tion of eukaryotic protein-coding genes by Figure 1 RNA polymerase II can be classified into three Schematic of a typical gene regulatory region. The promoter, which is groups: general (or basic) transcription fac- composed of a core promoter and proximal promoter elements, typically spans less than 1 kb pairs. Distal (upstream) regulatory elements, which can tors (GTFs), promoter-specific activator pro- include enhancers, silencers, insulators, and locus control regions, can be teins (activators), and coactivators (Figure 2). located up to 1 Mb pairs from the promoter. These distal elements may GTFs are necessary and can be sufficient for contact the core promoter or proximal promoter through a mechanism that accurate transcription initiation in vitro (re- involves looping out the intervening DNA. viewed in 141). Such factors include RNA polymerase II itself and a variety of auxil- (73); subsequent reinitiation of transcription iary components, including TFIIA, TFIIB, then only requires rerecruitment of RNA TFIID, TFIIE, TFIIF, and TFIIH. In addi- polymerase II-TFIIF and TFIIB. General tion to these “classic” GTFs, it is apparent that The assembly of a PIC on the core pro- transcription factor in vivo transcription also requires Mediator, moter is sufficient to direct only low levels of (GTF): a factor that a highly conserved, large multisubunit com- accurately initiated transcription from DNA assembles on the plex that was originally identified in yeast (re- templates in vitro, a process generally referred core promoter to viewed in 38, 119). to as basal transcription. Transcriptional ac- form a preinitiation complex and is GTFs assemble on the core promoter in tivity is greatly stimulated by a second class required for an ordered fashion to form a transcription of factors, termed activators. In general, ac- transcription of all preinitiation complex (PIC), which directs tivators are sequence-specific DNA-binding (or almost all) genes RNA polymerase II to the transcription start proteins whose recognition sites are usually Coactivators: site (TSS). The first step in PIC assembly present in sequences upstream of the core adaptor proteins that is binding of TFIID, a multisubunit com- promoter (reviewed in 149). Many classes of typically lack by Stanford University Robert Crown Law Lib. on 04/03/07. For personal use only. plex consisting of TATA-box-binding pro- activators, discriminated by different DNA- intrinsic sequence-specific tein (TBP) and a set of tightly bound TBP- binding domains, have been described, each DNA binding but Annu. Rev. Genom. Human Genet. 2006.7:29-59. Downloaded from arjournals.annualreviews.org associated factors (TAFs). Transcription then