Open Access Research
Mariño-Ramírez et al. 2006, Volume 7, Issue 12, Article R122

Multiple independent evolutionary solutions to core histone gene regulation
Leonardo Mariño-Ramírez*, I King Jordan† and David Landsman*

Addresses: *Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894-6075, USA. †School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, Georgia 30332-0230, USA.

Correspondence: David Landsman. Email: [email protected]

Published: 21 December 2006
Genome Biology 2006, 7:R122 (doi:10.1186/gb-2006-7-12-r122)
Received: 8 August 2006
Revised: 20 October 2006
Accepted: 21 December 2006
The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2006/7/12/R122

© 2006 Mariño-Ramírez et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Evolution of histone regulation: An analysis of the evolutionary dynamics of the regulatory mechanisms that give rise to the conserved histone regulatory phenotype indicates a substantial evolutionary turnover of cis-regulatory sequence motifs along with the transcription factors that bind them.

Abstract

Background: Core histone genes are periodically expressed along the cell cycle and peak during S phase. Core histone gene expression is deeply evolutionarily conserved from the yeast Saccharomyces cerevisiae to human.

Results: We evaluated the evolutionary dynamics of the specific regulatory mechanisms that give rise to the conserved histone regulatory phenotype. In contrast to the conservation of core histone gene expression patterns, the core histone regulatory machinery is highly divergent between species. There has been substantial evolutionary turnover of cis-regulatory sequence motifs along with the transcription factors that bind them. The regulatory mechanisms employed by members of the four core histone families are more similar within species than within gene families. The presence of species-specific histone regulatory mechanisms is opposite to what is seen at the protein sequence level. Core histone proteins are more similar within families, irrespective of their species of origin, than between families, which is consistent with the shared common ancestry of the members of individual histone families. Structure and sequence comparisons between histone families reveal that H2A and H2B form one related group whereas H3 and H4 form a distinct group, which is consistent with the nucleosome assembly dynamics.

Conclusion: The dissonance between the evolutionary conservation of the core histone gene regulatory phenotypes and the divergence of their regulatory mechanisms indicates a highly dynamic mode of regulatory evolution. This distinct mode of regulatory evolution is probably facilitated by a solution space for promoter sequences, in terms of functionally viable cis-regulatory sites, that is substantially greater than that of protein sequences.

Background

Core histone genes encode four families of proteins that package DNA into the nucleosome, which is the basic structural unit of eukaryotic chromatin [1]. The four core histones are H2A, H2B, H3 and H4, and each nucleosome consists of 146 base-pairs (bp) of DNA wrapped around an octameric core containing two copies of each histone protein. Comparative studies of core histones have revealed that their sequences are among the most evolutionarily conserved of all eukaryotic proteins [2]. For instance, the human H4 protein (NP_003539) is 92% identical to its yeast Saccharomyces cerevisiae ortholog (NP_014368) [3]. The high levels of core histone sequence conservation are thought to be due to severe structural constraints imposed by their assembly into the histone octamer [4] as well as the similar functional constraints across species associated with the compact binding of DNA [5].

Most of the packaging of genomic DNA by core histones occurs during the S phase of the cell cycle, when DNA is being actively replicated; stoichiometrically appropriate levels of histone proteins are required to bind DNA immediately following replication [6]. As such, the expression of core histone genes is tightly regulated and peaks sharply during S phase [7]. Much like the histone sequences, this histone gene expression pattern is highly conserved among eukaryotes ranging from human to the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe [8-13].

The mechanisms that underlie the cell cycle specific regulation of core histone genes have been intensively studied [6,7]. Although most of this work has focused on the regulation of transcription via the interaction of cis-regulatory elements and transcription factors, a number of studies have also addressed the role of post-transcriptional regulation of core histone synthesis. Here, we focus exclusively on the regulation of core histone gene expression at the transcriptional level. Numerous studies have characterized core histone cis-regulatory sites and their cognate transcription factors [7,14-23]. Sequence logos representing 14 experimentally verified cis-regulatory motifs, along with the names of the transcription factors that bind them, are shown in Figure 1.

The studies that resulted in the characterization of these motifs and transcription factors have led to the elucidation of core histone gene regulation in model experimental systems such as S. cerevisiae. For example, the yeast transcription factor Spt10p was recently demonstrated to activate core histone gene expression [16]. Interestingly, the SPT10 gene was originally identified as a suppressor of Ty insertion mutations [24,25] and as a global regulator of core promoter activity [26]. However, despite the fact that Spt10p affects the expression of hundreds of yeast genes, it specifically binds cis-regulatory sequences, referred to as upstream activating elements, which are found only in core histone gene promoters. Thus, the global regulatory properties of Spt10p are based solely on changes in levels of core histone gene expression. In support of this model of histone gene regulation, the DNA-binding domain of Spt10p was recently characterized and shown to mediate sequence-specific interaction with the core histone gene upstream activating element [27]. There are a number of such examples, from S. cerevisiae and other model systems, of efforts to characterize experimentally the mechanisms of core histone gene regulation. In addition, efforts are underway to investigate core histone promoters among different species computationally [28].

Despite the substantial body of knowledge on the regulation of core histone genes, little is known about the evolutionary dynamics that have given rise to these regulatory mechanisms. We present here an evolutionary analysis of core histone gene regulatory mechanisms. The emphasis of this work is placed on understanding the evolution of cis-regulatory sites along with their cognate transcription factors. We analyzed the phyletic distributions of 14 experimentally verified core histone cis-regulatory elements among 24 crown group eukaryotes. The evolution of core histone gene cis-regulatory sites and transcription factors is considered in light of core histone protein sequence and structure evolution. Despite the highly conserved core histone sequences and expression patterns, the mechanisms of histone gene regulation were found to be highly divergent and lineage specific. The implications of this dissonance with respect to the evolution of gene regulatory systems are explored.

Results and discussion

Gene expression patterns

The expression of core histone genes is tightly regulated during the cell cycle and peaks specifically during S phase, concomitant with DNA replication (Figure 2). This is thought to be due to the requirement for histone proteins to bind DNA immediately after its synthesis. A number of recent studies have revealed the extent to which this S phase specific pattern of core histone gene expression is conserved among eukaryotic species; the histone expression pattern has been demonstrated for human core histone genes as well as for histones from S. cerevisiae and S. pombe [8-13]. This highly conserved regulatory phenotype (the expression pattern) is consistent with the deep conservation of histone protein sequences and further underscores the strong functional (selective) constraint that histone genes are subject to. Considering the highly conserved regulatory phenotype of core histone genes, it would seem to follow that their regulatory mechanisms are similarly conserved.

Lineage-specific cis-regulatory mechanisms

Contrary to the expectation that core histone genes would have conserved regulatory mechanisms across species, the best studied core histone genes - namely human and S. cere-

Figure 1
Core histone gene cis-regulatory sequence motifs and transcription factors. Experimentally verified cis-regulatory motifs and their transcription factors were taken from the literature as described in the Background section (see text). Sequence logos for the cis-motifs show information content (conservation) per position. Unidentified transcription factors are indicated by NI. TF, transcription factor.
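The information content plotted in a sequence logo can be illustrated with a short calculation. The sketch below is a minimal example, assuming a DNA position frequency matrix stored as a list of per-position count dictionaries and the uncorrected formula IC = 2 + sum(p * log2 p); the function name and the toy counts are hypothetical and are not taken from the motifs in Figure 1.

```python
import math

def information_content(pfm):
    """Return the information content (bits) at each position of a DNA PFM.

    pfm is a list of positions; each position is a dict of counts for A, C, G, T.
    No small-sample correction is applied (an assumption of this sketch).
    """
    ic = []
    for counts in pfm:
        total = sum(counts.values())
        entropy = 0.0
        for base in "ACGT":
            p = counts.get(base, 0) / total
            if p > 0:
                entropy -= p * math.log2(p)
        # Maximum entropy for four bases is 2 bits.
        ic.append(2.0 - entropy)
    return ic

# Hypothetical three-position motif; the counts are invented for illustration.
toy_pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # fully conserved position
    {"A": 5, "C": 5, "G": 0, "T": 0},   # two equally frequent bases
    {"A": 3, "C": 3, "G": 3, "T": 3},   # uniform position
]
print(information_content(toy_pfm))
```

A fully conserved position reaches the 2-bit maximum shown on the logo axes, whereas a position with no base preference carries no information.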

[Figure 1 image: sequence logos (information content in bits per motif position) labeled with motif name and bound transcription factor (TF): SPT10 (Spt10); E2F; TBP (TBP); GC box (Sp1); NEG (NI); HEX (NI); AACCCT (NI); HiNF-D (HiNF-D); Oct-1 (POU2F1); IRF-7 (IRF-7); CCAAT box (NF-Y); IRF-1 (IRF-1); a-CP1 (NF-Y); TATA box (TFIID).]

Figure 2
Cell cycle (S phase) specific expression patterns of core histone genes. A cluster of eight core histone genes and their relative expression levels are plotted along the progression time of the cell cycle for the yeast S. cerevisiae.
[Figure 2 image: Log(ratio), from -3 to 3, plotted against Time (minutes), from 0 to 300, across successive cell cycle phases (G1, S, G2, M); series: H2A.1, H2A.2, H2B.1, H2B.2, H3.1, H3.2, H4.1, H4.2.]

visiae (yeast) - have different promoter architectures; in fact, they are regulated quite differently [7]. The human and yeast core histone promoters, many of which are bidirectional, are illustrated in Figure 3. Human core histone gene promoters contain more known cis-regulatory binding sites, relative to yeast promoters, which is consistent with the involvement of more transcription factors and the greater complexity of human histone gene regulation. Out of the 14 experimentally characterized cis-regulatory sites that are known to be involved in histone gene regulation in the two species, only one site, the TBP/TATA box, is shared between the two species (Table 1). Furthermore, the phyletic distributions (the presence/absence among species) of the trans-regulatory binding proteins that interact with these sites tend to be lineage specific (Table 2).

In order to evaluate the evolution of core histone promoter cis-regulatory sites in more detail, the phyletic distribution of all 14 experimentally characterized DNA binding motifs among 24 crown group eukaryotic species was assessed. To do this, position frequency matrices (PFMs) of the cis-regulatory motifs (Figure 1) were taken from the TRANSFAC database [29] or were generated from the binding site alignments reported in the original citation. Intergenic promoter regions of core histones (H2A, H2B, H3, and H4) for all 24 species were then searched for the presence of the 14 cis-regulatory

Figure 3
Schema for core histone gene promoters. (a) Four different human core histone gene promoters are shown along with the relative locations of predicted cis-binding motifs. Official gene names are indicated for each promoter. These are examples of genes that are not divergently transcribed because a pair of divergently transcribed genes share identical motifs. (b) Yeast (S. cerevisiae) bidirectional core histone promoters and cis-binding motifs. The promoter sequences and, accordingly, the location/presence of the cis-motifs of individual members of each family may vary for each gene. Not drawn to scale.
[Figure 3 image: (a) motifs shown for HIST1H2AB (H2A): GC, IRF-1, IRF-7, TATA, HiNF-D, Oct-1, CCAAT; HIST1H2BB (H2B): Oct-1, GC, IRF-1, IRF-7, TATA, HiNF-D, CCAAT; HIST1H3E (H3): Oct-1, E2F, IRF-1, TATA, IRF-7, HiNF-D, HEX; HIST1H4B (H4): GC, IRF-1, TATA, HiNF-D, IRF-7. (b) H2A-H2B and H3-H4 bidirectional promoters, each with TATA, SPT10, SPT10, TATA and NEG.]

motifs using the program CLOVER [30]. CLOVER uses the cis-regulatory site PFMs to evaluate the promoter sequences for statistically significant over- or under-representation of motif elements. For any given promoter sequence (Pi), CLOVER assigns a numerical value (raw score) to each cis motif (j) indicating its over- or under-representation in that sequence. The distribution of cis-regulatory motifs in that promoter is then represented as a vector, Pi = (Pi1, Pi2, ..., Pi14), of sequence- and motif-specific CLOVER scores (Pij). The CLOVER-generated vectors were then compared using the Pearson correlation coefficient (r). High r values would thus represent two promoter sequences with similar cis-regulatory binding sites. The r values were transformed into pair-wise promoter distances using the following formula: d = 1 - (r + 1)/2, so that d ranges from 0 (r = 1) to 1 (r = -1).

A total of 254 core histone promoter sequences were compared in this way, resulting in a matrix of 32,131 pair-wise distances (254 × 253/2). This distance matrix was evaluated using a fast implementation of the neighbor-joining algorithm [31,32] to determine the evolutionary relationships, based on cis-regulatory binding sites, among the core histone promoter

sequences (Figure 4). Surprisingly, when histone promoter sequences are related in this way, they tend to form clusters that are relatively lineage specific with respect to the species from which they are derived rather than their family of origin. For instance, there are fairly well defined clusters of histone promoters that are fungi specific and others that are metazoan specific (see red blocks in Figure 4). Importantly, these distinct clusters contain promoters from all four histone gene families. In general, histone promoter sequences from different families are completely intermixed on the tree (they do not tend to group into gene family specific clusters). This suggests that some core histone promoter regions may be evolving in concert within evolutionary lineages, perhaps due to similar lineage-specific regulatory constraints.

Table 1
Distribution of core histone regulatory motifs among human and yeast

Motif (a)        Human (b)   Yeast (b)
Spt10            -           +
NEG              -           +
TBP/TATA box     +           +
CCAAT box        +           -
Alpha-CP1        +           -
Oct-1            +           -
IRF-7            +           -
GC box           +           -
HEX              +           -
E2F              +           -
HiNF-D           +           -
IRF-1            +           -

(a) Name of the cis-regulatory binding motif/transcription factor.
(b) Presence (+) or absence (-) of the element in human or yeast (S. cerevisiae).

The lineage-specific nature of core histone promoter sequence evolution was further explored by generating a species distance matrix analogous to the sequence distance matrix described above. For the species distance matrix, CLOVER scores were calculated for sets of all promoter sequences from individual species. Species-specific CLOVER vectors calculated in this way were compared using r value distances, and the resulting distance matrices were used to compute a neighbor joining tree (Figure 5a). As a control, the same comparison was done using promoter sequences that were randomly permuted with preservation of their mono- and dinucleotide frequencies (Figure 5b). Although the topology of the control tree shows no relationship to the species phylogeny, the topology of the tree generated from the observed data is in general agreement with the species phylogeny and thus underscores the within-species coherence of the core histone promoter cis-regulatory motifs. There are, however, some interesting exceptions to this trend. For example, S. pombe and Aspergillus nidulans are found in a cluster that includes Drosophila mojavensis; in addition, Arabidopsis thaliana is nested close to vertebrates as opposed to being an outgroup to the entire ensemble, as would be expected.

Motif evolutionary dynamics

Further examination of the cis-regulatory motif distribution within the yeast group of species (order Saccharomycetales) shows that different combinations of motifs have distinct evolutionary trajectories, suggesting lineage-specific mechanisms of regulation (Figure 6). For instance, Spt10p and TBP combine to regulate core histones among all Saccharomycetales species evaluated here, whereas the NEG element

Table 2

Phyletic distribution of core histone transcription factors

Transcription factor   RefSeq accession (protein name) (a)   Phyletic distribution (b)
E2F                    NP_005216 ()                          Metazoans and plants
                       NP_009042 (TFDP1)                     Metazoans and plants
TBP                    NP_950248 (TBPL2)                     Eukaryota
Sp1                    NP_038700 (SP1)                       Metazoans
HiNF-D                 NP_853530 (CUTL1)                     Metazoans
Oct-1                  NP_002688 (POU2F1)                    Metazoans
IRF-7                  NP_058546 (Irf7)                      Vertebrates
IRF-1                  NP_032416 (Irf1)                      Vertebrates
NF-Y                   NP_002496 (NFYA)                      Eukaryota
                       NP_006157 (NFYB)                      Eukaryota
                       NP_055038 (NFYC)                      Eukaryota
Spt10p                 NP_012408 (SPT10)                     Ascomycota

(a) Accession identifier from the NCBI Reference Sequence (RefSeq) database along with official protein name for the DNA-binding protein.
(b) Deepest taxonomic node(s) that covers the phyletic distribution of the transcription factor.
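The "deepest taxonomic node" in the last column of Table 2 can be read as the most specific taxon shared by every species in which a factor is found. The sketch below is only an illustration of that idea, assuming each species is described by a root-to-tip lineage list; the function name and the simplified lineages are hypothetical, and the procedure the authors used to derive the column is not specified here. A single common node cannot express a disjoint distribution such as "Metazoans and plants", which this simple version would collapse to Eukaryota.

```python
def deepest_common_node(lineages):
    """Return the most specific taxon shared by all lineages, or None.

    Each lineage is a list of taxon names ordered from the root downwards.
    """
    common = None
    for ranks in zip(*lineages):
        if len(set(ranks)) == 1:
            common = ranks[0]
        else:
            break
    return common

# Simplified, illustrative lineages (not complete taxonomies).
human = ["Eukaryota", "Opisthokonta", "Metazoa", "Chordata", "Vertebrata", "Mammalia"]
zebrafish = ["Eukaryota", "Opisthokonta", "Metazoa", "Chordata", "Vertebrata", "Actinopterygii"]
fruit_fly = ["Eukaryota", "Opisthokonta", "Metazoa", "Arthropoda"]

print(deepest_common_node([human, zebrafish]))
print(deepest_common_node([human, zebrafish, fruit_fly]))
```

With these toy lineages, a factor found only in human and zebrafish maps to the vertebrate node, and adding the fruit fly moves the node up to the metazoans.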


Figure 4
Relationships among core histone gene promoter sequences. Promoter sequences are related by comparisons of cis-regulatory motif vectors, as described in the text. Individual promoters are ordered by similarity along each axis. Pair-wise correlations between promoter-specific vectors are color coded according to the scale bar shown. The block color structure along the diagonal reveals clusters of related promoter sequences.
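The comparison of motif vectors that underlies this figure can be sketched in a few lines. The example below is a minimal illustration, assuming each promoter is represented by a list of motif scores; it computes the Pearson correlation r and the distance d = 1 - (r + 1)/2 given in the text. The function names and the four-element toy vectors are hypothetical (the vectors in the study have 14 entries, one per motif), and no CLOVER output is reproduced here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def promoter_distance(x, y):
    """Distance d = 1 - (r + 1)/2: 0 for r = 1, 1 for r = -1."""
    return 1.0 - (pearson(x, y) + 1.0) / 2.0

def distance_matrix(vectors):
    """Return {(name_a, name_b): d} for every unordered pair of promoters."""
    names = sorted(vectors)
    return {(a, b): promoter_distance(vectors[a], vectors[b])
            for a, b in combinations(names, 2)}

# Toy score vectors, invented for illustration only.
toy_vectors = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with p1
    "p3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with p1
}
toy_distances = distance_matrix(toy_vectors)
print(toy_distances)

# Number of unordered pairs among 254 promoters, as quoted in the text.
print(math.comb(254, 2))
```

The resulting pair-wise distances are the kind of input a neighbor-joining implementation expects; the tree-building step itself is not shown.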

exerts its negative regulatory effects exclusively among S. cerevisiae and its two closest relatives. Furthermore, the position-specific sequence conservation of cis motifs is coherent within species but divergent between species. The information content along positions of the motif sequences changes slightly between lineages, and visual inspection of these changes suggests that they are not always in accordance with the phylogenetic relationships among species (Figure 6).

The position of cis-regulatory motifs in the proximal promoter sequences is also critical to histone gene regulation as demonstrated by the conserved relative positions of the

motifs in particular species contexts (Figure 7). Spt10p has four experimentally characterized binding sites for each bidirectional promoter in S. cerevisiae (Figure 3). Accordingly, when the relative positions of Spt10p cognate sequence motifs are evaluated among all species where they are present, they exhibit a marked clustering in the center of the promoter regions (compare Figure 7 panels a and b). On the other hand, NEG and TBP are excluded from the centers of the core histone promoters and tend to map closer to the translational start sites (Figure 7c-f).

Sequence and structure evolution

The lineage-specific pattern of core histone promoter evolution revealed by the comparative analysis of cis-regulatory motif sequences stands in contrast to the evolution of core histone protein sequences and structures. There are four families of core histone proteins, namely H2A, H2B, H3 and H4, and these families are present in all eukaryotes, indicating that they probably evolved via three ancient gene duplication events that preceded the diversification of the eukaryotic lineage. Given this evolutionary scenario, it can be expected that all protein sequences (structures) of a given family will be more closely related to one another, regardless of the species from which they are derived, than they are to members of other families. Straightforward sequence comparison methods, such as BLASTP [33], bear this expectation out (data not shown). In fact, although sequences within families are highly conserved, it is not possible to identify members of different families using pair-wise BLASTP comparisons. On the other hand, despite its low sequence similarity among core histones, the histone fold domain (HFD) is present in all four core histones [34,35].

In order to explore the sequence/structure relationships within and among core histone protein families, sensitive methods of comparison are needed. For instance, comparisons of three-dimensional protein structures [36] can often reveal deep evolutionary relationships that are not apparent when protein sequences alone are compared. A high-resolution structure of the Xenopus laevis nucleosome exists, and structural comparison of the individual histone units, which correspond to distinct histone families, was performed using similarity scores from the DALI database [37]. The statistically significant similarity scores observed indicate that the signal of common ancestry among all histone families is preserved at the structural level. For each histone variant, its pair-wise DALI Z scores were normalized by the self-comparison Z score (Zij/Zii) to yield a relative Z score (Zr), and the distance was taken as d = 1 - Zr. The resulting pair-wise distance matrix was used to build a neighbor joining tree for the four histone families (Figure 8a). This tree shows that H2A and H2B form one related cluster, whereas H3 and H4 form another. Interestingly, these evolutionary relationships are reflected in the structure (Figure 8b) and assembly dynamics of the histone octamer [38]. H3 and H4 first form dimers that come together as a tetramer. Meanwhile, H2A and H2B form dimers separately and these H2A-H2B dimers join the H3-H4 tetramer to form the octamer.

A more detailed analysis of the evolutionary relationships within and between histone protein families was performed using a comparative analysis of the HFD. The HFD is represented in the Pfam database, and an alignment of its representative members has been used to generate a hidden Markov model (HMM) that captures the position-specific sequence variation characteristic of the domain. In order to build a multiple sequence alignment that unites members of all four families, representative members of each family from the 24 species analyzed here were aligned in register to the HFD-HMM. This HFD multiple sequence alignment was then used to calculate all pair-wise distances, within and between families, and to build a HFD phylogeny (Figure 8c). As expected, all members within any given family are more closely related to one another than to members of any other family. The phylogenetic relationships within families are largely consistent with the established taxonomic relationships of the species from which the sequences were derived. However, the relatively high within-family sequence identities, as well as the level of resolution afforded by the between-family HMM approach, do not lend themselves to robust delineation of evolutionary relationships within families. Perhaps most germane is the fact that the between-family relationships illustrated by the HFD-HMM approach are identical to those seen in the DALI structural comparison. It is worth reiterating that these family-specific protein sequence relationships are totally discordant with the largely lineage-specific promoter sequence element relationships.

Conclusion

We have demonstrated a striking dissonance between the deep evolutionary conservation of core histone regulatory phenotypes and the profound divergence of their regulatory mechanisms. Core histone genes exhibit similar cell cycle (S phase specific) expression patterns from the yeast S. cerevisiae to human (Figure 2). This regulatory conservation is con-

Figure 5
Relationships among species-specific cis-regulatory motif sets. (a) Species are related by comparisons of cis-regulatory motif vectors as described in the text. Individual sets of promoters are grouped by species, which are then ordered by similarity along each axis. Pair-wise correlations between species vectors are color coded according to the scale bar shown. (b) Randomized promoter sets preserving both mono- and dinucleotide sequence composition are shown for comparison.

Figure 6
Distribution of cis-regulatory motifs among Saccharomycetales. Species are ordered according to their taxonomic relationships and presence/absence of three motifs is shown, along with their sequence logos.
[Figure 6 image: rows, in order: S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, K. waltii, A. gossypii, C. albicans; columns: Spt10, NEG, TBP.]

sistent with the high levels of sequence conservation among core histone proteins. Nevertheless, the regulatory mechanisms that are used to achieve the conserved expression patterns of core histone genes are almost entirely lineage specific. The cis-trans machinery involved in core histone gene regulation has changed substantially between lineages through gain and loss of transcription factor proteins and their cognate binding sites. This suggests that, for families like the core histone genes, phylogenetic footprinting [39] may have limited utility for identifying functional regulatory elements across all but the most closely related species.

In addition to the divergence of cis sites and trans factors, a distinct level of post-transcriptional regulation of core histones emerged along the metazoan evolutionary lineage [40]. Core histone gene 3'-untranslated regions encode a stem loop structure (Figure 9a) that, when bound by protein, greatly increases mRNA stability. This mechanism is respon-

Figure 7
Relative positions of cis-regulatory motifs among Saccharomycetales core histone gene promoters. The relative location of each motif is shown along with its raw CLOVER score; only motifs with scores ≥6 are shown. Relative positions are shown for (a) Spt10, (c) NEG, and (e) TBP. (b, d, f) Randomized distributions of motif positions are shown for comparison.
[Figure 7 image: six scatter panels: (a) Spt10 matrix distribution, (b) Randomized Spt10 matrix distribution, (c) NEG matrix distribution, (d) Randomized NEG matrix distribution, (e) TBP matrix distribution, (f) Randomized TBP matrix distribution; x axis: relative distance from ATG (0.0 to 1.0); y axis: CLOVER score; points labeled by species.]
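The two quantities plotted in Figure 7 can be illustrated with a small filter. The sketch below is a hedged example, assuming each motif hit is recorded as a distance in base pairs from the ATG plus a raw score, and assuming that "relative distance" means that distance divided by the promoter length; this definition, the function name and the toy hits are assumptions of the sketch rather than details given in the text. Only the score threshold of 6 comes from the figure legend.

```python
def relative_motif_positions(hits, promoter_length, min_score=6.0):
    """Keep hits scoring at least min_score and rescale their positions to 0..1.

    hits is a list of (distance_from_atg_bp, raw_score) tuples.
    Returns a list of (relative_distance, raw_score) tuples.
    """
    kept = []
    for distance_bp, score in hits:
        if score >= min_score:
            kept.append((distance_bp / promoter_length, score))
    return kept

# Hypothetical hits in an 800 bp promoter; the values are invented for illustration.
toy_hits = [(50, 7.5), (200, 5.9), (400, 12.0)]
print(relative_motif_positions(toy_hits, promoter_length=800))
```

In this toy case the hit scoring 5.9 is dropped, and the hit at 400 bp falls at a relative distance of 0.5, the center of the promoter.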

Genome Biology 2006, 7:R122 R122.12 Genome Biology 2006, Volume 7, Issue 12, Article R122 Mariño-Ramírez et al. http://genomebiology.com/2006/7/12/R122

[Figure 8, panels (a) to (c): image not reproduced. Recoverable labels are the four family names (H2A, H2B, H3, and H4) in panel (a) and tip labels combining histone family and species in panel (c).]


Figure 8. Core histone protein structure and sequence evolution. (a) Structural relationships between the four core histone protein families. (b) Three-dimensional nucleosome structure is shown with each core histone chain colored: H2A, orange; H2B, yellow; H3, blue; and H4, green. (c) Sequence relationships within and between the four core histone protein families. Internal nodes that set off each core histone family are color coded using the same scheme used for the nucleosome structure. The tree is rooted with archaeal histone-like sequences (white internal node).
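The structural relationships in panel (a) rest on pairwise distances derived from Dali Z scores, with each score normalized by the self-similarity score, d_ij = 1 - Z_ij / Z_ii. The sketch below shows that conversion only. The Z values are invented for illustration and are not the Dali values used in the study, and the function name is hypothetical. Because Z_ii and Z_jj can differ, d_ij and d_ji need not be equal; how the published analysis handled this is not stated, so the sketch leaves the matrix as computed.

```python
def z_to_distance(z):
    """Convert a square matrix of Z scores to distances d_ij = 1 - Z_ij / Z_ii."""
    n = len(z)
    if any(len(row) != n for row in z):
        raise ValueError("Z score matrix must be square")
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if z[i][i] <= 0:
            raise ValueError("self-similarity Z score must be positive")
        for j in range(n):
            distances[i][j] = 1.0 - z[i][j] / z[i][i]
    return distances


# Invented Z scores (not Dali output); rows and columns follow `families`.
families = ["H2A", "H2B", "H3", "H4"]
z_scores = [
    [20.0, 10.0, 5.0, 5.0],
    [10.0, 20.0, 5.0, 5.0],
    [5.0, 5.0, 20.0, 10.0],
    [5.0, 5.0, 10.0, 20.0],
]
d = z_to_distance(z_scores)
```

A matrix of this form can then be passed to any neighbor joining implementation; a self-comparison always yields a distance of zero.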

sible for 70% of the upregulation of core histones in S phase. The sequence that forms the stem loop is conserved across metazoans (Figure 9b). The emergence of this mechanism may have allowed for some of the turnover of the cis-trans regulatory machinery among metazoan genomes subsequent to their divergence from the yeast evolutionary lineage.

There are additional regulatory elements that may help to achieve coordinated regulation of core histone genes in metazoans. For instance, a sequence found in core histone gene encoding regions is important for their expression and may serve as an internal promoter element common to the mammalian lineage [41-43]. In addition, the transcription factor NPAT has been implicated as a global regulator of core histone gene expression among metazoans even though it does not seem to bind any DNA sequence directly [44-46]. This may provide yet another global lineage-specific regulatory mechanism that distinguishes the metazoan mode of core histone gene regulation from that of yeast.

Even though the four core histone gene families (H2A, H2B, H3, and H4) diverged before the species studied here, the regulatory mechanisms are more similar for different family members within species than for the same family members between species (Figures 4 and 5). Thus, there is a kind of concerted regulatory evolution operating between members of different core histone gene families. This pattern stands in stark contrast to the pattern of core histone sequence evolution, whereby members of the same family are more similar to one another across species, reflecting their more recent common ancestry (Figure 8). This suggests that very different modes of evolution exist for histone gene regulation versus protein sequence and structure. The solution space for promoter sequence evolution (the space of functionally viable cis-regulatory binding site sequences) may be far more vast than that of core histone protein sequences. This results in a much more dynamic evolutionary paradigm for promoter sequences and the transcription factor proteins that bind them. Purifying selection may be less efficacious at eliminating variants of cis-regulatory sites because a number of sequence variants may bind transcription factors with similar affinities. In addition, new cis-regulatory sites, which are short and degenerate by nature, may arise relatively quickly through mutation along the promoter. It is possible that these new variants can lead to an exploration of expression space and rapid fixation of adaptive variants by positive selection. Adaptive expression changes of this type may be facilitated by the emergence of intermediate redundant regulatory programs that maintain the ancestral expression pattern and function while simultaneously allowing for selective testing of novel expression patterns [47]. Such an evolutionary mode, with less pronounced purifying and more prominent adaptive selection, could explain the observation that novel cis-trans combinations are subject to substantial turnover and may be regularly reinvented among evolutionary lineages. In addition, the inherent evolutionary flexibility of regulatory systems may allow for coordinated within-species changes that respond to epistatic pressure from other regulatory pathways in the same lineage that share transcription factors.

It is currently unclear whether the turnover of regulatory mechanisms, in the face of conserved expression patterns, is unique to core histones or also occurs for other gene families. Some studies on the evolution of gene regulation do report evidence of conserved regulatory sequences and expression patterns [47,48], whereas others indicate that gene regulatory networks do in fact diverge rapidly [49-51]. However, regulatory divergence usually leads to distinct expression patterns [51-53]. Interestingly, although yeast core histone transcripts include polyA tails, core histone transcripts are unique among metazoan transcripts in that they lack polyA tails. The absence of polyA tails, which are often bound by poly(A)-binding proteins to promote translation initiation, may necessitate, to some extent, species-specific solutions to core histone gene regulation.

The comparative genomics of core histone gene regulation reveal a novel evolutionary mode, which we dub 'circuitous evolution'. Circuitous evolution of core histone gene regulation is distinct from convergent evolution, because the conservation of the core histone gene regulatory patterns suggests that the same pattern existed in the last common ancestor of all species analyzed here. After divergence from the last common ancestor, the core histone expression patterns remained unchanged but the regulatory mechanisms that give rise to the conserved phenotype diverged dramatically. Thus, with respect to core histone gene regulation, where you are from and where you are are far more important than how you get there.

As an addendum, during revision of the manuscript we became aware of a recently published paper [54], which confirms that the specific periodic pattern of core histone gene expression is uniquely evolutionarily conserved. The report by Jensen and coworkers also demonstrates how many different regulatory solutions have evolved to control the periodic expression of integrated biological systems that function in the cell cycle.


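[Figure 9 image not reproduced. Panel (a): stem loop with a four-nucleotide loop (U, U, U, C) above a six-base-pair stem, listed from loop to base as U=A, C≡G, U=A, C≡G, G≡C, G≡C, with flanking sequences 5' CCAAA and ACCAA 3'. Panel (b): sequence logo.]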

Figure 9. Structure and conservation of the histone 3'-UTR stem loop. (a) Schema of the 3'-UTR stem loop structure present in metazoan mRNAs. (b) Sequence logo representation of the histone 3'-UTR stem loop. The sequences (accession number: RF00032) were obtained from the Rfam database [68]. UTR, untranslated region.
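The letter-stack heights in a sequence logo such as panel (b) reflect the information content of each alignment column. As a rough sketch, assuming a uniform background over the four RNA bases and omitting the small-sample correction, the information content of a column is log2(4) minus its Shannon entropy. The toy alignment below is hypothetical and is not the Rfam RF00032 alignment; the function name is likewise an illustration rather than part of WebLogo or the TFBS modules.

```python
import math
from collections import Counter


def information_content(aligned, alphabet="ACGU"):
    """Per-column information content (bits) of equal-length aligned sequences.

    Assumes a uniform background and applies no small-sample correction.
    Characters outside `alphabet` (for example gaps) are ignored.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    max_bits = math.log2(len(alphabet))
    bits = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts[base] for base in alphabet)
        if total == 0:
            bits.append(0.0)  # column holds no alphabet characters
            continue
        entropy = 0.0
        for base in alphabet:
            if counts[base]:
                p = counts[base] / total
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits


# Hypothetical toy alignment (not taken from Rfam).
toy_alignment = ["GGCU", "GGCC", "GGUA", "GGUG"]
ic = information_content(toy_alignment)
```

Fully conserved columns reach the 2-bit maximum, whereas a column in which all four bases are equally frequent carries no information.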

Materials and methods

Promoter sequences
Core histone protein coding sequences were obtained from the histone database [3]. A list of species from which the sequences were obtained is provided in Additional data file 3. Core histone protein coding sequences were used as queries in a series of tblastn [33] searches against species-specific National Center for Biotechnology Information (NCBI) Entrez Genome project databases [55] in order to locate the precise genomic regions of core histone genes. Genome project species-specific databases include complete Reference Sequence (RefSeq) genomes when available or whole genome shotgun sequence entries when RefSeq versions are unavailable. Core histone proximal promoter sequences were taken as 1 kilobase upstream of the annotated translational start site. For bidirectional promoters, the entire intergenic regions (range 133 to 970 bp; average 413 bp) were taken for analysis. Promoter sequences are provided as Additional data file 1. The nomenclature reported by Marzluff and coworkers [56] was used for the human and mouse core histone genes.

Cis-regulatory binding sites
The DNA binding subunits of the transcription factor proteins and their cognate cis-regulatory binding sites were taken from the published literature as described in the Introduction and Results and discussion sections. PFMs of the cis-regulatory motifs were taken from the TRANSFAC database [29] or, when not available, generated from the binding site alignments reported in the original citation. The PFMs were used with the program CLOVER [30] to search the core histone promoter regions for the presence of the cis-regulatory motifs. The complete set of CLOVER predictions is provided in Additional data file 4. CLOVER output was used to construct promoter-specific vectors composed of scores of over-represented and/or under-represented cis sites for each sequence; the vectors were then used to compare promoter sequences with pairwise Pearson correlation coefficients. CLOVER also gives the position of each predicted motif, and these positions were normalized by the length of the promoter sequence to give the relative positions shown in Figure 7 panels a, c and e. Locations were randomly sampled from a uniform distribution in order to generate the negative control plots shown in Figure 7 panels b, d and f.

The TFBS Perl modules [57] were used to further analyze cis-regulatory sequence binding motifs. For each cis-regulatory motif, sequences of all the motif sites predicted by CLOVER were extracted and aligned. These alignments were used to construct PFMs, which were converted to position weight matrices by normalizing with background nucleotide frequencies. Information content per cis-regulatory sequence motif position [58], taken from the position weight matrices, was used to build sequence logos with the program WebLogo [59].

Protein sequence and structure
Core histone protein sequences were taken from the histone database [3]. Protein sequences are provided in Additional data file 2. A probabilistic HMM representing the HFD found in all core histone proteins [34,35] was taken from the Pfam database [60]. The HFD (Pfam accession number: PF00125) was extracted from each core histone protein sequence using the HMM with the program HMMER [61]. The core histone HFD multiple sequence alignment was built by aligning each HFD sequence back to the PF00125 HMM, thus preserving the same structural register for all HFD sequences. The program QuickTree [31] was used to build a neighbor joining tree [32] from the HFD multiple sequence alignment.

A three-dimensional structure of the nucleosome core, PDB ID:1KX5 [62], used for comparison was taken from the RCSB [63]. Structural comparisons between the individual core histone proteins were performed using the fold classifications computed in the Dali database [37,64]. Z scores between individual core histone proteins were taken and converted to pairwise distances (d) by normalizing with the self-similarity Z score using the following equation:

$$d_{ij} = 1 - \frac{Z_{ij}}{Z_{ii}}$$

Pairwise distances were used to calculate a neighbor joining tree [32] using the program MEGA [65].

Gene expression
Gene expression data were taken from reports published elsewhere [8-13]. For Figure 2, relative expression levels (log2 ratios) for S. cerevisiae were plotted against cell cycle time points, and visualization was done using matrix2png [66].

Additional data files
The following additional data are available with the online version of this article. Additional data file 1 contains the promoter sequences of core histone genes used in the study. Additional data file 2 contains the core histone protein sequences used in the study. Additional data file 3 contains the list of species used in the study. Additional data file 4 contains the CLOVER predictions for all core histone gene promoters used in the study.

Acknowledgements
The authors would like to thank Alex Brick and Geoffrey Watson for their assistance in obtaining core histone intergenic regions during their internships at NCBI and Boris E Shakhnovich for helpful discussions. We are grateful to two anonymous reviewers for valuable comments and suggestions. This study utilized the high-performance computational capabilities of the Biowulf PC/Linux cluster at the National Institutes of Health, Bethesda, Maryland, USA [67]. The authors wish to thank several anonymous reviewers for very helpful comments and suggestions. This research was supported by the Intramural Research Program of the NIH, NLM, and NCBI.

References
1. van Holde KE: London: Springer-Verlag; 1989.
2. Malik HS, Henikoff S: Phylogenomics of the nucleosome. Nat Struct Biol 2003, 10:882-891.
3. Marino-Ramirez L, Hsu B, Baxevanis AD, Landsman D: The Histone Database: a comprehensive resource for histones and histone fold-containing proteins. Proteins 2006, 62:838-842.
4. Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ: Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997, 389:251-260.
5. Kornberg RD: Chromatin structure: a repeating unit of histones and DNA. Science 1974, 184:868-871.
6. Gunjan A, Paik J, Verreault A: Regulation of histone synthesis and nucleosome assembly. Biochimie 2005, 87:625-635.
7. Osley MA: The regulation of histone synthesis in the cell cycle. Annu Rev Biochem 1991, 60:827-861.
8. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 1998, 2:65-73.
9. Cho RJ, Huang M, Campbell MJ, Dong H, Steinmetz L, Sapinoso L, Hampton G, Elledge SJ, Davis RW, Lockhart DJ: Transcriptional regulation and function during the human cell cycle. Nat Genet 2001, 27:48-54.
10. Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J: The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol 2005, 3:e225.
11. Rustici G, Mata J, Kivinen K, Lio P, Penkett CJ, Burns G, Hayles J, Brazma A, Nurse P, Bahler J: Periodic gene expression program of the fission yeast cell cycle. Nat Genet 2004, 36:809-817.
12. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9:3273-3297.
13. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, Botstein D: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 2002, 13:1977-2000.
14. Birnbaum MJ, Wright KL, van Wijnen AJ, Ramsey-Ewing AL, Bourke MT, Last TJ, Aziz F, Frenkel B, Rao BR, Aronin N, et al.: Functional role for Sp1 in the transcriptional amplification of a cell cycle regulated gene. Biochemistry 1995, 34:7648-7658.
15. Cross SL, Smith MM: Comparison of the structure and cell cycle expression of mRNAs encoded by two -H4 loci in Saccharomyces cerevisiae. Mol Cell Biol 1988, 8:945-954.
16. Eriksson PR, Mendiratta G, McLaughlin NB, Wolfsberg TG, Marino-Ramirez L, Pompa TA, Jainerin M, Landsman D, Shen CH, Clark DJ: Global regulation by the yeast Spt10 protein is mediated through chromatin structure and the histone upstream activating sequence elements. Mol Cell Biol 2005, 25:9127-9137.
17. Fletcher C, Heintz N, Roeder RG: Purification and characterization of OTF-1, a transcription factor regulating cell cycle expression of a human histone H2b gene. Cell 1987, 51:773-781.
18. Matsumoto S, Yanagida M: Histone gene organization of fission yeast: a common upstream sequence. EMBO J 1985, 4:3531-3538.
19. Osley MA, Gould J, Kim S, Kane MY, Hereford L: Identification of sequences in a yeast histone promoter involved in periodic transcription. Cell 1986, 45:537-544.
20. Oswald F, Dobner T, Lipp M: The E2F transcription factor activates a replication-dependent human H2A gene in early S phase of the cell cycle. Mol Cell Biol 1996, 16:1889-1895.
21. Sive HL, Heintz N, Roeder RG: Multiple sequence elements are required for maximal in vitro transcription of a human histone H2B gene. Mol Cell Biol 1986, 6:3329-3340.
22. van den Ent FM, van Wijnen AJ, Lian JB, Stein JL, Stein GS: Cell cycle controlled histone H1, H3, and H4 genes share unusual arrangements of recognition motifs for HiNF-D supporting a coordinate promoter binding mechanism. J Cell Physiol 1994, 159:515-530.
23. Xie R, van Wijnen AJ, van Der Meijden C, Luong MX, Stein JL, Stein GS: The cell cycle control element of histone H4 gene transcription is maximally responsive to interferon regulatory factor pairs IRF-1/IRF-3 and IRF-1/IRF-7. J Biol Chem 2001, 276:18624-18632.
24. Natsoulis G, Dollard C, Winston F, Boeke JD: The products of the SPT10 and SPT21 genes of Saccharomyces cerevisiae increase the amplitude of transcriptional regulation at a large number of unlinked loci. New Biol 1991, 3:1249-1259.
25. Natsoulis G, Winston F, Boeke JD: The SPT10 and SPT21 genes of Saccharomyces cerevisiae. Genetics 1994, 136:93-105.
26. Denis CL, Malvar T: The CCR4 gene from Saccharomyces cerevisiae is required for both nonfermentative and spt-mediated gene expression. Genetics 1990, 124:283-291.
27. Mendiratta G, Eriksson PR, Shen CH, Clark DJ: The DNA-binding domain of the yeast Spt10p activator includes a zinc finger that is homologous to foamy virus integrase. J Biol Chem 2006, 281:7040-7048.
28. Chowdhary R, Ali RA, Albig W, Doenecke D, Bajic VB: Promoter modeling: the case study of mammalian histone promoters. Bioinformatics 2005, 21:2623-2628.
29. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, et al.: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006, 34:D108-D110.
30. Frith MC, Fu Y, Yu L, Chen JF, Hansen U, Weng Z: Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 2004, 32:1372-1381.
31. Howe K, Bateman A, Durbin R: QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics 2002, 18:1546-1547.
32. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4:406-425.
33. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
34. Arents G, Moudrianakis EN: The histone fold: a ubiquitous architectural motif utilized in DNA compaction and protein dimerization. Proc Natl Acad Sci USA 1995, 92:11170-11174.
35. Baxevanis AD, Arents G, Moudrianakis EN, Landsman D: A variety of DNA-binding and multimeric proteins contain the histone fold motif. Nucleic Acids Res 1995, 23:2685-2691.
36. Marino-Ramirez L, Kann MG, Shoemaker BA, Landsman D: Histone structure and nucleosome stability. Expert Rev Proteomics 2005, 2:719-729.
37. Holm L, Sander C: Mapping the protein universe. Science 1996, 273:595-603.
38. Eickbush TH, Moudrianakis EN: The histone core complex: an octamer assembled by two sets of protein-protein interactions. Biochemistry 1978, 17:4955-4964.
39. Zhang Z, Gerstein M: Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J Biol 2003, 2:11.
40. Dominski Z, Marzluff WF: Formation of the 3' end of histone mRNA. Gene 1999, 239:1-14.
41. Bowman TL, Hurt MM: The coding sequences of mouse H2A and H3 histone genes contains a conserved seven nucleotide element that interacts with nuclear factors and is necessary for normal expression. Nucleic Acids Res 1995, 23:3083-3092.
42. Bowman TL, Kaludov NK, Klein M, Hurt MM: An H3 coding region regulatory element is common to all four nucleosomal classes of mouse histone-encoding genes. Gene 1996, 176:1-8.
43. Eliassen KA, Baldwin A, Sikorski EM, Hurt MM: Role for a YY1-binding element in replication-dependent mouse histone gene expression. Mol Cell Biol 1998, 18:7106-7118.
44. Ma T, Van Tine BA, Wei Y, Garrett MD, Nelson D, Adams PD, Wang J, Qin J, Chow LT, Harper JW: Cell cycle-regulated phosphorylation of p220(NPAT) by cyclin E/Cdk2 in Cajal bodies promotes histone gene transcription. Genes Dev 2000, 14:2298-2313.
45. Wei Y, Jin J, Harper JW: The cyclin E/Cdk2 substrate and Cajal body component p220(NPAT) activates histone transcription through a novel LisH-like domain. Mol Cell Biol 2003, 23:3669-3680.
46. Zhao J, Kennedy BK, Lawrence BD, Barbie DA, Matera AG, Fletcher JA, Harlow E: NPAT links cyclin E-Cdk2 to the regulation of replication-dependent histone gene transcription. Genes Dev 2000, 14:2283-2297.
47. Tanay A, Regev A, Shamir R: Conservation and evolvability in regulatory networks: the evolution of ribosomal regulation in yeast. Proc Natl Acad Sci USA 2005, 102:7203-7208.
48. Gasch AP, Moses AM, Chiang DY, Fraser HB, Berardini M, Eisen MB: Conservation and evolution of cis-regulatory systems in ascomycete fungi. PLoS Biol 2004, 2:e398.
49. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 2004, 431:308-312.
50. Madan Babu M, Teichmann SA, Aravind L: Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J Mol Biol 2006, 358:614-633.
51. Ihmels J, Bergmann S, Gerami-Nejad M, Yanai I, McClellan M, Berman J, Barkai N: Rewiring of the yeast transcriptional network through the evolution of motif usage. Science 2005, 309:938-940.
52. Gu Z, Nicolae D, Lu HH, Li WH: Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet 2002, 18:609-613.
53. Makova KD, Li WH: Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res 2003, 13:1638-1645.
54. Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P: Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 2006, 443:594-597.
55. NCBI Entrez Genome Project [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj]
56. Marzluff WF, Gongidi P, Woods KR, Jin J, Maltais LJ: The human and mouse replication-dependent histone genes. Genomics 2002, 80:487-498.
57. Lenhard B, Wasserman WW: TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics 2002, 18:1135-1136.
58. Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990, 18:6097-6100.
59. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188-1190.
60. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic Acids Res 2004, 32:D138-D141.
61. HMMER [http://hmmer.janelia.org]
62. Davey CA, Sargent DF, Luger K, Maeder AW, Richmond TJ: Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J Mol Biol 2002, 319:1097-1113.
63. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28:235-242.
64. Dali Database [http://ekhidna.biocenter.helsinki.fi/dali/start]
65. Kumar S, Tamura K, Nei M: MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 2004, 5:150-163.
66. Pavlidis P, Noble WS: Matrix2png: a utility for visualizing matrix data. Bioinformatics 2003, 19:295-296.
67. Biowulf at the NIH [http://biowulf.nih.gov]
68. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 2005, 33:D121-D124.
