Available online at www.sciencedirect.com

Comparative genomics of gene regulation—conservation and divergence of cis-regulatory information Antonio CA Meireles-Filho and Alexander Stark

We recently witnessed a tremendous increase in genomics and regulator binding sites. Combined with detailed studies on gene regulation and in entirely sequenced genomes dissections of individual regulatory sequences, this has from closely related species. This has triggered analyses that answered old questions but created new ones and – in suggest a wide range of evolutionary dynamics of gene particular – fuelled the controversy over the evolutionary regulation, from rapid turnover of -factor binding conservation of regulatory sequences. In this review, we sites to conservation of function across large will discuss current work on the conservation and diver- evolutionary distances. Many examples show that enhancers gence of regulatory sequences. can evolve beyond recognizable sequence similarity while retaining function. However, bioinformatics approaches are Main text increasingly able to detect conserved regulatory elements Gene-regulatory information through characteristic evolutionary sequence signatures. The genome encodes gene-regulatory information in the Cis-regulatory changes are also a major source of form of modular cis-regulatory sequences or enhancers, morphological evolution, which might be facilitated by many each of which contributes certain parts of a gene’s overall biochemically functional elements that are selectively neutral expression pattern in an additive fashion (e.g. [1]). and by the buffering function of redundant enhancers and Enhancers have been found thousands of base pairs ‘shadow’ enhancers. upstream or downstream of the regulated gene, in the gene’s or in neighboring genes’ introns, and even work Address across intervening genes (e.g. [2,3]). While this indicates Research Institute of Molecular Pathology (IMP), Dr. Bohr-Gasse 7, that the relative location of enhancers can be highly A-1030 Vienna, Austria variable, some enhancers’ positions are constrained, such Corresponding author: Stark, Alexander ([email protected]) as a distal enhancer of brinker in Drosophila and mosqui- toes [3], or the Hox gene cluster (see discussion in [3]).

Current Opinion in Genetics & Development 2009, 19:565–570 Each enhancer contains binding sites for specific sets of This review comes from a themed issue on TFs and is thus only active in cell types where certain Genomes and evolution factors are present. It integrates and thresholds regulatory Edited by Martha Bulyk and Michael Levine cues and is typically assumed to have a binary on/off Available online 11th November 2009 output. The TFs’ binding preferences can be summarized in the form of short regulatory motifs [4], which usually 0959-437X/$ – see front matter contain specified positions that are highly discriminative # 2009 Elsevier Ltd. All rights reserved. and positions that are more degenerate. While the basic DOI 10.1016/j.gde.2009.10.006 nature of enhancers as binding platforms for TFs is estab- lished, their more detailed structural requirements remain less clear. Two models represent the two extremes of structural requirements: the enhanceosome and the bill- Introduction board models (Figure 1;[5]). The enhanceosome model is The basis for multicellular life is the ability to create based on the interferon-b enhancer, which serves as a rigid functional specialization and distinct cell types through assembly platform for defined protein complexes and has differentially regulated . The information strict requirements on binding-site composition, order, that underlies this regulation is encoded in the genome and orientation, and spacing [6]. Enhanceosomes thus display is ‘read’ by sequence-specific transcription factors (TFs). near-perfect conservation across their entire lengths [6], To date, we only know a small fraction of any animal’s which is generally rare for enhancers [7]. A billboard regulatory sequences and deciphering the sequences’ spa- enhancer in its extreme form merely integrates regulatory tio-temporal activity has remained one of the greatest – and input from individual factors that can be bound in any most fascinating – challenges of today’s biology. order, orientation, and spacing. Even the individual factors themselves might change provided that the sum of activat- The past few years have seen tremendous progress in the ing and repressing inputs remains constant [5]. genome-wide assessment of gene expression and of tran- scription-factor DNA binding. Similarly, the sequencing of Most enhancers lack a perfect conservation of motif entire genomes from closely related species has enabled organization and lie between these extremes: neuro-ecto- comparative studies of known regulatory sequences dermal enhancers in fly and mesodermal enhancers in www.sciencedirect.com Current Opinion in Genetics & Development 2009, 19:565–570 566 Genomes and evolution

Figure 1

Enhancer models and their expected evolutionary characteristics. Enhancers evolve according to their functional constraints over short (e.g. within mammals or Drosophilidae) and long evolutionary time-spans (e.g. human–fish/chicken or D. melanogaster–mosquito/sepsidae. Rigid enhanceosome- type enhancers display extended sequence conservation to maintain the exact organization of TFBS (left; [6]). Billboard-type enhancers have only minimal functional constraints and allow binding-site substitutions provided that the sum of TF input activities remains constant (right; [5]). Over large evolutionary time-spans, they appear shuffled extensively, often beyond detectable sequence similarity when using standard methods. Many enhancers are probably between these extremes and show both fixed TF arrangements and more flexible parts (center). Thin vertical or diagonal black and grey dashed lines connect orthologous motifs vs. novel motifs for the same TF (motif turnover), respectively; lack of lines indicates that a motif is replaced by a novel motif for a different, functionally equivalent TF. Horizontal black lines represent sequence with conserved orthologous TFBS, while horizontal grey dashed lines represent divergent sequence or novel sites. The conservation track (bottom) shows a likely outcome of standard conservation measures, where black indicates detectable conservation and grey minimal conservation of short elements that is possibly missed. ciona displayed characteristic motif spacing but different studying 10 kb non-coding sequence surrounding each of overall motif organization [8,9]. Enhancers can maintain 4000 orthologous gene pairs, up to 90% of each factor’s function through stabilizing selection of compensatory target genes differed between the two organisms. Even changes as revealed through the study of chimeric enhan- when the factors were bound to the promoters of ortho- cers [10]. Known fly enhancers are selected against logous genes, a majority of the binding sites did not align deletions and clustered binding sites and certain motif- [15]. This difference was determined almost exclusively arrangements are preferentially conserved [11,12,13], by changes in cis, as the factors’ binding sites on human suggesting some level of constraint on enhancer archi- chromosome 21 in mouse hepatocytes resembled those in tecture. Indeed, altered spacing in enhancers from related human cells [16]. species can have functional consequences, fine-tuning enhancer responses to morphogen gradients [8], and These findings suggest that gene regulation might have making enhancer sequences even from closely related diverged substantially, providing an attractive expla- species not always functionally equivalent [14]. This nation for the apparent lack of sequence conservation might stem from cooperative DNA binding of some of many functional elements in the human genome TFs, but lack of understanding makes it difficult to [17,18] and TFs binding sites (TFBS) in Drosophila reliably identify enhancers or to estimate the impact of [19]. Alternatively, many of the elements detected by sequence changes on their activity. ChIP might be non-functional or biochemically active yet selectively neutral, such that evolutionary conservation Conservation of cis-regulatory connections and might in fact help to identify the relevant ones sequences? [11,1719]. Indeed, it remained unclear to what extent A series of two recent papers on direct target genes of four the divergent binding sites in [15,16] corresponded to liver-specific TFs showed that TF binding diverged functional enhancers required for hepatocyte differen- substantially between human and mouse [15,16]: when tiation [20].

Current Opinion in Genetics & Development 2009, 19:565–570 www.sciencedirect.com Comparative genomics of gene regulation—conservation and divergence of cis-regulatory information Meireles-Filho and Stark 567

Figure 2 example, the deep conservation ofa hairy binding site across the sequenced Drosophila species [11,33] that allowed its comparative prediction at very high confidence [26]. Sequence conservation also improved the identification of enhancers via clustering of TFBS [34,35,36].

Detecting constrained cis-regulatory elements computationally—accounting for evolutionary signatures Frequent alterations of regulatory connections is also unexpected given many reports in the literature of enhan- cers, whose function is conserved across large evolution- ary distances such as between mammals and fish (e.g. [30,37,38], or between Drosophila, Anopheles,andSepsis (e.g. [3,39,40].

Some of these studies concluded that standard measures to assess genome sequence conservation tend to ‘overlook regulatory sequences’ [38]: five commonly used Comparative prediction of individual TFBS. Comparative genomics in 12 measures of evolutionary sequence constraint detected Drosophila genomes using the Branch-Length-Score (BLS) framework only roughly half of the functional phox2b enhancers in [26] predicted the physiologically relevant binding site of the Drosophila human–fish comparisons [38], similar to an earlier result TF hairy in the upstream region of achaete [65]. The hairy motif (PWM) is from Transfac [66] and we created the sequence logo with the WebLogo using other methods including BLAT (see below; [37]). program [67]. The thin and thick blue lines represent the achaete non- However, human and fish enhancers still shared features coding and coding regions, respectively, the transcription start is such as nucleotide composition and motif-content that indicated by the blue arrow, and the conservation track shows sequence discriminated them from non-functional sequence. Lack conservation in insects (modified after a UCSC Genome Browser screenshot [68]. of extensive nucleotide identity over large evolutionary distances has also been found between Drosophila and scavenger flies (Sepsidae;[39]): despite their near identical Rapid divergence of crucial core regulatory connections expression pattern in transgenic Drosophila embryos, sep- would be unexpected given highly similar developmental sid enhancer sequences appeared strongly diverged, partly programs regulated by orthologous TFs with conserved rearranged, and seemed to have lost a substantial fraction of expression patterns. Some factors and their regulatory con- the binding sites required in Drosophila [39]. Residual nections are very deeply conserved among animals, most similarities were restricted to short blocks, which were notably the so-called ‘gene-regulatory network kernels’ strongly enriched in TFBS [39] and partly occurred in such as the heart specification kernel that is shared between conserved order, distance, and orientation [41,42]. flies and vertebrates (e.g. [21]). Further, the similarity of transcriptional profiles between homologous cell types Interestingly, in all cases conservation of the respective (‘molecular fingerprints’, e.g. [22]) suggests conservation enhancer sequences had been detected in more closely of the defining regulators and their target genes. Regulatory related species, e.g. between zebrafish and fugu, human connections related to metabolism and growth can also be and non-primate mammals [37], and within the Drosophila highly conserved, such as ribosome biogenesis genes that species [39]. In addition, short blocks of shared sequences contain promoter-proximal E(CG) binding sites for the or conserved TF binding motifs in these regions could be Myc:Max heterodimer in many species [23]. detected and served to predict conserved enhancers across large evolutionary distances (e.g. [30,40]), indicat- Many studies suggest that regulatory sequences are con- ing that there seems to be a discrepancy between func- served, especially at close evolutionary distances. For tional conservation and detectable sequence conservation example, experimentally determined TFBS in Drosophila of enhancers. Some of the studies above relied on melanogaster contain TF binding motifs that are conserved methods that require extended sequence identity beyond in other Drosophila species [24,25]. Detailed comparison the lengths of individual TF motifs (e.g. BLAT; [43]). withsequence-matchedcontrols showedthat TFmotifsare Further, individual TFBS evolve according to the motifs more highly conserved than the overall enhancer sequence degeneracy profile, such that, for example, positions with [26]. Many TF binding motifs also show significantly low information content appear unconstrained [12,44]. increased genome-wide conservation levels that allow their de novo discovery (e.g. [11,27–29]) and the prediction of Several different strategies have recently been developed individual TFBS and targetgenes inyeast,worms, flies,and to account for the evolutionary properties of regulatory mammals (e.g. [26,30,31,32]). Figure 2 illustrates, for sequences. For example, Asthana et al. aimed specifically www.sciencedirect.com Current Opinion in Genetics & Development 2009, 19:565–570 568 Genomes and evolution

at detecting short elements under evolutionary constraint The functional divergence of enhancers is probably facili- [45], and Garber et al. at detecting biased substitutions tated by redundancies that can buffer loss-of-function characteristic of degenerate positions within TFBS [46]. situations: several genes have multiple enhancers (e.g. An entirely novel approach focused on the three-dimen- ‘shadow enhancers’ [59]) that drive expression in over- sional structure (topology) of the DNA helix that probably lapping or identical patterns. Further, neutral binding affects regulator binding [47]. While DNA topology can be sites might be a rich source for the de novo creation of inferred from the sequence, the relationship is not simple enhancers from non-functional sequence [17], which has and roughly twice as many bases had conserved topology probably happened during evolution given clearly differ- rather than sequence. All three approaches detected novel ent paralogous enhancers with similar functions [60]. elements that were enriched in regulatory sequences such Additionally, an old theory that transposable elements as DNase I hypersensitive sites. These examples empha- might contribute substantially to regulatory re-wiring [61] size the notion of evolutionary signatures that reflect the gained recent support: they harbour binding sites for functional constraints of diverse classes of functional ele- several TFs [62,63], consistent with their exaptation ments and are more suited for their comparative discovery as functional enhancers [64]. than raw nucleotide-level sequence conservation [11]. Conclusions Assessing motif conservation and predicting cis- Over the course of evolution, orthologous regulatory regulatory motifs sequences diverge by accumulating sequence changes that An important aspect in phylogenetic analyses is the under- are selected negatively if they disrupt important functions lying sequence alignments. Even for protein sequences, leading to phenotypic defects or otherwise evolve neu- multiple sequence alignments are the source of several trally. This is analogous to synonymous changes in protein- problems and can result in erroneous conclusions [48], coding DNA sequences, which do not alter the protein especially regarding the placement of gaps [49]. Multiple sequence and thus occur frequently. As we currently genome alignments generally suffer from similar problems, cannot assess the functional impact of regulatory sequence which strongly influence enhancer analyses and predic- changes, we rely on more or less direct measures of nucleo- tions: the conservation of TF motif instances across 12 tide-level sequence conservation. These often ignore the Drosophila genomes, for example, differed substantially enhancers’ functional and evolutionary properties (evol- (up to 50%) between different alignments [26]. utionary signatures; [11]) and may thus not detect con- served regulatory sequences (Figure 1). Such errors can lead to apparent motif-divergence or -loss, such that local adjustments during the motif search (e.g. We foresee that regulatory genomics datasets (e.g. ChIP- rMonkey; [50]), or motif-aware alignments [51–53] might seq and RNA-seq) from individual species within a phy- be more sensitive in detecting conservation. As an alterna- logeny will soon allow the distinction of functional con- tive, we chose to scan the motifs in the individual informant servation, neutral divergence, and species-specific sequences separately and used the alignment only to map regulation. Novel bioinformatics approaches will then identified motifs to a common reference. Depending on help to determine the minimal requirements for con- the motifs’ information content, we were able to allow for served enhancer function, an important step towards substantial motif-movement in the alignment, which – understanding the regulatory code. together with a framework for scoring conservation accord- ing to the phylogenetic tree (Branch-Length-Score, BLS) – Conflict of interest enabled sensitive and specific predictions (Figure 2;[26]). The authors declare no conflict of interest. This framework has recently been applied for microRNA target prediction in mammals [54] and extended to include Acknowledgements a probabilistic term to measure the likelihood of motif- We thank members of the Stark group (IMP) and Roger Revilla (IMP) for helpful discussions and apologize to all authors that we could not cite presence in each species [31]. because of the limited number of allowed citations and the requirement to restrict them to recent work. The Stark group is supported by the Austrian Divergence of cis-regulatory elements Ministry for Science and Research through the GEN-AU Bioinformatics Integration Network III. When discussing enhancer conservation, it should not be overlooked that cis-regulatory mutations are seen as the References and recommended reading major source of evolutionary innovations [55,56,57 ], Papers of particular interest, published within the annual period of meaning that enhancer function has probably evolved review, have been highlighted as: if the corresponding phenotypic traits have diverged. For of special interest example, mutations in a tan enhancer lead to the diver- of outstanding interest gence of body pigmentation between Drosophila yakuba and santomea [2] and divergence of shavenbaby enhancers 1. Visel A, Akiyama JA, Shoukry M, Afzal V, Rubin EM, correlates with changes of trichome patterns between D. Pennacchio LA: Functional autonomy of distant-acting human melanogaster and D. sechellia [58]. enhancers. Genomics 2009, 93:509-513.

Current Opinion in Genetics & Development 2009, 19:565–570 www.sciencedirect.com Comparative genomics of gene regulation—conservation and divergence of cis-regulatory information Meireles-Filho and Stark 569

2. Jeong S, Rebeiz M, Andolfatto P, Werner T, True J, Carroll SB: The 19. Li XY, MacArthur S, Bourgon R, Nix D, Pollard DA, Iyer VN, evolution of gene regulation underlies a morphological Hechmer A, Simirenko L, Stapleton M, Luengo Hendriks CL et al.: difference between two Drosophila sister species. Cell 2008, Transcription factors bind thousands of active and 132:783-793. inactive regions in the Drosophila blastoderm. PLoS Biol 2008, 6:e27. 3. Cande J, Goltsev Y, Levine MS: Conservation of enhancer This work presents ChIP-chip data on 6 TFs involved in early patterning in positions in divergent insects. Proc Natl Acad Sci USA 2009. the Drosophila embryo and concludes that many reliably bound sequences do not correspond to active and developmentally relevant 4. Stormo GD: DNA binding sites: representation and discovery. enhancers that are under evolutionary constraint. Bioinformatics 2000, 16:16-23. 20. Zinzen RP, Furlong EE: Divergence in cis-regulatory networks: 5. Arnosti DN, Kulkarni MM: Transcriptional enhancers: intelligent taking the ‘species’ out of cross-species analysis. Genome Biol enhanceosomes or flexible billboards? J Cell Biochem 2005, 2008, 9:240. 94:890-898. 21. Davidson EH, Erwin DH: Gene regulatory networks and the 6. Panne D, Maniatis T, Harrison SC: An atomic model of the evolution of animal body plans. Science 2006, 311:796-800. interferon-beta enhanceosome. Cell 2007, 129:1111-1123. 22. Arendt D: The evolution of cell types in animals: emerging 7. Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, principles from molecular studies. Nat Rev Genet 2008, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA: 9:868-882. Ultraconservation identifies a small subset of extremely The author reviews work on the evolution of cell types and methodology to constrained developmental enhancers. Nat Genet 2008, measure and compare their expression profiles (‘molecular fingerprints’). 40:158-160. 23. Brown SJ, Cole MD, Erives AJ: Evolution of the holozoan 8. Crocker J, Tamori Y, Erives A: Evolution acts on enhancer ribosome biogenesis regulon. BMC Genom 2008, 9:442. organization to fine-tune gradient threshold readouts. Plos Biol 2008, 6:e263. 24. Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P, Furlong EE: A temporal map of 9. Erives A: Non-homologous structured CRMs from the Ciona activity: mef2 directly regulates target genome. J Comput Biol 2009, 16:369-377. genes at all stages of muscle development. Dev Cell 2006, 10:797-807. 10. Ludwig MZ, Bergman C, Patel NH, Kreitman M: Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 25. Zeitlinger J, Zinzen RP, Stark A, Kellis M, Zhang H, Young RA, 2000, 403:564-567. Levine M: Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes 11. Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, Carlson JW, in the Drosophila embryo. Genes Dev 2007, 21:385-390. Crosby MA, Rasmussen MD, Roy S, Deoras AN et al.: Discovery of functional elements in 12 Drosophila genomes using 26. Kheradpour P, Stark A, Roy S, Kellis M: Reliable prediction of evolutionary signatures. Nature 2007, 450:219-232. regulator targets using 12 Drosophila genomes. Genome Res This paper summarizes the comparative analysis and discovery of func- 2007, 17:1919-1931. tional elements such as protein-coding genes, non-coding RNAs and microRNAs, and regulatory motifs in the 12 Drosophila genomes and 27. Ettwiller L, Paten B, Souren M, Loosli F, Wittbrodt J, Birney E: The provides references to the respective companion papers. discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol 12. Kim J, He X, Sinha S: Evolution of regulatory sequences in 12 2005, 6:R104. Drosophila species. PLoS Genet 2009, 5:e1000330. 28. Chan CS, Elemento O, Tavazoie S: Revealing 13. Elemento O, Slonim N, Tavazoie S: A universal framework for posttranscriptional regulatory elements through network- regulatory element discovery across all genomes and data level conservation. PLoS Comput Biol 2005, 1:e69. types. Mol Cell 2007, 28:337-350. 29. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, 14. Ludwig M, Palsson A, Alekseeva E, Bergman CM, Nathan J, Lander ES, Kellis M: Systematic discovery of regulatory motifs Kreitman M: Functional evolution of a cis-regulatory module. in human promoters and 30 UTRs by comparison of several Plos Biol 2005, 3:e93. mammals. Nature 2005, 434:338-345. 15. Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, 30. Del Bene F, Ettwiller L, Skowronska-Krawczyk D, Baier H, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E: Matter JM, Birney E, Wittbrodt J: In vivo validation of a Tissue-specific transcriptional regulation has diverged computationally predicted conserved Ath5 target gene set. significantly between human and mouse. Nat Genet 2007, PLoS Genet 2007, 3:1661-1671. 39:730-732. The authors use sequence conservation between the very distant species human and fish to predict TF targets, which they validate experimentally. 16. Wilson MD, Barbosa-Morais NL, Schmidt D, Conboy CM, Vanes L, Tybulewicz VL, Fisher EM, Tavare´ S, Odom DT: Species-specific 31. Xie X, Rigor P, Baldi P: MotifMap: a human genome-wide transcription in mice carrying human chromosome 21. Science map of candidate regulatory motif sites. Bioinformatics 2009, 2008, 322:434-438. 25:167-174. This is an elegant study on the evolution of transcription-factor binding between human and mouse. The authors show that differences are due to 32. Bigelow HR, Wenick AS, Wong A, Hobert O: CisOrtho: a program sequence changes in cis by excluding trans-effects in a Down syndrome pipeline for genome-wide identification of transcription factor mouse model carrying human chromosome 21. target genes using phylogenetic footprinting. BMC Bioinform 2004, 5:27. 17. Consortium EP, Birney E, Stamatoyannopoulos JA, Dutta A, Guigo´ R, Gingeras TR, Margulies EH, Weng Z, Snyder M, 33. Consortium DG, Clark AG, Eisen MB, Smith DR, Bergman CM, Dermitzakis ET et al.: Identification and analysis of functional Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al.: elements in 1% of the human genome by the ENCODE pilot Evolution of genes and genomes on the Drosophila phylogeny. project. Nature 2007, 447:799-816. Nature 2007, 450:203-218. This paper summarizes the effort by the ENCODE consortium, including This paper describes the sequencing of the 12 Drosophila genomes and many elements that are expressed or functional according to standard summarizes the evolutionary analyses with references to the respective genomics approaches yet not evolutionarily conserved. The authors companion papers. conclude that some of them are probably biochemically functional yet selectively neutral. 34. Berman BP, Pfeiffer BD, Laverty TR, Salzberg SL, Rubin GM, Eisen MB, Celniker SE: Computational identification of 18. King DC, Taylor J, Zhang Y, Cheng Y, Lawson HA, Martin J, developmental enhancers: conservation and function of Chiaromonte F, Miller W, Hardison RC: Finding cis-regulatory transcription factor binding-site clusters in Drosophila elements using comparative genomics: some lessons from melanogaster and Drosophila pseudoobscura. Genome Biol ENCODE data. Genome Res 2007, 17:775-786. 2004, 5:R61. www.sciencedirect.com Current Opinion in Genetics & Development 2009, 19:565–570 570 Genomes and evolution

35. Warner JB, Philippakis AA, Jaeger SA, He FS, Lin J, Bulyk ML: 52. Berezikov E, Guryev V, Plasterk RH, Cuppen E: CONREAL: Systematic identification of mammalian regulatory conserved regulatory elements anchored alignment algorithm motifs’ target genes and functions. Nat Methods 2008, for identification of transcription factor binding sites by 5:347-353. phylogenetic footprinting. Genome Res 2004, 14:170-178. The authors develop an integrated approach to identify motifs or motif combinations in sets of co-regulated genes during human myogenic 53. Sinha S, He X: MORPH: probabilistic alignment combined with differentiation and the enhancers that contain these motifs. They used hidden Markov models of cis-regulatory modules. PLoS motif conservation and clustering across multiple genomes helped to Comput Biol 2007, 3:e216. identify enhancers that were subsequently experimentally validated. 54. Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian 36. Hallikas O, Palin K, Sinjushina N, Rautiainen R, Partanen J, mRNAs are conserved targets of microRNAs. Genome Res Ukkonen E, Taipale J: Genome-wide prediction of mammalian 2009, 19:92-105. enhancers based on analysis of transcription-factor binding 55. King MC, Wilson AC: Evolution at two levels in humans and affinity. Cell 2006, 124:47-59. chimpanzees. Science 1975, 188:107-116. 37. Fisher S, Grice EA, Vinton RM, Bessling SL, McCallion AS: 56. Wray GA: The evolutionary significance of cis-regulatory Conservation of RET regulatory function from human to mutations. Nat Rev Genet 2007, 8:206-216. zebrafish without sequence similarity. Science 2006, 312:276-279. 57. Carroll SB: Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 2008, 134:25-36. 38. McGaughey DM, Vinton RM, Huynh J, Al-Saif A, Beer MA, The author reviews work on the evolution of cis-regulatory sequences and McCallion AS: Metrics of sequence constraint overlook provides a model to explain morphological evolution by cis-regulatory regulatory sequences in an exhaustive analysis at phox2b. changes. Genome Res 2008, 18:252-260. This paper shows that commonly used measures of evolutionary constraint 58. McGregor AP, Orgogozo V, Delon I, Zanet J, Srinivasan DG, Payre F, do not detect 40–70% of functionally conserved regulatory elements. Stern DL: Morphological evolution through multiple cis- regulatory mutations at a single gene. Nature 2007, 448:587-590. 39. Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB: Sepsid even- skipped enhancers are functionally conserved in Drosophila 59. Hong JW, Hendrix DA, Levine MS: Shadow enhancers as a despite lack of sequence conservation. PLoS Genet 2008, source of evolutionary novelty. Science 2008, 321:1314. 4:e1000106. This paper coins the term ‘shadow enhancers’ for functionally overlap- ping or redundant enhancers of a specific gene and suggests that they 40. Peterson BK, Hare EE, Iyer VN, Storage S, Conner L, Papaj DR, might provide buffering activity during cis-regulatory evolution. Kurashima R, Jang E, Eisen MB: Big genomes facilitate the comparative identification of regulatory elements. PLoS ONE 60. Brown C, Johnson D, Sidow A: Functional architecture and 2009, 4:e4688. evolution of transcriptional elements that drive gene coexpression. Science 2007, 317:1557-1560. 41. Hare EE, Peterson BK, Eisen MB: A careful look at binding site The authors show that different Ciona muscle enhancers can achieve the reorganization in the even-skipped enhancers of Drosophila same function with widely different architectures, yet that functional and sepsids. PLoS Genet 2008, 4:e1000268. architectures are often preserved in orthologous enhancers. 42. Crocker J, Erives A: A closer look at the eve stripe 2 enhancers 61. Davidson EH, Britten RJ: Regulation of gene expression: of Drosophila and Themira. PLoS Genet 2008, 4:e1000276. possible role of repetitive sequences. Science 1979, 43. Kent WJ: BLAT—the BLAST-like alignment tool. Genome Res 204:1052-1059. 2002, 12:656-664. 62. Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, 44. Moses AM, Chiang DY, Kellis M, Lander ES, Eisen MB: Position Chew JL, Ruan Y, Wei CL, Ng HH et al.: Evolution of the specific variation in the rate of evolution in transcription factor mammalian transcription factor binding repertoire via binding sites. BMC Evol Biol 2003, 3:19. transposable elements. Genome Res 2008, 18:1752-1762. This work examines ChIP data to show that transposable elements (TEs) 45. Asthana S, Roytberg M, Stamatoyannopoulos J, Sunyaev S: are bound by TFs in vivo, suggesting that TE activity can rewire regulatory Analysis of sequence conservation at nucleotide resolution. networks, as originally suggested 30 years ago. PLoS Comput Biol 2007, 3:e254. 63. Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, 46. Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X: Burgess SM, Brachmann RK, Haussler D: Species-specific Identifying novel constrained elements by exploiting biased endogenous retroviruses shape the transcriptional network of substitution patterns. Bioinformatics 2009, 25:i54-62. the human tumor suppressor protein p53. Proc Natl Acad Sci USA 2007, 104:18613-18618. 47. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH: Local DNA topography correlates with functional 64. Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, noncoding regions of the human genome. Science 2009, Rubin EM, Kent WJ, Haussler D: A distal enhancer and an 324:389-392. ultraconserved exon are derived from a novel retroposon. This is an interesting approach that assesses evolutionary conservation of Nature 2006, 441:87-90. DNA structure rather than DNA sequence and finds structural conserva- tion of regulatory sequences. 65. Van Doren M, Bailey AM, Esnayra J, Ede K, Posakony JW: Negative regulation of proneural gene activity: hairy is a 48. Wong KM, Suchard MA, Huelsenbeck JP: Alignment uncertainty direct transcriptional repressor of achaete. Genes Dev 1994, and genomic analysis. Science 2008, 319:473-476. 8:2729-2742. 49. Loytynoja A, Goldman N: Phylogeny-aware gap placement 66. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, prevents errors in sequence alignment and evolutionary Hornischer K, Karas D, Kel AE, Kel-Margoulis OV et al.: analysis. Science 2008, 320:1632-1635. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31:374-378. 50. Moses AM, Pollard DA, Nix DA, Iyer VN, Li XY, Biggin MD, Eisen MB: Large-scale turnover of functional transcription factor binding 67. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sites in Drosophila. PLoS Comput Biol 2006, 2:e130. sequence logo generator. Genome Res 2004, 14:1188-1190. 51. Satija R, Pachter L, Hein J: Combining statistical alignment and 68. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, phylogenetic footprinting to detect regulatory elements. Haussler D: The human genome browser at UCSC. Genome Res Bioinformatics 2008, 24:1236-1242. 2002, 12:996-1006.

Current Opinion in Genetics & Development 2009, 19:565–570 www.sciencedirect.com