Proc. Natl. Acad. Sci. USA
Vol. 92, pp. 1684-1688, February 1995
Genetics

Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes

SAMUEL APARICIO*†, ALASTAIR MORRISON‡, ALEX GOULD‡, JONATHAN GILTHORPE§, CHITRITA CHAUDHURI‡, PETER RIGBY§, ROBB KRUMLAUF*, AND SYDNEY BRENNER*

*Molecular Genetics Unit, Department of Medicine, Addenbrookes Hospital, Cambridge CB2 2QQ, United Kingdom; ‡Laboratory of Developmental Neurobiology and §Laboratory of Eukaryotic Molecular Genetics at Medical Research Council National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom

Contributed by Sydney Brenner, October 26, 1994

†To whom correspondence and reprint requests should be addressed.

Abbreviations: rn, rhombomere number; dpc, days post coitum.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Comparative vertebrate genome sequencing offers a powerful method for detecting conserved regulatory sequences. We propose that the compact genome of the teleost Fugu rubripes is well suited for this purpose. The evolutionary distance of teleosts from other vertebrates offers the maximum stringency for such evolutionary comparisons. To illustrate the comparative genome approach for F. rubripes, we use sequence comparisons between mouse and Fugu Hoxb-4 noncoding regions to identify conserved sequence blocks. We have used two approaches to test the function of these conserved blocks. In the first, homologous sequences were deleted from a mouse enhancer, resulting in a tissue-specific loss of activity when assayed in transgenic mice. In the second approach, Fugu DNA sequences showing homology to mouse sequences were tested for enhancer activity in transgenic mice. This strategy identified a neural element that mediates a subset of Hoxb-4 expression that is conserved between mammals and teleosts. The comparison of noncoding vertebrate sequences with those of Fugu, coupled to a transgenic bioassay, represents a general approach suitable for many genome projects.

The major challenges for all genome projects will be deciphering gene function and regulation. In the absence of useful grammars, testing the value of sequences by transgenesis is an important approach to finding regulatory information buried in the noncoding regions that constitute 95% of mammalian genomes; however, transgenic analysis is costly and labor intensive, and means of facilitating this approach are required. We have shown (1) that the Japanese puffer fish Fugu rubripes (Fugu) has a compact vertebrate genome and propose that one use of Fugu is the discovery of conserved gene regulatory sequences by comparison of Fugu noncoding sequences with those of higher vertebrates. The HOM/Hox complexes are some of the best studied cases of structural, functional, and regulatory conservation (2-5), and we therefore chose Fugu Hox genes as examples to study the degree of conservation in gene regulation. As part of a systematic analysis of the Fugu Hox gene clusters, the Hoxb-4 gene was cloned and sequenced, since the regulation and function of this gene have been examined in detail in mice (6-8). Here we show that sequence comparisons between Fugu and mouse can be used to identify conserved regulatory elements that direct subsets of the full Hoxb-4 expression pattern in transgenic mice.

MATERIALS AND METHODS

Cloning and Sequencing of Fugu Hoxb-4. Degenerate PCR primers hox26rl and hox26fl, whose sequences were derived from the human and mouse Hoxb-4 exon 2 sequences, were used to amplify and clone a fragment of Fugu Hoxb-4 from Fugu genomic DNA. Taq DNA polymerase (Cetus) was used at a final Mg2+ concentration of 1.5 mM. A λ2001 Sau3AI Fugu genomic library (1) was screened at high stringency with this probe by using the method of Church and Gilbert (9). A phage designated H26S6A1, which contained the Hoxb-4 gene, was obtained after three rounds of plaque purification. The region from an internal EcoRI restriction site to the λ polylinker, approximately 11.4 kb, was subcloned and sequenced by the dideoxynucleotide method (10) on an ABI373A DNA sequencer according to the manufacturer's instructions.

DNA Sequence Analysis. The conceptual translations of the Fugu and mouse Hoxb-4 DNA sequences (see Fig. 1) were aligned by using CLUSTALV (11). Regions of nucleotide sequence similarity were detected with two methods. First, pairwise DNA alignments were made by using xsip in the Staden suite of programs (12). The scoring matrix used was the default MDM78, the identity score was 6, the odd-span length was 9, the proportional score was taken as default, and the standard deviation above the mean was 2.5. Regions of similarity appear as continuous lines on the plots. These subregions were then either locally aligned by using the Smith-Waterman algorithm or searched for the best local similarity (13) by using the programs high, medium, and low (J. Parsons, personal communication) to search larger tracts of sequence for short similarity blocks.

Generation of Transgenic Mice. Regulatory regions were inserted into two basic lacZ reporter constructs that used either the hsp68 promoter (p610Za; ref. 6) or the Hoxb-4 promoter (construct 8; ref. 6). Construct 1 contains the mouse 1.38-kb Sal I-Bgl II fragment of Hoxb-4 (wild type). Construct 2 contains a 1.28-kb Sal I-Bgl II fragment from mouse Hoxb-4, where the conserved region (CR1) was deleted by removing the internal Mun I-Sfi I fragment. Construct 3 contains a 3-kb HindIII-Nco I mouse Hoxb-4 fragment. Construct 4 contains a 5.1-kb Xho I-Sal I fragment from phage H26S6A1 containing Fugu Hoxb-4. Construct 5 contains the 250-bp conserved CR2 region generated by PCR with oligonucleotides (BAM-TCTTGGTTCATGTTCTTTTAATGC and BAM-CTCTTGCTCTATATTTTTCGATAC; BAM, a BamHI restriction enzyme recognition sequence for cloning) from phage H26S6A1. Construct 6 contains the 300-bp CR3 region generated by PCR with oligonucleotides (BAM-TCTTTTAATGTGAATGGGAGCAG and BAM-GGGTGTGAGTGTGTAGACGTACGC) from phage H26S6A1. All inserts were released from vector sequences prior to microinjection.

Animal experiments were performed according to procedures regulated under the project licences issued by the Home Office and the Genetic Manipulation Advisory Group. The generation and analysis of transgenic mice from F1 hybrids (CBA × C57BL/6) were as described (6). Pseudopregnant females were injected intraperitoneally with the anaesthetic Avertin at a dose of 0.015-0.017 ml of a 2.5% solution per gram of body weight prior to embryo transfer. Embryos were photographed with the aid of a Leitz M10 stereomicroscope and camera. The numbers of embryos in each experiment are expressed in the figure legends as follows: the number of embryos with the pattern shown/total number of embryos expressing lacZ.

RESULTS

Sequence Comparisons. Conceptual translation and comparison of the cloned Fugu DNA sequence with the mouse Hoxb-4 sequence reveals multiple regions of similarity over the entire protein, confirming the Fugu sequence as Hoxb-4 (Fig. 1A). The pairwise identity score was 56% over the whole protein but approached 100% in the homeodomain region and 90% in the N-terminal region diagnostic for paralogs in group 4. The putative splice junction corresponds in phase and position to the mouse sequence. At the nucleotide level the sequence was closer to b-4 than to other group 4 paralogs (data not shown).

Most Fugu genes are smaller than their mammalian counterparts because of unexpanded introns (ref. 1 and unpublished observations). However, in Hoxb-4 the introns in Fugu and mammals are of a similar size; we presume that this is because of functional constraints during evolution due to the presence of transcriptional regulatory elements within the intron. Sequence comparisons between the intron and 3' noncoding regions of mouse and Fugu Hoxb-4 revealed three positionally conserved islands of sequence homology (Fig. 1 B and C). It is intriguing that these sequences mapped within two large noncoding regions of mouse Hoxb-4 previously shown to contain enhancers that direct major aspects of the spatially restricted patterns of expression (6). The first conserved island, CR1, mapped within the region-C enhancer. Region C mediates the proper spatially restricted expression pattern of Hoxb-4 in the mesoderm and gives a subset of the full Hoxb-4 pattern in the central and peripheral nervous systems. The second and third conserved islands, CR2 and CR3, are located within the region-A enhancer. This 3-kb region, 3' of mouse Hoxb-4, directs expression in the neural tube with a sharp anterior boundary at the junction of rhombomeres 6 and 7 (r6/7) in the hindbrain (Fig. 3 A and B and ref. 6). Posterior to r6/7, expression extends back along the entire length of the

[Fig. 1A, sequence alignment]

Mouse  MAMSSFLINSNYVDPKFPPCEEYSQSDYLPSDHSPGYYAGGQRRESGFQPEAAFGRRAPC
Fugu   MAMSSYLINSNYVDPKFPPCEEYSQSDYLP-NHSPDYYSSQRQEPAAQPfDS.LYHHHPHH

Mouse  TVQR.----A--YAACRDPGPPPPPPPPPPPPPPGLSPRAPVQPTAG---ALLPEPGQRSEA
Fugu   PQHQQRAEPPYTQCQRAGQPASVV -- MSPRGHVLPPSGLQTTPVPEQSHRCES

Mouse  VSSSPPPPP-CAQNPLHPSPSHSAC--KEPVVYPWMRKVHVSTVNPNYAGGEPKRSRTAY
Fugu   VTPSPPPPPSCGQTPHSQGASSPASTRKDPVVYPWMKKVHVNIVSSNYTGGEPKRSRTAY

Mouse  TRQQVLELEKEFHYNRYLTRRRRVEIAHALCLSERQIKIWFQNRRMKWKKDHKLPNTKIR
Fugu   TRQQVLELEKEFHYNRYLTRRRRVEIAHTLCLSERQIKIWFQNRRMKWKKDHKLPNTKVR

Mouse  SGGTAGAAGGPPGRPNGGPPAL
Fugu   SGSTNTNSQVLTGSQNRTGP-L

[Fig. 1, remaining panels: plots with x axes labelled "Fugu intron, kb" and "Fugu region A, kb"; maps of the mouse locus (region C, region A; lacZ constructs #1-#3; scale bar, 1 kb) and of the Fugu locus (cr1, cr2, cr3; lacZ constructs #4-#6).]

FIG. 1. Comparisons of Fugu and mouse Hoxb-4 genes. (A) Conceptual translation and alignment of Fugu Hoxb-4 and mouse Hoxb-4 amino acid sequences. Identical residues are indicated with asterisks, and conservative amino acid differences are indicated with periods.