Volume 12 Number 24 1984 Nucleic Acids Research

Divergence of U2 snRNA sequences in the genome of D. melanogaster

A.Alonso1*, E.Beck2, J.L.Jorcano3 and B.Hovemann4

1German Cancer Research Center, Institute of Experimental Pathology, 2Department of Microbiology, University of Heidelberg, 3German Cancer Research Center, Institute of Tumor and Cell Biology, and 4Institute of Molecular , University of Heidelberg, Heidelberg, FRG

Received 12 September 1984; Revised and Accepted 20 November 1984

ABSTRACT
Four different U2 snRNA genes/related sequences of D. melanogaster were cloned and characterized. The sequences of all four genes suggest that they were generated by a DNA-mediated mechanism. These genes/related sequences were found to be located in two loci, each locus containing two U2 snRNA sequences. Using coding sequences as well as flanking sequences as hybridization probes against polytene chromosomes of D. melanogaster Oregon R, we were able to map these loci separately at positions 34BC and 84C. By Northern analysis we observed that the quantities of U2 and U1 snRNA are coordinated and change during the embryonic development of the fly.

INTRODUCTION
Small nuclear RNA (snRNA) is a well-defined RNA population which has been found in all species so far analysed (1,2). This population is composed of six RNA species, two of which are involved in hnRNA splicing (3,4). One of them, U2 snRNA, is 190 nucleotides long, and its sequence, like that of other snRNAs, is well conserved during evolution (2,5,6). Unlike rodents and primates, the fruit fly, Drosophila melanogaster, contains very few gene-related sequences for this snRNA (5). It is still unclear what mechanisms are responsible for the production of the large number of genes and related sequences found in mammals and rodents. Two different mechanisms have been postulated. One of these requires the reverse transcription of snRNA molecules and their reintegration into the genome (7,8,9). The other mechanism is proposed to involve a recombination of genomic DNA in which not only the coding sequences but also fragments of the flanking regions are exchanged.

We have previously shown that D. melanogaster possesses between four and six gene-related sequences for U2 snRNA (5) and have demonstrated that these genes are arranged in two blocks, each block containing two different genes. In the present report, we present results which indicate that the second mechanism, i.e. the recombination of DNA, is the most likely explanation for the generation of multiple genes or related sequences in the fly genome. Furthermore, we also present evidence demonstrating that the U2 and U1 snRNA genes of D. melanogaster are not intermingled in the genome.

© IRL Press Limited, Oxford, England.

[Fig. 1, panels A, B and C: pairwise sequence alignments of the U2 sequences and their flanking regions; sequence text not recoverable]

MATERIALS AND METHODS
The subcloning into pUC8, sequencing, Northern hybridizations and computer comparisons were performed as previously described (5). In situ hybridization was performed as described by Saluz et al. (18). Hybridization was performed in 50% formamide, 5 x SSC, 2 mM phosphate buffer, pH 6.8, 1 x Denhardt reagent and 0.1% sodium dodecyl sulfate (SDS) at 42°C. After hybridization, the filters were washed in 0.1 x SSC at 53°C.

RESULTS
We have previously shown that D. melanogaster contains very few genes or related sequences for U2 snRNA and have identified two different genomic loci (named clones 131 and 141), each of which contains two genes or related sequences separated by several hundred nucleotides (131A and 131B as well as 141A and 141B). We have subcloned these fragments and characterized them according to their sequence and cytological location on polytene chromosomes.

Figure 1A shows a computer comparison of the sequences of the genes isolated from genomic clone 131 (5). This comparison shows that the coding regions of both genes are very similar, differing by only 3 nucleotides: 2 purine-purine transitions and 1 pyrimidine-pyrimidine transition. It is remarkable that the similarities do not stop at the 5' end of the coding sequence, but continue at least 65 base pairs upstream from the cap site of the mature snRNA. At the 3' end, the homology is conserved only in the first 12 nucleotides, and the first nucleotide after the 3' end of the mature snRNA differs.

A comparison of the genes found in genomic clone 141 is shown in Fig. 1B. In these two genes, the coding regions are identical. Furthermore, the upstream region is homologous as far as 60 nucleotides, whereas the downstream sequences are identical up to 125 nucleotides, the longest tract of clone 141B which has been sequenced.

Fig. 1. Computer comparisons of U2 sequences isolated from different loci. Comparison of the two U2 sequences located in clone 131 (1A), in clone 141 (1B), or of one gene/related sequence from clone 131 and another from clone 141 (1C). The U2-coding sequence is underlined. Pyrimidine or purine transitions are denoted by Y or R, respectively.
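The annotation convention of Fig. 1 (R for a purine transition, Y for a pyrimidine transition) can be illustrated with a minimal Python sketch. This is not the program used for the comparisons in this paper. The function names, the use of "*" for identical positions, the use of "-" for alignment gaps and the toy sequences are all assumptions made for illustration, and the input is assumed to be already aligned DNA of equal length.

```python
# Minimal sketch: annotate a pre-aligned pair of DNA sequences column by column.
# Hypothetical helper names; not the authors' comparison program.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def annotate(seq_a: str, seq_b: str) -> str:
    """Return one mark per alignment column.

    "*" = identical bases, "R" = purine transition (A<->G),
    "Y" = pyrimidine transition (C<->T), " " = anything else
    (transversion, gap written as "-", or unknown symbol).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    marks = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            marks.append(" ")
        elif a == b:
            marks.append("*")
        elif a in PURINES and b in PURINES:
            marks.append("R")
        elif a in PYRIMIDINES and b in PYRIMIDINES:
            marks.append("Y")
        else:
            marks.append(" ")
    return "".join(marks)


def summarize(marks: str) -> dict:
    """Count each class of column in an annotation string."""
    return {
        "identical": marks.count("*"),
        "purine_transitions": marks.count("R"),
        "pyrimidine_transitions": marks.count("Y"),
        "other": marks.count(" "),
    }


if __name__ == "__main__":
    # Toy sequences invented for this example, not taken from clones 131 or 141.
    top = "ATCGGA-T"
    bottom = "ATTGAACT"
    marks = annotate(top, bottom)
    print(top)
    print(marks)
    print(bottom)
    print(summarize(marks))
```

Counting the R and Y marks within the coding region of such an annotation gives the kind of tally reported in the text, for example 2 purine transitions and 1 pyrimidine transition.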


[Fig. 2A, map labels: EcoRI, 600 bp, AvaI, 400 bp, EcoRI; pUC8 vector arms; U2 snRNA (131B). Fig. 2B, chromosome labels: 84C and 34BC.]

Fig. 2. A: Organization of the fragments used for the in situ hybridization. B: In situ hybridization of the clones shown in panel A with polytene chromosomes of D. melanogaster Oregon R.

Figure 1C shows a comparison of two genes from two different loci. The coding regions differ by 3 nucleotides: 2 purine transitions and 1 pyrimidine transition. Furthermore, the 5' flanking sequences lack homology, except for one tract located at position -58. This tract contains a putative TATA-box motif which can be found in all four of the genes described in this report. This homology leads us to speculate that this motif may be involved in the control of gene expression; however, no experimental data are as yet available to support this point of view. Nevertheless, the presence of a similar motif in U1 genes (10), located at the same distance from the cap site, argues in favour of this suggestion. At the 3' end, a tract of 20 homologous nucleotides is found. This tract contains a putative polyadenylation/cutting signal, AATAAA, as described for other polymerase II transcripts. In this context, it should be mentioned that analysis of the RNA synthesized in Xenopus laevis oocytes injected with U1 genes shows the presence of precursors about 18 bases longer than the mature snRNA. The presence of such precursors has also been detected in vivo (11).

In situ hybridization
We have already localized the position of U1 snRNA genes in polytene chromosomes of D. melanogaster. We have demonstrated that the isolated genomic clones for U1 did not show hybridization with labelled U2 snRNA. Nevertheless, it may well be that both types of snRNA sequences are close together, yet far enough apart to be identified using standard mapping techniques. To clarify this point, we subcloned into plasmid pUC8 two different fragments, one 600 bp long which contains the coding region for U2-131B, and a second of 400 bp containing flanking sequences for the same U2 sequence. The structural organization of both fragments is shown in Fig. 2A.

Figure 2B shows the results of the in situ hybridization. We could identify two different loci, located at positions 84C and 34BC, hybridizing with the fragment carrying the U2 sequence. The latter locus also gave a signal with the 400 bp flanking sequence shown in Fig. 2A, which demonstrates that this sequence is unique to this locus and that this locus contains the sequences we have cloned in recombinant 131 (results not shown). That these loci could be due to hybridization of possible repetitive sequences was ruled out by hybridizing Southern blots of genomic DNA (results not shown) with two different subcloned fragments, one from recombinant 131 and one from recombinant 141. In both cases the length and number of bands were identical, demonstrating that the flanking sequences cannot be responsible for the in situ hybridization (results presented above).

Fig. 3. A: Northern analysis of RNA isolated at the following times after egg fertilization (at 25°C): a, 10 hours; b, 30 hours; c, 50 hours; d, 90 hours; e, 8 days. The RNA was denatured, separated by acrylamide-urea gel electrophoresis, blotted onto activated ABM paper and hybridized to nick-translated cloned U1 and U2 snRNA. B: Toluidine-blue staining of a parallel gel loaded with the same RNA samples as in A.


Synthesis of snRNA during development
To analyse the synthesis of the U1 and U2 snRNAs during fly development in detail, we isolated RNA from different stages after fertilization, separated it by agarose gel electrophoresis and, after blotting onto activated ABM paper, hybridized it to radioactively labelled cloned U1 and U2 snRNA sequences.

Figure 3A shows the results of this experiment. The quantities of U1 and U2 snRNA appear to be coordinated during the different developmental stages analysed. Furthermore, the RNA extracted from embryos 10 hours after fertilization shows the strongest hybridization signal, whereas the RNA from the first, second and third larval stages contains smaller amounts of U1 and U2 snRNA sequence. This quantity is clearly greater in RNA extracted from the pupal stages. The equal quantities of 5S RNA in all the stages analysed (Fig. 3B) demonstrate that the quantitative differences in signals observed were real and not due to different loadings.

DISCUSSION
We have presented evidence demonstrating that the U2 snRNA genes/related sequences are organized in doublets in the genome of the fruit fly D. melanogaster. A similar organization has also been reported in other species (12,13,14). In humans the existence of tandem repeats, with a unit length of 6 kbp, has been described (6,15), and a similar structure has also been found in Xenopus laevis (16,17). Two differences between rodents, mammals and flies are obvious. First, the fly contains far fewer snRNA genes/related sequences than the other two groups. Second, from our data it can be concluded that no tandem repeats exist in the fly, in contrast to humans and Xenopus, in which long segments of flanking sequence are identical or highly homologous.

The in situ hybridization data show that only two reacting loci were found after hybridizing polytene chromosomes to the cloned DNA depicted in Fig. 2A. We failed to identify the loci described by Saluz et al. (18) for their snRNA1, which they identify as U2 snRNA. One possible reason for this failure may be the different methods used for the in situ hybridizations: these authors utilized a crude snRNA preparation, isolated after two-dimensional electrophoresis, with a presaturation of the hybridization with rRNA in order to avoid possible cross-reaction with ribosomal sequences. In contrast, our probe was a cloned DNA fragment containing one U2 snRNA sequence. Curiously, the number of genes given by these authors for snRNA1 agrees very well with the number we found after in situ hybridization and genomic reconstruction experiments (5).

A remarkable observation is the homology found in the flanking sequences between the different genes. As shown in the Results section, this homology differs between the two loci analysed. This suggests that the present structural arrangement of U2 snRNA genes in D. melanogaster is the result of amplification of an ancestral U2 gene by a DNA-mediated mechanism, giving rise to loci 131 and 141. Recombination between neighbouring sequences would then explain the presence of two genes in each locus as well as the homologies found in the sequences downstream and upstream from the corresponding coding regions. It is intriguing that both genes found in clone 141 are identical in the more than 125 nucleotides sequenced so far at the 3' end. This suggests that the recombination event which probably gave rise to these two genes is more recent than the event by which the two other genes (locus 131) were produced. This suggestion is reinforced by the fact that the two genes contained in clone 131 are slightly divergent, whereas both genes found in clone 141 are identical. Such a mechanism has also been postulated for the generation of multiple U1 genes in humans (8,15). The difference between flies and humans is that the fly did not produce multiple pseudogenes, as was the case in mammals.

A further point that should be mentioned is the quantitatively coordinated production of U1 and U2 snRNAs found during development. As demonstrated in Fig. 3, both snRNAs are present in similar amounts. This suggests that the mechanisms regulating their synthesis could be related and that they may be engaged in similar function(s). The homologies found in the upstream sequences of both U1 and U2 coding regions may be responsible for this coordinated synthesis, although no experimental evidence is available as yet to support this point. Transfection experiments with snRNA genes which have been modified in vitro are currently being performed to elucidate the mechanisms regulating expression of these genes.

Acknowledgments
This work has been supported by grants from the Deutsche Forschungsgemeinschaft to A.A. (Al 158/3) and to the Forschergruppe Genexpression Heidelberg. We thank Dr. Suhai for his help with the computer program and Dr. L. Sanchez for the gift of flies.

*To whom correspondence should be addressed


REFERENCES
1. Busch, H., Reddy, R., Rothblum, L. and Choi, C.Y. (1982) Annu. Rev. Biochem., 51, 617-651.
2. Blin, N., Weber, T. and Alonso, A. (1983) Nucl. Acids Res., 11, 1375-1388.
3. Lerner, M.R., Boyle, J.A., Mount, S.M., Wolin, S.L. and Steitz, J.A. (1980) Nature (London), 283, 220-224.
4. Ohshima, Y., Itoh, M., Okada, N. and Miyata, T. (1981) Proc. Natl. Acad. Sci. USA, 78, 4471-4474.
5. Alonso, A., Jorcano, J.L., Beck, E. and Spiess, E. (1983) J. Mol. Biol., 169, 691-705.
6. Westin, G., Zabielsky, J., Hammarstrom, K., Monstein, M.S., Bark, G. and Pettersson, U. (1984) Proc. Natl. Acad. Sci. USA, 81, 3811-3815.
7. Denison, R.A. and Weiner, A.M. (1982) Mol. Cell Biol., 7, 815-828.
8. Weiner, A.M. and Denison, R.A. (1983) Cold Spring Harbor Symp. Quant. Biol., 47, 1141-1150.
9. Van Arsdell, S.W. and Weiner, A.M. (1984) Mol. Cell Biol., 4, 492-499.
10. Beck, E., Jorcano, J.L. and Alonso, A. (1984) J. Mol. Biol., 173, 539-542.
11. Tani, T., Watanabe-Nagasu, N., Okada, N. and Ohshima, Y. (1983) Nucl. Acids Res., 11, 1375-1388.
12. Card, C.O., Morris, G.F., Brown, D.T. and Marzluff, W.F. (1982) Nucl. Acids Res., 10, 7677-7688.
13. Marzluff, W.F., Brown, D.T., Lobo, S. and Wang, S.S. (1983) Nucl. Acids Res., 11, 6255-6270.
14. Nojima, H. and Kornberg, R.D. (1983) J. Biol. Chem., 258, 8151-8155.
15. Van Arsdell, S. and Weiner, A.M. (1984) Nucl. Acids Res., 12, 1463-1471.
16. Mattaj, I.W. and Zeller, R. (1983) The EMBO J., 11, 1883-1891.
17. Zeller, R., Carri, M.T., Mattaj, I.W. and De Robertis, E.M. (1984) The EMBO J., 3, 1075-1081.
18. Saluz, H.P., Schmidt, T., Dudler, R., Altwegg, M., Stumm-Zollinger, E., Kubli, E. and Chen, P.S. (1983) Nucl. Acids Res., 11, 77-90.
