
Volume 11 Number 5 1983 Nucleic Acids Research

Repetitive satellite-like sequences are present within or upstream from 3 avian protein-coding genes

L. Maroteaux, R. Heilig, D. Dupret and J.L. Mandel

Laboratoire de Génétique Moléculaire des Eucaryotes du CNRS, Unité 184 de Biologie Moléculaire et de Génie Génétique de l'INSERM, Faculté de Médecine, 11 Rue Humann, 67085 Strasbourg Cedex, France

Received 17 January 1983; Accepted 9 February 1983

ABSTRACT

Peculiar DNA sequences made up of tandem repetitions of a 5 bp unit have been identified within or upstream from three avian protein-coding genes. One sequence is located within an intron of the chicken "ovalbumin-X" gene, with 5'-TCTCC-3' as the basic repeat unit (36 repeats). Another sequence, made of 27 repeats of a 5'-GGAAG-3' basic unit, is found 2500 base pairs upstream from the promoter of the chicken ovotransferrin (conalbumin) gene. A related but different sequence is present in the corresponding region of the ovotransferrin gene in the pheasant, with 5'-GGAAA-3' as the basic unit (55 repeats). These three satellite-like elements are thus characterized by a total asymmetry in base distribution, with purines restricted to one strand and pyrimidines to the other. Two of the basic repeat units can be derived from the third one (GGAAA) by a single base pair change. These related sequences are found repeated in three avian genomes, at degrees which vary with both the sequence type and the genome type. The evolution of tandemly repeated sequences (including satellites) is in general studied by analysing randomly picked elements. The presence of conserved protein-coding regions neighbouring satellite-like sequences allows their evolution to be followed at a single locus, as exemplified by the striking comparison of the pheasant and chicken sequences upstream from the ovotransferrin gene.

© IRL Press Limited, Oxford, England.

INTRODUCTION

Families of middle or highly repetitive sequences have been found in all eucaryotic genomes [and even in archaebacteria (1)]. Some of these sequences are interspersed with unique sequences; the Alu type sequences, for instance, can be found in the vicinity of expressed genes, or even within a transcription unit (2, 3, 4). Other families appear in the genome as long tandem repeats which can confer characteristic physical properties on the DNA, allowing their isolation as satellite components by buoyant density centrifugation. Some of these satellites are characterized by very short repeat units (5-10 bp), while others have a much longer basic element which in some cases contains hidden internal homologies (see ref. 5 for a review). Satellite sequences have been localised, by in situ hybridization, to heterochromatic regions (as in Drosophila) including centromeres and [...], and it is generally assumed that these sequences are not transcribed.

It has been proposed that interspersed repetitive sequences play a role in specific gene regulation (6), and more recently attention has been focused on their possible mobility within the genome (7). Satellite sequences, in contrast, have been thought to play a structural role in chromosome organization and in chromosome mechanics [pairing, recombination, etc. (see ref. 8)].

The presence of repetitive sequences has been demonstrated in the vicinity of, or within, some of the genes coding for egg white proteins in the chicken: ovotransferrin (9), lysozyme (10), ovalbumin X and Y genes (11, 12), ovomucoid (Gerlinger, personal communication). All these genes are expressed in the oviduct under similar steroid hormone control. It was thus interesting to look for possible similarities in the organization of such repetitive sequences, and in their localization with respect to functional parts of the genes or of the active chromatin domain (13, 14). We report here the presence of sequences constituted by tandem repeats of a 5 bp unit, 2.5 kb upstream from the chicken and pheasant ovotransferrin genes and within an intron of the ovalbumin-X gene, which correspond to some of the repetitive sequences previously identified. Each of these satellite-like elements has a characteristic repeat unit: two of these units (GGAAG and GGAGA) can be derived from the third one (GGAAA) by a single nucleotide change.
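The single-nucleotide relationship between the three repeat units can be checked mechanically. The following is a minimal Python sketch, not part of the original work; the helper names `hamming` and `reverse_complement` are illustrative choices. It counts the positions at which two units differ and shows that TCTCC is the reverse complement of GGAGA.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("units must have the same length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Repeat units as quoted in the text, written on the purine-rich strand.
REFERENCE_UNIT = "GGAAA"  # pheasant ovotransferrin element
OTHER_UNITS = {
    "GGAAG": "chicken ovotransferrin element",
    "GGAGA": "chicken ovalbumin-X element",
}

for unit, locus in OTHER_UNITS.items():
    # Each unit differs from GGAAA at exactly one position.
    print(f"{unit} ({locus}): {hamming(REFERENCE_UNIT, unit)} change from {REFERENCE_UNIT}")

# The ovalbumin-X unit read on the transcribed (pyrimidine-rich) strand.
print(reverse_complement("GGAGA"))
```

The comparison is made in a fixed register; a cyclic permutation of a unit describes the same tandem array, so a fuller treatment would also compare rotated units.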

MATERIALS AND METHODS

Clones and DNA sequencing.

For analysis of sequences upstream from the chicken ovotransferrin gene, we subcloned a 1.5 kb EcoRI-PstI fragment derived from the 4 kb "Eco b" fragment (9, see Fig. 1A, line a). A 6 kb EcoRI fragment containing the first exon and upstream sequences from the pheasant ovotransferrin gene was isolated by screening a pheasant erythrocyte DNA library in bacteriophage λ with a chicken ovotransferrin cDNA probe (Dupret et al., in preparation). A 950 bp PstI fragment containing the repetitive sequence was subcloned in pBR322 and used for sequence determination (see Fig. 1B, line b). The 1.7 kb and the 4.4 kb EcoRI subclones of the ovalbumin-X gene have been described previously (11, see Fig. 1C, line a).

DNA sequencing was performed according to the Maxam and Gilbert procedure (15) using 5' end-labelled restriction fragments which had in general been subjected to strand separation by polyacrylamide gel electrophoresis as described previously (16).


Hybridization experiments.

Fragments of cloned DNA were immobilized on DBM paper according to a modification (17) of the method of Alwine et al. (18) and hybridized to 150 ng of total nick-translated cellular DNA (specific activity 1.4 x 10^8 cpm/µg) for 18 h at 42°C in 6.4 ml of 40 % deionized formamide, 40 mM phosphate buffer (pH 6.5), 0.7 M NaCl, 2 mM EDTA, 1 x Denhardt solution, 8 % dextran sulphate (19), containing 50 µg/ml of sonicated heat-denatured salmon sperm DNA. Filters were washed stepwise under three conditions of increasing stringency: 20°C, 0.5 x SSC - 0.1 % SDS; 68°C, 2 x SSC - 0.1 % SDS; and 68°C, 0.5 x SSC - 0.1 % SDS. After each washing step the filters were exposed to Fuji RX films at -80°C, with a Philips Ultra-S intensifying screen.
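The quantities in this protocol lend themselves to a quick arithmetic check. The sketch below is illustrative only and rests on two stated assumptions: the specific activity is read as 1.4 x 10^8 cpm per microgram, and 1 x SSC is taken as the standard 0.15 M NaCl plus 0.015 M trisodium citrate (the text does not define SSC). Function names are hypothetical.

```python
# Values quoted in the hybridization protocol.
PROBE_NG = 150.0           # ng of nick-translated DNA
VOLUME_ML = 6.4            # ml of hybridization mix
SPECIFIC_ACTIVITY = 1.4e8  # cpm per microgram (assumed reading of the unit)

# Assumption: standard 1 x SSC composition, not stated in the text.
SSC_NACL_M = 0.15
SSC_CITRATE_M = 0.015      # trisodium citrate contributes 3 Na+ per molecule


def probe_concentration_ng_per_ml(probe_ng: float, volume_ml: float) -> float:
    """Probe concentration in the hybridization mix."""
    return probe_ng / volume_ml


def total_cpm(probe_ng: float, cpm_per_ug: float) -> float:
    """Total radioactivity added, converting ng to micrograms."""
    return probe_ng / 1000.0 * cpm_per_ug


def sodium_molarity(ssc_fold: float) -> float:
    """Approximate Na+ molarity of an n x SSC wash."""
    return ssc_fold * (SSC_NACL_M + 3 * SSC_CITRATE_M)


print(round(probe_concentration_ng_per_ml(PROBE_NG, VOLUME_ML), 2), "ng/ml")
print(f"{total_cpm(PROBE_NG, SPECIFIC_ACTIVITY):.2e} cpm in total")
for fold in (2.0, 0.5):
    print(f"{fold} x SSC is about {sodium_molarity(fold):.4f} M Na+")
```

Lower salt and higher temperature both destabilize mismatched hybrids, which is why the 68°C, 0.5 x SSC wash is the most stringent of the three.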

RESULTS

Location of repetitive sequences upstream from the chicken and pheasant ovotransferrin genes.

A previous study of the cloned EcoRI "b" fragment containing the first exon of the chicken ovotransferrin gene (and upstream sequences) had demonstrated the presence of a sequence highly repeated in the chicken genome, in a region located 2000 to 3200 bp upstream from the 5' end of the gene, corresponding to the fragment "Mbo a" (9, see Fig. 1A, line d). The neighbouring "Mbo b" fragment contains a sequence which cross-hybridizes to intron B of the same gene and which shows a lower degree of repetition in the chicken genome (9). We have localized the two repetitive sequences more precisely by blotting restriction digests of a plasmid containing the 1.5 kb EcoRI-PstI fragment onto DBM paper and hybridizing them to nick-translated total cellular chicken DNA. This allowed us to map the highly repetitive sequence (labelled S in Fig. 1A and 5A1) within a 190 bp MspI-MboII fragment, and the other repetitive sequence to the adjacent 180 bp MboII-KpnI fragment (labelled R in Fig. 1A and in Fig. 5A1, lane 3).

The ovotransferrin gene from pheasant was cloned in order to study its structure and compare it to that of the chicken gene (Dupret et al., in preparation). The first exon of the gene is contained within a 6 kb EcoRI fragment. Using the experimental design outlined above, we found repetitive sequences within the 620 bp PstI-SphI fragment located 2300 bp upstream from exon 1 (Fig. 1B and Fig. 5B, lanes 5 to 7).

Location of repetitive sequences within introns of the chicken "ovalbumin-X" gene.

We have previously shown that introns A and C of the chicken "ovalbumin-X" gene contain regions of repetitive sequences (11). The two regions, labelled R and S respectively in Fig. 1C and 5A1, lanes 9 to 12, have now been localized more precisely within a 290 bp long segment limited by Sau96I and HgiAI sites in intron A, on both sides of an EcoRI site, and within a 420 bp MboII-DdeI fragment in intron C (see Fig. 1C and Fig. 5A1).

Figure 1: Maps of the three genomic regions. Satellite-like sequences (S) are indicated by the hatched areas. The dotted lines correspond to other repetitive sequences (R). Exons (numbered 1, 2, etc.) are represented by black boxes. The arrows above the S regions indicate the direction and extent of sequence determinations. The thick arrows correspond to the autoradiograms presented in Fig. 2.
A) Chicken ovotransferrin gene. Line a: map of the 4 kb EcoRI fragment (E1, E2) and of the 1.5 kb EcoRI-PstI (E1, P1) fragment used in this study. Lines b to d: location of repetitive sequences by blot-hybridization with nick-translated chicken DNA (lines b to d correspond to lanes 1 to 3 in Fig. 5). The plasmid containing the EcoRI-PstI fragment was digested by EcoRI + PstI + MspI (b), by KpnI + AvaI (c) and by EcoRI + PstI + MboII + HhaI (d). The MboII fragments "a" and "b" contain the S and the R elements respectively (see text). Strongly hybridizing fragments are indicated by heavy lines, weaker signals by thin lines. Line e: resulting location of S and R elements.
B) Pheasant ovotransferrin gene. Line a: map of the 6 kb EcoRI fragment cloned from pheasant DNA (D. D. et al., in preparation). Lines b to d: location of the repetitive element by blot hybridization with nick-translated pheasant DNA (lines b to d correspond to lanes 5 to 7 in Fig. 5). The plasmid containing the 6 kb EcoRI fragment was digested by PstI (b), by PstI + PvuII (c) and by SphI + AvaI (d). Line e: resulting location of the "S" element.
C) Chicken "ovalbumin-X" gene. Location, within the 1.7 and 4.4 kb EcoRI fragments (11), of the leader-exon (L), of exons 1 to 4 and of introns A to E. Lines b to e: the plasmid containing the (1.7 + 4.4) kb EcoRI fragment was digested by Sau96I + MspI (b) and by HgiAI + BglII + HhaI (c), allowing the detection of the repetitive regions in introns A (R) and C (S). The repetitive element in intron C, which corresponds to the satellite-like sequence (S), was further located using digests of the plasmid containing the 4.4 kb EcoRI fragment with DdeI + HhaI (d) and MboII + ClaI (e) (lines b to e correspond to lanes 9 to 12 in Fig. 5). Line f: resulting location of S and R elements.

Presence of satellite-like sequences corresponding to repetitive regions.

Sequence determination of the repetitive regions described above shows that three of them contain satellite-like structures, made from tandem repeats of a 5 bp basic unit: these occur in the 5' flanking region of the ovotransferrin gene in chicken and pheasant, and within intron C of the chicken "ovalbumin-X" gene (Fig. 2; the corresponding regions are indicated by the hatched areas labelled S in Fig. 1A, B and C).

The MspI-MboII fragment upstream from the chicken ovotransferrin gene contains 27 tandem repeats of a GGAAG basic unit (Fig. 2A and 3A) with variations in only 2 of the repeats (one transition, one insertion).

In the region upstream from the pheasant ovotransferrin gene, a 291 bp long sequence is derived from repetitions of a GGAAA unit (Fig. 2B and 3B). The repeating pattern is not as well preserved as in the chicken sequence described above, since 10 of the repeat units are modified by single base pair changes (transitions only) and the 5 bp spacing is altered at 8 locations. Each of these spacing alterations can be interpreted either as an insertion of 2 purines between two adjacent repeats, or as a deletion of 3 bp in a single unit (see Fig. 3B and Table I). Some longer range organization can be detected: alterations of the repeat pattern are found mainly in the 5' third of the satellite region (13 out of 18), with no more than two consecutive exact repeats of the basic unit.
However, a 34 bp sequence containing 4 alterations (2 insertions, 2 base changes) is exactly repeated (indicated by small brackets in Fig. 3B). In contrast, in the remaining two thirds of the satellite element, two stretches of 13 and 12 consecutive repeat units are found, each stretch being followed by a 7 bp derivative of the basic motif (AAAGAAA). This characteristic organization points to the past occurrence of unequal exchange events.

The chicken and pheasant sequences are both found about 2.5 kb from the 5' end of the ovotransferrin gene and are flanked on both sides by more complex sequences which show 82% homology (excluding a few deletions or insertions) over at least 289 nucleotides (Fig. 4; the homology might be preserved over longer distances, but we have not performed additional sequence determination in the two regions). The two satellite-like elements


Figure 2: Autoradiograms of sequencing gels showing the satellite-like regions.
A) Sequence pattern of the "S" element located upstream from the chicken ovotransferrin gene. The sequence was obtained using a fragment 5' end-labelled at an upstream MspI site (thick arrow in Fig. 1A, line a). The T residue present at the bottom just precedes the "S" element, and a single C residue is inserted within the repeating pattern (starred in Fig. 3A).
B) Sequence pattern of the "S" element located upstream from the pheasant ovotransferrin gene. Sequencing was performed from a 5' end-labelled DdeI site (thick arrow in Fig. 1B, line a). In the lower part, the repeating pattern is altered by many single base changes (transitions) or by insertions of purine residues, while two longer stretches of exact repeats appear in the upper part. In this sequencing experiment some background is found in the T+C and C tracks. However, true pyrimidine residues give a stronger reaction, as shown by the single T present at the bottom of the figure.
C) Sequence pattern of the "S" element located within intron C of the "ovalbumin-X" gene. Sequencing was performed from a 5' end-labelled DdeI site (thick arrow in Fig. 1C, line a). The regular repeating pattern is followed, in the upper part, by a 105 bp long purine rich region.
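A repeating pattern of this kind, once read off a gel, can be checked by scanning the sequence for the longest uninterrupted run of a given unit. The following is a minimal sketch under stated assumptions: the function name `longest_tandem_run` is illustrative, and the test sequence is a made-up toy string, not one of the published sequences.

```python
def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Return (start, count) of the longest run of exact tandem copies of unit.

    Every start position is tried, so runs in any register are found.
    Returns (-1, 0) when the unit does not occur at all.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    k = len(unit)
    best_start, best_count = -1, 0
    for start in range(len(seq)):
        count = 0
        # Count consecutive copies of the unit beginning at this position.
        while seq.startswith(unit, start + count * k):
            count += 1
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy example: three exact GGAAG repeats, an inserted C, then two more repeats.
toy = "TT" + "GGAAG" * 3 + "C" + "GGAAG" * 2 + "AT"
print(longest_tandem_run(toy, "GGAAG"))
```

A single inserted residue, like the C described for the chicken element, breaks one long run into two shorter ones, which is how such an insertion would show up in this kind of scan.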


A)
GAGGTCTGGAAAATGAAGGAAGGGAGGGGAAGGGAAGGGAAGGCGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAG
GGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAG GAA4 GAGATGGAA

B)
GGGAGGACTGGAAAATGAGAGGAAAGGAGAGGAAAGGAAAAAGGAAAGGAAAGAAAAGAAAAGGAAAAAGGAAAGGAAA
AGAAAGGAAAGAGGAAAGGAAAAAGGAAAGGAAAGAAAAGGAAAAAGGAAAGGAAAAGAAAGGAAAGAGGAAAGGAAAGG
AAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAAAAGAAAGGAAAGGGAAGGAAAGGAAA
GGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAGGAAAAAAGAAAGGAAAGGAA; GGACTGTGGAGAT

C)
GGCTTGTAAAGGACAAGAAGATTTTTCCGTTTCCTTT TTTTTCCTTTCCCCC TCATTTTTCCTTTCCCCC CACT
TTCTCTCTCCTTTTCCCACTTCCCTCTTTCCTTTACATTTCCCA TCF TCTCCTCTCCTCTCCTCTCCTCTCCTCTCC
TCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCC
TCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCTCTCCT TCCTTCTATTCTTTTTG
CTAGAGCATTTAGATGGTTATGTAGAACAATTCACAAAACACAATCAGACAAATCACTCACATTTTCTGTTTCTTATCAC

Figure 3: Sequence of the three satellite-like elements. Limits of the tandemly repeated regions are indicated by square brackets. Exact repeats of the basic units are underlined. Single base pair changes are pointed out by arrows. Insertions with respect to the 5 bp spacing are indicated by stars. A) S-element located upstream from the chicken ovotransferrin gene. B) S-element located upstream from the pheasant ovotransferrin gene. Small brackets indicate the limits of a 34 bp exactly repeated sequence. C) S-element present in intron C of the chicken ovalbumin-X gene. The 17 bp exact repeats are boxed.

should thus correspond to divergent evolution of a sequence present in this region before separation of the chicken and pheasant lineages.

Finally, in intron C of the chicken "ovalbumin-X" gene, we find 36 repeats of the TCTCC (or GGAGA) motif, without any variation (Fig. 2C and 3C). This sequence is preceded by a 100 nucleotide long pyrimidine rich region, with no simple organization but characterized by the presence of small homopolymeric runs of 3 to 5 residues and of an exact 17 bp repeat (with a 4 bp spacing) (boxed in Fig. 3C). No


TABLE I: Characteristics of the 3 satellite-like elements.

                                                         Type and frequency of alterations of the repeat unit
Gene and species            Basic    Length   Number of    Spacing          Base changes
                            unit     (bp)     5 bp units   alterations**    Transitions   Transversions

Ovotransferrin (chicken)    GGAAG    136      27           1                1             0
Ovotransferrin (pheasant)   GGAAA    291      55           8 (16)           10            0
Ovalbumin-X (chicken)       GGAGA*   180      36           0                0             0

* TCTCC with respect to the orientation of transcription.
** The first number corresponds to the number of gaps with respect to the repeating pattern. The number in parentheses indicates the number of nucleotides included in these gaps.
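The lengths in Table I are consistent with a simple count: the number of 5 bp units times five, plus the nucleotides contained in spacing alterations. The sketch below checks this arithmetic. It is illustrative only; the value of 1 extra nucleotide for the chicken ovotransferrin element is an assumption based on the single inserted C residue described in the Fig. 2 caption, since the table gives a nucleotide count in parentheses only for the pheasant element.

```python
UNIT_LENGTH = 5  # bp per basic repeat unit

# (locus, basic unit, reported length in bp, number of 5 bp units,
#  nucleotides contained in spacing alterations)
TABLE_I = [
    ("ovotransferrin (chicken)", "GGAAG", 136, 27, 1),   # 1 nt: assumed single C insertion
    ("ovotransferrin (pheasant)", "GGAAA", 291, 55, 16),
    ("ovalbumin-X (chicken)", "GGAGA", 180, 36, 0),
]


def expected_length(n_units: int, extra_nt: int) -> int:
    """Length implied by the unit count plus nucleotides in gaps."""
    return n_units * UNIT_LENGTH + extra_nt


for locus, unit, reported, n_units, extra_nt in TABLE_I:
    implied = expected_length(n_units, extra_nt)
    status = "consistent" if implied == reported else "mismatch"
    print(f"{locus}: {n_units} x {UNIT_LENGTH} + {extra_nt} = {implied} bp "
          f"(reported {reported} bp, {status})")
```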

such sequences are found in the corresponding intron C of both the ovalbumin and Y genes, which have been completely sequenced (16, 20, 21).

Presence of the satellite-like sequences in three avian genomes.

We have compared the occurrence of sequences related to each of the three satellite-like elements described above in the homologous genomes (chicken and pheasant) and in a more distantly related avian species, the duck. Genomic fragments corresponding to the chicken or pheasant ovotransferrin 5' flanking regions, or to part of the chicken ovalbumin-X gene, were digested with restriction enzymes which separate the different repetitive regions present in the same fragment or which allow them to be mapped within relatively narrow limits. The digests were electrophoresed, transferred and blotted onto DBM paper and hybridized to total nick-translated genomic DNA from either chicken, pheasant or duck. Care was taken to load on the gel similar molar amounts of restricted DNA of each type (see Fig. 5D) and to use equal concentrations of probes labelled at similar specific activities.

Under relatively low stringency conditions for hybridization (40% formamide at 42°C, 0.7 M NaCl) and washing (0.5 x SSC, 20°C), the chicken DNA probe revealed the chicken and pheasant ovotransferrin "satellite regions" approximately equally (labelled S in Fig. 5A1, lanes 1-3 and 5-7), while the signal with the "ovalbumin-X" satellite region was about 3 times weaker (lanes 9-12).


Chicken:  TGAACCCTGTGCTGACTTGGGGGTCAGGCGGGTGCAGC ------GGGCATGTGTCCCC------GGCA
Pheasant: TGGAACCCTGTGCTGACTTGGGGGTCAAGCAGGTGCCTGCTGAGTGCCAGTCTGTGTCATGTGTCCCCCCTGGGCTACCA

Chicken:  GCAGGGAGCAGGGAGGGAGGTCTGGAAAA GAAGGGAAG------116------GGAAGGGAA GAGATG
Pheasant: GCAGGGAGCAGGGAGGGAGGACTGGAAAA GAGAGGAAA------271------GGAAAGGAA GACTGT

Chicken:  GAAGAAGTGGCACTGATGGATACGGTCAGTGGGCATAGTGGGGATGGGTTGGTGGTTTTGGGCTTGTGGTCCTTGGGGG
Pheasant: GGAGATGTAGCACTGAGGGATGTGGTCAGTGGGCATGCTGGGGATGGGTTGGTGATTTTGGGCTTGGGGATCTTAGAGGT

Chicken:  TCCTTAAAATAATAATGATCTTATTCTGTTCTAAAAATGCATTCCCATCAGGCTGCCCTCCTTGCCTTCACCTGCTATGA
Pheasant: CTTTTCCAATCTTAATGATCCTATTCTGTTCTAAAAATGCATTCTCATCAGGCTGACCTCCTTGCCTTTACCTGCTGTGA

Chicken:  TGT-ACCTGCGAACAGCACAGGATGGGATAGGTACC------CHICKEN-OVOTRANSFERRIN-GENE
Pheasant: CATGACCTATGGGCAGCACAGGATGGGAACTGGAGC------PHEASANT-OVOTRANSFERRIN-GENE

Figure 4: Alignment between the chicken and pheasant sequences which flank the ovotransferrin "S-elements". The sequence of the chicken region is presented above the pheasant sequence. Only the first and last two repeats are indicated for each element.
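Percent homology and the balance of transitions and transversions can be tallied column by column from an alignment of this kind. The following is a hedged Python sketch, not the authors' procedure: the function names are illustrative, gapped columns are simply skipped, and the example uses only the first 20 columns of the third alignment block as transcribed in this copy, which may contain transcription errors.

```python
PURINES = set("AG")


def classify(a: str, b: str) -> str:
    """Classify one aligned column as match, transition or transversion."""
    if a == b:
        return "match"
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def compare(top: str, bottom: str) -> dict[str, int]:
    """Count column types between two aligned rows, skipping gap columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            continue
        counts[classify(a, b)] += 1
    return counts


def percent_identity(counts: dict[str, int]) -> float:
    """Matches as a percentage of all compared (ungapped) columns."""
    compared = sum(counts.values())
    return 100.0 * counts["match"] / compared


# 20-column excerpt from the alignment (illustrative only).
chicken = "GAAGAAGTGGCACTGATGGA"
pheasant = "GGAGATGTAGCACTGAGGGA"
counts = compare(chicken, pheasant)
print(counts, percent_identity(counts))
```

On this short excerpt the tally gives 16 matches, 2 transitions and 2 transversions, that is 80% identity; the figure quoted in the text refers to the full flanking region, not to this excerpt.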

The stronger hybridizing bands (labelled R in Fig. 5A1) in the X gene fragments correspond to the repetitive region in intron A. [In the case of the chicken ovotransferrin gene, where two types of repetitive sequences are found in very close proximity, only the digestion with MboII allows the two regions to be separated, and shows that the signal is much stronger for the satellite-like element (S) than for the other repetitive region (R) (Fig. 1A and Fig. 5A1, lane 3).] A comparable picture is obtained using pheasant DNA as a probe under the same experimental conditions (Fig. 5B1). With duck DNA, however (Fig. 5C), the signal obtained with the pheasant ovotransferrin S region is somewhat stronger than with the corresponding chicken region and is much stronger than with the S region of the ovalbumin-X gene. [The repetitive (R) region in intron A of this gene hybridizes very poorly to the duck probe, in complete contrast with the results obtained with the chicken or pheasant probes.]

When a high stringency wash is applied to the same hybridized filters (0.5 x SSC at 68°C), the patterns obtained are strikingly different, since the pheasant ovotransferrin DNA fragments (lanes 5 to 7) disappear completely with the pheasant probe (Fig. 5B2) and are much reduced with the duck and chicken probes (Fig. 5A2 and 5C2), compared to the chicken ovotransferrin fragments. The high stringency washing did not appreciably diminish the signal with the S region of the ovalbumin-X gene, but resulted in an almost complete disappearance of the signal corresponding to the R region (intron A) with the

[Figure 5, panels A1, A2, B1, B2, C1, C2 and D: blot autoradiograms after the low stringency wash (A1, B1, C1) and the high stringency wash (A2, B2, C2). Lanes 1 to 3: chicken ovotransferrin; lanes 5 to 7: pheasant ovotransferrin; lanes 9 to 12: chicken ovalbumin-X. Size markers (panel A1): 1800, 1400, 1250, 900, 620, 530, 400, 310.]

Panel E: Representation of "satellite-like sequences" in avian genomes

Genome      Sequence family
            GGAAG    GGAAA    GGAGA
Chicken     +++      +        ++
Pheasant    ++       -        ++
Duck        ++       +        +

Figure 5: Representation of the three types of satellite-like sequences in three avian genomes. Equivalent molar amounts of plasmids containing the repetitive sequences were digested with restriction enzymes, and the fragments were blotted onto DBM paper and hybridized to nick-translated DNA from either chicken, pheasant or duck as described in Materials and Methods. A, B and C correspond to chicken, pheasant and duck DNA probes, respectively. A1, B1, C1 correspond to the low stringency (determined by the hybridization conditions: 40 % formamide, 42°C, 0.7 M NaCl, see Materials and Methods). A2, B2, C2 correspond to filters washed at high stringency (0.5 x SSC, 68°C). The ethidium bromide staining pattern is shown in panel D. Lanes 1 to 3: EcoRI-PstI subclone of the chicken ovotransferrin gene digested by MspI + EcoRI + PstI, KpnI + AvaI, and MboII + HhaI + EcoRI + PstI respectively. Lanes 5 to 7: plasmid containing the 6 kb EcoRI fragment of the pheasant ovotransferrin gene, digested by PstI, PstI + PvuII and SphI + AvaI respectively. The presence of the S element results in a peculiar mobility of the corresponding restriction fragments. Lanes 9 and 10: plasmid containing the (1.7 + 4.4) kb EcoRI fragment of the chicken ovalbumin-X gene, digested by Sau96I + MspI and HgiAI + BglII + HhaI respectively. Lanes 11 and 12: plasmid containing the 4.4 kb EcoRI fragment of the ovalbumin-X gene, digested by DdeI + HhaI and MboII + ClaI respectively. Lanes 4 and 8: markers whose sizes are reported on panel A1. The lower intensity of the signal in lane 5, compared to lanes 6 and 7 (panel C), is due to an artifact of blotting, as checked in a separate experiment (not shown). The same exposure times were used after both washing conditions. S and R correspond to fragments containing satellite-like elements or other types of repetitive sequences respectively. Panel E presents a summary of the results obtained after the high stringency wash. One, two and three crosses arbitrarily represent hybridization signals of increasing intensity.

chicken or pheasant probe (Fig. 5A2 and 5B2, lanes 9 to 12). The results obtained after the high stringency wash are summarized in Fig. 5E.

In a reconstruction experiment (not shown) using the same blots and a chicken total genomic probe to which various amounts of nick-translated pBR322 were added, an equivalent signal was obtained for pBR-containing fragments and for the fragment containing the chicken ovotransferrin "S region" when the dilution of the plasmid was equivalent to a thousand copies per genome. However, given the internally repetitious nature of the satellite-like sequences, which will affect hybridization kinetics, one cannot derive from this experiment an absolute number for the degree of repetition of such sequences in the avian genomes.

It can be concluded that the "GGAAG element" found upstream from the chicken ovotransferrin gene is highly repeated in all 3 avian genomes. The "GGAGA element" is also highly repeated in the genomes of chicken and pheasant, but less so in the genome of the more distantly related duck. In contrast, the "GGAAA element", as found upstream from the pheasant ovotransferrin gene, is much less frequent in chicken DNA than the two other sequence types and is even rarer in the pheasant genome (see Fig. 5E). However, sequences related to this "GGAAA element" are frequent in all 3 genomes, since hybridization with the avian DNAs yields a very strong signal with the "GGAAA region" when the higher stringency wash is omitted. This signal cannot be due to cross-hybridization to the "GGAAG" or the "GGAGA" sequences, since a control experiment (not shown) demonstrated that the S region of the ovalbumin-X gene (GGAGA type) does not hybridize, under the conditions used for the experiments in Fig. 5, to the pheasant S region.
In fact it has been shown that DNA hybrids between satellites characterized by 7 bp units which differ by a single base pair, have a Tm diminished by 20°C, relative to the homoduplex (see ref. 5).

DISCUSSION

We have identified 3 new types of repetitive sequences present in avian genomes which share common features: they are made of tandem repeats of a 5 bp basic unit, characteristic of each sequence type; they have a totally asymmetric base composition, with purines on one strand and pyrimidines on the other; and the basic units of two of the sequences (GGAGA, GGAAG) can be derived by a single base pair change from the GGAAA unit characteristic of the third one. These sequences thus resemble the satellite sequences made of tandem repeats of a short basic unit which have been identified in several species, notably in Drosophila virilis and melanogaster, where families of simple sequence satellites share closely related repeat units (of 5 to 10 bp, see ref. 5). In general, such sequences have been localised in heterochromatic regions inactive in transcription (for an exception, see however ref. 22). In the case studied here, the 3 satellite-like elements are found, interspersed with unique sequences, within or upstream from genes which can be actively transcribed. It is of course possible that some or most of the elements of these repetitive families are organised in much longer stretches in inactive regions of avian genomes and might thus correspond to one or other of the numerous satellite components isolated in chicken or in other birds by differential buoyant density centrifugation (23).

It has recently been shown for more complex tandem repeat sequences (such as the αRI human sequence) that isolated elements can be present interspersed within unique sequences (24), which suggests that they might have some mobility in the genome. In the present case a satellite element is found at the same location upstream from the ovotransferrin gene in chicken and pheasant, whose lineages diverged approximately 20 million years ago (25). In contrast, the satellite-like sequence present in intron C of the ovalbumin-X gene is absent from the two other members of the ovalbumin gene family in chicken. Sequence analysis has suggested that the three genes diverged at least 50 to 80 million years ago (11, 16). A study of ovalbumin family genes in other avian species might reveal whether the differential presence of the satellite element is due to deletion of a sequence existing in the ancestor gene, or to an insertion which occurred in the X gene.
The elements we describe in this paper can also be compared to the growing number of "simple sequences" which have been identified within or in the vicinity of several protein-coding genes. Most strikingly, a region containing 26 repeats of the GGAGA motif has been found upstream from the Cα immunoglobulin gene in mouse (26). It is thus exactly homologous to the S element identified within the ovalbumin-X gene (although it is found in the opposite direction with respect to transcription). It is not known whether this type of sequence is highly repeated in the mouse genome. Another closely related sequence, AAGAG, has been found to constitute the basic 5 bp unit of one component of a Drosophila melanogaster satellite DNA (27). Other simple sequences which have been characterized are constituted by repeats of dinucleotides [TG or CA, in introns of the human γ globins (28), human cardiac (29), and within or downstream from some immunoglobulin genes in mouse (30, 31); or CT in a spacer between histone genes (32)], trinucleotides [TCC and TCA, upstream from some mouse VH genes (33)] and tetranucleotides [GATA or GACA, in the sex specific sequences from the snake Elaphe radiata (34)]. A variable length polymorphic region upstream from the human insulin gene is constituted of repeats of a 14-mer element (35). These simple sequence elements, except in the insulin case, are of smaller size than those identified in the present work, and up to now only the TG motif has been demonstrated to be highly repeated elsewhere (in the human genome) (29, 36). The "insulin" sequence, in contrast, is apparently unique, at least under the hybridization conditions used (35). The case of the collagen gene, where the exons coding for the triple helical part of the protein have evolved from tandem repeats of a 9 bp unit (37), shows that this type of organization can also be found in protein-coding sequences.

The catalogue of simple sequences detected in the vicinity of protein-coding genes is thus increasing rapidly, and it can be asked whether they affect the function or evolution of the neighbouring genes. The presence of such sequences might induce local alteration of the DNA conformation, like transition to the Z form [for (TG)n sequences (38)] or local unpairing, as demonstrated by S1 sensitivity of homocopolymeric regions in a plasmid containing a cloned histone gene (39). DNAs consisting of purines in one strand and pyrimidines in the other strand [like the ones described in this paper, or like a 300 bp region upstream from the "ovalbumin-Y" gene (16)] may assume anomalous structures (40). It would thus be interesting to know whether the presence of such regions has an effect on in vivo or in vitro expression of the adjacent genes, and whether it leads to local alterations in chromatin structure.
On the other hand, it has been suggested that simple sequences, which will reanneal at a very high speed but which can be subject to slippage mechanisms (39), might be involved in recombination reactions, like the one which occurred at the human γ globin locus (28), and would thus help to maintain some homogeneity in complex gene families. There is presently no evidence that the "S elements" had a directional effect in the evolution of the genes studied here. In the case of the ovalbumin gene family, sequencing of the X and Y genes has provided convincing arguments for the occurrence of gene conversion events during evolution (Heilig et al., in preparation), the most recent one involving the region between introns E and G. The TCTCC repeat observed in intron C does not appear to be correlated with such events, since homology between X and Y genes is the same in exons 2 and 3, which flank intron C (no simple sequence is found within the corresponding "ovalbumin-Y" gene intron). In the ovotransferrin case the same 80 to 88 % homology is found between chicken and pheasant sequences upstream or downstream from the S element.

Evolution of satellite sequences has been studied at the level of populations of sequences or by characterizing the structure of a stretch of tandem repeats at one point in its evolution. This has suggested that mechanisms of unequal crossing-over, resulting in contraction or amplification of the repeat number, or gene conversions, can maintain homogeneity within a family while allowing evolution of families characterized by slightly different repeat units. The presence, near a satellite-like region, of a well characterized unique sequence, corresponding to a protein-coding gene conserved during evolution, allows the same satellite-like element to be followed in various species. Thus, we have demonstrated a one base difference between the 5 bp repeat units which constitute the simple sequence elements localized at homologous positions upstream from the chicken and pheasant ovotransferrin genes. This difference might have arisen in two alternative ways: either a short purine rich sequence was present in the ancestral genomic region and was independently amplified in the pheasant and chicken lineages, suggesting that this primordial sequence might be a hot spot for amplification events (by slippage during replication, for instance); or the satellite-like sequence was already present in the ancestor, and the succession of correction/amplification events led to a change in the repeat unit of one of the satellite sequences. An analysis of this locus in other avian species might help to resolve this point. The present data show that the chicken sequence must have been subjected to recent amplification or correction events, since there are only two variants out of 27 repeats.
On the contrary, the pheasant sequence has accumulated many base changes and insertions (or deletions), although the existence of a longer range organization (see Results) suggests that unequal exchange events also occurred within this sequence. It is interesting to note in this case that the ten single base changes which occurred with respect to the repeat unit are all transitions (see Table I), while comparison of neighbouring sequences between chicken and pheasant shows roughly equal rates of transitions and transversions. Whatever the mechanism (selective pressure or intrinsic property of this type of sequence), this suggests a tendency to conserve the extreme asymmetry of the sequence. Thus, as it has been possible to reconstitute evolutionary pathways for proteins, and more recently for protein-coding genes, it should be possible to follow the events in the history of satellite-like elements, and estimate the frequency of occurrence of amplification or correction events.
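
The link between transitions and the conservation of strand asymmetry can be made explicit with a short sketch. A transition replaces a purine by a purine or a pyrimidine by a pyrimidine, so it leaves a purine-only strand purine-only, whereas a transversion does not. The Python code below is a minimal illustration under these definitions; the variant units used in the example are hypothetical and are not taken from Table I, and the function names are assumptions made for this sketch.

```python
# Illustrative sketch only: example variants are hypothetical, not data from Table I.
PURINES = set("AG")
PYRIMIDINES = set("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def classify_variant(unit: str, variant: str) -> list:
    """List (position, ref, alt, type) for each mismatch between equal-length units."""
    if len(unit) != len(variant):
        raise ValueError("units must have the same length (no insertions/deletions)")
    return [
        (i, r, a, substitution_type(r, a))
        for i, (r, a) in enumerate(zip(unit.upper(), variant.upper()))
        if r != a
    ]


if __name__ == "__main__":
    consensus = "GGAAA"  # pheasant repeat unit reported in this paper
    for variant in ("GGAGA", "GGATA"):  # hypothetical variants
        print(variant, classify_variant(consensus, variant))
```

Positions are counted from zero. Insertions and deletions are deliberately rejected here, since they would require an alignment step that this sketch does not attempt.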

ACKNOWLEDGEMENTS. We wish to thank Prof. P. Chambon for his constant support and helpful discussions, Prof. J. Kaye for editorial help, C. Kloepfer and J.M. Garnier for excellent technical assistance, and B. Boulay and E. Badzinski for preparation of the manuscript. This work was supported by the Centre National de la Recherche Scientifique (ATP 006520/50), the Institut National de la Santé et de la Recherche Médicale (PRC 124.026), the Fondation pour la Recherche Médicale Française and the Fondation S. et C. Del Duca.

REFERENCES
1. Sapienza, C. and Doolittle, W.F. (1982), Nature 295, 384-389.
2. Di Segni, G., Carrara, G., Tocchini-Valentini, G.P., Shoulders, C.C. and Baralle, F.E. (1981), Nucleic Acids Res. 9, 6709-6722.
3. Kidd, V.J. and Saunders, G.F. (1982), J. Biol. Chem. 257, 10673-10680.
4. Jelinek, W.R. and Schmid, C.W. (1982), in Ann. Rev. Biochem. (eds. E. Snell, P.D. Boyer, A. Meister and C.C. Richardson) Vol. 51, Annual Reviews Inc., pp. 813-844.
5. Brutlag, D.L. (1980), in Ann. Rev. Genet. (eds. H.L. Roman, A. Campbell and L.M. Sandler) Vol. 14, Annual Reviews Inc., pp. 121-144.
6. Davidson, E.H. and Britten, R.J. (1979), Science 204, 1052-1059.
7. Calabretta, B., Robberson, D.L., Barrera-Saldana, H.A., Lambrou, T.P. and Saunders, G.F. (1982), Nature 296, 219-225.
8. Bostock, C. (1980), Trends Biochem. Sci. 5, 117-119.
9. Cochet, M., Gannon, F., Hen, R., Maroteaux, L., Perrin, F. and Chambon, P. (1979), Nature 282, 567-574.
10. Baldacci, P., Royal, A., Bregegere, F., Abastado, J.P., Cami, B., Daniel, F. and Kourilsky, P. (1981), Nucleic Acids Res. 9, 3575-3588.
11. Heilig, R., Perrin, F., Gannon, F., Mandel, J.L. and Chambon, P. (1980), Cell 20, 625-637.
12. Heilig, R., Muraskowsky, R. and Mandel, J.L. (1982), J. Mol. Biol. 156, 1-19.
13. Bellard, M., Dretzen, G., Bellard, F., Oudet, P. and Chambon, P. (1982), EMBO Journal 1, 223-230.
14. Lawson, G.M., Knoll, B.J., March, C.J., Woo, S.L.C., Tsai, M.-J. and O'Malley, B.W. (1982), J. Biol. Chem. 257, 1501-1507.
15. Maxam, A. and Gilbert, W. (1980), in Methods in Enzymology (eds. L. Grossman and K. Moldave) Vol. 65, Academic Press, New York, pp. 499-560.
16. Heilig, R., Muraskowsky, R., Kloepfer, C. and Mandel, J.L. (1982), Nucleic Acids Res. 10, 4363-4382.
17. Bellard, M., Kuo, M.T., Dretzen, G. and Chambon, P. (1980), Nucleic Acids Res. 8, 2737-2750.
18. Alwine, J.C., Kemp, D.J. and Stark, G.R. (1977), Proc. Natl. Acad. Sci. USA 74, 5350-5354.
19. Wahl, G.M., Stern, M. and Stark, G.R. (1979), Proc. Natl. Acad. Sci. USA 76, 3683-3687.
20. Benoist, C., O'Hare, K., Breathnach, R. and Chambon, P. (1980), Nucleic Acids Res. 8, 127-142.


21. Woo, S.L.C., Beattie, W.G., Catterall, J.F., Dugaiczyk, A., Staden, R., Brownlee, G.G. and O'Malley, B.W. (1981), Biochemistry 20, 6437-6446.
22. Varley, J.M., Macgregor, H.C. and Erba, H.P. (1980), Nature 283, 686-688.
23. Cortadas, J., Olofsson, B., Meunier-Rotival, M., Macaya, G. and Bernardi, G. (1979), Eur. J. Biochem. 99, 179-186.
24. Darling, S.M., Crampton, J.M. and Williamson, R. (1982), J. Mol. Biol. 154, 51-63.
25. Prager, E.M. and Wilson, A.C. (1975), Proc. Natl. Acad. Sci. USA 72, 200-204.
26. Davis, M.M., Kim, S.K. and Hood, L.E. (1980), Science 209, 1360-1365.
27. Fry, K. and Brutlag, D. (1979), J. Mol. Biol. 135, 581-593.
28. Slightom, J.L., Blechl, A.E. and Smithies, O. (1980), Cell 21, 627-638.
29. Hamada, H. and Kakunaga, T. (1982), Nature 298, 396-398.
30. Kim, S., Davis, M., Sinn, E., Patten, P. and Hood, L. (1981), Cell 27, 573-581.
31. Nishioka, Y. and Leder, P. (1980), J. Biol. Chem. 255, 3691-3694.
32. Sures, I., Lowry, J. and Kedes, L.H. (1978), Cell 15, 1033-1044.
33. Cohen, J.B., Effron, K., Rechavi, G., Ben-Neriah, Y., Zakut, R. and Givol, D. (1982), Nucleic Acids Res. 10, 3353-3370.
34. Epplen, J.T., McCarrey, J.R., Sutou, S. and Ohno, S. (1982), Proc. Natl. Acad. Sci. USA 79, 3798-3802.
35. Bell, G.I., Selby, M.J. and Rutter, W.J. (1982), Nature 295, 31-35.
36. Hamada, H., Petrino, M.G. and Kakunaga, T. (1982), Proc. Natl. Acad. Sci. USA 79, 6465-6469.
37. Wozney, J., Hanahan, D., Tate, V., Boedtker, H. and Doty, P. (1981), Nature 294, 131-135.
38. Leslie, A.G.W., Arnott, S., Chandrasekaran, R. and Ratliff, R.L. (1980), J. Mol. Biol. 143, 49-72.
39. Hentschel, C.C. (1982), Nature 295, 714-716.
40. Johnson, D. and Morgan, A.R. (1978), Proc. Natl. Acad. Sci. USA 75, 1637-1641.
