<<

k.-_:) 1991 Oxford University Press Nucleic Acids Research, Vol. 19, No. 17 4731-4738

Medium reiteration frequency repetitive sequences in the human

David J.Kaplan, Jerzy Jurka1, Joseph F.Solus and Craig H.Duncan* Center for Molecular Biology, Wayne State University, Detroit, Ml and 'Linus Pauling Institute of Science and Medicine, 440 Page Mill Road, Palo Alto, CA 94306, USA

Received April 24, 1991; Revised and Accepted August 7, 1991 EMBL accession nos X59017-X59026 (incl.)

ABSTRACT Fourteen novel medium reiteration frequency (MER) Isolation of novel MER families from sequence libraries families were found, in the , by using The Alu fragment library. Human genomic DNA (250 isg) was two different methods. Repetition frequencies per digested to completion with the restriction enzyme AluI. The haploid human genome were estimated for each of resulting fragments were fractionated on a 6% polyacrylamide these families as well as for six previously described gel and fragments in the 500-1000 bp region were isolated. MER DNA families. By these measurements, the 50 ng of this size fractionated DNA was ligated to 500 ng of families were found to contain variable numbers of SnaI cleaved Ml3mpl9 RF DNA (5). The vector was elements, ranging from 200 to 10,000 copies per phosphatased before use. The ligated DNA was mixed with haploid human genome. competent JM 109 bacteria (Stratagene Inc.) and 20,000 resulting transformants were plated at a density of 1,650 plaques per 10 cm INTRODUCTION petri dish. Duplicate filter replicates were prepared and were incubated separately with either of two radioactive oligonucleotide The human genome, like those of other higher , probes. One of these oligonucleotides was the 25 mer, GTGG- contains a substantial amount of interspersed repetitive DNA CTCA[C/T][A/G]CCTGTAATCCCAGCA. This sequence is sequence. Besides the three largest families, the Alu family, the from the 5' consensus sequence for Alu repeats (bases # 12-36 Ll family, and the THE repeats, there are a number of minor from Fig. 1, ref 6). The other oligonucleotide was the 33 mer families, referred to as medium reiteration frequency repeats GG[C/T]TGCAGTGAGC[C/T] [A/G][T/A]GAT[C/T][A/G] (MER). [C/T][A/G]CCA[C/T]TGCACT. This sequence was from the Medium frequency repetitive sequences were first observed 3' region of the Alu family consensus sequence (bases # as sequences which were present in naturally occurring SV40 218-250 from Fig. 1, ref 6). Phage which hybridized to both variants (1,2). With the recent surge of structural information probes were replated at lower density and subjected to a repeat about the human genome, more examples of MER families have screening with the same probes. Single stranded template DNA been uncovered. Two quite different approaches have proven was extracted from 131 phage isolates in 2 ml cultures. useful for identifying MER families. One method was a systematic computer analysis of the entire human DNA sequence The PCR library. One zg of genomic human placental DNA was database to detect repetitive elements which do not belong to mixed with 20 Atg each of two oligonucleotides. One known families (3). The second method was a sequence library oligonucleotide was the 31 mer GGGTCGACAGTGAGCCG- construction technique which selected regions of human DNA AGATCGCGCCACTG, where the underlined region is an lying between closely spaced Alu repeats (4). In this report, we outward facing primer at the 3' boundary of the Alu family use both approaches to discover fourteen novel MER families. consensus sequence (bases 224-246 from Fig. 1, ref 6). The We also provide measurements ofthe repetition frequency ofeach second oligonucleotide was the 28 mer GGGGATCCTGGGA- family. There are certainly more families which have yet escaped TTACAGGCGTGAGCC, where the underlined region is an detection. outward facing primer at the 5' boundary of the Alu family consensus sequence (bases 33-14 from Fig. 1, ref 6). The reaction mixture was adjusted to a volume of 2001d, containing EXPERIMENTAL METHODS each dNTP at a final concentration of 300AtM, polymerase Homology searches by computer assisted analysis of the reaction buffer, and 40 units of Thermus aquaticus DNA human DNA database polymerase (Amersham). The polymerase chain reaction (7) was The GenBank DNA sequence database (Release 65) was subjected performed for 25 cycles of 940 for 3 minutes, followed by 500 to a rapid search using the DASHER2 program as previously for 5 minutes, followed by 720 for 5 minutes. The reaction was described (3). then extracted twice with an equal volume of phenol, once with

* To whom correspondence should be addressed at: Room 4118, MCHT, 2727 2nd Avenue, Detroit, MI 48201, USA 4732 Nucleic Acids Research, Vol. 19, No. 17 an equal volume of CHCl3, and ethanol precipitated. The DNA formula, 1( # of positive plaques binding probe) x (2.5 x 106)1 was then dissolved in 50 ,ul of restriction enzyme buffer, followed 1(total # of plaques on the filter) x 151. In this formula the by the addition of 25 units of Sall and 50 units of BamH 1. After 2.5 x 106 represents the size of the human genome in kb, while a 4 hour incubation at 370, the reaction mixture was heated at 15 is the average size (in kb) of the human DNA inserted in the 650 for 5 minutes and DNA fragments were purified by two bacteriophage. As reflected by the numbers in Table I, this cycles of preparative electrophoresis on a 7% polyacrylamide formula gives a statistical estimate, which may be in error by gel. DNA fragments in the 300-1000 bp region were isolated. as much as 50%. 40 ng of this size fractionated DNA was ligated to 100 ng of BamHl, Sall cleaved M13mpl9 RF DNA (from which the 12 bp Sall-BamHl insert had been removed by S-300 gel filtration RESULTS [Pharmacia]). The ligated DNA was mixed with competent JM109 bacteria (Stratagene Inc.) and 2000 resulting transformants Sequence Analysis of novel MER families were plated at a density of 150 plaques per 10 cm petri dish on A total of 562 M13 templates were subjected to sequence analysis. NZY plates containing f-galactosidase indicator dye. Duplicate 131 of these templates came from the Alu fragment library and filter replicates were prepared and were incubated separately with 431 of the templates came from the PCR library. In both cases, either of two radioactive probes. One probe was a double stranded the selection procedures were designed to identify DNA 69 bp sequence taken from the internal region of an Alu repeat sequences 200-700 bp in length which are flanked by Alu (CACTTTGGGAGGCCAAGGCAGGTGGATCACCTCAA- repeats. GTCAGGAGTTCAAGGCCAGCCTGACCAACATGGA). The The Alu fragment method was based on the fact that the bulk second probe was a 5 kb fragment containing an L1 sequence of Alu family elements in the human genome contain a conserved from the region 5' to the human -y-gamma globin (8). 431 site for the restriction enzyme AluI (16). When human genomic phage plaques were selected by the criteria of not reacting with DNA was cleaved with this enzyme a subset of the resulting the dye indicator and not hybridizing to either of the radioactive fragments possessed the 3' portion of an Alu family repeat at probes. Single stranded template DNA was extracted from small one end, the 5' portion of an Alu family repeat at the other end, scale (3 ml) cultures of these phage. with an internal region of non-Alu family DNA in between. The AluI restriction fragments were cloned and the subset of DNA DNA, sequencing, amplification, hybridization, and sequence fragments described above was identified by the use of two analysis oligonucleotide hybridization probes. One was specific for the The single stranded templates were sequenced by the dideoxy 5' end of an Alu family repeat, and the other was specific for termination method (8) as modified (9) using T7 DNA the 3' end of an Alu family repeat. Polymerase (10). 400-600 bp of sequence information was The PCR method used two oligonucleotide primers which faced obtained from each template. Oligonucleotides were synthesized outward from the 5' and 3' regions of a consensus Alu family by Research Genetics Inc (Huntsville AL) and labeled with repeat sequence. When these primers were incubated with human polynucleotide kinase and 32p labeled 'y-ATP. Double stranded genomic DNA under PCR conditions, the regions between Alu DNA was labeled by the random primer method (11). repeats were selectively amplified. These amplified fragments Hybridizations were performed at 600 in 6 x SSC, 20 mM were then cloned. NaPO4, 10% dextran sulfate, 4xDenhart's solution, 0.1% Both these methods suffered limitations which precluded the SDS, and 100 pjg/ml denatured salmon sperm DNA for 16 hrs. isolation of certain sequences or included undesired sequences Washing was done twice for 5 minutes in 2 x SSC, 0.2% SDS at room temperature, followed by two washes for 15 minutes in 0.2 xSSC, 0.1% SDS at 450. Autoradiography was 6 hrs to Table I. MER sequences. overnight with an intensifying screen at -70°. MER probes were # in Repetition # in Alu n Hyb to Hyb to constructed by synthesizing oligonucleotide primers suitable for MER# Source Genkank Frequency frag lib PCR lib rod DNA bov DA I DASHER2; ref 3 4 4000-8000 - + Polymerase Chain Reaction amplification (7) of the M 13 sequence 2 DASHER2; ref 3 4 1500-3000 2 + 3 DASHER2: ref 3 nd - nd nd templates. The PCR products were then subcloned in Bluescript 4 DASHER2: ref 3 3 1000-2000 5 DASHER2; ref 3 2 1000-2000 1 - + vectors (Stratagene Inc.). The sequence data were analysed on 6 DASHER2; ref 3 2 500-1000 - 7 Alu frag lib: ref 4 3 500-1000 1 a Macintosh II microcomputer using MacVector 3.5 DNA 8 Alu frag lib 2 400-800 1 - + 9 DASHER2 2 700-1500 - analysis software from International Biotechnologies Inc. 10 DASHER2; ref 26 4 4000-8000 - - 11 DASHER2 3 1500-3000 Southern blot hybridizations were performed according to 12 PCR lib 4 2500-5000 4 - + 13 DASHER2 5 1500-3000 - Southern (12) as modified by 14 PCR lib 2 1500-3000 - Thomas (13). 15 PCR Jib 5 700-1500 2 - + 16 PCR lib 2 300-600 - + 1 7 PCR lib 1500-3000 Reiteration Frequency of DNA probes 18 Alu frag lib: ref 35 5 5000-10000 19 Alu frag lib 2500-5000 were 20 Alu frag lib 200-400 + Reiteration frequencies of the probes estimated by a plaque 21 Alu frag lib 1 500-1000 1 - + hybridization assay. Purified double stranded DNA fragments TOTALS 56 31300-62800 6 14 used as probes were provided by Lagan Inc., Detroit MI. These Characteristics of 21 MER sequences are presented, including the source by which DNA probes were labeled with 32P by the random primer they were discovered, the number of their occurrences in GenBank 65. their method and purified by centrifugal chromatography on Sephadex repetition frequencies by the plaque hybridization assay, the numbers of their G-25 (15). The probes were then hybridized in situ to occurrences in both the Alu fragment and PCR libraries, and their hybridization 25,000-50,000 recombinant lambda bacteriophage plaques from signals to rodent and bovine chromosomal . DASHER2 is the computer program for rapid similarity searches. The repetition frequencies are given in a genomic human DNA library which had been immobilized on copies per haploid human genome. MER3 was a loosely homologous sequence a 150 mm circular nitrocellulose filter (14). The reiteration family detected by DASHER2 analysis, but a probe made from this fanC.' did frequency of a particular probe was then estimated by the not detect any repetitive sequences by the plaque hybridization assay. Nucleic Acids Research, Vol. 19, No. 17 4733 in the selection of clones. In some ways, the two methods were Thus any region of the genome spanning an AluI restriction site complementary in this regard. The main weakness of the PCR was excluded from this library. library stemmed from the fact that many Alu family elements Sequence data was obtained from all 562 clones selected from exist as directly repeated tandem multimers. Any such both libraries. This sequence data was then aligned against a arrangement formed a short (30-40 bp) sequence which was datafile containing other known primate repetitive elements, an ideal candidate to be amplified during the polymerase chain including LI sequences (17), satellite sequences (18), Xba repeats reaction using the primers we employed. The polymerase chain (19), 0 repeats (20), THE repeats (21), simple polydinucleotides reaction products were subjected to two cycles of preparative gel and homopolymer runs greater than 12 bp. All regions electrophoresis before cloning, to select inserts of greater than homologous to these elements were removed. At this point the 200 bp. Despite this, only 220 of 431 clones contained inserts sequence datafile contained 58,272 bp, of which 22,285 bp were of greater than 150 bp. Only 178 of these clones contained inserts derived from the Alu fragment library and 35,987 bp were of greater than 100 bp of non-Alu family sequence. derived from the PCR library. This sequence was divided into The Alu fragment library was designed to contain pieces of blocks of4-5 kb and aligned with the primate sequence database. Alu family repeats at the boundaries with non-Alu family DNA in between. However, 57 of the 131 clones isolated by the Alu fragment method contained otherwise intact Alu family repeats Table H. Locations of MERs in known chromosomal loci. which did not possess the characteristic AluI restriction enyzme cleavage site. In most cases, the sequence data obtained from Locs Nalme imuen96s MERI kb p MER such clones consisted of Alu family sequence with little or no HUMAGG 727-981 4669 HUMAIGRE 205410 10 412 flanking regions. A further weakness of this method was that HUMAPOAII 6055-6395 1 8967 HUMAPOA4C 1-145 2 3605 the clones were designed to terminate at all AluI restriction sites. HUMATP1A2 48204980 16 26669 HUMB2M2 1863-2030 7 2872 HUMBCR22I 1090-1220 12 3023 HUMC21DLA 1395-1560 18 2717 MER1 MER2 HUMC21DLA 1600-1800 15 MER4 MER6 HUMCRYGBC 4720-4877 > 4 22776 5.7 2 3 4 5 1 it 3 4 5 1 2 3 4 5 1 2 3 4 5 kb HUMCRYGBC 9067-9180 x 17 .. HUMCRYGBC 9524-9921 o 9 I HUMCRYGBC 21020 21145 13 vi 1, W- - X~ HUMCYARO1 1010-1120 - 21 1297 _ HUMCYP345 2880-3063 <- 12 3064 HUMERYA 2165-2478 - 2 3075 HUMFIXG 6669-7025 -> 2 38060 7.6 -2 HUMFIXG 8000-8268 > 7 HUMFIXG 18220-18400 13 HUMFIXG 21700-22145 - 9 HUMFIXG 24021-24170 -> 5 HUMFOLA2 1-264 - 6 596 HUMGSTPIA 1-571 - 11 2670 HUMHBB 14860-15020 - 13 73327 36.7 HUMBBB 68220-68340 -13 HUMHLASBA 8810-9026 - 10 14647 M ER8 MER9 HUMHMG14A 8044-8248 >> 7 8883 MER10 MER13 HUMHPRTB 7812-8071 >x 18 56737 14.2 1 2 3 4 5 i 2 3 4 5 1 2 3 4 N1 2 3 4 HUMHPRTB 37059-37280 >- 12 HUMHPRTB 54772-54975 << 15 HVJMHPRTB 54960-55145 << 18 HUMIHXAUTUTTTrktTAXT"t I1-160ICtn 14., 352-.rx HUMIGCD3 283-500 10 762 HUMIIRBA 170-320 14 1095 .... HUMINSRD 1060 1200 13 7241 HUMINT2 11440 11660 15 11609 HUMLMWOB1 390-540 16 1482 HUMMBPIA 273400 15 3032 HUMNGFE 3399-3550 5 11595 HUMP45C17 52-790 11 8880 HUMPADP 45-240 18 3805 HUMPAIA 1-511 1 15868 HUMRASSK2 530-760 19 244 HUMRPS17A 3245-3500 20 4030 HUMSEXREPB 350-560 18 917 MER16 MER17 MER18 MER20 HUMSIGMG3 403-1194 11 3724 12 3 4 5 I 2 1 2345 12345 HUMSIGMG3 2000-2150 15 HUMTFPB 8913-8989 8 13866 HUMTPA 40-578 36594 52 HUMTPA 6307-6297 ins 4 HUMTPA 17474-17877 12 HUMTPA 18392-18851 ins 2 HUMTPA 22931-23031 8 HUMTPA 24498-25144 4 HUMTPA 34152-35153 ins 6 HUMTPOO4 451-673 - 10 4019 Alu fragmentlibrary 22285 3.7 I IPCR library 35987 2.6 The 56 MERs with known genomic locations are listed. All sequences use the GenBank numbering system. The column Alu sequences refers to the presence Figure 1. Hybridization of mammalian chromosomal DNAs with MER DNA or absence of Alu family repeats with 500 bp 5' or 3' ofthe given genomic location probes. Five Atg samples of restriction endonuclease digested chromosomal DNAs and the arrows refer to the orientation of the Alu repeats. For example - means were electrophoresed in a 0.7% agarose gel and blotted onto nylon membranes. that there is no Alu sequence 5' to the genomic location and that there is an Alu The membranes were incubated with radiolabelled DNA probes described in Table 3' to the genomic location which is oriented in the 5' to 3' direction. The letters III or Ref. 3. Shown are autoradiographs of the blots. The lanes are: 1, Human 'ins' refer to an insertion of an Alu element within the MER element. MER # DNA digested with EcoRl; 2, Rhesus monkey DNA digested with EcoRl; 3, is the MER designation of each repetitive element. Length in bp refers to the Mouse DNA digested with EcoRl; 4,Chinese hamster DNA digested with EcoRl; total length of the given genomic sequence in the database. kb per MER is obtained 5, Bovine DNA digested with EcoRl. Positions of molecular weight markers by dividing the length in kb of each loci by the numbers of MERs identified within are shown by dashes at the right of each autoradiograph. selected loci. 4734 Nucleic Acids Research, Vol. 19, No. 17 Twenty seven sequences which showed matches to the primate Description of MER families database (greater than 150 on initial scoring or 300 on optimized Sequence alignments are presented for fourteen families, MER8 scoring) were investigated further. Of these twenty seven to MER21 (Table III). MER1 to MER6 (3) and MER7 (4) sequences, nine were eliminated because they were members of sequence alignments have been previously presented. previously described MER families. Double stranded DNA probes (100-300 bp) were constructed from the eighteen MER8. This sequence was present in the Alu fragment library. remaining sequences. Eleven of these probes identified repeat It is 60% homologous to a 160 bp region in the tissue plasminogen sequence families by the plaque hybridization assay (Materials activator gene (23), and has a 70 bp homology to the tissue factor & Methods). Probes were also constructed for ten additional MER families discovered by DASHER2 analysis of the GenBank 3 gene (24). 65 database. The estimated reiteration frequencies of all these probes are given (gble I). MER9. This sequence was found by DASHER2 analysis of used here relied on the occurrence GenBank, as a 88% homology over 270 bp. The probe for this The screening processes family was made from a sequence within e of the human of homologous regions in the human sequence database. At least for IX one region of homology was required with the sequence library gene blood clotting factor (25). methods and at least two regions of homology were required with the DASHER2 method. It was conceivable that some of the MERIO. This sequence was first noted as being repetitive by sequences in the Alu fragment or PCR libraries could be MER Lawrance et al. (26). It appears four times in the human sequence elements which are not represented in the human sequence database, and different family members share about 80% database. To test this hypothesis probes were constructed for homology. MERlO has also been studied by Mermer et al. (27) sixteen sequences from the libraries which did not have a who referred to this family as the MstH repeats. The probe for significant match with the human primate database. None of these this family was constructed from an intergenic region in the probes identified repeat sequence families in the plaque human HLA locus (26). hybridization assay. MERIJ. This sequence was found by DASHER2 analysis of Hybridization of MERs to chromosomal DNAs GenBank. This is the longest MER found, showing matches over Genomic blot hybridizations were performed in order to evaluate 1100 bp in three different . MER 11 sequences exhibit length the distribution of MER repeats in genomic DNA. Double differences. This is due to the presence of variable numbers of stranded DNA probes for all MER families were radiolabelled a 50 bp subrepeat, which is present four times in the and hybridized to EcoRl digests of primate, bovine, Chinese HUMP45C17 sequence (marked by underlines in Table I), three hamster, and mouse chromosomal DNAs. The resulting times in the HUMSIGMG3 sequence and once in the autoradiographs (twelve representative films are shown, Fig. 1) HUMGSTPIA sequence. Akahori et al. (28) noticed this 50 bp showed hybridization ofthese probes to disperse sets of restriction subrepeat in the HUMSIGMG3 sequence. Two probes for this fragments from human and rhesus monkey chromosomal DNAs. family were constructed, one from the 5' region and another from Similar results have been reported for another MERI probe (22). the 3' region of the MERi 1 repeat in the human gene for Some of the patterns showed what appeared to be discrete length cytochrome P450-C17 (29). Both probes indicated similar repeat bands within a background of heterogeneous length fragments. frequencies in the plaque hybridization assay. Further investigation is required to explain the origin of these discrete bands. None ofthe probes showed hybridization to mouse MER12. This sequence was found in the PCR library. There are or Chinese hamster chromosomal DNAs. The bovine four appearances of this family in the primate database, each chromosomal DNA showed hybridization signals for ten of the sharing 60-70% homology with the others. probes. These signals were always less intense than those of from an equivalent amount of primate DNA, but contrasted to no MERJ3. This sequence was found by computer analysis of the detectable signal in the rodents' chromosomal DNAs. database. A probe was constructed from an intergenic region of the macaque ,B globin gene region (30). The probe shared 93 % homology with its orthologous counterpart in the human f globin region and 70-75% homology to the other members of this DISCUSSION family. The MER13 sequence may be part of a variant LI Isolation of MERs from the sequence libraries sequence. It was considered to be the 3' portion an LI repeat The sequence library methods were designed to select for novel in the human gene for blood clotting factor IX (25). It lies adjacent MERs on the basis of their proximity to Alu family repeats. Of to, but was not considered part of, an L1H repeat in the human the 56 known genomic MER locations, 26 lie within 500 bp of 3-globin locus (31). MER13 is also adjacent to an L1H repeat an Alu repeat (Table II). The Alu fragment and PCR libraries in the human crystallin gene locus (32) and in 5' flanking were constructed in such a way to select for regions ofthe genome sequence of the human glutathione S-transferase pi gene (33). adjacent to Alu repeats. Both of these sequence libraries were However MER13 was not associated with LI repeats in its other clearly. enriched for MERs. The 59 kb of sequence analysed two known genomic locations. contained 20 examples of MERs or one repeat for every 2.9 kb. This is in contrast to known genomic loci containing MERs. MERJ4. This sequence was found in the PCR library. It shares Several of the long known gene sequences contain multiple 75-80% homology with two sequences in the primate database. examples of MERs, but even the most richly endowed sequence At one location MER14 flanks the 5' side of a deletion in a mutant (HUMTPA) has fewer MERs per unit length (5.2 kb per MER) gene for the alpha chain of (-hexosaminidase A (34). The 3' than either the PCR or Alu fragment libraries (Table II). side of this deletion is within an Alu family repeat. Nucleic Acids Research, Vol. 19, No. 17 4735

Table III. Alignment of MER8 -21 with their locations at sequenced loci.

MERS MERBAl CAGATACTTCCATCGATACCAAACTCTTTGCATTCTCGAAATTTTCACAATGTGAAGCGA 120 HUMTPA A. .--AC.A..C. .. .C.C...... -G-C-GTGG-A-AC. .G.--G.. .---A.C..lC.....A....C.--G....TC.TG.C .... .G.1A....A. .CAT.C 22974 HUIMTFPB .T....AT. .A....A.T... 8936 MERBAl GTAAGTCGACCTGTGCAGTTCAAACCCATGTTGTTCAAGGGTCAACTATGTATGAAAACATTTCTCTTCAGTAAGTACTTAGAAGTGGAA 210 HUIMTPA A... .-T. .A.CC....G....G..AAG.... CC. .G.A.T.TC. .-AGCAG. .-CCA. .G.T-G-A.GT-G... .TA.< 22891 HUMTFPB AC .....G. .T.CACAG.G.G...C..T...8989

MI"9 HUMFIXG TTAACGGGCCACGGTTCCCTTTTGCGAGCGCTAATATAATTCTAGGATAGGTATTGTGTTT21861 HUMZRYGBC .A.C. .lA.CC. .T.ATT.G. .TT.-. ..CTAT.G. .. .C.CGA.T.-.-.T.-AA...... CCCT.A.GGGA.... .A.GGGA.GGGCC....G.G..G.G...... 9765 HUMFIXG GCGAGAAGCTAGCCATATAGATGCGGTTAACCTTACGTTCCCGGGGCGTTCTGCTATTGAA21981 HUM4ZRYGBC...... C...... G..TG..C .... T....TA...... G...... G.....A...... T.....CA....9645 HUMFIXG CCAACTCATTGTTAGCACCACTTAATCGAAATTTTTGCGTTGGCGTAGCAATGAGCGTCAC22101 HUMCRYGBC...... C .... GCG.T.T. ...C.C...T..A. .. .G...... T...... 2.....2.G...... 9524 HUMFIXG AAACCAGAAGCTAGGAAT 22119 HUMCZRYGBC . .GAA.T.T.G. .TA..< 9507

MER10 HUMHLASBA TCCATTTTGCATTGCTATAAAGGAATATCTGAAGTTACCTAATTTACGAAGAAAAGAGCTTTAAATGGCTCACAG TCTGCAGGCTGTACGCGAAACATGGCACTAGCATCTGCTTCTGT 8740 HUMAIGRE .C...... A....C..C.GAC.GGG.....TA...... CG....T...... T...... T...AG...... T...G.....C.NGC. 129 HUMTP004 ...... G....A....C....GAA.GGG....-...- T.....G.A....T.....AG.T....G...... AC.C..G.....2..TC 744 HUMIGCD3 .A .. .AT.A.A.T-A.A.G. ..-T.TG.ACC. .GC.-. .--G.GG.A.G. .. GG. .. GA .....C.A.....T.....C.AT.C. .-...... C..G...... - 216 HUMHLASBA TGGGGGATTCTGGAAGCTTTTACTCATGGTGGAAGGCAAGTGGAGCCAGTGCATCACATGGTCATAGAGGGAGAAAGAGACATAGAAAGAGGTGCCAGCCTCTTTTTAACAACCAGGTTT 8860 HUMAIGRE 3. .A. .CC..A.....AC.G...... 2A.G.G...AG.CA...... C. .G. .CA....G.2. .T.T.G.GGG. .A.A. .T.TA.....A. .TG....A.C. 256 HUMTPOO4 . ... .1.1C. .A.-....CAC.G...... 2. .GCA....AG....TC...... G.G. ... .A....C....GTCA. .CGG1 .....CA.AACG.AA...... C. 619 HUMIGCD3 GC.A. .CC. .A. .G. .A. .GC.A. .. C....A...1....G..GAG...... 2...... C....-.G .-.....T..CA.GTC.2A....A....TG. .C.G. 336 HUMHLASBA CATGTGCACTAATAGAGTGAGAACTCACTCATTACCCGGAGAGGGGACAAAGCCATTCATGAGGGTCTCCTCCATGATTCAAATACCTCCCACCAGGCCCCACCTGCAACACTGGG ATC 8980 HUMAIGRE ..AAGTA...--CG ... CA ... GT.....G....AA ... T.T..C.C ...... CA....-A....A.C.CCA.G.TCC .... T..C ... T.T.....C...... A...T 373 HUM~TP004 . .CA. .G.. .TCCT.G.CA....G....C... AA.1..T. .. .G.T..A.....C....2.TG. .CTTG .... .C.. .TCG...... T...... T....C....TCCA.A ... 496 11UJMIGD . .CAG.G....CT....G.....G....C..1G....T. .TG.T.G...... 1.. .1.AG.C.....CT. .TC...... N.-NNNN....C...... A ... 458 HU!KiLASBA AATTTTCAACATGA.GACTTGGAACTGACAAATATCCAAATCATATT> 9026 HUMAIGRE .CAA....G.-. ..T. .. .-....C.-.G.-C....C....> 411 HUMTP004 . CA.. .T .T....T.AAGG.G. .A. .GC ... T.C.C.T .... < 450 HUMIGCD3 .CA.C...... A.T....-... .A....C.C...... > 501

HUMP45C17 GAGCAAGCCTTCATCCCGACGGCACACTTATATAAAGAGGGGTGAGGGATAGACTGAATGAAGGGACTGCTGGAGCCATGGCAGAGGAACATAAATTGTGA 120 HUMSIGMG3 . .T.-T.CT.AT...... A.1.....1.C....G.....1.A...... 470 HUMP4 5C17 AGTTATATTGCTTTATCCATAATTAATTTAGCGCTATTACCTACTTACTGAGTAGTTTTAT240 HUMSIGMG3...... A...... CA...... AC....C 583 HUIP4 5C17 TCAGGACCACTGTGATAATTGTGTTAACTGTACAAATTGATTGTAAGACATGTGTTTGAACAATATGAAATTAGTGCACCTTGAAAAAG ACACAATAAGAGCAATTTTTAGGGAACAAG 360 HUMSIGMG3 ...... G...... A.2...... C ...... -...G....C....G...... T... 704 HUMP4 5C17 GGAAACAAGTTATCTCGGCGCGAGGCTTTCTCTAAAGTTATGCTCATGGAAACCAATTTCT480 HUMSIGMG3 . ... G...... AG.... .T....A...... T....1....A.C.C.....A.G.G...... G....A.....825 HUMGSTPIA G....A..C. .C 13 HUMP4 5C17 AGAGATTATTATCCGGAGATCTCTGGGGTTTACGCCCGGAGCACTTCGTAAAGATAAAGCT600 HUMSIGMG3.....A.A..A.C...... C...... -. .G.G...... G ... .A..TG....9...3...... 956 HUI4STPIA ..T....A..lA..A...iG....A....A...... CA...CCG..C.AlG ... TG...T.C ...A..G..C..G1T..A.T ...AlT..-A.... GA...A.C.....137 HUMP4 5C17 GGTCTCCTGCAGTACCCTCAGGCTTACTAGGGTGGGGAAAAACTCCGCCCTGGTAAATTTGTGGTCAGACCGGTTCTCTGCTGTCGAAGGGTGTTTGCTGTTGTTTAAGGTGTTTATCAA 720 HUMSIGMG3...... T....G..T...... A...... C..T...... C..-..CTC....T...... A...... 1074 HUM4STPIA A...... C....C...A... GA....A.TA.--G ... T...AG....C..-A..C.A...... C.T ... CCC..ACA.TGCG..CAC.GT.G.ACCTGG.. 254 HUMP4 5C17 GAATTTCCGTACTrA.CCTTTGTTCTTCCTGCTTACTGT TATGTTCTTCCTGCTTACT840 H(JMSIGMG3 .T.A .. .A....T.T...... T...A...... T...... T.C. ------1188 HU!4STPIA TT. .T.-A... .ATT.------269------HUMP4C17MTGGACCTTACGGTATTCTCTTTT1.CT ATGTATCTTGTTAACACITATTATAGTCTGCTTTCGC! GCATGTGATCTTTA 96096

HUJMSIGMG3 ...... T...... 1 6 HUM~STPIA -2T.------.C.CC..-c. ... 3CCT.-T.A..-TT.G.G...-C....T...... C. 328 HUMP~45C17 TCCATTCTTCTCCCCCCTTTAACTATAACTCGTTAGTAGGGACCGCTCGTTTAGCCCCGG1080 HUMSIGMG3.... .-...... G...... lA....C...... C..TC . T...... T...... T 1380 HUMG~STPIA .GA.--C.A.A.....G.G .. .T....T-C. .. GA. T..TTGC.C...... C...-G...*C...... G.G...C. T.T .....T. . AC 446 HUMP45C17 GCCCAGCTGTAAAATTCCTCTCTTTATACTGTCTCTCTTTATTTCTCAGCCGGCTGACACTTATAGAAAGAACCTACGTTGAAATATTGGGGGTGGGTTCCCCCAGTACGCTAGTCCTGC 1200 HUMSIGMG3.T. .. .CA. ..'A .. 7...... A...... CA...... A.. 1AA.TA...> 1506 HU14STPIA A. G... T 1 G...... G....C...7A.1G....G ....C..-G...... iC. .AA..T...... C.ATAATA ... > 571

MZR12 MER12A1 CAGTGGCACCCAGCCCCATTTTAACCATTTTTAAGTGGACAGTTCACTGACATTAAGTACTACACATTCGCATTGTTGTGCCACCATCCCCACCATCCATCTCCAGAACTTTTTTATCTT HUMBCR22I .A.AT.TA.TG ...... TG.G .. G..GT...C. -.-T.---.A.... A.CA.A.AC.TG..A.T..TG . C..A.. HUMTPA < CC G.-.G. .C... .C. C.G.G .CGT. .T.- . . TG. .-C.AC.G ..G.G.A. GC. C.CC.. .AC HUMHPRTB C.C.G.GT .CG.ACC.-.--.A.A ...... CTTT.A . ... TG...... C.AGTGGT. .T. .AT.TAT. .AGAGTTG.G.. ... 1T . CC . . .A... C..-.. HUMCYP345 ..C.CG ..TCGA ..TC.C.C.AA . GC.GGCA.-T .... G.GTGA.C ..CGGT.CC.GG.C ..--.-A ... AA.T.TAG.AA ... TTT ..T. .C. .C.-AAA .... ACCC.GC.-.CC 4736 Nucleic Acids Research, Vol. 19, No. 17

MER12Al GTGAACTAGACTTATATCCTTAACATAATCTCATTCACCCCTCCCAGCCCTACAACACCATCTATTTCGTGTTATAATTTACTATCTAGTACTCAT 2404 HUMBCR22I C---- ...-.G.A .... 1C.A ... GC....-..-C..C. .1.l.GTC.....A.. .CA...... CC. .T...... C..0...G.....C.... G.A.....> 1155 HUIMTPA CCCT.... GAA.0..G.ACTA..A...0...... -. .T.T. ....-T.AO.-AC..--G.T.G ...... G.CC.....C ... CA....A. .C..< 17555 HUMHPRTB -.. .0...-..C.-AAG.A. .-.T.G. .C.C. .T.GCAGT.C...C.T ...... G.....T.A...... 7. .CCG..C.G....TO ... .A.GGAC.TTC .... > 37280 HUMCZYP345 A.-TOG.-.---. .CC.C.A.A. .. .CC.CC. .G.-. 0.-C. .C.TG .C...... ACG...... 0.G.T.A...... 0..G...C. .TC....AC.TT....> 3044

MZR13 MACHBB OGGGAGCTAAATGATOGA ~AAGAAAAAGGTACGCTCGTTCTAGTGGGGGAAGAAGTAAAGACATGTCAGT 300

HUMHBB...... T...... cc..c..C ...... -...A...... T....58297 HUMG.STPIA .0...... -. TiTA.....C... GOA.A ..O...TGT ... G..C.....CC ...... A.0. .A... CG..l...... 0.ACOCA 714 HUMFIXG .....G. .T.AlA..TT.. .A...... OA.A .... Tl..G.-...... AG.T.....T.... C....GC.G. .. .l.CA....T.0...G.C. 18305 HI.l4RYGBC ... .C....CTT.. .-GT....TOG..-...... A.A...G....CTGG..C. .G.....TGC.....CG.TT ..... C....C....GCA.....TAC. 21101 HUMHBB A.A...... GA.T.....T.T. ...G.G.A..... GC.T.CTGG.C.... .TCA..0..G....A-G .. .A..... 100...... AC...A.A.. .A.G....CO 14952 HUMINSRD <..C.GG.AT.TC.~TT.-T..CGTGT. .. .T.C.. .-.T.-.C. .-....T.C...... A. .G.T...A .....A...C....CGo....CA.....T.C. 1111 MACHBB TATTAGCTGAGTGATGAAATAATCTGTACATCAAAACCCAGTGACATGCAATTTACCTATATAAGTTGTACATGTACCC:C~C TTTAAAATAAAAGTGAAAACAAAGTA> 410 HUIMHBB c. .C.-...... G.....T...... C...... T...... > 58405 HUMSTPIA .A.CC...G.0 . . ..C...... C.-.G.....T .0..G.4CC.TC...... ACAG ...... T...... <' 606 HUMFIXG .....C . T A....A.....C. ..T....-CATG .... .T.... G.0..4CC.TC...... T..CC...... 3T.A....A.... .A.< 18189 HUMCRYGBC ... .C.C. .. G.0...A...... C.....C.TTCACA. ...1...... 5. .C.-. .CC. .G.T.C.G0.....T .... .C...... TT...T...> 21213 HUMHBB A..C .0.....G.C.... .A... .A.... C....CT....T-C...T.... GO.7TC6 ....C.TO...TG.C...... iT....-C... .A> 15072 HUMINSRD 0 ... .CA.G.0....CA.... .T.0.....G C...1....AA... .CC ...... T.T.-. .T. .TT.T.< 1036

MER1 4A1 ACGGCGCTTTTTTTTGTAGGGTTTGTCGCTGATGTAATAACAGAAGGTTCTCCCACTTTC 120 HUMHXAMT .-.-.. C..0...G....T....C.... GTGA...... T. .AG. .... G.I.A....A...T...... 89 HtJMIL2RBA G.C....CA..AGGC.A....C.C...... A .....GT ...... -.. ... - ..A...A..T... GOT...... 211 MER1 4A1 TTCTGTTACAAATGATCCAGTCTACTCTTGTAGO 153 HUMHXAMUIT ..G.... T...C...... CA -.. .A.-.C.TTTAAAATATACAATTATTATTGACTATAGTCACCCTGGAGTAGCTA> 165 HUMIL2RBA ..4 ....-..--C.A....A.....T..6.T.... G.0.4...0..G....A...... TTA.-.....( 122

MER1 5Al CCTACTGGGCTGATAAGGGCCAATTCTAATGGGCTCATq,T 51 MER1 5Bl GGC___CACr____TGGGTTAAT______38

HUMSIGMG3 1...TT . . . G . . .- . . 2 0 HUMC21DLA T.C...C.T. .. .A.... .T.CTGT.2AT... T..T...... A.. ..T.T.TG.A.A.G.T .A.....C....10.T...... T.. T.T.-T....-C.A 1718 HUMMBPlA ...... CCTlCCC. OTO..TC. .T.G....T ... .Al.G.T.T. .. .C.T.CA....C.....--....GT.0...... G ...A.T...... -...C ... 291 HUMHPRTB <....T------.-C.. .-G.-TT. .. .T.....TGTA... .T. .TT. .C.T.A.... T.C...... C.... .T. .A.T.....A.---.-. .--.T.AT.AA.... .T.2A 54870 HUMINT2 <...TCCTGO....C.A... .1....OT. .TT.....TA.GTT .....-...C. .CA. 11548 MER1 5Al TAACAAAGCCCAAGTGGTCAAGATTGGGCGCTACATAACAAGGTTT 172 MER1 5Bl --TCCTCGTGTGATGGCCCTGTTTCCAAATGGTCACAAACTGAGGTTCTA .GGTA -TCTTGAACGTATGATTTT CCATAAACACAAGGCATTTGAGAC 156 HUMSIGMG3 .TG. .G.TG....--A...... C...... TGT.l.CT...... A... .C... .AC.....2A....C.C. .-T.TA...(< 2009 HUMZ21DLA .CT...TO.. .T.AAC. .A...... T.....TG.lT.CTA.A....AA. .CC. ...A...... CA.A.TG-A.G..G.A.> 1812 HUMMIBP1A .TC. .C.TG....-A... A.A...... C.....T.CA.,AA....A.A..G..A....-C. .-... ..A.GG... CT..> 380 HUMH8PRTB ...C. .TTT. .... A.-. .A.C.T....CA...T..TGTO.. TlCAT. ..C.. A...CC....A.G.0.....A.AGG.C.CA. .CAT..< 54772 HUMINT2 T..-...... TGT. .T.A.C. .-A...... GTG. .TTlTA ...... A... .C.. .AG...C.C....-.GG.-. .CA..< 11456

M2RI6 MER1 6Al CAGCTOGCATCTGGCCTAAATCCTAATGTAAACTATOGAATTTT 42 HtJMATP1A2 AGGGCAGTGAAGCTATTCCATATGATACCACAATGGCGGATACACGTCAGTATACATTTGCCCAAACCCATAGAATGT-...... C.AA.AG.G. .CT.A..CT .0...GC... 4855 HUMLI.WOS1 . .A....TOO.. GOGTGT....T.....T.A. .TGT. .TG...... -..T.....1...-.. .-GA. .-.AA.AG.G. .C.T.....9AA. .. G0.C... 424 MER16EAl GGGTAGTTT-AAAGTACATTAAAGATCTGTCGTTGTGGAAGCCGAGAGCAGAGATTTGTTC162 HUMATP1A2 .AC....A....A.C...G.C.....-0... COG ... T ... AC... .C..A.G.I.G...... 0.0. 0.--...... -00..GC .... .1....1.AAC.T 4974 HtJMLMWOS1 T.T.A. .T.....A.T....CAA...... T...-.-.-ACA. .--.-C....A... .A.--.-G. .. .AA. .-T. .-.--.G.-.... G.0.1.. .CT...-.. AAC.. 530 MER16A1 TICTOG 166 HUMPATP1A2 ... -CTACTTCCCACTCAATTTTGCTGTGAACCAGAAAATTGCTCCA> 5020 HUMLMWOS1....-. .A.T.AG.A...A.T. .... A.G..TA... .C.> 569

1(1R17 MER1 7A1 TCTGCTCCTTCCCATATATATAACATCATCGACTCTTTGAGACTTCGTTCTT 102 HUMTh9RYGBC ...... A.....C.T. .2.TC...... CC...... lA...... T...... A..> 9184

1(1Rl3 MER18Al 120 HUMSEXREPB .T.A.CT....GGA.G.CA. .G-0.GC.ATC. .---.... AT.. .-AC.-.. .C. .-. .C.GA. .C.. .-GT.A.. -0.----. .C. .AGA.T.GTTAC.A.....A.C .. .A.... 394 HUM.iPRTB ..T.....T.A. .-.-.....A.T.GG. .TT..AT.....-. .A.TA.C.ACACCTTC. .A.AGAACAAC. .CC.G.TC .0..G.A.....A. .A...... 7966 HUMEPRTB <..C. .-..---.0...A .T...T.1..A.G...-...... OTG..A 55098 HUMPADP .C. .-T.GT.T. .. G.0.0.2.... .CTGT. .CC.A.AAATA. .ACCAAAA.C.A.--CG...... CTG.GCCA.ATGA.CCC.G.-. ..G.G0A....T....T....-TT.. 214 HUM.C21DLA .-A.. .TT.TG. .... TT.G. -.-.0.G.0.. .T.3. .. .G.T. .T.3A.A. .T.A. .T2.i.C. .-.. .-.A.TAA.C.G. .A. .-T.A.A.A. .TTCT ...... A... 1404 MER1 8Al 240 HUMSEXREPB G.CCC...... --..C.. .GA..C.-C... .G.C. .T.. ..G.A. 10..l....C.ATGCCA..GC.G. .. C...CT. .. .4.0....G...... C.C. 516 HUKi4PRTB 7850 HUMHPRTB 0. .3G...... TG..--.T.G0.....CTC. ..0.G...C.....A.....1...... A..CA. .0.G.0....TCT... .4....C...... 1... G.--.G.. 54971 HUMPADP G,A.GGA.C....C....TO....CCTC. ...T. ..A.... .0.....GA...... 0.T... CT.AG.G.0.. .A..TCT. .. .4.C...... C...l.CAC. .. CC. 88 HUMC21DiA G.AGG.0...... T.... GOGCCTA..0..OC....TCA. .... 1.0...... GT.A.CCA. .-.G.0.. .CT. ..-..... A. --0. .-AGG.G.A. .-...... TCC. .0. 1519 MER1 8Al CTGTGTCTACACAATGCCTTCCTCTGTGCATGTCTGTGA I 279 HUMSEXREPB .....CT....G01 M.C.TO....M.... 556 HUMEPRTB .O..CT....-.. .A.C..T. .. .AACA.GC... .C..< 7812 HUR.HPRTB T.C.CC. .-.TG.T. .TG.GT.G.0..< 54945 HUMPADP ...A.TCT. .... TG.T.l...... TC.TA .... 49 F.UMC21DLA ..C. .C. .-T.-GC.-T. .GGTGG.T.... G0.> 1548 Nucleic Acids Research, Vol. 19, No. 17 4737

MER1 9 MER1 9A1 TGATTAGATGGTGTCCACTCAGATT12GGGTGGGTCT0CTTTCCCCAGCCACTGAGTCCMTGTTMCCTCCTTTGGCMCACCCTCACAGACACACCCAGGATCMTACTTTGTMC120 HUMRASSK2 . .C.C.G.GTTG ...... GG. . . CCA .GAGC.G.G.1GGGG.... GGG.C.G.ATC.CC...T. C...... A.9.2. .AG.C.T. 623

MER1 9A1 14T1IAT;1.,At'A1AA1TjJJIT1.AiAi115X1TA1.AA..iT.AJJA.J.JI.AA1IVJW1AA IiIE1AA3PUAUAJ1T.14AI'.AITdt1.^A_AA_1Ii.A11 1 UI 205

ITIMDIICCYI) . R Al HUMrAb fZ o ...... U wI.u s..-...... U -. u.- -T. . L;. .JUT U -i;.....1. U.A T.U -. . ,=; .. .. Ztel MZR2 0 MER2OA1 CTTGTAAGCAGGGCTTTCAAAAGGCAGCATTTGCATCTGCTATGTTAACTTTTTAATACGCTTTGGGCAGAGAACTGTAAGAGATCCCAGAGGACATTTGACAATTTCTGGAGACATTTC 120

HUMRPS17A T.A. .-AT ..CT . A.CC.G ... T.GTCC ..G-AAA ..A.GTGGC ... G.G-T ..T.T.AAA ..-G.C.-G.G-T--TTT.C-TTT ...... G..CA. .C .. 1> 3277 MER2OA1 TGTTGTCGTAACTGGGAGGTCCACAGACATACATTGGATAGMGCCAGGMTGCTGCTMATGCTATGCAGTCCACAGAACAGCCCCCATCCCCTCAACCAAAAATTATGTGGCCCAAAA 240 HUMRPS17A C.. .C. .A. .G.CA ..G.2C.TG.TA ..T.CT.A ...G...... A.G . .T. CTT.C ... A.A . T. TG.-C---TA-T.-A.... G.A.G.CCA . GG.> 3394 MER2OA1 TGTCAGTAGTGTCAAGGTGGAGAAACTCTCGTCTAGACCTACCCTAACAGAATTTAACAACAAGCCTGGAAGGATCAAGTTGAI 323 HU[PS17A .... .TACGT. .C.. . AA.T.. GG. GT.T...-AG-TGAT.G.T...G> 3443

MIR21 MER2 lAl AATTCTGACACTGTCTACCTGGAGATAGCATCAAATCCCAGGGTTGAGGTCTCAGTCTGAC0GACTGCACCCCTCTTCAGACACAGTAGCATGTCTGGGCCTCTGGMCTTCTGACCM1120 HUMCYAR01 .G.. CA .A. A.G . TG ..TG. lA. G. A. .TT.-.GA.-.TA.A.ACG...... -GTC.G.A.GA ...... C.... .C.T ..TG 985 MER2lAl CTGGCTTCACATTGGGGT4CCAGACTTCGTATTTATGTTC0TTMTTTGCTAGAGTGGCTCACAAAACTCAGAGAAATACATTTACTGGTTMTTATAAAGGATATTTATGCCTTTAT240 HUM4CYAR01 A.A.GA.T.-.G.-TCT.CTT .-C.... C.G ...--.--.-C.CC.CG ..-.TC ..-CC... CAG...... CCT.-. .CC.-.G.C. .TTGGGCC.T.T.T...G ...l.1TTG.1.CC 875 MER2lAl ATTCCTT&CACTTTCAATTCATCTGTTTCTTTCAGGCACATCCTTTATAGAA_ _ 291 HUMCYAR01 . .GA. .GAAG.A.GTG...G.A.AAAA.G.GATT .. ...< 836

The best match alignments from the MacVector programs are shown. Matches are indicated by periods, while mismatches are indicated by the letters G,A,T,C or dashes. Insertions are indicated by numbers showing their lengths. The boxed sequences indicate the regions used as probes in hybridization experiments. Nomenclature was adopted to allow future identification of additional members or subfamilies of MER repeats. For example MER8Al indicates example number 1 of the A subfamily of MER8. The new sequences presented here have been assigned the following EMBL accession numbers: IMER12A1; X59017}, (MER14AI; X590181, [MER15Al; X590191, IMER16A1; X59020}, tMER15B1; X59021j, IMER17A1; X59022}, IMER19A1; X59023X, {MER18A1; X59024}, tMER21A1; X590251, tMER2OAl; X59026). Arrows () indicate the 5' to 3' directions of the sequence numberings. For MER15 the best consensus sequence is a mosaic of sequences and is indicated by dotted underlines. For MERl1, underlines denote the 50 bp subrepeat described in DISCUSSION.

MER15. This sequence was noted by Akahori et al. (28) as being MER20. This sequence was found in the Alu fragment library. part of a repeat unit in the human immunoglobulin locus. Two Hybridization experiments indicate a low (200-400) copy sequences, MER1SA1 and MERiSBI, were found in the PCR number per haploid genome. One of these hybridization targets library which shared homology with this repeat unit, which was may be a 60% homology to the first intron of the human gene called the MER15 family. Although MER15A1 and MER1SBI for ribosomal protein S17 (37). share only short patches of homology with each other, together they form a mosaic consensus sequence. Searches of the database MER21. This sequence was found in the Alu fragment library. identified five sequences which shared 75-80% homology with As for MER20, hybridization experiments indicate a repetitive this consensus. MER15 sequences are found directly adjacent sequence, but database searches revealed only a 60% homology to MER18 sequences at two iocations (HUMHPRTB 54970 and to a 100 bp sequence in the 5' flank of the human aromatase HUMC21DLA 1560). However, the two sequences are not cytochrome P450 gene (38). Such a homology may not be juxtaposed at other genomic locations and hybridization sufficient to have been detected in our hybridization experiments. experiments indicate different repetition frequencies, so MER15 and MER18 are considered to be distinct MER families. Undiscovered MER families The 21 known MER families account for 30,000-60,000 MER16. This sequence was found in the PCR library. It shares repetitive elements in the human genome (Table I). There are 70% homology with two human sequences from the database. surely more MER families remaining to be discovered. Their number can be estimated in several ways. The simplest way is MERJ 7. This sequence was found in the Alu fragment library. to assume that the sequence libraries constructed in this study Although short, it shares 87% homology with an intergenic select MERs at random. Then, out of all the MERs in the sequence in the human crystallin gene locus (32). libraries, the proportion which are members of already known MER families will correspond to the proportion of the MERs MER18. This sequence was found in the PCR library. It is a in the human genome which have already been assigned to known member of a repeat family first described by Fisher et al. (35). MER families. Specifically, the sequence libraries contained 20 This family is rather loosely conserved (60-65% homology) and matches to the GenBank database. Of these 20, 9 were members is spread throughout the chromosomes. A more tightly conserved of known MER families. This implies that 9/20 or 45 % of the (93 % homology) subfamily is embedded in tandem duplications MERs are in known families and that 55% remain to be on the human sex chromosomes (35). MER18 sequences are discovered. This implies the existence of 70,000-140,000 MERs found directly adjacent to MER15 sequences at two locations in the human genome. (HUMHPRTB 54970 and HUMC21DLA 1560), but is still Another way to estimate the abundance of MERs is to considered as a distinct MER family. extrapolate from the abundance of known MERs within sequenced gene loci. This estimate would have to be a lower MERJ9. This sequence was found in the Alu fragment library. bound, because it neglects the not yet discovered MERs which It contains a 100 bp region which is 85% homologous to a region may lie within these loci. Nonetheless, some rough figures can 5' to the human SK2 c-Ha-ras-1 oncogene (36). be assembled. Five gene loci (demarcated by boxes in Table II) 4738 Nucleic Acids Research, Vol. 19, No. 17 contain 227494 bp of DNA and 22 MERs. This extrapolates to 18. Waye, J.S., England, S.B., and Willard, H.F. (1987) Mol. Cell. Biol., 7. MERs in the entire human genome. 349-356. 242,000 19. Safrany, G. and Hidvegi E.J. (1989) Nucleic Acids Res., 17, 3013-3022. Although both of these estimates are crude, they give some 20. Sun, L., Paulson, K.E., Schmid, C.W., Kadyk, L., and Leinwand, L. (1984) notion of the number and complexity of MER families in the Nucleic Acids Res., 12, 2669-2690. human genome. More work is required in many areas before 21. Paulson, K.E., Deka, N., Schmid, C.W., Misra, R., Schindler, C.W., Rush, the and function of MER families will be understood. Even M.G., Kadyk, L. and Leinwand, L. (1985) Nature (London), 316, 359-361. 22. Bosma, P.J., Kooistra, T., Siemieniak, D.R., and Slightom, J.L. (1991) at this early stage several interesting features stand out. One is Gene 100, 261-266. the apparent clustering of MER repeats at certain genetic loci 23. Degen, S. J. F., Rajput, B., and Reich, E. (1986) J. Biol. Chem.. 261, (Table II). This clustering may imply that MER families influence 6972-6985. gene expression, although one experimental test of this idea failed 24. Mackman, N., Morrissey, J.H., Fowler, B., and Edgington, T.S. (1989) feature is the diversity in Biochemistry, 28, 1755-1762. to support it (22). Another interesting 25. Kurachi, K., Davie, E.W., Strydom, D.J., Riordan, J.F., and Vallee, B.L. levels of homology between the different MER families, ranging (1985) Biochemistry, 24, 5494-5499. from >90% (MER9, MERI1) to <70% (MER12, MER18). 26. Lawrance, S.K., Das, H.K., Pan, J., and Weissman, S.M. (1985) Nucleic This implies that some of the MER families are of recent Acids Res. ,13, 7515-7528. evolutionary origin while others are ancient. Some of the MER 27. Mermer, B., Colb, M., and Krontiris, T.G. (1987) Proc. Natl. Acad. Sci. from U.S.A. 84, 3320-3324. families may be old enough to be relics of Alu-type repeats 28. Akahori, Y., Handa, H., Imai, K, Abe, M., Kameyama, K., Hibiya, M., the early times of the mammalian radiation. This latter Yasui, H., Okamura, K., Naito, M., Matsuoka, H. and Kurosawa, Y. (1988) interpretation is supported by the observation that ten of the MER Nucleic Acids Res., 16, 9497 - 9511. probes showed hybridization signals with bovine chromosomal 29. Picado-Leonard, J., and Miller, W.L. (1987) DNA, 6, 439-448. 1). 30. Savatier, P., Trabuchet, G., Chebloune, Y., Faure, C., Verdier, G., and DNAs (Fig. Nigon, V.M. (1987) J. Mol. Evol., 24, 297-308. Medium reiteration frequency, or MER repeats, are significant 31. Rogan, P.K., Pan J., and Weissman, S.M. (1987) Mol. Biol. Evol., 4, additions to the repertoire of interspersed repetitive DNA which 327-342. has heretofore been described in the human genome. As the DNA 32. Den Dunnen, J.T., Van Neck, J.W., Cremers, F.P.M., Lubsen, N.H., and databases grow in size, more MER families will be discovered. Schoenmakers, J.G.G. (1989) Gene, 78, 201 -213. intrinsic 33. Morrow, C.S., Goldsmith, M.E. and Cowan, K.H. (1990) Gene, 88, These families merit further study both because of their 215-225. importance as well as for their use as mapping sites for developing 34. Myerowitz, R. and Hogikyan, N.D. (1987) J. Biol. Chem., 262, a physical map of the human genome. 15396-15399. 35. Fisher, E. M. C., Alitalo, T., Luoh, S-W., De la Chapelle, A., and Page, D.C. (1990) Genomics, 7, 625-628. 36. Sekiya, T., Tokunaga, A., and Fushimi, M. (1985) Jpn. J. Cancer Res., ACKNOWLEDGEMENTS 76, 787-791. The authors wish to thank Gayathri Murali for technical support. 37. Chen, I.T. and Roufa, D.J. (1988) Gene, 70, 107-116. 38. Means, G.D., Mahendroo, M.S., Corbin, C.J., Mathis, J.M., Powell, F.E., We are also grateful to the following colleagues who provided Mendelson, C.R., and Simpson, E.R. (1989) J. Biol. Chem., 264, plasmids: S.Degen for the tissue plasminogen activator gene, 19385-19391. K.Kurachi for the blood clotting factor IX gene, S.Lawrance for the HLA gene, and W.Miller/S.Brentano for the cytochrome P450-C17 gene. This research was supported by the State of Michigan Research Excellence Fund.

REFERENCES 1. Wakamiya, T., McCutchan, T., Rosenberg, M., and Singer, M. (1979) J. Biol. Chem., 254, 3584-3591. 2. Sheflin, L., Celeste, A., and Woodworth-Gutai, M. (1983) J. Biol. Chem., 258, 14315-14321. 3. Jurka, J (1990) Nucleic Acids Res., 18, 137-141. 4. Kaplan, D.J., and Duncan, C.H. (1990) Nucleic Acids Res., 18, 192. 5. Yanisch-Perron, C., Vieira, J., and Messing, J. (1985) Gene, 33, 103-119. 6. Kariya, Y., Kato, K., Hayashizaki, Y., Himeno, S., Tarui, S., and Matsubara, K. (1987) Gene, 53, 1-10. 7. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. (1988) Science, 239, 487-491. 8. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467. 9. Bnmnner, A.M., Schimenti, J.C., and Duncan, C.H. (1986) Biochemistry, 25, 5028-5035. 10. Tabor. S., and Richardson, C.C. DNA (1987) Proc. Natl. Acad. Sci. U.S.A., 84, 4767-4771. 11. Feinberg, A.P., and Vogelstein, B. (1983) Anal. Biochem., 132, 6-13. 12. Southern, E.M. (1975) J. Mol. Biol., 98, 503-517. 13. Thomas, P.S. (1980) Proc. Natl. Acad. Sci. USA, 77, 5201-5205. 14. Benton, W.D., and Davis, R.W. (1977) Science, 1%, 180-182. 15. Neal W.M. and Florini, J.R. (1973) Anal. Biochem., 55, 328-330. 16. Houck, C.M., Rinehart, F.P. and Schmid, C.W.(1979) J. Mol. Biol., 132, 289-306. 17. Martin, S.L., Voliva, C.F., Burton, F.H., Edgell, M.H., and Hutchison, III, C.A. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 2308-2312.