Medium Reiteration Frequency Repetitive Sequences in the Human Genome

k.-_:) 1991 Oxford University Press Nucleic Acids Research, Vol. 19, No. 17 4731-4738 Medium reiteration frequency repetitive sequences in the human genome David J.Kaplan, Jerzy Jurka1, Joseph F.Solus and Craig H.Duncan* Center for Molecular Biology, Wayne State University, Detroit, Ml and 'Linus Pauling Institute of Science and Medicine, 440 Page Mill Road, Palo Alto, CA 94306, USA Received April 24, 1991; Revised and Accepted August 7, 1991 EMBL accession nos X59017-X59026 (incl.) ABSTRACT Fourteen novel medium reiteration frequency (MER) Isolation of novel MER families from sequence libraries families were found, in the human genome, by using The Alu fragment library. Human genomic DNA (250 isg) was two different methods. Repetition frequencies per digested to completion with the restriction enzyme AluI. The haploid human genome were estimated for each of resulting fragments were fractionated on a 6% polyacrylamide these families as well as for six previously described gel and fragments in the 500-1000 bp region were isolated. MER DNA families. By these measurements, the 50 ng of this size fractionated DNA was ligated to 500 ng of families were found to contain variable numbers of SnaI cleaved Ml3mpl9 RF DNA (5). The vector was elements, ranging from 200 to 10,000 copies per phosphatased before use. The ligated DNA was mixed with haploid human genome. competent JM 109 bacteria (Stratagene Inc.) and 20,000 resulting transformants were plated at a density of 1,650 plaques per 10 cm INTRODUCTION petri dish. Duplicate filter replicates were prepared and were incubated separately with either of two radioactive oligonucleotide The human genome, like those of other higher eukaryotes, probes. One of these oligonucleotides was the 25 mer, GTGG- contains a substantial amount of interspersed repetitive DNA CTCA[C/T][A/G]CCTGTAATCCCAGCA. This sequence is sequence. Besides the three largest families, the Alu family, the from the 5' consensus sequence for Alu repeats (bases # 12-36 Ll family, and the THE repeats, there are a number of minor from Fig. 1, ref 6). The other oligonucleotide was the 33 mer families, referred to as medium reiteration frequency repeats GG[C/T]TGCAGTGAGC[C/T] [A/G][T/A]GAT[C/T][A/G] (MER). [C/T][A/G]CCA[C/T]TGCACT. This sequence was from the Medium frequency repetitive sequences were first observed 3' region of the Alu family consensus sequence (bases # as sequences which were present in naturally occurring SV40 218-250 from Fig. 1, ref 6). Phage which hybridized to both variants (1,2). With the recent surge of structural information probes were replated at lower density and subjected to a repeat about the human genome, more examples of MER families have screening with the same probes. Single stranded template DNA been uncovered. Two quite different approaches have proven was extracted from 131 phage isolates in 2 ml cultures. useful for identifying MER families. One method was a systematic computer analysis of the entire human DNA sequence The PCR library. One zg of genomic human placental DNA was database to detect repetitive elements which do not belong to mixed with 20 Atg each of two oligonucleotides. One known families (3). The second method was a sequence library oligonucleotide was the 31 mer GGGTCGACAGTGAGCCG- construction technique which selected regions of human DNA AGATCGCGCCACTG, where the underlined region is an lying between closely spaced Alu repeats (4). In this report, we outward facing primer at the 3' boundary of the Alu family use both approaches to discover fourteen novel MER families. consensus sequence (bases 224-246 from Fig. 1, ref 6). The We also provide measurements ofthe repetition frequency ofeach second oligonucleotide was the 28 mer GGGGATCCTGGGA- family. There are certainly more families which have yet escaped TTACAGGCGTGAGCC, where the underlined region is an detection. outward facing primer at the 5' boundary of the Alu family consensus sequence (bases 33-14 from Fig. 1, ref 6). The reaction mixture was adjusted to a volume of 2001d, containing EXPERIMENTAL METHODS each dNTP at a final concentration of 300AtM, polymerase Homology searches by computer assisted analysis of the reaction buffer, and 40 units of Thermus aquaticus DNA human DNA database polymerase (Amersham). The polymerase chain reaction (7) was The GenBank DNA sequence database (Release 65) was subjected performed for 25 cycles of 940 for 3 minutes, followed by 500 to a rapid search using the DASHER2 program as previously for 5 minutes, followed by 720 for 5 minutes. The reaction was described (3). then extracted twice with an equal volume of phenol, once with * To whom correspondence should be addressed at: Room 4118, MCHT, 2727 2nd Avenue, Detroit, MI 48201, USA 4732 Nucleic Acids Research, Vol. 19, No. 17 an equal volume of CHCl3, and ethanol precipitated. The DNA formula, 1( # of positive plaques binding probe) x (2.5 x 106)1 was then dissolved in 50 ,ul of restriction enzyme buffer, followed 1(total # of plaques on the filter) x 151. In this formula the by the addition of 25 units of Sall and 50 units of BamH 1. After 2.5 x 106 represents the size of the human genome in kb, while a 4 hour incubation at 370, the reaction mixture was heated at 15 is the average size (in kb) of the human DNA inserted in the 650 for 5 minutes and DNA fragments were purified by two bacteriophage. As reflected by the numbers in Table I, this cycles of preparative electrophoresis on a 7% polyacrylamide formula gives a statistical estimate, which may be in error by gel. DNA fragments in the 300-1000 bp region were isolated. as much as 50%. 40 ng of this size fractionated DNA was ligated to 100 ng of BamHl, Sall cleaved M13mpl9 RF DNA (from which the 12 bp Sall-BamHl insert had been removed by S-300 gel filtration RESULTS [Pharmacia]). The ligated DNA was mixed with competent JM109 bacteria (Stratagene Inc.) and 2000 resulting transformants Sequence Analysis of novel MER families were plated at a density of 150 plaques per 10 cm petri dish on A total of 562 M13 templates were subjected to sequence analysis. NZY plates containing f-galactosidase indicator dye. Duplicate 131 of these templates came from the Alu fragment library and filter replicates were prepared and were incubated separately with 431 of the templates came from the PCR library. In both cases, either of two radioactive probes. One probe was a double stranded the selection procedures were designed to identify DNA 69 bp sequence taken from the internal region of an Alu repeat sequences 200-700 bp in length which are flanked by Alu (CACTTTGGGAGGCCAAGGCAGGTGGATCACCTCAA- repeats. GTCAGGAGTTCAAGGCCAGCCTGACCAACATGGA). The The Alu fragment method was based on the fact that the bulk second probe was a 5 kb fragment containing an L1 sequence of Alu family elements in the human genome contain a conserved from the region 5' to the human -y-gamma globin gene (8). 431 site for the restriction enzyme AluI (16). When human genomic phage plaques were selected by the criteria of not reacting with DNA was cleaved with this enzyme a subset of the resulting the dye indicator and not hybridizing to either of the radioactive fragments possessed the 3' portion of an Alu family repeat at probes. Single stranded template DNA was extracted from small one end, the 5' portion of an Alu family repeat at the other end, scale (3 ml) cultures of these phage. with an internal region of non-Alu family DNA in between. The AluI restriction fragments were cloned and the subset of DNA DNA, sequencing, amplification, hybridization, and sequence fragments described above was identified by the use of two analysis oligonucleotide hybridization probes. One was specific for the The single stranded templates were sequenced by the dideoxy 5' end of an Alu family repeat, and the other was specific for termination method (8) as modified (9) using T7 DNA the 3' end of an Alu family repeat. Polymerase (10). 400-600 bp of sequence information was The PCR method used two oligonucleotide primers which faced obtained from each template. Oligonucleotides were synthesized outward from the 5' and 3' regions of a consensus Alu family by Research Genetics Inc (Huntsville AL) and labeled with repeat sequence. When these primers were incubated with human polynucleotide kinase and 32p labeled 'y-ATP. Double stranded genomic DNA under PCR conditions, the regions between Alu DNA was labeled by the random primer method (11). repeats were selectively amplified. These amplified fragments Hybridizations were performed at 600 in 6 x SSC, 20 mM were then cloned. NaPO4, 10% dextran sulfate, 4xDenhart's solution, 0.1% Both these methods suffered limitations which precluded the SDS, and 100 pjg/ml denatured salmon sperm DNA for 16 hrs. isolation of certain sequences or included undesired sequences Washing was done twice for 5 minutes in 2 x SSC, 0.2% SDS at room temperature, followed by two washes for 15 minutes in 0.2 xSSC, 0.1% SDS at 450. Autoradiography was 6 hrs to Table I. MER sequences. overnight with an intensifying screen at -70°. MER probes were # in Repetition # in Alu n Hyb to Hyb to constructed by synthesizing oligonucleotide primers suitable for MER# Source Genkank Frequency frag lib PCR lib rod DNA bov DA I DASHER2; ref 3 4 4000-8000 - + Polymerase Chain Reaction amplification (7) of the M 13 sequence 2 DASHER2; ref 3 4 1500-3000 2 + 3 DASHER2: ref 3 nd - nd nd templates. The PCR products were then subcloned in Bluescript 4 DASHER2: ref 3 3 1000-2000 5 DASHER2; ref 3 2 1000-2000 1 - + vectors (Stratagene Inc.).

Medium Reiteration Frequency Repetitive Sequences in the Human Genome

Identification of the Promoter and a Transcriptional Enhancer of The

Super Short Operations on Both Gene Order and Intergenic Sizes Andre R

Genome-Wide Analysis of the Intergenic Regions in Arabidopsis Thaliana Suggests the Existence of Bidirectional Promoters and Genetic Insulators Xiaohan Yang, Cara M

A Model-Based Approach for Identifying Functional Intergenic Transcribed Regions and Noncoding Rnas John P

Number of Patterns Over-Represented in Three Genomes

Transcriptional Regulation of Pena Β-Lactamase in Acquired

Deconvoluting the Most Clinically Relevant Region of the Human Genome

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008

Comparative Genome Sequence Analyses of Geographic Samples of Aspergillus Fumigatus—Relevance for Amphotericin B Resistance

Intronic Cnvs Cause Gene Expression Variation in Human Populations

Integration of Multiple Repeats of Geminiviral DNA Into

Transcription-Mediated Gene Fusion in the Human Genome