Proc. Natl. Acad. Sci. USA Vol. 94, pp. 14584–14589, December 1997

Sequence of the FRA3B common fragile region: Implications for the mechanism of FHIT deletion

HIROSHI INOUE, HIDESHI ISHII, HANSJUERG ALDER, ERIC SNYDER, TERESA DRUCK, KAY HUEBNER, AND CARLO M. CROCE

Kimmel Cancer Institute, Jefferson Medical College, Philadelphia, PA 19107

Communicated by Sidney Weinhouse, Jefferson Medical College, Philadelphia, PA, October 30, 1997 (received for review October 3, 1997)

ABSTRACT The hypothesis that chromosomal fragile sites may be “weak links” that result in hot spots for cancer-specific chromosome rearrangements was supported by the discovery that numerous cancer cell homozygous deletions and a familial translocation map within the FHIT gene, which encompasses the common fragile site, FRA3B. Sequence analysis of 276 kb of the FRA3B/FHIT locus and 22 associated cancer cell deletion endpoints shows that this locus is a frequent target of homologous recombination between long interspersed nuclear element sequences, resulting in FHIT gene internal deletions, probably as a result of carcinogen-induced damage at FRA3B fragile sites.

Fragile sites are chromosome regions that reveal cytogenetically detectable gaps after the exposure of cells to specific reagents (1). Several folate-sensitive, heritable fragile sites have been localized to unstable CGG repeats. One of these sites, FRA11B at 11q23.3, is associated with Jacobsen (11q-) syndrome, which shows a direct link between a fragile site and in vivo chromosome breakage (2). Another heritable fragile site, the distamycin-A-sensitive FRA16B, recently was found to be caused by an expanded 33-bp AT-rich minisatellite repeat (3). Thus, repeat expansion is a property of trinucleotide and minisatellite repeats, and these dynamic mutations are sometimes detected as fragile sites.

After induction of fragile sites in somatic cell hybrids, increased frequencies of translocations, deletions (4–6), and plasmid DNA integration (7) at fragile sites have been observed. The chromosomal positions of fragile sites apparently coincide cytogenetically with the regions of similar chromosomal aberrations in cancer cells. It has therefore been postulated that fragile sites, which are susceptible to carcinogen-induced alterations (8), could play a role in cancer cell-specific chromosomal rearrangements (9–12). Direct proof of the involvement of fragile sites in cancer, however, has been lacking.

Recently, the most inducible common fragile site, FRA3B at 3p14.2, was shown to exhibit gaps or fragility over a broad region (13–15). Much of this region has been cloned (7, 13, 14, 16, 17) and partially sequenced (7, 14, 17, 18). A papillomavirus insertion site (14), plasmid integration sites (7), and cancer-specific translocations and deletions (17, 19, 20) have been mapped within the FHIT gene, which encompasses FRA3B (16). FHIT alleles are frequently inactivated in common human cancers (16, 20–24), and the replacement of Fhit expression in cancer cells suppresses their tumorigenicity (25). Thus, chromosome rearrangement at FRA3B, probably following carcinogen damage, is associated with development of major human cancers.

To elucidate mechanisms involved in FHIT alterations and FRA3B/FHIT fragility, we sequenced a ≈200-kb region surrounding FHIT exon 5. This region included the majority of induced gaps (15) and overlaps the reported 110-kb partial intron 4 sequence (17), resulting in a total sequenced region of 276 kbp. This region also showed hemizygous loss in numerous cancers (19, 26–31). We also sequenced a region encompassing the break of the familial kidney cancer-associated translocation (32) between FHIT exons 3 and 4.

MATERIALS AND METHODS

DNA Sequencing Templates. As shown in Fig. 1, six overlapping cosmid clones, covering ≈200 kbp of the FHIT locus from intron 4 through a large part of intron 5 (16), were sequenced completely. BAC clones 1O12 and 358N7, overlapping the centromeric and telomeric ends of the cosmid contig, were obtained from Research Genetics, Huntsville, AL. For sequencing the t(3;8) translocation region, a bacteriophage clone from a YAC 850A6 phage library was selected by using the D3S1480 amplified product. Phage ends were sequenced, and primer pairs were prepared. PCR amplification of hybrid DNAs showed that this phage spanned the break, as did an 8.4-kb subclone (33). These DNAs included DNA from a hybrid carrying the der 3 chromosome from a t(3;8) family member.

Cell Lines. Cancer cell lines KATO III (stomach carcinoma), MDA-MB436 (breast carcinoma), LoVo (colon adenocarcinoma), and LS180 (colon adenocarcinoma) were obtained from the American Type Culture Collection. HK1 (nasopharyngeal carcinoma) was provided by Dolly Huang (Chinese University of Hong Kong). Deletion in the FHIT locus and absence of Fhit were reported for these cell lines (16, 19, 20).

Shotgun Sequencing. Cosmid and BAC DNA were prepared with Qiagen Mini or Maxi prep kits (Qiagen, Chatsworth, CA). For each clone, 30-µl aliquots (30 µg) of DNA were sonicated at the lowest energy setting for 5–10 sec at 0°C with the 3-mm probe. Aliquots were electrophoresed on 0.8% agarose gels, and the preparations for which the peak fragment size was 2–3 kb were analyzed further. DNA fragments (50 µl) for each clone were digested for 10 min at 30°C with BAL-31 nuclease (New England Biolabs). The digested fragments were extracted with phenol, precipitated, dissolved, and fractionated on a 0.8% low-melting agarose gel. The 1.5- to 2.0-kb fraction was excised, extracted with glassmilk by using a Geneclean III kit (Bio 101), and dissolved in 10 µl of distilled water. DNA concentration was measured with Picogreen fluorescent dye (Molecular Probes) on a FluorImager SI (Molecular Dynamics). A two-step ligation procedure was used to produce plasmid libraries for each cosmid or BAC clone, essentially as described previously (34). Recombinant plasmids were isolated with a Qiagen BioRobot 9600, and DNA concentration was measured on the FluorImager SI analyzer. Plasmid DNAs were digested with EcoRI and HindIII (Boehringer Mannheim) and gel separated to confirm the presence of insert DNA. Sequencing reactions and analysis were performed by using dyedeoxy-terminator reaction

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.
© 1997 by The National Academy of Sciences 0027-8424/97/9414584-6$2.00/0
PNAS is available online at http://www.pnas.org.

Abbreviations: RT, reverse transcriptase; LINE, long interspersed nuclear element.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AF020503, AF020504, AF020609–AF020615, and AF019967).

14584 Downloaded by guest on September 25, 2021 Genetics: Inoue et al. Proc. Natl. Acad. Sci. USA 94 (1997) 14585

chemistry on a Perkin–Elmer/Cetus DNA Thermal Cycler 9600 and the Applied Biosystems Model 377 DNA sequencing systems. Sequence data were edited, assembled, and analyzed with the SEQUENCHER 3.0 program (Gene Codes, Ann Arbor, MI). Strategies implemented for sequence completion included primer walking, used to extend the sequence read of a given subclone, to close a gap, or to confirm sequence that was obtained in the opposite orientation. Some regions were not covered at least twice in one orientation and once in the other. Their sequence was obtained by PCR amplification and sequencing of the region from a cosmid or plasmid clone. After proofreading, the final sequence was analyzed with the BLAST (35), REPEATMASKER (36), GRAIL II (37), GENEFINDER (FGENEH) (38), and GENESCAN (39) programs available at http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html.

Deletion Analysis. To define cancer cell homozygous deletions in the FRA3B region, 150 pairs of PCR oligonucleotide primers in unique sequences were generated. Each PCR (20 µl) contained 100 ng of DNA, 20–40 ng of primers, 200 µM dNTPs, 10 mM Tris·Cl (pH 8.3), 50 mM KCl, 0.1 mg/ml gelatin, 1.5 mM MgCl2, and 0.5–1.5 units of Taq polymerase. Reactions were run in a Perkin–Elmer/Cetus DNA Thermal Cycler 9600 for 35 cycles of 95°C denaturation for 30 sec, 57°C annealing for 30 sec (varied for specific primer pairs), and extension at 72°C for 30–45 sec. The amplified products were visualized in ethidium bromide-stained agarose gels.

Reverse Transcription–PCR (RT-PCR) and cDNA Library Screening. RT-PCR was performed on fetal kidney cDNA by using primers Z13–44/f (5′-GTCCGTAGATCTTGTTATGG-3′) and Z13–44/r (5′-TGGCTTTCAGGCATGTTGAGC-3′). A fetal kidney cDNA library was purchased (CLONTECH), and 3 × 10⁶ plaques were screened.

Inverse PCR. Inverse PCR amplifications were performed by using a modification of the method described in ref. 40. DNA (3 µg) was digested with 50 units of restriction enzymes with four or five recognition sites. The incompatible cohesive ends were filled in with T4 DNA polymerase. For circularization, the digested DNA was ligated overnight at 16°C in a 50-µl reaction containing 4 units of T4 DNA ligase and ligation buffer (50 mM Tris·Cl, pH 7.4/10 mM MgCl2/10 mM ATP). Circular products served as PCR templates under the following conditions: 5 min at 95°C, followed by 35 cycles of 15 sec at 95°C and 30 sec at 60°C (first 10 cycles), 59°C (second 10 cycles), and 58°C (last 15 cycles), and then 3 min at 72°C.

FIG. 1. The normal FHIT/FRA3B locus. The top line represents the locus with positions of reported markers and FHIT exons. The long hatched bar represents the 210-kb sequenced region, which overlapped the reported U66722 sequence (solid bar) (17). Genomic YAC, BAC, phage, and cosmid clones used in the analysis are shown. The small hatched bar represents the sequence of chromosome 3 at the t(3;8) break.

RESULTS

DNA Sequence Analysis. Shotgun sequencing was performed for the 210-kb region encompassing FHIT exon 5; 1,427,400 bp were sequenced, for a coverage ratio of 6.9. Sequence fidelity was confirmed in three ways: by primer walking, by amplification of identical sequence fragments from human DNA templates, and by Southern blot detection of fragments of the expected size in human DNAs. The complete sequence, numbered through 206,881 beginning at the telomeric end, was submitted to GenBank (accession no. AF020503). Fig. 1 shows the location of the sequenced region relative to FRA3B/FHIT landmarks. The GC content of the 210-kb region was 38.39 ± 0.50%, with minimal deviation from this mean over the region, as shown for the 30-kb subregions in Table 1. This value was similar to that of the adjacent 110-kb sequence (38.9%) within intron 4 (17). There were no apparent CpG islands in the sequence.

The BLAST analysis did not reveal putative genes in the 210-kb region other than FHIT exon 5, and no other matching sequences were found in the EST database. After masking of known repetitive sequences, potential exons were detected by using the GRAIL, GENEFINDER, and GENESCAN programs; 28, 15, and 7 exon candidates were found, respectively. One of the sequences, Z4/34, at position 92 kb, also was detected by RT-PCR and Northern blotting; the Northern blot detected a 9- to 10-kb transcript in fetal kidney mRNA. Screening of a fetal kidney cDNA library with the RT-PCR product as probe detected six overlapping

Table 1. Repetitive elements in the FRA3B/FHIT locus

Subregion, kb    | 1–30        | 30–60       | 60–90        | 90–120       | 120–150     | 150–180      | 180–210     | 210–240†    | 240–270†
SINEs
  Alus           | 1328 (4.4)* | 1560 (5.2)  | 2063 (6.9)   | 1127 (3.8)   | 2292 (7.6)  | 1200 (4.0)   | 1456 (4.9)  | 1903 (6.3)  | 1990 (6.6)
  MIRs           | 1066 (3.4)  | 1301 (4.3)  | 980 (3.3)    | 579 (1.9)    | 704 (2.4)   | 460 (1.5)    | 858 (2.9)   | 678 (2.3)   | 492 (1.6)
LINEs
  L1             | 303 (1.0)   | 4651 (15.5) | 9278 (30.9)  | 6295 (20.9)  | 2152 (7.2)  | 5079 (16.9)  | 3882 (12.9) | 3320 (11.1) | 3368 (11.2)
  L2             | 475 (1.6)   | 236 (0.8)   | 0 (0)        | 299 (1.0)    | 1068 (3.6)  | 1197 (4.0)   | 54 (0.2)    | 295 (1.0)   | 359 (1.2)
LTR elements
  MalRVs         | 781 (2.6)   | 394 (1.3)   | 0 (0)        | 1115 (3.7)   | 883 (2.9)   | 0 (0)        | 0 (0)       | 839 (2.8)   | 421 (1.4)
  HERVs          | 340 (1.1)   | 177 (0.6)   | 0 (0)        | 0 (0)        | 566 (1.9)   | 131 (0.4)    | 0 (0)       | 0 (0)       | 0 (0)
DNA elements
  MER1           | 655 (2.2)   | 434 (1.5)   | 0 (0)        | 1008 (3.4)   | 354 (1.2)   | 588 (2.0)    | 811 (2.7)   | 386 (1.3)   | 235 (0.8)
  MER2           | 952 (3.2)   | 0 (0)       | 654 (2.2)    | 269 (0.9)    | 904 (3.0)   | 2497 (8.3)   | 344 (1.2)   | 0 (0)       | 0 (0)
  Mariners       | 1287 (4.3)  | 0 (0)       | 0 (0)        | 81 (0.3)     | 0 (0)       | 0 (0)        | 0 (0)       | 66 (0.2)    | 0 (0)
Unclassified     | 1028 (3.4)  | 0 (0)       | 0 (0)        | 0 (0)        | 0 (0)       | 779 (2.6)    | 0 (0)       | 0 (0)       | 0 (0)
Total repeats    | 8598 (28.7) | 8753 (29.2) | 12975 (43.3) | 10737 (35.8) | 8923 (29.7) | 11931 (39.8) | 7622 (25.4) | 7487 (25.0) | 6855 (22.9)
GC content (%)   | 38.9        | 37.7        | 38.4         | 37.3         | 39.2        | 38.6         | 39.0        | 38.8        | 39.1

*Total and percent (in parentheses) nucleotides represented by the repeat class within the specific subregion.
†Repetitive element analysis included the 110-kb sequence reported by Boldog et al. (1997).

cDNA clones. Sequence analysis revealed that the cDNA was at least 9 kb long and colinear with genomic DNA. This suggests that the cDNA represented an unprocessed transcript or resulted from false priming of contaminating DNA. A sequence with homology to human MTERF cDNA (GenBank accession no. Y09615) was found at position 231 kb. This sequence was colinear with genomic DNA, was interrupted with L1 repeats, and is probably a pseudogene.

Repetitive Element Analysis. The following classes of repeats were distributed throughout the region (summarized in Table 1):
- long interspersed nuclear elements (LINEs; L1 and L2),
- short interspersed nuclear elements (Alu and MIR),
- elements with long terminal repeats (HERVs and MalRVs),
- DNA transposons (mariner and MER).

Repetitive elements made up 32.4% of the sequences in the region. Their density varied across the region: they represented 43% of sequences in subregion 60–90 kb, 40% at 150–180 kb, and 36% at 90–120 kb, in contrast to 25% in subregion 180–210 kb. Forty Alu and 41 MIR repetitive elements were recognized, totaling 10.7 kb (5.1% of total) and 5.9 kb (2.8%) in length, respectively. Thirty-nine MER repeats occupied 4.4% of the region.

Smit (36) performed a robust analysis of human repetitive sequences in several megabases of DNA retrieved from GenBank. He concluded that Alu sequences are less frequent in regions of low GC content: Alus make up 5.7% of sequences in regions with GC content of <43%, 17.9% in regions with GC content of 43–52%, and 20.2% in regions with GC content of >52%. This is consistent with our findings. A comparison of repeat class frequencies in the Smit sequences, the Boldog et al. sequence (17), and our 210-kb sequence is shown in Fig. 2.

LINE elements comprise ≈15% of the genome. In this 210-kb region, there were 43 LINEs with a total length of 35.0 kb (16.7%), near the mean value for the whole genome. Most LINE elements were of the L1 subtype (total length 31.6 kb, 15.1%). They were concentrated at 50–100 kb and 150–180 kb, with 8 of the 11 L1 elements of >1 kb in these two subregions (see Table 1 and Fig. 3). Consensus topoisomerase II recognition sites also were distributed throughout the region, and 24 dinucleotide repeat motifs, including D3S1300, were identified. Four AT-rich trinucleotide repeats also were found, as illustrated in Fig. 4:
- the polymorphic (TAA)11–22 (D3S4103) at 61 kb,
- (TAA)8 at 72 kb,
- (TGA)5 at 92 kb,
- an imperfect (CAA)8 at 121 kb.

Insertion Elements. pSV2neo integration site flanking sequences were positioned on the 210-kb sequence (Fig. 4) at two sites near FHIT exon 5 (149 kb): at 109–110 kb (GenBank accession no. U06118) and at 146–147 kb (GenBank accession no. U40597). An inserted sequence was found in P4 cosmid subclones, which matched an Escherichia coli IS1 insertion element (GenBank accession no. M13369). The insertion probably occurred during amplification of the cosmid DNA in E. coli. The sequences of the insertion site (Fig. 3) and element were deposited in GenBank (accession no. AF020504).

Hybrid Breakpoints. Aphidicolin-induced breakpoints in somatic cell hybrids retaining human chromosome 3 were reported to cluster in the FRA3B region (13, 18, 17). The distal breakpoint cluster reported by Wang et al. (18) was located at position 203 kb (see Figs. 3 and 4). A spontaneous break in hybrid cl3 (41) was mapped to 230 kb by PCR amplification with primers derived from FRA3B (Fig. 4). Interestingly, the cl3 break is near or in the L1 element at the 5′ end of the MTERF pseudogene.

Homozygous Deletions. We (19, 16, 20) and others (17, 30) have mapped cancer cell homozygous deletions in the FRA3B region. Deletion breakpoints in five cancer cell lines were located precisely within the sequenced region by primer “walking” toward the endpoints, with multiple PCRs on DNA templates from each line (unpublished data). Three discontinuous homozygous deletions were found in MB436 and HK1. Two other cell lines, KATO III and LoVo, were shown to have one homozygously deleted region. LS180 has three deletions (20), of which we fine-mapped only the large central deletion. Inverse PCRs performed on both sides (when possible) of the deletion endpoints revealed structural rearrangements of both FHIT alleles in these cell lines (Fig. 3; Table 2).

In LS180 cells, independent deletions of 122 kb and 109 kb occurred on the two alleles, at positions 22–144 kb on allele a and 71–180 kb on allele b. The two deletions overlap by 73 kb, at 71–144 kb, and this overlap was observed as a homozygous deletion. LS180 cells also exhibited homozygous deletions in other regions of the FHIT locus (20) that were not sequenced.

In KATO III and LoVo cells, the entire 210 kb was missing on allele b. On allele a, one breakpoint was found at position 61 kb in KATO III and at position 179 kb in LoVo. In these cell lines, the sequence adjacent to the identified breakpoint was not present in the 210-kb region, indicating a deletion that ended outside the sequenced region. In both cell lines, the sequences adjacent to the break in allele a were in the partially sequenced BAC 358N7, telomeric to cosmid 63.

In HK1 cells, deletions of 73 kb and 24 kb were found on allele a, at 81–153.8 kb and at 153.9–178 kb. The entire second allele was absent except for a 0.5-kb segment (position 83–83.5 kb). In the MB436 DNA, we found four breakpoints for each allele, as shown in Fig. 3. Interestingly, this cell line, with the most complex FRA3B alterations, was derived from a breast cancer that was treated with radiation and cytotoxic therapy before establishment of the cell line.
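The deletion sizes and the homozygously deleted segment quoted above follow from simple interval arithmetic on the allele coordinates. The segment scored as a homozygous deletion is the intersection of the two alleles' deletions, because only that segment is absent from both copies. The following Python sketch is an illustration only, not the authors' analysis. It assumes each deletion is an interval in kb on the 210-kb sequence and uses the rounded coordinates given in the text. The Deletion type and the overlap helper are hypothetical names introduced here.

```python
# Illustrative interval arithmetic for the LS180 and HK1 deletions described
# above. Coordinates are kb on the 210-kb sequence, rounded as in the text.
# Deletion and overlap are hypothetical helpers, not the authors' software.
from typing import NamedTuple, Optional


class Deletion(NamedTuple):
    start_kb: float
    end_kb: float

    @property
    def length_kb(self) -> float:
        return self.end_kb - self.start_kb


def overlap(a: Deletion, b: Deletion) -> Optional[Deletion]:
    """Return the interval deleted in both a and b, or None if they are disjoint."""
    start = max(a.start_kb, b.start_kb)
    end = min(a.end_kb, b.end_kb)
    return Deletion(start, end) if start < end else None


# LS180: independent deletions on alleles a and b.
ls180_a = Deletion(22, 144)
ls180_b = Deletion(71, 180)
shared = overlap(ls180_a, ls180_b)  # segment missing from both alleles
assert shared is not None  # the two LS180 deletions do overlap

# HK1 allele a: two separate deletions on the same allele.
hk1_a1 = Deletion(81, 153.8)
hk1_a2 = Deletion(153.9, 178)

print(ls180_a.length_kb, ls180_b.length_kb)  # 122 109
print(shared, shared.length_kb)  # Deletion(start_kb=71, end_kb=144) 73
print(round(hk1_a1.length_kb, 1), round(hk1_a2.length_kb, 1))  # 72.8 24.1
print(overlap(hk1_a1, hk1_a2))  # None
```

Applied to the two HK1 allele a deletions, the same overlap test returns None. This reflects two discontinuous deletions on one allele rather than a shared segment.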
Among the 22 breakpoints sequenced in these five cell lines, 16 were within L1-rich subregions. There were two deletion “hot spots” in the 210-kb region, at 50–100 kb and 150–180 kb (Fig. 3). In the HK1 line, four breakpoints on allele a, at 81, 153.8, 153.9, and 178 kb, were located within two large L1 sequences. The same is true for four other break/rejoins: in the LS180 line (71 kb and 180 kb on allele b) and in the MB436 line (71 kb and 152 kb on allele a). In the HK1 cell line, two breakpoints on allele b were in L1 sequences at positions 83 kb and 83.5 kb, but the joining sequences were outside the 210-kb sequence, presumably in flanking regions. The same applied to a break on allele a in the MB436 line at 152 kb. In addition, four breaks were within 2 kb of an L1 sequence:
- one at 61 kb in the KATO III line,
- two in LoVo (at 179 kb and in BAC 358N7),
- one at 178 kb on allele a in the HK1 line.

The previously reported U60203 lung cancer breakpoint sequence is not included in the 16 breaks discussed. It also was not far from a large L1 sequence, at 50 kb (Figs. 3 and 4).

FIG. 2. Repeats in the FHIT/FRA3B region. In this modification of Smit’s original figure (36), the whole genome was divided into three groups according to GC content: <43%, 43–52%, and >52%. MIR and L2 elements are combined. The ordinate shows the fraction of the respective sequenced region occupied by specific repetitive classes.

The breakpoints in LS180 and MB436 at 71 kb were identical. The joining sequences for the two breakpoints were derived from different sites, position 180 kb for LS180 and 152 kb for MB436. Even so, the two joining sequences were 90% homologous over 1,500 bp, as shown in Fig. 5. Another MB436 breakpoint at 232

FIG. 3. Deletions in cancer cell lines. Depiction of 21 breakpoints (not including the LoVo breakpoint in BAC clone 358N7) with locations relative to L1 elements. L1 sequences of >1 kb, bold arrows; L1 sequences of <1 kb, small arrowheads.

kb is near L1 sequences at the 3′ end of the MTERF pseudogene, not far from the cl3 hybrid break. Other breaks showed no direct association with repetitive elements. Nevertheless, a total of 16 of 22 cancer cell breaks were located in or near L1 sequences (GenBank accession nos. AF020609–AF020615). This demonstrates that the deletion machinery in this FRA3B region was influenced by DNA structure, particularly by L1 repetitive sequences.

Sequence of the t(3;8) Break. We determined the sequence of the normal chromosome 3 region in which the translocation occurred. To do so, we sequenced an 8.4-kb phage subclone from the D3S1480 phage (shown in Fig. 1) that crossed the 3p14.2 breakpoint. The 8,392-bp sequence is numbered 1–8,392 from the telomeric end (GenBank accession no. AF019967). The position of the breakpoint was determined by amplifying products from a der 3 (8qter→8q24::3p14.2→3qter) hybrid (41) with primer pairs designed from this sequence. The breakpoint was between nucleotides 1,086 and 1,608, an interval that includes an Alu from 1,287 to 1,608. Thus, the breakpoint is in an Alu or within 200 bp of the Alu sequence.

FIG. 4. Positions of FRA3B/FHIT rearrangements. Illustration of the 276-kb sequenced region summarizing positions of deletion endpoints, hybrid breaks, plasmid or viral flanking sequences, and trinucleotide repeats.

DISCUSSION

The Fragile Region. The structural basis for the cytogenetically visible chromosomal gaps that define fragile sites is not known. The heritable fragile sites defined thus far somehow translate expanded repeats at the nucleotide level into an aberrant chromatin structure that appears as a gap in metaphase chromosomes. At least one heritable fragile site can be associated with a deletion, with one deletion endpoint located within 20 kb of the expanded triplet repeat. The fragile site or gap may correspond to a single- or double-strand break within the triplet repeat, caused by interference with DNA replication. The deletion may then be caused by incorrect repair of the break, so that the actual site of the break or fragility is lost in the repaired chromosome. Not all expanded triplet sequences have been identified as fragile. Thus, there must be other components of chromatin that affect fragility. These might include surrounding sequence, time of DNA replication, and level and tissue specificity of expression. However, no characteristics aside from expanded polymorphic repeat sequences have been definitively associated with fragile site detection.

The basis for fragility at common fragile sites is not known. It has been shown previously that the FRA3B “site” at 3p14.2 represents a broad region of fragility (13, 14) that may extend at least a megabase. This region has an epicenter of a few hundred kilobases surrounding FHIT exon 5 (15), in which more than 60% of induced gaps occur. Nearly 300 kb of the fragile region has now been sequenced, and the basis for fragility across the region is not obvious. Because all individuals express this fragile region, specific polymorphisms would not be expected to be associated with fragility, although they might affect the degree of fragility. Without DNA from genetically unaffected individuals, the fragile sequence(s) will be difficult to identify. Numerous molecular genetic landmarks have been placed precisely within the ≈300-kb context, and certain conclusions can be inferred. These landmarks include translocations, insertion sites, somatic cell hybrid breakpoints, and cancer cell deletion breakpoints.

The Sequence. In our analysis of positions of chromosomal fragile sites relative to FHIT exons, we found that an exon 4 cosmid

Table 2. Summary of deletions and breakpoints in cancer cell lines

Cell line | Allele | Deletion, kb | Length, kb | Breakpoint, kb   | Repetitive sequence
LS180     | a      | 22–144       | 122        | 22.024/143.708   | (−)/(−)
          | b      | 71–180       | 109        | 70.784/180.720   | L1/L1
KATO III  | a      | 1–61         | >61        | 61.369/BAC358N7  | L1
LoVo      | a      | 1–179        | >179       | 179.243/BAC358N7 | L1/L1
HK1       | a      | 81–154       | 73         | 80.561/153.878   | L1/L1
          |        | 154–178      | 24         | 153.930/178.288  | L1/L1
          | b      | 1–83         | >83        | 83~83.5          | L1*
          |        | 83.5–~180    | <97        | 83.5~84          | L1*
MB436     | a      | 3–70         | 67         | 2.832/70.494     | MIR/(−)
          |        | 71–152       | 81         | 70.784/152.126   | L1/L1
          |        | 152–?        | >80        | 152~153          | L1*
          | b      | 3–70         | 67         | 2.832/70.494     | MIR/(−)
          |        | 94–232       | 138        | 93.880/231.609   | L1/L1†

(−), breakpoint not in repetitive elements.
*Breakpoints located by PCR analysis of homozygous deletion; sequences not determined.
†Breakpoint at position 232 kb was in the sequence reported by Boldog et al. (1997).

was centromeric to ≈70% of the gaps, and that cosmid 63 was telomeric to ≈84% of the gaps (15). Thus, we chose to sequence a 210-kb region starting at cosmid 63 in mid-intron 5 and ending in intron 4. In combination with the published partial sequence of intron 4 (17), a 276-kb sequence encompassing the majority of fragile “breaks” is now available. The total region, although in a Giemsa-light chromosome band, is AT rich and Alu depleted relative to the whole genome. There are numerous short and long repeats distributed over the region, including some triplet repeats. However, neither the overall structure nor specific regions within the sequence provided clues to the exact points that are responsible for induced gaps.

To pinpoint sites of fragile breaks, others have isolated sequences involved in breakage (7, 13, 18). They used aphidicolin-induced hybrid clones with breaks in 3p14.2, or clones with plasmid insertions in 3p14.2. The hybrid breaks and plasmid integration flanking sites are distributed over the region, as shown in Fig. 4, and their positions do not suggest sequence specificity. Presumably, the hybrid breakpoints are breaks that have been repaired, so they might not be useful in identifying the actual fragile sites. Possibly one of the pSV2neo integration site flanking sequences in intron 5 represents an actual fragile site, but the integration also involved deletion of a large intervening sequence. The integration site flanking sequences are in unique sequence falling between the major L1-rich clusters, as summarized in Fig. 4. We compared the sequences around the integration sites with each other, with the 3′ HPV integration site flanking sequences, and with the hybrid break cluster. These comparisons did not yield clues to FRA3B break mechanisms. Another type of breakpoint in the fragile region, but centromeric to the map shown, is the t(3;8) translocation between FHIT exons 3 and 4. This break occurred within or near an Alu sequence, perhaps similarly to the centromeric side of the HPV16 integration (14), and could represent a sequence susceptible to gap formation. These observations suggest that the fragility in the FRA3B region does not depend on specific sequences. It may depend instead on long-range structural aspects associated with DNA replication, possibly involving transcriptional activity.

FIG. 5. L1-associated breakpoints and sequences. (A) Six L1 sequences in the 210-kb region are aligned to the active LINE-1 (L1.2) prototype sequence (46). These L1 elements showed >80% sequence homology to the prototype over 1,000 bp. Note that all L1s were truncated, retaining only a 3′ portion of the sequence, and most breakpoints were located at the 5′ end of the retained sequence. EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich region. (B) Rearrangements and sequence of breakpoints in MB436 and LS180; note sequence homology to the L1 prototype.

Cancer Cell Deletions. Deletions within this region occur in cancer cells, as well as in carcinogen-exposed tissues (28, 42), preneoplasias, and neoplasias, and they frequently involve both FHIT alleles. These deletions are no doubt associated with the fragile nature of the locus. A possible scenario is that the gaps seen after aphidicolin treatment represent single- or double-strand breaks that occur during replication. The breaks could then have three outcomes:
- if the breaks are not repaired, the cell may be programmed to die;
- if the breaks are repaired properly, there is no consequence;
- if the breaks are repaired improperly, a FHIT exon could be deleted, imparting an in vivo advantage to the cell, especially when both FHIT alleles are inactivated.

Thus, we examined the deleted FHIT alleles in several types of cancer cells for clues to fragility, as well as for clues to the mechanism of repair of breaks and deletions. We first fine-mapped positions of deletion endpoints on the sequenced region, and then we sequenced the deletion endpoints by inverse PCR. As shown in Fig. 3, the structure of cancer cell deleted alleles was complex, with all cell lines exhibiting independent deletions of both FHIT alleles, as predicted (20). Most interestingly, 16 of 22 break or deletion repairs involved L1 sequences of >1 kb. Thus, the deletion repair mechanism apparently involved recombination between L1 elements.

Recently, the sequence of a 112-kb region of the human dystrophin gene intron 7 was reported to contain mutational “hot spots,” including translocation breakpoints (43). The sequence contained only nine Alu elements (2.4% of the region) and 22 L1s (25.8% of the 112-kb region). This 112-kb region was similar to our sequenced region in intron length, Alu depletion, richness in LINE sequences, and rearrangements, although it is not known to be fragile. It also has been noted (44) that intrachromosomal recombination between identical

sequences is an active mechanism for generation of deletions. The rate of intrachromosomal recombination is a direct function of the homology between the two elements, and it drops off when the homology is below 200 bp. L1 repetitive elements are thought to be derived from the active L1 prototype, which has transposon activity. Recently, seven active L1 elements were identified in the human genome, and a consensus sequence was reported (45, 46). No active elements were among the 43 L1 elements identified in our sequenced region, but some of the elements showed high sequence homology with the prototype L1. These elements included the L1s at 56.3 kb (1,444 bp), 57.8 kb (1,060 bp), 70.8 kb (1,616 bp), 152 kb (1,817 bp), and 180.1 kb (2,249 bp). They also showed 90% homology to each other over 1,000 bp and were associated with DNA deletion. In particular, the MB436 and LS180 cell lines showed the same breakpoint at position 71 kb. The joining breakpoint sequences were at different sites (position 152.1 kb in MB436 and 180.7 kb in LS180), but they showed high sequence homology (see Fig. 5). This indicates that the deletions involved common mechanisms.

L1-mediated gene rearrangements have been reported in various disorders:
- a 300-kb deletion in intron 10 of the PAX6 gene in familial aniridia (47);
- a translocation within COL5A intron 24 in Ehlers–Danlos syndrome (48);
- an insertional mutation of a 7.1-kb L1 element within intron 6 of the β subunit of the glycine receptor in congenital myoclonia (49).

The association of L1 elements with DNA deletion in cancer cells had not previously been observed. Loss of tumor suppressor genes is a primary step in carcinogenesis. Our results suggest that the frequent deletions in the FRA3B/FHIT locus are mediated by L1 sequences and result in inactivation of the FHIT gene.

The distal aphidicolin-induced breakpoint cluster (17, 18, 50) was located at position 203 kb; we found no carcinoma-associated deletion breakpoints in or around this position. The aphidicolin-induced break in the A5–4 hybrid is probably located ≈70 kb telomeric of FHIT exon 5 (17). It could therefore fall near an L1 in the deletion breakpoint “hot spot” within subregion 50–100 kb. The spontaneous break or break repair in hybrid cl3 was within 1 kb

6. Wang, N. D., Testa, J. R. & Smith, D. I. (1993) Genomics 17, 341–347.
7. Rassool, F. V., Le Beau, M. M., Shen, M. L., Neilly, M. E., Espinosa, R. R., Ong, S. T., Boldog, F., Drabkin, H., McCarroll, R. & McKeithan, T. W. (1996) Genomics 35, 109–117.
8. Yunis, J. J., Soreng, A. L. & Bowe, A. E. (1987) Oncogene 1, 59–69.
9. Yunis, J. J. & Soreng, A. L. (1984) Science 226, 1199–1204.
10. Le Beau, M. M. & Rowley, J. D. (1984) Science 226, 1199–1204.
11. Cannizzaro, L. A., Durst, M., Mendez, M. J., Hecht, B. K. & Hecht, F. (1988) Cancer Genet. Cytogenet. 33, 93–98.
12. Popescu, N. C. & DiPaolo, J. A. (1989) Cancer Genet. Cytogenet. 42, 157–171.
13. Paradee, W., Wilke, C. M., Wang, L., Shridhar, R., Mullins, C. M., Hoge, A., Glover, T. W. & Smith, D. I. (1996) Genomics 35, 87–93.
14. Wilke, C. M., Hall, B. K., Hoge, A., Paradee, W., Smith, D. I. & Glover, T. W. (1996) Hum. Mol. Genet. 5, 187–195.
15. Zimonjic, D. B., Druck, T., Ohta, M., Kastury, K., Croce, C. M., Popescu, N. C. & Huebner, K. (1997) Cancer Res. 57, 1166–1170.
16. Ohta, M., Inoue, H., Cotticelli, M. G., Kastury, K., Baffa, R., Palazzo, J., Siprashvili, Z., Mori, M., McCue, P., Druck, T., Croce, C. M. & Huebner, K. (1996) Cell 84, 587–597.
17. Boldog, F., Gemmill, R. M., West, J., Robinson, M., Robinson, L., Li, E., Roche, J., Todd, S., Waggoner, B., Lundstrom, R., Jacobson, J., Mullokandov, M. R., Klinger, H. & Drabkin, H. A. (1997) Hum. Mol. Genet. 6, 193–203.
18. Wang, L., Paradee, W., Mullins, C., Shridhar, R., Rosati, R., Wilke, C. M., Glover, T. W. & Smith, D. I. (1997) Genomics 41, 485–488.
19. Kastury, K., Baffa, R., Druck, T., Ohta, M., Cotticelli, M. G., Inoue, H., Negrini, M., Rugge, M., Huang, D., Croce, C. M., Palazzo, J. & Huebner, K. (1996) Cancer Res. 56, 978–983.
20. Druck, T., Hadaczek, P., Fu, T. B., Ohta, M., Siprashvili, Z., Baffa, R., Negrini, M., Kastury, K., Veronese, M. L., Rosen, D., Rothstein, J., McCue, P., Cotticelli, M. G., Inoue, H., Croce, C. M. & Huebner, K. (1997) Cancer Res. 57, 504–512.
21. Virgilio, L., Shuster, M., Gollin, S. M., Veronese, M. L., Ohta, M., Huebner, K. & Croce, C. M. (1996) Proc. Natl. Acad. Sci. USA 93, 9770–9775.
22. Sozzi, G., Sard, L., De Gregorio, L., Marchetti, A., Musso, K., Buttitta, F., Tornielli, S., Pellegrini, S., Veronese, M. L., Manenti, G., Incarbone, M., Chella, A., Angeletti, C. A., Pastorino, U., Huebner, K., Bevilaqua, G., Pilotti, S., Croce, C. M. & Pierotti, M. A. (1997) Cancer Res. 57, 2121–2123.
23. Greenspan, D. L., Connolly, D. C., Wu, R., Lei, R. Y., Vogelstein, J. T. C., Kim, Y.-T., Mok, J. E., Munoz, N., Bosch, X., Shah, K. & Cho, K. R. (1997) Cancer Res. 57, 4692–4698.
24. Huebner, K., Hadaczek, P., Siprashvili, Z., Druck, T. & Croce, C. M. (1997) Biochim. Biophys. Acta 1332, M65–M70.
25. Siprashvili, Z., Sozzi, G., Barnes, L. D., McCue, P., Robinson, A. K., Eryomin, V., Sard, L., Tagliabue, E., Greco, A., Fusetti, L., Schwartz, G., Pierotti, M. A., Croce, C. M. & Huebner, K. (1997) Proc. Natl. Acad. Sci. USA 94, 13771–13776.
26. Druck, T., Kastury, K., Hadaczek, P., Podolski, J., Toloczko, A., Sikorski, A., Ohta, M., LaForgia, S., Lasota, J., McCue, P., Lubinski, J. & Huebner, K. (1995) Cancer Res. 55, 5348–5353.
27. Shridhar, R., Shridhar, V., Wang, X., Paradee, W., Dugan, M., Sarkar, F., Wilke, C., Glover, T. W., Vaitkevicius, V. K. & Smith, D. I. (1996) Cancer Res. 56, 4347–4350.
28. Sozzi, G., Huebner, K. & Croce, C. M. (1997) Adv. Cancer Res., in press.
29. Ahmadian, M., Wistuba, I. I., Fong, K. M., Behrens, C., Kodagoda, D. R., Saboorian, M. H., Shay, J., Tomlinson, G. E., Blum, J., Minna, J. D. & Gazdar, A. F. (1997) Cancer Res. 57, 3664–3668.
of an L1 sequence and a cancer cell deletion endpoint (Fig. 4). 30. Fong, K. M., Biesterveld, E. J., Virmani, A., Wistuba, I., Sekido, Y., Bader, S. A., Perspective. It will be important to determine whether there is Ahmandian, M., Ong, S. T., Rassool, F. V., Zimmerman, P. V., Giaccone, G., Gazdar, A. F. & Minna, J. D. (1997) Cancer Res. 57, 2256–2267. polymorphism in the human population at fragile breaks (possibly 31. Larson, A. A., Kern, S., Curtiss, S., Gordon, R., Cavenee, W. K. & Hampton, integration or translocation sequences) or at break recombination G. M. (1997) Cancer Res. 57, 4082–4090. 32. Cohen, A. J., Li, F. P., Berg, S., Marchetto, D. J., Tsai, S., Jacobs, S. C. & Brown, sequences such as the deletion endpoints. The answers to some of R. S. (1979) N. Engl. J. Med. 301, 592–595. these questions can be determined by PCR amplification, but the 33. Kastury, K., Ohta, M., Lasota, J., Moir, D., Dorman, T., LaForgia, S., Druck, T. repetitive nature and size of the repeat clusters will require & Huebner, K. (1996) Genomics 32, 225–235. 34. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., investigation by using Southern blot to determine whether some Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A. & Merrick, J. M. (1995) individuals exhibit fewer L1-sequences in the region. Perhaps, if Science 269, 496–512. 35. Altschul, S. F., Gish, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, this is the case, the locus would still be fragile as expected, with less 403–410. likelihood of L1 mediated repair. An immediate possibility is to 36. Smit, A. F. A. (1996) Curr. Opin. Genet. Dev. 6, 743–748. 37. Xu, Y., Einstein, J. R., Mural, R. J., Shah, M. & Uberbacher, E. C. (1994) in An use primer pairs flanking the characterized deletions to determine Improved System for Exon Recognition and Gene Modeling in Human DNA whether similar deletions occur frequently in primary cancers. Sequences, ed. Press, A. 
(AAAI, Menlo Park, CA), pp. 376–384. Thus, we may be able to determine whether environmental insult 38. Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156–5163. to the common fragile region is more likely to result in FHIT 39. Burge, C. & Karlin, S. (1997) J. Mol. Biol. 268, 78–94. deletion in individuals, depending on specific configurations of L1 40. Ochman, H., Gerber, A. S. & Hartl, D. L. (1988) Genetics 120, 621–623. 41. LaForgia, S., Morse, B., Levy, J., Barnea, G., Cannizzaro, L. A., Li, F., Nowell, loci for example, and if said FHIT deletions can be useful in P. C., Boghosian-Sell, L., Glick, J., Weston, A., Harris, C. C., Drabkin, H., following the natural history of the disease in important cancers. Patterson, D., Croce, C. M., Schlessinger, J. & Huebner, K. (1991) Proc. Natl. Acad. Sci. USA 88, 5036–5040. This research was supported by U.S. Public Health Service Grants 42. Mao, L., Lee, J. S., Kurie, J. M., Fan, Y. H., Lippman, S. M., Lee, J. J., Ro, J. Y., CA39860, CA51083, CA21124, and CA56336 from the National Broxson, A., Yu, R., Morice, R. C., Kemp, B. L., Khuri, F. R., Walsh, G. L., Hittelman, W. N. & Hong, W. K. (1997) J. Natl. Cancer Inst. 89, 857–862. Cancer Institute and by a gift from Mr. R. R. M. Carpenter III and Mrs. 43. McNaughton, J. C., Hughes, G., Jones, W. A., Stockwell, P. A., Klamut, H. J. & Mary Carpenter. Genomic clones covering the region of the t(3;8) Petersen, G. B. (1997) Genomics 40, 294–304. break were isolated by Kumar Kastury. 44. Waldman, A. S. & Liskay, R. M. (1987) Proc. Natl. Acad. Sci. USA 84, 5340–5344. 45. Feng, Q. H., Moran, J. V., Kazazian, H. H. & Boeke, J. D. (1996) Cell 87, 905–916. 1. Sutherland, G. R. & Richards, R. I. (1995) Curr. Opin. Genet. Dev. 5, 323–327. 46. Sassaman, D. M., Dombroski, B. A., Moran, J. V., Kimberland, M. L., Naas, T. P., 2. Jones, C., Penny, L., Mattina, T., Yu, S., Baker, E., Voullaire, L., Langdon, W. Y., Deberardinis, R. J., Gabriel, A., Swergold, G. 
D. & Kazazian, H. H. (1997) Nat. Sutherland, G. R., Richards, R. I. & Tunnacliffe, A. (1995) Nature (London) 376, Genet. 16, 37–43. 145–149. 47. Drechsler, M. & Royer-Pokora, B. (1996) Hum. Genet. 98, 297–303. 3. Yu, S., Mangelsdorf, M., Hewett, D., Hobson, L., Baker, E., Eyre, H. J., Lapsys, 48. Toriello, H. V., Glover, T. W., Takahara, K., Byers, P. H., Miller, D. E., Higgins, N., Paslier, D. L., Doggett, N. A., Sutherland, G. R. & Richards, R. I. (1997) Cell J. V. & Greenspan, D. S. (1996) Nat. Genet. 13, 361–365. 88, 367–374. 49. Kingsmore, S. F., Giros, B., Suh, D., Bieniarz, M., Caron, M. G. & Seldin, M. F. 4. Glover, T. W. & Stein, C. K. (1988) Am. J. Hum. Genet. 43, 265–273. (1994) Nat. Genet. 7, 136–141. 5. Warren, S. T., Zhang, F., Licambi, G. R. & Peters, J. F. (1987) Science 237, 50. Paradee, W., Mullins, C., He, Z., Glover, T., Wilke, C., Opalka, B., Schutte, J. & 420–423. Smith, D. I. (1995) Genomics 27, 358–361. Downloaded by guest on September 25, 2021
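The L1 coordinates and deletion junctions given in the discussion of L1-mediated deletion can be cross-checked with a few lines of code. The Python sketch below is illustrative only and is not the authors' analysis. It assumes that each reported L1 position (e.g., "70.8 kb (1,616 bp)") marks the element's left end, which the text does not state, and it treats the breakpoints at 71, 152.1, and 180.7 kb as point coordinates on the sequenced contig. The helper names (`l1_at_breakpoint`, `junction_product_bp`) and the primer positions are hypothetical; the primer example only shows the arithmetic behind the junction-PCR approach suggested in the Perspective.

```python
# Illustrative sketch (not the authors' analysis): relate deletion junctions to
# the L1 elements listed in the text, and estimate the product of a primer pair
# that flanks a deletion. All names and primer positions are hypothetical.
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp along the sequenced contig

# (reported position in kb, element length in bp), as listed in the text.
# ASSUMPTION: the reported position is the left end (start) of each element.
L1_ELEMENTS_KB: List[Tuple[float, int]] = [
    (56.3, 1444),
    (57.8, 1060),
    (70.8, 1616),
    (152.0, 1817),
    (180.1, 2249),
]


def l1_intervals() -> List[Interval]:
    """Convert (kb, length) pairs into bp intervals."""
    intervals = []
    for kb, length in L1_ELEMENTS_KB:
        start = round(kb * 1000)  # round() absorbs floating-point error in kb * 1000
        intervals.append((start, start + length))
    return intervals


def l1_at_breakpoint(breakpoint_bp: int, tolerance_bp: int = 0) -> Optional[Interval]:
    """Return the first L1 interval lying within tolerance_bp of the breakpoint, else None."""
    for start, end in l1_intervals():
        if start - tolerance_bp <= breakpoint_bp < end + tolerance_bp:
            return (start, end)
    return None


def junction_product_bp(fwd_bp: int, rev_bp: int, del_start_bp: int, del_end_bp: int) -> int:
    """Expected PCR product length across a deletion, for primers that flank it.

    On the deleted allele the product is the primer-to-primer distance minus the
    deleted segment; on the normal allele it is the full primer-to-primer distance.
    """
    if not fwd_bp <= del_start_bp < del_end_bp <= rev_bp:
        raise ValueError("primers must flank the deletion")
    return (rev_bp - fwd_bp) - (del_end_bp - del_start_bp)


if __name__ == "__main__":
    # Breakpoints from the text: the shared 71-kb breakpoint and the joining
    # sites in MB436 (152.1 kb) and LS180 (180.7 kb).
    for label, bp in [("71 kb", 71_000), ("MB436 152.1 kb", 152_100), ("LS180 180.7 kb", 180_700)]:
        print(label, l1_at_breakpoint(bp))
    # HYPOTHETICAL primers at 70.5 and 152.6 kb flanking the MB436 deletion (71.0 to 152.1 kb).
    print(junction_product_bp(70_500, 152_600, 71_000, 152_100))
```

Under these assumptions each of the three junctions falls inside one of the listed L1 intervals (70,800–72,416, 152,000–153,817, and 180,100–182,349 bp), and the hypothetical primer pair would give a 1,000-bp product from the deleted allele versus an 82,100-bp span on the normal allele, which is too long for routine PCR. A nonzero `tolerance_bp` (e.g., 1,000) mirrors the "within 1 kb" criterion used for hybrid cl3.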