US005474896A
United States Patent [19]
Dujon et al.
[11] Patent Number: 5,474,896
[45] Date of Patent: Dec. 12, 1995

[54] NUCLEOTIDE SEQUENCE ENCODING THE ENZYME I-SceI AND THE USES THEREOF

[75] Inventors: Bernard Dujon, Gif sur Yvette; Andre Choulika, Paris, both of France; Laurence Colleaux, Edinburgh, Scotland; Cecile Fairhead, Malakoff, France; Arnaud Perrin, Paris, France; Anne Plessis, Paris, France; Agnes Thierry, Paris, France

[73] Assignees: Institut Pasteur; Université Paris-VI, both of France

[21] Appl. No.: 971,160

[22] Filed: Nov. 5, 1992

Related U.S. Application Data
[63] Continuation-in-part of Ser. No. 879,689, May 5, 1992, abandoned.

[51] Int. Cl.: C12Q 1/68; C12N 15/70
[52] U.S. Cl.: 435/6; 435/320.1; 935/78
[58] Field of Search: 435/6, 255, 320.1; 935/77, 78

[56] References Cited

PUBLICATIONS

Tartot et al., Gene 67:169-182 (1988) "New cloning vectors and techniques . . .".
Colleaux et al., Human Mol. Genet. 2(3):265-271 (1993) "Rapid Mapping of YAC inserts . . .".
Dujon, Bernard, "Group I introns as mobile genetic elements: facts and mechanistic speculations, a review," GENE, vol. 82: 91-114 (1989).
Dujon, Bernard, "Mobile Introns", EMBO Workshop on Molecular Mechanisms of Transposition and Its Control, Roscoff, France, Jun. 24-28, 1990.
Dujon, Bernard, "Des Introns Autonomes et Mobiles", Annales de L'Institut Pasteur/Actualites, 1:181-194 (1990).
Dujon, B., Sequence of the Intron and Flanking Exons of the Mitochondrial 21S rRNA Gene of Yeast Strains Having Different Alleles at the ω and rib-1 Loci, CELL, vol. 20: 185-187 (1980).
Michel, F. et al., Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure, Biochimie, vol. 64: 867-881 (1982).
Michel, F. and Dujon, B., Conservation of RNA secondary structures in two intron families including mitochondrial-, chloroplast- and nuclear-encoded members, EMBO Journal, vol. 2(1): 33-38 (1983).
Jacquier, A. and Dujon, B., The Intron of the Mitochondrial 21S rRNA Gene: Distribution in Different Yeast Species and Sequence Comparison Between Kluyveromyces thermotolerans and Saccharomyces cerevisiae, Mol. Gen. Genet., 192: 487-499 (1983).
Dujon, B. and Jacquier, A., Mitochondria 1983, Walter de Gruyter & Co., pp. 389-403.
Jacquier, A. and Dujon, B., An Intron-Encoded Protein is Active in a Gene Conversion Process That Spreads an Intron into a Mitochondrial Gene, Cell, vol. 41: 383-394 (1985).
Dujon, B. et al., in Achievements and Perspectives of Mitochondrial Research, vol. II: Biogenesis, Elsevier Science Publishers, pp. 215-225 (1985).
Colleaux, L. et al., Universal Code Equivalent of a Yeast Mitochondrial Intron Reading Frame is Expressed into E. coli as a Specific Double Strand Endonuclease, CELL, vol. 44: 521-533 (1986).
Michel, F. and Dujon, B., Genetic Exchanges between Bacteriophage T4 and Filamentous Fungi?, CELL, vol. 46: 323 (1986).
Colleaux, L. et al., Recognition and cleavage site of the intron-encoded omega transposase, Proc. Natl. Acad. Sci. USA, vol. 85: 6022-6026 (1988).
Dujon, B. et al., Mobile introns: definition of terms and recommended nomenclature, GENE, vol. 82: 115-118 (1989).
Monteilhet, C. et al., Purification and characterization of the in vitro activity of I-SceI, a novel and highly specific endonuclease encoded by a group I intron, Nucleic Acids Research, vol. 18(6): 1407-1413 (1990).
Colleaux, L. et al., The apocytochrome b gene of Chlamydomonas smithii contains a mobile intron related to both Saccharomyces and Neurospora introns, Mol. Gen. Genet., vol. 223: 288-296 (1990).
Dujon, B. et al., in Extrachromosomal Elements in Lower Eukaryotes (Plenum Publishing Corporation 1986), pp. 5-27.
Thierry, A. et al., Cleavage of yeast and bacteriophage T7 genomes at a single site using the rare cutter endonuclease I-SceI, Nucleic Acids Research, vol. 19(1): 189-190 (1991).
Plessis, A. et al., Site-specific recombination by I-SceI: a mitochondrial group I intron-encoded endonuclease expressed in the yeast nucleus, Genetics, vol. 130(3): 451-460 (1992).

Primary Examiner: Margaret Parr
Assistant Examiner: Eggerton Campbell
Attorney, Agent, or Firm: Finnegan, Henderson, Farabow, Garrett & Dunner

[57] ABSTRACT

An isolated DNA encoding the enzyme I-SceI is provided. The DNA sequence can be incorporated in cloning and expression vectors, transformed cell lines and transgenic animals. The vectors are useful in gene mapping and site directed insertion of genes.

2 Claims, 22 Drawing Sheets

[Front-page drawing: nucleotide sequence with the deduced amino acid sequence printed beneath it; the sequence is not legible in this copy.]

[Sheet 3 of 22: diagram of the recognition sequence, both strands shown 5' to 3' and 3' to 5', with base substitutions marked as severely affected, partially affected, or unaffected; the sequence detail is not legible in this copy.]

[FIG. 5 (Sheet 6 of 22): amino acid sequence of I-SceI, numbered from position -2, with the following annotations.

Positions that can be changed without affecting enzyme activity (demonstrated):
positions -1 and -2 are not natural; the two amino acids are added due to cloning strategies
positions [illegible] to 10: can be deleted
position 35: G is tolerated
position 40: M or V are tolerated
position 41: S or N are tolerated
position 43: A is tolerated
position 46: W or N are tolerated
position 91: A is tolerated
positions 123 and 156: L are tolerated
position 225: A and S are tolerated

Changes that affect enzyme activity (demonstrated):
position 19: L to S
position 38: I to S
positions 39, 40, 42, 44, 45, 46 and further positions: details not legible in this copy.]

[FIG. 6 (Sheet 7 of 22): Group I Intron Encoded Endonucleases and Related Endonucleases. Table with columns for endonuclease, recognition sequence, cleavage site, and intron site. Legible entries: I-Sce IV (Saccharomyces mitochondria); I-Sce I (Saccharomyces mitochondria); I-Ceu I (Chlamydomonas chloroplast); I-Ppo I (Physarum nucleus); I-Sce II (Saccharomyces mitochondria); I-Cre I (Chlamydomonas chloroplast); Endo. Sce I (Saccharomyces mitochondria, non intronic); HO (Saccharomyces nucleus, non intronic); I-Csm I (Chlamydomonas mitochondria, putative endonuclease); I-Pan I (Podospora mitochondria, putative endonuclease); an entry from bacteriophage T4. The sequences are not legible in this copy.]

[FIG. 7 (Sheet 8 of 22): Expression Vectors. Plasmid maps labelled pPEX 408 and pPEX7, showing EcoRI sites, GAL CYC, and 2 micron regions.]

[FIG. 8 (Sheet 8 of 22): restriction map of pRSV I-SceI, 5514 bp, showing the RSV LTR, the I-SceI coding region, and the SV40 poly A region. Site coordinates are not reliably legible in this copy.]

[FIG. 9 (Sheet 9 of 22): restriction map of pAF100, 2827 base pairs. Site coordinates are not reliably legible in this copy.]

[FIGS. 10A and 10B (Sheets 10 and 11 of 22): nucleotide sequence of regions of pAF100 (approximately positions 2240 to 2480) with restriction sites, including the I-SceI site, marked above the sequence and their coordinates listed below. The sequence and coordinates are not reliably legible in this copy.]

[FIG. 11 (Sheet 12 of 22): insertion vectors. Names: pTSmω, pTKmω, pTTcω. Map labels: mob RP4; Km, Sm, Tc; tnp; I-SceI site. Construction: pCP 704 from De Lorenzo, with transposase gene and insertion of the I-SceI linker in the unique NotI site.]

[FIG. 12 (Sheet 13 of 22): Construction: plasmid from J. D. Boeke (name not legible in this copy) with insertion of an I-SceI-NotI linker in BamHI.]

[FIG. 13 (Sheet 13 of 22): restriction map of pMLV LTR SAPLZ, 9685 bp, including the PhleoLacZ region and the I-SceI site. Site coordinates are not reliably legible in this copy.]

[FIGS. 14A and 14B (Sheets 14 and 15 of 22): gel and hybridization images. FIG. 14B label: left end probe, cosmid pUKG040.]

[FIGS. 15A to 15D (Sheets 16 and 17 of 22): bar diagrams of chromosome XI fragments for the transgenic strains A302, D304, E40, G41, H81, M57 and T62, with fragment sizes in kb. The individual sizes are not reliably legible in this copy.]

[FIG. 15E (Sheet 17 of 22): I-SceI map with cosmids pEKG100, pEKG090, pUKG151, pEKG119, pEKG021, pUKG046, pEKG019, pEKG121, pEKG013, pUKG066, pUKG146 and pUKG047.]

[FIGS. 16A to 16C (Sheet 18 of 22): hybridization images; labels not legible in this copy.]

[FIG. 17 (Sheets 19 to 21 of 22): hybridization images; only the panel labels FIG. 17E, FIG. 17F and FIG. 17H survive in this copy.]

[FIG. 19A (Sheet 22 of 22): diagram involving TIF2 and URA3 on chromosome XI. Legible labels: repair using ectopic duplication; deletion and gene conversion.]

[FIG. 19B (Sheet 22 of 22). Legible labels: repair using homologous sequence on plasmid; deletion of URA3 and replacement; integration.]

NUCLEOTIDE SEQUENCE ENCODING THE ENZYME I-SceI AND THE USES THEREOF

This is a continuation-in-part of application Ser. No. 07/879,689, filed May 5, 1992, abandoned. The entire disclosure of the prior application is relied upon and incorporated herein by reference.

BACKGROUND OF THE INVENTION

This invention relates to a nucleotide sequence that encodes the restriction endonuclease I-SceI. This invention also relates to vectors containing the nucleotide sequence, cells transformed with the vectors, transgenic animals based on the vectors, and cell lines derived from cells in the animals. This invention also relates to the use of I-SceI for mapping eukaryotic genomes and for in vivo site directed recombination.

The ability to introduce genes into the germ line of mammals is of great interest in biology. The propensity of mammalian cells to take up exogenously added DNA and to express genes included in the DNA has been known for many years. The results of gene manipulation are inherited by the offspring of these animals. All cells of these offspring inherit the introduced gene as part of their genetic make-up. Such animals are said to be transgenic.

Transgenic mammals have provided a means for studying gene regulation during embryogenesis and in differentiation, for studying the action of genes, and for studying the intricate interaction of cells in the immune system. The whole animal is the ultimate assay system for manipulated genes, which direct complex biological processes.

Transgenic animals can provide a general assay for functionally dissecting DNA sequences responsible for tissue specific or developmental regulation of a variety of genes. In addition, transgenic animals provide useful vehicles for expressing recombinant proteins and for generating precise animal models of human genetic disorders.

For a general discussion of gene cloning and expression in animals and animal cells, see Old and Primrose, "Principles of Gene Manipulation," Blackwell Scientific Publications, London (1989), page 255 et seq.

Transgenic lines, which have a predisposition to specific diseases and genetic disorders, are of great value in the investigation of the events leading to these states. It is well known that the efficacy of treatment of a genetic disorder may be dependent on identification of the gene defect that is the primary cause of the disorder. The discovery of effective treatments can be expedited by providing an animal model that will lead to the disease or disorder, which will enable the study of the efficacy, safety, and mode of action of treatment protocols, such as genetic recombination.

One of the key issues in understanding genetic recombination is the nature of the initiation step. Studies of homologous recombination in bacteria and fungi have led to the proposal of two types of initiation mechanisms. In the first model, a single-strand nick initiates strand assimilation and branch migration (Meselson and Radding 1975). Alternatively, a double-strand break may occur, followed by a repair mechanism that uses an uncleaved homologous sequence as a template (Resnick and Martin 1976). This latter model has gained support from the fact that integrative transformation in yeast is dramatically increased when the transforming plasmid is linearized in the region of chromosomal homology (Orr-Weaver, Szostak and Rothstein 1981) and from the direct observation of a double-strand break during mating type interconversion of yeast (Strathern et al. 1982). Recently, double-strand breaks have also been characterized during normal yeast meiotic recombination (Sun et al. 1989; Alani, Padmore and Kleckner 1990).

Several double-strand endonuclease activities have been characterized in yeast: HO and intron encoded endonucleases are associated with homologous recombination functions, while others still have unknown genetic functions (Endo-SceI, Endo-SceII) (Shibata et al. 1984; Morishima et al. 1990). The HO site-specific endonuclease initiates mating-type interconversion by making a double-strand break near the YZ junction of MAT (Kostriken et al. 1983). The break is subsequently repaired using the intact HML or HMR sequences, resulting in ectopic gene conversion. The HO recognition site is a degenerate 24 bp non-symmetrical sequence (Nickoloff, Chen, and Heffron 1986; Nickoloff, Singer and Heffron 1990). This sequence has been used as a "recombinator" in artificial constructs to promote intra- and intermolecular mitotic and meiotic recombination (Nickoloff, Chen and Heffron 1986; Kolodkin, Klar and Stahl 1986; Ray et al. 1988; Rudin and Haber 1988; Rudin, Sugarman, and Haber 1989).

The two site-specific endonucleases, I-SceI (Jacquier and Dujon 1985) and I-SceII (Delahodde et al. 1989; Wenzlau et al. 1989), that are responsible for intron mobility in mitochondria, initiate a gene conversion that resembles the HO-induced conversion (see Dujon 1989 for review). I-SceI, which is encoded by the optional intron ScLSU.1 of the 21S rRNA gene, initiates a double-strand break at the intron insertion site (Macreadie et al. 1985; Dujon et al. 1985; Colleaux et al. 1986). The recognition site of I-SceI extends over an 18 bp non-symmetrical sequence (Colleaux et al. 1988). Although the two proteins are not obviously related by their structure (HO is 586 amino acids long while I-SceI is 235 amino acids long), they both generate 4 bp staggered cuts with 3'OH overhangs within their respective recognition sites. It has been found that a mitochondrial intron-encoded endonuclease, transcribed in the nucleus and translated in the cytoplasm, generates a double-strand break at a nuclear site. The repair events induced by I-SceI are identical to those initiated by HO.

In summary, there exists a need in the art for reagents and methods for providing transgenic animal models of human diseases and genetic disorders. The reagents can be based on the enzyme I-SceI and the gene encoding this enzyme. In particular, there exists a need for reagents and methods for replacing a natural gene with another gene that is capable of alleviating the disease or genetic disorder.

SUMMARY OF THE INVENTION

Accordingly, this invention aids in fulfilling these needs in the art. Specifically, this invention relates to an isolated DNA encoding the enzyme I-SceI. The DNA has the following nucleotide sequence:

(SEQ ID NO:1) nucleotide sequence, with (SEQ ID NO:2) amino acid sequence beneath each row

      ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG  2670
      M   H   M   K   N   I   K   K   N   Q   V   M    12

2671  AAC CTC GGT CCG AAC TCT AAA CTG CTG AAA GAA TAC AAA TCC CAG CTG ATC GAA CTG AAC  2730
13    N   L   G   P   N   S   K   L   L   K   E   Y   K   S   Q   L   I   E   L   N    32

2731  ATC GAA CAG TTC GAA GCA GGT ATC GGT CTG ATC CTG GGT GAT GCT TAC ATC CGT TCT CGT  2790
33    I   E   Q   F   E   A   G   I   G   L   I   L   G   D   A   Y   I   R   S   R    52

2791  GAT GAA GGT AAA ACC TAC TGT ATG CAG TTC GAG TGG AAA AAC AAA GCA TAC ATG GAC CAC  2850
53    D   E   G   K   T   Y   C   M   Q   F   E   W   K   N   K   A   Y   M   D   H    72

2851  GTA TGT CTG CTG TAC GAT CAG TGG GTA CTG TCC CCG CCG CAC AAA AAA GAA CGT GTT AAC  2910
73    V   C   L   L   Y   D   Q   W   V   L   S   P   P   H   K   K   E   R   V   N    92

2911  CAC TCG GGT AAC CTG GTA ATC ACC TGG GGC GCC CAG ACT TTC AAA CAC CAA GCT TTC AAC  2970
93    H   L   G   N   L   V   I   T   W   G   A   Q   T   F   K   H   Q   A   F   N    112

2971  AAA CTG GCT AAC CTG TTC ATC GTT AAC AAC AAA AAA ACC ATC CCG AAC AAC CTG GTT GAA  3030
113   K   L   A   N   L   F   I   V   N   N   K   K   T   I   P   N   N   L   V   E    132

3031  AAC TAC CTG ACC CCG ATG TCT CTG GCA TAC TGG TTC ATG GAT GAT GGT GGT AAA TGG GAT  3090
133   N   Y   L   T   P   M   S   L   A   Y   W   F   M   D   D   G   G   K   W   D    152

3091  TAC AAC AAA AAC TCT ACC AAC AAA TCG ATC GTA CTG AAC ACC CAG TCT TTC ACT TTC GAA  3150
153   Y   N   K   N   S   T   N   K   S   I   V   L   N   T   Q   S   F   T   F   E    172

3151  GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA  3210
173   E   V   E   Y   L   V   K   G   L   R   N   K   F   Q   L   N   C   Y   V   K    192

3211  ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC  3270
193   I   N   K   N   K   P   I   I   Y   I   D   S   M   S   Y   L   I   F   Y   N    212

3271  CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC  3330
213   L   I   K   P   Y   L   I   P   Q   M   M   Y   K   L   P   N   T   I   S   S    232

3331  GAA ACT TTC CTG AAA TAA
233   E   T   F   L   K   *
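The pairing of codons and amino acids in the listing above can be illustrated with a short translation routine. The following is a minimal sketch and not part of the disclosure: the function name `translate` and the dictionary names are chosen here for illustration, the standard table is written out in the conventional TCAG order, and the yeast mitochondrial deviations shown (TGA as Trp, ATA as Met, CTN as Thr) are an assumption taken from general reference tables rather than from this text.

```python
"""Translate a codon-spaced DNA string (illustrative sketch only)."""

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard, "universal" code).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    b1 + b2 + b3: STANDARD_AA[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumption: commonly cited yeast mitochondrial deviations from the
# standard code. They show why a mitochondrial reading frame is not
# read the same way by E. coli or by the yeast nucleus.
YEAST_MITO_CODE = dict(STANDARD_CODE)
YEAST_MITO_CODE.update(
    {"TGA": "W", "ATA": "M", "CTT": "T", "CTC": "T", "CTA": "T", "CTG": "T"}
)


def translate(dna, code=STANDARD_CODE):
    """Return the one-letter protein string for a DNA coding sequence.

    Whitespace between codons is ignored; '*' marks a stop codon.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    first_row = "ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG"
    last_row = "GAA ACT TTC CTG AAA TAA"
    print(translate(first_row))
    print(translate(last_row))
    # The same codons read differently under the two tables.
    print(translate("CTG TGA ATA"), translate("CTG TGA ATA", YEAST_MITO_CODE))
```

Passing a different dictionary as `code` is the only change needed to compare the two readings of the same codons.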

35 This invention also relates to a DNA sequence comprising FIG. 2 depicts the nucleotide sequence of the invention a promoter operatively linked to the DNA sequence of the encoding the enzyme I-SceI and the amino acid sequence of invention encoding the enzyme I-SceI. the natural I-SceI enzyme (SEQED NOs: 5 and 2). This invention further relates to an isolated RNA comple FIG. 3 depicts the I-Sce recognition sequence and indi mentary to the DNA sequence of the invention encoding the 40 cates possible base in the recognition site and the enzyme I-Sce and to the other DNA sequences described effect of such mutations on stringency of recognition. (SEQ herein. ID NOs: 6, 7 and 8) In another embodiment of the invention, a vector is FIG. 4 is the nucleotide sequence and deduced amino acid provided. The vector comprises a plasmid, bacteriophage, or sequence of a region of plasmid pSCM525. The nucleotide cosmid vector containing the DNA sequence of the inven 45 sequence of the invention encoding the enzyme I-SceI is tion encoding the enzyme I-SceI. enclosed in the box. (SEQ ID NOs: 9 through 16) FIG. S depicts variations around the amino acid sequence In addition, this invention relates to E. coli or eukaryotic of the enzyme I-SceI. (SEQID NO: 2) cells transformed with a vector of the invention. FIG. 6 shows Group I intron encoding endonucleases and Also, this invention relates to transgenic animals contain 50 related endonucleases. (SEQED NOs: 17 through 44) ing the DNA sequence encoding the enzyme I-SceI and cell FIG. 7 depicts yeast expression vectors containing the lines cultured from cells of the transgenic animals. synthetic gene for I-Sce. In addition, this invention relates to a transgenic organism FIG. 8 depicts the mammalian expression vector PRSV in which at least one restriction site for the enzyme I-SceI I-Sce. has been inserted in a of the organism. 55 FIG. 9 is a restriction map of the plasmid paF100. 
(See Further, this invention relates to a method of genetically also YEAST, 6:521-534, 1990, which is relied upon and mapping a eukaryotic genome using the enzyme I-SceI. incorporated by reference herein). This invention also relates to a method for in vivo site FIGS. 10A and 10B show the nucleotide sequence and directed recombination in an organism using the enzyme restriction sites of regions of the plasmid pAF100. (SEQID I-Sce. 60 NOs: 45 through 50) FIG. 11 depicts an insertion vectorpTSMo, pTKMao, and BRIEF DESCRIPTION OF THE DRAWINGS pTTc() containing the I-SceI site for E. coli and other This invention will be more fully described with reference bacteria. to the drawings in which: 65 FIG. 12 depicts an insertion vectorpTYW6 containing the FIG. 1 depicts the universal code equivalent of the mito I-SceI site for yeast. chondrial I-SceI gene (SEQID NOs: 3 and 4). FIG. 13 depicts an insertion vector PMLV LTR SAPLZ 5,474,896 S 6 containing the I-Sce site for mammalian cells. to the I-Sce map and allow comparison with the genetic FIG. 14 depicts a set of seven transgenic yeast strains map (top). cleaved by I-SceI. from FY1679 (control) FIG. 19 depicts diagrams of successful site directed and from seven transgenic yeast strains with I-Sce sites homologous recombination experiments performed in yeast. inserted at various positions along chromosome XI were treated with I-SceI. DNA was electrophoresed on 1% aga rose (SeaKem) gel in 0.25 XTBE buffer at 130 V and 12° C. on a Rotaphor apparatus (Biometra) for 70 hrs using 100 DETAILED DESCRIPTION OF THE sec to 40 sec decreasing pulse times. (A) DNA was stained PREFERRED EMBODIMENTS with ethidium bromide (0.2 g/ml) and transferred to a 10 Hybond N (Amersham) membrane for hybridization. (B) 'P The genuine mitochondrial gene (ref. 8) cannot be labelled cosmid pUKG040 which hybridizes with the short expressed in E. coli, yeast or other organisms due to the est fragment of the set was used as a probe. 
Positions of peculiarities of the mitochondrial . A “universal chromosome XI and shorter chromosomes are indicated. code equivalent' has been constructed by in vitro site FIG. 15 depicts the rationale of the nested chromosomal 15 directed mutagenesis. Its sequence is given in FIG. 1. Note fragmentation strategy for genetic mapping. (A) Positions of that all non-universal codons (except two CTN) have been I-Scel sites are placed on the map, irrespective of the replaced together with some codons extremely rare in E. left/right orientation (shorterfragments are arbitrarily placed coli. on the left). Fragment sizes as measured from PFGE (FIG. The universal code equivalent has been successfully 14A) are indicated in kb (note that the sum of the two 20 fragment sizes varies slightly due to the limit of precision of expressed in E. coli and determines the synthesis of an active each measurement). (B) Hydridization with the probe that enzyme. However, expression levels remained low due to hydridizes the shortest fragment of the set determines the the large number of codons that are extremely rare in E. coli. orientation of each fragment (see FIG. 14B). Fragments that Expression of the "universal code equivalent” has been hydridize with the probe (full lines) have been placed 25 detected in yeast. arbitrarily to the left. (C) Transgenic yeast strains have been To optimize gene expression in heterologous systems, a ordered with increasing sizes of hydridizing chromosome Synthetic gene has been designed to encode a protein with fragments. (D) Deduced I-Scel map with minimal and the genuine amino acid sequence of I-SceI using, for each maximal size of intervals indicated in kb (variations in some intervals are due to limitations of PFGE measurements). (E) 30 codon, that most frequently used in E. coli. The sequence of Chromosome subfragments are used as probes to assign the synthetic gene is given in FIG. 2. 
The synthetic gene was each cosmid clone to a given map interval or across a given constructed in vitro from eight synthetic I-Sce site. with partial overlaps. Oligonucleotides were designed to FIG. 16 depicts mapping of the I-SceI sites of transgenic allow mutual priming for second strand synthesis by Klenow yeast strains by hybridization with left end and right end 35 polymerase when annealed by pairs. The elongated pairs probes of chromosome XI. Chromosomes from FY1679 were then ligated into . Appropriately placed (control) and the seven transgenic yeast strains were restriction sites within the designed sequence allowed final digested with I-Sce. Transgenic strains were placed in order assembly of the synthetic gene by in vitro ligation. The as explained in FIG. 15. Electrophoresis conditions were as 40 synthetic gene has been successfully expressed in both E. in FIG. 14. 'P labelled cosmids pUKG040 and puKG066 coli and yeast. were used as left end and right end probes, respectively. 1. I-Sce Gene Sequence FIG. 17 depicts mapping of a cosmid collection using the This invention relates to an isolated DNA sequence nested chromosomal fragments as probes. Cosmid encoding the enzyme I-Scel. The enzyme I-Sce is an were digested with EcoRI and electrophoresed on 0.9% 45 agarose (SeaKem) gel at 1.5 V/cm for 14 hrs, stained with endonuclease. The properties of the enzyme (ref. 14) are as ethidium bromide and transferred to a Hybond N membrane. follows: Cosmids were placed in order from previous hybridizations I-Sce is a double-stranded endonuclease that cleaves to help visualize the strategy. Hybridizations were carried DNA within its recognition site. I-Sce generates a 4bp out serially on three identical membranes using left end 50 staggered cut with 3'OH overhangs. nested chromosome fragments purified on PFGE (see FIG. 16) as probes. A: ethidium bromide staining (ladder is the Substrate: Acts only on double-stranded DNA. 
BRL "1 kb ladder"), B: membrane #1, probe: Left tel to A302 site, C: membrane #1, probe: Left tel to M57 site, D: membrane #2, probe: Left tel to H81 site, E: membrane #2, probe: Left tel to T62 site, F: membrane #3, probe: Left tel to G41 site, G: membrane #3, probe: Left tel to D304 site, H: membrane #3, probe: entire chromosome XI.

FIG. 18 depicts a map of the yeast chromosome XI as determined from the nested chromosomal fragmentation strategy. The chromosome is divided into eight intervals (with sizes indicated in kb, see FIG. 15D) separated by seven I-SceI sites (E40, A302 . . . ). Cosmid clones falling either within intervals or across a given I-SceI site are listed below intervals or below interval boundaries, respectively. Cosmid clones that hybridize with selected genes used as probes are indicated by letters (a-i). They localize the gene with respect

Substrate: DNA can be relaxed or negatively supercoiled.

Cations: Enzymatic activity requires Mg++ (8 mM is optimum). Mn++ can replace Mg++, but this reduces the stringency of recognition.

Optimum conditions for activity: high pH (9 to 10), temperature 20-40° C., no monovalent cations.

Enzyme stability: I-SceI is unstable at room temperature. The enzyme-substrate complex is more stable than the enzyme alone (presence of recognition sites stabilizes the enzyme).

The enzyme I-SceI has a known recognition site (ref. 14). The recognition site of I-SceI is a non-symmetrical sequence that extends over 18 bp as determined by systematic mutational analysis. The sequence reads (arrows indicate cuts):

5' TAGGGATAACAGGGTAAT 3' (SEQ ID NO:51)
3' ATCCCTATTGTCCCATTA 5' (SEQ ID NO:52)

The recognition site corresponds, in part, to the upstream exon and, in part, to the downstream exon of the intron plus form of the gene.

The recognition site is partially degenerate: single base substitutions within the 18 bp long sequence result in either complete insensitivity or reduced sensitivity to the enzyme, depending upon position and nature of the substitution.

The stringency of recognition has been measured on:
1-mutants of the site.
2-the total yeast genome (Saccharomyces cerevisiae, genome complexity is 1.4x10^7 bp). Data are unpublished.

Results are:
1-Mutants of the site: As shown in FIG. 3, there is a general shifting of stringency, i.e., mutants severely affected in Mg++ become partially affected in Mn++, mutants partially affected in Mg++ become unaffected in Mn++.
2-Yeast: In magnesium conditions, no cleavage is observed in normal yeast. In the same condition, DNA from transgenic yeast is cleaved to completion at the artificially inserted I-SceI site and no other cleavage site can be detected. If magnesium is replaced by manganese, five additional cleavage sites are revealed in the entire yeast genome, none of which is cleaved to completion. Therefore, in manganese the enzyme reveals an average of 1 site for ca. 3 million base pairs (5/1.4x10^7 bp).

Definition of the recognition site: important bases are indicated in FIG. 3. They correspond to bases for which severely affected mutants exist. Notice however that:
1-All possible mutations at each position have not been determined; therefore a base that does not correspond to a severely affected mutant may still be important if another mutant was examined at this very same position.
2-There is no clear-cut limit between a very important base (all mutants are severely affected) and a moderately important base (some of the mutants are severely affected). There is a continuum between excellent substrates and poor substrates for the enzyme.

The expected frequency of natural I-SceI sites in a random DNA sequence is, therefore, equal to (0.25)^18 or (1.5x10^-11). In other words, one should expect one natural site for the equivalent of ca. 20 human genomes, but the frequency of degenerate sites is more difficult to predict.

I-SceI belongs to a "degenerate" subfamily of the two dodecapeptide family. Conserved amino acids of the dodecapeptide motifs are required for activity. In particular, the aspartic residues at positions 9 of the two dodecapeptides cannot be replaced, even with glutamic residues. It is likely that the dodecapeptides form the catalytic site or part of it.

-ing of the enzyme to the downstream half of the site (corresponding to the downstream exon) followed by binding of the enzyme to the upstream half of the site (corresponding to the upstream exon). The first binding is strong, the second is weaker, but the two are necessary for cleavage of DNA. In vitro, the enzyme can bind the downstream exon alone as well as the intron-exon junction sequence, but no cleavage results.

The evolutionarily conserved dodecapeptide motifs of intron-encoded I-SceI are essential for endonuclease activity. It has been proposed that the role of these motifs is to properly position the acidic amino acids with respect to the DNA sequence recognition domains of the enzyme for the catalysis of phosphodiester bond hydrolysis (ref. P3).

The nucleotide sequence of the invention, which encodes the natural I-SceI enzyme, is shown in FIG. 2. The nucleotide sequence of the gene of the invention was derived by dideoxynucleotide sequencing. The base sequences of the nucleotides are written in the 5'->3' direction. Each of the letters shown is a conventional designation for the following nucleotides:
A Adenine
G Guanine
T Thymine
C Cytosine.

It is preferred that the DNA sequence encoding the enzyme I-SceI be in a purified form. For instance, the sequence can be free of human blood-derived proteins, human serum proteins, viral proteins, nucleotide sequences encoding these proteins, human tissue, human tissue components, or combinations of these substances. In addition, it is preferred that the DNA sequence of the invention is free of extraneous proteins and lipids, and adventitious microorganisms, such as bacteria and viruses. The essentially purified and isolated DNA sequence encoding I-SceI is especially useful for preparing expression vectors.

Plasmid pSCM525 is a pUC12 derivative, containing an artificial sequence encoding the DNA sequence of the invention. The nucleotide sequence and deduced amino acid sequence of a region of plasmid pSCM525 is shown in FIG. 4. The nucleotide sequence of the invention encoding I-SceI is enclosed in the box. The artificial gene is a BamHI-SalI piece of DNA sequence of 723 base pairs, chemically synthesized and assembled. It is placed under tac promoter control. The DNA sequence of the artificial gene differs from the natural coding sequence or its universal code equivalent described in Cell (1986), Vol. 44, pages 521-533. However, the translation product of the artificial gene is identical in sequence to the genuine omega-endonuclease except for the addition of a Met-His at the N-terminus. It will be understood that this modified endonuclease is within the scope of this invention.

Plasmid pSCM525 can be used to transform any suitable E. coli strain and transformed cells become ampicillin resistant. Synthesis of the omega-endonuclease is obtained by addition of I.P.T.G. or an equivalent inducer of the lactose operon system.

A plasmid identified as pSCM525 containing the enzyme I-SceI was deposited in E. coli strain TG1 with the Collection Nationale de Cultures de Microorganismes (C.N.C.M.) of Institut Pasteur in Paris, France on Nov. 22, 1990, under culture collection deposit Accession No. I-1014. The nucleotide sequence of the invention is thus available from this deposit.
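The site-frequency estimates quoted above for the 18 bp recognition site can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming equal base frequencies (0.25 each) as the text does; the human genome size of 3x10^9 bp is an assumption added here for the comparison and is not stated in the text.

```python
# Recognition site as printed in the text (SEQ ID NO:51).
SITE = "TAGGGATAACAGGGTAAT"

YEAST_GENOME_BP = 1.4e7   # genome complexity quoted in the text
HUMAN_GENOME_BP = 3.0e9   # assumption: rough haploid human genome size, not from the text
MN_EXTRA_SITES = 5        # additional sites cleaved in manganese, from the text


def exact_site_frequency(site_length: int, base_probability: float = 0.25) -> float:
    """Probability that a random position starts an exact copy of the site."""
    return base_probability ** site_length


# Exact 18 bp match in random sequence: (0.25)^18.
frequency = exact_site_frequency(len(SITE))

# Average number of base pairs scanned per natural site, and the same
# figure expressed in (assumed) human genome equivalents.
bp_per_natural_site = 1.0 / frequency
human_genomes_per_site = bp_per_natural_site / HUMAN_GENOME_BP

# Relaxed stringency in manganese: five sites in the whole yeast genome.
bp_per_mn_site = YEAST_GENOME_BP / MN_EXTRA_SITES

print(f"exact-site frequency:      {frequency:.2e}")
print(f"human genomes per site:    {human_genomes_per_site:.1f}")
print(f"bp per site in manganese:  {bp_per_mn_site:.2e}")
```

Under these assumptions the numbers agree with the text's rounded figures: about 1.5x10^-11 per position, roughly twenty human genome equivalents per natural site, and about one site per 3 million base pairs in manganese.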
Consistent with the recognition site being non-symmetrical, it is likely that the endonucleolytic activity of I-SceI requires two successive recognition steps: bind-

The gene of the invention can also be prepared by the formation of 3'->5' phosphate linkages between nucleoside units using conventional chemical synthesis techniques. For example, the well-known phosphodiester, phosphotriester, and phosphite triester techniques, as well as known modifications of these approaches, can be employed. Deoxyribonucleotides can be prepared with automatic synthesis machines, such as those based on the phosphoramidite approach. Oligo- and polyribonucleotides can also be obtained with the aid of RNA ligase using conventional techniques.

This invention of course includes variants of the DNA sequence of the invention exhibiting substantially the same properties as the sequence of the invention. By this it is meant that DNA sequences need not be identical to the sequence disclosed herein. Variations can be attributable to single or multiple base substitutions, deletions, or insertions or local mutations involving one or more nucleotides not substantially detracting from the properties of the DNA sequence as encoding an enzyme having the cleavage properties of the enzyme I-SceI.

FIG. 5 depicts some of the variations that can be made around the I-SceI amino acid sequence. It has been demonstrated that the following positions can be changed without affecting enzyme activity:
positions -1 and -2 are not natural. The two amino acids are added due to cloning strategies.
positions 1 to 10: can be deleted.
position 36: G is tolerated.
position 40: M or V are tolerated.
position 41: S or N are tolerated.
position 43: A is tolerated.
position 46: V or N are tolerated.
position 91: A is tolerated.
positions 123 and 156: L is tolerated.
position 223: A and S are tolerated.

It will be understood that enzymes containing these modifications are within the scope of this invention.

Changes to the amino acid sequence in FIG. 5 that have been demonstrated to affect enzyme activity are as follows:
position 19: L to S
position 38: I to S or N
position 39: G to D or R
position 40: L to Q
position 42: L to R
position 44: D to E, G or H
position 45: A to E or D
position 46: Y to D
position 47: I to R or N
position 80: L to S
position 144: D to E
position 145: D to E
position 146: G to E
position 147: G to S

It will also be understood that the present invention is intended to encompass fragments of the DNA sequence of the invention in purified form, where the fragments are capable of encoding enzymatically active I-SceI.

The DNA sequence of the invention coding for the enzyme I-SceI can be amplified in the well known polymerase chain reaction (PCR), which is useful for amplifying all or specific regions of the gene. See e.g., S. Kwok et al., J. Virol., 61:1690-1694 (1987); U.S. Pat. No. 4,683,202; and U.S. Pat. No. 4,683,195.

DNA to be amplified can be prepared by well known techniques for the synthesis of oligonucleotides. One end of each primer can be extended and modified to create restriction endonuclease sites when the primer is annealed to the DNA. The PCR reaction mixture can contain the DNA, the DNA primer pairs, four deoxyribonucleoside triphosphates, MgCl2, DNA polymerase, and conventional buffers. The DNA can be amplified for a number of cycles. It is generally possible to increase the sensitivity of detection by using a multiplicity of cycles, each cycle consisting of a short period of denaturation of the DNA at an elevated temperature, cooling of the reaction mixture, and polymerization with the DNA polymerase. Amplified sequences can be detected by the use of a technique termed oligomer restriction (OR). See, R. K. Saiki et al., Bio/Technology 3:1008-1012 (1985).

The enzyme I-SceI is one of a number of endonucleases with similar properties. Following is a listing of related enzymes and their sources.

Group I intron encoded endonucleases and related enzymes are listed below with references. Recognition sites are shown in FIG. 6.

Enzyme      Encoded by              Ref
I-SceI      Sc LSU-1 intron         this work
I-SceII     Sc cox1-4 intron        Sargueil et al., NAR (1990) 18, 5659-5665
I-SceIII    Sc cox1-3 intron        Sargueil et al., MGG (1991) 225, 340-341
I-SceIV     Sc cox1-5a intron       Seraphin et al. (1992) in press
I-CeuI      Ce LSU-5 intron         Marshall, Lemieux, Gene (1991) 104, 241-245
I-CreI      Cr LSU-1 intron         Rochaix (unpublished)
I-PpoI      Pp LSU-3 intron         Muscarella et al., MCB (1990) 10, 3386-3396
I-TevI      T4 td-1 intron          Chu et al., PNAS (1990) 87, 3574-3578 and Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevII     T4 sunY intron          Bell-Pedersen et al., NAR (1990) 18, 3763-3770
I-TevIII    RB3 nrdB-1 intron       Eddy, Gold, Genes Dev. (1991) 5, 1032-1041
HO          HO yeast gene           Nickoloff et al., MCB (1990) 10, 1174-1179
Endo SceI   RF3 yeast mito. gene    Kawasaki et al., JBC (1991) 266, 5342-5347

Putative new enzymes (genetic evidence but no activity as yet) are I-CsmI from cytochrome b intron 1 of Chlamydomonas smithii mitochondria (ref. 15), I-PanI from cytochrome b intron 3 of Podospora anserina mitochondria (Jill Salvo), and probably enzymes encoded by introns Nc nd1-1 and Nc cob-1 from Neurospora crassa.

The I-endonucleases can be classified as follows:

Class I: Two dodecapeptide motifs, 4 bp staggered cut with 3' OH overhangs, cut internal to recognition site
Subclass "I-SceI": I-SceI, I-SceIV, I-CsmI, I-PanI
Other subclasses: I-SceII, I-SceIII, I-CeuI (only one dodecapeptide motif), I-CreI (only one dodecapeptide motif), HO, TFP1-408 (HO homolog), Endo SceI
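The two substitution lists given above for FIG. 5 lend themselves to a small lookup table. The sketch below is illustrative only: the function name and return strings are hypothetical, the data are transcribed from the lists in the text, and the entry for position 40 ("L to Q") is an assumed reading of a character that is unclear in the source.

```python
# Substitutions reported in the text as tolerated (activity retained).
TOLERATED = {
    36: {"G"}, 40: {"M", "V"}, 41: {"S", "N"}, 43: {"A"},
    46: {"V", "N"}, 91: {"A"}, 123: {"L"}, 156: {"L"}, 223: {"A", "S"},
}

# Substitutions reported in the text as affecting enzyme activity.
# Position 40 "Q" is an assumed reading of an unclear character.
AFFECTING = {
    19: {"S"}, 38: {"S", "N"}, 39: {"D", "R"}, 40: {"Q"}, 42: {"R"},
    44: {"E", "G", "H"}, 45: {"E", "D"}, 46: {"D"}, 47: {"R", "N"},
    80: {"S"}, 144: {"E"}, 145: {"E"}, 146: {"E"}, 147: {"S"},
}


def classify_substitution(position: int, new_residue: str) -> str:
    """Report what the lists say about one single-residue change."""
    residue = new_residue.upper()
    if residue in TOLERATED.get(position, set()):
        return "tolerated"
    if residue in AFFECTING.get(position, set()):
        return "affects activity"
    # The text stresses that not every substitution was tested.
    return "not reported"


for pos, res in [(40, "M"), (40, "Q"), (144, "E"), (100, "A")]:
    print(pos, res, classify_substitution(pos, res))
```

Positions 40 and 46 appear in both lists with different replacement residues, which is why the lookup is keyed on the residue as well as the position. Deletion of positions 1 to 10 and the added residues at -1 and -2 are not single substitutions and are left out of the table.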
More particularly, DNA primer pairs of known sequence positioned 10-300 base pairs apart that are complementary to the plus and minus strands of the

Class II: GIY-(N)YIG motif, 2 bp staggered cut with 3' OH overhangs, cut external to recognition site:
I-TevI

Class III: no typical structural motifs, 4 bp staggered cut with 3' OH overhangs, cut internal to recognition site:
I-PpoI

Class IV: no typical structural motifs, 2 bp staggered cut with 3' OH overhangs, cut external to recognition site:
I-TevII

Class V: no typical structural motifs, 2 bp staggered cut with 5' OH overhangs:
I-TevIII.

2. Nucleotide Probes Containing the I-SceI Gene of The Invention

The DNA sequence of the invention coding for the enzyme I-SceI can also be used as a probe for the detection of a nucleotide sequence in a biological material, such as tissue or body fluids. The probe can be labeled with an atom or inorganic radical, most commonly using a radionuclide, but also perhaps with a heavy metal. Radioactive labels include 32P, 3H, 14C, or the like. Any radioactive label can be employed, which provides for an adequate signal and has sufficient half-life. Other labels include ligands that can serve as a specific binding member to a labeled antibody, fluorescers, chemiluminescers, enzymes, antibodies which can serve as a specific binding pair member for a labeled ligand, and the like. The choice of the label will be governed by the effect of the label on the rate of hybridization and binding of the probe to the DNA or RNA. It will be necessary that the label provide sufficient sensitivity to detect the amount of DNA or RNA available for hybridization.

When the nucleotide sequence of the invention is used as a probe for hybridizing to a gene, the nucleotide sequence is preferably affixed to a water insoluble solid, porous support, such as nitrocellulose paper. Hybridization can be carried out using labeled polynucleotides of the invention and conventional hybridization reagents. The particular hybridization technique is not essential to the invention.

The amount of labeled probe present in the hybridization solution will vary widely, depending upon the nature of the label, the amount of the labeled probe which can reasonably bind to the support, and the stringency of the hybridization. Generally, substantial excesses of the probe over stoichiometric will be employed to enhance the rate of binding of the probe to the fixed DNA.

Various degrees of stringency of hybridization can be employed. The more severe the conditions, the greater the complementarity that is required for hybridization between the probe and the polynucleotide for duplex formation. Severity can be controlled by temperature, probe concentration, probe length, ionic strength, time, and the like. Conveniently, the stringency of hybridization is varied by changing the polarity of the reactant solution. Temperatures to be employed can be empirically determined or determined from well known formulas developed for this purpose.

3. Nucleotide Sequences Containing the Nucleotide Sequence Encoding I-SceI

This invention also relates to the DNA sequence of the invention encoding the enzyme I-SceI, wherein the nucleotide sequence is linked to other nucleic acids. The nucleic acid can be obtained from any source, for example, from plasmids, from cloned DNA or RNA, or from natural DNA or RNA from any source, including prokaryotic and eukaryotic organisms. DNA or RNA can be extracted from a biological material, such as biological fluids or tissue, by a variety of techniques including those described by Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (1982). The nucleic acid will generally be obtained from a bacteria, yeast, virus, or a higher organism, such as a plant or animal. The nucleic acid can be a fraction of a more complex mixture, such as a portion of a gene contained in whole human DNA or a portion of a nucleic acid sequence of a particular microorganism. The nucleic acid can be a fraction of a larger molecule or the nucleic acid can constitute an entire gene or assembly of genes. The DNA can be in a single-stranded or double-stranded form. If the fragment is in single-stranded form, it can be converted to double-stranded form using DNA polymerase according to conventional techniques.

The DNA sequence of the invention can be linked to a structural gene. As used herein, the term "structural gene" refers to a DNA sequence that encodes through its template or messenger mRNA a sequence of amino acids characteristic of a specific protein or polypeptide. The nucleotide sequence of the invention can function with an expression control sequence, that is, a DNA sequence that controls and regulates expression of the gene when operatively linked to the gene.

4. Vectors Containing the Nucleotide Sequence of the Invention

This invention also relates to cloning and expression vectors containing the DNA sequence of the invention coding for the enzyme I-SceI.

More particularly, the DNA sequence encoding the enzyme can be ligated to a vehicle for cloning the sequence. The major steps involved in gene cloning comprise procedures for separating DNA containing the gene of interest from prokaryotes or eukaryotes, cutting the resulting DNA fragment and the DNA from a cloning vehicle at specific sites, mixing the two DNA fragments together, and ligating the fragments to yield a recombinant DNA molecule. The recombinant molecule can then be transferred into a host cell, and the cells allowed to replicate to produce identical cells containing clones of the original DNA sequence.

The vehicle employed in this invention can be any double-stranded DNA molecule capable of transporting the nucleotide sequence of the invention into a host cell and capable of replicating within the cell. More particularly, the vehicle must contain at least one DNA sequence that can act as the origin of replication in the host cell. In addition, the vehicle must contain two or more sites for insertion of the DNA sequence encoding the gene of the invention. These sites will ordinarily correspond to restriction enzyme sites at which cohesive ends can be formed, and which are complementary to the cohesive ends on the promoter sequence to be ligated to the vehicle. In general, this invention can be carried out with plasmid, bacteriophage, or cosmid vehicles having these characteristics.

The nucleotide sequence of the invention can have cohesive ends compatible with any combination of sites in the vehicle. Alternatively, the sequence can have one or more blunt ends that can be ligated to corresponding blunt ends in the cloning sites of the vehicle. The nucleotide sequence to be ligated can be further processed, if desired, by successive exonuclease deletion, such as with the enzyme Bal 31. In the event that the nucleotide sequence of the invention does not contain a desired combination of cohesive ends, the sequence can be modified by adding a linker, an adaptor, or homopolymer tailing.

It is preferred that plasmids used for cloning nucleotide sequences of the invention carry one or more genes responsible for a useful characteristic, such as a selectable marker, displayed by the host cell. In a preferred strategy, plasmids having genes for resistance to two different drugs are chosen. For example, insertion of the DNA sequence into a gene for an antibiotic inactivates the gene and destroys drug resistance.
The second drug resistance gene is not affected when cells are transformed with the recombinants, and colonies containing the gene of interest can be selected by resistance to the second drug and susceptibility to the first drug. Preferred antibiotic markers are genes imparting [...], ampicillin, or tetracycline resistance to the host cell.

A variety of restriction enzymes can be used to cut the vehicle. The identity of the restriction enzyme will generally depend upon the identity of the ends on the DNA sequence to be ligated and the restriction sites in the vehicle. The restriction enzyme is matched to the restriction sites in the vehicle, which in turn is matched to the ends on the nucleic acid fragment being ligated.

The ligation reaction can be set up using well known techniques and conventional reagents. Ligation is carried out with a DNA ligase that catalyzes the formation of phosphodiester bonds between adjacent 5'-phosphate and the free 3'-hydroxy groups in DNA duplexes. The DNA ligase can be derived from a variety of microorganisms. The preferred DNA ligases are enzymes from E. coli and bacteriophage T4. T4 DNA ligase can ligate DNA fragments with blunt or sticky ends, such as those generated by restriction enzyme digestion. E. coli DNA ligase can be used to catalyze the formation of phosphodiester bonds between the termini of duplex DNA molecules containing cohesive ends.

Cloning can be carried out in prokaryotic or eukaryotic cells. The host for replicating the cloning vehicle will of course be one that is compatible with the vehicle and in which the vehicle can replicate. When a plasmid is employed, the plasmid can be derived from bacteria or some other organism or the plasmid can be synthetically prepared. The plasmid can replicate independently of the host cell chromosome or an integrative plasmid (episome) can be employed. The plasmid can make use of the DNA replicative enzymes of the host cell in order to replicate or the plasmid can carry genes that code for the enzymes required for plasmid replication. A number of different plasmids can be employed in practicing this invention.

The DNA sequence of the invention encoding the enzyme I-SceI can also be ligated to a vehicle to form an expression vector. The vehicle employed in this case is one in which it is possible to express the gene operatively linked to a promoter in an appropriate host cell. It is preferable to employ a vehicle known for use in expressing genes in E. coli, yeast, or mammalian cells. These vehicles include, for example, the following E. coli expression vectors:

pSCM525, which is an E. coli expression vector derived from pUC12 by insertion of a tac promoter and the synthetic gene for I-SceI. Expression is induced by IPTG.

pGEXω6, which is an E. coli expression vector derived from pGEX in which the synthetic gene from pSCM525 for I-SceI is fused with the glutathione S transferase gene, producing a hybrid protein. The hybrid protein possesses the endonuclease activity.

pDIC73, which is an E. coli expression vector derived from pET-3C by insertion of the synthetic gene for I-SceI (NdeI - BamHI fragment of pSCM525) under T7 promoter control. This vector is used in strain BL21 (DE3) which expresses the T7 RNA polymerase under IPTG induction.

pSCM351, which is an E. coli expression vector derived from pUR291 in which the synthetic gene for I-SceI is fused with the Lac Z gene, producing a hybrid protein.

pSCM353, which is an E. coli expression vector derived from pEX1 in which the synthetic gene for I-SceI is fused with the Cro/Lac Z gene, producing a hybrid protein.

Examples of yeast expression vectors are:

pPEX7, which is a yeast expression vector derived from pRP51-Bam O (a LEU2d derivative of pLG-SD5) by insertion of the synthetic gene under the control of the galactose promoter. Expression is induced by galactose.

pPEX408, which is a yeast expression vector derived from pLG-SD5 by insertion of the synthetic gene under the control of the galactose promoter. Expression is induced by galactose.

Several yeast expression vectors are depicted in FIG. 7.

Typical mammalian expression vectors are:

pRSV I-SceI, which is a pRSV derivative in which the synthetic gene (BamHI - PstI fragment from pSCM525) is under the control of the LTR promoter of Rous Sarcoma Virus. This expression vector is depicted in FIG. 8.

Vectors for expression in Chinese Hamster Ovary (CHO) cells can also be employed.

5. Cells Transformed with Vectors of the Invention

The vectors of the invention can be inserted into host organisms using conventional techniques. For example, the vectors can be inserted by transformation, transfection, electroporation, microinjection, or by means of liposomes (lipofection).

Cloning can be carried out in prokaryotic or eukaryotic cells. The host for replicating the cloning vehicle will of course be one that is compatible with the vehicle and in which the vehicle can replicate. Cloning is preferably carried out in bacterial or yeast cells, although cells of fungal, animal, and plant origin can also be employed. The preferred host cells for conducting cloning work are bacterial cells, such as E. coli. The use of E. coli cells is particularly preferred because most cloning vehicles, such as bacterial plasmids and bacteriophages, replicate in these cells.

In a preferred embodiment of this invention, an expression vector containing the DNA sequence encoding the nucleotide sequence of the invention operatively linked to a promoter is inserted into a mammalian cell using conventional techniques.

APPLICATION OF I-SceI FOR LARGE SCALE MAPPING

1. Occurrence of natural sites in various genomes

Using the purified I-SceI enzyme, the occurrence of natural or degenerate sites has been examined on the complete genomes of several species. No natural site was found in Saccharomyces cerevisiae, Bacillus anthracis, Borrelia burgdorferi, Leptospira biflexa and L. interrogans. One degenerate site was found on T7 phage DNA.

2. Insertion of artificial sites

Given the absence of natural I-SceI sites, artificial sites can be introduced by transformation or transfection. Two cases need to be distinguished: site-directed integration by homologous recombination and random integration by non-homologous recombination, transposon movement or retroviral infection. The first is easy in the case of yeast and a few bacterial species, more difficult for higher eucaryotes. The second is possible in all systems.

3. Insertion vectors

Two types can be distinguished:

1-Site specific cassettes that introduce the I-SceI site together with a selectable marker.

For yeast: all are pAF100 derivatives (Thierry et al. (1990) YEAST 6:521-534) containing the following marker genes:
pAF101: URA3 (inserted in the HindIII site)
pAF103: Neo (inserted in BglII site)
pAF104: HIS3 (inserted in BglII site)
pAF105: KanR (inserted in BglII site)
pAF106: KanR (inserted in BglII site)
pAF107: LYS2 (inserted between HindIII and EcoRV)

A restriction map of the plasmid pAF100 is shown in FIG. 9. The nucleotide sequence and restriction sites of regions of plasmid pAF100 are shown in FIGS. 10A and 10B.

-tion. Two sites were inserted within genetically defined genes, TIF1 and FAS1, the others were inserted at unknown positions in the chromosome from five non-overlapping cosmids of our library, taken at random. Agarose embedded
Many transgenic yeast strains with the I-SceI site at various and known places along chromosomes are available.

2-Vectors derived from transposable elements or retroviruses.

For E. coli and other bacteria: mini Tn5 derivatives containing the I-SceI site and
pTSmω Str
pTKmω Kan (See FIG. 11)
pTTcω Tet

For yeast: pTyω6 is a pD123 derivative in which the I-SceI site has been inserted in the LTR of the Ty element. (FIG. 12)

For mammalian cells:
pMLV LTR SAPLZ: containing the I-SceI site in the LTR of MLV and Phleo-LacZ (FIG. 13). This vector is first grown in ψ2 cells (3T3 derivative, from R. Mulligan). Two transgenic cell lines with the I-SceI site at undetermined locations in the genome are available: 1009 (pluripotent nerve cells, J. F. Nicolas) and D3 (ES cells able to generate transgenic animals).

4. The nested chromosomal fragmentation strategy

The nested chromosomal fragmentation strategy for genetically mapping a eukaryotic genome exploits the unique properties of the restriction endonuclease I-SceI, such as an 18 bp long recognition site. The absence of natural I-SceI recognition sites in most eukaryotic genomes is also exploited in this mapping strategy.

First, one or more I-SceI recognition sites are artificially inserted at various positions in a genome, by homologous recombination using specific cassettes containing selectable markers or by random insertion, as discussed supra. The genome of the resulting transgenic strain is then cleaved completely at the artificially inserted I-SceI site(s) upon incubation with the I-SceI restriction enzyme. The cleavage produces nested chromosomal fragments.

The chromosomal fragments are then purified and separated by pulsed field gel (PFG) electrophoresis, allowing one to "map" the position of the inserted site in the chromosome. If total DNA is cleaved with the restriction enzyme, each artificially introduced I-SceI site provides a unique "molecular milestone" in the genome. Thus, a set of transgenic strains, each carrying a single I-SceI site, can be created which defines physical genomic intervals between the milestones. Consequently, an entire genome, a chromosome or any segment of interest can be mapped using artificially introduced I-SceI restriction sites.

The nested chromosomal fragments may be transferred to a solid membrane and hybridized to a labelled probe containing DNA complementary to the DNA of the fragments. Based on the hybridization banding patterns that are observed, the eukaryotic genome may be mapped. The set of transgenic strains with appropriate "milestones" is used as a reference to map any new gene or clone by direct hybridization.

EXAMPLE 1
Application of the Nested Chromosomal Fragmentation Strategy to the Mapping of Yeast Chromosome XI

DNA of each of the seven transgenic strains was then digested with I-SceI and analyzed by pulsed field gel electrophoresis (FIG. 14A). The position of the I-SceI site of each transgenic strain in chromosome XI is first deduced from the fragment sizes without consideration of the left/right orientation of the fragments. Orientation was determined as follows. The most proximal I-SceI site from this set of strains is in the transgenic E40 because the 50 kb fragment is the shortest of all fragments (FIG. 15A). Therefore, the cosmid clone pUKG040, which was used to insert the I-SceI site in the transgenic E40, is now used as a probe against all chromosome fragments (FIG. 14B). As expected, pUKG040 lights up the two fragments from strain E40 (50 kb and 630 kb, respectively). The large fragment is close to the entire chromosome XI and shows a weak hybridization signal due to the fact that the insert of pUKG040, which is 38 kb long, contains less than 4 kb within the large chromosome fragment. Note that the entire chromosome XI remains visible after I-SceI digestion, due to the fact that the transgenic strains are diploids in which the I-SceI site is inserted in only one of the two homologs. Now, the pUKG040 probe hybridizes to only one fragment of all other transgenic strains allowing unambiguous left/right orientation of I-SceI sites (See FIG. 15B). No significant cross hybridization between the cosmid vector and the chromosome subfragment containing the I-SceI site insertion vector is visible. Transgenic strains can now be ordered such that I-SceI sites are located at increasing distances from the hybridizing end of the chromosome (FIG. 15C) and the I-SceI map can be deduced (FIG. 15D). Precision of the mapping depends upon PFGE resolution and optimal calibration. Note that actual left/right orientation of the chromosome with respect to the genetic map is not known at this step. To help visualize our strategy and to obtain more precise measurements of the interval sizes between I-SceI sites, a new pulsed field gel electrophoresis with the same transgenic strains now placed in order was made (FIG. 16). After transfer, the fragments were hybridized successively with cosmids pUKG040 and pUKG066 which light up, respectively, all fragments from the opposite ends of the chromosome (clone pUKG066 defines the right end of the chromosome as defined from the genetic map because it contains the SIR1 gene). A regular stepwise progression of chromosome fragment sizes is observed. Note some cross hybridization between the probe pUKG066 and chromosome III, probably due to some repetitive DNA sequences.

All chromosome fragments, taken together, now define physical intervals as indicated in FIG. 15D. The I-SceI map obtained has an 80 kb average resolution.

EXAMPLE 2
Application of the Nested Chromosomal Fragmentation Strategy to the Mapping of Yeast Artificial Chromosome (YAC) Clones

This strategy can be applied to YAC mapping with two possibilities.

1-insertion of the I-SceI site within the gene of interest using homologous recombination in yeast.
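The ordering step of Example 1 (placing the I-SceI sites at increasing distances from the hybridizing end of the chromosome and reading the interval sizes off the fragment lengths) can be sketched as follows. This is a minimal illustration, not the procedure used to build FIG. 15D. Only the E40 fragment sizes (50 kb and 630 kb) come from the text; the other six strain labels and fragment sizes are hypothetical placeholders, and the chromosome length is taken as the sum of the two E40 fragments.

```python
from typing import Dict, List, Tuple

# Chromosome length assumed from the two E40 fragments in the text: 50 kb + 630 kb.
CHROMOSOME_KB = 50 + 630

# Size (kb) of the fragment that hybridizes with the end probe, per strain.
# This equals the distance of the I-SceI site from that chromosome end.
# Only E40 is from the text; strain_2 to strain_7 are hypothetical values.
end_fragment_kb: Dict[str, int] = {
    "strain_5": 350,
    "E40": 50,
    "strain_7": 590,
    "strain_3": 200,
    "strain_2": 110,
    "strain_6": 470,
    "strain_4": 270,
}


def build_site_map(fragments: Dict[str, int], chromosome_kb: int) -> Tuple[List[str], List[int]]:
    """Order strains by site position and return the intervals between sites."""
    ordered = sorted(fragments.items(), key=lambda item: item[1])
    # Interval boundaries: chromosome end, each site in order, other end.
    boundaries = [0] + [size for _, size in ordered] + [chromosome_kb]
    intervals = [right - left for left, right in zip(boundaries, boundaries[1:])]
    return [name for name, _ in ordered], intervals


strain_order, intervals_kb = build_site_map(end_fragment_kb, CHROMOSOME_KB)
average_interval_kb = sum(intervals_kb) / len(intervals_kb)

print(strain_order)
print(intervals_kb)
print(average_interval_kb)
```

Seven sites always yield eight intervals, as the text states for chromosome XI. With real data, the fragment sizes would come from the calibrated pulsed field gel, so the precision of the intervals is limited by gel resolution, as noted above.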
This strategy has been applied to the mapping of yeast chromosome XI of Saccharomyces cerevisiae. The I-SceI site was inserted at 7 different locations along chromosome XI of the diploid strain FY1679, hence defining eight physical intervals in that chromosome. Sites were inserted from a URA3-1-I-SceI cassette by homologous recombina-

This permits mapping of that gene in the YAC insert by I-SceI digestion in vitro. This has been done and works.

2-random integration of I-SceI sites along the YAC insert by homologous recombination in yeast using highly repetitive sequences (e.g., B2 in mouse or Alu in human). Transgenic strains are then used as described in ref. P1 to sort libraries or map genes.

The procedure has now been extended to YAC containing 450 kb of Mouse DNA. To this end, a repeated sequence of mouse DNA (called B2) has been inserted in a plasmid containing the I-SceI site and a selectable yeast marker (LYS2). Transformation of the yeast cells containing the recombinant YAC with the plasmid linearized within the B2 sequence resulted in the integration of the I-SceI site at five different locations distributed along the mouse DNA insert. Cleavage at the inserted I-SceI sites using the enzyme has been successful, producing nested fragments that can be purified after electrophoresis. Subsequent steps of the protocol exactly parallel the procedure described in Example 1.

EXAMPLE 3
Application of Nested Chromosomal Fragments to the Direct Sorting of Cosmid Libraries

The nested chromosomal fragments can be purified from preparative PFG and used as probes against clones from a chromosome XI specific sublibrary. This sublibrary is composed of 138 cosmid clones (corresponding to eight times coverage) which have been previously sorted from our complete yeast genomic libraries by colony hybridization with PFG purified chromosome XI. This collection of unordered clones has been sequentially hybridized with chromosome fragments taken in order of increasing sizes from the left end of the chromosome. Localization of each cosmid clone on the I-SceI map could be unambiguously determined from such hybridizations. To further verify the results and to provide a more precise map, a subset of all cosmid clones, now placed in order, have been digested with EcoRI, electrophoresed and hybridized with the nested series of chromosome fragments in order of increasing sizes from the left end of the chromosome. Results are given in FIG. 17.

For a given probe, two cases can be distinguished: cosmid clones in which all EcoRI fragments hybridize with the probe and cosmid clones in which only some of the EcoRI fragments hybridize (i.e., compare pEKG100 to pEKG098 in FIG. 17b). The first category corresponds to clones in which the insert is entirely included in one of the two chromosome fragments, the second to clones in which the insert overlaps an I-SceI site. Note that, for clones of the pEKG series, the EcoRI fragment of 8 kb is entirely composed of vector sequences (pWE15) that do not hybridize with the chromosome fragments. In the case where the chromosome fragment possesses the integration vector, a weak cross hybridization with the cosmid is observed (FIG. 17e).

Examination of FIG. 17 shows that the cosmid clones can unambiguously be ordered with respect to the I-SceI map (FIG.

Partial restriction mapping has been done on yeast DNA and on mammalian cell DNA using the commercial enzyme I-SceI. DNA from cells containing an artificially inserted I-SceI site is first cleaved to completion by I-SceI. The DNA is then treated under partial cleavage conditions with bacterial restriction endonucleases of interest (e.g., BamHI) and electrophoresed along with size calibration markers. The DNA is transferred to a membrane and hybridized successively using the short sequences flanking the I-SceI sites on either side (these sequences are known because they are part of the original insertion vector that was used to introduce the I-SceI site). Autoradiography (or other equivalent detection system using non radioactive probes) permits the visualization of ladders, which directly represent the succession of the bacterial restriction endonuclease sites from the I-SceI site. The size of each band of the ladder is used to calculate the physical distance between the successive bacterial restriction endonuclease sites.

APPLICATION OF I-SceI FOR IN VIVO SITE DIRECTED RECOMBINATION

1. Expression of I-SceI in yeast

The synthetic I-SceI gene has been placed under the control of a galactose inducible promoter on multicopy plasmids pPEX7 and pPEX408. Expression is correct and induces effects on site as indicated below. A transgenic yeast with the I-SceI synthetic gene inserted in a chromosome under the control of an inducible promoter can be constructed.

2. Effects of site specific double strand breaks in yeast (refs. 18 and P4)

Effects on plasmid-borne I-SceI sites:
Intramolecular effects are described in detail in Ref. 18. Intermolecular (plasmid to chromosome) recombination can be predicted.

Effects on chromosome integrated I-SceI sites:
In a haploid cell, a single break within a chromosome at an artificial I-SceI site results in cell division arrest followed by death (only a few % of survival). Presence of an intact sequence homologous to the cut site results in repair and 100% cell survival. In a diploid cell, a single break within a chromosome at an artificial I-SceI site results in repair using the chromosome homolog and 100% cell survival. In both cases, repair of the induced double strand break results in loss of heterozygosity with deletion of the non homologous sequences flanking the cut and insertion of the non homologous sequences from the donor DNA molecule.

3. Application for in vivo recombination YACs in Yeast

Construction of a YAC vector with the I-SceI restriction
13E), each clone falling either in a defined interval or site next to the cloning site should permit one to induce across an I-Sce site. In addition, clones from the second homologous recombination with another YAC if inserts are category allow us to place some EcoRI fragments on the partially overlapping. This is useful for the construction of I-SceI maps, while others remain unordered. The complete contigs. set of chromosome XI- specific cosmid clones, covering 55 4. Prospects for other organisms altogether eighttimes the equivalent of the chromosome, has Insertion of an I-Sce restriction site has been done for been sorted with respect to the I-Sce map, as shown in FIG. bacteria (E. coli, Yersinia entorocolitica, Y. pestis, Y. 18. pseudotuberculosis), and mouse cells. Cleavage at the arti 5. Partial restriction mapping using I-SceI ficial I-Sce site in vitro has been successful with DNA from In this embodiment, complete digestion of the DNA at the 60 the transgenic mouse cells. Expression of I-Sce from the artificially inserted I-Sce site is followed by partial diges Synthetic gene in mammalian or plant cells should be tion with bacterial restriction endonucleases of choice. The successful. restriction fragments are then separated by electrophoresis The I-Sce site has been introduced in mouse cells and and blotted. Indirect end labelling is accomplished using left bacterial cells as follows: or right I-Sce half sites. This technique has been successful 65 1-Mouse cells: with yeast chromosomes and should be applicable without a-Mouse cells (2) were transfected with the DNA of difficulty for YAC. the vector pMLV LTR SAPLZ containing the I-SceI 5,474,896 19 20 site using standard calcium phosphate transfection in RNA secondary structure. Biochimie, 1982, 64, 867–881. technique. 3. F. Michel and B. 
Dujon, Conservation of RNA secondary b-Transfected cells were selected in DMEM medium structures in two intron families including mitochondrial-, containing phleomycin with 5% fetal calf serum and chloroplast-, and nuclear-encoded members. The EMBO grown under 12% CO, 100% humidity at 37°C. until Journal, 1983, 2, 33-38. they form colonies. 4. A. Jacquier and B. Dujon, The intron of the mitochondrial 21S rRNA gene: distribution in different yeast species and c-Phleomycin resistant colonies were subcloned once in sequence comparison between Kluyveromyces thermotoler the same medium. ans and Saccharomyces cerevisiae. Mol. Gen. Gent. (1983) d-Clone MLOP014, which gave a titer of 10 virus O 192, 487-499. particles per ml, was chosen. This clone was deposited 5. B. Dujon and A. Jacquier, Organization of the mitochon at C.N.C.M. on May 5, 1992 under culture collection drial 21S rRNA gene in Saccharomyces cerevisiae: mutants accession No. I-1207. of the peptidyl transferase centre and nature of the omega e-The supernatant of this clone was used to infect other locus in "Mitochondria 1983', Editors R. J. Schweyen, K. mouse cells (1009) by spreading 10 virus particles on 15 Wolf, F. Kaudewitz, Walter de Gruyter et Co., Berlin, N.Y. 10 cells in DMEM medium with 10% fetal calf serum (1983), 389-403. and 5 mg/ml of "polybrain'. Medium was replaced 6 6. A. Jacquier and B. Dujon, An intron encoded protein is hours after infection by the same fresh medium. active in a gene conversion process that spreads an intron f-24 hours after infection, phleomycin resistant cells into a mitochondrial gene. Cell (1985) 41, 383-394. were selected in the same medium as above. 20 7. B. Dujon, G. Cottatel, L. Colleaux, M. Betermier, A. g-phleomycin resistant colonies were subcloned once in Jacquier, L. D'Auriol, F. Galibert, Mechanism of integration the same medium. of an intron within a mitochondrial gene: a double strand h-one clone was picked and analyzed. 
DNA was purified break and the transposase function of an intron encoded with standard procedures and digested with I-Sce protein as revealed by in vivo and in vitro assays. "In under optimal conditions. 25 Achievements and perspectives of Mitochondrial 2-Bacterial cells: Research”. Vol. II, Biogenesis, E. Quagliariello et al. Eds. Mini Tn5 transposons containing the I-Sce recognition Elsevier, Amsterdam (1985) pages 215-225. site were constructed in E. coli by standard recombinant 8. L. Colleaux, L. D'Auriol, M. Betermier, G. Cottarel, A. DNA procedures. The mini Tn5 transposons are carried on Jacquier, F. Galibert, and B. Dujon, A universal code equiva a conjugative plasmid. Bacterial conjugation between E. coli 30 lent of a yeast mitochondrial intron reading frame is and Yersinia is used to integrate the mini Tn5 transposon in expressed into as a specific double strand Yersinia. Yersinia cells resistant to Kanamycin, Streptomy endonuclease. Cell (1986) 44, 521-533. cin or tetracycline are selected (vectors pTKM-co, pTSM-co 9. B. Dujon, L. Colleaux, A. Jacquier, F. Michel and C. and pTTc-co, respectively). Monteilhet, Mitochondrial introns as mobile genetic ele Several strategies can be attempted for the site specific 35 ments: the role of intron-encoded proteins. In "Extrachro insertion of a DNA fragment from a plasmid into a chro mosomal elements in lower eucaryotes', Reed B et al. Eds. mosome. This will make it possible to insert transgenes at (1986) Plenum Pub. Corp. 5-27. predetermined sites without laborious screening steps. 10. F. Michel and B. Dujon, Genetic Exchanges between Strategies are: Bacteriophage T4 and Filamentous Fungi? Cell (1986) 46, 1-Construction of a transgenic cell in which the I-SceI 40 323. recognition site is inserted at a unique location in a chro 11. L. Colleaux, L. D'Auriol, F. Galibert and B. Dujon, mosome. 
Cotransformation of the transgenic cell with the Recognition and cleavage site of the intron encoded omega expression vector and a plasmid containing the gene of transposase. PNAS (1988), 85, 6022-6026. interest and a segment homologous to the sequence in which 12. B. Dujon, Group I introns as mobile genetic elements, the I-Sce site is inserted. 45 facts and mechanistic speculations: A Review. Gene (1989), 2-Insertion of the I-Sce recognition site next to or 82, 91-114. within the gene of interest carried on a plasmid. Cotrans 13. B. Dujon, M. Belfort, R. A. Butow, C.Jacq., C. Lemieux, formation of a normal cell with the expression vector P. S. Perlman, V. M. Vogt, Mobile introns: definition of terms carrying the synthetic I-Sce gene and the plasmid contain and recommended nomenclature. Gene (1989), 82, 115-118. ing the I-SceI recognition site. 50 14. C. Monteilhet, A. Perrin, A. Thierry, L. Colleaux, B. 3-Construction of a stable transgenic cell line in which Dujon, Purification and Characterization of the in vitro the I-SceIgene has been integrated in the genome under the activity of I-Sce, a novel and highly specific endonuclease control of an inducible or constitutive cellular promoter. encoded by a group I intron. Nucleic Acid Research (1990), Transformation of the cell line by a plasmid containing the 18, 1407-1413. I-Sce site next to or within the gene of interest. 55 15. L. Colleaux, M. R. Michel-Wolwertz, R. F. Matagne, B. Site directed homologous recombination: diagrams of Dujon-The apocytochrome b gene of Chlamydomonas successful experiments performed in yeast are given in FIG. Smithii contains a mobile intron related to both Saccharo 19. myces and Neurospora introns. Mol. Gen. Genet. (1990) 223, 288-296. PUBLICATIONS CITED IN APPLICATION 60 16. B. Dujon Des introns autonomes et mobiles. Annales de I’Institut Pasteur/Actualires (1990) 1. 181-194. 1. B. Dujon, Sequence of the intron and flanking exons of the 17. A. Thierry, A. Perrin, J. Boyer, C. 
Fairhead, B. Dujon, B. mitochondrial 21 S rRNA gene of yeast strains having Frey, G. Schmitz. Cleavage of yeast and bacteriophage 17 different alleles at the w and RIB 1 loci. Cell (1980) 20, genomes at a single site using the rare cutter endonuclease 185-187. 65 I-Sce. I Nuc. Ac. Res. (1991) 19, 189-190. 2. F. Michel, A. Jacquier and B. Dujon, Comparison of 18. A. Plessis, A. Perrin, J. E. Haber, B. Dujon, Site specific fungal mitochondrial introns reveals extensive homologies recombination determined by I-Sce, a mitochondrial intron 5,474,896 21 22 encoded endonuclease expressed in the yeast nucleus. A14. B. Dujon, L. Colleaux, C. Monteilhet, A. Perrin, L. GENETICS (1992) 130, 451-460. D’Auriol, F. Galibert, Group I introns as mobile genetic elements: the role of intron encoded proteins and the nature of the target site. 14th Annual EMBO Symposium ABSTRACTS "Organelle genomes and the nucleus' Heidelberg, 26-29 A1. A. Jacquier, B. Dujon. Intron recombinational insertion Sep. 1988. at the DNA level: Nature of a specific receptor site and direct A15. L. Colleaux, R. Matagne, B. Dujon, A new mobile role of an intron encoded protein. Cold Spring Harbor mitochondrial intron provides evidence for genetic Symposium 1984. exchange between Neurospora and Chlamydomonas spe A2. I. Colleaux, L. D'Auriol, M. Demariaux, B. Dujon, F. O cies. Cold Spring Harbor, May 1989. Galibert, and A. Jacquier, Construction of a universal code A16. L. Colleaux, M. R. Michel-Wolwertz, R. F. Matagne, equivalent from a mitochondrial intron encoded transposase B. Dujon, The apoxytochrome b gene of Chlamydomonas gene using directed multiple mutagenesis. smithi contains a mobile intron related to both Saccharo Colloque International de DNRS "oligonucleotids et Gene myces and Neurospora introns. Fourth International Con tique Moleculaire” Aussois (Savoie) 8-12 January 1985. 5 ference on Cell and Molecular Biology of Chlamydomonas. A3. L. Colleaux, D'Auriol, M. Demariaux, B. Dujon, F. 
Madison, Wis., Apr. 1990. Galibert, and A. Jacquier, Expression in E. coli of a universal A17. B. Dujon, L. Colleaux, E. Luzi, C. Monteilhet, A. code equivalent of a yeast mitochondrial intron reading Perrin, A. Plessis, I. Stroke, A. Thierry, Mobile Introns, frame involved in the integration of an intron within a gene. EMBO Workshop on "Molecular Mechanisms of transpo Cold Spring Harbor Meeting on “Molecular Biology of 20 sition and its control', Roscoff (France) June 1990. Yeast', Aug. 13-19, 1985. A18. A. Perrin, C. Monteilhet, A. Thierry, E. Luzi, I. Stroke, A4. B. Dujon, G. Cottarel, L. Colleaux, M. Demariaux, A. L. Colleaux, B. Dujon. I-Scel, a novel double strand site Jacquier, L. D'Auriol, and F. Galibert, Mechanism of inte specific endonuclease, encoded by a mobile group I intron in gration of an intron within a mitochondrial gene: a double Yeast. Workshop on "RecA and Related Proteins' Sacly, strand break and the "transposase' function of an intron 25 France 17-21 Sep. 1990. encoded protein as revealed by in vivo and in vitro assays. A19. A. Plessis, A. Perrin, B. Dujon, Site specific recombi International symposium on "Achievements and Perspec nation induced by double strand endonucleases, HO and tives in Mitochondrial Research', Selva de Fasono (Brindisi, I-Scel in yeast. Workshop on "RecA and Related Proteins” Italy) 26 Sep. 1985. Saclay, France 17-21 Sep. 1990. A5. L. Colleaux, G. Cottarel, M. Betermier, A. Jacquier, B. 30 A20. B. Dujon, The genetic propagation of introns 20th Dujon, L. D'auriol, and F. Galibert, Mise en evidence de FEBS Meeting, Budapest, Hungary, August 1990. l'activite endonuclease double brin d'unc protein codee par A21. E. Luzi, B. Dujon, Analysis of the intron encoded site un intron mitochondrial de levure. Forum sur la Biologie specific endonuclease I-Sce by mutagenesis, Third Euro Moleculaire de la levure, Bonbannes, France 2-4 Oct. 1985. pean Congress on Cell Biology, Florence, Italy, September A6. B. Dujon, L. Colleaux, F. 
Michel and A. Jacquier, 35 1990. Mitochondrial introns as mobile genetic elements. In "Extra A22. B. Dujon, Self splicing introns as contagious genetic chromosomal elements in lower eucaryotes', Urbana, Ill., elements. Journees Franco-Beiges de Pont a Mousson. Octo 1-5 Jun. 1986. ber 1990. A7. L. Colleaux and B. Dujon, Activity of a mitochondrial A23. B. Frey, H. Dubler, G. Schmitz, A. Thierry, A. Perrin, intron encoded transposase. Yeast Genetics and Molecular 40 J. Boyer, C. Fairhead, B. Dujon, Specific cleavage of the Biology Meeting, Urbana, Ill. 3-6 Jun. 1986. yeast genome at a single site using the rare cutter endonu A8. L. Colleaux and B. Dujon, The role of a mitochondrial clease I-SceIHuman Genome, Frankfurt, Germany, Novem intron encoded protein. XIIIth International Conference on ber 1990. Yeast Genetics and Molecular Biology, Banff, Alberta A24. B. Dujon, A. Perrin, I. Stroke, E. Luzi, L. Colleaux, A. (Canada) 31 Aug.-5 Sep. 1986. 45 Plessis, A. Thierry, The genetic mobility of group I introns A9. L. Colleaux, L. D'Aurio, F. Galibert and and B. Dujon, at the DNA level. Keystone Symposia Meeting on "Molecu Recognition and cleavage specificity of an intron encoded lar Evolution of Introns and Other RNA elements', Taos, N. transposase. 1987 Meeting on Yeast Genetics and Molecular Mex, 2-8 Feb. 1991. Biology. San Francisco, Calf. 16-21 Jun. 1987. A25. B. Dujon, J. Boyer, C. Fairhead, A. Perrin, A Thierry, A10. A. Perrin, C. Monteilhet, L. Colleaux and B. Dujon, 50 Cartographie chez la levure. Reunion "Strategies Biochemical activity of an intron encoded transposase of d'etablissement des cartes geniques' Toulouse 30-31 Mai Saccharomyces cerevisiae. Cold Spring Harbor Meeting on 1991. "Molecular Biology of Mitochondria and chloroplasts' A26. B. Dujon, A. Thierry, Nested chromosomal fragmen 25–30 Aug. 1987 Cold Spring Harbor, N.Y. tation using the meganuclease I-Sce: a new method for the A11. B. Dujon, A. Jacquier, L. Colleaux, C. Monteilhet, A. 
55 rapid mapping of the yeast genome. Elounda, Crete 15-17 Perrin, "Les Introns autoepissables et leurs proteins' Col Mail 1991. loque "Biologie Moleculaire de la levure: expression gene A27. A. Thierry, L. Gaillon, F. Galibert, B. Dujon. The tique chez Saccharomyces' organise parla Societe francaise chromosome XI library: what has been accomplished, what de Microbiologie 18 Jan. 1988 Institut Pasteur, Paris. is left. Brugge meeting 22-24 Sep.1991. A12. L. Colleaux, L. D'Auriol, C. Monteilhet, F. Galibert 60 A28. B. Dujon, A. Thierry, Nested chromosomal fragmen and B. Dujon, Characterization of the biochemical activity tation using the meganuclease I-Sce: a new method for the of an intron encoded transposase. 14th International Con rapid physical mapping of the eukaryotic genomes. Cold ference on Yeast Genetics and Molecular Biology. Espoo, Spring Harbor 6-May 1992. Finland, 7-13 Aug. 1988. A29. A. Thierry, L. Gaillon, F. Galibert, B. Dujon. Yeast A13. B. Dujon, Agoup I intron as a mobile genetic element, 65 chromosome XI: construction of a cosmid contig. a high Albany Conference sur "RNA: catalysis, splicing, evolu resolution map and sequencing progress. Cold Spring Har tion”, Albany, N.Y., 22-25 Sep. 1988. bor 6-May 1992. 5,474,896 23 24 IN PREPARATION are essential for endonuclease function. Submission to P1. A. Thierry and B. Dujon, Nested Chromosomal Frag EMBO J. mentation. Using the Meganuclease I-Sce: Application to P4. C. Fairhead and B. Dujon: Consequences of a double the physical mapping of a yeast chromosome and the direct strand break induced in vivo in yeast at specific artificial sorting of cosmid libraries. Probably Submission to 5 sites,ai using the meganuclease I-SceI. Possiblee submission to or EMBO J. GENETICS, NATURE. P2. A. Thierry, L. Colleaux and B. Dujon: Construction and P5 A. Perrin, and B. 
Dujon: Asymetrical recognition by the Expression of a synthetic gene coding for the meCanuclease I-Sce endonuclease on exon and intron sequences reveals a I-Sce. Possible submission: NAR, EMBO J. new step in intron mobility. Possible submission: NATURE. P3. I. Stroke, V. Pelicic and B. Dujon: The evolutionarily The entire disclosure of each of these publications and conserved dodecapeptide motifs of intron-encoded I-Sce abstracts is relied upon and incorporated by reference herein.
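By way of illustration only, the arithmetic underlying the nested chromosomal fragmentation of Example 1 and the partial restriction mapping ladders may be sketched as follows: fragment sizes measured from a common end are sorted, and successive differences give the physical intervals. The sketch is in Python; the function name and all sizes are hypothetical placeholders chosen for the example and are not data taken from the figures.

```python
def intervals_from_nested_fragments(fragment_sizes_kb, total_length_kb=None):
    """Turn nested fragment sizes (all measured from the same end) into
    the sizes of the successive intervals they delimit.

    fragment_sizes_kb: sizes of the nested fragments, in any order.
    total_length_kb:   optional full length of the molecule; when given,
                       the last interval (last site to far end) is added.
    """
    if not fragment_sizes_kb:
        raise ValueError("at least one fragment size is required")
    sizes = sorted(fragment_sizes_kb)
    if sizes[0] <= 0:
        raise ValueError("fragment sizes must be positive")

    boundaries = [0] + sizes
    if total_length_kb is not None:
        if total_length_kb < sizes[-1]:
            raise ValueError("total length is shorter than the largest fragment")
        boundaries.append(total_length_kb)

    # Successive differences between neighbouring boundaries.
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical sizes (kb) for seven inserted sites on a 640 kb molecule.
example_sizes = [30, 115, 200, 290, 360, 450, 580]
example_intervals = intervals_from_nested_fragments(example_sizes, total_length_kb=640)
print(example_intervals)
```

With seven sites and a known total length the function returns eight intervals, in line with the statement that seven insertions define eight physical intervals. The same subtraction applies to a ladder of partial digestion bands read from the I-SceI site outward.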

SEQUENCE LISTING

(1) GENERAL INFORMATION:
  (iii) NUMBER OF SEQUENCES: 52

(2) INFORMATION FOR SEQ ID NO:1:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 714 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATGCATATGA AAAACATCAA AAAAAACCAG GTAATGAACC TCGGTCCGAA CTCTAAACTG   60

CTGAAAGAAT ACAAATCCCA GCTGATCGAA CTGAACATCG AACAGTTCGA AGCAGGTATC  120

GGTCTGATCC TGGGTGATGC TTACATCCGT TCTCGTGATG AAGGTAAAAC CTACTGTATG  180

CAGTTCGAGT GGAAAAACAA AGCATACATG GACCACGTAT GTCTGCTGTA CGATCAGTGG  240

GTACTGTCCC CGCCGCACAA AAAAGAACGT GTTAACCACC TGGGTAACCT GGTAATCACC  300

TGGGGCGCCC AGACTTTCAA ACACCAAGCT TTCAACAAAC TGGCTAACCT GTTCATCGTT  360

AACAACAAAA AAACCATCCC GAACAACCTG GTTGAAAACT ACCTGACCCC GATGTCTCTG  420

GCATACTGGT TCATGGATGA TGGTGGTAAA TGGGATTACA ACAAAAACTC TACCAACAAA  480

TCGATCGTAC TGAACACCCA GTCTTTCACT TTCGAAGAAG TAGAATACCT GGTTAAGGGT  540

CTGCGTAACA AATTCCAACT GAACTGTTAC GTAAAAATCA ACAAAAACAA ACCGATCATC  600

TACATCGATT CTATGTCTTA CCTGATCTTC TACAACCTGA TCAAACCGTA CCTGATCCCG  660

CAGATGATGT ACAAACTGCC GAACACTATC TCCTCCGAAA CTTTCCTGAA ATAA        714

(2) INFORMATION FOR SEQ ID NO:2:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 237 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met His Met Lys Asn Ile Lys Lys Asn Gln Val Met Asn Leu Gly Pro   16
Asn Ser Lys Leu Leu Lys Glu Tyr Lys Ser Gln Leu Ile Glu Leu Asn   32
Ile Glu Gln Phe Glu Ala Gly Ile Gly Leu Ile Leu Gly Asp Ala Tyr   48
Ile Arg Ser Arg Asp Glu Gly Lys Thr Tyr Cys Met Gln Phe Glu Trp   64
Lys Asn Lys Ala Tyr Met Asp His Val Cys Leu Leu Tyr Asp Gln Trp   80
Val Leu Ser Pro Pro His Lys Lys Glu Arg Val Asn His Leu Gly Asn   96
Leu Val Ile Thr Trp Gly Ala Gln Thr Phe Lys His Gln Ala Phe Asn  112
Lys Leu Ala Asn Leu Phe Ile Val Asn Asn Lys Lys Thr Ile Pro Asn  128
Asn Leu Val Glu Asn Tyr Leu Thr Pro Met Ser Leu Ala Tyr Trp Phe  144
Met Asp Asp Gly Gly Lys Trp Asp Tyr Asn Lys Asn Ser Thr Asn Lys  160
Ser Ile Val Leu Asn Thr Gln Ser Phe Thr Phe Glu Glu Val Glu Tyr  176
Leu Val Lys Gly Leu Arg Asn Lys Phe Gln Leu Asn Cys Tyr Val Lys  192
Ile Asn Lys Asn Lys Pro Ile Ile Tyr Ile Asp Ser Met Ser Tyr Leu  208
Ile Phe Tyr Asn Leu Ile Lys Pro Tyr Leu Ile Pro Gln Met Met Tyr  224
Lys Leu Pro Asn Thr Ile Ser Ser Glu Thr Phe Leu Lys              237
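As an illustrative check only, the relationship between SEQ ID NO:1 and SEQ ID NO:2 may be verified by translating the nucleotide sequence with the standard (universal) genetic code. The Python sketch below translates the first 60 bases of SEQ ID NO:1 and compares the result with the first 20 residues of SEQ ID NO:2 written in one-letter code; the helper names are hypothetical, and the 64-letter string is the usual compact form of the standard code table with bases ordered T, C, A, G.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 60 bases of SEQ ID NO:1 and first 20 residues of SEQ ID NO:2.
SEQ1_FIRST_60 = "ATGCATATGAAAAACATCAAAAAAAACCAGGTAATGAACCTCGGTCCGAACTCTAAACTG"
SEQ2_FIRST_20 = "MHMKNIKKNQVMNLGPNSKL"

print(translate(SEQ1_FIRST_60) == SEQ2_FIRST_20)
```

The stated lengths are also mutually consistent: 237 codons plus one stop codon account for 714 bases.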

(2) INFORMATION FOR SEQ ID NO:3:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 722 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

AAAAAAAAA. T CAT AGAAA AA TATT AAAA AA AAT CAA GT AAT CAA T C T C G G T C CTA TTT 6 0

CTA AAT TATT AAAAGAAA AAAT CACAAT TAATTGAATT AAA TATTGAA CAATTT GAAG 20

CAG GTATTGG T TT AATTT TA G GAGA T G CTT ATA TT CGT A G T C G T G AT GAA G G T AAAACTT 80

ATTGA, GCA ATTT GAG T G G AAA AATAAG G CATA CAT G GA. T CA T G T A T G T T TAT TATA T G 2 4 O

AT CAA T G G G T ATACA C C T C CT CATAAAA AAGAAA GA GT TAAT CAT TTA. GGTAATTT AG 3 O O

TAATTAC CTG GGG A G CT CAA. A. CTTTTAAA C ATCA A G CTTT T. AATAAA TIA GCT. AACTTAT 3 60

TTA TT G T AAA TAATAAAAAA CTTAT T C CTA ATAATT TA GT T GAA AAT TAT TT AA CAC CTA 4 20

TGA GT CTG GC ATATTGGTTT ATGGA T G A T G GAG G T AAA T G G GATTATAAT AAAAATT CT C 4 80

TT AATAAAAG TAT T G T ATTA AAT ACA CAAA GTTTT ACT TT T. GAA GAA G T A GAA TATTT AC 5 40 TT AAA GGTTT AAGAAATAAA TTT CAATT AA AT T G T T A T G T TAA AA TT AAT AAA AATAAA C 60 0

CAATTATTTA TATTGATT CT AT GA GT TAT C T GATTTTTA AA TT TAA TT AAA CCTTA TT 6 60

TAAT T C C CA. AATGA T G T AT AAA C T G CCTA ATA CTATTTC ATCC GAA ACT TTTTT AAAAT 20

AA 7 22

(2) INFORMATION FOR SEQ ID NO:4:

-continued ( i ) SEQUENCE CHARACTERISTICS: (A) LENGTH: 235 amino acids (B) TYPE: amino acid ( D.) TOPOLOGY: linear ( i i ) MOLECULETYPE: peptide ( x i ) SEQUENCE DESCRIPTION: SEQID NO:4: Me Lys As n I 1 e Lys Lys As in G l n V a Mc I. As n Lc u G y Pro As a Se 5 O 1 5 Ly is Le u Lc u Lys G 1 u Tyr Lys Ser G 1 in Le u I le G 1 u L. eu. As n I le G 1 20 25 30 G l n P he G 1 u A 1 a G y I 1 c G 1 y Le u I l c L e u G y A s p A 1 a Ty r I 1 e A r 35 40 4 5 Sc r A rig A s p G 1 u G 1 y Lys Thr Ty r Cys Mc t G in P he Glu Trip Lys As 5 O 55 6 0 Lys Al a Ty I Met A s p H is V a 1 Cy s Le u Lc u Ty r A s p G l n Trip V a Le 6 5 7 0 7 5 8 0 Sc r Pro Pro H is Lys Lys G 1 u A rig V a l As n H is Le u Gil y As n Le u V a 85 9 0 95 I 1 c Th r Trip G 1 y Al a G 1 in Thr Phc Lys H is G n. A 1 a Phe s in L y S I c 1 00 O 5 1 0

A 1 a. As In Ph e I le Wa l As in As in Lys Lys Le u I 1 c Pro As n As n Le 1 15 2 0 1 2 5 V a 1 G 1 u A s in Ty r Le u Thr P to Mc t Ser Le u A 1 a Ty r T r p Phc Me t As 1. 3 O 1 35 40 A s p G l y G l y Lys Trip A s p Ty r A s in Lys As n S c r L. c u As n Lys S c r I 1 4 S 1 5 O 155 6 V a 1 Le u As in Thr G 1 in S c r Ph e Thr P he G 1 u G 1 u V a l Cy s Ty r Le u V a 1 5 5 1 7 O 1 75 Lys Gl y L. c u A rig As n Lys P he G l n Le u As n Cys Ty r V a l Lys I e As 80 185 1 90 Lys A Sn y s Pro I le I 1 c Ty r I 1 e A s p Ser Mc Ser Tyr Le u I le Ph 1 95 0 0 2 05 Ty r As n I e I 1 c L y s Pro y Le u I le Pro G 1 in Mc t Me t Tyr Lys Le 2 1 0 2 5 22 O Pro As n Th r I le S c r S c r G 1 u Thr P he L cu Lys 2 25 23 O 2 35

(2) INFORMATION FOR SEQ ID NO:5:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 754 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CCGGATCCAT GCATATGAAA AACATCAAAA AAAACCAGGT AATGAACCTG GGTCCGAACT   60

CTAAACTGCT GAAAGAATAC AAATCCCAGC TGATCGAACT GAACATCGAA CAGTTCGAAG  120

CAGGTATCGG TCTGATCCTG GGTGATGCTT ACATCCGTTC TCGTGATGAA GGTAAAACCT  180

ACTGTATGCA GTTCGAGTGG AAAAACAAAG CATACATGGA CCACGTATGT CTGCTGTACG  240

ATCAGTGGGT ACTGTCCCCG CCGCACAAAA AACAACGTGT TAACCACCTG GGTAACCTGG  300

TAATCACCTG GGGCGCCCAG ACTTTCAAAC ACCAAGCTTT CAACAAACTG GCTAACCTGT  360

TCATCGTTAA CAACAAAAAA ACCATCCCGA ACAACCTGGT TGAAAACTAC CTGACCCCGA  420

TGTCTCTGGC ATACTGGTTC ATGGATGATG GTGGTAAATG GGATTACAAC AAAAACTCTA  480

CCAACAAATC GATCGTACTG AACACCCAGT CTTTCACTTT CGAAGAAGTA GAATACCTGG  540

TTAAGGGTCT GCGTAACAAA TTCCAACTGA ACTGTTACGT AAAAATCAAC AAAAACAAAC  600

CGATCATCTA CATCGATTCT ATGTCTTACC TGATCTTCTA CAACCTGATC AAACCGTACC  660

TGATCCCGCA GATGATGTAC AAACTGCCGA ACACTATCTC CTCCGAAACT TTCCTGAAAT  720

AATAAGTCGA CTGCAGGATC CGGTAAGTAA GTAA                              754
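For illustration, the segments of SEQ ID NO:5 that flank the coding region can be scanned for recognition sequences of common bacterial restriction endonucleases. The Python sketch below is a plain substring search; the enzyme and recognition sequence pairs (BamHI GGATCC, NdeI CATATG, SalI GTCGAC, PstI CTGCAG) are standard values supplied here as assumptions of the example, the function name is hypothetical, and nothing in this listing states which of these sites, if any, were intended for cloning.

```python
# Recognition sequences of four common restriction endonucleases
# (standard values, supplied as assumptions for this example).
SITES = {
    "BamHI": "GGATCC",
    "NdeI": "CATATG",
    "SalI": "GTCGAC",
    "PstI": "CTGCAG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions]} for every motif found."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)          # report 1-based positions
            start = seq.find(motif, start + 1)   # allow overlapping matches
        if positions:
            hits[name] = positions
    return hits


# Bases 1-60 and bases 721-754 of SEQ ID NO:5.
SEQ5_HEAD = "CCGGATCCATGCATATGAAAAACATCAAAAAAAACCAGGTAATGAACCTGGGTCCGAACT"
SEQ5_TAIL = "AATAAGTCGACTGCAGGATCCGGTAAGTAAGTAA"

print(find_sites(SEQ5_HEAD))
print(find_sites(SEQ5_TAIL))
```

Positions reported for the tail segment are relative to base 721 of SEQ ID NO:5.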

(2) INFORMATION FOR SEQ ID NO:6:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 11 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

AATGCTTTCC A                                                        11

(2) INFORMATION FOR SEQ ID NO:7:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 25 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GTTACGCTAG GGATAACAGG GTAAT                                         25

(2) INFORMATION FOR SEQ ID NO:8:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 25 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CAATGCGATC CCTATTGTCC CATTA                                         25
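As an illustrative check, SEQ ID NO:7 and SEQ ID NO:8 as printed appear to be the two strands of the same 25 base pair duplex, with SEQ ID NO:8 written in the opposite polarity so that each base lies under its partner. The short Python sketch below tests that reading; the helper names are hypothetical and the strand interpretation is an inference from the two sequences, not a statement made in the listing.

```python
# Watson-Crick pairing table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement, keeping the written order."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna):
    """Complementary strand written 5' to 3'."""
    return complement(dna)[::-1]


SEQ7 = "GTTACGCTAGGGATAACAGGGTAAT"
SEQ8 = "CAATGCGATCCCTATTGTCCCATTA"

# SEQ ID NO:8 pairs with SEQ ID NO:7 position by position.
print(complement(SEQ7) == SEQ8)
# Read 5' to 3', the partner strand is SEQ ID NO:8 reversed.
print(reverse_complement(SEQ7) == SEQ8[::-1])
```

The same test applied to the 26 base sequences SEQ ID NO:17 and SEQ ID NO:18 gives the same result.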

(2) INFORMATION FOR SEQ ID NO:9:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1738 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

GCGGACAGGT ATCCGGTAAG CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA   60

GGGGGAAACG CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT  120

CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG CAACGCGGCC  180

TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA TGTTCTTTCC TGCGTTATCC  240


  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Met Gln Leu Ala Arg Gln Val Ser Arg Leu Glu Ser Gly Gln

(2) INFORMATION FOR SEQ ID NO:12:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 13 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Met Leu Pro Ala Arg Met Leu Cys Gly Ile Val Ser Gly              13

(2) INFORMATION FOR SEQ ID NO:13:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 9 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Thr Met Ile Thr Asn Ser His Val                                9

(2) INFORMATION FOR SEQ ID NO:14:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 80 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

Mc t Lys Ser As in As in A 1 a L. c u I 1 c Wa I c G y Th r Wa 1 5 10

A s p A i a V a. i G I y I le Gly L c u Wa Mc i Pro Wa L. c u P I 0 G 1 y 20 25 30

A rig A s p I l e A rig L e u Me t A rig G 1 u A rig A sp G y A rig As n H is As p 40 45

Me t Cys V a l L. eu Ph e Arg T r p A a Cys Gl in As in P e T r p Gly 55 60

As in Wa Le u S c r Pro A l a Lys Le u l r P ro H is P 0 P o Wa As n 65 70 7 5 8 0

(2) INFORMATION FOR SEQ ID NO:15:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 7 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Met Cys Gly Ile Val Ser Gly                                        7

(2) INFORMATION FOR SEQ ID NO:16:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 237 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: peptide
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Met His Met Lys Asn Ile Lys Lys Asn Gln Val Met Asn Leu Gly Pro   16
Asn Ser Lys Leu Leu Lys Glu Tyr Lys Ser Gln Leu Ile Glu Leu Asn   32
Ile Glu Gln Phe Glu Ala Gly Ile Gly Leu Ile Leu Gly Asp Ala Tyr   48
Ile Arg Ser Arg Asp Glu Gly Lys Thr Tyr Cys Met Gln Phe Glu Trp   64
Lys Asn Lys Ala Tyr Met Asp His Val Cys Leu Leu Tyr Asp Gln Trp   80
Val Leu Ser Pro Pro His Lys Lys Glu Arg Val Asn His Leu Gly Asn   96
Leu Val Ile Thr Trp Gly Ala Gln Thr Phe Lys His Gln Ala Phe Asn  112
Lys Leu Ala Asn Leu Phe Ile Val Asn Asn Lys Lys Thr Ile Pro Asn  128
Asn Leu Val Glu Asn Tyr Leu Thr Pro Met Ser Leu Ala Tyr Trp Phe  144
Met Asp Asp Gly Gly Lys Trp Asp Tyr Asn Lys Asn Ser Thr Asn Lys  160
Ser Ile Val Leu Asn Thr Gln Ser Phe Thr Phe Glu Glu Val Glu Tyr  176
Leu Val Lys Gly Leu Arg Asn Lys Phe Gln Leu Asn Cys Tyr Val Lys  192
Ile Asn Lys Asn Lys Pro Ile Ile Tyr Ile Asp Ser Met Ser Tyr Leu  208
Ile Phe Tyr Asn Leu Ile Lys Pro Tyr Leu Ile Pro Gln Met Met Tyr  224
Lys Leu Pro Asn Thr Ile Ser Ser Glu Thr Phe Leu Lys              237

(2) INFORMATION FOR SEQ ID NO:17:
  (i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 26 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
  (ii) MOLECULE TYPE: DNA (genomic)
  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

CGCTAGGGAT AACAGGGTAA TATAGC                                        26

(2) INFORMATION FOR SEQ ID NO:18: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

GCGATCCCTA TTGTCCCATT ATATCG 26
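The paired entries in this listing appear to be the two strands of one duplex, with the second strand written in the same left-to-right register as the first, that is, as a base-by-base complement rather than a reverse complement. The following is a minimal Python sketch, assuming the sequences as transcribed here for SEQ ID NO:17 and SEQ ID NO:18, that checks this relationship. The names `PAIR`, `complement` and `is_pair` are illustrative only and do not come from the patent.

```python
# Base-pairing table for the four unambiguous DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement, ignoring the spaces used for grouping."""
    return "".join(PAIR[base] for base in seq.replace(" ", "").upper())


# Sequences as transcribed in this listing (grouped in blocks of ten).
SEQ_ID_17 = "CGCTAGGGAT AACAGGGTAA TATAGC"
SEQ_ID_18 = "GCGATCCCTA TTGTCCCATT ATATCG"

# True when SEQ ID NO:18 pairs with SEQ ID NO:17 position by position.
is_pair = complement(SEQ_ID_17) == SEQ_ID_18.replace(" ", "")
print(is_pair)
```

The same check can be applied to any other consecutive pair of entries as a quick sanity test on a transcription.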

(2) INFORMATION FOR SEQ ID NO:19: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

TTCTCATGAT TAGCTCTAAT CCATGG 26

(2) INFORMATION FOR SEQ ID NO:20: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

AAGAGTACTA ATCGAGATTA GGTACC 26

(2) INFORMATION FOR SEQ ID NO:21: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CTTTGGTCAT CCAGAAGTAT ATATTT 26

(2) INFORMATION FOR SEQ ID NO:22: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

GAAACCAGTA GGTCTTCATA TATAAA 26

(2) INFORMATION FOR SEQ ID NO:23: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

TAACGGTCCT AAGGTAGCGA AATTCA 26

(2) INFORMATION FOR SEQ ID NO:24: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

ATTGCCAGGA TTCCATCGCT TTAAGT 26

(2) INFORMATION FOR SEQ ID NO:25: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

TGACTCTCTT AAGGTAGCCA AATGCC 26

(2) INFORMATION FOR SEQ ID NO:26: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

ACTGAGAGAA TTCCATCGGT TTACGG 26

(2) INFORMATION FOR SEQ ID NO:27: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

CGAGGTTTTG GTAACTATTT ATTACC 26

(2) INFORMATION FOR SEQ ID NO:28: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

GCTCCAAAAC CATTGATAAA TAATGG 26


(2) INFORMATION FOR SEQ ID NO:29: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

GGGTTCAAAA CGTCGTGAGA CAGTTT 26

(2) INFORMATION FOR SEQ ID NO:30: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

CCCAAGTTTT GCAGCACTCT GTCAAA 26

(2) INFORMATION FOR SEQ ID NO:31: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

GATGCTGTAG GCATAGGCTT GGTTAT 26

(2) INFORMATION FOR SEQ ID NO:32: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CTACGACATC CGTATCCGAA CCAATA 26

(2) INFORMATION FOR SEQ ID NO:33: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

CTTTCCGCAA CAGTATAATT TTATAA 26

(2) INFORMATION FOR SEQ ID NO:34: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

GAAAGGCGTT GTCATATTAA AATATT 26

(2) INFORMATION FOR SEQ ID NO:35: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

ACCATGGGGT CAAATGTCTT TCTGGG 26

(2) INFORMATION FOR SEQ ID NO:36: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

TGGTACCCCA GTTTACAGAA AGACCC 26

(2) INFORMATION FOR SEQ ID NO:37: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

GTGCCTGAAT GATATTTATT ACCTTT 26

(2) INFORMATION FOR SEQ ID NO:38: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

GTGCCTGAAT GATATTTATT ACCTTT 26

(2) INFORMATION FOR SEQ ID NO:39: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

CAACGCTCAG TAGATGTTTT CTTGGGTCTA CCGTTTAAT 39

(2) INFORMATION FOR SEQ ID NO:40: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

GTTGCGAGTC ATCTACAAAA GAACCCAGAT GGCAAATTA 39

(2) INFORMATION FOR SEQ ID NO:41: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 32 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

CAAGCTTATG AGTATGAAGT GAACACGTTA TT 32

(2) INFORMATION FOR SEQ ID NO:42: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 32 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

GTTCGAATAC TCATACTTCA CTTGTGCAAT AA 32

(2) INFORMATION FOR SEQ ID NO:43: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 38 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

GCTATTCGTT TTTATGTATC TTTTGCGTGT AGCTTTAA 38

(2) INFORMATION FOR SEQ ID NO:44: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 38 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

C GATA AG CAA AA ATA CAT AG AAAA C GCA CA T G GAA. A.T 3 8

(2) INFORMATION FOR SEQ ID NO:45: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

CCAAGCTCGA ATTCGCATGC TCTAGAGCTC GGTACCCGGG ATCCTGCAGT CGACGCTAGG 60

GATAACAGGG TAATACAGAT 80

(2) INFORMATION FOR SEQ ID NO:46: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

GGTTCGAGCT TAAGCGTACG AGATCTCGAG CCATGGGCCC TAGGACGTCA GCTGCGATCC 60

CTATTGTCCC ATTATGTCTA 80

(2) INFORMATION FOR SEQ ID NO:47: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

ATCAGATCTA AGCTTGCATG CCTGCAGGTC GACTCTAGAG GATCCCCGGG TACCGAGCTC 60

GAATTCACTG GCCGTCGTTT 80

(2) INFORMATION FOR SEQ ID NO:48: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

TAGTCTAGAT TCGAACGTAC GGACGTCCAG CTGAGATCTC CTAGGGGCCC ATGGCTCGAG 60

CTTAAGTGAC CGGCAGCAAA 80

(2) INFORMATION FOR SEQ ID NO:49: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT TAATCGCCTT GCAGCACATC 60

CCCCTTTCGC CAGCTGGCGT 80

(2) INFORMATION FOR SEQ ID NO:50: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 80 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

ATGTTGCAGC ACTGACCCTT TTGGGACCGC AATGGGTTGA ATTAGCGGAA CGTCGTGTAG 60

GGGGAAAGCG GTCGACCGCA 80

(2) INFORMATION FOR SEQ ID NO:51: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 18 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

TAGGGATAAC AGGGTAAT 18

(2) INFORMATION FOR SEQ ID NO:52: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 18 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

ATCCCTATTG TCCCATTA 18
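The 18-base sequence of SEQ ID NO:51 matches the published I-SceI recognition sequence and also occurs inside the 26-base SEQ ID NO:17. As an illustration, the Python sketch below scans a sequence for that 18-mer on either strand. The function names `reverse_complement` and `find_sites` are hypothetical helpers, not part of the patent, and the sketch assumes the sequences exactly as transcribed in this listing.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.replace(" ", "").upper().translate(COMP)[::-1]


def find_sites(seq: str, site: str):
    """Return sorted (0-based offset, strand) pairs for each occurrence of `site`."""
    seq = seq.replace(" ", "").upper()
    site = site.replace(" ", "").upper()
    patterns = [("+", site)]
    # Search the opposite strand only when the site is not its own reverse complement.
    if reverse_complement(site) != site:
        patterns.append(("-", reverse_complement(site)))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


SEQ_ID_17 = "CGCTAGGGAT AACAGGGTAA TATAGC"
SEQ_ID_51 = "TAGGGATAAC AGGGTAAT"

print(find_sites(SEQ_ID_17, SEQ_ID_51))
```

Reporting the strand along with the offset matters because the site is not palindromic, so its orientation in a target sequence is informative.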

What is claimed is:

1. A method of genetically mapping a yeast genome that does not contain a natural restriction site for I-SceI, comprising the steps of:
(a) artificially inserting one or more I-SceI sites at various positions in the genome;
(b) completely cleaving said genome at the inserted I-SceI sites, with the restriction enzyme I-SceI, to produce nested chromosomal fragments;
(c) purifying said fragments of step (b) by pulsed field gel electrophoresis;
(d) transferring the fragments to a solid membrane;
(e) hybridizing the fragments bound to said membrane to a labelled probe derived from a cosmid clone, pUKG040;
(f) detecting the hybridization banding patterns; and
(g) mapping said yeast genome based on the hybridization banding patterns observed in step (f).

2. A method of genetically mapping a yeast genome that does not contain a natural restriction site for I-SceI, comprising the steps of:
(a) artificially inserting one or more I-SceI sites at various positions in the genome;
(b) completely cleaving said genome at the inserted I-SceI sites, with the restriction enzyme I-SceI, to produce nested chromosomal fragments;
(c) purifying said fragments of step (b) by pulsed field gel electrophoresis;
(d) transferring the fragments to a solid membrane;
(e) hybridizing the fragments bound to said membrane to a labelled probe derived from a cosmid clone, pUKG066;
(f) detecting the hybridization banding patterns; and
(g) mapping said yeast genome based on the hybridization banding patterns observed in step (f).

* * * * *
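The fragment logic behind steps (a) to (g) can be pictured with a small simulation. The Python sketch below assumes a linear chromosome, one inserted I-SceI site per strain, and a probe at a single coordinate; the function names, the coordinates and the kilobase units are hypothetical and are not taken from the patent. In an actual experiment the side and size of the detected fragment would be read from the banding pattern, whereas here they are computed from a known probe position purely to show how the nested fragments bracket it.

```python
from typing import List, Tuple


def detected_fragment(chrom_len: int, site: int, probe: int) -> Tuple[str, int]:
    """Side and size of the fragment carrying `probe` after one cut at `site`."""
    if not 0 < site < chrom_len:
        raise ValueError("site must lie strictly inside the chromosome")
    if probe < site:
        return ("left", site)            # fragment from the left end to the site
    return ("right", chrom_len - site)   # fragment from the site to the right end


def probe_interval(chrom_len: int, sites: List[int], probe: int) -> Tuple[int, int]:
    """Narrowest interval between sites (or chromosome ends) that contains the probe."""
    lower, upper = 0, chrom_len
    for site in sorted(sites):
        side, _ = detected_fragment(chrom_len, site, probe)
        if side == "right":
            lower = max(lower, site)     # probe lies to the right of this site
        else:
            upper = min(upper, site)     # probe lies to the left of this site
    return (lower, upper)


# Hypothetical example: a 1000 kb chromosome, three strains, probe at 520 kb.
sites_kb = [150, 400, 700]
for s in sites_kb:
    print(s, detected_fragment(1000, s, 520))
print(probe_interval(1000, sites_kb, 520))
```

Each additional inserted site narrows the interval, which is why a series of strains with sites at different positions yields a progressively finer map.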