USOO6395889B1 (12) United States Patent (10) Patent No.: US 6,395,889 B1 Robison (45) Date of Patent: May 28, 2002

(54) NUCLEIC ACID MOLECULES ENCODING WO WO-98/56804 A1 * 12/1998 ...... CO7H/21/02 HUMAN HOMOLOGS WO WO-99/0785.0 A1 * 2/1999 ... C12N/15/12 WO WO-99/37660 A1 * 7/1999 ...... CO7H/21/04 (75) Inventor: fish E. Robison, Wilmington, MA OTHER PUBLICATIONS Vazquez, F., et al., 1999, “METH-1, a human ortholog of (73) Assignee: Millennium Pharmaceuticals, Inc., ADAMTS-1, and METH-2 are members of a new family of Cambridge, MA (US) proteins with angio-inhibitory activity', The Journal of c: - 0 Biological Chemistry, vol. 274, No. 33, pp. 23349–23357.* (*) Notice: Subject to any disclaimer, the term of this Descriptors of Protease Classes in Prosite and Pfam Data patent is extended or adjusted under 35 bases. U.S.C. 154(b) by 0 days. * cited by examiner (21) Appl. No.: 09/392, 184 Primary Examiner Ponnathapu Achutamurthy (22) Filed: Sep. 9, 1999 ASSistant Examiner William W. Moore (51) Int. Cl." C12N 15/57; C12N 15/12; (74) Attorney, Agent, or Firm-Alston & Bird LLP C12N 9/64; C12N 15/79 (57) ABSTRACT (52) U.S. Cl...... 536/23.2; 536/23.5; 435/69.1; 435/252.3; 435/320.1 The invention relates to polynucleotides encoding newly (58) Field of Search ...... 536,232,235. identified protease homologs. The invention also relates to 435/6, 226, 69.1, 252.3 the . The invention further relates to methods using s s s/ - - -us the protease polypeptides and polynucleotides as a target for (56) References Cited diagnosis and treatment in protease-mediated disorders. The invention further relates to drug-Screening methods using U.S. PATENT DOCUMENTS the protease polypeptides and polynucleotides to identify 5,552,526 A 9/1996 Nakamura et all 530/350 agonists and antagonists for diagnosis and treatment. The 5883.241 A * 3/1999 Docherty et al. r 536/23.2 invention further encompasses agonists and antagonists 5.990.293 A * 11/1999 Docherty et al...... 536/23.1 based on the protease polypeptides and polynucleotides. The invention further relates to procedures for producing the FOREIGN PATENT DOCUMENTS protease polypeptides and polynucleotides. EP O874050 A2 * 10/1998 ...... C12N/15/15 WO WO-98/55643 A1 * 12/1998 ...... C12P/21/02 1 Claim, No Drawings US 6,395,889 B1 1 2 NUCLEIC ACID MOLECULES ENCODING The invention also provides an isolated fragment or HUMAN PROTEASE HOMOLOGS portion of any of SEQ ID NOS: 1-33 and the complement of SEQ ID NOS: 1-33. In preferred embodiments, the FIELD OF THE INVENTION fragment is useful as a probe or primer, and/or is at least 15, The invention relates to newly identified polynucleotides more preferably at least 18, even more preferably 20–25, 30, having homology to various protease families. The inven 50, 100, 200 or more nucleotides in length. tion also relates to protease polypeptides. The invention The invention also provides isolated fragments of the further relates to methods using the protease polypeptides polypeptides. and polynucleotides as a target for diagnosis and treatment In another embodiment, the invention provides an isolated in protease-mediated and related disorders. The invention nucleic acid molecule that hybridizes under high Stringency further relates to drug-Screening methods using the protease conditions to a nucleotide Sequence Selected from the group polypeptides and polynucleotides to identify agonists and consisting of SEQ ID NOS: 1-33 and the complements of antagonists for diagnosis and treatment. The invention fur SEO ID NOS: 1-33. ther encompasses agonists and antagonists based on the The invention further provides nucleic acid constructs protease polypeptides and polynucleotides. The invention 15 comprising the nucleic acid molecules described above. In a further relates to procedures for producing the protease preferred embodiment, the nucleic acid molecules of the polypeptides and polynucleotides. invention are operatively linked to a regulatory Sequence. The invention also provides vectors and host cells for BACKGROUND OF THE INVENTION expressing the protease nucleic acid molecules and polypep Proteases are a major target for drug action and develop tides and particularly recombinant vectors and host cells. ment. Accordingly, it is valuable to the field of pharmaceu The invention also provides methods of making the tical development to identify and characterize previously vectors and host cells and methods for using them to produce unknown protease nucleic acids and polypeptides. The the protease nucleic acid molecules and polypeptides. present invention advances the State of the art by providing 25 The invention also provides antibodies or antigen-binding previously unidentified human protease Sequences. fragments thereof that Selectively bind the protease polypep tides and fragments. SUMMARY OF THE INVENTION The invention also provides methods of Screening for It is an object of the invention to identify novel proteases. compounds that modulate expression or activity of the It is a further object of the invention to provide novel protease polypeptides or nucleic acid (RNA or DNA). protease polypeptides that are useful as reagents or targets in The invention also provides a process for modulating protease assays applicable to treatment and diagnosis of protease polypeptide or nucleic acid expression or activity, protease-mediated disorders. especially using the Screened compounds. Modulation may It is a further object of the invention to provide poly 35 be used to treat conditions related to aberrant activity or nucleotides corresponding to the novel protease polypep expression of the protease polypeptides or nucleic acids. tides that are useful as targets and reagents in protease assays The invention also provides assays for determining the applicable to treatment and diagnosis of protease-mediated presence or absence of and level of the protease polypep disorders and useful for producing novel protease polypep tides or nucleic acid molecules in a biological Sample, tides by recombinant methods. 40 including for disease diagnosis. Aspecific object of the invention is to identify compounds The invention also provides assays for determining the that act as agonists and antagonists and modulate the expres presence of a mutation in the protease polypeptides or Sion of the novel proteases. nucleic acid molecules, including for disease diagnosis. A further specific object of the invention is to provide In still a further embodiment, the invention provides a compounds that modulate expression of the proteases for 45 computer readable means containing the nucleotide and/or treatment and diagnosis of protease- related disorders. Sequences of the nucleic acids and polypeptides The present invention is based on the identification of of the invention, respectively. novel nucleic acid molecules that are homologous to pro DETAILED DESCRIPTION OF THE tease Sequences. INVENTION 50 Thus, in one aspect, the invention provides an isolated The following text describes classes of proteins with nucleic acid molecule that comprises a nucleotide Sequence which the Sequences described herein have been compared selected from the group consisting of SEQ ID NOS: 1-33 with respect to homology. and the complements of SEQ ID NOS: 1-33. Eukaryotic and Viral Aspartyl Active Sites In another aspect, the invention provides isolated proteins 55 Aspartyl proteases, also known as acid proteases, (EC and polypeptides encoded by nucleic acid molecules of the 3.4.23.-), are a widely distributed family of proteolytic invention. (Foltman (1981) Essays Biochem 17:52–84; Davis In another embodiment, the invention provides an isolated (1990) Annu. Rev. Biophys. Chem. 19:189–215; Rao et al. nucleic acid molecule that comprises a nucleotide Sequence (1991) Biochemistry 30: 4663-4671) that exist in that is at least about 60% identical, preferably at least about 60 vertebrates, fungi, plants, retroviruses and Some plant 80% identical, preferably at least about 85% identical, more Viruses. ASpartate proteases of eukaryotes are monomeric preferably at least about 90% identical, and even more enzymes which consist of two domains. Each domain con preferably at least about 95% identical, and most preferably tains an centered on a catalytic aspartyl residue. about 98% or more identical to a nucleotide sequence The two domains most probably evolved from the duplica selected from the group consisting of SEQ ID NOS: 1-33 65 tion of an ancestral gene encoding a primordial domain. and the complements of SEQ ID NOS: 1-33. Currently known eukaryotic aspartyl proteases include, but The invention also provides isolated variant polypeptides. are not limited to: US 6,395,889 B1 3 4 Vertebrate gastric A and C (also known as Eukaryotic Thiol (Cysteine) Proteases Active Sites Eukary gastricsin). otic thiol proteases (EC 3.4.22.-) (Dufour (1988) Biochimie Vertebrate (rennin), involved in digestion and 70: 1335–1342) are a family of proteolytic enzymes which used for making cheese. contain an active Site cysteine. Catalysis proceeds through a Vertebrate lysosomal D (EC 3.4.23.5) and E thioester intermediate and is facilitated by a nearby (EC 3.4.23.34). Side chain; an asparagine completes the essential catalytic Mammalian (EC 3.4.23.15) whose function is to triad. Proteases that belong to this family include, but are not generate angiotensin I from angiotensinogen in the limited to: plasma. Vertebrate lysosomal cathepsins B (EC 3.4.22. 1), H (EC Fungal proteases Such as aspergillopepsin A (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropep (Kirschke et al. (1995) Protein Prof. 2:1587–1643). sin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC Vertebrate lysosomal I (EC 3.4.14.1) 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizo (also known as C) (Kirschke et al. (1995) puspepsin (EC 3.4.23.21). 15 Protein Prof. 2:1587–1643). Yeast Saccharopepsin (EC 3.4.23.25) (proteinase A) (gene Vertebrate calpains (EC 3.4.22.17). Calpains are intrac PEP4). PEP4 is implicated in posttranslational regula ellular calcium- activated thiol proteases that contain tion of vacuolar . both N-terminal catalytic and C-terminal calcium Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a pro binding domains. tease that cleaves alpha-factor and thus acts as an Mammalian cathepsin K, which seems involved in Osteo antagonist of the mating pheromone. clastic bone resorption (Shi et al. (1995) FEBS Lett. Fission yeast SXa1 which is involved in degrading or 357: 129-134). processing the mating pheromones. Human cathepsin O (Velasco et al. (1994) J. Biol. Chem. Most retroviruses and Some plant viruses, Such as 269:27136-27142). badnaviruses, encode an aspartyl protease which is an 25 Bleomycin , an that catalyzes the inac homodimer of a chain of about 95 to 125 amino acids. In tivation of the antitumor drug BLM (a glycopeptide). most retroviruses, the protease is encoded as a Segment of a Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/ polyprotein which is cleaved during the maturation proceSS B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit of the virus. It is generally part of the poll polyprotein and, actinidin (EC 3.4.22.14); papaya latex papain (EC more rarely, of the gag polyprotein. 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC Caspase Family Active Sites 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea Interleukin-1 beta converting enzyme (EC 3.4.22.36) turgor-responsive protein 15A, pineapple Stem brome (ICE) (Thornberry et al. (1995) Protein Sci. 4:3-12; Kumar lain (EC 3.4.22.32); rape COT44; ice oryzain alpha, (1995) Trends Biochem. Sci. 20:198-202) is responsible for beta, and gamma; tomato low-temperature induced, the cleavage of the IL-1 beta precursor at an Asp-Ala bond 35 Arabidopsis thaliana A494, RD 19A and RD21A. to generate the mature biologically active cytokine, ICE, a House-dust mites allergens DerP1 and EurM1. thiol protease composed of two subunits of 10 (p10) and 20 Kd (p20), both derived by the autocleavage of a 45 Kd Cathepsin B-like proteinases from the worms Caenorhab precursor (p45). Two residues are implicated in the catalytic ditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and mechanism: a cysteine and an histidine. ICE belongs to a 40 cpr-6), Schistosoma mansoni (antigen SM31) and family of peptidases (Nicholson et al. (1997) Trends Bio Japonica (antigen SJ31), Haemonchus contortus (genes chem Sci. 22:299-306) which is implicated in programmed AC-1 and AC-2), and Ostertagia Ostertagi (CP-1 and cell death (apoptosis) and which has been termed 'caspase CP-3). for cysteine aspase. ICE is known as Caspase-1 and the other Slime mold cysteine proteinases CP1 and CP2. members of this family (Alnemri et al. (1996) Cell 45 Cruzipain from TrypanoSoma Cruzi and brucei. 87: 171-171) include, but are not limited to: Throphozoite cysteine proteinase (TCP) from various Caspase-2 (ICH-1, NEDD-2). Plasmodium species. Caspase-3 (also known as apopain, CPP32, Yama), a Proteases from Leishmania mexicana, Theileria annulata protease which, at the onset of apoptosis, proteolyti 50 and Theileria parva. cally cleaves poly(ADP-ribose) polymerase (See) at an Baculoviruses cathepsin-like enzyme (V-cath). Asp-Gly bond. Drosophila Small optic lobes protein (gene Sol), a neu Caspase-4 (ICH-2, TX ICErel-II). ronal protein that contains a calpain-like domain. Caspase-5 (ICH-3, TY, ICErel-III). Yeast thiol protease BLH1/YCP1/LAP3. Caspase-6 (MCH-2). 55 Caenorhabditis elegans hypothetical protein C06G4.2, a Caspase-7 (MCH-3, ICE-LAP3, CMH-1, SCA-2, calpain-like protein. LICE2). Two bacterial peptidases are also part of this family: Caspase-8 (MCH-5, MACH, FLICE). C from LactococcuS lactis (gene pepC) (Chapot-Chartier et al. (1993) Appl. Environ. Microbiol Caspase-9 (MCH-6, ICE-LAP6). 60 59:330–333). Caspase-10 (MCH-4, FLICE2). Thiol protease tpr from Porphyromonas gingivalis. Caspase-11. Three other proteins are structurally related to this family. Caspase-12. Soybean oil body protein P34. This protein has its active Caenorhabditis elegans ced-3 is involved in the initiation 65 Site cysteine replaced by a glycine. of apoptosis. Rat testin, a Sertoli cell Secretory protein highly similar to Drosophila Ice. cathepsin L but with the active site cysteine is replaced US 6,395,889 B1 S 6 by a . Rat testin should not be confused with enzyme activity; the two bind Zinc and the mouse testin which is a LIM-domain protein. glutamate is necessary for catalytic activity. Non active Plasmodium falciparum serine-repeat protein (SERA), members of this family have lost from one to three of these the major blood Stage antigen. This protein of 111 Kd active Site residues. possesses a C-terminal thiol-protease-like domain Hig Cytosol Aminopeptidase Signature gins et al. (1989) Nature 340:604-604), but the active Cytosol aminopeptidase is a eukaryotic cytosolic zinc Site cysteine is replaced by a Serine. dependent that catalyzes the removal of unsub Insulinase family, Zinc-binding region signature Stituted amino-acid residues from the N-terminus of pro A number of proteases dependent on divalent cations for teins. This enzyme is often known as leucine their activity have been shown (Rawlings et al. (1991) aminopeptidase (EC 3.4.11.1) (LAP) but has been shown Biochem. J. 275:389–391; Braun et al. (1995) Trends Bio (Matsushima et al. (1991) Biochem. Biophys. Res. Commun. chem. Sci. 20:171-175) to belong to one family, on the basis 178:1459–1464) to be identical with prolylaminopeptidase of Sequence Similarity. (EC 3.4.11.5). Cytosol aminopeptidase is a hexamer of These enzymes include but are not limited to: identical chains, each of which binds two Zinc ions. Insulinase (EC 3.4.24.56) (also known as insulysin or 15 Cytosol aminopeptidase is highly similar to Escherichia -degrading enzyme or IDE), a cytoplasmic coli pepA, a manganese dependent aminopeptidase. Resi enzyme which seems to be involved in the cellular dues involved in zinc ion-binding (Burley et al. (1992) J. processing of insulin, glucagon and other Small Mol. Biol. 224:113-140) in the mammalian enzyme are polypeptides. absolutely conserved in pepA where they presumably bind Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) manganese. (gene ptr), a periplasmic enzyme that degrades Small A cytosol aminopeptidase from Rickettsia prowazeki (Wood et al. (1993).J. Bacteriol. 175:159-165) and one from peptides. Arabidopsis thaliana also belong to this family. Mitochondrial processing peptidase (EC 3.4.24.64) Zinc , Zinc-binding regions signatures (MPP). This enzyme removes the transit peptide from 25 There are a number of different types of zinc-dependent the precursor form of proteins imported from the cyto carboxypeptidases (EC3.4.17.-) (Tan et al. (1989) J. Biol. plasm acroSS the mitochondrial inner membrane. It is Chem. 264:13165–13170; Reynolds et al. (1989) J. Biol. composed of two nonidentical homologous Subunits Chem. 264:20094-20099). All these enzymes seem to be termed alpha and beta. The beta subunit seems to be Structurally and functionally related. The enzymes that catalytically active while the alpha Subunit has prob belong to this family include, but are not limited to those ably lost its activity. listed below. Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase A1 (EC 3.4.17.1), a pancreatic diges or NRD convertase) this mammalian enzyme cleaves tive enzyme that can remove all C-terminal amino acids peptide Substrates on the N-terminus of Arg residues in with the exception of Arg, Lys and Pro. dibasic Stretches. 35 (EC 3.4.17.15), a pancreatic diges Klebsiella pneumoniae protein pdqF. This protein is tive enzyme with a specificity Similar to that of car required for the biosynthesis of the coenzyme pyrrolo boxypeptidase A1, but with a preference for bulkier quinoline-quinone (POO). It is thought to be a protease C-terminal residues. that cleaves peptide bonds in a Small peptide (gene (EC 3.4.17.2), also a pancreatic pqqA) thus providing the glutamate and tyrosine resi 40 , but that preferentially removes dues necessary for the synthesis of PQQ. C-terminal Arg and LyS. Yeast protein AXL1, which is involved in axial budding Carboxypeptidase N (EC 3.4.17.3) (also known as argi (Becker et al. (1992) Proc. Natl. Acad Sci. U.S.A. nine carboxypeptidase), a plasma enzyme which pro 89:3835-3839). tects the body from potent vasoactive and inflammatory Eimeria bovis Sporozoite developmental protein. 45 peptides containing C-terminal Arg or Lys (Such as Escherichia coli hypothetical protein yddC and HI1368, kinins or anaphylatoxins) which are released into the the corresponding Haemophilus influenzae protein. circulation. Bacillus Subtilis hypothetical protein ymxG. Carboxypeptidase H (EC 3.4.17.10) (also known as Caenorhabditis elegans hypothetical proteins C28F5.4 50 enkephalin convertase or ), an and F56D2.1. enzyme located in Secretory granules of pancreatic It should be noted that in addition to the above enzymes, islets, adrenal gland, pituitary and brain. This enzyme this family also includes the core proteins I and II of the removes residual C-terminal Arg or Lys remaining after mitochondrial bc1 complex (also called cytochrome c reduc initial endoprotease cleavage during prohormone pro tase or complex III): 55 cessing. In mammals and yeast, core proteins I and II lack enzy Carboxypeptidase M (EC 3.4.17.12), a membrane bound matic activity. Arg and LyS Specific enzyme. It is ideally situated to act In NeuroSpora crassa and in potato core protein I is on peptide hormones at local tissue Sites where it could equivalent to the beta subunit of MPP. control their activity before or after interaction with In Euglena gracilis, core protein I Seems to be active, 60 Specific plasma membrane receptors. while Subunit II is inactive. Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme These proteins do not share many regions of Sequence with a specificity to , but found in Similarity; the most noticeable is in the N-terminal Section. the Secretory granules of mast cells. This region includes a conserved histidine followed, two Streptomyces griseus carboxypeptidase (Cpase SG) (EC residues later by a glutamate and another histidine. In 65 3.4.17.-) (Narahashi, Y. (1990) J. Biochem. pitrilysin, it has been shown (Fujita et al. (1994) Nature 107:879-886), which combines the specificities of 372:567-570) that this H-x-x-E-H motif is involved in mammalian carboxypeptidases A and B. US 6,395,889 B1 7 8 Thermoactinomyces vulgaris carboxypeptidase T (EC Family M4 3.4.17.18) (CPT) (Teplyakov et al. (1992) Eur: J. Thermostable thermolysins (EC 3.4.24.27), and related Biochem. 208:281-288), which also combines the thermolabile neutral proteases (bacillolysins) (EC Specificities of carboxypeptidases A and B. 3.4.24.28) from various species of Bacillus. AEBP1 (He et al. (1995) Nature 378:92–96), a transcrip Pseudolysin (EC 3.4.24.26) from Pseudomonas aerugi tional repressor active in preadipocytes. AEBP1 Seems noSa (gene lasB). to regulate transcription by cleavage of other transcrip Extracellular from StaphylococcuS epidermidis. tional proteins. Extracellular protease prt1 from Erwinia carotovora. Yeast hypothetical protein YHR132c. All of these enzymes bind an atom of Zinc. Three con 1O Extracellular minor protease Smp from Serratia marce Served residues are implicated in the binding of the Zinc SC8S. atom: two histidines and a glutamic acid. Vibriolysin (EC 3.4.24.25) from various species of Vibrio. Neutral Zinc Metallopeptidases, Zinc-Binding Region Sig Protease prtA from Listeria monocytogenes. nature Extracellular proteinase proA from Legionella pneumo The majority of zinc-dependent metallopeptidases (with 15 phila. the notable exception of the carboxypeptidases) share a Family M5 common pattern of primary structure (Jongeneel et al. Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi. (1989) FEBS Lett. 242:211–214; Murphy et al. (1991) FEBS Family M6 Lett. 289:4-7) in the part of their sequence involved in the Immune inhibitor A from Bacillus thuringiensis (gene binding of Zinc, and can be grouped together as a Super ina). Ina degrades two classes of insect antibacterial family on the basis of this Sequence Similarity. They can be proteins, attacins and cecropins. classified into a number of distinct families (Rawlings et al. Family M7 (1995) Meth. Enzymol. 248:183-228) including, but not Streptomyces extracellular Small neutral proteases limited to those listed below, along with Some proteases that 25 Family M8 belong to these families: Leishmanolysin (EC 3.4.24.36) (surface glycoprotein Family M1 gp63), a cell Surface protease from various species of Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN). Leishmania. Mammalian aminopeptidase N (EC 3.4.11.2). Family M9 Mammalian (EC 3.4.11.7) Microbial (EC 3.4.24.3) from CloStridium (aminopeptidase A). It may play a role in regulating perfringens and Vibrio alginolyticus. growth and differentiation of early B-lineage cells. Family M10A Yeast aminopeptidase yScII (gene APE2). Serralysin (EC 3.4.24.40), an extracellular metallopro tease from Serratia. Yeast alanine/arginine aminopeptidase (gene AAP1). 35 Yeast hypothetical protein YIL137c. Alkaline metalloproteinase from Pseudomonas aerugi Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is noSa (gene aprA). responsible or the hydrolysis of an epoxide moiety of Secreted proteases A, B, C and G from Erwinia chrysan LTA-4 to form LTB-4; it has been shown (Medina et al. themi. (1991) Proc. Natl. AcadSci. U.S.A.88:7620-7624) that 40 Yeast hypothetical protein YIL 108w. it binds Zinc and is capable of peptidase activity. Family M10B Family M2 Mammalian extracellular matrix metalloproteinases Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl (known as matrixins) (Woessner (1991) FASEB J. carboxypeptidase I) (ACE) the enzyme responsible for 5:2145-2154); MMP-1 (EC 3.4.24.7) (interstitial 45 collagenase), MMP-2 (EC 3.4.24.24) (72 Kd hydrolyzing angiotensin I to angiotensin II. There are ), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), two forms of ACE: a testis-specific isozyme and a MMP-7 (EC 3.4.24.23) (matrylisin), MMP-8 (EC Somatic isozyme which has two active centers (Ehlers 3.4.24.34) (neutrophil collagenase), MMP-3 (EC et al. (1991) Biochemistry 30:7118–7126). 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) Family M3 50 (stromelysin-2), and MMP-11 (stromelysin-3), MMP Thimet oligopeptidase (EC 3.4.24.15), a mammalian 12 (EC 3.4.24.65) (macrophage metalloelastase). enzyme involved in the cytoplasmic degradation of Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12), a Small peptides. protease that allows the embryo to digest the protective Neurolysin (EC 3.4.24.16) (also known as mitochondrial envelope derived from the egg extracellular matrix. oligopeptidase M or microSomal ). 55 Soybean metalloendoproteinase 1. Mitochondrial intermediate peptidase precursor (EC Family M11 3.4.24.59) (MIP). It is involved the second stage of Chlamydomonas reinhardtii gamete lytic enzyme (GLE). processing of Some proteins imported in the mitochon Family M12A drion. 60 Astacin (EC 3.4.24.21), a crayfish endoprotease. Yeast saccharolysin (EC 3.4.24.37) (proteinase yScD) Meprin A (EC 3.4.24.18), a mammalian kidney and (Buchler et al. (1994) Eur: J. Biochem. 219:627-639). intestinal brush border . Escherichia coli and related bacteria dipeptidyl carbox Bone morphogenic protein 1 (BMP-1), a protein which ypeptidase (EC 3.4.15.5) (gene dcp). induces cartilage and bone formation and which Escherichia coli and related bacteria oligopeptidase A 65 expresses metalloendopeptidase activity. The Droso (EC 3.4.24.70) (gene opdA or prlC). phila homolog of BMP-1 is the dorsal-ventral pattern Yeast hypothetical protein YKL134c. ing protein tolloid. US 6,395,889 B1 9 10 Blastula protease 10 (BP10) from Paracentrotus lividus tionary related peptidases whose catalytic activity Seems to and the related protein SpAN from Strongylocentrotus be provided by a charge relay System similar to that of the purpuratus. family of serine proteases, but which evolved by Caenorhabditis elegans hypothetical proteins F42A10.8 independent convergent evolution. The known members of and R151.5. 5 this family include: (EC 3.4.21.26) (PE) (also called Choriolysins L and H (EC 3.4.24.67) (also known as post-proline cleaving enzyme). PE is an enzyme that embryonic hatching proteins LCE and HCE) from the cleaves peptide bonds on the C-terminal Side of prolyl fish Oryzias lapides. These proteases participate in the residues. The sequence of PE has been obtained from a breakdown of the egg envelope, which is derived from mammalian Species (pig) and from bacteria the egg extracellular matrix, at the time of hatching. (Flavobacterium meningOSepticum and Aeromonas Family M12B hydrophila); there is a high degree of Sequence con Snake metalloproteinases (Hite et al. (1992) Bio. Servation between these Sequences. Chem. Hoppe-Seyler 373:381-385). This subfamily Esche richia coli protease II (EC 3.4.21.83) mostly groups proteases that act in hemorrhage. () (gene prtB) which cleaves peptide Examples are: adamalysin II (EC 3.4.24.46), atrolysin 15 bonds on the C-terminal side of lysyl and argininyl C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibro residues. lase (EC 3.4.24.72), trimerelysin I (EC 3.4.25.52) and Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV II (EC 3.4.25.53). is an enzyme that removes N-terminal dipeptides Mouse cell surface antigen MS2. Sequentially from polypeptides having unsubstituted N-termini provided that the penultimate residue is Family M13 proline. Mammalian (EC 3.4.24.11) (neutral Yeast vacuolar dipeptidyl aminopeptidase A (DPAPA) endopeptidase) (NEP). (gene: STE13) which is responsible for the proteolytic Endothelin-converting enzyme I (EC 3.4.24.71) (ECE-1), maturation of the alpha-factor precursor. which processes the precursor of endothelin to release 25 Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) the active peptide. (gene: DAP2). Kell blood group glycoprotein, a major antigenic protein Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl of erythrocytes. The Kell protein is very probably a peptide hydrolase). This enzyme catalyzes the hydroly Zinc endopeptidase. sis of the amino-terminal peptide bond of an Peptidase O from LactococcuS lactis (gene pepO). N-acetylated protein to generate a N-acetylated amino Family M27 acid and a protein with a free amino-terminus. Clostridial neurotoxins, including tetanus toxin (TeTx) A conserved Serine residue has experimentally been and the various botulinum toxins (BoNT). These toxins shown (in E. coli protease II as well as in pig and bacterial are Zinc proteases that block neurotransmitter release PE) to be necessary for the catalytic mechanism. This serine, by proteolytic cleavage of Synaptic proteins Such as 35 which is part of the (Ser, His, Asp), is synaptobrevins, syntaxin and SNAP-25 (Montecucco generally located about 150 residues away from the et al. (1993) Trends Biochem. Sci. 18:324-327; C-terminal extremity of these enzymes (which are all pro Niemann et al. (1994) Trends Cell Biol. 4:179-185). teins that contains about 700 to 800 amino acids). Family M30 Renal active Site 40 Renal dipeptidase (rDP) (EC 3.4.13.19), also known as StaphylococcuS hyicuS neutral metalloprotease. microSomal dipeptidase, is a zinc-dependent metalloenzyme Family M32 which hydrolyzes a wide range of dipeptides. It is involved Thermostable carboxypeptidase 1 (EC 3.4.17.19) in renal metabolism of glutathione and its conjugates. It is a (carboxypeptidase Taq), an enzyme from Thermus homodimeric disulfide-linked glycoprotein attached to the aquaticuS which is most active at high temperature. 45 renal brush border microvilli membrane by a GPI-anchor. Family M34 A glutamate residue has recently been shown (Adachi et Lethal factor (LF) from Bacillus anthracis, one of the al. (1993) Biochim. Biophys. Acta 1163:42-48) to be impor three proteins composing the anthrax toxin. tant for the catalytic activity of rDP. Family M35 RDP seems to be evolutionary related to hypothetical Deuterolysin (EC 3.4.24.39) from Penicillium citrinum 50 proteins in the PQQ biosynthesis operons of Acinetobacter and related proteases from various species of Aspergil calcoaceticus and Klebsiella pneumoniae. lus. Aminopeptidase P and Proline Dipeptidase Signature Family M36 Aminopeptidase P (EC 3.4.11.9) is the enzyme respon Extracellular elastinolytic metalloproteinases from Sible for the release of any N-terminal amino acid adjacent Aspergillus. 55 to a proline residue. Proline dipeptidase (EC 3.4.13.9) From the tertiary structure of , the position of (prolidase) splits dipeptides with a prolyl residue in the the residues acting as Zinc ligands and those involved in the carboxyl terminal position. catalytic activity are known. Two of the Zinc ligands are Bacterial aminopeptidase PII (gene pepP) (Yoshimoto et histidines which are very close together in the Sequence; al. (1989) J. Biochem. 105:412–416), proline dipeptidase C-terminal to the first histidine is a glutamic acid residue 60 (gene pepO) (Nakahigashi et al. (1990) Nucleic Acids Res. which acts as a nucleophile and promotes the attack of a 18:6439-6439), and human proline dipeptidase (gene water molecule on thecarbonyl carbon of the substrate. PEPD) (Endo et al. (1989) J. Biol chem. 264: 4476–4481) Prolyl oligopeptidase family Serine active site are evolutionary related. These proteins are manganese The prolyl oligopeptidase family (Rawlings et al. (1991) metalloenzymes. Biochem. J. 279:907–911; Barrett et al. (1992) Biol. Chem. 65 Yeast hypothetical proteins YER078c and YFR006w and Hoppe-Seyler 373:353–360; Polgaret al. (1992) Biol. Chem. Mycobacterium tuberculosis hypothetical protein Hoppe-Seyler 373:361-366) consist of a number of evolu MtCY49.29c also belong to this family. US 6,395,889 B1 11 12 Methionine Aminopeptidase Signatures carboxypeptidases, like that of the trypsin family Serine Methionine aminopeptidase (EC 3.4.11.18) (MAP) is proteases, is provided by a charge relay System involving an responsible for the removal of the amino-terminal (initiator) aspartic acid residue hydrogen-bonded to a histidine, which methionine from nascent eukaryotic cytosolic and cytoplas is itself hydrogen-bonded to aserine (Liao et al. (1990).J. mic prokaryotic proteins if the penultimate amino acid is 5 Biol. Chem. 265:6528-6531). Proteins known to be serine small and uncharged. All MAP studied to date are mono carboxypeptidases include, but are not limited to: meric proteins that require cobalt ions for activity. Barley and wheat Serine carboxypeptidases I, II, and III Two subfamilies of MAP enzymes are known to exist (Sorenson et al. (1989) Carlsberg Res. Commun. (Arfin et al. (1995) Proc. Natl. Acad. Sci. U.S.A. 54:193–202). 92:7714–1128; Keeling et al. (1996) Trends Biochem. Sci. Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacu 21:285-286). While being evolutionary related, they only olar protease involved in degrading Small peptides. share a limited amount of Sequence Similarity mostly clus tered around the residues shown, in the Escherichia coli Yeast KEX1 protease, involved in killer toxin and alpha MAP (Roderick et al. (1993) Biochemistry 32:3907–3912), factor precursor processing. to be involved in cobalt-binding. 15 Fission yeast SXa2, a probable carboxypeptidase involved The first family consists of enzymes from prokaryotes as in degrading or processing mating pheromones (Imai et well as eukaryotic MAP-1, while the Second group is made al. (1992) Mol. Cell. Biol. 12:1827–1834). up of archebacterial MAP and eukaryotic MAP-2. The Penicillium janthinellum carboxypeptidase S1 (Svendsen Second Subfamily also includes proteins which do not seem et al. (1993) FEBS Lett. 333:39043). to be MAP, but that are clearly evolutionary related such as Aspergulus niger carboxypeptidase pepF. mouse proliferation-associated protein 1 and fission yeast Aspergulus Satoi carboxypeptidase cpdS. curved DNA-binding protein. Vertebrate protective protein/ (Galjart et al. Matrixins Cysteine Switch (1991) J. Biol. Chem 266:14754–14762), a lysosomal Mammalian extracellular matrix metalloproteinases (EC protein which is not only a carboxypeptidase but also 3.4.24-), also known as matrixins (Woessner (1991) FASFB 25 essential for the activity of both beta-galactosidase and J. 5:2145-2154) (see ), are zinc-dependent enzymes. They neuraminidase. are Secreted by cells in an inactive form (zymogen) that Mosquito Vitellogenic carboxypeptidase (VCP) (Cho et differs from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is at. (1991) Proc. Natl. Acad. Sci. U.S.A. found two residues downstream of the C-terminal end of the 88: 10821–10824). propeptide. This region has been shown to be involved in Naegleria fowleri virulence-related protein Nf314 (Hu et autoinhibition of matrixins (Sanchez-Lopez et al. (1988) J. al. (1992) Infect. Immun. 60:2418-2424). Biol. Chem. 266:11892-11899; Parks et al. (1991).J. Biol. Yeast hypothetical protein YBR139w. Chem. 266:1584-1590); a cysteine within the octapeptide Caenorhabditis elegans hypothetical proteins C08H9.1, chelates the active site Zinc ion, thus inhibiting the enzyme. 35 F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This region has been called the “cysteine Switch' or “auto This family also includes: inhibitor region'. Sorghum (S)-hydroxymandelonitrile (EC 4.1.2.11) A cysteine Switch has been found in the following Zinc (hydroxynitrile lyase) (HNL) (Wajant et al. (1994) proteases: Plant Mol. Biol. 26:735-746), an enzyme involved in MP-1 (EC 3.4.24.7) (interstitial collagenase). 40 plant cyanogenesis. The Sequences Surrounding the active site Serine and MP-2 (EC 3.4.24.24) (72 Kid gelatinase). histidine residues are highly conserved in all these Serine MP-3 (EC 3.4.24.17) (stromelysin-1). carboxypeptidases. MP-7 (EC 3.4.24.23) (matrilysin). (Siezen et al. (1991) Protein Eng. 4:719–737; MP-8 (EC 3.4.24.34) (neutrophil collagenase). 45 Siezen, R. J. (1992) In: Proceedings Symposium, MP-9 (EC 3.4.24.35) (92 Kd gelatinase). Hamburg) are an extensive family of Serine proteases whose catalytic activity is provided by a charge relay System MP-10 (EC 3.4.24.22) (stromelysin-2). Similar to that of the trypsin family of Serine proteases but MP-11 (EC 3.4.24-) (stromelysin-3). which evolved by independent convergent evolution. The MP-12 (EC 3.4.24.65) (macrophage metalloelastase). 50 Sequence around the residues involved in the catalytic triad MP-13 (EC 3.4.24.-) (collagenase 3). (aspartic acid, Serine and histidine) are completely different MP-14 (EC 3.4.24-) (membrane-type matrix metallip from that of the analogous residues in the trypsin Serine roteinase 1). proteases. The family is described below. MP-15 (EC 3.4.24-) (membrane-type matrix metallip Proteasome A-type Subunits Signature 55 The proteasome (or macropain) (EC 3.4.99.46) (Rivett roteinase 2). (1993) Biochem. J. 29: 1-10; Rivett (1989) Arch. Biochem MP-16 (EC 3.4.24-) (membrane-type matrix metallip Biophys. 268:1-8; Goldbert et al. (1992) Nature roteinase 3). 357:375–379; Wilk (1993) Enzyme Protein 47: 187–188; Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) Hilt et a. (1996) Trends Biochem. Sci. 21:96–102) is an (Lepage et al. (1990) EMBO J. 93003-3012). 60 eukaryotic and archaebacterial multicatalytic proteinase Chlamydomonas reinhardtii gamete lytic enzyme (GLE) complex that seems to be involved in an ATP/ubiquitin (Kinshita et al. (1992) Proc. Natl. Acad. Sci. U.S.A. dependent nonlySOSomal proteolytic pathway. In eukaryotes 89:4693-4697). the proteasome is composed of about 28 distinct Subunits Serine Carboxypeptidases, Active Sites which form a highly ordered ring-shaped structure (20S All known carboxypeptidases are either metallo carbox 65 ring) of about 700 Kd. ypeptidases or Serine carboxypeptidases (EC 3.4.16.5 and Most proteasome Subunits can be classified, on the basis EC 3.4.16.6). The catalytic activity of the serine on Sequence Similarities into two groups, A and B. Subunits US 6,395,889 B1 13 14 that belong to the A-type group are proteins of from 210 to Cell envelope-located proteases PI, PII, and PIII from 290 amino acids that share a number of conserved Sequence LactococcuS lactis. regions. Subunits that are known to belong to this family Extracellular from Serratia marceScens. include, but are not limited to: Extracellular protease from Xanthomonas campestris. Vertebrate subunits C2 (nu), C3, C8, C9, iota and Zeta. Intracellular serine protease (ISP) from various Bacillus. Drosophila PROS-25, PROS-28.1, PROS-29 and PROS Minor extracellular serine protease epr from Bacillus 35. Subtilis (gene epr). Yeast C1 (PRS1), C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Minor extracellular serine protease Vpr from Bacillus Y13, PRE5, PRE6 and PUP2. Subtilis (gene pr). Arabidopsis thaliana subunits alpha and PSM30. Nisin leader peptide processing protease nisp from Lac Thermoplasma acidophilum alpha-Subunit. In this archae toCOccuS lactis. bacteria the proteasome is composed of only two Serotype-Specific antigene 1 from Pasteurella different Subunits. haemolytica (gene Ssa1). Proteasome B-type Subunits Signature 15 The proteasome (or macropain) (EC 34.99.46) (Rivett (EC 3.4.21.66) from Thermoactinomyces vul (1993) Biochem. J. 29:1-10; Rivett (1989) Arch. Biochem garis. Biophys. 268:1-8; Goldbert et al. (1992) Nature Calcium-dependent protease from Anabaena variabilis 357:375–379; Wilk (1993) Enzyme Protein 47: 187–188; (gene preA). Hilt et al. (1996) Trends Biochem. Sci. 21:96–102) is an Halolysin from halophilic bacteria sp. 172p1 (gene hly). eukaryotic and archaebacterial multicatalytic proteinase Alkaline extracellular protease (AEP) from Yarrowia complex that seems to be involved in an ATP/ubiquitin lipolytica (gene Xpr2). dependent nonly SOSomal proteolytic pathway. In eukaryotes Alkaline proteinase from Cephalosporium acremonium the proteasome is composed of about 28 distinct Subunits (gene alp). which form a highly ordered ring-shaped structure (20S 25 ring) of about 700 Kd. Cerevisin (EC 3.4.21.48) (vacuolar protease B) from Most proteasome Subunits can be classified, on the basis yeast (gene PRB1). on Sequence Similarities into two groups, A and B. Subunits Cuticle-degrading protease (pr1) from Meta rhizium that belong to the B-type group are proteins of from 190 to anisopliae. 290 amino acids that share a number of conserved Sequence KEX-1 protease from Kluyveromyces lactis. regions. Subunits that belong to this family include, but are (EC 3.4.21.61) from yeast (gene KEX-2). not limited to: (EC 3.4.21.63) (alkaline proteinase) from Vertebrate subunits C5, beta, delta, epsilon, theta (C10 Aspergillus (gene alp). II), LMP2/RING12, C13 (LMP7/RING10), C7-I and (EC 3.4.21.64) from Tritirachium album MECL-1 35 (gene proK). Yeast PRE1, PRE2 (PRG1), PRE3, PRE4, PRS3, PUP1 Proteinase R from Tritirachium album (gene proR). and PUP3. Proteinase T from Tritirachium album (gene proT). Drosophila L(3)73AI. Subtilisin-like protease III from yeast (gene YSP3). Fission yeast pts1. 40 Thermomycolin (EC 3.4.21.65) from Malbranchea Sul Thermoplasma acidophilum beta-subunit. In this archae furea. bacteria the proteasome is composed of only two (EC 3.4.21.85), neuroendocrine convertases 1 to 3 different Subunits. (NEC-1 to 3) and PACE4 protease from mammals, Serine proteases Subtilase family, active sites other vertebrates, and invertebrates. These proteases Subtilases (Siezen et al. (1991) Protein Eng. 4:719-737, 45 Siezen, R. J. (1992) In: Proceedings Subtilisin Symposium, are involved in the processing of hormone precursors at Hamburg) are an extensive family of Serine proteases whose Sites comprised of pairs of basic amino acid residues catalytic activity is provided by a charge relay System (Barr, P. J. (1991) Cell 66:1-3). Similar to that of the trypsin family of Serine proteases but Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl which evolved by independent convergent evolution. The 50 aminopeptidase) from human. Sequence around the residues involved in the catalytic triad Prestalk-specific proteins tagB and tagC from Slime mold (aspartic acid, Serine and histidine) are completely different (Shaulsky et al. (1995) Genes Dev. 9:1111–1122). Both from that of the analogous residues in the trypsin Serine proteins consist of two domains: a N-terminal Subtilase proteases. The Subtilase family includes, but is not limited catalytic domain and a C-terminal ABC transporter to, the following proteases. 55 domain. (EC 3.4.21.62), these alkaline proteases from Serine Proteases, Trypsin Family, Active Sites various Bacillus Species have been the target of numer The catalytic activity of the Serine proteases from the ouS Studies in the past thirty years. trypsin family is provided by a charge relay System involv Alkaline elastase YaB from Bacillus sp. (gene ale). ing an aspartic acid residue hydrogen-bonded to a histidine, 60 which itself is hydrogen-bonded to a Serine. The Sequences Alkaline Serine exoprotease A from Vibrio alginolyticuS in the vicinity of the active Site Serine and histidine residues (gene proA). are well conserved in this family of proteases (Brenner Aqualysin I from Thermus aquaticus (gene pst). (1988) Nature 334:528–530). Proteases that belong to the ASpA from Aeromonas Salmonicida. trypsin family include, but are not limited to: Bacillopeptidase F (esterase) from Bacillus subtilis (gene 65 . bpf). Blood factors VII, IX, X, XI and XII, from Streptococcus pyogenes (gene ScpA). , plasminogen, and . US 6,395,889 B1 15 16 . Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) . (UCH) (deubiquitinating enzymes) (Jentsch et al. (1991) Complement components C1r, C1S, C2, and complement Biochim. Biophysi. Acta 1089:127-139) are thiol proteases factors B, D and I. that recognize and hydrolyze the peptide bond at the 5 C-terminal glycine of ubiquitin. These enzymes are involved Complement-activating component of RA-reactive factor. in the processing of poly-ubiquitin precursors as well as that Cytotoxic cell proteases ( A to H). of ubiquinated proteins. Duodenase I. There are two distinct families of UCH. The Second class 1, 2, 3 A, 3B (protease E), leukocyte (Papa et al. (1993) Nature 366:313–319) consist of large (medullasin). proteins (800 to 2000 residues) and is represented by: Enterokinase (EC 3.4.21.9) (). Yeast UBP1, UBP2, UBP3, UBP4 (or DOA4/SSV7), Hepatocyte growth factor activator. UBP5, UBP7, UBP9, UBP11, UBP12 and UBP13. . Yeast hypothetical protein YBR058c. Glandular (tissue) (including EGF-binding 15 Yeast hypothetical protein YFR010w. protein types A, B, and C, NGF-gamma chain, gamma Yeast hypothetical protein YMR304w. renin, prostate specific antigen (PSA) and tonin). Yeast hypothetical protein YMR223w. Plasma . Yeast hypothetical protein YNL186w. Mast cell proteases (MCP) 1 () to 8. Human tre-2. () (Wegener's autoantigen). Human isopeptidase T. Plasminogen activators (-type, and tissue-type). Human isopeptidase T-3. Trypsins I, II, III, and IV. Mammalian Ode-1. . Mammalian Unp. proteases Such as , , 25 cerastobin, flavoxobin, and protein C activator. Mouse Dub-1. Collagenase from common cattle grub and collagenolytic Drosophila fat facets protein (gene faf). protease from Atlantic Sand fiddler crab. Mammalian faf homolog. Apollipoprotein(a). Caenorhabditis elegans hypothetical protein R10E 11.3. Caenorhabditis elegans hypothetical protein KO2C4.3. Blood fluke cercarial protease. These proteins only share two regions of Similarity. The Drosophila trypsin like proteases: alpha, easter, Snake first region contains a conserved cysteine which is probably locus. implicated in the catalytic mechanism. The Second region Drosophila protease Stubble (gene Sb). contains two conserved histidines residues, one of which is Major mite fecal allergen Der p III. 35 also probably implicated in the catalytic mechanism. All the above proteins belong to family S1 in the classi The identification and characterization of the novel genes fication of peptidases (Rawlings et al. (1994) Meth. Enzy encoding the novel human proteases is described. The mol. 244:19-61) and originate from eukaryotic species. It invention is based, at least in part, on the identification of should be noted that bacterial proteases that belong to family human genes encoding members of protease families, S2A are similar enough in the regions of the active site 40 including but not limited to those described herein. The residues that they can be picked up by the same patterns. human protease family members were isolated based on a These proteases include, but are not limited to: Specific consensus motif or protein domain characteristic of Achromobacter lyticus protease I. a protease family of proteins. The Search of the nucleic acid LySobacter alpha-lytic protease. Sequence database (usually derived from random cDNA Streptogrisin A and B (Streptomyces proteases A and B). 45 library sequencing) was performed with one or more HMM Streptomyces griseuS glutamyl endopeptidase II. motifs, a TBLASTN set, or both. Streptomyces fradiae proteases 1 and 2. The TBLASTN set included a set of protein sequence Ubiquitin Carboxyl-Terminal Hydrolases Family 1 Cysteine probes which correspond to amino acid Sequence motifs that Active Sites are conserved in the protease family of proteins. Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) 50 The HMM motif included a consensus sequence for a (UCH) (deubiquitinating enzymes) (Jentsch et al. (1991) protein domain. Such consensus Sequences can be found in Biochim. Biophys. Acta 1089:127-139) are thiol proteases a database of Hidden Markov Models (HMMs), e.g., the that recognize and hydrolyze the peptide bond at the Pfam database, release 2.1, (http://www.sanger.ac.uk/ C-terminal glycine of ubiquitin. These enzymes are involved Software/Pfam/HMM search). A description of the Pfam in the processing of poly-ubiquitin precursors as well as that 55 database can be found in Sonhammer et al. (1997) Proteins of ubiquinated proteins. 23(3):405-420 and detailed description of HMMs can be There are two distinct families of UCH. The first class found in, for example, Gribskov et al. (1990) Meth. Enzy consist of enzymes of about 25 Kd and is currently repre mol. 183:146-159; Gribskov et al. (1987) Proc. Natl. Acad. sented by: Sci. USA 84:4355-4358 Krogh et al. (1994) J. Mol. Biol. 60 235:1501–1531; and Stultz et al. (1993) Protein Sci. Mammalian isozymes L1 and L3. 2:305-314, the contents of which are incorporated herein by Yeast YUH1. reference. Drosophila Uch. The Sequences of the positive clones were determined and One of the active site residues of class-1 UCH (Johnston are set forth herein as SEO ID NOS:1-33. et al. (1997) EMBO J. 16:3787–3796) is a cysteine. 65 Polynucleotides Ubiquitin Carboxyl-Terminal Hydrolases Family 2 Signa Accordingly, the invention provides isolated nucleic acid tureS molecules comprising a nucleotide Sequence Selected from US 6,395,889 B1 17 18 the group consisting of SEQ ID NOS: 1-33 and the comple homogeneity, for example as determined by PAGE or col ments thereof. The Sequence Listing shows the relationship umn chromatography such as HPLC. Preferably, an isolated between each nucleotide Sequence and protease family. nucleic acid comprises at least about 50, 80 or 90% (on a In one embodiment, the isolated nucleic acid molecule molar basis) of all macromolecular species present. has the formula: The invention provides isolated polynucleotides encoding protease polypeptides. The nucleic acid molecule can include all or a portion of wherein, at the 5' end of the molecule R is either hydrogen the coding Sequence. In one embodiment, the protease or any nucleotide residue when n=1, and is any nucleotide nucleic acid comprises only the coding region. The poly residue when n>1; at the 3' end of the molecule R is either nucleotides include, but are not limited to, the Sequence hydrogen, a metal or any nucleotide residue when m=1, and encoding the mature polypeptide alone or the Sequence is any nucleotide residue when m>1, in and m are integers encoding the mature polypeptide and additional coding between about 1 and 5000; and R is a nucleic acid having Sequences, Such as a leader or Secretory Sequence (e.g., a a nucleotide Sequence Selected from the group consisting of pre-pro or pro-protein sequence). Such sequences may play SEQ ID NOS: 1-33 and the complements of SEQ ID NOS: 15 a role in processing of a protein from precursor to a mature 1-33. The R nucleic acid is oriented so that its 5' residue is form, facilitate protein trafficking, prolong or shorten protein bound to the 3' molecule of R, and its 3' residue is bound half-life or facilitate manipulation of a protein for assay or to the 5' molecule of Rs. Any Stretch of nucleic acid residues production, among other things. AS generally is the case in denoted by either R or R, which is greater than 1, is situ, the additional amino acids may be processed away from preferably a heteropolymer, but can also be a homopolymer. the mature protein by cellular enzymes. The nucleic acid In certain embodiments, n and m are integers between about molecule can include the Sequence encoding the mature 1 and 2000, preferably between about 1 and 1000, and polypeptide, with or without the additional coding preferably between about 1 and 500. In other embodiments, Sequences, plus additional non-coding Sequences, for the isolated nucleic acid molecule is at least about 50 example introns and non-coding 5' and 3'Sequences Such as nucleotides, preferably at least about 100 nucleotides, more 25 transcribed but non-translated Sequences that play a role in preferably at least about 150 nucleotides, and even more transcription, mRNA processing (including splicing and preferably at least about 200 or more nucleotides in length. polyadenylation signals), ribosome binding and Stability of In still another embodiment, R and R are both hydrogen. mRNA. In addition, the polynucleotide may be fused to a The term “protease polynucleotide' or “protease nucleic marker Sequence encoding, for example, a peptide that acid” refers to nucleic acid having the Sequences shown in facilitates purification, Such as those described herein. SEQ ID NO: 1-33 as well as variants and fragments of the Protease polynucleotides can be in the form of RNA, such polynucleotides of SEQ ID NO: 1-33. as mRNA, or in the form of DNA, including cDNA and An "isolated” protease nucleic acid is one that is separated genomic DNA, obtained by cloning or produced by chemi from other nucleic acid present in the natural Source of the cal Synthetic techniqueS or by a combination thereof. The protease nucleic acid. Preferably, an "isolated nucleic acid 35 nucleic acid, especially DNA, can be double-Stranded or is free of Sequences which naturally flank the nucleic acid Single-Stranded. Single-Stranded nucleic acid can be the (i.e., Sequences located at the 5' and 3' ends of the nucleic coding Strand (Sense Strand) or the non-coding Strand (anti acid) in the genomic DNA of the organism from which the Sense Strand). nucleic acid is derived. However, there can be Some flanking Protease nucleic acids comprise the nucleotide Sequences nucleotide Sequences, for example up to about 5KB. The 40 shown in SEQ ID NOS: 1-33, corresponding to human important point is that the nucleic acid is isolated from protease cDNAs. flanking Sequences Such that it can be Subjected to the The invention further provides variant protease Specific manipulations described herein Such as recombinant polynucleotides, and fragments thereof, that differ from the expression, preparation of probes and primers, and other nucleotide sequence shown in SEQ ID NOS: 1-33 due to uses Specific to the protease nucleic acid Sequences. 45 degeneracy of the genetic code and thus encode the same Moreover, an "isolated” nucleic acid molecule, Such as a protein as that encoded by the nucleotide Sequences shown cDNA or RNA molecule, can be substantially free of other in SEO ID NOS: 1-33. cellular material, or culture medium when produced by The invention also provides protease nucleic acid mol recombinant techniques, or chemical precursors or other ecules encoding the variant polypeptides described herein. chemicals when chemically Synthesized. However, the 50 Such polynucleotides may be naturally occurring, Such as nucleic acid molecule can be fused to other coding or allelic variants (same locus), homologs (different locus), and regulatory Sequences and Still be considered isolated. orthologs (different organism), or may be constructed by For example, recombinant DNA molecules contained in a recombinant DNA methods or by chemical synthesis. Such vector or other construct (i.e., as part of a larger constructed non-naturally occurring variants may be made by mutagen nucleic acid) are considered isolated. Further examples of 55 esis techniques, including those applied to polynucleotides, isolated DNA molecules include recombinant DNA mol cells, or organisms. Accordingly, the variants can contain ecules maintained in heterologous host cells or purified nucleotide Substitutions, deletions, inversions and inser (partially or substantially) DNA molecules in solution. Iso tions. lated RNA molecules include in vivo or in vitro RNA Variation can occur in either or both the coding and transcripts of the isolated DNA molecules of the present 60 non-coding regions. The variations can produce both con invention. Isolated nucleic acid molecules according to the Servative and non-conservative amino acid Substitutions. present invention further include Such molecules produced Typically, variants have a Substantial identity with a Synthetically. nucleic acid molecule Selected from the group consisting of In Some instances, the isolated material will form part of SEQ ID NOS: 1-33 and the complements thereof. a composition (for example, a crude extract containing other 65 Orthologs, homologs, and allelic variants can be identified Substances), buffer System or reagent mix. In other using methods well known in the art. These variants com circumstances, the material may be purified to essential prise a nucleotide Sequence encoding a protease that is US 6,395,889 B1 19 20 50–55% at least about 55%, typically at least about 70–75%, one embodiment, the nucleic acid consists of a portion of a more typically at least about 80-85%, and most typically at nucleotide Sequence Selected from the group consisting of least about 90-95% or more homologous to the nucleotide SEQ ID NOS: 1-33 and the complements SEQ ID NOS: Sequence shown herein or a fragment of these Sequences. 1-33. The nucleic acid fragments of the invention are at least Such nucleic acid molecules can be readily identified as about 15, preferably at least about 18, 20, 23 or 25 being able to hybridize under Stringent conditions to a nucleotides, and can be 30, 40, 50, 100, 200 or more nucleotide Sequence or fragments thereof Selected from the nucleotides in length. Longer fragments, for example, 30 or more nucleotides in length, which encode antigenic proteins group consisting of SEQ ID NOS: 1-33 and the comple or polypeptides described herein are useful. Additionally, ments thereof. In one embodiment, the variants hybridize nucleotide Sequences described herein can also be contigged under high Stringency hybridization conditions (e.g., for to produce longer Sequences (see, for example, http:// Selective hybridization) to a nucleotide Sequence Selected boZeman.mbt.washington.edu/phrap.docs/phrap.html). from SEQ ID NOS: 1-33. It is understood that stringent In a related aspect, the nucleic acid fragments of the hybridization does not indicate Substantial homology where invention provide probes or primers in assayS. Such as those it is due to general homology, Such as poly A Sequences, or described below. “Probes' are oligonucleotides that hybrid Sequences common to all or most proteins, Sequences com 15 ize in a base-specific manner to a complementary Strand of mon to all or most proteases, or Sequences common to all or nucleic acid. Such probes include polypeptide nucleic acids, most members of the protease family to which the Specific as described in Nielsen el al., Science 254, 1497-1500 protease belongs. Moreover, it is understood that variants do (1991). Typically, a probe comprises a region of nucleotide not include any of the nucleic acid Sequences that may have Sequence that hybridizes under highly Stringent conditions to been disclosed prior to the invention. at least about 15, typically about 20-25, and more typically AS used herein, the term "hybridizes under Stringent about 40, 50 or 75 consecutive nucleotides of a nucleic acid conditions” is intended to describe conditions for hybrid selected from the group consisting of SEQ ID NOS: 1-33 ization and washing under which nucleotide Sequences and the complements thereof. More typically, the probe encoding a protease at least 50-55%, 55% homologous to further comprises a label, e.g., radioisotope, fluorescent each other typically remain hybridized to each other. The 25 compound, enzyme, or enzyme co-factor. conditions can be Such that Sequences at least about 65%, at AS used herein, the term “primer' refers to a single least about 70%, at least about 75%, at least about 80%, at Stranded oligonucleotide which acts as a point of initiation least about 90%, at least about 95% or more identical to each of template-directed DNA synthesis using well-known other remain hybridized to one another. Such Stringent methods (e.g., PCR, LCR) including, but not limited to those conditions are known to those skilled in the art and can be described herein. The appropriate length of the primer found in Current Protocols in Molecular Biology, John depends on the particular use, but typically ranges from Wiley & Sons, N.Y. (1989), 6.3.1–6.3.6, incorporated by about 15 to 30 nucleotides. The term “primer site” refers to reference. One example of Stringent hybridization condi the area of the target DNA to which a primer hybridizes. The tions are hybridization in 6xSodium chloride/sodium citrate term “primer pair refers to a Set of primers including a 5' (SSC) at about 45 C., followed by one or more washes in 35 (upstream) primer that hybridizes with the 5' end of the 0.2xSSC, 0.1% SDS at 50–65° C. In another non-limiting nucleic acid sequence to be amplified and a 3’ (downstream) example, nucleic acid molecules are allowed to hybridize in primer that hybridizes with the complement of the Sequence 6xsodium chloride/sodium citrate (SSC) at about 45° C., to be amplified. followed by one or more low stringency washes in 0.2x Fragments include nucleic acid Sequences corresponding SSC/0.1% SDS at room temperature, or by one or more 40 to Specific amino acid Sequences described herein. Further moderate stringency washes in 0.2xSSC/0.1% SDS at 42 fragments can include Subfragments of Specific domains or C., or washed in 0.2xSSC/0.1% SDS at 65° C. for high Sites, Such as proteolytic cleavage Sites, Sites of interaction Stringency. In one embodiment, an isolated protease nucleic with a protein that modifies or activates the protease (an acid molecule that hybridizes under Stringent conditions to “effector” protein), or substrate binding sites. Nucleic acid the sequence of SEQ ID NOS: 1-33 corresponds to a 45 fragments, according to the present invention, are not to be naturally-occurring nucleic acid molecule. AS used herein, a construed as encompassing those fragments that may have “naturally-occurring nucleic acid molecule refers to an been disclosed prior to the invention. RNA or DNA molecule having a nucleotide sequence that Protease nucleic acid fragments include Sequences corre occurs in nature (e.g., encodes a natural protein). sponding to any domain described herein, Subregions also AS understood by those of ordinary skill, the exact con 50 described, and Specific functional Sites, Such as binding and ditions can be determined empirically and depend on ionic cleavage Sites. Protease nucleic acid fragments also include Strength, temperature and the concentration of destabilizing combinations of the domains, regions, Segments, and other agents Such as formamide or denaturing agents Such as SDS. functional Sites described herein. A perSon of ordinary skill Other factors considered in determining the desired hybrid in the art would be aware of the many permutations that are ization conditions include the length of the nucleic acid 55 possible. Sequences, base composition, percent mismatch between the It is understood that a protease fragment includes any hybridizing Sequences and the frequency of occurrence of nucleic acid Sequence that does not include the entire gene. Subsets of the Sequences within other non-identical Where the location of the domains or sites have been Sequences. Thus, equivalent conditions can be determined predicted by computer analysis, one of ordinary skill would by varying one or more of these parameters while maintain 60 appreciate that the amino acid residues constituting these ing a similar degree of identity or Similarity between the two domains can vary depending on the criteria used to define nucleic acid molecules. the domains. The present invention also provides isolated nucleic acids The invention also provides protease nucleic acid frag that contain a single or double Stranded fragment or portion ments that encode epitope bearing regions of the protease that hybridizes under Stringent conditions to a nucleotide 65 proteins encoded by the cDNAs of the invention. Sequence Selected from the group consisting of SEQ ID For example, the coding region of a protease gene can be NOS: 1-33 and the complements of SEQ ID NOS: 1-33. In isolated using the known nucleotide Sequence to Synthesize US 6,395,889 B1 21 22 an oligonucleotide probe. A labeled probe can then be used The protease polynucleotides are useful as a hybridization to screen a cDNA library, genomic DNA library, or mRNA probe for cDNA and genomic DNA to isolate a full-length to isolate nucleic acid corresponding to the coding region. cDNA and genomic clones encoding the protease polypep Further, primers can be used in PCR reactions to clone tides and to isolate cDNA and genomic clones that corre Specific regions of protease genes. spond to variants producing the same protease polypeptides The nucleic acid molecules of the invention Such as those or the other types of variants described herein. Variants can described above can be identified and isolated using Stan be isolated from the Same tissue and organism from which dard molecular biology techniques and the Sequence infor the polypeptides were isolated, different tissues from the mation provided in SEQ ID NOS: 1-33. For example, Same organism, or from different organisms. This method is nucleic acid molecules can be amplified and isolated by the useful for isolating gene S and cDNA that are polymerase chain reaction using Synthetic oligonucleotide 1O developmentally-controlled and therefore may be expressed primerS designed based on one or more of the Sequences in the same tissue or different tissues at different points in the provided in SEQ ID NOS: 1-33 and the complements development of an organism. thereof. See generally PCR Technology: Principles and The probe can correspond to any Sequence along the Applications for DNA Amplification (ed. H. A. Erlich, Free entire length of the gene encoding the protease. Accordingly, man Press, New York, N.Y., 1992); PCR Protocols: A Guide 15 it could be derived from 5' noncoding regions, the coding to Methods and Applications (Eds. Innis, et al., Academic region, and 3' noncoding regions. Press, San Diego, Calif., 1990); Mattila et al., Nucleic Acids The nucleic acid probe can be, for example, the full Res. 19, 4967 (1991); Eckert el al., PCR Methods and length cDNA sequence of SEQ ID NOS: 1-33, or fragments Applications 1, 17 (1991), PCR (eds. McPherson et al., IRL thereof, such as an oligonucleotide of at least 12, 15, 30, 50, Press, Oxford); and U.S. Pat. No. 4,683.202. The nucleic 100, 250 or 500 nucleotides in length and sufficient to acid molecules can be amplified using cDNA, mRNA or specifically hybridize under stringent conditions to mRNA genomic DNA as a template, cloned into an appropriate or DNA. For example, the nucleic acid probe can be all or vector and characterized by DNA sequence analysis. a portion of SEQ ID NOS: 1-33, or the complement of SEQ Other Suitable amplification methods include the ID NOS: 1-33, or a portion thereof. Other suitable probes chain reaction (LCR) (see Wu and Wallace, Genomics 4,560 25 for use in the diagnostic assays of the invention are (1989), Landegren et al., Science 241, 1077 (1988), tran described herein. scription amplification (Kwoh et al., Proc. Natl. Acad Sci. Fragments of the polynucleotides described herein are USA 86, 1173 (1989)), and self-sustained sequence replica also useful to Synthesize larger fragments or full-length tion (Guatelli et al., Proc. Nat. Acad. Sci. USA, 87, 1874 polynucleotides described herein. For example, a fragment (1990)) and nucleic acid based sequence amplification can be hybridized to any portion of an mRNA and a larger (NASBA). The latter two amplification methods involve or full-length cDNA can be produced. isothermal reactions based on isothermal transcription, AntiSense nucleic acids of the invention can be designed which produce both single stranded RNA (SSRNA) and using the nucleotide Sequences of SEQ ID NOS: 1-33, and double stranded DNA (dsDNA) as the amplification prod constructed using chemical Synthesis and enzymatic ligation ucts in a ratio of about 30 or 100 to 1, respectively. 35 reactions using procedures known in the art. For example, an Polynucleotide uses antisense nucleic acid (e.g., an antisense oligonucleotide) The nucleic acid Sequences of the present invention can can be chemically Synthesized using naturally occurring further be used as a “query Sequence' to perform a Search nucleotides or variously modified nucleotides designed to against public databases to, for example, identify other increase the biological Stability of the molecules or to family members or related Sequences. Such Searches can be 40 increase the physical stability of the duplex formed between performed using the NBLAST and XBLAST programs the antisense and Sense nucleic acids, e.g., phosphorothioate (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. derivatives and acridine Substituted nucleotides can be used. 215:403-10. BLAST nucleotide searches can be performed Examples of modified nucleotides which can be used to with the NBLAST program, score=100, wordlength=12 to generate the antisense nucleic acid include 5-fluorouracil, obtain nucleotide Sequences homologous to the nucleic acid 45 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, molecules of the invention. To obtain gapped alignments for Xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) comparison purposes, Gapped BLAST can be utilized as uracil, 5-carboxymethylaminomethyl-2-thiouridine, described in Altschul et al., (1997) Nucleic Acids Res. 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D- 25(17):3389-3402. When utilizing BLAST and Gapped galactosylqueosine, inosine, N6-isopentenyladenine, BLAST programs, the default parameters of the respective 50 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, programs (e.g., XBLAST and NBLAST) can be used. See 2-methyladenine, 2-methylguanine, 3-methylcytosine, http://www.ncbi.nlm.nih.gov. 5-methylcytosine, N6-ade nine, 7-methylguanine, The protease polynucleotides are useful for probes, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2- primers, and in biological assays. Where the polynucleotides thiour a cil, be ta-D-man no Syl qu e o Sine, are used to assess protease properties or functions, Such as 55 5'-methoxycarboxymethyluracil, 5-methoxyuracil, in the assays described herein, all or less than all of the entire 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic cDNA can be useful. In this case, even fragments that may acid (v), Wybutoxosine, pseudou racil, queosine, have been known prior to the invention are encompassed. 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, Thus, for example, assays Specifically directed to protease 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid functions, Such as assessing agonist or antagonist activity, 60 methylester, uracil-5-oxyacetic acid (V), 5-methyl-2- encompass the use of known fragments. Further, diagnostic thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3) methods for assessing protease function can also be prac W, and 2,6-diaminopurine. Alternatively, the antisense ticed with any fragment, including those fragments that may nucleic acid can be produced biologically using an expres have been known prior to the invention. Similarly, in meth Sion vector into which a nucleic acid has been Subcloned in ods involving treatment of protease dysfunction, all frag 65 an antisense orientation (i.e., RNA transcribed from the ments are encompassed including those which may have inserted nucleic acid) will be of an antisense orientation to been known in the art. a target nucleic acid of interest. US 6,395,889 B1 23 24 Additionally, the nucleic acid molecules of the invention the amplification process. These primers can then be used can be modified at the base moiety, Sugar moiety or phos for PCR screening of Somatic cell hybrids containing indi phate backbone to improve, e.g., the Stability, hybridization, vidual human chromosomes. Only those hybrids containing or solubility of the molecule. For example, the deoxyribose the human gene corresponding to the appropriate nucleotide phosphate backbone of the nucleic acids can be modified to Sequences will yield an amplified fragment. PCR mapping of Somatic cell hybrids is a rapid procedure generate peptide nucleic acids (see Hyrup et al. (1996) for assigning a particular Sequence to a particular chromo Bioorganic & Medicinal Chemistry 4:5). As used herein, the Some. Three or more Sequences can be assigned per day terms “peptide nucleic acids” or “PNAS' refer to nucleic using a Single thermal cycle. Using the nucleic acid mol acid mimics, e.g., DNA mimics, in which the deoxyribose ecules of the invention to design oligonucleotide primers, phosphate backbone is replaced by a pseudopeptide back Sublocalization can be achieved with panels of fragments bone and only the four natural nucleobases are retained. The from Specific chromosomes. Other mapping Strategies which neutral backbone of PNAS has been shown to allow for can similarly be used to map a specified Sequence to its specific hybridization to DNA and RNA under conditions of chromosome include in Situ hybridization (described in Fan, low ionic strength. The synthesis of PNA oligomers can be Y et al. (1990) PNAS, 97:6223–27), pre-screening with performed using Standard Solid phase peptide Synthesis 15 labeled flow-Sorted chromosomes, and pre-Selection by protocols as described in Hyrup et al. (1996), Supra; Perry hybridization to chromosome specific cDNA libraries. O'Keefe et al. (1996) Proc. Natl. Acad. Sci. USA 93:14670. Fluorescence in situ hybridization (FISH) of a nucleotide PNAS can be further modified, e.g., to enhance their Sequence to a metaphase chromosomal spread can further be Stability, Specificity or cellular uptake, by attaching lipo used to provide a precise chromosomal location in one Step. philic or other helper groups to PNA, by the formation of Chromosome spreads can be made using cells whose divi PNA-DNA chimeras, or by the use of liposomes or other Sion has been blocked in metaphase by a chemical Such as techniques of drug delivery known in the art. The Synthesis colcemid that disrupts the mitotic Spindle. The chromo of PNA-DNA chimeras can be performed as described in Somes can be treated briefly with trypsin, and then Stained Hyrup (1996), Supra, Finn et al. (1996) Nucleic Acids Res. with Giemsa. A pattern of light and dark bands develops on 24(17):3357–63, Mag et al. (1989) Nucleic Acids Res. 25 each chromosome, So that the chromosomes can be identi 17:5973, and Peterser et al. (1975) Bioorganic Med. Chem. fied individually. The FISH technique can be used with a Lett. 5:1119. nucleotide sequence as short as 500 or 600 bases. However, clones larger than 1,000 bases have a higher likelihood of The nucleic acid molecules and fragments of the inven binding to a unique chromosomal location with Sufficient tion can also include other appended groupS. Such as peptides signal intensity for simple detection. Preferably 1,000 bases, (e.g., for targeting host cell proteases in vivo), or agents and more preferably 2,000 bases will suffice to get good facilitating transport across the cell membrane (see, e.g., results at a reasonable amount of time. For a review of this Letsinger et al. (1989) Proc. Natl. Acad Sci. USA technique, see Verma et al., Human Chromosomes. A 86:6553–6556; Lemaitre et al. (1987) Proc. Natl. Acad. Sci. Manual of Basic Techniques (Pergamon Press, New York USA 84:648–652; PCT Publication No. WO88/0918) or the 1988). blood brain barrier (see, e.g., PCT Publication No. WO89/ 35 Reagents for chromosome mapping can be used individu 10134). In addition, oligonucleotides can be modified with ally to mark a single chromosome or a single Site on that hybridization-triggered cleavage agents (see, e.g., Krol et al. chromosome, or panels of reagents can be used for marking (1988) Bio-Techniques 6:958-976) or intercalating agents multiple Sites and/or multiple chromosomes. Reagents cor (see, e.g., Zon (1988) Pharm Res. 5:539-549). responding to noncoding regions of the genes actually are The protease polynucleotides are also useful as primers 40 preferred for mapping purposes. Coding Sequences are more for PCR to amplify any given region of a protease poly likely to be conserved within gene families, thus increasing nucleotide. the chance of croSS hybridizations during chromosomal The protease polynucleotides are also useful for con mapping. Structing recombinant vectors. Such vectors include expres Once a Sequence has been mapped to a precise chromo Sion vectors that express a portion of, or all of, the protease 45 Somal location, the physical position of the Sequence on the polypeptides. Vectors also include insertion vectors, used to chromosome can be correlated with genetic map data. (Such integrate into another polynucleotide Sequence, Such as into data are found, for example, in V. McKusick, Mendelian the cellular genome, to alter in Situ expression of protease Inheritance in Man, available on-line through Johns Hop genes and gene products. For example, an endogenous kins University Welch Medical Library). The relationship protease coding Sequence can be replaced via homologous 50 between a gene and a disease, mapped to the same chromo recombination with all or part of the coding region contain Somal region, can then be identified through linkage analysis ing one or more specifically introduced mutations. (co-inheritance of physically adjacent genes), described in, The protease polynucleotides are also useful for express for example, Egeland, J et al. (1987) Nature, 325:783–787. ing antigenic portions of the proteases. Moreover, differences in the DNA sequences between The protease polynucleotides are also useful as probes for 55 individuals affected and unaffected with a disease associated determining the chromosomal positions of the proteases by with a specified gene, can be determined. If a mutation is means of in Situ hybridization methods. observed in Some or all of the affected individuals but not in Once the nucleic acid (or a portion of the Sequence) has any unaffected individuals, then the mutation is likely to be been isolated, it can be used to map the location of the gene the causative agent of the particular disease. Comparison of on a chromosome. The mapping of the Sequences to chro 60 affected and unaffected individuals generally involves first mosomes is an important first Step in correlating these looking for Structural alterations in the chromosomes, Such Sequences with genes associated with disease. Briefly, genes as deletions or translocations that are visible form chromo can be mapped to chromosomes by preparing PCR primers Some spreads or detectable using PCR based on that DNA (preferably 15-25 bp in length) from the nucleic acid Sequence. Ultimately, complete Sequencing of genes from molecules described herein. Computer analysis of the 65 Several individuals can be performed to confirm the presence Sequences can be used to predict primers that do not span of a mutation and to distinguish mutations from polymor more than one exon in the genomic DNA, thus complicating phisms. US 6,395,889 B1 25 26 The polynucleotide probes are also useful to determine acid expression. The modulator may bind to the nucleic acid patterns of the presence of the gene encoding the proteases or indirectly modulate expression, Such as by interacting and their variants with respect to tissue distribution, for with other cellular components that affect nucleic acid example, whether gene duplication has occurred and expression. whether the duplication occurs in all or only a Subset of The invention thus provides a method for identifying a tissues. The genes can be naturally-occurring or can have compound that can be used to treat a disorder associated been introduced into a cell, tissue, or organism exogenously. with nucleic acid expression of the protease gene. The The polynucleotides are also useful for designing method typically includes assaying the ability of the com ribozymes corresponding to all, or a part, of the mRNA pound to modulate the expression of the protease nucleic produced from genes encoding the polynucleotides acid and thus identifying a compound that can be used to described herein. treat a disorder characterized by undesired protease nucleic The polynucleotides are also useful for constructing host acid expression. cells expressing a part, or all, of the protease polynucleotides The assays can be performed in cell-based and cell-free and polypeptides. Systems. Cell-based assays include cells naturally express The polynucleotides are also useful for constructing trans 15 ing the protease nucleic acid or recombinant cells geneti genic animals expressing all, or a part, of the protease cally engineered to express Specific nucleic acid Sequences. polynucleotides and polypeptides. Modulatory methods can be performed in vitro (e.g., by The polynucleotides are also useful as hybridization culturing the cell with the agent) or, alternatively, in vivo probes for determining the level of protease nucleic acid (e.g., by administering the agent to a Subject) in patients or expression. Accordingly, the probes can be used to detect the in transgenic animals. presence of, or to determine levels of, protease nucleic acid The assay for protease nucleic acid expression can in cells, tissues, and in organisms. The nucleic acid whose involve direct assay of nucleic acid levels, Such as mRNA level is determined can be DNA or RNA, including mRNA. levels, or on collateral compounds involved in the pathway Accordingly, probes corresponding to the polypeptides in which the protease is found. Further, the expression of described herein can be 10 used to assess gene copy number 25 genes that are up- or down-regulated in response to the in a given cell, tissue, or organism. This is particularly protease function in the pathway can also be assayed. In this relevant in cases in which there has been an amplification of embodiment the regulatory regions of these genes can be the protease genes. operably linked to a reporter gene Such as luciferase. Alternatively, the probe can be used in an in Situ hybrid Accordingly, the invention provides methods of ization context to assess the position of extra copies of the treatment, with the nucleic acid as a target, using a com protease genes, as on extrachromosomal elements or as pound identified through drug Screening as a gene modulator integrated into chromosomes in which the protease gene is to modulate protease nucleic acid expression. Modulation not normally found, for example as a homogeneously stain includes both up-regulation (i.e. activation or agonization) ing region. or down-regulation (Suppression or antagonization) or These uses are relevant for diagnosis of disorders involv 35 effects on nucleic acid activity (e.g. when nucleic acid is ing an increase or decrease in protease expression relative to mutated or improperly modified) Treatment is of disorders normal, Such as a proliferative disorder, a differentiative or characterized by aberrant expression or activity of nucleic developmental disorder, or a hematopoietic disorder. acid. Thus, the present invention provides a method for iden One aspect of the invention relates to diagnostic assays tifying a disease or disorder associated with aberrant expres 40 for determining nucleic acid expression as well as activity in Sion or activity of protease nucleic acid, in which a test the context of a biological sample (e.g., blood, Serum, cells, Sample is obtained from a Subject and nucleic acid (e.g., tissue) to thereby determine whether an individual has a mRNA, genomic DNA) is detected, wherein the presence of disease or disorder, or is at risk of developing a disease or the nucleic acid is diagnostic for a Subject having or at risk disorder, asSociated with aberrant expression or activity. of developing a disease or disorder associated with aberrant 45 Such assays can be used for prognostic or predictive purpose expression or activity of the nucleic acid. to thereby prophylactically treat an individual prior to the In vitro techniques for detection of mRNA include North onset of a disorder characterized by or associated with ern hybridizations and in situ hybridizations. In vitro tech expression or activity of the nucleic acid molecules. niques for detecting DNA includes Southern hybridizations The protease polynucleotides are also useful for monitor and in Situ hybridization. 50 ing the effectiveness of modulating compounds on the Probes can be used as a part of a diagnostic test kit for expression or activity of the protease gene in clinical trials identifying cells or tissues that express a protease, Such as by or in a treatment regimen. Thus, the gene expression pattern measuring a level of a protease-encoding nucleic acid in a can Serve as a barometer for the continuing effectiveness of Sample of cells from a Subject e.g., mRNA or genomic DNA, treatment with the compound, particularly with compounds or determining if a protease gene has been mutated. 55 to which a patient can develop resistance. The gene expres Nucleic acid expression assays are useful for drug Screen Sion pattern can also serve as a marker indicative of a ing to identify compounds that modulate protease nucleic physiological response of the affected cells to the compound. acid expression (e.g., antisense, polypeptide S, Accordingly, Such monitoring would allow either increased peptidomimetics, Small molecules or other drugs). A cell is administration of the compound or the administration of contacted with a candidate compound and the expression of 60 alternative compounds to which the patient has not become mRNA determined. The level of expression of protease resistant. Similarly, if the level of nucleic acid expression mRNA in the presence of the candidate compound is com falls below a desirable level, administration of the com pared to the level of expression of protease mRNA in the pound could be commensurately decreased. absence of the candidate compound. The candidate com The monitoring can be, for example, as follows: (i) pound can then be identified as a modulator of nucleic acid 65 obtaining a pre-administration Sample from a Subject prior to expression based on this comparison and be used, for administration of the agent; (ii) detecting the level of expres example to treat a disorder characterized by aberrant nucleic sion of a specified mRNA or genomic DNA of the invention US 6,395,889 B1 27 28 in the pre-administration sample; (iii) obtaining one or more of skill in the art. These detection Schemes are especially post-administration Samples from the Subject; (iv) detecting useful for the detection of nucleic acid molecules if Such the level of expression or activity of the mRNA or genomic molecules are present in very low numbers. DNA in the post-administration Samples, (v) comparing the Alternatively, mutations in a protease gene can be directly level of expression or activity of the mRNA or genomic identified, for example, by alterations in restriction enzyme DNA in the pre-administration sample with the mRNA or digestion patterns determined by gel electrophoresis. genomic DNA in the post-administration Sample or Samples, Further, sequence-specific ribozymes (U.S. Pat. No. and (vi) increasing or decreasing the administration of the 5,498.531) can be used to score for the presence of specific agent to the Subject accordingly. mutations by development or loSS of a ribozyme cleavage The protease polynucleotides are also useful in diagnostic Site. assays for qualitative changes in protease nucleic acid, and Perfectly matched Sequences can be distinguished from particularly in qualitative changes that lead to pathology. mismatched Sequences by nuclease cleavage digestion The polynucleotides can be used to detect mutations in assays or by differences in melting temperature. protease genes and gene expression products Such as Sequence changes at Specific locations can also be mRNA. The polynucleotides can be used as hybridization 15 assessed by nuclease protection assayS. Such as RNase and SI probes to detect naturally-occurring genetic mutations in a protection or the chemical cleavage method. protease gene and thereby to determine whether a Subject Furthermore, Sequence differences between a mutant pro with the mutation is at risk for a disorder caused by the tease gene and a wild-type gene can be determined by direct mutation. Mutations include deletion, addition, or Substitu DNA sequencing. A variety of automated Sequencing pro tion of one or more nucleotides in the gene, chromosomal cedures can be utilized when performing the diagnostic rearrangement, Such as inversion or transposition, modifi assays (1995) Biotechniques 19:448), including sequencing cation of genomic DNA, Such as aberrant methylation by mass spectrometry (see, e.g., PCT International Publica patterns or changes in gene copy number, Such as amplifi tion No. WO 94/16101; Cohen et al., Adv. Chromatogr. cation. Detection of a mutated form of a protease gene 36:127-162 (1996); and Griffin et al., Appl. Biochem. Bio asSociated with a dysfunction provides a diagnostic tool for 25 technol. 38:147–159 (1993)). an active disease or Susceptibility to disease when the Other methods for detecting mutations in the gene include disease results from overexpression, underexpression, or methods in which protection from cleavage agents is used to altered expression of a protease. detect mismatched bases in RNA/RNA or RNA/DNA Individuals carrying mutations in the protease gene can be duplexes (Myers et al., Science 230:1242 (1985)); Cotton et detected at the nucleic acid level by a variety of techniques. al., PNAS 85.4397 (1988); Saleeba et al., Meth. Enzymol. Genomic DNA can be analyzed directly or can be amplified 217:286-295 (1992)). by using PCR prior to analysis. RNA or cDNA can be used In Still another embodiment, the mismatch cleavage reac in the same way. tion employs one or more proteins that recognize mis In certain embodiments, detection of the mutation matched base pairs in double-stranded DNA (so called involves the use of a probe/primer in a polymerase chain 35 “DNA mismatch repair” enzymes) in defined systems for reaction (PCR) (see, e.g. U.S. Pat. Nos. 4,683,195 and detecting and mapping point mutations in cDNAS obtained 4,683.202), such as anchor PCR or RACE PCR, or, from samples of cells. For example, the mutY enzyme of E. alternatively, in a ligation chain reaction (LCR) (See, e.g., coli cleaves A at G/A mismatches and the thymidine DNA Landegran et al., Science 241:1077–1080 (1988); and Naka glycosylase from HeLa cells cleaves T at G/T mismatches Zawa et al., PNAS 91:360–364 (1994)), the latter of which 40 (Hsu et al. (1994) Carcinogenesis, 15:1657-1662). Accord can be particularly useful for detecting point mutations in ing to an exemplary embodiment, a probe based on an the gene (see Abravaya et al., Nucleic Acids Res. nucleotide Sequence of the invention is hybridized to a 23:675–682 (1995)). This method can include the steps of cDNA or other DNA product from a test cell(s). The duplex collecting a Sample of cells from a patient, isolating nucleic is treated with a DNA mismatch repair enzyme, and the acid (e.g., genomic, mRNA or both) from the cells of the 45 cleavage products, if any, can be detected from electrophore Sample, contacting the nucleic acid Sample with one or more sis protocols or the like. See, for example, U.S. Pat. No. primers which Specifically hybridize to a gene under con 5,459,039. In other embodiments, electrophoretic mobility ditions Such that hybridization and amplification of the gene of mutant and wild type nucleic acid is compared (Orita et (if present) occurs, and detecting the presence or absence of al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. an amplification product, or detecting the size of the ampli 50 285:125-144 (1993); and Hayashi et al., Genet. Anal. Tech. fication product and comparing the length to a control Appl. 9:73–79 (1992)), and movement of mutant or wild Sample. Deletions and insertions can be detected by a change type fragments in polyacrylamide gels containing a gradient in size of the amplified product compared to the normal of denaturant is assayed using denaturing gradient gel elec genotype. Point mutations can be identified by hybridizing trophoresis (Myers et al., Nature 313:495 (1985)). The amplified DNA to normal RNA or antisense DNA 55 Sensitivity of the assay may be enhanced by using RNA sequences. It is anticipated that PCR and/or LCR may be (rather than DNA), in which the secondary structure is more desirable to use as a preliminary amplification Step in Sensitive to a change in Sequence. In one embodiment, the conjunction with any of the techniques used for detecting Subject method utilizes heteroduplex analysis to Separate mutations described herein. double stranded heteroduplex molecules on the basis of Alternative amplification methods include: Self Sustained 60 changes in electrophoretic mobility (Keen et al. (1991) sequence replication (Guatelli, J. C. et al. (1990) Proc. Natl. Trends Genet., 7: 5). Examples of other techniques for Acad. Sci. USA, 87: 1874-1878), transcriptional amplifica detecting point mutations include, Selective oligonucleotide tion system (Kwoh, D.Y. et al., (1989) Proc. Natl. Acad. Sci. hybridization, Selective amplification, and Selective primer USA, 86:1173–1177), Q-Beta Replicase (Lizardi, P. M. et al., extension. (1988) Bio/Technology, 6:1197), or any other nucleic acid 65 In other embodiments, genetic mutations can be identified amplification method, followed by the detection of the by hybridizing a Sample and control nucleic acids, e.g., DNA amplified molecules using techniques well known to those or RNA, to high density arrayS containing hundreds or US 6,395,889 B1 29 30 thousands of oligonucleotide probes (Cronin, M. T. et al. to tailor treatment in an individual. Accordingly, the pro (1996) Human Mutation, 7:244-255; Kozal, M. J. et al. duction of recombinant cells and animals containing these (I996) Nature Medicine, 2:753–759). For example, genetic polymorphisms allow effective clinical design of treatment mutations can be identified in two dimensional arrayS con compounds and dosage regimens. taining light-generated DNA probes as described in Cronin, 5 The methods can involve obtaining a control biological M. T. et al. Supra. Briefly, a first hybridization array of Sample from a control Subject, contacting the control Sample probes can be used to Scan through long Stretches of DNA with a compound or agent capable of detecting mRNA, or in a Sample and control to identify base changes between the Sequences by making linear arrays of Sequential Overlapping genomic DNA, Such that the presence of mRNA or genomic probes. This Step allows the identification of point muta DNA is detected in the biological Sample, and comparing the tions. This step is followed by a second hybridization array presence of mRNA or genomic DNA in the control sample that allows the characterization of Specific mutations by with the presence of mRNA or genomic DNA in the test using Smaller, Specialized probe arrays complementary to all Sample. variants or mutations detected. Each mutation array is com The protease polynucleotides are also useful for chromo posed of parallel probe Sets, one complementary to the Some identification when the Sequence is identified with an wild-type gene and the other complementary to the mutant 15 individual chromosome and to a particular location on the gene. chromosome. First, the DNA sequence is matched to the With regard to both prophylactic and therapeutic methods chromosome by in Situ or other chromosome-specific of treatment, Such treatments may be specifically tailored or hybridization. Sequences can also be correlated to specific modified, based on knowledge obtained from the field of chromosomes by preparing PCR primers that can be used for pharmacogenomics. "Pharmacogenomics', as used herein, PCR screening of somatic cell hybrids containing individual refers to the application of genomics technologies Such as chromosomes from the desired species. Only hybrids con gene Sequencing, Statistical genetics, and gene expression taining the chromosome containing the gene homologous to analysis to drugs in clinical development and on the market. the primer will yield an amplified fragment. Sublocalization Pharmacogenomics deal with clinically significant heredi can be achieved using chromosomal fragments. Other Strat tary variations in the response to drugs due to altered drug 25 egies include prescreening with labeled flow-Sorted chro disposition and abnormal action in affected perSons. See, mosomes and preselection by hybridization to chromosome e.g., Eichelbaum, M., Clin. Exp. Pharmacol. Physiol. Specific libraries. Further mapping Strategies include 23(10–11):983–985 (1996), and Linder, M. W., Clin. Chem. fluorescence in situ hybridization which allows hybridiza 43(2):254-266 (1997). The clinical outcomes of these varia tion with probes shorter than those traditionally used. tions result in Severe toxicity of therapeutic drugs in certain Reagents for chromosome mapping can be used individually individuals or therapeutic failure of drugs in certain indi to mark a single chromosome or a single Site on the viduals as a result of individual variation in metabolism. chromosome, or panels of reagents can be used for marking Thus, the genotype of the individual can determine the way multiple sites and/or multiple chromosomes. Reagents cor a therapeutic compound acts on the body or the way the responding to noncoding regions of the genes actually are body metabolizes the compound. Further, the activity of 35 preferred for mapping purposes. Coding Sequences are more drug metabolizing enzymes effects both the intensity and likely to be conserved within gene families, thus increasing duration of drug action. Thus, the pharmacogenomics of the the chance of croSS hybridizations during chromosomal individual permit the Selection of effective compounds and mapping. effective dosages of Such compounds for prophylactic or The protease polynucleotides can also be used to identify therapeutic treatment based on the individual's genotype. 40 individuals from Small biological Samples. This can be done The discovery of genetic polymorphisms in Some drug for example using restriction fragment-length polymor metabolizing enzymes has explained why Some patients do phism (RFLP) to identify an individual. Thus, the polynucle not obtain the expected drug effects, Show an exaggerated otides described herein are useful as DNA markers for RFLP drug effect, or experience Serious toxicity from Standard (See U.S. Pat. No. 5,272,057). drug dosages. Polymorphisms can be expressed in the phe 45 Furthermore, the protease Sequence can be used to pro notype of the extensive metabolizer and the phenotype of the vide an alternative technique which determines the actual poor metabolizer. Accordingly, genetic polymorphism may DNA sequence of Selected fragments in the genome of an lead to allelic protein variants of the protease in which one individual. Thus, the protease Sequences described herein or more of the protease functions in one population is can be used to prepare two PCR primers from the 5' and 3 different from those in another population. 50 ends of the Sequences. These primers can then be used to The protease polynucleotides are thus useful for testing an amplify DNA from an individual for Subsequent Sequencing. individual for a genotype that while not necessarily causing Panels of corresponding DNA sequences from individuals the disease, nevertheless affects the treatment modality. prepared in this manner can provide unique individual Thus, the polynucleotides can be used to study the relation identifications, as each individual will have a unique Set of ship between an individual’s genotype and the individual’s 55 Such DNA sequences. It is estimated that allelic variation in response to a compound used for treatment humans occurs with a frequency of about once per each 500 (pharmacogenomic relationship). In the present case, for bases. Allelic variation occurs to Some degree in the coding example, a mutation in a protease gene that results in altered regions of these Sequences, and to a greater degree in the affinity for Substrate or effector could result in an excessive noncoding regions. The protease Sequences can be used to or decreased drug effect with Standard concentrations of 60 obtain Such identification Sequences from individuals and effector that activates the protease or Substrate cleaved by from tissue. The Sequences represent unique fragments of the protease. Accordingly, the protease polynucleotides the human genome. Each of the Sequences described herein described herein can be used to assess the mutation content can, to Some degree, be used as a Standard against which of the protease gene in an individual in order to Select an DNA from an individual can be compared for identification appropriate compound or dosage regimen for treatment. 65 purposes. The noncoding Sequences of these Sequences can Thus polynucleotides displaying genetic variations that comfortably provide positive individual identification with a affect treatment provide a diagnostic target that can be used panel of perhaps 10 to 1,000 primers which each yield a US 6,395,889 B1 31 32 noncoding amplified Sequence of 100 bases. If predicted a disorder characterized by abnormal or undesired protease coding Sequences are used, a more appropriate number of nucleic acid expression. This technique involves cleavage primers for positive individual identification would be by means of ribozymes containing nucleotide Sequences 500–2,000. complementary to one or more regions in the mRNA that If a panel of reagents from the Sequences is used to attenuate the ability of the mRNA to be translated. Possible generate a unique identification database for an individual, regions include coding regions and particularly coding those same reagents can later be used to identify tissue from regions corresponding to the catalytic and other functional activities of the protease protein, Such as Substrate binding that individual. Using the unique identification database, and cleavage Site. positive identification of the individual, living or dead, can The protease polynucleotides also provide vectors for be made from extremely Small tissue Samples. gene therapy in patients containing cells that are aberrant in The protease polynucleotides can also be used in forensic protease gene expression. Thus, recombinant cells, which identification procedures. PCR technology can be used to include the patient's cells that have been engineered ex vivo amplify DNA sequences taken from very Small biological and returned to the patient, are introduced into an individual Samples, Such as a single hair follicle, body fluids (e.g. where the cells produce the desired protease protein to treat blood, Saliva, or Semen). The amplified Sequence can then be 15 the individual. compared to a Standard allowing identification of the origin The invention also encompasses kits for detecting the of the Sample. presence of a protease nucleic acid in a biological Sample. The protease polynucleotides can thus be used to provide For example, the kit can comprise reagents Such as a labeled polynucleotide reagents, e.g., PCR primers, targeted to spe or labelable nucleic acid (probe or primer) or agent capable cific loci in the human genome, which can enhance the of detecting protease nucleic acid in a biological Sample; reliability of DNA-based forensic identifications by, for means for determining the amount of protease nucleic acid example, providing another “identification marker” (i.e. in the Sample, and means for comparing the amount of another DNA sequence that is unique to a particular protease nucleic acid in the Sample with a Standard. The individual). AS described above, actual base Sequence infor compound or agent can be packaged in a Suitable container. mation can be used for identification as an accurate alter 25 The kit can further comprise instructions for using the kit to native to patterns formed by restriction enzyme generated detect protease mRNA or DNA. fragments. Sequences targeted to the noncoding region are Polypeptides particularly useful since greater polymorphism occurs in the The invention thus relates to novel proteases having the noncoding regions, making it easier to differentiate indi deduced amino acid Sequences encoded by the open reading viduals using this technique. Examples of polynucleotide frames present in the nucleic acid molecules of SEQ ID reagents include the nucleic acid molecules or the invention, NOS: 1-33. or portions thereof, e.g., fragments having a length of at least The term “protease polypeptide' or “protease” refers to a 20 bases, preferably at least 30 bases. protein Sequence encoded by the nucleic acid Sequences The protease polynucleotides can further be used to represented by SEQ ID NOS: 1-33. The term “protease” or provide polynucleotide reagents, e.g., labeled or labelable 35 “protease polypeptide', however, further includes the probes which can be used in, for example, an in Situ numerous variants described herein, as well as fragments hybridization technique, to identify a specific tissue. This is derived from the polypeptides and variants. useful in cases in which a forensic pathologist is presented The present invention thus provides an isolated or purified with a tissue of unknown origin. Panels of protease probes protease polypeptides and variants and fragments thereof. can be used to identify tissue by Species and/or by organ 40 AS used herein, a polypeptide is Said to be "isolated” or type. “purified” when it is substantially free of cellular material In a similar fashion, these primers and probes can be used when it is isolated from recombinant and non-recombinant to Screen tissue culture for contamination (i.e. Screen for the cells, or free of chemical precursors or other chemicals when presence of a mixture of different types of cells in a culture). it is chemically Synthesized. A polypeptide, however, can be Alternatively, the protease polynucleotides can be used 45 joined to another polypeptide with which it is not normally directly to block transcription or translation of protease gene associated in a cell and still be considered "isolated” or Sequences by means of antisense or ribozyme constructs. “purified.” Thus, in a disorder characterized by abnormally high or The polypeptides can be purified to homogeneity. It is undesirable protease gene expression, nucleic acids can be understood, however, that preparations in which the directly used for treatment. 50 polypeptide is not purified to homogeneity are useful and The protease polynucleotides are thus useful as antisense considered to contain an isolated form of the polypeptide. constructs to control protease gene expression in cells, The critical feature is that the preparation allows for the tissues, and organisms. A DNA antisense polynucleotide is desired function of the polypeptide, even in the presence of designed to be complementary to a region of the gene considerable amounts of other components. Thus, the inven involved in transcription, preventing transcription and hence 55 tion encompasses various degrees of purity. production of protease protein. An antisense RNA or DNA In one embodiment, the language “Substantially free of polynucleotide would hybridize to the mRNA and thus block cellular material” includes preparations of the polypeptide translation of mRNA into protease protein. having less than about 30% (by dry weight) other proteins Examples of antisense molecules useful to inhibit nucleic (i.e., contaminating protein), less than about 20% other acid expression include antisense molecules complementary 60 proteins, less than about 10% other proteins, or less than to a fragment of any 5' untranslated regions present in SEQ about 5% other proteins. When the polypeptide is recombi ID NOS: 1-33 which also includes the start codon and nantly produced, it can also be Substantially free of culture antisense molecules which are complementary to a fragment medium, i.e., culture medium represents less than about of any 3' untranslated region present in SEQ ID NOS: 1-33. 20%, less than about 10%fo, or less than about 5% of the Alternatively, a class of antisense molecules can be used 65 Volume of the protein preparation. to inactivate mRNA in order to decrease expression of In Some instances, the protease will be associated with protease nucleic acid. Accordingly, these molecules can treat cellular membranes. This could include intracellular mem US 6,395,889 B1 33 34 branes or the outer cellular membrane. In either case, a number of identical positions shared by the Sequences, protease is considered isolated if it is part of a purified taking into account the number of gaps, and the length of membrane preparation or if it is purified and then reconsti each gap, which need to be introduced for optimal alignment tuted into membrane vesicles or liposomes. of the two Sequences. The language “Substantially free of chemical precursors The invention also encompasses polypeptides having a or other chemicals includes preparations of the polypeptide lower degree of identity but having Sufficient Similarity So as in which it is separated from chemical precursors or other to perform one or more of the same functions performed by chemicals that are involved in its Synthesis. In one the polypeptide. Similarity is determined by conserved embodiment, the language “Substantially free of chemical amino acid Substitution. Such Substitutions are those that precursors or other chemicals” includes preparations of the Substitute a given amino acid in a polypeptide by another polypeptide having less than about 30% (by dry weight) amino acid of like characteristics. Conservative Substitutions chemical precursors or other chemicals, less than about 20% are likely to be phenotypically Silent. Typically Seen as chemical precursors or other chemicals, less than about 10% conservative Substitutions are the replacements, one for chemical precursors or other chemicals, or less than about another, among the aliphatic amino acids Ala, Val, Leu, and 5% chemical precursors or other chemicals. 15 Ile; interchange of the hydroxyl residues Ser and Thr, In one embodiment, a polypeptide comprises an amino eXchange of the acidic residues Asp and Glu, Substitution acid Sequence encoded by a nucleic acid comprising a between the amide residues ASn and Gln, exchange of the nucleotide Sequence Selected from the group consisting of basic residues Lys and Arg and replacements among the SEQ ID NOS: 1-33 and the complements thereof. However, aromatic residues Phe, Tyr. Guidance concerning which the invention also encompasses Sequence variants. Variants amino acid changes are likely to be phenotypically Silent are include a Substantially homologous protein encoded by the found in Bowie et al., Science 247 1306–1310 (1990). Same genetic locus in an organism, i.e., an allelic Variant. Variants also encompass proteins derived from other genetic TABLE 1. loci in an organism, but having Substantial homology to a Conservative Amino Acid Substitutions. polypeptide encoded by a nucleic acid comprising a nucle 25 otide Sequence Selected from the group consisting of SEQ Aromatic Phenylalanine Tryptophan ID NOS: 1-33 and the complements thereof. Variants also Tyrosine include proteins Substantially homologous to the protease Hydrophobic Leucine but derived from another organism, i.e., an ortholog. Vari Isoleucine ants also include proteins that are Substantially homologous Valine Polar Glutamine to the protease that are produced by chemical Synthesis. Asparagine Variants also include proteins that are Substantially homolo Basic Arginine gous to the protease that are produced by recombinant Lysine methods. It is understood, however, that variants exclude Histidine 35 Acidic Aspartic Acid any amino acid Sequences disclosed prior to the invention. Glutamic Acid AS used herein, two proteins (or a region of the proteins) Small Alanine are Substantially homologous when the amino acid Serine sequences are at least about 50-55%, 55-60%, typically at Threonine Methionine least about 70-75%, more typically at least about 80-85%, Glycine and most typically at least about 90–95% or more homolo 40 gous. A Substantially homologous amino acid Sequence, according to the present invention, will be encoded by a The comparison of Sequences and determination of per nucleic acid Sequence hybridizing to the nucleic acid cent identity and Similarity between two Sequences can be Sequence, or portion thereof, of a nucleic acid Sequence accomplished by well-known methods Such as using a selected from the group consisting of SEQ ID NOS: 1-33, 45 mathematical algorithm. (Computational Molecular or portion thereof under Stringent conditions as more Biology, Lesk, A. M., ed., Oxford University Press, New described above. York, 1988, Biocomputing. Informatics and Genome To determine the percent identity of two amino acid Projects, Smith, D. W., ed., Academic Press, New York, Sequences or of two nucleic acid Sequences, the Sequences 1993, Computer Analysis of Sequence Data, Part 1, Griffin, are aligned for optimal comparison purposes (e.g., gaps can 50 A. M., and Griffin, H. G., eds., Humana Press, New Jersey, be introduced in one or both of a first and a Second amino 1994, Sequence Analysis in Molecular Biology, von Heinje, acid or nucleic acid Sequence for optimal alignment and G., Academic Press, 1987; and Sequence Analysis Primer, non-homologous Sequences can be disregarded for compari Gribskov, M. and Devereux, J., eds., M. Stockton Press, New Son purposes). In a preferred embodiment, the length of a York, 1991). reference Sequence aligned for comparison purposes is at 55 A preferred, non-limiting example of Such a mathematical least 30%, preferably at least 40%, more preferably at least algorithm is described in Karlin et al., Proc. Natl. Acad. Sci. 50%, even more preferably at least 60%, and even more USA 90:5873–5877 (1993). Such an algorithm is incorpo preferably at least 70%, 80%, or 90% of the length of the rated into the NBLAST and XBLAST programs (version reference Sequence. The amino acid residues or nucleotides 2.0) as described in Altschul et al., Nucleic Acids Res., at corresponding amino acid positions or nucleotide posi 60 25:3389–3402 (1997). When utilizing BLAST and Gapped tions are then compared. When a position in the first BLAST programs, the default parameters of the respective Sequence is occupied by the same amino acid residue or programs (e.g., NBLAST) can be used. See http:// nucleotide as the corresponding position in the Second www.ncbi.nlm.nih.gov. In one embodiment, parameters for Sequence, then the molecules are identical at that position (as Sequence comparison can be set at Score=100, wordlength= used herein amino acid or nucleic acid “identity” is equiva 65 12, or can be varied (e.g., W=5 or W=20). lent to amino acid or nucleic acid “homology’). The percent In a preferred embodiment, the percent identity between identity between the two Sequences is a function of the two amino acid Sequences is determined using the Needle US 6,395,889 B1 35 36 man and Wunsch (J. Mol. Biol. (48):444–453 (1970)) algo activator than the one with which the protease is normally rithm which has been incorporated into the GAP program in asSociated. Another useful variation provides a fusion pro the GCG Software package (available at http:// tein in which one or more domains or Subregions is opera www.gcg.com), using either a BLOSUM 62 matrix or a tionally fused to one or more domains or Subregions from PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or another protease. 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another Amino acids that are essential for function can be iden preferred embodiment, the percent identity between two tified by methods known in the art, Such as site-directed nucleotide Sequences is determined using the GAP program mutagenesis or alanine-Scanning mutagenesis (Cunningham in the GCG Software package (Devereux, J., et al., Nucleic et al., Science 244: 1081–1085 (1989)). The latter procedure Acids Res. 12(1):387 (1984)) (available at http:// introduces Single alanine mutations at every residue in the www.gcg.com), using a NWSgapdna. CMP matrix and a gap molecule. The resulting mutant molecules are then tested for weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, biological activity Such as protease binding, cleavage, or in 3, 4, 5, or 6. vitro, or in vitro proliferative activity. Sites that are critical Another preferred, non-limiting example of a mathemati for protease binding can also be determined by Structural cal algorithm utilized for the comparison of Sequences is the 15 analysis Such as crystallization, nuclear magnetic resonance algorithm of Myers and Miller, CABIOS (1989). Such an or photoaffinity labeling (Smith et al., J. Mol. Biol. algorithm is incorporated into the ALIGN program (version 224:899–904 (1992); de Vos et al. Science 255:306–312 2.0) which is part of the CGC sequence alignment software (1992)). package. When utilizing the ALIGN program for comparing Substantial homology can be to the entire nucleic acid or amino acid Sequences, a PAM120 weight residue table, a gap amino acid Sequence or to fragments of these Sequences. length penalty of 12, and a gap penalty of 4 can be used. The invention also includes polypeptide fragments of the Additional algorithms for Sequence analysis are known in polypeptides of the invention Fragments can be derived the art and include ADVANCE and ADAM as described in from a polypeptide encoded by a nucleic acid comprising a Torellis and Robotti (1994) Comput. Appl. BioSci. 10:3-5; nucleotide Sequence Selected from the group consisting of and FASTA described in Pearson and Lipman (1988) PNAS 25 SEQ ID NOS: 1-33 and the complements thereof. However, 85:2444-8. the invention also encompasses fragments of the variants of A variant polypeptide can differ in amino acid Sequence the polypeptides encoded by the nucleic acid described by one or more Substitutions, deletions, insertions, herein. inversions, fusions, and truncations or a combination of any In one embodiment, the fragment is or includes an open of these. reading frame. Open reading frames can be determined by Variant polypeptides can be fully functional or can lack routine computerized homology Search procedures. function in one or more activities. Thus, in the present case, The fragments to which the invention pertains, however, variations can affect the function, for example, of one or are not to be construed as encompassing fragments that may more of the regions corresponding to Substrate binding, be disclosed prior to the present invention. Subcellular localization, Such as membrane association, and 35 AS used herein, a fragment comprises at least 10 contigu proteolytic cleavage, effector binding, effector modification ous amino acids. Fragments can retain one or more of the of the protease, other modification Sites, or site of interaction biological activities of the protein, for example the ability to with any other protein. bind to a Substrate or activator, as well as fragments that can Fully functional variants typically contain only conserva be used as an immunogen to generate protease antibodies. tive variation or variation in non-critical residues or in 40 Biologically active fragments (peptides which are, for non-critical regions. Functional variants can also contain example, 10, 12, 15, 20, 30, 35, 36, 37,38, 39, 40, 50, 100 Substitution of Similar amino acids which result in no change or more amino acids in length) can comprise a domain or or an insignificant change in function. Alternatively, Such region, as indicated, identified by analysis of the polypeptide Substitutions may positively or negatively affect function to Sequence by well-known methods, e.g., cleavage Sites, Sub Some degree. 45 strate binding sites, glycosylation sites, cAMP and coMP Non-functional variants typically contain one or more dependent phosphorylation sites, N-myristoylation sites, non-conservative amino acid Substitutions, deletions, activator binding sites, casein kinase II phosphorylation insertions, inversions, or truncation or a Substitution, Sites, palmitoylation sites, amidation sites, or parts of any of insertion, inversion, or deletion in a critical residue or these. Such domains or Sites can be identified by means of critical region. 50 routine procedures for computerized homology or motif AS indicated, variants can be naturally-occurring or can be analysis. made by recombinant means or chemical Synthesis to pro Fragments further include combinations of the various vide useful and novel characteristics for the protease. This functional regions described herein. Other fragments include includes preventing immunogenicity from pharmaceutical the mature protein. Fragments, for example, can extend in formulations by preventing protein aggregation. 55 one or both directions from the functional site to encompass Useful variations further include alteration of Substrate 5, 10, 15, 20, 30, 40, 50, or up to 100 amino acids. Further, binding and cleavage characteristics. For example, one fragments can include Sub-fragments of the Specific domains embodiment involves a variation at the that mentioned above, which Sub-fragments retain the function results in binding but not release, or slower release, of of the domain from which they are derived. Substrate. A further useful variation at the same Sites can 60 Accordingly, possible fragments include but are not lim result in a higher affinity for substrate. Useful variations also ited to fragments defining a Substrate-binding Site, fragments include changes that provide for affinity for another Sub defining a phosphorylation site, fragments defining mem Strate. Another useful variation includes one that allows brane association, fragments defining glycosylation sites, binding but which prevents proteolysis of the substrate. fragments defining interaction with activators and fragments Another useful variation includes variation in the domain 65 defining myristoylation sites. By this is intended a discrete that provides for reduced or increased binding by the appro fragment that provides the relevant function or allows the priate activator (effector) or for binding by a different relevant function to be identified. In a preferred US 6,395,889 B1 37 38 embodiment, the fragment contains the Substrate or tion also encompasses Soluble fusion proteins containing a activator-binding site. protease polypeptide and various portions of the constant The invention also provides fragments with immunogenic regions of heavy or light chains of immunoglobulins of properties. These contain an epitope-bearing portion of the various Subclass (IgG, IgM, IgA, IgE). Preferred as immu protease and variants. These epitope-bearing peptides are noglobulin is the constant part of the heavy chain of human useful to raise antibodies that bind Specifically to a protease IgG, particularly IgG1, where fusion takes place at the hinge polypeptide or region or fragment. These peptides can region. For Some uses it is desirable to remove the Fc after contain at least 10, 12, at least 14, or between at least about the fusion protein has been used for its intended purpose, for 15 to about 30 amino acids. example when the fusion protein is to be used as antigen for Non-limiting examples of antigenic polypeptides that can immunizations. In a particular embodiment, the Fc part can be used to generate antibodies include peptides derived from be removed in a simple way by a cleavage Sequence which the amino terminal extracellular domain or any of the is also incorporated and can be cleaved with factor Xa. extracellular loops. Regions having a high antigenicity indeX Achimeric or fusion protein can be produced by Standard can be determined by routine computerized amino acid recombinant DNA techniques. For example, DNA fragments Sequence analysis. However, intracellularly-made antibod 15 coding for the different protein Sequences are ligated ies ("intrabodies') are also encompassed, which would together in-frame in accordance with conventional tech recognize intracellular peptide regions. niques. In another embodiment, the fusion gene can be The polypeptides (including variants and fragments Synthesized by conventional techniques including auto which may have been disclosed prior to the present mated DNA synthesizers. Alternatively, PCR amplification invention) are useful for biological assays related to pro of gene fragments can be carried out using anchor primers teases. Such assays involve any of the known protease which give rise to complementary overhangs between two functions or activities or properties useful for diagnosis and consecutive gene fragments which can Subsequently be treatment of protease-related conditions. annealed and re-amplified to generate a chimeric gene The epitope-bearing protease and polypeptides may be sequence (see Ausubel et al., Current Protocols in Molecu produced by any conventional means (Houghten, R. A., 25 lar Biology, 1992). Moreover, many expression vectors are Proc. Natl. AcadSci, USA 82:5131–5135 (1985)). Simulta commercially available that already encode a fusion moiety neous multiple peptide synthesis is described in U.S. Pat. (e.g., a GST protein). A protease encoding nucleic acid can No. 4,631,211. be cloned into Such an expression vector Such that the fusion Fragments can be discrete (not fused to other amino acids moiety is linked in-frame to the protease Sequence. or polypeptides) or can be within a larger polypeptide. Another form of fusion protein is one that directly affects Further, Several fragments can be comprised within a single protease functions. Accordingly, a polypeptide is encom larger polypeptide. In one embodiment a fragment designed passed by the present invention in which one or more of the for expression in a host can have heterologous pre- and protease domains (or parts thereof) has been replaced by pro-polypeptide regions fused to the amino terminus of the homologous domains (or parts thereof) from another pro protease fragment and an additional region fused to the 35 tease or other type of protease. Accordingly, various permu carboxyl terminus of the fragment. tations are possible. The Substrate binding, or Subregion The invention thus provides chimeric or fusion proteins. thereof, can be replaced, for example, with the correspond These comprise a protease amino acid Sequence operatively ing domain or Subregion from another Substrate for the linked to a heterologous protein having an amino acid protease. Thus, chimeric proteases can be formed in which Sequence not Substantially homologous to the protease. 40 one or more of the native domains or Subregions has been “Operatively linked' indicates that the protease Sequence replaced. and the heterologous protein are fused in-frame. The heter The isolated protease Sequence can be purified from cells ologous protein can be fused to the N-terminus or that naturally express it, purified from cells that have been C-terminus of the protease Sequence. altered to express it (recombinant), or Synthesized using In one embodiment the fusion protein does not affect 45 known protein Synthesis methods. protease function perse. For example, the fusion protein can In one embodiment, the protein is produced by recombi be a GST-fusion protein in which the protease Sequences are nant DNA techniques. For example, a nucleic acid molecule fused to the C-terminus of the GST sequences or an influ encoding the polypeptide is cloned into an expression enza HA marker. Other types of fusion proteins include, but vector, the expression vector introduced into a host cell and are not limited to, enzymatic fusion proteins, for example 50 the protein expressed in the host cell. The protein can then beta-galactosidase fusions, yeast two-hybrid GAL fusions, be isolated from the cells by an appropriate purification poly-His fusions and Ig fusions. Such fusion proteins, par Scheme using Standard protein purification techniques. ticularly poly-His fusions, can facilitate the purification of Polypeptides often contain amino acids other than the 20 recombinant protease. In certain host cells (e.g., mammalian amino acids commonly referred to as the 20 naturally host cells), expression and/or Secretion of a protein can be 55 occurring amino acids. Further, many amino acids, including increased by using a heterologous Signal Sequence. the terminal amino acids, may be modified by natural Therefore, in another embodiment, the fusion protein con processes, Such as processing and other post-translational tains a heterologous Signal Sequence at its N-terminus. modifications, or by chemical modification techniques well EP-A-O 464 533 discloses fusion proteins comprising known in the art. Common modifications that occur natu various portions of immunoglobulin constant regions. The 60 rally in polypeptides are described in basic texts, detailed Fc is useful in therapy and diagnosis and thus results, for monographs, and the research literature, and they are well example, in improved pharmacokinetic properties (EP-A known to those of skill in the art. 0232 262). In drug discovery, for example, human proteins Accordingly, the polypeptides also encompass derivatives have been fused with Fc portions for the purpose of high or analogs in which a Substituted amino acid residue is not throughput Screening assays to identify antagonists. Bennett 65 one encoded by the genetic code, in which a Substituent elal. (J. Mol. Recog. 8:52-58 (1995)) and Johanson et al. (J. group is included, in which the mature polypeptide is fused Biol. Chem. 270, 16:9459–9471 (1995)). Thus, this inven with another compound, Such as a compound to increase the US 6,395,889 B1 39 40 half-life of the polypeptide (for example, polyethylene Polypeptide Uses glycol), or in which the additional amino acids are fused to The protein Sequences of the present invention can further the mature polypeptide, Such as a leader or Secretory be used as a “query Sequence' to perform a Search against Sequence or a Sequence for purification of the mature public databases to, for example, identify other family polypeptide or a pro-protein Sequence. members or related Sequences. Such Searches can be per Known modifications include, but are not limited to, formed using the NBLAST and XBLAST programs (version acetylation, acylation, ADP-ribosylation, amidation, cova 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403–10. lent attachment of flavin, covalent attachment of a heme BLAST protein searches can be performed with the moiety, covalent attachment of a nucleotide or nucleotide XBLAST program, score=50, wordlength=3 to obtain amino derivative, covalent attachment of a lipid or lipid derivative, 1O acid Sequences homologous to the proteins of the invention. covalent attachment of phosphatidylinositol, croSS-linking, To obtain gapped alignments for comparison purposes, cyclization, disulfide bond formation, demethylation, for Gapped BLAST can be utilized as described in Altschulet mation of covalent crosslinks, formation of cystine, forma al., (1997) Nucleic Acids Res. 25(17):3389–3402. When tion of pyroglutamate, formylation, gamma carboxylation, utilizing BLAST and Gapped BLAST programs, the default glycosylation, GPI anchor formation, hydroxylation, 15 parameters of the respective programs (e.g., XBLAST and iodination, methylation, myristoylation, oxidation, pro NBLAST) can be used. See http:H/www.ncbi.nlm.nih.gov. teolytic processing, phosphorylation, prenylation, The protease polypeptides are useful for producing anti racemization, Selenoylation, Sulfation, transfer-RNA medi bodies Specific for the protease, regions, or fragments. ated addition of amino acids to proteins Such as arginylation, The polypeptides (including variants and fragments and ubiquitination. which may have been disclosed prior to the present Such modifications are well-known to those of skill in the invention) are useful for biological assays related to pro art and have been described in great detail in the Scientific teases. Such assays involve any of the known protease literature. Several particularly common modifications, functions or activities or properties useful for diagnosis and glycosylation, lipid attachment, Sulfation, gamma treatment of protease-related conditions. carboxylation of glutamic acid residues, hydroxylation and 25 The polypeptides are also useful in drug Screening assays, ADP-ribosylation, for instance, are described in most basic in cell-based or cell-free Systems. Cell-based Systems can be texts, such as Proteins-Structure and Molecular native, i.e., cells that normally express the protease protein, Properties, 2nd Ed., T. E. Creighton, W. H. Freeman and as a biopsy or expanded in cell culture. In one embodiment, Company, New York (1993). Many detailed reviews are however, cell-based assays involve recombinant host cells available on this subject, such as by Wold, F., Posttransla expressing the protease protein. tional Covalent Modification of Proteins, B. C. Johnson, The polypeptides can be used to identify compounds that Ed., Academic Press, New York 1-12 (1983); Seifter et al. modulate protease activity. Such compounds can increase or (Meth, Enzymol. 182: 626-646 (1990)) and Rattan et al. decrease affinity or rate of binding to a known Substrate or (Ann. N.Y. Acad. Sci. 663:48–62 (1992)). activator, compete with Substrate or activator for binding to AS is also well known, polypeptides are not always 35 the protease, or displace Substrate or activator bound to the entirely linear. For instance, polypeptides may be branched protease. Both protease protein and appropriate variants and as a result of ubiquitination, and they may be circular, with fragments can be used in high-throughput Screens to assay or without branching, generally as a result of post candidate compounds for the ability to bind to the protease. translation events, including natural processing event and These compounds can be further Screened against a func events brought about by human manipulation which do not 40 tional protease to determine the effect of the compound on occur naturally. Circular, branched and branched circular the protease activity. Compounds can be identified that polypeptides may be Synthesized by non-translational natu activate (agonist) or inactivate (antagonist) the protease to a ral processes and by Synthetic methods. desired degree. Modulatory methods can be performed in Modifications can occur anywhere in a polypeptide, vitro (e.g., by culturing the cell with the agent) or, including the peptide backbone, the amino acid Side-chains 45 alternatively, in Vivo (e.g., by administering the agent to a and the amino or carboxyl termini. Blockage of the amino or Subject). carboxyl group in a polypeptide, or both, by a covalent The protease polypeptides can be used to Screen a com modification, is common in naturally-occurring and Syn pound for the ability to stimulate or inhibit interaction thetic polypeptides. For instance, the amino terminal residue between the protease protein and a target molecule that of polypeptides made in E. coli, prior to proteolytic 50 normally interacts with the protease protein. The target can processing, almost invariably will be N-formylmethionine. be a component of the pathway with which the protease The modifications can be a function of how the protein is normally interacts. The assay includes the Steps of combin made. For recombinant polypeptides, for example, the modi ing the protease with a candidate compound under condi fications will be determined by the host cell posttranslational tions that allow the protease or fragment to interact with the modification capacity and the modification signals in the 55 target molecule, and to detect the formation of a complex polypeptide amino acid Sequence. Accordingly, when gly between the protein and the target or to detect the biochemi cosylation is desired, a polypeptide should be expressed in cal consequence of the interaction with the protease and the a glycosylating host, generally a eukaryotic cell. Insect cells target, Such as any of the associated effects of proteolytic often carry out the Same posttranslational glycosylations as cleavage, Such as detecting the induction of a reporter gene mammalian cells and, for this reason, insect cell expression 60 (comprising a target-responsive regulatory element opera Systems have been developed to efficiently express mam tively linked to a nucleic acid encoding a detectable marker, malian proteins having native patterns of glycosylation. e.g., luciferase), detecting a cellular response, for example, Similar considerations apply to other modifications. development, differentiation or rate of proliferation detec The same type of modification may be present in the same tion of activation of the Substrate, or change in Substrate or varying degree at Several Sites in a given polypeptide. 65 levels (i.e., level of end product). Also, a given polypeptide may contain more than one type Determining the ability of the protein to bind to a target of modification. molecule can also be accomplished using a technology Such US 6,395,889 B1 41 42 as real-time Bimolecular Interaction Analysis (BIA). The assays typically involve an assay of events in the Sjolander, S. and Urbaniczky, C. (1991) Anal. Chem., pathway in which the protease is found that indicate protease 63:2338-2345 and Szabo et al. (1995) Curr. Opin. Struct. activity. Thus, the expression of genes that are up- or Biol., 5:699-705, AS used herein, “BIA' is a technology for down-regulated in response to the protease protein depen Studying biospecific interactions in real time, without label dent cascade can be assayed. In one embodiment, the ing any of the interactants (e.g., BIAcore"). Changes in the regulatory region of Such genes can be operably linked to a optical phenomenon Surface plasmon resonance (SPR) can marker that is easily detectable, Such as luciferase. be used as an indication of real-time reactions between Alternatively, modification of the protease protein, or a biological molecules. protease protein target, could also be measured. The test compounds of the present invention can be Any of the biological or biochemical functions mediated obtained using any of the numerous approaches in combi by the protease can be used as an endpoint assay. These natorial library methods known in the art, including: bio include all of the biochemical or biochemical/biological logical libraries, Spatially addressable parallel Solid phase or events described herein, in the references cited herein, Solution phase libraries, Synthetic library methods requiring incorporated by reference for these endpoint assay targets, deconvolution; the one-bead one-compound library 15 and other functions known to those of ordinary skill in the method; and Synthetic library methods using affinity chro art. matography Selection. The biological library approach is Binding and/or activating compounds can also be limited to polypeptide libraries, while the other four Screened by using chimeric proteases in which a domain, or approaches are applicable to polypeptide, non-peptide oli parts thereof, are replaced by heterologous domains or gomer or Small molecule libraries of compounds (Lam, K. S. Subregions. For example, a Substrate-binding region can be (1997) Anticancer Drug Des. 12:145). used that interacts with a different Substrate than that which Examples of methods for the synthesis of molecular is recognized by the native protease. Accordingly, a different libraries can be found in the art, for example in DeWitt et al. Set of pathway components is available as an end-point (1993) Proc. Natl. Acad. Sci. U.S.A., 90:6909, Erb et al. (I assay for activation. Alternatively, a portion or Subregions 994) Proc. Natl. Acad. Sci. U.S.A., 91:11422, Zuckermann et 25 can be replaced with a portion or Subregions Specific to a al. (1994). J. Med. Chem., 37:2678; Cho et al.(1993) host cell that is different from the host cell from which the Science, 261:1303; Carell et al. (1994) Angew. Chem. Int. domain is derived. This allows for assays to be performed in Ed. Engl., 33:2059; Carell et al. (1994) Angew. Chem. Int. other than the Specific host cell from which the protease is Ed. Engl., 33:2061; and in Gallop et al. (1994) J. Med. derived. Alternatively, the substrate or activator could be Chem., 37:1233. replaced by a domain (and/or other binding region) binding Libraries of compounds may be presented in Solution a different Substrate or activator, thus providing an assay for (e.g., Houghten (1992) Biotechniques, 13:412–421), or on test compounds that interact with the heterologous domain beads (Lam(1991) Nature, 354:82–84), chips (Fodor (1993) (or region) but still cause the events in the pathway. Finally, Nature, 364;555-556), bacteria(Ladner U.S. Pat. No. 5,223, activation can be detected by a reporter gene containing an 409), spores (Ladner U.S. Pat. No. 409), plasmids (Cullet 35 easily detectable coding region operably linked to a tran al.(1992) Proc. Natl. Acad. Sci. U.S.A., 89:1865–1869) or on Scriptional regulatory Sequence that is part of the native phage (Scott and Smith (1990) Science, 249:386–390); Signal transduction pathway. (Devlin (1990) Science, 249:404–406); (Cwirla et al. (1990) The protease polypeptides are also useful in competition Proc. Natl. Acad. Sci.,97:6378-6382); (Felici (1991).J. Mol. binding assays in methods designed to discover compounds Biol., 222:301-310); (Ladner supra). 40 that interact with the protease. Thus, a compound is exposed Candidate compounds include, for example, 1) peptides to a protease polypeptide under conditions that allow the Such as Soluble peptides, including Ig-tailed fusion peptides compound to bind or to otherwise interact with the polypep and members of random peptide libraries (see, e.g., Lam et tide. Soluble polypeptide is also added to the mixture. If the al., Nature 354:82-84 (1991); Houghten et al., Nature test compound interacts with the Soluble polypeptide, it 354:84-86 (1991)) and combinatorial chemistry-derived 45 decreases the amount of complex formed or activity from molecular libraries made of D- and/or L- configuration the protease target. This type of assay is particularly useful amino acids; 2) phosphopeptides (e.g., members of random in cases in which compounds are Sought that interact with and partially degenerate, directed phosphopeptide libraries, Specific regions of the protease. Thus, the Soluble polypep see, e.g., Songyang et al., Cell 72:767-778 (1993)); 3) tide that competes with the target protease region is designed antibodies (e.g., polyclonal, monoclonal, humanized, anti 50 to contain peptide Sequences corresponding to the region of idiotypic, chimeric, and Single chain antibodies as well as interest. Fab, F(ab'), Fab expression library fragments, and epitope Determining the ability of the test compound to interact binding fragments of antibodies); and 4) Small organic and with the polypeptide can also comprise determining the inorganic molecules (e.g., molecules obtained from combi ability of the test compound to preferentially bind to the natorial and natural product libraries). 55 polypeptide as compared to the ability of the native One candidate compound is a Soluble full-length protease counterpart, Such as activator or Substrate, or a biologically or fragment that competes for Substrate or activator binding. active portion thereof, to bind to the polypeptide. Other candidate compounds include mutant proteases or To perform cell free drug Screening assays, it is desirable appropriate fragments containing mutations that affect pro to immobilize either the protease, or fragment, or its target tease function and thus compete for Substrate, activator or 60 molecule to facilitate Separation of complexes from uncom other protein that interacts with the protease. Accordingly, a plexed forms of one or both of the proteins, as well as to fragment that competes for Substrate or activator, for accommodate automation of the assay. example with a higher affinity, or a fragment that binds Techniques for immobilizing proteins on matrices can be Substrate or activator but does not allow release, is encom used in the drug Screening assays. In one embodiment, a passed by the invention. 65 fusion protein can be provided which adds a domain that The invention provides other end points to identify com allows the protein to be bound to a matrix. For example, pounds that modulate (Stimulate or inhibit) protease activity. glutathione-S-/protease fusion proteins can be US 6,395,889 B1 43 44 adsorbed onto glutathione Sepharose beads (Sigma Situation is where it is desirable to achieve tissue regenera Chemical, St. Louis, Mo.) or glutathione derivatized tion in a Subject (e.g., where a Subject has undergone brain microtitre plates, which are then combined with the cell or spinal cord injury and it is desirable to regenerate neu lysates (e.g., S-labeled) and the candidate compound, and ronal tissue in a regulated manner). the mixture incubated under conditions conducive to com In yet another aspect of the invention, the proteins of the plex formation (e.g., at physiological conditions for Salt and invention can be used as “bait proteins” in a two-hybrid pH). Following incubation, the beads are washed to remove assay or three-hybrid assay (see, e.g., U.S. Pat. No. 5,283, any unbound label, and the matrix immobilized and radio 317, Zervos et al. (1993) Cell, 72:223–232; Madura et al. label determined directly, or in the Supernatant after the (1993) J. Biol. Chem., 268: 12046-12054; Bartel et al. complexes are dissociated. Alternatively, the complexes can (1993) Biotechniques, 14:920–924; Iwabuchi et al. (1993) be dissociated from the matrix, separated by SDS-PAGE, Oncogene, 8:1693–1696; and Brent WO94/10300), to iden and the level of protease-binding protein found in the bead tify other proteins (captured proteins) which bind to or fraction quantitated from the gel using Standard electro interact with the proteins of the invention and modulate their phoretic techniques. For example, either the polypeptide or activity. Such captured proteins are also likely to be involved its target molecule can be immobilized utilizing conjugation 15 in the pathway that includes by the proteins of the invention of biotin and Streptavidin using techniques well known in as, for example, downstream elements of a protease the art. Alternatively, antibodies reactive with the protein but mediated pathway. which do not interfere with binding of the protein to its The protease polypeptides also are useful to provide a target molecule can be derivatized to the Wells of the plate, target for diagnosing a disease or predisposition to disease and the protein trapped in the Wells by antibody conjugation. mediated by the protease, Such as a proliferative disorder, a Preparations of a protease-binding protein and a candidate differentiative or developmental disorder, or a hematopoietic compound are incubated in the protease protein-presenting disorder. Accordingly, methods are provided for detecting Wells and the amount of complex trapped in the well can be the presence, or levels of the protease in a cell, tissue, or quantitated. Methods for detecting Such complexes, in addi organism. The method can involve contacting a biological tion to those described above for the GST-immobilized 25 Sample with a compound capable of interacting with the complexes, include immunodetection of complexes using protease Such that the interaction can be detected. antibodies reactive with the protease protein target One agent for detecting a protease is an antibody capable molecule, or which are reactive with protease protein and of Selectively binding to the protease. A biological Sample compete with the target molecule; as well as enzyme-linked includes tissues, cells and biological fluids isolated from a assays which rely on detecting an enzymatic activity asso Subject, as well as tissues, cells and fluids present within a ciated with the target molecule. Subject. Modulators of protease activity identified according to The protease also provides a target for diagnosing active these drug Screening assays can be used to treat a Subject disease, or predisposition to disease, in a patient having a with a disorder mediated by the protease pathway, by variant protease. Thus, protease can be isolated from a treating cells that express the protease. These methods of 35 biological Sample, assayed for the presence of a genetic treatment include the Steps of administering the modulators mutation that results in aberrant protease. This includes of protein activity in a pharmaceutical composition as amino acid Substitution, deletion, insertion, rearrangement, described herein, to a Subject in need of Such treatment. The (as the result of aberrant splicing events), and inappropriate compounds may be tested first in an animal model to post-translational modification. Analytic methods include determine Safety and efficacy. 40 altered electrophoretic mobility, altered tryptic peptide The protease polypeptides are thus useful for treating a digest, altered protease activity in cell-based or cell-free protease-associated disorder characterized by aberrant assay, alteration in activator, Substrate, or antibody-binding expression or activity of a protease. In one embodiment, the pattern, altered isoelectric point, direct amino acid method involves administering an agent (e.g., an agent Sequencing, and any other of the known assay techniques identified by a screening assay described herein), or com 45 useful for detecting mutations in a protein. bination of agents that modulates (e.g., upregulates or In vitro techniques for detection of protease include downregulates) expression or activity of the protein. In enzyme linked immunosorbent assays (ELISAS), Western another embodiment, the method involves administering a blots, immunoprecipitations and immunofluorescence. protein as therapy to compensate for reduced or aberrant Alternatively, the protein can be detected in Vivo in a Subject expression or activity of the protein. Accordingly, methods 50 by introducing into the Subject a labeled anti-protease anti for treatment include the use of Soluble protease or frag body. For example, the antibody can be labeled with a ments of the protease protein that compete, for example, radioactive marker whose presence and location in a Subject with activator or Substrate binding. These proteases or can be detected by Standard imaging techniques. Particularly fragments can have a higher affinity for the activator or useful are methods which detect the allelic variant of a Substrate So as to provide effective competition. 55 protease expressed in a Subject and methods which detect Stimulation of protein activity is desirable in Situations in fragments of a protease in a Sample. which the protein is abnormally downregulated and/or in It is also within the scope of this invention to determine which increased protein activity is likely to have a beneficial the ability of a test compound to interact with the polypep effect. Likewise, inhibition of protein activity is desirable in tide without the labeling of any of the interactants. For Situations in which the protein is abnormally upregulated 60 example, a microphysiometer can be used to detect the and/or in which decreased protein activity is likely to have interaction of a test compound with the polypeptide without a beneficial effect. One example of Such a situation is where the labeling of either the test compound or the polypeptide. a Subject has a disorder characterized by aberrant develop McConnell, H. M. et al. (1992) Science, 257: 1906–1912. ment or cellular differentiation. Another example of Such a The present invention provides for both prophylactic and Situation is where the Subject has a proliferative disease 65 therapeutic methods of treating a Subject at risk of (or (e.g., cancer) or a disorder characterized by an aberrant Susceptible to) a disorder or having a disorder associated hematopoietic response. Yet another example of Such a with aberrant expression or activity of proteins of the US 6,395,889 B1 45 46 invention. Administration of a prophylactic agent can occur described herein. A preferred fragment produces an antibody prior to the manifestation of Symptoms characteristic of the that diminishes or completely prevents Substrate or aberrancy, Such that a disease or disorder is prevented or, activator-binding. Antibodies can be developed against the alternatively, delayed in its progression. entire protease or portions of the protease, for example, With regard to both prophylactic and therapeutic methods Specific Segments or any portions thereof. Antibodies may of treatment, Such treatments may be specifically tailored or also be developed against Specific functional Sites, Such as modified, based on knowledge obtained from the field of the Site of Substrate or activator-binding, or Sites that are pharmacogenomics. The polypeptides thus provide a target phosphorylated, glycosylated, myristoylated, or otherwise to ascertain a genetic predisposition that can affect treatment modified, Such as amidated. modality. Thus, in a Substrate-based treatment, polymor An antigenic fragment will typically comprise at least 10 phism may give rise to Substrate or activator-binding regions contiguous amino acid residues. The antigenic peptide can that are more or less active in Substrate or activator binding, comprise, however, at least 12, at least 14 amino acid and protease activation or proteolysis. Accordingly, activa residues, at least 15 amino acid residues, at least 20 amino tor or Substrate dosage would necessarily be modified to acid residues, or at least 30 amino acid residues. In one maximize the therapeutic effect within a given population 15 embodiment, fragments correspond to regions that are containing a polymorphism. As an alternative to genotyping, located on the Surface of the protein, e.g., hydrophilic Specific polymorphic polypeptides could be identified. regions. These fragments are not to be construed, however, The protease polypeptides are also useful for monitoring as encompassing any fragments which may be disclosed therapeutic effects during clinical trials and other treatment. prior to the invention. Thus, the therapeutic effectiveness of an agent that is Antibodies can be polyclonal or monoclonal. An intact designed to increase or decrease gene expression, protein antibody, or a fragment thereof (e.g. Fab or F(ab')) can be levels or protease activity can be monitored over the course used. of treatment using the polypeptides as an end-point target. Any of the many well known protocols used for fusing The monitoring can be, for example, as follows: (i) obtain lymphocytes and immortalized cell lines can be applied for ing a pre-administration Sample from a Subject prior to 25 the purpose of generating a monoclonal antibody to a administration of the agent, (ii) detecting the level of expres polypeptide of the invention. Sion of a specified protein in the pre-administration Sample; Detection can be facilitated by coupling (i.e., physically (iii) obtaining one or more post-administration Samples from linking) the antibody to a detectable Substance. Examples of the Subject; (iv) detecting the level of expression or activity detectable Substances include various enzymes, prosthetic of the protein in the post-administration samples, (v) com groups, fluorescent materials, luminescent materials, biolu paring the level of expression or activity of the protein in the minescent materials, and radioactive materials. Examples of pre-administration Sample with the protein in the post Suitable enzymes include horseradish peroxidase, alkaline administration Sample or Samples; and (vi) increasing or phosphatase, f-galactosidase, or acetylcholinesterase, decreasing the administration of the agent to the Subject examples of Suitable prosthetic group complexes include accordingly. 35 streptavidin/biotin and avidin/biotin; examples of suitable The invention also comprises kits for detecting a protease fluorescent materials include umbelliferone, fluorescein, protein. The kit can comprise a labeled compound or agent fluorescein isothiocyanate, rhodamine, dichlorotriaziny capable of detecting protein in a biological Sample, Such as lamine fluorescein, dansyl chloride or phycoerythrin; an an antibody or other binding compound; means for deter example of a luminescent material includes luminol, mining the amount of in the Sample, and means for com 40 examples of bioluminescent materials include luciferase, paring the amount of in the Sample with a Standard. The luciferin, and aequorin, and examples of Suitable radioactive compound or agent can be packaged in a Suitable container. material include 'I, I, S or H. The kit can further comprise instructions for using the kit to An appropriate immunogenic preparation can be derived detect the protein. from native, recombinantly expressed, protein or chemically Antibodies 45 Synthesized peptides. In another aspect, the invention provides antibodies to the Completely human antibodies are particularly desirable polypeptides and polypeptide fragments of the invention, for therapeutic treatment of human patients. Such antibodies e.g., having an amino acid encoded by a nucleic acid can be produced using transgenic mice that are incapable of comprising all or a portion of a nucleotide Sequence Selected expressing endogenous immunoglobulin heavy and light from the group consisting of SEQID NOS: 1-33. Antibodies 50 chains genes, but which can express human heavy and light Selectively bind to the protease and its variants and frag chain genes. The transgenic mice are immunized in the ments. An antibody is considered to Selectively bind, even if normal fashion with a Selected antigen, e.g., all or a portion it also binds to other proteins that are not Substantially of a polypeptide of the invention. Monoclonal antibodies homologous with the protease. These other proteins share directed against the antigen can be obtained using conven homology with a fragment or domain of the protease. This 55 tional hybridoma technology. The human immunoglobulin conservation in Specific regions gives rise to antibodies that transgenes harbored by the transgenic mice rearrange during bind to both proteins by Virtue of the homologous Sequence. B cell differentiation, and Subsequently undergo class In this case, it would be understood that antibody binding to Switching and Somatic mutation. Thus, using Such a the protease is still Selective. technique, it is possible to produce therapeutically useful To generate antibodies, an isolated protease polypeptide is 60 IgG, IgA and IgE antibodies. For an overview of this used as an immunogen to generate antibodies using Standard technology for producing human antibodies, See Lonberg techniques for polyclonal and monoclonal antibody prepa and Huszar (1995, Int. Rev. Immunol. 13.65–93). For a ration. Either the full-length protein or an antigenic peptide detailed discussion of this technology for producing human fragment can be used. antibodies and human monoclonal antibodies and protocols Antibodies are preferably prepared from these regions or 65 for producing Such antibodies, See, e.g., U.S. Pat. No. from discrete fragments in antigenic regions. However, 5,625,126; U.S. Pat. No. 5,633,425; U.S. Pat. No. 5,569,825, antibodies can be prepared from any region of the peptide as U.S. Pat. No. 5,661,016; and U.S. Pat. No. 5,545,806. In US 6,395,889 B1 47 48 addition, companies Such as Abgenix, Inc. (Freemont, The antibodies are also useful in forensic identification. Calif.), can be engaged to provide human antibodies directed Accordingly, where an individual has been correlated with a against a Selected antigen using technology Similar to that Specific genetic polymorphism resulting in a specific poly described above. morphic protein, an antibody Specific for the polymorphic Completely human antibodies that recognize a Selected protein can be used as an aid in identification. epitope can be generated using a technique referred to as The antibodies are also useful for inhibiting protease “guided selection.” This technology is described, for function, for example, blocking Substrate or activator bind example, in Jespers et al. (1994, Bio/technology ing. 12:899–903). These uses can also be applied in a therapeutic context in Antibody Uses which treatment involves inhibiting protease function. An The antibodies can be used to isolate a protease by antibody can be used, for example, to block activator or Standard techniques, Such as affinity chromatography or Substrate binding. Antibodies can be prepared against Spe immunoprecipitation. The antibodies can facilitate the puri cific fragments containing sites required for function or fication of the natural protease from cells and recombinantly against intact protease associated with a cell. produced protease expressed in host cells. 15 The invention also encompasses kits for using antibodies The antibodies are useful to detect the presence of a to detect the presence of a protease in a biological Sample. protease in cells or tissues to determine the pattern of The kit can comprise antibodies Such as a labeled or label expression of the protease among various tissues in an able antibody and a compound or agent for detecting pro organism and over the course of normal development. tease in a biological Sample, means for determining the The antibodies can be used to detect a protease in situ, in amount of protease in the Sample, and means for comparing Vitro, or in a cell lysate or Supernatant in order to evaluate the amount of protease in the Sample with a Standard. The the abundance and pattern of expression. compound or agent can be packaged in a Suitable container. The antibodies can be used to assess abnormal tissue The kit can further comprise instructions for using the kit to distribution or abnormal expression during development. detect protease. Antibody detection of circulating fragments of the full 25 Computer Readable Means length protease can be used to identify protease turnover. The nucleotide or amino acid Sequences of the invention Further, the antibodies can be used to assess protease are also provided in a variety of mediums to facilitate use expression in disease States Such as in active Stages of the thereof. AS used herein, "provided” refers to a manufacture, disease or in an individual with a predisposition toward other than an isolated nucleic acid or amino acid molecule, disease related to protease function. When a disorder is which contains a nucleotide or amino acid Sequence of the caused by an inappropriate tissue distribution, developmen present invention. Such a manufacture provides the nucle tal expression, or level of expression of the protease protein, otide or amino acid Sequences, or a Subset thereof (e.g., a the antibody can be prepared against the normal protease. If Subset of open reading frames (ORFs)) in a form which a disorder is characterized by a Specific mutation in the allows a skilled artisan to examine the manufacture using protease, antibodies Specific for this mutant protein can be 35 means not directly applicable to examining the nucleotide or used to assay for the presence of the Specific mutant pro amino acid Sequences, or a Subset thereof, as they exists in tease. However, intracellularly-made antibodies nature or in purified form. (“intrabodies') are also encompassed, which would recog In one application of this embodiment, a nucleotide or nize intracellular protease peptide regions. amino acid Sequence of the present invention can be The antibodies can also be used to assess normal and 40 recorded on computer readable media. AS used herein, aberrant Subcellular localization of cells in the various “computer readable media' refers to any medium that can be tissues in an organism. Antibodies can be developed against read and accessed directly by a computer. Such media the whole protease or portions of the protease, Such as those include, but are not limited to: magnetic Storage media, Such described herein. as floppy discs, hard disc Storage medium, and magnetic The diagnostic uses can be applied, not only in genetic 45 tape, optical Storage media Such as CD-ROM; electrical testing, but also in monitoring a treatment modality. storage media such as RAM and ROM; and hybrids of these Accordingly, where treatment is ultimately aimed at correct categories Such as magnetic/optical Storage media. The ing protease expression level or the presence of aberrant skilled artisan will readily appreciate how any of the pres proteases and aberrant tissue distribution or developmental ently known computer readable mediums can be used to expression, antibodies directed against the protease or rel 50 create a manufacture comprising computer readable medium evant fragments can be used to monitor therapeutic efficacy. having recorded thereon a nucleotide or amino acid Antibodies accordingly can be used diagnostically to Sequence of the present invention. monitor levels in tissue as part of a clinical testing AS used herein, "recorded” refers to a process for Storing procedure, e.g., to, for example, determine the efficacy of a information on computer readable medium. The skilled given treatment regimen. 55 artisan can readily adopt any of the presently known meth Additionally, antibodies are useful in pharmacogenomic ods for recording information on computer readable medium analysis. Thus, antibodies prepared against polymorphic to generate manufactures comprising the nucleotide or proteases can be used to identify individuals that require amino acid Sequence information of the present invention. modified treatment modalities. A variety of data Storage Structures are available to a The antibodies are also useful as diagnostic tools as an 60 skilled artisan for creating a computer readable medium immunological marker for aberrant protease analyzed by having recorded thereon a nucleotide or amino acid electrophoretic mobility, isoelectric point, tryptic peptide Sequence of the present invention. The choice of the data digest, and other physical assays known to those in the art. Storage Structure will generally be based on the means The antibodies are also useful for tissue typing. Thus, chosen to access the Stored information. In addition, a where a specific protease has been correlated with expres 65 variety of data processor programs and formats can be used Sion in a Specific tissue, antibodies that are specific for this to Store the nucleotide Sequence information of the present protease can be used to identify a tissue type. invention on computer readable medium. The Sequence US 6,395,889 B1 49 SO information can be represented in a word processing text molecule, the protease polynucleotides are covalently linked file, formatted in commercially-available Software Such as to the vector nucleic acid. With this aspect of the invention, WordPerfect and MicroSoft Word, or represented in the form the vector includes a plasmid, Single or double Stranded of an ASCII file, Stored in a database application, Such as phage, a single or double stranded RNA or DNA viral vector, DB2, Sybase, Oracle, or the like. The skilled artisan can or artificial chromosome, Such as a BAC, PAC, YAC, OR readily adapt any number of dataprocessor Structuring for MAC. A vector can be maintained in the host cell as an extra mats (e.g., text file or database) in order to obtain computer chromosomal element where it replicates and produces readable medium having recorded thereon the nucleotide additional copies of the protease polynucleotides. Sequence information of the present invention. Alternatively, the vector may integrate into the host cell By providing the nucleotide or amino acid Sequences of genome and produce additional copies of the protease poly the invention in computer readable form, the skilled artisan nucleotides when the host cell replicates. can routinely access the Sequence information for a variety The invention provides vectors for the maintenance of purposes. For example, one skilled in the art can use the (cloning vectors) or vectors for expression (expression nucleotide or amino acid Sequences of the invention in vectors) of the protease polynucleotides. The vectors can computer readable form to compare a target Sequence or 15 function in procaryotic or eukaryotic cells or in both (shuttle target Structural motif with the Sequence information Stored vectors). within the data Storage means. Search means are used to Expression vectors contain cis-acting regulatory regions identify fragments or regions of the Sequences of the inven that are operably linked in the vector to the protease poly tion which match a particular target Sequence or target motif. nucleotides Such that transcription of the polynucleotides is AS used herein, a “target Sequence' can be any DNA or allowed in a host cell. Within a recombinant expression amino acid Sequence of Six or more nucleotides or two or vector, “operably linked' is intended to mean that the more amino acids. A skilled artisan can readily recognize nucleotide Sequence of interest is linked to the regulatory that the longer a target Sequence is, the less likely a target Sequence(s) in a manner which allows for expression of the Sequence will be present as a random occurrence in the nucleotide sequence (e.g., in an in vitro transcription/ database. The most preferred Sequence length of a target 25 translation System or in a host cell when the vector is sequence is from about 10 to 100 amino acids or from about introduced into the host cell). The polynucleotides can be 30 to 300 nucleotide residues. However, it is well recognized introduced into the host cell with a separate polynucleotide that commercially important fragments, Such as Sequence capable of affecting transcription. Thus, the Second poly fragments involved in gene expression and protein nucleotide may provide a trans-acting factor interacting with processing, may be of shorter length. the cis-regulatory control region to allow transcription of the AS used herein, “a target Structural motif, or “target protease polynucleotides from the Vector. Alternatively, a motif,” refers to any rationally Selected Sequence or com trans-acting factor may be Supplied by the host cell. Finally, bination of Sequences in which the Sequence(s) are chosen a trans-acting factor can be produced from the vector itself based on a three-dimensional configuration which is formed It is understood, however, that in Some embodiments, upon the folding of the target motif. There are a variety of 35 transcription and/or translation of the protease polynucle target motifs known in the art. Protein target motifs include, otides can occur in a cell-free System. but are not limited to, enzyme active sites and Signal The regulatory Sequence to which the polynucleotides Sequences. Nucleic acid target motifs include, but are not described herein can be operably linked include promoters limited to, promoter Sequences, hairpin Structures and induc for directing mRNA transcription. These include, but are not ible expression elements (protein binding sequences). 40 limited to, the left promoter from bacteriophage 2, the lac, Computer software is publicly available which allows a TRP, and TAC promoters from E. coli, the early and late skilled artisan to access Sequence information provided in a promoters from SV40, the CMV immediate early promoter, computer readable medium for analysis and comparison to the adenovirus early and late promoters, and retrovirus other Sequences. A variety of known algorithms are dis long-terminal repeats. closed publicly and a variety of commercially available 45 In addition to control regions that promote transcription, Software for conducting Search means are and can be used in expression vectors may also include regions that modulate the computer-based Systems of the present invention. transcription, Such as repressor binding Sites and enhancers. Examples of Such Software includes, but is not limited to, Examples include the SV40 enhancer, the cytomegalovirus MacPattern (EMBL), BLASTN and BLASTX (NCBIA). immediate early enhancer, polyoma enhancer, adenovirus For example, Software which implements the BLAST 50 enhancers, and retrovirus LTR enhancers. (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and In addition to containing sites for transcription initiation BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) and control, expression vectors can also contain Sequences Search algorithms on a Sybase System can be used to identify necessary for transcription termination and, in the tran open reading frames (ORFs) of the sequences of the inven Scribed region a ribosome binding Site for translation. Other tion which contain homology to ORFs or proteins from other 55 regulatory control elements for expression include initiation libraries. Such ORFs are protein encoding fragments and are and termination codons as well as polyadenylation signals. useful in producing commercially important proteins Such as The person of ordinary skill in the art would be aware of the enzymes used in various reactions and in the production of numerous regulatory Sequences that are useful in expression commercially useful metabolites. vectors. Such regulatory Sequences are described, for Vectors/Host Cells 60 example, in Sambrook et al., Molecular Cloning. A Labo The invention thus provides vectors containing the pro ratory Manual. 2nd. ed., Cold Spring Harbor Laboratory tease polynucleotides. Another aspect of the invention per Press, Cold Spring Harbor, N.Y., (1989), and such regulatory tains to nucleic acid constructs containing a nucleic acid Sequences are described, for example, in Goeddel, Gene selected from the group consisting of SEQID NOS: 1-33 (or Expression Technology. Methods in Enzymology 185, Aca a portion thereof). The term “vector” refers to a vehicle, 65 demic Press, San Diego, Calif. (1990). preferably a nucleic acid molecule, that can transport the A variety of expression vectors can be used to express a protease polynucleotides. When the vector is a nucleic acid protease poly nucleotide. Such vectors include US 6,395,889 B1 S1 52 chromosomal, episomal, and virus-derived vectors, for of the recombinant protein, and aid in the purification of the example vectors derived from bacterial plasmids, from protein by acting for example as a ligand for affinity puri bacteriophage, from yeast episomes, from yeast chromo fication. A proteolytic cleavage Site may be introduced at the Somal elements, including yeast artificial chromosomes, junction of the fusion moiety So that the desired polypeptide from viruses Such as baculoviruses, papovaviruses Such as can ultimately be separated from the fusion moiety. Pro SV40, Vaccinia viruses, adenoviruses, adeno-associated teolytic enzymes include, but are not limited to, factor Xa, Virus, poxviruses, pseudorabies viruses, and retroviruses. thrombin, and enterokinase. Typical fusion expression vec Vectors may also be derived from combinations of these tors include pGEX (Smith et al., Gene 67:31-40 (1988)), Sources Such as those derived from plasmid and bacterioph pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 age genetic elements, e.g. coSmids and phagemids. Appro (Pharmacia, Piscataway, N.J.) which fuse glutathione priate cloning and expression vectors for prokaryotic and S-transferase (GST), maltose E binding protein, or protein eukaryotic hosts are described in Sambrook et al., Molecular A, respectively, to the target recombinant protein. Examples Cloning. A Laboratory Manual. 2nd. ed., Cold Spring of Suitable inducible non-fusion E. coli expression vectors Harbor Laboratory Press, Cold Spring Harbor, N.Y., (1989). include pTrc (Amann et al., Gene 69:301-315 (1988)) and The regulatory Sequence may provide constitutive expres 15 pET 1 d (Studier et al., Gene Expression Technology: Sion in one or more host cells (i.e. tissue specific) or may Methods in Enzymology 185:60-89 (1990)). provide for inducible expression in one or more cell types Recombinant protein expression can be maximized in a Such as by temperature, nutrient additive, or exogenous host bacteria by providing a genetic background wherein the factor Such as a hormone or other ligand. A variety of vectors host cell has an impaired capacity to proteolytically cleave providing for constitutive and inducible expression in the recombinant protein. (Gottesman, S., Gene Expression prokaryotic and eukaryotic hosts are well known to those of Technology. Methods in Enzymology 185, Academic Press, ordinary skill in the art. Non-limiting examples of Suitable San Diego, Calif. (1990) 119-128). Alternatively, the tissue-specific promoters include the albumin promoter Sequence of the polynucleotide of interest can be altered to (liver-specific; Pinkert et al. (1987) Genes Dev. 1:268-277), provide preferential codon usage for a specific host cell, for lymphoid-specific promoters (Calame and Eaton (1988) 25 example E. coli. (Wada et al., Nucleic Acids Res. Adv. Immunol. 43:235-275), in particular promoters of T 20:2111–2118 (1992)). cell receptors (Winoto and Baltimore (1989) EMBO J. The protease polynucleotides can also be expressed by 8:729–733) and immunoglobulins (Banerji et al. (1983) Cell expression vectors that are operative in yeast. Examples of 33:729–740; Queen and Baltimore (1983) Cell 33:741-748), vectors for expression in yeast e.g., S. cerevisiae include neuron-specific promoters (e.g., the neurofilament promoter; pYepSec1 (Baldari, et al., EMBO J. 6:229–234 (1987)), Byrne and Ruddle (1989) Proc. Natl. Acad. Sci. USA pMFa (Kurjan et al., Cell 30:933–943(1982)), p.JRY88 86:5473-5477), pancreas-specific promoters (Edlund et al. (Schultz et al., Gene 54:113-123 (1987)), and pYES2 (1985) Science 230:912–916), and mammary gland-specific (Invitrogen Corporation, San Diego, Calif.). promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873, The protease polynucleotides can also be expressed in 316 and European Application Publication No. 264,166). 35 insect cells using, for example, baculovirus expression vec Developmentally regulated promoters are also tors. Baculovirus vectors available for expression of proteins encompassed, for example the murine hoX promoters in cultured insect cells (e.g., Sf 9 cells) include the pac (Kessel and Gruss (1990) Science 249:374-379) and the series (Smith et al., Mol. Cell Biol. 3:2156–2165 (1983)) and C.-fetoprotein promoter (Campes and Tilghman (1989) the pVL series (Lucklow et al., Virology 170:31–39 (1989)). Genes Dev. 3:537-546). It will be appreciated by those 40 In certain embodiments of the invention, the polynucle skilled in the art that the design of the expression vector can otides described herein are expressed in mammalian cells depend on Such factors as the choice of the host cell to be using mammalian expression vectors. Examples of mam transformed, the level of expression of protein desired, etc. malian expression vectors include pCDM8 (Seed, B. Nature The expression vectors of the invention can be introduced 329:840 (1987)) and pMT2PC (Kaufman et al., EMBO J. into host cells to thereby produce proteins or peptides, 45 6:187-195 (1987)). When used in mammalian cells, the including fusion proteins or peptides, encoded by nucleic expression vector's control functions are often provided by acids as described herein. Viral regulatory elements. For example, commonly used The protease polynucleotides can be inserted into the promoters are derived from polyoma, Adenovirus 2, cytome vector nucleic acid by well-known methodology. Generally, galovirus and Simian Virus 40. For other suitable expression the DNA sequence that will ultimately be expressed is joined 50 Systems for both prokaryotic and eukaryotic cells See chap to an expression vector by cleaving the DNA sequence and ters 16 and 17 of Sambrook et al., Supra. the expression vector with one or more restriction enzymes Suitable host cells are discussed further in Goeddel, Supra. and then ligating the fragments together. Procedures for Alternatively, the recombinant expression vector can be restriction enzyme digestion and ligation are well known to transcribed and translated in vitro, for example using T7 those of ordinary skill in the art. 55 promoter regulatory Sequences and T7 polymerase. The vector containing the appropriate polynucleotide can It is understood that "host cells' and “recombinant host be introduced into an appropriate host cell for propagation or cells' refer not only to the particular subject cell but also to expression using well-known techniques. Bacterial cells the progeny or potential progeny of Such a cell. Because include, but are not limited to, E. coli, Streptomyces, and certain modifications may occur in Succeeding generations Salmonella typhimurium. Eukaryotic cells include, but are 60 due to either mutation or environmental influences, Such not limited to, yeast, insect cells Such as Drosophila, animal progeny may not, in fact, be identical to the parent cell, but cells such as COS and CHO cells, and plant cells. are still included within the Scope of the term as used herein. AS described herein, it may be desirable to express the The expression vectors listed herein are provided by way polypeptide as a fusion protein. Accordingly, the invention of example only of the well-known vectors available to provides fusion vectors that allow for the production of the 65 those of ordinary skill in the art that would be useful to protease polypeptides. Fusion vectors can increase the express the protease polynucleotides. The perSon of ordinary expression of a recombinant protein, increase the Solubility skill in the art would be aware of other vectors Suitable for US 6,395,889 B1 S3 S4 maintenance propagation or expression of the polynucle Sequence can be endogenous to the protease polypeptides or otides described herein. These are found for example in heterologous to these polypeptides. Sambrook, J., Fritsh, E. F., and Maniatis, T. Molecular Where the polypeptide is not Secreted into the medium, Cloning. A Laboratory Manual. 2nd, ed., Cold Spring the protein can be isolated from the host cell by standard Harbor Laboratory, Cold Spring Harbor Laboratory Press, disruption procedures, including freeze thaw, Sonication, Cold Spring Harbor, N.Y., 1989. mechanical disruption, use of lysing agents and the like. The The invention also encompasses vectors in which the polypeptide can then be recovered and purified by well nucleic acid Sequences described herein are cloned into the known purification methods including ammonium Sulfate vector in reverse orientation, but operably linked to a precipitation, acid extraction, anion or cationic exchange regulatory Sequence that permits transcription of antisense chromatography, phosphocellulose chromatography, RNA. Thus, an antisense transcript can be produced to all, or to a portion, of the polynucleotide Sequences described hydrophobic-interaction chromatography, affinity herein, including both coding and non-coding regions. chromatography, hydroxylapatite chromatography, lectin Expression of this antisense RNA is subject to each of the chromatography, or high performance liquid chromatogra parameters described above in relation to expression of the phy. Sense RNA (regulatory sequences, constitutive or inducible 15 It is also understood that depending upon the host cell in expression, tissue-specific expression). recombinant production of the polypeptides described The invention also relates to recombinant host cells herein, the polypeptides can have various glycosylation containing the VectorS described herein. Host cells therefore patterns, depending upon the cell, or maybe non include prokaryotic cells, lower eukaryotic cells Such as glycosylated as when produced in bacteria. In addition, the yeast, other eukaryotic cells Such as plant, fungal, and insect polypeptides may include an initial modified methionine in cells, and higher eukaryotic cells Such as mammalian cells. Some cases as a result of a host-mediated process. The recombinant host cells are prepared by introducing Uses of Vectors and Host Cells the vector constructs described herein into the cells by The host cells expressing the polypeptides described techniques readily available to the perSon of ordinary skill in herein, and particularly recombinant host cells, have a the art. These include, but are not limited to, calcium 25 variety of uses. First, the cells are useful for producing phosphate transfection, DEAE-deXtran-mediated protease polypeptides that can be further purified to produce transfection, cationic lipid-mediated transfection, desired amounts of these. Thus, host cells containing expres electroporation, transduction, infection, lipofection, and Sion vectors are useful for polypeptide production. other techniques Such as those found in Sambrook, et al. Host cells are also useful for conducting cell-based assays (Molecular Cloning: A Laboratory Manual. 2nd, ed., Cold involving the protease or protease fragments. Thus, a recom Spring Harbor Laboratory, Cold Spring Harbor Laboratory binant host cell expressing a native protease is useful to Press, Cold Spring Harbor, N.Y., 1989). assay for compounds that Stimulate or inhibit protease Host cells can contain more than one vector. Thus, dif function. This includes activator binding, gene expression at ferent nucleotide Sequences can be introduced on different the level of transcription or translation, Substrate interaction, vectors of the same cell. Similarly, the protease polynucle 35 and components of the pathway in which the protease is a otides can be introduced either alone or with other poly member. nucleotides that are not related to the protease polynucle Host cells are also useful for identifying protease mutants otides Such as those providing trans-acting factors for in which these functions are affected. If the mutants natu expression vectors. When more than one vector is intro rally occur and give rise to a pathology, host cells containing duced into a cell, the vectors can be introduced 40 the mutations are useful to assay compounds that have a independently, co-introduced or joined to the protease poly desired effect on the mutant protease (for example, Stimu nucleotide vector. lating or inhibiting function) which may not be indicated by In the case of bacteriophage and viral vectors, these can their effect on the native protease. be introduced into cells as packaged or encapsulated virus Recombinant host cells are also useful for expressing the by Standard procedures for infection and transduction. Viral 45 chimeric polypeptides described herein to assess compounds vectors can be replication-competent or replication that activate or SuppreSS activation by means of a heterolo defective. In the case in which Viral replication is defective, gous activator binding domain. Alternatively, a heterologous replication will occur in host cells providing functions that proteolytic region can be used to assess the effect of a complement the defects. desired proteolytic domain on any given host cell. In this Vectors generally include Selectable markers that enable 50 embodiment, a proteolytic region (or parts thereof) compat the Selection of the Subpopulation of cells that contain the ible with the specific host cell is used to make the chimeric recombinant vector constructs. The marker can be contained vector. Alternatively, a heterologous Substrate binding in the same vector that contains the polynucleotides domain can be introduced into the host cell. described herein or may be on a separate vector. Markers Further, mutant proteases can be designed in which one or include tetracycline or amplicillin-resistance genes for 55 more of the various functions is engineered to be increased prokaryotic host cells and dihydrofolate reductase or neo or decreased (e.g., activator binding or Substrate binding) mycin resistance for eukaryotic host cells. However, any and used to augment or replace proteases in an individual. marker that provides Selection for a phenotypic trait will be Thus, host cells can provide a therapeutic benefit by replac effective. ing an aberrant protease or providing an aberrant protease While the mature proteins can be produced in bacteria, 60 that provides a therapeutic result. In one embodiment, the yeast, mammalian cells, and other cells under the control of cells provide proteases that are abnormally active. the appropriate regulatory Sequences, cell-free transcription In another embodiment, the cells provide proteases that and translation Systems can also be used to produce these are abnormally inactive. These proteases can compete with proteins using RNA derived from the DNA constructs endogenous proteases in the individual. described herein. 65 In another embodiment, cells expressing proteases that Where Secretion of the polypeptide is desired, appropriate cannot be activated, are introduced into an individual in Secretion signals are incorporated into the vector. The Signal order to compete with endogenous proteases for activator. US 6,395,889 B1 SS S6 For example, in the case in which excessive activator is part is integrated into the genome of a cell from which a of a treatment modality, it may be necessary to inactivate transgenic animal develops and which remains in the this activator at a Specific point in treatment. Providing cells genome of the mature animal in one or more cell types or that compete for the activator, but which cannot be affected tissueS of the transgenic animal. These animals are useful for by protease activation would be beneficial. Studying the function of a protease protein and identifying Homologously recombinant host cells can also be pro and evaluating modulators of protease protein activity. duced that allow the in Situ alteration of endogenous pro Other examples of transgenic animals include non-human tease polynucleotide Sequences in a host cell genome. The primates, sheep, dogs, cows, goats, chickens, and amphib host cell includes, but is not limited to, a stable cell line, cell ians. in Vivo, or cloned microorganism. This technology is more In one embodiment, a host cell is a fertilized oocyte or an fully described in WO 93/09222, WO 91/12650, WO embryonic Stem cell into which protease polynucleotide 91/06667, U.S. Pat. No. 5,272,071, and U.S. Pat. No. . Sequences have been introduced. 5,641,670. Briefly, Specific polynucleotide Sequences corre A transgenic animal can be produced by introducing sponding to the protease polynucleotides or Sequences nucleic acid into the male pronuclei of a fertilized oocyte, proximal or distal to a protease gene are allowed to integrate 15 e.g., by microinjection, retroviral infection, and allowing the into a host cell genome by homologous recombination oocyte to develop in a pseudopregnant female foster animal. where expression of the gene can be affected. In one Any of the protease nucleotide Sequences can be introduced embodiment, regulatory Sequences are introduced that either as a transgene into the genome of a non-human animal, Such increase or decrease expression of an endogenous Sequence. S OSC. Accordingly, a protease protein can be produced in a cell not Any of the regulatory or other Sequences useful in expres normally producing it. Alternatively, increased expression of Sion vectors can form part of the transgenic Sequence. This protease protein can be effected in a cell normally producing includes intronic Sequences and polyadenylation Signals, if the protein at a specific level. Further, expression can be not already included. A tissue-specific regulatory Sequence decreased or eliminated by introducing a specific regulatory (s) can be operably linked to the transgene to direct expres Sequence. The regulatory Sequence can be heterologous to 25 Sion of the protease to particular cells. the protease protein Sequence or can be a homologous Methods for generating transgenic animals via embryo Sequence with a desired mutation that affects expression. manipulation and microinjection, particularly animals Such Alternatively, the entire gene can be deleted. The regulatory as mice, have become conventional in the art and are Sequence can be specific to the host cell or capable of described, for example, in U.S. Pat. Nos. 4.736,866 and functioning in more than one cell type. Still further, Specific 4,870,009, both by Leder et al., U.S. Pat. No. 4,873,191 by mutations can be introduced into any desired region of the Wagner et al. and in Hogan, B., Manipulating the Mouse gene to produce mutant protease proteins. Such mutations Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring could be introduced, for example, into the specific functional Harbor, N.Y., 1986). Similar methods are used for produc regions Such as the Substrate-binding site. tion of other transgenic animals. A transgenic founder ani In one embodiment, the host cell can be a fertilized oocyte 35 mal can be identified based upon the presence of the or embryonic Stem cell that can be used to produce a transgene in its genome and/or expression of transgenic transgenic animal containing the altered protease gene. mRNA in tissueS or cells of the animals. A transgenic Alternatively, the host cell can be a stem cell or other early founder animal can then be used to breed additional animals tissue precursor that gives rise to a specific Subset of cells carrying the transgene. Moreover, transgenic animals carry and can be used to produce transgenic tissues in an animal. 40 ing a transgene can further be bred to other transgenic See also Thomas et al., Cell 51:503 (1987) for a description animals carrying other transgenes. A transgenic animal also of homologous recombination vectors. The vector is intro includes animals in which the entire animal or tissues in the duced into an embryonic stem cell line (e.g., by animal have been produced using the homologously recom electroporation) and cells in which the introduced gene has binant host cells described herein. homologously recombined with the endogenous protease 45 In another embodiment, transgenic non-human animals gene is selected (see e.g., Li, E. et al., Cell 15 69:915 can be produced which contain Selected Systems which (1992)). The selected cells are then injected into a blastocyst allow for regulated expression of the transgene. One of an animal (e.g., a mouse) to form aggregation chimeras example of Such a System is the cre/loxP recombinase (see e.g., Bradley, A. in Teratocarcinomas and Embryonic system of bacteriophage P1. For a description of the cre/loxP Stem Cells: A Practical Approach, E. J. Robertson, ed. (IRL, 50 recombinase System, See, e.g., LakSo et al. PNAS Oxford, 1987) pp. 113-152). Achimeric embryo can then be 89:6232–6236 (1992). Another example of a recombinase implanted into a Suitable pseudopregnant female foster system is the FLP recombinase system of S. cerevisiae animal and the embryo brought to term. Progeny harboring (O'Gorman et al. Science 251:1351–1355 (1991). If a cre/ the 20 homologously recombined DNA in their germ cells loXP recombinase System is used to regulate expression of can be used to breed animals in which all cells of the animal 55 the transgene, animals containing transgenes encoding both contain the homologously recombined DNA by germline the Cre recombinase and a Selected protein is required. Such transmission of the transgene. Methods for constructing animals can be provided through the construction of homologous recombination vectors and homologous recom “double' transgenic animals, e.g., by mating two transgenic binant animals are described further in Bradley, A. (1991) animals, one containing a transgene encoding a Selected Current Opinion in Biotechnology 2:823–829 and in PCT 60 protein and the other containing a transgene encoding a International Publication Nos. WO 90/11354, WO recombinase. 91/011.40; and WO 93/04169. Clones of the non-human transgenic animals described The genetically engineered host cells can be used to herein can also be produced according to the methods produce non-human transgenic animals. A transgenic animal described in Wilmut, I. et al. Nature 385:810-813 (1997) is preferably a mammal, for example a rodent, Such as a rat 65 and PCT International Publication Nos. WO 97/07668 and or mouse, in which one or more of the cells of the animal WO 97/07669. In brief, a cell, e.g., a somatic cell, from the include a transgene. A transgene is exogenous DNA which transgenic animal can be isolated and induced to exit the US 6,395,889 B1 57 58 growth cycle and enter G. phase. The quiescent cell can then nents: a Sterile diluent Such as water for injection, Saline be fused, e.g., through the use of electrical pulses, to an Solution, fixed oils, polyethylene glycols, glycerine, propy enucleated oocyte from an animal of the same Species from lene glycol or other Synthetic Solvents, antibacterial agents which the quiescent cell is isolated. The reconstructed Such as benzyl alcohol or methyl parabens, antioxidants oocyte is then cultured Such that it develops to morula or Such as ascorbic acid or Sodium bisulfite, chelating agents blastocyst and then transferred to a pseudopregnant female Such as ethylenediaminetetraacetic acid; bufferS Such as foster animal. The offspring born of this female foster acetates, citrates or phosphates and agents for the adjustment animal will be a clone of the animal from which the cell, e.g., of tonicity Such as Sodium chloride or dextrose. pH can be the Somatic cell, is isolated. adjusted with acids or bases, Such as hydrochloric acid or Transgenic animals containing recombinant cells that Sodium hydroxide. The parenteral preparation can be express the polypeptides described herein are useful to enclosed in ampules, disposable Syringes or multiple dose conduct the assays described herein in an in Vivo context. Vials made of glass or plastic. Accordingly, the various physiological factors that are Pharmaceutical compositions Suitable for injectable use present in Vivo and that could affect binding, protease include Sterile aqueous Solutions (where water Soluble) or activation, and the pathway events may not be evident from 15 dispersions and Sterile powders for the extemporaneous in vitro cell-free or cell-based assays. Accordingly, it is preparation of Sterile injectable Solutions or dispersion. For useful to provide non-human transgenic animals to assay in? intravenous administration, Suitable carriers include physi Vivo protease function, including Substrate or activator ological saline, bacteriostatic water, Cremophor ELTM interaction, the effect of Specific mutant proteases on pro (BASF, Parsippany, N.J.) or phosphate buffered saline tease function and interaction with other components, and (PBS). In all cases, the composition must be sterile and the effect of chimeric proteases. It is also possible to assess should be fluid to the extent that easy syringability exists. It the effect of null mutations, that is mutations that Substan must be stable under the conditions of manufacture and tially or completely eliminate one or more protease func Storage and must be preserved against the contaminating tions. action of microorganisms. Such as bacteria and fungi. The In general, methods for producing transgenic animals 25 carrier can be a Solvent or dispersion medium containing, for include introducing a nucleic acid Sequence according to the example, water, ethanol, polyol (for example, glycerol, present invention, the nucleic acid Sequence capable of propylene glycol, and liquid polyethylene glycol, and the expressing the protease protein in a transgenic animal, into like), and Suitable mixtures thereof. The proper fluidity can a cell in culture or in vivo. When introduced in vivo, the be maintained, for example, by the use of a coating Such as nucleic acid is introduced into an intact organism Such that lecithin, by the maintenance of the required particle size in one or more cell types and, accordingly, one or more tissue the case of dispersion and by the use of Surfactants. Pre types, express the nucleic acid encoding the protease. vention of the action of microorganisms can be achieved by Alternatively, the nucleic acid can be introduced into virtu Various antibacterial and antifungal agents, for example, ally all cells in an organism by transfecting a cell in culture, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, Such as an embryonic Stem cell, as described herein for the 35 and the like. In many cases, it will be preferable to include production of transgenic animals, and this cell can be used isotonic agents, for example, Sugars, polyalcohols Such as to produce an entire transgenic organism. AS described, in a mannitol, Sorbitol, Sodium chloride in the composition. further embodiment, the host cell can be a fertilized oocyte. Prolonged absorption of the injectable compositions can be Such cells are then allowed to develop in a female foster brought about by including in the composition an agent animal to produce the transgenic organism. 40 which delays absorption, for example, aluminum Pharmaceutical Compositions monoStearate and gelatin. The protease nucleic acid molecules, protein (particularly Sterile injectable Solutions can be prepared by incorpo fragments that comprise an extracellular domain), modula rating the active compound (e.g., a protease or anti-protease tors of the protein, and antibodies (also referred to herein as antibody) in the required amount in an appropriate Solvent “active compounds”) can be incorporated into pharmaceu 45 with one or a combination of ingredients enumerated above, tical compositions Suitable for administration to a Subject, as required, followed by filtered Sterilization. Generally, e.g., a human. Such compositions typically comprise the dispersions are prepared by incorporating the active com nucleic acid molecule, protein, modulator, or antibody and a pound into a Sterile vehicle which contains a basic disper pharmaceutically acceptable carrier. Sion medium and the required other ingredients from those AS used herein the language “pharmaceutically accept 50 enumerated above. In the case of Sterile powders for the able carrier' is intended to include any and all Solvents, preparation of Sterile injectable Solutions, the preferred dispersion media, coatings, antibacterial and antifungal methods of preparation are vacuum drying and freeze-drying agents, isotonic and absorption delaying agents, and the like, which yields a powder of the active ingredient plus any compatible with pharmaceutical administration. The use of additional desired ingredient from a previously Sterile Such media and agents for pharmaceutically active Sub 55 filtered solution thereof. stances is well known in the art. Except insofar as any Oral compositions generally include an inert diluent or an conventional media or agent is incompatible with the active edible carrier. They can be enclosed in gelatin capsules or compound, Such media can be used in the compositions of compressed into tablets. For oral administration, the agent the invention. Supplementary active compounds can also be can be contained in enteric forms to Survive the Stomach or incorporated into the compositions. A pharmaceutical com 60 further coated or mixed to be released in a particular region position of the invention is formulated to be compatible with of the GI tract by known methods. For the purpose of oral its intended route of administration. Examples of routes of therapeutic administration, the active compound can be administration include parenteral, e.g., intravenous, incorporated with excipients and used in the form of tablets, intradermal, Subcutaneous, oral (e.g., inhalation), transder troches, or capsules. Oral compositions can also be prepared mal (topical), transmucosal, and rectal administration. Solu 65 using a fluid carrier for use as a mouthwash, wherein the tions or Suspensions used for parenteral, intradermal, or compound in the fluid carrier is applied orally and Swished Subcutaneous application can include the following compo and expectorated or Swallowed. US 6,395,889 B1 59 60 Pharmaceutically compatible binding agents, and/or adju vector in an acceptable diluent, or can comprise a slow Vant materials can be included as part of the composition. release matrix in which the gene delivery vehicle is imbed The tablets, pills, capsules, troches and the like can contain ded. Alternatively, where the complete gene delivery vector any of the following ingredients, or compounds of a similar can be produced intact from recombinant cells, e.g. retro nature: a binder Such as microcrystalline cellulose, gum Viral vectors, the pharmaceutical preparation can include tragacanth or gelatin; an excipient Such as Starch or lactose, one or more cells which produce the gene delivery System. a disintegrating agent Such as alginic acid, Primogel, or corn The pharmaceutical compositions can be included in a Starch; a lubricant Such as magnesium Stearate or Sterotes, a container, pack, or dispenser together with instructions for glidant Such as colloidal Silicon dioxide; a Sweetening agent administration. Such as Sucrose or Saccharin; or a flavoring agent Such as AS defined herein, a therapeutically effective amount of peppermint, methyl Salicylate, or orange flavoring. protein or polypeptide (i.e., an effective dosage) ranges from For administration by inhalation, the compounds are about 0.001 to 30 mg/kg body weight, preferably about 0.01 delivered in the form of an aeroSol Spray from pressured to 25 mg/kg body weight, more preferably about 0.1 to 20 container or dispenser which contains a Suitable propellant, mg/kg body weight, and even more preferably about 1 to 10 e.g., a gas Such as carbon dioxide, or a nebulizer. 15 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 Systemic administration can also be by transmucosal or mg/kg body weight. transdermal means. For transmucosal or transdermal The skilled artisan will appreciate that certain factors may administration, penetrants appropriate to the barrier to be influence the dosage required to effectively treat a Subject, permeated are used in the formulation. Such penetrants are including but not limited to the Severity of the disease or generally known in the art, and include, for example, for disorder, previous treatments, the general health and/or age transmucosal administration, detergents, bile Salts, and of the Subject, and other diseaseS present. Moreover, treat fusidic acid derivatives. Transmucosal administration can be ment of a Subject with a therapeutically effective amount of accomplished through the use of nasal SprayS or Supposito a protein, polypeptide, or antibody can include a single ries. For transdermal administration, the active compounds treatment or, preferably, can include a Series of treatments. are formulated into ointments, Salves, gels, or creams as 25 In a preferred example, a Subject is treated with antibody, generally known in the art. protein, or polypeptide in the range of between about 0.1 to The compounds can also be prepared in the form of 20 mg/kg body weight, one time per week for between about Suppositories (e.g., with conventional Suppository bases 1 to 10 weeks, preferably between 2 to 8 weeks, more Such as cocoa butter and other glycerides) or retention preferably between about 3 to 7 weeks, and even more enemas for rectal delivery. preferably for about 4, 5, or 6 weeks. It will also be In one embodiment, the active compounds are prepared appreciated that the effective dosage of antibody, protein, or with carriers that will protect the compound against rapid polypeptide used for treatment may increase or decrease elimination from the body, Such as a controlled release Over the course of a particular treatment. Changes in dosage formulation, including implants and microencapsulated may result and become apparent from the results of diag delivery Systems. Biodegradable, biocompatible polymers 35 nostic assays as described herein. can be used, Such as ethylene Vinyl acetate, polyanhydrides, The present invention encompasses agents which modul polyglycolic acid, , polyorthoesters, and polylactic late expression or activity. An agent may, for example, be a acid. Methods for preparation of such formulations will be Small molecule. For example, Such Small molecules include, apparent to those skilled in the art. but are not limited to, peptides, peptidomimetics, amino The materials can also be obtained commercially from 40 acids, amino acid analogs, polynucleotides, polynucleotide Alza Corporation and Nova Pharmaceuticals, Inc. Liposo analogs, nucleotides, nucleotide analogs, organic or inor mal Suspensions (including liposomes targeted to infected ganic compounds (i.e., including heteroorganic and organo cells with monoclonal antibodies to viral antigens) can also metallic compounds) having a molecular weight less than be used as pharmaceutically acceptable carriers. These can about 10,000 grams per mole, organic or inorganic com be prepared according to methods known to those skilled in 45 pounds having a molecular weight less than about 5,000 the art, for example, as described in U.S. Pat. No. 4,522,811. grams per mole, organic or inorganic compounds having a It is especially advantageous to formulate oral or molecular weight less than about 1,000 grams per mole, parenteral compositions in dosage unit form for ease of organic or inorganic compounds having a molecular weight administration and uniformity of dosage. “Dosage unit less than about 500 grams per mole, and Salts, esters, and form' as used herein refers to physically discrete units Suited 50 other pharmaceutically acceptable forms of Such com as unitary dosages for the Subject to be treated; each unit pounds. containing a predetermined quantity of active compound It is understood that appropriate doses of Small molecule calculated to produce the desired therapeutic effect in asso agents depends upon a number of factors within the ken of ciation with the required pharmaceutical carrier. The Speci the ordinarily skilled physician, Veterinarian, or researcher. fication for the dosage unit forms of the invention are 55 The dose(s) of the small molecule will vary, for example, dictated by and directly dependent on the unique character depending upon the identity, Size, and condition of the istics of the active compound and the particular therapeutic Subject or Sample being treated, further depending upon the effect to be achieved, and the limitations inherent in the art route by which the composition is to be administered, if of compounding Such an active compound for the treatment applicable, and the effect which the practitioner desires the of individuals. 60 Small molecule to have upon the nucleic acid or polypeptide The nucleic acid molecules of the invention can be of the invention. Exemplary doses include milligram or inserted into Vectors and used as gene therapy vectors. Gene microgram amounts of the Small molecule per kilogram of therapy vectors can be delivered to a Subject by, for example, Subject or Sample weight (e.g., about 1 microgram per intravenous injection, local administration (U.S. Pat. No. kilogram to about 500 milligrams per kilogram, about 100 5,328,470) or by Stereotactic injection (see e.g., Chen et al., 65 micrograms per kilogram to about 5 milligrams per PNAS 91:3054–3057 (1994)). The pharmaceutical prepara kilogram, or about 1 microgram per kilogram to about 50 tion of the gene therapy vector can include the gene therapy micrograms per kilogram. It is furthermore understood that US 6,395,889 B1 61 62 appropriate doses of a Small molecule depend upon the renaldipep potency of the Small molecule with respect to the expression reprolysin or activity to be modulated. Such appropriate doses may be S2p determined using the assays described herein. When one or more of these Small molecules is to be administered to an Sercarboxy animal (e.g., a human) in order to modulate expression or Subtilase activity of a polypeptide or nucleic acid of the invention, a t1a physician, Veterinarian, or researcher may, for example, thimet prescribe a relatively low dose at first, Subsequently increas trypsin ing the dose until an appropriate response is obtained. In addition, it is understood that the Specific dose level for any uch1 particular animal Subject will depend upon a variety of luch2 factors including the activity of the Specific compound Zincarboxy employed, the age, body weight, general health, gender, and Znprotease diet of the Subject, the time of administration, the route of 15 administration, the rate of excretion, any drug combination, HUNTSUMMARIES and the degree of expression or activity to be modulated. Hunt Summary: Ace (angiotensin-converting enzyme) This invention may be embodied in many different forms Profiles/HMMs: 1 and should not be construed as limited to the embodiments TBLASTN: 2 Set forth herein; rather, these embodiments are provided So that this disclosure will fully convey the invention to those skilled in the art. HMM Probes for Angiotensin-converting Enzyme Many modifications and other embodiments of the inven Name: Peptidase M2 tion will come to mind in one skilled in the art to which this Acc: PFO1401 invention pertains having the benefit of the teachings pre 25 Description: Angiotensin-converting Enzyme Sented in the foregoing description. Although specific terms Accession PFO1401 are employed, they are used as in the art unless otherwise indicated. ID Peptidase M2 EXAMPLES AC PFO1401 Angiotensin-converting enzyme AS discussed above, the inventors identified human genes encoding members of protease families based on a specific AU Bateman A, Coates D consensus motif or protein domain that characterizes the AL Clustalw manual SE Swiss-Prot family. A search (hunt) of the nucleic acid Sequence database 35 was performed using one or more HMM (Hidden Markov GA-45-45 Model) motifs or a TRANS BLASTN set or both. TC 43.70 43.70 The database was derived from random cDNA sequencing NC-2918O-2918O or by contigging nucleic acid fragments in pre-existing BM hmmbuild HMM SEED databases. 40 BM hmmcalibrate-seed O HMM Hunts Included: RN 1 aCC RM 98.2928.10 aspartyl RT Toward a role for angiotensin-converting enzyme in astacin 45 insects. c15 RA Isaac R E, Schoofs L., Williams T A, Corvol P. CaaX Veelaert D, calpain RA Sajid M, Coates D; caspase RL Ann. NY Acad. Sci. (1998) 839:288–292. cathepsin 50 RN 2) gamma-glutamyl RM 95405266 hemoglobinase RT Peptidyl dipeptidase A: angiotensin I-converting insulinase enzyme. lon 55 RA Corvol P, Williams TA, Soubrier F; m17 RL Methods Enzymol. (1995). 248:283-305. m18 DC This is family M2 in the peptidase classification. MAP DR MEROPS; M2; DC This Prosite motif covers only the active site. matrixin 60 DR PROSITE, PDOCO0129; neprily Sin CC Members of this family are dipeptidyl carboxydipep pCnp tidases peptidase-m20b CC (cleave carboxyl dipeptides) and most notably convert picornain3c 65 CC angiotensin I to angiotensin II. prolyloligo CC Many members of this family contain a tandem proX duplication US 6,395,889 B1 63 64 CC of the 600 amino acid peptidase domain, both of these RN 1 C RM 95405261 CC catalytically active. Most members are Secreted mem RT Evolutionary families of metallopeptidases. brane RA Rawlings N D, Barrett AJ; CC bound ectoenzymes. RL Meth. Enzymol. (1995) 248:183-228. SO 20 RN 2) RM 95.38.4761 Accession PDOCOO129 RT The NMR structure of the inhibited catalytic domain of RT human stromelysin-1. TBLASTN Probes for Angiotensin-converting Enzyme RA Gooley P R, O'Connell J F, Marcy AI, Cuca G C, DbAcc: SP:P22966 Salowe S P Bush Species: Homo Sapiens 15 RABL, Hermes JD, Esser CK, Hagmann WK, Springer Description: angiotensin-converting enzyme precursor, J. P. et al; testis-specific (EC 3.4.15.1) (ACE-T) (dipeptidyl car boxypeptidase I) (kininase II). RL Nat. Struct. Biol. (1994) 1:111-118. DbAcc: SP:P12821 DC This is family M10 in the peptidase classification. Species: Homo Sapiens DR MEROPS; M10; Description: angiotensin-converting enzyme precursor, DC This Prosite motif covers only the active site. Somatic (EC 3.4.15.1) (ACE) (dipeptidyl carboxypep DR PROSITE, PDOCOO472; tidase I) (kininase II). DR PRINTS, PRO0138; DR SCOP; 2srt; fa; Hunt Summary: Hemoglobinase 25 DC The following Pfam-B families contain sequences Profiles/HMMS: O that according to Prodom TBLASTN: 2 DC are members of this Pfam-A family. DR PFAMB, PB035983; HMM Probes for Hemoglobinase DR PFAMB; PB036528; None. CC The members of this family are enzymes that cleave peptides. TBLASTN Probes for Hemoglobinase CC These proteaseS require Zinc for catalysis. DbAcc: GP:YO7596 Species: Homo Sapiens 35 SO 99 Description: GPI transamidase Accession PDOCOO)472 bAcc: SP:O99538 Species: Homo Sapiens Description: Legumain Precursor (EC 3.4.22.34) 40 (asparaginyl endopeptidase). TBLASTN Probes for Matrix Metalloproteases DbAcc: SP:PO3956 Hunt Summary: Matrixin (matrix metalloproteases) Species: Homo Sapiens Profiles/HMMs 1 Description: interstitial collagenase precursor (EC 45 3.4.24.7) (-1) (MMP-1) TBLASTN: 27 (fibroblast collagenase). DbAcc: SP:PO8253 HMM Probes for Matrix Metalloproteases Species: Homo Sapiens Name: Peptidase M10 Description: 72 KD type IV collagenase precursor (EC Acc: PFOO413 50 3.4.24.24) (72 KD gelatinase) (matrix Description: Matrixin metalloproteinase-2) (MMP-2) (gelatinase A) (TBE-1). Accession PFOO413 DbAcc: SP:PO8254 Species: Homo Sapiens ID Peptidase M10 Description: stromelysin-1 precursor (EC 3.4.24.17) AC PFOO413 55 (matrix metalloproteinase-3) (MMP-3) (TRANSIN-1) Matrixin (SL-1). PI matrixin; DbAcc: SP:P22894 Species: Homo Sapiens AU Bateman A, Finn R D Description: neutrophil collagenase precursor (EC AL Clustalw 60 3.4.24.34) (matrix metalloproteinase-8) (MMP-8) SE ProSite (PMNL collagenase) (PMNL-CL). GA-50-50 DbAcc: SP:P1478O TC 49.10-49.10 Species: Homo Sapiens NC-55.50-55.50 65 Description: 92 KD type IV collagenase precursor (EC BM hmmbuild HMM SEED 3.4.24.35) (92. KD gelatinase) (matrix BM hmmcalibrate-seed O HMM metalloproteinase-9) (MMP-9) (gelatinase B). US 6,395,889 B1 65 66 DbAcc: SP:P39900 DbAcc: GP:AFO)40648 Species: Homo Sapiens Species: Caenorhabditis elegans Description: macrophage metalloelastase precursor (EC Description: H19M22.3 contains similarity to the M10B 3.4.24.65) (HME) (matrix metalloproteinase-12) peptidase family (MMP-12). DbAcc: GP:UOOO38 DbAcc: SP:P5O281 Species: Caenorhabditis elegans Species: Homo Sapiens Description: T21D11.1 similar to hatching enzyme pre Description: matrix metalloproteinase-14 precursor (EC cursor and other Zinc metalloproteinases 3.4.24-) (MMP-14) (membrane-type matrix metallo DbAcc: GP:AFOO3534 proteinase 1) (MT-MMP 1) (MTMMP1). Species: Chilo iridescent virus DbAcc: SP:P51511 Description: putative metallopeptidase Species: Homo Sapiens DbAcc: GP:ACOO2396 Description: matrix metalloproteinase-15 precursor (EC Species: Arabidopsis thaliana 3.4.24-) (MMP-15) (membrane-type matrix metallo 15 Description: F3I6.6 similar to zinc metalloproteinases proteinase 2) (MT-MMP 2) (MTMMP 2). DbAcc: GP:AFO63866 DbAcc: SP:P51512 Species: Melanoplus Sanguinipes entomopoxvirus Species: Homo Sapiens Description: matrix metalloproteinase-16 precursor (EC Description: MSV179 3.4.24-) (MMP-16) (membrane-type matrix metallo DbAcc: SP:P16317 proteinase 3) (MT-MMP3) (MTMMP3) (MMP-X2). Species: Erwinia chrysanthemi DbAcc: SP:PO9238 Description: Secreted protease C precursor (EC 3.4.24.-) Species: Homo Sapiens (PROC). DbAcc: GP:Z78414 Description: stromelysin-2 precursor (EC 3.4.24.22) 25 (matrix metalloproteinase-10) (MMP-10) (TRANSIN Species: Caenorhabditis elegans 2) (SL-2). Description: 1350129 W09D12.1 predicted using Gen DbAcc: SP:P24347 efinder Species: Homo Sapiens Description: stromelysin-3 precursor (EC 3.4.24.-) Hunt Summary: pcnp (procollagen N-proteinase) (matrix metalloproteinase-11) (MMP-11) (ST3) (SL Profiles/HMMS: O 3). TBLASTN: 1 DbAcc: SP:P45452 Species: Homo Sapiens 35 HMM Probes for procollagen N-proteinase Description: collagenase 3 precursor (EC 3.4.24.-) None. (matrix metalloproteinase-13) (MMP-13). DbAcc: SP:PO9237 TBLASTN Probes for Procollagen N-proteinase Species: Homo Sapiens DbAcc: GP:AJOO3125 Description: matrilysin precursor (EC 3.4.24.23) 40 Species: Homo Sapiens (PUMP-1 protease) (uterine metalloproteinase) (matrix Description: 3928.000 hPCPNI metalloproteinase-7) (MMP-7) (MATRIN). DbAcc: GP:X92521 Hunt Summary: Reprolysin (ADAM family of Species: Homo Sapiens metalloprotease) Description: 173 1986 MMP-19 (matrix 45 metalloproteinase) Profiles/HMMs: 1 TBLASTN: 24 DbAcc: GP:X89576 Species: Homo Sapiens HMM Probes for ADAM Family of Metalloprotease Description: 1490312 MT4-MMP 50 Name: Reprolysin DbAcc: GP:AB010960 Acc: PFO1421 Species: Rattus rattuS Description: Reprolysin (M12B) family Zinc metallopro Description: 2879941 MIFR teaSe DbAcc: GP:AJOO3144 Accession PFO1421 Species: Homo Sapiens 55 Description: 2808655 MMP20 DbAcc: GP:ABOO7815 ID Reprolysin Species: Caenorhabditis elegans AC PFO1421 Description: 3152402 matrix metalloproteinase Reprolysin (M12B) Family Zinc Metalloprotease DbAcc: GP:ABOOT817 60 AU Bateman A Species: Caenorhabditis elegans AL Clustalw Description: 31524.06 matrix metalloproteinase SE Swissprot DbAcc: GP:Z95309 GA -100-100 Species: Caenorhabditis elegans 65 TC -96.50 -96.50 Description: 3878095 H36L18.1. Similarity to human NC -11180-1118O matrix metalloprotease I (MMP1) BM hmmbuild HMM SEED US 6,395,889 B1 67 68 BM hmmcalibrate-seed O HMM Species: Homo Sapiens RN 1 Description: cell surface antigen MS2 precursor (EC RM 95405261 3.4.24-) (CD156 antigen). RT Evolutionary families of metallopeptidases. DbAcc: GP:AFO 19887 RA Rawlings N D, Barrett AJ; Species: MuS musculus RL Meth. Enzymol. (1995)248:183-228. Description: metalloprotease- meltrin beta DbAcc: GP:U22058 DC This Prosite motif covers only the active site. Species: MuS musculus DR PROSITE; PDOC00129; 1O Description: ADAM 4 protein precursor DR SCOP; last; fa; DbAcc: GP:AFO29899 DC This is family M12B in the peptidase classification. Species: Homo Sapiens DR MEROPS; M12; Description: ADAM 20 DC The following Pfam-B families contain sequences DbAcc: GP:AFOO61.96 that according to Prodom 15 Species: MuS musculus DC are members of this Pfam-A family. Description: metalloprotease-disintegrin MDC15 DR PFAMB; PB016125; DbAcc: GP:AJO 12603 DR PFAMB; PB031186; Species: Rattus rattuS DR PFAMB; PB035940; Description: 1340857 TNF-alpha converting enzyme DR PFAMB, PB036263; (TACE) CC The members of this family are enzymes that cleave DbAcc: GP:AB009672 peptides. Species: Homo Sapiens Description: 3419878 MDC3 CCThese proteases require zinc for catalysis. Members of 25 this DbAcc: GP:X77619 CC family are also known as adamalysins. Species: Macaca macaca CC Most members of this family are Snake venom Description: tMDC II , DbAcc: GP:Y13323 CC but there are also Some mammalian proteins Such as Species: Homo Sapiens Swiss: P78325, Description: 23701.07 disintegrin-protease CC and fertilin Swiss:Q28472. Fertilin and closely related DbAcc: GP:U681.85 CC proteins appear to not have Some active site residues Species: Caenorhabditis elegans Description: adm-1 domain organization is identical to and 35 CC may not be active enzymes. PH-30/fertilin DbAcc: GP:U42846 SO 115 Species: Caenorhabditis elegans Description: T19D2.1 coded for by C. elegans cDNA Accession PDOCOO129 40 yk126d 9.3 DbAcc: GP:Z81460 Species: Caenorhabditis elegans TBLASTN Probes for ADAM Family of Metalloprotease Description: 3873969 CO4A11.4 Similarity to mouse DbAcc: SP:OD5910 meltrin alpha protein Species: MuS musculus 45 DbAcc: SP:O13766 Description: cell surface antigen MS2 precursor (EC Species: Schizosaccharomyces pombe 3.4.24.-) (macrophage cysteine-rich glycoprotein). Description: probable zinc metallopeptidase C17A5.04C DbAcc: GP:U41767 precursor (EC 3.4.24.-). Species: Homo Sapiens DbAcc: GP:Y1O615 Description: metargidin precursor 50 Species: Homo Sapiens DbAcc: GP:AFOO9615 Description: 2198473 CYRN2 member of ADAM family Species: Homo Sapiens DbAcc: GP:AJO10688 Description: disintegrin-metalloprotease MADM Species: MuS musculus DbAcc: GP:AJ242O15 55 Description: 1320044 CYRN1 Species: Homo Sapiens DbAcc: GP:AFO32383 Description: 4757044 eMDC II protein Species: Xenopus laevis DbAcc: GP:D31872 Description: MDC 11 b Species: Homo Sapiens DbAcc: GP:UT81.85 Description: 836684 metalloprotease/disintegrin-like pro 60 Species: Xenopus laevis tein Description: XMDC 16 DbAcc: GP:D67O76 DbAcc: SP:P30403 Species: MuS musculus Species: Agkistrodon rhodostoma Description: 1813340 secretory protein containing throm 65 Description: hemorrhagic protein-rhodostomin precursor bospondin (EC 3.4.24.-) (RHO) contains: disintegrin DbAcc: SP:P78325 rhodostomin. US 6,395,889 B1 69 70 DbAcc: GP:ACOO5162 Hunt Summary: Sercarboxy (serine carboxypeptidases) Species: Homo Sapiens Profiles/HMMs: 1 Description: probable carboxypeptidase precursor TBLASTN: 9 DbAcc: GP:ACOO6929 Species: Arabidopsis thaliana HMM Probes for Serine Carboxypeptidases Description: T1E2.16 Name: Serine carbpept Acc: PFOO450 Hunt Summary: t1a (t1a (proteasome-a) proteases) Description: Serine Carboxypeptidase Profiles/HMMs: 1 Accession PFOO450 TBLASTN: 23

ID Serine carbpept HMM Probes for t1a (proteasome-a) proteases AC PFOO450 15 Name: proteasome Acc: PFOO227 Serine carboxypeptidase Description: Proteasome A-type and B-type AU Finn RD Accession PFOO227 AL Clustalw SE ProSite ID proteasome GA 25 O AC PFOO227 TC 69.10 4.50 Proteasome A-type and B-type NC 610 6.10 AU Finn RD BM hmmbuild -f HMM SEED 25 AL Clustalw BM hmmcalibrate-seed OHMM SE ProSite DR PROSITE; PDOCO0122; GA 9.6 9.6 DR PRINTS; PRO0724; TC 1340 13.40 DR SCOP; 1 whs; fa; NC 8.108.10 SO 89 BM hmmbuild -f HMM SEED BM hmmcalibrate-seed O HMM Accession PDOCOO 122 DR SCOP; 1pma; fa; 35 DR PROSITE, PDOCO0326; TBLASTN Probes for Serine Carboxypeptidases DR PROSITE, PDOCO0668; DbAcc: SP:P10619 DR PRINTS; PRO0141; DC The following Pfam-B families contain sequences Species: Homo Sapiens that according to Prodom Description: lysosomal protective protein precursor (EC 40 3.4.16.5) (cathepsin A) (carboxypeptidase C). DC are members of this Pfam-A family. DbAcc: SP:PO962O DR PFAMB; PB005708; Species: Saccharomyces cerevisiae DR PFAMB; PB035721; Description: carboxypeptidase KEX1 precursor (EC SO 206 3.4.16.6) (carboxypeptidase D). 45 DbAcc: SP:P38109 Accession PDOCOO326 Species: Saccharomyces cerevisiae Description: putative Serine carboxypeptidase in ESR1 IRA1 intergenic region (EC 3.4.16.-). 50 Accession PDOCOO668 DbAcc: SP:P52714 Species: Caenorhabditis elegans Description: putative serine carboxypeptidase C08H9.1 TBLASTN Probes for t1a (proteasome-a) proteases (EC 3.4.16.-). DbAcc: SP:P4972O DbAcc: SP:P52715 55 Species: Homo Sapiens Species: Caenorhabditis elegans Description: proteasome theta chain (EC 3.4.99.46) Description: putative serine carboxypeptidase F13S12.6 (macropain theta chain) (multicatalytic endopeptidase precursor (EC 3.4.16.-). complex theta chain) (prote a Some chain 13) DbAcc: SP:P52717 60 (proteasome component C10-II). Species: Caenorhabditis elegans DbAcc: SP:P18421 Description: putative serine carboxypeptidase F41C3.5 Species: Rattus rattuS precursor (EC 3.4.16.-). Description: proteasome component C5 (EC 3.4.99.46) DbAcc: SP:O09991 (macropain Subunit C5) (proteasome gamma chain) Species: Caenorhabditis elegans 65 (multicatalytic endopeptidase complex subunit C5). Description: putative serine carboxypeptidase K10B2.2 DbAcc: SP:P1842O precursor (EC 3.4.16.-). Species: Rattus rattuS US 6,395,889 B1 71 72 Description: proteasome component C2 (EC 3.4.99.46) YSCE subunit PUP1) (multicatalytic endopeptidase (macropain Subunit C2) (proteasome nu chain) complex subunit PUP1). (multicatalytic endopeptidase complex subunit C2). DbAcc: SP:P28077 DbAcc: SP:P48004 Species: Rattus rattuS Species: Rattus rattuS Description: proteasome chain 7 precursor (EC 3.4.99.46) Description: proteasome subunit RC6-1 (EC 3.4.99.46) (macropain chain 7) (multicatalytic endopeptidase (multicatalytic endopeptidase complex Subunit RC6-1). complex chain 7) (ring 12 protein). DbAcc: SP:P28062 DbAcc: SP:O58634 Species: Homo Sapiens Species: MethanOCOccus jannaschii Description: proteasome component C13 precursor (EC Description: proteasome, beta subunit (EC 3.4.99.46) 3.4.99.46) (macropain subunit C13) (multicatalytic (multicatalytic endopeptidase complex beta Subunit). endopeptidase complex Subunit C13). DbAcc: GP:Y13175 DbAcc: SP:P25789 Species: Arabidopsis thaliana Species: Homo Sapiens 15 Description: 2511572 prega yeast counterpart beta4 Sc Description: proteasome component C9 (EC 3.4.99.46) DbAcc: SP:P30657 (macropain Subunit C9) (multicatalytic endopeptidase Species: Saccharomyces cerevisiae complex subunit C9). Description: proteasome component pre4 (EC 3.4.99.46) DbAcc: SP:P49721 (macropain subunit PRE4) (proteinase YSCE subunit Species: Homo Sapiens PRE4) (multicatalytic endopeptidase complex subunit Description: proteasome component C7-I(EC 3.4.99.46) PRE4). (macropain Subunit C7-I) (multicatalytic endopepti DbAcc: SP:P38624 dase complex subunit C7-I). Species: Saccharomyces cerevisiae DbAcc: SP:P2807O 25 Description: proteasome component PRE3 precursor (EC Species: Homo Sapiens 3.4.99.46) (macropain subunit PRE3) (proteinase Description: proteasome beta chain precursor (EC YSCE subunit PRE3) (multicatalytic endopeptidase 3.4.99.46) (macropain beta chain) (multicatalytic complex subunit PRE3). endopeptidase complex beta chain) (proteasome chain DbAcc: SP:O09720 3) (HSN3) (HSBPROS26). Species: Schizosaccharomyces pombe DbAcc: SP:P34062 Description: putative proteasome component C7-I/C11 Species: Rattus rattuS (EC 3.4.99.46) (macropain subunit) (multicatalytic Description: proteasome iota chain (EC 3.4.99.46) endopeptidase complex Subunit). (macropain iota chain) (multicatalytic endopeptidase DbAcc: SP:P31059 complex iota chain) (27 KD prosomal protein) (PROS 35 Species: Escherichia coli 27) (P27K). DbAcc: SP:P28066 Description: heat shock protein HSLV (EC 3.4.99.-). Species: Homo Sapiens DbAcc: GP:U80442 Description: proteasome Zeta chain (EC 3.4.99.46) Species: Caenorhabditis elegans (macropain Zeta chain) (multicatalytic endopeptidase 40 Description: T20F5.2 coded for by C. elegans cDNA complex Zeta chain). CEESCT1F. DbAcc: SP:P49722 Species: MuS musculus Hunt Summary: Trypsin (trypsin-like Serine proteases) Description: proteasome component C3 (EC 3.4.99.46) 45 Profiles/HMMs: 1 (macropain Subunit C3) (multicatalytic endopeptidase TBLASTN: 69 complex subunit C3). DbAcc: SP:P22141 HMM Probes for Trypsin-like Serine Proteases Species: Saccharomyces cerevisiae Name: trypsin Description: proteasome component C11 (EC 3.4.99.46) 50 Acc: PFOOO89 (macropain Subunit C11) (proteinase ySce Subunit 11) (multicatalytic endopeptidase complex subunit C11). Description: Trypsin DbAcc: SP:O23715 Accession PF00089 Species: Arabidopsis thaliana Description: proteasome component C8 (EC 3.4.99.46) 55 ID trypsin (macropain Subunit C8) (multicatalytic endopeptidase AC PFOOO89 complex subunit C8). Trypsin DbAcc: SP:P23724 AU Lutfiyya L. L., Sonnhammer E L L Species: Saccharomyces cerevisiae Description: potential proteasome component C5 (EC 60 AL Clustalw 3.4.99.46) (multicatalytic endopeptidase complex sub SE SCOP and ProSite unit C5). GA-35-35 DbAcc: SP:P25043 TC-34-20-342O Species: Saccharomyces cerevisiae 65 NC 35.6O-35.60 Description: proteasome component PUP1 precursor (EC BM hmmbuild HMM SEED 3.4.99.46) (macropain subunit PUP1) (proteinase BM hmmcalibrate-seedO HMM US 6,395,889 B1 73 74 RN 1 DbAcc: SP:P22891 RM 95147689 Species: Homo Sapiens RT Families of Serine Peptidases Description: -dependent protein Z. precursor. RA Rawlings N D, Barrett AJ; DbAcc: SP:P10323 Species: Homo Sapiens RL Meth. Enzymol. (1994) 244:19-61. Description: acrosin precursor (EC 3.4.21.10). RN 2 DbAcc: SP:POO736 RM 87292123 Species: Homo Sapiens RTThe Three Dimensional Structure of Asn102 Trypsin: Description: complement C1R component precursor (EC Role of 3.4.21.41). RT Asp102 in Serine Protease Catalysis DbAcc: SP:P2O160 RA Sprang S, Standing T, Fletterick R J, Stroud R M, Species: Homo Sapiens Finer-Moore Description: azurocidin precursor (cationic antimicrobial RAJ, Xuong NH, Hamlin R, Rutter WJ, Craik C S; 15 protein CAP37) (heparin-binding protein) (HBP). RL Science (1987) 237:905–909. DbAcc: SP:PO8311 DR PROSITE; PDOCOO124; Species: Homo Sapiens DR SCOP; 1 mct; fa; Description: cathepsin G precursor (EC 3.4.21.20). DC The following Pfam-B families contain sequences DbAcc: SP:POO751 that according to Prodom Species: Homo Sapiens Description: precursor (EC DC are members of this Pfam-A family. 3.4.21.47) (C3/C5 convertase) (properdin factor B) DR PFAMB; PB001787; (glycine-rich beta glycoprotein) (GBG) (PBF2). DR PFAMB; PB001799; 25 DbAcc: SP:PO51.56 DR PFAMB; PB031543; Species: Homo Sapiens DR PFAMB; PB036276; Description: precursor (EC DR PFAMB; PB036371; 3.4.21.45) (C3B/C4B inactivator). DR PFAMB; PB036499; DbAcc: SP:PO6681 DR PFAMB; PB036558; Species: Homo Sapiens CC Proteins recognized include all proteins in families Description: complement C2 precursor (EC 3.4.21.43) S1, S2A, S2B, (C3/C5 convertase). CC S2C, and S5 in the classification of peptidases. DbAcc: SP:P4874O Species: Homo Sapiens CC Also included are proteins that are clearly members, 35 but that lack Description: complement-activating component of CC peptidase activity, Such as haptoglobin and protein Z RA-reactive factor precursor (EC 3.4.21.-) (RA (PRTZ'). reactive factor serine protease P100) (RARF) SO 696 (mannose-binding protein associated Serine protease) 40 (MASP). DbAcc: SP:P17538 Accession PDOCOO 124 Species: Homo Sapiens Description: chymotrypsinogen B precursor (EC 3.4.21.1). TBLASTN Probes for Trypsin-like Serine Proteases 45 DbAcc: SP:P13582 DbAcc: SP:PO9871 Species: Drosophila melanogaster Species: Homo Sapiens Description: Serine protease easter precursor (EC Description: complement C1S component precursor (EC 3.4.21.-). 3.4.21.42) (C1 esterase). DbAcc: SP:PO8217 DbAcc: SP:POO742 50 Species: Homo Sapiens Species: Homo Sapiens Description: elastase 2A precursor (EC 3.4.21.71). Description: coagulation precursor (EC 3.4.21.6) DbAcc: SP:PO8861 (Stuart factor). Species: Homo Sapiens DbAcc: SP:POO748 Description: elastase IIIB precursor (EC 3.4.21.70) Species: Homo Sapiens 55 (protease E). Description: coagulation factor XII precursor (EC DbAcc: SP:P12544 3.4.21.38) (Hageman factor) (HAF). Species: Homo Sapiens DbAcc: SP:PO8709 Description: a precursor (EC 3.4.21.78) Species: Homo Sapiens 60 (cytotoxic T-lymphocyte proteinase 1) (hanukkah Description: coagulation factor VII precursor (EC factor) (H factor) (HF) (granzyme 1) (CTL ) 3.4.21.21). (fragmentin 1). DbAcc: SP:P33587 DbAcc: SP:PO8883 Species: MuS musculus Species: MuS musculus Description: Vitamin-K dependent protein C precursor 65 Description: granzyme F precursor (EC 3.4.21.-) (EC 3.4.21.69) (autoprothrombin IIA) ( (cytotoxic cell protease 4) (CCP4) (CTL serine pro protein C). tease 3) (C134) (cytotoxic serine protease 3) (MCSP3). US 6,395,889 B1 75 76 DbAcc: SP:P51124 DbAcc: SP:PO7477 Species: Homo Sapiens Species: Homo Sapiens Description: granzyme M precursor (EC 3.4.21.-) (MET Description: 1 precursor (EC 3.4.21.4). ASE) (natural killer cell granular protease) (HU-MET DbAcc: SP:P35005 1) (MET-1 serine protease). Species: Drosophila melanogaster DbAcc: SP:PO5981 Description: trypsin epsilon precursor (EC 3.4.21.4). Species: Homo Sapiens DbAcc: SP:P52905 Description: Serine protease hepsin (EC 3.4.21.-). Species: Drosophila melanogaster DbAcc: SP:P26927 Species: Homo Sapiens Description: trypsin iota precursor (EC 3.4.21.4). Description: hepatocyte growth factor-like protein precur DbAcc: SP:P42278 sor (macrophage stimulatory protein) (MSP) Species: Drosophila melanogaster (macrophage stimulating protein). Description: trypsin theta precursor (EC 3.4.21.4). DbAcc: SP:POO737 15 DbAcc: SP:P42279 Species: Homo Sapiens Species: Drosophila melanogaster Description: haptoglobin-1 precursor. Description: trypsin eta precursor (EC 3.4.21.4). DbAcc: SP:P26262 DbAcc: SP:P42279 Species: MuS musculus Species: Drosophila melanogaster Description: precursor (EC 3.4.21.34) Description: trypsin eta precursor (EC 3.4.21.4). (plasma prekallikrein) (kininogenin) (Fletcher factor). DbAcc: SP:P1512O DbAcc: SP:P15948 Species: Gallus gallus Species: MuS musculus Description: urokinase-type pre Description: glandular kallikrein K22 precursor (EC cursor (EC 3.4.21.73) (UPA) (U-plasminogen 3.4.21.35) (tissue kallikrein) (MGK-22) (epidermal 25 activator). growth factor-binding protein type A) (EGF-BPA) DbAcc: GP:D49742 (nerve growth factor beta chain endopeptidase) (beta Species: Homo Sapiens NGF- endopeptidase). Description: HGF activator like protein DbAcc: SP:P21845 DbAcc: GP:S82666 Species: MuS musculus Species: Homo Sapiens Description: mast cell protease 6 precursor (EC 3.4.21.-) Description: Description: normal epithelial cell Specific (MMCP-6) (tryptase). gene DbAcc: SP:P98159 DbAcc: SP:O16651 Species: Drosophila melanogaster 35 Species: Homo Sapiens Description: Serine protease nudel precursor (EC Description: prostasin 3.4.21.-). DbAcc: GP:U47810 DbAcc: SP:POO747 Species: Homo Sapiens Species: MuS musculus 40 Description: complement factor I Description: plasminogen precursor (EC 3.4.21.7). DbAcc: GP:YO9926 DbAcc: SP:P241.58 Species: Homo Sapiens Species: Homo Sapiens Description: MASP-2 protein Description: myeloblastin precursor (EC 3.4.21.76) DbAcc: GP:Y13192 (leukocyte proteinase 3) (PR-3) (PR3) (AGP7) 45 (Wegener's autoantigen) (P29) (C-ANCA antigen). Species: MuS musculus DbAcc: SP:P498.62 Description: neurotrypsin Species: Homo Sapiens DbAcc: GP:AEOOO663 Description: Stratum corneum chymotryptic enzyme pre Species: MuS musculus cursoR (EC 3.4.21.-) (SCCE). 50 Description: trypsinogen 1 DbAcc: SP:P17205 DbAcc: SP:P17945 Species: Drosophila melanogaster Species: Rattus rattuS Description: Serine proteases 1 and 2 precursor (EC Description: hepatocyte growth factor precursor (Scatter 3.4.21.-). factor) (SF) (hepatopoeitin A). DbAcc: SP:PO5049 55 DbAcc: SP:P19637 Species: Drosophila melanogaster Species: Rattus rattuS Description: Serine protease Snake precursor (EC Description: tissue plasminogen activator precursor (EC 3.4.21.-). 3.4.21.68) (TPA) (T-plasminogen activator). DbAcc: SP:O05319 DbAcc: SP:P97435 Species: Drosophila melanogaster 60 Species: MuS musculus Description: serine proteinase stubble (EC 3.4.21.-) Description: enteropeptidase (EC 3.4.21.9) (stubble-stubbloid protein). (enterokinase). DbAcc: SP:POO734 DbAcc: SP:P49864 Species: Homo Sapiens 65 Species: Rattus rattuS Description: prothrombin precursor (EC 3.4.21.5) Description: granzyme K precursor (EC 3.4.21.-) (NK (coagulation factor II). tryptase-2) (NK-TRYP-2). US 6,395,889 B1 77 78 DbAcc: GP:AFO)42822 Species: MuS musculus HMM Probes for Ubiquitin Carboxyl-terminal Hydrolases Description: epithin Family 2 DbAcc: GP:AFO)42822 Name: UCH-1 Species: MuS musculus Acc: PFOO442 Description: epithin Description: Ubiquitin Carboxyl-terminal Hydrolases DbAcc: SP:P27435 Family 2 Species: Rattus rattuS Accession PF00442 Description: mast cell protease 7 precursor (EC 3.4.21.-) (RMCP-7) (tryptase, skin). ID UCH-1 DbAcc: SP:P32O38 AC PFOO442 Species: Rattus rattuS Ubiquitin Carboxyl-terminal Hydrolases Family 2 Description: complement precursor (EC 3.4.21.46) (C3 convertase activator) (properdin factor 15 AU Finn R D D) (adipsin) (endogenous vascular elastase). AL Clustalw DbAcc: GP:ABOO2134 SE ProSite Species: Homo Sapiens GA 15 15 Description: 3184184 airway trypsin-like protease TC 17.30 17.30 DbAcc: GP:ACOO3965 NC 1480 14.80 Species: Homo Sapiens BM hmmbuild HMM SEED Description: SP001LA BM hmmcalibrate-seedO HMM DbAcc: GP:UT6256 DR PROSITE, PDOCO0750; Species: SuS Scrofa 25 DC The following Pfam-B family may contain sequences Description: enamel matrix Serine proteinase 1 precursor that according to Prodom DbAcc: GP:X75363 DC are members of this Pfam-A family. Species: Homo Sapiens DR PFAMB; PB000592; Description: Serine protease homologue SO 81 DbAcc: SP:P39790 Species: Bacillus Subtilis Accession PDOCOO750 Description: extracellular metalloprotease precursor (EC 3.4.21.-). 35 Name: UCH-2 DbAcc: GP:AFO56311 Species: Drosophila melanogaster Acc: PFOO443 Description: gol Description: Ubiquitin Carboxyl-terminal Hydrolases DbAcc: GP:U29380 Family 2 Accession PFOO443 Species: Caenorhabditis elegans 40 Description: ZK546.15 similar to plasminogen and to trypsin-like Serine ID UCH-2 DbAcc: GP:AL031025 AC PFOO443 Species: Drosophila melanogaster Ubiquitin carboxyl-terminal hydrolases family 2 45 Description: 1313456 EG:9D2.41 AU Finn R D DbAcc: GP:U31961 AL Clustalw Species: Drosophila melanogaster SE ProSite Description: Serine protease-like protein GA 25 25 DbAcc: GP:UTO848 50 Species: Caenorhabditis elegans TC 27.10 27.10 Description: C43G2.5 similar to peptidase family S1 NC 21.80 24.00 DbAcc: GP:AB026735 BM hmmbuild -f HMM SEED Species: Bombyx mori BM hmmcalibrate-seedO HMM Description: 4760786 30kP protease A (43k peptide) 55 DR PROSITE, PDOC00750; precursor SO 78 DbAcc: SP:PO4.188 Species: StaphylococcuS aureuS Accession PDOCOO750 Description: glutamyl endopeptidase precursor (EC 3.4.21.19) (staphylococcal serine proteinase) (V8 60 proteinase) (endoproteinase GLU-C). TBLASTN Probes for Ubiquitin Carboxyl-terminal Hydro lases Family 2 Hunt Summary: uch2 (Ubiquitin carboxyl-terminal hydro DbAcc: SP:P40818 lases family 2) 65 Species: Homo Sapiens ProfileSHMMS: 2 Description: probable ubiquitin carboxyl-terminal hydro TBLASTN: 33 lase (EC 3.1.2.15) (ubiquitin thiolesterase) (ubiquitin US 6,395,889 B1 79 80 Specific processing protease) (deubiquitinating DbAcc: SP:O01476 enzyme) (KIAA0055). Species: Saccharomyces cerevisiae DbAcc: SP:P56399 Description: ubiquitin carboxyl-terminal hydrolase 2 (EC Species: MuS musculus 3.1.2.15) (ubiquitin thiolesterase 2) (ubiquitin-specific Description: ubiquitin carboxyl-terminal hydrolase T (EC processing protease 2) (deubiquitinating enzyme 2). 3.1.2.15) (ubiquitin thiolesterase T) (ubiquitin-specific DbAcc: SP:O24574 processing protease T) (deubiquitinating enzyme T) Species: Drosophila melanogaster (isopeptidase T). Description: ubiquitin carboxyl-terminal hydrolase 64e DbAcc: SP:P54578 (EC 3.1.2.15) (ubiquitin thiolesterase 64E) (ubiquitin Species: Homo Sapiens Specific processing protease 64E) (deubiquitinating Description: queuine trna-ribosyltransferase (EC enzyme 64E). 2.4.2.29) (TRNA-guanine transglycosylase) (guanine DbAcc: GP:AF 126736 insertion enzyme). Species: Homo Sapiens DbAcc: SP:O93008 15 Description: Ubp-M deubiquitinating enzyme homolog Species: Homo Sapiens DbAcc: SP:P34547 Description: probable ubiquitin carboxyl-terminal hydro Species: Caenorhabditis elegans lase FAF-X (EC 3.1.2.15) (ubiquitin thiolesterase FAF Description: probable ubiquitin carboxyl-terminal hydro X) (ubiquitin-specific processing protease FAF-X) lase ROEI 1.3 (EC 3.1.2.15) (ubiquitin thiolesterase) (deubiquitinating enzyme FAF-X) (FATFACETS pro (ubiquitin-Specific processing prote a Se) tein related, X-linked). (deubiquitinating enzyme). Attomey Docket No. 5800-55 DbAcc: SP:P39538 DbAcc: SP:O93009 Species: Saccharomyces cerevisiae Species: Homo Sapiens 25 Description: ubiquitin carboxyl-terminal hydrolase 12 Description: probable ubiquitin carboxyl-terminal hydro (EC 3.1.2.15) (ubiquitin thiolesterase 12) (ubiquitin lase HAUSP (EC 3.1.2.15) (ubiquitin thiolesterase Specific processing protease 12) (deubiquitinating HAUSP) (ubiquitin-specific processing protease enzyme 12). HAUSP) (deubiquitinating enzyme HAUSP) DbAcc: SP:P39944 (herpesvirus associated ubiquitin-specific protease). Species: Saccharomyces cerevisiae DbAcc: SP:O14694 Description: ubiquitin carboxyl-terminal hydrolase 5 (EC Species: Homo Sapiens 3.1.2.15) (ubiquitin thiolesterase 5) (ubiquitin-specific Description: probable ubiquitin carboxyl-terminal hydro processing protease 5) (deubiquitinating enzyme 5). lase (EC 3.1.2.15) (ubiquitin thiolesterase) (ubiquitin DbAcc: SP:P39967 Specific processing protease) (deubiquitinating 35 Species: Saccharomyces cerevisiae enzyme) (KIAAO190). Description: ubiquitin carboxyl-terminal hydrolase 9 (EC DbAcc: SP:O61068 3.1.2.15) (ubiquitin thiolesterase 9) (ubiquitin-specific Species: MuS musculus processing protease 9) (deubiquitinating enzyme 9). Description: ubiquitin carboxyl-terminal hydrolase DbAcc: SP:P5O1 O2 DUB-1 (EC 3.1.2.15) (ubiquitin thiolesterase DUB-1) 40 Species: Saccharomyces cerevisiae (ubiquitin-specific processing protease DUB-1) Description: ubiquitin carboxyl-terminal hydrolase 8 (EC (deubiquitinating enzyme 1). 3.1.2.15) (ubiquitin thiolesterase 8) (ubiquitin-specific DbAcc: SP:O92353 processing protease 8) (deubiquitinating enzyme 8). Species: Schizosaccharomyces pombe 45 DbAcc: SP:P53874 Description: putative ubiquitin carboxyl-terminal hydro Species: Saccharomyces cerevisiae lase C6G9.08 (EC 3.1.2.15) (ubiquitin thiolesterase) Description: ubiquitin carboxyl-terminal hydrolase 10 (ubiquitin-Specific processing protease) (EC 3.1.2.15) (ubiquitin thiolesterase 10) (ubiquitin (deubiquitinating enzyme). Specific processing protease 10) (deubiquitinating DbAcc: GP:AFO795.64 50 enzyme 10). Species: Homo Sapiens DbAcc: SP:O09738 Description: UBP41 similar to Gallus gallus ubiquitin Species: Schizosaccharomyces pombe Specific Description: putative ubiquitin carboxyl-terminal hydro DbAcc: GP:ACOO3974 lase C 13 AI 1.04C (EC 3.1.2.15) (ubiquitin Species: Arabidopsis thaliana 55 thiolesterase) (ubiquitin-specific processing protease) Description: F24L7.8 (deubiquitinating enzyme). DbAcc: GP:AFO)40640 DbAcc: GP:ALO21838 Species: Caenorhabditis elegans Species: Schizosaccharomyces pombe Description: FO9D1.1 similar to peptidase family C19 Description: 1251102 SPBC6B1.06c SPBC6B1.06c, (ubiquitin 60 ubiquitin carboxyl-terminal hydrolase, DbAcc: SP:P43593 DbAcc: SP:P25037 Species: Saccharomyces cerevisiae Species: Saccharomyces cerevisiae Description: putative ubiquitin carboxyl-terminal hydro Description: ubiquitin carboxyl-terminal hydrolaSE 1 lase YFRO 1 OW (EC 3.1.2.15) (ubiquitin 65 (EC 3.1.2.15) (ubiquitin thiolesterase 1) (ubiquitin thiolesterase) (ubiquitin-specific processing protease) Specific processing protease 1) (deubiquitinating (deubiquitinating enzyme). enzyme 1) US 6,395,889 B1 81 82 DbAcc: SP:P36O26 TC 22.5O 640 Species: Saccharomyces cerevisiae NC 1830 18.30 Description: ubiquitin carboxyl-terminal hydrolase 11 BM hmmbuild -f HMM SEED (EC 3.1.2.15) (ubiquitin thiolesterase 11) (ubiquitin Specific processing protease 11) (deubiquitinating BM hmmcalibrate-seedOHMM enzyme DR PROSITE, PDOCO0128; DbAcc: SP:OO1477 DR PRINTS; PRO0792; Species: Saccharomyces cerevisiae DC The Prosite entry also includes Pfam:PF00077. Description: ubiquitin carboxyl-terminal hydrolase 3 (EC DR SCOP, 1mpp; fa; 3.1.2.15) (ubiquitin thiolesterase 3) (ubiquitin-specific DC The following Pfam-B families contain sequences processing protease 3) (deubiquitinating enzyme 3). that according to Prodom DbAcc: SP:O02863 DC are members of this Pfam-A family. Species: Saccharomyces cerevisiae DR PFAMB, PB000566; Description: ubiquitin carboxyl-terminal hydrolase 16 15 DR PFAMB; PB003851; (EC 3.1.2.15) (ubiquitin thiolesterase 16) (ubiquitin Specific processing protease 16) (deubiquitinating DR PFAMB; PB035770; enzyme 16). DR PFAMB; PB036097; DbAcc: SP:O09931 DR PFAMB; PB036332; Species: Caenorhabditis elegans CC Aspartyl (acid) proteases include pepsins, cathepsins, Description: probable ubiquitin carboxyl-terminal hydro and . lase K02C4.3 (EC 3.1.2.15) (ubiquitin thiolesterase) CC Two-domain Structure, probably arising from ances (ubiquitin-Specific processing protease) tral duplication. (deubiquitinating enzyme). 25 CC This family does not include the retroviral nor ret DbAcc: GP:Z81048 rotransposon Species: Caenorhabditis elegans CC proteases (Pfam:PF00077), which are much smaller Description: 1344704 F30A10.10 similarity to human and appear to ubiquitin carboxyl-terminal CC be homologous to a single domain of the eukaryotic DbAcc: GP:Z66512 asp proteases. Species: Caenorhabditis elegans SO 218 Description: 3877391 F52H3.3 951003 DbAcc: GP:U64844 Accession PDOCOO128 Species: Caenorhabditis elegans Description: T22F3.2 coded for by C. elegans cDNA 35 yk85c9.5 TBLASTN Probes for Aspartyl Proteases DbAcc: GP:U5O193 DbAcc: SP:POO790 Species: Caenorhabditis elegans Species: Homo Sapiens Description: ZK328.1 coded for by C. elegans cDNA 40 Description: pepsinogen A precursor (EC 3.4.23. 1). CEMSG95FB DbAcc3 SP:PO7339 DbAcc: GP:Z47812 Species: Homo Sapiens Species: Caenorhabditis elegans Description: precursor eC 3.4.23.5). Description: 1349023 T05H10.1 similar to ubiquitin DbAcc: SP:O05744 carboxyl-terminal hydrolase; 45 Species: Gallus gallus Description: cathepsin D precursor (EC 3.4.23.5). HUNTSUMMARIES DbAcc: SP:P32834 Hunt Summary: Aspartyl (aspartyl proteases) Species: Schizosaccharomyces pombe Profiles/HMMs: 1 50 Description: aspartic proteinase SXA1 precursor (EC TBLASTN: 15 3.4.23.-). DbAcc: SP:P40583 Species: Saccharomyces cerevisiae HMM Probes for Aspartyl Proteases Description: putative aspartyl proteinase YIR039C pre Name: asp 55 cursor (EC 3.4.23.-). Acc: PFOOO26 DbAcc: SP:P53379 Description: Eukaryotic ASpartyl Protease Species: Saccharomyces cerevisiae Accession PFOOO26 Description: aspartic proteinase MKC7 precursor (EC 3.4.23.-). 5 ID asp 60 DbAcc: GP:ACOO5851 AC PFOOO26 Species: Arabidopsis thaliana Eukaryotic Aspartyl Protease Description: F24D 13.17 AU Eddy S R DbAcc: SP:P12630 AL Clustalw 65 Species: Saccharomyces cerevisiae SE Overington enriched Description: barrierpepsin precursor (EC 3.4.23.35) GA 200 (extracellular barrier protein) (BAR proteinase). US 6,395,889 B1 83 84 DbAcc: SP:P28872 RAM A, Murcko M A, Chambers S. P. Aldape R A, Species: Candida albicans Raybuck S A, et al. Description: candidapepsin 1 precursor (EC 3.4.23.24) RL Nature (1994) 370:270–275. (aspartate protease 1) (ACP 1) (Secreted aspartic pro DC This is family C14 in the peptidase classification. tease 1). DR MEROPS: C14; DbAcc: SP:P43.096 Species: Candida albicans DR PROSITE PROFILE; PS50207; Description: candidapepsin 7 precursor (EC 3.4.23.24) DR PROSITE; PD0C00864; (aspartate protease 7) (ACP 7) (Secreted aspartic pro DR SCOP; lice; fa; tease 7). SO 48 DbAcc: SP:O42779 Species: Candida albicans Accession PDOCOO864 Description: candidapepsin 9 precursor (EC 3.4.23.24) (aspartate protease 9) (ACP9) (Secreted aspartic pro 15 Name: ICE p20 tease 9). Acc: PFOO656 DbAcc: SP:P41748 Description: ICE-like protease (caspase) p20 domain Species: Aspergillus fumigatus Accession PFOO656 Description: aspergillopepsin F precursor (EC 3.4.23.18). DbAcc: GP:U97OOO Species: Caenorhabditis elegans ID ICE p20 Description: F21F8.2 similar to eukaryotic aspartyl pro AC PFOO656 teaSeS ICE-like protease (caspase) p20 domain AU Bateman A DbAcc: GP:AFO16435 25 Species: Caenorhabditis elegans AL Clustalw Description: F59D6.3 similar to aspartyl protease SE ProSite DbAcc: GP:U97OOO GA 25 25 Species: Caenorhabditis elegans TC 46.1O 43.30 Description: F21F8.7 similar to eukaryotic aspartyl pro NC 7.00 7.00 teaSeS BM hmmbuild -f HMM SEED BM hmmcalibrate-seedO HMM Hunt Summary: Caspase (caspase family of apoptosis regu lating proteases) RN 1 35 RM 94.309732 Profiles/MS: 2 RT Structure and mechanism of interleukin-1 beta con TBLASTN: 16 Verting RT enzyme. HMM Probes for Caspase Family of Apoptosis Regulating 40 RA Wilson KP, Black J A, Thomson J A, Kim E. E., Proteases Griffith J. P. Navia Name: ICE PIO RAM A, Murcko M A, Chambers S. P. Aldape R A, Acc: PFOO655 Raybuck S A, et al. Description: ICE-like protease (caspase) p10 domain RL Nature (1994) 370:270–275. Accession PFOO655 45 DC This is family C14 in the peptidase classification. DR MEROPS: C14; IDICE p10 DR PROSITE PROFLE; PS50208; AC PFOO655 DR PROSITE, PDOCO0864; ICE-like protease (caspase) p10 domain 50 DR SCOP; lice; fa; AU Bateman A SO 58 AL pftools SE ProSite TBLASTN Probes for Caspase Family of Apoptosis Regu GA 25 25 lating Proteases 55 DbAcc: SP:P294.66 TC 26.40 26.40 Species: Homo Sapiens NC 7.30 7.30 Description: interleukin-1beta convertase precursor (IL BM hmmbuild -f HMM SEED 1BC) (EC 3.4.22.36) (IL-1 beta converting enzyme) BM hmmcalibrate -seedOHM (ICE) (interleukin-1 beta converting enzyme) (P45) RN 1 60 (caspase-1). RM 94.309732 DbAcc: SP:P49662 RT Structure and mechanism of interleukin-1 beta con Species: Homo Sapiens Verting Description: caspase-4 precursor (EC 3.4.22.-) (ICH-2 RT enzyme. 65 protease) (TX protease) (ICEREL-II). RA Wilson KP, Black J A, Thomson J A, Kim E. E., DbAcc: SP:P51878 Griffith JI), Navia Species: Homo Sapiens US 6,395,889 B1 85 86 Description: ICH-3 protease precursor (EC 3.4.22.-) (TY HMM Probes for Peptidase Family m17 protease) (ICEREL-III). Name: Peptidase M17 DbAcc: SP:P42575 Acc: PFOO883 Species: Homo Sapiens Description: Cytosol Aminopeptidase Family Description: caspase-2 precursor (EC 3.4.22.-) (ICH-1 protease) (ICH-1 L/1S). Accession PFOO883 DbAcc: SP:P55210 Species: Homo Sapiens ID Peptidase M17 Description: caspase-7 precursor (EC 3.4.22.-) (ICE-like AC PFOO883 apoptotic protease 3) (ICE-LAP3) (apoptotic protease Cytosol aminopeptidase family MCH-3) (CMH-1). AU Bateman A DbAcc: SP:P55211 AL Clustalw Species: Homo Sapiens SE Pfam-B 990 (release 3.0) Description: caspase-9 precursor (EC 3.4.22.-) (ICE-like 15 GA 10 10 apoptotic protease 6) (ICE-LAP6) (apoptotic protease TC 1160 1160 MCH-6). NC -260.90-260.90 DbAcc: SP:P55212 Species: Homo Sapiens BM hmmbuild-F HMM SEED Description: apoptotic protease MCH-2 precursor (EC BM hmmcalibrate-seed O HMM 3.4.22.-). DR SCOP; 11am; fa, DbAcc: SP:P42574 DR PROSITE, PDOCO0548; Species: Homo Sapiens DC This family belongs to family M17 of the peptidase Description: apopain precursor (EC 3.4.22.-) (cysteine 25 classification. protease CPP32) (YAMA protein). DR MEROPS; M17; DbAcc: SP:O15121 SO 30 Species: Homo Sapiens Description: astrocytic phosphoprotein PEA-15. Accession PDOCOO548 DbAcc: SP:O9285.1 Species: Homo Sapiens Description: caspase-10 precursor (EC 3.4.22.-) (ICE-like TBLASTN Probes for Peptidase Family m17 apoptotic protease 4) (apoptotic protease MCH-4) DbAcc: GP:AFO61738 (FAS-associated death domain protein interleukin-1B 35 Species: Homo Sapiens converting enzyme 2) (FLICE2). Description: leucine aminopeptidase DbAcc: SP:O14790 Species: Homo Sapiens DbAcc: SP:P34629 Description: caspase-8 precursor (EC 3.4.22.-) (ICE-like Species: Caenorhabditis elegans apoptotic protease 5) (MORT1-associated CED-3 40 Description: putative aminopeptidase ZK353.6 in chro homolog) (MACH) (FADD-homologous ICE/CED-3- mosome III (EC 3.4.11.-). like protease) (FADD-like ICE) (FLICE) (apoptotic DbAcc: SP:P37095 ) (apoptotic protease MCH-5) Species: Escherichia coli (CAP4). Description: peptidase B DbAcc: SP:P42573 45 DbAcc: SP:P30184 Species: Caenorhabditis elegans Species: Arabidopsis thaliana Description: cell death protein 3 precursor (EC 3.4.22.-). Description: cytosol aminopeptidase (EC 3.4.11.1) DbAcc: GP:AFO1 O127 (leucine a minopeptidase) (LAP) (leucyl Species: Homo Sapiens 50 aminopeptidase) (proline aminopeptidase) (EC Description: casper 3.4.11.5) (prolyl aminopeptidase). DbAcc: GP:AJOO775O DbAcc: GP:Z78018 Species: MuS musculus Species: Caenorhabditis elegans Description: 3550510 caspase- 14 Description: 3880567 W07G4.4 Identity to C. elegans 55 DbAcc: GP:AFOO1464 protein WP:W07A12.3; cDNA EST Species: Drosophila melanogaster Hunt Summary: Prolyloligo (prolyl oligopeptidases) Description: caspase-1 ProfileSHMMS: 2 DbAcc: GP:AFO9.7874 TBLASTN: 5 Species: Homo Sapiens 60 Description: caspase 14 precursor HMM Probes for Prolyl Oligopeptidases Name: DPPIV N term Hunt Summary: m17 (peptidase family m17) Acc: PFOO930 Profiles/HMMs: 1 65 Description: Dipeptidyl peptidase IV (DPP IV) N-terminal TBLASTN: 5 region Accession PFOO930 US 6,395,889 B1 87 88 Species: Homo Sapiens ID DPPIV N term Description: dipeptidyl peptidase IV like protein AC PFOO930 (dipeptidyl aminopeptidase- related protein) Dipeptidyl peptidase IV (DPP IV) N-terminal region (dipeptidylpeptidase VI) (DPPX-L/DPPX-S). DbAcc: SP:P33894 AU Finn R D, Bateman A Species: Saccharomyces cerevisiae AL Clustalw Description: dipeptidyl aminopeptidase A (EC 3.4.14.-) SE Pfam-B 1017 (release 3.0) (DPAPA) (YSCIV). GA -114-114 1O DbAcc: SP:P18962 TC 1820 18.20 Species: Saccharomyces cerevisiae NC 16870-168.70 Description: dipeptidyl aminopeptidase B (EC 3.4.14.-) BM hmmbuild-F HMM SEED (DPAP B) (YSCV). BM hmmcalibrate -seedO HMM DR PROSITE; PDOC00587; 15 Hunt Summary: Renaldipep DC The following Pfam-B family may contain sequences ProfileSHMMS: 1 that according to Prodom TBLASTN: 1 DC are members of this Pfam-A family. DR PFAMB; PB036169, HMM Probes for Renaldipep CC This family is an alignment of the region to the Name: Renal dipeptase CC N-terminal side of the active site. Acc: PFO1244 CC The Prosite motif does not correspond to this Pfam Description: Renal Dipeptidase Accession PFO1244 entry. 25 SO 23 ID Renal dipeptase Accession PDOCOO587 AC PFO1244 Renal dipeptidase Name: Peptidase S9 AU Finn R D, Bateman A Acc: PFOO326 AL Clustalw Description: Prolyl Oligopeptidase Family SE ProSite Accession PFOO326 GA 25 25 35 TC 271.6O 271.60 ID Peptidase S9 NC 2890-28.90 AC PFOO326 BM hmmbuild-F HMM SEED Prolyl oligopeptidase family BM hmmcalibrate-seedOH PI Prolyl oligopep; 40 DR PROSITE, PDOCOO678; AU Finn R D DC The following Pfam-B families contain sequences AL Clustalw that according to Prodom SE ProSite DC are members of this Pfam-A family. GA 12.8 12.8 45 DR PFAMB; PB012247; TC 1290 12.90 DR PFAMB, PB019330; NC 12.50 12.50 SO 8 BM hmmbuild -f HMM SEED BM hmmcalibrate-seedO HMM Accession PDOCOO678 DC This family belongs to family S9 of the peptidase 50 classification. DR MEROPS; S9; TBLASTN Probes for Renaldipep DR PROSITE; PDOC00587; Mineld: 6453 SO 53 55 DbAcc: SP:P16444 Species: Homo Sapiens TBLASTN Probes for Prolyl Oligopeptidases Description: microsomal dipeptidase precursor (EC DbAcc: SP:P48147 3.4.13.19) (MDP) (dehydropeptidase-i) (renal Species: Homo Sapiens dipeptidase) (RDP). Description: prolyl endopeptidase (EC 3.4.21.26) (post 60 proline cleaving enzyme) (PE). Hunt Summary: Reprolysin (ADAM family of DbAcc: SP:P28843 metalloprotease) Species: MuS musculus Profiles/HMMs: 1 Description: dipeptidyl peptidase IV (EC 3.4.14.5) (DPP 65 TBLASTN: 24 IV) (thymocyte-activating molecule) (THAM). DbAcc: SP:P42658 HMM Probes for ADAM Family of Metalloprotease US 6,395,889 B1 89 90 Name: Reprolysin Species: Homo Sapiens Acc: PFO1421 Description: metargidin precursor Description: Reprolysin (M12B) Family Zinc Metallopro DbAcc: GP:AFOO9615 teaSe Species: Homo Sapiens Accession PFO1421 Description: disintegrin-metalloprotease MADM DbAcc: GP:AJ242O15 ID Reprolysin Species: Homo Sapiens AC PFO1421 Description: 4757044 eMDC II protein Reprolysin (M12B) family Zinc metalloprotease 1O DbAcc: GP:D31872 AU Bateman A Species: Homo Sapiens AL Clustalw Description: 836684 metalloprotease/disintegrin-like pro SE Swissprot tein DbAcc: GP:D67O76 GA -100-100 15 Species: MuS musculus TC -96.50 -96.50 Description: 1813340 secretory protein containing throm NC -11180-1118O bospondin BM hmmbuild HMM SEED DbAcc: SP:P78325 BM hmmcalibrate-seedOHMM Species: Homo Sapiens RN 1 Description: cell surface antigen MS2 precursor (EC RM 95405261 3.4.24-) (CD156 antigen). RT Evolutionary families of metallopeptidases. DbAcc: GP:AFO 19887 Species: MuS musculus RA Rawlings N D, Barrett AJ, 25 RL Meth Enzymol 1995,248:183–228. Description: metalloprotease-disintegrin meltrin beta DC This Prosite motif covers only the active site. DbAcc: GP:U22058 DR PROSITE; PDOC00129: Species: MuS musculus DR SCOP; last; fa; Description: ADAM 4 protein precursor DbAcc: GP:AFO29899 DC This is family M12B in the peptidase classification. Species: Homo Sapiens DR MEROPS; M12; Description: ADAM 20 DC The following Pfam-B families contain Sequences DbAcc: GP:AFOO61.96 that according to Prodom Species: MuS musculus DC are members of this Pfam-A family. 35 Description: metalloprotease-disintegrin MDC 15 DR PFAMB; PB016125; DbAcc: GP:AJO 12603 DR PFAMB; PB031186; Species: Rattus rattuS DR PFAMB; PB035940; Description: 1340857 TNF-alpha converting enzyme DR PFAMB; PB036263; 40 (TACE) CC The members of this family are enzymes that cleave DbAcc: GP:AB009672 peptides. Species: Homo Sapiens CCThese proteases require zinc for catalysis. Members of Description: 3419878 MDC3 this DbAcc: GP:X77619 CC family are also known as adamalysins. 45 Species: Macaca macaca CC Most members of this family are Snake venom Description: tMDC II endopeptidases, DbAcc: GP:Y13323 CC but there are also Some mammalian proteins Such as Species: Homo Sapiens Swiss: P78325, 50 Description: 23701.07 disintegrin-protease CC and fertilin Swiss:Q28472. Fertilin and closely related DbAcc: GP:U681.85 CC proteins appear to not have Some active site residues Species: Caenorhabditis elegans and Description: adm-1 domain organization is identical to CC may not be active enzymes. PH-30/fertilin 55 DbAcc: GP:U42846 SO 115 Species: Caenorhabditis elegans Description: T19D2.1 coded for by C. elegans cDNA Accession PDOCOO129 yk126d 9.3 DbAcc: GP:Z81460 60 Species: Caenorhabditis elegans TBLASTN Probes for ADAM Family of Metalloprotease Description: 3873969 C04A11.4 Similarity to Mouse DbAcc: SP:O05910 meltrin alpha protein Species: MuS musculus DbAcc: SP:O13766 Description: cell surface antigen MS2 precursor (EC 65 Species: Schizosaccharomyces pombe 3.4.24.-) (macrophage cysteine-rich glycoprotein). Description: probable zinc metallopeptidase C17A5.04C DbAcc: GP:U41767 precursor (EC 3.4.24.-).

US 6,395,889 B1 95 96

-continued citgttttact ttacct cact aacaatgggg ggagaaagga gtacaaatag g atctttgac 384 O cago actott tatggctgct atggtttcag agaatgttta tacattattit c taccgagaa 39 OO ttaaaactitc agattgttca acatgagaga aaggctcago aacgtgaaat acgcaaatg 39 60 gctitcctcitt toctitttittg gaccatctoa gtotttattt gtgtaatto a tittgaggaa 4020 aaaacaactic catgitattta ttcaagtgca ttaaagttcta caatggaaaa aag cagtga 4080 agcattagat gctggtaaaa gctagaggag acacaatgag cittagtacct caact tcct 414 O ttctitt.ccita ccatgitaacc citgctittggg aatatggatg taaagaagta cittgttgttct 4200 catgaaaatc agtacaatca Cacaaggagg atgaaacgcc ggaacaaaaa gaggcgg.cg 4260 gaacaaaaat ggggtgtgta galacagggtc. ccacaggttt gggga cattg gatcacttg 4320 tottgtggtg gggaggctgc tgagggg tag caggtocatc to cagcagot gtocaa.ca.g 4.380 togtatcctg gtgaatgtct gttcagotct totgtgagaa tatgatttitt ccatatgta 4 440 tatagtaaaa tatgttacta talaattacat gtactittata agtattggitt gggtgttcc 4500 titccaagaag gactatagitt agtaataaat gcctataata acatattitat tittatacat 45 60 ttatttctaa tgaaaaaaac ttittaaatta tatcgcttitt gtggaagtgc tataaaata gagtattitat acaatatatg ttact agaaa taaaagaa.ca cittittggaaa aaaaaaaaa. 4680 aaaaaaaaaa. aaaaaaaaaa. aac tacticta laala 4740

4800

Cala ataaag cant agcatccaaa ttittcaccala taaagct 4858

<210> SEQ ID NO 2 &2 11s LENGTH 179 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (179) <223> OTHER INFORMATION: ace (angiotensin-converting enzyme) <400 SEQUENCE: 2 ggaatgtgaa atcttgcc cc gggtoaaaat caccitttagt ccttggcact t ggggggaca 60 gggcctggta cittcaaccitt gaggtocacc acatc.ccggit tataaatttic c ttgggtoag 120 gttt catcaa ataaactcca gg.cccaatga togac gaggg tag to gaag g g gaaaaagg 179

<210> SEQ ID NO 3 &2 11s LENGTH 553 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (553) <223> OTHER INFORMATION: Hemoglobinase <400 SEQUENCE: 3 titttittittgg tittctgttat aaatctititat tttittaagac ttgaacacct cagaattaa 60 atacaaaag.c aatcaaaaat catcatattt totaaaaaac agtttitccat tggggcttg 120 cgggcc citct toaaataaaa. to attctgaa gatttaaaga ggitttcagot caaattctic 18O agaataaaga citccttittga gaggggctac agaagcc.ccc atgagctitcc gctccitcaa 240 aactaa.cagg Caaaaaa Ca aacctcctag gaaaggcaaa tatgacccitt tdaatacag agctttitccc accoctaattit gtaggitttitt cagaaaagac tggggggalag cagggtaac 360

US 6,395,889 B1 123 124

-continued c gaatgtgtg ccc gatctog toggcaatgg togaacgctgt ggc.caggcc 349

<210> SEQ ID NO 25 &2 11s LENGTH 323 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (323) <223> OTHER INFORMATION: matrixin (matrix metallop roteases) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (323) <223> OTHER INFORMATION: n = A, T, C or G <400 SEQUENCE: 25 aaatgagggit ttggcatgca gctcgtoatc ttalagagitta citaticittctit ccctdgtgt 60 titc.gc.cgttc cagtgcc.ccc tgctgcagac cataaaggat ggg actttgt gagggctat 120 titccatcaat tttitcctgac C gaga aggag togcc actoc ttaccCagga a CaCasa Ca 18O cagotcct gc aacaattic ca. toggaatggg acagaccitac ttgacatgca ttgcatgct 240 totgctacan cago.cccact gtggggtgCC tgatgggtoc gacaactnca citc.gc.cagg aagatgcaag tggattaa.gc a Ca 323

<210> SEQ ID NO 26 &2 11s LENGTH 118 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature &222s. LOCATION (1) . . . (118) <223> OTHER INFORMATION: m17 (peptidase family m17) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (118) &223> OTHER INFORMATION: n = A T C or G <400 SEQUENCE: 26 tacgattgaa atacgggagc gtgtntggag gatgcnintct togc galagatt a tanaagata 60 ggttgtag at tognagcttg cngatgttaa gaanattgga aaatagagat c ngnagga 118

<210 SEQ ID NO 27 &2 11s LENGTH 658 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (658) <223> OTHER INFORMATION: m17 (peptidase family m17) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (658) <223> OTHER INFORMATION: n = A, T, C or G <400 SEQUENCE: 27

CCaCaaC aggtggctitt tittggittittg cCaggggca 60 aagaattittc aagttggcta atgtc.cctac catcaaaaat ttgagtacca tittggittitc 120 taaaggaaaa ccatatoctic ggaaagggga ttgggaggitt caaaaaattit aatcttitta 18O tagagtttct aaaaaa.ccct tacaaggaaa aatttalacct toccottitat aattittggg 240 gggaaaggaa aaaaaaac Ca gggct tccala titt cattggg gttctoctitt taaaaa.ccc accottggcc ccc.gtaaaaa aattittalacc citttittittaa. aaa.ccctitta ggtocaccg 360 gcc.catttaa aalacagaggg aaaactittitt ggggtttcgg accitalaccitt gcc togcct 420 galaccaaaga aaaaactaat ggaaag.ccct ggggg.ccctc ccago cittgc titc.ccgaag 480 US 6,395,889 B1 125 126

-continued atagggaact taatctttgt gggccatcac gotggctato citaaatgggc c cactaagga 540 ggagaccalaa toctitcagaa togctgcago gga catgcto citgcgatctt g attitccaaa 600 tgtggtacat aagcaa.gctd gaatcaaaca cct ggcttgg atagtgtcgg a agagagg 658

<210> SEQ ID NO 28 &2 11s LENGTH 561. &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (561) <223> OTHER INFORMATION: caspase (caspase family of apoptosis regulating proteases) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (561) <223> OTHER INFORMATION: n = A, T, C or G <400 SEQUENCE: 28 ttittaaaaag atacagtctt tacttittatt gaaataccaa atattgacta t gcaa.gctat 60 actggtaaat gtgctotttg atgttgacag aggagggcto ggctoccitgt g gtttcctitt 120 cagattgtct gtgagtictaa tttct catgg togtgttgaca agg acco cat c tittagctga 18O aattittcaat togc.caggaaa gaggtagaaa totcittgtca aggttgctog t t citatggtg 240 ggcatctggg ctittagctic g toggaacttica aatgatttct gtacctt.ccg a aatattitcc 3OO attaggtggc agcago aaga atatttctgg aag catgtga tigagttcc.gt a atgaagatg 360 gag.ccc cittg togcggtotct coaggacacg titatgttggtg ttgaagaac a g aaa.gcaatg 420 aagttccttct coccggggag tottgcaaac agaatctgcc tocaaggttct cagaggact 480 tgtgaagaga tigacingccaa aggatgctgg agagt cittgg accoa gagtt C cccanggitt 540 ttcaacct cit gcangcc.ccg g 561

<210 SEQ ID NO 29 &2 11s LENGTH 304 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (304) <223> OTHER INFORMATION: caspase (caspase family of apoptosis regulating proteases)

<400 SEQUENCE: 29 tgcc agga aa gagggggaaa tat cittggga tigg acagtca citt tatgggg g g catttalag 60 citttggcc ct toggagtttca aaagattgct g g gcctt.ccg aaatactitct t citacgtggc 120 atcaccalata atttittctgg aag catgttga t gagttgttgt gatgaaaata g agcc.cattg 18O ggctggctct tcaggacacg ttgagtggtg ttittaalacca taaagctato a atticcittct 240 ttaagtgggit cittgttgaaca aaatctttitt citcagttctt acttgtaatt g aaaaag.cct 3OO tittc. 3O4.

<210 SEQ ID NO 30 &2 11s LENGTH 250 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (250) <223> OTHER INFORMATION: renaldipep (renal dipepti dase) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (250) US 6,395,889 B1 127 128

-continued <223> OTHER INFORMATION: n = A, T, C or G <400 SEQUENCE: 30 tgcttitccct tctoacctgttccacttgtc tdaag accog cactatgtat g catgatgga 60 cacattgaac citctitcctcg citccagotac gacitcatcaa cittctotato a tdactgggit 120 atgtggacac atcctacatc. cactgaggta accggcc tog totcgtotat it intccactaa 18O tgct gatgaa citcagatcca atgactgctg tdttgttgttcaaagatatot g c cacagtgg 240 acac gttggc 25 O

<210> SEQ ID NO 31 <211& LENGTH: 612 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (612) <223> OTHER INFORMATION: prolyloligo (prolyl oligo peptidase) <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (612) <223> OTHER INFORMATION: n = A, T, C or G <400 SEQUENCE: 31 to cnttgc.cg gngaccalatg angtogatcca aggtnittctn gngggtngtin a aaggtnnan 60 ggtoanatat tdtcct cocq attcagnacc totnntgttg gncttcticcin g aggttggitt 120 citgnaantica tttggtttcc cagoccitcac taaaaacctin agtantntnct ggtntgtga 18O aaantggaaa toctaatcca ggaalaccatin taagagaagn aacnnattgg g thcagaggg 240 gaacttitcct gntggaatgg ccacagatcc taagtaatag ccctgtc.cat n citgginnagg 3OO gtgacccatin taacgtc.ccg ngtttcctgt titcatagaag atccacagag t gactggggc 360 cccagaaata galaccctgan gat atctgac citctgaatta ntgccatcag g gagaggitat 420 ccitccatagg acago.cgingg attgccacac gatcttangit caatgaaatc a titt.cgagaa 480 gttagatatt go gagtccinc ccaccitgatc cqacaattct timinggitttnc c coggatnag 540 cacn.cnnnnn nnnn.nnnnnn incgnntnnnn nincnn.nnnnn nnnn.cccnnn n incinnn.ccc.n 600 ngnincgnntn tt 612

<210> SEQ ID NO 32 &2 11s LENGTH 239 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (239) <223> OTHER INFORMATION: prolyloligo (prolyl oligo peptidase) <400 SEQUENCE: 32 citgagctgag aggaggctgg citt.ccaacaa agaaatttac agttggatgc a aagctgctg 60 actittggata toggaacco ga aaagttctttg ggaac aggag agacitcaiaca C agcagaagg 120 agaattcaat aaggggg act actg.cgtagt aaaaacgggc aatagdataa t aagagacag 18O atggagacca ccacaaag.ca gaggaggggc tigaagagctt cottcaaaac C cagcc agt 239

<210 SEQ ID NO 33 &2 11s LENGTH 466 &212> TYPE DNA <213> ORGANISM: Homo sapiens &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (1) . . . (466) US 6,395,889 B1 129 130

-continued <223> OTHER INFORMATION: aspartyl (aspartyl protea se) <400 SEQUENCE: 33 ccaggcc.cac ccggacgctg tttittaatgt accctgggtg taagagggcc a catatgtcc 60 ccaataagaci gttcc.cgagg aaccacaagg gcc citgcacg cqgaggaiaca t c cagg gcct 120 ggaalacc.gca caagtgaagg cqc coaccat ttctagt citt citgitatgacg t aatcatoga 18O cc.gagaggitt aaaccaaacc ccc.ctaacaa gqaatgagat toggggagggc t ttgttgattt 240 cc.gagcacag gatgatgaac taccg accca gcaaggggat toccaaagag g atgaatgca 3OO ggg.cccaaat gct citcagtg gctacgtgaa tagg gacct gcc.cgitatcc a ggatgg tag 360 cgcagoccitt gacacaaaga ggcagacatg ggcccaccitt cacacgctoa a tigtggat.ct 420 gccagacggc agg gaccgtg actgg accct agg to aggga tiggitat 466

2O That which is claimed: sequence shown in SEQ ID NO:1 or a complement thereof.

1. An isolated nucleic acid molecule having the nucleotide k . . . .