<<

USOO7282,360B2

(12) United States Patent (10) Patent No.: US 7.282,360 B2 Meyers et al. (45) Date of Patent: Oct. 16, 2007

(54) HUMAN PROTEIN , PHOSPHATASE, provisional application No. 60/197.508, filed on Apr. AND FAMILY MEMBERS AND 18, 2000, provisional application No. 60/187,454, USES THEREOF filed on Mar. 7, 2000, provisional application No. 60/187.420, filed on Mar. 7, 2000, provisional appli (75) Inventors: Rachel E. Meyers, Newton, MA (US); cation No. 60/186,061, filed on Feb. 29, 2000. Peter J. Olandt, Newton, MA (US); Rosana Kapeller-Libermann, Chestnut (51) Int. Cl. Hill, MA (US); Rory A. J. Curtis, CI2N 9/20 (2006.01) Framingham, MA (US); Mark CI2N IS/00 (2006.01) Williamson, Saugus, MA (US); Nadine CI2N L/20 (2006.01) Weich, Brookline, MA (US) CI2O I/68 (2006.01) C7H 2L/04 (2006.01) (73) Assignee: Millennium Pharmaceuticals, Inc., (52) U.S. Cl...... 435/194; 435/320.1; 435/252.3: Cambridge, MA (US) 435/6:536/23.2 (58) Field of Classification Search ...... 435/194, *) Notice: Subject to anyy disclaimer, the term of this 435/6, 252.3, 320.1; 536/23.2 patent is extended or adjusted under 35 See application file for complete search history. U.S.C. 154(b) by 0 days. (56) References Cited (21) Appl. No.: 11/636,948 U.S. PATENT DOCUMENTS (22) Filed: Dec. 11, 2006 6,476,210 B2 * 1 1/2002 Turner et al...... 536,232 2004.0034.888 A1 2/2004 Liu et al. (65) Prior Publication Data US 2007/OO99274 A1 May 3, 2007 FOREIGN PATENT DOCUMENTS WO WOOOf73469 A2 12/2000 Related U.S. Application Data (60) Division of application No. 11/151,601, filed on Jun. * cited by examiner 13, 2005, now Pat. No. 7,198.930, which is a division Primary Examiner Maryam Monshipouri of application No. 10/170,789, filed on Jun. 13, 2002, now Pat. No. 7,070.947, and a continuation-in-part of (57) ABSTRACT application No. 10/045,367, filed on Nov. 7, 2001, now abandoned, and a continuation-in-part of appli The invention provides isolated nucleic acids molecules, cation No. 09/961,721, filed on Sep. 24, 2001, now designated 53070, 15985, 26583, 21953, m32404, 14089, abandoned, and a continuation-in-part of application and 23436 nucleic acid molecules, which encode novel No. 09/934.406, filed on Aug. 21, 2001, now aban human protein kinase family members, serine/threonine doned, and a continuation-in-part of application No. protein kinase family members, serine/threonine phos 09/882,166, filed on Jun. 15, 2001, now abandoned, phatase family members, prolyl oligopeptidase family mem and a continuation-in-part of application No. 09/861, bers, family members, trypsin family 801, filed on May 21, 2001, now abandoned, and a members, and ubiquitin protease family members. The continuation-in-part of application No. 09/829,671, invention also provides antisense nucleic acid molecules, filed on Apr. 10, 2001, now abandoned, and a con recombinant expression vectors containing 53070, 15985, tinuation-in-part of application No. 09/801.267, filed 26583, 21953, m32404, 14089, or 23436 nucleic acid mol on Mar. 6, 2001, now abandoned, and a continuation ecules, host cells into which the expression vectors have in-part of application No. 09/801.275, filed on Mar. 6, been introduced, and nonhuman transgenic animals in which 2001, now abandoned, which is a continuation-in-part a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 of application No. 09/797,039, filed on Feb. 28, 2001, has been introduced or disrupted. The invention still now Pat. No. 6,730,491. further provides isolated 53070, 15985, 26583, 21953, (60) Provisional application No. 60/246,561, filed on Nov. m32404, 14089, or 23436 proteins, fusion proteins, anti 7, 2000, provisional application No. 60/235,023, filed genic and anti-53070, 15985, 26583, 21953, on Sep. 25, 2000, provisional application No. 60/226, m32404, 14089, or 23436 antibodies. Diagnostic methods 740, filed on Aug. 21, 2000, provisional application utilizing compositions of the invention are also provided. No. 60/212,078, filed on Jun. 15, 2000, provisional application No. 60/205,508, filed on May 19, 2000, 15 Claims, 35 Drawing Sheets U.S. Patent US 7.282,360 B2

#ºwºº..),\ 3______^^^^^^^^^A, Z98.18Z10€.|Z|??7 FIGURE 1

U.S. Patent Oct. 16, 2007 Sheet 4 of 35 US 7.282,360 B2

|,'!

FIGURE 4

U.S. Patent Oct. 16, 2007 Sheet 9 Of 35 US 7.282,360 B2

PROTEINBOSPAASE pROTEN prospas 2C 4 DOMAIN 2C DOMAN

4, 8, 12 16l 20, 24l 28, 32 36l 4O 44 48 52

FIGURE 9

U.S. Patent Oct. 16, 2007 Sheet 12 Of 35 US 7.282,360 B2

SSS

COLONMETSSo

COLONTSs. ECOLONT Sea SECOLONT 32S ECOLONT Son ECOLONT SSN

ECOLONT SS COONT &so EcotoNN &es Scoton N. Sssa

EUNGT SSS

Esu GT SSS

SS UNGT SSS EUNG N SS

EUNGN also

S. REAST T s SEESS. B REA ST N

BREASTN s 8 & Š S. s 3 & O C FIGURE 1. E. U.S. Patent Oct. 16, 2007 Sheet 13 Of 35 US 7.282,360 B2

3.

s

FIGURE 12

U.S. Patent Oct. 16, 2007 Sheet 17 of 35 US 7.282,360 B2

A| "Aliyah'Nwa f i is WMNAN , , R. A. Mr. An Cys I Ngly

1. 80 160 240 320 400 480 552.

FIGURE 15

U.S. Patent Oct. 16, 2007 Sheet 22 of 35 US 7.282,360 B2

Trypsin Domain

1 40 8O 120 160 200 24

FIGURE 8

U.S. Patent US 7.282,360 B2

TVN?ISHOSHINDRIAGINTHASASV?LOHà//(8)LLOW(8)LAHL(II)d?RIL(68%)z66d9*< RIOJLOVHGHNH0IJLTÍTWXTIWVHNICHLOHJOOXTONGIÐOW?Z(HSVTORIOLAH

99z=q??šuºT

FIGURE 20A

U.S. Patent Oct. 16, 2007 Sheet 29 of 35 US 7.282,360 B2

VPA48HRCONTP4 VPA48HRCONTP42 VPA50 g/ml. 48HRLP141 VPA50pg/ml 48HRLP142 VPA100ug/ml 48HRLP14

VPA 100 g/ml. 48HRLP42 VPA d6 CONTP26 WPA dOCONTP26 VPA 50 g/ml d6 LP126 29.8 SSS VPA 100 g/ml d6 LP126 20. VPA 50 g/mldOuP126 25.2255 VPA 100 g/ml dOP126 7.6 SVPAd6 CONTLP130 VPAdOCONTLP130 30.2&SVPA50g/mld6 LP130 VPA50 g/ml dOP30

l545iS VPA00 g/ml dolp130

VPAdOCONTLP134 445 SSSSSSSSS VPA 50 g/ml d6 LP134 2.33i:3 VPA100 g/ml d6 LP134 VPA 50 g/mldOP134 VPA100 g/ml dOLP134 3. TGFBLP15548 HR CONT 60.8 SSSSSSSSS 65.25: GRBCONT 48HRLP159 GB50g/mLP159 GFB100 g/ml LP159 C O O C O O C C C C g g s S S S S S 9 o RELATIVE EXPRESSION

FIGURE 23 U.S. Patent Oct. 16, 2007 Sheet 30 Of 35 US 7.282,360 B2

TGFB100 g/ml LP159 GFB50g/ml LP159 GFBCONT48HRLP159 TGFB50 g/ml LP55 TGFB50g/ml LP55 CONT BRU d7-3EPOLPOA BRUd7-3EPOP8 BFU d7 LP79 VPA100 g/ml dOP134

VPA100 g/ml dOP130 VPA100 g/mldO LP126 VPA 50g/mtdIOLPI34

VPA 50 g/ml d10 LP126 VPA100 g/m d6P134 VPA100 g/ml d6 LP30 MPA100 g/ml d6 LP126

VPA 50 g/mld6 LP30 MPA 50 g/mld6P126 VPA100 g/ml 48HRLP42

VPAlOOug/ml. 48HRP4 VPA 50 g/ml 48HRP42 iszVPA50 g/ml48HRLP41

SSVPAdOCONTLP3A SVPA dOCONTLP130 VPAdOCONLP126

PAEES NEUTROPH d4 LP3-6

NEUROPH dALF78 NEUROPH dAF/

NEUROP-d3LF78 NEUROPHL d2 LP26B

EUROPHL di LF78 NEUROPHL d8 LF78 NEUTROPHL d7 PA-3 NEUROPHL d6 P78 NEUTROPHL déP7 EUTROPHILod6 LP27 NEUTROPHILd6 LF26 NEUTROPHL d4 F78 NEUROPHL d4 F7 3 8 woO C C O RELAIME EXPRESSION (NTCUSED AS REFERENCESAMPLE

FIGURE 24 U.S. Patent Oct. 16, 2007 Sheet 31 of 35 US 7,282,360 B2

HEP3bHMP 254.2074393

EP3bNORM270523795 MOA 23690248 NC 40 40 NK562 2382A3789

NBMGPA 274926.340

NERYTH 23.204276 scord BLOOD 293723.97237

s Y PBMCREST 260,759 29 CD4 25,486,267

hoo466h 2568.758 sufnDRO79 27032266 s LMERNDR1928.52500 MERCHT33934,99230303 PASSEL 2922447

Easis NHDFTGF 27722.89356 NtAMER 24.7283277

UNG 27.288.259 3 & 8 5 & 5 . RELATIVE EXPRESSION (NTCUSED AS REFERENCE SAMPLE)

FIGURE 25 U.S. Patent Oct. 16, 2007 Sheet 32 of 35 US 7,282,360 B2

MEGd4P3-5 2549042

MEGd2F35 24209.68

MEGd2F02 272985 6

MEGd7P32 2589753 MEG48RF790 270323,335

MEG48HRF76 29842476296

Eyd4GPAP3-4 12624,949.93

Eyd2F240 239357

Eydt2F23-8 2.84943974 sEydop25-4 2443982409 3:55 Eyd P24-5 2409.20.8665 EydéP3 24872O4346

Ey48HRF76 26679.8589

mBMCD5CDbP52126958.2324 mBMCD5P5 270682222

BMGPAoP82 25842082308.

CORDBOODCD34+P228.620527 BMCD3438 25.2878960 mPBCD34+P52 2524.84389 nPBCD34+F70 2906205328

mBMNCLP139 2523.72840 SKMUSCLEMP67 29.827.458 tAMERMP425 255223.0756

MERNDR200 32320508 KDNEYMP58 2593.99660 SCOLONMP60

UNG 2683778. s & S S & RELATIVE EXPRESSION (NTC USED AS REFERENCE SAMPLE

FIGURE 26 U.S. Patent Oct. 16, 2007 Sheet 33 Of 35 US 7.282,360 B2

SeSYS-Ysary 5

sassassraisasaksstrasaesarsayersssssstrasasters

WSX

assassassraesareer Sess

Sassists SSSSSSSSSSSSSSafeSSSSSSSassasssssssssssss

See SafeS.5se

3 S S 3 3 S 3 iss s ast SS

FIGURE 27 U.S. Patent Oct. 16, 2007 Sheet 34 of 35 US 7.282,360 B2

g CN

S.SES. sists

SšSSéSSSSSSS

EHepG2-B sists c C 323E3&g 2.5- Cneur

sts SSSSSSSS

S W SSSSsea HepG2-A MEEisigles

SMAoO RSS E. at 90 f

- WO au o un o un d d S. co cy. CN N - F O c RELATIVE EXPRESSION (HepC+519 USED AS REFERENCE SAMPLE)

FIGURE 28 U.S. Patent Oct. 16, 2007 Sheet 35 of 35 US 7,282,360 B2

BFUd73Epo 263820239 BRUd3Epo 2642,252

BRUd779 27.262473 NEyd4GRAP34 28239.6726 NByd2F3 2706268240 NEyd2F24-O 2465943.274 NEyd2F238 23057852.72 NByd8F3 2604243409 NEydt P24-5 24.91927202 NEyd3LP3NByd6F3. 22623365.6279237354 NEy48hF72 25,548.0957 NEy48h FO2 273,984 64 NBy48hrs 3249239827

NEy24RF02 28.897 28

scoRDELoudog4273938GPA-GHP34, 27920.8523 46 BMC34P3 272974

NEAMR 29O23:382O2

MERNORM 32.426307

BREAST 3025207s 14 BRANCOREX 3065234568

BRAIN 328425.250 S. s. 3 & 2 V RELATIVE EXPRESSION (NTCUSED AS REFERENCE SAMPLE

FIGURE 29 US 7,282,360 B2 1. 2 HUMAN PROTEIN KINASE, PHOSPHATASE, SUMMARY OF THE INVENTION AND PROTEASE FAMILY MEMBERS AND USES THEREOF The present invention is based, in part, on the discovery of novel protein kinase, serine/threonine protein kinase, RELATED APPLICATIONS 5 serine/threonine phosphatase, prolyl oligopeptidase, trypsin, serine protease, and ubiquitin carboxy-terminal The present application is a divisional of U.S. application family members, referred to herein as “53070, 15985, Ser. No. 11/151,601, filed Jun. 13, 2005, now U.S. Pat. Ser. 26583, 21953, m32404, 14089, and 23436. The nucleotide No. 7,198.930, which is a divisional of U.S. application Ser. sequences of cDNAs encoding 53070, 15985,265.83,21953, No. 10/170,789, filed Jun. 13, 2002, now U.S. Pat. Ser. No. 10 m32404, 14089, and 23436 are recited in SEQID NO: 1, 7, 7,070,947, which is a continuation-in-part of U.S. applica 14, 19, 24, 33, and 40, respectively, and the amino acid tion Ser. No. 09/797,039, filed Feb. 28, 2001, now U.S. Pat. sequences of 53070, 15985, 26583, 21953, m32404, 14089, Ser. No. 6,730,491, which claims the benefit of U.S. Provi and 23436 polypeptides are recited in SEQ ID NO:2, 8, 15, sional Application Ser. No. 60/186,061, filed Feb. 29, 2000 20, 25, 34, and 41, respectively. In addition, the nucleotide (abandoned). U.S. application Ser. No. 10/170,789 is also a 15 sequences of the coding regions are recited in SEQ ID NO: continuation-in-part of U.S. application Ser. No. 09/882, 3, 9, 16, 21, 26, 35, and 42, respectively. 166, filed Jun. 15, 2001 (abandoned), which claims the Accordingly, in one aspect, the invention features a benefit of U.S. Provisional Application Ser. No. 60/212,078, nucleic acid molecule that encodes a 53070 protein or filed Jun. 15, 2000 (abandoned). U.S. application Ser. No. polypeptide, e.g., a biologically active portion of the 53070, 10/170,789 is also a continuation-in-part of U.S. application 15985, 26583, 21953, m32404, 14089, or 23436 protein. In Ser. No. 09/934,406, filed Aug. 21, 2001 (abandoned), a preferred embodiment the isolated nucleic acid molecule which claims the benefit of U.S. Provisional Application Ser. encodes a polypeptide having the amino acid sequence of No. 60/226,740, filed Aug. 21, 2000 (abandoned). U.S. SEQ ID NO:2, 8, 15, 20, 25, 34, and 41. In other embodi application Ser. No. 10/170,789 is also a continuation-in ments, the invention provides isolated 53070, 15985,265.83, part of U.S. application Ser. No. 09/861,801, filed May 21, 25 21953, m32404, 14089, or 23436 nucleic acid molecules 2001 (abandoned), which claims the benefit of U.S. Provi having the nucleotide sequence shown in SEQ ID NO:1, 3, sional Application Ser. No. 60/205,508, filed May 19, 2000 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or 42. In still other (abandoned). U.S. application Ser. No. 10/170,789 is also a embodiments, the invention provides nucleic acid molecules continuation-in-part of U.S. application Ser. No. 09/801, that are Substantially identical (e.g., naturally occurring 267, filed Mar. 6, 2001, (abandoned), which claims the 30 allelic variants) to the nucleotide sequence shown in SEQID benefit of U.S. Provisional Application Ser. No. 60/187,454, NO:1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or 42. In filed Mar. 7.2000 (abandoned). U.S. application Ser. No. other embodiments, the invention provides a nucleic acid 10/170,789 is also a continuation-in-pan of U.S. application molecule which hybridizes under a stringency condition Ser. No. 09/829,671, filed Apr. 10, 2001 (abandoned), which described herein to a nucleic acid molecule comprising the claims the benefit of U.S. Provisional Application Ser. No. 35 nucleotide sequence of SEQID NO:1, 3, 7, 9, 14, 16, 19, 21, 60/197.508, filed Apr. 18, 2000 (abandoned). U.S. applica 24, 26, 33, 35, 40, or 42, wherein the nucleic acid encodes tion Ser. No. 10/170,789 is also a continuation-in-pan of a full length 53070, 15985, 26583, 21953, m32404, 14089, U.S. application Ser. No. 09/961,721, filed Sep. 24, 2001 or 23436 protein or an active fragment thereof. (abandoned), which claims the benefit of U.S. Provisional In a related aspect, the invention further provides nucleic Application Ser. No. 60/235,023, filed Sep. 25, 2000 (aban 40 acid constructs that include a 53070, 15985, 26583, 21953, doned). U.S. application Ser. No. 10/170,789 is also a m32404, 14089, or 23436 nucleic acid molecule described continuation-in-part of U.S. application Ser. No. 10/045, herein. In certain embodiments, the nucleic acid molecules 367, filed Nov. 7, 2001 (abandoned), which claims the of the invention are operatively linked to native or heter benefit of U.S. Provisional Application Ser. No. 60/246,561, ologous regulatory sequences. Also included, are vectors 45 and host cells containing the 53070, 15985, 26583, 21953, filed Nov. 7, 2000 (abandoned). U.S. application Ser. No. m32404, 14089, or 23436 nucleic acid molecules of the 10/170,789 is also a continuation-in-part of U.S. application invention e.g., vectors and host cells Suitable for producing Ser. No. 09/801,275, filed Mar. 6, 2001 (abandoned), which 53070, 15985, 26583, 21953, m32404, 14089, or 23436 claims the benefit of U.S. Provisional Application Ser. No. nucleic acid molecules and polypeptides. 60/187.420, filed Mar. 7, 2000 (abandoned), the contents of 50 In another related aspect, the invention provides nucleic each of which are incorporated herein by reference. acid fragments Suitable as primers or hybridization probes The contents of the Sequence Listing are submitted here for the detection of 53070-, 15985- 26583-, 21953-, with on compact disc in duplicate. Each duplicate disc has m32404-, 14089-, or 23436-encoding nucleic acids. a copy of the file "sequence listing..txt which is incorpo In still another related aspect, isolated nucleic acid mol rated herein by this reference. This file is 140 kilobytes and 55 ecules that are antisense to a 53070, 15985, 26583, 21953, was created on Jun. 8, 2005. The compact disc copies were m32404, 14089, or 23436 encoding nucleic acid molecule created on Dec. 8, 2006. are provided. In another aspect, the invention features, 53070, 15985, BACKGROUND OF THE INVENTION 26583, 21953, m32404, 14089, and 23436 polypeptides, and 60 biologically active or antigenic fragments thereof that are The invention provides isolated polypeptide molecules useful, e.g., as reagents or targets in assays applicable to and nucleic acid molecules encoded the polypeptide mol treatment and diagnosis of 53070-, 15985- 26583-, 21953-, ecules, designated 53070, 15985, 26583, 21953, m32404, m32404-, 14089-, or 23436-mediated or -related disorders. 14089, and 23436 molecules, which encode novel kinase In another embodiment, the invention provides 53070, family molecules, phosphatase family members, and pro 65 15985, 26583, 21953, m32404, 14089, and 23436 polypep tease family members, including prolyl oligopeptidases, tides having a 53070, 15985, 26583,21953, m32404, 14089, serine , and ubiquitin carboxy terminal . or 23436 activity, respectively. Preferred polypeptides are US 7,282,360 B2 3 4 53070 proteins including at least one protein kinase domain, m32404, 14089, or 23436 polypeptides or fragments e.g., a serine/threonine kinase domain, and, preferably, hav thereof, e.g., the protein kinase domain of a 53070 polypep ing a 53070 activity, e.g., a 53070 activity as described tide, the C-terminal non-kinase domain of a 53070 polypep herein. tide, an epitope that includes a phosphorylated amino acid Preferred polypeptides are 15985 proteins including at residue, an extracellular domain of a 15985 polypeptide, least one protein kinase domain and at least one, preferably trypsin domain of an m32404 polypeptide, a trypsin domain two doublecortin repeats, and, preferably, having a 15985 of a 14089 polypeptide, or ubiquitin carboxy-terminal activity, e.g., a 15985 activity as described herein. hydrolase domain. In one embodiment, the antibodies or Preferred polypeptides are 26583 proteins including at antigen-binding fragment thereof competitively inhibit the least one phosphatase catalytic domain, and, preferably, 10 binding of a second antibody to a 53070 or 15985 polypep having a 26583 activity, e.g., a 26583 activity as described tide or a fragment thereof, e.g., the protein kinase domain of herein. 53070, the C-terminal non-kinase domain of 53070, an Preferred polypeptides are 21953 proteins including at epitope that includes a phosphorylated amino acid residue, least one prolyl oligopeptidase domain, and, preferably, or an extracellular domain of a 15985 polypeptide. having a 21953 activity, e.g., a 21953 activity as described 15 In another aspect, the invention provides methods of herein. screening for compounds that modulate the expression or Preferred polypeptides are m32404 proteins including at activity of the 53070, 15985, 26583, 21953, m32404, 14089, least one trypsin domain, e.g., polypeptides including or 23436 polypeptides or nucleic acids. In a preferred m32404 amino acids from about 35 to 268 or polypeptides embodiment, a screened compound alters the de-ubiquiti including m32404 amino acids from about 300-520, and, nating activity of the 23436 polypeptide. preferably, having an m32404 activity, e.g., an m32404 In still another aspect, the invention provides a method for activity as described herein. modulating 53070, 15985, 26583, 21953, m32404, 14089, Preferred polypeptides are 14089 proteins including at or 23436 polypeptide or nucleic acid expression or activity, least one trypsin domain, and, preferably, having a 14089 e.g. using the screened compounds. In certain embodiments, activity, e.g., a 14089 activity as described herein. 25 the methods involve treatment of conditions related to Preferred polypeptides are 23436 polypeptides including aberrant activity or expression of the 53070, 15985, 26583, at least one ubiquitin carboxy-terminal hydrolase domain, 21953, m32404, 14089, or 23436 polypeptides or nucleic and, preferably, having a 23436 activity, e.g., a 23436 acids, such as conditions involving aberrant or deficient de-ubiquitinating activity as described herein. cellular proliferation or differentiation; cell migration; con In other embodiments, the invention provides 53070, 30 ditions involving cholesterol biosynthesis, mitochondrial 15985, 26583, 21953, m32404, 14089, and 23436 polypep dysfunction, or aberrant cellular proliferation of a 26583 tides, e.g., a 53070, 15985, 26583, 21953, m32404, 14089, expressing cell, e.g., a lung cell, a breast cell, a colon cell. or 23436 polypeptide having the amino acid sequence a liver cell, or a brain cell; e.g., a cancer (e.g. a cancer of the shown in SEQ ID NO:2, 8, 15, 20, 25, 34, or 41 an amino lung, breast, ovary, prostate, or colon), or conditions or acid sequence that is substantially identical to the amino acid 35 disorders of the cardiovascular (including vascular, e.g., a sequence shown in SEQID NO:2, 8, 15, 20, 25, 34, and 41; Smooth muscle or an endothelial cell), neuronal, or repro or an amino acid sequence encoded by a nucleic acid ductive (e.g., prostatic) systems; as well as conditions molecule having a nucleotide sequence which hybridizes involving the immune response, and the blood clotting under a stringency condition described herein to a nucleic system, or tumor invasion or metastasis; conditions involv acid molecule comprising the nucleotide sequence of SEQ 40 ing aberrant or deficient proteolytic cleavage; or prolifera ID NO: 1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or 42, tion or cellular differentiation of a hematopoietic cell (e.g., wherein the nucleic acid encodes a full length 53070, 15985, a hematopoietic or an erythroid disorder). 26583,21953, m32404, 14089, or 23436 protein oran active In one embodiment, a method for inhibiting abnormal fragment thereof. phosphorylation in a cell or a subject is provided. In other In a related aspect, the invention further provides nucleic 45 embodiments, a method for enhancing phosphorylation in a acid constructs which include a 15985, 21953, m32404, or cell or a subject is provided. The method includes contacting 23436 nucleic acid molecule described herein. a cell, or administering to a subject, a modulator of 53070 In a related aspect, the invention provides 53070, 15985, polypeptide or nucleic acid activity or expression, to thereby 26583, 21953, m32404, 14089, or 23436 polypeptides or modulate, e.g., inhibit or enhance, the phosphorylation state fragments operatively linked to non-53070, 15985, 26583, 50 in the cell or subject. 21953, m32404, 14089, or 23436 polypeptides to form The invention also provides assays for determining the fusion proteins. activity of or the presence or absence of 15985 polypeptides In another aspect, the invention provides a method of or nucleic acid molecules in a biological sample, including evaluating a sample. The method includes: providing a for disease diagnosis. sample; detecting a 21953 polypeptide or nucleic acid in the 55 In yet another aspect, the invention provides methods for sample; and, optionally, comparing the level of expressed inhibiting the proliferation or inducing the killing, of a 21953 molecules to a reference sample. For example, an 15985-expressing cell, e.g., a hyperproliferative 15985-ex increased level of 21953 molecules can be an indication that pressing cell. The method includes contacting the cell with the sample includes cells transiting from the GI cell cycle an agent, e.g., a compound, (e.g., a compound identified phase to S phase. In other examples, the level of 21953 60 using the methods described herein) that modulates the molecules can be an indication that a sample includes a activity, or expression, of the 15985 polypeptide or nucleic proliferating cell, e.g., a proliferating lung, breast, ovary, or acid. In a preferred embodiment, the contacting step is colon cell; or a heart cell, a prostate cell, a vascular cell (e.g., effective in vitro or ex vivo. In other embodiments, the a Smooth muscle or an endothelial cell), or a brain cell. contacting step is effected in vivo, e.g., in a Subject (e.g., a In another aspect, the invention features antibodies and 65 mammal, e.g., a human), as part of a therapeutic or prophy antigen-binding fragments thereof, that react with, or more lactic protocol. In a preferred embodiment, the cell is a preferably specifically bind 53070, 15985, 26583, 21953, hyperproliferative cell, e.g., a cell found in a solid tumor, a US 7,282,360 B2 5 6 Soft tissue tumor, or a metastatic lesion. In other embodi The contacting step(s) can be repeated. ments, the hyperproliferative cell is an ovarian or a lung In a preferred embodiment, the agent decreases the pro tumor cell. liferation and/or enhances the differentiation of the cell, e.g., In still another aspect, the invention features a method of the 26583-expressing cell, e.g., the lung, breast, colon, liver, modulating (e.g., enhancing or inhibiting) the proliferation, 5 or brain cell. Such agents can be used to treat or prevent survival, and/or differentiation of a cell, e.g., a 26583 cancers, e.g., liver, breast, brain, colon, or lung carcinomas. expressing cell, e.g., a lung cell, a breast cell, a colon cell, In yet another aspect, the invention features a method of a liver cell, or a brain cell. The method includes contacting treating or preventing a disorder, e.g., a metabolic disorder, the cell with an agent that modulates the activity or expres e.g., a mitochondrial related disorder or a cholesterol bio sion of a 26583 polypeptide or nucleic acid, in an amount 10 synthesis related disorder; or a cellular proliferation and/or effective to modulate the proliferation and/or differentiation differentiation disorder, in a subject. The method includes of the cell. administering to the Subject an effective amount of an agent In a preferred embodiment, the 26583 polypeptide has an that modulates the activity or expression of a 26583 amino acid sequence identical to, or Substantially identical polypeptide or nucleic acid Such that the disorder is ame to, SEQ ID NO:15. In other embodiments, the 26583 15 liorated or prevented. polypeptide is a fragment of at least 15, 20, 50, 100, 150, In a preferred embodiment, the 26583 polypeptide has an 200, 213, 250, or more contiguous amino acids of SEQ ID amino acid sequence identical to, or Substantially identical NO:15. In a preferred embodiment, the 26583 polypeptide is to, SEQ ID NO:15. In other embodiments, the 26583 a fragment of at least 213 contiguous amino a polypeptide is a fragment of at least 15, 20, 50, 100, 150, or In a preferred embodiment, the 26583 nucleic acid has a more contiguous amino acids of SEQ ID NO:15. In a nucleotide sequence identical to, or Substantially identical preferred embodiment, the 26583 polypeptide is a fragment to, SEQ ID NO:14 or 16. In other embodiments, the 26583 of at least 213 contiguous amino acids of SEQ ID NO:15. nucleic acid is a fragment of at least 50, 100, 150, 200, 250, In a preferred embodiment, the 26583 nucleic acid has a 300, 350, 400, 450, 500, or more contiguous nucleotides of nucleotide sequence identical to, or Substantially identical 25 to, SEQ ID NO:14 or 16. In other embodiments, the 26583 SEQ ID NO:14 or 16. nucleic acid is a fragment of at least 50, 100, 150, 200, 250, In a preferred embodiment, the agent modulates (e.g., 300, 350, 400, 450, 500, or more contiguous nucleotides of increases or decreases) 26583 protein phosphatase activity, SEQ ID NO:14 or 16. e.g., serine/threonine phosphatase activity. In a preferred embodiment, the agent modulates (e.g., In a preferred embodiment, the agent modulates (e.g., 30 increases or decreases) protein phosphatase activity. increases or decreases) expression of the 26583 nucleic acid In a preferred embodiment, the agent modulates (e.g., by, e.g., modulating transcription, mRNA stability, etc. increases or decreases) expression of the 26583 nucleic acid In a preferred embodiment, the cell, e.g., the 26583 by, e.g., modulating transcription, mRNA stability, etc. expressing cell, is a lung cell, a breast cell, a colon cell, a In preferred embodiments, the agent is a , a liver cell, or a brain cell, e.g., a neuron or glial cell. 35 phosphopeptide, a small molecule, e.g., a member of a In a preferred embodiment, the cell, e.g., the 26583 combinatorial library, or an antibody, or any combination expressing cell, is a tumor cell, e.g., a lung, breast, colon, thereof. The antibody can be conjugated to a therapeutic liver, or brain tumor cell. moiety selected from the group consisting of a cytotoxin, a In a preferred embodiment, the cell, e.g., the 26583 cytotoxic agent and a radioactive metal ion. expressing cell, is further contacted with a protein, e.g., a 40 In additional preferred embodiments, the agent is an cytokine or a hormone. Exemplary proteins include, but are antisense, a ribozyme, or a triple helix molecule, oran 26583 not limited to, G-CSF, GM-CSF, stem cell factor, interleu nucleic acid, or any combination thereof. kin-3 (IL-3), IL-4, Flt-3 ligand, thrombopoietin, and eryth In a preferred embodiment, the agent is administered in ropoietin. The protein contacting step can occur before, at 45 combination with a cytotoxic agent. the same time, or after the agent is contacted. The protein In a preferred embodiment, the Subject is a human, e.g., contacting step can be effected in vitro or ex vivo. For a patient with a metabolic disorder, e.g., a mitochondrial example, the cell, e.g., the 26583-expressing cell is obtained related disorder or a cholesterol biosynthesis related disor from a Subject, e.g., a patient, and contacted with the protein der, e.g., hypo- or hypercholesterolemia, diabetes mellitus, ex vivo. The treated cell can be re-introduced into the 50 or a neurodegenerative disorder (e.g., Parkinson's, Hunting Subject. Alternatively, the protein contacting step can occur tons, or Alzheimer's disease). The subject can also be a in vivo. patient with a cell proliferation or differentiation disorder, In a preferred embodiment, the agent and the 26583 e.g., a tumor, e.g., a patient with a lung, breast, colon, liver, polypeptide or nucleic acid are contacted in vitro or ex vivo. or brain tumor. In other embodiments, the subject is a In a preferred embodiment, the contacting step is effected 55 non-human animal, e.g., an experimental animal. in vivo in a subject, e.g., as part of a therapeutic or In a preferred embodiment, the agent decreases the pro prophylactic protocol. Preferably, the subject is a human, liferation and/or enhances the differentiation of the cell, e.g., e.g., a patient with a metabolic disorder, e.g., a mitochon the 26583-expressing cell, e.g., the lung, breast, colon, liver, drial related disorder or a cholesterol biosynthesis related or brain cell. Such agents can be used to treat or prevent disorder, or a patient with a cell proliferation or differentia 60 cancers, e.g., liver, breast, brain, colon, or lung carcinomas. tion disorder, e.g., a tumor. For example, the Subject can be In a preferred embodiment, the disorder is a metabolic a cancer patient, e.g., a patient with a lung, breast, colon, disorder, e.g., a cholesterol synthesis disorder, e.g., hypo- or liver, or brain tumor. The subject can also be a patient with hypercholesterolemia; or a mitochondrial related disorder, diabetes mellitus or a neurodegenerative disorder (e.g., e.g., diabetes mellitus, or Parkinson's, Huntington's, or Parkinson's, Huntington's, or Alzheimer's disease). In other 65 Alzheimer's disease. embodiments, the Subject is a non-human animal, e.g., an In a preferred embodiment, the disorder is a cancer, e.g., experimental animal. a lung, breast, colon, liver, or brain cancer. US 7,282,360 B2 7 8 In a preferred embodiment, the method further includes agent (e.g., a test compound); and determining the effect of administering an effective amount of a protein, e.g., a the test compound on the activity of the polypeptide or cytokine or a hormone, to the Subject. Exemplary proteins nucleic acid to thereby identify a compound which modu include, but are not limited to, G-CSF, GM-CSF, stem cell lates the activity of the polypeptide or nucleic acid. factor, interleukin-3 (IL-3), IL-4, Flt-3 ligand, thrombopoi 5 In a preferred embodiment, the activity of the 26583 etin, and erythropoietin. The protein can be administered polypeptide is a protein phosphatase activity. before, at the same time or after, administration of the agent. In a preferred embodiment, the activity of the 26583 The administration of the agent and/or protein can be polypeptide is proliferation, differentiation, and/or survival repeated. of a cell, e.g., a 26583-expressing cell, e.g., a lung, breast, In still another aspect, the invention features a method for 10 colon, liver, or brain cell. evaluating the efficacy of a treatment of a disorder, in a In yet another aspect, the invention features a method of Subject. The method includes treating a subject with a treating or preventing a hematopoietic disorder, e.g., an protocol under evaluation; assessing the expression of a erythroid-associated disorder, in a subject. The method 26583 nucleic acid or 26583 polypeptide, such that a change includes administering to the Subject an effective amount of in the level of 26583 nucleic acid or 26583 polypeptide after 15 an agent that modulates the activity or expression of a 23436 treatment, relative to the level before treatment, is indicative polypeptide or nucleic acid Such that the hematopoietic of the efficacy of the treatment of the disorder. disorder is ameliorated or prevented. In preferred embodi In a preferred embodiment, the disorder is a metabolic ments, the agent is a peptide, a phosphopeptide, a small disorder, e.g., a cholesterol synthesis disorder, e.g., hypo- or molecule, e.g., a member of a combinatorial library, or an hypercholesterolemia; or a mitochondrial related disorder, antibody, or any combination thereof. The antibody can be e.g., diabetes mellitus, or Parkinson's, Huntington's, or conjugated to a therapeutic moiety selected from the group Alzheimer's disease. In a preferred embodiment, the disor consisting of a cytotoxin, a cytotoxic agent and a radioactive der is a cancer, e.g., a lung, breast, colon, liver, or brain metal ion. cancer. In a preferred embodiment, the Subject is a human. In another aspect, the invention features a method of In a preferred embodiment, the Subject is an experimental 25 modulating a hematopoietic disorder, e.g., an erythroid animal, e.g., an animal model for a metabolic disorder or associated disorder or a disorder of erythropoiesis, compris CaCC. ing contacting a hematopoietic cell, e.g., a blood cell. Such In a preferred embodiment, the method can further as an erythroid cell or erythroid-precursor, with a agent that include treating the Subject with a protein, e.g., a cytokine or increases or decreases the activity or expression of a 23436 a hormone. Exemplary proteins include, but are not limited 30 polypeptide or nucleic acid, thereby (a) ameliorating or to, G-CSF, GM-CSF, stem cell factor, interleukin-3 (IL-3), preventing the hematopoietic disorder and/or (b) modulating IL-4, Flt-3 ligand, thrombopoietin, and erythropoietin. the differentiation of the hematopoietic cell, e.g., the blood The invention also features a method of diagnosing a cell. disorder, e.g., a metabolic disorder or a cell proliferation/ The invention also provides assays for determining the differentiation disorder, e.g., cancer, in a subject. The 35 activity of or the presence or absence of 23436 polypeptides method includes evaluating the expression or activity of a or nucleic acid molecules in a biological sample, including 26583 nucleic acid or a 26583 polypeptide, such that, a for disease diagnosis. difference in the level of 26583 nucleic acid or 26.583 In one embodiment, the modulator of the 53070, 15985, polypeptide relative to a normal Subject or a cohort of 26583, 21953, m32404, 14089, or 23436 is an agent as normal subjects is indicative of the disorder. 40 described herein. In a preferred embodiment, the evaluating step occurs in In yet another aspect, the invention provides methods for vivo. For example, by administering to the Subject a detect modulating, e.g., inhibiting or increasing, the activity or ably labeled agent that interacts with the 26583 nucleic acid expression of a 53070-, 15985- 26583-, 21953-, m32404-, or polypeptide. Such that a signal is generated relative to the 14089-, or 23436-expressing cell, e.g., a hyper-proliferative level of activity or expression of the 26583 nucleic acid or 45 53070-, 15985- 26583-, 21953-, m32404-, 14089-, or polypeptide. 23436-expressing cell. The method includes contacting the In a preferred embodiment, the disorder is a metabolic cell with an agent, e.g., a compound (e.g., a compound disorder, e.g., a cholesterol synthesis disorder, e.g., hypo- or identified using the methods described herein) that modu hypercholesterolemia; or a mitochondrial related disorder, lates the activity, or expression, of the 53070, 15985, 26583, e.g., diabetes mellitus, or Parkinson's, Huntington's, or 50 21953, m32404, 14089, or 23436 polypeptide or nucleic Alzheimer's disease. In a preferred embodiment, the disor acid. der is a cancer, e.g., a lung, breast, colon, liver, or brain Preferably, the methods inhibit the proliferation or induce CaCC. the killing of a 53070-, 15985- 26583-, 21953-, m32404-, The invention also provides assays for determining the 14089-, or 23436-expressing cell, e.g., a hyper-proliferative activity of or the presence or absence of 26583 polypeptides 55 53070-, 15985- 26583-, 21953-, m32404-, 14089-, or or nucleic acid molecules in a biological sample, including 23436-expressing cell. for disease diagnosis. In a preferred embodiment, the contacting step is effective In further aspect, the invention provides assays for deter in vitro or ex vivo. In other embodiments, the contacting step mining the presence or absence of a genetic alteration in a is effected in Vivo, e.g., in a Subject (e.g., a mammal, e.g., a 26583 polypeptide or nucleic acid molecule, including for 60 human), as part of a therapeutic or prophylactic protocol. disease diagnosis. In a preferred embodiment, the cell is a hyperproliferative In yet another aspect, the invention features a method for cell, e.g., a cell found in a solid tumor, a Soft tissue tumor, identifying an agent, e.g., a compound, which modulates the or a metastatic lesion. For example, the 21953-expressing activity of a 26583 polypeptide, e.g., a 26583 polypeptide as cell is a lung, breast, ovary, prostate, or colon cell. In a described herein, or the expression of a 26583 nucleic acid, 65 preferred embodiment, the cell is lung cell. e.g., a 26583 nucleic acid as described herein, including In other embodiments, the 21953-expressing cell is a contacting the 26583 polypeptide or nucleic acid with a test neural cell (e.g., a neuronal or a glial cell), a vascular cell US 7,282,360 B2 10 (e.g., Smooth muscle or an endothelial cell), a heart cell, a In another aspect, the invention provides methods of prostatic cell, or an immune cell. Preferably, the tumor is a diagnosing or staging a disorder, e.g., proliferative disorder. sarcoma, a carcinoma, or an adenocarcinoma. Preferably, the The method includes: (i) identifying a subject having, or at m32404-expressing hyperproliferative cell is found in a risk of having, the disorder; (ii) obtaining a sample of a cancerous or pre-cancerous tissue, e.g., a cancerous or tissue or cell affected with the disorder; (iii) contacting said pre-cancerous tissue where an m32404 polypeptide or sample or a control sample with a labeled agent specific for nucleic acid is expressed, e.g., breast, ovarian, colon, liver, a 15985 polypeptide or nucleic acid, e.g., a probe or a lung, kidney, or brain cancer. Most preferably, the m32404 primer, under conditions that allow interaction of the labeled expressing hyperproliferative cell is found in a tumor from agent and the 15985 nucleic acid, e.g., cDNA, mRNA, or the breast, ovary, colon, liver and lung. 10 15985 protein to occur, and (iv) detecting formation of a In a preferred embodiment, the agent, e.g., the compound, complex. A statistically significant increase in the formation is an inhibitor of a 53070, 15985, 26583, 21953, m32404, of the complex between the labeled agent with respect to a 14089, or 23436 polypeptide. Preferably, the inhibitor is reference, e.g., a control sample, is indicative of the disorder chosen from a peptide, a phosphopeptide, a small organic or the stage of the disorder. The level of 15985 nucleic acid molecule, a small inorganic molecule and an antibody (e.g., 15 or polypeptide expression can be detected by any method an antibody conjugated to a therapeutic moiety selected described herein. Preferably, the labeled agent is directly or from a cytotoxin, a cytotoxic agent and a radioactive metal indirectly labeled with a detectable substance to facilitate ion). In another preferred embodiment, the agent, e.g., detection of the bound or unbound binding agent. Suitable compound, is an inhibitor of a 53070, 15985, 26583, 21953, detectable Substances include various , prosthetic m32404, 14089, or 23436 nucleic acid, e.g., an antisense, a groups, fluorescent materials, luminescent materials and ribozyme, or a triple helix molecule. The inhibitor can also radioactive materials. be a protease inhibitor or a derivative thereof, or a peptido In a further aspect, the invention provides methods for mimetic, e.g., a phosphonate analog of a peptide Substrate evaluating the efficacy of a treatment of a disorder, e.g., a Such as a prolyl peptide Substrate. In another preferred proliferative disorder or a differentiation disorder (e.g. in the embodiment, the compound is an inhibitor of a 21953 25 case of 21953, lung cancer, or a neuronal disorder). The nucleic acid, e.g., an antisense, a ribozyme, or a triple helix method includes: treating a Subject, e.g., a patient or an molecule. The inhibitor can also be a or a animal, with a protocol under evaluation (e.g., treating a derivative thereof, or a peptidomimetic, e.g., a phosphonate Subject with one or more of chemotherapy, radiation, and/or analog of a peptide Substrate. a compound identified using the methods described herein); 30 and evaluating the expression of a 53070, 15985, 26583, In another embodiment, the agent, e.g., the compound, is 21953, m32404, 14089, or 23436 nucleic acid or polypep administered in combination with a cytotoxic agent. tide before and after treatment. A change, e.g., a decrease or Examples of cytotoxic agents include anti-microtubule increase, in the level of a 53070, 15985, 26583, 21953, agent, a topoisomerase I inhibitor, a topoisomerase II inhibi m32404, 14089, or 23436 nucleic acid (e.g., mRNA) or tor, an anti-metabolite, a mitotic inhibitor, an alkylating 35 polypeptide after treatment, relative to the level of expres agent, an intercalating agent, an agent capable of interfering sion before treatment, is indicative of the efficacy of the with a signal transduction pathway, an agent that promotes treatment of the disorder. The level of 53070, 15985, 26583, apoptosis or necrosis, and radiation. 21953, m32404, 14089, or 23436 nucleic acid or polypep In another embodiment, the agent, e.g., compound, is an tide expression can be detected by any method described activator of a 53070, 15985, 26583,21953, m32404, 14089, 40 herein. or 23436 polypeptide. Preferably, the activator is chosen The invention also provides assays for determining the from a peptide, a phosphopeptide, a Small organic molecule, activity of or the presence or absence of m32404 polypep a Small inorganic molecule and an antibody. In yet another tides or nucleic acid molecules in a biological sample, embodiment, the compound stimulates the expression of a including for disease diagnosis. Preferably, the biological 53070, 15985, 26583, 21953, m32404, 14089, or 23436 45 sample includes a cancerous or pre-cancerous cell or tissue. nucleic acid. For example, the cancerous tissue can be a solid tumor, a soft In another aspect, the invention features methods for tissue tumor, or a metastatic lesion. Preferably, the cancer treating or preventing a disorder characterized by aberrant ous tissue is a sarcoma, a carcinoma, or an adenocarcinoma. cellular proliferation or differentiation of a 53070-, 15985-, Preferably, the cancerous tissue is from the breast, ovarian, 26583-, m32404-, 14089-, or 23436-expressing cell, in a 50 colon, lung, liver, kidney, or brain. Subject. Preferably, the method includes comprising admin In a further aspect the invention provides assays for istering to the Subject (e.g., a mammal, e.g., a human) an determining the presence or absence of a genetic alteration effective amount of a compound (e.g., a compound identified in an m32404 polypeptide or nucleic acid molecule in a using the methods described herein) that modulates the sample, for, e.g., disease diagnosis. Preferably, the sample activity, or expression, of the 53070, 15985, 26583, 21953, 55 includes a cancer cell or tissue. For example, the cancer can m32404, 14089, or 23436 polypeptide or nucleic acid. In a be a solid tumor, a soft tissue tumor, or a metastatic lesion. preferred embodiment, the disorder is a cancerous or pre Preferably, the cancer is a sarcoma, a carcinoma, or an cancerous condition, e.g. in the case of 21953, relating to adenocarcinoma. Preferably, the cancer is a breast, ovarian, proliferation of a lung, breast, ovary, prostate, or colon cell. colon, lung, liver, kidney, or brain cancer. In another preferred embodiment, the disorder is an immune, 60 In a still further aspect, the invention provides methods a neuronal, cardiovascular, reproductive disorder, e.g., a for evaluating the efficacy of a treatment of a disorder, e.g., disorder relating to aberrant processing of a polypeptide proliferative disorder, e.g., cancer (e.g., breast, ovarian, hormone. Preferably, the cancer is found in a tissue where an colon, liver or lung cancer). The method includes: treating a m32404 polypeptide or nucleic acid is expressed, e.g., Subject, e.g., a patient or an animal, with a protocol under breast, ovarian, colon, liver, lung, kidney, or brain cancer. 65 evaluation (e.g., treating a subject with one or more of Most preferably, the cancer is found in the breast, ovary, chemotherapy, radiation, and/or a compound identified colon, liver and lung. using the methods described herein); and evaluating the US 7,282,360 B2 11 12 expression of an m32404 nucleic acid or polypeptide before a sample by contacting the sample to the aforementioned and after treatment. A change, e.g., a decrease or increase, in array and detecting binding of the sample to the array. the level of an m32404 nucleic acid (e.g., mRNA) or Other features and advantages of the invention will be polypeptide after treatment, relative to the level of expres apparent from the following detailed description, and from sion before treatment, is indicative of the efficacy of the the claims. treatment of the disorder. In a preferred embodiment, the disorder is a cancer of the BRIEF DESCRIPTION OF THE DRAWINGS breast, ovary, colon, lung, or liver. The level of m32404 nucleic acid or polypeptide expression can be detected by FIG. 1 depicts a hydropathy plot of human 53070. Rela any method described herein. 10 tive hydrophobic residues are shown above the dashed In a preferred embodiment, the evaluating step includes horizontal line, and relative hydrophilic residues are below obtaining a sample (e.g., a tissue sample, e.g., a biopsy, or the dashed horizontal line. Numbers corresponding to posi a fluid sample) from the subject, before and after treatment tions in the amino acid sequence of human 53070 are and comparing the level of expressing of a 53070, 15985, indicated. Polypeptides of the invention include fragments 26583, 21953, m32404, 14089, or 23436 nucleic acid (e.g., 15 which include: all or part of a hydrophobic sequence, i.e., a mRNA) or polypeptide before and after treatment. sequence above the dashed line, e.g., the sequence from In another aspect, the invention provides methods for about amino acid 63 to 73, from about 86 to 102, and from evaluating the efficacy of a therapeutic or prophylactic agent about 199 to 216 of SEQ ID NO:2; all or part of a (e.g., an anti-neoplastic agent). The method includes: con hydrophilic sequence, i.e., a sequence below the dashed line, tacting a sample with an agent (e.g., a compound identified e.g., the sequence of from about amino acid 103 to 119, from using the methods described herein, a cytotoxic agent) and, about 226 to 247, and from about 301 to 329 of SEQ ID evaluating the expression of 53070, 15985, 26583, 21953, NO:2. m32404, 14089, or 23436 nucleic acid or polypeptide in the FIG. 2 depicts an alignment of the protein kinase domain sample before and after the contacting step. A change, e.g., of human 53070 with a consensus amino acid sequence a decrease or increase, in the level of 53070, 15985, 26583, 25 derived from a hidden Markov model (HMM) from PFAM. 21953, m32404, 14089, or 23436 nucleic acid (e.g., mRNA) The upper sequence is the consensus amino acid sequence or polypeptide in the sample obtained after the contacting (SEQ ID NO:4), while the lower amino acid sequence step, relative to the level of expression in the sample before corresponds to amino acids 12 to 272 of SEQ ID NO:2. the contacting step, is indicative of the efficacy of the agent. FIG. 3 depicts an alignment of the serine/threonine pro The level of 53070, 15985, 26583, 21953, m32404, 14089, 30 tein kinase domain of human 53070 with a consensus amino or 23436 nucleic acid or polypeptide expression can be acid sequence derived from a hidden Markov model (HMM) detected by any method described herein. In a preferred from SMART. The upper sequence is the consensus amino embodiment, the sample includes cells obtained from a acid sequence (SEQID NO:5), while the lower amino acid cancerous tissue. The cancerous tissue can include, for 35 sequence corresponds to amino acids 12 to 272 of SEQ ID example in the case of 21953, cells of lung, breast, ovary, NO:2. prostate, or colon. In a preferred embodiment, the sample FIG. 4 depicts a hydropathy plot of human 15985. Rela includes cells obtained from a cancerous tissue where an tive hydrophobic residues are shown above the dashed m32404 polypeptide or nucleic acid is obtained, e.g., a horizontal line, and relative hydrophilic residues are below cancer of the breast, ovary, colon, lung, or liver. 40 the dashed horizontal line. The cysteine residues (cys) are The invention also provides assays for determining the indicated by short vertical lines just below the hydropathy activity of or the presence or absence of 14089 polypeptides trace. The numbers corresponding to the amino acid or nucleic acid molecules in a biological sample, including sequence of human 15985 are indicated. Polypeptides of the for disease diagnosis. invention include fragments which include: all or part of a In further aspect, the invention provides assays for deter 45 hydrophobic sequence, i.e., a sequence above the dashed mining the presence or absence of a genetic alteration in a line, e.g., the sequence from about amino acid 83 to 91, from 53070, 15985, 26583, 21953, m32404, 14089, or 23436 about 465 to 472, and from about 568 to 585 of SEQ ID polypeptide or nucleic acid molecule, including for disease NO:8; all or part of a hydrophilic sequence, i.e., a sequence diagnosis, or, in the case of 23436, a disease Susceptibility below the dashed line, e.g., the sequence of from about (e.g., Susceptibility to prostate cancer and/or brain cancer). 50 amino acid 8 to 20, from about 592 to 600, and from about In a still further aspect, the 21953 invention features a 652 to 672 of SEQ ID NO:8; a sequence which includes a method of processing a polypeptide hormone precursor, e.g., Cys, or a site. in vitro. FIG. 5 depicts an alignment of the protein kinase domain In another aspect, the invention features a two dimen of human 15985 with a consensus amino acid sequence sional array having a plurality of addresses, each address of 55 derived from a hidden Markov model (HMM) from PFAM. the plurality being positionally distinguishable from each The upper sequence is the consensus amino acid sequence other address of the plurality, and each address of the (SEQ ID NO:10), while the lower amino acid sequence plurality having a unique capture probe, e.g., a nucleic acid corresponds to amino acids 394 to 651 of SEQ ID NO:8. or peptide sequence. At least one address of the plurality has FIGS. 6A-6B depicts an alignment of the doublecortin a capture probe that recognizes a 53070, 15985, 26583, 60 repeats of human 15985 with a consensus amino acid 21953, m32404, 14089, or 23436 molecule. In one embodi sequence derived from a hidden Markov model (HMM) ment, the capture probe is a nucleic acid, e.g., a probe from SMART. A. The upper sequence is the consensus complementary to a 53070, 15985, 26583, 21953, m32404, amino acid sequence (SEQ ID NO:11), while the lower 14089, or 23436 nucleic acid sequence. In another embodi amino acid sequence corresponds to the first doublecortin ment, the capture probe is a polypeptide, e.g., an antibody 65 repeat of human 15985, amino acids 67 to 158 of SEQ ID specific for 53070, 15985, 26583,21953, m32404, 14089, or NO:8. B. The upper sequence is the consensus amino acid 23436 polypeptides. Also featured is a method of analyzing sequence (SEQ ID NO:11), while the lower amino acid US 7,282,360 B2 13 14 sequence corresponds to the second doublecortin repeat of ID NO:23), to the 21953 amino acid sequence. The * symbol human 15985, amino acids 192 to 280 of SEQ ID NO:8. indicates identities, and the: or, symbols indicate similari FIG. 7 depicts an alignment of the protein kinase domain ties. The alignment was generated by ClustalW (Thompson of human 15985 with a consensus amino acid sequence for et al. (1994) Nucleic Acids Res. 22:4673-4680). serine/threonine protein derived from a hidden FIG. 15 depicts a hydropathy plot of human m32404. Markov model (HMM) from SMART. The upper sequence Relative hydrophobic residues are shown above the dashed is the consensus amino acid sequence (SEQ ID NO:12), horizontal line, and relative hydrophilic residues are below while the lower amino acid sequence corresponds to the the dashed horizontal line. The cysteine residues (cys) are protein kinase domain of human 15985, amino acids 394 to indicated by short vertical lines just below the hydropathy 651 of SEQ ID NO:8. 10 trace. The numbers corresponding to the amino acid FIG. 8 depicts an alignment of the doublecortin repeats of sequence of human m32404 are indicated. Polypeptides of human 15985 with a consensus amino acid sequence derived the invention include fragments which include: all or part of from a ProDom family PD024506 (ProDomain Release a hydrophobic sequence, i.e., a sequence above the dashed 2000.1). The lower sequence is the consensus amino acid line, e.g., the sequence from about amino acid 320 to 340, sequence (SEQ ID NO:13), while the upper amino acid 15 and from about 450-470, of SEQ ID NO:25; all or part of a sequence corresponds to the doublecortin repeats of human hydrophilic sequence, i.e., a sequence below the dashed line, 15985, amino acids 42 to 291 of SEQ ID NO:8. e.g., the sequence from about amino acid 30 to 60 of SEQ FIG. 9 depicts a hydropathy plot of human 26583. Rela ID NO:25; a sequence which includes a Cys, or a glycosy tive hydrophobic residues are shown above the dashed lation site. horizontal line, and relative hydrophilic residues are below FIGS. 16A-16B depict alignments of the trypsin domains the dashed horizontal line. The cysteine residues (cys) are of human m32404 with a consensus amino acid sequence indicated by short vertical lines just below the hydropathy derived from a hidden Markov model (HMM) from PFAM. trace. The numbers corresponding to the amino acid The upper sequence is the consensus amino acid sequence sequence of human 26583 are indicated. Polypeptides of the (SEQ ID NO:27), while the lower amino acid sequence invention include fragments which include: all or part of a 25 corresponds to amino acids 45 to 268 of SEQ ID NO:25 hydrophobic sequence, i.e., a sequence above the dashed (FIG. 16A) or upper sequence is the consensus amino acid line, e.g., the sequence of 262-279; all or part of a hydro sequence (SEQ ID NO:28), while the lower amino acid philic sequence, i.e., a sequence below the dashed line, e.g., sequence corresponds to to amino acids 311 to 520 of SEQ the sequence of 60-70; a sequence which includes a Cys, or ID NO:25 (FIG. 16B). a glycosylation site. 30 FIGS. 17A-17B depict alignments of the trypsin domains FIGS. 10A-10B depict alignments of human 26583 amino of human m32404 with a consensus amino acid sequence for acid sequence with a consensus amino acid sequence a model trypsin domain from SMART. The upper sequence derived from protein phosphatase 2C (PP2C) (FIG. 10A) is the consensus amino acid sequence (SEQ ID NO:29), and protein phosphatase 2C 4 (PP2C 4) (FIG. 10B). In while the lower amino acid sequence corresponds to amino FIG. 10A, the upper sequence is the consensus amino acid 35 acids 38 to 268 of SEQ ID NO:25 (FIG. 17A) or to amino sequence (SEQID NO:17) for PP2C, while the lower amino acids 300 to 520 of SEQ ID NO:25 (FIG. 17B). acid sequence corresponds to amino acids 173 to 461 of SEQ FIG. 18 depicts a hydropathy plot of human 14089. ID NO:15. In FIG. 10B, the upper sequence is the consensus Relative hydrophobic residues are shown above the dashed amino acid sequence (SEQ ID NO:18) for PP2C 4, while horizontal line, and relative hydrophilic residues are below the lower amino acid sequence corresponds to amino acids 40 the dashed horizontal line. Cysteine (cys) residues are noted 99 to 522 of SEQ ID NO:15. by short vertical lines just below the hydropathy trace. The FIG. 11 shows a bar graph depicting relative 26583 numbers corresponding to the amino acid sequence of mRNA expression as determined by TaqMan assays on human 14089 are indicated. Polypeptides of the invention mRNA derived from the following tissue samples. Columns include fragments that include: all or part of a hydrophobic are numbered at five-column intervals at the bottom of the 45 sequence, i.e., a sequence above the dashed line, e.g., the Figure (i.e., columns 1-42), and correspond to the following: sequence from about amino acid 35 to 55, from about 58 to columns 1-3, normal breast; columns 4-10, breast tumor; 70, and from about 175 to 184 of SEQID NO:34: all or part columns 11-13, normal lung; columns 14-20, lung tumor; of a hydrophilic sequence, i.e., a sequence below the dashed columns 21-23, normal colon; columns 24-31, colon tumor; line, e.g., the sequence of from about amino acid 71 to 79, columns 32-35, colon metastases; columns 36-37, normal 50 from about 161 to 171, and from about 185 to 192 of SEQ liver; columns 38-39, normal brain; columns 40-42, brain ID NO:34, a sequence which includes a Cys, or a glycosy tumor. lation site. FIG. 12 depicts a hydropathy plot of human 21953. FIGS. 19 A-19B depict alignments of the trypsin domain Relative hydrophobic residues are shown above the dashed of human 14089 with a consensus amino acid sequence horizontal line, and relative hydrophilic residues are below 55 derived from a hidden Markov model (HMM) from PFAM the dashed horizontal line. Numbers corresponding to posi (3A) and SMART (3B). The upper sequences are the con tions in the amino acid sequence of human 21953 are sensus amino acid sequences (SEQ ID NO:36 and SEQ ID indicated. NO:37), while the lower amino acid sequence corresponds FIG. 13 depicts an alignment of the prolyl oligopeptidase to amino acids 41 to 234 of SEQID NO:34 and amino acids domain of human 21953 with a consensus amino acid 60 24 to 234 of SEQ ID NO:34 (FIGS. 19A and 19B, respec sequence derived from a hidden Markov model for prolyl tively). oligopeptidase domains. The upper sequence is the consen FIGS. 20A-20B depict a BLAST alignment of the serine sus amino acid sequence (SEQ ID NO:22), while the lower protease domain of human 14089 with a consensus amino acid sequence corresponds to amino acids 672 to 744 amino acid sequence derived from ProDomain No. 46 of SEQ ID NO:20. 65 (Release 1999.2; see also ProDom family PD00000046 FIGS. 14A-14B depict an alignment of human dipeptidyl (ProDomain Release 2000.1). FIG. 20A: The lower peptidase IV (Accession Number P4.8147) (upper line, SEQ sequence is the consensus amino acid sequence (SEQ ID US 7,282,360 B2 15 16 NO:38), while the upper amino acid sequence corresponds Molta; (38) Hep3b Normal; and (39) Hep3b Hyp. Erythroid to the serine protease Zymogen domain of human 14089. K562 cells (34), erythroid cells (26), and fetal liver cells (3) about amino acids 72 to 234 of SEQ ID NO:34. FIG. 20B: have elevated 2343.6 mRNA expression levels. The lower sequence is the consensus amino acid sequence FIG. 26 is a bar graph depicting relative 2343.6 mRNA (SEQ ID NO:39), while the upper amino acid sequence 5 expression as determined by TaqMan assays on mRNA corresponds to the serine protease Zymogen domain of derived from the following cell types: (1) Lung; (2) Colon human 14089, about amino acids 41 to 109 of SEQ ID 60; (3) Kidney 58; (4) Liver NDR 200; (5) Fetal Liver 425; NO:34. (6) Skeletal Muscle 167; (7) mBone Marrow MNC LP139; FIG. 21 depicts a hydropathy plot of human 23436. (8) mBone Marrow CD34+ LP92; (9) mBone Marrow Relative hydrophobic residues are shown above the dashed 10 CD34+LP143; (10) mPB CD34+ LF70; (11) mPB CD34+ horizontal line, and relative hydrophilic residues are below LP152; (12) Bone Marrow CD34+LF68: (13) Bone Marrow the dashed horizontal line. The cysteine residues (cys) are CD34+ LF154: (14) Cord Blood CD34+ LP121; (15) Bone indicated by short vertical lines just below the hydropathy Marrow GPA+: (16) Bone Marrow GPA+ LP34-1: (17) trace. The numbers corresponding to the amino acid Bone Marrow GPA Lo LP69; (18) Bone Marrow GPA Lo sequence of human 23436 are indicated. Polypeptides of the 15 LP82; (19) Bone Marrow CD41+ CD14-LP78; (20) mBone invention include fragments which include: all or part of a Marrow CD15+ LP15; (21) mBone Marrow CD15+ hydrophobic sequence, i.e., a sequence above the dashed CD11b- LP7-4; (22) mBone Marrow CD15+ CD11b+ line, e.g., the sequence from about amino acid 103 to 114, LP15-2: (23) Bone Marrow CD15+ CD11b- LP23-2: (24) from about 285 to 297, and from about 413 to 420 of SEQ Bone Marrow CD15+ CD34- LP27-2: (25) Bone Marrow ID NO:41; all or part of a hydrophilic sequence, i.e., a CD15+ CD34- LP41-1; (26) Erythroid 24 hr LF90; (27) sequence below the dashed line, e.g., the sequence of from Erythroid 48 hr LF73; (28) Erythroid 48 hr LF76; (29) about amino acid 76 to 87, from about 138 to 143, and from Erythroid 48 hr LF90: (30) Erythroid d6 LP31-1; (31) about 458 to 478 of SEQ ID NO:41; a sequence which Erythroid d7 LF24-5; (32) Erythroid d10 LP25-4; (33) includes a Cys, or a glycosylation site. Erythroid d12 LF23-8; (34) Erythroid d12 LF24-10; (35) FIGS. 22A-22B depict alignment of the ubiquitin car 25 Erythroid d14 GPA+LP31-4; (36) Meg 48 hr LFT6: (37) boxy-terminal hydrolase (family 2) domain of human 23436 Meg 48 hr LFT90: (38) Meg d7 LP31-2; (39) Meg d12 with consensus amino acid sequences derived from a hidden LF102; (40) Meg d12 LF35; and (41) Meg d14 LP31-5. Markov model (H) from PFAM. The consensus sequence for Fetal Liver (5) and day 12 erythroid cells (33) and (34) have the ubiquitin carboxy-terminal hydrolase (family 2) domain elevated 2343.6 mRNA expression levels. comprises two non-contiguous segments, UCH-1 and UCH 30 FIG. 27 is a bar graph depicting 23436 expression in 2. FIG. 22A depicts the alignment of human 23436 with the human prostate, hypothalamus, lung, bone marrow, differ UCH-1 segment of the ubiquitin carboxy-terminal hydrolase entiated osteoblasts, and aorta cells as assessed by Taq Man (family 2) domain. The upper sequence is the consensus analysis. Elevated expression is observed in Some prostate, amino acid sequence (SEQ ID NO:43), while the lower hypothalamus, and bone marrow cells. Relative expression amino acid sequence corresponds to amino acids 89 to 120 35 levels were determined by normalizing against a trachea of SEQID NO:41. FIG.22B depicts the alignment of human control. 23436 with the UCH-2 segment of the ubiquitin carboxy FIG. 28 is a bar graph depicting 23436 expression in terminal hydrolase (family 2) domain. The upper sequence human liver, several hepatoma cell lines (HepG2) and is the consensus amino acid sequence (SEQ ID NO:44), ganglia, as assessed by TaqMan analysis. Elevated expres while the lower amino acid sequence corresponds to amino 40 sion is observed in hepatoma cells (HepG2 cell line). Rela acids 332 to 420 of SEQ ID NO:41. tive expression levels were determined by normalizing FIG. 23 is a bar graph depicting relative 23436 mRNA against a trachea control. expression as determined by TaqMan assays on mRNA FIG. 29 is a bar graph depicting 23436 expression as derived from human hematological cell lines treated for determined by TaqMan assays on mRNA derived from the various times with transforming growth factor-fi (TGF-B) 45 following cell types: (1) brain; (2) brain cortex; (3) breast; and VPA. Erythroid lineage precursors have elevated 23436 (4) colon tumor; (5) heart; (6) kidney; (7) liver norm; (8) expression levels. Expression is reduced by TGF-B treat liver fib; (9) lung tumor; (10) ovary; (11) fetal liver; (12) ment. mBM CD34+ LP92; (13) mBM CD34+ LP143: (14) mPB FIG. 24 is a bar graph depicting relative 23436 mRNA CD34+ LF70; (15) mPB CD34+ LF 162; (16) BM CD34+ expression as determined by TaqMan assays on mRNA 50 LF93: (17) BM CD34+ LP154; (18) Cord Blood CD34+ derived from human hematological cells including neutro LF101; (19) GPA+ High LP34-1; (20) GPA+ High 69; (21) phils, platelets, blood forming units (BFU), and TGFB GPA+ High 74; (22) GPA+ Low LP69; (23) GPA+ Low treated hematopoietic precursors. BFU's treated with eryth LP82; (24) Ery 24 hr LF 102: (25) Ery 48 h LF87; (26) Ery ropoietin (EPO) have elevated 23436 expression levels. 48 h LF 102: (27) Ery 48 h LF72; (28) Ery d6 LP31-1; (29) FIG. 25 is a bar graph depicting relative 23436 mRNA 55 Ery d6 LF113; (30) Ery d7 LF24-5; (31) Ery d8 LF113; (32) expression as determined by TaqMan assays on mRNA Ery d10 LP24-4; (33) Ery d12 LF23-8; (34) Ery d12 derived from the following cell types: (1) lung; (2) kidney; LF24-10; (35) Ery d12 LF113; (36) Ery d14 GPA+ LP31-4: (3) fetal liver; (4) grans.; (5) NHDF mock; (6) NHDF TGF; (37) BFU d7 LP79; (38) BFU d7 LP95; (39) BFU d7+3 Epo (7) NHLF mock; (8) NHLF TGF; (9) NC Heps: (10) Pass LP81; and (40) BFU d7+3 Epo LP104. Stell; (11) Liver CHT 339; (12) Liver NDR 191; (13) LF 60 NDR 079; (14) Lymph Node: (15) Tho 046 6h: (16) Th1 046 DETAILED DESCRIPTION OF THE 6h: (17) Th2 046 6h: (18) CD8; (19) CD 14; (20) PBMC INVENTION Rest; (21) MBM MNC; (22) MPB CD34; (23) ABM CD34; (24) Cord Blood; (25) Erythroid cells; (26) Megakaryocytes; Phosphate tightly associated with protein has been known (27) Neutrophil d14; (28) CD15+/CD14- cells: (29) MBM 65 since the late nineteenth century. The occurrence of phos CD11b-; (30) BMGPA; (31) VZV mock; (32) VZV 18h; phorylated proteins implies the existence of one or more (33) VZV 72h; (34) K562; (35) NTC; (36) HL60: (37) protein kinases capable of phosphorylating amino acid resi US 7,282,360 B2 17 18 dues on proteins, and also of protein phosphatases capable five predicted Protein Kinase C phosphorylation sites of hydrolyzing phosphorylated amino acid residues on pro (PS00005) located at about amino acid residues 31 to 33, teins. 158 to 160, 166 to 168, 290 to 292, and 304 to 306 of SEQ Protein kinases play critical roles in the regulation of ID NO:2: biochemical and morphological changes associated with 5 three predicted Casein Kinase II phosphorylation sites cellular growth and division (D'Urso, G. et al. (1990) (PS00006) located at about amino acid residues 310 to 313, Science 250: 786-791; Birchmeier. C. et al. (1993) Bioes 326 to 329, and 349 to 352 of SEQ ID NO:2; and says 15:185-189). They serve as growth factor receptors and one predicted N-myristylation sites (PS00008) from about signal transducers and have been implicated in cellular amino acid residues 15 to 20 of SEQ ID NO:2. transformation and malignancy (Hunter, T. et al. (1992) Cell 10 For general information regarding PFAM identifiers, PS 70: 375-387: Posada, J. et al. (1992) Mol. Biol. Cell 3: prefix and PF prefix domain identification numbers, refer to: 583-592; Hunter, T. et al. (1994) Cell 79: 573-582). For Sonnhammer et al. (1997) Protein 28:405-420; example, protein kinases have been shown to participate in The 53070 protein contains a significant number of struc the transmission of signals from growth-factor receptors tural characteristics in common with members of the protein (Sturgill, T. W. et al. (1988) Nature 344: 715-718; Gomez, 15 kinase family, and in particular the serine/threonine protein N. et al. (1991) Nature 353: 170-173), control of entry of kinase subfamily. The term “family' when referring to the cells into mitosis (Nurse, P. (1990) Nature 344: 503-508: protein and nucleic acid molecules of the invention means Maller, J. L. (1991) Curr. Opin. Cell Biol. 3: 269-275) and two or more proteins or nucleic acid molecules having a regulation of actin bundling (Husain-Chishti, A. et al. (1988) common structural domain or motif and having Sufficient Nature 334: 718-721). amino acid or nucleotide sequence homology as defined Protein kinases can be divided into two main groups herein. Such family members can be naturally or non based on either amino acid sequence similarity or specificity naturally occurring and can be from either the same or for either serine/threonine or tyrosine residues. A small different species. For example, a family can contain a first number of dual-specificity kinases are structurally like the protein of human origin as well as other distinct proteins of serine/threonine-specific group. Within the broad classifica 25 human origin, or alternatively, can contain homologues of tion, kinases can be further sub-divided into families whose non-human origin, e.g., rat or mouse proteins. Members of members share a higher degree of catalytic domain amino a family can also have common functional characteristics. acid sequence identity and also have similar biochemical Protein kinase family members are characterized by a properties. Most protein kinase family members also share common fold, which includes a small lobe associated pri structural features outside the kinase domain that reflect 30 marily with binding ATP and a large lobe associated prima their particular cellular roles. These include regulatory rily with binding Substrate peptides and catalyzing the domains that control kinase activity or interaction with other transfer of phosphate from ATP to substrate. Bases on proteins (Hanks, S. K. et al. (1988) Science 241: 42-52). sequence similarity, the kinase domain has been divided into The human 53070 sequence (see SEQID NO:1), which is eleven distinct regions, or Subdomains, and within these approximately 1704 nucleotides long including untranslated 35 eleven Subdomains there are a large number of amino acid regions, contains a predicted methionine-initiated coding residues that are considered “invariant, or highly con sequence of about 1104 nucleotides, including the termina served, amongst members of the protein kinase family. As tion codon. The coding sequence encodes a 367 amino acid used herein, an amino acid is “invariant' if it is present in the protein (see SEQ ID NO:2). equivalent position, as determined by a sequence alignment, 40 in 95% or more of the members of family. For example, in Human 53070 contains the following regions or other Subdomain 1 of kinase domain family members there are structural features: two invariant glycine residues and an invariant resi a protein kinase domain (PFAM accession number due; in Subdomain 2 there is an invariant residue; in PF00069) located at about amino acid residues 12 to 272 of Subdomain 3 there is an invariant glutamic acid residue; in SEQ ID NO:2: 45 Subdomain 6 there is an invariant residue and thirteen highly conserved amino acid residues typically an invariant asparagine residue; in Subdomain 7 there are present in members of the protein kinase family, including three invariant residues adjacent to one another, consisting a glycine residue located at about amino acid residue 19 of of the sequence aspartic acid, phenylalanine, and glycine; in SEQID NO:2, a glycine residue located at about amino acid Subdomain 8 there is an invariant glutamic acid residue; in residue 21 of SEQID NO:2, a valine residue located at about 50 Subdomain 9 there is an invariant aspartic acid residue and amino acid residue 26 of SEQ ID NO:2, a lysine residue an invariant glycine; and in Subdomain 11 there is an located at about amino acid residue 41 of SEQ ID NO:2, a invariant residue. An alignment of protein kinase glutamic acid residue located at about amino acid residue 60 family members that includes a description of the eleven of SEQ ID NO:2, an aspartic acid residue located at about Subdomains and the invariant residues found within each amino acid residue 136 of SEQ ID NO:2, an asparagine 55 subdomain can be found in Hanks et al. (1988), Science residue located at about amino acid residue 141 of SEQ ID 241:42-52, the contents of which are incorporated herein by NO:2, an aspartic acid residue located at about amino acid reference. residue 154 of SEQ ID NO:2, a phenylalanine residue Structural analyses of the kinase domains of several located at about amino acid residue 155 of SEQID NO:2, a different proteins have been performed, and the function of glutamic acid residue located at about amino acid residue 60 the invariant amino acid residues can be assigned accord 185 of SEQ ID NO:2, an aspartic acid residue located at ingly. The invariant glycines of Subdomain 1 are part of a about amino acid residue 198 of SEQ ID NO:2, a glycine loop that anchors the 0-phosphate of ATP, while the invari residue located at about amino acid residue 203 of SEQ ID ant valine of Subdomain 1 forms part of the adenine binding NO:2, and an arginine residue located at about amino acid pocket. The invariant lysine of subdomain 2 also helps the residue 260 of SEQ ID NO:2: 65 kinase domain bind ATP by interacting with both the I- and one serine/threonine signature motif 0-phosphate groups of ATP. The invariant aspartic acid (PS00108), located at about amino acid residues 132 to 144: residue of subdomain 6 catalyzes the transfer of the K-phos US 7,282,360 B2 19 20 phate group of ATP to the substrate. The invariant aspartic amino acid sequence of human 53070 at about residues 12 acid residue in Subdomain 7 binds to a magnesium ion which to 272 of SEQ ID NO:2 (see FIG. 2). is required for the catalytic activity of the kinase domain. To identify the presence of a “serine/threonine protein And finally, the invariant aspartic acid of subdomain 9 kinase domain in a 53070 protein sequence, the amino acid stabilizes the position of the catalytic loop, located in sequence of the protein can be searched against a SMART subdomain 7. A more extensive description of the structures database (Simple Modular Architecture Research Tool) of of protein kinase domains and the function of the invariant HMMs as described in Schultz et al. (1998), Proc. Natl. residues can be found in Taylor and Radzio-Andzelm Acad. Sci. USA 95:5857 and Schultz et al. (200) Nucl. Acids (1994), Structure 2:345-55, the contents of which are incor Res 28:231. The database contains domains identified by porated herein by reference. 10 profiling with the hidden Markov models of the HMMer2 A 53070 polypeptide can include a “protein kinase search program (R. Durbinet al. (1998) Biological sequence domain or regions homologous with a “protein kinase analysis: probabilistic models of proteins and nucleic acids. domain'. Cambridge University Press). The database also is exten sively annotated and monitored by experts to enhance accu As used herein, the term “protein kinase domain includes 15 racy. A search was performed against the HMM database an amino acid sequence of about 225 to 350 amino acid resulting in the identification of a “serine/threonine protein residues in length and having a bit score for the alignment kinase domain in the amino acid sequence of human 53070 of the sequence to the protein kinase domain profile (PFAM at about residues 12 to 272 of SEQ ID NO:2 (see FIG. 3). HMM) of at least 150. Preferably, a protein kinase domain In one embodiment, a 53070 protein includes at least one, includes an amino acid sequence of about 225 to 350 amino preferably two, three, four, five, six, seven, eight, nine, ten, acid residues in length and having a bit score for the eleven, twelve, or even more preferably thirteen of the alignment of the sequence to the serine/threonine kinase invariant residues present in protein kinase family members, domain profile (SMART HMM) of at least 150. Even more selected form the group consisting of a glycine residue preferably, a protein kinase domain includes at least about located at about amino acid residue 19 of SEQ ID NO:2, a 230 to 325 amino acids, more preferably about 235 to 300 25 glycine residue located at about amino acid residue 21 of amino acid residues, or about 240 to 280 amino acids and SEQID NO:2, a valine residue located at about amino acid has a bit score for the alignment of the sequence to the residue 26 of SEQID NO:2, alysine residue located at about serine/threonine protein kinase domain (SMART HMM) of amino acid residue 41 of SEQ ID NO:2, a glutamic acid at least 200, 250, 280, or greater. The protein kinase domain residue located at about amino acid residue 60 of SEQ ID (HMM) has been assigned the PFAM identifier PF00069, 30 NO:2, an aspartic acid residue located at about amino acid and the serine/threonine protein kinase domain (HMM) has residue 136 of SEQID NO:2, an asparagine residue located been given the SMART identifier S TKc. An alignment of at about amino acid residue 141 of SEQID NO:2, an aspartic the protein kinase domain (amino acids 12 to 272 of SEQID acid residue located at about amino acid residue 154 of SEQ NO:2) of human 53070 with the PFAM consensus amino ID NO:2, a phenylalanine residue located at about amino acid sequence (SEQ ID NO:4) derived from a hidden 35 acid residue 155 of SEQ ID NO:2, a glutamic acid residue Markov model is depicted in FIG. 2, and with the SMART located at about amino acid residue 185 of SEQID NO:2, an serine/threonine protein kinase domain consensus amino aspartic acid residue located at about amino acid residue 198 acid sequence (SEQ ID NO:5) derived from a hidden of SEQID NO:2, a glycine residue located at about amino Markov model is depicted in FIG. 3. acid residue 203 of SEQ ID NO:2, and an arginine residue In a preferred embodiment, a 53070 polypeptide or pro 40 located at about amino acid residue 260 of SEQ ID NO:2. tein has a “protein kinase domain or a region which In one embodiment, a 53070 protein includes at least one includes at least about 230 to 325 more preferably about 235 serine/threonine protein kinase active-site signature motif to 300, or 240 to 280 amino acid residues and has at least (PS00108), located at about amino acid residues 132 to 144 about 85%, 90%, 95%, 99%, or 100% homology with a of SEQID NO:2. As used herein, the term “serine/threonine “protein kinase domain, e.g., the protein kinase domain of 45 protein kinase active-site signature motif includes a human 53070 (e.g., residues 12 to 272 of SEQ ID NO:2). sequence of at least 8 amino acid residues defined by the To identify the presence of a “protein kinase domain in sequence: LIVMFYC-X-HY-X-D-LIVMFY-K-X-X- a 53070 protein sequence, and make the determination that N-ILIVMFYCT-ILIVMFYCT-ILIVNFYCT (SEQ ID a polypeptide or protein of interest has a particular profile, NO:6). A serine/threonine protein kinase active-site signa the amino acid sequence of the protein can be searched 50 ture motif, as defined, can be involved in the enzymatic against the PFAM database of HMMs (e.g., the PFAM transfer of a phosphate moiety from ATP to an appropriate database, release 2.1) using the default parameters. For acceptor molecule, e.g., a serine or threonine residue in a example, the hmmsf program, which is available as part of substrate molecule. More preferably, a serine/threonine pro the HMMER package of search programs, is a family tein kinase active-site signature motif includes 10 or, even specific default program for MILPAT0063 and a score of 15 55 more preferably, 13 amino acid residues. Serine/threonine is the default threshold score for determining a hit. Alter protein kinase active-site signature motifs have been given natively, the threshold score for determining a hit can be the PROSITE identifier PS00108. lowered (e.g., to 8 bits). A description of the PFAM database A 53070 family member can include at least one protein can be found in Sonhammer et al. (1997) Proteins 28(3): kinase domain. Furthermore, a 53070 family member can 405-420 and a detailed description of HMMs can be found, 60 include at least one serine/threonine protein kinase active for example, in Gribskov et al. (1990) Meth. Enzymol. site signature motif (PS00108); at least one, two, three, four, 183:146-159; Gribskov et al. (1987) Proc. Natl. Acad. Sci. preferably five predicted protein kinase C phosphorylation USA 84:4355-4358; Krogh et al. (1994) J. Mol. Biol. 235: sites (PS00005); at least one, two, preferably three predicted 1501-1531; and Stultz et al. (1993) Protein Sci. 2:305-314, casein kinase II phosphorylation sites (PS00006); and at the contents of which are incorporated herein by reference. 65 least one predicted N-myristylation sites (PS00008). A search was performed against the HMM database result As the 53070 polypeptides of the invention may modulate ing in the identification of a “protein kinase domain in the 53070-mediated activities, they may be useful as of for US 7,282,360 B2 21 22 developing novel diagnostic and therapeutic agents for Thus, the 53070 molecules can act as novel diagnostic 53070-mediated or related disorders, as described below. targets and therapeutic agents for controlling one or more of As used herein, a “53070 activity”, “biological activity of cellular proliferative and/or differentiative disorders, disor 53070” or “functional activity of 53070, refers to an ders associated with bone metabolism, immune disorders activity exerted by a 53070 protein, polypeptide or nucleic (e.g., inflammatory disorders), cardiovascular disorders, acid molecule. For example, a 53070 activity can be an liver disorders, viral diseases, pain or metabolic disorders. activity exerted by 53070 in a physiological milieu on, e.g., Examples of cellular proliferative and/or differentiative a 53070-responsive cell or on a 53070 substrate, e.g., a disorders include cancer, e.g., carcinoma, sarcoma, meta protein substrate. A53070 activity can be determined in vivo static disorders or hematopoietic neoplastic disorders, e.g., or in vitro. In one embodiment, a 53070 activity is a direct 10 leukemias. A metastatic tumor can arise from a multitude of activity, such as an association with a 53070 target molecule. primary tumor types, including but not limited to those of A “target molecule' or “binding partner is a molecule with prostate, colon, lung, breast and liver origin. which a 53070 protein binds or interacts in nature. In an As used herein, the terms “cancer”, “hyperproliferative' exemplary embodiment, 53070 is a protein kinase, e.g., a and “neoplastic' refer to cells having the capacity for serine/threonine protein kinase. 15 autonomous growth. Examples of Such cells include cells As used herein, the term “protein kinase' includes a having an abnormal state or condition characterized by protein or polypeptide that is capable of modulating its own rapidly proliferating cell growth. Hyperproliferative and phosphorylation state or the phosphorylation state of another neoplastic disease states may be categorized as pathologic, protein or polypeptide. Protein kinases can have a specificity i.e., characterizing or constituting a disease state, or may be for (i.e., a specificity to phosphorylate) serine/threonine categorized as non-pathologic, i.e., a deviation from normal residues, tyrosine residues, or both serine/threonine and but not associated with a disease state. The term is meant to tyrosine residues. Specificity of a protein kinase for phos include all types of cancerous growths or oncogenic pro phorylation of either tyrosine or serine/threonine can be cesses, metastatic tissues or malignantly transformed cells, predicted by the sequence of two of the subdomains (VIb tissues, or organs, irrespective of histopathologic type or and VIII) in which different residues are conserved in each 25 stage of invasiveness. “Pathologic hyperproliferative” cells class (as described in, for example, Hanks et al. (1988) occur in disease states characterized by malignant tumor Science 241:42-52) the contents of which are incorporated growth. Examples of non-pathologic hyperproliferative cells herein by reference). Preferably, the protein kinase of the include proliferation of cells associated with wound repair. invention is a serine/threonine protein kinase. The terms “cancer or “neoplasms include malignancies A 53070 activity can also be an indirect activity, e.g., a 30 of the various organ systems. Such as affecting lung, breast, cellular signaling activity mediated by interaction of the thyroid, lymphoid, gastrointestinal, and genito-urinary tract, 53070 protein with a 53070 substrate. Protein kinases play as well as adenocarcinomas which include malignancies a role in signaling pathways associated with cellular growth. Such as most colon cancers, renal-cell carcinoma, prostate For example, protein kinases are involved in the regulation cancer and/or testicular tumors, non-small cell carcinoma of of signal transmission from cellular receptors, e.g., growth 35 the lung, cancer of the Small intestine and cancer of the factor receptors; entry of cells into mitosis; and the regula esophagus. tion of cytoskeleton function, e.g., actin bundling. The The term “carcinoma is art recognized and refers to features of the 53070 molecules of the present invention can malignancies of epithelial or endocrine tissues including provide similar biological activities as protein kinase family respiratory system carcinomas, gastrointestinal system car members. For example, the 53070 proteins of the present 40 cinomas, genitourinary system carcinomas, testicular carci invention can have one or more of the following activities: nomas, breast carcinomas, prostatic carcinomas, endocrine (1) the ability to bind to at least one nucleoside tri-phos system carcinomas, and melanomas. Exemplary carcinomas phate, e.g., ATP; (2) the ability to auto-phosphorylate; (3) include those forming from tissue of the cervix, lung, the ability to phosphorylate other proteins; (4) the ability to prostate, breast, head and neck, colon and ovary. The term phosphorylate serine or threonine residues on other proteins; 45 also includes carcinosarcomas, e.g., which include malig (5) the ability to to alter the activity or sub-cellular local nant tumors composed of carcinomatous and sarcomatous ization of a substrate molecule via phosphorylation; (6) the tissues. An "adenocarcinoma’ refers to a carcinoma derived ability to regulate the transmission of signals from cellular from glandular tissue or in which the tumor cells form receptors, e.g., growth factor receptors or adhesion recep recognizable glandular structures. tors; (7) the ability to modulate the entry of a cell into 50 The term "sarcoma' is art recognized and refers to malig mitosis; (8) the ability to regulate the process of cell death; nant tumors of mesenchymal derivation. (9) the ability to regulate cell differentiation; (10) the ability Additional examples of proliferative disorders include to regulate cell growth; (11) the ability to regulate actin or hematopoietic neoplastic disorders. As used herein, the term tubulin dynamics; and/or (12) the ability to regulate cell "hematopoietic neoplastic disorders' includes diseases shape and motility. 55 involving hyperplastic/neoplastic cells of hematopoietic ori Inhibition or over stimulation of the activity of protein gin. A hematopoietic neoplastic disorder can arise from kinases involved in signaling pathways associated with myeloid, lymphoid or erythroid lineages, or precursor cells cellular growth or differentiation can lead to perturbed thereof. Preferably, the diseases arise from poorly differen cellular growth or function, which can in turn lead to cellular tiated acute leukemias, e.g., erythroblastic leukemia and growth and/or differentiation related disorders. As used 60 acute megakaryoblastic leukemia. Additional exemplary herein, a “cellular growth and/or differentiation disorder myeloid disorders include, but are not limited to, acute includes a disorder, disease, or condition characterized by a promyeloid leukemia (APML), acute myelogenous leuke deregulation, e.g., an upregulation or a downregulation, of mia (AML) and chronic myelogenous leukemia (CML) cellular growth and/or abnormal cellular behavior. Cellular (reviewed in Vaickus, L. (1991) Crit Rev. in Oncol./Hemo growth deregulation may be due to a deregulation of cellular 65 tol. 11:267-97); lymphoid malignancies include, but are not proliferation, cell cycle progression, cellular differentiation limited to acute lymphoblastic leukemia (ALL) which and/or cellular hypertrophy. includes B-lineage ALL and T-lineage ALL, chronic lym US 7,282,360 B2 23 24 phocytic leukemia (CLL), prolymphocytic leukemia (PLL), of regulation, sensitivity, and amplification. Kinase cascades hairy cell leukemia (HLL) and Waldenstrom's macroglobu can be activated locally, for example, near a signalling linemia (WM). Additional forms of malignant lymphomas receptor on a discrete region of the plasma membrane. An include, but are not limited to non-Hodgkin lymphoma and ultimate target of protein kinases is the cytoskeleton and its variants thereof, peripheral T cell lymphomas, adult T cell associated proteins, as it is often the object of signalling leukemia/lymphoma (ATL), cutaneous T-cell lymphoma cascades to alter cell morphology, or cell movement. (CTCL), large granular lymphocytic leukemia (LGF), One important cytokeletal protein is doublecortin. Dou Hodgkin’s disease and Reed-Sternberg disease. blecortin coassembles with microtubules in neurons of the The 53070 nucleic acid and protein of the invention can brain. Doublecortin was observed in vitro to stimulate the be used to treat and/or diagnose a variety of immune 10 polymerization of microtubules (Gleeson et al. (1999) Neu disorders. Examples of hematopoieitic disorders or diseases rom 23:257-271). Moreover, doublecortin colocalizes with include, but are not limited to, autoimmune diseases (includ microtubules in neurons that are migrating in the central and ing, for example, diabetes mellitus, arthritis (including rheu peripheral nervous system during embryonic and postnatal matoid arthritis, juvenile rheumatoid arthritis, osteoarthritis, development (Gleeson, Supra.). Remarkably, defects in gene psoriatic arthritis), multiple Sclerosis, encephalomyelitis, 15 for doublecortin are the cause of X-linked lissencephaly, myasthenia gravis, Systemic lupus erythematosis, autoim also called Double Cortex Syndrome (Gleeson et al. (1998) mune thyroiditis, dermatitis (including atopic dermatitis and Cell 92:63-72). Patients with this disorder have severe eczematous dermatitis), psoriasis, Sjögren's Syndrome, mental retardation, and intractable epilepsy. As result of the Crohn's disease, aphthous ulcer, iritis, conjunctivitis, kera failure of almost all cortical neurons to migrate completely toconjunctivitis, ulcerative colitis, asthma, allergic asthma, to their destination, the cerebral cortex is malformed, liter cutaneous lupus erythematosus, Scleroderma, vaginitis, ally “smooth brain” as a result. The doublecortin protein proctitis, drug eruptions, leprosy reversal reactions, appears to be critical to the neuronal migration process. erythema nodosum leprosum, autoimmune uveitis, allergic A feature of the doublecortin protein is two copies of a encephalomyelitis, acute necrotizing hemorrhagic encepha short repeats of approximately 80 amino acids. Mutations in lopathy, idiopathic bilateral progressive sensorineural hear 25 affected individuals cluster in these repeats (Gleeson et al. ing loss, aplastic anemia, pure red cell anemia, idiopathic (1999) Ann. Neurol. 45:146-153; Sapir et al. (2000) Hum. thrombocytopenia, polychondritis, Wegener's granulomato Mol. Genet. 9:703-712). These repeats in isolation can sis, chronic active hepatitis, Stevens-Johnson syndrome, modulate the properties of microtubules (Sapir, Supra.). idiopathic sprue, lichen planus, Graves disease, sarcoidosis, Interestingly, another human protein, KIAA0369, has two primary biliary cirrhosis, uveitis posterior, and interstitial 30 copies of these noted doublecortin repeats. KIAA0369 also lung fibrosis), graft-Versus-host disease, cases of transplan contains a CAM kinase-like serine protein kinase domain. tation, and allergy such as, atopic allergy. KIAA0369 is highly expressed in the fetal and adult brain Examples of disorders involving the heart or “cardiovas (Sossey-Alaoui and Srivastava (1999) Genomics 56:121 cular disorder” include, but are not limited to, a disease, 126) and may function in a calcium signaling pathway disorder, or state involving the cardiovascular system, e.g., 35 controlling neuronal migration in the brain (see GenBank the heart, the blood vessels, and/or the blood. A cardiovas entry GI:6225242). cular disorder can be caused by an imbalance in arterial The human 15985 sequence (see SEQID NO:7), which is pressure, a malfunction of the heart, or an occlusion of a approximately 3552 nucleotides long including untranslated blood vessel, e.g., by a thrombus. Examples of Such disor regions, contains a predicted methionine-initiated coding ders include hypertension, atherosclerosis, coronary artery 40 sequence of about 2301 nucleotides, including the termina spasm, congestive heart failure, coronary artery disease, tion codon. The coding sequence encodes a 766 amino acid valvular disease, arrhythmias, and cardiomyopathies. protein (see SEQ ID NO:8). The 53070 molecules of the invention may be used to Human 15985 contains the following regions or other treat, prevent, and/or diagnose reproductive disorders, e.g., structural features. prostatic or testicular disorders. As used herein, “a prostate 45 a protein kinase domain (PFAM Accession Number disorder” refers to an abnormal condition occurring in the PF00069) located at about amino acid residues 394 to 651 of male pelvic region characterized by, e.g., male sexual dys SEQ ID NO:8: function and/or urinary symptoms. This disorder may be a serine/threonine kinase active-site signature (Prosite manifested in the form of genitourinary inflammation (e.g., 50 PS00108) located at about amino acid residues 511 to 523 of inflammation of Smooth muscle cells) as in several common SEQ ID NO:8: diseases of the prostate including prostatitis, benign pros two doublecortin repeats located at about amino acid tatic hyperplasia and cancer, e.g., adenocarcinoma or carci residues 67 to 158, and 192 to 280 of SEQ ID NO:8: noma, of the prostate. four predicted N-glycosylation sites (PS00001) at about 15985 55 amino acids 164 to 167, 363 to 366, 619 to 622, and 681 to Protein kinases are a large, diverse . Promi 684 of SEQ ID NO:8: nent among protein kinases in eukaryotes, are serine/threo nineteen predicted Protein Kinase C phosphorylation sites nine protein kinases. These enzymes transfer a phosphate (PS00005) at about amino acids 3 to 5, 23 to 25, 67 to 69, from ATP to the hydroxyl of a serine or threonine side chain, 93 to 95, 129 to 131, 173 to 175, 182 to 184, 312 to 314, 331 where the phosphate can remain stably attached. Serine/ 60 to 333,334 to 336,357 to 349, 416 to 418, 484 to 486, 488 threonine protein kinases, also called serine protein kinases, to 490, 532 to 534, 623 to 625, 666 to 668, 710 to 712, and are frequently utilized in signalling cascades as the activity 760 to 762 of SEQ ID NO:8: of these enzymes can be finely regulated by stimuli. A eleven predicted Casein Kinase II phosphorylation sites common stimulus is phosphorylation of the serine protein (PS00006) located at about amino acids 109 to 122, 133 to kinase itself. Hence, signalling pathways, such as the MAP 65 136, 389 to 392,416 to 419, 461 to 464, 488 to 491, 542 to protein kinase cascade, can contain multiple proteins kinases 545, 623 to 626, 693 to 696, 724 to 727, and 739 to 742 of which sequentially activate. This design has the advantages SEQ ID NO:8: US 7,282,360 B2 25 26 one predicted cAMP/cGMP-dependent protein kinase A 15985 polypeptide can include a “protein kinase phosphorylation sites (PS00004) located at about amino domain or regions homologous with a “protein kinase acids 130 to 133 of SEQ ID NO:8; and domain'. ten predicted N-myristylation sites (PS00008) from about As used herein, the term “protein kinase domain includes an amino acid sequence of about 200 to 500 amino acid amino acids 22 to 27, 32 to 37, 86 to 91, 172 to 177, 323 to residues in length and having a bit score for the alignment 328,346 to 351,378 to 383, 643 to 648, 699 to 704, and 754 of the sequence to the protein kinase domain profile (Pfam to 759 of SEQ ID NO:8. HMM) of at least 300. Preferably, a protein kinase domain For general information regarding PFAM identifiers, PS includes at least about 200 to 500 amino acids, more prefix and PF prefix domain identification numbers, refer to 10 preferably about 210 to 400 amino acid residues, or about Sonnhammer et al. (1997) Protein 28:405-420. 230 to 280 amino acids and has a bit score for the alignment The 15985 protein contains a significant number of struc of the sequence to the protein kinase domain (HMM) of at tural characteristics in common with members of the protein least 345 or greater. The protein kinase domain (HMM) has kinase family. The term “family' when referring to the been assigned the PFAM Accession Number PF00069. An protein and nucleic acid molecules of the invention means 15 alignment of the protein kinase domain (amino acids 394 to two or more proteins or nucleic acid molecules having a 651 of SEQ ID NO:8) of human 15985 with a consensus common structural domain or motif and having Sufficient amino acid sequence (SEQ ID NO:10) derived from a amino acid or nucleotide sequence homology as defined hidden Markov model is depicted in FIG. 5. herein. Such family members can be naturally or non In a preferred embodiment 15985 polypeptide or protein naturally occurring and can be from either the same or has a “protein kinase domain or a region which includes at different species. For example, a family can contain a first least about 200 to 500 more preferably about 200 to 400 or protein of human origin as well as other distinct proteins of 230 to 280 amino acid residues and has at least about 50%, human origin, or alternatively, can contain homologues of 60%, 70% 80% 90% 95%, 99%, or 100% homology with a non-human origin, e.g., rat or mouse proteins. Members of “protein kinase domain, e.g., the protein kinase domain of 25 human 15985 (e.g., residues 394 to 651 of SEQ ID NO:8). a family can also have common functional characteristics. In addition, a 15985 polypeptide preferably includes a serine Protein kinases have a catalytic protein kinase domain, protein kinase active site signature, e.g., the amino acid which contains both C-helical and B-stranded structures. In sequence from about residues 511 to 523 of SEQ ID NO:8, general, the domain has a smaller amino-terminal lobe including a highly conserved aspartic acid, lysine, and whose primarily function is to bind ATP, whereas the larger 30 asparagine at amino acids 515, 517, and 520 of SEQ ID carboxy-terminal lobe functions to recognize and bind pep NO:8, respectively. tide substrates, and contributes catalytic side chains for To identify the presence of a “protein kinase” domain in phosphoryl transfer. One hallmark of serine protein kinases a 15985 protein sequence, and make the determination that is the active site signature, Prosite PS00108, LIVMFYC a polypeptide or protein of interest has a particular profile, X-HY-D-LIVMFY-K-X-X-N-ILIVMFYCT(3) wherein 35 the amino acid sequence of the protein can be searched X represents any amino acid and the number in parentheses against the Pfam database of HMMs (e.g., the Pfam data indicates the number of consecutive positions with a given base, release 2.1) using the default parameters. For example, profile of amino acids. the hmmsf program, which is available as part of the Protein kinases play a role in signaling pathways associ HMMER package of search programs, is a family specific ated with cellular growth. For example, protein kinases are 40 default program for MILPAT0063 and a score of 15 is the involved in the regulation of signal transmission from cel default threshold score for determining a hit. Alternatively, lular receptors, e.g., growth-factor receptors; entry of cells the threshold score for determining a hit can be lowered into mitosis; and the regulation of cytoskeleton function, (e.g., to 8 bits). A description of the Pfam database can be e.g., actin bundling. Thus, the molecules of the present found in Sonhammer et al. (1997) Proteins 28(3):405-420 invention may be involved in: 1) the regulation of transmis 45 and a detailed description of HMMs can be found, for sion of signals from cellular receptors, e.g., cell growth example, in Gribskov et al. (1990) Meth. Enzymol. 183:146 factor receptors; 2) the modulation of the entry of cells, e.g., 159; Gribskov et al. (1987) Proc. Natl. Acad. Sci. USA precursor cells, into mitosis; 3) the modulation of cellular 84:4355-4358; Krogh et al. (1994).J. Mol. Biol. 235:1501 differentiation; 4) the modulation of cell death; and 5) the 1531; and Stultz et al. (1993) Protein Sci. 2:305-314, the regulation of cytoskeleton function, e.g., actin bundling. 50 contents of which are incorporated herein by reference. A Inhibition or over stimulation of the activity of protein search was performed against the HMM database resulting kinases involved in signaling pathways associated with in the identification of a “protein kinase' domain in the cellular growth can lead to perturbed cellular growth, which amino acid sequence of human 15985 at about residues 394 can in turn lead to cellular growth related disorders. As used to 651 of SEQ ID NO:8 (see FIG. 5). herein, a “cellular growth related disorder includes a dis 55 A doublecortin repeats family of proteins is characterized order, disease, or condition characterized by a deregulation, by a common fold, as typified by the doublecortin and the e.g., an upregulation or a downregulation, of cellular KIAA0367 proteins. These repeats can modulate the activity growth. Cellular growth deregulation may be due to a and properties of microtubules, especially microtubules in deregulation of cellular proliferation, cell cycle progression, neuronal cells. A 15985 polypeptide can include at least one, cellular differentiation and/or cellular hypertrophy. 60 preferably two “doublecortin repeats' or regions homolo Examples of cellular growth related disorders include car gous with a “doublecortin repeat”. diovascular disorders such as heart failure, hypertension, As used herein, the term “doublecortin repeat' includes atrial fibrillation, dilated cardiomyopathy, idiopathic cardi an amino acid sequence of about 50 to 120 amino acid omyopathy, orangina; proliferative disorders or differentia residues in length and having a bit score for the alignment tive disorders such as cancer, e.g., melanoma, prostate 65 of the sequence to the doublecortin repeat (HMM) of at least cancer, cervical cancer, breast cancer, colon cancer, or 250. Preferably, a doublecortin repeat includes at least about SaCOa. 50 to 120 amino acids, more preferably about 60 to 100 US 7,282,360 B2 27 28 amino acid residues, or about 75 to 90 amino acids and has ability to phosphorylate a protein Substrates, e.g., a protein a bit score for the alignment of the sequence to the dou having a serine and/or threonine residue; (4) the ability to blecortin repeat (HMM) of at least 280 or greater. An bind to a nucleotide, e.g., an ATP molecule; (5) the ability to alignment of the doublecortin repeats (amino acids 67 to 158 modulate cellular migration, e.g., neuronal cell migration; and 192 to 280 of SEQ ID NO:8) of human 15985 with a (6) the ability to modulate neural development and/or main consensus amino acid sequence derived from a hidden tenance; (7) the ability to regulate the transmission of signals Markov model is depicted in FIGS. 6A-6B. from cellular receptors, e.g., cell growth factor receptors; 8) In a preferred embodiment 15985 polypeptide or protein the ability to modulate the entry of cells, e.g., precursor has a “doublecortin repeat' or a region which includes at cells, into mitosis; 9) the ability to modulate cellular differ least about 50 to 120 more preferably about 60 to 100 or 75 10 entiation; and/or 10) the ability to modulate cell death. to 90 amino acid residues and has at least about 70% 80% Based on the above-described sequence similarities, the 90% 95%, 99%, or 100% homology with a “doublecortin 15985 molecules of the present invention are predicted repeat, e.g., the doublecortin repeats of human 15985 (e.g., regulate cell migration, e.g., neuronal cell migration, inflam residues 67 to 158 and 192 to 280 of SEQ ID NO:8). mation, and cellular growth and differentiation, e.g., cancer. To identify the presence of a “doublecortin repeat” in a 15 Thus, the 15985 molecules can act as novel diagnostic 15985 protein sequence, and make the determination that a targets and therapeutic agents for controlling such disorders polypeptide or protein of interest has a particular profile, the that can include neurological and hematopoietic disorders, amino acid sequence of the protein can be searched against as well as cancer. a database of HMMs (e.g., the SMART database, Washing 15985 mRNA is expressed in tumors from the ovary and ton University School of Medicine) as described above. A lung (Example 7), as well as breast cancer cell lines (e.g., search was performed against the SMART database result SkBr3 cells). Lower levels of expression are detected in ing in the identification of “doublecortin repeats” in the cardiovascular tissues and the brain (Example 7). Accord amino acid sequence of human 15985 at about residues 67 ingly, molecules of the invention may serve as tools to to 158 and 192 to 280 of SEQ ID NO:8 (see FIG. 4). diagnose and/or treat disorders involving aberrant activities A 15985 family member can include at least one protein 25 of those cells in which they are expressed disorders of the kinase domain; and at least one, preferably two “doublecor lung, breast or ovaries, e.g., cancers, e.g., ovarian, breast, or tin repeats.” Furthermore, a 15985 family member can lung cancers, as well as cardiovascular or neurological include at least one, two, three, four, five, six, seven, eight, disorders. nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, The 15985 molecules can act as novel diagnostic targets seventeen, eighteen, and preferably nineteen predicted pro 30 and therapeutic agents for controlling one or more of cellular tein kinase C phosphorylation sites (PS00005); at least one, proliferative and/or differentiative disorders, disorders asso two, three, four, five, six, seven, eight, nine, ten, and ciated with bone metabolism, immune disorders (e.g., preferably eleven predicted casein kinase II phosphorylation inflammatory disorders), cardiovascular disorders, liver dis sites (PS00006); at least one predicted cAMP/cGMP-depen orders, viral diseases, pain or metabolic disorders. dent protein kinase phosphorylation site (PS00004); at least 35 Examples of cellular proliferative and/or differentiative one, two, three, four, five, six, seven, eight, nine, preferably disorders include cancer, e.g., carcinoma, sarcoma, meta ten predicted N-myristylation sites (PS00008), at least one, static disorders or hematopoietic neoplastic disorders, e.g., two, three, preferably four predicted N-glycosylation sites leukemias. A metastatic tumor can arise from a multitude of (PS000001); at least one protein kinase ATP-binding region primary tumor types, including but not limited to those of signature (PS00107). and at least one serine/threonine pro 40 prostate, colon, lung, breast and liver origin. tein kinase active-site signature (PS00108). As used herein, the terms “cancer”, “hyperproliferative' As the 15985 polypeptides of the invention may modulate and “neoplastic' refer to cells having the capacity for 15985-mediated activities, they may be useful as of for autonomous growth. Examples of Such cells include cells developing novel diagnostic and therapeutic agents for having an abnormal state or condition characterized by 15985-mediated or related disorders, as described below. 45 rapidly proliferating cell growth. Hyperproliferative and As used herein, a “15985 activity”, “biological activity of neoplastic disease states may be categorized as pathologic, 15985” or “functional activity of 15985”, refers to an i.e., characterizing or constituting a disease state, or may be activity exerted by a 15985 protein, polypeptide or nucleic categorized as non-pathologic, i.e., a deviation from normal acid molecule. For example, a 15985 activity can be an but not associated with a disease state. The term is meant to activity exerted by 15985 in a physiological milieu on, e.g., 50 include all types of cancerous growths or oncogenic pro a 15985-responsive cell or on a 15985 substrate, e.g., a cesses, metastatic tissues or malignantly transformed cells, protein substrate. A 15985 activity can be determined in vivo tissues, or organs, irrespective of histopathologic type or or in vitro. In one embodiment, a 15985 activity is a direct stage of invasiveness. “Pathologic hyperproliferative” cells activity, such as an association with a 15985 target molecule. occur in disease states characterized by malignant tumor A “target molecule' or “binding partner is a molecule with 55 growth. Examples of non-pathologic hyperproliferative cells which a 15985 protein binds or interacts in nature. In an include proliferation of cells associated with wound repair. exemplary embodiment, 15985 is a microtubule binding The terms “cancer or “neoplasms include malignancies protein of the various organ systems. Such as affecting lung, breast, A 15985 activity can also be an indirect activity, e.g., a thyroid, lymphoid, gastrointestinal, and genito-urinary tract, cellular signaling activity mediated by interaction of the 60 as well as adenocarcinomas which include malignancies 15985 protein with a 15985 receptor. The features of the Such as most colon cancers, renal-cell carcinoma, prostate 15985 molecules of the present invention can provide simi cancer and/or testicular tumors, non-small cell carcinoma of lar biological activities as protein kinase family members. the lung, cancer of the Small intestine and cancer of the For example, the 15985 proteins of the present invention can esophagus. have one or more of the following activities: (1) the ability 65 The term “carcinoma is art recognized and refers to to bind a cytoskeletal protein, e.g., a microtubule; (2) the malignancies of epithelial or endocrine tissues including ability to stimulate microtubule polymerization; (3) the respiratory system carcinomas, gastrointestinal system car US 7,282,360 B2 29 30 cinomas, genitourinary system carcinomas, testicular carci leukemia/lymphoma (ATL), cutaneous T-cell lymphoma nomas, breast carcinomas, prostatic carcinomas, endocrine (CTCL), large granular lymphocytic leukemia (LGF), system carcinomas, and melanomas. Exemplary carcinomas Hodgkin’s disease and Reed-Sternberg disease. include those forming from tissue of the cervix, lung, Examples of neurological disorders include, but are not prostate, breast, head and neck, colon and ovary. The term limited to, disorders involving neurons, and disorders also includes carcinosarcomas, e.g., which include malig involving glia, such as astrocytes, oligodendrocytes, nant tumors composed of carcinomatous and sarcomatous ependymal cells, and microglia; cerebral edema, raised tissues. An "adenocarcinoma’ refers to a carcinoma derived intracranial pressure and herniation, and hydrocephalus; from glandular tissue or in which the tumor cells form malformations and developmental diseases, such as neural recognizable glandular structures. 10 tube defects, forebrain anomalies, posterior fossa anomalies, The term "sarcoma' is art recognized and refers to malig and Syringomyelia and hydromyelia; perinatal brain injury; nant tumors of mesenchymal derivation. cerebrovascular diseases, such as those related to hypoxia, Examples of cellular proliferative and/or differentiative ischemia, and infarction, including hypotension, hypoperfu disorders of the colon include, but are not limited to, Sion, and low-flow states—global cerebral ischemia and non-neoplastic polyps, adenomas, familial syndromes, col 15 focal cerebral ischemia infarction from obstruction of orectal carcinogenesis, colorectal carcinoma, and carcinoid local blood Supply, intracranial hemorrhage, including tumors. intracerebral (intraparenchymal) hemorrhage. Subarachnoid Examples of cellular proliferative and/or differentiative hemorrhage and ruptured berry aneurysms, and vascular disorders of the liver include, but are not limited to, nodular malformations, hypertensive cerebrovascular disease, hyperplasias, adenomas, and malignant tumors, including including lacunar infarcts, slit hemorrhages, and hyperten primary carcinoma of the liver and metastatic tumors. sive encephalopathy; infections, such as acute meningitis, Examples of cellular proliferative and/or differentiative including acute pyogenic (bacterial) meningitis and acute disorders of the breast include, but are not limited to, aseptic (viral) meningitis, acute focal Suppurative infections, proliferative breast disease including, e.g., epithelial hyper including brain abscess, Subdural empyema, and extradural plasia, Sclerosing adenosis, and Small duct papillomas; 25 abscess, chronic bacterial meningoencephalitis, including tumors, e.g., Stromal tumors such as fibroadenoma, phyl tuberculosis and mycobacterioses, neurosyphilis, and neu lodes tumor, and sarcomas, and epithelial tumors such as roborreliosis (Lyme disease), Viral meningoencephalitis, large duct papilloma; carcinoma of the breast including in including arthropod-borne (Arbo) viral encephalitis, Herpes situ (noninvasive) carcinoma that includes ductal carcinoma simplex virus Type 1, Herpes simplex virus Type 2, Vari in situ (including Paget’s disease) and lobular carcinoma in 30 calla-zoster virus (Herpes zoster), cytomegalovirus, polio situ, and invasive (infiltrating) carcinoma including, but not myelitis, rabies, and human immunodeficiency virus 1. limited to, invasive ductal carcinoma, invasive lobular car including HIV-1 meningoencephalitis (subacute encephali cinoma, medullary carcinoma, colloid (mucinous) carci tis), vacuolar myelopathy, AIDS-associated myopathy, noma, tubular carcinoma, and invasive papillary carcinoma, peripheral neuropathy, and AIDS in children, progressive and miscellaneous malignant neoplasms. Disorders in the 35 multifocal leukoencephalopathy, Subacute Sclerosing panen male breast include, but are not limited to, gynecomastia and cephalitis, fungal meningoencephalitis, other infectious dis carcinoma. eases of the nervous system; transmissible spongiform Examples of cellular proliferative and/or differentiative encephalopathies (prion diseases); demyelinating diseases, disorders of the lung include, but are not limited to, bron including multiple Sclerosis, multiple Sclerosis variants, chogenic carcinoma, including paraneoplastic syndromes, 40 acute disseminated encephalomyelitis and acute necrotizing bronchioloalveolar carcinoma, neuroendocrine tumors, such hemorrhagic encephalomyelitis, and other diseases with as bronchial carcinoid, miscellaneous tumors, and metastatic demyelination; degenerative diseases, such as degenerative tumors; pathologies of the pleura, including inflammatory diseases affecting the cerebral cortex, including Alzheimer pleural effusions, noninflammatory pleural effusions, pneu disease and Pick disease, degenerative diseases of basal mothorax, and pleural tumors, including Solitary fibrous 45 ganglia and brain stem, including Parkinsonism, idiopathic tumors (pleural fibroma) and malignant mesothelioma. Parkinson disease (paralysis agitans), progressive Supra Additional examples of proliferative disorders include nuclear palsy, corticobasal degenration, multiple system hematopoietic neoplastic disorders. As used herein, the term atrophy, including striatonigral degenration, Shy-Drager "hematopoietic neoplastic disorders' includes diseases syndrome, and olivopontocerebellar atrophy, and Hunting involving hyperplastic/neoplastic cells of hematopoietic ori 50 ton disease; spinocerebellar degenerations, including gin. A hematopoietic neoplastic disorder can arise from spinocerebellar ataxias, including Friedreich ataxia, and myeloid, lymphoid or erythroid lineages, or precursor cells ataxia-telanglectasia, degenerative diseases affecting motor thereof. Preferably, the diseases arise from poorly differen neurons, including amyotrophic lateral Sclerosis (motor neu tiated acute leukemias, e.g., erythroblastic leukemia and ron disease), bulbospinal atrophy (Kennedy syndrome), and acute megakaryoblastic leukemia. Additional exemplary 55 spinal muscular atrophy; inborn errors of metabolism, Such myeloid disorders include, but are not limited to, acute as leukodystrophies, including Krabbe disease, metachro promyeloid leukemia (APML), acute myelogenous leuke matic leukodystrophy, adrenoleukodystrophy, Pelizaeus mia (AML) and chronic myelogenous leukemia (CML) Merzbacher disease, and Canavan disease, mitochondrial (reviewed in Vaickus, L. (1991) Crit Rev. in Oncol./Hemo encephalomyopathies, including Leigh disease and other tol. 11:267-97); lymphoid malignancies include, but are not 60 mitochondrial encephalomyopathies; toxic and acquired limited to acute lymphoblastic leukemia (ALL) which metabolic diseases, including vitamin deficiencies such as includes B-lineage ALL and T-lineage ALL, chronic lym thiamine (vitamin B) deficiency and vitamin B2 deficiency, phocytic leukemia (CLL), prolymphocytic leukemia (PLL), neurologic sequelae of metabolic disturbances, including hairy cell leukemia (HLL) and Waldenstrom's macroglobu hypoglycemia, hyperglycemia, and hepatic encephatopathy, linemia (WM). Additional forms of malignant lymphomas 65 toxic disorders, including carbon monoxide, methanol, etha include, but are not limited to non-Hodgkin lymphoma and nol, and radiation, including combined methotrexate and variants thereof, peripheral T cell lymphomas, adult T cell radiation-induced injury; tumors, such as gliomas, including US 7,282,360 B2 31 32 astrocytoma, including fibrillary (diffuse) astrocytoma and tumors, such as Kaposi sarcoma and hemangloendothe glioblastoma multiforme, pilocytic astrocytoma, pleomor lioma, and malignant tumors, such as angiosarcoma and phic Xanthoastrocytoma, and brain stem glioma, oligoden hemangiopericytoma; and pathology of therapeutic inter droglioma, and ependymoma and related paraventricular ventions in vascular disease, Such as balloon angioplasty and mass lesions, neuronal tumors, poorly differentiated neo related techniques and vascular replacement, Such as coro plasms, including medulloblastoma, other parenchymal nary artery bypass graft Surgery. tumors, including primary brain lymphoma, germ cell As used herein, disorders involving the heart, or “cardio tumors, and pineal parenchymal tumors, meningiomas, vascular disease' or a "cardiovascular disorder” includes a metastatic tumors, paraneoplastic syndromes, peripheral disease or disorder which affects the cardiovascular system, nerve sheath tumors, including Schwannoma, neurofibroma, 10 e.g., the heart, the blood vessels, and/or the blood. A and malignant peripheral nerve sheath tumor (malignant cardiovascular disorder can be caused by an imbalance in Schwannoma), and neurocutaneous syndromes (phakoma arterial pressure, a malfunction of the heart, or an occlusion toses), including neurofibromotosis, including Type 1 neu of a blood vessel, e.g., by a thrombus. A cardiovascular rofibromatosis (NF1) and TYPE 2 neurofibromatosis (NF2), disorder includes, but is not limited to disorders such as tuberous Sclerosis, and Von Hippel-Lindau disease. 15 arteriosclerosis, atherosclerosis, cardiac hypertrophy, The 15985 nucleic acid and protein of the invention can ischemia reperfusion injury, restenosis, arterial inflamma be used to treat and/or diagnose a variety of immune tion, vascular wall remodeling, Ventricular remodeling, disorders. Examples of immune disorders or diseases rapid Ventricular pacing, coronary microembolism, tachy include, but are not limited to, autoimmune diseases (includ cardia, bradycardia, pressure overload, aortic bending, coro ing, for example, diabetes mellitus, arthritis (including rheu nary artery ligation, vascular heart disease, valvular disease, matoid arthritis, juvenile rheumatoid arthritis, osteoarthritis, including but not limited to, Valvular degeneration caused by psoriatic arthritis), multiple Sclerosis, encephalomyelitis, calcification, rheumatic heart disease, endocarditis, or com myasthenia gravis, Systemic lupus erythematosis, autoim plications of artificial valves; atrial fibrillation, long-QT mune thyroiditis, dermatitis (including atopic dermatitis and syndrome, congestive heart failure, sinus node dysfunction, eczematous dermatitis), psoriasis, Sjögren's Syndrome, 25 angina, heart failure, hypertension, atrial fibrillation, atrial Crohn's disease, aphthous ulcer, iritis, conjunctivitis, kera flutter, pericardial disease, including but not limited to, toconjunctivitis, ulcerative colitis, asthma, allergic asthma, pericardial effusion and pericarditis; cardiomyopathies, e.g., cutaneous lupus erythematosus, Scleroderma, vaginitis, dilated cardiomyopathy or idiopathic cardiomyopathy, myo proctitis, drug eruptions, leprosy reversal reactions, cardial infarction, coronary artery disease, coronary artery erythema nodosum leprosum, autoimmune uveitis, allergic 30 spasm, ischemic disease, arrhythmia, Sudden cardiac death, encephalomyelitis, acute necrotizing hemorrhagic encepha and cardiovascular developmental disorders (e.g., arterio lopathy, idiopathic bilateral progressive sensorineural hear venous malformations, arteriovenous fistulae, raynaud’s ing loss, aplastic anemia, pure red cell anemia, idiopathic syndrome, neurogenic thoracic outlet syndrome, causalgia/ thrombocytopenia, polychondritis, Wegener's granulomato reflex sympathetic dystrophy, hemangioma, aneurysm, cav sis, chronic active hepatitis, Stevens-Johnson syndrome, 35 ernous angioma, aortic valve Stenosis, atrial septal defects, idiopathic sprue, lichen planus, Graves disease, sarcoidosis, atrioventricular canal, coarctation of the aorta, ebsteins primary biliary cirrhosis, uveitis posterior, and interstitial anomaly, hypoplastic left heart syndrome, interruption of the lung fibrosis), graft-Versus-host disease, cases of transplan aortic arch, mitral valve prolapse, ductus arteriosus, patent tation, and allergy Such as, atopic allergy. foramen ovale, partial anomalous pulmonary venous return, 15985 is expressed at relatively high levels in normal vein 40 pulmonary atresia with Ventricular septal defect, pulmonary tissue. Thus, aberrant expression and/or activity of 15985 atresia without ventricular septal defect, persistance of the molecules may mediate disorders involving the blood ves fetal circulation, pulmonary valve Stenosis, single ventricle, sels. Disorders involving blood vessels include, but are not total anomalous pulmonary venous return, transposition of limited to, responses of vascular cell walls to injury, such as the great vessels, tricuspid atresia, truncus arteriosus, Ven endothelial dysfunction and endothelial activation and inti 45 tricular septal defects). A cardiovasular disease or disorder mal thickening; vascular diseases including, but not limited also can include an endothelial cell disorder. to, congenital anomalies, such as arteriovenous fistula, ath 15985 mRNA is expressed at relatively high levels in erosclerosis, and hypertensive vascular disease, Such as ovary tumor and normal ovary tissue. Thus, aberrant expres hypertension; inflammatory disease—the vasculitides. Such sion and/or activity of 15985 molecules may mediate dis as giant cell (temporal) arteritis, Takayasu arteritis, pol 50 orders involving ovary tissue, e.g. disorders involving the yarteritis nodosa (classic), Kawasaki syndrome (mucocuta ovary. Disorders involving the ovary include, for example, neous lymph node syndrome), microscopic polyanglitis (mi polycystic ovarian disease, Stein-leventhal syndrome, croscopic polyarteritis, hypersensitivity or leukocytoclastic Pseudomyxoma peritonei and stromal hyperthecosis; ova anglitis), Wegener granulomatosis, thromboanglitis obliter rian tumors such as, tumors of coelomic epithelium, Serous ans (Buerger disease), vasculitis associated with other dis 55 tumors, mucinous tumors, endometeriod tumors, clear cell orders, and infectious arteritis; Raynaud disease; aneurysms adenocarcinoma, cystadenofibroma, brenner tumor, Surface and dissection, Such as abdominal aortic aneurysms, syphi epithelial tumors; germ cell tumors such as mature (benign) litic (luetic) aneurysms, and aortic dissection (dissecting teratomas, monodermal teratomas, immature malignant ter hematoma); disorders of veins and lymphatics, such as atomas, dysgerminoma, endodermal sinus tumor, choriocar varicose veins, thrombophlebitis and phlebothrombosis, 60 cinoma; sex cord-stomal tumors such as, granulosa-theca obstruction of Superior vena cava (Superior vena cava Syn cell tumors, thecoma-fibromas, androblastomas, hill cell drome), obstruction of inferior vena cava (inferior vena cava tumors, and gonadoblastoma; and metastatic tumors such as syndrome), and lymphangitis and lymphedema; tumors, Krukenberg tumors. including benign tumors and tumor-like conditions, such as As 15985 mRNA is expressed in lung tissue, and therefore hemangioma, lymphangioma, glomus tumor (gloman 65 aberrant expression and/or activity of 15985 molecules may gioma), vascular ectasias, and bacillary angiomatosis, and mediate disorders involving this tissue, e.g. disorders intermediate-grade (borderline low-grade malignant) involving the lung. Examples of disorders of the lung US 7,282,360 B2 33 34 include, but are not limited to, congenital anomalies; modulators of Such molecules for use in regulating a variety atelectasis; diseases of vascular origin, such as pulmonary of normal and/or pathological cellular processes. congestion and edema, including hemodynamic pulmonary The human 26583 sequence (SEQ ID NO:14), which is edema and edema caused by microvascular injury, adult approximately 2838 nucleotides long including untranslated respiratory distress syndrome (diffuse alveolar damage), regions, contains a predicted methionine-initiated coding pulmonary embolism, hemorrhage, and infarction, and pull sequence of about 1613 nucleotides (nucleotides 462 to monary hypertension and vascular Sclerosis; chronic 2075 of SEQ ID NO:14: SEQ ID NO:16). The coding obstructive pulmonary disease, such as emphysema, chronic sequence encodes a 537 amino acid protein (SEQ ID bronchitis, bronchial asthma, and bronchiectasis; diffuse NO:15). interstitial (infiltrative, restrictive) diseases, such as pneu 10 Human 26583 contains the following regions or other moconioses, sarcoidosis, idiopathic pulmonary fibrosis, structural features: a predicted serine/threonine catalytic descuamative interstitial pneumonitis, hypersensitivity domain at residues 172-461; a predicted serine/threonine pneumonitis, pulmonary eosinophilia (pulmonary infiltra catalytic domain at residues 99-523: one predicted N-gly tion with eosinophilia), Bronchiolitis obliterans-organizing cosylation site (PS00001) from about amino acids 105 to pneumonia, diffuse pulmonary hemorrhage syndromes, 15 108; five predicted Protein Kinase C sites (PS00005) from including Goodpasture syndrome, idiopathic pulmonary about amino acids 95 to 97, 156 to 158, 182 to 184, 211 to hemosiderosis and other hemorrhagic syndromes, pulmo 213 and 463 to 465 of SEQID NO:15; five predicted Casein nary involvement in vascular disorders, and pull Kinase II phosphorylation sites (PS00006) from about monary alveolar proteinosis; complications of therapies, amino acids 172 to 175, 228 to 231, 371 to 374, 471 to 474 Such as drug-induced lung disease, radiation-induced lung and 505 to 508 of SEQID NO:15; seven predicted N-myris disease, and lung transplantation; tumors, such as bron toylation sites (PS00008) from about amino acids 137 to chogenic carcinoma, including paraneoplastic syndromes, 142, 148 to 153, 271 to 276, 303 to 308, 419 to 424, 456 to bronchioloalveolar carcinoma, neuroendocrine tumors, such 461 and 531 to 536 of SEQ ID NO:15; one amidation site as bronchial carcinoid, miscellaneous tumors, and metastatic (PS00009) at about amino acids 67 to 70 of SEQID NO:15: tumors; pathologies of the pleura, including inflammatory 25 and one protein phosphatase 2C signature (PS01037) from pleural effusions, noninflammatory pleural effusions, pneu about amino acids 139 to 147 of SEQ ID NO:15. mothorax, and pleural tumors, including Solitary fibrous For general information regarding PFAM identifiers, PS tumors (pleural fibroma) and malignant mesothelioma. prefix and PF prefix domain identification numbers, refer to 26.583 Sonnhammer et al. (1997) Protein 28:405-420. 30 The 26583 protein contains a significant number of struc Protein phosphatases are enzymes that reverse the actions tural characteristics in common with members of the serine/ of protein kinases by cleaving phosphate from serine, threo threonine phosphatase family. The term “family” when nine, and/or tyrosine residues in proteins. The protein phos referring to the protein and nucleic acid molecules of the phatases are divided into three groups according to catalytic invention means two or more proteins or nucleic acid function: (1) protein phosphatases that dephosphorylate 35 molecules having a common structural domain or motif and serine and threonine residues; (2) protein phosphatases having Sufficient amino acid or nucleotide sequence homol which dephosphorylate tyrosine residues; and (3) protein ogy as defined herein. Such family members can be naturally phosphatases which dephosphorylate serine, threonine and or non-naturally occurring and can be from either the same tyrosine residues. or different species. For example, a family can contain a first Serine/threonine protein phosphatases are associated with 40 protein of human origin as well as other distinct proteins of the regulation of cholesterol biosynthesis, glycogen metabo human origin, or alternatively, can contain homologues of lism, muscle contractility, calcium ion channels, protein non-human origin, e.g., rat or mouse proteins. Members of synthesis, regulation of the G2 to M transition of the cell a family can also have common functional characteristics. cycle, regulation of glycolysis (6-phosphofructo-2-kinase A 26583 polypeptide of the invention can include a and pyruvate kinase), glycogenolysis (phosphorylase kinase 45 'serine/threonine phosphatase catalytic domain” or regions Subunit), gluconeogenesis (fructose-2,6-bisphosphatase and homologous with a 'serine/threonine phosphatase catalytic pyruvate kinase), amino-acid degradation (phenylalanine domain.” As used herein, the term “serine/threonine phos hydroxylase), lipid metabolism (acetyl-CoA carboxylase), phatase catalytic domain refers to an amino acid sequence catecholamine synthesis (tyrosine hydroxylase) and protein having about 200 to 450, preferably about 150 to 350, more synthesis (elongation factor 2). 50 preferably about 100 to 300, and even more preferably about Protein tyrosine phosphatases (PTPs) are a family of 288 amino acid residues. intracellular and integral membrane phosphatases that Based on structural similarities, members of the serine/ dephosphorylate tyrosine residues in proteins. PTPs have threonine phosphatase family have been classified into vari been identified in mammals, Drosophila and Schiz. pombe ous Subfamilies, including four major types of protein phos and are implicated in the control of normal and neoplastic 55 phatase catalytic subunits that dephosphorylate serine and growth and proliferation. threonine residues. These enzymes are termed protein phos Generally, the balance of protein phosphorylation in a cell phatases 1, 2A, 2B, and 2C (PP1, PP2A, PP2B and PP2C, depends on the level of protein kinase and protein phos the human genome symbols being PPP1, PPP2, PPP3 and phatase activity. Protein phosphorylation is important for the PPM1 respectively). Protein phosphatase PP1 appears to regulation of numerous metabolic processes such as choles 60 have pleiotropic actions in the regulation of glycogen terol biosynthesis and has been associated with cell cycle metabolism, muscle contractility, calcium ion channels, pro progression and transformation of cells. Thus, protein phos tein synthesis and cell division. Protein phosphatase 2A phatases can serve as positive or negative regulators of (PP2A) dephosphorylates enzymes involved in the regula metabolic function as well as cell growth and differentiation. tion of glycolysis (6-phosphofructo-2-kinase and pyruvate Given the important biological roles and properties of phos 65 kinase), glycogenolysis (phosphorylase kinase subunit), glu phatases, there exists a need for the identification of novel coneogenesis (fructose-2,6-bisphosphatase and pyruvate encoding Such proteins as well as for the discovery of kinase), amino-acid degradation (phenylalanine hydroxy US 7,282,360 B2 35 36 lase), lipid metabolism (acetyl-CoA carboxylase), catechola 100% homology with a “serine/threonine phosphatase cata mine synthesis (tyrosine hydroxylase) and protein synthesis lytic domain.” e.g., the serine/threonine phosphatase cata (elongation factor 2). The catalytic Subunit has also been lytic domain of human 26583. identified as a negative regulator of the dephosphorylation Thus, a 26583 molecule of the present invention can be and activation of p34cdc2 protein kinase in Xenopus and S. 5 identified based on the presence of a “serine/threonine pombe and therefore as a suppressor of the G2 to M phosphatase catalytic domain in the protein or correspond transition of the cell cycle. Protein phosphatase 2B (PP2B) ing nucleic acid molecule. Preferably, a serine/threonine is particularly abundant in brain where it comprises up to 1% phosphatase catalytic domain includes a protein domain of total protein. The physiological roles of PP2B may be to having an amino acid sequence of about 200 to 500 amino allow extracellular signals that act via Ca" to attenuate 10 acid residues and having a bit score for the alignment of the those that act through cyclic AMP. PP2B may be involved in sequence to the fibroblast growth factor domain (HMM) of the regulation of ion channels in both neuronal and non at least 150. Preferably, a “serine/threonine phosphatase neuronal cells. Protein phosphatase 4 (PP4) is required in catalytic domain refers to an amino acid sequence having late G1 of the cell cycle for progression into S phase in yeast. about 200 to 500, preferably about 250 to 400, more Protein phosphatase 2C (PP2C) may play a role in the 15 preferably about 250 to 350 amino acid residues and has a regulation of cholesterol biosynthesis, as PP2C possesses bit score for the alignment of the sequence to a serine/ high activity against hydroxymethylglutaryl-CoA reductase threonine phosphatase catalytic domain (HMM) of at least kinase, which inactivates HMG-CoA reductase, the rate 100, 200, 250 or greater. An alignment of the serine/ limiting of this pathway. Protein phosphatase 2C threonine phosphatase of human 26583 (SEQ ID NO:15) (PP2C) is a monomeric enzyme of about 42 Kd that shows with a consensus amino acid sequence derived from a broad substrate specificity and is dependent on divalent hidden Markov model is depicted in FIGS. 10A-10B. cations (mainly manganese and magnesium) for its activity. To identify the presence of a “serine/threonine phos At least three isozymes are known in mammals: PP2C phatase catalytic domain in a 26583 protein sequence and alpha, -beta and -gamma. In yeast, there are at least four to make the determination that a polypeptide or protein of PP2C homologs: phosphatase PTC1 that has weak tyrosine 25 interest has a particular profile, the amino acid sequence of phosphatase activity in addition to its activity on serines, the protein can be searched against a database of HMMs phosphatases PTC2 and PTC3. Isozymes of PP2C are also using the default parameters. For example, the hmmsf known from Arabidopsis thaliana (ABI1, PPH1), Cae program, which is available as part of the HMMER package norhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leish of search programs, is a family specific default program for mania chagasi and Paramecium tetraurelia. In Arabidopsis 30 MILPATOO63 and a score of 15 is the default threshold score thaliana, the kinase associated protein phosphatase (KAPP) for determining a hit. Alternatively, the threshold score for is an enzyme that dephosphorylates the Ser/Thr receptor-like determining a hit can be lowered (e.g., to 8 bits). A descrip kinase RLK5 and which contains a C-terminal PP2C tion of the Pfam database can be found in Sonhammer et al. domain. In addition, PP2C appears to be significantly similar (1997) Proteins 28(3):405-420 and a detailed description of to the catalytic Subunit of pyruvate dehydrogenase phos 35 HMMs can be found, for example, in Gribskov et al. (1990) phatase (EC 3.1.3.43) (PDPC) that catalyzes dephosphory Meth. Enzymol. 183:146-159; Gribskov et al. (1987) Proc. lation and concomitant reactivation of the alpha Subunit of Natl. Acad. Sci. USA 84:4355-4358; Krogh et al. (1994) J. the E1 component of the pyruvate dehydrogenase complex. Mol. Biol. 235:1501-1531; and Stultz et al. (1993) Protein PDPC is a mitochondrial enzyme and, like PP2C, is mag Sci. 2:305-314, the contents of which are incorporated nesium-dependent. 40 herein by reference. A search was performed against the In addition, protein serine/threonine phosphatases may HMM database resulting in the identification of an amino play a role in signaling pathways associated with cellular acid sequence of human 26583 that is homologous to a growth. For example, protein serine/threonine phosphatases sequence contained in PP2C at about residues 172 to 461 of can be involved in the regulation of signal transmission from SEQID NO:15 (see FIG. 10A). The search further identified cellular receptors, e.g., growth-factor receptors; entry of 45 an amino acid sequence of human 26583 that is homologous cells into mitosis. Thus, the 265.83 molecules of the present to a sequence contained in PP2C 4 at about residues 99 to invention may be involved in: (1) catalyzing the removal of 523 of SEQ ID NO:15 (see FIG. 10B). a phosphate group attached to a tyrosine residue in a protein; As the 26583 polypeptides of the invention may modulate (2) the regulation of transmission of signals from cellular 26583-mediated activities, they may be useful as of for receptors; (3) modulation of cellular growth signaling 50 developing novel diagnostic and therapeutic agents for mechanisms; (4) modulation of cell proliferation or growth; 26583-mediated or related disorders, as described below. As (5) modulation of cell differentiation; (6) modulation of cell used herein, “26583 activity,” “biological activity of 26583” survival; (7) modulation of transformation; (8) modulation or “functional activity of 26583, refers to an activity of apoptosis of a cell (e.g., a cancer cell); (9) modulation of exerted by a 26583 protein, polypeptide or nucleic acid cholesterol biosynthesis; (10) modulation of glycogen 55 molecule on e.g., a 26583-responsive cell or on a 26583 metabolism; (11) modulation of muscle contractility; (12) Substrate, e.g., a protein Substrate, as determined in vivo or modulation of calcium ion channel activity; (13) modulation in vitro. In one embodiment, a 26583 activity is a direct of glycolysis, glycogenolysis, or gluconeogenesis; (14) activity, Such as an association with a 26583 target molecule. modulation of amino-acid degradation; (15) modulation of A “target molecule' or “binding partner is a molecule with lipid metabolism; and/or (16) modulation of catecholamine 60 which a 26583 protein binds or interacts with in nature, e.g., synthesis. a protein containing one or more serine/threonine residues. In a preferred embodiment, a 26583 polypeptide or pro A 26583 activity also can be an indirect activity, e.g., a tein has a 'serine/threonine phosphatase catalytic domain cellular signaling activity mediated by interaction of the refers to an amino acid sequence having about 200 to 450, 26583 protein with a 26583 receptor (e.g., a receptor that is preferably about 150 to 350, more preferably about 100 to 65 a protein serine/threonine kinase). For example, a 26583 300, and even more preferably about 288 amino acid resi protein of the present invention can have one or more of the dues and has at least about 70% 80% 90% 95%, 99%, or following activities: (1) removal of phosphate moieties from US 7,282,360 B2 37 38 phospho-serine/threonine residues in proteins; (2) the regu been implicated in neuro-degenerative disorders, such as lation of transmission of signals from cellular receptors; (3) Parkinson's, Huntington's and Alzheimer's diseases. modulation of cellular growth signaling mechanisms; (4) In addition, the 26583 molecules of the invention are modulation of cell proliferation; (5) modulation of cell useful for diagnosing and/or treating cellular proliferative differentiation; (6) modulation of transformation; (7) modu 5 disorders. As used herein, a “cellular proliferative disorder lation of apoptosis (e.g., a cancer cell); (8) modulation of includes a disorder, disease, or condition characterized by a cholesterol biosynthesis; (9) modulation of glycogen deregulated, e.g., up-regulated or down-regulated, growth metabolism; (10) modulation of muscle contractility; (11) response. As used herein, a “cellular differentiative disorder modulation of calcium ion channel activity; (12) modulation includes a disorder, disease, or condition characterized by of glycolysis, glycogenolysis and gluconeogenesis; (13) 10 aberrant cellular differentiation. As used herein, metastatic modulation of amino-acid degradation; (14) modulation of refers to the ability of a tumor cell to form implants at a site lipid metabolism; and/or (15) modulation of catecholamine distant from the original tumor. Thus, the 26583 molecules synthesis. can act as novel diagnostic targets and therapeutic agents for As used herein, the term "cellular growth signaling controlling cellular proliferative and/or differentiative dis mechanism' includes the ability to interact with, e.g., bind 15 orders. to, and remove a phospho-serine/threonine residue present Based on the above-described sequence similarities, the in a protein, e.g., a serine or threonine phosphorylated 265.83 molecules of the present invention are predicted to protein and modulate, e.g., inhibit, one or more of: (1) have similar biological activities as serine/threonine phos induction of receptor dimerization, (2) serine/threonine phatase family members. Thus, the 265.83 molecules can act kinase activation, (3) phosphorylation of signaling mol as novel diagnostic targets and therapeutic agents for con ecules, and/or (4) induction gene expression; thereby regu trolling one or more of cellular proliferative and/or differ lating one or more of: (5) cell proliferation, (6) cell differ entiative disorders, disorders associated with bone metabo entiation, (7) cell Survival, (8) oncogenic transformation, (9) lism, immune or hematopoietic disorders, cardiovascular migration, and/or (10) apoptosis, of a cell (e.g., a cancer disorders, liver disorders, viral diseases, pain or metabolic cell), (11) modulation of cholesterol biosynthesis, (12) 25 disorders such as hypo- or hypercholesterolemia, or disor modulation of glycogen metabolism, (13) modulation of ders associated with mitochondrial dysfunction. muscle contractility, (14) modulation of calcium ion channel In addition, the 265.83 molecules of the invention may activity, (15) modulation of glycolysis, glycogenolysis and modulate physiological and pathological processes in the gluconeogenesis, (16) modulation of amino-acid degrada cells or tissues where they are expressed. For example, Taq 30 Man studies described herein show expression of 26583 in tion, (17) modulation of lipid metabolism and/or (18) modu normal human breast, lung, colon, liver, and brain tissue lation of catecholamine synthesis. (FIG. 11). 265.83 expression can be modulated in samples of Based on the above-described sequence similarities, a tumor tissue compared to normal tissue. For example, 26.583 26583 molecule of the present invention is predicted to have expression in brain tumor samples can be significantly similar biological activities as serine/threonine phosphatase 35 higher than in normal brain tissue samples; and 26.583 family members. Thus, the 26583 molecules can act as novel expression in lung tumor tissue can be higher than in normal diagnostic targets and therapeutic agents for controlling lung tissue (FIG. 11). cellular proliferative disorders or metabolic disorders such Examples of cellular proliferative and/or differentiative as those associated with cholesterol biosynthesis or mito disorders include cancer, e.g., carcinoma, sarcoma, meta chondrial dysfunction. 40 static disorders or hematopoietic neoplastic disorders, e.g., As used herein, a “cholesterol biosynthesis-associated leukemias. A metastatic tumor can arise from a multitude of disorder includes any disorder wherein the regulation of primary tumor types, including but not limited to those of cholesterol biosynthesis is affected by the presence or prostate, colon, lung, breast and liver origin. absence of a 25583 activity of the invention. For example, As used herein, the terms “cancer,” “hyperproliferative' the 26583 protein of the present invention contains sequence 45 and “neoplastic' refer to cells having the capacity for homology to PP2C (see FIG. 10A). PP2C possesses high autonomous growth, i.e., an abnormal state or condition activity against hydroxymethylglutaryl-CoA reductase characterized by rapidly proliferating cell growth. Hyper kinase, which inactivates HMG-CoA reductase, the rate proliferative and neoplastic disease states may be catego limiting enzyme of the cholesterol biosynthetic pathway. rized as pathologic, i.e., characterizing or constituting a Thus, the present invention provides a means for diagnosing 50 disease state, or may be categorized as non-pathologic, i.e., and/or treating a cholesterol biosynthesis-associated disor a deviation from normal but not associated with a disease der Such as, for example, hypo- or hypercholesterolemia. state. The term is meant to include all types of cancerous As previously noted, the 26583 protein of the present growths or oncogenic processes, metastatic tissues or malig invention contains sequence homology to PP2C (see FIG. nantly transformed cells, tissues, or organs, irrespective of 10A). PP2C appears to be significantly similar to the cata 55 histopathologic type or stage of invasiveness. “Pathologic lytic Subunit of pyruvate dehydrogenase phosphatase (EC hyperproliferative” cells occur in disease states character 3.1.3.43) (PDPC) that catalyzes dephosphorylation and con ized by malignant tumor growth. Examples of non-patho comitant reactivation of the alpha subunit of the E1 com logic hyperproliferative cells include proliferation of cells ponent of the pyruvate dehydrogenase complex. PDPC is a associated with wound repair. mitochondrial enzyme and, like PP2C, is magnesium-de 60 The terms “cancer or “neoplasms include malignancies pendent. Thus, the present invention is additionally useful as of the various organ systems. Such as affecting lung, breast, a means for diagnosing and/or treating disorders associated thyroid, lymphoid, gastrointestinal, and genito-urinary tract, with mitochondria. As used herein, a “mitochondrial-asso as well as adenocarcinomas which include malignancies ciated disorder includes any disorder related to the function Such as most colon cancers, renal-cell carcinoma, prostate or dysfunction of mitochondria. For example, diabetes mel 65 cancer and/or testicular tumors, non-small cell carcinoma of litus has been associated with deficient mitochondrial oxi the lung, cancer of the Small intestine and cancer of the dative phosphorylation. Also, mitochondrial dysfunction has esophagus. US 7,282,360 B2 39 40 The term “carcinoma is art recognized and refers to Examples of immune disorders or diseases include, but malignancies of epithelial or endocrine tissues including are not limited to, autoimmune diseases (including, for respiratory system carcinomas, gastrointestinal system car example, diabetes mellitus, arthritis (including rheumatoid cinomas, genitourinary system carcinomas, testicular carci arthritis, juvenile rheumatoid arthritis, osteoarthritis, psori nomas, breast carcinomas, prostatic carcinomas, endocrine atic arthritis), multiple Sclerosis, encephalomyelitis, myas system carcinomas, and melanomas. Exemplary carcinomas thenia gravis, systemic lupus erythematosis, autoimmune include those forming from tissue of the cervix, lung, thyroiditis, dermatitis (including atopic dermatitis and prostate, breast, head and neck, colon and ovary. The term eczematous dermatitis), psoriasis, Sjögren's Syndrome, also includes carcinosarcomas, e.g., which include malig Crohn's disease, aphthous ulcer, iritis, conjunctivitis, kera 10 toconjunctivitis, ulcerative colitis, asthma, allergic asthma, nant tumors composed of carcinomatous and sarcomatous cutaneous lupus erythematosus, Scleroderma, vaginitis, tissues. An "adenocarcinoma’ refers to a carcinoma derived proctitis, drug eruptions, leprosy reversal reactions, from glandular tissue or in which the tumor cells form erythema nodosum leprosum, autoimmune uveitis, allergic recognizable glandular structures. encephalomyelitis, acute necrotizing hemorrhagic encepha The term "sarcoma' is art recognized and refers to malig 15 lopathy, idiopathic bilateral progressive sensorineural hear nant tumors of mesenchymal derivation. ing loss, aplastic anemia, pure red cell anemia, idiopathic The 26583 nucleic acid and protein of the invention can thrombocytopenia, polychondritis, Wegener's granulomato be used to treat and/or diagnose a variety of proliferative sis, chronic active hepatitis, Stevens-Johnson syndrome, disorders. Such disorders include hematopoietic neoplastic idiopathic sprue, lichen planus, Graves disease, sarcoidosis, disorders. As used herein, the term “hematopoietic neoplas primary biliary cirrhosis, uveitis posterior, and interstitial tic disorders' includes diseases involving hyperplastic/neo lung fibrosis), graft-Versus-host disease, cases of transplan plastic cells of hematopoietic origin, e.g., arising from tation, and allergy Such as, atopic allergy. myeloid, lymphoid or erythroid lineages, or precursor cells Examples of disorders involving the heart or “cardiovas thereof. Preferably, the diseases arise from poorly differen cular disorder” include, but are not limited to, a disease, tiated acute leukemias, e.g., erythroblastic leukemia and 25 disorder, or state involving the cardiovascular system, e.g., acute megakaryoblastic leukemia. Additional exemplary the heart, the blood vessels, and/or the blood. A cardiovas myeloid disorders include, but are not limited to, acute cular disorder can be caused by an imbalance in arterial promyeloid leukemia (APML), acute myelogenous leuke pressure, a malfunction of the heart, or an occlusion of a mia (AML) and chronic myelogenous leukemia (CML) blood vessel, e.g., by a thrombus. Examples of Such disor (reviewed in Vaickus (1991) Crit Rev. in Oncol./Hemotol. 30 ders include hypertension, atherosclerosis, coronary artery 11:267-97); lymphoid malignancies include, but are not spasm, congestive heart failure, coronary artery disease, limited to acute lymphoblastic leukemia (ALL) which valvular disease, arrhythmias, and cardiomyopathies. includes B-lineage ALL and T-lineage ALL, chronic lym Disorders which may be treated or diagnosed by methods phocytic leukemia (CLL), prolymphocytic leukemia (PLL), described herein include, but are not limited to, disorders hairy cell leukemia (HLL) and Waldenstrom's macroglobu 35 associated with an accumulation in the liver of fibrous tissue, linemia (WM). Additional forms of malignant lymphomas Such as that resulting from an imbalance between production include, but are not limited to non-Hodgkin lymphoma and and degradation of the extracellular matrix accompanied by variants thereof, peripheral T cell lymphomas, adult T cell the collapse and condensation of preexisting fibers. The leukemia/lymphoma (ATL), cutaneous T-cell lymphoma methods described herein can be used to diagnose or treat (CTCL), large granular lymphocytic leukemia (LGF), 40 hepatocellular necrosis or injury induced by a wide variety Hodgkin’s disease and Reed-Sternberg disease. of agents including processes which disturb homeostasis, Aberrant expression and/or activity of 26583 molecules Such as an inflammatory process, tissue damage resulting may mediate disorders associated with bone metabolism. from toxic injury or altered hepatic blood flow, and infec "Bone metabolism' refers to direct or indirect effects in the tions (e.g., bacterial, viral and parasitic). For example, the formation or degeneration of bone structures, e.g., bone 45 methods can be used for the early detection of hepatic injury, formation, bone resorption, etc., which may ultimately affect Such as portal hypertension or hepatic fibrosis. In addition, the concentrations in serum of calcium and phosphate. This the methods can be employed to detect liver fibrosis attrib term also includes activities mediated by 26583 molecules uted to inborn errors of metabolism, for example, fibrosis effects in bone cells, e.g., osteoclasts and osteoblasts, that resulting from a storage disorder Such as Gaucher's disease may in turn result in bone formation and degeneration. For 50 (lipid abnormalities) or a glycogen storage disease, Al example, 26583 molecules may support different activities antitrypsin deficiency; a disorder mediating the accumula of bone resorbing osteoclasts, such as the stimulation of tion (e.g., Storage) of an exogenous Substance, for example, differentiation of monocytes and mononuclear phagocytes hemochromatosis (iron-overload syndrome) and copper into osteoclasts. Accordingly, 26583 molecules that modu storage diseases (Wilson's disease), disorders resulting in late the production of bone cells can influence bone forma 55 the accumulation of a toxic metabolite (e.g., tyrosinemia, tion and degeneration, and thus may be used to treat bone fructosemia and galactosemia) and peroxisomal disorders disorders. Examples of such disorders include, but are not (e.g., Zellweger syndrome). Additionally, the methods limited to, osteoporosis, osteodystrophy, osteomalacia, rick described herein may be useful for the early detection and ets, osteitis fibrosa cystica, renal osteodystrophy, osteoscle treatment of liver injury associated with the administration rosis, anti-convulsant treatment, osteopenia, fibrogenesis 60 of various chemicals or drugs, such as, for example, meth imperfecta ossium, secondary hyperparathyrodism, otrexate, isonizaid, oxyphenisatin, methyldopa, chlorprom hypoparathyroidism, hyperparathyroidism, cirrhosis, azine, tolbutamide or alcohol, or which represents a hepatic obstructive jaundice, drug induced metabolism, medullary manifestation of a vascular disorder Such as obstruction of carcinoma, chronic renal disease, rickets, sarcoidosis, glu either the intrahepatic or extrahepatic bile flow or an alter cocorticoid antagonism, malabsorption syndrome, Steator 65 ation in hepatic circulation resulting, for example, from rhea, tropical sprue, idiopathic hypercalcemia and milk chronic heart failure, veno-occlusive disease, portal vein fever. thrombosis or Budd-Chiari syndrome. US 7,282,360 B2 41 42 Additionally, 265.83 molecules may play an important Human 21953 contains the following regions or other role in the etiology of certain viral diseases, including, but structural features: a predicted prolyl oligopeptidase domain not limited to, Hepatitis B, Hepatitis C, and Herpes Simplex (PFAM Accession PF00326) located at about amino acids Virus (HSV). Modulators of 26583 activity could be used to 672-744 of SEQID NO:20; two predicted cAMP phospho control viral diseases. The modulators can be used in the rylation sites and coMP-dependent protein kinase phospho treatment and/or diagnosis of viral infected tissue or virus rylation domains (Prosite Accession PS00004) located at associated tissue fibrosis, especially liver and liver fibrosis. about amino acid residues 231 to 234 of SEQID NO:20 and Also, 26583 modulators can be used in the treatment and/or about amino acid residues 476-479 of SEQ ID NO:20; ten diagnosis of virus-associated carcinoma, especially hepato predicted Protein Kinase C sites (PS00005) at about amino cellular cancer. 10 acids 52 to 54, 80 to 82, 115 to 117, 307 to 309, 312 to 314, Additionally, 265.83 may play an important role in the 326 to 328,551 to 553,594 to 596, 776 to 778, and 850 to regulation of metabolism or pain disorders. Diseases of 852 of SEQ ID NO:20; 11 predicted Casein Kinase II sites metabolic imbalance include, but are not limited to, obesity, (PS00006) located at about amino 133 to 136, 227 to 230, anorexia nervosa, cachexia, lipid disorders or diabetes. 293 to 296, 412 to 415, 443 to 446, 499 to 502, 530 to 533, Examples of pain disorders include, but are not limited to, 15 587 to 590, 603 to 606, 615 to 618, and 723 to 726 of SEQ pain response elicited during various forms of tissue injury, ID NO:20; five predicted tyrosine phosphorylation sites e.g., inflammation, infection, and ischemia, usually referred (PS00007) at about amino acids 29 to 36, 47 to 55, 308 to to as hyperalgesia (described in, for example, Fields, H. L. 315, 549 to 555, and 837 to 844 of SEQ ID NO:20; four (1987) Pain, New York: McGraw-Hill); pain associated with predicted N-myristylation sites (PS00008) from about muscoloskeletal disorders, e.g., joint pain; tooth pain; head amino 176 to 181, 741 to 746, 762 to 767 and 873 to 878 of aches; pain associated with Surgery; pain related to irritable SEQ ID NO:20 and one predicted amidation site (PS00009) bowel syndrome; or chest pain. from about amino acid 642 to 645 of SEQ ID NO:20. For general information regarding PFAM identifiers, PS 21953 prefix and PF prefix domain identification numbers, refer to Prolyl oligopeptidases are a distinct Sub-group of 25 Sonnhammer et al. (1997) Protein 28:405-420. that degrade a variety of proline-containing The 21953 polypeptide contains a significant number of peptides by cleaving the peptide bond at the carboxyl side of structural characteristics in common with members of the proline residues. The natural Substrates of prolyl oligopep human prolyl oligopeptidase family. The term “family tidases include many biologically active peptides Such as when referring to the protein and nucleic acid molecules of peptide messenger molecules. For example, they are 30 the invention means two or more proteins or nucleic acid involved in the metabolism of peptide hormones and neu molecules having a common structural domain or motif and ropeptides. Prolyl oligopeptidases have few naturally occur having sufficient amino acid or nucleotide sequence homol ring inhibitors and their distinctive specificity prevents them ogy as defined herein. Such family members can be naturally from interacting with B-macroglobulin, unlike the great or non-naturally occurring and can be from either the same majority of endopeptidases. The specificity of an oligopep 35 or different species. For example, a family can contain a first tidase depends on the three dimensional structure of its protein of human origin as well as other distinct proteins of active site, which includes a putative , which human origin, or alternatively, can contain homologues of contains aspartate, serine and residues. non-human origin, e.g., rat or mouse proteins. Members of Examples of known prolyl oligopeptidases include human a family can also have common functional characteristics. prolyl oligopeptidase (Yoshimoto et al. Genebank 40 Polypeptide of the prolyl oligopeptidase family such as a AB020018), mouse prolyl oligopeptidase (Ishino et al., J. 21953 polypeptide typically include an N-terminal seven Biochem. 123 (3), 540-545 (1998)), pig prolyl oligopepti blade B-propeller domain and a C-terminal C/B hydrolase dase (Rennix et al., Biochemistry, 30:2195-2203, 1991), rat domain. The N-terminal seven-blade 3-propeller domain dipeptidyl-peptidase IV (Ognata et al., J. Biol. Chem, 264: can include a “DPP IV N-terminal domain or regions 3596-3601, 1989), F meningosepticum prolyl oligopepti 45 homologous with a “DPP IV N-terminal domain.” The dase (Yoshimoto et al., J. Biochem. 110:873-878, 1991), and C-terminal C/B hydrolase domain, e.g., the C-terminal E. coli protease II (Kanatani et al., J. Biochemistry (Tokyo), region of a 21953 polypeptide, can include a “prolyl oli 110: 315-320, 1991). gopeptidase domain or regions homologous with a “prolyl Prolyl oligopeptidases also control the activity of other oligopeptidase domain'. The “prolyl oligopeptidase peptides present in body fluids such as bradykinin and 50 domain can include a catalytic active site, which generally angiotensin. Bradykinin is a very potent vasodilator that occurs at the C-terminal region of the polypeptide chain, increases the permeability of post capillary venules and acts which is involved in the hydrolysis of proline-containing on endothelial cells to activate phospholipase A2. Angio peptide bonds. A prolyl oligopeptidase can be soluble. An tensin causes contraction of vascular Smooth muscle, raising alignment of human dipeptidyl peptidase IV (Accession blood pressure and stimulating aldosterone release from the 55 Number P48147) to the 21953 amino acid sequence is adrenal glands. Other members of the prolyl oligopeptidase depicted in FIGS. 14A-14B. family mediate the degradation of neuropeptides such as As used herein, the term “prolyl oligopeptidase domain' Substance P, thyrotropin releasing hormone, hippocampal includes an amino acid sequence of at least about 60 amino cholinergic neurostimulating peptide (HCNP), neuropeptide acid residues in length and having a bit score for the Y (NPY), and neuropeptides derived from pro-opiomelano 60 alignment of the sequence to the Pfam Hidden Markov cortin (POMC) and neurohypophyseal hormones. Model (HHM) PF00326 of at least 10. Preferably, a prolyl The human 21953 sequence (see SEQ ID NO:19), which oligopeptidase domain includes at least about 30 to 180 is approximately 3143 nucleotides long, including untrans amino acids, more preferably about 50 to 140 amino acid lated regions, contains a predicted methionine-initiated cod residues, or about 60 to 80 amino acids and has a bit score ing sequence of about 2649 nucleotides, including the ter 65 for the alignment of the sequence to the prolyl oligopepti mination codon. The coding sequence encodes a 882 amino dase domain (HMM) of at least 10, 20, 30 or greater. An acid protein (see SEQ ID NO:20). alignment of the prolyl oligopeptidase domain (amino acids US 7,282,360 B2 43 44 672 to 744 of SEQ ID NO:20) of human 21953 with a tional expression data for 21953 polypeptides are described consensus amino acid sequence derived from a hidden below and in the Figures. Generally, increased prolyl oli Markov model is depicted in FIGS. 14A-14B. In a preferred gopeptidase activity has been detected in human prostate, embodiment, a human 21953 polypeptide has a serine pep lung, and sigmoid tumors relative to healthy normal tissue. tidase active site, e.g., an active site that is nearly identical Such increased activity can result from 21953 increased to the Prosite signature PDOC00587. The active site can expression. have a conserved catalytic triad with a conserved serine, As used herein, a “21953 activity”, “biological activity of e.g., a serine residue located at about amino acid 739 of SEQ 21953” or “functional activity of 21953, refers to an ID NO:20, a conserved aspartic acid, e.g., an aspartic acid activity exerted by a 21953 protein, polypeptide or nucleic residue located at about amino acid 817 of SEQ ID NO:20, 10 acid molecule on, e.g., a 21953-responsive cell or on a and a conserved histidine, e.g., a histidine residue located at 21953 Substrate, e.g., a oligopeptide Substrate, as determined about amino acid 849 of SEQ ID NO:20. in vivo or in vitro. In one embodiment, a 21953 activity is In a preferred embodiment 21953 polypeptide or protein a direct activity, such as an association with a 21953 target has a “prolyl oligopeptidase domain or a region which molecule. A “target molecule' or “binding partner is a includes at least about 30-300, more preferably about 15 molecule with which a 21953 protein binds or interacts in 50-150, or 60-80 amino acid residues and has at least about nature. For example, the 21953 proteins of the present 50%. 60%, 70% 80% 90% 95%, 99%, or 100% homology invention can have one or more of the following activities: with a “prolyl oligopeptidase domain, e.g., the prolyl (1) hydrolyzing peptide bonds at the carboxyl side of proline oligopeptidase domain of human 21953 (e.g., residues 672 residues; (2) mediating degradation of proline-containing 744 of SEQ ID NO:20). peptides, e.g., a prolyl endopeptidases activity; (3) process To identify the presence of a “prolyl oligopeptidase' ing of peptide factors (e.g., peptide hormones, chemokines, domain in a 21953 protein sequence, and make the deter cytokines, neuropeptides, and vasoactive peptides); (4) pro mination that a polypeptide or protein of interest has a cessing N-terminal dipeptides of unmodified N-termini particular profile, the amino acid sequence of the protein can wherein the penultimate residue is proline; (5) modulating be searched against a database of HMMs (e.g., the Pfam 25 cell proliferation and/or modulating cell differentiation (e.g., database, release 2.1) using the default parameters. For of a lung, breast, lymphoid, or colon cell); (6) modulating example, the hmmsf program, which is available as part of the regulation of transmission of intracellular signals, e.g., the HMMER package of search programs, is a family during immunological processes; (7) modulating metabo specific default program for MILPAT0063 and a score of 15 lism of neurotransmitters or neuropeptides; (8) modulating is the default threshold score for determining a hit. Alter 30 neurodegeneration; or (9) modulating follicular develop natively, the threshold score for determining a hit can be ment. lowered (e.g., to 8 bits). A description of the Pfam database As used herein, a "dipetidyl peptidase activity” refers to can be found in Sonhammer et al. (1997) Proteins 28(3): a catalytic activity that accelerates the Scission of a peptide 405-420 and a detailed description of HMMs can be found, bond between an amino acid sequence of less than four for example, in Gribskov et al. (1990) Meth. Enzymol. 35 amino acids and the remainder of the polypeptide. Prefer 183:146-159; Gribskov et al. (1987) Proc. Natl. Acad. Sci. ably, the cleaved peptide is a dipeptide having two amino USA 84:4355-4358; Krogh et al. (1994) J. Mol. Biol. 235: acids. The catalytic activity can be mediated by the side 1501-1531; and Stultz et al. (1993) Protein Sci. 2:305-314, chain of a serine amino acid and Surrounding residues in the the contents of which are incorporated herein by reference. active site. A search was performed against the HMM database result 40 As used herein, a “prolyl endopeptidases activity” refers ing in the identification of a “prolyl oligopeptidase domain to a catalytic activity that accelerates the Scission of a domain in the amino acid sequence of human 21953 at about peptide bond adjacent to a proline amino acid in a peptide or residues 672-744 of SEQ ID NO:20 (see FIG. 13). polypeptide chain. This catalytic activity has been detected, In a preferred embodiment, a 21953 polypeptide includes for example, in primary human lung tumors, squamous cell an N-terminal seven-blade B-propeller domain, e.g., resi 45 lung carcinomas, and lung adenocarcinomas. For example, dues about 88 to 663 of SEQ ID NO:20. The amino acid squamous cell lung carcinomas and lung adenocarcinomas sequence of this region can be aligned to the HMM profile showed significantly higher levels of prolyl endopeptidases for DPP IV N-terminal domain or the human DPP IV amino activity relative to normal lung parenchyma. acid sequence (P27487). As used herein, the term “DPP IV In accordance with the above-described sequence simi N-terminal domain refers to an amino acid sequence at 50 larities and observed polypeptide expression pattern, the least 60% identical to residues about 88 to 663 of SEQ ID 21953 molecules of the present invention can have similar NO:2O. biological activities as related prolyl oligopeptidase family A 21953 family member can include a prolyl oligopepti members. Members of the prolyl oligopeptidase family can dase domain and may also include a cAMP phosphorylation play an important role in the metabolism of a variety of site and coiMP-dependent protein kinase phosphorylation 55 proline containing peptides by cleaving prolyl bonds. These domain, a predicted Protein Kinase C site, a predicted peptides can be less than about 200, 150, 100, or 50 residues Casein Kinase II site, a predicted tyrosine phosphorylation in length. Prolyl oligopeptidases are involved, e.g., alone or site, a predicted N-myristylation site, and an amidation site. together with other factors, in the regulation, e.g., process As the 21953 polypeptides of the invention may modulate ing, activation, or degradation of biological factors, e.g., 21953-mediated activities, e.g., a dipeptidyl peptidase activ 60 peptide hormones (such as growth hormone, insulin, pro ity Such as a prolyl oligopeptidase activity, they may be lactin, adrenocorticotropic hormone, placental lactogen, cal useful for developing novel diagnostic and therapeutic citonin, parathyroid hormone, and thyroid stimulating hor agents for 21953-mediated or related disorders, as described mone); chemokines; cytokines; neuropeptides; and below. The 21953 polypeptide of the invention are highly vasoactive peptides. expressed in tumors, for example in breast and lung tumors. 65 As the 21953 mRNA is highly expressed, for example, in Further, 21953 polypeptide expression is increased at the cancerous tissues (e.g., lung and breast tumors), as well as G1-S phase transition of the mammalian cell cycle. Addi normal cardiovascular, neural, and prostatic tissues, the US 7,282,360 B2 45 46 molecules of the invention can be used to treat, prevent non-neoplastic polyps, adenomas, familial syndromes, col and/or diagnose disorders involving aberrant activity of orectal carcinogenesis, colorectal carcinoma, and carcinoid 21953-expressing cells. Accordingly, the 21953 molecules tumors. can act as novel diagnostic targets and therapeutic agents for Examples of cellular proliferative and/or differentiative controlling disorders associated with the aberrant activity or 5 disorders of the liver include, but are not limited to, nodular degradation of peptide hormones, e.g., disorders associated hyperplasias, adenomas, and malignant tumors, including with cell differentiation and proliferation (e.g., a cancer of primary carcinoma of the liver and metastatic tumors. the lung, breast, ovary, and colon tissues), immune function Examples of cellular proliferative and/or differentiative (e.g., T cell activities, e.g., lymphomas, leukemias, and disorders of the ovary include, but are not limited to, ovarian immune disorders), reproductive, neurological and cardio 10 tumors such as, tumors of coelomic epithelium, Serous vascular function. tumors, mucinous tumors, endometeriod tumors, clear cell As used herein, the terms “cancer”,99 “hyperproliferative' adenocarcinoma, cystadenofibroma, brenner tumor, Surface and “neoplastic' refer to cells having the capacity for epithelial tumors; germ cell tumors such as mature (benign) autonomous growth, i.e., an abnormal state or condition 15 teratomas, monodermal teratomas, immature malignant ter characterized by rapidly proliferating cell growth. Hyper atomas, dysgerminoma, endodermal sinus tumor, choriocar proliferative and neoplastic disease states may be catego cinoma; sex cord-stomal tumors such as, granulosa-theca rized as pathologic, i.e., characterizing or constituting a cell tumors, thecoma-fibromas, androblastomas, hill cell disease state, or may be categorized as non-pathologic, i.e., tumors, and gonadoblastoma; and metastatic tumors such as a deviation from normal but not associated with a disease Krukenberg tumors. state. The term is meant to include all types of cancerous Additional examples of proliferative disorders include growths or oncogenic processes, metastatic tissues or malig hematopoietic neoplastic disorders. As used herein, the term nantly transformed cells, tissues, or organs, irrespective of "hematopoietic neoplastic disorders' includes diseases histopathologic type or stage of invasiveness. “Pathologic involving hyperplastic/neoplastic cells of hematopoietic ori hyperproliferative” cells occur in disease states character 25 gin, e.g., arising from myeloid, lymphoid or erythroid lin ized by malignant tumor growth. Examples of non-patho eages, or precursor cells thereof. Preferably, the diseases logic hyperproliferative cells include proliferation of cells arise from poorly differentiated acute leukemias, e.g., eryth associated with wound repair. roblastic leukemia and acute megakaryoblastic leukemia. Examples of cellular proliferative and/or differentiative Additional exemplary myeloid disorders include, but are not 30 limited to, acute promyeloid leukemia (APML), acute myel disorders include cancer, e.g., carcinoma, sarcoma, or meta ogenous leukemia (AML) and chronic myelogenous leuke static disorders. The 21953 molecules can act as novel mia (CML) (reviewed in Vaickus, L. (1991) Crit Rev. in diagnostic targets and therapeutic agents for controlling lung Oncol./Hemotol. 11:267-97); lymphoid malignancies cancer, breast cancer, ovarian cancer, colon cancer, metasta sis of Such cancers and the like. A metastatic tumor can arise include, but are not limited to acute lymphoblastic leukemia 35 (ALL) which includes B-lineage ALL and T-lineage ALL, from a multitude of primary tumor types, including but not chronic lymphocytic leukemia (CLL), prolymphocytic leu limited to those of lung, breast, liver, colon and ovarian kemia (PLL), hairy cell leukemia (HLL) and Waldenstrom's origin. macroglobulinemia (WM). Additional forms of malignant Examples of cellular proliferative and/or differentiative lymphomas include, but are not limited to non-Hodgkin disorders of the lung include, but are not limited to, squa 40 lymphoma and variants thereof, peripheral T cell lympho mous cell lung carcinomas, Small cell lung carcinoma, lung mas, adult T cell leukemia/lymphoma (ATL), cutaneous adenocarcinomas, bronchogenic carcinoma, including para T-cell lymphoma (CTCL), large granular lymphocytic leu neoplastic syndromes, bronchioloalveolar carcinoma, neu kemia (LGF), Hodgkin’s disease and Reed-Sternberg dis roendocrine tumors, such as bronchial carcinoid, miscella CaSC. neous tumors, and metastatic tumors; pathologies of the 45 The 21953 nucleic acid and protein of the invention can pleura, including inflammatory pleural effusions, noninflam be used to treat and/or diagnose a variety of immune matory pleural effusions, pneumothorax, and pleural tumors, disorders, e.g., as a result of aberrant 21953 activity in T including Solitary fibrous tumors (pleural fibroma) and cells. Examples of immune disorders or diseases include, but malignant mesothelioma. are not limited to, autoimmune diseases (including, for Examples of cellular proliferative and/or differentiative 50 example, diabetes mellitus, arthritis (including rheumatoid disorders of the breast include, but are not limited to, arthritis, juvenile rheumatoid arthritis, osteoarthritis, psori proliferative breast disease including, e.g., epithelial hyper atic arthritis), multiple Sclerosis, encephalomyelitis, myas plasia, Sclerosing adenosis, and Small duct papillomas; thenia gravis, systemic lupus erythematosis, autoimmune tumors, e.g., Stromal tumors such as fibroadenoma, phyl thyroiditis, dermatitis (including atopic dermatitis and lodes tumor, and sarcomas, and epithelial tumors such as 55 eczematous dermatitis), psoriasis, Sjögren's Syndrome, large duct papilloma; carcinoma of the breast including in Crohn's disease, aphthous ulcer, iritis, conjunctivitis, kera situ (noninvasive) carcinoma that includes ductal carcinoma toconjunctivitis, ulcerative colitis, asthma, allergic asthma, in situ (including Paget’s disease) and lobular carcinoma in cutaneous lupus erythematosus, Scleroderma, vaginitis, situ, and invasive (infiltrating) carcinoma including, but not proctitis, drug eruptions, leprosy reversal reactions, limited to, invasive ductal carcinoma, invasive lobular car 60 erythema nodosum leprosum, autoimmune uveitis, allergic cinoma, medullary carcinoma, colloid (mucinous) carci encephalomyelitis, acute necrotizing hemorrhagic encepha noma, tubular carcinoma, and invasive papillary carcinoma, lopathy, idiopathic bilateral progressive sensorineural hear and miscellaneous malignant neoplasms. Disorders in the ing loss, aplastic anemia, pure red cell anemia, idiopathic male breast include, but are not limited to, gynecomastia and thrombocytopenia, polychondritis, Wegener's granulomato carcinoma. 65 sis, chronic active hepatitis, Stevens-Johnson syndrome, Examples of cellular proliferative and/or differentiative idiopathic sprue, lichen planus, Graves disease, sarcoidosis, disorders of the colon include, but are not limited to, primary biliary cirrhosis, uveitis posterior, and interstitial US 7,282,360 B2 47 48 lung fibrosis), graft-Versus-host disease, cases of transplan of the mitral valve (mitral valve prolapse), rheumatic fever tation, and allergy Such as, atopic allergy. and rheumatic heart disease, infective endocarditis, and Examples of neuronal disorders include, but are not noninfected vegetations. Such as nonbacterial thrombotic limited to disorders involving neurons, and disorders involv endocarditis and endocarditis of systemic lupus erythema ing glia, such as astrocytes, oligodendrocytes, ependymal to SuS (Libman-Sacks disease), carcinoid heart disease, and cells, and microglia; cerebral edema, raised intracranial complications of artificial valves; myocardial disease, pressure and herniation, and hydrocephalus; malformations including but not limited to dilated cardiomyopathy, hyper and developmental diseases, such as neural tube defects, trophic cardiomyopathy, restrictive cardiomyopathy, and forebrain anomalies, posterior fossa anomalies, and Syrin myocarditis; pericardial disease, including but not limited to, gomyelia and hydromyelia; perinatal brain injury; transmis 10 pericardial effusion and hemopericardium and pericarditis, sible spongiform encephalopathies (prion diseases); demy including acute pericarditis and healed pericarditis, and elinating diseases, including multiple Sclerosis, multiple rheumatoid heart disease; neoplastic heart disease, including Sclerosis variants, acute disseminated encephalomyelitis and but not limited to, primary cardiac tumors. Such as myxoma, acute necrotizing hemorrhagic encephalomyelitis, and other lipoma, papillary fibroelastoma, rhabdomyoma, and sar diseases with demyelination; degenerative diseases, such as 15 coma, and cardiac effects of noncardiac neoplasms; con degenerative diseases affecting the cerebral cortex, includ genital heart disease, including but not limited to, left-to ing Alzheimer disease and Pick disease, degenerative dis right shunts—late cyanosis, such as atrial septal defect, eases of basal ganglia and brain stem, including Parkin Ventricular septal defect, patent ductus arteriosus, and atrio Sonism, idiopathic Parkinson disease (paralysis agitans), Ventricular septal defect, right-to-left shunts—early cyano progressive Supranuclear palsy, corticobasal degeneration, sis, Such as tetralogy of fallot, transposition of great arteries, multiple system atrophy, including striatonigral degenera truncus arteriosus, tricuspid atresia, and total anomalous tion, Shy-Drager syndrome, and olivopontocerebellar atro pulmonary venous connection, obstructive congenital phy, and Huntington disease; spinocerebellar degenerations, anomalies. Such as coarctation of aorta, pulmonary Stenosis including spinocerebellar ataxias, including Friedreich and atresia, and aortic Stenosis and atresia, and disorders ataxia, and ataxia-telanglectasia, degenerative diseases 25 involving cardiac transplantation. affecting motor neurons, including amyotrophic lateral scle As used herein, “a prostate disorder” refers to an abnor rosis (motor neuron disease), bulbospinal atrophy (Kennedy mal condition occurring in the male pelvic region charac syndrome), and spinal muscular atrophy; tumors, such as terized by, e.g., male sexual dysfunction and/or urinary gliomas, including astrocytoma, including fibrillary (diffuse) symptoms. This disorder may be manifested in the form of astrocytoma and glioblastoma multiforme, pilocytic astro 30 genitourinary inflammation (e.g., inflammation of Smooth cytoma, pleomorphic Xanthoastrocytoma, and brain stem muscle cells) as in several common diseases of the prostate glioma, oligodendroglioma, and ependymoma and related including prostatitis, benign prostatic hyperplasia and can paraventricular mass lesions, neuronal tumors, poorly dif cer, e.g., adenocarcinoma or carcinoma, of the prostate. ferentiated neoplasms, including medulloblastoma, other The 21953 nucleic acid and protein of the invention can parenchymal tumors, including primary brain lymphoma, 35 be used to treat and/or diagnose a variety of conditions, in germ cell tumors, and pineal parenchymal tumors, menin addition to the ones described above (see “Methods of giomas, metastatic tumors, paraneoplastic syndromes, Treatment” for additional examples). peripheral nerve sheath tumors, including Schwannoma, The presence of 21953 RNA or protein can also be used neurofibroma, and malignant peripheral nerve sheath tumor to identify a cell or tissue, or other biological sample, as (malignant Schwannoma), and neurocutaneous syndromes 40 being derived from breast, T-cell, kidney, liver, and aorta, or (phakomatoses), including neurofibromotosis, including being of human origin. Expression can also be used to Type 1 neurofibromatosis (NF1) and TYPE 2 neurofibro diagnose or stage a disorder, e.g., a cancer (e.g., a cancer of matosis (NF2), tuberous sclerosis, and Von Hippel-Lindau the lung or breast), or a breast, lymphoid, lung, ovarian, or disease. liver disorder. Expression can be determined by evaluating The term “vascular disorder includes disorders involving 45 RNA, e.g., by hybridization of a 21953 specific probe, or aberrant activity (e.g., proliferation, metabolism, angiogen with a 21953 specific antibody. esis, vascularization) of blood vessel-associated cells, e.g., smooth muscle or endothelial cells. Examples of such dis m32404 orders include but are not limited to hypertension (e.g., Four major classes of proteases are known and are des arterial hypertension), vascular restenosis, ischemic disease 50 ignated by the principal functional group in their active site: (e.g., atherosclerosis), tumorigenesis, tumor metastasis, dia serine, thiol, carboxyl, and metallo. Serine proteases are betic retinopathy, endometriosis, Grave's disease. Aberrant characterized by the presence of a unique serine residue that vascular activity may also affect cardiovascular function, functions as a nucleophile to cleave peptide bonds. In some and thus the molecules of the invention can be used to treat, cases, the serine forms covalent adducts with Substrates and prevent and/or diagnose cardiovascular disorders. Examples 55 inhibitors. The serine functions with two other principal of cardiovascular disorders, include but are not limited to, residues of the active site, a histidine, and an acid, frequently heart failure, cardiac hypertrophy, left-sided heart failure, aspartic acid. Together these three residues compose the and right-sided heart failure; ischemic heart disease, includ catalytic triad which is a signature of the family. Serine ing but not limited to angina pectoris, myocardial infarction, proteases are divided into two major evolutionary families. chronic ischemic heart disease, and Sudden cardiac death; 60 One family is represented by the bacterial protease subtili hypertensive heart disease, including but not limited to, sin. The other family is the trypsin- family and systemic (left-sided) hypertensive heart disease and pulmo includes chymotrypsin, trypsin, and . Members of nary (right-sided) hypertensive heart disease; valvular heart the trypsin-chymotrypsin serine protease family are disease, including but not limited to, Valvular degeneration involved in a range of diverse cellular functions including, caused by calcification, such as calcific aortic Stenosis, 65 cell motility, cell growth and differentiation, hormone pro calcification of a congenitally bicuspid aortic valve, and duction, organogenesis, extracellular matrix regulation, mitral annular calcification, and myxomatous degeneration blood clotting, and complementation activation. US 7,282,360 B2 49 50 While the various serine proteases catalyze this reaction cercarial elastase; ; Factor C. Proclotting enzyme; in very similar ways, they differ in their preference for the easter gene product; Snake gene product; stubble gene amino acid side chains immediately C-terminal to the cleave product; Vitellin-degrading ; hypodermin C: site. Trypsin cleaves bonds only after lysine and arginine Serine proteases 1 and 2; achelase, chymotrypsin (forms A, residues, whereas chymotrypsin cleaves bonds after large B, II, and 2); Proteinase RVV-V (forms C. and Y); flavoboxin; hydrophobic residues. Some members of the trypsin serine ; Crotalase; ; ; ; protease family play critical roles in a variety of important seminin; ; tissue ; renal kallikrein; biological events including regulating cell proliferation, submandibular kallikrein; 7S nerve growth factor (chains C. tumor growth, tumor invasion, metastasis, development, and and Y); epidermal growth factor-binding protein (forms 1, 2, tissue remodeling. 10 and 3); tonin; arginine esterase; I: pan The human m32404 sequence (see SEQ ID NO:24), creatic elastase II (forms A and B); pancreatic endopeptidase which is approximately 2219 nucleotides long including E (forms A and B); leukocyte elastase; medullasin; azuro untranslated regions, contains a predicted methionine-initi cidin; G, (); ated coding sequence of about 1659 nucleotides, including (forms I and II); Y-renin; (forms 1, 2, and 3); the termination codon. The coding sequence encodes a 552 15 A: natural killer cell protease 1, gilatoxin; amino acid protein (SEQ ID NO:25). The human m32404 B, C, D, E, F, G and Y: carboxypeptidase A protein of SEQ ID NO:25 and FIG. 15 includes an amino complex component III; complement factors D. B. I; terminal hydrophobic amino acid sequence, consistent with complement components C1r, C1s, and C2; calcium-depen a signal sequence, of about 23 amino acids (from amino acid dent serine protease; hypodermin A, B, and C.; haptoglobin 1 to about amino acid 23 of SEQ ID NO:25), which upon (forms 1 and 2); haptoglobin-related protein; , apo cleavage results in the production of a mature protein form. lipoprotein (a); hepatocyte growth factor; medullasin; This mature protein form is approximately 529 amino acid ; t-, u-plasminogen activator; residues in length (from about amino acid 24 to amino acid salivary plasminogen activator, ; coagula 552 of SEQ ID NO:25). tion factors VII, IX, X, XI, and XII; and proteins C and Z. Human m32404 contains the following regions or other 25 as well as as-yet unidentified members. structural features: Accordingly, the m32404 polypeptide contains a signifi two trypsin domains (PFAM Accession PF00089) located cant number of structural characteristics in common with at about amino acid residues 45 to 268 and 311 to 520 of members of the S1 family of the SA clan of serine-type SEQ ID NO:25, which include trypsin histidine and serine proteases (also referred to herein as "trypsin-chymotrypsin' active sites located at about amino acids 73-78 and 337-342, 30 or “trypsin' family members). The term “family' when and 218-229, respectively, of SEQ ID NO:25; referring to the protein and nucleic acid molecules of the eight predicted Protein Kinase C phosphorylation sites invention means two or more proteins or nucleic acid (PS00005) at about amino acids 4 to 6, 53 to 55, 96 to 98, molecules having a common structural domain or motif and 173 to 175, 246 to 248, 298 to 300, 422 to 424, and 504 to having Sufficient amino acid or nucleotide sequence homol 506 of SEQ ID NO:25; 35 ogy as defined herein. Such family members can be naturally six predicted Casein Kinase II phosphorylation sites or non-naturally occurring and can be from either the same (PS00006) located at about amino acid 161 to 164, 348 to or different species. For example, a family can contain a first 351, 375 to 378, 496 to 499, 514 to 517, and 521 to 524 of protein of human origin as well as other distinct proteins of SEQ ID NO:25; human origin, or alternatively, can contain homologues of two predicted N-glycosylation sites (PS00001) from 40 non-human origin, e.g., rat or mouse proteins. Members of about amino acid 166 to 169 and 545 to 548 of SEQ ID a family can also have common functional characteristics. NO:25; and As used herein, a "trypsin-chymotrypsin family member nine predicted N-myristylation sites (PS00008) from typically contains a catalytic unit which is generally a about amino 20 to 25, 58 to 63, 64 to 69, 101 to 106, 126 to polypeptide sequence of about 100 to about 300 amino 131, 206 to 211, 297 to 302,328 to 333, and 460 to 465 of 45 acids, more preferably about 150 to about 250 amino acid SEQ ID NO:25. residues, even more preferably about 200 to about 230 For general information regarding PFAM identifiers, PS amino acid residues, although some members have N-ter prefix and PF prefix domain identification numbers, refer to minal extensions of unrelated peptide segments. The cata Sonnhammer et al. (1997) Protein 28:405-420. lytic unit almost always forms the C-terminal portion of the The m32404 polypeptide contains a significant number of 50 enzyme. These proteases typically cleave arginine or lysine structural characteristics in common with members of the residues in a target protein. trypsin serine protease family (Rawlings and Barret (1993) Trypsin-chymotrypsin family members preferably have at Biochem J. 290: 205-218, and Meth. Enzymol. (1994) 244: least one trypsin domain, comprising at least one histidine 19-61, the contents of which are hereby incorporated by active site residue, and at least one serine active site residue. reference in their entirety). Based on the presence of the 55 Trypsin-chymotrypsin family members can also include an histidine-aspartate-serine catalytic triad, the m32404 aspartate residue within the trypsin domain. These three polypeptide appears to be a member of the serine protease residues act as a “catalytic triad, with serine as nucleophile, clan SA (Rawlings and Barret supra). The clan SA includes aspartate as electrophile, and histidine as base. The serine the trypsin-chymotrypsin family (S1), the C-lytic endopep nucleophile typically occurs in a signature motif character tidase family (S2), and the Togavirus endopeptidase family 60 ized by Prosite Motif PS00135 (also PDOC00124): G-DE (S3). S-G-GS. Typically, a trypsin domain additionally includes The m32404 polypeptide seems to belong to the trypsin an activation and cleavage site, Arg-Ile-Val-Gly-Gly (or chymotrypsin family (S1). The prototype of this family is “RIVGG'; SEQ ID NO:30), which is present just N-termi chymotrypsin and the 3D structure of some of its members nal to the serine protease domain. has been resolved. The trypsin-chymotrypsin family (S1) 65 m32404 polypeptides contain structural features similar includes such members as: trypsin (forms I, II, III, IV, Va and to trypsin-chymotrypsin family members. For example, each Vb); trypsin-like enzyme: ; TMPRSS2; venombin; of the two trypsin domains of the m32404 polypeptide has US 7,282,360 B2 51 52 a conserved histidine residue present at about amino acid 77 a polypeptide or protein of interest has a particular profile, and 341 of SEQ ID NO:25. The histidine base typically the amino acid sequence of the protein can be searched occurs in a signature motif characterized by Prosite Motif against a database of HMMs (e.g., the Pfam database, PS00134 (also PDOC00124): LIVM-ST-A-STAG-H- release 2.1) using the default parameters. For example, the C. An m32404 polypeptide also contains the sequence himmsf program, which is available as part of the HMMER LTAAHC (SEQ ID NO:31), which matches PS00134, at package of search programs, is a family specific default about amino acids 73 to 78 and 337 to 342 of SEQ ID program for MILPAT0063 and a score of 15 is the default NO:25. threshold score for determining a hit. Alternatively, the In addition, the m32404 polypeptide includes the threshold score for determining a hit can be lowered (e.g., to sequence GDSGG (SEQ ID NO:32), which matches 10 8 bits). A description of the Pfam database can be found in PS00135, at about amino acids 222 to 226 of SEQ ID Sonhammer et al. (1997) Proteins 28(3):405-420 and a NO:25. The serine active site is located at amino acid 224 of detailed description of HMMs can be found, for example, in SEQ ID NO:25. The trypsin domains of the m32404 Gribskov et al. (1990) Meth. Enzymol. 183:146-159; Grib polypeptide additionally include eleven conserved cys skov et al. (1987) Proc. Natl. Acad. Sci. USA 84:4355-4358: teines, which are present at about amino acids 62, 187, 209, 15 Kroghet al. (1994) J. Mol. Biol. 235:1501-1531; and Stultz 220, 249,326,342, 443, 463, 473, 501 of SEQID NO:25. et al. (1993) Protein Sci. 2:305-314, the contents of which Trypsin-chymotrypsin family members occasionally are incorporated herein by reference. A search was per function intracellularly, but more generally, they act extra formed against the PFAM HMM database resulting in the cellularly. Examples of such extracellular activity include identification of two “trypsin domains' in the amino acid release or activation of growth factors, degradation of extra sequence of human m32404 from about residues 45 to 268 cellular matrix, , , Zymogen and and 311 to 520 of SEQID NO:25 with a bit score of 254 (see growth hormone activation, and complement activation. FIGS. 16A-16B). Trypsin-chymotrypsin family members have been impli To identify the presence of a “trypsin’ domain in an cated in modulating tumor invasion and growth by, for m32404 protein sequence, and make the determination that example, releasing or activating growth factors and/or 25 a polypeptide or protein of interest has a particular profile, digesting extracellular matrix components. the amino acid sequence of the protein can also be searched An m32404 polypeptide includes at least one and prefer against a SMART database (Simple Modular Architecture ably two “trypsin domains” or at least one and preferably Research Tool) of HMMs as described in Schultz et al. two regions homologous with a "trypsin domain.” (1998), Proc. Natl. Acad. Sci. USA95:5857 and Schultz et As used herein, the term “trypsin domain” (or a "trypsin 30 al. (200) Nucl. Acids Res 28:231. The database contains chymotrypsin' domain) refers to a protein domain having an domains identified by profiling with the hidden Markov amino acid sequence of from about 50 to about 350 amino models of the HMMer2 search program (R. Durbin et al. acid residues and having a bit score for the alignment of the (1998) Biological sequence analysis: probabilistic models of sequence to the trypsin domain (HMM) of at least 60. proteins and nucleic acids. Cambridge University Press). Preferably, a trypsin domain includes at least about 100 to 35 The database also is extensively annotated and monitored by about 300 amino acids, more preferably about 150 to about experts to enhance accuracy. A search was performed against 250 amino acid residues, more preferably about 200 to about the HMM database resulting in the identification of two 230 amino acids and has a bit score for the alignment of the "trypsin’ domains in the amino acid sequence of human sequence to the trypsin domain (HMM) of at least 80, m32404 at about residues 38 to 268 and 300 to 520 of SEQ preferably at least 90, more preferably at least 100, and most 40 preferably 110 or greater. The trypsin domain (HMM) has ID NO:25 (see 3A-3B). An m32404 family member can include one or more of a been assigned the PFAM Accession (PF00089). Alignments trypsin domain, a signal peptide domain, an N-glycosylation of two trypsin domains (from about amino acids 45 to 268 site, a protein kinase C phosphorylation site, a casein kinase and from about amino acids 311 to 520 of SEQID NO:25) II phosphorylation site, or an N-myristoylation site. of human m32404 with a consensus amino acid sequence 45 derived from a hidden Markov model (PFAM) are depicted As used herein, a 'signal peptide' or “signal sequence' in FIGS. 16A and 16B. Alignments of the two trypsin refers to a peptide of about 15 to 30, preferably about 20 to domains (from about amino acids 38 to about 268 and from 25, more preferably, 23 amino acid residues in length which about amino acids 300 to 520 of SEQID NO:25) of human occurs at the N-terminus of secretory and integral membrane m32404 with a consensus amino acid sequence derived from 50 proteins and which contains a majority of hydrophobic another hidden Markov model (SMART) are depicted in amino acid residues. For example, a signal sequence con FIGS. 17A and 17B. tains at least about 15 to 25 amino acid residues, preferably In a preferred embodiment, an m32404 polypeptide or about 20 to 25 amino acid residues, more preferably about protein has a "trypsin’ domain or a region which includes at 23 amino acid residues, and has at least about 40-70%, least about 100 to about 300 amino acids, more preferably 55 preferably about 50-65%, and more preferably about about 150 to about 250 amino acid residues, or about 210 to 55-60% hydrophobic amino acid residues (e.g., alanine, about 235 amino acid residues and has at least about 70%, valine, leucine, isoleucine, phenylalanine, tyrosine, tryp 80%, 90%. 95%, 99%, or 100% homology with a “trypsin tophan, or proline). Such a 'signal sequence', also referred domain.” e.g., either trypsin domain of human m32404 (e.g., to in the art as a “signal peptide', serves to direct a protein residues about 45 to 268 and 311 to 520 of SEQID NO:25). 60 containing Such a sequence to a lipid bilayer. For example, Preferably, the trypsin domain includes at least one histidine in one embodiment, an m32404 protein contains a signal active site residue, and at least one serine active site residue. sequence of about amino acids 1 to 23 of SEQ ID NO:25. The trypsin domain can also include an aspartate residue, The “signal sequence' is cleaved during processing of the thus forming a catalytic triad, with serine as nucleophile, mature protein. The mature m32404 protein corresponds to aspartate as electrophile, and histidine as base. 65 amino acids 24 to 552 of SEQ ID NO:25. To identify the presence of a “trypsin' domain in an As the m32404 polypeptides of the invention may modu m32404 protein sequence, and make the determination that late m32404-mediated activities, they may be useful as of US 7,282,360 B2 53 54 for developing novel diagnostic and therapeutic agents for diagnostic targets and therapeutic agents for controlling m32404-mediated or related disorders, as described below. breast cancer, ovarian cancer, colon cancer, lung cancer, As used herein, a “m32404 activity.” “biological activity metastasis of Such cancers and the like. A metastatic tumor of m32404” or “functional activity of m32404” refers to an can arise from a multitude of primary tumor types, including activity exerted by an m32404 protein, polypeptide or 5 but not limited to those of breast, lung, liver, colon and nucleic acid molecule on e.g., an m32404-responsive cell or ovarian origin. on an m32404 Substrate, e.g., a protein Substrate, as deter Examples of cancers or neoplastic conditions, in addition mined in vivo or in vitro. In one embodiment, an m32404 to the ones described above, include, but are not limited to, activity is a direct activity, Such as an association with an a fibrosarcoma, myosarcoma, liposarcoma, chondrosar m32404 target molecule. A “target molecule' or “binding 10 coma, osteogenic sarcoma, chordoma, angiosarcoma, endot partner is a molecule with which an m32404 protein binds heliosarcoma, lymphangiosarcoma, lymphangioendothe or interacts in nature. An m32404 activity can also be an liosarcoma, synovioma, mesothelioma, Ewings tumor, indirect activity, e.g., a cellular signaling activity mediated leiomyosarcoma, rhabdomyosarcoma, gastric cancer, esoph by interaction of the m32404 protein with an m32404 ageal cancer, rectal cancer, pancreatic cancer, ovarian can receptor. For example, the m32404 proteins of the present 15 cer, prostate cancer, uterine cancer, cancer of the head and invention can have one or more of the following activities: neck, skin cancer, brain cancer, squamous cell carcinoma, (1) modulate (e.g., stimulate or inhibit) cellular proliferation sebaceous gland carcinoma, papillary carcinoma, papillary (2) modulate cell differentiation; (3) modulate tumorigenesis adenocarcinoma, cystadenocarcinoma, medullary carci and/or tumor invasion; (4) alter extracellular matrix com noma, bronchogenic carcinoma, renal cell carcinoma, position; (5) catalyze polypeptide growth factor activation hepatoma, bile duct carcinoma, choriocarcinoma, semi and/or release; (6) regulate the blood clotting cascade; (7) noma, embryonal carcinoma, Wilm's tumor, cervical cancer, catalyze proteolytic cleavage of a Substrate, e.g., a protein testicular cancer, Small cell lung carcinoma, non-small cell Substrate (e.g., cleavage at an arginine or lysine residue); (8) lung carcinoma, bladder carcinoma, epithelial carcinoma, catalyze the proteolytic activation of signaling molecules, glioma, astrocytoma, medulloblastoma, craniopharyngioma, e.g., other proteases, growth factor activation or release; or 25 ependymoma, pinealoma, hemangioblastoma, acoustic neu (9) regulate of cell motility or attachment. roma, oligodendroglioma, meningioma, melanoma, neuro Based on the above-described sequence similarities, the blastoma, retinoblastoma, leukemia, lymphoma, or Kaposi m32404 molecules of the present invention are predicted to SaCOa. have similar biological activities as other trypsin family Examples of cellular proliferative and/or differentiative members, such as hepsin proteases. Hepsin proteases are 30 disorders of the breast include, but are not limited to, overexpressed in ovarian tumors and hepatoma cells (Tan proliferative breast disease including, e.g., epithelial hyper imoto, H. et al. (1997) Cancer Res. 57:2884-2887). Further plasia, Sclerosing adenosis, and Small duct papillomas; in vitro studies have shown inhibition of hepatoma cell tumors, e.g., Stromal tumors such as fibroadenoma, phyl proliferation using hepsin inhibitors (Torres-Rosado, A. et lodes tumor, and sarcomas, and epithelial tumors such as al. (1993) Proc. Natl. Acad. Sci. USA 90: 7181-7185). 35 large duct papilloma; carcinoma of the breast including in Accordingly, m32404 molecules are predicted to have pep situ (noninvasive) carcinoma that includes ductal carcinoma tidase activity, and are predicted to regulate cell proliferation in situ (including Paget’s disease) and lobular carcinoma in and differentiation, to regulate coagulation (such as in blood situ, and invasive (infiltrating) carcinoma including, but not clotting), regulate organogenesis, control hormone produc limited to, invasive ductal carcinoma, invasive lobular car tion, and/or modulate complement activation. Thus, the 40 cinoma, medullary carcinoma, colloid (mucinous) carci m32404 molecules can serve as novel diagnostic targets and noma, tubular carcinoma, and invasive papillary carcinoma, therapeutic agents for controlling cell proliferation and and miscellaneous malignant neoplasms. Disorders in the differentiation disorders, coagulation disorders, hormonal male breast include, but are not limited to, gynecomastia and disorders, fertilization disorders, and disorders of organo carcinoma. genesis and cell signaling. 45 Examples of cellular proliferative and/or differentiative The polypeptides and nucleic acids of the invention can disorders of the lung include, but are not limited to, bron also be used to treat, prevent, and/or diagnose cancers and chogenic carcinoma, including paraneoplastic syndromes, neoplastic conditions in addition to the ones described bronchioloalveolar carcinoma, neuroendocrine tumors, such above. As used herein, the terms “cancer,” “hyperprolifera as bronchial carcinoid, miscellaneous tumors, and metastatic tive' and “neoplastic’ refer to cells having the capacity for 50 tumors; pathologies of the pleura, including inflammatory autonomous growth, i.e., an abnormal state or condition pleural effusions, noninflammatory pleural effusions, pneu characterized by rapidly proliferating cell growth. Hyper mothorax, and pleural tumors, including Solitary fibrous proliferative and neoplastic disease states may be catego tumors (pleural fibroma) and malignant mesothelioma. rized as pathologic, i.e., characterizing or constituting a Examples of cellular proliferative and/or differentiative disease state, or may be categorized as non-pathologic, i.e., 55 disorders of the colon include, but are not limited to, a deviation from normal but not associated with a disease non-neoplastic polyps, adenomas, familial syndromes, col state. The term is meant to include all types of cancerous orectal carcinogenesis, colorectal carcinoma, and carcinoid growths or oncogenic processes, metastatic tissues or malig tumors. nantly transformed cells, tissues, or organs, irrespective of Examples of cellular proliferative and/or differentiative histopathologic type or stage of invasiveness. “Pathologic 60 disorders of the liver include, but are not limited to, nodular hyperproliferative” cells occur in disease states character hyperplasias, adenomas, and malignant tumors, including ized by malignant tumor growth. Examples of non-patho primary carcinoma of the liver and metastatic tumors. logic hyperproliferative cells include proliferation of cells Examples of cellular proliferative and/or differentiative associated with wound repair. disorders of the ovary include, but are not limited to, ovarian Examples of cellular proliferative and/or differentiative 65 tumors such as, tumors of coelomic epithelium, Serous disorders include cancer, e.g., carcinoma, sarcoma, or meta tumors, mucinous tumors, endometeriod tumors, clear cell static disorders. The m32404 molecules can act as novel adenocarcinoma, cystadenofibroma, brenner tumor, Surface US 7,282,360 B2 55 56 epithelial tumors; germ cell tumors such as mature (benign) one predicted tyrosine kinase phosphorylation site teratomas, monodermal teratomas, immature malignant ter (PS00007) from about amino acids 108 to 116 of SEQ ID atomas, dysgerminoma, endodermal sinus tumor, choriocar NO:34; or cinoma; sex cord-stomal tumors such as, granulosa-theca one predicted serine protease, histidine active site cell tumors, thecoma-fibromas, androblastomas, hill cell 5 (PS00134) from about amino acids 52 to 57 of SEQ ID tumors, and gonadoblastoma; and metastatic tumors such as NO:34. Krukenberg tumors. For general information regarding PFAM identifiers, PS prefix and PF prefix domain identification numbers, refer to Additional examples of proliferative disorders include Sonnhammer et al. (1997) Protein 28:405-420. hematopoietic neoplastic disorders. As used herein, the term 10 The 14089 polypeptide contains a significant number of "hematopoietic neoplastic disorders' includes diseases structural characteristics in common with members of the involving hyperplastic/neoplastic cells of hematopoietic ori trypsin serine protease family (Rawlings and Barret (1993) gin, e.g., arising from myeloid, lymphoid or erythroid lin Biochem J. 290: 205-218, and Meth. Enzymol. (1994) 244: eages, or precursor cells thereof. Preferably, the diseases 19-61, the contents of which are hereby incorporated by arise from poorly differentiated acute leukemias, e.g., eryth 15 reference in their entirety). Based on the presence of the roblastic leukemia and acute megakaryoblastic leukemia. histidine-aspartate-serine catalytic triad, the 14089 polypep Additional exemplary myeloid disorders include, but are not tide appears to be a member of the serine protease clan SA limited to, acute promyeloid leukemia (APML), acute myel (Rawlings and Barret, supra). The clan SA includes the ogenous leukemia (AML) and chronic myelogenous leuke trypsin-chymotrypsin family (S1), the C-lytic endopeptidase mia (CML) (reviewed in Vaickus, L. (1991) Crit Rev. in family (S2), and the Togavirus endopeptidase family (S3). Oncol./Hemotol. 11:267-97); lymphoid malignancies The 14089 polypeptide seems to belong to the trypsin include, but are not limited to acute lymphoblastic leukemia chymotrypsin family (S1). The prototype of this family is (ALL) which includes B-lineage ALL and T-lineage ALL, chymotrypsin and the 3D structure of some of its members chronic lymphocytic leukemia (CLL), prolymphocytic leu has been resolved. The trypsin-chymotrypsin family (S1) kemia (PLL), hairy cell leukemia (HLL) and Waldenstrom's 25 includes such members as: trypsin (forms I, II, III, IV, Va and macroglobulinemia (WM). Additional forms of malignant Vb); trypsin-like enzyme; hepsin; Venombin; cercarial lymphomas include, but are not limited to non-Hodgkin elastase; brachyurin; Factor C. Proclotting enzyme; easter lymphoma and variants thereof, peripheral T cell lympho gene product: Snake gene product; stubble gene product; mas, adult T cell leukemia/lymphoma (ATL), cutaneous Vitellin-degrading endopeptidase; hypodermin C; Serine T-cell lymphoma (CTCL), large granular lymphocytic leu 30 proteases 1 and 2; achelase, chymotrypsin (forms A, B, II, kemia (LGF), Hodgkin’s disease and Reed-Sternberg dis and 2); Proteinase RVV-V (forms C. and Y); flavoboxin; CSC. venombin A; Crotalase; enteropeptidase; acrosin; ancrod; seminin; semenogelase; tissue kallikrein; renal kallikrein; 14089 submandibular kallikrein; 7S nerve growth factor (chains C. The human 14089 sequence (see SEQ ID NO:33), which 35 and Y); epidermal growth factor-binding protein (forms 1, 2, is approximately 957 nucleotides long including untrans and 3); tonin; arginine esterase; pancreatic elastase I: pan lated regions, contains a predicted methionine-initiated cod creatic elastase II (forms A and B); pancreatic endopeptidase ing sequence of about 726 nucleotides, including the termi E (forms A and B); leukocyte elastase; medullasin; azuro nation codon (nucleotides indicated as coding of SEQ ID cidin; , proteinase 3 (myeloblastin); chymase NO:33: SEQ ID NO:35). The coding sequence encodes a 40 (forms I and II); Y-renin; tryptase (forms 1, 2, and 3); 241 amino acid protein (SEQID NO:34). The human 14089 : natural killer cell protease 1, gilatoxin; protein of SEQ ID NO:34 includes an amino-terminal granzymes B, C, D, E, F, G and Y: carboxypeptidase A hydrophobic amino acid sequence, consistent with a signal complex component III; complement factors D. B. I; sequence, of about 18 amino acids (from amino acid 1 to complement components C1r, C1s, and C2; calcium-depen about amino acid 18 of SEQ ID NO:34), which upon 45 dent serine protease; hypodermin A, B, and C.; haptoglobin cleavage results in the production of a mature protein. This (forms 1 and 2); haptoglobin-related protein; plasmin, apo mature protein form is approximately 222 amino acid resi lipoprotein (a); hepatocyte growth factor; medullasin; dues in length (from about amino acid 19 to amino acid 241 thrombin; t-plasminogen activator, u-plasminogen activator; of SEQ ID NO:34). salivary plasminogen activator, plasma kallikrein; coagula Human 14089 contains the following regions or other 50 tion factors VII, IX, X, XI, and XII; and proteins C and Z. structural features: as well as as-yet unidentified members. a trypsin domain (PFAM Accession Number PF00089) The 14089 polypeptides can be homologous to the mouse located at about amino acid residues 24 to 234 or 41 to 234 bodenin gene (GenBank Accession No. AJO01373). The of SEQ ID NO:34 (according to SMART and PFAM, mouse bodenin gene is expressed in region of the brain Such respectively); 55 as the basal ganglia, thalamus, cerebral cortex, and may play four predicted Casein Kinase II phosphorylation sites a role in the developing and mature central nervous system. See, Faisst and Gruss, (1998) Dev. Dyn. 212:293-303. (PS00006) located at about amino acids 96 to 99, 109 to 112, Accordingly, the 14089 polypeptide contains a significant 126 to 129, and 210 to 213 of SEQID NO:34: number of structural characteristics in common with mem three predicted N-glycosylation sites (PS00001) from 60 bers of the SI family of the SA clan of serine-type proteases about amino acids 11 to 14, 156 to 159, and 173 to 176 of (also referred to herein as “trypsin-chymotrypsin' or SEQ ID NO:34: “trypsin' family members). The term “family' when refer two predicted N-myristylation sites (PS00008) from ring to the protein and nucleic acid molecules of the inven about amino acids 182 to 187 and 203 to 208 of SEQ ID tion means two or more proteins or nucleic acid molecules NO:34: 65 having a common structural domain or motif and having one predicted amidation site (PS00009) from about amino Sufficient amino acid or nucleotide sequence homology as acids 185 to 188 of SEQ ID NO:34: defined herein. Such family members can be naturally or US 7,282,360 B2 57 58 non-naturally occurring and can be from either the same or least 110, more preferably at least 120 or greater. The trypsin different species. For example, a family can contain a first domain (HMM) has been assigned the PFAM Accession protein of human origin as well as other distinct proteins of (PF00089). An alignment of the trypsin domain (from about human origin, or alternatively, can contain homologues of amino acids 41 to 234 of SEQ ID NO:34) of human 14089 non-human origin, e.g., rat or mouse proteins. Members of 5 with a consensus amino acid sequence derived from a a family can also have common functional characteristics. hidden Markov model (PFAM) is depicted in FIG. 19A. An As used herein, a "trypsin-chymotrypsin family member alignment of the trypsin domain (from about amino acids 24 typically contains a catalytic unit that is generally a polypep to about 234 of SEQ ID NO:34) of human 14089 with a tide sequence of about 100 to about 300 amino acids, more consensus amino acid sequence derived from another hidden preferably about 150 to about 250, or about 170 to about 230 10 Markov model (SMART) is depicted in FIG. 19B. amino acid residues, although some members have N-ter In a preferred embodiment, a 14089 polypeptide or pro minal extensions of unrelated peptide segments. The cata tein has a "trypsin' domain or a region which includes at lytic unit typically forms the C-terminal portion of the least about 100 to about 300 amino acids, more preferably enzyme. These proteases typically cleave arginine or lysine about 150 to about 250, or about 170 to about 220 amino residues in a target protein. 15 acid residues and has at least about 70%, 80%, 90%, 95%, Trypsin-chymotrypsin family members preferably have at 99%, or 100% homology with a “trypsin domain, e.g., the least one trypsin domain, comprising at least one histidine trypsin domain of human 14089 (e.g., about residues 224 to active site residue, and at least one serine active site residue. 234 or 241 to 234 of SEQID NO:34). Preferably, the trypsin Trypsin-chymotrypsin family members can also include an domain includes at least one histidine active site residue, and aspartate residue within the trypsin domain. These three 20 at least one serine active site residue. The trypsin domain can residues act as a “catalytic triad', with serine as nucleophile, also include an aspartate residue, thus forming a catalytic aspartate as electrophile, and histidine as base. triad, with serine as nucleophile, aspartate as electrophile, 14089 polypeptides contain structural features similar to and histidine as base. trypsin-chymotrypsin family members. For example, the To identify the presence of a “trypsin' domain in a 14089 trypsin domain of the 14089 polypeptide has a conserved 25 protein sequence, and make the determination that a histidine residue present at about amino acid 56 of SEQ ID polypeptide or protein of interest has a particular profile, the NO:34, and a serine active site located at amino acid 195 of amino acid sequence of the protein can be searched against SEQ ID NO:34. The trypsin domain of the 14089 polypep a database of HMMs (e.g., the Pfam database, release 2.1) tide additionally includes eight conserved cysteines, which using the default parameters. For example, the hmmsf are present at about amino acids 40, 57, 133, 143, 165, 180, 30 program, which is available as part of the HMMER package 191, 201, and 215 of SEQ ID NO:34. Eight of these of search programs, is a family specific default program for cysteines can form disulfide bonds together in an intramo MILPAT0063 and a score of 15 is the default threshold score lecular context. Preferably, the disulfide bonds are formed for determining a hit. Alternatively, the threshold score for between residues about 40 and 57, 133 and 201, 165 and determining a hit can be lowered (e.g., to 8 bits). A descrip 180, 191 and 215 of SEQ ID NO:34. 35 tion of the Pfam database can be found in Sonhammer et al. In addition, the 14089 polypeptide includes an active site (1997) Proteins 28(3):405-420 and a detailed description of serine at about residue 195 of SEQID NO:34. The histidine HMMs can be found, for example, in Gribskov et al. (1990) base typically occurs in a signature motif characterized by Meth. Enzymol. 183:146-159; Gribskov et al. (1987) Proc. Prosite Motif PS00134: LIVM-ST-A-STAG-H-C. A Natl. Acad. Sci. USA 84:4355-4358; Krogh et al. (1994) J. 14089 polypeptide also contains the sequence ITAAHC, 40 Mol. Biol. 235:1501-1531; and Stultz et al. (1993) Protein which matches PS00134, at about amino acids 52 to 57 of Sci. 2:305-314, the contents of which are incorporated SEQ ID NO:34. herein by reference. A search was performed against the Trypsin-chymotrypsin family members occasionally PFAM HMM database resulting in the identification of a function intracellularly, but more generally, they act extra "trypsin domain” in the amino acid sequence of human cellularly. Examples of such extracellular activity include 45 14089 at about residues 41 to 234 of SEQID NO:34 with a release or activation of growth factors, degradation of extra bit score of 122.5 (see FIGS. 18 and 20A-20B). cellular matrix, coagulation, fibrinolysis, Zymogen and To identify the presence of a “trypsin' domain in a 14089 growth hormone activation, and complement activation. protein sequence, the amino acid sequence of the protein can Trypsin-chymotrypsin family members have been impli also be searched against a SMART database (Simple Modu cated in modulating tumor invasion and growth by, for 50 lar Architecture Research Tool) of HMMs as described in example, releasing or activating growth factors and/or Schultz et al. (1998), Proc. Natl. Acad. Sci. USA 95:5857 digesting extracellular matrix components. A 14089 and Schultz et al. (200) Nucl. Acids Res 28:231. The polypeptide can include a signal sequence, located at resi database contains domains identified by profiling with the dues about 1 to 18 of SEQ ID NO:34, which directs the hidden Markov models of the HMMer2 search program (R. polypeptide to the extracellular milieu. 55 Durbin et al. (1998) Biological sequence analysis: proba A 14089 polypeptide includes at least one “trypsin bilistic models of proteins and nucleic acids. Cambridge domain or at least one region homologous with a "trypsin University Press). The database also is extensively anno domain'. As used herein, the term “trypsin domain (or a tated and monitored by experts to enhance accuracy. A "trypsin-chymotrypsin' domain) refers to a protein domain search was performed against the HMM database resulting having an amino acid sequence of from about 50 to about 60 in the identification of a “serine protease” domain in the 350 amino acid residues and having a bit score for the amino acid sequence of human 14089 at about residues 24 alignment of the sequence to the trypsin domain (HMM) of to 234 of SEQ ID NO:34 (see FIG. 18). at least 70. Preferably, a trypsin domain includes at least The sequence of interest can also be characterized using about 100 to about 300 amino acids, more preferably about the ProDom database. To perform this analysis, the amino 150 to about 250, or about 170 to about 220 amino acid 65 acid sequence of the protein is searched against a database residues and has a bit score for the alignment of the sequence of domains, e.g., the ProDom database (Corpet et al. (1999), to the trypsin domain (HMM) of at least 100, preferably at Nucl. Acids Res. 27:263-267) The ProDom protein domain US 7,282,360 B2 59 60 database consists of an automatic compilation of homolo therapeutic agents for controlling disorders of cell prolifera gous domains. Current versions of ProDom are built using tion, cell differentiation, organogenesis, coagulation, and recursive PSI-BLAST searches (Altschul SF et al. (1997) cell signaling. Nucleic Acids Res. 25:3389-3402: Gouzy et al. (1999) Thus, the 14089 molecules can act as novel diagnostic Computers and Chemistry 23:333-340.) of the SWISS targets and therapeutic agents for controlling one or more of PROT 38 and TREMBL protein databases. The database cellular proliferative and/or differentiative disorders. automatically generates a consensus sequence for each Examples of cellular proliferative and/or differentiative dis domain. A BLAST search was performed against the HMM orders include cancer, e.g., carcinoma, sarcoma, metastatic database resulting in the identification of a “protease serine disorders or hematopoietic neoplastic disorders, e.g., leuke precursor signal hydrolase Zymogen family 10 mias. A metastatic tumor can arise from a multitude of primary tumor types, including but not limited to those of multigene factor” domain in the amino acid sequence of prostate, colon, lung, breast and liver origin. human 14089 at about residues 76 to 266 of SEQID NO:34 As used herein, the terms “cancer”, “hyperproliferative' (see FIGS. 20A-20B). and “neoplastic' refer to cells having the capacity for A 14089 family member can include at least one trypsin 15 autonomous growth, i.e., an abnormal state or condition domain and at least one serine protease, typsin family, characterized by rapidly proliferating cell growth. Hyper histidine active site. Furthermore, a 14089 family member proliferative and neoplastic disease states may be catego can include at least one, two, three, and preferably four rized as pathologic, i.e., characterizing or constituting a predicted casein kinase II phosphorylation sites (PS00006): disease state, or may be categorized as non-pathologic, i.e., at least one, and preferably two predicted N-myristoylation a deviation from normal but not associated with a disease sites (PS00008); at least one predicted tyrosine kinase state. The term is meant to include all types of cancerous phosphorylation site (PS00007); at least one amidation site growths or oncogenic processes, metastatic tissues or malig (PS00009); and at least one or two, and preferably three nantly transformed cells, tissues, or organs, irrespective of N-glycosylation sites (PS00001). histopathologic type or stage of invasiveness. “Pathologic 25 hyperproliferative” cells occur in disease states character As the 14089 polypeptides of the invention may modulate ized by malignant tumor growth. Examples of non-patho 14089-mediated activities, they may be useful as of for logic hyperproliferative cells include proliferation of cells developing novel diagnostic and therapeutic agents for associated with wound repair. 14089-mediated or related disorders, as described below. The terms “cancer or “neoplasms include malignancies As used herein, a “14089 activity”, “biological activity of 30 of the various organ systems. Such as affecting lung, breast, 14089” or “functional activity of 14089', refers to an thyroid, lymphoid, gastrointestinal, and genito-urinary tract, activity exerted by a 14089 protein, polypeptide or nucleic as well as adenocarcinomas which include malignancies acid molecule on e.g., a 14089-responsive cell or on a 14089 Such as most colon cancers, renal-cell carcinoma, prostate Substrate, e.g., a protein Substrate, as determined in vivo or cancer and/or testicular tumors, non-small cell carcinoma of in vitro. In one embodiment, a 14089 activity is a direct 35 the lung, cancer of the Small intestine and cancer of the activity, such as an association with a 14089 target molecule. esophagus. A “target molecule' or “binding partner is a molecule with The term “carcinoma is art recognized and refers to which a 14089 protein binds or interacts in nature, e.g., a malignancies of epithelial or endocrine tissues including substrate for proteolytic cleavage. A 14089 activity can also respiratory system carcinomas, gastrointestinal system car be an indirect activity, e.g., a cellular signaling activity 40 cinomas, genitourinary system carcinomas, testicular carci mediated by interaction of the 14089 protein with a 14089 nomas, breast carcinomas, prostatic carcinomas, endocrine receptor. Based on the above-described sequence similari system carcinomas, and melanomas. Exemplary carcinomas ties, the 14089 molecules of the present invention are include those forming from tissue of the cervix, lung, predicted to have similar biological activities as serine prostate, breast, head and neck, colon and ovary. The term protease family members. For example, the 14089 proteins 45 also includes carcinosarcomas, e.g., which include malig of the present invention can have one or more of the nant tumors composed of carcinomatous and sarcomatous following activities: (1) modulate (stimulate or inhibit) tissues. An "adenocarcinoma’ refers to a carcinoma derived cellular proliferation (2) modulate cell differentiation; (3) from glandular tissue or in which the tumor cells form modulate tumorigenesis and tumor invasion; (4) alter extra recognizable glandular structures. cellular matrix composition; (5) catalyze polypeptide 50 The term "sarcoma' is art recognized and refers to malig growth factor activation and release; (6) regulate the blood nant tumors of mesenchymal derivation. clotting cascade; (7) catalyze proteolytic cleavage of a Additional examples of proliferative disorders include Substrate, e.g., a protein Substrate (e.g., cleavage at an hematopoietic neoplastic disorders. As used herein, the term arginine or lysine residue; (8) catalyze the proteolytic acti "hematopoietic neoplastic disorders' includes diseases Vation of signaling molecules, e.g., other proteases, growth 55 involving hyperplastic/neoplastic cells of hematopoietic ori factor activation or release; or (9) regulate of cell motility or gin, e.g., arising from myeloid, lymphoid or erythroid lin attachment. eages, or precursor cells thereof. Preferably, the diseases Based on the above-described sequence similarities, the arise from poorly differentiated acute leukemias, e.g., eryth 14089 molecules of the present invention are predicted to roblastic leukemia and acute megakaryoblastic leukemia. have similar biological activities as other trypsin family 60 Additional exemplary myeloid disorders include, but are not members, such as hepsin proteases. Hepsin proteases are limited to, acute promyeloid leukemia (APML), acute myel overexpressed in ovarian tumors and hepatoma cells (Tan ogenous leukemia (AML) and chronic myelogenous leuke imoto, H. et al. (1997) Cancer Res. 57:2884-2887). Further mia (CML) (reviewed in Vaickus, L. (1991) Crit Rev. in in vitro studies have shown inhibition of hepatoma cell Oncol./Hemotol. 11:267-97); lymphoid malignancies proliferation using hepsin inhibitors (Torres-Rosado, A. et 65 include, but are not limited to acute lymphoblastic leukemia al. (1993) Proc. Natl. Acad. Sci. USA 90: 7181-7185). The (ALL) which includes B-lineage ALL and T-lineage ALL, 14089 molecules can serve as novel diagnostic targets and chronic lymphocytic leukemia (CLL), prolymphocytic leu US 7,282,360 B2 61 62 kemia (PLL), hairy cell leukemia (HLL) and Waldenstrom's Human 23436 contains the following regions or other macroglobulinemia (WM). Additional forms of malignant structural features: lymphomas include, but are not limited to non-Hodgkin a ubiquitin carboxy-terminal hydrolase (family 2) domain lymphoma and variants thereof, peripheral T cell lympho with a first segment (PFAM Accession Number PF00442) mas, adult T cell leukemia/lymphoma (ATL), cutaneous located at about amino acid residues 89 to 120 of SEQ ID T-cell lymphoma (CTCL), large granular lymphocytic leu NO:41 and a second segment (PFAM Accession Number kemia (LGF), Hodgkin’s disease and Reed-Sternberg dis PF00443) located at about amino acid residues 332 to 420 of CaSC. SEQ ID NO:41: four predicted protein kinase C phosphorylation sites 23436 10 (PS00005) at about amino acids 17 to 19, 158 to 160,280 to One post-translational mechanism of regulating protein 282, and 398 to 400 of SEQ ID NO:41: levels is the ubiquitin pathway. Ubiquitin is a highly con four predicted casein kinase II phosphorylation sites served polypeptide expressed in all eukaryotic cells. Ubiq (PS00006) located at about amino acids 123 to 126, 143 to uitin is covalently attached as a single molecule or as a 146, 191 to 194, and 445 to 448 of SEQ ID NO:41: conjugated form to lysine residue(s) of target proteins by 15 two predicted cAMP/cGMP-dependent protein kinase formation of an isopeptide bond to the C-terminal glycine phosphorylation sites (PS00004) located at about amino residue of ubiquitin. Most ubiquitinated proteins are subse acids 84 to 87 and 458 to 461 of SEQ ID NO:41: quently targeted to the 26S proteasome, a multicatalytic one predicted tyrosine kinase phosphorylation site protease, which cleaves the marked protein into peptide (PS00007) located at about amino acids 261 to 268; fragments. two predicted N-glycosylation sites (PS00001) from Of the various enzymes involved in the ubiquitin protein about amino acids 278 to 281 and 427 to 430 of SEQ ID degradation pathway, one type of enzyme, termed ubiquitin NO:41: carboxy-terminal hydrolase (also “UCH or “ubiquitin pro one predicted amidation site (PS00009) from about amino tease'), hydrolyzes the bond between ubiquitin and ubiq acids 378 to 381 of SEQ ID NO:41; and uitin-tagged proteins and the bond linking ubiquitin-ubiq 25 uitin conjugates. This activity can provide a proofreading three predicted N-myristylation sites (PS00008) from function, e.g., a function that reduces protein degradation. about amino acids 50 to 55, 173 to 178, and 406 to 411 of These enzymes can include determinants for Substrate SEQ ID NO:41. specific recognition in order to selectively regulate degra The ubiquitin carboxy-terminal hydrolase (family 2) 30 domain of 23436 protein also features a conserved catalytic dation of their preferred substrates. They can also associate cysteine at about amino acid 98 of SEQID NO:41, and two 19S regulatory complex of the 26S proteasome. conserved at about amino acids 344 and 353 of The regulatory function of ubiquitin carboxy-terminal SEQID NO:41. The two conserved histidines are contained hydrolases has been demonstrated for a number of cellular within a ubiquitin specific carboxyl terminal hydrolase fam processes. For example, in Drosophila the ubiquitin car 35 ily signature domain (Prosite motif PS00973) located at boxy-terminal hydrolase, fat facets (faf) is a regulator impor about amino acid residues 336 to 354 (PFAM Accession tant for eye development (Chen and Fischer (2000) Genetics PS00973); 156:1829-36). In yeast, the ubiquitin carboxy-terminal For general information regarding PFAM identifiers, PS hydrolase UBP3 is associated with mating-type silencing prefix and PF prefix domain identification numbers, refer to (Moazed and Johnson (1996) Cell 86:667-77). These find ings suggest that ubiquitin carboxy-terminal hydrolases 40 Sonnhammer et al. (1997) Protein 28:405-420. The 23436 protein contains a significant number of char exert a regulatory function by controlling de-ubiquitination acteristics in common with members of the ubiquitin car of substrates. boxy-terminal hydrolase family 2. The term “family' when Ubiquitination and de-ubiquitination are important pro referring to the protein and nucleic acid molecules of the cesses through which protein levels and function are regu 45 invention means two or more proteins or nucleic acid lated in cells. Ubiquitination has been implicated in regu molecules having a common structural domain or motif and lating numerous cellular processes including proliferation, having Sufficient amino acid or nucleotide sequence homol differentiation, apoptosis (programmed cell death), tran ogy as defined herein. Such family members can be naturally Scription, signal-transduction, cell-cycle progression, recep or non-naturally occurring and can be from either the same tor-mediated endocytosis, and organelle biogenesis. The 50 or different species. For example, a family can contain a first activity of an enzyme mediating Substrate de-ubiquitination protein of human origin as well as other distinct proteins of or ubiquitin flux is key to the outcome of Such processes. human origin, or alternatively, can contain homologues of Levels of ubiquitination can be altered in the diseased non-human origin, e.g., rat or mouse proteins. Members of state. For example, in neuropathological conditions such as a family can also have common functional characteristics. Alzheimer's and Pick's disease abnormal amounts of ubiq 55 Proteins of the ubiquitin carboxy-terminal hydrolase fam uitinated proteins accumulate. In proliferative disorders, ily 2 are characterized by a common fold with characteristics oncogenes (e.g., V-jun and V-fos) can be more resistant to cysteine protease activity. The fold includes a conserved ubiquitination in comparison to their normal cell counter cysteine, e.g., the cysteine at about amino acid 98 of SEQID parts. The failure to degrade oncogene protein products may NO:41, which can be the catalytic cysteine for the protease contribute to their cell transformation capability. 60 domain. The fold also includes a conserved structural motif, The human 23436 sequence (SEQ ID NO:40), which is characterized by the Prosite signature Y-X-L-X-SAG approximately 2446 nucleotides long including untranslated LIVMFT-X(2)-H-x-G-X(4,5)-G-H-Y (wherein X is any regions, contains a predicted methionine-initiated coding amino acid; and numbers in parentheses indicate a repetition sequence of about 1458 nucleotides, including the TAA of a feature for the indicated number of residues or a range termination codon (nucleotides indicated as coding of SEQ 65 of residues; SEQID NO:45) which is located at about amino ID NO:40; SEQ ID NO:42). The coding sequence encodes acids 336 to 354 of SEQ ID NO:41 and includes two a 485 amino acid protein (SEQ ID NO:41). conserved histidines, e.g., histidines at about amino acids US 7,282,360 B2 63 64 344 and 353 of SEQ ID NO:41. At least one of these of the ubiquitin carboxy-terminal hydrolase domain of histidines can participate in catalysis. human 23436 (e.g., residues 379 to 420 of SEQID NO:41). A 23436 polypeptide or subsequence thereof can include In a much preferred embodiment, the 23436 polypeptide a "ubiquitin carboxy-terminal hydrolase domain,” or a includes the two conserved histidines at about amino acids "ubiquitin protease domain,” or sequences homologous with 5 344 and 353 of SEQ ID NO:41. a "ubiquitin carboxy-terminal hydrolase or protease To identify the presence of a “ubiquitin carboxy-terminal domain.” As used herein the phrases, "ubiquitin carboxy hydrolase' domain in a 23436 protein sequence, and make terminal hydrolase.” “ubiquitin specific hydrolase,” “ubiq the determination that a polypeptide or protein of interest uitin hydrolase,” “ubiquitin protease,” or “ubiquitin specific has a particular profile, the amino acid sequence of the protease are used interchangeably and mean a polypeptide 10 protein can be searched against the Pfam database of HMMs with the ability to remove one or more ubiquitin molecules (e.g., the Pfam database, release 2.1) using the default from a protein that has one or more covalently attached parameters. For example, the hmmsf program, which is molecules of ubiquitin. For example, the definition includes available as part of the HMMER package of search pro cleavage of conjugated forms of ubiquitin, e.g., at the grams, is a family specific default program for MILPAT0063 peptide bond following the carboxy-terminal glycine (e.g., 15 and a score of 15 is the default threshold score for deter whether or not the ubiquitin conjugate is attached to a mining a hit. Alternatively, the threshold score for determin protein). In a preferred embodiment, the ubiquitin carboxy ing a hit can be lowered (e.g., to 8 bits). A description of the terminal hydrolase can cleave a ubiquitin moiety from the Pfam database can be found in Sonhammer et al. (1997) e-NH group of a lysine side chain of a target protein. Proteins 28(3):405-420 and a detailed description of HMMs As used herein, the term “ubiquitin carboxy-terminal can be found, for example, in Gribskov et al. (1990) Meth. hydrolase domain includes an amino acid sequence of Enzymol. 183:146-159; Gribskov et al. (1987) Proc. Natl. about 300 to 450 amino acid residues in length and having Acad. Sci. USA 84:4355-4358; Krogh et al. (1994) J. Mol. a bit score for the alignment of the sequence to the first Biol. 235:1501-1531; and Stultz et al. (1993) Protein Sci. ubiquitin carboxy-terminal hydrolase (family 2) consensus 2:305-314, the contents of which are incorporated herein by (PFAM PF00442) of at least 20 and to the second ubiquitin 25 reference. A search was performed against the HMM data carboxy-terminal hydrolase (family 2) consensus (PFAM base resulting in the identification of a “ubiquitin carboxy PF00443) of at least 50. Preferably, a ubiquitin carboxy terminal hydrolase' domain in the amino acid sequence of terminal hydrolase domain includes at least about 300 to 450 human 23436 at about residues 89 to 420 (e.g., particularly amino acids, more preferably about 320 to 440 amino acid the segments 89 to 120 and 332 to 420) of SEQ ID NO:41: residues, or about 330 to 420 amino acids and has a bit score 30 see FIGS. 22A and 22B)). for the alignment of the sequence to the second ubiquitin A 23436 family member can include at least one ubiquitin carboxy-terminal hydrolase (family 2) domain consensus carboxy-terminal hydrolase domain. Furthermore, a 23436 sequence (HMM) of at least 50, 60, 70, 75 or greater. The family member can include at least one, two, three, prefer ubiquitin carboxy-terminal hydrolase (family 2) domain ably four protein kinase C phosphorylation sites (PS00005); (HMM) has been assigned two non-contiguous consensus 35 at least one, two, three, preferably four predicted casein sequences PFAM Accession Numbers PF00442 and kinase II phosphorylation sites (PS00006); at least one PF00443. An alignment of the ubiquitin carboxy-terminal tyrosine kinase phosphorylation site (PS00009); at least one, hydrolase domain (amino acids 89 to 120 of SEQID NO:41) preferably two cAMP and ccjMP protein kinase phospho of human 23436 with the first ubiquitin carboxy-terminal rylation sites (PS00004); at least one, preferably two N-gly hydrolase (family 2) consensus amino acid sequence (SEQ 40 cosylation sites (PS00001); and at least one, two, preferably ID NO:43) derived from a hidden Markov model is depicted three predicted N-myristylation sites (PS00008). in FIG. 22A and an alignment of the ubiquitin carboxy As the 23436 polypeptides of the invention may modulate terminal hydrolase domain (amino acids 332 to 420 of SEQ 23436-mediated activities, they may be useful as of for ID NO:41) of human 23436 with the second ubiquitin developing novel diagnostic and therapeutic agents for carboxy-terminal hydrolase (family 2) consensus amino acid 45 23436-mediated or related disorders, as described below. sequence (SEQ ID NO:44) derived from a hidden Markov As used herein, a “23436 activity”, “biological activity of model is depicted in FIG. 22B. 23436” or “functional activity of 23436', refers to an In a preferred embodiment, 23436 polypeptide or protein activity exerted by a 23436 protein, polypeptide or nucleic has a “ubiquitin carboxy-terminal hydrolase (family 2) acid molecule. For example, a 23436 activity can be an domain first signature region (PF00442) which includes at 50 activity exerted by 23436 in a physiological milieu on, e.g., least about 10 to 70 more preferably about 20 to 50 or 24 to a 23436-responsive cell or on a 23436 substrate, e.g., a 35 amino acid residues and has at least about 50%, 60%, ubiquitinated protein Substrate or a ubiquitin-ubiquitin con 70% 80% 90% 95%, 99%, or 100% homology with a jugate. A 23436 activity can be determined in vivo or in “ubiquitin carboxy-terminal hydrolase (family 2) domain vitro. In one embodiment, a 23436 activity is a direct UCH-1, e.g., the first signature region of the ubiquitin 55 activity, Such as an association with a 23436 target molecule. carboxy-terminal hydrolase domain of human 23436 (e.g., A “target molecule' or “binding partner is a molecule with residues 89 to 120 of SEQ ID NO:41). In a much preferred which a 23436 protein binds or interacts in nature. In a embodiment, the 23436 polypeptide includes a conserved preferred embodiment, the target molecule is a ubiquitinated catalytic cysteine at about residue 98 of SEQ ID NO:41. compound which is a substrate for 23436-mediated de In another preferred embodiment, 23436 polypeptide or 60 ubiquitination. In an exemplary embodiment, 23436 is an protein has a "ubiquitin carboxy-terminal hydrolase (family enzyme that catalyzes the removal of ubiquitin from a 2) domain second signature region (PF00443) which Substrate, e.g., by hydrolyzing a peptide bond. includes at least about 50 to 140 more preferably about 70 A 23436 activity can also be an indirect activity, e.g., to 120, or 80 to 100 amino acid residues and has at least decreased degradation or increased Stability of a protein due about 50%, 60%, 70% 80% 90% 95%, 99%, or 100% 65 to 23436-mediated de-ubiquitination, or a cellular signaling homology with a "ubiquitin carboxy-terminal hydrolase activity (e.g., proliferation, differentiation, apoptosis, etc.) (family 2) domain UCH-2. e.g., the second signature region that results from or is mediated by the 23436 protein or a US 7,282,360 B2 65 66 protein de-ubiquitinated by 23436. For example, altered membrane defects, such as paroxysmal nocturnal hemoglo expression or activity of a 23436 molecule can cause an binuria and spur cell anemia; hemolytic anemias caused by inhibition or failure to target proteins for degradation or, antibody reactions, for example to the RBC antigens, or alternatively, excessive or undesirable protein degradation, antigens of the ABO system, Lewis System, Ii system, Rh leading to accumulation of protein in cells which, in turn, system, Kidd system, Duffy system, and Kell system; meth leads to a disorder of a tissue in which 23436 is normally emoglobinemia; a failure of erythropoiesis, for example, as expressed (e.g., the brain). a result of aplastic anemia, pure red cell aplasia, myelodys Based on the discovery disclosed herein, e.g., the above plastic syndromes, sideroblastic anemias, and congenital described sequence similarities, the 23436 molecules of the dyserythropoietic anemia; secondary anemia in non-hema present invention are predicted to have similar biological 10 tolic disorders, for example, as a result of chemotherapy, activities as ubiquitin carboxy-terminal hydrolase family 2 alcoholism, or liver disease; anemia of chronic disease. Such members. Protein ubiquitination is important in growth as chronic renal failure; and endocrine deficiency diseases. factor-mediated cellular proliferation. The deubiquitinating Agents that modulate 23436 polypeptide or nucleic acid enzymes act as regulatory enzymes that couple extracellular activity or expression can be used to treat anemias, in signaling to cell growth. 23436, which shows sequence 15 particular, drug-induced anemias or anemias associated with similarity to a deubiquitinating hydrolase is believed to cancer chemotherapy, chronic renal failure, malignancies, negatively regulates cytokine signaling in hematopoietic, adult and juvenile rheumatoid arthritis, disorders of hemo e.g., erythroid, progenitors resulting in the inhibition of globin synthesis, prematurity, and Zidovudine treatment of hematopoietic progenitor growth. Antagonists of this 23436 HIV infection. A subject receiving the treatment can be are expected to promote hematopoietic, e.g., erythroid, cell additionally treated with a second agent, e.g., erythropoietin, proliferation and differentiation. Accordingly, the 23436 proteins of the present invention to further ameliorate the condition. can have one or more of the following activities: (1) de As used herein, the term “erythropoietin” or “EPO refers ubiquitinating polypeptides that are ubiquitinated; (2) cleav to a glycoprotein produced in the kidney, which is the ing ubiquitin conjugates (e.g., ubiquitin-tagged substrates, 25 principal hormone responsible for stimulating red blood cell ubiquitin-tagged peptide fragments, head to tail linked ubiq production (erythrogenesis). EPO stimulates the division uitin molecules); (3) reversing targeting of a polypeptide to and differentiation of committed erythroid progenitors in the a proteasome (e.g., by removing ubiquitin targeting signals); bone marrow. Normal plasma erythropoietin levels range or (4) altering flux in the ubiquitin pathway (e.g., by from 0.01 to 0.03 Units/mL, and can increase up to 100 to recycling ubiquitin from proteasome digestions products). 30 1,000-fold during hypoxia or anemia. Graber and Krantz Hence, modulation of 23436 polypeptide activity or expres (1978) Ann. Rev. Med. 29:51; Eschbach and Adamson sion are likely to influence degradation events, and thereby (1985) Kidney Intl. 28:1. Recombinant human erythropoi regulate cellular activities related to cell proliferation, cell etin (rhulpo or epoietin alpha) is commercially available as signaling, cell death (e.g., apoptosis), cell motility, receptor EPOGEN.RTM. (epoietin alpha, recombinant human eryth mediated endocytosis, organelle biogenesis, hematopoietic, 35 ropoietin) (Amgen Inc., Thousand Oaks, Calif.) and as e.g., erythroid, cell proliferation and differentiation, and PROCRIT.RTM. (epoietin alpha, recombinant human eryth cytokine-mediated signaling events. ropoietin) (Ortho Biotech Inc., Raritan, N.J.). The molecules of the invention can be used to develop Another example of an erythroid-associated disorder is novel agents or compounds to treat and/or diagnose disor erythrocytosis. Erythrocytosis, a disorder of red blood cell ders involving aberrant activities of the cells in which 23436 40 overproduction caused by excessive and/or ectopic erythro nucleic acids and polypeptides are expressed. 2343.6 mRNA poietin production, can be caused by cancers, e.g., a renal is found primarily in hematopoietic cells, and in particular, cell cancer, a hepatocarcinoma, and a central nervous system in cells of the erythroid lineage (FIGS. 25-26), as well as cancer. Diseases associated with erythrocytosis include prostate, hypothalamus, and hepatoma cells. More specifi polycythemias, e.g., polycythemia Vera, secondary poly cally, high expression of 23436 was detected in fetal liver, 45 cythemia, and relative polycythemia. bone marrow, erythroid progenitor and mature cells. Lower Aberrant expression or activity of the 23436 molecules levels of expression were detected in the brain (e.g., the may be involved in neoplastic disorders. Accordingly, treat cortex), kidney, ovary, human vascular endothelial cells and ment, prevention and diagnosis of cancer or neoplastic hematopoietic progenitor cells. This pattern of expression disorders related to hematopoietic cells and, in particular, suggests a role for 23436 in the function and development of 50 cells of the erythroid lineage are also included in the present the tissues in which it is expressed, and in particular in invention. Such neoplastic disorders are exemplified by hematopoietic cells. erythroid leukemias, or leukemias of erythroid precursor As the 23436 polypeptides of the invention may modulate cells, e.g., poorly differentiated acute leukemias Such as 23436-mediated activities, they may be useful as of for erythroblastic leukemia and acute megakaryoblastic leuke developing novel diagnostic and therapeutic agents for 55 mia. Additional exemplary myeloid disorders include, but 23436-mediated or related disorders, e.g., blood cell-asso are not limited to, acute promyeloid leukemia (APML), ciated or erythroid-associated disorders such as erythropoie acute myelogenous leukemia (AML) and chronic myelog sis, and other hematopoietic disorders. enous leukemia (CML) (reviewed in Vaickus, L. (1991) Crit As used herein, the term “erythroid associated disorders' Rev. in Oncol./Hemotol. 11:267-97). In particular, AML can include disorders involving aberrant (increased or deficient) 60 include the uncontrolled proliferation of CD34+ cells such erythroblast proliferation, e.g., an erythroleukemia; aberrant as AML subtypes M1 and M2, myeloblastic leukemias with (increased or deficient) erythroblast differentiation, e.g., an and without maturation, and AML subtype M6, erythroleu anemia; anemias Such as, for example, drug-(chemo kemia (Di Guglielmo's disease). Additional neoplastic dis therapy-) induced anemias, hemolytic anemias due to orders include a myelodysplastic syndrome or preleukemic hereditary cell membrane abnormalities, such as hereditary 65 disorder, e.g., oligoblastic leukemia, Smoldering leukemia. spherocytosis, hereditary elliptocytosis, and hereditary Additional cancers of the erythroid lineage include erythro pyropoikilocytosis; hemolytic anemias due to acquired cell blastosis, and other relevant diseases of the bone marrow. US 7,282,360 B2 67 68 The term “leukemia' or “leukemic cancer is intended to An alteration in a 23436 nucleic acid or polypeptide can have its clinical meaning, namely, a neoplastic disease in be associated with Susceptibility for prostate cancer, e.g., which white corpuscle maturation is arrested at a primitive early-onset prostate cancer, and/or brain cancer. As used stage of cell development. The disease is characterized by an herein, “a prostate disorder” refers to an abnormal condition increased number of leukemic blast cells in the bone mar- 5 occurring in the male pelvic region characterized by, e.g., row, and by varying degrees of failure to produce normal male sexual dysfunction and/or urinary symptoms. This hematopoietic cells. The condition may be either acute or disorder may be manifested in the form of genitourinary chronic. Leukemias are further typically categorized as inflammation (e.g., inflammation of Smooth muscle cells) as being either lymphocytic i.e., being characterized by cells in several common diseases of the prostate including pros which have properties in common with normal , 10 tatitis, benign prostatic hyperplasia and cancer, e.g., adeno or myelocytic (or myelogenous), i.e., characterized by cells carcinoma or carcinoma, of the prostate. having some characteristics of normal granulocytic cells. As used herein, the term “brain cancer includes a hyper Acute lymphocytic leukemia (ALL) arises in lymphoid proliferative or neoplastic state of tissue in the brain, includ tissue, and ordinarily first manifests its presence in bone ing tumors such as gliomas, including astrocytoma, includ marrow. Acute myelocytic leukemia (“AML') arises from 15 ing fibrillary (diffuse) astrocytoma and glioblastoma bone marrow hematopoietic stem cells or their progeny. The multiforme, pilocytic astrocytoma, pleomorphic Xanthoas term acute myelocytic leukemia Subsumes several Subtypes trocytoma, and brain stem glioma, oligodendroglioma, and of leukemia: myeloblastic leukemia, promyelocytic leuke ependymoma and related paraventricular mass lesions, neu mia, and myelomonocytic leukemia. In addition, leukemias ronal tumors, poorly differentiated neoplasms, including with erythroid or megakaryocytic properties are considered 20 medulloblastoma, other parenchymal tumors, including pri myelogenous leukemias as well. mary brain lymphoma, germ cell tumors, and pineal paren The molecules of the invention may also modulate the chymal tumors, meningiomas, metastatic tumors, paraneo activity of neoplastic, non-hematopoietic tissues in which plastic syndromes, peripheral nerve sheath tumors, they are expressed, e.g., liver and prostate. The 23436 including Schwannoma, neurofibroma, and malignant molecules can act as novel diagnostic targets and therapeutic 25 peripheral nerve sheath tumor (malignant Schwannoma), and agents for controlling one or more of cellular proliferative neurocutaneous syndromes (phakomatoses), including neu and/or differentiative disorders. Examples of such cellular rofibromotosis, including Type 1 neurofibromatosis (NF1) proliferative and/or differentiative disorders include cancer, and TYPE 2 neurofibromatosis (NF2), tuberous sclerosis, e.g., carcinoma, sarcoma, or metastatic disorders. A meta and Von Hippel-Lindau disease. static tumor can arise from a multitude of primary tumor 30 The 53070, 15985, 26583, 21953, m32404, 14089, or types, including but not limited to those of prostate and liver 23436 protein, fragments thereof, and derivatives and other variants of the sequence in SEQID NO:2, 8, 15, 20, 25, 34, Origin. or 41 thereof are collectively referred to as “polypeptides or As used herein, the terms “cancer”, “hyperproliferative'. proteins of the invention” or “53070, 15985, 26583, 21953, and “neoplastic' refer to cells having the capacity for 35 m32404, 14089, or 23436 polypeptides or proteins. Nucleic autonomous growth, i.e., an abnormal state or condition acid molecules encoding Such polypeptides or proteins are characterized by rapidly proliferating cell growth. Hyper collectively referred to as “nucleic acids of the invention” or proliferative and neoplastic disease states may be catego “53070, 15985, 26583, 21953, m32404, 14089, or 23436 rized as pathologic, i.e., characterizing or constituting a nucleic acids. 53070, 15985, 26583, 21953, m32404, disease state, or may be categorized as non-pathologic, i.e., 40 a deviation from normal but not associated with a disease 14089, or 23436 molecules refer to 53070, 15985, 26583, state. The term is meant to include all types of cancerous 21953, m32404, 14089, or 23436 nucleic acids, polypep growths or oncogenic processes, metastatic tissues or malig tides, and antibodies. nantly transformed cells, tissues, or organs, irrespective of As used herein, the term “nucleic acid molecule' includes histopathologic type or stage of invasiveness. “Pathologic DNA molecules (e.g., a cDNA or genomic DNA), RNA 45 molecules (e.g., an mRNA) and analogs of the DNA or hyperproliferative” cells occur in disease states character RNA. A DNA or RNA analog can be synthesized from ized by malignant tumor growth. Examples of non-patho nucleotide analogs. The nucleic acid molecule can be single logic hyperproliferative cells include proliferation of cells stranded or double-stranded, but preferably is double associated with wound repair. stranded DNA. The terms "cancer or "neoplasms include malignancies so The term "isolated nucleic acid molecule' or “purified of the various organ systems, such as those affecting lung, nucleic acid molecule' includes nucleic acid molecules that breast, thyroid, lymphoid, gastrointestinal, and the genito are separated from other nucleic acid molecules present in urinary tract. The terms “cancer or “neoplasms also the natural source of the nucleic acid. For example, with includes adenocarcinomas that include malignancies such as regards to genomic DNA, the term "isolated includes prostate cancer and/or testicular tumors, and non-small cell 55 nucleic acid molecules which are separated from the chro carcinoma of the lung. mosome with which the genomic DNA is naturally associ The term “carcinoma is art recognized and refers to ated. Preferably, an "isolated nucleic acid is free of malignancies of epithelial or endocrine tissues including sequences which naturally flank the nucleic acid (i.e., respiratory system carcinomas, gastrointestinal system car sequences located at the 5' and/or 3' ends of the nucleic acid) cinomas, genitourinary system carcinomas, testicular carci- 60 in the genomic DNA of the organism from which the nucleic nomas, breast carcinomas, prostatic carcinomas, endocrine acid is derived. For example, in various embodiments, the system carcinomas, and melanomas. Exemplary carcinomas isolated nucleic acid molecule can contain less than about 5 include those forming from tissue of the prostate, and liver. kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of 5" and/or 3' The term also includes carcinosarcomas, e.g., malignant nucleotide sequences which naturally flank the nucleic acid tumors composed of carcinomatous and sarcomatous tis- 65 molecule in genomic DNA of the cell from which the nucleic Sues. The term "sarcoma' is art recognized and refers to acid is derived. Moreover, an "isolated nucleic acid mol malignant tumors of mesenchymal derivation. ecule, such as a cDNA molecule, can be substantially free of US 7,282,360 B2 69 70 other cellular material, or culture medium when produced by includes isolated or purified preparations of at least 0.01, recombinant techniques, or Substantially free of chemical 0.1, 1.0, and 10 milligrams in dry weight. precursors or other chemicals when chemically synthesized. A “non-essential amino acid residue is a residue that can As used herein, the term “hybridizes under low strin be altered from the wild-type sequence of 53070, 15985, gency, medium stringency, high Stringency, or very high 26583, 21953, m32404, 14089, or 23436 without abolishing stringency conditions' describes conditions for hybridiza or substantially altering a 53070, 15985, 26583, 21953, tion and washing. Guidance for performing hybridization m32404, 14089, or 23436 activity. Preferably the alteration reactions can be found in Current Protocols in Molecular does not substantially alter the 53070, 15985, 26583, 21953, Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, which m32404, 14089, or 23436 activity, e.g., the activity is at least is incorporated by reference. Aqueous and nonaqueous 10 20%, 40%, 60%, 70% or 80% of wild-type. An “essential” methods are described in that reference and either can be amino acid residue is a residue that, when altered from the used. Specific hybridization conditions referred to herein are wild-type sequence of 53070, 15985, 26583, 21953, as follows: 1) low stringency hybridization conditions in 6x m32404, 14089, or 23436, results in abolishing a 53070, sodium chloride/sodium citrate (SSC) at about 45° C. 15985, 26583, 21953, m32404, 14089, or 23436 activity followed by two washes in 0.2xSSC, 0.1% SDS at least at 15 such that less than 20% of the wild-type activity is present. 50° C. (the temperature of the washes can be increased to For example, conserved amino acid residues in 53070, 55° C. for low stringency conditions); 2) medium stringency 15985, 26583, 21953, m32404, 14089, or 23436 are pre hybridization conditions in 6xSSC at about 45° C., followed dicted to be particularly unamenable to alteration. by one or more washes in 0.2xSSC, 0.1% SDS at 60° C.; 3) A “conservative amino acid substitution' is one in which high stringency hybridization conditions in 6xSSC at about the amino acid residue is replaced with an amino acid 45° C., followed by one or more washes in 0.2xSSC, 0.1% residue having a similar side chain. Families of amino acid SDS at 65° C.; and preferably 4) very high stringency residues having similar side chains have been defined in the hybridization conditions are 0.5M sodium phosphate, 7% art. These families include amino acids with basic side SDS at 65° C., followed by one or more washes at 0.2xSSC, chains (e.g., lysine, arginine, histidine), acidic side chains 1% SDS at 65° C. Very high stringency conditions (4) are the 25 (e.g., aspartic acid, glutamic acid), uncharged polar side preferred conditions and the ones that should be used unless chains (e.g., glycine, asparagine, glutamine, serine, threo otherwise specified. nine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, Preferably, an isolated nucleic acid molecule of the inven valine, leucine, isoleucine, proline, phenylalanine, methion tion that hybridizes under a stringency condition described ine, tryptophan), beta-branched side chains (e.g., threonine, herein to the sequence of SEQID NO:1, 3, 7, 9, 14, 16, 19, 30 valine, isoleucine) and aromatic side chains (e.g., tyrosine, 21, 24, 26, 33, 35, 40, or 42, corresponds to a naturally phenylalanine, tryptophan, histidine). Thus, a predicted non essential amino acid residue in a 53070, 15985, 26583, occurring nucleic acid molecule. 21953, m32404, 14089, or 23436 protein is preferably As used herein, a “naturally-occurring nucleic acid mol replaced with another amino acid residue from the same side ecule refers to an RNA or DNA molecule having a nucle 35 chain family. Alternatively, in another embodiment, muta otide sequence that occurs in nature. For example a naturally tions can be introduced randomly along all or part of a occurring nucleic acid molecule can encode a natural pro 53070, 15985, 26583, 21953, m32404, 14089, or 23436 tein. coding sequence, such as by Saturation mutagenesis, and the As used herein, the terms “gene' and “recombinant gene’ resultant mutants can be screened for 53070, 15985, 26583, refer to nucleic acid molecules which include at least an 40 21953, m32404, 14089, or 23436 biological activity to open reading frame encoding a 53070, 15985, 26583,21953, identify mutants that retain activity. Following mutagenesis m32404, 14089, or 23436 protein. The gene can optionally of SEQ ID NO: 1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, further include non-coding sequences, e.g., regulatory or 42, the encoded protein can be expressed recombinantly sequences and introns. Preferably, a gene encodes a mam and the activity of the protein can be determined. malian 53070, 15985, 26583, 21953, m32404, 14089, or 45 As used herein, a “biologically active portion of a 53070, 23436 protein or derivative thereof. 15985, 26583, 21953, m32404, 14089, or 23436 protein An "isolated' or “purified’ polypeptide or protein is includes a fragment of a 53070, 15985, 26583, 21953, Substantially free of cellular material or other contaminating m32404, 14089, or 23436 protein which participates in an proteins from the cell or tissue source from which the protein interaction, e.g., an intramolecular or an inter-molecular is derived, or substantially free from chemical precursors or 50 interaction. An inter-molecular interaction can be a specific other chemicals when chemically synthesized. “Substan binding interaction or an enzymatic interaction (e.g., the tially free” means that a preparation of 53070, 15985,265.83, interaction can be transient and a covalent bond is formed or 21953, m32404, 14089, or 23436 protein is at least 10% broken). An inter-molecular interaction can be between a pure. In a preferred embodiment, the preparation of 53070, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 15985, 26583,21953, m32404, 14089, or 23436 protein has 55 molecule and a non-53070, -15985, -265.83, -21953, less than about 30%, 20%, 10% and more preferably 5% (by -m32404, -14089, or -23436 molecule or between a first dry weight), of non-53070, 15985, 26583, 21953, m32404, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 14089, or 23436 protein (also referred to herein as a “con molecule and a second 53070, 15985, 26583, 21953, taminating protein'), or of chemical precursors or non m32404, 14089, or 23436 molecule (e.g., a dimerization 53070, -15985, -26583, -21953, -m32404, -14089, or 60 interaction). Biologically active portions of a 53070, 15985, -23436 chemicals. When the 53070, 15985, 26583, 21953, 26583, 21953, m32404, 14089, or 23436 protein include m32404, 14089, or 23436 protein or biologically active peptides comprising amino acid sequences sufficiently portion thereof is recombinantly produced, it is also prefer homologous to or derived from the amino acid sequence of ably substantially free of culture medium, i.e., culture the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 medium represents less than about 20%, more preferably 65 protein, e.g., the amino acid sequence shown in SEQID NO: less than about 10%, and most preferably less than about 5% 2, 8, 15, 20, 25, 34, or 41, which include less amino acids of the volume of the protein preparation. The invention than the full length 53070, 15985, 26583, 21953, m32404, US 7,282,360 B2 71 72 14089, or 23436 proteins, and exhibit at least one activity of Sum 62 scoring matrix with a gap penalty of 12, a gap extend a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 penalty of 4, and a frameshift gap penalty of 5. protein. Typically, biologically active portions comprise a The percent identity between two amino acid or nucle domain or motif with at least one activity of the 53070, otide sequences can be determined using the algorithm of E. 15985, 26583, 21953, m32404, 14089, or 23436 protein, Meyers and W. Miller (1989) CABIOS, 4:11-17) which has e.g., the ability to phosphorylate a Substrate, to bind micro been incorporated into the ALIGN program (version 2.0), tubules and/or phosphorylate proteins, to remove the phos using a PAM 120 weight residue table, a gap length penalty phate from a serine or threonine residue of a phosphorylated of 12 and a gap penalty of 4. protein, to bind and/or cleave polypeptide substrates, to bind The nucleic acid and protein sequences described herein peptide sequences and exhibit proteolytic activity, to bind 10 can be used as a “query sequence' to perform a search proteolytic Substrates, or to de-ubiquitinate Substrates, against public databases to, for example, identify other respectively. family members or related sequences. Such searches can be . A biologically active portion of a 53070, 15985, 26583, performed using the NBLAST and XBLAST programs 21953, m32404, 14089, or 23436 protein can be a polypep (version 2.0) of Altschul, et al. (1990).J. Mol. Biol. 215: tide which is, for example, 10, 25, 50, 100, 200 or more 15 403-10. BLAST nucleotide searches can be performed with amino acids in length. Biologically active portions of a the NBLAST program, score=100, wordlength=12 to obtain 53070, 15985, 26583, 21953, m32404, 14089, or 23436 nucleotide sequences homologous to 53070, 15985, 26583, protein can be used as targets for developing agents which 21953, m32404, 14089, or 23436 nucleic acid molecules of modulate a 53070, 15985, 26583,21953, m32404, 14089, or the invention. BLAST protein searches can be performed 23436 mediated activity, e.g., Substrate phosphorylation; with the XBLAST program, score=50, wordlength=3 to protein kinase activity or microtubule binding; prolyl oli obtain amino acid sequences homologous to 53070 protein gopeptidase activity; protease activity; proteolytic cleavage molecules of the invention. To obtain gapped alignments for of a Substrate; or de-ubiquitinating activity or ubiquitin comparison purposes, Gapped BLAST can be utilized as carboxy-terminal hydrolase activity. described in Altschul et al., (1997) Nucleic Acids Res. Calculations of homology or sequence identity between 25 25:3389-3402. When utilizing BLAST and Gapped BLAST sequences (the terms are used interchangeably herein) are programs, the default parameters of the respective programs performed as follows. (e.g., XBLAST and NBLAST) can be used. To determine the percent identity of two amino acid Particularly preferred 53070, 15985, 26583, 21953, sequences, or of two nucleic acid sequences, the sequences m32404, 14089, or 23436 polypeptides of the present inven are aligned for optimal comparison purposes (e.g., gaps can 30 tion have an amino acid sequence Substantially identical to be introduced in one or both of a first and a second amino the amino acid sequence of SEQID NO:2, 8, 15, 20, 25, 34. acid or nucleic acid sequence for optimal alignment and or 41, respectively. In the context of an amino acid sequence, non-homologous sequences can be disregarded for compari the term “substantially identical' is used herein to refer to a son purposes). In a preferred embodiment, the length of a first amino acid that contains a sufficient or minimum reference sequence aligned for comparison purposes is at 35 number of amino acid residues that are i) identical to, or ii) least 30%, preferably at least 40%, more preferably at least conservative substitutions of aligned amino acid residues in 50%. 60%, and even more preferably at least 70%, 80%, a second amino acid sequence Such that the first and second 90%, 100% of the length of the reference sequence. The amino acid sequences can have a common structural domain amino acid residues or nucleotides at corresponding amino and/or common functional activity. For example, amino acid acid positions or nucleotide positions are then compared. 40 sequences that contain a common structural domain having When a position in the first sequence is occupied by the at least about about 60%, or 65% identity, likely 75% same amino acid residue or nucleotide as the corresponding identity, 80%, or 85% identity, likely 90% identity, more position in the second sequence, then the molecules are likely 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical at that position (as used herein amino acid or identity to SEQID NO:2, 8, 15, 20, 25, 34, or 41 are termed nucleic acid “identity” is equivalent to amino acid or nucleic 45 substantially identical. acid “homology’). In the context of nucleotide sequence, the term "substan The percent identity between the two sequences is a tially identical' is used herein to refer to a first nucleic acid function of the number of identical positions shared by the sequence that contains a Sufficient or minimum number of sequences, taking into account the number of gaps, and the nucleotides that are identical to aligned nucleotides in a length of each gap, which need to be introduced for optimal 50 second nucleic acid sequence Such that the first and second alignment of the two sequences. nucleotide sequences encode a polypeptide having common The comparison of sequences and determination of per functional activity, or encode a common structural polypep cent identity between two sequences can be accomplished tide domain or a common functional polypeptide activity. using a mathematical algorithm. In a preferred embodiment, For example, nucleotide sequences having at least about the percent identity between two amino acid sequences is 55 60%, or 65%, 70%, or 75% identity, likely 80% identity, determined using the Needleman and Wunsch (1970) J. more likely 85%, 90%. 91%, 92%, 93%, 94%, 95%, 96%, Mol. Biol. 48:444-453) algorithm which has been incorpo 97%, 98% or 99% identity to SEQID NO:1, 3, 7, 9, 14, 16, rated into the GAP program in the GCG software package, 19, 21, 24, 26, 33, 35, 40, or 42 are termed substantially using either a Blossum 62 matrix or a PAM250 matrix, and identical. a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight 60 “Misexpression or aberrant expression', as used herein, of 1, 2, 3, 4, 5, or 6. In yet another preferred embodiment, refers to a non-wildtype pattern of gene expression at the the percent identity between two nucleotide sequences is RNA or protein level. It includes: expression at non-wild determined using the GAP program in the GCG software type levels, i.e., over- or under-expression; a pattern of package using a NWSgapdna. CMP matrix and a gap weight expression that differs from wild type in terms of the time or of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, 65 stage at which the gene is expressed, e.g., increased or or 6. A particularly preferred set of parameters (and the one decreased expression (as compared with wild type) at a that should be used unless otherwise specified) are a Blos predetermined developmental period or stage; a pattern of US 7,282,360 B2 73 74 expression that differs from wild type in terms of altered, ID NO:15. In another embodiment, the nucleic acid mol e.g., increased or decreased, expression (as compared with ecule encodes a sequence corresponding to a fragment of the wild type) in a predetermined cell type or tissue type; a 21953 protein that includes amino acid 672 to 744, 88 to pattern of expression that differs from wild type in terms of 663, or 88 to 744 of SEQID NO:20. In another embodiment, the splicing size, translated amino acid sequence, post 5 the nucleic acid molecule encodes a sequence corresponding transitional modification, or biological activity of the to a fragment of the m32404 protein from about amino acid expressed polypeptide; a pattern of expression that differs 45 to 268 of SEQ ID NO:25. In another embodiment, the from wild type in terms of the effect of an environmental nucleic acid molecule encodes a sequence corresponding to stimulus or extracellular stimulus on expression of the gene, a fragment of the m32404 protein from about amino acid e.g., a pattern of increased or decreased expression (as 10 300 to 520 of SEQ ID NO:25. In another embodiment, the compared with wild type) in the presence of an increase or nucleic acid molecule encodes a sequence corresponding to decrease in the strength of the stimulus. a fragment of the 14089 protein from about amino acids 41 “Subject,” as used herein, refers to human and non-human to 234 or 24 to 234 of SEQ ID NO:34 or the mature 14089 animals. The term “non-human animals' of the invention protein (about amino acids 19 to 241 of SEQID NO:34). In includes all vertebrates, e.g., mammals, such as non-human 15 another embodiment, the nucleic acid molecule encodes a primates (particularly higher primates), sheep, dog, rodent sequence corresponding to a fragment of the 23436 protein (e.g., mouse or rat), guinea pig, goat, pig, cat, rabbits, cow, from about amino acid 89 to 420 of SEQ ID NO:41. and non-mammals, such as chickens, amphibians, reptiles, In another embodiment, an isolated nucleic acid molecule etc. In a preferred embodiment, the Subject is a human. In of the invention includes a nucleic acid molecule which is a another embodiment, the Subject is an experimental animal complement of the nucleotide sequence shown in SEQ ID or animal suitable as a disease model. NO:1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or 42, or a A “purified preparation of cells’, as used herein, refers to portion of any of these nucleotide sequences. In other an in vitro preparation of cells. In the case cells from embodiments, the nucleic acid molecule of the invention is multicellular organisms (e.g., plants and animals), a purified Sufficiently complementary to the nucleotide sequence preparation of cells is a subset of cells obtained from the 25 shown in SEQ ID NO:1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, organism, not the entire intact organism. In the case of 35, 40, or 42, such that it can hybridize (e.g., under a unicellular microorganisms (e.g., cultured cells and micro stringency condition described herein) to the nucleotide bial cells), it consists of a preparation of at least 10% and sequence shown in SEQ ID NO: 1, 3, 7, 9, 14, 16, 19, 21, more preferably 50% of the subject cells. 24, 26, 33, 35, 40, or 42, thereby forming a stable duplex. Various aspects of the invention are described in further 30 In one embodiment, an isolated nucleic acid molecule of detail below. the present invention includes a nucleotide sequence which is at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, Isolated Nucleic Acid Molecules 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more In one aspect, the invention provides, isolated or purified, homologous to the entire length of the nucleotide sequence nucleic acid molecules that encode 53070, 15985, 26583, 35 shown in SEQ ID NO: 1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 21953, m32404, 14089, and 23436 polypeptides described 35, 40, or 42, or a portion, preferably of at least 260, 300, herein, e.g., full-length 53070, 15985, 26583, 21953, 350, 400, 450, 500,520, 550, 590, 600, 650, 700, 750 800, m32404, 14089, or 23436 proteins or fragments thereof, 850, 900, 950, or 1000 nucleotides, of any of these nucle e.g., a biologically active portion of 53070, 15985, 26583, otide sequences. 21953, m32404, 14089, or 23436 protein. Also included is 40 a nucleic acid fragment Suitable for use as a hybridization Nucleic Acid Fragments probe, which can be used, e.g., to identify a nucleic acid A nucleic acid molecule of the invention can include only molecule encoding a polypeptide of the invention, 53070, a portion of the nucleic acid sequence of SEQ ID NO:1, 3, 15985, 26583, 21953, m32404, 14089, or 2343.6 mRNA, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or 42. For example, and fragments Suitable for use as primers, e.g., PCR primers 45 Such a nucleic acid molecule can include a fragment which for the amplification or mutation of nucleic acid molecules. can be used as a probe or primer or a fragment encoding a In one embodiment, an isolated nucleic acid molecule of portion of a 53070, 15985, 26583, 21953, m32404, 14089, the invention includes the nucleotide sequences shown in or 23436 protein, e.g., an immunogenic or biologically SEQ ID NO:1, 7, 14, 19, 24, 33, and 40, or a portion of any active portion of a 53070, 15985, 26583, 21953, m32404, of these nucleotide sequences. In one embodiment, the 50 14089, or 23436 protein. A fragment can comprise: those nucleic acid molecule includes sequences encoding the nucleotides of SEQID NO: 1 which encode a protein kinase human 53070, 15985, 26583, 21953, m32404, 14089, or domain of human 53070, e.g., about nucleotides 171 to 953 23436 protein (i.e., “the coding region” of SEQID NO:1, 7, of SEQID NO:1; those nucleotides of SEQID NO:7, which 14, 19, 24, 33, or 40, as shown in SEQID NO:3, 9, 16, 21, encode a protein kinase domain or a doublecortin repeat of 26, 35, or 42), as well as 5' untranslated sequences. Alter 55 human 15985; those nucleotides encoding amino acids 172 natively, the nucleic acid molecule can include only the to 461 or 99 to 523 of SEQ ID NO:15, which encode a coding region of SEQID NO:1, 7, 14, 19, 24, 33, or 40 (e.g., phosphatase catalytic domain of human 26583; those nucle SEQID NO3, 9, 16, 21, 26, 35, or 42) and, e.g., no flanking otides of SEQ ID NO:19 which encode a prolyl oligopep sequences which normally accompany the Subject sequence. tidase domain of human 21953; those nucleotides of SEQID In another embodiment, the nucleic acid molecule encodes 60 NO:24 which encode a trypsin domain of human m32404; a sequence corresponding to a fragment of the 15985 protein those nucleotides of SEQID NO:33, which encode a trypsin from about amino acid 394 to 651 of SEQ ID NO:8; a domain of human 14809; or those nucleotides of SEQ ID fragment from about amino acid 67 to 158 of SEQID NO:8: NO:40, which encode a ubiquitin carboxy-terminal hydro or a fragment from about amino acid 192 to 280 of SEQID lase domain of human 23436: The nucleotide sequence NO:8. In another embodiment, the nucleic acid molecule 65 determined from the cloning of the 53070, 15985, 26583, encodes a sequence corresponding to the mature 26583 21953, m32404, 14089, or 23436 gene allows for the protein from about amino acid 1 to amino acid 537 of SEQ generation of probes and primers designed for use in iden US 7,282,360 B2 75 76 tifying and/or cloning other 53070, 15985, 26583, 21953, In a preferred embodiment the nucleic acid is a probe m32404, 14089, or 23436 family members, or fragments which is at least 5, 10, 12, 15, 18, or 20 and less than 500, thereof, as well as 53070, 15985, 26583, 21953, m32404, 300, or 200, more preferably less than 100, or less than 50, 14089, or 23436 homologues, or fragments thereof, from base pairs in length. It should be identical, or differ by 1, or other species. 5 less than in 5 or 10 bases, from a sequence disclosed herein. In another embodiment, a nucleic acid includes a nucle If alignment is needed for this comparison the sequences otide sequence that includes part, or all, of the coding region should be aligned for maximum homology. "Looped out and extends into either (or both) the 5' or 3' noncoding sequences from deletions or insertions, or mismatches, are region. Other embodiments include a fragment which considered differences. includes a nucleotide sequence encoding an amino acid 10 A probe or primer can be derived from the sense or fragment described herein. Nucleic acid fragments can anti-sense strand of a nucleic acid which encodes: a kinase encode a specific domain or site described herein or frag domain of 53070, e.g., about nucleotides 171 to 953 of SEQ ments thereof, particularly fragments thereof which are at ID NO:1 or a portion thereof, or a C-terminal non-kinase least 50, 95, 100, 150, 200, 300, 360, 400, 600, 650, or 700 domain of 53070, e.g., about nucleotides 954 to 1241 of amino acids in length. Fragments also include nucleic acid 15 SEQID NO:1 or a portion thereof; a protein kinase domain sequences corresponding to specific amino acid sequences of 15985 from about amino acid 394 to 651 of SEQ ID described above or fragments thereof. Nucleic acid frag NO:8; and/or doublecortin repeats from about amino acids ments should not be construed as encompassing those frag 67 to 158 amino acids and from 192 to 280 of SEQID NO:8: ments that may have been disclosed prior to the invention. a serine/threonine phosphatase catalytic domain of 26583: A nucleic acid fragment can include a sequence corre amino acids 172 to 461 or 99 to 523 of SEQ ID NO:15; a sponding to a domain, region, or functional site described fragment of the 21953 protein that includes amino acid 672 herein. A nucleic acid fragment can also include one or more to 744, 88 to 663, or 88 to 744 of SEQID NO:20; a trypsin domains, regions, or functional sites described herein. Thus, domain of the 14089 polypeptide (about amino acid 24 to for example, a 53070 nucleic acid fragment can include a 234 or 41 to 234 of SEQID NO:34; or amino acids about 89 sequence corresponding to a protein kinase domain or a 25 to 420, 89 to 120, 332 to 378,379 to 420, 332 to 420, 236 C-terminal non-kinase domain; a 15985 nucleic acid frag to 291, 292 to 327, 336 to 374, 176 to 423 and 85 to 105 of ment can include a sequence corresponding to protein kinase SEQ ID NO:41. domain or a doublecortin repeat; a 26583 nucleic acid In another embodiment a set of primers is provided, e.g., fragment can include a serine/threonine phosphatase cata primers suitable for use in a PCR, which can be used to lytic domain, a protein kinase C phosphorylation site, an 30 amplify a selected region of a 53070, 15985, 26583, 21953, N-glycosylation site, a casein kinase II phosphorylation site, m32404, 14089, or 23436 sequence, e.g., a domain, region, an N-myristoylation sit, an amidation site, a protein phos site or other sequence described herein. The primers should phatase 2C signature domain, or any combination thereof a be at least 5, 10, or 50 base pairs in length and less than 100, 21953 nucleic acid fragment can include a sequence corre or less than 200, base pairs in length. The primers should be sponding to a prolyl oligopeptidase domain; an m32404 35 identical, or differs by one base from a sequence disclosed nucleic acid fragment can include a sequence corresponding herein or from a naturally occurring variant. For example, to a trypsin domain; a 14089 nucleic acid fragment can primers Suitable for amplifying all or a portion of any of the include a sequence corresponding to a trypsin domain; and following regions are provided: a molecule that encodes a a 23436 nucleic acid fragment can include a sequence protein kinase domain, from about nucleotides 171 to 593 of corresponding to a ubiquitin carboxy-terminal hydrolase 40 SEQ ID NO:1; a molecule that encodes a C-terminal non domain; kinase domain, from about nucleotides 954 to 1241 of SEQ In one embodiment, a nucleic acid fragment can include ID NO:1, a protein kinase domain from about amino acid nucleotides 1 to 250, 50 to 300, 100 to 350, 150 to 400, 200 394 to 651 of SEQ ID NO:8; a doublecortin repeat from to 450, 250 to 500, 300 to 650, 350 to 700, 400 to 700, 450 about amino acids 67 to 158 of SEQ ID NO:8, or a to 750, 500 to 800, 550 to 850, 600 to 900, 650 to 950, 700 45 doublecortin repeat from 192 to 280 of SEQ ID NO:8; the to 1000, 800 to 1200, 900 to 1300, 1000 to 1400, 1100 to serine/threonine phosphatase catalytic domain (amino acid 1500, 1200 to 1600, 1300 to 1700, 1400 to 1800, 1500 to residues 172 to 461 or 99 to 523 of SEQID NO:15); a prolyl 1900, 1600 to 2000, 1700 to 2100, 1253 to 1307, 1253 to oligopeptidase domain from about amino acid 672 to 744, 88 1337, 1241 to 1379, 1382 to 1505, 1241 to 1505, 773 to to 663, or 88 to 744 of SEQ ID NO:20; a trypsin domain 1514,953 to 1118,953 to 1226, 1121 to 1226, 1253 to 1367, 50 from about amino acid 45 to 268 of SEQID NO:25; a trypsin 773 to 1514, 500 to 560, or 512 to 605 of SEQID NO:40, domain from about amino acid 311 to 520 of SEQ ID or any combination thereof. NO:25; a histidine active site located at about amino acid 73 In a preferred embodiment, the fragment is at least 300, to 78 of SEQ ID NO:25; a histidine active site located at 500, 520, 590, 600, 650, 700, 750, 800, 850, 900, 950, or about amino acid 337 to 342 of SEQID NO:25; and a serine 1000 nucleotides in length. 55 active site located at about amino acid 222 to 226 of SEQID 53070, 15985, 26583, 21953, m32404, 14089, or 23436 NO:25; a trypsin domain from about amino acid 24 to 234 probes and primers are provided. Typically a probe?primer is or 41 to 234 ofr SEQ ID NO:34: a conserved histidine an isolated or purified oligonucleotide. The oligonucleotide residue present at about amino acid 56 of SEQID NO:34: a typically includes a region of nucleotide sequence that serine active site located at amino acid 195 of SEQ ID hybridizes under a stringency condition described herein to 60 NO:34; and a ubiquitin carboxy-terminal hydrolase domain at least about 7, 12 or 15, preferably about 20 or 25, more from about amino acid 89 to 420 of SEQ ID NO:41. preferably about 30, 35, 40, 45, 50, 55, 60, 65, or 75 A nucleic acid fragment can encode an epitope bearing consecutive nucleotides of a sense or antisense sequence of region of a polypeptide described herein. SEQID NO:1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or A nucleic acid fragment encoding a “biologically active 42, or of a naturally occurring allelic variant or mutant of 65 portion of a 53070, 15985, 26583, 21953, m32404, 14089, SEQID NO:1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or or 23436 polypeptide' can be prepared by isolating a portion 42. of the nucleotide sequence of SEQID NO:1, 3, 7, 9, 14, 16, US 7,282,360 B2 77 78 19, 21, 24, 26, 33, 35, 40, or 42, which encodes a polypep the region about nucleotides 1-200, 138-301, 171-401, 302 tide having a 53070, 15985, 26583, 21953, m32404, 14089, 569, 402-692, 531-812, 660-932, 773-953, 873-1112,954 or 23436 biological activity (e.g., the biological activities of 1160, 1053-1241, 1161-1400, 1242-1550, 1350-1600, 1550 the 53070, 15985, 26583, 21953, m32404, 14089, and 1704 of SEQ ID NO:1 or SEQ ID NO:3. In a preferred 23436 proteins are described herein), expressing the embodiment, the nucleic acid includes a contiguous encoded portion of the 53070, 15985, 26583, 21953, sequence that includes approximately nucleotide 1640, or m32404, 14089, or 23436 protein (e.g., by recombinant 1642 of SEQ ID NO:19, e.g., the region from nucleotide expression in vitro) and assessing the activity of the encoded 1635 to 1645 of SEQ ID NO:19. In other embodiment the portion of the 53070, 15985, 26583, 21953, m32404, 14089, nucleic acid includes a contiguous sequence that includes or 23436 protein. For example, a nucleic acid fragment 10 about nucleotides 1 to 25, 1 to 66, 100 to 300, 300 to 700, encoding a biologically active portion of 53070 includes a 500 to 800, 800 to 1200, 1000 to 1400, or 1200 to 1600 of protein kinase domain, e.g., about nucleotides 171 to 953 of SEQ ID NO:19. SEQID NO:1. A nucleic acid fragment encoding a biologi In a preferred embodiment, a nucleic acid fragment differs cally active portion of 15985 includes a protein kinase by at least 1, 2, 3, 10, 20, or more nucleotides from, the domain, e.g., amino acid residues about 394 to 651 of SEQ 15 sequence of Genbank accession number U66059, e.g., from ID NO:8, a doublecortin repeat from about amino acid 67 to nucleotides 315-571 of SEQ ID NO:33; the sequence of 158 of SEQ ID NO:8, or a doublecortin repeat from about SEQ ID NO:247 of WO 01/40466; the sequence of SEQID amino acid 192 to 280 of SEQ ID NO:8. A nucleic acid NO:5 or 6 of WO 01/72961: the sequence of SEQID NO:22 fragment encoding a biologically active portion of 26583 of WO 01/71004. Differences can include differing in length includes a serine/threonine phosphatase catalytic domain, or sequence identity. For example, a nucleic acid fragment e.g., amino acid residues 99 to 523 of SEQ ID NO:15. A can: include one or more nucleotides from SEQ ID NO:33 nucleic acid fragment encoding a biologically active portion or SEQ ID NO:35 located outside the region of nucleotides of 21953 includes a prolyl oligopeptidase domain e.g., 315 to 571, 94 to 938, 136 to 861, 173 to 861, 1-570, 572 amino acid residues about 672 to 744, 88 to 663, or 88 to 744 to 947 of SEQ ID NO:33, e.g., can be one or more nucle of SEQ ID NO:20. A nucleic acid fragment encoding a 25 otides shorter (at one or both ends) than the sequence of biologically active portion of m32404 includes a trypsin Genbank accession number U66059, e.g., from nucleotides domain, e.g., amino acid residues about 45 to 268 or 311 to 315-571 of SEQID NO:33; the sequence of SEQID NO:247 520 of SEQID NO:25. A nucleic acid fragment encoding a of WO 01/40466; the sequence of SEQID NO:5 or 6 of WO biologically active portion of 14089 includes a trypsin 01/72961: the sequence of SEQID NO:22 of WO 01/71004; domain, e.g., amino acid residues about 24 to 234 or 41 to 30 or can differ by one or more nucleotides in the region of 234 of SEQID NO:34. A nucleic acid fragment encoding a overlap. biologically active portion of 23436 includes a ubiquitin carboxy-terminal hydrolase domain, e.g. amino acid resi Nucleic Acid Variants dues about 89 to 420 of SEQ ID NO:41. A nucleic acid The invention further encompasses nucleic acid mol fragment encoding a biologically active portion of a 53070, 35 ecules that differ from the nucleotide sequence shown in 15985, 26583, 21953, m32404, 14089, or 23436 polypep SEQID NO: 1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33, 35, 40, or tide, may comprise a nucleotide sequence which is greater 42. Such differences can be due to degeneracy of the genetic than 280, 300, 361, 400, 470, 800, 1000, 1600, or more code (and result in a nucleic acid which encodes the same nucleotides in length. 53070, 15985, 26583, 21953, m32404, 14089, or 23436 In preferred embodiments, the nucleic acid fragment 40 proteins as those encoded by the nucleotide sequence dis includes a nucleotide sequence that is other than, e.g., differs closed herein. In another embodiment, an isolated nucleic by at least one, two, three of more nucleotides from, the acid molecule of the invention has a nucleotide sequence sequence of AA498169 or AI480580. E.g., a nucleic acid encoding a protein having an amino acid sequence which fragment can: include one or more nucleotides from SEQID differs, by at least 1, but less than 5, 10, 20, 50, or 100 amino NO:24 or SEQ ID NO:26 outside the region of nucleotides 45 acid residues that shown in SEQID NO: 2, 8, 15, 20, 25, 34, 1699-2033 or 1711-2034 of SEQ ID NO:24; not include all or 41. If alignment is needed for this comparison the of the nucleotides of AA498169 or AI480580, e.g., can be sequences should be aligned for maximum homology. one or more nucleotides shorter (at one or both ends) than “Looped out sequences from deletions or insertions, or the sequence of AA498169 or AI480580; or can differ by mismatches, are considered differences. one or more nucleotides in the region of overlap. 50 Nucleic acids of the invention can be chosen for having In preferred embodiments, the fragment comprises the codons, which are preferred, or non-preferred, for a particu coding region of 46508, e.g., the nucleotide sequence of lar expression system. E.g., the nucleic acid can be one in SEQ ID NO:26. In other embodiments, the fragment com which at least one codon, at preferably at least 10%, or 20% prises nucleotides 1-1698 or 2034-2219 of SEQ ID NO:24, of the codons has been altered such that the sequence is or a fragment thereof (e.g., nucleotides 1-500, 500-1000, 55 optimized for expression in E. coli, yeast, human, insect, or 1000-1698, 2034-2100, 2100-2200, or 2200-2219 of SEQ CHO cells. ID NO:24). Nucleic acid variants can be naturally occurring, such as In preferred embodiments, a nucleic acid includes a allelic variants (same locus), homologs (different locus), and nucleotide sequence which is about 300, 340,400, 500, 590, orthologs (different organism) or can be non naturally occur 600, 650, 700, 750, 800, 850, 900, 1000, 1100, 1200, 1300, 60 ring. Non-naturally occurring variants can be made by 1400, 1500, 1600, 1700, 1800, 1900, 2100, 2200, 2300, mutagenesis techniques, including those applied to poly 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, nucleotides, cells, or organisms. The variants can contain 3300, 3400, 3500, 3550, or more nucleotides in length and nucleotide Substitutions, deletions, inversions and inser hybridizes under a stringency condition described herein to tions. Variation can occur in either or both the coding and a nucleic acid molecule of SEQID NO:1, 3, 7, 9, 14, 16, 19, 65 non-coding regions. The variations can produce both con 21, 24, 26, 33, 35, 40, or 42. In a preferred embodiment, a servative and non-conservative amino acid Substitutions (as nucleic acid includes at least one contiguous nucleotide from compared in the encoded product). US 7,282,360 B2 79 80 In a preferred embodiment, the nucleic acid differs from or 41, or substitution, deletion or insertion of non-critical that of SEQID NO: 1, 3, 7, 9, 14, 16, 19, 21, 24, 26, 33,35, residues in non-critical regions of the protein. Non-func 40, or 42, e.g., as follows: by at least one but less than 10, tional allelic variants are naturally-occurring amino acid 20, 30, or 40 nucleotides; at least one but less than 1%. 5%, sequence variants of the 53070, e.g., human 53070, protein 10% or 20% of the nucleotides in the subject nucleic acid. within a population that do not have the ability to bind ATP If necessary for this analysis the sequences should be or phosphorylate some or all substrates. Non-functional aligned for maximum homology. "Looped out sequences allelic variants of the 15985 protein not have the ability to from deletions or insertions, or mismatches, are considered bind to cytoskeletal proteins, e.g., microtubules, or phos differences. phorylate proteins. Non-functional allelic variants of the Orthologs, homologs, and allelic variants can be identified 10 26583 protein not have the ability to to remove the phos using methods known in the art. These variants comprise a phate from a serine or threonine residue of a phosphorylated nucleotide sequence encoding a polypeptide that is 50%, at protein. Non-functional allelic variants of the 21953 protein least about 55%, typically at least about 70-75%, more not have the ability to bind and/or cleave polypeptide typically at least about 80-85%, and most typically at least Substrates, e.g., a polypeptide having a proline residue. about 90-95% or more identical to the sequence shown in 15 Non-functional allelic variants of the m32404 protein not SEQ ID NO:2, 8, 15, 20, 25, 34, or 41 or a fragment of this have the ability to peptide sequences and exhibit proteolytic sequence. Such nucleic acid molecules can readily be iden activity. Non-functional allelic variants of the 14089 protein tified as being able to hybridize under a stringency condition not have the ability to cleave a substrate. Non-functional described herein, to the nucleotide sequence shown in SEQ allelic variants of the 23436 protein not have the ability to ID NO: 1, 7, 14, 19, 24, 33, or 40 or a fragment of the de-ubiquitinate substrates. Non-functional allelic variants sequence. Nucleic acid molecules corresponding to will typically contain a non-conservative Substitution, a orthologs, homologs, and allelic variants of the 53070, deletion, or insertion, or premature truncation of the amino 15985, 26583, 21953, m32404, 14089, or 23436 cDNAs of acid sequence of SEQID NO:2, 8, 15, 20, 25, 34, or 41, or the invention can further be isolated by mapping to the same a Substitution, insertion, or deletion in critical residues or chromosome or locus as the 53070, 15985, 26583, 21953, 25 critical regions of the protein. m32404, 14089, or 23436 gene. Moreover, nucleic acid molecules encoding other 53070, Preferred 53070 variants include those that are correlated 15985, 26583, 21953, m32404, 14089, or 23436 family with protein kinase activity, particularly serine/threonine members and, thus, which have a nucleotide sequence which protein kinase activity. Preferred 15985 variants include differs from the 53070, 15985, 26583, 21953, m32404, those that are correlated with protein kinase and/or micro 30 14089, or 23436 sequences of SEQID NO:1, 3, 7, 9, 14, 16, tubule binding activity. Preferred 26583 variants include 19, 21, 24, 26, 33, 35, 40, or 42 are intended to be within the those that are correlated with phosphatase activity, e.g., scope of the invention. serine/threonine phosphatase activity. Preferred 21953 vari ants include those that are correlated with dipeptidyl pepti Antisense Nucleic Acid Molecules, Ribozymes and Modi dase or prolyl endopeptidases activity. Preferred m32404 35 fied Nucleic Acid Molecules variants include those that are correlated with modulating In still another embodiment, a ribozyme. A ribozyme (stimulating and/or enhancing or inhibiting) cellular prolif having specificity for a 53070-, 15985- 26583-, 21953-, eration, differentiation, or tumorigenesis; modulating an m32404-, 14089-, or 23436-encoding nucleic acid can immune response (i.e. modulating the complementation include one or more sequences complementary to the nucle system); modulating hormone production; modulating the 40 otide sequence of a 53070, 15985, 26583, 21953, m32404, blood clotting cascade; or modulating of protein 14089, or 23436 cDNA disclosed herein (i.e., 1, 3, 7, 9, 14, substrates. Preferred 14089 variants include those that are 16, 19, 21, 24, 26, 33, 35, 40, or 42), and a sequence having correlated with proteolytic cleave of substrates. Preferred known catalytic sequence responsible for mRNA cleavage 23436 variants include those that are correlated with de (see U.S. Pat. No. 5,093.246 or Haselhoff and Gerlach ubiquitinating activity. 45 (1988) Nature 334:585-591). For example, a derivative of a Allelic variants of 53070, 15985, 26583, 21953, m32404, Tetrahymena L-19 IVS RNA can be constructed in which the 14089, or 23436, e.g., human 53070, 15985, 26583, 21953, nucleotide sequence of the active site is complementary to m32404, 14089, or 23436, include both functional and the nucleotide sequence to be cleaved in a 53070-, 15985 non-functional proteins. Functional allelic variants are natu 26583-, 21953-, m32404-, 14089-, or 23436-encoding rally occurring amino acid sequence variants of the 53070 50 mRNA. See, e.g., Cech et al. U.S. Pat. No. 4,987,071; and protein within a population that maintain the ability to bind Cech et al. U.S. Pat. No. 5,116,742. Alternatively, 53070, ATP and phosphorylate substrates. Functional allelic vari 15985, 26583, 21953, m32404, 14089, or 23436 mRNA can ants of the 15985 protein maintain the ability to bind be used to select a catalytic RNA having a specific ribonu microtubules and/or phosphorylate proteins. Functional clease activity from a pool of RNA molecules. See, e.g., allelic variants of the 26583 protein maintain the ability to 55 Bartel, D. and Szostak, J. W. (1993) Science 261:141 1-1418. remove the phosphate from a serine or threonine residue of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 a phosphorylated protein. Functional allelic variants of the gene expression can be inhibited by targeting nucleotide 21953 protein maintain the ability to bind and/or cleave sequences complementary to the regulatory region of the polypeptide Substrates, e.g., a polypeptide having a proline 53070, 15985, 26583, 21953, m32404, 14089, or 23436 residue. Functional allelic variants of the m32404 protein 60 (e.g., the 53070, 15985, 26583, 21953, m32404, 14089, or maintain the ability to bind peptide sequences and exhibit 23436 promoter and/or enhancers) to form triple helical proteolytic activity. Functional allelic variants of the 14089 structures that prevent transcription of the 53070, 15985, protein maintain the ability to bind proteolytic substrates. 26583, 21953, m32404, 14089, or 23436 gene in target cells. Functional allelic variants of the 23436 protein maintain the See generally, Helene, C. (1991) Anticancer Drug Des. ability to de-ubiquitinate substrates. Functional allelic vari 65 6:569-84: Helene, C. i (1992) Ann. N. Y. Acad. Sci. 660:27 ants will typically contain only conservative Substitution of 36; and Maher, L. J. (1992) Bioassays 14:807-15. The one or more amino acids of SEQID NO:2, 8, 15, 20, 25, 34, potential sequences that can be targeted for triple helix US 7,282,360 B2 81 82 formation can be increased by creating a so-called “switch complementary regions one having a fluorophore and one a back nucleic acid molecule. Switchback molecules are quencher such that the molecular beacon is useful for synthesized in an alternating 5'-3', 3'-5' manner, such that quantitating the presence of the 53070, 15985, 26583, they base pair with first one strand of a duplex and then the 21953, m32404, 14089, or 23436 nucleic acid of the inven other, eliminating the necessity for a sizeable stretch of 5 tion in a sample. Molecular beacon nucleic acids are either purines or pyrimidines to be present on one strand of described, for example, in Lizardi et al., U.S. Pat. No. a duplex. 5,854,033; Nazarenko et al., U.S. Pat. No. 5,866,336, and The invention also provides detectably labeled oligo Livak et al., U.S. Pat. No. 5,876,930. nucleotide primer and probe molecules. Typically, Such Isolated Polypeptides of the Invention labels are chemiluminescent, fluorescent, radioactive, or 10 In another aspect, the invention features, an isolated colorimetric. 53070, 15985, 26583, 21953, m32404, 14089, or 23436 A53070, 15985, 26583, 21953, m32404, 14089, or 23436 protein, or fragment, e.g., a biologically active portion, for nucleic acid molecule can be modified at the base moiety, Sugar moiety or phosphate backbone to improve, e.g., the use as immunogens or antigens to raise or test (or more stability, hybridization, or solubility of the molecule. For 15 generally to bind) anti-53070, -15985, -26583, -21953, non-limiting examples of synthetic oligonucleotides with -m32404, -14089, or -23436 antibodies. 53070, 15985, 26583, 21953, m32404, 14089, or 23436 protein can be modifications see Toulme (2001) Nature Biotech. 19:17 and isolated from cells or tissue sources using standard protein Faria et al. (2001) Nature Biotech. 19:40-44. Such phos purification techniques. 53070, 15985, 26583, 21953, phoramidite oligonucleotides can be effective antisense m32404, 14089, or 23436 protein or fragments thereof can agents. be produced by recombinant DNA techniques or synthesized For example, the deoxyribose phosphate backbone of the chemically. nucleic acid molecules can be modified to generate peptide Polypeptides of the invention include those which arise as nucleic acids (see Hyrup B. et al. (1996) Bioorganic & a result of the existence of multiple genes, alternative Medicinal Chemistry 4: 5-23). As used herein, the terms transcription events, alternative RNA splicing events, and "peptide nucleic acid” or “PNA' refers to a nucleic acid 25 alternative translational and post-translational events. The mimic, e.g., a DNA mimic, in which the deoxyribose phos polypeptide can be expressed in Systems, e.g., cultured cells, phate backbone is replaced by a pseudopeptide backbone which result in Substantially the same post-translational and only the four natural nucleobases are retained. The modifications present when the polypeptide is expressed in neutral backbone of a PNA can allow for specific hybrid a native cell, or in Systems which result in the alteration or ization to DNA and RNA under conditions of low ionic 30 omission of post-translational modifications, e.g., glycosy strength. The synthesis of PNA oligomers can be performed lation or cleavage, present when expressed in a native cell. using standard solid phase peptide synthesis protocols as In a preferred embodiment, a 53070 polypeptide has one described in Hyrup B. et al. (1996) supra and Perry-O'Keefe or more of the following characteristics: et al. Proc. Natl. Acad. Sci. 93: 14670-675. it has the ability to bind a nucleoside tri-phosphate, e.g., PNAs of 53070, 15985, 26583,21953, m32404, 14089, or 35 ATP: 23436 nucleic acid molecules can be used in therapeutic and it has the ability to phosphorylate a Substrate protein, e.g., diagnostic applications. For example, PNAS can be used as another protein or itself: antisense or antigene agents for sequence-specific modula it has a molecular weight, e.g., a deduced molecular tion of gene expression by, for example, inducing transcrip weight, preferably ignoring any contribution of post trans tion or translation arrest or inhibiting replication. PNAs of 40 lational modifications, amino acid composition or other 53070, 15985, 26583, 21953, m32404, 14089, or 23436 physical characteristic of a 53070 polypeptide, e.g., a nucleic acid molecules can also be used in the analysis of polypeptide of SEQ ID NO:2: single base pair mutations in a gene, (e.g., by PNA-directed it has an overall sequence similarity of at least 60%, 70%, PCR clamping); as artificial restriction enzymes when used preferably at least 75%, more preferably at least 80%, 90%, in combination with other enzymes, (e.g., S1 nucleases 45 (Hyrup B. et al. (1996) supra); or as probes or primers for or 95%, with a polypeptide of SEQ ID NO:2: DNA sequencing or hybridization (Hyrup B. et al. (1996) it has a protein kinase domain which is preferably about supra; Perry-O'Keefe supra). 80%, 90%. 95%, or more homologous with amino acid In other embodiments, the oligonucleotide may include residues about 12 to 272 of SEQ ID NO:2: other appended groups such as peptides (e.g., for targeting 50 it has a serine/threonine protein kinase active-site signa host cell receptors in vivo), or agents facilitating transport ture motif (PS00108); across the cell membrane (see, e.g., Letsinger et al. (1989) it has at least one, preferably two, three, four, five, six, Proc. Natl. Acad. Sci. USA 86:6553-6556; lemaitre et al. seven, eight, nine, ten, eleven, twelve, or more preferably (1987) Proc. Natl. Acad. Sci. USA 84:648-652; PCT Publi thirteen of the invariant amino acid residues present in cation No. W088/09810) or the blood-brain barrier (see, e.g., 55 protein kinase family members, and described above; PCT Publication No. W089/10134). In addition, oligonucle it has at least one, two, three, four, preferably five otides can be modified with hybridization-triggered cleav predicted Protein kinase C phosphorylation sites (PS00005); age agents (see, e.g., Krol et al. (1988) Bio-Techniques it has at least one, two, preferably three predicted Casein 6:958-976) or intercalating agents. (see, e.g., Zon (1988) kinase II phosphorylation sites (PS00006); and Phann. Res. 5:539-549). To this end, the oligonucleotide 60 it has at least one predicted N-myristoylation site may be conjugated to another molecule, (e.g., a peptide, (PS00008). hybridization triggered cross-linking agent, transport agent, In a preferred embodiment, a 15985 polypeptide has one or hybridization-triggered cleavage agent). or more of the following characteristics: The invention also includes molecular beacon oligonucle it has the ability to phosphorylate a protein Substrate, e.g., otide primer and probe molecules having at least one region 65 a serine and/or threonine side chains of a protein Substrate; which is complementary to a 53070, 15985, 26583, 21953, it has the ability to bind to cytoskeletal proteins, e.g., m32404, 14089, or 23436 nucleic acid of the invention, two microtubules; US 7,282,360 B2 83 84 it has the ability to modulate cell morphology and/or it can be found in human tissue; migration; it has a trypsin domain with a sequence which is prefer it has a molecular weight, e.g., a deduced molecular ably about 70%, 80%, 90% or 95% similar with amino acid weight, preferably ignoring any contribution of post trans residues about 45 to 268 or 311 to 520 of SEQ ID NO:25: lational modifications, amino acid composition or other O physical characteristic of a 15985 polypeptide, e.g., a it has at least 10, preferably at least 12, and most prefer polypeptide of SEQ ID NO:8: ably at least 16 of the 22 cysteines found in the amino acid it has an overall sequence similarity of at least 60%, more sequence of the native protein. preferably at least 70, 80,90, or 95%, with a polypeptide of In a preferred embodiment, a 14089 polypeptide has one SEQ ID NO:8: 10 or more of the following characteristics: it can be found in a tumor cell (e.g., an ovarian, lung, or it has protease activity; breast tumor cell), neuronal cells; it has a molecular weight, e.g., a deduced molecular it has a protein kinase domain which is preferably about weight, preferably ignoring any contribution of post-trans 70%, 80%, 90% or 95% identical with amino acid residues lational modifications, amino acid composition or other about 394 to 651 of SEQ ID NO:8: 15 physical characteristic of a 14089 polypeptide, e.g., a it can colocalize with microtubules; or polypeptide of SEQ ID NO:34: it has at least one, and preferably two doublecortin repeats it has an overall sequence similarity of at least 60%, more which are preferably about 70%, 80%, 90% or 95% identical preferably at least 70, 80,90, or 95%, with a polypeptide of with amino acid residues from about amino acids 67 to 158 SEQ ID NO:34: and/or 192 to 280 of SEQ ID NO:8. it has a trypsin domain which is preferably about 70%, In a preferred embodiment, a 26583 polypeptide has one 80%, 90% or 95% with amino acid residues about 24 to 234 or more of the following characteristics: or 41 to 234 of SEQ ID NO:34; or it has the ability to promote removal of phosphate from it has at least 5, preferably 7, and most preferably 8 of the phosphorylated serine or threonine residues of protein; 9 cysteines found in the amino acid sequence of the native it has a molecular weight (e.g., a deduced molecular 25 protein. weight), amino acid composition or other physical charac In a preferred embodiment, a 23436 polypeptide has one teristic of a 26583 protein, e.g., a 26583 protein of SEQ ID or more of the following characteristics: NO:15; it has the ability to de-ubiquitinate Substrates, e.g., by it has an overall sequence similarity of at least 60%, more means of a ubiquitin carboxy-terminal hydrolase activity; preferably at least 70, 80,90, 95%, most preferably at least 30 it has a molecular weight, e.g., a deduced molecular 99%, with a polypeptide encoded by SEQ ID NO:16: weight, preferably ignoring any contribution of post trans it has a phosphatase catalytic domain which is preferably lational modifications, amino acid composition or other about 70%, 80%, 90%. 95%, most preferably at least 99%, physical characteristic of SEQ ID NO:41: identical to amino acid residues 99-523 of SEQ ID NO:15: it has an overall sequence similarity of at least 60%, more it has a phosphatase catalytic domain which is preferably 35 preferably at least 70, 80, 90, 95%, 97%, 98% or 99%, with about 70%, 80%, 90%. 95%, most preferably at least 99%, a polypeptidea of SEQ ID NO:41: identical to with amino acid residues 172 to 461 of SEQID it can be found in erythroid cells, erythroid precursors, NO:15; or liver, prostate, and hypothalamus; it has at least 70%, preferably at least 80%, and most it has a ubiquitin carboxy-terminal hydrolase (family 2) preferably at least 95% of the cysteines found in the amino 40 domain which is preferably about 70%, 80%, 90%, 95%, acid sequence of the native protein. 98%, or 99% homologous with amino acid residues about 89 In a preferred embodiment, a 21953 polypeptide has one to 420 of SEQ ID NO:41; and/or or more of the following characteristics: it has a conserved cysteine at about amino acid 98 of SEQ it has the ability to promote the degradation of proline ID NO:41 and two conserved histidines at about amino acids containing peptides by cleaving the peptide bond at the 45 344 and 353 of SEQ ID NO:41. carboxyl side of proline residues; In a preferred embodiment the 53070, 15985, 26583, it has a molecular weight, (e.g., about 97 KDa), amino 21953, m32404, 14089, or 23436 protein, or fragment acid composition, or other physical characteristic, of a thereof, differs from the corresponding sequence in SEQID: 21953 polypeptide, e.g., a polypeptide of SEQ ID NO:20; 2, 8, 15, 20, 25, 34, or 41. In one embodiment it differs by it has an overall sequence similarity of at least 60%, more 50 at least one but by less than 15, 10 or 5 amino acid residues. preferably at least 70, 80,90, or 95%, with a polypeptide of In another it differs from the corresponding sequence in SEQ SEQ ID NO:20; ID NO: 2, 8, 15, 20, 25, 34, or 41 by at least one residue but it has a prolyl oligopeptidase domain which has prefer less than 20%, 15%, 10% or 5% of the residues in it differ ably about 70%, 80%, 90% or 95% sequence similarity with from the corresponding sequence in SEQ ID NO: 2, 8, 15, amino acid residues 672-744 of SEQ ID NO:20; or 55 20, 25, 34, or 41. (If this comparison requires alignment the it has at least 70%, preferably 80%, and most preferably sequences should be aligned for maximum homology. 90% of the cysteines found in the amino acid sequence of the “Looped out sequences from deletions or insertions, or native protein (SEQ ID NO:20). mismatches, are considered differences.) The differences In a preferred embodiment, an m32404 polypeptide has are, preferably, differences or changes at a non essential one or more of the following characteristics: 60 residue or a conservative substitution. In a preferred embodi it exhibits proteolytic activity; ment the differences are not in the 53070 protein kinase it has a molecular weight, or an amino acid composition domain, e.g., about amino acid residues 12 to 272 of SEQID of an m32404 polypeptide, e.g., a polypeptide of SEQ ID NO:2. In another preferred embodiment one or more differ NO:25. ences are in the 53070 protein kinase domain, e.g., about it has an overall sequence similarity of at least 60%, more 65 amino acid residues 12 to 272 of SEQ ID NO:2 . In a preferably at least 70, 80,90, or 95%, with a polypeptide of preferred embodiment the differences are not in the 15985 SEQ ID NO:25; protein kinase domain nor in the 15985 doublecortin repeats. US 7,282,360 B2 85 86 In another preferred embodiment one or more differences “Looped out sequences from deletions or insertions, or are in the 15985 protein kinase domain and/or the 15985 mismatches, are considered differences.) In some embodi doublecortin repeats. In a preferred embodiment, the differ ments the difference is at a non-essential residue or is a ences are not in the 265.83 serine/threonine phosphatase conservative substitution, while in others the difference is at catalytic domain. In another preferred embodiment one or an essential residue or is a non-conservative Substitution. more differences are at 26583 non-active site residues, e.g., A 15985 protein or fragment is provided which varies amino acids 1-98, or 524 to 537 of SEQ ID NO:15. In a from the sequence of SEQ ID NO:8 in regions defined by preferred embodiment the differences are not in the 21953 amino acids about 67 to 158, 192 to 280, and 394 to 651 by prolyl oligopeptidase domain and/or the DPP IV N-terminal at least one but by less than 15, 10 or 5 amino acid residues domain. In another preferred embodiment one or more 10 in the protein or fragment but which does not differ from differences are in the 21953 prolyl oligopeptidase domain SEQID NO:8 in regions defined by amino acids about 67 to and/or the DPP IV N-terminal domain. In a preferred 158, 192 to 280, and 394 to 651. (If this comparison requires embodiment the differences are not in the m32404 trypsin alignment the sequences should be aligned for maximum domain, i.e., from about amino acid 45 to 268 or 311 to 520 homology. “Looped out sequences from deletions or inser of SEQ ID NO:25. In another preferred embodiment one or 15 tions, or mismatches, are considered differences.) In some more differences are in the m32404 trypsin domain, i.e., embodiments the difference is at a non-essential residue or from about amino acid 45 to 268 or 311 to 520 of SEQ ID is a conservative substitution, while in others the difference NO:25. In a preferred embodiment the differences are not in is at an essential residue or is a non-conservative Substitu the 14089 trypsin domain. In another preferred embodiment tion. one or more differences are in the 14089 trypsin domain. In In one embodiment, a biologically active portion of a a preferred embodiment the differences are not in the 23436 15985 protein includes a protein kinase domain and/or ubiquitin carboxy-terminal hydrolase domain, e.g., the doublecortin repeats. Moreover, other biologically active region from about amino acid 89 to 120 and 332 to 420 of portions, in which other regions of the protein are deleted, SEQ ID NO:41. In another preferred embodiment one or can be prepared by recombinant techniques and evaluated more differences are in the 23436 ubiquitin carboxy-termi 25 for one or more of the functional activities of a native 15985 nal hydrolase domain, e.g., the region from about amino acid protein. 89 to 120 and 332 to 420 of SEQ ID NO:41. In another embodiment, the protein includes an amino Other embodiments include a protein that contain one or acid sequence at least 213 amino acids in length, and about more changes in amino acid sequence, e.g., a change in an 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, homolo amino acid residue which is not essential for activity. Such 30 gous to SEQ ID NO:15. 53070, 15985, 26583, 21953, m32404, 14089, or 23436 proteins differ in amino acid sequence from SEQID NO: 2, In another embodiment, a 26583 protein or fragment has 8, 15, 20, 25, 34, or 41, yet retain biological activity. an amino acid sequence which differs from the sequence of In one embodiment, the protein includes an amino acid AAA30697 by at least one, two, three, five or more amino sequence at least about 60%. 65%, 70%, 75%, 80%, 85%, 35 acids. The variations may include the addition, replacement, 90%. 95%, 98%, 99% or more homologous to SEQID NO: and/or deletion of amino acid residues. 2, 8, 15, 20, 25, 34, or 41. A 26583 protein or fragment is provided which varies The present invention also pertains to fragments of the from the sequence of SEQ ID NO:15 in non-active site 53070, 15985, 26583, 21953, m32404, 14089, or 23436 residues by at least one but by less than 15, 10 or 5 amino polypeptides. For examples, fragments of the 53070 40 acid residues in the protein or fragment, but which does not polypeptides which include amino acid residues about 103 differ from SEQ ID NO:15 in regions having phosphatase to 119, about 226 to 247, or about 301 to 329 of SEQ ID catalytic activity. (If this comparison requires alignment the NO:2, which correspond to hydrophilic regions of the 53070 sequences should be aligned for maximum homology. protein. Similarly, fragments of 53070 which include resi “Looped out sequences from deletions, insertions, or mis dues about 63 to 73, about 86 to 102, or about 199 to 216 of 45 matches, are considered differences.) In some embodiments, SEQ ID NO:2 correspond to hydrophobic regions of the the difference is at a non-essential residue or is a conserva 53070 protein. Fragments of 53070 which include residues tive substitution, while in others, the difference is at an about 12 to 45, about 125 to 150, or about 150 to 175 of SEQ essential residue or is a non conservative Substitution. ID NO:2 correspond to protein kinase domain of the 53070 A 21953 protein or fragment is provided which varies protein; and fragments of 53070 which include amino acid 50 from the sequence of SEQ ID NO:20 in regions defined by residues about 1 to 11 and 273 to 367 of SEQ ID NO:2 amino acids about 672 to 744 by at least one but by less than correspond to non-kinase domain region of the 53070 pro 15, 10 or 5 amino acid residues in the protein or fragment but tein. which does not differ from SEQID NO:20 in regions defined A 53070 protein or fragment is provided which varies by amino acids about 672 to 744. In some embodiments, the from the sequence of SEQ ID NO:2 in regions defined by 55 21953 protein includes at least one contiguous amino acid amino acids about 273 to 367 by at least one but by less than from the region of about amino acid 1 to 200, 100 to 300, 15, 10 or 5 amino acid residues in the protein or fragment but 200 to 400, 300 to 500, 400 to 600, 500 to 700, or 600 to 800 which does not differ from SEQID NO:2 in regions defined of SEQ ID NO:20. by amino acids about 1 to 272. Additionally, a 53070 protein In another preferred embodiment, the 21953 protein has a is provided which varies from the sequence of SEQID NO:2 60 K for the substrate H-Gly-Pro-p-nitroanilide (NA)/HCl in regions defined by amino acids about 1 to 90 or, alterna (Sigma Corp, MO, USA) (H-Gly-Pro-pNA) of less than tively, 91 to 272 by at least one but by less than 15, 10, or about 10 mM, 5 mM, 1 mM, 0.5 mM, 0.2 mM, or 0.1 mM, 5 amino acid residues in the protein or fragment, but which and/or a V for H-Gly-Pro-pNA of about at least 100, 500, does not differ from SEQ ID NO:2 in regions defined by 1000, 3000, 5000, or 10000 absorbance units-min'. Such amino acids 91 to 367 or 1 to 90 and 273 to 367, respec 65 parameters can be determined using a prolyl oligopeptidase tively. (If these comparisons require alignment, the assay described herein, e.g., as described in “Screening sequences should be aligned for maximum homology. Assays.” below. US 7,282,360 B2 87 88 An m32404 protein or fragment is provided which varies a 23436 polypeptide having an activating mutation, e.g., a from the sequence of SEQID NO:25 in regions defined by mutation to aspartic or glutamic acid, of a phosphorylation amino acids about 1 to 46 and 269 to 520 by at least one but site, e.g., a predicted phosphorylation site described herein, by less than 15, 10 or 5 amino acid residues in the protein can be introduced or expressed in cells. or fragment but which does not differ from SEQ ID NO:25 In one embodiment, a biologically active portion of a in regions defined by amino acids about 45 to 268. An 53070, 15985, 26583, 21953, m32404, 14089, or 23436 m32404 protein or fragment is also provided which varies protein includes a protein kinase domain, a protein kinase from the sequence of SEQID NO:25 in regions defined by domain and/or doublecortin repeats, a serine/threonine phos amino acids about 1 to 310, and 521 to 552, of SEQ ID phatase catalytic domain, a prolyl oligopeptidase domain NO:25 by at least one but by less than 15, 10 or 5 amino acid 10 and/or a DPP IV N-terminal domain, a trypsin domain, a residues in the protein or fragment but which does not differ trypsin domain, or a ubiquitin carboxy-terminal hydrolase from SEQ ID NO:25 in regions defined by amino acids domain, respectively. Moreover, other biologically active about 311 to 520 of SEQ ID NO:25. portions, in which other regions of the protein are deleted, A 14089 protein or fragment is provided which varies can be prepared by recombinant techniques and evaluated from the sequence of SEQID NO:34 in regions defined by 15 for one or more of the functional activities of a native 53070, amino acids about 41 to 234 by at least one but by less than 15985, 26583, 21953, m32404, 14089, or 23436 protein. 15, 10 or 5 amino acid residues in the protein or fragment but In a preferred embodiment, the 53070, 15985, 26583, which does not differ from SEQID NO:34 in regions defined 21953, m32404, 14089, or 23436 protein has an amino acid by amino acids about 41 to 234. sequence shown in SEQ ID NO:2, 8, 15, 20, 25, 34, or 41, In a preferred embodiment, a 14089 fragment differs by at respectively. In other embodiments, the 53070, 15985, least 1, 2, 3, 10, 20, or more amino acid residues encoded by 26583, 21953, m32404, 14089, or 23436 protein is substan a sequence present in Genbank accession number U66059. tially identical to SEQ ID NO:2, 8, 15, 20, 25, 34, or 41, e.g., from nucleotides 315-571 of SEQ ID NO:33; the respectively. In yet another embodiment, the 53070, 15985, sequence of SEQID NO:247 of WO 01/40466; the sequence 26583, 21953, m32404, 14089, or 23436 protein is substan of SEQID NO:5 or 6 of WO 01/72961: the sequence of SEQ 25 tially identical to SEQID NO:2, 8, 15, 20, 25, 34, or 41 and ID NO:22 of WOO1/71004. Differences can include differ retains the functional activity of the protein of SEQ ID ing in length or sequence identity. For example, a fragment NO:2, 8, 15, 20, 25, 34, or 41, as described in detail in the can: include one or more amino acid residues from SEQID Subsections above. NO:34 outside the region encoded by nucleotides 315 to 571, 94 to 938, 136 to 861, 173 to 861, 1-570, 572 to 947 30 Chimeric or Fusion Proteins of SEQID NO:33; not include all of the amino acid residues In another aspect, the invention provides 53070, 15985, encoded by a nucleotide sequence in Genbank accession 26583, 21953, m32404, 14089, or 23436 chimeric or fusion number U66059, e.g., from nucleotides 315-571 of SEQ ID proteins. As used herein, a 53070, 15985, 26583, 21953, NO:33; the sequence of SEQ ID NO:247 of WO 01/40466: m32404, 14089, or 23436 “chimeric protein' or “fusion the sequence of SEQ ID NO:5 or 6 of WO 01/72961: the 35 protein includes a 53070, 15985, 26583, 21953, m32404, sequence of SEQ ID NO:22 of WO 01/71004, e.g., can be 14089, or 23436 polypeptide linked to a non-53070, -15985, one or more amino acid residues shorter (at one or both -265.83, -21953, -m32404, -14089, or -23436 polypeptide. A ends) than a sequence encoded by the nucleotide sequence “non-53070, -15985, -26583, -21953, -m32404, -14089, or in Genbank accession number U66059, e.g., from nucle -23436 polypeptide' refers to a polypeptide having an amino otides 315-571 of SEQ ID NO:33; the sequence of SEQ ID 40 acid sequence corresponding to a protein which is not NO:247 of WO 01/40466; the sequence of SEQID NO:5 or substantially homologous to the 53070, 15985, 26583, 6 of WO 01/72961: the sequence of SEQ ID NO:22 of WO 21953, m32404, 14089, or 23436 protein, e.g., a protein 01/71004; or can differ by one or more amino acid residues which is different from the 53070, 15985, 26583, 21953, in the region of overlap. m32404, 14089, or 23436 protein and which is derived from A 23436 protein or fragment is provided which varies 45 the same or a different organism. The 53070, 15985, 26583, from the sequence of SEQID NO:41 in regions defined by 21953, m32404, 14089, or 23436 polypeptide of the fusion amino acids about 1 to 88, 121 to 331, and 421 to 485 by at protein can correspond to all or a portion e.g., a fragment least one but by less than 15, 10 or 5 amino acid residues in described herein of a 53070, 15985, 26583, 21953, m32404, the protein or fragment but which does not differ from SEQ 14089, or 23436 amino acid sequence. In a preferred ID NO:41 in regions defined by amino acids about 89 to 120, 50 embodiment, a 53070, 15985, 26583, 21953, m32404, and 332 to 420. Such polypeptide fragments of 23436 14089, or 23436 fusion protein includes at least one (or two) containing functional domains, signatures, and/or modifica biologically active portion of a 53070, 15985, 26583,21953, tion sites, and nucleic acids encoding same can be useful, m32404, 14089, or 23436 protein. The non-53070, -15985, e.g., as immunogens or as competitive inhibitors. For -265.83, -21953, -m32404, -14089, or -23436 polypeptide example, to inhibit 23436 mediated de-ubiquitination, a 55 can be fused to the N-terminus or C-terminus of the 53070, ubiquitinated protein can be contacted with a substrate 15985, 26583, 21953, m32404, 14089, or 23436 polypep binding subsequence of 23436 which lacks de-ubiquitina tide. tion activity thereby inhibiting or blocking de-ubiquitination The fusion protein can include a moiety which has a high by 23436 having the activity. A variant of 23436 lacking affinity for a ligand. For example, the fusion protein can be de-ubiquitination activity can be generated by mutating the 60 a GST-53070, -15985, -26583, -21953, -m32404, -14089, or conserved cysteine at about amino acid 98 of SEQ ID -23436 fusion protein in which the 53070, 15985, 26583, NO:41, e.g., to alanine, or the conserved histidines at about 21953, m32404, 14089, or 23436 sequences are fused to the amino acids 344 and 353 of SEQID NO:41, e.g., to alanine. C-terminus of the GST sequences. Such fusion proteins can To inhibit phosphorylation of a particular site of 23436 facilitate the purification of recombinant 53070, 15985, polypeptide in a cell, a 23436 polypeptide having a mutation 65 26583, 21953, m32404, 14089, or 23436. Alternatively, the at the site, e.g., to alanine, can be introduced or expressed in fusion protein can be a 53070, 15985, 26583, 21953, cells. To alter the activity of a 23436 polypeptide in a cell, m32404, 14089, or 23436 protein containing a heterologous US 7,282,360 B2 89 90 signal sequence at its N-terminus. In certain host cells (e.g., Variants of a 53070, 15985, 26583, 21953, m32404, mammalian host cells), expression and/or secretion of 14089, or 23436 protein can be identified by screening 53070, 15985, 26583, 21953, m32404, 14089, or 23436 can combinatorial libraries of mutants, e.g., truncation mutants, be increased through use of a heterologous signal sequence. of a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 Fusion proteins can include all or a part of a serum protein for agonist or antagonist activity. protein, e.g., an IgG constant region, or human serum Libraries of fragments e.g., N terminal, C terminal, or albumin. internal fragments, of a 53070, 15985, 26583, 21953, The 53070, 15985, 26583, 21953, m32404, 14089, or m32404, 14089, or 23436 protein coding sequence can be 23436 fusion proteins of the invention can be incorporated used to generate a variegated population of fragments for into pharmaceutical compositions and administered to a 10 screening and subsequent selection of variants of a 53070, subject in vivo. The 53070, 15985, 26583, 21953, m32404, 15985, 26583, 21953, m32404, 14089, or 23436 protein. 14089, or 23436 fusion proteins can be used to affect the Variants in which a cysteine residues is added or deleted or bioavailability of a 53070, 15985, 26583, 21953, m32404, in which a residue which is glycosylated is added or deleted 14089, or 23436 substrate. 53070, 15985, 26583, 21953, are particularly preferred. m32404, 14089, or 23436 fusion proteins may be useful 15 Methods for Screening gene products of combinatorial therapeutically for the treatment of disorders caused by, for libraries made by point mutations or truncation, and for example, (i) aberrant modification or mutation of a gene screening cDNA libraries for gene products having a encoding a 53070, 15985, 26583,21953, m32404, 14089, or selected property are known in the art. Such methods are 23436 protein; (ii) mis-regulation of the 53070, 15985, adaptable for rapid screening of the gene libraries generated 26583, 21953, m32404, 14089, or 23436 gene; and (iii) by combinatorial mutagenesis of 53070, 15985, 26583, aberrant post-translational modification of a 53070, 15985, 21953, m32404, 14089, or 23436 proteins. Recursive 26583, 21953, m32404, 14089, or 23436 protein. ensemble mutagenesis (REM), a new technique which Moreover, the 53070-, 15985- 26583-, 21953-, m32404-, enhances the frequency of functional mutants in the librar 14089-, or 23436-fusion proteins of the invention can be ies, can be used in combination with the screening assays to used as immunogens to produce anti-53070, -15985, -26583, 25 identify 53070, 15985, 26583, 21953, m32404, 14089, or -21953, -m32404, -14089, or -23436 antibodies in a subject, 23436 variants (Arkin and Yourvan (1992) Proc. Natl. Acad. to purify 53070, 15985, 26583, 21953, m32404, 14089, or Sci. USA 89:7811-7815; Delgrave et al. (1993) Protein 23436 ligands and in screening assays to identify molecules Engineering 6:327-331). which inhibit the interaction of 53070, 15985, 26583,21953, Cell-based assays can be exploited to analyze a variegated m32404, 14089, or 23436 with a 53070, 15985, 26583, 30 53070, 15985, 26583, 21953, m32404, 14089, or 23436 21953, m32404, 14089, or 23436 substrate. library. For example, a library of expression vectors can be Expression vectors are commercially available that transfected into a cell line, e.g., a cell line, which ordinarily already encode a fusion moiety (e.g., a GST polypeptide). A responds to 53070, 15985, 26583, 21953, m32404, 14089, 53070-, 15985- 26583-, 21953-, m32404-, 14089-, or or 23436 in a substrate-dependent manner. The transfected 23436-encoding nucleic acid can be cloned into Such an 35 cells are then contacted with 53070, 15985, 26583, 21953, expression vector Such that the fusion moiety is linked m32404, 14089, or 23436 and the effect of the expression of in-frame to the 53070, 15985, 26583, 21953, m32404, the mutant on signaling by the 53070, 15985, 26583,21953, 14089, or 23436 protein. m32404, 14089, or 23436 substrate can be detected, e.g., by measuring the phosphorylation of a Substrate, by measuring Variants of 53070, 15985, 26583, 21953, m32404, 14089, or 40 protein kinase activity and/or microtubule binding, by mea 23436 Proteins Suring phosphorylation of serine or threonine residues, by In another aspect, the invention also features a variant of measuring prolyl oligopeptidase as described below, by a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 measuring m32404 protease activity, by measuring 14089 polypeptide, e.g., which functions as an agonist (mimetics) protease activity, or by measuring de-ubiquitinating activity, or as an antagonist. Variants of the 53070, 15985, 26583, 45 respectively. Plasmid DNA can then be recovered from the 21953, m32404, 14089, or 23436 proteins can be generated cells which score for inhibition, or alternatively, potentiation by mutagenesis, e.g., discrete point mutation, the insertion of signaling by the 53070, 15985, 26583, 21953, m32404, or deletion of sequences or the truncation of a 53070, 15985, 14089, or 23436 substrate, and the individual clones further 26583,21953, m32404, 14089, or 23436 protein. An agonist characterized. of the 53070, 15985, 26583, 21953, m32404, 14089, or 50 In another aspect, the invention features a method of 23436 proteins can retain Substantially the same, or a Subset, making a 53070, 15985, 26583, 21953, m32404, 14089, or of the biological activities of the naturally occurring form of 23436 polypeptide, e.g., a peptide having a non-wild type a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 activity, e.g., an antagonist, agonist, or Super agonist of a protein. An antagonist of a 53070, 15985, 26583, 21953, naturally occurring 53070, 15985, 26583, 21953, m32404, m32404, 14089, or 23436 protein can inhibit one or more of 55 14089, or 23436 polypeptide, e.g., a naturally occurring the activities of the naturally occurring form of the 53070, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 15985, 26583, 21953, m32404, 14089, or 23436 protein by, polypeptide. The method includes: altering the sequence of for example, competitively modulating a 53070-, 15985 a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 26583-, 21953-, m32404-, 14089-, or 23436-mediated activ polypeptide, e.g., altering the sequence, e.g., by Substitution ity of a 53070, 15985, 26583, 21953, m32404, 14089, or 60 or deletion of one or more residues of a non-conserved 23436 protein. Thus, specific biological effects can be elic region, a domain or residue disclosed herein, and testing the ited by treatment with a variant of limited function. Prefer altered polypeptide for the desired activity. ably, treatment of a subject with a variant having a Subset of In another aspect, the invention features a method of the biological activities of the naturally occurring form of making a fragment or analog of a 53070, 15985, 26583, the protein has fewer side effects in a subject relative to 65 21953, m32404, 14089, or 23436 polypeptide a biological treatment with the naturally occurring form of the 53070, activity of a naturally occurring 53070, 15985, 26583, 15985, 26583, 21953, m32404, 14089, or 23436 protein. 21953, m32404, 14089, or 23436 polypeptide. The method US 7,282,360 B2 91 92 includes: altering the sequence, e.g., by Substitution or The term “antigen-binding fragment of an antibody (or deletion of one or more residues, of a 53070, 15985, 26583, simply “antibody portion,” or “fragment'), as used herein, 21953, m32404, 14089, or 23436 polypeptide, e.g., altering refers to one or more fragments of a full-length antibody that the sequence of a non-conserved region, or a domain or retain the ability to specifically bind to the antigen, e.g., residue described herein, and testing the altered polypeptide 5 53070, 15985, 26583, 21953, m32404, 14089, or 23436 for the desired activity. polypeptide or fragment thereof. Examples of antigen-bind ing fragments of the anti-53070, -15985, -265.83, -21953, Anti-53070, -15985, -26583, -21953, -m32404, -14089, or -m32404, -14089, or -23436 antibody include, but are not -23436 Antibodies limited to: (i) a Fab fragment, a monovalent fragment In another aspect, the invention provides an anti-53070, 10 consisting of the VL, VH, CL and CH1 domains; (ii) a -15985, -26583, -21953, -m32404, -14089, or -23436 anti F(ab'), fragment, a bivalent fragment comprising two Fab body, or a fragment thereof (e.g., an antigen-binding frag fragments linked by a disulfide bridge at the hinge region; ment thereof). The term “antibody' as used herein refers to (iii) a Fd fragment consisting of the VH and CH1 domains: an immunoglobulin molecule or immunologically active (iv) a Fv fragment consisting of the VL and VH domains of portion thereof, i.e., an antigen-binding portion. As used 15 a single arm of an antibody, (v) a dAb fragment (Ward et al., herein, the term “antibody” refers to a protein comprising at (1989) Nature 341:544-546), which consists of a VH least one, and preferably two, heavy (H) chain variable domain; and (vi) an isolated complementarity determining regions (abbreviated herein as VH), and at least one and region (CDR). Furthermore, although the two domains of preferably two light (L) chain variable regions (abbreviated the Fv fragment, VL and VH, are coded for by separate herein as VL). The VH and VL regions can be further genes, they can be joined, using recombinant methods, by a subdivided into regions of hypervariability, termed “comple synthetic linker that enables them to be made as a single mentarity determining regions” (“CDR), interspersed with protein chain in which the VL and VH regions pair to form regions that are more conserved, termed “framework monovalent molecules (known as single chain FV (ScPV); see regions’ (FR). The extent of the framework region and e.g., Bird et al. (1988) Science 242:423-426; and Huston et CDR's has been precisely defined (see, Kabat, E. A., et al. 25 al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883). Such (1991) Sequences of Proteins of Immunological Interest, single chain antibodies are also encompassed within the Fifth Edition, U.S. Department of Health and Human Ser term “antigen-binding fragment of an antibody. These vices, NIH Publication No. 91-3242, and Chothia, C. et al. antibody fragments are obtained using conventional tech (1987) J. Mol. Biol. 196:901-917, which are incorporated niques known to those with skill in the art, and the fragments herein by reference). Each VH and VL is composed of three 30 are screened for utility in the same manner as are intact CDR's and four FRs, arranged from amino-terminus to antibodies. carboxy-terminus in the following order: FR1, CDR1, FR2, The anti-53070, -15985, -26583, -21953, -m32404, CDR2, FR3, CDR3, FR4. -14089, or -23436 antibody can be a polyclonal or a mono The anti-53070, -15985, -26583, -21953, -m32404, clonal antibody. In other embodiments, the antibody can be -14089, or -23436 antibody can further include a heavy and 35 recombinantly produced, e.g., produced by phage display or light chain constant region, to thereby form a heavy and light by combinatorial methods. immunoglobulin chain, respectively. In one embodiment, Phage display and combinatorial methods for generating the antibody is a tetramer of two heavy immunoglobulin anti-53070, -15985, -26583, -21953, -m32404, -14089, or chains and two light immunoglobulin chains, wherein the -23436 antibodies are known in the art (as described in, e.g., heavy and light immunoglobulin chains are inter-connected 40 Ladner et al. U.S. Pat. No. 5.223,409: Kang et al. Interna by, e.g., disulfide bonds. The heavy chain constant region is tional Publication No. WO92/186 19: Dower et al. Interna comprised of three domains, CH1, CH2 and CH3. The light tional Publication No. WO 91/1.7271: Winter et al. Interna chain constant region is comprised of one domain, CL. The tional Publication WO 92/20791; Markland et al. variable region of the heavy and light chains contains a International Publication No. WO92/15679: Breitling et al. binding domain that interacts with an antigen. The constant 45 International Publication WO 93/01288; McCafferty et al. regions of the antibodies typically mediate the binding of the International Publication No. WO92/01047: Garrard et al. antibody to host tissues or factors, including various cells of International Publication No. WO92/09690; Ladner et al. the (e.g., effector cells) and the first com International Publication No. WO 90/02809; Fuchs et al. ponent (Cld) of the classical . (1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum As used herein, the term “immunoglobulin’ refers to a 50 Antibod Hybridomas 3:81-85; Huse et al. (1989) Science protein consisting of one or more polypeptides Substantially 246:1275-1281; Griffiths et al. (1993) EMBOJ 12:725-734; encoded by immunoglobulin genes. The recognized human Hawkins et al. (1992) J Mol Biol 226:889-896; Clackson et immunoglobulin genes include the kappa, lambda, alpha al. (1991) Nature 352:624-628; Gram et al. (1992) PNAS (IgA1 and IgA2), gamma (IgG1, IgG2, IgG3, IgG4), delta, 89:3576-3580; Garrad et al. (1991) Bio/Technology 9:1373 55 1377; Hoogenboom et al. (1991) Nuc Acid Res 19:4133 epsilon and mu constant region genes, as well as the myriad 4137; and Barbas et al. (1991) PNAS 88:7978-7982, the immunoglobulin variable region genes. Full-length immu contents of all of which are incorporated by reference noglobulin “light chains” (about 25 KDa or 214 amino herein). acids) are encoded by a variable region gene at the NH2 In one embodiment, the anti-53070, -15985, -265.83, terminus (about 110 amino acids) and a kappa or lambda 60 -21953, -m32404, -14089, or -23436 antibody is a fully constant region gene at the human antibody (e.g., an antibody made in a mouse which COOH--terminus. Full-length immunoglobulin “heavy has been genetically engineered to produce an antibody from chains” (about 50 KDa or 446 a human immunoglobulin sequence), or a non-human anti amino acids), are similarly encoded by a variable region body, e.g., a rodent (mouse or rat), goat, primate (e.g., gene (about 116 amino acids) and one of the other afore 65 monkey), camel antibody. Preferably, the non-human anti mentioned constant region genes, e.g., gamma (encoding body is a rodent (mouse or rat antibody). Method of pro about 330 amino acids). ducing rodent antibodies are known in the art. US 7,282,360 B2 93 94 Human monoclonal antibodies can be generated using each position in the consensus sequence is occupied by the transgenic mice carrying the human immunoglobulin genes amino acid occurring most frequently at that position in the rather than the mouse system. Splenocytes from these trans family. If two amino acids occur equally frequently, either genic mice immunized with the antigen of interest are used can be included in the consensus sequence. A "consensus to produce hybridomas that secrete human mAbs with 5 framework” refers to the framework region in the consensus specific affinities for epitopes from a human protein (see, immunoglobulin sequence. e.g., Wood et al. International Application WO 91/00906, An antibody can be humanized by methods known in the Kucherlapati et al. PCT publication WO91/10741; Lonberg art. Humanized antibodies can be generated by replacing et al. International Application WO 92/03918; Kay et al. sequences of the Fv variable region which are not directly International Application 92/03917: Lonberg, N. et al. 1994 10 involved in antigen binding with equivalent sequences from Nature 368:856-859; Green, L. L. et al. 1994 Nature Genet. human Fv Variable regions. General methods for generating 7:13-21; Morrison, S. L. et al. 1994 Proc. Natl. Acad. Sci. humanized antibodies are provided by Morrison, S. L., USA 81:6851-6855; Bruggeman et al. 1993 Year Immunol 1985, Science 229:1202-1207, by Oi et al., 1986, Bio 7:33-40; Tuaillon et al. 1993 PNAS 90:3720-3724; Brugge Techniques 4:214, and by Queen et al. U.S. Pat. Nos. man et al. 1991 EurJ Immunol 21:1323-1326). 15 5,585,089, 5,693,761 and 5,693,762, the contents of all of An anti-53070, -15985, -26583, -21953, -m32404, which are hereby incorporated by reference. Those methods -14089, or -23436 antibody can be one in which the variable include isolating, manipulating, and expressing the nucleic region, or a portion thereof, e.g., the CDR's, are generated acid sequences that encode all or part of immunoglobulin FV in a non-human organism, e.g., a rat or mouse. Chimeric, variable regions from at least one of a heavy or light chain. CDR-grafted, and humanized antibodies are within the Sources of such nucleic acid are well known to those skilled invention. Antibodies generated in a non-human organism, in the art and, for example, may be obtained from a e.g., a rat or mouse, and then modified, e.g., in the variable hybridoma producing an antibody against a 53070, 15985, framework or constant region, to decrease antigenicity in a 26583, 21953, m32404, 14089, or 23436 polypeptide or human are within the invention. fragment thereof. The recombinant DNA encoding the Chimeric antibodies can be produced by recombinant 25 humanized antibody, or fragment thereof, can then be cloned DNA techniques known in the art. For example, a gene into an appropriate expression vector. encoding the Fc constant region of a murine (or other Humanized or CDR-grafted antibodies can be produced species) monoclonal antibody molecule is digested with by CDR-grafting or CDR substitution, wherein one, two, or restriction enzymes to remove the region encoding the all CDR's of an immunoglobulin chain can be replaced. See murine Fc, and the equivalent portion of a gene encoding a 30 e.g., U.S. Pat. No. 5.225,539; Jones et al. 1986 Nature human Fc constant region is substituted (see Robinson et al., 321:552-525; Verhoeyan et al. 1988 Science 239:1534: International Patent Publication PCT/US86/02269; Akira, et Beidler et al. 1988.J. Immunol. 141:4053-4060: Winter U.S. al., European Patent Application 184, 187; Taniguchi, M., Pat. No. 5.225,539, the contents of all of which are hereby European Patent Application 171.496; Morrison et al., Euro expressly incorporated by reference. Winter describes a pean Patent Application 173,494; Neuberger et al., Interna 35 CDR-grafting method which may be used to prepare the tional Application WO 86/01533: Cabilly et al. U.S. Pat. No. humanized antibodies of the present invention (UK Patent 4,816,567; Cabilly et al., European Patent Application 125, Application GB 2188638A, filed on Mar. 26, 1987: Winter 023: Better et al. (1988 Science 240: 1041-1043); Liu et al. U.S. Pat. No. 5.225,539), the contents of which is expressly (1987) PNAS 84:3439-3443; Liu et al., 1987, J. Immunol. incorporated by reference. 139:3521-3526; Sun et al. (1987) PNAS 84:214-218; Nish 40 Also within the scope of the invention are humanized imura et al., 1987, Canc. Res. 47: 999-1005; Wood et al. antibodies in which specific amino acids have been Substi (1985) Nature 314:446-449; and Shaw et al., 1988, J. Natl tuted, deleted or added. Preferred humanized antibodies Cancer Inst. 80:1553-1559). have amino acid Substitutions in the framework region, Such A humanized or CDR-grafted antibody will have at least as to improve binding to the antigen. For example, a one or two but generally all three recipient CDR's (of heavy 45 humanized antibody will have framework residues identical and or light immuoglobulin chains) replaced with a donor to the donor framework residue or to another amino acid CDR. The antibody may be replaced with at least a portion other than the recipient framework residue. To generate Such of a non-human CDR or only some of the CDR's may be antibodies, a selected, Small number of acceptor framework replaced with non-human CDR's. It is only necessary to residues of the humanized immunoglobulin chain can be replace the number of CDR's required for binding of the 50 replaced by the corresponding donoramino acids. Preferred humanized antibody to a 53070, 15985, 26583, 21953, locations of the Substitutions include amino acid residues m32404, 14089, or 23436 or a fragment thereof. Preferably, adjacent to the CDR, or which are capable of interacting the donor will be a rodent antibody, e.g., a rat or mouse with a CDR (see e.g., U.S. Pat. No. 5,585,089). Criteria for antibody, and the recipient will be a human framework or a selecting amino acids from the donor are described in U.S. human consensus framework. Typically, the immunoglobu 55 Pat. No. 5,585,089, e.g., columns 12-16 of U.S. Pat. No. lin providing the CDR's is called the “donor and the 5,585,089, the e.g., columns 12-16 of U.S. Pat. No. 5,585, immunoglobulin providing the framework is called the 089, the contents of which are hereby incorporated by “acceptor.” In one embodiment, the donor immunoglobulin reference. Other techniques for humanizing antibodies are is a non-human (e.g., rodent). The acceptor framework is a described in Padlanet al. EP 519596 A1, published on Dec. naturally-occurring (e.g., a human) framework or a consen 60 23, 1992. SuS framework, or a sequence about 85% or higher, prefer In preferred embodiments an antibody can be made by ably 90%, 95%, 99% or higher identical thereto. immunizing with purified 53070, 15985, 26583, 21953, As used herein, the term "consensus sequence” refers to m32404, 14089, or 23436 antigen, or a fragment thereof, the sequence formed from the most frequently occurring e.g., a fragment described herein, membrane associated amino acids (or nucleotides) in a family of related sequences 65 antigen, tissue, e.g., crude tissue preparations, whole cells, (See e.g., Winnaker, From Genes to Clones (Verlagsgesell preferably living cells, lysed cells, or cell fractions, e.g., schaft, Weinheim, Germany 1987). In a family of proteins, cytosolic fractions. US 7,282,360 B2 95 96 A full-length 53070, 15985, 26583, 21953, m32404, to 690, 690 to 710, or 710 to 744 can be used to make an 14089, or 23436 protein or, antigenic peptide fragment of antibody against the prolyl oligopeptidase domain of the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 can 21953 protein. be used as an immunogen or can be used to identify Fragments of m32404 which include residues about 30 to anti-53070, -15985, -26583, -21953, -m32404, -14089, or 5 60 of SEQID NO:25 can be used to make, e.g., can be used -23436 antibodies made with other immunogens, e.g., cells, as immunogens or used to characterize the specificity of an membrane preparations, and the like. The antigenic peptide antibody, antibodies against hydrophilic regions of the of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 m32404 protein. Similarly, a fragment of m32404 which should include at least 8 amino acid residues of the amino includes residues about 320 to 340, or about 450 to 470 of acid sequence shown in SEQID NO: 2, 8, 15, 20, 25, 34, or 10 SEQ ID NO:25 can be used to make an antibody against a 41 and encompass an epitope of 53070, 15985, 26583, hydrophobic region of the m32404 protein; a fragment of 21953, m32404, 14089, or 23436. Preferably, the antigenic m32404 which include residues about 45 to 268, or about peptide includes at least 10 amino acid residues, more 311 to 520 of SEQ ID NO:25 (or a fragment thereof, e.g., preferably at least 15 amino acid residues, even more residues 45 to 100, 73 to 78, 100 to 150, 150 to 200, 200 to preferably at least 20 amino acid residues, and most pref 15 250, 218 to 229, 250 to 268, 311 to 360,337 to 342, 360 to erably at least 30 amino acid residues. 400, 400 to 450, 450 to 500, 500 to 520 of SEQID NO:25) Fragments of 53070, 15985, 26583, 21953, m32404, can be used to make an antibody against the trypsin region 14089, or 23436 can be used as immunogens or to charac of the m32404 protein. terize the specificity of an antibody. Fragments of 14089 that include residues about 71 to 79, Fragments of 53070 which include amino acid residues about 161 to 171, or about 185 to 192 of SEQID NO:34 can about 103 to 119, about 226 to 247, or about 301 to 329 of be used to make, e.g., used as immunogens or used to SEQID NO:2, for example, can be used to make antibodies characterize the specificity of an antibody, antibodies against against hydrophilic regions of the 53070 protein. Similarly, hydrophilic regions of the 14089 protein. Similarly, frag fragments of 53070 which include residues about 63 to 73, ments of 14089 that include residues about 35 to 55, 58 to about 86 to 102, or about 199 to 216 of SEQ ID NO:2 can 25 70, or 175 to 184 of SEQID NO:34 can be used to make an be used to make an antibody against a hydrophobic region antibody against a hydrophobic region of the 14089 protein; of the 53070 protein; fragments of 53070 which include a fragment of 14089 that includes residues about 41-234 of residues about 12 to 45, about 125 to 150, or about 150 to SEQID NO:34, or small fragments, e.g., 24 to 44, 74 to 94. 175 of SEQ ID NO:2 can be used to make an antibody or 170 to 190 of SEQ ID NO:34 can be used to make an against the protein kinase domain of the 53070 protein; and 30 antibody against the trypsin region of the 14089 protein. In fragments of 53070 which include amino acid residues about a preferred embodiment, the antibody can bind to the 1 to 11 and 273 to 367 of SEQID NO:2 can be used to make extracellular portion of the 14089 protein, e.g., it can bind to antibodies against a non-kinase domain region of the 53070 a whole cell which expresses the 14089 protein. protein. Fragments of 23436 which include residues about 76 to Fragments of 15985 which include residues 8 to 20, from 35 87, from about 138 to 143, and from about 458 to 478 of about 592 to 600, or from about 652 to 672 can be used to SEQ ID NO:41 can be used to make, e.g., used as immu make, e.g., used as immunogens or used to characterize the nogens or used to characterize the specificity of an antibody, specificity of an antibody, antibodies against hydrophilic antibodies against hydrophilic regions of the 23436 protein. regions of the 15985 protein. Similarly, fragments of 15985 Similarly, fragments of 23436 which include residues about which include residues 83 to 91, from about 465 to 472, or 40 103 to 114, from about 285 to 297, and from about 413 to from about 568 to 585 of SEQID NO:8 can be used to make 420 of SEQ ID NO:41 can be used to make an antibody an antibody against a hydrophobic region of the 15985 against a hydrophobic region of the 23436 protein; frag protein; a fragment of 15985 which include residues 394 to ments of 23436 which include residues about 89 to 120,332 651 can be used to make an antibody against the protein to 420, or 89 to 420 of SEQ ID NO:41 can be used to make kinase region of the 15985 protein; and a fragment of 15985 45 an antibody against the ubiquitin carboxy-terminal hydro which includes residues 67 to 158 or residues 192 to 280 can lase region of the 23436 protein. be used to make an antibody against a doublecortin repeat of Antibodies reactive with, or specific for, any of these the 15985 protein. regions, or other regions or domains described herein are Fragments of 265.83 which include residues about 60-70 provided. can be used to make, e.g., used as immunogens or used to 50 Antibodies which bind only native 53070, 15985, 26583, characterize the specificity of an antibody, antibodies against 21953, m32404, 14089, or 23436 protein, only denatured or hydrophilic regions of the 26583 protein. Similarly, frag otherwise non-native 53070, 15985, 26583, 21953, m32404, ments of 26583 which include residues 262-279 can be used 14089, or 23436 protein, or which bind both, are with in the to make an antibody against a hydrophobic region of the invention. Antibodies with linear or conformational epitopes 26583 protein; a fragment of 265.83 which includes residues 55 are within the invention. Conformational epitopes can some about 172 to 461 or 99 to 523 can be used to make an times be identified by identifying antibodies which bind to antibody against the phosphatase region of the 26583 pro native but not denatured 53070, 15985, 26583, 21953, tein. m32404, 14089, or 23436 protein. Hydrophilic fragments of 21953, e.g., those which include Preferred epitopes encompassed by the antigenic peptide residues 20 to 40, 65 to 80, or 780 to 790, can be used to 60 are regions of 53070, 15985, 26583,21953, m32404, 14089, make, e.g., used as immunogens or used to characterize the or 23436 are located on the surface of the protein, e.g., specificity of an antibody, antibodies against hydrophilic hydrophilic regions, as well as regions with high antigenic regions of the 21953 protein. Similarly, a hydrophobic ity. For example, an Emini surface probability analysis of the fragment of 21953, e.g. which include residues 250 to 270, human 53070, 15985, 26583, 21953, m32404, 14089, or 370 to 390, or 681 to 695, can be used to make an antibody 65 23436 protein sequence can be used to indicate the regions against a hydrophobic region of the 21953 protein; a frag that have a particularly high probability of being localized to ment of 21953 which include residues about 672 to 744, 672 the surface of the 53070, 15985, 26583, 21953, m32404, US 7,282,360 B2 97 98 14089, or 23436 protein and are thus likely to constitute 21953, m32404, 14089, or 23436 protein (e.g., in a cellular Surface residues useful for targeting antibody production. lysate or cell Supernatant) in order to evaluate the abundance In preferred embodiments antibodies can bind one or and pattern of expression of the protein. Anti-53070, -15985, more of purified antigen, membrane associated antigen, -265.83, -21953, -m32404, -14089, or -23436 antibodies can tissue, e.g., tissue sections, whole cells, preferably living be used diagnostically to monitor protein levels in tissue as cells, lysed cells, cell fractions, e.g., cytosolic fractions. part of a clinical testing procedure, e.g., to determine the The anti-53070, -15985, -26583, -21953, -m32404, efficacy of a given treatment regimen. Detection can be -14089, or -23436 antibody can be a single chain antibody. facilitated by coupling (i.e., physically linking) the antibody A single-chain antibody (ScPV) may be engineered (see, for to a detectable Substance (i.e., antibody labelling). Examples example, Colcher, D. et al. (1999) Ann N Y Acad Sci 10 of detectable Substances include various enzymes, prosthetic 880:263-80; and Reiter, Y. (1996) Clin Cancer Res 2:245 groups, fluorescent materials, luminescent materials, biolu 52). The single chain antibody can be dimerized or multi minescent materials, and radioactive materials. Examples of merized to generate multivalent antibodies having specifici Suitable enzymes include horseradish peroxidase, alkaline ties for different epitopes of the same target 53070, 15985, phosphatase, B-galactosidase, or acetylcholinesterase; 26583, 21953, m32404, 14089, or 23436 protein. 15 examples of Suitable prosthetic group complexes include In a preferred embodiment, the antibody has effector streptavidin/biotin and avidin/biotin; examples of suitable function, and/or can fix complement. In other embodiments, fluorescent materials include umbelliferone, fluorescein, the antibody does not, recruit effector cells, or fix comple fluorescein isothiocyanate, rhodamine, dichlorotriaziny ment. lamine fluorescein, dansyl chloride or phycoerythrin; an In a preferred embodiment, the antibody has reduced or example of a luminescent material includes luminol; no ability to bind an Fc receptor. For example, it is a isotype examples of bioluminescent materials include luciferase, or subtype, fragment or other mutant, which does not luciferin, and aequorin, and examples of suitable radioactive Support binding to an Fc receptor, e.g., it has a mutagenized material include l’I, I, S or H. or deleted Fc receptor binding region. The invention also includes a nucleic acid which encodes In a preferred embodiment, an anti-53070 antibody alters 25 an anti-53070, -15985, -26583, -21953, -m32404, -14089, or (e.g., increases or decreases) the kinase activity of a 53070 -23436 antibody, e.g., an anti-53070, -15985, -26583, polypeptide. For example, the antibody can bind at or in -21953, -m32404, -14089, or -23436 antibody described proximity to the active site, e.g., to an epitope that includes herein. Also included are vectors which include the nucleic a residue located from about 120 to 180 of SEQ ID NO:2. acid and cells transformed with the nucleic acid, particularly In a preferred embodiment, an anti-15985 antibody alters 30 cells which are useful for producing an antibody, e.g., (e.g., increases or decreases) an activity of a 15985 polypep mammalian cells, e.g. CHO or lymphatic cells. tide, e.g. phosphorylation of a protein substrate. The invention also includes cell lines, e.g., hybridomas, In a preferred embodiment, an anti-21953 antibody alters which make an anti-53070, -15985, -265.83, -21953, (e.g., increases or decreases) the prolyl oligopeptidase activ -m32404, -14089, or -23436 antibody, e.g., and antibody ity of a 21953 polypeptide. For example, the antibody can 35 described herein, and method of using said cells to make a specifically bind a residue of the active site of 21953 53070, 15985, 26583, 21953, m32404, 14089, or 23436 polypeptide, e.g., a residue located between about 730 to antibody. 750, 805 to 830, 835 to 860 of SEQID NO:20. The antibody can block the binding of substrate to the 21953 polypeptide. Recombinant Expression Vectors, Host Cells and Geneti In another preferred embodiment, the antibody specifically 40 cally Engineered Cells binds a residue in the 21953 prolyl oligopeptidase domain, In another aspect, the invention includes, vectors, prefer e.g., from about amino acid 672 to 744, or 610 to 883 of SEQ ably expression vectors, containing a nucleic acid encoding ID NO:20, or in the DPP IV N-terminal residue, e.g., a a polypeptide described herein. As used herein, the term residue between about amino acids 88 to 663 of SEQ ID “vector” refers to a nucleic acid molecule capable of trans NO:2O. 45 porting another nucleic acid to which it has been linked and In a preferred embodiment, an anti-m32404 antibody can include a plasmid, cosmid or viral vector. The vector can alters (e.g., increases or decreases) the proteolytic activity of be capable of autonomous replication or it can integrate into an m32404 polypeptide. For example, the antibody can bind a host DNA. Viral vectors include, e.g., replication defective at or in proximity to the active site, e.g., to an epitope that retroviruses, adenoviruses and adeno-associated viruses. includes a residue located from about 73 to 78, 337 to 342, 50 A vector can include a 53070, 15985, 26583, 21953, or 218 to 229 of SEQ ID NO:25. m32404, 14089, or 23436 nucleic acid in a form suitable for In a preferred embodiment, the antibody alters (e.g., expression of the nucleic acid in a host cell. Preferably the increases or decreases) the de-ubiquitinating activity of a recombinant expression vector includes one or more regu 23436 polypeptide. latory sequences operatively linked to the nucleic acid The antibody can be coupled to a toxin, e.g., a polypeptide 55 sequence to be expressed. The term “regulatory sequence' toxin, e.g., ricin or diphtheria toxin or active fragment hereof, includes promoters, enhancers and other expression control or a radioactive nucleus, or imaging agent, e.g. a radioactive, elements (e.g., polyadenylation signals). Regulatory enzymatic, or other, e.g., imaging agent, e.g., a NMR sequences include those which direct constitutive expression contrast agent. Labels which produce detectable radioactive of a nucleotide sequence, as well as tissue-specific regula emissions or fluorescence are preferred. 60 tory and/or inducible sequences. The design of the expres An anti-53070, -15985, -26583, -21953, -m32404, sion vector can depend on Such factors as the choice of the -14089, or -23436 antibody (e.g., monoclonal antibody) can host cell to be transformed, the level of expression of protein be used to isolate 53070, 15985, 26583, 21953, m32404, desired, and the like. The expression vectors of the invention 14089, or 23436 by standard techniques, such as affinity can be introduced into host cells to thereby produce proteins chromatography or immunoprecipitation. Moreover, an anti 65 or polypeptides, including fusion proteins or polypeptides, 53070, -15985, -26583, -21953, -m32404, -14089, or encoded by nucleic acids as described herein (e.g., 53070, -23436 antibody can be used to detect 53070, 15985, 26583, 15985, 26583, 21953, m32404, 14089, or 23436 proteins, US 7,282,360 B2 99 100 mutant forms of 53070, 15985, 26583, 21953, m32404, ments. For example, commonly used promoters are derived 14089, or 23436 proteins, fusion proteins, and the like). from polyoma, Adenovirus 2, cytomegalovirus and Simian The recombinant expression vectors of the invention can Virus 40. be designed for expression of 53070, 15985, 26583, 21953, In another embodiment, the promoter is an inducible m32404, 14089, or 23436 proteins in prokaryotic or eukary promoter, e.g., a promoter regulated by a steroid hormone, otic cells. For example, polypeptides of the invention can be by a polypeptide hormone (e.g., by means of a signal expressed in E. coli, insect cells (e.g., using baculovirus transduction pathway), or by a heterologous polypeptide expression vectors), yeast cells or mammalian cells. Suitable (e.g., the tetracycline-inducible systems, "Tet-On” and "Tet host cells are discussed further in Goeddel, (1990) Gene Off: see, e.g., Clontech Inc., CA, Gossen and Bujard (1992) Expression Technology. Methods in Enzymology 185, Aca 10 Proc. Natl. Acad. Sci. USA 89:5547, and Paillard (1989) demic Press, San Diego, Calif. Alternatively, the recombi Human Gene Therapy 9:983). nant expression vector can be transcribed and translated in In another embodiment, the recombinant mammalian vitro, for example using T7 promoter regulatory sequences expression vector is capable of directing expression of the and T7 polymerase. nucleic acid preferentially in a particular cell type (e.g., Expression of proteins in prokaryotes is most often car 15 tissue-specific regulatory elements are used to express the ried out in E. coli with vectors containing constitutive or nucleic acid). Non-limiting examples of Suitable tissue inducible promoters directing the expression of either fusion specific promoters include the albumin promoter (liver or non-fusion proteins. Fusion vectors add a number of specific; Pinkert et al. (1987) Genes Dev. 1:268-277), lym amino acids to a protein encoded therein, usually to the phoid-specific promoters (Calame and Eaton (1988) Adv. amino terminus of the recombinant protein. Such fusion Immunol. 43:235-275), in particular promoters of T cell vectors typically serve three purposes: 1) to increase expres receptors (Winoto and Baltimore (1989) EMBO J. 8:729 sion of recombinant protein; 2) to increase the solubility of 733) and immunoglobulins (Banerji et al. (1983) Cell the recombinant protein; and 3) to aid in the purification of 33:729-740; Queen and Baltimore (1983) Cell 33:741-748), the recombinant protein by acting as a ligand in affinity neuron-specific promoters (e.g., the neurofilament promoter; purification. Often, a proteolytic cleavage site is introduced 25 Byrne and Ruddle (1989) Proc. Natl. Acad. Sci. USA at the junction of the fusion moiety and the recombinant 86:5473-5477), -specific promoters (Edlund et al. protein to enable separation of the recombinant protein from (1985) Science 230:912-916), and mammary gland-specific the fusion moiety Subsequent to purification of the fusion promoters (e.g., milk whey promoter: U.S. Pat. No. 4,873, protein. Such enzymes, and their cognate recognition 316 and European Application Publication No. 264,166). sequences, include Factor Xa, thrombin and enterokinase. 30 Developmentally-regulated promoters are also encom Typical fusion expression vectors include pGEX (Pharmacia passed, for example, the murine hoX promoters (Kessel and Biotech Inc.; Smith, D. B. and Johnson, K. S. (1988) Gene Gruss (1990) Science 249:374-379) and the C-fetoprotein 67:31-40), pMAL (New England Biolabs, Beverly, Mass.) promoter (Campes and Tilghman (1989) Genes Dev. 3:537 and pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glu 546). tathione S- (GST), maltose E binding protein, or 35 The invention further provides a recombinant expression protein A, respectively, to the target recombinant protein. vector comprising a DNA molecule of the invention cloned Purified fusion proteins can be used in 53070, 15985, into the expression vector in an antisense orientation. Regu 26583, 21953, m32404, 14089, or 23436 activity assays, latory sequences (e.g., viral promoters and/or enhancers) (e.g., direct assays or competitive assays described in detail operatively linked to a nucleic acid cloned in the antisense below), or to generate antibodies specific for 53070, 15985, 40 orientation can be chosen which direct the constitutive, 26583, 21953, m32404, 14089, or 23436 proteins. In a tissue specific or cell type specific expression of antisense preferred embodiment, a fusion protein expressed in a RNA in a variety of cell types. The antisense expression retroviral expression vector of the present invention can be vector can be in the form of a recombinant plasmid, used to infect bone marrow cells which are subsequently phagemid or attenuated virus. transplanted into irradiated recipients. The pathology of the 45 Another aspect the invention provides a host cell which includes a nucleic acid molecule described herein, e.g., a subject recipient is then examined after sufficient time has 53070, 15985, 26583, 21953, m32404, 14089, or 23436 passed (e.g., six weeks). nucleic acid molecule within a recombinant expression To maximize recombinant protein expression in E. coli is vector or a 53070, 15985, 26583, 21953, m32404, 14089, or to express the protein in a host bacteria with an impaired 50 23436 nucleic acid molecule containing sequences which capacity to proteolytically cleave the recombinant protein allow it to homologously recombine into a specific site of the (Gottesman, S., (1990) Gene Expression Technology. Meth host cells genome. The terms “host cell' and “recombinant ods in Enzymology 185, Academic Press, San Diego, Calif. host cell are used interchangeably herein. Such terms refer 119-128). Another strategy is to alter the nucleic acid not only to the particular subject cell but to the progeny or sequence of the nucleic acid to be inserted into an expression 55 potential progeny of Such a cell. Because certain modifica vector so that the individual codons for each amino acid are tions may occur in Succeeding generations due to either those preferentially utilized in E. coli (Wada et al., (1992) mutation or environmental influences. Such progeny may Nucleic Acids Res. 20:2111-2118). Such alteration of nucleic not, in fact, be identical to the parent cell, but are still acid sequences of the invention can be carried out by included within the scope of the term as used herein. standard DNA synthesis techniques. 60 A host cell can be any prokaryotic or eukaryotic cell. For The 53070, 15985, 26583, 21953, m32404, 14089, or example, a 53070, 15985, 26583,21953, m32404, 14089, or 23436 expression vector can be a yeast expression vector, a 23436 protein can be expressed in bacterial cells (such as E. vector for expression in insect cells, e.g., a baculovirus coli), insect cells, yeast or mammalian cells (such as Chinese expression vector or a vector Suitable for expression in hamster ovary cells (CHO) or COS cells (African green mammalian cells. 65 monkey kidney cells CV-1 origin SV40 cells: Gluzman When used in mammalian cells, the expression vector's (1981) CellI23:175-182)). Other suitable host cells are control functions can be provided by viral regulatory ele known to those skilled in the art. US 7,282,360 B2 101 102 Vector DNA can be introduced into host cells via con can be used to insert the heterologous DNA as described in, ventional transformation or transfection techniques. As used e.g., Chappel, U.S. Pat. No. 5.272,071; WO 91/06667, herein, the terms “transformation' and “transfection' are published in May 16, 1991. intended to refer to a variety of art-recognized techniques for In a preferred embodiment, recombinant cells described introducing foreign nucleic acid (e.g., DNA) into a host cell, 5 herein can be used for replacement therapy in a Subject. For including calcium phosphate or calcium chloride co-precipi example, a nucleic acid encoding a 53070, 15985, 26583, tation, DEAE-dextran-mediated transfection, lipofection, or 21953, m32404, 14089, or 23436 polypeptide operably electroporation. linked to an inducible promoter (e.g., a steroid hormone A host cell of the invention can be used to produce (i.e., receptor-regulated promoter) is introduced into a human or express) a 53070, 15985, 26583, 21953, m32404, 14089, or 10 nonhuman, e.g., mammalian, e.g., porcine recombinant cell. 23436 protein. Accordingly, the invention further provides The cell is cultivated and encapsulated in a biocompatible methods for producing a 53070, 15985, 26583, 21953, material. Such as poly-lysine alginate, and Subsequently m32404, 14089, or 23436 protein using the host cells of the implanted into the subject. See, e.g., Lanza (1996) Nat. invention. In one embodiment, the method includes cultur 15 Biotechnol. 14:1107; Joki et al. (2001) Nat. Biotechnol. ing the host cell of the invention (into which a recombinant 19:35; and U.S. Pat. No. 5,876,742. Production of 53070, expression vector encoding a 53070, 15985, 26583, 21953, 15985,265.83,21953, m32404, 14089, or 23436 polypeptide m32404, 14089, or 23436 protein has been introduced) in a can be regulated in the Subject by administering an agent suitable medium such that a 53070, 15985, 26583, 21953, (e.g., a steroid hormone) to the Subject. In another preferred m32404, 14089, or 23436 protein is produced. In another embodiment, the implanted recombinant cells express and embodiment, the method further includes isolating a 53070, secrete an antibody specific for a 53070, 15985, 26583, 15985, 26583, 21953, m32404, 14089, or 23436 protein 21953, m32404, 14089, or 23436 polypeptide. The antibody from the medium or the host cell. can be any antibody or any antibody derivative described In another aspect, the invention features, a cell or purified herein. preparation of cells which include a 53070, 15985, 26583, 25 21953, m32404, 14089, or 23436 transgene, or which oth 53070, 15985, 26583, 21953, m32404, 14089, or 23436 erwise misexpress 53070, 15985, 26583, 21953, m32404, Transgenic Animals 14089, or 23436. The cell preparation can consist of human The invention provides non-human transgenic animals. or non-human cells, e.g., rodent cells, e.g., mouse or rat Such animals are useful for studying the function and/or cells, rabbit cells, or pig cells. In preferred embodiments, the 30 activity of a 53070, 15985, 26583, 21953, m32404, 14089, cell or cells include a 53070, 15985, 26583, 21953, m32404, or 23436 protein and for identifying and/or evaluating 14089, or 23436 transgene, e.g., a heterologous form of a modulators of 53070, 15985, 26583, 21953, m32404, 53070, 15985, 26583, 21953, m32404, 14089, or 23436, 14089, or 23436 activity. As used herein, a “transgenic e.g., a gene derived from humans (in the case of a non animal' is a non-human animal, preferably a mammal, more human cell). The 53070, 15985, 26583, 21953, m32404, 35 preferably a rodent such as a rat or mouse, in which one or 14089, or 23436 transgene can be misexpressed, e.g., over more of the cells of the animal includes a transgene. Other expressed or underexpressed. In other preferred embodi examples of transgenic animals include non-human pri ments, the cell or cells include a gene that mis-expresses an mates, sheep, dogs, cows, goats, chickens, amphibians, and endogenous 53070, 15985, 26583, 21953, m32404, 14089, the like. A transgene is exogenous DNA or a rearrangement, or 23436, e.g., a gene the expression of which is disrupted, 40 e.g., a deletion of endogenous chromosomal DNA, which e.g., a knockout. Such cells can serve as a model for preferably is integrated into or occurs in the genome of the studying disorders that are related to mutated or mis-ex cells of a transgenic animal. A transgene can direct the pressed 53070, 15985, 26583, 21953, m32404, 14089, or expression of an encoded gene product in one or more cell 23436 alleles or for use in drug screening. types or tissues of the transgenic animal, other transgenes, In another aspect, the invention features, a human cell, 45 e.g., a knockout, reduce expression. Thus, a transgenic e.g., a hematopoietic stem cell, transformed with nucleic animal can be one in which an endogenous 53070, 15985, acid which encodes a subject 53070, 15985, 26583, 21953, 26583, 21953, m32404, 14089, or 23436 gene has been m32404, 14089, or 23436 polypeptide. altered by, e.g., by homologous recombination between the Also provided are cells, preferably human cells, e.g., endogenous gene and an exogenous DNA molecule intro human hematopoietic or fibroblast cells, in which an endog 50 duced into a cell of the animal, e.g., an embryonic cell of the enous 53070, 15985, 26583, 21953, m32404, 14089, or animal, prior to development of the animal. 23436 is under the control of a regulatory sequence that does Intronic sequences and polyadenylation signals can also not normally control the expression of the endogenous be included in the transgene to increase the efficiency of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 expression of the transgene. A tissue-specific regulatory gene. The expression characteristics of an endogenous gene 55 sequence(s) can be operably linked to a transgene of the within a cell, e.g., a cell line or microorganism, can be invention to direct expression of a 53070, 15985, 26583, modified by inserting a heterologous DNA regulatory ele 21953, m32404, 14089, or 23436 protein to particular cells. ment into the genome of the cell such that the inserted A transgenic founder animal can be identified based upon regulatory element is operably linked to the endogenous the presence of a 53070, 15985, 26583, 21953, m32404, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 60 14089, or 23436 transgene in its genome and/or expression gene. For example, an endogenous 53070, 15985, 26583, of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 21953, m32404, 14089, or 23436 gene which is “transcrip mRNA in tissues or cells of the animals. A transgenic tionally silent, e.g., not normally expressed, or expressed founder animal can then be used to breed additional animals only at very low levels, may be activated by inserting a carrying the transgene. Moreover, transgenic animals carry regulatory element which is capable of promoting the 65 ing a transgene encoding a 53070, 15985, 26583, 21953, expression of a normally expressed gene product in that cell. m32404, 14089, or 23436 protein can further be bred to Techniques such as targeted homologous recombinations, other transgenic animals carrying other transgenes. US 7,282,360 B2 103 104 53070, 15985, 26583, 21953, m32404, 14089, or 23436 23436 polypeptide; and evaluating ability of the compound proteins or polypeptides can be expressed in transgenic to interact with, e.g., to bind or form a complex with the animals or plants, e.g., a nucleic acid encoding the protein or subject 53070, 15985, 26583, 21953, m32404, 14089, or polypeptide can be introduced into the genome of an animal. 23436 polypeptide. This method can be performed in vitro, In preferred embodiments the nucleic acid is placed under 5 e.g., in a cell free system, or in Vivo, e.g., in a two-hybrid the control of a tissue specific promoter, e.g., a milk or egg interaction trap assay. This method can be used to identify specific promoter, and recovered from the milk or eggs naturally occurring molecules that interact with Subject produced by the animal. Suitable animals are mice, pigs, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 cows, goats, and sheep. polypeptide. It can also be used to find natural or synthetic The invention also includes a population of cells from a 10 inhibitors of subject 53070, 15985, 26583, 21953, m32404, transgenic animal, as discussed, e.g., below. 14089, or 23436 polypeptide. Screening methods are dis cussed in more detail below. Uses of 53070, 15985, 26583, 21953, m32404, 14089, and The 21953 polypeptide is also an enzyme useful for 23436 processing polypeptide hormone precursors. For example, The nucleic acid molecules, proteins, protein homo- 15 the 21953 polypeptide can be used in a method that includes logues, and antibodies described herein can be used in one a) providing a polypeptide hormone precursor; b) combining or more of the following methods: a) screening assays; b) the polypeptide hormone polypeptide with a 21953 polypep predictive medicine (e.g., diagnostic assays, prognostic tide or active fragment thereof (e.g., in an effective amount) assays, monitoring clinical trials, and pharmacogenetics); to provide a reaction mixture; and c) maintaining the mix and c) methods of treatment (e.g., therapeutic and prophy- 20 ture under conditions such that the polypeptide hormone lactic). precursor is modified to yield the processed polypeptide The isolated nucleic acid molecules of the invention can hormone, e.g., an active form thereof. The method can be used, for example, to express a 53070, 15985, 26583, further include d) separating the processed polypeptide 21953, m32404, 14089, or 23436 protein (e.g., via a recom hormone from the 21953 polypeptide. The polypeptide binant expression vector in a host cell in gene therapy 25 hormone precursor can be obtained from a synthetic process applications), to detect a 53070, 15985, 26583, 21953, or from a producing cell. The method can be used in the m32404, 14089, or 2343.6 mRNA (e.g., in a biological preparation of a pharmaceutical composition that includes sample) or a genetic alteration in a 53070, 15985, 26583, the processed hormone. 21953, m32404, 14089, or 23436 gene, and to modulate 53070, 15985, 26583, 21953, m32404, 14089, or 23436 30 53070, 15985, 26583, 21953, m32404, 14089, and 23436 activity, as described further below. The 53070, 15985, Screening Assays 26583, 21953, m32404, 14089, or 23436 proteins can be The invention provides methods (also referred to herein as used to treat disorders characterized by insufficient or exces 'screening assays”) for identifying modulators, i.e., candi sive production of a 53070, 15985, 26583, 21953, m32404, date or test compounds or agents (e.g., proteins, peptides, 14089, or 23436 substrate or production of 53070, 15985, 35 peptidomimetics, peptoids, Small molecules or other drugs) 26583, 21953, m32404, 14089, or 23436 inhibitors. In which bind to 53070, 15985, 26583, 21953, m32404, 14089, addition, the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 proteins, have a stimulatory or inhibitory effect on, or 23436 proteins can be used to screen for naturally for example, 53070, 15985, 26583, 21953, m32404, 14089, occurring 53070, 15985, 26583, 21953, m32404, 14089, or or 23436 expression or 53070, 15985, 26583, 21953, 23436 Substrates, to Screen for drugs or compounds which 40 m32404, 14089, or 23436 activity, or have a stimulatory or modulate 53070, 15985, 26583, 21953, m32404, 14089, or inhibitory effect on, for example, the expression or activity 23436 activity, as well as to treat disorders characterized by of a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 insufficient or excessive production of 53070, 15985, 26583, Substrate. Compounds thus identified can be used to modu 21953, m32404, 14089, or 23436 protein or production of late the activity of target gene products (e.g., 53070, 15985, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 45 26583, 21953, m32404, 14089, or 23436 genes) in a thera protein forms which have decreased, aberrant or unwanted peutic protocol, to elaborate the biological function of the activity compared to 53070, 15985, 26583, 21953, m32404, target gene product, or to identify compounds that disrupt 14089, or 23436 wild type protein (e.g., a cellular prolif normal target gene interactions. erative and/or differentiative disorder; imbalance of protein In any screening assay, a 26583 polypeptide that may serine/threonine kinase and protein serine/threonine phos- 50 have, e.g., a serine/threonine phosphatase domain, can be phorylase activities, leading to an increase or decrease in used. lipid biosynthesis, Such as cholesterol or cell cycle progres In one embodiment, the invention provides assays for sion and neoplastic transformation; a cancer, e.g., a cancer of screening candidate or test compounds which are Substrates the lung, prostate, breast, ovary, or colon; or an erythroid cell of a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 disorder or a proliferative disorder of erythroid, liver, pros- 55 protein or polypeptide or a biologically active portion tate, or brain cells). Moreover, the anti-53070, -15985, thereof. In another embodiment, the invention provides -265.83, -21953, -m32404, -14089, or -23436 antibodies of assays for screening candidate or test compounds that bind the invention can be used to detect and isolate 53070, 15985, to or modulate an activity of a 53070, 15985, 26583,21953, 26583, 21953, m32404, 14089, or 23436 proteins, regulate m32404, 14089, or 23436 protein or polypeptide or a the bioavailability of 53070, 15985, 26583, 21953, m32404, 60 biologically active portion thereof. 14089, or 23436 proteins, and modulate 53070, 15985, In one embodiment, an activity of a 53070 protein can be 26583, 21953, m32404, 14089, or 23436 activity. assayed directly in vitro by: expressing an affinity tagged A method of evaluating a compound for the ability to 53070 protein in either bacteria or an appropriate mamma interact with, e.g., bind, a subject 53070, 15985, 26583, lian cell line; purifiying the 53070 protein, e.g., by immu 21953, m32404, 14089, or 23436 polypeptide is provided. 65 noprecipitation or in an affinity column; mixing the 53070 The method includes: contacting the compound with the protein with radioactively labeled ATP, e.g., K-P-ATP; and subject 53070, 15985, 26583, 21953, m32404, 14089, or determining the amount of radioactive phosphate that is US 7,282,360 B2 105 106 transferred to proteins in the presence and absence of a tosidase) is engineered as a translation fusion with an suitable substrate. Alternatively, an activity of a 53070 amino-terminal ubiquitin moiety. The Substrate is incubated protein can be assayed indirectly by overexpressing the in solution with a polypeptide Such as 23436 or a fragment protein in an appropritate mammalian cell line and then thereof Suspected of having ubiquitin specific protease activ assaying for an increase in phosporylation of a 53070 ity. The production of free ubiquitin or the de-ubiquitinated Substrate that is present in the cells, or by assaying for a reporter protein can be determined, e.g., by PAGE electro cellular response, e.g., altered cell morphology, the adoption phoresis and comparion to a control incubation lacking the of a transformed phenotype, increased cell migration, or 23436 polypeptide (Zhu et al. (1997) Journal of Biological increased cell growth or cell death. Assays like these are Chemistry 272:51-57). A similar assay can be performed well known in the art and could easily be adapted to allow 10 using a reporter polypeptide having a lysine side chain to for the analysis of 53070 proteins. which a ubiquitin moiety is conjugated. In one embodiment, the activity of a 15985 protein can be The test compounds of the present invention can be assayed in a manner acceptable for detecting kinase activity. obtained using any of the numerous approaches in combi For example, kinase activity can be assayed in kinase natorial library methods known in the art, including: bio reaction buffer containing 20 mM MgAcetate, 20 mM ATP, 15 logical libraries; peptoid libraries (libraries of molecules 100 mM NaCl, 100 mM Tris-HCl pH 6.8, 1 mM ZnCl2 and having the functionalities of peptides, but with a novel, 2.5 mCi Og32PATP and 1 mg myelin basic protein. The non-peptide backbone which are resistant to enzymatic kinase reaction can be allowed to proceed for 30 minutes degradation but which nevertheless remain bioactive; see, before termination by addition of sample buffer with 10 mM e.g., Zuckermann, R. N. et al. (1994) J. Med. Chem. EDTA. Following separation by SDS-PAGE, gels can be 37:2678-85); spatially addressable parallel solid phase or stained with Coomassie Blue and Subjected to autoradiog Solution phase libraries; synthetic library methods requiring raphy. Burgess et al. (2001).J. Biol. Chem. published Jul. 25. deconvolution; the one-bead one-compound library 2001 as 10.1074/bc.M105153200. method; and synthetic library methods using affinity chro The prolyl oligopeptidase activity of a 21953 polypeptide matography selection. The biological library and peptoid can be assayed in vitro using an enzymatic assay Such as 25 library approaches are limited to peptide libraries, while the described in Abbott et al. (199) FEBS Lett. 458:278-284 and other four approaches are applicable to peptide, non-peptide Abbott et al. (2000) Eur: J. Biochem 267:6140-4150. A oligomer or Small molecule libraries of compounds (Lam sample to be assayed is combined with Substrate in phos (1997) Anticancer Drug Des. 12:145). phate buffer pH 7.4. Substrates include H-Gly-Pro-p-nitroa Examples of methods for the synthesis of molecular nilide (NA)/HCl (Sigma Corp, MO, USA), and Gly-Pro-7- 30 libraries can be found in the art, for example in: DeWitt et amino-4-trifluoromethylcoumarin (Calbiochem, San Diego, al. (1993) Proc. Natl. Acad. Sci. U.S.A. 90:6909; Erb et al. Calif., USA) and other peptidyl substrates. The reaction is (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et incubated for 30 minutes at 37° C. For example, hydrolysis al. (1994).J. Med. Chem. 37:2678; Cho et al. (1993) Science of H-Gly-Pro-pNA is monitored spectroscopically at 405 261:1303: Carrell et al. (1994) Angew. Chem. Int. Ed. Engl. nm. The sample to be assayed can be a purified 21953 35 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl. polypeptide, e.g., a 21953 polypeptide or a 21953 fusion 33:2061; and Gallop et al. (1994) J. Med. Chem. 37:1233. protein purified by a method described herein. Routine Libraries of compounds may be presented in Solution Michaelis-Menten analysis of kinetic parameters can be (e.g., Houghten (1992) Biotechniques 13:412–421), or on used to quantify the enzymatic activity. Alternatively, the beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) reaction can be quenched and total Substrate hydrolyzed can 40 Nature 364:555-556), bacteria (Ladner, U.S. Pat. No. 5,223, be measured as indication of the activity. 409), spores (Ladner U.S. Pat. No. 5,223.409), plasmids De-ubiquitination assays useful for detecting a ubiquitin (Cullet al. (1992) Proc Natl AcadSci USA 89:1865-1869) carboxy-terminal hydrolase activity are described, for or on phage (Scott and Smith (1990) Science 249:386-390; example, in Zhu et al. (1997) Journal of Biological Chem Devlin (1990) Science 249:404–406; Cwirla et al. (1990) istry 272:51-57, Mitch et al. (1999) American Journal of 45 Proc. Natl. Acad. Sci. 87:6378-6382: Felici (1991).J. Mol. Physiology 276:C1132-C1138, Liu et al. (1999) Molecular Biol. 222:301-310; Ladner supra.). and Cell Biology 19:3029-3038, and such as those cited in In one embodiment, an assay is a cell-based assay in various reviews, for example, Ciechanover et al. (1994) The which a cell which expresses a 53070, 15985, 26583,21953, FASEB Journal 8:182-192, Chiechanover (1994) Biol. m32404, 14089, or 23436 protein or biologically active Chem. Hoppe-Seyler 375:565-581, Hershko et al. (1998) 50 portion thereof is contacted with a test compound, and the Annual Review of Biochemistry 67:425-479, Swartz (1999) ability of the test compound to modulate 53070, 15985, Annual Review of Medicine 50:57-74, Ciechanover (1998) 26583, 21953, m32404, 14089, or 23436 activity is deter EMBO Journal 17:7151-7160, and D’Andrea et al. (1998) mined. Determining the ability of the test compound to Critical Reviews in Biochemistry and Molecular Biology modulate 53070 activity can be accomplished by monitor 33:337-352. These assays include, but are not limited to, the 55 ing, for example, Substrate phosphorylation. Determining disappearance of Substrate, including a decrease in the the ability of the test compound to modulate 15985 activity amount of polyubiquitin or ubiquitinated Substrate protein or can be accomplished by monitoring, for example, protein protein remnant, appearance of intermediate and end prod kinase activity and/or microtubule binding. Determining the ucts, such as appearance of free ubiquitin monomers, gen ability of the test compound to modulate 26583 activity can eral protein turnover, specific protein turnover, ubiquitin 60 be accomplished by monitoring, for example, serine/threo binding, binding to ubiquitinated Substrate protein, Subunit nine phosphatase activity. Determining the ability of the test interaction, interaction with ATP, interaction with cellular compound to modulate 21953 activity can be accomplished components such as trans-acting regulatory factors, stabili by monitoring, for example, prolyl oligopeptidase activity. Zation of specific proteins, and the like. Determining the ability of the test compound to modulate For example, in order to identify a polypeptide having 65 m32404 activity can be accomplished by monitoring, for ubiquitin carboxy-terminal hydrolase activity in vitro, a example, trypsin protease activity. Determining the ability of reporter protein (e.g., green fluorescent protein or B-galac the test compound to modulate 14089 activity can be accom US 7,282,360 B2 107 108 plished by monitoring, for example, protease activity. Deter n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, mining the ability of the test compound to modulate 23436 octanoyl-N-methylglucamide, decanoyl-N-methylglucam activity can be accomplished by monitoring, for example, ide, Triton RX-100, Triton RX-114, ThesitR), Isotridecypoly de-ubiquitinating activity. (ethylene glycol ether), 3-(3-cholamidopropyl)dimethy The cell, for example, can be of mammalian origin, e.g., lamminio-1-propane sulfonate (CHAPS), 3-(3- human. cholamidopropyl)dimethylamminio-2-hydroxy-1-propane The ability of the test compound to modulate 53070, sulfonate (CHAPSO), or N-dodecyl-N,N-dimethyl-3-am 15985, 26583, 21953, m32404, 14089, or 23436 binding to monio-1-propane Sulfonate. a compound, e.g., a 53070, 15985, 26583, 21953, m32404, Cell-free assays involve preparing a reaction mixture of 14089, or 23436 substrate, or to bind to 53070, 15985, 10 the target gene protein and the test compound under condi 26583, 21953, m32404, 14089, or 23436 can also be evalu tions and for a time sufficient to allow the two components ated. This can be accomplished, for example, by coupling to interact and bind, thus forming a complex that can be the compound, e.g., the Substrate, with a radioisotope or removed and/or detected. enzymatic label Such that binding of the compound, e.g., the The interaction between two molecules can also be substrate, to 53070, 15985, 26583, 21953, m32404, 14089, 15 detected, e.g., using fluorescence energy transfer (FET) (see, or 23436 can be determined by detecting the labeled com for example, Lakowicz et al., U.S. Pat. No. 5,631, 169; pound, e.g., substrate, in a complex. Alternatively, 53070, Stavrianopoulos, et al., U.S. Pat. No. 4,868,103). A fluoro 15985, 26583, 21953, m32404, 14089, or 23436 could be phore label on the first, donor molecule is selected such coupled with a radioisotope or enzymatic label to monitor that its emitted fluorescent energy will be absorbed by a the ability of a test compound to modulate 53070, 15985, fluorescent label on a second, acceptor molecule, which in 26583,21953, m32404, 14089, or 23436 binding to a 53070, turn is able to fluoresce due to the absorbed energy. Alter 15985, 26583, 21953, m32404, 14089, or 23436 substrate in nately, the donor protein molecule may simply utilize the a complex. For example, compounds (e.g., 53070, 15985, natural fluorescent energy of tryptophan residues. Labels are 26583, 21953, m32404, 14089, or 23436 substrates) can be chosen that emit different wavelengths of light, such that the labeled with 'I, S, ''C, or H, either directly or indi 25 acceptor molecule label may be differentiated from that of rectly, and the radioisotope detected by direct counting of the donor. Since the efficiency of energy transfer between radioemmission or by Scintillation counting. Alternatively, the labels is related to the distance separating the molecules, compounds can be enzymatically labeled with, for example, the spatial relationship between the molecules can be horseradish peroxidase, alkaline phosphatase, or luciferase, assessed. In a situation in which binding occurs between the and the enzymatic label detected by determination of con 30 molecules, the fluorescent emission of the acceptor mol version of an appropriate Substrate to product. ecule label in the assay should be maximal. An FET binding The ability of a compound (e.g., a 53070, 15985, 26583, event can be conveniently measured through standard fluo 21953, m32404, 14089, or 23436 substrate) to interact with rometric detection means well known in the art (e.g., using 53070, 15985, 26583,21953, m32404, 14089, or 23436 with a fluorimeter). or without the labeling of any of the interactants can be 35 In another embodiment, determining the ability of the evaluated. For example, a microphysiometer can be used to 53070, 15985, 26583, 21953, m32404, 14089, or 23436 detect the interaction of a compound with 53070, 15985, protein to bind to a target molecule can be accomplished 26583, 21953, m32404, 14089, or 23436 without the label using real-time Biomolecular Interaction Analysis (BIA) ing of either the compound or the 53070, 15985, 26583, (see, e.g., Solander, S. and Urbaniczky, C. (1991) Anal. 21953, m32404, 14089, or 23436. McConnell, H. M. et al. 40 Chem. 63:2338-2345 and Szabo et al. (1995) Curr. Opin. (1992) Science 257: 1906-1912. As used herein, a “micro Struct. Biol. 5:699-705). "Surface plasmon resonance” or physiometer” (e.g., Cytosensor) is an analytical instrument “BIA' detects biospecific interactions in real time, without that measures the rate at which a cell acidifies its environ labeling any of the interactants (e.g., BIAcore). Changes in ment using a light-addressable potentiometric sensor the mass at the binding Surface (indicative of a binding (LAPS). Changes in this acidification rate can be used as an 45 event) result in alterations of the refractive index of light indicator of the interaction between a compound and 53070, near the Surface (the optical phenomenon of Surface plasmon 15985, 26583, 21953, m32404, 14089, or 23436. resonance (SPR)), resulting in a detectable signal which can In yet another embodiment, a cell-free assay is provided be used as an indication of real-time reactions between in which a 53070, 15985, 26583, 21953, m32404, 14089, or biological molecules. 23436 protein or biologically active portion thereof is con 50 In one embodiment, the target gene product or the test tacted with a test compound and the ability of the test Substance is anchored onto a solid phase. The target gene compound to bind to the 53070, 15985, 26583, 21953, product/test compound complexes anchored on the Solid m32404, 14089, or 23436 protein or biologically active phase can be detected at the end of the reaction. Preferably, portion thereof is evaluated. Preferred biologically active the target gene product can be anchored onto a solid Surface, portions of the 53070, 15985, 26583, 21953, m32404, 55 and the test compound, (which is not anchored), can be 14089, or 23436 proteins to be used in assays of the present labeled, either directly or indirectly, with detectable labels invention include fragments which participate in interac discussed herein. tions with non-53070, 15985, 26583, 21953, m32404, It may be desirable to immobilize either 53070, 15985, 14089, or 23436 molecules, e.g., fragments with high sur 26583, 21953, m32404, 14089, or 23436, an anti-53070, face probability scores. 60 -15985, -26583, -21953, -m32404, -14089, or -23436 anti Soluble and/or membrane-bound forms of isolated pro body or its target molecule to facilitate separation of com teins (e.g., 53070, 15985, 26583, 21953, m32404, 14089, or plexed from uncomplexed forms of one or both of the 23436 proteins or biologically active portions thereof) can proteins, as well as to accommodate automation of the assay. be used in the cell-free assays of the invention. When Binding of a test compound to a 53070, 15985, 26583, membrane-bound forms of the protein are used, it may be 65 21953, m32404, 14089, or 23436 protein, or interaction of desirable to utilize a solubilizing agent. Examples of Such a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 solubilizing agents include non-ionic detergents such as protein with a target molecule in the presence and absence US 7,282,360 B2 109 110 of a candidate compound, can be accomplished in any vessel Alternatively, cell free assays can be conducted in a liquid Suitable for containing the reactants. Examples of Such phase. In Such an assay, the reaction products are separated vessels include microtiter plates, test tubes, and micro from unreacted components, by any of a number of standard centrifuge tubes. In one embodiment, a fusion protein can be techniques, including but not limited to: differential cen provided which adds a domain that allows one or both of the trifugation (see, for example, Rivas, G., and Minton, A. P. proteins to be bound to a matrix. For example, glutathione (1993) Trends Biochem Sci 18:284-7); chromatography (gel S-transferase/53070, 15985, 26583, 21953, m32404, 14089, filtration chromatography, ion-exchange chromatography): or 23436 fusion proteins or glutathione-S-transferase/target electrophoresis (see, e.g., Ausubel, F. et al., eds. Current fusion proteins can be adsorbed onto glutathione Sepharose Protocols in Molecular Biology 1999, J. Wiley: New York.); beads (Sigma Chemical, St. Louis, Mo.) or glutathione 10 and immunoprecipitation (see, for example, Ausubel, F. et derivatized microtiter plates, which are then combined with al., eds. (1999) Current Protocols in Molecular Biology, J. the test compound or the test compound and either the Wiley: New York). Such resins and chromatographic tech non-adsorbed target protein or 53070, 15985, 26583,21953, niques are known to one skilled in the art (see, e.g., m32404, 14089, or 23436 protein, and the mixture incubated Heegaard, N.H., (1998) J Mol Recognit 11:141-8; Hage, D. under conditions conducive to complex formation (e.g., at 15 S., and Tweed, S. A. (1997) J Chromatogr B Biomed Sci physiological conditions for salt and pH). Following incu Appl. 699:499-525). Further, fluorescence energy transfer bation, the beads or microtiter plate wells are washed to may also be conveniently utilized, as described herein, to remove any unbound components, the matrix immobilized detect binding without further purification of the complex in the case of beads, complex determined either directly or from solution. indirectly, for example, as described above. Alternatively, In a preferred embodiment, the assay includes contacting the complexes can be dissociated from the matrix, and the the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 level of 53070, 15985, 26583, 21953, m32404, 14089, or protein or biologically active portion thereof with a known 2343.6 binding or activity determined using standard tech compound which binds 53070, 15985, 26583, 21953, niques. m32404, 14089, or 23436 to form an assay mixture, con 25 tacting the assay mixture with a test compound, and deter Other techniques for immobilizing either a 53070, 15985, mining the ability of the test compound to interact with a 26583, 21953, m32404, 14089, or 23436 protein or a target 53070, 15985, 26583, 21953, m32404, 14089, or 23436 molecule on matrices include using conjugation of biotin protein, wherein determining the ability of the test com and streptavidin. Biotinylated 53070, 15985, 26583, 21953, pound to interact with a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 protein or target molecules can be 30 m32404, 14089, or 23436 protein includes determining the prepared from biotin-NHS (N-hydroxy-succinimide) using ability of the test compound to preferentially bind to 53070, techniques known in the art (e.g., biotinylation kit, Pierce 15985, 26583, 21953, m32404, 14089, or 23436 or biologi Chemicals, Rockford, Ill.), and immobilized in the wells of cally active portion thereof, or to modulate the activity of a streptavidin-coated 96 well plates (Pierce Chemical). target molecule, as compared to the known compound. In order to conduct the assay, the non-immobilized com 35 The target gene products of the invention can, in Vivo, ponent is added to the coated Surface containing the interact with one or more cellular or extracellular macro anchored component. After the reaction is complete, unre molecules, such as proteins. For the purposes of this dis acted components are removed (e.g., by washing) under cussion, such cellular and extracellular macromolecules are conditions such that any complexes formed will remain referred to herein as “binding partners. Compounds that immobilized on the solid surface. The detection of com 40 disrupt such interactions can be useful in regulating the plexes anchored on the solid Surface can be accomplished in activity of the target gene product. Such compounds can a number of ways. Where the previously non-immobilized include, but are not limited to molecules Such as antibodies, component is pre-labeled, the detection of label immobilized peptides, and Small molecules. The preferred target genes/ on the surface indicates that complexes were formed. Where products for use in this embodiment are the 53070, 15985, the previously non-immobilized component is not pre-la 45 26583, 21953, m32404, 14089, or 23436 genes herein beled, an indirect label can be used to detect complexes identified. In an alternative embodiment, the invention pro anchored on the Surface; e.g., using a labeled antibody vides methods for determining the ability of the test com specific for the immobilized component (the antibody, in pound to modulate the activity of a 53070, 15985, 26583, turn, can be directly labeled or indirectly labeled with, e.g., 21953, m32404, 14089, or 23436 protein through modula a labeled anti-Ig antibody). 50 tion of the activity of a downstream effector of a 53070, In one embodiment, this assay is performed utilizing 15985, 26583, 21953, m32404, 14089, or 23436 target antibodies reactive with 53070, 15985, 26583, 21953, molecule. For example, the activity of the effector molecule m32404, 14089, or 23436 protein or target molecules but on an appropriate target can be determined, or the binding of which do not interfere with binding of the 53070, 15985, the effector to an appropriate target can be determined, as 26583,21953, m32404, 14089, or 23436 protein to its target 55 previously described. molecule. Such antibodies can be derivatized to the wells of To identify compounds that interfere with the interaction the plate, and unbound target or 53070, 15985, 26583, between the target gene product and its cellular or extracel 21953, m32404, 14089, or 23436 protein trapped in the lular binding partner(s), a reaction mixture containing the wells by antibody conjugation. Methods for detecting Such target gene product and the binding partner is prepared, complexes, in addition to those described above for the 60 under conditions and for a time sufficient, to allow the two GST-immobilized complexes, include immunodetection of products to form complex. In order to test an inhibitory complexes using antibodies reactive with the 53070, 15985, agent, the reaction mixture is provided in the presence and 26583, 21953, m32404, 14089, or 23436 protein or target absence of the test compound. The test compound can be molecule, as well as enzyme-linked assays which rely on initially included in the reaction mixture, or can be added at detecting an enzymatic activity associated with the 53070, 65 a time Subsequent to the addition of the target gene and its 15985, 26583, 21953, m32404, 14089, or 23436 protein or cellular or extracellular binding partner. Control reaction target molecule. mixtures are incubated without the test compound or with a US 7,282,360 B2 111 112 placebo. The formation of any complexes between the target In an alternate embodiment of the invention, a homoge gene product and the cellular or extracellular binding partner neous assay can be used. For example, a preformed complex is then detected. The formation of a complex in the control of the target gene product and the interactive cellular or reaction, but not in the reaction mixture containing the test extracellular binding partner product is prepared in that compound, indicates that the compound interferes with the either the target gene products or their binding partners are interaction of the target gene product and the interactive labeled, but the signal generated by the label is quenched binding partner. Additionally, complex formation within due to complex formation (see, e.g., U.S. Pat. No. 4,109,496 reaction mixtures containing the test compound and normal that utilizes this approach for immunoassays). The addition target gene product can also be compared to complex of a test Substance that competes with and displaces one of formation within reaction mixtures containing the test com 10 the species from the preformed complex will result in the pound and mutant target gene product. This comparison can generation of a signal above background. In this way, test be important in those cases wherein it is desirable to identify Substances that disrupt target gene product-binding partner compounds that disrupt interactions of mutant but not nor interaction can be identified. mal target gene products. In yet another aspect, the 53070, 15985, 26583, 21953, These assays can be conducted in a heterogeneous or 15 m32404, 14089, or 23436 proteins can be used as “bait homogeneous format. Heterogeneous assays involve proteins’ in a two-hybrid assay or three-hybrid assay (see, anchoring either the target gene product or the binding e.g., U.S. Pat. No. 5.283,317; Zervos et al. (1993) Cell partner onto a solid phase, and detecting complexes 72:223-232; Madura et al. (1993).J. Biol. Chem. 268: 12046 anchored on the solid phase at the end of the reaction. In 12054: Bartel et al. (1993) Biotechniques 14:920-924; homogeneous assays, the entire reaction is carried out in a Iwabuchi et al. (1993) Oncogene 8:1693-1696; and Brent liquid phase. In either approach, the order of addition of WO94/10300), to identify other proteins, which bind to or reactants can be varied to obtain different information about interact with 53070, 15985, 26583, 21953, m32404, 14089, the compounds being tested. For example, test compounds or 23436 (“53070-, 15985- 26583-, 21953-, m32404-, that interfere with the interaction between the target gene 14089-, or 23436-binding proteins” or “53070-, 15985-, products and the binding partners, e.g., by competition, can 25 26583-, 21953-, m32404-, 14089-, or 23436-bp') and are be identified by conducting the reaction in the presence of involved in 53070, 15985, 26583,21953, m32404, 14089, or the test Substance. Alternatively, test compounds that disrupt 23436 activity. Such 53070-, 15985- 26583-, 21953-, preformed complexes, e.g., compounds with higher binding m32404-, 14089-, or 23436-bps can be activators or inhibi constants that displace one of the components from the tors of signals by the 53070, 15985, 26583, 21953, m32404, complex, can be tested by adding the test compound to the 30 14089, or 23436 proteins or 53070, 15985, 26583, 21953, reaction mixture after complexes have been formed. The m32404, 14089, or 23436 targets as, for example, down various formats are briefly described below. stream elements of a 53070-, 15985- 26583-, 21953-, In a heterogeneous assay system, either the target gene m32404-, 14089-, or 23436-mediated signaling pathway. product or the interactive cellular or extracellular binding The two-hybrid system is based on the modular nature of partner, is anchored onto a solid Surface (e.g., a microtiter 35 most transcription factors, which consist of separable DNA plate), while the non-anchored species is labeled, either binding and activation domains. Briefly, the assay utilizes directly or indirectly. The anchored species can be immo two different DNA constructs. In one construct, the gene that bilized by non-covalent or covalent attachments. Alterna codes for a 53070, 15985, 26583, 21953, m32404, 14089, or tively, an immobilized antibody specific for the species to be 23436 protein is fused to a gene encoding the DNA binding anchored can be used to anchor the species to the Solid 40 domain of a known transcription factor (e.g., GAL-4). In the Surface. other construct, a DNA sequence, from a library of DNA In order to conduct the assay, the partner of the immobi sequences, that encodes an unidentified protein (“prey” or lized species is exposed to the coated surface with or without “sample') is fused to a gene that codes for the activation the test compound. After the reaction is complete, unreacted domain of the known transcription factor. (Alternatively the: components are removed (e.g., by washing) and any com 45 53070, 15985, 26583, 21953, m32404, 14089, or 23436 plexes formed will remain immobilized on the solid surface. protein can be the fused to the activator domain.) If the Where the non-immobilized species is pre-labeled, the “bait' and the “prey’ proteins are able to interact, in vivo, detection of label immobilized on the surface indicates that forming a 53070-, 15985- 26583-, 21953-, m32404-, complexes were formed. Where the non-immobilized spe 14089-, or 23436-dependent complex, the DNA-binding and cies is not pre-labeled, an indirect label can be used to detect 50 activation domains of the transcription factor are brought complexes anchored on the Surface; e.g., using a labeled into close proximity. This proximity allows transcription of antibody specific for the initially non-immobilized species a reporter gene (e.g., lacZ) which is operably linked to a (the antibody, in turn, can be directly labeled or indirectly transcriptional regulatory site responsive to the transcription labeled with, e.g., a labeled anti-Ig antibody). Depending factor. Expression of the reporter gene can be detected and upon the order of addition of reaction components, test 55 cell colonies containing the functional transcription factor compounds that inhibit complex formation or that disrupt can be isolated and used to obtain the cloned gene which preformed complexes can be detected. encodes the protein which interacts with the 53070, 15985, Alternatively, the reaction can be conducted in a liquid 26583, 21953, m32404, 14089, or 23436 protein. phase in the presence or absence of the test compound, the In another embodiment, modulators of 53070, 15985, reaction products separated from unreacted components, and 60 26583, 21953, m32404, 14089, or 23436 expression are complexes detected; e.g., using an immobilized antibody identified. For example, a cell or cell free mixture is con specific for one of the binding components to anchor any tacted with a candidate compound and the expression of complexes formed in Solution, and a labeled antibody spe 53070, 15985, 26583, 21953, m32404, 14089, or 23436 cific for the other partner to detect anchored complexes. mRNA or protein evaluated relative to the level of expres Again, depending upon the order of addition of reactants to 65 sion of 53070, 15985, 26583, 21953, m32404, 14089, or the liquid phase, test compounds that inhibit complex or that 2343.6 mRNA or protein in the absence of the candidate disrupt preformed complexes can be identified. compound. When expression of 53070, 15985, 26583, US 7,282,360 B2 113 114 21953, m32404, 14089, or 2343.6 mRNA or protein is novel agents identified by the above-described screening greater in the presence of the candidate compound than in its assays can be used for treatments as described herein. absence, the candidate compound is identified as a stimula 53070, 15985, 26583, 21953, m32404, 14089, and 23436 tor of 53070, 15985, 26583, 21953, m32404, 14089, or Detection Assays 2343.6 mRNA or protein expression. Alternatively, when Portions or fragments of the nucleic acid sequences expression of 53070, 15985, 26583,21953, m32404, 14089, identified herein can be used as polynucleotide reagents. For or 2343.6 mRNA or protein is less (statistically significantly example, these sequences can be used to: (i) map their less) in the presence of the candidate compound than in its respective genes on a chromosome e.g., to locate gene absence, the candidate compound is identified as an inhibitor 10 regions associated with genetic disease or to associate of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 53070, 15985, 26583,21953, m32404, 14089, or 23436 with mRNA or protein expression. The level of 53070, 15985, a disease; (ii) identify an individual from a minute biological 26583, 21953, m32404, 14089, or 2343.6 mRNA or protein sample (tissue typing); and (iii) aid in forensic identification expression can be determined by methods described herein of a biological sample. These applications are described in for detecting 53070, 15985, 26583, 21953, m32404, 14089, the subsections below. or 23436 mRNA or protein. 15 In another aspect, the invention pertains to a combination 53070, 15985, 26583, 21953, m32404, 14089, and 23436 of two or more of the assays described herein. For example, Chromosome Mapping a modulating agent can be identified using a cell-based or a The 53070, 15985, 26583, 21953, m32404, 14089, and cell free assay, and the ability of the agent to modulate the 23436 nucleotide sequences or portions thereof can be used activity of a 53070 protein can be confirmed in vivo, e.g., in to map the location of the 53070, 15985, 26583, 21953, m32404, 14089, and 23436 genes on a chromosome. This an animal such as an animal model for cellular proliferative process is called chromosome mapping. Chromosome map and/or differentiative disorder. For example, a modulating ping is useful in correlating the 53070, 15985, 26583, agent can be identified using a cell-based or a cell free assay, 21953, m32404, 14089, or 23436 sequences with genes and the ability of the agent to modulate the activity of a 25 15985 protein can be confirmed in vivo, e.g., in an animal associated with disease. Such as an animal model for neural migration defects, The 23436 nucleotide sequences or portions thereof can immune cell migration defects, or metastasis. For example, be used to map the location of the 23436 genes on a a modulating agent can be identified using a cell-based or a chromosome, particularly chromosome 1, e.g., chromo cell free assay, and the ability of the agent to modulate the 30 Somal cytogenetic region 1 p36. This process is called chro activity of a 26583 protein can be confirmed in vivo, e.g., in mosome mapping. Chromosome mapping is useful in cor an animal Such as an animal model overexpressing a gene relating the 23436 sequences with genes associated with encoding a protein serine/threonine kinase. For example, a disease Such prostate cancer and/or brain cancer (see, e.g., modulating agent can be identified using a cell-based or a Gibbs et al. (1999) Am. J. Hum. Genet. 64.776). cell free assay, and the ability of the agent to modulate the 35 Briefly, 53070, 15985, 26583, 21953, m32404, 14089, or activity of a 21953 protein can be confirmed in vivo, e.g., in 23436 genes can be mapped to chromosomes by preparing an animal Such as an animal model for a cell proliferative or PCR primers (preferably 15-25 bp in length) from the cell differentiative disorder, e.g., a cancer, e.g., a cancer of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 the lung, prostate, breast, or colon; an animal model for an nucleotide sequences. These primers can then be used for immunological disorder, or an animal model for a neuro 40 PCR screening of somatic cell hybrids containing individual logical disorder. For example, a modulating agent can be human chromosomes. Only those hybrids containing the identified using a cell-based or a cell free assay, and the human gene corresponding to the 53070, 15985, 26583, ability of the agent to modulate the activity of an m32404 21953, m32404, 14089, or 23436 sequences will yield an protein can be confirmed in Vivo, e.g., in an animal Such as amplified fragment. an animal model for cancer. For example, a modulating 45 A panel of somatic cell hybrids in which each cell line agent can be identified using a cell-based or a cell free assay, contains either a single human chromosome or a small and the ability of the agent to modulate the activity of a number of human chromosomes, and a full set of mouse 14089 protein can be confirmed in vivo, e.g., in an animal chromosomes, can allow easy mapping of individual genes Such as an animal model for cancer. For example, a modu to specific human chromosomes. (DEustachio P. et al. lating agent can be identified using a cell-based or a cell free 50 (1983) Science 220:919-924). assay, and the ability of the agent to modulate the activity of Other mapping strategies e.g., in situ hybridization (de a 23436 protein can be confirmed in vivo, e.g., in an animal scribed in Fan, Y. et al. (1990) Proc. Natl. Acad. Sci. USA, Such as an animal model for an erythroid cell disorder or a 87:6223-27), pre-screening with labeled flow-sorted chro proliferative disorder of erythroid, liver, prostate, or brain mosomes, and pre-selection by hybridization to chromo cells. 55 some specific cDNA libraries can be used to map 53070, This invention further pertains to novel agents identified 15985, 26583, 21953, m32404, 14089, or 23436 to a chro by the above-described screening assays. Accordingly, it is mosomal location. within the scope of this invention to further use an agent Fluorescence in situ hybridization (FISH) of a DNA identified as described herein (e.g., a 53070, 15985, 26583, sequence to a metaphase chromosomal spread can further be 21953, m32404, 14089, or 23436 modulating agent, an 60 used to provide a precise chromosomal location in one step. antisense 53070, 15985, 26583, 21953, m32404, 14089, or The FISH technique can be used with a DNA sequence as 23436 nucleic acid molecule, a 53070-, 15985- 26583-, short as 500 or 600 bases. However, clones larger than 1,000 21953-, m32404-, 14089-, or 23436-specific antibody, or a bases have a higher likelihood of binding to a unique 53070-, 15985- 26583-, 21953-, m32404-, 14089-, or chromosomal location with Sufficient signal intensity for 23436-binding partner) in an appropriate animal model to 65 simple detection. Preferably 1,000 bases, and more prefer determine the efficacy, toxicity, side effects, or mechanism ably 2,000 bases will suffice to get good results at a of action, of treatment with Such an agent. Furthermore, reasonable amount of time. For a review of this technique, US 7,282,360 B2 115 116 see Verma et al., Human Chromosomes: A Manual of Basic differentiate individuals. The noncoding sequences of SEQ Techniques (1988) Pergamon Press, New York). ID NO:1, 7, 14, 19, 24, 33, or 40 can provide positive Reagents for chromosome mapping can be used individu individual identification with a panel of perhaps 10 to 1,000 ally to mark a single chromosome or a single site on that primers which each yield a noncoding amplified sequence of chromosome, or panels of reagents can be used for marking 100 bases. If predicted coding sequences, such as those in multiple sites and/or multiple chromosomes. Reagents cor SEQ ID NO:3, 9, 16, 21, 26, 35, or 42 are used, a more responding to noncoding regions of the genes actually are appropriate number of primers for positive individual iden preferred for mapping purposes. Coding sequences are more tification would be 500-2,000. likely to be conserved within gene families, thus increasing If a panel of reagents from 53070, 15985, 26583, 21953, the chance of cross hybridizations during chromosomal 10 m32404, 14089, or 23436 nucleotide sequences described mapping. herein is used to generate a unique identification database Once a sequence has been mapped to a precise chromo for an individual, those same reagents can later be used to Somal location, the physical position of the sequence on the identify tissue from that individual. Using the unique iden chromosome can be correlated with genetic map data. (Such tification database, positive identification of the individual, data are found, for example, in V. McKusick, Mendelian 15 living or dead, can be made from extremely small tissue Inheritance in Man, available on-line through Johns Hopkins samples. University Welch Medical Library). The relationship between a gene and a disease, mapped to the same chromo Use of Partial 53070, 15985, 26583, 21953, m32404, 14089, Somal region, can then be identified through linkage analysis or 23436 Sequences in Forensic Biology (co-inheritance of physically adjacent genes), described in, DNA-based identification techniques can also be used in for example, Egeland, J. et al. (1987) Nature, 325:783-787. forensic biology. To make such an identification, PCR Moreover, differences in the DNA sequences between technology can be used to amplify DNA sequences taken individuals affected and unaffected with a disease associated from very Small biological samples such as tissues, e.g., hair with the 53070, 15985, 26583, 21953, m32404, 14089, or or skin, or body fluids, e.g., blood, saliva, or semen found at 23436 gene, can be determined. If a mutation is observed in 25 a crime scene. The amplified sequence can then be compared some or all of the affected individuals but not in any to a standard, thereby allowing identification of the origin of unaffected individuals, then the mutation is likely to be the the biological sample. causative agent of the particular disease. Comparison of The sequences of the present invention can be used to affected and unaffected individuals generally involves first provide polynucleotide reagents, e.g., PCR primers, targeted looking for structural alterations in the chromosomes. Such 30 to specific loci in the human genome, which can enhance the as deletions or translocations that are visible from chromo reliability of DNA-based forensic identifications by, for some spreads or detectable using PCR based on that DNA example, providing another “identification marker” (i.e. sequence. Ultimately, complete sequencing of genes from another DNA sequence that is unique to a particular indi several individuals can be performed to confirm the presence vidual). As mentioned above, actual base sequence infor of a mutation and to distinguish mutations from polymor 35 mation can be used for identification as an accurate alter phisms. native to patterns formed by restriction enzyme generated fragments. Sequences targeted to noncoding regions of SEQ 53070, 15985, 26583, 21953, m32404, 14089, and 23436 ID NO:1, 7, 14, 19, 24, 33, or 40 (e.g., fragments derived Tissue Typing from the noncoding regions of SEQID NO:1, 7, 14, 19, 24, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 40 33, or 40 having a length of at least 20 bases, preferably at sequences can be used to identify individuals from biologi least 30 bases) are particularly appropriate for this use. cal samples using, e.g., restriction fragment length polymor The 53070, 15985, 26583, 21953, m32404, 14089, or phism (RFLP). In this technique, an individuals genomic 23436 nucleotide sequences described herein can further be DNA is digested with one or more restriction enzymes, the used to provide polynucleotide reagents, e.g., labeled or fragments separated, e.g., in a Southern blot, and probed to 45 labelable probes which can be used in, for example, an in yield bands for identification. The sequences of the present situ hybridization technique, to identify a specific tissue. invention are useful as additional DNA markers for RFLP This can be very useful in cases where a forensic pathologist (described in U.S. Pat. No. 5.272,057). is presented with a tissue of unknown origin. Panels of Such Furthermore, the sequences of the present invention can 53070, 15985, 26583, 21953, m32404, 14089, or 23436 also be used to determine the actual base-by-base DNA 50 probes can be used to identify tissue by species and/or by sequence of selected portions of an individual’s genome. Organ type. Thus, the 53070, 15985, 26583, 21953, m32404, 14089, or The 26583 nucleotide sequences described herein can 23436 nucleotide sequences described herein can be used to further be used to provide polynucleotide reagents, e.g., prepare two PCR primers from the 5' and 3' ends of the labeled or labelable probes which can be used in, for sequences. These primers can then be used to amplify an 55 example, an in situ hybridization technique, to identify a individual’s DNA and subsequently sequence it. Panels of specific tissue, e.g., a tissue containing 26583 serine/threo corresponding DNA sequences from individuals, prepared in nine phosphatase activity. this manner, can provide unique individual identifications, In a similar fashion, these reagents, e.g., 53070, 15985, as each individual will have a unique set of such DNA 26583, 21953, m32404, 14089, or 23436 primers or probes sequences due to allelic differences. 60 can be used to screen tissue culture for contamination (i.e. Allelic variation occurs to some degree in the coding screen for the presence of a mixture of different types of regions of these sequences, and to a greater degree in the cells in a culture). noncoding regions. Each of the sequences described herein can, to some degree, be used as a standard against which Predictive Medicine of 53070, 15985, 26583, 21953, DNA from an individual can be compared for identification 65 m32404, 14089, or 23436 purposes. Because greater numbers of polymorphisms occur The present invention also pertains to the field of predic in the noncoding regions, fewer sequences are necessary to tive medicine in which diagnostic assays, prognostic assays, US 7,282,360 B2 117 118 and monitoring clinical trials are used for prognostic (pre Methods of the invention can be used prenatally or to dictive) purposes to thereby treat an individual. determine if a subjects offspring will be at risk for a Generally, the invention provides, a method of determin disorder. ing if a subject is at risk for a disorder related to a lesion in In preferred embodiments the method includes determin or the misexpression of a gene which encodes 53070, 15985, 5 ing the structure of a 53070, 15985, 26583,21953, m32404, 26583, 21953, m32404, 14089, or 23436. 14089, or 23436 gene, an abnormal structure being indica Such disorders include, e.g., a disorder associated with the tive of risk for the disorder. misexpression of 53070 gene, such as a cellular proliferative In preferred embodiments the method includes contacting and/or differentiative disorder; a disorder associated with the a sample from the subject with an antibody to the 53070, misexpression of 15985 gene, e.g., a cancer, a neurological 10 15985, 26583, 21953, m32404, 14089, or 23436 protein or or a cardiovascular (e.g., blood vessel) disorder; a disorder a nucleic acid, which hybridizes specifically with the gene. associated with the misexpression of 21953 gene, a disorder These and other embodiments are discussed below. of cell proliferation (such as lung, breast, colon, prostate, or ovarian cancer) or of the nervous system; a disorder asso Diagnostic and Prognostic Assays of 53070, 15985, 26583, ciated with the misexpression of the m32404 gene, a disor 15 21953, m32404, 14089, or 23436 der of cell differentiation or proliferaiton, or of the immune Diagnostic and prognostic assays of the invention include system or blood clotting system; a disorder associated with method for assessing the expression level of 53070, 15985, the misexpression of 14089 gene, a disorder of the comple 26583, 21953, m32404, 14089, or 23436 molecules and for ment system; and a disorder associated with the misexpres identifying variations and mutations in the sequence of sion of 23436 gene, a disorder of the hematopoietic system, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 e.g., of erythroid cells or erythroid cell precursors. molecules. The method includes one or more of the following: Expression Monitoring and Profiling: detecting, in a tissue of the Subject, the presence or absence of a mutation which affects the expression of the The presence, level, or absence of 53070, 15985, 26583, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 25 21953, m32404, 14089, or 23436 protein or nucleic acid in gene, or detecting the presence or absence of a mutation in a biological sample can be evaluated by obtaining a bio a region which controls the expression of the gene, e.g., a logical sample from a test Subject and contacting the bio mutation in the 5' control region; logical sample with a compound or an agent capable of detecting, in a tissue of the Subject, the presence or detecting 53070, 15985, 26583, 21953, m32404, 14089, or absence of a mutation which alters the structure of the 30 23436 protein or nucleic acid (e.g., mRNA, genomic DNA) 53070, 15985, 26583, 21953, m32404, 14089, or 23436 that encodes 53070, 15985, 26583, 21953, m32404, 14089, gene; or 23436 protein such that the presence of 53070, 15985, detecting, in a tissue of the Subject, the misexpression of 26583, 21953, m32404, 14089, or 23436 protein or nucleic the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 acid is detected in the biological sample. The term “biologi gene, at the mRNA level, e.g., detecting a non-wild type 35 cal sample' includes tissues, cells and biological fluids level of a mRNA: isolated from a subject, as well as tissues, cells and fluids detecting, in a tissue of the Subject, the misexpression of present within a subject. A preferred biological sample is the gene, at the protein level, e.g., detecting a non-wild type serum. The level of expression of the 53070, 15985, 26583, level of a 53070, 15985, 26583, 21953, m32404, 14089, or 21953, m32404, 14089, or 23436 gene can be measured in 23436 polypeptide. 40 a number of ways, including, but not limited to: measuring In preferred embodiments the method includes: ascertain the mRNA encoded by the 53070, 15985, 26583, 21953, ing the existence of at least one of a deletion of one or more m32404, 14089, or 23436 genes; measuring the amount of nucleotides from the 53070, 15985, 26583, 21953, m32404, protein encoded by the 53070, 15985, 26583, 21953, 14089, or 23436 gene; an insertion of one or more nucle m32404, 14089, or 23436 genes; or measuring the activity otides into the gene, a point mutation, e.g., a Substitution of 45 of the protein encoded by the 53070, 15985, 26583, 21953, one or more nucleotides of the gene, a gross chromosomal m32404, 14089, or 23436 genes. rearrangement of the gene, e.g., a translocation, inversion, or The level of mRNA corresponding to the 53070, 15985, deletion. 26583, 21953, m32404, 14089, or 23436 gene in a cell can For example, detecting the genetic lesion can include: (i) be determined both by in situ and by in vitro formats. providing a probe?primer including an oligonucleotide con 50 The isolated mRNA can be used in hybridization or taining a region of nucleotide sequence which hybridizes to amplification assays that include, but are not limited to, a sense or antisense sequence from SEQID NO:1, 7, 14, 19. Southern or Northern analyses, polymerase chain reaction 24, 33, or 40, or naturally occurring mutants thereof or 5' or analyses and probe arrays. One preferred diagnostic method 3' flanking sequences naturally associated with the 53070, for the detection of mRNA levels involves contacting the 15985, 26583, 21953, m32404, 14089, or 23436 gene; (ii) 55 isolated mRNA with a nucleic acid molecule (probe) that exposing the probe?primer to nucleic acid of the tissue; and can hybridize to the mRNA encoded by the gene being detecting, by hybridization, e.g., in situ hybridization, of the detected. The nucleic acid probe can be, for example, a probe?primer to the nucleic acid, the presence or absence of full-length 53070, 15985, 26583, 21953, m32404, 14089, or the genetic lesion. 23436 nucleic acid, such as the nucleic acid of SEQ ID In preferred embodiments detecting the misexpression 60 NO:1, 7, 14, 19, 24, 33, or 40, respectively, or a portion includes ascertaining the existence of at least one of an thereof, such as an oligonucleotide of at least 7, 15, 30, 50. alteration in the level of a messenger RNA transcript of the 100, 250 or 500 nucleotides in length and sufficient to 53070, 15985, 26583, 21953, m32404, 14089, or 23436 specifically hybridize under stringent conditions to 53070, gene; the presence of a non-wild type splicing pattern of a 15985, 26583, 21953, m32404, 14089, or 2343.6 mRNA or messenger RNA transcript of the gene; or a non-wild type 65 genomic DNA. The probe can be disposed on an address of level of 53070, 15985, 26583, 21953, m32404, 14089, or an array, e.g., an array described below. Other Suitable 23436. probes for use in the diagnostic assays are described herein. US 7,282,360 B2 119 120 In one format, mRNA (or cDNA) is immobilized on a The detection methods can be used to detect 53070, surface and contacted with the probes, for example by 15985, 26583, 21953, m32404, 14089, or 23436 protein in running the isolated mRNA on an agarose gel and transfer a biological sample in vitro as well as in vivo. In vitro ring the mRNA from the gel to a membrane, Such as techniques for detection of 53070, 15985, 26583, 21953, nitrocellulose. In an alternative format, the probes are immo m32404, 14089, or 23436 protein include enzyme linked bilized on a surface and the mRNA (or cDNA) is contacted immunosorbent assays (ELISAS), immunoprecipitations, with the probes, for example, in a two-dimensional gene immunofluorescence, enzyme immunoassay (EIA), radio chip array described below. A skilled artisan can adapt immunoassay (RIA), and Western blot analysis. In vivo known mRNA detection methods for use in detecting the techniques for detection of 53070, 15985, 26583, 21953, level of mRNA encoded by the 53070, 15985, 26583,21953, 10 m32404, 14089, or 23436 protein include introducing into a m32404, 14089, or 23436 genes. subject a labeled anti-53070, -15985, -265.83, -21953, The level of mRNA in a sample that is encoded by one of -m32404, -14089, or -23436 antibody. For example, the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 can antibody can be labeled with a radioactive marker whose be evaluated with nucleic acid amplification, e.g., by rtPCR presence and location in a Subject can be detected by (Mullis (1987) U.S. Pat. No. 4,683.202), chain reac 15 standard imaging techniques. In another embodiment, the tion (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193), sample is labeled, e.g., biotinylated and then contacted to the self sustained sequence replication (Guatelli et al., (1990) antibody, e.g., an anti-53070, -15985, -26583, -21953, Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional -m32404, -14089, or -23436 antibody positioned on an amplification system (Kwoh et al., (1989), Proc. Natl. Acad. antibody array (as described below). The sample can be Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al., detected, e.g., with avidin coupled to a fluorescent label. (1988) Bio/Technology 6: 1197), rolling circle replication In another embodiment, the methods further include con (Lizardi et al., U.S. Pat. No. 5,854,033) or any other nucleic tacting the control sample with a compound or agent capable acid amplification method, followed by the detection of the of detecting 53070, 15985, 26583, 21953, m32404, 14089, amplified molecules using techniques known in the art. As or 23436 protein, and comparing the presence of 53070, used herein, amplification primers are defined as being a pair 25 15985, 26583, 21953, m32404, 14089, or 23436 protein in of nucleic acid molecules that can anneal to 5' or 3' regions the control sample with the presence of 53070, 15985, of a gene (plus and minus Strands, respectively, or Vice 26583, 21953, m32404, 14089, or 23436 protein in the test versa) and contain a short region in between. In general, sample. amplification primers are from about 10 to 30 nucleotides in The invention also includes kits for detecting the presence length and flank a region from about 50 to 200 nucleotides 30 of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 in length. Under appropriate conditions and with appropriate in a biological sample. For example, the kit can include a reagents, such primers permit the amplification of a nucleic compound or agent capable of detecting 53070, 15985, acid molecule comprising the nucleotide sequence flanked 26583, 21953, m32404, 14089, or 23436 protein or mRNA by the primers. in a biological sample; and a standard. The compound or 35 agent can be packaged in a Suitable container. The kit can For in situ methods, a cell or tissue sample can be further comprise instructions for using the kit to detect prepared/processed and immobilized on a Support, typically 53070, 15985, 26583, 21953, m32404, 14089, or 23436 a glass slide, and then contacted with a probe that can protein or nucleic acid. hybridize to mRNA that encodes the 53070, 15985, 26583, For antibody-based kits, the kit can include: (1) a first 21953, m32404, 14089, or 23436 gene being analyzed. 40 antibody (e.g., attached to a Solid Support) which binds to a In another embodiment, the methods further contacting a polypeptide corresponding to a marker of the invention; and, control sample with a compound or agent capable of detect optionally, (2) a second, different antibody which binds to ing 53070, 15985, 26583, 21953, m32404, 14089, or 23436 either the polypeptide or the first antibody and is conjugated mRNA, or genomic DNA, and comparing the presence of to a detectable agent. 53070, 15985, 26583, 21953, m32404, 14089, or 23436 45 For oligonucleotide-based kits, the kit can include: (1) an mRNA or genomic DNA in the control sample with the oligonucleotide, e.g., a detectably labeled oligonucleotide, presence of 53070, 15985, 26583, 21953, m32404, 14089, which hybridizes to a nucleic acid sequence encoding a or 2343.6 mRNA or genomic DNA in the test sample. In still polypeptide corresponding to a marker of the invention or another embodiment, serial analysis of gene expression, as (2) a pair of primers useful for amplifying a nucleic acid described in U.S. Pat. No. 5,695,937, is used to detect 50 molecule corresponding to a marker of the invention. The kit 53070, 15985, 26583, 21953, m32404, 14089, or 23436 can also includes a buffering agent, a preservative, or a transcript levels. protein stabilizing agent. The kit can also includes compo A variety of methods can be used to determine the level nents necessary for detecting the detectable agent (e.g., an of protein encoded by 53070, 15985, 26583, 21953, enzyme or a Substrate). The kit can also contain a control m32404, 14089, or 23436. In general, these methods include 55 sample or a series of control samples which can be assayed contacting an agent that selectively binds to the protein, Such and compared to the test sample contained. Each component as an antibody with a sample, to evaluate the level of protein of the kit can be enclosed within an individual container and in the sample. In a preferred embodiment, the antibody bears all of the various containers can be within a single package, a detectable label. Antibodies can be polyclonal, or more along with instructions for interpreting the results of the preferably, monoclonal. An intact antibody, or a fragment 60 assays performed using the kit. thereof (e.g., Fab or F(ab')) can be used. The term The diagnostic methods described herein can identify “labeled, with regard to the probe or antibody, is intended Subjects having, or at risk of developing, a disease or to encompass direct labeling of the probe or antibody by disorder associated with misexpressed or aberrant or coupling (i.e., physically linking) a detectable Substance to unwanted 53070, 15985, 26583, 21953, m32404, 14089, or the probe or antibody, as well as indirect labeling of the 65 23436 expression or activity. As used herein, the term probe or antibody by reactivity with a detectable substance. “unwanted includes an unwanted phenomenon involved in Examples of detectable substances are provided herein. a biological response Such as deregulated cell proliferation; US 7,282,360 B2 121 122 pain; a cell proliferative or cell differentiative disorder, e.g., mitochondrial related disorder or a cholesterol biosynthesis a cancer, e.g., a cancer of the lung, prostate, breast, or colon related disorder, or a cell proliferation or differentiation or deregulated cell proliferation; or cell proliferation, cell disorder, e.g., a tumor; a cell proliferative or cell differen differentiation, coagulation, or cell signaling. tiative disorder, e.g., a cancer, e.g., a cancer of the lung; or In one embodiment, a disease or disorder associated with an erythroid cell disorder or a proliferative disorder of aberrant or unwanted 53070, 15985, 26583, 21953, m32404, erythroid, liver, prostate, or brain cells, in a subject wherein 14089, or 23436 expression or activity is identified. A test either an increase or a decrease in 53070, 15985, 26583, sample is obtained from a subject and 53070, 15985, 26583, 21953, m32404, 14089, or 23436 expression may be an 21953, m32404, 14089, or 23436 protein or nucleic acid indication that the Subject has or is disposed to having a the (e.g., mRNA or genomic DNA) is evaluated, wherein the 10 disorder. The method can be used to monitor a treatment for level, e.g., the presence or absence, of 53070, 15985, 26583, a disorder in a subject. For example, the gene expression 21953, m32404, 14089, or 23436 protein or nucleic acid is profile can be determined for a sample from a subject diagnostic for a Subject having or at risk of developing a undergoing treatment. The profile can be compared to a disease or disorder associated with aberrant or unwanted reference profile or to a profile obtained from the subject 53070, 15985, 26583, 21953, m32404, 14089, or 23436 15 prior to treatment or prior to onset of the disorder (see, e.g., expression or activity. As used herein, a “test sample” refers Golub et al. (1999) Science 286:531). to a biological sample obtained from a Subject of interest, In yet another aspect, the invention features a method of including a biological fluid (e.g., serum), cell sample, or evaluating a test compound (see also, “Screening Assays”. tissue. above). The method includes providing a cell and a test The prognostic assays described herein can be used to compound; contacting the test compound to the cell, obtain determine whether a Subject can be administered an agent ing a Subject expression profile for the contacted cell; and (e.g., an agonist, antagonist, peptidomimetic, protein, pep comparing the Subject expression profile to one or more tide, nucleic acid, Small molecule, or other drug candidate) reference profiles. The profiles include a value representing to treat a disease or disorder associated with aberrant or the level of 53070, 15985, 26583,21953, m32404, 14089, or unwanted 53070, 15985, 26583, 21953, m32404, 14089, or 25 23436 expression. In a preferred embodiment, the subject 23436 expression or activity. For example, such methods expression profile is compared to a target profile, e.g., a can be used to determine whether a subject can be effectively profile for a normal cell or for desired condition of a cell. treated with an agent for a cellular proliferative and/or The test compound is evaluated favorably if the subject differentiative disorder; a cell motility disorder; a metabolic expression profile is more similar to the target profile than an disorder, e.g., a mitochondrial related disorder or a choles 30 terol biosynthesis related disorder, or a cell proliferation or expression profile obtained from an uncontacted cell. differentiation disorder, e.g., a tumor; a cell proliferative or In another aspect, the invention features, a method of cell differentiative disorder, e.g., a cancer, e.g., a cancer of evaluating a subject. The method includes: a) obtaining a the lung, prostate, breast, or colon disorder; cell prolifera sample from a subject, e.g., from a caregiver, e.g., a car tion, cell differentiation, coagulation, or cell signaling; or an 35 egiver who obtains the sample from the subject; b) deter erythroid cell disorder or a proliferative disorder of eryth mining a subject expression profile for the sample. Option roid, liver, prostate, or brain cells. ally, the method further includes either or both of steps: c) In another aspect, the invention features a computer comparing the Subject expression profile to one or more medium having a plurality of digitally encoded data records. reference expression profiles; and d) selecting the reference Each data record includes a value representing the level of 40 profile most similar to the subject reference profile. The expression of 53070, 15985, 26583,21953, m32404, 14089, subject expression profile and the reference profiles include or 23436 in a sample, and a descriptor of the sample. The a value representing the level of 53070, 15985, 26583, descriptor of the sample can be an identifier of the sample, 21953, m32404, 14089, or 23436 expression. A variety of a Subject from which the sample was derived (e.g., a routine statistical measures can be used to compare two patient), a diagnosis, or a treatment (e.g., a preferred treat 45 reference profiles. One possible metric is the length of the ment). In a preferred embodiment, the data record further distance vector that is the difference between the two pro includes values representing the level of expression of genes files. Each of the subject and reference profile is represented other than 53070, 15985, 26583, 21953, m32404, 14089, or as a multi-dimensional vector, wherein each dimension is a 23436 (e.g., other genes associated with a 53070-, 15985 value in the profile. 26583-, 21953-, m32404-, 14089-, or 23436-disorder, or 50 The method can further include transmitting a result to a other genes on an array). The data record can be structured caregiver. The result can be the Subject expression profile, a as a table, e.g., a table that is part of a database Such as a result of a comparison of the subject expression profile with relational database (e.g., a SQL database of the Oracle or another profile, a most similar reference profile, or a descrip Sybase database environments). tor of any of the aforementioned. The result can be trans Also featured is a method of evaluating a sample. The 55 mitted across a computer network, e.g., the result can be in method includes providing a sample, e.g., from the Subject, the form of a computer transmission, e.g., a computer data and determining a gene expression profile of the sample, signal embedded in a carrier wave. wherein the profile includes a value representing the level of Also featured is a computer medium having executable 53070, 15985, 26583, 21953, m32404, 14089, or 23436 code for effecting the following steps: receive a subject expression. The method can further include comparing the 60 expression profile; access a database of reference expression value or the profile (i.e., multiple values) to a reference value profiles; and either i) select a matching reference profile or reference profile. The gene expression profile of the most similar to the Subject expression profile orii) determine sample can be obtained by any of the methods described at least one comparison score for the similarity of the Subject herein (e.g., by providing a nucleic acid from the sample and expression profile to at least one reference profile. The contacting the nucleic acid to an array). The method can be 65 Subject expression profile, and the reference expression used to diagnose a disorder, e.g., a cellular proliferative profiles each include a value representing the level of 53070, and/or differentiative disorder, a metabolic disorder, e.g., a 15985, 26583,21953, m32404, 14089, or 23436 expression. US 7,282,360 B2 123 124 53070, 15985, 26583, 21953, m32404, 14089, or 23436 further includes amplifying nucleic acid from the sample Arrays and Uses Thereof prior or during contact with the array. In another aspect, the invention features an array that In another embodiment, the array can be used to assay includes a Substrate having a plurality of addresses. At least gene expression in a tissue to ascertain tissue specificity of one address of the plurality includes a capture probe that genes in the array, particularly the expression of 53070, binds specifically to a 53070, 15985, 26583, 21953, 15985, 26583, 21953, m32404, 14089, or 23436. If a m32404, 14089, or 23436 molecule (e.g., a 53070, 15985, Sufficient number of diverse samples is analyzed, clustering 26583, 21953, m32404, 14089, or 23436 nucleic acid or a (e.g., hierarchical clustering, k-means clustering, Bayesian 53070, 15985, 26583, 21953, m32404, 14089, or 23436 clustering and the like) can be used to identify other genes polypeptide). The array can have a density of at least than 10 which are co-regulated with 53070, 15985, 26583, 21953, 10, 50, 100, 200, 500, 1,000, 2,000, or 10,000 or more m32404, 14089, or 23436. For example, the array can be addresses/cm, and ranges between. In a preferred embodi used for the quantitation of the expression of multiple genes. ment, the plurality of addresses includes at least 10, 100, Thus, not only tissue specificity, but also the level of 500, 1,000, 5,000, 10,000, 50,000 addresses. In a preferred expression of a battery of genes in the tissue is ascertained. embodiment, the plurality of addresses includes equal to or 15 Quantitative data can be used to group (e.g., cluster) genes less than 10, 100, 500, 1,000, 5,000, 10,000, or 50,000 on the basis of their tissue expression per se and level of addresses. The Substrate can be a two-dimensional Substrate expression in that tissue. Such as a glass slide, a wafer (e.g., silica or plastic), a mass For example, array analysis of gene expression can be spectroscopy plate, or a three-dimensional Substrate Such as used to assess the effect of cell-cell interactions on 53070, a gel pad. Addresses in addition to address of the plurality 15985, 26583,21953, m32404, 14089, or 23436 expression. can be disposed on the array. A first tissue can be perturbed and nucleic acid from a In a preferred embodiment, at least one address of the second tissue that interacts with the first tissue can be plurality includes a nucleic acid capture probe that hybrid analyzed. In this context, the effect of one cell type on izes specifically to a 53070, 15985, 26583, 21953, m32404, another cell type in response to a biological stimulus can be 14089, or 23436 nucleic acid, e.g., the sense or anti-sense 25 determined, e.g., to monitor the effect of cell-cell interaction strand. In one preferred embodiment, a subset of addresses at the level of gene expression. of the plurality of addresses has a nucleic acid capture probe In another embodiment, cells are contacted with a thera for 53070, 15985, 26583, 21953, m32404, 14089, or 23436. peutic agent. The expression profile of the cells is deter Each address of the subset can include a capture probe that mined using the array, and the expression profile is com hybridizes to a different region of a 53070, 15985, 26583, 30 pared to the profile of like cells not contacted with the agent. 21953, m32404, 14089, or 23436 nucleic acid. In another For example, the assay can be used to determine or analyze preferred embodiment, addresses of the subset include a the molecular basis of an undesirable effect of the therapeu capture probe for a 53070, 15985, 26583, 21953, m32404, tic agent. If an agent is administered therapeutically to treat 14089, or 23436 nucleic acid. Each address of the subset is one cell type but has an undesirable effect on another cell unique, overlapping, and complementary to a different vari 35 type, the invention provides an assay to determine the ant of 53070, 15985, 26583, 21953, m32404, 14089, or molecular basis of the undesirable effect and thus provides 23436 (e.g., an allelic variant, or all possible hypothetical the opportunity to co-administer a counteracting agent or variants). The array can be used to sequence 53070, 15985, otherwise treat the undesired effect. Similarly, even within a 26583, 21953, m32404, 14089, or 23436 by hybridization single cell type, undesirable biological effects can be deter 40 mined at the molecular level. Thus, the effects of an agent on (see, e.g., U.S. Pat. No. 5,695,940). expression of other than the target gene can be ascertained An array can be generated by various methods, e.g., by and counteracted. photolithographic methods (see, e.g., U.S. Pat. Nos. 5,143, In another embodiment, the array can be used to monitor 854: 5,510,270; and 5,527,681), mechanical methods (e.g., expression of one or more genes in the array with respect to directed-flow methods as described in U.S. Pat. No. 5,384, 45 time. For example, samples obtained from different time 261), pin-based methods (e.g., as described in U.S. Pat. No. points can be probed with the array. Such analysis can 5,288,514), and bead-based techniques (e.g., as described in identify and/or characterize the development of a 53070-, PCT US/93/04145). 15985- 26583-, 21953-, m32404-, 14089-, or 23436-asso In another preferred embodiment, at least one address of ciated disease or disorder; and processes, such as a cellular the plurality includes a polypeptide capture probe that binds 50 transformation associated with a 53070-, 15985- 26583-, specifically to a 53070, 15985, 26583, 21953, m32404, 21953-, m32404-, 14089-, or 23436-associated disease or 14089, or 23436 polypeptide or fragment thereof. The disorder. The method can also evaluate the treatment and/or polypeptide can be a naturally-occurring interaction partner progression of a 53070-, 15985- 26583-, 21953-, m32404-, of 53070, 15985, 26583, 21953, m32404, 14089, or 23436 14089-, or 23436-associated disease or disorder. polypeptide. Preferably, the polypeptide is an antibody, e.g., 55 The array is also useful for ascertaining differential an antibody described herein (see “Anti-53070, -15985, expression patterns of one or more genes in normal and -265.83, -21953, -m32404, -14089, or -23436 Antibodies,” abnormal cells. This provides a battery of genes (e.g., above). Such as a monoclonal antibody or a single-chain including 53070, 15985, 26583, 21953, m32404, 14089, or antibody. 23436) that could serve as a molecular target for diagnosis In another aspect, the invention features a method of 60 or therapeutic intervention. analyzing the expression of 53070, 15985, 26583, 21953, In another aspect, the invention features an array having m32404, 14089, or 23436. The method includes providing a plurality of addresses. Each address of the plurality an array as described above; contacting the array with a includes a unique polypeptide. At least one address of the sample and detecting binding of a 53070, 15985, 26583, plurality has disposed thereon a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 molecule (e.g., nucleic 65 21953, m32404, 14089, or 23436 polypeptide or fragment acid or polypeptide) to the array. In a preferred embodiment, thereof. Methods of producing polypeptide arrays are the array is a nucleic acid array. Optionally the method described in the art, e.g., in De Wildt et al. (2000). Nature US 7,282,360 B2 125 126 Biotech. 18,989-994; Lueking et al. (1999). Anal. Biochem. useful, e.g., for analyzing gene expression. The method 270, 103-111; Ge, H. (2000). Nucleic Acids Res. 28, e3, includes: providing a two dimensional array having a plu I-VII; MacBeath, G., and Schreiber, S. L. (2000). Science rality of addresses, each address of the plurality being 289, 1760-1763; and WO 99/51773A1. In a preferred positionally distinguishable from each other address of the embodiment, each addresses of the plurality has disposed plurality having a unique capture probe, contacting the array thereon a polypeptide at least 60, 70, 80,85,90, 95 or 99% with a first sample from a cell or subject which express or identical to a 53070, 15985, 26583, 21953, m32404, 14089, mis-express 53070, 15985, 26583, 21953, m32404, 14089, or 23436 polypeptide or fragment thereof. For example, or 23436 or from a cellor subject in which a 53070-, 15985-, multiple variants of a 53070, 15985, 26583,21953, m32404, 26583-, 21953-, m32404-, 14089-, or 23436-mediated 14089, or 23436 polypeptide (e.g., encoded by allelic vari 10 response has been elicited, e.g., by contact of the cell with ants, site-directed mutants, random mutants, or combinato 53070, 15985, 26583, 21953, m32404, 14089, or 23436 rial mutants) can be disposed at individual addresses of the nucleic acid or protein, or administration to the cell or plurality. Addresses in addition to the address of the plurality subject 53070, 15985, 26583, 21953, m32404, 14089, or can be disposed on the array. 23436 nucleic acid or protein; providing a two dimensional The polypeptide array can be used to detect a 53070, 15 array having a plurality of addresses, each address of the 15985, 26583, 21953, m32404, 14089, or 23436 binding plurality being positionally distinguishable from each other compound, e.g., an antibody in a sample from a Subject with address of the plurality, and each address of the plurality specificity for a 53070, 15985, 26583, 21953, m32404, having a unique capture probe, and contacting the array with 14089, or 23436 polypeptide or the presence of a 53070-, a second sample from a cell or Subject which does not 15985- 26583-, 21953-, m32404-, 14089-, or 23436-bind express 53070, 15985, 26583, 21953, m32404, 14089, or ing protein or ligand. 23436 (or does not express as highly as in the case of the The array is also useful for ascertaining the effect of the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 expression of a gene on the expression of other genes in the positive plurality of capture probes) or from a cell or subject same cell or in different cells (e.g., ascertaining the effect of which in which a 53070, 15985, 26583, 21953, m32404, 53070, 15985, 26583, 21953, m32404, 14089, or 23436 25 14089, or 23436 mediated response has not been elicited (or expression on the expression of other genes). This provides, has been elicited to a lesser extent than in the first sample); for example, for a selection of alternate molecular targets for and comparing the binding of the first sample with the therapeutic intervention if the ultimate or downstream target binding of the second sample. Binding, e.g., in the case of cannot be regulated. a nucleic acid, hybridization with a capture probe at an In another aspect, the invention features a method of 30 address of the plurality, is detected, e.g., by signal generated analyzing a plurality of probes. The method is useful, e.g., from a label attached to the nucleic acid, polypeptide, or for analyzing gene expression. The method includes: pro antibody. The same array can be used for both samples or viding a two dimensional array having a plurality of different arrays can be used. If different arrays are used the addresses, each address of the plurality being positionally plurality of addresses with capture probes should be present distinguishable from each other address of the plurality 35 on both arrays. having a unique capture probe, e.g., wherein the capture In another aspect, the invention features a method of probes are from a cell or subject which express 53070, analyzing 53070, 15985, 26583, 21953, m32404, 14089, or 15985, 26583, 21953, m32404, 14089, or 23436 or from a 23436, e.g., analyzing structure, function, or relatedness to cell or subject in which a 53070, 15985, 26583, 21953, other nucleic acid or amino acid sequences. The method m32404, 14089, or 23436 mediated response has been 40 includes: providing a 53070, 15985, 26583, 21953, m32404, elicited, e.g., by contact of the cell with 53070, 15985, 14089, or 23436 nucleic acid or amino acid sequence; 26583, 21953, m32404, 14089, or 23436 nucleic acid or comparing the 53070, 15985, 26583, 21953, m32404, protein, or administration to the cell or subject 53070, 14089, or 23436 sequence with one or more preferably a 15985, 26583, 21953, m32404, 14089, or 23436 nucleic plurality of sequences from a collection of sequences, e.g., acid or protein; providing a two dimensional array having a 45 a nucleic acid or protein sequence database; to thereby plurality of addresses, each address of the plurality being analyze 53070, 15985, 26583, 21953, m32404, 14089, or positionally distinguishable from each other address of the 23436. plurality, and each address of the plurality having a unique capture probe, e.g., wherein the capture probes are from a Detection of 53070, 15985, 26583, 21953, m32404, 14089, cell or subject which does not express 53070, 15985, 26583, 50 or 23436 Variations or Mutations 21953, m32404, 14089, or 23436 (or does not express as The methods of the invention can also be used to detect highly as in the case of the 53070, 15985, 26583, 21953, genetic alterations in a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 positive plurality of capture m32404, 14089, or 23436 gene, thereby determining if a probes) or from a cell or subject which in which a 53070, subject with the altered gene is at risk for a disorder 15985, 26583, 21953, m32404, 14089, or 23436 mediated 55 characterized by misregulation in 53070, 15985, 26583, response has not been elicited (or has been elicited to a lesser 21953, m32404, 14089, or 23436 protein activity or nucleic extent than in the first sample); contacting the array with one acid expression, Such as a cellular proliferative and/or dif or more inquiry probes (which is preferably other than a ferentiative disorder, a cancer or a neurological disorder, a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 metabolic disorder, e.g., a mitochondrial related disorder or nucleic acid, polypeptide, or antibody), and thereby evalu 60 a cholesterol biosynthesis related disorder, or a cell prolif ating the plurality of capture probes. Binding, e.g., in the eration or differentiation disorder, e.g., a tumor; coagulation, case of a nucleic acid, hybridization with a capture probe at or cell signaling disorders; or an erythroid cell disorder or a an address of the plurality, is detected, e.g., by signal proliferative disorder of erythroid, liver, prostate, or brain generated from a label attached to the nucleic acid, polypep cells disorder. In preferred embodiments, the methods tide, or antibody. 65 include detecting, in a sample from the Subject, the presence In another aspect, the invention features a method of or absence of a genetic alteration characterized by at least analyzing a plurality of probes or a sample. The method is one of an alteration affecting the integrity of a gene encoding US 7,282,360 B2 127 128 a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 26583, 21953, m32404, 14089, or 23436 nucleic acid or a protein, or the mis-expression of the 53070, 15985, 26583, putative variant (e.g., allelic variant) thereof. A probe can 21953, m32404, 14089, or 23436 gene. For example, such have one or more mismatches to a region of a 53070, 15985, genetic alterations can be detected by ascertaining the exist 26583, 21953, m32404, 14089, or 23436 nucleic acid (e.g., ence of at least one of 1) a deletion of one or more a destabilizing mismatch). The arrays can have a high nucleotides from a 53070, 15985, 26583, 21953, m32404, density of addresses, e.g., can contain hundreds or thousands 14089, or 23436 gene; 2) an addition of one or more of oligonucleotides probes (Cronin, M. T. et al. (1996) nucleotides to a 53070, 15985, 26583, 21953, m32404, Human Mutation 7: 244-255; Kozal, M. J. et al. (1996) 14089, or 23436 gene; 3) a substitution of one or more Nature Medicine 2: 753-759). For example, genetic muta nucleotides of a 53070, 15985, 26583, 21953, m32404, 10 tions in 53070, 15985, 26583, 21953, m32404, 14089, or 14089, or 23436 gene, 4) a chromosomal rearrangement of 23436 can be identified in two-dimensional arrays contain a 53070, 15985, 26583, 21953, m32404, 14089, or 23436 ing light-generated DNA probes as described in Cronin, M. gene; 5) an alteration in the level of a messenger RNA T. et al. supra. Briefly, a first hybridization array of probes transcript of a 53070, 15985, 26583,21953, m32404, 14089, can be used to scan through long stretches of DNA in a or 23436 gene, 6) aberrant modification of a 53070, 15985, 15 sample and control to identify base changes between the 26583,21953, m32404, 14089, or 23436 gene, such as of the sequences by making linear arrays of sequential overlapping methylation pattern of the genomic DNA, 7) the presence of probes. This step allows the identification of point muta a non-wild type splicing pattern of a messenger RNA tions. This step is followed by a second hybridization array transcript of a 53070, 15985, 26583,21953, m32404, 14089, that allows the characterization of specific mutations by or 23436 gene, 8) a non-wild type level of a 53070, 15985, using Smaller, specialized probe arrays complementary to all 26583, 21953, m32404, 14089, or 23436-protein, 9) allelic variants or mutations detected. Each mutation array is com loss of a 53070, 15985, 26583, 21953, m32404, 14089, or posed of parallel probe sets, one complementary to the 23436 gene, and 10) inappropriate post-translational modi wild-type gene and the other complementary to the mutant fication of a 53070, 15985, 26583, 21953, m32404, 14089, gene. or 23436-protein. 25 In yet another embodiment, any of a variety of sequencing An alteration can be detected without a probe?primer in a reactions known in the art can be used to directly sequence polymerase chain reaction, such as anchor PCR or RACE the 53070, 15985, 26583, 21953, m32404, 14089, or 23436 PCR, or, alternatively, in a ligation chain reaction (LCR), the gene and detect mutations by comparing the sequence of the latter of which can be particularly useful for detecting point sample 53070, 15985, 26583, 21953, m32404, 14089, or mutations in the 53070, 15985, 26583, 21953, m32404, 30 23436 with the corresponding wild-type (control) sequence. 14089, or 23436-gene. This method can include the steps of Automated sequencing procedures can be utilized when collecting a sample of cells from a subject, isolating nucleic performing the diagnostic assays (1995) Biotechniques acid (e.g., genomic, mRNA or both) from the sample, 19:448), including sequencing by mass spectrometry. contacting the nucleic acid sample with one or more primers Other methods for detecting mutations in the 53070, which specifically hybridize to a 53070, 15985, 26583, 35 15985, 26583, 21953, m32404, 14089, or 23436 gene 21953, m32404, 14089, or 23436 gene under conditions include methods in which protection from cleavage agents is such that hybridization and amplification of the 53070, used to detect mismatched bases in RNA/RNA or RNA/ 15985, 26583, 21953, m32404, 14089, or 23436-gene (if DNA heteroduplexes (Myers et al. (1985) Science 230:1242: present) occurs, and detecting the presence or absence of an Cotton et al. (1988) Proc. Natl Acad Sci USA 85.4397: amplification product, or detecting the size of the amplifi 40 Saleeba et al. (1992) Methods Enzymol. 217:286-295). cation product and comparing the length to a control sample. In still another embodiment, the mismatch cleavage reac It is anticipated that PCR and/or LCR may be desirable to tion employs one or more proteins that recognize mis use as a preliminary amplification step in conjunction with matched base pairs in double-stranded DNA (so called any of the techniques used for detecting mutations described “DNA mismatch repair enzymes) in defined systems for herein. Alternatively, other amplification methods described 45 detecting and mapping point mutations in 53070, 15985, herein or known in the art can be used. 26583, 21953, m32404, 14089, or 23436 cDNAs obtained In another embodiment, mutations in a 53070, 15985, from samples of cells. For example, the mutY enzyme of E. 26583, 21953, m32404, 14089, or 23436 gene from a coli cleaves A at G/A mismatches and the thymidine DNA sample cell can be identified by detecting alterations in glycosylase from HeLa cells cleaves T at G/T mismatches restriction enzyme cleavage patterns. For example, sample 50 (Hsu et al. (1994) Carcinogenesis 15:1657-1662; U.S. Pat. and control DNA is isolated, amplified (optionally), digested No. 5,459,039). with one or more restriction endonucleases, and fragment In other embodiments, alterations in electrophoretic length sizes are determined, e.g., by gel electrophoresis and mobility will be used to identify mutations in 53070, 15985, compared. Differences in fragment length sizes between 26583, 21953, m32404, 14089, or 23436 genes. For sample and control DNA indicates mutations in the sample 55 example, single strand conformation polymorphism (SSCP) DNA. Moreover, the use of sequence specific ribozymes may be used to detect differences in electrophoretic mobility (see, for example, U.S. Pat. No. 5,498.531) can be used to between mutant and wild type nucleic acids (Orita et al. score for the presence of specific mutations by development (1989) Proc Natl. Acad. Sci USA: 86:2766, see also Cotton or loss of a ribozyme cleavage site. (1993) Mutat. Res. 285:125-144; and Hayashi (1992) Genet. In other embodiments, genetic mutations in 53070, 60 Anal. Tech. Appl. 9:73-79). Single-stranded DNA fragments 15985, 26583, 21953, m32404, 14089, or 23436 can be of sample and control 53070, 15985, 26583, 21953, identified by hybridizing a sample and control nucleic acids, m32404, 14089, or 23436 nucleic acids will be denatured e.g., DNA or RNA, two-dimensional arrays, e.g., chip based and allowed to renature. The secondary structure of single arrays. Such arrays include a plurality of addresses, each of Stranded nucleic acids varies according to sequence, the which is positionally distinguishable from the other. A 65 resulting alteration in electrophoretic mobility enables the different probe is located at each address of the plurality. A detection of even a single base change. The DNA fragments probe can be complementary to a region of a 53070, 15985, may be labeled or detected with labeled probes. The sensi